Resource Optimization
Resource Optimization shows how AI and cloud cost tools can help teams identify wasted infrastructure capacity, rightsize resources, and automate scaling decisions to keep performance stable while reducing spend. By combining visibility, analysis, and automation, it turns resource management from a reactive cost-control task into a proactive way to improve efficiency and support the business.

Tomasz Olszowy • 5 min.
Resource Optimization in DevOps: how to stop a premium customer cluster from becoming a bottomless pit
Friday, 7:01 a.m. The IT department. Just a few more hours until the weekend. Nothing unusual is happening on the general communication channel. Silence and calm. Every Friday should look like this. Unfortunately, there is tension in the background, the kind of tension so characteristic of non-obvious issues. Technically, everything is OK — all lights are green, and the Kubernetes platform serving premium customers is operating without incidents. The SLA is also being met. Yet everyone knows that something is not quite as it should be. The bills... the infrastructure bills are starting to grow very quickly.
This is one of those situations where nothing is down, yet money is disappearing and, worst of all, someone has to explain it and account for it. After migrating to the cloud, the team did exactly what it had done before: it carried over its on-premises habit of leaving extra headroom. There was nothing strange or unreasonable about that. It was common sense, grounded in experience and pragmatism. The difference is that on local infrastructure such a tactic does not, in theory, generate steadily increasing costs, while in the cloud it does exactly that. A little more CPU, just in case. A little more RAM and a few more nodes, because premium customers pay for a service that scales well and, above all, is always available and responsive, with no lag. The problem was that, after a few months, a plan built on the “a little more” philosophy was costing more and more.
The assumption was simple. The cluster served premium customers from the financial and logistics sectors, so the team was naturally hesitant to take risks. Nobody wanted to be the one who reduced CPU requests the day before a traffic spike and then had to explain in a meeting why the application started lagging. That is exactly why requests and limits were set generously, test environments were left running after hours, and some nodes did very little while still increasing monthly costs.
After a brainstorming session, the department decided to make changes. Three tools were brought in: AWS Cost Explorer, Kubecost, and Karpenter. Cost Explorer produced rightsizing recommendations for EC2 and helped identify instances that were underutilized relative to their size and performance parameters. Kubecost broke costs down to the namespace, workload, and pod levels. Karpenter added automatic provisioning of right-sized nodes and consolidation so that the cluster would stop maintaining infrastructure “out of habit.” Using this cost-optimization stack and the data it produced, the team adopted a new strategy in two stages: first, find every place where resources were being wasted; then develop and implement cloud cost-optimization procedures and, through them, establish new habits within the team.
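To make the idea of cost visibility at the namespace, workload, and pod levels concrete, here is a minimal sketch of rolling pod-level costs up to coarser levels. It does not use Kubecost's API; the record layout, the names, and the dollar amounts are hypothetical and exist only for illustration.

```python
from collections import defaultdict

# Hypothetical pod-level cost records: (namespace, workload, pod, monthly cost in USD).
records = [
    ("premium-finance", "payments-api", "payments-api-7d9f", 310.0),
    ("premium-finance", "payments-api", "payments-api-b21c", 290.0),
    ("premium-logistics", "tracking-worker", "tracking-worker-44aa", 180.0),
    ("test", "payments-api", "payments-api-test-01", 120.0),
]


def roll_up(records, depth):
    """Sum costs by the first `depth` key fields (1 = namespace, 2 = namespace + workload)."""
    totals = defaultdict(float)
    for *keys, cost in records:
        totals[tuple(keys[:depth])] += cost
    return dict(totals)


by_namespace = roll_up(records, 1)
by_workload = roll_up(records, 2)
print(by_namespace)
print(by_workload)
```

The same records answer different questions depending on the level of aggregation: which customer segment costs the most, and which workload inside it is responsible.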
The next stage was AI “joining the team.” Not in the role of a miracle worker, but as a patient analyst. AI analyzed data from Cost Explorer and Kubecost, compared the history of CPU, RAM, and storage usage, and then identified a recurring pattern. It turned out that for most of the week, several key services used only about 20–30% of their allocated resources. That was the moment when it became clear that some infrastructure decisions had simply turned into a very costly form of caution.
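The pattern described above, services using only a fraction of what they were allocated, can be approximated with a very simple check. The sketch below is an assumption about how such a comparison might look, not the analysis the team actually ran; the `ServiceUsage` type, the service names, the sample values, and the 30% threshold are all hypothetical.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class ServiceUsage:
    name: str
    cpu_request_millicores: int
    cpu_usage_samples: list[int]  # observed CPU usage in millicores


def utilization(service: ServiceUsage) -> float:
    """Average observed CPU usage as a fraction of the CPU request."""
    return mean(service.cpu_usage_samples) / service.cpu_request_millicores


def find_oversized(services: list[ServiceUsage], threshold: float = 0.30) -> list[str]:
    """Names of services whose average utilization is at or below the threshold."""
    return [s.name for s in services if utilization(s) <= threshold]


# Hypothetical data for illustration only.
services = [
    ServiceUsage("payments-api", 1000, [210, 250, 230, 270]),
    ServiceUsage("tracking-worker", 500, [400, 420, 380, 450]),
]
oversized = find_oversized(services)
print(oversized)  # ['payments-api']
```

An average hides peaks, so a check like this is only a first filter. Any candidate it flags still needs a look at peak-hour behavior before requests are lowered.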
The team moved into action in stages. First, they rightsized selected EC2 instances in line with Cost Explorer’s recommendations. Then they lowered requests and limits for services that were clearly oversized. The next step was enabling more aggressive consolidation through Karpenter, which removes or replaces nodes when the workloads can run just as stably on fewer machines. In addition, the team shut down test environments that were not used after hours and cleaned up old snapshots and volumes. Some of these had survived only because no one had found the time to “clean them up,” or because they might still be useful one day.
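The two housekeeping steps, stopping idle test environments and removing old snapshots, come down to simple rules. The following sketch shows one way such rules could be expressed. The working hours (07:00 to 19:00, Monday to Friday), the 90-day retention period, the replica count, and the snapshot IDs are assumptions made for illustration; the source does not state the team's actual policy.

```python
from datetime import datetime, timedelta


def is_business_hours(moment: datetime, start_hour: int = 7, end_hour: int = 19) -> bool:
    """True on Monday-Friday between start_hour (inclusive) and end_hour (exclusive)."""
    return moment.weekday() < 5 and start_hour <= moment.hour < end_hour


def desired_test_replicas(moment: datetime, daytime_replicas: int = 2) -> int:
    """Replica count a test workload should have at the given moment."""
    return daytime_replicas if is_business_hours(moment) else 0


def stale_snapshots(created_at: dict[str, datetime], now: datetime,
                    max_age_days: int = 90) -> list[str]:
    """IDs of snapshots older than max_age_days, sorted for stable review output."""
    cutoff = now - timedelta(days=max_age_days)
    return sorted(sid for sid, created in created_at.items() if created < cutoff)


# Hypothetical timestamps and snapshot IDs.
friday_morning = datetime(2024, 3, 8, 7, 1)
friday_night = datetime(2024, 3, 8, 22, 0)
snapshots = {
    "snap-old": datetime(2023, 6, 1),
    "snap-recent": datetime(2024, 2, 20),
}

print(desired_test_replicas(friday_morning))       # 2
print(desired_test_replicas(friday_night))         # 0
print(stale_snapshots(snapshots, friday_morning))  # ['snap-old']
```

The functions only decide what should happen. Applying the decision, by scaling a deployment or deleting a snapshot, is deliberately left out so that the output can be reviewed first.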
One example of a correction looked quite modest:
resources:
  requests:
    cpu: "250m"
    memory: "512Mi"
  limits:
    cpu: "500m"
    memory: "1Gi"
Previously, in those same services, the requests had been nearly twice as high. On paper, that looked fairly “safe.” In practice, however, the company was paying for a comfort margin that the application almost never used. And that is exactly where real Resource Optimization begins. It is not about brutally cutting every resource, but about aligning them with the system’s actual behavior and real demand.
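Aligning requests with actual behavior usually means deriving them from observed usage rather than from intuition. The sketch below shows one common approach: take a high percentile of usage and add headroom. It is not the method the team used, which the source does not describe. The 95th percentile, the 25% headroom, the 50m rounding step, and the sample values are all assumptions, with the samples chosen so that the result matches the 250m request shown above.

```python
import math


def percentile(samples: list[float], pct: float) -> float:
    """Nearest-rank percentile: the smallest sample with at least pct% of samples at or below it."""
    if not samples:
        raise ValueError("samples must not be empty")
    ordered = sorted(samples)
    rank = max(1, math.ceil(pct / 100 * len(ordered)))
    return ordered[rank - 1]


def recommend_request(samples: list[float], pct: float = 95,
                      headroom: float = 0.25, step: int = 50) -> int:
    """Percentile usage plus headroom, rounded up to a multiple of `step` millicores."""
    target = percentile(samples, pct) * (1 + headroom)
    return math.ceil(target / step) * step


# Hypothetical CPU usage samples in millicores.
cpu_samples = [120, 140, 150, 160, 180, 190, 200, 170, 130, 155]
recommended = recommend_request(cpu_samples)
print(f"{recommended}m")  # 250m
```

The headroom parameter is where caution lives. It keeps a safety margin, but makes it an explicit, reviewable number instead of a habit.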
The first tangible change noticed by the team was the time spent auditing the cluster. Previously, a full audit took two days of work from two engineers. It required manual chart analysis, correlating costs with load, analyzing service behavior during peak hours, and then defending the conclusions in a meeting. After Kubecost and AI analysis were implemented, the same process was shortened to about four hours. That is roughly an 80% saving in the time spent on the analysis itself. Suddenly, the team had room for engineering work instead of “archaeological” digging through metrics.
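The size of that saving depends on how the two figures are read. The short calculation below works through two readings, assuming an 8-hour working day, which the source does not state. The first compares elapsed working time. The second compares person-hours and assumes the four hours are the total effort.

```python
HOURS_PER_DAY = 8  # assumption: length of a working day is not given in the source


def saving(before_hours: float, after_hours: float) -> float:
    """Fractional reduction when going from before_hours to after_hours."""
    return (before_hours - after_hours) / before_hours


elapsed_before = 2 * HOURS_PER_DAY      # two working days of elapsed time
effort_before = 2 * 2 * HOURS_PER_DAY   # two engineers for two days, in person-hours
after = 4                               # "about four hours"

elapsed_saving = saving(elapsed_before, after)
effort_saving = saving(effort_before, after)
print(f"{elapsed_saving:.1%}")  # 75.0%
print(f"{effort_saving:.1%}")   # 87.5%
```

Under these assumptions the reduction is between 75% and 87.5%, which is consistent with the "roughly 80%" quoted above.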
The changes also had a very positive impact on costs. After the first month, compute costs fell by about 33%, and after full consolidation and proper rightsizing, the savings reached 37%. These numbers showed management that the decisions made had been the right ones. What is more, the changes also began to improve the company’s image. For what was probably the first time, the IT department stopped being associated only with maintaining infrastructure and reacting to problems, and started being seen as a team that analyzes and protects the budget. A team that makes decisions based on data and knows how to combine service stability with cost responsibility.
This change was also extremely important for the team itself. Morale improved, part of the monotonous manual work disappeared, and people who could see their actions producing tangible results regained a sense of purpose in their work. That matters a great deal. Engineers stopped being the people who merely watched whether the platform was burning money again overnight. They became architects of the environment once more. Team meetings brought less complaining, fewer ironic jokes about expensive, “oversized” solutions, and more satisfaction from a job well done.
The company’s image and reputation benefited as well. The strongest proof was that customers saw DevOps not only delivering uptime but also guarding cost efficiency while maintaining quality. This matters because Resource Optimization is not an accounting add-on to infrastructure, but part of mature platform management. If an environment is efficient, stable, and financially sound at the same time, the team stops being a necessary cost. It starts becoming a business partner.
And perhaps that is the most important lesson in this story. AI, Kubecost, Cost Explorer, and Karpenter did not create a revolution because they were fashionable. They worked because they combined visibility, rightsizing, and automation into one optimized process. The premium cluster remained premium. Sized to the customer’s needs. “Tailored” to those needs like a well-fitted suit. And that was the difference that could be felt everywhere: in the budget and in the team’s mood. The feeling that the infrastructure was finally working intelligently became a milestone for further change. Change for the better.


