Autoscaling Optimization

Tomasz Olszowy

Apr 3, 2026

6 min read

You have the cloud. But does it have to cost this much?

Most companies overpay for the cloud not because their systems are too large, but because nobody has had the time to organize everything properly. AI can now do this work for you, and it is very good at it.

Alright, let’s get specific. Every DevOps engineer has seen this, and knows from conversations with management or budget‑driven decision‑makers that the cost of infrastructure running at partial utilization is rarely discussed. What do I mean by that? Your infrastructure is probably sleeping half the day, and you’re still paying for it. Servers stay switched on every night at 100% readiness to take on load, and this goes on for days, weeks, even months. The infrastructure was configured to handle load no matter what happens or when; at some point in the past, someone decided it was safer to scale for the peak than to risk a collapse under load. The problem is that in the vast majority of cases this infrastructure either sees only light traffic or simply sleeps, and no one ever came back to reconfigure it, verify actual needs, and reduce costs.

This isn’t neglect. It’s actually a rational engineering approach, rooted in experience. The idea is that when something breaks at 4 a.m., it’s not a problem: the infrastructure still has a margin left, and usually more than one. It’s a strategy to prevent outages that generate costs and damage reputation. But that safety cushion isn’t free. At monthly scale, you can end up paying up to three times more than the resources you actually use would cost.

True, traditional autoscaling was supposed to fix this, but the assumption turned out to be purely theoretical. In practice it worked reactively—waiting until CPU hit 70% before spinning up a new instance. The problem is that starting a new machine takes time, and from the user’s perspective that delay is noticeable. So engineers proactively add extra capacity, lowering the trigger threshold to 50% for launching a new instance. In extreme cases they even disable aggressive scale‑down entirely. These decisions often come from painful experiences that ended in incidents.
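The reactive rule described above fits in a few lines. This is a minimal sketch, not any real autoscaler's API; the function name, thresholds, and limits are illustrative:

```python
def desired_instances(current: int, cpu_percent: float,
                      scale_up_at: float = 70.0,
                      scale_down_at: float = 30.0,
                      max_instances: int = 20) -> int:
    """Reactive policy: respond only to the load visible *now*."""
    if cpu_percent >= scale_up_at and current < max_instances:
        return current + 1   # the new machine still needs minutes to boot
    if cpu_percent <= scale_down_at and current > 1:
        return current - 1   # conservative scale-down to avoid flapping
    return current
```

Lowering `scale_up_at` from 70 to 50, as described above, simply makes the scale-up branch fire earlier; it buys reaction time at the price of running more machines more of the time.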

No matter how you look at it, this isn’t cheap.

  • 30% typical waste from over‑provisioning

  • 40% potential cost reduction with AI‑driven scaling

  • 3× faster response with predictive alerting

Where does AI fit in, and what does it actually change?

AI‑based tools approach scaling in an entirely different way. Instead of reacting to what is happening now, they predict what might happen! Scaling happens proactively, based on historical data: traffic from the last several months, time of day, day of the week, patterns tied to campaigns or seasonality. They build a model from this. That model can then determine scenarios such as: “traffic will spike in 20 minutes; reserve resources now.”
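The core idea, stripped of the machine-learning machinery, can be sketched as an hourly traffic profile plus a capacity reservation for the hour ahead. All names and numbers here are illustrative assumptions; real tools fit far richer models:

```python
import math
from collections import defaultdict
from statistics import mean

def hourly_profile(history):
    """Average load per (weekday, hour) bucket from historical samples.

    history: iterable of ((weekday, hour), requests_per_minute) pairs.
    """
    buckets = defaultdict(list)
    for key, load in history:
        buckets[key].append(load)
    return {key: mean(vals) for key, vals in buckets.items()}

def predicted_replicas(profile, weekday, hour,
                       per_replica_capacity=100, lead_hours=1):
    """Reserve capacity for the *next* hour's expected load, not the
    current one.  (Day rollover at midnight is ignored for brevity.)"""
    expected = profile.get((weekday, (hour + lead_hours) % 24), 0)
    return max(1, math.ceil(expected / per_replica_capacity))
```

For example, a profile built from months of samples where Thursday 19:00 averages 1,000 requests/minute will tell you at 18:00 to have 10 replicas ready, while an unknown quiet bucket falls back to the baseline of 1.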

The result? You can set much lower baseline resources, because the “extra buffer” is no longer needed. The system is ready and can autonomously manage its performance before demand actually arises. The engineer stops treating over‑provisioning as a risk‑management strategy, because risk is now managed through prediction.

Another thing these tools do is right‑sizing—that is, checking whether the instance type you’re using is appropriate at all. Many companies still run on m5.xlarge because that’s what was set up two years ago. No one has checked whether it’s still necessary or even makes sense. AI analyzes the actual resource usage and states directly: “these instances are two sizes too large; move to t3.large and you’ll save 40% on this layer.”
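The decision rule behind right-sizing is simple once the usage data exists: find the smallest instance type whose capacity covers the observed peak plus headroom. The catalog below is a tiny illustrative subset with approximate sizes, not an authoritative price list:

```python
# Illustrative catalog, sorted smallest first: (name, vCPU, memory in GiB).
CATALOG = [
    ("t3.large",   2,  8),
    ("m5.xlarge",  4, 16),
    ("m5.2xlarge", 8, 32),
]

def smallest_fit(peak_vcpu_used: float, peak_mem_gib: float,
                 headroom: float = 1.3) -> str:
    """Smallest type whose capacity covers observed peak plus headroom."""
    need_cpu = peak_vcpu_used * headroom
    need_mem = peak_mem_gib * headroom
    for name, vcpu, mem in CATALOG:
        if vcpu >= need_cpu and mem >= need_mem:
            return name
    return CATALOG[-1][0]   # nothing smaller fits; keep the largest
```

A workload peaking at one vCPU and 4 GiB on an `m5.xlarge` lands on `t3.large` under this rule, which is exactly the kind of two-sizes-down recommendation described above.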

The teams spending the most on the cloud are usually not the ones with the highest traffic. They’re simply the ones where no one has had the time to sit down and optimize.

From the perspective of the person controlling the budget, it looks like this: a one‑time investment of 2–3 weeks of an engineer’s work, after which you have a self‑running system that reduces costs every month and lowers your bill. Return on investment (ROI) is usually achieved within 6–8 weeks.

>> Case 1 — e‑commerce platform <<
Kubernetes that finally started “sleeping at night”

Tool used: KEDA + Prometheus ML Adapter

Medium‑sized online store. Kubernetes clusters sized for Black Friday, running at the same level the whole year. The team tried reducing the number of nodes at night, but every time a newsletter went out, an incident occurred. So they kept everything powered on. Safe—but expensive.

After implementing KEDA with the Prometheus adapter feeding a time‑series model trained on 18 months of traffic, the model quickly picked up the patterns: traffic on Thursday evenings always jumps around 19:00 (when emails are sent), Sunday mornings are quiet, and the week before holidays has a higher baseline. The system started pre‑scaling 25 minutes before the predicted spike and aggressively reducing resources for the rest of the week.
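The scheduling arithmetic behind "pre-scale 25 minutes before the predicted spike" is worth making concrete. This sketch assumes the model has already learned a weekly pattern (e.g., Thursday 19:00) and only computes when this week's scale-up should begin; it is not KEDA's actual API:

```python
from datetime import datetime, timedelta

def next_prescale(now: datetime, spike_weekday: int, spike_hour: int,
                  lead_minutes: int = 25) -> datetime:
    """Next occurrence of the learned weekly spike, minus the lead time
    needed for new capacity to come up.  Weekdays follow Python's
    convention (Monday=0 ... Sunday=6)."""
    days_ahead = (spike_weekday - now.weekday()) % 7
    spike = (now + timedelta(days=days_ahead)).replace(
        hour=spike_hour, minute=0, second=0, microsecond=0)
    if spike <= now:                      # today's spike already passed
        spike += timedelta(days=7)
    return spike - timedelta(minutes=lead_minutes)
```

In production the trigger would come from the time-series model's forecast rather than a fixed weekday, but the lead-time subtraction is the same.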

Result: the baseline number of nodes dropped by one‑third in off‑peak windows. Engineers stopped receiving nighttime alerts. The team gained about 6 hours per week that used to go toward manual interventions and incident reviews.

Business outcome: infrastructure costs were about 35% lower in the first two billing cycles. The configuration took a senior engineer roughly three weeks.

→ –35% infrastructure costs

>> Case 2 — B2B SaaS application <<
Reserved Instances and a hefty bill

Tool used: AWS Compute Optimizer + Spot Instance automation

A SaaS company selling software to businesses. The application layer runs on long‑term Reserved Instances bought three years ago, when the workload was much more memory‑heavy. Since then the architecture has changed and become more API‑driven, but no one revisited the instance types or demand analysis. Migration required planning, and the team was always busy with more urgent tasks. A classic scenario.

In the end, AWS Compute Optimizer was deployed to analyze 14 days of CloudWatch metrics and generate a report: 60% of the instances run below 25% CPU and 30% memory—consistently! Recommendation: move to a smaller instance family and shift batch processing to Spot Instances using an intelligently managed mixed fleet within an Auto Scaling Group.
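The "consistently below 25% CPU and 30% memory" check is easy to mirror on your own metric exports. Instance names, sample values, and thresholds below are illustrative; Compute Optimizer works on real CloudWatch data over its lookback window:

```python
def is_underutilized(cpu_samples, mem_samples,
                     cpu_limit=25.0, mem_limit=30.0) -> bool:
    """True when *every* sample in the window stays below both limits,
    i.e. the instance is consistently, not just occasionally, idle."""
    return (all(c < cpu_limit for c in cpu_samples)
            and all(m < mem_limit for m in mem_samples))

fleet = {
    "app-1": ([12, 18, 22], [20, 25, 28]),   # quiet the whole window
    "app-2": ([40, 70, 55], [35, 50, 45]),   # genuinely busy
}
oversized = [name for name, (cpu, mem) in fleet.items()
             if is_underutilized(cpu, mem)]
```

The `all(...)` condition is the important part: a single busy sample disqualifies an instance, which is what separates "consistently underutilized" from "sometimes quiet."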

The recommendations were reviewed and a decision was made in a single meeting. Implementation took two weeks, with two engineers performing a gradual rollout. Spot Instance interruptions—the thing they feared most—occurred twice in the first month. In both cases Auto Scaling handled them automatically, with zero impact on the users.

Business outcome: monthly EC2 costs were reduced by about 40%. The team now holds a 30‑minute monthly review of Compute Optimizer recommendations, which is a fixed point in their calendar.

→ –40% costs on EC2

What to do in practice

Frankly, in my opinion your engineering team almost certainly knows about these tools. The problem rarely lies in a lack of knowledge, but rather in the fact that cost optimization always loses priority to new features or bug fixes. Until a decision‑maker says clearly: “This is important. Put it on the High Priorities list.”

The ball is in your court. This isn’t a technical decision, but an organizational one. What’s needed is 2–3 weeks of a proper review and deployment of one of these tools. An investment that in most cases pays for itself before the end of the second month, after which the system runs autonomously.

Three questions for you…

  • When did someone last review your instances for proper right‑sizing? Not “a while ago”—specifically, when?

  • Is your scaling predictive, or are you still waiting for the CPU alarm to go off?

  • If you gave us two weeks purely for infrastructure optimization—how much do you think you could realistically save?

Don’t know?

Take a closer look at us—at what we do and how we do it. You’re already on our website!

// The technology is ready. It is mature. It doesn’t require rewriting your architecture.
// The only question is: is this a priority for you?

© 2026 QualityMinds, All rights reserved
