Senior DevOps Engineer
Gourav specializes in helping organizations design secure and scalable Kubernetes infrastructures on AWS.
When the customer reached out to us, they weren’t starting from scratch. Their Amazon EKS setup already reflected a lot of good decisions: IRSA was in place for IAM, IMDSv2 was enforced, and node provisioning was handled by Karpenter. They had event-driven scaling with KEDA, visibility through New Relic, and had even experimented with tools like Kubecost and CastAI to break down cloud costs. From the outside, things looked well-architected.
But under the hood, they were struggling with persistent inefficiencies. CPU utilization across their clusters was low. Amazon EC2 nodes weren’t being used effectively, and Karpenter’s bin-packing wasn't delivering the gains they expected. Despite all the right pieces being in place, they couldn’t translate observability into action. They also needed better visibility into how much each namespace or workload was costing them, but the third-party tools they tried were either too expensive or didn’t fit their workflows. What they really wanted was not more tooling, but clarity, alignment, and practical steps to make the most of the setup they had.
One of the first things we found was that their clusters, although well configured, weren’t running efficiently. Karpenter’s bin-packing wasn’t effective, leaving many of their Amazon EC2 nodes underutilized. In some cases, average CPU usage was under 10 percent even though pods had high memory requests. The root cause was memory-heavy but CPU-light services landing on the wrong instance types: some needed more than 7 GB of RAM but very little CPU. Because out-of-memory kills had occurred in the past, the team had increased memory requests across the board, which produced large nodes that sat mostly idle.
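To make that shape concrete, here is a minimal sketch of what such a container’s resource spec might look like; the numbers are illustrative, not taken from the customer’s manifests.

```yaml
# Illustrative requests for a RAM-intensive, CPU-light service (hypothetical values)
resources:
  requests:
    cpu: "250m"      # request near observed average CPU, not the burst peak
    memory: "7Gi"    # genuine memory need that drives node sizing
  limits:
    memory: "8Gi"    # guardrail against OOM kills without inflating CPU
```

A pod with this profile leaves most of a balanced node’s vCPUs idle, which is exactly the underutilization pattern we were seeing.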
We recommended moving these memory-heavy workloads to R-series instances, which offer a higher memory-to-vCPU ratio (roughly 8 GiB per vCPU) that suits services that are RAM-intensive but compute-light. While R-series instances can be slightly more expensive per vCPU, they make better use of allocated memory when CPU demand is low, which improves overall bin-packing. The team had primarily been using M-series, whose balanced 4 GiB per vCPU they found more predictable across mixed workloads, particularly for services that don’t fully saturate memory or CPU. After discussion, we agreed that a blend of both families, R-series for clearly memory-heavy deployments and M-series for more balanced ones, would achieve a better cost-to-performance ratio without compromising stability. We also emphasized that resource requests needed to be tuned to actual usage, and that average CPU is a better basis for right-sizing than max CPU, since container workloads can burst briefly without justifying higher provisioning.
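As a rough sketch of what that split can look like in Karpenter, assuming the v1 NodePool API and an existing EC2NodeClass named "default" (the pool name and instance families are illustrative, not the customer’s actual configuration):

```yaml
apiVersion: karpenter.sh/v1
kind: NodePool
metadata:
  name: memory-heavy          # hypothetical pool for RAM-intensive services
spec:
  template:
    spec:
      nodeClassRef:
        group: karpenter.k8s.aws
        kind: EC2NodeClass
        name: default         # assumes an existing EC2NodeClass named "default"
      requirements:
        - key: karpenter.k8s.aws/instance-family
          operator: In
          values: ["r6i", "r7i"]   # ~8 GiB per vCPU for RAM-heavy, CPU-light pods
        - key: karpenter.sh/capacity-type
          operator: In
          values: ["on-demand"]
  limits:
    cpu: "200"                # cap the total vCPU this pool may provision
```

A parallel pool restricted to M-series families can then serve the more balanced workloads.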
We found that some node pools were being consolidated too aggressively. In earlier setups, pods were getting evicted frequently because of short idle timers, sometimes as short as one minute, which led to constant pod churn and unnecessary scheduling overhead. The team had already begun segmenting workloads into different node pools: API services on on-demand nodes, cron jobs on spot-backed pools, and worker services on their own pool. That was a step in the right direction. We built on it by reviewing their consolidation settings and advising more realistic consolidation windows of 15 to 30 minutes.
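The relevant knobs live in the NodePool’s disruption block; a hedged sketch with illustrative values, again assuming Karpenter’s v1 schema:

```yaml
# NodePool fragment: slow consolidation down to reduce pod churn
disruption:
  consolidationPolicy: WhenEmptyOrUnderutilized
  consolidateAfter: 30m     # previously closer to one minute, causing constant evictions
  budgets:
    - nodes: "10%"          # optional: cap how many nodes can be disrupted at once
```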
To right-size pods, we introduced them to the Vertical Pod Autoscaler (VPA). We recommended rolling it out in staging first, in recommendation mode, to safely observe its impact, and running the VPA components themselves on a stable on-demand instance. In production, they kept it in recommendation-only mode to avoid unplanned restarts. The VPA ran for over 72 hours before any changes were applied, which gave it enough historical usage data to produce stable, meaningful recommendations. Once those recommendations were applied, CPU over-requests dropped steadily and node utilization improved. Java-based services benefited in particular, after the team updated heap configurations to allow more dynamic resource scaling.
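For reference, a recommendation-only VPA object looks roughly like this; the target Deployment name is hypothetical:

```yaml
apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: payments-api-vpa          # hypothetical name
spec:
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: payments-api            # hypothetical workload
  updatePolicy:
    updateMode: "Off"             # recommendation-only: no evictions or restarts
```

With updateMode set to "Off", the recommendations appear in the object’s status (for example via kubectl describe vpa) and can be copied into the Deployment’s requests manually; switching to "Recreate" lets VPA apply them itself, at the cost of pod restarts.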
Another problem was oversized instance types in Karpenter. During a configuration audit we found that their NodePools unintentionally allowed instance types as large as 32xlarge and 48xlarge. These settings had gone unnoticed because they did not always result in active provisioning, but they carried the risk of spinning up large, expensive nodes during bursts or capacity crunches. We identified them by reviewing the Karpenter configuration and validating the NodeClaim templates directly, then worked together to restrict instance types to a more appropriate range, typically 4xlarge or 8xlarge depending on the workload class. They also set up a dedicated node pool for cron jobs that scaled to zero when not in use and consolidated workloads based on job frequency. This pool had a five-minute termination policy and was isolated from long-running API services.
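A sketch of such a scale-to-zero pool under Karpenter’s v1 API, with hypothetical names and a deliberately bounded instance-size range:

```yaml
apiVersion: karpenter.sh/v1
kind: NodePool
metadata:
  name: cron-jobs              # hypothetical spot-backed pool for scheduled work
spec:
  template:
    spec:
      nodeClassRef:
        group: karpenter.k8s.aws
        kind: EC2NodeClass
        name: default          # assumes an existing EC2NodeClass
      taints:
        - key: workload-type   # hypothetical taint; cron pods must tolerate it
          value: cron
          effect: NoSchedule
      requirements:
        - key: karpenter.k8s.aws/instance-size
          operator: In
          values: ["xlarge", "2xlarge", "4xlarge", "8xlarge"]  # no 32xlarge/48xlarge
        - key: karpenter.sh/capacity-type
          operator: In
          values: ["spot"]
  disruption:
    consolidationPolicy: WhenEmpty
    consolidateAfter: 5m       # matches the five-minute termination policy
```

Because the pool only allows spot capacity and consolidates empty nodes after five minutes, it drains to zero between job runs, while the taint keeps long-running API pods off these short-lived nodes.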
For observability and cost tracking, they had started using Kubecost but were facing issues with configuration and visibility. We demonstrated how CloudKeeper Lens could offer detailed cost breakdowns by namespace, pod, and container without additional cost or setup. After validating the data in real time, they decided to rely on CloudKeeper Lens going forward.
The impact of these changes was significant. CPU allocation dropped across clusters, and Amazon EC2 bin-packing improved. Pod churn decreased as consolidation settings were made more realistic. In some environments, pod lifetimes increased from a few hours to several days. The team also cleaned up unused NodePools, replaced over-provisioned workloads with right-sized ones, and started actively applying VPA recommendations.
As part of our final audit, we went through their Kubernetes configuration in depth. We found no signs of ENI exhaustion or networking issues. They were using large subnets with prefix delegation, IRSA for all pods, and no pods were running in privileged mode. About 50 percent of pods still ran as root, so we flagged that for future security hardening. We also advised moving from sidecar-based file logging to container-native stdout and stderr streams.
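As a starting point for that hardening, a pod template can declare a non-root security context along these lines; the UID and image are placeholders:

```yaml
# Illustrative pod-template hardening (hypothetical UID and image)
spec:
  securityContext:
    runAsNonRoot: true                  # refuse to start containers as UID 0
    runAsUser: 10001                    # arbitrary non-root UID present in the image
    seccompProfile:
      type: RuntimeDefault
  containers:
    - name: app
      image: example.com/app:latest     # placeholder image that logs to stdout/stderr
      securityContext:
        allowPrivilegeEscalation: false
        readOnlyRootFilesystem: true
```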
At the end of the engagement, the customer committed to applying the recommendations across all of their production regions. Within the first 48 hours of implementation, they observed a meaningful drop in CPU over-allocation and a significant decline in pod restart rates. Amazon EC2 usage reports showed average CPU utilization rising from under 10 percent to closer to 25 percent in some clusters, and previously volatile node churn was almost eliminated. These early results gave the team confidence to continue the rollout in a phased manner, with the expectation that further savings would materialize in upcoming billing cycles. They confirmed that VPA would remain in Recreate mode for lower environments and Off mode for production, with recommendations applied manually, and they planned to adopt Graviton-based instances after completing performance testing. Karpenter’s behavior had also stabilized: where nodes had previously been replaced every few hours, pods now lived much longer, with lower restart counts and fewer disruptions.
What we learned from this engagement is that having the right tools is only the first step. Effective Amazon EKS optimization is about understanding how workloads behave, how autoscaling interacts with requests and limits, and how to match infrastructure to application needs. Sometimes, just changing a consolidation timer or tuning a CPU request can make the difference between waste and efficiency. In this case, it made a measurable impact on both cost and stability.
Speak with our advisors to learn how you can take control of your cloud costs.