Scans.AI is an AI-native platform that helps insurers, fleets, and automotive businesses turn complex inspections and claims into fast, reliable, data-driven decisions. It uses advanced automation and intelligent orchestration to simplify high-volume workflows.
Powered by cutting-edge AI, Scans.AI delivers 98%+ accuracy in damage detection and enables up to 15% savings per claim. The platform provides real-time orchestration across claims, inspections, and repairs for enterprises operating across Africa, India, and other global markets.
How Scans.AI Eliminated EKS Bottlenecks, Accelerated GPU Workloads, and Reduced AWS Costs with CloudKeeper

Overview
Challenges
As Scans.AI’s AI/ML workloads scaled on Amazon EKS, underlying platform inefficiencies began impacting performance, cost, and development velocity:
- Escalating AWS costs driven by EKS clusters running on outdated Kubernetes versions and incurring Extended Support Charges.
- Blocked EKS upgrades due to severe version skew across the control plane, kubelet, and critical add-ons.
- Excessive GPU startup latency, with inference and training workloads taking 15–20 minutes to initialize.
- Networking instability when enabling modern EKS features like Prefix Delegation, resulting in IP allocation failures.
- Slowed experimentation cycles, turning rapid AI iteration into long, inefficient development loops.
Scans.AI needed a structured, engineering-led approach to remove technical debt, restore performance, and regain cost efficiency across its EKS platform.
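To make the version skew problem concrete, the sketch below flags components whose minor version is too far from the control plane. It is a minimal illustration, not the tooling used in this engagement. The upstream Kubernetes version skew policy does not allow the kubelet or kube-proxy to be newer than the API server, and allows them to lag by a limited number of minor versions (three in recent releases). The function names, the `max_older` default, and the sample versions are assumptions for illustration; the case study does not list the actual component versions.

```python
import re


def minor_version(version: str) -> int:
    """Extract the minor number from strings such as 'v1.29.3-eks-abc' or '1.32'."""
    match = re.match(r"v?(\d+)\.(\d+)", version)
    if match is None:
        raise ValueError(f"unrecognized Kubernetes version: {version!r}")
    return int(match.group(2))


def skew_problems(control_plane: str, components: dict, max_older: int = 3) -> list:
    """Return human-readable skew violations for each node-level component.

    Assumes every version shares the same major number (1.x).
    max_older is an assumed limit; check the skew policy for your release.
    """
    cp_minor = minor_version(control_plane)
    problems = []
    for name, version in components.items():
        lag = cp_minor - minor_version(version)
        if lag < 0:
            problems.append(f"{name} ({version}) is newer than the control plane ({control_plane})")
        elif lag > max_older:
            problems.append(f"{name} ({version}) is {lag} minor versions behind the control plane")
    return problems


# Hypothetical versions, chosen only to show one failing and one passing component.
report = skew_problems(
    "1.29",
    {"kubelet": "v1.25.16-eks-abc123", "kube-proxy": "v1.29.0"},
)
for line in report:
    print(line)
```

A check like this only reports skew; deciding whether to rotate nodes or realign add-ons first still depends on the cluster's actual state.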
The Solution
CloudKeeper partnered closely with Scans.AI’s engineering teams to deliver a phased EKS optimization and alignment strategy focused on stability first, followed by performance and efficiency gains.
- Conducted deep diagnostics to identify Kubernetes version skew across the control plane, self-managed node groups, kubelet, and kube-proxy.
- Designed and executed a safe, stepwise upgrade path involving controlled downgrades, node rotations, and add-on alignment.
- Successfully upgraded EKS clusters from Kubernetes 1.29 to 1.32, restoring upgrade hygiene and eliminating Extended Support risk.
- Identified subnet fragmentation as the root cause of Prefix Delegation failures.
- Guided migration of node groups to clean subnets to enable stable IP allocation and higher pod density.
- Realigned GPU node groups post-upgrade, resolving long-standing startup latency issues through EKS and GPU best practices.
This approach removed platform bottlenecks while unlocking both performance and cost improvements.
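The subnet fragmentation finding can be illustrated with a short sketch. With Prefix Delegation, the VPC CNI assigns /28 IPv4 prefixes to network interfaces, and each prefix needs a contiguous, aligned block of 16 free addresses in the subnet. A subnet can therefore have many free IPs and still have no usable /28 block. The function name, the sample CIDR, and the address lists below are hypothetical, and the sketch ignores the addresses AWS reserves in every subnet.

```python
import ipaddress


def free_prefix_blocks(subnet_cidr: str, used_ips: list, prefix_len: int = 28) -> list:
    """Return the aligned /prefix_len blocks in the subnet that contain no used address."""
    subnet = ipaddress.ip_network(subnet_cidr)
    used = {ipaddress.ip_address(ip) for ip in used_ips}
    free = []
    for block in subnet.subnets(new_prefix=prefix_len):
        # A block is usable only if every one of its addresses is unallocated.
        if not any(addr in used for addr in block):
            free.append(block)
    return free


subnet = "10.0.0.0/24"  # hypothetical subnet holding sixteen /28 blocks

# Sixteen used IPs scattered so that each /28 block holds exactly one of them.
scattered = [f"10.0.0.{i * 16 + 5}" for i in range(16)]
# The same number of used IPs packed into the first /28 block.
packed = [f"10.0.0.{i}" for i in range(16)]

fragmented_free = free_prefix_blocks(subnet, scattered)
clean_free = free_prefix_blocks(subnet, packed)
print(len(fragmented_free), len(clean_free))
```

In both cases 240 addresses are free, but only the packed layout leaves whole /28 blocks available, which is why moving node groups to clean subnets can restore prefix allocation.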
Impact
CloudKeeper’s engagement delivered measurable performance, cost, and platform stability improvements across Scans.AI’s EKS environment.
| Metric | Outcome |
| --- | --- |
| GPU Workload Startup Time | Optimized from 15–20 minutes to ~1 minute, delivering a 93% reduction in startup latency. |
| EKS Upgrade Readiness | Resolved critical version skew and upgraded clusters to Kubernetes 1.32, restoring upgrade hygiene and reducing operational risk. |
| AWS Support Costs | Eliminated Extended Support Charges by aligning clusters with supported Kubernetes versions. |
| Pod Density & Networking | Enabled Prefix Delegation through subnet realignment, improving IP allocation stability and node utilization. |
- Dramatically faster GPU workload startup and AI experimentation cycles
- Fully unblocked and future-ready EKS upgrade posture
- Elimination of unnecessary AWS Extended Support costs
- Stable, high-density networking through functional Prefix Delegation
- A clean, maintainable Kubernetes foundation built for scalable AI growth
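The startup-time figure reported above can be checked with simple arithmetic. The sketch below assumes the new startup time is exactly 1 minute (the source gives it as approximately 1 minute) and evaluates both ends of the original 15–20 minute range; the function name is illustrative.

```python
def percent_reduction(before_min: float, after_min: float) -> float:
    """Percentage by which a duration shrank, relative to the original duration."""
    return (before_min - after_min) * 100 / before_min


low = percent_reduction(15, 1)   # starting from the fastest original startup
high = percent_reduction(20, 1)  # starting from the slowest original startup
print(f"{low:.1f}% to {high:.1f}%")
```

Under that assumption the quoted 93% corresponds to the conservative end of the range.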
Conclusion
Scans.AI’s partnership with CloudKeeper transformed its EKS platform from a constrained, high-cost environment into a stable, high-performance foundation for AI innovation. By resolving Kubernetes misalignment, eliminating networking constraints, and restoring upgrade hygiene, CloudKeeper enabled Scans.AI to move faster without compromising reliability or cost control.
With predictable infrastructure performance, reduced latency, and improved cost efficiency, Scans.AI is now well-positioned to confidently and sustainably scale its AI workloads.