How Scans.AI Eliminated EKS Bottlenecks, Accelerated GPU Workloads, and Reduced AWS Costs with CloudKeeper

Industry: AI & Automation
Headquarters: Johannesburg, South Africa
Founded in: 2019
Company Size: 51–200 employees

Overview

Scans.AI is an AI-native platform that helps insurers, fleets, and automotive businesses turn complex inspections and claims into fast, reliable, data-driven decisions. It uses advanced automation and intelligent orchestration to simplify high-volume workflows.

Powered by cutting-edge AI, Scans.AI delivers 98%+ accuracy in damage detection and enables up to 15% savings per claim. The platform provides real-time orchestration across claims, inspections, and repairs for enterprises operating across Africa, India, and other global markets.
 

Challenges

As Scans.AI’s AI/ML workloads scaled on Amazon EKS, underlying platform inefficiencies began impacting performance, cost, and development velocity:

  • Escalating AWS costs driven by EKS clusters running on outdated Kubernetes versions and incurring Extended Support Charges.
  • Blocked EKS upgrades due to severe version skew across the control plane, kubelet, and critical add-ons.
  • Excessive GPU startup latency, with inference and training workloads taking 15–20 minutes to initialize.
  • Networking instability when enabling modern EKS features like Prefix Delegation, resulting in IP allocation failures.
  • Slowed experimentation cycles, turning rapid AI iteration into long, inefficient development loops.

Scans.AI needed a structured, engineering-led approach to remove technical debt, restore performance, and regain cost efficiency across their EKS platform.

The Solution


CloudKeeper partnered closely with Scans.AI’s engineering teams to deliver a phased EKS optimization and alignment strategy focused on stability first, followed by performance and efficiency gains.

Phase 1: EKS Diagnosis and Upgrade Enablement
  • Conducted deep diagnostics to identify Kubernetes version skew across the control plane, self-managed node groups, kubelet, and kube-proxy.
  • Designed and executed a safe, stepwise upgrade path involving controlled downgrades, node rotations, and add-on alignment.
  • Successfully upgraded EKS clusters from Kubernetes 1.29 to 1.32, restoring upgrade hygiene and eliminating Extended Support risk.
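
The diagnosis and upgrade planning in Phase 1 can be illustrated with a small sketch. This is not CloudKeeper's tooling: the function names (`skew_problems`, `upgrade_path`) and the sample versions are hypothetical. It assumes the upstream Kubernetes version skew policy, under which node components such as kubelet and kube-proxy must not be newer than the API server and, since Kubernetes 1.28, may trail it by at most three minor versions. It also assumes the EKS rule that a control plane is upgraded one minor version at a time. Add-on compatibility matrices are not modeled.

```python
# Illustrative only: hypothetical helpers, not CloudKeeper's actual tooling.

MAX_NODE_SKEW = 3  # assumed upstream policy for kubelet/kube-proxy (Kubernetes 1.28+)


def minor(version: str) -> int:
    """Return the minor number from a version such as '1.29' or 'v1.29.3-eks-abc'."""
    return int(version.lstrip("v").split(".")[1])


def skew_problems(control_plane: str, components: dict[str, str]) -> list[str]:
    """List components whose minor version is outside the allowed skew."""
    cp_minor = minor(control_plane)
    problems = []
    for name, version in components.items():
        lag = cp_minor - minor(version)
        if lag < 0:
            # A component ahead of the control plane may need a controlled downgrade.
            problems.append(f"{name} {version} is newer than control plane {control_plane}")
        elif lag > MAX_NODE_SKEW:
            problems.append(f"{name} {version} trails control plane {control_plane} by {lag} minors")
    return problems


def upgrade_path(current: str, target: str) -> list[str]:
    """Return the control plane versions to step through, one minor at a time."""
    major = current.lstrip("v").split(".")[0]
    return [f"{major}.{m}" for m in range(minor(current) + 1, minor(target) + 1)]


if __name__ == "__main__":
    # Sample component versions are made up for the example.
    for issue in skew_problems("1.29", {"kubelet": "v1.25.0", "kube-proxy": "1.30.0"}):
        print(issue)
    print(upgrade_path("1.29", "1.32"))
```

In this sketch, the move from 1.29 to 1.32 expands into three sequential control plane upgrades, and each step would be preceded by a skew check so that nodes and add-ons never fall outside the supported range.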

Phase 2: Performance and Networking Optimization
  • Identified subnet fragmentation as the root cause of Prefix Delegation failures.
  • Guided migration of node groups to clean subnets to enable stable IP allocation and higher pod density.
  • Realigned GPU node groups post-upgrade, resolving long-standing startup latency issues through EKS and GPU best practices.
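
Why subnet fragmentation breaks Prefix Delegation can be shown with a short sketch. With Prefix Delegation, the Amazon VPC CNI assigns whole /28 IPv4 prefixes (16 addresses each) to node network interfaces, so a subnet needs free, aligned /28 blocks, not just free individual addresses. The function name `free_prefix_blocks` and the sample addresses below are hypothetical, and the sketch only accounts for the addresses passed in plus the five addresses AWS reserves in every subnet.

```python
import ipaddress

# Illustrative only: a hypothetical helper, not part of any AWS SDK or CLI.


def free_prefix_blocks(
    subnet_cidr: str, used_ips: list[str], prefix_len: int = 28
) -> list[ipaddress.IPv4Network]:
    """Return the aligned /prefix_len blocks in the subnet that contain no used address."""
    subnet = ipaddress.ip_network(subnet_cidr)
    # AWS reserves the first four addresses and the last address of every subnet.
    reserved = [subnet.network_address + i for i in range(4)] + [subnet.broadcast_address]
    # Map every occupied address to the aligned block that contains it.
    taken = {
        ipaddress.ip_network(f"{ip}/{prefix_len}", strict=False)
        for ip in [*map(str, reserved), *used_ips]
    }
    return [block for block in subnet.subnets(new_prefix=prefix_len) if block not in taken]


if __name__ == "__main__":
    # Fragmented: one address in use inside each /28 block (made-up data).
    scattered = [f"10.0.0.{16 * i + 5}" for i in range(1, 15)]
    # Clean: the same number of addresses packed into a single /28 block.
    packed = [f"10.0.0.{16 + i}" for i in range(1, 15)]
    print(len(free_prefix_blocks("10.0.0.0/24", scattered)))
    print(len(free_prefix_blocks("10.0.0.0/24", packed)))
```

In the fragmented case only 14 addresses are in use, yet no /28 block is left to allocate. The same 14 addresses packed into one block leave most of the subnet available. This is the reasoning behind moving node groups to clean subnets.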
     

This approach removed platform bottlenecks while unlocking both performance and cost improvements.

Impact

CloudKeeper’s engagement delivered measurable performance, cost, and platform stability improvements across Scans.AI’s EKS environment.

Metrics and Outcomes

  • GPU Workload Startup Time: Optimized from 15–20 minutes to ~1 minute, a reduction in startup latency of roughly 93–95%.
  • EKS Upgrade Readiness: Resolved critical version skew and upgraded clusters to Kubernetes 1.32, restoring upgrade hygiene and reducing operational risk.
  • AWS Support Costs: Eliminated Extended Support Charges by aligning clusters with supported Kubernetes versions.
  • Pod Density & Networking: Enabled Prefix Delegation through subnet realignment, improving IP allocation stability and node utilization.
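
As a quick check of the startup-time figure, the reduction implied by going from 15–20 minutes to about 1 minute can be computed directly. The helper name `reduction_pct` is hypothetical, and the "after" value of exactly 1 minute is an assumption, since the source gives it only as approximate.

```python
# Illustrative arithmetic only; the 1-minute "after" value is an approximation.


def reduction_pct(before_minutes: float, after_minutes: float) -> float:
    """Percentage reduction when a duration drops from before to after."""
    return (1 - after_minutes / before_minutes) * 100


if __name__ == "__main__":
    for before in (15, 20):
        print(f"{before} min -> 1 min: {reduction_pct(before, 1):.1f}% reduction")
```

Under that assumption the reduction is about 93.3% from a 15-minute baseline and 95.0% from a 20-minute baseline, so the quoted 93% corresponds to the conservative end of the range.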

With CloudKeeper, Scans.AI achieved:

  • Dramatically faster GPU workload startup and AI experimentation cycles
  • Fully unblocked and future-ready EKS upgrade posture
  • Elimination of unnecessary AWS Extended Support costs
  • Stable, high-density networking through functional Prefix Delegation
  • A clean, maintainable Kubernetes foundation built for scalable AI growth

Conclusion

Scans.AI’s partnership with CloudKeeper transformed their EKS platform from a constrained, high-cost environment into a stable, high-performance foundation for AI innovation. By resolving Kubernetes misalignment, eliminating networking constraints, and restoring upgrade hygiene, CloudKeeper enabled Scans.AI to move faster without compromising reliability or cost control.

With predictable infrastructure performance, reduced latency, and improved cost efficiency, Scans.AI is now well-positioned to scale its AI workloads confidently and sustainably.

 
