How Scans.AI Eliminated EKS Bottlenecks, Accelerated GPU Workloads, and Reduced AWS Costs with CloudKeeper

Industry:

AI & Automation

Headquarters:

Johannesburg, South Africa

Founded in:

2019

Company Size:

51–200 employees

Featured Tags:

AWS, AI & Automation

Overview

Scans.AI is an AI-native platform that helps insurers, fleets, and automotive businesses turn complex inspections and claims into fast, reliable, data-driven decisions. It uses advanced automation and intelligent orchestration to simplify high-volume workflows.

Powered by cutting-edge AI, Scans.AI delivers 98%+ accuracy in damage detection and enables up to 15% savings per claim. The platform provides real-time orchestration across claims, inspections, and repairs for enterprises operating across Africa, India, and other global markets.

Challenges

As Scans.AI’s AI/ML workloads scaled on Amazon EKS, underlying platform inefficiencies began impacting performance, cost, and development velocity:

Escalating AWS costs driven by EKS clusters running on outdated Kubernetes versions and incurring Extended Support Charges.
Blocked EKS upgrades due to severe version skew across the control plane, kubelet, and critical add-ons.
Excessive GPU startup latency, with inference and training workloads taking 15–20 minutes to initialize.
Networking instability when enabling modern EKS features like Prefix Delegation, resulting in IP allocation failures.
Slowed experimentation cycles, turning rapid AI iteration into long, inefficient development loops.

Scans.AI needed a structured, engineering-led approach to remove technical debt, restore performance, and regain cost efficiency across its EKS platform.
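To make the version skew problem concrete, here is a minimal Python sketch of a skew check. It is an illustration, not CloudKeeper's tooling. The function names are hypothetical, the component versions in the example are made up and are not Scans.AI's actual versions, and the default tolerance of three minor versions is an assumption based on the upstream Kubernetes skew policy for recent releases (older releases allowed less), so check the policy for your version.

```python
from typing import Dict, List


def minor(version: str) -> int:
    """Return the minor number from a version such as '1.29' or 'v1.29.3'."""
    return int(version.lstrip("v").split(".")[1])


def skew_violations(control_plane: str,
                    components: Dict[str, str],
                    max_skew: int = 3) -> List[str]:
    """List components that are newer than the control plane or lag too far behind.

    max_skew is an assumed tolerance in minor versions, not a value from the case study.
    """
    cp_minor = minor(control_plane)
    problems = []
    for name, version in components.items():
        lag = cp_minor - minor(version)
        if lag < 0:
            problems.append(f"{name} {version} is newer than control plane {control_plane}")
        elif lag > max_skew:
            problems.append(f"{name} {version} is {lag} minor versions behind (max {max_skew})")
    return problems


# Hypothetical inventory, for illustration only.
issues = skew_violations("1.29", {"kubelet": "1.25", "kube-proxy": "1.30", "coredns-node": "1.28"})
for issue in issues:
    print(issue)
```

In this made-up inventory, the kubelet lags by four minor versions and kube-proxy is ahead of the control plane, so both would need aligning before an upgrade could proceed.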

The Solution

CloudKeeper partnered closely with Scans.AI’s engineering teams to deliver a phased EKS optimization and alignment strategy focused on stability first, followed by performance and efficiency gains.

Phase 1: EKS Diagnosis and Upgrade Enablement

Conducted deep diagnostics to identify Kubernetes version skew across the control plane, self-managed node groups, kubelet, and kube-proxy.
Designed and executed a safe, stepwise upgrade path involving controlled downgrades, node rotations, and add-on alignment.
Successfully upgraded EKS clusters from Kubernetes 1.29 to 1.32, restoring upgrade hygiene and eliminating Extended Support risk.
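The stepwise path matters because an EKS control plane is upgraded one minor version at a time, so moving from 1.29 to 1.32 means passing through each version in between. The sketch below only enumerates those hops; `upgrade_path` is a hypothetical helper, and it does not model the controlled downgrades, node rotations, or add-on alignment described above.

```python
from typing import List


def upgrade_path(current: str, target: str) -> List[str]:
    """Return each minor version to step through, in order, from current to target."""
    major, start = (int(part) for part in current.split(".")[:2])
    target_major, end = (int(part) for part in target.split(".")[:2])
    if major != target_major or end < start:
        raise ValueError(f"cannot plan an upgrade from {current} to {target}")
    return [f"{major}.{m}" for m in range(start + 1, end + 1)]


print(upgrade_path("1.29", "1.32"))  # ['1.30', '1.31', '1.32']
```

Each hop is a separate upgrade, and node groups and add-ons typically need to be brought in line before the next hop is attempted.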

Phase 2: Performance and Networking Optimization

Identified subnet fragmentation as the root cause of Prefix Delegation failures.
Guided migration of node groups to clean subnets to enable stable IP allocation and higher pod density.
Realigned GPU node groups post-upgrade, resolving long-standing startup latency issues through EKS and GPU best practices.
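Why fragmentation breaks Prefix Delegation can be shown with a small sketch. With Prefix Delegation, the Amazon VPC CNI assigns whole /28 IPv4 prefixes (16 addresses) to network interfaces, so a subnet needs contiguous free /28 blocks, not just free addresses. The Python below counts fully free /28 blocks in a subnet. The function name, the subnet CIDR, and the address lists are hypothetical, and the sketch assumes the usual AWS behavior of reserving the first four addresses and the last address of each subnet.

```python
import ipaddress
from typing import Iterable, List


def free_prefix_blocks(subnet_cidr: str,
                       used_ips: Iterable[str],
                       prefix_len: int = 28) -> List[ipaddress.IPv4Network]:
    """Return the /prefix_len blocks in the subnet that contain no used or reserved address."""
    subnet = ipaddress.ip_network(subnet_cidr)
    # Assumption: AWS reserves the first four addresses and the last one in every subnet.
    reserved = {subnet[0], subnet[1], subnet[2], subnet[3], subnet[-1]}
    used = {ipaddress.ip_address(ip) for ip in used_ips} | reserved
    return [
        block
        for block in subnet.subnets(new_prefix=prefix_len)
        if not any(ip in block for ip in used)
    ]


# Same number of used addresses (16), very different outcomes.
packed = [f"10.0.0.{i}" for i in range(4, 20)]              # clustered at the start
scattered = [f"10.0.0.{16 * i + 5}" for i in range(16)]     # one per /28 block

print(len(free_prefix_blocks("10.0.0.0/24", packed)))     # 13
print(len(free_prefix_blocks("10.0.0.0/24", scattered)))  # 0
```

In the scattered case most of the subnet is still unused, yet no /28 prefix can be allocated, which is the kind of failure that moving node groups to clean subnets avoids.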

This approach removed platform bottlenecks while unlocking both performance and cost improvements.

Impact

CloudKeeper’s engagement delivered measurable performance, cost, and platform stability improvements across Scans.AI’s EKS environment.

Metrics	Outcomes
GPU Workload Startup Time	Reduced from 15–20 minutes to ~1 minute, a reduction of at least 93% in startup latency.
EKS Upgrade Readiness	Resolved critical version skew and upgraded clusters to Kubernetes 1.32, restoring upgrade hygiene and reducing operational risk.
AWS Support Costs	Eliminated Extended Support Charges by aligning clusters with supported Kubernetes versions.
Pod Density & Networking	Enabled Prefix Delegation through subnet realignment, improving IP allocation stability and node utilization.
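The pod density gain from Prefix Delegation can be estimated with the formula commonly used for the VPC CNI: each secondary IP slot on a network interface holds one address without Prefix Delegation, or one /28 prefix (16 addresses) with it. The sketch below is an approximation under that assumption. `max_pods` is a hypothetical helper, the interface and address counts are illustrative values for a small general-purpose instance rather than Scans.AI's node types, and the cap of 110 is an assumed recommended limit for smaller instances.

```python
from typing import Optional


def max_pods(enis: int,
             ips_per_eni: int,
             prefix_delegation: bool,
             cap: Optional[int] = None) -> int:
    """Estimate pods per node from ENI limits; the first IP on each ENI is not usable for pods."""
    addresses_per_slot = 16 if prefix_delegation else 1
    pods = enis * (ips_per_eni - 1) * addresses_per_slot + 2  # +2 for host-network pods
    return min(pods, cap) if cap is not None else pods


# Illustrative limits: 3 ENIs with 10 IPv4 addresses each.
print(max_pods(3, 10, prefix_delegation=False))           # 29
print(max_pods(3, 10, prefix_delegation=True))            # 434
print(max_pods(3, 10, prefix_delegation=True, cap=110))   # 110
```

Under these assumed limits, IP capacity stops being the constraint on pods per node once Prefix Delegation works, which is why stable prefix allocation translates into better node utilization.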

With CloudKeeper, Scans.AI achieved:

Dramatically faster GPU workload startup and AI experimentation cycles

Fully unblocked and future-ready EKS upgrade posture

Elimination of unnecessary AWS Extended Support costs

Stable, high-density networking through functional Prefix Delegation

A clean, maintainable Kubernetes foundation built for scalable AI growth

Conclusion

Scans.AI’s partnership with CloudKeeper transformed its EKS platform from a constrained, high-cost environment into a stable, high-performance foundation for AI innovation. By resolving Kubernetes misalignment, eliminating networking constraints, and restoring upgrade hygiene, CloudKeeper enabled Scans.AI to move faster without compromising reliability or cost control.

With predictable infrastructure performance, reduced latency, and improved cost efficiency, Scans.AI is now well-positioned to confidently and sustainably scale its AI workloads.
