The Opportunity
We are seeking a Platform Specialist (Director level) to serve as the organization's top technical authority on Kubernetes and the most senior hands-on engineer for CK-Tuner-Kubernetes, our Kubernetes Cost Intelligence platform. This is a deep individual contributor role, with 60% or more of your time spent on hands-on engineering. As principal engineer, you will set the technical direction, drive architectural decisions, and write production code. We are not looking for a people manager; we are looking for the strongest Kubernetes systems engineer we can find.
What You'll Own
CK-Tuner-Kubernetes — Kubernetes Cost Intelligence Platform
Architect and implement the cost allocation engine — cluster, namespace, deployment, pod, and container granularity across EKS, AKS, and GKE
Design and build the real-time data collection pipeline: agent architecture, ClickHouse time-series storage, gRPC streaming between agent and datastore
Implement Karpenter integration for node lifecycle management and bin-packing optimization
Build custom Kubernetes controllers and operators for cost policy enforcement and automated remediation
Design shared cost distribution algorithms — system namespaces, control plane costs, networking overhead, idle capacity attribution
Integrate CK-Tuner-Kubernetes with CK-Lens for a unified cloud + container cost view
Container Optimization Engine
Design and implement container right-sizing algorithms for CPU and memory requests/limits based on real usage patterns
Build node pool optimization logic — instance type selection, scaling policies, bin-packing efficiency scoring
Implement Karpenter-based spot and preemptible node policies for fault-tolerant workloads
Build the automated right-sizing execution pipeline via CK-Tuner integration
GPU Container Cost Intelligence
Track GPU utilization and detect idle GPUs for AI/ML workloads running on Kubernetes
Compare GPU costs across clusters on EKS, AKS, and GKE
Integrate with the FinOps for AI initiative for GPU pod-level cost attribution
Responsibilities
Technical Leadership
Serve as CK-Tuner-Kubernetes's principal architect and most senior hands-on engineer
Set architectural standards and code quality bars; mentor engineers through technical pairing and design reviews
Drive technical roadmap and architecture decisions in partnership with Product Management
Hands-On Engineering
Write production Go code for CK-Tuner-Kubernetes's core systems: agent data collection, metrics processing, cost allocation engine
Design and implement custom Kubernetes controllers and operators
Build and optimize the ClickHouse time-series data model for cost metrics at scale
Implement gRPC streaming with backpressure, circuit breakers, and mTLS between agent and datastore
Develop Karpenter-based node optimization policies and consolidation algorithms
Performance-tune the metrics pipeline: 10-second scrape intervals, 1-minute rollups, multi-cluster aggregation
Technical Strategy
Design the agent data collection layer — hybrid metrics collection via Metrics API, Kubelet Summary, Kubelet Proxy, and optional Prometheus endpoints
Architect the ClickHouse time-series schema with materialized views for multi-resolution aggregation (5m, 1h, 1d)
Build the delta processing pipeline — in-memory state comparison with ring buffers (discovery 10K, metrics 50K, events 100K)
Design cost allocation algorithms for shared resources — control plane, networking, system namespaces, idle capacity
Architect multi-cloud Kubernetes support (EKS as the primary target, AKS and GKE in Phase 4) with provider-specific pricing API integrations
Build integration points with CK-Lens, CK-Tuner, and CK-Intelligence
Technical Landscape You'll Navigate
Kubernetes & Container Orchestration
Platforms: EKS (Fargate, managed node groups), AKS, GKE (Autopilot, standard), on-prem Kubernetes
Ecosystem: OpenCost, Karpenter, Helm, Kubernetes Operators, Kubernetes API server
Resource Management: Requests/limits, node autoscaling, pod scheduling, bin-packing, spot/preemptible nodes
Kubernetes Internals: Custom controllers, operators, CRDs, admission webhooks, scheduler plugins, informers, leader election, reconciliation loops
Data Engineering
ClickHouse (time-series analytics), Apache Pulsar/NATS JetStream (message broker), gRPC bidirectional streaming with backpressure
Cloud Providers
AWS: EKS, Fargate, EC2 (GPU instances), S3, CloudWatch, Cost & Usage Reports
Azure: AKS, Azure Monitor, Azure Billing APIs
GCP: GKE, GKE Autopilot, BigQuery Billing Export
Requirements
Experience
10+ years in systems/platform/infrastructure engineering with deep hands-on Kubernetes production experience (EKS, AKS, or GKE)
Track record of personally designing and implementing complex distributed systems — not just overseeing teams that build them
Experience building Kubernetes tooling: operators, controllers, CLI tools, or platform products
Prior work on cost/resource optimization, observability, or infrastructure intelligence platforms preferred
Experience with container orchestration at scale — multi-cluster, multi-cloud preferred
Technical Depth
Expert-level: Kubernetes internals (scheduler, controller-manager, kubelet, API server), resource management, pod lifecycle
Hands-on: Custom controller/operator development using controller-runtime or client-go
Production experience with Karpenter, OpenCost, or equivalent node/cost optimization tools
