DevOps Engineer
Aman is a skilled Cloud Infrastructure Specialist with expertise in designing, managing, and optimising scalable cloud environments.
Have you ever deployed a database on Amazon EKS, only for your pods to suddenly crash with `Too many open files` errors, followed by AWS CNI plugin failures that broke your networking? Here are the steps I followed to solve it:
Running workloads on Amazon Elastic Kubernetes Service (EKS) is generally smooth, but as with any distributed system, unexpected problems arise. Recently, I encountered a tricky issue while deploying a database on an EKS cluster. The deployment kept failing with ulimit (file descriptor) errors and was soon followed by AWS CNI plugin failures, which broke pod networking.
While spinning up the database pod, the logs showed `Too many open files` errors, meaning the process had exhausted its file descriptor (ulimit) allowance.
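To see what the container is hitting, you can pull the pod's logs and check the effective limit from inside it; the pod name below is a placeholder, and the image is assumed to ship a shell:
# Inspect the crash logs and the open-file limit the container actually sees
kubectl logs <database-pod>
kubectl exec <database-pod> -- sh -c 'ulimit -n'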
Initially, this appeared to be a straightforward resource configuration issue. However, soon after, the AWS CNI plugin pods (`aws-node`) also began restarting repeatedly with errors of their own.
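You can watch those restarts and pull the CNI logs with standard kubectl commands; `k8s-app=aws-node` is the label the VPC CNI DaemonSet typically carries, and `aws-node` is the name of its main container:
# Check the CNI pods for CrashLoopBackOff and restart counts
kubectl -n kube-system get pods -l k8s-app=aws-node
# Tail the CNI plugin logs
kubectl -n kube-system logs ds/aws-node -c aws-node --tail=50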
This created a cascading failure: not only was the database pod crashing, but other workloads also started failing due to broken pod networking.
Here is how I resolved it, broken down into three parts: raising the ulimit values in the node Launch Template, rolling the worker nodes, and restarting the AWS CNI plugin.
I modified the Launch Template to apply proper ulimit values via AWS EC2 user data:
#!/bin/bash
# Raise the system-wide file descriptor ceiling
echo "fs.file-max = 2097152" >> /etc/sysctl.conf
sysctl -p
# Raise the soft and hard open-file limits for all users
echo "* soft nofile 65535" >> /etc/security/limits.conf
echo "* hard nofile 65535" >> /etc/security/limits.conf
# Apply the higher limit to the current boot session as well
ulimit -n 65535
I performed a rolling replacement of nodes in the Amazon EKS Node Group so that new instances inherited the updated limits.
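If the cluster uses a managed node group, one way to trigger that rolling replacement is to point the group at the new launch template version; the cluster, node group, and template names here are placeholders:
# EKS cordons, drains, and replaces nodes in batches when the launch template version changes
aws eks update-nodegroup-version \
  --cluster-name my-cluster \
  --nodegroup-name db-nodes \
  --launch-template name=db-nodes-lt,version=2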
Finally, I restarted the aws-node DaemonSet:
kubectl rollout restart ds aws-node -n kube-system
This reloaded the CNI plugin with the corrected host limits.
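After the DaemonSet rolls out, it is worth confirming the CNI pods are healthy and that the new limit is actually in effect; the pod name below is a placeholder:
# Wait for the restarted CNI pods to become ready
kubectl -n kube-system rollout status ds/aws-node
# Re-check the effective open-file limit from the redeployed database pod
kubectl exec <database-pod> -- sh -c 'ulimit -n'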
What looked like a database-specific error turned out to be a deeper issue with EKS worker node system limits. By adjusting the ulimit settings in the Launch Template and restarting the AWS CNI plugin, I restored stability to the cluster.