Kubernetes at Scale: Patterns We Use in Production

By TURKAY DevOps Team

Running Kubernetes in production is fundamentally different from running it in development. Our DevOps team manages clusters serving millions of requests daily for clients across logistics, e-commerce, and SaaS — and the patterns we rely on have been refined through years of real incidents.

Key Patterns

Resource limits and requests are non-negotiable. Every container gets explicit CPU and memory bounds. We use Vertical Pod Autoscaler in recommendation mode to right-size workloads, and Horizontal Pod Autoscaler with custom metrics (not just CPU) for scaling decisions.
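As a sketch, that pattern might look like the following Deployment fragment plus an autoscaling/v2 HPA. All names, images, and thresholds here are illustrative, not our actual production values, and the custom metric assumes a custom-metrics adapter is installed in the cluster:

```yaml
# Illustrative Deployment fragment: explicit requests and limits on every container.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: api                                  # hypothetical service name
spec:
  template:
    spec:
      containers:
        - name: api
          image: registry.example.com/api:1.0   # placeholder image
          resources:
            requests:
              cpu: 250m
              memory: 256Mi
            limits:
              cpu: "1"
              memory: 512Mi
---
# HPA scaling on a custom per-pod metric (e.g. requests per second),
# not just CPU utilization.
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: api
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: api
  minReplicas: 3
  maxReplicas: 30
  metrics:
    - type: Pods
      pods:
        metric:
          name: http_requests_per_second   # served by a custom-metrics adapter
        target:
          type: AverageValue
          averageValue: "100"
```

Requests drive scheduling decisions while limits cap runaway containers; setting both keeps bin-packing predictable and gives the HPA a stable baseline to scale against.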

For high availability, we run multi-zone node pools with pod anti-affinity rules and PodDisruptionBudgets. Database workloads get dedicated node pools with local SSDs. We learned the hard way that mixing stateful and stateless workloads on the same nodes leads to cascading failures during node pressure events.
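A minimal sketch of the availability pieces, again with illustrative names and label keys (the `workload-type` node label is an assumption, not a Kubernetes built-in):

```yaml
# PodDisruptionBudget: keep at least 2 replicas running during
# voluntary disruptions such as node drains and upgrades.
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: api-pdb
spec:
  minAvailable: 2
  selector:
    matchLabels:
      app: api
---
# Deployment fragment: spread replicas across zones via anti-affinity
# and keep stateless pods off the dedicated database node pool.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: api
spec:
  template:
    spec:
      affinity:
        podAntiAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
            - labelSelector:
                matchLabels:
                  app: api
              topologyKey: topology.kubernetes.io/zone
      nodeSelector:
        workload-type: stateless   # database pools carry a different label
```

The hard anti-affinity rule forces replicas into distinct zones, so a single-zone outage cannot take down the whole service; the node selector is what keeps stateful and stateless workloads from sharing nodes in the first place.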

Observability

We standardized on Prometheus + Grafana with custom dashboards per service, plus distributed tracing via OpenTelemetry. The investment in observability pays for itself within the first production incident — you cannot fix what you cannot see.
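One common way to wire this up, sketched here as a Prometheus scrape job using Kubernetes service discovery (the annotation-based opt-in convention shown is widely used but not mandated by Prometheus itself):

```yaml
# Illustrative Prometheus job: discover pods through the Kubernetes API
# and scrape only those annotated with prometheus.io/scrape: "true".
scrape_configs:
  - job_name: kubernetes-pods
    kubernetes_sd_configs:
      - role: pod
    relabel_configs:
      - source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_scrape]
        action: keep
        regex: "true"
```

With annotation-based opt-in, teams expose metrics by labeling their own pods rather than editing central Prometheus config, which keeps per-service dashboards self-serve.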
