DevOps Dictionary

Kubernetes Resource Requests and Limits

Kubernetes Resource Requests and Limits are settings that define how much CPU and memory a container needs and how much it is allowed to use. Requests help Kubernetes decide where to schedule a Pod. Limits cap the maximum resources a container can consume at runtime.

What Kubernetes resource requests and limits do

Requests and limits help Kubernetes manage compute capacity across a cluster. They are usually set per container in a Pod spec.

  • Requests tell Kubernetes how much CPU and memory to reserve for a container, based on what it is expected to need.
  • Limits tell Kubernetes the maximum CPU and memory a container is allowed to use.

For example, a container can request 250m CPU and 512Mi memory, while having a limit of 500m CPU and 1Gi memory. Kubernetes uses the request for scheduling. The container runtime enforces the limit.

How requests work

When you create a Pod, the Kubernetes scheduler adds up the resource requests of all containers in that Pod. It then looks for a Node whose allocatable CPU and memory, minus the requests of Pods already scheduled there, can fit those totals.

If a Pod requests 1 CPU and 2Gi memory, Kubernetes will only schedule it onto a Node that has at least that much unrequested capacity. This does not mean the Pod always uses that amount. It means Kubernetes reserves scheduling capacity for it.

Requests are especially important for production clusters because they help prevent too many workloads from being packed onto the same Node.
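To illustrate how requests add up, here is a minimal sketch of a hypothetical Pod that runs the API container next to a log-shipping sidecar. The names `api-with-sidecar` and `log-shipper` and the sidecar image are placeholders. The scheduler needs a Node with room for the sum of both containers' requests. Init containers and Pod overhead, which this sketch leaves out, can raise the effective total.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: api-with-sidecar          # hypothetical name
spec:
  containers:
    - name: api
      image: example/api:1.0.0
      resources:
        requests:
          cpu: "250m"
          memory: "512Mi"
    - name: log-shipper           # hypothetical sidecar
      image: example/log-shipper:1.0.0
      resources:
        requests:
          cpu: "100m"
          memory: "128Mi"
# The scheduler fits the Pod using the sum of container requests:
#   CPU:    250m  + 100m  = 350m
#   Memory: 512Mi + 128Mi = 640Mi
```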

How limits work

Limits define the maximum resources a container can use.

  • CPU limits throttle the container when it tries to use more CPU than allowed.
  • Memory limits can cause the container to be killed if it exceeds the configured limit.

CPU is compressible, so a container that hits its CPU limit can be throttled (slowed down) without being stopped. Memory is not compressible, so a container that uses more memory than its limit may be terminated by the kernel's out-of-memory killer, which Kubernetes reports as OOMKilled.
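An OOM kill shows up in the Pod's status. The excerpt below is an illustrative, trimmed fragment of the kind of output `kubectl get pod <pod-name> -o yaml` can return after a memory-limit kill. The container name and restart count are hypothetical, and timestamps are omitted.

```yaml
status:
  containerStatuses:
    - name: api
      restartCount: 3             # hypothetical value
      lastState:
        terminated:
          reason: OOMKilled       # killed for exceeding the memory limit
          exitCode: 137           # 128 + 9 (SIGKILL)
```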

Example Kubernetes manifest

A typical container resource configuration looks like this:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: api
spec:
  replicas: 3
  selector:
    matchLabels:
      app: api
  template:
    metadata:
      labels:
        app: api
    spec:
      containers:
        - name: api
          image: example/api:1.0.0
          resources:
            requests:
              cpu: "250m"
              memory: "512Mi"
            limits:
              cpu: "500m"
              memory: "1Gi"

In this example, each replica requests a quarter of a CPU core and 512 MiB of memory, and Kubernetes schedules each Pod based on those requested values. The container can use up to half a CPU core before it is throttled, and up to 1 GiB of memory before it risks an OOM kill.

CPU and memory units

Kubernetes uses specific units for CPU and memory:

  • CPU: 1 means one physical CPU core, or one virtual core on a virtual machine. 500m means 500 millicores, or half a core. 100m means one tenth of a core.
  • Memory: Mi and Gi are binary units. 512Mi means 512 mebibytes. 1Gi means 1 gibibyte.

For most teams, memory values such as 256Mi, 512Mi, 1Gi, and 2Gi are common starting points. CPU values such as 100m, 250m, 500m, and 1 are also common.
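The `resources` fragment below sketches equivalent spellings and one common pitfall. It shows only the part of a container spec that holds resources, and the values are examples rather than recommendations.

```yaml
resources:
  requests:
    cpu: "0.5"          # same as "500m": half a CPU
    memory: "1G"        # decimal unit: 1,000,000,000 bytes
  limits:
    cpu: "1"            # same as "1000m": one full CPU
    memory: "1Gi"       # binary unit: 1,073,741,824 bytes, slightly more than 1G
# Pitfall: suffixes are case-sensitive. memory: "512m" means 0.512 bytes
# (milli-bytes), not 512 mebibytes. Write "512Mi" instead.
```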

Requests vs limits

Requests and limits are related, but they solve different problems.

  • Requests affect scheduling. Kubernetes uses them to decide whether a Node has enough capacity for a Pod.
  • Limits affect runtime behavior. The container runtime enforces them after the Pod starts.
  • Requests support reliability. They help ensure workloads get the capacity they are expected to need.
  • Limits protect the Node. They stop one container from consuming too much CPU or memory.

A common pattern is to set memory requests and limits, but be careful with CPU limits. Strict CPU limits can cause throttling and latency spikes for services with bursty traffic.
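A minimal sketch of that pattern, showing only a container's `resources` section with example values: the memory request equals the memory limit, and CPU has a request but no limit. This assumes no LimitRange in the namespace injects a default CPU limit, which would add one anyway.

```yaml
resources:
  requests:
    cpu: "250m"         # reserved for scheduling; with no CPU limit the container
                        # can use idle CPU on the Node during bursts
    memory: "512Mi"
  limits:
    memory: "512Mi"     # request == limit gives a predictable memory footprint
    # no cpu limit: avoids quota-based CPU throttling
```

Because the CPU request has no matching limit, a Pod configured this way is placed in the Burstable QoS class rather than Guaranteed.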

Quality of Service classes

Kubernetes uses requests and limits to assign a Quality of Service, or QoS, class to each Pod. QoS affects eviction decisions when a Node is under resource pressure.

  • Guaranteed: Every container has CPU and memory requests and limits, and each request equals its matching limit.
  • Burstable: At least one container has a CPU or memory request or limit, but the Pod does not meet the Guaranteed rules.
  • BestEffort: No containers have CPU or memory requests or limits.

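When a Node runs low on memory, the kubelet evicts Pods to reclaim it. BestEffort Pods and Burstable Pods using more than their requests are generally evicted first, ordered by Pod priority and then by how far their usage exceeds their requests. Guaranteed Pods, and Burstable Pods staying within their requests, are evicted last.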
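As a sketch, the hypothetical Pod below meets the Guaranteed rules: its only container sets CPU and memory requests, and each request equals the matching limit. The Pod name and values are illustrative.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: guaranteed-example        # hypothetical name
spec:
  containers:
    - name: app
      image: example/api:1.0.0
      resources:
        requests:
          cpu: "500m"
          memory: "1Gi"
        limits:
          cpu: "500m"             # equal to the CPU request
          memory: "1Gi"           # equal to the memory request
# All requests equal their limits, so Kubernetes sets status.qosClass: Guaranteed.
```

You can check the assigned class with `kubectl get pod guaranteed-example -o jsonpath='{.status.qosClass}'`. Setting only the limits produces the same class, because Kubernetes defaults a missing request to its matching limit.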

Common use cases

  • Production APIs: Set requests based on normal traffic and memory limits based on tested peak usage.
  • Background workers: Use requests to reserve enough CPU for predictable job throughput.
  • Batch jobs: Set limits to prevent a single job from exhausting Node memory.
  • Multi-tenant clusters: Combine requests and limits with ResourceQuota and LimitRange objects to control usage by namespace.
  • Autoscaling: Use CPU or memory requests as a baseline for Horizontal Pod Autoscaler calculations.
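For the autoscaling case, here is a sketch of a hypothetical HorizontalPodAutoscaler (`autoscaling/v2`) for the `api` Deployment from the earlier manifest. The replica bounds and the 70% target are assumptions, and the cluster needs a resource metrics source such as metrics-server. Utilization is measured against the CPU request: with the 250m request from that manifest, a 70% target aims for roughly 175m of average CPU per Pod.

```yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: api
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: api
  minReplicas: 3                  # assumed lower bound
  maxReplicas: 10                 # assumed upper bound
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70  # percent of the CPU request, not the limit
```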

Related Kubernetes concepts

Several Kubernetes features depend on or interact with requests and limits:

  • ResourceQuota: Sets total resource usage constraints for a namespace, such as maximum requested CPU or memory.
  • LimitRange: Sets default requests and limits for containers in a namespace.
  • Horizontal Pod Autoscaler: Often uses CPU or memory utilization compared to requests to scale replica counts.
  • Cluster Autoscaler: May add Nodes when Pods stay Pending because no existing Node has enough unrequested capacity for them.
  • Eviction: Kubernetes can evict Pods when a Node is under memory, disk, or PID pressure.
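A minimal sketch of namespace-level policy, assuming a hypothetical `team-a` namespace and illustrative totals: a ResourceQuota caps the namespace's aggregate requests and memory limits, and a LimitRange fills in defaults for containers that omit them. CPU limits are left out of both objects so Pods without CPU limits are still admitted.

```yaml
apiVersion: v1
kind: ResourceQuota
metadata:
  name: team-a-quota
  namespace: team-a               # hypothetical namespace
spec:
  hard:
    requests.cpu: "10"            # total CPU requests across the namespace
    requests.memory: 20Gi         # total memory requests
    limits.memory: 40Gi           # total memory limits
---
apiVersion: v1
kind: LimitRange
metadata:
  name: team-a-defaults
  namespace: team-a
spec:
  limits:
    - type: Container
      defaultRequest:             # used when a container sets no request
        cpu: 100m
        memory: 256Mi
      default:                    # used when a container sets no limit
        memory: 512Mi
```

When a quota covers `requests.cpu` or `requests.memory`, Pods that omit those values can be rejected at admission; the LimitRange defaults fill them in so such Pods are still accepted.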

If you manage Kubernetes resources with infrastructure as code, you can define these settings in Helm charts, Kustomize overlays, or Terraform-managed manifests. For example, teams that deploy Kubernetes resources using Terraform declare the same requests and limits in their Terraform configuration.
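As one sketch of the overlay approach, the hypothetical production `kustomization.yaml` below raises the memory settings of the `api` Deployment without editing the base manifest. It assumes the base lives at `../../base`, already sets `requests.memory` and `limits.memory` (the JSON patch `replace` operation requires the paths to exist), and that the API container is the first entry in `containers`. The new values are illustrative.

```yaml
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization
resources:
  - ../../base                    # assumed path to the base Deployment
patches:
  - target:
      kind: Deployment
      name: api
    patch: |-
      - op: replace
        path: /spec/template/spec/containers/0/resources/requests/memory
        value: 768Mi
      - op: replace
        path: /spec/template/spec/containers/0/resources/limits/memory
        value: 1536Mi
```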

Simple real-world example

Suppose you run a Node.js API on Kubernetes. During normal traffic, each Pod uses about 150m CPU and 300Mi memory. During deploys or traffic spikes, it can briefly use 400m CPU and 700Mi memory.

A practical starting configuration might be:

  • CPU request: 200m
  • CPU limit: 500m, or no CPU limit if latency is sensitive and cluster policy allows it
  • Memory request: 384Mi
  • Memory limit: 768Mi or 1Gi

You would then watch real usage in Prometheus, Grafana, Datadog, CloudWatch, or another monitoring tool and adjust based on observed p95 or p99 behavior. Guessing once and never revisiting the values often leads to wasted capacity or unstable Pods.

Benefits

  • Better scheduling: Kubernetes can place Pods on Nodes with enough capacity.
  • Improved cluster stability: Limits reduce the chance that one workload starves others.
  • More accurate autoscaling: Autoscalers get a clearer baseline for utilization.
  • Cost control: Teams can see which services reserve too much capacity.
  • Safer multi-team clusters: Platform teams can apply namespace-level policies and defaults.

Tradeoffs and limitations

  • Bad requests waste capacity: Overstated requests reserve Node capacity the workload never uses, so the scheduler treats Nodes as full while actual utilization stays low.
  • Low memory limits cause restarts: If memory limits are too tight, Pods may get killed during normal spikes.
  • CPU limits can hurt latency: CPU throttling can affect APIs, queues, and databases under load.
  • Defaults can hide problems: Namespace defaults from LimitRange are useful, but application teams still need workload-specific values.
  • They do not replace load testing: You still need realistic traffic tests to understand resource behavior.

Operational best practices

  • Set memory requests and limits for production workloads.
  • Use CPU requests for scheduling and autoscaling baselines.
  • Be cautious with CPU limits on latency-sensitive services.
  • Review actual usage before and after major releases.
  • Use namespace-level ResourceQuota for shared clusters.
  • Add LimitRange defaults so Pods do not run without any resource settings.
  • Track OOMKilled, CPU throttling, restart count, and pending Pods as operational signals.
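As a sketch of turning those signals into alerts, the rule below assumes the Prometheus Operator (for the `PrometheusRule` resource), kube-state-metrics, and kubelet cAdvisor metrics are installed and scraped. The rule names, namespace, thresholds, and time windows are hypothetical, and your Prometheus instance may also need matching labels on the object before it loads the rule.

```yaml
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: resource-signals          # hypothetical name
  namespace: monitoring           # hypothetical namespace
spec:
  groups:
    - name: resource-signals
      rules:
        # A container restarted recently and its last termination was an OOM kill.
        - alert: ContainerOOMKilled
          expr: |
            increase(kube_pod_container_status_restarts_total[15m]) > 0
            and on (namespace, pod, container)
            kube_pod_container_status_last_terminated_reason{reason="OOMKilled"} == 1
          labels:
            severity: warning
        # More than 25% of CPU scheduling periods were throttled.
        - alert: HighCPUThrottling
          expr: |
            sum by (namespace, pod, container) (rate(container_cpu_cfs_throttled_periods_total[5m]))
              /
            sum by (namespace, pod, container) (rate(container_cpu_cfs_periods_total[5m]))
              > 0.25
          for: 15m
          labels:
            severity: warning
        # Pods stuck Pending, often because no Node fits their requests.
        - alert: PodPendingTooLong
          expr: sum by (namespace, pod) (kube_pod_status_phase{phase="Pending"}) > 0
          for: 15m
          labels:
            severity: warning
```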

Resource planning also matters during cluster upgrades. If Nodes are drained and workloads need to move, accurate requests make rescheduling more predictable. Teams planning upgrades can pair this with practical Kubernetes upgrade checks to reduce disruption.

How this applies to cloud Kubernetes

Managed Kubernetes services such as Amazon Elastic Kubernetes Service, Google Kubernetes Engine, and Azure Kubernetes Service still rely on the same Kubernetes scheduling behavior. Requests matter even more there, because node autoscalers decide when to add Nodes, and with some autoscalers which instance types to launch, based on the requests of Pending Pods rather than on actual usage.

For example, when running workloads on Amazon EKS, you may tune requests and limits alongside Node group sizes, instance types, and autoscaling policies. This applies to simple web services as well as heavier workloads such as Apache Airflow on AWS EKS.

Where Crossplane and infrastructure provisioning fit

Requests and limits are workload-level Kubernetes settings. They do not create cloud infrastructure by themselves. If you use Crossplane to provision cloud resources from Kubernetes, your application Pods still need proper resource settings.

For example, a team may use Crossplane to create an AWS database and then deploy an application that connects to it. The Crossplane resources manage cloud infrastructure, while the application's Deployment manages container CPU and memory. You can see this separation in workflows that deploy AWS resources using Crossplane on Kubernetes.

Common mistakes

  • Leaving requests empty: The scheduler can overpack Nodes, which then become unstable under load.
  • Setting requests equal to peak usage: This can waste large amounts of cluster capacity.
  • Setting memory limits too close to normal usage: Small spikes can trigger OOM kills.
  • Using the same values for every service: A lightweight API, Java service, worker, and database sidecar often need different settings.
  • Ignoring sidecars: Service mesh proxies, log shippers, and agents also consume CPU and memory.

Short definition

Kubernetes resource requests and limits define the expected and maximum CPU and memory for containers. Requests guide Pod scheduling and autoscaling calculations. Limits enforce runtime caps to protect Nodes and other workloads. Good values come from real usage data, load testing, and regular review.
