Kubernetes Resource Requests and Limits are settings that define how much CPU and memory a container needs and how much it is allowed to use. Requests help Kubernetes decide where to schedule a Pod. Limits cap the maximum resources a container can consume at runtime.
Requests and limits help Kubernetes manage compute capacity across a cluster. They are usually set per container in a Pod spec.
For example, a container can request 250m CPU and 512Mi memory, while having a limit of 500m CPU and 1Gi memory. Kubernetes uses the request for scheduling. The kubelet and container runtime enforce the limit through the kernel (cgroups on Linux).
When you create a Pod, the Kubernetes scheduler checks the resource requests for all containers in that Pod. It then finds a Node with enough available allocatable CPU and memory.
If a Pod requests 1 CPU and 2Gi memory, Kubernetes will only schedule it onto a Node that has at least that much unrequested capacity. This does not mean the Pod always uses that amount. It means Kubernetes reserves scheduling capacity for it.
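To make the scheduling arithmetic concrete, here is a minimal sketch of a Pod with two containers. The Pod name, container names, and images are hypothetical placeholders. The scheduler adds the container requests together, so this Pod would need a Node with at least 1 CPU and 2Gi of unrequested allocatable capacity.

```yaml
# Hypothetical Pod; names and images are placeholders for illustration.
apiVersion: v1
kind: Pod
metadata:
  name: web-with-sidecar
spec:
  containers:
    - name: web
      image: example/web:1.0.0
      resources:
        requests:
          cpu: "750m"
          memory: "1536Mi"
    - name: log-forwarder
      image: example/log-forwarder:1.0.0
      resources:
        requests:
          cpu: "250m"      # 750m + 250m = 1 CPU requested by the Pod
          memory: "512Mi"  # 1536Mi + 512Mi = 2Gi requested by the Pod
```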
Requests are especially important for production clusters because they help prevent too many workloads from being packed onto the same Node.
Limits define the maximum resources a container can use.
CPU is compressible, so a container that reaches its CPU limit is throttled rather than killed. Memory is not compressible, so a container that uses more memory than its limit may be terminated by the out-of-memory (OOM) killer, which Kubernetes reports as OOMKilled.
A typical container resource configuration looks like this:
apiVersion: apps/v1
kind: Deployment
metadata:
  name: api
spec:
  replicas: 3
  selector:
    matchLabels:
      app: api
  template:
    metadata:
      labels:
        app: api
    spec:
      containers:
        - name: api
          image: example/api:1.0.0
          resources:
            requests:
              cpu: "250m"
              memory: "512Mi"
            limits:
              cpu: "500m"
              memory: "1Gi"
In this example, each replica requests a quarter of a CPU core and 512 MiB of memory. Kubernetes schedules each Pod based on those requested values. The container can use up to half a CPU core and 1 GiB of memory before enforcement starts.
Kubernetes uses specific units for CPU and memory:
CPU: 1 means one CPU core. 500m means 500 millicores, or half a core. 100m means one tenth of a core.
Memory: Mi and Gi are binary units. 512Mi means 512 mebibytes. 1Gi means 1 gibibyte.
For most teams, memory values such as 256Mi, 512Mi, 1Gi, and 2Gi are common starting points. CPU values such as 100m, 250m, 500m, and 1 are also common.
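As a quick reference for the unit notation, the fragment below annotates a resources block with equivalent ways to write each value. The values themselves are only illustrative.

```yaml
# Fragment of a container spec; each comment notes an equivalent form of the value.
resources:
  requests:
    cpu: "100m"      # same as cpu: "0.1"
    memory: "256Mi"  # 256 * 1024 * 1024 bytes
  limits:
    cpu: "1"         # same as cpu: "1000m"
    memory: "1Gi"    # same as 1024Mi; the decimal form "1G" would mean 1,000,000,000 bytes
```

Mixing up Gi and G is a common source of slightly undersized memory limits, so it may help to use the binary suffixes consistently.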
Requests and limits are related, but they solve different problems.
A common pattern is to set memory requests and limits, but be careful with CPU limits. Strict CPU limits can cause throttling and latency spikes for services with bursty traffic.
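One way to express this pattern is sketched below: memory has both a request and a limit, while CPU has only a request. The values are illustrative, and the sketch assumes the namespace has no LimitRange that would inject a default CPU limit.

```yaml
# Fragment of a container spec; values are illustrative, not a recommendation.
resources:
  requests:
    cpu: "250m"      # used for scheduling and for CPU shares under contention
    memory: "512Mi"
  limits:
    memory: "1Gi"    # hard cap; exceeding it can lead to an OOM kill
    # No cpu limit here, so the container can burst into idle CPU on the Node.
```

Without a CPU limit the container is not throttled at a fixed ceiling, but it still competes with other containers in proportion to its CPU request when the Node is busy.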
Kubernetes uses requests and limits to assign a Quality of Service, or QoS, class to each Pod. QoS affects eviction decisions when a Node is under resource pressure.
When a Node runs out of memory, Kubernetes usually evicts lower-priority and lower-QoS Pods before higher-QoS Pods, although priority, actual usage, and other factors also matter.
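As an illustration of how the settings map to a QoS class, the sketch below shows a Pod that would be classified as Guaranteed, because every container sets CPU and memory limits equal to its requests. The Pod name and image are hypothetical.

```yaml
# Hypothetical Pod; name and image are placeholders for illustration.
apiVersion: v1
kind: Pod
metadata:
  name: qos-demo
spec:
  containers:
    - name: app
      image: example/app:1.0.0
      resources:
        requests:
          cpu: "500m"
          memory: "512Mi"
        limits:
          cpu: "500m"      # equal to the request
          memory: "512Mi"  # equal to the request
# If any request were lower than its limit, the Pod would be Burstable.
# If no container set any requests or limits, the Pod would be BestEffort.
```

The assigned class can be read back from the Pod status, for example with kubectl get pod qos-demo -o jsonpath='{.status.qosClass}'.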
Several Kubernetes features depend on or interact with requests and limits:
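One such interaction, shown here as a hedged sketch, is the HorizontalPodAutoscaler. A CPU utilization target is calculated as a percentage of the containers' CPU requests, so the request value directly changes when scaling happens. The replica bounds and the 70 percent target are assumptions chosen for illustration; the Deployment name api matches the earlier example.

```yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: api
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: api
  minReplicas: 3   # illustrative bounds
  maxReplicas: 10
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70  # percent of the CPU request, not of the limit
```

With a 250m CPU request, a 70 percent target corresponds to an average of about 175m of CPU per Pod before the autoscaler adds replicas.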
If you manage Kubernetes resources with infrastructure as code, you can define these settings in Helm charts, Kustomize overlays, or Terraform-managed manifests. For example, teams that manage workloads through Terraform can follow similar patterns when they deploy Kubernetes resources using Terraform.
Suppose you run a Node.js API on Kubernetes. During normal traffic, each Pod uses about 150m CPU and 300Mi memory. During deploys or traffic spikes, it can briefly use 400m CPU and 700Mi memory.
A practical starting configuration might be:
CPU request: 200m
CPU limit: 500m, or no CPU limit if latency is sensitive and cluster policy allows it
Memory request: 384Mi
Memory limit: 768Mi or 1Gi
You would then watch real usage in Prometheus, Grafana, Datadog, CloudWatch, or another monitoring tool and adjust based on observed p95 or p99 behavior. Guessing once and never revisiting the values often leads to wasted capacity or unstable Pods.
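Reading the starting values as a 200m CPU request, a 500m CPU limit, a 384Mi memory request, and a 768Mi memory limit, the container's resources block might look like the sketch below. The comments tie each value back to the usage figures quoted for the hypothetical Node.js API.

```yaml
# Fragment of the container spec for the hypothetical Node.js API.
resources:
  requests:
    cpu: "200m"      # a little above the ~150m seen during normal traffic
    memory: "384Mi"  # a little above the ~300Mi seen during normal traffic
  limits:
    cpu: "500m"      # above the ~400m spike; omit if latency is sensitive and policy allows
    memory: "768Mi"  # above the ~700Mi spike; 1Gi leaves more headroom
```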
Treat OOMKilled events, CPU throttling, restart counts, and pending Pods as operational signals.
Resource planning also matters during cluster upgrades. If Nodes are drained and workloads need to move, accurate requests make rescheduling more predictable. Teams planning upgrades can pair this with practical Kubernetes upgrade checks to reduce disruption.
Managed Kubernetes services such as Amazon Elastic Kubernetes Service, Google Kubernetes Engine, and Azure Kubernetes Service still rely on the same Kubernetes scheduling behavior. Cloud provider autoscaling and instance selection can make requests even more important because requested resources influence how efficiently Nodes are used.
For example, when running workloads on Amazon EKS, you may tune requests and limits alongside Node group sizes, instance types, and autoscaling policies. This applies to simple web services as well as heavier workloads such as Apache Airflow on AWS EKS.
Requests and limits are workload-level Kubernetes settings. They do not create cloud infrastructure by themselves. If you use Crossplane to provision cloud resources from Kubernetes, your application Pods still need proper resource settings.
For example, a team may use Crossplane to create an AWS database and then deploy an application that connects to it. The Crossplane resources manage cloud infrastructure, while the application's Deployment manages container CPU and memory. You can see this separation in workflows that deploy AWS resources using Crossplane on Kubernetes.
Kubernetes resource requests and limits define the expected and maximum CPU and memory for containers. Requests guide Pod scheduling and autoscaling calculations. Limits enforce runtime caps to protect Nodes and other workloads. Good values come from real usage data, load testing, and regular review.