How to Use Kubernetes Taints and Tolerations Without Stranding Pods

Use taints and tolerations safely to control scheduling without stranding workloads.

Arthur Azrieli

Kubernetes taints and tolerations are simple until a production rollout leaves pods stuck in Pending. The pressure usually comes from a real need: isolate noisy workloads, reserve expensive nodes, protect platform nodes, or keep specialized hardware available. The risk is that one missing toleration, one broad taint, or one mismatched node selector can strand workloads with no eligible place to run.

Use taints as a scheduling guardrail, not as the only scheduling rule. A taint repels pods. A toleration lets a pod ignore that repulsion. It does not force the pod onto that node. To place workloads safely, pair taints with labels, node affinity, resource requests, and rollout checks.

Understand what taints and tolerations actually do

A taint lives on a node. A toleration lives on a pod. The scheduler checks both when deciding whether a pod can run on a node.

A taint has three important parts:

  • key: the taint name, such as dedicated or workload
  • value: the optional taint value, such as batch or gpu
  • effect: what Kubernetes does when a pod does not tolerate the taint

Kubernetes supports these taint effects:

  • NoSchedule: new pods that do not tolerate the taint will not be scheduled onto the node. Existing pods stay where they are.
  • PreferNoSchedule: Kubernetes tries to avoid placing non-tolerating pods on the node, but it may still schedule them there if needed.
  • NoExecute: new pods that do not tolerate the taint will not be scheduled onto the node, and pods already running there that do not tolerate it are evicted.

That last detail matters. NoExecute can move running workloads, so treat it as an eviction control, not just a scheduling control.

Here is a basic taint on a node:

kubectl taint nodes worker-1 dedicated=batch:NoSchedule

Here is the matching toleration on a pod template:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: batch-worker
spec:
  replicas: 3
  selector:
    matchLabels:
      app: batch-worker
  template:
    metadata:
      labels:
        app: batch-worker
    spec:
      tolerations:
        - key: "dedicated"
          operator: "Equal"
          value: "batch"
          effect: "NoSchedule"
      containers:
        - name: worker
          image: example.com/batch-worker:1.0.0
          resources:
            requests:
              cpu: "500m"
              memory: "512Mi"

This pod can now run on nodes tainted with dedicated=batch:NoSchedule. It can also run on untainted nodes unless you add a placement rule. That is one of the most common surprises.
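
The matching rule behind this can be sketched in a few lines of Python. This is a simplified model for illustration, not the scheduler's code: taints and tolerations are plain dicts shaped like the Kubernetes API fields, the function name toleration_matches is hypothetical, and only the Equal and Exists operators are covered.

```python
def toleration_matches(toleration, taint):
    """Return True if the toleration tolerates the taint (simplified model)."""
    # A toleration with no effect matches every effect.
    effect = toleration.get("effect", "")
    if effect and effect != taint["effect"]:
        return False
    # A toleration with no key matches every key (only valid with Exists).
    key = toleration.get("key", "")
    if key and key != taint["key"]:
        return False
    operator = toleration.get("operator", "Equal")  # Equal is the default
    if operator == "Exists":
        return True  # the value is ignored
    if operator == "Equal":
        return toleration.get("value", "") == taint.get("value", "")
    return False  # operators outside this sketch never match


taint = {"key": "dedicated", "value": "batch", "effect": "NoSchedule"}
exact = {"key": "dedicated", "operator": "Equal", "value": "batch", "effect": "NoSchedule"}
wrong_value = {"key": "dedicated", "operator": "Equal", "value": "jobs", "effect": "NoSchedule"}
wrong_effect = {"key": "dedicated", "operator": "Equal", "value": "batch", "effect": "NoExecute"}

print(toleration_matches(exact, taint))         # True
print(toleration_matches(wrong_value, taint))   # False
print(toleration_matches(wrong_effect, taint))  # False
```

Nothing in this rule mentions untainted nodes. A node with no taints has nothing to tolerate, so every pod passes the taint check there.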

Use labels and affinity to avoid accidental placement

A toleration is permission. It is not a preference and it is not a requirement. If you want a workload to run on a specific node pool, add a node label and a matching selector or affinity rule.

For example, label the nodes that should run batch workloads:

kubectl label nodes worker-1 workload=batch
kubectl label nodes worker-2 workload=batch

Then taint those nodes:

kubectl taint nodes worker-1 dedicated=batch:NoSchedule
kubectl taint nodes worker-2 dedicated=batch:NoSchedule

Now add both the toleration and node affinity to the workload:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: batch-worker
spec:
  replicas: 3
  selector:
    matchLabels:
      app: batch-worker
  template:
    metadata:
      labels:
        app: batch-worker
    spec:
      tolerations:
        - key: "dedicated"
          operator: "Equal"
          value: "batch"
          effect: "NoSchedule"
      affinity:
        nodeAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
            nodeSelectorTerms:
              - matchExpressions:
                  - key: workload
                    operator: In
                    values:
                      - batch
      containers:
        - name: worker
          image: example.com/batch-worker:1.0.0
          resources:
            requests:
              cpu: "500m"
              memory: "512Mi"

This creates a cleaner scheduling contract:

  • The taint keeps unrelated workloads away from the batch nodes.
  • The toleration allows the batch workload to use those nodes.
  • The node affinity tells Kubernetes that the batch workload must run there.
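
To see how the three rules combine, here is a small Python sketch under stated assumptions. It models only hard taints and a single required label match, which is far less than the real scheduler evaluates. The helper names and the untainted node worker-3 are hypothetical.

```python
def tolerates(tol, taint):
    """Simplified toleration match covering the Equal and Exists operators."""
    if tol.get("effect") and tol["effect"] != taint["effect"]:
        return False
    if tol.get("key") and tol["key"] != taint["key"]:
        return False
    if tol.get("operator", "Equal") == "Exists":
        return True
    return tol.get("value", "") == taint.get("value", "")


def eligible_nodes(nodes, tolerations, required_labels=None):
    """Names of nodes that pass the taint check and the required label match."""
    result = []
    for node in nodes:
        # Only NoSchedule and NoExecute block scheduling outright.
        hard = [t for t in node["taints"] if t["effect"] in ("NoSchedule", "NoExecute")]
        if any(not any(tolerates(tol, t) for tol in tolerations) for t in hard):
            continue
        # Stand-in for required node affinity: every listed label must match.
        labels = node["labels"]
        if required_labels and any(labels.get(k) != v for k, v in required_labels.items()):
            continue
        result.append(node["name"])
    return result


batch_taint = {"key": "dedicated", "value": "batch", "effect": "NoSchedule"}
nodes = [
    {"name": "worker-1", "labels": {"workload": "batch"}, "taints": [batch_taint]},
    {"name": "worker-2", "labels": {"workload": "batch"}, "taints": [batch_taint]},
    {"name": "worker-3", "labels": {}, "taints": []},  # hypothetical general-purpose node
]
batch_toleration = [
    {"key": "dedicated", "operator": "Equal", "value": "batch", "effect": "NoSchedule"}
]

print(eligible_nodes(nodes, []))                                       # ['worker-3']
print(eligible_nodes(nodes, batch_toleration))                         # ['worker-1', 'worker-2', 'worker-3']
print(eligible_nodes(nodes, batch_toleration, {"workload": "batch"}))  # ['worker-1', 'worker-2']
```

The middle result is the surprise to watch for: with a toleration alone, the batch workload can still land on worker-3.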

If you manage Kubernetes manifests with infrastructure as code, keep taints, labels, and workload tolerations in the same review path. That reduces drift between node configuration and workload scheduling rules. For example, if you already deploy Kubernetes resources using Terraform, treat scheduling rules as part of the workload contract rather than a one-off cluster change.

Roll out taints safely in small steps

Do not taint a full node pool first. Start with visibility, then test on one node or one small node group.

  1. List current taints and labels. Know what is already in place before adding more rules.
  2. Find workloads already running on candidate nodes. A NoExecute taint can evict them. A NoSchedule taint will not, but future rescheduling may fail.
  3. Add tolerations to target workloads first. Roll out pod specs before you taint nodes.
  4. Add labels and affinity where you need required placement. Do not rely on tolerations alone.
  5. Taint one node or a small pool. Watch scheduler events and pending pods.
  6. Expand only after you verify the desired workloads land correctly.
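
Step 2 can be approximated offline before you touch a node. The following Python sketch assumes you have already collected the pods on a candidate node as plain dicts, for example from kubectl output converted to JSON. The function name and the pod web-1 are hypothetical, and the matching logic is a simplified model.

```python
def tolerates(tol, taint):
    """Simplified toleration match covering the Equal and Exists operators."""
    if tol.get("effect") and tol["effect"] != taint["effect"]:
        return False
    if tol.get("key") and tol["key"] != taint["key"]:
        return False
    if tol.get("operator", "Equal") == "Exists":
        return True
    return tol.get("value", "") == taint.get("value", "")


def split_by_planned_taint(pods, planned_taint):
    """Return (tolerating, non_tolerating) pod names for a taint you plan to add."""
    tolerating, non_tolerating = [], []
    for pod in pods:
        ok = any(tolerates(tol, planned_taint) for tol in pod.get("tolerations", []))
        (tolerating if ok else non_tolerating).append(pod["name"])
    return tolerating, non_tolerating


pods_on_worker_1 = [
    {
        "name": "batch-worker-abc123",
        "tolerations": [
            {"key": "dedicated", "operator": "Equal", "value": "batch", "effect": "NoSchedule"}
        ],
    },
    {"name": "web-1", "tolerations": []},  # hypothetical unrelated pod
]

no_schedule = {"key": "dedicated", "value": "batch", "effect": "NoSchedule"}
no_execute = {"key": "dedicated", "value": "batch", "effect": "NoExecute"}

print(split_by_planned_taint(pods_on_worker_1, no_schedule))
# (['batch-worker-abc123'], ['web-1'])
print(split_by_planned_taint(pods_on_worker_1, no_execute))
# ([], ['batch-worker-abc123', 'web-1'])
```

For a NoSchedule taint, the second list names pods that keep running but cannot return to the node after a restart. For a NoExecute taint, it names pods that would be evicted. The second call shows that a NoSchedule toleration does not cover a NoExecute taint.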

Useful inspection commands:

kubectl get nodes -o custom-columns=NAME:.metadata.name,TAINTS:.spec.taints

kubectl describe node worker-1

kubectl get pods -A --field-selector=status.phase=Pending

kubectl describe pod -n default batch-worker-abc123

When a pod is stranded, kubectl describe pod usually tells you why. Look at the Events section for scheduler messages such as:

  • had untolerated taint
  • didn't match Pod's node affinity/selector
  • Insufficient cpu
  • Insufficient memory

Those messages point to different fixes. If the pod has an untolerated taint error, add or correct the toleration. If it has a node affinity mismatch, check labels and selectors. If it has insufficient resources, the toleration is not the problem.
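
If you triage many pending pods, the mapping from message to fix can be encoded as a lookup. This is a minimal sketch: the substrings come from the list above, the sample event text is illustrative, and exact scheduler wording can differ between Kubernetes versions.

```python
# Ordered (substring, suggested fix) pairs taken from the messages listed above.
FIXES = [
    ("untolerated taint", "add or correct the toleration"),
    ("didn't match Pod's node affinity/selector", "check node labels and selectors"),
    ("Insufficient", "check resource requests and node capacity, not the toleration"),
]


def suggest_fixes(event_message):
    """Return the suggested fixes whose substring appears in a scheduler event message."""
    return [fix for needle, fix in FIXES if needle in event_message]


# Illustrative event text; real messages vary by cluster and version.
message = (
    "0/3 nodes are available: 1 Insufficient cpu, "
    "2 node(s) had untolerated taint {dedicated: batch}."
)
for fix in suggest_fixes(message):
    print(fix)
```

A single event often lists several reasons, one per group of nodes, so the function returns every match instead of stopping at the first.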

To remove a taint, append a trailing minus sign:

kubectl taint nodes worker-1 dedicated=batch:NoSchedule-

To apply a taint across labeled nodes:

kubectl taint nodes -l workload=batch dedicated=batch:NoSchedule

Be careful with bulk commands. Confirm the label selector first:

kubectl get nodes -l workload=batch

Choose the right taint effect for the job

The taint effect should match the operational outcome you want.

Use PreferNoSchedule when you want a soft boundary

PreferNoSchedule is useful when separation is helpful but not mandatory. For example, you may prefer to keep general workloads off nodes intended for bursty background jobs, but still allow Kubernetes to use their spare capacity during pressure.

kubectl taint nodes worker-1 workload=batch:PreferNoSchedule

This is a safer starting point when you are learning how workloads behave. It gives the scheduler room to place pods if the cluster has limited options.
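
The soft behavior can be pictured as a ranking instead of a filter. The Python sketch below is a deliberately simplified model: it orders nodes by how many PreferNoSchedule taints a pod does not tolerate. The real scheduler combines this with many other scoring signals, and the function names and worker-2 are hypothetical.

```python
def tolerates(tol, taint):
    """Simplified toleration match covering the Equal and Exists operators."""
    if tol.get("effect") and tol["effect"] != taint["effect"]:
        return False
    if tol.get("key") and tol["key"] != taint["key"]:
        return False
    if tol.get("operator", "Equal") == "Exists":
        return True
    return tol.get("value", "") == taint.get("value", "")


def soft_penalty(node_taints, tolerations):
    """Count PreferNoSchedule taints on a node that the pod does not tolerate."""
    return sum(
        1
        for taint in node_taints
        if taint["effect"] == "PreferNoSchedule"
        and not any(tolerates(tol, taint) for tol in tolerations)
    )


def rank_nodes(taints_by_node, tolerations):
    """Order node names from most to least preferred; no node is excluded."""
    return sorted(taints_by_node, key=lambda name: soft_penalty(taints_by_node[name], tolerations))


taints_by_node = {
    "worker-1": [{"key": "workload", "value": "batch", "effect": "PreferNoSchedule"}],
    "worker-2": [],  # hypothetical untainted node
}

print(rank_nodes(taints_by_node, []))  # ['worker-2', 'worker-1']
```

worker-1 stays in the list. It is simply ranked last for pods that do not tolerate the taint, which is why a full cluster can still place them there.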

Use NoSchedule for hard admission control

NoSchedule is the common choice for dedicated node pools. Use it when unrelated pods should not land on a node class, such as nodes reserved for specific workload types.

kubectl taint nodes worker-1 dedicated=batch:NoSchedule

This will not evict existing pods. That makes it safer than NoExecute for most first rollouts.

Use NoExecute only when eviction is intended

NoExecute affects running pods. Pods already on the node that do not tolerate the taint are evicted.

kubectl taint nodes worker-1 maintenance=true:NoExecute

You can allow a pod to remain temporarily with tolerationSeconds:

apiVersion: v1
kind: Pod
metadata:
  name: temporary-worker
spec:
  tolerations:
    - key: "maintenance"
      operator: "Equal"
      value: "true"
      effect: "NoExecute"
      tolerationSeconds: 300
  containers:
    - name: worker
      image: example.com/worker:1.0.0

In this example, the pod stays bound to the tainted node for 300 seconds after the taint is added and is then evicted. If the taint is removed before that time, the pod is not evicted. Use this carefully with stateful workloads, long-running jobs, and anything that needs graceful shutdown time.
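
The timing is simple arithmetic, sketched here in Python. The function name is hypothetical and the timestamp is made up for illustration. The sketch assumes the clock starts when the taint is added and that a missing tolerationSeconds means the pod tolerates the taint indefinitely.

```python
from datetime import datetime, timedelta, timezone


def eviction_deadline(taint_added_at, toleration_seconds):
    """Return when a tolerating pod is evicted, or None if it may stay indefinitely."""
    if toleration_seconds is None:
        return None  # no tolerationSeconds: the pod tolerates the taint forever
    # Zero and negative values are treated as 0, which means immediate eviction.
    return taint_added_at + timedelta(seconds=max(toleration_seconds, 0))


added = datetime(2024, 1, 1, 12, 0, 0, tzinfo=timezone.utc)  # illustrative timestamp

print(eviction_deadline(added, 300).isoformat())  # 2024-01-01T12:05:00+00:00
print(eviction_deadline(added, None))             # None
```

Compare the deadline with the time your workload needs to drain. If a job needs ten minutes to checkpoint, 300 seconds is too short.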

Watch for failure modes that strand pods

Most taint and toleration incidents come from small mismatches. These are the ones to check first.

  • The toleration key or value does not match. dedicated=batch and dedicated=jobs are different. Kubernetes will not infer intent.
  • The effect is missing or wrong. A toleration for NoSchedule does not automatically tolerate NoExecute.
  • The toleration is too broad. operator: Exists with a key tolerates every taint with that key, whatever its value. With no key at all, it tolerates every taint. Either form may let workloads run on nodes they should avoid.
  • The pod tolerates the taint but has no placement rule. It may still run on untainted nodes unless you add node affinity or a node selector.
  • The pod has required node affinity that matches no nodes. A correct toleration will not help if no node has the expected label.
  • The target nodes lack resources. The scheduler still checks CPU, memory, pod count, volume constraints, and other requirements.
  • DaemonSets are missing tolerations. Node agents, log collectors, and monitoring pods often need tolerations so they can run on every intended node.
  • Autoscaling is not configured for tainted pools. If pending pods require a tainted node pool, confirm your autoscaler can provision nodes with the matching taints and labels.
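  • Broad tolerations override control plane and platform taints. A blanket toleration lets application pods land on nodes reserved for the control plane or platform services. Avoid blanket tolerations unless you have a clear reason.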

Here is an example of a broad toleration that you should avoid unless you really mean it:

tolerations:
  - operator: "Exists"

That tolerates all taints. It can defeat node isolation and place pods on nodes intended for other purposes.

A safer version names the exact taint:

tolerations:
  - key: "dedicated"
    operator: "Equal"
    value: "batch"
    effect: "NoSchedule"
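
A review step can flag broad tolerations automatically. The following Python sketch is one possible check, not an established tool. The function name and warning text are hypothetical, and it inspects a tolerations list that you have already parsed from a manifest into dicts.

```python
def broad_toleration_warnings(tolerations):
    """Return human-readable warnings for tolerations that match more than one exact taint."""
    warnings = []
    for index, tol in enumerate(tolerations):
        key = tol.get("key", "")
        operator = tol.get("operator", "Equal")
        if operator == "Exists" and not key:
            warnings.append(f"tolerations[{index}]: Exists with no key tolerates every taint")
            continue
        if operator == "Exists":
            warnings.append(f"tolerations[{index}]: Exists tolerates any value of key '{key}'")
        if not tol.get("effect"):
            warnings.append(f"tolerations[{index}]: no effect tolerates every effect of key '{key}'")
    return warnings


blanket = [{"operator": "Exists"}]
exact = [{"key": "dedicated", "operator": "Equal", "value": "batch", "effect": "NoSchedule"}]

print(broad_toleration_warnings(blanket))  # one warning
print(broad_toleration_warnings(exact))    # []
```

A warning is a prompt for a written justification, not an automatic rejection. Node agents deployed as DaemonSets often need broad tolerations on purpose.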

For platform workloads such as schedulers, controllers, log agents, and monitoring agents, document why each toleration exists. If your team owns both application delivery and cluster operations, make that ownership explicit. A clear responsibility model matters as much as the YAML. If that ownership is still forming, this guide on how to build a DevOps team gives a useful way to think about operational boundaries.

Use a repeatable checklist before merging changes

Before you merge a change that adds or modifies taints and tolerations, run through this checklist.

  1. State the intent. Are you isolating workloads, reserving capacity, protecting nodes, or preparing maintenance?
  2. Pick the weakest effective taint. Start with PreferNoSchedule if soft separation is enough. Use NoSchedule for hard admission control. Use NoExecute only when eviction is expected.
  3. Add exact tolerations. Prefer explicit key, value, and effect.
  4. Add placement rules when needed. Use node affinity or node selectors so tolerated pods land where you expect.
  5. Check DaemonSets. Confirm node-level agents still run on tainted nodes.
  6. Check PodDisruptionBudgets. PodDisruptionBudgets, or PDBs, do not control initial scheduling, but they affect voluntary evictions and maintenance workflows.
  7. Check resource requests. A pod with valid tolerations can still stay pending if nodes lack requested CPU or memory.
  8. Apply in a small scope first. Use one node, one node pool, or one environment before a wider rollout.
  9. Read scheduler events. Do not guess. Use pod events to confirm the exact scheduling blocker.
  10. Document the contract. Record which workloads may tolerate which taints and why.
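
Parts of this checklist can run as an automated pre-merge check. The Python sketch below covers only steps 4 and 7, and it assumes the pod spec has already been loaded from YAML into a dict. The function name and finding messages are hypothetical.

```python
def checklist_findings(pod_spec):
    """Return findings for checklist steps 4 (placement rules) and 7 (resource requests)."""
    findings = []

    # Step 4: a toleration without a placement rule also admits untainted nodes.
    has_placement = bool(pod_spec.get("nodeSelector")) or bool(
        pod_spec.get("affinity", {}).get("nodeAffinity")
    )
    if pod_spec.get("tolerations") and not has_placement:
        findings.append("tolerations set without nodeSelector or nodeAffinity")

    # Step 7: missing requests hide capacity problems until pods go Pending.
    for container in pod_spec.get("containers", []):
        requests = container.get("resources", {}).get("requests", {})
        for resource in ("cpu", "memory"):
            if resource not in requests:
                findings.append(f"container '{container['name']}' has no {resource} request")
    return findings


# Pod spec from the first batch-worker example: toleration, requests, no affinity.
pod_spec = {
    "tolerations": [
        {"key": "dedicated", "operator": "Equal", "value": "batch", "effect": "NoSchedule"}
    ],
    "containers": [
        {
            "name": "worker",
            "image": "example.com/batch-worker:1.0.0",
            "resources": {"requests": {"cpu": "500m", "memory": "512Mi"}},
        }
    ],
}

print(checklist_findings(pod_spec))
# ['tolerations set without nodeSelector or nodeAffinity']
```

The remaining steps, such as stating intent and reading scheduler events, still need a human reviewer.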

If your cluster resources are managed through Kubernetes-native control planes, keep the scheduling contract near the resource definitions. For example, teams that deploy AWS resources using Crossplane on Kubernetes often benefit from keeping workload, infrastructure, and operational rules reviewable through the same Git workflow. The same principle applies if you are running more complex workloads such as Apache Airflow on AWS Elastic Kubernetes Service, where schedulers, workers, and supporting services may have different placement needs.

Taints and tolerations work best when they are precise and boring. Use taints to repel the wrong pods, use tolerations to admit the right pods, and use labels plus affinity to control placement. Roll out one node pool at a time, inspect scheduler events, and avoid broad tolerations unless you can explain exactly why they are safe.