MeteorOps | DevOps Dictionary | Kubernetes Taints and Tolerations

Kubernetes Taints and Tolerations

Kubernetes Taints and Tolerations are scheduling controls that let you keep certain pods off certain nodes unless those pods explicitly accept the node’s taint. A taint marks a node with a restriction. A toleration is added to a pod so Kubernetes may schedule that pod onto a matching tainted node.

What Kubernetes taints and tolerations do

Taints and tolerations help you control where workloads can run in a Kubernetes cluster. They are useful when some nodes should be reserved, isolated, or treated differently from the rest of the node pool.

For example, you might taint GPU nodes so regular web services do not land on expensive GPU capacity. Only machine learning pods with the matching toleration should be allowed there.

How they work

A taint is applied to a node. It has three main parts:

Key: the taint name, such as workload or dedicated.
Value: an optional value, such as gpu or airflow.
Effect: the scheduling behavior Kubernetes should apply.

A common taint looks like this:

dedicated=gpu:NoSchedule
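Splitting that string apart shows the three parts directly. The Python sketch below is illustrative only. Taint and parse_taint are hypothetical names, not part of kubectl or any Kubernetes client library. The parser accepts both key=value:Effect and the value-less key:Effect form that kubectl taint also accepts, and it skips the key-format validation Kubernetes performs.

```python
from dataclasses import dataclass

# The three effects described in this entry.
VALID_EFFECTS = {"NoSchedule", "PreferNoSchedule", "NoExecute"}


@dataclass(frozen=True)
class Taint:
    key: str
    value: str  # empty string when the taint has no value
    effect: str


def parse_taint(spec: str) -> Taint:
    """Split a 'key=value:Effect' or 'key:Effect' string into its three parts."""
    # The effect is whatever follows the last colon.
    key_value, sep, effect = spec.rpartition(":")
    if not sep or effect not in VALID_EFFECTS:
        raise ValueError(f"missing or unknown effect in {spec!r}")
    # The value is optional, so 'dedicated:NoSchedule' yields an empty value.
    key, _, value = key_value.partition("=")
    if not key:
        raise ValueError(f"missing key in {spec!r}")
    return Taint(key=key, value=value, effect=effect)


print(parse_taint("dedicated=gpu:NoSchedule"))
# Taint(key='dedicated', value='gpu', effect='NoSchedule')
```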

A pod must define a matching toleration to be considered for scheduling on that node:

tolerations:
  - key: "dedicated"
    operator: "Equal"
    value: "gpu"
    effect: "NoSchedule"

The important detail: a toleration allows a pod to run on a tainted node, but it does not force the pod to run there. If you need to attract pods to specific nodes, combine tolerations with node selectors or node affinity.
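Both the matching rule and the "allows but does not force" point fit in a short Python sketch. It is a simplified model of the documented rules, not the scheduler's actual code: key and effect must agree, Equal compares values, Exists ignores the value, an empty effect matches any effect, and an empty key with Exists matches every taint. The node names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Taint:
    key: str
    value: str
    effect: str


@dataclass(frozen=True)
class Toleration:
    key: str = ""
    operator: str = "Equal"  # "Equal" or "Exists"; Kubernetes defaults to Equal
    value: str = ""
    effect: str = ""  # empty means the toleration matches every effect


def tolerates(tol: Toleration, taint: Taint) -> bool:
    """Return True if one toleration matches one taint."""
    if tol.effect and tol.effect != taint.effect:
        return False
    # An empty key (allowed only with Exists) matches every key.
    if tol.key and tol.key != taint.key:
        return False
    if tol.operator == "Exists":
        return True  # Exists ignores the value
    return tol.operator == "Equal" and tol.value == taint.value


def can_schedule(tolerations: list[Toleration], node_taints: list[Taint]) -> bool:
    """A node is allowed when every NoSchedule or NoExecute taint is tolerated."""
    # PreferNoSchedule is only a soft preference, so it never blocks a node here.
    hard = [t for t in node_taints if t.effect in ("NoSchedule", "NoExecute")]
    return all(any(tolerates(tol, t) for tol in tolerations) for t in hard)


gpu_toleration = Toleration(key="dedicated", value="gpu", effect="NoSchedule")
nodes = {
    "gpu-node": [Taint("dedicated", "gpu", "NoSchedule")],
    "general-node": [],
}
allowed = [name for name, taints in nodes.items() if can_schedule([gpu_toleration], taints)]
print(allowed)  # ['gpu-node', 'general-node']
```

Both nodes come back as candidates. The toleration only lifts the GPU node's restriction, so the scheduler is still free to pick the general node.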

Taint effects

Kubernetes supports three taint effects:

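NoSchedule: Kubernetes will not schedule new pods onto the node unless they have a matching toleration. Pods already running on the node keep running.
PreferNoSchedule: Kubernetes will try to avoid scheduling pods without a matching toleration onto the node, but it may still place them there, for example when other nodes lack capacity.
NoExecute: Kubernetes will evict already-running pods that do not tolerate the taint, and it will block new pods without a matching toleration. A pod that does tolerate the taint can set tolerationSeconds to limit how long it stays on the node before being evicted.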

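NoSchedule is the most common effect for dedicated node pools. NoExecute is stronger and is often used for node health conditions or temporary isolation. Kubernetes itself adds NoExecute taints such as node.kubernetes.io/not-ready and node.kubernetes.io/unreachable when a node becomes unhealthy.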
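The effects differ most for pods that are already running when a taint is added. The Python sketch below models that case. It uses the real tolerationSeconds field, but the maintenance=true taint is a hypothetical example, and matching is simplified to an exact comparison of key, value, and effect.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Taint:
    key: str
    value: str
    effect: str


@dataclass(frozen=True)
class Toleration:
    key: str
    value: str
    effect: str
    toleration_seconds: Optional[int] = None  # models tolerations[].tolerationSeconds


def matches(tol: Toleration, taint: Taint) -> bool:
    # Simplified Equal-operator match: key, value, and effect must all agree.
    return (tol.key, tol.value, tol.effect) == (taint.key, taint.value, taint.effect)


def running_pod_outcome(tolerations: list[Toleration], new_taint: Taint) -> str:
    """What happens to an already-running pod when a taint is added to its node."""
    if new_taint.effect in ("NoSchedule", "PreferNoSchedule"):
        return "keeps running"  # these effects only affect new scheduling decisions
    tol = next((t for t in tolerations if matches(t, new_taint)), None)
    if tol is None:
        return "evicted now"  # NoExecute without a matching toleration
    if tol.toleration_seconds is None:
        return "keeps running"  # tolerated with no time limit
    return f"evicted after {tol.toleration_seconds}s"


maintenance = Taint("maintenance", "true", "NoExecute")  # hypothetical taint
print(running_pod_outcome([], maintenance))  # evicted now
print(running_pod_outcome([Toleration("maintenance", "true", "NoExecute", 300)], maintenance))
# evicted after 300s
print(running_pod_outcome([], Taint("maintenance", "true", "NoSchedule")))  # keeps running
```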

Common use cases

Dedicated node pools: Reserve nodes for workloads such as databases, CI runners, Apache Airflow workers, or GPU jobs. For example, teams running workflow platforms may use dedicated nodes when they deploy Apache Airflow on AWS EKS.
Special hardware: Keep normal pods off nodes with GPUs, high-memory instances, local NVMe disks, or licensed software.
Workload isolation: Separate noisy, sensitive, or high-priority workloads from general application workloads.
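Control plane protection: Keep application pods off control plane nodes in self-managed clusters, where those nodes appear alongside worker nodes. kubeadm, for example, taints control plane nodes with node-role.kubernetes.io/control-plane:NoSchedule by default.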
Maintenance and upgrades: Temporarily stop new pods from landing on nodes during planned work. This often appears alongside cordon, drain, and rollout planning during Kubernetes upgrades for startups.
Platform controllers: Keep controllers and infrastructure automation workloads on stable nodes, such as when running tools that manage cloud resources through Kubernetes, including setups that deploy AWS resources using Crossplane on Kubernetes.

Simple real-world example

Assume you have a node pool for CI jobs. These jobs are CPU-heavy and can disrupt production services if they share the same nodes.

You can taint the CI nodes:

kubectl taint nodes ci-node-1 dedicated=ci:NoSchedule

Then add a matching toleration to the CI runner pods:

tolerations:
  - key: "dedicated"
    operator: "Equal"
    value: "ci"
    effect: "NoSchedule"

To make sure CI pods actually land on those nodes, label the nodes (for example, kubectl label nodes ci-node-1 workload=ci) and add a node selector or node affinity to the CI pods, such as:

nodeSelector:
  workload: ci

In that setup, the taint keeps unrelated pods away. The toleration lets CI pods run there. The node selector directs CI pods to the intended node pool.
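The three pieces of this setup can be modeled together in a short Python sketch. It assumes ci-node-1 carries the label workload=ci, adds a hypothetical untainted web-node-1 for contrast, and supports only exact-match (Equal) tolerations. Real scheduling applies many more filters than these two.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    name: str
    labels: dict[str, str]
    taints: list[tuple[str, str, str]]  # (key, value, effect)


@dataclass
class Pod:
    name: str
    tolerations: list[tuple[str, str, str]] = field(default_factory=list)  # Equal only
    node_selector: dict[str, str] = field(default_factory=dict)


def feasible_nodes(pod: Pod, nodes: list[Node]) -> list[str]:
    result = []
    for node in nodes:
        # Taint filter: every hard taint must appear in the pod's tolerations.
        hard = [t for t in node.taints if t[2] in ("NoSchedule", "NoExecute")]
        if not all(t in pod.tolerations for t in hard):
            continue
        # Node selector filter: every selector pair must match a node label.
        if any(node.labels.get(k) != v for k, v in pod.node_selector.items()):
            continue
        result.append(node.name)
    return result


nodes = [
    Node("ci-node-1", {"workload": "ci"}, [("dedicated", "ci", "NoSchedule")]),
    Node("web-node-1", {"workload": "web"}, []),
]
ci_toleration = ("dedicated", "ci", "NoSchedule")

print(feasible_nodes(Pod("web"), nodes))  # ['web-node-1']
print(feasible_nodes(Pod("ci", [ci_toleration]), nodes))  # ['ci-node-1', 'web-node-1']
print(feasible_nodes(Pod("ci", [ci_toleration], {"workload": "ci"}), nodes))  # ['ci-node-1']
```

The taint alone keeps the web pod off the CI node, the toleration alone still leaves the CI pod free to use web-node-1, and only the combination pins the CI pod to the CI pool.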

Taints vs tolerations vs node affinity

These features are related, but they solve different scheduling problems:

Taints: repel pods from a node.
Tolerations: let specific pods ignore matching taints.
Node selectors: require pods to run on nodes with specific labels.
Node affinity: provides more flexible label-based rules, including required and preferred rules and operators such as In, NotIn, and Exists.

A practical pattern is to use taints and tolerations for exclusion, then use node affinity for placement. For example, taint a database node pool with dedicated=db:NoSchedule, add a matching toleration to database pods, and add node affinity so those pods target nodes labeled workload=db.
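This exclusion-plus-placement pattern can be sketched in Python by evaluating a required node affinity rule alongside the taint check. The dictionary mirrors the shape of requiredDuringSchedulingIgnoredDuringExecution, where nodeSelectorTerms are ORed and the matchExpressions inside one term are ANDed. Only the In, NotIn, Exists, and DoesNotExist operators are modeled, tolerations are exact-match only, and the node names are hypothetical.

```python
def expression_matches(expr: dict, labels: dict[str, str]) -> bool:
    """Evaluate one matchExpressions entry against a node's labels."""
    key, op, values = expr["key"], expr["operator"], expr.get("values", [])
    if op == "In":
        return labels.get(key) in values
    if op == "NotIn":
        return labels.get(key) not in values  # a missing label also satisfies NotIn
    if op == "Exists":
        return key in labels
    if op == "DoesNotExist":
        return key not in labels
    raise ValueError(f"operator {op!r} is not handled in this sketch")


def affinity_allows(required: dict, labels: dict[str, str]) -> bool:
    """Terms are ORed; expressions inside one term are ANDed."""
    return any(
        bool(term.get("matchExpressions"))  # an empty term matches no nodes
        and all(expression_matches(e, labels) for e in term["matchExpressions"])
        for term in required["nodeSelectorTerms"]
    )


def placeable(labels, taints, tolerations, required) -> bool:
    # Exclusion: every hard taint must be tolerated (exact match only).
    hard = [t for t in taints if t[2] in ("NoSchedule", "NoExecute")]
    # Placement: the node must also satisfy the required affinity rule.
    return all(t in tolerations for t in hard) and affinity_allows(required, labels)


db_required = {
    "nodeSelectorTerms": [
        {"matchExpressions": [{"key": "workload", "operator": "In", "values": ["db"]}]}
    ]
}
db_tolerations = [("dedicated", "db", "NoSchedule")]
nodes = {
    "db-node-1": ({"workload": "db"}, [("dedicated", "db", "NoSchedule")]),
    "app-node-1": ({"workload": "app"}, []),
}
db_nodes = [n for n, (labels, taints) in nodes.items()
            if placeable(labels, taints, db_tolerations, db_required)]
print(db_nodes)  # ['db-node-1']
```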

Key limitations and tradeoffs

Tolerations do not guarantee placement: A pod with a toleration can run on a tainted node, but Kubernetes may still place it elsewhere unless you add placement rules.
Too many taints can make scheduling harder: If every node pool has unique taints, pods need precise tolerations or they may remain pending.
Misconfigured tolerations can weaken isolation: A broad toleration using operator: Exists may allow pods onto nodes you meant to reserve.
NoExecute can evict running pods: Use it carefully, especially for stateful workloads or services without enough replicas.
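Autoscaling behavior matters: Cluster autoscalers typically decide which node group to scale by simulating whether a pending pod would fit there, so each node group's template needs the same taints and labels as its real nodes. Otherwise the autoscaler may add the wrong nodes or none at all.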
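One way to catch overly broad tolerations before they reach the cluster is a small review check over pod specs. The Python sketch below is a hypothetical lint, not an existing tool. It reads tolerations in the same key/operator/value/effect shape used in the YAML examples and flags any toleration that matches more than a single key=value:Effect taint.

```python
def audit_tolerations(tolerations: list[dict]) -> list[str]:
    """Return a warning for every toleration broader than one exact taint."""
    warnings = []
    for tol in tolerations:
        op = tol.get("operator", "Equal")  # a missing operator means Equal
        key = tol.get("key", "")
        effect = tol.get("effect", "")
        if op == "Exists" and not key:
            # An empty key with Exists matches every key (limited only by effect).
            scope = f"every {effect} taint" if effect else "every taint"
            warnings.append(f"tolerates {scope}")
        elif op == "Exists":
            suffix = "" if effect else " with any effect"
            warnings.append(f"tolerates any value of {key!r}{suffix}")
        elif not effect:
            warnings.append(f"tolerates {key}={tol.get('value', '')} with any effect")
    return warnings


pod_tolerations = [
    {"key": "dedicated", "operator": "Equal", "value": "gpu", "effect": "NoSchedule"},
    {"operator": "Exists"},
]
print(audit_tolerations(pod_tolerations))  # ['tolerates every taint']
```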

Operational tips

Use clear taint keys, such as dedicated, workload, or hardware.
Keep taint values consistent across environments, such as gpu, ci, db, or airflow.
Pair taints with labels so you can use node affinity or node selectors.
Check pending pods with kubectl describe pod. The Events section usually includes a FailedScheduling message that names the taint the pod did not tolerate.
Manage taints and tolerations declaratively where possible. If your team manages Kubernetes manifests with infrastructure as code, you can apply the same discipline when you deploy Kubernetes resources using Terraform.
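The same reasoning can also be reproduced offline when you want to check a manifest before deploying it. This hypothetical Python helper lists, per node, the hard taints a pod fails to tolerate. It assumes exact-match tolerations and uses made-up node data, so treat it as a rough debugging aid rather than a replacement for the scheduler's own messages.

```python
def blocking_taints(
    tolerations: list[tuple[str, str, str]],
    nodes: dict[str, list[tuple[str, str, str]]],
) -> dict[str, list[str]]:
    """For each node, list the NoSchedule/NoExecute taints the pod does not tolerate."""
    report = {}
    for name, taints in nodes.items():
        blocked = [
            f"{k}={v}:{e}"
            for k, v, e in taints
            if e in ("NoSchedule", "NoExecute") and (k, v, e) not in tolerations
        ]
        if blocked:
            report[name] = blocked
    return report


nodes = {
    "gpu-node-1": [("dedicated", "gpu", "NoSchedule")],
    "ci-node-1": [("dedicated", "ci", "NoSchedule")],
}
# A CI pod whose toleration value has a typo ("cl" instead of "ci").
print(blocking_taints([("dedicated", "cl", "NoSchedule")], nodes))
# {'gpu-node-1': ['dedicated=gpu:NoSchedule'], 'ci-node-1': ['dedicated=ci:NoSchedule']}
```

Every node is blocked, which is why such a pod stays pending; correcting the value to ci leaves only the GPU node blocked.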

Quick summary

Kubernetes taints and tolerations give you a practical way to control workload placement. Taints mark nodes as restricted. Tolerations let approved pods run on those nodes. For predictable scheduling, use them with node labels, node selectors, and node affinity.
