How to Tune Kubernetes Liveness Probes Without Causing Restart Loops
Prevent restart loops by tuning liveness probes and separating startup, readiness, and liveness checks.
Kubernetes liveness probes look simple until a deployment starts killing healthy pods. The usual pressure is real: you want broken containers restarted automatically, but a probe with aggressive thresholds can turn a slow startup, temporary CPU throttling, or a downstream outage into a restart loop.
The goal is not to make liveness probes “pass more often.” The goal is to use them only for conditions that a container restart can actually fix, and to give the application enough time to prove it is truly wedged before Kubernetes restarts it.
A liveness probe answers one narrow question: “Should Kubernetes restart this container?” If the probe fails enough times, the kubelet restarts the container according to the pod’s restart policy. That is powerful, but it is also blunt.
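A liveness probe should detect problems that a restart can fix, such as a process that can no longer respond or has entered a fatal internal state.
A liveness probe should usually avoid checking external dependencies such as databases, caches, message brokers, or third-party APIs.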
If the database is unavailable and every pod fails its liveness probe because of that database check, Kubernetes will restart every pod. The restart will not fix the database. It will add load, increase cold starts, clear in-memory caches, and make recovery harder.
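Use the right probe for the right job:
Startup probe: has initialization completed?
Readiness probe: can the application serve traffic right now?
Liveness probe: should Kubernetes restart this container?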
If you manage Kubernetes manifests through infrastructure as code, keep this distinction explicit in your modules or templates. The same principle applies whether you apply raw YAML, use Helm, or deploy Kubernetes resources using Terraform.
Begin with conservative thresholds, then tighten them after observing real startup time and failure behavior. Do not copy probe values across services without checking how each service starts, warms up, and handles load.
Here is a risky liveness probe:
apiVersion: apps/v1
kind: Deployment
metadata:
  name: api
spec:
  replicas: 3
  selector:
    matchLabels:
      app: api
  template:
    metadata:
      labels:
        app: api
    spec:
      containers:
        - name: api
          image: example/api:1.0.0
          ports:
            - containerPort: 8080
          livenessProbe:
            httpGet:
              path: /health
              port: 8080
            initialDelaySeconds: 5
            periodSeconds: 5
            timeoutSeconds: 1
            failureThreshold: 3
This can restart the container after roughly 20 seconds: 5 seconds of initial delay, then 3 failed probes spaced 5 seconds apart. If the service sometimes needs 45 seconds to load configuration, run migrations, compile templates, or warm caches, this pod can restart forever.
A safer pattern separates startup, readiness, and liveness:
apiVersion: apps/v1
kind: Deployment
metadata:
  name: api
spec:
  replicas: 3
  selector:
    matchLabels:
      app: api
  template:
    metadata:
      labels:
        app: api
    spec:
      containers:
        - name: api
          image: example/api:1.0.0
          ports:
            - name: http
              containerPort: 8080
          startupProbe:
            httpGet:
              path: /startupz
              port: http
            periodSeconds: 5
            timeoutSeconds: 2
            failureThreshold: 24
          readinessProbe:
            httpGet:
              path: /readyz
              port: http
            periodSeconds: 5
            timeoutSeconds: 2
            failureThreshold: 3
            successThreshold: 1
          livenessProbe:
            httpGet:
              path: /livez
              port: http
            periodSeconds: 10
            timeoutSeconds: 2
            failureThreshold: 3
In this example, the startup probe allows up to about 120 seconds for startup: 24 failures multiplied by a 5-second period. Until the startup probe succeeds, Kubernetes does not run the liveness or readiness probes. After startup succeeds, liveness and readiness checks begin.
This matters for applications with variable boot times, such as Java services, applications that load large models, services that hydrate caches, or workers that recover state during startup.
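The 120-second window above is simply failureThreshold multiplied by periodSeconds. A minimal sketch for going the other way, from a target startup window to a failureThreshold, might look like this. The function names are hypothetical helpers for illustration, not part of any Kubernetes tooling.

```javascript
// Hypothetical helper: pick a startupProbe failureThreshold for a target window.
function startupFailureThreshold(targetWindowSeconds, periodSeconds) {
  if (!(targetWindowSeconds > 0) || !(periodSeconds > 0)) {
    throw new RangeError("targetWindowSeconds and periodSeconds must be positive");
  }
  // Round up so the allowed window never falls short of the target.
  return Math.ceil(targetWindowSeconds / periodSeconds);
}

// Approximate startup window a startupProbe allows, in seconds.
function startupWindowSeconds({ periodSeconds, failureThreshold }) {
  return periodSeconds * failureThreshold;
}

console.log(startupFailureThreshold(120, 5)); // 24
console.log(startupWindowSeconds({ periodSeconds: 5, failureThreshold: 24 })); // 120
```

Rounding up is a deliberate choice here: when the target does not divide evenly by the period, the resulting window errs on the side of more startup time rather than less.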
You should be able to explain every probe value in a deployment review. If the answer is “we copied it,” the probe is not tuned.
Use this rough calculation:
restart window ≈ initialDelaySeconds + (failureThreshold × periodSeconds)
For a liveness probe like this:
livenessProbe:
  httpGet:
    path: /livez
    port: http
  initialDelaySeconds: 30
  periodSeconds: 10
  timeoutSeconds: 2
  failureThreshold: 3
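With these values, a container that fails liveness from the moment it starts is restarted after roughly 60 seconds: 30 seconds of initial delay plus 3 failed probes spaced 10 seconds apart. A container that wedges after it has been running is restarted after roughly 30 seconds of consecutive failures, because the initial delay applies only once. That may be fine for a stateless API with a 10-second normal startup time. It may be too aggressive for a batch worker that can pause during garbage collection, checkpoint recovery, or CPU pressure.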
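To make the rough calculation above easy to repeat in a deployment review, here is a small sketch. The function name and the returned field names are assumptions made for illustration. The defaults mirror the Kubernetes probe defaults, and the result is an approximation that ignores timeoutSeconds and probe scheduling jitter.

```javascript
// Hypothetical helper implementing the rough formula:
// restart window ≈ initialDelaySeconds + (failureThreshold × periodSeconds)
function restartWindows({ initialDelaySeconds = 0, periodSeconds = 10, failureThreshold = 3 } = {}) {
  // Time for failureThreshold consecutive failures once probing is under way.
  const steadyStateSeconds = failureThreshold * periodSeconds;
  return {
    // Container that never becomes healthy after it starts.
    fromContainerStartSeconds: initialDelaySeconds + steadyStateSeconds,
    // Container that wedges after it has been running for a while.
    steadyStateSeconds,
  };
}

const tuned = restartWindows({ initialDelaySeconds: 30, periodSeconds: 10, failureThreshold: 3 });
console.log(tuned.fromContainerStartSeconds, tuned.steadyStateSeconds); // 60 30

const risky = restartWindows({ initialDelaySeconds: 5, periodSeconds: 5, failureThreshold: 3 });
console.log(risky.fromContainerStartSeconds, risky.steadyStateSeconds); // 20 15
```

Comparing fromContainerStartSeconds against the slowest acceptable startup time shows quickly whether a probe without a startupProbe can kill a container that is still booting.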
Use these starting points as a practical baseline, then adjust with real data:
For the startup probe, set failureThreshold × periodSeconds higher than your slow but acceptable startup time. If p95 startup is 70 seconds, start around 120 seconds.
Prefer startupProbe for startup handling. Use initialDelaySeconds only when startup behavior is simple and predictable.
Probe tuning should live next to deployment configuration, not in someone’s notes. If your platform team provisions services and dependencies through Kubernetes-native control planes, document probe defaults in the same place you manage patterns such as deploying AWS resources using Crossplane on Kubernetes.
The endpoint behind the probe matters more than the YAML. A clean probe configuration still fails if /health does too much work.
A good pattern is to expose separate endpoints:
/livez: checks that the process can respond and has not entered a fatal internal state.
/readyz: checks whether the application can serve traffic right now.
/startupz: checks whether initialization has completed.
For example, an HTTP service might implement behavior like this:
GET /livez
  200 OK if the process event loop is responsive
  500 only if the process is internally unrecoverable
GET /readyz
  200 OK if the service can accept traffic
  503 if required dependencies are unavailable or the app is draining
GET /startupz
  200 OK after bootstrapping is complete
  503 while migrations, cache loading, or state recovery are still running
Here is a minimal Node.js example that keeps liveness local and pushes dependency checks into readiness:
import express from "express";

const app = express();
let started = false;
let shuttingDown = false;

async function checkDatabase() {
  // Replace with a cheap ping or connection-pool status check.
  // Do not run expensive queries here.
  return true;
}

app.get("/startupz", (req, res) => {
  if (started) {
    return res.status(200).send("ok");
  }
  return res.status(503).send("starting");
});

app.get("/livez", (req, res) => {
  // Keep this local. Do not check the database, cache, or third-party APIs.
  return res.status(200).send("ok");
});

app.get("/readyz", async (req, res) => {
  if (shuttingDown || !started) {
    return res.status(503).send("not ready");
  }
  const databaseOk = await checkDatabase();
  if (!databaseOk) {
    return res.status(503).send("database unavailable");
  }
  return res.status(200).send("ok");
});

process.on("SIGTERM", () => {
  shuttingDown = true;
  setTimeout(() => process.exit(0), 10_000);
});

app.listen(8080, async () => {
  // Run startup work here.
  started = true;
});
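The Express example above always answers 200 on /livez, so it never exercises the "500 only if the process is internally unrecoverable" case. One possible way to cover it is to keep the status decisions in small pure functions over an in-process state object. This is a sketch only: the state shape and the fatal and dependenciesOk fields are hypothetical names introduced for illustration.

```javascript
// Hypothetical in-process health state; the field names are illustrative only.
// { started, shuttingDown, fatal, dependenciesOk }

// Liveness: local process health only. Dependencies are deliberately ignored.
function livezStatus(state) {
  return state.fatal ? 500 : 200;
}

// Readiness: traffic eligibility, including dependencies and draining.
function readyzStatus(state) {
  const ready =
    state.started && !state.shuttingDown && !state.fatal && state.dependenciesOk;
  return ready ? 200 : 503;
}

// Startup: has bootstrapping finished?
function startupzStatus(state) {
  return state.started ? 200 : 503;
}

// A database outage removes the pod from service but does not restart it.
const duringOutage = { started: true, shuttingDown: false, fatal: false, dependenciesOk: false };
console.log(livezStatus(duringOutage), readyzStatus(duringOutage)); // 200 503
```

In a route handler this could be used as res.status(livezStatus(state)).send(...). Keeping the decisions pure also makes it straightforward to unit test that a dependency failure never changes the liveness result.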
For worker services without HTTP servers, an exec probe can work, but use it carefully. Each probe run spawns a process inside the container, so at scale, expensive exec probes can add measurable overhead.
livenessProbe:
  exec:
    command:
      - /bin/sh
      - -c
      - test -f /tmp/worker-alive
  periodSeconds: 15
  timeoutSeconds: 2
  failureThreshold: 4
If you use an exec probe, keep the command fast and deterministic. Avoid commands that call external systems, perform file tree scans, invoke package managers, or depend on shells that may not exist in minimal images.
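One caveat with the example probe: test -f only checks that the file exists. If the worker creates /tmp/worker-alive once and then hangs, the probe keeps passing. A common refinement is to have the worker update the file on every loop iteration and have the probe compare the file's age against a limit. The sketch below shows only the freshness decision. The function name and the 60-second limit are assumptions, and how the modification time is read (for example from fs.statSync(path).mtimeMs in Node.js) is left to the caller.

```javascript
// Hypothetical freshness check for a heartbeat file such as /tmp/worker-alive.
// lastBeatMs: last modification time of the file, in milliseconds since the epoch.
// nowMs:      current time, in milliseconds since the epoch.
// maxAgeMs:   how old the heartbeat may be before the worker counts as wedged.
function isHeartbeatFresh(lastBeatMs, nowMs, maxAgeMs) {
  if (!Number.isFinite(lastBeatMs) || !Number.isFinite(nowMs)) {
    // A missing or unreadable timestamp is treated as stale.
    return false;
  }
  return nowMs - lastBeatMs <= maxAgeMs;
}

const now = 1_000_000;
console.log(isHeartbeatFresh(now - 20_000, now, 60_000)); // true
console.log(isHeartbeatFresh(now - 90_000, now, 60_000)); // false
```

The age limit should comfortably exceed the longest legitimate gap between heartbeats, such as a long batch item or a garbage-collection pause, for the same reason liveness thresholds should not be aggressive.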
Treat probe changes like production behavior changes. They can cause restarts, remove pods from service, and change rollout timing.
Use this rollout process:
Check the startupProbe. Make sure the allowed startup window exceeds slow but valid startup.
Useful commands:
kubectl get pods -n app
kubectl describe pod -n app api-7c9f6d8f7f-x2abc
kubectl logs -n app api-7c9f6d8f7f-x2abc --previous
kubectl get events -n app --sort-by=.lastTimestamp
Look for messages like:
Liveness probe failed: HTTP probe failed with statuscode: 503
Back-off restarting failed container
Readiness probe failed: Get "http://10.0.1.25:8080/readyz": context deadline exceeded
If --previous logs show that the application was still starting when it was killed, your startup window is too short or your liveness probe is starting too early. If logs show request latency spikes before liveness failures, your timeout may be too low or the endpoint may share overloaded application threads.
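When there are many events to read, it can help to sort probe failures by cause before deciding what to tune. The sketch below classifies only the message shapes shown above. The function name and the cause labels are hypothetical, and real event text can vary between Kubernetes versions and probe types, so treat the patterns as a starting point.

```javascript
// Hypothetical classifier for probe failure event messages.
// Returns null for messages that are not probe failures.
function classifyProbeEvent(message) {
  const match = /^(Liveness|Readiness|Startup) probe failed: ([\s\S]*)$/.exec(message);
  if (!match) return null;

  const probe = match[1].toLowerCase();
  const detail = match[2];
  const status = /statuscode: (\d{3})/.exec(detail);

  let cause = "other";
  if (status) {
    // The endpoint answered, but with a failing status code.
    cause = "http-status";
  } else if (/context deadline exceeded/.test(detail)) {
    // The endpoint did not answer in time.
    cause = "timeout";
  }
  return { probe, cause, statusCode: status ? Number(status[1]) : null };
}

console.log(classifyProbeEvent("Liveness probe failed: HTTP probe failed with statuscode: 503"));
// { probe: 'liveness', cause: 'http-status', statusCode: 503 }
console.log(classifyProbeEvent("Back-off restarting failed container"));
// null
```

An http-status cause points at what the handler checks, while a timeout cause points at timeoutSeconds, CPU limits, or an endpoint that shares overloaded application threads.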
When Kubernetes is part of a larger platform rollout, probe settings should be reviewed with resource requests, limits, rollout strategy, and dependency provisioning. For example, an application deployed on Amazon Elastic Kubernetes Service (EKS) can still fail because a probe ignores slow boot behavior, even if the cluster itself is healthy. The same operational checks apply when you deploy Apache Airflow on AWS EKS or run a custom API service.
Most restart loops come from a few repeatable mistakes.
If /livez fails when PostgreSQL, Redis, Kafka, or an external API is unavailable, you are asking Kubernetes to restart your application because another system has a problem.
Move dependency checks to readiness:
readinessProbe:
  httpGet:
    path: /readyz
    port: http
  periodSeconds: 5
  timeoutSeconds: 2
  failureThreshold: 3
livenessProbe:
  httpGet:
    path: /livez
    port: http
  periodSeconds: 15
  timeoutSeconds: 2
  failureThreshold: 4
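initialDelaySeconds can work for simple services, but it is a fixed delay. It does not adapt to real startup progress. A startupProbe lets Kubernetes wait until startup succeeds and only then begin liveness checks.
Use startup probes for services with slow or variable boot times, such as Java services, applications that load large models, services that hydrate caches, or workers that recover state during startup.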
The default timeoutSeconds is 1 second. That can be too low for applications under CPU throttling, garbage collection, or temporary I/O pressure.
If probes fail during load but the application recovers without a restart, increase timeoutSeconds, increase failureThreshold, or make the probe endpoint cheaper. Also check CPU limits. A pod with a tight CPU limit can fail probes because the process cannot get scheduled quickly enough.
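Health endpoints should avoid authentication middleware, redirects, rate limits, and expensive request logging. The kubelet counts an HTTP probe as successful only when the status code is at least 200 and below 400, so a 401 or 403 from middleware fails the probe even when the application is healthy. Redirects are also worth avoiding: with a 301 or 302, the probe result depends on where the redirect leads rather than on the health route itself.
Make health routes boring: no authentication, no redirects, no rate limiting, and no expensive logging.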
Named ports reduce mistakes when container ports change:
ports:
  - name: http
    containerPort: 8080
livenessProbe:
  httpGet:
    path: /livez
    port: http
If you use service meshes or sidecars, confirm whether the kubelet probes the application container directly or whether probe rewriting is active. Misconfigured sidecar behavior can make a healthy application look unhealthy.
Before you merge a liveness probe change, verify these points:
/livez checks only local process health.
/readyz handles dependency checks and traffic eligibility.
/startupz exists for slow or variable startup.
timeoutSeconds is realistic for your runtime and CPU limits.
Liveness probes are useful when they restart containers that cannot recover on their own. They become dangerous when they replace readiness checks, dependency monitoring, or startup handling. Start conservative, separate probe responsibilities, watch real failure events, and tune based on observed behavior rather than copied defaults.