MeteorOps | DevOps Dictionary

Kubernetes Node

A Kubernetes Node is a worker machine in a Kubernetes cluster that runs pods and provides the CPU, memory, storage, and networking those pods need. A node can be a virtual machine, a physical server, or a cloud instance, and it is managed by the Kubernetes control plane.

What a Kubernetes Node does

A node is where application workloads actually run. When you create a Deployment, Job, StatefulSet, or DaemonSet, Kubernetes schedules the resulting pods onto available nodes based on resource requests, constraints, and cluster state.

Each node contributes capacity to the cluster, such as:

CPU for application processing
Memory for containers and system processes
Ephemeral storage for temporary files, logs, and container layers
Networking so pods can communicate with each other and with external services
Attached storage access when pods use PersistentVolumes

How a node works

A Kubernetes node runs several core components that allow it to accept and manage pods:

kubelet: The node agent. It talks to the Kubernetes API server, starts pods, reports node health, and makes sure containers match the desired pod spec.
Container runtime: Runs containers on the node. Common examples include containerd and CRI-O.
kube-proxy: Handles Service networking rules on the node, usually through iptables, IPVS, or similar mechanisms.
Container Network Interface plugin: Provides pod networking. Examples include Cilium, Calico, Flannel, and AWS VPC CNI.

When a pod is created, the scheduler chooses a suitable node. The kubelet on that node receives the pod spec, asks the container runtime to pull images and start containers, then keeps reporting status back to the control plane.
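
The filter-then-choose idea behind scheduling can be sketched in a few lines of Python. This is a toy model, not the real kube-scheduler algorithm: the Node and Pod classes, the field names, and the "most free CPU wins" rule are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Node:
    name: str
    free_cpu_m: int      # unreserved CPU in millicores
    free_mem_mi: int     # unreserved memory in MiB
    schedulable: bool = True


@dataclass
class Pod:
    name: str
    cpu_request_m: int
    mem_request_mi: int


def pick_node(pod: Pod, nodes: list[Node]) -> Optional[Node]:
    """Filter nodes that fit the pod, then prefer the one with the most free CPU."""
    candidates = [
        n for n in nodes
        if n.schedulable
        and n.free_cpu_m >= pod.cpu_request_m
        and n.free_mem_mi >= pod.mem_request_mi
    ]
    if not candidates:
        return None  # no node fits, so the pod would stay Pending
    return max(candidates, key=lambda n: (n.free_cpu_m, n.free_mem_mi))


def bind(pod: Pod, node: Node) -> None:
    """Reserve the pod's requests on the chosen node."""
    node.free_cpu_m -= pod.cpu_request_m
    node.free_mem_mi -= pod.mem_request_mi


# Hypothetical cluster state.
nodes = [
    Node("node-1", free_cpu_m=500, free_mem_mi=4096),   # too little CPU
    Node("node-2", free_cpu_m=3000, free_mem_mi=8192),  # fits
    Node("node-3", free_cpu_m=3500, free_mem_mi=1024),  # too little memory
]
api_pod = Pod("api-0", cpu_request_m=1000, mem_request_mi=2048)

chosen = pick_node(api_pod, nodes)
if chosen is not None:
    bind(api_pod, chosen)
    print(f"{api_pod.name} -> {chosen.name}")  # api-0 -> node-2
```

The real scheduler applies many more filters and scoring rules, but the shape is similar: rule out nodes that cannot run the pod, rank the rest, then bind the pod to the winner.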

Node types

Most clusters have more than one kind of node, especially in production environments.

Worker nodes: Run application pods and supporting workloads such as logging agents, service meshes, and monitoring collectors.
Control plane nodes: Run Kubernetes control plane components such as the API server, scheduler, controller manager, and etcd. In managed Kubernetes services, these are often hidden from you.
Specialized nodes: Run specific workloads, such as GPU jobs, high-memory services, ingress controllers, or storage-heavy applications.

Common use cases

Running application workloads: API services, web apps, background workers, and scheduled jobs.
Separating workload classes: For example, running production traffic on one node pool and CI jobs on another.
Supporting infrastructure workloads: Ingress controllers, DNS, observability agents, and security scanners.
Scaling capacity: Adding more nodes when the cluster needs more CPU or memory.
Handling specialized hardware: Running machine learning workloads on GPU nodes or high-throughput services on compute-optimized instances.

Node scheduling and placement

Kubernetes uses scheduling rules to decide where pods should run. These rules help teams control cost, reliability, compliance, and performance.

Resource requests and limits: Requests tell the scheduler how much CPU and memory to reserve for a pod. Limits cap how much its containers may use at runtime.
Labels: Add metadata to nodes, such as node-type=gpu or environment=production.
Node selectors: Place pods only on nodes with specific labels.
Node affinity: Express placement rules that are more flexible than node selectors, either as hard requirements or as soft preferences.
Taints and tolerations: Keep general workloads away from dedicated nodes unless pods explicitly tolerate them.
Pod disruption budgets: Help control how many pods can be unavailable during voluntary disruptions, such as node maintenance.

For example, you might taint a GPU node pool so regular web services do not consume expensive GPU instances by mistake. Only pods with the right toleration would be scheduled there.
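
The GPU example can be sketched in Python as follows. The nodeSelector and tolerations fields mirror the standard Pod spec, but the taint key dedicated=gpu, the label values, and the matching logic are illustrative assumptions. The check is simplified: it treats every taint as a hard NoSchedule-style taint and ignores node affinity.

```python
# Hypothetical taint applied to every node in the GPU pool.
gpu_taint = {"key": "dedicated", "value": "gpu", "effect": "NoSchedule"}

# Pod spec fragments. The field names follow the Pod API; the values are made up.
gpu_pod_spec = {
    "nodeSelector": {"node-type": "gpu"},
    "tolerations": [
        {"key": "dedicated", "operator": "Equal", "value": "gpu", "effect": "NoSchedule"}
    ],
}
web_pod_spec = {}  # no selector and no tolerations


def tolerates(toleration: dict, taint: dict) -> bool:
    """Simplified match: same key and effect, and equal value unless operator is Exists."""
    if toleration.get("key") != taint["key"]:
        return False
    if toleration.get("effect") not in (None, taint["effect"]):
        return False
    if toleration.get("operator", "Equal") == "Exists":
        return True
    return toleration.get("value") == taint.get("value")


def can_schedule(pod_spec: dict, node_labels: dict, node_taints: list[dict]) -> bool:
    """True if the node matches the pod's nodeSelector and the pod tolerates every taint."""
    selector = pod_spec.get("nodeSelector", {})
    if any(node_labels.get(k) != v for k, v in selector.items()):
        return False
    tolerations = pod_spec.get("tolerations", [])
    return all(any(tolerates(t, taint) for t in tolerations) for taint in node_taints)


gpu_node_labels = {"node-type": "gpu"}
print(can_schedule(gpu_pod_spec, gpu_node_labels, [gpu_taint]))  # True
print(can_schedule(web_pod_spec, gpu_node_labels, [gpu_taint]))  # False
```

Note that the taint only keeps other pods off the GPU nodes. The node selector is what steers the GPU pod toward them, which is why dedicated pools usually combine both.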

Node lifecycle

A node moves through several operational states during its life in a cluster:

Provisioned: A VM or server is created by a cloud provider, autoscaler, Terraform, Crossplane, or another provisioning tool.
Registered: The kubelet joins the cluster and registers the node with the API server.
Ready: The node can accept pods.
NotReady: Kubernetes cannot confirm that the node is healthy, often because of network, kubelet, runtime, or host issues.
Cordoned: The node is marked unschedulable, so Kubernetes will not place new pods on it.
Drained: Existing pods are safely evicted, usually before maintenance or termination.
Removed: The node is deleted from the cluster after shutdown, replacement, or scale-down.
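
The difference between cordoning and draining can be modeled with a small Python sketch. In practice these steps are usually run with kubectl cordon and kubectl drain. The Node class and pod names below are hypothetical, and the model ignores pod disruption budgets and graceful termination.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    name: str
    unschedulable: bool = False
    pods: list[str] = field(default_factory=list)


def cordon(node: Node) -> None:
    """Mark the node unschedulable; pods already running are left in place."""
    node.unschedulable = True


def drain(node: Node, daemonset_pods: set[str]) -> list[str]:
    """Cordon the node, then evict every pod except DaemonSet-managed ones."""
    cordon(node)
    evicted = [p for p in node.pods if p not in daemonset_pods]
    node.pods = [p for p in node.pods if p in daemonset_pods]
    return evicted


node = Node("node-2", pods=["api-1", "api-2", "frontend-0", "log-agent-x"])
evicted = drain(node, daemonset_pods={"log-agent-x"})

print(node.unschedulable)  # True
print(evicted)             # ['api-1', 'api-2', 'frontend-0']
print(node.pods)           # ['log-agent-x']
```

Cordoning alone changes nothing for running workloads. Draining is what moves them, and the evicted pods only come back elsewhere if a controller such as a Deployment recreates them.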

Teams often automate node creation with infrastructure-as-code tools. For example, you can manage Kubernetes infrastructure with Terraform-based Kubernetes workflows or provision cloud resources through Crossplane running on Kubernetes.

Node health and reliability

Kubernetes tracks node health through status conditions. Common conditions include:

Ready: Whether the node is healthy enough to run pods.
MemoryPressure: Whether the node is low on available memory.
DiskPressure: Whether the node is low on disk space.
PIDPressure: Whether the node is running out of process IDs.
NetworkUnavailable: Whether pod networking is unavailable.

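If a node becomes unhealthy, the control plane taints it so the scheduler stops placing new pods there. If the node stays NotReady or unreachable, its pods are evicted after a toleration period (five minutes by default) and their controllers recreate them elsewhere, depending on workload type, tolerations, and cluster settings.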
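
As a rough illustration, the Python sketch below reads a list shaped like a Node object's status.conditions and reports which conditions indicate trouble. The sample data is invented. The one subtlety it shows is that Ready is healthy when "True", while the other conditions listed above are healthy when "False".

```python
def node_problems(conditions: list[dict]) -> list[str]:
    """Return the conditions that indicate trouble for a node."""
    problems = []
    for cond in conditions:
        ctype, status = cond["type"], cond["status"]
        if ctype == "Ready":
            # Ready is healthy only when "True"; "False" and "Unknown" are both problems.
            if status != "True":
                problems.append(f"Ready={status}")
        elif status != "False":
            # Pressure and NetworkUnavailable conditions are healthy when "False".
            problems.append(f"{ctype}={status}")
    return problems


# Hypothetical sample shaped like status.conditions on a Node object.
sample = [
    {"type": "MemoryPressure", "status": "False"},
    {"type": "DiskPressure", "status": "True"},
    {"type": "PIDPressure", "status": "False"},
    {"type": "Ready", "status": "True"},
]
print(node_problems(sample))  # ['DiskPressure=True']
```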

Node pools and managed Kubernetes

In managed Kubernetes platforms such as Amazon EKS, Google Kubernetes Engine, and Azure Kubernetes Service, nodes are often organized into node pools or node groups. A node pool is a set of nodes with similar configuration, such as instance type, operating system image, labels, taints, and autoscaling settings.

For example, an EKS cluster might have:

A small on-demand node group for system workloads
A larger autoscaling node group for production services
A spot instance node group for fault-tolerant batch jobs
A GPU node group for model training or inference

If you run data platforms on Kubernetes, node sizing matters. A workload such as Apache Airflow on EKS may need separate placement rules for schedulers, workers, web servers, and supporting services, as shown in this guide to deploying Apache Airflow on AWS EKS.

Kubernetes Node vs. Pod

A node is the machine that provides compute capacity. A pod is the smallest deployable workload unit in Kubernetes.

Node: Runs many pods and belongs to the cluster infrastructure layer.
Pod: Runs one or more containers and belongs to the application workload layer.

For example, a cluster may have 6 nodes. A production API Deployment may run 12 pod replicas spread across those nodes for availability.

Kubernetes Node vs. Cluster

A Kubernetes cluster is the full system, including the control plane and all nodes. A node is one machine within that cluster.

Cluster: The complete Kubernetes environment.
Node: A single compute machine inside the cluster.

Simple example

Suppose you run a SaaS application on Kubernetes with three services: an API, a frontend, and a background worker. Your cluster has 4 nodes, each with 4 vCPUs and 16 GB of memory.

Kubernetes might place:

2 API pods on node 1
2 API pods and 1 frontend pod on node 2
2 worker pods on node 3
Monitoring, ingress, and spare capacity on node 4

If node 2 fails, Kubernetes marks it unhealthy and starts replacement pods on the remaining nodes if enough capacity exists. In a cloud environment with cluster autoscaling, a new node may be created to restore capacity.
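
Whether "enough capacity exists" comes down to arithmetic on requests. The Python sketch below works through the node 2 failure using hypothetical per-pod requests, since the example above does not state any. It also treats 16 GB as 16 GiB, ignores capacity reserved for the operating system and kubelet, and uses a simple first-fit rule rather than the real scheduler's scoring.

```python
NODE_CPU_M = 4000         # 4 vCPUs per node, in millicores
NODE_MEM_MI = 16 * 1024   # 16 GB per node, treated here as 16 GiB in MiB

# Hypothetical requests per pod type: (CPU millicores, memory MiB).
REQUESTS = {
    "api": (1000, 2048),
    "frontend": (500, 1024),
    "worker": (1000, 4096),
    "system": (1000, 2048),  # stands in for monitoring and ingress on node 4
}

placement = {
    "node-1": ["api", "api"],
    "node-2": ["api", "api", "frontend"],
    "node-3": ["worker", "worker"],
    "node-4": ["system"],
}


def free_capacity(pods: list[str]) -> list[int]:
    """Return [free CPU, free memory] on a node running the given pods."""
    cpu = NODE_CPU_M - sum(REQUESTS[p][0] for p in pods)
    mem = NODE_MEM_MI - sum(REQUESTS[p][1] for p in pods)
    return [cpu, mem]


def reschedule(failed: str, placement: dict) -> tuple[dict, list[str]]:
    """First-fit the failed node's pods onto the surviving nodes."""
    free = {n: free_capacity(p) for n, p in placement.items() if n != failed}
    moved, pending = {}, []
    for i, pod in enumerate(placement[failed]):
        cpu, mem = REQUESTS[pod]
        for node, cap in free.items():
            if cap[0] >= cpu and cap[1] >= mem:
                cap[0] -= cpu
                cap[1] -= mem
                moved[f"{pod}-{i}"] = node
                break
        else:
            pending.append(pod)  # nothing fits; an autoscaler would need to add a node
    return moved, pending


moved, pending = reschedule("node-2", placement)
print(moved)    # {'api-0': 'node-1', 'api-1': 'node-1', 'frontend-2': 'node-3'}
print(pending)  # []
```

With these assumed requests the three displaced pods fit on the surviving nodes. Larger requests or a fuller cluster would leave pods pending until new capacity arrives.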

Operational considerations

Right-size nodes: Very small nodes can increase overhead. Very large nodes can increase failure impact when one node goes down.
Set pod requests: The scheduler places pods based on their requests, so pods without CPU and memory requests can land on nodes that are already overloaded.
Use multiple availability zones: Spread nodes across zones when your cloud provider and application architecture support it.
Plan upgrades carefully: Node upgrades require draining and replacing capacity. For startup teams, these practical Kubernetes upgrade tips can help reduce avoidable downtime.
Monitor node pressure: Watch memory, disk, CPU saturation, kubelet errors, and network plugin issues.
Separate critical workloads: Use node pools, labels, taints, and tolerations for workloads with different reliability or cost profiles.

Key takeaway

A Kubernetes Node is the compute unit that runs your pods. The control plane decides what should run, but nodes provide the actual runtime environment. Good node design affects performance, reliability, cost, upgrades, and incident response, so it is one of the core building blocks every Kubernetes operator needs to understand.
