DevOps Dictionary

Kubernetes Volume

A Kubernetes Volume is storage declared in a Pod spec and mounted into that Pod's containers so they can read, write, share, or persist data. Depending on the volume type and how it is configured, the data can outlive a single container process and, for some types, the Pod itself.

What a Kubernetes Volume does

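Containers are ephemeral. When a container crashes and is restarted, it starts again from its image, so files written to its writable layer are lost. A Kubernetes Volume addresses this by mounting storage into one or more containers inside a Pod. Because the volume belongs to the Pod rather than to a single container, its contents survive container restarts.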

Volumes are commonly used to:

  • Share files between containers in the same Pod.
  • Keep application data after a container restarts.
  • Mount configuration, secrets, or certificates as files.
  • Attach cloud disks, network storage, or local node storage to workloads.
  • Provide scratch space for temporary processing.

How it works

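A volume is declared in the Pod spec under spec.volumes, with a name and exactly one source type. Each container that needs access to that storage lists it under volumeMounts, referencing the volume by name and setting mountPath, the directory where the volume appears in the container filesystem.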

For example, a Pod can define a volume named app-data and mount it into a container at /var/lib/app. The application writes to that path as if it were a normal local directory, while Kubernetes handles the mount based on the configured volume type.
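The Pod described above can be sketched as a Python dict and printed as JSON, which kubectl accepts just like YAML. This is a minimal illustration: the Pod name, the image, and the choice of emptyDir as the volume type are assumptions, and mounts_resolve is a hypothetical helper that checks each volumeMount points at a declared volume.

```python
import json

# Hypothetical Pod manifest; the image name and emptyDir volume type are illustrative assumptions.
pod = {
    "apiVersion": "v1",
    "kind": "Pod",
    "metadata": {"name": "app"},
    "spec": {
        # Pod-level list of volumes: each entry has a name and exactly one source.
        "volumes": [
            {"name": "app-data", "emptyDir": {}},
        ],
        "containers": [
            {
                "name": "app",
                "image": "example.com/app:1.0",
                # Container-level mounts refer to a Pod volume by name.
                "volumeMounts": [
                    {"name": "app-data", "mountPath": "/var/lib/app"},
                ],
            }
        ],
    },
}


def mounts_resolve(manifest: dict) -> bool:
    """Hypothetical check: True if every volumeMount names a volume declared in spec.volumes."""
    spec = manifest["spec"]
    declared = {v["name"] for v in spec.get("volumes", [])}
    return all(
        m["name"] in declared
        for c in spec["containers"]
        for m in c.get("volumeMounts", [])
    )


print(json.dumps(pod, indent=2))
```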

Common volume types

  • emptyDir: Temporary storage created empty when a Pod is assigned to a node and deleted when the Pod is removed from that node. It survives container restarts. Useful for caches, scratch files, and sidecar sharing.
  • configMap: Mounts configuration data as files inside a container.
  • secret: Mounts sensitive data, such as tokens or TLS certificates, as files.
  • hostPath: Mounts a file or directory from the node filesystem. This is powerful but risky and tightly couples the Pod to a node.
  • persistentVolumeClaim: Mounts durable storage requested through a PersistentVolumeClaim, often backed by a cloud disk or network storage system.
  • CSI volumes: Use Container Storage Interface drivers to connect Kubernetes to storage providers such as AWS EBS, Azure Disk, Google Persistent Disk, Ceph, or NetApp.
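Each volume type above corresponds to a different source key inside a volumes entry. The sketch below shows one illustrative entry per type, plus a hypothetical volume_source helper that reports which source a volume uses. All names, paths, and the CSI driver name are assumptions. Note that a csi entry placed directly in a Pod spec is a CSI ephemeral volume; block storage drivers such as AWS EBS are normally consumed through a PersistentVolumeClaim instead.

```python
# One illustrative entry per volume type; names, paths, and the CSI driver are hypothetical.
VOLUMES = [
    {"name": "scratch", "emptyDir": {"sizeLimit": "1Gi"}},
    {"name": "config", "configMap": {"name": "app-config"}},
    {"name": "tls", "secret": {"secretName": "app-tls"}},
    {"name": "node-logs", "hostPath": {"path": "/var/log", "type": "Directory"}},
    {"name": "data", "persistentVolumeClaim": {"claimName": "app-data"}},
    # Inline CSI (ephemeral) volume; only drivers that support ephemeral mode accept this.
    {"name": "inline-csi", "csi": {"driver": "example.csi.vendor.com"}},
]


def volume_source(volume: dict) -> str:
    """Return the volume's source type, i.e. the single key other than 'name'."""
    sources = [k for k in volume if k != "name"]
    if len(sources) != 1:
        raise ValueError(f"volume {volume.get('name')!r} must have exactly one source, got {sources}")
    return sources[0]


for v in VOLUMES:
    print(f"{v['name']:<12} {volume_source(v)}")
```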

Kubernetes Volume vs PersistentVolume vs PersistentVolumeClaim

These terms are related, but they are not the same:

  • Volume: The storage definition attached to a Pod.
  • PersistentVolume, or PV: A cluster-level storage resource, provisioned either manually by an administrator or dynamically from a StorageClass.
  • PersistentVolumeClaim, or PVC: A namespaced request for storage, specifying size, access mode, and optionally a StorageClass, that Kubernetes binds to a matching PV. The Pod mounts the PVC as a volume.
  • StorageClass: Defines how dynamic storage should be provisioned, such as disk type, reclaim policy, encryption, or availability zone behavior.

In practice, stateful workloads usually mount a persistentVolumeClaim volume rather than referencing a raw PersistentVolume directly.
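A minimal sketch of that pattern, assuming a StorageClass named general-ssd already exists: the PVC requests storage, and the Pod mounts the claim by name. No PersistentVolume appears in the manifests; with dynamic provisioning, the storage driver creates it when the claim is bound. The names, image, and mount path are hypothetical, and the claim must live in the same namespace as the Pod.

```python
import json

# Hypothetical names; assumes a StorageClass called "general-ssd" already exists.
pvc = {
    "apiVersion": "v1",
    "kind": "PersistentVolumeClaim",
    "metadata": {"name": "db-data"},
    "spec": {
        "accessModes": ["ReadWriteOnce"],
        "storageClassName": "general-ssd",
        "resources": {"requests": {"storage": "20Gi"}},
    },
}

pod = {
    "apiVersion": "v1",
    "kind": "Pod",
    "metadata": {"name": "db"},
    "spec": {
        # The Pod references the claim, not the PersistentVolume behind it.
        "volumes": [
            {"name": "data", "persistentVolumeClaim": {"claimName": pvc["metadata"]["name"]}},
        ],
        "containers": [
            {
                "name": "db",
                "image": "example.com/db:1.0",
                "volumeMounts": [{"name": "data", "mountPath": "/var/lib/db"}],
            }
        ],
    },
}

for obj in (pvc, pod):
    print(json.dumps(obj, indent=2))
```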

Common use cases

  • Databases: PostgreSQL, MySQL, MongoDB, and similar systems need durable storage for data files.
  • Message queues: Kafka, RabbitMQ, and similar tools may need persistent disks for logs, queues, or metadata.
  • CI workloads: Build Pods may use emptyDir for temporary source code, dependencies, or artifacts.
  • Sidecar patterns: One container writes logs or generated files while another reads and ships them.
  • Configuration delivery: Applications can receive config files through ConfigMap volumes and certificates through Secret volumes.
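The sidecar pattern comes down to two containers mounting the same Pod volume. In this sketch, which uses hypothetical image names and paths, the web container writes logs into a shared emptyDir and a log-shipper container reads them through a read-only mount. The containers_sharing helper is also hypothetical and lists which containers mount a given volume.

```python
# Hypothetical sidecar layout: image names and paths are illustrative only.
pod = {
    "apiVersion": "v1",
    "kind": "Pod",
    "metadata": {"name": "web-with-log-shipper"},
    "spec": {
        "volumes": [{"name": "logs", "emptyDir": {}}],
        "containers": [
            {
                "name": "web",
                "image": "example.com/web:1.0",
                # The application writes its log files here.
                "volumeMounts": [{"name": "logs", "mountPath": "/var/log/web"}],
            },
            {
                "name": "log-shipper",
                "image": "example.com/log-shipper:1.0",
                # Same volume, mounted read-only at a different path.
                "volumeMounts": [{"name": "logs", "mountPath": "/logs", "readOnly": True}],
            },
        ],
    },
}


def containers_sharing(manifest: dict, volume_name: str) -> list:
    """Return the names of containers that mount the given volume."""
    return [
        c["name"]
        for c in manifest["spec"]["containers"]
        if any(m["name"] == volume_name for m in c.get("volumeMounts", []))
    ]


print(containers_sharing(pod, "logs"))  # ['web', 'log-shipper']
```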

Simple example

A web application Pod might use two volumes:

  • An emptyDir volume mounted at /tmp for temporary uploads during request handling.
  • A configMap volume mounted at /etc/app so the application can read its runtime settings from files.

If the app also stores user-uploaded files permanently, it should use a PVC backed by durable storage instead of emptyDir.
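Here is one way the two-volume web Pod could look, as a sketch. The ConfigMap name, its settings.json key, the max_upload_mb setting, and the image are hypothetical. When a configMap volume is mounted without an items list, each key in the ConfigMap's data becomes a file under the mount path; the mounted_files helper only models that naming rule.

```python
import json

# Hypothetical ConfigMap; each data key becomes a file under the mount path.
config_map = {
    "apiVersion": "v1",
    "kind": "ConfigMap",
    "metadata": {"name": "web-config"},
    "data": {"settings.json": json.dumps({"max_upload_mb": 10})},
}

pod = {
    "apiVersion": "v1",
    "kind": "Pod",
    "metadata": {"name": "web"},
    "spec": {
        "volumes": [
            {"name": "tmp", "emptyDir": {}},  # scratch space, removed with the Pod
            {"name": "config", "configMap": {"name": "web-config"}},
        ],
        "containers": [
            {
                "name": "web",
                "image": "example.com/web:1.0",
                "volumeMounts": [
                    {"name": "tmp", "mountPath": "/tmp"},
                    {"name": "config", "mountPath": "/etc/app", "readOnly": True},
                ],
            }
        ],
    },
}


def mounted_files(cm: dict, mount_path: str) -> list:
    """Model the file paths a configMap volume exposes when no items list is given."""
    return sorted(f"{mount_path.rstrip('/')}/{key}" for key in cm.get("data", {}))


print(mounted_files(config_map, "/etc/app"))  # ['/etc/app/settings.json']
```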

Benefits

  • Clear Pod-level storage model: Storage is declared with the workload, which makes manifests easier to review and version.
  • Container sharing: Multiple containers in the same Pod can mount the same volume.
  • Support for durable workloads: PVC-based volumes let Kubernetes run stateful systems when paired with the right storage backend.
  • Cloud and on-prem support: CSI drivers allow Kubernetes to work with many storage systems through a common interface.

Limitations and operational concerns

  • Pod lifecycle matters: Some volumes, such as emptyDir, are deleted when the Pod is deleted.
  • Storage is provider-specific: Performance, failover behavior, volume expansion, and multi-attach support depend on the storage backend.
  • Zone placement can break scheduling: A cloud disk in one availability zone may prevent a Pod from running in another zone.
  • Backups are your responsibility: Kubernetes mounts storage, but it does not automatically create application-consistent backups.
  • hostPath can be dangerous: It can expose node files to containers and should be restricted in most production clusters.
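In most clusters the hostPath restriction is enforced at admission time; for example, the Baseline Pod Security Standard disallows hostPath volumes. As a lighter complement, a team might run a pre-merge check on manifests. The sketch below is a hypothetical helper, not a standard tool: it finds hostPath volumes in a Pod or in workloads that embed a Pod template under spec.template, such as Deployments. It does not handle CronJobs, which nest the template one level deeper.

```python
def pod_spec(manifest: dict) -> dict:
    """Return the Pod spec from a Pod, or from a workload that embeds spec.template."""
    if manifest.get("kind") == "Pod":
        return manifest.get("spec", {})
    # Deployments, StatefulSets, DaemonSets, and Jobs embed a Pod template.
    return manifest.get("spec", {}).get("template", {}).get("spec", {})


def hostpath_volumes(manifest: dict) -> list:
    """Hypothetical audit helper: names of hostPath volumes in the manifest."""
    return [v["name"] for v in pod_spec(manifest).get("volumes", []) if "hostPath" in v]


pod = {
    "apiVersion": "v1",
    "kind": "Pod",
    "metadata": {"name": "debug"},
    "spec": {
        "volumes": [
            {"name": "scratch", "emptyDir": {}},
            {"name": "docker-sock", "hostPath": {"path": "/var/run/docker.sock", "type": "Socket"}},
        ],
        "containers": [{"name": "debug", "image": "example.com/debug:1.0"}],
    },
}

print(hostpath_volumes(pod))  # ['docker-sock']
```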

How teams manage volumes in production

Platform teams usually standardize storage through StorageClasses, PVC templates, and deployment tooling. For example, a team may offer separate StorageClasses for general-purpose SSD, high-throughput disks, and encrypted storage.
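A StorageClass catalog like that could be sketched as follows, assuming the AWS EBS CSI driver (provisioner ebs.csi.aws.com). The class names are hypothetical, and the type, throughput, and encrypted parameters should be checked against your driver version. WaitForFirstConsumer delays disk creation until a Pod is scheduled, which helps avoid the zone mismatch described earlier, and allowVolumeExpansion only takes effect if the driver supports resizing.

```python
import json


def make_storage_class(name: str, parameters: dict) -> dict:
    """Build a StorageClass for the (assumed) AWS EBS CSI driver."""
    return {
        "apiVersion": "storage.k8s.io/v1",
        "kind": "StorageClass",
        "metadata": {"name": name},
        "provisioner": "ebs.csi.aws.com",
        "parameters": parameters,
        # Create the disk only after a Pod is scheduled, in that Pod's zone.
        "volumeBindingMode": "WaitForFirstConsumer",
        # Allow PVCs of this class to be resized later, if the driver supports it.
        "allowVolumeExpansion": True,
        "reclaimPolicy": "Delete",
    }


# Hypothetical catalog: general-purpose SSD, higher throughput, and encrypted storage.
catalog = [
    make_storage_class("general-ssd", {"type": "gp3"}),
    make_storage_class("high-throughput", {"type": "gp3", "throughput": "500"}),
    make_storage_class("encrypted-ssd", {"type": "gp3", "encrypted": "true"}),
]

# Wrapped in a v1 List so the catalog can be applied from a single file.
print(json.dumps({"apiVersion": "v1", "kind": "List", "items": catalog}, indent=2))
```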

If you manage infrastructure declaratively, you can define Kubernetes workloads and storage-related resources, such as StorageClasses and PVCs, with tools such as Terraform. This approach is common in GitOps and infrastructure-as-code workflows.

Teams that use Kubernetes as a control plane for cloud infrastructure may also provision cloud storage and related resources through Crossplane, which exposes cloud resources, such as AWS services, as Kubernetes APIs.

Best practices

  • Use PVCs for data that must survive Pod deletion or rescheduling.
  • Use emptyDir only for temporary data.
  • Avoid hostPath unless you have a clear node-level requirement.
  • Size storage requests to the workload, for example 20Gi for a small application database or 500Gi for a log-heavy workload.
  • Check whether your storage backend supports volume expansion before relying on it.
  • Test restore procedures, not only backup creation.
  • Use StatefulSets for workloads that need stable identities and persistent storage per replica.
  • Review storage behavior before cluster upgrades, especially CSI driver compatibility. This should be part of your Kubernetes upgrade planning.
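For the StatefulSet practice, per-replica storage comes from volumeClaimTemplates: Kubernetes creates one PVC per replica, named after the template, the StatefulSet, and the replica ordinal. The sketch below uses hypothetical names, image, StorageClass, and size, and assumes the default ordinals starting at 0; expected_pvc_names only predicts the claim names and does not talk to a cluster.

```python
# Hypothetical StatefulSet: names, image, StorageClass, and size are illustrative.
stateful_set = {
    "apiVersion": "apps/v1",
    "kind": "StatefulSet",
    "metadata": {"name": "db"},
    "spec": {
        "serviceName": "db",
        "replicas": 3,
        "selector": {"matchLabels": {"app": "db"}},
        "template": {
            "metadata": {"labels": {"app": "db"}},
            "spec": {
                "containers": [
                    {
                        "name": "db",
                        "image": "example.com/db:1.0",
                        "volumeMounts": [{"name": "data", "mountPath": "/var/lib/db"}],
                    }
                ],
            },
        },
        # Each replica gets its own PVC created from this template.
        "volumeClaimTemplates": [
            {
                "metadata": {"name": "data"},
                "spec": {
                    "accessModes": ["ReadWriteOnce"],
                    "storageClassName": "general-ssd",
                    "resources": {"requests": {"storage": "20Gi"}},
                },
            }
        ],
    },
}


def expected_pvc_names(sts: dict) -> list:
    """Predict PVC names: <template name>-<StatefulSet name>-<ordinal>, ordinals from 0."""
    name = sts["metadata"]["name"]
    replicas = sts["spec"]["replicas"]
    return [
        f"{t['metadata']['name']}-{name}-{i}"
        for t in sts["spec"]["volumeClaimTemplates"]
        for i in range(replicas)
    ]


print(expected_pvc_names(stateful_set))  # ['data-db-0', 'data-db-1', 'data-db-2']
```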

Real-world example

Suppose you deploy Apache Airflow on Amazon EKS. The Airflow webserver may use ConfigMap and Secret volumes for configuration and credentials. Workers may use temporary volumes for task execution. The metadata database, usually PostgreSQL, needs durable storage or a managed database service. If you run supporting components inside the cluster, the volume choice affects reliability, recovery time, and scheduling behavior, so any plan for running Airflow on EKS should account for these storage decisions.

Key takeaway

A Kubernetes Volume is the way a Pod gets access to storage. Use temporary volumes for short-lived files, ConfigMap and Secret volumes for file-based configuration, and PVC-backed volumes for durable application data. The right choice depends on lifecycle, performance, security, and recovery requirements.
