Kubernetes VerticalPodAutoscaler (VPA)

Definition

Kubernetes VerticalPodAutoscaler (VPA), or Vertical Pod Autoscaler, is a Kubernetes autoscaling component that recommends or updates pod CPU and memory requests based on real workload usage. In practical terms, VPA helps your pods request the right amount of resources so Kubernetes can schedule them more accurately and your cluster can avoid both waste and resource pressure.

What VPA does

VPA focuses on vertical scaling, which means changing the resources assigned to each pod rather than changing the number of pod replicas.

Analyzes usage: VPA looks at historical and current CPU and memory usage for a workload.
Recommends requests: It calculates suggested CPU and memory requests for containers.
Applies requests: Depending on its mode, it can update pod requests automatically.
Improves scheduling: Better requests help the Kubernetes scheduler place pods on nodes with enough capacity.
Reduces manual tuning: Teams do not need to guess resource requests for every service and job.

How VPA works

VPA is usually installed as an add-on: a set of Kubernetes custom resource definitions (CRDs) and controllers. It is not a built-in field on a Deployment or StatefulSet. Instead, you define a separate VerticalPodAutoscaler object that targets a workload, such as a Deployment, StatefulSet, or other supported controller.
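To make the shape of a VerticalPodAutoscaler object concrete, here is a minimal sketch that builds one as a Python dictionary and prints it as JSON. The object and workload names (backend-api-vpa, backend-api) and the build_vpa helper are illustrative assumptions, and the helper assumes the target lives in the apps/v1 API group.

```python
import json


def build_vpa(name, target_kind, target_name, update_mode="Off", namespace="default"):
    """Return a VerticalPodAutoscaler manifest as a plain dict.

    Assumption: the target is an apps/v1 workload (Deployment, StatefulSet, ...).
    """
    return {
        "apiVersion": "autoscaling.k8s.io/v1",
        "kind": "VerticalPodAutoscaler",
        "metadata": {"name": name, "namespace": namespace},
        "spec": {
            # targetRef points at the controller whose pods VPA should watch.
            "targetRef": {
                "apiVersion": "apps/v1",
                "kind": target_kind,
                "name": target_name,
            },
            # "Off" means recommendations only; no pods are changed.
            "updatePolicy": {"updateMode": update_mode},
        },
    }


# Hypothetical names, used only for illustration.
vpa = build_vpa("backend-api-vpa", "Deployment", "backend-api")
manifest_json = json.dumps(vpa, indent=2)
print(manifest_json)
```

Because kubectl accepts JSON as well as YAML, the printed manifest could be saved to a file and applied like any other Kubernetes object once the VPA CRDs are installed.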

The main VPA components are:

Recommender: Watches resource usage and calculates recommended CPU and memory requests.
Admission controller: Applies recommended requests when new pods are created, depending on the VPA configuration.
Updater: Evicts existing pods when their requests need to change, if automatic updates are enabled.

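Traditionally, a pod's resource requests are fixed when the pod is created. Because of that, VPA usually changes requests by evicting a pod and letting its controller create a replacement with the new values. Newer Kubernetes versions add in-place pod resizing, but unless you have confirmed support in your cluster and VPA version, plan for eviction. This makes rollout behavior important, especially for production workloads.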

VPA update modes

VPA supports several update modes, which control whether it only reports recommendations or actively changes pods.

Off: VPA only provides recommendations. It does not change pods. This is useful for audits and safe initial testing.
Initial: VPA sets resource requests only when pods are created. It does not evict running pods later.
Recreate: VPA can evict pods so they are recreated with updated resource requests.
Auto: VPA automatically applies updates using whichever update mechanism the cluster version and VPA implementation support. Historically this has behaved the same as Recreate.

Many teams start with Off mode, compare recommendations with current requests, then move selected workloads to Initial or Recreate once they understand the impact.
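The difference between the modes can be summarized as a small lookup table. The sketch below is only a restatement of the descriptions above in Python; the ModeBehavior type and helper names are hypothetical, and treating Auto like Recreate is an assumption that depends on the VPA version in use.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModeBehavior:
    sets_requests_on_create: bool  # requests are set when new pods are admitted
    evicts_running_pods: bool      # running pods may be evicted to apply changes


MODE_BEHAVIOR = {
    "Off": ModeBehavior(False, False),
    "Initial": ModeBehavior(True, False),
    "Recreate": ModeBehavior(True, True),
    # Assumption: Auto is treated like Recreate here; check your VPA version.
    "Auto": ModeBehavior(True, True),
}


def may_restart_pods(mode):
    """True if this mode can evict running pods."""
    return MODE_BEHAVIOR[mode].evicts_running_pods


def modes_without_eviction():
    """Modes that never evict running pods, in the order listed above."""
    return [m for m, b in MODE_BEHAVIOR.items() if not b.evicts_running_pods]


print(modes_without_eviction())
```

A check like may_restart_pods could be used in a review script to flag VPA objects that need availability controls before they are merged.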

Common use cases

Right-sizing long-running services: For example, a backend API that was given 2 CPU and 4Gi memory but usually uses 300m CPU and 700Mi memory.
Reducing avoidable scheduling failures: Accurate requests reduce cases where pods stay Pending because their requests are larger than the workload actually needs.
Handling workloads with unclear resource profiles: New services, internal tools, and vendor applications often start with guessed requests.
Improving bin packing: Better pod requests help clusters use node capacity more efficiently.
Tuning batch or data workloads: Jobs with stable but non-obvious resource usage can benefit from VPA recommendations.

For example, if you run Apache Airflow workers on Kubernetes, VPA can help identify whether worker pods are over-requesting memory or hitting CPU pressure during task execution. This can be useful alongside a managed setup such as Apache Airflow on AWS EKS.

Benefits

More accurate resource requests: VPA uses observed behavior rather than fixed guesses.
Better scheduling decisions: Kubernetes schedules pods based on requests, so request accuracy matters.
Less resource waste: Over-requested pods reserve capacity they do not use.
Lower risk of under-requesting: Under-requested workloads may suffer CPU contention, evictions under node memory pressure, or unstable performance.
Useful recommendations: Even in Off mode, VPA gives platform teams data they can use in reviews and pull requests.

Tradeoffs and limitations

Pod restarts may occur: In automatic modes, VPA may evict pods so updated requests can take effect.
It does not add replicas: VPA changes per-pod requests. It does not scale replica counts like Horizontal Pod Autoscaler.
It can conflict with HPA: Avoid using VPA and HPA on the same CPU or memory signal for the same workload unless you design the setup carefully.
It needs good rollout controls: Use readiness probes, multiple replicas, and PodDisruptionBudgets for critical services.
It depends on metrics quality: The Recommender needs usage data, typically from the Kubernetes Metrics Server. Missing, incomplete, or short-lived metrics reduce recommendation quality.
It may not fit every workload: Very spiky workloads, latency-sensitive systems, and single-replica services need extra care.

Because VPA can recreate pods, treat it like any other change that affects workload availability. During maintenance or cluster upgrades, eviction behavior matters. The same planning used for Kubernetes upgrades also applies when enabling VPA on production workloads.
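The HPA overlap mentioned in the list above can be checked mechanically before a VPA object is applied. The following is a rough sketch that works on manifests loaded as Python dictionaries. It assumes an autoscaling/v2 HPA that uses Resource metrics and a VPA with an optional resourcePolicy; the function names and sample objects are hypothetical.

```python
def hpa_resource_metrics(hpa):
    """Resource metric names (for example 'cpu') that an autoscaling/v2 HPA scales on."""
    metrics = hpa.get("spec", {}).get("metrics", [])
    return {m["resource"]["name"] for m in metrics if m.get("type") == "Resource"}


def vpa_controlled_resources(vpa):
    """Resources the VPA manages. Simplification: policies are merged across containers."""
    policies = vpa["spec"].get("resourcePolicy", {}).get("containerPolicies", [])
    controlled = set()
    for policy in policies:
        # When controlledResources is omitted, VPA manages both cpu and memory.
        controlled.update(policy.get("controlledResources", ["cpu", "memory"]))
    return controlled or {"cpu", "memory"}


def conflicting_resources(hpa, vpa):
    """Resources that both autoscalers act on for the same workload."""
    h, v = hpa["spec"]["scaleTargetRef"], vpa["spec"]["targetRef"]
    same_target = (h["kind"], h["name"]) == (v["kind"], v["name"])
    mode = vpa["spec"].get("updatePolicy", {}).get("updateMode")
    # A recommendation-only VPA never changes requests, so it cannot conflict.
    if mode == "Off" or not same_target:
        return set()
    return hpa_resource_metrics(hpa) & vpa_controlled_resources(vpa)


target = {"apiVersion": "apps/v1", "kind": "Deployment", "name": "backend-api"}
hpa = {"spec": {"scaleTargetRef": target, "metrics": [
    {"type": "Resource", "resource": {
        "name": "cpu", "target": {"type": "Utilization", "averageUtilization": 70}}},
]}}
vpa_default = {"spec": {"targetRef": target, "updatePolicy": {"updateMode": "Recreate"}}}
vpa_memory_only = {"spec": {
    "targetRef": target,
    "updatePolicy": {"updateMode": "Recreate"},
    "resourcePolicy": {"containerPolicies": [
        {"containerName": "*", "controlledResources": ["memory"]},
    ]},
}}

print(conflicting_resources(hpa, vpa_default))
print(conflicting_resources(hpa, vpa_memory_only))
```

In this sketch, limiting the VPA to memory while the HPA scales on CPU removes the overlap, which is one way to design the combined setup carefully.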

VPA vs HPA vs Cluster Autoscaler

VPA is often confused with other Kubernetes autoscaling tools. They solve related but different problems.

VPA: Adjusts CPU and memory requests for individual pods.
HPA: Adjusts the number of pod replicas based on metrics such as CPU utilization, memory, or custom application metrics.
Cluster Autoscaler or Karpenter: Adjusts the number or type of cluster nodes when pods cannot be scheduled or capacity is no longer needed.

A common setup is to use HPA for stateless services that need replica scaling and VPA in Off (recommendation-only) mode to tune requests. For workloads where replica count is fixed or scaling out is not useful, VPA can be a better fit.

Simple example

Suppose a Go API runs with these requests:

CPU request: 1000m
Memory request: 2Gi

After observing the workload, VPA may recommend:

CPU request: 300m
Memory request: 768Mi

If the service has 20 replicas, that change frees 14 CPU cores (20 × 700m) and 25Gi of memory (20 × 1280Mi) of reserved cluster capacity. If the service was under-requested instead, VPA might recommend higher requests to reduce CPU contention or memory-related evictions.
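The capacity effect of this example can be worked out with a few lines of arithmetic. The sketch below uses only the numbers from the example: 20 replicas, requests moving from 1000m and 2Gi down to 300m and 768Mi.

```python
REPLICAS = 20

# Requests per pod, in millicores and mebibytes.
current = {"cpu_m": 1000, "memory_mi": 2 * 1024}
recommended = {"cpu_m": 300, "memory_mi": 768}

# Reserved capacity released across all replicas.
freed_cpu_m = REPLICAS * (current["cpu_m"] - recommended["cpu_m"])
freed_memory_mi = REPLICAS * (current["memory_mi"] - recommended["memory_mi"])

print(f"CPU freed: {freed_cpu_m}m ({freed_cpu_m / 1000:g} cores)")
print(f"Memory freed: {freed_memory_mi}Mi ({freed_memory_mi / 1024:g}Gi)")
# CPU freed: 14000m (14 cores)
# Memory freed: 25600Mi (25Gi)
```

This is reserved capacity rather than actual usage: the pods were never using it, but the scheduler could not give it to anything else.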

How teams usually adopt VPA

Install the VPA components and CRDs in a non-production cluster first.
Create VPA objects in Off mode for selected workloads.
Compare recommendations against current requests over several days or a full traffic cycle.
Set minimum and maximum allowed values (minAllowed and maxAllowed) with a VPA resource policy.
Move low-risk workloads to Initial mode.
Use Recreate or Auto mode only where restarts are acceptable and availability controls are in place.
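The comparison and bounding steps above can be sketched as a small review helper. It parses a limited subset of Kubernetes quantity strings (CPU as cores or millicores, memory as plain bytes or with Ki, Mi, Gi, Ti suffixes), bounds a recommendation between a minimum and maximum, and reports the change against the current requests. The function names and the sample bounds are assumptions for illustration; the current and recommended values reuse the backend API example from the use cases above.

```python
_MEMORY_FACTORS = {"Ki": 2**10, "Mi": 2**20, "Gi": 2**30, "Ti": 2**40}


def parse_cpu(quantity):
    """CPU quantity to millicores: '300m' -> 300, '2' -> 2000."""
    if quantity.endswith("m"):
        return int(quantity[:-1])
    return round(float(quantity) * 1000)


def parse_memory(quantity):
    """Memory quantity to bytes. Decimal suffixes such as 'M' are not handled here."""
    for suffix, factor in _MEMORY_FACTORS.items():
        if quantity.endswith(suffix):
            return int(quantity[: -len(suffix)]) * factor
    return int(quantity)  # plain bytes


def clamp(value, low, high):
    return max(low, min(value, high))


def review(current, target, min_allowed, max_allowed):
    """Bound the target by the allowed range and compare it with current requests."""
    cur_cpu, cur_mem = parse_cpu(current["cpu"]), parse_memory(current["memory"])
    cpu = clamp(parse_cpu(target["cpu"]),
                parse_cpu(min_allowed["cpu"]), parse_cpu(max_allowed["cpu"]))
    mem = clamp(parse_memory(target["memory"]),
                parse_memory(min_allowed["memory"]), parse_memory(max_allowed["memory"]))
    return {
        "cpu_m": cpu,
        "memory_bytes": mem,
        "cpu_change_pct": round(100 * (cpu - cur_cpu) / cur_cpu),
        "memory_change_pct": round(100 * (mem - cur_mem) / cur_mem),
    }


result = review(
    current={"cpu": "2", "memory": "4Gi"},
    target={"cpu": "300m", "memory": "700Mi"},
    # Hypothetical bounds a team might choose for this service.
    min_allowed={"cpu": "500m", "memory": "256Mi"},
    max_allowed={"cpu": "2", "memory": "4Gi"},
)
print(result)
```

Here the CPU recommendation of 300m is raised to the 500m floor, so the bounded result is a 75% reduction in CPU requests rather than the 85% the raw recommendation would give.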

If your cluster resources are managed through infrastructure as code, you can define VPA objects the same way you manage other Kubernetes manifests. For example, teams that deploy Kubernetes resources using Terraform can apply the same patterns to VPA objects.

Key takeaway

Kubernetes VPA helps you tune pod CPU and memory requests using real usage data. It is most useful when teams want better scheduling, fewer manual request changes, and more accurate capacity planning. Use it carefully in automatic modes, because changing requests may require pod recreation and can affect availability.
