MeteorOps | DevOps Dictionary

Kubernetes Job

A Kubernetes Job is a Kubernetes API object that runs one or more pods until a finite task completes successfully. In practical terms, you use a Job for work that should start, run to completion, and stop, such as a database migration, batch import, report generation, test run, or one-time cleanup task inside a cluster.

What a Kubernetes Job does

A Job tells Kubernetes to create pods for a task and keep creating or retrying them until the requested number of successful completions is reached. Unlike a Deployment, which keeps an application running continuously, a Job is meant for work with a clear end state.

Common examples include:

Running a schema migration before a new application release
Processing a batch of files from object storage
Executing a one-time data backfill
Running integration tests in a temporary environment
Cleaning up expired records or unused resources

How it works

A Kubernetes Job uses a pod template, similar to other workload objects. The template defines the container image, command, environment variables, volumes, resource requests, and other pod settings.

When you create the Job, the Kubernetes Job controller creates pods from that template. It then tracks their status:

If a pod completes successfully, Kubernetes counts it toward the Job’s completion target.
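If a pod fails, Kubernetes retries the work up to the limit set by backoffLimit: with restartPolicy: Never the Job controller creates a replacement pod, and with restartPolicy: OnFailure the failed container is restarted inside the same pod.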
When the required number of successful completions is reached, the Job is marked complete.

For example, a Job with completions: 1 and parallelism: 1 runs one pod at a time until one pod succeeds. A Job with completions: 100 and parallelism: 10 runs up to 10 pods at a time until 100 pods have finished successfully.
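The second case above could be written roughly as the manifest below. This is a minimal sketch, not a production configuration: the Job name, image, and script are hypothetical placeholders.

```yaml
apiVersion: batch/v1
kind: Job
metadata:
  name: batch-import                  # hypothetical name
spec:
  completions: 100                    # the Job is complete after 100 successful pods
  parallelism: 10                     # at most 10 pods run at the same time
  template:
    spec:
      restartPolicy: Never            # Job pods must use Never or OnFailure
      containers:
        - name: worker
          image: registry.example.com/batch-worker:1.0   # placeholder image
          command: ["python", "process_batch.py"]        # hypothetical script
```

In this form every pod runs the same command, so the task itself has to decide which item to process, for example by pulling from a queue.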

Key fields and settings

spec.template: Defines the pod that the Job creates.
restartPolicy: Set inside the pod template (spec.template.spec.restartPolicy). Must be Never or OnFailure for Job pods.
completions: Number of successful pod completions required.
parallelism: Maximum number of pods that can run at the same time.
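backoffLimit: Number of retries before Kubernetes marks the Job as failed. The default is 6.
activeDeadlineSeconds: Maximum time, in seconds, that the Job can be active. Once it is exceeded, Kubernetes terminates the running pods and marks the Job as failed.
ttlSecondsAfterFinished: Optional cleanup setting that deletes a finished Job (complete or failed), together with its pods, the given number of seconds after it finishes.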
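To show where these fields sit in a manifest, here is a hedged sketch of a single-run Job. The name, image, command, and all numeric values are illustrative assumptions rather than recommended settings.

```yaml
apiVersion: batch/v1
kind: Job
metadata:
  name: report-generation             # hypothetical name
spec:
  completions: 1                      # one successful pod is enough
  parallelism: 1                      # run one pod at a time
  backoffLimit: 3                     # mark the Job failed after 3 retries
  activeDeadlineSeconds: 600          # stop the Job if it is still active after 10 minutes
  ttlSecondsAfterFinished: 3600       # delete the finished Job and its pods after 1 hour
  template:                           # spec.template: the pod the Job creates
    spec:
      restartPolicy: OnFailure        # restart the failed container in the same pod
      containers:
        - name: report
          image: registry.example.com/report:1.0      # placeholder image
          command: ["python", "generate_report.py"]   # hypothetical script
```

Note that the first five settings belong to the Job spec, while restartPolicy belongs to the pod spec inside the template.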

Common use cases

Kubernetes Jobs are useful when you want cluster-native execution for tasks that do not need a long-running service. Teams often use them in CI/CD pipelines, data workflows, and operational automation.

Release automation: Run a database migration as part of a deployment pipeline.
Batch processing: Process a fixed set of files, messages, or records.
Infrastructure tasks: Create or update resources as part of platform automation. For example, teams using Kubernetes as a control plane may pair Jobs with tools covered in guides on deploying Kubernetes resources using Terraform.
Data orchestration: Run task containers that support workflow systems such as Airflow, including deployments like Apache Airflow on AWS EKS.
Cloud resource automation: Trigger one-time operations in clusters that also manage cloud resources, such as setups using Crossplane on Kubernetes.

Kubernetes Job vs CronJob vs Deployment

Job: Runs a finite task once, starting when the Job is created, until it completes.
CronJob: Creates Jobs on a schedule, such as every hour or every night at 02:00.
Deployment: Keeps a set of application pods running continuously, usually for services such as APIs, web apps, and workers.

Use a Job when the task should finish. Use a CronJob when the task should run repeatedly on a schedule. Use a Deployment when the workload should stay available.
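For comparison, a CronJob wraps a Job spec in a jobTemplate and adds a schedule. The sketch below assumes a hypothetical nightly cleanup task; the name, image, and script are placeholders.

```yaml
apiVersion: batch/v1
kind: CronJob
metadata:
  name: nightly-cleanup               # hypothetical name
spec:
  schedule: "0 2 * * *"               # cron syntax: every night at 02:00
  jobTemplate:                        # each scheduled run creates a Job from this template
    spec:
      backoffLimit: 2
      template:
        spec:
          restartPolicy: OnFailure
          containers:
            - name: cleanup
              image: registry.example.com/cleanup:1.0   # placeholder image
              command: ["python", "cleanup.py"]         # hypothetical script
```

Everything under jobTemplate.spec is an ordinary Job spec, so the same fields apply to both objects.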

Benefits

Native scheduling: Jobs use the same Kubernetes scheduler, nodes, secrets, config maps, and service accounts as other workloads.
Retry behavior: Kubernetes can retry failed pods based on the Job configuration.
Parallel execution: Jobs can run multiple pods at once for faster batch processing.
Operational consistency: Teams can package one-off tasks as containers and run them with the same deployment controls used for applications.

Tradeoffs and limitations

Not ideal for long-running services: Jobs are designed to finish. Use a Deployment, StatefulSet, or DaemonSet for ongoing workloads.
Retries can repeat work: If a task is not idempotent, a retry can cause duplicate writes, repeated emails, or partial updates.
Logs may disappear with pods: If completed pods are cleaned up, you need centralized logging to keep task output.
Failed Jobs need alerting: Kubernetes records the failed state, but your monitoring stack must alert the right team.

Simple example

A platform team might create a Kubernetes Job during a release to run:

python manage.py migrate

The Job starts a pod using the application image, connects to the database using Kubernetes Secrets, runs the migration command, and exits. If the command succeeds, the Job completes. If it fails, Kubernetes can retry it according to the configured backoffLimit.

This keeps the migration close to the application runtime: same image, same cluster networking, same credentials model, and same observability pipeline.
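A migration Job along these lines might look like the following. Treat it as a sketch under assumptions: the image tag, the Secret name and key, and the DATABASE_URL variable are hypothetical and depend on how the application reads its database settings.

```yaml
apiVersion: batch/v1
kind: Job
metadata:
  name: db-migrate                    # hypothetical name
spec:
  backoffLimit: 2                     # retry a failed migration at most twice
  template:
    spec:
      restartPolicy: Never            # each retry runs in a fresh pod
      containers:
        - name: migrate
          image: registry.example.com/myapp:1.2.3     # placeholder: the application image
          command: ["python", "manage.py", "migrate"]
          env:
            - name: DATABASE_URL      # assumed variable name
              valueFrom:
                secretKeyRef:
                  name: app-db-credentials   # hypothetical Secret
                  key: url                   # hypothetical key inside that Secret
```

Because a retry reruns the whole command, this setup relies on the migration tool skipping steps that were already applied.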

Good practices

Make Job tasks idempotent where possible, so retries are safe.
Set CPU and memory requests so the scheduler can place Job pods predictably.
Use backoffLimit to cap retries and activeDeadlineSeconds to cap total run time, so a failing Job cannot run indefinitely.
Use ttlSecondsAfterFinished if you want automatic cleanup after completion.
Send logs and metrics to your normal observability tools before pods are removed.
Use specific service accounts with only the permissions the Job needs.
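Several of these practices can be expressed directly in the manifest. The sketch below is illustrative only: the service account is assumed to exist already with narrowly scoped permissions, and the names, image, and resource values are placeholders to adjust for the real workload.

```yaml
apiVersion: batch/v1
kind: Job
metadata:
  name: expired-records-cleanup       # hypothetical name
spec:
  backoffLimit: 3                     # cap retries
  activeDeadlineSeconds: 900          # cap total run time at 15 minutes
  ttlSecondsAfterFinished: 86400      # clean up the Job and its pods after one day
  template:
    spec:
      serviceAccountName: cleanup-job-sa   # hypothetical least-privilege service account
      restartPolicy: Never
      containers:
        - name: cleanup
          image: registry.example.com/cleanup:1.0   # placeholder image
          command: ["python", "cleanup.py"]         # hypothetical script
          resources:
            requests:
              cpu: 250m               # illustrative values
              memory: 256Mi
```

Idempotency and log shipping are not visible here; they depend on the task code and on the cluster's logging setup.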
