DevOps Dictionary

OpenTelemetry

OpenTelemetry, often shortened to OTel, is an open-source observability framework for collecting telemetry data from applications, services, and infrastructure. It provides a standard way to generate, collect, process, and export traces, metrics, and logs to observability tools.

What OpenTelemetry does

OpenTelemetry helps teams understand how software behaves in production. Instead of each monitoring tool requiring its own custom instrumentation, OpenTelemetry gives you common APIs, SDKs, agents, and data formats for observability data.

Teams use it to collect:

  • Traces: request paths across services, such as an API call moving through a gateway, backend service, database, and queue.
  • Metrics: numeric measurements over time, such as request count, latency, error rate, CPU usage, or memory usage.
  • Logs: timestamped event records from applications and systems.
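
To make the three signal types concrete, here is a minimal sketch of how each might be modeled as a record. The class and field names are hypothetical simplifications chosen for illustration, not the OpenTelemetry data model, which carries more fields (parent span IDs, resources, and so on). The point it shows is that a log can carry the same trace ID as a span, which is what allows the two to be correlated later.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Span:
    """One timed operation inside a trace (simplified, hypothetical shape)."""
    trace_id: str          # shared by every span in the same request
    span_id: str
    name: str
    start_ms: int
    end_ms: int
    attributes: dict = field(default_factory=dict)

    @property
    def duration_ms(self) -> int:
        return self.end_ms - self.start_ms


@dataclass
class MetricPoint:
    """One numeric measurement at a point in time."""
    name: str
    value: float
    timestamp_ms: int
    attributes: dict = field(default_factory=dict)


@dataclass
class LogRecord:
    """One timestamped event; trace_id is optional and links it to a trace."""
    timestamp_ms: int
    severity: str
    body: str
    trace_id: Optional[str] = None


def logs_for_trace(logs: List[LogRecord], trace_id: str) -> List[LogRecord]:
    """Return only the log records emitted while handling the given trace."""
    return [log for log in logs if log.trace_id == trace_id]
```

With records shaped like this, filtering logs by the trace ID of a slow span is a one-line lookup, which is the basic idea behind log and trace correlation.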

How it works

OpenTelemetry provides the libraries and agents used to instrument code and runtime environments so that they emit telemetry data. That data can then be sent directly to an observability backend or routed through the OpenTelemetry Collector.

A typical flow looks like this:

  1. An application is instrumented with an OpenTelemetry SDK, auto-instrumentation agent, or library integration.
  2. The application creates spans, metrics, and logs while it runs.
  3. The telemetry data is exported using a protocol such as OTLP, the OpenTelemetry Protocol.
  4. The OpenTelemetry Collector receives, filters, enriches, batches, and forwards the data.
  5. An observability platform stores and visualizes the data for querying, dashboards, and alerts.
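
The Collector step in the flow above (receive, filter, enrich, batch, forward) can be sketched as a chain of small processing stages. This is a toy model in plain Python, not the Collector itself or its configuration format. The function names, the "/healthz" route, and the attribute keys are assumptions made for the example.

```python
from typing import Callable, Dict, Iterable, Iterator, List

Record = Dict[str, str]  # one telemetry record, modeled as a flat dict


def drop_health_checks(records: Iterable[Record]) -> Iterator[Record]:
    """Filter: discard noisy health-check traffic (route name is assumed)."""
    for record in records:
        if record.get("http.route") != "/healthz":
            yield record


def add_environment(records: Iterable[Record], env: str = "production") -> Iterator[Record]:
    """Enrich: attach an environment attribute to every record."""
    for record in records:
        yield {**record, "deployment.environment": env}


def batch(records: Iterable[Record], size: int) -> Iterator[List[Record]]:
    """Batch: group records so the exporter makes fewer, larger sends."""
    buffer: List[Record] = []
    for record in records:
        buffer.append(record)
        if len(buffer) == size:
            yield buffer
            buffer = []
    if buffer:  # flush the final partial batch
        yield buffer


def run_pipeline(records: Iterable[Record],
                 export: Callable[[List[Record]], None],
                 batch_size: int = 2) -> None:
    """Receive records, process them in order, and forward each batch."""
    processed = add_environment(drop_health_checks(records))
    for group in batch(processed, batch_size):
        export(group)
```

Passing `print` or a list's `append` method as `export` is enough to watch the batches come out. In a real deployment the export step would send OTLP to a backend instead.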

Key components

  • APIs: Language-specific interfaces used by application code and libraries to create telemetry data.
  • SDKs: Implementations of the APIs that handle sampling, context propagation, batching, and exporting.
  • Instrumentation libraries: Integrations for frameworks, databases, HTTP clients, message queues, and other common dependencies.
  • Auto-instrumentation: Runtime-based instrumentation that can collect telemetry with little or no code changes, depending on the language and framework.
  • OpenTelemetry Collector: A vendor-neutral service or agent that receives, processes, and exports telemetry data.
  • OTLP: The default OpenTelemetry data exchange protocol for sending telemetry between applications, collectors, and backends. It can be carried over gRPC or HTTP.
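
Context propagation, mentioned under SDKs above, is commonly done with the W3C Trace Context `traceparent` HTTP header, which carries the trace ID from one service to the next. The sketch below parses and re-emits that header for version 00 only. It is a hand-rolled illustration, not the OpenTelemetry propagator API, and the helper names are hypothetical.

```python
import re
import secrets
from typing import NamedTuple, Optional

# Version 00 layout: version-traceid-parentid-flags, all lowercase hex.
_TRACEPARENT = re.compile(
    r"^([0-9a-f]{2})-([0-9a-f]{32})-([0-9a-f]{16})-([0-9a-f]{2})$"
)


class SpanContext(NamedTuple):
    trace_id: str   # 32 hex characters, identical for the whole request
    span_id: str    # 16 hex characters, unique per span
    sampled: bool   # lowest bit of the trace-flags field


def parse_traceparent(header: str) -> Optional[SpanContext]:
    """Return the caller's context, or None if the header is unusable."""
    match = _TRACEPARENT.match(header.strip())
    if not match:
        return None
    version, trace_id, span_id, flags = match.groups()
    # Version ff and all-zero IDs are defined as invalid by the spec.
    if version == "ff" or set(trace_id) == {"0"} or set(span_id) == {"0"}:
        return None
    return SpanContext(trace_id, span_id, bool(int(flags, 16) & 0x01))


def child_traceparent(parent: SpanContext) -> str:
    """Build the header for an outgoing call: same trace, new span ID."""
    new_span_id = secrets.token_hex(8)  # 8 random bytes -> 16 hex characters
    flags = "01" if parent.sampled else "00"
    return f"00-{parent.trace_id}-{new_span_id}-{flags}"
```

A service would call `parse_traceparent` on each incoming request and attach the result of `child_traceparent` to each outgoing one. If any service in the chain skips this step, the trace breaks into disconnected pieces.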

Common use cases

  • Tracing a slow user request across microservices.
  • Measuring API latency, throughput, and error rates.
  • Correlating logs with traces to debug production incidents faster.
  • Standardizing observability across Kubernetes, serverless, and VM-based workloads.
  • Reducing vendor-specific instrumentation in application code.
  • Sending the same telemetry stream to different backends during a migration.

Benefits

  • Vendor-neutral instrumentation: You can instrument once and export to many supported observability systems.
  • Better service visibility: Distributed traces show where requests spend time and where failures occur.
  • Consistent telemetry: Teams can use shared naming, attributes, and propagation standards across services.
  • Collector-based control: You can filter, sample, redact, transform, and route telemetry before it reaches storage.
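
Sampling is one of the controls listed above. A common approach is to decide from the trace ID itself, so that every service reaches the same keep-or-drop decision for a given request without coordinating. The function below is a sketch of that idea, similar in spirit to ratio-based samplers. The exact bits and comparison used by real SDKs may differ, and the function name is an assumption.

```python
def should_sample(trace_id_hex: str, ratio: float) -> bool:
    """Keep roughly `ratio` of all traces, deterministically per trace ID.

    Assumes trace IDs are uniformly random hex strings, so the low 64 bits
    behave like a random number in the range [0, 2**64).
    """
    if not 0.0 <= ratio <= 1.0:
        raise ValueError("ratio must be between 0.0 and 1.0")
    low_64_bits = int(trace_id_hex, 16) & ((1 << 64) - 1)
    threshold = int(ratio * (1 << 64))
    return low_64_bits < threshold
```

Because the decision depends only on the trace ID and the ratio, a frontend and a payment service configured with the same ratio will both keep or both drop the same trace, which avoids storing half-complete traces.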

Tradeoffs and limitations

  • Setup still takes planning: Teams need to decide what to instrument, what to sample, and which attributes to standardize.
  • Telemetry can become expensive: High-cardinality attributes (labels), verbose logs, and unsampled traces can increase storage and query costs.
  • Auto-instrumentation is not complete coverage: It helps with common frameworks, but custom business logic often needs manual spans or metrics.
  • Data quality depends on implementation: Poor span names, missing context propagation, or inconsistent attributes can make traces hard to use.
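
The cost point above is easiest to see with arithmetic. Each unique combination of attribute values on a metric can become its own time series, so the worst case is the product of the distinct values per attribute. The attribute names and counts below are invented for illustration.

```python
from math import prod
from typing import Dict


def max_series_count(distinct_values: Dict[str, int]) -> int:
    """Upper bound on time series for one metric.

    This is a worst case: it assumes every combination of values occurs.
    """
    return prod(distinct_values.values())


# Hypothetical request-latency metric with three bounded attributes.
bounded = {"http.method": 5, "http.route": 40, "status_code": 6}

# The same metric after someone adds an unbounded attribute such as a user ID.
unbounded = {**bounded, "user.id": 10_000}

print(max_series_count(bounded))    # 1200
print(max_series_count(unbounded))  # 12000000
```

One extra attribute raises the ceiling from 1,200 series to 12 million, which is why teams usually agree on which attributes are allowed on metrics before rolling out instrumentation widely.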

Simple example

Suppose a user clicks “checkout” in an online store and the request takes 4 seconds. With OpenTelemetry tracing, you can see that the request passed through the frontend, cart service, payment service, and inventory service. The trace may show that the payment service spent 3.2 seconds waiting for an external API. That gives the team a specific place to investigate instead of guessing across several services.
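
The checkout scenario can be sketched as data. The 4-second total and the 3.2-second external call come from the example above. Every other timing, span name, and ID is invented for illustration. The code computes each span's self time (its duration minus the time spent in its direct children), assuming the children of a span do not overlap.

```python
from collections import defaultdict
from typing import Dict, List

# Hypothetical spans for one checkout request; times are in milliseconds.
spans: List[dict] = [
    {"id": "a", "parent": None, "name": "frontend: POST /checkout", "start_ms": 0, "end_ms": 4000},
    {"id": "b", "parent": "a", "name": "cart: load cart", "start_ms": 100, "end_ms": 400},
    {"id": "c", "parent": "a", "name": "payment: charge card", "start_ms": 400, "end_ms": 3800},
    {"id": "d", "parent": "c", "name": "payment: call external API", "start_ms": 500, "end_ms": 3700},
    {"id": "e", "parent": "a", "name": "inventory: reserve items", "start_ms": 3800, "end_ms": 3950},
]


def self_times(trace: List[dict]) -> Dict[str, int]:
    """Map span name to time spent in that span excluding direct children."""
    child_total: Dict[str, int] = defaultdict(int)
    for span in trace:
        if span["parent"] is not None:
            child_total[span["parent"]] += span["end_ms"] - span["start_ms"]
    return {
        span["name"]: (span["end_ms"] - span["start_ms"]) - child_total[span["id"]]
        for span in trace
    }


# The span with the largest self time is the first place to investigate.
slowest = max(self_times(spans).items(), key=lambda item: item[1])
print(slowest)  # ('payment: call external API', 3200)
```

Looking at self time rather than total duration matters here: the frontend span lasts the full 4 seconds, but almost all of that is time spent waiting on its children.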

OpenTelemetry vs observability platforms

OpenTelemetry is not an observability dashboard or long-term storage system by itself. It generates, collects, processes, and exports telemetry data. Observability backends and platforms such as Prometheus, Grafana, Jaeger, Tempo, Datadog, Honeycomb, and New Relic store, query, alert on, and visualize that data.

In practice, OpenTelemetry often sits between your applications and those platforms. It standardizes how telemetry is produced, while your backend handles analysis and operations workflows.
