DevOps Dictionary

Observability

Observability is the ability to understand what a system is doing internally by analyzing the data it produces, typically logs (event records), metrics (numeric measurements over time), and traces (the path and timing of a single request as it crosses services). It addresses a specific problem: modern distributed systems can fail in subtle ways that are hard to reproduce, which makes it difficult to pinpoint where latency, errors, or resource pressure actually originate. At a high level, observability works by collecting telemetry from applications and infrastructure and correlating it, for example through shared request or trace identifiers, so that teams have enough context during an incident to ask and answer new questions, not just the ones they predicted ahead of time.

With observability, teams can move from symptoms to root cause faster and validate fixes with evidence; without it, troubleshooting relies on guesswork, scattered dashboards, and longer outages. The difference arises because failures often emerge from interactions between components, where no single metric or log line tells the full story unless the signals are connected.
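
As a rough illustration of what "connecting the signals" can mean, the following minimal Python sketch groups log records and trace spans by a shared trace ID, then reports which service spent the most time in each request that logged an error. The record layout and the field names (`trace_id`, `service`, `level`, `self_ms`) are assumptions made for this example, as are the helper functions; real telemetry backends define their own schemas and query tools.

```python
from collections import defaultdict

# Hypothetical telemetry records; field names are illustrative only.
logs = [
    {"trace_id": "t1", "service": "checkout", "level": "ERROR", "message": "payment timeout"},
    {"trace_id": "t2", "service": "checkout", "level": "INFO", "message": "order placed"},
]
# "self_ms" is assumed to be the time spent inside the service itself,
# excluding time spent waiting on downstream calls.
spans = [
    {"trace_id": "t1", "service": "checkout", "self_ms": 100},
    {"trace_id": "t1", "service": "payments", "self_ms": 2000},
    {"trace_id": "t2", "service": "checkout", "self_ms": 40},
    {"trace_id": "t2", "service": "payments", "self_ms": 80},
]


def correlate(logs, spans):
    """Group logs and spans by trace_id so each request can be inspected as a whole."""
    by_trace = defaultdict(lambda: {"logs": [], "spans": []})
    for record in logs:
        by_trace[record["trace_id"]]["logs"].append(record)
    for span in spans:
        by_trace[span["trace_id"]]["spans"].append(span)
    return dict(by_trace)


def slowest_service_for_errors(by_trace):
    """For each trace that contains an ERROR log, return the service with the largest self time."""
    suspects = {}
    for trace_id, signals in by_trace.items():
        has_error = any(log["level"] == "ERROR" for log in signals["logs"])
        if has_error and signals["spans"]:
            slowest = max(signals["spans"], key=lambda s: s["self_ms"])
            suspects[trace_id] = slowest["service"]
    return suspects


by_trace = correlate(logs, spans)
suspects = slowest_service_for_errors(by_trace)
print(suspects)
```

In this sketch the error is logged by `checkout`, but the spans for the same trace show that most of the time was spent in `payments`. Neither the log line nor the span data points there on its own; the shared identifier is what links them.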
