DevOps Dictionary

Error Budget

An error budget is the amount of failure a service is allowed to have while still meeting its reliability target, usually expressed through a service level objective (SLO) such as 99.9% successful requests over a rolling window. The budget is the complement of the SLO: a 99.9% target leaves 0.1% of requests free to fail. It resolves the common tension between shipping changes quickly and keeping systems stable by turning reliability goals into a measurable budget teams can track and “spend.” At a high level, teams define an SLO, measure real user experience with a service level indicator (SLI, the metric that represents success, such as request success rate or latency), and subtract actual errors or downtime from the budget as time passes.
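
As a rough illustration of the arithmetic, the sketch below computes a request-based error budget in Python. The function name `error_budget`, its parameters, and the traffic figures are hypothetical and chosen only for this example. It assumes the SLI is a simple success ratio over one window.

```python
def error_budget(slo_target: float, total_requests: int, failed_requests: int) -> dict:
    """Summarize a request-based error budget for one SLO window.

    slo_target is a fraction, e.g. 0.999 for a 99.9% success SLO.
    All names here are illustrative, not taken from any specific tool.
    """
    if not 0.0 < slo_target < 1.0:
        raise ValueError("slo_target must be a fraction strictly between 0 and 1")
    if total_requests <= 0 or not 0 <= failed_requests <= total_requests:
        raise ValueError("need total_requests > 0 and 0 <= failed_requests <= total_requests")

    # The budget is the share of requests allowed to fail under the SLO.
    allowed_failures = (1.0 - slo_target) * total_requests
    # Failures observed so far are subtracted from that allowance.
    remaining = allowed_failures - failed_requests
    return {
        "allowed_failures": allowed_failures,
        "remaining_failures": remaining,
        "consumed_fraction": failed_requests / allowed_failures,
        "exhausted": remaining <= 0,
    }


# Hypothetical window: 1,000,000 requests, 250 failures, 99.9% SLO.
budget = error_budget(0.999, 1_000_000, 250)
print(f"allowed failures: {budget['allowed_failures']:.0f}")
print(f"budget consumed:  {budget['consumed_fraction']:.0%}")
```

With these made-up numbers the SLO permits about 1,000 failed requests, so 250 failures consume roughly a quarter of the budget. A time-based budget follows the same pattern, with minutes of downtime in place of failed requests.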

With an error budget, release pace and operational risk are guided by data: teams slow changes and prioritize stability when the budget is depleted. Without one, decisions tend to swing between over-cautious gating and over-shipping until incidents force unplanned work. The difference arises because the budget makes reliability tradeoffs explicit and enforceable in day-to-day engineering.
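
One way to make that tradeoff enforceable is a small policy check that a release pipeline could consult. The Python sketch below is only an illustration: the function `release_decision`, the three outcome labels, and the 75% caution threshold are assumptions made for this example, not a standard policy.

```python
def release_decision(budget_consumed: float, caution_at: float = 0.75) -> str:
    """Map the consumed share of the error budget to a release posture.

    budget_consumed is a fraction: 0.0 means untouched, 1.0 means fully spent.
    The caution_at threshold is a hypothetical default; teams set their own.
    """
    if budget_consumed < 0:
        raise ValueError("budget_consumed cannot be negative")
    if budget_consumed >= 1.0:
        # Budget depleted: pause feature releases and prioritize stability work.
        return "freeze"
    if budget_consumed >= caution_at:
        # Budget nearly spent: slow the release pace and add extra review.
        return "caution"
    # Budget remains: changes can ship at the normal pace.
    return "ship"


for consumed in (0.25, 0.80, 1.20):
    print(f"{consumed:.0%} consumed -> {release_decision(consumed)}")
```

Keeping the rule in code (or in pipeline configuration) means the decision to slow down is agreed in advance rather than negotiated during an incident.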
