DevOps Dictionary

Postmortem

Postmortem is a written, structured review created after an incident, outage, or major degradation to capture what happened, why it happened, and what changes will prevent recurrence. It addresses the common problem of repeating failures and tribal knowledge by turning a stressful event into shared learning and concrete engineering work. A solid postmortem reconstructs a timeline, notes user and system impact, identifies contributing factors across code, infrastructure, and process, and ends with specific follow-ups such as fixes, improved monitoring and alerting (signals that detect issues early), safer release practices, and updated runbooks (step-by-step operational guides).

With a postmortem, teams improve reliability and reduce repeat incidents by focusing on systems and decisions rather than blame; without it, the same patterns often return, assumptions go unchallenged, and risk quietly accumulates. This gap exists because complex services fail through multiple small interactions that are easy to miss unless they are documented and reviewed.

A
B
C
D
E
F
G
H
I
J
K
L
M
N
O
P
Q
R
S
T
U
V
W
Y
X
Z