Incident Communication Templates

These copy-and-adapt templates cover every stage of the incident lifecycle. Each template includes placeholders for service names, timestamps, and technical details.

Severity Levels and Response Expectations

Severity	Definition	Initial Update	Update Frequency	Response Team
SEV1	Complete service outage or data loss affecting all users	Within 5 minutes	Every 15 minutes	Incident Commander + full on-call rotation
SEV2	Major feature degraded or unavailable; significant user impact	Within 15 minutes	Every 30 minutes	On-call engineer + team lead
SEV3	Minor feature issue; workaround available; limited user impact	Within 30 minutes	Every 2 hours	On-call engineer
SEV4	Cosmetic issue or minor bug; no functional impact	Within 4 hours	Daily or as needed	Assigned engineer during business hours
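
If updates are scheduled by a bot or script, the table above can be encoded as a small lookup. The following is a minimal sketch, not a prescribed implementation: the names `SEVERITY_POLICY`, `initial_update_due`, and `next_update_due` are hypothetical, and the SEV4 cadence of "daily or as needed" is modeled as a fixed 24 hours, which is an assumption.

```python
from datetime import datetime, timedelta, timezone

# Deadlines copied from the severity table; the dict layout is illustrative.
# Assumption: SEV4 "Daily or as needed" is treated as a 24-hour cadence.
SEVERITY_POLICY = {
    "SEV1": {"initial": timedelta(minutes=5), "cadence": timedelta(minutes=15)},
    "SEV2": {"initial": timedelta(minutes=15), "cadence": timedelta(minutes=30)},
    "SEV3": {"initial": timedelta(minutes=30), "cadence": timedelta(hours=2)},
    "SEV4": {"initial": timedelta(hours=4), "cadence": timedelta(days=1)},
}


def initial_update_due(severity, confirmed_at):
    """Latest time the first public update should be posted."""
    return confirmed_at + SEVERITY_POLICY[severity]["initial"]


def next_update_due(severity, last_update_at):
    """Latest time the following update should be posted."""
    return last_update_at + SEVERITY_POLICY[severity]["cadence"]


confirmed = datetime(2024, 3, 5, 14, 2, tzinfo=timezone.utc)
print(initial_update_due("SEV2", confirmed))  # 15 minutes after confirmation
print(next_update_due("SEV2", confirmed))     # 30 minutes after the last update
```

Keeping the policy in one table-like structure means a change to the response expectations only has to be made in one place.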

Initial Notification Template

Post this as soon as an incident is confirmed. Speed of acknowledgment matters more than completeness at this stage.

Title: [Investigating] Increased error rates on [SERVICE NAME]

We are investigating reports of [BRIEF DESCRIPTION OF SYMPTOMS].

Affected services: [LIST AFFECTED COMPONENTS]
Start time: [TIMESTAMP in UTC]
Current status: Investigating

We will provide an update within [TIMEFRAME based on severity level].
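
When templates are filled in by tooling rather than by hand, it helps to fail loudly if a placeholder is left unfilled. The sketch below is one possible approach, under stated assumptions: `fill_template` is a hypothetical helper, the sample values are invented for illustration, and placeholders are recognized by the heuristic that they start with two capital letters (which leaves status labels such as [Investigating] alone).

```python
import re

INITIAL_NOTIFICATION = """\
Title: [Investigating] Increased error rates on [SERVICE NAME]

We are investigating reports of [BRIEF DESCRIPTION OF SYMPTOMS].

Affected services: [LIST AFFECTED COMPONENTS]
Start time: [TIMESTAMP in UTC]
Current status: Investigating

We will provide an update within [TIMEFRAME based on severity level]."""

# Heuristic (an assumption): a placeholder is bracketed text that starts with
# two capital letters, so "[Investigating]" is not treated as a placeholder.
PLACEHOLDER = re.compile(r"\[[A-Z]{2}[^\]]*\]")


def fill_template(template, values):
    """Replace each placeholder with its value; raise if any remain unfilled."""
    text = template
    for placeholder, value in values.items():
        text = text.replace(f"[{placeholder}]", value)
    leftover = PLACEHOLDER.findall(text)
    if leftover:
        raise ValueError(f"unfilled placeholders: {leftover}")
    return text


# Illustrative values only.
message = fill_template(INITIAL_NOTIFICATION, {
    "SERVICE NAME": "Checkout API",
    "BRIEF DESCRIPTION OF SYMPTOMS": "failed payments at checkout",
    "LIST AFFECTED COMPONENTS": "Checkout API, Payments dashboard",
    "TIMESTAMP in UTC": "2024-03-05 14:02 UTC",
    "TIMEFRAME based on severity level": "15 minutes",
})
print(message)
```

Raising on leftover placeholders prevents a half-filled message such as "Start time: [TIMESTAMP in UTC]" from reaching the status page.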

Status Update: Investigating

Title: [Investigating] [SERVICE NAME] -- [BRIEF ISSUE]

We are continuing to investigate [ISSUE DESCRIPTION].

What we know so far:
- [OBSERVATION 1]
- [OBSERVATION 2]

Impact: [DESCRIBE USER-FACING IMPACT]
Workaround: [IF AVAILABLE, DESCRIBE WORKAROUND]

Next update: [TIMESTAMP or TIMEFRAME]

Status Update: Identified

Title: [Identified] [SERVICE NAME] -- [BRIEF ISSUE]

We have identified the root cause of [ISSUE DESCRIPTION].

Root cause: [BRIEF TECHNICAL EXPLANATION appropriate for your audience]
Remediation: [DESCRIBE THE FIX BEING APPLIED]
Expected resolution: [ESTIMATED TIME or "We will update as progress is made"]

Impact: [CURRENT USER-FACING IMPACT]

Next update: [TIMESTAMP or TIMEFRAME]

Status Update: Monitoring

Title: [Monitoring] [SERVICE NAME] -- [BRIEF ISSUE]

A fix has been implemented for [ISSUE DESCRIPTION].
We are monitoring the results.

Fix applied: [DESCRIBE WHAT WAS DONE]
Current metrics: [KEY METRICS showing recovery]
Monitoring period: [HOW LONG you will monitor before resolving]

If you continue to experience issues, please [CONTACT METHOD].

Next update: [TIMESTAMP or "when monitoring period completes"]

Status Update: Resolved

Title: [Resolved] [SERVICE NAME] -- [BRIEF ISSUE]

This incident has been resolved.

Duration: [START TIME] to [END TIME] ([TOTAL DURATION])
Root cause: [BRIEF SUMMARY]
Resolution: [WHAT FIXED IT]

A post-mortem will be published within [TIMEFRAME, typically 48-72 hours].

We apologize for the disruption and thank you for your patience.
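
The Duration line in the template above is easy to get wrong by hand, especially across time zones. A minimal sketch of computing it, assuming timezone-aware datetimes and a hypothetical helper name `format_duration_line`:

```python
from datetime import datetime, timezone


def format_duration_line(start, end):
    """Build the 'Duration:' line of the Resolved template, in UTC."""
    if start.tzinfo is None or end.tzinfo is None:
        raise ValueError("start and end must be timezone-aware")
    if end < start:
        raise ValueError("end must not be earlier than start")
    total_minutes = int((end - start).total_seconds() // 60)
    hours, minutes = divmod(total_minutes, 60)
    total = f"{hours}h {minutes}m" if hours else f"{minutes}m"
    fmt = "%Y-%m-%d %H:%M UTC"
    start_utc = start.astimezone(timezone.utc).strftime(fmt)
    end_utc = end.astimezone(timezone.utc).strftime(fmt)
    return f"Duration: {start_utc} to {end_utc} ({total})"


# Illustrative timestamps only.
line = format_duration_line(
    datetime(2024, 3, 5, 14, 2, tzinfo=timezone.utc),
    datetime(2024, 3, 5, 15, 49, tzinfo=timezone.utc),
)
print(line)  # Duration: 2024-03-05 14:02 UTC to 2024-03-05 15:49 UTC (1h 47m)
```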

Post-Mortem Summary Template

This condensed post-mortem summary is suitable for publishing on your status page. The full internal post-mortem document typically contains more detail.

Post-Mortem: [INCIDENT TITLE]
Date: [DATE]
Duration: [TOTAL DURATION]
Severity: [SEV LEVEL]
Incident Commander: [NAME or ROLE]

Summary
-------
[2-3 sentence summary of what happened and the impact]

Timeline (all times UTC)
------------------------
[HH:MM] - [EVENT: e.g., Monitoring alert triggered]
[HH:MM] - [EVENT: e.g., On-call engineer paged]
[HH:MM] - [EVENT: e.g., Root cause identified]
[HH:MM] - [EVENT: e.g., Fix deployed]
[HH:MM] - [EVENT: e.g., Service fully recovered]

Root Cause
----------
[Technical explanation of what caused the incident]

Resolution
----------
[What was done to resolve the incident]

Lessons Learned
---------------
What went well:
- [ITEM]

What could be improved:
- [ITEM]

Action Items
------------
- [ACTION] -- Owner: [TEAM/PERSON] -- Due: [DATE]
- [ACTION] -- Owner: [TEAM/PERSON] -- Due: [DATE]
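
The Timeline section of the template above can be generated from recorded events rather than typed by hand. The sketch below assumes events are available as (timezone-aware timestamp, description) pairs; `render_timeline` is a hypothetical helper and the sample events are illustrative.

```python
from datetime import datetime, timezone


def render_timeline(events):
    """Sort (timestamp, description) pairs and format them as timeline lines in UTC."""
    lines = []
    for when, what in sorted(events, key=lambda event: event[0]):
        lines.append(f"{when.astimezone(timezone.utc):%H:%M} - {what}")
    return "\n".join(lines)


# Illustrative events, deliberately out of order.
events = [
    (datetime(2024, 3, 5, 15, 40, tzinfo=timezone.utc), "Fix deployed"),
    (datetime(2024, 3, 5, 14, 2, tzinfo=timezone.utc), "Monitoring alert triggered"),
    (datetime(2024, 3, 5, 14, 5, tzinfo=timezone.utc), "On-call engineer paged"),
]
timeline = render_timeline(events)
print(timeline)
```

Sorting before formatting keeps the published timeline chronological even when events were logged late or out of order.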

Communication Best Practices

Acknowledge quickly

A brief acknowledgment within minutes is more valuable than a detailed update after 30 minutes of silence.

Use plain language

Describe user-visible symptoms, not internal system names. Write "Login is failing," not "Auth service 503s from pod-7."

Commit to next update time

Every update should include when the next update will come. This reduces "are you still working on it?" support tickets.
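
This rule can be checked automatically before an update is published. The following is a rough sketch, assuming updates follow the templates in this document: non-final updates carry either a "Next update:" line or the initial notification's "We will provide an update within" sentence, and resolved updates are exempt. The function name `missing_next_update` is hypothetical.

```python
import re

# Matches a commitment line that actually has content after it on the same line.
NEXT_UPDATE = re.compile(
    r"^(Next update:|We will provide an update within)[ \t]*\S",
    re.MULTILINE,
)


def missing_next_update(update_text):
    """Return True if a non-final update does not say when the next one is due."""
    if "[Resolved]" in update_text:
        return False  # resolved updates do not promise a further update
    return NEXT_UPDATE.search(update_text) is None


ok = "Title: [Monitoring] Checkout API -- failed payments\n\nNext update: 14:30 UTC"
bad = "Title: [Identified] Checkout API -- failed payments\n\nNext update:\n"
print(missing_next_update(ok))   # False
print(missing_next_update(bad))  # True
```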

Separate internal and external comms

Your status page audience is customers and users. Internal war-room details belong in Slack or your incident management tool.