Incident Communication Templates

Copy-and-adapt templates for every stage of an incident lifecycle. Each template includes placeholders for service names, timestamps, and technical details.

Severity Levels and Response Expectations

| Severity | Definition | Initial Update | Update Frequency | Response Team |
| --- | --- | --- | --- | --- |
| SEV1 | Complete service outage or data loss affecting all users | Within 5 minutes | Every 15 minutes | Incident Commander + full on-call rotation |
| SEV2 | Major feature degraded or unavailable; significant user impact | Within 15 minutes | Every 30 minutes | On-call engineer + team lead |
| SEV3 | Minor feature issue; workaround available; limited user impact | Within 30 minutes | Every 2 hours | On-call engineer |
| SEV4 | Cosmetic issue or minor bug; no functional impact | Within 4 hours | Daily or as needed | Assigned engineer during business hours |
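
The timing expectations above can be encoded as data so that tooling can compute when the next update is due. The following is a minimal Python sketch, not a definitive implementation. The names `SeverityPolicy`, `POLICIES`, and `next_update_due` are hypothetical, and treating the SEV4 "daily or as needed" cadence as a fixed 24 hours is an assumption made for illustration.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import Optional


@dataclass(frozen=True)
class SeverityPolicy:
    """Timing expectations for one severity level (hypothetical structure)."""
    initial_update: timedelta   # deadline for the first public update
    update_interval: timedelta  # cadence for follow-up updates


# Values copied from the severity table.
# Assumption: SEV4 "daily or as needed" is modeled as a fixed 24 hours.
POLICIES = {
    "SEV1": SeverityPolicy(timedelta(minutes=5), timedelta(minutes=15)),
    "SEV2": SeverityPolicy(timedelta(minutes=15), timedelta(minutes=30)),
    "SEV3": SeverityPolicy(timedelta(minutes=30), timedelta(hours=2)),
    "SEV4": SeverityPolicy(timedelta(hours=4), timedelta(days=1)),
}


def next_update_due(
    severity: str,
    confirmed_at: datetime,
    last_update_at: Optional[datetime] = None,
) -> datetime:
    """Return the latest time by which the next update should be posted."""
    policy = POLICIES[severity]
    if last_update_at is None:
        # No update posted yet: the initial-update deadline applies.
        return confirmed_at + policy.initial_update
    return last_update_at + policy.update_interval


if __name__ == "__main__":
    confirmed = datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc)
    print(next_update_due("SEV1", confirmed))
```

Keeping the policy in one table-like structure means a change to the severity definitions only needs to be made in one place.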

Initial Notification Template

Post this as soon as an incident is confirmed. Speed of acknowledgment matters more than completeness at this stage.

Title: [Investigating] Increased error rates on [SERVICE NAME]

We are investigating reports of [BRIEF DESCRIPTION OF SYMPTOMS].

Affected services: [LIST AFFECTED COMPONENTS]
Start time: [TIMESTAMP in UTC]
Current status: Investigating

We will provide an update within [TIMEFRAME based on severity level].
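
If notifications are posted by a script rather than by hand, the placeholders in the template above can be filled programmatically. The sketch below is one possible approach using Python's built-in string formatting. The function name `render_initial_notification`, its parameters, and the `YYYY-MM-DD HH:MM UTC` timestamp layout are illustrative assumptions, not part of the template itself.

```python
from datetime import datetime, timedelta, timezone
from typing import List

# The template text mirrors the initial notification above;
# the {field} names are hypothetical.
INITIAL_TEMPLATE = """\
Title: [Investigating] Increased error rates on {service}

We are investigating reports of {symptoms}.

Affected services: {components}
Start time: {start_time}
Current status: Investigating

We will provide an update within {timeframe}."""


def render_initial_notification(
    service: str,
    symptoms: str,
    components: List[str],
    started_at: datetime,
    timeframe: str,
) -> str:
    """Fill the initial notification template, normalizing the start time to UTC."""
    if started_at.tzinfo is None:
        # A naive datetime would be silently interpreted as local time.
        raise ValueError("started_at must be timezone-aware")
    start_utc = started_at.astimezone(timezone.utc)
    return INITIAL_TEMPLATE.format(
        service=service,
        symptoms=symptoms,
        components=", ".join(components),
        start_time=start_utc.strftime("%Y-%m-%d %H:%M UTC"),
        timeframe=timeframe,
    )


if __name__ == "__main__":
    print(render_initial_notification(
        service="Checkout API",
        symptoms="failed payments at checkout",
        components=["Checkout API", "Payments"],
        started_at=datetime(2024, 1, 1, 14, 0, tzinfo=timezone(timedelta(hours=2))),
        timeframe="15 minutes",
    ))
```

Rejecting naive datetimes is a deliberate choice in this sketch: the template asks for UTC, and a timestamp without a timezone cannot be converted reliably.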

Status Update: Investigating

Title: [Investigating] [SERVICE NAME] -- [BRIEF ISSUE]

We are continuing to investigate [ISSUE DESCRIPTION].

What we know so far:
- [OBSERVATION 1]
- [OBSERVATION 2]

Impact: [DESCRIBE USER-FACING IMPACT]
Workaround: [IF AVAILABLE, DESCRIBE WORKAROUND]

Next update: [TIMESTAMP or TIMEFRAME]

Status Update: Identified

Title: [Identified] [SERVICE NAME] -- [BRIEF ISSUE]

We have identified the root cause of [ISSUE DESCRIPTION].

Root cause: [BRIEF TECHNICAL EXPLANATION appropriate for your audience]
Remediation: [DESCRIBE THE FIX BEING APPLIED]
Expected resolution: [ESTIMATED TIME or "We will update as progress is made"]

Impact: [CURRENT USER-FACING IMPACT]

Next update: [TIMESTAMP or TIMEFRAME]

Status Update: Monitoring

Title: [Monitoring] [SERVICE NAME] -- [BRIEF ISSUE]

A fix has been implemented for [ISSUE DESCRIPTION].
We are monitoring the results.

Fix applied: [DESCRIBE WHAT WAS DONE]
Current metrics: [KEY METRICS showing recovery]
Monitoring period: [HOW LONG you will monitor before resolving]

If you continue to experience issues, please [CONTACT METHOD].

Next update: [TIMESTAMP or "when monitoring period completes"]

Status Update: Resolved

Title: [Resolved] [SERVICE NAME] -- [BRIEF ISSUE]

This incident has been resolved.

Duration: [START TIME] to [END TIME] ([TOTAL DURATION])
Root cause: [BRIEF SUMMARY]
Resolution: [WHAT FIXED IT]

A post-mortem will be published within [TIMEFRAME, typically 48-72 hours].

We apologize for the disruption and thank you for your patience.
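
The "Duration" line in the resolved template is easy to get wrong by hand, especially across timezones. As a rough sketch, it could be derived from the start and end timestamps as shown below. The helper name `format_duration_line` and the output layout are assumptions chosen for illustration.

```python
from datetime import datetime, timezone


def format_duration_line(start: datetime, end: datetime) -> str:
    """Build the 'Duration:' line from timezone-aware start and end times."""
    if start.tzinfo is None or end.tzinfo is None:
        raise ValueError("start and end must be timezone-aware")
    if end < start:
        raise ValueError("end must not be earlier than start")

    # Round down to whole minutes; seconds are dropped in this sketch.
    total_minutes = int((end - start).total_seconds() // 60)
    hours, minutes = divmod(total_minutes, 60)
    total = f"{hours}h {minutes}m" if hours else f"{minutes}m"

    fmt = "%Y-%m-%d %H:%M UTC"
    start_text = start.astimezone(timezone.utc).strftime(fmt)
    end_text = end.astimezone(timezone.utc).strftime(fmt)
    return f"Duration: {start_text} to {end_text} ({total})"


if __name__ == "__main__":
    began = datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc)
    ended = datetime(2024, 1, 1, 13, 47, tzinfo=timezone.utc)
    print(format_duration_line(began, ended))
```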

Post-Mortem Summary Template

Use this condensed post-mortem summary when publishing on your status page. The full internal post-mortem document typically contains more detail.

Post-Mortem: [INCIDENT TITLE]
Date: [DATE]
Duration: [TOTAL DURATION]
Severity: [SEV LEVEL]
Incident Commander: [NAME or ROLE]

Summary
-------
[2-3 sentence summary of what happened and the impact]

Timeline (all times UTC)
------------------------
[HH:MM] - [EVENT: e.g., Monitoring alert triggered]
[HH:MM] - [EVENT: e.g., On-call engineer paged]
[HH:MM] - [EVENT: e.g., Root cause identified]
[HH:MM] - [EVENT: e.g., Fix deployed]
[HH:MM] - [EVENT: e.g., Service fully recovered]

Root Cause
----------
[Technical explanation of what caused the incident]

Resolution
----------
[What was done to resolve the incident]

Lessons Learned
---------------
What went well:
- [ITEM]

What could be improved:
- [ITEM]

Action Items
------------
- [ACTION] -- Owner: [TEAM/PERSON] -- Due: [DATE]
- [ACTION] -- Owner: [TEAM/PERSON] -- Due: [DATE]
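
The timeline section of the summary above can be generated from recorded events rather than assembled manually. The following is a small sketch under the assumption that events are available as `(timestamp, description)` pairs; the function name `format_timeline` is hypothetical. It prints only `HH:MM`, as the template does, so an incident that spans more than one day would need the date added.

```python
from datetime import datetime, timezone
from typing import Iterable, Tuple


def format_timeline(events: Iterable[Tuple[datetime, str]]) -> str:
    """Render events in chronological order using the template's timeline layout."""
    lines = ["Timeline (all times UTC)", "------------------------"]
    # Sort on the timestamp only, so descriptions are never compared.
    for when, description in sorted(events, key=lambda event: event[0]):
        stamp = when.astimezone(timezone.utc).strftime("%H:%M")
        lines.append(f"{stamp} - {description}")
    return "\n".join(lines)


if __name__ == "__main__":
    recorded = [
        (datetime(2024, 1, 1, 12, 10, tzinfo=timezone.utc), "On-call engineer paged"),
        (datetime(2024, 1, 1, 12, 5, tzinfo=timezone.utc), "Monitoring alert triggered"),
        (datetime(2024, 1, 1, 13, 40, tzinfo=timezone.utc), "Fix deployed"),
    ]
    print(format_timeline(recorded))
```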

Communication Best Practices

Acknowledge quickly

A brief acknowledgment within minutes is more valuable than a detailed update after 30 minutes of silence.

Use plain language

Describe user-visible symptoms, not internal system names. Write "Login is failing," not "Auth service 503s from pod-7."

Commit to next update time

Every update should include when the next update will come. This reduces "are you still working on it?" support tickets.
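
One way to enforce this practice is a quick check on a draft before it is posted. The sketch below assumes drafts follow the status update templates in this document, where the commitment appears on a line starting with "Next update:". The function name `missing_next_update` and the rule that a value starting with "[" is an unfilled placeholder are illustrative assumptions.

```python
import re

# Matches a "Next update:" line with a non-empty value on the same line.
NEXT_UPDATE_LINE = re.compile(r"^Next update:[ \t]*(\S.*)$", re.MULTILINE)


def missing_next_update(draft: str) -> bool:
    """Return True if the draft lacks a filled-in 'Next update:' line."""
    match = NEXT_UPDATE_LINE.search(draft)
    if match is None:
        return True
    value = match.group(1).strip()
    # Assumption: a value that still starts with "[" is a template placeholder.
    return value.startswith("[")


if __name__ == "__main__":
    draft = "We are continuing to investigate.\n\nNext update: 14:30 UTC"
    print(missing_next_update(draft))
```

A check like this suits the Investigating, Identified, and Monitoring updates; the initial notification and the Resolved update word their commitment differently or omit it.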

Separate internal and external comms

Your status page audience is customers and users. Internal war-room details belong in Slack or your incident management tool.