Status Page Design Best Practices
Design patterns and UX principles for building status pages that communicate clearly during both normal operations and incidents.
Component Hierarchy
Organize your status page components in a hierarchy that matches how users think about your service. Group related components and show rollup status at each level.
Rollup logic: A parent component's status is the worst status among its children. If any child is "Major Outage," the parent shows "Major Outage."
Granularity tradeoff: Too few components and users cannot tell which part is affected. Too many and the page becomes noisy. Aim for 5-15 top-level components.
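The rollup rule above can be sketched as a small recursive function. This is a minimal illustration, not a reference implementation: the `Component` class and `rollup_status` helper are hypothetical names, and the severity ordering (in particular where `underMaintenance` ranks) is an assumption you should adjust to your own policy.

```python
from dataclasses import dataclass, field
from typing import List

# Assumed ordering from best to worst. The placement of
# "underMaintenance" is a judgment call, not something the rule dictates.
SEVERITY_ORDER = [
    "operational",
    "underMaintenance",
    "degradedPerformance",
    "partialOutage",
    "majorOutage",
]


@dataclass
class Component:
    """Hypothetical component node; leaves carry their own status."""
    name: str
    status: str = "operational"
    children: List["Component"] = field(default_factory=list)


def rollup_status(component: Component) -> str:
    """Return the worst status among all descendants, or the leaf's own status."""
    if not component.children:
        return component.status
    return max(
        (rollup_status(child) for child in component.children),
        key=SEVERITY_ORDER.index,
    )


platform = Component(
    "Platform",
    children=[
        Component("REST API", "operational"),
        Component("Web Dashboard", "majorOutage"),
    ],
)
print(rollup_status(platform))
```

Because the function recurses, the same rule applies at every level of the hierarchy, so a group nested inside another group still surfaces its worst child.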
Design Approaches Comparison
| Approach | Structure | Best For | Drawbacks |
|---|---|---|---|
| Simple | Single overall status indicator with a list of recent incidents | Single-product services, small teams, early-stage products | Cannot communicate partial outages; all-or-nothing |
| Grouped | Components grouped by category (e.g., API, Dashboard, Infrastructure) with per-component status | Multi-service platforms, SaaS products with distinct features | Requires maintenance; users must understand your component names |
| Detailed | Per-component status with uptime graphs, latency metrics, and historical incident timeline | Infrastructure providers, API-first products, enterprise customers | Information-dense; can overwhelm non-technical users |
Metric Display Patterns
Uptime bar chart (90-day)
Show a horizontal bar of 90 vertical segments, one per day. Green for fully operational, yellow for degraded, red for outage, gray for no data. This is the most recognized pattern in the industry. Hovering a segment shows the date and any incidents.
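As a rough sketch of the data behind such a bar, the function below turns a sparse map of daily statuses into 90 colored segments, filling days with no record as gray. The `build_uptime_bar` name, the status keys, and the input shape are assumptions made for illustration.

```python
from datetime import date, timedelta
from typing import Dict, List, Tuple

# Assumed status keys; map them to the colors described above.
COLORS = {"operational": "green", "degraded": "yellow", "outage": "red"}


def build_uptime_bar(
    daily_status: Dict[date, str], end: date, days: int = 90
) -> List[Tuple[date, str]]:
    """Return (day, color) pairs, oldest first, ending at `end`."""
    segments = []
    for offset in range(days - 1, -1, -1):
        day = end - timedelta(days=offset)
        # Days missing from the map (or with an unknown status) render as gray.
        color = COLORS.get(daily_status.get(day), "gray")
        segments.append((day, color))
    return segments
```

Each pair carries the date as well as the color, which is what a hover tooltip needs to look up that day's incidents.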
Response time graph
Line chart showing p50 and p95 response times over the last 24 hours or 7 days. Use a consistent y-axis scale. Mark incident periods with a subtle background highlight so users can correlate latency spikes with known issues.
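For readers unsure how p50 and p95 are derived, here is a minimal sketch using the nearest-rank method. Other percentile definitions (for example, linear interpolation) give slightly different values, and the `percentile` helper is a hypothetical name, so treat this as one reasonable choice rather than the only one.

```python
import math
from typing import List


def percentile(samples: List[float], p: float) -> float:
    """Nearest-rank percentile: the smallest sample with at least p% of samples at or below it."""
    if not samples:
        raise ValueError("percentile() needs at least one sample")
    ordered = sorted(samples)
    # Multiply before dividing to keep the rank calculation exact for whole-number inputs.
    rank = max(1, math.ceil(p * len(ordered) / 100))
    return ordered[rank - 1]


latencies_ms = [120, 80, 100, 300, 90]
p50 = percentile(latencies_ms, 50)
p95 = percentile(latencies_ms, 95)
```

With only five samples the p95 is simply the slowest request, which is one reason to compute percentiles over a window with enough traffic before plotting them.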
Uptime percentage display
Show the current uptime percentage for each component over 30-, 60-, and 90-day windows. Display to two decimal places (e.g., 99.98%). Avoid showing 100.00% unless it is literally true: round down rather than up, so the page never claims more uptime than was delivered.
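The round-down rule is easy to get wrong with ordinary float formatting, which rounds to nearest. The sketch below uses Python's `decimal` module with `ROUND_DOWN`. The `uptime_percent` function and its inputs (downtime in seconds, window in days) are assumptions for illustration.

```python
from decimal import Decimal, ROUND_DOWN


def uptime_percent(downtime_seconds: int, window_days: int) -> str:
    """Format uptime over the window, truncated (never rounded up) to two decimals."""
    total = Decimal(window_days * 86_400)
    uptime = (total - Decimal(downtime_seconds)) / total * 100
    return f"{uptime.quantize(Decimal('0.01'), rounding=ROUND_DOWN)}%"


# 26 minutes of downtime in 90 days is about 99.9799%, which
# round-to-nearest would display as 99.98%.
print(uptime_percent(26 * 60, 90))
```

With this approach even one second of downtime in a 30-day window displays as 99.99% rather than 100.00%.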
Incident Timeline UX
The incident timeline is the most-read section of your status page during an outage. Design it for scannability.
Reverse chronological order
Newest updates at the top. Users arriving during an active incident want the latest information first.
Timestamps in UTC and local time
Display timestamps in UTC with the user's local time in parentheses. Use relative time (e.g., "12 minutes ago") for recent updates, absolute time for older entries.
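One way to combine the two formats is sketched below. The one-hour cutoff between relative and absolute time, the `format_update_time` name, and the exact output strings are assumptions. In a real page the local zone would come from the browser, not a fixed offset.

```python
from datetime import datetime, timedelta, timezone


def format_update_time(
    ts: datetime,
    now: datetime,
    local_tz: timezone,
    relative_cutoff: timedelta = timedelta(hours=1),  # assumed threshold
) -> str:
    """Relative time for recent updates, UTC with local time in parentheses otherwise."""
    age = now - ts
    if timedelta(0) <= age < relative_cutoff:
        minutes = int(age.total_seconds() // 60)
        if minutes < 1:
            return "just now"
        return f"{minutes} minute{'s' if minutes != 1 else ''} ago"
    utc_part = ts.astimezone(timezone.utc).strftime("%Y-%m-%d %H:%M UTC")
    local_part = ts.astimezone(local_tz).strftime("%H:%M %Z")
    return f"{utc_part} ({local_part})"


# Fixed-offset zone used only to keep the example self-contained.
EDT = timezone(timedelta(hours=-4), "EDT")
now = datetime(2026, 4, 12, 10, 37, tzinfo=timezone.utc)
recent = format_update_time(datetime(2026, 4, 12, 10, 25, tzinfo=timezone.utc), now, EDT)
older = format_update_time(datetime(2026, 4, 11, 8, 0, tzinfo=timezone.utc), now, EDT)
```

Here `recent` renders as relative time and `older` falls back to the absolute UTC form with the local time appended.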
Status badges
Color-coded badges (Investigating, Identified, Monitoring, Resolved) provide instant visual scanning. Use consistent colors across the entire status page.
Collapse resolved incidents
After 24-48 hours, collapse resolved incidents to a single line to keep the current view focused. Provide an "Incident History" page for the full archive.
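The collapse rule reduces to a simple predicate. This sketch assumes a 24-hour threshold (the low end of the range above) and a hypothetical `should_collapse` helper. Pick whatever threshold in that range suits your incident volume.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional


def should_collapse(
    status: str,
    resolved_at: Optional[datetime],
    now: datetime,
    collapse_after: timedelta = timedelta(hours=24),  # assumed threshold
) -> bool:
    """True when a resolved incident is old enough to show as a single line."""
    if status != "resolved" or resolved_at is None:
        return False
    return now - resolved_at >= collapse_after
```

Incidents that are still open never collapse, regardless of age, so a long-running issue stays fully visible.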
Subscriber Notifications
| Channel | Use Case | Latency | Considerations |
|---|---|---|---|
| Email | All incidents and maintenance windows | Minutes | Highest reach; risk of spam filtering |
| Webhook | Integration with internal tools (Slack, PagerDuty) | Seconds | Requires implementation; most flexible |
| RSS/Atom | Technical users and monitoring tools | Depends on poll interval | Low maintenance; no user management needed |
| SMS | Critical incidents (SEV1/SEV2) only | Seconds | Higher cost; use sparingly to avoid notification fatigue |
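
The routing implied by the table can be sketched as a small function. The only rule taken directly from the table is that SMS is reserved for SEV1/SEV2 incidents. Sending every event to the other channels is an assumption, as are the `channels_for` name and the event-type strings.

```python
from typing import List, Optional

CRITICAL_SEVERITIES = {"SEV1", "SEV2"}


def channels_for(event_type: str, severity: Optional[str] = None) -> List[str]:
    """Pick notification channels for an event ("incident" or "maintenance")."""
    # Assumption: email, webhook, and RSS receive every event.
    channels = ["email", "webhook", "rss"]
    # SMS is reserved for critical incidents to limit cost and notification fatigue.
    if event_type == "incident" and severity in CRITICAL_SEVERITIES:
        channels.append("sms")
    return channels
```

Keeping the SMS condition in one place makes it easy to audit how often subscribers are actually paged.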
API Status Endpoint Design
Provide a machine-readable API endpoint so customers can programmatically check your status and integrate it into their own monitoring.
Recommended response format
GET /api/v1/status
{
  "status": {
    "indicator": "minor",
    "description": "Minor System Outage"
  },
  "components": [
    {
      "id": "api",
      "name": "REST API",
      "status": "operational",
      "updatedAt": "2026-04-12T10:30:00Z"
    },
    {
      "id": "dashboard",
      "name": "Web Dashboard",
      "status": "degradedPerformance",
      "updatedAt": "2026-04-12T10:25:00Z"
    }
  ],
  "activeIncidents": [
    {
      "id": "inc-2026-0412",
      "title": "Elevated dashboard latency",
      "status": "identified",
      "severity": "SEV3",
      "createdAt": "2026-04-12T10:15:00Z",
      "updatedAt": "2026-04-12T10:25:00Z"
    }
  ],
  "scheduledMaintenances": []
}

Status values: operational, degradedPerformance, partialOutage, majorOutage, underMaintenance
Response headers: Include Cache-Control with a max-age of 30-60 seconds and an ETag so clients can revalidate cheaply, which reduces polling load.
Availability: Host the status API on separate infrastructure from your main service so it remains available during outages.
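The caching headers described above can be sketched in a framework-agnostic way. This is a minimal illustration under stated assumptions: `build_status_response` is a hypothetical helper, the ETag is derived from a hash of the serialized body, and the caller is expected to pass in the request's `If-None-Match` header value. A real server would wire this into its own request handling.

```python
import hashlib
import json
from typing import Dict, Optional, Tuple


def build_status_response(
    payload: dict,
    if_none_match: Optional[str] = None,
    max_age: int = 30,  # seconds; the text above suggests 30-60
) -> Tuple[int, Dict[str, str], bytes]:
    """Return (status_code, headers, body) for the status endpoint."""
    # Serialize deterministically so identical payloads yield identical ETags.
    body = json.dumps(payload, sort_keys=True, separators=(",", ":")).encode("utf-8")
    etag = '"' + hashlib.sha256(body).hexdigest()[:16] + '"'
    headers = {"Cache-Control": f"public, max-age={max_age}", "ETag": etag}
    if if_none_match == etag:
        # Client already has the current version: no body needed.
        return 304, headers, b""
    headers["Content-Type"] = "application/json"
    return 200, headers, body


payload = {"status": {"indicator": "minor"}, "components": []}
code, headers, body = build_status_response(payload)
revalidated = build_status_response(payload, if_none_match=headers["ETag"])
```

A polling client that sends back the ETag receives a bodyless 304 until the status actually changes, which keeps the endpoint cheap to serve during an incident when polling traffic spikes.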