Incident response is mostly drilled-in process, not heroics. The teams that recover fastest have practiced roles, clear channels, and a ruthless focus on user impact over root cause. Here's the playbook.
Roles
Incident Commander (decides, doesn't fix). Comms (updates stakeholders). Ops (the fixer). Scribe (keeps the timeline log). In small orgs the same person can wear multiple hats, but the roles should still be explicit.
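One way to keep roles explicit is to write the assignment down and check it for gaps when an incident opens. The sketch below is a minimal illustration, not a prescribed tool: the `Role` enum, the `unassigned_roles` helper, and the names `alice` and `bob` are all hypothetical.

```python
from enum import Enum


class Role(Enum):
    # The four roles named in the playbook.
    INCIDENT_COMMANDER = "incident commander"
    COMMS = "comms"
    OPS = "ops"
    SCRIBE = "scribe"


def unassigned_roles(assignments: dict[Role, str]) -> list[Role]:
    """Return every role that has no named owner, in declaration order."""
    return [role for role in Role if not assignments.get(role, "").strip()]


# Small org: two people cover all four roles, but each role is still named.
small_org = {
    Role.INCIDENT_COMMANDER: "alice",
    Role.COMMS: "alice",
    Role.OPS: "bob",
    Role.SCRIBE: "bob",
}
```

Calling `unassigned_roles(small_org)` returns an empty list, while an assignment that names only an Ops owner reports the other three roles as missing.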
Channels
One ops channel (technical discussion). One status channel (broadcast to the org). If the incident is user-facing, update the status page externally as well. Don't conflate channels: discussion noise drowns out important updates.
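The channel split can be read as a small routing rule. The following is only a sketch of that rule under assumed names: the message kinds `"discussion"` and `"update"` and the destination labels `ops-channel`, `status-channel`, and `status-page` are placeholders, not real channel names.

```python
def destinations(kind: str, user_facing: bool) -> list[str]:
    """Decide where a message belongs, keeping discussion and updates apart."""
    if kind == "discussion":
        # Technical back-and-forth stays in the ops channel only.
        return ["ops-channel"]
    if kind == "update":
        # Status updates are broadcast internally, and externally
        # only when users are affected.
        targets = ["status-channel"]
        if user_facing:
            targets.append("status-page")
        return targets
    raise ValueError(f"unknown message kind: {kind!r}")
```

Discussion never reaches the status channel under this rule, which is the point of keeping the two apart.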
During and after
During: focus on mitigation, not root cause. Roll back if you can. Communicate every 15-30 minutes, even if the update is only 'still investigating'. After: hold a blameless postmortem within a week, assign action items to owners, and schedule the work.
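The update cadence is easy to lose track of mid-incident, so a simple check can help. This is a minimal sketch that treats 30 minutes, the upper end of the range above, as the limit; the `update_overdue` function and `MAX_SILENCE` constant are hypothetical names.

```python
from datetime import datetime, timedelta

# Upper bound of the 15-30 minute cadence; tighten it if your team prefers.
MAX_SILENCE = timedelta(minutes=30)


def update_overdue(
    last_update: datetime,
    now: datetime,
    max_silence: timedelta = MAX_SILENCE,
) -> bool:
    """Return True once the time since the last status update hits the limit."""
    return now - last_update >= max_silence
```

With a last update at 12:00, the check is false at 12:20 and true at 12:30. A 'still investigating' post resets the clock just like any other update.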