An essay · ITIL Incident Management

The Fireman

An ITIL Incident Manager works a lot like a fireman. The parallels run deep — and the boundary that matters is where the job ends.

Reading: ~6 min · Discipline: Incident Mgmt · Not: Problem Mgmt
01 · The Alarm

Neither one gets to ask whether they're in the mood.

An incident, in ITIL's vocabulary, is an unplanned interruption to a service or a reduction in its quality. A fireman doesn't decide what's burning or why. They just know something is on fire and people need help, fast. An Incident Manager gets paged at 2am because production is down, payments are failing, or a service is throwing errors at a million users a minute. The cause is unknown, the stakes are real, and the clock is already running.

FIRE · Box alarm
03:47 Residential structure, two stories. Smoke showing from second floor. Engine 1, Truck 2, BC-4 responding.
SYS · Page
03:47 payments-api 5xx rate exceeded threshold. p99 latency 14s. On-call paged. Sev-1 declared.
02 · Triage & Command

It's about coordination under pressure, not heroics.

A fireman arriving at a burning building doesn't run in alone with a bucket. They size up the scene, figure out where the fire is spreading, decide whether it's a rescue or a containment situation, and direct their crew — one team on the hose, one doing search and rescue, one ventilating the roof.

The Incident Manager does the same thing with engineers: who's looking at the database, who's checking the load balancers, who's talking to customer support, who's drafting the status page update. They're not usually the person typing the fix — they're the one keeping the response coherent so five smart people don't all debug the same thing while the actual problem festers somewhere else.

FIRE · Fireground orders
03:51 Engine 1: pull a line to side Charlie.
03:52 Truck 2: primary search, second floor.
03:53 BC-4: vent the roof, hold the stairwell.
SYS · Bridge assignments
03:51 DBA: query plan + connection pool.
03:52 SRE: load balancer + recent deploys.
03:53 Comms: status page, support brief.

The job isn't to put out the fire yourself. It's to make sure the people who can put it out aren't all standing in the same room.

03 · The Translator

Translating chaos into something the rest of the world can act on.

Both have to manage information flow outward. The fire chief talks to the homeowner, the police, the press. The Incident Manager talks to the execs asking "is it fixed yet," to support teams getting buried in tickets, to legal if it's a data issue.

They translate technical chaos into something the rest of the world can act on, while shielding the responders from the noise so they can actually work. A good Incident Manager is, in part, a human firewall.

FIRE · Outward comms
04:08 Briefing the homeowner. Updating the press line. Coordinating with PD on traffic control. Crew stays on the building.
SYS · Outward comms
04:08 Status page updated. Exec Slack briefed. Support given holding language. Engineers stay on the bridge.
04 · Restore First, Explain Later

The job is not to figure out why. The job is to make it stop.

This is where most people get the role wrong. ITIL Incident Management is measured on one thing: restoration of normal service operation, as quickly as possible. MTTR, not MTTU: mean time to restore, not mean time to understand. A rollback that gets payments flowing again at 04:12 is a win, even if nobody yet knows what the bad deploy actually did.

The fireman doesn't stop to inspect the wiring before knocking down the flames. They knock down the flames. A workaround beats a root cause every time, because the customer is on fire now. Cause-and-origin can wait until everyone's safe.
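To make the metric concrete, here is a minimal sketch of how MTTR could be computed, reading it as mean time to restore service: the average gap between detection and restoration, with no term for when the cause was understood. The function name and the sample timestamps are illustrative assumptions; only the 03:47 page and the 04:12 rollback echo the scenario above.

```python
from datetime import datetime, timedelta


def mean_time_to_restore(incidents):
    """Average of (restored - detected) over a list of (detected, restored) pairs."""
    durations = [restored - detected for detected, restored in incidents]
    return sum(durations, timedelta()) / len(durations)


# Hypothetical incident log. The dates are made up; the first entry mirrors
# the 03:47 page and the 04:12 rollback described in the text.
incidents = [
    (datetime(2024, 1, 1, 3, 47), datetime(2024, 1, 1, 4, 12)),   # 25 minutes
    (datetime(2024, 1, 8, 14, 0), datetime(2024, 1, 8, 14, 45)),  # 45 minutes
]

mttr = mean_time_to_restore(incidents)
print(mttr)  # 0:35:00
```

Nothing in the calculation asks whether anyone knew what the bad deploy did. The clock stops when service is back.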

FIRE · Restore the building
04:11 Knock it down. Vent the smoke. Pull occupants. Save what you can.

Why it started doesn't matter yet. Get the scene safe.
SYS · Restore the service
04:11 Roll the deploy back. Fail over. Flush the cache. Restart the pod.

Root cause doesn't matter yet. Get the customers green.
05 · The Handoff

When the fire is out, the fireman leaves.

This is where the analogy gets sharp, and where a lot of orgs blur a line ITIL drew on purpose. When the fire is out, the fireman packs up and goes home. They don't sift through the ash with a clipboard. That's a different person — a fire marshal, an arson investigator — with a different mandate, a different uniform, and a different boss.

ITIL draws the same line. Once normal service is restored, the incident is closed. What remains (the why, the underlying defect, the "how do we keep this from happening again") becomes a problem. And Problem Management owns it. Not Incident Management.

The Incident Manager's notes from the bridge become the Problem Manager's opening file. The handoff matters. Conflate the two roles and you get one of two failure modes: the incident drags on for hours while everyone debates root cause and customers stay broken, or the post-mortem never gets written because the responders are already onto the next page.
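The handoff can be sketched as a small data model. This is an illustration under assumptions, not an ITIL-defined schema or any ticketing tool's API: the class names, fields, and the `INC-0001` identifier are all hypothetical. The point it shows is that closing the incident requires restored service, not a known root cause, and that the bridge timeline travels with the problem record.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class ProblemRecord:
    source_incident: str
    opening_file: List[str]           # the Incident Manager's bridge notes
    owner: str
    root_cause: Optional[str] = None  # unknown at handoff; Problem Management fills it in


@dataclass
class Incident:
    incident_id: str
    timeline: List[str] = field(default_factory=list)
    service_restored: bool = False
    status: str = "open"

    def note(self, entry: str) -> None:
        self.timeline.append(entry)

    def close_and_hand_off(self, problem_owner: str) -> ProblemRecord:
        # The only gate on closure is restoration, not understanding.
        if not self.service_restored:
            raise ValueError("cannot close: service not yet restored")
        self.status = "closed"
        return ProblemRecord(self.incident_id, list(self.timeline), problem_owner)


# Hypothetical usage, with notes paraphrased from the dispatch log.
inc = Incident("INC-0001")
inc.note("03:47 payments-api 5xx rate exceeded threshold")
inc.note("04:11 deploy rolled back")
inc.service_restored = True

problem = inc.close_and_hand_off(problem_owner="problem-mgmt")
print(inc.status)          # closed
print(problem.root_cause)  # None
```

Both failure modes show up as violations of this shape: demanding a `root_cause` before closing keeps customers broken, and closing without creating the record means the post-mortem never starts.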

FIRE · Scene transfer
04:32 Fire knocked down. Overhaul complete. Marshal arrives. Scene transferred. Crew returns to quarters.

The marshal owns cause & origin. Not us.
SYS · Problem opened
04:32 Service restored. Monitors green. Problem record opened. Owner assigned. Incident closed.

Problem Mgmt owns RCA. Not us.

An Incident Manager who refuses to close until they understand is not doing the job better. They're doing a different job — badly, and at the customer's expense.

06 · Where it bends

Where the analogy gets a little loose.

A fireman's fires are mostly independent events. A house burns; the house next door usually doesn't. Incidents in a complex distributed system aren't like that. One small flame in a dependency graph can light up half the org in seconds: a slow database makes the API slow, which makes the frontend slow, which triggers a retry storm, which takes down a service that didn't even know the database existed.
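A rough piece of arithmetic shows why retries spread the fire so quickly. The sketch below assumes a simplified worst case: every layer in the call chain makes a fixed number of attempts, every attempt fails, and no layer uses backoff or a retry budget. The function name and the numbers are illustrative, not measurements.

```python
def worst_case_amplification(attempts_per_layer: int, layers: int) -> int:
    """Requests hitting the bottom dependency per user request when every
    layer exhausts its attempts (assumed worst case, no backoff or budgets)."""
    return attempts_per_layer ** layers


# Three layers (say frontend, API, database client), each making
# 3 attempts: the original call plus 2 retries.
print(worst_case_amplification(3, 3))  # 27
```

Under those assumptions, a database that is merely slow ends up receiving 27 times its normal load at exactly the moment it can least handle it.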

So the line between one incident and many incidents tracing to one problem gets harder to draw. The Incident Manager has to think more like a fire chief commanding a wildfire than a single-engine response. Containment lines. Spot fires. Wind direction. The blaze you're not looking at yet. And sometimes the handoff to Problem Management has to happen while parts of the fire are still burning — because the underlying defect is the only thing that explains why three apparently unrelated services all caught at once.
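One way to picture the "many incidents, one problem" question is as an intersection over a dependency graph. The sketch below is a hypothetical illustration: apart from `payments-api`, the service names and the graph itself are invented, and a real dependency map would be far messier. It collects each affected service's transitive dependencies and reports what they all share.

```python
def transitive_deps(service, graph):
    """All direct and indirect dependencies of a service."""
    seen = set()
    stack = [service]
    while stack:
        current = stack.pop()
        for dep in graph.get(current, ()):
            if dep not in seen:
                seen.add(dep)
                stack.append(dep)
    return seen


def shared_dependencies(affected, graph):
    """Dependencies common to every affected service."""
    dep_sets = [transitive_deps(s, graph) for s in affected]
    return set.intersection(*dep_sets) if dep_sets else set()


# Hypothetical dependency graph: service -> what it calls directly.
graph = {
    "checkout": ["payments-api"],
    "payments-api": ["orders-db"],
    "search": ["catalog-cache"],
    "catalog-cache": ["orders-db"],
    "notifications": ["event-queue"],
    "event-queue": ["orders-db"],
}

shared = shared_dependencies(["checkout", "search", "notifications"], graph)
print(shared)  # {'orders-db'}
```

A shared dependency is a lead, not a verdict. It tells the Incident Manager where to draw a containment line and gives Problem Management a place to start.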

In summary

Stay calm. Coordinate. Communicate. Restore. Hand off.

That's the posture, in either uniform. The radios are different. The hoses are different. The job — and where the job stops — is the same.

07 · Forward

Brief the chief.

The fire is out. The bridge is closed. The runbook is open. Your task now is to brief. The Marshal needs the timeline. The COO needs the language. The next on-call needs the lesson.

Forward this dispatch to the people who set the budget for the next one.
