An ITIL Incident Manager works a lot like a fireman. The parallels run deep — and the boundary that matters is where the job ends.
An incident, in ITIL's vocabulary, is an unplanned interruption to a service or a reduction in its quality. A fireman doesn't decide what's burning or why. They just know something is on fire and people need help, fast. An Incident Manager gets paged at 2am because production is down, payments are failing, or a service is throwing errors at a million users a minute. The cause is unknown, the stakes are real, and the clock is already running.
A fireman arriving at a burning building doesn't run in alone with a bucket. They size up the scene, figure out where the fire is spreading, decide whether it's a rescue or a containment situation, and direct their crew — one team on the hose, one doing search and rescue, one ventilating the roof.
The Incident Manager does the same thing with engineers: who's looking at the database, who's checking the load balancers, who's talking to customer support, who's drafting the status page update. They're not usually the person typing the fix — they're the one keeping the response coherent so five smart people don't all debug the same thing while the actual problem festers somewhere else.
The job isn't to put out the fire yourself. It's to make sure the people who can put it out aren't all standing in the same room.
Both have to manage information flow outward. The fire chief talks to the homeowner, the police, the press. The Incident Manager talks to the execs asking "is it fixed yet," to support teams getting buried in tickets, to legal if it's a data issue.
They translate technical chaos into something the rest of the world can act on, while shielding the responders from the noise so they can actually work. A good Incident Manager is, in part, a human firewall.
This is where most people get the role wrong. ITIL Incident Management is measured above all on one thing: restoring normal service operation as quickly as possible. MTTR, not MTTU: mean time to restore, not mean time to understand. A rollback that gets payments flowing again at 04:12 is a win, even if nobody yet knows what the bad deploy actually did.
The fireman doesn't stop to inspect the wiring before knocking down the flames. They knock down the flames. A workaround beats a root cause every time, because the customer is on fire now. Cause-and-origin can wait until everyone's safe.
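To make the metric concrete, here is a minimal sketch that reads MTTR as the time from detection to restored service. The incident log, its dates, and the function name are illustrative assumptions, not anything ITIL prescribes; the point is only that no root-cause timestamp appears anywhere in the calculation.

```python
from datetime import datetime, timedelta

# Hypothetical incident log: (detected_at, service_restored_at).
# Root-cause timestamps are deliberately absent; they are not part of this metric.
incidents = [
    (datetime(2024, 3, 1, 2, 0), datetime(2024, 3, 1, 4, 12)),
    (datetime(2024, 3, 9, 14, 30), datetime(2024, 3, 9, 15, 0)),
    (datetime(2024, 3, 20, 23, 45), datetime(2024, 3, 21, 0, 33)),
]

def mean_time_to_restore(log):
    """Average of (restored - detected) across all incidents in the log."""
    total = sum((restored - detected for detected, restored in log), timedelta())
    return total / len(log)

mttr = mean_time_to_restore(incidents)
print(mttr)  # 1:10:00
```

The clock stops when service is back, whether that came from a rollback, a failover, or some other workaround.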
This is where the analogy gets sharp, and where a lot of orgs blur a line ITIL drew on purpose. When the fire is out, the fireman packs up and goes home. They don't sift through the ash with a clipboard. That's a different person — a fire marshal, an arson investigator — with a different mandate, a different uniform, and a different boss.
ITIL draws the same line. Once normal service is restored, the incident is resolved and can be closed. What remains (the why, the underlying defect, the how-do-we-keep-this-from-happening-again) becomes a problem. And Problem Management owns it. Not Incident Management.
The Incident Manager's notes from the bridge become the Problem Manager's opening file. The handoff matters. Conflate the two roles and you get one of two failure modes: the incident drags on for hours while everyone debates root cause and customers stay broken, or the post-mortem never gets written because the responders are already onto the next page.
An Incident Manager who refuses to close until they understand is not doing the job better. They're doing a different job — badly, and at the customer's expense.
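The handoff described above can be sketched as two separate records. The class names, field names, and IDs below are hypothetical and do not follow any particular ITSM tool's schema; what the sketch tries to show is that the problem record opens with the bridge notes and a known workaround, and with the root cause still empty.

```python
from dataclasses import dataclass, field
from typing import List, Optional

@dataclass
class Incident:
    incident_id: str
    summary: str
    timeline: List[str] = field(default_factory=list)  # notes taken on the bridge
    workaround: Optional[str] = None
    service_restored: bool = False

@dataclass
class Problem:
    problem_id: str
    source_incidents: List[str]
    opening_notes: List[str]
    known_workaround: Optional[str]
    root_cause: Optional[str] = None  # unknown at handoff; the problem owner fills it in

def hand_off(incident: Incident, problem_id: str) -> Problem:
    """Open a problem record from an incident's notes (assumed workflow)."""
    return Problem(
        problem_id=problem_id,
        source_incidents=[incident.incident_id],
        opening_notes=list(incident.timeline),  # copy, so the two records can diverge
        known_workaround=incident.workaround,
    )

inc = Incident(
    "INC-1",
    "Payments failing",
    timeline=["02:00 paged", "04:12 rollback, payments flowing"],
    workaround="rolled back latest deploy",
    service_restored=True,
)
prb = hand_off(inc, "PRB-1")
print(prb.root_cause)  # None
```

Nothing in `Incident` asks for a root cause, so an incident can be resolved without one. That question lives only on `Problem`.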
A fireman's fires are mostly independent events. A house burns; the house next door usually doesn't. Incidents in a complex distributed system aren't like that. One small flame in a dependency graph can light up half the org in seconds: a slow database makes the API slow, which makes the frontend slow, which triggers a retry storm, which takes down a service that didn't even know the database existed.
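One way to picture that cascade is as a walk over a dependency graph. The sketch below assumes a hypothetical service map (the service names are invented for illustration) and uses a plain breadth-first search to collect everything that sits downstream of a failing dependency.

```python
from collections import deque

# Hypothetical dependency map: service -> services it calls directly.
depends_on = {
    "api": ["database"],
    "frontend": ["api"],
    "notifications": ["frontend"],  # never talks to the database itself
    "billing": ["ledger"],
    "database": [],
    "ledger": [],
}

def blast_radius(failing, depends_on):
    """Return every service that depends on `failing`, directly or transitively."""
    # Invert the edges so we can walk from a dependency to its callers.
    callers = {}
    for service, deps in depends_on.items():
        for dep in deps:
            callers.setdefault(dep, []).append(service)

    affected, queue = set(), deque([failing])
    while queue:
        current = queue.popleft()
        for caller in callers.get(current, []):
            if caller not in affected:
                affected.add(caller)
                queue.append(caller)
    return affected

print(sorted(blast_radius("database", depends_on)))
# ['api', 'frontend', 'notifications']
```

Here `notifications` ends up in the blast radius even though it has no direct edge to `database`, which is the situation the paragraph above describes.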
So the line between one incident and many incidents tracing to one problem gets harder to draw. The Incident Manager has to think more like a fire chief commanding a wildfire than a single-engine response. Containment lines. Spot fires. Wind direction. The blaze you're not looking at yet. And sometimes the handoff to Problem Management has to happen while parts of the fire are still burning — because the underlying defect is the only thing that explains why three apparently unrelated services all caught at once.
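The "one problem behind many incidents" question can be sketched as a search for shared upstream dependencies. The service map and function names below are assumptions made for illustration, and the result is only a list of candidates for a single problem record, not a diagnosis.

```python
# Hypothetical dependency map: service -> services it calls directly.
depends_on = {
    "checkout": ["payments-api"],
    "search": ["index"],
    "reporting": ["warehouse"],
    "payments-api": ["auth", "database"],
    "index": ["database"],
    "warehouse": ["database", "object-store"],
}

def upstream(service, depends_on):
    """All direct and transitive dependencies of `service`."""
    seen, stack = set(), [service]
    while stack:
        for dep in depends_on.get(stack.pop(), []):
            if dep not in seen:
                seen.add(dep)
                stack.append(dep)
    return seen

def shared_upstream(affected, depends_on):
    """Dependencies common to every affected service."""
    sets = [upstream(service, depends_on) for service in affected]
    return set.intersection(*sets) if sets else set()

# Three incidents that look unrelated from the outside.
print(shared_upstream(["checkout", "search", "reporting"], depends_on))
# {'database'}
```

A non-empty intersection is a reason to open the problem record early and point it at that dependency, while the incident response keeps working on restoring each service.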
That's the posture, in either uniform. The radios are different. The hoses are different. The job — and where the job stops — is the same.
The fire is out. The bridge is closed. The runbook is open. Your task now is to brief. The Marshal needs the timeline. The COO needs the language. The next on-call needs the lesson.
Forward this dispatch to the people who set the budget for the next one.