Incident KRIs: Volume, Severity, Time to Contain, Time to Resolve, and Root Cause Patterns
How to build an incident KRI dashboard that measures what regulators actually care about—response times, severity patterns, repeat incident rates, and root cause closure.
TL;DR
- Incident KRIs measure whether your program generates learning and accountability—not just whether you logged what happened
- The core set: volume by severity, MTTD, MTTC, MTTR, regulatory notification timeliness, repeat incident rate, and corrective action aging
- Root cause trending is the leading indicator most programs skip—rising repeat rates in any category are a governance problem, not just an operational one
- OCC and FFIEC examiners look for evidence that incidents produce improvement, not just documentation
The CRO pulls up last quarter’s board deck and asks three questions: How many P1 incidents did we have? What was our average time to contain? And how many had the same root cause as the quarter before?
If your team scrambles to pull spreadsheets, the answer is already bad. Not because of the numbers, but because the scramble reveals that you don’t have incident KRIs. You have an incident log.
The difference matters when a regulator walks in. OCC and FFIEC examiners don’t just want to see that you tracked incidents. They want evidence that your program generates learning, drives accountability, and catches deterioration before it becomes a 36-hour notification event. That’s what KRIs do that logs don’t.
Why Incident Metrics Need a KRI Layer
Most incident programs are good at recording. They capture what happened, when it was detected, when it was resolved, and who owned the response. That’s a log. KRIs are different: they’re the metrics that tell you whether the program itself is healthy—whether it’s improving or degrading over time.
The 36-hour computer-security incident notification rule, issued by the federal banking agencies and codified for OCC-supervised institutions at 12 CFR Part 53, set an explicit external performance standard: a banking organization must notify its primary federal regulator as soon as possible and no later than 36 hours after determining that a notification incident has occurred. That’s a clock you can’t miss. Your KRIs should be calibrated to catch the conditions that put that clock at risk before the clock starts.
An institution tracking strong MTTD and MTTC metrics will almost never have a 36-hour notification problem—because it detects and contains quickly enough to make the determination and notification with time to spare. An institution that can’t answer basic questions about its response times is flying blind toward that deadline.
The Core Incident KRI Set
1. Incident Volume by Severity Tier
Track total incidents by severity tier (P1/Critical, P2/High, P3/Moderate, P4/Low) month-over-month. Volume alone doesn’t tell you much—a high-transaction-volume fintech might generate dozens of P4 events weekly without any systemic concern. What matters is trend and composition.
Rising P1 volume over consecutive quarters is a KRI. A shift from mostly P4 to increasing P2/P3 is a KRI signaling control degradation upstream. Flat volume with increasing customer impact per incident is a KRI showing that your severity classification may be understating harm.
Threshold guidance:
- Green: No quarter-over-quarter increase in P1/P2 volume; P1 incidents remain within board-approved tolerance
- Amber: 15–25% increase in P1/P2 volume quarter-over-quarter, or isolated spike with documented root cause
- Red: >25% increase in P1/P2 volume without identifiable cause, or P1 count exceeds board-approved tolerance
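As a rough illustration, the volume thresholds above can be written as a small classifier. This is a minimal sketch, not a prescribed method: the function name `volume_status` and its parameters are hypothetical, and two behaviors are assumptions the guidance does not specify (increases below 15% are treated as green, and any increase from a zero baseline is treated as red). The qualitative conditions, such as whether a spike has a documented root cause, still need human judgment.

```python
def volume_status(prev_p1p2: int, curr_p1p2: int, curr_p1: int, tolerance: int) -> str:
    """Return 'green', 'amber', or 'red' for quarter-over-quarter P1/P2 volume.

    Assumptions (not from the guidance): increases under 15% count as green,
    and any increase from a zero baseline counts as red.
    """
    if curr_p1 > tolerance:
        return "red"  # P1 count exceeds board-approved tolerance
    if curr_p1p2 <= prev_p1p2:
        return "green"  # no quarter-over-quarter increase
    if prev_p1p2 == 0:
        return "red"  # percentage change is undefined; escalate for review
    change = (curr_p1p2 - prev_p1p2) / prev_p1p2
    if change > 0.25:
        return "red"
    if change >= 0.15:
        return "amber"
    return "green"


# Example: 20 P1/P2 incidents last quarter, 24 this quarter, 1 P1 against a tolerance of 2
example_status = volume_status(prev_p1p2=20, curr_p1p2=24, curr_p1=1, tolerance=2)
```

In this example the increase is 20%, which falls in the amber band.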
2. Mean Time to Detect (MTTD)
MTTD measures how much time elapses between an incident occurring and your team becoming aware of it. This is where most incident programs are blind. An alert-heavy environment can still have poor MTTD if alerts aren’t reviewed promptly, if monitoring coverage has gaps, or if teams have alert fatigue from too many false positives.
Slow detection degrades MTTR silently. If MTTR is measured from the time the incident occurred and you’re not catching P1 incidents for four hours, your 24-hour MTTR target has already shrunk to a 20-hour window before the response even starts.
Threshold guidance:
- Green: P1 MTTD under 1 hour; P2 MTTD under 4 hours
- Amber: P1 MTTD 1–4 hours consistently; P2 MTTD 4–12 hours
- Red: P1 MTTD exceeding 4 hours; any single P1 incident that takes more than 8 hours to detect
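To make the P1 thresholds concrete, here is a minimal sketch that computes MTTD from occurrence and detection timestamps and maps it to a status. The function names and the `(occurred_at, detected_at)` pair format are assumptions for illustration. The word "consistently" in the amber band is not modeled; this version looks at a single reporting period only.

```python
from datetime import datetime, timedelta
from statistics import mean


def detection_hours(incidents):
    """Hours from occurrence to detection for each (occurred_at, detected_at) pair."""
    return [(detected - occurred).total_seconds() / 3600 for occurred, detected in incidents]


def p1_mttd_status(incidents):
    """Classify one period of P1 incidents against the P1 MTTD thresholds."""
    hours = detection_hours(incidents)
    mttd = mean(hours)
    if mttd > 4 or max(hours) > 8:
        return "red"  # mean over 4 hours, or any single detection over 8 hours
    if mttd >= 1:
        return "amber"
    return "green"


# Example: two P1 incidents detected after 30 minutes and 9 hours
t0 = datetime(2024, 3, 1, 9, 0)
example = [(t0, t0 + timedelta(minutes=30)), (t0, t0 + timedelta(hours=9))]
example_status = p1_mttd_status(example)
```

The example returns red on both counts: the mean is 4.75 hours and one incident took more than 8 hours to detect.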
3. Mean Time to Contain (MTTC)
Containment means the incident is no longer spreading. Customer impact is capped, systems are isolated or failed over, and further damage is stopped. This is different from resolution, which means the problem is fully fixed and normal operations are restored.
These two metrics get conflated constantly in incident reporting, and the conflation matters. Tracking them separately tells you exactly where your response process breaks down: is the gap in the detect-to-contain phase (awareness and initial response), or contain-to-resolve (root cause identification and remediation)?
For critical incidents, containment should target under four hours. High-severity incidents should target under eight hours.
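One way to keep containment and resolution separate is to store four timestamps per incident and report each phase on its own. The sketch below assumes a hypothetical `Incident` record with `occurred`, `detected`, `contained`, and `resolved` fields. Measuring containment from detection rather than from occurrence is a choice made here for illustration; use whichever convention your program defines.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class Incident:
    occurred: datetime
    detected: datetime
    contained: datetime
    resolved: datetime


def _hours(start: datetime, end: datetime) -> float:
    return (end - start).total_seconds() / 3600


def phase_means(incidents):
    """Mean hours spent in each response phase across a list of incidents."""
    n = len(incidents)
    return {
        "occur_to_detect": sum(_hours(i.occurred, i.detected) for i in incidents) / n,
        "detect_to_contain": sum(_hours(i.detected, i.contained) for i in incidents) / n,
        "contain_to_resolve": sum(_hours(i.contained, i.resolved) for i in incidents) / n,
    }


# Example: two incidents with different phase profiles
t0 = datetime(2024, 3, 1, 9, 0)
h = lambda n: t0 + timedelta(hours=n)
example_means = phase_means([
    Incident(h(0), h(1), h(3), h(10)),
    Incident(h(0), h(3), h(4), h(24)),
])
```

Here the contain-to-resolve phase averages 13.5 hours against 1.5 hours for detect-to-contain, which points at remediation rather than initial response as the slow step.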
4. Mean Time to Resolve (MTTR)
Resolution is full restoration—root cause identified, fix deployed, systems normalized, post-incident review scheduled. The cross-industry MTTR average runs roughly 72 hours, but financial institutions with mature programs average 15–24 hours for critical incidents (FS-ISAC, 2023 benchmarks).
Be careful not to close incidents prematurely to hit MTTR targets. A common pattern: incidents get marked “resolved” at containment, the underlying root cause fix gets tracked as a separate project, and MTTR looks clean while the actual vulnerability remains open. Incident triage and severity classification discipline directly affects the integrity of your MTTR data.
| Severity | MTTD Target | MTTC Target | MTTR Target |
|---|---|---|---|
| P1 Critical | < 1 hour | < 4 hours | < 24 hours |
| P2 High | < 4 hours | < 8 hours | < 48 hours |
| P3 Moderate | < 12 hours | < 24 hours | < 5 business days |
| P4 Low | < 24 hours | N/A | < 10 business days |
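The target table translates naturally into a lookup that flags which metrics missed target for a given severity tier. This is a sketch under stated assumptions: the names `TARGETS_HOURS` and `breaches` are hypothetical, and the business-day MTTR targets for P3 and P4 are left out (set to `None`) because converting them to hours requires a business calendar that the table does not define.

```python
# Targets in hours, taken from the table above. None means no target is checked.
TARGETS_HOURS = {
    "P1": {"mttd": 1, "mttc": 4, "mttr": 24},
    "P2": {"mttd": 4, "mttc": 8, "mttr": 48},
    "P3": {"mttd": 12, "mttc": 24, "mttr": None},   # MTTR target is in business days
    "P4": {"mttd": 24, "mttc": None, "mttr": None},  # MTTC is N/A; MTTR is in business days
}


def breaches(severity: str, observed: dict) -> list:
    """Return the metrics whose observed hours meet or exceed the target.

    Targets are stated as 'less than', so an observed value equal to the
    target counts as a breach.
    """
    return [
        metric
        for metric, target in TARGETS_HOURS[severity].items()
        if target is not None and metric in observed and observed[metric] >= target
    ]


# Example: a P1 quarter with fast detection but slow containment
example_breaches = breaches("P1", {"mttd": 0.5, "mttc": 5, "mttr": 20})
```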
5. Regulatory Notification Rate and Timeliness
Under 12 CFR Part 53, banking organizations must notify the OCC within 36 hours of determining that a notification incident has occurred. Track two separate KRIs here:
Notification rate: What percentage of your incidents triggered the regulatory notification threshold? Tracking this over time shows whether your exposure to notification-level events is stable, growing, or declining.
Notification timeliness: Of incidents that required notification, what percentage were reported within the 36-hour window? Late or missed notifications are Matters Requiring Attention (MRA) material.
A secondary signal: incidents that required analysis to determine whether notification was required. If this category is growing, your severity classification framework may be creating unnecessary ambiguity at the reporting threshold.
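Both notification KRIs can be computed from two timestamps per incident. The sketch below assumes a hypothetical record format: `determined_at` is when the organization determined a notification incident had occurred (or `None` if the incident never met the threshold), and `notified_at` is when the regulator was notified (or `None` if no notification was sent).

```python
from datetime import datetime, timedelta

NOTIFICATION_WINDOW = timedelta(hours=36)


def notification_kris(incidents):
    """Return (notification_rate, timeliness) for a list of incident dicts.

    timeliness is None when no incident in the period required notification.
    """
    notifiable = [i for i in incidents if i["determined_at"] is not None]
    on_time = [
        i for i in notifiable
        if i["notified_at"] is not None
        and i["notified_at"] - i["determined_at"] <= NOTIFICATION_WINDOW
    ]
    rate = len(notifiable) / len(incidents) if incidents else 0.0
    timeliness = len(on_time) / len(notifiable) if notifiable else None
    return rate, timeliness


# Example: four incidents, two notifiable, one of them reported late
t0 = datetime(2024, 5, 1, 8, 0)
example_rate, example_timeliness = notification_kris([
    {"determined_at": None, "notified_at": None},
    {"determined_at": None, "notified_at": None},
    {"determined_at": t0, "notified_at": t0 + timedelta(hours=30)},
    {"determined_at": t0, "notified_at": t0 + timedelta(hours=40)},
])
```

A missing `notified_at` on a notifiable incident counts against timeliness, so missed notifications and late notifications both pull the figure down.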
6. Repeat Incident Rate by Root Cause Category
This is the metric most programs skip, and it correlates most directly with program maturity.
Categorize incidents by root cause: technology failure, human error, process failure, third-party or vendor failure, external event (fraud, cyberattack, natural event). Track month-over-month: what percentage of incidents share a root cause category with an incident from the prior 90 days?
A repeat incident rate above 25–30% in any category means your corrective action plans either aren’t being implemented or aren’t addressing the actual root cause. The Basel Committee’s principles for sound operational risk management explicitly connect loss data analysis to identifying control weaknesses and repeating risk patterns—institutions that can’t demonstrate declining repeat rates over time have an examination finding waiting to happen.
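The repeat-rate calculation is simple enough to sketch. The version below takes `(date, root_cause_category)` pairs and, for each category, returns the share of incidents in the reporting period that had an earlier incident in the same category within the lookback window. The function name and input format are assumptions, as is the choice not to count a same-day incident as a prior occurrence.

```python
from collections import Counter
from datetime import date, timedelta


def repeat_rate_by_category(incidents, period_start, period_end, lookback_days=90):
    """Per-category share of in-period incidents that repeat a prior root cause."""
    window = timedelta(days=lookback_days)
    totals, repeats = Counter(), Counter()
    for d, category in incidents:
        if not (period_start <= d <= period_end):
            continue
        totals[category] += 1
        # A repeat needs an earlier incident in the same category within the window.
        if any(c2 == category and timedelta(0) < d - d2 <= window for d2, c2 in incidents):
            repeats[category] += 1
    return {category: repeats[category] / totals[category] for category in totals}


# Example: March reporting period with history from earlier months
history = [
    (date(2023, 10, 1), "process"),
    (date(2024, 1, 10), "vendor"),
    (date(2024, 3, 5), "vendor"),
    (date(2024, 3, 12), "human_error"),
    (date(2024, 3, 20), "process"),
]
example_rates = repeat_rate_by_category(history, date(2024, 3, 1), date(2024, 3, 31))
```

In the example, the March vendor incident repeats the January one, while the process incident does not count as a repeat because the earlier process incident falls outside 90 days.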
The July 2024 CrowdStrike outage is an instructive case at scale: a single third-party software update created cascading failures across global financial institutions, airlines, and healthcare systems simultaneously. Institutions that had previously tracked third-party technology failure as a root cause category—and had open corrective actions from prior vendor incidents—were exposed to a governance question they needed to answer quickly.
7. Open Corrective Actions: Age and Closure Rate
Every significant incident should produce at least one corrective action with a defined owner and due date. Track:
- Open corrective actions by age bucket: 0–30 days, 31–60 days, 61–90 days, 90+ days
- Overdue rate: Corrective actions past their due date that remain open
- Reopen rate: Corrective actions marked closed that are linked to a subsequent incident in the same root cause category
Corrective action ownership without accountability produces the classic pattern: lots of logged actions, minimal closure, and the same categories showing up in the repeat incident KRI. The KRI governance and ownership framework covers how to build accountability structures that actually close the loop.
Threshold guidance:
- Green: < 10% of corrective actions overdue; no corrective actions open > 90 days without documented extension
- Amber: 10–25% of corrective actions overdue; some items in 90+ day bucket with documented reason
- Red: > 25% overdue; corrective actions in 90+ day bucket without documented rationale or escalation
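The aging buckets and overdue thresholds can be sketched as follows. The field names (`opened`, `due`, `extension_documented`) are hypothetical, and the overdue rate is computed over open actions only, which is an assumption. The buckets listed above overlap at exactly 90 days; this version places day 90 in the 61–90 bucket. The amber condition for documented 90+ day items is not modeled separately.

```python
from datetime import date


def aging_bucket(opened: date, today: date) -> str:
    """Assign an open corrective action to an age bucket (day 90 falls in 61-90)."""
    age = (today - opened).days
    if age <= 30:
        return "0-30"
    if age <= 60:
        return "31-60"
    if age <= 90:
        return "61-90"
    return "90+"


def corrective_action_status(open_actions, today: date) -> str:
    """Classify a list of open corrective actions as green, amber, or red."""
    if not open_actions:
        return "green"
    overdue_rate = sum(a["due"] < today for a in open_actions) / len(open_actions)
    stale_unexplained = any(
        aging_bucket(a["opened"], today) == "90+" and not a["extension_documented"]
        for a in open_actions
    )
    if overdue_rate > 0.25 or stale_unexplained:
        return "red"
    if overdue_rate >= 0.10:
        return "amber"
    return "green"


# Example: ten open actions, one past its due date
today = date(2024, 6, 30)
on_track = {"opened": date(2024, 6, 1), "due": date(2024, 7, 15), "extension_documented": False}
late = {"opened": date(2024, 6, 1), "due": date(2024, 6, 15), "extension_documented": False}
example_status = corrective_action_status([on_track] * 9 + [late], today)
```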
8. Customer Impact Rate
What percentage of incidents resulted in customer-facing impact? Of those, what was the average duration of impact and the estimated customer count affected?
Persistent customer-facing incidents—even at P2/P3 severity—suggest systemic resilience gaps. Track this separately from internal operational incidents. Your regulators and your customers care about different things; your KRI dashboard should reflect both.
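The three customer impact figures can come out of one pass over the incident data. A minimal sketch, assuming each incident record carries a hypothetical `customer_impact` flag plus `impact_hours` and `customers_affected` values for the incidents that were customer-facing:

```python
def customer_impact_summary(incidents):
    """Return the impact rate plus average duration and customer count for impacted incidents."""
    impacted = [i for i in incidents if i["customer_impact"]]
    if not impacted:
        return {"impact_rate": 0.0, "avg_impact_hours": None, "avg_customers_affected": None}
    return {
        "impact_rate": len(impacted) / len(incidents),
        "avg_impact_hours": sum(i["impact_hours"] for i in impacted) / len(impacted),
        "avg_customers_affected": sum(i["customers_affected"] for i in impacted) / len(impacted),
    }


# Example: four incidents, two of them customer-facing
example_summary = customer_impact_summary([
    {"customer_impact": False},
    {"customer_impact": False},
    {"customer_impact": True, "impact_hours": 2, "customers_affected": 100},
    {"customer_impact": True, "impact_hours": 4, "customers_affected": 300},
])
```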
Setting Thresholds Against Your Risk Appetite
Incident KRI thresholds aren’t benchmarks you copy from a framework. They’re calibrated to your risk appetite, your business model, and your regulatory environment.
A real-time payments processor with 24/7 transaction volumes has near-zero tolerance for P1 incidents lasting more than two hours—because two hours at peak can mean millions in failed transactions and regulatory notification. A community bank with lower transaction intensity may calibrate differently.
Start with the question your board actually cares about: how much customer impact and regulatory exposure is acceptable, and under what conditions? Work backward to the operational metrics that predict that impact before it occurs. The KRI thresholds and false green/false red guidance covers calibration mechanics in detail.
The OCC’s 2025 Cybersecurity and Financial System Resilience Report emphasized that institutions need to demonstrate not just that they detect and respond to incidents, but that they learn from them. That’s exactly what root cause KRIs are designed to surface.
Root Cause Patterns as Leading Indicators
If you track 12 months of incidents by root cause category, patterns emerge that single-incident analysis misses entirely.
A cluster of third-party vendor failures in Q1 and Q2 is a leading indicator for Q3—especially if the underlying vendor relationships haven’t been remediated or renegotiated. Repeating human-error incidents in a specific process often precede a larger failure when that error hits a high-value transaction at a critical moment. Technology failure incidents concentrated in a specific system or platform suggest capacity or maintenance issues building toward a more significant outage.
This is how operational risk KRIs function as leading indicators rather than lagging scorecards. The Basel Committee’s operational risk principles have long connected loss data analysis to prospective risk identification, and the same principle applies at the incident level.
The repeat incident rate, combined with root cause trending, gives you the data to tell your board: “We’ve had four operational failures traced to the same process gap in 90 days. Here’s our remediation plan and our closure KPI.” That’s a fundamentally different conversation than “we had four incidents.”
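Root cause trending can start as a simple count per quarter and category. The sketch below flags categories whose incident count rose in every consecutive quarter of a chosen window. The function names and the "strictly rising" rule are assumptions for illustration; a real program would likely pair this with volume-adjusted rates and a minimum count so that a move from one incident to two does not raise a flag on its own.

```python
from collections import Counter
from datetime import date


def quarter_of(d: date):
    """Map a date to a (year, quarter) tuple."""
    return (d.year, (d.month - 1) // 3 + 1)


def rising_categories(incidents, quarters):
    """Return categories whose count strictly increased across each consecutive quarter."""
    if len(quarters) < 2:
        return []
    counts = Counter((quarter_of(d), category) for d, category in incidents)
    categories = {category for _, category in incidents}
    rising = []
    for category in sorted(categories):
        series = [counts[(q, category)] for q in quarters]
        if all(a < b for a, b in zip(series, series[1:])):
            rising.append(category)
    return rising


# Example: vendor incidents go 1, 2, 3 across Q1 to Q3; human error goes 2, 2, 1
log = [
    (date(2024, 1, 15), "vendor"),
    (date(2024, 4, 2), "vendor"), (date(2024, 5, 20), "vendor"),
    (date(2024, 7, 1), "vendor"), (date(2024, 8, 9), "vendor"), (date(2024, 9, 30), "vendor"),
    (date(2024, 2, 1), "human_error"), (date(2024, 2, 20), "human_error"),
    (date(2024, 5, 5), "human_error"), (date(2024, 6, 6), "human_error"),
    (date(2024, 8, 1), "human_error"),
]
example_rising = rising_categories(log, [(2024, 1), (2024, 2), (2024, 3)])
```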
Building the Dashboard
A practical incident KRI dashboard doesn’t require a sophisticated GRC platform. It requires consistent data entry, defined ownership, and a regular review cadence.
The minimum viable setup:
- Weekly incident data entry by the incident response team (severity, root cause category, MTTD/MTTC/MTTR, customer impact yes/no)
- Monthly KRI calculation and threshold assessment by the risk team
- Quarterly board reporting with trend charts, threshold status, and corrective action aging
The KRI Library (132 Key Risk Indicators) includes the full incident KRI set (volume, severity, containment time, resolution time, repeat rate, and corrective action aging) with calibrated thresholds ready to drop into your operational risk reporting. If you’re building this from scratch, it saves months of calibration work.
So What?
If your incident program can’t answer the CRO’s three questions—current volume by severity, average response times, and repeat rate by root cause—it’s time to build the KRI layer.
Start with what you can measure today: pull the last 90 days of incidents, classify by severity and root cause category, and calculate rough MTTD/MTTC/MTTR by severity tier. That baseline tells you where the gaps are and gives you the first data point for trend tracking.
Then set thresholds calibrated to your risk appetite, assign ownership, and put the dashboard in front of the board quarterly. Incident KRIs that trend in the wrong direction without a documented remediation plan aren’t a monitoring problem. They’re a governance problem.
Regulators examining your incident program aren’t looking for a clean log. They’re looking for evidence that your program generates accountability and improvement over time. Incident KRIs are how you prove it.
Sources:
- 12 CFR Part 53 — Computer-Security Incident Notification
- OCC 2025 Cybersecurity and Financial System Resilience Report
- Splunk — Top 8 Incident Response Metrics
- Rootly — Incident Response Metrics: Complete Guide to MTTD, MTTR, MTTC & More
- Basel Committee on Banking Supervision — Principles for the Sound Management of Operational Risk
Author
Rebecca Leung
Rebecca Leung has 8+ years of risk and compliance experience across first and second line roles at commercial banks, asset managers, and fintechs. Former management consultant advising financial institutions on risk strategy. Founder of RiskTemplates.