
KRI Thresholds: How to Stop Your Dashboard From Creating False Greens and False Reds

May 15, 2026 · Rebecca Leung

Your KRI dashboard is green. Every metric. Every domain. Has been for the past nine months. Your CRO just got asked by the audit committee whether the green dashboard means the firm is well-controlled, or whether it means the dashboard is broken.

If you can’t answer that question with data, the dashboard is broken.

TL;DR

  • False greens hide risk because thresholds are set too loose. False reds train people to ignore the dashboard because thresholds are set against noise.
  • The fix is statistical, not theatrical: 12-24 months of historical data, threshold set at 95th percentile or risk appetite — whichever is tighter — and a 60-day parallel run before go-live.
  • Red threshold must be tighter than or equal to the risk appetite limit for the same risk. If it’s looser, your dashboard is structurally incapable of warning you.
  • The deadliest KRI failure mode is “perpetual green” — a metric that has not triggered amber in 18+ months and that nobody checks. Cut it, retire it, or recalibrate it.

Why Most KRI Thresholds Are Wrong on Day One

Walk into any mid-size financial institution and pull up the KRI library. Most thresholds will be round numbers — 5%, 10%, 25 events. They were set by a working group three years ago based on what felt reasonable at the time. Nobody pulled historical data. Nobody computed a distribution. Nobody asked whether the threshold was tighter or looser than the risk appetite statement.

The result is a dashboard that’s wrong in two specific, opposite ways.

Failure mode 1: False greens

The threshold is set looser than it should be. Risk increases, the metric moves, but it stays inside green territory until the loss event actually happens. The dashboard stays calm right up until the moment it doesn’t matter anymore.

This is what happened at Silicon Valley Bank. Per the OIG’s Material Loss Review, SVB had liquidity metrics in place but the thresholds assumed deposit stickiness that didn’t hold in stress. The metrics weren’t lying — they were calibrated against a deposit base that didn’t exist anymore. By the time the early warning indicators actually triggered, the run was already happening.

False greens are dangerous because they generate confidence. The audit committee sees a green dashboard, concludes risk is well-managed, and moves on. The metric is decorative.

Failure mode 2: False reds

The threshold is set tighter than necessary, or against natural variance. The metric trips amber or red on noise — seasonality, normal business variation, one bad week — without any actual risk increase. The first time you investigate, you find nothing. The second time, you find nothing. By the third time, the operational risk team stops investigating amber signals on this metric.

That’s alert fatigue, and it’s not theoretical. A practical rule of thumb: when more than 20% of amber/red signals don’t correspond to a real risk event, the dashboard loses credibility, and the metric is functionally retired even if it stays on the dashboard.

The Threshold Calibration Workflow

There’s a defensible method to setting thresholds. Use it.

Step 1: Pull 12-24 months of metric history

You cannot calibrate a threshold without a distribution. If the metric is new, run it in shadow mode for 90 days before setting any threshold. If the metric existed before but wasn’t tracked formally, pull the historical data from source systems even if it wasn’t in the dashboard.

Compute three things:

  • Mean — what’s normal
  • Standard deviation — how much normal varies
  • 95th percentile — what an unusually high reading looks like under historical conditions
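The three statistics above need nothing beyond the standard library. A minimal sketch, assuming monthly readings of a higher-is-worse KRI; the sample values and the nearest-rank percentile choice are illustrative, not prescribed by any standard:

```python
import math
import statistics

def summarize_history(readings: list[float]) -> dict[str, float]:
    """Mean, standard deviation, and nearest-rank 95th percentile of a KRI history."""
    if len(readings) < 12:
        raise ValueError("need at least 12 months of history to calibrate")
    ordered = sorted(readings)
    # Nearest-rank 95th percentile: smallest value >= 95% of readings.
    p95 = ordered[math.ceil(0.95 * len(ordered)) - 1]
    return {
        "mean": statistics.mean(readings),
        "stdev": statistics.stdev(readings),
        "p95": p95,
    }

# Twelve hypothetical monthly readings (e.g. a payment-failure rate, %).
history = [0.8, 0.9, 1.1, 0.7, 1.0, 1.2, 0.9, 1.4, 1.0, 0.8, 1.1, 0.9]
stats = summarize_history(history)
```

With more history or seasonality, a proper quantile estimator (e.g. `statistics.quantiles`) is worth the extra care; the guard against short histories enforces the 12-month minimum from Step 1.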

Step 2: Find the risk appetite anchor

Pull the risk appetite statement for the corresponding risk. Find the quantitative limit. The 95th percentile from Step 1 and the appetite limit are your two candidate red thresholds.

Rule: red threshold equals the tighter of the two. If your appetite says 2% and your 95th percentile is 1.4%, your red is 1.4%. If your 95th percentile is 3.5% and your appetite is 2%, your red is 2% — and you have a separate problem because your historical data suggests you’ve been breaching appetite without escalating.

Step 3: Set amber at 70-80% of red

The amber band gives Treasury, ops, or the relevant 1LoD team room to act before red breaches require board escalation. Too narrow (95% of red) and amber becomes a synonym for red. Too wide (50%) and amber loses urgency.
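Steps 2 and 3 reduce to a few lines. A sketch assuming a metric where higher readings mean more risk, so "tighter" means the lower value; the function name, 75% amber default, and breach flag are illustrative:

```python
def set_thresholds(p95: float, appetite_limit: float,
                   amber_fraction: float = 0.75) -> dict:
    """Red = tighter (lower) of historical p95 and appetite limit;
    amber defaults to 75% of red, inside the 70-80% band."""
    red = min(p95, appetite_limit)
    return {
        "red": red,
        "amber": round(amber_fraction * red, 4),
        # True means history ran hotter than appetite: investigate
        # why past breaches were never escalated.
        "appetite_breach_in_history": p95 > appetite_limit,
    }

# Appetite says 2%, historical p95 is 1.4% -> red at 1.4%, amber at 1.05%.
t = set_thresholds(p95=1.4, appetite_limit=2.0)
```

For lower-is-worse metrics (e.g. LCR), the comparison flips: red is the higher of the two candidates.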

Step 4: Run a 60-day parallel period

Before the threshold goes live in board reporting, run it in parallel: track when amber and red would have triggered, and for each signal, ask whether there was a corresponding real risk event (loss, near-miss, incident, control failure).

Parallel run signal patternWhat it meansWhat to do
>20% of amber/red signals have no real-risk correlateFalse reds — threshold too tight or wrong metricWiden bands or replace metric
Real risk events happen without amber triggersFalse greens — threshold too looseTighten bands
1-5% amber rate, real risk events trigger amber 80%+ of the timeGoldilocksGo live
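The decision rule above can be sketched as one small function. The signal encoding and all names are assumptions for illustration, not any particular GRC tool's API:

```python
def parallel_run_verdict(signals: list[tuple[bool, bool]],
                         real_events: int, events_caught: int) -> str:
    """signals: one (fired_amber_or_red, matched_real_event) pair per
    observation in the parallel run; real_events / events_caught count
    actual risk events and how many of them had an amber warning."""
    fired = [s for s in signals if s[0]]
    false_rate = sum(1 for s in fired if not s[1]) / len(fired) if fired else 0.0
    catch_rate = events_caught / real_events if real_events else 1.0
    amber_rate = len(fired) / len(signals) if signals else 0.0
    if false_rate > 0.20:
        return "false reds: widen bands or replace metric"
    if catch_rate < 0.80:
        return "false greens: tighten bands"
    if 0.01 <= amber_rate <= 0.05:
        return "go live"
    return "recalibrate and re-run"
```

The catch-all branch matters: a run that is neither clearly noisy nor clearly blind still is not evidence the threshold works.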

Step 5: Annual recalibration

Re-run the distribution every year. Compare current 95th percentile to last year’s. If it’s drifted materially, the threshold needs to drift too — or the risk environment has shifted and the threshold should hold while you investigate. Document either choice in writing.

The “Perpetual Green” Audit

Pull every KRI in your library. For each one, answer two questions:

  1. Has this metric triggered amber or red in the last 18 months?
  2. If amber/red were triggered today, would anyone actually do anything?

If the answer to both is no, the KRI is dead weight. It’s not key — it’s ornamental. The threshold is wrong, the metric is wrong, or the risk it’s monitoring isn’t actually material to the firm.
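The 18-month screen is a one-pass filter over the KRI library. A hypothetical sketch; the record fields (`name`, `last_amber`) and the 548-day cutoff are illustrative:

```python
from datetime import date, timedelta

def perpetual_green(kris: list[dict], today: date) -> list[str]:
    """Flag KRIs whose last amber/red trigger is over ~18 months old,
    or that have never triggered at all."""
    cutoff = today - timedelta(days=548)  # roughly 18 months
    return [k["name"] for k in kris
            if k["last_amber"] is None or k["last_amber"] < cutoff]

# Hypothetical library extract.
library = [
    {"name": "failed_logins", "last_amber": None},            # never fired
    {"name": "wire_exceptions", "last_amber": date(2026, 3, 2)},
    {"name": "sla_breaches", "last_amber": date(2024, 1, 15)},
]
flagged = perpetual_green(library, today=date(2026, 5, 15))
```

The second audit question ("would anyone act?") resists automation; this filter only produces the shortlist to take into that conversation.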

Three options:

Option A: Recalibrate. Pull the underlying data, redo the distribution analysis from Step 1, and tighten the bands. Most “perpetual green” KRIs come back to life with proper calibration.

Option B: Replace. The metric is measuring the wrong thing. Cyber’s “number of failed login attempts” KRI is famously perpetual-green at most institutions because raw failed logins are not the relevant risk signal — failed logins per privileged account in a 24-hour window is.

Option C: Retire. The risk it monitors isn’t material anymore, or the metric is decorative. Move it to a watch list — tracked but not reported — and free up board reporting bandwidth for KRIs that matter.

Practitioners building libraries from scratch should start with our KRI guide with 50+ examples by risk domain, which covers domain-specific metrics across operational, credit, cyber, liquidity, third-party, and model risk.

Examples: Common KRI Threshold Mistakes

Concrete failure modes from real risk libraries.

Liquidity KRI: LCR threshold set to regulatory minimum

The mistake: Red at 100% LCR (regulatory minimum). Amber at 110%.

Why it fails: By the time LCR hits 110%, the institution is already in serious trouble — wholesale funding has tightened, deposits are draining, and the buffer is being eaten. Amber should trigger at 130-140% LCR for most institutions. Red at 115-120%. The regulatory minimum is not a risk threshold — it’s a violation threshold.

Cyber KRI: critical vulnerability patching SLA

The mistake: Red at “100% of critical vulnerabilities patched within 30 days.” Amber at 95%.

Why it fails: Critical vulnerabilities under NIST SP 800-40 Rev. 4 and modern incident response guidance get exploited in days, not weeks. 30 days as a “critical” threshold is the OS-patching SLA from 2010. Red should be 7-14 days for true criticals. The threshold was set against historical IT practice, not current attacker velocity.

Compliance KRI: BSA SAR filing within 30/60 days

The mistake: Red at “any SAR filed after 30/60 day deadline.” Amber set at 90% timeliness.

Why it fails: A single late SAR is a regulatory issue. Threshold should be 100% red — no amber band — and the metric should be supplemented by leading indicators (alerts pending review beyond 14 days, cases without disposition in 21 days). Lagging metrics with binary breach states need leading-indicator companions.

Operational risk KRI: number of operational loss events

The mistake: Red at “more than 10 operational losses per quarter.”

Why it fails: Loss count without severity tells you nothing. Ten $500 customer-reimbursement events are not the same risk profile as one $2 million wire fraud. Use severity-weighted aggregate loss, or split into count-and-severity matrices.

Third-party risk KRI: percentage of critical vendors with current SOC 2

The mistake: Red at “less than 90% of critical vendors have current SOC 2.”

Why it fails: 90% is a process metric, not a risk metric. The risk question is “which critical vendors don’t have a current SOC 2 and what compensating control covers the gap?” Red should be “any critical vendor without current SOC 2 or documented compensating control” — a 100% threshold tied to remediation status, not aggregate compliance.

Dynamic vs. Static Thresholds

Static thresholds — set annually, applied uniformly — are the default and the right choice for most KRIs. They’re explainable to examiners, auditable, and stable enough that a year-over-year comparison means something.

Dynamic thresholds — recalibrated automatically based on rolling statistics — are increasingly used for metrics with strong seasonality (fraud loss rates, transaction-volume KRIs, ATO attempt rates). They work, but they introduce three problems:

  1. Explainability. Examiners want to know why the threshold was X. “The model set it” is not an answer.
  2. Drift risk. A dynamic threshold trained on six months of escalating fraud will treat current fraud levels as normal. The base rate adjustment can mask deteriorating control environments.
  3. Override discipline. Dynamic thresholds need human override paths with documented criteria. Without them, you’re delegating risk appetite to a model.

If you use dynamic thresholds, write the recalibration algorithm in plain English, document the lookback window, set hard ceilings and floors that the dynamic threshold cannot move past, and require a documented human review before the threshold updates.
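Those guardrails can be expressed directly. A sketch under assumed parameters (180-day lookback, hypothetical floor and ceiling); the pending-approval dict stands in for whatever review workflow the institution actually uses:

```python
import math

def propose_red(history: list[float], window: int = 180,
                floor: float = 1.0, ceiling: float = 2.0) -> float:
    """Rolling-window red-threshold proposal, clamped to hard limits
    the dynamic recalibration cannot move past."""
    recent = sorted(history[-window:])
    p95 = recent[math.ceil(0.95 * len(recent)) - 1]
    return min(max(p95, floor), ceiling)

# The proposal stays pending until a named human reviewer signs off;
# only then does the live threshold update.
pending = {"proposed_red": propose_red([1.1, 1.3, 0.9, 1.6, 2.4, 1.2]),
           "approved_by": None}
```

Note what the ceiling does in this run: a spike in the lookback window pushes the raw p95 above the hard limit, and the clamp stops the model from quietly normalizing it.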

How Thresholds Should Connect to Risk Appetite

The red threshold for a KRI must equal or be tighter than the quantitative risk appetite limit for that risk. This is the simplest and most violated rule in KRI design.

| Risk | Appetite Statement | Wrong Threshold | Right Threshold |
| --- | --- | --- | --- |
| Liquidity | "Maintain LCR above 120% under base case" | Red at 100% LCR | Red at 120% LCR |
| Credit | "Net charge-offs below 1.2% annualized" | Red at 1.5% | Red at 1.2% |
| Cyber | "Zero critical vulnerabilities open beyond 14 days" | Red at >5% open | Red at any critical past 14 days |
| AML | "100% SARs filed within statutory deadline" | Red at <95% timeliness | Red at any late SAR |

When the threshold is looser than appetite, the dashboard cannot warn you that you’ve breached appetite. By design.
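The cross-check is mechanical once appetite limits sit next to thresholds in one place. An illustrative sketch for higher-is-worse metrics; the field names are assumptions:

```python
def appetite_gaps(kris: list[dict]) -> list[str]:
    """Return KRIs whose red threshold is looser (higher, for metrics
    where a higher reading means more risk) than the appetite limit."""
    return [k["name"] for k in kris if k["red"] > k["appetite_limit"]]

# Hypothetical extract: one looser-than-appetite threshold, one aligned.
library = [
    {"name": "net_charge_offs", "red": 1.5, "appetite_limit": 1.2},  # flag
    {"name": "late_sar_rate", "red": 0.0, "appetite_limit": 0.0},    # ok
]
flagged = appetite_gaps(library)
```

Anything this returns is the "fix it before the next board meeting" list from the actions below; equal thresholds pass, looser ones do not.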

For practitioners thinking about how KRIs feed into broader risk reporting, our piece on RCSA methodology and workshop facilitation walks through how risk ratings from the RCSA process should map cleanly to KRI red/amber breakpoints.

So What?

KRI thresholds are not a calibration detail. They’re the difference between a dashboard that warns you and a dashboard that lies to you.

Three actions to take this week:

  1. Pull the perpetual-green list. Any KRI that hasn’t triggered amber in 18 months goes on the audit list. Recalibrate, replace, or retire.
  2. Cross-check thresholds against appetite. For every quantitative appetite statement, find the matching KRI red threshold. If red is looser than appetite, fix it before the next board meeting.
  3. Document the 60-day parallel run policy. Any new KRI or recalibrated threshold goes through a parallel run before it goes live in board reporting. Get the policy in writing.

Our KRI Library ships with 50+ pre-calibrated KRIs across domains, threshold rationales, and the parallel-run template — built so the calibration work happens once, not every time the board asks why the dashboard is green.

Green dashboards make leadership comfortable. Comfortable leadership doesn’t ask questions. The question the audit committee should be asking is not “are we green?” — it’s “would we know if we weren’t?”


Frequently Asked Questions

What's a false green KRI?
A KRI that stays green even when the underlying risk is materializing. Usually a sign that the threshold is set too loose, the metric is measuring the wrong thing, or the data feed is stale. SVB's liquidity KRIs sat in acceptable territory until days before failure — the metrics weren't wrong, but the thresholds assumed deposit stickiness that didn't survive contact with reality.
What's a false red KRI?
A KRI that breaches a red threshold without any actual increase in risk — typically because the threshold is calibrated against noise rather than signal. Common causes: thresholds set as round numbers without statistical basis, seasonality not adjusted for, or a metric where the natural variance is wider than the threshold band. False reds train the organization to ignore the dashboard, which is worse than not having one.
How do you actually calibrate a KRI threshold?
Start with 12-24 months of historical data on the metric. Compute the mean, standard deviation, and 95th percentile. Set red at the 95th percentile or at the risk appetite limit, whichever is tighter. Set amber at 70-80% of red. Then run a 60-day parallel period before going live — if more than 20% of amber/red signals don't correspond to a real risk event, widen the bands. If real events don't trigger amber, tighten them. Iterate.
Should KRI thresholds be static or dynamic?
Most should be static and reviewed annually. Dynamic thresholds — where the system adjusts based on rolling statistics — are appropriate for metrics with strong seasonality or trend (e.g., fraud loss rates, transaction volumes) but introduce explainability problems for examiners. If you use dynamic thresholds, document the algorithm, the parameters, and the human override path. Don't let a black-box model decide when the board sees an amber.
What's the relationship between a KRI threshold and a risk appetite statement?
The red threshold should equal — or be tighter than — the corresponding risk appetite limit. If your appetite says 'no more than 2% of payments exceed 24-hour settlement,' your red threshold is 2%. Amber is 1.4-1.6%. Setting red looser than appetite means your dashboard will tell you everything's fine while you're already in breach. This is the most common KRI threshold finding in regulatory exams.
How many KRI thresholds should a single metric have?
Three is standard: green (normal), amber (warning), red (breach). Some institutions add a fourth — black — for catastrophic. Avoid more than four bands; the marginal information from a five-band threshold is usually noise, and humans struggle to act on more than three or four distinct signal levels. The goal isn't precision, it's escalation clarity.
Rebecca Leung

Rebecca Leung has 8+ years of risk and compliance experience across first and second line roles at commercial banks, asset managers, and fintechs. Former management consultant advising financial institutions on risk strategy. Founder of RiskTemplates.

Related Framework

KRI Library (132 Key Risk Indicators)

132 KRIs with thresholds, data sources, and escalation triggers pre-built for financial services.
