Risk Scoring Techniques: Likelihood × Impact and the 4 Variations Examiners Push Back On
TL;DR
- Likelihood × Impact is the standard — but four application errors turn it from a defensible tool into an examination finding generator
- The most common single finding: inherent and residual scores nearly identical, signaling controls were never actually evaluated
- “Problems with scoring methods and ordinal scales in risk assessment” formally documented why multiplying ordinal scales produces pseudo-quantitative results — use L×I for banding decisions, not fine-grained comparisons
- Impact thresholds must be calibrated to your institution’s size — a $200M credit union and a $10B bank cannot share the same dollar definitions of “Catastrophic”
You spent three days building your risk and control self-assessment. Forty risks, each scored on a 5×5 likelihood-impact matrix, heat map color-coded, formatted for the board deck. Then the examiner opens to risk #7 — Fraud Risk in Account Opening — and asks: “Walk me through how you arrived at a composite score of 20.”
And you realize: you don’t have a defensible answer.
This is the silent failure in most operational risk programs. The L×I formula is nearly universal — endorsed by COSO ERM, ISO 31000, and accepted by every bank examiner in the U.S. But how teams apply it determines whether the RCSA holds up to scrutiny or generates findings. Four variations account for most of the pushback.
The Math Problem Nobody Talks About
Before the four variations: a confession about the formula itself.
Likelihood × Impact sounds quantitative. It isn’t. The 1–5 scale you assign to likelihood is an ordinal scale — it ranks risks in order of rough probability but doesn’t measure equal intervals between ratings. A “3” likelihood isn’t exactly 1.5× more likely than a “2.” It just means “more probable.” When you multiply two ordinal numbers, you produce a risk priority number that looks precise but carries the same ordinal limitations.
The academic record on this is unambiguous. The 2010 paper “Problems with scoring methods and ordinal scales in risk assessment” (IBM Journal of Research and Development) established that ordinal-scale multiplication routinely produces reversed rankings and uninformative ratings — assigning identical scores to risks that differ by orders of magnitude, or ranking lower-risk scenarios above higher-risk ones. FMEA programs using Risk Priority Numbers run into the same structural problem.
None of this means you should abandon L×I. Most regulatory frameworks — COSO ERM and ISO 31000:2018 included — accept it as a practical approximation, and most examiners expect it. The problem arises when programs treat the output as more precise than it is: ranking Risk #37 (score 15) as meaningfully more concerning than Risk #22 (score 14) as if the one-point difference reflects real quantitative measurement. It doesn’t.
The defensible position: use L×I to categorize risks into bands (Critical, High, Medium, Low). Make decisions at the band level. Never present adjacent scores as precise comparisons across different risk categories.
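To make the band-level approach concrete, here is a minimal sketch in Python. The cut-points of 20, 12, and 6 are illustrative assumptions, not a standard; calibrate them to your own matrix and risk appetite.

```python
def composite_score(likelihood: int, impact: int) -> int:
    """Ordinal 1-5 likelihood times 1-5 impact: a banding input, not a measurement."""
    return likelihood * impact


def risk_band(score: int) -> str:
    """Map a composite score to a decision band.
    Cut-points are illustrative assumptions; calibrate to your own matrix."""
    if score >= 20:
        return "Critical"
    if score >= 12:
        return "High"
    if score >= 6:
        return "Medium"
    return "Low"


# Adjacent scores land in the same band: the one-point gap is noise, not signal.
print(risk_band(composite_score(4, 4)))  # 16 -> "High"
print(risk_band(composite_score(5, 3)))  # 15 -> "High"
```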
Variation 1: Inherent and Residual Scores Nearly Identical
This is the single most common RCSA finding in operational risk examinations — and the most telling signal that an assessment didn’t actually evaluate controls.
The theory is clear: inherent risk is the raw exposure before any controls; residual is what remains after. The gap between them quantifies the work your control environment is doing. A wide gap means strong controls with evidence. A narrow gap means either your controls are nearly ineffective or you didn’t evaluate them.
When assessors score inherent and residual in the same workshop, at the same time, for the same risk — the social dynamics of the room drive scores toward similarity. Nobody wants to say their business unit is catastrophically risky. So they assign an inherent score, knock it down a few points, and call it residual. The gap averages 10–15% across the portfolio.
An examiner sees a portfolio where every inherent score of 20 resolves to a residual of 17 or 18 and has one interpretation: the team started from the residual and worked backward. No real control evaluation occurred.
| Control Effectiveness | Expected Inherent-to-Residual Shift |
|---|---|
| Strong (tested, KRI-supported, audit opinion “effective”) | 60–75% risk reduction |
| Adequate (tested periodically, some gaps) | 40–60% reduction |
| Weak (design issues or testing gaps identified) | 10–30% reduction |
| Not Tested | Residual should remain near inherent until testing occurs |
Fix: Decouple the control evaluation from the inherent scoring. Before assigning a residual score, require an explicit control effectiveness rating (Strong, Adequate, Weak, Not Tested) for each control. Tie that rating to evidence — KRI data, last audit opinion, most recent control test result, loss event history. Document the evidence in the RCSA. The residual score then flows from the control effectiveness rating, not from intuition.
If there’s no evidence for a control rating, the control should be rated “Not Tested” and the residual should reflect that uncertainty — staying near inherent until testing is completed.
This is the structural fix that the RCSA methodology addresses in detail: separate the inherent discussion from the control evaluation, and require evidence before the residual is assigned.
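A minimal sketch of how the residual can flow mechanically from the control effectiveness rating rather than from intuition. The reduction percentages mirror the conservative end of each range in the table above, which is itself an assumption to tune; the helper name and rounding rule are illustrative.

```python
import math

# Conservative end of each reduction range from the table above (an assumption; tune to taste).
REDUCTION_BY_RATING = {
    "Strong": 0.60,
    "Adequate": 0.40,
    "Weak": 0.10,
    "Not Tested": 0.00,  # residual stays at inherent until testing occurs
}


def residual_score(inherent: int, control_rating: str) -> int:
    """Derive the residual from the inherent score and the evidenced control rating.
    Rounded up so the residual never understates remaining exposure."""
    reduction = REDUCTION_BY_RATING[control_rating]
    return max(1, math.ceil(inherent * (1 - reduction)))


# An inherent 20 with tested, Strong controls resolves to 8: a visible gap,
# not the 17-or-18 pattern examiners read as backing into the answer.
print(residual_score(20, "Strong"))      # 8
print(residual_score(20, "Not Tested"))  # 20
```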
Variation 2: Impact Thresholds Not Calibrated to Your Institution
Generic templates are convenient. They’re also one of the most reliable sources of examiner pushback for community and regional banks.
A template that defines “financial impact > $500,000 = Catastrophic” produces absurd results when applied without context. At a $50B regional bank, $500K is manageable. At a $200M community bank, $500K might represent a significant portion of quarterly net income. Same threshold, opposite practical meaning.
ISO 31000:2018 explicitly requires risk assessment to be conducted in the context of the organization — which includes size, complexity, and risk appetite. COSO ERM’s 2017 framework makes the same point in its guidance on setting risk tolerances: thresholds must be calibrated to what is material to your organization, not imported from a template written for a generic institution.
The tell in an examination: when a $200M credit union uses the same impact calibration table as a $10B bank because both used the same off-the-shelf RCSA template. Examiners at both the OCC and FDIC — whose Risk Management Manual of Examination Policies addresses risk rating adequacy — will ask whether your thresholds are calibrated to your actual risk profile.
Fix: Write your Catastrophic definition in organizational terms first: “An event that would materially impair our ability to operate or would require emergency board intervention.” Then translate that principle into dollar amounts, customer counts, regulatory consequences, and reputational harm specific to your institution’s size and risk appetite. Assign numbers to each severity level proportionally. Document the calibration rationale. When your size or risk profile changes materially — acquisition, product launch, regulatory consent order — revisit the calibration.
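A minimal sketch of that translation step, assuming each severity level is anchored to a share of annual net income. The percentages are illustrative assumptions, not regulatory values; substitute whatever base (capital, quarterly earnings, operating budget) your risk appetite statement actually uses.

```python
# Severity anchors as a share of annual net income (illustrative assumption, not a standard).
SEVERITY_SHARE_OF_NET_INCOME = {
    "Catastrophic": 0.50,
    "Major": 0.20,
    "Moderate": 0.05,
    "Minor": 0.01,
    "Negligible": 0.001,
}


def financial_impact_thresholds(annual_net_income: float) -> dict:
    """Translate qualitative severity definitions into dollar thresholds
    proportional to the institution's own earnings."""
    return {level: round(annual_net_income * share)
            for level, share in SEVERITY_SHARE_OF_NET_INCOME.items()}


# The same principle yields very different "Catastrophic" dollar lines
# for a small credit union and a multi-billion-dollar bank.
print(financial_impact_thresholds(2_000_000))
print(financial_impact_thresholds(150_000_000))
```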
The COSO ERM Framework guide covers this in the context of setting risk appetite and tolerance — the same calibration exercise, applied to the ERM architecture that feeds your RCSA.
Variation 3: Scores Without Supporting Evidence
You’ve rated Fraud Risk as Likelihood 5 × Impact 5 = 25. The examiner asks: “What evidence supports a likelihood rating of 5?” If the answer is “we discussed it in the workshop” or “it seemed right” — that’s a finding.
RCSA workshops are social processes. The most senior person in the room often anchors the score, the group adjusts toward consensus, and the discussion ends when agreement forms. This produces ratings that reflect hierarchy and gut feel, not observable data. Examiners increasingly expect scores to be grounded in something more durable.
Evidence that supports likelihood scores:
- Internal loss events from the past 12–24 months (count, dollar amount, frequency trend)
- Near-misses and operational incidents logged in the loss event database
- KRI trend data — are leading indicators moving toward or away from threshold?
- Internal audit findings in the relevant process area
- Industry loss data or regulatory enforcement actions in comparable peer institutions
Evidence that supports impact scores:
- Historical cost of similar events at your institution
- Regulatory penalty ranges for the relevant risk type
- Customer complaint volume or churn associated with similar events
- Estimated recovery time and operational disruption cost
The KRI Library exists precisely for this: pre-built indicators that give your team observable data points to anchor likelihood scores before the workshop begins. A pre-populated evidence brief — distributed before the session, not assembled during it — changes the conversation from “what do we feel?” to “what does the data show?”
None of this requires perfect data. A single documented evidence citation per dimension — “Likelihood = 4 based on three payment fraud events in the prior 24 months totaling $180K in recoverable losses” — transforms a subjective gut check into a defensible position.
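One way to make the one-citation-per-dimension rule enforceable is to treat the citation as a required field on the RCSA record itself. A minimal sketch, with the field names and example entry as illustrative assumptions:

```python
from dataclasses import dataclass


@dataclass
class ScoredDimension:
    """A likelihood or impact rating is incomplete without its evidence citation."""
    rating: int     # 1-5 ordinal rating
    evidence: str   # required: loss events, KRI trend, audit finding, peer enforcement action


@dataclass
class RcsaRiskEntry:
    risk_name: str
    likelihood: ScoredDimension
    impact: ScoredDimension


# Illustrative entry mirroring the example citation above.
entry = RcsaRiskEntry(
    risk_name="Payment fraud in account opening",
    likelihood=ScoredDimension(
        rating=4,
        evidence="Three payment fraud events in prior 24 months totaling $180K in recoverable losses",
    ),
    impact=ScoredDimension(
        rating=3,
        evidence="Historical remediation and recovery cost of comparable events at this institution",
    ),
)
print(entry.risk_name, entry.likelihood.rating, entry.likelihood.evidence)
```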
Variation 4: Using Composite Scores to Compare Across Risk Categories
This one is subtler — and shows up most often in board risk reporting.
When you rank “Cybersecurity Risk (score 21)” above “Regulatory Compliance Risk (score 18)” in a single league table, you’re implying they can be meaningfully compared on the same quantitative scale. In most RCSA designs, they can’t.
Cybersecurity and compliance risks have different consequence types, different evidence bases, different control environments, and different regulatory treatments. Their composite scores are ordinal rankings within their respective domains. Treating the scores as directly comparable produces misleading prioritization — one that regulators and internal auditors increasingly push back on at larger, more sophisticated institutions.
The Basel Committee’s Principles for the Sound Management of Operational Risk (PSMOR, 2021) emphasizes that operational risk programs should use multiple data inputs and avoid over-reliance on any single metric. Using cross-category score comparisons as a primary prioritization mechanism runs counter to this principle.
Fix: Present L×I scores as within-category prioritization tools. For the board risk report, use qualitative judgment — informed by the scores, but not mechanically produced by them — to structure the risk narrative. “Our top five operational risk concerns this quarter are…” followed by narrative context is more defensible than a league table of composite scores from different risk domains.
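A minimal sketch of within-category prioritization, reusing the banding idea from earlier. The risk names, categories, and scores are illustrative; the point is that ranking happens inside each category rather than across the whole portfolio in one league table.

```python
from collections import defaultdict

# (risk name, category, composite L x I score): illustrative entries only.
risks = [
    ("Ransomware on core banking platform", "Cybersecurity", 21),
    ("BSA/AML monitoring gaps", "Regulatory Compliance", 18),
    ("Vendor data-center outage", "Cybersecurity", 12),
    ("Fair lending disclosure errors", "Regulatory Compliance", 10),
]

by_category = defaultdict(list)
for name, category, score in risks:
    by_category[category].append((name, score))

# Rank within each category only; never merge into one cross-category league table.
for category, items in by_category.items():
    ranked = sorted(items, key=lambda item: item[1], reverse=True)
    print(category, ranked)
```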
So What?
None of these four variations require rebuilding your program. They require tightening the discipline around how scores get assigned and documented.
The practical path forward:
- Add a control effectiveness rating (Strong, Adequate, Weak, Not Tested) as a required field in your RCSA template, separate from the residual score. The residual flows from the rating, not the other way around.
- Calibrate impact thresholds to your institution’s actual size and risk appetite — in writing, with a documented rationale. Revisit whenever your profile changes materially.
- Build a pre-work evidence brief for each RCSA cycle: pull internal loss events, KRI trend data, and audit findings before the workshop. Require one evidence citation per risk dimension.
- Adopt band-based decision-making rather than rank-ordering by composite scores. Decisions happen at the Critical/High/Medium/Low band level. Adjacent scores within a band are noise, not signal.
These changes address the most common examination findings without a full program rebuild — and they make the RCSA a more accurate reflection of where your actual risk lies.
The RCSA (Risk & Control Self-Assessment) template provides a structured framework with separate inherent scoring, control effectiveness rating fields, evidence documentation prompts, and a pre-populated evidence brief template — built to address the four variations examiners flag most often.
Rebecca Leung
Rebecca Leung has 8+ years of risk and compliance experience across first and second line roles at commercial banks, asset managers, and fintechs. Former management consultant advising financial institutions on risk strategy. Founder of RiskTemplates.