How to Build an Operational Risk Management Framework From Scratch

March 19, 2026 • Rebecca Leung

operational risk RCSA key risk indicators

TL;DR:

An operational risk management (ORM) framework isn’t a document — it’s a living system of identification, assessment, monitoring, and response that regulators will test under pressure.

The OCC’s 2025 supervisory priorities explicitly call out operational risk controls, third-party resilience, and enterprise change management — examiners are looking for this.

Build yours around four core pillars: risk and control self-assessments (RCSAs), key risk indicators (KRIs), loss event tracking, and scenario analysis. Here’s exactly how.

Your Examiner Doesn’t Care About Your Risk Framework Slide Deck

The OCC’s 2025 Bank Supervision Operating Plan reorganized its priorities into three categories: financial risk, operational risk, and compliance risk. Operational risk got its own section for the first time — with specific callouts for cybersecurity preventative controls, incident response, third-party risk management, and enterprise change management.

Translation: examiners are done accepting “we have an ORM program” as an answer. They want to see how it works. How risks flow from identification through assessment to monitoring. What triggers escalation. Who owns what.

And if you’re at a mid-size bank or fintech that’s been running on spreadsheets and good intentions, this is the guide that gets you from nothing to examiner-ready.

What Operational Risk Actually Covers (It’s Broader Than You Think)

The Basel Committee’s definition — “the risk of loss resulting from inadequate or failed internal processes, people, and systems, or from external events” — is deceptively simple. In practice, operational risk includes:

Risk Category	Examples	Who Typically Owns It
Process failures	Transaction errors, settlement breaks, misapplied payments	Operations / COO
People risk	Unauthorized trading, employee fraud, key person dependency	HR + Business Line
Technology risk	System outages, failed deployments, data corruption	CTO / CIO
External events	Cyberattacks, natural disasters, vendor failures	CISO + BCP
Legal & compliance	Regulatory fines, contract disputes, litigation	CLO + CCO
Third-party risk	Vendor outages, concentration risk, fourth-party exposure	TPRM / Vendor Management

The reason this matters: most firms that get MRAs (Matters Requiring Attention) or consent orders for operational risk deficiencies don’t fail because they ignored one big thing. They fail because they didn’t connect the dots across categories.

Case in point: In October 2020, the OCC issued a cease and desist order against Citibank and assessed a $400 million civil money penalty for deficiencies in risk management, data governance, and internal controls. The order cited “serious and longstanding deficiencies” across multiple operational risk areas — not one spectacular failure, but the accumulation of inadequate processes, poor data quality, and insufficient board oversight. When Citibank failed to remediate adequately, the OCC came back in July 2024 with an additional $75 million penalty specifically for violating the original order and lacking processes to monitor data quality impacts on regulatory reporting.

That’s $475 million because operational risk management was treated as a checkbox exercise instead of an integrated system.

The Four Pillars of an ORM Framework

Every functional ORM program rests on four interconnected components. Skip one and the whole thing wobbles.

Pillar 1: Risk and Control Self-Assessment (RCSA)

The RCSA is where you identify what can go wrong and evaluate whether your controls are working. It’s the foundation everything else builds on.

How to run one that isn’t theater:

Scope by process, not department. Map your RCSAs to actual business processes (loan origination, wire transfers, customer onboarding) rather than org chart boxes. A single process can cross three departments — the risk doesn’t care about your reporting lines.
Use a consistent risk taxonomy. Align to Basel II event types or develop your own, but make it consistent across the enterprise. The seven Basel event types are:
- Internal fraud
- External fraud
- Employment practices & workplace safety
- Clients, products & business practices
- Damage to physical assets
- Business disruption & system failures
- Execution, delivery & process management
Rate inherent risk AND residual risk. Inherent risk is the exposure before any controls; residual risk is what’s left after controls operate. If your residual risk rating is always “low,” your assessment is wrong: go back and stress-test.
Document control effectiveness with evidence, not opinions. “We have a dual-approval process” isn’t evidence. “Dual-approval rejected 47 transactions in Q4, 12 of which were confirmed errors” is evidence. Track control testing results, exception rates, and failure incidents.
Refresh annually at minimum — quarterly for high-risk processes. An RCSA that’s 18 months old is a liability, not an asset. Forvis Mazars’ 2024 RCSA best practices guidance emphasizes that RCSA capabilities must be “adaptable, agile, and integrated” to keep pace with evolving operational environments. Static annual exercises don’t cut it anymore.
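To make “evidence, not opinions” concrete, here is a minimal Python sketch that turns control testing results into an exception rate and an effectiveness rating. The `ControlTest` record, the control IDs, and the 2% and 10% cutoffs are hypothetical assumptions for illustration, not a regulatory standard; calibrate them to your own RCSA methodology.

```python
from dataclasses import dataclass

# Hypothetical cutoffs for this sketch; set your own in the RCSA methodology.
EFFECTIVE_MAX = 0.02  # up to 2% exceptions: control rated effective
PARTIAL_MAX = 0.10    # up to 10% exceptions: partially effective


@dataclass
class ControlTest:
    """One round of control testing (hypothetical record layout)."""
    control_id: str
    samples_tested: int
    exceptions: int  # samples where the control did not operate as designed


def exception_rate(test: ControlTest) -> float:
    """Share of tested samples in which the control failed."""
    if test.samples_tested <= 0:
        raise ValueError(f"{test.control_id}: no samples tested, so no evidence to rate")
    return test.exceptions / test.samples_tested


def effectiveness_rating(test: ControlTest) -> str:
    """Map the observed exception rate to an effectiveness rating."""
    rate = exception_rate(test)
    if rate <= EFFECTIVE_MAX:
        return "Effective"
    if rate <= PARTIAL_MAX:
        return "Partially effective"
    return "Ineffective"


if __name__ == "__main__":
    results = [
        ControlTest("WIRE-DUAL-APPROVAL", samples_tested=60, exceptions=1),
        ControlTest("KYC-ID-VERIFY", samples_tested=40, exceptions=3),
        ControlTest("RECON-DAILY", samples_tested=25, exceptions=5),
    ]
    for t in results:
        # Prints e.g. "WIRE-DUAL-APPROVAL: 1.7% exceptions -> Effective"
        print(f"{t.control_id}: {exception_rate(t):.1%} exceptions -> {effectiveness_rating(t)}")
```

With this sample data the three controls rate Effective (1.7% exceptions), Partially effective (7.5%) and Ineffective (20.0%). A control with zero samples tested raises an error on purpose: an untested control has no evidence behind its rating.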

30-day RCSA launch plan:

Days 1–5: Select 3–5 critical processes for pilot. Pull process maps, prior audit findings, and loss event data.
Days 6–15: Facilitate RCSA workshops with process owners and control owners. Use structured templates with consistent rating scales (1–5 for likelihood and impact).
Days 16–25: Compile results, identify gaps between documented controls and actual practice, flag residual risks above appetite.
Days 26–30: Present results to risk committee, assign action items with owners and deadlines, establish refresh cadence.
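The compile step (days 16–25) can be sketched in Python, assuming each risk is rated 1–5 for likelihood and impact both before controls (inherent) and after them (residual), with score = likelihood × impact. The appetite cutoff of 9, the “low” cutoff of 4, and the `RcsaRating` and `review` names are hypothetical; your risk committee sets the real values.

```python
from dataclasses import dataclass

# Hypothetical calibration for this sketch (scores run 1-25 on a 5x5 scale).
RESIDUAL_APPETITE = 9  # residual scores above this need an action plan
LOW_SCORE_MAX = 4      # scores at or below this count as "low"


@dataclass
class RcsaRating:
    risk: str
    owner: str
    inherent_likelihood: int  # 1-5, before controls
    inherent_impact: int      # 1-5, before controls
    residual_likelihood: int  # 1-5, after controls
    residual_impact: int      # 1-5, after controls

    def __post_init__(self) -> None:
        # Reject ratings outside the agreed 1-5 scale.
        for name in ("inherent_likelihood", "inherent_impact",
                     "residual_likelihood", "residual_impact"):
            value = getattr(self, name)
            if not 1 <= value <= 5:
                raise ValueError(f"{self.risk}: {name} must be 1-5, got {value}")

    @property
    def inherent_score(self) -> int:
        return self.inherent_likelihood * self.inherent_impact

    @property
    def residual_score(self) -> int:
        return self.residual_likelihood * self.residual_impact


def review(ratings: list[RcsaRating]) -> list[str]:
    """Return findings for the risk committee from a completed RCSA."""
    findings = []
    for r in ratings:
        if r.residual_score > r.inherent_score:
            # Controls should not make a risk worse; the ratings are inconsistent.
            findings.append(f"{r.risk}: residual exceeds inherent, re-check ratings")
        if r.residual_score > RESIDUAL_APPETITE:
            findings.append(f"{r.risk}: residual {r.residual_score} above appetite, action owner {r.owner}")
    if ratings and all(r.residual_score <= LOW_SCORE_MAX for r in ratings):
        findings.append("Every residual rating is low: stress-test the assessment")
    return findings


if __name__ == "__main__":
    pilot = [
        RcsaRating("Wire sent to wrong beneficiary", "Head of Payments Ops", 4, 5, 2, 5),
        RcsaRating("Account opened without completed KYC", "Onboarding Manager", 3, 4, 2, 2),
    ]
    for finding in review(pilot):
        print(finding)
```

For the pilot data, the wire risk (residual 2 × 5 = 10) is flagged above appetite with its owner named, while the KYC risk (residual 4) passes. The residual-exceeds-inherent check catches a common rating error, and the all-low check automates the “always low” warning from the RCSA guidance.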

Pillar 2: Key Risk Indicators (KRIs)

KRIs are the early warning system. They tell you a risk is materializing before it becomes a loss event.

The difference between a good KRI and a useless one:

Bad KRI	Good KRI	Why It’s Better
“Number of system outages”	“Unplanned system downtime hours for core banking platform, trailing 30 days”	Specific, measurable, tied to a critical system, trended over time
“Employee turnover”	“Turnover rate in BSA/AML team, trailing 90 days vs. 12-month avg”	Targets a high-risk function, includes comparison baseline
“Number of customer complaints”	“Complaint-to-transaction ratio for wire transfers, month over month”	Normalized, focused on a risk-prone process, shows trajectory
“Vendor incidents”	“Critical/high severity incidents from Tier 1 vendors, trailing quarter, vs. SLA breach threshold”	Tiered by vendor criticality, benchmarked against contractual SLAs
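The “good” column implies specific calculations. A minimal Python sketch of two of them follows: trailing-30-day unplanned downtime and a complaint-to-transaction ratio compared month over month. The outage log, complaint counts, and function names are hypothetical illustrations, and the window here counts the as-of date as day 30.

```python
from datetime import date, timedelta


def trailing_downtime_hours(outages: list[tuple[date, float]], as_of: date,
                            window_days: int = 30) -> float:
    """Unplanned downtime hours in the window_days ending on as_of (inclusive)."""
    start = as_of - timedelta(days=window_days - 1)
    return sum((hours for day, hours in outages if start <= day <= as_of), 0.0)


def complaints_per_10k(complaints: int, transactions: int) -> float:
    """Complaint-to-transaction ratio, scaled per 10,000 transactions."""
    if transactions <= 0:
        raise ValueError("ratio undefined without transactions")
    return complaints * 10_000 / transactions


def month_over_month(previous: float, current: float) -> float:
    """Relative change versus the prior month (0.25 means +25%)."""
    if previous == 0:
        raise ValueError("no baseline to compare against")
    return (current - previous) / previous


if __name__ == "__main__":
    # Hypothetical core-banking outage log: (date, unplanned hours down).
    outages = [(date(2026, 2, 10), 3.5), (date(2026, 3, 1), 1.0), (date(2026, 3, 15), 2.25)]
    print(trailing_downtime_hours(outages, as_of=date(2026, 3, 19)))  # 3.25

    feb = complaints_per_10k(complaints=18, transactions=12_000)  # 15.0
    mar = complaints_per_10k(complaints=27, transactions=13_500)  # 20.0
    print(f"{month_over_month(feb, mar):+.0%}")  # +33%
```

The February 10 outage falls outside the 30-day window ending March 19, which is exactly the point of a trailing measure: old incidents age out, so the KRI reflects the current state of the platform.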

Setting thresholds that trigger action:

Every KRI needs three zones:

Green (within appetite): Business as usual. Report in standard dashboards.
Amber (approaching limit): Investigate root cause. First-line risk owner must document analysis within 5 business days. Risk committee notified.
Red (breached): Immediate escalation. First-line must submit remediation plan within 48 hours. Second-line validates. Board risk committee briefed at next meeting (or emergency session if severity warrants).
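The three zones translate directly into a classification rule plus an escalation clock. This Python sketch assumes each KRI has a numeric amber and red boundary and a direction (higher is worse, or lower is worse for metrics like availability); the class and function names are hypothetical, and the business-day count skips weekends but ignores bank holidays.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional


@dataclass
class KriThresholds:
    amber: float                  # "approaching limit" boundary
    red: float                    # "breached" boundary
    higher_is_worse: bool = True  # False for metrics like availability %


def rag_status(value: float, t: KriThresholds) -> str:
    """Place a KRI reading in the green, amber or red zone."""
    if t.higher_is_worse:
        breached, approaching = value >= t.red, value >= t.amber
    else:
        breached, approaching = value <= t.red, value <= t.amber
    if breached:
        return "RED"
    if approaching:
        return "AMBER"
    return "GREEN"


def add_business_days(start: datetime, days: int) -> datetime:
    """Step forward day by day, counting Monday-Friday only (holidays ignored)."""
    current, counted = start, 0
    while counted < days:
        current += timedelta(days=1)
        if current.weekday() < 5:  # 0 = Monday ... 4 = Friday
            counted += 1
    return current


def escalation(status: str, observed_at: datetime) -> tuple[str, Optional[datetime]]:
    """Return the required action and its deadline for a RAG status."""
    if status == "RED":
        return ("Immediate escalation: first line submits remediation plan, "
                "second line validates, board risk committee briefed",
                observed_at + timedelta(hours=48))
    if status == "AMBER":
        return ("First-line owner documents root-cause analysis; risk committee notified",
                add_business_days(observed_at, 5))
    return ("Business as usual: report in standard dashboard", None)


if __name__ == "__main__":
    downtime = KriThresholds(amber=4, red=8)  # hypothetical hours, trailing 30 days
    status = rag_status(5.5, downtime)
    action, due = escalation(status, datetime(2026, 3, 19, 10, 0))
    print(status, due)  # AMBER 2026-03-26 10:00:00
```

March 19, 2026 is a Thursday, so an amber reading that morning is due for root-cause documentation by Thursday, March 26, while a red reading starts a 48-hour clock that expires on Saturday, March 21.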

A KRI without thresholds is just a metric. A metric without an escalation path is just a number. Numbers don’t prevent losses; escalation protocols do.

Start with 15–20 KRIs across your top risk categories. Don’t try to boil the ocean with 200 indicators nobody monitors. You can expand later once the muscle memory exists.

Pillar 3: Loss Event Tracking

Every operational risk loss needs to be captured, categorized, analyzed, and fed back into your RCSA and KRI program. This is where most firms are weakest — they track big losses when forced to, but let the small ones slip through.

Why small losses matter: According to ORX’s 2024 Banking Operational Risk Loss Data Report, global banks reported over 65,000 loss events in 2023, with an average loss size of €231,651. But the real insight was in frequency trends: low-severity external fraud events hit their highest level in ORX’s 22-year database history in 2022, with 38% of firms reporting their all-time peak fraud event counts that year. Transaction-related losses — processing errors, accounting mistakes, failed settlements — hit nearly €8 billion in 2023, making them the costliest operational risk category that year.

The firms that catch these trends early are the ones with disciplined loss event capture. The ones that don’t find out during exam prep.

What to capture for every loss event:

Event date and discovery date (the gap between these tells you something about detection controls)
Basel event type classification
Gross loss amount
Recovery amount (insurance, legal settlements)
Net loss
Root cause (use a standardized taxonomy: process, people, technology, external)
Business line and process
Related control failures (link back to RCSA)
Near-miss indicator (was this a loss or a near-miss? Both matter)

Minimum reporting thresholds: Most mid-size banks use €10,000–€20,000 as the minimum capture threshold. Below that, tracking costs exceed insight value. But track near-misses regardless of potential amount — they’re free lessons.
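The capture fields and threshold rule above can be expressed as a single record type. This Python sketch is hypothetical: the field names, the 10,000 EUR threshold (the low end of the range above), and the choice to apply it to gross rather than net loss are assumptions for illustration. Near-misses bypass the threshold, as recommended.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional

# Hypothetical choices for this sketch: a 10,000 EUR threshold applied to gross loss.
CAPTURE_THRESHOLD_EUR = 10_000
ROOT_CAUSES = {"process", "people", "technology", "external"}


@dataclass
class LossEvent:
    event_id: str
    event_date: date
    discovery_date: date
    basel_event_type: str
    gross_loss_eur: float
    recovery_eur: float          # insurance, legal settlements
    root_cause: str              # one of ROOT_CAUSES
    business_line: str
    process: str
    rcsa_risk_id: Optional[str]  # link back to the RCSA risk (None = unmapped)
    near_miss: bool = False

    def __post_init__(self) -> None:
        if self.root_cause not in ROOT_CAUSES:
            raise ValueError(f"{self.event_id}: unknown root cause {self.root_cause!r}")
        if self.discovery_date < self.event_date:
            raise ValueError(f"{self.event_id}: discovered before it occurred")

    @property
    def net_loss_eur(self) -> float:
        return self.gross_loss_eur - self.recovery_eur

    @property
    def detection_gap_days(self) -> int:
        # A long gap between occurrence and discovery points at weak detection controls.
        return (self.discovery_date - self.event_date).days


def should_capture(event: LossEvent) -> bool:
    """Record near-misses regardless of amount; otherwise apply the threshold."""
    return event.near_miss or event.gross_loss_eur >= CAPTURE_THRESHOLD_EUR


if __name__ == "__main__":
    misapplied = LossEvent(
        "LE-2026-001", date(2026, 1, 5), date(2026, 2, 16),
        "Execution, delivery & process management",
        gross_loss_eur=25_000, recovery_eur=15_000, root_cause="process",
        business_line="Retail", process="Payment application", rcsa_risk_id="R-014",
    )
    print(misapplied.net_loss_eur, misapplied.detection_gap_days, should_capture(misapplied))
```

In the example, a 25,000 EUR misapplied payment with 15,000 EUR recovered nets to 10,000 EUR and went undetected for 42 days; that gap, not just the amount, is the finding worth taking back to the RCSA.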

Pillar 4: Scenario Analysis

Scenario analysis stress-tests your framework against plausible-but-severe events that haven’t happened yet. It’s where you answer: “What if our core processor goes down for 72 hours?” or “What if a key vendor gets breached and exfiltrates customer data?”

Why this matters now more than ever: The July 2024 CrowdStrike outage demonstrated exactly how a single third-party failure cascades. A faulty software update crashed 8.5 million Windows machines globally, disrupting banks, airlines, hospitals, and government services. Insurance firm Parametrix estimated the top 500 US companies (excluding Microsoft) faced approximately $5.4 billion in financial losses. Banks that had scenario-analyzed a “critical vendor software failure” event were the ones with tested playbooks and faster recovery.

Running useful scenarios:

Select 5–8 scenarios annually that align with your top inherent risks and emerging threats. Prioritize scenarios the OCC is signaling concern about: cybersecurity events, third-party failures, and operational resilience disruptions.
Define severity and frequency estimates using structured expert judgment. Bring risk owners, business leaders, and subject matter experts together — not just the risk team in a room guessing.
Quantify potential impact in terms of direct financial loss, regulatory penalties, customer impact, and reputational damage.
Test your response capabilities against the scenario. Don’t just estimate the loss — walk through what you’d actually do. Who gets called? What decisions need to be made in the first hour? The first 24 hours?
Feed results back into capital planning (for Basel requirements) and insurance coverage reviews.
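Workshop outputs are easier to compare once they are in the same units. Here is a minimal Python sketch, assuming the workshop expresses frequency as a return period (“about once every N years”) and every impact category, including reputational damage, as a money estimate in one currency. Annualized loss here is simply severity divided by the return period, a frequency × severity shortcut for ranking scenarios, not a capital model. The `Scenario` class and the figures are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Scenario:
    name: str
    return_period_years: float  # workshop estimate: "about once every N years"
    impacts: dict[str, float] = field(default_factory=dict)  # same currency throughout

    def __post_init__(self) -> None:
        if self.return_period_years <= 0:
            raise ValueError(f"{self.name}: return period must be positive")

    @property
    def severity(self) -> float:
        """Total impact if the scenario occurs once."""
        return sum(self.impacts.values(), 0.0)

    @property
    def annualized_loss(self) -> float:
        """Severity spread over the return period (frequency x severity)."""
        return self.severity / self.return_period_years


def rank(scenarios: list[Scenario]) -> list[Scenario]:
    """Order scenarios for the risk committee, most severe first."""
    return sorted(scenarios, key=lambda s: s.severity, reverse=True)


if __name__ == "__main__":
    outage = Scenario(
        "Core processor down for 72 hours", return_period_years=20,
        impacts={"direct_loss": 2_000_000, "regulatory_penalty": 500_000,
                 "customer_impact": 750_000, "reputational_damage": 1_250_000},
    )
    print(f"{outage.severity:,.0f} per event, {outage.annualized_loss:,.0f} per year")
```

The example prints `4,500,000 per event, 225,000 per year`. Keeping the impact categories separate, rather than entering one total, lets the committee challenge each estimate on its own.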

The ORM Lifecycle: Connecting the Pillars

These four pillars don’t operate in isolation. Here’s how they feed each other:

RCSA identifies risks & control gaps
    ↓
KRIs monitor the risks RCSA identified
    ↓
Loss events validate (or challenge) RCSA ratings
    ↓
Scenarios stress-test the risks KRIs can't predict
    ↓
All four inform risk appetite, capital, and reporting
    ↓
Board & management receive integrated view
    ↓
Loop back: update RCSA with new loss data & scenario results

The integration test: If a loss event occurs and you can’t trace it back to a risk in your RCSA, either your RCSA missed something or your taxonomy doesn’t match your loss event categories. If a KRI breaches red and nobody acts, your escalation protocols are broken. If a scenario materializes and your response looks nothing like what you planned, your scenarios are fantasy.
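The first two integration tests are mechanical enough to run every reporting cycle. This Python sketch assumes each loss event stores the ID of the RCSA risk it maps to (or `None`) and each KRI reading records whether the owner logged an action; all names are hypothetical. The scenario test still needs a human debrief after a real event.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class KriReading:
    kri_id: str
    status: str          # "GREEN", "AMBER" or "RED"
    action_logged: bool  # has the owner recorded a response?


def integration_gaps(rcsa_risk_ids: set[str],
                     loss_event_links: dict[str, Optional[str]],
                     kri_readings: list[KriReading]) -> list[str]:
    """Loss-to-RCSA traceability and red-KRI follow-up checks."""
    gaps = []
    for event_id, risk_id in loss_event_links.items():
        # An unmapped event, or one mapped to a risk missing from the RCSA, is a gap.
        if risk_id is None or risk_id not in rcsa_risk_ids:
            gaps.append(f"Loss event {event_id} does not trace to an RCSA risk")
    for reading in kri_readings:
        if reading.status == "RED" and not reading.action_logged:
            gaps.append(f"KRI {reading.kri_id} breached red with no logged action")
    return gaps


if __name__ == "__main__":
    gaps = integration_gaps(
        rcsa_risk_ids={"R-001", "R-002"},
        loss_event_links={"LE-101": "R-001", "LE-102": None, "LE-103": "R-009"},
        kri_readings=[KriReading("KRI-07", "RED", False), KriReading("KRI-02", "AMBER", True)],
    )
    for gap in gaps:
        print(gap)
```

With the sample data, the sketch reports LE-102 (never mapped), LE-103 (mapped to a risk that is not in the RCSA), and KRI-07 (red with no response).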

Building From Nothing: A 120-Day Implementation Roadmap

If you’re starting fresh — maybe you just got hired as the first risk manager, or maybe the examiner just handed you an MRA — here’s a realistic timeline.

Days 1–30: Foundation

Inventory existing risk-related documentation (even if scattered across departments)
Define your operational risk taxonomy (Basel-aligned or custom)
Draft the ORM policy: scope, governance, roles, risk appetite statement
Identify your risk committee structure and meeting cadence
Deliverable: Approved ORM policy, risk taxonomy, governance structure

Days 31–60: RCSA Pilot

Select 5 high-risk processes for initial RCSA
Conduct facilitated workshops with process owners
Document inherent risks, controls, control effectiveness, and residual risk ratings
Identify immediate gaps (risks with no controls, or controls with no evidence of effectiveness)
Deliverable: Completed RCSA for 5 processes, gap analysis report

Days 61–90: KRIs and Loss Event Tracking

Design 15–20 KRIs across top risk categories from RCSA results
Set green/amber/red thresholds with risk committee input
Implement loss event capture process (can start with a structured spreadsheet — don’t let tool selection delay launch)
Back-populate with known loss events from past 12 months
Deliverable: KRI dashboard (even if in Excel), loss event register, escalation procedures

Days 91–120: Scenario Analysis and Reporting

Conduct 3–5 scenario analysis workshops
Build first board-level ORM report integrating RCSA results, KRI status, loss event trends, and scenario outcomes
Establish quarterly reporting cadence
Document the program for examiner consumption — methodology documents, governance minutes, evidence of action on identified risks
Deliverable: Scenario analysis results, first integrated ORM report, program documentation package

The honest truth: 120 days gets you functional, not mature. A genuinely embedded ORM program takes 12–18 months of repetition, calibration, and cultural change. But 120 days gets you something defensible when the examiner shows up — and that’s the immediate goal.

Three Lines of Defense: Who Owns What

Operational risk is one of those domains where “everyone owns it” quickly becomes “nobody owns it.” Apply the three lines of defense model clearly:

Line	Role in ORM	Specific Responsibilities
1st Line: Business Units	Own and manage operational risks daily	Conduct RCSAs, maintain controls, report loss events, monitor KRIs, escalate breaches
2nd Line: Risk Management	Provide the framework, challenge, and oversight	Design ORM methodology, set risk appetite, validate RCSAs, aggregate reporting, independent challenge
3rd Line: Internal Audit	Independent assurance	Audit the ORM framework itself — is it effective? Are RCSAs credible? Are KRIs actionable? Are loss events being captured?

At a mid-size bank (under $50B assets): The operational risk function typically sits within the CRO organization, staffed by 2–5 dedicated ORM professionals. They own the methodology but depend on first-line risk coordinators embedded in each business unit to execute RCSAs and KRI monitoring.

At a fintech or smaller bank: You might not have a dedicated ORM team. In that case, the CCO or Head of Risk typically owns the framework, with operational risk activities distributed among department heads. The key is documenting those assignments — an examiner wants to see named individuals, not vague “the business owns it” statements.

What Examiners Actually Look For

Based on the OCC’s Semiannual Risk Perspective and recent enforcement trends, here’s what gets flagged:

No documented ORM policy or outdated policy. If your ORM policy is from 2019 and doesn’t mention third-party risk, cyber, or operational resilience, it’s a finding.
RCSAs that don’t reflect the actual risk environment. If your RCSA hasn’t been updated since you onboarded a major new vendor or launched a new product, that’s a gap.
KRIs with no defined thresholds or escalation procedures. Tracking 50 metrics nobody acts on is worse than tracking 10 that drive decisions.
No loss event tracking or incomplete capture. If the only losses you’ve recorded are the ones the auditor found, you have a culture problem.
Board and management reporting that’s all green. A risk dashboard that never shows amber or red signals either a perfect organization (unlikely) or a broken assessment process (probable).
No connection between ORM outputs and business decisions. The framework exists to inform decisions — new product approvals, vendor selections, capital allocation, technology investments. If none of those reference your ORM data, the program is decorative.

So What? Why This Matters Right Now

Operational risk isn’t theoretical. In 2023, global banks still experienced over 65,000 loss events tracked by ORX, with transaction-related losses alone hitting €8 billion. The CrowdStrike outage in July 2024 showed how a single third-party failure can cascade into billions in losses across industries. And regulators are watching more closely than ever — the OCC’s 2025 priorities elevate operational risk to a dedicated supervisory category for the first time.

If you’re at a mid-size bank or fintech without a structured ORM framework, you’re not just carrying unmanaged risk — you’re carrying exam risk. The MRA for “inadequate operational risk management” is one of the most common findings in community and mid-size bank exams.

The good news: you don’t need a million-dollar GRC platform to start. You need a policy, a taxonomy, a handful of RCSAs, some meaningful KRIs, and a process for capturing when things go wrong. Start there. Iterate. Mature.

Need a head start? The Compliance Essentials Bundle includes risk assessment templates, issues tracking, and documentation frameworks designed for exactly this stage of program buildout.

FAQ

What’s the difference between operational risk and enterprise risk management?

Operational risk is a subset of enterprise risk management (ERM). ERM covers all risk categories — credit, market, liquidity, strategic, reputational, and operational. An ORM framework zooms in on risks from internal processes, people, systems, and external events. In practice, your ORM program should feed into the broader ERM framework, with operational risk data flowing into enterprise-level risk appetite statements and board reporting. At many mid-size banks, the ORM team sits within the ERM function but maintains its own methodology and assessment cycle.

How many KRIs should a mid-size bank track?

Start with 15–20, focusing on your highest inherent risks from the RCSA. Common starter KRIs include: system availability rates for critical applications, transaction error rates, employee turnover in key risk functions (compliance, BSA, operations), cybersecurity incident volume, vendor SLA breach rates, and customer complaint trends. Quality over quantity — every KRI should have defined thresholds, an owner, and a documented escalation path. You can expand as the program matures, but tracking 100 indicators nobody reviews is worse than tracking 15 that drive action.
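The quality bar in that answer (thresholds, an owner, an escalation path) is easy to check across a KRI inventory. Here is a hypothetical Python sketch: the `KriDefinition` fields are assumptions, and the threshold-ordering check assumes amber must trigger before red.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class KriDefinition:
    kri_id: str
    description: str
    owner: Optional[str]
    amber: Optional[float]
    red: Optional[float]
    escalation_path: Optional[str]
    higher_is_worse: bool = True


def definition_issues(kri: KriDefinition) -> list[str]:
    """List what is missing before this KRI can drive action."""
    issues = []
    if not kri.owner:
        issues.append("no owner")
    if kri.amber is None or kri.red is None:
        issues.append("missing amber or red threshold")
    elif kri.higher_is_worse and kri.amber >= kri.red:
        issues.append("amber must sit below red when higher is worse")
    elif not kri.higher_is_worse and kri.amber <= kri.red:
        issues.append("amber must sit above red when lower is worse")
    if not kri.escalation_path:
        issues.append("no documented escalation path")
    return issues


if __name__ == "__main__":
    inventory = [
        KriDefinition("KRI-01", "Core banking unplanned downtime hours, trailing 30 days",
                      "Head of IT Operations", 4, 8, "IT Ops -> ORM -> Risk Committee"),
        KriDefinition("KRI-02", "Vendor SLA breach rate, Tier 1 vendors", None, 5, 3, ""),
    ]
    for kri in inventory:
        print(kri.kri_id, definition_issues(kri) or "ready")
```

KRI-01 prints “ready”; KRI-02 comes back with three issues, including thresholds that are inverted for a higher-is-worse metric.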

Can we build an ORM framework in spreadsheets or do we need GRC software?

Spreadsheets are a perfectly valid starting point, especially for banks under $10B in assets. What matters to examiners is the process, not the platform. A well-maintained Excel-based RCSA with documented methodology, evidence of refresh, and clear action tracking is infinitely better than a six-figure GRC tool that nobody uses properly. That said, spreadsheets break down around 50+ RCSAs or when you need to aggregate KRI data from multiple sources automatically. Plan for a GRC migration once the program is stable and you’ve proven the methodology works — typically 12–18 months after initial implementation.

What are the four pillars of an operational risk management framework?

A comprehensive ORM framework is built on four pillars: Risk and Control Self-Assessments (RCSAs) that identify and assess risks within business processes; Key Risk Indicators (KRIs) providing early warning signals; Loss Event Tracking capturing actual operational losses; and Scenario Analysis estimating potential losses from severe but plausible events. Together, they create a quantitative and qualitative picture of operational risk exposure.

What does operational risk include beyond technology failures?

In the taxonomy used in this guide, operational risk spans six categories: process failures (errors, control breakdowns), people risk (fraud, misconduct, key-person dependency), technology risk (system outages, cybersecurity), external events (natural disasters, geopolitical disruptions), legal and compliance risk (regulatory penalties, litigation), and third-party risk (vendor failures and concentration). Process and people risks typically drive more frequent loss events than technology failures.

What is a Risk and Control Self-Assessment (RCSA)?

An RCSA is a structured process where business lines identify their key risks, assess the effectiveness of existing controls, and identify gaps requiring remediation. It's a core ORM tool because it embeds risk awareness in business operations rather than concentrating all risk identification in a central risk function. Regulators expect RCSAs to reflect actual business risks — not be generic checkbox exercises.

What do examiners look for in an operational risk management program?

Examiners look for evidence that ORM is integrated into business operations, not just a reporting function. Key exam areas include whether RCSA findings reflect real risks, whether KRI thresholds trigger genuine escalation, whether loss events are captured consistently with root-cause analysis, and whether management actions address identified gaps. The OCC’s Citibank cease-and-desist order is a frequently cited example of what inadequate ORM governance looks like at scale.

Rebecca Leung

Rebecca Leung has 8+ years of risk and compliance experience across first and second line roles at commercial banks, asset managers, and fintechs. Former management consultant advising financial institutions on risk strategy. Founder of RiskTemplates.
