AI Risk

SR 11-7 for AI Systems: Applying Legacy Model Risk Guidance to LLMs

April 12, 2026 · Rebecca Leung

There’s a conversation happening in MRM teams right now: “Legal wants to use an LLM for contract analysis. It needs to go through model risk review.” “Great. Which SR 11-7 framework do we use?” “The same one we always use.” “OK — how do we document conceptual soundness for a transformer model?” Silence.

The framework is the right one. The problem is most banks are running a 2011 checklist against a 2025 system and hoping nobody notices the gaps. The OCC’s Fall 2025 Semiannual Risk Perspective called AI an “emerging risk” requiring management “commensurate with the materiality and complexity” of AI activities. Federal Reserve Governor Barr’s April 2025 speech explicitly cited SR 11-7 as foundational for AI governance while acknowledging the explainability and hallucination problems that current guidance doesn’t cleanly address.

The framework is sound. The implementation details are what break.

Here’s how to actually apply SR 11-7 to LLMs — from model inventory to governance ownership to validation scope — without the theater.


TL;DR

  • SR 11-7 applies to LLMs, but the implementation assumptions (static parameters, annual validation cycles, deterministic outputs) must be rethought for AI
  • The five places MRM programs most often break with LLMs: model inventory scoping, governance ownership, documentation gaps, vendor AI handling, and validation timing
  • OCC Bulletin 2025-26 signals a broader MRM guidance review is coming — don’t wait for it; build a defensible program now based on existing expectations
  • The GAO’s May 2025 report found examiners apply SR 11-7 inconsistently to AI tools — document your rationale for every scope and methodology decision

Does SR 11-7 Actually Apply to Your LLM?

Start here because most MRM programs waste energy on the wrong question.

SR 11-7 defines a “model” as “a quantitative method, system, or approach that applies statistical, economic, financial, or mathematical theories, techniques, and assumptions to process input data into quantitative estimates.” That clause about “quantitative estimates” is what creates the debate about LLMs — their outputs are often text, not numbers.

Don’t get trapped in the definitional argument. The OCC’s 2021 Comptroller’s Handbook guidance resolved it pragmatically: “Regardless of how AI is classified (as a model or not a model), the associated risk management should be commensurate with the level of risk.”

The practical decision tree:

| LLM Use Case | SR 11-7 Model? | Required Treatment |
| --- | --- | --- |
| Credit underwriting / risk scoring | Yes — outputs drive quantitative decisions | Full Tier 1 MRM governance |
| Fraud detection / transaction risk scoring | Yes | Full MRM governance |
| Compliance monitoring (SAR flagging, AML) | Yes — SR 21-8 confirms this explicitly | Full MRM governance |
| Customer-facing Q&A / support chatbot | Arguable — tier by consequence | Commensurate risk controls |
| Internal document drafting / summarization | Lowest risk tier | Documented AUP + basic monitoring |
| Contract review / legal research | Arguable — tier by consequence | Risk assessment + human review controls |
SR 21-8 (the 2021 BSA/AML interagency guidance) explicitly confirmed that transaction monitoring and sanctions screening systems “meet the definition of ‘model’ under SR 11-7” — even when built on ML architectures. If your LLM is anywhere near the BSA/AML stack, it’s in scope.

The GARP analysis from February 2026 put it clearly: the real tension is that SR 11-7 was “grounded in the principle that models are simplified, relatively static representations of real-world relationships.” LLMs are neither simplified nor static. That tension doesn’t make SR 11-7 irrelevant — it makes the implementation decisions harder. Document your rationale for every scoping decision.


Step 1: Build an AI Model Inventory That Captures LLMs

SR 11-7 requires organizations to “maintain an inventory of models implemented for use, under development for implementation, or recently retired.” That sentence was written for credit scorecards. Your inventory needs to expand significantly.

What the inventory must include (baseline per OCC 2021 Comptroller’s Handbook):

  • Model name and model risk level (High/Medium/Low)
  • Owning department and model owner (person, not team)
  • Developer (in-house team, vendor, or open-source)
  • Purpose and intended use cases
  • Actual usage and any restrictions on use

LLM-specific fields to add:

  • Model type (foundation model, fine-tuned, RAG-augmented, prompt-engineered wrapper)
  • Provider name and model version (e.g., GPT-4o, Claude 3.5, Llama 3.1)
  • API endpoint or deployment environment
  • Fine-tuning data used (if any) and source
  • Prompt engineering approach and version control status
  • Output format (text, structured output, risk score)
  • Human-in-the-loop controls (required, optional, or none)
  • Validation status and last review date
  • Revalidation triggers (version update, use case expansion, degradation threshold breach)
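One way to make these fields concrete is a structured inventory record that your intake process populates before deployment authorization. The sketch below is illustrative only — the field names, enum values, and example record are assumptions, not a prescribed schema:

```python
from dataclasses import dataclass, field
from datetime import date
from enum import Enum

class RiskTier(Enum):
    HIGH = "high"
    MEDIUM = "medium"
    LOW = "low"

@dataclass
class LLMInventoryRecord:
    """One row in the AI model inventory (field names are illustrative)."""
    model_name: str
    risk_tier: RiskTier
    model_owner: str               # a named person, not a team
    developer: str                 # in-house, vendor, or open-source
    model_type: str                # foundation, fine-tuned, RAG, prompt wrapper
    provider_version: str          # pinned provider model version
    human_in_loop: str             # "required", "optional", or "none"
    last_review: date
    revalidation_triggers: list[str] = field(default_factory=list)

# Hypothetical example record for a vendor-backed legal use case.
record = LLMInventoryRecord(
    model_name="contract-analysis-assistant",
    risk_tier=RiskTier.MEDIUM,
    model_owner="J. Smith (Legal Ops)",
    developer="vendor",
    model_type="RAG-augmented",
    provider_version="vendor-model-2025-06",
    human_in_loop="required",
    last_review=date(2026, 3, 1),
    revalidation_triggers=["provider version update", "new use case"],
)
```

A typed record like this also makes the shadow-AI problem tractable: anything that can't produce a complete record hasn't cleared intake.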

The shadow AI problem is your biggest inventory risk. The OCC’s Fall 2025 SARP and examiner feedback consistently flag models deployed without formal inventory registration. At most organizations, business lines are spinning up vendor LLM API connections faster than MRM can track them. The fix: require a model intake form for any AI system before deployment authorization, with a signed attestation from the business line owner.

The inventory will also need to capture embedded AI — the LLM baked into your vendor’s underwriting platform, your compliance screening tool, or your CRM. These often aren’t registered because nobody thinks of them as “your” models. They are your risk. Require vendors to disclose AI/ML components in their systems, and document them.


Step 2: Governance Ownership — Who Actually Owns the LLM?

SR 11-7’s governance pillar assumes “clear separation between model development, validation, and use.” For traditional credit models, that’s straightforward — development owns the model, validation reviews it, the business line uses it.

LLMs break this structure three ways:

Problem 1: The enterprise GenAI platform problem. When a bank deploys a single LLM platform used by legal, compliance, operations, and risk simultaneously, the “model owner” concept frays. SR 11-7 says “model owners typically are in individual business units” — but who owns an enterprise tool with a dozen use cases?

The fix: Assign a single accountable model owner — typically the function that initiated deployment or has the most material use case. Designate business line “model users” with documented use-case-specific limitations. Create a tiered governance structure: the enterprise owner sets the AI risk appetite and platform controls; business line users are responsible for use-case-specific controls and human oversight.

Problem 2: The vendor model ownership gap. When the foundation model comes from OpenAI or Anthropic, traditional “model owner” accountability doesn’t map cleanly. You can’t review their training data, audit their RLHF process, or demand architectural documentation. The bank uses the model but doesn’t control the model.

The fix: Your ownership responsibility is at the use-case boundary — the prompts you write, the guardrails you implement, the human review you require, the output validation you run. Document your use-case scope explicitly. Require contractual notifications from vendors for model version changes, training data updates, and known capability changes. The model card should live in your inventory even if it came from the vendor.

Problem 3: The MRM / data science accountability gap. At many banks, AI governance is split across model risk (SR 11-7 compliance), data science (model development), and increasingly IT/legal (vendor management and GenAI policy). Nobody is clearly accountable for the LLM.

The fix: Document the accountability matrix. MRM owns the validation standard. The business line owns the use case and risk acceptance. IT owns deployment controls. Legal owns acceptable use policy. Write this down in the AI governance framework and get senior management sign-off — examiners will ask.


Step 3: Documentation That Passes Examiner Review

SR 11-7 requires documentation “sufficiently detailed to allow parties unfamiliar with a model to understand how the model operates, as well as its limitations and key assumptions.” For a traditional regression model, that’s achievable. For an LLM with hundreds of billions of parameters, it requires a different documentation structure.

What you can document for in-house or fine-tuned LLMs:

| Documentation Element | Traditional Model | LLM Equivalent |
| --- | --- | --- |
| Conceptual soundness | Theoretical basis, mathematical derivation | Architecture rationale, use-case justification, training approach |
| Data quality | Input data sources, cleaning methodology | Training data provenance, bias in training data, data cutoff date |
| Assumptions and limitations | Model assumptions, out-of-scope conditions | Known failure modes, hallucination risks, use-case scope limits |
| Performance metrics | Accuracy, GINI, KS statistic | Task-specific metrics, hallucination rate, bias test results |
| Version control | Model version, parameter changes | Prompt version, fine-tuning data version, API version |

What you can document for third-party vendor LLMs (when you can’t see inside):

  • Use case boundary: exactly what questions/tasks the LLM is authorized to perform
  • Prompt engineering documentation with version history
  • Output validation controls: what human or automated checks run on LLM outputs
  • Bias and fairness testing results for your specific use case (even if you can’t audit training data)
  • Vendor model card (if provided) and any disclosed bias or limitation disclosures
  • Contractual terms covering notification of model changes
  • Incident history and response procedures

The documentation gap most likely to cause an MRA: Examiners consistently cite incomplete documentation around training data, bias testing, and drift thresholds. If your model file doesn’t answer “where did the training data come from?” and “how was bias evaluated?” — that’s a finding. For vendor LLMs, “we don’t know — we asked the vendor” is not a defensible answer. Require vendors to provide this.


Step 4: Validation Scope — In-House vs. Vendor LLMs

SR 11-7 requires “independent, rigorous testing of model performance, conceptual soundness, and ongoing monitoring.” The challenge for LLMs: traditional validation methods (backtesting, benchmarking, sensitivity analysis) don’t translate cleanly.

For in-house or fine-tuned LLMs:

The validation scope should include:

  1. Conceptual soundness review — Is the architecture appropriate for the use case? Are the training choices documented and defensible?
  2. Pre-deployment testing — Accuracy on held-out test data, hallucination rate testing across use-case scenarios, adversarial/red-team testing, bias and fairness evaluation for regulated decisions
  3. Use-case boundary testing — What happens when inputs are outside the intended scope? Does the model fail gracefully?
  4. Human-in-the-loop validation — Does the review process actually catch model errors before they affect customers or decisions?
  5. Ongoing monitoring plan — What metrics will you track? What thresholds trigger revalidation?
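Step 2 above — pre-deployment testing on held-out data — can be sketched as a simple labeled-case evaluation. This is a toy harness, not a production framework: `answer_fn` stands in for whatever client wraps your model, and the stub dictionary replaces real model calls. The error rate it produces is the kind of baseline the ongoing monitoring plan in step 5 would track against:

```python
def evaluate_pre_deployment(cases: list[tuple[str, str]], answer_fn) -> dict:
    """Score a model against a labeled held-out set (illustrative sketch).

    answer_fn wraps whatever model client you use (an assumption here,
    not shown). Returns accuracy plus the share of wrong answers, which
    seeds the error/hallucination baseline used in ongoing monitoring.
    """
    correct = 0
    for prompt, expected in cases:
        if answer_fn(prompt).strip().lower() == expected.strip().lower():
            correct += 1
    n = len(cases)
    return {"accuracy": correct / n, "error_rate": 1 - correct / n}

# Toy held-out set; a dict stub stands in for the actual model.
cases = [("2+2?", "4"), ("Capital of France?", "Paris"), ("5*3?", "15")]
stub = {"2+2?": "4", "Capital of France?": "Paris", "5*3?": "16"}
report = evaluate_pre_deployment(cases, lambda p: stub[p])
```

In practice the scoring function would be task-specific (exact match works for extraction tasks; generative tasks need rubric-based or reference-based scoring), but the shape — labeled cases in, documented metrics out — is the same.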

For vendor LLMs (APIs you don’t control):

Traditional validation frameworks assume you have access to model internals. You don’t. Reframe validation around what you can control:

  • Use-case scoped testing — Run representative samples through the model and evaluate output quality against documented criteria
  • Red-team testing — Try to break the model with adversarial inputs specific to your use case. Can it be manipulated into outputting incorrect risk decisions? Discriminatory advice?
  • Prompt sensitivity analysis — How much does output vary across slight rephrasing of inputs? High variance is a documented risk
  • Vendor documentation review — Model cards, safety evaluations, bias testing results, SOC 2 / Trust and Safety reports
  • Contractual rights — Do you have rights to notification of model changes? Audit rights? These are validation controls in their own right
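The prompt sensitivity analysis above can be operationalized with a minimal sketch like the following. Everything here is an assumption for illustration: the token-overlap similarity is a crude proxy (embedding-based similarity is more common in practice), the 0.5 threshold is arbitrary, and the outputs would come from whatever client wraps your vendor API:

```python
from itertools import combinations

def jaccard(a: str, b: str) -> float:
    """Token-overlap similarity between two outputs (a crude proxy)."""
    ta, tb = set(a.lower().split()), set(b.lower().split())
    return len(ta & tb) / len(ta | tb) if ta | tb else 1.0

def prompt_sensitivity(outputs: list[str]) -> float:
    """Minimum pairwise similarity across outputs from rephrased prompts.

    A low score means small wording changes swing the answer -- the
    documented risk this analysis is meant to surface.
    """
    return min(jaccard(a, b) for a, b in combinations(outputs, 2))

# Outputs gathered from the same question phrased three ways
# (the model call itself is not shown).
outputs = [
    "The clause limits liability to direct damages only.",
    "Liability is limited to direct damages under this clause.",
    "The contract excludes all liability entirely.",  # divergent answer
]
score = prompt_sensitivity(outputs)
flagged = score < 0.5  # threshold is illustrative; set and document your own
```

The point is not the metric choice — it's that variance across rephrasings becomes a number you can threshold, document, and hand to a validator.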

The annual validation assumption is the biggest trap. Traditional MRM operates on a 12-month validation cycle. LLMs can change — via provider updates, through prompt modifications, or through distribution shift in input data — on much shorter timescales. Define event-triggered revalidation criteria before deployment:

  • Major provider model version update
  • Output quality degradation below documented thresholds (set a specific metric and threshold)
  • Deployment to a new use case not covered by original validation scope
  • Material change in the volume or type of inputs

Write these triggers into the model’s governance documentation, not into someone’s head.
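Writing the triggers down also means they can be checked mechanically. A minimal sketch, assuming hypothetical trigger names and inputs — your documented criteria would drive the actual conditions:

```python
def revalidation_due(
    current_version: str,
    validated_version: str,
    quality_score: float,
    quality_threshold: float,
    use_cases: set[str],
    validated_use_cases: set[str],
) -> list[str]:
    """Return the documented revalidation triggers that have fired."""
    fired = []
    if current_version != validated_version:
        fired.append("provider model version update")
    if quality_score < quality_threshold:
        fired.append("output quality below documented threshold")
    if not use_cases <= validated_use_cases:
        fired.append("use case outside original validation scope")
    return fired

# Hypothetical check: version bumped and a new use case appeared,
# while output quality is still above its documented threshold.
triggers = revalidation_due(
    current_version="v2.1",
    validated_version="v2.0",
    quality_score=0.91,
    quality_threshold=0.85,
    use_cases={"contract review", "policy drafting"},
    validated_use_cases={"contract review"},
)
```

A check like this belongs in the monitoring pipeline, so a fired trigger opens a revalidation ticket instead of waiting for someone to remember the criteria.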


Step 5: Ongoing Monitoring — What to Track and What to Do When It Degrades

SR 11-7’s ongoing monitoring requirement was written for models that produce the same output for the same input. LLMs don’t. This is the gap that catches MRM programs off guard.

Key monitoring metrics for LLMs in production:

| Metric | What It Measures | Trigger for Review |
| --- | --- | --- |
| Output accuracy / task completion rate | Quality of LLM responses for defined use case | Drop below documented threshold |
| Hallucination rate | Frequency of factually incorrect outputs | Any increase from baseline |
| Latency / availability | System performance | SLA breach or sustained degradation |
| Bias / fairness metrics | Disparate impact across protected classes | Any statistically significant change |
| User escalation / override rate | Human reviewers overriding LLM outputs | Sustained increase (model degrading) |
| Model version log | Tracking provider changes | Any undocumented version change |

The drift challenge for LLMs: Traditional models drift when real-world distributions shift away from training data. LLMs face an additional challenge — the provider may update the model silently, changing its behavior without changing the version number. Build monitoring that detects behavioral changes, not just data drift.
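One lightweight way to detect silent behavioral change is to fingerprint the model's answers to a fixed canary prompt set and compare against a stored baseline. This sketch assumes deterministic outputs (temperature 0 or a pinned seed); with nondeterministic settings you would compare similarity scores rather than exact hashes. The canary answers here are made up for illustration:

```python
import hashlib

def fingerprint(outputs: list[str]) -> str:
    """Hash the model's answers to a fixed canary prompt set.

    A changed fingerprint means behavior changed even if the
    advertised version string did not.
    """
    digest = hashlib.sha256()
    for out in outputs:
        digest.update(out.strip().lower().encode())
    return digest.hexdigest()

# Baseline captured at validation time; "today" simulates a silent
# provider-side behavior shift on the same canary prompts.
baseline = fingerprint(["Paris", "4", "No"])
today = fingerprint(["Paris", "4", "Unable to answer"])
behavior_changed = today != baseline
```

Run the canary set on a schedule, log the fingerprint alongside the provider's version string, and treat a mismatch as a review trigger in the table above's "model version log" row.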

Governance of monitoring outputs: Who sees the monitoring reports? What’s the escalation path when a threshold is breached? SR 11-7 requires board and senior management oversight — that means your monitoring reports need to flow to the model risk committee, not just sit in a data science dashboard.


The Third-Party Vendor Problem

The OCC identified this clearly: when your organization deploys a vendor LLM, you bear the model risk even though you don’t own the model. Most vendor AI contracts don’t give banks what SR 11-7 requires.

What to push for in vendor AI contracts:

  • Notification of material model changes (version updates, training data changes, capability changes) with reasonable advance notice
  • Model cards or equivalent documentation covering architecture overview, training data description, known limitations, bias evaluations
  • Incident notification procedures for identified model failures or bias issues
  • Data handling terms covering what vendor does with your prompts and outputs
  • Termination and transition provisions covering data export and model continuity

What you’re unlikely to get (and how to compensate):

  • Access to model weights or training data — document why this is unavailable and what compensating controls exist
  • Independent audit of vendor model internals — compensate with your own use-case-level testing
  • Guaranteed model stability — compensate with version monitoring and revalidation triggers

The GAO’s May 2025 report (GAO-25-107197) confirmed that regulators rely on existing laws and guidance for AI oversight and that most AI review happens as part of broader safety-and-soundness or compliance examinations. Your third-party AI governance won’t be evaluated in isolation — it’ll come up when an examiner reviews TPRM, model risk, and IT risk together. The gaps compound.


OCC Bulletin 2025-26: What the Forthcoming Guidance Update Signals

OCC Bulletin 2025-26 (October 2025) is the first signal of a broader MRM guidance review. Its immediate scope is community banks and the misconception that SR 11-7 requires annual model validation — it explicitly clarifies that validation frequency should match model risk, not a calendar. The bulletin’s bigger signal is its closing line: this is “a first step as part of the OCC’s broader review of model risk management guidance, practices, and examiner feedback at banks of all sizes.”

Don’t wait for updated guidance. The existing framework — SR 11-7, OCC Bulletin 2011-12, OCC Bulletin 2021-39, SR 21-8 — already provides sufficient authority for examiners to evaluate your AI model risk program. Federal Reserve Governor Barr called on regulators to “review and update existing model risk management standards” in his April 2025 speech. That review is coming.

Build a defensible program now, while the standard is still being clarified. It’s much easier to calibrate a documented program than to explain an undocumented gap when new guidance drops.


So What? The Implementation Checklist

Here’s what “applying SR 11-7 to LLMs” actually looks like in practice:

Immediate (Days 1-30):

  • Run an AI inventory audit — identify all LLMs in use, in development, or embedded in vendor systems
  • Assign a single accountable model owner to each LLM or AI system
  • Define your risk tiers (High/Medium/Low) for AI use cases and document the criteria
  • Identify your highest-risk LLMs (credit, compliance, customer-facing) and prioritize for formal validation

Short-term (Days 31-90):

  • Expand your model inventory template to include LLM-specific fields (version, provider, prompt version, fine-tuning)
  • Develop a model intake process for new AI deployments with mandatory pre-deployment documentation
  • Establish revalidation triggers for each high-risk LLM
  • Review vendor AI contracts and identify documentation and notification gaps to address in renewals

Ongoing:

  • Build LLM monitoring into your model risk reporting dashboard
  • Run red-team testing for any customer-facing or high-risk LLM at least annually and after major model updates
  • Report AI model risk summary to model risk committee quarterly
  • Keep a vendor AI model change log — when providers update their models, document the date, version change, and any revalidation performed

If you’re building out your AI risk program from scratch, the AI Risk Assessment Template includes a model inventory template, pre-deployment checklist, and third-party AI vendor questionnaire designed for exactly this workflow.

For deeper dives into the specific testing techniques your validation team should be running, see LLM Model Risk Assessment: What MRM Teams Actually Need to Test. For the documentation requirements examiners look for, see AI Model Documentation Requirements: What Examiners Look For. And for context on how the OCC’s bulletin history got here, see OCC Model Risk Management Meets AI: What Bulletin 2011-12 Means for Your ML Program.


FAQ

Does SR 11-7 apply to LLMs? Yes — but it’s complicated. If an LLM is used in credit decisioning, fraud scoring, risk rating, or compliance monitoring, it almost certainly meets SR 11-7’s model definition. For other uses like document drafting or customer service automation, the OCC’s 2021 guidance resolves the ambiguity: “Regardless of how AI is classified (as a model or not a model), the associated risk management should be commensurate with the level of risk.” Don’t play the definitional game — tier the LLM by risk and govern it accordingly.

What needs to be in the model inventory for an LLM? At minimum: model name, use case and business function, risk tier, development source (in-house vs. vendor API), regulatory applicability, validation status, model owner (person, not team), and last review date. For vendor LLMs, also document the provider name, model version, API endpoint, and any fine-tuning or prompt engineering applied by your team.

Who should own an enterprise LLM used across multiple business lines? Assign a single accountable owner — typically the function that initiated the deployment or has the most material use case. For enterprise-wide GenAI platforms, this often lands with the CRO, CTO, or a dedicated AI governance function. SR 11-7 requires clear ownership; “shared ownership” is not defensible to an examiner. Document the owner, their responsibilities, and the escalation path.

How do you validate a vendor LLM you can’t examine internally? Scope validation around what you can control: the use case boundary, the prompt engineering, the output guardrails, and the human review process. Run adversarial testing against your specific use case. Review vendor model cards, SOC 2 reports, and any bias/fairness disclosures. Establish contractual rights to notification of model updates.

How often does an LLM need to be revalidated? Define event-triggered revalidation criteria: major model version update, output quality degradation below a documented threshold, deployment to a new use case, or material prompt change. Calendar-only validation cycles miss the most common failure modes for LLMs.

What are examiners actually finding when they review AI model risk programs? The most common gaps, per the GAO’s May 2025 report and the OCC’s Fall 2025 SARP: models deployed without formal inventory registration (shadow AI), documentation that doesn’t address training data, bias testing, or drift thresholds, unclear model ownership for vendor-embedded AI, and governance committee approvals not documented in the file before go-live.

Rebecca Leung


Rebecca Leung has 8+ years of risk and compliance experience across first and second line roles at commercial banks, asset managers, and fintechs. Former management consultant advising financial institutions on risk strategy. Founder of RiskTemplates.

Related Framework

AI Risk Assessment Template & Guide

Comprehensive AI model governance and risk assessment templates for financial services teams.
