SR 11-7 for AI Systems: Applying Legacy Model Risk Guidance to LLMs
There’s a conversation happening in MRM teams right now: “Legal wants to use an LLM for contract analysis. It needs to go through model risk review.” “Great. Which SR 11-7 framework do we use?” “The same one we always use.” “OK — how do we document conceptual soundness for a transformer model?” Silence.
The framework is the right one. The problem is most banks are running a 2011 checklist against a 2025 system and hoping nobody notices the gaps. The OCC’s Fall 2025 Semiannual Risk Perspective called AI an “emerging risk” requiring management “commensurate with the materiality and complexity” of AI activities. Federal Reserve Governor Barr’s April 2025 speech explicitly cited SR 11-7 as foundational for AI governance while acknowledging the explainability and hallucination problems that current guidance doesn’t cleanly address.
The framework is sound. The implementation details are what break.
Here’s how to actually apply SR 11-7 to LLMs — from model inventory through governance ownership through validation scope — without the theater.
TL;DR
- SR 11-7 applies to LLMs, but the implementation assumptions (static parameters, annual validation cycles, deterministic outputs) must be rethought for AI
- The five places MRM programs most often break with LLMs: model inventory scoping, governance ownership, documentation gaps, vendor AI handling, and validation timing
- OCC Bulletin 2025-26 signals a broader MRM guidance review is coming — don’t wait for it; build a defensible program now based on existing expectations
- The GAO’s May 2025 report found examiners apply SR 11-7 inconsistently to AI tools — document your rationale for every scope and methodology decision
Does SR 11-7 Actually Apply to Your LLM?
Start here because most MRM programs waste energy on the wrong question.
SR 11-7 defines a “model” as “a quantitative method, system, or approach that applies statistical, economic, financial, or mathematical theories, techniques, and assumptions to process input data into quantitative estimates.” That clause about “quantitative estimates” is what creates the debate about LLMs — their outputs are often text, not numbers.
Don’t get trapped in the definitional argument. The OCC’s 2021 Comptroller’s Handbook guidance resolved it pragmatically: “Regardless of how AI is classified (as a model or not a model), the associated risk management should be commensurate with the level of risk.”
The practical decision tree:
| LLM Use Case | SR 11-7 Model? | Required Treatment |
|---|---|---|
| Credit underwriting / risk scoring | Yes — outputs drive quantitative decisions | Full Tier 1 MRM governance |
| Fraud detection / transaction risk scoring | Yes | Full MRM governance |
| Compliance monitoring (SAR flagging, AML) | Yes — SR 21-8 confirms this explicitly | Full MRM governance |
| Customer-facing Q&A / support chatbot | Arguable — tier by consequence | Commensurate risk controls |
| Internal document drafting / summarization | Lowest risk tier | Documented acceptable use policy (AUP) + basic monitoring |
| Contract review / legal research | Arguable — tier by consequence | Risk assessment + human review controls |
SR 21-8 (the 2021 BSA/AML interagency guidance) explicitly confirmed that transaction monitoring and sanctions screening systems “meet the definition of ‘model’ under SR 11-7” — even when built on ML architectures. If your LLM is anywhere near the BSA/AML stack, it’s in scope.
The GARP analysis from February 2026 put it clearly: the real tension is that SR 11-7 was “grounded in the principle that models are simplified, relatively static representations of real-world relationships.” LLMs are neither simplified nor static. That tension doesn’t make SR 11-7 irrelevant — it makes the implementation decisions harder. Document your rationale for every scoping decision.
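The decision tree above can be expressed as a simple intake-triage helper. This is an illustrative sketch, not a regulatory taxonomy — the use-case categories, tier names, and function name are assumptions you would replace with your own risk-tiering criteria:

```python
# Illustrative risk-tiering helper mirroring the decision tree above.
# Category names and treatments are assumptions — adapt to your taxonomy.

FULL_MRM = {"credit_underwriting", "fraud_detection", "bsa_aml_monitoring"}
TIER_BY_CONSEQUENCE = {"customer_chatbot", "contract_review"}
LOW_RISK = {"internal_drafting", "summarization"}

def sr117_treatment(use_case: str) -> str:
    """Return the governance treatment for an LLM use case."""
    if use_case in FULL_MRM:
        return "Full SR 11-7 MRM governance (Tier 1)"
    if use_case in TIER_BY_CONSEQUENCE:
        return "Risk assessment + commensurate controls (tier by consequence)"
    if use_case in LOW_RISK:
        return "Documented AUP + basic monitoring"
    # Unknown use cases default to review, never to exemption
    return "Route to MRM intake for scoping decision"
```

The important design choice is the last branch: an unrecognized use case routes to MRM intake rather than defaulting out of scope, which is exactly the documented-rationale posture the text recommends.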
Step 1: Build an AI Model Inventory That Captures LLMs
SR 11-7 requires organizations to “maintain an inventory of models implemented for use, under development for implementation, or recently retired.” That sentence was written for credit scorecards. Your inventory needs to expand significantly.
What the inventory must include (baseline per OCC 2021 Comptroller’s Handbook):
- Model name and model risk level (High/Medium/Low)
- Owning department and model owner (person, not team)
- Developer (in-house team, vendor, or open-source)
- Purpose and intended use cases
- Actual usage and any restrictions on use
LLM-specific fields to add:
- Model type (foundation model, fine-tuned, RAG-augmented, prompt-engineered wrapper)
- Provider name and model version (e.g., GPT-4o, Claude 3.5, Llama 3.1)
- API endpoint or deployment environment
- Fine-tuning data used (if any) and source
- Prompt engineering approach and version control status
- Output format (text, structured output, risk score)
- Human-in-the-loop controls (required, optional, or none)
- Validation status and last review date
- Revalidation triggers (version update, use case expansion, degradation threshold breach)
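Combining the baseline and LLM-specific fields, an inventory record might look like the sketch below. Field names and defaults are illustrative assumptions, not a prescribed schema:

```python
# Sketch of an LLM inventory record combining the baseline OCC fields with
# the LLM-specific fields listed above. All names are illustrative.
from dataclasses import dataclass, field
from typing import Optional

@dataclass
class LLMInventoryRecord:
    # Baseline fields (per OCC 2021 Comptroller's Handbook)
    model_name: str
    risk_tier: str                        # "High" | "Medium" | "Low"
    owner: str                            # a named person, not a team
    developer: str                        # in-house team, vendor, or open-source
    purpose: str
    usage_restrictions: list[str] = field(default_factory=list)
    # LLM-specific fields
    model_type: str = "vendor-api"        # foundation, fine-tuned, RAG, wrapper
    provider_version: Optional[str] = None  # e.g. "GPT-4o 2024-08-06"
    human_in_loop: str = "required"       # "required" | "optional" | "none"
    last_validated: Optional[str] = None
    revalidation_triggers: list[str] = field(default_factory=list)
```

Whether this lives in a GRC platform or a spreadsheet matters less than that every field is populated before deployment authorization.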
The shadow AI problem is your biggest inventory risk. The OCC’s Fall 2025 SARP and examiner feedback consistently flag models deployed without formal inventory registration. At most organizations, business lines are spinning up vendor LLM API connections faster than MRM can track them. The fix: require a model intake form for any AI system before deployment authorization, with a signed attestation from the business line owner.
The inventory will also need to capture embedded AI — the LLM baked into your vendor’s underwriting platform, your compliance screening tool, or your CRM. These often aren’t registered because nobody thinks of them as “your” models. They are your risk. Require vendors to disclose AI/ML components in their systems, and document them.
Step 2: Governance Ownership — Who Actually Owns the LLM?
SR 11-7’s governance pillar assumes “clear separation between model development, validation, and use.” For traditional credit models, that’s straightforward — development owns the model, validation reviews it, the business line uses it.
LLMs break this structure three ways:
Problem 1: The enterprise GenAI platform problem. When a bank deploys a single LLM platform used by legal, compliance, operations, and risk simultaneously, the “model owner” concept frays. SR 11-7 says “model owners typically are in individual business units” — but who owns an enterprise tool with a dozen use cases?
The fix: Assign a single accountable model owner — typically the function that initiated deployment or has the most material use case. Designate business line “model users” with documented use-case-specific limitations. Create a tiered governance structure: the enterprise owner sets the AI risk appetite and platform controls; business line users are responsible for use-case-specific controls and human oversight.
Problem 2: The vendor model ownership gap. When the foundation model comes from OpenAI or Anthropic, traditional “model owner” accountability doesn’t map cleanly. You can’t review their training data, audit their RLHF process, or demand architectural documentation. The bank uses the model but doesn’t control the model.
The fix: Your ownership responsibility is at the use-case boundary — the prompts you write, the guardrails you implement, the human review you require, the output validation you run. Document your use-case scope explicitly. Require contractual notifications from vendors for model version changes, training data updates, and known capability changes. The model card should live in your inventory even if it came from the vendor.
Problem 3: The MRM / data science accountability gap. At many banks, AI governance is split across model risk (SR 11-7 compliance), data science (model development), and increasingly IT/legal (vendor management and GenAI policy). Nobody is clearly accountable for the LLM.
The fix: Document the accountability matrix. MRM owns the validation standard. The business line owns the use case and risk acceptance. IT owns deployment controls. Legal owns acceptable use policy. Write this down in the AI governance framework and get senior management sign-off — examiners will ask.
Step 3: Documentation That Passes Examiner Review
SR 11-7 requires documentation “sufficiently detailed to allow parties unfamiliar with a model to understand how the model operates, as well as its limitations and key assumptions.” For a traditional regression model, that’s achievable. For an LLM with hundreds of billions of parameters, it requires a different documentation structure.
What you can document for in-house or fine-tuned LLMs:
| Documentation Element | Traditional Model | LLM Equivalent |
|---|---|---|
| Conceptual soundness | Theoretical basis, mathematical derivation | Architecture rationale, use-case justification, training approach |
| Data quality | Input data sources, cleaning methodology | Training data provenance, bias in training data, data cutoff date |
| Assumptions and limitations | Model assumptions, out-of-scope conditions | Known failure modes, hallucination risks, use-case scope limits |
| Performance metrics | Accuracy, GINI, KS statistic | Task-specific metrics, hallucination rate, bias test results |
| Version control | Model version, parameter changes | Prompt version, fine-tuning data version, API version |
What you can document for third-party vendor LLMs (when you can’t see inside):
- Use case boundary: exactly what questions/tasks the LLM is authorized to perform
- Prompt engineering documentation with version history
- Output validation controls: what human or automated checks run on LLM outputs
- Bias and fairness testing results for your specific use case (even if you can’t audit training data)
- Vendor model card (if provided) and any disclosed bias or limitation disclosures
- Contractual terms covering notification of model changes
- Incident history and response procedures
The documentation gap most likely to cause an MRA: Examiners consistently cite incomplete documentation around training data, bias testing, and drift thresholds. If your model file doesn’t answer “where did the training data come from?” and “how was bias evaluated?” — that’s a finding. For vendor LLMs, “we don’t know — we asked the vendor” is not a defensible answer. Require vendors to provide this.
Step 4: Validation Scope — In-House vs. Vendor LLMs
SR 11-7 requires “independent, rigorous testing of model performance, conceptual soundness, and ongoing monitoring.” The challenge for LLMs: traditional validation methods (backtesting, benchmarking, sensitivity analysis) don’t translate cleanly.
For in-house or fine-tuned LLMs:
The validation scope should include:
- Conceptual soundness review — Is the architecture appropriate for the use case? Are the training choices documented and defensible?
- Pre-deployment testing — Accuracy on held-out test data, hallucination rate testing across use-case scenarios, adversarial/red-team testing, bias and fairness evaluation for regulated decisions
- Use-case boundary testing — What happens when inputs are outside the intended scope? Does the model fail gracefully?
- Human-in-the-loop validation — Does the review process actually catch model errors before they affect customers or decisions?
- Ongoing monitoring plan — What metrics will you track? What thresholds trigger revalidation?
For vendor LLMs (APIs you don’t control):
Traditional validation frameworks assume you have access to model internals. You don’t. Reframe validation around what you can control:
- Use-case scoped testing — Run representative samples through the model and evaluate output quality against documented criteria
- Red-team testing — Try to break the model with adversarial inputs specific to your use case. Can it be manipulated into outputting incorrect risk decisions? Discriminatory advice?
- Prompt sensitivity analysis — How much does output vary across slight rephrasing of inputs? High variance is a documented risk
- Vendor documentation review — Model cards, safety evaluations, bias testing results, SOC 2 / Trust and Safety reports
- Contractual rights — Do you have rights to notification of model changes? Audit rights? These are validation controls in their own right
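Prompt sensitivity analysis can be run without any access to model internals: send paraphrases of the same request and measure how much the outputs agree. A minimal sketch, assuming a `call_llm` client you supply and a crude token-overlap (Jaccard) similarity — a production version would use embedding similarity instead:

```python
# Prompt sensitivity sketch: paraphrase the same request, compare outputs.
# `call_llm` is a placeholder for your API client. Jaccard token overlap is
# a deliberately simple similarity stand-in — swap in embeddings in practice.

def jaccard(a: str, b: str) -> float:
    """Token-overlap similarity between two output strings (0.0 to 1.0)."""
    ta, tb = set(a.lower().split()), set(b.lower().split())
    return len(ta & tb) / len(ta | tb) if ta | tb else 1.0

def prompt_sensitivity(call_llm, paraphrases: list[str]) -> float:
    """Mean pairwise output similarity across paraphrased prompts.
    Low scores mean high sensitivity — a documented risk per the text above."""
    outputs = [call_llm(p) for p in paraphrases]
    scores = [jaccard(outputs[i], outputs[j])
              for i in range(len(outputs))
              for j in range(i + 1, len(outputs))]
    return sum(scores) / len(scores)
```

A score near 1.0 means stable behavior across rephrasings; a low score is evidence for the validation file that output variance needs a compensating control (structured outputs, human review, or tighter prompts).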
The annual validation assumption is the biggest trap. Traditional MRM operates on a 12-month validation cycle. LLMs can change — via provider updates, through prompt modifications, or through distribution shift in input data — on much shorter timescales. Define event-triggered revalidation criteria before deployment:
- Major provider model version update
- Output quality degradation below documented thresholds (set a specific metric and threshold)
- Deployment to a new use case not covered by original validation scope
- Material change in the volume or type of inputs
Write these triggers into the model’s governance documentation, not into someone’s head.
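In code form, the four triggers above reduce to a checklist evaluated against each model's inventory record. The field names and the 2x volume threshold below are illustrative assumptions — the point is that each trigger is explicit and testable, not tribal knowledge:

```python
# Sketch of event-triggered revalidation checks. Field names and the 2x
# volume threshold are illustrative; set yours in governance documentation.

def revalidation_due(model: dict) -> list[str]:
    """Return every triggered revalidation criterion for a model record."""
    triggers = []
    if model["provider_version"] != model["validated_version"]:
        triggers.append("provider model version changed since last validation")
    if model["accuracy"] < model["accuracy_threshold"]:
        triggers.append("output quality below documented threshold")
    if set(model["active_use_cases"]) - set(model["validated_use_cases"]):
        triggers.append("deployed to use case outside validation scope")
    if model["monthly_volume"] > 2 * model["validated_volume"]:
        triggers.append("material change in input volume")
    return triggers
```

Running this on a schedule (or on every inventory update) turns "revalidation triggers" from a policy sentence into a control that actually fires.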
Step 5: Ongoing Monitoring — What to Track and What to Do When It Degrades
SR 11-7’s ongoing monitoring requirement was written for models that produce the same output for the same input. LLMs don’t. This is the gap that catches MRM programs off guard.
Key monitoring metrics for LLMs in production:
| Metric | What It Measures | Trigger for Review |
|---|---|---|
| Output accuracy / task completion rate | Quality of LLM responses for defined use case | Drop below documented threshold |
| Hallucination rate | Frequency of factually incorrect outputs | Sustained increase above documented baseline |
| Latency / availability | System performance | SLA breach or sustained degradation |
| Bias / fairness metrics | Disparate impact across protected classes | Any statistically significant change |
| User escalation / override rate | Human reviewers overriding LLM outputs | Sustained increase (model degrading) |
| Model version log | Tracking provider changes | Any undocumented version change |
The drift challenge for LLMs: Traditional models drift when real-world distributions shift away from training data. LLMs face an additional challenge — the provider may update the model silently, changing its behavior without changing the version number. Build monitoring that detects behavioral changes, not just data drift.
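One practical way to detect silent provider updates is a set of fixed "canary" prompts replayed on a schedule, with outputs compared against a recorded baseline. The sketch below uses exact-match fingerprints, which only works for constrained, deterministic outputs (temperature 0, structured responses); free-text use cases would need a similarity threshold instead. `call_llm` is again a placeholder for your API client:

```python
# Behavioral-change detection via fixed canary prompts: replay the same
# prompts on a schedule, alert when answers diverge from the baseline.
# Exact-match fingerprints assume constrained, deterministic outputs.
import hashlib

def fingerprint(text: str) -> str:
    """Stable fingerprint of a normalized output."""
    return hashlib.sha256(text.strip().lower().encode()).hexdigest()

def detect_behavior_change(call_llm, canaries: dict[str, str]) -> list[str]:
    """canaries maps prompt -> baseline fingerprint. Returns drifted prompts."""
    return [prompt for prompt, baseline in canaries.items()
            if fingerprint(call_llm(prompt)) != baseline]
```

Any non-empty result is logged to the model version log (the last row of the table above) and routed to the escalation path — even if the provider's advertised version number never changed.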
Governance of monitoring outputs: Who sees the monitoring reports? What’s the escalation path when a threshold is breached? SR 11-7 requires board and senior management oversight — that means your monitoring reports need to flow to the model risk committee, not just sit in a data science dashboard.
The Third-Party Vendor Problem
The OCC identified this clearly: when your organization deploys a vendor LLM, you bear the model risk even though you don’t own the model. Most vendor AI contracts don’t give banks what SR 11-7 requires.
What to push for in vendor AI contracts:
- Notification of material model changes (version updates, training data changes, capability changes) with reasonable advance notice
- Model cards or equivalent documentation covering architecture overview, training data description, known limitations, bias evaluations
- Incident notification procedures for identified model failures or bias issues
- Data handling terms covering what vendor does with your prompts and outputs
- Termination and transition provisions covering data export and model continuity
What you’re unlikely to get (and how to compensate):
- Access to model weights or training data — document why this is unavailable and what compensating controls exist
- Independent audit of vendor model internals — compensate with your own use-case-level testing
- Guaranteed model stability — compensate with version monitoring and revalidation triggers
The GAO’s May 2025 report (GAO-25-107197) confirmed that regulators rely on existing laws and guidance for AI oversight and that most AI review happens as part of broader safety-and-soundness or compliance examinations. Your third-party AI governance won’t be evaluated in isolation — it’ll come up when an examiner reviews TPRM, model risk, and IT risk together. The gaps compound.
OCC Bulletin 2025-26: What the Forthcoming Guidance Update Signals
OCC Bulletin 2025-26 (October 2025) is the first signal of a broader MRM guidance review. Its immediate scope is community banks and the misconception that SR 11-7 requires annual model validation — it explicitly clarifies that validation frequency should match model risk, not a calendar. The bulletin’s bigger signal is its closing line: this is “a first step as part of the OCC’s broader review of model risk management guidance, practices, and examiner feedback at banks of all sizes.”
Don’t wait for updated guidance. The existing framework — SR 11-7, OCC Bulletin 2011-12, OCC Bulletin 2021-39, SR 21-8 — already provides sufficient authority for examiners to evaluate your AI model risk program. Federal Reserve Governor Barr called on regulators to “review and update existing model risk management standards” in his April 2025 speech. That review is coming.
Build a defensible program now, while the standard is still being clarified. It’s much easier to calibrate a documented program than to explain an undocumented gap when new guidance drops.
So What? The Implementation Checklist
Here’s what “applying SR 11-7 to LLMs” actually looks like in practice:
Immediate (Days 1-30):
- Run an AI inventory audit — identify all LLMs in use, in development, or embedded in vendor systems
- Assign a single accountable model owner to each LLM or AI system
- Define your risk tiers (High/Medium/Low) for AI use cases and document the criteria
- Identify your highest-risk LLMs (credit, compliance, customer-facing) and prioritize for formal validation
Short-term (Days 31-90):
- Expand your model inventory template to include LLM-specific fields (version, provider, prompt version, fine-tuning)
- Develop a model intake process for new AI deployments with mandatory pre-deployment documentation
- Establish revalidation triggers for each high-risk LLM
- Review vendor AI contracts and identify documentation and notification gaps to address in renewals
Ongoing:
- Build LLM monitoring into your model risk reporting dashboard
- Run red-team testing for any customer-facing or high-risk LLM at least annually and after major model updates
- Report AI model risk summary to model risk committee quarterly
- Keep a vendor AI model change log — when providers update their models, document the date, version change, and any revalidation performed
If you’re building out your AI risk program from scratch, the AI Risk Assessment Template includes a model inventory template, pre-deployment checklist, and third-party AI vendor questionnaire designed for exactly this workflow.
For deeper dives into the specific testing techniques your validation team should be running, see LLM Model Risk Assessment: What MRM Teams Actually Need to Test. For the documentation requirements examiners look for, see AI Model Documentation Requirements: What Examiners Look For. And for context on how the OCC’s bulletin history got here, see OCC Model Risk Management Meets AI: What Bulletin 2011-12 Means for Your ML Program.
FAQ
Does SR 11-7 apply to LLMs? Yes — but it’s complicated. If an LLM is used in credit decisioning, fraud scoring, risk rating, or compliance monitoring, it almost certainly meets SR 11-7’s model definition. For other uses like document drafting or customer service automation, the OCC’s 2021 guidance resolves the ambiguity: “Regardless of how AI is classified (as a model or not a model), the associated risk management should be commensurate with the level of risk.” Don’t play the definitional game — tier the LLM by risk and govern it accordingly.
What needs to be in the model inventory for an LLM? At minimum: model name, use case and business function, risk tier, development source (in-house vs. vendor API), regulatory applicability, validation status, model owner (person, not team), and last review date. For vendor LLMs, also document the provider name, model version, API endpoint, and any fine-tuning or prompt engineering applied by your team.
Who should own an enterprise LLM used across multiple business lines? Assign a single accountable owner — typically the function that initiated the deployment or has the most material use case. SR 11-7 requires clear ownership; “shared ownership” is not defensible to an examiner. Document the owner, their responsibilities, and the escalation path.
How do you validate a vendor LLM you can’t examine internally? Scope validation around what you can control: the use case boundary, the prompt engineering, the output guardrails, and the human review process. Run adversarial testing against your specific use case. Review vendor model cards, SOC 2 reports, and any bias/fairness disclosures. Establish contractual rights to notification of model updates.
How often does an LLM need to be revalidated? Define event-triggered revalidation criteria: major model version update, output quality degradation below a documented threshold, deployment to a new use case, or material prompt change. Calendar-only validation cycles miss the most common failure modes for LLMs.
What are examiners actually finding when they review AI model risk programs? The most common gaps: models deployed without formal inventory registration (shadow AI), documentation that doesn’t address training data, bias testing, or drift thresholds, unclear model ownership for vendor-embedded AI, and governance committee approvals not documented before go-live.
Related Template
AI Risk Assessment Template & Guide
Comprehensive AI model governance and risk assessment templates for financial services teams.
Rebecca Leung
Rebecca Leung has 8+ years of risk and compliance experience across first and second line roles at commercial banks, asset managers, and fintechs. Former management consultant advising financial institutions on risk strategy. Founder of RiskTemplates.