SR 11-7 in the Age of AI: What Model Risk Management Teams Must Change Now
SR 11-7 was written in 2011 for spreadsheet models, linear regressions, and credit scorecards. Your LLM doesn’t care.
The Federal Reserve’s Supervisory Letter SR 11-7 — and its OCC companion, Bulletin 2011-12 — remains the foundational framework for model risk management in U.S. banking. Nearly 15 years later, it’s still the first thing examiners reach for when evaluating your AI governance program.
The problem: SR 11-7 was built on assumptions that no longer hold. Static parameters. Deterministic outputs. Bounded scope. Decision paths you can reconstruct after the fact.
Neural networks don’t have static parameters. LLMs don’t produce deterministic outputs. And agentic AI systems make decisions you may not be able to reconstruct at all.
As GARP’s Krishan Sharma analyzed in February 2026: “SR 11-7 remains one of the few stable reference points for model governance, making clarity around its scope and limitations even more critical.” The framework isn’t obsolete. But applying it to modern AI without adaptation is a compliance gap waiting to become an MRA.
Here’s what model risk management teams actually need to change.
TL;DR
- SR 11-7’s three pillars (model development, validation, governance) all require adaptation for AI/ML — the underlying framework is sound, but the implementation assumptions are outdated
- The biggest gaps: effective challenge is nearly impossible for opaque models, periodic validation doesn’t catch continuous drift, and the model definition itself may not cover agentic AI
- The OCC signaled a broader MRM guidance review in October 2025 — expect updates, but don’t wait for them
What SR 11-7 Actually Requires (The Framework, Briefly)
SR 11-7 defines a model as “a quantitative method, system, or approach that applies statistical, economic, financial, or mathematical theories, techniques, and assumptions to process input data into quantitative estimates.” It then organizes model risk management around three pillars:
- Model development, implementation, and use — sound conceptual basis, appropriate data, documented assumptions, controlled deployment
- Model validation — independent, rigorous testing of model performance, conceptual soundness, and ongoing monitoring
- Governance, policies, and controls — clear ownership, defined roles, board and senior management accountability
The guiding principle throughout: effective challenge — “critical analysis by objective, informed parties that can identify model limitations and produce appropriate changes.”
This framework is conceptually robust. The problem isn’t the principles; it’s the implicit assumptions baked into how those principles get implemented — assumptions that collapse when you’re governing a gradient-boosted tree, a transformer-based LLM, or an agentic system that takes actions in the real world.
Where SR 11-7 Breaks Down for AI: Pillar by Pillar
Pillar 1: Model Development and Use
The old assumption: Models have a stable design objective, bounded inputs, and a clear intended use case. You document the methodology, validate it, and deploy it. Done.
The AI reality: Machine learning models are trained on data — their “methodology” is learned from patterns, not specified by a modeler. The inputs can number in the thousands (or millions, for LLMs). The intended use case may evolve as business teams find new applications. And the model may continue to change after deployment if it’s fine-tuned or uses retrieval-augmented generation.
What must change:
- Training data documentation is now a first-class requirement. SR 11-7 talks about input data quality; for ML models, you need to document where the training data came from, what time period it covers, what biases might be embedded in it, and whether the data distribution still matches production. Examiners are asking for this.
- Intended use must be formally bounded and enforced. A credit underwriting model that gets repurposed for customer retention without re-validation is a classic SR 11-7 violation. With AI, this scope creep happens faster and more invisibly. Document approved use cases explicitly, and build a process for evaluating new use cases before they go to production.
- Vendor AI models need the same rigor as internal models. When you buy a vendor model — for fraud detection, AML transaction monitoring, credit scoring — you own the model risk. SR 11-7’s guidance on vendor models applies in full. The fact that you can’t see the model’s internals is a risk to manage, not an excuse.
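One way to make “formally bounded and enforced” concrete is a machine-readable inventory entry that gates each requested use case against the approved list. This is an illustrative sketch only; the record fields, names, and helper function are hypothetical, not a standard schema.

```python
from dataclasses import dataclass, field

@dataclass
class ModelRecord:
    """Minimal model inventory entry (illustrative fields, not a standard)."""
    model_id: str
    owner: str
    approved_uses: set[str] = field(default_factory=set)

def check_use(record: ModelRecord, requested_use: str) -> bool:
    """Return True only if the use case was formally approved for this model."""
    return requested_use in record.approved_uses

# Hypothetical credit model approved for underwriting only
card = ModelRecord("credit-pd-v3", "jane.doe", {"credit_underwriting"})
assert check_use(card, "credit_underwriting")
assert not check_use(card, "customer_retention")  # scope creep blocked until re-validation
```

In practice this gate would sit in the deployment pipeline, so a new use case cannot reach production without an inventory update and the associated re-validation.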
Pillar 2: Model Validation
This is where the framework strains hardest.
The old assumption: Periodic validation — annually for material models, less frequently for lower-risk ones — is sufficient. Validators test conceptual soundness, perform backtesting, run sensitivity analysis, and review outcomes. Between validations, the model doesn’t change.
The AI reality: ML models can drift continuously as underlying data distributions shift. LLMs produce probabilistic outputs that vary even with identical inputs. And as GARP’s February 2026 analysis notes, “material changes in behavior can occur without a formal redevelopment event” — meaning a model can become materially different from what was validated without triggering your standard validation workflow.
What must change:
- Continuous monitoring must replace (or supplement) periodic validation. For any ML model in production, you need automated monitoring of model performance, input data distributions, and output distributions. Set quantitative thresholds — for example, a ±5% shift in feature distributions, or a >10% drop in a performance metric — that trigger investigation. This isn’t optional anymore; examiners expect to see it.
- Validation techniques must match model type. Traditional backtesting and sensitivity analysis don’t translate directly to neural networks. For ML models: feature importance testing, adversarial robustness testing, performance across demographic subgroups. For LLMs: red-teaming, prompt robustness testing, output consistency checks, hallucination rate measurement. Your validation framework needs to specify these techniques by model tier and type.
- Effective challenge requires explainability. SR 11-7 demands that validators be able to critically assess a model’s conceptual soundness and behavior. That’s straightforward for a logistic regression. For a deep neural network with hundreds of layers, it requires deliberate explainability work — SHAP values, LIME analysis, attention visualization — that must be built into the validation process, not bolted on afterward.
- Benchmark model complexity for LLMs. SR 11-7 mentions benchmarking as a validation technique. For LLMs, this means comparison against established benchmark evaluations (MMLU, HellaSwag, domain-specific tests), as well as internal baseline comparisons. Document the evaluation suite and version it along with the model.
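The drift thresholds above can be operationalized with a simple statistic such as the Population Stability Index (PSI). The pure-Python sketch below is illustrative rather than a prescribed implementation, and the 0.25 alert level is a common industry rule of thumb, not something SR 11-7 specifies.

```python
import math

def psi(expected: list[float], actual: list[float], bins: int = 10) -> float:
    """Population Stability Index between a baseline sample and production data.
    Bins are derived from the baseline; a small floor avoids log(0)."""
    lo, hi = min(expected), max(expected)
    width = (hi - lo) / bins or 1.0

    def dist(xs: list[float]) -> list[float]:
        counts = [0] * bins
        for x in xs:
            i = min(int((x - lo) / width), bins - 1)
            counts[max(i, 0)] += 1
        n = len(xs)
        return [max(c / n, 1e-6) for c in counts]

    e, a = dist(expected), dist(actual)
    return sum((ai - ei) * math.log(ai / ei) for ei, ai in zip(e, a))

# Rule of thumb: PSI > 0.25 is often treated as a material shift worth investigating
baseline = [i / 100 for i in range(100)]            # uniform on [0, 1)
shifted  = [0.5 + i / 200 for i in range(100)]      # mass pushed to the upper half
assert psi(baseline, baseline) < 0.01
assert psi(baseline, shifted) > 0.25
```

A production setup would compute this per feature on a schedule, log the values, and open an investigation ticket when a threshold trips, which is the audit trail examiners ask for.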
Validation frequency: A practical guide by model tier
| Model Type | Recommended Monitoring | Full Re-Validation Trigger |
|---|---|---|
| Traditional statistical models (logistic regression, scorecard) | Quarterly performance review | Annually or on significant data change |
| Gradient-boosted / ensemble ML models | Monthly drift monitoring | Semi-annually or on >10% performance shift |
| Deep learning / neural networks | Continuous automated monitoring | Quarterly conceptual soundness review |
| LLMs in production applications | Real-time output sampling + weekly review | On model version change or significant fine-tuning |
| Agentic AI systems | Continuous audit trail review | Per deployment change + quarterly governance review |
Pillar 3: Governance, Policies, and Controls
The old assumption: Clear organizational separation between development, validation, and use. Model inventory is manageable. Governance is periodic (model risk committee meets quarterly). The definition of “model” is unambiguous.
The AI reality: As GARP notes, the definition of model itself is now contested. Traditional software executes predefined instructions deterministically. AI systems are probabilistic, adaptive, and in the case of agentic AI, capable of pursuing objectives with limited human intervention. Whether these systems fall under SR 11-7 is no longer a theoretical question — it’s one that examiners are actively asking.
What must change:
- Your model inventory must include AI systems, including vendor-embedded AI. The first question in any model risk exam is: “Show me your model inventory.” Most firms dramatically undercount AI because they define “model” narrowly. Shadow AI tools employees are using. AI embedded in third-party software (your core banking platform, your AML system, your CRM). Vendor foundation models accessed via API. All of it needs to be inventoried and assessed.
- Model risk tiering must account for AI-specific factors. Risk tiering drives validation frequency, documentation depth, and committee escalation. Traditional tiering factors (financial materiality, breadth of use, regulatory impact) remain relevant, but AI adds new dimensions: autonomy level, explainability gap, rate of change, and concentration risk from shared foundation models.
- Ownership must be explicit for every AI model. Who is responsible when an AI model produces a biased output? When it hallucinates in a customer-facing application? SR 11-7 requires clear roles and responsibilities; for AI, those need to be specified in writing and attached to specific individuals. At most banks, this sits with the Model Risk Management function or CRO. At fintechs without dedicated MRM, it typically falls to the Head of Compliance or VP of Engineering — but it needs to be someone, documented, with actual authority.
- Third-party AI concentration risk is now a governance issue. GARP’s analysis calls this out directly: concentration in foundation model providers (OpenAI, Anthropic, Google) creates correlated risks across institutions. If your fraud model, your customer service chatbot, and your document processing pipeline all depend on the same foundation model, that’s concentration risk. Your governance framework needs to map these dependencies and assess what happens if a provider has an outage, changes terms of service, or is acquired.
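As a sketch of how AI-specific factors might feed a tiering score, the function below combines materiality with autonomy, explainability gap, and concentration. The weights and cutoffs are placeholders a firm would calibrate to its own risk appetite, not values drawn from SR 11-7.

```python
def ai_risk_tier(materiality: int, autonomy: int,
                 explainability_gap: int, concentration: int) -> str:
    """Assign a model risk tier from four factors, each scored 1 (low) to 3 (high).
    Materiality is double-weighted; all weights and cutoffs are illustrative."""
    score = 2 * materiality + autonomy + explainability_gap + concentration
    if score >= 12:
        return "Tier 1"
    if score >= 8:
        return "Tier 2"
    return "Tier 3"

# A highly autonomous, opaque LLM application on a shared foundation model
assert ai_risk_tier(materiality=3, autonomy=3,
                    explainability_gap=3, concentration=3) == "Tier 1"
# A transparent, low-materiality internal scorecard
assert ai_risk_tier(1, 1, 1, 1) == "Tier 3"
```

The point is less the arithmetic than that each factor is scored explicitly and the result drives validation frequency and escalation, which keeps tiering defensible in an exam.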
The Definition Problem: Does SR 11-7 Even Cover Your AI?
Here’s the uncomfortable question model risk teams need to answer explicitly.
SR 11-7 defines a model as a quantitative method that produces quantitative estimates. An LLM that produces text — not a numerical output — might not meet that definition under a strict reading. An agentic AI system that takes actions rather than producing estimates is almost certainly a stretch.
The regulatory community is actively debating this. The OCC’s October 2025 Bulletin 2025-26 — which clarified MRM flexibility for community banks — explicitly noted it was “a first step as part of the OCC’s broader review of model risk management guidance.” That review is ongoing. Examiners are not waiting for updated guidance before asking questions about your AI.
The practical answer for MRM teams: don’t wait for definitional clarity. If an AI system informs material business decisions — credit, fraud, customer eligibility, pricing, compliance monitoring — govern it under SR 11-7 regardless of whether it technically meets the model definition. The risk is real whether or not the taxonomy is settled.
What “Effective Challenge” Looks Like for AI
SR 11-7’s requirement for effective challenge — independent, informed parties with real authority to push back — is arguably the hardest part to execute for AI.
For a credit scorecard, effective challenge means a validator with statistical expertise can independently replicate the model, test its assumptions, and credibly assess whether it works as intended. Most MRM teams can do this.
For a large language model, effective challenge requires:
- Technical expertise in ML/NLP to assess model architecture, training approach, and evaluation methodology
- Adversarial testing capability — structured red-teaming, not just reading a vendor data sheet
- Domain expertise to evaluate whether the model’s outputs are appropriate for the specific use case
- Independence from the model development and deployment team
Most banks don’t have all four in their current MRM function. Building that capability — through hiring, training, or targeted third-party validation — is not optional if you’re deploying AI in material applications.
The Agentic AI Problem: A Preview of What’s Coming
GARP’s February 2026 analysis focused on agentic AI because it represents the most acute version of the SR 11-7 strain: systems that don’t just produce outputs, but take actions — executing trades, moving funds, sending communications, triggering workflows — with limited human intervention.
The SR 11-7 assumption of reconstructible decision paths breaks completely for agentic systems. If an agentic AI executes a sequence of actions based on multi-step reasoning, and you can’t reconstruct that chain, you can’t conduct effective challenge, you can’t demonstrate governance, and you can’t satisfy an examiner.
If your institution is beginning to explore or deploy agentic AI, the governance questions to answer now:
- What actions can the system take autonomously, and what requires human approval?
- How are those decisions logged in a way that supports reconstruction?
- What’s the kill switch — who has authority to halt the system, and under what conditions?
- How does the validation cycle work for a system whose behavior can change between reviews?
These aren’t theoretical. Examiners at firms piloting agentic AI are already asking them.
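A minimal sketch of the kind of reconstructible decision trail those questions imply is below. The log schema and the autonomy rule are illustrative assumptions, not an established standard; real deployments would persist entries to tamper-evident storage.

```python
import time
import uuid

def log_action(trace: list, step: str, rationale: str,
               requires_approval: bool) -> dict:
    """Append one reconstructible step to an agent's decision trail (illustrative schema)."""
    entry = {
        "trace_id": str(uuid.uuid4()),
        "ts": time.time(),
        "step": step,
        "rationale": rationale,
        "requires_approval": requires_approval,
    }
    trace.append(entry)
    return entry

trail: list = []
log_action(trail, "fetch_balance", "needed to size the transfer", False)
# Hypothetical rule: transfers above an autonomy limit require human sign-off
log_action(trail, "initiate_transfer", "exceeds autonomous transfer limit", True)

pending = [e for e in trail if e["requires_approval"]]
assert len(pending) == 1 and pending[0]["step"] == "initiate_transfer"
```

Each entry answers the examiner questions directly: what the agent did, why, when, and whether a human gate applied, so the full action sequence can be replayed after the fact.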
Your 90-Day SR 11-7 AI Adaptation Roadmap
Days 1–30: Know what you have
- Complete or update your model inventory to include all AI systems, including vendor-embedded AI and any models accessed via API
- Classify each AI model against your existing tier structure
- Identify gaps: which AI models have no validation documentation, no owner, no monitoring
Days 31–60: Close the validation gaps
- For each Tier 1 and Tier 2 AI model without current validation documentation: scope and schedule validation
- Implement automated monitoring for any ML models in production without it (input drift, output drift, performance metrics)
- Document validation techniques by model type — traditional ML, deep learning, LLM — so validators know what’s expected
Days 61–90: Update governance
- Revise your MRM policy to explicitly address AI/ML, including the model definition question and how you’ve resolved it
- Update model risk tiering criteria to include AI-specific factors (autonomy, explainability, third-party concentration)
- Establish explicit ownership for every material AI model: name, role, escalation path
- Conduct a tabletop exercise: “Our fraud detection AI produced systematically biased outputs for 6 months. What do we do?”
So What?
SR 11-7 isn’t going away. The OCC’s 2025 signal of a broader MRM guidance review suggests updates are coming — but “updates are coming” is not a defensible position during an exam.
The firms that will fare best aren’t waiting for regulators to update the framework. They’re adapting their MRM programs now: completing AI model inventories, updating validation techniques, building the explainability infrastructure that effective challenge requires, and writing down who owns what when an AI model goes wrong.
The framework’s foundational principles — sound governance, independent validation, effective challenge — are as relevant for a transformer model as they were for a spreadsheet. What’s changed is the sophistication required to actually execute them.
If you’re building or updating your AI risk program, the AI Risk Assessment Template & Guide gives you a structured framework to assess AI models across development, validation, and governance — mapped directly to SR 11-7 and OCC 2011-12 requirements. It’s the starting point for teams that need to get compliant without starting from scratch.
Frequently Asked Questions
Does SR 11-7 apply to LLMs and generative AI?
Technically, SR 11-7’s definition of “model” — a system that processes input data to produce quantitative estimates — may not cover all LLM use cases, since LLMs often produce text rather than numerical outputs. However, U.S. banking regulators have consistently indicated that AI systems used in material business decisions should be governed under the spirit of SR 11-7 even if they don’t meet the strict definitional test. The OCC’s 2025 bulletin on model risk management signals a broader guidance review is underway. The practical answer: govern AI used for material decisions under SR 11-7 now, and don’t wait for definitional clarity.
What’s the difference between SR 11-7 and OCC Bulletin 2011-12?
SR 11-7 is the Federal Reserve’s supervisory letter on model risk management, issued April 4, 2011. OCC Bulletin 2011-12 is the Office of the Comptroller of the Currency’s companion guidance, issued the same year with substantially similar requirements. Both documents reflect joint supervisory thinking and are functionally equivalent in their requirements. Banks supervised by the Federal Reserve follow SR 11-7; national banks and federal savings associations supervised by the OCC follow Bulletin 2011-12. Most large bank holding companies are subject to both.
How often should AI models be validated under SR 11-7?
SR 11-7 requires validation frequency commensurate with model risk — more frequent for higher-risk, more complex models. For AI/ML models specifically, the traditional annual validation cycle is insufficient given the pace of change. Leading firms implement continuous automated monitoring for all production ML models, with formal re-validation triggered by material drift, performance degradation, or model changes. For LLMs and other high-complexity AI, quarterly conceptual soundness reviews are increasingly common in addition to continuous monitoring.
Rebecca Leung
Rebecca Leung has 8+ years of risk and compliance experience across first and second line roles at commercial banks, asset managers, and fintechs. Former management consultant advising financial institutions on risk strategy. Founder of RiskTemplates.