AI Risk

Common Regulatory Exam Findings on AI: Top Deficiencies and How to Fix Them

April 12, 2026 Rebecca Leung

TL;DR

  • AI governance exam findings are clustering in eight specific areas — model inventory, scope classification, validation, documentation, vendor oversight, explainability, monitoring, and incident response
  • OCC Bulletin 2025-26 (October 2025) confirmed the broader MRM guidance review is underway, but current expectations still apply — and examiners are already looking
  • The May 2025 GAO report on AI in financial services found SR 11-7 applied inconsistently, but institutions are still receiving findings where programs are absent or deficient
  • Most of these deficiencies have workable fixes — the challenge is doing them before your next exam, not after

You know a problem is real when regulators start writing about it before most institutions have fully solved it. That’s exactly where AI governance is right now.

The OCC’s Semiannual Risk Perspective for Spring 2025 flagged AI — particularly models built on external data or third-party platforms — as an elevated risk area requiring examiner attention. The GAO’s May 2025 report on AI in financial services documented the existing supervisory framework and found that while SR 11-7 applies to AI, institutions frequently misapply it or apply it too narrowly. OCC Bulletin 2025-26, issued in October 2025, acknowledged ongoing confusion about MRM scope for community banks — and signaled that broader MRM guidance is under review.

None of that means examiners are giving institutions a pass while they wait for updated rules. It means the gap between where institutions are and where examiners expect them to be is already visible — and generating findings.

Here are the deficiencies showing up most frequently, and what fixing them actually looks like.


Why These Eight Areas, and Why Now

Regulatory examination for AI governance doesn’t follow a special checklist. Examiners apply existing frameworks — SR 11-7, OCC Bulletin 2011-12, FFIEC IT examination procedures — to AI systems and ask: does this institution have adequate controls over these models?

The eight deficiency areas below reflect where the existing framework meets AI-specific implementation gaps. Some of them are pure execution problems — institutions that never built the program. Others reflect genuine ambiguity where AI models have characteristics that don’t fit cleanly into legacy MRM structures. Both kinds produce findings.


Finding #1: Incomplete or Missing AI Model Inventory

What examiners see: An institution says it has “a handful of AI tools” in a model inventory. The examiner identifies fifteen additional AI systems — vendor-embedded ML in the AML platform, an LLM in the CRM, a fraud scoring model in the payment processor — that weren’t included.

Why it happens: Institutions built their original model inventories around internally developed, quantitative models — credit scoring, stress testing, CECL. AI is now embedded in almost every vendor-supplied software product, and most institutions haven’t updated their inventory scope to include it.

How to fix it: Run a cross-functional inventory sweep — IT, operations, compliance, and each business unit. Ask: what software uses automated decision logic, machine learning, or AI to produce outputs that affect customers, financial risk, or regulatory compliance? Include vendor-embedded AI. Classify each by risk tier (high, medium, low) based on use case and consequence of failure. Assign an owner. Document deployment date and last validation date. Review the inventory annually, and add new systems before go-live.

The standard for “complete” is not zero gaps — it’s a documented, repeatable process that would catch new AI systems before they operate outside the governance framework.


Finding #2: AI Tools Not Classified Under SR 11-7 Scope

What examiners see: An institution uses an ML-based fraud scoring engine to block transactions in real time. It’s not in the model inventory. It’s never been validated. When asked, the team says “it’s a vendor product, not a model.”

Why it happens: SR 11-7 defines a model as a quantitative method, system, or approach that processes input data into quantitative estimates used for business decisions. The “vendor product” exception doesn’t exist — the vendor built it, but the institution is using it to make decisions. OCC Bulletin 2025-26 was specifically issued to address scope ambiguity for community banks, and it doesn’t create a vendor exception.

How to fix it: Apply the SR 11-7 model definition to your vendor-supplied AI tools. If the system ingests data, runs quantitative logic, and produces an output used in a consequential decision — credit, fraud, AML, customer service routing — it’s a model. Classify it accordingly, document the basis for classification, and apply appropriate governance. Some tools (basic automation, rule-based logic with no statistical learning) may fall outside model scope — document that determination too.


Finding #3: No Independent Validation

What examiners see: An institution’s AI model was developed by the analytics team, and the “validation” was reviewed by the same analytics team. Or: the vendor provided its own validation documentation, which the institution accepted as-is.

Why it happens: Independent validation is expensive and logistically difficult for small institutions. And vendor validation documentation feels like it should be sufficient — after all, they built it.

How to fix it: SR 11-7 is explicit: validation must be performed by parties independent of model development. For vendor AI, vendor documentation supports validation but doesn’t substitute for it. Independent validation doesn’t require a separate department — it requires a reviewer without a conflict of interest, tasked with evaluating whether the model does what it claims, performs as expected on your data, and is used as intended. Options include: internal audit with quantitative capability, a second-line model risk function, or a third-party validation firm for high-risk models.

For low-risk models, a risk-based, lighter-touch validation is defensible — but document the rationale. Don’t skip validation entirely; scale the depth of validation to the risk tier.


Finding #4: Model Documentation Doesn’t Match Actual Deployment

What examiners see: The model documentation describes the model as originally designed. The production model has been retrained twice, had features removed, and is running on different infrastructure. Nobody updated the documentation.

Why it happens: Documentation is written once, filed away, and never revisited. Model changes that would trigger re-documentation in a mature MRM program slip through without triggering anything in less mature programs.

How to fix it: Establish a model change management process with explicit documentation triggers: model retraining, feature additions or removals, threshold changes, infrastructure migrations, and use case expansions. Any of these changes requires a documentation update; material changes also trigger re-validation. For LLMs and GenAI specifically — where the model may update through provider releases — document the version in use, the update cadence, and how you evaluate changes for risk impact.
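A change-management rule like the one described can be encoded so triggers are applied consistently rather than by memory. The trigger categories and the materiality split below are illustrative assumptions; each institution defines its own:

```python
# Hypothetical change taxonomy; neither the categories nor the
# materiality rule come from SR 11-7 itself.
DOC_TRIGGERS = {
    "retraining", "feature_added", "feature_removed",
    "threshold_change", "infra_migration", "use_case_expansion",
}
# Changes this sketch treats as material (documentation AND re-validation)
MATERIAL = {"retraining", "feature_added", "feature_removed", "use_case_expansion"}

def required_actions(change: str) -> list[str]:
    """Return the governance actions a given change type triggers."""
    actions = []
    if change in DOC_TRIGGERS:
        actions.append("update_documentation")
    if change in MATERIAL:
        actions.append("re_validate")
    return actions
```

Routing every production change through a check like this is what closes the gap between documentation and deployment.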


Finding #5: Third-Party AI Vendors Not in TPRM Program

What examiners see: An institution applies rigorous vendor due diligence to its core banking processor and payment rails — SOC 2 reports reviewed, contractual SLAs negotiated, annual reviews scheduled. The AI vendor providing a credit scoring model has no questionnaire, no contractual monitoring rights, and hasn’t been reviewed since onboarding.

Why it happens: AI vendors often entered through a side door — a product team purchased an AI tool without routing through vendor management. Or the vendor is a software company, and the institution didn’t recognize that using their AI product creates a model risk exposure requiring TPRM treatment.

How to fix it: AI vendors whose tools meet the SR 11-7 model definition should be in your TPRM program with AI-specific due diligence. That means: a questionnaire covering training data sourcing and bias controls, model explainability and documentation, drift monitoring capabilities, incident notification procedures, and data handling under applicable privacy laws. The contractual relationship should give you access to validation documentation and the right to audit. For the OCC’s purposes, using a vendor’s AI model in consequential decisions doesn’t transfer the governance obligation to the vendor — it stays with you.



Finding #6: Explainability Requirements Not Documented

What examiners see: A credit decisioning AI is producing adverse action notices that say “decision based on model score.” The institution can’t explain which input features drove the score for any given applicant, and the model documentation provides no mechanism for doing so.

Why it happens: Many ML models — gradient boosting, neural networks, ensemble models — produce accurate predictions through processes that aren’t inherently interpretable. The pressure to deploy quickly often outpaces the effort to build explainability into the governance framework.

How to fix it: For any AI model used in consumer-facing credit, lending, or account management decisions, document your explainability approach before deployment. For complex models, this means: identifying the top contributing features for any given decision, establishing a method for generating adverse action reason codes that satisfy ECOA and Reg B requirements, and testing whether those codes are actually meaningful to the applicant. SHAP values, LIME, and other interpretability techniques provide feature-level attribution — the documentation should explain which technique you use and how reason codes are derived. Examiners are not asking for mathematical derivations; they’re asking whether you have a documented, defensible method.
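To make feature-level attribution concrete, here is a minimal sketch for a linear scorecard, where the decomposition into per-feature contributions is exact; SHAP and LIME generalize the same idea to nonlinear models. All feature names, weights, and applicant values below are invented for illustration:

```python
import numpy as np

# Illustrative linear score: each feature's contribution relative to a
# baseline (population-average) applicant is weight * (value - baseline).
features  = ["utilization", "delinquencies", "age_of_file", "inquiries"]
weights   = np.array([-0.8, -1.5, 0.6, -0.4])   # made-up model coefficients
baseline  = np.array([0.30, 0.0, 7.0, 1.0])     # made-up population averages
applicant = np.array([0.85, 2.0, 3.0, 4.0])     # the denied applicant's inputs

contributions = weights * (applicant - baseline)
# The most negative contributions are the strongest drivers of the
# adverse decision, and so the natural source of reason codes.
order = np.argsort(contributions)
reason_codes = [features[i] for i in order[:2] if contributions[i] < 0]
```

The governance documentation would record this mapping from attributions to adverse action reason codes, plus evidence that the resulting codes are meaningful to applicants.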


Finding #7: No Ongoing Monitoring Program

What examiners see: An AI model was validated at deployment two years ago. It is running in production. Nothing has been done since. There are no monitoring dashboards, no defined performance thresholds, no re-validation trigger criteria.

Why it happens: Validation is a defined event with a clear output — a report. Ongoing monitoring is an open-ended commitment without a natural endpoint, and it typically gets deprioritized once the model is live.

How to fix it: Establish a monitoring framework before deployment, not after. The framework should include: the performance metrics being tracked (accuracy, approval rate, denial rate, output distribution), the frequency of review (monthly for high-risk, quarterly for medium-risk), and the thresholds that trigger escalation or re-validation. For AI models, add drift monitoring — Population Stability Index (PSI) or similar statistics that detect when the model’s input data distribution has shifted from its training data. The monitoring documentation should be reviewed at least annually and updated when thresholds are adjusted.
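A minimal PSI implementation looks like the following; the 0.10/0.25 thresholds are a common industry rule of thumb, not a regulatory requirement, and the data here is synthetic:

```python
import numpy as np

def psi(expected: np.ndarray, actual: np.ndarray, bins: int = 10) -> float:
    """Population Stability Index between a training-era sample (`expected`)
    and a production sample (`actual`).
    Common rule of thumb: < 0.10 stable, 0.10-0.25 monitor, > 0.25 drift."""
    # Decile edges from the training distribution; inner edges only, so
    # out-of-range production values fall into the first/last bucket.
    edges = np.quantile(expected, np.linspace(0, 1, bins + 1))[1:-1]
    e_pct = np.bincount(np.digitize(expected, edges), minlength=bins) / len(expected)
    a_pct = np.bincount(np.digitize(actual, edges), minlength=bins) / len(actual)
    e_pct = np.clip(e_pct, 1e-6, None)   # avoid log(0) on empty buckets
    a_pct = np.clip(a_pct, 1e-6, None)
    return float(np.sum((a_pct - e_pct) * np.log(a_pct / e_pct)))

rng = np.random.default_rng(0)
train = rng.normal(0.0, 1.0, 10_000)   # training-era input distribution
same  = rng.normal(0.0, 1.0, 10_000)   # production sample, no shift
shift = rng.normal(0.8, 1.0, 10_000)   # production sample, mean has drifted
```

In a monitoring framework, `psi(train, production_sample)` would be computed per input feature at the review cadence, with the escalation threshold documented alongside it.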

For more on what AI monitoring frameworks should include, see the SR 11-7 for AI Systems post.


Finding #8: AI Not Included in Incident Response Planning

What examiners see: An institution has a mature cybersecurity incident response plan and a well-tested business continuity program. Neither document addresses AI model failures: what constitutes an AI “incident,” who owns the response, what triggers escalation, or how affected parties are notified.

Why it happens: Incident response plans were written before AI was operational at scale. AI failures — a model producing systematically biased outputs, an LLM generating incorrect compliance guidance, a fraud model triggering a wave of false positives — don’t fit cleanly into existing incident response runbooks.

How to fix it: Add an AI-specific incident response annex. Define what constitutes an AI incident (performance degradation below threshold, evidence of systematic bias, data quality failure affecting model inputs, unauthorized model modification, vendor AI service outage). Assign clear ownership — typically the model owner with escalation to the model risk function. Establish escalation criteria that link AI incident severity to your enterprise incident response tiers. For LLMs used in customer-facing applications, define the process for identifying and notifying customers potentially affected by model errors.
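The escalation linkage described above can be captured as a simple mapping from AI incident type to enterprise severity tier. The types, tiers, and owners here are hypothetical; each institution's annex defines its own:

```python
# Hypothetical AI incident escalation table; SEV tiers and owners are
# illustrative placeholders for an institution's own incident framework.
AI_INCIDENT_ESCALATION = {
    "performance_below_threshold": {"tier": "SEV-3", "owner": "model owner"},
    "systematic_bias_detected":    {"tier": "SEV-1", "owner": "model risk + compliance"},
    "input_data_quality_failure":  {"tier": "SEV-2", "owner": "model owner"},
    "unauthorized_model_change":   {"tier": "SEV-1", "owner": "model risk + infosec"},
    "vendor_ai_service_outage":    {"tier": "SEV-2", "owner": "TPRM + model owner"},
}

def escalate(incident_type: str) -> str:
    """Map an AI incident type to its enterprise severity tier."""
    entry = AI_INCIDENT_ESCALATION.get(incident_type)
    if entry is None:
        # Unknown AI failure modes default to human triage, not silence
        return "SEV-2 (default: route to model risk for triage)"
    return entry["tier"]
```

The point of the table is that an AI incident lands in the same enterprise response machinery as any other incident, rather than in an ad hoc side channel.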

A 2025 survey found that only 54% of organizations maintain incident response playbooks for AI-specific risks — meaning nearly half of institutions would improvise when something goes wrong.


Prioritizing Your Fix List

If you’re looking at this list and trying to figure out where to start, prioritize by regulatory risk exposure:

| Priority | Deficiency | Risk Driver |
| --- | --- | --- |
| 1 | AI model inventory | Foundational — everything else flows from knowing what you have |
| 2 | SR 11-7 scope classification | High-risk AI running outside MRM is the core finding |
| 3 | Vendor AI in TPRM | Third-party AI specifically flagged in OCC Spring 2025 report |
| 4 | Independent validation | SR 11-7 core requirement; gaps here produce MRAs, not recommendations |
| 5 | Explainability documentation | Consumer protection and fair lending risk escalation |
| 6 | Ongoing monitoring | Required under SR 11-7; absence is a clean finding |
| 7 | Documentation accuracy | Exam credibility; inconsistency signals weak governance |
| 8 | Incident response coverage | Growing scrutiny; lower acute risk but significant gap signal |

The AI Governance Program Checklist covers the full program requirements behind each of these areas.


So What?

Exam preparation for AI governance isn’t about finding a defensible answer when a finding lands — it’s about building the program before examiners walk through the door. The eight deficiencies above are patterns, not outliers. They represent what happens when institutions deploy AI faster than they build governance.

The OCC and Federal Reserve aren’t expecting perfection. They’re expecting a risk-based program with documented methodology, clear ownership, and demonstrable evidence of execution. An incomplete model inventory with a documented remediation plan is better than a claim of completeness that falls apart under questioning.

The institutions that are cleanest in AI governance examinations share one thing: they treat it as a program, not a project. There’s no finish line — just a continuous loop of inventory, validation, monitoring, and improvement.

If your AI governance program was built in a hurry, now is the time to rebuild it properly. The AI Risk Assessment Template provides the structure to get there: model inventory, pre-deployment checklist, vendor questionnaire, and governance documentation — designed for teams that need to show progress without building a full model risk function from scratch.

Frequently Asked Questions

What are the most common AI governance exam findings regulators are citing?
Based on regulatory guidance and documented examiner patterns, the top findings are: (1) incomplete or missing AI model inventory — especially excluding vendor-supplied AI; (2) AI tools not classified as 'models' under SR 11-7 scope, leaving them without MRM oversight; (3) no independent validation of AI models, or validation performed by the same team that built the model; (4) model documentation that doesn't match actual deployment; (5) third-party AI vendors with no due diligence, questionnaire, or contractual monitoring; (6) no ongoing monitoring for model drift or performance degradation. Each of these maps to documented OCC, Fed, and FDIC supervisory expectations.
Do OCC and Federal Reserve examiners actually look at AI governance specifically?
Yes. The OCC's Semiannual Risk Perspective (both Spring and Fall 2025) specifically flagged AI — especially third-party AI — as an elevated risk area. OCC Bulletin 2025-26 (October 2025) addressed model risk management clarifications including AI scope questions. The May 2025 GAO report (GAO-25-107197) confirmed that OCC, Fed, and FDIC examiners apply SR 11-7 to AI models, but do so with inconsistent depth across institutions. 'Inconsistent' doesn't mean 'not at all' — it means your examiner team determines how deep they go.
What does 'independent validation' mean for an AI model, and who can do it?
SR 11-7 requires that model validation be performed by parties independent of model development — meaning someone not involved in building or selecting the model. For small institutions, this creates a challenge: 'independent' doesn't require a separate department, but it does require a documented conflict-of-interest firewall. Options include: a dedicated validation function within the second line of defense, a third-party model validation firm, or an internal audit team with quantitative capability. What doesn't satisfy the requirement: the vendor who built the model validating their own work.
Is vendor-supplied AI subject to the same exam scrutiny as internally built models?
Yes — and this is where many institutions are caught flat-footed. SR 11-7 applies to any model used in a decision-making context, regardless of who built it. The OCC's Spring 2025 Semiannual Risk Perspective specifically called out AI built on third-party platforms or using external data as carrying elevated risk. Examiners ask: Is the vendor AI in your inventory? What due diligence did you perform? What contractual monitoring rights do you have? 'The vendor takes care of it' is not an exam-ready answer.
How quickly can I fix an incomplete AI model inventory if I just got an MRA?
A basic inventory can be rebuilt in two to four weeks with the right structure. Start with a cross-functional inventory sweep: ask IT, operations, compliance, and each business line to list every AI tool or automated decision system they use, including software that embeds AI from vendors. Classify each by use case and risk tier. Document owner, deployment date, and validation status. The biggest delay is usually agreement on scope — deciding what counts as 'AI' — not the data gathering itself. The AI Risk Assessment Template includes a pre-built inventory structure that can accelerate this.
What's the difference between an AI governance exam finding and an MRA?
An exam finding is any gap or deficiency identified during the exam, ranging from informal recommendations to Matters Requiring Attention (MRAs) and Matters Requiring Immediate Attention (MRIAs). AI governance gaps that affect safety and soundness, or that create fair lending risk, are more likely to be elevated to MRA level. Missing documentation is typically a recommendation; a model with no validation producing credit decisions for thousands of customers is more likely an MRA. Severity depends on materiality and examiner judgment.
Rebecca Leung

Rebecca Leung has 8+ years of risk and compliance experience across first and second line roles at commercial banks, asset managers, and fintechs. Former management consultant advising financial institutions on risk strategy. Founder of RiskTemplates.

Related Framework

AI Risk Assessment Template & Guide

Comprehensive AI model governance and risk assessment templates for financial services teams.
