AI Model Documentation: What Examiners Actually Want to See in 2026
TL;DR:
- SR 11-7 and OCC Bulletin 2011-12 require documentation “detailed enough that parties unfamiliar with the model can understand its operation” — and examiners are now applying that standard to AI/ML models.
- Traditional model documentation templates miss AI-specific elements: training data provenance, hyperparameter decisions, explainability approaches, drift thresholds, and known failure modes.
- Build an AI model card for every model in your inventory — we break down each section below with what examiners expect.
Your Model Documentation Is Probably Getting You MRAs
Here’s what happens in most AI-related examinations right now: the examiner asks for the model documentation. Your team hands over the standard model development document — the one that worked fine for your logistic regression models. The examiner flips through it, asks where the training data provenance is documented, how you validated for bias, what your drift thresholds are, and what happens when the model hallucinates. Blank stares.
According to McKinsey’s 2025 State of AI survey, 51% of organizations using AI reported at least one negative AI-related consequence in the past year — with inaccuracy as the leading issue. Yet only 28% have CEO-level governance oversight, and just 17% report board-level AI governance. The documentation gap is a symptom of a broader governance problem, and examiners know it.
The regulatory expectations haven’t changed in principle — SR 11-7 (Federal Reserve, April 2011) and OCC Bulletin 2011-12 still form the backbone. But the application to AI/ML models requires fundamentally different documentation practices. Here’s what examiners actually want to see.
What SR 11-7 and OCC 2011-12 Actually Require
Both guidance documents establish the same core principle: model documentation must be “sufficiently detailed that parties unfamiliar with a model could understand how the model operates, its limitations, and its key assumptions.” That standard applies to every model — including the ones powered by neural networks and large language models.
The guidance organizes documentation requirements around three pillars:
| Pillar | SR 11-7 Expectation | AI/ML Translation |
|---|---|---|
| Model Development | Document theoretical basis, methodology choices, data sources, variable selection, testing results | Training data provenance, architecture decisions, hyperparameter choices, feature engineering, benchmark evaluations |
| Model Validation | Independent review of soundness, including developmental evidence, process verification, and outcomes analysis | Adversarial testing results, bias evaluations, robustness checks, validation techniques specific to ML/LLM models |
| Ongoing Monitoring | Performance tracking, comparison of outcomes to expectations, stability assessment | Drift detection thresholds, performance decay metrics, retraining triggers, output consistency monitoring |
The problem isn’t that the guidance is silent on what to document — it’s that most MRM teams are using documentation templates built for spreadsheet-based models in 2012 and haven’t updated them for models where “variable selection” means “we fine-tuned a transformer on 500GB of text data.”
The Model Card: Your AI Documentation Foundation
The concept of a “model card” was introduced by Mitchell et al. in 2019 at Google and has since become an industry standard adopted by organizations including Hugging Face and Google DeepMind. Think of it as a nutrition label for machine learning models — a standardized format that makes model details scannable and comparable.
For regulated financial services firms, the model card isn’t a replacement for your full model development document. It’s the executive summary that sits on top and gives examiners — and your own risk committee — a rapid understanding of what the model does, how it was built, and where the risks live.
A well-constructed model card for a regulated AI system should include:
- Model Details: Name, version, type (classification, regression, generative), owner, intended purpose
- Intended Use: Approved use cases, out-of-scope uses, known limitations
- Training Data: Source, size, date range, representativeness, any known biases
- Performance Metrics: Accuracy, precision, recall, F1, AUC — appropriate to the model type
- Fairness Evaluations: Disparate impact testing results across protected classes
- Ethical Considerations: Known failure modes, potential for harm, human oversight requirements
- Caveats and Recommendations: What users should and shouldn’t rely on
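To make the model card machine-readable and diffable across versions, some teams keep it as structured data rather than free prose. Here is a minimal sketch of that idea as a Python dataclass; every field name and example value below is hypothetical, not a prescribed schema:

```python
from dataclasses import dataclass, asdict
import json

@dataclass
class ModelCard:
    """Minimal, illustrative model card mirroring the sections above."""
    name: str
    version: str
    model_type: str            # classification, regression, generative
    owner: str                 # a named person, not a team
    intended_use: list         # approved use cases
    out_of_scope: list         # explicitly disallowed uses
    training_data: dict        # source, size, date range, known biases
    performance: dict          # metric name -> value
    fairness: dict             # subgroup -> evaluation result
    limitations: list          # known failure modes and caveats

# Hypothetical example entry
card = ModelCard(
    name="credit-risk-pd",
    version="2.3.1",
    model_type="classification",
    owner="Jane Doe, Lead Data Scientist",
    intended_use=["retail credit underwriting"],
    out_of_scope=["commercial lending", "pricing decisions"],
    training_data={"source": "internal loan origination system",
                   "rows": 1_200_000, "window": "2018-01 to 2024-12"},
    performance={"auc_roc": 0.81, "ks": 0.42},
    fairness={"age_62_plus_disparate_impact": 0.93},
    limitations=["performance degrades on thin-file applicants"],
)
print(json.dumps(asdict(card), indent=2))
```

Storing the card this way lets you validate completeness automatically and track changes in version control alongside the model code.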
Section-by-Section: What Goes in AI Model Documentation
Below is the expanded documentation template that maps SR 11-7’s requirements to AI/ML-specific content. This is what examiners are looking for in 2026.
Section 1: Model Overview and Business Context
What examiners expect: A clear statement of what the model does, why it was built, and what business decisions depend on its output.
What to document:
- Model name, version number, and unique identifier (ties to your model inventory)
- Business problem being solved and the decision the model informs
- Model type and architecture (logistic regression, random forest, neural network, LLM, etc.)
- Model tier/risk classification (per your risk tiering methodology)
- Model owner (name and role — not a team, a person)
- Date deployed, last validated, next validation due
Section 2: Training Data Provenance
What examiners expect: Full documentation of where the training data came from, how it was selected, and whether it’s representative of the population the model serves.
This is where most AI documentation falls short. Traditional model docs describe data sources in a paragraph. For AI/ML models, examiners want:
| Element | What to Document |
|---|---|
| Data sources | Every source, including vendor-provided data, internal databases, public datasets, and synthetic data |
| Date ranges | Training data time window and whether it covers relevant economic cycles |
| Volume | Number of records, features, and target distribution |
| Representativeness | Demographic breakdown compared to the target population |
| Labeling methodology | Who labeled the data, what instructions they followed, inter-annotator agreement rates |
| Data cleaning | Outlier detection methods, missing value treatment, transformation logic |
| Known gaps | What the data doesn’t cover — and what that means for model performance |
For LLMs and generative AI models, also document: the source corpus (or vendor’s disclosed training approach), any fine-tuning data, retrieval-augmented generation (RAG) knowledge bases, and the data governance controls applied to each.
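One practical way to keep provenance documentation from going stale is a completeness check that flags missing elements from the table above before a model can advance through the lifecycle. A minimal sketch, assuming a simple dict-based provenance record (the field names mirror the table; the example record is hypothetical):

```python
# Required elements, taken from the provenance table above
REQUIRED_PROVENANCE_FIELDS = {
    "data_sources", "date_range", "volume", "representativeness",
    "labeling_methodology", "data_cleaning", "known_gaps",
}

def provenance_gaps(record: dict) -> set:
    """Return which required provenance elements are missing or empty."""
    return {f for f in REQUIRED_PROVENANCE_FIELDS if not record.get(f)}

# Hypothetical, deliberately incomplete record
record = {
    "data_sources": ["internal loan origination system", "vendor bureau feed"],
    "date_range": "2018-01 to 2024-12",
    "volume": {"rows": 1_200_000, "features": 240},
}
print(sorted(provenance_gaps(record)))
```

A check like this can gate deployment pipelines: if `provenance_gaps` is non-empty, the documentation package is not ready for validation.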
Section 3: Architecture and Design Decisions
What examiners expect: Not a textbook explanation of how neural networks work — a documented rationale for why you chose this architecture and the tradeoffs you considered.
What to document:
- Model architecture choice and alternatives considered (with rationale for selection)
- Key hyperparameters and how they were tuned (grid search, Bayesian optimization, manual selection)
- Feature engineering decisions — which features were created, transformed, or excluded and why
- For LLMs: prompt engineering approach, system prompts, temperature and parameter settings, guardrails configuration
- For ensemble models: component model descriptions and combination methodology
- Computational resources required for training and inference
- Third-party components used (pre-trained models, open-source libraries, vendor APIs) — with version numbers
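Hyperparameter tuning decisions are easiest to document when the tuning run itself emits the record. A sketch of that pattern using scikit-learn's grid search on synthetic data; the model, search space, and record fields are illustrative assumptions, not a prescribed format:

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV

# Synthetic stand-in for real training data
X, y = make_classification(n_samples=500, n_features=10, random_state=0)

# Document the search space itself, not just the winning configuration
param_grid = {"C": [0.01, 0.1, 1.0, 10.0], "penalty": ["l2"]}
search = GridSearchCV(LogisticRegression(max_iter=1000), param_grid,
                      cv=5, scoring="roc_auc")
search.fit(X, y)

tuning_record = {
    "method": "grid search, 5-fold CV, AUC-ROC objective",
    "search_space": param_grid,
    "selected": search.best_params_,
    "cv_score": round(search.best_score_, 3),
}
print(tuning_record)
```

Capturing the record at tuning time means the rationale examiners ask about (what you searched, how you scored it, what won) never has to be reconstructed after the fact.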
Section 4: Performance Metrics and Validation Results
What examiners expect: Quantitative evidence that the model works, documented in a way that enables effective challenge.
What to document:
| Model Type | Primary Metrics | Additional Requirements |
|---|---|---|
| Classification | Accuracy, precision, recall, F1, AUC-ROC | Confusion matrix, performance by segment |
| Regression | RMSE, MAE, R², MAPE | Residual analysis, prediction intervals |
| Generative/LLM | Faithfulness score, hallucination rate, toxicity rate, task-specific accuracy | Human evaluation results, benchmark scores (e.g., MMLU, HellaSwag) |
For every metric, document:
- The validation dataset (separate from training data)
- How test/validation splits were created
- Performance across demographic subgroups (this isn’t optional — fair lending requirements demand it)
- Comparison to the baseline/champion model and to simpler alternative approaches
- Known conditions where performance degrades
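Subgroup performance reporting is straightforward to automate. A minimal sketch on synthetic data, computing the classification metrics from the table for each demographic segment; the segment labels and score-generation logic are hypothetical placeholders for your real holdout data:

```python
import numpy as np
from sklearn.metrics import precision_score, recall_score, roc_auc_score

rng = np.random.default_rng(0)
n = 2000
segment = rng.choice(["A", "B"], size=n)   # hypothetical demographic flag
y_true = rng.integers(0, 2, size=n)
# Synthetic scores correlated with the label, clipped to [0, 1]
y_score = np.clip(y_true * 0.6 + rng.normal(0.2, 0.25, size=n), 0, 1)
y_pred = (y_score >= 0.5).astype(int)

# Report the same metrics separately for each segment
for seg in ["A", "B"]:
    m = segment == seg
    print(seg, {
        "precision": round(precision_score(y_true[m], y_pred[m]), 3),
        "recall": round(recall_score(y_true[m], y_pred[m]), 3),
        "auc": round(roc_auc_score(y_true[m], y_score[m]), 3),
    })
```

The point is that the per-segment table lands in the documentation package automatically with every validation run, rather than being assembled by hand when an examiner asks.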
Section 5: Explainability and Interpretability
What examiners expect: How you can explain the model’s decisions — especially when those decisions affect consumers.
SR 11-7 requires “effective challenge,” which means someone independent must be able to question the model’s logic. For opaque AI models, this means documenting your explainability approach:
- Global explanations: What drives the model overall? (Feature importance, SHAP summary plots, attention patterns)
- Local explanations: How do you explain individual decisions? (LIME, individual SHAP values, counterfactual explanations)
- Limitations of the explanation method: SHAP and LIME are approximations — document what they can’t capture
- Consumer-facing explanations: If the model drives adverse action notices (credit denial, insurance pricing), document how the explanation is generated and its fidelity to the actual model logic
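As a dependency-light illustration of a global explanation, here is a sketch using scikit-learn's permutation importance (SHAP and LIME, named above, are the more common production choices; this simply shows the documented artifact). The model and data are synthetic stand-ins:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.inspection import permutation_importance
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=1000, n_features=6, n_informative=3,
                           random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)
model = RandomForestClassifier(random_state=0).fit(X_tr, y_tr)

# Global explanation: how much held-out AUC drops when each feature is shuffled
result = permutation_importance(model, X_te, y_te, scoring="roc_auc",
                                n_repeats=10, random_state=0)
for i in result.importances_mean.argsort()[::-1]:
    print(f"feature_{i}: {result.importances_mean[i]:+.3f}")
```

Whatever method you use, the documentation should capture both the ranked output and the method's limitations, as the section notes: permutation importance, like SHAP and LIME, is an approximation of the model's behavior, not the behavior itself.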
Section 6: Known Limitations and Failure Modes
What examiners expect: Honest documentation of where the model breaks down.
This is counterintuitive for some teams — why would you document your model’s weaknesses? Because examiners already assume every model has them. What concerns them is when you haven’t identified them.
What to document:
- Known edge cases where performance degrades
- Input conditions that produce unreliable outputs
- Population segments where accuracy drops below acceptable thresholds
- For LLMs: documented hallucination patterns, prompt injection vulnerabilities, topics where the model lacks expertise
- Compensating controls for each limitation (human review, fallback logic, output filters)
- Scenarios that should trigger a model kill switch or shutdown
Section 7: Ongoing Monitoring Plan
What examiners expect: A defined plan — not a vague commitment to “keep an eye on it.”
What to document:
| Monitoring Element | Specification |
|---|---|
| Performance metrics tracked | List each metric and its threshold |
| Data drift detection | Method (PSI, KL divergence, chi-squared), threshold, check frequency |
| Concept drift detection | How you detect when the relationship between inputs and outputs changes |
| Alert and escalation paths | Who gets notified, at what thresholds, and what happens next |
| Retraining triggers | Specific conditions that require model retraining or re-validation |
| Monitoring cadence | Daily, weekly, monthly — tied to model tier |
| Reporting | Who receives monitoring reports and how often |
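Of the drift methods in the table, PSI is the most common in banking and is simple enough to implement and document directly. A minimal sketch in pure NumPy, using the widely cited (but not universal) rule of thumb that PSI below 0.1 is stable and above 0.25 signals significant drift; your documented thresholds should be set per model tier:

```python
import numpy as np

def psi(expected: np.ndarray, actual: np.ndarray, bins: int = 10) -> float:
    """Population Stability Index between a baseline and a live sample."""
    # Decile edges from the baseline; open-ended outer bins catch outliers
    edges = np.quantile(expected, np.linspace(0, 1, bins + 1))
    edges[0], edges[-1] = -np.inf, np.inf
    e = np.histogram(expected, edges)[0] / len(expected)
    a = np.histogram(actual, edges)[0] / len(actual)
    # Floor proportions to avoid division by zero in sparse bins
    e, a = np.clip(e, 1e-6, None), np.clip(a, 1e-6, None)
    return float(np.sum((a - e) * np.log(a / e)))

rng = np.random.default_rng(0)
baseline = rng.normal(0, 1, 50_000)
print(round(psi(baseline, rng.normal(0.0, 1, 50_000)), 4))  # same distribution
print(round(psi(baseline, rng.normal(0.5, 1, 50_000)), 4))  # shifted mean
```

Documenting the monitoring plan then means recording, per feature and per model output: the baseline window, the bin construction, the PSI threshold, the check frequency, and who gets paged when the threshold trips.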
Section 8: Change Log and Version History
What examiners expect: A complete record of every change to the model, who approved it, and why.
- Date, description, and rationale for every model update
- Who approved the change (name and role)
- Whether the change triggered re-validation
- Rollback plan if the change degrades performance
- Version number tied to your model inventory
The EU AI Act Raises the Bar Even Higher
If you operate in or serve EU markets, Article 11 of the EU AI Act and its Annex IV establish specific technical documentation requirements for high-risk AI systems that go beyond US regulatory expectations. High-risk AI systems must have technical documentation prepared before being placed on the market.
Annex IV requires documentation of: the general system description and intended purpose, design specifications including key algorithm choices and trade-offs, system architecture and computational resources, training data descriptions including provenance and labeling procedures, validation and testing procedures with metrics and test logs, and human oversight measures.
For firms operating across jurisdictions, the practical approach is to build documentation to the EU standard — it will satisfy both EU requirements and US examiner expectations under SR 11-7.
Colorado SB 205: Documentation for Deployers
Colorado’s AI Act (SB 24-205), effective February 1, 2026, adds state-level documentation requirements. Both developers and deployers of high-risk AI systems must maintain documentation of their risk management policies, including impact assessments and records of consumer notifications. Deployers must also document any known incidents of algorithmic discrimination and the corrective measures taken.
If you haven’t read our full Colorado AI Act compliance guide, start there for the complete picture.
Making It Practical: A 30/60/90-Day Documentation Roadmap
You can’t fix every documentation gap at once. Here’s how to prioritize:
Days 1–30: Triage and Template
- Deliverable: Updated AI model documentation template approved by MRM leadership
- Owner: Head of Model Risk Management or Chief Risk Officer
- Actions: Inventory your current AI/ML models (or update your model inventory). Assess each model’s existing documentation against the sections above. Identify the highest-risk models with the biggest documentation gaps. Build the template using the sections in this article.
Days 31–60: High-Risk Model Documentation
- Deliverable: Complete documentation packages for all Tier 1 (highest risk) AI models
- Owner: Individual model owners (data scientists/ML engineers), validated by MRM team
- Actions: Populate the new template for each Tier 1 model. Conduct gap-fill research where training data provenance or design rationale was never formally documented. Add explainability documentation (SHAP, LIME results) where missing. Document known limitations honestly.
Days 61–90: Validation, Process Integration, and Tier 2 Models
- Deliverable: Independent review of Tier 1 documentation; completed documentation for Tier 2 models
- Owner: Model validation team (or qualified third party)
- Actions: Independent reviewers validate that documentation supports effective challenge. Integrate the new template into the model development lifecycle — every new model ships with a completed model card. Begin Tier 2 model documentation. Establish the quarterly documentation review cadence.
So What?
Documentation isn’t paperwork — it’s your evidence of sound risk management. When an examiner asks “how does this AI model work?” and you hand them a comprehensive, honest model card with training data provenance, performance metrics by demographic group, documented limitations, and a clear monitoring plan, you’ve just demonstrated exactly what SR 11-7 demands.
When you hand them a two-page Word doc that says “this model uses machine learning to predict credit risk” and nothing else, you’ve just earned an MRA (a Matter Requiring Attention). Or worse.
The firms that treat AI documentation as an afterthought will spend their 2026 exam cycles writing remediation plans. The ones that build it into the model development process — documentation as a first-class artifact, not a compliance checkbox — will spend those cycles deploying more models.
Need a structured framework to assess and document your AI model risks? The AI Risk Assessment Template & Guide includes risk scoring matrices, documentation templates, and control frameworks designed for AI/ML models in financial services.
FAQ
What documentation do examiners look for first when reviewing AI models?
Examiners typically start with the model inventory to understand scope, then request the full model development document for high-risk models. They focus on training data provenance (where did the data come from and is it representative?), validation results (especially performance across demographic segments), and the ongoing monitoring plan (what triggers re-validation?). If any of these are missing or thin, it’s an immediate red flag. Documentation of known limitations is also high on their list — they want to see that you’ve identified the model’s weaknesses, not just its strengths.
How is AI model documentation different from traditional model documentation?
Traditional model documentation (designed for linear regressions, scorecards, and financial models) focuses on variable selection, coefficient stability, and backtesting. AI model documentation must additionally cover training data provenance and labeling methodology, hyperparameter tuning decisions, explainability approaches for opaque models (SHAP, LIME), known failure modes like hallucinations or adversarial vulnerabilities, and drift detection thresholds. The core regulatory principle is the same — documentation sufficient for an unfamiliar party to understand the model — but the specifics are fundamentally different for AI/ML systems.
Do I need a separate model card for every AI model in production?
Yes. Every AI model in your model inventory should have its own model card — a standardized summary document covering its purpose, training data, performance, fairness evaluations, and limitations. The model card serves as the “nutrition label” that gives examiners and risk committees a quick understanding of the model. Full development documentation sits behind it for deeper review. For vendor-provided AI models, you should still create a model card documenting what the vendor disclosed, your independent validation results, and any gaps in vendor transparency.
Rebecca Leung
Rebecca Leung has 8+ years of risk and compliance experience across first and second line roles at commercial banks, asset managers, and fintechs. Former management consultant advising financial institutions on risk strategy. Founder of RiskTemplates.