
AI Model Documentation: What Examiners Actually Want to See in 2026

TL;DR:

  • SR 11-7 and OCC Bulletin 2011-12 require documentation “detailed enough that parties unfamiliar with the model can understand its operation” — and examiners are now applying that standard to AI/ML models.
  • Traditional model documentation templates miss AI-specific elements: training data provenance, hyperparameter decisions, explainability approaches, drift thresholds, and known failure modes.
  • Build an AI model card for every model in your inventory — we break down each section below with what examiners expect.

Your Model Documentation Is Probably Getting You MRAs (Matters Requiring Attention)

Here’s what happens in most AI-related examinations right now: the examiner asks for the model documentation. Your team hands over the standard model development document — the one that worked fine for your logistic regression models. The examiner flips through it, asks where the training data provenance is documented, how you validated for bias, what your drift thresholds are, and what happens when the model hallucinates. Blank stares.

According to McKinsey’s 2025 State of AI survey, 51% of organizations using AI reported at least one negative AI-related consequence in the past year — with inaccuracy as the leading issue. Yet only 28% have CEO-level governance oversight, and just 17% report board-level AI governance. The documentation gap is a symptom of a broader governance problem, and examiners know it.

The regulatory expectations haven’t changed in principle — SR 11-7 (Federal Reserve, April 2011) and OCC Bulletin 2011-12 still form the backbone. But the application to AI/ML models requires fundamentally different documentation practices. Here’s what examiners actually want to see.

What SR 11-7 and OCC 2011-12 Actually Require

Both guidance documents establish the same core principle: model documentation must be “sufficiently detailed that parties unfamiliar with a model could understand how the model operates, its limitations, and its key assumptions.” That standard applies to every model — including the ones powered by neural networks and large language models.

The guidance organizes documentation requirements around three pillars:

  • Model Development: SR 11-7 expects documentation of the theoretical basis, methodology choices, data sources, variable selection, and testing results. AI/ML translation: training data provenance, architecture decisions, hyperparameter choices, feature engineering, benchmark evaluations.
  • Model Validation: SR 11-7 expects independent review of soundness, including developmental evidence, process verification, and outcomes analysis. AI/ML translation: adversarial testing results, bias evaluations, robustness checks, validation techniques specific to ML/LLM models.
  • Ongoing Monitoring: SR 11-7 expects performance tracking, comparison of outcomes to expectations, and stability assessment. AI/ML translation: drift detection thresholds, performance decay metrics, retraining triggers, output consistency monitoring.

The problem isn’t that the guidance is silent on what to document — it’s that most model risk management (MRM) teams are using documentation templates built for spreadsheet-based models in 2012 and haven’t updated them for models where “variable selection” means “we fine-tuned a transformer on 500GB of text data.”

The Model Card: Your AI Documentation Foundation

The concept of a “model card” was introduced by Mitchell et al. in 2019 at Google and has since become an industry standard adopted by organizations including Hugging Face and Google DeepMind. Think of it as a nutrition label for machine learning models — a standardized format that makes model details scannable and comparable.

For regulated financial services firms, the model card isn’t a replacement for your full model development document. It’s the executive summary that sits on top and gives examiners — and your own risk committee — a rapid understanding of what the model does, how it was built, and where the risks live.

A well-constructed model card for a regulated AI system should include:

  • Model Details: Name, version, type (classification, regression, generative), owner, intended purpose
  • Intended Use: Approved use cases, out-of-scope uses, known limitations
  • Training Data: Source, size, date range, representativeness, any known biases
  • Performance Metrics: Accuracy, precision, recall, F1, AUC — appropriate to the model type
  • Fairness Evaluations: Disparate impact testing results across protected classes
  • Ethical Considerations: Known failure modes, potential for harm, human oversight requirements
  • Caveats and Recommendations: What users should and shouldn’t rely on
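These fields translate naturally into a structured record that can be stored alongside the model inventory. A minimal sketch in Python; the class and field names below are illustrative assumptions, not a prescribed schema:

```python
from dataclasses import dataclass, field

@dataclass
class ModelCard:
    # Model Details
    name: str
    version: str
    model_type: str            # "classification", "regression", "generative"
    owner: str                 # a named person, not a team
    intended_purpose: str
    # Intended Use
    approved_uses: list = field(default_factory=list)
    out_of_scope_uses: list = field(default_factory=list)
    # Training Data
    data_sources: list = field(default_factory=list)
    data_date_range: str = ""
    known_biases: list = field(default_factory=list)
    # Performance, fairness, ethics, caveats
    performance_metrics: dict = field(default_factory=dict)
    fairness_results: dict = field(default_factory=dict)
    known_failure_modes: list = field(default_factory=list)
    caveats: list = field(default_factory=list)

# Hypothetical example entry
card = ModelCard(
    name="credit-risk-scorer",
    version="2.3.1",
    model_type="classification",
    owner="Jane Doe, Lead Data Scientist",
    intended_purpose="Score consumer credit applications for underwriting review",
)
```

Storing cards as structured data rather than free-form prose makes them queryable across the inventory and easy to diff between model versions.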

Section-by-Section: What Goes in AI Model Documentation

Below is the expanded documentation template that maps SR 11-7’s requirements to AI/ML-specific content. This is what examiners are looking for in 2026.

Section 1: Model Overview and Business Context

What examiners expect: A clear statement of what the model does, why it was built, and what business decisions depend on its output.

What to document:

  • Model name, version number, and unique identifier (ties to your model inventory)
  • Business problem being solved and the decision the model informs
  • Model type and architecture (logistic regression, random forest, neural network, LLM, etc.)
  • Model tier/risk classification (tied to your risk tiering methodology)
  • Model owner (name and role — not a team, a person)
  • Date deployed, last validated, next validation due

Section 2: Training Data Provenance

What examiners expect: Full documentation of where the training data came from, how it was selected, and whether it’s representative of the population the model serves.

This is where most AI documentation falls short. Traditional model docs describe data sources in a paragraph. For AI/ML models, examiners want:

  • Data sources: Every source, including vendor-provided data, internal databases, public datasets, and synthetic data
  • Date ranges: Training data time window and whether it covers relevant economic cycles
  • Volume: Number of records, features, and target distribution
  • Representativeness: Demographic breakdown compared to the target population
  • Labeling methodology: Who labeled the data, what instructions they followed, inter-annotator agreement rates
  • Data cleaning: Outlier detection methods, missing value treatment, transformation logic
  • Known gaps: What the data doesn’t cover — and what that means for model performance

For LLMs and generative AI models, also document: the source corpus (or vendor’s disclosed training approach), any fine-tuning data, retrieval-augmented generation (RAG) knowledge bases, and the data governance controls applied to each.

Section 3: Architecture and Design Decisions

What examiners expect: Not a textbook explanation of how neural networks work — a documented rationale for why you chose this architecture and the tradeoffs you considered.

What to document:

  • Model architecture choice and alternatives considered (with rationale for selection)
  • Key hyperparameters and how they were tuned (grid search, Bayesian optimization, manual selection)
  • Feature engineering decisions — which features were created, transformed, or excluded and why
  • For LLMs: prompt engineering approach, system prompts, temperature and parameter settings, guardrails configuration
  • For ensemble models: component model descriptions and combination methodology
  • Computational resources required for training and inference
  • Third-party components used (pre-trained models, open-source libraries, vendor APIs) — with version numbers

Section 4: Performance Metrics and Validation Results

What examiners expect: Quantitative evidence that the model works, documented in a way that enables effective challenge.

What to document:

  • Classification: accuracy, precision, recall, F1, AUC-ROC; plus a confusion matrix and performance by segment
  • Regression: RMSE, MAE, R², MAPE; plus residual analysis and prediction intervals
  • Generative/LLM: faithfulness score, hallucination rate, toxicity rate, task-specific accuracy; plus human evaluation results and benchmark scores (e.g., MMLU, HellaSwag)

For every metric, document:

  • The validation dataset (separate from training data)
  • How test/validation splits were created
  • Performance across demographic subgroups (this isn’t optional — fair lending requirements demand it)
  • Comparison to the baseline/champion model and to simpler alternative approaches
  • Known conditions where performance degrades
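For classification models, the headline metrics can be computed directly from confusion-matrix counts, which makes the documented numbers reproducible. A minimal, dependency-free sketch; the function name and toy labels are illustrative:

```python
def classification_metrics(y_true, y_pred):
    """Precision, recall, and F1 for a binary classifier,
    computed from raw confusion-matrix counts."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return {"precision": precision, "recall": recall, "f1": f1}

# Toy validation labels vs. predictions
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 1, 0, 0, 1, 1, 0]
print(classification_metrics(y_true, y_pred))
# → {'precision': 0.75, 'recall': 0.75, 'f1': 0.75}
```

Run the same computation separately per demographic segment and record each result; segment-level gaps are exactly what fair lending reviewers look for.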

Section 5: Explainability and Interpretability

What examiners expect: How you can explain the model’s decisions — especially when those decisions affect consumers.

SR 11-7 requires “effective challenge,” which means someone independent must be able to question the model’s logic. For opaque AI models, this means documenting your explainability approach:

  • Global explanations: What drives the model overall? (Feature importance, SHAP summary plots, attention patterns)
  • Local explanations: How do you explain individual decisions? (LIME, individual SHAP values, counterfactual explanations)
  • Limitations of the explanation method: SHAP and LIME are approximations — document what they can’t capture
  • Consumer-facing explanations: If the model drives adverse action notices (credit denial, insurance pricing), document how the explanation is generated and its fidelity to the actual model logic
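Where dedicated SHAP or LIME tooling isn't available, permutation importance is one simple, model-agnostic way to produce (and document) a global explanation: shuffle one feature at a time and measure how much the score drops. A sketch under assumed toy inputs; like SHAP and LIME it is an approximation, and it misses feature interactions:

```python
import random

def permutation_importance(model, X, y, metric, n_repeats=5, seed=0):
    """Global explanation: average score drop when one feature's
    values are shuffled, breaking its link to the target."""
    rng = random.Random(seed)
    base = metric(y, [model(row) for row in X])
    importances = []
    for j in range(len(X[0])):
        drops = []
        for _ in range(n_repeats):
            col = [row[j] for row in X]
            rng.shuffle(col)
            X_perm = [row[:j] + [col[i]] + row[j + 1:]
                      for i, row in enumerate(X)]
            drops.append(base - metric(y, [model(row) for row in X_perm]))
        importances.append(sum(drops) / n_repeats)
    return importances

# Toy model that only looks at feature 0, so feature 1 should score 0.
model = lambda row: 1 if row[0] > 0.5 else 0
X = [[0.9, 0.1], [0.2, 0.8], [0.7, 0.3], [0.1, 0.9]]
y = [1, 0, 1, 0]
accuracy = lambda yt, yp: sum(t == p for t, p in zip(yt, yp)) / len(yt)
print(permutation_importance(model, X, y, accuracy))
```

Whichever method you use, document the importance rankings themselves plus the method's known blind spots, so reviewers can challenge the explanation, not just the model.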

Section 6: Known Limitations and Failure Modes

What examiners expect: Honest documentation of where the model breaks down.

This is counterintuitive for some teams — why would you document your model’s weaknesses? Because examiners already assume every model has them. What concerns them is when you haven’t identified them.

What to document:

  • Known edge cases where performance degrades
  • Input conditions that produce unreliable outputs
  • Population segments where accuracy drops below acceptable thresholds
  • For LLMs: documented hallucination patterns, prompt injection vulnerabilities, topics where the model lacks expertise
  • Compensating controls for each limitation (human review, fallback logic, output filters)
  • Scenarios that should trigger a model kill switch or shutdown

Section 7: Ongoing Monitoring Plan

What examiners expect: A defined plan — not a vague commitment to “keep an eye on it.”

What to document:

  • Performance metrics tracked: list each metric and its threshold
  • Data drift detection: method (PSI, KL divergence, chi-squared), threshold, and check frequency
  • Concept drift detection: how you detect when the relationship between inputs and outputs changes
  • Alert and escalation paths: who gets notified, at what thresholds, and what happens next
  • Retraining triggers: specific conditions that require model retraining or re-validation
  • Monitoring cadence: daily, weekly, or monthly, tied to model tier
  • Reporting: who receives monitoring reports and how often
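PSI, the most common of these drift statistics, compares binned proportions of a baseline (training-time) distribution against current production data. A minimal sketch; the bin count and the 0.10/0.25 rule-of-thumb thresholds are industry conventions, not regulatory requirements:

```python
import math

def psi(expected, actual, bins=10):
    """Population Stability Index between a baseline distribution
    and a current production distribution of one feature."""
    lo, hi = min(expected), max(expected)
    edges = [lo + (hi - lo) * i / bins for i in range(1, bins)]

    def proportions(values):
        counts = [0] * bins
        for v in values:
            idx = sum(v > e for e in edges)  # which bin v falls into
            counts[idx] += 1
        # small floor avoids log(0) / division by zero in empty bins
        return [max(c / len(values), 1e-6) for c in counts]

    p_exp, p_act = proportions(expected), proportions(actual)
    return sum((a - e) * math.log(a / e) for e, a in zip(p_exp, p_act))

# Identical distributions score near 0; a commonly cited rule of thumb
# flags PSI > 0.10 for review and PSI > 0.25 as a significant shift.
baseline = [i / 100 for i in range(100)]
print(round(psi(baseline, baseline), 4))  # → 0.0
```

The documented monitoring plan should record not just the method but the exact thresholds and what action each one triggers.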

Section 8: Change Log and Version History

What examiners expect: A complete record of every change to the model, who approved it, and why.

  • Date, description, and rationale for every model update
  • Who approved the change (name and role)
  • Whether the change triggered re-validation
  • Rollback plan if the change degrades performance
  • Version number tied to your model inventory

The EU AI Act Raises the Bar Even Higher

If you operate in or serve EU markets, Article 11 of the EU AI Act and its Annex IV establish specific technical documentation requirements for high-risk AI systems that go beyond US regulatory expectations. High-risk AI systems must have technical documentation prepared before being placed on the market.

Annex IV requires documentation of: the general system description and intended purpose, design specifications including key algorithm choices and trade-offs, system architecture and computational resources, training data descriptions including provenance and labeling procedures, validation and testing procedures with metrics and test logs, and human oversight measures.

For firms operating across jurisdictions, the practical approach is to build documentation to the EU standard — it will satisfy both EU requirements and US examiner expectations under SR 11-7.

Colorado SB 205: Documentation for Deployers

Colorado’s AI Act (SB 24-205), effective February 1, 2026, adds state-level documentation requirements. Both developers and deployers of high-risk AI systems must maintain documentation of their risk management policies, including impact assessments and records of consumer notifications. Deployers must also document any known incidents of algorithmic discrimination and the corrective measures taken.

If you haven’t read our full Colorado AI Act compliance guide, start there for the complete picture.

Making It Practical: A 30/60/90-Day Documentation Roadmap

You can’t fix every documentation gap at once. Here’s how to prioritize:

Days 1–30: Triage and Template

  • Deliverable: Updated AI model documentation template approved by MRM leadership
  • Owner: Head of Model Risk Management or Chief Risk Officer
  • Actions: Inventory your current AI/ML models (or update your model inventory). Assess each model’s existing documentation against the sections above. Identify the highest-risk models with the biggest documentation gaps. Build the template using the sections in this article.

Days 31–60: High-Risk Model Documentation

  • Deliverable: Complete documentation packages for all Tier 1 (highest risk) AI models
  • Owner: Individual model owners (data scientists/ML engineers), validated by MRM team
  • Actions: Populate the new template for each Tier 1 model. Conduct gap-fill research where training data provenance or design rationale was never formally documented. Add explainability documentation (SHAP, LIME results) where missing. Document known limitations honestly.

Days 61–90: Validation, Process Integration, and Tier 2 Models

  • Deliverable: Independent review of Tier 1 documentation; completed documentation for Tier 2 models
  • Owner: Model validation team (or qualified third party)
  • Actions: Independent reviewers validate that documentation supports effective challenge. Integrate the new template into the model development lifecycle — every new model ships with a completed model card. Begin Tier 2 model documentation. Establish the quarterly documentation review cadence.

So What?

Documentation isn’t paperwork — it’s your evidence of sound risk management. When an examiner asks “how does this AI model work?” and you hand them a comprehensive, honest model card with training data provenance, performance metrics by demographic group, documented limitations, and a clear monitoring plan, you’ve just demonstrated exactly what SR 11-7 demands.

When you hand them a two-page Word doc that says “this model uses machine learning to predict credit risk” and nothing else, you’ve just earned an MRA. Or worse.

The firms that treat AI documentation as an afterthought will spend their 2026 exam cycles writing remediation plans. The ones that build it into the model development process — documentation as a first-class artifact, not a compliance checkbox — will spend those cycles deploying more models.


Need a structured framework to assess and document your AI model risks? The AI Risk Assessment Template & Guide includes risk scoring matrices, documentation templates, and control frameworks designed for AI/ML models in financial services.

FAQ

What documentation do examiners look for first when reviewing AI models?

Examiners typically start with the model inventory to understand scope, then request the full model development document for high-risk models. They focus on training data provenance (where did the data come from and is it representative?), validation results (especially performance across demographic segments), and the ongoing monitoring plan (what triggers re-validation?). If any of these are missing or thin, it’s an immediate red flag. Documentation of known limitations is also high on their list — they want to see that you’ve identified the model’s weaknesses, not just its strengths.

How is AI model documentation different from traditional model documentation?

Traditional model documentation (designed for linear regressions, scorecards, and financial models) focuses on variable selection, coefficient stability, and backtesting. AI model documentation must additionally cover training data provenance and labeling methodology, hyperparameter tuning decisions, explainability approaches for opaque models (SHAP, LIME), known failure modes like hallucinations or adversarial vulnerabilities, and drift detection thresholds. The core regulatory principle is the same — documentation sufficient for an unfamiliar party to understand the model — but the specifics are fundamentally different for AI/ML systems.

Do I need a separate model card for every AI model in production?

Yes. Every AI model in your model inventory should have its own model card — a standardized summary document covering its purpose, training data, performance, fairness evaluations, and limitations. The model card serves as the “nutrition label” that gives examiners and risk committees a quick understanding of the model. Full development documentation sits behind it for deeper review. For vendor-provided AI models, you should still create a model card documenting what the vendor disclosed, your independent validation results, and any gaps in vendor transparency.

Rebecca Leung

Rebecca Leung has 8+ years of risk and compliance experience across first and second line roles at commercial banks, asset managers, and fintechs. Former management consultant advising financial institutions on risk strategy. Founder of RiskTemplates.
