AI Risk Assessment Template: Pre-Deployment Checklist for Financial Services
TL;DR
- SR 26-02 (which replaced SR 11-7 on April 17, 2026) covers traditional model risk for banks over $30B but explicitly excludes generative and agentic AI. The gap is filled by NIST AI RMF 1.0, NIST AI 600-1 (GenAI Profile), and Treasury’s FS AI RMF.
- A defensible AI risk assessment has six components: use case documentation, tiering, pre-deployment scorecard, human oversight controls, monitoring triggers, and named approver. Skip any one and the assessment won’t survive a regulatory exam.
- State enforcement is filling the federal gap. Massachusetts AG settled with a student loan company on July 10, 2025 for $2.5 million over disparate impact in an AI underwriting model — and required a fair lending governance program for AI as part of the settlement.
- The most common assessment failure isn’t bias testing — it’s treating assessment as a one-time pre-deployment artifact and never re-reviewing as the model drifts, the vendor updates, or usage grows.
If your bank partner has asked for your AI governance documentation in the past six months, you already know the question isn’t “do you have it.” It’s “show me the assessment for this specific use case, and walk me through how you scored it.” Generic policy decks don’t satisfy that. Use-case-level evidence does.
This is the practitioner’s version of the AI risk assessment — the structure, the tiering logic, the scorecard, and the controls regulators and bank partners actually examine. It’s framework-agnostic so it works under SR 26-02, NIST AI RMF, FS AI RMF, the EU AI Act, and the Colorado AI Act. The frameworks differ on terminology; the assessment artifacts they want are nearly identical.
Why This Is Suddenly Urgent
Three forces converged in the past 12 months and made AI risk assessment a top-tier compliance priority.
The federal framework rewrote itself. The Federal Reserve and OCC issued SR 26-02 and OCC Bulletin 2026-13 on April 17, 2026, replacing SR 11-7 after 15 years. The revised guidance modernizes traditional model risk management — but explicitly excludes generative and agentic AI from scope. Banks now operate in a multi-framework environment: SR 26-02 for traditional ML, NIST AI RMF 1.0 for sector-agnostic AI governance, NIST AI 600-1 for GenAI specifically, and Treasury’s FS AI RMF translating NIST into 230 financial-services-specific control objectives. (We have a full crosswalk between the four frameworks if you want the side-by-side.)
State AGs are filling the federal enforcement vacuum. With CFPB enforcement contracted and the OCC deferring fair lending exams through January 2026, state attorneys general have moved aggressively. On July 10, 2025, Massachusetts AG Andrea Joy Campbell announced a $2.5 million settlement with a student loan company over allegations that its AI underwriting model produced disparate impact based on race and immigration status. The settlement required the lender to implement a fair lending governance program for AI — testing, controls, and risk assessments. New Jersey codified disparate impact under state law and issued explicit AI explainability guidance. The pattern: federal restraint plus state activism equals more, not less, enforcement risk.
Bank partners moved before regulators did. Sponsor banks are asking for AI governance documentation as part of standard third-party diligence. The questionnaires are detailed: model inventory, tiering methodology, bias testing approach, vendor AI dependencies, incident response for AI failures. “We’re working on it” doesn’t pass diligence. A documented assessment for each AI use case does.
If your fintech sells into a regulated bank, your bank partner is your immediate examiner. If you are the bank, your regulator is.
What a Pre-Deployment AI Risk Assessment Actually Contains
A defensible assessment has six required artifacts. Each one is a specific deliverable, not a section of a deck.
| Artifact | Purpose | Format |
|---|---|---|
| Use case description | Defines what the model does, who uses it, what data flows through it | 1-page structured narrative |
| Risk tiering | Categorizes the use case as High / Medium / Low | Scoring matrix with documented inputs |
| Pre-deployment scorecard | Tests the model against 11 risk domains before launch | Multi-question scorecard with evidence |
| Human oversight controls | Documents who reviews, who approves, who can override | Control table with named roles |
| Monitoring plan | Defines metrics, thresholds, and re-assessment triggers | Monitoring spec with frequency |
| Named approval | Captures the accountable person who said “yes, deploy” | Signed approval record |
Miss any one and you have a partial assessment. Regulators expect all six.
Step 1: The Use Case Description
The single most underrated artifact in AI governance. Most organizations skip straight to scoring without first defining clearly what they’re scoring.
A complete use case description captures:
- Business purpose — What problem does this solve? Who benefits?
- Decision role — Does the model make decisions, recommend decisions, or generate content?
- Consumer impact — Does the model directly affect a consumer outcome (lending, account access, fraud freeze)? Indirectly? Not at all?
- Inputs — What data goes in? Personal data? Sensitive financial data? Biometric? Third-party-sourced?
- Outputs — What comes out? A score? A recommendation? Generated text or images? An automated action?
- Volume and scale — How many decisions per day? How many users affected?
- Vendor and ownership — Built in-house? Third-party model (which vendor, which model, which version)? Hybrid?
- Lifecycle stage — Pre-deployment, pilot, production, deprecation candidate?
The description isn’t paperwork. It’s the input that drives every downstream artifact. If you can’t write a clean one-pager, you don’t understand the use case well enough to assess it.
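The fields above can be captured as a structured record so no downstream artifact starts from a blank page. This is an illustrative sketch, not a prescribed schema — every field name here is an assumption:

```python
from dataclasses import dataclass, field

# Hypothetical structure for the one-page use case description.
# Field names mirror the checklist above but are illustrative only.
@dataclass
class UseCaseDescription:
    name: str
    business_purpose: str               # what problem it solves, who benefits
    decision_role: str                  # "decides" | "recommends" | "generates"
    consumer_impact: str                # "direct" | "indirect" | "none"
    inputs: list = field(default_factory=list)   # data categories consumed
    outputs: list = field(default_factory=list)  # score, recommendation, text, action
    daily_decision_volume: int = 0
    vendor: str = "in-house"            # vendor/model/version if third-party
    lifecycle_stage: str = "pre-deployment"

uc = UseCaseDescription(
    name="Fraud transaction scoring",
    business_purpose="Flag high-risk card transactions for analyst review",
    decision_role="recommends",
    consumer_impact="indirect",
    inputs=["transaction history", "device fingerprint"],
    outputs=["risk score 0-100"],
    daily_decision_volume=50_000,
)
```

The point of the structure is that a missing field is visible: if you can't fill in `consumer_impact` or `vendor`, you've found the gap before the scoring step does.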
Step 2: Risk Tiering — Three Tiers, Four Inputs
Tiering is the single most common place where AI governance programs collapse. Either there are too many categories (14 tiers, none defensible) or none at all (every model gets the same controls, which is operationally impossible).
Use four inputs, three tiers.
The Four Tiering Inputs
| Input | Low (1 pt) | Medium (2 pts) | High (3 pts) |
|---|---|---|---|
| Consumer impact | Internal only | Indirect (affects employee decisions about consumers) | Direct decisioning (lending, account access, etc.) |
| Decisioning autonomy | Advisory only | Human-in-the-loop required | Fully automated |
| PII exposure | None | Non-sensitive PII | Sensitive PII (financial, health, biometric) |
| Regulatory touchpoint | None | One regulation (e.g., GLBA) | Multiple regulations including fair lending or BSA |
Sum the scores: 4–6 = Low, 7–9 = Medium, 10–12 = High.
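The auto-tiering arithmetic is simple enough to sketch directly. A minimal helper implementing the four-input, three-tier scheme above (function name and signature are assumptions, not a prescribed API):

```python
# Hypothetical auto-tiering helper: each of the four inputs scores
# 1 (Low), 2 (Medium), or 3 (High); the sum maps 4-6 -> Low,
# 7-9 -> Medium, 10-12 -> High.
def tier(consumer_impact: int, autonomy: int, pii: int, regulatory: int) -> str:
    for score in (consumer_impact, autonomy, pii, regulatory):
        if score not in (1, 2, 3):
            raise ValueError("each tiering input must score 1, 2, or 3")
    total = consumer_impact + autonomy + pii + regulatory
    if total <= 6:
        return "Low"
    if total <= 9:
        return "Medium"
    return "High"

# Example: direct consumer decisioning (3), human-in-the-loop (2),
# sensitive PII (3), multiple regulations incl. fair lending (3) -> 11
print(tier(3, 2, 3, 3))  # High
```

Codifying the formula, even in a spreadsheet, removes the examiner's favorite follow-up: "who decided this was Medium, and on what basis?"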
What Each Tier Means in Practice
| Tier | Pre-deployment | Monitoring | Approval |
|---|---|---|---|
| High | Full scorecard + bias testing + independent validation | Monthly metrics review; quarterly model review | CRO or AI Governance Committee |
| Medium | Full scorecard + bias testing if consumer-facing | Quarterly metrics review; annual model review | CCO or designated approver |
| Low | Lightweight scorecard | Annual review | Department head |
The tiering forces you to apply controls proportionate to risk — and gives you a defensible answer when an examiner asks “why doesn’t this Medium-tier chatbot have the same controls as your underwriting model?”
Step 3: The Pre-Deployment Scorecard — 11 Risk Domains
The scorecard is where the assessment actually does work. For a high-tier use case, expect 40+ specific questions across 11 domains. For Low and Medium, fewer, but the domains are the same.
The 11 Risk Domains
- Data quality and lineage — Is training data documented? Is it representative? When was it last refreshed?
- Model performance — What metrics define success? What’s the baseline performance? Failure mode behavior?
- Bias and fair lending — Has the model been tested for disparate impact across protected classes? What’s the methodology?
- Transparency and explainability — Can you explain individual decisions? In plain language? To a consumer if required?
- Security and adversarial robustness — Has the model been tested against adversarial inputs? Prompt injection (for GenAI)? Data poisoning?
- Privacy and data minimization — What personal data is in training? In inference inputs? In outputs? What’s the legal basis?
- Third-party dependencies — What vendor models or APIs does this depend on? What are their controls? What happens if they change?
- Operational resilience — What’s the failover if the model is unavailable? Acceptable degraded mode?
- Regulatory compliance — Which specific rules apply? UDAAP? ECOA? FCRA? BSA? Has each been mapped?
- Consumer disclosure — Are consumers told the model is in use? Is the disclosure consistent with privacy notice?
- Documentation and audit trail — Can you reconstruct the model version, training data, and inputs for any past decision?
For each question, the assessment captures: the answer, the evidence (link to documentation, test result, vendor SOC 2, etc.), the residual risk, and any required mitigations before deployment.
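One way to enforce that all four elements get captured for every scorecard question is to make the record constructor refuse incomplete entries. A sketch, with illustrative field names and an assumed low/medium/high residual-risk scale:

```python
# Hypothetical scorecard entry: answer, evidence, residual risk, and
# mitigations are all required, so a partial entry can't slip through.
def scorecard_entry(domain, question, answer, evidence,
                    residual_risk, mitigations=()):
    if residual_risk not in ("low", "medium", "high"):
        raise ValueError("residual_risk must be low, medium, or high")
    if residual_risk == "high" and not mitigations:
        raise ValueError("high residual risk requires mitigations before deployment")
    return {
        "domain": domain,
        "question": question,
        "answer": answer,
        "evidence": evidence,          # link to doc, test result, vendor SOC 2
        "residual_risk": residual_risk,
        "mitigations": list(mitigations),
    }

entry = scorecard_entry(
    domain="Bias and fair lending",
    question="Tested for disparate impact across protected classes?",
    answer="Yes, DI ratio 0.87 on holdout set",
    evidence="(link to bias test report)",  # placeholder path, not a real location
    residual_risk="medium",
)
```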
Bias Testing Specifically
For consumer-facing models — especially in lending, account access, fraud, and pricing — bias testing isn’t optional. The minimum acceptable methodology:
- Disparate impact ratio (4/5ths rule for hiring/credit; lower thresholds for high-risk uses)
- Statistical parity difference
- Equal opportunity difference
- Calibration across protected classes
Each test should run on a representative validation set, not just the training data. Results should be retained as audit evidence for the same period as the model itself.
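The arithmetic behind the two simplest metrics above can be shown from raw approve/deny counts per group. This is only the calculation, not a testing methodology — real bias testing also needs significance checks and a representative validation set:

```python
# Minimal sketch: disparate impact ratio and statistical parity
# difference from per-group selection rates. Counts are illustrative.
def selection_rate(approved: int, total: int) -> float:
    return approved / total

def disparate_impact_ratio(protected_rate: float, reference_rate: float) -> float:
    # 4/5ths rule: a ratio below 0.8 flags potential disparate impact
    return protected_rate / reference_rate

def statistical_parity_difference(protected_rate: float, reference_rate: float) -> float:
    return protected_rate - reference_rate

prot = selection_rate(approved=300, total=500)   # 0.60
ref = selection_rate(approved=400, total=500)    # 0.80
di = disparate_impact_ratio(prot, ref)           # 0.75 -> below 0.8
spd = statistical_parity_difference(prot, ref)   # -0.20

print(f"DI ratio {di:.2f}, flagged: {di < 0.8}")
```

Equal opportunity difference and calibration require outcome labels per group, not just selection counts, which is one reason the validation set matters.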
The CFPB’s 2023 guidance on adverse action with AI remains in force: if you use algorithms to make credit decisions, you must explain them in plain language. Sample form letters don’t satisfy ECOA when the actual reason is buried in a black-box model.
Step 4: Human Oversight Controls
Every High-tier and most Medium-tier use cases require human controls. Document them concretely:
- Pre-decision review — When does a human review before the model’s output is acted on?
- Override capability — Can a human override the model’s decision? Who has the authority? How is it logged?
- Sample auditing — What % of decisions are sampled for human QA? At what cadence?
- Escalation triggers — What outcomes trigger immediate human review (e.g., flagged sensitive demographic patterns)?
- Consumer-facing recourse — Can a consumer request human review of an adverse decision? How?
For GenAI and agentic AI, the NIST AI 600-1 GenAI Profile explicitly calls out these oversight controls. So does the EU AI Act for high-risk systems. Don’t write controls in the abstract — name the role, the system, the threshold, and the workflow.
Step 5: Monitoring and Re-Assessment Triggers
The biggest mistake: treating assessment as a one-time pre-deployment artifact. Models drift. Vendors update their backends without telling you. Training data goes stale. Usage scales beyond what was assessed.
A complete monitoring plan defines:
| Metric | Frequency | Threshold | Trigger Action |
|---|---|---|---|
| Performance (e.g., AUC, accuracy) | Weekly | >5% degradation | Model owner review |
| Bias metrics across protected classes | Monthly | DI ratio <0.8 | Suspend deployment, escalate |
| Input data drift | Weekly | >10% distribution shift | Re-validate model |
| Volume of overrides | Monthly | >5% override rate | Investigate model fit |
| Customer complaints | Monthly | Trend up >15% | Root cause review |
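The threshold logic in the table above is mechanical enough to automate. A hedged sketch — metric names, limits, and trigger actions mirror the table, but wiring this into real telemetry is out of scope:

```python
# Hypothetical monitoring check implementing the table above.
# For di_ratio_floor the breach is falling BELOW the limit;
# for every other metric it is rising above it.
THRESHOLDS = {
    "performance_degradation_pct": (5.0, "model owner review"),
    "di_ratio_floor": (0.8, "suspend deployment, escalate"),
    "input_drift_pct": (10.0, "re-validate model"),
    "override_rate_pct": (5.0, "investigate model fit"),
    "complaint_trend_pct": (15.0, "root cause review"),
}

def check_metrics(metrics: dict) -> list:
    """Return (metric, action) pairs for each breached threshold."""
    actions = []
    for name, value in metrics.items():
        limit, action = THRESHOLDS[name]
        breached = value < limit if name == "di_ratio_floor" else value > limit
        if breached:
            actions.append((name, action))
    return actions

print(check_metrics({"di_ratio_floor": 0.74, "override_rate_pct": 3.2}))
# [('di_ratio_floor', 'suspend deployment, escalate')]
```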
And re-assessment triggers — the events that force a fresh full assessment:
- Vendor model version change
- New data source added to training
- Expansion to a new product, geography, or customer segment
- Material change in consumer-facing UI or workflow
- Regulatory change (new state law, agency guidance)
- Sustained drift past threshold
- Annual calendar review (default, even if nothing else triggered)
The monitoring plan is also where you tie AI risk back into your overall risk register — high-tier AI use cases are entries in your enterprise risk inventory, not floating in their own world.
Step 6: Named Approval
Every assessment ends with a named human approving the deployment in writing. Not “the AI Governance Committee approved.” A name, a role, a date, and a record of which version of the assessment they approved.
Why does this matter? When something goes wrong — and at scale, something will — regulators will ask “who approved this?” “Nobody specifically” is not an acceptable answer. “The Chief Risk Officer, on March 12, 2026, approving Version 2 of the assessment” is.
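The "named human, not a committee" rule can even be enforced at the record level. A sketch with assumed field names — the validation is illustrative, not a compliance control in itself:

```python
from dataclasses import dataclass
from datetime import date

# Hypothetical approval record: immutable once signed, and it rejects
# group names so the record always points at one accountable person.
@dataclass(frozen=True)
class ApprovalRecord:
    approver_name: str
    approver_role: str
    approved_on: date
    assessment_version: int

    def __post_init__(self):
        lowered = self.approver_name.lower()
        if not lowered.strip() or "committee" in lowered or "team" in lowered:
            raise ValueError("approval requires a named individual, not a group")

rec = ApprovalRecord("J. Alvarez", "Chief Risk Officer", date(2026, 3, 12), 2)
```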
Common Findings From Bank Partner Diligence
If you’re a fintech selling into sponsor banks, the diligence questions you’ll see most often:
- “Show me your AI inventory.” Have a structured spreadsheet with one row per use case: name, tier, vendor, status, last assessment date, approver.
- “Walk me through your tiering for this use case.” Walk them through the four-input methodology. Show the score. Don’t improvise.
- “What’s your testing for fair lending bias on consumer-facing models?” Have the methodology documented. Have recent test results. If you don’t have either, say so honestly — improvising is worse than admitting a gap with a remediation plan.
- “What happens when your AI vendor updates their underlying model?” Have a vendor change-management workflow. Show how you become aware of changes (contractual notice, status page monitoring) and what re-assessment that triggers.
- “How do you handle Shadow AI?” Have a periodic discovery process — survey, network monitoring, finance/expense review. Have a stated policy on which tools require AI Governance Committee review before adoption.
- “Who owns AI risk?” A name. Not a team.
The bank’s job is to make sure that if their regulator asks them about your AI, they have an answer that doesn’t involve “we don’t really know what they’re doing.” Give them that answer in writing.
So What?
Three takeaways for the practitioner.
First, the federal framework gap is real and won’t close in 2026. SR 26-02 is a major upgrade for traditional ML, but GenAI is governed by NIST AI RMF, NIST AI 600-1, and FS AI RMF — none mandatory, all examined. The practical fix is one assessment template that produces evidence satisfying all four frameworks. Don’t build framework-specific processes; you’ll burn out.
Second, the most expensive failure isn’t a bias test you didn’t run. It’s treating assessment as a one-time artifact and never re-reviewing. Vendor model updates, drift, scale changes, regulatory change — any of those can convert a properly-assessed Low-tier model into a high-risk one without anyone noticing. Build re-assessment triggers into the workflow.
Third, state enforcement is the immediate exposure for fintechs and consumer-facing institutions. Massachusetts AG, NJ DOJ, Texas AG, NY DFS, and California’s CPPA are all actively pursuing AI cases. The federal vacuum gets filled. Your assessment has to be defensible against state-level disparate impact theories, not just federal.
If you want a built-out version of this — including the AI Use Case Inventory with auto-tiering formula, the 44-question pre-deployment scorecard, the third-party AI vendor questionnaire, and 8 worked examples (fraud detection, customer chatbot, credit underwriting, AML monitoring, marketing GenAI, Shadow AI ChatGPT, BaaS KYC AI, crypto sanctions AI) — our AI Risk Assessment Template & Guide gives you the operational artifacts. It’s mapped to NIST AI RMF 1.1, SR 26-02 / OCC 2026-13, FS AI RMF, the Colorado AI Act, CFPB ECOA AI provisions, and EU AI Act high-risk requirements.
The assessment that gets built isn’t the one with the most questions. It’s the one your team can actually run, every time, before every deployment.
Related Template
AI Risk Assessment Template & Guide
Comprehensive AI model governance and risk assessment templates for financial services teams.
Rebecca Leung
Rebecca Leung has 8+ years of risk and compliance experience across first and second line roles at commercial banks, asset managers, and fintechs. Former management consultant advising financial institutions on risk strategy. Founder of RiskTemplates.