
AI Risk Assessment Template: Pre-Deployment Checklist for Financial Services

May 4, 2026 · Rebecca Leung

TL;DR

  • SR 26-02 (which replaced SR 11-7 on April 17, 2026) covers traditional model risk for banks over $30B but explicitly excludes generative and agentic AI. The gap is filled by NIST AI RMF 1.0, NIST AI 600-1 (GenAI Profile), and Treasury’s FS AI RMF.
  • A defensible AI risk assessment has six components: use case documentation, tiering, pre-deployment scorecard, human oversight controls, monitoring triggers, and named approver. Skip any one and the assessment won’t survive a regulatory exam.
  • State enforcement is filling the federal gap. Massachusetts AG settled with a student loan company on July 10, 2025 for $2.5 million over disparate impact in an AI underwriting model — and required a fair lending governance program for AI as part of the settlement.
  • The most common assessment failure isn’t bias testing — it’s treating assessment as a one-time pre-deployment artifact and never re-reviewing as the model drifts, the vendor updates, or usage grows.

If your bank partner has asked for your AI governance documentation in the past six months, you already know the question isn’t “do you have it.” It’s “show me the assessment for this specific use case, and walk me through how you scored it.” Generic policy decks don’t satisfy that. Use-case-level evidence does.

This is the practitioner’s version of the AI risk assessment — the structure, the tiering logic, the scorecard, and the controls regulators and bank partners actually examine. It’s framework-agnostic so it works under SR 26-02, NIST AI RMF, FS AI RMF, the EU AI Act, and the Colorado AI Act. The frameworks differ on terminology; the assessment artifacts they want are nearly identical.

Why This Is Suddenly Urgent

Three forces converged in the past 12 months and made AI risk assessment a top-tier compliance priority.

The federal framework rewrote itself. The Federal Reserve and OCC issued SR 26-02 and OCC Bulletin 2026-13 on April 17, 2026, replacing SR 11-7 after 15 years. The revised guidance modernizes traditional model risk management — but explicitly excludes generative and agentic AI from scope. Banks now operate in a multi-framework environment: SR 26-02 for traditional ML, NIST AI RMF 1.0 for sector-agnostic AI governance, NIST AI 600-1 for GenAI specifically, and Treasury’s FS AI RMF translating NIST into 230 financial-services-specific control objectives. (We have a full crosswalk between the four frameworks if you want the side-by-side.)

State AGs are filling the federal enforcement vacuum. With CFPB enforcement contracted and the OCC deferring fair lending exams through January 2026, state attorneys general have moved aggressively. On July 10, 2025, Massachusetts AG Andrea Joy Campbell announced a $2.5 million settlement with a student loan company over allegations that its AI underwriting model produced disparate impact based on race and immigration status. The settlement required the lender to implement a fair lending governance program for AI — testing, controls, and risk assessments. New Jersey codified disparate impact under state law and issued explicit AI explainability guidance. The pattern: federal restraint plus state activism equals more, not less, enforcement risk.

Bank partners moved before regulators did. Sponsor banks are asking for AI governance documentation as part of standard third-party diligence. The questionnaires are detailed: model inventory, tiering methodology, bias testing approach, vendor AI dependencies, incident response for AI failures. “We’re working on it” doesn’t pass diligence. A documented assessment for each AI use case does.

If your fintech sells into a regulated bank, your bank partner is your immediate examiner. If you are the bank, your regulator is.

What a Pre-Deployment AI Risk Assessment Actually Contains

A defensible assessment has six required artifacts. Each one is a specific deliverable, not a section of a deck.

| Artifact | Purpose | Format |
| --- | --- | --- |
| Use case description | Defines what the model does, who uses it, what data flows through it | 1-page structured narrative |
| Risk tiering | Categorizes the use case as High / Medium / Low | Scoring matrix with documented inputs |
| Pre-deployment scorecard | Tests the model against 11 risk domains before launch | Multi-question scorecard with evidence |
| Human oversight controls | Documents who reviews, who approves, who can override | Control table with named roles |
| Monitoring plan | Defines metrics, thresholds, and re-assessment triggers | Monitoring spec with frequency |
| Named approval | Captures the accountable person who said “yes, deploy” | Signed approval record |

Miss any one and you have a partial assessment. Regulators expect all six.

Step 1: The Use Case Description

The single most underrated artifact in AI governance. Most organizations skip straight to scoring without first defining clearly what they’re scoring.

A complete use case description captures:

  • Business purpose — What problem does this solve? Who benefits?
  • Decision role — Does the model make decisions, recommend decisions, or generate content?
  • Consumer impact — Does the model directly affect a consumer outcome (lending, account access, fraud freeze)? Indirectly? Not at all?
  • Inputs — What data goes in? Personal data? Sensitive financial data? Biometric? Third-party-sourced?
  • Outputs — What comes out? A score? A recommendation? Generated text or images? An automated action?
  • Volume and scale — How many decisions per day? How many users affected?
  • Vendor and ownership — Built in-house? Third-party model (which vendor, which model, which version)? Hybrid?
  • Lifecycle stage — Pre-deployment, pilot, production, deprecation candidate?

The description isn’t paperwork. It’s the input that drives every downstream artifact. If you can’t write a clean one-pager, you don’t understand the use case well enough to assess it.
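
The eight fields above translate directly into a structured record. A minimal Python sketch, assuming nothing beyond the fields listed (names and types are illustrative, not drawn from any framework):

```python
from dataclasses import dataclass
from enum import Enum

class DecisionRole(Enum):
    DECIDES = "makes decisions"
    RECOMMENDS = "recommends decisions"
    GENERATES = "generates content"

@dataclass
class UseCaseDescription:
    """One structured record per AI use case -- the input to every downstream artifact."""
    name: str
    business_purpose: str            # what problem it solves, who benefits
    decision_role: DecisionRole
    consumer_impact: str             # "direct", "indirect", or "none"
    inputs: list[str]                # data in: PII, sensitive financial, biometric, third-party
    outputs: list[str]               # score, recommendation, generated text, automated action
    decisions_per_day: int           # volume and scale
    vendor: str | None = None        # None if built in-house; else vendor / model / version
    model_version: str | None = None
    lifecycle_stage: str = "pre-deployment"  # pilot, production, deprecation candidate
```

Keeping descriptions as structured records rather than free-form documents makes the tiering and scorecard steps scriptable later.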

Step 2: Risk Tiering — Three Tiers, Four Inputs

Tiering is the single most common place where AI governance programs collapse. Either there are too many categories (14 tiers, none defensible) or none at all (every model gets the same controls, which is operationally impossible).

Use four inputs, three tiers.

The Four Tiering Inputs

| Input | Low (1 pt) | Medium (2 pts) | High (3 pts) |
| --- | --- | --- | --- |
| Consumer impact | Internal only | Indirect (affects employee decisions about consumers) | Direct decisioning (lending, account access, etc.) |
| Decisioning autonomy | Advisory only | Human-in-the-loop required | Fully automated |
| PII exposure | None | Non-sensitive PII | Sensitive PII (financial, health, biometric) |
| Regulatory touchpoint | None | One regulation (e.g., GLBA) | Multiple regulations including fair lending or BSA |

Sum the scores: 4–6 = Low, 7–9 = Medium, 10–12 = High.
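
As a worked example of the arithmetic, here is a minimal sketch of the scoring in Python (input key names are illustrative; the buckets are the 4–6 / 7–9 / 10–12 thresholds above):

```python
TIERING_INPUTS = ("consumer_impact", "decisioning_autonomy",
                  "pii_exposure", "regulatory_touchpoint")

def tier_use_case(scores: dict[str, int]) -> str:
    """Sum four 1-3 input scores and bucket into Low / Medium / High."""
    assert set(scores) == set(TIERING_INPUTS), "score exactly the four inputs"
    assert all(s in (1, 2, 3) for s in scores.values()), "each input scores 1, 2, or 3"
    total = sum(scores.values())  # possible range: 4..12
    if total <= 6:
        return "Low"
    if total <= 9:
        return "Medium"
    return "High"

# Fully automated credit underwriting on sensitive financial PII,
# subject to ECOA, FCRA, and BSA: 3 + 3 + 3 + 3 = 12 -> High
print(tier_use_case({"consumer_impact": 3, "decisioning_autonomy": 3,
                     "pii_exposure": 3, "regulatory_touchpoint": 3}))
```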

What Each Tier Means in Practice

| Tier | Pre-deployment | Monitoring | Approval |
| --- | --- | --- | --- |
| High | Full scorecard + bias testing + independent validation | Monthly metrics review; quarterly model review | CRO or AI Governance Committee |
| Medium | Full scorecard + bias testing if consumer-facing | Quarterly metrics review; annual model review | CCO or designated approver |
| Low | Lightweight scorecard | Annual review | Department head |

The tiering forces you to apply controls proportionate to risk — and gives you a defensible answer when an examiner asks “why doesn’t this Medium-tier chatbot have the same controls as your underwriting model?”

Step 3: The Pre-Deployment Scorecard — 11 Risk Domains

The scorecard is where the assessment actually does work. For a High-tier use case, expect 40+ specific questions across the 11 domains. Low- and Medium-tier use cases get fewer questions, but the domains are the same.

The 11 Risk Domains

  1. Data quality and lineage — Is training data documented? Is it representative? When was it last refreshed?
  2. Model performance — What metrics define success? What’s the baseline performance? Failure mode behavior?
  3. Bias and fair lending — Has the model been tested for disparate impact across protected classes? What’s the methodology?
  4. Transparency and explainability — Can you explain individual decisions? In plain language? To a consumer if required?
  5. Security and adversarial robustness — Has the model been tested against adversarial inputs? Prompt injection (for GenAI)? Data poisoning?
  6. Privacy and data minimization — What personal data is in training? In inference inputs? In outputs? What’s the legal basis?
  7. Third-party dependencies — What vendor models or APIs does this depend on? What are their controls? What happens if they change?
  8. Operational resilience — What’s the failover if the model is unavailable? Acceptable degraded mode?
  9. Regulatory compliance — Which specific rules apply? UDAAP? ECOA? FCRA? BSA? Has each been mapped?
  10. Consumer disclosure — Are consumers told the model is in use? Is the disclosure consistent with privacy notice?
  11. Documentation and audit trail — Can you reconstruct the model version, training data, and inputs for any past decision?

For each question, the assessment captures: the answer, the evidence (link to documentation, test result, vendor SOC 2, etc.), the residual risk, and any required mitigations before deployment.
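
A sketch of what one captured answer could look like as a record (the structure is illustrative; the four captured elements come from the paragraph above):

```python
from dataclasses import dataclass, field

@dataclass
class ScorecardItem:
    """One scorecard question plus the evidence trail an examiner will ask for."""
    domain: str                  # one of the 11 risk domains
    question: str
    answer: str
    evidence: list[str]          # links to test results, documentation, vendor SOC 2
    residual_risk: str           # "low" / "medium" / "high" after existing controls
    mitigations: list[str] = field(default_factory=list)  # required before deployment

item = ScorecardItem(
    domain="Bias and fair lending",
    question="Has the model been tested for disparate impact across protected classes?",
    answer="Yes -- tested on the most recent held-out validation set",
    evidence=["https://wiki.example.com/fair-lending/di-results"],  # hypothetical link
    residual_risk="medium",
    mitigations=["Add calibration check across protected classes before launch"],
)
```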

Bias Testing Specifically

For consumer-facing models — especially in lending, account access, fraud, and pricing — bias testing isn’t optional. The minimum acceptable methodology:

  • Disparate impact ratio (4/5ths rule for hiring/credit; stricter thresholds for high-risk uses)
  • Statistical parity difference
  • Equal opportunity difference
  • Calibration across protected classes

Each test should run on a representative validation set, not just the training data. Results should be retained as audit evidence for the same period as the model itself.
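
For concreteness, a minimal sketch of the first three metrics in Python with NumPy, assuming binary predictions where 1 is the favorable outcome (e.g., approval). Calibration across classes requires binned comparison of predicted vs. observed rates and is omitted here:

```python
import numpy as np

def fairness_metrics(y_true_p, y_pred_p, y_true_r, y_pred_r):
    """Compare a protected group (p) against a reference group (r) on a validation set.

    All arrays are binary; 1 = favorable outcome (e.g., loan approved).
    """
    rate_p, rate_r = y_pred_p.mean(), y_pred_r.mean()    # selection rates
    tpr_p = y_pred_p[y_true_p == 1].mean()               # true positive rates
    tpr_r = y_pred_r[y_true_r == 1].mean()
    return {
        "disparate_impact_ratio": rate_p / rate_r,       # 4/5ths rule: flag if < 0.8
        "statistical_parity_diff": rate_p - rate_r,
        "equal_opportunity_diff": tpr_p - tpr_r,
    }

rng = np.random.default_rng(0)  # toy data for illustration only
y_true, y_pred = rng.integers(0, 2, 1000), rng.integers(0, 2, 1000)
print(fairness_metrics(y_true[:500], y_pred[:500], y_true[500:], y_pred[500:]))
```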

The CFPB’s 2023 guidance on adverse action with AI remains in force: if you use algorithms to make credit decisions, you must explain them in plain language. Sample form letters don’t satisfy ECOA when the actual reason is buried in a black-box model.

Step 4: Human Oversight Controls

All High-tier and most Medium-tier use cases require human oversight controls. Document them concretely:

  • Pre-decision review — When does a human review before the model’s output is acted on?
  • Override capability — Can a human override the model’s decision? Who has the authority? How is it logged?
  • Sample auditing — What % of decisions are sampled for human QA? At what cadence?
  • Escalation triggers — What outcomes trigger immediate human review (e.g., flagged sensitive demographic patterns)?
  • Consumer-facing recourse — Can a consumer request human review of an adverse decision? How?

For GenAI and agentic AI, the NIST AI 600-1 GenAI Profile explicitly calls out these oversight controls. So does the EU AI Act for high-risk systems. Don’t write controls in the abstract — name the role, the system, the threshold, and the workflow.
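
One way to make “name the role, the system, the threshold, and the workflow” concrete is to keep the control table as structured records. A sketch (the role and system names are purely illustrative):

```python
from dataclasses import dataclass

@dataclass
class OversightControl:
    """One row of the human-oversight control table -- concrete, not abstract."""
    control: str     # e.g., "escalation trigger", "override capability"
    role: str        # a named role with authority, never "the team"
    system: str      # where the control executes and is logged
    threshold: str   # when it fires
    workflow: str    # what happens, step by step

controls = [
    OversightControl(
        control="Escalation trigger",
        role="Fair Lending Analyst",                    # illustrative role
        system="case-management queue",                 # illustrative system
        threshold="flagged sensitive demographic pattern in daily batch",
        workflow="hold model output; route to analyst; log the disposition",
    ),
]
```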

Step 5: Monitoring and Re-Assessment Triggers

The biggest mistake: treating assessment as a one-time pre-deployment artifact. Models drift. Vendors update their backends without telling you. Training data goes stale. Usage scales beyond what was assessed.

A complete monitoring plan defines:

| Metric | Frequency | Threshold | Trigger Action |
| --- | --- | --- | --- |
| Performance (e.g., AUC, accuracy) | Weekly | >5% degradation | Model owner review |
| Bias metrics across protected classes | Monthly | DI ratio <0.8 | Suspend deployment, escalate |
| Input data drift | Weekly | >10% distribution shift | Re-validate model |
| Volume of overrides | Monthly | >5% override rate | Investigate model fit |
| Customer complaints | Monthly | Trend up >15% | Root cause review |
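
A minimal sketch of how those rows become an automated check (the metric keys are illustrative; the thresholds and actions mirror the table above):

```python
# (metric, comparison, threshold, trigger action) -- values from the table above
MONITORING_RULES = [
    ("performance_degradation_pct", ">",  5.0, "model owner review"),
    ("di_ratio",                    "<",  0.8, "suspend deployment, escalate"),
    ("input_drift_pct",             ">", 10.0, "re-validate model"),
    ("override_rate_pct",           ">",  5.0, "investigate model fit"),
    ("complaint_trend_pct",         ">", 15.0, "root cause review"),
]

def check_monitoring(readings: dict[str, float]) -> list[str]:
    """Return the trigger actions whose thresholds were breached this cycle."""
    actions = []
    for metric, op, threshold, action in MONITORING_RULES:
        value = readings.get(metric)
        if value is None:
            continue  # not measured this cycle; weekly and monthly cadences differ
        if (value > threshold) if op == ">" else (value < threshold):
            actions.append(f"{metric}={value}: {action}")
    return actions

print(check_monitoring({"di_ratio": 0.74, "override_rate_pct": 2.1}))
# -> ['di_ratio=0.74: suspend deployment, escalate']
```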

And re-assessment triggers — the events that force a fresh full assessment:

  • Vendor model version change
  • New data source added to training
  • Expansion to a new product, geography, or customer segment
  • Material change in consumer-facing UI or workflow
  • Regulatory change (new state law, agency guidance)
  • Sustained drift past threshold
  • Annual calendar review (default, even if nothing else triggered)
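
The trigger list reduces to a simple event-or-calendar check. A sketch (the event names are illustrative shorthand for the bullets above):

```python
from datetime import date, timedelta

REASSESSMENT_EVENTS = {
    "vendor_model_version_change",
    "new_training_data_source",
    "new_product_geography_or_segment",
    "material_ui_or_workflow_change",
    "regulatory_change",
    "sustained_drift_past_threshold",
}

def needs_reassessment(events: set[str], last_assessed: date,
                       today: date | None = None) -> bool:
    """True if any trigger event fired, or the annual calendar review is due."""
    today = today or date.today()
    event_fired = bool(events & REASSESSMENT_EVENTS)
    annual_due = today - last_assessed >= timedelta(days=365)
    return event_fired or annual_due

# Nothing triggered, but the last assessment is over a year old -> True
print(needs_reassessment(set(), date(2025, 3, 12), today=date(2026, 5, 4)))
```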

The monitoring plan is also where you tie AI risk back into your overall risk register — high-tier AI use cases are entries in your enterprise risk inventory, not floating in their own world.

Step 6: Named Approval

Every assessment ends with a named human approving the deployment in writing. Not “the AI Governance Committee approved.” A name, a role, a date, and a record of which version of the assessment they approved.

Why does this matter? When something goes wrong — and at scale, something will — regulators will ask “who approved this?” “Nobody specifically” is not an acceptable answer. “The Chief Risk Officer, on March 12, 2026, approving Version 2 of the assessment” is.

Common Findings From Bank Partner Diligence

If you’re a fintech selling into sponsor banks, the diligence questions you’ll see most often:

  1. “Show me your AI inventory.” Have a structured spreadsheet with one row per use case: name, tier, vendor, status, last assessment date, approver (a minimal column sketch follows below).

  2. “Walk me through your tiering for this use case.” Walk them through the four-input methodology. Show the score. Don’t improvise.

  3. “What’s your testing for fair lending bias on consumer-facing models?” Have the methodology documented. Have recent test results. If you don’t have either, say so honestly — improvising is worse than admitting a gap with a remediation plan.

  4. “What happens when your AI vendor updates their underlying model?” Have a vendor change-management workflow. Show how you become aware of changes (contractual notice, status page monitoring) and what re-assessment that triggers.

  5. “How do you handle Shadow AI?” Have a periodic discovery process — survey, network monitoring, finance/expense review. Have a stated policy on which tools require AI Governance Committee review before adoption.

  6. “Who owns AI risk?” A name. Not a team.

The bank’s job is to make sure that if their regulator asks them about your AI, they have an answer that doesn’t involve “we don’t really know what they’re doing.” Give them that answer in writing.
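
For the inventory in question 1, a minimal column sketch (the column names and the example row are illustrative; any spreadsheet or table with these fields serves):

```python
import csv

INVENTORY_COLUMNS = ["name", "tier", "vendor", "status",
                     "last_assessment_date", "approver"]

with open("ai_inventory.csv", "w", newline="") as f:
    writer = csv.DictWriter(f, fieldnames=INVENTORY_COLUMNS)
    writer.writeheader()
    writer.writerow({
        "name": "consumer credit underwriting v2",   # illustrative entry
        "tier": "High",
        "vendor": "in-house",
        "status": "production",
        "last_assessment_date": "2026-03-12",
        "approver": "Chief Risk Officer",
    })
```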

So What?

Three takeaways for the practitioner.

First, the federal framework gap is real and won’t close in 2026. SR 26-02 is a major upgrade for traditional ML, but GenAI is governed by NIST AI RMF, NIST AI 600-1, and FS AI RMF — none mandatory, all examined. The practical fix is one assessment template that produces evidence satisfying all four frameworks. Don’t build framework-specific processes; you’ll burn out.

Second, the most expensive failure isn’t a bias test you didn’t run. It’s treating assessment as a one-time artifact and never re-reviewing. Vendor model updates, drift, scale changes, regulatory change — any of those can convert a properly assessed Low-tier model into a high-risk one without anyone noticing. Build re-assessment triggers into the workflow.

Third, state enforcement is the immediate exposure for fintechs and consumer-facing institutions. Massachusetts AG, NJ DOJ, Texas AG, NY DFS, and California’s CPPA are all actively pursuing AI cases. The federal vacuum gets filled. Your assessment has to be defensible against state-level disparate impact theories, not just federal.

If you want a built-out version of this — including the AI Use Case Inventory with auto-tiering formula, the 44-question pre-deployment scorecard, the third-party AI vendor questionnaire, and 8 worked examples (fraud detection, customer chatbot, credit underwriting, AML monitoring, marketing GenAI, Shadow AI ChatGPT, BaaS KYC AI, crypto sanctions AI) — our AI Risk Assessment Template & Guide gives you the operational artifacts. It’s mapped to NIST AI RMF 1.1, SR 26-02 / OCC 2026-13, FS AI RMF, the Colorado AI Act, CFPB ECOA AI provisions, and EU AI Act high-risk requirements.

The assessment that gets built isn’t the one with the most questions. It’s the one your team can actually run, every time, before every deployment.

Frequently Asked Questions

What's the minimum AI risk assessment a community bank or small fintech actually needs to do before deploying a model?
Six things at minimum: (1) document the use case, the decisioning role, and whether it touches consumers; (2) tier the use case (high/medium/low) using consumer impact, decisioning autonomy, PII exposure, and regulatory touchpoint; (3) run a pre-deployment scorecard covering bias, robustness, transparency, third-party dependencies, and data lineage; (4) document the human-in-the-loop and override controls; (5) define monitoring metrics and triggers for re-review; (6) get a named approver to sign off. Anything less than that won't survive a regulator's question of 'show me your AI governance.'
Does SR 26-02 apply to generative AI?
No — and this is the gap most banks haven't closed. SR 26-02, which replaced SR 11-7 on April 17, 2026, explicitly excludes generative and agentic AI from its scope. It covers traditional models and quantitative ML at banks over $30 billion. For GenAI, you're operating under NIST AI RMF 1.0, the NIST AI 600-1 GenAI Profile, and the Treasury FS AI RMF — none of which are mandatory but all of which examiners will reference. The practical answer: build one assessment template that satisfies all three frameworks, and apply it regardless of model type.
How do you tier AI use cases without ending up with 12 categories?
Use four inputs and three tiers. Inputs: consumer impact (none / indirect / direct decisioning), decisioning autonomy (advisory / human-in-the-loop / fully automated), PII exposure (none / non-sensitive / sensitive, including financial), regulatory touchpoint (none / one rule / multiple rules including fair lending or BSA). Score each, sum, and bucket: High (full lifecycle controls), Medium (proportionate controls), Low (lightweight). Three tiers force tradeoffs and are defensible; 12 tiers means you can't explain your own framework to an examiner.
What's the most common AI risk assessment mistake examiners flag?
Treating the assessment as a one-time pre-deployment artifact. Models drift, vendors update their underlying systems without telling you, training data goes stale, and the consumer impact that was 'medium' at deployment becomes 'high' six months later when usage grows. The fix is a re-assessment trigger list: vendor model version change, more than X% drift in performance metrics, change in consumer-facing UI, expansion to a new product line, regulatory change. If none of those have triggered in 12 months, do a calendar-based annual review anyway.
Do we need a separate AI risk assessment for off-the-shelf tools like ChatGPT or Microsoft Copilot?
Yes, and this is where 'Shadow AI' becomes a regulatory problem. Every GenAI tool used to make business decisions, generate customer-facing content, or process customer data needs an assessment — even if your firm didn't build it. Vendor-hosted GenAI is third-party model risk. The questions are different (you can't audit OpenAI's training) but the discipline is the same: document the use case, classify the data flowing in, define acceptable outputs, and assign human oversight. The Treasury FS AI RMF explicitly addresses third-party AI in its 230 control objectives.
Who owns AI risk assessment at most banks and fintechs?
It depends on size. At banks over $30 billion: Model Risk Management owns the assessment for traditional ML, and a dedicated AI Governance Committee — usually chaired by the CRO with CCO, CISO, and CDO members — owns GenAI. At mid-size institutions: the CCO or CRO typically owns it, with input from MRM if it exists. At fintechs: the CCO or Head of Risk owns it, often with the CTO as the technical reviewer. The wrong answer is 'the data science team' — they build the models, they should not approve their own risk.
Rebecca Leung

Rebecca Leung has 8+ years of risk and compliance experience across first and second line roles at commercial banks, asset managers, and fintechs. Former management consultant advising financial institutions on risk strategy. Founder of RiskTemplates.

