Third-Party AI Vendor Risk Assessment: Due Diligence Framework and Questionnaire
Table of Contents
TL;DR
- When a vendor deploys AI in the service they provide you, your SR 11-7 model risk responsibilities follow — you can’t outsource accountability
- Standard TPRM questionnaires weren’t built for AI: they miss training data bias, drift monitoring, hallucination rates, and model change notifications
- The U.S. Treasury’s Financial Services AI RMF (February 2026) formalizes evidence-based AI vendor assessment as an expectation for financial institutions
- Six areas of AI-specific due diligence — plus the contract provisions that actually protect you if a vendor’s model fails
Your bank partner’s compliance team just sent a questionnaire. One of the new items: “Describe your due diligence process for third-party AI vendors, including how you assess model performance, bias risk, and ongoing monitoring.” You’ve got a solid TPRM program — 300 vendors, tiered assessments, annual reviews. None of your questionnaires have the word “hallucination” in them.
That gap is exactly what regulators are starting to probe. Here’s the framework for closing it.
Why Your Existing TPRM Program Doesn’t Cover AI Risk
Traditional vendor due diligence was designed to assess vendors whose software does predictable, documented things. It covers availability and uptime, security controls, data handling practices, business continuity capabilities, and regulatory compliance posture.
AI vendors introduce a different risk category. When a vendor deploys an AI model in the service they provide your institution, you’re not just relying on a system — you’re relying on a model whose outputs depend on training data quality, parameter choices, bias characteristics, and ongoing behavioral stability. Those risks don’t appear on a standard security questionnaire.
SR 11-7, the Federal Reserve and OCC’s model risk management guidance, is unambiguous: model risk management responsibilities apply regardless of whether the model is developed internally or acquired externally. “Trust the vendor” is not a defensible posture under SR 11-7.
The GAO’s May 2025 report on AI oversight in financial services (GAO-25-107197) found that financial regulators — OCC, Federal Reserve, FDIC, CFPB — all expect institutions to apply model risk management and third-party risk principles to AI, including vendor-supplied AI. The report noted that institutions bear the burden of determining how these frameworks apply to their specific vendor AI arrangements.
On February 19, 2026, the U.S. Department of the Treasury released the Financial Services AI Risk Management Framework (FS AI RMF), a non-binding but significant document that explicitly aligns with SR 11-7 and formalizes evidence-based vendor AI assessment as an expectation for financial institutions. The FS AI RMF calls for independent testing, bias audits, hallucination measurement, and security testing — not questionnaire responses alone.
The FINRA 2025 Annual Regulatory Oversight Report flagged third-party risk — with AI as an emerging concern — as a key area of examination focus for broker-dealers.
How to Tier AI Vendors by Risk Level
Not every vendor with an AI feature warrants the same scrutiny. Tiering your AI vendor population lets you concentrate enhanced due diligence where it actually matters.
Tier 1 — Critical AI Vendors: Vendors whose AI directly affects credit decisions, fraud determinations, compliance screening, customer communications about financial products, or regulatory reporting. These require full AI-specific due diligence — documentation review, independent testing where feasible, and enhanced contract provisions. SR 11-7 validation principles apply at this tier.
Tier 2 — Material AI Vendors: Vendors using AI in non-decision-critical workflows that still touch customer data or regulated activities — AI-assisted document review, risk monitoring tools, workflow automation. Standard AI questionnaire coverage with annual review and contract AI provisions.
Tier 3 — Incidental AI Vendors: Vendors using AI in features peripheral to their primary function (AI-assisted search, auto-fill, background analytics). Disclosure and notification requirements in contracts; quarterly scanning for tier upgrade if AI use expands.
The critical question at onboarding: “In which parts of the service you provide us does AI contribute to outputs or decisions?” Most vendors will answer this accurately if asked directly. Many won’t disclose it if you don’t ask.
The Six Areas of AI-Specific Due Diligence
1. AI Model Inventory and Documentation
Ask vendors to disclose every AI model — including foundation models, fine-tuned layers, and RAG knowledge bases — that contributes to the service they provide you. Request model cards or equivalent documentation for each. Model cards are standardized documents (developed by Google Research and widely adopted in the industry) that describe a model’s intended use, training data, performance characteristics, and known limitations.
If a vendor can’t describe the AI underlying their service at this level, that’s a meaningful signal about their AI governance maturity.
2. Training Data Sourcing and Bias Controls
Training data quality determines model output quality. Key questions:
- What datasets were used to train or fine-tune the models in your service?
- How is training data screened for bias, particularly across protected class characteristics relevant to financial services (race, gender, age, national origin)?
- Are demographic impact analyses performed before deployment updates?
- How is institution data handled — specifically, is it used to train or fine-tune vendor models?
For vendors processing protected-class-relevant data in credit, employment, or housing contexts, bias testing documentation should be treated as a due diligence deliverable, not just a questionnaire checkbox.
3. Model Performance and Hallucination Risk
For traditional ML models: request accuracy, precision, recall, and AUC metrics, along with out-of-sample performance data. Ask how performance is measured in production and what the vendor’s acceptable performance degradation threshold is before retraining.
For generative AI models: standard accuracy metrics don’t apply. Ask specifically about hallucination rate measurement — the vendor’s process for detecting and measuring confident but factually incorrect outputs. For any GenAI model touching regulatory content, financial figures, or customer-facing communications, hallucination rate should be a defined, tracked metric with contractual floor.
4. Drift Monitoring and Retraining Procedures
Model performance degrades over time as real-world data distributions shift away from training conditions. Ask vendors:
- How do you monitor for model drift in production?
- What performance thresholds trigger retraining?
- How frequently are models retrained or updated?
- What notification do clients receive before a new model version is deployed?
The last question is the most important. A vendor that deploys model updates without client notification is creating a change management gap: your SR 11-7 obligations require you to assess material model changes, but you can’t assess what you don’t know about.
5. Security and Adversarial Testing
AI-specific security vulnerabilities don’t appear in standard penetration testing. For AI vendors, due diligence should cover:
- Prompt injection vulnerability testing (for GenAI models)
- PII extraction testing — can users manipulate the model to expose training data or other users’ data?
- Jailbreaking resistance — does the model maintain output guardrails under adversarial prompting?
- Adversarial input robustness — for ML models, can inputs be manipulated to produce incorrect outputs?
Ask whether vendors have completed red teaming exercises and whether results are available for review. Vendors with mature AI governance will have documented red team findings; vendors who haven’t done this work won’t know what “red teaming” means.
6. Incident Response and Notification
AI incidents — model failures, bias discoveries, harmful outputs at scale — require a different incident response posture than standard technology outages. Your contract and due diligence should establish:
- What triggers a vendor AI incident notification (harmful output thresholds, bias discovery, model failure)?
- What’s the notification timeline (24 hours? 72 hours?)?
- What remediation process applies when harmful outputs are discovered post-deployment?
- How are regulatory obligations handled if a vendor AI incident creates regulatory exposure for your institution?
This last question — regulatory responsibility allocation — is where most vendor contracts are silent. It shouldn’t be.
What Documentation to Require vs. Accept
| Documentation | Tier 1 (Critical) | Tier 2 (Material) |
|---|---|---|
| Model cards or equivalent | Require | Request |
| Bias testing results | Require (pre-deployment + current) | Request latest |
| Performance metrics on representative dataset | Require | Request summary |
| Red team / adversarial testing results | Require summary | Disclose if conducted |
| Audit rights for independent testing | Contractually required | Request notification rights |
| SOC 2 AI controls coverage | Require | Request |
| ISO/IEC 42001 compliance or roadmap | Request | Not required |
“Request” means you ask; “Require” means it goes in the contract as a condition.
Contract Provisions AI Deployments Require
Standard vendor agreements weren’t written for AI. Before deploying a vendor’s AI in a critical function, ensure your contract includes:
Performance floors: Specific, measurable accuracy and error rate thresholds below which the vendor is in breach. Vague language like “commercially reasonable performance” is unenforceable when a model starts hallucinating.
Bias testing obligations: Contractual requirement to test for demographic disparate impact before and after major model updates, with results shared with your institution.
Change notification: Written notification at least 14-30 days before any material AI model update, with your right to request a reassessment period before the new model is used in your workflows.
Incident notification: Defined notification window (typically 24-48 hours) for AI incidents — harmful outputs, bias discoveries, model failures — with a clear definition of what constitutes an incident.
Data handling restrictions: Explicit confirmation that your institution’s data will not be used to train, fine-tune, or improve vendor AI models without separate written consent.
Audit rights: Your right (or a designated third party’s right) to conduct independent testing of the vendor’s AI performance on use-case-specific inputs, on a defined schedule and upon material change.
For more on building a comprehensive AI vendor questionnaire within your broader AI governance program, see our posts on applying SR 11-7 to AI systems and the AI governance program checklist regulators actually test.
Red Flags in Vendor Responses
Watch for:
- “Our AI is proprietary and we can’t share that information.” A refusal to disclose model documentation for critical AI is not a standard confidentiality position — it’s a vendor whose AI governance doesn’t support your regulatory requirements. This is a deal-breaker for Tier 1 vendors.
- “We use a third-party AI but we’re not sure which one.” Fourth-party AI risk is real. A vendor that doesn’t know what’s running underneath their own product has no AI governance program.
- “Our model hasn’t changed significantly.” No quantified performance data, no drift monitoring results. Follow up with specific questions about metrics.
- Certification claims without documentation. “We’re ISO 42001 compliant” should be backed by a certificate or audit report. “We follow ISO 42001 principles” is meaningless.
So What? Your Immediate Actions
1. Inventory your current vendors for AI use. Ask existing vendors a single screening question: “In what ways does your service incorporate AI or machine learning?” Review responses and retier vendors based on answers.
2. Update your TPRM questionnaire to include the six areas above before your next due diligence cycle. Don’t wait for the annual renewal — add an AI supplement for critical vendors now.
3. Audit your existing contracts for AI-specific provisions. Most will be silent on model change notification and AI incident response. Prioritize contract amendments for Tier 1 vendors at next renewal.
4. Build an AI vendor register tracking: vendor name, AI use case, model tier (1/2/3), last assessment date, documentation received, and contract AI provision status.
Our AI Risk Assessment Template & Guide includes a third-party AI vendor due diligence questionnaire covering all six areas, a vendor risk tiering matrix, and AI contract provision checklist — built for compliance teams managing vendor AI without a dedicated model risk department.
For context on how examiners are approaching AI governance gaps right now, see our post on common regulatory exam findings on AI.
Related Template
AI Risk Assessment Template & Guide
Comprehensive AI model governance and risk assessment templates for financial services teams.
Frequently Asked Questions
Does SR 11-7 apply to AI models we get from third-party vendors?
What's the difference between a standard vendor questionnaire and an AI-specific due diligence questionnaire?
What is ISO/IEC 42001 and should I ask vendors about it?
How does fourth-party AI risk work in the vendor assessment context?
What contract provisions should I require before deploying a vendor's AI?
How often do AI vendors need to be reassessed compared to traditional technology vendors?
Rebecca Leung
Rebecca Leung has 8+ years of risk and compliance experience across first and second line roles at commercial banks, asset managers, and fintechs. Former management consultant advising financial institutions on risk strategy. Founder of RiskTemplates.
Related Framework
AI Risk Assessment Template & Guide
Comprehensive AI model governance and risk assessment templates for financial services teams.
Keep Reading
AI and Fair Lending: UDAAP Risk in Algorithmic Decisioning
CFPB's UDAAP-as-discrimination gambit was vacated, but adverse action notice requirements still bite. Here's what AI lenders actually owe consumers in 2026.
Apr 13, 2026
AI RiskCommon Regulatory Exam Findings on AI: Top Deficiencies and How to Fix Them
These are the AI governance deficiencies regulators are actually finding in exams — incomplete model inventories, missing validation records, unmanaged vendor AI — and what to do about each one.
Apr 12, 2026
AI RiskSR 11-7 for AI Systems: Applying Legacy Model Risk Guidance to LLMs
How to actually implement SR 11-7 for LLMs: model inventory, governance ownership, documentation standards, and validation scope for in-house and vendor AI.
Apr 12, 2026
Immaterial Findings ✉️
Weekly newsletter
Sharp risk & compliance insights practitioners actually read. Enforcement actions, regulatory shifts, and practical frameworks — no fluff, no filler.
Join practitioners from banks, fintechs, and asset managers. Delivered weekly.