AI Risk

Third-Party AI Vendor Risk Assessment: Due Diligence Framework and Questionnaire

Table of Contents

TL;DR

  • When a vendor deploys AI in the service they provide you, your SR 11-7 model risk responsibilities follow — you can’t outsource accountability
  • Standard TPRM questionnaires weren’t built for AI: they miss training data bias, drift monitoring, hallucination rates, and model change notifications
  • The U.S. Treasury’s Financial Services AI RMF (February 2026) formalizes evidence-based AI vendor assessment as an expectation for financial institutions
  • Six areas of AI-specific due diligence — plus the contract provisions that actually protect you if a vendor’s model fails

Your bank partner’s compliance team just sent a questionnaire. One of the new items: “Describe your due diligence process for third-party AI vendors, including how you assess model performance, bias risk, and ongoing monitoring.” You’ve got a solid TPRM program — 300 vendors, tiered assessments, annual reviews. None of your questionnaires have the word “hallucination” in them.

That gap is exactly what regulators are starting to probe. Here’s the framework for closing it.

Why Your Existing TPRM Program Doesn’t Cover AI Risk

Traditional vendor due diligence was designed to assess vendors whose software does predictable, documented things. It covers availability and uptime, security controls, data handling practices, business continuity capabilities, and regulatory compliance posture.

AI vendors introduce a different risk category. When a vendor deploys an AI model in the service they provide your institution, you’re not just relying on a system — you’re relying on a model whose outputs depend on training data quality, parameter choices, bias characteristics, and ongoing behavioral stability. Those risks don’t appear on a standard security questionnaire.

SR 11-7, the Federal Reserve and OCC’s model risk management guidance, is unambiguous: model risk management responsibilities apply regardless of whether the model is developed internally or acquired externally. “Trust the vendor” is not a defensible posture under SR 11-7.

The GAO’s May 2025 report on AI oversight in financial services (GAO-25-107197) found that financial regulators — OCC, Federal Reserve, FDIC, CFPB — all expect institutions to apply model risk management and third-party risk principles to AI, including vendor-supplied AI. The report noted that institutions bear the burden of determining how these frameworks apply to their specific vendor AI arrangements.

On February 19, 2026, the U.S. Department of the Treasury released the Financial Services AI Risk Management Framework (FS AI RMF), a non-binding but significant document that explicitly aligns with SR 11-7 and formalizes evidence-based vendor AI assessment as an expectation for financial institutions. The FS AI RMF calls for independent testing, bias audits, hallucination measurement, and security testing — not questionnaire responses alone.

The FINRA 2025 Annual Regulatory Oversight Report flagged third-party risk — with AI as an emerging concern — as a key area of examination focus for broker-dealers.

How to Tier AI Vendors by Risk Level

Not every vendor with an AI feature warrants the same scrutiny. Tiering your AI vendor population lets you concentrate enhanced due diligence where it actually matters.

Tier 1 — Critical AI Vendors: Vendors whose AI directly affects credit decisions, fraud determinations, compliance screening, customer communications about financial products, or regulatory reporting. These require full AI-specific due diligence — documentation review, independent testing where feasible, and enhanced contract provisions. SR 11-7 validation principles apply at this tier.

Tier 2 — Material AI Vendors: Vendors using AI in non-decision-critical workflows that still touch customer data or regulated activities — AI-assisted document review, risk monitoring tools, workflow automation. Standard AI questionnaire coverage with annual review and contract AI provisions.

Tier 3 — Incidental AI Vendors: Vendors using AI in features peripheral to their primary function (AI-assisted search, auto-fill, background analytics). Disclosure and notification requirements in contracts; quarterly scanning for tier upgrade if AI use expands.

The critical question at onboarding: “In which parts of the service you provide us does AI contribute to outputs or decisions?” Most vendors will answer this accurately if asked directly. Many won’t disclose it if you don’t ask.

The Six Areas of AI-Specific Due Diligence

1. AI Model Inventory and Documentation

Ask vendors to disclose every AI model — including foundation models, fine-tuned layers, and RAG knowledge bases — that contributes to the service they provide you. Request model cards or equivalent documentation for each. Model cards are standardized documents (developed by Google Research and widely adopted in the industry) that describe a model’s intended use, training data, performance characteristics, and known limitations.

If a vendor can’t describe the AI underlying their service at this level, that’s a meaningful signal about their AI governance maturity.

2. Training Data Sourcing and Bias Controls

Training data quality determines model output quality. Key questions:

  • What datasets were used to train or fine-tune the models in your service?
  • How is training data screened for bias, particularly across protected class characteristics relevant to financial services (race, gender, age, national origin)?
  • Are demographic impact analyses performed before deployment updates?
  • How is institution data handled — specifically, is it used to train or fine-tune vendor models?

For vendors processing protected-class-relevant data in credit, employment, or housing contexts, bias testing documentation should be treated as a due diligence deliverable, not just a questionnaire checkbox.

3. Model Performance and Hallucination Risk

For traditional ML models: request accuracy, precision, recall, and AUC metrics, along with out-of-sample performance data. Ask how performance is measured in production and what the vendor’s acceptable performance degradation threshold is before retraining.

For generative AI models: standard accuracy metrics don’t apply. Ask specifically about hallucination rate measurement — the vendor’s process for detecting and measuring confident but factually incorrect outputs. For any GenAI model touching regulatory content, financial figures, or customer-facing communications, hallucination rate should be a defined, tracked metric with contractual floor.

4. Drift Monitoring and Retraining Procedures

Model performance degrades over time as real-world data distributions shift away from training conditions. Ask vendors:

  • How do you monitor for model drift in production?
  • What performance thresholds trigger retraining?
  • How frequently are models retrained or updated?
  • What notification do clients receive before a new model version is deployed?

The last question is the most important. A vendor that deploys model updates without client notification is creating a change management gap: your SR 11-7 obligations require you to assess material model changes, but you can’t assess what you don’t know about.

5. Security and Adversarial Testing

AI-specific security vulnerabilities don’t appear in standard penetration testing. For AI vendors, due diligence should cover:

  • Prompt injection vulnerability testing (for GenAI models)
  • PII extraction testing — can users manipulate the model to expose training data or other users’ data?
  • Jailbreaking resistance — does the model maintain output guardrails under adversarial prompting?
  • Adversarial input robustness — for ML models, can inputs be manipulated to produce incorrect outputs?

Ask whether vendors have completed red teaming exercises and whether results are available for review. Vendors with mature AI governance will have documented red team findings; vendors who haven’t done this work won’t know what “red teaming” means.

6. Incident Response and Notification

AI incidents — model failures, bias discoveries, harmful outputs at scale — require a different incident response posture than standard technology outages. Your contract and due diligence should establish:

  • What triggers a vendor AI incident notification (harmful output thresholds, bias discovery, model failure)?
  • What’s the notification timeline (24 hours? 72 hours?)?
  • What remediation process applies when harmful outputs are discovered post-deployment?
  • How are regulatory obligations handled if a vendor AI incident creates regulatory exposure for your institution?

This last question — regulatory responsibility allocation — is where most vendor contracts are silent. It shouldn’t be.

What Documentation to Require vs. Accept

DocumentationTier 1 (Critical)Tier 2 (Material)
Model cards or equivalentRequireRequest
Bias testing resultsRequire (pre-deployment + current)Request latest
Performance metrics on representative datasetRequireRequest summary
Red team / adversarial testing resultsRequire summaryDisclose if conducted
Audit rights for independent testingContractually requiredRequest notification rights
SOC 2 AI controls coverageRequireRequest
ISO/IEC 42001 compliance or roadmapRequestNot required

“Request” means you ask; “Require” means it goes in the contract as a condition.

Contract Provisions AI Deployments Require

Standard vendor agreements weren’t written for AI. Before deploying a vendor’s AI in a critical function, ensure your contract includes:

Performance floors: Specific, measurable accuracy and error rate thresholds below which the vendor is in breach. Vague language like “commercially reasonable performance” is unenforceable when a model starts hallucinating.

Bias testing obligations: Contractual requirement to test for demographic disparate impact before and after major model updates, with results shared with your institution.

Change notification: Written notification at least 14-30 days before any material AI model update, with your right to request a reassessment period before the new model is used in your workflows.

Incident notification: Defined notification window (typically 24-48 hours) for AI incidents — harmful outputs, bias discoveries, model failures — with a clear definition of what constitutes an incident.

Data handling restrictions: Explicit confirmation that your institution’s data will not be used to train, fine-tune, or improve vendor AI models without separate written consent.

Audit rights: Your right (or a designated third party’s right) to conduct independent testing of the vendor’s AI performance on use-case-specific inputs, on a defined schedule and upon material change.

For more on building a comprehensive AI vendor questionnaire within your broader AI governance program, see our posts on applying SR 11-7 to AI systems and the AI governance program checklist regulators actually test.

Red Flags in Vendor Responses

Watch for:

  • “Our AI is proprietary and we can’t share that information.” A refusal to disclose model documentation for critical AI is not a standard confidentiality position — it’s a vendor whose AI governance doesn’t support your regulatory requirements. This is a deal-breaker for Tier 1 vendors.
  • “We use a third-party AI but we’re not sure which one.” Fourth-party AI risk is real. A vendor that doesn’t know what’s running underneath their own product has no AI governance program.
  • “Our model hasn’t changed significantly.” No quantified performance data, no drift monitoring results. Follow up with specific questions about metrics.
  • Certification claims without documentation. “We’re ISO 42001 compliant” should be backed by a certificate or audit report. “We follow ISO 42001 principles” is meaningless.

So What? Your Immediate Actions

1. Inventory your current vendors for AI use. Ask existing vendors a single screening question: “In what ways does your service incorporate AI or machine learning?” Review responses and retier vendors based on answers.

2. Update your TPRM questionnaire to include the six areas above before your next due diligence cycle. Don’t wait for the annual renewal — add an AI supplement for critical vendors now.

3. Audit your existing contracts for AI-specific provisions. Most will be silent on model change notification and AI incident response. Prioritize contract amendments for Tier 1 vendors at next renewal.

4. Build an AI vendor register tracking: vendor name, AI use case, model tier (1/2/3), last assessment date, documentation received, and contract AI provision status.

Our AI Risk Assessment Template & Guide includes a third-party AI vendor due diligence questionnaire covering all six areas, a vendor risk tiering matrix, and AI contract provision checklist — built for compliance teams managing vendor AI without a dedicated model risk department.

For context on how examiners are approaching AI governance gaps right now, see our post on common regulatory exam findings on AI.

Frequently Asked Questions

Does SR 11-7 apply to AI models we get from third-party vendors?
Yes. SR 11-7 makes clear that model risk management responsibilities apply regardless of whether the model is developed internally or acquired from a vendor. The institution relying on the vendor model remains responsible for understanding the model, validating it appropriately, and ensuring it performs as intended in the institution's specific context. 'Trust the vendor' is not an acceptable SR 11-7 posture.
What's the difference between a standard vendor questionnaire and an AI-specific due diligence questionnaire?
A standard vendor questionnaire covers security controls, business continuity, data handling, and regulatory compliance. An AI-specific questionnaire adds: training data sourcing and bias controls, model explainability capabilities, drift monitoring and retraining procedures, hallucination rates for GenAI models, incident notification triggers for model failures or bias discoveries, and audit rights over model performance. Most standard TPRM questionnaires don't cover any of these.
What is ISO/IEC 42001 and should I ask vendors about it?
ISO/IEC 42001 is the international standard for AI Management Systems, published in 2023. It provides a framework for organizations to govern AI systems responsibly — covering risk management, bias testing, documentation, and monitoring. While not mandatory, asking vendors whether they're compliant with or pursuing certification against 42001 gives you a signal about whether their AI governance is formalized. Expect larger vendors to have this; don't expect it from early-stage software providers.
How does fourth-party AI risk work in the vendor assessment context?
Fourth-party AI risk arises when your vendor uses AI models built or fine-tuned by another provider — for example, a compliance software vendor whose product runs on a foundation model from a major AI lab. Your due diligence questionnaire should ask vendors to disclose the AI models underlying their service, including foundation models, fine-tuning providers, and RAG knowledge base sources. A vendor's answer of 'we use a third-party AI' without further detail is a red flag requiring follow-up.
What contract provisions should I require before deploying a vendor's AI?
At minimum: (1) performance thresholds with specific accuracy and error rate floors, (2) bias testing obligations with reporting requirements for demographic impact, (3) incident notification within a defined window when the AI produces harmful outputs or a model failure is detected, (4) audit rights allowing independent testing of model performance, (5) change notification requirements before new model versions are deployed that could affect institutional use cases, and (6) data handling restrictions confirming institution data won't be used to train the vendor's general model.
How often do AI vendors need to be reassessed compared to traditional technology vendors?
Annual reassessment is the standard for traditional technology vendors. For AI vendors, reassessment triggers should include any material AI model update — not just annual cycles. Many AI vendors push model updates continuously or quarterly; each update has the potential to change bias characteristics, accuracy thresholds, or output behavior. Your vendor management program should require vendors to notify you of material AI changes as a contractual obligation, triggering a targeted reassessment rather than waiting for the annual review.
Rebecca Leung

Rebecca Leung

Rebecca Leung has 8+ years of risk and compliance experience across first and second line roles at commercial banks, asset managers, and fintechs. Former management consultant advising financial institutions on risk strategy. Founder of RiskTemplates.

Related Framework

AI Risk Assessment Template & Guide

Comprehensive AI model governance and risk assessment templates for financial services teams.

Immaterial Findings ✉️

Weekly newsletter

Sharp risk & compliance insights practitioners actually read. Enforcement actions, regulatory shifts, and practical frameworks — no fluff, no filler.

Join practitioners from banks, fintechs, and asset managers. Delivered weekly.