Agentic AI Governance: The Compliance Gap Nobody's Talking About
TL;DR
- Agentic AI—systems that plan, decide, and take autonomous actions—creates compliance gaps that SR 11-7, Reg E, and UDAAP weren’t designed to cover
- The CFPB issued an August 2025 ANPRM specifically asking: who counts as a “representative” acting on a consumer’s behalf in an agentic context?
- SR 11-7’s definition of a “model” is too narrow for systems that don’t produce estimates—they execute actions
- Compliance programs need four components today: pre-authorization constraints, continuous validation protocols, decision trace logs, and UDAAP review for agent-generated communications
Your AI is making decisions right now. Calling APIs, generating customer emails, adjusting pricing parameters, flagging accounts for review. Maybe approving things. Maybe communicating things to customers that your compliance team hasn’t seen.
And if you’re being honest—your compliance program wasn’t built for this.
The gap isn’t about awareness. Risk managers know agentic AI is coming; many have been tracking it for months. The gap is structural. SR 11-7, Regulation E, UDAAP, and your current model governance framework all share a foundational assumption that turns out to be wrong for agentic systems: that a human stands between the AI’s output and the real-world consequence.
With traditional AI, a model scores a loan. A human approves it. With agentic AI, the model scores, decides, and acts—often before any human can intervene.
That difference changes everything about how compliance needs to work.
What “Agentic AI” Actually Means for Compliance
Before getting into the regulatory gaps, it’s worth being precise. “Agentic AI” isn’t just AI with more capabilities. It’s a fundamentally different operational model.
A conventional AI system produces an output. A human acts on that output. A model risk manager validates the model’s logic. An examiner reviews the output population. The feedback loop involves humans at every consequential step.
An agentic AI system is orchestrated differently:
| Characteristic | Traditional Model | Generative AI (LLM) | Agentic AI |
|---|---|---|---|
| Output type | Prediction or score | Text or content | Decision + action |
| Human role | Acts on output | Reviews output | Often bypassed entirely |
| Tool access | None | Limited | APIs, databases, email, payment rails |
| State | Stateless | Session-scoped | Persistent memory across sessions |
| Failure mode | Wrong prediction | Hallucination | Unauthorized or harmful action at scale |
| Validation window | Scheduled review | Scheduled review | Continuous; behavior changes between reviews |
Gartner projected that 40% of financial services firms would be deploying AI agents by end of 2026. The Consumer Bankers Association’s January 2026 agentic AI white paper documented over a dozen use cases already in production across retail banking, payments, and commercial lending—including agents that send customer notifications, adjust exposure limits, and initiate collections workflows.
None of those use cases fit neatly into your current governance model.
The SR 11-7 Problem: When Validation Assumptions Break
SR 11-7—the Federal Reserve’s 2011 guidance on model risk management, jointly issued with the OCC as OCC Bulletin 2011-12—defines a model as “a quantitative method, system, or approach that applies statistical, economic, financial, or mathematical theories, techniques, and assumptions to process input data into quantitative estimates.”
That definition assumes:
- The model produces an estimate (not an action)
- The system is relatively static (stable parameters between reviews)
- Decision paths are reconstructible (validation can determine how an output was reached)
Agentic AI breaks all three assumptions.
As the GARP analysis on SR 11-7 and agentic AI laid out clearly, “material changes in behavior can occur without a formal redevelopment event”—meaning an agentic system might behave differently after weeks of interaction even though no one changed the model. Traditional periodic validation cycles miss this entirely.
The three most consequential gaps:
Gap 1: Dynamic Validation
SR 11-7’s validation framework assumes you can validate a model and then apply that validation for a review cycle. Conceptual soundness assessments, outcomes analysis, and benchmarking are all designed for systems that stay reasonably stable between reviews.
Agentic systems don’t. They adapt, learn from interactions, and may develop behavioral drift that has nothing to do with a formal model change. An agent that handles collections calls might shift toward more aggressive language patterns after months of reinforcement—without any model update triggering a revalidation event.
What this means in practice: validation for agentic systems needs to be continuous, not periodic. You need monitoring that detects behavioral drift in real time, not an annual validation cycle that catches it 11 months late.
Gap 2: Third-Party Concentration Risk
Most agentic AI deployments don’t use models built in-house. They use foundation models from OpenAI, Anthropic, Google, or Microsoft as the underlying reasoning engine, then layer proprietary business logic on top.
This creates a concentration risk that SR 11-7’s third-party model validation requirements weren’t designed to handle. When the underlying model provider updates their model—which happens continuously for cloud-hosted systems—your agent’s behavior can change without your knowledge and without triggering any internal review process.
The OCC Bulletin 2013-29 TPRM framework applies, but it wasn’t built for a scenario where the “vendor” updates the product in real time. Your vendor risk assessment needs a new section.
Gap 3: Explainability Standards
SR 11-7 “emphasizes transparency sufficient to enable effective challenge”—but provides no specific standard for what explainability looks like for a system that chains multiple decisions in a multi-step reasoning process.
If an agent makes a credit exposure decision based on six sequential reasoning steps, none of which are individually recordable in your current audit log, how do you explain that to an examiner? What does “adequate explainability” mean for a system that reasons differently every time?
Regulators haven’t answered this yet. That doesn’t mean you can wait.
Reg E and the Authorization Void
Here’s the compliance gap that could produce the largest consumer protection liability exposure.
Regulation E, which implements the Electronic Fund Transfer Act (EFTA), protects consumers in electronic fund transfers. The dispute framework assumes a consumer authorized a specific transaction. When something goes wrong, the consumer disputes it, and the financial institution investigates whether the transaction was authorized.
What happens when an AI agent initiates a transaction on the consumer’s behalf?
This is the exact question the CFPB is wrestling with. In August 2025, the bureau issued an Advance Notice of Proposed Rulemaking on personal financial data rights, specifically seeking comment on who can serve as a “representative” operating on a consumer’s behalf. The Center for Data Innovation’s March 2026 analysis pointed out the core problem: Reg E mentions authorization via “card, code, or other means” but “provides no framework for disputes when agents malfunction—such as ordering incorrect items or failing to recognize artificially inflated prices that humans would catch.”
The practical problem for compliance: if a consumer authorizes an AI agent to manage their finances, and the agent initiates a transfer the consumer later disputes—is that an authorized transaction? Under current Reg E, the answer is genuinely unclear.
This matters for any financial institution building:
- Automated bill payment agents
- Personal financial management tools that move money
- AI-driven savings or investment rebalancing
- Any consumer-facing agent that touches payment rails
The CFPB hasn’t finished the rulemaking. But enforcement won’t wait for it. Build your authorization frameworks now.
What the authorization framework needs:
- Explicit consumer consent to agent authority, scoped to specific action types
- Spending limits and action category limits baked into the agent’s permission model
- Human override availability for any transaction above a defined threshold
- Transaction logs that clearly show agent-initiated vs. consumer-initiated activity
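As a rough illustration, the consent-scoping and limit checks above could be encoded as a simple permission model that gates every proposed agent action. This is a minimal sketch under assumed requirements; the class, action names, and dollar thresholds are all hypothetical, not regulatory standards.

```python
from dataclasses import dataclass

# Hypothetical permission model for a consumer-facing payment agent.
# Action types and limits are illustrative only.

@dataclass
class AgentAuthorization:
    consumer_id: str
    permitted_actions: set          # e.g. {"bill_pay"} -- explicit consent scope
    per_transaction_limit: float    # above this, require human override
    daily_limit: float              # aggregate cap across the day

    def check(self, action: str, amount: float, spent_today: float) -> str:
        """Return 'allow', 'escalate', or 'deny' for a proposed agent action."""
        if action not in self.permitted_actions:
            return "deny"                     # outside consented scope
        if amount > self.per_transaction_limit:
            return "escalate"                 # human override required
        if spent_today + amount > self.daily_limit:
            return "escalate"                 # would exceed daily cap
        return "allow"

auth = AgentAuthorization(
    consumer_id="C-1001",
    permitted_actions={"bill_pay"},
    per_transaction_limit=500.0,
    daily_limit=1500.0,
)
print(auth.check("bill_pay", 120.0, 0.0))          # allow
print(auth.check("transfer_external", 50.0, 0.0))  # deny: never consented
print(auth.check("bill_pay", 900.0, 0.0))          # escalate: above threshold
```

The key design point: "deny" and "escalate" are different outcomes, so agent-initiated activity above a threshold routes to a human rather than silently failing, and the decision itself becomes loggable evidence of scoped consent.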
UDAAP Risk: When Your Agent Speaks
The CFPB has been clear that UDAAP applies to AI-generated consumer communications. “There are no exceptions to the federal consumer financial protection laws for new technologies.”
That statement should focus attention on a specific agentic AI risk that most compliance programs have completely missed: the agent is talking to your customers, and compliance hasn’t reviewed what it’s saying.
Traditional consumer communications go through a review workflow. A compliance officer reviews the draft, flags potentially deceptive language, approves the final version. That workflow assumes there’s a fixed document to review.
Agentic AI generates bespoke communications dynamically. Each message is different—different tone, different framing, different emphasis—based on the interaction context. You cannot review agentic communications the same way you review a batch-generated notice.
The UDAAP risk is real and specific:
- An agent that handles fee complaints might describe fee structures in ways that are technically accurate but misleading in context
- An agent that manages collections might use language calibrated for engagement that ends up being aggressive in ways that would fail UDAAP review
- An agent that discusses account options might emphasize certain products in ways that could constitute deceptive marketing
The compliance question isn’t just “what did the agent say”—it’s “what does the agent say across 10,000 conversations, and does the aggregate pattern constitute a deceptive practice?”
What the UDAAP review process for agentic AI needs:
- Systematic sampling of agent-generated communications (not just reviewing templates)
- Behavioral monitoring for language drift and escalation patterns
- Consumer complaint analysis for agent-specific issues
- Pre-deployment testing scenarios for deception and manipulation vectors
What Your Compliance Program Is Actually Missing
Most financial institutions deploying agentic AI have some version of the following:
- An AI use case inventory that includes their agents
- A vendor due diligence questionnaire for the foundation model provider
- Some version of “human-in-the-loop” for high-stakes decisions
That’s a start. It’s not a governance framework.
Here’s the specific list of what’s typically missing:
| Gap | What’s Missing | Why It Matters |
|---|---|---|
| Pre-authorization scope definition | No documented list of what the agent can and cannot do before deployment | Creates unlimited authority for agent errors |
| Continuous validation protocol | Periodic review cycle only | Misses behavioral drift and emergent behavior |
| Agent action audit trail | No structured log of what the agent did and why | Cannot satisfy examiner documentation requests |
| Reg E authorization mapping | Vague “consumer consent” without transaction-type scoping | Dispute liability exposure |
| UDAAP monitoring for agent communications | Template review only, no dynamic content monitoring | Pattern-level deception risk |
| Circuit breaker definitions | No defined conditions that halt agent operations | Cascade failure risk |
| Third-party model change management | No protocol for vendor model updates | Undetected behavioral drift from upstream changes |
The Treasury’s FS AI RMF (February 2026) introduced 230 control objectives across seven domains. It’s the most comprehensive sector-specific framework available. But it was built around the same model governance assumptions as SR 11-7—it doesn’t specifically address the unique compliance obligations of systems that take autonomous action.
Building the Agentic Governance Stack: What to Do in the Next 90 Days
You don’t have to wait for regulatory clarity to start building. The controls below apply regardless of what specific rules ultimately emerge, because they’re grounded in basic compliance hygiene—authorization, transparency, accountability, and monitoring.
Days 1–30: Scope and Constrain
Agent inventory. Add a new field to your AI use case inventory: does this system take direct actions (versus producing outputs for human review)? For every agentic system, document the specific actions it can take.
Pre-authorization frameworks. For each agent, define: action categories permitted, spending/transaction limits, prohibited actions, and escalation triggers. This is your agent’s “permission model”—it should be documented before the agent goes live and reviewed whenever the agent’s capabilities expand.
Circuit breakers. Define the conditions under which an agent stops operating: error rate thresholds, unusual transaction volume, consumer complaint spikes, security alerts. Document who can reset the circuit breaker and what review is required before reinstatement.
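A minimal sketch of how circuit-breaker conditions like these might be wired up. The metric names and thresholds are hypothetical; the point is that any threshold breach halts the agent until a documented human review resets it.

```python
# Hypothetical circuit breaker for an agent: trips when any monitored
# metric crosses its documented threshold; only a human review resets it.

class AgentCircuitBreaker:
    def __init__(self, thresholds):
        self.thresholds = thresholds  # metric name -> max tolerated value
        self.tripped = False
        self.trip_reason = None

    def record(self, metric: str, value: float) -> bool:
        """Feed a monitoring metric; returns True if the breaker is tripped."""
        limit = self.thresholds.get(metric)
        if limit is not None and value > limit and not self.tripped:
            self.tripped = True
            self.trip_reason = f"{metric}={value} exceeded limit {limit}"
        return self.tripped

    def reset(self, reviewer: str, review_notes: str):
        """Reinstatement requires a named reviewer and documented rationale."""
        # In production: persist reviewer, notes, and timestamp to the audit log.
        self.tripped = False
        self.trip_reason = None

breaker = AgentCircuitBreaker({
    "error_rate": 0.05,           # >5% action error rate
    "hourly_txn_volume": 1000,    # unusual transaction volume
    "complaints_per_day": 20,     # consumer complaint spike
})
breaker.record("error_rate", 0.02)  # within tolerance, agent keeps running
breaker.record("error_rate", 0.09)  # trips the breaker
print(breaker.tripped, "-", breaker.trip_reason)
```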
Days 31–60: Instrument and Monitor
Action audit logs. Work with your engineering team to ensure every agent action is logged with: timestamp, action type, input context, output action, and confidence/reasoning trace where available. This doesn’t need to be perfect—it needs to be sufficient to reconstruct what happened in a dispute or examination.
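One way to picture the log entry described above, as a structured record per agent action. Field names here are illustrative, not a schema requirement; what matters is that each record carries enough context to reconstruct the action in a dispute or examination.

```python
import json
from datetime import datetime, timezone

# Hypothetical structured audit record for a single agent action.
# Field names are illustrative assumptions, not a required schema.

def log_agent_action(action_type, input_context, output_action,
                     reasoning_trace=None, initiated_by="agent"):
    entry = {
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "initiated_by": initiated_by,        # agent- vs consumer-initiated
        "action_type": action_type,          # e.g. "adjust_exposure_limit"
        "input_context": input_context,      # what the agent saw
        "output_action": output_action,      # what the agent did
        "reasoning_trace": reasoning_trace,  # reasoning trace, where available
    }
    return json.dumps(entry)  # in production: append to an immutable log store

record = log_agent_action(
    action_type="adjust_exposure_limit",
    input_context={"account": "A-778", "trigger": "utilization_alert"},
    output_action={"new_limit": 25000},
    reasoning_trace="utilization above 90% for 3 cycles",
)
print(record)
```

Note the `initiated_by` field: it is what lets you satisfy the Reg E requirement, discussed earlier, of clearly distinguishing agent-initiated from consumer-initiated activity.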
Behavioral monitoring. Implement monitoring for: agent communication tone and language drift, transaction volume anomalies, customer complaint patterns tied to agent interactions, and error rates by action type. Set amber/red thresholds and document who receives alerts.
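The amber/red tiering might look something like this in code, using a hypothetical monitored rate (say, the share of collections messages a classifier flags for aggressive tone). The thresholds are placeholders to be set by your own risk appetite, not recommended values.

```python
# Hypothetical amber/red alerting on a monitored agent metric.
# Thresholds are illustrative placeholders, not recommended values.

def alert_level(flagged_rate: float, amber: float = 0.02, red: float = 0.05) -> str:
    """Map a monitored rate to the alert tier routed to the documented owners."""
    if flagged_rate >= red:
        return "red"     # halt-and-review territory
    if flagged_rate >= amber:
        return "amber"   # notify model owner and compliance
    return "green"

# A drifting tone-flag rate across four weeks of agent conversations:
weekly_rates = [0.010, 0.015, 0.028, 0.061]
levels = [alert_level(r) for r in weekly_rates]
print(levels)  # ['green', 'green', 'amber', 'red']
```

This is exactly the kind of gradual drift that an annual validation cycle misses: no single week looks like a model change, but the trend crosses red well before the next scheduled review.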
Consumer authorization mapping. Review your consumer authorization disclosures for any product that uses an agentic system. Are consumers explicitly authorizing the types of actions the agent can take? Is the scope of agent authority clearly disclosed? Fix gaps before deployment or as part of your next disclosure update cycle.
Days 61–90: Test and Validate
Adversarial testing. Before deploying any consumer-facing agent—and as part of periodic revalidation—test specifically for: unauthorized action scope expansion, deceptive communication patterns, response to edge cases and manipulation attempts, and behavior under error conditions.
UDAAP sampling. Pull a stratified sample of agent-generated communications. Route them through your existing UDAAP review process. Identify patterns rather than individual messages. Build this into your ongoing monitoring cadence.
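A sketch of what "stratified" means in practice: sample within each conversation type so low-volume, high-risk categories (collections, fee disputes) are represented instead of being drowned out by routine traffic. The data shape and stratum sizes are hypothetical.

```python
import random
from collections import defaultdict

# Hypothetical stratified sampler for agent-generated conversations,
# so each conversation type is represented in the UDAAP review pull.

def stratified_sample(conversations, per_stratum=3, seed=42):
    by_type = defaultdict(list)
    for convo in conversations:
        by_type[convo["type"]].append(convo)
    rng = random.Random(seed)  # fixed seed so the pull is reproducible for audit
    sample = []
    for ctype, items in by_type.items():
        k = min(per_stratum, len(items))
        sample.extend(rng.sample(items, k))
    return sample

# 100 routine fee complaints would swamp 8 collections conversations
# in a simple random sample; stratification pulls 3 from each.
conversations = (
    [{"id": i, "type": "fee_complaint"} for i in range(100)]
    + [{"id": i, "type": "collections"} for i in range(8)]
)
sample = stratified_sample(conversations, per_stratum=3)
print(len(sample))  # 6: three from each stratum
```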
Third-party change management. Establish a protocol for foundation model vendor updates: how are you notified? What testing is triggered before resuming production use? What behavioral benchmarks establish the baseline?
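The behavioral-benchmark comparison could be as simple as the check below: re-run a fixed benchmark suite after any vendor model update and compare against the stored baseline before resuming production use. Metric names and the tolerance value are hypothetical assumptions.

```python
# Hypothetical post-update regression gate: hold rollout unless every
# benchmark metric stays within tolerance of the stored baseline.

def passes_baseline(baseline: dict, current: dict, tolerance: float = 0.02) -> bool:
    """True only if every baseline metric is matched within tolerance."""
    return all(
        abs(current.get(metric, 0.0) - value) <= tolerance
        for metric, value in baseline.items()
    )

baseline = {"scope_adherence": 0.99, "tone_compliance": 0.97}
after_update = {"scope_adherence": 0.98, "tone_compliance": 0.91}  # tone regressed
print(passes_baseline(baseline, after_update))  # False: hold rollout, investigate
```

The gate answers the protocol's third question directly: the stored baseline is the behavioral benchmark, and a failed comparison is the documented trigger for testing before production use resumes.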
So What?
The compliance gap in agentic AI isn’t a theoretical future problem. Agents are in production at financial institutions today. The CFPB is actively working on regulatory frameworks that will fill the current voids. When those frameworks arrive, they will be enforced—and enforcement will look backward at the control environments institutions had in place.
The question isn’t whether agentic AI governance will become a regulatory priority. The GAO’s 2025 report on AI oversight in financial services (GAO-25-107197) made the regulatory trajectory clear: more oversight is coming, specifically for autonomous systems. The question is whether your compliance program has documented evidence that you took this seriously before the examiner arrived.
The practitioners who build these frameworks now—before the rules are finalized—are the ones who have documented evidence of thoughtful governance when examiners show up asking about the agent that sent 50,000 collection notices last quarter.
The practitioners who wait for the final rules are the ones who have “we’re working on it” as their answer.
Build the AI governance program checklist now, before your agentic systems are in scope for a regulatory exam. See also our guide on applying SR 11-7 to AI systems and the broader agentic AI risk management framework for governance controls.
If you’re building or expanding an AI governance program, the AI Risk Assessment Template includes a model inventory with agentic AI fields, pre-deployment checklist, and third-party vendor questionnaire designed for the current regulatory environment.
Rebecca Leung
Rebecca Leung has 8+ years of risk and compliance experience across first and second line roles at commercial banks, asset managers, and fintechs. Former management consultant advising financial institutions on risk strategy. Founder of RiskTemplates.