Compliance Monitoring and Testing: How to Build a Risk-Based Program That Survives an Exam
TL;DR
- Compliance monitoring and testing is the pillar of your CMS that examiners interrogate hardest. A schedule that generates no escalations is a finding.
- Risk-based frequency is the standard: high-risk regulations test quarterly or monthly; lower-risk, stable areas can test annually. Document the rationale.
- The OCC and CFPB now both signal risk-tiered supervision — institutions with strong self-identification and self-correction postures see regulators step back; those without face deeper dives.
- Self-identification is an exam maturity signal. If your testing has never caught a compliance issue, the question is whether your testing is substantive.
The CFPB exam team asks for your compliance testing calendar. You produce one — a spreadsheet with a quarterly rotation through seven regulatory areas. Looks comprehensive. But the examiner asks a follow-up: “Walk me through the last finding your testing generated and how it was remediated.” You look at the spreadsheet. There are no findings documented for 14 months.
That absence is the finding. A testing program that runs on schedule and produces nothing raises more concern than one that runs less frequently but generates and resolves real issues. Examiners evaluate compliance monitoring and testing on substance — whether the function is actually detecting risk — not just whether it exists.
Here’s how to build a program that holds up.
Why Testing Is the Pillar That Actually Gets You in Trouble
A compliance management system (CMS) has four pillars: board and management oversight, compliance program (policies, procedures, training), consumer complaint response, and compliance monitoring and testing. The OCC Comptroller’s Handbook on Compliance Management Systems evaluates all four. But testing is where the evidence of whether the CMS is actually functioning lives.
Policies can be written without being followed. Training can be completed without anything being retained. Complaint responses can be templated without root cause analysis. Testing, when done correctly, cuts through all of that. It’s the mechanism that would catch the gap between what your policy says and what’s happening in transactions. When testing produces nothing, it either means every control is working perfectly — or the testing isn’t looking in the right places.
The CFPB Supervision and Examination Manual is explicit: examiners evaluate monitoring and testing for substance, not form. They look at how findings are prioritized, escalated, and resolved. Testing that produces findings no one acts on often weakens examiner confidence more than testing gaps do.
The Four Types of Compliance Testing
A complete testing program uses multiple techniques, each serving a different purpose:
Transaction testing — the structured review of individual consumer transactions or files against a specific regulatory requirement. Sample a set of TILA disclosure packets. Pull 25 mortgage servicing records and trace each through RESPA requirements. Test a cohort of adverse action notices against ECOA’s timing and content rules. This is the most labor-intensive but most defensible form of testing.
Targeted reviews — a focused assessment of a specific control or process rather than individual transactions. Example: review your complaint management workflow against your policy, or trace how a recent regulatory change was incorporated into procedure updates. Useful for new requirements or emerging risk areas.
Ongoing monitoring — automated or periodic reviews that flag potential issues without full transaction samples. Exception reports that surface outliers. Dashboard metrics that signal trends. Complaint volume alerts. This runs continuously and catches drift between structured testing cycles.
Management information and self-assessment — business line reporting, control certifications, and internal audit reliance. These supplement rather than substitute for direct testing. They’re useful for confirming the picture your testing develops.
A mature program runs all four. Smaller programs often default to transaction testing alone — which works but misses the real-time early warning that monitoring provides.
Building a Risk-Based Testing Schedule
Risk-based frequency is the regulatory expectation. High-risk regulations and high-volume consumer-facing products get tested more often. Lower-risk, stable areas can be tested annually. The FDIC Consumer Compliance Examination Manual reinforces that sample sizes and testing frequency should be proportional to the perceived risk of consumer harm.
Here’s a defensible baseline structure:
| Testing Frequency | Risk Profile | Examples |
|---|---|---|
| Monthly | Critical/high volume | UDAAP in consumer-facing products, BSA/AML transaction monitoring, new product launches |
| Quarterly | High risk, active regulatory focus | Fair lending testing, TILA/Reg Z disclosures, ECOA adverse action notices |
| Semi-annually | Moderate risk, established controls | Privacy notice delivery, complaint handling timelines, training completion rates |
| Annually | Lower risk, stable requirements | Reg E error resolution procedures, FCRA dispute handling, older product lines with stable controls |
The frequency is a starting point. Document the risk rationale for each area’s cadence in your testing plan. When an examiner asks why you test a particular product quarterly, the answer shouldn’t be “that’s what the schedule says.” The answer should reference the product’s consumer impact, volume, and the regulatory requirements that apply.
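One way to make that rationale inseparable from the schedule is to store it with each calendar entry. The sketch below is illustrative only: the field names, tiers, and example rationales are assumptions, not anything a regulator prescribes.

```python
from dataclasses import dataclass

@dataclass
class TestingArea:
    name: str
    frequency: str   # e.g. "monthly", "quarterly", "semi-annual", "annual"
    risk_tier: str   # e.g. "critical", "high", "moderate", "low"
    rationale: str   # the "why" an examiner will ask for

calendar = [
    TestingArea(
        name="TILA/Reg Z disclosures",
        frequency="quarterly",
        risk_tier="high",
        rationale="High-volume consumer product with active regulatory "
                  "focus; disclosure errors drive direct consumer harm.",
    ),
    TestingArea(
        name="Privacy notice delivery",
        frequency="semi-annual",
        risk_tier="moderate",
        rationale="Established controls; no recent exceptions or rule changes.",
    ),
]

# A calendar entry with a blank rationale is itself a gap worth flagging.
assert all(area.rationale for area in calendar)
```

The point of the structure is that frequency and rationale travel together: there is no way to add an area to the calendar without answering the "why" up front.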
Transaction Testing: How to Build a Defensible Sample
Transaction testing is where most compliance teams either over-engineer (testing everything uniformly) or under-build (pulling convenience samples with no documented methodology). Neither holds up.
A defensible approach follows this structure:
1. Define the population. What universe of transactions are you testing? Specify the date range, product type, channel, and any other relevant parameters. Document this before you pull the sample.
2. Select the sample methodology. Statistical random sampling is the gold standard. For smaller populations (under 50 transactions), testing the full population may be appropriate. For larger populations, a judgmental sample is acceptable if the selection rationale is documented — but it’s harder to defend than random sampling under examiner scrutiny.
3. Size the sample appropriately. There’s no single required number. The OCC’s guidance frames sample size as proportional to the risk of consumer harm and the need to assess compliance. A practical baseline: start with 25-50 transactions for moderate-risk areas; scale up for high-risk or high-volume areas; consider whether the sample is large enough that a 10% error rate would be statistically meaningful.
4. Define the testing criteria before you look at the files. The regulatory requirement or control being tested must be documented before you open the first transaction. Post-hoc definition of what you’re testing for is a credibility problem if an examiner asks.
5. Document each transaction reviewed and the result. A testing workpaper should show each item reviewed, the regulatory criteria applied, the result (pass/fail), and any exceptions found. The workpaper is the evidence.
How to Document Results That Hold Up
Testing documentation should answer five questions:
- What did you test? (scope, population definition)
- How did you select the sample? (methodology and size rationale)
- What were you testing against? (regulatory requirement or control objective)
- What did you find? (results by item, not just an aggregate)
- What happened next? (escalation path and remediation for any exceptions)
Question 5 is where programs fail. A workpaper that documents an exception but not what happened to it afterward is an incomplete record. The escalation and remediation loop is what tells the story of a functioning compliance program.
When exceptions are found, the next step isn’t just remediation — it’s root cause analysis. Is this an isolated error or a systemic pattern? A single TILA disclosure timing failure might be a one-off. Five failures in three months might be a training gap, a system configuration issue, or a process breakdown. Root cause determines whether you fix one transaction or redesign a control.
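The loop from finding to closure can be modeled as a record that isn’t complete until owner, root cause, remediation, and closure date are all filled in. This is a hedged sketch; the field names and the example finding are illustrative assumptions.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional

@dataclass
class ExceptionRecord:
    item_id: str
    criteria: str
    found: date
    owner: str                        # every exception needs an owner
    root_cause: Optional[str] = None
    remediation: Optional[str] = None
    closed: Optional[date] = None

    def is_complete(self) -> bool:
        # The record is complete only once root cause, remediation,
        # and closure are all documented.
        return all([self.root_cause, self.remediation, self.closed])

exc = ExceptionRecord(
    item_id="LOAN-00417",
    criteria="Closing Disclosure delivered 2 business days before consummation",
    found=date(2026, 3, 14),
    owner="Mortgage Ops QA lead",
)
assert not exc.is_complete()   # finding documented, loop not yet closed

exc.root_cause = "LOS auto-send job skipped loans re-disclosed after rate lock"
exc.remediation = "Job logic fixed; 90-day lookback re-tested"
exc.closed = date(2026, 4, 2)
assert exc.is_complete()       # the record now tells the full story
```

A report of all records where `is_complete()` is false is exactly the open-exceptions list your own policy deadlines should be measured against.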
The Self-Identification Imperative
The 2025 regulatory shifts — the CFPB’s “Humility in Supervision” posture, the OCC’s risk-based exam tailoring — both signal the same thing: institutions that self-identify and self-correct get more regulatory deference. Institutions whose issues are discovered by examiners rather than internal programs face heightened scrutiny and more prescriptive remediation requirements.
Self-identification doesn’t mean finding small errors. It means your testing is substantive enough to find real compliance gaps before the examiner does — and your escalation process actually routes those gaps to leadership who acts on them.
A compliance program that has run for 18 months without escalating a single finding to management is either operating in a miraculously perfect regulatory environment or running testing that isn’t looking hard enough. Examiners know the difference.
Integrating Your Three Lines of Defense
Monitoring and testing shouldn’t sit entirely within the compliance function (Second Line). An effective program leverages all three lines:
First Line (Business Units): Own day-to-day control execution and should be running their own quality assurance. Front-line monitoring catches issues closest to the transaction.
Second Line (Compliance): Runs independent testing of First Line controls, maintains the testing calendar, escalates findings, and reports to management. This is the formal testing function.
Third Line (Internal Audit): Independently tests compliance, including testing whether the Second Line testing program itself is functioning as designed.
The Three Lines of Defense model is the governance architecture. The compliance testing program is what Second Line does to execute its role. If you’re a smaller fintech without a formal Internal Audit function, a periodic external review of your compliance testing program fills the Third Line gap and is worth documenting for examiner conversations.
What the 2025–2026 Regulatory Shifts Actually Mean for Your Program
Three regulatory changes are reshaping what a defensible testing program looks like:
OCC Bulletin 2025-24 (effective January 1, 2026) eliminated mandatory policy-based examination procedures for community banks. OCC examiners now have explicit discretion to tailor scope and frequency based on risk. Practically: high-performing banks with strong self-monitoring may see less direct transaction testing by examiners and more reliance on the bank’s own results. But that reliance requires a testing program examiners actually trust.
CFPB’s 2025 supervision posture — “Humility in Supervision” — narrows the scope of exams. Narrower exams concentrated on fewer areas go deeper in those areas, not lighter. A CFPB exam that focuses specifically on your compliance program goes deep on methodology, documentation, and findings resolution.
FDIC’s 2025 compliance examination schedule update extended examination cycles for lower-risk institutions. Longer gaps between exams don’t reduce your compliance obligation — they increase the period of self-governance between external checkpoints, raising the stakes for internal monitoring.
The direction is consistent: regulators are stepping back from routine examination of well-run programs and concentrating resources on institutions with identified weaknesses. Being a “well-run program” requires demonstrating it through your testing documentation, not just asserting it.
Common Deficiencies — What Examiners Actually Find
These are the failures that appear consistently across CFPB and OCC exam reports and industry benchmarking:
Testing scope doesn’t track regulatory change. A new regulation took effect 18 months ago. It’s not on the testing calendar because the testing calendar hasn’t been updated since the prior exam.
Monitoring exists in policy, not in practice. The CMS policy describes a quarterly monitoring review process. No one has run the review in eight months.
Complaints aren’t feeding into testing. The complaint log shows repeated consumer issues with disclosure timing. The testing program is reviewing disclosure content, not timing. The connection was never made.
Findings die in a workpaper. Exceptions are documented. Nothing in the record shows they were escalated, a root cause was determined, or remediation occurred.
Testing frequency doesn’t match risk. High-volume consumer products with active regulatory attention are tested annually. A minor internal procedure is tested quarterly. The frequency is arbitrary, not risk-based.
Connecting Testing to Your RCSA and Risk Register
Your RCSA (Risk and Control Self-Assessment) should inform your testing priorities. Controls rated as weak or partially effective in the RCSA should appear more frequently in the testing calendar. Controls rated strong should appear less frequently — but shouldn’t disappear entirely.
Closed-loop integration means: RCSA identifies weak controls → testing validates the RCSA assessment → findings feed back into the RCSA as the control environment evolves. A testing program that runs independently of the RCSA misses the risk intelligence both tools generate.
This also applies to your compliance management system more broadly. Testing is one pillar, but its findings should flow into policy updates, training revisions, and board reporting. A siloed testing function that generates workpapers without downstream effect isn’t functioning as designed.
So What?
Building a testing program that survives an exam comes down to five practical decisions:
- Document your testing calendar with risk rationale, not just frequency and scope. The “why” for each testing area is what you defend in an exam.
- Build your sampling methodology before you test, not after. Post-hoc methodology is a credibility problem.
- Create an escalation path with teeth. Every exception needs an owner, a timeline, and a remediation record. No open exceptions should sit unresolved beyond your own policy deadlines.
- Run root cause analysis on everything, even isolated exceptions. “One-off” findings with no documented root cause analysis look like untested hypotheses.
- Treat self-identification as a competitive advantage. Regulators are explicitly differentiating institutions that find their own issues from those that wait for examiners to find them.
The GRC Starter Kit includes a compliance testing calendar template, sample transaction testing workpaper format, and an escalation and remediation tracking log — so your testing program is documented, defensible, and connected from finding to resolution rather than scattered across spreadsheets and email chains.
Related Template
GRC Starter Kit
Everything a new compliance hire needs to build their first risk program — 6 products at 46% off.
Rebecca Leung
Rebecca Leung has 8+ years of risk and compliance experience across first and second line roles at commercial banks, asset managers, and fintechs. Former management consultant advising financial institutions on risk strategy. Founder of RiskTemplates.