Control Testing Techniques: Sampling, Walkthroughs, and Evidence Collection That Holds Up
TL;DR
- Control testing is how you prove your compliance program works — not just that you have policies. Regulators and auditors want evidence, not assertions.
- The three primary techniques — walkthroughs, sampling tests, and reperformance — serve different purposes. Walkthroughs test design; sampling tests operating effectiveness; reperformance provides the strongest evidence that a control actually works.
- PCAOB AS 2315, updated for fiscal years beginning after December 15, 2025, codifies sampling requirements for public company auditors. Internal compliance teams can use the same framework to build defensible sample sizes and documentation.
- The most common control testing failures aren’t bad controls — they’re bad workpapers: undocumented sample rationale, weak evidence, conclusions that don’t follow from the testing performed.
The examiner sits down, opens your compliance program documentation, and asks for the last round of control testing workpapers. You hand over a spreadsheet with a list of controls, a column marked “Tested,” and dates. The examiner asks: “What did you test? What did you look at? What did you find?”
If the answer is “we looked at the process and confirmed it was working,” you don’t have control testing. You have control affirmation — and there’s a meaningful difference.
Real control testing produces evidence. It names the specific population being tested, documents how samples were selected, records what the tester physically reviewed or reperformed, and states a conclusion that follows from the evidence. It’s the kind of workpaper a skeptical second reviewer — or examiner — can reconstruct independently.
This is the methodology guide. Walkthroughs, sampling, reperformance, evidence collection — what each one is for, how to execute it, and what the documentation needs to look like.
Why Control Testing Is Different from Monitoring
Before getting into technique, it’s worth being precise about what you’re doing and why. Control testing gets conflated with compliance monitoring. They’re related but distinct.
Monitoring is ongoing — automated system alerts, daily or weekly reports, exception dashboards that flag when something is outside tolerance. Monitoring tells you when something might be wrong.
Testing is periodic and structured — a deliberate exercise that evaluates whether a specific control operated effectively over a defined period. Testing confirms that monitoring is actually working, and it covers controls that monitoring doesn’t catch (manual review steps, judgment-based approvals, training completions, vendor certifications).
The FDIC’s Consumer Compliance Examination Manual draws this distinction explicitly: the compliance management system must include both monitoring and audit functions, and examiners evaluate each separately. Either a monitoring program that produces reports nobody acts on or an audit function that tests controls without generating findings is a CMS deficiency.
The Three Core Testing Techniques
1. Walkthroughs
A walkthrough traces a single transaction or process from beginning to end — origination, processing, handoffs, controls applied at each step, and final output. The tester observes, inquires, and inspects documents at every stage rather than testing a sample.
What walkthroughs test: Design effectiveness. Does the control, as it actually operates, address the risk it’s supposed to address? Is the process consistent with what the procedures describe?
When to use walkthroughs:
- First-time testing of a control you haven’t evaluated before
- After a significant process change or procedure update
- When substantive testing results suggest something is wrong but you can’t identify the source
- As preparation for a larger sample test (understanding the process before sampling)
What a walkthrough isn’t: Walkthroughs don’t test operating effectiveness at scale. One transaction tells you the process can work — it doesn’t tell you it works consistently across all 3,000 transactions in the population. That’s what sampling does.
Documentation requirements: The walkthrough should produce a documented process map or narrative, a list of the evidence reviewed at each step (with specific document references — not just “reviewed a file”), inquiries made and responses received, and a conclusion about design adequacy with specific observations about any gaps between the documented procedure and observed practice.
2. Attribute Sampling (Tests of Controls)
Attribute sampling is the standard technique for testing operating effectiveness — the control worked or it didn’t, for each sample item. The result is a rate of deviation: out of 60 sampled transactions, 3 showed a control failure — a 5% deviation rate.
Statistical vs. nonstatistical sampling: Both are legitimate under PCAOB AS 2315 and for internal compliance purposes. Statistical sampling uses probability theory to mathematically relate sample results to the full population; nonstatistical sampling uses professional judgment. Either way, the auditor must use judgment in planning, performing, and evaluating the sample.
Key parameters for setting sample size:
| Parameter | What It Means | How It Affects Sample Size |
|---|---|---|
| Tolerable deviation rate | The maximum rate of control failure you’d accept and still conclude the control is effective | Lower tolerance → larger sample |
| Expected population deviation rate | Your estimate of how often the control actually fails in the population | Higher expected deviation → larger sample |
| Confidence level / Allowable risk | How confident you need to be in the conclusion | Higher confidence → larger sample |
| Population size | Total items subject to the control in the period | Matters less than you think for large populations |
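The relationship between these parameters and sample size can be made concrete. The sketch below derives a statistical attribute sample size from the tolerable rate, the allowable risk (one minus the confidence level), and the number of deviations you expect to find; the function and parameter names are illustrative, not taken from AS 2315.

```python
import math

def binom_cdf(k: int, n: int, p: float) -> float:
    """P(X <= k) for a Binomial(n, p) variable."""
    return sum(math.comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k + 1))

def attribute_sample_size(tolerable_rate: float, risk: float,
                          expected_deviations: int = 0) -> int:
    """Smallest n such that, if the true deviation rate equaled the
    tolerable rate, the chance of observing no more than the expected
    number of deviations is at most `risk` (risk = 1 - confidence)."""
    n = expected_deviations + 1
    while binom_cdf(expected_deviations, n, tolerable_rate) > risk:
        n += 1
    return n

# 95% confidence, 5% tolerable deviation rate, zero expected deviations
print(attribute_sample_size(0.05, 0.05))      # 59
# Expecting one deviation pushes the sample size up
print(attribute_sample_size(0.05, 0.05, 1))   # 93
# Relaxing the tolerable rate to 10% shrinks it
print(attribute_sample_size(0.10, 0.05))      # 29
```

The outputs track the table: tighter tolerance and more expected deviations both drive the sample size up, and population size doesn’t appear at all — which is why it matters less than you’d think for large populations.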
Practical nonstatistical sample sizes (commonly applied in internal compliance programs):
| Control Frequency | Typical Sample Size |
|---|---|
| Annual (once per year) | 1–3 items |
| Quarterly (4x per year) | 2–4 items |
| Monthly (12x per year) | 3–6 items |
| Weekly | 8–15 items |
| Daily | 20–40 items |
| Multiple times daily / transaction-level | 40–60+ items |
These are heuristics, not mandates. Document your rationale. If you select 25 items for a daily control, explain why: prior-year results showed no deviations, the control is automated with system-enforced logic, and the risk is moderate. If someone can’t reconstruct your sample size logic from the workpaper, you have a documentation gap.
Sample selection methods:
- Random selection: Every item in the population has an equal chance of selection. Requires a complete population listing and random number generator or tool. Most defensible for large populations.
- Systematic selection: Select every nth item from a sequential population (e.g., every 20th transaction). Defensible if the population isn’t organized in a pattern that creates bias.
- Haphazard selection: Manual selection intended to approximate randomness without statistical precision. Acceptable for nonstatistical sampling but document how items were chosen and why it approximates an unbiased draw.
- Judgment selection (targeted): Select items that represent specific characteristics — high-value transactions, new customer types, specific product lines. Not appropriate for reaching conclusions about the broader population; use for targeted risk testing.
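The first two methods are easy to make reproducible in code. A minimal sketch, assuming the population is a list of sequential transaction IDs; the fixed seed is a deliberate choice so a second reviewer can regenerate exactly the items that were drawn.

```python
import random

def random_selection(population: list, n: int, seed: int = 2026) -> list:
    """Equal-probability draw without replacement; record the seed in
    the workpaper so the sample can be reconstructed independently."""
    rng = random.Random(seed)
    return sorted(rng.sample(population, n))

def systematic_selection(population: list, n: int, seed: int = 2026) -> list:
    """Every nth item from a random starting point; only defensible if
    the population isn't ordered in a pattern that creates bias."""
    interval = len(population) // n
    start = random.Random(seed).randrange(interval)
    return [population[start + i * interval] for i in range(n)]

transactions = list(range(1, 3001))  # hypothetical population of 3,000 IDs
print(random_selection(transactions, 25))
print(systematic_selection(transactions, 25))  # 25 items, spaced 120 apart
```

Either output, pasted into the workpaper alongside the seed and method, gives a reviewer everything needed to reconstruct the selection — which is the documentation standard the rest of this guide keeps returning to.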
3. Reperformance
Reperformance is exactly what it sounds like: the tester independently executes the same steps the control operator would have performed and compares the outcome. The question isn’t “does this approval form exist?” — it’s “if I perform the same calculation or review the same data the approver was supposed to review, do I reach the same conclusion?”
Reperformance is the strongest evidence type in the hierarchy:
- Reperformance — tester independently executes the control
- Inspection — tester examines documents, records, or physical items
- Observation — tester watches the control being performed
- Inquiry — tester asks about the control; receives verbal or written responses
Inquiry alone is not sufficient evidence. Multiple enforcement actions and audit finding letters cite exactly this gap: institutions that “tested” a control by asking the control owner if they had performed it. That’s not testing. That’s interviewing.
A practical combination: walk one transaction through reperformance, then inspect documentation for the remaining sample. Reperformance validates that the control produces the right output; inspection of documentation validates that it was consistently applied.
Evidence Collection: What “Holds Up” Means
Evidence holds up when a different person — a second-line tester, an internal auditor, an external examiner — can look at it and reach the same conclusion you did. That requires specificity.
What weak evidence looks like:
- “Reviewed approval documentation — no issues noted”
- “Confirmed with manager that monthly reports are distributed”
- “Tested 10 transactions — all compliant”
What strong evidence looks like:
- “Inspected approval email chain for Transaction #2847-Q (dated 04/15/26), confirming dual authorization by [Role A] at 14:23 and [Role B] at 14:51, prior to funds release at 15:02. Screenshot retained in workpaper Tab 3.”
- “Reperformed the OFAC screening for 15 of the 40 sampled customer records against the SDN List as of [date]. All 15 produced no matches; 2 produced false positives that were appropriately cleared in the case management system with documented rationale (workpaper Tab 5).”
- “Tested 40 transactions. 38 passed — approval timestamp preceded processing timestamp. 2 exceptions: Transaction #1042 and #1119 — approval recorded same minute as processing, unable to confirm sequencing. Classified as deviation.”
The difference isn’t length. It’s specificity: what exactly was tested, which documents were reviewed, what the tester actually looked at, and what was found — including exceptions.
Building the Testing Workpaper
A complete control testing workpaper includes:
- Control description: What the control is supposed to do, referencing the documented procedure
- Risk addressed: What failure would look like if the control didn’t operate
- Population: The complete set of items subject to the control (total count, date range, source)
- Sample selection: How items were selected, by whom, using what tool or method
- Sample listing: Enough information to reconstruct which specific items were tested
- Testing procedure: Step-by-step description of what was done — not “reviewed” but what specifically was inspected or reperformed
- Results by sample item: Pass/fail for each, with exception notation and description
- Exceptions: For each deviation, describe the nature, likely cause if determinable, and materiality judgment
- Conclusion: Overall operating effectiveness conclusion, whether any exceptions exceed the tolerable deviation rate, and recommended action
- Preparer and reviewer: Name, date, and sign-off
If your workpapers don’t include all of these, examiners will note the gap. The FDIC’s examination procedures specifically call out reviewing internal audit plans, reports, and finding remediation — which means they’ll be reading your workpapers, not just your audit report.
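The field list above maps naturally onto a structured record, which makes completeness checkable rather than aspirational. A minimal sketch — the class and field names are illustrative, not a prescribed schema:

```python
from dataclasses import dataclass, field

@dataclass
class SampleResult:
    item_id: str
    passed: bool
    exception_note: str = ""   # required whenever passed is False

@dataclass
class TestingWorkpaper:
    control_description: str   # what the control is supposed to do
    risk_addressed: str        # what failure looks like without it
    population: str            # total count, date range, source
    sample_selection: str      # method, tool, and who selected
    testing_procedure: str     # what was inspected or reperformed
    tolerable_rate: float      # set before testing begins
    results: list[SampleResult] = field(default_factory=list)
    conclusion: str = ""
    preparer: str = ""
    reviewer: str = ""

    def deviation_rate(self) -> float:
        return sum(not r.passed for r in self.results) / len(self.results)

    def exceptions(self) -> list[SampleResult]:
        return [r for r in self.results if not r.passed]
```

Capturing the tolerable rate as a field before any results are entered makes it harder to back into a convenient threshold after the testing is done.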
Common Findings and How to Avoid Them
The same deficiencies appear in control testing programs across industries and institution types. Here’s what examiners and internal audit chiefs see most often:
1. Sample rationale not documented. The sample size is defensible; the reasoning isn’t recorded. Fix: one sentence in every workpaper explaining sample size determination.
2. Evidence post-dates the testing period. You find an email chain or system log that was created after the testing date. This looks like retroactive documentation — and sometimes is. Fix: pull evidence contemporaneously and note the document date.
3. Conclusions disconnect from results. The workpaper documents 3 exceptions out of 25 samples (12% deviation rate) and concludes “control is effective.” That’s only defensible if your tolerable deviation rate is above 12% — which you need to state explicitly. Fix: define tolerable deviation rate before testing begins, and link your conclusion to whether you exceeded it.
4. Same finding, multiple cycles. A control deficiency is noted in Q1, “remediation in progress” is noted in Q2, the same deficiency is noted in Q3. This is one of the strongest exam signals that your CMS has a systemic gap. Fix: track exceptions through to verified remediation before closing them.
5. Testing design effectiveness only. Walkthroughs confirm the control process exists; nobody tests whether it operates consistently across the population. Fix: walkthroughs and sample testing serve different purposes — both are needed.
Integration with Your Broader Compliance Program
Control testing is one leg of a compliance program structure. It connects upward to your compliance risk assessment — risk prioritization should drive testing frequency and sample size, so high-risk areas get tested more often with larger samples. It connects laterally to your compliance monitoring and testing plan, which documents the testing universe, schedule, and methodology at the program level.
It also connects to your RCSA process. The RCSA identifies control gaps and rates residual risk; control testing validates whether those assessments are accurate. A control the business owner rates as “strong” that testing reveals has a 15% deviation rate is a material discrepancy — and it should flow back to the RCSA as a finding that changes the residual risk rating.
For SOC 2 specifically, control testing follows a similar structure but maps to the Trust Service Criteria rather than regulatory requirements. The AICPA’s guidance on audit procedures is consistent with the techniques described here — the same attribute sampling logic and evidence standards apply.
The PCAOB’s updated AS 2315 — effective for fiscal years beginning after December 15, 2025 — reinforces the same fundamentals for public company internal control audits. Internal compliance teams aren’t subject to AS 2315 directly, but the framework is the right model: documented sampling rationale, clear parameters, and conclusions that follow from the evidence.
So What?
If your control testing produces workpapers that say “tested — no issues,” you’re generating paperwork, not compliance evidence. When an examiner or internal auditor questions a control failure next quarter, you’ll have nothing to show them except a conclusion that was never justified.
The fix isn’t complicated, just disciplined. Before the next testing cycle:
- Define your population for each control. Not “all transactions” — the specific date range, system, product type, or subset that’s subject to this control.
- Set your tolerable deviation rate. For most compliance controls, 5–10% is the typical threshold. Write it in the workpaper before you start.
- Choose your sample size and document the rationale. Use the frequency table above as a starting point; adjust for risk.
- Record specific evidence. Document IDs, dates, document names, what was inspected, what was found.
- State your conclusion explicitly. “X of Y items passed. The observed deviation rate of Z% is [below/above] the 5% tolerable rate. Control is [effective/ineffective].”
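That conclusion sentence is mechanical enough to generate, which also keeps wording consistent across workpapers. A sketch, assuming a control is called effective when the observed rate does not exceed the tolerable rate — state whichever convention you use explicitly:

```python
def conclude(tested: int, passed: int, tolerable_rate: float) -> str:
    """Render the explicit conclusion sentence; assumes 'effective'
    means observed deviation rate <= tolerable rate."""
    observed = (tested - passed) / tested
    effective = observed <= tolerable_rate
    return (
        f"{passed} of {tested} items passed. "
        f"The observed deviation rate of {observed:.1%} is "
        f"{'at or below' if effective else 'above'} the "
        f"{tolerable_rate:.0%} tolerable rate. "
        f"Control is {'effective' if effective else 'ineffective'}."
    )

print(conclude(25, 22, 0.05))
# 22 of 25 items passed. The observed deviation rate of 12.0% is above
# the 5% tolerable rate. Control is ineffective.
```

Run against the 12%-deviation example from the common-findings list, the conclusion comes out “ineffective” — the link between results and conclusion is enforced rather than left to the preparer’s optimism.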
A well-designed RCSA template — one that captures control descriptions, testing methodology, and residual risk ratings in one place — can significantly reduce the documentation overhead. The RCSA Risk & Control Self-Assessment template is built for exactly this: structured control inventories that feed directly into your testing workpapers.
See the Compliance Monitoring and Testing Plan guide for how to structure the program-level document that ties your testing schedule, methodology, and findings tracking together.
The Annual Compliance Risk Assessment guide covers how risk prioritization should drive testing frequency — so you’re putting the most testing resources where the risk is highest.
Rebecca Leung
Rebecca Leung has 8+ years of risk and compliance experience across first and second line roles at commercial banks, asset managers, and fintechs. Former management consultant advising financial institutions on risk strategy. Founder of RiskTemplates.