Compliance Strategy

Control Testing Techniques: Sampling, Walkthroughs, and Evidence Collection That Holds Up

May 8, 2026
Rebecca Leung

TL;DR

  • Control testing is how you prove your compliance program works — not just that you have policies. Regulators and auditors want evidence, not assertions.
  • The three primary techniques — walkthroughs, sampling tests, and reperformance — serve different purposes. Walkthroughs test design; sampling tests operating effectiveness; reperformance provides the strongest evidence that a control actually works.
  • PCAOB AS 2315, updated for fiscal years beginning after December 15, 2025, codifies sampling requirements for public company auditors. Internal compliance teams can use the same framework to build defensible sample sizes and documentation.
  • The most common control testing failures aren’t bad controls — they’re bad workpapers: undocumented sample rationale, weak evidence, conclusions that don’t follow from the testing performed.

The examiner sits down, opens your compliance program documentation, and asks for the last round of control testing workpapers. You hand over a spreadsheet with a list of controls, a column marked “Tested,” and dates. The examiner asks: “What did you test? What did you look at? What did you find?”

If the answer is “we looked at the process and confirmed it was working,” you don’t have control testing. You have control affirmation — and there’s a meaningful difference.

Real control testing produces evidence. It names the specific population being tested, documents how samples were selected, records what the tester physically reviewed or reperformed, and states a conclusion that follows from the evidence. It’s the kind of workpaper a skeptical second reviewer — or examiner — can reconstruct independently.

This is the methodology guide. Walkthroughs, sampling, reperformance, evidence collection — what each one is for, how to execute it, and what the documentation needs to look like.

Why Control Testing Is Different from Monitoring

Before getting into technique, it’s worth being precise about what you’re doing and why. Control testing gets conflated with compliance monitoring. They’re related but distinct.

Monitoring is ongoing — automated system alerts, daily or weekly reports, exception dashboards that flag when something is outside tolerance. Monitoring tells you when something might be wrong.

Testing is periodic and structured — a deliberate exercise that evaluates whether a specific control operated effectively over a defined period. Testing confirms that monitoring is actually working, and it covers controls that monitoring doesn’t catch (manual review steps, judgment-based approvals, training completions, vendor certifications).

The FDIC’s Consumer Compliance Examination Manual draws this distinction explicitly: the compliance management system must include both monitoring and audit functions, and examiners evaluate each separately. A monitoring program that produces reports nobody acts on and an audit function that tests controls without generating findings are both CMS deficiencies.

The Three Core Testing Techniques

1. Walkthroughs

A walkthrough traces a single transaction or process from beginning to end — origination, processing, handoffs, controls applied at each step, and final output. The tester observes, inquires, and inspects documents at every stage rather than testing a sample.

What walkthroughs test: Design effectiveness. Does the control, as it actually operates, address the risk it’s supposed to address? Is the process consistent with what the procedures describe?

When to use walkthroughs:

  • First-time testing of a control you haven’t evaluated before
  • After a significant process change or procedure update
  • When substantive testing results suggest something is wrong but you can’t identify the source
  • As preparation for a larger sample test (understanding the process before sampling)

What a walkthrough isn’t: Walkthroughs don’t test operating effectiveness at scale. One transaction tells you the process can work — it doesn’t tell you it works consistently across all 3,000 transactions in the population. That’s what sampling does.

Documentation requirements: The walkthrough should produce a documented process map or narrative, a list of the evidence reviewed at each step (with specific document references — not just “reviewed a file”), inquiries made and responses received, and a conclusion about design adequacy with specific observations about any gaps between the documented procedure and observed practice.

2. Attribute Sampling (Tests of Controls)

Attribute sampling is the standard technique for testing operating effectiveness — the control worked or it didn’t, for each sample item. The result is a rate of deviation: out of 60 sampled transactions, 3 showed a control failure — a 5% deviation rate.

Statistical vs. nonstatistical sampling: Both are legitimate under PCAOB AS 2315 and for internal compliance purposes. Statistical sampling uses probability theory to mathematically relate sample results to the full population; nonstatistical sampling uses professional judgment. Either way, the auditor must use judgment in planning, performing, and evaluating the sample.

Key parameters for setting sample size (a computational sketch follows the table):

Parameter | What It Means | How It Affects Sample Size
Tolerable deviation rate | The maximum rate of control failure you’d accept and still conclude the control is effective | Lower tolerance → larger sample
Expected population deviation rate | Your estimate of how often the control actually fails in the population | Higher expected deviation → larger sample
Confidence level / allowable risk | How confident you need to be in the conclusion | Higher confidence → larger sample
Population size | Total items subject to the control in the period | Matters less than you think for large populations
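
To see how these parameters interact, here is a minimal Python sketch that searches for the smallest sample size meeting a given set of inputs. It assumes the standard binomial model for attribute sampling (exact when sampling with replacement, conservative for large populations), and the parameter values are illustrative, not prescriptions:

```python
from math import ceil, comb

def binom_cdf(k: int, n: int, p: float) -> float:
    """P(X <= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k + 1))

def attribute_sample_size(tolerable_rate: float,
                          expected_rate: float,
                          confidence: float,
                          max_n: int = 2000) -> tuple:
    """Smallest n (and allowable deviations c) such that, if the true
    deviation rate equaled the tolerable rate, the chance of observing
    c or fewer deviations (and wrongly accepting the control) stays at
    or below 1 - confidence."""
    risk = 1.0 - confidence          # allowable risk of overreliance
    for n in range(1, max_n + 1):
        c = ceil(expected_rate * n)  # deviations the plan must tolerate
        if binom_cdf(c, n, tolerable_rate) <= risk:
            return n, c
    raise ValueError("no n <= max_n satisfies these parameters")

# Illustrative inputs: 10% tolerable rate, 1% expected rate, 95%
# confidence. Yields 46 items with 1 allowable deviation, consistent
# with widely published attribute sampling tables.
print(attribute_sample_size(0.10, 0.01, 0.95))
```

Note what the table says, in code: tightening the tolerable rate or raising the confidence level grows the sample quickly, while population size never enters the calculation.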

Practical nonstatistical sample sizes (commonly applied in internal compliance programs):

Control Frequency | Typical Sample Size
Annual (once per year) | 1–3 items
Quarterly (4x per year) | 2–4 items
Monthly (12x per year) | 3–6 items
Weekly | 8–15 items
Daily | 20–40 items
Multiple times daily / transaction-level | 40–60+ items

These are heuristics, not mandates. Document your rationale. If you select 25 items for a daily control, explain why: prior-year results showed no deviations, the control is automated with system-enforced logic, and the risk is moderate. If someone can’t reconstruct your sample size logic from the workpaper, you have a documentation gap.
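
If you want the heuristic and the rationale sentence produced together, a small helper can do it. The ranges below mirror the frequency table; the function name and the take-the-high-end-when-risk-is-elevated policy are illustrative assumptions, not a standard:

```python
# Hypothetical helper: ranges mirror the frequency table above; the
# risk adjustment is an illustrative policy choice, not a standard.
BASELINE = {
    "annual": (1, 3), "quarterly": (2, 4), "monthly": (3, 6),
    "weekly": (8, 15), "daily": (20, 40), "transactional": (40, 60),
}

def suggest_sample(frequency: str, elevated_risk: bool) -> tuple:
    low, high = BASELINE[frequency]
    n = high if elevated_risk else low
    rationale = (f"Selected {n} items for a {frequency} control: "
                 f"{'elevated' if elevated_risk else 'moderate'} risk, "
                 f"baseline heuristic range {low}-{high}.")
    return n, rationale

n, why = suggest_sample("daily", elevated_risk=False)
print(why)  # Selected 20 items for a daily control: moderate risk, ...
```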

Sample selection methods (a reproducible sketch of the first two follows the list):

  • Random selection: Every item in the population has an equal chance of selection. Requires a complete population listing and random number generator or tool. Most defensible for large populations.
  • Systematic selection: Select every nth item from a sequential population (e.g., every 20th transaction). Defensible if the population isn’t organized in a pattern that creates bias.
  • Haphazard selection: Manual selection intended to approximate randomness without statistical precision. Acceptable for nonstatistical sampling but document how items were chosen and why it approximates an unbiased draw.
  • Judgment selection (targeted): Select items that represent specific characteristics — high-value transactions, new customer types, specific product lines. Not appropriate for reaching conclusions about the broader population; use for targeted risk testing.
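
Here is the sketch referenced above. The transaction IDs and seed are illustrative; the point is that recording the seed in the workpaper lets a second reviewer reconstruct the exact draw:

```python
import random

def random_sample(population: list, n: int, seed: int) -> list:
    """Random selection, seeded so a reviewer can reconstruct the
    draw from the documented seed."""
    return random.Random(seed).sample(population, n)

def systematic_sample(population: list, n: int, seed: int) -> list:
    """Systematic selection: every k-th item from a random start."""
    k = len(population) // n
    start = random.Random(seed).randrange(k)
    return [population[start + i * k] for i in range(n)]

# Illustrative population of transaction IDs (not real data)
txns = [f"TXN-{i:04d}" for i in range(1, 3001)]
print(random_sample(txns, 5, seed=20260508))
print(systematic_sample(txns, 5, seed=20260508))
```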

3. Reperformance

Reperformance is exactly what it sounds like: the tester independently executes the same steps the control operator would have performed and compares the outcome. The question isn’t “does this approval form exist?” — it’s “if I perform the same calculation or review the same data the approver was supposed to review, do I reach the same conclusion?”

Reperformance is the strongest evidence type in the hierarchy:

  1. Reperformance — tester independently executes the control
  2. Inspection — tester examines documents, records, or physical items
  3. Observation — tester watches the control being performed
  4. Inquiry — tester asks about the control; receives verbal or written responses

Inquiry alone is not sufficient evidence. Multiple enforcement actions and audit finding letters cite exactly this gap: institutions that “tested” a control by asking the control owner if they had performed it. That’s not testing. That’s interviewing.

A practical combination: walk one transaction through reperformance, then inspect documentation for the remaining sample. Reperformance validates that the control produces the right output; inspection of documentation validates that it was consistently applied.
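
To make the distinction concrete, here is a minimal sketch that reperforms a timestamp-sequencing control (approval must strictly precede processing), the same rule used in the evidence examples below. The record fields and IDs are hypothetical:

```python
from datetime import datetime

def reperform_sequencing(record: dict) -> str:
    """Independently re-execute the control's decision rule:
    the approval timestamp must strictly precede processing."""
    approved = datetime.fromisoformat(record["approved_at"])
    processed = datetime.fromisoformat(record["processed_at"])
    return "pass" if approved < processed else "deviation"

# Hypothetical sample records; in practice, pulled from the system
# of record for the documented sample items.
sample = [
    {"id": "TXN-2847", "approved_at": "2026-04-15T14:51:00",
     "processed_at": "2026-04-15T15:02:00"},
    {"id": "TXN-1042", "approved_at": "2026-04-15T14:23:00",
     "processed_at": "2026-04-15T14:23:00"},
]
for r in sample:
    print(r["id"], reperform_sequencing(r))
# TXN-2847 pass
# TXN-1042 deviation (same-minute timestamps: sequencing unconfirmed)
```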

Evidence Collection: What “Holds Up” Means

Evidence holds up when a different person — a second-line tester, an internal auditor, an external examiner — can look at it and reach the same conclusion you did. That requires specificity.

What weak evidence looks like:

  • “Reviewed approval documentation — no issues noted”
  • “Confirmed with manager that monthly reports are distributed”
  • “Tested 10 transactions — all compliant”

What strong evidence looks like:

  • “Inspected approval email chain for Transaction #2847-Q (dated 04/15/26), confirming dual authorization by [Role A] at 14:23 and [Role B] at 14:51, prior to funds release at 15:02. Screenshot retained in workpaper Tab 3.”
  • “Reperformed the OFAC screening for 15 of the 40 sampled customer records against the SDN List as of [date]. All 15 produced no matches; 2 produced false positives that were appropriately cleared in the case management system with documented rationale (workpaper Tab 5).”
  • “Tested 40 transactions. 38 passed — approval timestamp preceded processing timestamp. 2 exceptions: Transaction #1042 and #1119 — approval recorded same minute as processing, unable to confirm sequencing. Classified as deviation.”

The difference isn’t length. It’s specificity: what exactly was tested, which documents were reviewed, what the tester actually looked at, and what was found — including exceptions.

Building the Testing Workpaper

A complete control testing workpaper includes (a structural sketch follows the list):

  1. Control description: What the control is supposed to do, referencing the documented procedure
  2. Risk addressed: What failure would look like if the control didn’t operate
  3. Population: The complete set of items subject to the control (total count, date range, source)
  4. Sample selection: How items were selected, by whom, using what tool or method
  5. Sample listing: Enough information to reconstruct which specific items were tested
  6. Testing procedure: Step-by-step description of what was done — not “reviewed” but what specifically was inspected or reperformed
  7. Results by sample item: Pass/fail for each, with exception notation and description
  8. Exceptions: For each deviation, describe the nature, likely cause if determinable, and materiality judgment
  9. Conclusion: Overall operating effectiveness conclusion, whether any exceptions exceed the tolerable deviation rate, and recommended action
  10. Preparer and reviewer: Name, date, and sign-off
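
For teams that keep testing records in structured form, a minimal dataclass sketch of these ten elements can make completeness checkable before sign-off. The field names are illustrative, not a prescribed schema:

```python
from dataclasses import dataclass, field

@dataclass
class TestingWorkpaper:
    control_description: str = ""
    risk_addressed: str = ""
    population: str = ""        # total count, date range, source
    sample_selection: str = ""  # method, tool, selector, seed
    sample_items: list = field(default_factory=list)
    testing_procedure: str = ""
    results: dict = field(default_factory=dict)  # item -> pass/deviation
    exceptions: list = field(default_factory=list)
    conclusion: str = ""
    preparer: str = ""
    reviewer: str = ""

    def missing_elements(self) -> list:
        """Elements still empty; exceptions may legitimately be empty."""
        return [k for k, v in vars(self).items()
                if k != "exceptions" and not v]

wp = TestingWorkpaper(control_description="Dual authorization on release")
print(wp.missing_elements())  # everything not yet filled in
```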

If your workpapers don’t include all of these, examiners will note the gap. The FDIC’s examination procedures specifically call out reviewing internal audit plans, reports, and finding remediation — which means they’ll be reading your workpapers, not just your audit report.

Common Findings and How to Avoid Them

The same deficiencies appear in control testing programs across industries and institution types. Here’s what examiners and internal audit chiefs see most often:

1. Sample rationale not documented. The sample size is defensible; the reasoning isn’t recorded. Fix: one sentence in every workpaper explaining sample size determination.

2. Evidence post-dates the testing period. You find an email chain or system log that was created after the testing date. This looks like retroactive documentation — and sometimes is. Fix: pull evidence contemporaneously and note the document date.

3. Conclusions disconnect from results. The workpaper documents 3 exceptions out of 25 samples (12% deviation rate) and concludes “control is effective.” That’s only defensible if your tolerable deviation rate is above 12% — which you need to state explicitly. Fix: define tolerable deviation rate before testing begins, and link your conclusion to whether you exceeded it.
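
The statistics make this finding worse than it looks: a 12% observed rate from 25 samples is a point estimate with wide uncertainty around it. Here is a minimal sketch, using the exact binomial (Clopper-Pearson) one-sided upper limit, with illustrative inputs:

```python
from math import comb

def binom_cdf(k: int, n: int, p: float) -> float:
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k + 1))

def upper_deviation_limit(n: int, exceptions: int,
                          confidence: float) -> float:
    """One-sided upper confidence bound on the true deviation rate
    (exact binomial), located by bisection."""
    alpha, lo, hi = 1.0 - confidence, 0.0, 1.0
    for _ in range(60):  # bisect to ample precision
        mid = (lo + hi) / 2
        if binom_cdf(exceptions, n, mid) > alpha:
            lo = mid
        else:
            hi = mid
    return hi

# 3 exceptions in 25 samples: 12% observed, but at 95% confidence the
# true deviation rate could be as high as roughly 28%.
print(round(upper_deviation_limit(25, 3, 0.95), 3))
```

Against a 10% tolerable rate, that result argues for "ineffective," not "effective."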

4. Same finding, multiple cycles. A control deficiency is noted in Q1, “remediation in progress” is noted in Q2, the same deficiency is noted in Q3. This is one of the strongest exam signals that your CMS has a systemic gap. Fix: track exceptions through to verified remediation before closing them.

5. Testing design effectiveness only. Walkthroughs confirm the control process exists; nobody tests whether it operates consistently across the population. Fix: walkthroughs and sample testing serve different purposes — both are needed.

Integration with Your Broader Compliance Program

Control testing is one leg of a compliance program structure. It connects upward to your compliance risk assessment — risk prioritization should drive testing frequency and sample size, so high-risk areas get tested more often with larger samples. It connects laterally to your compliance monitoring and testing plan, which documents the testing universe, schedule, and methodology at the program level.

It also connects to your RCSA process. The RCSA identifies control gaps and rates residual risk; control testing validates whether those assessments are accurate. A control the business owner rates as “strong” that testing reveals has a 15% deviation rate is a material discrepancy — and it should flow back to the RCSA as a finding that changes the residual risk rating.

For SOC 2 specifically, control testing follows a similar structure but maps to the Trust Service Criteria rather than regulatory requirements. The AICPA’s guidance on audit procedures is consistent with the techniques described here — the same attribute sampling logic and evidence standards apply.

The PCAOB’s updated AS 2315 — effective for fiscal years beginning after December 15, 2025 — reinforces the same fundamentals for public company internal control audits. Internal compliance teams aren’t subject to AS 2315 directly, but the framework is the right model: documented sampling rationale, clear parameters, and conclusions that follow from the evidence.

So What?

If your control testing produces workpapers that say “tested — no issues,” you’re generating paperwork, not compliance evidence. When an examiner or internal auditor questions a control failure next quarter, you’ll have nothing to show them except a conclusion that was never justified.

The fix isn’t complicated, just disciplined. Before the next testing cycle:

  1. Define your population for each control. Not “all transactions” — the specific date range, system, product type, or subset that’s subject to this control.
  2. Set your tolerable deviation rate. For most compliance controls, 5–10% is the typical threshold. Write it in the workpaper before you start.
  3. Choose your sample size and document the rationale. Use the frequency table above as a starting point; adjust for risk.
  4. Record specific evidence. Document IDs, dates, document names, what was inspected, what was found.
  5. State your conclusion explicitly (a formatting sketch follows this list). “X of Y items passed. The observed deviation rate of Z% is [below/above] the 5% tolerable rate. Control is [effective/ineffective].”
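
A conclusion this mechanical can be generated rather than hand-written. A minimal sketch, mirroring the wording of step 5 (the comparison uses the observed point rate against the documented tolerable rate):

```python
def conclusion(passed: int, total: int, tolerable: float) -> str:
    """Format the explicit conclusion from step 5 using the observed
    (point) deviation rate against the documented tolerable rate."""
    rate = (total - passed) / total
    effective = rate <= tolerable
    return (f"{passed} of {total} items passed. The observed deviation "
            f"rate of {rate:.1%} is {'below' if effective else 'above'} "
            f"the {tolerable:.0%} tolerable rate. Control is "
            f"{'effective' if effective else 'ineffective'}.")

print(conclusion(39, 40, 0.05))
# 39 of 40 items passed. The observed deviation rate of 2.5% is below
# the 5% tolerable rate. Control is effective.
```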

A well-designed RCSA template — one that captures control descriptions, testing methodology, and residual risk ratings in one place — can significantly reduce the documentation overhead. The RCSA Risk & Control Self-Assessment template is built for exactly this: structured control inventories that feed directly into your testing workpapers.


See the Compliance Monitoring and Testing Plan guide for how to structure the program-level document that ties your testing schedule, methodology, and findings tracking together.

The Annual Compliance Risk Assessment guide covers how risk prioritization should drive testing frequency — so you’re putting the most testing resources where the risk is highest.

Frequently Asked Questions

What's the difference between a test of controls and a substantive test?
A test of controls (also called a compliance test or conformance test) evaluates whether a control is operating as designed — did the approval happen, was the review documented, was the threshold applied correctly? A substantive test evaluates whether a specific transaction or balance is accurate. In audit practice, strong control test results allow the auditor to reduce the scope of substantive testing. In compliance programs, you generally run both: control tests verify your processes are working, substantive tests verify the outputs. Examiners often see gaps when organizations run control tests but don't validate that the controls are actually preventing errors in transactions.
How many items do I need to sample for a control test?
It depends on your testing approach, the control's frequency, and your tolerable deviation rate. A commonly used nonstatistical heuristic scales with how often the control operates: 1–3 items for annual controls, 2–4 for quarterly, 3–6 for monthly, 8–15 for weekly, 20–40 for daily, and 40–60 or more for transaction-level controls. PCAOB AS 2315 requires that sample size account for the tolerable rate of deviation, the expected rate of deviation in the population, and the allowable risk of incorrectly concluding the control is effective. For low-risk controls with strong prior results, smaller samples may be defensible — but document the rationale.
What's a walkthrough and when is it required?
A walkthrough is a technique where the auditor or compliance tester follows a single transaction or process end-to-end, from origination through completion, observing each step and confirming that controls are actually applied as described in procedures. Walkthroughs are primarily used to assess design effectiveness — does the control, as designed, actually prevent or detect the risk? PCAOB AS 2201 requires walkthroughs as part of integrated audits of internal control over financial reporting. In compliance programs, walkthroughs are most valuable at onboarding new processes, when procedures are updated, and as the first step in testing a control you haven't tested before.
What evidence is strong enough to satisfy an examiner?
Strength of evidence generally ranks: reperformance > inspection of documentation > observation > inquiry. Reperformance — where the tester independently executes the same steps the control owner would — is the most powerful because it shows the control can be replicated, not just described. Inquiry alone (asking someone 'did you do this?') is the weakest and will not satisfy examiners as standalone evidence. A combination of documented evidence (approval timestamps, system screenshots, signed checklists) plus reperformance for at least a subset of samples is the standard most examiners expect.
What are the most common control testing deficiencies examiners find?
The top recurring findings: (1) Testing documentation that describes what was tested but not what was found — conclusions need to be explicit. (2) Sample sizes that can't be justified — no rationale for why 10 items is enough. (3) Testing controls rather than testing that controls prevent actual errors — a walkthrough confirms the form exists, but substantive testing confirms the form was used correctly. (4) No follow-up on prior findings — same control tested, same deficiency noted, no evidence of remediation. (5) Evidence that post-dates the testing period — approval emails, system logs, or checklists that were created after the fact.
How do I document control testing so it survives a second reviewer?
A good testing workpaper includes: the control description (what it's supposed to do), the population (all transactions or records subject to the control in the period), the sample selection methodology (how items were chosen), sample items tested (enough detail to reconstruct the selection), the testing procedure performed (not 'reviewed file' — describe specifically what was inspected or reperformed), results for each sample item (pass, fail, or exception with explanation), overall conclusion, and the tester's name, date, and reviewer signature. If someone who wasn't in the room can't reconstruct exactly what was tested and what was found, the documentation is insufficient.
Rebecca Leung

Rebecca Leung has 8+ years of risk and compliance experience across first and second line roles at commercial banks, asset managers, and fintechs. Former management consultant advising financial institutions on risk strategy. Founder of RiskTemplates.

Related Framework

RCSA (Risk & Control Self-Assessment)

141 pre-populated fintech risks with control assessments, questionnaire framework, and testing calendar.
