Compliance Strategy

Algorithmic Fairness Audits: A Step-by-Step Compliance Guide for 2026

March 31, 2026 · Rebecca Leung

TL;DR:

  • NYC Local Law 144 already requires annual independent bias audits for automated hiring tools — and a December 2025 Comptroller audit found enforcement is so weak that most non-compliant employers haven’t even been identified yet.
  • Colorado SB 205 took effect February 1, 2026, requiring “reasonable care” against algorithmic discrimination for all high-risk AI systems — not just hiring tools.
  • The EU AI Act’s high-risk system requirements (including bias testing and audit trails) kick in August 2, 2026. If you’re running AI that touches credit, employment, housing, or insurance decisions, the audit window is now.

Three Laws, One Deadline Year: Why 2026 Is the Algorithmic Fairness Reckoning

Here’s the uncomfortable truth about algorithmic fairness audits in 2026: most companies doing them are doing them badly, and most companies that should be doing them aren’t doing them at all.

NYC’s Local Law 144 has been live since July 2023, making it the most mature AI bias audit law in the United States. It requires employers using automated employment decision tools (AEDTs) to conduct independent bias audits annually and publish the results. The penalty? $500 to $1,500 per violation per day — where each day of use without a compliant audit counts as a separate violation.

But here’s what makes 2026 different: NYC is no longer the only jurisdiction. Colorado SB 205 went live on February 1, 2026, covering all high-risk AI systems — not just hiring tools. The EU AI Act’s high-risk obligations take effect August 2, 2026. And the EEOC’s landmark settlement in EEOC v. iTutorGroup ($365,000 for using hiring software that automatically rejected female applicants over 55 and male applicants over 60) proved that existing anti-discrimination law already applies to AI, no new statute required.

This guide walks you through the entire algorithmic fairness audit lifecycle: what triggers one, how to scope it, which statistical tests to run, how to pick an auditor, and what to do when the audit finds something bad.

What Is an Algorithmic Fairness Audit?

An algorithmic fairness audit is a structured assessment of whether an automated decision-making system produces outcomes that disproportionately disadvantage people in protected classes — race, gender, age, disability, national origin, or other characteristics covered by anti-discrimination law.

It’s different from a model validation (which checks whether the model works as designed) and different from a privacy impact assessment (which evaluates data handling). A fairness audit specifically asks: does this system treat different groups differently, and if so, is that difference legally or ethically defensible?

| Audit Type | What It Tests | Who Requires It | Cadence |
|---|---|---|---|
| Bias Audit (NYC LL 144) | Disparate impact in hiring/promotion AEDTs | NYC DCWP | Annual |
| Impact Assessment (Colorado SB 205) | Algorithmic discrimination risk for high-risk AI | Colorado AG | Before deployment + ongoing |
| Conformity Assessment (EU AI Act) | Bias, transparency, accuracy for high-risk systems | EU national authorities | Before market placement + ongoing |
| Disparate Impact Analysis (ECOA/Fair Housing) | Lending/housing discrimination | CFPB, HUD, DOJ | Model validation cycle |

Who Needs a Fairness Audit Right Now?

If you’re asking “does this apply to me?” — here’s the decision tree:

You definitely need one if:

  • You use AI/automated tools for hiring, screening, or promotion decisions in NYC (LL 144)
  • You deploy AI for “consequential decisions” in Colorado — credit, employment, education, housing, insurance, or legal services (SB 205)
  • You operate high-risk AI systems in the EU or serving EU residents (AI Act, August 2026)
  • You use AI in lending decisions anywhere in the US (ECOA, Fair Housing Act — no new law needed)

You probably need one if:

  • You use AI in insurance underwriting (state-level scrutiny increasing)
  • You’re a vendor selling AI tools to companies in the above categories (both LL 144 and SB 205 put obligations on developers, not just deployers)
  • You use AI for customer segmentation that affects pricing, offers, or service levels

You can wait (but shouldn’t) if:

  • Your AI is purely internal (e.g., demand forecasting) with no consequential human impact
  • You’re in a jurisdiction with no current AI-specific law — but ECOA, Title VII, and the ADA already cover algorithmic discrimination

The Audit Lifecycle: 7 Steps From Scoping to Remediation

Step 1: Inventory and Classify Your AI Systems (Weeks 1-2)

You can’t audit what you can’t find. Before any fairness testing begins, build a complete inventory of automated decision-making systems.

What to inventory:

  • Production AI/ML models making or influencing decisions
  • Vendor-provided AI tools (ATS screening, credit scoring, chatbot triage)
  • Rule-based automated systems that use demographic-adjacent features
  • Shadow AI — tools employees adopted without IT approval

Classification criteria:

| Risk Factor | Low | Medium | High |
|---|---|---|---|
| Decision type | Internal operations | Customer-facing recommendations | Credit, employment, housing, insurance |
| Autonomy | Human makes final decision | AI recommends, human approves | AI decides autonomously |
| Protected-class exposure | No demographic data used | Demographic-adjacent features (ZIP code, name) | Direct or strongly correlated features |
| Legal exposure | No specific regulation | Industry guidance applies | Statute requires audit |

Owner: Chief Risk Officer or Head of Compliance. At fintechs without a CRO, this falls to the VP of Engineering or Head of Product, with compliance oversight.

Deliverable: Completed AI system inventory with risk classifications, mapped to applicable regulations.
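
To make the classification repeatable, here is a minimal Python sketch: it rates a system on the four risk factors from the table and takes the worst rating as the overall tier. The `AISystem` fields, the worst-of rule, and the example system are illustrative assumptions, not anything prescribed by the statutes discussed here.

```python
from dataclasses import dataclass

# Illustrative risk levels; the tiering criteria mirror the classification table above.
LOW, MEDIUM, HIGH = 1, 2, 3

@dataclass
class AISystem:
    name: str
    decision_type: int             # LOW = internal ops, HIGH = credit/employment/housing/insurance
    autonomy: int                  # LOW = human decides, HIGH = AI decides autonomously
    protected_class_exposure: int  # LOW = no demographic data, HIGH = direct or correlated features
    legal_exposure: int            # LOW = no regulation, HIGH = statute requires an audit

    def risk_tier(self) -> str:
        # Conservative rule: the highest-rated factor drives the overall tier.
        worst = max(self.decision_type, self.autonomy,
                    self.protected_class_exposure, self.legal_exposure)
        return {LOW: "low", MEDIUM: "medium", HIGH: "high"}[worst]

resume_screener = AISystem("vendor resume screener", HIGH, MEDIUM, MEDIUM, HIGH)
print(resume_screener.name, "->", resume_screener.risk_tier())  # high
```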

Step 2: Select Your Auditor (Weeks 2-3)

LL 144 requires an “independent auditor” but doesn’t define independence beyond saying the auditor can’t be the employer or a direct agent of the employer. This is a known gap — the ACM FAccT 2025 paper “Auditing the Audits” (which won a Best Paper Award) found that the quality of LL 144 bias audits has been inconsistent precisely because auditor independence requirements are vague.

What to look for in an auditor:

| Criterion | Why It Matters | Red Flag |
|---|---|---|
| Independence | Auditor shouldn’t have financial ties to the AI vendor | Auditor is the same company that built or sold the tool |
| Statistical expertise | Fairness testing requires real statistical chops | Auditor only runs four-fifths rule calculations with no deeper analysis |
| Regulatory knowledge | Different laws require different metrics | Auditor can’t explain which statute they’re testing against |
| Domain experience | Financial services AI has unique risks | Auditor has only done HR tech audits |
| Transparency | Methodology should be documented and defensible | Auditor won’t share their testing methodology |

Auditor types:

  • Big 4 advisory firms — Deloitte, PwC, EY, KPMG all have AI audit practices. Best for large financial institutions needing regulator-credible audits.
  • Specialized AI audit firms — Companies like Holistic AI, BABL AI, and O’Neil Risk Consulting. Often more affordable with deeper technical expertise.
  • Academic partnerships — Universities with AI fairness labs can provide research-grade audits. Less common but high credibility.

Avoid “opinion shopping” — the practice of approaching multiple auditors until one produces favorable results. This is the algorithmic fairness equivalent of financial audit shopping, and regulators will notice.

Step 3: Define Scope and Metrics (Week 3)

Which fairness metrics you test depends on which law you’re complying with and what kind of decisions the system makes.

Core fairness metrics:

| Metric | What It Measures | When to Use | Legal Basis |
|---|---|---|---|
| Impact Ratio (Four-Fifths Rule) | Selection rate of protected group ÷ selection rate of highest-performing group. Below 0.8 = potential disparate impact | Hiring, promotion, screening | EEOC Uniform Guidelines, LL 144 |
| Demographic Parity | Whether selection rates are equal across groups | Broad screening tools | General fairness baseline |
| Equalized Odds | Whether true positive and false positive rates are equal across groups | Credit decisions, risk scoring | ECOA, fair lending |
| Predictive Parity | Whether positive predictive value is equal across groups | Risk assessments | When accuracy parity matters |
| Calibration | Whether predicted probabilities match actual outcomes equally across groups | Credit scoring, insurance | Model performance fairness |

LL 144 specifically requires the impact ratio calculated for selection/scoring rates across sex categories (male, female, non-binary/other), race/ethnicity categories (Hispanic/Latino, White, Black/African American, Asian, Native Hawaiian/Pacific Islander, American Indian/Alaska Native, Two or More Races), and intersectional combinations thereof.
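
As a minimal sketch of that calculation (the column names and tiny example frame are illustrative; a real audit runs on the full applicant file), the impact ratio is simply each group’s selection rate divided by the highest group’s selection rate:

```python
import pandas as pd

# Illustrative applicant-level data: one row per scored candidate.
df = pd.DataFrame({
    "sex":       ["female", "male", "male", "female", "female", "male", "male", "female"],
    "ethnicity": ["Black", "White", "White", "Hispanic", "White", "Black", "Hispanic", "White"],
    "selected":  [0, 1, 1, 0, 1, 1, 0, 1],
})

def impact_ratios(data: pd.DataFrame, group_cols, outcome_col: str = "selected") -> pd.Series:
    """Selection rate per group divided by the highest group's selection rate."""
    rates = data.groupby(list(group_cols))[outcome_col].mean()
    return rates / rates.max()

# Single-attribute and intersectional ratios, as LL 144 requires.
print(impact_ratios(df, ["sex"]))
print(impact_ratios(df, ["sex", "ethnicity"]))

# Ratios below 0.8 flag potential disparate impact under the four-fifths rule.
flags = impact_ratios(df, ["sex", "ethnicity"])
print(flags[flags < 0.8])
```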

Colorado SB 205 doesn’t prescribe specific metrics but requires deployers to use “reasonable care” to prevent “algorithmic discrimination” — defined as differential treatment or impact on the basis of age, color, disability, ethnicity, genetic information, national origin, race, religion, sex, or veteran status. This means you need to test across all of those classes, not just the ones LL 144 specifies.

Step 4: Collect and Prepare Data (Weeks 3-4)

This is where most audits break down. You need outcome data broken down by protected class, and many companies either don’t collect demographic data or collect it inconsistently.

Data requirements:

  • Input data: What features the model uses to make decisions
  • Outcome data: What the model decided (accepted/rejected, scored high/low, approved/denied)
  • Demographic data: Protected class information for the people affected by decisions
  • Ground truth: Actual outcomes (did the hired candidate perform well? did the approved loan default?)

Common data challenges:

| Challenge | Impact | Workaround |
|---|---|---|
| No demographic data collected | Can’t calculate impact ratios | Use BISG (Bayesian Improved Surname Geocoding) proxy for race/ethnicity; use name-based proxies for gender |
| Incomplete demographic data | Biased sample skews results | LL 144 allows excluding categories representing <2% of data; document the gap |
| Historical bias in training data | Model learned discriminatory patterns | Test the model AND the training data; document known data limitations |
| Vendor won’t share data | Can’t audit a black box | Contractual right-to-audit clauses are essential; SB 205 requires developers to share information with deployers |

Owner: Data engineering team, with compliance oversight on demographic data handling (this data is sensitive — limit access).
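
A minimal pandas sketch of the <2% exclusion check from the table above; the example population is made up, and any exclusion still needs to be documented in the published audit summary:

```python
import pandas as pd

# Illustrative audited population (counts are invented for the example).
applicants = pd.DataFrame({
    "ethnicity": ["White"] * 60 + ["Black"] * 25 + ["Asian"] * 14
                 + ["Native Hawaiian/Pacific Islander"] * 1,
})

# Share of each category in the audited population.
shares = applicants["ethnicity"].value_counts(normalize=True)

# LL 144 permits excluding categories under 2% of the data,
# but the exclusion must be documented.
too_small = shares[shares < 0.02]
print("Categories below the 2% threshold (document any exclusion):")
print(too_small)
```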

Step 5: Conduct the Audit (Weeks 4-6)

The actual testing phase. A thorough fairness audit goes beyond simply calculating impact ratios.

Testing protocol:

  1. Baseline impact ratio analysis. Calculate selection/scoring rates across all required demographic categories and intersections. Flag any ratio below 0.8 (four-fifths rule threshold).

  2. Statistical significance testing. A ratio below 0.8 in a sample of 15 applicants is meaningless. Use Fisher’s exact test or chi-squared tests to determine whether observed disparities are statistically significant at the 95% confidence level (a sketch of this check follows the list).

  3. Root cause analysis. For statistically significant disparities, identify which model features drive the differential outcomes. SHAP values and LIME explanations can isolate feature-level contributions to disparate impact.

  4. Intersectional analysis. Test for compound discrimination — a system might pass fairness tests for race and gender separately but fail for Black women or Asian men. LL 144 explicitly requires intersectional impact ratios.

  5. Temporal analysis. Test whether disparities are consistent over time or whether they emerge during certain periods (e.g., after a model retrain, after a data drift event).

  6. Counterfactual testing. What would happen if a protected characteristic were changed? If flipping an applicant’s race from Black to white changes the outcome, the model is using race (directly or via proxies) as a decision factor.
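
Below is a minimal sketch of the significance check from step 2, using SciPy’s Fisher’s exact test on an illustrative 2×2 table of selected vs. not-selected counts; the counts are invented for the example:

```python
from scipy.stats import fisher_exact

# Illustrative 2x2 contingency table:
#                   selected   not selected
# protected group       12          88
# reference group       30          70
table = [[12, 88],
         [30, 70]]

odds_ratio, p_value = fisher_exact(table, alternative="two-sided")

protected_rate = 12 / 100
reference_rate = 30 / 100
impact_ratio = protected_rate / reference_rate

print(f"impact ratio: {impact_ratio:.2f}")          # 0.40, well below the 0.8 threshold
print(f"Fisher exact p-value: {p_value:.4f}")

# Flag only disparities that are both below 0.8 and significant at the 95% level.
if impact_ratio < 0.8 and p_value < 0.05:
    print("Statistically significant disparity: escalate to root cause analysis.")
```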

Step 6: Interpret Results and Remediate (Weeks 6-8)

Finding disparate impact doesn’t automatically mean the system is illegal. The legal standard (under Griggs v. Duke Power Co. and subsequent case law) allows disparate impact if:

  1. The practice is “job-related and consistent with business necessity” (Title VII) or serves a “legitimate business justification” (ECOA)
  2. There is no less discriminatory alternative that serves the same purpose

Remediation decision tree:

  • Impact ratio ≥ 0.8, no statistical significance → Document, monitor, retest at next audit cycle
  • Impact ratio < 0.8, statistically significant, clear business justification → Document the justification thoroughly, implement monitoring, consider whether less discriminatory alternatives exist
  • Impact ratio < 0.8, statistically significant, no clear justification → Remediate before continued use: retrain model, remove proxy features, adjust thresholds, or add human review
  • Impact ratio < 0.8, egregious disparity → Stop using the system immediately, investigate root cause, and consider voluntary disclosure to regulators

Remediation techniques:

  • Feature removal/replacement: Remove features that serve as proxies for protected characteristics (ZIP code strongly correlates with race in the US)
  • Threshold adjustment: Modify decision thresholds to equalize outcomes across groups (balances accuracy against fairness; see the sketch after this list)
  • Resampling/reweighting: Adjust training data to reduce historical bias
  • Adversarial debiasing: Train the model to explicitly not predict protected class membership
  • Human-in-the-loop: Add human review for cases near decision boundaries, particularly for protected classes with lower selection rates
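
As an illustration of threshold adjustment (not a recommendation for any particular system), the sketch below picks group-specific score cutoffs that equalize selection rates on randomly generated scores. Group-specific thresholds trade accuracy for parity and can raise disparate-treatment questions of their own, so review any such change with counsel.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative model scores for two groups; group B's scores skew lower,
# e.g. because of historical bias baked into the training data.
scores_a = rng.normal(0.60, 0.15, 5000)
scores_b = rng.normal(0.50, 0.15, 5000)

target_selection_rate = 0.30

# Choose each group's threshold as the score quantile that yields the target
# selection rate, equalizing selection rates across the two groups.
threshold_a = np.quantile(scores_a, 1 - target_selection_rate)
threshold_b = np.quantile(scores_b, 1 - target_selection_rate)

print(f"group A threshold: {threshold_a:.3f}, selection rate: {(scores_a >= threshold_a).mean():.2f}")
print(f"group B threshold: {threshold_b:.3f}, selection rate: {(scores_b >= threshold_b).mean():.2f}")
```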

Step 7: Document, Publish, and Monitor (Weeks 8-10)

Documentation isn’t just a compliance box to check — it’s your legal defense if someone challenges your AI system.

LL 144 requires public posting of bias audit summaries. At minimum, publish:

  • Date of audit
  • Auditor identity
  • Selection rates by category
  • Impact ratios for each category and intersection

Colorado SB 205 requires a risk management policy (publicly available), impact assessments, and the ability to disclose to the Attorney General upon request.

EU AI Act requires technical documentation including bias testing results, maintained throughout the system’s lifecycle.

Monitoring cadence:

| Regulation | Audit Frequency | Additional Monitoring |
|---|---|---|
| NYC LL 144 | At least annually | Continuous monitoring recommended |
| Colorado SB 205 | Before deployment + ongoing | Must update when “reasonably known or foreseeable risks” change |
| EU AI Act | Before market placement + periodic | Post-market monitoring required |
| Fair lending (ECOA) | Model validation cycle (typically annual) | Ongoing performance monitoring |
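
A minimal sketch of what continuous monitoring can look like in practice: recompute impact ratios on each period’s decision log and raise an alert when any group drops below the four-fifths threshold. The `group`/`selected` column names, the sample data, and the alert format are assumptions for illustration.

```python
import pandas as pd

ALERT_THRESHOLD = 0.80  # four-fifths rule; tighten if you want an early-warning buffer

def monthly_fairness_check(decisions: pd.DataFrame) -> list[str]:
    """Recompute impact ratios on the latest period's decisions and return alerts.

    `decisions` is assumed to have `group` and `selected` columns; in production
    this would be pulled from the decision log rather than passed in by hand.
    """
    rates = decisions.groupby("group")["selected"].mean()
    ratios = rates / rates.max()
    return [
        f"ALERT: impact ratio {ratio:.2f} for group '{group}' is below {ALERT_THRESHOLD}"
        for group, ratio in ratios.items()
        if ratio < ALERT_THRESHOLD
    ]

sample = pd.DataFrame({
    "group":    ["A"] * 100 + ["B"] * 100,
    "selected": [1] * 40 + [0] * 60 + [1] * 25 + [0] * 75,
})
for alert in monthly_fairness_check(sample):
    print(alert)
```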

The NYC LL 144 Enforcement Reality Check

The December 2025 audit by the New York State Comptroller painted a brutal picture of LL 144 enforcement. Key findings:

  • DCWP received only two AEDT complaints during the audit period (July 2023–June 2025). Two. In a city with thousands of employers using automated hiring tools.
  • DCWP surveyed 32 companies and found one instance of non-compliance. The Comptroller’s team reviewed the same 32 companies and found at least 17 instances of potential non-compliance.
  • DCWP lacks technical expertise to evaluate AEDTs and didn’t consult with the NYC Office of Technology and Innovation despite having a memorandum of understanding to do so.
  • The complaint-based enforcement model is fundamentally broken: if an employer doesn’t post a bias audit or notify candidates (i.e., doesn’t comply at all), there’s no mechanism to detect non-compliance.

What does this mean for you? Two things. First, the current enforcement gap won’t last — the Comptroller’s report is a public embarrassment that will force DCWP to step up enforcement. Second, early compliance gives you a competitive advantage when DCWP starts conducting proactive investigations.

The Mobley v. Workday Warning

Beyond statutory audit requirements, the ongoing class action Mobley v. Workday, Inc. is reshaping the liability landscape for AI vendors. Derek Mobley alleged that Workday’s AI-powered hiring tools discriminated against him based on race, age, and disability. In July 2024, a California federal judge allowed the claims to proceed, ruling that AI service providers could be directly liable under an “agent” theory — even though Workday wasn’t the employer.

In May 2025, the court granted conditional certification of the ADEA claims, allowing the case to move forward as a collective action. If Mobley succeeds, every AI vendor selling hiring tools faces direct discrimination liability — not just the employers using them.

The implication for fairness audits: if you’re a vendor, you can’t shift audit responsibility to your customers. Both developers and deployers need to test for bias independently.

30/60/90-Day Implementation Roadmap

Days 1-30: Foundation

| Week | Deliverable | Owner | Dependencies |
|---|---|---|---|
| 1-2 | Complete AI system inventory | CRO / Head of Compliance | IT asset management data, vendor contracts |
| 2-3 | Risk-classify all systems, identify audit triggers | Compliance team | Completed inventory, legal review of applicable laws |
| 3-4 | Issue RFP for independent auditor (or engage internal audit if capabilities exist) | Procurement + Compliance | Budget approval, vendor evaluation criteria |
| 4 | Draft data collection plan for demographic and outcome data | Data engineering + Legal | Privacy review of demographic data handling |

Days 31-60: Execution

| Week | Deliverable | Owner | Dependencies |
|---|---|---|---|
| 5-6 | Complete data extraction and preparation | Data engineering | Access to model outputs, demographic data sources |
| 6-7 | Auditor conducts bias testing | External auditor | Clean data delivery, model access or output access |
| 7-8 | Receive preliminary results, begin root cause analysis | Model risk team + Auditor | Completed testing |
| 8 | Develop remediation plan for identified disparities | Model risk + Engineering | Root cause analysis results |

Days 61-90: Remediation and Governance

| Week | Deliverable | Owner | Dependencies |
|---|---|---|---|
| 9-10 | Implement technical remediations (feature engineering, threshold adjustments, human review additions) | Engineering + Data science | Approved remediation plan |
| 10-11 | Re-test remediated models, validate improvement | Auditor or internal validation | Remediated model in staging |
| 11-12 | Publish audit results (LL 144), file documentation (SB 205), update technical files (EU AI Act) | Compliance | Final audit report |
| 12 | Establish ongoing monitoring cadence and alert thresholds | Model risk team | Monitoring infrastructure |

So What? Why This Matters Right Now

The window between “fairness audits are optional best practice” and “fairness audits are legally required” is closing. Colorado SB 205 is already live. The EU AI Act’s high-risk deadline is four months away. And existing anti-discrimination law — as the iTutorGroup settlement and Mobley v. Workday demonstrate — already covers AI-driven decisions.

The companies that build audit programs now get three advantages: they catch bias before it becomes a lawsuit, they build institutional muscle before regulators start asking questions, and they create auditable documentation that serves as an affirmative defense. The companies that wait get to explain to the Colorado Attorney General — or a federal judge — why they didn’t test a system they knew was making consequential decisions about people’s lives.

If you’re building an AI fairness audit program from scratch, the AI Risk Assessment Template gives you the structured assessment framework, risk tiering methodology, and documentation templates to get your audit program operational in weeks instead of months.

FAQ

How much does an algorithmic fairness audit cost?

Costs vary widely based on scope. A LL 144-compliant bias audit for a single AEDT from a specialized firm typically runs $5,000–$25,000. A comprehensive fairness audit covering multiple AI systems across a financial institution can cost $50,000–$200,000+ depending on the number of systems, data complexity, and regulatory scope. Big 4 firms tend to price at the higher end; specialized AI audit firms offer more affordable options for mid-market companies.

Can we conduct fairness audits in-house instead of hiring an external auditor?

It depends on the regulation. NYC LL 144 requires an “independent auditor” — meaning someone who is not the employer or a direct agent of the employer. Colorado SB 205 doesn’t explicitly require third-party audits but does require impact assessments that demonstrate “reasonable care.” The EU AI Act requires conformity assessments that may need third-party involvement for high-risk categories. Even where self-assessment is technically permitted, using an independent auditor dramatically increases the credibility and legal defensibility of your results.

What happens if our AI system fails a fairness audit?

Failure isn’t binary — it depends on the severity and context. A marginal impact ratio (e.g., 0.75 when the threshold is 0.80) with a documented business justification may be acceptable. A severe disparity (e.g., 0.40 impact ratio for a protected class) with no justification likely requires immediate remediation or suspension of the system. Under LL 144, you can still use a system that shows disparate impact — the law requires the audit and publication, not a specific outcome. Under SB 205, continued use of a system with known algorithmic discrimination risk without remediation could be evidence of failure to exercise “reasonable care.”

Rebecca Leung

Rebecca Leung has 8+ years of risk and compliance experience across first and second line roles at commercial banks, asset managers, and fintechs. Former management consultant advising financial institutions on risk strategy. Founder of RiskTemplates.
