SOC 2 Type 2 for AI Companies 2026: AI-Specific Evidence Auditors Look For

Last updated: April 2026 · Category: AI Compliance · Author: Knowlee Team

If you sell software to mid-market or enterprise buyers in 2026, SOC 2 is no longer a "nice security badge"; it is the procurement gate. Vendor risk teams will not even open a commercial conversation without a current SOC 2 Type 2 report on the table. That has been true for SaaS for years. What is new is what those same vendor risk teams are now writing into their AI-vendor questionnaires: model versioning policies, prompt logging retention, training-data lineage, hallucination handling, sub-processor chains for LLM providers, and AI literacy training records that mirror Article 4 of the EU AI Act.

The American Institute of Certified Public Accountants (AICPA), which owns the SOC 2 framework via its Trust Services Criteria, has not (as of April 2026) released a dedicated "AI module." But the Trust Services Criteria are written as principles, and AI-fluent auditors are interpreting those principles against AI-specific risks the framework's 2017 update never anticipated. This guide covers what is genuinely different about a SOC 2 Type 2 audit for an AI company in 2026 — what evidence auditors are asking for, how SOC 2 composes with the AI Act and ISO/IEC 42001, and how to prepare without rebuilding your entire engineering organization.

We will be specific. SOC 2 is a US framework managed by the AICPA; it is not a regulatory certification but an attestation report issued by an independent CPA firm. We are not auditors, and we have not been engaged to perform any opinion in this document; none of what follows substitutes for advice from a licensed CPA firm. What we can do is share what we see across our customers' AI vendor questionnaires, what AI-specific controls are landing in their final reports, and how the framework composes with the AI Act compliance work many of those same teams are running in parallel.

Type 1 vs Type 2: the distinction that actually matters in procurement

Both Type 1 and Type 2 reports are issued under the same Trust Services Criteria: Security (always required), plus Availability, Processing Integrity, Confidentiality, and Privacy, which are optional and may be scoped in any combination. The distinction is the observation period:

  • SOC 2 Type 1 is point-in-time. The auditor inspects your controls on a single date and gives an opinion on whether they are suitably designed to meet the criteria. You can finish a Type 1 in 6–10 weeks of preparation plus 2–4 weeks of audit fieldwork.
  • SOC 2 Type 2 is a period-of-time observation, typically 6 to 12 months (3 months in some bridged engagements, but most enterprise buyers want minimum 6, often 12). The auditor tests whether your controls are operating effectively over that window — meaning they actually fired, not just that they exist on paper.

In procurement, the gap between Type 1 and Type 2 is enormous. A Type 1 tells a buyer "this vendor wrote down their controls." A Type 2 tells a buyer "this vendor's controls have demonstrably been on for a year." Most enterprise vendor risk teams will accept a Type 1 as a bridge — proof that you are pursuing certification — but will require a Type 2 within 6 to 12 months for renewal. Some procurement gates accept only Type 2 from day one.

For an AI company specifically, the Type 2 observation window matters more than for traditional SaaS. AI systems drift. Models get retrained, prompts get edited, vendor LLM endpoints version up. A point-in-time Type 1 cannot show a buyer that your prompt audit trail was continuously logging across all those changes — only a Type 2 can. This is why an increasing number of AI-specific RFPs require Type 2 with an observation period that overlaps any major model migration in scope.

The 5 SOC 2 Trust Services Criteria — through an AI lens

The Trust Services Criteria (TSC), issued by the AICPA and most recently updated in 2017 with subsequent points-of-focus refreshes, define what a SOC 2 audit covers. Security is always in scope. The other four — Availability, Processing Integrity, Confidentiality, Privacy — are elective, and most AI companies selling to enterprise scope at minimum Security + Confidentiality, with Availability added when SLAs are part of the contract and Privacy added when personal data is processed (which, for any AI company touching customer data, is essentially always).

What is new in 2026 is how AI-fluent auditors interpret each criterion when the system under audit is an AI product. Here is what they are now asking for.

Security

The classic Security criterion covers logical access, change management, system monitoring, incident response, and vendor management. AI-specific evidence layered on top in 2026 includes:

  • Model registry access controls. Who can promote a model to production? Is that gated by an approval workflow, with the approver, timestamp, and model version logged? (A minimal promotion gate is sketched after this list.)
  • Prompt-injection mitigation testing. Have you red-teamed your prompts? Auditors are starting to ask for prompt-injection test artifacts: a documented adversarial test suite, results, and mitigation evidence (input sanitization, output filtering, system-prompt hardening).
  • Secrets handling for LLM API keys. Provider API keys (OpenAI, Anthropic, Google, Mistral, etc.) are credentials. They need to live in a secrets manager, rotate on a documented cadence, and be revoked on offboarding — same standard as database credentials.
  • Sub-processor chain documentation. When a customer's data flows through your stack into a third-party LLM, the LLM provider is a sub-processor. The chain — and the underlying contractual controls — need to be documented and reviewed.
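
To make the first bullet concrete, here is a minimal sketch of an approval-gated promotion that records the evidence an auditor would sample: who approved, which version, and when. The allowlist, function, and file names are our illustrations, not any particular registry's API.

```python
# Minimal sketch of an approval-gated model promotion. Illustrative only:
# the allowlist and JSONL audit log stand in for your IdP roles and registry.
import datetime
import json

APPROVERS = {"alice@example.com", "bob@example.com"}  # role-gated allowlist

def promote_to_production(model_name: str, version: str, approver: str,
                          audit_log_path: str = "promotions.jsonl") -> None:
    """Promote a model version only if the approver is authorized, and
    append an audit record of who promoted what, and when."""
    if approver not in APPROVERS:
        raise PermissionError(f"{approver} is not authorized to promote models")
    record = {
        "event": "model_promotion",
        "model": model_name,
        "version": version,
        "approver": approver,
        "timestamp": datetime.datetime.now(datetime.timezone.utc).isoformat(),
    }
    with open(audit_log_path, "a") as f:  # append-only by convention
        f.write(json.dumps(record) + "\n")
    # ...the actual deployment/registry call would go here...
```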

Availability

Traditional Availability evidence is uptime monitoring, capacity planning, backup and restore testing, DR runbooks. The AI-specific layer:

  • LLM provider failover. If your only LLM provider is OpenAI and OpenAI goes down, what happens to your service SLA? Multi-provider routing or graceful degradation evidence is becoming table stakes. (A failover-with-backoff sketch follows this list.)
  • Inference latency monitoring. Availability is not just up/down; for AI products, p95/p99 inference latency is part of what customers consider "available."
  • Rate limit handling. Vendor LLM APIs throttle. Documented retry/backoff and queueing evidence.
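
A minimal sketch of that failover-plus-backoff pattern. The provider client objects and their .complete() method are placeholders, not a real SDK; swap in your own clients and error types.

```python
# Illustrative retry-with-backoff plus provider failover. ProviderError and
# provider.complete() are hypothetical stand-ins for your SDK's equivalents.
import random
import time

class ProviderError(Exception):
    """Stand-in for a throttling or transient provider error."""

def complete_with_failover(prompt: str, providers: list, max_retries: int = 3) -> str:
    """Try providers in priority order; back off exponentially on transient
    errors, then fail over to the next provider in the list."""
    for provider in providers:
        for attempt in range(max_retries):
            try:
                return provider.complete(prompt)  # hypothetical client call
            except ProviderError:
                # exponential backoff with jitter: ~1s, ~2s, ~4s
                time.sleep((2 ** attempt) + random.uniform(0, 0.5))
        # this provider exhausted its retries; fall through to the next one
    raise RuntimeError("All LLM providers failed; trigger graceful degradation")
```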

Processing Integrity

This criterion is about whether the system processes data completely, accurately, in a timely manner, and as authorized. For an AI system, "accuracy" is the open question. Auditors are not asking you to certify that your model is correct; they are asking how you know when it is wrong.

  • Output verification controls. For generative outputs, what guardrails run before content is delivered to a user? (Content filters, schema validation for structured outputs, hallucination detection layers, citation grounding checks.) A schema-validation sketch follows this list.
  • Model performance monitoring. Continuous evaluation against a held-out test set. Drift alerts. A documented threshold for triggering rollback or retraining.
  • Human-in-the-loop enforcement. Where your policy says "a human reviews this before it goes out," can you produce evidence — per-decision — that a human actually did?
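
For structured outputs, the schema-validation guardrail from the first bullet can be as small as the sketch below. It uses the open-source jsonschema package; the invoice schema is an illustrative stand-in for whatever structure your product promises.

```python
# Sketch of a pre-delivery guardrail: reject generative output that does not
# parse as JSON matching the promised schema. Requires `pip install jsonschema`.
import json
import jsonschema

INVOICE_SCHEMA = {  # illustrative; replace with your product's contract
    "type": "object",
    "properties": {
        "vendor": {"type": "string"},
        "amount": {"type": "number", "minimum": 0},
        "currency": {"type": "string", "enum": ["USD", "EUR", "GBP"]},
    },
    "required": ["vendor", "amount", "currency"],
    "additionalProperties": False,
}

def validate_model_output(raw_output: str) -> dict:
    """Parse and validate a model's structured output before it reaches the
    user; raises (and should log/alert) instead of delivering malformed data."""
    data = json.loads(raw_output)  # fails loudly on non-JSON output
    jsonschema.validate(instance=data, schema=INVOICE_SCHEMA)
    return data
```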

Confidentiality

This criterion covers the protection of confidential customer data. The AI-specific evidence concerns the data that enters and leaves the model:

  • Prompt and completion retention policy. How long are prompts and model outputs stored? Where? Who can access them? Auditors want a documented policy and evidence that the policy is enforced (TTL on storage, access logs, deletion attestations). A retention-enforcement sketch follows this list.
  • Vendor LLM zero-data-retention (ZDR) confirmations. If you tell customers their data is not used for vendor training, you need contractual proof from each LLM provider — typically a signed DPA or enterprise agreement clause confirming ZDR — and you need to be able to produce it in audit.
  • Tenant isolation in shared infrastructure. If a single LLM key serves multiple customers, what stops cross-tenant prompt or completion leakage? Logical isolation evidence.
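
Retention enforcement is what auditors actually test here: not the policy document, but proof the deletion runs. A minimal sketch, assuming prompt logs live in a SQLite table with created_at and legal_hold columns; in cloud object storage the equivalent is a lifecycle expiration rule plus deletion reports.

```python
# Minimal sketch of retention-policy enforcement for stored prompts and
# completions. Table and column names are hypothetical.
import datetime
import sqlite3

RETENTION_DAYS = 90  # must match the documented retention policy

def purge_expired_prompt_logs(db_path: str = "prompt_logs.db") -> int:
    """Delete prompt/completion rows past retention, excluding legal holds,
    and return the count as evidence for the deletion attestation."""
    cutoff = (datetime.datetime.now(datetime.timezone.utc)
              - datetime.timedelta(days=RETENTION_DAYS)).isoformat()
    conn = sqlite3.connect(db_path)
    cur = conn.execute(
        "DELETE FROM prompt_logs WHERE created_at < ? AND legal_hold = 0",
        (cutoff,),
    )
    conn.commit()
    deleted = cur.rowcount
    conn.close()
    return deleted  # log this number; it is the attestation evidence
```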

Privacy

Privacy under SOC 2 references the AICPA's Generally Accepted Privacy Principles. Where personal data is processed by an AI system, the AI-specific overlay is:

  • Training-data lineage. If you fine-tune or train on customer data, can you trace what went in? What was the lawful basis? What deletion rights apply?
  • Data subject request (DSR) handling for AI artifacts. When a user invokes the right to erasure, does that include their prompts and completions? The embeddings derived from their data? (An erasure-cascade sketch follows this list.)
  • Cross-border data flow documentation. LLM providers are typically US-based. For EU customer data, transfer mechanism evidence (SCCs, supplementary measures) needs to extend through the LLM hop.
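
A sketch of the erasure cascade from the second bullet. The store objects and their methods are placeholders; the point is that a single DSR fans out to every AI artifact derived from the user's data and returns a receipt for the evidence file.

```python
# Illustrative erasure cascade for a data subject request. prompt_store and
# vector_index (and their delete methods) are hypothetical placeholders.
def erase_user_ai_artifacts(user_id: str, prompt_store, vector_index) -> dict:
    """Apply a right-to-erasure request across AI artifacts and return a
    per-store receipt suitable for the DSR evidence file."""
    receipt = {"user_id": user_id}
    # prompts and completions are personal data
    receipt["prompts_deleted"] = prompt_store.delete_where(user_id=user_id)
    # embeddings derived from the user's data are personal data too
    receipt["embeddings_deleted"] = vector_index.delete(filter={"user_id": user_id})
    return receipt
```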

AI-specific evidence categories: what to actually have ready

Below are the evidence categories AI-fluent auditors are most consistently requesting in 2026 SOC 2 Type 2 engagements. None of these are formally codified in the AICPA Trust Services Criteria; all of them are being mapped to existing TSC controls (CC6, CC7, CC8 most commonly) by auditors interpreting principles against AI risk. Treat this as a working checklist — not the AICPA's published guidance.

1. Model versioning and rollback

  • A model registry (MLflow, Weights & Biases, custom, or part of your model-serving platform) recording every model promoted to production: name, version, training run, training data snapshot reference, evaluation metrics, approver, deployment timestamp.
  • Documented rollback procedure with a tested example (an actual rollback run, not just a runbook).
  • Tagging discipline so any production inference can be traced to the exact model version that produced it. (A traceability sketch follows this list.)
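
A registry entry and inference tag can be as simple as the sketch below. The dataclass stands in for whatever MLflow, Weights & Biases, or your serving platform records; the field names are ours.

```python
# Sketch of inference-to-model traceability: every logged response carries
# the exact model version that produced it. Field names are illustrative.
import dataclasses
import datetime

@dataclasses.dataclass(frozen=True)
class ModelVersionRecord:
    name: str
    version: str
    training_run_id: str    # link back to the training run
    data_snapshot_ref: str  # training-data lineage pointer
    approver: str
    deployed_at: str

def tag_inference(output: str, record: ModelVersionRecord) -> dict:
    """Attach the producing model version to the inference payload/log."""
    return {
        "output": output,
        "model": record.name,
        "model_version": record.version,
        "logged_at": datetime.datetime.now(datetime.timezone.utc).isoformat(),
    }
```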

2. Prompt audit trail

This is the single most-requested AI-specific evidence in 2026 SOC 2 Type 2s. Auditors are asking for:

  • Per-call logging: timestamp, calling user/service, model name and version, full input prompt (or hashed reference if PII is in scope), full output, token counts, cost.
  • Retention period documented and enforced. Common patterns: 90 days hot, 12 months cold, deletion thereafter unless legal hold.
  • Tamper-evidence (append-only storage, write-once buckets, or cryptographic chaining) so you can attest the log was not edited after the fact. A hash-chaining sketch follows this list.
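
Of the three tamper-evidence options, cryptographic chaining is often the cheapest to retrofit. In the sketch below, each record embeds the previous record's hash, so editing or deleting any line breaks verification; the first record chains to a fixed GENESIS sentinel. Illustrative only, and not a substitute for write-once storage.

```python
# Minimal sketch of a hash-chained, append-only prompt log (stdlib only).
import datetime
import hashlib
import json

def append_log_entry(path: str, entry: dict, prev_hash: str) -> str:
    """Append one prompt/completion record, chained to the previous hash.
    Pass "GENESIS" as prev_hash for the first record."""
    entry = dict(entry)
    entry["timestamp"] = datetime.datetime.now(datetime.timezone.utc).isoformat()
    entry["prev_hash"] = prev_hash
    # hash everything except the hash field itself, deterministically
    entry["hash"] = hashlib.sha256(
        json.dumps(entry, sort_keys=True).encode()
    ).hexdigest()
    with open(path, "a") as f:
        f.write(json.dumps(entry, sort_keys=True) + "\n")
    return entry["hash"]

def verify_chain(path: str) -> bool:
    """Re-walk the file; any edited, reordered, or deleted record breaks it."""
    prev = "GENESIS"
    for line in open(path):
        entry = json.loads(line)
        claimed = entry.pop("hash")
        recomputed = hashlib.sha256(
            json.dumps(entry, sort_keys=True).encode()
        ).hexdigest()
        if entry["prev_hash"] != prev or recomputed != claimed:
            return False
        prev = claimed
    return True
```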

For a deeper implementation walkthrough, see our AI audit trail implementation guide.

3. Vendor LLM passthrough — DPAs, ZDR, sub-processor chain

  • Signed Data Processing Agreement (DPA) with each LLM provider in production scope.
  • Explicit zero-data-retention or no-training-use confirmation, in contract — not in marketing copy.
  • Sub-processor list maintained and shared with customers, with notification mechanism for additions or changes.
  • Evidence of vendor security review: SOC 2 reports of your LLM providers, ISO 27001 certificates, completed security questionnaires.

This is one of the higher-leverage areas to systematize early; our AI vendor risk assessment checklist covers the standard evidence enterprise buyers are now demanding from AI vendors and their sub-processors.

4. AI hallucination and incorrect-output handling controls

  • Documented policy: how you classify model errors, what triggers an incident, severity thresholds.
  • Output filtering layer evidence: content moderation, factuality checks, structured-output schema validation, refusal rules.
  • User reporting mechanism with SLA for review and remediation.
  • Trend analytics — are hallucination reports going up or down quarter over quarter? Auditors increasingly want to see the trend, not just the policy.

5. AI literacy training records (Article 4 AI Act crossover)

The EU AI Act, in force progressively from August 2024 onward, requires under Article 4 that providers and deployers of AI systems ensure a sufficient level of AI literacy among staff. AI-fluent SOC 2 auditors are now asking for this evidence even when the audited entity is US-domiciled — because their enterprise customers in Europe ask for it, and SOC 2 Common Criteria CC1 covers personnel competency.

  • Annual AI literacy training curriculum, role-tailored.
  • Completion tracking per employee, with timestamps.
  • Retention of records for at least the audit period (better: indefinitely for active employees, plus retention period after offboarding).

6. Human-in-the-loop policy enforcement

  • For AI outputs that policy designates as requiring human review (high-risk decisions, regulated industries, high-stakes customer-facing content): per-decision evidence that a human actually reviewed.
  • The reviewer's identity, the timestamp, and ideally the diff between the model's draft and the published version. (A review-record sketch follows this list.)
  • This evidence dovetails with AI Act Article 14 (human oversight for high-risk systems) — the same logs satisfy both frameworks, which is exactly the kind of cross-mapping leverage you want to build for.
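
A per-decision review record, sketched with the standard library's difflib; the field names are ours. Note that an empty diff is still evidence: it shows a human looked and chose not to edit.

```python
# Sketch of per-decision human-review evidence: reviewer, timestamp, and the
# draft-vs-published diff, appended to a JSONL evidence log.
import datetime
import difflib
import json

def record_human_review(draft: str, published: str, reviewer: str,
                        decision_id: str,
                        path: str = "hitl_reviews.jsonl") -> None:
    """Persist evidence that a named human reviewed (and possibly edited)
    the model's draft before release."""
    diff = "\n".join(difflib.unified_diff(
        draft.splitlines(), published.splitlines(),
        fromfile="model_draft", tofile="published", lineterm=""))
    record = {
        "decision_id": decision_id,
        "reviewer": reviewer,
        "reviewed_at": datetime.datetime.now(datetime.timezone.utc).isoformat(),
        "draft_to_published_diff": diff,  # empty diff still proves review
    }
    with open(path, "a") as f:
        f.write(json.dumps(record) + "\n")
```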

7. Model performance monitoring and degradation alerts

  • Continuous evaluation pipeline running against a held-out or production-shadow test set.
  • Threshold-based alerting on accuracy, latency, hallucination rate, refusal rate, and drift metrics. (A threshold-check sketch follows this list.)
  • Runbook for what happens when an alert fires — who triages, decision tree to rollback, retrain, or accept.
  • Evidence of alerts firing during the observation period and how they were resolved.
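
The threshold check itself is trivial; what matters for the audit is that the thresholds are documented and the breach records survive the observation period. A minimal sketch, with the threshold values and the alert sink as placeholders for your own numbers and paging system.

```python
# Illustrative threshold-based degradation check over an eval run's metrics.
# Threshold values and the alert callable are placeholders.
ACCURACY_FLOOR = 0.92          # documented rollback/retrain threshold
HALLUCINATION_CEILING = 0.03   # documented incident threshold

def check_eval_run(metrics: dict, alert) -> list:
    """Compare an evaluation run's metrics to documented thresholds, fire
    alerts, and return breach records as observation-period evidence."""
    breaches = []
    if metrics.get("accuracy", 1.0) < ACCURACY_FLOOR:
        breaches.append(("accuracy", metrics["accuracy"], ACCURACY_FLOOR))
    if metrics.get("hallucination_rate", 0.0) > HALLUCINATION_CEILING:
        breaches.append(("hallucination_rate",
                         metrics["hallucination_rate"], HALLUCINATION_CEILING))
    for name, value, threshold in breaches:
        alert(f"{name}={value} breached threshold {threshold}; "
              "run the degradation runbook (triage, then rollback/retrain/accept)")
    return breaches
```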

Composing SOC 2 with the EU AI Act and ISO/IEC 42001 — the three-framework strategy

In 2026, no serious AI company doing enterprise sales runs SOC 2 in isolation. The frameworks that compose with it are:

  • EU AI Act (Regulation (EU) 2024/1689) — a legal regulation, not a voluntary attestation. Covers prohibited AI practices, high-risk system obligations (registration in the EU database, technical documentation, post-market monitoring, conformity assessment), transparency obligations for foundation models, and AI literacy. Penalties scale to 7% of global annual turnover for prohibited-practice violations.
  • ISO/IEC 42001:2023 — an international management-system standard for AI, structured similarly to ISO/IEC 27001 (information security) and ISO 9001 (quality). It is voluntary, certifiable by accredited certification bodies, and structures your AI governance as a Plan-Do-Check-Act management system with an AI policy, AI risk assessment process, AI impact assessment, and continual improvement.
  • SOC 2 Type 2 — the AICPA Trust Services Criteria attestation. Voluntary, US-rooted, but the most common procurement gate in enterprise software sales globally.

The three are intentionally overlapping but not redundant. SOC 2 attests to your trust criteria implementation. The AI Act imposes legal obligations specific to AI risk classes. ISO 42001 builds the management system that makes consistent compliance with both achievable and demonstrable.

Auditors and regulators increasingly cross-map. A control documenting model versioning satisfies SOC 2 CC8 (change management), ISO 42001 clause 8 (operation), and AI Act Annex IV technical-documentation requirements simultaneously. A prompt audit trail satisfies SOC 2 CC7 (system monitoring), ISO 42001 clause 9 (performance evaluation), and AI Act Article 12 (record-keeping for high-risk systems). The three-framework strategy — once you have the underlying evidence — costs marginally more than running any one framework alone, because the same control evidence maps to multiple controls across frameworks.

For a side-by-side feature comparison, see our ISO 42001 vs SOC 2 vs ISO 27001 comparison. For the EU regulatory layer specifically, see our AI Act compliance software guide and the broader AI compliance checklist 2026.

A practical sequencing for AI-vertical companies in 2026 typically looks like:

  1. AI Act readiness first if you have any EU exposure — because it is law, not optional, and the high-risk system registration deadlines have teeth.
  2. SOC 2 Type 1 then Type 2 in parallel, because procurement will not wait. Most companies pursue a Type 1 in months 1–3 of their compliance program and a Type 2 covering months 4–9 (or 4–15 for a 12-month observation).
  3. ISO/IEC 42001 certification layered on once the SOC 2 program is operating. The management-system discipline ISO 42001 enforces is what keeps the SOC 2 controls operating effectively year over year.

For the ISO 42001 deep dive, see our ISO 42001 checklist for AI management.

Audit prep playbook — the 6-month timeline

Most first-time SOC 2 Type 2 engagements for AI companies follow this rough shape:

Months -6 to -4 (pre-observation). Scope the audit (which TSC, which systems, which entities). Choose your auditor — for AI companies, prioritize firms with documented AI-engagement experience; ask for redacted examples of AI-specific controls they have tested. Choose your compliance automation platform — see our Vanta pricing 2026 breakdown for the most common option, or compare alternatives in our AI governance platform 2026 overview. Implement controls that are missing. Run a readiness assessment with your auditor (often a separate engagement) to surface gaps before the observation window opens.

Months -3 to 0. Close readiness gaps. Test your evidence collection automation end-to-end — prompt logs, model registry, access reviews, vendor reviews, training completion. Lock down the scope.

Months 0 to +6 (observation period). Controls operate. Evidence accumulates. Resist the temptation to change controls mid-window — every change creates a "did the new control operate effectively" question. Track issues and remediations transparently; auditors prefer "we found a gap, here is the remediation evidence" over silence.

Months +6 to +9 (audit fieldwork and report). Auditor samples evidence, conducts interviews, tests controls. Common findings on first audit: incomplete access reviews (offboarded users still in systems), inconsistent change-management evidence (PRs merged without recorded approvals), gaps in vendor reviews, missing AI literacy training records, prompt logs with PII not properly masked. Most are remediable; very few are report-blocking when surfaced early.

Common evidence collection automation patterns. Compliance automation platforms (Vanta, Drata, Secureframe, Sprinto, Thoropass) handle the traditional SOC 2 control evidence well: identity, infrastructure, code changes, vendor management. None of them, as of April 2026, ships out-of-the-box AI-specific evidence collection (prompt logs, model registry events, AI literacy training, AI Act registrations). That layer is still on you. The pattern we see working: keep your SOC 2 automation platform for the traditional controls, then add an AI-evidence layer on top, either home-grown logging plus a manual audit binder or an AI-governance platform composing with the SOC 2 tool. A minimal binder-export sketch follows.
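
For the home-grown path, the binder itself can be a zip with a hash manifest, so the bundle you hand the auditor is integrity-checkable. A minimal sketch using only the standard library; the paths are illustrative.

```python
# Sketch of a home-grown audit-binder export: bundle AI-specific evidence
# files into a zip with a SHA-256 manifest. Directory layout is illustrative.
import hashlib
import json
import zipfile
from pathlib import Path

def export_audit_binder(evidence_dir: str,
                        out_path: str = "ai_evidence_binder.zip") -> str:
    """Zip every evidence file with a manifest of content hashes, so the
    auditor can verify the binder was not altered after export."""
    manifest = {}
    with zipfile.ZipFile(out_path, "w", zipfile.ZIP_DEFLATED) as zf:
        for file in sorted(Path(evidence_dir).rglob("*")):
            if file.is_file():
                manifest[str(file)] = hashlib.sha256(file.read_bytes()).hexdigest()
                zf.write(file)
        zf.writestr("MANIFEST.json", json.dumps(manifest, indent=2))
    return out_path
```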

AI-vertical SOC 2 vs traditional SOC 2 — what is different in 2026

Compared to a traditional SaaS SOC 2 Type 2, the AI-vertical version differs in three concrete ways:

  • Scope is wider. "The system" includes the model, the prompts, the training data lineage, the inference pipeline, and the chain of LLM sub-processors — not just the application and its database.
  • Observation is more demanding. Models change. Prompts evolve. Vendor LLM endpoints version up. Continuous evidence (prompt logs, model registry, evaluation runs) needs to be tamper-evident across the full observation window, not just sampled at quarter-end.
  • Auditor selection matters more. A SOC 2 firm without AI engagement experience will over-scope the audit, under-scope it (missing AI-specific risks entirely), or both. Ask candidate firms for their AI-engagement methodology, redacted control examples, and how they handle prompt-log evidence collection. A firm that cannot answer those questions is the wrong firm for an AI vertical.

Frequently asked questions

Do we need SOC 2 if our entire LLM stack is OpenAI?

Yes. SOC 2 attests to your controls, not OpenAI's. OpenAI's SOC 2 (and ISO 27001, and ISO 42001 once issued) tells your auditors that the LLM provider has its own controls — that is a piece of your vendor management evidence, not a substitute for your own report. Enterprise procurement will ask for both: the SOC 2 of the application they are buying (you), and evidence that your sub-processors are themselves audited.

Does the EU AI Act replace SOC 2?

No. Different jurisdictions, different scopes, different mechanisms. SOC 2 is a US AICPA attestation, voluntary, focused on trust criteria. The AI Act is EU regulation, mandatory, focused on AI-specific risk classes and obligations. They overlap on evidence — model versioning, audit logs, AI literacy — but neither substitutes for the other. Most AI companies selling to enterprise need both. See our AI Act compliance software guide for the regulatory side.

What does a first SOC 2 Type 2 cost for an AI startup?

As of April 2026, total first-year SOC 2 Type 2 spend for an AI startup typically lands in the $40k–$120k range, depending on scope and remediation needs. Components: auditor fees ($20k–$60k for the Type 2 engagement, scaling with scope and TSC selection), compliance automation platform ($10k–$40k/year), readiness consulting (often bundled or $5k–$25k), and engineering time to close gaps (highly variable, often the largest hidden cost). AI-specific scope tends to push the upper end of these ranges because the auditor has more areas to test. These are illustrative ranges from public reports and customer engagements, not quotes; get fixed-fee proposals from at least three audit firms.

Which audit firms have AI fluency?

We do not endorse specific audit firms. The signal to look for: ask the partner who would lead your engagement to walk through how they have audited (a) a prompt audit trail, (b) a model registry, (c) a vendor LLM sub-processor chain, in a prior engagement. If they need to escalate the call to a specialist, that is a yellow flag for a SOC 2 Type 2: by the time you are in fieldwork, the lead partner should be the AI-fluent one. Several Big 4 firms, several mid-tier specialists, and several SOC-2-focused boutiques have built AI practices in 2024–2026; the differentiator is the partner, not the firm logo.

How long from start to a delivered Type 2 report?

Realistically, 10–15 months from kickoff to a delivered Type 2 report, broken roughly into 3–6 months of preparation, a 6-month observation period, and 1–3 months of fieldwork and reporting. Companies that already have most security controls in place (and just need the AI-specific overlay) can compress preparation to 1–2 months. Companies starting from scratch should plan a Type 1 first as a procurement bridge while the Type 2 observation is running.

Do we need ISO 42001 too?

Not strictly. SOC 2 alone clears most US enterprise procurement gates. ISO/IEC 42001 becomes valuable when (a) your buyers are European or global enterprises that have started asking for it, (b) you want a management-system discipline that keeps your SOC 2 controls operating effectively year over year, or (c) you are building toward AI Act conformity for high-risk systems and want the management-system foundation. Most AI companies serious about enterprise add 42001 in year 2 or 3, after the SOC 2 program is mature. See our ISO 42001 checklist for AI management for the certification scope.

Conclusion: SOC 2 for AI companies is a composing problem, not a separate audit

The shift in 2026 is not that SOC 2 changed — the AICPA Trust Services Criteria are still the 2017 framework, principles-based and unchanged on the surface. The shift is that AI-fluent auditors interpret those principles against AI-specific risks: model versioning, prompt audit trails, vendor LLM passthrough, hallucination handling, AI literacy. And the same evidence increasingly satisfies SOC 2, ISO 42001, and AI Act obligations simultaneously — if you build the underlying evidence layer with cross-mapping in mind.

That is exactly the layer Knowlee 4Legals operates at. We do not issue SOC 2 reports — that is the auditor's job, and we are not a CPA firm. What we do is build and maintain the AI-specific evidence layer underneath: AI Act high-risk-system registration support, ISO 42001 management-system documentation, AI literacy training and tracking (Article 4), prompt and model audit trails ready for export to your SOC 2 evidence binder, vendor LLM sub-processor inventory with DPA tracking, and the cross-mapping that lets the same control evidence flow into your SOC 2 audit, your AI Act technical file, and your ISO 42001 management system.

We compose with — not replace — the SOC 2 automation platforms (Vanta, Drata, Secureframe, Sprinto, Thoropass) you may already run for the traditional security controls. They handle infrastructure, identity, and change management. We handle the AI-specific layer. The result is one evidence base, three frameworks satisfied, one audit story for procurement.

If you are scoping a 2026 SOC 2 Type 2 and the AI-specific evidence is the part of the work without an obvious owner, that is the gap we close. Start with the AI compliance checklist 2026, browse the AI Act compliance tool glossary entry, or talk to us about composing the AI evidence layer with whatever SOC 2 program you are already running.