Human-in-the-Loop AI Policy Template (2026) — Roles, Decisions, SLAs, AI Act Coverage
A complete human-oversight policy template with the role × decision × SLA matrix the EU AI Act now requires for high-risk systems — adoptable as-is or adaptable to your governance baseline.
This article covers the policy document template. For the AI Act Article 14 regulatory obligation as a concept, see the human oversight glossary entry. For the HITL workflow design pattern, see the human-in-the-loop glossary entry.
Many organizations reach production with a high-risk AI system and discover their "human oversight" obligation is a sentence in a policy document, not an operational commitment. A reviewer clicks approve, nothing is checked, nobody is accountable. That is rubber-stamping — and it will not survive an audit.
This template converts the obligation into a working system: four named roles, a decision-category matrix with explicit time-bound SLAs, and the evidence trail an auditor needs to confirm oversight actually happened. It aligns directly with the EU AI Act Article 14 requirements and fits inside any organization already working from the complete AI compliance checklist.
What "Human in the Loop" Actually Means Under the AI Act {#what-hitl-means}
"Human in the loop" has become a phrase that covers everything from a real-time approval gate to an annual review of AI outputs. The EU AI Act does not use the phrase at all. What it requires — under Article 14 — is human oversight: the ability of designated persons to understand AI outputs, detect anomalies, intervene, and halt or override the system.
Article 14 Requirements at Plain-English Level
Article 14 of the EU AI Act, read together with the deployer obligations in Article 26, translates into four categories of obligation for high-risk AI systems:
- Designation: At least one named person must be assigned oversight responsibility for each high-risk AI system.
- Competence: That person must have the authority, training, and situational awareness to actually exercise oversight — not just exist on an org chart.
- Interpretability: The system must produce outputs that a qualified human can understand well enough to assess correctness.
- Override capability: The person must be technically able to interrupt, override, or halt the system. This is a technical requirement on the provider, not just a policy commitment on the deployer.
A policy that designates a reviewer but gives them no training, no alert mechanism, and no override capability does not satisfy Article 14. The policy must be operationalized.
HITL vs. HOTL vs. HIC — The 3 Oversight Modes
The three oversight modes differ in timing and depth:
| Mode | Full name | When oversight occurs | Typical use |
|---|---|---|---|
| HITL | Human-in-the-loop | Before each individual AI decision is acted on | High-stakes, low-volume decisions: credit, employment, medical triage |
| HOTL | Human-on-the-loop | AI acts; human monitors and can intervene | Medium-stakes, higher-volume: fraud flagging, content moderation |
| HIC | Human-in-command | Human sets policy; AI operates within it; human reviews by exception | Lower-stakes or well-characterized systems: routine classification, scheduling |
Your policy must state, per use case, which mode applies and why. A system with a mix of decision types may require different modes for different output categories.
When Human Oversight Is Mandatory vs. Recommended {#when-required}
Human oversight is mandatory when any of the following conditions is met, whether under the EU AI Act itself or under overlapping rules:
- The AI system is classified as high-risk under Annex III (which covers employment decisions, credit scoring, biometric categorization, critical infrastructure, education access, law enforcement, migration, and administration of justice). For a full classification analysis, see the AI Act high-risk systems classification.
- The system makes or significantly influences decisions that affect an individual's legal status, access to services, or fundamental rights.
- The system operates in a regulated sector (financial services, healthcare, insurance) where sector-specific rules impose human review requirements independently of the AI Act.
- GDPR Article 22 applies: the system makes solely automated decisions with legal or similarly significant effects on natural persons.
Human oversight is strongly recommended (and effectively required for any reasonable governance program) when:
- The system produces outputs that will be used to brief human decision-makers without independent verification.
- The system operates in a context where errors are costly to reverse — even if the AI Act's formal Annex III criteria are not met.
- Your organization has made contractual commitments to clients or regulators about AI governance.
If your system does not currently meet the formal mandatory threshold, this policy template still applies. Documenting that you assessed the threshold and determined the oversight mode appropriate to the risk level is itself a governance artifact worth having.
The 4 Oversight Roles Every Policy Needs {#oversight-roles}
A functional human oversight policy requires four distinct roles. In small organizations, one person may hold multiple roles — but the accountability structure must remain clean. Role conflation is a common policy failure: when the AI Owner also approves decisions, there is no independent check.
For context on how these roles interact with ISO 42001 management system requirements, see the ISO 42001 checklist.
AI Owner (Accountability) {#ai-owner}
The AI Owner is the senior accountable person for the AI system — typically the product owner, business unit head, or system sponsor. This role is not operational; it is accountable.
Responsibilities:
- Approves the AI system's deployment and any significant change to its scope or behavior.
- Signs off on the risk classification and the oversight mode assigned to each decision category.
- Receives escalations that the AI Reviewer cannot resolve within SLA.
- Reviews the AI Auditor's periodic report and signs the audit response.
- Is named in the organization's AI system inventory and in any EU AI Act conformity documentation.
What this role is not: The AI Owner does not review individual decisions. They own the system; they do not operate it.
AI Reviewer (Decision Review SLA) {#ai-reviewer}
The AI Reviewer is the operational human who reviews AI outputs before or after they are acted on, depending on the oversight mode assigned.
Responsibilities:
- Reviews AI outputs within the SLA defined for each decision category (see the matrix below).
- Approves, rejects, or modifies AI recommendations with documented reasoning.
- Escalates anomalies, unexpected patterns, or system malfunctions to the AI Operator.
- Maintains a review log — date, decision ID, action taken, time to decision, any overrides.
Competence requirements: The AI Reviewer must be trained on the system's purpose, the decision criteria in scope, the indicators of system error or bias, and the override procedure. Training must be documented and refreshed at least annually.
Staffing: Calculate reviewer capacity against expected volume and SLAs before going to production. A policy that requires 4-hour review of 200 daily decisions but assigns one part-time reviewer will fail operationally regardless of what it says on paper.
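The staffing check above is simple arithmetic and can be run before go-live. The sketch below is illustrative only: the function name, the minutes-per-review figure, and the review-hours figure are assumptions to replace with your own planning numbers.

```python
import math

def reviewers_needed(daily_decisions: int,
                     minutes_per_review: float,
                     review_hours_per_reviewer_per_day: float) -> int:
    """Minimum number of reviewers needed to clear one day's review volume.

    All inputs are planning assumptions supplied by the organization.
    """
    if daily_decisions < 0:
        raise ValueError("daily_decisions must not be negative")
    if minutes_per_review <= 0 or review_hours_per_reviewer_per_day <= 0:
        raise ValueError("time inputs must be positive")
    total_minutes = daily_decisions * minutes_per_review
    minutes_per_reviewer = review_hours_per_reviewer_per_day * 60
    return math.ceil(total_minutes / minutes_per_reviewer)

# The example from the text: 200 daily decisions.
# Assumed here: 6 minutes per review, 4 review hours per reviewer per day.
print(reviewers_needed(200, 6, 4))  # 1200 minutes / 240 minutes per reviewer = 5
```

Throughput is only a necessary condition. A 4-hour SLA also requires reviewer coverage during the hours when decisions actually arrive, which this calculation does not model.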
AI Operator (In-Flight Intervention) {#ai-operator}
The AI Operator is the person — or team — with technical access to the running system, responsible for monitoring, intervention, and incident response.
Responsibilities:
- Monitors system performance metrics, error rates, and output distributions in real time or near-real time.
- Executes the override or halt procedure when triggered by the AI Reviewer's escalation or by automated alerts.
- Investigates anomalies flagged by the monitoring layer.
- Coordinates with the AI system provider on technical issues that cannot be resolved in-house.
- Documents every intervention: timestamp, trigger, action taken, outcome.
Authority: The AI Operator must have the technical authority to pause or halt the AI system without waiting for AI Owner approval in defined emergency conditions. The conditions triggering emergency halt must be specified in the policy.
AI Auditor (Post-Hoc Review) {#ai-auditor}
The AI Auditor conducts periodic independent review of the system's behavior, the quality of human oversight, and compliance with this policy.
Responsibilities:
- Reviews samples of AI decisions and the corresponding reviewer actions on a defined schedule (at minimum quarterly for high-risk systems).
- Assesses whether SLAs are being met in practice, not just on paper.
- Identifies rubber-stamping patterns (e.g., reviewer approval rate of 99.8% with average review time of 4 seconds).
- Produces a written audit report with findings and recommendations.
- Reports directly to the AI Owner — not through the operational chain.
Independence requirement: The AI Auditor must be independent of the AI Reviewer and AI Operator. For AI compliance automation in fintech and other regulated sectors, independence may be required by supervisory guidance.
The Decision × SLA Matrix {#decision-sla-matrix}
The matrix is the operational core of this policy. It defines how decisions are categorized, what SLA applies to human review in each category, and what triggers escalation or override.
Decision Categories {#decision-categories}
Every AI output that flows to an operational decision should be assigned to one of four categories:
| Category | Definition | Default oversight mode |
|---|---|---|
| Autonomous | AI decision is acted on without human review | HIC (human set policy; AI executes within it) |
| Supervised | AI decision is acted on; human monitors aggregate patterns and can intervene | HOTL |
| Mandatory Review | AI decision is not acted on until a human reviewer approves | HITL |
| Blocked | AI must not make this decision under any conditions; human decides | N/A — AI only informs |
Assigning categories: Walk through your AI system's output types and assign each to a category. The criteria for assignment are:
- Reversibility of the decision if wrong
- Population affected and magnitude of impact
- Regulatory requirement (Annex III, GDPR Article 22, sector regulation)
- Organizational risk appetite
For the first deployment of any high-risk system, default to Mandatory Review for all output types. Downgrade to Supervised only after at least 90 days of operational data confirming low error rate and no significant incidents.
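The downgrade rule above can be written as an explicit check so that the decision is reproducible and leaves a record. This is a minimal sketch: the class and field names are hypothetical, and because the text says only "low error rate", the acceptable rate is a required parameter rather than a built-in default.

```python
from dataclasses import dataclass
from enum import Enum

class Category(Enum):
    """Decision categories mapped to their default oversight mode."""
    AUTONOMOUS = "HIC"
    SUPERVISED = "HOTL"
    MANDATORY_REVIEW = "HITL"
    BLOCKED = "N/A"

MIN_DAYS_OF_DATA = 90  # from the text: at least 90 days of operational data

@dataclass
class OperatingRecord:
    days_in_production: int
    decisions_reviewed: int
    decisions_found_wrong: int
    significant_incidents: int

def may_downgrade_to_supervised(record: OperatingRecord,
                                max_error_rate: float) -> bool:
    """True only if every condition for leaving Mandatory Review holds."""
    if record.decisions_reviewed == 0:
        return False  # no evidence, no downgrade
    error_rate = record.decisions_found_wrong / record.decisions_reviewed
    return (record.days_in_production >= MIN_DAYS_OF_DATA
            and error_rate <= max_error_rate
            and record.significant_incidents == 0)

record = OperatingRecord(days_in_production=120, decisions_reviewed=4000,
                         decisions_found_wrong=20, significant_incidents=0)
print(may_downgrade_to_supervised(record, max_error_rate=0.01))
```

Whatever value you choose for `max_error_rate`, document it alongside the AI Owner's sign-off on the category change.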
Time-Bound SLAs per Category {#slas}
SLAs convert oversight from a principle into a commitment. Define them at policy-adoption time and resource accordingly.
| Decision category | Maximum review SLA | AI Owner escalation trigger | Notes |
|---|---|---|---|
| Mandatory Review — urgent | 2 hours | Unreviewed at T+1h | Employment actions, credit decisions, clinical recommendations |
| Mandatory Review — standard | 24 hours | Unreviewed at T+20h | Most high-risk Annex III decisions in non-urgent contexts |
| Supervised — anomaly | 4 hours from flag | Auto-escalation if no acknowledgment | When monitoring alerts; reviewer must acknowledge and investigate |
| Supervised — periodic | Weekly report | Monthly if no report produced | Aggregate review of AI output distributions |
| Autonomous — audit sample | 5 business days | — | Auditor samples; not a blocking SLA |
SLAs must be operationalized: your system must be capable of timestamping when a decision was produced and when it was reviewed, and alerting the AI Owner automatically when an SLA breach is imminent.
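As a rough illustration of what that instrumentation has to compute, the sketch below encodes the two Mandatory Review rows of the matrix and classifies a decision from its timestamps. The dictionary keys, status labels, and function name are assumptions made for this example, not part of any standard.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional

# (maximum review SLA, AI Owner escalation point), from the matrix rows above
SLA_RULES = {
    "mandatory_review_urgent": (timedelta(hours=2), timedelta(hours=1)),
    "mandatory_review_standard": (timedelta(hours=24), timedelta(hours=20)),
}

def sla_status(category: str, produced_at: datetime, now: datetime,
               reviewed_at: Optional[datetime] = None) -> str:
    """Classify a decision as 'met', 'pending', 'escalate', or 'breached'."""
    max_sla, escalate_after = SLA_RULES[category]
    if reviewed_at is not None:
        return "met" if reviewed_at - produced_at <= max_sla else "breached"
    elapsed = now - produced_at
    if elapsed > max_sla:
        return "breached"
    if elapsed >= escalate_after:
        return "escalate"  # still unreviewed at the escalation point
    return "pending"

produced = datetime(2026, 1, 5, 9, 0, tzinfo=timezone.utc)
print(sla_status("mandatory_review_urgent", produced,
                 now=produced + timedelta(minutes=90)))
```

A scheduler would run a check like this over every open decision and send the alert whenever the status changes to `escalate`.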
Escalation Triggers and Overrides {#escalation}
The policy must specify, in writing, what triggers each escalation level:
Reviewer → Operator escalation triggers:
- AI output is inconsistent with known ground truth or exhibits obvious error
- AI output contradicts a decision made by a human reviewer within the preceding 30 days on a materially identical input
- Monitoring metrics (accuracy, rejection rate, output distribution) deviate more than [threshold — specify] from baseline
- AI system response time exceeds [threshold — specify]
Operator → AI Owner escalation triggers:
- System halt executed
- Any serious incident as defined in Article 3(49) of the EU AI Act and reportable under Article 73
- SLA breach affecting more than [N — specify] decisions
- Reviewer reports a pattern suggesting systematic error or bias
Emergency halt authority: The AI Operator may halt the system without AI Owner approval when: (i) an active serious incident is in progress; (ii) the system is producing outputs that would cause irreversible harm if not stopped; (iii) a cybersecurity event affecting system integrity is detected. The AI Owner must be notified within 1 hour of any emergency halt.
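The emergency halt rule reduces to three conditions and one deadline, which makes it easy to encode in a runbook tool. The sketch below is one possible encoding; the class, field, and function names are hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone

OWNER_NOTIFICATION_WINDOW = timedelta(hours=1)  # from the policy text

@dataclass(frozen=True)
class SystemState:
    serious_incident_in_progress: bool  # condition (i)
    irreversible_harm_likely: bool      # condition (ii)
    integrity_compromised: bool         # condition (iii): cybersecurity event

def operator_may_halt_without_owner(state: SystemState) -> bool:
    """Any single emergency condition is enough to authorize a halt."""
    return (state.serious_incident_in_progress
            or state.irreversible_harm_likely
            or state.integrity_compromised)

def owner_notification_deadline(halted_at: datetime) -> datetime:
    """Latest time by which the AI Owner must have been notified."""
    return halted_at + OWNER_NOTIFICATION_WINDOW

halted = datetime(2026, 3, 2, 14, 30, tzinfo=timezone.utc)
state = SystemState(False, True, False)
if operator_may_halt_without_owner(state):
    print("Notify AI Owner by", owner_notification_deadline(halted).isoformat())
```

Recording the `SystemState` that justified the halt alongside the halt timestamp gives the auditor the trigger and the action in one record.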
Training and AI Literacy Requirements {#training}
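Article 14(4) of the EU AI Act requires that the people assigned to oversight be enabled to understand the system's capacities and limitations, interpret its outputs, and detect malfunctions. Article 26(2) requires deployers to assign oversight to people with the necessary competence, training, and authority. Those requirements have direct implications for training program design.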
Minimum training requirements for AI Reviewers and AI Operators:
- System purpose and scope: What the AI system does, what decisions it informs or makes, and what its declared performance metrics are.
- Output interpretation: How to read the system's outputs, what confidence scores or uncertainty indicators mean, and what constitutes a normal vs. anomalous output.
- Error patterns and bias indicators: Known failure modes, demographic or contextual factors that correlate with higher error rates, and how to recognize them in practice.
- Override and halt procedure: Exactly how to execute an override or halt — which interface, which authority, what documentation is required.
- Incident reporting: How to report an AI incident internally, and who has external reporting obligations (Article 73).
Training cadence:
- Initial training before any reviewer begins operational oversight.
- Annual refresher for all roles.
- Ad-hoc re-training when the system is significantly modified or when an incident reveals a gap in reviewer competence.
Documentation requirement: Training records must be maintained per reviewer, including: date, content, trainer/provider, and assessment result. These records are audit evidence — they must be producible within 48 hours on request.
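Training records are only useful as evidence if they can also gate who is allowed to review. The sketch below stores the four documented fields and checks whether a reviewer's most recent passed training is still within the annual window. The names are hypothetical, and "annual" is approximated here as 365 days, which is an assumption.

```python
from dataclasses import dataclass
from datetime import date, timedelta

REFRESH_INTERVAL = timedelta(days=365)  # "annual refresher", approximated

@dataclass(frozen=True)
class TrainingRecord:
    reviewer: str
    completed_on: date
    content: str
    provider: str
    assessment_passed: bool

def is_cleared_for_review(records: list[TrainingRecord],
                          reviewer: str, today: date) -> bool:
    """True if the reviewer passed a training within the refresh interval."""
    passed_dates = [r.completed_on for r in records
                    if r.reviewer == reviewer and r.assessment_passed]
    if not passed_dates:
        return False  # never trained, or never passed the assessment
    return today - max(passed_dates) <= REFRESH_INTERVAL

records = [
    TrainingRecord("reviewer-a", date(2025, 2, 1), "Initial training", "Internal", True),
    TrainingRecord("reviewer-b", date(2024, 1, 10), "Initial training", "Internal", True),
]
print(is_cleared_for_review(records, "reviewer-a", date(2025, 12, 1)))
print(is_cleared_for_review(records, "reviewer-b", date(2025, 12, 1)))
```

Ad-hoc re-training after a system change or incident would need an additional check against the date of that event, which this sketch leaves out.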
Evidencing Human Oversight to an Auditor {#audit-evidence}
The question an auditor will ask is not "do you have a human in the loop" — it is "show me the evidence that the human actually reviewed the decision, understood it, and had the authority and capability to intervene."
For organizations that have completed a DPIA for AI systems, the oversight documentation requirements partially overlap — the DPIA will already require you to describe the oversight mechanism. This policy is the operational counterpart: it must be evidenced, not just described.
Evidence the auditor will expect:
| Evidence type | What it proves | Retention period |
|---|---|---|
| Decision review log | Reviewer reviewed, timestamped, took an action | 5 years (policy default; the AI Act itself sets only a six-month minimum for automatically generated system logs under Article 26(6); extend for regulated sectors) |
| SLA compliance report | Oversight was timely, not retrospective | 3 years |
| Training records | Reviewers were competent | Duration of employment + 3 years |
| Override and halt log | Override capability was real and used when needed | 5 years |
| AI Auditor reports | Independent assessment confirms oversight quality | 5 years |
| Escalation records | Escalation paths function in practice | 3 years |
| Emergency halt records | Emergency authority exists and was documented | 5 years |
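The fixed retention periods in the table can be kept as configuration so that deletion jobs never purge evidence early. The sketch below is an assumption-laden example: the keys and function name are hypothetical, and training records are left out because their period depends on an employment end date rather than a creation date.

```python
from datetime import date

# Retention in years, copied from the table above (training records excluded)
RETENTION_YEARS = {
    "decision_review_log": 5,
    "sla_compliance_report": 3,
    "override_and_halt_log": 5,
    "ai_auditor_report": 5,
    "escalation_record": 3,
    "emergency_halt_record": 5,
}

def earliest_deletion_date(evidence_type: str, created_on: date) -> date:
    """First date on which a record of this type may be deleted."""
    target_year = created_on.year + RETENTION_YEARS[evidence_type]
    try:
        return created_on.replace(year=target_year)
    except ValueError:
        # Created on 29 February and the target year is not a leap year
        return date(target_year, 3, 1)

print(earliest_deletion_date("decision_review_log", date(2026, 4, 15)))
print(earliest_deletion_date("escalation_record", date(2024, 2, 29)))
```

Sector rules or litigation holds may require longer retention, so treat the computed date as a floor rather than a schedule.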
Common evidence gaps found in practice:
- Review logs exist but do not capture the reviewer's reasoning — only the outcome (approve/reject). An auditor can argue a log of approvals with no reasoning is consistent with rubber-stamping.
- Training records exist for initial onboarding but not for refresher training or ad-hoc re-training following an incident.
- Override capability is documented in architecture diagrams but was never tested — no record of a test execution.
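The first gap in the list above can be closed at the data-model level by refusing to store a review action that has no reasoning attached. The sketch below shows one way to do that; the class name, field names, and allowed action labels are assumptions chosen for illustration.

```python
from dataclasses import dataclass
from datetime import datetime, timezone

ALLOWED_ACTIONS = {"approve", "reject", "modify"}

@dataclass(frozen=True)
class ReviewLogEntry:
    decision_id: str
    reviewer: str
    action: str
    reasoning: str
    produced_at: datetime
    reviewed_at: datetime

    def __post_init__(self):
        # Validate on construction so an incomplete entry can never be stored
        if self.action not in ALLOWED_ACTIONS:
            raise ValueError(f"unknown action: {self.action!r}")
        if not self.reasoning.strip():
            raise ValueError("reasoning is required, including for approvals")
        if self.reviewed_at < self.produced_at:
            raise ValueError("reviewed_at precedes produced_at")

    @property
    def time_to_decision_seconds(self) -> float:
        return (self.reviewed_at - self.produced_at).total_seconds()

entry = ReviewLogEntry(
    decision_id="D-0001",
    reviewer="reviewer-a",
    action="approve",
    reasoning="Income documents match the application; score is consistent.",
    produced_at=datetime(2026, 1, 5, 9, 0, tzinfo=timezone.utc),
    reviewed_at=datetime(2026, 1, 5, 9, 12, tzinfo=timezone.utc),
)
print(entry.time_to_decision_seconds)  # 12 minutes = 720.0 seconds
```

A mandatory reasoning field does not guarantee thoughtful review, but it gives the AI Auditor text to sample and makes empty approvals impossible to log.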
If you are deploying a third-party AI system, use your AI vendor risk assessment checklist to confirm that the provider's technical design actually enables the override and halt capabilities your policy requires. Do not assume; request documentation.
Policy Template — Copy/Paste Structure {#policy-template}
Use the following structure as your starting point. Replace all [PLACEHOLDER] values with your organization's specifics before adoption.
[ORGANIZATION NAME] — Human Oversight Policy for AI Systems
Version: 1.0
Effective date: [DATE]
Policy owner: [AI OWNER NAME AND TITLE]
Review cycle: Annual, and on any significant system modification
Scope: All AI systems classified as high-risk under the EU AI Act Annex III, and any additional AI systems designated in-scope by the AI Review Board
1. Purpose
This policy establishes the human oversight framework required by EU AI Act Article 14 for AI systems in scope. It defines oversight roles, decision categories, review SLAs, training requirements, and audit evidence obligations.
2. Roles and Responsibilities
[Complete a table of the four roles described under "The 4 Oversight Roles Every Policy Needs" in this article, with named individuals or named positions]
3. Decision Category Matrix
[Complete the decision × SLA matrix described under "The Decision × SLA Matrix" in this article, applied to each AI system in scope. One matrix per system, or per output type if a single system covers multiple decision categories]
4. SLAs and Escalation Paths
[Specify SLAs per category and named escalation paths with contact information]
5. Training Requirements
[Specify training content, cadence, assessment method, and documentation requirement]
6. Override and Halt Procedure
[Step-by-step procedure, naming the interface, the authority, and the notification requirement]
7. Audit and Evidence
[Specify what is logged, where, for how long, and the audit sample procedure]
8. Incident Reporting
[Internal reporting chain and external reporting obligations under Article 73 and any sector-specific regulation]
9. Policy Review
[Annual review date; triggers for interim review]
Common Pitfalls {#pitfalls}
Rubber-stamping. When a reviewer approves 99% of AI decisions with an average review time below 10 seconds, they are not reviewing; they are clicking. Causes include reviewer volume overload, unclear criteria for rejection, and no feedback to reviewers on whether their overrides turned out to be right or wrong. Fixes: caps on daily volume per reviewer, explicit rejection criteria in training, and a feedback loop that reports override outcomes back to the reviewer.
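The rubber-stamping pattern described above can be flagged automatically from the review log. The sketch below uses the 99% and 10-second figures from this paragraph as defaults; the function name and input shape are assumptions, and a flag is a prompt for the AI Auditor to look closer, not proof of a failure.

```python
from statistics import mean

def looks_like_rubber_stamping(reviews: list[tuple[bool, float]],
                               approval_rate_threshold: float = 0.99,
                               min_avg_seconds: float = 10.0) -> bool:
    """Flag a reviewer whose approvals are near-universal and very fast.

    reviews: one (approved, seconds_spent) pair per reviewed decision.
    """
    if not reviews:
        return False  # nothing to assess
    approval_rate = sum(1 for approved, _ in reviews if approved) / len(reviews)
    avg_seconds = mean(seconds for _, seconds in reviews)
    return (approval_rate >= approval_rate_threshold
            and avg_seconds < min_avg_seconds)

# 199 approvals at 4 seconds each and one rejection at 30 seconds
fast_reviewer = [(True, 4.0)] * 199 + [(False, 30.0)]
print(looks_like_rubber_stamping(fast_reviewer))
```

A high approval rate on its own can be legitimate for a well-performing system, which is why the check requires both conditions at once.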
Alarm fatigue. Monitoring-based oversight (HOTL mode) fails when the monitoring layer produces too many alerts with low signal-to-noise ratio. Reviewers learn to dismiss alerts without investigating. Fixes: calibrate alert thresholds against baseline distributions before going live; create a tiered alert system that separates routine anomalies from escalation-worthy events; measure alert acknowledgment and investigation rate as a governance metric.
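The governance metric mentioned above, alert acknowledgment and investigation rate, is straightforward to compute once each alert carries two flags. A minimal sketch, with hypothetical class and field names:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Alert:
    alert_id: str
    acknowledged: bool
    investigated: bool

def alert_handling_rates(alerts: list[Alert]) -> dict[str, float]:
    """Share of alerts acknowledged and investigated in a reporting period."""
    if not alerts:
        raise ValueError("no alerts in the reporting period")
    total = len(alerts)
    return {
        "acknowledged_rate": sum(a.acknowledged for a in alerts) / total,
        "investigated_rate": sum(a.investigated for a in alerts) / total,
    }

period = [
    Alert("A-1", acknowledged=True, investigated=True),
    Alert("A-2", acknowledged=True, investigated=False),
    Alert("A-3", acknowledged=True, investigated=False),
    Alert("A-4", acknowledged=False, investigated=False),
]
print(alert_handling_rates(period))
```

A wide gap between the two rates suggests alerts are being dismissed rather than examined, which is the signal to recalibrate thresholds.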
Role overload. Assigning the AI Reviewer and AI Operator roles to one person who also has primary job responsibilities unrelated to AI oversight creates a reliable path to oversight failure. The oversight role must be resourced, not appended. Calculate hours required before go-live.
Policy-to-practice gap. The policy states a 24-hour SLA, but no system timestamps decisions or sends alerts when the SLA is breached, and reviewers are not aware of the SLA. The policy is then an artifact that does not govern behavior. Every SLA in the policy must have a corresponding technical mechanism that enforces it or at least makes breaches visible.
Static roles in a dynamic system. The AI Owner approved v1.0 of the AI system; v2.1 introduced a new use case that crosses into a higher-risk Annex III category; no one updated the decision matrix or assigned oversight capacity for the new output type. Policy must be explicitly re-reviewed on every significant system modification.
Frequently Asked Questions
Is a human-in-the-loop policy mandatory for all AI systems, or only high-risk ones?
Mandatory under the EU AI Act only for high-risk systems classified under Annex III. However, other legal frameworks create overlapping obligations: GDPR Article 22 requires human review for solely automated decisions with legal or similarly significant effects, regardless of AI Act risk classification. Many organizations also find that sector-specific regulations (EBA guidelines, MDR, insurance supervision) impose human review requirements on AI systems that do not meet the Annex III threshold. A policy is also useful for non-mandatory systems — it converts a vague "responsible AI" commitment into documented, auditable governance.
Who should own the human oversight policy — the DPO, the Head of AI, or the COO?
Policy ownership and operational accountability are different questions. The AI Owner role — the senior accountable person named in this policy — should typically be the business unit head or product owner responsible for the AI system, because oversight accountability needs to sit with the person who controls deployment decisions. The DPO should be consulted on GDPR Article 22 intersection and should receive copies of oversight evidence for systems processing personal data. The Head of AI (or equivalent) typically owns the policy framework itself — the governance structure, the template, and the refresh cycle. The COO may need to own capacity planning: assigning reviewer time without COO sign-off on resourcing is a common path to underfunded oversight.
What's the minimum SLA for human review of high-risk AI decisions?
The EU AI Act does not specify a numeric SLA. It requires that oversight be effective and that the people assigned to it have the necessary competence, training, and authority. In practice, the SLA must be derived from the decision context: an AI system producing employment shortlist recommendations that an HR team acts on daily requires a shorter SLA than a system producing quarterly risk assessments reviewed at a governance committee. The matrix in this template provides a starting framework; calibrate it to your operational context and document the reasoning behind the SLA chosen. The template uses 2 hours for urgent decisions and 24 hours for standard decisions as starting points; treat both as defaults to justify in writing, not as regulatory minimums.
Can automated escalation rules count as "human oversight" under the AI Act?
Automated escalation rules — for example, an alert that fires when an AI system's output confidence drops below a threshold and routes the decision to a human reviewer — count as part of the oversight architecture, but they are not themselves human oversight. They are a mechanism that triggers human oversight. The human review that follows the alert is where the oversight obligation is met. A policy that describes alert thresholds and routing rules is describing the plumbing; the policy must also specify what the human does when the alert fires, who it goes to, and within what timeframe. Automated routing with no documented human response procedure is not oversight — it is an unanswered alarm.
How does a human-in-the-loop policy fit alongside an existing AI ethics charter?
An AI ethics charter typically articulates principles: fairness, transparency, accountability, non-maleficence. A human oversight policy operationalizes one of those principles — accountability — for AI systems in production. They are complementary, not redundant. The ethics charter sets the normative framework; the oversight policy translates one dimension of it into operational commitments, named roles, and measurable SLAs. If your organization already has an ethics charter, cross-reference this policy to it explicitly: state which charter principles the oversight roles and SLAs are designed to operationalize. This creates a governance audit trail that shows principles → policy → practice — which is exactly what a mature governance program looks like to an external auditor.
Putting the Policy Into Practice
Writing the policy is the first step. Making it work requires three further actions that organizations consistently underestimate:
Sequence the rollout. Define roles, set SLAs, train reviewers, and only then instrument the system to produce timestamps and alerts. Reversing the sequence (instrument first, define roles later) produces a system that logs data no one is accountable for.
Run a tabletop test before go-live. Simulate a serious incident: the AI system produces a wrong output, an alert fires, the AI Reviewer escalates. Walk each person in the role chain through their step. Identify the gaps before production does it for you.
Schedule the first AI Auditor review at 90 days. The first 90 days of production are the period of highest oversight failure risk — SLAs are theoretical, reviewers are still learning the system, volume estimates may be wrong. An early audit report gives you data to calibrate before the annual review cycle locks in flawed practices.
The AI policy template generator will walk you through role assignment, decision matrix completion, and SLA setting — and produce a policy document you can adapt and adopt without starting from a blank page.
For organizations that want a second opinion on whether a draft policy would survive an audit, book a 20-minute compliance review to validate your policy against Article 14 requirements and your specific system context.
This article is part of the AI compliance pillar — the complete AI compliance checklist covers the full regulatory surface across the EU AI Act, GDPR, ISO 42001, and NIST AI RMF.