Action Model: Definition & How It Differs from Chat LLMs and Reasoning Models

Key Takeaway: An action model is a foundation model trained or post-trained specifically to plan, navigate, and execute tasks in digital environments — not to answer questions in a chat window. The category distinction matters: the same model that writes excellent prose may be a poor action planner, and vice versa.

What is an Action Model?

An action model is a foundation model whose training regime, architecture choices, and evaluation benchmarks are oriented toward taking actions in digital environments rather than generating conversational text. Where a chat-completion model is trained to produce a helpful response to a user message, an action model is trained to produce a sequence of actions — clicks, keystrokes, API calls, code executions, file operations — that accomplish a goal in a real or simulated environment.

H Company, a Paris-based AI lab, has developed the clearest articulation of the action model category. Their framing distinguishes between models optimized for language tasks (conversational quality, factual accuracy, reasoning expressed in text) and models optimized for action tasks (task completion rate, step efficiency, error recovery, generalization to unseen interface states). These are different optimization targets that often pull training in different directions. A model post-trained on human demonstrations of software navigation may become a worse conversationalist; a model fine-tuned for chat quality may become a worse planner of multi-step digital workflows.

How Training Distinguishes Action Models

The training data and reward signals for action models differ from those of chat models in two important ways.

Action-observation sequences. Action models are trained on traces of humans (or other agents) performing tasks: the current screen state, the action taken, the resulting screen state. This requires collecting and labeling operational data — browser sessions, desktop recordings, API interaction logs — rather than the text corpora used for language model pretraining. The scarcity and cost of this data is one reason the action model category has developed more slowly than the chat model category.
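The shape of such a trace can be sketched as a simple record type. This is an illustrative sketch only; the field names and action encoding are hypothetical, not a published schema:

```python
from dataclasses import dataclass

@dataclass
class Step:
    observation: str       # serialized screen/DOM state before the action
    action: str            # e.g. 'click(#submit)' or 'type("hello")'
    next_observation: str  # resulting state after the action

@dataclass
class Trace:
    goal: str          # the task the demonstrator was trying to accomplish
    steps: list[Step]  # ordered action-observation sequence

# One labeled training example: a goal plus the steps that achieved it
example = Trace(
    goal="submit the contact form",
    steps=[Step("form_page", "click(#submit)", "confirmation_page")],
)
```

Each `Step` pairs a pre-action state with the action and the post-action state, which is exactly the supervision signal chat-model text corpora lack.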

Task completion rewards. Chat models are trained on preference comparisons between responses (RLHF). Action models are evaluated on whether the task was actually completed: was the booking made, was the form submitted, was the data extracted? Intermediate step quality matters but is secondary to goal achievement. This shifts training toward robustness and recovery — handling the cases where the expected UI state does not appear — rather than toward fluency and factual accuracy.
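The outcome-dominant structure of such a reward can be illustrated with a minimal sketch, assuming a binary completion signal and a small per-step penalty; the constants and function name are hypothetical:

```python
def trace_reward(task_completed: bool, n_steps: int,
                 step_penalty: float = 0.01) -> float:
    """Outcome-dominant reward: goal achievement carries almost all
    of the signal; a small per-step penalty favors efficient traces."""
    outcome = 1.0 if task_completed else 0.0
    return outcome - step_penalty * n_steps
```

Because the completion term dwarfs the step penalty, a slow trace that finishes the task always outscores a fast trace that does not, which is the property the paragraph above describes.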

How It Differs from Adjacent Categories

Versus a chat LLM. Chat models (GPT-4o, Claude Sonnet, Gemini Pro) are optimized for conversational quality, instruction following in natural language, and factual recall. They can be prompted to plan actions and can call tools, but their underlying training did not optimize for multi-step task execution in dynamic digital environments. Action models treat the execution environment as the primary evaluation domain; chat models treat the conversation as the primary domain.

Versus a reasoning model. Reasoning models (o3, o1, DeepSeek-R1) are optimized for deliberate multi-step reasoning expressed through chain-of-thought or extended computation before producing an answer. They excel at mathematics, formal logic, and hard problem-solving tasks expressed in language. Reasoning and action are related but distinct: a model can reason well about what to do and still fail to execute reliably in a dynamic environment. An action model's reasoning is instrumental — it reasons in order to act efficiently — not terminal.

Versus an autonomous agent. An action model is the model layer. An autonomous agent is the full system: action model + runtime + memory + tool access + goal specification. The action model is one component of the agent architecture, specifically the component that decides what action to take given the current state. See Agentic AI for the full agent architecture.
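The layering described above can be sketched as a minimal agent loop, with the action model as the component that maps (goal, state) to the next action and the runtime as the component that executes it. All names here are illustrative, not a real API:

```python
def agent_loop(action_model, runtime, goal: str, max_steps: int = 20) -> bool:
    """Minimal agent system: the action model decides, the runtime acts.
    Returns True if the model signals completion within the step budget."""
    obs = runtime.observe()
    for _ in range(max_steps):
        action = action_model.next_action(goal, obs)  # model layer: decide
        if action == "done":
            return True
        obs = runtime.execute(action)                 # runtime layer: act
    return False

# Stub components, just to show where each layer plugs in
class StubModel:
    def __init__(self):
        self.calls = 0
    def next_action(self, goal, obs):
        self.calls += 1
        return "done" if self.calls > 2 else "click(#next)"

class StubRuntime:
    def observe(self):
        return "start_page"
    def execute(self, action):
        return "next_page"

finished = agent_loop(StubModel(), StubRuntime(), "book a flight")
```

Swapping `StubModel` for a real action model changes nothing in the loop, which is the point: the model is one replaceable component of the agent architecture.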

Why Action Models Need an Agentic OS

An action model operating without governance infrastructure creates a specific risk: it acts effectively but without accountability. The model can complete tasks — navigate software, execute workflows, interact with external services — without any operator-visible record of what it did, under whose authorization, and whether the action fell within defined risk boundaries.

An agentic OS addresses this gap. The jobs registry declares what actions each agent is authorized to take, under what risk classification, with what human-oversight requirements. The kanban makes every running action model's state visible. The audit trail records every tool call. The action model becomes more useful when embedded in this governance layer, not despite it — because the governance layer is what makes its outputs defensible to regulators, clients, and the operator's own organization.
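A jobs-registry entry of the kind described above might look like the following sketch. The structure and field names are hypothetical, shown only to make the authorization check concrete:

```python
# Hypothetical registry: per-agent authorized actions, risk class,
# and which actions require human approval before execution.
REGISTRY = {
    "crm-updater": {
        "allowed_actions": {"read_record", "update_record"},
        "risk_class": "medium",
        "requires_human_approval": {"update_record"},
    },
}

def authorize(agent: str, action: str) -> tuple[bool, bool]:
    """Return (allowed, needs_human_approval) for a proposed action.
    Undeclared agents and undeclared actions are denied by default."""
    entry = REGISTRY.get(agent)
    if entry is None or action not in entry["allowed_actions"]:
        return (False, False)
    return (True, action in entry["requires_human_approval"])
```

The deny-by-default check is what turns the model's proposed action into a governed one: every call through `authorize` is also a natural point to write the audit-trail record.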

H Company's Contribution

H Company has positioned action models as the next major capability category after chat and reasoning models, arguing that most economically valuable AI work involves taking actions in existing software systems — CRMs, ERPs, browsers, APIs — rather than generating text for human review. Their research focuses on generalization across unseen interfaces, error recovery from unexpected states, and efficient multi-step planning for long-horizon tasks in real enterprise environments.

Related Concepts

  • Agentic AI — the design paradigm that action models implement at the model layer; the full agent architecture includes runtime, memory, and tools.
  • Agent Runtime — the execution environment that translates an action model's output into real system operations.
  • World Model AI — a complementary architecture: action models navigate; world models predict consequences before acting.
  • Agentic Operating System — the governance and observability layer that makes action models production-safe.
  • Agentic OS vs Agent Platform — why action models embedded in an OS are architecturally different from action models running in an isolated pipeline.