Multi-Agent Communication Protocols: MCP, A2A, and Structured Outputs (2026)
Most multi-agent systems built in 2025 had a custom communication layer. By the end of 2026, most production systems do not — they use one of three protocols, often all three, and the bespoke glue code that used to surround agents has been retired.
Three things changed. Anthropic's Model Context Protocol (MCP) became the default tool-and-data interface and stopped being controversial. Google's Agent-to-Agent (A2A) protocol moved out of draft and is now adopted across the major frameworks. And structured-output enforcement at the model layer (JSON Schema, constrained decoding) became reliable enough that "JSON inside a code block, hopefully" stopped being an acceptable pattern.
This piece is the operator-level walkthrough of all three: what each protocol actually does, when each fits, where each fails, and how they fit together in a real production stack. Code examples throughout, written as the contract — what the wire actually carries — not as framework-specific syntactic sugar.
The layered model
The three protocols address three different boundaries.
Structured outputs govern the boundary between the model and the application. They are the foundation. Every other protocol assumes structured outputs are working underneath; without them, the others degrade into best-effort.
MCP governs the boundary between an agent and a tool. A tool is anything an agent calls to do something — a database query, a file read, a search, a calendar lookup. MCP makes that boundary uniform across tools and across agents.
A2A governs the boundary between two agents, particularly across a trust boundary (different teams, different organizations, different runtimes). It is the protocol you use when one agent calls another agent it does not control.
A useful mental picture: structured outputs are how an agent talks to itself, MCP is how an agent talks to its tools, and A2A is how an agent talks to other agents. In a typical 2026 production fleet you have structured outputs everywhere, several MCP servers (one per tool family or data source), and one or two A2A endpoints (usually the foreman, optionally a public agent customers can call into).
Structured outputs: the foundation
Structured-output enforcement is the discipline of making every agent emit typed, schema-validated output, never free text. Not "JSON inside a code block, hopefully." A real schema (JSON Schema, Pydantic, Zod), enforced at the model layer where the provider supports it, and validated at the application layer always.
This is the single highest-leverage discipline in multi-agent systems. If your agents return free text, your foreman has to parse it. Parsing is where regressions hide. Regressions in parsing surface as silent quality drops two weeks later that no one can root-cause. Make this non-negotiable.
In code, the contract:
// Zod schema for a Lead-Discovery agent's output
import { z } from "zod"

export const CandidateCompany = z.object({
  name: z.string(),
  domain: z.string(),
  evidence_url: z.string().url(),
  fit_score: z.number().min(0).max(1),
})

export const DiscoveryResult = z.object({
  candidates: z.array(CandidateCompany),
  notes: z.string().nullable(),
  confidence: z.number().min(0).max(1),
})

export type DiscoveryResult = z.infer<typeof DiscoveryResult>
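One mechanical detail before looking at where the schema gets used: providers that enforce schemas at the model layer typically want raw JSON Schema, not a Zod object. A minimal conversion sketch, assuming the widely used zod-to-json-schema package (the request field the schema goes into varies by provider):
// Convert the Zod schema to JSON Schema for the provider request
import { zodToJsonSchema } from "zod-to-json-schema"

const discoveryJsonSchema = zodToJsonSchema(DiscoveryResult)
// discoveryJsonSchema is a plain JSON Schema object, ready to pass as the
// constrained-decoding target; the exact field name (response_format,
// output_schema, etc.) depends on the provider SDK.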
Three places this schema gets used:
- At the model layer. Modern model providers (Anthropic, OpenAI, Google) support constrained decoding against a JSON Schema. The model is forced to produce output that conforms to the schema; tokens that would violate it are masked during decoding. You pass the schema in the request; you get a response whose shape is enforced by the provider.
- At the application layer. Even when the model layer enforces the schema, the application validates again on receipt. Belt and suspenders. Schema versions drift; provider implementations have edge cases; the second validation catches the rare misses.
- At the foreman layer. When the foreman dispatches a specialist, it validates the work item against the specialist's input schema before dispatch and validates the result against the output schema on return. The foreman is the schema gatekeeper for the system.
// Foreman dispatching a specialist with structured-output enforcement.
// Specialist and Result are minimal local types; adapt them to your runtime.
import { z } from "zod"

interface Specialist<I, O> {
  inputSchema: z.ZodType<I>
  outputSchema: z.ZodType<O>
  invoke(input: I, options: { response_format: unknown }): Promise<unknown>
}

type Result<O> =
  | { kind: "ok"; value: O }
  | { kind: "invalid_input"; issues: z.ZodIssue[] }
  | { kind: "invalid_output"; issues: z.ZodIssue[] }

async function dispatch<I, O>(
  specialist: Specialist<I, O>,
  workItem: I,
): Promise<Result<O>> {
  // Validate the work item against the specialist's input schema before dispatch.
  const validatedInput = specialist.inputSchema.safeParse(workItem)
  if (!validatedInput.success) {
    return { kind: "invalid_input", issues: validatedInput.error.issues }
  }
  // Ask the provider to enforce the output schema at the model layer.
  const raw = await specialist.invoke(validatedInput.data, {
    response_format: {
      type: "json_schema",
      schema: specialist.outputSchema,
    },
  })
  // Validate again on receipt; provider enforcement is treated as best-effort.
  const validatedOutput = specialist.outputSchema.safeParse(raw)
  if (!validatedOutput.success) {
    return { kind: "invalid_output", issues: validatedOutput.error.issues }
  }
  return { kind: "ok", value: validatedOutput.data }
}
When structured outputs work:
- The whole workflow is in a system you control, end to end.
- You have access to a model with constrained-decoding support.
- The schemas are stable enough to commit to source control.
When structured outputs fail:
- Schema is too tight. If the schema is overspecified (every nested field required, no nullable fields, no optional notes), the model has nowhere to put information that does not fit the rigid shape. It will produce technically valid output that loses real signal. Fix: include a `notes` or `metadata` field for free text the agent could not place elsewhere, and explicit nullable fields where information may be absent (see the sketch after this list).
- Schema is too loose. If half the fields are optional and the rest are stringly-typed, the schema does not constrain anything and you are back to free text wrapped in JSON. Fix: require everything that has a structural meaning, and use enums for categorical fields.
- Cross-model inconsistency. Different providers implement constrained decoding with subtly different semantics. The same schema produces slightly different shape across providers. Fix: validate at the application layer regardless of which provider is in use; treat provider-level enforcement as best-effort.
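A minimal sketch of the balance both fixes point at, in the same Zod style as above (QualificationResult is a hypothetical specialist output, not one of the pipeline schemas described later):
// Balanced schema: structural fields required, categorical fields enumerated,
// explicit nullability where data may be absent, and an overflow field.
import { z } from "zod"

export const QualificationResult = z.object({
  verdict: z.enum(["qualified", "disqualified", "needs_review"]), // enum, not a free string
  reasons: z.array(z.string()).min(1),                            // structural, so required
  revenue_estimate: z.number().nullable(),                        // legitimately unknowable
  notes: z.string().nullable(),                                   // signal that fits nowhere else
})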
The mistake we see most often: teams treat structured outputs as an optional optimization, get to production with free-text-and-parse, and then spend a quarter chasing parse failures. Lock in structured outputs in week one; everything downstream is easier.
MCP: agent calling tools
The Model Context Protocol is the open standard Anthropic published in late 2024 for exposing tools and resources to AI agents. The mental model is simple: an MCP server exposes a set of tools (and, optionally, resources and prompts) over a JSON-RPC connection; any MCP-aware client (an agent, a Claude Desktop session, an IDE extension) can connect, discover the available tools, and call them. The full specification lives at modelcontextprotocol.io.
The wire format is JSON-RPC 2.0. A tool call from an agent looks like this:
// Agent calls "execute_sql" on a Postgres MCP server
{
  "jsonrpc": "2.0",
  "id": 17,
  "method": "tools/call",
  "params": {
    "name": "execute_sql",
    "arguments": {
      "query": "select id, domain from companies where icp_match = true limit 50"
    }
  }
}

// Server response
{
  "jsonrpc": "2.0",
  "id": 17,
  "result": {
    "content": [
      {
        "type": "text",
        "text": "[{\"id\":1,\"domain\":\"acme.example\"}, ...]"
      }
    ],
    "isError": false
  }
}
The server side defines the tool with a JSON Schema for its input and a structured output:
// MCP server defining a tool (TypeScript SDK; runReadOnlyQuery is your own helper)
import { Server } from "@modelcontextprotocol/sdk/server/index.js"
import {
  CallToolRequestSchema,
  ListToolsRequestSchema,
} from "@modelcontextprotocol/sdk/types.js"

const server = new Server(
  { name: "postgres", version: "1.0.0" },
  { capabilities: { tools: {} } },
)

server.setRequestHandler(ListToolsRequestSchema, async () => ({
  tools: [{
    name: "execute_sql",
    description: "Execute a read-only SQL query against the production database",
    inputSchema: {
      type: "object",
      additionalProperties: false,
      required: ["query"],
      properties: {
        query: {
          type: "string",
          description: "A SELECT statement; mutations are rejected at the server",
        },
      },
    },
  }],
}))

server.setRequestHandler(CallToolRequestSchema, async (req) => {
  if (req.params.name !== "execute_sql") {
    throw new Error(`unknown tool: ${req.params.name}`)
  }
  const rows = await runReadOnlyQuery(String(req.params.arguments?.query))
  return {
    content: [{ type: "text", text: JSON.stringify(rows) }],
    isError: false,
  }
})
Why this matters for multi-agent systems:
- Uniform tool surface across agents. Every specialist in your fleet sees the same tool surface for the same data source. The lead-discovery agent and the qualification agent both call the same `execute_sql` MCP tool with the same shape. Swapping a specialist from one model to another does not break tool access; the tools live behind MCP, not inside the agent prompt.
- Auth and credentials live with the server. The agent does not hold database credentials, API keys, or service tokens. The MCP server holds them. This is the security posture you want anyway, and it falls out naturally from the protocol.
- Discoverability. An agent can ask an MCP server what tools it exposes (`tools/list`), what resources it exposes (`resources/list`), and what prompts it offers (`prompts/list`). New tools appear automatically; old tools deprecate cleanly. A client-side sketch follows this list.
- Server-side authorization. The MCP server is the place where tool-level authorization lives. "This agent can read the companies table but not the contacts table" is enforced at the server, not at the prompt. Prompt-level authorization is a security theater pattern; server-level authorization is real.
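What discovery and invocation look like from the client side, as a minimal sketch with the TypeScript SDK (the server command and path are illustrative; transports are covered below):
// MCP client: connect, discover tools, call one
import { Client } from "@modelcontextprotocol/sdk/client/index.js"
import { StdioClientTransport } from "@modelcontextprotocol/sdk/client/stdio.js"

const client = new Client({ name: "lead-discovery-agent", version: "1.0.0" })

// Spawn the server as a subprocess and speak JSON-RPC over stdio.
await client.connect(new StdioClientTransport({
  command: "node",
  args: ["./servers/postgres.js"], // illustrative path
}))

const { tools } = await client.listTools()  // same data as tools/list on the wire
console.log(tools.map((t) => t.name))       // ["execute_sql", ...]

const result = await client.callTool({
  name: "execute_sql",
  arguments: { query: "select id, domain from companies limit 5" },
})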
When MCP works:
- A tool surface is shared across multiple agents.
- Tools have non-trivial auth requirements (database, third-party APIs, file systems with permissions).
- The system spans multiple processes or hosts and needs a stable RPC contract.
- You want to swap models without rewriting tool integrations.
When MCP fails:
- Single-agent, single-process workloads. If you have one agent calling one library function in the same process, MCP is overkill. Just call the function. The protocol overhead is not justified.
- Latency-critical inner loops. MCP adds a JSON-RPC round-trip per tool call. For tools called hundreds of times per second inside a tight loop (rare in agent workloads, but it happens in some retrieval patterns), the overhead matters. Benchmark before deciding.
- Streaming-heavy tools. MCP supports notifications and progress updates, but tools that produce large streaming output (audio, video, very large file content) are awkward to express. The protocol favors request-response semantics.
The mistake we see most often: teams skip MCP and let each agent import its own SDKs. By month three, every specialist has its own copy of the database client, three copies of the same auth wrapper, and a quarter of the codebase is "credentials in agent prompts" code that the security review will reject. Centralize tool access. Five hours wrapping a service in MCP saves fifty hours of incident response later.
A note on the implementation surface: MCP servers can be local subprocesses (spawned over stdio), local network services (Unix socket or HTTP), or remote network services (HTTP with auth). The protocol is the same; the transport differs. For local development, stdio servers are usually right; for production deployments, HTTP servers behind an auth proxy are usually right.
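A sketch of the stdio case, binding the server defined above to a transport (the HTTP transports in the SDK follow the same connect pattern, deployed behind whatever auth proxy you run):
// Local development: serve over stdio (the client spawns this process)
import { StdioServerTransport } from "@modelcontextprotocol/sdk/server/stdio.js"

await server.connect(new StdioServerTransport())
// The protocol layer is identical regardless of transport; only the
// byte pipe underneath changes.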
A2A: agent calling agents
Agent-to-Agent is Google's open protocol (published in draft form in early 2025, broadly adopted through 2025-2026) for agents to discover, authenticate to, and call other agents over the network. Where MCP is "agent calls a tool," A2A is "agent calls another agent." The spec lives at a2aprotocol.org.
The semantic difference is important. A tool is a function that returns a result; an agent is a stateful actor that can plan, call its own tools, take time, fail in interesting ways, and produce a result that may itself be the work product, not just data. A2A acknowledges that calling an agent is fundamentally different from calling a tool, and the protocol surfaces concepts that MCP does not need: long-running tasks, streaming intermediate updates, agent capability discovery, multi-turn negotiation.
A capability discovery exchange in A2A:
// Caller asks a remote agent what it can do
GET /a2a/v1/capabilities

// Response
{
  "agent_id": "enrichment.partner.example",
  "capabilities": [
    {
      "name": "enrich_company",
      "description": "Enrich a company record with revenue, headcount, tech stack",
      "input_schema": {
        "type": "object",
        "required": ["domain"],
        "properties": { "domain": { "type": "string" } }
      },
      "output_schema": {
        "type": "object",
        "properties": {
          "revenue_estimate": { "type": "number" },
          "headcount_estimate": { "type": "number" },
          "tech_stack": { "type": "array", "items": { "type": "string" } }
        }
      },
      "max_runtime_seconds": 30,
      "supports_streaming": true
    }
  ]
}
A task invocation:
// Caller invokes a long-running task on the remote agent
POST /a2a/v1/tasks
{
  "capability": "enrich_company",
  "input": { "domain": "acme.example" },
  "callback_url": "https://caller.example/a2a/callbacks/abc123",
  "auth": { "scheme": "bearer", "token": "..." }
}

// Synchronous response with task id
{
  "task_id": "task_01HXYZ",
  "status": "running",
  "estimated_completion": "2026-04-30T14:22:18Z"
}

// Asynchronous callback when the task completes
POST https://caller.example/a2a/callbacks/abc123
{
  "task_id": "task_01HXYZ",
  "status": "completed",
  "output": {
    "revenue_estimate": 50000000,
    "headcount_estimate": 240,
    "tech_stack": ["aws", "snowflake", "salesforce"]
  },
  "evidence": [
    { "claim": "headcount_estimate", "source": "https://acme.example/about" }
  ]
}
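The caller side of that exchange, as a minimal TypeScript sketch against the endpoints shown above (illustrative shapes, not a pinned spec; the URLs and token are placeholders):
// Start a task; the result arrives later at the callback URL.
// fetch is built into Node 18+.
type TaskAck = { task_id: string; status: "running" | "completed" | "failed" }

async function startEnrichment(domain: string, token: string): Promise<TaskAck> {
  const res = await fetch("https://enrichment.partner.example/a2a/v1/tasks", {
    method: "POST",
    headers: {
      "content-type": "application/json",
      authorization: `Bearer ${token}`,
    },
    body: JSON.stringify({
      capability: "enrich_company",
      input: { domain },
      callback_url: "https://caller.example/a2a/callbacks/abc123",
    }),
  })
  if (!res.ok) throw new Error(`task creation failed: ${res.status}`)
  return (await res.json()) as TaskAck
}
// The callback handler on the caller's side should validate the payload
// against the capability's published output_schema before trusting it.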
Why A2A exists separately from MCP:
- Trust boundaries. When you call an agent that is not yours — a partner's agent, a vendor's agent, a customer's agent calling into yours — the call is across an organizational trust boundary. A2A surfaces auth, identity, and authorization as first-class concerns of the protocol. MCP assumes trusted clients; A2A does not.
- Long-running work. A tool returns in milliseconds to seconds; an agent run can take minutes. A2A handles long-running tasks natively, with task IDs, polling, and callbacks. MCP can be retrofitted for this but the ergonomics are awkward.
- Capability discovery, not just tool discovery. An agent might expose a hundred internal tools but only three capabilities a caller cares about ("enrich a company," "score a lead," "draft an outreach"). A2A discovery operates at the capability level; MCP discovery operates at the tool level. Different abstractions.
- Negotiation and clarification. A capable A2A agent can ask the caller for clarification, request additional inputs, or propose alternatives. MCP tool calls are one-shot. A2A is conversational.
When A2A works:
- An agent in your runtime calls an agent in a different runtime, organization, or trust boundary.
- The remote agent's work is long-running (seconds to minutes), produces evidence, and may need to ask for clarification.
- You want to expose your own agent as a callable capability to partners or customers.
When A2A fails:
- Single-runtime foreman pattern. If your foreman and specialists live in the same process, you do not need A2A. Call the specialist as a function. The protocol overhead is not justified by the trust-boundary benefits.
- Synchronous hot paths. A2A's task model assumes some latency budget. For a hot-path call where you need a sub-second response, the protocol's polling/callback semantics are the wrong fit.
- When the remote "agent" is actually a tool. If the remote thing has a deterministic input/output shape and no judgment, it is a tool, not an agent. Wrap it in MCP, not A2A.
A note on the maturity curve: as of mid-2026, A2A adoption is widespread among the major agent frameworks but the spec is still settling on edge cases (auth schemes, error vocabularies, capability versioning). We expect another year of refinement. Use it where it fits, but expect to upgrade implementations as the spec stabilizes.
When to use which: a trade-off matrix
| Boundary | Protocol | Why |
|---|---|---|
| Model → application | Structured outputs | Foundation. Every other protocol assumes this works. |
| Agent → tool (in-process) | Direct function call | No protocol needed; just import. |
| Agent → tool (cross-process, same trust) | MCP | Uniform tool surface, swappable models, server-side auth. |
| Agent → tool (cross-org) | MCP over HTTPS with auth | Same benefits, with TLS and bearer tokens. |
| Agent → agent (same runtime) | Direct function call (foreman pattern) | Specialist is a function; foreman is the caller. |
| Agent → agent (cross-runtime, same org) | A2A or MCP-wrapped | A2A for long-running, MCP for short request-response. |
| Agent → agent (cross-org) | A2A | Trust boundary, capability discovery, long-running work. |
| Public agent (customers call in) | A2A | Standard contract for inbound capability calls. |
A short way to decide:
- Are you calling a function that returns data? That is a tool. Use MCP if the function lives outside your process; direct call if it lives inside.
- Are you calling a thing that plans, takes time, and produces work? That is an agent. Use A2A if it lives outside your trust boundary; direct call if it is a specialist in your foreman pattern.
- Are you enforcing a shape on a model's output? That is structured outputs. Always.
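The same decision list as a function, for teams that like their heuristics executable (the type and names are illustrative, not a protocol artifact):
// Encode the boundary → protocol decision from the list above.
type Callee = {
  kind: "tool" | "agent"        // returns data vs. plans and produces work
  inProcess: boolean            // lives inside your own process/runtime
  insideTrustBoundary: boolean  // your org, your runtime, your controls
}

function pickProtocol(c: Callee): "direct call" | "MCP" | "A2A" {
  if (c.kind === "tool") return c.inProcess ? "direct call" : "MCP"
  return c.inProcess && c.insideTrustBoundary ? "direct call" : "A2A"
}
// Structured outputs are absent from the return type because they are not
// a choice: every branch assumes them.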
How they fit together: a worked example
The 4Sales pipeline, which we run on this stack daily, uses all three protocols in their natural places.
Structured outputs everywhere. Every specialist (lead-discovery, qualification, contact-selection, personalization, reply-handler) has a typed output schema enforced at both the model layer and the application layer. The foreman validates every result against the schema before deciding the next step. There is no free text on any agent boundary.
MCP for tool access. A handful of MCP servers expose the data and tools every agent needs:
- A database MCP server exposes `execute_sql`, `list_tables`, `apply_migration` — auth and authorization enforced server-side, agent prompts contain no credentials.
- A search MCP server exposes `search_web`, `fetch_page` — rate-limiting and cost tracking server-side.
- A memory-graph MCP server exposes `read_cypher`, `write_cypher` — every agent reads from and writes to the same graph through the same protocol.
- A browser-automation MCP server exposes `launch_session`, `navigate`, `extract` — session management and credentials live with the server.
Specialists call these tools the same way regardless of which model is bound to them. When we swap a specialist's model from one provider to another, no tool integration changes.
A2A for cross-trust calls. The foreman optionally calls an external enrichment partner's agent for company-level enrichment when the partner's data is better than ours. That call goes over A2A, with capability discovery, async task IDs, and callbacks. The partner cannot see our internal tools; we cannot see their internal tools. The contract is the published capability schema.
The result is a stack where the protocols sit at the right layer, the abstractions fit the boundary they are crossing, and swapping any one component (a model, a tool implementation, a partner) does not cascade into the others. That is the property that distinguishes a maintainable multi-agent system from a stack of bespoke glue.
Migration notes for existing systems
If you have a multi-agent system already running and want to adopt these protocols, three migrations to plan in this order.
Migrate to structured outputs first. This is the highest-leverage change and it has the lowest risk. Define schemas for every existing agent boundary, validate on receipt, log validation failures as first-class events. Within a week you will know which boundaries were silently producing malformed output; within a month you will have eliminated a class of regression you did not know you had.
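A minimal sketch of that first migration step, assuming Zod for the schemas and some log sink of your own (logValidationFailure here is hypothetical):
// Wrap every existing agent boundary: validate on receipt, log failures
// as first-class events instead of letting parse errors vanish.
import { z } from "zod"

// Your own event sink; the shape is illustrative.
declare function logValidationFailure(event: {
  boundary: string
  issues: z.ZodIssue[]
  at: string
}): void

function receiveAtBoundary<T>(
  boundary: string,
  schema: z.ZodType<T>,
  raw: unknown,
): T | null {
  const parsed = schema.safeParse(raw)
  if (!parsed.success) {
    logValidationFailure({
      boundary,
      issues: parsed.error.issues,
      at: new Date().toISOString(),
    })
    return null
  }
  return parsed.data
}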
Migrate to MCP for the tools that span multiple agents. Pick the data source or tool family that is used by more than two agents and put it behind an MCP server first. The wins compound — every additional agent that adopts MCP for that tool simplifies the system more. Order your migration by the number of consumers, not by the simplicity of the tool.
Adopt A2A only when you cross a trust boundary. If your whole system is in one trust boundary, you do not need A2A. Adopt it the first time you call an external agent, or the first time you expose your own agent for external callers. Until then, A2A is solving a problem you do not have.
A common mistake during migration: replacing direct function calls between agents in the same runtime with A2A "for consistency." This is overengineering. The function call has a faster path, simpler debugging, and no network failure modes. Use A2A only across trust or runtime boundaries.
What "good" looks like
A multi-agent system with all three protocols correctly placed has these properties at run time:
- Every agent's output is parseable by a schema validator with zero errors over millions of runs.
- Every tool call lives in an MCP server log with full request, response, latency, and auth context.
- Every cross-trust call lives in an A2A task log with task ID, capability invoked, status transitions, and final result.
- Swapping a model on a specialist takes minutes (change the binding, run the regression suite); rewriting a tool integration is unnecessary because the tool lives behind MCP.
- Onboarding a partner agent takes a day (publish capabilities, set up auth); rewriting your foreman is unnecessary because A2A is a stable contract.
That set of properties is what production multi-agent looks like in 2026. The protocols are the boring infrastructure that makes the interesting work tractable.
For the broader architectural picture these protocols sit inside, see our how to build a multi-agent AI system guide and the foreman / manager pattern explainer. For the vocabulary they plug into, see our multi-agent orchestration glossary entry. For the ecosystem of frameworks (CrewAI, LangGraph, AutoGen, and others) that build on top of these protocols, see our top agentic AI frameworks compared piece.
Decades of distributed-systems wisdom say the same thing: stable contracts beat clever abstractions. Pick the boring protocol. Wrap your tools. Type your outputs. Reach for A2A when you cross a trust boundary, and not before. The systems that survive 2026 in production are the ones that did this in week one.