AI Agent Platform Architecture 2026: Reference Patterns + Layer Decomposition
Last updated April 2026
The architecture diagrams that circulated in 2023 and 2024 showed an AI agent as a single box: an LLM with a tool list and a memory blob. As of April 2026, that abstraction has fully collapsed under contact with production. A serious agent platform is no longer one component — it is a stack of seven distinct layers, each with its own vendors, its own failure modes, and its own build-vs-buy calculus.
This piece is for the technical buyer who has been told "we use LangGraph" or "we use CrewAI" and recognises that those statements describe one layer of seven. The question is not which framework is best. The question is which layers a given organisation should build, which it should buy, and how the layers compose into something that survives an AI Act audit, a 3am pager, and a model-provider price change.
We work on this stack daily. Knowlee is itself a layer-6 orchestrator that consumes the other six, so treat the analysis as informed but not neutral. Conflict-of-interest disclosure: we are competitors with several vendors named in this article, including LangGraph, CrewAI, AutoGen, and the Microsoft Agent Framework at the framework layer. Where Knowlee appears in comparisons, we have tried to describe it the way a sceptic would.
We will walk through the seven layers, the dominant patterns inside each, and a build-vs-buy heuristic per layer. We will close with how the layers compose in practice and which decisions are hardest to reverse.
The seven-layer reference architecture
The layering below is the one we use internally and the one we see emerging in vendor decks, RFPs, and recent reference architectures from analysts. It is not standardised — there is no agentic OSI model — but the seven boxes recur often enough that engineering teams converge on roughly the same decomposition.
Layer 1: Model layer
This is the foundation model itself. As of April 2026 the production-relevant choices are Anthropic Claude (Sonnet 4.x and Opus 4.x lines), OpenAI GPT-5 family, Google Gemini 2.x, Meta Llama 4 (open-weights), Mistral, and a long tail of specialised open-weight models for code, embedding, and reasoning. Per the April 2026 LMSYS Chatbot Arena leaderboard and provider release notes, the frontier has six contenders within roughly the same Elo band; differences are now measured in tool-use reliability and long-context recall, not raw IQ.
What lives here: weights, tokenizer, native tool-calling protocol, native multimodal support, native long-context behaviour. Nothing else.
What does not live here: any abstraction that lets you swap models without touching code. That is layer 2.
Layer 2: Inference and gateway layer
The gateway layer sits between your application and the model. It exposes a unified API, handles model routing, enforces budget caps, retries on provider outages, and logs every request for cost and audit. The dominant choices in April 2026 are Vercel AI Gateway, OpenRouter, LiteLLM (open-source proxy), Portkey, and AWS Bedrock as a hyperscaler-flavour gateway.
What lives here: the contract that says "I want to call a chat completion with these messages and these tools, and I do not care which provider serves it." Failover, BYOK, regional routing, prompt-cache hit ratios, and per-tenant quotas all belong here.
What does not live here: agent loops, tool definitions, or memory. The gateway treats each request statelessly.
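A minimal sketch of that layer-2 contract, assuming an OpenAI-compatible gateway endpoint (OpenRouter and LiteLLM proxies both expose one); the GATEWAY_URL and GATEWAY_KEY environment variables and the model identifiers are illustrative, not vendor-verified:

```ts
// The application names a model preference, not a provider SDK.
type ChatMessage = { role: "system" | "user" | "assistant"; content: string };

async function complete(messages: ChatMessage[], model: string): Promise<string> {
  const res = await fetch(`${process.env.GATEWAY_URL}/chat/completions`, {
    method: "POST",
    headers: {
      Authorization: `Bearer ${process.env.GATEWAY_KEY}`,
      "Content-Type": "application/json",
    },
    body: JSON.stringify({ model, messages }),
  });
  if (!res.ok) throw new Error(`gateway returned ${res.status}`);
  const data = await res.json();
  return data.choices[0].message.content;
}

// The model string is routing metadata, not a code dependency: swapping
// "anthropic/claude-sonnet-4" for "openai/gpt-5-mini" is a config change,
// and the gateway handles failover behind the same endpoint.
```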
Layer 3: Framework layer
This is where the agent loop lives. The framework decides how a model decision becomes a tool call, how tool output gets folded back into context, when the loop terminates, and how parallelism works. The April 2026 incumbents:
- LangGraph (LangChain Inc) — graph-based orchestration, durable execution via LangGraph Platform, strong on stateful multi-actor flows. Per the LangGraph documentation as of April 2026, it remains the most-adopted framework in enterprise pilots.
- CrewAI — role-based multi-agent abstractions, opinionated on the "team of specialists" mental model.
- Microsoft Agent Framework — the unification of AutoGen and Semantic Kernel announced October 2025 and now generally available; tightly integrated with Azure AI Foundry.
- Vercel AI SDK — TypeScript-first, agent-loop primitives via Agent/ToolLoopAgent, optimised for streaming UI.
- OpenAI Agents SDK and the open Swarm pattern — lightweight, model-provider-native, less opinionated.
- Pydantic AI — Python-typed-first, gaining ground in data-engineering teams.
What lives here: the loop, the tool schema, structured output, retries on tool failures, and the streaming format that gets piped to a UI.
What does not live here: scheduling, multi-tenant isolation, persistent memory, or anything that needs to survive a process restart unless the framework explicitly offers durable execution (LangGraph Platform, Temporal, Vercel Workflow DevKit).
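To make the layer boundary concrete, here is a framework-agnostic sketch of the loop itself; callModel stands in for whatever layer 2 provides, and the tools map for whatever layer 5 exposes. This is an illustration of the pattern, not any vendor's API:

```ts
// Layer 3 in miniature: the model proposes a tool call, the loop executes
// it, folds the result back into context, and terminates on a final
// answer or an iteration cap.
type ToolCall = { tool: string; args: Record<string, unknown> };
type ModelStep = { done: boolean; answer?: string; call?: ToolCall };

async function runAgent(
  task: string,
  callModel: (transcript: string[]) => Promise<ModelStep>,
  tools: Record<string, (args: Record<string, unknown>) => Promise<string>>,
  maxSteps = 10,
): Promise<string> {
  const transcript: string[] = [`task: ${task}`];
  for (let step = 0; step < maxSteps; step++) {
    const decision = await callModel(transcript);
    if (decision.done) return decision.answer ?? "";
    if (!decision.call) return "model returned neither answer nor tool call";
    const { tool, args } = decision.call;
    const handler = tools[tool];
    // Tool failures go back into context instead of crashing the loop,
    // so the model can retry or choose a different tool.
    const result = handler
      ? await handler(args).catch((e) => `tool error: ${e}`)
      : `unknown tool: ${tool}`;
    transcript.push(`tool ${tool} -> ${result}`);
  }
  return "step budget exhausted";
}
```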
Layer 4: Memory layer
Two distinct components live under this label and conflating them is the most common architectural mistake we see in 2026 RFPs.
The first is short-term and episodic memory: conversation summaries, recent tool outputs, working scratchpads. Mem0, Letta (formerly MemGPT), and Zep are the leading vendors. Vector databases — Pinecone, Weaviate, Qdrant, pgvector inside Postgres — sit underneath as storage substrate.
The second is the world model: the entities the organisation cares about (companies, contacts, deals, candidates, patients, projects), their relationships, and the historical signals attached to them. This is a knowledge graph. Neo4j is the dominant production substrate; ArangoDB, Memgraph, and TigerGraph compete; lightweight options like Kuzu and the embedded graph in DuckDB are emerging.
What lives here: anything an agent needs to retrieve that is older than the current conversation. RAG pipelines, embedding indexes, graph traversal queries.
What does not live here: tool execution, reasoning loops, governance metadata.
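As an illustration of the world-model half, a sketch using the neo4j-driver package against a hypothetical schema; the entity and relationship names are assumptions, since the schema is yours to design:

```ts
// A relational question that vector retrieval answers badly and one
// Cypher traversal answers well. Assumes a hypothetical schema of
// (:Contact)-[:INTRODUCED]->(:Deal)<-[:PARTY_TO]-(:Company).
import neo4j from "neo4j-driver";

const driver = neo4j.driver(
  process.env.NEO4J_URI!,
  neo4j.auth.basic(process.env.NEO4J_USER!, process.env.NEO4J_PASSWORD!),
);

async function dealsIntroducedBy(contactEmail: string) {
  const session = driver.session();
  try {
    const result = await session.run(
      `MATCH (c:Contact {email: $email})-[:INTRODUCED]->(d:Deal)<-[:PARTY_TO]-(co:Company)
       RETURN d.name AS deal, co.name AS company`,
      { email: contactEmail },
    );
    return result.records.map((r) => ({ deal: r.get("deal"), company: r.get("company") }));
  } finally {
    await session.close();
  }
}
```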
Layer 5: Tool and MCP layer
Anthropic's Model Context Protocol (MCP), introduced November 2024 and now broadly adopted, has become the lingua franca for tool exposure. As of April 2026, Microsoft, OpenAI, Google, and most major platform vendors ship native MCP server or client support, per the official MCP specification at modelcontextprotocol.io.
The layer-5 question is: how does an agent get to your databases, your SaaS APIs, your browser automation, your filesystem, your search backends? In 2024 the answer was bespoke tool definitions per framework. In 2026 the answer is increasingly: an MCP server in front of each capability, and the framework speaks MCP.
What lives here: MCP servers (@modelcontextprotocol/server-* reference implementations, vendor-published servers like Supabase MCP, GitHub MCP, Slack MCP), tool routing cascades (cheap-first, expensive-fallback), tool authorisation, rate limits, and the audit log of which agent invoked which tool with which arguments.
What does not live here: the agent loop or the model. MCP is deliberately transport-agnostic.
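What a layer-5 server looks like in practice: a minimal sketch using the official TypeScript MCP SDK over stdio. The internal-crm name, the lookup_contact tool, and the fetchContact helper are hypothetical; only the SDK classes are real:

```ts
import { McpServer } from "@modelcontextprotocol/sdk/server/mcp.js";
import { StdioServerTransport } from "@modelcontextprotocol/sdk/server/stdio.js";
import { z } from "zod";

// Hypothetical internal API call; replace with your system's client.
async function fetchContact(email: string): Promise<object> {
  return { email, owner: "unknown" };
}

const server = new McpServer({ name: "internal-crm", version: "0.1.0" });

server.tool(
  "lookup_contact",
  "Look up a CRM contact by email address",
  { email: z.string().email() },
  async ({ email }) => ({
    content: [{ type: "text", text: JSON.stringify(await fetchContact(email)) }],
  }),
);

// stdio transport: the consuming framework spawns this process and
// speaks MCP over stdin/stdout.
await server.connect(new StdioServerTransport());
```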
Layer 6: Orchestration layer
This is the layer where individual agents stop being individual. The orchestration layer manages the fleet: which agents exist, when they run, how they share state, how the operator sees what is happening, how human review gets injected, and how draft proposals (flashcards, in Knowlee's vocabulary) turn into approved tasks. It is the cockpit.
Early-2026 entrants here include LangGraph Platform (positioning up from layer 3), CrewAI Enterprise, Knowlee OS (our product), and a long tail of internal platforms built by hyperscaler customers on top of Step Functions, Temporal, or Argo Workflows.
What lives here: the kanban or job board, the schedule registry, multi-tenant isolation, the approval workflow, cross-agent state, durable execution semantics, and the human-in-the-loop user interface.
What does not live here: the agent itself, the model, or the memory store. The orchestrator delegates.
Layer 7: Governance and observability layer
The last layer is the one that turned from "nice to have" into "non-negotiable" between October 2025 and April 2026, as the EU AI Act's general-purpose AI obligations took effect on 2 August 2025 and the high-risk system obligations approach their August 2026 trigger date.
What lives here: the trace store, the cost ledger, the prompt and response archive, the per-run risk classification, the human-oversight flag, the approver identity, the technical documentation register required by AI Act Article 11, and the post-market monitoring required by Article 72.
The vendors competing for this layer in April 2026 include Langfuse, Helicone, LangSmith, Arize Phoenix, Braintrust, and OpenTelemetry-native stacks (Honeycomb, Datadog, Grafana Tempo), increasingly extended with the OpenTelemetry GenAI semantic conventions, which reached stable status in late 2025.
What does not live here: any decision-making logic. Governance observes; orchestration decides.
Patterns within each layer
Each layer has a small number of recurring patterns that any architecture review should explicitly choose between.
At the model layer the dominant pattern is multi-model routing: a cheap, fast model handles classification and routing decisions, a frontier model handles the hard reasoning, and a small specialised model handles embeddings or structured extraction. The anti-pattern is single-model lock-in, which makes the gateway layer pointless.
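A sketch of the routing pattern, with the gateway call from layer 2 passed in as a dependency; the model identifiers are placeholders you would pin per provider:

```ts
type Msg = { role: "system" | "user" | "assistant"; content: string };
type Complete = (messages: Msg[], model: string) => Promise<string>;

// Illustrative model identifiers; pin real provider/model strings in config.
const ROUTES = {
  classify: "provider/small-fast-model",
  reason: "provider/frontier-model",
  extract: "provider/small-structured-model",
};

async function route(task: string, complete: Complete): Promise<string> {
  // The cheap model decides the route; one word keeps the call near-free.
  const kind = await complete(
    [{ role: "user", content: `Answer with exactly one word, "reason" or "extract": ${task}` }],
    ROUTES.classify,
  );
  const model = kind.trim().toLowerCase().startsWith("extract")
    ? ROUTES.extract
    : ROUTES.reason;
  // The frontier (or extraction) model does the actual work.
  return complete([{ role: "user", content: task }], model);
}
```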
At the gateway layer the dominant pattern is unified BYOK with provider failover. You bring your own provider keys, the gateway handles fallback when one provider degrades, and the gateway emits OpenTelemetry spans the governance layer can ingest. The anti-pattern is calling provider SDKs directly from application code, which means every model swap is a code change.
At the framework layer three patterns dominate. First, the single-agent tool loop — one agent, many tools, ReAct-style iteration. Second, the supervisor or foreman pattern — one orchestrator agent that delegates to specialists, popularised by AutoGen and now standard in Microsoft Agent Framework. Third, the graph or workflow pattern — explicit nodes and edges, durable execution, audit-friendly. LangGraph and Vercel Workflow DevKit lean here. The wrong pattern is "many agents talking to each other in free chat," which sounds good in demos and falls apart in production.
At the memory layer the dominant split is RAG-for-content vs graph-for-entities. Embedding-and-retrieve handles documents, transcripts, articles, and unstructured prose. Graph traversal handles "which companies introduced which contacts to which deals" — the relational questions that vectors handle badly. Mature platforms run both and route per question type.
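A toy version of that per-question routing; the keyword heuristic is a placeholder, since production routers typically use a cheap classifier model for this decision:

```ts
// Route relational questions to the graph, content questions to the
// vector index. The regex is illustrative only.
type MemoryRoute = "graph" | "vector";

function routeQuestion(question: string): MemoryRoute {
  // Entity-relationship cues suggest graph traversal; everything else
  // falls through to embedding retrieval.
  const relational = /\b(who|which|between|introduced|related|connected|owns)\b/i;
  return relational.test(question) ? "graph" : "vector";
}
```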
At the tool and MCP layer the pattern that emerged across 2025 is the routing cascade: try the cheapest viable tool first, fall back to the next tier on failure. For scraping, the cascade is something like cheap fetch → headless browser → captcha-solving managed browser. For search, it is local search index → SaaS search API → live web crawl. The cascade is encoded in code or in the orchestrator, not in the prompt.
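A sketch of the cascade as code, with each tier wrapping one backend; the tier list itself is whatever your scraping or search stack looks like:

```ts
// Try the cheapest viable tool first, fall back on failure. The point
// is that the cascade lives in code, not in the prompt.
type Fetcher = (url: string) => Promise<string>;

async function cascade(
  url: string,
  tiers: Array<{ name: string; fetch: Fetcher }>,
): Promise<string> {
  const errors: string[] = [];
  for (const tier of tiers) {
    try {
      return await tier.fetch(url);       // first success wins
    } catch (e) {
      errors.push(`${tier.name}: ${e}`);  // record and escalate to next tier
    }
  }
  throw new Error(`all tiers failed:\n${errors.join("\n")}`);
}

// Usage, with hypothetical backends cheapest-first:
// cascade(url, [
//   { name: "plain-fetch", fetch: cheapFetch },
//   { name: "headless", fetch: headlessBrowser },
//   { name: "managed", fetch: managedBrowser },
// ]);
```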
At the orchestration layer the dominant pattern is the kanban or job board as single source of truth. Every running agent, every scheduled job, every pending approval shows up on one surface. Side queues — a separate flashcards table, a separate alerts inbox, a separate review folder — are the anti-pattern; they fragment the operator's mental model.
At the governance layer the pattern is end-to-end OpenTelemetry tracing with provenance: every agent decision links back to the model call that produced it, the tools it used, the data categories those tools touched, and the human (if any) who approved the run. The anti-pattern is logging at the application-log level only, which gives you sentences but not causality.
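A sketch of that provenance wrapper using the OpenTelemetry JavaScript API; the gen_ai.* attribute names follow the GenAI semantic conventions, while the app.* attributes are illustrative governance fields, not part of any standard:

```ts
import { trace, SpanStatusCode } from "@opentelemetry/api";

const tracer = trace.getTracer("agent-platform");

// Wrap each model call in a span carrying both the GenAI conventions
// and the governance fields an auditor will ask about.
async function tracedModelCall<T>(
  model: string,
  riskLevel: string,
  approver: string | null,
  call: () => Promise<T>,
): Promise<T> {
  return tracer.startActiveSpan(`chat ${model}`, async (span) => {
    span.setAttribute("gen_ai.operation.name", "chat");
    span.setAttribute("gen_ai.request.model", model);
    span.setAttribute("app.risk_level", riskLevel);        // illustrative
    span.setAttribute("app.approver", approver ?? "none"); // illustrative
    try {
      return await call();
    } catch (e) {
      span.setStatus({ code: SpanStatusCode.ERROR });
      throw e;
    } finally {
      span.end();
    }
  });
}
```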
Build vs buy by layer
For each layer, the same heuristic works: build only where the layer is strategic differentiation. Buy where it is undifferentiated infrastructure.
Layer 1 (model): never build. Frontier models cost north of $100M per training run as of April 2026 per public statements from Anthropic, OpenAI, and Google. No customer of an agent platform should be training their own frontier model. Open-weight fine-tunes for narrow tasks are reasonable; pretraining is not.
Layer 2 (gateway): buy unless multi-region data residency drives a custom build. Vercel AI Gateway, OpenRouter, and LiteLLM are mature, cheap, and well-instrumented. Building a gateway from scratch is roughly six months of engineering effort that you will spend reinventing retries, circuit breakers, and cost ledgers. Build only if a regulator forces the LLM call to stay inside specific infrastructure boundaries.
Layer 3 (framework): buy or adopt open-source. LangGraph, CrewAI, Microsoft Agent Framework, Vercel AI SDK, and the OpenAI Agents SDK each cover the loop adequately. The framework is not where differentiation lives in 2026 — every serious framework can express the same agent. Build only if you are inventing a new agent paradigm and can articulate why none of the incumbents works.
Layer 4 (memory): hybrid. Buy the substrate (Mem0, Letta, Zep, Neo4j, pgvector). Build the schema. Your knowledge graph schema — what entities, what relationships, what signals — is differentiating because it encodes how your business reasons about the world. The vendor gives you a graph engine; you give the engine its meaning. The opposite mistake is buying a vendor's pre-baked schema and discovering it does not fit your business.
Layer 5 (tools and MCP): hybrid. Buy or use vendor-published MCP servers for commodity tools (GitHub, Slack, Google Workspace, Supabase, browser automation). Build MCP servers for your proprietary systems — your CRM, your data warehouse, your internal services. The MCP spec is now stable enough that an internal MCP server is roughly a week of work for a competent engineer.
Layer 6 (orchestration): this is the layer where the build-vs-buy question is hardest. Building an orchestration layer means building a job scheduler, a kanban UI, a flashcard or approval queue, a multi-tenant isolation model, a session manager, and a durable-execution layer. That is a year-plus of engineering. Buying means accepting another vendor's mental model for what an "agent" is and how it is reviewed. As of April 2026 the buy options are LangGraph Platform (graph-shaped), CrewAI Enterprise (crew-shaped), Knowlee OS (operator-cockpit-shaped), and the hyperscaler workflow engines (workflow-step-shaped). Pick the shape that matches how your operators actually think about the work; do not let vendor terminology dictate your operating model.
Layer 7 (governance): buy the trace store, build the policy. Langfuse, Helicone, LangSmith, Arize Phoenix, Braintrust, and the OpenTelemetry GenAI stacks each handle ingestion, search, and replay. None of them encodes the AI Act risk-classification decisions you must make on every job — that is your policy, not a SaaS configuration. Build a thin policy layer on top of a bought trace store. The combination is what an external auditor under AI Act Article 16 will ask to see.
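A sketch of what "build the policy" means in code: a thin classification function whose output gets attached to every bought trace. The categories and thresholds below are illustrative; the real mapping is a decision for your counsel, not something to copy from an article:

```ts
// Runs once per job; the result is stored alongside the trace.
type RiskLevel = "minimal" | "limited" | "high";

interface JobPolicyInput {
  dataCategories: string[];        // e.g. ["public_web", "personal_data"]
  affectsNaturalPersons: boolean;  // does the output touch real people?
  autonomousActions: boolean;      // can the agent act without review?
}

function classifyRisk(job: JobPolicyInput): {
  risk: RiskLevel;
  humanOversightRequired: boolean;
} {
  const sensitive = job.dataCategories.includes("personal_data");
  if (sensitive && job.affectsNaturalPersons) {
    return { risk: "high", humanOversightRequired: true };
  }
  if (job.autonomousActions) {
    return { risk: "limited", humanOversightRequired: true };
  }
  return { risk: "minimal", humanOversightRequired: false };
}
```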
A hard rule that recurs across the seven decisions: avoid the layer-6-and-7 build that started life as a layer-3 demo. Many platforms now in production were originally a single LangChain script; the orchestration and governance layers were retrofitted under deadline. The retrofit is always more expensive than designing the layers as separate concerns from week one.
How Knowlee composes the seven layers
Conflict-of-interest disclosure restated: Knowlee competes with several layer-6 vendors. The description below is how we have built Knowlee, not an objective recommendation that you choose Knowlee.
Knowlee is a layer-6 orchestrator with a deliberately thin point of view on layers 1, 2, 3, and 5, and a thick point of view on layers 4, 6, and 7.
At the model layer, Knowlee is provider-neutral. Operators configure which model handles which job in state/jobs.json; the platform itself does not require a specific provider.
At the gateway layer, Knowlee defers to whatever the operator brings — Vercel AI Gateway, OpenRouter, or direct provider keys. Per-job model selection is a metadata field, not a code change.
At the framework layer, Knowlee uses Claude Code as the runtime for type: "session" jobs. Each running agent is a real Claude Code child process spawned with its own PTY, its own prompt template, and its own MCP allowlist. Operators who want LangGraph or CrewAI inside a job can run it as a type: "script" job — Knowlee orchestrates the framework, it does not replace it.
At the memory layer, Knowlee takes a strong position. The "Brain" — a Neo4j knowledge graph — is the cross-vertical memory shared by every agent in every product. This is the layer where we believe the moat is: every agent run feeds the graph, every later agent reads from it, and the graph compounds. The episodic memory layer is delegated to the model provider's native long-context plus per-job summaries.
At the tool and MCP layer, Knowlee is MCP-native. Every external capability — Supabase, Neo4j, GitHub, Slack, browser automation, search backends — is an MCP server, configured in .mcp.json. Tool routing cascades (search → scrape → captcha-solve) are encoded as documented fallback orders. Bulk database writes are deliberately not done through MCP; that is a known limit of the protocol's tool-parameter size and we use a generator-and-loader pattern instead.
At the orchestration layer, Knowlee is the product. One kanban board backed by state/jobs.json shows every running agent, every scheduled job, every pending flashcard approval. Flashcards are draft kanban tasks, not a side queue. Every job carries risk_level, data_categories, human_oversight_required, and approver identity as first-class fields.
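For illustration only, a plausible shape for such a registry entry; the four governance fields are the ones named above, while the remaining field names and value sets are assumptions, not Knowlee's published schema:

```ts
// Hypothetical shape of one jobs-registry entry with governance
// metadata as first-class fields.
interface JobRecord {
  id: string;
  type: "session" | "script";
  status: "draft" | "scheduled" | "running" | "awaiting_approval" | "done";
  model: string;                  // per-job model selection (layer-2 metadata)
  risk_level: "minimal" | "limited" | "high";
  data_categories: string[];
  human_oversight_required: boolean;
  approver: string | null;        // identity of the human who approved the run
}
```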
At the governance layer, Knowlee writes a streamed JSON transcript of every model decision and tool call to state/jobs/logs/, plus structured per-job reports to state/jobs/reports/. The schema is designed to map to AI Act Annex IV technical documentation and Article 12 record-keeping requirements. We integrate with external trace stores via OpenTelemetry export rather than reinventing the trace UI.
The tradeoff: an operator adopting Knowlee accepts our opinion that the kanban-and-jobs-registry shape is the right operator surface, and that the Neo4j Brain is worth maintaining. Teams that want a graph-of-nodes (LangGraph Platform) or a crew-of-roles (CrewAI Enterprise) operator surface should choose those instead.
Frequently asked questions
Are these layers always separate vendors?
No. A single vendor can span layers — Vercel ships a gateway (layer 2), a framework (layer 3 via the AI SDK), and a workflow engine (layer 6 via Workflow DevKit). LangChain Inc ships a framework (layer 3, LangGraph), an orchestrator (layer 6, LangGraph Platform), and a trace store (layer 7, LangSmith). What matters is that you can articulate which layer you are evaluating when you compare vendors. "LangGraph vs Vercel AI SDK" is a layer-3 question; "LangGraph Platform vs Knowlee" is a layer-6 question; they are not interchangeable.
Where does an MCP-native architecture differ from a tool-calling-native architecture?
Tool-calling-native architectures define tools per framework — a LangChain Tool, a CrewAI BaseTool, an OpenAI function. MCP-native architectures define tools as separate processes that any framework can consume. The tradeoff is one process boundary per tool group versus tighter integration. For 2026, MCP-native is the safer default because it survives a framework swap. Per the MCP specification at modelcontextprotocol.io, the standard transports are stdio and streamable HTTP, which superseded the earlier HTTP-plus-Server-Sent-Events transport.
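The framework-swap argument in miniature, using the TypeScript MCP SDK's client over stdio; the server command and tool name echo the hypothetical CRM server sketched under layer 5:

```ts
import { Client } from "@modelcontextprotocol/sdk/client/index.js";
import { StdioClientTransport } from "@modelcontextprotocol/sdk/client/stdio.js";

// Any MCP client, regardless of framework, can spawn and consume the
// same server process.
const client = new Client({ name: "any-framework", version: "0.1.0" });
await client.connect(
  new StdioClientTransport({ command: "node", args: ["crm-server.js"] }),
);

const { tools } = await client.listTools(); // discover tools, don't hardcode
console.log(tools.map((t) => t.name));

const result = await client.callTool({
  name: "lookup_contact",
  arguments: { email: "jane@example.com" },
});
console.log(result.content);
```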
How does the AI Act change layer 7?
The EU AI Act Articles 9 (risk management), 10 (data governance), 11 (technical documentation), 12 (record-keeping), 13 (transparency), 14 (human oversight), 15 (accuracy and robustness), 16 (provider obligations), and 72 (post-market monitoring) collectively require that high-risk AI systems produce a documented evidence trail spanning inputs, decisions, outputs, and human review. Per the European Commission's official AI Act timeline, the high-risk obligations apply from 2 August 2026. The practical impact on layer 7 is that traces must be retained, queryable, and tied to a risk classification — three things most 2024-vintage observability stacks do not do natively. Verify your trace retention against your DPO's interpretation; we are not lawyers.
What is the smallest viable architecture?
For a team of two engineers experimenting: layer 1 (one provider), layer 3 (one framework), layer 5 (a few tools), and rudimentary logging. Skip the gateway, skip the orchestrator, skip the formal governance layer. This works up to roughly five jobs and one operator. The architecture starts breaking when you add the second operator or the sixth recurring job — that is when layers 2, 6, and 7 stop being optional.
Should layer 4 always be Neo4j?
No. Neo4j is the dominant production graph database as of April 2026, but pgvector inside an existing Postgres is sufficient for many use cases, and the graph layer can be deferred entirely if your domain does not have rich relational structure. The mistake is committing to a graph schema before you know which questions you will ask of it; the schema is the differentiating asset, not the engine.
How often does this stack change?
The layer boundaries have been stable since roughly Q2 2025. The vendors inside each layer churn quarterly — frameworks consolidate, gateways add features, MCP server catalogues grow. A reference architecture that names layers will age slowly. A reference architecture that names vendors will need refreshing every six months. Treat this article as the second kind: re-verify vendor positions against current documentation before committing.