ODI Reference Architecture for Agents

An agent with tools is not an architecture. It is an architecture-shaped risk until the data layer can explain what the agent is allowed to know and do.

The practical problem

Enterprise agents need to retrieve context, query data, call tools, respect permissions, cite sources, and leave an audit trail. Most stacks handle those responsibilities in separate places. The agent gets a tool list and a prayer.

An Open Data Infrastructure (ODI) reference architecture gives the agent a governed path from question to data to action. The important move is not adding one more agent framework. It is making the data contracts explicit enough for autonomous systems to use.

Core idea: agentic AI needs a governed context plane between the model and the data estate.

The layers

A practical architecture has seven layers.

Open table layer: Iceberg, Parquet, and related contracts that preserve data state outside one vendor boundary.
Catalog layer: table identity, namespaces, operations, credentials, and access coordination.
Metadata and lineage layer: source, ownership, transformations, quality, freshness, and downstream dependencies.
Policy layer: identity, purpose, row and column constraints, masking, obligations, and denials.
Context graph layer: links among entities, metrics, policies, lineage, and allowed data products.
Retrieval and query layer: search, SQL, vector retrieval, embedded query execution, and result shaping.
Agent tool layer: bounded actions exposed through APIs, MCP servers, or workflow services.
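
To make the policy layer concrete, here is a minimal sketch in Python of a purpose check plus column-level denial and masking. Every name in it (Policy, Decision, evaluate, apply_masking, the dataset and column names) is hypothetical and chosen for illustration. A real deployment would delegate this to its catalog or policy engine rather than hand-rolled code.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Policy:
    """Hypothetical policy record for one dataset."""
    dataset: str
    allowed_purposes: frozenset
    masked_columns: frozenset = frozenset()
    denied_columns: frozenset = frozenset()


@dataclass(frozen=True)
class Decision:
    """Outcome of a policy check: what may be returned, and how."""
    allowed: bool
    columns: tuple = ()
    masked: tuple = ()
    reason: str = ""


def evaluate(policy, purpose, requested_columns):
    """Deny on purpose mismatch; otherwise drop denied columns and flag masked ones."""
    if purpose not in policy.allowed_purposes:
        return Decision(False, reason=f"purpose '{purpose}' not permitted on {policy.dataset}")
    visible = tuple(c for c in requested_columns if c not in policy.denied_columns)
    masked = tuple(c for c in visible if c in policy.masked_columns)
    return Decision(True, columns=visible, masked=masked)


def apply_masking(row, decision):
    """Return only the visible columns, replacing masked values with a placeholder."""
    return {c: ("***" if c in decision.masked else row[c]) for c in decision.columns}
```

The design choice worth noting is that the decision is a value the rest of the stack can log and replay, not a side effect buried inside a query tool.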

The request flow

A good request flow starts with identity and purpose. Who is asking? Which agent is acting? What task is being performed? Which policy applies before retrieval begins?

The system then resolves context. It chooses approved data products, retrieves metadata and lineage, checks freshness and quality signals, scopes tool access, and executes queries through the appropriate engine. The final answer or action should carry source references and audit evidence.
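
The flow above can be sketched in a few dozen lines of Python. This is an illustration under stated assumptions, not a definitive implementation: Request, DataProduct, Answer, resolve_products, and handle_request are hypothetical names, the catalog is an in-memory list, and run_query stands in for whatever engine actually executes the query.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone


@dataclass(frozen=True)
class Request:
    user: str       # who is asking
    agent: str      # which agent is acting
    purpose: str    # what task is being performed
    question: str


@dataclass(frozen=True)
class DataProduct:
    name: str
    allowed_purposes: frozenset
    last_refreshed: datetime
    max_staleness: timedelta
    lineage: tuple  # upstream sources, oldest first


@dataclass(frozen=True)
class Answer:
    rows: list
    sources: tuple
    audit: dict


def resolve_products(request, catalog, now):
    """Keep only products approved for this purpose and fresh enough to trust."""
    approved = []
    for product in catalog:
        if request.purpose not in product.allowed_purposes:
            continue
        if now - product.last_refreshed > product.max_staleness:
            continue
        approved.append(product)
    return approved


def handle_request(request, catalog, run_query, now=None):
    """Identity and purpose first, then context, then the query, then evidence."""
    now = now or datetime.now(timezone.utc)
    products = resolve_products(request, catalog, now)
    if not products:
        raise PermissionError("no approved, fresh data product for this purpose")
    names = [p.name for p in products]
    rows = run_query(request.question, names)
    audit = {
        "user": request.user,
        "agent": request.agent,
        "purpose": request.purpose,
        "products": names,
        "lineage": {p.name: p.lineage for p in products},
        "answered_at": now.isoformat(),
    }
    return Answer(rows=rows, sources=tuple(names), audit=audit)
```

Note that the query engine never sees a product the policy and freshness checks rejected. The model cannot talk its way past a filter it is never shown.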

That sounds heavier than a direct SQL tool because it is heavier. Production autonomy deserves more than a clever prompt wrapped around broad credentials.

Failure modes

The first failure is tool sprawl. Every team adds a tool, each tool handles identity differently, and the model becomes the only integration layer.
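
One way to contain tool sprawl is to route every tool through a single gateway that applies the same identity, approval, and limit checks. The Python sketch below is a hypothetical illustration: ToolSpec, ToolGateway, and the scope strings are invented names, not the API of MCP or any agent framework.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class ToolSpec:
    """Hypothetical declaration of one bounded tool."""
    name: str
    required_scope: str
    handler: Callable[..., list]
    max_rows: int = 100
    needs_approval: bool = False


class ToolGateway:
    """Single entry point so every tool call gets the same checks."""

    def __init__(self):
        self._tools = {}

    def register(self, spec):
        if spec.name in self._tools:
            raise ValueError(f"tool already registered: {spec.name}")
        self._tools[spec.name] = spec

    def call(self, name, scopes, approved=False, **kwargs):
        spec = self._tools[name]
        if spec.required_scope not in scopes:
            raise PermissionError(f"missing scope: {spec.required_scope}")
        if spec.needs_approval and not approved:
            raise PermissionError(f"tool requires human approval: {name}")
        rows = spec.handler(**kwargs)
        # Enforce the row limit here, not in each tool.
        return rows[: spec.max_rows]
```

With this shape, adding a tool means declaring its scope and limits once. The gateway, not the model, is the integration layer.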

The second failure is retrieval without governance. The agent finds relevant context but ignores policy, freshness, or source reliability.

The third failure is no replay. After a bad answer, the team cannot reconstruct the prompt, tool call, data slice, policy state, or lineage path.
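
Replay becomes possible once each answer is stored with the pieces listed above. The sketch below, in Python, shows one possible record shape. The field names are assumptions for illustration. The snapshot_id field assumes the data slice can be pinned by a table snapshot, which Iceberg tables expose. The fingerprint is a plain SHA-256 over the serialized record so that later tampering or drift is detectable.

```python
import hashlib
import json
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class ReplayRecord:
    """Hypothetical evidence bundle for one data-backed answer."""
    prompt: str
    tool_name: str
    tool_args: dict
    table: str
    snapshot_id: int       # pins the exact data slice that was read
    policy_version: str    # pins the policy state at decision time
    lineage_path: tuple    # upstream sources, oldest first

    def to_json(self):
        # sort_keys gives a stable serialization, so the hash is reproducible
        return json.dumps(asdict(self), sort_keys=True)

    def fingerprint(self):
        return hashlib.sha256(self.to_json().encode("utf-8")).hexdigest()


def load_record(text):
    """Rebuild a record from its JSON form (JSON turns tuples into lists)."""
    data = json.loads(text)
    data["lineage_path"] = tuple(data["lineage_path"])
    return ReplayRecord(**data)
```

A team investigating a bad answer would load the record, re-run the same tool call against the same snapshot under the same policy version, and compare results.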

Questions to ask

Which layer decides whether an agent can access a dataset?
How does the agent receive freshness, quality, policy, and lineage?
Which tool calls require approval, limits, or human review?
Can the system replay a data-backed answer after the fact?
Can a new model or agent framework use the same governed context plane?

Sources to start with

Start with the primary docs. They are the contracts you can test against, not commentary about the contracts.

ODI for agents MCP governed data layer ODI hub

