Data Modeling for Agentic Analytics

A dashboard can tolerate a vague data model longer than an agent can. The dashboard waits for a human to notice. The agent acts.

The practical problem

Traditional analytics models were built for people reading reports. A metric definition could live in a wiki, a join rule could live in an analyst's head, and a dashboard filter could quietly encode business logic.

Agentic analytics removes that buffer. If an autonomous system asks for churn risk, pricing context, customer health, or policy eligibility, the data model needs to carry meaning and constraints in machine-usable form.

Core idea: agentic analytics needs models that explain what a metric means, where it came from, who can use it, and which actions it can support.

The ODI boundary

Data modeling for agents sits between semantic meaning and infrastructure control. It includes entities, metrics, dimensions, relationships, grain, ownership, policy, lineage, and quality signals.

A semantic layer can define metrics and dimensions. Model contracts can define expected shape. Lineage can show where data came from. Policies can constrain use. Open Data Infrastructure (ODI) matters because those pieces have to work together instead of living in separate documents.

Patterns that work

Model entities before modeling prompts. Agents need stable business objects: customer, account, order, product, subscription, invoice, claim, device, asset, supplier. If the entity model is unclear, the agent will invent the join path with great confidence.
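One way to keep the agent from inventing a join path is to let it look paths up from declared relationships only. The sketch below assumes a small, hypothetical relationship list (the entity names come from the paragraph above; the join keys and the function names are illustrative, not part of any ODI specification). It returns the shortest declared path, or nothing when no path has been declared.

```python
from collections import deque

# Hypothetical declared relationships: (left_entity, right_entity, join_key).
RELATIONSHIPS = [
    ("customer", "account", "customer_id"),
    ("account", "subscription", "account_id"),
    ("subscription", "invoice", "subscription_id"),
    ("customer", "order", "customer_id"),
]


def build_graph(relationships):
    """Index declared relationships in both directions."""
    graph = {}
    for left, right, key in relationships:
        graph.setdefault(left, []).append((right, key))
        graph.setdefault(right, []).append((left, key))
    return graph


def join_path(graph, start, goal):
    """Return the shortest declared join path, or None if none is declared."""
    if start not in graph or goal not in graph:
        return None
    queue = deque([(start, [])])
    seen = {start}
    while queue:
        entity, path = queue.popleft()
        if entity == goal:
            return path
        for neighbor, key in graph[entity]:
            if neighbor not in seen:
                seen.add(neighbor)
                queue.append((neighbor, path + [(entity, neighbor, key)]))
    return None


GRAPH = build_graph(RELATIONSHIPS)
print(join_path(GRAPH, "customer", "invoice"))
print(join_path(GRAPH, "customer", "device"))  # no declared path
```

Returning None for an undeclared path is the point of the design: the agent gets a refusal it can report, not a guess it can act on.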

Make grain explicit. Many wrong analytics answers start with a grain mismatch: customer-level metric joined to account-level attributes, daily facts compared to monthly targets, or event-level behavior rolled into a profile without rules.
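Explicit grain can be as simple as a declared set of key columns that is checked before a join. The following is a minimal sketch under that assumption; the Dataset class, the check_join_grain function, and the example datasets are hypothetical. It is deliberately conservative: a join across different grains is refused unless a rollup rule has been stated.

```python
from dataclasses import dataclass


class GrainMismatch(Exception):
    """Raised when two datasets at different grains are joined without a rule."""


@dataclass(frozen=True)
class Dataset:
    name: str
    grain: frozenset  # columns that uniquely identify one row


def check_join_grain(left, right, rollup_rule=None):
    """Allow a join only if grains match or a rollup rule is declared."""
    if left.grain == right.grain:
        return "same grain"
    if rollup_rule is not None:
        return f"allowed via rollup rule: {rollup_rule}"
    raise GrainMismatch(
        f"{left.name} has grain {sorted(left.grain)}, "
        f"{right.name} has grain {sorted(right.grain)}; "
        "declare a rollup rule before joining"
    )


# Hypothetical example: a customer-level metric and account-level attributes.
churn_risk = Dataset("churn_risk", frozenset({"customer_id"}))
account_attributes = Dataset("account_attributes", frozenset({"account_id"}))

try:
    check_join_grain(churn_risk, account_attributes)
except GrainMismatch as error:
    print(error)
```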

Attach policy and lineage to the model. If a metric is allowed for internal planning but not customer-facing automation, the agent should not learn that from vibes. It should receive a constraint.
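As an illustration of a constraint the agent receives rather than infers, the sketch below attaches owner, lineage, and approved uses to a metric definition and checks a requested use against it. The field names, the use labels, and the table names are assumptions made for this example, not a published schema.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MetricDefinition:
    metric_id: str
    description: str
    owner: str
    lineage: tuple           # upstream tables (hypothetical names)
    allowed_uses: frozenset  # uses this metric is approved for


def require_use(metric, use):
    """Return the metric if the use is approved, otherwise refuse."""
    if use not in metric.allowed_uses:
        raise PermissionError(
            f"{metric.metric_id} is not approved for '{use}'; "
            f"approved uses: {sorted(metric.allowed_uses)}"
        )
    return metric


# Hypothetical metric: fine for planning, not for customer-facing automation.
churn_risk = MetricDefinition(
    metric_id="customer.churn_risk",
    description="Modeled probability that a customer cancels within 90 days",
    owner="analytics-team",
    lineage=("raw.subscriptions", "raw.support_tickets"),
    allowed_uses=frozenset({"internal_planning"}),
)

require_use(churn_risk, "internal_planning")  # passes
try:
    require_use(churn_risk, "customer_facing_automation")
except PermissionError as error:
    print(error)
```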

Failure modes

The first failure is metric name matching. The agent sees "revenue" and chooses the closest field. That may be bookings, recognized revenue, gross revenue, net revenue, or a terrifying spreadsheet artifact from 2019.
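A hedged sketch of the alternative to closest-name matching: resolve a term against a catalog of metric IDs and aliases, and refuse when the term is ambiguous or unknown. The catalog contents and the function name are hypothetical.

```python
class AmbiguousMetric(Exception):
    """The term matches more than one governed metric."""


class UnknownMetric(Exception):
    """The term matches no governed metric."""


# Hypothetical catalog: metric ID -> aliases people actually use.
CATALOG = {
    "finance.bookings": {"revenue", "bookings"},
    "finance.recognized_revenue": {"revenue", "recognized revenue"},
    "finance.net_revenue": {"revenue", "net revenue"},
}


def resolve_metric(term, catalog):
    """Return exactly one metric ID for the term, or raise."""
    term = term.strip().lower()
    if term in catalog:
        return term
    matches = sorted(mid for mid, aliases in catalog.items() if term in aliases)
    if len(matches) == 1:
        return matches[0]
    if not matches:
        raise UnknownMetric(term)
    raise AmbiguousMetric(f"'{term}' could mean any of: {matches}")


print(resolve_metric("Net Revenue", CATALOG))
try:
    resolve_metric("revenue", CATALOG)
except AmbiguousMetric as error:
    print(error)
```

Raising on ambiguity pushes the choice back to a human or to a more specific request, which is usually cheaper than a confident answer built on the wrong field.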

The second failure is missing action boundaries. A model may be good enough for explanation but not good enough for automated outreach, pricing, approval, or denial.

The third failure is context-free freshness. A stale metric in a dashboard is annoying. A stale metric in an agent workflow can become a bad action.
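Freshness with context could look like a maximum age per intended use, checked before the agent acts. The thresholds and use labels below are invented for illustration; real limits would come from the metric's owner. Unknown uses fail closed.

```python
from datetime import datetime, timedelta, timezone

# Hypothetical limits: how old a metric may be for each kind of use.
MAX_AGE_BY_USE = {
    "explanation": timedelta(days=7),
    "automated_outreach": timedelta(hours=24),
    "pricing": timedelta(hours=1),
}


def is_fresh_enough(last_updated, use, now=None):
    """Return True only if the metric is recent enough for this use."""
    now = now or datetime.now(timezone.utc)
    limit = MAX_AGE_BY_USE.get(use)
    if limit is None:
        return False  # unknown use: fail closed
    return now - last_updated <= limit


# A metric refreshed twelve hours before the check.
checked_at = datetime(2024, 1, 2, 12, 0, tzinfo=timezone.utc)
refreshed_at = checked_at - timedelta(hours=12)

for use in ("explanation", "automated_outreach", "pricing"):
    print(use, is_fresh_enough(refreshed_at, use, now=checked_at))
```

The same timestamp is acceptable for explanation and too old for pricing, which is the distinction a single dashboard-style "last updated" label cannot express.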

Questions to ask

Which entities, grains, and relationships are canonical?
Which metrics are approved for agent use, and which are human-only?
Can the agent retrieve lineage, freshness, and policy with each metric?
Which model changes are breaking changes for downstream agents?
Can a human audit why the agent selected a metric and join path?
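The question about breaking changes can be made testable by comparing two versions of a model contract. This sketch assumes a contract is a plain dictionary with a grain and a column-to-type mapping, and it treats removed columns, changed types, and changed grain as breaking; added columns are treated as safe. Both the contract shape and that classification are assumptions for the example.

```python
def breaking_changes(old, new):
    """List the changes between two contracts that could break an agent."""
    problems = []
    for column, old_type in old["columns"].items():
        if column not in new["columns"]:
            problems.append(f"removed column: {column}")
        elif new["columns"][column] != old_type:
            new_type = new["columns"][column]
            problems.append(f"type changed: {column} {old_type} -> {new_type}")
    if old["grain"] != new["grain"]:
        problems.append(f"grain changed: {old['grain']} -> {new['grain']}")
    return problems


# Hypothetical contract versions for a churn model.
contract_v1 = {
    "grain": ["customer_id"],
    "columns": {"customer_id": "string", "churn_risk": "float", "segment": "string"},
}
contract_v2 = {
    "grain": ["account_id"],
    "columns": {"customer_id": "string", "churn_risk": "string", "account_id": "string"},
}

for problem in breaking_changes(contract_v1, contract_v2):
    print(problem)
```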

For related modeling material, read Data Modeling in Open Data Infrastructure, Migrating the Semantic Layer, and Context Graph vs Knowledge Graph.

Sources to start with

Start with the primary docs. They are the contracts you can test against, not commentary about the contracts.

ODI data modeling
Context vs knowledge graph
ODI for agents
