Data Modeling with Semantic Identifiers for Agents

Agents are terrible at business context when every system names the same thing differently.

Agents need stable identifiers

Data modeling for agents starts with identity. A customer, policy, account, order, device, document, and derived feature all need identifiers that remain stable enough for retrieval, lineage, authorization, and reasoning.

Primary keys help inside one system. Semantic identifiers help across systems. They tell the platform what entity a record represents, what domain owns it, which source is authoritative, and which policies apply when an agent uses it.

Semantic identifiers carry meaning

DataHub uses URNs as a resource identity pattern for metadata entities. W3C PROV provides a model for representing provenance across systems. dbt model contracts can define the shape of returned datasets. Those are different tools, but they point at the same architectural need: stable identity and explicit meaning.
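
As a rough illustration of the URN pattern, the sketch below builds and parses a string in the shape DataHub documents for dataset URNs. The helper names `dataset_urn` and `parse_dataset_urn`, and the example platform and table name, are hypothetical. This is hand-rolled string handling for illustration, not the DataHub SDK.

```python
import re

# Shape of a DataHub dataset URN:
#   urn:li:dataset:(urn:li:dataPlatform:<platform>,<name>,<env>)
_DATASET_URN = re.compile(
    r"^urn:li:dataset:\(urn:li:dataPlatform:([^,]+),(.+),([A-Z]+)\)$"
)


def dataset_urn(platform: str, name: str, env: str = "PROD") -> str:
    """Build a dataset URN string (hypothetical helper, not an SDK call)."""
    return f"urn:li:dataset:(urn:li:dataPlatform:{platform},{name},{env})"


def parse_dataset_urn(urn: str) -> dict:
    """Split a dataset URN back into platform, name, and environment."""
    match = _DATASET_URN.match(urn)
    if match is None:
        raise ValueError(f"not a dataset URN: {urn!r}")
    platform, name, env = match.groups()
    return {"platform": platform, "name": name, "env": env}


# Example values are made up for illustration.
urn = dataset_urn("snowflake", "sales.public.customers")
parts = parse_dataset_urn(urn)
```

The point of the round trip is that the identity is structured: platform, name, and environment can be recovered exactly, with no fuzzy matching.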

A useful semantic identifier is not just a string. It carries a namespace, an entity type, a source system, a lifecycle state, an owner, and a policy context. Because those parts are explicit, it can be joined to lineage and entitlement data by exact match rather than fuzzy matching.
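
One way to picture such an identifier is as a small immutable record. The sketch below is a minimal illustration under assumptions: the class `SemanticId`, its field names, the `acme:customer:1042` string format, and the entitlement table are all hypothetical, not a standard or a library API.

```python
from dataclasses import dataclass
from enum import Enum


class Lifecycle(Enum):
    ACTIVE = "active"
    DEPRECATED = "deprecated"
    MERGED = "merged"


@dataclass(frozen=True)
class SemanticId:
    namespace: str        # business domain that owns the entity
    entity_type: str      # e.g. "customer", "policy", "order"
    key: str              # stable business key within the namespace
    source_system: str    # system that is authoritative for this entity
    lifecycle: Lifecycle = Lifecycle.ACTIVE
    owner: str = "unassigned"
    policy_tags: frozenset = frozenset()

    def canonical(self) -> str:
        # Only the stable parts go into the string form. Lifecycle, owner,
        # and policy tags can change without changing identity.
        return f"{self.namespace}:{self.entity_type}:{self.key}"


customer = SemanticId(
    namespace="acme",
    entity_type="customer",
    key="1042",
    source_system="crm",
    owner="sales-data",
    policy_tags=frozenset({"pii"}),
)

# Hypothetical entitlement table keyed by the canonical identifier, so the
# join is an exact dictionary lookup rather than a fuzzy match.
entitlements = {"acme:customer:1042": {"support-agent"}}
allowed = "support-agent" in entitlements.get(customer.canonical(), set())
```

Keeping mutable context (owner, lifecycle, policy tags) out of the canonical string is a design choice in this sketch: the identifier stays stable while its metadata evolves.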

Core idea: agents reason over context better when data models expose stable business identity, not only storage identity.

The model should cross systems

For related ODI context, read identity resolution for agentic systems, temporal entity memory, and context graphs for AI.

The model should connect entities, events, documents, policies, features, and derived tables. That connection should be explicit enough for agents to ask which source is authoritative and which identifier should be used in a tool call.
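
The two questions an agent needs answered, which source is authoritative and which identifier to pass to a tool, can be sketched as a small shared registry. Everything here is an assumption for illustration: the names `EntityRecord` and `EntityRegistry`, the system names, and the IDs are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class EntityRecord:
    semantic_id: str                 # shared business identity
    authoritative_source: str        # system that owns the record
    local_ids: dict = field(default_factory=dict)  # system name -> local ID


class EntityRegistry:
    """In-memory crosswalk between semantic IDs and per-system IDs."""

    def __init__(self):
        self._by_semantic = {}
        self._by_local = {}

    def register(self, record: EntityRecord) -> None:
        self._by_semantic[record.semantic_id] = record
        for system, local_id in record.local_ids.items():
            self._by_local[(system, local_id)] = record

    def resolve(self, system: str, local_id: str) -> EntityRecord:
        # Exact lookup: a system-specific ID maps to one business entity.
        return self._by_local[(system, local_id)]

    def id_for_tool(self, semantic_id: str, system: str) -> str:
        # The identifier an agent should pass when calling a tool
        # that is backed by the given system.
        return self._by_semantic[semantic_id].local_ids[system]


registry = EntityRegistry()
registry.register(
    EntityRecord(
        semantic_id="acme:customer:1042",
        authoritative_source="crm",
        local_ids={"crm": "C-1042", "billing": "88213"},
    )
)

record = registry.resolve("billing", "88213")
tool_id = registry.id_for_tool(record.semantic_id, "crm")
```

Starting from a billing ID, the agent learns that the CRM is authoritative and that `C-1042` is the ID to use in a CRM tool call, without guessing.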

What breaks first

Different systems use different IDs for the same business entity.
Entity resolution happens in app code instead of shared metadata.
Derived features lose the source entity identity that produced them.
Policies attach to tables while agents reason about business objects.
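
The last two failure modes can be avoided, at least in outline, by having a derived feature carry the identity of the entity it describes and by looking policies up by business object rather than by table. The sketch below assumes hypothetical names throughout (`Feature`, `POLICIES_BY_ENTITY_TYPE`, `policies_for`, and the example values).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Feature:
    name: str
    value: float
    entity_id: str      # semantic ID of the entity the feature describes
    entity_type: str    # business object type, e.g. "customer"
    source_table: str   # storage location, kept for lineage only


# Hypothetical policy table keyed by business object type, not table name.
POLICIES_BY_ENTITY_TYPE = {
    "customer": {"pii", "retention-7y"},
}


def policies_for(feature: Feature) -> set:
    """Return the policies that follow the feature's source entity."""
    return POLICIES_BY_ENTITY_TYPE.get(feature.entity_type, set())


churn = Feature(
    name="churn_score",
    value=0.82,
    entity_id="acme:customer:1042",
    entity_type="customer",
    source_table="analytics.features.churn",
)
applicable = policies_for(churn)
```

Because the policy lookup keys on the entity type, moving or renaming the feature table does not change which policies apply.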

Identifier questions

Ask what the identifier means, who owns it, where it is authoritative, how it changes, and which policies follow it. If the answer is hidden in one application, agents will guess.
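
These questions can be turned into a simple completeness check on identifier metadata. The sketch below is one possible shape, assuming a plain dictionary per identifier; the key names in `REQUIRED_ANSWERS` and the function `unanswered` are hypothetical.

```python
# One key per question: meaning, owner, authority, change rule, policies.
REQUIRED_ANSWERS = (
    "meaning",
    "owner",
    "authoritative_source",
    "change_rule",
    "policies",
)


def unanswered(metadata: dict) -> list:
    """Return the questions this identifier's metadata leaves open."""
    return [key for key in REQUIRED_ANSWERS if not metadata.get(key)]


# Example metadata with two questions left unanswered.
customer_id_metadata = {
    "meaning": "A person or organization with at least one active policy",
    "owner": "sales-data",
    "authoritative_source": "crm",
}
gaps = unanswered(customer_id_metadata)
```

Any key returned by `unanswered` marks a place where an agent would otherwise have to guess.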

A model without stable identifiers asks agents to build business meaning from crumbs.
