Open Data Infrastructure
Data Modeling with Semantic Identifiers for Agents
Why agents need stable semantic identifiers across entities, events, policies, documents, and features before they can reason over context.
Agents are terrible at business context when every system names the same thing differently.
Agents need stable identifiers
Data modeling for agents starts with identity. A customer, policy, account, order, device, document, and derived feature all need identifiers that remain stable enough for retrieval, lineage, authorization, and reasoning.
Primary keys help inside one system. Semantic identifiers help across systems. They tell the platform what entity a record represents, what domain owns it, which source is authoritative, and which policies apply when an agent uses it.
Semantic identifiers carry meaning
DataHub uses URNs as a resource identity pattern for metadata entities. W3C PROV provides a model for representing provenance across systems. dbt model contracts define the column names and data types a model must return. These are different tools, but they point at the same architectural need: stable identity and explicit meaning.
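As an illustration of the URN pattern, the sketch below builds and parses a dataset URN in the shape DataHub documents for datasets (`urn:li:dataset:(urn:li:dataPlatform:<platform>,<name>,<env>)`). The helper names `dataset_urn` and `parse_dataset_urn` are hypothetical and written here with plain string handling; in a real deployment the DataHub SDK's own URN utilities would be the safer choice.

```python
def dataset_urn(platform: str, name: str, env: str = "PROD") -> str:
    """Build a dataset URN in the DataHub-style shape (hypothetical helper)."""
    return f"urn:li:dataset:(urn:li:dataPlatform:{platform},{name},{env})"


def parse_dataset_urn(urn: str) -> dict:
    """Split a dataset URN back into platform, name, and environment."""
    prefix = "urn:li:dataset:("
    if not (urn.startswith(prefix) and urn.endswith(")")):
        raise ValueError(f"not a dataset URN: {urn!r}")
    inner = urn[len(prefix):-1]
    # The platform URN comes first; the environment is the last field.
    platform_urn, rest = inner.split(",", 1)
    name, env = rest.rsplit(",", 1)
    return {
        "platform": platform_urn.rsplit(":", 1)[1],
        "name": name,
        "env": env,
    }


urn = dataset_urn("snowflake", "sales.public.customers")
parsed = parse_dataset_urn(urn)
print(urn)
print(parsed)
```

The point is that the identifier is structured: platform, name, and environment can be recovered without guessing, which is what lets other metadata attach to it reliably.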
A useful semantic identifier is not just a string. It carries a namespace, entity type, source system, lifecycle state, owner, and policy context, so it can be joined to lineage and entitlement data by exact match instead of fuzzy matching.
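A minimal sketch of such an identifier follows. Every name here (`SemanticId`, `Lifecycle`, the `sid:` prefix, the sample values) is an assumption made for illustration, not a standard or a DataHub type. It shows the fields listed above and how a canonical key allows exact-match joins to entitlement data.

```python
from dataclasses import dataclass
from enum import Enum


class Lifecycle(Enum):
    ACTIVE = "active"
    DEPRECATED = "deprecated"
    MERGED = "merged"


@dataclass(frozen=True)
class SemanticId:
    namespace: str        # owning domain, e.g. "sales"
    entity_type: str      # business entity type, e.g. "customer"
    source_system: str    # system that is authoritative for the record
    local_key: str        # primary key inside that source system
    lifecycle: Lifecycle = Lifecycle.ACTIVE
    owner: str = "unassigned"
    policy_tags: frozenset = frozenset()

    def key(self) -> str:
        """Canonical string form built only from identity-bearing fields."""
        return (
            f"sid:{self.namespace}:{self.entity_type}:"
            f"{self.source_system}:{self.local_key}"
        )


customer = SemanticId(
    namespace="sales",
    entity_type="customer",
    source_system="crm",
    local_key="C-1042",
    owner="sales-data-team",
    policy_tags=frozenset({"pii"}),
)

# Entitlements keyed by the canonical form: an exact lookup, no fuzzy matching.
entitlements = {customer.key(): {"support-agent"}}
allowed_roles = entitlements[customer.key()]
print(customer.key(), allowed_roles)
```

Keeping lifecycle, owner, and policy tags out of `key()` is a deliberate choice in this sketch: those attributes can change without changing which entity the identifier refers to.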
Core idea: agents reason over context better when data models expose stable business identity, not only storage identity.
The model should cross systems
For related ODI context, read identity resolution for agentic systems, temporal entity memory, and context graphs for AI.
The model should connect entities, events, documents, policies, features, and derived tables. That connection should be explicit enough for agents to ask which source is authoritative and which identifier should be used in a tool call.
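One way to make that connection explicit is a shared registry that maps each system's local ID to a single semantic identifier and records which system is authoritative. The sketch below is an assumption-laden illustration: `IdentityRegistry`, `LocalRef`, and the sample IDs are hypothetical names, and a production version would live in shared metadata rather than in process memory.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LocalRef:
    system: str    # e.g. "crm", "billing"
    local_id: str  # primary key inside that system


class IdentityRegistry:
    def __init__(self) -> None:
        self._to_semantic: dict = {}   # LocalRef -> semantic id
        self._authoritative: dict = {}  # semantic id -> LocalRef

    def register(self, semantic_id: str, ref: LocalRef,
                 authoritative: bool = False) -> None:
        self._to_semantic[ref] = semantic_id
        if authoritative:
            self._authoritative[semantic_id] = ref

    def resolve(self, ref: LocalRef) -> str:
        """Which business entity does this local record represent?"""
        return self._to_semantic[ref]

    def authoritative_ref(self, semantic_id: str) -> LocalRef:
        """Which source is authoritative for this entity?"""
        return self._authoritative[semantic_id]

    def local_id_for(self, semantic_id: str, system: str) -> str:
        """Which identifier should a tool call against `system` use?"""
        for ref, sid in self._to_semantic.items():
            if sid == semantic_id and ref.system == system:
                return ref.local_id
        raise KeyError(f"{semantic_id} has no identifier in {system}")


SID = "sid:sales:customer:crm:C-1042"  # hypothetical semantic identifier
registry = IdentityRegistry()
registry.register(SID, LocalRef("crm", "C-1042"), authoritative=True)
registry.register(SID, LocalRef("billing", "88213"))

print(registry.resolve(LocalRef("billing", "88213")))
print(registry.authoritative_ref(SID))
print(registry.local_id_for(SID, "billing"))
```

With a lookup like this, an agent does not need to infer that billing record 88213 and CRM record C-1042 are the same customer; the model states it.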
What breaks first
- Different systems use different IDs for the same business entity.
- Entity resolution happens in app code instead of shared metadata.
- Derived features lose the source entity identity that produced them.
- Policies attach to tables while agents reason about business objects.
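The last two failure modes can be avoided together if derived features keep the semantic identifiers of their source entities and policies are keyed by those identifiers instead of by table name. The sketch below assumes a hypothetical `Feature` type, a hypothetical `POLICIES` mapping, and made-up identifier strings.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Feature:
    name: str
    value: float
    derived_from: tuple  # semantic ids of the entities that produced it


# Policies attach to business objects (semantic ids), not to tables.
POLICIES = {
    "sid:sales:customer:crm:C-1042": {"pii", "eu-residency"},
    "sid:sales:order:erp:O-77": {"financial"},
}


def policies_for(feature: Feature, policies: dict) -> set:
    """Union of the policies of every source entity behind a feature."""
    result: set = set()
    for sid in feature.derived_from:
        result |= policies.get(sid, set())
    return result


churn_score = Feature(
    name="churn_score_30d",
    value=0.42,
    derived_from=("sid:sales:customer:crm:C-1042",),
)

print(policies_for(churn_score, POLICIES))
```

Because the feature carries `derived_from`, the policy check follows the business object through the derivation instead of stopping at whichever table happens to store the score.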
Identifier questions
Ask what the identifier means, who owns it, where it is authoritative, how it changes, and which policies follow it. If the answers live only inside one application, agents will guess.
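These questions can be turned into a simple completeness check on identifier metadata. The sketch below is illustrative only: the field names in `REQUIRED_ANSWERS` and the sample spec are assumptions chosen to mirror the five questions, not a published schema.

```python
# One field per question: meaning, owner, authority, change, policies.
REQUIRED_ANSWERS = (
    "meaning",
    "owner",
    "authoritative_source",
    "change_process",
    "policies",
)


def unanswered(spec: dict) -> list:
    """Return the questions an identifier spec leaves empty or missing."""
    return [field for field in REQUIRED_ANSWERS if not spec.get(field)]


customer_id_spec = {
    "meaning": "A person or organization with at least one signed contract",
    "owner": "sales-data-team",
    "authoritative_source": "crm",
    "change_process": "",  # not documented yet
    # "policies" is missing entirely
}

print(unanswered(customer_id_spec))
```

A check like this could run wherever identifier definitions are reviewed, so gaps surface before an agent has to guess.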
Sources to start with
These primary and authoritative sources anchor the claims in this guide.
- DataHub concepts documentation
- DataHub metadata model documentation
- W3C PROV-O recommendation
- dbt contract configuration documentation
A model without stable identifiers asks agents to build business meaning from crumbs.