Data Modeling for Open Data Infrastructure

Data modeling used to be mostly about making BI work. In open data infrastructure (ODI), it is also about making meaning portable.

Why it matters

If models are engine-specific, portability dies. You can move files, but you cannot move understanding.

If models are not explicit, AI systems will make up meaning. They will pick the wrong grain. They will join the wrong keys. They will answer questions confidently and incorrectly.

The ODI angle

ODI forces you to separate storage models from semantic models. You want open table contracts at the storage layer, and clear, versioned semantic meaning on top.

That meaning can live in transformation tools, semantic layers, catalogs, and governance systems. The important part is that it can travel across tools and still mean the same thing.
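A minimal sketch of that separation, under stated assumptions: `SemanticModel`, `Metric`, and the table name `lake.sales.orders` are hypothetical and do not come from any particular tool. The storage table is referenced only by name, while grain, version, and metric definitions sit in a contract that serializes to plain JSON and can be read by any engine or agent.

```python
import json
from dataclasses import dataclass, asdict


@dataclass(frozen=True)
class Metric:
    name: str
    expression: str   # engine-neutral SQL fragment (an assumed convention)
    description: str


@dataclass(frozen=True)
class SemanticModel:
    entity: str          # canonical domain entity
    storage_table: str   # pointer to the open table (hypothetical name)
    grain: tuple         # columns that identify exactly one row
    version: str         # the contract is versioned like code
    metrics: tuple = ()

    def to_json(self) -> str:
        # Plain JSON so the meaning is not tied to one tool's format.
        return json.dumps(asdict(self), indent=2, sort_keys=True)


orders = SemanticModel(
    entity="order",
    storage_table="lake.sales.orders",
    grain=("order_id",),
    version="1.2.0",
    metrics=(
        Metric("gross_revenue", "SUM(amount)", "Sum of order amounts before refunds"),
    ),
)

contract = orders.to_json()
restored = json.loads(contract)  # what another tool would read back
print(restored["grain"], restored["metrics"][0]["expression"])
```

The design choice worth noting is that nothing in the contract depends on a query engine or BI tool; only `storage_table` points at the storage layer.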

If the semantic layer is trapped inside one BI tool, you are building a semantic lock-in machine.

Core idea: open data without open semantics is still dependency, just later in the stack.

The architecture test

For analytics engineers, the test is whether meaning survives contact with another engine and another agent tool.

Define canonical domain entities and name them consistently.
Treat metric definitions as versioned code.
Make joins and keys explicit as part of the semantic contract.
Document modeling assumptions in a way retrieval can use.
Test modeling changes with data diffs, not just schema diffs.
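The last item can be illustrated with a small sketch. It assumes model output is available as a list of dicts and that `customer_id` is the declared key; the helper names `schema_of` and `data_diff` are hypothetical. The point is that a change in meaning can leave the schema untouched while changing the numbers.

```python
def schema_of(rows):
    """Return the sorted column names of a list-of-dicts result set."""
    return sorted(rows[0].keys()) if rows else []


def data_diff(old_rows, new_rows, key):
    """Compare two result sets row by row on a declared key column."""
    old = {r[key]: r for r in old_rows}
    new = {r[key]: r for r in new_rows}
    return {
        "added": sorted(new.keys() - old.keys()),
        "removed": sorted(old.keys() - new.keys()),
        "changed": sorted(k for k in old.keys() & new.keys() if old[k] != new[k]),
    }


# Hypothetical output of a revenue model before and after a change
# that silently started subtracting refunds.
before = [
    {"customer_id": 1, "revenue": 100.0},
    {"customer_id": 2, "revenue": 250.0},
]
after = [
    {"customer_id": 1, "revenue": 100.0},
    {"customer_id": 2, "revenue": 200.0},
]

schema_changed = schema_of(before) != schema_of(after)
diff = data_diff(before, after, key="customer_id")
print(schema_changed, diff)
```

Here a schema diff reports no change, while the data diff flags customer 2. In practice the same comparison would run against real model output rather than in-memory lists.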

What breaks first

Portability breaks first where semantics are stored as UI configuration instead of shared contracts.

The semantic layer lives inside one BI tool, so nothing else can reuse it.
Keys and joins are tribal knowledge, so every new system breaks in a different way.
Data products do not have owners, so models decay.
AI systems answer questions from ambiguous tables because semantics are not encoded.
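The key and grain failures above can be made concrete. This is a sketch under assumptions: the tables are toy in-memory lists, and `grain_violations` and `join` are hypothetical helpers, not part of any library. It shows how joining an order-grain table to an item-grain table fans out rows, and how a declared grain lets a check catch it.

```python
from collections import Counter


def grain_violations(rows, grain):
    """Return key tuples that appear more than once for the declared grain."""
    counts = Counter(tuple(r[c] for c in grain) for r in rows)
    return sorted(k for k, n in counts.items() if n > 1)


def join(left, right, key):
    """Naive inner join on a single key column."""
    return [{**a, **b} for a in left for b in right if a[key] == b[key]]


orders = [
    {"order_id": 1, "amount": 100},
    {"order_id": 2, "amount": 50},
]
order_items = [
    {"order_id": 1, "sku": "A"},
    {"order_id": 1, "sku": "B"},
    {"order_id": 2, "sku": "C"},
]

joined = join(orders, order_items, "order_id")

# Summing an order-grain measure over item-grain rows double counts order 1.
true_revenue = sum(r["amount"] for r in orders)
naive_revenue = sum(r["amount"] for r in joined)

# With the grain declared, the fan-out is detectable instead of silent.
violations = grain_violations(joined, grain=("order_id",))
print(true_revenue, naive_revenue, violations)
```

An agent that does not know the grain of `orders` has no way to tell that the larger number is wrong, which is the failure mode described above.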

Questions to ask

Use these questions when you evaluate data modeling for open data infrastructure as part of a portable data contract.

Where does metric logic live, and can it be reused across tools?
How do you ensure join keys and grain are explicit?
How do you keep semantics in sync with storage evolution?
Can you export semantic metadata into a context graph or catalog?
How do you test meaning changes when a model changes?

If you cannot answer those questions, you do not have a semantic layer. You have a semantic rumor.
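The export question can be sketched as well. Assuming the semantic contract is available as a plain dict (the field names and the predicates `stored_in`, `has_grain_column`, `measured_on`, `defined_as`, and `joins_to` are all hypothetical), it can be flattened into subject, predicate, object edges that a catalog or context graph could ingest.

```python
def to_triples(model):
    """Flatten a semantic model dict into (subject, predicate, object) edges."""
    entity = model["entity"]
    triples = [(entity, "stored_in", model["storage_table"])]
    triples += [(entity, "has_grain_column", col) for col in model["grain"]]
    for metric in model["metrics"]:
        triples.append((metric["name"], "measured_on", entity))
        triples.append((metric["name"], "defined_as", metric["expression"]))
    for join in model.get("joins", []):
        triples.append((entity, "joins_to", join["to"]))
    return triples


# Hypothetical contract; names are illustrative only.
model = {
    "entity": "order",
    "storage_table": "lake.sales.orders",
    "grain": ["order_id"],
    "metrics": [{"name": "gross_revenue", "expression": "SUM(amount)"}],
    "joins": [{"to": "customer", "on": ["customer_id"]}],
}

triples = to_triples(model)
for t in triples:
    print(t)
```

If the semantics cannot be reduced to something this plain, that is usually a sign they live in tool configuration rather than in a contract.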

Sources to start with

Start with the tools that encode meaning, then ensure you can export and reuse the contract.

ODI hub
Article library
Use the scorecard
Semantic layers in the agent era
Catalog vs metastore vs glossary
