Data Product Versioning in Open Data Infrastructure

Versioning a data product by schema alone is like versioning a contract by font size.

The practical problem

Data products are becoming infrastructure interfaces. Dashboards, services, agents, evaluation sets, and downstream models depend on them. That means versioning has to describe more than table columns.

Open Data Infrastructure gives teams the pieces: table snapshots, schemas, catalogs, lineage, semantic definitions, policy metadata, and consumer contracts. The architecture has to turn those pieces into a versioning model.

Version the contract, not only the table

A useful data product version should cover schema, grain, semantics, owner, policy, freshness, lineage, quality checks, evaluation sets, and compatibility rules. Some of those change independently. A schema can stay stable while a metric definition changes. A policy can change while the files stay the same.
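
One way to make independent change visible is to fingerprint each part of the contract separately. The sketch below is illustrative only: `ContractVersion`, its five fields, and the helper functions are hypothetical names chosen for this example, not part of any ODI component, and a real contract would carry more facets (owner, lineage, quality checks, evaluation sets, compatibility rules).

```python
import hashlib
import json
from dataclasses import asdict, dataclass, replace


@dataclass(frozen=True)
class ContractVersion:
    """Hypothetical consumer-facing contract for one data product version."""
    schema: dict
    grain: str
    semantics: dict
    policy: dict
    freshness_sla_minutes: int


def facet_fingerprints(contract):
    """Hash each facet on its own so a change in one facet stands out."""
    prints = {}
    for name, value in asdict(contract).items():
        payload = json.dumps(value, sort_keys=True).encode("utf-8")
        prints[name] = hashlib.sha256(payload).hexdigest()[:12]
    return prints


def changed_facets(old, new):
    """Return the sorted names of facets whose fingerprints differ."""
    before, after = facet_fingerprints(old), facet_fingerprints(new)
    return sorted(name for name in before if before[name] != after[name])


# Made-up example values: the schema is identical, only the metric meaning moves.
v1 = ContractVersion(
    schema={"order_id": "string", "revenue": "decimal"},
    grain="one row per order",
    semantics={"revenue": "gross, before refunds"},
    policy={"readers": ["analytics"]},
    freshness_sla_minutes=60,
)
v2 = replace(v1, semantics={"revenue": "net of refunds"})

print(changed_facets(v1, v2))  # ['semantics']
```

A schema-only diff would report no change between `v1` and `v2`, which is exactly the case the contract view is meant to catch.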

Iceberg snapshots help represent table state. dbt semantic models can represent metric and entity meaning. OpenLineage can connect runs and datasets. The data product version should connect those artifacts into one consumer-facing contract.
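
As a rough sketch of what "connecting those artifacts" could look like, the version record below pins one identifier per artifact. It assumes an Iceberg snapshot ID (a 64-bit integer), a dbt semantic model name plus a content digest, and an OpenLineage run ID (a UUID). The class names, the digest field, and the in-memory registry are hypothetical; none of them is an API of Iceberg, dbt, or OpenLineage.

```python
from dataclasses import dataclass
from uuid import UUID


@dataclass(frozen=True)
class ArtifactPins:
    """Identifiers that fix the artifacts behind one product version."""
    iceberg_snapshot_id: int      # table state
    semantic_model: str           # dbt semantic model name
    semantic_model_digest: str    # hypothetical hash of the model definition
    lineage_run_id: UUID          # OpenLineage run that produced the data


@dataclass(frozen=True)
class DataProductVersion:
    product: str
    version: str
    pins: ArtifactPins


class VersionRegistry:
    """In-memory stand-in for a catalog; published versions are immutable."""

    def __init__(self):
        self._versions = {}

    def publish(self, dpv):
        key = (dpv.product, dpv.version)
        if key in self._versions:
            raise ValueError(f"{dpv.product} {dpv.version} is already published")
        self._versions[key] = dpv

    def resolve(self, product, version):
        return self._versions[(product, version)]


# Made-up identifiers for illustration.
registry = VersionRegistry()
registry.publish(DataProductVersion(
    product="orders",
    version="2.0.0",
    pins=ArtifactPins(
        iceberg_snapshot_id=3051729675574597004,
        semantic_model="orders_semantic",
        semantic_model_digest="a1b2c3d4",
        lineage_run_id=UUID("0190f1c2-7a3b-7c44-9d2e-5f6a7b8c9d0e"),
    ),
))
```

Refusing to republish an existing version is a deliberate choice in this sketch: consumers can only trust a version label if the artifacts behind it never change.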

Core idea: data product versioning is the discipline of making change visible to every consumer that depends on the product.

Consumers need migration paths

Breaking change policy matters. A new version should explain whether consumers need code changes, semantic changes, access review, evaluation refresh, or retraining. It should also explain how long the old version stays available.
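
A migration notice can be generated from the list of changed contract facets. The following is a minimal sketch under stated assumptions: the facet-to-action mapping, the 90-day default support window, and all names are hypothetical and would be set by each team's own breaking change policy.

```python
from dataclasses import dataclass
from datetime import date, timedelta

# Hypothetical policy: which consumer actions each kind of change triggers.
REQUIRED_ACTIONS = {
    "schema": ["code changes"],
    "semantics": ["semantic changes", "evaluation refresh"],
    "policy": ["access review"],
    "training_features": ["retraining"],
}


@dataclass(frozen=True)
class MigrationNotice:
    product: str
    old_version: str
    new_version: str
    actions: tuple
    old_version_sunset: date


def build_notice(product, old_version, new_version, changed,
                 released_on, support_days=90):
    """Collect required consumer actions and the old version's end date."""
    actions = []
    for facet in changed:
        for action in REQUIRED_ACTIONS.get(facet, []):
            if action not in actions:  # keep order, drop duplicates
                actions.append(action)
    sunset = released_on + timedelta(days=support_days)
    return MigrationNotice(product, old_version, new_version,
                           tuple(actions), sunset)


notice = build_notice("orders", "1.4.0", "2.0.0",
                      changed=["semantics", "policy"],
                      released_on=date(2025, 1, 1))
print(notice.actions)
print(notice.old_version_sunset)  # 2025-04-01
```

Publishing the sunset date alongside the actions answers both questions a consumer has: what to do, and by when.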

This is especially important for agents. An agent tool may depend on field names, examples, failure behavior, and evaluation cases. If the data product changes, the agent contract changes too.

What breaks first

Schema versions exist, but metric meaning changes without a version bump.
Policy changes alter who can consume the product, but consumers receive no migration notice.
Lineage points to the latest table state, not to the version a given answer actually used.
Evaluation sets are not tied to the data product version they validate.
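
The last two failures can be detected mechanically if each answer and each evaluation set records the product version it refers to. The sketch below assumes hypothetical `AnswerRecord` and `EvaluationSet` records; the field names are illustrative and not taken from any lineage or evaluation tool.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EvaluationSet:
    name: str
    product: str
    product_version: str   # the version this evaluation set validates


@dataclass(frozen=True)
class AnswerRecord:
    answer_id: str
    product: str
    product_version: str   # the version the answer was computed from


def unvalidated_answers(answers, eval_sets):
    """Return IDs of answers whose product version no evaluation set covers."""
    covered = {(e.product, e.product_version) for e in eval_sets}
    return [a.answer_id for a in answers
            if (a.product, a.product_version) not in covered]


# Made-up records for illustration.
eval_sets = [EvaluationSet("orders-regression", "orders", "1.4.0")]
answers = [
    AnswerRecord("ans-001", "orders", "1.4.0"),
    AnswerRecord("ans-002", "orders", "2.0.0"),
]

print(unvalidated_answers(answers, eval_sets))  # ['ans-002']
```

The check only works if the version is written at answer time; reconstructing it later from "latest" lineage is the failure described above.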

Questions to ask

Which artifacts define a data product version?
Which changes are breaking for humans, services, and agents?
How long are old versions supported?
Can an audit record show which version shaped a downstream answer?

A data product version should tell every consumer what changed and why it is still safe to trust.
