Context Graphs for Data Product Discovery

Search can find a table. Discovery has to find the right data product for the job, with enough context to avoid a confident wrong answer.

The practical problem

Data product discovery is not keyword search with nicer cards. A consumer needs meaning, owner, grain, freshness, policy, lineage, quality, examples, and known failure modes. Agents need all of that too, but they need it in machine-usable form.

A context graph can connect those pieces. DataHub and OpenMetadata (metadata catalogs), OpenLineage (a lineage event standard), and W3C PROV (a provenance data model) all point toward the same underlying need: relationships matter when you assess trust.

The graph should connect decisions

A useful discovery graph connects business terms to data products, data products to tables, tables to owners, owners to policies, policies to access decisions, and lineage to downstream consumers. Examples and evaluation cases should sit in the graph too.
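The chain of connections described above can be sketched as a small graph of typed edges. This is a minimal illustration, not the data model of DataHub, OpenMetadata, or any other catalog; the node identifiers, relation names (`defined_by`, `backed_by`, `owned_by`, `governed_by`, `has_example`), and the `ContextGraph` class are all assumptions made up for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Edge:
    source: str
    relation: str
    target: str


class ContextGraph:
    """Tiny in-memory graph of typed relationships (illustrative only)."""

    def __init__(self, edges):
        self._out = {}
        for edge in edges:
            self._out.setdefault(edge.source, []).append(edge)

    def follow(self, start, *relations):
        """Walk a chain of relation types and return the nodes reached."""
        frontier = {start}
        for relation in relations:
            frontier = {
                edge.target
                for node in frontier
                for edge in self._out.get(node, [])
                if edge.relation == relation
            }
        return sorted(frontier)


# Hypothetical nodes and relation names, invented for this sketch.
graph = ContextGraph([
    Edge("term:net_revenue", "defined_by", "product:revenue_daily"),
    Edge("product:revenue_daily", "backed_by", "table:fin.revenue_daily"),
    Edge("table:fin.revenue_daily", "owned_by", "team:finance-data"),
    Edge("team:finance-data", "governed_by", "policy:finance-internal"),
    Edge("product:revenue_daily", "has_example", "example:monthly_rollup"),
])

# From a business term to the accountable owner, then to the policy.
owners = graph.follow("term:net_revenue", "defined_by", "backed_by", "owned_by")
policies = graph.follow(
    "term:net_revenue", "defined_by", "backed_by", "owned_by", "governed_by"
)
print(owners)
print(policies)
```

Keeping relations as explicit edge types is one possible design choice: it lets a consumer ask for a specific path (term to owner, term to policy) rather than scanning free-text descriptions.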

For agents, this means discovery can ask better questions. Which data product answers this intent? Is the product certified for this use? Which policy applies? Which examples show safe usage? Which related product is better for a different grain?

Core idea: A context graph is useful when it turns discovery into a trust decision, not only a search result.
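One way to picture discovery as a trust decision is a check that returns reasons alongside the match. The sketch below assumes a simplified `DataProduct` record and an `assess` function; both are hypothetical, as are the field names, intents, and certification labels. A real system would read these from the graph rather than from in-memory objects.

```python
from dataclasses import dataclass, field


@dataclass
class DataProduct:
    name: str
    intents: set
    grain: str
    certified_uses: set
    policy: str
    examples: list = field(default_factory=list)


@dataclass
class Decision:
    product: str
    trusted: bool
    policy: str
    reasons: list


def assess(product, intent, use, grain):
    """Return a trust decision with reasons instead of a bare match."""
    reasons = []
    if intent not in product.intents:
        reasons.append(f"does not answer intent '{intent}'")
    if use not in product.certified_uses:
        reasons.append(f"not certified for use '{use}'")
    if product.grain != grain:
        reasons.append(f"grain is '{product.grain}', requested '{grain}'")
    if not product.examples:
        reasons.append("no safe-usage examples recorded")
    return Decision(product.name, not reasons, product.policy, reasons)


# Hypothetical catalog entries, invented for this sketch.
daily = DataProduct(
    name="revenue_daily",
    intents={"net revenue"},
    grain="day",
    certified_uses={"financial_reporting"},
    policy="finance-internal",
    examples=["monthly_rollup"],
)
by_order = DataProduct(
    name="revenue_by_order",
    intents={"net revenue"},
    grain="order",
    certified_uses={"financial_reporting"},
    policy="finance-internal",
    examples=["order_audit"],
)

decisions = [
    assess(p, intent="net revenue", use="financial_reporting", grain="day")
    for p in (daily, by_order)
]
for decision in decisions:
    print(decision)
```

Here both products match the intent, but only `revenue_daily` passes at day grain. The rejected product still carries a reason, which is what lets an agent suggest it for a different grain instead of silently dropping it.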

The ODI discovery model

Open Data Infrastructure (ODI) needs discovery that survives platform boundaries. The graph should not depend on one warehouse, one catalog UI, or one BI tool. It should connect open metadata, lineage, policy, and data product contracts.

This also reduces how much context has to be packed into the agent's prompt. The agent does not have to infer business meaning from table names when the graph can expose meaning directly.

What breaks first

Discovery returns tables without ownership, grain, freshness, or policy context.
Agents choose popular data products instead of appropriate data products.
Business terms exist in a glossary but are not connected to tables, examples, or contracts.
Lineage exists after the fact but does not guide discovery before use.
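Two of these failure modes can be detected mechanically. The sketch below assumes discovery results arrive as plain dictionaries; the field names (`owner`, `grain`, `freshness`, `policy`, `terms`) and the sample records are assumptions for illustration, not a real catalog schema. It covers missing context on a result and glossary terms that no result links to; it does not address popularity bias or lineage.

```python
REQUIRED_CONTEXT = ("owner", "grain", "freshness", "policy")


def missing_context(record):
    """List the required context fields that are absent or empty."""
    return [key for key in REQUIRED_CONTEXT if not record.get(key)]


def unlinked_terms(glossary, results):
    """Glossary terms that no discovery result is connected to."""
    linked = {term for record in results for term in record.get("terms", [])}
    return sorted(set(glossary) - linked)


# Hypothetical discovery results, invented for this sketch.
results = [
    {
        "name": "fin.revenue_daily",
        "owner": "team:finance-data",
        "grain": "day",
        "freshness": "24h",
        "policy": "finance-internal",
        "terms": ["net revenue"],
    },
    {"name": "tmp.rev_copy", "owner": None, "grain": "day", "terms": []},
]

gaps = {record["name"]: missing_context(record) for record in results}
orphans = unlinked_terms(["net revenue", "churn"], results)
print(gaps)
print(orphans)
```

A check like this could run before results are shown to a person or an agent, so that a table with no owner or policy is flagged rather than ranked alongside complete entries.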

Questions to ask

Ask what relationships the discovery layer can answer. Ask whether agents can inspect owner, grain, policy, freshness, examples, and downstream impact before selecting a data product.
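Downstream impact is the one item on that list that requires a traversal rather than a lookup. As a rough sketch, assuming lineage is available as a mapping from each node to its direct consumers, a breadth-first walk collects everything that depends on a candidate. The `downstream` function and the node names are hypothetical; OpenLineage and similar tools expose lineage in their own formats, which would need to be mapped into a structure like this first.

```python
from collections import deque


def downstream(lineage, start):
    """Breadth-first walk over lineage edges; returns all dependents of start."""
    seen = set()
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for consumer in lineage.get(node, []):
            # Skip nodes already visited so cycles cannot loop forever.
            if consumer not in seen and consumer != start:
                seen.add(consumer)
                queue.append(consumer)
    return sorted(seen)


# Hypothetical lineage: each key lists its direct consumers.
lineage = {
    "table:fin.revenue_daily": ["product:revenue_daily"],
    "product:revenue_daily": ["dashboard:cfo_weekly", "model:forecast_v2"],
    "model:forecast_v2": ["dashboard:planning"],
}

impact = downstream(lineage, "table:fin.revenue_daily")
print(impact)
```

Running this before selection, rather than after an incident, is what lets lineage inform the choice of data product.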

Sources to start with

These primary sources anchor the technical claims in this guide.

Discovery should help the agent choose the right data product, not merely the nearest table name.
