Open Data Infrastructure Observability Scorecards for AI
How Open Data Infrastructure (ODI) scorecards connect freshness, lineage coverage, access decisions, catalog ownership, query behavior, and agent evaluations.
AI observability without data observability is a very expensive blind spot.
The scorecard should span the data path
AI teams often measure model latency, cost, and answer quality. Those metrics matter, but they do not explain whether the data path was trustworthy. An ODI observability scorecard should connect freshness, lineage coverage, access decisions, catalog ownership, query behavior, and agent evaluations.
The scorecard is not another dashboard for people to ignore. It is a compact way to ask whether the infrastructure behind AI can be trusted under change.
Good signals connect systems
Freshness signals should connect to sources and products. Lineage coverage should connect to transformations and answers. Access decisions should connect to identity and purpose. Query behavior should connect to cost and latency. Evaluations should connect to source evidence, not only final text.
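One way to make these connections checkable is to record, for each signal, the things it is supposed to be tied to, and then flag signals that arrive without them. The sketch below is illustrative only: the `REQUIRED_LINKS` mapping, the signal kind names, and the `missing_links` helper are assumptions made for this example, not part of any ODI specification or tool.

```python
# Hypothetical mapping of each signal kind to the connections it should carry,
# taken from the pairings described in the paragraph above.
REQUIRED_LINKS = {
    "freshness": {"source", "product"},
    "lineage_coverage": {"transformation", "answer"},
    "access_decision": {"identity", "purpose"},
    "query_behavior": {"cost", "latency"},
    "evaluation": {"source_evidence"},
}


def missing_links(kind: str, links: dict[str, str]) -> set[str]:
    """Return the required connections that are absent or empty for a signal."""
    present = {name for name, target in links.items() if target}
    return REQUIRED_LINKS[kind] - present


# A freshness reading tied to its source but not to any data product.
print(missing_links("freshness", {"source": "crm.orders", "product": ""}))
# prints {'product'}
```

A signal with an empty result is connected the way the scorecard expects. A non-empty result names exactly which connection is still missing.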
OpenTelemetry can help with system signals such as traces, metrics, and logs. OpenLineage and DataHub can help with lineage signals. OpenAI evals can help test answer behavior. The scorecard exists to bring those signals into one shared view that the teams operating the system review together.
Core idea: AI observability needs to measure the infrastructure that creates the context.
The ODI pattern makes scorecards portable
Open Data Infrastructure keeps data, metadata, policy, and lineage from being trapped inside one platform. That matters because AI observability scorecards should survive changes in model provider, retrieval system, catalog, or serving engine.
For related context, see the ODI scorecard, ODI control loops for data products, and metadata SLAs for AI. Scorecards are the executive view of operating discipline.
What breaks first
A scorecard fails when every team reports its own truth and no one can connect the dots.
- Model evals pass while source freshness is outside the SLA.
- Access denials are logged but never tied to answer behavior.
- Lineage coverage is measured for pipelines but not retrieval indexes.
- Catalog ownership exists, but no scorecard asks whether owners reviewed risky changes.
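The four failure modes above only become visible when the signals sit in one record. A minimal sketch of that idea follows, assuming a hypothetical `ProductSnapshot` that a team fills in from its own tools. Every field name here is invented for illustration.

```python
from dataclasses import dataclass
from datetime import timedelta


@dataclass
class ProductSnapshot:
    """Hypothetical joined view of one AI product's signals."""
    evals_passed: bool
    source_age: timedelta              # age of the oldest source in use
    freshness_sla: timedelta           # maximum allowed source age
    access_denials: int                # denials logged in the period
    denials_linked_to_answers: int     # denials traced to an affected answer
    pipeline_lineage_covered: bool
    index_lineage_covered: bool        # lineage for retrieval indexes
    risky_changes: int
    risky_changes_owner_reviewed: int


def blind_spots(s: ProductSnapshot) -> list[str]:
    """Return the failure modes from the list above that this snapshot shows."""
    findings = []
    if s.evals_passed and s.source_age > s.freshness_sla:
        findings.append("evals pass on stale data")
    if s.access_denials > s.denials_linked_to_answers:
        findings.append("access denials not tied to answers")
    if s.pipeline_lineage_covered and not s.index_lineage_covered:
        findings.append("retrieval index lineage missing")
    if s.risky_changes > s.risky_changes_owner_reviewed:
        findings.append("risky changes lack owner review")
    return findings
```

Each check compares two signals that usually live in different systems, which is why no single team's report would surface them.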
Questions for leaders
Ask whether each AI product can show source freshness, lineage coverage, policy decisions, owner accountability, query behavior, and evaluation results on the same page.
The answer does not have to be perfect. It has to be honest enough to manage.
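As a rough illustration of "on the same page", the sketch below renders the six dimensions for one product and marks anything a team has not reported instead of hiding it. The dimension keys, the `unreported` and `render` helpers, and the sample values are all assumptions for this example.

```python
# The six dimensions named in the question above, as hypothetical keys.
DIMENSIONS = (
    "source_freshness",
    "lineage_coverage",
    "policy_decisions",
    "owner_accountability",
    "query_behavior",
    "evaluation_results",
)


def unreported(page: dict[str, object]) -> list[str]:
    """List dimensions with no value, in scorecard order."""
    return [d for d in DIMENSIONS if page.get(d) is None]


def render(product: str, page: dict[str, object]) -> str:
    """Render one product's scorecard, showing gaps explicitly."""
    lines = [product]
    for d in DIMENSIONS:
        value = page.get(d)
        lines.append(f"  {d}: {'NOT REPORTED' if value is None else value}")
    return "\n".join(lines)


# Illustrative values only; "support-assistant" is a made-up product name.
page = {"source_freshness": "2h behind", "evaluation_results": 0.91}
print(render("support-assistant", page))
print("gaps:", unreported(page))
```

Showing "NOT REPORTED" instead of dropping the row keeps the scorecard honest: a gap is something leaders can assign and track.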
Sources to start with
These primary sources anchor the technical claims in this guide.
- OpenLineage object model documentation
- OpenTelemetry concepts documentation
- OpenAI evals guide
- NIST AI Risk Management Framework
The scorecard is useful when it shows whether the AI system can trust its own data path.