Open Data Infrastructure
AI-Ready Data Quality Signals
AI-ready data quality is more than a test passing or failing. Agents need source, freshness, uncertainty, policy, lineage, and fit-for-use signals.
A green dashboard freshness check does not mean an agent should use the data. It means one narrow test passed.
The practical problem
Data quality programs were often built around reports: freshness, row counts, null checks, uniqueness, and a few business rules. Those checks still matter. They are not enough for AI systems that retrieve context, compare evidence, generate explanations, and trigger actions.
AI-ready quality signals have to tell an agent whether data is usable for a specific task. The answer depends on source, lineage, uncertainty, policy, freshness, completeness, and the decision being made.
Core idea: data quality for agents is not a single score. It is a set of machine-usable signals tied to use, risk, and provenance.
The ODI boundary
ODI puts quality inside the infrastructure contract. A quality check that lives only in a dashboard does not help an agent decide whether to answer, refuse, escalate, or ask for more context.
The context layer needs signals such as source system, transformation path, last successful refresh, known incidents, contract version, policy status, sensitive-field tags, completeness warnings, and uncertainty. Some signals come from lineage. Some come from metadata systems. Some come from data tests. The agent needs a governed bundle, not a scavenger hunt.
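A minimal sketch of such a bundle, assuming Python 3.9+ and a hypothetical `QualitySignals` record. The field names mirror the list above and are not a standard schema. The point is that the signals travel as one typed object the agent can inspect, instead of being scattered across lineage, metadata, and test systems.

```python
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class QualitySignals:
    """Hypothetical bundle of quality signals returned alongside a dataset."""

    source_system: str
    transformation_path: tuple[str, ...]   # ordered hops from source to served table
    last_successful_refresh: datetime
    contract_version: str
    policy_status: str                     # e.g. "current" or "under_review"
    known_incidents: tuple[str, ...] = ()
    sensitive_fields: frozenset[str] = frozenset()
    completeness_warnings: tuple[str, ...] = ()
    uncertainty_notes: tuple[str, ...] = ()

    def needs_attention(self) -> bool:
        """True if any signal suggests the agent should not use the data blindly."""
        return bool(
            self.known_incidents
            or self.completeness_warnings
            or self.uncertainty_notes
            or self.policy_status != "current"
        )


signals = QualitySignals(
    source_system="orders_db",
    transformation_path=("orders_db", "staging.orders", "marts.daily_orders"),
    last_successful_refresh=datetime(2024, 5, 1, 8, 12, tzinfo=timezone.utc),
    contract_version="v3",
    policy_status="current",
    sensitive_fields=frozenset({"customer_email"}),
)
print(signals.needs_attention())  # False
```

The record is frozen so a tool cannot quietly mutate signals after they are attached to a result.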
Patterns that work
Separate health from fitness. A dataset can be healthy for weekly planning and unfit for real-time fraud response. The quality signal should name the intended use.
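One way to make the distinction concrete, assuming hypothetical `UseProfile` thresholds per intended use: the same dataset state passes its data tests (healthy), is fit for weekly planning, and is unfit for fraud response. The thresholds are illustrative numbers, not recommendations.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class UseProfile:
    """Hypothetical requirements for one intended use of a dataset."""

    name: str
    max_staleness_minutes: float
    min_completeness: float  # fraction of expected rows present, 0.0 to 1.0


@dataclass(frozen=True)
class DatasetState:
    staleness_minutes: float
    completeness: float
    tests_passing: bool  # "health": the narrow data tests passed


def fitness(state: DatasetState, use: UseProfile) -> tuple[bool, list[str]]:
    """Return (fit, reasons) for a named use; passing tests alone is not enough."""
    reasons = []
    if not state.tests_passing:
        reasons.append("data tests failing")
    if state.staleness_minutes > use.max_staleness_minutes:
        reasons.append(
            f"{state.staleness_minutes:.0f} min old; "
            f"{use.name} allows {use.max_staleness_minutes:.0f}"
        )
    if state.completeness < use.min_completeness:
        reasons.append(
            f"completeness {state.completeness:.0%} below {use.min_completeness:.0%}"
        )
    return (not reasons, reasons)


WEEKLY_PLANNING = UseProfile("weekly_planning", max_staleness_minutes=7 * 24 * 60, min_completeness=0.95)
FRAUD_RESPONSE = UseProfile("fraud_response", max_staleness_minutes=5, min_completeness=0.99)

state = DatasetState(staleness_minutes=180, completeness=0.97, tests_passing=True)
print(fitness(state, WEEKLY_PLANNING))  # (True, [])
print(fitness(state, FRAUD_RESPONSE))
# (False, ['180 min old; fraud_response allows 5', 'completeness 97% below 99%'])
```

Returning reasons alongside the verdict means the "unfit" answer can be explained rather than just asserted.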
Expose freshness with context. "Updated at 8:05" is less useful than "source system updated at 8:05, transform completed at 8:12, policy review current, known incident none." Agents need the path, not just the timestamp.
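A sketch of a freshness record that keeps the path rather than a single timestamp. `FreshnessPath` and its fields are hypothetical; `describe()` renders the kind of sentence quoted above, with times in 24-hour `HH:MM` form.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class FreshnessPath:
    """Hypothetical freshness record that keeps each stage, not one timestamp."""

    source_updated_at: datetime
    transform_completed_at: datetime
    policy_review: str                 # e.g. "current" or "overdue"
    known_incidents: tuple[str, ...] = ()

    def transform_after_source(self) -> bool:
        # Necessary, not sufficient: finishing after the source update does not
        # prove the transform read it, but finishing before proves it did not.
        return self.transform_completed_at >= self.source_updated_at

    def describe(self) -> str:
        incidents = ", ".join(self.known_incidents) or "none"
        return (
            f"source system updated at {self.source_updated_at:%H:%M}, "
            f"transform completed at {self.transform_completed_at:%H:%M}, "
            f"policy review {self.policy_review}, known incidents {incidents}"
        )


path = FreshnessPath(
    source_updated_at=datetime(2024, 5, 1, 8, 5),
    transform_completed_at=datetime(2024, 5, 1, 8, 12),
    policy_review="current",
)
print(path.describe())
# source system updated at 08:05, transform completed at 08:12, policy review current, known incidents none
print(path.transform_after_source())  # True
```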
Carry uncertainty forward. If identity resolution is probabilistic, a metric is estimated, or a source has known coverage gaps, that should travel with the data. Hiding uncertainty from an agent does not make the answer better. It makes the risk harder to see.
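A minimal sketch of carrying uncertainty forward, assuming a hypothetical `Measured` wrapper: when values are combined, any estimated input makes the result estimated and every caveat is kept. It tracks caveats only; it does not compute error bounds, which would need a model of how the underlying errors combine.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Measured:
    """A value that keeps its caveats attached (hypothetical wrapper)."""

    value: float
    estimated: bool = False
    caveats: frozenset[str] = frozenset()

    def __add__(self, other: "Measured") -> "Measured":
        # Any estimated input makes the result estimated; caveats accumulate.
        return Measured(
            value=self.value + other.value,
            estimated=self.estimated or other.estimated,
            caveats=self.caveats | other.caveats,
        )


exact = Measured(1200.0)
matched = Measured(340.0, estimated=True,
                   caveats=frozenset({"probabilistic identity resolution"}))
region_gap = Measured(85.0, caveats=frozenset({"source missing EU region"}))

total = exact + matched + region_gap
print(total.value, total.estimated)  # 1625.0 True
print(sorted(total.caveats))
# ['probabilistic identity resolution', 'source missing EU region']
```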
Failure modes
The first failure is a universal quality score. One number cannot represent freshness, lineage, policy, completeness, and suitability across every task.
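A small worked example of the problem, using made-up per-dimension scores on a 0 to 1 scale and a hypothetical floor of 0.5 for "failing": the average looks acceptable while one dimension is failing outright.

```python
from statistics import mean

# Made-up per-dimension scores for one dataset: 0.0 is bad, 1.0 is good.
dimensions = {
    "freshness": 1.0,
    "completeness": 0.98,
    "lineage": 1.0,
    "policy": 0.0,   # e.g. a sensitive field has no approved use for this task
    "uncertainty": 0.9,
}

universal_score = mean(dimensions.values())
print(round(universal_score, 2))  # 0.78

floor = 0.5  # hypothetical per-dimension floor
failing = [name for name, score in dimensions.items() if score < floor]
print(failing)  # ['policy']
```

Against a pass threshold of, say, 0.75, the single score would clear the dataset even though the policy dimension is at zero. Keeping dimensions separate lets the failing one be named.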
The second failure is human-only metadata. A catalog page may document warnings, but the agent tool returns only rows. That is not AI-ready infrastructure.
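A sketch of the alternative, assuming hypothetical in-memory stand-ins (`ROWS`, `SIGNALS`) for a warehouse and a metadata store: the agent-facing tool returns the rows together with the warnings a catalog page would show. Returning no rows when no signals are registered is one possible design choice, not a requirement.

```python
from typing import Any

# Hypothetical stand-ins for a warehouse table and a metadata store.
ROWS = {"daily_orders": [{"day": "2024-05-01", "orders": 1625}]}
SIGNALS = {
    "daily_orders": {
        "contract_version": "v3",
        "policy_status": "current",
        "completeness_warnings": ["EU region missing since 2024-04-28"],
    }
}


def query_tool(dataset: str) -> dict[str, Any]:
    """Agent-facing tool: return rows together with their quality signals."""
    quality = SIGNALS.get(dataset)
    if quality is None:
        # Without signals the agent cannot judge fitness, so say so explicitly
        # instead of returning bare rows.
        return {"rows": [], "quality": None, "error": "no quality signals registered"}
    return {"rows": ROWS.get(dataset, []), "quality": quality}


result = query_tool("daily_orders")
print(result["quality"]["completeness_warnings"])
# ['EU region missing since 2024-04-28']
```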
The third failure is no refusal path. If quality signals are weak, the agent should know how to ask for review, choose a safer source, or avoid acting.
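A sketch of a refusal path, assuming a hypothetical `decide` policy that takes the reasons a fitness check produced. It prefers a safer source, then human review, and otherwise declines to act. The ordering is one reasonable choice; a real system would tune it per task and risk.

```python
from enum import Enum


class Action(Enum):
    ANSWER = "answer"
    USE_SAFER_SOURCE = "use_safer_source"
    ASK_FOR_REVIEW = "ask_for_review"
    DO_NOT_ACT = "do_not_act"


def decide(
    unfit_reasons: list[str],
    safer_sources: list[str],
    reviewer_available: bool,
) -> tuple[Action, str]:
    """Hypothetical policy: map weak quality signals to a safe next step."""
    if not unfit_reasons:
        return Action.ANSWER, "signals sufficient for this use"
    why = "; ".join(unfit_reasons)
    if safer_sources:
        return Action.USE_SAFER_SOURCE, f"switching to {safer_sources[0]}: {why}"
    if reviewer_available:
        return Action.ASK_FOR_REVIEW, f"requesting review: {why}"
    # No fallback and nobody to ask: decline rather than act on weak data.
    return Action.DO_NOT_ACT, f"not acting: {why}"


action, reason = decide(["180 min old; fraud_response allows 5"], [], reviewer_available=True)
print(action.value, "|", reason)
```

Each branch returns a reason string, so the agent can tell the user why it switched sources, escalated, or stopped.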
Questions to ask
- Which quality signals are returned with data, not merely displayed elsewhere?
- Can agents distinguish freshness, completeness, policy, provenance, and uncertainty?
- Which tasks require stronger quality gates before automation?
- Can the system explain why a dataset was considered unfit for a request?
- How do incidents update the signals agents see?
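The last two questions can be sketched together, assuming a hypothetical `IncidentRegistry`: reporting an incident changes the signals an agent receives, resolving it changes them back, and `explain` produces a readable reason. Treating any open incident as disqualifying is deliberately blunt; real policies would weigh severity and the intended use.

```python
from dataclasses import dataclass, field


@dataclass
class IncidentRegistry:
    """Hypothetical registry: open incidents flow into the signals agents see."""

    open_incidents: dict[str, list[str]] = field(default_factory=dict)

    def report(self, dataset: str, summary: str) -> None:
        self.open_incidents.setdefault(dataset, []).append(summary)

    def resolve(self, dataset: str, summary: str) -> None:
        items = self.open_incidents.get(dataset, [])
        if summary in items:
            items.remove(summary)

    def signals_for(self, dataset: str) -> dict:
        incidents = list(self.open_incidents.get(dataset, []))
        # Blunt policy for illustration: any open incident marks the data unusable.
        return {"known_incidents": incidents, "usable": not incidents}

    def explain(self, dataset: str) -> str:
        incidents = self.open_incidents.get(dataset, [])
        if not incidents:
            return f"{dataset}: no open incidents"
        return f"{dataset} considered unfit: open incident(s): " + "; ".join(incidents)


registry = IncidentRegistry()
registry.report("daily_orders", "late upstream load")
print(registry.explain("daily_orders"))
# daily_orders considered unfit: open incident(s): late upstream load
registry.resolve("daily_orders", "late upstream load")
print(registry.signals_for("daily_orders"))
# {'known_incidents': [], 'usable': True}
```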
For adjacent guidance, read "Data Quality Signals for AI Agents," "AI-Ready Context Data," and "The Enterprise AI Trust Data Layer."
Sources to start with
Start with the primary docs. They are the contracts you can test against, not commentary about the contracts.