Open Data Infrastructure
AI-Ready Context Windows Need Data Contracts
Why context windows need contracts for source, freshness, allowed use, transformation path, truncation risk, and evaluation evidence.
A context window is not a magical bag of useful text. It is a lossy data interface with consequences.
Context is a data product
Teams usually build contracts for tables, APIs, and events. AI systems need the same discipline for context payloads. The model receives a bounded package of source data, metadata, instructions, examples, and retrieved material. Like any data product, that package has a source path and its own failure modes.
W3C PROV gives a vocabulary for provenance. OpenLineage gives a model for jobs, runs, datasets, and facets. DataHub describes contracts as verifiable guarantees about data assets. AI-ready context should borrow that discipline instead of pretending prompts are exempt from data engineering.
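To make the provenance idea concrete, here is a minimal sketch of how one context chunk might be described with three PROV relations (wasDerivedFrom, wasGeneratedBy, wasAttributedTo). It uses plain Python rather than any PROV or OpenLineage library. The ProvRecord class, the describe function, and the identifiers in the usage lines are hypothetical names chosen for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ProvRecord:
    """Hypothetical stand-in for a few W3C PROV relations (not a PROV library)."""
    entity: str             # the context chunk handed to the model
    was_derived_from: str   # the source entity the chunk was taken from
    was_generated_by: str   # the activity (for example a retrieval run) that produced it
    was_attributed_to: str  # the agent the chunk is attributed to


def describe(record: ProvRecord) -> list[str]:
    """Render the record as PROV-style statements for logs or review."""
    return [
        f"wasDerivedFrom({record.entity}, {record.was_derived_from})",
        f"wasGeneratedBy({record.entity}, {record.was_generated_by})",
        f"wasAttributedTo({record.entity}, {record.was_attributed_to})",
    ]


# Usage with made-up identifiers.
record = ProvRecord(
    entity="chunk:42",
    was_derived_from="dataset:orders",
    was_generated_by="run:retrieval-1",
    was_attributed_to="team:sales-data",
)
for statement in describe(record):
    print(statement)
```

A real system would more likely emit these relations through its lineage tooling. The point of the sketch is only that each chunk carries its derivation, the run that produced it, and an accountable owner.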
The contract must cover loss
A context contract should name source, owner, freshness, allowed use, transformation path, ranking method, truncation behavior, policy status, and evaluation evidence. Truncation matters because context windows are finite. Something gets left out. The contract should say what can be omitted, what must be retained, and when the system should refuse to answer.
This is especially important for agentic workflows. An agent that acts on incomplete context can create a real-world side effect, not just a weak answer.
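The contract fields and truncation rules described above can be sketched as a small data structure plus an assembly step. This is an illustrative sketch, not a reference implementation: ContextContract, ContextItem, ContextRefusal, assemble, and estimate_tokens are hypothetical names, and counting whitespace-separated words is a crude stand-in for a real tokenizer.

```python
from dataclasses import dataclass


class ContextRefusal(Exception):
    """Raised when required context cannot fit, so the system should not answer."""


@dataclass(frozen=True)
class ContextItem:
    name: str
    text: str
    required: bool     # True: must be retained; False: may be omitted
    priority: int = 0  # among optional items, higher priority is kept first


@dataclass(frozen=True)
class ContextContract:
    source: str
    owner: str
    max_age_hours: int        # freshness guarantee
    allowed_use: str
    transformation_path: str
    ranking_method: str
    policy_status: str
    evaluation_evidence: str
    token_budget: int         # the bound that forces truncation


def estimate_tokens(text: str) -> int:
    # Assumption: one token per whitespace-separated word.
    return len(text.split())


def assemble(contract: ContextContract, items: list[ContextItem]):
    """Return (kept_names, omitted_names) or raise ContextRefusal."""
    required = [i for i in items if i.required]
    optional = sorted((i for i in items if not i.required),
                      key=lambda i: i.priority, reverse=True)

    used = sum(estimate_tokens(i.text) for i in required)
    if used > contract.token_budget:
        # Required material does not fit: refuse instead of answering on partial context.
        raise ContextRefusal(
            f"required context needs {used} tokens, budget is {contract.token_budget}"
        )

    kept = [i.name for i in required]
    omitted = []
    for item in optional:
        cost = estimate_tokens(item.text)
        if used + cost <= contract.token_budget:
            kept.append(item.name)
            used += cost
        else:
            omitted.append(item.name)  # recorded, not silently dropped
    return kept, omitted
```

The design choice worth noting is that omission is an explicit, recorded outcome and refusal is an exception rather than a silent fallback, which matters most when an agent would otherwise act on the partial payload.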
Core idea: Context windows need data contracts because every context payload is a governed data product in miniature.
The ODI context pattern
Open Data Infrastructure connects the raw material a context window needs: catalogs, lineage, access policy, freshness, quality signals, and data product ownership. The context layer should package that evidence alongside the payload, so each answer can be traced back through the context that produced it.
For adjacent context, read AI-ready context, context provenance receipts, and why the context window is not the data layer.
What breaks first
- Retrieved text has a source link but no freshness or policy status.
- Context is truncated without preserving required fields or denial reasons.
- Evaluation failures cannot be connected to the specific context payload used.
- Agents treat examples as current data because the context contract does not distinguish them.
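The four failure modes above lend themselves to a pre-flight check on the payload. The sketch below assumes a hypothetical payload shape: a dict with a payload_id, a list of chunks (each with source, fetched_at, policy_status, and kind), and a list of omitted items with a reason. None of these field names come from a standard.

```python
def find_gaps(payload: dict) -> list[str]:
    """Return human-readable contract gaps; an empty list means none were found."""
    gaps = []

    # Evaluation failures must be traceable to a specific payload.
    if not payload.get("payload_id"):
        gaps.append("payload has no payload_id for evaluation records to reference")

    for chunk in payload.get("chunks", []):
        label = chunk.get("source") or "<unknown source>"
        # A source link alone is not enough: freshness and policy status are required.
        for key in ("fetched_at", "policy_status"):
            if not chunk.get(key):
                gaps.append(f"{label}: missing {key}")
        # Examples must be distinguishable from current data.
        if chunk.get("kind") not in ("data", "example"):
            gaps.append(f"{label}: kind must be 'data' or 'example'")

    # Truncation must leave a record of what was dropped and why.
    for item in payload.get("omitted", []):
        if not item.get("reason"):
            gaps.append(f"{item.get('name', '<unnamed>')}: omitted without a reason")

    return gaps
```

A caller could block the model call, or downgrade the answer to a refusal, whenever find_gaps returns anything.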
Questions to ask
Ask what the context contract guarantees, what it allows the system to omit, and when missing context should trigger refusal. Ask whether every context payload can be tied back to source, policy, and evaluation records.
The context window is small. The contract around it cannot be.
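One way to tie a payload back to source, policy, and evaluation records is to give it a stable identifier derived from its content. The sketch below hashes a canonical JSON form of the payload with SHA-256. The function names and the receipt fields are hypothetical, and it assumes the payload is JSON-serializable.

```python
import hashlib
import json


def payload_id(payload: dict) -> str:
    """Stable content hash: the same payload yields the same id regardless of key order."""
    canonical = json.dumps(payload, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def make_receipt(payload: dict, source_ids: list[str], policy_decision_id: str) -> dict:
    """Bundle the payload id with the source and policy records it depends on."""
    return {
        "payload_id": payload_id(payload),
        "source_ids": sorted(source_ids),
        "policy_decision_id": policy_decision_id,
    }


def link_evaluation(receipt: dict, eval_run_id: str, passed: bool) -> dict:
    """Create an evaluation record that points at the exact payload it judged."""
    return {
        "eval_run_id": eval_run_id,
        "payload_id": receipt["payload_id"],
        "passed": passed,
    }
```

With a receipt like this stored next to each answer, a failed evaluation can be traced to the exact payload, and from there to the sources and policy decision behind it.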
Sources to start with
These primary sources anchor the technical claims in this guide.
- W3C PROV overview
- OpenLineage object model documentation
- OpenLineage facets documentation
- DataHub data contract entity documentation
A context payload without a contract is just a guess with formatting.