Why Every AI Strategy Needs an Open Data Infrastructure Strategy
An AI strategy without an open data infrastructure strategy is mostly a model strategy.
Most AI strategies start too high in the stack. They name the model, the app, the vendor, the assistant, the prompt pattern, and the demo. Then someone asks the rude question: can the system actually reach trusted data without violating policy or rebuilding half the platform?
The Strategy Gap
AI strategy usually gets framed as a choice about models and applications. That is too narrow. Production AI depends on the data layer being available, governed, explainable, and portable enough to support many future workloads.
If the data layer is closed, undocumented, or trapped inside one platform, every AI use case has to negotiate with that boundary. The result is predictable: more copies, more fragile connectors, more policy exceptions, and more confusion about which answer should be trusted.
That is why an open data infrastructure (ODI) strategy belongs next to the AI strategy. It is the control plan for the data foundation.
What an ODI Strategy Covers
An ODI strategy defines which parts of the data estate need open control and how that control will be implemented. It should be specific enough to shape architecture decisions, not so vague that it fits on a conference slide.
- Data ownership: which teams own data products, policies, metadata, and quality signals.
- Open formats: where tables need portable storage semantics rather than private engine state.
- Catalog control: how schemas, permissions, table operations, and discovery are coordinated.
- Lineage and audit: how teams trace data and AI answers back to sources and transformations.
- AI context: what metadata, semantics, quality signals, and policy details can be given to agents.
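One way to keep that list from staying abstract is to record the five areas per data product and flag what is missing. The sketch below is only an illustration: `DataProduct`, `OPEN_TABLE_FORMATS`, and `control_gaps` are hypothetical names, and which formats count as "open" is a decision the strategy itself has to make.

```python
from dataclasses import dataclass, field

# Assumption for illustration: the strategy has named these table formats as open.
OPEN_TABLE_FORMATS = {"iceberg", "delta", "hudi"}


@dataclass
class DataProduct:
    name: str
    owner_team: str = ""            # data ownership
    table_format: str = ""          # open formats
    catalog: str = ""               # catalog control
    lineage_tracked: bool = False   # lineage and audit
    agent_context: dict = field(default_factory=dict)  # AI context


def control_gaps(product: DataProduct) -> list[str]:
    """Return the strategy areas where this product lacks open control."""
    gaps = []
    if not product.owner_team:
        gaps.append("data ownership")
    if product.table_format.lower() not in OPEN_TABLE_FORMATS:
        gaps.append("open formats")
    if not product.catalog:
        gaps.append("catalog control")
    if not product.lineage_tracked:
        gaps.append("lineage and audit")
    if not product.agent_context:
        gaps.append("AI context")
    return gaps


# Hypothetical product: owned, open, and cataloged, but with no lineage or agent context yet.
orders = DataProduct(
    name="orders",
    owner_team="commerce",
    table_format="iceberg",
    catalog="shared-catalog",
)
print(control_gaps(orders))  # ['lineage and audit', 'AI context']
```

Run across the data estate, a check like this turns the strategy into a backlog: each gap is a named piece of work with an owner, rather than a principle.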
The Operating Model
The work isn't just technical. An AI-ready data foundation needs clear ownership. Otherwise "open" becomes another word for "someone else will clean it up later."
Use a simple split. Platform teams own shared infrastructure, catalogs, access patterns, and reliability. Domain teams own the meaning of their data, the definitions, the quality expectations, and the business rules. Governance teams define policy and audit requirements in forms the platform can enforce.
That structure keeps ODI from becoming a tool wishlist. It becomes an operating model for how data gets trusted, accessed, and reused.
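"Policy in forms the platform can enforce" can be read quite literally: governance writes the rule as data, and the platform applies it on every read. The following is a minimal sketch under that assumption. `AccessPolicy`, `enforce`, the role names, and the masking rule are all hypothetical; a real platform would delegate this to its catalog or policy engine.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AccessPolicy:
    """A policy expressed as data, so the platform can apply it mechanically."""
    dataset: str
    allowed_roles: frozenset
    masked_columns: frozenset = frozenset()


def enforce(policy: AccessPolicy, role: str, row: dict) -> dict:
    """Apply the policy to one row: reject unlisted roles, mask sensitive columns."""
    if role not in policy.allowed_roles:
        raise PermissionError(f"role {role!r} may not read {policy.dataset}")
    return {
        column: ("***" if column in policy.masked_columns else value)
        for column, value in row.items()
    }


# Governance defines the rule; the domain team knows which columns are sensitive.
customers_policy = AccessPolicy(
    dataset="customers",
    allowed_roles=frozenset({"support_agent", "analyst"}),
    masked_columns=frozenset({"email"}),
)

row = {"id": 7, "email": "a@example.com", "region": "EU"}
masked = enforce(customers_policy, "analyst", row)
print(masked)  # {'id': 7, 'email': '***', 'region': 'EU'}
```

The point of the split is visible in the code: the policy object can be reviewed and audited by governance without reading the enforcement logic, and the platform can change how it enforces without rewriting the policy.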
A Practical Roadmap
Start where AI risk and data value overlap.
- List the AI use cases that need governed enterprise data, not just public documents or static files.
- Map the data sources, metadata, policies, and lineage needed for those use cases.
- Identify where data or metadata is trapped in a proprietary boundary.
- Choose the open table, catalog, and lineage interfaces that reduce that dependency.
- Build the first governed retrieval path and make it observable.
The goal isn't to boil the data estate. The goal is to turn the highest-value AI paths into repeatable infrastructure.
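As a rough picture of what a first governed, observable retrieval path might look like, the sketch below checks a grant, reads the data, and emits one audit record per attempt. Everything here is an assumption for illustration: `TABLES` and `READ_GRANTS` are in-memory stand-ins for a real catalog and policy store, and `governed_retrieve` is a hypothetical function, not an API from any product.

```python
import json
import logging
from datetime import datetime, timezone

logging.basicConfig(level=logging.INFO)
audit_log = logging.getLogger("retrieval.audit")

# Hypothetical in-memory stand-ins for governed tables and read grants.
TABLES = {"orders": [{"order_id": 1, "total": 40.0}, {"order_id": 2, "total": 15.5}]}
READ_GRANTS = {"orders": {"finance_agent"}}


def governed_retrieve(table: str, principal: str) -> dict:
    """Check the grant, read the table, and log one audit record per call."""
    allowed = principal in READ_GRANTS.get(table, set())
    rows = TABLES.get(table, []) if allowed else []
    record = {
        "at": datetime.now(timezone.utc).isoformat(),
        "principal": principal,
        "table": table,
        "allowed": allowed,
        "row_count": len(rows),
    }
    # Observable by construction: denied attempts are logged as well as granted ones.
    audit_log.info(json.dumps(record))
    return {"rows": rows, "audit": record}


result = governed_retrieve("orders", "finance_agent")
denied = governed_retrieve("orders", "unknown_agent")
```

Returning the audit record alongside the rows is a deliberate choice in this sketch: whatever consumes the result, including an AI agent, can carry the record forward so an answer can later be traced back to the table and principal that produced it.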
Board Questions
If AI is on the board agenda, data control should be there too.
- Which AI initiatives depend on data we don't fully control?
- Can we audit how an AI answer used data, metadata, and policy?
- What would happen if we changed our warehouse, catalog, model provider, or agent framework?
- Where are we using exports as a substitute for infrastructure?
- What is our plan for keeping AI context current, governed, and explainable?
A serious AI strategy needs answers to those questions. Otherwise the strategy is just hope with a bigger cloud bill.