Open Data Infrastructure
How to Tell If a Vendor Is Open or Just Open-Washing
A practical ODI guide for buyers who need to distinguish open standards from marketing language.
The best time to ask hard questions about a data platform is before the platform becomes expensive to leave. After that, every answer comes with migration tax attached.
Why it matters
Buying a data platform is easy if the only question is feature coverage. It is much harder when you ask who controls the data after the contract is signed. The answer decides whether teams can build on data as infrastructure or keep negotiating with the same closed boundary over and over.
The practical test is not whether a tool sounds open. The test is whether data, metadata, policy, and workload behavior can survive contact with another engine, another team, another vendor, or another AI system.
The ODI angle
This is best framed as a buyer evaluation and vendor accountability question, not a product category question.
Core idea: open data infrastructure is the discipline of keeping control close to the data owner while still letting the ecosystem move fast.
That control has to include the boring parts: permissions, schemas, lineage, cost, freshness, and recovery. Those are the parts that decide whether the architecture works after the first demo.
The architecture test
For buyers, the architecture test is direct. Can this design make the right thing easy without hiding the real constraints?
- Access should be documented, programmatic, and reasonable to operate.
- Storage should preserve table meaning beyond one compute engine.
- Catalogs should coordinate identity, metadata, policy, and table operations.
- Governance should run in the path of work, not as a spreadsheet nearby.
- AI context should carry source, policy, quality, and lineage with the answer.
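One way to keep these five checks honest during an evaluation is to record them as explicit pass/fail fields instead of impressions. The following is a minimal sketch, not a standard scoring model: the class name, the field names, and the example vendor are all hypothetical and simply mirror the list above.

```python
from __future__ import annotations

from dataclasses import dataclass, fields


@dataclass
class ArchitectureTest:
    """Pass/fail results for the five checks above (names are illustrative)."""

    access_documented_and_programmatic: bool
    storage_readable_by_other_engines: bool
    catalog_coordinates_metadata_and_policy: bool
    governance_enforced_in_path: bool
    ai_context_carries_lineage: bool

    def failures(self) -> list[str]:
        """Return the names of the checks that did not pass, in list order."""
        return [f.name for f in fields(self) if not getattr(self, f.name)]


# Hypothetical vendor, used only to show the shape of the record.
example = ArchitectureTest(
    access_documented_and_programmatic=True,
    storage_readable_by_other_engines=True,
    catalog_coordinates_metadata_and_policy=False,
    governance_enforced_in_path=False,
    ai_context_carries_lineage=True,
)

print(example.failures())
# ['catalog_coordinates_metadata_and_policy', 'governance_enforced_in_path']
```

Listing failures by name, rather than producing a single score, keeps the conversation on the specific constraint the vendor has to explain.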
What breaks first
Most ODI failures start with a small compromise that becomes architecture by accident.
- The product supports open standards only at the edges.
- The demo hides export limits, policy lock-in, or metadata loss.
- The buyer gets API access but not practical portability.
- The vendor owns the roadmap for every critical data workflow.
None of these failures means the team picked bad tools. They usually mean the tools were asked to carry a contract the architecture never made explicit.
Questions to ask
Use these questions to check data vendors for open-washing when you make a real platform decision.
- Can the vendor prove open access with docs, not slides?
- What metadata leaves with the data?
- Which engines and catalogs work without custom glue?
- What does an exit look like in weeks, dollars, and risk?
If the answer depends on a custom export, a private metadata model, or a single execution engine, the system may still be useful. It just is not as open as the slide says (and yes, that distinction matters).
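The question "What metadata leaves with the data?" can be tested instead of debated. Below is a minimal sketch that compares a table's metadata before and after an export and reports what was dropped or altered. It assumes you can get both sides as plain dictionaries. The function name, the keys, and the sample values are hypothetical and do not correspond to any vendor's format.

```python
def metadata_lost(source: dict, exported: dict) -> dict:
    """Return source metadata entries that are missing or changed after export."""
    lost = {}
    for key, value in source.items():
        if key not in exported:
            lost[key] = "missing"
        elif exported[key] != value:
            lost[key] = "changed"
    return lost


# Hypothetical metadata for one table, before and after a vendor export.
source_meta = {
    "schema": {"id": "bigint", "email": "string"},
    "partitioning": ["signup_date"],
    "row_policy": "region = current_region()",
    "lineage": ["raw.signups"],
}
exported_meta = {
    "schema": {"id": "bigint", "email": "string"},
    "partitioning": [],
}

print(metadata_lost(source_meta, exported_meta))
# {'partitioning': 'changed', 'row_policy': 'missing', 'lineage': 'missing'}
```

In this made-up case the schema survives, but the policy and lineage do not. That is the pattern to look for when an export is described as complete.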
Sources to start with
These sources are useful starting points for checking the technical claims behind this topic. They are not a substitute for testing your own stack.