The best time to ask hard questions about open data infrastructure is before the contract is signed. After that, every honest answer comes with migration cost attached.

The buying mistake

Most data platform evaluations over-index on features. Does it ingest the source? Does it have a catalog? Does it run SQL? Does it support AI? Does the demo look clean?

Those questions are not wrong. They are just incomplete. The better buyer question is: what happens when we need to change something?

Change the compute engine. Change the catalog. Change the governance tool. Change the AI application. Change the vendor contract. Change the data sharing model. If the answer is "you can export some files and rebuild the rest," you are not buying open infrastructure. You are buying a future migration project.

Seven tests for ODI buyers

  1. Data access: Can you get operational and analytical data through documented APIs, change data capture (CDC), exports, or open storage paths without punitive friction?
  2. Storage portability: Can the data live in formats that other engines can read without a proprietary translation step?
  3. Metadata portability: Can schemas, ownership, freshness, table state, lineage, and policies move with the data?
  4. Catalog openness: Does the catalog expose documented interfaces for discovery, permissions, and table operations?
  5. Compute choice: Can another engine use the same data contract without copying everything into a new platform?
  6. Governance in the path: Are access control, auditability, and lineage enforced where work happens?
  7. AI context: Can agents retrieve governed data with source, meaning, policy, quality, and history attached?
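Test 3 is the easiest to turn into something you can run against a vendor's export. The sketch below is a minimal Python illustration, assuming the export arrives as a single JSON record. The field names in REQUIRED_FIELDS are hypothetical stand-ins for the items test 3 lists, so map them to whatever the vendor actually emits.

```python
"""Sketch: check that an exported table-metadata record keeps the fields
test 3 (metadata portability) cares about. Field names are hypothetical;
map them to whatever the vendor's export actually emits."""

import json

# Hypothetical field names, one per item named in test 3.
REQUIRED_FIELDS = ("schema", "owner", "freshness", "table_state", "lineage", "policies")


def missing_metadata(exported_json: str) -> list[str]:
    """Return the required fields that are absent or empty in the export."""
    record = json.loads(exported_json)
    # An empty value counts as missing: a key with nothing in it did not travel.
    return [field for field in REQUIRED_FIELDS if not record.get(field)]


if __name__ == "__main__":
    export = json.dumps({
        "schema": [{"name": "order_id", "type": "long"}],
        "owner": "sales-data",
        "freshness": "2024-05-01T12:00:00Z",
        "table_state": {"current_snapshot": 42},
        "lineage": [],   # present but empty
        "policies": [],  # present but empty
    })
    print(missing_metadata(export))  # ['lineage', 'policies']
```

Treating empty values as missing is deliberate. An export that carries a lineage key with no content has not moved lineage with the data; it has only moved the column name.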

25 questions to ask vendors

Ask these before the demo gets too pretty.

  1. Where does the customer-controlled copy of the data live?
  2. Which open table formats do you support for reads and writes?
  3. Which operations depend on proprietary metadata?
  4. Can another engine read the same tables safely?
  5. How are schema evolution, partition evolution, deletes, and snapshots represented?
  6. Do you support a documented catalog API?
  7. Can table metadata be exported with useful semantics intact?
  8. Where is policy enforced?
  9. Can policies be evaluated outside your user interface?
  10. What identity systems do you support?
  11. How do you capture lineage?
  12. Can lineage events leave the platform?
  13. How do quality signals attach to datasets?
  14. Can AI tools retrieve quality, policy, and lineage context through an API?
  15. What happens when a second query engine enters the architecture?
  16. What happens when a customer wants to leave?
  17. Which fees appear during bulk export, replication, or egress?
  18. Which integrations are based on open standards?
  19. Which integrations are custom partnerships?
  20. Can we run a proof of portability during procurement?
  21. Can we test a non-preferred engine?
  22. Can we inspect generated metadata directly?
  23. Can we audit agent access across tools?
  24. What will not work if we remove your control plane?
  25. What part of the architecture are you asking us to trust but not verify?
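Question 25 is easier to answer if every response is logged with the evidence behind it. A minimal Python sketch, assuming a hypothetical four-level evidence rubric (tested, documented, demo, claim). The rubric is an illustration, not an industry standard.

```python
"""Sketch: record each vendor answer with the kind of evidence behind it,
then list the answers that rest on trust alone. The evidence levels are a
hypothetical rubric, not a standard."""

from dataclasses import dataclass
from enum import Enum


class Evidence(Enum):
    TESTED = 3      # we reproduced it ourselves during procurement
    DOCUMENTED = 2  # public docs or a published spec
    DEMO = 1        # shown by the vendor, not reproduced by us
    CLAIM = 0       # verbal or slide-deck assurance only


@dataclass
class Answer:
    question: int   # number from the 25-question list
    summary: str
    evidence: Evidence


def trust_but_not_verify(answers: list[Answer],
                         floor: Evidence = Evidence.DOCUMENTED) -> list[int]:
    """Return question numbers whose evidence sits below the chosen floor."""
    return sorted(a.question for a in answers if a.evidence.value < floor.value)


if __name__ == "__main__":
    answers = [
        Answer(4, "A second engine read the same tables", Evidence.TESTED),
        Answer(12, "Lineage graph shown in the UI", Evidence.DEMO),
        Answer(17, "No fees for bulk export", Evidence.CLAIM),
    ]
    print(trust_but_not_verify(answers))  # [12, 17]
```

Raising the floor to Evidence.TESTED turns the same list into a to-do list for the portability proof.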

Red flags

Be careful when a vendor says "open" but the proof stops at one of these:

  • Export exists, but only as a slow operational escape hatch.
  • Open formats are read-only while writes require the vendor path.
  • The catalog is called open because it has a UI search box.
  • Governance works only inside one compute engine.
  • Lineage is visible in screenshots but not available as portable events.
  • AI context is generated from data the agent cannot trace or explain.
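The lineage red flag can be tested directly: ask for an export of lineage events and check whether it parses. This sketch assumes newline-delimited JSON with fields loosely modeled on the OpenLineage RunEvent (eventTime, producer, schemaURL, run.runId, job.namespace, job.name). Check those names against the published OpenLineage spec before relying on them. The rule that every event should name at least one input or output dataset is our own heuristic, not a spec requirement.

```python
"""Sketch: test whether exported lineage is portable events rather than
screenshots. Field names loosely follow the OpenLineage RunEvent shape;
verify against the published OpenLineage spec before relying on this."""

import json

REQUIRED_TOP_LEVEL = ("eventTime", "producer", "schemaURL", "run", "job")


def event_problems(event: dict) -> list[str]:
    """Return human-readable problems with one lineage event (empty if none)."""
    problems = [f"missing {key}" for key in REQUIRED_TOP_LEVEL if key not in event]
    if "run" in event and not event["run"].get("runId"):
        problems.append("run.runId is empty")
    job = event.get("job", {})
    if "job" in event and not (job.get("namespace") and job.get("name")):
        problems.append("job needs namespace and name")
    # Our own heuristic, not a spec rule: an event with no datasets adds no edge.
    if not event.get("inputs") and not event.get("outputs"):
        problems.append("no input or output datasets, so no lineage edge")
    return problems


def check_export(lines: list[str]) -> dict[int, list[str]]:
    """Check newline-delimited JSON events; map line index -> problems."""
    report = {}
    for index, line in enumerate(lines):
        problems = event_problems(json.loads(line))
        if problems:
            report[index] = problems
    return report


if __name__ == "__main__":
    export = [
        json.dumps({
            "eventType": "COMPLETE",
            "eventTime": "2024-05-01T12:00:00Z",
            "producer": "https://example.com/vendor-exporter",    # placeholder
            "schemaURL": "https://example.com/run-event-schema",  # placeholder
            "run": {"runId": "3f1c0d2e-0000-4000-8000-000000000001"},
            "job": {"namespace": "warehouse", "name": "load_orders"},
            "outputs": [{"namespace": "warehouse", "name": "orders"}],
        }),
        json.dumps({"eventTime": "2024-05-01T12:05:00Z", "job": {"name": "dashboard"}}),
    ]
    for index, problems in check_export(export).items():
        print(index, problems)
```

If the vendor cannot produce a file this check can even open, the lineage lives in screenshots.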

The proof to request

Ask for a small portability proof. Not a generic demo. A real proof.

  1. Write a table through the platform.
  2. Read it from another engine.
  3. Evolve the schema.
  4. Apply a policy.
  5. Capture lineage.
  6. Expose metadata to an AI tool or application.
  7. Show what still works if the preferred vendor interface is removed.
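The proof can be scripted so that step 7 falls out of steps 1 through 6 instead of being argued in a meeting. A minimal Python harness, assuming the buyer writes each step as a hypothetical callable that takes a flag saying whether the vendor's preferred interface is available. The real step bodies would wrap whatever engines and catalog APIs are under evaluation; the stubs here only illustrate the reporting.

```python
"""Sketch of a harness for the portability proof. Each step is a callable
the buyer supplies (hypothetical here); the harness runs every step with
and without the vendor's preferred interface and reports the difference."""

from typing import Callable

# A step receives True if the vendor interface is available and returns
# True if the step succeeded.
Step = Callable[[bool], bool]

STEP_NAMES = (
    "write table through platform",
    "read from another engine",
    "evolve schema",
    "apply policy",
    "capture lineage",
    "expose metadata to AI tool",
)


def _safe(step: Step, vendor_available: bool) -> bool:
    # A step that raises counts as a failure instead of aborting the proof.
    try:
        return bool(step(vendor_available))
    except Exception:
        return False


def run_proof(steps: dict[str, Step]) -> dict[str, tuple[bool, bool]]:
    """Return {step name: (passed_with_vendor, passed_without_vendor)}."""
    return {name: (_safe(steps[name], True), _safe(steps[name], False))
            for name in STEP_NAMES}


def lost_without_vendor(results: dict[str, tuple[bool, bool]]) -> list[str]:
    """Steps that work only while the vendor interface is present."""
    return [name for name, (with_v, without_v) in results.items()
            if with_v and not without_v]


if __name__ == "__main__":
    # Hypothetical stubs standing in for real engine and catalog calls.
    def open_path(vendor_available: bool) -> bool:
        return True

    def needs_vendor(vendor_available: bool) -> bool:
        return vendor_available

    def vendor_lineage(vendor_available: bool) -> bool:
        if not vendor_available:
            raise RuntimeError("lineage API lives in the vendor control plane")
        return True

    stubs = {name: open_path for name in STEP_NAMES}
    stubs["apply policy"] = needs_vendor
    stubs["capture lineage"] = vendor_lineage
    print(lost_without_vendor(run_proof(stubs)))  # ['apply policy', 'capture lineage']
```

The steps that pass only with the vendor path are a concrete answer to question 24.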

If that test is treated as unreasonable, you have learned something useful.

Sources to start with