If a vendor demo makes a lakehouse look simple, ask one question: if you swap the query engine, what stays true?

The confusion that breaks ODI

Teams talk about "Iceberg support" as if it were one feature. It is not. A portable open data platform is a chain of contracts. The table format is one contract. The catalog is another. The query engine is a third.

When those contracts get blurred together, you end up with an architecture that looks open on a slide and behaves closed in production. The data may be in open storage, but the meaning of the data is trapped behind one catalog boundary or one engine-specific behavior.

Core idea: open storage only gives you the table files. Open data infrastructure (ODI) is the ability to move a workload without losing table meaning, policy, and trust.

A simple mental model

Use this as a quick separation-of-concerns model:

  • Table format: the table contract stored with the data (metadata, snapshots, schema evolution rules, partition semantics, and delete semantics).
  • Catalog: the coordination layer that hands out table identity, namespaces, credentials, and policies, and that controls table operations.
  • Query engine: the executor that reads and writes tables through a format and a catalog, and that determines the performance and reliability you actually get.
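
One way to picture the separation above is as three interfaces that meet only at narrow seams. The sketch below is a toy in Python, not any real library's API. Every name in it (TableMetadata, Catalog, Engine, InMemoryCatalog, EngineA, EngineB) is hypothetical and exists only for illustration.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Protocol


@dataclass(frozen=True)
class TableMetadata:
    """Stand-in for the table format contract: what the table means."""
    schema: tuple[str, ...]
    data_files: tuple[str, ...]


class Catalog(Protocol):
    """Control plane: resolves a table name to its current metadata."""
    def load_table(self, name: str) -> TableMetadata: ...


class Engine(Protocol):
    """Executor: plans a read through the catalog, never around it."""
    def plan_scan(self, catalog: Catalog, name: str) -> list[str]: ...


class InMemoryCatalog:
    """Hypothetical catalog backed by a dict, for illustration only."""
    def __init__(self, tables: dict[str, TableMetadata]) -> None:
        self._tables = tables

    def load_table(self, name: str) -> TableMetadata:
        return self._tables[name]


class EngineA:
    def plan_scan(self, catalog: Catalog, name: str) -> list[str]:
        # Reads the files in the order the metadata lists them.
        return list(catalog.load_table(name).data_files)


class EngineB:
    def plan_scan(self, catalog: Catalog, name: str) -> list[str]:
        # Different execution choice (sorted order), same table contract.
        return sorted(catalog.load_table(name).data_files)
```

Both toy engines depend only on the Catalog interface and the metadata it returns, so either one can be swapped in without touching the table or the catalog.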

If you need a longer definition of table formats first, start with What Is a Table Format? and Open Table Format vs File Format.

What a table format owns

A table format is not "Parquet plus a folder." A table format owns the metadata model that makes a pile of files act like a table. In Iceberg terms, this includes snapshots, manifests, partition specs, schema evolution, and delete semantics defined by the specification. The specification is the promise you can hold the ecosystem to.
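
To make "metadata model" concrete, here is a deliberately simplified sketch of how snapshots and manifests let a set of files behave like a versioned table. It is not the Iceberg on-disk layout, and the class and field names (Manifest, Snapshot, Table, files_at) are illustrative assumptions only.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Manifest:
    """Lists the data files added by one commit."""
    data_files: tuple[str, ...]


@dataclass(frozen=True)
class Snapshot:
    """An immutable view of the table: the manifests visible at one commit."""
    snapshot_id: int
    manifests: tuple[Manifest, ...]


@dataclass
class Table:
    snapshots: list[Snapshot] = field(default_factory=list)

    def append(self, files: tuple[str, ...]) -> Snapshot:
        # A commit never edits old snapshots. It adds a new snapshot that
        # reuses the previous manifests and appends one more.
        previous = self.snapshots[-1].manifests if self.snapshots else ()
        snap = Snapshot(len(self.snapshots) + 1, previous + (Manifest(files),))
        self.snapshots.append(snap)
        return snap

    def files_at(self, snapshot_id: int) -> list[str]:
        # Resolve the set of files that make up the table at one snapshot.
        snap = next(s for s in self.snapshots if s.snapshot_id == snapshot_id)
        return [f for m in snap.manifests for f in m.data_files]
```

In this toy model, asking for snapshot 1 after a second commit still returns only the original files. That is the property a reader relies on for time travel and for rolling back a bad write.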

The table format boundary matters because it determines what multiple engines can safely share. If the format contract is open and stable, you can run multiple engines on the same tables. If the format contract is proprietary or "mostly compatible," the second engine becomes a migration project.

Formats are also where "open" can still be misleading. A format can be open while operational portability is still blocked by a closed catalog, a closed governance model, or an engine-only optimization that changes semantics. That is why ODI treats the format as necessary, not sufficient.

What a catalog owns

A catalog is the control plane for tables. It answers questions like: What is the canonical name of this table? Where is its metadata? Who is allowed to read or write? What credentials should a client use? Which operations are allowed, and how are they audited?

In the Iceberg ecosystem, the catalog role is increasingly defined through a standard protocol, not a per-engine plugin. The Iceberg REST Catalog API exists so engines and languages can talk to a catalog through one interoperable interface.
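
As a rough illustration of what "one interoperable interface" means, the sketch below builds the URL a client would request to load a table. The path shape follows the Iceberg REST Catalog OpenAPI definition as understood here, so verify it against the spec version you target. The host name is a placeholder, the helper name load_table_url is hypothetical, and the sketch assumes a single-level namespace.

```python
from urllib.parse import quote


def load_table_url(base_uri: str, namespace: str, table: str, prefix: str = "") -> str:
    """Build the load-table URL for a REST catalog (assumed path shape)."""
    parts = ["v1"]
    if prefix:
        # Some catalogs route requests under an optional prefix segment.
        parts.append(quote(prefix, safe=""))
    parts += ["namespaces", quote(namespace, safe=""), "tables", quote(table, safe="")]
    return base_uri.rstrip("/") + "/" + "/".join(parts)


# Placeholder endpoint, not a real service.
url = load_table_url("https://catalog.example.com/", "sales", "orders")
print(url)
```

The point of the sketch is that nothing in the URL names an engine. Any client that can issue this request and parse the response can find the same table metadata.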

That is why a page like What Is a REST Catalog? is not just definitional content. It is an architecture boundary. It is the difference between "your engine supports our catalog" and "any engine with a REST catalog client can connect."

What a query engine owns

A query engine owns performance, resource management, and execution behavior. It can also own semantics that users confuse with the format.

Examples:

  • Planning and pruning: two engines can read the same Iceberg table and still behave differently because their planners use metadata differently.
  • Write patterns: file sizing, commit frequency, and delete strategy depend on the engine’s writers and its retry behavior.
  • Operational ergonomics: observability, retries, backpressure, and incident response tend to be engine-specific.

ODI does not require engines to be identical. It requires the table meaning to survive the swap.
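
A small sketch of that distinction: two toy "engines" read the same files, one using min/max statistics to prune and one not. The data, the DataFile class, and both scan functions are invented for illustration and do not model any real planner.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DataFile:
    path: str
    min_id: int
    max_id: int
    rows: tuple[int, ...]  # the id values stored in this file


FILES = (
    DataFile("f1", 1, 10, (1, 5, 10)),
    DataFile("f2", 11, 20, (11, 15, 20)),
    DataFile("f3", 21, 30, (21, 25, 30)),
)


def scan_without_pruning(files, lo, hi):
    """Naive engine: opens every file and filters rows afterwards."""
    opened = [f.path for f in files]
    rows = sorted(r for f in files for r in f.rows if lo <= r <= hi)
    return rows, opened


def scan_with_pruning(files, lo, hi):
    """Engine that uses min/max stats to skip files that cannot match."""
    kept = [f for f in files if f.max_id >= lo and f.min_id <= hi]
    rows = sorted(r for f in kept for r in f.rows if lo <= r <= hi)
    return rows, [f.path for f in kept]


rows_a, opened_a = scan_without_pruning(FILES, 12, 18)
rows_b, opened_b = scan_with_pruning(FILES, 12, 18)
```

For the range 12 to 18, both scans return the same rows, but the first opens three files and the second opens one. The cost differs; the answer does not.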

Where interoperability actually lives

Interoperability is not a checkbox. It is a set of interfaces that have to stay stable:

  • Storage interface: open object storage and predictable layout (compute-storage separation is the prerequisite).
  • Table contract interface: the table format specification and its compatibility story.
  • Catalog interface: a standard API such as the Iceberg REST Catalog protocol, not a per-engine plugin zoo.
  • Governance interface: policy and audit that are enforced and logged consistently (not a UI-only control).

If you want a deeper foundation on ODI boundaries, start at What Is Open Data Infrastructure? and then Open Data Infrastructure Stack.

A buyer checklist that works

Use this as a real-world evaluation checklist:

  • If you replace the query engine, can you still read and write the same tables through the same catalog endpoint?
  • If you replace the catalog, can you still run the same engines without a rewrite?
  • Can you prove compatibility with documentation and a test plan, not a marketing slide?
  • Can you audit table operations and access decisions across the stack, not only inside one UI?
  • Can you recover from a bad write without an incident-response novella?

If the answer to any of those depends on proprietary glue, you have a useful stack. You just do not have ODI yet.
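
The rule is strict: one dependency on proprietary glue is enough to fail. A minimal sketch of that rule as code follows, with the checklist items paraphrased and the names CHECKLIST and is_odi invented for this example.

```python
CHECKLIST = (
    "swap engine, keep tables and catalog endpoint",
    "swap catalog, keep engines without a rewrite",
    "compatibility proven by docs and a test plan",
    "audit spans the whole stack",
    "bad write is recoverable",
)


def is_odi(needs_proprietary_glue: dict[str, bool]) -> bool:
    """True only when no checklist item depends on proprietary glue."""
    missing = [item for item in CHECKLIST if item not in needs_proprietary_glue]
    if missing:
        # An unanswered item is treated as an error, not as a pass.
        raise ValueError(f"unanswered checklist items: {missing}")
    return not any(needs_proprietary_glue[item] for item in CHECKLIST)
```

Refusing to score an incomplete checklist is a deliberate choice in this sketch: an unanswered question should block the verdict rather than count in the vendor's favor.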

Sources to start with

Start with the specifications. They are the contract you can hold the ecosystem to.