The fastest way to waste a year is to argue about "lakehouse" while your data boundaries quietly become irreversible.

Why the labels cause confusion

Most teams use "data lake," "data warehouse," and "lakehouse" as brand adjectives. That is why every vendor can claim all three. The labels only help if you map them to concrete contracts: storage, table format, catalog, governance, and compute.

Open data infrastructure (ODI) is not a fourth label. It is a way to evaluate the architecture behind any label by asking: who controls the data, the metadata, and the execution path?

Core idea: labels do not move your switching costs. Contracts do.

Plain definitions that hold up

These definitions stay useful even when marketing language shifts:

  • Data lake: data stored in low-cost object storage with minimal enforced structure at write time.
  • Data warehouse: a managed system where storage, compute, and governance are typically integrated behind one platform boundary.
  • Lakehouse: a lake with table semantics, transactional metadata, and managed operations layered on top (usually through a table format plus a catalog).

The ODI lens cares less about the word you choose than about whether your "lakehouse" is actually portable across engines and vendors. If you need a deeper grounding, start with "What Is a Lakehouse, Really?" and "Compute-Storage Separation."

The ODI test: what stays portable?

ODI treats portability as a system property. The test is not "Can you export data?" The test is "Can you change one layer without rewriting the world?"

Four portability questions that keep you honest:

  • Data portability: can you move physical data without downtime and without losing partitioning, history, and retention policy?
  • Metadata portability: can schema, ownership, lineage, and policy travel with the data (or be re-attached predictably) when you switch platforms?
  • Workload portability: can multiple query engines read and write the same tables safely?
  • Governance portability: do access controls and auditability live in the data path, not only in a UI?

Those are ODI questions. They are also how you avoid platform lock-in while still getting a high-quality managed experience. See "What Is Data Portability?" for the common trapdoors.
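The four questions above can be written down as a checklist you fill in per platform. The sketch below is only an illustration: the class name `PortabilityAnswers`, the helper `failing_dimensions`, and the example answers are hypothetical, and a real assessment would attach evidence to each answer rather than a bare yes or no.

```python
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class PortabilityAnswers:
    """Hypothetical yes/no answers to the four portability questions."""
    data: bool        # physical data moves without downtime, keeping partitioning, history, retention
    metadata: bool    # schema, ownership, lineage, and policy travel or re-attach predictably
    workload: bool    # multiple query engines read and write the same tables safely
    governance: bool  # access control and auditability live in the data path, not only in a UI


def failing_dimensions(answers: PortabilityAnswers) -> list[str]:
    """Return the portability dimensions answered 'no', in question order."""
    return [name for name, ok in asdict(answers).items() if not ok]


# Made-up example: data files and engines are portable, metadata and policy are not.
example = PortabilityAnswers(data=True, metadata=False, workload=True, governance=False)
print(failing_dimensions(example))  # ['metadata', 'governance']
```

A platform that only passes the data question is the "Can you export data?" case: the files move, but the metadata and policy that give them meaning stay behind.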

Tradeoffs that matter in practice

This is where the "lake vs warehouse" debate becomes real:

  • Operational burden: lakes push more responsibility to your team. Warehouses push more responsibility to the vendor.
  • Control surface: warehouses centralize control, but often behind closed interfaces. Lakes can preserve control, but only if you adopt open table and catalog standards.
  • Cost shape: compute-storage separation lets you scale and pay for storage and compute independently, but it also forces you to manage performance and reliability explicitly.
  • Interoperability: the second engine is the moment you learn whether your table and catalog contracts are actually open.

The short version: the lakehouse idea is sound. The execution details are where teams get hurt.

A practical architecture pattern

A credible open lakehouse pattern looks like this:

  • Storage: object storage that you control.
  • Table format: open tables such as Iceberg for the physical and transactional contract.
  • Catalog: a REST-compatible catalog, so multiple engines can share the same tables through one interface.
  • Compute: one or more engines, chosen per workload.
  • Governance and lineage: enforced and logged in the data path, not as a spreadsheet next to the system.

If you are building this today, the distinction between table formats and catalogs matters. Start with "Table Format vs Catalog vs Query Engine."
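One way to see why the catalog is its own contract: when several engines write the same table, something has to decide which commit wins. Iceberg-style designs do this by having the catalog atomically swap a pointer to the table's current metadata. The sketch below is a toy in-memory model of that idea, not the Iceberg REST API; `ToyCatalog`, `CommitConflict`, the table name, and the snapshot ids are all hypothetical.

```python
import threading


class CommitConflict(Exception):
    """Raised when a writer's base snapshot is no longer the current one."""


class ToyCatalog:
    """Hypothetical in-memory catalog mapping table name -> current snapshot id."""

    def __init__(self):
        self._current = {}
        self._lock = threading.Lock()

    def current_snapshot(self, table):
        return self._current.get(table)

    def commit(self, table, base, new):
        # Atomic compare-and-swap: succeed only if nobody has committed
        # since this writer read `base`.
        with self._lock:
            found = self._current.get(table)
            if found != base:
                raise CommitConflict(f"{table}: expected {base}, found {found}")
            self._current[table] = new


catalog = ToyCatalog()
catalog.commit("sales.orders", base=None, new="snap-1")

# Two engines read the same base snapshot, then both try to commit.
base_a = catalog.current_snapshot("sales.orders")
base_b = catalog.current_snapshot("sales.orders")
catalog.commit("sales.orders", base=base_a, new="snap-2")  # engine A wins

try:
    catalog.commit("sales.orders", base=base_b, new="snap-3")  # engine B is stale
    conflict = False
except CommitConflict:
    conflict = True  # engine B must re-read the table state and retry
```

The table format defines what a snapshot contains; the catalog decides which snapshot is current. If the second engine cannot go through the same commit path, the tables are shared in name only.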

What to ask in vendor meetings

Vendor meetings get productive when you ask architecture questions:

  • Where does the table contract live, and which parts are defined by an open specification?
  • Which catalog protocol do you support for cross-engine access?
  • Which operations are portable, and which depend on your proprietary runtime?
  • How do you export metadata, policy, and lineage, not only data files?
  • Can you walk through a recovery plan for a bad write, rather than a feature list?

The goal is not to avoid managed platforms. The goal is to buy managed platforms without buying a permanent boundary.

Sources to start with

Start with the specifications and the vendor docs that describe the contracts explicitly.