Engine debates get religious fast. The more useful question is: which parts of your system are you willing to make irreversible?

The engine choice you are really making

Open lakehouse architectures make it possible to use more than one query engine on the same data. That is the upside. The risk is that an engine becomes the place where the table contract, governance model, or operational workflow becomes proprietary.

DuckDB, Apache DataFusion, StarRocks, and Apache Doris show up in the same shortlists because they all touch analytics. They usually serve different roles.

Core idea: ODI wants the table format and catalog to be the durable contract. Engines should be swappable roles.
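One way to picture "engines as swappable roles" is a small interface that workloads depend on instead of a specific engine. The sketch below is illustrative only: `QueryEngine`, `FakeEmbeddedEngine`, `FakeServingEngine`, and `daily_report` are hypothetical names, not APIs of any of the engines discussed here, and the fake engines return placeholder rows instead of running SQL.

```python
from typing import Protocol


class QueryEngine(Protocol):
    """Hypothetical role: anything that can run SQL against a catalog table."""

    def run(self, sql: str) -> list[tuple]: ...


class FakeEmbeddedEngine:
    """Stand-in for an embedded engine; returns a placeholder row."""

    def run(self, sql: str) -> list[tuple]:
        return [("embedded", sql)]


class FakeServingEngine:
    """Stand-in for an MPP serving engine; returns a placeholder row."""

    def run(self, sql: str) -> list[tuple]:
        return [("serving", sql)]


def daily_report(engine: QueryEngine, table: str) -> list[tuple]:
    # The table identifier is assumed to come from the shared catalog,
    # so the workload does not change when the engine is swapped.
    return engine.run(f"SELECT count(*) FROM {table}")


# The same workload runs against either role.
for engine in (FakeEmbeddedEngine(), FakeServingEngine()):
    print(daily_report(engine, "sales.orders"))
```

The point of the design is that `daily_report` knows a table name and a role, nothing more. If swapping the engine forces changes inside the workload itself, the contract has leaked into the engine layer.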

The roles these engines tend to play

A practical framing:

  • DuckDB: an embedded analytical database that is strong for local analytics, data science workflows, and fast iteration on files and tables. It often shines at "bring compute to the data on your laptop" and at embedding analytics in applications. See the DuckDB docs.
  • Apache DataFusion: a query engine (and framework) written in Rust that is designed to be embedded, extended, and composed into other systems. It often shows up when teams want to build custom query services or need a lightweight engine they can integrate deeply. See the DataFusion docs.
  • StarRocks: an analytical database focused on interactive performance for BI and real-time analytics patterns. It is often evaluated as the "fast serving layer" for dashboards and customer-facing analytics. See the StarRocks docs.
  • Apache Doris: an MPP analytical database with a similar goal of fast interactive analytics, often used for dashboard and reporting workloads where low latency matters. See the Apache Doris docs.

If you are mixing these together in your head, pause and re-read "Table Format vs Catalog vs Query Engine."

ODI fit: what stays portable?

ODI does not require you to standardize on one engine. It requires the durable contracts to sit below the engine layer.

Three ODI-fit questions to ask for any engine choice:

  • Table contract: can the engine read and write open table formats without changing semantics (schema evolution, snapshots, and deletes)?
  • Catalog contract: can the engine integrate with a catalog through standard interfaces, not one-off plugins?
  • Exit path: if you move this workload to another engine, what breaks: only performance tuning, or the whole data access model?

If you keep those contracts open, you can use an embedded engine for local analytics and an MPP engine for interactive serving without turning either into the one true platform.
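The three questions can be captured as a small checklist you fill in per engine and workload. This is a minimal sketch, not an ODI-defined schema: `EngineFit`, `portability_risks`, and the `"tuning"` / `"access_model"` labels are hypothetical names chosen for illustration, and the answers have to come from your own evaluation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EngineFit:
    """Answers to the three ODI-fit questions for one engine and workload."""

    engine: str
    table_contract_ok: bool    # open table formats, semantics unchanged
    catalog_contract_ok: bool  # catalog via standard interfaces, not one-off plugins
    exit_breaks: str           # "tuning" or "access_model" (hypothetical labels)


def portability_risks(fit: EngineFit) -> list[str]:
    """Return the contracts that would become engine-specific with this choice."""
    risks = []
    if not fit.table_contract_ok:
        risks.append("table contract")
    if not fit.catalog_contract_ok:
        risks.append("catalog contract")
    if fit.exit_breaks != "tuning":
        risks.append("exit path")
    return risks


# Placeholder answers for a made-up engine, not an assessment of a real one.
candidate = EngineFit("engine-a", table_contract_ok=True,
                      catalog_contract_ok=False, exit_breaks="tuning")
print(portability_risks(candidate))  # ['catalog contract']
```

An empty list means that moving the workload later should cost you performance tuning only. Anything else names the contract that would need to be rebuilt on exit.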

Operational realities that matter

Engine choices become production choices because of operational constraints:

  • concurrency and isolation: interactive analytics workloads need predictable behavior under load
  • multi-tenant governance: access control and auditability need to work across engines, not only inside one cluster
  • data freshness: ingestion and update patterns determine whether the engine is serving current answers or stale ones
  • maintenance: file hygiene and metadata growth still matter when the underlying storage uses open table formats

If you want to make this practical, read "Running the Open Lakehouse in Production" and "Automating Table Maintenance and Compaction."

A decision model

The following model usually holds up:

  • If you need local, embedded analytics: evaluate DuckDB and DataFusion first. Choose based on whether you want a complete embedded database experience or a framework you can build into a service.
  • If you need interactive, high-concurrency serving: evaluate StarRocks and Doris first. Choose based on operational fit and the workload patterns you actually have (dashboards, customer-facing analytics, real-time reporting).
  • If you need ODI durability: standardize on open tables and catalogs first, then treat engines as roles.
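The same model can be written down as a small function, which makes the ordering explicit: contracts first, engines second. This is a rough sketch only. The function name `shortlist` and the need labels (`"odi_durability"`, `"embedded"`, `"serving"`) are made up for illustration, and the output is a list of things to evaluate, not a recommendation.

```python
def shortlist(needs: set[str]) -> list[str]:
    """Map stated needs to next steps, with open contracts decided first."""
    steps = []
    if "odi_durability" in needs:
        steps.append("standardize on open tables and catalogs, then treat engines as roles")
    if "embedded" in needs:
        steps.append("evaluate DuckDB and DataFusion")
    if "serving" in needs:
        steps.append("evaluate StarRocks and Doris")
    return steps


for step in shortlist({"embedded", "odi_durability"}):
    print(step)
# standardize on open tables and catalogs, then treat engines as roles
# evaluate DuckDB and DataFusion
```

The checks run in a fixed order, so the contract step always comes before any engine evaluation, regardless of how the needs were listed.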

The open lakehouse is not "one engine to rule them all." It is a set of open contracts that let you choose the right engine per job.

Sources to start with

Start with the official documentation for each engine, then validate its behavior against the table format and catalog contracts you plan to standardize on.