Open Lakehouse Benchmark Design for ODI

A benchmark that only measures query speed is a benchmark for buying faster tunnel vision.

The practical problem

Open lakehouse decisions often lean on benchmark numbers. Query latency matters. Cost matters. Concurrency matters. But a benchmark for Open Data Infrastructure (ODI) that stops there misses the point of open infrastructure.

Open Data Infrastructure is about control, portability, governance, and ecosystem fit. A benchmark should test those properties directly.

Benchmark the workload and the boundary

A useful benchmark starts with real workload classes: ingestion, streaming writes, ad hoc analytics, dashboard serving, model feature retrieval, data product APIs, maintenance, and recovery. Each class should name the engines, catalog path, table format behavior, policy requirements, and freshness expectation.
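One way to keep workload definitions honest is to write each class down as a small record and reject incomplete ones before any query runs. The sketch below is a minimal illustration, not a standard schema: the `WorkloadSpec` type, its field names, and the example values are all assumptions made for this example.

```python
from dataclasses import dataclass

# The eight workload classes named in the paragraph above.
WORKLOAD_CLASSES = (
    "ingestion", "streaming writes", "ad hoc analytics", "dashboard serving",
    "model feature retrieval", "data product APIs", "maintenance", "recovery",
)


@dataclass(frozen=True)
class WorkloadSpec:
    """One benchmark workload class. All field names are hypothetical."""
    workload_class: str
    engines: tuple[str, ...]
    catalog_path: str
    table_format_behavior: str
    policy_requirements: tuple[str, ...]
    freshness_seconds: int


def missing_fields(spec: WorkloadSpec) -> list[str]:
    """Return the names of fields that are empty or invalid."""
    problems = []
    if spec.workload_class not in WORKLOAD_CLASSES:
        problems.append("workload_class")
    if not spec.engines:
        problems.append("engines")
    if not spec.catalog_path.strip():
        problems.append("catalog_path")
    if not spec.table_format_behavior.strip():
        problems.append("table_format_behavior")
    if not spec.policy_requirements:
        problems.append("policy_requirements")
    if spec.freshness_seconds <= 0:
        problems.append("freshness_seconds")
    return problems


# Illustrative values only; the catalog path is left blank on purpose.
dashboard = WorkloadSpec(
    workload_class="dashboard serving",
    engines=("StarRocks", "DuckDB"),
    catalog_path="",
    table_format_behavior="Iceberg table, snapshot reads",
    policy_requirements=("row filter on region",),
    freshness_seconds=300,
)
print(missing_fields(dashboard))  # prints ['catalog_path']
```

A spec that fails this check is a workload the benchmark cannot yet describe, which is worth knowing before any latency number is recorded.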

Then test interoperability. Can another engine read the table without migration? Does schema evolution behave consistently? Do snapshots, manifests, and metadata tables provide enough evidence for operations? Can lineage and policy travel with the workload?
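The schema evolution question can be turned into a repeatable check: after the writing engine evolves a table, capture the schema each engine reports and compare them. The sketch below assumes the schemas have already been collected as plain column-to-type mappings. How they are collected is engine-specific and deliberately left out, and the function name and example columns are hypothetical.

```python
def schema_drift(writer_view: dict[str, str], reader_view: dict[str, str]) -> list[str]:
    """Compare the schema seen by the writing engine with a second engine's view.

    Both arguments map column name to a type string. Returns one finding per
    column that differs; an empty list means the two views agree.
    """
    findings = []
    for column in sorted(writer_view.keys() | reader_view.keys()):
        writer_type = writer_view.get(column)
        reader_type = reader_view.get(column)
        if writer_type is None:
            findings.append(f"{column}: only visible to the second engine")
        elif reader_type is None:
            findings.append(f"{column}: missing in the second engine")
        elif writer_type != reader_type:
            findings.append(f"{column}: type {writer_type} vs {reader_type}")
    return findings


# Illustrative case: the writer added `discount` and the second engine
# reports `amount` with a different type.
writer = {"order_id": "long", "amount": "decimal(10,2)", "discount": "double"}
reader = {"order_id": "long", "amount": "double"}
for finding in schema_drift(writer, reader):
    print(finding)
```

Comparing type strings verbatim is the simplest possible rule and will flag harmless naming differences between engines. A real benchmark would likely normalize type names first.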

Core idea: an ODI benchmark should measure whether the architecture preserves control under change.

Evidence beats leaderboard thinking

The benchmark output should include more than a chart. It should include table layout, catalog configuration, query plans, policy decisions, lineage events, maintenance results, and exit steps. That is how teams compare architectures instead of isolated engines.
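The evidence list above can double as an acceptance gate for a benchmark report. The following sketch treats a report as a plain dictionary and names whatever evidence is absent. The key names and the example report are assumptions for illustration, not an established report format.

```python
# Evidence items taken from the paragraph above; the key spellings are hypothetical.
REQUIRED_EVIDENCE = (
    "table_layout",
    "catalog_configuration",
    "query_plans",
    "policy_decisions",
    "lineage_events",
    "maintenance_results",
    "exit_steps",
)


def missing_evidence(report: dict[str, object]) -> list[str]:
    """Return required evidence items that are absent or empty in the report."""
    return [name for name in REQUIRED_EVIDENCE if not report.get(name)]


# A report that contains a chart and query plans but little else.
chart_only_report = {
    "latency_chart": "p95_by_query.png",
    "query_plans": ["q1.plan", "q2.plan"],
    "exit_steps": [],
}
print(missing_evidence(chart_only_report))
```

An empty `exit_steps` list counts as missing here, on the assumption that an exit path nobody wrote down has not been demonstrated.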

This matters for systems such as DuckDB, DataFusion, StarRocks, Doris, Flink, and Iceberg because each tool can be excellent in the right layer, whether that is query execution, stream processing, or the table format. The benchmark should show the layer, not crown a universal winner.

What breaks first

The benchmark runs one engine against one happy-path table and calls the result architectural truth.
Governance is tested manually after performance results are already accepted.
Metadata behavior, maintenance, and recovery are excluded because they are harder to measure.
Exit paths are discussed in procurement but never benchmarked.
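These four failure modes are concrete enough to check against a benchmark plan before it runs. The sketch below is one possible lint, assuming a plan is described by a small record. The `BenchmarkPlan` type, its fields, and the warning wording are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class BenchmarkPlan:
    """A benchmark plan as declared up front. All field names are hypothetical."""
    engines: list[str]
    tables: list[str]
    governance_runs_during_benchmark: bool
    covered_operations: set[str]
    exit_path_benchmarked: bool


# The operations the text says are often excluded as hard to measure.
HARD_TO_MEASURE = {"metadata", "maintenance", "recovery"}


def plan_warnings(plan: BenchmarkPlan) -> list[str]:
    """Return one warning per failure mode that the plan exhibits."""
    warnings = []
    if len(set(plan.engines)) < 2 or len(set(plan.tables)) < 2:
        warnings.append("single engine or single table")
    if not plan.governance_runs_during_benchmark:
        warnings.append("governance tested after the fact")
    skipped = HARD_TO_MEASURE - plan.covered_operations
    if skipped:
        warnings.append("excluded: " + ", ".join(sorted(skipped)))
    if not plan.exit_path_benchmarked:
        warnings.append("exit path not benchmarked")
    return warnings


# Illustrative plan that exhibits all four failure modes.
happy_path = BenchmarkPlan(
    engines=["DuckDB"],
    tables=["orders"],
    governance_runs_during_benchmark=False,
    covered_operations={"metadata"},
    exit_path_benchmarked=False,
)
for warning in plan_warnings(happy_path):
    print(warning)
```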

Questions to ask

Which workload classes does the benchmark represent?
Which interoperability claims are tested with a second engine?
Which governance controls run during the benchmark?
Can the platform prove an exit path with preserved data, metadata, and policy?

For adjacent architecture, read "DuckDB, DataFusion, StarRocks, and Doris in ODI", "Table Formats, Catalogs, and Query Engines", and "Trino vs Spark vs DuckDB".

The benchmark should prove the architecture, not just flatter the engine.

