A lakehouse table can be technically open and still be painful to query. The data files are readable. The table metadata is there. The physical layout still decides whether queries prune files or scan most of the table.

Layout is part of the contract

The Apache Iceberg specification stores sort orders in table metadata, together with a default sort order ID that writers can apply. Iceberg's Spark SQL extensions also expose write ordering through ALTER TABLE ... WRITE ORDERED BY, which means the physical layout is not just a hidden engine preference. It is a table-level behavior that writers and readers should be able to inspect.
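To make "inspectable" concrete, here is a minimal sketch that reads the default sort order out of a table metadata document. The key names (sort-orders, default-sort-order-id, source-id, null-order) follow the JSON layout in the Iceberg spec; the table contents, column names, and the describe_default_sort_order helper are invented for illustration, and a real metadata file carries many more fields.

```python
# Trimmed, hypothetical table metadata. Key names follow the Iceberg spec's
# JSON layout; the columns and values are invented for illustration.
TABLE_METADATA = {
    "current-schema-id": 0,
    "schemas": [
        {
            "schema-id": 0,
            "fields": [
                {"id": 1, "name": "customer_id", "type": "long", "required": True},
                {"id": 2, "name": "event_ts", "type": "timestamptz", "required": True},
                {"id": 3, "name": "region", "type": "string", "required": False},
            ],
        }
    ],
    "default-sort-order-id": 1,
    "sort-orders": [
        {"order-id": 0, "fields": []},  # order 0 is reserved for "unsorted"
        {
            "order-id": 1,
            "fields": [
                {"transform": "identity", "source-id": 1,
                 "direction": "asc", "null-order": "nulls-first"},
                {"transform": "day", "source-id": 2,
                 "direction": "desc", "null-order": "nulls-last"},
            ],
        },
    ],
}


def describe_default_sort_order(metadata: dict) -> list[str]:
    """Return the default sort order as readable strings, one per sort field."""
    schema = next(
        s for s in metadata["schemas"]
        if s["schema-id"] == metadata["current-schema-id"]
    )
    names = {f["id"]: f["name"] for f in schema["fields"]}
    order = next(
        o for o in metadata["sort-orders"]
        if o["order-id"] == metadata["default-sort-order-id"]
    )
    described = []
    for field in order["fields"]:
        column = names[field["source-id"]]
        # Show non-identity transforms as a function applied to the column.
        if field["transform"] == "identity":
            expr = column
        else:
            expr = f"{field['transform']}({column})"
        described.append(f"{expr} {field['direction']} {field['null-order']}")
    return described


print(describe_default_sort_order(TABLE_METADATA))
# ['customer_id asc nulls-first', 'day(event_ts) desc nulls-last']
```

Because the order lives in metadata rather than in a job configuration, any reader or auditor can run a check like this without access to the writer's code.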

That matters because performance promises usually get stated outside the table. A data product owner says the table supports dashboard filtering. A platform team says compaction is under control. An AI tool says a query should be cheap enough to run during a workflow. Without layout evidence, those promises become folk wisdom.

Sort order is evidence

Sort order does not guarantee fast queries by itself. It gives the platform a concrete artifact to compare against workload expectations. If the table is frequently filtered by customer, event date, or region, the declared order should make sense for those access patterns. If it does not, the table contract and the workload have drifted apart.
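One way to test for that drift is to compare the declared sort columns against the filter columns seen in recent queries. The sketch below assumes the filter columns have already been extracted from query logs; the layout_drift function, the 50 percent "hot column" threshold, and the sample data are all hypothetical choices, not part of Iceberg.

```python
from collections import Counter


def layout_drift(sort_columns: list[str], query_filters: list[set[str]]) -> dict:
    """Compare a declared sort order with the columns queries actually filter on.

    sort_columns: declared sort columns, leading column first.
    query_filters: one set of filtered column names per logged query.
    """
    if not query_filters:
        return {"leading_hit_rate": None, "unsorted_hot_columns": []}

    # The leading sort column matters most for min/max file pruning.
    leading = sort_columns[0] if sort_columns else None
    hits = sum(1 for filters in query_filters if leading in filters)

    # A column is "hot" here if at least half of the queries filter on it.
    counts = Counter(col for filters in query_filters for col in filters)
    threshold = len(query_filters) / 2
    hot = sorted(
        col for col, n in counts.items()
        if n >= threshold and col not in sort_columns
    )
    return {
        "leading_hit_rate": hits / len(query_filters),
        "unsorted_hot_columns": hot,
    }


declared = ["customer_id", "event_date"]
observed = [{"region"}, {"region", "event_date"}, {"customer_id"}, {"region"}]
print(layout_drift(declared, observed))
# {'leading_hit_rate': 0.25, 'unsorted_hot_columns': ['region']}
```

In this invented sample, only a quarter of the queries filter on the leading sort column, while region is filtered in most queries and does not appear in the sort order at all.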

The useful question is not whether the table is sorted. The useful question is whether the sort order, file sizes, partitioning, compaction cadence, and query plans tell the same story.

Core idea: Iceberg sort orders make table layout part of the evidence layer, not an invisible tuning rumor.

The ODI planning loop

Open Data Infrastructure (ODI) should connect sort order to catalogs, lineage, cost, and data product SLAs. A catalog can expose the expected access pattern. Lineage can show which consumers depend on that pattern. Query logs can show whether readers still match it. Maintenance jobs can rewrite files with a reason attached.
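A rough sketch of what "a reason attached" could look like: one record gathers the evidence from each system, and a small function turns it into human-readable rewrite reasons. The LayoutEvidence fields, the rewrite_reasons function, and both thresholds are assumptions made for illustration; they do not correspond to any existing catalog or Iceberg API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LayoutEvidence:
    """Hypothetical record joining evidence from several systems."""
    table: str
    declared_sort: tuple[str, ...]     # from table metadata
    expected_filters: tuple[str, ...]  # from the catalog entry
    consumers: tuple[str, ...]         # from lineage
    leading_hit_rate: float            # from query logs, 0.0 to 1.0
    overlap_depth: int                 # from file-level statistics


def rewrite_reasons(ev: LayoutEvidence,
                    min_hit_rate: float = 0.5,
                    max_depth: int = 4) -> list[str]:
    """Return the reasons a rewrite is warranted; an empty list means none."""
    reasons = []
    if ev.declared_sort[:1] != ev.expected_filters[:1]:
        reasons.append(
            "declared sort order does not lead with the cataloged filter column"
        )
    if ev.leading_hit_rate < min_hit_rate:
        reasons.append(
            f"only {ev.leading_hit_rate:.0%} of logged queries filter on "
            "the leading sort column"
        )
    if ev.overlap_depth > max_depth:
        reasons.append(
            f"file ranges overlap {ev.overlap_depth} deep on the leading sort column"
        )
    # Name the consumers only when there is something to tell them about.
    if reasons and ev.consumers:
        reasons.append("affected consumers: " + ", ".join(ev.consumers))
    return reasons


evidence = LayoutEvidence(
    table="sales.events",
    declared_sort=("customer_id", "event_date"),
    expected_filters=("region",),
    consumers=("regional_dashboard",),
    leading_hit_rate=0.25,
    overlap_depth=2,
)
for reason in rewrite_reasons(evidence):
    print(reason)
```

The design choice is that a maintenance job never runs on a schedule alone: it either produces a non-empty list of reasons that can be logged next to the rewrite, or it does nothing.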

For adjacent context, read "Apache Iceberg and Open Data Infrastructure", "Iceberg metadata tables as evidence", and "Open lakehouse benchmark design".

What breaks first

  • The table has a declared sort order, but no one checks whether current queries still use it.
  • Compaction rewrites files for storage hygiene while damaging the access pattern that consumers rely on.
  • New engines read the table but do not preserve the same write-order behavior.
  • Agent query budgets fail because layout, workload, and cost evidence never meet.
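The compaction failure mode in particular can be measured. The sketch below assumes each data file's lower and upper bound on the leading sort column has already been decoded into plain values (Iceberg's files metadata table exposes per-column bounds, but in an encoded form). The function names and the sample ranges are hypothetical.

```python
def max_overlap_depth(file_ranges: list[tuple[int, int]]) -> int:
    """Largest number of files whose inclusive [lower, upper] ranges share a value."""
    events = []
    for lower, upper in file_ranges:
        events.append((lower, 0))  # 0 = range opens
        events.append((upper, 1))  # 1 = range closes
    depth = best = 0
    # Sorting puts opens before closes at the same value, so ranges that
    # merely touch count as overlapping, which matches inclusive bounds.
    for _, kind in sorted(events):
        if kind == 0:
            depth += 1
            best = max(best, depth)
        else:
            depth -= 1
    return best


def files_matching(file_ranges: list[tuple[int, int]], value: int) -> int:
    """Number of files a point lookup on `value` cannot skip using bounds alone."""
    return sum(1 for lower, upper in file_ranges if lower <= value <= upper)


# Invented customer_id bounds per file, before and after a compaction that
# merged files for size without re-sorting them.
before = [(1, 100), (101, 200), (201, 300)]
after = [(1, 300), (40, 260)]

print(max_overlap_depth(before), files_matching(before, 150))  # 1 1
print(max_overlap_depth(after), files_matching(after, 150))    # 2 2
```

In this sample the rewrite leaves fewer files, which looks like good storage hygiene, yet a lookup on a single customer now has to open every file. Tracking overlap depth before and after each rewrite turns that regression into something a maintenance job can report.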

Questions to ask

Ask where sort order is defined, which writers respect it, and which workloads depend on it. Ask whether table maintenance can prove that the layout still supports the data product promise.

The open table is the starting point. The evidence around the table is what makes it operable.

Sort order is where performance starts becoming a data contract.