Open Data Infrastructure
What Is a Table Format? And Why Data Ownership Depends On It
A plain-language answer to what a table format is, why it is different from files, and how it affects data ownership.
A table format is the difference between owning a pile of files and owning a usable table. That sounds subtle until you try to leave a platform.
A table format is a metadata contract
A table format defines how many data files become one logical table. It tracks which files belong to the table, which snapshot is current, how the schema changed, how partitions are represented, and how engines should handle table operations.
Parquet can store the columnar data. The table format explains how those files behave together. Without that contract, teams have to reconstruct meaning from filenames, folder conventions, old jobs, and hope. Hope is not infrastructure.
Data ownership depends on table behavior
Owning raw files is useful, but incomplete. If you lose the metadata that explains snapshots, deletes, schema evolution, and partition changes, you have data but not the full table. Another engine may read the files and still misunderstand the dataset.
That is why table formats are central to ODI. They move critical behavior out of one private compute engine and into a contract other systems can inspect and implement.
Core idea: a table format is where data files become a durable, portable data product.
Iceberg, Delta, and Hudi solve similar problems differently
Apache Iceberg, Delta Lake, and Apache Hudi all give data lakes table-level behavior. They differ in history, governance, ecosystem, APIs, and operational tradeoffs. The important buyer move is to separate the category from the implementation.
The category matters because the old data lake pattern was too loose. The implementation matters because each format decides how portable and governable your table contract really is.
The ownership test
Ask one question. Can a competent team, using public docs and compatible engines, understand and operate the table without the original vendor platform?
If the answer is yes, the table format is doing real ownership work. If the answer is no, the data may be open but the table is still captive.
Sources to start with
These are the primary sources I would start from when checking the claims in this piece.