Open Data Infrastructure
Open Table Format vs File Format: What Is the Difference?
Parquet is a file format. Iceberg is a table format. Confusing the two is how teams end up with portable files and non-portable platforms.
If you say "we are open because we store Parquet," you are describing storage. You are not describing ownership.
What a file format defines
A file format defines how bytes are laid out in a single file. Parquet, ORC, and Avro are file formats. Each one specifies how values are encoded and compressed and how a reader decodes the file's contents.
A file format does not define table-level behavior: snapshots, schema evolution rules, partition evolution, deletes, and time travel.
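To make "how bytes are laid out" concrete, here is a minimal sketch of the outermost framing of a Parquet file: the magic bytes PAR1 at both ends, and a 4-byte little-endian footer length just before the trailing magic. The helper fake_parquet is hypothetical and wraps placeholder bytes for the demo. A real footer is Thrift-encoded metadata, which this sketch does not parse.

```python
import struct

MAGIC = b"PAR1"  # Parquet files start and end with these four bytes


def footer_length(data: bytes) -> int:
    """Return the footer length recorded at the end of a Parquet-framed byte string."""
    # Smallest possible frame: leading magic + 4-byte length + trailing magic.
    if len(data) < 12 or data[:4] != MAGIC or data[-4:] != MAGIC:
        raise ValueError("not Parquet-framed")
    # The 4-byte little-endian footer length sits just before the trailing magic.
    (length,) = struct.unpack("<I", data[-8:-4])
    return length


def fake_parquet(body: bytes, footer: bytes) -> bytes:
    """Hypothetical helper: wrap placeholder bytes in Parquet-style framing."""
    return MAGIC + body + footer + struct.pack("<I", len(footer)) + MAGIC


blob = fake_parquet(b"column-chunks", b"file-metadata")
print(footer_length(blob))  # prints 13, the length of b"file-metadata"
```

Nothing in this layout says which files belong to a table, which version of the table is current, or how a delete is represented. Those questions sit above the file.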
What a table format defines
A table format defines the behavior of a table across a collection of files. It defines metadata structures and rules for how writers commit changes and how readers interpret table state.
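As a rough illustration of "table state across a collection of files," the toy model below treats each commit as a new immutable snapshot listing the files that make up the table. ToyTable, Snapshot, and the file names are all hypothetical, and this is not Iceberg's actual metadata layout, only the shape of the idea.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Snapshot:
    snapshot_id: int
    files: frozenset  # data files that make up the table at this point


@dataclass
class ToyTable:
    snapshots: list = field(default_factory=list)

    def commit(self, added=(), removed=()):
        """Record a new snapshot; earlier snapshots are never modified."""
        current = self.snapshots[-1].files if self.snapshots else frozenset()
        new_files = (current - frozenset(removed)) | frozenset(added)
        snap = Snapshot(len(self.snapshots) + 1, new_files)
        self.snapshots.append(snap)
        return snap.snapshot_id

    def scan(self, snapshot_id=None):
        """List the files a reader should see, now or as of an older snapshot."""
        if not self.snapshots:
            return []
        if snapshot_id is None:
            return sorted(self.snapshots[-1].files)
        for snap in self.snapshots:
            if snap.snapshot_id == snapshot_id:
                return sorted(snap.files)
        raise KeyError(f"unknown snapshot {snapshot_id}")


table = ToyTable()
first = table.commit(added=["a.parquet", "b.parquet"])
second = table.commit(added=["c.parquet"], removed=["a.parquet"])

print(table.scan())       # current state: b.parquet and c.parquet
print(table.scan(first))  # time travel: a.parquet and b.parquet
```

The files themselves never change. What changes is the metadata that says which files count, and that metadata is what a table format standardizes.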
Iceberg is a table format. It defines snapshots, manifests, schema evolution, partition evolution, time travel, and row-level deletes in a way that can be implemented by multiple engines.
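Schema evolution is a good example of a rule that lives in the table format rather than the file. Iceberg identifies columns by field ID rather than by name, so a rename does not invalidate older files. The sketch below imitates that idea with plain dictionaries; read_rows and the sample schemas are hypothetical, and real engines resolve IDs through table and file metadata rather than like this.

```python
def read_rows(file_rows, schema):
    """Project rows stored by field ID onto the current column names.

    Columns missing from an older file come back as None.
    """
    return [
        {name: row.get(field_id) for field_id, name in schema.items()}
        for row in file_rows
    ]


# An old data file: values are keyed by field ID, not by column name.
old_file = [{1: 101, 2: "alice"}]

schema_v1 = {1: "id", 2: "name"}
# Later the table renames field 2 and adds field 3. The old file is untouched.
schema_v2 = {1: "id", 2: "full_name", 3: "email"}

print(read_rows(old_file, schema_v1))
# [{'id': 101, 'name': 'alice'}]
print(read_rows(old_file, schema_v2))
# [{'id': 101, 'full_name': 'alice', 'email': None}]
```

Any engine that follows the same resolution rule reads the old file the same way, which is the point of putting the rule in an open specification.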
Core idea: file formats make data readable. Table formats make data governable and portable.
Why the distinction matters for ODI
Open Data Infrastructure (ODI) is about portability with meaning. That requires a table contract, not only readable files. If you store Parquet but rely on a vendor-specific metastore and a vendor-specific commit protocol, you still do not own the table behavior.
Open table formats are the contract layer that lets multiple engines participate without losing semantics.
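The commit protocol is the part that is easiest to overlook. Iceberg-style commits rest on atomically swapping a pointer to the table's current metadata: a writer succeeds only if the table has not changed since the writer read it. The sketch below shows that compare-and-swap rule with a hypothetical in-memory ToyCatalog. The table name and metadata file names are made up, and a real catalog would back the swap with a database or another service that offers an atomic operation.

```python
import threading


class ToyCatalog:
    """Hypothetical catalog: one pointer per table to its current metadata."""

    def __init__(self):
        self._pointers = {}
        self._lock = threading.Lock()

    def current(self, table):
        return self._pointers.get(table)

    def compare_and_swap(self, table, expected, new):
        """Move the pointer to `new` only if it still equals `expected`."""
        with self._lock:
            if self._pointers.get(table) != expected:
                return False
            self._pointers[table] = new
            return True


def commit(catalog, table, base, new_metadata):
    """Optimistic commit: fail if another writer got there first."""
    if not catalog.compare_and_swap(table, base, new_metadata):
        raise RuntimeError("commit conflict: table changed since base was read")


catalog = ToyCatalog()
commit(catalog, "orders", None, "v1.json")

base = catalog.current("orders")  # two writers both start from v1
commit(catalog, "orders", base, "v2-writer-a.json")
try:
    commit(catalog, "orders", base, "v2-writer-b.json")
    conflict = False
except RuntimeError:
    conflict = True  # writer B must re-read the table and retry

print(catalog.current("orders"), conflict)  # v2-writer-a.json True
```

If this rule is documented and open, any engine can write to the table safely. If it lives inside one vendor's service, the files may be portable while the table is not.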
Common mistakes
Three mistakes come up again and again:
- Assuming "file open" means "platform open": it does not.
- Ignoring metadata portability: the catalog and governance layer is where lock-in often lives.
- Skipping time travel and audit: a table contract is not real if you cannot inspect and roll back history.
If you want to own the data product, you need open behavior, not only open bytes.
Sources to start with
Start with the file and table specifications themselves: the Apache Parquet format specification and the Apache Iceberg table specification.