Open Data Infrastructure
Apache Parquet Explained: The Foundation of the Open Lakehouse
Apache Parquet explained as the columnar file format under much of the open lakehouse, and why it is not the same as a table format.
Apache Parquet is one of the quiet reasons the modern lakehouse works at all. It is also one of the reasons people confuse file formats with table formats.
Parquet is a file format
Parquet is a columnar file format. Columnar means values are stored by column instead of by row, which can make analytical reads much more efficient when queries only need some columns. It also supports schema information, compression, and encodings that matter for analytics workloads.
That makes Parquet foundational. It does not make Parquet a table format.
A Parquet file can store data. It does not, by itself, manage table snapshots, concurrent writes, row-level deletes, partition evolution, time travel, or a transaction log across many files. That is the job table formats such as Iceberg, Delta, and Hudi were built to handle.
The lakehouse builds table behavior on top
Open lakehouse systems often use Parquet for the physical files and a table format for the metadata contract. The table format tells engines which files belong to which snapshot, how schemas changed, how partitions evolved, and what the current table state is.
This layered design is why ODI conversations need clean vocabulary. If a vendor says "we support Parquet," that is useful. It does not mean the vendor supports open table operations, portable metadata, or interoperable catalogs.
Core idea: Parquet makes data portable at the file layer. Table formats make data meaningful at the table layer.
Why data ownership depends on the distinction
If all you can export is a folder of Parquet files, you may have the raw data but lose the table contract. You might have to reconstruct schemas, partitions, deletes, history, and dependencies. That is not nothing. It is also not full ownership.
True portability means the data files and the metadata contract can move together. Parquet is a critical part of that story, but it is only one layer.
Questions to ask vendors
- Do you write standard Parquet files that other engines can read?
- Which table format owns snapshots, schema evolution, and deletes?
- Can the table metadata be read outside your platform?
- Can another catalog or engine operate on the same table safely?
Parquet is the foundation. Do not mistake the foundation for the whole building.
Sources to start with
These are the primary sources I would start from when checking the claims in this piece.