Open Data Infrastructure
Iceberg v3 and Deletion Vectors Explained
Deletion vectors make merge-on-read less painful by turning row-level deletes into a cheap bitmap check instead of a growing pile of delete files.
Row-level deletes are where lakehouse optimism goes to die. Not because they are impossible, but because the operational tax piles up quietly, then all at once.
The problem deletion vectors solve
Iceberg supports row-level deletes without rewriting data files by writing delete files and applying them at read time. In practice, the most common pain point is not the idea of merge-on-read. It is the accumulated cost of applying many small delete files during scans.
Position delete files work, but they tend to multiply in high-churn tables. Every scan now has more metadata to plan, more deletes to read, and more work to do before you even touch the data you wanted.
Core idea: merge-on-read fails operationally when deletes become a metadata workload, not a data workload.
What a deletion vector is
A deletion vector is a compact representation of which row positions in a specific data file should be treated as deleted. The key phrase is "specific data file." A deletion vector is not a table-wide structure. It is an attachment to one file.
Conceptually, it is a bitmap that says: when you read this data file, ignore these positions. That makes the read path cheap. Instead of joining a growing list of delete rows, the reader checks a bitset.
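To make the bitmap idea concrete, here is a minimal sketch in plain Python. It uses a Python integer as a toy bitset; real implementations use compressed bitmaps, and the class, function, and file names below are illustrative assumptions, not part of any Iceberg API.

```python
from dataclasses import dataclass


@dataclass
class DeletionVector:
    """Toy per-file deletion vector backed by a Python int used as a bitset."""

    data_file: str  # hypothetical path of the one data file this vector covers
    bits: int = 0

    def delete(self, position: int) -> None:
        # Setting bit P marks the row at position P as deleted.
        self.bits |= 1 << position

    def is_deleted(self, position: int) -> bool:
        return (self.bits >> position) & 1 == 1

    def cardinality(self) -> int:
        # Number of deleted positions recorded in this vector.
        return bin(self.bits).count("1")


def read_live_rows(rows, dv=None):
    """Yield rows whose position is not marked deleted in the vector."""
    for position, row in enumerate(rows):
        if dv is None or not dv.is_deleted(position):
            yield row


rows = ["a", "b", "c", "d", "e"]
dv = DeletionVector("data/file-0001.parquet")
dv.delete(1)
dv.delete(3)

live = list(read_live_rows(rows, dv))
print(live)  # ['a', 'c', 'e']
```

The point of the sketch is the shape of the read path: one membership check per row position, with no join against a separate list of delete records.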
How Iceberg v3 uses them
Iceberg spec v3 introduces deletion vectors as a first-class mechanism for representing row-level deletes that is cheaper to apply at read time than position delete files. The specification describes how deletion vectors are stored (as blobs in Puffin files) and referenced, including how they relate to snapshot metadata and file sequence numbers.
This does not mean spec v2 deletes disappear. Many stacks will still produce position delete files on tables that stay on v2, and in some cases you will want that behavior, for example when a reader you depend on cannot handle v3 yet. Deletion vectors are a way to keep merge-on-read from turning into "merge on a million tiny files."
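The relationship between a deletion vector and file sequence numbers is easier to reason about as a predicate. This is a rough sketch, assuming the applicability rule matches the one the spec gives for position deletes: same file path, same partition, and a data file sequence number no greater than the delete's. The class and field names are illustrative, not an Iceberg API, so check the rule against the spec text before relying on it.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DataFile:
    path: str
    partition: tuple
    data_sequence_number: int


@dataclass(frozen=True)
class DeletionVectorRef:
    referenced_data_file: str
    partition: tuple
    data_sequence_number: int


def applies(dv: DeletionVectorRef, data_file: DataFile) -> bool:
    """Assumed rule: a vector applies only to the file it references,
    in the same partition, written at or before the vector's sequence number."""
    return (
        dv.referenced_data_file == data_file.path
        and dv.partition == data_file.partition
        and data_file.data_sequence_number <= dv.data_sequence_number
    )


file_a = DataFile("data/a.parquet", ("2024-01-01",), data_sequence_number=5)
file_b = DataFile("data/b.parquet", ("2024-01-01",), data_sequence_number=5)
dv_a = DeletionVectorRef("data/a.parquet", ("2024-01-01",), data_sequence_number=7)

print(applies(dv_a, file_a))  # True
print(applies(dv_a, file_b))  # False
```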
The practical mental model is:
- Spec v2: row-level deletes are expressed as delete files that are scanned and applied.
- Spec v3: the position deletes for a data file can be consolidated into a single per-data-file bitmap that is cheap to apply.
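The two models can be compared with a toy simulation. This is only a sketch: position delete files are modeled as lists of (path, position) records, the vectors as Python integers, and the v2 reader naively scans every delete file, whereas a real planner can prune some of them. All names and paths are hypothetical.

```python
from collections import defaultdict

# Spec v2 style: each small commit leaves behind a position delete file,
# modeled here as a list of (data_file_path, row_position) records.
position_delete_files = [
    [("data/a.parquet", 0), ("data/b.parquet", 2)],
    [("data/a.parquet", 3)],
    [("data/b.parquet", 0), ("data/a.parquet", 3)],  # repeats the delete of a[3]
]


def deleted_positions_v2(data_file, delete_files):
    """Read every delete record and keep the ones for this data file."""
    records_read = 0
    positions = set()
    for delete_file in delete_files:
        for path, pos in delete_file:
            records_read += 1
            if path == data_file:
                positions.add(pos)
    return positions, records_read


def consolidate_to_vectors(delete_files):
    """Spec v3 style: fold all position deletes into one bitmap per data file."""
    vectors = defaultdict(int)
    for delete_file in delete_files:
        for path, pos in delete_file:
            vectors[path] |= 1 << pos
    return dict(vectors)


def deleted_positions_v3(data_file, vectors):
    """Look up the single bitmap for this data file and decode its set bits."""
    bits = vectors.get(data_file, 0)
    return {pos for pos in range(bits.bit_length()) if (bits >> pos) & 1}


v2_positions, records_read = deleted_positions_v2("data/a.parquet", position_delete_files)
vectors = consolidate_to_vectors(position_delete_files)
v3_positions = deleted_positions_v3("data/a.parquet", vectors)

print(v2_positions == v3_positions)  # True
print(records_read)                  # 5
```

Both paths agree on which rows are gone. The difference is that the v2 path touches every delete record to answer a question about one file, while the v3 path is a single lookup keyed by data file.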
If you are building an engine-agnostic stack, this is not about performance trivia. It is about whether mutable workloads remain portable across engines without one vendor-specific delete implementation becoming the hidden dependency.
Tradeoffs you still own
Deletion vectors help the read path, but they do not eliminate the work. You still have to answer uncomfortable questions.
- Engine support: your stack is only as portable as the least-capable reader you need to keep.
- Table upgrades: upgrading a table contract is a coordination problem across writers, readers, and maintenance jobs.
- Maintenance discipline: compaction, snapshot expiration, and orphan cleanup still matter. Deletion vectors do not save you from messy table hygiene.
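As one illustration of why maintenance still matters, here is a hedged sketch of a compaction decision. A vector hides deleted rows but the bytes stay in the data file until something rewrites it. The 0.3 threshold and the function names are assumptions made up for this example, not defaults from any engine.

```python
def deleted_ratio(row_count: int, deleted_bits: int) -> float:
    """Fraction of rows in a data file that its deletion bitmap marks as deleted."""
    if row_count == 0:
        return 0.0
    return bin(deleted_bits).count("1") / row_count


def maybe_compact(rows, deleted_bits, threshold=0.3):
    """Rewrite the file without deleted rows once enough of it is dead.

    Returns (rows, bitmap). After a rewrite the bitmap is empty because
    the new file no longer contains the deleted rows.
    """
    if deleted_ratio(len(rows), deleted_bits) < threshold:
        return rows, deleted_bits  # not worth rewriting yet
    survivors = [row for pos, row in enumerate(rows) if not (deleted_bits >> pos) & 1]
    return survivors, 0


rows = ["r0", "r1", "r2", "r3"]

# Half the file is deleted (positions 1 and 2), so it gets rewritten.
new_rows, new_bits = maybe_compact(rows, 0b0110)
print(new_rows, new_bits)  # ['r0', 'r3'] 0

# Only a quarter is deleted, so the file and its vector are left alone.
kept_rows, kept_bits = maybe_compact(rows, 0b0001)
print(kept_rows, kept_bits)  # ['r0', 'r1', 'r2', 'r3'] 1
```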
What to test before you rely on it
Do not ship this on faith. Test the exact behaviors your operators will have to own.
- Which engines in your stack can read and write spec v3 tables?
- What happens when a downstream engine only understands spec v2 delete semantics?
- How do your maintenance jobs behave when deletes are expressed as vectors instead of delete files?
- Can you inspect delete state using metadata tables and operational dashboards?
- Can you roll back safely when an ingestion job deletes the wrong rows?
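The rollback question is worth rehearsing in miniature before you rehearse it on a real table. The sketch below is a toy model, not Iceberg code: snapshots are immutable maps from data file to deletion bitmap, and rollback just moves the current pointer. It assumes the older snapshot has not been expired, which is exactly the kind of assumption your own test should challenge.

```python
class ToyTable:
    """Toy table: immutable data files plus a list of snapshot states."""

    def __init__(self, files):
        self.files = files      # path -> list of rows (never modified)
        self.snapshots = [{}]   # each snapshot: path -> deletion bitmap
        self.current = 0        # index of the current snapshot

    def delete(self, path, positions):
        """Commit a new snapshot with extra positions marked deleted."""
        state = dict(self.snapshots[self.current])  # copy, never mutate history
        bits = state.get(path, 0)
        for pos in positions:
            bits |= 1 << pos
        state[path] = bits
        self.snapshots.append(state)
        self.current = len(self.snapshots) - 1
        return self.current

    def rollback_to(self, snapshot_id):
        """Point the table back at an earlier snapshot; no data is rewritten."""
        self.current = snapshot_id

    def scan(self):
        state = self.snapshots[self.current]
        return [
            row
            for path, rows in sorted(self.files.items())
            for pos, row in enumerate(rows)
            if not (state.get(path, 0) >> pos) & 1
        ]


table = ToyTable({"data/a.parquet": [1, 2, 3]})
good = table.delete("data/a.parquet", [0])     # intended delete
bad = table.delete("data/a.parquet", [1, 2])   # ingestion job deletes too much
after_bad = table.scan()
print(after_bad)  # []

table.rollback_to(good)
after_rollback = table.scan()
print(after_rollback)  # [2, 3]
```

Because the data files were never rewritten, undoing the bad delete is a metadata operation. Whether your real stack behaves this way, and for how long, depends on snapshot retention and on which engine performs the rollback.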
In Open Data Infrastructure (ODI) terms, the goal is not "fast deletes." The goal is a portable contract for mutable data that does not collapse under operational load.
Sources to start with
Start with the specification, then validate behavior through the docs of the engines you actually run.