Deleting data from a lakehouse is not one act. It is a promise about table behavior, reader visibility, audit evidence, and cleanup.

The deletion problem

Apache Iceberg stores row-level deletes in delete files. The specification describes position deletes, which identify a row by data file path and row position, and equality deletes, which identify rows by one or more column values.
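The difference between the two delete types is easier to see in a small model. The sketch below is an in-memory illustration, not Iceberg's reader logic: the class names, the live_rows helper, and the sample file names are hypothetical, and it ignores sequence numbers and partition scoping, which real readers use to decide which deletes apply to which data files.

```python
from dataclasses import dataclass


@dataclass
class PositionDelete:
    # Mirrors the two identifying fields of a position delete.
    file_path: str
    pos: int


@dataclass
class EqualityDelete:
    # Column name -> value; a row matching every pair is deleted.
    values: dict


def live_rows(data_files, position_deletes, equality_deletes):
    """Return the rows a reader would still see after applying both delete types."""
    dead_positions = {(d.file_path, d.pos) for d in position_deletes}
    survivors = []
    for path, rows in data_files.items():
        for pos, row in enumerate(rows):
            if (path, pos) in dead_positions:
                continue  # removed by file path and row position
            if any(
                all(row.get(col) == val for col, val in d.values.items())
                for d in equality_deletes
            ):
                continue  # removed by column values
            survivors.append(row)
    return survivors


# Hypothetical sample data: two data files, user 2 appears in both.
data_files = {
    "data-a.parquet": [{"user_id": 1}, {"user_id": 2}],
    "data-b.parquet": [{"user_id": 3}, {"user_id": 2}],
}
remaining = live_rows(
    data_files,
    [PositionDelete("data-a.parquet", 0)],
    [EqualityDelete({"user_id": 2})],
)
```

The position delete removes exactly one row in one file, while the equality delete removes every row with a matching user_id wherever it is stored. That difference is why a governance record should note which delete type a workflow used.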

That detail matters for governance because a delete request should not disappear into an opaque cleanup job. The platform should know which request created the delete, which snapshot made it visible, which readers can see it, and when maintenance rewrote or expired the affected files.
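One way to keep those four questions answerable is to hold them in a single record per request. This is a minimal sketch under assumptions: DeletionRecord, its field names, and the sample identifiers are hypothetical and do not come from Iceberg or any catalog API.

```python
from dataclasses import dataclass, field
from datetime import datetime
from typing import List, Optional


@dataclass
class DeletionRecord:
    request_id: str                            # which request created the delete
    snapshot_id: Optional[int] = None          # which snapshot made it visible
    readers_verified: List[str] = field(default_factory=list)  # engines confirmed to see it
    maintenance_at: Optional[datetime] = None  # when files were rewritten or expired

    def open_questions(self) -> List[str]:
        """List the governance questions this record cannot answer yet."""
        gaps = []
        if self.snapshot_id is None:
            gaps.append("snapshot")
        if not self.readers_verified:
            gaps.append("readers")
        if self.maintenance_at is None:
            gaps.append("maintenance")
        return gaps


# Hypothetical request and snapshot identifiers, for illustration only.
record = DeletionRecord(request_id="REQ-1001")
record.snapshot_id = 8402113345
record.readers_verified.append("spark")
gaps = record.open_questions()
```

Here the record still reports "maintenance" as open, which is the point: the request is visible in a snapshot and confirmed by one reader, but nobody has yet recorded when the affected files were rewritten or expired.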

Delete files are evidence

A delete file is not the whole privacy workflow. It is the table-format evidence that a logical removal became part of table state. The governance workflow still needs request identity, approval, policy basis, retention rules, lineage, and proof that downstream consumers saw the right snapshot.

That is the useful Open Data Infrastructure (ODI) angle. Open table behavior gives governance teams something concrete to inspect. Instead of asking whether a pipeline probably removed a row, teams can ask which table metadata changed and which engines inherited the change.

Core idea: Iceberg delete files matter because they move deletion evidence into the table boundary, where multiple engines can reason about it.

The ODI control loop

A practical control loop starts with the deletion request and ends with downstream verification. The table snapshot records the visible state. The catalog ties that state to ownership, policy, and access. Lineage connects affected data products and agent tools. Maintenance jobs compact or rewrite files only after the governance record is complete.
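The ordering in that loop can be sketched as a small gate. The stage names below are an assumed reading of the steps just described, and next_stage and maintenance_allowed are hypothetical helpers, not part of any Iceberg or catalog tooling.

```python
from enum import IntEnum
from typing import Optional, Set


class Stage(IntEnum):
    # Assumed stages, in the order the loop describes them.
    REQUESTED = 1
    SNAPSHOT_RECORDED = 2
    CATALOG_LINKED = 3
    LINEAGE_CHECKED = 4
    DOWNSTREAM_VERIFIED = 5


def next_stage(completed: Set[Stage]) -> Optional[Stage]:
    """Return the earliest stage that has not been completed, or None if all are done."""
    for stage in Stage:
        if stage not in completed:
            return stage
    return None


def maintenance_allowed(completed: Set[Stage]) -> bool:
    """Compaction or rewrite may run only once every governance stage is complete."""
    return all(stage in completed for stage in Stage)


done = {Stage.REQUESTED, Stage.SNAPSHOT_RECORDED}
pending = next_stage(done)
can_compact = maintenance_allowed(done)
```

With only the request and the snapshot recorded, the next stage is the catalog link and maintenance stays blocked. A scheduler could call a check like maintenance_allowed before it rewrites files for a table with open deletion requests.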

For adjacent context, read "Apache Iceberg and Open Data Infrastructure," "Iceberg metadata tables as evidence," and "Governance as infrastructure."

What breaks first

  • Delete requests are tracked in a ticket system, but table snapshots are not tied back to those requests.
  • Equality deletes work in one write path, but another engine or maintenance job treats them differently.
  • Compaction rewrites evidence before audit systems record which rows were affected.
  • Agent-facing data products see stale snapshots because freshness and snapshot policy were never part of the contract.
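The first failure mode is cheap to detect if each delete commit carries its request identifier. The sketch below assumes snapshot summaries are available as plain dictionaries with string values, that the writer sets a custom summary property named deletion-request-id (a hypothetical key), and that the added-delete-files counter is present in the summary. It only looks at snapshots that add delete files, so deletes performed by rewriting data files would need a different check.

```python
def unlinked_delete_snapshots(snapshots, known_request_ids,
                              request_key="deletion-request-id"):
    """Return ids of snapshots that added delete files without a known request id."""
    flagged = []
    for snap in snapshots:
        summary = snap.get("summary", {})
        # Summary values are treated as strings, so convert before comparing.
        if int(summary.get("added-delete-files", "0")) == 0:
            continue  # no delete files added in this snapshot
        if summary.get(request_key) not in known_request_ids:
            flagged.append(snap["snapshot_id"])
    return flagged


# Hypothetical snapshot summaries, for illustration only.
snapshots = [
    {"snapshot_id": 1, "summary": {"operation": "append"}},
    {"snapshot_id": 2, "summary": {"added-delete-files": "1",
                                   "deletion-request-id": "REQ-1001"}},
    {"snapshot_id": 3, "summary": {"added-delete-files": "2"}},
]
flagged = unlinked_delete_snapshots(snapshots, {"REQ-1001"})
```

Snapshot 3 added delete files but names no request, so it is the one an audit would need to explain.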

Questions to ask

Ask which delete type each workflow uses, which engines read those deletes correctly, and how long delete evidence remains queryable. Ask whether deletion state is visible through metadata tables, lineage, and catalog events.

The boring answer is the important answer. If the team cannot connect request, table state, reader behavior, and cleanup, deletion is still an operational hope rather than governance evidence.

A delete is governed only when the table can prove what changed.