Time travel is wonderful until every old version becomes someone else's storage bill.

Retention is where time travel becomes policy

Open table formats make time travel practical by preserving table snapshots. That capability is valuable for debugging, recovery, audit, and reproducibility. It also creates a retention problem.

If snapshots never expire, metadata and storage grow. If snapshots expire too aggressively, recovery and audit use cases break. Snapshot retention is not a cleanup preference. It is a policy decision.

Core idea: snapshot expiration should encode business recovery and audit requirements before it encodes storage anxiety.

What expiration actually does

Iceberg creates a new snapshot each time a commit changes the table. Expiring snapshots removes old ones from table metadata according to the retention policy, which in turn allows data files that no retained snapshot references to be deleted.

Expiration has two separate effects, and the distinction matters. Removing snapshots from metadata shrinks the recoverability window. Deleting the files that only those snapshots referenced is what actually frees storage. Teams should understand both the metadata effect and the storage effect before scheduling a job.
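
To make the two effects concrete, here is a minimal sketch in plain Python. It is a toy model, not Iceberg's implementation: the Snapshot class, the expire function, and the file names are all hypothetical, and the history is assumed to be linear with no branches or tags. It shows that a file becomes deletable only when no retained snapshot still references it.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Snapshot:
    snapshot_id: int
    committed_at: int      # epoch seconds (illustrative values)
    files: frozenset       # data files reachable from this snapshot


def expire(snapshots, older_than, retain_last=1):
    """Split a linear history into retained and expired snapshots.

    Returns (retained, expired, deletable_files). A file is deletable only
    if it was referenced by an expired snapshot and by no retained one.
    """
    if retain_last < 1:
        raise ValueError("retain_last must be at least 1")
    ordered = sorted(snapshots, key=lambda s: s.committed_at)
    newest_ids = {s.snapshot_id for s in ordered[-retain_last:]}

    retained, expired = [], []
    for snap in ordered:
        if snap.committed_at >= older_than or snap.snapshot_id in newest_ids:
            retained.append(snap)
        else:
            expired.append(snap)

    # Metadata effect: 'expired' leaves the recoverability window.
    # Storage effect: only files unreferenced by 'retained' can be removed.
    still_referenced = set().union(*(s.files for s in retained))
    candidates = set().union(*(s.files for s in expired))
    return retained, expired, candidates - still_referenced


# Snapshot 2 appends c.parquet; snapshot 3 rewrites a and b into d.
history = [
    Snapshot(1, 100, frozenset({"a.parquet", "b.parquet"})),
    Snapshot(2, 200, frozenset({"a.parquet", "b.parquet", "c.parquet"})),
    Snapshot(3, 300, frozenset({"c.parquet", "d.parquet"})),
]
retained, expired, deletable = expire(history, older_than=250, retain_last=1)
```

In this example snapshots 1 and 2 are expired, but only a.parquet and b.parquet become deletable. c.parquet was written by an expired snapshot and still survives, because snapshot 3 references it.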

Retention needs table classes

Not every table needs the same retention policy. Classify tables by business requirement:

  • critical regulated tables: longer retention, tighter audit, slower deletion
  • operational high-churn tables: shorter windows with strong recovery procedures
  • derived tables: retention tied to rebuild cost and downstream dependency
  • temporary or experimental tables: aggressive expiration and clear owner accountability

One global default is easy. It is rarely correct.
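
One way to keep table classes explicit is to hold the policy in code and derive table settings from it. The sketch below is an assumption-laden illustration: the class names and every number are hypothetical placeholders, not recommendations. The property keys are the history.expire.* names as I understand them from the Iceberg table configuration docs, so verify them against the version you run.

```python
from dataclasses import dataclass
from typing import Dict, Optional

DAY_MS = 24 * 60 * 60 * 1000


@dataclass(frozen=True)
class RetentionPolicy:
    max_snapshot_age_days: int
    min_snapshots_to_keep: int
    max_ref_age_days: Optional[int]  # None: leave branch/tag age at the table default

    def table_properties(self) -> Dict[str, str]:
        """Render the policy as Iceberg-style table properties (string values)."""
        props = {
            "history.expire.max-snapshot-age-ms": str(self.max_snapshot_age_days * DAY_MS),
            "history.expire.min-snapshots-to-keep": str(self.min_snapshots_to_keep),
        }
        if self.max_ref_age_days is not None:
            props["history.expire.max-ref-age-ms"] = str(self.max_ref_age_days * DAY_MS)
        return props


# Hypothetical values; real numbers come from recovery and audit requirements.
POLICIES = {
    "regulated": RetentionPolicy(90, 100, None),
    "operational": RetentionPolicy(7, 20, 30),
    "derived": RetentionPolicy(3, 5, 14),
    "experimental": RetentionPolicy(1, 1, 3),
}


def policy_for(table_class: str) -> RetentionPolicy:
    """Look up a policy; unknown classes fail loudly instead of getting a default."""
    try:
        return POLICIES[table_class]
    except KeyError:
        raise KeyError(f"no retention policy defined for table class {table_class!r}")
```

The lookup deliberately has no fallback. A table that has not been classified raises an error, which forces someone to make the decision instead of inheriting a global default.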

Expiration has to respect writes and branches

Retention must account for active writes, branches, tags, and long-running readers. A cleanup process that deletes data needed by in-progress work can erode trust in the platform faster than almost anything else.

Use conservative safety windows, dry runs where available, and table-specific procedures. The team should know exactly which snapshots will remain and why.
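
A dry run can be as simple as computing the plan and printing the reason each snapshot survives, without deleting anything. The following sketch uses a simplified model that is my own assumption, not the rules of any particular engine: branches keep their newest ancestors, tags keep exactly the tagged snapshot, and snapshots pinned by active readers are always kept. All names (Snap, plan_expiration, the ref names) are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass(frozen=True)
class Snap:
    snapshot_id: int
    parent_id: Optional[int]
    committed_at: int  # epoch seconds (illustrative values)


def plan_expiration(snaps, branches, tags, pinned, now,
                    max_age, min_keep, safety_window):
    """Dry run: return ({kept_id: reason}, [ids that would expire])."""
    by_id = {s.snapshot_id: s for s in snaps}
    # The safety window pushes the cutoff further into the past.
    cutoff = now - max_age - safety_window
    keep: Dict[int, str] = {}

    for name, head in branches.items():
        depth = 0
        current = by_id.get(head)
        while current is not None:
            if depth < min_keep:
                keep.setdefault(current.snapshot_id, f"branch {name}: newest {min_keep}")
            elif current.committed_at >= cutoff:
                keep.setdefault(current.snapshot_id, f"branch {name}: newer than cutoff")
            else:
                break  # ancestors are older still, so stop walking
            depth += 1
            current = by_id.get(current.parent_id)

    for name, snapshot_id in tags.items():
        keep.setdefault(snapshot_id, f"tag {name}")
    for snapshot_id in pinned:
        keep.setdefault(snapshot_id, "pinned by active reader")

    expire = sorted(sid for sid in by_id if sid not in keep)
    return keep, expire


snaps = [Snap(1, None, 100), Snap(2, 1, 200), Snap(3, 2, 300),
         Snap(4, 3, 400), Snap(5, 4, 500)]
keep, expire = plan_expiration(
    snaps,
    branches={"main": 5},
    tags={"q1-close": 2},
    pinned={3},
    now=1000, max_age=500, min_keep=1, safety_window=50,
)
for snapshot_id in sorted(keep):
    print(snapshot_id, "kept:", keep[snapshot_id])
print("would expire:", expire)
```

Returning a reason alongside each kept snapshot is the point of the design. It gives the team a reviewable answer to which snapshots remain and why before any job is allowed to delete files.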

A retention runbook that works

  • Define minimum snapshots, maximum age, and branch or tag policy by table class.
  • Record owners and the approval path for retention changes.
  • Run expiration on a schedule with metrics and alerts.
  • Track metadata size, snapshot count, data-file deletion, and failure rate.
  • Test restore and rollback behavior inside the retained window.

If the policy has never been tested during a restore drill, it is not a policy yet. It is a guess.
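
A restore drill can start as a cheap check before anyone touches a real table. The sketch below assumes you keep a commit history of (snapshot id, commit time) pairs, for example from an audit log, and the set of snapshot ids that survived expiration. Both inputs and the function names are hypothetical. It probes several points inside the promised window and reports every point whose as-of snapshot is gone.

```python
def snapshot_as_of(history, target_ts):
    """Return the id of the snapshot that was current at target_ts, or None."""
    current = None
    for snapshot_id, committed_at in sorted(history, key=lambda item: item[1]):
        if committed_at <= target_ts:
            current = snapshot_id
    return current


def restore_drill(history, retained_ids, now, promised_window, probes=5):
    """Probe evenly spaced times in [now - promised_window, now].

    Returns a list of (target_ts, missing_snapshot_id) for probes that
    could not be restored because the needed snapshot was expired.
    """
    if probes < 2:
        raise ValueError("probes must be at least 2")
    failures = []
    for step in range(probes):
        target_ts = now - (promised_window * step) // (probes - 1)
        needed = snapshot_as_of(history, target_ts)
        if needed is None:
            continue  # the table did not exist yet at this time
        if needed not in retained_ids:
            failures.append((target_ts, needed))
    return failures


# Three commits; the policy promises recovery to any point in the last 500s.
history = [(1, 100), (2, 400), (3, 900)]

# Expiring purely by commit age (cutoff 500) keeps only snapshot 3.
print(restore_drill(history, retained_ids={3}, now=1000, promised_window=500))

# Keeping snapshot 2 as well covers the whole window.
print(restore_drill(history, retained_ids={2, 3}, now=1000, promised_window=500))
```

In this toy model the first drill fails for every probe older than the latest commit. Snapshot 2 was committed before the cutoff, yet it was still the current state for most of the promised window. That is the kind of gap a drill surfaces and a policy document does not.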

Sources to start with

Start with the Iceberg maintenance and specification docs, then validate operational behavior in the engines that manage your tables.