Open Data Infrastructure
Iceberg Branches for Agent Sandboxes
How Iceberg branching and tagging can support agent sandboxes for candidate changes without corrupting production tables.
Agents should not get a write path to production tables just because they can generate plausible SQL. Plausible is not the same as safe.
The practical problem
Agentic systems can propose data fixes, enrichment jobs, classifications, table maintenance, and derived datasets. Some of that work is useful. Some of it will be wrong. The architecture needs a place for candidate changes to exist before production consumers trust them.
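Apache Iceberg branching and tagging are relevant because they let a table carry more than one line of state. Branches and tags are named references to snapshots. A branch can receive its own commits while the main branch stays untouched, and a tag pins one specific snapshot, such as the production state just before a promotion. That makes them useful for agent sandboxes when combined with policy, evaluation, review, and promotion controls.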
Core idea: an agent sandbox should isolate table changes, evaluate them, and promote only reviewed state back into production.
The sandbox pattern
Create an isolated branch for agent-generated candidate changes. The agent can write proposed rows, corrections, labels, or derived artifacts without touching the production branch.
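A minimal sketch of the branch-creation step, assuming Spark with the Iceberg SQL extensions enabled. The helper names, the agent_sandbox_ prefix, the 7-day retention, and the prod.db.orders table are hypothetical. The generated DDL follows Iceberg's documented CREATE BRANCH syntax, and the branch_<name> suffix is how Iceberg's Spark integration routes writes to a branch. The sketch only builds statements; it does not execute them.

```python
import re

# Hypothetical convention: every agent sandbox branch carries this prefix so
# people, read guards, and retention jobs can recognise it as untrusted state.
SANDBOX_PREFIX = "agent_sandbox_"

# Lowercase letters, digits, and underscores only, so the name is a valid
# unquoted identifier inside the branch_<name> write target below.
_SAFE_NAME = re.compile(r"^[a-z0-9_]+$")


def sandbox_branch_name(agent_id: str, run_id: str) -> str:
    """Return one branch name per agent run, e.g. agent_sandbox_fixer_r42."""
    name = f"{SANDBOX_PREFIX}{agent_id}_{run_id}".lower()
    if not _SAFE_NAME.match(name):
        raise ValueError(f"unsafe branch name: {name!r}")
    return name


def create_branch_sql(table: str, branch: str, retain_days: int = 7) -> str:
    """Iceberg Spark DDL: fork a branch from the table's current snapshot.

    RETAIN sets the branch's maximum reference age, which snapshot
    expiration uses to drop a branch that has been left behind.
    """
    return (
        f"ALTER TABLE {table} CREATE BRANCH IF NOT EXISTS {branch} "
        f"RETAIN {retain_days} DAYS"
    )


def branch_write_target(table: str, branch: str) -> str:
    """Table identifier that sends INSERT/MERGE writes to the branch, not main."""
    return f"{table}.branch_{branch}"


branch = sandbox_branch_name("fixer", "R42")
print(create_branch_sql("prod.db.orders", branch))
print(f"INSERT INTO {branch_write_target('prod.db.orders', branch)} SELECT ...")
```

The agent's writes go to prod.db.orders.branch_agent_sandbox_fixer_r42 rather than to prod.db.orders itself. A distinctive prefix also makes sandbox branches easy to spot when listing the table's refs metadata table.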
Run checks against the candidate state. These checks include data quality tests, policy checks, evaluation examples, lineage capture, and human review for high-risk changes. The branch is not a loophole. It is a safer place to inspect the work.
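A minimal sketch of an evaluation gate over candidate rows. In a real pipeline the rows would be read from the sandbox branch (for example with Spark's SELECT * FROM prod.db.orders VERSION AS OF 'agent_sandbox_fixer_r42'); here plain dictionaries stand in for them. The check helpers, column names, approved status values, and the refund rule are all hypothetical. The row_rule helper is included because syntax and row-count checks alone say nothing about business correctness.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional, Set

Row = Dict[str, object]
# A check returns None when it passes, or a human-readable failure reason.
Check = Callable[[List[Row]], Optional[str]]


@dataclass
class GateResult:
    passed: bool
    failures: List[str] = field(default_factory=list)
    needs_human_review: bool = False


def not_null(column: str) -> Check:
    """Data quality: the column must be populated on every candidate row."""
    def check(rows: List[Row]) -> Optional[str]:
        bad = sum(1 for r in rows if r.get(column) is None)
        return f"{bad} rows with null {column}" if bad else None
    return check


def allowed_values(column: str, allowed: Set[object]) -> Check:
    """Policy: the agent may only emit values from an approved set."""
    def check(rows: List[Row]) -> Optional[str]:
        unexpected = {r.get(column) for r in rows} - allowed
        if unexpected:
            return f"unexpected {column} values: {sorted(map(str, unexpected))}"
        return None
    return check


def row_rule(name: str, predicate: Callable[[Row], bool]) -> Check:
    """Business correctness: a per-row rule that syntax checks cannot see."""
    def check(rows: List[Row]) -> Optional[str]:
        bad = sum(1 for r in rows if not predicate(r))
        return f"{bad} rows violate {name}" if bad else None
    return check


def evaluate_candidate(rows: List[Row], checks: List[Check], high_risk: bool) -> GateResult:
    """Run every check; passing checks never remove human review for high-risk changes."""
    failures = []
    for check in checks:
        message = check(rows)
        if message is not None:
            failures.append(message)
    return GateResult(passed=not failures, failures=failures, needs_human_review=high_risk)


candidate = [
    {"order_id": 1, "status": "refunded", "refund": 5.0, "total": 10.0},
    {"order_id": 2, "status": "lost", "refund": 20.0, "total": 10.0},
]
result = evaluate_candidate(
    candidate,
    [
        not_null("order_id"),
        allowed_values("status", {"refunded", "shipped"}),
        row_rule("refund <= total", lambda r: r["refund"] <= r["total"]),
    ],
    high_risk=True,
)
print(result)
```

The gate reports every failure instead of stopping at the first one, so a reviewer sees the full picture of what the agent got wrong.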
Promote deliberately. A promotion path should record which branch, snapshot, evaluator, reviewer, and policy decision allowed the change to become production state.
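A minimal sketch of a promotion step that keeps the audit context attached to the change. The PromotionRecord fields mirror the paragraph above (branch, snapshot, evaluator, reviewer, policy decision) plus the proposing agent and tool call; the field and function names are hypothetical. The generated statements use Iceberg's Spark CREATE TAG DDL, to pin main's pre-promotion snapshot for rollback, and the fast_forward stored procedure, which moves main to the branch head. Fast-forwarding only works while main's current snapshot is an ancestor of the branch head; if main has moved on independently, the procedure fails and the candidate needs to be rebuilt and re-evaluated.

```python
import json
from dataclasses import asdict, dataclass
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class PromotionRecord:
    catalog: str              # Spark catalog name, e.g. "prod"
    table: str                # namespace.table, e.g. "db.orders"
    branch: str               # sandbox branch being promoted
    snapshot_id: int          # branch head that the evaluator actually checked
    proposed_by: str          # agent identity
    tool_call_id: str         # tool call that produced the change
    evaluator: str            # evaluation suite or job that passed
    reviewer: Optional[str]   # human reviewer, required for high-risk changes
    policy_decision: str      # "allow" or "deny" from the policy engine


def plan_promotion(rec: PromotionRecord, current_head: int, high_risk: bool) -> Tuple[List[str], str]:
    """Return (SQL statements, audit JSON), or raise if promotion is not allowed."""
    if rec.policy_decision != "allow":
        raise PermissionError(f"policy decision is {rec.policy_decision!r}")
    if high_risk and not rec.reviewer:
        raise PermissionError("high-risk change has no human reviewer")
    if current_head != rec.snapshot_id:
        # Something was committed to the branch after evaluation; re-run checks.
        raise RuntimeError("branch head moved since evaluation")
    full_name = f"{rec.catalog}.{rec.table}"
    statements = [
        # Tag main's current snapshot so the promotion can be rolled back.
        f"ALTER TABLE {full_name} CREATE TAG IF NOT EXISTS pre_promote_{rec.branch}",
        # Move main to the branch head (fails if main has diverged).
        f"CALL {rec.catalog}.system.fast_forward('{rec.table}', 'main', '{rec.branch}')",
    ]
    # Store this next to the promotion so audit context is not lost.
    audit = json.dumps(asdict(rec), sort_keys=True)
    return statements, audit


rec = PromotionRecord(
    catalog="prod", table="db.orders", branch="agent_sandbox_fixer_r42",
    snapshot_id=1234, proposed_by="fixer", tool_call_id="call-7",
    evaluator="orders-eval-v1", reviewer="alice", policy_decision="allow",
)
statements, audit = plan_promotion(rec, current_head=1234, high_risk=True)
for statement in statements:
    print(statement)
```

The head check is only a simple guard: another writer could still commit between the check and the fast-forward, so a stricter version would also give the promotion job exclusive write access to the branch.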
What breaks first
- Agents write to a branch, but downstream tools accidentally read it as trusted production data.
- Evaluation checks validate syntax and row counts but not business correctness.
- Branches accumulate without retention policy or ownership.
- Promotion copies data but loses audit context about who or what proposed the change.
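The first and third failures can be partly caught mechanically. A minimal sketch, assuming the hypothetical agent_sandbox_ prefix and a separate registry that records each sandbox branch's owner and creation time (Iceberg's refs metadata table lists each branch's name, type, and snapshot id, but not an owner). The read guard rejects sandbox refs for production consumers; the sweep flags branches that have no owner or are older than a maximum age and emits Iceberg's DROP BRANCH DDL for them.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import List, Optional

SANDBOX_PREFIX = "agent_sandbox_"  # hypothetical naming convention


@dataclass
class SandboxBranch:
    name: str
    owner: Optional[str]   # team accountable for the branch, if any
    created_at: datetime


def check_read_ref(ref: str, consumer_is_production: bool) -> str:
    """Refuse to let a production consumer resolve a sandbox branch."""
    if consumer_is_production and ref.startswith(SANDBOX_PREFIX):
        raise PermissionError(f"production consumer asked for sandbox ref {ref!r}")
    return ref


def branches_to_expire(branches: List[SandboxBranch], now: datetime, max_age: timedelta) -> List[str]:
    """Flag sandbox branches that have no owner or are older than max_age."""
    return [
        b.name
        for b in branches
        if b.owner is None or now - b.created_at > max_age
    ]


def drop_branch_sql(table: str, branch: str) -> str:
    """Iceberg Spark DDL that removes the branch reference.

    Snapshots that are no longer referenced are cleaned up later by
    snapshot expiration, not by this statement.
    """
    return f"ALTER TABLE {table} DROP BRANCH IF EXISTS {branch}"


now = datetime(2024, 6, 1, tzinfo=timezone.utc)
registry = [
    SandboxBranch("agent_sandbox_fixer_r42", "orders-team", now - timedelta(days=2)),
    SandboxBranch("agent_sandbox_labeler_r7", None, now - timedelta(days=1)),
    SandboxBranch("agent_sandbox_fixer_r3", "orders-team", now - timedelta(days=30)),
]
for name in branches_to_expire(registry, now, timedelta(days=7)):
    print(drop_branch_sql("prod.db.orders", name))
```

Shallow evaluation and lost audit context are not fixed by guards like these; they depend on the checks and the promotion record themselves.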
Questions to ask
Ask these before exposing Iceberg branches to agents.
- Which agents can create branches, write changes, and request promotion?
- Which checks are required before promotion?
- How do consumers avoid reading sandbox state by accident?
- How long do sandbox branches live?
- Can audit records connect the agent, tool call, branch, snapshot, review, and promotion?
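One way to answer the first two questions is a small, explicit policy. A minimal sketch with hypothetical agent names, actions, and check names: agents may create and write only sandbox branches, some may also request promotion, none may promote, and a request is ready only when every required check has passed. This sketch enforces the rules in application code; the same rules would also need to hold in the catalog's access controls so an agent cannot bypass the wrapper.

```python
from enum import Enum
from typing import Dict, FrozenSet, List, Set

SANDBOX_PREFIX = "agent_sandbox_"  # hypothetical naming convention


class Action(Enum):
    CREATE_BRANCH = "create_branch"
    WRITE_BRANCH = "write_branch"
    REQUEST_PROMOTION = "request_promotion"
    PROMOTE = "promote"


# Hypothetical grants. No agent is ever granted PROMOTE.
AGENT_GRANTS: Dict[str, FrozenSet[Action]] = {
    "classifier-agent": frozenset(
        {Action.CREATE_BRANCH, Action.WRITE_BRANCH, Action.REQUEST_PROMOTION}
    ),
    "enrichment-agent": frozenset({Action.CREATE_BRANCH, Action.WRITE_BRANCH}),
}

# Hypothetical checks that must all pass before a promotion request is accepted.
REQUIRED_CHECKS: FrozenSet[str] = frozenset({"schema", "not_null", "business_rules", "policy"})


def authorize(agent: str, action: Action, branch: str) -> bool:
    """Allow an agent action only on sandbox branches and only if granted."""
    if action is Action.PROMOTE:
        return False  # promotion belongs to the reviewed promotion path
    if not branch.startswith(SANDBOX_PREFIX):
        return False  # agents never touch main or other production refs
    return action in AGENT_GRANTS.get(agent, frozenset())


def missing_checks(passed: Set[str]) -> List[str]:
    """Required checks that have not passed yet, sorted for stable reporting."""
    return sorted(REQUIRED_CHECKS - passed)


print(authorize("classifier-agent", Action.WRITE_BRANCH, "agent_sandbox_classifier_r1"))
print(missing_checks({"schema", "not_null"}))
```

Keeping the grants as data rather than scattered conditionals makes it straightforward to show an auditor exactly which agents could do what.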
For related agent control, read AI Data Contracts for Agents, Why Agents Need Governed Data Access, and Data Readiness for Fine-Tuning.
Sources to start with
Use Iceberg table-state features as the isolation layer, then add evaluation and governance around them.