Serving an Iceberg table is easy until the serving layer has a different idea of the table than the catalog does.

Metadata cache is part of the contract

The StarRocks documentation covers Iceberg catalog support, including the settings that control metadata caching and refresh. That detail matters: serving engines are usually judged on latency, but trust depends on whether the engine is serving the current table state.

If the serving layer caches catalog metadata, the data product contract needs to name the refresh behavior. Which catalog is authoritative? How often does StarRocks refresh metadata? Which queries use cached state? Which record proves that a refresh happened after an upstream change?
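
One way to keep those four questions from staying rhetorical is to write them down as fields and check which ones are still blank. The sketch below is only an illustration: the class name, field names, and example values are hypothetical and are not StarRocks or Iceberg settings.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class MetadataRefreshContract:
    """Hypothetical contract: one field per question in the paragraph above."""
    table: str
    authoritative_catalog: Optional[str] = None       # which catalog is authoritative
    refresh_interval_seconds: Optional[int] = None    # how often serving refreshes metadata
    cached_query_classes: List[str] = field(default_factory=list)  # which queries use cached state
    refresh_evidence_location: Optional[str] = None   # where proof of a refresh is recorded


def unanswered_questions(contract: MetadataRefreshContract) -> List[str]:
    """Return the names of contract fields that nobody has filled in yet."""
    missing = []
    if not contract.authoritative_catalog:
        missing.append("authoritative_catalog")
    if contract.refresh_interval_seconds is None:
        missing.append("refresh_interval_seconds")
    if not contract.cached_query_classes:
        missing.append("cached_query_classes")
    if not contract.refresh_evidence_location:
        missing.append("refresh_evidence_location")
    return missing


# Example: a contract that names the catalog but nothing else.
draft = MetadataRefreshContract(table="sales.orders", authoritative_catalog="example_catalog")
print(unanswered_questions(draft))
```

A non-empty result is a reasonable signal that the cache decision has not yet been documented as a contract.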

Sync boundaries should be visible

A sync boundary is the line between table truth and serving truth. The Iceberg catalog may know about a new snapshot, schema change, partition evolution, or maintenance action before the serving layer observes it.

That lag is not automatically bad. It becomes bad when users receive stale answers and operators cannot tell whether the issue is source data, catalog metadata, serving cache, query planning, or workload isolation.

Core idea: Iceberg serving needs freshness evidence for metadata, not only rows.
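
As a minimal sketch of what metadata freshness evidence could look like, the function below compares the snapshot the catalog reports with the snapshot the serving layer last observed, and measures the gap against a lag budget. How each value is obtained is left out on purpose; the function name, the status labels, and the lag budget are assumptions made for illustration. Snapshot IDs are compared only for equality, since their numeric order is not treated as meaningful here.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone


@dataclass
class FreshnessEvidence:
    status: str               # "in_sync", "within_lag_budget", or "stale"
    lag: timedelta            # time since the catalog commit the serving layer has not seen
    catalog_snapshot_id: int
    served_snapshot_id: int


def metadata_freshness(
    catalog_snapshot_id: int,
    served_snapshot_id: int,
    catalog_commit_time: datetime,
    checked_at: datetime,
    max_lag: timedelta,
) -> FreshnessEvidence:
    """Classify the gap between catalog state and serving state."""
    if catalog_snapshot_id == served_snapshot_id:
        # Serving layer has observed the current snapshot: no lag to report.
        return FreshnessEvidence("in_sync", timedelta(0), catalog_snapshot_id, served_snapshot_id)
    lag = checked_at - catalog_commit_time
    status = "within_lag_budget" if lag <= max_lag else "stale"
    return FreshnessEvidence(status, lag, catalog_snapshot_id, served_snapshot_id)


# Example: the catalog committed a new snapshot 30 minutes ago, the budget is 5 minutes.
commit_time = datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc)
evidence = metadata_freshness(
    catalog_snapshot_id=11,
    served_snapshot_id=10,
    catalog_commit_time=commit_time,
    checked_at=commit_time + timedelta(minutes=30),
    max_lag=timedelta(minutes=5),
)
print(evidence.status, evidence.lag)
```

Attaching a record like this to a query result or profile lets an operator rule the serving cache in or out before looking at source data or query planning.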

The ODI pattern puts sync into operations

Open Data Infrastructure asks teams to expose the operational behavior that makes open tables trustworthy. In this case, that means metadata refresh policy, query profiles, table snapshot references, and owner review belong together.

For related articles, see StarRocks query profiles for AI serving, StarRocks external catalog guardrails, and Iceberg metadata tables as ODI evidence. Serving is where metadata promises meet user-facing latency.

What breaks first

Most incidents start as a reasonable cache decision that nobody documented as a contract.

  • An upstream Iceberg snapshot is committed, but the serving layer still answers from stale metadata.
  • A schema change reaches the catalog before serving-layer tests have run against it.
  • A query profile shows slow scans but not the table snapshot or metadata age behind the query.
  • Incident review blames the engine because catalog freshness was never measured.

What the runbook should record

Record the catalog type, refresh settings, last refresh time, affected tables, query profile links, snapshot identifiers, and the owner who can force or validate a refresh.

A runbook with these fields turns sync from a hidden engine behavior into a data product reliability signal.
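
The runbook fields listed above can be captured as a small structured record so that every incident carries the same evidence. This is a sketch under assumptions: the class name, the field names, and the JSON layout are hypothetical, and the refresh settings are stored as free-form key/value pairs rather than as real engine property names.

```python
import json
from dataclasses import asdict, dataclass
from typing import Dict, List


@dataclass
class SyncRunbookEntry:
    """Hypothetical runbook record mirroring the fields named in the text."""
    catalog_type: str
    refresh_settings: Dict[str, str]   # free-form; copy the values actually configured
    last_refresh_time: str             # ISO 8601 timestamp
    affected_tables: List[str]
    query_profile_links: List[str]
    snapshot_ids: List[int]
    refresh_owner: str                 # who can force or validate a refresh


def to_incident_record(entry: SyncRunbookEntry) -> str:
    """Serialize the entry as stable, diff-friendly JSON for an incident log."""
    return json.dumps(asdict(entry), indent=2, sort_keys=True)
```

Sorting the keys keeps two records for the same table easy to compare side by side during review.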

Fast serving is useful. Fast serving with stale metadata is just a prettier incident.