The modern data stack has too many catalogs. Some are metastores. Some are governance layers. Some are product feature sets pretending to be a control plane. The result is metadata fragmentation and policy drift.

The catalog problem Gravitino is trying to solve

Catalog sprawl is a symptom of a deeper issue. Teams keep adding systems that need to know what data exists, who owns it, and how it can be used. Every system adds its own representation of metadata, and the organization pays the tax as inconsistency.

At scale, catalog sprawl creates two painful outcomes:

  • Discovery drift: “what exists” becomes multiple answers.
  • Policy drift: “who can access it” becomes different rules depending on the tool.

What Gravitino is

Apache Gravitino positions itself as a way to manage and federate metadata across multiple sources. The key concept is federation. Instead of forcing every system into one catalog, you can present a unified view across catalogs while preserving underlying ownership boundaries.

That is not automatically the right architecture, but it is a useful direction for organizations that already have multiple catalogs in production and need a way to reduce fragmentation without a massive rip-and-replace program.

The ODI angle

ODI treats the catalog layer as part of the control plane. The catalog is where discovery, access decisions, and governance behavior become consistent across engines and tools.

A federation approach can support ODI goals when it does three things well:

  • Unified discovery: one consistent place to answer what data exists and what it means.
  • Stable governance boundary: policy decisions remain explicit and auditable, not duplicated per tool.
  • Portability support: metadata remains usable outside one vendor interface, so tool changes do not destroy trust.

Core idea: the catalog layer is the control plane, and federation is one way to keep the control plane coherent.

Tradeoffs and failure modes

Federation can reduce fragmentation, but it introduces its own risks.

  • Inconsistent semantics: two underlying catalogs can model the same concept differently, and the federation layer cannot magically fix meaning.
  • Authorization complexity: policy evaluation can become complicated when identity and permission models differ.
  • Performance and reliability: unified queries can become dependent on multiple upstream systems.
  • Control confusion: teams stop knowing where the authoritative policy decision lives.

Federation works when the organization treats “metadata meaning” as real infrastructure. If it is treated as a UI feature, you will recreate the same drift at a different layer.

Sources to start with

Start with Apache Gravitino documentation and ODI-relevant catalog and lineage standards.