The catalog is not a metadata database. It is where your lakehouse decides who can do what, and when.

Why catalogs are different

Teams often treat the catalog as a passive registry. That view holds only until you have multiple engines, multiple teams, and a real governance model. At that point the catalog becomes the system that coordinates table identity, credentials, namespaces, policies, and table operations.

This is why "managed vs self-hosted" matters more for catalogs than it does for many other components. You are choosing who owns the control plane.

Core idea: if the catalog is where policy and operations live, then ownership of the catalog is ownership of the boundary.

What you outsource when it is managed

When the catalog is managed, you usually outsource:

  • availability: uptime, failover, scaling, and operational patching
  • security operations: key rotation, endpoint hardening, vulnerability response
  • credential vending complexity: integration patterns for multiple engines and clouds
  • some audit surface: log retention and access pattern visibility

You do not outsource the consequences. When an incident happens, the blast radius is still yours. The difference is whether you can change the boundary quickly or you have to wait for a vendor process.

What you own when it is self-hosted

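When you self-host, you own every operational responsibility that a managed catalog would otherwise take on: availability, security operations, credential vending, and audit. You also gain two forms of control that matter in ODI terms: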

  • control of the policy model: you can integrate it with your identity system and your governance approach without inheriting a vendor-opinionated boundary
  • control of the migration path: you can move catalog deployments, operators, and supporting systems without negotiating a hosted boundary

Self-hosting is not "free." It is a bet that catalog ownership is strategically valuable. For many buyers, it is.

Failure modes to plan for

Catalogs fail in ways that look boring until the day they are existential:

  • auth drift: engines that used to connect stop connecting because of a credential rotation or a policy change
  • namespace sprawl: the catalog becomes ungovernable because tenancy conventions were never defined
  • policy fragmentation: the catalog enforces one model, the warehouse enforces another, and teams build shadow access paths
  • recovery ambiguity: a rollback or delete goes wrong and nobody can prove what the canonical state should be

Those are not edge cases. They are the normal shape of operating a real platform.
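
Namespace sprawl is the easiest of these to catch mechanically, provided a convention exists to check against. The sketch below assumes a hypothetical three-part convention, <team>.<env>.<domain>. The convention, the allowed environments, and the function names are illustrative assumptions, not something any catalog defines for you.

```python
import re

# Hypothetical tenancy convention: <team>.<env>.<domain>
# The env list and naming rule are assumptions for illustration only.
ALLOWED_ENVS = {"dev", "staging", "prod"}
SEGMENT = re.compile(r"^[a-z][a-z0-9_]*$")


def namespace_violations(namespace: str) -> list[str]:
    """Return the reasons a namespace breaks the convention (empty if it conforms)."""
    parts = namespace.split(".")
    if len(parts) != 3:
        return [f"expected 3 segments, got {len(parts)}"]

    team, env, domain = parts
    problems = []
    for label, value in (("team", team), ("domain", domain)):
        if not SEGMENT.match(value):
            problems.append(f"{label} segment {value!r} is not lowercase snake_case")
    if env not in ALLOWED_ENVS:
        problems.append(f"env segment {env!r} is not one of {sorted(ALLOWED_ENVS)}")
    return problems


def audit(namespaces):
    """Map each non-conforming namespace to its list of violations."""
    return {ns: p for ns in namespaces if (p := namespace_violations(ns))}


if __name__ == "__main__":
    for ns, problems in audit(["growth.prod.events", "Tmp", "ml.test.features"]).items():
        print(ns, "->", "; ".join(problems))
```

Running a check like this on a schedule against the catalog's namespace listing turns "tenancy conventions were never defined" into a concrete, reviewable rule.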

A decision framework

This framework tends to produce a sane decision:

  • Start with your blast radius: how many engines, teams, and business-critical workflows depend on the catalog boundary?
  • Decide what you must control: identity integration, policy evaluation model, audit requirements, and disaster recovery posture.
  • Pick a protocol story: for Iceberg catalogs, a REST-compatible interface is the baseline interoperability requirement.
  • Pick an operational story: managed can be correct if it still gives you a credible exit and a credible audit trail.
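
One way to keep an evaluation honest is to write the last two bullets down as explicit checks. The sketch below is a minimal encoding of them; the CatalogOption fields and the blockers function are hypothetical names, and the yes/no answers still have to come from your own review of each candidate.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CatalogOption:
    """Hypothetical record of one candidate catalog deployment."""
    name: str
    managed: bool
    rest_compatible: bool
    credible_exit: bool
    credible_audit_trail: bool


def blockers(option: CatalogOption) -> list[str]:
    """List the reasons an option fails the framework (empty means it passes)."""
    reasons = []
    # Protocol story: a REST-compatible interface is the baseline.
    if not option.rest_compatible:
        reasons.append("no REST-compatible interface")
    # Operational story: managed is acceptable only with an exit and an audit trail.
    # For self-hosted options both are your own responsibility, so they are not
    # treated as vendor blockers in this sketch.
    if option.managed and not option.credible_exit:
        reasons.append("managed without a credible exit")
    if option.managed and not option.credible_audit_trail:
        reasons.append("managed without a credible audit trail")
    return reasons


if __name__ == "__main__":
    hosted = CatalogOption("hosted-a", managed=True, rest_compatible=True,
                           credible_exit=False, credible_audit_trail=True)
    print(hosted.name, blockers(hosted))
```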

If you are not sure what "protocol story" means, start with "What Is a REST Catalog?"
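
For a concrete feel of what a REST-compatible interface means, the sketch below builds the request paths an engine would call against an Iceberg REST catalog: the configuration endpoint, the namespace listing, and the table listing for a namespace. The paths and the unit-separator encoding of multi-level namespaces follow the Iceberg REST OpenAPI specification as I understand it, so verify them against the spec version you target. The base URL and helper names are hypothetical.

```python
from urllib.parse import quote

# Multi-level namespaces are joined with the unit separator (0x1F) in URL paths,
# per the Iceberg REST spec as understood here; confirm for your spec version.
UNIT_SEPARATOR = "\x1f"


def _root(base_url: str, prefix: str = "") -> str:
    """Build the /v1 root, including the optional prefix a catalog's config may supply."""
    root = f"{base_url.rstrip('/')}/v1"
    prefix = prefix.strip("/")
    return f"{root}/{prefix}" if prefix else root


def config_url(base_url: str) -> str:
    """Endpoint a client calls first to fetch catalog defaults and overrides."""
    return f"{base_url.rstrip('/')}/v1/config"


def namespaces_url(base_url: str, prefix: str = "") -> str:
    """Endpoint that lists namespaces."""
    return f"{_root(base_url, prefix)}/namespaces"


def tables_url(base_url: str, namespace_levels: list[str], prefix: str = "") -> str:
    """Endpoint that lists tables in a (possibly multi-level) namespace."""
    encoded = quote(UNIT_SEPARATOR.join(namespace_levels), safe="")
    return f"{_root(base_url, prefix)}/namespaces/{encoded}/tables"


if __name__ == "__main__":
    base = "https://catalog.example.com"  # hypothetical endpoint
    print(config_url(base))
    print(namespaces_url(base))
    print(tables_url(base, ["analytics", "prod"]))
```

If every engine you run can be pointed at paths like these, swapping the catalog behind them becomes a deployment change rather than an integration project.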

Sources to start with

Use the protocol docs and the project docs to anchor your evaluation in contracts, not claims.