Open Data Infrastructure
Policy Enforcement Across Open Data Systems
Policy enforcement in open data infrastructure is a placement problem: decide which controls belong in catalogs, engines, storage, services, and applications.
The hard part of policy enforcement isn't writing the policy; it's making sure the right system evaluates the right policy at the right moment.
Policy Placement
Open data systems have multiple enforcement points. Catalogs can decide what exists, what a principal can discover, and which table operations are allowed. Engines can enforce query-time controls. Storage can enforce object-level access. Services can control APIs and workflow actions. Applications can shape user experience and intent.
Each layer sees a different part of the problem. A catalog may understand table identity and metadata. An engine may understand the query plan. Storage may understand object access but not business meaning. An application may understand the workflow but not every downstream data dependency.
Catalog Enforcement
Catalogs are powerful because they sit near table identity. They can coordinate namespaces, table metadata, permissions, and operations across multiple engines. That makes the catalog a natural control-plane component.
But catalog enforcement has limits. If the catalog grants access and the engine ignores the policy model, the system still fails. The catalog can coordinate control, but the rest of the stack has to honor the contract.
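A minimal sketch of that contract, assuming a toy in-memory catalog. The names `Grant`, `Catalog`, `HonoringEngine`, and `IgnoringEngine` are hypothetical and do not correspond to any real catalog or engine API. The point is only that a catalog decision has an effect when the engine asks for it and respects the answer.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Grant:
    principal: str
    table: str
    operations: frozenset


@dataclass
class Catalog:
    """Hypothetical catalog: the authority for table-level grants."""
    grants: list = field(default_factory=list)

    def is_allowed(self, principal, table, operation):
        return any(
            g.principal == principal
            and g.table == table
            and operation in g.operations
            for g in self.grants
        )


class HonoringEngine:
    """Engine that consults the catalog before every table read."""

    def __init__(self, catalog):
        self.catalog = catalog

    def read(self, principal, table):
        if not self.catalog.is_allowed(principal, table, "read"):
            raise PermissionError(f"{principal} may not read {table}")
        return f"rows from {table}"


class IgnoringEngine:
    """Engine that never asks: the catalog's decision has no effect here."""

    def read(self, principal, table):
        return f"rows from {table}"


# Only "analyst" is granted read access to the (made-up) table.
catalog = Catalog([Grant("analyst", "sales.orders", frozenset({"read"}))])
```

With this setup, `HonoringEngine(catalog).read("intern", "sales.orders")` raises `PermissionError`, while `IgnoringEngine().read("intern", "sales.orders")` returns data even though the catalog holds no grant for that principal.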
Engine Enforcement
Engines are closer to the query. They can apply controls that depend on rows, columns, functions, joins, or query shape. That makes them important for fine-grained policy enforcement.
The tradeoff is portability. If every engine implements policy differently, the platform owner ends up managing a policy translation problem. Open data infrastructure (ODI) needs enforcement that is strong enough for production and consistent enough to survive engine choice.
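To make the translation problem concrete, here is a small sketch that assumes one engine-neutral row filter and two hypothetical targets: a SQL-style predicate string and an in-process row predicate. `RowFilter`, `to_sql_predicate`, and `to_row_predicate` are illustrative names, not part of any real engine. Every additional engine adds another translation that must stay equivalent to the others.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RowFilter:
    """Engine-neutral policy: only rows whose column holds an allowed value."""
    table: str
    column: str
    allowed_values: tuple


def quote(value):
    # Escape single quotes the way standard SQL string literals expect.
    return "'" + value.replace("'", "''") + "'"


def to_sql_predicate(policy):
    """Render the policy for a hypothetical SQL engine as a WHERE fragment."""
    values = ", ".join(quote(v) for v in policy.allowed_values)
    return f"{policy.column} IN ({values})"


def to_row_predicate(policy):
    """Render the same policy for a hypothetical in-process engine."""
    allowed = set(policy.allowed_values)
    return lambda row: row[policy.column] in allowed


policy = RowFilter("sales.orders", "region", ("EU", "UK"))
sql_fragment = to_sql_predicate(policy)
keep_row = to_row_predicate(policy)
```

Keeping a single policy definition and generating each engine's form from it is one way to limit drift. It does not remove the need to test that the rendered forms agree.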
Service Enforcement
Core idea: policy enforcement should be explicit about authority. If every layer thinks another layer is responsible, nobody is.
Some policies belong in services: retrieval APIs, agent tools, semantic services, sharing services, and operational workflows. These services can enforce purpose-specific constraints that a generic engine may not understand.
That can work well, but only if service-level policy doesn't become a bypass. If a service reads broad data and returns filtered answers, the platform still needs audit records and a way to prove the service enforced the right rules.
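The sketch below shows one way a service could leave that evidence behind, assuming a hypothetical retrieval function that reads broadly, filters by purpose, and records which policy version it applied. `ServicePolicy`, `AuditLog`, and `retrieve` are illustrative names, and the audit fields are an assumption about what a reviewer might want to see.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Callable


@dataclass(frozen=True)
class ServicePolicy:
    policy_id: str
    version: int
    # allows(principal, purpose, row) -> True if the row may be returned
    allows: Callable[[str, str, dict], bool]


@dataclass
class AuditLog:
    records: list = field(default_factory=list)

    def record(self, **event):
        event["at"] = datetime.now(timezone.utc).isoformat()
        self.records.append(event)


def retrieve(principal, purpose, rows, policy, audit):
    """Filter rows under the policy and record what was enforced."""
    visible = [r for r in rows if policy.allows(principal, purpose, r)]
    audit.record(
        principal=principal,
        purpose=purpose,
        policy_id=policy.policy_id,
        policy_version=policy.version,
        rows_read=len(rows),
        rows_returned=len(visible),
    )
    return visible


# Hypothetical purpose-based rule: a row is visible only for its listed purposes.
purpose_policy = ServicePolicy(
    policy_id="purpose-filter",
    version=3,
    allows=lambda principal, purpose, row: purpose in row["purposes"],
)
rows = [
    {"id": 1, "purposes": {"support"}},
    {"id": 2, "purposes": {"marketing"}},
]
audit = AuditLog()
result = retrieve("agent-7", "support", rows, purpose_policy, audit)
```

The audit record captures both how much the service read and how much it returned, which makes the gap between broad access and filtered output visible to a reviewer.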
Tradeoffs
- Centralized policy is easier to reason about, but it can become a bottleneck if it ignores runtime context.
- Distributed enforcement is closer to the work, but it can drift without a shared policy contract.
- Storage controls are necessary, but they rarely understand business semantics on their own.
- Application controls improve usability, but they should not be the only barrier protecting data.
The right design usually uses more than one enforcement point. The discipline is naming which layer is authoritative for each decision.
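Naming the authoritative layer can be as simple as a reviewed lookup table. The sketch below is an assumption about how such a table might look: the decision names and the layer assignments are hypothetical examples, not a recommended mapping.

```python
LAYERS = {"catalog", "engine", "storage", "service", "application"}

# Hypothetical assignment: exactly one authoritative layer per decision.
AUTHORITY = {
    "table_discovery": "catalog",
    "table_operations": "catalog",
    "row_and_column_filters": "engine",
    "object_access": "storage",
    "purpose_constraints": "service",
}


def authoritative_layer(decision):
    """Return the layer that owns a decision, or fail loudly if none does."""
    try:
        return AUTHORITY[decision]
    except KeyError:
        raise LookupError(f"no layer is authoritative for {decision!r}") from None


def unowned_decisions(decisions):
    """List decisions that no layer has been named authoritative for."""
    return sorted(d for d in decisions if d not in AUTHORITY)


# Every assigned layer must be one the platform actually has.
assert set(AUTHORITY.values()) <= LAYERS
```

A check like `unowned_decisions` can run in review or CI so that a new kind of decision cannot ship without an owner. Other layers may still apply additional controls; the table only records which one has the final say.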
Sources to Start With
These primary sources are useful starting points for checking the technical claims behind this topic.