Access Control in Open Data Infrastructure
Open data systems need access control that works across catalogs, engines, services, agents, and shared data products.
Access control gets harder when data becomes more open. That sounds backwards until you remember what "open" really means. More engines can participate. More tools can query. More workflows can reuse the same data. More systems need to know what is allowed.
The Access Problem
A closed platform can hide access control inside one product. That doesn't make it simple, but it does keep the boundary obvious. Open data infrastructure (ODI) breaks that assumption on purpose.
The point of open data infrastructure is that data, metadata, and compute don't have to live inside one vendor boundary. That means access control needs a portable design. If permissions only work inside one UI, they won't survive the architecture you are trying to build.
Where Control Lives
There are several places to enforce access in an ODI stack.
- Identity layer: who the user, service, or agent is.
- Catalog layer: which namespaces, tables, views, and metadata the principal can discover or operate on.
- Engine layer: which queries can run and how row-, column-, or object-level controls are applied.
- Storage layer: which files or objects can be read or written.
- Application layer: which workflows, tools, and actions are available.
The mistake is assuming one layer can do everything. It usually can't. The design question is which layer owns which decision.
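One way to make "which layer owns which decision" concrete is to write the ownership down as data. The sketch below is illustrative only: the decision names (`discover_table`, `mask_column`, and so on) are hypothetical and not taken from any particular catalog or engine.

```python
from enum import Enum


class Layer(Enum):
    IDENTITY = "identity"
    CATALOG = "catalog"
    ENGINE = "engine"
    STORAGE = "storage"
    APPLICATION = "application"


# Hypothetical decision names. Each decision has exactly one owning layer.
DECISION_OWNERS = {
    "authenticate_principal": Layer.IDENTITY,
    "discover_table": Layer.CATALOG,
    "apply_row_filter": Layer.ENGINE,
    "mask_column": Layer.ENGINE,
    "read_object": Layer.STORAGE,
    "invoke_tool": Layer.APPLICATION,
}


def owner_of(decision: str) -> Layer:
    """Return the layer that owns a decision, or fail loudly if nobody does."""
    try:
        return DECISION_OWNERS[decision]
    except KeyError:
        raise LookupError(f"no layer owns decision {decision!r}") from None


def decisions_for(layer: Layer) -> list[str]:
    """List the decisions a given layer is responsible for."""
    return sorted(d for d, owner in DECISION_OWNERS.items() if owner is layer)
```

A decision that no layer owns raises an error instead of defaulting to "allowed", which is the property worth preserving in a real design.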
Useful Patterns
Core idea: access control in ODI should be narrow, auditable, and close to the data path. Broad app-level trust isn't a governance model.
Start with least privilege. Give users, services, and agents only the permissions needed for the job. Use catalogs to limit discovery and table operations. Use engines or policy services for runtime query enforcement. Use storage credentials carefully, especially when a catalog can vend short-lived access to underlying data.
The clean pattern is explicit delegation. A user or agent asks for data. The catalog and policy layer decide what can be discovered and accessed. The engine executes within that boundary. Logs capture the decision and the action.
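The delegation flow described above can be sketched in a few lines: the policy check comes first, the engine runs only inside an allowed boundary, and both the decision and the action are logged. This is a minimal sketch under stated assumptions; `POLICY`, `AuditLog`, and `run_query` are hypothetical names, and the engine is modeled as a plain callable.

```python
from dataclasses import dataclass, field

# Hypothetical policy: principal -> tables that principal may query.
POLICY = {"analyst-1": {"sales.orders"}}


@dataclass
class AuditLog:
    entries: list = field(default_factory=list)

    def record(self, principal: str, table: str, event: str, detail: str) -> None:
        self.entries.append(
            {"principal": principal, "table": table, "event": event, "detail": detail}
        )


def policy_allows(principal: str, table: str) -> bool:
    return table in POLICY.get(principal, set())


def run_query(principal: str, table: str, engine, log: AuditLog):
    """Decide first, then execute, and log both the decision and the action."""
    if not policy_allows(principal, table):
        log.record(principal, table, "decision", "deny")
        raise PermissionError(f"{principal} may not query {table}")
    log.record(principal, table, "decision", "allow")
    rows = engine(table)  # the engine only runs after an allow decision
    log.record(principal, table, "action", f"read {len(rows)} rows")
    return rows
```

Denied requests still produce a log entry, so the audit trail covers attempts as well as successful reads.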
Agent Risk
Agents make access control less forgiving because they can chain actions. A person might ask one question. An agent might inspect metadata, choose a tool, query a table, summarize the result, and trigger another workflow.
None of that justifies giving agents broad access. It means the access contract has to include scope, purpose, auditability, and failure states. A well-designed agent should be able to say "I can't answer that with the permissions I have" instead of silently crossing a boundary.
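To show what an explicit failure state might look like, the sketch below walks an agent's multi-step plan against a contract and stops at the first out-of-scope step with a plain refusal. The `AccessContract` and `StepResult` types and the `run_plan` function are hypothetical, and a plan is simplified to a list of table names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AccessContract:
    agent_id: str
    purpose: str               # why the agent holds this access
    allowed_tables: frozenset  # the scope the agent may touch


@dataclass(frozen=True)
class StepResult:
    ok: bool
    message: str


def run_plan(contract: AccessContract, plan: list[str]) -> list[StepResult]:
    """Execute steps in order and stop at the first one outside the contract."""
    results = []
    for table in plan:
        if table not in contract.allowed_tables:
            results.append(
                StepResult(
                    False,
                    f"I can't answer that with the permissions I have (no access to {table}).",
                )
            )
            break  # do not continue the chain past a denied step
        results.append(StepResult(True, f"queried {table}"))
    return results
```

Stopping the chain matters as much as the refusal itself: later steps that depend on the denied data never run.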
Checklist
- Can access be scoped by user, service, and agent identity?
- Can discovery be restricted before data is queried?
- Can policies be evaluated in the data path, not just at the UI?
- Can temporary credentials be limited and audited?
- Can the same policy model work across more than one engine?
If access control breaks the moment a second engine arrives, the architecture isn't open enough for production governance.
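The last checklist item can be illustrated with one policy object consumed by two different enforcement paths. In this sketch, a hypothetical `ColumnPolicy` drives both a SQL projection for a SQL engine and a row filter for an in-process engine. Both adapters are toy stand-ins, not real engine integrations.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ColumnPolicy:
    table: str
    visible_columns: tuple  # columns the principal is allowed to see


def to_sql_projection(policy: ColumnPolicy) -> str:
    """Adapter for a SQL engine: expose only the visible columns."""
    cols = ", ".join(policy.visible_columns)
    return f"SELECT {cols} FROM {policy.table}"


def filter_rows(policy: ColumnPolicy, rows: list[dict]) -> list[dict]:
    """Adapter for an in-process engine: drop every column not in the policy."""
    return [
        {col: row[col] for col in policy.visible_columns if col in row}
        for row in rows
    ]
```

The policy is defined once and each engine gets a thin adapter, so adding a second engine means writing another adapter, not another policy.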
Sources to Start With
These primary sources are useful starting points for checking the technical claims behind this topic.