Open Data Infrastructure
DataFusion Logical Plans as Policy Evidence
How DataFusion logical plans can expose projection, filters, joins, and data movement before governed query services execute SQL.
A query should not become a governance incident before policy gets to inspect what it is about to do.
Policy needs preflight evidence
The Apache DataFusion documentation describes a logical plan as a structured representation of a query that captures high-level operations such as filtering, sorting, and joining. The optimizer can rewrite that plan before execution.
That is useful for governed data APIs because policy systems need more than a raw SQL string. They need to know which columns are projected, which filters are applied, which joins move data across domains, and which outputs could expose sensitive context.
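To make that concrete, here is a minimal sketch of pulling policy inputs out of a plan tree. It does not use DataFusion's API: `PlanNode` and `collect_evidence` are hypothetical names, and the node shapes are a simplified stand-in for what a real logical plan exposes.

```python
from dataclasses import dataclass, field


@dataclass
class PlanNode:
    """Hypothetical stand-in for one logical plan node (not DataFusion's API)."""
    op: str                                      # "Projection", "Filter", "Join", "TableScan"
    detail: dict = field(default_factory=dict)   # operator-specific facts
    inputs: list = field(default_factory=list)   # child plan nodes


def collect_evidence(node, evidence=None):
    """Walk the plan depth-first and gather the facts policy cares about."""
    if evidence is None:
        evidence = {"columns": [], "filters": [], "joins": [], "tables": []}
    if node.op == "Projection":
        evidence["columns"].extend(node.detail["columns"])
    elif node.op == "Filter":
        evidence["filters"].append(node.detail["predicate"])
    elif node.op == "Join":
        evidence["joins"].append(node.detail["on"])
    elif node.op == "TableScan":
        evidence["tables"].append(node.detail["table"])
    for child in node.inputs:
        collect_evidence(child, evidence)
    return evidence


# Roughly: SELECT o.order_id, c.email
#          FROM sales.orders o JOIN crm.customers c ON o.customer_id = c.id
#          WHERE o.region = 'EU'
plan = PlanNode("Projection", {"columns": ["orders.order_id", "customers.email"]}, [
    PlanNode("Filter", {"predicate": "orders.region = 'EU'"}, [
        PlanNode("Join", {"on": "orders.customer_id = customers.id"}, [
            PlanNode("TableScan", {"table": "sales.orders"}),
            PlanNode("TableScan", {"table": "crm.customers"}),
        ]),
    ]),
])

evidence = collect_evidence(plan)
```

The resulting `evidence` dictionary is the kind of structured input a policy engine can reason about, which a raw SQL string is not.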
Logical plans expose intent
Logical plans are not a complete security model. They are preflight evidence. A governed query service can inspect a plan, attach policy decisions, rewrite allowed portions, reject risky operations, and record why the decision was made.
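One way to picture "record why the decision was made" is a decision object that carries its reasons. The sketch below is an assumption-laden illustration: the `Decision` type, the `evaluate` function, the sensitive-column set, and the required-filter rule are all hypothetical, and a real service would load them from catalog and policy systems.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Decision:
    allowed: bool
    reasons: tuple  # human-readable denial reasons, empty when allowed


# Hypothetical policy inputs; in practice these would come from a catalog.
SENSITIVE_COLUMNS = {"customers.email", "customers.ssn"}
REQUIRED_FILTER_COLUMNS = {"orders.region"}


def evaluate(evidence, caller_can_read_sensitive):
    """Turn plan evidence into an allow/deny decision with recorded reasons."""
    reasons = []
    exposed = sorted(set(evidence["columns"]) & SENSITIVE_COLUMNS)
    if exposed and not caller_can_read_sensitive:
        reasons.append("projects sensitive columns: " + ", ".join(exposed))
    missing = sorted(REQUIRED_FILTER_COLUMNS - set(evidence["filter_columns"]))
    if missing:
        reasons.append("missing required filter on: " + ", ".join(missing))
    return Decision(allowed=not reasons, reasons=tuple(reasons))


plan_evidence = {
    "columns": ["orders.order_id", "customers.email"],
    "filter_columns": ["orders.region"],
}
denied = evaluate(plan_evidence, caller_can_read_sensitive=False)
granted = evaluate(plan_evidence, caller_can_read_sensitive=True)
```

Keeping the reasons on the decision means a rejection can be explained to the caller and written to the audit trail without re-deriving it.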
This is especially useful when queries come from agents, notebooks, embedded analytics, or application APIs. The user may not understand the full data boundary. The platform still has to.
Core idea: Policy should inspect the shape of work before execution turns intent into data movement.
The ODI control point
In Open Data Infrastructure, DataFusion can sit behind a data product API as a query engine library. The control point is not the engine alone. It is the connection between plan evidence, catalog metadata, policy code, lineage, and audit records.
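As a rough illustration of connecting plan evidence to catalog metadata, the sketch below checks whether the tables in a join belong to domains that are approved to be combined. The catalog mapping, the allowed-pairs rule, and the function name are hypothetical; they stand in for whatever the catalog and policy code actually provide.

```python
# Hypothetical catalog metadata: table name -> owning domain.
CATALOG_DOMAINS = {
    "sales.orders": "sales",
    "crm.customers": "crm",
    "hr.salaries": "hr",
}

# Hypothetical policy: domain pairs that may be joined together.
ALLOWED_DOMAIN_PAIRS = {frozenset({"sales", "crm"})}


def cross_domain_violations(join_pairs):
    """Return joins whose tables sit in domains not approved to be combined."""
    violations = []
    for left, right in join_pairs:
        left_domain = CATALOG_DOMAINS.get(left)
        right_domain = CATALOG_DOMAINS.get(right)
        if left_domain is None or right_domain is None:
            # Unknown tables are treated as violations rather than waved through.
            violations.append((left, right, "table missing from catalog"))
            continue
        if left_domain == right_domain:
            continue  # joins inside one domain are not cross-domain movement
        if frozenset({left_domain, right_domain}) not in ALLOWED_DOMAIN_PAIRS:
            violations.append((left, right, f"{left_domain} x {right_domain} not allowed"))
    return violations


violations = cross_domain_violations([
    ("sales.orders", "crm.customers"),   # approved pair
    ("sales.orders", "hr.salaries"),     # not approved
])
```

Neither the plan nor the catalog can answer this question alone: the plan says which tables meet, and the catalog says what that meeting means.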
For adjacent context, read DataFusion policy-aware query services, DataFusion for data product APIs, and policy enforcement in open systems.
What breaks first
- Policy evaluates the request path but never sees the projected columns.
- The optimizer rewrites the plan after policy has evaluated it, and no evidence is kept for the plan that actually runs.
- Generated SQL is accepted because it parses, even though the join crosses a restricted domain.
- Audit logs record query text but not the policy decision attached to the logical plan.
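The last failure mode can be avoided by writing the decision and the plan evidence into the same audit entry. The following is a minimal sketch under assumed names: `plan_fingerprint`, `audit_record`, and the record fields are hypothetical, and the evidence dictionary is a simplified stand-in for real plan output.

```python
import hashlib
import json


def plan_fingerprint(evidence):
    """Stable hash of plan evidence so audit entries can be correlated."""
    canonical = json.dumps(evidence, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def audit_record(query_text, evidence, allowed, reasons):
    """Bundle query text, plan evidence, and the policy decision together."""
    return {
        "query_text": query_text,
        "plan_fingerprint": plan_fingerprint(evidence),
        "plan_evidence": evidence,
        "decision": "allow" if allowed else "deny",
        "reasons": list(reasons),
    }


record = audit_record(
    query_text="SELECT email FROM crm.customers",
    evidence={"tables": ["crm.customers"], "columns": ["customers.email"]},
    allowed=False,
    reasons=["projects sensitive columns: customers.email"],
)
```

Serializing with sorted keys keeps the fingerprint independent of dictionary ordering, so two entries describing the same plan evidence hash to the same value.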
Questions to ask
Ask where logical plans are captured, which policy inputs they expose, and whether policy decisions survive optimizer rewrites. Ask whether the final audit record includes plan shape, catalog context, and denial reasons.
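One way to test whether a decision survives an optimizer rewrite is to compare evidence from the plan that was reviewed with evidence from the plan that will run. The sketch below assumes evidence is available as simple dictionaries; `evidence_drift` and the key names are hypothetical. It flags only additions, on the assumption that a rewrite which narrows the plan does not need a new review.

```python
def evidence_drift(before, after):
    """Report facts present in the optimized plan but absent from the reviewed one."""
    drift = {}
    for key in ("tables", "columns", "joins"):
        added = sorted(set(after.get(key, [])) - set(before.get(key, [])))
        if added:
            drift[key] = added
    return drift


reviewed = {
    "tables": ["sales.orders", "crm.customers"],
    "columns": ["orders.order_id", "customers.email"],
}
# A rewrite that prunes a column stays within what policy already saw.
pruned = {
    "tables": ["sales.orders", "crm.customers"],
    "columns": ["orders.order_id"],
}
# A plan that touches a new table does not, and should be re-evaluated.
widened = {
    "tables": ["sales.orders", "crm.customers", "hr.salaries"],
    "columns": ["orders.order_id", "customers.email"],
}

no_drift = evidence_drift(reviewed, pruned)
drift = evidence_drift(reviewed, widened)
```

An empty result suggests the earlier decision can be reused; anything else is a signal to run policy again against the optimized plan.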
If the policy system cannot see the query shape before execution, it is doing governance after the interesting part has already happened.
Sources to start with
These primary sources anchor the technical claims in this guide.
- Apache DataFusion logical plans documentation
- Apache DataFusion query optimizer documentation
- Open Policy Agent documentation
- DataHub access policies documentation
The plan is where query intent becomes visible enough to govern.