DataFusion Logical Plans as Policy Evidence

A query should not become a governance incident before policy gets to inspect what it is about to do.

Policy needs preflight evidence

Apache DataFusion documentation defines a logical plan as a structured representation of a query that describes high-level operations such as filtering, sorting, and joining. Before execution, the optimizer can rewrite a logical plan into an equivalent form that is cheaper to run, and a physical planner then turns it into an executable plan.

That is useful for governed data APIs because policy systems need more than a raw SQL string. They need to know which columns are projected, which filters are applied, which joins move data across domains, and which outputs could expose sensitive context.
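To make that concrete, here is a minimal Python sketch of pulling those policy inputs out of a plan tree in a single walk. The node classes (`TableScan`, `Filter`, `Projection`, `Join`), the `collect_evidence` helper, and the table-to-domain mapping are hypothetical simplifications for illustration. They are not DataFusion's `LogicalPlan` API, which a real service would walk instead.

```python
from dataclasses import dataclass, field
from typing import Union

# Hypothetical, simplified plan nodes. They mirror the operator kinds the
# text mentions but are NOT DataFusion's LogicalPlan classes.
@dataclass
class TableScan:
    table: str

@dataclass
class Filter:
    predicate_columns: list  # columns referenced by the filter predicate
    input: "Plan"

@dataclass
class Projection:
    columns: list  # columns the node outputs
    input: "Plan"

@dataclass
class Join:
    left: "Plan"
    right: "Plan"

Plan = Union[TableScan, Filter, Projection, Join]

@dataclass
class PlanEvidence:
    projected: set = field(default_factory=set)
    filtered: set = field(default_factory=set)
    tables: set = field(default_factory=set)
    cross_domain_joins: set = field(default_factory=set)  # (left, right) domain pairs

def collect_evidence(plan: Plan, domains: dict) -> PlanEvidence:
    """Walk the plan once and gather the inputs a policy engine needs."""
    evidence = PlanEvidence()
    _walk(plan, domains, evidence)
    return evidence

def _walk(node: Plan, domains: dict, ev: PlanEvidence) -> set:
    """Record evidence for node and its children; return the tables it reads."""
    if isinstance(node, TableScan):
        ev.tables.add(node.table)
        return {node.table}
    if isinstance(node, Projection):
        ev.projected.update(node.columns)
        return _walk(node.input, domains, ev)
    if isinstance(node, Filter):
        ev.filtered.update(node.predicate_columns)
        return _walk(node.input, domains, ev)
    if isinstance(node, Join):
        left = _walk(node.left, domains, ev)
        right = _walk(node.right, domains, ev)
        # domains[...] raises KeyError for uncataloged tables, so they fail closed.
        for lt in left:
            for rt in right:
                if domains[lt] != domains[rt]:
                    ev.cross_domain_joins.add((domains[lt], domains[rt]))
        return left | right
    raise TypeError(f"unknown plan node: {type(node).__name__}")

# Example: a projection over a filtered join of two domains.
plan = Projection(
    ["customer_id", "email", "amount"],
    Filter(["region"], Join(TableScan("sales.orders"), TableScan("crm.customers"))),
)
ev = collect_evidence(plan, {"sales.orders": "sales", "crm.customers": "crm"})
print(sorted(ev.projected))    # ['amount', 'customer_id', 'email']
print(ev.cross_domain_joins)   # {('sales', 'crm')}
```

Uncataloged tables raise a `KeyError` in this sketch, so a plan touching data the catalog does not know about fails closed rather than being assigned a guessed domain.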

Logical plans expose intent

Logical plans are not a complete security model. They are preflight evidence. A governed query service can inspect a plan, attach policy decisions to it, rewrite it so only permitted portions run, reject risky operations, and record why each decision was made.
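The decision step might look like the following sketch. It assumes the evidence has already been extracted into a list of projected columns and a set of joined domain pairs. The `decide` function, the `Decision` record, and the sensitivity and restricted-join rules are hypothetical examples, not a DataFusion feature. A rewrite here simply narrows the projection, and every outcome carries its reasons so they can be recorded.

```python
from dataclasses import dataclass, field

@dataclass
class Decision:
    action: str            # "allow", "rewrite", or "deny"
    allowed_columns: list  # columns the (possibly rewritten) plan may return
    reasons: list = field(default_factory=list)

# Hypothetical policy inputs: catalog sensitivity tags and forbidden domain joins.
SENSITIVE_COLUMNS = {"email", "ssn"}
RESTRICTED_JOINS = {frozenset({"sales", "hr"})}  # direction-insensitive pairs

def decide(projected: list, cross_domain_joins: set, caller_may_see_pii: bool) -> Decision:
    """Turn plan evidence into an allow / rewrite / deny decision with reasons."""
    # Risky operations are rejected outright rather than rewritten.
    denials = [
        f"join {a} -> {b} crosses a restricted domain boundary"
        for a, b in sorted(cross_domain_joins)
        if frozenset({a, b}) in RESTRICTED_JOINS
    ]
    if denials:
        return Decision("deny", [], denials)

    # Sensitive columns are dropped from the projection when the caller lacks access.
    hidden = [c for c in projected if c in SENSITIVE_COLUMNS and not caller_may_see_pii]
    kept = [c for c in projected if c not in hidden]
    if hidden and not kept:
        return Decision("deny", [], ["every projected column is sensitive"])
    if hidden:
        return Decision("rewrite", kept, [f"removed sensitive columns: {', '.join(hidden)}"])
    return Decision("allow", kept, ["no policy rule matched"])

print(decide(["customer_id", "email", "amount"], set(), caller_may_see_pii=False))
print(decide(["employee_id"], {("sales", "hr")}, caller_may_see_pii=True).action)  # deny
```

Denying before rewriting is a deliberate ordering in this sketch: a restricted join is treated as a risky operation that no column trimming can make safe.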

This is especially useful when queries come from agents, notebooks, embedded analytics, or application APIs. The user may not understand the full data boundary. The platform still has to.

Core idea: Policy should inspect the shape of work before execution turns intent into data movement.

The ODI control point

In Open Data Infrastructure, DataFusion can sit behind a data product API as a query engine library. The control point is not the engine alone. It is the connection between plan evidence, catalog metadata, policy code, lineage, and audit records.

For adjacent context, read DataFusion policy-aware query services, DataFusion for data product APIs, and policy enforcement in open systems.

What breaks first

Policy evaluates the request path but never sees the projected columns.
Optimizer rewrites change the plan that actually executes without leaving policy evidence.
Generated SQL is accepted because it parses, even though the join crosses a restricted domain.
Audit logs record query text but not the policy decision attached to the logical plan.

Questions to ask

Ask where logical plans are captured, which policy inputs they expose, and whether policy decisions survive optimizer rewrites. Ask whether the final audit record includes plan shape, catalog context, and denial reasons.
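One way to answer the optimizer question is to compare what policy approved against what the optimized plan actually reads, then write both into the audit record. This is a minimal sketch under assumptions: the `Evidence` record, the `still_covered` and `audit_record` helpers, and the `catalog@v42` snapshot label are hypothetical, and the plan-shape strings are illustrative rather than real DataFusion output.

```python
import json
from dataclasses import dataclass

@dataclass(frozen=True)
class Evidence:
    plan_shape: str     # human-readable rendering of the plan (illustrative)
    tables: frozenset   # tables the plan reads
    columns: frozenset  # columns the plan reads or returns

def still_covered(approved: Evidence, optimized: Evidence) -> list:
    """Return reasons the optimized plan exceeds what policy approved; empty if it does not."""
    reasons = []
    extra_tables = optimized.tables - approved.tables
    extra_columns = optimized.columns - approved.columns
    if extra_tables:
        reasons.append(f"optimized plan reads unapproved tables: {sorted(extra_tables)}")
    if extra_columns:
        reasons.append(f"optimized plan reads unapproved columns: {sorted(extra_columns)}")
    return reasons

def audit_record(query_id: str, catalog_snapshot: str, approved: Evidence,
                 optimized: Evidence, decision: str, reasons: list) -> str:
    """Serialize plan shape, catalog context, and decision reasons together."""
    return json.dumps({
        "query_id": query_id,
        "catalog_snapshot": catalog_snapshot,
        "decision": decision,
        "reasons": reasons,
        "approved_plan": approved.plan_shape,
        "optimized_plan": optimized.plan_shape,
        "tables": sorted(optimized.tables),
        "columns": sorted(optimized.columns),
    }, sort_keys=True)

approved = Evidence("Projection(amount) <- Filter(region) <- Scan(sales.orders)",
                    frozenset({"sales.orders"}), frozenset({"amount", "region"}))
# A rewrite that only reorganizes the plan, such as pushing the filter into the
# scan, stays within the approved footprint.
optimized = Evidence("Scan(sales.orders, columns=[amount, region], filter=region)",
                     frozenset({"sales.orders"}), frozenset({"amount", "region"}))
reasons = still_covered(approved, optimized)
decision = "allow" if not reasons else "deny"
print(audit_record("q-123", "catalog@v42", approved, optimized, decision, reasons))
```

Treating the approved evidence as an upper bound lets reorganizing or narrowing rewrites pass, while anything that widens the data footprint is denied with a recorded reason.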

If the policy system cannot see the query shape before execution, it is doing governance after the interesting part has already happened.


The plan is where query intent becomes visible enough to govern.

