Open Data Infrastructure
DataFusion Session Boundaries for Data Product APIs
How DataFusion sessions define catalog, object store, runtime, UDF, and policy boundaries for governed query services.
A query service is not governed because it has an API in front of it.
The session is the control boundary
The DataFusion documentation describes SessionContext as the main interface for executing queries. A SessionContext holds the registered tables and views, the session configuration, and the runtime environment. That makes the session boundary a serious architecture decision for data product APIs.
If every request shares one loose session, policies and runtime assumptions blur together. If each tenant, purpose, or product boundary gets its own session context, the service can make catalog registration, object-store access, UDF exposure, and runtime settings explicit.
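One way to make that boundary concrete is to key sessions by an explicit scope rather than reusing a single shared object. The following is a minimal sketch, not a DataFusion API: SessionScope and SessionRegistry are hypothetical names, and a plain dict stands in for the session so the example runs without any dependency. In a real service the factory would presumably build a DataFusion SessionContext and register only what that scope is allowed to see.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict


@dataclass(frozen=True)
class SessionScope:
    """Hypothetical key naming the boundary a session belongs to."""
    tenant: str
    data_product: str
    purpose: str


class SessionRegistry:
    """Hands out one session object per scope instead of one shared session."""

    def __init__(self, session_factory: Callable[[SessionScope], Any]) -> None:
        # Assumption: the factory is where a real service would create its
        # query-engine session and apply scope-specific registrations.
        self._factory = session_factory
        self._sessions: Dict[SessionScope, Any] = {}

    def session_for(self, scope: SessionScope) -> Any:
        # Create the session lazily the first time a scope is seen.
        if scope not in self._sessions:
            self._sessions[scope] = self._factory(scope)
        return self._sessions[scope]


# Stand-in factory for illustration: a dict plays the role of a session.
registry = SessionRegistry(lambda scope: {"scope": scope, "tables": set()})

billing = registry.session_for(SessionScope("acme", "orders", "billing"))
marketing = registry.session_for(SessionScope("acme", "orders", "marketing"))

# A table registered for one purpose does not leak into the other.
billing["tables"].add("invoices")
```

The scope fields here mirror the tenant, product, and purpose boundaries named above. Which of them a service actually keys on is a design decision the API contract should state.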
Data product APIs need repeatable context
A data product API should be able to explain which tables were registered, which catalog supplied them, which runtime settings applied, which user-defined functions were available, and which policy context was checked before execution.
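That explanation is easier to produce if the service records it as data at the time the query runs. The sketch below assumes a hypothetical SessionManifest record; the field names and example values are illustrative and not part of DataFusion. The one real identifier is the configuration key datafusion.execution.target_partitions, used here only as a sample setting.

```python
import hashlib
import json
from dataclasses import asdict, dataclass
from typing import Dict, List


@dataclass(frozen=True)
class SessionManifest:
    """Hypothetical record of what a session exposed when a query ran."""
    catalog: str
    tables: List[str]
    runtime_settings: Dict[str, str]
    udfs: List[str]
    policy_context: Dict[str, str]

    def to_json(self) -> str:
        # Sort lists and keys so the same context always serializes the same way.
        payload = asdict(self)
        payload["tables"] = sorted(payload["tables"])
        payload["udfs"] = sorted(payload["udfs"])
        return json.dumps(payload, sort_keys=True)

    def fingerprint(self) -> str:
        # A stable hash lets logs and reviews refer to one exact context.
        return hashlib.sha256(self.to_json().encode("utf-8")).hexdigest()


manifest = SessionManifest(
    catalog="orders_catalog",
    tables=["orders", "customers"],
    runtime_settings={"datafusion.execution.target_partitions": "4"},
    udfs=["mask_email"],
    policy_context={"tenant": "acme", "purpose": "billing"},
)

# Same context, tables listed in a different order.
reordered = SessionManifest(
    catalog="orders_catalog",
    tables=["customers", "orders"],
    runtime_settings={"datafusion.execution.target_partitions": "4"},
    udfs=["mask_email"],
    policy_context={"tenant": "acme", "purpose": "billing"},
)

# Same context plus one extra UDF.
with_extra_udf = SessionManifest(
    catalog="orders_catalog",
    tables=["orders", "customers"],
    runtime_settings={"datafusion.execution.target_partitions": "4"},
    udfs=["mask_email", "raw_email"],
    policy_context={"tenant": "acme", "purpose": "billing"},
)
```

Attaching the fingerprint to each query log entry keeps the per-request cost small while still making the session context recoverable later.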
That does not require turning every API call into a giant governance ceremony. It does require refusing to let session state become invisible shared memory.
Core idea: Session boundaries decide whether governed query behavior is explicit or accidental.
The ODI pattern separates engine power from policy responsibility
DataFusion is valuable because it gives builders a composable query engine. Open Data Infrastructure (ODI) asks where the surrounding contract lives. The answer should include catalogs, policies, lineage, observability, and owner review, not only SQL execution.
For more context, see the related guides on DataFusion execution metrics, policy-aware query services, and the ODI reference architecture for agentic analytics. The same session boundary that serves humans will serve agents.
What breaks first
Shared session state is convenient until the first incident review asks why a query saw a table it should not have seen.
- A table remains registered after the request that needed it is gone.
- A UDF is available to every tenant because it was registered globally.
- Runtime settings differ between tests and production without a review record.
- Object store credentials sit below the API layer and never appear in access evidence.
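The first failure in the list, a table that outlives its request, can be ruled out structurally by tying registration to a scope that always cleans up. This is a minimal sketch under stated assumptions: ScopedTables is a hypothetical wrapper, the bucket path and request id are made up, and an in-memory dict stands in for the session's table registry. Against a real engine, the two bookkeeping steps would map to the session's own register and deregister calls.

```python
from contextlib import contextmanager
from typing import Dict, Iterator, List, Tuple


class ScopedTables:
    """Hypothetical wrapper that ties a table registration to one request."""

    def __init__(self) -> None:
        self.tables: Dict[str, str] = {}  # table name -> source location
        self.audit_log: List[Tuple[str, str, str]] = []

    @contextmanager
    def registered(self, name: str, source: str, request_id: str) -> Iterator[None]:
        if name in self.tables:
            raise ValueError(f"table {name!r} is already registered")
        self.tables[name] = source
        self.audit_log.append(("register", name, request_id))
        try:
            yield
        finally:
            # Runs even when the query raises, so the table cannot outlive
            # the request that needed it.
            del self.tables[name]
            self.audit_log.append(("deregister", name, request_id))


scoped = ScopedTables()
try:
    with scoped.registered("staging_orders", "s3://example-bucket/orders/", "req-1"):
        # Simulate a query that fails halfway through.
        raise RuntimeError("query failed")
except RuntimeError:
    pass
```

The audit log doubles as access evidence: every registration is paired with a request id and a matching deregistration, including on the failure path.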
Questions to ask before shipping
Ask whether sessions are scoped by tenant, data product, purpose, or request. Ask how table registration is audited and how policy context reaches the planner. Ask what happens when a session fails halfway through a query.
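One simple answer to how policy context reaches the planner is that it is checked just before planning, against the tables a query wants to read. The sketch below assumes hypothetical PolicyContext and check_before_planning names and assumes the list of requested tables has already been extracted from the query by some other step. It is not a DataFusion hook.

```python
from dataclasses import dataclass
from typing import FrozenSet, Iterable, List


@dataclass(frozen=True)
class PolicyContext:
    """Hypothetical policy inputs carried alongside a request."""
    tenant: str
    purpose: str
    allowed_tables: FrozenSet[str]


class PolicyDenied(Exception):
    """Raised when a request names a table outside its policy context."""


def check_before_planning(policy: PolicyContext, requested_tables: Iterable[str]) -> List[str]:
    # Deduplicate and sort so the result is stable for logging.
    requested = sorted(set(requested_tables))
    denied = [table for table in requested if table not in policy.allowed_tables]
    if denied:
        raise PolicyDenied(
            f"{policy.tenant}/{policy.purpose} may not read: {', '.join(denied)}"
        )
    return requested


policy = PolicyContext(
    tenant="acme",
    purpose="billing",
    allowed_tables=frozenset({"orders", "customers"}),
)

# Allowed request: returns the tables that may be handed to the planner.
approved = check_before_planning(policy, ["orders", "customers", "orders"])
```

Failing before planning keeps the denial out of the engine entirely, and the exception message names the tenant, purpose, and tables involved for the incident record.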
The API contract should name these boundaries. Otherwise the contract lives in code paths that only one maintainer remembers.
Sources to start with
These primary sources anchor the technical claims in this guide.
- DataFusion SQL API documentation
- DataFusion Python SessionContext basics
- DataFusion data sources documentation
- DataFusion configuration documentation
A governed query API starts with a session boundary that means something.