Open Data Infrastructure
DataFusion UDF Boundaries for Data Product APIs
How DataFusion UDFs can become reviewed API boundaries for data product services instead of hidden runtime shortcuts.
A user-defined function looks small until it becomes the place business logic, policy logic, and performance risk hide together.
UDFs are API boundaries
Data product APIs often need logic that plain SQL does not express cleanly. Teams add a user-defined function, expose an endpoint, and move on. That is convenient. It is also the moment the API contract becomes harder to audit.
The Open Data Infrastructure (ODI) lens treats a UDF as a boundary. It has inputs, outputs, owners, versions, allowed data types, policy assumptions, and failure behavior. If a data product API depends on it, the function is no longer a local implementation detail.
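One way to make that boundary concrete is to write it down as a record. The sketch below is a minimal illustration, not a DataFusion structure: the class name `UdfContract`, its field names, and the `mask_email` example are all hypothetical and only mirror the attributes listed above.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class UdfContract:
    """Hypothetical review record for one UDF; field names are illustrative."""
    name: str
    version: str
    owner: str
    input_types: Tuple[str, ...]            # allowed argument types, in order
    output_type: str
    policy_assumptions: Tuple[str, ...] = ()
    on_failure: str = "error"               # "error" fails the request, "null" returns NULL

    def accepts(self, arg_types: Tuple[str, ...]) -> bool:
        """Return True only when a call matches the declared signature exactly."""
        return tuple(arg_types) == self.input_types


# Example record for an assumed masking function.
mask_email = UdfContract(
    name="mask_email",
    version="1.2.0",
    owner="customer-data-team",
    input_types=("Utf8",),
    output_type="Utf8",
    policy_assumptions=("caller is not cleared to read raw email addresses",),
    on_failure="null",
)
```

Freezing the dataclass is a deliberate choice here: a contract that can be mutated at runtime is harder to tie to a specific review.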
DataFusion makes functions explicit
DataFusion documents four kinds of user-defined function: scalar, window, aggregate, and table. It also exposes catalogs, table providers, and logical and physical plans that can be inspected before a query runs. That combination gives API owners a place to register behavior and a place to review how the behavior enters the query path.
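A service can put its own allow-list in front of whatever registration call the engine provides. The following is a rough sketch under that assumption; `ReviewedRegistry`, `UdfKind`, and the `reviewed` flag are hypothetical names, and nothing here calls DataFusion itself.

```python
from enum import Enum
from typing import Callable, Dict, List, Tuple


class UdfKind(Enum):
    SCALAR = "scalar"
    WINDOW = "window"
    AGGREGATE = "aggregate"
    TABLE = "table"


class ReviewedRegistry:
    """Hypothetical allow-list that sits in front of engine registration."""

    def __init__(self) -> None:
        self._entries: Dict[Tuple[UdfKind, str], Callable] = {}

    def register(self, kind: UdfKind, name: str, func: Callable, reviewed: bool) -> None:
        # Refuse anything that has not been through review.
        if not reviewed:
            raise PermissionError(f"{kind.value} UDF {name!r} has no review record")
        key = (kind, name)
        if key in self._entries:
            raise ValueError(f"{kind.value} UDF {name!r} is already registered")
        self._entries[key] = func

    def inventory(self) -> List[str]:
        """List registered functions as 'kind:name' so reviewers can diff them."""
        return sorted(f"{kind.value}:{name}" for kind, name in self._entries)
```

The inventory output is meant to be checked into the review record, so a new function shows up as a visible change rather than a silent addition.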
Core idea: a UDF that affects a data product answer should have the same review posture as the API that exposes the answer.
Review the function, not just the query
Review the function signature, volatility expectations (whether the function is immutable, stable, or volatile), null behavior, error behavior, resource cost, and policy context. Log which UDF version ran for each request. If the function touches external state, treat that state as part of the data product contract.
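Version logging and null behavior can both be enforced by a thin wrapper around the function body. This is a minimal sketch, assuming the service passes a request identifier into each call; `versioned_udf`, `AUDIT_LOG`, and `net_revenue` are hypothetical names, and the in-memory list stands in for a real log sink.

```python
import functools
from typing import Any, Callable, Dict, List

# In-memory audit trail; a real service would write to its own log sink.
AUDIT_LOG: List[Dict[str, Any]] = []


def versioned_udf(name: str, version: str) -> Callable:
    """Wrap a function so every call records which UDF version answered a request."""
    def decorate(func: Callable) -> Callable:
        @functools.wraps(func)
        def wrapper(request_id: str, *args: Any) -> Any:
            AUDIT_LOG.append({"request_id": request_id, "udf": name, "version": version})
            # Explicit null behavior: any NULL argument yields NULL.
            if any(arg is None for arg in args):
                return None
            return func(*args)
        return wrapper
    return decorate


@versioned_udf(name="net_revenue", version="2.0.1")
def net_revenue(gross: float, refunds: float) -> float:
    return gross - refunds
```

Calling `net_revenue("req-1", 100.0, 25.0)` returns the result and leaves one audit entry naming version `2.0.1` for that request.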
For related guidance, read "DataFusion UDF boundaries for governed query services", "DataFusion logical plans as policy evidence", and "Agentic AI tool schemas as data contracts".
What breaks first
- A UDF implements business logic without an owner or version.
- A policy check hides inside a function that auditors cannot inspect.
- A function calls an external service without timeout and replay rules.
- The API logs query text but not the function version that changed the answer.
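The external-service failure in the list above is the easiest to sketch. The example below assumes a function that wraps its remote call in a time budget and records each response so a past request can be replayed without touching the live service. `ExternalLookup`, `fast_service`, and `slow_service` are hypothetical stand-ins for illustration only.

```python
import time
from concurrent.futures import ThreadPoolExecutor, TimeoutError as FutureTimeout
from typing import Any, Callable, Dict, Tuple


class ExternalLookup:
    """Hypothetical wrapper that adds a timeout and a replay record to a remote call."""

    def __init__(self, fetch: Callable[..., Any], timeout_s: float) -> None:
        self._fetch = fetch
        self._timeout_s = timeout_s
        self.recorded: Dict[Tuple[Any, ...], Any] = {}

    def call(self, *args: Any, replay: bool = False) -> Any:
        if replay:
            # Replay never touches the live service; a missing record is an error.
            if args not in self.recorded:
                raise KeyError(f"no recorded response for {args!r}")
            return self.recorded[args]
        pool = ThreadPoolExecutor(max_workers=1)
        try:
            future = pool.submit(self._fetch, *args)
            try:
                # Bounds how long the caller waits; the worker thread is not killed.
                value = future.result(timeout=self._timeout_s)
            except FutureTimeout:
                raise TimeoutError(f"lookup exceeded {self._timeout_s}s budget") from None
        finally:
            pool.shutdown(wait=False)
        self.recorded[args] = value
        return value


def fast_service(code: str) -> str:
    return code.upper()


def slow_service(code: str) -> str:
    time.sleep(0.3)  # simulates a dependency that misses its budget
    return code.upper()
```

A timeout here only limits the wait on the calling side. Whether a timed-out call should fail the request or return NULL is a contract decision that belongs in the review record.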
UDF governance questions
Ask whether every UDF has an owner, version, test fixture, resource budget, and policy assumption. If the function can change the meaning of an answer, it belongs in the review record.
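These questions can run as an automated check before a function is allowed into the service. A minimal sketch, assuming each UDF has a review record stored as a dictionary; the key names in `REQUIRED_FIELDS` are illustrative choices, not an established schema.

```python
from typing import Dict, List

# Hypothetical key names, one per governance question.
REQUIRED_FIELDS = ("owner", "version", "test_fixture", "resource_budget", "policy_assumption")


def missing_governance_fields(record: Dict[str, object]) -> List[str]:
    """Return the required fields that are absent or blank in a UDF review record."""
    return [name for name in REQUIRED_FIELDS if record.get(name) in (None, "")]
```

An empty result means the record answers every question. Anything else is a list of gaps to close before the function ships.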
Sources to start with
These primary sources anchor the technical claims in this guide.
- DataFusion user-defined functions documentation
- DataFusion functions documentation
- DataFusion custom table provider documentation
- OpenLineage object model documentation
A data product API is only as reviewable as the functions it trusts.