Open Data Infrastructure
DuckDB ATTACH Patterns for Portable Data Products
How DuckDB ATTACH patterns compose local files, governed fixtures, catalogs, and portable analytics without hiding data origin.
Portable analytics fails when "local" becomes another word for "nobody knows where this came from."
ATTACH makes source boundaries visible
DuckDB ATTACH lets one connection operate across multiple databases. The DuckDB documentation covers local database files as well as HTTP and S3 endpoints, with remote HTTP and S3 attachments read-only by default. That matters for portable data products because each attached database keeps its own name, so source boundaries stay visible inside the query environment.
A local-first workflow can attach a production extract, a test fixture, a reference database, and a scratch database without pretending they are the same thing. The names become part of the contract.
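The attachment list described above can be sketched as a small script that renders one ATTACH statement per database. This is a minimal sketch, not a definitive implementation: the aliases (`prod_extract`, `fixtures`, `reference`, `scratch`), the file paths, and the bucket name are all hypothetical, and the script only builds SQL text in DuckDB's `ATTACH '...' AS alias (READ_ONLY)` form without opening a connection.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Attachment:
    alias: str       # name the database gets inside the session
    path: str        # local file path or remote URL (hypothetical values below)
    read_only: bool  # True for anything the session must not modify


def render_attach(att: Attachment) -> str:
    """Render a DuckDB ATTACH statement for one database."""
    if not att.alias.isidentifier():
        raise ValueError(f"alias {att.alias!r} is not a plain identifier")
    escaped = att.path.replace("'", "''")  # escape quotes inside the SQL string literal
    options = " (READ_ONLY)" if att.read_only else ""
    return f"ATTACH '{escaped}' AS {att.alias}{options};"


# Hypothetical layout: every alias and path here is illustrative only.
ATTACHMENTS = [
    Attachment("prod_extract", "data/prod_extract.duckdb", read_only=True),
    Attachment("fixtures", "tests/fixtures.duckdb", read_only=True),
    Attachment("reference", "s3://example-bucket/reference.duckdb", read_only=True),
    Attachment("scratch", "scratch.duckdb", read_only=False),
]

statements = [render_attach(a) for a in ATTACHMENTS]

if __name__ == "__main__":
    print("\n".join(statements))
```

Once statements like these run in a DuckDB session, tables can be addressed by qualified names of the form `alias.schema.table`, so every query shows which database it reads from. Remote paths typically rely on the httpfs extension and suitable credentials, which this sketch does not configure.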
Portable does not mean uncontrolled
The problem with local analytics is not that it is local. The problem is that local copies often lose their ownership, refresh history, and policy context. A good ATTACH pattern names each data source, records how it was created, and separates read-only reference data from writable scratch work.
That pattern matters for data products because tests need repeatable inputs. A portable data product should say which attached database holds source fixtures, which holds expected outputs, and which database is safe for temporary analyst work.
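One way to record how each attached database was created is a small manifest written next to the data. The sketch below is an assumption about what such a manifest could hold, not a DuckDB feature: the field names (`created_by`, `writable`, `recorded_at`) and the `manifest_entry` helper are hypothetical, and it uses only the Python standard library.

```python
import hashlib
import json
import tempfile
from datetime import datetime, timezone
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Hash the file in chunks so large database files are not loaded at once."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def manifest_entry(alias: str, path: Path, created_by: str, writable: bool) -> dict:
    """Describe one attached database: its name, its origin, and whether it may be written."""
    return {
        "alias": alias,
        "path": str(path),
        "sha256": sha256_of(path),
        "size_bytes": path.stat().st_size,
        "created_by": created_by,  # free-text note, for example the export job name
        "writable": writable,
        "recorded_at": datetime.now(timezone.utc).isoformat(),
    }


if __name__ == "__main__":
    # Stand-in file so the sketch runs anywhere; a real run would point at .duckdb files.
    with tempfile.TemporaryDirectory() as tmp:
        fixture = Path(tmp) / "fixtures.duckdb"
        fixture.write_bytes(b"placeholder bytes, not a real DuckDB file")
        entry = manifest_entry("fixtures", fixture, "hypothetical nightly export", writable=False)
        print(json.dumps(entry, indent=2))
```

Storing the content hash alongside the alias gives tests a cheap way to confirm they are running against the same fixture bytes that were reviewed.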
Core idea: ATTACH is most valuable when it makes boundaries explicit instead of making everything feel like one anonymous workspace.
The ODI pattern connects local work to governed sources
Open Data Infrastructure does not ban local analytics. It asks whether local analytics can preserve enough context to be trusted. DuckDB is powerful because it can sit close to files and developer workflows. ODI adds the discipline around provenance, policy, and repeatability.
For related patterns, read DuckDB extension allowlisting, data product versioning, and the ODI reference architecture. Portability is useful only when meaning travels with the data.
What breaks first
Local workflows usually fail at the handoff from a clever query to a repeatable product.
- An attached file moves, but no one knows which report depended on it.
- A writable scratch database gets treated like a governed source.
- Remote data is queried without recording object paths, snapshot time, or refresh date.
- Test fixtures are updated by hand and stop matching the production contract.
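Several of these failures can be caught before a query runs by comparing the files on disk with what was recorded earlier. The following is a minimal sketch, assuming a hypothetical manifest format (a list of dicts with `alias`, `path`, `sha256`, and `writable` keys); it covers local files only and does not attempt to verify remote objects.

```python
import hashlib
from pathlib import Path


def file_sha256(path: Path) -> str:
    """Hash a whole file; reading it in one call is fine for small fixtures."""
    return hashlib.sha256(path.read_bytes()).hexdigest()


def check_attachments(manifest: list[dict]) -> list[str]:
    """Return human-readable problems; an empty list means the inputs still match."""
    problems = []
    for entry in manifest:
        alias = entry["alias"]
        path = Path(entry["path"])
        if not path.exists():
            # The attached file moved or was deleted.
            problems.append(f"{alias}: file missing at {path}")
            continue
        if entry.get("writable", False):
            # Writable scratch databases are expected to change, so skip the hash check.
            continue
        if file_sha256(path) != entry["sha256"]:
            # A read-only input was edited after the manifest was recorded.
            problems.append(f"{alias}: contents changed since the manifest was recorded")
    return problems
```

Running a check like this at the start of a session, and refusing to continue when it returns problems, turns a silent dependency on a moved or hand-edited file into an explicit failure.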
A practical naming pattern
Name attached databases by role, not by mood: source, reference, expected, scratch, and publish. Keep source and reference read-only where possible. Write outputs to a separate database so the review path can compare source inputs, transformations, and published results.
That small convention turns a DuckDB session into a reviewable data product workspace. Not fancy. Very useful.
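The naming convention can be enforced with a few lines of validation. This is a sketch under stated assumptions: the `validate_session` helper is hypothetical, and treating `expected` as read-only is a choice made here, since the convention above only asks for it on source and reference.

```python
# Role names taken from the convention above.
ROLES = ("source", "reference", "expected", "scratch", "publish")

# Assumption of this sketch: "expected" is locked down along with source and reference.
READ_ONLY_ROLES = {"source", "reference", "expected"}


def validate_session(attachments: dict[str, bool]) -> list[str]:
    """Check a mapping of alias -> read_only flag against the role convention."""
    errors = []
    for alias, read_only in attachments.items():
        if alias not in ROLES:
            errors.append(f"{alias}: not a role name, expected one of {', '.join(ROLES)}")
        elif alias in READ_ONLY_ROLES and not read_only:
            errors.append(f"{alias}: should be attached read-only")
    return errors


if __name__ == "__main__":
    session = {"source": True, "reference": True, "scratch": False, "tmp2_final": False}
    for line in validate_session(session):
        print(line)
```

A check like this fits naturally in a test suite or a pre-commit hook, so an alias such as `tmp2_final` is flagged before it ends up in a published query.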
Sources to start with
These primary sources anchor the technical claims in this guide.
- DuckDB ATTACH statement documentation
- DuckDB data sources documentation
- DuckDB importing data overview
- W3C PROV overview
Portable data products do not hide locality. They make locality inspectable.