Open Data Infrastructure
dbt Core in an Open Data Infrastructure Stack
dbt Core can be a strong transformation layer in Open Data Infrastructure (ODI), but transformation code is not the whole infrastructure contract.
dbt Core is not the data platform. It is the part of the platform where transformation discipline can become explicit.
The practical problem
Analytics engineering made transformation work more visible. That was a real step forward. SQL moved into version control, dependencies became inspectable, tests became normal, and models stopped living only in someone else's saved-query folder.
ODI raises the bar again. Transformation code has to fit into a wider infrastructure contract: open table formats, catalogs, metadata, lineage, policy, semantic meaning, and AI-ready context.
Core idea: dbt Core is strongest in ODI when it owns transformation discipline and integrates cleanly with the layers that own table state, policy, lineage, and context.
The ODI boundary
dbt Core should own models, dependencies, tests, documentation, and transformation logic. It can also enforce model contracts where the adapter and materialization support them, and it can take part in semantic-model workflows. Those are important contracts.
It should not be asked to own every infrastructure concern. dbt does not replace the table format, the catalog, the access-control model, the lineage event system, or the operational data plane. Asking it to do all of that creates a fragile stack with great SQL and unclear control.
Patterns that work
Use model contracts for public models that downstream systems depend on. The dbt docs frame contracts as upfront guarantees about the shape of a model. That maps well to ODI, because consumers, reports, and agents need predictable structures.
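As a rough illustration of what a shape guarantee buys downstream consumers, the sketch below compares a declared column contract against the columns and types a build actually produced. It is not how dbt implements enforcement: in dbt, contracts are declared in YAML and checked by dbt during the build. The `ColumnSpec` type, the `check_contract` helper, and the sample columns are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ColumnSpec:
    """One declared column: a name plus a type label (hypothetical structure)."""
    name: str
    data_type: str


def check_contract(declared: list[ColumnSpec], actual: dict[str, str]) -> list[str]:
    """Return human-readable violations; an empty list means the shape matches.

    `actual` maps column name -> data type as reported by the engine after a build.
    """
    problems: list[str] = []
    declared_names = {c.name for c in declared}
    for col in declared:
        if col.name not in actual:
            problems.append(f"missing column: {col.name}")
        elif actual[col.name].lower() != col.data_type.lower():
            problems.append(
                f"type mismatch on {col.name}: expected {col.data_type}, got {actual[col.name]}"
            )
    # This sketch also treats undeclared extra columns as a violation,
    # since consumers were promised a fixed shape.
    for name in sorted(set(actual) - declared_names):
        problems.append(f"undeclared column: {name}")
    return problems


contract = [ColumnSpec("order_id", "bigint"), ColumnSpec("amount", "numeric")]
built = {"order_id": "BIGINT", "amount": "double", "debug_flag": "boolean"}
for problem in check_contract(contract, built):
    print(problem)
```

Type comparison here is a plain case-insensitive string match; a real check would need to normalize engine-specific type aliases before comparing.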
Keep the table layer open. If dbt builds tables on an open lakehouse, use formats and catalogs that preserve table behavior across engines. The transformation layer should not require one warehouse forever.
Export lineage and metadata into the wider control plane. dbt metadata is useful, but agents and governance systems need a shared view across ingestion, transformation, query, and application usage.
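dbt already records its dependency graph in the `manifest.json` artifact it writes to the `target/` directory. The sketch below reads a trimmed, hand-written stand-in for that file and flattens model dependencies into edge records that a catalog or lineage service could ingest. The `shop` project, the model names, and the `upstream`/`downstream`/`producer` record shape are hypothetical; a real integration might instead emit a standard lineage event format.

```python
import json

# A heavily trimmed stand-in for dbt's target/manifest.json. Real manifests carry
# many more fields; only "nodes", "resource_type", and "depends_on" -> "nodes" are used.
MANIFEST_JSON = """
{
  "nodes": {
    "model.shop.stg_orders": {
      "resource_type": "model",
      "depends_on": {"nodes": ["source.shop.raw.orders"]}
    },
    "model.shop.fct_orders": {
      "resource_type": "model",
      "depends_on": {"nodes": ["model.shop.stg_orders"]}
    },
    "test.shop.not_null_fct_orders_order_id": {
      "resource_type": "test",
      "depends_on": {"nodes": ["model.shop.fct_orders"]}
    }
  }
}
"""


def lineage_edges(manifest: dict) -> list[dict]:
    """Flatten model dependencies into upstream -> downstream edge records."""
    edges = []
    for node_id, node in manifest.get("nodes", {}).items():
        # Tests also declare dependencies, but they are quality checks, not data flow.
        if node.get("resource_type") != "model":
            continue
        for parent_id in node.get("depends_on", {}).get("nodes", []):
            edges.append({"upstream": parent_id, "downstream": node_id, "producer": "dbt"})
    return edges


manifest = json.loads(MANIFEST_JSON)
for edge in lineage_edges(manifest):
    print(edge["upstream"], "->", edge["downstream"])
```

Tagging each edge with a `producer` is one way to let the control plane merge dbt's view with ingestion, query, and application lineage from other systems.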
Failure modes
The first failure is dbt-as-governance. Tests and documentation help, but they do not enforce access policy or explain every downstream use of data.
The second failure is warehouse lock-in hidden inside SQL. A dbt project can be beautifully organized and still depend on proprietary functions, permissions, and storage behavior.
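One cheap way to make hidden lock-in visible is to scan compiled model SQL for functions and clauses the team has agreed to treat as engine-specific. The sketch below is a naive, regex-based lint. The watch list, the `flag_engine_specific` helper, and the sample model are hypothetical, and a production check would work from a real SQL parser rather than token matching. It only covers SQL text, not permissions or storage behavior.

```python
import re

# Hypothetical watch list: constructs this team treats as engine-specific.
# Build your own from the engines you actually run; this one is illustrative only.
ENGINE_SPECIFIC = {
    "iff": "conditional shorthand; prefer CASE WHEN",
    "nvl": "prefer COALESCE",
    "datediff": "argument order and units differ across engines",
    "qualify": "not supported by every engine",
}

TOKEN = re.compile(r"\b([a-z_][a-z0-9_]*)\b", re.IGNORECASE)


def flag_engine_specific(sql: str) -> list[tuple[int, str, str]]:
    """Return (line_number, token, note) for each watched token found in the SQL."""
    findings = []
    for lineno, line in enumerate(sql.splitlines(), start=1):
        # Naively drop single-line comments; "--" inside string literals is not handled.
        code = line.split("--", 1)[0]
        for match in TOKEN.finditer(code):
            word = match.group(1).lower()
            if word in ENGINE_SPECIFIC:
                findings.append((lineno, word, ENGINE_SPECIFIC[word]))
    return findings


model_sql = """select
    order_id,
    nvl(discount, 0) as discount,
    datediff('day', ordered_at, shipped_at) as days_to_ship  -- iff would go here
from orders
qualify row_number() over (partition by order_id order by ordered_at) = 1
"""
for lineno, token, note in flag_engine_specific(model_sql):
    print(f"line {lineno}: {token} ({note})")
```

Running a check like this in CI turns "which warehouse-specific assumptions do we have?" into a list that can be reviewed before a migration rather than discovered during one.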
The third failure is semantic drift. Metric definitions live in one layer, model contracts in another, dashboards somewhere else, and agents infer meaning from names. That is how "revenue" becomes four incompatible concepts with one confident chatbot on top.
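Drift is easier to catch when definitions are collected in one place and compared. The sketch below assumes a hypothetical inventory of (layer, metric, expression) rows and reports any metric with more than one distinct definition. Comparing normalized SQL text is a crude proxy: it flags expressions that are written differently but mean the same thing, and it misses identical expressions that run against different tables. Treat a hit as a prompt for review, not proof of disagreement.

```python
from collections import defaultdict

# Hypothetical inventory of metric definitions. In practice these rows would be
# harvested from a semantic layer, model docs, BI tools, and agent prompts.
DEFINITIONS = [
    ("semantic_layer", "revenue", "sum(amount) where status = 'completed'"),
    ("dbt_model_docs", "revenue", "SUM(amount)  where status = 'completed'"),
    ("dashboard", "revenue", "sum(amount)"),
    ("agent_prompt", "revenue", "sum(amount - refunds)"),
    ("semantic_layer", "active_customers", "count(distinct customer_id)"),
]


def normalize(expr: str) -> str:
    """Collapse whitespace and case so trivially different spellings compare equal."""
    return " ".join(expr.lower().split())


def find_drift(definitions):
    """Map each metric with more than one distinct definition to {definition: [layers]}."""
    by_metric = defaultdict(lambda: defaultdict(list))
    for layer, metric, expr in definitions:
        by_metric[metric][normalize(expr)].append(layer)
    return {
        metric: dict(variants)
        for metric, variants in by_metric.items()
        if len(variants) > 1
    }


for metric, variants in find_drift(DEFINITIONS).items():
    print(f"{metric}: {len(variants)} competing definitions")
    for expr, layers in variants.items():
        print(f"  {expr!r} used by {', '.join(layers)}")
```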
Questions to ask
- Which dbt models are public contracts, and which are internal implementation details?
- Can model outputs move to another engine without losing table semantics?
- Where do access policy, lineage, and quality signals live outside dbt?
- How are semantic models connected to catalogs and agent context?
- Which warehouse-specific assumptions are tested before migration?
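The first question above can be partly answered from dbt's own metadata. dbt models can declare an `access` level (`private`, `protected`, or `public`), and public models are natural candidates for enforced contracts. The sketch below lists public models that lack an enforced contract. The node data, the `shop` project, and the `audit_public_models` helper are hypothetical, and the field paths assume a recent manifest schema; verify them against the manifest your dbt version writes.

```python
# Trimmed, hand-written stand-in for model nodes in dbt's manifest.json.
# Field paths ("access", "config" -> "contract" -> "enforced") are assumptions
# to check against your own manifest.
NODES = {
    "model.shop.fct_orders": {
        "resource_type": "model",
        "access": "public",
        "config": {"contract": {"enforced": True}},
    },
    "model.shop.dim_customers": {
        "resource_type": "model",
        "access": "public",
        "config": {"contract": {"enforced": False}},
    },
    "model.shop.int_order_items": {
        "resource_type": "model",
        "access": "protected",
        "config": {"contract": {"enforced": False}},
    },
}


def audit_public_models(nodes: dict) -> dict[str, list[str]]:
    """Split public models into those with and without an enforced contract."""
    report = {"public_with_contract": [], "public_without_contract": []}
    for node_id, node in sorted(nodes.items()):
        # Non-public models are treated here as internal implementation details.
        if node.get("resource_type") != "model" or node.get("access") != "public":
            continue
        enforced = node.get("config", {}).get("contract", {}).get("enforced", False)
        key = "public_with_contract" if enforced else "public_without_contract"
        report[key].append(node_id)
    return report


print(audit_public_models(NODES))
```

A non-empty `public_without_contract` list is a concrete backlog: each entry is a model other systems may depend on whose shape nothing currently guarantees.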
For related transformation material, read dbt Core vs SQLMesh in the Open Lakehouse, SQLMesh State and Data Contracts, and Migrating the Semantic Layer.
Sources to start with
Start with the primary docs. They are the contracts you can test against, not commentary about the contracts.