Open Data Infrastructure
SQLGlot Lineage for Portable Data Models
Where SQLGlot lineage helps migration evidence and where catalog-governed lineage still has to take over.
SQL migration fails when the syntax translation looks complete but the meaning has quietly moved.
The practical problem
SQLGlot is useful in Open Data Infrastructure (ODI) because portable SQL is part of what makes a data product portable. It can parse, transform, and inspect SQL across dialects, and its lineage API can help teams trace column-level dependencies.
Parser lineage is migration evidence. It is not a complete governance model. A parser sees only the query text. On its own, it does not know the runtime policy, catalog ownership, freshness, quality state, or business contract behind each table.
Parser lineage is still valuable
Parser-derived lineage helps answer migration questions quickly. Which columns does this model read? Which expressions produce this metric? Which dialect features will break when the query moves? Which derived fields should be compared before cutover?
Those answers are practical. They turn a migration from vibes into a testable inventory. They also help build compatibility harnesses for open lakehouse moves, especially when teams are translating many warehouse-specific models.
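A compatibility harness can be as small as running the original query and the translated query against the same fixture, then comparing the result rows as a multiset. This sketch uses the standard library's `sqlite3` as a stand-in for both engines so it stays self-contained. In a real migration the two queries would run on the source warehouse and the target engine. The fixture, table, and query strings are hypothetical.

```python
import sqlite3
from collections import Counter


def load_fixture(conn: sqlite3.Connection) -> None:
    # Small golden dataset; the NULL amount is deliberate.
    conn.execute("CREATE TABLE orders (region TEXT, amount INTEGER)")
    conn.executemany(
        "INSERT INTO orders VALUES (?, ?)",
        [("eu", 10), ("eu", 5), ("us", 7), ("us", None)],
    )


def run(conn: sqlite3.Connection, sql: str) -> Counter:
    # Compare as a multiset: row order is ignored, duplicate rows still count.
    return Counter(conn.execute(sql).fetchall())


def compare(original_sql: str, translated_sql: str) -> dict:
    conn = sqlite3.connect(":memory:")
    try:
        load_fixture(conn)
        expected = run(conn, original_sql)
        actual = run(conn, translated_sql)
    finally:
        conn.close()
    # Counter subtraction keeps only rows with a positive surplus on one side.
    return {"missing": expected - actual, "unexpected": actual - expected}


original = "SELECT region, COUNT(amount) FROM orders GROUP BY region"
translated_ok = "SELECT region, COUNT(amount) AS n FROM orders GROUP BY 1"
translated_bad = "SELECT region, COUNT(*) FROM orders GROUP BY region"

print(compare(original, translated_ok))   # no differences expected
print(compare(original, translated_bad))  # the NULL row is now counted for "us"
```

The harness compares rows as a multiset rather than a set on purpose. Under set comparison, a translation that duplicated rows would still pass.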
Core idea: SQLGlot lineage is excellent evidence for query intent. Catalog-governed lineage is still needed for platform truth.
The catalog has to carry the rest of the truth
A portable data model needs more than parsed SQL. It needs table identity, owner, schema version, policy, lineage from upstream jobs, and consumer commitments. That is the catalog and metadata layer.
The right pattern is not parser versus catalog. Use SQLGlot to inspect and test query logic. Use the catalog to connect that logic to governed data products. That is how SQLGlot compatibility testing becomes part of an ODI migration path.
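Here is a minimal sketch of that split. It assumes SQLGlot's lineage API has already produced `(table, column)` references. A catalog lookup attaches owner, schema version, and policy to each reference. Any reference the catalog cannot resolve is reported rather than silently accepted. `CatalogEntry`, `annotate`, and the sample metadata are hypothetical, and a real deployment would query its catalog service rather than a dict.

```python
from dataclasses import dataclass, field


@dataclass
class CatalogEntry:
    # Hypothetical governed metadata the catalog holds for one table.
    owner: str
    schema_version: str
    column_policies: dict = field(default_factory=dict)  # column name -> policy tag


def annotate(parsed_refs, catalog):
    """Join parser-derived (table, column) references to catalog metadata."""
    governed, unresolved = [], []
    for table, column in parsed_refs:
        entry = catalog.get(table)
        if entry is None:
            # The SQL mentions it, but no governed data product backs it.
            unresolved.append((table, column))
            continue
        governed.append(
            {
                "table": table,
                "column": column,
                "owner": entry.owner,
                "schema_version": entry.schema_version,
                "policy": entry.column_policies.get(column),
            }
        )
    return governed, unresolved


catalog = {
    "raw.orders": CatalogEntry("sales-data", "v3", {"customer_email": "pii"}),
}
parsed_refs = [
    ("raw.orders", "amount"),
    ("raw.orders", "customer_email"),
    ("tmp.fx_rates", "rate"),
]

governed, unresolved = annotate(parsed_refs, catalog)
for row in governed:
    print(row)
print("unresolved:", unresolved)
```

The `pii` tag on `customer_email` is information the parser could never produce. The unresolved `tmp.fx_rates` reference is a gap the migration has to close before cutover.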
What breaks first
- The migrated SQL compiles, but the metric changed because table grain was misunderstood.
- Parser lineage finds column references but cannot see policy restrictions attached to the source table.
- Dialect translation preserves syntax while changing null, time zone, or type behavior.
- Migration evidence lives in a spreadsheet instead of the data product contract.
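The first failure is easy to reproduce. This sketch uses the standard library's `sqlite3` and hypothetical `orders` and `order_items` tables. Both queries are valid SQL. However, joining order-grain amounts to item-grain rows repeats each order's amount once per item, which inflates revenue.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE orders (order_id INTEGER PRIMARY KEY, amount INTEGER);
    CREATE TABLE order_items (order_id INTEGER, sku TEXT);
    INSERT INTO orders VALUES (1, 100), (2, 50);
    INSERT INTO order_items VALUES (1, 'a'), (1, 'b'), (2, 'c');
    """
)

# Correct: revenue summed at order grain.
(order_grain,) = conn.execute("SELECT SUM(amount) FROM orders").fetchone()

# Compiles and runs, but order 1 has two items, so its amount is counted twice.
(fanned_out,) = conn.execute(
    """
    SELECT SUM(o.amount)
    FROM orders AS o
    JOIN order_items AS i ON i.order_id = o.order_id
    """
).fetchone()

conn.close()
print(order_grain, fanned_out)  # 150 250
```

A golden-result test on the metric catches this. A parser that only lists referenced columns does not, because both queries read the same `amount` column.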
Questions to ask
- Which lineage comes from parsed SQL, and which comes from runtime metadata?
- Which translated queries have golden-result tests?
- Which semantic assumptions are outside the SQL text?
- Can the catalog identify the same data product after migration?
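The first question becomes mechanical if lineage edges stay tagged with where they came from. Here is a minimal sketch, assuming each source yields `(upstream, downstream)` pairs:

- Edges that both sources report are confirmed.
- Parser-only edges still need runtime confirmation.
- Runtime-only edges are dependencies the SQL text never showed.

The edge names and the `classify_edges` helper are hypothetical.

```python
def classify_edges(parsed_edges, runtime_edges):
    """Split lineage edges by which source reported them."""
    parsed, runtime = set(parsed_edges), set(runtime_edges)
    return {
        "confirmed": parsed & runtime,     # seen in the SQL and at runtime
        "parser_only": parsed - runtime,   # inferred from SQL text alone
        "runtime_only": runtime - parsed,  # e.g. upstream jobs the SQL cannot show
    }


parsed_edges = [
    ("raw.orders.amount", "mart.revenue.net_revenue"),
    ("raw.orders.discount", "mart.revenue.net_revenue"),
]
runtime_edges = [
    ("raw.orders.amount", "mart.revenue.net_revenue"),
    ("ingest.orders_job", "raw.orders"),
]

for kind, edges in classify_edges(parsed_edges, runtime_edges).items():
    print(kind, sorted(edges))
```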
For related migration work, read SQLGlot for Open Data Infrastructure Migration, SQLGlot and Portable SQL, and Data Product Versioning in ODI.
Portable SQL matters, but portable meaning is the real migration.