Open Data Infrastructure
The architecture, standards, and ecosystem that let organizations access, move, govern, query, and build on data without being trapped inside one vendor platform.
Open Data Infrastructure
Plain-English definitions for the terms behind open data infrastructure, lakehouse architecture, data ownership, and AI-ready data systems.
The architecture, standards, and ecosystem that let organizations access, move, govern, query, and build on data without being trapped inside one vendor platform.
A storage-layer table standard that preserves schemas, partitions, snapshots, and transaction behavior across compatible engines. Apache Iceberg is a common example.
A metadata and coordination service that helps tools discover tables, manage permissions, track schemas, and coordinate table operations.
An open table format for large analytic datasets. It helps engines safely share tables through metadata, snapshots, schema evolution, partition evolution, and transactions.
An open catalog implementation for Iceberg-based architectures. The catalog layer is important because it is where table metadata, permissions, and interoperability meet.
The record of where data came from, how it changed, and which downstream data products, reports, models, or applications depend on it.
Data about data: schemas, owners, freshness, quality, usage, policies, lineage, descriptions, and operational state.
Data that is governed, documented, accessible, fresh enough, and reliable enough for AI applications and agents to use safely.
The ability for systems to work together through shared standards and contracts instead of brittle one-off integrations.
Arrow Database Connectivity, a database access standard built around Apache Arrow for efficient columnar data movement.
A governed, documented, maintained data asset designed for reuse by consumers, applications, analytics workflows, or AI systems.
A state where the cost of moving data, logic, metadata, or workloads away from a vendor becomes high enough to reduce customer control.