Open Data Infrastructure

The architecture, standards, and ecosystem that let organizations access, move, govern, query, and build on data without being trapped inside a single vendor's platform.

Open Table Format

A storage-layer table specification that defines how schemas, partitions, snapshots, and transactions are recorded, so that every compatible engine reads and writes the same table consistently. Apache Iceberg is a common example.
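
As a rough illustration of the snapshot idea, the sketch below models a table as an append-only list of immutable snapshots. The names ToyTable, Snapshot, commit, and as_of are hypothetical and are not part of Iceberg or any other real format; actual formats persist this state as metadata files next to the data.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Snapshot:
    snapshot_id: int
    schema: tuple[str, ...]      # column names at this point in time
    data_files: tuple[str, ...]  # immutable files that make up the table


@dataclass
class ToyTable:
    """Hypothetical in-memory stand-in for a table's snapshot history."""
    snapshots: list[Snapshot] = field(default_factory=list)

    def commit(self, schema, data_files) -> Snapshot:
        # Every change produces a new snapshot; earlier ones are never edited.
        snap = Snapshot(len(self.snapshots) + 1, tuple(schema), tuple(data_files))
        self.snapshots.append(snap)
        return snap

    def current(self) -> Snapshot:
        return self.snapshots[-1]

    def as_of(self, snapshot_id: int) -> Snapshot:
        # "Time travel": read the table as it was at an earlier snapshot.
        return next(s for s in self.snapshots if s.snapshot_id == snapshot_id)


table = ToyTable()
table.commit(["id", "amount"], ["f1.parquet"])
table.commit(["id", "amount", "currency"], ["f1.parquet", "f2.parquet"])
```

Because snapshots are immutable, a reader pinned to snapshot 1 still sees the two-column schema and a single data file after the second commit has added a column and a file.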

Catalog

A metadata and coordination service that helps tools discover tables, manage permissions, track schemas, and coordinate table operations.
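
One way to picture "coordinate table operations" is a compare-and-swap on each table's current metadata pointer: a writer's commit succeeds only if the pointer has not moved since that writer last read it. The sketch below is a minimal in-memory version under that assumption. ToyCatalog, CommitConflict, and the metadata paths are hypothetical names, not the API of any real catalog.

```python
import threading


class CommitConflict(Exception):
    """Raised when another writer has already moved the table pointer."""


class ToyCatalog:
    """Hypothetical catalog: maps table names to a current metadata location."""

    def __init__(self):
        self._lock = threading.Lock()
        self._pointers = {}  # table name -> current metadata location

    def register(self, name, metadata_location):
        with self._lock:
            if name in self._pointers:
                raise ValueError(f"table already exists: {name}")
            self._pointers[name] = metadata_location

    def load(self, name):
        with self._lock:
            return self._pointers[name]

    def commit(self, name, expected, new):
        # Atomic compare-and-swap: only advance if nobody else committed first.
        with self._lock:
            if self._pointers[name] != expected:
                raise CommitConflict(name)
            self._pointers[name] = new


catalog = ToyCatalog()
catalog.register("sales.orders", "metadata/v1.json")
base = catalog.load("sales.orders")

# Writer A commits against the version it read, so the swap succeeds.
catalog.commit("sales.orders", expected=base, new="metadata/v2.json")

# Writer B still holds the stale base version, so its commit is rejected.
try:
    catalog.commit("sales.orders", expected=base, new="metadata/v2b.json")
    conflict = False
except CommitConflict:
    conflict = True
```

The rejected writer would normally reload the table, reapply its change on top of the new version, and retry.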

Apache Iceberg

An open table format for large analytic datasets. It helps engines safely share tables through metadata, snapshots, schema evolution, partition evolution, and transactions.

Apache Polaris

An open source catalog for Iceberg-based architectures that implements the Iceberg REST catalog API. The catalog layer matters because it is where table metadata, permissions, and interoperability meet.

Lineage

The record of where data came from, how it changed, and which downstream data products, reports, models, or applications depend on it.
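
The "which downstream assets depend on it" part of lineage can be sketched as a graph traversal. The example below assumes lineage is stored as a simple mapping from each asset to the assets it feeds. The asset names and the downstream helper are hypothetical and used only for illustration.

```python
from collections import deque

# Hypothetical lineage edges: each key feeds the assets listed in its value.
LINEAGE = {
    "raw.orders": ["staging.orders"],
    "staging.orders": ["marts.revenue", "marts.customer_ltv"],
    "marts.revenue": ["dashboard.finance"],
    "marts.customer_ltv": ["model.churn"],
}


def downstream(asset, edges):
    """Return every asset that directly or indirectly depends on `asset`."""
    seen = set()
    queue = deque([asset])
    while queue:
        # Breadth-first walk over the assets fed by the one just dequeued.
        for child in edges.get(queue.popleft(), []):
            if child not in seen:
                seen.add(child)
                queue.append(child)
    return seen


impacted = downstream("staging.orders", LINEAGE)
```

A traversal like this supports impact analysis: before changing staging.orders, a team can list every report, model, or data product that the change could affect.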

Metadata

Data about data: schemas, owners, freshness, quality, usage, policies, lineage, descriptions, and operational state.

AI-Ready Data

Data that is governed, documented, accessible, fresh enough, and reliable enough for AI applications and agents to use safely.

Interoperability

The ability for systems to work together through shared standards and contracts instead of brittle one-off integrations.

ADBC

Arrow Database Connectivity, an API standard for database access that exchanges query results as Apache Arrow columnar data, which avoids row-by-row conversion when moving data between systems.

Data Product

A governed, documented, maintained data asset designed for reuse by consumers, applications, analytics workflows, or AI systems.

Vendor Lock-In

A state where the cost of moving data, logic, metadata, or workloads away from a vendor is high enough that switching is no longer a practical option, which reduces the customer's control.