Open Data Infrastructure
What Is Compute-Storage Separation?
Compute-storage separation is why modern data stacks can scale, but it is also why ownership, portability, and governance have to be designed on purpose.
If your storage and compute are welded together, your platform vendor is not only selling performance. They are selling control.
The definition
Compute-storage separation means a system keeps data in a shared storage layer and runs compute as a separate layer that can be scaled up or down independently of it.
In practice, this usually means object storage or a shared data layer holds the data, and one or more compute engines read and write it. The engines may be ephemeral. The storage is meant to be durable.
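To make the definition concrete, here is a minimal sketch in Python. It is a toy model, not a real storage client: `ObjectStore`, `ComputeEngine`, and the `sales/part-0` key are hypothetical names invented for illustration. The point it shows is that an engine holds no data of its own, so it can disappear while the data stays put.

```python
from dataclasses import dataclass, field


@dataclass
class ObjectStore:
    """Toy stand-in for durable shared storage (assumption: in-memory dict)."""
    objects: dict = field(default_factory=dict)

    def put(self, key: str, rows: list) -> None:
        self.objects[key] = list(rows)

    def get(self, key: str) -> list:
        return list(self.objects[key])


class ComputeEngine:
    """Ephemeral compute: keeps a reference to storage, never a copy of the data."""

    def __init__(self, name: str, store: ObjectStore) -> None:
        self.name = name
        self.store = store

    def write(self, key: str, rows: list) -> None:
        self.store.put(key, rows)

    def total(self, key: str, column: str) -> int:
        return sum(row[column] for row in self.store.get(key))


store = ObjectStore()

# One engine writes, then goes away.
writer = ComputeEngine("etl", store)
writer.write("sales/part-0", [{"amount": 10}, {"amount": 32}])
del writer

# A different engine, started later, reads the same data.
reader = ComputeEngine("bi", store)
result = reader.total("sales/part-0", "amount")
```

In a real deployment the store would be an object storage service and the engines would be separate clusters, but the shape of the dependency is the same: compute points at storage, not the other way around.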
Why it matters
Separation matters because it changes the economics and the architecture:
- You can scale compute without duplicating data.
- You can run multiple compute engines on the same storage substrate.
- You can isolate workloads and cost domains more cleanly.
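The economics in the list above can be sketched with a little arithmetic. Everything here is an assumption made for illustration: the `ComputePool` type, the prices, and the sizes are hypothetical, not real vendor rates. The sketch shows that scaling one pool changes only that pool's line on the bill, while storage and the other pool are untouched.

```python
from dataclasses import dataclass


@dataclass
class ComputePool:
    """Hypothetical pool of workers billed to one team or workload."""
    name: str
    workers: int
    rate_per_worker_hour: float  # assumed price, for illustration only

    def cost(self, hours: float) -> float:
        return self.workers * self.rate_per_worker_hour * hours


def monthly_bill(stored_tb: float, storage_rate: float, pools: list, hours: float) -> dict:
    """Storage is billed once; each pool is its own cost domain."""
    bill = {"storage": stored_tb * storage_rate}
    for pool in pools:
        bill[pool.name] = pool.cost(hours)
    return bill


etl = ComputePool("etl", workers=8, rate_per_worker_hour=0.5)
bi = ComputePool("bi", workers=2, rate_per_worker_hour=0.5)

before = monthly_bill(stored_tb=50, storage_rate=20.0, pools=[etl, bi], hours=100)

# Scale only the ETL pool; no data is copied and no other pool is resized.
etl.workers = 16
after = monthly_bill(stored_tb=50, storage_rate=20.0, pools=[etl, bi], hours=100)
```

Comparing `before` and `after` shows the `etl` entry doubling while `storage` and `bi` stay the same, which is the isolation the list describes.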
But separation also creates a new problem. If storage is shared, the system needs a contract for what the tables mean, how they change, and how policy is enforced.
Core idea: decoupling compute and storage makes portability possible, but it does not make it automatic.
The ODI angle
Open Data Infrastructure (ODI) exists because compute-storage separation alone is not enough. Once storage is shared, power moves to whoever controls the metadata and the access path.
Open table formats (such as Apache Iceberg) and open catalog interfaces are the contract layer that keeps compute-storage separation from becoming a new kind of lock-in.
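One way to picture a contract layer is as a catalog that records, for each table, an immutable snapshot (schema plus data files) and only accepts a new snapshot if the writer saw the latest one. The sketch below is a deliberately simplified model of that idea. It is not the Iceberg specification or any real catalog API; `Snapshot`, `Catalog`, `append`, and the file paths are hypothetical names used for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Snapshot:
    """Immutable description of a table at one point in time."""
    snapshot_id: int
    schema: tuple
    data_files: tuple


class CommitConflict(Exception):
    """Raised when a writer tries to commit against a stale snapshot."""


class Catalog:
    """Hypothetical catalog: maps a table name to its current snapshot."""

    def __init__(self) -> None:
        self._current = {}

    def load(self, table: str) -> Optional[Snapshot]:
        return self._current.get(table)

    def commit(self, table: str, expected: Optional[Snapshot], new: Snapshot) -> None:
        # Compare-and-swap: advance only if nobody else committed first.
        if self._current.get(table) != expected:
            raise CommitConflict(table)
        self._current[table] = new


def append(catalog: Catalog, table: str, schema: tuple, files: list) -> Snapshot:
    """Append files to a table, enforcing the schema part of the contract."""
    base = catalog.load(table)
    if base is not None and base.schema != schema:
        raise ValueError("schema does not match the table contract")
    next_id = 1 if base is None else base.snapshot_id + 1
    old_files = () if base is None else base.data_files
    new = Snapshot(next_id, schema, old_files + tuple(files))
    catalog.commit(table, base, new)
    return new


catalog = Catalog()
schema = ("id", "amount")
append(catalog, "sales", schema, ["sales/a.parquet"])

stale = catalog.load("sales")  # engine A reads the table state
append(catalog, "sales", schema, ["sales/b.parquet"])  # engine B commits first

conflict_seen = False
try:
    # Engine A now tries to commit against the state it read earlier.
    late = Snapshot(99, schema, stale.data_files + ("sales/c.parquet",))
    catalog.commit("sales", stale, late)
except CommitConflict:
    conflict_seen = True
```

Because the rules live in the catalog and the snapshot format rather than in any one engine, every engine that follows them sees the same table. That is also why control over this layer matters as much as control over the storage itself.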
Tradeoffs and failure modes
The tradeoffs are predictable:
- Metadata becomes critical infrastructure: if the catalog and lineage tracking are weak, the shared storage layer becomes a swamp.
- Governance becomes harder: multiple engines and tools increase the number of access paths to secure and audit.
- Portability can still fail: vendors can keep you locked in through catalog features, proprietary formats, or policy layers, even when storage looks "open."
If you treat compute-storage separation as the portability strategy, you will eventually learn that the contract layer was the real strategy.
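The governance tradeoff above is easier to reason about if every engine calls the same policy check instead of carrying its own. The sketch below assumes a hypothetical `PolicyGate` with a flat set of grants. Real policy layers are richer, and nothing here reflects a specific product's API. It shows two access paths producing the same decision and one shared audit trail.

```python
from dataclasses import dataclass, field


@dataclass
class PolicyGate:
    """Hypothetical single enforcement point that every engine must call."""
    grants: set  # entries are (principal, table, action) tuples
    audit_log: list = field(default_factory=list)

    def authorize(self, engine: str, principal: str, table: str, action: str) -> bool:
        allowed = (principal, table, action) in self.grants
        # Record every decision, allowed or not, with the access path used.
        self.audit_log.append((engine, principal, table, action, allowed))
        return allowed


gate = PolicyGate(grants={("analyst", "sales", "read")})

# The same principal reaches the same table through two different engines.
via_sql = gate.authorize("sql-engine", "analyst", "sales", "read")
via_notebook = gate.authorize("notebook", "analyst", "sales", "read")

# A write is denied no matter which engine asks.
denied = gate.authorize("notebook", "analyst", "sales", "write")

paths_audited = {entry[0] for entry in gate.audit_log}
```

If each engine enforced its own rules instead, the decisions could diverge and the audit trail would be split across tools, which is the failure mode the list describes.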
Sources to start with
Start with an open table format specification (for example, the Apache Iceberg table spec) and one concrete vendor architecture description, then map the ideas back to ODI contracts.