Object storage pricing is the calm surface. The real cost profile comes from how your table format, engine, and retention rules turn “files in a bucket” into “a database people can trust.”

The storage cost myth

The open lakehouse story is usually told as “cheap storage, flexible compute.” That is directionally right, but incomplete. Storage cost is not only dollars per GB per month. It includes the work you do to keep data queryable, consistent, and governed over time.

If you model storage as only capacity, you will underestimate the bill and overestimate portability. Both mistakes show up later as emergency cleanup programs.

What actually drives cost

For an open lakehouse, storage economics are shaped by a handful of drivers:

  • Object operations: LIST and GET request patterns, small-file explosions, and metadata scans.
  • Table maintenance: compaction, file rewrite, clustering, and statistics collection.
  • Retention behavior: snapshots, delete behavior, and the cost of safety windows.
  • Partition and layout strategy: too many partitions drive up metadata and file counts; too few drive up scan cost.
  • Cross-region and cross-service movement: replication and data transfer costs can dominate capacity.
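One way to make these drivers concrete is to put them in the same ledger as capacity. The sketch below is a rough model, not a pricing calculator: the class names, field names, and every unit price are hypothetical placeholders, and you would replace them with your provider's published rates and your own measured request counts.

```python
from dataclasses import dataclass


@dataclass
class MonthlyStorageProfile:
    """One month of activity for a table or bucket (illustrative fields)."""
    stored_tb: float                  # average capacity held, old snapshots included
    list_requests: int                # LIST calls from engines and maintenance jobs
    get_requests: int                 # GET calls from queries and compaction reads
    put_requests: int                 # PUT calls from ingest and file rewrites
    maintenance_compute_hours: float  # compaction, clustering, statistics collection
    transfer_out_tb: float            # cross-region or cross-service movement


@dataclass
class PriceSheet:
    """Hypothetical unit prices; none of these are real provider rates."""
    per_tb_month: float
    per_1k_list: float
    per_1k_get: float
    per_1k_put: float
    per_compute_hour: float
    per_tb_transfer: float


def monthly_cost_breakdown(p: MonthlyStorageProfile, prices: PriceSheet) -> dict:
    """Split the monthly bill by driver so capacity is not the only visible line."""
    costs = {
        "capacity": p.stored_tb * prices.per_tb_month,
        "requests": (
            p.list_requests / 1000 * prices.per_1k_list
            + p.get_requests / 1000 * prices.per_1k_get
            + p.put_requests / 1000 * prices.per_1k_put
        ),
        "maintenance": p.maintenance_compute_hours * prices.per_compute_hour,
        "transfer": p.transfer_out_tb * prices.per_tb_transfer,
    }
    costs["total"] = sum(costs.values())
    return costs


if __name__ == "__main__":
    # Placeholder numbers chosen only to exercise the function.
    prices = PriceSheet(20.0, 0.005, 0.0004, 0.005, 0.5, 20.0)
    profile = MonthlyStorageProfile(100, 2_000_000, 500_000_000, 10_000_000, 400, 50)
    for driver, dollars in monthly_cost_breakdown(profile, prices).items():
        print(f"{driver:>12}: ${dollars:,.2f}")
```

The point of the breakdown is the shape, not the totals: once requests, maintenance, and transfer sit next to capacity, you can see which driver is growing fastest.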

Core idea: open storage is cheap. Trustworthy storage is not free.

Compaction is a bill, not a background task

Compaction is the maintenance work that turns messy ingest output into efficient read patterns. In an open lakehouse, compaction is not optional. It is the cost of keeping performance predictable.

Teams get into trouble when compaction is treated as someone else's job. When ownership is unclear, costs climb in three places at once: storage grows from file churn, compute spend rises because queries scan more files than they need to, and operational risk rises as maintenance tasks pile up into a backlog.
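To see why uncompacted ingest output gets expensive to read, it helps to count objects. The sketch below estimates how many files a table occupies at a given average file size and how many GET requests a scan would issue. The function names and the `gets_per_file` parameter are assumptions for illustration; real engines often issue several ranged reads per file, so treat the result as a lower bound.

```python
import math


def object_count(table_bytes: int, avg_file_bytes: int) -> int:
    """Number of files needed to hold the table at the given average file size."""
    if avg_file_bytes <= 0:
        raise ValueError("avg_file_bytes must be positive")
    return -(-table_bytes // avg_file_bytes)  # integer ceiling division


def scan_get_requests(
    table_bytes: int,
    avg_file_bytes: int,
    scanned_fraction: float = 1.0,  # share of files a query touches after pruning
    gets_per_file: int = 1,         # assumed; engines may issue more per file
) -> int:
    """Rough count of GET requests for one scan of the table."""
    files = object_count(table_bytes, avg_file_bytes)
    return math.ceil(files * scanned_fraction) * gets_per_file


if __name__ == "__main__":
    TIB = 1024 ** 4
    MIB = 1024 ** 2
    before = scan_get_requests(TIB, 1 * MIB)    # streaming ingest, never compacted
    after = scan_get_requests(TIB, 256 * MIB)   # compacted to a larger target size
    print(f"GETs per full scan before compaction: {before:,}")
    print(f"GETs per full scan after compaction:  {after:,}")
```

The same data volume produces very different request counts depending on file size, which is why compaction shows up on both the storage and the compute side of the bill.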

When you choose a table format, you are choosing a maintenance model. That is why table format specs and their maintenance behaviors matter for ODI.

Design choices that keep costs predictable

You do not need perfection. You need predictability.

  • Set file sizing targets: avoid a small-file regime that turns every query into an object-store metadata storm.
  • Make compaction explicit: schedule it, measure it, and assign ownership.
  • Use retention policies intentionally: safety windows are valuable, but they are not free.
  • Keep table layouts aligned to access patterns: storage layout is a query cost decision.
  • Track “cost per TB served”: storage is not a static asset, it is an active service.
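Two of these choices are easy to put numbers on. The sketch below estimates the extra capacity a retention window holds and computes a "cost per TB served" figure. It assumes that superseded files are kept until the retention window expires and that "TB served" means TB read by queries in the same month; both definitions, and the function names, are illustrative assumptions rather than a standard.

```python
def retained_tb(live_tb: float, daily_rewrite_fraction: float, retention_days: int) -> float:
    """Approximate capacity held when superseded files wait out the safety window.

    Assumes a steady state: each day a fixed fraction of live data is rewritten
    (updates, deletes, compaction), and the old copies stay for retention_days.
    """
    superseded_tb = live_tb * daily_rewrite_fraction * retention_days
    return live_tb + superseded_tb


def cost_per_tb_served(total_monthly_cost: float, tb_read_by_queries: float) -> float:
    """Total monthly storage-related cost divided by TB actually read by queries."""
    if tb_read_by_queries <= 0:
        raise ValueError("tb_read_by_queries must be positive")
    return total_monthly_cost / tb_read_by_queries


if __name__ == "__main__":
    # Placeholder inputs: 100 TB live, 2% rewritten per day, 7-day safety window.
    held = retained_tb(live_tb=100, daily_rewrite_fraction=0.02, retention_days=7)
    print(f"Capacity held including safety window: {held:.1f} TB")
    print(f"Cost per TB served: ${cost_per_tb_served(3460.0, 865.0):.2f}")
```

Tracked month over month, the second number tells you whether maintenance spend is buying cheaper reads or just accumulating.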

An open lakehouse is a good default when you want long-lived data in portable contracts. It becomes a bad deal when you treat maintenance as an afterthought.

Sources to start with

These sources cover object storage pricing and table maintenance behavior.