Open Data Infrastructure
FinOps for the Open Lakehouse
Allocation, accountability, and optimization for open lakehouse spend across storage, compute, movement, and governance.
The open lakehouse makes costs more visible, and also more fragmented. The visibility is good. The fragmentation means you need a real operating model for spend, not a monthly surprise meeting.
Why lakehouse spend is hard
Warehouses concentrate spend into one bill. Open lakehouses spread spend across storage, compute engines, orchestration, catalogs, and data movement. The result is that nobody owns “the data bill,” and everyone has a plausible story about why their piece is necessary.
FinOps is how you make the system honest. It is not a tool. It is a shared practice that links engineering decisions to financial accountability.
A practical cost model
If you want to manage open lakehouse spend, you need to measure it in units that match how teams build systems.
- Storage unit cost: cost per TB per month, including request charges and the cost of maintenance jobs.
- Compute unit cost: cost per governed query, cost per pipeline run, and cost per feature batch produced for ML models.
- Movement unit cost: cost per TB replicated, cost per export, and cost per cross-region transfer.
- Control plane unit cost: cost per governed dataset, read alongside catalog operations per week and lineage coverage.
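The arithmetic behind every unit above is the same: spend for a period divided by the activity measured in that category's unit. The sketch below assumes you already have monthly spend and activity counts per category (for example from billing exports and platform telemetry, which are not shown). The `CostLine` type, the category names, and all figures are hypothetical and purely illustrative.

```python
from dataclasses import dataclass


@dataclass
class CostLine:
    """One category of spend for a period (hypothetical structure)."""
    category: str   # e.g. "storage", "compute", "movement", "control_plane"
    spend: float    # currency units for the period
    units: float    # activity measured in this category's unit
    unit_name: str  # e.g. "TB-month", "pipeline run"


def unit_cost(line: CostLine) -> float:
    """Spend divided by activity; undefined when no activity was recorded."""
    if line.units <= 0:
        raise ValueError(f"no {line.unit_name} recorded for {line.category}")
    return line.spend / line.units


# Illustrative figures only, not benchmarks or real prices.
month = [
    CostLine("storage", 2300.0, 100.0, "TB-month"),
    CostLine("compute", 9000.0, 1500.0, "pipeline run"),
    CostLine("movement", 1200.0, 60.0, "TB replicated"),
    CostLine("control_plane", 800.0, 400.0, "governed dataset"),
]

for line in month:
    print(f"{line.category}: {unit_cost(line):.2f} per {line.unit_name}")
```

Raising on zero activity is a deliberate choice: a category that costs money but records no activity is itself a finding worth surfacing rather than a number to hide.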
These units force better conversations. They make it clear when you are paying for real value, and when you are paying for accidental architecture.
Core idea: FinOps is the bridge between “open” and “sustainable.”
Allocation that people accept
Allocation fails when it feels arbitrary. It succeeds when it matches how work is structured.
- Allocate by product or domain: if a domain owns the data, it should see the unit economics.
- Separate shared platform cost: governance, metadata, and reliability are shared infrastructure, not individual team overhead.
- Show the movement bill: exports and cross-region replication are often the hidden cost drivers, so give them their own line instead of folding them into compute or storage.
- Make cost reviews part of engineering: treat cost changes like performance changes, not like procurement events.
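The allocation rules above can be sketched as a small grouping step. This assumes every billing record already carries a domain tag, with a missing tag (`None`) marking shared platform cost such as catalog, metadata, and reliability work. The record layout and the `allocate` helper are hypothetical, and the amounts are illustrative.

```python
from collections import defaultdict

# Hypothetical tagged billing records: (domain or None, category, amount).
# A None domain marks shared platform cost rather than a team's own spend.
records = [
    ("payments", "compute", 4200.0),
    ("payments", "storage", 900.0),
    ("payments", "movement", 650.0),
    ("marketing", "compute", 1800.0),
    ("marketing", "movement", 1100.0),
    (None, "control_plane", 800.0),
    (None, "compute", 500.0),
]


def allocate(records):
    """Group direct spend by domain and category; keep shared platform cost apart."""
    by_domain = defaultdict(lambda: defaultdict(float))
    shared = defaultdict(float)
    for domain, category, amount in records:
        if domain is None:
            shared[category] += amount
        else:
            by_domain[domain][category] += amount
    return by_domain, shared


by_domain, shared = allocate(records)
for domain, costs in sorted(by_domain.items()):
    total = sum(costs.values())
    # Movement gets its own column so it is not buried inside compute.
    print(f"{domain}: total={total:.0f} movement={costs.get('movement', 0.0):.0f}")
print(f"shared platform: {sum(shared.values()):.0f}")
```

Keeping shared cost as a separate total, rather than spreading it across domains by some formula, is what keeps the per-domain numbers defensible in a review.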
Optimization that does not break trust
The worst cost optimizations destroy trust. They make pipelines flaky, make data stale, or break governance coverage.
The safe optimizations are structural:
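- Fix small file regimes: storage and compute costs both drop when files are compacted toward a sensible target size, because every extra file adds storage requests, metadata, and per-file scan overhead.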
- Make maintenance explicit: compaction and retention are cheaper when scheduled than when reactive.
- Reduce unnecessary movement: query in place when possible, and design “copy once” patterns when copying is required.
- Keep compute engines interchangeable: portability is a cost control mechanism because it preserves your choice of engine.
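Table formats ship their own compaction and retention operations. The harder question is where to point them. The sketch below assumes you can list data file sizes per partition and flags partitions whose median file is far below a target size, with a rough estimate of the file count after compaction. The 512 MiB target, the 25% ratio, and the `compaction_candidates` helper are hypothetical choices for illustration, not recommendations for any particular engine.

```python
from statistics import median

MIB = 1024 * 1024
# Hypothetical thresholds: the right target depends on the table format,
# the engines reading the table, and the workload.
TARGET_FILE_BYTES = 512 * MIB
SMALL_FILE_RATIO = 0.25  # flag partitions whose median file is under 25% of target


def compaction_candidates(partitions):
    """Return (partition, files_now, estimated_files_after) for small-file partitions.

    `partitions` maps a partition key to a list of data file sizes in bytes.
    """
    candidates = []
    for key, sizes in partitions.items():
        if len(sizes) < 2:
            continue  # a single file has nothing to merge with
        if median(sizes) < TARGET_FILE_BYTES * SMALL_FILE_RATIO:
            total = sum(sizes)
            files_after = -(-total // TARGET_FILE_BYTES)  # ceiling division
            candidates.append((key, len(sizes), files_after))
    return candidates


# Illustrative listing: one partition of many tiny files, one already healthy.
partitions = {
    "dt=2024-06-01": [8 * MIB] * 200,
    "dt=2024-06-02": [500 * MIB, 480 * MIB],
}
result = compaction_candidates(partitions)
print(result)  # [('dt=2024-06-01', 200, 4)]
```

Running a check like this on a schedule, and feeding its output to the table format's own compaction job, is one way to make maintenance explicit instead of reactive.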
FinOps for an open lakehouse is not about chasing the lowest number this month. It is about building a system where the unit economics stay predictable as workloads change.
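Predictable unit economics can be checked the same way a performance regression is: compare each period's unit cost with the previous one and flag large moves. This sketch assumes a chronological list of (period, spend, units) for one workload. The 15% threshold, the `unit_cost_drift` helper, and the figures are hypothetical.

```python
def unit_cost_drift(history, threshold=0.15):
    """Flag periods where unit cost moved more than `threshold` vs. the prior period.

    `history` is a chronological list of (period, spend, units) tuples.
    Returns (period, previous_unit_cost, current_unit_cost) for each flagged period.
    """
    flagged = []
    prev = None
    for period, spend, units in history:
        if units <= 0:
            continue  # no activity recorded: no unit cost to compare
        current = spend / units
        if prev is not None and prev > 0 and abs(current - prev) / prev > threshold:
            flagged.append((period, prev, current))
        prev = current
    return flagged


# Illustrative pipeline-run history (hypothetical numbers).
history = [
    ("2024-04", 9000.0, 1500.0),   # 6.00 per run
    ("2024-05", 10200.0, 1700.0),  # 6.00 per run: spend grew with volume
    ("2024-06", 10500.0, 1400.0),  # 7.50 per run: unit cost regressed
]
print(unit_cost_drift(history))  # [('2024-06', 6.0, 7.5)]
```

Note that the month with the larger spend increase is not flagged: spend rose with volume while unit cost held steady. The flagged month is the one where each run got more expensive.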
Sources to start with
Start with the FinOps framework, then ground the rest in primary pricing documentation.