The first cloud bill that shocks a team is often not compute. It is data movement. That same bill is the mechanism that turns “we could switch later” into “we will never switch.”

Data gravity is not a metaphor

Data gravity is the practical effect of accumulated dependencies. As data grows and more systems depend on it, moving the data becomes harder, and the cost of moving it becomes less predictable. Gravity is not the file copy. Gravity is everything attached to the data: permissions, schemas, table history, lineage, quality expectations, application contracts, and who gets paged when a job fails.

In closed stacks, those attachments become proprietary. Your “exit plan” becomes an engineering program with unclear scope and a questionable end date.

Egress fees are a control surface

Cloud providers price data transfer separately for a reason. Data transfer is the tax on leaving a boundary. It also becomes the tax on multi-cloud and hybrid deployments, on cross-region resilience, and on any architecture that moves compute closer to users.

You do not need a conspiracy theory for this to matter. If the economics make movement expensive, movement stops. When movement stops, the architecture calcifies, and negotiation power shifts away from the customer.
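
To make “expensive” concrete, here is a minimal sketch of the arithmetic. The function name, the dataset sizes, and the $0.05 per GB rate are illustrative assumptions, not any provider's actual pricing. Real rates vary by provider, region, destination, and volume tier, so take them from the primary pricing documentation.

```python
def egress_cost_usd(dataset_tb: float, price_per_gb_usd: float, crossings: int = 1) -> float:
    """Estimate the transfer charge for moving data across a billing boundary.

    crossings is how many times the same volume crosses the boundary
    (1 for a one-time exit, 30 for a daily copy over a month, and so on).
    """
    gb = dataset_tb * 1000  # decimal TB to GB; confirm the unit your provider bills in
    return gb * price_per_gb_usd * crossings


# Hypothetical inputs only: a 500 TB estate and a placeholder rate.
PLACEHOLDER_RATE = 0.05  # USD per GB, assumed for illustration

one_time_exit = egress_cost_usd(500, PLACEHOLDER_RATE)
monthly_replication = egress_cost_usd(2, PLACEHOLDER_RATE, crossings=30)  # 2 TB of changes per day

print(f"one-time exit:       ${one_time_exit:,.0f}")
print(f"monthly replication: ${monthly_replication:,.0f}")
```

The transfer charge is only the visible part. It excludes the engineering time to rebuild the permissions, lineage, and validation attached to the data, which is usually where the estimate becomes unpredictable.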

Core idea: egress is not only a line item. It is how platform boundaries become durable.

Patterns that reduce egress risk

ODI does not make data movement free. It makes it planned. It makes it measurable. It reduces the number of things that have to be reinvented when data crosses a boundary.

These patterns help:

  • Keep durable data in open contracts: open table formats such as Apache Iceberg, Delta Lake, or Apache Hudi reduce the amount of semantic meaning trapped in one engine or warehouse.
  • Bring compute to the data: do not export data to run one query. Use engines that can run where the data lives.
  • Design for replication: treat replication as normal, not exceptional, so the team builds operational muscle and knows the real cost.
  • Make governance portable: treat policy, lineage, and auditability as part of the asset so movement does not destroy trust.
  • Use explicit “data movement budgets”: treat large transfers like any other costed project, with an owner and a forecast.
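
A “data movement budget” from the list above can be as small as a record with an owner and a forecast. The sketch below is one hypothetical shape for it; the class name, fields, and the example rate are assumptions for illustration, not part of any standard or tool.

```python
from dataclasses import dataclass


@dataclass
class MovementBudget:
    """A costed, owned forecast for moving one dataset across a boundary."""
    dataset: str
    owner: str               # the person or team accountable for the transfer
    forecast_gb: float       # volume the owner expects to move in the period
    price_per_gb_usd: float  # placeholder; take the real rate from provider pricing docs

    @property
    def forecast_cost_usd(self) -> float:
        return self.forecast_gb * self.price_per_gb_usd

    def remaining_gb(self, transferred_gb: float) -> float:
        """Volume left in the budget; negative means the forecast was exceeded."""
        return self.forecast_gb - transferred_gb

    def is_over(self, transferred_gb: float) -> bool:
        return transferred_gb > self.forecast_gb


# Hypothetical example: a quarterly replication budget for one dataset.
budget = MovementBudget(
    dataset="orders",
    owner="data-platform",
    forecast_gb=10_000,
    price_per_gb_usd=0.02,  # assumed rate for illustration
)

print(budget.forecast_cost_usd)
print(budget.remaining_gb(8_000), budget.is_over(8_000))
print(budget.remaining_gb(12_000), budget.is_over(12_000))
```

The value is less in the arithmetic than in the fields: a transfer with a named owner and a forecast gets reviewed like any other project, and a transfer that exceeds its forecast becomes a visible event instead of a surprise on the invoice.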

The point is not to move everything constantly. The point is to keep the option to move without rewriting the world.

Measure the right thing

Teams usually track storage and query spend. They rarely track the cost of movement until the day movement becomes urgent.

Track:

  • Out-of-boundary data transfer by domain.
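  • Cross-region transfer that the architecture itself generates (replication, pipelines, service-to-service traffic), as distinct from transfer driven by user requests.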
  • Requests to “export for analysis” instead of querying in place.
  • Time-to-replicate a critical dataset, including the governance and validation work.
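
The first two metrics above can be derived from transfer logs or billing exports once each record carries a domain and a cause. The sketch below assumes a hypothetical record shape; the field names, location labels, and sample values are invented for illustration, and a real pipeline would map them from whatever the provider's billing export contains.

```python
from collections import defaultdict
from typing import Iterable, NamedTuple


class Transfer(NamedTuple):
    domain: str        # owning data domain, e.g. "billing"
    src: str           # source location label
    dst: str           # destination location label
    gb: float
    initiated_by: str  # "architecture" or "user" (assumed tagging scheme)


def out_of_boundary_gb_by_domain(transfers: Iterable[Transfer], boundary: set) -> dict:
    """Sum the volume that starts inside the boundary and ends outside it, per domain."""
    totals = defaultdict(float)
    for t in transfers:
        if t.src in boundary and t.dst not in boundary:
            totals[t.domain] += t.gb
    return dict(totals)


def architecture_driven_gb(transfers: Iterable[Transfer]) -> float:
    """Sum the volume moved between locations because of system design, not user requests."""
    return sum(t.gb for t in transfers if t.initiated_by == "architecture" and t.src != t.dst)


# Hypothetical sample records.
transfers = [
    Transfer("billing", "cloud-a/us-east", "cloud-b/eu", 120.0, "architecture"),
    Transfer("billing", "cloud-a/us-east", "cloud-a/us-west", 40.0, "architecture"),
    Transfer("search", "cloud-a/us-east", "on-prem", 15.0, "user"),
]
boundary = {"cloud-a/us-east", "cloud-a/us-west"}

by_domain = out_of_boundary_gb_by_domain(transfers, boundary)
arch_gb = architecture_driven_gb(transfers)
print(by_domain)
print(arch_gb)
```

Reviewing these totals on a regular schedule shows which domains generate movement and how much of it is a side effect of design rather than demand.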

If you never measure movement, you will discover the cost only when you are already trapped.

Sources to start with

Start with primary pricing documentation and the specs that enable portability.