Proprietary formats rarely fail on day one. They fail in year three, when your data has to survive a tool change, a merger, a new governance policy, or an AI program that wants context from everywhere.

The real cost categories

The obvious cost is conversion. The hidden cost is everything surrounding conversion.

  • Re-export cycles: every new consumer requires a new export path, and exports become a permanent feature.
  • Compatibility gaps: a format might be readable only in one engine, or only with one catalog path.
  • Metadata loss: table history, statistics, and governance signals can be trapped behind format-specific tooling.
  • Operational duplication: teams keep multiple representations of the same data to serve different systems.
  • Exit timing risk: conversion programs collide with business deadlines and create forced platform commitments.

Core idea: proprietary formats do not just lock data. They lock calendars.

Format longevity is governance

Governance is not only about policy. It is also about whether an asset will remain usable and auditable over time. If you cannot read your own data without a specific vendor client, you have created a governance dependency.

Open formats like Parquet and ORC are boring on purpose. Boring is a feature for long-lived assets. It reduces the risk that the organization has to rewrite its storage layer because a product direction changed.

Interoperability is an architecture requirement

Interoperability is not a “nice to have” for modern data systems. It is the constraint that makes multi-engine, multi-team, and AI-assisted architectures possible.

Open data infrastructure assumes that multiple systems will touch the same data: query engines, catalogs, governance tools, lineage systems, and AI applications. When the format boundary is closed, every one of those integrations becomes custom work.

When a proprietary format can be acceptable

Sometimes a proprietary format is a pragmatic choice for a short-lived dataset, or for a narrow workload where the vendor provides a real and measurable benefit. The key is to make that choice explicit and scoped.

  • Use proprietary formats for transient, non-critical outputs.
  • Keep durable, high-value datasets in open formats or open table formats.
  • Require a tested export path that preserves the governance signals you need.
  • Model the exit cost before you commit, not after you are trapped.

If the vendor cannot provide a credible portability story, you are not buying a feature. You are buying dependency.

Sources to start with

Start with the format specifications and documentation maintained by the projects.