A CISO's Guide to Open Data Infrastructure
Control, auditability, and freedom from black-box trust make openness a security posture, not a risk.
Security teams do not lose sleep over open source. They lose sleep over invisible control planes, unverifiable data paths, and audit trails that only exist inside one vendor console.
Why it matters
If your data stack is a closed system, your security posture is a trust exercise. You trust that the vendor enforces policy correctly, logs what matters, and will keep your access model stable as products evolve. You also trust that you can leave if the risk profile changes.
Open Data Infrastructure (ODI) is about reducing blind trust. It keeps the contracts that matter in places the customer can inspect: table metadata, catalog boundaries, policy enforcement points, lineage events, and audit signals that survive tool changes.
The ODI security angle
Open does not mean public. It means inspectable and portable. You can keep enterprise data private while still insisting that the infrastructure contracts are explicit and not proprietary.
Think of ODI as a security control system with four layers:
- Discovery control: who can discover that data exists, and who can see metadata.
- Access control: who can read, write, and administer data and policies.
- Policy control: where enforcement happens (engine, catalog, gateway, storage) and whether it is consistent.
- Evidence control: audit logs and lineage that can prove what happened, when it happened, and why it happened.
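As a minimal sketch, the discovery, access, and evidence layers can be kept distinct even in a toy model. The example assumes an in-memory catalog and hypothetical privilege names (`discover`, `read`, `write`, `admin`). Discovering a table is a separate decision from reading it, and every decision, allowed or denied, leaves an audit record. Real catalogs and engines define their own privilege models. This sketch only illustrates the separation.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone

# Hypothetical privilege names; real catalogs define their own vocabularies.
DISCOVER, READ, WRITE, ADMIN = "discover", "read", "write", "admin"


@dataclass
class Catalog:
    """Toy catalog mapping (principal, table) to a set of granted privileges."""
    grants: dict = field(default_factory=dict)
    audit_log: list = field(default_factory=list)

    def grant(self, principal: str, table: str, privilege: str) -> None:
        self.grants.setdefault((principal, table), set()).add(privilege)

    def _decide(self, principal: str, table: str, privilege: str) -> bool:
        held = self.grants.get((principal, table), set())
        # In this sketch, admin implies every other privilege.
        allowed = privilege in held or ADMIN in held
        # Evidence control: record every decision, allowed or denied.
        self.audit_log.append({
            "time": datetime.now(timezone.utc).isoformat(),
            "principal": principal,
            "table": table,
            "privilege": privilege,
            "allowed": allowed,
        })
        return allowed

    def list_tables(self, principal: str, tables: list[str]) -> list[str]:
        # Discovery control: only return tables the caller may know exist.
        return [t for t in tables if self._decide(principal, t, DISCOVER)]

    def can_read(self, principal: str, table: str) -> bool:
        # Access control: being able to discover a table does not imply reading it.
        return self._decide(principal, table, READ)


catalog = Catalog()
catalog.grant("analyst", "sales.orders", DISCOVER)
catalog.grant("analyst", "sales.orders", READ)
catalog.grant("analyst", "hr.salaries", DISCOVER)

visible = catalog.list_tables("analyst", ["sales.orders", "hr.salaries", "fin.ledger"])
print(visible)  # ['sales.orders', 'hr.salaries']
print(catalog.can_read("analyst", "hr.salaries"))  # False
```

The analyst can see that `hr.salaries` exists but cannot read it, and both the listing and the denied read appear in `audit_log`.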
Core idea: security gets harder when the evidence is proprietary.
Controls that actually hold
A useful ODI security posture does not start with tool comparisons. It starts with the control points you need to exist regardless of which engine or vendor is currently fashionable.
- Catalog boundary clarity: one place to answer, “What data exists, who owns it, and who can access it?”
- Policy enforcement in the path: permissions and policy must apply at query time, not after a report is generated.
- Portable lineage: lineage events should be readable outside a single vendor UI so you can correlate incidents and prove data provenance.
- Separation of concerns: keep table format, catalog, and query engine decisions separable so a security requirement does not force a full platform migration.
- Repeatable audit: test that the policy model is enforced consistently across engines and access paths.
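The repeatable-audit control can be sketched as a harness that replays the same expected decisions against every enforcement point. Here `engine_a`, `engine_b`, and the `EXPECTED` cases are hypothetical stand-ins. Against real systems, each check would run a query as the given principal and record whether it was allowed or denied.

```python
from typing import Callable

# A decision function answers: may `principal` perform `action` on `table`?
Decision = Callable[[str, str, str], bool]

# Expected outcomes derived from the central policy model (hypothetical data).
EXPECTED = {
    ("analyst", "read", "sales.orders"): True,
    ("analyst", "read", "hr.salaries"): False,
    ("etl_bot", "write", "sales.orders"): True,
    ("analyst", "write", "sales.orders"): False,
}


def audit_engines(engines: dict[str, Decision], expected: dict) -> list[tuple]:
    """Run every expected case through every engine and collect mismatches."""
    mismatches = []
    for engine_name, decide in engines.items():
        for (principal, action, table), want in expected.items():
            got = decide(principal, action, table)
            if got != want:
                mismatches.append((engine_name, principal, action, table, want, got))
    return mismatches


# Stand-ins for real enforcement points.
def engine_a(principal, action, table):
    return EXPECTED[(principal, action, table)]


def engine_b(principal, action, table):
    # Simulated drift: this engine ignores the HR restriction.
    if table == "hr.salaries":
        return True
    return EXPECTED[(principal, action, table)]


report = audit_engines({"engine_a": engine_a, "engine_b": engine_b}, EXPECTED)
for row in report:
    print(row)
# prints: ('engine_b', 'analyst', 'read', 'hr.salaries', False, True)
```

Running the same cases on every release or configuration change makes the check repeatable rather than a one-off review.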
In practice, these controls often rely on open table formats, open catalog APIs, and standardized lineage events, not because they are trendy but because they keep the evidence usable.
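For illustration, a standardized lineage event might look like the following. It is loosely modeled on the OpenLineage RunEvent shape (`eventType`, `eventTime`, `run`, `job`, `inputs`, `outputs`, `producer`). The namespace, job name, and producer URI are hypothetical. The sketch also omits fields such as `schemaURL` and facets, so treat the OpenLineage specification as the authoritative schema.

```python
import json
import uuid
from datetime import datetime, timezone
from typing import Optional

# Hypothetical identifiers for illustration only.
NAMESPACE = "warehouse"
PRODUCER = "https://example.com/pipelines/orders-etl"


def lineage_event(event_type: str, job_name: str, inputs: list, outputs: list,
                  run_id: Optional[str] = None) -> dict:
    """Build a lineage event loosely shaped like an OpenLineage RunEvent."""
    return {
        "eventType": event_type,  # e.g. "START", "COMPLETE", "FAIL"
        "eventTime": datetime.now(timezone.utc).isoformat(),
        "run": {"runId": run_id or str(uuid.uuid4())},
        "job": {"namespace": NAMESPACE, "name": job_name},
        "inputs": [{"namespace": NAMESPACE, "name": n} for n in inputs],
        "outputs": [{"namespace": NAMESPACE, "name": n} for n in outputs],
        "producer": PRODUCER,
    }


start = lineage_event("START", "orders_daily", ["raw.orders"], ["sales.orders"])
# Reuse the runId so START and COMPLETE can be joined later.
done = lineage_event("COMPLETE", "orders_daily", ["raw.orders"], ["sales.orders"],
                     run_id=start["run"]["runId"])

# Plain JSON lines: readable by any log pipeline, SIEM, or script.
for event in (start, done):
    print(json.dumps(event, sort_keys=True))
```

The events are plain JSON that share a `runId`, so they can be joined with incident and change-management timelines outside any single vendor UI.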
Failure modes
Most ODI failures are not “we chose the wrong product.” They are “we never owned the control plane.”
- Access policies exist in multiple places and drift over time.
- Audit logs are incomplete because the enforcement point is inconsistent.
- Lineage exists, but it is trapped behind a proprietary interface and cannot be joined with incident timelines.
- New AI and agent workflows bypass the normal controls because they are treated as experimentation.
- Export is possible, but the exported data loses the metadata and evidence needed for regulated use.
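Drift between policy stores, the first failure mode above, can be detected with a plain set difference. This assumes each store can export its grants as `(principal, table, privilege)` triples. Both grant sets below are hypothetical. A real check would pull them from the catalog and from each engine's own permission system.

```python
# Each policy store is exported as a set of (principal, table, privilege) grants.
# The contents here are hypothetical.
catalog_grants = {
    ("analyst", "sales.orders", "read"),
    ("etl_bot", "sales.orders", "write"),
}
engine_grants = {
    ("analyst", "sales.orders", "read"),
    ("etl_bot", "sales.orders", "write"),
    ("intern", "hr.salaries", "read"),  # granted directly in the engine
}


def policy_drift(source_of_truth: set, enforcement_point: set) -> dict:
    """Report grants that exist in only one of the two places."""
    return {
        "extra_in_enforcement": sorted(enforcement_point - source_of_truth),
        "missing_from_enforcement": sorted(source_of_truth - enforcement_point),
    }


drift = policy_drift(catalog_grants, engine_grants)
print(drift)
# prints: {'extra_in_enforcement': [('intern', 'hr.salaries', 'read')], 'missing_from_enforcement': []}
```

Grants that exist only at the enforcement point are exactly the ones an audit of the catalog alone would never see.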
Questions to ask
- Where is policy enforced, and how do we prove it is enforced?
- Can we reconstruct who accessed a sensitive domain and why, across all tools?
- Can we correlate lineage events with incidents and change management?
- If we changed query engines, what security controls would silently disappear?
- What happens to auditability when AI agents start querying data directly?
If the answers depend on one vendor console, you have a security dependency, not a security posture.
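Reconstructing who accessed a sensitive domain, and why, across tools mostly comes down to two steps. First, normalize each tool's audit export into one schema. Then filter by domain and incident window. The two log formats, their field mappings, and the `hr.` domain prefix below are hypothetical. Note how an agent access with no stated purpose surfaces as a finding.

```python
from datetime import datetime

# Hypothetical audit exports from two tools that name the same fields differently.
warehouse_log = [
    {"ts": "2024-05-01T09:15:00+00:00", "user": "analyst",
     "object": "hr.salaries", "purpose": "comp review"},
    {"ts": "2024-05-01T11:00:00+00:00", "user": "analyst",
     "object": "sales.orders", "purpose": "dashboard"},
]
notebook_log = [
    {"time": "2024-05-01T10:02:00+00:00", "principal": "agent-7",
     "table": "hr.salaries", "reason": None},
]

# Per-tool mapping onto one shared schema: time, who, what, why.
WAREHOUSE_MAP = {"time": "ts", "who": "user", "what": "object", "why": "purpose"}
NOTEBOOK_MAP = {"time": "time", "who": "principal", "what": "table", "why": "reason"}


def normalize(record: dict, mapping: dict, tool: str) -> dict:
    """Translate one tool-specific record into the shared audit schema."""
    return {
        "time": datetime.fromisoformat(record[mapping["time"]]),
        "who": record[mapping["who"]],
        "what": record[mapping["what"]],
        "why": record.get(mapping["why"]),
        "tool": tool,
    }


def accesses(timeline: list, domain_prefix: str, start: datetime, end: datetime) -> list:
    """Entries that touched a sensitive domain inside an incident window."""
    return [e for e in timeline
            if e["what"].startswith(domain_prefix) and start <= e["time"] <= end]


# Merge all tools into one time-ordered timeline.
timeline = sorted(
    [normalize(r, WAREHOUSE_MAP, "warehouse") for r in warehouse_log]
    + [normalize(r, NOTEBOOK_MAP, "notebook") for r in notebook_log],
    key=lambda e: e["time"],
)

window_start = datetime.fromisoformat("2024-05-01T09:00:00+00:00")
window_end = datetime.fromisoformat("2024-05-01T10:30:00+00:00")
for e in accesses(timeline, "hr.", window_start, window_end):
    # A missing purpose is itself a finding worth escalating.
    print(e["time"].isoformat(), e["tool"], e["who"], e["what"], e["why"] or "NO STATED PURPOSE")
```

If any tool cannot produce an export that maps onto the shared schema, that tool is where the audit trail depends on a vendor console.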
Sources to start with
Start with the specs and standards that define portable contracts and evidence.