Feature Stores and Open Data Infrastructure
Feature stores connect training and serving. Open Data Infrastructure (ODI) makes the offline feature layer governed, portable, and explainable instead of trapped inside one ML platform.
Feature stores are where ML teams rediscover data engineering, usually with tighter latency and higher consequences.
Features are data products with latency requirements
A feature store sounds like an ML system. In practice, it is a data platform boundary. It has historical data for training, low-latency values for serving, entity definitions, transformation logic, freshness expectations, and lineage requirements.
If that boundary is closed, the model team may move fast for one use case but trap feature meaning inside one serving stack. If the offline feature layer is open and governed, features can become reusable data products.
Core idea: a feature store should not be the place where open data contracts disappear on the way to production ML.
The offline store is the ODI anchor
Feast describes the offline store as the interface for historical time-series feature values stored in data sources, used for building training datasets and materializing features into an online store. That is exactly where ODI should show up.
The offline store should connect to governed, versioned, lineage-aware data. Open tables are a strong fit because training data needs reproducibility. If a model was trained on a feature set, the team should be able to reconstruct which data, feature definitions, and time windows produced it.
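One way to make that reconstruction concrete is a point-in-time join: each training row only sees feature values recorded at or before its label timestamp. The sketch below is a minimal stdlib illustration, not how Feast or any particular offline store implements it. The row layout and the names `FEATURE_HISTORY`, `build_index`, `value_as_of`, and `build_training_rows` are assumptions made for this example.

```python
from bisect import bisect_right
from datetime import datetime

# Hypothetical offline feature history: (entity_id, event_time, value).
FEATURE_HISTORY = [
    ("user_1", datetime(2024, 1, 1), 3),
    ("user_1", datetime(2024, 1, 10), 5),
    ("user_2", datetime(2024, 1, 5), 7),
]


def build_index(rows):
    """Group feature rows by entity, sorted by event time."""
    index = {}
    for entity_id, event_time, value in sorted(rows, key=lambda r: (r[0], r[1])):
        times, values = index.setdefault(entity_id, ([], []))
        times.append(event_time)
        values.append(value)
    return index


def value_as_of(index, entity_id, as_of):
    """Return the latest value observed at or before `as_of`, else None."""
    if entity_id not in index:
        return None
    times, values = index[entity_id]
    pos = bisect_right(times, as_of)
    return values[pos - 1] if pos > 0 else None


def build_training_rows(index, labels):
    """Attach a point-in-time feature value to each (entity, time, label) row."""
    return [
        {"entity_id": e, "label_time": t, "label": y, "feature": value_as_of(index, e, t)}
        for e, t, y in labels
    ]


# Hypothetical labels: (entity_id, label_time, label).
labels = [("user_1", datetime(2024, 1, 5), 0), ("user_1", datetime(2024, 1, 12), 1)]
rows = build_training_rows(build_index(FEATURE_HISTORY), labels)
print([r["feature"] for r in rows])  # prints [3, 5]
```

Because the lookup depends only on the stored history and the label timestamps, rerunning it against the same versioned table snapshot should yield the same training set, which is the reproducibility property the offline layer is meant to protect.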
Online serving does not excuse closed provenance
Online feature serving has different constraints. Low latency matters. Availability matters. Operational safety matters. Those constraints can justify specialized stores, caches, and serving systems.
They do not justify losing provenance. A feature value used at serving time should still trace back to its definition, source, freshness, and materialization path. Otherwise model debugging becomes guesswork with dashboards.
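A rough sketch of what keeping provenance at serving time could look like: the online value travels with a small record of its definition, source, freshness, and materialization run. The `ServedFeature` class and all of its field names are hypothetical, chosen for this example rather than taken from any feature-store API.

```python
from dataclasses import dataclass, asdict
from datetime import datetime, timedelta, timezone


@dataclass(frozen=True)
class ServedFeature:
    """A served value plus the provenance fields named in the text."""
    name: str
    value: float
    definition_version: str    # which feature definition produced it
    source_table: str          # offline source it was derived from
    event_time: datetime       # when the underlying data was observed
    materialization_run: str   # job run that wrote it to the online store
    max_staleness: timedelta   # freshness expectation for this feature

    def is_fresh(self, now: datetime) -> bool:
        """True if the value is within its freshness expectation at `now`."""
        return now - self.event_time <= self.max_staleness

    def provenance(self) -> dict:
        """Everything except the value, ready to log next to a prediction."""
        record = asdict(self)
        record.pop("value")
        return record


# Illustrative values only.
feature = ServedFeature(
    name="orders_7d",
    value=4.0,
    definition_version="v3",
    source_table="features.user_orders",
    event_time=datetime(2024, 1, 10, 12, 0, tzinfo=timezone.utc),
    materialization_run="run-0042",
    max_staleness=timedelta(hours=6),
)
now = datetime(2024, 1, 10, 15, 0, tzinfo=timezone.utc)
print(feature.is_fresh(now))  # prints True
```

Logging `feature.provenance()` alongside each prediction is one option for turning a later debugging session into a lookup rather than a reconstruction.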
Feature governance has to be machine-readable
ML teams need more than a catalog page. Feature governance should answer:
- Which entity does this feature describe?
- Which source tables and transformations produced it?
- Which training datasets used it?
- Which models serve with it?
- Which policies restrict its use?
Those answers should be available to the platform, not trapped in a launch document.
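As an illustration, the five questions above can be encoded as fields on a registry record that the platform queries directly. This is a sketch under assumed names: `FeatureRecord`, `REGISTRY`, `features_served_by`, and `is_use_allowed` are hypothetical and do not correspond to any specific catalog or feature-store schema.

```python
import json
from dataclasses import dataclass, asdict, field


@dataclass
class FeatureRecord:
    name: str
    entity: str                 # which entity the feature describes
    source_tables: list         # which source tables produced it
    transformation: str         # which transformation produced it
    training_datasets: list = field(default_factory=list)    # which datasets used it
    serving_models: list = field(default_factory=list)       # which models serve with it
    prohibited_purposes: list = field(default_factory=list)  # which policies restrict it


# Illustrative registry contents only.
REGISTRY = {
    "orders_7d": FeatureRecord(
        name="orders_7d",
        entity="user",
        source_tables=["sales.orders"],
        transformation="count of orders over a trailing 7-day window",
        training_datasets=["churn_train_2024_01"],
        serving_models=["churn_v2"],
        prohibited_purposes=["credit_decisioning"],
    ),
}


def features_served_by(registry, model):
    """List the features a given model depends on at serving time."""
    return sorted(n for n, rec in registry.items() if model in rec.serving_models)


def is_use_allowed(registry, feature_name, purpose):
    """Check a proposed use against the feature's recorded restrictions."""
    return purpose not in registry[feature_name].prohibited_purposes


# The same record serializes cleanly for other tools to consume.
print(json.dumps(asdict(REGISTRY["orders_7d"]), indent=2))
```

A check such as `is_use_allowed(REGISTRY, "orders_7d", "credit_decisioning")` can then run in CI or at model registration time instead of relying on someone rereading a launch document.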
The practical ODI feature-store pattern
A durable pattern looks like this:
- Open tables hold the historical feature record.
- Lineage captures source and transformation paths.
- The feature registry defines entities and feature views.
- Online stores serve low-latency values with auditable materialization.
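The "auditable materialization" part of this pattern can be sketched as a job that copies the latest offline value per entity into an online store and returns a record of what it did. The in-memory dict standing in for the online store, the `materialize` function, and the audit fields are all assumptions for illustration. A real deployment would use the feature store's own materialization mechanism.

```python
from datetime import datetime


def materialize(offline_rows, online_store, *, feature_view, source, run_id, as_of):
    """Write the latest value per entity (at or before `as_of`) and return an audit record."""
    latest = {}
    for entity_id, event_time, value in offline_rows:
        if event_time > as_of:
            continue  # never push values from after the cutoff
        if entity_id not in latest or event_time > latest[entity_id][0]:
            latest[entity_id] = (event_time, value)

    for entity_id, (event_time, value) in latest.items():
        # Each online value keeps a pointer back to the run that wrote it.
        online_store[(feature_view, entity_id)] = {
            "value": value,
            "event_time": event_time,
            "run_id": run_id,
        }

    return {
        "run_id": run_id,
        "feature_view": feature_view,
        "source": source,
        "as_of": as_of,
        "entities_written": sorted(latest),
    }


# Hypothetical offline rows: (entity_id, event_time, value).
offline_rows = [
    ("user_1", datetime(2024, 1, 1), 3),
    ("user_1", datetime(2024, 1, 10), 5),
    ("user_2", datetime(2024, 1, 20), 7),
]
online_store = {}
audit = materialize(
    offline_rows,
    online_store,
    feature_view="user_orders",
    source="features.user_orders",
    run_id="run-0042",
    as_of=datetime(2024, 1, 15),
)
print(audit["entities_written"])  # prints ['user_1']
```

Storing the returned audit record next to the open table's lineage metadata is one way to let a served value be traced back through the run to its source.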
The feature store remains important. ODI keeps it from becoming the only place where feature meaning exists.
Sources to start with
Start with feature-store documentation, such as Feast's, for offline and online store behavior, then connect the offline layer to open table and lineage standards.