Streaming data products fail in strange ways when nobody can explain what "fresh" meant at the moment a result was served.

Streaming evidence starts with time

Apache Flink separates event time from processing time. That distinction matters because many data products care about when an event happened, not only when the system processed it. A late event can be correct, but it can still break a freshness promise if the platform cannot explain how it was handled.

Watermarks are the mechanism Flink uses to measure progress in event time. A watermark carrying timestamp t declares that event time has reached t in that stream, so no further events with a timestamp at or before t are expected, and any that do arrive count as late. That is useful operational evidence, but only if teams capture it in a way consumers can understand.
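To make the watermark idea concrete, here is a minimal plain-Python sketch of a bounded-out-of-orderness watermark. It is a simulation, not Flink's API: the class and method names are hypothetical, and it evaluates lateness on every event, whereas a real Flink job typically emits watermarks periodically. The formula (highest timestamp seen, minus the tolerated delay, minus one millisecond) mirrors how Flink's built-in bounded-out-of-orderness generator behaves.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class BoundedOutOfOrdernessTracker:
    """Simulates a bounded-out-of-orderness watermark (hypothetical helper)."""

    max_out_of_orderness_ms: int
    max_timestamp_ms: Optional[int] = None  # highest event time seen so far

    def current_watermark(self) -> Optional[int]:
        """Return the watermark in epoch ms, or None before any event."""
        if self.max_timestamp_ms is None:
            return None
        return self.max_timestamp_ms - self.max_out_of_orderness_ms - 1

    def is_late(self, event_ts_ms: int) -> bool:
        """An event is late if its timestamp is at or before the watermark."""
        watermark = self.current_watermark()
        return watermark is not None and event_ts_ms <= watermark

    def observe(self, event_ts_ms: int) -> bool:
        """Record an event and report whether it arrived late."""
        late = self.is_late(event_ts_ms)
        if self.max_timestamp_ms is None or event_ts_ms > self.max_timestamp_ms:
            self.max_timestamp_ms = event_ts_ms
        return late


tracker = BoundedOutOfOrdernessTracker(max_out_of_orderness_ms=5_000)
tracker.observe(10_000)            # watermark becomes 4_999
out_of_order = tracker.observe(7_000)   # older than the max, but not late
late = tracker.observe(4_000)           # at or before the watermark: late
```

The out-of-order event at 7,000 ms is still accepted because it is ahead of the watermark, while the event at 4,000 ms is late. That difference is exactly what a consumer needs explained when a result looks stale or incomplete.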

Watermarks explain progress

Flink documentation describes watermarks in the DataStream API, where a job supplies a WatermarkStrategy, and in SQL DDL, where a table declares one through the WATERMARK clause. Those details should not stay buried in code. A data product should expose the watermark behavior that supports its freshness claim.

If an AI feature reads a streaming-derived table, the answer should be able to carry a freshness receipt. The receipt does not need to include every internal metric, but it should show event-time progress, late-data policy, sink state, and the pipeline run that produced the output.
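One possible shape for such a receipt is sketched below. Every field name here is an assumption chosen for illustration; the source only specifies that the receipt should show event-time progress, late-data policy, sink state, and the producing pipeline run.

```python
import json
from dataclasses import asdict, dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class FreshnessReceipt:
    """Hypothetical consumer-facing freshness receipt."""

    data_product: str
    pipeline_run_id: str       # the run that produced the output
    watermark_ms: int          # event-time progress, epoch milliseconds
    late_data_policy: str      # e.g. "drop" or "side_output" (illustrative)
    sink_committed: bool       # whether the sink commit is durable
    last_sink_commit_id: str
    issued_at_ms: int          # processing time when the receipt was issued

    def event_time_lag_ms(self) -> int:
        """How far event-time progress trails the moment of issue."""
        return self.issued_at_ms - self.watermark_ms

    def to_json(self) -> str:
        """Serialize the receipt with a human-readable watermark."""
        payload = asdict(self)
        payload["watermark_utc"] = datetime.fromtimestamp(
            self.watermark_ms / 1000, tz=timezone.utc
        ).isoformat()
        payload["event_time_lag_ms"] = self.event_time_lag_ms()
        return json.dumps(payload, sort_keys=True)


receipt = FreshnessReceipt(
    data_product="orders_enriched",      # hypothetical product name
    pipeline_run_id="run-0001",          # hypothetical identifier
    watermark_ms=1_700_000_000_000,
    late_data_policy="side_output",
    sink_committed=True,
    last_sink_commit_id="commit-0001",   # hypothetical identifier
    issued_at_ms=1_700_000_030_000,
)
receipt_json = receipt.to_json()
```

An AI feature could attach this JSON to its answer, so that a reviewer can see the watermark the answer was based on rather than inferring it afterwards.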

Core idea: watermarks become governance evidence when they connect stream progress to a consumer-facing freshness promise.

Audit trails need interpretation

For related ODI context, read Flink watermarks as freshness evidence, Flink checkpoint lineage, and streaming lakehouse data contracts.

The audit trail should connect the watermark strategy, checkpoint or savepoint evidence, source offsets, sink commits, and late-data handling. Otherwise incident review becomes a debate about what the stream probably did.

What breaks first

  • The watermark strategy changes, but the data product contract does not.
  • Late data is dropped or delayed without a consumer-visible reason.
  • Freshness checks use processing time while the business question uses event time.
  • Agents read streaming outputs without knowing the current watermark.
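The third failure mode is easy to reproduce. The sketch below, with hypothetical names and illustrative numbers, compares a processing-time check (how recently the sink was written) with an event-time check (how far the watermark trails the present) against the same freshness threshold.

```python
from dataclasses import dataclass


@dataclass
class StreamStatus:
    """Hypothetical snapshot of a streaming output at check time."""

    now_ms: int              # wall-clock time of the check
    last_sink_write_ms: int  # when the sink was last written (processing time)
    watermark_ms: int        # event-time progress of the pipeline


def check_freshness(status: StreamStatus, max_lag_ms: int) -> dict:
    """Evaluate the same threshold against both notions of time."""
    processing_lag = status.now_ms - status.last_sink_write_ms
    event_lag = status.now_ms - status.watermark_ms
    return {
        "processing_time_lag_ms": processing_lag,
        "event_time_lag_ms": event_lag,
        "processing_time_ok": processing_lag <= max_lag_ms,
        "event_time_ok": event_lag <= max_lag_ms,
    }


# A job replaying a backlog: it writes constantly, but about old events.
status = StreamStatus(
    now_ms=1_000_000,
    last_sink_write_ms=990_000,   # written 10 seconds ago
    watermark_ms=100_000,         # event time is 15 minutes behind
)
result = check_freshness(status, max_lag_ms=60_000)
```

With a one-minute threshold the processing-time check passes while the event-time check fails. A dashboard built on the first would stay green while the business question, which is about event time, is being answered with stale data.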

Watermark review questions

Ask which timestamp defines truth, which watermark strategy is used, how late data is handled, and where the freshness receipt is stored. Those answers should be available before the dashboard turns red.

Freshness is not a feeling in a streaming system. It is a time claim that needs evidence.