Context Graphs for AI Incident Response

When an AI incident starts, nobody wants a scavenger hunt through prompts, logs, dashboards, and Slack threads.

The practical problem

AI incidents are rarely only model incidents. The failure may involve stale data, missing policy context, a bad retrieval chunk, a changed metric, a tool-call error, or a pipeline run that produced unexpected data.

A context graph connects the objects responders need: agent action, prompt, tool call, data product, source table, lineage run, owner, policy decision, evaluation result, and runbook.
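As a rough illustration, the objects listed above can be held as typed nodes joined by named edges. This is a minimal in-memory sketch, not a reference design: the class names, node IDs, and relation names (`invoked`, `read`, `built_from`, `owned_by`) are all hypothetical, and a real deployment would more likely sit on a graph database or a metadata catalog.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Node:
    node_id: str
    kind: str  # e.g. "agent_action", "data_product", "owner"


@dataclass
class ContextGraph:
    nodes: dict = field(default_factory=dict)  # node_id -> Node
    edges: list = field(default_factory=list)  # (src_id, relation, dst_id)

    def add_node(self, node_id, kind):
        self.nodes[node_id] = Node(node_id, kind)

    def add_edge(self, src, relation, dst):
        # Refuse edges to unknown nodes so dangling references surface early.
        if src not in self.nodes or dst not in self.nodes:
            raise KeyError(f"unknown node in edge {src} -> {dst}")
        self.edges.append((src, relation, dst))

    def neighbors(self, src, relation=None):
        # Outgoing neighbors, optionally filtered by relation name.
        return [d for s, r, d in self.edges
                if s == src and (relation is None or r == relation)]


# Hypothetical incident objects, for illustration only.
graph = ContextGraph()
for node_id, kind in [
    ("action:42", "agent_action"),
    ("tool:sql-1", "tool_call"),
    ("dp:revenue", "data_product"),
    ("table:orders", "source_table"),
    ("owner:finance-data", "owner"),
]:
    graph.add_node(node_id, kind)

graph.add_edge("action:42", "invoked", "tool:sql-1")
graph.add_edge("tool:sql-1", "read", "dp:revenue")
graph.add_edge("dp:revenue", "built_from", "table:orders")
graph.add_edge("dp:revenue", "owned_by", "owner:finance-data")

print(graph.neighbors("dp:revenue", "owned_by"))
```

Rejecting edges that point at unknown nodes is a deliberate choice in this sketch: a missing node is exactly the kind of gap responders would otherwise discover mid-incident.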

The graph is an incident map

OpenLineage gives a model for runs, jobs, datasets, and facets. W3C PROV frames provenance around entities, activities, and agents, where an agent may be a person, an organization, or software. A context graph can use those ideas to make AI incident response navigable.
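One way to borrow from PROV is to reuse its relation names on the graph's edges and check that each edge joins the right kinds of node. The sketch below uses plain Python data, not a PROV library. The relation names and their subject and object types follow PROV-DM, while the node IDs and the `invalid_relations` helper are hypothetical.

```python
# Node IDs are hypothetical; the three types are the PROV core types.
nodes = {
    "answer:a1": "entity",
    "dataset:revenue_daily": "entity",
    "run:agent-call-7": "activity",
    "run:etl-nightly": "activity",
    "team:finance-data": "agent",
}

# (subject, relation, object) triples using PROV-DM relation names.
relations = [
    ("answer:a1", "wasGeneratedBy", "run:agent-call-7"),
    ("run:agent-call-7", "used", "dataset:revenue_daily"),
    ("dataset:revenue_daily", "wasGeneratedBy", "run:etl-nightly"),
    ("dataset:revenue_daily", "wasAttributedTo", "team:finance-data"),
]

# Expected (subject type, object type) for each relation.
SIGNATURES = {
    "wasGeneratedBy": ("entity", "activity"),
    "used": ("activity", "entity"),
    "wasAttributedTo": ("entity", "agent"),
    "wasAssociatedWith": ("activity", "agent"),
    "wasDerivedFrom": ("entity", "entity"),
}


def invalid_relations(nodes, relations):
    """Return triples whose relation is unknown or joins the wrong node types."""
    bad = []
    for subj, rel, obj in relations:
        expected = SIGNATURES.get(rel)
        actual = (nodes.get(subj), nodes.get(obj))
        if expected is None or expected != actual:
            bad.append((subj, rel, obj))
    return bad


print(invalid_relations(nodes, relations))
```

A check like this could run when edges are written, so that an agent tool call recorded as the wrong node type is caught before an incident rather than during one.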

The point is not to build a pretty diagram. The point is to answer incident questions fast. Which data product shaped the answer? Which run produced it? Who owns it? Which policy was evaluated? Which evaluation case should have caught it?

Core idea: AI incident response needs connected evidence across data, policy, lineage, and agent behavior.

The response workflow should start from the artifact

Responders should be able to start from the bad answer, tool call, or denied action and walk backward to sources. That means every runtime artifact needs identifiers that connect to the graph.
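The backward walk can be sketched as a breadth-first traversal over "shaped by" edges. Everything named here is an assumption for illustration: the `UPSTREAM` mapping, the node IDs, and the `trace_back` function are hypothetical, and a production graph would be queried through its own store.

```python
from collections import deque

# Hypothetical upstream edges: each artifact maps to what shaped it.
UPSTREAM = {
    "answer:a1": ["prompt:p1", "tool:search-9"],
    "tool:search-9": ["chunk:c3"],
    "chunk:c3": ["dp:support-kb"],
    "dp:support-kb": ["run:index-build-5"],
    "run:index-build-5": ["table:tickets"],
}


def trace_back(start, upstream):
    """Breadth-first walk from an artifact; returns {node: path from start}."""
    paths = {start: [start]}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for parent in upstream.get(node, []):
            if parent not in paths:  # skip visited nodes, which also guards against cycles
                paths[parent] = paths[node] + [parent]
                queue.append(parent)
    return paths


paths = trace_back("answer:a1", UPSTREAM)
print(" -> ".join(paths["table:tickets"]))
```

Keeping the full path, not just the set of reachable nodes, lets a responder see which retrieval chunk and which run sit between the bad answer and the source table.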

The same graph should point forward to mitigation. Which index should be refreshed? Which data product should be quarantined? Which owner should review the policy? Which runbook applies? This is where the context graph becomes operational rather than conceptual.
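The forward direction can be sketched the same way: given a suspect data product, collect the index, owner, policy, and runbook attached to it. The edge list, relation names, and `mitigation_plan` function below are hypothetical, and the output is only a list of candidates for a human to act on.

```python
# Hypothetical forward edges from a data product to mitigation targets.
MITIGATION_EDGES = [
    ("dp:support-kb", "indexed_in", "index:kb-vectors"),
    ("dp:support-kb", "owned_by", "owner:support-data"),
    ("dp:support-kb", "governed_by", "policy:pii-redaction"),
    ("dp:support-kb", "covered_by", "runbook:stale-index"),
    ("dp:billing", "owned_by", "owner:finance-data"),
]

# Map each relation to the mitigation question it answers.
RELATION_TO_STEP = {
    "indexed_in": "refresh_index",
    "owned_by": "notify_owner",
    "governed_by": "review_policy",
    "covered_by": "runbook",
}


def mitigation_plan(data_product, edges):
    """Collect mitigation candidates reachable in one hop from a data product."""
    plan = {"quarantine_candidate": data_product}
    for src, relation, dst in edges:
        if src == data_product and relation in RELATION_TO_STEP:
            plan[RELATION_TO_STEP[relation]] = dst
    return plan


plan = mitigation_plan("dp:support-kb", MITIGATION_EDGES)
for step, target in plan.items():
    print(f"{step}: {target}")
```

A step missing from the plan is itself a finding: a data product with no `runbook` or `notify_owner` entry has a gap worth closing before the next incident.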

What breaks first

Prompt logs exist, but retrieved source IDs are missing.
Lineage shows pipelines but not agent tool calls.
Policy decisions are logged separately from the answer they influenced.
Runbooks name systems, not the data products and owners involved.
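Gaps like these can be caught before an incident by checking logged answer records for the identifiers that link them to the graph. The sketch below assumes a hypothetical log record shape. The field names in `REQUIRED_LINKS` are illustrative, and a real check would need to allow for answers that legitimately involved no tool call or retrieval.

```python
# Hypothetical link fields an answer record would need to join the graph.
REQUIRED_LINKS = (
    "retrieved_source_ids",
    "tool_call_ids",
    "policy_decision_id",
    "data_product_ids",
)


def missing_links(record):
    """Return the link fields that are absent or empty on a log record."""
    return [key for key in REQUIRED_LINKS if not record.get(key)]


# Illustrative records, not real log output.
records = [
    {
        "answer_id": "a1",
        "retrieved_source_ids": ["chunk:c3"],
        "tool_call_ids": ["tool:search-9"],
        "policy_decision_id": "pd:77",
        "data_product_ids": ["dp:support-kb"],
    },
    {
        "answer_id": "a2",
        "retrieved_source_ids": [],  # prompt was logged, sources were not
        "tool_call_ids": ["tool:search-10"],
        "policy_decision_id": None,  # decision logged in a separate system
        "data_product_ids": ["dp:support-kb"],
    },
]

gaps = {}
for record in records:
    missing = missing_links(record)
    if missing:
        gaps[record["answer_id"]] = missing

print(gaps)
```

Run over a sample of recent answers, a report like this shows how many could actually be traced if an incident started now.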

Questions to ask

Can responders start from an answer and find the data product that shaped it?
Can lineage connect source runs to retrieval artifacts?
Which owners are reachable from the graph?
Which evaluation set should be updated after the incident?

Sources to start with

These primary sources anchor the technical claims in this guide.

Incident response improves when every answer has a path back to evidence.
