Open Data Infrastructure
Context Graphs for AI Incident Response
How context graphs connect agent actions, data products, lineage, owners, policies, runbooks, and evidence during incidents.
When an AI incident starts, nobody wants a scavenger hunt through prompts, logs, dashboards, and Slack threads.
The practical problem
AI incidents are rarely only model incidents. The failure may involve stale data, missing policy context, a bad retrieval chunk, a changed metric, a tool-call error, or a pipeline run that produced unexpected data.
A context graph connects the objects responders need: agent action, prompt, tool call, data product, source table, lineage run, owner, policy decision, evaluation result, and runbook.
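As a rough illustration of what "connecting the objects" could look like, the sketch below models a context graph as typed nodes joined by named relations. The `Node` and `ContextGraph` classes, the relation names, and every identifier are hypothetical and chosen only for this example; they are not taken from any specific product or standard.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass(frozen=True)
class Node:
    kind: str  # e.g. "agent_action", "tool_call", "data_product", "owner"
    id: str    # identifier as it appears in logs or catalogs (hypothetical here)


@dataclass
class ContextGraph:
    # Adjacency map: source node -> list of (relation, target node).
    edges: dict = field(default_factory=dict)

    def link(self, src: Node, relation: str, dst: Node) -> None:
        self.edges.setdefault(src, []).append((relation, dst))

    def neighbors(self, src: Node, relation: Optional[str] = None) -> list:
        # Return targets of src, optionally filtered by relation name.
        return [dst for rel, dst in self.edges.get(src, [])
                if relation is None or rel == relation]


# Hypothetical incident objects, linked the way a responder would follow them.
graph = ContextGraph()
action = Node("agent_action", "act-001")
tool_call = Node("tool_call", "call-17")
product = Node("data_product", "orders_daily")
owner = Node("owner", "team-orders")

graph.link(action, "invoked", tool_call)
graph.link(tool_call, "read", product)
graph.link(product, "owned_by", owner)

owners = graph.neighbors(product, "owned_by")
```

A production graph would more likely live in a catalog or graph store; the in-memory map here only shows the shape of the data.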
The graph is an incident map
OpenLineage gives a model for runs, jobs, datasets, and facets. W3C PROV frames provenance around entities, activities, and agents, where an agent can be a person, an organization, or software. A context graph can use those ideas to make AI incident response navigable.
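One way to picture how the two models could meet is to flatten an OpenLineage-style run event into provenance-style edges. The sketch below uses a plain dictionary rather than any client library. The top-level field names (`eventType`, `eventTime`, `run`, `job`, `inputs`, `outputs`) follow the OpenLineage run event shape, while the `event_to_edges` helper, the edge labels, and all values are assumptions made for illustration.

```python
def event_to_edges(event: dict) -> list:
    """Flatten one run event into (source, relation, target) triples."""
    run = f"run:{event['run']['runId']}"
    job = f"job:{event['job']['namespace']}/{event['job']['name']}"
    edges = [(run, "instance_of", job)]
    # Inputs: the run (a PROV-style activity) used the dataset (an entity).
    for ds in event.get("inputs", []):
        edges.append((run, "used", f"dataset:{ds['namespace']}/{ds['name']}"))
    # Outputs: the dataset was generated by the run.
    for ds in event.get("outputs", []):
        edges.append((f"dataset:{ds['namespace']}/{ds['name']}",
                      "was_generated_by", run))
    return edges


# Hypothetical event. Real events also carry producer and schemaURL fields,
# and runId is a UUID; a short placeholder is used here for readability.
event = {
    "eventType": "COMPLETE",
    "eventTime": "2024-01-01T00:00:00Z",
    "run": {"runId": "run-881"},
    "job": {"namespace": "pipelines", "name": "build_orders_daily"},
    "inputs": [{"namespace": "warehouse", "name": "raw.orders"}],
    "outputs": [{"namespace": "warehouse", "name": "orders_daily"}],
}

edges = event_to_edges(event)
```

Keeping the edges as simple triples makes it easy to merge them with edges that come from agent logs or policy engines.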
The point is not to build a pretty diagram. The point is to answer incident questions fast. Which data product shaped the answer? Which run produced it? Who owns it? Which policy was evaluated? Which evaluation case should have caught it?
Core idea: AI incident response needs connected evidence across data, policy, lineage, and agent behavior.
The response workflow should start from the artifact
Responders should be able to start from the bad answer, tool call, or denied action and walk backward to sources. That means every runtime artifact needs stable identifiers, such as retrieved source IDs, tool call IDs, and policy decision IDs, that resolve to nodes in the graph.
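Walking backward from an artifact can be sketched as a breadth-first traversal over "derived from" edges. The edge map, the node naming scheme, and the `walk_back` function below are hypothetical; a real graph would be queried from a store rather than held in a dictionary.

```python
from collections import deque

# Each key is an artifact; each value lists what it was derived from.
# All identifiers are made up for this example.
DERIVED_FROM = {
    "answer:a-42": ["tool_call:call-17", "prompt:p-9"],
    "tool_call:call-17": ["chunk:c-301"],
    "chunk:c-301": ["data_product:orders_daily"],
    "data_product:orders_daily": ["lineage_run:run-881"],
    "lineage_run:run-881": ["source_table:raw.orders"],
}


def walk_back(start: str, edges: dict) -> list:
    """Return every upstream node reachable from start, nearest first."""
    seen = {start}
    order = []
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for parent in edges.get(node, []):
            if parent not in seen:  # guard against cycles and repeats
                seen.add(parent)
                order.append(parent)
                queue.append(parent)
    return order


upstream = walk_back("answer:a-42", DERIVED_FROM)
```

If the traversal stops before reaching a source table, that gap is itself a finding: some artifact in the chain was logged without the identifier needed to link it upstream.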
The same graph should point forward to mitigation. Which index should be refreshed? Which data product should be quarantined? Which owner should review the policy? Which runbook applies? This is where the context graph becomes operational rather than conceptual.
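The forward direction can be sketched the same way: given a suspect data product, follow its outgoing edges and turn each one into a candidate mitigation step. The relation names, the mapping from relation to action, and the `mitigation_plan` function are all assumptions for illustration, and the output is meant as a list for a human responder to review, not as automated remediation.

```python
# Hypothetical outgoing edges from a data product node.
MITIGATION_EDGES = {
    "data_product:orders_daily": [
        ("feeds_index", "index:orders_vectors"),
        ("owned_by", "owner:team-orders"),
        ("governed_by", "policy:pii-masking"),
        ("has_runbook", "runbook:stale-data"),
    ],
}

# Assumed mapping from relation type to the action a responder might consider.
ACTIONS = {
    "feeds_index": "refresh",
    "owned_by": "notify",
    "governed_by": "review",
    "has_runbook": "follow",
}


def mitigation_plan(product: str, edges: dict) -> list:
    """List candidate (action, target) steps for a suspect data product."""
    plan = [("quarantine", product)]  # candidate step, pending human review
    for relation, target in edges.get(product, []):
        action = ACTIONS.get(relation)
        if action is not None:
            plan.append((action, target))
    return plan


plan = mitigation_plan("data_product:orders_daily", MITIGATION_EDGES)
```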
What breaks first
- Prompt logs exist, but retrieved source IDs are missing.
- Lineage shows pipelines but not agent tool calls.
- Policy decisions are logged separately from the answer they influenced.
- Runbooks name systems, not the data products and owners involved.
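Several of these gaps can be caught before an incident by checking logged answer records for the identifiers that link them to the graph. The field names and the `missing_links` helper below are hypothetical; the sketch covers the answer-side gaps and does not address how runbooks themselves are written.

```python
# Assumed linking fields an answer record would need, with the reason for each.
REQUIRED_LINKS = {
    "retrieved_source_ids": "ties the answer to retrieved chunks",
    "tool_call_ids": "ties the answer to agent tool calls",
    "policy_decision_ids": "ties the answer to evaluated policies",
    "data_product_ids": "ties the answer to data products and their owners",
}


def missing_links(record: dict) -> list:
    """Return required linking fields that are absent or empty in a record."""
    return [name for name in REQUIRED_LINKS if not record.get(name)]


# Hypothetical logged record: the prompt is present, but two links are not.
record = {
    "answer_id": "a-42",
    "prompt": "What were yesterday's orders?",
    "retrieved_source_ids": [],
    "tool_call_ids": ["call-17"],
    "data_product_ids": ["orders_daily"],
}

gaps = missing_links(record)
```

Running a check like this over a sample of recent records gives a quick estimate of how many answers could actually be traced during an incident.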
Questions to ask
- Can responders start from an answer and find the data product that shaped it?
- Can lineage connect source runs to retrieval artifacts?
- Which owners are reachable from the graph?
- Which evaluation set should be updated after the incident?
For related articles, read Agentic Workflows Need Lineage, Context Graphs for Data Access Decisions, and Retrieval Governance in Open Data Infrastructure.
Sources to start with
These primary sources anchor the technical claims in this guide.
- OpenLineage documentation
- OpenLineage job facets
- W3C PROV overview
- NIST AI Risk Management Framework
Incident response improves when every answer has a path back to evidence.