Context Graphs for AI Root Cause Analysis

AI incidents are rarely one broken thing. They are usually a chain: context, policy, source data, tool behavior, prompt, and evaluation all leaning in the wrong direction.

Incidents need a graph

A context graph connects business meaning, metadata, lineage, policy, owners, data products, tools, and evaluation evidence. That makes it useful for root cause analysis because agent failures usually cross system boundaries.

W3C PROV models provenance as entities, activities, and agents. OpenLineage models jobs, runs, and datasets, with facets carrying extra metadata. OpenTelemetry traces link the spans of a request as it moves through a distributed system. A context graph can bring those evidence types together for AI systems.
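As a rough illustration of bringing those evidence types together, the sketch below stores nodes from all three vocabularies in one small in-memory graph. The `ContextGraph` class, every node ID, and the relations `runOf`, `producedBy`, `emittedSpan`, and `read` are assumptions made up for this example. Only `wasGeneratedBy` and `wasAssociatedWith` are borrowed from PROV, and nothing here calls a real OpenLineage or OpenTelemetry API.

```python
from dataclasses import dataclass, field


@dataclass
class ContextGraph:
    """Tiny in-memory graph; node and edge shapes are illustrative only."""
    nodes: dict = field(default_factory=dict)  # node id -> {"kind", "source"}
    edges: list = field(default_factory=list)  # (src id, relation, dst id)

    def add_node(self, node_id, kind, source):
        self.nodes[node_id] = {"kind": kind, "source": source}

    def add_edge(self, src, relation, dst):
        # Refuse edges to unknown nodes so broken joins surface early.
        if src not in self.nodes or dst not in self.nodes:
            raise KeyError(f"unknown node in edge {src} -> {dst}")
        self.edges.append((src, relation, dst))

    def neighbors(self, node_id):
        return [(rel, dst) for src, rel, dst in self.edges if src == node_id]


graph = ContextGraph()

# PROV-style evidence: an entity generated by an activity run by an agent.
graph.add_node("answer:a1", "entity", "prov")
graph.add_node("agent_run:r1", "activity", "prov")
graph.add_node("agent:support_bot", "agent", "prov")
graph.add_edge("answer:a1", "wasGeneratedBy", "agent_run:r1")
graph.add_edge("agent_run:r1", "wasAssociatedWith", "agent:support_bot")

# OpenLineage-style evidence: a run of a job that produced a dataset.
# (Relation names here are hypothetical, not OpenLineage terms.)
graph.add_node("job:build_orders", "job", "openlineage")
graph.add_node("run:j7", "run", "openlineage")
graph.add_node("dataset:orders_daily", "dataset", "openlineage")
graph.add_edge("run:j7", "runOf", "job:build_orders")
graph.add_edge("dataset:orders_daily", "producedBy", "run:j7")

# OpenTelemetry-style evidence: a span recording one tool call.
graph.add_node("span:s9", "span", "otel")
graph.add_edge("agent_run:r1", "emittedSpan", "span:s9")

# The cross-system join: the tool call read the lineage-tracked dataset.
graph.add_edge("span:s9", "read", "dataset:orders_daily")
```

The last edge is the one that matters for root cause analysis: it is the only link between the agent's runtime trace and the lineage record, and it exists only if both systems share an identifier for the dataset.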

The graph should connect evidence

For an AI incident, the graph should connect the answer to the prompt, retrieved context, tool calls, source data products, policy checks, freshness status, owners, lineage paths, and evaluation result. The goal is not a pretty diagram. The goal is a shorter path from symptom to cause.

If the answer was wrong because the policy label was stale, the graph should show that. If the source data was correct but the retrieval rule chose the wrong data product, the graph should show that too.
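One way to picture "a shorter path from symptom to cause" is a breadth-first search from the answer to the nearest piece of evidence that is not healthy. The sketch below is a minimal version under stated assumptions: the node names, the `EDGES` dependency map, and the status values are all hypothetical, and a real graph would carry far more detail.

```python
from collections import deque

# Hypothetical dependency map: each node lists the evidence it relies on.
EDGES = {
    "answer": ["prompt", "retrieved_context", "evaluation_result"],
    "retrieved_context": ["retrieval_rule", "data_product:orders"],
    "data_product:orders": ["policy_label", "freshness_status", "owner:sales-data"],
}


def path_to_cause(start, edges, status):
    """Return the shortest path from `start` to the nearest unhealthy node.

    `status` maps node names to a state; nodes not listed count as "ok".
    Returns None when every reachable node is healthy.
    """
    queue = deque([[start]])
    seen = {start}
    while queue:
        path = queue.popleft()
        node = path[-1]
        if status.get(node, "ok") != "ok":
            return path
        for nxt in edges.get(node, []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(path + [nxt])
    return None


# Case 1: the policy label on the source data product is stale.
stale_policy = path_to_cause("answer", EDGES, {"policy_label": "stale"})

# Case 2: the data is fine, but the retrieval rule chose the wrong product.
wrong_rule = path_to_cause("answer", EDGES, {"retrieval_rule": "wrong_product"})
```

In the first case the path runs through the retrieved context and the data product to the policy label. In the second it stops at the retrieval rule, which keeps the review from blaming source data that was correct.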

Core idea: Context graphs make AI root cause analysis tractable because they connect model behavior to the data infrastructure evidence behind it.

The ODI analysis pattern

Open Data Infrastructure already contains the control points a context graph needs: catalogs, lineage, policy, metadata, access logs, and data product ownership. The graph should unify those signals rather than create a parallel investigation layer.

For adjacent context, read the related guides on the context graph, context graphs for AI incident response, and context provenance receipts.

What breaks first

The evaluation failure is stored separately from the data access trace.
Lineage shows table dependencies but not the context payload used by the agent.
Policy decisions are logged but not connected to owners or source data products.
Incident review blames the prompt because the data path is invisible.
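The gaps above can be checked mechanically before an incident happens. The sketch below assumes the graph can report which kinds of evidence are linked to each other; the kind names and the `REQUIRED_LINKS` list are illustrative choices that mirror the failure modes listed here, not a standard.

```python
# Hypothetical pairs of evidence kinds that should be linked in the graph.
REQUIRED_LINKS = [
    ("evaluation", "data_access_trace"),
    ("lineage_path", "context_payload"),
    ("policy_decision", "owner"),
    ("policy_decision", "data_product"),
]


def missing_links(observed_pairs):
    """Return the required links that are absent from the graph.

    `observed_pairs` is an iterable of (kind_a, kind_b) tuples found in the
    graph. Direction is ignored, so ("owner", "policy_decision") counts.
    """
    present = {frozenset(pair) for pair in observed_pairs}
    return [pair for pair in REQUIRED_LINKS if frozenset(pair) not in present]


# Example: evaluations are joined to access traces, and policy decisions to
# owners, but nothing ties lineage to context payloads or policy decisions
# to data products.
observed = [
    ("data_access_trace", "evaluation"),
    ("policy_decision", "owner"),
]
gaps = missing_links(observed)
```

Running a check like this on a schedule turns "the data path is invisible" from a discovery made during an incident into a finding made beforehand.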

Questions to ask

Ask which identifiers connect prompts, tool calls, datasets, policies, and evaluations. Ask whether the graph can answer who owns the broken context and which downstream workflows were affected.
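Both questions reduce to simple lookups once the identifiers are shared. The following sketch assumes a hypothetical `FEEDS` map (upstream node to its consumers), an `OWNERS` map, and a `workflow:` prefix convention for workflow nodes; none of these names come from a real system.

```python
from collections import deque

# Hypothetical "feeds" edges: upstream node -> nodes that consume it.
FEEDS = {
    "data_product:orders": ["context:order_summary"],
    "context:order_summary": ["workflow:support_agent", "workflow:refund_bot"],
    "data_product:inventory": ["workflow:restock_planner"],
}

# Hypothetical ownership records keyed by the same node identifiers.
OWNERS = {
    "data_product:orders": "sales-data-team",
    "context:order_summary": "ml-platform-team",
}


def downstream_workflows(node, feeds):
    """Collect every workflow reachable downstream of `node`, sorted by name."""
    found = []
    seen = {node}
    queue = deque([node])
    while queue:
        current = queue.popleft()
        for nxt in feeds.get(current, []):
            if nxt in seen:
                continue
            seen.add(nxt)
            if nxt.startswith("workflow:"):
                found.append(nxt)
            queue.append(nxt)
    return sorted(found)


broken = "context:order_summary"
owner = OWNERS.get(broken, "unowned")
affected = downstream_workflows("data_product:orders", FEEDS)
```

If `OWNERS.get` falls back to "unowned" for a node that appears in an incident path, that missing record is itself a finding worth recording in the review.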

Root cause analysis starts when the answer stops being isolated from the data path.

The graph is where the AI incident becomes explainable.
