
AI Anomaly Explanation

When an anomaly or trend alert fires, understanding its root cause can require cross-referencing sync events, upstream data sources, quality checks, and schema changes. The AI Anomaly Explanation feature automates this investigation by gathering context signals and generating a natural-language hypothesis.

Triggering an Explanation

Open any anomaly or trend alert from the Alerts Center and click the Explain button. The AI gathers relevant context about the alert's data product and produces an explanation within a few seconds.

Explanations are cached: once generated, the same explanation is returned on subsequent views without re-invoking the LLM. To regenerate an explanation, clear the alert's cached metadata and click Explain again.
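The caching behavior can be sketched as a simple read-through cache on the alert's metadata. The `ai_explanation` key and the dict shapes below are assumptions for illustration, not the platform's actual storage format:

```python
def get_or_generate_explanation(alert: dict, generate) -> dict:
    """Return the cached explanation if present, otherwise invoke the
    LLM-backed `generate` callable and cache its result."""
    cached = alert.get("metadata", {}).get("ai_explanation")
    if cached is not None:
        return cached  # cache hit: no LLM call
    explanation = generate(alert)
    alert.setdefault("metadata", {})["ai_explanation"] = explanation
    return explanation
```

Clearing `metadata["ai_explanation"]` is what makes a subsequent Explain click regenerate rather than replay.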

Context Signals

The explanation service gathers four categories of context before calling the LLM:

Sync Events

Recent connector sync events for the alert's data product are examined. Failed syncs, partial loads, or unusually long sync durations can indicate upstream pipeline issues that correlate with the anomaly.
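A minimal sketch of that heuristic, assuming sync events carry `status` and `duration_s` fields (illustrative names, not the real event schema):

```python
def suspicious_sync_events(events: list[dict], max_duration_s: int = 3600) -> list[dict]:
    """Flag sync events that may correlate with an anomaly: failed syncs,
    partial loads, or runs longer than `max_duration_s` seconds."""
    return [
        e for e in events
        if e["status"] in ("failed", "partial")
        or e.get("duration_s", 0) > max_duration_s
    ]
```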

Upstream Anomalies

The service traverses the product's upstream lineage and checks whether any source products have active anomaly alerts. If an upstream dependency is itself experiencing data issues, that's a strong signal for cascading impact.
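The traversal can be sketched as a breadth-first walk over the lineage graph. The `upstream_of` and `has_active_anomaly` lookups are hypothetical stand-ins for the real lineage and alerting services:

```python
def upstream_anomalies(product_id: str, upstream_of, has_active_anomaly) -> list[str]:
    """Walk the upstream lineage breadth-first and collect the IDs of
    source products that have an active anomaly alert."""
    seen = {product_id}
    queue = list(upstream_of(product_id))
    flagged = []
    while queue:
        pid = queue.pop(0)
        if pid in seen:
            continue  # lineage graphs can share ancestors; visit once
        seen.add(pid)
        if has_active_anomaly(pid):
            flagged.append(pid)
        queue.extend(upstream_of(pid))
    return flagged
```

Tracking visited nodes keeps the walk terminating even when products share upstream dependencies.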

Quality Check Failures

Recent failed quality checks on the same product are gathered. If null-rate checks, freshness checks, or referential integrity checks have failed around the same time, they may be related to the anomaly.
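The time-window correlation might look like the sketch below, assuming checks expose a `status` and an epoch-seconds `ts` field (illustrative names):

```python
def related_quality_failures(checks: list[dict], alert_ts: int,
                             window_s: int = 6 * 3600) -> list[dict]:
    """Return failed quality checks whose timestamp falls within
    `window_s` seconds of the alert's timestamp."""
    return [
        c for c in checks
        if c["status"] == "failed" and abs(c["ts"] - alert_ts) <= window_s
    ]
```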

Schema Drift

Recent schema changes are detected by scanning audit log entries for the product. Column additions, type changes, or column removals can cause downstream metric anomalies.
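A sketch of that audit-log scan, where the action names are assumptions chosen to mirror the changes listed above:

```python
# Hypothetical audit-log action names for schema-affecting changes.
SCHEMA_ACTIONS = {"column_added", "column_removed", "type_changed"}

def recent_schema_changes(audit_entries: list[dict], since_ts: int) -> list[dict]:
    """Filter audit log entries down to schema-affecting actions that
    occurred at or after `since_ts` (epoch seconds)."""
    return [
        e for e in audit_entries
        if e["action"] in SCHEMA_ACTIONS and e["ts"] >= since_ts
    ]
```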

Understanding the Explanation

The generated explanation includes:

| Field | Description |
| --- | --- |
| Explanation | A natural-language narrative describing the most likely root cause |
| Confidence | How confident the AI is in the hypothesis (low, medium, high) |
| Contributing Factors | A list of specific signals that informed the analysis (e.g., "upstream product X has an active anomaly", "sync failed 2 hours before alert") |
Tip: The explanation is a starting point for investigation, not a definitive diagnosis. Always verify the AI's hypothesis against your operational knowledge before taking action.
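As a sketch, the three fields above could be modeled with a small dataclass. The class and defaults are illustrative, not the platform's actual response type:

```python
from dataclasses import dataclass, field

@dataclass
class AnomalyExplanation:
    """Shape of a generated explanation, mirroring the fields above."""
    explanation: str                     # natural-language root-cause narrative
    confidence: str                      # "low" | "medium" | "high"
    contributing_factors: list[str] = field(default_factory=list)
```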

How It Works

  1. The alert is loaded and validated (must be an anomaly or trend type alert)
  2. Context signals are gathered in parallel — sync events, upstream anomalies, quality failures, schema drift
  3. A structured prompt is sent to the configured LLM with all gathered context
  4. The LLM response is parsed into explanation, confidence, and contributing factors
  5. The result is stored in the alert's metadata for caching
  6. The interaction is logged as an AI log entry for token tracking
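The six steps above can be sketched end to end. Everything here is a stand-in: `ctx` bundles hypothetical context lookups, `llm` represents the configured model call, and `log_ai_usage` represents the AI log entry. Context gathering is shown sequentially for brevity, though the service runs it in parallel:

```python
def explain_alert(alert: dict, ctx: dict, llm, log_ai_usage) -> dict:
    # 1. Validate the alert type.
    if alert["type"] not in ("anomaly", "trend"):
        raise ValueError("explanations require an anomaly or trend alert")

    # 2. Gather the four context signals for the alert's data product.
    pid = alert["product_id"]
    context = {
        "sync_events": ctx["sync_events"](pid),
        "upstream_anomalies": ctx["upstream_anomalies"](pid),
        "quality_failures": ctx["quality_failures"](pid),
        "schema_drift": ctx["schema_drift"](pid),
    }

    # 3-4. Prompt the LLM with the context and parse its structured reply.
    response = llm(alert, context)
    result = {
        "explanation": response["explanation"],
        "confidence": response["confidence"],
        "contributing_factors": response["contributing_factors"],
    }

    # 5. Cache the result on the alert's metadata.
    alert.setdefault("metadata", {})["ai_explanation"] = result

    # 6. Record the interaction for token tracking.
    log_ai_usage(alert["id"], response.get("tokens", 0))
    return result
```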