Data Quality Engine

The Data Quality (DQ) Engine is the automated heart of the platform, responsible for executing checks, detecting anomalies, and generating alerts.

Architecture

1. Provider-Based Execution

The engine supports multiple execution backends via a Provider Architecture; a minimal interface sketch follows the list below.

  • SQL Provider: Executes SQL queries directly against a target data warehouse (Snowflake, BigQuery, Postgres).
  • dbt Provider: Integrates with dbt tests to import results.
  • Python Provider: Runs custom Python logic for complex validations.
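What a provider looks like in code is not specified here, so the following is a minimal sketch of the abstraction under stated assumptions: the names CheckProvider, CheckResult, and SQLProvider, and the shape of the check configuration, are illustrative, not the platform's actual API.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass


@dataclass
class CheckResult:
    # Hypothetical result shape: pass/fail plus the measured value.
    passed: bool
    observed_value: float | None = None
    message: str = ""


class CheckProvider(ABC):
    """Illustrative base class that each execution backend would implement."""

    @abstractmethod
    def run(self, check_config: dict) -> CheckResult:
        ...


class SQLProvider(CheckProvider):
    """Sketch of the SQL backend: run a query against the warehouse and compare."""

    def __init__(self, connection):
        self.connection = connection

    def run(self, check_config: dict) -> CheckResult:
        # Execute the check's SQL and compare the scalar result to a threshold.
        row = self.connection.execute(check_config["sql"]).fetchone()
        observed = float(row[0])
        passed = observed <= check_config.get("max_allowed", 0)
        return CheckResult(passed=passed, observed_value=observed)
```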

2. Execution Model

Checks can be executed:

  • Scheduled: Via cron expressions managed by the internal scheduler (see the scheduling sketch after this list).
  • Ad-hoc: Triggered manually by users.
  • Event-driven: Triggered via API or Webhook.
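For the scheduled path, the internal scheduler is not described beyond "cron expressions", so here is one way such a loop could work, using the croniter library to compute the next due time; the check dictionaries and the enqueue callable are assumptions for illustration.

```python
import asyncio
from datetime import datetime, timezone

from croniter import croniter


async def schedule_loop(checks: list[dict], enqueue) -> None:
    """Naive scheduler sketch: enqueue each check whenever its cron expression is due."""
    next_runs = {
        c["id"]: croniter(c["cron"], datetime.now(timezone.utc)).get_next(datetime)
        for c in checks
    }
    while True:
        now = datetime.now(timezone.utc)
        for check in checks:
            if next_runs[check["id"]] <= now:
                # Ad-hoc and event-driven triggers would call the same enqueue path.
                await enqueue(check["id"])
                next_runs[check["id"]] = croniter(check["cron"], now).get_next(datetime)
        await asyncio.sleep(30)
```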

3. Asynchronous Processing

Check execution is always asynchronous and follows the flow below (sketched in code after the steps).

  1. Request: API receives a request to run a check.
  2. Queue: A task is pushed to the Redis queue (Arq).
  3. Worker: A worker picks up the task, instantiates the appropriate provider, and executes the logic.
  4. Result: Results are written to the database and analyzed for failure conditions.
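Since the queue is Redis-backed via Arq, the four steps above might map onto code roughly as follows; load_check, get_provider, and save_result are hypothetical stand-ins for the platform's internals.

```python
from arq import create_pool
from arq.connections import RedisSettings


# Hypothetical stand-ins for platform internals (not the real API).
async def load_check(check_id: int) -> dict: ...
def get_provider(name: str): ...
async def save_result(check_id: int, result) -> None: ...


# Steps 1-2: the API handler receives the request and pushes a task onto the Redis queue.
async def trigger_check(check_id: int) -> None:
    pool = await create_pool(RedisSettings())
    await pool.enqueue_job("run_check", check_id)


# Steps 3-4: a worker picks up the task, instantiates the provider, executes it,
# and persists the result for failure analysis.
async def run_check(ctx: dict, check_id: int) -> None:
    check = await load_check(check_id)
    provider = get_provider(check["provider"])
    result = provider.run(check)
    await save_result(check_id, result)


class WorkerSettings:
    functions = [run_check]
    redis_settings = RedisSettings()
```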

Alerting Integration

When a check fails, the engine integrates with the Unified Annotation System; the handoff is sketched after the list below.

  • Alert Generation: A DQAlert is created.
  • Notification: The Notification Service is triggered to dispatch alerts to configured channels (Slack, Email).
  • Remediation: Users can annotate the alert, assign it to a user, or link it to a Jira ticket.
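A hedged sketch of that handoff: DQAlert and the channel names come from the description above, but the fields, the notifier interface, and handle_failure are assumptions.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass
class DQAlert:
    check_id: int
    message: str
    created_at: datetime = field(default_factory=lambda: datetime.now(timezone.utc))
    assignee: str | None = None      # remediation: alert can be assigned to a user
    jira_ticket: str | None = None   # remediation: alert can be linked to a Jira ticket


def handle_failure(check_id: int, message: str, notifier) -> DQAlert:
    # Alert generation: create the DQAlert for the failed check.
    alert = DQAlert(check_id=check_id, message=message)
    # Notification: dispatch to the configured channels (Slack, Email).
    for channel in ("slack", "email"):
        notifier.send(channel, alert)
    return alert
```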

Consistency & Isolation

Standard #111-A: Strict Scoping

All execution logic is strictly scoped to the Space and Dataset to prevent cross-tenant data leakage. The lazy='raise' standard ensures that related data is never fetched implicitly during execution: any relationship that has not been explicitly loaded raises instead of issuing a query.
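The lazy='raise' convention matches SQLAlchemy's raise loader strategy, so a scoping sketch might look like the following; the models and column names are illustrative, not the platform's actual schema.

```python
from sqlalchemy import ForeignKey, select
from sqlalchemy.orm import DeclarativeBase, Mapped, mapped_column, relationship


class Base(DeclarativeBase):
    pass


class Space(Base):
    __tablename__ = "space"
    id: Mapped[int] = mapped_column(primary_key=True)


class Dataset(Base):
    __tablename__ = "dataset"
    id: Mapped[int] = mapped_column(primary_key=True)


class Check(Base):
    __tablename__ = "dq_check"
    id: Mapped[int] = mapped_column(primary_key=True)
    space_id: Mapped[int] = mapped_column(ForeignKey("space.id"))
    dataset_id: Mapped[int] = mapped_column(ForeignKey("dataset.id"))
    # lazy="raise": touching this relationship without an explicit eager load raises,
    # so unrelated rows are never fetched implicitly during execution.
    dataset: Mapped["Dataset"] = relationship(lazy="raise")


def checks_for_run(session, space_id: int, dataset_id: int) -> list[Check]:
    # Every query is scoped to the caller's Space and Dataset.
    stmt = select(Check).where(
        Check.space_id == space_id,
        Check.dataset_id == dataset_id,
    )
    return list(session.scalars(stmt).all())
```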