# Quality as Code (YAML)
Rather than configuring quality checks one-by-one through the UI, you can define all your quality rules in a YAML configuration file. This file lives alongside your data transformation code and is managed with the Qarion CLI — giving you version control, code review, and reproducibility for free.
## Why Quality as Code?
| Benefit | Description |
|---|---|
| Version control | Rules live in Git — every change has a full audit trail |
| Code review | Thresholds and queries go through pull requests before deployment |
| Reproducibility | Bootstrap a new environment by running `qarion quality apply` |
| CI/CD integration | Embed quality gates directly into deployment pipelines |
## YAML File Structure

A DQ config file (`qarion-dq.yaml` by default) has four top-level keys:

```yaml
version: "1.0"          # Config schema version (optional, default "1.0")
space: acme-analytics   # Target space slug (required)

defaults:               # Inherited by every check (optional)
  connector: warehouse-snowflake
  schedule: "0 6 * * *" # Daily at 6 AM

checks:                 # One or more check definitions (required)
  - slug: orders-row-count
    name: Orders Row Count
    type: sql_metric
    query: "SELECT COUNT(*) FROM analytics.orders"
    thresholds:
      operator: gte
      value: 1000
```
### Defaults

The `defaults` block reduces repetition. Every check inherits these values unless it explicitly provides its own:

| Key | Description |
|---|---|
| connector | Connector slug used to execute checks |
| schedule | Cron expression for scheduled runs |
### Check Fields

| Key | Required | Description |
|---|---|---|
| slug | ✅ | Unique identifier within the space (URL-safe) |
| name | ✅ | Human-readable display name |
| type | ✅ | Check type — see Check Types below |
| description | | Explains what the check validates |
| query | | SQL query (for SQL-based types) |
| connector | | Connector slug (overrides default) |
| product | | Target product slug to associate the check with |
| schedule | | Cron schedule (overrides default) |
| thresholds | | Pass/fail criteria (see Thresholds) |
| configuration | | Type-specific settings (see per-type examples) |
| parameters | | Query parameter definitions (see Parameters) |
## Check Type Reference

### SQL Checks

#### sql_metric

Returns a single numeric value, evaluated against a threshold.

```yaml
- slug: orders-row-count
  name: Orders Row Count
  type: sql_metric
  query: "SELECT COUNT(*) FROM analytics.orders"
  product: orders-table
  thresholds:
    operator: gte
    value: 1000
```
#### sql_condition

Fails if any rows are returned. Ideal for "this should never happen" assertions.

```yaml
- slug: no-negative-amounts
  name: No Negative Order Amounts
  type: sql_condition
  query: "SELECT * FROM analytics.orders WHERE amount < 0"
  product: orders-table
```
### Field-Level Checks

Field checks target a specific column. They require `field_name` and `table_name` in the `configuration` block.

| Type | What it validates |
|---|---|
| null_check | No null values in the column |
| uniqueness | No duplicate values |
| type_check | Values conform to an expected type |
| range_check | Numeric values within `min_value` / `max_value` |
| pattern_check | String values match a regex pattern |
| enum_check | Values belong to an `allowed_values` list |
| length_check | String length within `min_length` / `max_length` |
| freshness_check | Most recent timestamp within `max_age_hours` |
Example — null check:

```yaml
- slug: users-email-not-null
  name: Users Email Not Null
  type: null_check
  product: users-table
  configuration:
    field_name: email
    table_name: analytics.users
```
Example — enum check:

```yaml
- slug: orders-status-valid
  name: Order Status Validation
  type: enum_check
  product: orders-table
  configuration:
    field_name: status
    table_name: analytics.orders
    allowed_values:
      - pending
      - confirmed
      - shipped
      - delivered
      - cancelled
```
Example — freshness check:

```yaml
- slug: orders-freshness
  name: Orders Table Freshness
  type: freshness_check
  product: orders-table
  configuration:
    field_name: updated_at
    table_name: analytics.orders
    max_age_hours: 24
```
### Composite Checks

#### field_checks

Bundles multiple field assertions for one table into a single check:

```yaml
- slug: users-field-suite
  name: Users Field Quality Suite
  type: field_checks
  product: users-table
  configuration:
    table_name: analytics.users
    checks:
      - field: id
        assertion: uniqueness
      - field: email
        assertion: not_null
      - field: created_at
        assertion: not_null
```
#### reconciliation

Compares results from two SQL queries with optional tolerance:

```yaml
- slug: revenue-reconciliation
  name: Revenue Reconciliation
  type: reconciliation
  configuration:
    source_query: "SELECT SUM(amount) FROM staging.revenue"
    target_query: "SELECT SUM(amount) FROM prod.revenue"
    comparison_mode: percentage # exact | percentage | absolute
    tolerance: 0.01
```
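The three comparison modes can be sketched as follows. Note the interpretation of `tolerance` as a fraction of the target in `percentage` mode (so `0.01` means 1%) is an assumption for illustration, not a statement of the platform's exact semantics:

```python
def reconcile(source: float, target: float, mode: str = "exact", tolerance: float = 0.0) -> bool:
    """Compare two query results under the given comparison mode."""
    diff = abs(source - target)
    if mode == "exact":
        return diff == 0
    if mode == "absolute":
        return diff <= tolerance
    if mode == "percentage":
        # Assumed: tolerance is a fraction of the target, e.g. 0.01 == 1%
        return diff <= tolerance * abs(target)
    raise ValueError(f"unknown comparison_mode: {mode}")


assert reconcile(100.0, 100.0, "exact")
assert reconcile(100.4, 100.0, "absolute", tolerance=0.5)
assert reconcile(100.9, 100.0, "percentage", tolerance=0.01)  # within 1%
```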
### External & Manual Checks

| Type | Description |
|---|---|
| custom | Receives results from external tools (see External Integration) |
| manual | Prompts for human-entered values when triggered |
| anomaly | Statistical anomaly detection on metric time series |
## Thresholds

Thresholds define pass/fail criteria for checks that produce a numeric value.

```yaml
thresholds:
  operator: gte # Comparison: eq, gt, gte, lt, lte, between
  value: 1000   # Fail below this
  warn: 5000    # Warning below this (but above the fail value)
```

With this configuration: 800 → fail, 3000 → warning, 6000 → pass.

For range checks, use `between`:

```yaml
thresholds:
  operator: between
  min: 100
  max: 10000
```
## Parameterised Queries

Inject runtime values into SQL queries using `{{variable_name}}` syntax.

```yaml
- slug: daily-row-count
  name: Daily Row Count
  type: sql_metric
  query: "SELECT COUNT(*) FROM analytics.events WHERE event_date = '{{run_date}}'"
  thresholds:
    operator: gte
    value: 10000
  parameters:
    - name: run_date
      type: string
      default: "2024-01-15"
      description: "Target date partition"
      required: false
```
Override parameter values when triggering a run through the CLI or SDK.
## Workflow: Validate → Apply → Run

### 1. Validate

Check the file for errors without modifying anything:

```shell
qarion quality validate -f qarion-dq.yaml
```

### 2. Apply

Sync definitions to the platform — creates missing checks, updates existing ones:

```shell
qarion quality apply -f qarion-dq.yaml
```
The `apply` command is idempotent: running it multiple times with the same file produces no changes. It matches checks by slug.
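Idempotent matching by slug is essentially an upsert keyed on the slug. A sketch of the idea (illustrative, not the CLI's actual implementation):

```python
def apply_checks(existing: dict[str, dict], desired: list[dict]) -> dict[str, dict]:
    """Upsert desired check definitions into the existing set, keyed by slug.

    Applying the same list twice yields the same result (idempotent).
    """
    result = dict(existing)
    for check in desired:
        result[check["slug"]] = check  # create if missing, overwrite if present
    return result


desired = [{"slug": "orders-row-count", "type": "sql_metric"}]
once = apply_checks({}, desired)
twice = apply_checks(once, desired)
assert once == twice  # a second apply changes nothing
```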
### 3. Run

Execute all checks and record results:

```shell
qarion quality run-config -f qarion-dq.yaml
```

Use `--no-record` for local testing without recording:

```shell
qarion quality run-config -f qarion-dq.yaml --no-record
```
## Integrating External Tools

External validation tools (Great Expectations, dbt tests, Airflow checks) can push their results into Qarion using the `push` command. This gives you a unified quality dashboard regardless of where checks run.

### Pattern

1. Define a `custom` or `manual` check in your YAML config (or via the UI)
2. Run your external tool as usual
3. Push the result to Qarion using the CLI or SDK
### Great Expectations

After running a Great Expectations checkpoint, push the result:

```shell
# Run your GE checkpoint
great_expectations checkpoint run my_checkpoint

# Push the result to Qarion
qarion quality push my-space ge-orders-validation \
  --status passed \
  --value 100.0
```
Or automate this in a Python script:

```python
import great_expectations as gx

from qarion import QarionSyncClient

# Run checkpoint
context = gx.get_context()
result = context.run_checkpoint("my_checkpoint")

# Map result to Qarion
status = "passed" if result.success else "failed"
passed_pct = (
    result.statistics["successful_expectations"]
    / result.statistics["evaluated_expectations"]
    * 100
)

# Push to Qarion
client = QarionSyncClient(api_key="qk_...")
client.quality.push_result(
    "my-space",
    "ge-orders-validation",
    status=status,
    value=passed_pct,
)
```
### dbt Tests

After a dbt run, push test results:

```shell
# Run dbt tests
dbt test

# Push results per check
qarion quality push my-space dbt-not-null-orders-id \
  --status passed
```
### General Pattern

Any tool that produces a pass/fail outcome can integrate:

```shell
qarion quality push SPACE SLUG --status passed|failed|error [--value N]
```

The result is recorded as a new execution and triggers alerting if thresholds are breached.
## Full Example

A production-grade config with multiple check types, shared defaults, and external integration. (Note the first check's thresholds: with `gte`, the fail value must sit below the warn value.)

```yaml
version: "1.0"
space: acme-analytics

defaults:
  connector: warehouse-snowflake
  schedule: "0 6 * * *"

checks:
  # SQL metric — row count with warning threshold
  - slug: orders-row-count
    name: Orders Row Count
    type: sql_metric
    description: "Ensure orders table has minimum rows"
    query: "SELECT COUNT(*) FROM analytics.orders"
    product: orders-table
    schedule: "0 8 * * *"
    thresholds:
      operator: gte
      value: 500
      warn: 1000

  # Field-level null check
  - slug: users-email-not-null
    name: Users Email Not Null
    type: null_check
    product: users-table
    configuration:
      field_name: email
      table_name: analytics.users

  # Condition — no orphaned records
  - slug: no-orphaned-orders
    name: No Orphaned Orders
    type: sql_condition
    query: >
      SELECT o.id
      FROM analytics.orders o
      LEFT JOIN analytics.customers c ON o.customer_id = c.id
      WHERE c.id IS NULL

  # External — Great Expectations result
  - slug: ge-orders-validation
    name: GE Orders Validation
    type: custom
    description: "Result pushed from Great Expectations checkpoint"
    product: orders-table

  # Freshness
  - slug: orders-freshness
    name: Orders Freshness
    type: freshness_check
    product: orders-table
    configuration:
      field_name: updated_at
      table_name: analytics.orders
      max_age_hours: 24
```
## Learn More
- Quality Checks — Creating and managing checks in the UI
- Continuous Monitoring — Monitoring strategies and best practices
- Quality Dashboard — Viewing aggregated quality health