
Quality as Code (YAML)

Rather than configuring quality checks one-by-one through the UI, you can define all your quality rules in a YAML configuration file. This file lives alongside your data transformation code and is managed with the Qarion CLI — giving you version control, code review, and reproducibility for free.

Why Quality as Code?

| Benefit | Description |
| --- | --- |
| Version control | Rules live in Git, so every change has a full audit trail |
| Code review | Thresholds and queries go through pull requests before deployment |
| Reproducibility | Bootstrap a new environment by running `qarion quality apply` |
| CI/CD integration | Embed quality gates directly into deployment pipelines |

YAML File Structure

A DQ config file (qarion-dq.yaml by default) has four top-level keys:

```yaml
version: "1.0"        # Config schema version (optional, default "1.0")
space: acme-analytics # Target space slug (required)

defaults:             # Inherited by every check (optional)
  connector: warehouse-snowflake
  schedule: "0 6 * * *" # Daily at 6 AM

checks:               # One or more check definitions (required)
  - slug: orders-row-count
    name: Orders Row Count
    type: sql_metric
    query: "SELECT COUNT(*) FROM analytics.orders"
    thresholds:
      operator: gte
      value: 1000
```

Defaults

The defaults block reduces repetition. Every check inherits these values unless it explicitly provides its own:

| Key | Description |
| --- | --- |
| `connector` | Connector slug used to execute checks |
| `schedule` | Cron expression for scheduled runs |
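For instance, a check that needs a different schedule simply declares its own while still inheriting the default connector (the check slug and table below are illustrative):

```yaml
defaults:
  connector: warehouse-snowflake
  schedule: "0 6 * * *"

checks:
  - slug: late-loading-count        # illustrative slug
    name: Late Loading Count
    type: sql_metric
    query: "SELECT COUNT(*) FROM analytics.late_table"
    schedule: "0 9 * * *"           # overrides defaults.schedule; connector is inherited
    thresholds:
      operator: gte
      value: 1
```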

Check Fields

| Key | Required | Description |
| --- | --- | --- |
| `slug` | Yes | Unique identifier within the space (URL-safe) |
| `name` | Yes | Human-readable display name |
| `type` | Yes | Check type; see Check Types below |
| `description` | No | Explains what the check validates |
| `query` | SQL types | SQL query (for SQL-based types) |
| `connector` | No | Connector slug (overrides default) |
| `product` | No | Target product slug to associate the check with |
| `schedule` | No | Cron schedule (overrides default) |
| `thresholds` | No | Pass/fail criteria (see Thresholds) |
| `configuration` | No | Type-specific settings (see per-type examples) |
| `parameters` | No | Query parameter definitions (see Parameters) |

Check Type Reference

SQL Checks

sql_metric

Returns a single numeric value, evaluated against a threshold.

```yaml
- slug: orders-row-count
  name: Orders Row Count
  type: sql_metric
  query: "SELECT COUNT(*) FROM analytics.orders"
  product: orders-table
  thresholds:
    operator: gte
    value: 1000
```

sql_condition

Fails if any rows are returned. Ideal for "this should never happen" assertions.

```yaml
- slug: no-negative-amounts
  name: No Negative Order Amounts
  type: sql_condition
  query: "SELECT * FROM analytics.orders WHERE amount < 0"
  product: orders-table
```

Field-Level Checks

Field checks target a specific column. They require field_name and table_name in the configuration block.

| Type | What it validates |
| --- | --- |
| `null_check` | No null values in the column |
| `uniqueness` | No duplicate values |
| `type_check` | Values conform to an expected type |
| `range_check` | Numeric values within `min_value` / `max_value` |
| `pattern_check` | String values match a regex `pattern` |
| `enum_check` | Values belong to an `allowed_values` list |
| `length_check` | String length within `min_length` / `max_length` |
| `freshness_check` | Most recent timestamp within `max_age_hours` |

Example — null check:

```yaml
- slug: users-email-not-null
  name: Users Email Not Null
  type: null_check
  product: users-table
  configuration:
    field_name: email
    table_name: analytics.users
```

Example — enum check:

```yaml
- slug: orders-status-valid
  name: Order Status Validation
  type: enum_check
  product: orders-table
  configuration:
    field_name: status
    table_name: analytics.orders
    allowed_values:
      - pending
      - confirmed
      - shipped
      - delivered
      - cancelled
```

Example — freshness check:

```yaml
- slug: orders-freshness
  name: Orders Table Freshness
  type: freshness_check
  product: orders-table
  configuration:
    field_name: updated_at
    table_name: analytics.orders
    max_age_hours: 24
```
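Conceptually, a freshness check compares the newest timestamp in the column against `max_age_hours`. A minimal sketch of that evaluation (our reading of the semantics, not Qarion's implementation):

```python
from datetime import datetime, timedelta, timezone
from typing import Optional

def is_fresh(latest: datetime, max_age_hours: float, now: Optional[datetime] = None) -> bool:
    """Return True when the newest timestamp is within the allowed age."""
    now = now or datetime.now(timezone.utc)
    return now - latest <= timedelta(hours=max_age_hours)

# A row updated 6 hours ago passes a 24-hour freshness check; one 30 hours old fails
now = datetime(2024, 1, 15, 12, 0, tzinfo=timezone.utc)
assert is_fresh(now - timedelta(hours=6), max_age_hours=24, now=now)
assert not is_fresh(now - timedelta(hours=30), max_age_hours=24, now=now)
```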

Composite Checks

field_checks

Bundles multiple field assertions for one table into a single check:

```yaml
- slug: users-field-suite
  name: Users Field Quality Suite
  type: field_checks
  product: users-table
  configuration:
    table_name: analytics.users
    checks:
      - field: id
        assertion: uniqueness
      - field: email
        assertion: not_null
      - field: created_at
        assertion: not_null
```

reconciliation

Compares results from two SQL queries with optional tolerance:

```yaml
- slug: revenue-reconciliation
  name: Revenue Reconciliation
  type: reconciliation
  configuration:
    source_query: "SELECT SUM(amount) FROM staging.revenue"
    target_query: "SELECT SUM(amount) FROM prod.revenue"
    comparison_mode: percentage # exact | percentage | absolute
    tolerance: 0.01
```
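The three comparison modes can be pictured with a small sketch. This is our interpretation of how `tolerance` interacts with each mode (in particular, treating `percentage` tolerance as a fraction of the target); the platform's exact semantics may differ:

```python
def reconcile(source: float, target: float, mode: str = "exact", tolerance: float = 0.0) -> bool:
    """Compare two query results under the given comparison mode (illustrative)."""
    diff = abs(source - target)
    if mode == "exact":
        return diff == 0
    if mode == "absolute":
        return diff <= tolerance           # allowed absolute drift
    if mode == "percentage":
        return diff <= abs(target) * tolerance  # 0.01 allows 1% relative drift
    raise ValueError(f"unknown comparison_mode: {mode}")

assert reconcile(100.0, 100.0, "exact")
assert reconcile(100.5, 100.0, "absolute", tolerance=1.0)
assert reconcile(100.9, 100.0, "percentage", tolerance=0.01)      # 0.9% drift passes
assert not reconcile(102.0, 100.0, "percentage", tolerance=0.01)  # 2% drift fails
```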

External & Manual Checks

| Type | Description |
| --- | --- |
| `custom` | Receives results from external tools (see External Integration) |
| `manual` | Prompts for human-entered values when triggered |
| `anomaly` | Statistical anomaly detection on metric time series |

Thresholds

Thresholds define pass/fail criteria for checks that produce a numeric value.

```yaml
thresholds:
  operator: gte # Comparison: eq, gt, gte, lt, lte, between
  value: 1000   # Fail below this
  warn: 5000    # Warning below this (but above fail value)
```

With this configuration: 800 → fail, 3000 → warning, 6000 → pass.
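A sketch of how such a threshold could be evaluated (our reading of the semantics, not Qarion's source; the `between` form is omitted for brevity):

```python
from typing import Optional

def evaluate(metric: float, operator: str, value: float, warn: Optional[float] = None) -> str:
    """Map a metric to pass/warning/fail against a threshold (illustrative sketch)."""
    ops = {
        "eq":  lambda m, v: m == v,
        "gt":  lambda m, v: m > v,
        "gte": lambda m, v: m >= v,
        "lt":  lambda m, v: m < v,
        "lte": lambda m, v: m <= v,
    }
    if not ops[operator](metric, value):
        return "fail"                 # hard threshold breached
    if warn is not None and not ops[operator](metric, warn):
        return "warning"              # passed the fail bar but not the warn bar
    return "pass"

# Matches the example above: operator=gte, value=1000, warn=5000
assert evaluate(800, "gte", 1000, warn=5000) == "fail"
assert evaluate(3000, "gte", 1000, warn=5000) == "warning"
assert evaluate(6000, "gte", 1000, warn=5000) == "pass"
```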

For range checks, use between:

```yaml
thresholds:
  operator: between
  min: 100
  max: 10000
```

Parameterised Queries

Inject runtime values into SQL queries using {{variable_name}} syntax.

```yaml
- slug: daily-row-count
  name: Daily Row Count
  type: sql_metric
  query: "SELECT COUNT(*) FROM analytics.events WHERE event_date = '{{run_date}}'"
  thresholds:
    operator: gte
    value: 10000
  parameters:
    - name: run_date
      type: string
      default: "2024-01-15"
      description: "Target date partition"
      required: false
```

Override parameter values when triggering a run through the CLI or SDK.
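The substitution itself amounts to replacing each `{{name}}` placeholder with the supplied value, falling back to the declared default. A minimal sketch (not the actual rendering engine, and with no SQL escaping, which a real implementation would need):

```python
import re

def render(query: str, params: dict, defaults: dict) -> str:
    """Substitute {{name}} placeholders from params, then defaults (sketch)."""
    def sub(match: "re.Match") -> str:
        name = match.group(1)
        if name in params:
            return str(params[name])
        if name in defaults:
            return str(defaults[name])
        raise KeyError(f"missing value for parameter {name!r}")
    return re.sub(r"\{\{(\w+)\}\}", sub, query)

query = "SELECT COUNT(*) FROM analytics.events WHERE event_date = '{{run_date}}'"
# Default applies when no override is given; an override wins when present
assert render(query, {}, {"run_date": "2024-01-15"}).endswith("'2024-01-15'")
assert "'2024-02-01'" in render(query, {"run_date": "2024-02-01"}, {"run_date": "2024-01-15"})
```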


Workflow: Validate → Apply → Run

1. Validate

Check the file for errors without modifying anything:

```bash
qarion quality validate -f qarion-dq.yaml
```

2. Apply

Sync definitions to the platform — creates missing checks, updates existing ones:

```bash
qarion quality apply -f qarion-dq.yaml
```

The apply command is idempotent: running it multiple times with the same file produces no changes. It matches checks by slug.
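The slug-matching behaviour can be pictured as an upsert keyed on `slug` (a conceptual sketch, not the CLI's implementation):

```python
def apply(existing: dict, desired: list) -> dict:
    """Upsert desired check definitions into the existing set, keyed by slug."""
    result = dict(existing)
    for check in desired:
        result[check["slug"]] = check   # create if missing, overwrite if present
    return result

existing = {"orders-row-count": {"slug": "orders-row-count", "type": "sql_metric"}}
desired = [
    {"slug": "orders-row-count", "type": "sql_metric"},       # matched: updated in place
    {"slug": "users-email-not-null", "type": "null_check"},   # no match: created
]
once = apply(existing, desired)
twice = apply(once, desired)
assert once == twice   # idempotent: re-applying the same file changes nothing
assert set(once) == {"orders-row-count", "users-email-not-null"}
```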

3. Run

Execute all checks and record results:

```bash
qarion quality run-config -f qarion-dq.yaml
```

Use --no-record for local testing without recording:

```bash
qarion quality run-config -f qarion-dq.yaml --no-record
```
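These three commands slot naturally into a deployment pipeline. A hypothetical GitHub Actions job sketch (the workflow structure, step names, and `QARION_API_KEY` secret are illustrative assumptions, not documented Qarion conventions):

```yaml
quality-gate:
  runs-on: ubuntu-latest
  steps:
    - uses: actions/checkout@v4
    - name: Validate quality config       # fail fast on a malformed file
      run: qarion quality validate -f qarion-dq.yaml
    - name: Apply and run checks
      env:
        QARION_API_KEY: ${{ secrets.QARION_API_KEY }}   # assumed auth mechanism
      run: |
        qarion quality apply -f qarion-dq.yaml
        qarion quality run-config -f qarion-dq.yaml
```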

Integrating External Tools

External validation tools (Great Expectations, dbt tests, Airflow checks) can push their results into Qarion using the push command. This gives you a unified quality dashboard regardless of where checks run.

Pattern

  1. Define a custom or manual check in your YAML config (or via the UI)
  2. Run your external tool as usual
  3. Push the result to Qarion using the CLI or SDK

Great Expectations

After running a Great Expectations checkpoint, push the result:

```bash
# Run your GE checkpoint
great_expectations checkpoint run my_checkpoint

# Push the result to Qarion
qarion quality push my-space ge-orders-validation \
  --status passed \
  --value 100.0
```

Or automate this in a Python script:

```python
import great_expectations as gx
from qarion import QarionSyncClient

# Run the checkpoint
context = gx.get_context()
result = context.run_checkpoint("my_checkpoint")

# Map the outcome to Qarion's status and a pass-percentage value
status = "passed" if result.success else "failed"
passed_pct = (
    result.statistics["successful_expectations"]
    / result.statistics["evaluated_expectations"]
    * 100
)

# Push to Qarion
client = QarionSyncClient(api_key="qk_...")
client.quality.push_result(
    "my-space",
    "ge-orders-validation",
    status=status,
    value=passed_pct,
)
```

dbt Tests

After a dbt run, push test results:

```bash
# Run dbt tests
dbt test

# Push results per check
qarion quality push my-space dbt-not-null-orders-id \
  --status passed
```

General Pattern

Any tool that produces a pass/fail outcome can integrate:

```bash
qarion quality push SPACE SLUG --status passed|failed|error [--value N]
```

The result is recorded as a new execution and triggers alerting if thresholds are breached.


Full Example

A production-grade config with multiple check types, shared defaults, and external integration:

```yaml
version: "1.0"
space: acme-analytics

defaults:
  connector: warehouse-snowflake
  schedule: "0 6 * * *"

checks:
  # SQL metric: row count with warning threshold
  - slug: orders-row-count
    name: Orders Row Count
    type: sql_metric
    description: "Ensure orders table has minimum rows"
    query: "SELECT COUNT(*) FROM analytics.orders"
    product: orders-table
    schedule: "0 8 * * *"
    thresholds:
      operator: gte
      value: 1000
      warn: 5000   # warning threshold sits above the fail value for gte

  # Field-level null check
  - slug: users-email-not-null
    name: Users Email Not Null
    type: null_check
    product: users-table
    configuration:
      field_name: email
      table_name: analytics.users

  # Condition: no orphaned records
  - slug: no-orphaned-orders
    name: No Orphaned Orders
    type: sql_condition
    query: >
      SELECT o.id
      FROM analytics.orders o
      LEFT JOIN analytics.customers c ON o.customer_id = c.id
      WHERE c.id IS NULL

  # External: Great Expectations result
  - slug: ge-orders-validation
    name: GE Orders Validation
    type: custom
    description: "Result pushed from Great Expectations checkpoint"
    product: orders-table

  # Freshness
  - slug: orders-freshness
    name: Orders Freshness
    type: freshness_check
    product: orders-table
    configuration:
      field_name: updated_at
      table_name: analytics.orders
      max_age_hours: 24
```
