Quality Framework

Qarion's quality framework provides automated monitoring of data health through configurable checks that run on a schedule, detect problems, and surface alerts when something goes wrong. This page explains the quality dimensions the platform measures, the check types available, how severity and scheduling work, and how quality integrates with the broader governance model.

Quality Dimensions

Data quality is a multi-faceted concept. Qarion organizes it into five key dimensions, each addressing a different aspect of data health:

| Dimension | Question | How Qarion Measures |
|---|---|---|
| Freshness | Is the data current? | Timestamp age checks |
| Completeness | Is all expected data present? | Row counts, null checks |
| Uniqueness | Are there duplicates? | Primary key validation |
| Validity | Does data match expected formats? | Custom SQL assertions |
| Consistency | Is data aligned across sources? | Cross-product checks |

Not every product needs checks across all five dimensions, but thinking in these terms helps teams identify the most important quality risks for each dataset and prioritize monitoring accordingly.


Quality Check Types

Freshness

Freshness checks monitor how recently a dataset was updated. They work by examining a timestamp column (typically updated_at or loaded_at) and comparing the most recent value to a maximum age threshold. If the data is older than the threshold, the check fails.

{
  "check_type": "freshness",
  "config": {
    "timestamp_column": "updated_at",
    "max_age_hours": 24
  }
}

Freshness checks are often the single most valuable type of quality monitoring, because stale data is a symptom of almost every pipeline failure — whether it's a crashed job, a blocked dependency, or a permissions issue. If you only set up one check per product, make it a freshness check.


Row Count

Row count checks validate that the number of rows in a dataset falls within expected bounds. You specify minimum and maximum thresholds, and the check fails if the actual count falls outside that range.

{
  "check_type": "row_count",
  "config": {
    "min_rows": 1000,
    "max_rows": 1000000
  }
}

Unexpected drops in row count often indicate that a data load failed or was only partially completed, while unexpected spikes can signal duplicate ingestion or a runaway upstream process. Row count checks serve as a simple but effective sanity check on pipeline health.


Uniqueness

Uniqueness checks detect duplicate records by examining a specific column — typically the primary key or a business key that should contain only distinct values.

{
  "check_type": "uniqueness",
  "config": {
    "column_name": "id"
  }
}

The check passes only when zero duplicates are found. Duplicates can cause double-counting in aggregations, incorrect joins, and misleading metrics, so uniqueness checks are especially important for products that serve as dimension tables or that feed into financial reporting.
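
Conceptually, a uniqueness check on id is roughly equivalent to the following custom SQL assertion (the orders table name is a hypothetical example); the built-in check type simply packages this pattern so you don't have to write the query yourself:

{
  "check_type": "custom_sql",
  "config": {
    "sql_query": "SELECT COUNT(id) - COUNT(DISTINCT id) FROM orders",
    "expected_value": 0,
    "comparison": "equals"
  }
}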


Not Null

Not null checks ensure that required fields are populated. They examine a single column and fail if any null values are found.

{
  "check_type": "not_null",
  "config": {
    "column_name": "customer_id"
  }
}

This check type is essential for fields that downstream consumers depend on — such as foreign keys, required identifiers, and critical business attributes. A null value in a join key, for example, can silently drop rows from downstream query results without producing any visible error.


Custom SQL

Custom SQL checks provide maximum flexibility by letting you write any SQL assertion and compare the result against an expected value. The query should return a single numeric value, which is then compared using a configurable operator.

{
  "check_type": "custom_sql",
  "config": {
    "sql_query": "SELECT COUNT(*) FROM orders WHERE amount < 0",
    "expected_value": 0,
    "comparison": "equals"
  }
}

Supported comparison operators include equals, greater_than, less_than, and between. Custom SQL checks are the right choice for business rule validation (e.g., "no orders should have negative amounts"), cross-table consistency checks (e.g., "the sum of line items should match the order total"), and any quality rule that can't be expressed by the built-in check types.
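
As a sketch of the cross-table case, a check along these lines (table and column names are illustrative assumptions) asserts that no order's stored total disagrees with the sum of its line items:

{
  "check_type": "custom_sql",
  "config": {
    "sql_query": "SELECT COUNT(*) FROM orders o JOIN (SELECT order_id, SUM(amount) AS line_total FROM order_items GROUP BY order_id) li ON li.order_id = o.id WHERE o.total_amount <> li.line_total",
    "expected_value": 0,
    "comparison": "equals"
  }
}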


Severity Levels

Each quality check is assigned a severity level that determines how its failures are prioritized and communicated:

| Level | Icon | When to Use |
|---|---|---|
| Info | ℹ️ | Minor issues, FYI only |
| Warning | ⚠️ | Degraded quality, needs attention |
| Critical | 🚨 | Data unusable, immediate action required |

Severity affects how alerts appear in the platform — critical alerts are surfaced prominently in dashboards and may trigger immediate notifications, while info-level alerts are logged for trend analysis but don't demand immediate attention. Choosing the right severity is important: if everything is marked as critical, teams quickly develop alert fatigue and start ignoring notifications entirely.


Scheduling

Quality checks run on a schedule defined by a cron expression. The schedule should align with the cadence of the data pipeline that produces the product:

| Schedule | Cron | Use Case |
|---|---|---|
| Every hour | 0 * * * * | Streaming data, fast-moving pipelines |
| Daily 6 AM | 0 6 * * * | Batch pipelines, daily loads |
| Weekly Monday | 0 0 * * 1 | Weekly aggregations |
| Every 15 min | */15 * * * * | Real-time monitoring |

Running checks too frequently wastes compute resources and can generate noisy alerts during normal pipeline windows. Running them too infrequently means problems go undetected for longer. A good rule of thumb is to schedule the check to run shortly after the pipeline is expected to complete — for a daily batch job that finishes at 5 AM, scheduling a freshness check at 6 AM gives a reasonable buffer while still catching failures quickly.
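
For that daily batch scenario, an abridged check definition might pair the freshness config shown earlier with a 6 AM cron schedule (the full create payload appears in the API section below):

{
  "check_type": "freshness",
  "config": {
    "timestamp_column": "updated_at",
    "max_age_hours": 24
  },
  "schedule": "0 6 * * *"
}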


Quality Score

Each data product has an aggregated quality score that provides a quick summary of its overall health. The score is calculated as the ratio of passing checks to total checks:

Quality Score = Passing Checks / Total Checks
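
For example, a product with 20 checks of which 19 are currently passing has a quality score of 19 / 20 = 0.95.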

This score maps to a visual health indicator in the catalog:

| Score | Status | Color |
|---|---|---|
| ≥ 0.95 | Healthy | 🟢 Green |
| 0.80 - 0.94 | Warning | 🟡 Yellow |
| < 0.80 | Critical | 🔴 Red |

The quality score is a useful signal for data consumers deciding whether to trust a dataset, and it can also be referenced in data contracts as an SLA term (for example, requiring that a product maintain a minimum quality score of 0.95).


Alert Lifecycle

When a quality check fails, the platform creates an alert and tracks it through a lifecycle:

An alert begins in the Open state, indicating that it needs attention. When someone begins investigating, they can move it to Acknowledged to signal that the problem is being worked on. Once the underlying issue is fixed and the check passes again, the alert transitions to Resolved — either through manual intervention or automatically when the next scheduled run succeeds.

For problems that require more formal tracking, open alerts can be escalated into issues, which provide a richer workflow with assignments, priorities, and comment threads.
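
Summarized as a rough state flow (escalation branches off the normal path rather than replacing it):

Open ──investigation begins──▶ Acknowledged ──fix verified──▶ Resolved
Open ──next scheduled run passes──▶ Resolved
Open ──needs formal tracking──▶ escalated to an Issue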


Quality Checks via API

Create Check

To create a new quality check, send a POST request with the check configuration:

POST /quality/checks
{
  "name": "Orders Freshness",
  "product_id": "product-uuid",
  "check_type": "freshness",
  "config": {
    "timestamp_column": "updated_at",
    "max_age_hours": 24
  },
  "schedule": "0 * * * *",
  "severity": "critical"
}

Trigger Manual Run

You can trigger a check to run immediately outside of its scheduled cadence, which is useful for validating that a newly created check is working correctly or for re-checking after a fix:

POST /quality/checks/{id}/run

Get Check History

To review the recent execution history of a check, use the history endpoint with an optional days parameter:

GET /quality/checks/{id}/history?days=7

Integration with Governance

Quality monitoring doesn't exist in isolation — it connects to the broader governance framework at several points.

Steward Responsibility

The product Steward is the person most directly responsible for quality. They define the quality rules and thresholds, set appropriate severity levels, and are the first responder when quality alerts are triggered. By connecting quality checks to governance roles, Qarion ensures that quality monitoring has a clear human owner.

Contract SLAs

Data contracts between producers and consumers can include explicit quality requirements as part of their SLA terms:

{
  "sla": {
    "quality_score_min": 0.95
  }
}

When a product's quality score drops below the contracted threshold, the platform automatically tracks the violation, making quality a measurable and enforceable commitment — not just a best-effort aspiration.

Issue Integration

Critical alerts that require structured investigation and resolution can be escalated into formal issues. This creates a bridge between automated quality monitoring and the human-driven issue management workflow, ensuring that significant quality problems are tracked to resolution with full accountability.


Best Practices

Start with Freshness

If you're setting up quality monitoring for the first time, begin by adding a freshness check to every product. Freshness failures are the most common and most impactful data quality problem, and a single freshness check catches a wide variety of pipeline failures.

Layer Your Checks

As your quality program matures, build up monitoring in tiers:

Tier 1: Freshness + Row Count (all products)
Tier 2: Uniqueness + Not Null (critical fields)
Tier 3: Custom SQL (business rules)

This tiered approach ensures comprehensive coverage for important products without overwhelming your team with alerts on assets that don't warrant close monitoring.
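
As a concrete sketch, a Tier 1 baseline for a hypothetical daily-loaded orders product might consist of just two check definitions (names, thresholds, and severities here are illustrative, not prescribed defaults):

[
  {
    "name": "Orders Freshness",
    "check_type": "freshness",
    "config": { "timestamp_column": "updated_at", "max_age_hours": 24 },
    "schedule": "0 6 * * *",
    "severity": "critical"
  },
  {
    "name": "Orders Row Count",
    "check_type": "row_count",
    "config": { "min_rows": 1000, "max_rows": 1000000 },
    "schedule": "0 6 * * *",
    "severity": "warning"
  }
]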

Right-Size Severity

Reserve the Critical severity for genuine emergencies — situations where data is unusable and downstream processes are at risk. Use Warning for degraded quality that needs attention but isn't an emergency, and Info for monitoring trends over time. If most of your alerts are critical, none of them effectively are.

Test Your Checks

Before enabling a new check on a production schedule, run it manually a few times to verify that the logic works correctly. Check historical data to confirm that the thresholds you've set would have produced reasonable results over the past few days or weeks, and adjust thresholds based on normal variance to avoid false positives.
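
The manual-run and history endpoints described above support this workflow directly: trigger the check on demand, then review the recorded results before relying on its schedule.

POST /quality/checks/{id}/run
GET /quality/checks/{id}/history?days=7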