Lineage & Impact

Data lineage is the ability to trace how data flows between products — from raw sources through transformations to final outputs. In Qarion, lineage relationships are first-class entities that you can define manually, import automatically from tools like dbt, or manage programmatically through the API.

What is Lineage?

At its simplest, lineage describes the upstream and downstream relationships between data products:

Raw Events  →  Cleaned Events  →  Event Metrics  →  Dashboard
(upstream)                                         (downstream)

Upstream products are the sources that feed data into a product. Downstream products are the consumers that depend on it. By recording these relationships, Qarion builds a directed graph that can be traversed in either direction, giving you a complete picture of how data moves through your organization.
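Concretely, this is just a directed graph. A minimal sketch in Python (with illustrative product names, not a Qarion API) shows how the same set of edges can be walked in either direction:

```python
# Minimal sketch: a lineage graph as two adjacency maps, one per direction.
from collections import defaultdict

downstream = defaultdict(list)  # product -> products it feeds
upstream = defaultdict(list)    # product -> products it reads from

def add_edge(source, target):
    downstream[source].append(target)
    upstream[target].append(source)

for src, dst in [
    ("raw_events", "cleaned_events"),
    ("cleaned_events", "event_metrics"),
    ("event_metrics", "dashboard"),
]:
    add_edge(src, dst)

def walk(graph, start):
    """Collect every product reachable from `start` in one direction."""
    seen, stack = set(), [start]
    while stack:
        node = stack.pop()
        for nxt in graph[node]:
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return seen

print(walk(downstream, "raw_events"))  # everything downstream of the raw source
print(walk(upstream, "dashboard"))     # everything the dashboard depends on
```

The same traversal answers both "what breaks if this changes?" (downstream) and "where did this come from?" (upstream).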


Why Lineage Matters

Impact Assessment

Before making a change to a data product — whether it's a schema modification, a pipeline refactor, or a data migration — you need to know what will be affected downstream. Lineage lets you answer questions like "If I change this table's schema, which dashboards will break?" before you deploy, rather than discovering the answer when something fails in production.

Root Cause Analysis

When something goes wrong — a dashboard shows incorrect numbers, a metric stops updating, or a downstream consumer receives unexpected nulls — lineage allows you to trace the problem back to its source. Instead of manually investigating each step in the pipeline, you can walk the lineage graph upstream to pinpoint where the bad data originated.

Compliance and Auditing

Regulatory requirements often demand that organizations can demonstrate where their data comes from and where it goes. Lineage provides the provenance trail needed to answer audit questions like "Show me everywhere customer PII flows" — a question that would otherwise require extensive manual investigation across teams and systems.


Lineage Relationships

Qarion supports several relationship types to capture the nature of the dependency between products:

Type        Description
----------  ----------------------------------------
transforms  Source data is transformed or aggregated
joins       Data is joined with this source
consumes    Data is read but unchanged
derives     Calculated or derived from
These types are informational rather than functional — the platform treats all lineage edges equally when computing impact or rendering graphs — but they provide valuable context when a developer is trying to understand how data flows between products, not just that it flows.

Here is an example showing how a single raw source can feed multiple downstream products through different relationship types:

customer_events (raw)
├── transforms → customer_events_clean
│   ├── joins → customer_dim
│   └── transforms → customer_metrics
│       └── consumes → executive_dashboard
└── consumes → event_monitoring_dashboard

Lineage Graph

Product-Level Lineage

To view the upstream and downstream dependencies of a single product, use the product lineage endpoint:

GET /catalog/spaces/{slug}/products/{id}/lineage?depth=2

The depth parameter controls how many hops from the focal product to include (defaulting to 1), and the direction parameter lets you request only upstream, only downstream, or both.
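As a sketch of how a client might assemble that request — the `direction` values `"upstream"`, `"downstream"`, and `"both"` are assumptions based on the description above, not confirmed parameter values:

```python
from urllib.parse import urlencode

def lineage_url(slug, product_id, depth=1, direction="both"):
    # direction: "upstream", "downstream", or "both" (assumed values)
    query = urlencode({"depth": depth, "direction": direction})
    return f"/catalog/spaces/{slug}/products/{product_id}/lineage?{query}"

print(lineage_url("analytics", "1234", depth=2, direction="upstream"))
# → /catalog/spaces/analytics/products/1234/lineage?depth=2&direction=upstream
```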

Global Lineage

For a broader view that spans the entire graph, use the global lineage endpoint with a selector expression to define your focus:

GET /lineage/graph?selector=customer_metrics+

This query retrieves customer_metrics and all of its downstream consumers. The selector syntax is described in detail below.


dbt-Style Selectors

Qarion supports a selector syntax inspired by dbt's graph operators, providing a concise way to express which portion of the lineage graph you want to retrieve:

Selector    Meaning
----------  -------------------------------
model       Single product
+model      All upstream + product
model+      Product + all downstream
+model+     Full lineage chain
model+2     Product + 2 levels downstream
2+model     2 levels upstream + product

The + operator indicates direction (prefix for upstream, suffix for downstream), and an optional numeric modifier limits the depth of traversal. These selectors compose naturally — 3+customer_metrics+3 retrieves three levels of upstream dependencies, the focal product, and three levels of downstream consumers.
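The grammar is small enough to capture in a few lines. Here is a sketch of a parser for it (not Qarion's actual implementation) that splits a selector into upstream depth, product name, and downstream depth:

```python
import re

def parse_selector(selector):
    """Split a dbt-style selector into (upstream_depth, name, downstream_depth).

    A depth of None means unlimited; 0 means that direction is excluded.
    """
    m = re.fullmatch(r"(?:(\d*)\+)?(\w+)(?:\+(\d*))?", selector)
    if not m:
        raise ValueError(f"invalid selector: {selector!r}")
    up, name, down = m.groups()
    # Absent operator -> depth 0; bare "+" -> unlimited; digits -> that depth.
    to_depth = lambda g: 0 if g is None else (None if g == "" else int(g))
    return to_depth(up), name, to_depth(down)

print(parse_selector("+customer_metrics"))     # (None, 'customer_metrics', 0)
print(parse_selector("customer_metrics+"))     # (0, 'customer_metrics', None)
print(parse_selector("3+customer_metrics+3"))  # (3, 'customer_metrics', 3)
```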

# Everything that feeds into customer_metrics
curl "/lineage/graph?selector=+customer_metrics"

# customer_metrics and all its consumers
curl "/lineage/graph?selector=customer_metrics+"

# Full lineage chain, 3 levels each direction
curl "/lineage/graph?selector=3+customer_metrics+3"

Impact Analysis

Before a Change

The impact analysis endpoint quantifies the blast radius of a potential change to a data product. Given a product ID, it returns the number of affected products, dashboards, contracts, and quality checks, along with a list of stakeholders who should be notified:

GET /lineage/impact?product_id={id}&depth=5

{
  "affected_products": 12,
  "affected_dashboards": 3,
  "affected_contracts": 2,
  "affected_checks": 8,
  "stakeholders": [
    {"name": "Alice (Owner - Customer Metrics)", "email": "..."},
    {"name": "Bob (Steward - Revenue Dashboard)", "email": "..."}
  ]
}

Stakeholder Notification

You can combine impact analysis with programmatic notification to ensure that everyone affected by a change knows about it before it happens:

def notify_before_change(product_id, product_name):
    impact = client.get(f"/lineage/impact?product_id={product_id}").json()

    for stakeholder in impact["stakeholders"]:
        send_email(
            to=stakeholder["email"],
            subject=f"Planned change to {product_name}",
            # affected_products is a count, not a list
            body=f"This may affect {impact['affected_products']} products...",
        )

This pattern is especially valuable in organizations where changes to shared datasets can have cascading consequences across teams.


Setting Up Lineage

Manual Definition

The most straightforward way to define lineage is by specifying upstream and downstream product IDs directly:

PUT /catalog/spaces/{slug}/products/{id}/lineage

{
  "upstream_ids": ["source-1-uuid", "source-2-uuid"],
  "downstream_ids": ["consumer-uuid"]
}

This approach works well for small catalogs or for relationships that aren't captured by any automated tool.

Automatic from dbt

For organizations using dbt, lineage can be extracted automatically from the depends_on field in the dbt manifest. When you sync your dbt project with Qarion, the platform reads the dependency graph and creates the corresponding lineage relationships:

# In the dbt manifest
"depends_on": {
  "nodes": [
    "model.project.customer_events",
    "model.project.customer_dim"
  ]
}

This is the recommended approach for dbt-based pipelines, since it keeps lineage in sync with your actual transformation logic without any manual intervention. See the dbt Sync Tutorial for a complete walkthrough.
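As an illustration of what that extraction involves, here is a sketch that reads a `manifest.json` and keeps only model-to-model dependencies; Qarion performs this step for you during sync, so this is purely explanatory:

```python
import json

def model_dependencies(manifest_path):
    """Map each dbt model to the models it depends on."""
    with open(manifest_path) as f:
        manifest = json.load(f)
    deps = {}
    for node_id, node in manifest.get("nodes", {}).items():
        if not node_id.startswith("model."):
            continue  # skip tests, seeds, snapshots
        parents = node.get("depends_on", {}).get("nodes", [])
        # Keep only model parents; sources and tests are ignored here.
        deps[node_id] = [p for p in parents if p.startswith("model.")]
    return deps
```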

API-Based Discovery

For custom pipelines that don't use dbt, you can update lineage programmatically after each job run. This approach treats your pipeline orchestrator as the source of truth for lineage:

def update_lineage_after_job(job_config):
    # Record each input as an upstream source of each output
    for output_table in job_config["outputs"]:
        for input_table in job_config["inputs"]:
            client.post(
                f"/catalog/spaces/{space}/products/{output_table}/lineage/upstream",
                json={"product_id": input_table},
            )

Lineage Visualization

Graph Response Format

The lineage graph API returns a structured response with nodes (the products in the graph) and edges (the relationships between them). Each node includes a layer value indicating its distance from the focal product — zero for the focal product itself, negative values for upstream dependencies, and positive values for downstream consumers:

{
  "nodes": [
    {"id": "...", "name": "customer_events", "type": "table", "layer": -2},
    {"id": "...", "name": "customer_metrics", "type": "table", "layer": 0},
    {"id": "...", "name": "dashboard", "type": "dashboard", "layer": 1}
  ],
  "edges": [
    {"source": "customer_events", "target": "customer_metrics", "relationship": "transforms"},
    {"source": "customer_metrics", "target": "dashboard", "relationship": "consumes"}
  ]
}

This format is designed to be easy to render as a graph visualization, with the layer values providing a natural left-to-right ordering.
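For instance, a renderer might bucket nodes into columns by layer — a minimal sketch:

```python
from collections import defaultdict

def layer_columns(nodes):
    """Group graph nodes into columns by layer value, ordered left to right."""
    columns = defaultdict(list)
    for node in nodes:
        columns[node["layer"]].append(node["name"])
    return [columns[layer] for layer in sorted(columns)]

nodes = [
    {"name": "customer_events", "layer": -2},
    {"name": "customer_events_clean", "layer": -1},
    {"name": "customer_metrics", "layer": 0},
    {"name": "dashboard", "layer": 1},
]
print(layer_columns(nodes))
# → [['customer_events'], ['customer_events_clean'], ['customer_metrics'], ['dashboard']]
```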


Use Cases

Change Management

Before deploying schema changes to a data product:

1. Query the impact analysis endpoint to identify affected downstream assets.
2. Notify the stakeholders listed in the response.
3. Verify that downstream products are compatible with the planned change.
4. Deploy with confidence that all affected parties are aware.

Incident Response

When a data quality issue is detected, use the lineage graph to check upstream dependencies and identify the root cause. Then trace all affected downstream products to understand the full scope of impact, and coordinate resolution across the teams responsible for each affected product.

Compliance Audit

For regulatory requirements:

1. Identify data products tagged with sensitive classifications (such as PII).
2. Trace all downstream consumers to see where sensitive data flows.
3. Verify that access controls are appropriate at each step.
4. Generate a lineage report that demonstrates data provenance.
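The tracing step is a reachability walk downstream from every PII-tagged product. In this sketch, `edges` and `tags` are plain in-memory stand-ins for data that would come from the catalog:

```python
def pii_exposure(edges, tags):
    """Return every product that receives data from a PII-tagged source.

    edges: iterable of (source, target) pairs
    tags:  mapping of product name -> set of classification tags
    """
    downstream = {}
    for src, dst in edges:
        downstream.setdefault(src, []).append(dst)
    # Start the walk from every product tagged as containing PII.
    exposed, stack = set(), [p for p, t in tags.items() if "pii" in t]
    while stack:
        node = stack.pop()
        for nxt in downstream.get(node, []):
            if nxt not in exposed:
                exposed.add(nxt)
                stack.append(nxt)
    return exposed
```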

Documentation

The lineage graph itself serves as living documentation of your data architecture. By exporting it and rendering it as a visualization, you can include data flow diagrams in product documentation that always reflect the actual state of your pipelines — rather than maintaining static diagrams that drift out of date.


Best Practices

Keep Lineage Current

Stale lineage is worse than no lineage at all, because it creates a false sense of security. Whenever pipelines change — whether through code modifications, new data sources, or retired outputs — update the lineage graph to reflect the new reality.

Use Automation

Manual lineage updates are error-prone and easily forgotten. Wherever possible, automate lineage management by syncing from dbt automatically, updating lineage via CI/CD hooks when pipeline code changes, or using pipeline metadata from your orchestration tool to infer dependencies.

Tag Sensitive Data

Lineage becomes especially powerful when combined with data classification tags. By tagging products that contain PII, confidential data, or sensitive business information, you can use lineage traversal to track how sensitive data propagates through your organization — and ensure that access controls and compliance measures are applied consistently at every stage.

Audit Regularly

Even with automation, lineage graphs can develop inconsistencies over time. Schedule periodic audits to identify orphaned products (those with no upstream sources, which may indicate missing lineage), dead ends (products with no downstream consumers, which may be obsolete), and circular dependencies (which usually indicate a modeling error).
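All three checks can be expressed as simple graph queries. A sketch over a set of product names and (source, target) edges, using Kahn's algorithm for the cycle check:

```python
def audit(products, edges):
    """Flag orphans, dead ends, and cycles in a lineage graph."""
    sources = {s for s, _ in edges}
    targets = {t for _, t in edges}
    orphans = products - targets    # no upstream: lineage may be missing
    dead_ends = products - sources  # no downstream: possibly obsolete
    # Cycle detection via Kahn's algorithm: a cycle leaves nodes unprocessed.
    indegree = {p: 0 for p in products}
    adj = {p: [] for p in products}
    for s, t in edges:
        adj[s].append(t)
        indegree[t] += 1
    queue = [p for p in products if indegree[p] == 0]
    processed = 0
    while queue:
        node = queue.pop()
        processed += 1
        for nxt in adj[node]:
            indegree[nxt] -= 1
            if indegree[nxt] == 0:
                queue.append(nxt)
    has_cycle = processed < len(products)
    return orphans, dead_ends, has_cycle
```

Note that "orphan" and "dead end" are heuristics, not errors: genuine raw sources have no upstream, and terminal dashboards have no downstream, so the output is a review list rather than a failure report.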