
Qarion ETL Tutorials

Step-by-step tutorials for using Qarion ETL.

Tutorial 1: Building Your First Change Feed

This tutorial walks you through creating a change feed flow to track changes in customer data.

Step 1: Initialize Project

qarion-etl init --project-name my_project
cd my_project

Step 2: Create Flow Definition

Create flows/customers_change_feed.toml:

id = "customers_change_feed"
name = "Customers Change Feed"
flow_type = "change_feed"
namespace = "raw"

[input]
primary_key = ["customer_id"]
columns = ["customer_id", "name", "email", "status", "updated_at"]

[properties.load]
source_path = "data/customers"
file_pattern = "customers_*.csv"
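
To make the role of primary_key concrete, here is a minimal Python sketch of what a change feed conceptually computes: it compares two snapshots keyed by the primary key and emits insert, update, and delete events. It is an illustration only, not Qarion ETL's implementation; the function name diff_snapshots, the event layout, and the sample rows are hypothetical.

```python
from typing import Any

Row = dict[str, Any]


def diff_snapshots(previous: list[Row], current: list[Row], primary_key: list[str]) -> list[dict[str, Any]]:
    """Compare two snapshots and return change events keyed by primary key.

    Hypothetical helper for illustration; not part of Qarion ETL.
    """
    def key_of(row: Row) -> tuple:
        return tuple(row[col] for col in primary_key)

    old = {key_of(r): r for r in previous}
    new = {key_of(r): r for r in current}

    events = []
    # Rows present now: either new keys (insert) or changed values (update).
    for key, row in new.items():
        if key not in old:
            events.append({"op": "insert", "key": key, "row": row})
        elif row != old[key]:
            events.append({"op": "update", "key": key, "row": row})
    # Keys that disappeared since the previous snapshot are deletes.
    for key, row in old.items():
        if key not in new:
            events.append({"op": "delete", "key": key, "row": row})
    return events


previous = [
    {"customer_id": 1, "name": "Alice", "status": "active"},
    {"customer_id": 2, "name": "Bob", "status": "active"},
]
current = [
    {"customer_id": 1, "name": "Alice", "status": "inactive"},  # changed
    {"customer_id": 3, "name": "Charlie", "status": "active"},  # new
]

changes = diff_snapshots(previous, current, primary_key=["customer_id"])
for change in changes:
    print(change["op"], change["key"])
```

In this sketch customer 1 is reported as an update, customer 3 as an insert, and customer 2 as a delete. How the real flow stores and versions these events is determined by the generated code, not by this example.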

Step 3: Generate Datasets

qarion-etl generate-docs

This creates dataset definitions in datasets/.

Step 4: Generate Code

qarion-etl generate-code --format sql --flow customers_change_feed --output-dir output

Step 5: Build and Apply Migrations

# Generate datasets and migrations
qarion-etl build

# Apply migrations to create tables
qarion-etl apply-migrations

Tutorial 2: Financial Transaction Processing

Build a delta publishing flow for financial transactions.

Step 1: Create Flow

Create flows/transactions_delta.toml:

id = "transactions_delta"
name = "Transactions Delta Publishing"
flow_type = "delta_publishing"

[input]
primary_key = ["transaction_id"]
columns = ["transaction_id", "account_id", "amount", "transaction_date", "type"]

[properties]
namespace = "finance"

Step 2: Generate dbt Code

qarion-etl generate-code --format dbt --flow transactions_delta --output-dir dbt_project --dialect postgres

Step 3: Review Generated Code

Check the generated dbt models in dbt_project/models/.
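
One quick way to review the output is to list the generated model files. The sketch below assumes the models are written as .sql files somewhere under dbt_project/models/, which is the usual dbt convention; the exact layout Qarion ETL produces is not described in this tutorial, and list_models is a hypothetical helper.

```python
from pathlib import Path


def list_models(models_dir: str) -> list[str]:
    """Return the relative paths of all .sql files under models_dir, sorted."""
    root = Path(models_dir)
    if not root.is_dir():
        # Nothing generated yet, or the output directory is somewhere else.
        return []
    return sorted(str(p.relative_to(root)) for p in root.rglob("*.sql"))


for model in list_models("dbt_project/models"):
    print(model)
```

If nothing is printed, check the --output-dir value passed to generate-code.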

Tutorial 3: User Session Analysis

Create a sessionization flow for web analytics.

Step 1: Create Flow

Create flows/user_sessions.toml:

id = "user_sessions"
name = "User Sessionization"
flow_type = "sessionization"

[input]
primary_key = ["event_id"]
columns = ["event_id", "user_id", "event_time", "event_type", "page_url"]

[properties]
session_timeout_minutes = 30
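
To illustrate what session_timeout_minutes controls, here is a small Python sketch of timeout-based sessionization: events are grouped per user, sorted by time, and a new session starts whenever the gap since the previous event exceeds the timeout. This is a sketch of the general technique, not the code Qarion ETL generates; the assign_sessions helper, the session_id format, the strict comparison at the boundary, and the sample events are all assumptions.

```python
from collections import defaultdict
from datetime import datetime, timedelta


def assign_sessions(events, timeout_minutes=30):
    """Attach a session_id to each event.

    A new session starts when the gap since the user's previous event
    exceeds the timeout. Using a strict '>' at the boundary is an assumption.
    """
    timeout = timedelta(minutes=timeout_minutes)
    by_user = defaultdict(list)
    for event in events:
        by_user[event["user_id"]].append(event)

    result = []
    for user_id, user_events in by_user.items():
        user_events.sort(key=lambda e: e["event_time"])
        session_number = 0
        last_time = None
        for event in user_events:
            if last_time is None or event["event_time"] - last_time > timeout:
                session_number += 1
            last_time = event["event_time"]
            result.append({**event, "session_id": f"{user_id}-{session_number}"})
    return result


events = [
    {"event_id": 1, "user_id": "u1", "event_time": datetime(2024, 1, 1, 9, 0)},
    {"event_id": 2, "user_id": "u1", "event_time": datetime(2024, 1, 1, 9, 20)},
    {"event_id": 3, "user_id": "u1", "event_time": datetime(2024, 1, 1, 10, 5)},
    {"event_id": 4, "user_id": "u2", "event_time": datetime(2024, 1, 1, 9, 10)},
]

sessions = assign_sessions(events, timeout_minutes=30)
for row in sessions:
    print(row["event_id"], row["session_id"])
```

Events 1 and 2 are 20 minutes apart and share a session; event 3 arrives 45 minutes after event 2, so it opens a second session for the same user.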

Step 2: Generate Code

qarion-etl generate-code --format dbt --flow user_sessions --output-dir dbt_project --dialect postgres

Step 3: Review Generated Code

Check the generated dbt models in dbt_project/models/.

Tutorial 4: Quick Start — Standard Flow End-to-End

Create a standard flow that loads CSV data, transforms it, and exports results.

Step 1: Initialize and Create Flow

qarion-etl init --project-name quick_start
cd quick_start

# Copy the example flow
cp examples/flows/standard.toml flows/my_first_flow.toml

Step 2: Prepare Sample Data

Create data/users.csv:

id,name,email,created_at
1,Alice,alice@example.com,2024-01-15
2,Bob,,2024-02-20
3,Charlie,charlie@example.COM,2024-03-10
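
The sample rows include a missing email (Bob) and a mixed-case domain (Charlie), which makes them useful for exercising a transformation step. The Python sketch below shows one plausible cleanup of that data. The transformations actually applied by the example standard flow are not shown in this tutorial, so the clean_users helper and its rules are assumptions for illustration.

```python
import csv
import io

SAMPLE = """id,name,email,created_at
1,Alice,alice@example.com,2024-01-15
2,Bob,,2024-02-20
3,Charlie,charlie@example.COM,2024-03-10
"""


def clean_users(text):
    """Parse the CSV, lowercase emails, and map empty emails to None."""
    rows = []
    for row in csv.DictReader(io.StringIO(text)):
        email = row["email"].strip().lower() or None
        rows.append({**row, "email": email})
    return rows


users = clean_users(SAMPLE)
missing = [u["name"] for u in users if u["email"] is None]
print(users[2]["email"])  # normalized address for Charlie
print(missing)            # names of users without an email
```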

Step 3: Build, Validate, and Run

# Validate your flow definition
qarion-etl validate-config

# Build datasets and migrations
qarion-etl build

# Run the flow; --flow-id is the id defined inside the copied file (example_standard), not the file name
qarion-etl trigger --flow-id example_standard --batch-id 1

Step 4: Check for Optimizations

qarion-etl suggest-optimizations --flow-id example_standard

Tutorial 5: Data Quality — Freshness and Quality Checks

Set up quality checks with alerting for a data table.

Step 1: Define Quality Check Flow

Create flows/user_quality.toml:

id = "user_quality"
name = "User Data Quality"
flow_type = "quality_check"

[input]
columns = ["id", "email", "age", "updated_at"]
primary_key = ["id"]

[properties]
source_table = "users"

[[properties.checks]]
id = "email_complete"
type = "completeness"
columns = ["email"]
severity = "error"

[[properties.checks]]
id = "data_freshness"
type = "freshness"
columns = ["updated_at"]
severity = "warning"
[properties.checks.config]
timestamp_column = "updated_at"
max_age_hours = 24
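
As a rough guide to what the two check types measure, the Python sketch below computes a completeness ratio for a column and a freshness test against max_age_hours. It illustrates the general idea under stated assumptions and is not Qarion ETL's implementation: the helper names, the treatment of empty strings as missing, and the pass condition for freshness are hypothetical.

```python
from datetime import datetime, timedelta, timezone


def completeness(rows, column):
    """Fraction of rows whose value in `column` is neither None nor empty."""
    if not rows:
        return 1.0  # assumption: an empty table is treated as fully complete
    filled = sum(1 for r in rows if r.get(column) not in (None, ""))
    return filled / len(rows)


def is_fresh(rows, timestamp_column, max_age_hours, now):
    """True if the newest timestamp is at most max_age_hours older than now."""
    timestamps = [r[timestamp_column] for r in rows if r.get(timestamp_column)]
    if not timestamps:
        return False
    return now - max(timestamps) <= timedelta(hours=max_age_hours)


now = datetime(2024, 3, 11, 12, 0, tzinfo=timezone.utc)
rows = [
    {"id": 1, "email": "alice@example.com", "updated_at": now - timedelta(hours=2)},
    {"id": 2, "email": "", "updated_at": now - timedelta(hours=30)},
]

print(completeness(rows, "email"))
print(is_fresh(rows, "updated_at", max_age_hours=24, now=now))
```

Here half of the rows have an email, and the table counts as fresh because its newest row is two hours old. Which threshold turns a completeness ratio into a failure is not specified in the flow definition above.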

Step 2: Run Quality Checks with Alerting

# Run the quality suite
qarion-etl run-quality-checks --suite-id user_quality --alert-channel log

# Schedule recurring checks (runs every 6 hours)
qarion-etl schedule-quality-check \
  --suite-id user_quality \
  --cron "0 */6 * * *" \
  --alert-channel file \
  --alert-file quality_alerts.jsonl
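
The cron expression "0 */6 * * *" fires at minute 0 of hours 0, 6, 12, and 18 every day. The Python sketch below computes the next matching time for a given moment. It uses naive datetimes and ignores time zones and daylight saving; how the Qarion ETL scheduler handles those is not covered here, and next_six_hourly_run is a hypothetical helper.

```python
from datetime import datetime, timedelta


def next_six_hourly_run(now):
    """Next time matching `0 */6 * * *`: minute 0 of hour 0, 6, 12, or 18."""
    candidate = now.replace(minute=0, second=0, microsecond=0)
    # Step forward one hour at a time until the hour is a multiple of 6
    # and the candidate is strictly after `now`.
    while candidate <= now or candidate.hour % 6 != 0:
        candidate += timedelta(hours=1)
    return candidate


print(next_six_hourly_run(datetime(2024, 3, 10, 13, 45)))  # 2024-03-10 18:00:00
print(next_six_hourly_run(datetime(2024, 3, 10, 18, 0)))   # 2024-03-11 00:00:00
```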

Tutorial 6: Multi-Engine — Same Flow, Different Engines

Run the same flow on SQLite and then DuckDB to compare engines.

Step 1: Create a Flow

Use the same standard flow from Tutorial 4.

Step 2: Run on SQLite (Default)

# qarion.toml already defaults to SQLite
qarion-etl trigger --flow-id example_standard --batch-id 1

Step 3: Switch to DuckDB

Edit qarion.toml:

[engine]
type = "duckdb"
path = "data/warehouse.db"

Step 4: Run the Same Flow on DuckDB

qarion-etl trigger --flow-id example_standard --batch-id 1

Both runs produce the same result — the flow definition is engine-agnostic.
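
To confirm this for your own data, you can export the output rows from each run and compare them without relying on row order, since engines may return rows in different orders. The Python sketch below shows only the comparison step; how you fetch the rows from each engine is left out, and the same_rows helper and the sample rows are placeholders.

```python
from collections import Counter


def same_rows(left, right):
    """True if both result sets contain the same rows, ignoring order."""
    return Counter(map(tuple, left)) == Counter(map(tuple, right))


# Placeholder rows standing in for the output of each run.
sqlite_rows = [(1, "alice@example.com"), (2, None)]
duckdb_rows = [(2, None), (1, "alice@example.com")]

print(same_rows(sqlite_rows, duckdb_rows))
```

Using a Counter rather than a set keeps duplicate rows significant, so a run that emits a row twice is not reported as equal to one that emits it once.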