
Getting Started with Qarion ETL

This guide will help you get started with Qarion ETL in minutes.

What is Qarion ETL?

Qarion ETL is a flexible, extensible data transformation framework for building scalable data pipelines. It provides:

  • Flow-Based Architecture: Define data pipelines using declarative flow definitions
  • Plugin System: Extensible architecture for custom flow types, engines, and code generators
  • Multiple Engines: Support for SQLite, Pandas, DuckDB, and more
  • Code Generation: Generate SQL, DBT, or Airflow code from flows
  • Schema Evolution: Manage schema changes with forward/strict compatibility modes

Installation

Prerequisites

  • Python 3.11 or higher
  • pip or poetry for package management

Install from PyPI

pip install qarion-etl

Install from Source

git clone https://github.com/yourorg/qarion-etl.git
cd qarion-etl
pip install -e .

Quick Start

1. Initialize a Project

qarion-etl init

This creates a basic project structure:

my_project/
├── config.toml # Project configuration
├── datasets/ # Dataset definitions
├── flows/ # Flow definitions
└── migrations/ # Migration files

2. Define a Dataset

Create a dataset definition in datasets/orders.toml:

name = "orders"
namespace = "raw"

[columns]
[columns.id]
schema_type = "integer"
required = true

[columns.customer_id]
schema_type = "integer"
required = true

[columns.amount]
schema_type = "float"
required = true

[columns.created_at]
schema_type = "timestamp"
required = false

3. Define a Flow

Create a flow definition in flows/process_orders.toml:

id = "process_orders"
name = "Process Orders"
flow_type = "change_feed"
namespace = "raw"

[input]
columns = [
  { name = "id", schema_type = "integer" },
  { name = "customer_id", schema_type = "integer" },
  { name = "amount", schema_type = "float" },
  { name = "created_at", schema_type = "timestamp" }
]
primary_key = ["id"]

[properties]
change_detection_columns = ["amount", "customer_id"]

4. Generate Code

Generate SQL code:

qarion-etl generate-code --format sql --flow process_orders --output-dir output

Generate DBT code:

qarion-etl generate-code --format dbt --flow process_orders --output-dir dbt_project --dialect postgres

5. Execute Transformations

qarion-etl execute --flow process_orders --batch-id 1

Next Steps

The sections below show common flow configurations and where to get help.

Common Use Cases

Change Detection

Track changes in data over time:

flow_type = "change_feed"
[properties]
change_detection_columns = ["status", "amount"]
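In change-feed terms, a row counts as changed when any of the watched columns differs between the previous snapshot and the current one, matched on the primary key. The sketch below illustrates that rule in plain Python; it is not the engine's implementation:

```python
def detect_changes(previous, current, key="id", watched=("status", "amount")):
    """Classify current rows as inserts or updates relative to a prior snapshot."""
    prev_by_key = {row[key]: row for row in previous}
    changes = []
    for row in current:
        old = prev_by_key.get(row[key])
        if old is None:
            changes.append({"op": "insert", key: row[key]})
        elif any(old[c] != row[c] for c in watched):
            changes.append({"op": "update", key: row[key]})
    return changes

prev = [{"id": 1, "status": "open", "amount": 10.0}]
curr = [
    {"id": 1, "status": "paid", "amount": 10.0},  # watched column changed
    {"id": 2, "status": "open", "amount": 5.0},   # new row
]
print(detect_changes(prev, curr))  # [{'op': 'update', 'id': 1}, {'op': 'insert', 'id': 2}]
```

Columns not listed in change_detection_columns (here, created_at) can differ without producing a change record.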

Delta Publishing

Publish incremental changes (deltas) to downstream consumers, e.g. for financial transaction processing:

flow_type = "delta_publishing"
[properties]
delta_method = "merge"
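A merge-style delta method is upsert semantics: rows in the delta batch replace matching rows in the target and add new ones, keyed on the primary key. A minimal sketch of that behavior (illustrative only, not the engine's code):

```python
def merge_delta(target, delta, key="id"):
    """Upsert delta rows into target, matched on the primary key column."""
    merged = {row[key]: row for row in target}
    for row in delta:
        # New keys are inserted; existing keys are overwritten field by field.
        merged[row[key]] = {**merged.get(row[key], {}), **row}
    return list(merged.values())

target = [{"id": 1, "amount": 10.0}, {"id": 2, "amount": 5.0}]
delta = [{"id": 2, "amount": 7.5}, {"id": 3, "amount": 1.0}]
print(merge_delta(target, delta))
```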

Sessionization

Group events into sessions:

flow_type = "sessionization"
[properties]
user_id_field = "user_id"
timestamp_field = "event_time"
session_timeout = "30 minutes"
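The sessionization rule implied by these properties: consecutive events from the same user belong to one session as long as the gap between them is at most session_timeout; a larger gap starts a new session. A hedged Python sketch of that rule (not Qarion's engine code; field names mirror the config above):

```python
from datetime import datetime, timedelta

def sessionize(events, timeout=timedelta(minutes=30)):
    """Assign per-user session numbers; a gap > timeout opens a new session."""
    last_seen, counters, out = {}, {}, []
    for ev in sorted(events, key=lambda e: (e["user_id"], e["event_time"])):
        uid = ev["user_id"]
        prev = last_seen.get(uid)
        if prev is None or ev["event_time"] - prev > timeout:
            counters[uid] = counters.get(uid, 0) + 1
        last_seen[uid] = ev["event_time"]
        out.append({**ev, "session": counters[uid]})
    return out

events = [
    {"user_id": "u1", "event_time": datetime(2024, 1, 1, 9, 0)},
    {"user_id": "u1", "event_time": datetime(2024, 1, 1, 9, 10)},  # 10 min gap: same session
    {"user_id": "u1", "event_time": datetime(2024, 1, 1, 10, 0)},  # 50 min gap: new session
]
print([e["session"] for e in sessionize(events)])  # [1, 1, 2]
```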

Getting Help

  • Documentation: See the Documentation Index
  • Examples: Check the examples/ directory
  • Issues: Report issues on GitHub