Getting Started with Qarion ETL
This guide will help you get started with Qarion ETL in minutes.
What is Qarion ETL?
Qarion ETL is a flexible, extensible data transformation framework for building scalable data pipelines. It provides:
- Flow-Based Architecture: Define data pipelines using declarative flow definitions
- Plugin System: Extensible architecture for custom flow types, engines, and code generators
- Multiple Engines: Support for SQLite, Pandas, DuckDB, and more
- Code Generation: Generate SQL, DBT, or Airflow code from flows
- Schema Evolution: Manage schema changes with forward/strict compatibility modes
Installation
Prerequisites
- Python 3.11 or higher
- pip or poetry for package management
Install from PyPI
pip install qarion-etl
Install from Source
git clone https://github.com/yourorg/qarion-etl.git
cd qarion-etl
pip install -e .
Quick Start
1. Initialize a Project
qarion-etl init
This creates a basic project structure:
my_project/
├── config.toml # Project configuration
├── datasets/ # Dataset definitions
├── flows/ # Flow definitions
└── migrations/ # Migration files
2. Define a Dataset
Create a dataset definition in datasets/orders.toml:
name = "orders"
namespace = "raw"
[columns]
[columns.id]
schema_type = "integer"
required = true
[columns.customer_id]
schema_type = "integer"
required = true
[columns.amount]
schema_type = "float"
required = true
[columns.created_at]
schema_type = "timestamp"
required = false
3. Define a Flow
Create a flow definition in flows/process_orders.toml:
id = "process_orders"
name = "Process Orders"
flow_type = "change_feed"
namespace = "raw"
[input]
columns = [
{ name = "id", schema_type = "integer" },
{ name = "customer_id", schema_type = "integer" },
{ name = "amount", schema_type = "float" },
{ name = "created_at", schema_type = "timestamp" }
]
primary_key = ["id"]
[properties]
change_detection_columns = ["amount", "customer_id"]
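The role of `primary_key` and `change_detection_columns` can be illustrated with a small, purely conceptual sketch: rows are matched across two batches by primary key, and a matched row counts as changed only if one of the tracked columns differs. This is an assumption about the general technique, not Qarion ETL's actual implementation; the function name `detect_changes` and the `insert`/`update`/`delete` labels are hypothetical.

```python
def detect_changes(previous, current, primary_key, change_detection_columns):
    """Compare two batches of rows (lists of dicts) and classify the differences.

    Conceptual illustration only; the labels below are assumptions, not
    Qarion ETL output values.
    """

    def key_of(row):
        return tuple(row[col] for col in primary_key)

    def tracked(row):
        return tuple(row[col] for col in change_detection_columns)

    prev_by_key = {key_of(row): row for row in previous}
    curr_by_key = {key_of(row): row for row in current}

    changes = []
    for key, row in curr_by_key.items():
        if key not in prev_by_key:
            changes.append(("insert", row))
        elif tracked(row) != tracked(prev_by_key[key]):
            changes.append(("update", row))
    for key, row in prev_by_key.items():
        if key not in curr_by_key:
            changes.append(("delete", row))
    return changes


batch_1 = [
    {"id": 1, "customer_id": 10, "amount": 5.0, "created_at": "2024-01-01"},
    {"id": 2, "customer_id": 11, "amount": 7.5, "created_at": "2024-01-01"},
]
batch_2 = [
    # amount changed: reported as an update
    {"id": 1, "customer_id": 10, "amount": 6.0, "created_at": "2024-01-01"},
    # only created_at changed, which is not a tracked column: not reported
    {"id": 2, "customer_id": 11, "amount": 7.5, "created_at": "2024-01-02"},
    # new primary key: reported as an insert
    {"id": 3, "customer_id": 12, "amount": 1.0, "created_at": "2024-01-02"},
]

for kind, row in detect_changes(batch_1, batch_2, ["id"], ["amount", "customer_id"]):
    print(kind, row["id"])
```

Note that the row with `id = 2` is not reported, because `created_at` is not listed in `change_detection_columns`.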
4. Generate Code
Generate SQL code:
qarion-etl generate-code --format sql --flow process_orders --output-dir output
Generate DBT code:
qarion-etl generate-code --format dbt --flow process_orders --output-dir dbt_project --dialect postgres
5. Execute Transformations
qarion-etl execute --flow process_orders --batch-id 1
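When the generate and execute steps need to run from a script, one option is to build the command lines in Python and pass them to `subprocess`. The sketch below uses only the subcommands and flags shown in this guide; the helper functions are hypothetical, and the actual `subprocess.run` call is left commented out so the example runs without Qarion ETL installed.

```python
import shlex


def generate_code_command(flow, fmt, output_dir, dialect=None):
    """Build the argument list for `qarion-etl generate-code` (hypothetical helper)."""
    cmd = [
        "qarion-etl", "generate-code",
        "--format", fmt,
        "--flow", flow,
        "--output-dir", output_dir,
    ]
    if dialect is not None:
        cmd += ["--dialect", dialect]
    return cmd


def execute_command(flow, batch_id):
    """Build the argument list for `qarion-etl execute` (hypothetical helper)."""
    return ["qarion-etl", "execute", "--flow", flow, "--batch-id", str(batch_id)]


commands = [
    generate_code_command("process_orders", "sql", "output"),
    execute_command("process_orders", 1),
]

for cmd in commands:
    print(shlex.join(cmd))
    # To actually run the step, with an exception on a non-zero exit code:
    # import subprocess
    # subprocess.run(cmd, check=True)
```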
Next Steps
- Core Concepts - Understand flows, datasets, and transformations
- Flow Types - Learn about different flow patterns
- Code Generation - Generate SQL, DBT, or Airflow code
- Configuration - Configure storage, engines, and plugins
- Architecture - Deep dive into system architecture
Common Use Cases
Change Detection
Track changes in data over time:
flow_type = "change_feed"
[properties]
change_detection_columns = ["status", "amount"]
Delta Publishing
Publish incremental changes, for example in financial transaction processing:
flow_type = "delta_publishing"
[properties]
delta_method = "merge"
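This guide does not spell out what `delta_method = "merge"` does. Assuming it follows the common upsert meaning of "merge" (delta rows replace target rows with the same key, and new keys are appended), the idea can be sketched as follows. The function `merge_delta` and the sample rows are hypothetical and are not part of Qarion ETL.

```python
def merge_delta(target, delta, primary_key):
    """Upsert `delta` rows into `target` rows, matching on `primary_key`.

    Assumption: "merge" means upsert. This is a conceptual sketch, not the
    Qarion ETL implementation.
    """

    def key_of(row):
        return tuple(row[col] for col in primary_key)

    merged = {key_of(row): row for row in target}
    for row in delta:
        merged[key_of(row)] = row  # replaces an existing key or adds a new one
    return list(merged.values())


transactions = [
    {"id": 1, "amount": 10.0},
    {"id": 2, "amount": 20.0},
]
delta = [
    {"id": 2, "amount": 25.0},  # correction to an existing transaction
    {"id": 3, "amount": 5.0},   # new transaction
]

print(merge_delta(transactions, delta, ["id"]))
```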
Sessionization
Group events into sessions:
flow_type = "sessionization"
[properties]
user_id_field = "user_id"
timestamp_field = "event_time"
session_timeout = "30 minutes"
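The three properties above map onto a standard sessionization technique: sort events per user by time and start a new session whenever the gap between consecutive events exceeds the timeout. The sketch below illustrates that technique in plain Python. It is not Qarion ETL's implementation; the `session_id` format and the choice that a gap of exactly 30 minutes stays in the same session are assumptions.

```python
from datetime import datetime, timedelta


def sessionize(events, user_id_field="user_id", timestamp_field="event_time",
               session_timeout=timedelta(minutes=30)):
    """Assign a session_id to each event (conceptual sketch, not Qarion ETL code)."""
    ordered = sorted(events, key=lambda e: (e[user_id_field], e[timestamp_field]))
    result = []
    last_user = None
    last_time = None
    session_index = 0
    for event in ordered:
        user = event[user_id_field]
        ts = event[timestamp_field]
        if user != last_user:
            session_index = 1  # first event for this user
        elif ts - last_time > session_timeout:
            session_index += 1  # gap exceeded the timeout: new session
        result.append({**event, "session_id": f"{user}-{session_index}"})
        last_user, last_time = user, ts
    return result


events = [
    {"user_id": "a", "event_time": datetime(2024, 1, 1, 9, 0)},
    {"user_id": "b", "event_time": datetime(2024, 1, 1, 9, 5)},
    {"user_id": "a", "event_time": datetime(2024, 1, 1, 9, 10)},
    {"user_id": "a", "event_time": datetime(2024, 1, 1, 10, 0)},  # 50 minutes after 09:10
]

for row in sessionize(events):
    print(row["user_id"], row["event_time"].time(), row["session_id"])
```

Here user `a` ends up with two sessions, because the 50-minute gap before the 10:00 event exceeds the 30-minute timeout.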
Getting Help
- Documentation: See the Documentation Index
- Examples: Check the examples/ directory
- Issues: Report issues on GitHub