Flow Variables

A comprehensive guide to passing variables to flows in Qarion ETL, enabling dynamic configuration and parameterization of flow execution.

Overview

Flow variables allow you to pass dynamic values to flows at execution time, making flows reusable and configurable. Variables are available in:

  • Flow templates (Jinja2-style templating)
  • Task configurations
  • SQL queries
  • File paths
  • Any template-enabled configuration

Quick Start

Basic Variable Passing

qarion-etl trigger --flow-id my_flow --var environment=production --var region=us-east-1

Using Variables in Flow Templates

# flows/my_flow.toml
id = "my_flow"
name = "My Flow"
flow_type = "standard"

# Variables can be used in templates
[[tasks]]
id = "ingest_data"
type = "ingestion"
target_dataset_id = "data_{{ environment }}"

[tasks.config]
file_path = "s3://bucket-{{ region }}/data/input.csv"
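With the variables from the trigger command above (`environment=production`, `region=us-east-1`), the templated fields would render to something like:

```toml
[[tasks]]
id = "ingest_data"
type = "ingestion"
target_dataset_id = "data_production"

[tasks.config]
file_path = "s3://bucket-us-east-1/data/input.csv"
```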

CLI Usage

Trigger Command with Variables

# Single variable
qarion-etl trigger --flow-id my_flow --var key=value

# Multiple variables
qarion-etl trigger --flow-id my_flow \
--var environment=production \
--var region=us-east-1 \
--var batch_size=1000

# Variables with spaces (use quotes)
qarion-etl trigger --flow-id my_flow \
--var message="Hello World" \
--var description="Data processing flow"

Variable Types

Variables are automatically parsed based on their values:

# String (default)
--var name=John

# Number (integer)
--var count=123

# Number (float)
--var ratio=0.95

# Boolean
--var enabled=true
--var disabled=false

# Null
--var optional=null
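The parsing rules above can be sketched as a small helper. This is illustrative only, not Qarion's actual implementation; the function name `parse_var_value` is hypothetical:

```python
def parse_var_value(raw: str):
    """Parse a --var value string into a typed value (illustrative sketch)."""
    lowered = raw.lower()
    if lowered in ("true", "false"):  # Boolean
        return lowered == "true"
    if lowered in ("null", "none"):   # Null
        return None
    try:
        return int(raw)               # Integer
    except ValueError:
        pass
    try:
        return float(raw)             # Float
    except ValueError:
        pass
    return raw                        # String (default)

print(parse_var_value("123"))   # 123
print(parse_var_value("0.95"))  # 0.95
print(parse_var_value("true"))  # True
print(parse_var_value("John"))  # John
```

Integer parsing is attempted before float parsing so that `1000` stays an integer rather than becoming `1000.0`.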

Variable Formats

String Variables

--var name=value
--var message="value with spaces"
--var path=/data/files/

Number Variables

--var count=100        # Integer
--var ratio=0.95       # Float
--var threshold=1000   # Integer

Boolean Variables

--var enabled=true
--var disabled=false

Null Variables

--var optional=null
--var missing=none

Using Variables in Flows

In Flow Definition

Variables can be defined in the flow definition and overridden at execution time:

# flows/my_flow.toml
id = "my_flow"
flow_type = "standard"

# Default variables (can be overridden)
[variables]
environment = "development"
region = "us-west-2"
batch_size = 100

[[tasks]]
id = "process_data"
type = "transformation"
source_dataset_id = "data_{{ environment }}"

In Task Configurations

[[tasks]]
id = "export_data"
type = "export"

[tasks.properties]
destination = "s3://bucket-{{ region }}/output/"
format = "parquet"

[tasks.properties.export_config]
batch_size = {{ batch_size }}

In SQL Queries

[[tasks]]
id = "filter_data"
type = "transformation"

[tasks.config]
sql = """
SELECT *
FROM source_table
WHERE region = '{{ region }}'
AND environment = '{{ environment }}'
AND batch_id = {{ batch_id }}
"""

In File Paths

[[tasks]]
id = "load_file"
type = "ingestion"

[tasks.config]
file_path = "s3://data-bucket/{{ environment }}/{{ region }}/input.csv"

Variable Precedence

Variables are merged in the following order (later values override earlier ones):

  1. Flow-level variables (defined in flow definition)
  2. CLI-provided variables (from --var arguments)
  3. System metadata (batch_id, execution_date)

CLI-provided variables override flow-level defaults; system metadata is merged last and always wins.
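The merge order above can be sketched as successive dictionary updates, where later updates win (illustrative sketch; the variable values are taken from the examples in this guide):

```python
def merge_variables(flow_vars: dict, cli_vars: dict, system_vars: dict) -> dict:
    """Merge variable sources in precedence order; later updates override."""
    merged = {}
    merged.update(flow_vars)    # 1. flow-level defaults
    merged.update(cli_vars)     # 2. CLI --var arguments
    merged.update(system_vars)  # 3. system metadata merged last
    return merged

flow_vars = {"environment": "development", "region": "us-west-2", "batch_size": 100}
cli_vars = {"environment": "production", "region": "us-east-1"}
system_vars = {"batch_id": 42}

print(merge_variables(flow_vars, cli_vars, system_vars))
# {'environment': 'production', 'region': 'us-east-1', 'batch_size': 100, 'batch_id': 42}
```

Note that `batch_size` keeps its flow-level default because no CLI value overrides it, while `environment` and `region` take the CLI values.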

Examples

Example 1: Environment-Specific Configuration

# Production
qarion-etl trigger --flow-id data_pipeline \
--var environment=production \
--var region=us-east-1 \
--var s3_bucket=prod-data-bucket

# Development
qarion-etl trigger --flow-id data_pipeline \
--var environment=development \
--var region=us-west-2 \
--var s3_bucket=dev-data-bucket

Flow definition:

id = "data_pipeline"
flow_type = "standard"

[[tasks]]
id = "ingest"
type = "ingestion"

[tasks.config]
file_path = "s3://{{ s3_bucket }}/{{ environment }}/input/"

Example 2: Dynamic Batch Processing

qarion-etl trigger --flow-id batch_processor \
--var batch_size=5000 \
--var max_retries=3 \
--var timeout=300

Flow definition:

[[tasks]]
id = "process_batch"
type = "transformation"

[tasks.config]
sql = """
SELECT *
FROM source_table
LIMIT {{ batch_size }}
"""

[tasks.config.retry_config]
max_retries = {{ max_retries }}
timeout = {{ timeout }}

Example 3: Date-Based Processing

qarion-etl trigger --flow-id daily_report \
--var report_date="2024-01-15" \
--var include_weekend=false

Flow definition:

[[tasks]]
id = "generate_report"
type = "export"

[tasks.properties]
destination = "s3://reports/{{ report_date }}/daily_report.csv"

[tasks.config]
query = """
SELECT *
FROM daily_data
WHERE date = '{{ report_date }}'
{% if not include_weekend %}
AND DAYOFWEEK(date) NOT IN (1, 7)
{% endif %}
"""

Example 4: Multi-Environment Deployment

# Staging
qarion-etl trigger --flow-id deploy \
--var env=staging \
--var db_host=staging-db.example.com \
--var api_key="${STAGING_API_KEY}"

# Production
qarion-etl trigger --flow-id deploy \
--var env=production \
--var db_host=prod-db.example.com \
--var api_key="${PROD_API_KEY}"

System Variables

The following variables are automatically available in all flows:

  • batch_id: Current batch ID (integer)
  • execution_date: Execution date/time (datetime object)

These are always available and don't need to be passed via --var.

Variable Access in Templates

Variables are available in Jinja2 templates throughout the flow:

# In file paths
file_path = "s3://bucket/{{ environment }}/{{ date }}/data.csv"

# In SQL queries
sql = "SELECT * FROM {{ table_prefix }}_data WHERE region = '{{ region }}'"

# In conditions
{% if environment == 'production' %}
# Production-specific configuration
{% endif %}

# In loops
{% for region in regions %}
# Process each region
{% endfor %}

Best Practices

  1. Use Descriptive Variable Names:

    --var environment=production  # Good
    --var e=prod                  # Avoid
  2. Document Variables:

    # flows/my_flow.toml
    # Required variables:
    # - environment: Deployment environment (production, staging, development)
    # - region: AWS region (us-east-1, eu-west-1, etc.)
  3. Provide Defaults:

    [variables]
    environment = "development" # Default value
    region = "us-west-2" # Default value
  4. Use Environment Variables:

    # Pass environment variables as flow variables
    qarion-etl trigger --flow-id my_flow \
    --var api_key="${API_KEY}" \
    --var db_password="${DB_PASSWORD}"
  5. Validate Variables:

    • Use flow validation to ensure required variables are provided
    • Check variable values in flow templates

Troubleshooting

Variable Not Available

Problem: Variable is None or not found in template.

Solution:

  • Check variable name spelling
  • Ensure variable is passed via --var
  • Verify variable is merged correctly (CLI variables override flow-level)

Variable Type Issues

Problem: Variable is treated as string when it should be a number.

Solution:

  • Numbers are automatically parsed
  • For explicit type conversion, use Jinja2 filters:
    {{ count | int }}
    {{ ratio | float }}

Special Characters

Problem: Variable value contains special characters.

Solution:

  • Use quotes for values with spaces or special characters:
    --var message="Hello, World!"
    --var path="/data/files/"