
Configuration Reference

Complete reference for Qarion ETL configuration options.

Configuration File

Qarion ETL uses a config.toml file for project configuration.

Engine Configuration

Qarion ETL supports two types of engines:

  1. Processing Engine ([engine]): Required. Used for data transformations and processing.
  2. Metadata Engine ([metadata_engine]): Optional. Used to store metadata when database storage is enabled. If not specified, the processing engine is used.

Processing Engine

The processing engine is configured in the [engine] section:

SQLite

[engine]
name = "sqlite"
[engine.config]
path = "data/qarion-etl.db"

Pandas In-Memory

[engine]
name = "pandas_memory"
[engine.config]
# No configuration required

Pandas Local Storage

[engine]
name = "pandas_local"
[engine.config]
storage_dir = "data/pandas"

DuckDB

[engine]
name = "duckdb"
[engine.config]
path = "data/qarion-etl.duckdb"

Metadata Engine

The metadata engine is optional and configured in the [metadata_engine] section. If not specified, the processing engine is used for metadata storage.

Consider configuring a separate metadata engine when you:

  • use database storage for metadata (dataset_storage = "database", flow_storage = "database", etc.)
  • want to keep processing workloads separate from metadata management
  • want to use engines optimized for different purposes

Example: Separate Metadata Engine

# Use database storage (top-level keys must come before any [table] header)
dataset_storage = "database"
flow_storage = "database"

# Processing engine
[engine]
name = "pandas_memory"

# Metadata engine (for database storage)
[metadata_engine]
name = "sqlite"
[metadata_engine.config]
path = "data/metadata.db"

Note: If metadata_engine is not specified and you're using database storage, the processing engine will be used for metadata operations.
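The fallback rule can be expressed in a few lines. This is a hypothetical helper operating on the parsed configuration (for example, the dict returned by tomllib.load), not part of Qarion ETL's API:

```python
def resolve_metadata_engine(config: dict) -> dict:
    """Pick the engine table used for metadata operations.

    Hypothetical helper mirroring the documented rule: use
    [metadata_engine] when it is present and non-empty, otherwise
    fall back to the processing [engine].
    """
    return config.get("metadata_engine") or config["engine"]


single = {"engine": {"name": "duckdb", "config": {"path": "data/qarion-etl.duckdb"}}}
split = {
    "engine": {"name": "pandas_memory"},
    "metadata_engine": {"name": "sqlite", "config": {"path": "data/metadata.db"}},
}
print(resolve_metadata_engine(single)["name"])  # duckdb
print(resolve_metadata_engine(split)["name"])   # sqlite
```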

Repository Storage Configuration

Local Storage

Store metadata (datasets, flows, migrations) in local files:

[dataset_storage]
type = "local"
config = { dataset_dir = "datasets" }

[flow_storage]
type = "local"
config = { flow_dir = "flows" }

[migration_storage]
type = "local"
config = { migration_dir = "migrations" }

Database Storage

Store metadata in database tables. When using database storage, you can optionally configure a separate [metadata_engine] for metadata operations. If not specified, the processing engine ([engine]) is used.

# Use database storage (top-level keys must come before any [table] header)
dataset_storage = "database"
flow_storage = "database"
schema_storage = "database"

# Processing engine (for data transformations)
[engine]
name = "pandas_memory"

# Metadata engine (optional; used for metadata storage)
[metadata_engine]
name = "sqlite"
[metadata_engine.config]
path = "data/metadata.db"

Note: The metadata_engine configuration is used when dataset_storage, flow_storage, or schema_storage is set to "database". If metadata_engine is not specified, the processing engine is used for metadata operations.
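Which sections trigger this behavior can be checked programmatically. The sketch below is hypothetical. It accepts both forms this reference uses for storage settings (a bare string such as dataset_storage = "database", or a table with a type key) and assumes a missing section means local storage:

```python
# Storage sections that can be set to "database" in this reference.
STORAGE_KEYS = ("dataset_storage", "flow_storage", "schema_storage")


def storage_type(value) -> str:
    """Return the storage type for either documented form.

    Accepts a bare string (dataset_storage = "database") or a table
    with a type key ([dataset_storage] type = "local").
    """
    if isinstance(value, dict):
        return value.get("type", "local")  # "local" default is an assumption
    return value


def database_backed(config: dict) -> list[str]:
    """List the storage sections that use database storage."""
    return [
        key
        for key in STORAGE_KEYS
        if storage_type(config.get(key, "local")) == "database"
    ]


cfg = {
    "dataset_storage": "database",
    "flow_storage": {"type": "local", "config": {"flow_dir": "flows"}},
    "schema_storage": {"type": "database"},
}
print(database_backed(cfg))  # ['dataset_storage', 'schema_storage']
```

If the returned list is non-empty, those sections are served by [metadata_engine], or by [engine] when no metadata engine is configured.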

Schema History Storage

Local Schema History

Read schema history from migration files:

[schema_storage]
type = "local"
config = { migration_dir = "migrations" }

Database Schema History

Store schema history in a database:

[schema_storage]
type = "database"
[schema_storage.config]
connection_string = "sqlite:///metadata.db"
namespace = "xt"
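The connection_string above follows the SQLAlchemy-style URL convention, in which the path after sqlite:/// (three slashes) is relative, and a fourth slash makes it absolute. Assuming Qarion ETL follows that convention, the hypothetical helper below shows how such a URL maps to a database file path:

```python
def sqlite_path(url: str) -> str:
    """Map a file-based SQLite URL to a filesystem path (hypothetical helper)."""
    prefix = "sqlite:///"
    if not url.startswith(prefix):
        raise ValueError(f"not a file-based SQLite URL: {url!r}")
    # Whatever follows the three slashes is the path; a fourth slash makes it absolute.
    return url[len(prefix):]


print(sqlite_path("sqlite:///metadata.db"))                  # metadata.db
print(sqlite_path("sqlite:////var/lib/qarion/metadata.db"))  # /var/lib/qarion/metadata.db
```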

Storage Backends (Input Files)

The storage backend for input files is detected automatically from the file path; for example, an s3:// path selects S3:

[properties.input_ingestion]
path = "s3://my-bucket/data/"
pattern = "orders_*.csv"

[properties.input_ingestion.credentials]
aws_access_key_id = "your-access-key"
aws_secret_access_key = "your-secret-key"
region_name = "us-east-1"
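The detection rule can be sketched with the standard library. The sketch assumes the backend is chosen from the URL scheme (no scheme meaning the local filesystem) and that pattern uses shell-style glob matching. The detect_backend and matching_files names are hypothetical, not Qarion ETL APIs.

```python
from fnmatch import fnmatchcase
from urllib.parse import urlparse


def detect_backend(path: str) -> str:
    """Guess the storage backend from a path's URL scheme (assumed rule)."""
    scheme = urlparse(path).scheme
    # No scheme (or file://) is treated as the local filesystem.
    return {"": "local", "file": "local"}.get(scheme, scheme)


def matching_files(names: list[str], pattern: str) -> list[str]:
    """Keep file names that match a glob pattern, case-sensitively."""
    return [name for name in names if fnmatchcase(name, pattern)]


print(detect_backend("s3://my-bucket/data/"))  # s3
print(detect_backend("data/incoming/"))        # local
print(matching_files(["orders_2024.csv", "customers.csv", "orders_2025.csv"], "orders_*.csv"))
# ['orders_2024.csv', 'orders_2025.csv']
```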

Flow Loading Configuration

CSV Loader

[flow.load]
type = "csv"
delimiter = ","
header = true
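The delimiter and header options correspond to familiar CSV-reading parameters. As an illustration only (Qarion ETL's own loader is not shown in this reference), here is how they map onto Python's standard csv module:

```python
import csv
import io


def load_csv(stream, delimiter: str = ",", header: bool = True) -> list:
    """Read CSV rows; with header=True, key each row by the first line."""
    if header:
        return list(csv.DictReader(stream, delimiter=delimiter))
    return list(csv.reader(stream, delimiter=delimiter))


sample = io.StringIO("id,amount\n1,9.50\n2,12.00\n")
print(load_csv(sample))  # [{'id': '1', 'amount': '9.50'}, {'id': '2', 'amount': '12.00'}]
```

With header = false, every line, including the first, is returned as a plain list of strings.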

JSON Loader

[flow.load]
type = "json"

Parquet Loader

[flow.load]
type = "parquet"

Fernet Key

A Fernet encryption key is automatically generated on project initialization:

fernet_key = "gAAAAABh..."  # Automatically generated, never commit to version control

Important:

  • Generated automatically when you run qarion-etl new-project or qarion-etl init
  • Required for credential encryption
  • Never commit to version control - add qarion-etl.toml to .gitignore
  • Each project should have its own unique key
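For illustration only: a Fernet key is 32 random bytes encoded as URL-safe base64, which is how the cryptography library's Fernet.generate_key() builds one. The sketch below reproduces that format with the standard library so you can recognize a valid key. In practice, let qarion-etl generate the project key.

```python
import base64
import os


def generate_fernet_key() -> bytes:
    """Return a new key in Fernet format: 32 random bytes, URL-safe base64."""
    return base64.urlsafe_b64encode(os.urandom(32))


key = generate_fernet_key()
print(len(key))                            # 44
print(len(base64.urlsafe_b64decode(key)))  # 32
```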

Credential Store Configuration

Local Keystore

[credential_store]
type = "local_keystore"
[credential_store.config]
keystore_path = "~/.qarion_etl/credentials.keystore" # Optional
# fernet_key is automatically loaded from project config

Database Store

[credential_store]
type = "database"
[credential_store.config]
engine = { name = "sqlite", config = { path = "metadata.db" } }
table_name = "xt_credentials" # Optional
# fernet_key is automatically loaded from project config

AWS SSM Parameter Store

[credential_store]
type = "aws_ssm"
[credential_store.config]
parameter_prefix = "/qarion_etl/credentials/" # Optional, default: /qarion_etl/credentials/
region_name = "us-east-1"
kms_key_id = "alias/my-credentials-key" # Optional, uses default SSM key if not provided

Credential Definitions

[[credentials]]
id = "my_aws_creds"
name = "AWS Production Credentials"
credential_type = "aws"
description = "AWS credentials for production"
[credentials.metadata]
environment = "production"

Credential Types:

  • aws: AWS credentials
  • database: Database credentials
  • api_key: API key credentials
  • oauth: OAuth credentials
  • basic_auth: Basic authentication
  • custom: Custom credential type

Environment Variables

Configuration files support environment variable substitution. See the Configuration Guide for details.

Quick Reference

  • ${VAR_NAME}: substitute the value of VAR_NAME; use ${VAR_NAME:-default} to fall back to default when VAR_NAME is not set
  • $VAR_NAME: simple substitution (no default)
  • XTRANSACT_CONFIG_PATH: environment variable that overrides the configuration file path

Note: For production environments, consider using the Credential Store instead of environment variables for better security and management.