Configuration Guide
How to configure Qarion ETL for your needs.
Configuration File
Qarion ETL uses a qarion-etl.toml file (the default name) for project configuration. This file is created when you initialize a project.
Basic Configuration
Minimal Configuration:
# qarion-etl.toml
[app]
app = "Qarion ETL"
type = "project"
project_name = "my_project"
[engine]
name = "sqlite"
[engine.config]
path = "data/qarion-etl.db"
dataset_dir = "datasets"
migration_dir = "migrations"
flow_dir = "flows"
quality_dir = "data_quality"
schema_storage = "local"
dataset_storage = "local"
flow_storage = "local"
quality_storage = "local"
# Quality Store Configuration
[quality_store]
enabled = true
auto_calculate_metrics = true
results_table_name = "_quality_results"
metrics_table_name = "_quality_metrics"
Complete Configuration Example
Full Configuration with All Options:
# qarion-etl.toml
[app]
app = "Qarion ETL"
type = "project"
project_name = "my_project"
version = "1.0.0"
# Processing Engine
[engine]
name = "sqlite"
[engine.config]
path = "data/qarion-etl.db"
# Optional: Separate Metadata Engine
[metadata_engine]
name = "sqlite"
[metadata_engine.config]
path = "data/metadata.db"
# Storage Configuration
dataset_storage = "local"
flow_storage = "local"
schema_storage = "local"
quality_storage = "local"
dataset_dir = "datasets"
flow_dir = "flows"
migration_dir = "migrations"
quality_dir = "data_quality"
metadata_namespace = "xt"
default_namespace = "public"
# Quality Store Configuration
[quality_store]
enabled = true
auto_calculate_metrics = true
results_table_name = "_quality_results"
metrics_table_name = "_quality_metrics"
# Credential Store Configuration
[credential_store]
type = "local_keystore"
[credential_store.config]
keystore_path = "~/.qarion_etl/credentials.keystore"
# Credential Definitions
[[credentials]]
id = "aws_prod_creds"
name = "AWS Production Credentials"
credential_type = "aws"
description = "AWS credentials for production S3 access"
[[credentials]]
id = "db_prod_creds"
name = "Database Production Credentials"
credential_type = "database"
description = "PostgreSQL credentials for production"
Storage Configuration
Qarion ETL has multiple storage layers:
- Storage Backends: For input file storage (local filesystem, S3)
- Repository Storage: For metadata storage (datasets, flows, migrations)
See Engines and Storage for detailed information.
Repository Storage
Local Storage
Store definitions in local files (default):
dataset_dir = "datasets"
flow_dir = "flows"
migration_dir = "migrations"
dataset_storage = "local"
flow_storage = "local"
schema_storage = "local"
Database Storage
Store definitions in database:
dataset_storage = "database"
flow_storage = "database"
schema_storage = "database"
Note: Database storage requires a configured engine and database service.
Storage Backends
Storage backends are automatically detected from file paths. For S3:
Using Inline Credentials (Not Recommended):
[properties.input_ingestion]
path = "s3://my-bucket/data/"
pattern = "orders_*.csv"
# TOML inline tables must stay on one line, with commas between pairs
credentials = { aws_access_key_id = "your-access-key", aws_secret_access_key = "your-secret-key", region_name = "us-east-1" }
Using Credential Store (Recommended):
[properties.input_ingestion]
path = "s3://my-bucket/data/"
pattern = "orders_*.csv"
credentials = "${credential:my_aws_creds}"
See Credential Management for detailed information on managing credentials securely.
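The path-based detection described above can be pictured with a short sketch. This is only an illustration under assumptions: the function name `detect_storage_backend` is hypothetical, and the real detection logic in Qarion ETL may recognise more schemes than the two mentioned in this guide.

```python
from urllib.parse import urlparse


def detect_storage_backend(path: str) -> str:
    """Guess the storage backend from a path (hypothetical helper)."""
    scheme = urlparse(path).scheme.lower()
    if scheme == "s3":
        return "s3"
    # Assumption: anything that is not an s3:// URL is a local filesystem path.
    return "local"


print(detect_storage_backend("s3://my-bucket/data/"))
print(detect_storage_backend("data/orders.csv"))
```

Keying on the URL scheme keeps the ingestion configuration free of an explicit backend setting, which matches how the examples above only ever specify `path`.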
Engine Configuration
Engines are the execution environments where transformations run. Qarion ETL supports two types of engines:
- Processing Engine ([engine]): Required. Used for data transformations and processing.
- Metadata Engine ([metadata_engine]): Optional. Used for storing metadata in database storage. Defaults to the processing engine if not specified.
See Engines and Storage for detailed information.
Processing Engine
The processing engine is configured in the [engine] section:
SQLite Engine
[engine]
name = "sqlite"
[engine.config]
path = "data/qarion-etl.db"
Pandas In-Memory Engine
[engine]
name = "pandas_memory"
[engine.config]
# No configuration required
Pandas Local Storage Engine
[engine]
name = "pandas_local"
[engine.config]
storage_dir = "data/pandas"
DuckDB Engine
[engine]
name = "duckdb"
[engine.config]
path = "data/qarion-etl.duckdb"
PySpark Engine
[engine]
name = "pyspark"
[engine.config]
app_name = "Qarion ETL"
master = "local[*]"
enable_hive_support = false
SparkSQL Engine
[engine]
name = "sparksql"
[engine.config]
app_name = "Qarion ETL-SQL"
master = "local[*]"
enable_hive_support = true
Polars Engine
[engine]
name = "polars"
[engine.config]
storage_dir = "data/polars" # Optional: for persistence
Or in-memory only:
[engine]
name = "polars"
[engine.config]
# No storage_dir = in-memory only
Metadata Engine
The metadata engine is optional and configured in the [metadata_engine] section. If not specified, the processing engine is used for metadata storage.
Example: Separate Metadata Engine
# Processing engine - for data transformations
[engine]
name = "pandas_memory"
# Metadata engine - for storing metadata in database
[metadata_engine]
name = "sqlite"
[metadata_engine.config]
path = "data/metadata.db"
# Use database storage
dataset_storage = "database"
flow_storage = "database"
When to use a separate metadata engine:
- Using database storage for metadata
- Separating processing workloads from metadata management
- Using different engines optimized for different purposes (e.g., Spark for processing, PostgreSQL for metadata)
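The fallback rule for the metadata engine (use [metadata_engine] when present, otherwise reuse [engine]) might look like the following sketch. It assumes the configuration has already been parsed into a plain dictionary; `resolve_metadata_engine` is a hypothetical name, not a documented Qarion ETL function.

```python
def resolve_metadata_engine(config: dict) -> dict:
    """Return the engine settings used for metadata storage (hypothetical helper)."""
    # Prefer an explicit [metadata_engine] table; otherwise reuse [engine].
    return config.get("metadata_engine") or config["engine"]


separate = {
    "engine": {"name": "pandas_memory"},
    "metadata_engine": {"name": "sqlite", "config": {"path": "data/metadata.db"}},
}
shared = {"engine": {"name": "sqlite", "config": {"path": "data/qarion-etl.db"}}}

print(resolve_metadata_engine(separate)["name"])
print(resolve_metadata_engine(shared)["name"])
```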
Schema Storage
Local Schema Storage
schema_storage = "local"
migration_dir = "migrations"
Database Schema Storage
schema_storage = "database"
metadata_namespace = "xt"
Note: Database schema storage requires a configured engine and database service.
Namespace Configuration
metadata_namespace = "xt"
default_namespace = "public"
The metadata_namespace is used as a prefix for metadata tables (e.g., xt_runs, xt_schemas).
The default_namespace is the default namespace for datasets and flows.
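As a small illustration of the prefixing rule, the sketch below builds metadata table names from a namespace. The helper name is hypothetical, and the underscore separator is inferred from the xt_runs and xt_schemas examples above.

```python
def metadata_table_name(namespace: str, table: str) -> str:
    """Build a prefixed metadata table name (hypothetical helper)."""
    # Assumption: the namespace and table name are joined with an underscore.
    return f"{namespace}_{table}"


print(metadata_table_name("xt", "runs"))
print(metadata_table_name("xt", "schemas"))
```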
Credential Management
Qarion ETL provides a credential store system for managing credentials securely. This is the recommended approach for storing sensitive data like passwords, API keys, and access tokens.
Key Benefits:
- Define credentials once and reuse across configurations
- Store credentials securely in databases, local keystores, or cloud key management services
- Reference credentials without exposing sensitive data in configuration files
- Support for multiple credential store backends
Quick Example:
[credential_store]
type = "local_keystore"
[[credentials]]
id = "my_aws_creds"
name = "AWS Production Credentials"
credential_type = "aws"
Then reference in configuration:
[properties.input_ingestion]
path = "s3://my-bucket/data/"
credentials = "${credential:my_aws_creds}"
See Credential Management Guide for complete documentation.
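To make the reference syntax concrete, here is a minimal sketch of how a `${credential:<id>}` value could be recognised and its ID extracted. The function name and the set of characters allowed in an ID are assumptions for illustration; the actual resolver lives in the credential store implementation.

```python
import re

# Matches a value that consists solely of a credential reference.
CREDENTIAL_REF = re.compile(r"^\$\{credential:([A-Za-z0-9_\-]+)\}$")


def credential_id(value):
    """Return the referenced credential ID, or None if value is not a reference."""
    if not isinstance(value, str):
        return None  # e.g. an inline credentials table
    match = CREDENTIAL_REF.match(value)
    return match.group(1) if match else None


print(credential_id("${credential:my_aws_creds}"))
```

A value that is not a reference, such as an inline table of keys, yields `None`, so a caller could fall back to treating it as literal credentials.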
Quality Store Configuration
The quality store configuration controls how quality check results and metrics are stored and tracked.
Configuration Options
[quality_store]
enabled = true # Enable/disable automatic storage (default: true)
auto_calculate_metrics = true # Automatically calculate metrics (default: true)
results_table_name = "_quality_results" # Table name for results (default: "_quality_results")
metrics_table_name = "_quality_metrics" # Table name for metrics (default: "_quality_metrics")
Options:
- enabled (boolean, default: true): Whether to enable automatic storage of quality check results and metrics. When disabled, results are not persisted to the database.
- auto_calculate_metrics (boolean, default: true): Whether to automatically calculate and store aggregated metrics when storing results. Metrics include pass rate, failure count, average execution time, and total records checked.
- results_table_name (string, default: "_quality_results"): Name of the table in the metadata engine where quality check execution results are stored.
- metrics_table_name (string, default: "_quality_metrics"): Name of the table in the metadata engine where aggregated quality metrics are stored.
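The documented defaults imply that a partial [quality_store] table is completed with default values. A sketch of that merge, assuming the configuration is available as a dictionary and using a hypothetical helper name, could be:

```python
# Defaults as listed in the options above.
QUALITY_STORE_DEFAULTS = {
    "enabled": True,
    "auto_calculate_metrics": True,
    "results_table_name": "_quality_results",
    "metrics_table_name": "_quality_metrics",
}


def quality_store_settings(config: dict) -> dict:
    """Overlay the user's [quality_store] table on the defaults (hypothetical helper)."""
    return {**QUALITY_STORE_DEFAULTS, **config.get("quality_store", {})}


settings = quality_store_settings({"quality_store": {"enabled": False}})
print(settings)
```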
Example Configuration
# Enable quality store with custom table names
[quality_store]
enabled = true
auto_calculate_metrics = true
results_table_name = "quality_check_results"
metrics_table_name = "quality_metrics"
Disabling Quality Store
To disable quality results storage:
[quality_store]
enabled = false
When disabled, quality checks will still execute, but results will not be persisted to the database. This is useful if you only need real-time validation without historical tracking.
Integration
The quality store is automatically integrated with:
- Quality check flows
- Quality check tasks in standard flows
- Quality check nodes
- Automatic quality checks after transformations
Results are automatically stored when quality checks execute, using the configured table names and settings.
For more information, see the Data Quality Guide.
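The aggregated metrics named in the options (pass rate, failure count, average execution time, total records checked) can be illustrated with a small sketch. The shape of a result record here (`passed`, `execution_time`, `records_checked`) is an assumption made for the example and may not match the columns Qarion ETL actually stores.

```python
def aggregate_metrics(results: list) -> dict:
    """Summarise quality check results (hypothetical record shape)."""
    total = len(results)
    failures = sum(1 for r in results if not r["passed"])
    total_time = sum(r["execution_time"] for r in results)
    return {
        "pass_rate": (total - failures) / total if total else 0.0,
        "failure_count": failures,
        "avg_execution_time": total_time / total if total else 0.0,
        "total_records_checked": sum(r["records_checked"] for r in results),
    }


sample = [
    {"passed": True, "execution_time": 1.0, "records_checked": 100},
    {"passed": False, "execution_time": 3.0, "records_checked": 50},
]
print(aggregate_metrics(sample))
```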
Environment Variables
Qarion ETL supports using environment variables directly in configuration files. This is useful for:
- Keeping sensitive values (passwords, API keys) out of version control
- Using different values across environments (dev, staging, production)
- Sharing configuration across multiple projects
Note: For production environments, consider using the Credential Store instead of environment variables for better security and management.
Environment Variable Substitution
You can reference environment variables in your configuration file with either of two syntaxes:
Standard Syntax with Defaults
[engine]
name = "sqlite"
[engine.config]
path = "${DB_PATH:-data/qarion-etl.db}"
This will:
- Use the value of the DB_PATH environment variable if set
- Fall back to data/qarion-etl.db if DB_PATH is not set
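The behaviour of the `${VAR:-default}` form can be sketched in a few lines. This is an illustrative re-implementation, not Qarion ETL's own code; the function name is hypothetical, and it assumes the default applies only when the variable is not set at all.

```python
import re

# Matches ${NAME:-default}; the default may be empty but cannot contain "}".
DEFAULT_SYNTAX = re.compile(r"\$\{([A-Za-z_][A-Za-z0-9_]*):-([^}]*)\}")


def substitute_with_defaults(value: str, env: dict) -> str:
    """Replace ${NAME:-default} placeholders using env (hypothetical helper)."""
    def replace(match):
        name, default = match.group(1), match.group(2)
        return env.get(name, default)

    return DEFAULT_SYNTAX.sub(replace, value)


print(substitute_with_defaults("${DB_PATH:-data/qarion-etl.db}", {}))
```

In real use the `env` argument would be `os.environ`; passing a plain dictionary keeps the sketch easy to test.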
Simple Syntax
[engine]
name = "sqlite"
[engine.config]
path = "$DB_PATH"
This will:
- Use the value of the DB_PATH environment variable if set
- Use an empty string if DB_PATH is not set (with a warning)
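A comparable sketch for the simple `$VAR` form follows. Again the function name is hypothetical, and the use of Python's `warnings` module is an assumption about how the warning might be raised.

```python
import re
import warnings

# Matches $NAME; "${...}" is not matched because "{" cannot start a name.
SIMPLE_SYNTAX = re.compile(r"\$([A-Za-z_][A-Za-z0-9_]*)")


def substitute_simple(value: str, env: dict) -> str:
    """Replace $NAME placeholders using env (hypothetical helper)."""
    def replace(match):
        name = match.group(1)
        if name not in env:
            warnings.warn(f"Environment variable {name} is not set; using an empty string")
            return ""
        return env[name]

    return SIMPLE_SYNTAX.sub(replace, value)


print(substitute_simple("$DB_USER", {"DB_USER": "myuser"}))
```

Because an unset variable silently becomes an empty string, the `${VAR:-default}` form is the safer choice for values that must never be blank.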
Examples
Database Credentials
[engine]
name = "postgres"
[engine.config]
host = "${DB_HOST:-localhost}"
port = "${DB_PORT:-5432}"
database = "$DB_NAME"
user = "$DB_USER"
password = "$DB_PASSWORD"
Set environment variables:
export DB_HOST=production-db.example.com
export DB_PORT=5432
export DB_NAME=mydb
export DB_USER=myuser
export DB_PASSWORD=secretpassword
S3 Credentials
Using Environment Variables:
[properties.input_ingestion]
path = "s3://${S3_BUCKET}/data/"
# TOML inline tables must stay on one line, with commas between pairs
credentials = { aws_access_key_id = "$AWS_ACCESS_KEY_ID", aws_secret_access_key = "$AWS_SECRET_ACCESS_KEY", region_name = "${AWS_REGION:-us-east-1}" }
Using Credential Store (Recommended):
[properties.input_ingestion]
path = "s3://my-bucket/data/"
credentials = "${credential:my_aws_creds}"
See Credential Management for setting up credential stores.
File Paths
dataset_dir = "${DATASET_DIR:-datasets}"
migration_dir = "${MIGRATION_DIR:-migrations}"
flow_dir = "${FLOW_DIR:-flows}"
Configuration File Path
You can also specify the configuration file path using an environment variable:
export XTRANSACT_CONFIG_PATH=/path/to/custom/config.toml
This takes precedence over the default qarion-etl.toml file, but can be overridden by command-line arguments.
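The precedence order described here (command-line argument, then the environment variable, then the default file name) can be summarised in a short sketch. The function and parameter names are hypothetical; only the variable name and the default file name come from this guide.

```python
DEFAULT_CONFIG_FILE = "qarion-etl.toml"
CONFIG_PATH_ENV_VAR = "XTRANSACT_CONFIG_PATH"


def resolve_config_path(cli_path, env: dict) -> str:
    """Pick the configuration file path by precedence (hypothetical helper)."""
    if cli_path:
        return cli_path  # a command-line argument wins over everything else
    # Otherwise use the environment variable, then fall back to the default name.
    return env.get(CONFIG_PATH_ENV_VAR) or DEFAULT_CONFIG_FILE


env = {"XTRANSACT_CONFIG_PATH": "/path/to/custom/config.toml"}
print(resolve_config_path(None, env))
```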
Complete Configuration Example
Production-Ready Configuration:
# qarion-etl.toml
[app]
app = "Qarion ETL"
type = "project"
project_name = "production_pipeline"
version = "1.0.0"
# Processing Engine - for data transformations
[engine]
name = "duckdb"
[engine.config]
path = "data/processing.duckdb"
# Metadata Engine - for storing metadata
[metadata_engine]
name = "sqlite"
[metadata_engine.config]
path = "data/metadata.db"
# Storage Configuration
dataset_storage = "database"
flow_storage = "database"
schema_storage = "database"
dataset_dir = "datasets"
flow_dir = "flows"
migration_dir = "migrations"
metadata_namespace = "xt"
default_namespace = "public"
# Credential Store
[credential_store]
type = "local_keystore"
[credential_store.config]
keystore_path = "~/.qarion_etl/credentials.keystore"
# Credential Definitions
[[credentials]]
id = "aws_prod_creds"
name = "AWS Production Credentials"
credential_type = "aws"
description = "AWS credentials for production S3 access"
[[credentials]]
id = "db_prod_creds"
name = "Database Production Credentials"
credential_type = "database"
description = "PostgreSQL credentials for production database"
# Fernet key for credential encryption (auto-generated)
fernet_key = "gAAAAABh..." # Never commit this to version control
Validation
Configuration is validated on load. Invalid configuration will raise errors with details about what's wrong.
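To give a feel for what load-time validation might check, here is a sketch that collects problems and raises a single error listing all of them. The rules shown (a required [engine] name, a known engine, and "local" or "database" storage values) are assumptions drawn from the examples in this guide, not the actual validation rules.

```python
# Engine names that appear in this guide; the real list may differ.
KNOWN_ENGINES = {
    "sqlite", "pandas_memory", "pandas_local", "duckdb",
    "pyspark", "sparksql", "polars", "postgres",
}
STORAGE_KEYS = ("dataset_storage", "flow_storage", "schema_storage")


def validate_config(config: dict) -> None:
    """Raise ValueError describing every problem found (hypothetical validator)."""
    errors = []
    engine = config.get("engine")
    if not isinstance(engine, dict) or "name" not in engine:
        errors.append("[engine] section with a 'name' key is required")
    elif engine["name"] not in KNOWN_ENGINES:
        errors.append(f"unknown engine name: {engine['name']!r}")
    for key in STORAGE_KEYS:
        value = config.get(key, "local")
        if value not in ("local", "database"):
            errors.append(f"{key} must be 'local' or 'database', got {value!r}")
    if errors:
        raise ValueError("Invalid configuration: " + "; ".join(errors))


validate_config({"engine": {"name": "sqlite"}})  # passes without raising
```

Collecting every problem before raising lets a user fix the whole file in one pass instead of one error at a time.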
Related Documentation
- Engines and Storage - Detailed guide on engines and storage layers
- Configuration Reference - Complete configuration reference
- Core Concepts - Understanding Qarion ETL fundamentals
- Credential Management - Secure credential management