Migrations Guide
A comprehensive guide to understanding and working with migrations in Qarion ETL.
Overview
Migrations in Qarion ETL are the bridge between dataset definitions and database tables. They ensure your database schema stays synchronized with your dataset definitions as your data models evolve.
What Are Migrations?
Migrations are structured files that contain database schema changes (DDL operations). They:
- Materialize Datasets: Convert dataset definitions into physical database tables
- Track Changes: Record schema evolution over time
- Enable Rollback: Support reversing schema changes when needed
- Ensure Consistency: Keep database schema in sync with dataset definitions
The Migration Workflow
1. Define Datasets
Start by defining your datasets in TOML files:
# datasets/orders.toml
name = "orders"
namespace = "raw"
description = "Customer orders"
[columns]
[columns.id]
schema_type = "integer"
required = true
primary_key = true
description = "Order identifier"
[columns.customer_id]
schema_type = "integer"
required = true
description = "Customer identifier"
[columns.amount]
schema_type = "float"
required = true
description = "Order total amount"
[columns.created_at]
schema_type = "timestamp"
required = false
description = "Order creation timestamp"
2. Generate Migrations
Generate migration files from your dataset definitions:
qarion-etl build
Or specifically generate migrations:
qarion-etl generate-migrations
This will:
- Scan all dataset definition files
- Compare them with the current database schema
- Generate migration files for any differences
- Save migration files to the
migrations/directory
3. Review Migrations
Review the generated migration files before applying:
{
"metadata": {
"dataset_id": "orders",
"dataset_name": "orders",
"action": "CREATE",
"new_version": 1,
"previous_version": null,
"compatibility_mode": "strict"
},
"up_ddl": [
{
"operation": "CREATE_TABLE",
"table_name": "raw_orders",
"columns": [
{
"name": "id",
"agnostic_type": "INTEGER",
"constraints": ["PRIMARY KEY"]
},
{
"name": "customer_id",
"agnostic_type": "INTEGER",
"constraints": ["NOT NULL"]
},
{
"name": "amount",
"agnostic_type": "REAL",
"constraints": ["NOT NULL"]
},
{
"name": "created_at",
"agnostic_type": "TIMESTAMP",
"constraints": []
}
]
}
],
"down_ddl": [
{
"operation": "DROP_TABLE",
"table_name": "raw_orders"
}
]
}
4. Apply Migrations
Apply migrations to your database:
# Apply all pending migrations
qarion-etl apply-migrations
# Dry run (validate without executing)
qarion-etl apply-migrations --dry-run
Migration File Structure
Migration files are JSON files with the following structure:
Metadata
{
"metadata": {
"dataset_id": "orders",
"dataset_name": "orders",
"action": "CREATE",
"new_version": 1,
"previous_version": null,
"compatibility_mode": "strict"
}
}
- dataset_id: Unique identifier for the dataset
- dataset_name: Human-readable dataset name
- action: Migration action (CREATE, ALTER, NO_CHANGE)
- new_version: New schema version number
- previous_version: Previous schema version (null for CREATE)
- compatibility_mode: Schema evolution mode used
UP DDL
Operations to apply the migration (forward direction):
{
"up_ddl": [
{
"operation": "CREATE_TABLE",
"table_name": "raw_orders",
"columns": [...]
}
]
}
DOWN DDL
Operations to rollback the migration (reverse direction):
{
"down_ddl": [
{
"operation": "DROP_TABLE",
"table_name": "raw_orders"
}
]
}
Migration Actions
CREATE
Creates a new table from a dataset definition. Generated when:
- A new dataset is defined
- No table exists for the dataset in the database
Example:
{
"operation": "CREATE_TABLE",
"table_name": "raw_orders",
"columns": [...]
}
ALTER
Modifies an existing table when a dataset definition changes. Generated when:
- Columns are added, removed, or modified
- Constraints are changed
- Data types are updated
Example:
{
"operation": "ADD_COLUMN",
"table_name": "raw_orders",
"column": {
"name": "status",
"agnostic_type": "TEXT",
"constraints": ["NOT NULL"]
}
}
NO_CHANGE
No migration needed. Generated when:
- Dataset definition matches current database schema
- No changes detected
Schema Evolution Modes
Migrations respect the schema evolution mode configured for each dataset:
Strict Mode
- No schema changes allowed
- Raises errors on schema mismatches
- Use when schema stability is critical
[schema_evolution]
compatibility_mode = "strict"
Forward Mode
- Allows adding new columns
- Prevents removing or modifying existing columns
- Use for backward-compatible changes
[schema_evolution]
compatibility_mode = "forward"
Normal Mode
- Allows all schema changes
- Includes validation
- Use when you need full flexibility
[schema_evolution]
compatibility_mode = "normal"
Supported DDL Operations
CREATE_TABLE
Creates a new table with specified columns and constraints.
{
"operation": "CREATE_TABLE",
"table_name": "raw_orders",
"columns": [
{
"name": "id",
"agnostic_type": "INTEGER",
"constraints": ["PRIMARY KEY", "NOT NULL"]
}
]
}
DROP_TABLE
Drops a table if it exists.
{
"operation": "DROP_TABLE",
"table_name": "raw_orders"
}
ADD_COLUMN
Adds a new column to an existing table.
{
"operation": "ADD_COLUMN",
"table_name": "raw_orders",
"column": {
"name": "status",
"agnostic_type": "TEXT",
"constraints": ["NOT NULL"]
}
}
DROP_COLUMN
Drops a column from a table.
{
"operation": "DROP_COLUMN",
"table_name": "raw_orders",
"column_name": "old_column"
}
ALTER_COLUMN
Modifies an existing column (type, constraints, etc.).
{
"operation": "ALTER_COLUMN",
"table_name": "raw_orders",
"column_name": "amount",
"new_type": "DECIMAL",
"new_constraints": ["NOT NULL"]
}
Migration Workflow Examples
Example 1: Creating a New Dataset
- Define Dataset:
# datasets/products.toml
name = "products"
namespace = "raw"
[columns]
[columns.id]
schema_type = "integer"
required = true
primary_key = true
- Generate Migration:
qarion-etl build
- Review Migration File:
migrations/20241205120000_create_products.json
- Apply Migration:
qarion-etl apply-migrations
- Result:
raw_productstable is created in the database
Example 2: Adding a Column
- Update Dataset:
# datasets/products.toml
name = "products"
namespace = "raw"
[columns]
[columns.id]
schema_type = "integer"
required = true
primary_key = true
[columns.name]
schema_type = "string"
required = true
description = "Product name"
- Generate Migration:
qarion-etl build
- Review Migration File:
migrations/20241205120001_alter_products_add_name.json
The generated migration file will contain:
{
"metadata": {
"dataset_id": "products",
"dataset_name": "products",
"action": "ALTER",
"new_version": 2,
"previous_version": 1,
"compatibility_mode": "forward"
},
"up_ddl": [
{
"operation": "ADD_COLUMN",
"table_name": "raw_products",
"column": {
"name": "name",
"agnostic_type": "TEXT",
"constraints": ["NOT NULL"]
}
}
],
"down_ddl": [
{
"operation": "DROP_COLUMN",
"table_name": "raw_products",
"column_name": "name"
}
]
}
- Apply Migration:
qarion-etl apply-migrations
- Result:
namecolumn is added toraw_productstable
Example 3: Complete Migration Workflow
Initial Dataset:
# datasets/customers.toml
name = "customers"
namespace = "raw"
[columns]
[columns.id]
schema_type = "integer"
required = true
primary_key = true
[columns.name]
schema_type = "string"
required = true
After First Migration:
qarion-etl build
qarion-etl apply-migrations
# Creates: raw_customers table with id and name columns
Add Email Column:
# datasets/customers.toml
name = "customers"
namespace = "raw"
[columns]
[columns.id]
schema_type = "integer"
required = true
primary_key = true
[columns.name]
schema_type = "string"
required = true
[columns.email]
schema_type = "string"
required = false
description = "Customer email address"
Generate and Apply:
qarion-etl build
# Generates: migrations/20241205120001_alter_customers_add_email.json
qarion-etl apply-migrations
# Adds: email column to raw_customers table
Migration History
Qarion ETL tracks migration history to:
- Prevent Duplicates: Ensures migrations are only applied once
- Maintain Order: Applies migrations in chronological order
- Support Rollback: Enables reversing migrations when needed
- Track Evolution: Records schema changes over time
Viewing Migration History
Migration history is stored in the database and can be queried:
# View applied migrations
qarion-etl apply-migrations --dry-run
Best Practices
1. Version Control
Always commit migration files to version control:
git add migrations/
git commit -m "Add migration for orders dataset"
2. Review Before Apply
Always review generated migrations before applying:
- Check the DDL operations
- Verify table names and column definitions
- Ensure compatibility with existing data
3. Test First
Test migrations in a development environment:
# Test in dev database
qarion-etl apply-migrations --conf config.dev.toml
4. Backup Before Production
Backup your database before applying migrations in production:
# Backup database
cp production.db production.db.backup
# Apply migrations
qarion-etl apply-migrations --conf config.prod.toml
5. Incremental Changes
Make small, incremental schema changes:
- ✅ Add one column at a time
- ✅ Test each change independently
- ❌ Avoid large, complex migrations
6. Document Changes
Document significant schema changes:
# datasets/orders.toml
name = "orders"
description = "Customer orders - Added status column in v2"
Troubleshooting
Migration Already Applied
If a migration was already applied, Qarion ETL will skip it:
Migration 20241205120000_create_orders.json already applied, skipping
Migration Conflicts
If migrations conflict, review the migration files and resolve manually:
- Check migration order
- Review DDL operations
- Resolve conflicts
- Regenerate if needed
Rollback
To rollback a migration, use the DOWN DDL operations:
- Review the migration file's
down_ddl - Execute the rollback operations manually
- Update migration history if needed
Related Documentation
- Core Concepts - Understanding datasets and migrations
- Configuration - Configuring migration storage
- CLI Reference - Migration commands