Skip to main content

Migrations Guide

A comprehensive guide to understanding and working with migrations in Qarion ETL.

Overview

Migrations in Qarion ETL are the bridge between dataset definitions and database tables. They ensure your database schema stays synchronized with your dataset definitions as your data models evolve.

What Are Migrations?

Migrations are structured files that contain database schema changes (DDL operations). They:

  • Materialize Datasets: Convert dataset definitions into physical database tables
  • Track Changes: Record schema evolution over time
  • Enable Rollback: Support reversing schema changes when needed
  • Ensure Consistency: Keep database schema in sync with dataset definitions

The Migration Workflow

1. Define Datasets

Start by defining your datasets in TOML files:

# datasets/orders.toml
name = "orders"
namespace = "raw"
description = "Customer orders"

[columns]
[columns.id]
schema_type = "integer"
required = true
primary_key = true
description = "Order identifier"

[columns.customer_id]
schema_type = "integer"
required = true
description = "Customer identifier"

[columns.amount]
schema_type = "float"
required = true
description = "Order total amount"

[columns.created_at]
schema_type = "timestamp"
required = false
description = "Order creation timestamp"

2. Generate Migrations

Generate migration files from your dataset definitions:

qarion-etl build

Or specifically generate migrations:

qarion-etl generate-migrations

This will:

  • Scan all dataset definition files
  • Compare them with the current database schema
  • Generate migration files for any differences
  • Save migration files to the migrations/ directory

3. Review Migrations

Review the generated migration files before applying:

{
"metadata": {
"dataset_id": "orders",
"dataset_name": "orders",
"action": "CREATE",
"new_version": 1,
"previous_version": null,
"compatibility_mode": "strict"
},
"up_ddl": [
{
"operation": "CREATE_TABLE",
"table_name": "raw_orders",
"columns": [
{
"name": "id",
"agnostic_type": "INTEGER",
"constraints": ["PRIMARY KEY"]
},
{
"name": "customer_id",
"agnostic_type": "INTEGER",
"constraints": ["NOT NULL"]
},
{
"name": "amount",
"agnostic_type": "REAL",
"constraints": ["NOT NULL"]
},
{
"name": "created_at",
"agnostic_type": "TIMESTAMP",
"constraints": []
}
]
}
],
"down_ddl": [
{
"operation": "DROP_TABLE",
"table_name": "raw_orders"
}
]
}

4. Apply Migrations

Apply migrations to your database:

# Apply all pending migrations
qarion-etl apply-migrations

# Dry run (validate without executing)
qarion-etl apply-migrations --dry-run

Migration File Structure

Migration files are JSON files with the following structure:

Metadata

{
"metadata": {
"dataset_id": "orders",
"dataset_name": "orders",
"action": "CREATE",
"new_version": 1,
"previous_version": null,
"compatibility_mode": "strict"
}
}
  • dataset_id: Unique identifier for the dataset
  • dataset_name: Human-readable dataset name
  • action: Migration action (CREATE, ALTER, NO_CHANGE)
  • new_version: New schema version number
  • previous_version: Previous schema version (null for CREATE)
  • compatibility_mode: Schema evolution mode used

UP DDL

Operations to apply the migration (forward direction):

{
"up_ddl": [
{
"operation": "CREATE_TABLE",
"table_name": "raw_orders",
"columns": [...]
}
]
}

DOWN DDL

Operations to rollback the migration (reverse direction):

{
"down_ddl": [
{
"operation": "DROP_TABLE",
"table_name": "raw_orders"
}
]
}

Migration Actions

CREATE

Creates a new table from a dataset definition. Generated when:

  • A new dataset is defined
  • No table exists for the dataset in the database

Example:

{
"operation": "CREATE_TABLE",
"table_name": "raw_orders",
"columns": [...]
}

ALTER

Modifies an existing table when a dataset definition changes. Generated when:

  • Columns are added, removed, or modified
  • Constraints are changed
  • Data types are updated

Example:

{
"operation": "ADD_COLUMN",
"table_name": "raw_orders",
"column": {
"name": "status",
"agnostic_type": "TEXT",
"constraints": ["NOT NULL"]
}
}

NO_CHANGE

No migration needed. Generated when:

  • Dataset definition matches current database schema
  • No changes detected

Schema Evolution Modes

Migrations respect the schema evolution mode configured for each dataset:

Strict Mode

  • No schema changes allowed
  • Raises errors on schema mismatches
  • Use when schema stability is critical
[schema_evolution]
compatibility_mode = "strict"

Forward Mode

  • Allows adding new columns
  • Prevents removing or modifying existing columns
  • Use for backward-compatible changes
[schema_evolution]
compatibility_mode = "forward"

Normal Mode

  • Allows all schema changes
  • Includes validation
  • Use when you need full flexibility
[schema_evolution]
compatibility_mode = "normal"

Supported DDL Operations

CREATE_TABLE

Creates a new table with specified columns and constraints.

{
"operation": "CREATE_TABLE",
"table_name": "raw_orders",
"columns": [
{
"name": "id",
"agnostic_type": "INTEGER",
"constraints": ["PRIMARY KEY", "NOT NULL"]
}
]
}

DROP_TABLE

Drops a table if it exists.

{
"operation": "DROP_TABLE",
"table_name": "raw_orders"
}

ADD_COLUMN

Adds a new column to an existing table.

{
"operation": "ADD_COLUMN",
"table_name": "raw_orders",
"column": {
"name": "status",
"agnostic_type": "TEXT",
"constraints": ["NOT NULL"]
}
}

DROP_COLUMN

Drops a column from a table.

{
"operation": "DROP_COLUMN",
"table_name": "raw_orders",
"column_name": "old_column"
}

ALTER_COLUMN

Modifies an existing column (type, constraints, etc.).

{
"operation": "ALTER_COLUMN",
"table_name": "raw_orders",
"column_name": "amount",
"new_type": "DECIMAL",
"new_constraints": ["NOT NULL"]
}

Migration Workflow Examples

Example 1: Creating a New Dataset

  1. Define Dataset:
# datasets/products.toml
name = "products"
namespace = "raw"

[columns]
[columns.id]
schema_type = "integer"
required = true
primary_key = true
  1. Generate Migration:
qarion-etl build
  1. Review Migration File:
migrations/20241205120000_create_products.json
  1. Apply Migration:
qarion-etl apply-migrations
  1. Result: raw_products table is created in the database

Example 2: Adding a Column

  1. Update Dataset:
# datasets/products.toml
name = "products"
namespace = "raw"

[columns]
[columns.id]
schema_type = "integer"
required = true
primary_key = true

[columns.name]
schema_type = "string"
required = true
description = "Product name"
  1. Generate Migration:
qarion-etl build
  1. Review Migration File:
migrations/20241205120001_alter_products_add_name.json

The generated migration file will contain:

{
"metadata": {
"dataset_id": "products",
"dataset_name": "products",
"action": "ALTER",
"new_version": 2,
"previous_version": 1,
"compatibility_mode": "forward"
},
"up_ddl": [
{
"operation": "ADD_COLUMN",
"table_name": "raw_products",
"column": {
"name": "name",
"agnostic_type": "TEXT",
"constraints": ["NOT NULL"]
}
}
],
"down_ddl": [
{
"operation": "DROP_COLUMN",
"table_name": "raw_products",
"column_name": "name"
}
]
}
  1. Apply Migration:
qarion-etl apply-migrations
  1. Result: name column is added to raw_products table

Example 3: Complete Migration Workflow

Initial Dataset:

# datasets/customers.toml
name = "customers"
namespace = "raw"

[columns]
[columns.id]
schema_type = "integer"
required = true
primary_key = true

[columns.name]
schema_type = "string"
required = true

After First Migration:

qarion-etl build
qarion-etl apply-migrations
# Creates: raw_customers table with id and name columns

Add Email Column:

# datasets/customers.toml
name = "customers"
namespace = "raw"

[columns]
[columns.id]
schema_type = "integer"
required = true
primary_key = true

[columns.name]
schema_type = "string"
required = true

[columns.email]
schema_type = "string"
required = false
description = "Customer email address"

Generate and Apply:

qarion-etl build
# Generates: migrations/20241205120001_alter_customers_add_email.json
qarion-etl apply-migrations
# Adds: email column to raw_customers table

Migration History

Qarion ETL tracks migration history to:

  • Prevent Duplicates: Ensures migrations are only applied once
  • Maintain Order: Applies migrations in chronological order
  • Support Rollback: Enables reversing migrations when needed
  • Track Evolution: Records schema changes over time

Viewing Migration History

Migration history is stored in the database and can be queried:

# View applied migrations
qarion-etl apply-migrations --dry-run

Best Practices

1. Version Control

Always commit migration files to version control:

git add migrations/
git commit -m "Add migration for orders dataset"

2. Review Before Apply

Always review generated migrations before applying:

  • Check the DDL operations
  • Verify table names and column definitions
  • Ensure compatibility with existing data

3. Test First

Test migrations in a development environment:

# Test in dev database
qarion-etl apply-migrations --conf config.dev.toml

4. Backup Before Production

Backup your database before applying migrations in production:

# Backup database
cp production.db production.db.backup

# Apply migrations
qarion-etl apply-migrations --conf config.prod.toml

5. Incremental Changes

Make small, incremental schema changes:

  • ✅ Add one column at a time
  • ✅ Test each change independently
  • ❌ Avoid large, complex migrations

6. Document Changes

Document significant schema changes:

# datasets/orders.toml
name = "orders"
description = "Customer orders - Added status column in v2"

Troubleshooting

Migration Already Applied

If a migration was already applied, Qarion ETL will skip it:

Migration 20241205120000_create_orders.json already applied, skipping

Migration Conflicts

If migrations conflict, review the migration files and resolve manually:

  1. Check migration order
  2. Review DDL operations
  3. Resolve conflicts
  4. Regenerate if needed

Rollback

To rollback a migration, use the DOWN DDL operations:

  1. Review the migration file's down_ddl
  2. Execute the rollback operations manually
  3. Update migration history if needed