Data Export Guide

Overview

Qarion ETL provides a comprehensive system for exporting data from tables to files in a variety of formats. This complements the file loading system, allowing you to export processed data for downstream consumption, reporting, or integration with other systems.

There are two ways to export data in Qarion ETL:

  1. Export Flow: Declarative flow-based export with multiple export modes (full, batch, incremental, changes_only)
  2. Export Executor API: Programmatic export using Python API

Quick Start

The easiest way to export data is using an Export Flow:

# flows/export_orders.toml
id = "export_orders"
name = "Export Orders"
flow_type = "export_flow"
namespace = "public"

[input]
columns = [] # Export flows don't require input columns

[properties.export]
source_dataset_id = "orders_staging"
file_path = "exports/orders.csv"
export_mode = "full"
format = "csv"
[properties.export.exporter_config]
delimiter = ","
include_header = true
encoding = "utf-8"

Then run the export flow:

qarion-etl run --flow export_orders

For complete Export Flow documentation, see Export Flow Reference.

Using Export Executor API

Export a table to a CSV file programmatically:

from exporters import ExportExecutor
from engines import EngineLoader

# Load and connect to engine
engine = EngineLoader.load({'name': 'sqlite', 'config': {'path': 'data.db'}})
engine.connect()

# Create export executor
executor = ExportExecutor(engine)

# Export table
result = executor.export_table(
    source_table='orders',
    file_path='exports/orders.csv'
)

print(f"Exported {result.rows_exported} rows")

Supported Formats

CSV

Export to CSV format (comma-separated values):

result = executor.export_table(
    source_table='orders',
    file_path='exports/orders.csv',
    format='csv',
    config={
        'delimiter': ',',
        'include_header': True,
        'encoding': 'utf-8'
    }
)

Configuration Options:

  • delimiter: Field delimiter (default: ',')
  • include_header: Include header row (default: True)
  • encoding: File encoding (default: 'utf-8')
  • columns: List of columns to export (default: all columns)

JSON

Export to JSON format:

# Standard JSON (array of objects)
result = executor.export_table(
    source_table='orders',
    file_path='exports/orders.json',
    format='json',
    config={
        'indent': 2,  # pretty printing
        'jsonl': False
    }
)

# JSONL format (one object per line)
result = executor.export_table(
    source_table='orders',
    file_path='exports/orders.jsonl',
    format='json',
    config={'jsonl': True}
)

Configuration Options:

  • indent: Indentation level for pretty printing (default: None)
  • jsonl: Use JSONL format (default: False)
  • columns: List of columns to export (default: all columns)
  • encoding: File encoding (default: 'utf-8')
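The difference between the two JSON shapes is easiest to see in plain Python. Note that to_json below is a hypothetical helper for illustration, not part of the Qarion ETL API:

```python
import json

def to_json(rows, jsonl=False, indent=None):
    """Serialize a list of row dicts as a JSON array or as JSONL."""
    if jsonl:
        # One self-contained JSON object per line
        return '\n'.join(json.dumps(r) for r in rows) + '\n'
    # A single JSON array of objects
    return json.dumps(rows, indent=indent)

rows = [{'id': 1}, {'id': 2}]
print(to_json(rows, jsonl=True))
print(to_json(rows, indent=2))
```

JSONL suits streaming consumers that process one record at a time; the array form is simpler for clients that load the whole file at once.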

Parquet

Export to Parquet format (columnar storage):

result = executor.export_table(
    source_table='analytics_data',
    file_path='exports/analytics.parquet',
    format='parquet',
    config={
        'compression': 'snappy',
        'columns': ['id', 'value', 'timestamp']
    }
)

Configuration Options:

  • compression: Compression codec (default: 'snappy')
  • columns: List of columns to export (default: all columns)
  • index: Include index (default: False)

Note: Parquet export requires pandas and pyarrow:

pip install pandas pyarrow

Exporting Filtered Data

Use SQL queries to export filtered or transformed data:

# Export recent orders
result = executor.export_table(
    source_table='orders',
    file_path='exports/recent_orders.csv',
    query="SELECT * FROM orders WHERE order_date > '2024-01-01'"
)

# Export aggregated data
result = executor.export_table(
    source_table='orders',
    file_path='exports/daily_totals.csv',
    query="""
        SELECT
            DATE(order_date) AS date,
            COUNT(*) AS order_count,
            SUM(total) AS total_revenue
        FROM orders
        GROUP BY DATE(order_date)
        ORDER BY date
    """
)

Exporting Specific Columns

Export only the columns you need:

result = executor.export_table(
    source_table='orders',
    file_path='exports/orders_summary.csv',
    config={
        'columns': ['id', 'order_date', 'total', 'status']
    }
)

Format Auto-Detection

The export format is automatically detected from the file extension:

# Format detected from extension
executor.export_table('orders', 'exports/orders.csv') # CSV
executor.export_table('orders', 'exports/orders.json') # JSON
executor.export_table('orders', 'exports/orders.parquet') # Parquet
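A minimal sketch of how such extension-based detection can work (the mapping and function here are illustrative assumptions, not Qarion ETL's internal logic):

```python
from pathlib import Path

# Illustrative extension-to-format mapping
EXTENSION_FORMATS = {
    '.csv': 'csv',
    '.json': 'json',
    '.jsonl': 'json',
    '.parquet': 'parquet',
}

def detect_format(file_path):
    """Return the export format implied by the file extension."""
    suffix = Path(file_path).suffix.lower()
    if suffix not in EXTENSION_FORMATS:
        raise ValueError(f"Cannot detect export format for {file_path!r}")
    return EXTENSION_FORMATS[suffix]

print(detect_format('exports/orders.parquet'))
```

Passing an explicit format argument, as in the earlier examples, avoids relying on detection for unusual file names.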

Error Handling

Export operations return results with success status:

result = executor.export_table(
    source_table='orders',
    file_path='exports/orders.csv'
)

if result.success:
    print(f"Successfully exported {result.rows_exported} rows")
    print(f"File: {result.file_path}")
else:
    print(f"Export failed: {result.error}")

Use Cases

1. Data Sharing

Export processed data for sharing with other teams or systems:

# Export to CSV for Excel users
executor.export_table(
    source_table='monthly_report',
    file_path='reports/monthly_report.csv',
    config={'include_header': True, 'encoding': 'utf-8-sig'}
)

2. API Integration

Export to JSON for API consumption:

executor.export_table(
    source_table='api_data',
    file_path='api/export.json',
    config={'indent': 2, 'ensure_ascii': False}
)

3. Analytical Processing

Export to Parquet for analytical workloads:

executor.export_table(
    source_table='analytics_data',
    file_path='analytics/data.parquet',
    config={'compression': 'snappy'}
)

4. Incremental Exports

Export only new or changed data:

executor.export_table(
    source_table='orders',
    file_path='exports/new_orders.csv',
    query="SELECT * FROM orders WHERE export_status IS NULL"
)

Best Practices

  1. Use Appropriate Formats: Choose formats based on your use case

    • CSV for Excel compatibility
    • JSON for API/web integration
    • Parquet for analytical processing
  2. Filter Before Export: Use queries to filter data rather than exporting entire tables

  3. Select Columns: Export only the columns you need to reduce file size

  4. Handle Errors: Always check result.success and handle errors appropriately

  5. Use Descriptive Paths: Use clear, descriptive file paths for exports

Export Flow Configuration Examples

Full Export

Export all data from a dataset:

# flows/export_orders_full.toml
id = "export_orders_full"
name = "Export Orders (Full)"
flow_type = "export_flow"

[properties.export]
source_dataset_id = "orders_staging"
file_path = "exports/orders_full.csv"
export_mode = "full"
format = "csv"
[properties.export.exporter_config]
delimiter = ","
include_header = true
encoding = "utf-8"

Incremental Export

Export only new/changed records:

# flows/export_orders_incremental.toml
id = "export_orders_incremental"
name = "Export Orders (Incremental)"
flow_type = "export_flow"

[properties.export]
source_dataset_id = "change_feed_orders"
file_path = "exports/orders_incremental_{batch_id}.csv"
export_mode = "incremental"
format = "csv"

Batch Export

Export specific batch:

# flows/export_orders_batch.toml
id = "export_orders_batch"
name = "Export Orders (Batch)"
flow_type = "export_flow"

[properties.export]
source_dataset_id = "orders_staging"
file_path = "exports/orders_batch_{batch_id}.csv"
export_mode = "batch"
batch_id = 42
format = "csv"

Changes Only Export

Export only changed records from change feed:

# flows/export_orders_changes.toml
id = "export_orders_changes"
name = "Export Order Changes"
flow_type = "export_flow"

[properties.export]
source_dataset_id = "change_feed_orders"
file_path = "exports/orders_changes_{batch_id}.csv"
export_mode = "changes_only"
format = "csv"

Parquet Export

Export to Parquet format:

# flows/export_orders_parquet.toml
id = "export_orders_parquet"
name = "Export Orders to Parquet"
flow_type = "export_flow"

[properties.export]
source_dataset_id = "orders_staging"
file_path = "exports/orders.parquet"
export_mode = "full"
format = "parquet"
[properties.export.exporter_config]
compression = "snappy"
columns = ["id", "customer_id", "amount", "order_date"]

JSON Export

Export to JSON format:

# flows/export_orders_json.toml
id = "export_orders_json"
name = "Export Orders to JSON"
flow_type = "export_flow"

[properties.export]
source_dataset_id = "orders_staging"
file_path = "exports/orders.json"
export_mode = "full"
format = "json"
[properties.export.exporter_config]
indent = 2
jsonl = false
encoding = "utf-8"

Export Operation Tracking

All export operations are automatically tracked in the metadata system. Each export operation is recorded in xt_export_operations with:

  • Export type and mode
  • Source dataset and destination path
  • Execution timing
  • Results (rows exported, success status)
  • Export configuration

You can query export history to monitor export operations:

-- Get recent export operations
SELECT
    export_operation_id,
    export_type,
    source_dataset_id,
    destination_path,
    rows_exported,
    status,
    start_time
FROM xt_export_operations
ORDER BY start_time DESC
LIMIT 10;

For more information about accessing and using execution metadata, see the Metadata Tracking Guide.