Lineage and SQL Conversion Test Suite

Overview

Comprehensive test suite for SQL parsing, lineage extraction, SQL-to-instruction conversion, and execution on non-SQL engines (pandas).

Test Files

1. `test_lineage_extraction.py` (253 lines)

Purpose: Test lineage extraction from SQL queries

Test Classes:

TestSQLParsing: SQL parsing with various query types
TestLineageExtraction: Table and column-level lineage extraction
TestTaskLineageExtraction: Lineage extraction from tasks
TestFlowLineage: Flow-level lineage extraction

Coverage:

Simple SELECT queries
INSERT INTO ... SELECT statements
JOIN queries
Aggregation queries
Complex queries with subqueries
Dataset mapping
Task-level lineage
Flow-level lineage

2. `test_sql_to_instruction_conversion.py` (476 lines)

Purpose: Test SQL to transformation instruction conversion

Test Classes:

TestSQLToInstructionConversion: Basic conversion tests
TestPandasExecution: Execution on pandas engine
TestEndToEndSQLToPandas: Complete workflow tests

Coverage:

Simple SELECT conversion
INSERT INTO ... SELECT conversion
WHERE clause extraction
Aggregation handling
Calculation extraction
Primary key detection
Numerical column detection
Pandas execution
End-to-end workflows

3. `test_lineage_pandas_integration.py` (381 lines)

Purpose: Integration tests for lineage with pandas execution

Test Classes:

TestSQLToPandasWorkflow: SQL -> Pandas execution workflows
TestLineageIntegration: Lineage integration with dependencies
TestComplexSQLPatterns: Complex SQL pattern handling

Coverage:

Simple transformations on pandas
Calculations on pandas
Filtering on pandas
Dependency inference from lineage
CTE parsing
Subquery parsing
Multiple JOINs
UNION queries

4. `test_sql_pandas_end_to_end.py` (342 lines)

Purpose: End-to-end tests for complete workflows

Test Classes:

TestSQLToPandasEndToEnd: Complete SQL -> Pandas workflows
TestLineageExtractionComprehensive: Comprehensive lineage tests
TestSQLParserComprehensive: Comprehensive SQL parsing tests

Coverage:

Simple SELECT with filtering
Calculations
Column selection and projection
Multiple transformations in sequence
Various SQL query types

5. `test_lineage_comprehensive.py` (342 lines)

Purpose: Comprehensive integration tests

Test Classes:

TestCompleteWorkflow: Complete workflow tests
TestSQLExpressionEvaluation: SQL expression evaluation in pandas
TestInstructionStructure: Instruction structure validation

Coverage:

Complete SQL -> Lineage -> Instruction -> Pandas workflows
Multiple transformations in sequence
Expression evaluation
Instruction serialization

Running Tests

Prerequisites

# Install required dependencies
pip install sqlglot pandas pytest

# Optional: For enhanced lineage analysis
pip install sqllineage

Run All Tests

# Run all lineage and SQL tests
pytest tests/test_lineage*.py tests/test_sql*.py -v

# Run with coverage
pytest tests/test_lineage*.py tests/test_sql*.py --cov=qarion-etl.sql --cov=qarion-etl.lineage -v

Run Specific Test Files

# Lineage extraction tests
pytest tests/test_lineage_extraction.py -v

# SQL to instruction conversion
pytest tests/test_sql_to_instruction_conversion.py -v

# Pandas integration
pytest tests/test_lineage_pandas_integration.py -v

# End-to-end tests
pytest tests/test_sql_pandas_end_to_end.py -v

# Comprehensive tests
pytest tests/test_lineage_comprehensive.py -v

Run Specific Test Classes

pytest tests/test_lineage_extraction.py::TestSQLParsing -v
pytest tests/test_sql_to_instruction_conversion.py::TestPandasExecution -v

Run Specific Tests

pytest tests/test_sql_pandas_end_to_end.py::TestSQLToPandasEndToEnd::test_simple_select_filter -v

Test Structure

Fixtures

sample_datasets: Sample dataset definitions
pandas_engine: Pandas in-memory engine
pandas_executor: Pandas transformation executor
complete_setup: Complete setup (engine, executor, converter, extractor)

Test Patterns

@pytest.mark.skipif(not SQLGLOT_AVAILABLE, reason="SQLGlot not available")
@pytest.mark.skipif(not PANDAS_AVAILABLE, reason="Pandas not available")
def test_example(pandas_setup):
    """Test description."""
    engine, executor = pandas_setup

    # 1. Setup data
    # 2. Convert SQL to instruction
    # 3. Execute on pandas
    # 4. Verify results

Test Coverage

SQL Parsing ✅

Lineage Extraction ✅

SQL to Instruction Conversion ✅

Pandas Execution ✅

Integration ✅

SQL -> Lineage -> Instruction -> Pandas
Dependency inference
End-to-end workflows
Instruction serialization

Example Test Scenarios

Scenario 1: Simple Filtering

# SQL
sql = "SELECT id, name, amount FROM orders WHERE status = 'active'"

# Convert to instruction
instruction = converter.convert(sql, 'orders', 'staging_orders', batch_id=1)

# Execute on pandas
result = executor.execute_transformation_instruction(instruction)

# Verify: Only active orders in target

Scenario 2: Calculations

# SQL with calculations
sql = """
SELECT
    id,
    amount * 1.1 as adjusted_amount,
    amount + tax as total_amount
FROM orders
"""

# Convert and execute
instruction = converter.convert(sql, 'orders', 'staging_orders', batch_id=1)
result = executor.execute_transformation_instruction(instruction)

# Verify: Calculations applied correctly

Scenario 3: Lineage Extraction

# Extract lineage
extractor = LineageExtractor()
lineage = extractor.extract_from_sql(sql, datasets)

# Verify: Source and target datasets identified
assert "orders" in lineage.source_datasets
assert lineage.target_dataset == "staging_orders"

Expected Test Results

When all dependencies are installed:

tests/test_lineage_extraction.py::TestSQLParsing::test_parse_simple_select PASSED
tests/test_lineage_extraction.py::TestLineageExtraction::test_extract_table_lineage_simple PASSED
tests/test_sql_to_instruction_conversion.py::TestSQLToInstructionConversion::test_convert_simple_select PASSED
tests/test_sql_pandas_end_to_end.py::TestSQLToPandasEndToEnd::test_simple_select_filter PASSED
tests/test_lineage_comprehensive.py::TestCompleteWorkflow::test_complete_sql_to_pandas_workflow PASSED

Troubleshooting

Tests Skipped

If tests are skipped:

Check dependencies: pip install sqlglot pandas pytest
Verify imports: Check for import errors in test output
Check fixtures: Ensure fixtures are properly defined

Import Errors

If you see import errors:

Ensure you're in the project root directory
Check that qarion-etl is in Python path
Verify all dependencies are installed

Pandas Execution Errors

If pandas tests fail:

Ensure pandas is installed: pip install pandas
Check that PandasInMemoryEngine is available
Verify engine initialization in fixtures

SQL Parsing Errors

If SQL parsing fails:

Ensure SQLGlot is installed: pip install sqlglot
Check SQL syntax (SQLGlot may not support all dialects)
Verify SQL query is valid

Continuous Integration

These tests are designed to run in CI/CD pipelines:

# Example GitHub Actions
- name: Install dependencies
  run: |
    pip install sqlglot pandas pytest

- name: Run lineage tests
  run: |
    pytest tests/test_lineage*.py tests/test_sql*.py -v

Performance Considerations

SQL parsing is fast (SQLGlot is optimized)
Lineage extraction adds minimal overhead
Pandas execution is in-memory (suitable for test data)
Large datasets may require optimization

Future Enhancements

Add tests for PySpark execution
Add tests for complex aggregations
Add tests for window functions
Add performance benchmarks
Add tests for cross-flow lineage
Add tests for lineage visualization

Overview​

Test Files​

1. test_lineage_extraction.py (253 lines)​

2. test_sql_to_instruction_conversion.py (476 lines)​

3. test_lineage_pandas_integration.py (381 lines)​

4. test_sql_pandas_end_to_end.py (342 lines)​

5. test_lineage_comprehensive.py (342 lines)​

Running Tests​

Prerequisites​

Run All Tests​

Run Specific Test Files​

Run Specific Test Classes​

Run Specific Tests​

Test Structure​

Fixtures​

Test Patterns​

Test Coverage​

SQL Parsing ✅​

Lineage Extraction ✅​

SQL to Instruction Conversion ✅​

Pandas Execution ✅​

Integration ✅​

Example Test Scenarios​

Scenario 1: Simple Filtering​

Scenario 2: Calculations​

Scenario 3: Lineage Extraction​

Expected Test Results​

Troubleshooting​

Tests Skipped​

Import Errors​

Pandas Execution Errors​

SQL Parsing Errors​

Continuous Integration​

Performance Considerations​

Future Enhancements​

Related Documentation​

Overview

Test Files

1. `test_lineage_extraction.py` (253 lines)

2. `test_sql_to_instruction_conversion.py` (476 lines)

3. `test_lineage_pandas_integration.py` (381 lines)

4. `test_sql_pandas_end_to_end.py` (342 lines)

5. `test_lineage_comprehensive.py` (342 lines)

Running Tests

Prerequisites

Run All Tests

Run Specific Test Files

Run Specific Test Classes

Run Specific Tests

Test Structure

Fixtures

Test Patterns

Test Coverage

SQL Parsing ✅

Lineage Extraction ✅

SQL to Instruction Conversion ✅

Pandas Execution ✅

Integration ✅

Example Test Scenarios

Scenario 1: Simple Filtering

Scenario 2: Calculations

Scenario 3: Lineage Extraction

Expected Test Results

Troubleshooting

Tests Skipped

Import Errors

Pandas Execution Errors

SQL Parsing Errors

Continuous Integration

Performance Considerations

Future Enhancements

Related Documentation