Repository Plugin System

Overview

The repository plugin system provides an extensible architecture for repository implementations. New repository storage types (e.g., S3 or Azure Blob) can be added without modifying core code: create a plugin module and register it.

Architecture

Plugin Interface

All repository plugins implement the RepositoryPlugin abstract base class, which defines methods for creating different types of repositories:

  • create_history_repository() - Schema history repositories
  • create_dataset_repository() - Dataset definition repositories
  • create_flow_repository() - Flow definition repositories
  • create_migration_file_repository() - Migration file repositories
  • create_migration_file_history_repository() - Migration file history repositories
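As a rough illustration of the interface listed above, the base class could be written with Python's abc module. This is a minimal sketch under assumptions, not the project's actual source: the (config, db_service) signature is borrowed from the S3 example further down this page, return types are left as Any, and IncompletePlugin and can_instantiate are hypothetical names used only to show what happens when a method is missing.

```python
from abc import ABC, abstractmethod
from typing import Any, Dict, Optional


class RepositoryPlugin(ABC):
    """Sketch of the plugin base class (assumed shape, not the real source)."""

    @property
    @abstractmethod
    def storage_type(self) -> str:
        """Identifier used to select the plugin, such as "local"."""

    @abstractmethod
    def create_history_repository(
        self, config: Dict[str, Any], db_service: Optional[Any] = None
    ) -> Any:
        """Return a schema history repository."""

    @abstractmethod
    def create_dataset_repository(
        self, config: Dict[str, Any], db_service: Optional[Any] = None
    ) -> Any:
        """Return a dataset definition repository."""

    @abstractmethod
    def create_flow_repository(
        self, config: Dict[str, Any], db_service: Optional[Any] = None
    ) -> Any:
        """Return a flow definition repository."""

    @abstractmethod
    def create_migration_file_repository(
        self, config: Dict[str, Any], db_service: Optional[Any] = None
    ) -> Any:
        """Return a migration file repository."""

    @abstractmethod
    def create_migration_file_history_repository(
        self, config: Dict[str, Any], db_service: Optional[Any] = None
    ) -> Any:
        """Return a migration file history repository."""


class IncompletePlugin(RepositoryPlugin):
    """Hypothetical plugin that implements only part of the interface."""

    @property
    def storage_type(self) -> str:
        return "incomplete"

    def create_history_repository(self, config, db_service=None):
        return object()


def can_instantiate(plugin_cls) -> bool:
    """Return False when abstract methods are still unimplemented."""
    try:
        plugin_cls()
    except TypeError:
        return False
    return True
```

With an abstract base class of this shape, a plugin that forgets one of the five factory methods fails at instantiation time rather than when the missing repository type is first requested.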

Built-in Plugins

  1. LocalRepositoryPlugin (storage_type: "local")

    • File-based storage using local filesystem
    • TOML files for datasets/flows
    • JSON files for migrations
  2. DatabaseRepositoryPlugin (storage_type: "database")

    • Database-backed storage
    • Uses DatabaseService for all operations

Usage

Creating Repositories

Use factory functions instead of importing repository classes directly:

from repository import (
    create_history_repository,
    create_dataset_repository,
    create_flow_repository,
    create_migration_file_repository,
    create_migration_file_history_repository,
)

# Create local repositories
history_repo = create_history_repository(
    storage_type='local',
    migration_dir='migrations'
)

dataset_repo = create_dataset_repository(
    storage_type='local',
    dataset_dir='datasets'
)

# Create database repositories
# (db_service is an existing DatabaseService instance created elsewhere)
history_repo = create_history_repository(
    storage_type='database',
    db_service=db_service
)

Plugin System API

from repository.plugins import (
    register_repository_plugin,
    get_repository_plugin,
    list_repository_types,
    create_repository,
)

# List available storage types
types = list_repository_types()  # ['local', 'database']

# Get a plugin
plugin = get_repository_plugin('local')

# Create repository using plugin directly
repo = create_repository(
    repository_type='history',
    storage_type='local',
    config={'migration_dir': 'migrations'}
)
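To make the relationship between these four functions concrete, here is one way such a registry could work internally. It is a minimal sketch, not the library's implementation: the module-level dictionary, the getattr-based dispatch from repository_type to a create_<type>_repository method, the error messages, and DemoPlugin are all assumptions made for illustration.

```python
from typing import Any, Dict, List, Optional

# Assumed internal state: one plugin instance per storage type.
_registry: Dict[str, Any] = {}


def register_repository_plugin(plugin: Any) -> None:
    """Store the plugin under the key given by its storage_type."""
    _registry[plugin.storage_type] = plugin


def get_repository_plugin(storage_type: str) -> Any:
    """Look up a plugin, failing loudly for unknown storage types."""
    if storage_type not in _registry:
        raise ValueError(
            f"Unknown storage type {storage_type!r}; "
            f"available: {sorted(_registry)}"
        )
    return _registry[storage_type]


def list_repository_types() -> List[str]:
    """Return the registered storage types in registration order."""
    return list(_registry)


def create_repository(
    repository_type: str,
    storage_type: str,
    config: Optional[Dict[str, Any]] = None,
    db_service: Optional[Any] = None,
) -> Any:
    """Dispatch to the plugin's create_<repository_type>_repository method."""
    plugin = get_repository_plugin(storage_type)
    factory = getattr(plugin, f"create_{repository_type}_repository", None)
    if factory is None:
        raise ValueError(f"Unsupported repository type {repository_type!r}")
    return factory(config or {}, db_service)


class DemoPlugin:
    """Hypothetical plugin that returns plain dicts instead of repositories."""

    storage_type = "demo"

    def create_history_repository(self, config, db_service=None):
        return {"kind": "history", "config": config}


register_repository_plugin(DemoPlugin())
repo = create_repository("history", "demo", {"migration_dir": "migrations"})
```

Keying the registry on storage_type is what lets core code stay unaware of concrete storage backends: it only ever passes a string and a config dictionary.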

Adding a New Repository Type

To add a new repository storage type (e.g., S3, Azure Blob):

1. Create Plugin Module

Create a new file: qarion_etl/repository/plugins/s3_plugin.py

from typing import Dict, Any, Optional
from .base import RepositoryPlugin

# S3HistoryRepository and S3DatasetRepository are the S3-backed repository
# classes you implement (or import) in this module; they are not shown here.


class S3RepositoryPlugin(RepositoryPlugin):
    """Plugin for S3-based repositories."""

    @property
    def storage_type(self) -> str:
        return "s3"

    @property
    def name(self) -> str:
        return "S3 Storage"

    @property
    def description(self) -> str:
        return "S3-based storage using AWS S3 buckets"

    def create_history_repository(self, config: Dict[str, Any], db_service: Optional[Any] = None):
        # Implement S3-based history repository
        bucket_name = config.get('bucket_name')
        prefix = config.get('prefix', 'history/')
        return S3HistoryRepository(bucket_name, prefix)

    def create_dataset_repository(self, config: Dict[str, Any], db_service: Optional[Any] = None):
        # Implement S3-based dataset repository
        bucket_name = config.get('bucket_name')
        prefix = config.get('prefix', 'datasets/')
        return S3DatasetRepository(bucket_name, prefix)

    # ... implement other repository types

2. Register Plugin

Add to qarion_etl/repository/plugins/_init_plugins.py:

from .s3_plugin import S3RepositoryPlugin
from .registry import register_repository_plugin

def _ensure_plugins_loaded():
    global _plugins_loaded
    if not _plugins_loaded:
        # ... existing registrations ...
        register_repository_plugin(S3RepositoryPlugin())
        _plugins_loaded = True

3. Use New Repository Type

from repository import create_dataset_repository

# Use S3 storage
repo = create_dataset_repository(
    storage_type='s3',
    dataset_dir='s3://my-bucket/datasets/',
    # Additional S3 config would go here, such as the bucket_name and
    # prefix keys that the plugin in step 1 reads from its config
)

Key Principles

  1. No Direct Imports: Never import repository implementation classes directly. Always use factory functions.

  2. Plugin Encapsulation: All repository-specific logic should be in the plugin module. Core code should not know about specific storage types.

  3. Protocol-Based: Repositories implement protocols (interfaces) defined in repository/protocols.py. This ensures type safety and consistency.

  4. Configuration-Driven: Repository creation is driven by configuration (storage_type), not hardcoded logic.
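The protocol-based principle can be illustrated with typing.Protocol. The contents of repository/protocols.py are not shown on this page, so the sketch below assumes a hypothetical DatasetRepository protocol with save, load, and list_names methods; the real protocol names and method signatures may differ.

```python
from typing import Any, Dict, List, Optional, Protocol, runtime_checkable


@runtime_checkable
class DatasetRepository(Protocol):
    """Hypothetical protocol; the real one lives in repository/protocols.py."""

    def save(self, name: str, definition: Dict[str, Any]) -> None: ...

    def load(self, name: str) -> Optional[Dict[str, Any]]: ...

    def list_names(self) -> List[str]: ...


class InMemoryDatasetRepository:
    """Satisfies the protocol structurally, without inheriting from it."""

    def __init__(self) -> None:
        self._items: Dict[str, Dict[str, Any]] = {}

    def save(self, name: str, definition: Dict[str, Any]) -> None:
        self._items[name] = definition

    def load(self, name: str) -> Optional[Dict[str, Any]]:
        return self._items.get(name)

    def list_names(self) -> List[str]:
        return sorted(self._items)


def describe(repo: DatasetRepository) -> str:
    """Core code depends only on the protocol, never on a concrete class."""
    return f"{len(repo.list_names())} dataset(s)"


repo = InMemoryDatasetRepository()
repo.save("orders", {"columns": ["id", "total"]})
summary = describe(repo)
```

Because protocols are structural, a new storage backend only has to provide the right methods. Note that isinstance checks against a runtime_checkable protocol verify that the methods exist, not that their signatures match; a static type checker is needed for the latter.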

Migration from Direct Imports

If you have code using direct imports:

Before:

from repository import LocalDatasetRepository
repo = LocalDatasetRepository('datasets')

After:

from repository import create_dataset_repository
repo = create_dataset_repository(storage_type='local', dataset_dir='datasets')

Benefits

  1. Extensibility: Add new storage types without modifying core code
  2. Testability: Easy to mock repositories for testing
  3. Consistency: All repository creation follows the same pattern
  4. Type Safety: Protocols ensure all repositories implement required methods
  5. Configuration-Driven: Storage type determined at runtime from configuration
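The testability and configuration-driven points can be seen together in a small test sketch. Here load_dataset_names is a hypothetical piece of application code that receives the factory as a parameter, and list_names is an assumed repository method; neither is part of the documented API. The storage_type and dataset_dir keyword arguments match the factory calls shown earlier on this page.

```python
from typing import Any, Callable, Dict, List
from unittest.mock import MagicMock


def load_dataset_names(
    settings: Dict[str, Any],
    create_dataset_repository: Callable[..., Any],
) -> List[str]:
    """Hypothetical application code: storage type comes from settings."""
    repo = create_dataset_repository(
        storage_type=settings["storage_type"],
        dataset_dir=settings["dataset_dir"],
    )
    return sorted(repo.list_names())


# In a test, replace the real factory with a stub that returns a fake repo.
fake_repo = MagicMock()
fake_repo.list_names.return_value = ["orders", "customers"]
fake_factory = MagicMock(return_value=fake_repo)

names = load_dataset_names(
    {"storage_type": "local", "dataset_dir": "datasets"},
    fake_factory,
)

# The code under test asked for the configured storage type and nothing else.
fake_factory.assert_called_once_with(storage_type="local", dataset_dir="datasets")
```

Since the application code never imports a concrete repository class, no filesystem or database is touched in the test.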