
Sync Strategies & Origin Tracking

When Qarion synchronizes metadata from connected data platforms (Snowflake, BigQuery, dbt), it needs to decide how to handle new, modified, and removed resources. Sync strategies control this reconciliation behavior, while origin tracking protects manual edits from being overwritten by automated scrapes.

Merge Strategies

Each connector can be configured with one of three merge strategies. The strategy is set in the connector's sync_config and defaults to Additive if not specified.

Skip (Legacy)

The simplest approach: create new products for newly discovered resources, but never update or remove existing products. This was the original behavior and is preserved for backward compatibility.

| New Resources | Existing Resources | Removed Resources |
|---|---|---|
| ✅ Created | ❌ Not updated | ❌ Not removed |

When to use: When you want the scraper only to seed the initial products, with all subsequent updates made manually.

Additive (Default)

A safe middle ground: create new products, update metadata on existing products, and add new fields — but never remove existing products or fields.

| New Resources | Existing Resources | Removed Resources |
|---|---|---|
| ✅ Created | ✅ Metadata updated | ❌ Not removed |

When to use: For most deployments. Ensures your catalog stays up-to-date while preventing accidental data loss if a source table is temporarily unavailable.

Refresh (Full Sync)

The most aggressive approach: create new products, update metadata and fields on existing products, and mark removed resources as stale. This strategy also performs a full field refresh — adding new fields, updating types on existing fields, and removing fields that no longer exist in the source.

| New Resources | Existing Resources | Removed Resources |
|---|---|---|
| ✅ Created | ✅ Metadata + fields updated | ⚠️ Marked stale |

When to use: When your catalog should be a faithful mirror of the source system. Resources that disappear from the source are flagged as stale rather than deleted, so no data is permanently lost.

Note: Stale products are flagged with is_stale = true and a stale_since timestamp. They remain searchable and visible, but the UI clearly marks them as potentially outdated.
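The three strategies can be summarized as one reconciliation routine. The Python sketch below is illustrative only. The `Product` class, the `reconcile` function, and the shape of the scraped data are hypothetical stand-ins, not Qarion's internal API. The sketch also ignores origin modes and treats every existing product as scraped. Only `is_stale` and `stale_since` come from this page.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Dict, Optional, Tuple

# Hypothetical in-memory model; not Qarion's actual schema.
@dataclass
class Product:
    name: str
    metadata: Dict[str, str]
    fields: Dict[str, str]  # field name -> data type, e.g. {"id": "NUMBER"}
    is_stale: bool = False
    stale_since: Optional[datetime] = None


# Hypothetical scrape result: resource name -> (metadata, fields).
Scrape = Dict[str, Tuple[Dict[str, str], Dict[str, str]]]


def reconcile(catalog: Dict[str, Product], scraped: Scrape,
              strategy: str = "additive") -> Dict[str, Product]:
    """Apply a merge strategy to the catalog in place and return it."""
    if strategy not in ("skip", "additive", "refresh"):
        raise ValueError(f"unknown merge_strategy: {strategy!r}")

    for name, (metadata, fields) in scraped.items():
        if name not in catalog:
            # All strategies create products for newly discovered resources.
            catalog[name] = Product(name, dict(metadata), dict(fields))
            continue
        if strategy == "skip":
            continue  # Skip never updates existing products.
        product = catalog[name]
        product.metadata.update(metadata)  # Additive and Refresh update metadata.
        for col, col_type in fields.items():
            if col not in product.fields or strategy == "refresh":
                # Both add new fields; only Refresh rewrites existing types.
                product.fields[col] = col_type
        if strategy == "refresh":
            # Refresh removes fields that no longer exist in the source.
            for col in [c for c in product.fields if c not in fields]:
                del product.fields[col]

    if strategy == "refresh":
        now = datetime.now(timezone.utc)
        for name, product in catalog.items():
            if name not in scraped:
                # Removed resources are flagged, never deleted.
                product.is_stale = True
                # Assumption: keep the first timestamp across repeated syncs.
                if product.stale_since is None:
                    product.stale_since = now
    return catalog
```

Stale products stay in the catalog under this sketch. Only their flags change, which matches the "flagged rather than deleted" behavior of Refresh.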

Origin Tracking

Every product and field in the catalog carries an origin marker indicating whether it was created by an automated scrape or manually by a user.

| Origin | Meaning |
|---|---|
| scraped | Created or last modified by the metadata scraper |
| manual | Created or edited by a user in the UI |

How Origin Works

  • When the scraper creates a new product or field, origin is set to scraped.
  • When a user creates a product manually, origin is set to manual; when a user edits a scraped product, origin changes from scraped to manual.
  • The user_edited_fields JSON column on products tracks exactly which fields were manually edited (e.g., ["description", "sensitivity"]).
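The origin rules above can be modeled as a few small state transitions. In this minimal Python sketch, only `origin` and `user_edited_fields` come from this page. The `CatalogProduct` class, its other attributes, and the helper functions are hypothetical illustrations of the documented behavior, not Qarion code.

```python
from dataclasses import dataclass, field
from typing import List

# Hypothetical product record; only `origin` and `user_edited_fields`
# come from the documentation, the other attributes are illustrative.
@dataclass
class CatalogProduct:
    name: str
    description: str = ""
    sensitivity: str = ""
    origin: str = "scraped"
    user_edited_fields: List[str] = field(default_factory=list)


def create_from_scrape(name: str, **attrs: str) -> CatalogProduct:
    # The scraper creates objects with origin = scraped.
    return CatalogProduct(name=name, origin="scraped", **attrs)


def create_manually(name: str, **attrs: str) -> CatalogProduct:
    # Objects created in the UI start with origin = manual.
    return CatalogProduct(name=name, origin="manual", **attrs)


def apply_user_edit(product: CatalogProduct, attr: str, value: str) -> None:
    """Record a UI edit: set the value, flip origin, remember the attribute."""
    setattr(product, attr, value)
    product.origin = "manual"
    # Track each edited attribute once, in the order it was first edited.
    if attr not in product.user_edited_fields:
        product.user_edited_fields.append(attr)
```

Editing `description` and then `sensitivity` on a scraped product leaves `user_edited_fields` as `["description", "sensitivity"]`, which matches the example value given above.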

Origin Modes

Origin modes control how the Refresh strategy interacts with manually edited data. The mode is set in the connector's sync_config.

Protect Manual (Default)

The scraper only updates or deletes fields and metadata that originated from a previous scrape. Any manually edited fields are preserved.

  • Metadata updates are skipped for products with origin = manual
  • Fields with origin = manual are never deleted or type-updated
  • Fields listed in user_edited_fields are not overwritten

When to use: For most deployments. Ensures that business context added by data stewards (descriptions, sensitivity labels, etc.) is never lost during a sync.

Scraper Wins

The scraper always updates metadata and fields regardless of origin. Manual edits are overwritten on the next sync.

  • All products are updated, regardless of origin
  • All fields can be deleted or type-updated
  • user_edited_fields is ignored

When to use: When the source system is the single source of truth and manual overrides should be temporary.

Manual Only

The scraper never updates existing metadata or deletes fields. It only adds new products and new fields.

  • Existing product metadata is never modified
  • Existing field types are never changed
  • No fields are ever deleted

When to use: When you want the scraper to discover new assets but all metadata curation is done manually.
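Before the scraper touches existing data, each origin mode can be read as the answer to two yes/no questions. Adding new products and new fields is allowed in every mode, so this Python sketch does not model it. The function names are hypothetical. The rules they encode are taken from the three mode descriptions above.

```python
from typing import Collection

ORIGIN_MODES = ("protect_manual", "scraper_wins", "manual_only")


def _check_mode(mode: str) -> None:
    if mode not in ORIGIN_MODES:
        raise ValueError(f"unknown origin_mode: {mode!r}")


def may_update_product_attr(mode: str, product_origin: str,
                            user_edited_fields: Collection[str], attr: str) -> bool:
    """May the scraper overwrite `attr` on an existing product?"""
    _check_mode(mode)
    if mode == "scraper_wins":
        return True  # user_edited_fields is ignored
    if mode == "manual_only":
        return False  # existing product metadata is never modified
    # protect_manual: skip manual products and any manually edited attribute
    return product_origin != "manual" and attr not in user_edited_fields


def may_change_field(mode: str, field_origin: str) -> bool:
    """May the scraper delete an existing field or change its type?"""
    _check_mode(mode)
    if mode == "scraper_wins":
        return True
    if mode == "manual_only":
        return False  # no deletions and no type changes
    return field_origin != "manual"  # protect_manual
```

A merge routine would call these guards before each metadata overwrite, field deletion, or type change, and skip the operation when a guard returns False.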

Configuration

Merge strategy and origin mode are configured per connector in the connector settings. The relevant sync_config keys are:

| Key | Values | Default |
|---|---|---|
| merge_strategy | skip, additive, refresh | additive |
| origin_mode | protect_manual, scraper_wins, manual_only | protect_manual |

Example Configuration

{
  "merge_strategy": "refresh",
  "origin_mode": "protect_manual"
}

This configures a full sync that creates new products, updates existing ones, and marks removed resources as stale, while protecting manual edits from being overwritten.
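If you generate or review sync_config values in your own tooling, the table of keys, values, and defaults can be turned into a small validator. The Python sketch below is a hypothetical client-side helper, not Qarion's parser. It fills in the documented defaults and rejects values outside the documented sets. It ignores unrecognized keys, and that choice is an assumption, because this page does not say how Qarion treats them.

```python
import json

# Allowed values and defaults for each documented sync_config key.
ALLOWED = {
    "merge_strategy": ({"skip", "additive", "refresh"}, "additive"),
    "origin_mode": ({"protect_manual", "scraper_wins", "manual_only"}, "protect_manual"),
}


def load_sync_config(raw: str) -> dict:
    """Parse sync_config JSON and fill in the documented defaults."""
    data = json.loads(raw) if raw.strip() else {}
    if not isinstance(data, dict):
        raise ValueError("sync_config must be a JSON object")
    config = {}
    for key, (allowed, default) in ALLOWED.items():
        value = data.get(key, default)
        if value not in allowed:
            raise ValueError(f"{key} must be one of {sorted(allowed)}, got {value!r}")
        config[key] = value
    return config
```

Calling `load_sync_config('{"merge_strategy": "refresh"}')` yields the Refresh strategy with the default `protect_manual` origin mode, so it behaves the same as the example configuration.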

Drift Detection

Before applying any merge strategy, Qarion performs drift detection — comparing the scraped metadata against the existing catalog to identify:

  • New resources — Tables or views that exist in the source but not in the catalog
  • Modified resources — Resources where metadata (name, schema, fields) has changed
  • Removed resources — Resources that exist in the catalog but no longer appear in the source

The drift result is then passed to the configured merge strategy for reconciliation.
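At its core, drift detection is a set comparison between two snapshots. This Python sketch assumes that catalog and scraped resources are keyed by a stable identifier, with the metadata for each key held in a dictionary. The identifier is a hypothetical choice, since this page does not say how Qarion matches resources. Because names are part of the compared metadata, a rename under a stable identifier shows up as a modification, as the list above describes.

```python
from typing import Dict, NamedTuple, Set


class Drift(NamedTuple):
    new: Set[str]
    modified: Set[str]
    removed: Set[str]


def detect_drift(catalog: Dict[str, dict], scraped: Dict[str, dict]) -> Drift:
    """Classify resources as new, modified, or removed by comparing keys and metadata."""
    new = set(scraped.keys() - catalog.keys())        # in source, not in catalog
    removed = set(catalog.keys() - scraped.keys())    # in catalog, gone from source
    modified = {key for key in scraped.keys() & catalog.keys()
                if scraped[key] != catalog[key]}      # present in both, metadata differs
    return Drift(new, modified, removed)
```

A merge strategy then acts on the three sets. For example, Refresh would mark each key in `removed` as stale, while Additive would leave those products untouched.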

Best Practices

Start with Additive

Begin with the default Additive strategy until your team is comfortable with automated syncing. Because Additive never removes fields or marks products as stale, it avoids accidental loss on products that may have been manually enriched with descriptions, ownership, and quality checks.

Use Protect Manual for Curated Catalogs

If your data stewards actively curate product metadata (descriptions, sensitivity, criticality), use the Protect Manual origin mode to safeguard their work.

Monitor Staleness

When using the Refresh strategy, regularly review stale products. A product marked stale may indicate a legitimate removal in the source, or it could mean a temporary connectivity issue during the last sync.
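A product that went stale only at the most recent sync may reflect a temporary connectivity issue. One that has stayed stale across many syncs is more likely a genuine removal in the source. This Python sketch uses that idea to build a review list ordered from oldest to newest. It assumes product records have been exported as dictionaries carrying the documented `is_stale` and `stale_since` values plus a `name`. The export mechanism and the age threshold are hypothetical choices, not Qarion features.

```python
from datetime import datetime, timedelta, timezone
from typing import Iterable, List, Optional


def stale_for_review(products: Iterable[dict], min_age: timedelta,
                     now: Optional[datetime] = None) -> List[str]:
    """Names of products stale for at least `min_age`, oldest first."""
    now = now or datetime.now(timezone.utc)
    stale = [p for p in products
             if p.get("is_stale") and p.get("stale_since") is not None
             and now - p["stale_since"] >= min_age]
    stale.sort(key=lambda p: p["stale_since"])  # longest-stale products first
    return [p["name"] for p in stale]
```

With a threshold of a few sync intervals, products that appear in this list have stayed stale through later syncs, so they are reasonable candidates for confirming a removal with the source owner.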
