# Sync Strategies & Origin Tracking
When Qarion synchronizes metadata from connected data platforms (Snowflake, BigQuery, dbt), it needs to decide how to handle new, modified, and removed resources. Sync strategies control this reconciliation behavior, while origin tracking protects manual edits from being overwritten by automated scrapes.
## Merge Strategies
Each connector can be configured with one of three merge strategies. The strategy is set in the connector's `sync_config` and defaults to Additive if not specified.
### Skip (Legacy)
The simplest approach: create new products for newly discovered resources, but never update or remove existing products. This was the original behavior and is preserved for backward compatibility.
| New Resources | Existing Resources | Removed Resources |
|---|---|---|
| ✅ Created | ❌ Not updated | ❌ Not removed |
When to use: When you want the scraper to only seed initial products and all subsequent updates should be manual.
### Additive (Default)
A safe middle ground: create new products, update metadata on existing products, and add new fields — but never remove existing products or fields.
| New Resources | Existing Resources | Removed Resources |
|---|---|---|
| ✅ Created | ✅ Metadata updated | ❌ Not removed |
When to use: For most deployments. Ensures your catalog stays up-to-date while preventing accidental data loss if a source table is temporarily unavailable.
### Refresh (Full Sync)
The most aggressive approach: create new products, update metadata and fields on existing products, and mark removed resources as stale. This strategy also performs a full field refresh — adding new fields, updating types on existing fields, and removing fields that no longer exist in the source.
| New Resources | Existing Resources | Removed Resources |
|---|---|---|
| ✅ Created | ✅ Metadata + fields updated | ⚠️ Marked stale |
When to use: When your catalog should be a faithful mirror of the source system. Resources that disappear from the source are flagged as stale rather than deleted, so no data is permanently lost.
Stale products are flagged with `is_stale = true` and a `stale_since` timestamp. They remain searchable and visible but are clearly indicated in the UI as potentially outdated.
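The three strategies can be summarized as a set of permissions applied to a drift result. The sketch below illustrates the reconciliation rules described above; the function and dictionary names are hypothetical, not Qarion's actual internals.

```python
# Illustrative permission flags per merge strategy (names are assumptions).
STRATEGY_RULES = {
    "skip":     {"create": True, "update": False, "mark_stale": False},
    "additive": {"create": True, "update": True,  "mark_stale": False},
    "refresh":  {"create": True, "update": True,  "mark_stale": True},
}

def reconcile(strategy, new, modified, removed):
    """Return the actions a sync run would take for a drift result."""
    rules = STRATEGY_RULES[strategy]
    actions = []
    if rules["create"]:
        actions += [("create", r) for r in new]        # all strategies create
    if rules["update"]:
        actions += [("update", r) for r in modified]   # additive and refresh
    if rules["mark_stale"]:
        actions += [("mark_stale", r) for r in removed]  # refresh only
    return actions
```

For example, `reconcile("skip", ...)` produces only `create` actions, while `reconcile("refresh", ...)` produces creates, updates, and stale markers.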
## Origin Tracking
Every product and field in the catalog carries an origin marker indicating whether it was created by an automated scrape or manually by a user.
| Origin | Meaning |
|---|---|
| `scraped` | Created or last modified by the metadata scraper |
| `manual` | Created or edited by a user in the UI |
### How Origin Works
- When the scraper creates a new product or field, origin is set to `scraped`.
- When a user creates a product manually or edits a scraped product, origin changes to `manual`.
- The `user_edited_fields` JSON column on products tracks exactly which fields were manually edited (e.g., `["description", "sensitivity"]`).
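The bookkeeping for a manual edit can be sketched as follows. This is a minimal illustration assuming a product is a plain dictionary with `origin` and `user_edited_fields` keys; the function name is hypothetical.

```python
def apply_manual_edit(product: dict, field_name: str, value) -> dict:
    """Record a user edit: update the value, flip origin to 'manual',
    and remember which field was hand-edited."""
    product[field_name] = value
    product["origin"] = "manual"          # scraped -> manual on first user edit
    edited = set(product.get("user_edited_fields", []))
    edited.add(field_name)                # track the specific edited field
    product["user_edited_fields"] = sorted(edited)
    return product
```

After a steward edits a scraped product's description, the product carries `origin = "manual"` and `user_edited_fields = ["description"]`, which the origin modes below consult during later syncs.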
## Origin Modes
Origin modes control how the Refresh strategy interacts with manually edited data. The mode is set in the connector's `sync_config`.
### Protect Manual (Default)
The scraper only updates or deletes fields and metadata that originated from a previous scrape. Any manually edited fields are preserved.
- Products with `origin = manual` have their metadata skipped
- Fields with `origin = manual` are never deleted or type-updated
- Fields listed in `user_edited_fields` are not overwritten
When to use: For most deployments. Ensures that business context added by data stewards (descriptions, sensitivity labels, etc.) is never lost during a sync.
### Scraper Wins
The scraper always updates metadata and fields regardless of origin. Manual edits are overwritten on the next sync.
- All products are updated, regardless of origin
- All fields can be deleted or type-updated
- `user_edited_fields` is ignored
When to use: When the source system is the single source of truth and manual overrides should be temporary.
### Manual Only
The scraper never updates existing metadata or deletes fields. It only adds new products and new fields.
- Existing product metadata is never modified
- Existing field types are never changed
- No fields are ever deleted
When to use: When you want the scraper to discover new assets but all metadata curation is done manually.
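The three origin modes reduce to a single overwrite decision per field. The helper below sketches that decision under the rules described above; the function signature is illustrative, not Qarion's actual API.

```python
def scraper_may_overwrite(origin_mode, target_origin,
                          field_name=None, user_edited_fields=()):
    """Decide whether a sync may overwrite an existing product or field,
    given the configured origin mode."""
    if origin_mode == "scraper_wins":
        return True          # source system always wins; manual edits are lost
    if origin_mode == "manual_only":
        return False         # existing metadata is never modified
    # protect_manual (default): only scrape-origin data may be replaced
    if target_origin == "manual":
        return False
    if field_name is not None and field_name in user_edited_fields:
        return False         # respect per-field manual edits
    return True
```

Note that under `protect_manual`, a product can remain `scraped` overall while individual fields listed in `user_edited_fields` are still shielded.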
## Configuration
Merge strategy and origin mode are configured per connector in the connector settings. The relevant `sync_config` keys are:
| Key | Values | Default |
|---|---|---|
| `merge_strategy` | `skip`, `additive`, `refresh` | `additive` |
| `origin_mode` | `protect_manual`, `scraper_wins`, `manual_only` | `protect_manual` |
### Example Configuration
```json
{
  "merge_strategy": "refresh",
  "origin_mode": "protect_manual"
}
```
This configures a full sync that creates, updates, and marks stale resources — while protecting any manual edits from being overwritten.
## Drift Detection
Before applying any merge strategy, Qarion performs drift detection — comparing the scraped metadata against the existing catalog to identify:
- New resources — Tables or views that exist in the source but not in the catalog
- Modified resources — Resources where metadata (name, schema, fields) has changed
- Removed resources — Resources that exist in the catalog but no longer appear in the source
The drift result is then passed to the configured merge strategy for reconciliation.
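Drift detection is essentially a set comparison between source and catalog. The sketch below shows the idea, assuming both sides are represented as `{resource_name: metadata_dict}` mappings (an assumed shape, not Qarion's internal one).

```python
def detect_drift(source: dict, catalog: dict) -> dict:
    """Compare scraped metadata against the existing catalog."""
    # In the source but not the catalog -> new resources
    new = sorted(source.keys() - catalog.keys())
    # In the catalog but gone from the source -> removed resources
    removed = sorted(catalog.keys() - source.keys())
    # Present in both, but with differing metadata -> modified resources
    modified = sorted(name for name in source.keys() & catalog.keys()
                      if source[name] != catalog[name])
    return {"new": new, "modified": modified, "removed": removed}
```

The resulting buckets map directly onto the create / update / mark-stale actions that the configured merge strategy is permitted to take.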
## Best Practices
### Start with Additive
Begin with the default Additive strategy until your team is comfortable with automated syncing. This prevents accidental removal of products that may have been manually enriched with descriptions, ownership, and quality checks.
### Use Protect Manual for Curated Catalogs
If your data stewards actively curate product metadata (descriptions, sensitivity, criticality), use the Protect Manual origin mode to safeguard their work.
### Monitor Staleness
When using the Refresh strategy, regularly review stale products. A product marked stale may indicate a legitimate removal in the source, or it could mean a temporary connectivity issue during the last sync.
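A periodic review can be scripted against the `is_stale` / `stale_since` markers described earlier. This is a minimal sketch assuming products are dictionaries carrying those keys; the threshold and function name are illustrative.

```python
from datetime import datetime, timedelta, timezone

def stale_for_review(products, older_than_days=7, now=None):
    """Return products that have been stale longer than a threshold,
    i.e. unlikely to be a one-off connectivity blip."""
    now = now or datetime.now(timezone.utc)
    cutoff = now - timedelta(days=older_than_days)
    return [p for p in products
            if p.get("is_stale") and p["stale_since"] <= cutoff]
```

Products stale for only a sync cycle or two are often transient source outages; products past the threshold are better candidates for confirming the removal and archiving.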
## Learn More
- BigQuery Integration — Connecting to Google BigQuery
- Snowflake Integration — Connecting to Snowflake
- dbt Integration — Syncing dbt models and lineage