Data Profiling

Data Profiling computes column-level statistics for your data products, giving you visibility into data quality, distribution, and completeness directly within the catalog.

What is Data Profiling?

When profiling runs against a product, Qarion connects to the underlying data source, samples rows, and computes statistics for each column. The results are displayed in the product's Profile tab, providing an at-a-glance view of your data's shape and health.

Viewing Profiles

Open any data product in the catalog
Navigate to the Profile tab in the product detail view
Each column shows its computed statistics

Statistics Computed

Metric	Description	Profiling Depth
Null Ratio	Percentage of null values	Quick, Full
Distinct Count	Number of unique values (cardinality)	Quick, Full
Row Count	Number of rows sampled	Quick, Full
Min / Max	Minimum and maximum values (numeric and date columns)	Full
Mean	Average value (numeric columns only)	Full
Top Values	Most frequent values with their counts	Full
Histogram	Value distribution (numeric) or category frequencies	Full
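
As a rough illustration of what these metrics mean, the sketch below computes them for a single column of sampled values. It is not Qarion's implementation: the function name `profile_column`, the `top_n` parameter, and the choice to exclude nulls from the distinct count are assumptions made for the example, and the histogram is left out for brevity.

```python
from collections import Counter
from statistics import fmean


def profile_column(values, depth="quick", top_n=3):
    """Compute column statistics for sampled values (None stands for null).

    Hypothetical helper for illustration only.
    """
    if depth not in ("quick", "full"):
        raise ValueError(f"unknown profiling depth: {depth!r}")

    row_count = len(values)
    non_null = [v for v in values if v is not None]

    # Metrics listed for both Quick and Full depth.
    stats = {
        "row_count": row_count,
        "null_ratio": (row_count - len(non_null)) / row_count if row_count else 0.0,
        # Assumption: nulls are not counted as a distinct value.
        "distinct_count": len(set(non_null)),
    }

    # Metrics listed for Full depth only.
    if depth == "full" and non_null:
        stats["top_values"] = Counter(non_null).most_common(top_n)
        is_numeric = all(
            isinstance(v, (int, float)) and not isinstance(v, bool)
            for v in non_null
        )
        if is_numeric:
            stats["min"] = min(non_null)
            stats["max"] = max(non_null)
            stats["mean"] = fmean(non_null)

    return stats
```

Calling `profile_column([1, 2, 2, None, 5])` returns only the three lightweight metrics, while passing `depth="full"` adds min, max, mean, and top values for the same sample.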

Profiling Depth

You can choose between two profiling levels:

Quick: Computes null ratio, distinct count (cardinality), and row count. Fast and lightweight, suitable for routine monitoring.
Full: Computes all statistics, including min/max, mean, top values, and histograms. Takes longer but gives a more complete picture of each column.

Connection Requirements

Profiling connects to the same database as the product's scraping connector. The connection configuration is built by merging:

The connector's base connection_config (host, database, schema)
The linked credential's connection_config overlay (port, SSL options)
The credential's encrypted secret as the password

Important: A valid credential must be configured on the source system linked to the product's connector. If the credential is missing or expired, profiling will fail with a connection error.
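
The three-part merge described above can be sketched as follows. This is a minimal illustration, not the platform's code: the function name `build_connection_config`, the `password` key, and the rule that credential values override connector values on conflict are all assumptions, and decryption of the secret is taken to have happened already.

```python
def build_connection_config(connector_config, credential_config, secret):
    """Merge connector and credential settings into one connection config.

    Hypothetical helper: key names and precedence are assumptions.
    """
    if not secret:
        # Mirrors the documented behavior that a missing credential
        # causes profiling to fail with a connection error.
        raise ConnectionError("no valid credential configured for this source system")

    # Start from the connector's base config, then apply the credential overlay.
    # Assumption: on a key conflict, the credential's value wins.
    merged = {**connector_config, **credential_config}

    # The (already decrypted) secret is used as the password.
    merged["password"] = secret
    return merged
```

For example, a base config of `{"host": "db.internal", "database": "sales", "schema": "public"}` combined with an overlay of `{"port": 5432, "sslmode": "require"}` yields a single dictionary holding all five settings plus the password.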

Running Profiling

On-Demand

From the product's Profile tab, click Run Profile and select the desired depth. The profiling runs immediately against the live data source.

Automatic (Post-Sync)

Profiling can run automatically after a successful metadata sync. When a connector completes a sync, the platform triggers a quick profile for newly discovered or updated products.

Sample Size

Profiling samples a configurable number of rows (default: 10,000, maximum: 100,000) to balance accuracy with query cost and execution time.
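
One way to picture the default and the upper bound is a small resolver like the one below. The documentation states the default and maximum but not what happens when a request exceeds the maximum, so capping the value (rather than rejecting it) is an assumption, as are the names `resolve_sample_size`, `DEFAULT_SAMPLE_SIZE`, and `MAX_SAMPLE_SIZE`.

```python
DEFAULT_SAMPLE_SIZE = 10_000   # documented default
MAX_SAMPLE_SIZE = 100_000      # documented maximum


def resolve_sample_size(requested=None):
    """Return the number of rows to sample for a profiling run.

    Hypothetical helper for illustration only.
    """
    if requested is None:
        return DEFAULT_SAMPLE_SIZE
    if requested < 1:
        raise ValueError("sample size must be a positive integer")
    # Assumption: oversized requests are capped at the maximum.
    return min(requested, MAX_SAMPLE_SIZE)
```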

PII-Safe Profiling

Columns flagged as is_pii = true in the catalog are automatically skipped during profiling. This ensures sensitive data is never sampled or stored in profile statistics.
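
Conceptually, the skip happens before any data is read: PII columns are filtered out of the list of columns to profile. The sketch below assumes column metadata is available as dictionaries with `name` and `is_pii` keys; that shape and the function name `profilable_columns` are hypothetical.

```python
def profilable_columns(columns):
    """Return the names of columns that may be profiled.

    Hypothetical helper: columns flagged is_pii are excluded, and a
    column with no flag is treated as non-PII (an assumption).
    """
    return [col["name"] for col in columns if not col.get("is_pii", False)]
```

Given columns `order_id`, `email` (flagged as PII), and `amount`, only `order_id` and `amount` would be sampled.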

Historical Trends

Each time profiling runs, a snapshot is archived. You can compare profiles over time to detect drift:

Growing null ratios may indicate upstream pipeline issues
Sudden cardinality changes may signal data quality problems
Distribution shifts may require investigation

To view a column's historical profiles, use that column's History link in the Profile tab.
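
If you export two snapshots for the same column, a comparison along the lines of the drift signals above could look like this. It is a sketch under stated assumptions: the snapshot shape (dictionaries with `null_ratio` and `distinct_count`), the function name `detect_drift`, and both thresholds are hypothetical and should be tuned to your data.

```python
def detect_drift(previous, current, null_ratio_delta=0.05, cardinality_change=0.5):
    """Compare two profile snapshots of one column and list drift signals.

    Hypothetical helper: thresholds are illustrative, not platform defaults.
    """
    findings = []

    # A growing null ratio may indicate upstream pipeline issues.
    if current["null_ratio"] - previous["null_ratio"] > null_ratio_delta:
        findings.append("null_ratio_increase")

    # A sudden relative change in cardinality may signal data quality problems.
    prev_distinct = previous["distinct_count"]
    if prev_distinct:
        relative_change = abs(current["distinct_count"] - prev_distinct) / prev_distinct
        if relative_change > cardinality_change:
            findings.append("cardinality_change")

    return findings
```

An empty list means neither threshold was crossed between the two snapshots.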

Tips

Run full profiles periodically (e.g., weekly) and quick profiles after every sync
Use profiling results to inform quality check creation — columns with high null ratios are candidates for null-count checks
Review cardinality trends to detect schema drift early
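
As an example of turning profile results into quality checks, the sketch below lists columns whose null ratio meets a chosen threshold. The profile shape (a mapping of column name to statistics), the function name `null_check_candidates`, and the 10% threshold are assumptions for illustration.

```python
def null_check_candidates(profile, threshold=0.1):
    """Return column names whose null ratio is at or above the threshold.

    Hypothetical helper: the threshold is illustrative, not a recommendation.
    """
    return sorted(
        name for name, stats in profile.items()
        if stats["null_ratio"] >= threshold
    )
```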

Related Documentation

Product Details: Understanding the product detail view
Quality Management: Setting up quality checks
Integrations: Configuring data source connections
