Data Profiling
Data Profiling computes column-level statistics for your data products, giving you visibility into data quality, distribution, and completeness directly within the catalog.
What is Data Profiling?
When profiling runs against a product, Qarion connects to the underlying data source, samples rows, and computes statistics for each column. The results are displayed in the product's Profile tab, providing an at-a-glance view of your data's shape and health.
Viewing Profiles
- Open any data product in the catalog
- Navigate to the Profile tab in the product detail view
- Each column shows its computed statistics
Statistics Computed
| Metric | Description | Profiling Depth |
|---|---|---|
| Null Ratio | Share of sampled values that are null | Quick, Full |
| Distinct Count | Number of unique values (cardinality) | Quick, Full |
| Row Count | Number of rows sampled | Quick, Full |
| Min / Max | Minimum and maximum values (numeric and date columns) | Full |
| Mean | Average value (numeric columns only) | Full |
| Top Values | Most frequent values with their counts | Full |
| Histogram | Value distribution (numeric columns) or category frequencies (categorical columns) | Full |
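To make the metrics in the table concrete, here is a minimal Python sketch of how they could be computed over a list of sampled values. It is an illustration only, not Qarion's implementation: the function name profile_column and its depth and top_n parameters are hypothetical, date handling and histograms are omitted, and Min / Max / Mean are computed only when every non-null value is numeric.

```python
from collections import Counter
from statistics import mean


def profile_column(values, depth="quick", top_n=3):
    """Compute illustrative column statistics for a list of sampled values.

    Hypothetical helper: the name and parameters are assumptions made for
    this sketch, not part of any documented API.
    """
    non_null = [v for v in values if v is not None]
    stats = {
        "row_count": len(values),
        # Guard against an empty sample to avoid dividing by zero.
        "null_ratio": (len(values) - len(non_null)) / len(values) if values else 0.0,
        "distinct_count": len(set(non_null)),
    }
    if depth == "full":
        # Most frequent values with their counts, for any column type.
        stats["top_values"] = Counter(non_null).most_common(top_n)
        numeric = [
            v for v in non_null
            if isinstance(v, (int, float)) and not isinstance(v, bool)
        ]
        # Only report numeric metrics when the whole column is numeric.
        if numeric and len(numeric) == len(non_null):
            stats["min"] = min(numeric)
            stats["max"] = max(numeric)
            stats["mean"] = mean(numeric)
    return stats


quick_stats = profile_column([1, 2, 2, None])
full_stats = profile_column([1, 2, 2, None], depth="full")
```

With the default depth only the three Quick metrics are returned; passing depth="full" adds top values and, for numeric columns, min, max, and mean.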
Profiling Depth
You can choose between two profiling levels:
- Quick: Computes only the lightweight metrics: null ratio, distinct count (cardinality), and row count. Fast and suitable for routine monitoring.
- Full: Computes all statistics, including min / max, mean, top values, and histograms. Takes longer but provides comprehensive insight.
Connection Requirements
Profiling connects to the same database as the product's scraping connector. The connection configuration is built by merging:
- The connector's base connection_config (host, database, schema)
- The linked credential's connection_config overlay (port, SSL options)
- The credential's encrypted secret as the password
A valid credential must be configured on the source system linked to the product's connector. If the credential is missing or expired, profiling will fail with a connection error.
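The merge described in this section can be pictured as a simple dictionary overlay. The sketch below is a hedged illustration: the function name build_connection_config, the key names, and the example values are hypothetical, the secret is assumed to be already decrypted, and the rule that overlay values win over base values on conflict is an assumption inferred from the word "overlay".

```python
def build_connection_config(connector_config, credential_overlay, secret):
    """Merge connection settings into one dict (illustrative sketch only).

    Assumption: when the same key appears in both inputs, the credential
    overlay takes precedence over the connector's base configuration.
    """
    if not secret:
        # Mirrors the documented behavior that a missing credential
        # prevents profiling from connecting.
        raise ValueError("a valid credential is required for profiling")
    merged = {**connector_config, **credential_overlay}
    # The decrypted secret is supplied as the password.
    merged["password"] = secret
    return merged


config = build_connection_config(
    {"host": "db.example.internal", "database": "analytics", "schema": "public"},
    {"port": 5432, "sslmode": "require"},
    "example-secret",
)
```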
Running Profiling
On-Demand
From the product's Profile tab, click Run Profile and select the desired depth. Profiling runs immediately against the live data source.
Automatic (Post-Sync)
Profiling can run automatically after a successful metadata sync. When a connector completes a sync, the platform triggers a quick profile for newly discovered or updated products.
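As a rough mental model of the post-sync trigger, the following sketch selects which products would receive a quick profile after a sync. Everything here is hypothetical: the shape of the sync result, the field names status, products, and change, and the function name products_to_profile are assumptions made for illustration and do not describe the platform's internal data model.

```python
def products_to_profile(sync_result):
    """Return IDs of products that would get a quick profile after a sync.

    Hypothetical data shape: {"status": ..., "products": [{"id", "change"}]}.
    """
    # Profiling is only triggered after a successful sync.
    if sync_result.get("status") != "success":
        return []
    return [
        product["id"]
        for product in sync_result.get("products", [])
        # Only newly discovered or updated products are profiled.
        if product.get("change") in {"new", "updated"}
    ]


pending = products_to_profile({
    "status": "success",
    "products": [
        {"id": "orders", "change": "new"},
        {"id": "customers", "change": "unchanged"},
        {"id": "payments", "change": "updated"},
    ],
})
```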
Sample Size
Profiling samples a configurable number of rows (default: 10,000, maximum: 100,000) to balance accuracy with query cost and execution time.
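The default and maximum above can be expressed as a small helper. This is a sketch under stated assumptions: the function name resolve_sample_size is hypothetical, and clamping an oversized request to the maximum (rather than rejecting it) is an assumption, since the behavior for out-of-range values is not specified here.

```python
DEFAULT_SAMPLE_SIZE = 10_000
MAX_SAMPLE_SIZE = 100_000


def resolve_sample_size(requested=None):
    """Pick the number of rows to sample (illustrative sketch only)."""
    if requested is None:
        return DEFAULT_SAMPLE_SIZE
    if requested < 1:
        raise ValueError("sample size must be a positive integer")
    # Assumption: values above the maximum are clamped, not rejected.
    return min(requested, MAX_SAMPLE_SIZE)
```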
PII-Safe Profiling
Columns flagged as is_pii = true in the catalog are automatically skipped during profiling. This ensures sensitive data is never sampled or stored in profile statistics.
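Conceptually, the PII rule is a filter applied before any data is sampled. The sketch below illustrates that idea; the column dictionaries and the function name profilable_columns are hypothetical, and treating a column with no is_pii flag as non-PII is an assumption of this example.

```python
def profilable_columns(columns):
    """Return names of columns that are eligible for profiling.

    Assumption: a column without an is_pii flag is treated as non-PII.
    """
    return [col["name"] for col in columns if not col.get("is_pii", False)]


eligible = profilable_columns([
    {"name": "order_id", "is_pii": False},
    {"name": "email", "is_pii": True},
    {"name": "amount"},
])
```

Filtering on the flag before building the sampling query means values from flagged columns are never read at all, rather than being read and discarded.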
Historical Trends
Each time profiling runs, a snapshot is archived. You can compare profiles over time to detect drift:
- Growing null ratios may indicate upstream pipeline issues
- Sudden cardinality changes may signal data quality problems
- Distribution shifts may require investigation
To view historical profiles for a specific column, use the History link in the Profile tab.
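A comparison between two archived snapshots of the same column might look like the sketch below. It is not a built-in feature description: the function name detect_drift, the snapshot keys, and both thresholds are hypothetical and would need tuning for real data.

```python
def detect_drift(previous, current, null_ratio_delta=0.05, cardinality_change=0.5):
    """Compare two profile snapshots of one column (illustrative sketch).

    Each snapshot is assumed to be a dict with "null_ratio" and
    "distinct_count" keys. Thresholds are arbitrary example values.
    """
    findings = []
    # Flag an absolute increase in the null ratio beyond the threshold.
    if current["null_ratio"] - previous["null_ratio"] > null_ratio_delta:
        findings.append("null_ratio_increase")
    # Flag a large relative change in cardinality, in either direction.
    prev_distinct = previous["distinct_count"]
    if prev_distinct > 0:
        relative = abs(current["distinct_count"] - prev_distinct) / prev_distinct
        if relative > cardinality_change:
            findings.append("cardinality_change")
    return findings


findings = detect_drift(
    {"null_ratio": 0.01, "distinct_count": 100},
    {"null_ratio": 0.10, "distinct_count": 40},
)
```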
Tips
- Run full profiles periodically (e.g., weekly) and quick profiles after every sync
- Use profiling results to inform quality check creation — columns with high null ratios are candidates for null-count checks
- Review cardinality trends to detect schema drift early
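As an example of turning profile results into quality checks, the sketch below lists columns whose null ratio meets a chosen threshold. The profile layout, the function name null_check_candidates, and the 0.2 threshold are hypothetical choices for illustration.

```python
def null_check_candidates(profile, threshold=0.2):
    """Return column names whose null ratio is at or above the threshold.

    The profile is assumed to map column names to dicts of statistics.
    """
    return sorted(
        column
        for column, stats in profile.items()
        if stats["null_ratio"] >= threshold
    )


candidates = null_check_candidates({
    "shipping_date": {"null_ratio": 0.35},
    "order_id": {"null_ratio": 0.0},
    "coupon_code": {"null_ratio": 0.8},
})
```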
Related Documentation
- Product Details — Understanding the product detail view
- Quality Management — Setting up quality checks
- Integrations — Configuring data source connections