Data Profiling API
The Data Profiling API provides endpoints for triggering column-level profiling on data products and retrieving profile results and historical trends.
All endpoints are scoped to a space and require authentication with view permission on the product.
Base path: /spaces/{slug}/products/{product_id}/profile
Endpoints
Trigger Profiling
POST /spaces/{slug}/products/{product_id}/profile
Trigger an on-demand column profiling run for a product.
Request Body:
| Field | Type | Default | Description |
|---|---|---|---|
| depth | string | "quick" | "quick" (null + cardinality) or "full" (distributions) |
| sample_size | integer | 10000 | Max rows to sample (capped at 100,000) |
Response:
{
  "status": "completed",
  "depth": "full",
  "fields_profiled": 12,
  "profiles": [
    {
      "field_id": "uuid",
      "null_ratio": 0.05,
      "distinct_count": 1423,
      "row_count": 10000,
      "min_value": "1",
      "max_value": "9999",
      "mean_value": "5042.3",
      "top_values": [
        {"value": "active", "count": 4521},
        {"value": "inactive", "count": 3200}
      ],
      "histogram": [
        {"bucket": "0-1000", "count": 1200},
        {"bucket": "1000-2000", "count": 1800}
      ],
      "profiled_at": "2026-02-16T12:00:00Z"
    }
  ]
}
Error Codes:
| Status | Reason |
|---|---|
| 400 | Invalid depth value or profiling error |
| 404 | Product not found |
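
As a rough illustration of the request described above, the sketch below builds (but does not send) the POST request with Python's standard library. The base URL, the bearer-token `Authorization` header, and the helper name `build_trigger_request` are assumptions made for the example; the source does not state the authentication scheme or whether the server clamps or rejects an oversized `sample_size`.

```python
import json
from urllib.request import Request

MAX_SAMPLE_SIZE = 100_000  # cap stated in the request-body table
VALID_DEPTHS = ("quick", "full")


def build_trigger_request(base_url, slug, product_id, token,
                          depth="quick", sample_size=10_000):
    """Build (but do not send) a POST request that triggers profiling."""
    if depth not in VALID_DEPTHS:
        raise ValueError(f"depth must be one of {VALID_DEPTHS}, got {depth!r}")
    if sample_size < 1:
        raise ValueError("sample_size must be a positive integer")
    # Clamp client-side so the body never exceeds the documented cap.
    body = {"depth": depth, "sample_size": min(sample_size, MAX_SAMPLE_SIZE)}
    url = f"{base_url.rstrip('/')}/spaces/{slug}/products/{product_id}/profile"
    return Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        method="POST",
        headers={
            "Content-Type": "application/json",
            # Assumption: bearer-token auth; the source does not name the scheme.
            "Authorization": f"Bearer {token}",
        },
    )
```

Passing the returned object to `urllib.request.urlopen` would send it. Validating `depth` locally avoids a round trip that would otherwise end in a `400`.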
Get Latest Profiles
GET /spaces/{slug}/products/{product_id}/profile
Get the most recent profile snapshot for each field in the product.
Response:
{
  "product_id": "uuid",
  "profiles": [
    {
      "id": "uuid",
      "field_id": "uuid",
      "profile_depth": "full",
      "row_count": 10000,
      "null_count": 500,
      "null_ratio": 0.05,
      "distinct_count": 1423,
      "min_value": "1",
      "max_value": "9999",
      "mean_value": "5042.3",
      "top_values": [],
      "histogram": [],
      "profiled_at": "2026-02-16T12:00:00Z"
    }
  ]
}
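
One possible use of this response is a quick data-quality check. The sketch below takes an already-decoded response dictionary and lists the fields whose `null_ratio` exceeds a threshold. The helper name and the default threshold of 0.1 are illustrative assumptions, not part of the API.

```python
def fields_above_null_ratio(response, threshold=0.1):
    """Return (field_id, null_ratio) pairs above threshold, worst first."""
    flagged = []
    for profile in response.get("profiles", []):
        ratio = profile.get("null_ratio")
        # Skip profiles that carry no ratio rather than treating them as 0.
        if ratio is not None and ratio > threshold:
            flagged.append((profile["field_id"], ratio))
    return sorted(flagged, key=lambda pair: pair[1], reverse=True)
```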
Get Field Profile History
GET /spaces/{slug}/products/{product_id}/profile/history/{field_id}
Get historical profile snapshots for a specific field. Useful for tracking data drift over time.
Query Parameters:
| Parameter | Type | Default | Description |
|---|---|---|---|
| limit | integer | 10 | Max snapshots to return (1–50) |
Response:
{
  "field_id": "uuid",
  "snapshots": [
    {
      "id": "uuid",
      "profile_depth": "full",
      "row_count": 10000,
      "null_count": 500,
      "null_ratio": 0.05,
      "distinct_count": 1423,
      "min_value": "1",
      "max_value": "9999",
      "mean_value": "5042.3",
      "top_values": [],
      "histogram": [],
      "profiled_at": "2026-02-16T12:00:00Z"
    }
  ]
}
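
To make the drift-tracking use case concrete, here is a minimal sketch that computes the change in `null_ratio` between consecutive snapshots. The source does not say in what order snapshots are returned, so the example sorts them by `profiled_at` first. The function names are hypothetical.

```python
from datetime import datetime


def parse_ts(value):
    """Parse an ISO 8601 timestamp with a trailing 'Z' (UTC)."""
    return datetime.fromisoformat(value.replace("Z", "+00:00"))


def null_ratio_drift(history):
    """Return (profiled_at, delta) pairs for consecutive snapshots, oldest first."""
    snaps = sorted(history["snapshots"], key=lambda s: parse_ts(s["profiled_at"]))
    deltas = []
    for prev, curr in zip(snaps, snaps[1:]):
        # Positive delta means the share of nulls grew since the previous run.
        deltas.append((curr["profiled_at"], curr["null_ratio"] - prev["null_ratio"]))
    return deltas
```

A history with a single snapshot yields an empty list, since there is nothing to compare against.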
Profile Statistics Reference
| Field | Type | Quick | Full | Description |
|---|---|---|---|---|
| null_ratio | float | ✅ | ✅ | Fraction of null values (0.0–1.0) |
| null_count | integer | ✅ | ✅ | Absolute count of nulls |
| distinct_count | integer | ✅ | ✅ | Number of unique values |
| row_count | integer | ✅ | ✅ | Rows sampled |
| min_value | string | | ✅ | Minimum value (stringified) |
| max_value | string | | ✅ | Maximum value (stringified) |
| mean_value | string | | ✅ | Mean for numeric columns |
| top_values | array | | ✅ | Most frequent values with counts |
| histogram | array | | ✅ | Distribution buckets |
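
Because `min_value`, `max_value`, and `mean_value` arrive as strings, a client that wants to compare them numerically has to convert them. The sketch below shows one way to do that, assuming that absent or non-numeric values should simply map to `None`. The helper name is hypothetical.

```python
def numeric_stats(profile):
    """Convert stringified min/max/mean into floats where possible."""
    out = {}
    for key in ("min_value", "max_value", "mean_value"):
        raw = profile.get(key)
        try:
            out[key] = float(raw) if raw is not None else None
        except (TypeError, ValueError):
            # Non-numeric column (for example, a string minimum): no number to report.
            out[key] = None
    return out
```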
Notes
- PII-safe: Fields tagged as `is_pii = true` are automatically skipped during profiling.
- Automatic profiling: Post-sync profiling can be triggered by the worker after connector metadata syncs complete.
- Connection resolution: Profiling connects via the product's scraping connector. The connection config is built by merging the connector's base `connection_config`, the linked credential's `connection_config` overlay, and the credential's encrypted secret as the password.
- Credential required: A valid credential must be configured on the connector's source system. Missing or expired credentials return a `400` error.
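
The connection-resolution note can be pictured as a dictionary merge. The sketch below is an illustration under stated assumptions, not the service's actual implementation: it assumes a shallow merge, that overlay keys take precedence over base keys, that the secret has already been decrypted, and that it is stored under a `password` key. The function name is hypothetical.

```python
def resolve_connection_config(base_config, credential_overlay, decrypted_secret):
    """Merge base config, credential overlay, and secret into one config dict."""
    # Assumption: shallow merge in which overlay values win over base values.
    resolved = {**base_config, **credential_overlay}
    # Assumption: the secret is supplied under the "password" key.
    resolved["password"] = decrypted_secret
    return resolved
```

Building a new dictionary rather than mutating the inputs keeps the connector's base config reusable across credentials.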