Data Profiling API

The Data Profiling API provides endpoints for triggering column-level profiling on data products and retrieving profile results and historical trends.

All endpoints are scoped to a space and require authentication with view permission on the product.

Base path: /spaces/{slug}/products/{product_id}/profile

Endpoints

Trigger Profiling

POST /spaces/{slug}/products/{product_id}/profile

Trigger an on-demand column profiling run for a product.

Request Body:

| Field | Type | Default | Description |
|---|---|---|---|
| depth | string | "quick" | "quick" (null + cardinality) or "full" (distributions) |
| sample_size | integer | 10000 | Max rows to sample (capped at 100,000) |

Response:

{
  "status": "completed",
  "depth": "full",
  "fields_profiled": 12,
  "profiles": [
    {
      "field_id": "uuid",
      "null_ratio": 0.05,
      "distinct_count": 1423,
      "row_count": 10000,
      "min_value": "1",
      "max_value": "9999",
      "mean_value": "5042.3",
      "top_values": [
        {"value": "active", "count": 4521},
        {"value": "inactive", "count": 3200}
      ],
      "histogram": [
        {"bucket": "0-1000", "count": 1200},
        {"bucket": "1000-2000", "count": 1800}
      ],
      "profiled_at": "2026-02-16T12:00:00Z"
    }
  ]
}

Error Codes:

| Status | Reason |
|---|---|
| 400 | Invalid depth value or profiling error |
| 404 | Product not found |
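
As an illustration of the request described above, the following Python sketch builds the POST request with the standard library. The base URL, the bearer-token header, and the helper names (`build_trigger_request`, `trigger_profiling`) are assumptions for this example; the documentation only states that authentication is required. Clamping `sample_size` on the client is likewise a choice made here, since the page does not say whether the server clamps or rejects larger values.

```python
import json
import urllib.request

MAX_SAMPLE_SIZE = 100_000  # cap stated in the request body table
VALID_DEPTHS = ("quick", "full")


def build_trigger_request(base_url, slug, product_id, token,
                          depth="quick", sample_size=10_000):
    """Build (but do not send) the POST request that triggers a profiling run."""
    if depth not in VALID_DEPTHS:
        raise ValueError(f"depth must be one of {VALID_DEPTHS}, got {depth!r}")
    if sample_size < 1:
        raise ValueError("sample_size must be positive")
    # Assumption: clamp locally instead of relying on server-side behavior.
    body = {"depth": depth, "sample_size": min(sample_size, MAX_SAMPLE_SIZE)}
    url = f"{base_url.rstrip('/')}/spaces/{slug}/products/{product_id}/profile"
    return urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        method="POST",
        headers={
            "Content-Type": "application/json",
            # Assumption: bearer-token auth; the auth scheme is not documented here.
            "Authorization": f"Bearer {token}",
        },
    )


def trigger_profiling(request):
    """Send the request and return the decoded JSON response body."""
    with urllib.request.urlopen(request) as resp:
        return json.load(resp)
```

A 400 or 404 response surfaces as `urllib.error.HTTPError` from `urlopen`, so callers would typically wrap `trigger_profiling` in a `try`/`except` and inspect the exception's `code` attribute.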

Get Latest Profiles

GET /spaces/{slug}/products/{product_id}/profile

Get the most recent profile snapshot for each field in the product.

Response:

{
  "product_id": "uuid",
  "profiles": [
    {
      "id": "uuid",
      "field_id": "uuid",
      "profile_depth": "full",
      "row_count": 10000,
      "null_count": 500,
      "null_ratio": 0.05,
      "distinct_count": 1423,
      "min_value": "1",
      "max_value": "9999",
      "mean_value": "5042.3",
      "top_values": [],
      "histogram": [],
      "profiled_at": "2026-02-16T12:00:00Z"
    }
  ]
}
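
One possible use of this response is flagging fields whose share of nulls exceeds a threshold. The sketch below works on an already-decoded payload; the sample data, the placeholder IDs, and the function name `fields_above_null_ratio` are made up for illustration and only mirror the documented shape.

```python
import json

# Sample payload mirroring the documented response shape; the IDs are placeholders.
SAMPLE = json.loads("""
{
  "product_id": "product-1",
  "profiles": [
    {"field_id": "field-a", "null_ratio": 0.05, "distinct_count": 1423, "row_count": 10000},
    {"field_id": "field-b", "null_ratio": 0.42, "distinct_count": 17, "row_count": 10000}
  ]
}
""")


def fields_above_null_ratio(payload, threshold):
    """Return the field_id of every profile whose null_ratio exceeds threshold."""
    flagged = []
    for profile in payload.get("profiles", []):
        ratio = profile.get("null_ratio")
        # Skip profiles that carry no null_ratio rather than guessing a value.
        if ratio is not None and ratio > threshold:
            flagged.append(profile["field_id"])
    return flagged


print(fields_above_null_ratio(SAMPLE, 0.25))  # ['field-b']
```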

Get Field Profile History

GET /spaces/{slug}/products/{product_id}/profile/history/{field_id}

Get historical profile snapshots for a specific field. Useful for tracking data drift over time.

Query Parameters:

| Parameter | Type | Default | Description |
|---|---|---|---|
| limit | integer | 10 | Max snapshots to return (1–50) |

Response:

{
  "field_id": "uuid",
  "snapshots": [
    {
      "id": "uuid",
      "profile_depth": "full",
      "row_count": 10000,
      "null_count": 500,
      "null_ratio": 0.05,
      "distinct_count": 1423,
      "min_value": "1",
      "max_value": "9999",
      "mean_value": "5042.3",
      "top_values": [],
      "histogram": [],
      "profiled_at": "2026-02-16T12:00:00Z"
    }
  ]
}
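
To make the drift use case concrete, here is a minimal sketch that builds the history path and compares the oldest and newest snapshots. The helper names are hypothetical. The code sorts by `profiled_at` itself because this page does not state the order in which snapshots are returned, and it clamps `limit` locally to the documented 1 to 50 range as a precaution.

```python
from datetime import datetime
from urllib.parse import urlencode


def history_path(slug, product_id, field_id, limit=10):
    """Build the history path, clamping limit to the documented 1-50 range."""
    limit = max(1, min(limit, 50))
    query = urlencode({"limit": limit})
    return f"/spaces/{slug}/products/{product_id}/profile/history/{field_id}?{query}"


def parse_profiled_at(value):
    """Parse an ISO 8601 timestamp such as 2026-02-16T12:00:00Z."""
    # Older Python versions reject a trailing "Z", so normalise it to an offset.
    return datetime.fromisoformat(value.replace("Z", "+00:00"))


def null_ratio_drift(snapshots):
    """Return newest null_ratio minus oldest, or None with fewer than two snapshots."""
    ordered = sorted(snapshots, key=lambda s: parse_profiled_at(s["profiled_at"]))
    if len(ordered) < 2:
        return None
    return ordered[-1]["null_ratio"] - ordered[0]["null_ratio"]
```

A positive result means the fraction of nulls grew between the oldest and newest snapshot. The same pattern applies to `distinct_count` or `row_count`.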

Profile Statistics Reference

| Field | Type | Quick | Full | Description |
|---|---|---|---|---|
| null_ratio | float | | | Fraction of null values (0.0–1.0) |
| null_count | integer | | | Absolute count of nulls |
| distinct_count | integer | | | Number of unique values |
| row_count | integer | | | Rows sampled |
| min_value | string | | | Minimum value (stringified) |
| max_value | string | | | Maximum value (stringified) |
| mean_value | string | | | Mean for numeric columns |
| top_values | array | | | Most frequent values with counts |
| histogram | array | | | Distribution buckets |
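
Because `min_value`, `max_value`, and `mean_value` arrive as strings, clients that need arithmetic have to convert them first. The following sketch shows one defensive way to do that; `numeric_or_none` and `value_range` are hypothetical helpers, and treating non-numeric strings as `None` is a design choice of this example, not documented API behavior.

```python
def numeric_or_none(value):
    """Convert a stringified statistic to float, or None if it is not numeric."""
    if value is None:
        return None
    try:
        return float(value)
    except (TypeError, ValueError):
        # Non-numeric columns (for example text values) cannot be converted.
        return None


def value_range(profile):
    """Return max_value minus min_value as a float, or None if either is non-numeric."""
    low = numeric_or_none(profile.get("min_value"))
    high = numeric_or_none(profile.get("max_value"))
    if low is None or high is None:
        return None
    return high - low


profile = {"min_value": "1", "max_value": "9999", "mean_value": "5042.3"}
print(value_range(profile))  # 9998.0
```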

Notes

  • PII-safe: Fields tagged is_pii = true are automatically skipped during profiling.
  • Automatic profiling: The worker can trigger post-sync profiling after connector metadata syncs complete.
  • Connection resolution: Profiling connects via the product's scraping connector. The connection config is built by merging three sources: the connector's base connection_config, the linked credential's connection_config overlay, and the credential's encrypted secret, which is used as the password.
  • Credential required: A valid credential must be configured on the connector's source system. Missing or expired credentials return a 400 error.
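
The connection resolution note can be pictured as a layered dictionary merge. This is only a sketch of the idea, not the service's implementation: the function name, the shallow merge, the rule that the credential overlay wins over the base config, and the literal key name `password` are all assumptions made for illustration.

```python
def resolve_connection_config(base_config, credential_overlay, secret):
    """Merge the three sources named in the notes into one connection config."""
    # Assumption: shallow merge where overlay keys replace base keys.
    merged = {**base_config, **credential_overlay}
    # Assumption: the secret (already decrypted by the caller) lands under "password".
    merged["password"] = secret
    return merged


# Hypothetical values, shown only to demonstrate the merge order.
base = {"host": "db.internal", "port": 5432, "user": "readonly"}
overlay = {"user": "profiler"}
config = resolve_connection_config(base, overlay, "s3cret")
print(config["user"])  # profiler
```

Building a new dictionary instead of mutating the inputs keeps the connector's base config reusable across credentials.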