Usage Analytics API
The Usage Analytics API provides access to BI query scraping, query log mining, usage aggregation, and popularity scoring.
BI Sync
Execute BI Sync
POST /connectors/{connector_id}/bi-sync
Triggers a full BI scrape for a connector. Creates a SyncJob, invokes the registered BI scraper (Superset, Metabase, Looker, Tableau), upserts BiQuery rows, and triggers usage aggregation.
Supported connector types: superset, metabase, looker, tableau, power_bi
Response: 200 OK — SyncJob
Errors:
404 Not Found— Connector not found400 Bad Request— Connector is not a BI type
Execute Query Log Mining
POST /connectors/{connector_id}/query-log-mining
Mines warehouse query logs directly from data warehouse audit logs. This is separate from BI connector scraping — it extracts query history from warehouses the platform already connects to.
Supported connector types: snowflake, bigquery, postgres
Response: 200 OK
{
"rows_upserted": 156
}
Usage Aggregation
Aggregate Usage
POST /spaces/{slug}/usage/aggregate
Recomputes ProductUsageStats for every product in a space. Groups BiQuery records by referenced table, matches them to existing data products, and upserts aggregated counts.
Response: 200 OK
{
"products_updated": 42
}
Get Unused Products
GET /spaces/{slug}/usage/unused
Returns products with zero BI query references in the lookback window.
Query Parameters:
| Parameter | Type | Default | Description |
|---|---|---|---|
days | int | 30 | Lookback window in days |
Response: 200 OK — Array of DataProduct summaries with no detected usage.
Data Model
BiQuery
The BiQuery model stores individual query or chart references scraped from BI tools or mined from query logs.
| Field | Type | Description |
|---|---|---|
id | UUID | Record identifier |
connector_id | UUID | Source connector |
resource_type | string | saved_query, chart, dashboard_chart, query_log |
resource_id | string | BI-tool-specific identifier |
title | string | null | Query/chart title |
sql_text | string | null | Anonymized SQL |
referenced_tables | string[] | Table names referenced in the SQL |
referenced_columns | string[] | Column names referenced |
executed_by | string | null | User/service account identity |
executed_at | datetime | null | Query execution time |
dashboard_id | string | null | Parent dashboard (for chart resources) |
dashboard_title | string | null | Dashboard name |
dashboard_url | string | null | Link-back URL to the BI tool |
created_at | datetime | Record creation timestamp |
ProductUsageStats
Aggregated usage statistics per product, computed by the UsageAggregationService.
| Field | Type | Description |
|---|---|---|
product_id | UUID | Product identifier |
space_id | UUID | Space identifier |
query_count_30d | int | Queries in the last 30 days |
unique_users_30d | int | Distinct users in the last 30 days |
last_queried_at | datetime | null | Most recent query timestamp |
popularity_score | float | 0–100 score |
top_columns | string[] | Most referenced columns |
updated_at | datetime | Last aggregation timestamp |
Popularity Score Formula
The popularity score is computed as:
raw = 0.5 × log₂(1 + query_count_30d)
+ 0.3 × unique_users_30d
+ 0.2 × recency_factor
score = min(100, raw / 10.0 × 100)
Where recency_factor decays based on how recently the product was last queried.
SQL Anonymization
All query text is anonymized before storage. String and numeric literals are replaced with ? placeholders:
'C-1234' → ?
1000 → ?
3.14 → ?
This produces normalized SQL patterns suitable for grouping (Popular Queries) without exposing sensitive filter values.