Skip to main content

Usage Analytics API

The Usage Analytics API provides access to BI query scraping, query log mining, usage aggregation, and popularity scoring.


BI Sync

Execute BI Sync

POST /connectors/{connector_id}/bi-sync

Triggers a full BI scrape for a connector. Creates a SyncJob, invokes the registered BI scraper (Superset, Metabase, Looker, Tableau), upserts BiQuery rows, and triggers usage aggregation.

Supported connector types: superset, metabase, looker, tableau, power_bi

Response: 200 OKSyncJob

Errors:

  • 404 Not Found — Connector not found
  • 400 Bad Request — Connector is not a BI type

Execute Query Log Mining

POST /connectors/{connector_id}/query-log-mining

Mines warehouse query logs directly from data warehouse audit logs. This is separate from BI connector scraping — it extracts query history from warehouses the platform already connects to.

Supported connector types: snowflake, bigquery, postgres

Response: 200 OK

{
"rows_upserted": 156
}

Usage Aggregation

Aggregate Usage

POST /spaces/{slug}/usage/aggregate

Recomputes ProductUsageStats for every product in a space. Groups BiQuery records by referenced table, matches them to existing data products, and upserts aggregated counts.

Response: 200 OK

{
"products_updated": 42
}

Get Unused Products

GET /spaces/{slug}/usage/unused

Returns products with zero BI query references in the lookback window.

Query Parameters:

ParameterTypeDefaultDescription
daysint30Lookback window in days

Response: 200 OK — Array of DataProduct summaries with no detected usage.


Data Model

BiQuery

The BiQuery model stores individual query or chart references scraped from BI tools or mined from query logs.

FieldTypeDescription
idUUIDRecord identifier
connector_idUUIDSource connector
resource_typestringsaved_query, chart, dashboard_chart, query_log
resource_idstringBI-tool-specific identifier
titlestring | nullQuery/chart title
sql_textstring | nullAnonymized SQL
referenced_tablesstring[]Table names referenced in the SQL
referenced_columnsstring[]Column names referenced
executed_bystring | nullUser/service account identity
executed_atdatetime | nullQuery execution time
dashboard_idstring | nullParent dashboard (for chart resources)
dashboard_titlestring | nullDashboard name
dashboard_urlstring | nullLink-back URL to the BI tool
created_atdatetimeRecord creation timestamp

ProductUsageStats

Aggregated usage statistics per product, computed by the UsageAggregationService.

FieldTypeDescription
product_idUUIDProduct identifier
space_idUUIDSpace identifier
query_count_30dintQueries in the last 30 days
unique_users_30dintDistinct users in the last 30 days
last_queried_atdatetime | nullMost recent query timestamp
popularity_scorefloat0–100 score
top_columnsstring[]Most referenced columns
updated_atdatetimeLast aggregation timestamp

Popularity Score Formula

The popularity score is computed as:

raw = 0.5 × log₂(1 + query_count_30d)
+ 0.3 × unique_users_30d
+ 0.2 × recency_factor

score = min(100, raw / 10.0 × 100)

Where recency_factor decays based on how recently the product was last queried.


SQL Anonymization

All query text is anonymized before storage. String and numeric literals are replaced with ? placeholders:

'C-1234'  →  ?
1000 → ?
3.14 → ?

This produces normalized SQL patterns suitable for grouping (Popular Queries) without exposing sensitive filter values.