Observability

This guide covers Qarion's observability infrastructure: structured logging, request correlation, and application metrics.

Overview

Qarion's multi-instance architecture requires first-class observability. Three middleware components work together to provide a unified picture of request processing:

| Component | Middleware | Purpose |
|---|---|---|
| Structured Logging | request_logging_middleware.py | JSON-formatted request/response logs |
| Correlation IDs | correlation_middleware.py | Trace requests across services |
| Metrics | metrics_middleware.py | Prometheus-compatible application metrics |

Structured Logging

All log output uses structured JSON format for machine-readable log aggregation.

Log Format

Each log entry includes:

| Field | Description |
|---|---|
| timestamp | ISO 8601 timestamp |
| level | Log level (INFO, WARNING, ERROR) |
| logger | Python logger name |
| message | Human-readable log message |
| correlation_id | Request correlation ID (see below) |
| method | HTTP method |
| path | Request path |
| status_code | Response status code |
| duration_ms | Request duration in milliseconds |
| user_id | Authenticated user ID (if available) |
| space_id | Space context (if available) |
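
A typical entry might look like the following. All values here are illustrative, not actual Qarion output; the exact message text and which optional fields appear will vary by request:

```json
{
  "timestamp": "2025-01-15T09:30:00.123Z",
  "level": "INFO",
  "logger": "app.middleware.request_logging_middleware",
  "message": "Request completed",
  "correlation_id": "my-trace-123",
  "method": "GET",
  "path": "/api/spaces",
  "status_code": 200,
  "duration_ms": 42.7,
  "user_id": "user-123",
  "space_id": "space-456"
}
```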

Configuration

Structured logging is enabled by default. Configure the log level via the LOG_LEVEL environment variable:

LOG_LEVEL=INFO  # Default
LOG_LEVEL=DEBUG # Include SQL queries and detailed traces

Log Filtering

API request logs are emitted at INFO level with a distinct logger name, making them easy to filter from SQL query noise. Use your log aggregator's filtering to isolate API activity:

logger:"app.middleware.request_logging_middleware"

Request Correlation

Every incoming request is assigned a unique Correlation ID that propagates through all log entries, database queries, and downstream service calls for that request.

How It Works

  1. The correlation_middleware checks for an incoming X-Correlation-ID header
  2. If present, that ID is used; otherwise a new UUID is generated
  3. The ID is stored in context and attached to all log entries
  4. The ID is returned in the X-Correlation-ID response header
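
The reuse-or-generate logic in steps 1 and 2 can be sketched as follows. This is a minimal, framework-agnostic illustration using `contextvars`, not Qarion's actual middleware; the function name is hypothetical:

```python
import uuid
from contextvars import ContextVar

# Holds the correlation ID for the current request context, so log
# formatters can attach it to every entry emitted during the request.
correlation_id: ContextVar[str] = ContextVar("correlation_id", default="")

def resolve_correlation_id(headers: dict) -> str:
    """Reuse an incoming X-Correlation-ID, or generate a fresh UUID."""
    cid = headers.get("X-Correlation-ID") or str(uuid.uuid4())
    correlation_id.set(cid)  # step 3: stored in context for log entries
    return cid               # step 4: echoed in the response header
```

Because the ID lives in a `ContextVar`, each concurrently handled request sees its own value.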

Usage

Pass a correlation ID from your client to trace a request end-to-end:

curl -H "X-Correlation-ID: my-trace-123" https://api.qarion.com/...

All log entries for that request will include "correlation_id": "my-trace-123", so you can filter on this value in your log aggregator.

Multi-Instance Tracing

In a multi-instance deployment, correlation IDs allow you to trace a single user action across multiple backend instances. The ID travels with the request regardless of which instance handles it.
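
To keep the ID travelling with the request, each downstream call must forward the header. A minimal sketch of a helper that does this (the function name is an assumption, not part of Qarion's API):

```python
from typing import Optional

def outgoing_headers(correlation_id: str, extra: Optional[dict] = None) -> dict:
    """Build headers for a downstream service call, forwarding the correlation ID."""
    headers = {"X-Correlation-ID": correlation_id}
    if extra:
        headers.update(extra)
    return headers
```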

Application Metrics

Qarion exposes Prometheus-compatible metrics for monitoring request latency, error rates, and throughput.

Metrics Endpoint

GET /metrics

Returns metrics in Prometheus exposition format.
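
An abbreviated, illustrative response body in the text-based exposition format (sample values are made up):

```
# HELP http_requests_total Total HTTP requests
# TYPE http_requests_total counter
http_requests_total{method="GET",path="/api/spaces",status="200"} 1027.0
# TYPE http_requests_in_progress gauge
http_requests_in_progress 3.0
```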

Available Metrics

| Metric | Type | Labels | Description |
|---|---|---|---|
| http_requests_total | Counter | method, path, status | Total HTTP requests |
| http_request_duration_seconds | Histogram | method, path | Request latency distribution |
| http_requests_in_progress | Gauge | (none) | Currently processing requests |
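
Conceptually, the metrics middleware times each request and records it against those labels. The sketch below uses plain Python collections for illustration; Qarion's real middleware would use a Prometheus client library instead:

```python
import time
from collections import Counter

# Illustrative stand-ins for the real metrics:
request_counts: Counter = Counter()     # ~ http_requests_total (method, path, status)
latency_observations: list = []         # ~ http_request_duration_seconds samples

def record_request(method: str, path: str, handler) -> int:
    """Time a request handler and record a count plus a latency observation."""
    start = time.perf_counter()
    status = handler()  # run the actual request handler
    duration = time.perf_counter() - start
    request_counts[(method, path, status)] += 1
    latency_observations.append(duration)
    return status
```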

Prometheus Configuration

Add Qarion to your Prometheus scrape configuration:

scrape_configs:
  - job_name: 'qarion'
    scrape_interval: 15s
    static_configs:
      - targets: ['qarion-api:8000']
    metrics_path: '/metrics'

Grafana Dashboards

Use the exposed metrics to build dashboards for:

  • Request rate: rate(http_requests_total[5m])
  • Error rate: rate(http_requests_total{status=~"5.."}[5m])
  • P95 latency: histogram_quantile(0.95, rate(http_request_duration_seconds_bucket[5m]))
  • Concurrent requests: http_requests_in_progress

Instance Identification

The instance_middleware tags each request with an instance identifier, which is useful for debugging load balancer routing and identifying instance-specific issues.

| Header | Description |
|---|---|
| X-Instance-ID | Instance identifier returned in response headers |

Best Practices

  1. Always pass Correlation IDs from client applications to enable end-to-end tracing
  2. Set LOG_LEVEL=INFO in production to capture API requests without SQL noise
  3. Scrape /metrics with Prometheus for real-time monitoring and alerting
  4. Use structured log fields (not free-text search) for log aggregation queries
  5. Monitor P95 latency to catch performance regressions early