Notebook Workers

Notebook Workers gives administrators a live view of dedicated notebook runtime capacity. Use it to monitor worker startup, health, ownership, backend type, resource class usage, and stale runtime states.

Open Admin -> Notebook Workers.

Summary Metrics

The top summary shows:

Metric   Meaning
Total    Workers returned by the current filters.
Active   Workers that are ready, starting, busy, or stopping.
Failed   Workers that ended in a failed state.
Stale    Workers that appear stuck in starting or busy.

The page refreshes periodically and can also be refreshed manually.
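As a rough illustration of how these counts relate to worker status, the sketch below derives the four metrics from a list of worker records. The record shape (`status`, `updated_at`) and the 15-minute stale threshold are assumptions made for this example, not Qarion's actual schema or configuration.

```python
from datetime import datetime, timedelta, timezone

# Status groupings taken from the summary table above.
ACTIVE_STATUSES = {"ready", "starting", "busy", "stopping"}
STALE_CANDIDATE_STATUSES = {"starting", "busy"}


def summarize(workers, now, stale_after=timedelta(minutes=15)):
    """Count total, active, failed, and stale workers.

    `workers` is a list of dicts with hypothetical keys `status` and
    `updated_at`; `stale_after` is an assumed threshold.
    """
    summary = {"total": len(workers), "active": 0, "failed": 0, "stale": 0}
    for worker in workers:
        status = worker["status"]
        if status in ACTIVE_STATUSES:
            summary["active"] += 1
        elif status == "failed":
            summary["failed"] += 1
        # A stale worker is still counted as active; it is flagged separately.
        if status in STALE_CANDIDATE_STATUSES and now - worker["updated_at"] > stale_after:
            summary["stale"] += 1
    return summary


now = datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc)
workers = [
    {"status": "ready", "updated_at": now - timedelta(minutes=1)},
    {"status": "starting", "updated_at": now - timedelta(hours=1)},
    {"status": "failed", "updated_at": now - timedelta(minutes=5)},
    {"status": "stopped", "updated_at": now - timedelta(minutes=5)},
]
print(summarize(workers, now))
```

In this sketch a stale worker also counts toward Active, since starting and busy appear in both definitions.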

Filters

Use filters to narrow the worker list:

  • Status: starting, ready, busy, stopping, stopped, or failed.
  • Backend: docker, kubernetes, or external.
  • Space ID and User ID: UUID filters for incident investigation.
  • Limit: maximum number of workers returned.
  • Include terminal workers: include stopped and failed workers.
  • Include runtime status: ask the runtime gateway or worker endpoint for live health. This can be slower than database-only status.
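The way these filters combine can be sketched as a plain in-memory function. This is only an illustration: the function name, the record keys, the default limit of 50, and the rule that terminal workers are hidden unless explicitly included are assumptions, not the documented behavior of the admin page or its API.

```python
import uuid

TERMINAL_STATUSES = {"stopped", "failed"}


def filter_workers(workers, status=None, backend=None, space_id=None,
                   user_id=None, limit=50, include_terminal=False):
    """Apply the page's filters to a list of worker dicts (hypothetical shape)."""
    # Space ID and User ID are UUID filters, so reject malformed values early.
    space_id = str(uuid.UUID(space_id)) if space_id else None
    user_id = str(uuid.UUID(user_id)) if user_id else None

    selected = []
    for worker in workers:
        if len(selected) >= limit:
            break
        if not include_terminal and worker["status"] in TERMINAL_STATUSES:
            continue
        if status and worker["status"] != status:
            continue
        if backend and worker["backend"] != backend:
            continue
        if space_id and worker["space_id"] != space_id:
            continue
        if user_id and worker["user_id"] != user_id:
            continue
        selected.append(worker)
    return selected


SPACE = "11111111-1111-1111-1111-111111111111"
USER = "22222222-2222-2222-2222-222222222222"
sample = [
    {"status": "ready", "backend": "docker", "space_id": SPACE, "user_id": USER},
    {"status": "failed", "backend": "kubernetes", "space_id": SPACE, "user_id": USER},
    {"status": "busy", "backend": "kubernetes", "space_id": SPACE, "user_id": USER},
]
print(len(filter_workers(sample, backend="kubernetes")))
```

With the assumed default, the failed Kubernetes worker is left out until `include_terminal=True` is passed.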

Worker Details

Expand a worker row to inspect runtime details, timestamps, repository scope, resource class snapshot, gateway metadata, and sanitized error detail. Runtime health can be:

Health        Meaning
ready         The worker endpoint reports healthy.
starting      The worker is still preparing.
error         The worker endpoint reported an error.
unreachable   Qarion could not reach the worker endpoint.
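One way to picture how a health probe maps onto these four values is the sketch below. The `probe` callable and the `state` key in its response are hypothetical stand-ins for whatever the runtime gateway or worker endpoint actually returns.

```python
def classify_health(probe):
    """Map the outcome of a health probe to one of the four health values.

    `probe` is a hypothetical zero-argument callable that returns a dict
    such as {"state": "ready"} or raises OSError on network failure.
    """
    try:
        payload = probe()
    except OSError:
        # Covers ConnectionError and TimeoutError, which subclass OSError.
        return "unreachable"
    state = payload.get("state")
    if state in ("ready", "starting"):
        return state
    # Any other response is treated as an error reported by the endpoint.
    return "error"


def refused():
    raise ConnectionRefusedError("worker endpoint refused the connection")


print(classify_health(lambda: {"state": "ready"}))
print(classify_health(refused))
```

Passing the probe in as a callable keeps the classification logic separate from the transport, which makes it straightforward to exercise without a running worker.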

Resource Classes

Notebook resource classes define the CPU, memory, GPU, workspace, and temporary storage limits offered to users when they start dedicated workers. Qarion stores a resource class snapshot on each worker so later investigations can see the limits used at startup even if the class is changed later.

Administrators can configure resource classes through platform settings or, when available, the notebook runtime resource class editor. Exactly one enabled class should be marked as the default.

Resource class settings include:

Setting                          Meaning
ID and display name              Stable identifier and user-facing label shown in the worker picker.
Enabled and default              Whether users can select the class and which enabled class is preselected.
CPU and memory requests/limits   Scheduler and runtime limits for the worker process.
GPU count and resource           Optional GPU request and qualified resource name.
Workspace and temp storage       Per-worker storage limits for checked-out files and temporary execution output.

If a resource class is changed, new workers use the new values. Existing worker rows continue to show the snapshot used when they started.
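The snapshot behavior can be illustrated with a small sketch. The `ResourceClass` fields and the `start_worker` helper are hypothetical and only model the idea that a worker keeps a copy of the limits it started with.

```python
from dataclasses import asdict, dataclass, replace


@dataclass(frozen=True)
class ResourceClass:
    # Illustrative subset of settings; field names are assumptions.
    id: str
    display_name: str
    cpu_limit: str
    memory_limit: str


def start_worker(worker_id, resource_class):
    """Record a copy of the class values on the worker at startup."""
    return {"id": worker_id, "resource_class_snapshot": asdict(resource_class)}


classes = {"small": ResourceClass("small", "Small", "1", "2Gi")}
first = start_worker("w1", classes["small"])

# An administrator later raises the memory limit on the same class.
classes["small"] = replace(classes["small"], memory_limit="4Gi")
second = start_worker("w2", classes["small"])

print(first["resource_class_snapshot"]["memory_limit"])
print(second["resource_class_snapshot"]["memory_limit"])
```

Because the snapshot is a copy rather than a reference to the class, the first worker still shows the original limit after the class is edited.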

Troubleshooting

Workers remain starting: Check whether the runtime gateway can create Docker containers or Kubernetes workloads, and confirm the worker image is the dedicated notebook worker image rather than the backend image.

Runtime status is unreachable: Confirm gateway routing, internal tokens, network policy, and worker service reachability.

Dependency setup fails: Check the worker image, package fetch policy, private package repository access, and dependency cache configuration.

Users hit worker limits: Review NOTEBOOK_RUNTIME_MAX_WORKERS_PER_USER and the active resource classes. Stop stale or unused workers before increasing limits.
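A per-user limit check might look roughly like the sketch below. It assumes NOTEBOOK_RUNTIME_MAX_WORKERS_PER_USER is read from the environment, that only non-terminal workers count toward the limit, and that the fallback value is 2; all three are assumptions for illustration.

```python
import os

# Assumed: only non-terminal workers count toward the per-user limit.
COUNTED_STATUSES = {"starting", "ready", "busy", "stopping"}


def can_start_worker(workers, user_id, max_per_user=None):
    """Return True if the user is below the per-user worker limit."""
    if max_per_user is None:
        # The fallback of 2 is a placeholder, not a documented default.
        max_per_user = int(os.environ.get("NOTEBOOK_RUNTIME_MAX_WORKERS_PER_USER", "2"))
    active = sum(
        1 for worker in workers
        if worker["user_id"] == user_id and worker["status"] in COUNTED_STATUSES
    )
    return active < max_per_user


fleet = [
    {"user_id": "alice", "status": "busy"},
    {"user_id": "alice", "status": "starting"},
    {"user_id": "alice", "status": "stopped"},
    {"user_id": "bob", "status": "ready"},
]
print(can_start_worker(fleet, "alice", max_per_user=2))
print(can_start_worker(fleet, "bob", max_per_user=2))
```

Under these assumptions a worker stuck in starting still occupies a slot, which is why stopping stale workers can resolve the problem without raising the limit.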

A resource class is rejected: Confirm exactly one enabled class is marked as the default, CPU and memory quantities are positive, GPU resources use a qualified resource name, and workspace/temp storage values meet the configured minimums.
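The checks listed above can be sketched as a validation function that returns human-readable errors. The field names, units, and minimum storage values are hypothetical, and the "qualified resource name" check is simplified to a `prefix/name` shape such as `nvidia.com/gpu`.

```python
# Hypothetical minimums; the real values come from platform configuration.
MIN_WORKSPACE_MIB = 256
MIN_TEMP_MIB = 64


def validate_resource_classes(classes):
    """Return a list of validation errors for a list of class dicts."""
    errors = []
    defaults = [c["id"] for c in classes if c["enabled"] and c["default"]]
    if len(defaults) != 1:
        errors.append(f"expected exactly one enabled default class, found {len(defaults)}")
    for c in classes:
        if c["cpu_cores"] <= 0 or c["memory_mib"] <= 0:
            errors.append(f"{c['id']}: CPU and memory must be positive")
        if c.get("gpu_count", 0) > 0:
            # Simplified check: a qualified name has a non-empty prefix and name.
            prefix, _, name = (c.get("gpu_resource") or "").partition("/")
            if not prefix or not name:
                errors.append(f"{c['id']}: GPU resource must be a qualified name")
        if c["workspace_mib"] < MIN_WORKSPACE_MIB or c["temp_mib"] < MIN_TEMP_MIB:
            errors.append(f"{c['id']}: storage below configured minimum")
    return errors


small = {"id": "small", "enabled": True, "default": True, "cpu_cores": 1,
         "memory_mib": 2048, "workspace_mib": 1024, "temp_mib": 512}
gpu = {"id": "gpu", "enabled": True, "default": False, "cpu_cores": 4,
       "memory_mib": 16384, "gpu_count": 1, "gpu_resource": "gpu",
       "workspace_mib": 4096, "temp_mib": 1024}
print(validate_resource_classes([small, gpu]))
```

Here the `gpu` class is rejected because its resource name lacks a qualifying prefix.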

Stop fails: Inspect runtime gateway reachability and worker cleanup logs. Qarion does not mark the worker as stopped until the failed stop is resolved or a cleanup task reconciles the runtime state.

  • Local Docker demo: run make demo-notebook-runtime-up.
  • Cleanup: stale runtime worker cleanup is scheduled as a background worker task.
  • Deployment: Kubernetes installs need the notebook runtime gateway, worker image, service account, and optional dependency cache PVC configured together.
  • User guide: Notebooks explains worker selection, package access, and cell execution behavior.