Notebook Workers
Notebook Workers gives administrators a live view of dedicated notebook runtime capacity. Use it to monitor worker startup, health, ownership, backend type, resource class usage, and stale runtime states.
Open Admin -> Notebook Workers.
Summary Metrics
The top summary shows:
| Metric | Meaning |
|---|---|
| Total | Workers returned by the current filters. |
| Active | Workers that are ready, starting, busy, or stopping. |
| Failed | Workers that ended in a failed state. |
| Stale | Workers that appear stuck in starting or busy. |
The page refreshes periodically and can also be refreshed manually.
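
As a rough illustration of how these counts relate to worker status, the sketch below derives the four metrics from a list of worker rows. It is not Qarion's implementation: the `WorkerRow` record, its field names, and the 15-minute stale threshold are assumptions made for the example.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone

ACTIVE_STATUSES = {"ready", "starting", "busy", "stopping"}
STALE_CANDIDATE_STATUSES = {"starting", "busy"}
# Assumed cutoff for "stuck"; the real threshold is not documented here.
STALE_AFTER = timedelta(minutes=15)


@dataclass
class WorkerRow:  # hypothetical record shape
    status: str
    last_transition_at: datetime


def summarize(workers, now):
    """Count the filtered workers for each summary metric."""
    summary = {"total": len(workers), "active": 0, "failed": 0, "stale": 0}
    for w in workers:
        if w.status in ACTIVE_STATUSES:
            summary["active"] += 1
        if w.status == "failed":
            summary["failed"] += 1
        # A stale worker is still counted as active; it has just been
        # in starting or busy for longer than the assumed cutoff.
        if (w.status in STALE_CANDIDATE_STATUSES
                and now - w.last_transition_at > STALE_AFTER):
            summary["stale"] += 1
    return summary
```

Under these assumptions a stale worker also counts toward Active, because starting and busy are both active statuses.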
Filters
Use filters to narrow the worker list:
- Status: starting, ready, busy, stopping, stopped, or failed.
- Backend: docker, kubernetes, or external.
- Space ID and User ID: UUID filters for incident investigation.
- Limit: maximum number of workers returned.
- Include terminal workers: include stopped and failed workers.
- Include runtime status: ask the runtime gateway or worker endpoint for live health. This can be slower than database-only status.
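
The filters above can be thought of as a small set of validated query parameters. The following sketch shows one way to model them. The `WorkerFilters` class, the parameter names, and the default limit of 50 are hypothetical and are not taken from the Qarion API. Only the allowed status and backend values come from this page.

```python
from dataclasses import dataclass
from typing import Optional
from uuid import UUID

STATUSES = {"starting", "ready", "busy", "stopping", "stopped", "failed"}
BACKENDS = {"docker", "kubernetes", "external"}


@dataclass
class WorkerFilters:  # hypothetical container for the page's filters
    status: Optional[str] = None
    backend: Optional[str] = None
    space_id: Optional[str] = None
    user_id: Optional[str] = None
    limit: int = 50  # assumed default
    include_terminal: bool = False
    include_runtime_status: bool = False

    def to_params(self):
        """Validate the filters and return them as string parameters."""
        if self.status is not None and self.status not in STATUSES:
            raise ValueError(f"unknown status: {self.status}")
        if self.backend is not None and self.backend not in BACKENDS:
            raise ValueError(f"unknown backend: {self.backend}")
        if self.limit < 1:
            raise ValueError("limit must be at least 1")

        params = {"limit": str(self.limit)}
        if self.status is not None:
            params["status"] = self.status
        if self.backend is not None:
            params["backend"] = self.backend
        for name in ("space_id", "user_id"):
            value = getattr(self, name)
            if value is not None:
                # UUID() raises ValueError for malformed identifiers.
                params[name] = str(UUID(value))
        # Boolean flags are only sent when enabled.
        if self.include_terminal:
            params["include_terminal"] = "true"
        if self.include_runtime_status:
            params["include_runtime_status"] = "true"
        return params
```

Validating the UUIDs and enumerated values before sending a request keeps typos from silently returning an empty worker list during an incident.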
Worker Details
Expand a worker row to inspect runtime details, timestamps, repository scope, resource class snapshot, gateway metadata, and sanitized error detail. Runtime health can be:
| Health | Meaning |
|---|---|
| ready | The worker endpoint reports healthy. |
| starting | The worker is still preparing. |
| error | The worker endpoint reported an error. |
| unreachable | Qarion could not reach the worker endpoint. |
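
The difference between error and unreachable is whether the endpoint answered at all. The sketch below illustrates that distinction with a hypothetical `probe` callable. The report shape (`{"state": ...}`) and the use of `OSError` to signal a failed connection are assumptions for the example, not the actual gateway protocol.

```python
from enum import Enum


class Health(str, Enum):
    READY = "ready"
    STARTING = "starting"
    ERROR = "error"
    UNREACHABLE = "unreachable"


def classify_health(probe):
    """Map a probe result to one of the four health values.

    `probe` is a hypothetical zero-argument callable that returns a dict
    such as {"state": "ready"} or raises OSError (which includes
    ConnectionError and TimeoutError) when the endpoint cannot be reached.
    """
    try:
        report = probe()
    except OSError:
        # No answer at all: a routing, network, or service problem.
        return Health.UNREACHABLE
    state = report.get("state")
    if state == "ready":
        return Health.READY
    if state == "starting":
        return Health.STARTING
    # The endpoint answered, but not with a healthy or starting state.
    return Health.ERROR
```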
Resource Classes
Notebook resource classes define the CPU, memory, GPU, workspace, and temporary storage limits offered to users when they start dedicated workers. Qarion stores a resource class snapshot on each worker so later investigations can see the limits used at startup even if the class is changed later.
Administrators can configure resource classes through platform settings or, when available, the notebook runtime resource class editor. Exactly one enabled class should be marked as the default.
Resource class settings include:
| Setting | Meaning |
|---|---|
| ID and display name | Stable identifier and user-facing label shown in the worker picker. |
| Enabled and default | Whether users can select the class and which enabled class is preselected. |
| CPU and memory requests/limits | Scheduler and runtime limits for the worker process. |
| GPU count and resource | Optional GPU request and qualified resource name. |
| Workspace and temp storage | Per-worker storage limits for checked-out files and temporary execution output. |
If a resource class is changed, new workers use the new values. Existing worker rows continue to show the snapshot used when they started.
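
The snapshot behavior can be pictured as copying the class values onto the worker at startup. This is a minimal sketch under assumed names: `ResourceClass`, `Worker`, `start_worker`, and the field names are hypothetical and only illustrate the copy-at-start idea.

```python
from dataclasses import asdict, dataclass


@dataclass
class ResourceClass:  # hypothetical subset of the real settings
    id: str
    cpu_limit: str
    memory_limit: str


@dataclass
class Worker:  # hypothetical worker row
    id: str
    resource_class_snapshot: dict


def start_worker(worker_id, resource_class):
    # asdict() builds a new dict, so later edits to the class
    # do not change what this worker row records.
    return Worker(worker_id, asdict(resource_class))


small = ResourceClass(id="small", cpu_limit="1", memory_limit="2Gi")
first = start_worker("w1", small)

small.memory_limit = "4Gi"  # an administrator edits the class
second = start_worker("w2", small)
```

Here `first` keeps the 2Gi memory limit it started with, while `second` records the edited 4Gi value.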
Troubleshooting
Workers remain starting: Check whether the runtime gateway can create Docker containers or Kubernetes workloads, and confirm the worker image is the dedicated notebook worker image rather than the backend image.
Runtime status is unreachable: Confirm gateway routing, internal tokens, network policy, and worker service reachability.
Dependency setup fails: Check the worker image, package fetch policy, private package repository access, and dependency cache configuration.
Users hit worker limits: Review NOTEBOOK_RUNTIME_MAX_WORKERS_PER_USER and the active resource classes. Stop stale or unused workers before increasing limits.
A resource class is rejected: Confirm exactly one enabled class is marked as the default, CPU and memory quantities are positive, GPU resources use a qualified resource name, and workspace/temp storage values meet the configured minimums.
Stop fails: Inspect runtime gateway reachability and worker cleanup logs. Qarion does not mark the worker as stopped until the failed stop is resolved or a cleanup task reconciles the runtime state.
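
When a resource class is rejected, it can help to run through the checks listed above one by one. The sketch below encodes them as a plain function. The dictionary keys, the numeric units (CPU cores, MiB), the default minimums, and the reading of "qualified resource name" as a `domain/name` string in the style of Kubernetes extended resources are all assumptions for the example. The real validator may differ.

```python
def validate_resource_classes(classes, min_workspace_mib=1024, min_temp_mib=256):
    """Return a list of human-readable problems; an empty list means valid.

    Each class is a dict with hypothetical keys: id, enabled, default,
    cpu, memory_mib, gpu_count, gpu_resource, workspace_mib, temp_mib.
    """
    errors = []

    # Exactly one enabled class must be the default.
    defaults = [c for c in classes if c.get("enabled") and c.get("default")]
    if len(defaults) != 1:
        errors.append(
            f"expected exactly one enabled default class, found {len(defaults)}"
        )

    for c in classes:
        cid = c["id"]
        # CPU and memory quantities must be positive.
        for key in ("cpu", "memory_mib"):
            if c[key] <= 0:
                errors.append(f"{cid}: {key} must be positive")
        # GPU classes need a qualified resource name such as "vendor.example/gpu".
        if c.get("gpu_count", 0) > 0:
            domain, sep, name = c.get("gpu_resource", "").partition("/")
            if not (sep and "." in domain and name):
                errors.append(f"{cid}: gpu_resource must be a qualified name")
        # Storage values must meet the configured minimums.
        if c["workspace_mib"] < min_workspace_mib:
            errors.append(f"{cid}: workspace storage below minimum")
        if c["temp_mib"] < min_temp_mib:
            errors.append(f"{cid}: temp storage below minimum")
    return errors
```

Returning every problem at once, rather than stopping at the first, makes it easier to correct a rejected class in a single pass.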
Related Operations
- Local Docker demo: run make demo-notebook-runtime-up.
- Cleanup: stale runtime worker cleanup is scheduled as a background worker task.
- Deployment: Kubernetes installs need the notebook runtime gateway, worker image, service account, and optional dependency cache PVC configured together.
- User guide: Notebooks explains worker selection, package access, and cell execution behavior.