
AI Ops

AI Ops gives administrators a live operational view of Qarion AI usage, failures, guardrails, workflow runs, evaluations, and authoring readiness.

Open Admin -> AI Ops.

Pipeline Authoring Readiness

The Pipeline Authoring readiness section summarizes whether generated-code workflows can run safely in the current instance. Use it before rolling out or debugging AI-assisted implementation. It combines runtime settings, recent workflow evidence, failure signals, and reliability audit findings.

The top status is intentionally conservative:

Status    | Meaning
Ready     | Required rollout controls are configured and recent evidence is healthy.
Attention | The workflow can run, but reliability, coverage, or rollout posture needs review.
Blocked   | A required control is missing, unsupported, or failing closed.
Unknown   | Qarion does not yet have enough recent activity or definitions to grade the signal.
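
One way to read "conservative" is that the top status is the most severe status among the individual checks. The sketch below illustrates that reading only; the severity order (in particular where Unknown sits) and the names `Status` and `top_status` are assumptions, not Qarion's documented behavior.

```python
from enum import IntEnum
from typing import Iterable


class Status(IntEnum):
    # Assumed severity order; the page only says the top status is conservative.
    READY = 0
    UNKNOWN = 1
    ATTENTION = 2
    BLOCKED = 3


def top_status(check_statuses: Iterable[Status]) -> Status:
    """Return the most severe status; an empty input grades as UNKNOWN."""
    statuses = list(check_statuses)
    if not statuses:
        return Status.UNKNOWN
    return max(statuses)
```

Under this assumption, a single Blocked check is enough to mark the whole section Blocked, even if every other check is Ready.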

Key runtime chips include:

Signal             | Meaning
Code writer        | Whether dedicated code-writer routing is enabled for executable Pipeline Authoring changes.
Code writer source | Which setting or environment source selected the code-writer state.
Sandbox            | Whether generated-code sandbox validation is enabled.
Backend            | The generated-code sandbox backend, such as Docker or Kubernetes.
Image              | Whether the generated-code sandbox image is configured.
Namespace          | Whether the Kubernetes sandbox namespace is configured when Kubernetes is used.
Timeout            | The maximum generated-code sandbox execution time.
DB sandbox         | Whether database sandbox validation is enabled for generated SQL/database work.
DB backend         | The database sandbox backend, currently expected to be SQLite when enabled.
Dependency smoke   | Whether dependency smoke validation is enabled.
Package fetch      | The validation package-fetch policy.
Public index       | Whether public package index access is configured.
Qarion base URL    | Whether validation package fetch can resolve the Qarion package endpoint.
Profiler           | Whether Pipeline Authoring performance profiling is enabled.
Slow module        | The module load threshold that creates profiler warnings.
CPU spike          | The CPU spike threshold that creates profiler warnings.
Memory growth      | The memory growth threshold that creates profiler warnings.
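
Several sandbox chips depend on each other: the namespace only matters when the backend is Kubernetes, and the image matters for any enabled backend. The following minimal sketch shows how those dependencies could be checked from a snapshot of the chips. The `SandboxChips` class, its field names, and the returned messages are hypothetical and do not reflect Qarion's actual settings schema.

```python
from dataclasses import dataclass
from typing import List, Optional

# Backends named on this page; treated here as the full supported set (assumption).
SUPPORTED_BACKENDS = {"docker", "kubernetes"}


@dataclass
class SandboxChips:
    """Hypothetical snapshot of the sandbox-related runtime chips."""
    enabled: bool
    backend: Optional[str] = None
    image: Optional[str] = None
    namespace: Optional[str] = None


def sandbox_gaps(chips: SandboxChips) -> List[str]:
    """Return human-readable gaps; an empty list means nothing is missing."""
    if not chips.enabled:
        return ["sandbox validation is disabled"]
    gaps = []
    backend = (chips.backend or "").lower()
    if backend not in SUPPORTED_BACKENDS:
        gaps.append(f"sandbox backend {chips.backend!r} is unsupported")
    if not chips.image:
        gaps.append("sandbox image is not configured")
    # The namespace is only required when Kubernetes is the backend.
    if backend == "kubernetes" and not chips.namespace:
        gaps.append("Kubernetes namespace is not configured")
    return gaps
```

For example, `sandbox_gaps(SandboxChips(enabled=True, backend="kubernetes", image="img:1"))` reports only the missing namespace.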

Readiness Checks

Readiness checks combine runtime configuration, recent failures, and coding-agent diagnostics. Each status is meant to be actionable:

  • Ready means the check has the expected configuration or recent evidence.
  • Attention means the workflow can still run, but rollout or reliability needs review.
  • Blocked means the feature is missing required configuration or is failing closed.
  • Unknown means there is not enough recent evidence for that check.
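
The four meanings above can be read as a small decision order for a single check. The sketch below is illustrative only: the inputs (`configured`, `failing_closed`, `recent_runs`, `warnings`), the `min_runs` cutoff, and the order in which the conditions are tested are all assumptions rather than Qarion's grading logic.

```python
def grade_check(configured: bool, failing_closed: bool,
                recent_runs: int, warnings: int, min_runs: int = 1) -> str:
    """Grade one readiness check from hypothetical inputs."""
    # Missing required configuration or a fail-closed control blocks the check.
    if not configured or failing_closed:
        return "Blocked"
    # Without enough recent evidence the check cannot be graded.
    if recent_runs < min_runs:
        return "Unknown"
    # The workflow runs, but something needs review.
    if warnings > 0:
        return "Attention"
    return "Ready"
```

Testing for Blocked first keeps the sketch conservative: a missing control is reported even when there is no recent activity to grade.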

Common checks include:

Check                      | What to review
Review integrity           | Persisted plan reviews, saved-file records, failed-file reviews, and review drift.
Failure signals            | Recent blocked or degraded examples from the reliability audit.
Active jobs                | Pipeline Authoring chat jobs that appear stale or stuck.
Code generation specialist | Whether dedicated code-writer routing is intentionally enabled or disabled.
Validation runtime         | Generated-code sandbox settings and database sandbox settings.
Package fetch              | Whether validation can install dependencies from Qarion repositories and, when allowed, a public index.
Dependency smoke           | Whether generated dependencies are smoke-tested before rollout.
Performance profiler       | CPU, memory, and module timing diagnostics in recent traces.
Recent AI activity         | Failed, blocked, invalid-output, degraded, or running Pipeline Authoring incidents.
Prompt/cache health        | Cache-aware prompt layout and stable prompt section evidence.
Memory freshness           | Bounded conversation, clarification, and workspace memory provenance.
Subagent/tool policy       | Read-only subagent evidence, denied tool-call recording, and tool-policy diagnostics.
Reliability trace health   | Structured contract hashes, validation tiers, checkpoint coverage, repair counters, and terminal status.
Run trace health           | Normalized workflow run traces for recent Pipeline Authoring activity.

When a check has a link target, use it to jump to the related AI logs, workflow runs, jobs, or system settings. Metadata chips show compact evidence, such as selected backend, enabled state, source setting, or warning counts.

Failure Signals

Failure signals show recent Pipeline Authoring examples that need review. They can include stuck jobs, failed generated-code validation, sandbox failures, guardrail blocks, missing package access, command-approval pauses, or repeated repair loops.

Use the linked workflow, log, or authoring session to inspect the affected workspace. The AI Ops page should provide enough context to triage the failure without exposing raw prompts, secrets, or unsanitized command output.

Failure signals are separate from readiness findings. Findings describe persisted review or saved-file drift. Failure signals describe recent workflow behavior, such as validation failures, stale resume checkpoints, sandbox failures, blocked evidence gates, denied tool calls, or subagent budget caps.

Rollout Checklist

Before enabling broad generated-code rollout:

  1. Confirm code-writer routing is intentionally enabled or intentionally disabled.
  2. Enable sandbox validation and verify the backend and image are configured.
  3. Enable database sandbox validation when generated SQL or database-facing support files are part of the rollout.
  4. Enable dependency smoke checks when package fetch and sandbox execution are available.
  5. Set validation package fetch to qarion_only or qarion_plus_public when generated dependencies need package installation.
  6. Confirm private package repositories, public-index access, and the Qarion base URL match the package policy.
  7. Review prompt/cache health, memory freshness, subagent policy, reliability trace health, and run trace health for warnings.
  8. Enable the performance profiler during rollout windows when CPU, memory, or module load behavior is part of the risk.
  9. Investigate recent failure signals before increasing rollout.
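
Some checklist items always apply and others apply only under a stated condition (for example, database sandbox validation only when SQL is part of the rollout). As a rough sketch, the checklist could be tracked like this. The step keys, the scope flag names, and the shortened step texts are hypothetical and exist only for illustration.

```python
from typing import List, NamedTuple, Optional, Set


class Step(NamedTuple):
    key: str                            # hypothetical identifier for the step
    text: str                           # paraphrase of the checklist item
    applies_when: Optional[str] = None  # hypothetical scope flag; None means always


STEPS = [
    Step("code_writer", "Confirm code-writer routing is intentionally on or off"),
    Step("sandbox", "Enable sandbox validation; verify backend and image"),
    Step("db_sandbox", "Enable database sandbox validation", "sql_in_scope"),
    Step("dependency_smoke", "Enable dependency smoke checks", "smoke_prerequisites_ready"),
    Step("package_fetch", "Set package fetch to qarion_only or qarion_plus_public",
         "needs_package_install"),
    Step("package_sources", "Confirm repositories, public index, and base URL match the policy"),
    Step("health_checks", "Review prompt/cache, memory, subagent, and trace health warnings"),
    Step("profiler", "Enable the performance profiler for the rollout window",
         "performance_risk"),
    Step("failure_signals", "Investigate recent failure signals"),
]


def pending_steps(done: Set[str], scope: Set[str]) -> List[str]:
    """List numbered steps that apply to this rollout and are not yet done."""
    return [
        f"{number}. {step.text}"
        for number, step in enumerate(STEPS, start=1)
        if (step.applies_when is None or step.applies_when in scope)
        and step.key not in done
    ]
```

Calling `pending_steps({"code_writer", "sandbox"}, {"sql_in_scope"})` lists step 3 first, because the database sandbox step applies once SQL is in scope.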

Troubleshooting

"Code writer is disabled" means executable changes use the main planning path. Enable the dedicated code-writer setting only when the specialist path is ready for rollout.

"Sandbox validation is disabled" means generated code is not being executed in the sandbox before review. Enable sandbox validation before relying on generated code for broader teams.

"Sandbox backend is unsupported" means the configured backend is not one of the supported runtime backends. Use Docker or Kubernetes and configure the matching image and runtime settings.

"Database sandbox backend is unsupported" means database sandbox validation is enabled with a backend Qarion cannot run safely. Use the supported SQLite-backed database sandbox or disable database sandbox validation until the backend is ready.

"Package fetch is disabled" means validation cannot install generated external dependencies. Use qarion_only for private Qarion package repositories or qarion_plus_public when public-index access is allowed.

"Package fetch is enabled but the Qarion base URL is missing" means generated dependency installation cannot resolve private Qarion package endpoints. Set the validation package base URL before relying on private package dependencies.
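
The two package-fetch conditions above can be expressed as a small consistency check. This is a sketch under stated assumptions: the function name, its parameters, and the rule that any value other than qarion_only or qarion_plus_public counts as disabled are hypothetical. The public-index message is illustrative wording, not a message Qarion is known to display.

```python
from typing import List, Optional

# Policy values named on this page; anything else is treated as disabled (assumption).
ENABLED_POLICIES = {"qarion_only", "qarion_plus_public"}


def package_fetch_issues(policy: str, qarion_base_url: Optional[str],
                         public_index_configured: bool) -> List[str]:
    """Return the troubleshooting conditions that apply to these settings."""
    if policy not in ENABLED_POLICIES:
        return ["Package fetch is disabled"]
    issues = []
    if not qarion_base_url:
        issues.append("Package fetch is enabled but the Qarion base URL is missing")
    # Illustrative extra check: the public policy needs public-index access.
    if policy == "qarion_plus_public" and not public_index_configured:
        issues.append("Public index access is not configured for qarion_plus_public")
    return issues
```

An empty result means the package policy, base URL, and public-index access are mutually consistent under these assumptions.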

"Dependency smoke is disabled" means dependencies may be selected but not import-smoke-tested before review. Enable it once package fetch and the sandbox runtime are ready.

"Performance profiler warnings appear" means recent traces exceeded the module load, CPU, or memory growth thresholds. Treat these as rollout signals and inspect the linked workflow before assuming the model response is the root cause.
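
To make the threshold comparison concrete, here is a minimal sketch of how a trace could be compared against the three profiler thresholds. The units (milliseconds, percent, megabytes), the trace field names, and the strict greater-than comparison are assumptions and not Qarion's trace format.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class ProfilerThresholds:
    """Hypothetical thresholds; units are assumed for illustration."""
    slow_module_ms: float
    cpu_spike_percent: float
    memory_growth_mb: float


def profiler_warnings(trace: Dict[str, float], limits: ProfilerThresholds) -> List[str]:
    """Return the names of the thresholds that the trace exceeded."""
    checks = [
        ("Slow module", trace.get("max_module_load_ms", 0.0), limits.slow_module_ms),
        ("CPU spike", trace.get("peak_cpu_percent", 0.0), limits.cpu_spike_percent),
        ("Memory growth", trace.get("memory_growth_mb", 0.0), limits.memory_growth_mb),
    ]
    # A value exactly at the threshold does not warn in this sketch.
    return [name for name, observed, limit in checks if observed > limit]
```

A warning here only says a measurement crossed a threshold; it does not identify whether the cause is the model response, a dependency, or the sandbox runtime.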

"Prompt/cache or memory freshness warnings appear" means recent authoring runs may be missing stable prompt layout, context freshness, or recovery state. Check the linked workflow runs before treating the issue as a model-quality problem.

"Reliability trace warnings appear" means recent runs missed structured contract evidence, validation-tier evidence, checkpoint coverage, sandbox results, repair counters, or expected terminal status. Review the run trace and the replay/eval output before increasing rollout.
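
As an illustration of what "missed evidence" could mean in practice, the sketch below lists which of the evidence items named above are absent from a run trace. The field names and the dictionary-shaped trace are hypothetical; the real trace structure is not described on this page.

```python
from typing import Any, Dict, List

# Hypothetical field names for the evidence items listed above.
REQUIRED_EVIDENCE = (
    "contract_hash",
    "validation_tier",
    "checkpoint_coverage",
    "sandbox_result",
    "repair_count",
    "terminal_status",
)


def missing_evidence(trace: Dict[str, Any]) -> List[str]:
    """Return required evidence fields that are absent or None in the trace."""
    # Compare against None so that a legitimate zero (e.g. no repairs) still counts.
    return [field for field in REQUIRED_EVIDENCE if trace.get(field) is None]
```

Checking for `None` rather than truthiness matters here: a repair counter of 0 is valid evidence, not a gap.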