AI Ops

AI Ops gives administrators a live operational view of Qarion AI usage, failures, guardrails, workflow runs, evaluations, and authoring readiness.

Open Admin -> AI Ops.

Provider And Embedding Controls

AI Ops works with the provider controls in the admin area:

Control	Where it is configured	What it affects
LLM provider and model routing	Admin -> System Settings and instance settings	Copilot, generation, triage, authoring assistance, and other LLM-backed workflows.
Embedding configuration	Admin -> System Settings and instance settings	Semantic search for catalog products, documentation, and standards content.
External AI tool providers	Superadmin and space-settings API endpoints	Read-only external MCP tools that can be exposed to AI workflows in selected spaces.

Use these controls together. LLM configuration decides which model handles AI reasoning and generation. Embedding configuration decides how semantic search vectors are produced. External tool providers decide which additional read-only tool results AI workflows can use in a space.

Embedding Configuration

Embedding configuration controls semantic search. Superadmins can choose a provider, model, dimensions, optional base URL, enabled state, and index version. Supported providers are local, openai, azure_openai, mistral, and voyage.

Key lifecycle states:

State	Meaning
Active	The selected embedding shape is ready for newly indexed content.
Pending reindex	Provider, model, or dimensions changed and existing vectors should be regenerated.
Failed	The configuration or runtime needs administrator attention.

For local embeddings, the runtime badge can show idle, warming, ready, or failed. Remote providers show an external-provider state because Qarion tests them through provider calls rather than local model warmup.
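
The reindex rule above can be modeled as a comparison between two configurations: the saved one, and the one the existing vectors were built from. The following is a minimal sketch under stated assumptions. The `EmbeddingConfig` class, the `lifecycle_state` function and the integer `index_version` are hypothetical and do not describe Qarion's internal data model; only the provider names come from this guide. The sketch also assumes that an index-version change calls for a reindex, matching the post-change steps in this section.

```python
from dataclasses import dataclass
from typing import Optional

# Provider names are taken from this guide; everything else is a hypothetical model.
SUPPORTED_PROVIDERS = {"local", "openai", "azure_openai", "mistral", "voyage"}

# Fields assumed to change the vector shape and therefore require reindexing.
SHAPE_FIELDS = ("provider", "model", "dimensions", "index_version")


@dataclass(frozen=True)
class EmbeddingConfig:
    provider: str
    model: str
    dimensions: int
    base_url: Optional[str] = None
    enabled: bool = True
    index_version: int = 1

    def __post_init__(self) -> None:
        if self.provider not in SUPPORTED_PROVIDERS:
            raise ValueError(f"unsupported embedding provider: {self.provider}")
        if self.dimensions <= 0:
            raise ValueError("dimensions must be a positive integer")


def lifecycle_state(saved: EmbeddingConfig, indexed: EmbeddingConfig) -> str:
    """Compare the saved config with the config used to build existing vectors."""
    changed = any(getattr(saved, f) != getattr(indexed, f) for f in SHAPE_FIELDS)
    return "pending_reindex" if changed else "active"


indexed = EmbeddingConfig("local", "example-small", 384)
saved = EmbeddingConfig("voyage", "example-large", 1024)
print(lifecycle_state(saved, indexed))  # pending_reindex
```

In this sketch, changing only the base URL or the enabled flag leaves the vector shape unchanged, so the state stays active. The Failed state is out of scope here because it depends on runtime errors rather than on a configuration diff.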

After changing provider, model, dimensions, or index version:

Save the embedding configuration.
Use Test to verify the effective provider can generate a vector.
Reindex products so catalog semantic search uses the new vector shape.
Reindex standards or documentation content when the changed provider should affect those results as well.
Monitor AI Ops and search behavior for provider failures or missing semantic results.
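
The Test step is only meaningful if the effective provider returns a usable vector. Below is a minimal sketch of what such a check could verify. It assumes a hypothetical `embed` callable that wraps whichever provider is configured. The function name and the specific checks (dimension match, finite values, non-zero vector) are illustrative and are not Qarion's actual test logic.

```python
import math
from typing import Callable, List, Sequence

# Hypothetical provider hook: any callable that turns text into a vector.
EmbedFn = Callable[[str], Sequence[float]]


def check_embedding(embed: EmbedFn, expected_dimensions: int,
                    probe: str = "embedding connectivity probe") -> List[str]:
    """Return a list of problems; an empty list means the vector looks usable."""
    try:
        vector = list(embed(probe))
    except Exception as exc:  # network, auth, or local runtime failure
        return [f"provider call failed: {exc}"]

    problems = []
    if len(vector) != expected_dimensions:
        problems.append(
            f"expected {expected_dimensions} dimensions, got {len(vector)}")
    if not all(math.isfinite(x) for x in vector):
        problems.append("vector contains NaN or infinite values")
    if vector and all(x == 0 for x in vector):
        problems.append("vector is all zeros")
    return problems


# A fake provider that returns a 3-dimensional vector.
print(check_embedding(lambda text: [0.1, -0.2, 0.3], expected_dimensions=3))  # []
```

A dimension mismatch is the most useful signal after a provider or model change. It means stored vectors and new query vectors can no longer be compared until a reindex completes.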

Instance settings can override the control-plane embedding configuration for a specific tenant instance. When an instance override exists, run Test and the reindex jobs from that instance context so they use the effective configuration.

External AI Tool Providers

External AI tool providers let superadmins register governed MCP stdio servers as read-only tool sources for AI workflows. Each provider defines its command as an argv list, an optional safe relative working directory, non-secret environment variables, an optional credential reference, and tool metadata overrides.

The rollout lifecycle is:

Create the provider with a safe argv command and credential reference.
Discover tools so Qarion records tool names, descriptions, input schemas, namespaced names, and risk metadata.
Review tool metadata and enable only read-risk tools.
Enable the provider for specific spaces and optionally allowlist tool names.
Watch AI logs and workflow runs for tool-call failures or unexpected provider output.

In V1, external providers can enable only read-risk tools. Shell workflows, sensitive environment keys, unsafe working directories, and blocked executable names are rejected. Store secrets in credential records instead of plain environment variables.
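
The rejection rules above can be sketched as a validator over the provider definition. This is a hypothetical illustration. `validate_provider`, the blocked-executable list and the sensitive-key markers are assumptions chosen for the example, because the exact lists Qarion enforces are not documented here. Blocking shell executables stands in for the shell-workflow rule.

```python
from pathlib import PurePosixPath
from typing import Dict, List, Optional

# Hypothetical policy lists for illustration only.
BLOCKED_EXECUTABLES = {"sh", "bash", "zsh", "dash", "fish", "pwsh", "powershell", "sudo"}
SENSITIVE_ENV_MARKERS = ("SECRET", "TOKEN", "PASSWORD", "API_KEY", "PRIVATE_KEY")


def validate_provider(argv: List[str], working_dir: Optional[str],
                      env: Dict[str, str]) -> List[str]:
    """Return validation errors for an MCP stdio provider definition."""
    errors = []
    # Require an argv list, never a single shell command string.
    if not isinstance(argv, list) or not argv:
        errors.append("argv must be a non-empty list of arguments")
    else:
        executable = PurePosixPath(argv[0]).name
        if executable in BLOCKED_EXECUTABLES:
            errors.append(f"blocked executable: {executable}")
    # Working directory must stay relative and must not climb out with "..".
    if working_dir is not None:
        path = PurePosixPath(working_dir)
        if path.is_absolute() or ".." in path.parts:
            errors.append(f"unsafe working directory: {working_dir}")
    # Secrets belong in a credential reference, not in plain environment variables.
    for key in env:
        if any(marker in key.upper() for marker in SENSITIVE_ENV_MARKERS):
            errors.append(f"sensitive environment key: {key} (use a credential reference)")
    return errors


print(validate_provider(["bash", "-c", "start-server"], None, {}))
# ['blocked executable: bash']
```

The sketch checks the executable by its base name, so an absolute path such as `/usr/bin/bash` is rejected the same way as `bash`.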

Instance Scope

Some AI controls can be configured at the control plane and overridden for a specific runtime instance. Use instance-level overrides when a tenant needs a different embedding provider, model, dimensions, base URL, or API key than the global default.

When debugging instance-specific behavior, check the configuration source badge before changing settings. If the source is environment, the value comes from runtime settings. If the source is database, it was saved through the admin configuration APIs or UI.
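
The lookup order described in this section can be sketched as follows. For illustration only, the sketch assumes two things: an instance override always replaces the control-plane value for the same key, and each stored value carries the source shown on the badge. `SettingValue`, `Resolved`, `resolve` and the dotted setting keys are hypothetical names. The sketch does not model how environment and database values take precedence over each other.

```python
from dataclasses import dataclass
from typing import Literal, Mapping

Source = Literal["environment", "database"]


@dataclass(frozen=True)
class SettingValue:
    value: object
    source: Source  # what the configuration source badge would show


@dataclass(frozen=True)
class Resolved:
    value: object
    source: Source
    scope: str  # "instance" or "control_plane"


def resolve(key: str, control_plane: Mapping[str, SettingValue],
            instance: Mapping[str, SettingValue]) -> Resolved:
    """Prefer an instance override; otherwise fall back to the control plane."""
    if key in instance:
        setting = instance[key]
        return Resolved(setting.value, setting.source, "instance")
    if key in control_plane:
        setting = control_plane[key]
        return Resolved(setting.value, setting.source, "control_plane")
    raise KeyError(key)


control_plane = {"embedding.provider": SettingValue("openai", "environment"),
                 "embedding.base_url": SettingValue("https://api.example.com/v1", "environment")}
instance = {"embedding.base_url": SettingValue("https://proxy.tenant-a.example/v1", "database")}
print(resolve("embedding.base_url", control_plane, instance))
# Resolved(value='https://proxy.tenant-a.example/v1', source='database', scope='instance')
```

Returning the scope alongside the source mirrors the debugging advice above. It shows both where a value was saved and which level of configuration supplied it.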

Pipeline Authoring Readiness

The Pipeline Authoring readiness section summarizes whether generated-code workflows can run safely in the current instance. Use it before rolling out or debugging AI-assisted implementation. It combines runtime settings, recent workflow evidence, failure signals, and reliability audit findings.

The top status is intentionally conservative:

Status	Meaning
Ready	Required rollout controls are configured and recent evidence is healthy.
Attention	The workflow can run, but reliability, coverage, or rollout posture needs review.
Blocked	A required control is missing, unsupported, or failing closed.
Unknown	Qarion does not yet have enough recent activity or definitions to grade the signal.
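
One reading of "intentionally conservative" is that the top status is the most severe status among the individual checks, so Ready appears only when every check is Ready. The sketch below encodes that reading. The severity order (Blocked over Attention over Unknown over Ready) and the `summarize` helper are assumptions, not documented Qarion behavior.

```python
from enum import IntEnum
from typing import Dict, List, Tuple


class Status(IntEnum):
    # Higher value = more severe. This ordering is an assumption.
    READY = 0
    UNKNOWN = 1
    ATTENTION = 2
    BLOCKED = 3


def summarize(checks: Dict[str, Status]) -> Tuple[Status, List[str]]:
    """Return the top status and the checks responsible for it."""
    if not checks:
        return Status.UNKNOWN, []  # nothing to grade yet
    top = max(checks.values())
    return top, sorted(name for name, status in checks.items() if status == top)


top, names = summarize({"Sandbox": Status.READY,
                        "Package fetch": Status.ATTENTION,
                        "Run trace health": Status.UNKNOWN})
print(top.name, names)  # ATTENTION ['Package fetch']
```

Returning the responsible check names helps triage: the top status tells you whether to act, and the names tell you which check to open first.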

Key runtime chips include:

Signal	Meaning
Code writer	Whether dedicated code-writer routing is enabled for executable Pipeline Authoring changes.
Code writer source	Which setting or environment source selected the code-writer state.
Sandbox	Whether generated-code sandbox validation is enabled.
Backend	The generated-code sandbox backend, such as Docker or Kubernetes.
Image	Whether the generated-code sandbox image is configured.
Namespace	Whether the Kubernetes sandbox namespace is configured when Kubernetes is used.
Timeout	The maximum generated-code sandbox execution time.
DB sandbox	Whether database sandbox validation is enabled for generated SQL/database work.
DB backend	The database sandbox backend, currently expected to be SQLite when enabled.
Dependency smoke	Whether dependency smoke validation is enabled.
Package fetch	The validation package-fetch policy.
Public index	Whether public package index access is configured.
Qarion base URL	Whether validation package fetch can resolve the Qarion package endpoint.
Profiler	Whether Pipeline Authoring performance profiling is enabled.
Slow module	The module load threshold that creates profiler warnings.
CPU spike	The CPU spike threshold that creates profiler warnings.
Memory growth	The memory growth threshold that creates profiler warnings.
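
The three profiler thresholds act as simple limits on trace measurements. The following sketch makes several assumptions: hypothetical units (seconds for module load, percent for CPU, megabytes for memory growth), hypothetical `TraceSample` fields, and a strict comparison, so a value equal to a threshold does not warn. This guide does not describe Qarion's actual trace format or default thresholds.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass(frozen=True)
class ProfilerThresholds:
    # Hypothetical units: seconds, percent, megabytes.
    slow_module_seconds: float
    cpu_spike_percent: float
    memory_growth_mb: float


@dataclass(frozen=True)
class TraceSample:
    module: str
    load_seconds: float
    peak_cpu_percent: float
    memory_growth_mb: float


def profiler_warnings(samples: Iterable[TraceSample],
                      limits: ProfilerThresholds) -> List[str]:
    """Return one warning per sample measurement that exceeds its threshold."""
    warnings = []
    for s in samples:
        if s.load_seconds > limits.slow_module_seconds:
            warnings.append(f"slow module: {s.module} took {s.load_seconds:.1f}s")
        if s.peak_cpu_percent > limits.cpu_spike_percent:
            warnings.append(f"cpu spike: {s.module} peaked at {s.peak_cpu_percent:.0f}%")
        if s.memory_growth_mb > limits.memory_growth_mb:
            warnings.append(f"memory growth: {s.module} grew {s.memory_growth_mb:.0f} MB")
    return warnings


limits = ProfilerThresholds(slow_module_seconds=2.0, cpu_spike_percent=90.0, memory_growth_mb=256.0)
print(profiler_warnings([TraceSample("transform", 3.0, 50.0, 10.0)], limits))
# ['slow module: transform took 3.0s']
```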

Readiness Checks

Readiness checks combine runtime configuration, recent failures, and coding-agent diagnostics. Each check reports one of four operational statuses:

Ready means the check has the expected configuration or recent evidence.
Attention means the workflow can still run, but rollout or reliability needs review.
Blocked means the feature is missing required configuration or is failing closed.
Unknown means there is not enough recent evidence for that check.

Common checks include:

Check	What to review
Review integrity	Persisted plan reviews, saved-file records, failed-file reviews, and review drift.
Failure signals	Recent blocked or degraded examples from the reliability audit.
Active jobs	Pipeline Authoring chat jobs that appear stale or stuck.
Code generation specialist	Whether dedicated code-writer routing is intentionally enabled or disabled.
Validation runtime	Generated-code sandbox settings and database sandbox settings.
Package fetch	Whether validation can install dependencies from Qarion repositories and, when allowed, a public index.
Dependency smoke	Whether generated dependencies are smoke-tested before rollout.
Performance profiler	CPU, memory, and module timing diagnostics in recent traces.
Recent AI activity	Failed, blocked, invalid-output, degraded, or running Pipeline Authoring incidents.
Prompt/cache health	Cache-aware prompt layout and stable prompt section evidence.
Memory freshness	Bounded conversation, clarification, manual workspace memory, and learned long-term memory provenance.
Subagent/tool policy	Read-only subagent evidence, denied tool-call recording, and tool-policy diagnostics.
Reliability trace health	Structured contract hashes, validation tiers, checkpoint coverage, repair counters, and terminal status.
Run trace health	Normalized workflow run traces for recent Pipeline Authoring activity.

When a check has a link target, use it to jump to the related AI logs, workflow runs, jobs, or system settings. Metadata chips show compact evidence, such as selected backend, enabled state, source setting, or warning counts.
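
The following sketch shows how a single check could map configuration to a status. It grades the Validation runtime check using the backends named in this guide: Docker or Kubernetes for generated code, and SQLite for the database sandbox. The `ValidationRuntime` fields and the status mapping are assumptions. A disabled sandbox is treated as Attention because the workflow can still run. An unsupported backend, or a missing image or namespace, is treated as Blocked because the feature cannot run safely.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

# Backends named in this guide; the dataclass fields are hypothetical.
SANDBOX_BACKENDS = {"docker", "kubernetes"}
DB_SANDBOX_BACKENDS = {"sqlite"}


@dataclass(frozen=True)
class ValidationRuntime:
    sandbox_enabled: bool
    backend: Optional[str] = None
    image: Optional[str] = None
    namespace: Optional[str] = None
    db_sandbox_enabled: bool = False
    db_backend: Optional[str] = None


def validation_runtime_check(rt: ValidationRuntime) -> Tuple[str, List[str]]:
    """Grade the runtime as ready, attention, or blocked (assumed mapping)."""
    if not rt.sandbox_enabled:
        # Workflows still run, but generated code is not executed before review.
        return "attention", ["sandbox validation is disabled"]
    blocked = []
    backend = (rt.backend or "").lower()
    if backend not in SANDBOX_BACKENDS:
        blocked.append(f"unsupported sandbox backend: {rt.backend!r}")
    if not rt.image:
        blocked.append("sandbox image is not configured")
    if backend == "kubernetes" and not rt.namespace:
        blocked.append("kubernetes sandbox namespace is not configured")
    if rt.db_sandbox_enabled and (rt.db_backend or "").lower() not in DB_SANDBOX_BACKENDS:
        blocked.append(f"unsupported database sandbox backend: {rt.db_backend!r}")
    return ("blocked", blocked) if blocked else ("ready", [])


print(validation_runtime_check(ValidationRuntime(True, "kubernetes", "sandbox:latest")))
# ('blocked', ['kubernetes sandbox namespace is not configured'])
```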

Failure Signals

Failure signals show recent Pipeline Authoring examples that need review. They can include stuck jobs, failed generated-code validation, sandbox failures, guardrail blocks, missing package access, command-approval pauses, or repeated repair loops.

Use the linked workflow, log, or authoring session to inspect the affected workspace. The AI Ops page should provide enough context to triage the failure without exposing raw prompts, secrets, or unsanitized command output.

Failure signals are separate from readiness findings. Findings describe persisted review or saved-file drift. Failure signals describe recent workflow behavior, such as validation failures, stale resume checkpoints, sandbox failures, blocked evidence gates, denied tool calls, or subagent budget caps.

Rollout Checklist

Before enabling broad generated-code rollout:

Confirm code-writer routing is intentionally enabled or intentionally disabled.
Enable sandbox validation and verify the backend and image are configured.
Enable database sandbox validation when generated SQL or database-facing support files are part of the rollout.
Enable dependency smoke checks when package fetch and sandbox execution are available.
Set validation package fetch to qarion_only or qarion_plus_public when generated dependencies need package installation.
Confirm private package repositories, public-index access, and the Qarion base URL match the package policy.
Review prompt/cache health, memory freshness, subagent policy, reliability trace health, and run trace health for warnings.
Enable the performance profiler during rollout windows when CPU, memory, or module load behavior is part of the risk.
Investigate recent failure signals before increasing rollout.
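
Several package-related checklist items depend on each other, so it helps to check them together. The sketch below does that. The policy names `qarion_only` and `qarion_plus_public` come from this guide. The `disabled` value, the `PackageSettings` fields and the finding messages are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional

# "qarion_only" and "qarion_plus_public" come from this guide; "disabled" is assumed.
PACKAGE_FETCH_POLICIES = {"disabled", "qarion_only", "qarion_plus_public"}


@dataclass(frozen=True)
class PackageSettings:
    fetch_policy: str
    qarion_base_url: Optional[str] = None
    public_index_url: Optional[str] = None
    dependency_smoke: bool = False
    sandbox_enabled: bool = False


def package_findings(s: PackageSettings) -> List[str]:
    """Return rollout findings for package fetch and dependency smoke settings."""
    if s.fetch_policy not in PACKAGE_FETCH_POLICIES:
        return [f"unknown package fetch policy: {s.fetch_policy}"]
    findings = []
    fetch_on = s.fetch_policy != "disabled"
    if not fetch_on:
        findings.append("package fetch is disabled; generated dependencies cannot be installed")
    elif not s.qarion_base_url:
        findings.append("Qarion base URL is missing; private package endpoints cannot be resolved")
    if s.fetch_policy == "qarion_plus_public" and not s.public_index_url:
        findings.append("public index access is not configured")
    if s.dependency_smoke and not (fetch_on and s.sandbox_enabled):
        findings.append("dependency smoke needs package fetch and sandbox execution")
    if not s.dependency_smoke and fetch_on and s.sandbox_enabled:
        findings.append("dependency smoke is off although fetch and sandbox are available")
    return findings


print(package_findings(PackageSettings("qarion_only", sandbox_enabled=True)))
```

In this sketch the base URL is required for both fetch policies, because each one includes Qarion package repositories.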

Troubleshooting

Code writer is disabled means executable changes use the main planning path. Enable the dedicated code-writer setting only when the specialist path is ready for rollout.

Sandbox validation is disabled means generated code is not being executed in the sandbox before review. Enable sandbox validation before relying on generated code for broader teams.

Sandbox backend is unsupported means the configured backend is not one of the supported runtime backends. Use Docker or Kubernetes and configure the matching image and runtime settings.

Database sandbox backend is unsupported means database sandbox validation is enabled with a backend Qarion cannot run safely. Use the supported SQLite-backed database sandbox or disable database sandbox validation until the backend is ready.

Package fetch is disabled means validation cannot install generated external dependencies. Use qarion_only for private Qarion package repositories or qarion_plus_public when public-index access is allowed.

Package fetch is enabled but the Qarion base URL is missing means generated dependency installation cannot resolve private Qarion package endpoints. Set the validation package base URL before relying on private package dependencies.

Dependency smoke is disabled means dependencies may be selected but not import-smoke-tested before review. Enable it once package fetch and the sandbox runtime are ready.

Performance profiler warnings mean recent traces exceeded module load, CPU, or memory growth thresholds. Treat these as rollout signals and inspect the linked workflow before assuming the model response is the root cause.

Prompt/cache or memory freshness warnings mean recent authoring runs may be missing stable prompt layout, context freshness, or recovery state. Check the linked workflow runs before treating the issue as a model-quality problem.

Reliability trace warnings mean recent runs missed structured contract evidence, validation-tier evidence, checkpoint coverage, sandbox results, repair counters, or expected terminal status. Review the run trace and the replay/eval output before increasing rollout.

Embedding config is pending reindex means semantic vectors may still use an older provider, model, or dimension setting. Run product and standards reindex jobs before judging semantic search quality.

Embedding test fails means the selected provider cannot generate a vector from the current configuration. Check the provider, model name, dimensions, base URL, API key, local model runtime, and instance scope.

External tool provider discovery fails means Qarion could not start or query the configured MCP stdio server. Check argv, working directory, allowed executable, credential reference, environment keys, and provider logs.

A tool is not available in a space means the provider may be inactive, the tool was not discovered, the tool is disabled, the space enablement is off, the tool is outside the allowlist, or the tool risk level is not read-only.
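
You can work through the conditions in the last troubleshooting entry in order to explain why a tool is missing. This sketch uses hypothetical `Tool` and `SpaceEnablement` types. Following the V1 read-risk rule, it treats the risk label `"read"` as the only allowed value. The field names and risk labels that Qarion actually records may differ.

```python
from dataclasses import dataclass
from typing import FrozenSet, List, Optional


@dataclass(frozen=True)
class Tool:
    name: str
    discovered: bool
    enabled: bool
    risk: str  # hypothetical labels; only "read" is allowed in this sketch


@dataclass(frozen=True)
class SpaceEnablement:
    enabled: bool
    allowlist: Optional[FrozenSet[str]] = None  # None: no allowlist restriction


def unavailable_reasons(provider_active: bool, tool: Tool,
                        space: SpaceEnablement) -> List[str]:
    """List every reason the tool cannot be used in the space; empty means available."""
    reasons = []
    if not provider_active:
        reasons.append("provider is inactive")
    if not tool.discovered:
        reasons.append("tool was not discovered")
    if not tool.enabled:
        reasons.append("tool is disabled")
    if not space.enabled:
        reasons.append("space enablement is off")
    if space.allowlist is not None and tool.name not in space.allowlist:
        reasons.append("tool is outside the space allowlist")
    if tool.risk != "read":
        reasons.append(f"tool risk is {tool.risk!r}, not read-only")
    return reasons


search = Tool("search_tickets", discovered=True, enabled=True, risk="read")
space = SpaceEnablement(enabled=True, allowlist=frozenset({"get_ticket"}))
print(unavailable_reasons(True, search, space))
# ['tool is outside the space allowlist']
```

The function collects every failing condition rather than stopping at the first one. That way a single pass shows all the settings to fix.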

Related Guides

Pipeline Authoring for the user-facing review and validation flow.
Authoring Overview for shared workspace and execution controls.
Artifact Repositories for private package and OCI dependencies.
Notebook Workers for dedicated notebook runtime monitoring.
AI-Assisted Capabilities for the broader AI feature map.
