Drilling Into a Single RAG Domain

The platform dashboard answers "is everything running?" The domain deep-dive answers "how well is this specific domain performing?"

Pick any of the 57 Qdrant collections from a dropdown. The page fetches live data from the Qdrant API, the domain's FastAPI service, the hash-chained audit log, and Docker, and renders it in a single view.

What you see per domain

Collection health

Point count, indexed vectors, segment count, optimizer status, vector dimensions, distance metric. This is pulled directly from the Qdrant /collections/{name} API. A green status means HNSW indexing is complete. An "optimizing" status (reported as yellow in Qdrant's own status field) means the index is still being built: queries will work, but retrieval may be slower.

For a domain like Airport Fuel Station with 13,705 points across 41 regulatory documents, the collection detail confirms every point is indexed and the optimizer is idle.
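
As a rough illustration of that check, the sketch below reads the fields named above out of a /collections/{name} response. The function names and the localhost base URL are ours, not taken from the dashboard code, and it assumes a collection with a single unnamed vector.

```python
import json
import urllib.request


def fetch_collection(name: str, base_url: str = "http://localhost:6333") -> dict:
    """GET /collections/{name} and return the 'result' object (base_url is an assumption)."""
    with urllib.request.urlopen(f"{base_url}/collections/{name}", timeout=5) as resp:
        return json.load(resp)["result"]


def summarize_collection(result: dict) -> dict:
    """Reduce a Qdrant collection-info result to the fields shown on the page."""
    # Assumes one unnamed vector; named-vector collections nest params per vector name.
    vectors = result["config"]["params"]["vectors"]
    points = result.get("points_count") or 0
    indexed = result.get("indexed_vectors_count") or 0
    return {
        "status": result["status"],
        "optimizer_ok": result.get("optimizer_status") == "ok",
        "points": points,
        "indexed_vectors": indexed,
        "segments": result.get("segments_count"),
        "dimensions": vectors.get("size"),
        "distance": vectors.get("distance"),
        # Counts reported by Qdrant can be approximate, so treat this as a hint.
        "fully_indexed": result["status"] == "green" and indexed >= points,
    }
```

Calling summarize_collection(fetch_collection(name)) for a healthy collection should report status green with indexed_vectors equal to points.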

Service health

Four indicators: API, Qdrant, Ollama, Embedder. Each is checked by hitting the domain's /health endpoint, which in turn pings its own dependencies. A red Embedder means the shared TEI service at :8085 is down — every domain will show this simultaneously since they all share the same embedder.
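
The shared-embedder reasoning can be sketched as a small check across domains. This is illustrative only: the shape of the /health response is not documented here, so the lowercase indicator keys and the boolean values are assumptions.

```python
# Indicator names follow the four listed above; the key spelling is an assumption.
INDICATORS = ("api", "qdrant", "ollama", "embedder")


def shared_outages(health_by_domain: dict) -> list:
    """Return indicators that are down in every domain, which suggests a shared dependency."""
    if not health_by_domain:
        return []
    return [
        name
        for name in INDICATORS
        if all(not health.get(name, False) for health in health_by_domain.values())
    ]
```

If the result contains "embedder", the shared TEI service is the first thing to look at, rather than any single domain.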

Query statistics

Parsed from /var/log/aspexilary/queries.jsonl — the hash-chained compliance log. Filtered by domain. Shows:

Total queries and counts for the last 24 hours and 7 days
Average latency broken into retrieval time (embedding + Qdrant search) and LLM time (Ollama inference)
P95 latency — the number that matters for SLA conversations
Low confidence / no answer / Brave fallback counts — these are the queries where the corpus didn't have the answer
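
A minimal sketch of how these numbers could be derived from the JSONL log follows. The field names (domain, ts, retrieval_ms, llm_ms, outcome) are hypothetical, since the log schema is not shown here, and P95 is computed with the nearest-rank method.

```python
import json
from collections import Counter
from datetime import datetime, timedelta


def domain_stats(lines, domain, now):
    """Summarize one domain's entries from an iterable of JSONL lines.

    `now` must be timezone-aware if the logged timestamps are.
    """
    entries = []
    for line in lines:
        if line.strip():
            entry = json.loads(line)
            if entry.get("domain") == domain:  # hypothetical field name
                entries.append(entry)

    def count_since(delta):
        cutoff = now - delta
        return sum(1 for e in entries if datetime.fromisoformat(e["ts"]) >= cutoff)

    n = len(entries)
    totals = sorted(e["retrieval_ms"] + e["llm_ms"] for e in entries)
    rank = (95 * n + 99) // 100  # nearest-rank P95: ceil(0.95 * n) in integer math
    return {
        "total": n,
        "last_24h": count_since(timedelta(hours=24)),
        "last_7d": count_since(timedelta(days=7)),
        "avg_retrieval_ms": sum(e["retrieval_ms"] for e in entries) / n if n else None,
        "avg_llm_ms": sum(e["llm_ms"] for e in entries) / n if n else None,
        "p95_total_ms": totals[rank - 1] if n else None,
        # e.g. "low_confidence", "no_answer", "brave_fallback" (assumed labels)
        "outcomes": Counter(e.get("outcome", "answered") for e in entries),
    }
```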

The chain integrity field verifies the prev_hash linkage across all entries for this domain. "Verified" means no entries have been modified or deleted.
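
To make the idea concrete, here is one way a prev_hash chain can be verified. The hashing scheme is an assumption (SHA-256 over the canonical JSON of each entry minus its own hash field, with an all-zero genesis value); the real log may hash differently, and this sketch walks a whole chain rather than a per-domain slice.

```python
import hashlib
import json

GENESIS = "0" * 64  # assumed prev_hash of the very first entry


def entry_hash(entry: dict) -> str:
    """Hash every field except 'hash' itself, using a canonical JSON encoding."""
    body = {k: v for k, v in entry.items() if k != "hash"}
    canonical = json.dumps(body, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def verify_chain(entries):
    """Return None if the chain is intact, else the index of the first broken entry."""
    prev = GENESIS
    for index, entry in enumerate(entries):
        # A modified entry fails the recomputed hash; a deleted one breaks prev_hash.
        if entry.get("prev_hash") != prev or entry.get("hash") != entry_hash(entry):
            return index
        prev = entry["hash"]
    return None
```

Because each hash covers the previous hash, editing or removing any entry invalidates every link after it, which is what makes a "Verified" result meaningful.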

Docker containers

Every container belonging to this domain's Docker stack, with image, state, and uptime. A typical domain shows 4 containers: Qdrant, API, and two capture sidecars. If a capture sidecar is restarting, it usually means it couldn't install tcpdump — the Alpine container needs network access during its first apk add.
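
A hedged sketch of the container lookup: it assumes each domain's stack is a Docker Compose project, so containers can be filtered by the com.docker.compose.project label. The function names and the project-name convention are ours.

```python
import json
import subprocess


def parse_docker_ps(output: str) -> list:
    """Parse `docker ps --format '{{json .}}'` output (one JSON object per line)."""
    rows = []
    for line in output.splitlines():
        if line.strip():
            c = json.loads(line)
            rows.append({
                "name": c["Names"],
                "image": c["Image"],
                "state": c["State"],    # e.g. "running", "restarting"
                "uptime": c["Status"],  # human-readable, e.g. "Up 3 days"
            })
    return rows


def list_stack_containers(project: str) -> list:
    """List all containers of one Compose project (project name is an assumption)."""
    result = subprocess.run(
        ["docker", "ps", "-a",
         "--filter", f"label=com.docker.compose.project={project}",
         "--format", "{{json .}}"],
        capture_output=True, text=True, check=True, timeout=8,
    )
    return parse_docker_ps(result.stdout)


def restarting(rows: list) -> list:
    """Names of containers stuck in a restart loop, such as a failing capture sidecar."""
    return [r["name"] for r in rows if r["state"] == "restarting"]
```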

Latency distribution

A bucketed bar chart showing how many queries fall into each latency band: under 5 seconds, 5–15s, 15–30s, 30–60s, over 60 seconds. For construction-domain queries against a local Qwen 3.5 model, most land in the 5–15 second range. Anything over 30 seconds usually means Ollama was loading the model from disk (first query after a cold start).
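
The bucketing itself is simple; a minimal sketch, assuming latencies in seconds and treating each band's lower edge as inclusive (the page's exact boundary handling is not stated):

```python
from bisect import bisect_right

EDGES = [5, 15, 30, 60]  # band boundaries in seconds, from the list above
LABELS = ["<5s", "5-15s", "15-30s", "30-60s", "60s+"]


def bucket_latencies(seconds) -> dict:
    """Count how many latencies fall into each band."""
    counts = [0] * len(LABELS)
    for value in seconds:
        # bisect_right puts a value equal to an edge into the higher band.
        counts[bisect_right(EDGES, value)] += 1
    return dict(zip(LABELS, counts))
```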

Latency over time

A stacked area chart showing retrieval time (green), LLM inference time (purple), and total time (blue) for the last 200 queries. The retrieval component should be roughly constant — it's an embedding call plus a vector search. The LLM component varies with response length and model load.

This chart reveals patterns: if retrieval time spikes, the Qdrant collection may need reindexing. If LLM time spikes, the GPU is contended (likely a training job). If both spike, the machine is overloaded.
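
The same reading of the chart can be written as a rule of thumb. This is a sketch, not something the dashboard does: the window size and the 2x threshold are arbitrary assumptions, and a spike is defined here as the recent median exceeding a multiple of the earlier median.

```python
from statistics import median


def diagnose(retrieval_ms, llm_ms, window=20, factor=2.0) -> str:
    """Compare the last `window` queries against the earlier ones for each component."""

    def spiking(series) -> bool:
        if len(series) <= window:
            return False  # not enough history to form a baseline
        baseline = median(series[:-window])
        recent = median(series[-window:])
        return baseline > 0 and recent > factor * baseline

    retrieval_spike = spiking(list(retrieval_ms))
    llm_spike = spiking(list(llm_ms))
    if retrieval_spike and llm_spike:
        return "machine overloaded"
    if retrieval_spike:
        return "check Qdrant index"
    if llm_spike:
        return "GPU contention"
    return "normal"
```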

Query report

Pulled live from the domain's /admin/query-report endpoint (localhost-only). Shows 7-day query analysis including intent distribution (how many queries were routed to internal retrieval vs. hybrid with Brave search) and corpus gap queries — the specific questions where the retrieval score was low enough to trigger a fallback.

Corpus gaps are the most actionable metric on the page. Each gap query points to a document the domain should have but doesn't. For a fuel station domain, a gap query about "jet fuel temperature compensation" means we need to ingest the API MPMS Chapter 11 petroleum measurement standard.

Document coverage

The top 20 source documents by chunk count, with a proportional bar. Built by scrolling through up to 2,000 Qdrant points and aggregating the source payload field, so for collections larger than 2,000 points the counts reflect a sample rather than the full corpus. Shows which regulatory documents dominate the collection and which have minimal coverage.

For Airport Fuel Station, the top sources are CFR Title 40 Vol 24 (EPA SPCC regulations, 436 chunks), AC 150/5370-10H (FAA construction standards, 311 chunks), and the EPA SPCC Guidance (230 chunks). A document with only 3–5 chunks may have been poorly extracted and needs re-processing.
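
A sketch of the aggregation described above, using Qdrant's scroll endpoint (POST /collections/{name}/points/scroll). The page size, helper names, and base URL are assumptions; the 2,000-point cap and the source payload field come from the description.

```python
import json
import urllib.request
from collections import Counter


def qdrant_scroller(name: str, base_url: str = "http://localhost:6333"):
    """Return a function that POSTs one scroll request and returns its 'result'."""
    def fetch_page(body: dict) -> dict:
        req = urllib.request.Request(
            f"{base_url}/collections/{name}/points/scroll",
            data=json.dumps(body).encode("utf-8"),
            headers={"Content-Type": "application/json"},
            method="POST",
        )
        with urllib.request.urlopen(req, timeout=5) as resp:
            return json.load(resp)["result"]
    return fetch_page


def count_sources(fetch_page, max_points: int = 2000, page_size: int = 250) -> list:
    """Scroll up to max_points points and return the top 20 (source, chunk count) pairs."""
    counts = Counter()
    seen = 0
    offset = None
    while seen < max_points:
        body = {
            "limit": min(page_size, max_points - seen),
            "with_payload": ["source"],  # fetch only the field being aggregated
            "with_vector": False,
        }
        if offset is not None:
            body["offset"] = offset
        result = fetch_page(body)
        points = result["points"]
        for point in points:
            counts[(point.get("payload") or {}).get("source", "(unknown)")] += 1
        seen += len(points)
        offset = result.get("next_page_offset")
        if offset is None or not points:
            break  # end of collection
    return counts.most_common(20)
```

Passing the fetcher in as a parameter keeps the counting logic testable without a running Qdrant instance: count_sources(qdrant_scroller(name)) would hit the live API.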

How the data flows

Browser → nginx :8093 → /api/domain/{name} → Go backend :8091
                                                ├── Qdrant :6333 (collection detail + point scroll)
                                                ├── FastAPI :{port}/health (service health)
                                                ├── FastAPI :{port}/admin/query-report (7d stats)
                                                ├── /var/log/aspexilary/queries.jsonl (audit log parse)
                                                └── docker ps (container status)

The Go backend runs all five lookups shown above in parallel with an 8-second timeout. Service discovery uses the same systemd enumeration and port scanning as the platform dashboard, with no hardcoded port mappings.
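
The fan-out pattern looks roughly like this. The actual backend is Go; this Python sketch only illustrates the idea, and it assumes the 8-second timeout applies per source and that a failed source should degrade to an error entry rather than fail the whole page.

```python
import asyncio


async def gather_sources(fetchers: dict, timeout_s: float = 8.0) -> dict:
    """Run every fetcher concurrently; each one gets its own timeout.

    `fetchers` maps a source name to a zero-argument coroutine function.
    """

    async def guarded(name, fetch):
        try:
            data = await asyncio.wait_for(fetch(), timeout_s)
            return name, {"ok": True, "data": data}
        except asyncio.TimeoutError:
            return name, {"ok": False, "error": "timeout"}
        except Exception as exc:  # one failing source should not blank the page
            return name, {"ok": False, "error": str(exc)}

    pairs = await asyncio.gather(*(guarded(n, f) for n, f in fetchers.items()))
    return dict(pairs)
```

With this shape, a slow /admin/query-report call shows up as a timeout in its own panel while the Qdrant and Docker panels still render.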

What this replaces

The alternative is five separate terminals: curl Qdrant, journalctl the service, docker ps | grep, parse the audit log with jq, and read the query report. For a single domain that's manageable. For 59 domains being evaluated by a non-technical stakeholder, it's not.

The deep-dive page puts every signal for a single domain on one screen. The platform dashboard puts every domain's top-line status on one screen. Together, they replace the terminal for day-to-day operations monitoring — without introducing a metrics pipeline.

The domain deep-dive is at localhost:8093/domain.html on omen-dev, or 192.168.195.92:8093/domain.html via ZeroTier. Select any collection from the dropdown and the page loads in under 2 seconds.