Building an Ops Dashboard for 59 RAG Domains

At four domains, you can SSH in and check each one. At sixty, you can't.

We crossed that threshold this month. The Aspexilary RAG platform now runs 59 construction-domain RAG systems, each deployed as a Docker Compose stack with its own Qdrant vector database, FastAPI service, and tcpdump capture sidecars for compliance. That's 245 containers across 61 stacks on a single machine.

The platform dashboard exists because we needed to answer three questions without opening a terminal:

Is everything running?
How much capacity is left?
Which domain needs attention?

What the dashboard shows

The overview dashboard is a single HTML page backed by a Go binary that polls every service on the platform and returns JSON. No database. No time-series store. Just live system state, rendered on every page load.

System health

CPU, RAM, and GPU utilization, plus disk usage across both NVMe drives. The GPU panel matters because Ollama shares the RTX 5090 with training jobs: if VRAM usage crosses 20 GB, a fine-tune is running and the inference model may need to fall back to CPU or a smaller quantization.
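A minimal sketch of how the VRAM rule of thumb could be checked, assuming the collector shells out to `nvidia-smi --query-gpu=memory.used --format=csv,noheader,nounits` (which prints used memory in MiB) and treats 20 GB as 20 GiB. The function names and the sample reading are illustrative, not taken from the actual binary.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// fineTuneThresholdMiB mirrors the 20 GB rule of thumb (assumed here to mean 20 GiB).
const fineTuneThresholdMiB = 20 * 1024

// parseUsedMiB parses the first line of output from
// `nvidia-smi --query-gpu=memory.used --format=csv,noheader,nounits`,
// which reports used VRAM in MiB, one GPU per line.
func parseUsedMiB(out string) (int, error) {
	first := strings.SplitN(strings.TrimSpace(out), "\n", 2)[0]
	return strconv.Atoi(strings.TrimSpace(first))
}

// fineTuneLikely reports whether VRAM usage suggests a training job is running.
func fineTuneLikely(usedMiB int) bool {
	return usedMiB > fineTuneThresholdMiB
}

func main() {
	// Sample reading; a real collector would obtain this via os/exec.
	used, err := parseUsedMiB("23117\n")
	if err != nil {
		fmt.Println("parse error:", err)
		return
	}
	fmt.Printf("used=%d MiB fineTuneLikely=%v\n", used, fineTuneLikely(used))
}
```

Keeping the parse step separate from the threshold check makes both easy to test without a GPU present.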

Docker fleet

The platform-level view shows all 61 stacks with container counts and health status. When everything is green, you see one line: "All containers healthy." When something is wrong, only the problem containers surface — you don't have to scan through 245 rows to find the one that exited.
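The "only surface the problems" behavior can be sketched as a simple filter. This assumes a trimmed-down container record with the state and health strings Docker reports; the `Container` type, the stack names, and the output format are hypothetical.

```go
package main

import "fmt"

// Container is a hypothetical, trimmed-down view of one container.
type Container struct {
	Stack  string
	Name   string
	State  string // e.g. "running", "exited", "restarting"
	Health string // "healthy", "unhealthy", "starting", or "" when no healthcheck is defined
}

// problems returns only the containers that need attention.
func problems(all []Container) []Container {
	var bad []Container
	for _, c := range all {
		if c.State != "running" || c.Health == "unhealthy" {
			bad = append(bad, c)
		}
	}
	return bad
}

// summary renders either the one-line all-clear or one line per problem container.
func summary(all []Container) []string {
	bad := problems(all)
	if len(bad) == 0 {
		return []string{"All containers healthy."}
	}
	lines := make([]string, 0, len(bad))
	for _, c := range bad {
		lines = append(lines, fmt.Sprintf("%s/%s: state=%s health=%s", c.Stack, c.Name, c.State, c.Health))
	}
	return lines
}

func main() {
	fleet := []Container{
		{Stack: "domain-a", Name: "qdrant", State: "running", Health: "healthy"},
		{Stack: "domain-a", Name: "api", State: "exited"},
		{Stack: "domain-b", Name: "api", State: "running", Health: "unhealthy"},
	}
	for _, line := range summary(fleet) {
		fmt.Println(line)
	}
}
```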

Each domain stack runs 4 containers:

Qdrant — isolated vector database loaded from a pre-built snapshot
API — FastAPI service with domain-specific query routing
capture-qdrant — tcpdump sidecar monitoring Qdrant network traffic
capture-api — tcpdump sidecar monitoring API network traffic

The API containers reach the shared bare-metal Ollama and TEI embedder via host.docker.internal. No stack runs its own GPU containers, which keeps the footprint at roughly 4 containers per domain instead of 8.

Service logs

Filtered to RAG-relevant services only. Desktop noise (dbus, at-spi, bluetooth) is excluded at the backend level. Error counts use journalctl -p err..emerg, which filters on syslog priority rather than message text, so they reflect actual errors and not info-level messages that happen to contain the word "error."
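A rough sketch of the backend-side filtering and counting, under a few assumptions: noise is matched by unit-name prefix, the time window is one hour, and entries are counted from `-o json` output (one JSON object per line). The unit names and helper functions are hypothetical; the journalctl flags themselves are standard.

```go
package main

import (
	"fmt"
	"strings"
)

// noisePrefixes lists desktop units to drop. Prefix matching is an assumption;
// the real backend may filter differently.
var noisePrefixes = []string{"dbus", "at-spi", "bluetooth"}

// isNoise reports whether a unit belongs to the excluded desktop services.
func isNoise(unit string) bool {
	for _, p := range noisePrefixes {
		if strings.HasPrefix(unit, p) {
			return true
		}
	}
	return false
}

// errorCountArgs builds journalctl arguments that select entries with
// priority err through emerg for one unit. -q suppresses informational
// lines so that only entries appear in the output.
func errorCountArgs(unit, since string) []string {
	return []string{"-u", unit, "-p", "err..emerg", "--since", since, "-o", "json", "-q", "--no-pager"}
}

// countEntries counts non-empty lines, i.e. journal entries in -o json output.
func countEntries(out string) int {
	n := 0
	for _, line := range strings.Split(out, "\n") {
		if strings.TrimSpace(line) != "" {
			n++
		}
	}
	return n
}

func main() {
	units := []string{"rag-api.service", "bluetooth.service", "ollama.service"} // hypothetical names
	for _, u := range units {
		if isNoise(u) {
			continue
		}
		// A real collector would pass these args to exec.CommandContext("journalctl", ...).
		fmt.Printf("journalctl %q\n", errorCountArgs(u, "1 hour ago"))
	}
	sample := "{\"MESSAGE\":\"disk full\"}\n{\"MESSAGE\":\"oom\"}\n"
	fmt.Println("entries:", countEntries(sample))
}
```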

Qdrant collections

All 57+ collections with point counts, index status, and optimizer health. A yellow status means Qdrant is still optimizing the collection, typically building HNSW indexes. That is normal after a snapshot restore and concerning if it persists.
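For illustration, here is one way a collector might read a collection's status. It assumes the response shape of Qdrant's `GET /collections/{name}` endpoint and decodes only the `status` and `points_count` fields; the sample payload and the label wording are made up.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// collectionInfo holds the two fields this sketch reads from
// Qdrant's GET /collections/{name} response.
type collectionInfo struct {
	Result struct {
		Status      string `json:"status"`
		PointsCount int64  `json:"points_count"`
	} `json:"result"`
}

// parseCollection extracts the status color and point count from a response body.
func parseCollection(body []byte) (status string, points int64, err error) {
	var info collectionInfo
	if err = json.Unmarshal(body, &info); err != nil {
		return "", 0, err
	}
	return info.Result.Status, info.Result.PointsCount, nil
}

// describe maps a status color to a dashboard label (wording is illustrative).
func describe(status string) string {
	switch status {
	case "green":
		return "ready"
	case "yellow":
		return "optimizing (normal after a snapshot restore)"
	case "red":
		return "error"
	default:
		return "unknown"
	}
}

func main() {
	// Made-up sample payload, trimmed to the fields used above.
	body := []byte(`{"result":{"status":"yellow","points_count":18234},"status":"ok"}`)
	status, points, err := parseCollection(body)
	if err != nil {
		fmt.Println("decode error:", err)
		return
	}
	fmt.Printf("points=%d status=%s -> %s\n", points, status, describe(status))
}
```

Deciding whether yellow "persists" would need at least two observations, which a stateless dashboard leaves to whoever is watching the page.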

Ollama models

Every model available in Ollama, with size and last-modified date. The fine-tuned model (qwen35-tuned:latest) should always be present. If it's missing, someone ran ollama rm or the model directory got corrupted.
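A sketch of the presence check, assuming the model list comes from Ollama's `GET /api/tags` endpoint, whose response carries a `models` array with `name`, `size`, and `modified_at`. The sample payload and the helper name are illustrative.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// tagsResponse covers the fields this sketch reads from Ollama's GET /api/tags.
type tagsResponse struct {
	Models []struct {
		Name       string `json:"name"`
		Size       int64  `json:"size"`
		ModifiedAt string `json:"modified_at"`
	} `json:"models"`
}

// hasModel reports whether the named model appears in an /api/tags response body.
func hasModel(body []byte, name string) (bool, error) {
	var tags tagsResponse
	if err := json.Unmarshal(body, &tags); err != nil {
		return false, err
	}
	for _, m := range tags.Models {
		if m.Name == name {
			return true, nil
		}
	}
	return false, nil
}

func main() {
	// Made-up sample payload; sizes and dates are placeholders.
	body := []byte(`{"models":[
		{"name":"qwen35-tuned:latest","size":123,"modified_at":"2025-01-01T00:00:00Z"},
		{"name":"other:latest","size":456,"modified_at":"2025-01-02T00:00:00Z"}]}`)
	ok, err := hasModel(body, "qwen35-tuned:latest")
	if err != nil {
		fmt.Println("decode error:", err)
		return
	}
	fmt.Println("fine-tuned model present:", ok)
}
```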

Architecture

The dashboard is deliberately simple:

Backend: Single Go binary (127.0.0.1:8091), ~600 lines across 14 collectors
Frontend: Single HTML file with inline CSS and D3.js, served by nginx on :8093
No database: Every API call queries live system state. No historical data, no retention, no cleanup
No authentication: LAN-only via ZeroTier. Not exposed to the internet
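To make the "no database" point concrete, here is a minimal sketch of a handler that computes its JSON at request time and stores nothing. The `/api/overview` route and the payload fields are hypothetical, and the handler is exercised in-process with `httptest` so the example runs without opening a socket.

```go
package main

import (
	"encoding/json"
	"fmt"
	"net/http"
	"net/http/httptest"
	"runtime"
	"time"
)

// overview is a hypothetical payload; the real collectors would fill in
// system, Docker, Qdrant, and Ollama state here.
type overview struct {
	GeneratedAt time.Time `json:"generated_at"`
	Goroutines  int       `json:"goroutines"`
}

// overviewHandler builds the response from live state on every request.
func overviewHandler(w http.ResponseWriter, r *http.Request) {
	w.Header().Set("Content-Type", "application/json")
	w.Header().Set("Cache-Control", "no-store")
	json.NewEncoder(w).Encode(overview{
		GeneratedAt: time.Now().UTC(),
		Goroutines:  runtime.NumGoroutine(),
	})
}

// probe calls the handler in-process and returns status, content type, and body.
func probe() (int, string, string) {
	rec := httptest.NewRecorder()
	req := httptest.NewRequest(http.MethodGet, "/api/overview", nil)
	overviewHandler(rec, req)
	return rec.Code, rec.Header().Get("Content-Type"), rec.Body.String()
}

func main() {
	status, contentType, body := probe()
	fmt.Println(status, contentType)
	fmt.Print(body)
	// A real binary would bind to loopback instead, matching the address above:
	//   http.HandleFunc("/api/overview", overviewHandler)
	//   log.Fatal(http.ListenAndServe("127.0.0.1:8091", nil))
}
```

Binding to 127.0.0.1 means only the local nginx can reach the backend directly, which fits the decision to skip authentication.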

Each collector runs with a 3–6 second timeout. The Go binary spawns concurrent goroutines for service discovery (systemd unit enumeration, port scanning 8001–8099, Qdrant collection listing). Because that work runs in parallel rather than in sequence, total page load time stays under 2 seconds despite polling 59 services.

Why not Grafana

Grafana needs a data source behind it, typically Prometheus or InfluxDB. Prometheus in turn needs exporters on every service. That's a metrics pipeline to maintain on top of the RAG pipeline. For 59 domains on a single machine, the operational cost of a full observability stack exceeds the value.

The Go dashboard does one thing: show what's running right now. It has no history, no alerting, no retention. If we need historical trends, the hash-chained audit logs already capture every query with timestamps and latencies — the compliance infrastructure doubles as the analytics source.

The dashboard runs at localhost:8093 on omen-dev, accessible via ZeroTier at 192.168.195.92:8093. It refreshes every 30 seconds. The domain deep-dive — for drilling into individual RAG systems — is covered in the next post.