Every domain is a sealed, four-container stack. Nothing needs to be configured, built, or downloaded after delivery. The vector database arrives pre-loaded with embedded regulatory documents.
Deployment Model
Your infrastructure.
Our intelligence.
Every domain RAG system ships as a self-contained Docker Compose stack with a pre-built vector database. No ingestion pipeline. No GPU. No cloud dependency. Pull, start, query — under 60 seconds.
What Ships
Three Commands
From delivery to a live, queryable API. Any machine with Docker installed. No internet required after initial pull.
# Pull the domain stack docker compose pull # Start everything docker compose up -d # Query immediately curl localhost:8001/query \ -H "Content-Type: application/json" \ -d '{"question": "What are the NFPA 855 requirements for BESS installations?"}'
Air-Gapped by Design
Every domain runs on its own internal Docker network with no internet egress. This isn't a software filter — it's enforced at the kernel network layer. A compromised container in one domain has no network path to another domain's data or to the outside world.
# Each domain is a sealed unit aspexilary_fuel_internal (internal: true, no egress) rag-api ←→ qdrant rag-api ←→ ollama rag-api ←→ embedder aspexilary_parking_internal (internal: true, no egress) rag-api ←→ qdrant rag-api ←→ ollama rag-api ←→ embedder Deploy one, deploy ten — they don't interfere.
Why Pre-Built Snapshots
The alternative is shipping raw documents and an ingestion pipeline. That means your team needs to:
| Aspexilary | DIY Pipeline | |
|---|---|---|
| Time to first query | Under 60 seconds | 4–8 hours |
| GPU required | No | Recommended |
| Embedding pipeline | Pre-built snapshot | Run yourself |
| Document processing | Already done | PDF extraction + chunking |
| HNSW index build | Pre-indexed | Runs at startup |
| Debug failures | Tested before delivery | Your problem |
| Retrieval quality | Validated at build | Unknown until tested |
What's Included
- Docker Compose stack — four containers, one command to start. Pre-configured networking, health checks, and restart policies.
- Pre-built Qdrant snapshot — your domain's regulatory corpus, extracted, chunked, embedded, and HNSW-indexed. Loads on startup.
- Fine-tuned LLM — quantized to Q4_K_M for CPU inference. Trained on domain-specific regulatory language, not generic web text.
- Web UI — domain-branded query interface with conversation history. Drop-in ready, no frontend build step.
- Network isolation diagram — Graphviz render of actual Docker network state. Machine-generated, not hand-drawn.
- Egress proof — tcpdump capture proving zero external traffic during query processing. SHA-256 hashed and appended to audit log.
- Hash-chained audit trail — every query logged with SHA-256 chain. Tampering breaks the chain. Suitable for HIPAA AU, SOC 2 CC7, FedRAMP AU-2/AU-9.
System Requirements
Every domain runs on CPU out of the box. Add a GPU for faster inference. Here are real-world specs based on our production environment.
| Component | CPU Mode | GPU Mode |
|---|---|---|
| LLM (Q4_K_M quantized) | 5.6 GB system RAM | 5.6 GB VRAM |
| LLM (FP16 full precision) | 18 GB system RAM | 18 GB VRAM |
| BGE-Large embedder | ~800 MB RAM | ~1.8 GB VRAM |
| Qdrant (per domain) | 25–130 MB RAM | 25–130 MB RAM |
| Disk (per domain) | 8–15 GB | 8–15 GB |
| Docker Engine | 24+ | 24+ with NVIDIA Container Toolkit |
| Internet | Not required after pull | Not required after pull |
Multi-domain deployments share the LLM and embedder across all domains — only Qdrant and the API container are per-domain. Running 10 domains adds ~300 MB RAM total, not 10x the model weight. Our production environment runs 608 domains on a single RTX 5090 (32 GB VRAM) with an on-demand gateway that starts services as needed and stops them after 10 minutes of idle time.
Questions about deployment
info@aspexilary.ai