Field Note · RAG Security · March 2026

Keeping Enterprise Secrets Out of the Internet

On-premises RAG must stay current without leaking proprietary data. Here is how the architecture actually works — and why the answer is simpler than it looks.

A question we hear from every regulated-industry customer is some version of this: "If the AI is running on our servers, how do we know our queries aren't going out to the internet?" It is a legitimate concern, and the short answer is: you enforce it at the network layer, not the policy layer.

But that raises the next question immediately. If the query containers have no internet access, how does the system stay current? Regulations change. Case law evolves. Clinical guidelines are updated. An AI system frozen at ingestion time has a rapidly shrinking useful life in any compliance-sensitive domain.

The answer is a strict separation between two data flows that never touch each other.

Two Paths, One Wall Between Them

Every enterprise RAG deployment we build operates on this principle: the query path and the data acquisition path are architecturally isolated. They share a vector store, but they share nothing else — not network access, not runtime context, not pipeline code.

┌────────────────────────────────────────────────────┐
│ INGESTION PLATFORM (DMZ: no enterprise data here)  │
│ fetch approved sources → chunk → embed → scan      │
└──────────────────────────┬─────────────────────────┘
                           │ sanitized vectors only
┌──────────────────────────▼─────────────────────────┐
│ SHARED VECTOR STORE (on-premises)                  │
│ /construction_public      /healthcare_public       │
│ /construction_enterprise  /healthcare_enterprise   │
└──────┬──────────────┬──────────────┬───────────────┘
       │              │              │
      RAG            RAG            RAG    ← no internet access
 Construction    Healthcare        Legal

The ingestion platform is the only component with internet access. It runs in a DMZ, has no visibility into enterprise documents, and never sees a user query. It fetches from a fixed allowlist of approved public sources — regulatory bodies, standards organizations, public databases — on a schedule. Everything it produces passes through a content scanner before entering the vector store.
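The allowlist check can be sketched in a few lines. This is a minimal illustration, assuming the allowlist is a set of exact hostnames and that only HTTPS sources are accepted; the hostnames and the function name `is_approved_source` are hypothetical, not taken from a deployed configuration.

```python
from urllib.parse import urlsplit

# Hypothetical allowlist entries; real ones would come from reviewed configuration.
APPROVED_HOSTS = frozenset({"regulator.example", "standards.example"})


def is_approved_source(url, approved_hosts=APPROVED_HOSTS):
    """Return True only for HTTPS URLs whose host is exactly on the allowlist."""
    try:
        parts = urlsplit(url)
    except ValueError:
        # Malformed URLs are rejected rather than guessed at.
        return False
    if parts.scheme != "https":
        return False
    host = (parts.hostname or "").lower()
    # Exact match only: no suffix or wildcard matching, so a look-alike
    # such as "regulator.example.evil.test" is rejected.
    return host in approved_hosts
```

Matching on the parsed hostname rather than on a string prefix is the design choice that matters here: it keeps look-alike domains and userinfo tricks (`https://regulator.example@evil.test/`) from slipping through.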

The RAG query containers sit on an internal-only network. At the Docker level, they have no route to the internet. This is not an application setting that can drift over time; it is enforced by the network topology itself. A compromised RAG container cannot send data to the internet because no route out exists.
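One way to produce evidence of that isolation is a probe run from inside a query container. The sketch below is an assumption-laden illustration, not part of the described product: the probe targets and the function name `egress_is_blocked` are hypothetical, and the `connect` parameter exists only so the check can be exercised without a real network.

```python
import socket

# Hypothetical probe targets: one raw IP and one hostname (which also exercises DNS).
PROBE_TARGETS = [("1.1.1.1", 443), ("example.com", 443)]


def egress_is_blocked(targets=PROBE_TARGETS, timeout=2.0,
                      connect=socket.create_connection):
    """Return True if every outbound connection attempt fails."""
    for host, port in targets:
        try:
            conn = connect((host, port), timeout)
        except OSError:
            # OSError covers unreachable networks, DNS failures, and timeouts.
            continue
        conn.close()
        return False  # A connection succeeded, so isolation is broken.
    return True
```

A passing probe is supporting evidence only. The enforcement itself still lives in the network definition, and the probe simply confirms that the running container matches it.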

The Vector Store as the Safe Hand-Off Point

The shared vector store is where the two paths meet, and the design ensures the meeting is one-directional and safe. The ingestion platform writes vectors in. The RAG containers read vectors out. There is no channel through which query content can travel back toward the ingestion platform or out to the internet.

Each domain gets two collections: one for public knowledge (updated by the ingestion platform on schedule) and one for the customer's proprietary documents (populated during onboarding, never touched by external pipelines). When a user submits a query, the RAG container searches both collections, merges the results, and sends the combined context to the local LLM for inference. The user gets answers grounded in both current public information and their own internal documents.
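The dual-collection lookup can be sketched as follows. This is a toy in-memory version, assuming cosine similarity and a store shaped as a dict of collection name to `(text, vector)` pairs; the `Hit` type and the function names are illustrative, and a real vector store would expose its own search API.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Hit:
    collection: str
    text: str
    score: float


def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def search(store, collection, query_vec, k):
    """Top-k hits from a single collection, best score first."""
    hits = [Hit(collection, text, cosine(query_vec, vec))
            for text, vec in store[collection]]
    return sorted(hits, key=lambda h: h.score, reverse=True)[:k]


def retrieve(store, domain, query_vec, k=4):
    """Search the public and enterprise collections, then merge by score."""
    merged = []
    for suffix in ("public", "enterprise"):
        merged.extend(search(store, f"{domain}_{suffix}", query_vec, k))
    return sorted(merged, key=lambda h: h.score, reverse=True)[:k]
```

Keeping the collection name on each hit means the merged context can still be attributed to its origin when it is passed to the local LLM and when the query is logged.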

Compliance Tiers

Not every customer has the same posture. The architecture supports three configurations:

Tier                  Use Case
Air-Gapped            No internet access of any kind. Public knowledge updated via vendor data feeds or physical media. For classified and highest-sensitivity environments.
Scheduled Ingestion   Default enterprise posture. Query path fully isolated. Ingestion platform refreshes approved public sources on a configurable schedule. Appropriate for HIPAA, SOC 2, FedRAMP Moderate.
Controlled Real-Time  Query path remains isolated. Optional real-time search with mandatory query sanitization: proprietary context is stripped before any search query exits the perimeter. All egress is logged.
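For the Controlled Real-Time tier, the sanitization step might look like the sketch below. It assumes the simplest possible approach, a customer-maintained list of proprietary terms removed by whole-word match; the term list and the function name `sanitize_query` are hypothetical, and a production sanitizer would likely need more than term matching.

```python
import re


def sanitize_query(query, proprietary_terms):
    """Strip listed terms from a query; return (sanitized_query, removed_terms)."""
    removed = []
    sanitized = query
    # Longest terms first, so a short term cannot split a longer one.
    for term in sorted(proprietary_terms, key=len, reverse=True):
        pattern = re.compile(r"\b" + re.escape(term) + r"\b", re.IGNORECASE)
        if pattern.search(sanitized):
            removed.append(term)
            sanitized = pattern.sub(" ", sanitized)
    # Collapse the whitespace left behind by removals.
    return " ".join(sanitized.split()), removed
```

Returning the list of removed terms alongside the sanitized text gives the egress log something concrete to record for each outbound search.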

Scaling to Multiple Domains

One of the operational wins of this architecture is that adding a new knowledge domain — construction, healthcare, legal, HR, finance — does not require new ingestion infrastructure. The ingestion platform, embedding model, vector store, and LLM inference are all shared. A new domain is a configuration entry and a container.
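To make "a configuration entry" concrete, here is a minimal sketch of a domain registry. The `DomainConfig` type and `add_domain` helper are hypothetical names; the only detail taken from the architecture above is the `/<domain>_public` and `/<domain>_enterprise` collection naming.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DomainConfig:
    name: str
    public_sources: tuple = ()  # approved source URLs for this domain (illustrative)

    @property
    def public_collection(self):
        return f"/{self.name}_public"

    @property
    def enterprise_collection(self):
        return f"/{self.name}_enterprise"


# Hypothetical starting registry with two domains.
DOMAINS = {d.name: d for d in (DomainConfig("construction"), DomainConfig("healthcare"))}


def add_domain(registry, name, public_sources=()):
    """Return a new registry with one more domain; the input is not mutated."""
    if name in registry:
        raise ValueError(f"domain already registered: {name}")
    updated = dict(registry)
    updated[name] = DomainConfig(name, tuple(public_sources))
    return updated
```

Adding "legal" this way yields the collections `/legal_public` and `/legal_enterprise` without touching the ingestion platform, embedding model, or inference service.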

This matters for enterprise deployments where a single customer may need several specialized RAG instances. Each new domain's query container joins the same internal-only network, so the security model scales without adding any internet-facing attack surface per domain.

The Audit Trail

Every component produces structured, hash-chained audit logs. Ingestion events record the source URL, fetch timestamp, content hash, and destination collection. Query events record what was retrieved and from which collection. The chain of custody for every piece of data — from public source through embedding through retrieval through inference — is verifiable and tamper-evident.
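Hash chaining itself is a small mechanism. The sketch below assumes SHA-256 over the previous entry's hash plus a canonical JSON encoding of the event; the field names and the all-zero genesis value are illustrative choices, not the product's actual log format.

```python
import hashlib
import json

GENESIS = "0" * 64  # assumed starting value for an empty log


def _digest(prev_hash, event):
    # Canonical JSON (sorted keys, fixed separators) so equal events hash equally.
    payload = json.dumps(event, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256((prev_hash + payload).encode("utf-8")).hexdigest()


def append_event(log, event):
    """Append an event whose hash covers both the event and the previous hash."""
    prev_hash = log[-1]["hash"] if log else GENESIS
    entry = {"event": event, "prev_hash": prev_hash, "hash": _digest(prev_hash, event)}
    log.append(entry)
    return entry


def verify_chain(log):
    """Recompute every hash; any edited, removed, or reordered entry breaks the chain."""
    prev_hash = GENESIS
    for entry in log:
        if entry["prev_hash"] != prev_hash:
            return False
        if entry["hash"] != _digest(prev_hash, entry["event"]):
            return False
        prev_hash = entry["hash"]
    return True
```

A chain like this detects edits to past entries, but only if the most recent hash is also recorded somewhere the log writer cannot alter. Otherwise an attacker with write access could rebuild the whole chain.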

Enterprise query traffic is fully isolated from internet egress at the network layer. All external data enters through a single audited ingestion pipeline with configurable source allowlists, local embedding, and content scanning. Enterprise documents are stored in isolated collections with no shared pipeline. All system activity produces a hash-chained audit record suitable for HIPAA, SOC 2, and FedRAMP evidence requirements.

The goal of this architecture is to make the compliance story not just technically true, but demonstrably true — so that when an auditor asks how you know enterprise queries are not going to the internet, the answer is not a policy document. It is a network diagram and a log file.