Last Updated: March 20, 2026
Most enterprise RAG pipelines have a gap the LLM layer never sees: the retrieval layer is wide open. RAG security access control is the set of mechanisms that govern who can retrieve what, prevent document poisoning, and produce audit records regulators can actually read. Without it, your language model may be compliant. Your data pipeline almost certainly is not.
Vendor research indicates that most enterprise RAG deployments ship without role-based access controls, audit trails, or permission-aware retrieval logic. In multi-tenant environments, multi-tenant RAG systems frequently expose cross-tenant data during retrieval. This post covers the four most critical RAG security controls: chunk-level access, prompt injection via poisoned documents, PII handling under GDPR and HIPAA, and audit logging requirements for production systems.
How does access control work in a RAG vector database?
RAG security access control in a vector database works by attaching permission metadata to each chunk at ingestion and filtering by those permissions at query time, before retrieval results are returned.
The critical detail is timing. Post-retrieval filtering is itself a data leak. If your system retrieves 40 documents and then discards 30 because the user lacks permission, those 30 were still scanned. Filter at the retrieval call, not after. Pinecone uses namespace-scoped API keys. Weaviate supports tenant-aware classes with dedicated shards. Qdrant uses named collections and tenant-level sharding. Milvus supports collections and partitions but needs the platform team to wire up access controls manually in self-hosted deployments.
Document-level access is not enough for regulated content. A single contract or medical record can contain sections with different permission levels. Chunk-level policies, where access is set for each vector independently, are the right approach when sensitivity varies within documents. Zilliz has published guidance on per-user, per-chunk access policies for this use case.
What is prompt injection via poisoned documents in RAG?
Indirect prompt injection in RAG is an attack where hidden instructions are embedded inside a corpus document. When the pipeline retrieves that document, the instructions execute inside the model’s context.
This is distinct from direct prompt injection at the user input layer. The payload is injected once, asynchronously, and activates for every future user whose query triggers that document’s retrieval. Research at USENIX Security 2025 (PoisonedRAG) found five crafted documents among millions can achieve a 90% attack success rate in controlled conditions. OWASP’s 2025 Top 10 for LLM Applications formalised this as LLM08:2025 Vector and Embedding Weaknesses. Mitigations include input validation before embedding, content scanning, and preventing retrieved text from overriding system prompt instructions. Amazon Bedrock Knowledge Bases and LlamaIndex both publish guidance on corpus integrity controls.
How does GDPR right to deletion apply to RAG embeddings?
Under GDPR Article 17, deleting the source document is not sufficient. All corresponding embeddings and chunk-level vectors derived from that document must also be deleted from the vector store.
No major provider currently guarantees atomic deletion of all derived embeddings across every replica in a single API call. A complete GDPR erasure request covers: source document deletion, vector deletion by document ID across all namespaces, index rebuilds where soft-deletion is in use, cache invalidation, and written confirmation that no stale vectors survive. AWS machine learning research has confirmed that embeddings reveal nearly as much as the underlying raw text, so they need the same security controls as source data: encryption at rest, access controls, and deletion tracking.
What does HIPAA require for PHI in a RAG system?
HIPAA’s Minimum Necessary Standard requires that a healthcare RAG system access only the PHI essential for the specific task, not the full patient database. PHI must be de-identified before chunking and embedding.
De-identification should happen before the document enters the chunking pipeline. Named Entity Recognition performs better on full documents than on fragments. HIPAA recognises two methods: Safe Harbor (remove all 18 specified identifiers) and Expert Determination (statistically documented low re-identification risk). Three tools handle RAG pipeline PII redaction in practice: Microsoft Presidio (open source, NER-based, integrates with LlamaIndex), Amazon Comprehend (managed, replaces detected entities with typed placeholders), and Tonic Textual (strips PII before vectorisation with Pinecone). A Business Associate Agreement is mandatory when any AI service creates, receives, or transmits PHI on behalf of a covered entity. This includes the vector database provider.
What should a RAG system audit log capture?
A production RAG audit log must record one discrete access entry per document retrieved, not one per user session. HIPAA, GDPR, and SOX treat each document retrieval as a distinct regulated access event.
Each entry needs: authenticated user identity, AI system identity, document IDs retrieved with sensitivity classification, permission check result, timestamp, and originating endpoint. Audit events must feed a SIEM in real time. Batched logs don’t satisfy continuous monitoring under NIST AI RMF or SOC 2 Type II. No major vector database, including Pinecone, Weaviate, Qdrant, or Milvus, ships production-grade per-query audit logging natively. Build it at the application layer. Privacera and Rubrik Annapurna offer data access governance layers that can apply to RAG pipelines.
How do major vector databases compare on security features?
Pinecone, managed Weaviate, and managed Qdrant are turnkey for compliance in cloud deployments. Milvus gives more control but puts security implementation on the platform team.
| Feature | Pinecone | Weaviate | Qdrant | Milvus |
|---|---|---|---|---|
| RBAC | Yes (per-index API keys) | Yes | Yes | Yes (self-managed) |
| Tenant isolation | Logical namespaces | Tenant classes + dedicated shards | Named collections + sharding | Collections + partitions |
| SOC 2 Type II | Yes (managed) | Yes (managed) | Yes (cloud) | Requires self-setup |
| HIPAA attestation | Yes | Yes (AWS, 2025) | HIPAA-ready (enterprise) | Requires self-setup |
| GDPR alignment | Yes | Yes | Yes | Yes (with config) |
| Private / BYOC deployment | Yes (AWS/Azure/GCP) | Yes | Yes | Yes (self-hosted) |
| Encryption at rest | Yes | Yes | Yes | Yes |
| Native per-query audit logs | Limited (platform logs) | Limited (platform logs) | Limited (platform logs) | Limited (platform logs) |
Per-query audit logging is the one gap all four share. Every production RAG system needs it, and every team has to build it at the application layer.
What to do next
Audit your retrieval layer first. Check whether access decisions happen before or after the vector search call. Confirm chunk-level metadata filters are in place. Verify your ingestion pipeline runs PII redaction tools like Microsoft Presidio or Amazon Comprehend before documents enter the index. If you work in a regulated industry, confirm your vector database provider has signed a BAA (for HIPAA) and that you have a documented deletion workflow for GDPR erasure requests.
Read next: Retrieval-Augmented Generation (RAG) for Enterprise AI Systems