The practical stack I use for enterprise RAG: retrieval quality, observability, eval loops, and guardrails.
March 28, 2026 · 2 min read · RAG
Why most RAG demos fail in production
Most demos work because the data is tiny, clean, and manually curated. Real enterprise data is the opposite: noisy documents, broken formatting, conflicting versions, missing metadata, and access rules that change by team. The goal is not “it answers once”. The goal is consistent, explainable answers under latency, cost, and governance constraints.
Reference architecture
A production RAG system usually needs more than a vector database and a prompt. My baseline architecture has six layers:
Source registry — what sources are allowed, who owns them, how often they refresh.
Ingestion pipeline — parsers, OCR when needed, metadata normalization, and versioned exports.
Chunking strategy — chunk boundaries tuned by document type, not one global magic number.
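As a rough illustration of chunking tuned by document type, here is a minimal Python sketch. The document type names, the size and overlap values, and the `ChunkPolicy` and `chunk_text` helpers are all hypothetical placeholders, not values from a real deployment. Splitting on raw character counts is a simplification; in practice boundaries would more likely follow headings, sections, or message turns.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ChunkPolicy:
    max_chars: int  # upper bound on chunk length, in characters
    overlap: int    # characters shared between consecutive chunks


# Hypothetical per-type policies; real values should come from retrieval evals.
POLICIES = {
    "policy_pdf": ChunkPolicy(max_chars=1200, overlap=150),
    "wiki_page": ChunkPolicy(max_chars=800, overlap=100),
    "support_ticket": ChunkPolicy(max_chars=400, overlap=0),
}

# Fallback for document types with no tuned policy yet (also an assumption).
DEFAULT_POLICY = ChunkPolicy(max_chars=600, overlap=50)


def chunk_text(text: str, doc_type: str) -> list[str]:
    """Split text into overlapping windows using the policy for doc_type."""
    policy = POLICIES.get(doc_type, DEFAULT_POLICY)
    step = policy.max_chars - policy.overlap
    chunks: list[str] = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + policy.max_chars])
        # Stop once this window has reached the end of the text.
        if start + policy.max_chars >= len(text):
            break
    return chunks
```

Calling `chunk_text(body, "support_ticket")` and `chunk_text(body, "wiki_page")` on the same text yields different chunk counts and overlaps, which is the point: the policy table, not a single global constant, decides the boundaries.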