In practice
It lets an LLM answer from company documents, internal knowledge bases, or up-to-date articles without any additional training. It reduces hallucinations on domain-specific questions. It also keeps knowledge current, because you update the document set instead of retraining the model. It is the first architecture to consider for an enterprise chatbot.
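The retrieve-then-generate loop can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not a reference implementation:

- A toy bag-of-words cosine similarity stands in for a real embedding model.
- A plain in-memory list stands in for a vector database.
- The sketch stops once the augmented prompt is built, so no actual LLM call is shown.
- `SimpleIndex`, `build_prompt`, and the sample documents are hypothetical.

```python
import math
import re
from collections import Counter


def vectorize(text: str) -> Counter:
    # Toy bag-of-words term counts; an assumption standing in for an embedding model.
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    # Cosine similarity between two sparse term-count vectors.
    dot = sum(count * b[term] for term, count in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


class SimpleIndex:
    """Hypothetical in-memory document index (no real vector database)."""

    def __init__(self) -> None:
        self.docs: list[str] = []
        self.vectors: list[Counter] = []

    def add(self, doc: str) -> None:
        # New knowledge is retrievable immediately; the model itself is never retrained.
        self.docs.append(doc)
        self.vectors.append(vectorize(doc))

    def search(self, query: str, k: int = 2) -> list[str]:
        # Rank every stored document by similarity to the query and keep the top k.
        q = vectorize(query)
        ranked = sorted(
            range(len(self.docs)),
            key=lambda i: cosine(q, self.vectors[i]),
            reverse=True,
        )
        return [self.docs[i] for i in ranked[:k]]


def build_prompt(question: str, passages: list[str]) -> str:
    # Augment the question with numbered passages so the model can cite its sources.
    context = "\n".join(f"[{i + 1}] {p}" for i, p in enumerate(passages))
    return (
        "Answer using only the context below and cite sources by number.\n\n"
        f"Context:\n{context}\n\nQuestion: {question}\nAnswer:"
    )


index = SimpleIndex()
index.add("The VPN requires the corporate certificate installed on each laptop.")
index.add("Expense reports are due on the fifth business day of each month.")
index.add("The cafeteria is open from 8am to 3pm on weekdays.")

question = "When are expense reports due?"
passages = index.search(question, k=1)
prompt = build_prompt(question, passages)
print(prompt)  # this string would then be sent to the LLM of your choice
```

In a real deployment, `vectorize` would typically be replaced by an embedding model. `SimpleIndex` would be replaced by a vector store such as the embedded ChromaDB mentioned in one of the entries below. The overall shape stays the same: retrieve relevant passages, then generate an answer grounded in them.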
Seen in the wild
20 entries mentioning it
- [Medium] Cohere Command A: the foundation model that runs on-prem on 2 GPUs
- [Medium] KoboldCpp v1.84: native RAG with embedded ChromaDB, no separate servers
- [Medium] Oracle OCI Generative AI: Llama 3.1, dedicated clusters, and RAG with Oracle Database 23ai
- [High] AnythingLLM 1.0: the complete local RAG stack for enterprise use
- [Medium] Dify 0.7: visual agentic workflows with integrated RAG and 10+ LLMs
- [Medium] TabbyML: open-source GitHub Copilot alternative with self-hosted codebase RAG
- [Medium] KoboldCpp adds integrated RAG: offline all-in-one LLM with documents and character AI
- [High] Copilot+ PC and Recall: Microsoft tries 'infinite PC memory', privacy backlash erupts
- [Medium] Notion AI Q&A: answers across the entire enterprise workspace with source citation
- [Medium] Cohere Command R+: an enterprise-focused model built for RAG and tool use
- [High] Automatic Prefix Caching in vLLM: Shared KV Cache Across Requests for Near-Zero TTFT
- [Medium] Box AI: questions and summaries on enterprise documents with page citations
- [High] Indirect Prompt Injection: the attack vector in RAG systems and AI agents
- [High] Open WebUI: ChatGPT-style web interface for Ollama with multi-user and history
- [Medium] LlamaIndex 0.10 stable: the standard RAG framework for local LLMs
- [Medium] AnythingLLM: full local RAG with web UI and embedded vector DB
- [Medium] SuperAGI: the first open-source autonomous agent platform with a GUI
- [High] privateGPT: chat with your documents, completely offline
- [High] RETRO: DeepMind foreshadows RAG with retrieval over 2 trillion tokens
- [Landmark] RAG: Retrieval-Augmented Generation enters the literature