Inference · Beginner

Also known as: Retrieval-Augmented Generation · Generazione aumentata da recupero (Italian)

RAG

/rag/

A technique that retrieves relevant text from an external data source and inserts it into the model's prompt before the model generates its response.
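The retrieve-then-insert step can be sketched in a few lines. This is a minimal illustration under stated assumptions: bag-of-words cosine similarity stands in for the embedding model and vector database a real system would use, the documents are made up, and the call to the LLM itself is omitted. The helper names `retrieve` and `build_prompt` are hypothetical.

```python
import math
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    # Lowercase and split on anything that is not a letter or digit.
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine(a: Counter, b: Counter) -> float:
    # Cosine similarity between two bag-of-words count vectors.
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def retrieve(query: str, docs: list[str], k: int = 2) -> list[str]:
    # Score every document against the query and keep the k best.
    # (Assumption: stands in for an embedding model plus a vector index.)
    q = Counter(tokenize(query))
    ranked = sorted(docs, key=lambda d: cosine(q, Counter(tokenize(d))), reverse=True)
    return ranked[:k]


def build_prompt(query: str, passages: list[str]) -> str:
    # Insert the retrieved passages into the prompt ahead of the question.
    context = "\n".join(f"- {p}" for p in passages)
    return (
        "Answer using only the context below.\n"
        f"Context:\n{context}\n\n"
        f"Question: {query}\nAnswer:"
    )


docs = [
    "The VPN client must be updated to version 5 before March.",
    "Expense reports are due on the last Friday of each month.",
    "The cafeteria is closed on public holidays.",
]
query = "When are expense reports due?"
prompt = build_prompt(query, retrieve(query, docs, k=1))
print(prompt)  # this string is what would be sent to the LLM
```

In production the scoring function would typically be replaced by dense embeddings and an approximate nearest-neighbor search; the shape of the pipeline (score, keep the top k, splice into the prompt) stays the same.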


In practice

It lets an LLM answer from company documents, internal knowledge bases, or up-to-date articles without any additional training. Grounding answers in retrieved text reduces hallucinations on domain-specific facts, and knowledge is refreshed by updating the document store rather than re-training the model. It is usually the first architecture to consider for an enterprise chatbot.
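To illustrate why no re-training is needed, the sketch below keeps documents in a hypothetical in-memory `DocIndex` and overwrites a policy document in place: the next query simply retrieves the new text. It also tags each passage with a source id and tells the model to answer only from those sources, a common prompting pattern for reducing hallucinations. Term-overlap scoring, the class, the `grounded_prompt` helper, and the document ids are illustrative assumptions, not any particular product's API.

```python
import re
from dataclasses import dataclass, field


def terms(text: str) -> set[str]:
    # Lowercased alphanumeric tokens, used for simple term-overlap scoring.
    return set(re.findall(r"[a-z0-9]+", text.lower()))


@dataclass
class DocIndex:
    # Hypothetical in-memory store mapping a document id to its text.
    docs: dict[str, str] = field(default_factory=dict)

    def add(self, doc_id: str, text: str) -> None:
        # Adding or overwriting a document changes what retrieval returns
        # immediately; no model weights are involved.
        self.docs[doc_id] = text

    def search(self, query: str, k: int = 1) -> list[tuple[str, str]]:
        # Rank documents by the number of query terms they share and
        # drop documents that share none.
        q = terms(query)
        scored = [(len(q & terms(text)), doc_id, text) for doc_id, text in self.docs.items()]
        scored.sort(key=lambda s: s[0], reverse=True)
        return [(doc_id, text) for score, doc_id, text in scored[:k] if score > 0]


def grounded_prompt(query: str, hits: list[tuple[str, str]]) -> str:
    # Tag each passage with its source id so the answer can cite it, and
    # instruct the model to admit when the sources lack the answer.
    context = "\n".join(f"[{doc_id}] {text}" for doc_id, text in hits)
    return (
        "Answer only from the sources below and cite them as [id]. "
        "If they do not contain the answer, say you don't know.\n"
        f"{context}\n\nQuestion: {query}"
    )


index = DocIndex()
index.add("hr-policy", "Remote work is allowed two days per week.")
# The policy changes: update the index, not the model.
index.add("hr-policy", "Remote work is allowed three days per week.")

question = "How many remote work days are allowed?"
prompt = grounded_prompt(question, index.search(question))
print(prompt)
```

The source-citation instruction does not guarantee faithful answers, but it gives the model a way to decline and gives users a way to check the cited passage.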

Seen in the wild

20 entries mentioning it

October 30, 2025

Cohere Command A: the foundation model that runs on-prem on 2 GPUs

Medium
March 28, 2025

KoboldCpp v1.84: native RAG with embedded ChromaDB, no separate servers

Medium
October 14, 2024

Oracle OCI Generative AI: Llama 3.1, dedicated clusters, and RAG with Oracle Database 23ai

Medium
September 1, 2024

AnythingLLM 1.0: the complete local RAG stack for enterprise use

High
July 15, 2024

Dify 0.7: visual agentic workflows with integrated RAG and 10+ LLMs

Medium
June 14, 2024

TabbyML: open-source GitHub Copilot alternative with self-hosted codebase RAG

Medium
June 5, 2024

KoboldCpp adds integrated RAG: offline all-in-one LLM with documents and character AI

Medium
May 21, 2024

Copilot+ PC and Recall: Microsoft tries 'infinite PC memory' and a privacy backlash erupts

High
April 16, 2024

Notion AI Q&A: answers across the entire enterprise workspace with source citation

Medium
April 4, 2024

Cohere Command R+: an enterprise-focused model built for RAG and tool use

Medium
March 20, 2024

Automatic prefix caching in vLLM: shared KV cache across requests for near-zero TTFT

High
February 20, 2024

Box AI: questions and summaries on enterprise documents with page citations

Medium
February 6, 2024

Indirect Prompt Injection: the attack vector in RAG systems and AI agents

High
January 15, 2024

Open WebUI: ChatGPT-style web interface for Ollama with multi-user support and chat history

High
January 10, 2024

LlamaIndex 0.10 stable: the standard RAG framework for local LLMs

Medium
December 18, 2023

AnythingLLM: full local RAG with web UI and embedded vector DB

Medium
August 25, 2023

SuperAGI: the first open-source autonomous agent platform with a GUI

Medium
May 14, 2023

privateGPT: chat with your documents, completely offline

High
December 8, 2021

RETRO: DeepMind scales retrieval-augmented language modeling to a 2-trillion-token database

High
May 22, 2020

RAG: Retrieval-Augmented Generation enters the literature

Landmark
