Retrieval-augmented generation (RAG) is an architecture pattern that grounds LLM responses in external, up-to-date knowledge. Instead of relying solely on the model's training data, RAG systems retrieve relevant documents from a knowledge base and inject them into the prompt as context, enabling the model to generate responses informed by specific, verifiable information.
The RAG pipeline has three stages. First, the retrieval stage takes the user's query and searches a knowledge base — typically a vector database containing embedded documents — to find the most semantically relevant passages. Second, the augmentation stage formats the retrieved documents and inserts them into the prompt alongside the user's query. Third, the generation stage sends the augmented prompt to the LLM, which produces a response grounded in the retrieved context.
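The three stages above can be sketched in a few lines. This is a toy illustration, not a production pipeline: it substitutes naive word-overlap scoring for real embedding search, and `call_llm` is a placeholder for an actual model call; all function names here are illustrative assumptions.

```python
# Toy sketch of the three RAG stages. Word overlap stands in for
# embedding similarity, and call_llm() is a stub, not a real API.

def retrieve(query: str, knowledge_base: list[str], k: int = 2) -> list[str]:
    """Stage 1: rank documents by naive term overlap with the query."""
    q_terms = set(query.lower().split())
    scored = sorted(
        knowledge_base,
        key=lambda doc: len(q_terms & set(doc.lower().split())),
        reverse=True,
    )
    return scored[:k]

def augment(query: str, docs: list[str]) -> str:
    """Stage 2: format retrieved passages and insert them into the prompt."""
    context = "\n".join(f"[{i + 1}] {d}" for i, d in enumerate(docs))
    return (
        f"Context:\n{context}\n\n"
        f"Question: {query}\n"
        f"Answer using only the context above."
    )

def call_llm(prompt: str) -> str:
    """Stage 3 placeholder: a real system would call the model here."""
    return f"(model response, grounded in the provided prompt)"

def rag_answer(query: str, knowledge_base: list[str]) -> str:
    return call_llm(augment(query, knowledge_base := retrieve(query, knowledge_base) or knowledge_base) if False else augment(query, retrieve(query, knowledge_base)))

kb = [
    "RAG retrieves documents and injects them into the prompt.",
    "Fine-tuning updates model weights on domain data.",
    "Vector databases store document embeddings for semantic search.",
]
print(rag_answer("How does RAG use the prompt and retrieved documents?", kb))
```

A real system would replace `retrieve` with a vector-database query over embedded chunks; the stage boundaries stay the same.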
RAG solves several fundamental LLM limitations. It addresses knowledge cutoff by providing access to information beyond the model's training date. It reduces hallucinations by grounding responses in verifiable source material. It enables domain-specific expertise without fine-tuning by injecting specialized documents at query time.
The quality of a RAG system depends heavily on the retrieval component. Chunking strategy — how documents are split into searchable passages — affects whether relevant information is found. Embedding model selection impacts semantic search accuracy. Re-ranking algorithms can improve precision by reordering retrieved results before they enter the prompt.
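To make the chunking point concrete, here is a minimal sketch of one common strategy: fixed-size word windows with overlap, so a passage that straddles a chunk boundary is still retrievable from the neighboring chunk. The sizes are illustrative; real systems tune chunk size and overlap per corpus, and often split on sentence or section boundaries instead of raw word counts.

```python
# Fixed-size chunking with overlap: each chunk shares `overlap` words
# with the previous one, so boundary-spanning content appears in both.

def chunk_words(text: str, chunk_size: int = 100, overlap: int = 20) -> list[str]:
    """Split text into overlapping chunks of at most chunk_size words."""
    words = text.split()
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        if start + chunk_size >= len(words):
            break  # last window already covers the end of the text
    return chunks

doc = " ".join(f"word{i}" for i in range(250))
chunks = chunk_words(doc, chunk_size=100, overlap=20)
print(len(chunks))              # → 3
print(chunks[1].split()[0])     # → word80 (second chunk starts 80 words in)
```

Too-small chunks lose surrounding context; too-large chunks dilute the embedding with unrelated content, so retrieval quality is sensitive to this choice.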
Prompt design is critical in RAG systems. The system prompt must instruct the model to base its responses on the provided context, cite sources when possible, and acknowledge when the retrieved documents don't contain sufficient information to answer the question. Allocating the context window between retrieved documents and the rest of the prompt also requires care: too many passages crowd out instructions and conversation history, while too few starve the model of evidence.
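A sketch of this prompt assembly, with a simple context budget, might look as follows. The wording of the system prompt is illustrative, and word count stands in for real token counting, which would use the model's tokenizer; the budget figure is an arbitrary assumption.

```python
# Assemble a RAG prompt under a context budget. Words approximate
# tokens here; a real system would count with the model's tokenizer.

SYSTEM_PROMPT = (
    "Answer using ONLY the provided context. Cite sources as [n]. "
    "If the context does not contain enough information to answer, "
    "say so explicitly."
)

def budget_docs(docs: list[str], max_tokens: int) -> list[str]:
    """Keep highest-ranked docs (assumed pre-sorted) until the budget is spent."""
    kept, used = [], 0
    for doc in docs:
        cost = len(doc.split())  # crude token estimate
        if used + cost > max_tokens:
            break
        kept.append(doc)
        used += cost
    return kept

def build_prompt(query: str, docs: list[str], max_context_tokens: int = 50) -> str:
    kept = budget_docs(docs, max_context_tokens)
    context = "\n".join(f"[{i + 1}] {d}" for i, d in enumerate(kept))
    return f"{SYSTEM_PROMPT}\n\nContext:\n{context}\n\nQuestion: {query}"

docs = ["passage one " * 10, "passage two " * 10, "passage three " * 10]
print(build_prompt("What does the context say?", docs))
```

Because the documents are assumed sorted by relevance, truncation drops the least relevant passages first; the explicit "say so" instruction gives the model a sanctioned way to decline rather than hallucinate.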