RAG, Vector Search & Knowledge Bases
How AI answers questions from your own company documents instead of guessing from general training.
What you'll learn
- Explain what RAG does in plain terms
- Understand why vector search finds the right passage
- Know why good source documents matter most
A plain chatbot only knows what it learned during training — general knowledge from the public internet, frozen at a point in time. It has never read your HR policy, your product manual, or last quarter’s report. So when you ask it something specific to your company, it either says it doesn’t know or, worse, makes up a confident-sounding answer. RAG is the technique that fixes this by letting the AI look things up in your documents before it answers. Once you grasp the basic flow, a lot of “AI for our knowledge base” projects suddenly make sense.
What RAG actually does
RAG stands for retrieval-augmented generation, which is just three steps in plain English. First it retrieves: when you ask a question, the system searches your document collection for the most relevant passages. Then it augments: it slips those passages into the prompt alongside your question. Finally it generates: the language model writes an answer grounded in the text it was just handed. The result reads like a normal chatbot reply, but it’s based on your real documents rather than the model’s general memory, and it can cite where it found the answer. Crucially, your documents stay outside the model: nothing is permanently baked in, so when a policy changes you update the document and, once the system has re-indexed it, the next answer reflects the change.
Retrieve from your documents, augment the prompt, then generate a grounded answer.
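For readers who like to see the flow as code, here is a minimal sketch of the three steps. Everything in it is an assumption made for illustration: the two documents and their file names are invented, `retrieve` ranks by shared words as a stand-in for real search, and `generate` is a placeholder that does not call any model.

```python
# Toy illustration of the retrieve -> augment -> generate flow.
# All names and documents are hypothetical; no real language model is called.

DOCUMENTS = {
    "hr-policy.txt": "Employees receive 25 days of annual leave per year.",
    "it-guide.txt": "Reset your password from the self-service portal.",
}

def words(text):
    """Lower-case the text and split it into a set of words without punctuation."""
    return {w.strip(".,?!").lower() for w in text.split()}

def retrieve(question, documents, top_k=1):
    """Step 1: rank documents by words shared with the question (stand-in for real search)."""
    q = words(question)
    ranked = sorted(documents.items(), key=lambda item: len(q & words(item[1])), reverse=True)
    return ranked[:top_k]

def augment(question, passages):
    """Step 2: place the retrieved passages in the prompt next to the question."""
    context = "\n".join(f"[{name}] {text}" for name, text in passages)
    return f"Answer using only these sources:\n{context}\n\nQuestion: {question}"

def generate(prompt):
    """Step 3: placeholder. A real system would send the prompt to a language model."""
    return f"(model answer based on a prompt of {len(prompt)} characters)"

question = "How many days of annual leave do employees receive?"
passages = retrieve(question, DOCUMENTS)
prompt = augment(question, passages)
answer = generate(prompt)
```

Note that the documents live in an ordinary dictionary outside the `generate` step, which mirrors the point above: changing the stored text changes what the model is handed next time, without retraining anything.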
Why vector search finds the right passage
The clever part of the “retrieve” step is vector search. Ordinary keyword search looks for matching words, so it misses an answer that uses different wording from your question. Vector search instead converts each passage into a list of numbers, called an embedding, that captures its meaning. Passages with similar meanings end up with similar numbers, so the system converts your question the same way and returns the passages whose numbers are closest. That is how a question about “time off for a new baby” can find the policy paragraph headed “parental leave,” even with no shared keywords. The collection of your documents, prepared this way and stored for fast lookup, is your knowledge base. That’s why teams talk about “putting our policies into a vector database”: they’re making the content searchable by meaning.
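The idea of matching by meaning can be sketched in a few lines. This is a toy, not a real search system: the lists of numbers (often called embeddings) are hand-written and the three dimensions are invented, whereas a real system would get its numbers from an embedding model. Cosine similarity, used here to measure closeness, is one common choice among several.

```python
import math

# Hand-written toy vectors with three made-up dimensions:
# (family, time away from work, money). The numbers are invented for illustration.
PASSAGES = {
    "Parental leave": [0.9, 0.8, 0.1],
    "Expense claims": [0.0, 0.1, 0.9],
    "Laptop purchasing": [0.0, 0.1, 0.2],
}

QUESTION = "time off for a new baby"
QUESTION_VECTOR = [0.8, 0.9, 0.0]  # also invented: high on family and time away

def cosine_similarity(a, b):
    """Return a value near 1.0 when two vectors point in almost the same direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def nearest(query_vector, passages):
    """Return the title whose vector is most similar to the query vector."""
    return max(passages, key=lambda title: cosine_similarity(query_vector, passages[title]))

best = nearest(QUESTION_VECTOR, PASSAGES)
shared_words = set(QUESTION.lower().split()) & set(best.lower().split())
```

Here `best` is the "Parental leave" passage even though `shared_words` is empty, which is exactly the case keyword search would miss.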
RAG doesn’t make the AI smarter; it makes it better informed. The answer is only ever as good and as current as the documents you feed it.
Garbage in, garbage out
Because the answer is grounded in retrieved text, the quality of your source documents matters more than anything. If your knowledge base contains an outdated policy, RAG will faithfully quote the outdated policy. If two documents contradict each other, the system may surface either one. So a RAG project is as much a content project as a technical one: keeping documents current, removing duplicates, and marking which version is authoritative pays off directly in answer quality. It’s worth deciding early who owns that upkeep, because a knowledge base that nobody maintains slowly fills with stale answers that sound just as confident as the correct ones.
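Some of this upkeep can be checked automatically. The sketch below is one hypothetical way to flag exact duplicates and documents that have not been reviewed recently. The `Doc` fields, the sample documents, and the 365-day limit are all assumptions chosen for the example; deciding which version is authoritative still needs a human owner.

```python
from dataclasses import dataclass
from datetime import date

@dataclass
class Doc:
    title: str
    text: str
    last_reviewed: date

def audit(docs, today, max_age_days=365):
    """Return (duplicates, stale): titles that repeat earlier text or are overdue for review."""
    seen = set()
    duplicates, stale = [], []
    for doc in docs:
        key = " ".join(doc.text.lower().split())  # ignore case and extra whitespace
        if key in seen:
            duplicates.append(doc.title)
        else:
            seen.add(key)
        if (today - doc.last_reviewed).days > max_age_days:
            stale.append(doc.title)
    return duplicates, stale

docs = [
    Doc("Leave policy 2023", "Staff get 20 days of leave.", date(2023, 1, 10)),
    Doc("Leave policy 2025", "Staff get 25 days of leave.", date(2025, 1, 10)),
    Doc("Leave policy 2025 (copy)", "Staff get 25  days of leave.", date(2025, 1, 10)),
]
duplicates, stale = audit(docs, today=date(2025, 6, 1))
```

In this example the copy is flagged as a duplicate and the 2023 policy as stale, so both can be reviewed before they reach the knowledge base.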
What it still gets wrong
RAG sharply reduces made-up answers, but it doesn’t eliminate them. The model can still misread a passage, blend two sources awkwardly, or answer confidently when the documents don’t actually cover the question. That’s why good RAG tools show their sources — a link or citation to the passage used — so you can click through and check. Treat the citation as the real answer and the generated paragraph as a helpful summary of it.
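One common safeguard for the "documents don't cover the question" case is to check how relevant the retrieved passages are before asking the model to answer. The sketch below assumes the retrieve step returns a relevance score with each passage. The function name, the source labels, and the 0.5 cut-off are hypothetical and would need tuning in a real system.

```python
# Hypothetical guard applied between the retrieve and generate steps.
MIN_SCORE = 0.5  # arbitrary cut-off chosen for illustration

def select_context(scored_passages, min_score=MIN_SCORE):
    """Keep only passages relevant enough to ground an answer.

    scored_passages: list of (score, source, text) tuples from the retrieve step.
    Returns the text to hand to the model and the sources to show the user.
    """
    relevant = sorted(
        (p for p in scored_passages if p[0] >= min_score),
        key=lambda p: p[0],
        reverse=True,
    )
    if not relevant:
        # Nothing relevant enough: better to say so than to let the model guess.
        return {"covered": False, "sources": [], "context": ""}
    return {
        "covered": True,
        "sources": [source for _, source, _ in relevant],
        "context": "\n".join(text for _, _, text in relevant),
    }

covered = select_context([
    (0.91, "hr-policy.txt, section 4", "Staff get 25 days of annual leave."),
    (0.12, "it-guide.txt, section 1", "Reset your password from the portal."),
])
not_covered = select_context([
    (0.12, "it-guide.txt, section 1", "Reset your password from the portal."),
])
```

When `covered` is false, the tool can reply that the documents do not address the question instead of generating an answer. When it is true, the `sources` list is what the user clicks through to verify.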
Sort the RAG concepts
Here's where each one goes:
- Searching for the most relevant passage → Retrieve: this is the first step in the RAG flow, before the model sees any text.
- Writing a fluent answer grounded in retrieved text → Generate: the language model writes based on what was retrieved, not from general memory.
- Converting text into meaning-capturing numbers → Retrieve: this is the preparation vector search relies on, turning documents and questions into embeddings (lists of numbers) for semantic lookup.
- Citing the source paragraph so the user can verify → Generate: good RAG tools include citations in the generated response.
- Finding "annual leave entitlement" from "days off" → Retrieve: semantic matching with no shared keywords is what vector search does.
- Producing a summary when documents don't cover the question → Generate: the model still generates, but without good retrieved context it may make up an answer.
How to use it
When someone proposes “an AI that answers questions from our documents,” you now know the shape of it: retrieve, augment, generate, powered by vector search over a knowledge base. Ask the questions that decide success: “Which documents will it search, and who keeps them current?” “Will it show the source so we can verify the answer?” “What happens when the documents don’t cover the question?” And remind people that tidying the source content is not a side task — it’s the part that most determines whether the answers can be trusted.
Quick check
1. RAG (retrieval-augmented generation) lets an AI…
2. Vector search is useful because it…
3. If your knowledge base contains an outdated policy, RAG will…