How RAG Chatbots Work
Retrieval augmented generation is the process of supplementing a user’s input to a large language model (LLM) with additional information that we have retrieved from somewhere else. The LLM can then use that information to augment the response it generates.
It starts with a user’s question.
The first step is retrieval: we take the user’s question and search a knowledge base for the content most likely to answer it.
Retrieval is by far the most important, and most complex, part of the RAG chain. Essentially, it’s about pulling out the chunks of information most relevant to the user’s query.
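To give a feel for the mechanics, here’s a minimal sketch of similarity-based retrieval. The `embed` function below is a deliberately toy stand-in (a bag-of-words count vector) so the example runs on its own; real systems use a learned embedding model and usually a vector database, but the shape of the step is the same: score every chunk against the question and keep the best few.

```python
import math
import re
from collections import Counter

def embed(text: str) -> Counter:
    # Toy "embedding": a bag-of-words count vector. A real system would use
    # a learned embedding model; this stand-in just keeps the sketch
    # self-contained and runnable.
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))

def cosine_similarity(a: Counter, b: Counter) -> float:
    dot = sum(count * b[term] for term, count in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def retrieve(question: str, chunks: list[str], k: int = 2) -> list[str]:
    # Score every chunk against the question and keep the k best matches.
    q = embed(question)
    ranked = sorted(chunks, key=lambda c: cosine_similarity(q, embed(c)), reverse=True)
    return ranked[:k]

knowledge_base = [
    "The store is open Monday to Friday from 9am to 5pm.",
    "Refunds are processed within 14 days of purchase.",
    "We ship internationally to over 40 countries.",
]
print(retrieve("What days is the store open?", knowledge_base, k=1))
```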
We cannot simply send the entire knowledge base to the LLM, for a few reasons:
- Context limits: models have built-in limits on how much text they can consume at a time, though these limits are quickly increasing (see the back-of-the-envelope sketch after this list).
- Cost: sending huge amounts of text to an LLM gets quite expensive.
- Quality: there is evidence suggesting that sending small amounts of relevant information results in better answers.
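To put the first point in perspective, here’s a rough back-of-the-envelope calculation. The corpus size and context window below are illustrative assumptions, not measurements:

```python
# Why the whole knowledge base can't go in the prompt: a crude estimate.
# A common rule of thumb is ~4 characters per token for English text.
knowledge_base_chars = 200_000_000             # assumed: a large documentation corpus
estimated_tokens = knowledge_base_chars // 4   # ~50 million tokens
context_window = 128_000                       # assumed: a typical large context window
print(estimated_tokens // context_window)      # hundreds of times too big to fit
```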
Once we’ve pulled the relevant information out of our knowledge base, we send it, along with the user’s question, to the LLM, which then “reads” the provided information and answers the question.
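To make that step concrete, here’s a minimal sketch of how the retrieved chunks and the question might be stitched into a single prompt. The prompt wording and the `llm.complete` call at the end are assumptions for illustration, not any specific library’s API:

```python
def build_prompt(question: str, chunks: list[str]) -> str:
    # Stitch the retrieved chunks and the user's question into one prompt.
    # The exact wording is a prompt-design choice; this is one common pattern.
    context = "\n".join(f"- {chunk}" for chunk in chunks)
    return (
        "Answer the question using only the information provided below.\n\n"
        f"Information:\n{context}\n\n"
        f"Question: {question}"
    )

chunks = ["The store is open Monday to Friday from 9am to 5pm."]
prompt = build_prompt("What days is the store open?", chunks)
print(prompt)
# The finished prompt is what actually gets sent to the model; the call
# itself depends on your client library, e.g. (hypothetical) llm.complete(prompt).
```

This is…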