Some definitions first
RAG (retrieval-augmented generation) provides LLMs with access to outside knowledge, while tool calls enable LLMs to use external tools. Tool calling is sometimes also called tool use.
For RAG, outside knowledge could be news, websites, R&D publications, or molecule data, for example. In the case of tool calls, the external tools could be software tools such as calculators, but also physical tools such as robots.
How RAG works
Typically, RAG uses the following steps (sketched in code after the list):
- Take an input from a user, e.g. a question.
- Use some method (e.g. an LLM or some kind of parser) to convert the user's question into a data query.
- With the data query, retrieve relevant data, e.g. documents, text snippets, molecule data.
- Pass the retrieved data to an LLM, along with the user's question.
- Combining the data, the question, and perhaps some kind of instruction prompt, the LLM writes an answer to the user's question.
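Here is a minimal sketch of these steps in Python. `search_documents` and `call_llm` are hypothetical placeholders; any retrieval backend and any LLM API could stand in for them.

```python
# A minimal RAG loop, mirroring the steps above. `search_documents` and
# `call_llm` are hypothetical placeholders for whatever retrieval backend
# and LLM API you use.

def search_documents(query: str, top_k: int = 3) -> list[dict]:
    """Retrieve the top_k most relevant documents for a query.
    In practice: a vector store, a search engine, or a database."""
    raise NotImplementedError  # plug in your retrieval backend here

def call_llm(prompt: str) -> str:
    """Send a prompt to an LLM and return its text response."""
    raise NotImplementedError  # plug in your LLM API here

def answer_with_rag(question: str) -> str:
    # Step 2: convert the question into a data query. Here we simply
    # reuse the question; an LLM could also rewrite it.
    query = question

    # Step 3: retrieve relevant data.
    docs = search_documents(query)

    # Steps 4-5: pass the data, the question, and an instruction prompt
    # to the LLM, which writes the answer.
    context = "\n\n".join(doc["text"] for doc in docs)
    prompt = (
        "Answer the question using only the context below.\n\n"
        f"Context:\n{context}\n\nQuestion: {question}"
    )
    return call_llm(prompt)
```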
How tool calls work
Similar to how RAG connects LLMs to external knowledge, tool calls give LLMs the ability to use external tools. Basically, the goal of a tool call is to get from a user command like "Get me the fundamentals for MSFT as of today" to executing a tool like this:
get_fundamentals(ticker_symbol: string, date: string)
In this example, the role of the `get_fundamentals` call is to get fundamentals like market cap, average volume, etc. for a user-defined stock (`ticker_symbol`), e.g. MSFT (Microsoft), at a user-defined `date`, from an API that provides such data (an API is an interface that lets one computer or app get data from another computer or app). After the call, the LLM takes the data returned from the API, combines it with the user question ("Get me the fundamentals...") and perhaps some extra instructions on how to respond, and writes a natural language response. This response could look like this, for example:
Here are the fundamentals for MSFT on 15 April 2024:
- MARKET CAP: 3.02T USD
- AVG VOLUME: 20.41M
- P/E RATIO: 35.20
- DIVIDEND YIELD: 0.74%
Notice that the LLM does not use the tool directly. Rather, it generates the (usually JSON) code that allows some other part of your software, not the LLM itself, to call the `get_fundamentals` tool.
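To make this concrete, here is a hedged sketch in Python. The JSON schema below is a generic example (the exact format varies by LLM provider), and `get_fundamentals` returns dummy data standing in for a real market data API:

```python
import json

# What the LLM might generate. The schema is a generic example;
# the exact format varies by LLM provider.
llm_output = """{"tool": "get_fundamentals",
                 "arguments": {"ticker_symbol": "MSFT", "date": "2024-04-15"}}"""

def get_fundamentals(ticker_symbol: str, date: str) -> dict:
    """Hypothetical wrapper around a market data API (dummy data here)."""
    return {"market_cap": "3.02T USD", "avg_volume": "20.41M",
            "pe_ratio": 35.20, "dividend_yield": "0.74%"}

# Registry of tools that the application, not the LLM, may execute.
TOOLS = {"get_fundamentals": get_fundamentals}

# Your software parses the LLM's JSON and makes the actual call.
call = json.loads(llm_output)
result = TOOLS[call["tool"]](**call["arguments"])
# `result` then goes back to the LLM, which writes the natural
# language response shown above.
```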
This distinction is important. It means that you can build LLM-independent safety features against potentially harmful tool calls. For example, if your tool call does something in the real world (think robots), you will most likely want to introduce some safety features and guardrails that are independent of the LLM itself.
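Continuing the sketch above, a hypothetical guardrail could validate every tool call before the dispatcher executes it:

```python
# Hypothetical safety layer: validate each tool call before execution,
# independently of whatever the LLM generated.
ALLOWED_TOOLS = {"get_fundamentals"}

def validate_call(call: dict) -> None:
    if call.get("tool") not in ALLOWED_TOOLS:
        raise ValueError(f"Tool {call.get('tool')!r} is not allowed")
    if not isinstance(call.get("arguments"), dict):
        raise ValueError("Tool arguments must be a JSON object")

validate_call(call)                                # reject bad calls first
result = TOOLS[call["tool"]](**call["arguments"])  # then execute
```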
RAG + tool calling
You can combine tool calling and RAG. For example, you could build another tool, `make_ticker_symbol_onepager`. This tool could get the fundamentals for a ticker symbol with `get_fundamentals`, and then use RAG to get the latest news on the ticker symbol. Based on all that, it could then write a one-pager that combines these data.
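A hedged sketch of such a tool, reusing `get_fundamentals` and `call_llm` from the sketches above; `search_news` is a placeholder for a RAG retrieval step over a news index:

```python
def search_news(query: str, top_k: int = 5) -> list[dict]:
    """Placeholder for a RAG retrieval step over a news index."""
    return [{"headline": "MSFT announces quarterly results"}]  # dummy data

def make_ticker_symbol_onepager(ticker_symbol: str, date: str) -> str:
    fundamentals = get_fundamentals(ticker_symbol, date)   # tool call
    news = search_news(f"latest news on {ticker_symbol}")  # RAG retrieval
    news_text = "\n".join(item["headline"] for item in news)
    prompt = (
        f"Write a one-pager on {ticker_symbol} as of {date}.\n\n"
        f"Fundamentals: {fundamentals}\n\nRecent news:\n{news_text}"
    )
    return call_llm(prompt)  # the LLM combines everything into a one-pager
```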
Why we need RAG
1. LLMs are reasoning engines, not knowledge databases
Because LLMs give very convincing answers when asked a question, it is tempting to think of them as knowledge databases. But that is not how they are trained, and it is not what they are designed for. Rather, the objective of training an LLM is to memorize and compress knowledge (source). And when you compress knowledge, you lose some of the details. In fact, that's the point: you compress, and as a result, you get reasoning. You generalize, you draw analogies, and so on. Compression seems to be a fundamental principle in human learning, too (source).
RAG takes this into account. It separates the knowledge access part from the reasoning part, and uses LLMs for the latter.
2. The world produces new data much faster than LLMs can be trained
According to some estimates, more than 300 TB of new data are generated every day (source). By contrast, GPT-3 was apparently trained on ca. 45 TB of data (source). So even if newer models were trained on, say, 10x that amount, the world would produce those 450 TB of new data in about a day and a half. New data outpaces any LLM training process very quickly.
"Sure, but some day somebody will figure out a faster LLM training process that can keep up with new data produced."
OK, but even if this happens, LLMs will still be reasoning engines, not knowledge bases (see previous section).
3. With RAG, updating your knowledge base is easy
Even if you could somehow "train new knowledge into an LLM", this process, which is called fine-tuning, takes a while. By contrast, upserting new documents into a knowledge database is typically a matter of milliseconds. Not to mention that in a knowledge database you can attach metadata such as document authors, dates, and sources. There is no straightforward way to do this with an LLM.
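As an illustration, here is a toy in-memory knowledge store. Real systems would use a vector database or a search engine, but the principle is the same: adding a document is a fast write, not a retraining run.

```python
# Toy in-memory knowledge store with metadata-aware upserts.
knowledge_base: dict[str, dict] = {}

def upsert_document(doc_id: str, text: str, metadata: dict) -> None:
    """Insert or update a document. A write like this takes milliseconds;
    no model retraining is involved."""
    knowledge_base[doc_id] = {"text": text, "metadata": metadata}

upsert_document(
    "report-2024-04-15",
    "Quarterly earnings summary ...",
    {"author": "J. Doe", "date": "2024-04-15", "source": "Q1 report"},
)
```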
4. You can identify the source of information
In most serious information search scenarios, people want to know where a piece of information came from. With RAG, you can provide the source of every piece of retrieved information. With an LLM alone, you cannot do this reliably.
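Continuing the toy store from the previous section, every retrieved snippet can carry its source metadata into the final answer:

```python
# Each retrieved snippet carries its metadata, so the answer can cite sources.
def format_with_sources(doc_ids: list[str]) -> str:
    lines = []
    for doc_id in doc_ids:
        doc = knowledge_base[doc_id]
        lines.append(f"{doc['text']} (source: {doc['metadata']['source']})")
    return "\n".join(lines)

print(format_with_sources(["report-2024-04-15"]))
# -> Quarterly earnings summary ... (source: Q1 report)
```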
Getting LLMs to interact with the physical world
I already mentioned tool calling for robots above. There are companies that build LLMs and other AI models specifically for interacting with the physical world. Here are some examples:
- Physical Intelligence builds foundation models for physically actuated devices such as robots.
- Archetype AI develops AI for understanding real-world physical data in real time.
- Aitomatic builds domain-specific knowledge engines for industries like Oil & Gas or HVAC.
- Stanhope AI focuses on making AI energy efficient so that these systems can be integrated into devices.
- Hypersurfaces converts objects into data-driven interfaces.
- Plato Systems provides spatial intelligence systems for industrial manufacturing operations.
(This post is also available at Mergeflow's blog)