In our last Substack, we talked about how we can avoid or fix the problem of hallucination.
We talked about the process of grounding, which means providing the model with context from more up-to-date data sources to improve its accuracy.
Re-training a model is extremely time-consuming, and the added data takes up a ton of space, which makes it computationally expensive. With RAG (retrieval-augmented generation), you can train a model once on static data, then pull in real-time data from up-to-date sources and layer it on top of that static knowledge.
Basically, you keep using existing LLMs but add external tools to them.
So, for example, Gemini already has static training data with a cutoff date, but you can also provide it with some static public data, then use NewsAPI to supplement that with recent news articles.
You can also take that real-time data and store it inside a knowledge graph.
It’s great because you surpass the training cutoff with multiple datasets and APIs, and you can add domain-specific knowledge and ground the LLM in it.
Of course, bad data from these external sources can still slip into the generation process, and it will probably give you a bad output.
So: provide better data, get a better output.
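To make the idea concrete, here's a minimal sketch of the RAG flow: the model's static knowledge stays untouched, and freshly retrieved context gets prepended to the prompt at query time. The `retrieve` function, the document lists, and their contents are all hypothetical stand-ins; in a real setup you'd query something like NewsAPI or a vector database instead.

```python
# Static documents stand in for knowledge the model already has (pre-cutoff).
STATIC_DOCS = [
    "Python 3.12 was released in October 2023.",
]

def retrieve(query: str, live_docs: list[str]) -> list[str]:
    """Naive keyword retrieval over static + real-time documents.
    Real systems use embeddings/semantic search instead of word overlap."""
    pool = STATIC_DOCS + live_docs
    query_words = set(query.lower().split())
    return [doc for doc in pool if query_words & set(doc.lower().split())]

def build_prompt(query: str, live_docs: list[str]) -> str:
    """Augment the user's question with retrieved context before it
    ever reaches the LLM."""
    context = "\n".join(retrieve(query, live_docs))
    return f"Context:\n{context}\n\nQuestion: {query}"

# Usage: pretend these documents arrived from a news API a moment ago.
fresh = ["Python 3.13 added an experimental free-threaded build."]
print(build_prompt("what changed in python 3.13", fresh))
```

The LLM call itself is omitted; the point is that the prompt now carries post-cutoff information the model never saw during training.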
Semantic Search
However, one of the challenges with RAG is understanding what the user actually wants.
Instead of focusing on keywords, we can use semantic search to understand phrases and context better.
So, the results will be based on the intent of the query.
Vectors in NLP
We also have vectors: generated numerical values that place a word close to other related words.
Vectors can also represent documents, images, audio files, and other data types.
For example, the word “dog” will be closer to “puppy” and “canine” than to the word “math”.
You can use a model to create these vectors; the result is called an embedding, which is the representation of a word (or other input) as a vector of continuous numbers.
Embeddings capture the context of words and the relationships between them.
Semantic search uses contextual embeddings to understand the intent behind the query.
There are models available to create these vectors, like OpenAI’s text embedding models.
Words used in similar contexts will have vectors that are close together, kind of like clustering, while unrelated words will be farther apart.
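Here's what "close together" means in practice, using cosine similarity. The 3-dimensional vectors below are invented for illustration; real embedding models produce hundreds or thousands of dimensions.

```python
import math

# Toy embeddings, hand-made so related words point in similar directions.
EMBEDDINGS = {
    "dog":    [0.90, 0.80, 0.10],
    "puppy":  [0.85, 0.90, 0.15],
    "canine": [0.80, 0.75, 0.20],
    "math":   [0.05, 0.10, 0.95],
}

def cosine_similarity(a: list[float], b: list[float]) -> float:
    """1.0 means the vectors point the same way; near 0 means unrelated."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

print(cosine_similarity(EMBEDDINGS["dog"], EMBEDDINGS["puppy"]))  # ≈ 0.99
print(cosine_similarity(EMBEDDINGS["dog"], EMBEDDINGS["math"]))   # ≈ 0.19
```

“dog” scores much higher against “puppy” than against “math”, which is exactly the clustering behavior described above.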
So, the steps would be:
User query
Embedding of query
Vector comparison and similarity scoring
Return most relevant results to user
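The four steps above can be sketched end-to-end. Everything here is a toy: `embed` is a hypothetical stand-in for a real embedding model (it just averages hand-made word vectors), and the documents are made up.

```python
import math

# Tiny hand-made word vectors; a real model would produce these for you.
WORD_VECS = {
    "dog": [0.90, 0.10], "puppy": [0.85, 0.15], "canine": [0.80, 0.20],
    "math": [0.10, 0.90], "algebra": [0.15, 0.85],
}

def embed(text: str) -> list[float]:
    """Step 2: embed a query or document by averaging its word vectors."""
    vecs = [WORD_VECS[w] for w in text.lower().split() if w in WORD_VECS]
    return [sum(dim) / len(vecs) for dim in zip(*vecs)]

def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.hypot(*a) * math.hypot(*b))

def semantic_search(query: str, docs: list[str]) -> list[str]:
    q = embed(query)                                     # steps 1-2: embed the user query
    scored = [(cosine(q, embed(d)), d) for d in docs]    # step 3: similarity scoring
    return [d for _, d in sorted(scored, reverse=True)]  # step 4: most relevant first

docs = ["puppy canine", "math algebra"]
print(semantic_search("dog", docs))  # "puppy canine" ranks first
```

Note that “dog” never appears in either document, yet the right one still wins; that's the difference between semantic search and keyword matching.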
Hopefully, this helps you understand grounding and RAG better.

