Transforming RAG systems with enhanced context (a langchain implementation)

I recently stumbled upon Anthropic’s fascinating post about contextual retrieval and was immediately intrigued by the potential to revolutionize RAG systems! The concept was so compelling that I decided to put it to the test with a real-world experiment using 10 years of annual reports from CUAC FM. What started as curiosity turned into a comprehensive research project. I crafted 30 carefully designed questions paired with 30 human-reviewed reference answers to rigorously evaluate whether contextual retrieval truly delivers on its promises. The results? Absolutely game-changing!

Want to dive into the complete research?

Naive RAG report

Full codebase and examples: GitHub Repository

Interactive research report: Live Demo Site

But first, let me share what makes contextual retrieval such a breakthrough and why traditional RAG systems have been leaving performance on the table.

Retrieval-Augmented Generation (RAG) has revolutionized how we build AI applications that can access and reason over large knowledge bases. However, traditional RAG systems often struggle with a critical limitation: context loss during retrieval. Enter contextual retrieval—a game-changing approach that significantly improves retrieval accuracy and downstream generation quality.

The problem with traditional RAG

In naive RAG implementations, documents are chunked into smaller pieces for efficient retrieval. While this approach works well for storage and search speed, it often strips away crucial context that could make the difference between finding the right information and missing it entirely.

Consider this scenario: a chunk containing “The new policy increases efficiency by 40%” might be retrieved, but without knowing which policy, which department, or which time period, the information becomes ambiguous or even misleading.

What is contextual retrieval?

Contextual retrieval enhances traditional RAG by enriching each document chunk with relevant context before storage and retrieval. This context can include:

  • Document metadata (title, author, date, source)
  • Surrounding content summaries
  • Section headers and document structure
  • Key entities and relationships
  • Custom contextual information based on domain needs
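As a minimal sketch of the idea, the snippet below prepends whatever context is available to the chunk text, so that the context is indexed together with the content. The `Chunk` structure, the `contextualize` helper, and the sample title and section are hypothetical and only serve as illustration.

```python
from dataclasses import dataclass


@dataclass
class Chunk:
    """A chunk of text plus the context we know about it (hypothetical structure)."""
    text: str
    title: str = ""
    section: str = ""
    date: str = ""
    situated_context: str = ""  # e.g. an LLM-written sentence situating the chunk


def contextualize(chunk: Chunk) -> str:
    """Prepend the available context to the chunk text so that it gets indexed too."""
    header_parts = [
        f"Document: {chunk.title}" if chunk.title else "",
        f"Section: {chunk.section}" if chunk.section else "",
        f"Date: {chunk.date}" if chunk.date else "",
        chunk.situated_context,
    ]
    header = "\n".join(part for part in header_parts if part)
    return f"{header}\n\n{chunk.text}" if header else chunk.text


# Made-up example values, reusing the ambiguous sentence from the scenario above
chunk = Chunk(
    text="The new policy increases efficiency by 40%.",
    title="Annual report 2022",
    section="Volunteer management",
)
print(contextualize(chunk))
```

The enriched string, rather than the bare chunk, is what would be embedded and indexed.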

Key benefits of contextual retrieval

  1. Improved retrieval accuracy

By embedding contextual information alongside content, retrieval systems can better match user queries with relevant chunks, even when the query terms don’t appear directly in the chunk text.

  2. Reduced ambiguity

Context helps disambiguate similar-looking content from different sources, time periods, or domains, leading to more precise retrievals.

  3. Better semantic understanding

Contextual information provides semantic anchors that help embedding models understand the true meaning and relevance of content.

  4. Enhanced user experience

More accurate retrievals lead to better generated responses, improving overall user satisfaction and trust in the system.
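The first benefit can be illustrated with a toy lexical match. This sketch counts how many query terms occur in a chunk with and without a prepended context sentence. The query, the context sentence, and the crude tokenizer are invented for the example; a real system would rely on BM25 and embeddings instead.

```python
import re


def tokens(text: str) -> set[str]:
    """Lowercase word tokens; a deliberately crude stand-in for a real tokenizer."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def overlap(query: str, passage: str) -> int:
    """Number of distinct query terms that also occur in the passage."""
    return len(tokens(query) & tokens(passage))


query = "remote work policy efficiency 2023"
plain = "The new policy increases efficiency by 40%."
enriched = "HR report 2023, section on the remote work policy. " + plain

print(overlap(query, plain))     # 2 ("policy", "efficiency")
print(overlap(query, enriched))  # 5 (all query terms now match)
```

The bare chunk matches only two of the five query terms, while the enriched chunk matches all of them.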

Let’s explore how to implement contextual retrieval in Python using popular libraries.

Basic contextual chunk enhancement

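# Imports (module paths may differ between LangChain versions)
import json

from rank_bm25 import BM25Okapi
from langchain_core.documents import Document
from langchain_core.prompts import PromptTemplate
from langchain_community.retrievers import BM25Retriever
from langchain.retrievers import EnsembleRetriever

# llm_model and vector_store are assumed to be created elsewhere:
# any LangChain chat model and any LangChain vector store should work.

# Load the JSON
with open('chunks/memoria22.json', 'r', encoding='utf-8') as f:
    data = json.load(f)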

chunks = []
texts = []
for entry in data:
    for content in entry.get('content', []):
        for chunk in content.get('chunks', []):
            text = chunk['text']
            metadata = chunk.get('metadata', {})
            chunks.append((text, metadata))
            texts.append(text)

# Prepare BM25
tokenized_corpus = [text.split() for text in texts]
bm25 = BM25Okapi(tokenized_corpus)

# Define the prompt template for the chain
template = """
<document>
{doc_content}
</document>

Here is the chunk we want to situate within the whole document
<chunk>
{chunk_content}
</chunk>

Please give a short succinct context to situate this chunk within the overall document for the purposes of improving search retrieval of the chunk.
Answer only with the succinct context and nothing else.
"""
prompt = PromptTemplate(
    input_variables=["doc_content", "chunk_content"],
    template=template
)

chain = prompt | llm_model

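doc_content = "\n".join(texts)
documents = []
full_documents = []  # same documents, kept for the BM25 retriever below

for i, (text, metadata) in enumerate(chunks):
    # Run the chain to get the situating context for this chunk
    context = chain.invoke({"doc_content": doc_content, "chunk_content": text})
    # Chat models return a message object; plain LLMs return a string
    if hasattr(context, "content"):
        context = context.content
    print(context)
    metadata['situated_context'] = context
    metadata['original_context'] = text  # raw chunk text, before enrichment
    # BM25 score of the chunk against itself, stored for reference
    scores = bm25.get_scores(text.split())
    metadata['bm25_score'] = float(scores[i])
    # Only page_content is embedded and indexed, so prepend the context to it
    document = Document(page_content=f"{context}\n\n{text}", metadata=metadata)
    documents.append(document)
    full_documents.append(document)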

vector_store.add_documents(documents=documents)


def retriever_hybrid(query):
    # BM25 retriever
    bm25_retriever = BM25Retriever.from_documents(full_documents)
    bm25_retriever.k = 20

    # Vector retriever from in-memory vector store
    vector_retriever = vector_store.as_retriever(search_type="similarity", search_kwargs={"k": 20})

    # Ensemble retriever: 40% weight on BM25, 60% on vector similarity
    ensemble_retriever = EnsembleRetriever(
        retrievers=[bm25_retriever, vector_retriever],
        weights=[0.4, 0.6]
    )
    # Retrievers are Runnables: invoke() supersedes the deprecated get_relevant_documents()
    return ensemble_retriever.invoke(query)

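# Answering needs its own prompt: the chain above only generates chunk context.
# This QA template is an illustrative assumption, not the one used in the experiment.
qa_template = """
Answer the question using only the context below.

<context>
{context}
</context>

Question: {query}
"""
qa_prompt = PromptTemplate(
    input_variables=["context", "query"],
    template=qa_template
)
qa_chain = qa_prompt | llm_model

query = "What is CUAC FM?"

docs = retriever_hybrid(query)
context = "\n\n".join(doc.page_content for doc in docs)
result = qa_chain.invoke({"context": context, "query": query})
print(result.content)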
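To give an intuition for what the `weights=[0.4, 0.6]` setting does, here is a simplified sketch of weighted Reciprocal Rank Fusion, which is, as far as I know, the strategy `EnsembleRetriever` uses to merge rankings. The `weighted_rrf` function, the constant `c=60`, and the chunk ids are assumptions made for illustration, not the library's actual code.

```python
from collections import defaultdict


def weighted_rrf(rankings: list[list[str]], weights: list[float], c: int = 60) -> list[str]:
    """Fuse ranked lists of ids: score(d) = sum over lists of weight / (c + rank of d)."""
    scores: dict[str, float] = defaultdict(float)
    for ranking, weight in zip(rankings, weights):
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += weight / (c + rank)
    # Highest fused score first
    return sorted(scores, key=scores.get, reverse=True)


# Hypothetical rankings returned by the two retrievers for the same query
bm25_ranking = ["chunk_a", "chunk_b", "chunk_c"]
vector_ranking = ["chunk_c", "chunk_a", "chunk_d"]

print(weighted_rrf([bm25_ranking, vector_ranking], weights=[0.4, 0.6]))
# ['chunk_a', 'chunk_c', 'chunk_d', 'chunk_b']
```

Chunks that rank well in both lists rise to the top, and the higher vector weight lets `chunk_d` (found only by the vector search) outrank `chunk_b` (found only by BM25).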

Best practices for contextual retrieval

  1. Context granularity

Balance the amount of context you include. Too little context provides insufficient information, while too much can dilute the relevance signal and increase computational costs.

  2. Domain-specific context

Tailor your contextual information to your specific domain. Legal documents might need different context than scientific papers or financial reports.

  3. Hierarchical context

Consider implementing hierarchical context that includes document-level, section-level, and paragraph-level information for maximum flexibility.

  4. Context caching

Cache contextual embeddings to avoid recomputation and improve system performance, especially for large document collections.

  5. Regular context updates

Implement mechanisms to update context when documents change or when you discover new contextual relationships.
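The caching and update practices can be combined in one small mechanism. This sketch keys each generated context on a hash of the document and the chunk, so unchanged pairs are served from the cache and a changed document triggers regeneration. `ContextCache` and `fake_generate` are hypothetical names; the latter stands in for the LLM call, and the same keying would work for cached embeddings.

```python
import hashlib
from typing import Callable


class ContextCache:
    """In-memory cache keyed by a hash of (document, chunk)."""

    def __init__(self, generate: Callable[[str, str], str]) -> None:
        self._generate = generate
        self._store: dict[str, str] = {}
        self.misses = 0  # number of times the generator was actually called

    @staticmethod
    def _key(doc: str, chunk: str) -> str:
        digest = hashlib.sha256()
        digest.update(doc.encode("utf-8"))
        digest.update(b"\x00")  # separator so ("ab", "c") and ("a", "bc") differ
        digest.update(chunk.encode("utf-8"))
        return digest.hexdigest()

    def get(self, doc: str, chunk: str) -> str:
        key = self._key(doc, chunk)
        if key not in self._store:
            self.misses += 1
            self._store[key] = self._generate(doc, chunk)
        return self._store[key]


def fake_generate(doc: str, chunk: str) -> str:
    """Stand-in for the LLM call that writes the situating context."""
    return f"Context for a {len(chunk)}-character chunk"


cache = ContextCache(fake_generate)
cache.get("full document", "chunk one")
cache.get("full document", "chunk one")     # served from the cache
cache.get("full document v2", "chunk one")  # document changed: regenerated
print(cache.misses)  # 2
```

For large collections the dictionary could be swapped for a persistent store without changing the keying scheme.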

Performance considerations

Contextual retrieval does introduce some overhead:

  • Storage: Enhanced chunks require more storage space
  • Computation: Generating contextual information adds processing time
  • Memory: Larger embeddings and context data increase memory usage

However, the improved accuracy and user experience typically justify these costs, especially in production applications where retrieval quality directly impacts user satisfaction.
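The computation cost is easy to estimate for the approach shown earlier, where every chunk is sent to the model together with the full document. The token counts below are made up for illustration, and `prompt_overhead` is an assumed allowance for the instruction text.

```python
def contextualization_input_tokens(
    doc_tokens: int, chunk_tokens: list[int], prompt_overhead: int = 60
) -> int:
    """Total input tokens when each chunk is sent along with the whole document."""
    return sum(doc_tokens + n + prompt_overhead for n in chunk_tokens)


# Hypothetical report: 15,000 tokens split into 50 chunks of 300 tokens each
chunk_sizes = [300] * 50
total = contextualization_input_tokens(doc_tokens=15_000, chunk_tokens=chunk_sizes)
print(total)  # 768000
```

The document is re-sent once per chunk, so the cost grows with the product of document length and chunk count, which is worth checking before indexing a large collection.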

Real-world performance: CUAC FM case study results

To validate these theoretical benefits, I conducted a comprehensive evaluation using 10 years of CUAC FM annual reports. The results speak volumes about the transformative power of contextual retrieval! I performed a RAGAS evaluation of the responses against the human-created answers, and the hybrid search approach proved to be the most effective for all questions.

Naive RAG: Naive RAG report

Contextual RAG: Contextual RAG report

Dramatic performance improvements

Contextual retrieval improved all three RAG evaluation metrics reported here, with by far the largest gain in context precision:

  • Answer relevancy: +2.2% improvement (from 0.89 to 0.91)

While traditional RAG often returned tangentially related information, contextual retrieval consistently provided directly relevant answers.

  • Context precision: +33.8% boost (from 0.74 to 1.00)

Massive reduction in irrelevant context being retrieved, so much cleaner, more focused information feeds into the generator.

  • Context recall: +2.2% enhancement (from 0.91 to 0.93)

Slightly better at finding all relevant information. Contextual cues helped surface related content that traditional methods missed.
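To make the context precision figure easier to interpret, here is a simplified sketch of how a rank-aware context precision can be computed from binary relevance labels. It follows my understanding of the RAGAS definition (the mean of precision@k over the ranks that hold a relevant chunk). In RAGAS the labels come from an LLM judge, whereas here they are supplied by hand.

```python
def context_precision(relevance: list[int]) -> float:
    """relevance[k] is 1 if the chunk at rank k+1 is relevant, else 0."""
    total_relevant = sum(relevance)
    if total_relevant == 0:
        return 0.0
    hits = 0
    score = 0.0
    for k, is_relevant in enumerate(relevance, start=1):
        if is_relevant:
            hits += 1
            score += hits / k  # precision@k, counted only at relevant ranks
    return score / total_relevant


print(context_precision([1, 1, 0, 0]))  # 1.0
print(context_precision([0, 1, 0, 1]))  # 0.5
```

Under this definition, a score of 1.00 indicates that the relevant chunks were ranked ahead of the irrelevant ones.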

What these numbers mean in practice

These aren’t just abstract metrics—they translate to real-world benefits:

  • Fewer frustrated users: the roughly 34% gain in context precision means far less irrelevant material reaches the generator, so users find what they’re looking for much more often
  • Higher confidence: the boost in faithfulness means users can trust the responses more
  • Better decision-making: the improvement in correctness leads to more reliable insights from annual reports
  • Reduced manual verification: Higher precision means less time spent fact-checking AI responses

Conclusion

The CUAC FM experiment revealed several critical insights:

  • Context matters most for complex queries: Simple factual questions saw modest improvements, while complex analytical queries saw improvements of 30%+ in some cases
  • Annual report document complexity: Annual reports contain interconnected information where context is crucial—perfect testing ground for contextual retrieval
  • Compound benefits: Better retrieval feeds better generation, so gains in the individual metrics compound in the overall user experience
  • Consistency gains: Not only did average performance improve, but the variance decreased—more consistent, reliable results across different query types

Contextual retrieval represents a significant advancement in RAG system design. By preserving and leveraging contextual information throughout the retrieval process, we can build more accurate, reliable, and user-friendly AI applications.

The Python implementations shown here provide a solid foundation for incorporating contextual retrieval into your own projects. Start simple with basic context enhancement, then gradually add more sophisticated features like hierarchical context and domain-specific filtering as your needs evolve.

Remember that the key to successful contextual retrieval lies in understanding your specific use case and tailoring the contextual information to match your users’ needs and query patterns. With thoughtful implementation, contextual retrieval can transform your RAG system from a simple information lookup tool into an intelligent, context-aware assistant.
