
RAG Beyond the Basics: Advanced Retrieval Strategies for Financial Documents

Moving past naive RAG — advanced chunking, hybrid search, reranking, and evaluation strategies for building retrieval systems that work on complex financial documents.

Rohit Raj · 3 min read

Introduction

Retrieval-Augmented Generation (RAG) has become the de facto pattern for grounding LLMs in factual data. But the gap between a demo RAG and a production RAG system is enormous — especially in financial services where accuracy is non-negotiable.

This post explores the techniques that take RAG from "works in a notebook" to "trusted in production."

The Naive RAG Problem

The basic RAG pipeline is deceptively simple:

python
# Naive RAG — works in demos, fails in production
chunks = split_document(document, chunk_size=512)
embeddings = embed(chunks)
index.add(embeddings)
 
# At query time
query_embedding = embed(query)
relevant_chunks = index.search(query_embedding, top_k=5)
answer = llm.generate(query, context=relevant_chunks)

This fails on financial documents because:

  • Tables and figures are destroyed by naive chunking
  • Cross-references between sections are lost
  • Numerical precision matters — "revenue increased 2.3B" vs. "revenue increased 2.3M" is a critical distinction

Strategy 1: Semantic Chunking

Instead of fixed-size chunks, use semantic boundaries:

python
from langchain.text_splitter import RecursiveCharacterTextSplitter
 
# Financial-document-aware chunking
splitter = RecursiveCharacterTextSplitter(
    separators=[
        "\n## ",      # Major sections
        "\n### ",     # Subsections
        "\n\n",       # Paragraphs
        "\n",         # Lines
        ". ",         # Sentences
    ],
    chunk_size=1000,
    chunk_overlap=200,
)
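
To make the separator-priority idea concrete, here is a simplified pure-Python sketch of recursive splitting. It ignores chunk overlap, and `recursive_split` is an illustrative stand-in rather than the library's actual implementation:

```python
def recursive_split(text, separators, chunk_size):
    """Split on the coarsest separator first, recursing to finer
    separators only for pieces that still exceed chunk_size."""
    if len(text) <= chunk_size or not separators:
        return [text]
    sep, finer = separators[0], separators[1:]
    chunks, current = [], ""
    for i, piece in enumerate(text.split(sep)):
        part = (sep if i > 0 else "") + piece  # keep the separator text
        if len(current) + len(part) <= chunk_size:
            current += part
        else:
            if current:
                chunks.append(current)
                current = ""
            if len(part) <= chunk_size:
                current = part
            else:
                # Piece is still too large: fall back to finer separators
                chunks.extend(recursive_split(part, finer, chunk_size))
    if current:
        chunks.append(current)
    return chunks
```

Because section headings are tried before paragraphs and sentences, a table or paragraph only gets broken mid-sentence as a last resort.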

Strategy 2: Hybrid Search

Combine dense (vector) and sparse (BM25) retrieval:

score(q, d) = α · cos(q, d) + (1 − α) · BM25(q, d)

Where α is tuned per use case (typically 0.5–0.7 for financial docs).
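
A minimal sketch of the blended score. One detail the formula glosses over: cosine similarities and BM25 scores live on different scales, so some normalization (min-max here, an assumption on my part) is needed before combining them:

```python
def minmax(scores):
    """Normalize a list of scores to [0, 1] so the two scales are comparable."""
    lo, hi = min(scores), max(scores)
    if hi == lo:
        return [0.0] * len(scores)
    return [(s - lo) / (hi - lo) for s in scores]

def hybrid_scores(dense, sparse, alpha=0.6):
    """score = alpha * dense + (1 - alpha) * sparse, after normalization."""
    d, s = minmax(dense), minmax(sparse)
    return [alpha * di + (1 - alpha) * si for di, si in zip(d, s)]
```

With alpha = 0.6, a document that dominates on vector similarity can still be overtaken by one with a strong exact-keyword (BM25) match — exactly the behavior you want for ticker symbols and line-item names.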

Strategy 3: Reranking

After initial retrieval, apply a cross-encoder reranker for precision:

python
from sentence_transformers import CrossEncoder
 
reranker = CrossEncoder("cross-encoder/ms-marco-MiniLM-L-12-v2")
pairs = [(query, chunk.text) for chunk in candidates]
scores = reranker.predict(pairs)
 
# Re-sort by reranker scores
reranked = sorted(zip(candidates, scores), key=lambda x: x[1], reverse=True)

Strategy 4: Evaluation

You can't improve what you can't measure. Key metrics:

| Metric | What it measures | Target |
| --- | --- | --- |
| Recall@k | Are relevant docs in top-k? | > 0.90 |
| MRR | Is relevant doc ranked first? | > 0.75 |
| Faithfulness | Does answer match retrieved context? | > 0.95 |
| Answer relevance | Does answer address the query? | > 0.85 |
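
The retrieval metrics are simple enough to compute without a framework. A sketch over ranked result lists and per-query relevance judgments:

```python
def recall_at_k(ranked_ids, relevant_ids, k):
    """Fraction of relevant documents that appear in the top-k results."""
    hits = len(set(ranked_ids[:k]) & set(relevant_ids))
    return hits / len(relevant_ids)

def mrr(queries):
    """Mean reciprocal rank over (ranked_ids, relevant_ids) pairs:
    1/rank of the first relevant hit, averaged across queries."""
    total = 0.0
    for ranked_ids, relevant_ids in queries:
        for rank, doc_id in enumerate(ranked_ids, start=1):
            if doc_id in relevant_ids:
                total += 1.0 / rank
                break
    return total / len(queries)
```

Faithfulness and answer relevance need an LLM (or human) judge, which is where frameworks like RAGAS come in.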

Key Takeaways

  1. Chunking strategy is the highest-leverage optimization — get this right before tuning embeddings
  2. Hybrid search consistently outperforms pure vector search on financial documents
  3. Reranking adds 10-15% precision for minimal latency cost
  4. Build evaluation into your pipeline from day one — use frameworks like RAGAS

References

  • Lewis et al., "Retrieval-Augmented Generation for Knowledge-Intensive NLP Tasks" (2020)
  • Es et al., "RAGAS: Automated Evaluation of Retrieval Augmented Generation" (2023)

Written by

Rohit Raj

Senior AI Engineer @ American Express
