RAG Beyond the Basics: Advanced Retrieval Strategies for Financial Documents
Moving past naive RAG — advanced chunking, hybrid search, reranking, and evaluation strategies for building retrieval systems that work on complex financial documents.
Introduction
Retrieval-Augmented Generation (RAG) has become the de facto pattern for grounding LLMs in factual data. But the gap between a demo RAG and a production RAG system is enormous — especially in financial services where accuracy is non-negotiable.
This post explores the techniques that take RAG from "works in a notebook" to "trusted in production."
The Naive RAG Problem
The basic RAG pipeline is deceptively simple:
```python
# Naive RAG: works in demos, fails in production
chunks = split_document(document, chunk_size=512)
embeddings = embed(chunks)
index.add(embeddings)

# At query time
query_embedding = embed(query)
relevant_chunks = index.search(query_embedding, top_k=5)
answer = llm.generate(query, context=relevant_chunks)
```

This fails on financial documents because:
- Tables and figures are destroyed by naive chunking
- Cross-references between sections are lost
- Numerical precision matters — whether "revenue increased 2.3M" means millions or billions, dollars or percent, is a critical distinction
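To make the table problem concrete, here is a toy illustration (hypothetical data) of fixed-size splitting cutting a markdown table apart, so that a retrieved chunk loses its header row and row boundaries:

```python
# A small markdown table, as it might appear in a filing
table = (
    "| Quarter | Revenue |\n"
    "|---|---|\n"
    "| Q1 2024 | $2.3M |\n"
    "| Q2 2024 | $2.9M |\n"
)

# Naive fixed-size chunking, ignoring table structure
chunk_size = 40
chunks = [table[i:i + chunk_size] for i in range(0, len(table), chunk_size)]

# The split lands mid-row: the second chunk carries revenue figures
# with no header and a dangling row fragment, so a retriever can pair
# a quarter with the wrong number — or with no number at all.
```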
Strategy 1: Semantic Chunking
Instead of fixed-size chunks, use semantic boundaries:
```python
from langchain.text_splitter import RecursiveCharacterTextSplitter

# Financial-document-aware chunking
splitter = RecursiveCharacterTextSplitter(
    separators=[
        "\n## ",   # Major sections
        "\n### ",  # Subsections
        "\n\n",    # Paragraphs
        "\n",      # Lines
        ". ",      # Sentences
    ],
    chunk_size=1000,
    chunk_overlap=200,
)
```

Strategy 2: Hybrid Search
Combine dense (vector) and sparse (BM25) retrieval, fusing the two normalized scores with a weighted sum:

score(d) = α · score_dense(d) + (1 − α) · score_sparse(d)

Where α is tuned per use case (typically 0.5–0.7 for financial docs).
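As a sketch of this fusion (function and parameter names are illustrative, and it assumes both retrievers have already scored the same candidate list), min–max normalize each score list, then blend with the tuned weight:

```python
def hybrid_scores(dense_scores, sparse_scores, alpha=0.6):
    """Blend dense and sparse scores: alpha weights the dense side."""

    def normalize(scores):
        # Min-max normalize so the two score scales are comparable
        lo, hi = min(scores), max(scores)
        if hi == lo:
            return [0.0] * len(scores)
        return [(s - lo) / (hi - lo) for s in scores]

    d = normalize(dense_scores)
    s = normalize(sparse_scores)
    return [alpha * di + (1 - alpha) * si for di, si in zip(d, s)]
```

Normalization matters here: cosine similarities and BM25 scores live on very different scales, so blending raw values would let one retriever dominate regardless of α.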
Strategy 3: Reranking
After initial retrieval, apply a cross-encoder reranker for precision:
```python
from sentence_transformers import CrossEncoder

reranker = CrossEncoder("cross-encoder/ms-marco-MiniLM-L-12-v2")

pairs = [(query, chunk.text) for chunk in candidates]
scores = reranker.predict(pairs)

# Re-sort by reranker scores
reranked = sorted(zip(candidates, scores), key=lambda x: x[1], reverse=True)
```

Strategy 4: Evaluation
You can't improve what you can't measure. Key metrics:
| Metric | What it measures | Target |
|---|---|---|
| Recall@k | Are relevant docs in top-k? | > 0.90 |
| MRR | Is relevant doc ranked first? | > 0.75 |
| Faithfulness | Does answer match retrieved context? | > 0.95 |
| Answer relevance | Does answer address the query? | > 0.85 |
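The retrieval metrics in the table are simple to compute yourself; a minimal sketch (doc ids and labels are hypothetical, and frameworks like RAGAS cover the answer-side metrics) of Recall@k and MRR over a labeled query set:

```python
def recall_at_k(retrieved, relevant, k):
    """Fraction of relevant doc ids that appear in the top-k retrieved."""
    return len(set(retrieved[:k]) & set(relevant)) / len(relevant)

def mrr(retrieved_lists, relevant_sets):
    """Mean reciprocal rank of the first relevant doc across queries."""
    total = 0.0
    for retrieved, relevant in zip(retrieved_lists, relevant_sets):
        for rank, doc_id in enumerate(retrieved, start=1):
            if doc_id in relevant:
                total += 1.0 / rank
                break  # only the first relevant hit counts
    return total / len(retrieved_lists)
```

Run these on a held-out set of annotated (query, relevant docs) pairs after every pipeline change; a chunking or fusion tweak that helps Recall@k can still hurt MRR, so track both.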
Key Takeaways
- Chunking strategy is the highest-leverage optimization — get this right before tuning embeddings
- Hybrid search consistently outperforms pure vector search on financial documents
- Reranking adds 10-15% precision for minimal latency cost
- Build evaluation into your pipeline from day one — use frameworks like RAGAS
References
- Lewis et al., "Retrieval-Augmented Generation for Knowledge-Intensive NLP Tasks" (2020)
- Es et al., "RAGAS: Automated Evaluation of Retrieval Augmented Generation" (2023)
Written by
Rohit Raj
Senior AI Engineer @ American Express