Feb 10, 2026
Hybrid BM25 + dense retrieval that actually helps
Why parent-child chunking made the biggest difference for our docs RAG, and the cross-encoder reranker tuning that followed.
Tags: ai, rag, chroma
When I first wired up dense-vector search for our docs RAG, the metrics looked great on the eval set and the answers looked wrong in production. The usual story.
What moved the needle
- Parent-child chunking. We embed small chunks (for recall) but return the parent section (for synthesis). A single parent-ID field in each chunk's metadata, and the LLM stopped losing context.
- BM25 in the mix. Dense retrieval is great for paraphrase, terrible for proper nouns and version strings. A 50/50 blend of BM25 and dense, reranked with a cross-encoder, beat either alone.
- Rerank last, and cheaply. A small cross-encoder (bge-reranker-base) over the top 20 candidates is ~30ms on CPU and lifted our top-3 accuracy by ~11pp.
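The parent-child swap is mostly bookkeeping. A minimal sketch, with hypothetical shapes (`Chunk`, a `parents` map) standing in for whatever your real store uses:

```typescript
// Hypothetical schema: children carry a parentId pointing at the section
// they were cut from. Retrieval matches children; synthesis gets parents.
type Chunk = { id: string; text: string; parentId: string };

const parents = new Map<string, string>([
  ["sec-auth", "## Authentication\nFull section text used for synthesis..."],
]);

const children: Chunk[] = [
  { id: "sec-auth#0", text: "API keys go in the Authorization header.", parentId: "sec-auth" },
];

// After retrieval returns child chunks, swap each for its parent section,
// de-duplicating so the same parent isn't sent to the LLM twice.
function toParentSections(hits: Chunk[]): string[] {
  const seen = new Set<string>();
  const out: string[] = [];
  for (const hit of hits) {
    if (!seen.has(hit.parentId)) {
      seen.add(hit.parentId);
      // Fall back to the child text if the parent lookup ever misses.
      out.push(parents.get(hit.parentId) ?? hit.text);
    }
  }
  return out;
}
```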
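The 50/50 blend needs the two score lists on a comparable scale before you mix them. A sketch of one way to do it (min-max normalization, then a weighted sum; the toy score maps here stand in for real BM25 and vector-index output):

```typescript
// Squash each score list into [0, 1] so BM25's unbounded scores and
// cosine similarities can be averaged sensibly.
function normalize(scores: Map<string, number>): Map<string, number> {
  const vals = [...scores.values()];
  const min = Math.min(...vals);
  const range = Math.max(...vals) - min || 1; // avoid divide-by-zero on uniform scores
  return new Map([...scores].map(([id, s]) => [id, (s - min) / range] as [string, number]));
}

// Blend the two ranked lists; alpha = 0.5 is the 50/50 split.
// Docs found by only one retriever score 0 on the other side.
function blend(
  bm25: Map<string, number>,
  dense: Map<string, number>,
  alpha = 0.5,
): [string, number][] {
  const nb = normalize(bm25);
  const nd = normalize(dense);
  const ids = new Set([...nb.keys(), ...nd.keys()]);
  return [...ids]
    .map((id): [string, number] => [id, alpha * (nb.get(id) ?? 0) + (1 - alpha) * (nd.get(id) ?? 0)])
    .sort((a, b) => b[1] - a[1]);
}
```

The blended top-N then goes to the cross-encoder, which fixes whatever ordering mistakes the cheap fusion made.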
What didn't
- Fancy embedding fine-tuning. Not worth the ceremony at our corpus size.
- Query rewriting with the big model. Helps sometimes, costs always.
Config sketch
```js
const results = await chroma.query({
  queryTexts: [question],
  nResults: 40,
  where: { tenant: userTenant },
});
// query() returns one document list per query text, so index into the first
const reranked = await reranker.score(question, results.documents[0]);
return reranked.slice(0, 5).map(toParentChunk);
```

Nothing surprising — just the obvious thing, done carefully.