RAGGAE: A multipurpose local RAG system for Adservio

Olivier Vitrac, PhD, HDR | olivier.vitrac@adservio.fr – 2025-10-24

Summary

This note discusses the design of a generic RAG/embeddings library that can serve CVs, reports, and tenders, relying on different document adapters around a shared semantic core (retrieval + re-ranking + annotation + scoring). Hybrid retrieval (dense + sparse) followed by a cross-encoder re-ranker is proposed. The POC adds domain tuning and NLI checks, and is designed from day one for traceability (provenance spans, scores, reasons). The whole system is designed to run on minimal infrastructure: a fully local MVP on a GPU with 8 GB VRAM, and possibly on CPU alone.



1 | Technical Review

1.1 | Embedding options (and when to use which)

A. Dense text embeddings (bi-encoders) — default for RAG

B. Cross-encoders (re-rankers) — for precision at the top

C. Hybrid retrieval (dense + sparse) — when vocabulary matters

D. Domain-tuned embeddings — when your domain dominates

E. Multilingual & French

F. Long-document strategies (tenders/CVs/reports)


1.2 | “Semantic analysis” we’ll want beyond embeddings

We think of this as signals layered on top of retrieval:

These features feed your scoring/ranking (fit, risk, attractiveness) and later your form pre-fill.
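
As a concrete illustration, here is a minimal sketch of how such signals could be carried per retrieved chunk and blended into a fit score (the field names and weights are placeholders, not part of the design):

from dataclasses import dataclass

@dataclass
class ChunkSignals:
    dense_sim: float      # cosine similarity from the bi-encoder
    sparse_score: float   # normalised BM25 score
    rerank_score: float   # cross-encoder score in [0, 1]; 0.0 if not re-ranked
    nli_entailed: bool    # does the chunk entail the requirement?

def fit_score(s: ChunkSignals, w=(0.3, 0.2, 0.4, 0.1)) -> float:
    # Weighted blend of retrieval-level signals; the weights are knobs to tune.
    return (w[0] * s.dense_sim + w[1] * s.sparse_score
            + w[2] * s.rerank_score + w[3] * float(s.nli_entailed))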


1.3 | Can one library handle CVs, reports, tenders? (Yes—if you design it right)

Design a document-agnostic semantic layer with adapters:

Result: same embedding/retrieval engine, different adapters and scoring logic.
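
A minimal sketch of what such an adapter interface could look like (the names DocumentAdapter, parse, and score are assumptions, not a fixed API; the semantic core stays shared across document types):

from typing import Dict, List, Protocol

class DocumentAdapter(Protocol):
    doc_type: str
    def parse(self, path: str) -> List[Dict]: ...                      # chunks + metadata + provenance spans
    def score(self, query: str, hits: List[Dict]) -> List[Dict]: ...   # domain-specific scoring

class TenderAdapter:
    doc_type = "tender"
    def parse(self, path: str) -> List[Dict]:
        # e.g. split on lots, clauses, annexes; keep page/char spans for provenance
        raise NotImplementedError
    def score(self, query: str, hits: List[Dict]) -> List[Dict]:
        # e.g. weight mandatory clauses higher than purely informative ones
        return hits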


1.4 | Minimal technical blueprint


1.5 | Choosing an embedding setup (quick decision guide)


Absolutely feasible locally: E5-small + BM25 + optional cross-encoder, FAISS index, Ollama (7–8B Q4) for NLI/extraction.
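
For the dense + sparse combination, one common way to merge the two rankings is reciprocal rank fusion; the note does not prescribe a fusion rule, so the following is only an illustrative sketch:

def rrf(dense_ranked: list[int], sparse_ranked: list[int], k: int = 60) -> list[int]:
    # Reciprocal rank fusion: each ranking votes 1 / (k + rank) for its chunks.
    scores: dict[int, float] = {}
    for ranking in (dense_ranked, sparse_ranked):
        for rank, chunk_id in enumerate(ranking):
            scores[chunk_id] = scores.get(chunk_id, 0.0) + 1.0 / (k + rank + 1)
    return sorted(scores, key=scores.get, reverse=True)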

One generic library with adapters lets you handle tenders, CVs, and reports with the same semantic core.

1.6 | Ranking & classification for tenders


1.7 | Toward pre-filling response forms (step 2)


1.8 | Evaluation of prototype from day 1


1.9 | Practical shortlist (safe bets to prototype)


Bottom line


 

2 | Local MVP stack (FR/EN tenders, CVs, reports)

2.1 | Retrieval (dense)

2.2 | Re-ranking (cross-encoder)

2.3 | Sparse retrieval (for jargon & exact clauses)

2.4 | Vector store

2.5 | Parsers & chunking
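
As an illustration of the chunking side, a sliding-window chunker with overlap that keeps character spans for provenance (the sizes are assumptions to tune per corpus):

def chunk(text: str, size: int = 1200, overlap: int = 200) -> list[dict]:
    # Overlapping windows reduce the risk of cutting a clause across chunk boundaries.
    chunks, start = [], 0
    while start < len(text):
        end = min(start + size, len(text))
        chunks.append({"text": text[start:end], "span": (start, end)})  # provenance span
        if end == len(text):
            break
        start = end - overlap
    return chunks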

2.6 | Local NLI/extraction (for “does this clause match?” and pre-fill)


3 | Minimal pipeline (drop-in code)
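
The original drop-in snippet is not reproduced here; the following is a hedged sketch of such a pipeline, assuming intfloat/multilingual-e5-small (with its query:/passage: prefixes), a FAISS inner-product index over normalised embeddings, and rank-bm25 for the sparse side:

import faiss
import numpy as np
from rank_bm25 import BM25Okapi
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("intfloat/multilingual-e5-small")

def build(chunks: list[str]):
    # Dense index (cosine similarity via inner product on normalised vectors) + BM25.
    embs = model.encode([f"passage: {c}" for c in chunks], normalize_embeddings=True)
    index = faiss.IndexFlatIP(embs.shape[1])
    index.add(np.asarray(embs, dtype="float32"))
    bm25 = BM25Okapi([c.lower().split() for c in chunks])
    return index, bm25

def search(query: str, chunks: list[str], index, bm25, k: int = 100) -> list[dict]:
    q = model.encode([f"query: {query}"], normalize_embeddings=True)
    _, ids = index.search(np.asarray(q, dtype="float32"), k)
    dense = [int(i) for i in ids[0] if i != -1]
    sparse = [int(i) for i in np.argsort(bm25.get_scores(query.lower().split()))[::-1][:k]]
    hits = list(dict.fromkeys(dense + sparse))     # naive union; see the fusion sketch in 1.5
    return [{"id": i, "text": chunks[i]} for i in hits]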

Swap in a cross-encoder re-ranker later (e.g., jinaai/jina-reranker-v1-base-multilingual) on the hits[:100] to boost precision@5.
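
A sketch of what that swap could look like with sentence-transformers’ CrossEncoder (exact loading options depend on the chosen model):

from sentence_transformers import CrossEncoder

reranker = CrossEncoder("jinaai/jina-reranker-v1-base-multilingual", trust_remote_code=True)

def rerank(query: str, hits: list[dict], top_k: int = 5) -> list[dict]:
    # Score (query, passage) pairs jointly; keep only the best top_k for the answer context.
    pool = hits[:100]
    scores = reranker.predict([(query, h["text"]) for h in pool])
    for h, s in zip(pool, scores):
        h["rerank_score"] = float(s)
    return sorted(pool, key=lambda h: h["rerank_score"], reverse=True)[:top_k]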


4 | Using Ollama locally (NLI/extraction)
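
A hedged sketch of an NLI-style check through Ollama’s local REST API (http://localhost:11434 is Ollama’s default endpoint; the model tag is whichever 7–8B Q4 instruct model you pulled, “mistral” is only an example):

import json
import requests

MODEL = "mistral"  # example tag; any local 7-8B Q4 instruct model pulled with `ollama pull`

def entails(clause: str, evidence: str) -> dict:
    # Ask the local model whether the evidence satisfies the clause, as structured JSON.
    prompt = (
        "Does the EVIDENCE satisfy the CLAUSE? Answer as JSON with keys "
        '"entailed" (true/false) and "reason" (one sentence).\n'
        f"CLAUSE: {clause}\nEVIDENCE: {evidence}"
    )
    r = requests.post(
        "http://localhost:11434/api/generate",
        json={"model": MODEL, "prompt": prompt, "stream": False, "format": "json"},
        timeout=120,
    )
    r.raise_for_status()
    return json.loads(r.json()["response"])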


5 | What fits in 8 GB VRAM (comfortably)


6 | Can the same lib read CVs, reports, tenders? Yes — via adapters

Keep a shared semantic core and add thin adapters:

All three reuse the same: parser → chunker → embeddings → FAISS/BM25 → (optional) reranker → scorers.
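
A compact sketch of that composition, assuming the build/search/rerank helpers sketched in section 3 and the adapter interface from section 1.3 (all names are assumptions):

def answer(adapter, core, path: str, query: str) -> list[dict]:
    # Shared core does retrieval; the adapter owns parsing and domain scoring.
    chunks = adapter.parse(path)
    texts = [c["text"] for c in chunks]
    index, bm25 = core.build(texts)
    hits = core.search(query, texts, index, bm25)
    hits = core.rerank(query, hits)        # optional cross-encoder stage
    return adapter.score(query, hits)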


7 | Folder scaffold (ready to uv/pip)


8 | Early-phase eval (so you can show value next week)
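
A minimal sketch of such an early eval: recall@k and precision@5 over a handful of hand-labelled (query, relevant chunk ids) pairs, which is enough to compare dense-only vs hybrid vs re-ranked runs:

def recall_at_k(retrieved: list[int], relevant: set[int], k: int) -> float:
    return len(set(retrieved[:k]) & relevant) / max(len(relevant), 1)

def precision_at_k(retrieved: list[int], relevant: set[int], k: int = 5) -> float:
    return len(set(retrieved[:k]) & relevant) / k

# labels = [("query text", {3, 17}), ...]  -> average both metrics over the labelled queries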


TL;DR