
RAGGAE

Retrieval-Augmented Generation Generalized Architecture for Enterprise

A multipurpose local RAG system for processing and analyzing documents (tenders, CVs, reports) with semantic search, hybrid retrieval, and NLI-based compliance scoring.

Python | License: MIT | FastAPI




Overview

RAGGAE is a production-ready, modular Retrieval-Augmented Generation (RAG) system designed to run entirely on local infrastructure. It combines semantic search, hybrid (dense + sparse) retrieval, and NLI-based compliance scoring.

The system is built around a document-agnostic semantic core with pluggable adapters for different document types (PDF, DOCX, ODT, TXT, MD), which makes it suitable for tenders, CVs, reports, and similar documents.


Key Features

Fully Local: No external APIs required—runs on CPU or GPU (8GB VRAM sufficient)

🔍 Hybrid Retrieval: Dense (FAISS) + Sparse (BM25) with configurable fusion

📄 Multi-Format Support: PDF, DOCX, ODT, TXT, MD with layout-aware parsing

🎯 NLI Compliance: Automatic requirement satisfaction checking via Ollama (Mistral, Llama3)

📊 Fit Scoring: Weighted requirement verdicts with exportable audit trails (JSON, CSV)

🌐 Web UI: Modern, responsive interface for upload, index, search, and scoring

🔌 RESTful API: FastAPI backend for integration with existing workflows

🧪 Fully Tested: Comprehensive test suite with mocked NLI for CI/CD

🌍 Multilingual: FR/EN support with E5 embeddings; extensible to other languages

📦 Extensible: Pluggable document adapters, embedding providers, and scoring strategies


Architecture

System Architecture

Documents flow through the following layers, from input to output:

  1. Document Input: PDF, DOCX, TXT, ODT, MD

  2. Parsing Layer: PDF Parser (PyMuPDF); Text Loaders (DOCX/ODT/TXT/MD)

  3. Semantic Core: Embedding Provider (STBiEncoder, multilingual-e5-small); FAISS Index (inner product, cosine similarity); BM25Okapi (sparse retrieval); Hybrid Retriever (α·dense + (1-α)·sparse)

  4. Intelligence Layer: NLI Client (Ollama: Mistral/Llama3); Fit Scorer (weighted verdicts)

  5. Interface Layer: CLI Tools (index_doc, search, quickscore); FastAPI (RESTful endpoints); Web UI (HTML5 + vanilla JS)

  6. Output: Search Results (scored hits, provenance); Audit Trail (JSON/CSV export)
Component Diagram

Core Modules:

  1. embeddings.py: EmbeddingProvider, STBiEncoder

  2. index_faiss.py: FaissIndex, Record

  3. retriever.py: HybridRetriever, Hit

  4. scoring.py: FitScorer, RequirementVerdict

  5. nli_ollama.py: NLIClient, NLIResult

IO Modules:

  1. pdf.py: PDFBlock, extract_blocks

  2. textloaders.py: TextBlock, load_*

CLI:

  1. index_doc.py

  2. search.py

  3. quickscore.py

Web:

  1. demo_app.py (FastAPI)

  2. index.html

  3. script.js

  4. styles.css

The groups are linked by three "uses" relationships in the diagram.

Data Flow

Participants: User, Web UI, FastAPI, Parser, Embeddings, FAISS, BM25, Hybrid, NLI/Ollama, Scorer.

Phase 1: Indexing

  1. Upload documents: POST /upload (files). Files are saved to uploads/ and a key is returned.

  2. Submit index request: POST /index {key, index_path}.

  3. Parse documents into List[Block].

  4. Embed texts into embeddings (384-dim).

  5. Build the FAISS index and the BM25 index.

  6. Index saved. Response: {indexed: N, files: [...]}.

Phase 2: Search

  1. Enter query: POST /search {query, k}.

  2. Embed the query into a query vector.

  3. Dense search (top-K) returns dense hits.

  4. Sparse scoring returns BM25 scores.

  5. Fuse scores (α·dense + (1-α)·sparse) into ranked hits.

  6. Response: {hits: [{score, page, snippet}]}. Display results.

Phase 3: Quickscore

  1. Enter requirements: POST /quickscore {requirements, topk}.

  2. Embed each requirement into requirement vectors.

  3. Search top-K clauses to get candidate clauses.

  4. For each requirement, for each clause: NLI check (clause, req) returns {label, rationale}. If the label is Yes, stop checking further clauses.

  5. Compute fit score: weighted score (0-100).

  6. Response: {fit_score, verdicts[]}. Display verdicts + audit trail.
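The request shapes named in the data flow can be sketched with the standard library alone. This is a minimal illustration, not the project's client: it assumes the endpoints accept JSON bodies, that `requirements` is a list of strings, and the index path and query values are placeholders. The requests are built but not sent.

```python
import json
from urllib.request import Request

BASE_URL = "http://localhost:8000"  # base URL given in the API section


def post_json(path: str, payload: dict) -> Request:
    """Build (but do not send) a JSON POST request for the given endpoint."""
    return Request(
        BASE_URL + path,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Field names come from the data flow; all values are illustrative placeholders.
index_req = post_json("/index", {"key": "<key returned by /upload>", "index_path": "tender.idx"})
search_req = post_json("/search", {"query": "ISO 27001 certification", "k": 5})
score_req = post_json("/quickscore", {"requirements": ["ISO 27001 certification"], "topk": 5})
```

With the server running, each request could be sent with `urllib.request.urlopen(search_req)` and the body decoded with `json.load`.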

Project Structure


Installation

Prerequisites

Environment Setup

Environment file (env-adservio-raggae.yml):

Option 2: pip + venv

GPU Support (Optional)

If you have a CUDA-capable GPU:

Dependencies

Core:

Parsing:

Web:

Testing:


Usage

CLI Tools

1. Index Documents

Output:

Supported flags:

Output:

3. Quickscore (NLI Compliance)

Output:

Prerequisites: Ollama must be running with a model (e.g., mistral)


Web Application

Start the API Server

Access the UI

Open http://localhost:8000 in your browser.

Features:

Keyboard shortcuts:


API Endpoints

Base URL: http://localhost:8000

Health Check

Response:

Upload Documents

Single file or ZIP:

Response:

Multiple files:

Response:

Index Documents

Response:

Response:

Quickscore (NLI)

Response:

Export Quickscore


Core Concepts

Hybrid Retrieval

RAGGAE combines dense (semantic) and sparse (lexical) retrieval:

  1. Dense: Sentence-Transformers bi-encoder (e.g., E5-small) → 384-dim vectors → FAISS inner-product search

  2. Sparse: BM25 on tokenized text (exact term matching)

  3. Fusion: score = α·dense + (1-α)·sparse (default α=0.6)
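The fusion step can be sketched in a few lines. This is an illustration under stated assumptions rather than RAGGAE's actual `HybridRetriever`: the function names are hypothetical, and the min-max rescaling of both score sets is an assumption added here so that unbounded BM25 scores and cosine-like dense scores are comparable before mixing.

```python
from typing import Dict, List, Tuple


def min_max(scores: Dict[str, float]) -> Dict[str, float]:
    """Rescale scores to [0, 1]; a constant score set maps to 0.0."""
    if not scores:
        return {}
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        return {doc_id: 0.0 for doc_id in scores}
    return {doc_id: (s - lo) / (hi - lo) for doc_id, s in scores.items()}


def fuse(
    dense: Dict[str, float], sparse: Dict[str, float], alpha: float = 0.6
) -> List[Tuple[str, float]]:
    """Rank ids by alpha * dense + (1 - alpha) * sparse, highest first."""
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must be in [0, 1]")
    d, s = min_max(dense), min_max(sparse)  # assumed normalization step
    ids = set(d) | set(s)
    fused = {i: alpha * d.get(i, 0.0) + (1 - alpha) * s.get(i, 0.0) for i in ids}
    return sorted(fused.items(), key=lambda kv: kv[1], reverse=True)


# Illustrative scores: c1 wins on dense similarity, c2 on exact term matches.
dense_hits = {"c1": 0.82, "c2": 0.78, "c3": 0.60}
bm25_hits = {"c1": 1.2, "c2": 7.5, "c3": 0.0}
ranked = fuse(dense_hits, bm25_hits)  # c2 ranks first, then c1, then c3
```

Setting `alpha=1.0` reduces the ranking to dense-only order, and `alpha=0.0` to BM25-only order, which is a quick way to see what each signal contributes.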

Why hybrid?

NLI-based Compliance Checking

Natural Language Inference (NLI) determines if a clause satisfies a requirement:
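A minimal sketch of such a check, assuming an Ollama server and its `/api/generate` endpoint (default `http://localhost:11434`). The prompt wording, the function names, and the `Yes`/`Partial`/`No` label set are assumptions for illustration; only the `{label, rationale}` shape and the `Yes` label appear in this README. The code builds the request body and parses a model reply defensively; it does not contact a server.

```python
import json
from typing import Dict

ALLOWED_LABELS = {"Yes", "Partial", "No"}  # assumed label set


def build_payload(clause: str, requirement: str, model: str = "mistral") -> Dict[str, object]:
    """Build a JSON body for Ollama's /api/generate endpoint (non-streaming, JSON output)."""
    prompt = (
        "Does the clause satisfy the requirement?\n"
        f"Clause: {clause}\n"
        f"Requirement: {requirement}\n"
        'Answer as JSON: {"label": "Yes" | "Partial" | "No", "rationale": "..."}'
    )
    return {"model": model, "prompt": prompt, "stream": False, "format": "json"}


def parse_verdict(raw: str) -> Dict[str, str]:
    """Parse the generated text; fall back to 'No' when the output is unusable."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError:
        return {"label": "No", "rationale": "unparseable model output"}
    if not isinstance(data, dict):
        return {"label": "No", "rationale": "unexpected output shape"}
    label = str(data.get("label", "")).strip().capitalize()
    if label not in ALLOWED_LABELS:
        return {"label": "No", "rationale": "unknown label"}
    return {"label": label, "rationale": str(data.get("rationale", ""))}


payload = build_payload("The bidder holds ISO 27001.", "ISO 27001 certification required")
verdict = parse_verdict('{"label": "yes", "rationale": "Certification is stated."}')
```

In a live call, the generated text would be read from the `response` field of Ollama's reply and passed to `parse_verdict`. Treating malformed output as `No` is a conservative design choice: a requirement is never counted as satisfied on the strength of a reply that could not be read.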

Example:

Robustness:

Fit Scoring

Aggregate compliance across multiple requirements:
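As an illustration of weighted aggregation, the sketch below computes a 0-100 score from per-requirement verdicts. The `Verdict` class, the credit table (full credit for `Yes`, half for `Partial`, none for `No`), and the example requirements are assumptions made for this example, not the behavior of RAGGAE's `FitScorer`.

```python
from dataclasses import dataclass
from typing import List

# Assumed credit per label; unknown labels earn no credit.
CREDIT = {"Yes": 1.0, "Partial": 0.5, "No": 0.0}


@dataclass
class Verdict:
    requirement: str
    label: str
    weight: float = 1.0


def fit_score(verdicts: List[Verdict]) -> float:
    """Weighted share of satisfied requirements, scaled to 0-100."""
    total = sum(v.weight for v in verdicts)
    if total <= 0:
        return 0.0
    earned = sum(v.weight * CREDIT.get(v.label, 0.0) for v in verdicts)
    return round(100.0 * earned / total, 1)


verdicts = [
    Verdict("ISO 27001 certification", "Yes", weight=2.0),
    Verdict("On-site support", "Partial", weight=1.0),
    Verdict("24/7 helpdesk", "No", weight=1.0),
]
score = fit_score(verdicts)  # (2.0*1.0 + 1.0*0.5 + 1.0*0.0) / 4.0 * 100 = 62.5
```

Doubling the weight of a mandatory requirement, as in the first verdict, makes a miss on it cost twice as much as a miss on an optional one.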

Weights:

Document Adapters

Adapters translate document-specific formats into a unified Block abstraction:
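To make the idea concrete, here is a hypothetical adapter for plain text. The `Block` fields (`text`, `source`, `page`, `meta`) and the function name are assumptions for illustration; the project's own `PDFBlock` and `TextBlock` types may differ.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Block:
    """Hypothetical unified unit of text handed to the semantic core."""
    text: str
    source: str
    page: int = 1
    meta: Dict[str, str] = field(default_factory=dict)


def blocks_from_text(text: str, source: str) -> List[Block]:
    """Split plain text on blank lines and emit one Block per non-empty paragraph."""
    paragraphs = [p.strip() for p in text.split("\n\n") if p.strip()]
    return [
        Block(text=p, source=source, meta={"order": str(i)})
        for i, p in enumerate(paragraphs)
    ]


blocks = blocks_from_text("Lot 1: hosting.\n\nLot 2: support.", "tender.txt")
```

Because every adapter emits the same structure, the embedding, indexing, and scoring code never needs to know whether a block came from a PDF page or a Markdown paragraph.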

Future adapters (in adapters/):


Extension Points

Custom Embedding Models

Custom Scoring Strategies

Custom Document Adapters

Multi-Stage Re-Ranking

Pluggable Vector Stores


Testing

Test structure:

Mocking Ollama for CI:
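One way to keep tests independent of a running Ollama server is to patch the method that performs the network call. The sketch below uses `unittest.mock`; the `OllamaNLI` class and its `_generate` method are hypothetical stand-ins defined here for the example, not RAGGAE's `NLIClient`.

```python
import json
import unittest
from unittest.mock import patch


class OllamaNLI:
    """Hypothetical stand-in for an NLI client that talks to Ollama."""

    def _generate(self, prompt: str) -> str:
        # In a real client this would perform the HTTP call to Ollama.
        raise RuntimeError("would call a live Ollama server")

    def check(self, clause: str, requirement: str) -> dict:
        raw = self._generate(f"Clause: {clause}\nRequirement: {requirement}")
        return json.loads(raw)


class TestNLIWithoutOllama(unittest.TestCase):
    def test_check_returns_mocked_label(self):
        canned = json.dumps({"label": "Yes", "rationale": "clause states it"})
        # Replace the network-bound method with a canned reply for this test only.
        with patch.object(OllamaNLI, "_generate", return_value=canned) as fake:
            result = OllamaNLI().check("We hold ISO 27001.", "ISO 27001 required")
        self.assertEqual(result["label"], "Yes")
        fake.assert_called_once()
```

pytest collects `unittest.TestCase` classes, so a test written this way runs in CI without a model or GPU.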


Development

Code Style

Adding Documentation

All modules, classes, and public functions include docstrings:

Versioning

Semantic versioning: MAJOR.MINOR.PATCH


Performance Considerations

Embedding Speed

| Model | Dim | CPU (docs/sec) | GPU (docs/sec) | VRAM (8GB) |
|---|---|---|---|---|
| multilingual-e5-small | 384 | ~30 | ~200 | |
| multilingual-e5-base | 768 | ~15 | ~120 | |
| gte-base-en-v1.5 | 768 | ~18 | ~150 | |

Optimization:

FAISS Index Types

| Type | Search Speed | Memory | Accuracy |
|---|---|---|---|
| IndexFlatIP | Fast (exact) | High | 100% |
| IndexIVFFlat | Very fast | Medium | ~99% |
| IndexHNSWFlat | Fastest | Highest | ~98% |

When to upgrade:

NLI Latency

| Model | Quantization | Latency (per check) | VRAM |
|---|---|---|---|
| mistral:7b | Q4_K_M | ~2-3s | 4-5GB |
| llama3:8b | Q4_K_M | ~3-4s | 5-6GB |
| phi-3:mini | Q4_K_M | ~1-2s | 2-3GB |

Optimization:


Troubleshooting

CUDA Not Available

Symptom: torch.cuda.is_available() == False

Solution:

Verify:

Ollama Connection Error

Symptom: requests.exceptions.ConnectionError: Ollama not running

Solution:

NumPy broadcast_to Import Error

Symptom: AttributeError: module 'numpy' has no attribute 'broadcast_to'

Solution:

FAISS Index Dimension Mismatch

Symptom: AssertionError: d == index.d

Cause: Embedding model changed between indexing and search.
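A mismatch like this can be caught before the search call by recording the embedding model next to the index and checking it at load time. The sketch below is one possible guard, not part of RAGGAE: the `.meta.json` sidecar file and both function names are hypothetical.

```python
import json
from pathlib import Path


def save_index_meta(index_path: str, model_name: str, dim: int) -> Path:
    """Write a sidecar file recording which model and dimension built the index."""
    meta_path = Path(index_path + ".meta.json")  # hypothetical sidecar convention
    meta_path.write_text(json.dumps({"model": model_name, "dim": dim}))
    return meta_path


def check_index_meta(index_path: str, model_name: str, dim: int) -> None:
    """Raise ValueError if the current model differs from the one used at indexing."""
    meta = json.loads(Path(index_path + ".meta.json").read_text())
    if meta["model"] != model_name or meta["dim"] != dim:
        raise ValueError(
            f"index built with {meta['model']} ({meta['dim']}-dim), "
            f"but searching with {model_name} ({dim}-dim); re-index the documents"
        )
```

Calling `save_index_meta` when the index is written and `check_index_meta` before searching turns the low-level assertion into a message that names both models.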

Solution:

Web UI Not Loading

Symptom: 404 Not Found or blank page

Solution:


Contributing

Contributions are welcome! Please follow these guidelines:

  1. Fork the repository

  2. Create a feature branch: git checkout -b feature/amazing-feature

  3. Add tests for new functionality

  4. Ensure tests pass: pytest

  5. Format code: black cli/ tests/

  6. Commit: git commit -m "Add amazing feature"

  7. Push: git push origin feature/amazing-feature

  8. Open a Pull Request

Code review checklist:


License

This project is licensed under the MIT License - see the LICENSE file for details.


Authors

Dr. Olivier Vitrac, PhD, HDR


Acknowledgments

Inspirations:


Citation

If you use RAGGAE in your research or production systems, please cite:


Appendix

A. Mermaid Diagram: Module Dependency Graph

The graph groups modules into four clusters:

  1. External: sentence-transformers, faiss-cpu/gpu, rank-bm25, ollama, PyMuPDF

  2. Core: embeddings.py, index_faiss.py, retriever.py, scoring.py, nli_ollama.py

  3. IO: pdf.py, textloaders.py

  4. CLI: index_doc.py, search.py, quickscore.py, demo_app.py

Edges carry two labels: "used by" (five edges) and "provides" (seven edges).

B. Extension Roadmap

RAGGAE Roadmap (Gantt chart; weekly axis from 2025-10-05 to 2026-02-01; sections: Core, Adapters, Infra, UI/UX)

Items, in chart order:

  1. Hybrid retrieval (dense + sparse)

  2. NLI-based compliance checking

  3. Fit scoring with weights

  4. PDF + DOCX + TXT loaders

  5. FAISS embedded index

  6. Web UI (upload, search, score)

  7. Export audit trails (JSON/CSV)

  8. Cross-encoder re-ranking

  9. TenderAdapter (lots, requirements)

  10. Bulk batch processing

  11. Domain-tuned embeddings (fine-tune)

  12. CVAdapter (skills, experience)

  13. Qdrant server integration

  14. Advanced filters (date, tags)

  15. ReportAdapter (sections, tables)

  16. Persistent caching (Redis)

End of README

For questions, issues, or feature requests, please open an issue on GitHub or contact olivier.vitrac@adservio.com.