feat: ajout RAG CIM-10 avec FAISS + Ollama

Implémente un système RAG (Retrieval Augmented Generation) qui indexe
les documents de référence ATIH (CIM-10 FR 2026, Guide Métho MCO,
CCAM PMSI) et utilise Ollama (mistral-small3.2:24b) pour justifier
et valider le codage CIM-10 des diagnostics.

- Nouveaux modèles Pydantic : RAGSource, Diagnostic étendu (confidence,
  justification, sources_rag) — rétrocompatible
- Module rag_index.py : chunking des 3 PDFs, embedding sentence-camembert-large,
  index FAISS IndexFlatIP (3630 vecteurs)
- Module rag_search.py : recherche FAISS + appel Ollama avec fallback double
- Flag CLI --no-rag pour désactiver l'enrichissement RAG
- 18 nouveaux tests (88/88 passent)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
This commit is contained in:
dom
2026-02-10 17:47:08 +01:00
parent 4a12cd2676
commit 4d6fbef2b9
8 changed files with 885 additions and 4 deletions

View File

@@ -28,9 +28,24 @@ NER_MODEL = "Jean-Baptiste/camembert-ner"
NER_CONFIDENCE_THRESHOLD = 0.80
# --- Configuration RAG ---
RAG_INDEX_DIR = BASE_DIR / "data" / "rag_index"
CIM10_PDF = Path("/home/dom/ai/aivanov_CIM/cim-10-fr_2026_a_usage_pmsi_version_provisoire_111225.pdf")
GUIDE_METHODO_PDF = Path("/home/dom/ai/aivanov_CIM/guide_methodo_mco_2026_version_provisoire.pdf")
CCAM_PDF = Path("/home/dom/ai/aivanov_CIM/actualisation_ccam_descriptive_a_usage_pmsi_v4_2025.pdf")
# --- Modèles de données CIM-10 ---
class RAGSource(BaseModel):
document: str
page: Optional[int] = None
code: Optional[str] = None
extrait: Optional[str] = None
class Sejour(BaseModel):
sexe: Optional[str] = None
age: Optional[int] = None
@@ -47,6 +62,9 @@ class Sejour(BaseModel):
class Diagnostic(BaseModel):
texte: str
cim10_suggestion: Optional[str] = None
cim10_confidence: Optional[str] = None
justification: Optional[str] = None
sources_rag: list[RAGSource] = Field(default_factory=list)
class ActeCCAM(BaseModel):