feat: dictionnaire CCAM complet (8 257 codes) + index FAISS enrichi + validation actes

Phase 2 (CCAM) :
- Nouveau src/medical/ccam_dict.py : build depuis CCAM_V81.xls via xlrd, lookup 3 niveaux, validation codes
- Intégration dans l'extracteur : fallback ccam_lookup + _validate_ccam() avec alertes
- CLI : --build-ccam-dict, --rebuild-index

Phase 3 (FAISS) :
- Chunks CCAM depuis le dictionnaire JSON (priorité sur le PDF)
- Chunks CIM-10 index alphabétique (terme → code)
- Priorisation cim10_alpha dans la recherche RAG

Viewer : endpoint reprocess + bloc scripts
Tests : 8 tests CCAM + tests raisonnement RAG (161 passed)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
This commit is contained in:
dom
2026-02-11 11:41:39 +01:00
parent 9df4465fef
commit 7e69f994b0
10 changed files with 893 additions and 46 deletions

View File

@@ -74,8 +74,8 @@ def search_similar(query: str, top_k: int = 10) -> list[dict]:
raw_results.append(meta)
# Prioriser les sources CIM-10 (au moins 6 sur top_k)
cim10_results = [r for r in raw_results if r["document"] == "cim10"]
other_results = [r for r in raw_results if r["document"] != "cim10"]
cim10_results = [r for r in raw_results if r["document"] in ("cim10", "cim10_alpha")]
other_results = [r for r in raw_results if r["document"] not in ("cim10", "cim10_alpha")]
min_cim10 = min(6, len(cim10_results))
final = cim10_results[:min_cim10]
@@ -150,6 +150,7 @@ def _build_prompt(texte: str, sources: list[dict], contexte: dict, est_dp: bool
for i, src in enumerate(sources, 1):
doc_name = {
"cim10": "CIM-10 FR 2026",
"cim10_alpha": "CIM-10 Index Alphabétique 2026",
"guide_methodo": "Guide Méthodologique MCO 2026",
"ccam": "CCAM PMSI V4 2025",
}.get(src["document"], src["document"])