feat: complete CIM-10 dictionary (10,893 codes) + more robust regexes

- New module cim10_dict.py: extraction from the FAISS metadata.json, smart lookup with Unicode normalization (accents, diaereses, apostrophes)
- cim10_extractor: _lookup_cim10 uses the full dictionary, _find_dp is normalized, _find_das extended to 20 patterns (cardiac, metabolic, infectious, renal...), biology gains 6 tests (TGO/TGP, Hb, creatinine), treatments no longer limited by line count
- document_classifier: weighted scoring, classify_with_confidence(), scans 5000 chars
- CLI --build-dict to regenerate data/cim10_dict.json
- 32 new unit tests (124 total, 0 failures)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
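The Unicode-normalized lookup mentioned for cim10_dict.py could look roughly like the sketch below. This is not the module's actual code: `normalize_label`, `build_index`, and `lookup_code` are hypothetical names, and the two dictionary entries are illustrative samples rather than data from data/cim10_dict.json.

```python
import unicodedata


def normalize_label(text: str) -> str:
    """Fold case, strip accents and diaereses, and unify apostrophes (hypothetical helper)."""
    # Map typographic apostrophes and backticks to the ASCII apostrophe.
    for quote in ("\u2019", "\u2018", "`"):
        text = text.replace(quote, "'")
    # NFKD splits an accented letter into its base letter plus combining marks.
    decomposed = unicodedata.normalize("NFKD", text)
    stripped = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    # Case-fold and collapse runs of whitespace.
    return " ".join(stripped.casefold().split())


def build_index(code_to_label: dict[str, str]) -> dict[str, str]:
    """Build a normalized-label -> code index from a code -> label mapping."""
    return {normalize_label(label): code for code, label in code_to_label.items()}


def lookup_code(index: dict[str, str], term: str):
    """Return the code whose normalized label equals the normalized term, else None."""
    return index.get(normalize_label(term))


# Illustrative sample only; the real dictionary is said to hold 10,893 codes.
sample = {
    "I10": "Hypertension essentielle (primitive)",
    "J18.9": "Pneumopathie, sans précision",
}
index = build_index(sample)
```

With this index, a query typed without accents or in a different case, such as `lookup_code(index, "PNEUMOPATHIE, SANS PRECISION")`, resolves to the same entry as the accented label. Exact matching on the normalized form is the simplest option; the commit's "smart lookup" may well do more (partial or fuzzy matching).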
@@ -38,6 +38,7 @@ OLLAMA_TIMEOUT = 120
# --- Configuration RAG ---
RAG_INDEX_DIR = BASE_DIR / "data" / "rag_index"
CIM10_DICT_PATH = BASE_DIR / "data" / "cim10_dict.json"
CIM10_PDF = Path("/home/dom/ai/aivanov_CIM/cim-10-fr_2026_a_usage_pmsi_version_provisoire_111225.pdf")
GUIDE_METHODO_PDF = Path("/home/dom/ai/aivanov_CIM/guide_methodo_mco_2026_version_provisoire.pdf")
CCAM_PDF = Path("/home/dom/ai/aivanov_CIM/actualisation_ccam_descriptive_a_usage_pmsi_v4_2025.pdf")
@@ -71,6 +72,7 @@ class Diagnostic(BaseModel):
cim10_suggestion: Optional[str] = None
cim10_confidence: Optional[str] = None
justification: Optional[str] = None
raisonnement: Optional[str] = None
sources_rag: list[RAGSource] = Field(default_factory=list)