23e19e17e442b9dabf1ee150a568606e30cf9fe2
Add infrastructure for NER-first name validation without changing existing behavior. New code only, quality score remains 100/100. Step 1: Load INSEE family names (219K) and prenoms (33K) as module-level gazetteers (_INSEE_NOMS_FAMILLE, _INSEE_PRENOMS_SET) normalized uppercase without accents. Step 2: Add _run_ner_on_original_text() that runs all available NER models (EDS-Pseudo, GLiNER, CamemBERT-bio) on unmasked text and returns deduplicated NerDetection list. Step 3: Add NerDetection and NameCandidate dataclasses. Modify _extract_document_names and _extract_trackare_identity to also return NameCandidate lists with context_strength (high/medium/low) metadata. Callers updated for new return values. Step 4: Add _cross_validate_name_candidates() implementing decision matrix: high context always accepted, medium/low validated against NER confirmations, INSEE membership, and stopword filtering. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
placer tout les fichiers dans un répertoire. faire un chmod 777 install.sh pour lui donner les droits d'execution lancer ./install.sh pour lancer l'installation complete
L'installation peut prendre du temps, elle charge deux modele IA nlp. Elle crée un environement virtuel python.
Description
Languages
Python
98.2%
Batchfile
1%
PowerShell
0.5%
Shell
0.3%