Domi31tls 23e19e17e4 feat(ner-first): add NER-first architecture scaffolding (steps 1-4)
Add infrastructure for NER-first name validation without changing
existing behavior. New code only, quality score remains 100/100.

Step 1: Load INSEE family names (219K) and prenoms (33K) as
  module-level gazetteers (_INSEE_NOMS_FAMILLE, _INSEE_PRENOMS_SET)
  normalized uppercase without accents.

Step 2: Add _run_ner_on_original_text() that runs all available NER
  models (EDS-Pseudo, GLiNER, CamemBERT-bio) on unmasked text and
  returns deduplicated NerDetection list.

Step 3: Add NerDetection and NameCandidate dataclasses. Modify
  _extract_document_names and _extract_trackare_identity to also
  return NameCandidate lists with context_strength (high/medium/low)
  metadata. Callers updated for new return values.

Step 4: Add _cross_validate_name_candidates() implementing decision
  matrix: high context always accepted, medium/low validated against
  NER confirmations, INSEE membership, and stopword filtering.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-31 08:31:44 +02:00

placer tout les fichiers dans un répertoire. faire un chmod 777 install.sh pour lui donner les droits d'execution lancer ./install.sh pour lancer l'installation complete

L'installation peut prendre du temps, elle charge deux modele IA nlp. Elle crée un environement virtuel python.

Description
No description provided
Readme 247 MiB
Languages
Python 98.2%
Batchfile 1%
PowerShell 0.5%
Shell 0.3%