feat(phase2): Multi-signal NER: BDPM gazetteers, EDS confidence, safe patterns, GLiNER

Workstream 1: Integrate BDPM (5737 official medications) into the medication whitelist
Workstream 2: Contextual safe patterns (dosages mg/mL/cpr, pharmaceutical forms, same line)
Workstream 3: Real NER confidence scores (edsnlp 0.20 ner_confidence_score)
Workstream 4: GLiNER zero-shot (urchade/gliner_multi_pii-v1) as a cross-vote
Workstream 5: Silver annotation export scripts + CamemBERT-bio fine-tuning

0 leaks, 0 regressions, 18 additional false positives (FP) eliminated.
Safety: GLiNER can reject a detection only if NER confidence < 0.70.
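
The safety rule above can be sketched as a small gating function. Only the 0.70 threshold comes from the commit message; the function name `keep_entity`, the boolean `gliner_agrees`, and the way the two signals are combined are assumptions.

```python
GLINER_VETO_THRESHOLD = 0.70  # threshold stated in the commit message


def keep_entity(
    ner_score: float,
    gliner_agrees: bool,
    threshold: float = GLINER_VETO_THRESHOLD,
) -> bool:
    """Decide whether an NER detection survives the GLiNER cross-vote.

    Hypothetical combination rule: GLiNER may veto a detection only when
    the NER confidence is strictly below the threshold.
    """
    if ner_score >= threshold:
        return True  # confident NER detection: GLiNER cannot reject it
    return gliner_agrees  # low confidence: GLiNER casts the deciding vote
```

Under this rule a disagreement from GLiNER can only remove low-confidence detections, which limits the risk of the second model introducing a leak.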

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-09 12:01:46 +01:00
parent 782551c1c6
commit 26ac02b0cb
16 changed files with 6431 additions and 41 deletions


@@ -64,6 +64,12 @@ class EdsPseudoManager:
             self._nlp = edsnlp.load(path)
         else:
             self._nlp = edsnlp.load(model_id_or_path)
+        # Enable NER confidence scores (edsnlp >= 0.16)
+        try:
+            ner_pipe = self._nlp.get_pipe('ner')
+            ner_pipe.compute_confidence_score = True
+        except Exception:
+            pass  # older versions without confidence support
         self._loaded = True
     def unload(self) -> None:
@@ -100,12 +106,15 @@ class EdsPseudoManager:
                 mapped = EDS_LABEL_MAP.get(label, None)
                 if mapped is None:
                     continue
+                # Actual confidence score if available (edsnlp >= 0.16)
+                raw_score = getattr(ent._, 'ner_confidence_score', None)
+                conf = raw_score if isinstance(raw_score, float) else 1.0
                 ents.append({
                     "entity_group": label,
                     "word": ent.text,
                     "start": ent.start_char,
                     "end": ent.end_char,
-                    "score": 1.0,  # edsnlp does not provide a confidence score
+                    "score": conf,
                     "eds_mapped_key": mapped,
                 })
             out.append(ents)
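
The `isinstance(raw_score, float)` check in the hunk above falls back to 1.0 for any value that is not a builtin `float`, for example an `int`. Whether edsnlp ever returns such a value is not stated here, so the following is only a sketch of a more tolerant coercion, not the code from this commit. The helper name `coerce_confidence` is hypothetical.

```python
import math
from numbers import Real
from typing import Any


def coerce_confidence(raw_score: Any, default: float = 1.0) -> float:
    """Convert a raw NER confidence value to a float in [0, 1].

    Anything that is not a real number (None, str, bool, NaN) maps to
    `default`, mirroring the diff's fallback of 1.0.
    """
    # bool is a subclass of int, so exclude it explicitly.
    if isinstance(raw_score, bool) or not isinstance(raw_score, Real):
        return default
    value = float(raw_score)
    if math.isnan(value):
        return default
    return min(1.0, max(0.0, value))  # clamp to the valid range
```

Falling back to 1.0 is the conservative choice given the rule in the commit message: a detection without a score is treated as fully confident, so GLiNER can never reject it.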