fix: address collaborator feedback (drug-name false positives, N° venue, PDF size)

- Critical fix: whole-word search in redact_pdf_raster and redact_pdf_vector to avoid substring matching (e.g. "Luc" in "FLUCONAZOLE", "TATIN" in "ATORVASTATINE"). Applied to all name/NER kinds.
- Add regex RE_VENUE_SEJOUR for visit number / stay number ("N° venue" / "N° séjour") (BACTERIO, Trackare)
- Multiline DDN (date of birth) widened: tolerates 0-3 lines between the DDN label and the date (BACTERIO tables)
- Multiline N° venue: detection in interleaved BACTERIO tables
- Reduced raster PDF size: 150 DPI + JPEG quality 85 (was 300 DPI PNG). Average ratio: 19.5x (was 30-50x)
- Quality score maintained: 97.0/100 (grade A), 0 regressions

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
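
The whole-word fix described in the message can be illustrated with a minimal sketch. The bodies of redact_pdf_raster and redact_pdf_vector are not shown in this commit view, so the helper name `find_whole_word` and its signature are hypothetical, and the lookaround-based pattern is an assumption about one reasonable way to reject substring hits, not the project's actual code.

```python
import re


def find_whole_word(text: str, term: str) -> list[tuple[int, int]]:
    """Return (start, end) spans where `term` occurs as a whole word.

    Hypothetical helper: a match is rejected when a word character
    sits immediately before or after it.
    """
    # Lookarounds are used rather than \b so that terms starting or
    # ending with a non-word character are still handled predictably.
    pattern = re.compile(
        rf"(?<!\w){re.escape(term)}(?!\w)",
        re.IGNORECASE,
    )
    return [m.span() for m in pattern.finditer(text)]


# Drug names that merely contain a known name token are not matched.
print(find_whole_word("FLUCONAZOLE 200 mg", "Luc"))
print(find_whole_word("ATORVASTATINE 10 mg", "TATIN"))
# A genuine standalone occurrence is still found.
print(find_whole_word("Dr Luc Martin", "Luc"))
```

In Python 3, `\w` is Unicode-aware for `str` patterns, so accented letters adjacent to a token also count as word characters.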
@@ -1,18 +1,18 @@
 {
-"date": "2026-03-11T12:11:24.286697",
+"date": "2026-03-12T10:24:59.261417",
 "scores": {
 "global_score": 97.0,
 "leak_score": 100.0,
 "fp_score": 90,
 "totals": {
 "documents": 29,
-"audit_hits": 2804,
+"audit_hits": 2797,
 "name_tokens_known": 461,
 "leak_audit": 0,
 "leak_occurrences": 0,
 "leak_regex": 0,
 "leak_insee_high": 0,
-"leak_insee_medium": 568,
+"leak_insee_medium": 569,
 "fp_medical": 0,
 "fp_overmasking": 2
 }
@@ -158,7 +158,7 @@
 "leak_audit": 0,
 "leak_regex": 0,
 "leak_insee_high": 0,
-"leak_insee_medium": 18,
+"leak_insee_medium": 19,
 "fp_medical": 0,
 "fp_overmasking": 0
 },