fix: address collaborator feedback (drug-name false positives, N° venue, PDF size)

- Critical fix: whole-word search in redact_pdf_raster and redact_pdf_vector
  to avoid substring matching (e.g. "Luc" in "FLUCONAZOLE",
  "TATIN" in "ATORVASTATINE"). Applied to all name/NER kinds.
- Added regex RE_VENUE_SEJOUR for N° venue / N° séjour (visit and stay numbers; BACTERIO, Trackare)
- Multiline DDN (date of birth) matching widened: tolerates 0-3 lines between the DDN label and the date (BACTERIO tables)
- Multiline N° venue: detection in interleaved BACTERIO tables
- Reduced raster PDF size: 150 DPI + JPEG quality 85 (was 300 DPI PNG)
  Average ratio: 19.5x (was 30-50x)
- Quality score maintained: 97.0/100 (grade A), 0 regressions
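
The whole-word fix in the first bullet can be illustrated with a minimal sketch. It is not the code from redact_pdf_raster or redact_pdf_vector: the helper name `find_whole_word` is hypothetical, and it assumes the search runs over plain extracted text using Python's standard `re` module.

```python
import re
from typing import List, Tuple


def find_whole_word(term: str, text: str) -> List[Tuple[int, int]]:
    """Return (start, end) spans where `term` occurs as a whole word.

    Hypothetical helper for illustration only.
    """
    # (?<!\w) and (?!\w) reject a match that touches a letter, digit, or
    # underscore on either side. For str patterns in Python 3, \w is
    # Unicode-aware, so accented letters also count as word characters.
    pattern = re.compile(rf"(?<!\w){re.escape(term)}(?!\w)", re.IGNORECASE)
    return [m.span() for m in pattern.finditer(text)]


sample = "Patient Luc MARTIN, traitement: FLUCONAZOLE 50 mg"
print(find_whole_word("Luc", sample))   # only the standalone name matches
print("luc" in sample.lower())          # naive substring test also hits the drug name
```

Lookarounds are used here rather than `\b` so that a term ending in punctuation (for example an initial such as "J.") is still bounded correctly. Whether the project makes the same choice is not stated in the commit.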

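The 0-3 line tolerance between the DDN label and the date can be sketched as below. The pattern `RE_DDN_MULTILINE`, the helper `find_ddn_date`, and the dd/mm/yyyy date shape are assumptions made for illustration. The project's actual regex is not shown in this commit view.

```python
import re
from typing import Optional

# Hypothetical pattern: a line containing the DDN label, then up to three
# intervening lines, then a line that starts with a dd/mm/yyyy date.
RE_DDN_MULTILINE = re.compile(
    r"\bDDN\b[^\n]*\n"              # label line
    r"(?:[^\n]*\n){0,3}?"           # 0-3 lines between label and date (lazy)
    r"[ \t]*(\d{2}/\d{2}/\d{4})\b"  # date at the start of the next line
)


def find_ddn_date(text: str) -> Optional[str]:
    """Return the date found within three lines after a DDN label, if any."""
    m = RE_DDN_MULTILINE.search(text)
    return m.group(1) if m else None


# Made-up sample resembling an interleaved table: labels first, values after.
table = "DDN\nNom\nService\n01/02/1980\n"
print(find_ddn_date(table))
```

Capping the gap at three lines keeps the match local to the table, so a date that appears further down the page is not attributed to the label.
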
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-12 10:38:27 +01:00
parent eb14cd219d
commit a827d860f1
2 changed files with 105 additions and 34 deletions


@@ -1,18 +1,18 @@
{
-"date": "2026-03-11T12:11:24.286697",
+"date": "2026-03-12T10:24:59.261417",
"scores": {
"global_score": 97.0,
"leak_score": 100.0,
"fp_score": 90,
"totals": {
"documents": 29,
-"audit_hits": 2804,
+"audit_hits": 2797,
"name_tokens_known": 461,
"leak_audit": 0,
"leak_occurrences": 0,
"leak_regex": 0,
"leak_insee_high": 0,
-"leak_insee_medium": 568,
+"leak_insee_medium": 569,
"fp_medical": 0,
"fp_overmasking": 2
}
@@ -158,7 +158,7 @@
"leak_audit": 0,
"leak_regex": 0,
"leak_insee_high": 0,
-"leak_insee_medium": 18,
+"leak_insee_medium": 19,
"fp_medical": 0,
"fp_overmasking": 0
},