dom
f4a23a5f43
feat: anonymization quality (over-anonymization, PHI leaks, noise cleanup)
P0-A: French stop words + 5-character threshold for subparts + conditional sweep
P0-B: 6 new PHI patterns (DDN, Par, N Ipp, Adresse, DEMANDE, venue)
P2-C: pseudonym consistency (_find_matching_entity) + bracket fix
P1-B: text_cleaner.py: OCR sidebar, footers, vitals dedup, collapse blanks
P1-A: CRH dedup via SequenceMatcher (85% threshold)
Tests: 34 new tests (996 pass, 0 fail)
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-25 14:00:07 +01:00
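
The P1-A item mentions deduplication with SequenceMatcher at an 85% threshold. The commit page does not show the implementation, so the following is only a minimal sketch of how such a near-duplicate filter might look using Python's standard difflib. The function name dedup_blocks, its parameters, and the "keep the first occurrence" policy are assumptions made for illustration, not the repository's actual code.

```python
from difflib import SequenceMatcher


def dedup_blocks(blocks, threshold=0.85):
    """Drop blocks that are near-duplicates of an earlier kept block.

    Hypothetical helper: the name, signature, and first-occurrence-wins
    policy are illustrative assumptions, not taken from the commit.
    """
    kept = []
    for block in blocks:
        # ratio() returns a similarity score in [0, 1]; 1.0 means identical.
        is_duplicate = any(
            SequenceMatcher(None, block, prior).ratio() >= threshold
            for prior in kept
        )
        if not is_duplicate:
            kept.append(block)
    return kept


if __name__ == "__main__":
    reports = [
        "Patient admitted for chest pain on Monday.",
        "Patient admitted for chest pain on Monday",
        "Discharge planned after normal ECG.",
    ]
    for report in dedup_blocks(reports):
        print(report)
```

Comparing each block against every kept block is quadratic in the number of blocks, which is usually acceptable for the handful of reports in a single record but may need a cheaper pre-filter on larger inputs.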