From 9ad7833d21d82c538939e1e227e06a5d99973b60 Mon Sep 17 00:00:00 2001
From: Domi31tls
Date: Thu, 25 Jun 2026 22:25:01 +0200
Subject: [PATCH] docs(beta): plan 1b - wire the 7 category toggles to the engine (P1-2)
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

TDD plan for per-category gating: disabled_kinds + _CATEGORY_OF infra
(default-deny) + Tier 1 audit filter (safety carrier for the PDF),
relaxation of the residual NIR/TEL rescan, Tier 2/3 text gates
(dispatchers + selective_rescan + NER + phase-0), address burn guard,
GUI wiring of the 7 booleans. Per-category behavioral tests +
non-regression baseline. SECURITY CODE: Qwen review mandatory.

Co-Authored-By: Claude Opus 4.8 (1M context)
---
 ...-06-25-gui-v6-beta-plan-1b-gating-coeur.md | 247 ++++++++++++++++++
 1 file changed, 247 insertions(+)
 create mode 100644 docs/superpowers/plans/2026-06-25-gui-v6-beta-plan-1b-gating-coeur.md

diff --git a/docs/superpowers/plans/2026-06-25-gui-v6-beta-plan-1b-gating-coeur.md b/docs/superpowers/plans/2026-06-25-gui-v6-beta-plan-1b-gating-coeur.md
new file mode 100644
index 0000000..7f308d2
--- /dev/null
+++ b/docs/superpowers/plans/2026-06-25-gui-v6-beta-plan-1b-gating-coeur.md
@@ -0,0 +1,247 @@
+# GUI V6 beta - Plan 1b: wiring the 7 « Données à détecter » toggles to the engine
+
+> **For agentic workers:** REQUIRED SUB-SKILL: Use superpowers:subagent-driven-development or superpowers:executing-plans, task-by-task. Steps use checkbox (`- [ ]`). **SECURITY CODE: Qwen review mandatory** (spec decision P1-2).
+
+**Goal:** Make the 7 « Données à détecter » ("Data to detect") switches actually effective: unchecking a category leaves it in clear text in the output (text AND PDF) and relaxes the safety net for that category, without ever unmasking a category that was not unchecked.
+
+**Architecture:** Inline masking is scattered (3 passes, ~50 sites, no chokepoint). We carry a `disabled_kinds: set[str]` through `cfg` (already threaded everywhere) and apply a **3-tier filter**: (T1) filter the `audit` before the PDF burn, which is the safety carrier for the PDF deliverable, **default-deny**; (T2) gate the text at the dispatcher functions + `selective_rescan`; (T3) gate the multiline phase-0 blocks. Plus relaxation of the residual rescan (NIR/TEL) and an address guard. Validation net = **end-to-end behavioral tests per category**.
+
+**Tech Stack:** Python, pytest. Core file `anonymizer_core_refactored_onnx.py` (5731 lines) + `gui_v6/`.
+
+**Spec reference:** `docs/superpowers/specs/2026-06-25-gui-v6-beta-prod-design.md` (workstream D, P1-2, decisions D2/D3: no hard floor; `EMAIL/IBAN/IPP/VILLE/FAX` are not toggleable = always masked).
+
+**Category → audit kinds mapping** (`_CATEGORY_OF`, default-deny: any kind absent from the table stays masked):
+- NOM ← NOM, NOM_FORCE, NOM_GLOBAL, NOM_EXTRACTED, NOM_INITIAL, NER_PER, EDS_NOM, EDS_PRENOM
+- DATE_NAISSANCE ← DATE_NAISSANCE, DATE_NAISSANCE_GLOBAL
+- ETAB ← ETAB, ETAB_FINESS, ETAB_SPACED, ETAB_GLOBAL, NER_ORG, EDS_HOPITAL
+- ADRESSE ← ADRESSE, ADDR_FINESS, EDS_ADRESSE *(VILLE/NER_LOC always stay masked: outside the 7 toggles)*
+- NIR ← NIR
+- TEL ← TEL *(FAX always stays masked)*
+- ADHERENT ← ADHERENT
+
+---
+
+### Task 1: Infrastructure - `disabled_kinds` + `_CATEGORY_OF` + audit filter (Tier 1)
+
+**Files:** Modify `anonymizer_core_refactored_onnx.py` (add `_CATEGORY_OF`/`_category_of` near placeholders ~l.610; add `disabled_kinds` kwarg to `process_pdf` ~l.4973; inject into `cfg` after ~l.5002; add the audit filter before the PDF write ~l.5553). Test `tests/unit/test_core_category_gating.py`.
+
+- [ ] **Step 1 - Failing test (audit filter + default-deny).** Create `tests/unit/test_core_category_gating.py`:
+
+```python
+import anonymizer_core_refactored_onnx as core
+
+
+def test_category_of_maps_known_kinds():
+    assert core._category_of("NOM_FORCE") == "NOM"
+    assert core._category_of("NER_PER") == "NOM"
+    assert core._category_of("EDS_HOPITAL") == "ETAB"
+    assert core._category_of("ADDR_FINESS") == "ADRESSE"
+    assert core._category_of("NIR") == "NIR"
+    assert core._category_of("TEL") == "TEL"
+    assert core._category_of("ADHERENT") == "ADHERENT"
+
+
+def test_category_of_default_deny_for_unknown():
+    # An unmapped kind must NEVER be filterable (it stays masked). Safety.
+    assert core._category_of("EMAIL") is None
+    assert core._category_of("IBAN") is None
+    assert core._category_of("VILLE") is None
+    assert core._category_of("FAX") is None
+    assert core._category_of("INCONNU_XYZ") is None
+
+
+def test_filter_audit_drops_only_disabled_categories():
+    PiiHit = core.PiiHit
+    audit = [
+        PiiHit(1, "NOM", "Dupont", "[NOM]"),
+        PiiHit(1, "NIR", "1850574...", "[NIR]"),
+        PiiHit(1, "EMAIL", "x@y.fr", "[EMAIL]"),
+    ]
+    kept = core._filter_audit_by_disabled(audit, {"NIR"})
+    kinds = {h.kind for h in kept}
+    assert "NIR" not in kinds      # NIR unchecked → removed
+    assert "NOM" in kinds          # not unchecked → kept
+    assert "EMAIL" in kinds        # not toggleable → always kept
+```
+
+- [ ] **Step 2 - Run, expect FAIL** (`_category_of`/`_filter_audit_by_disabled` absent): `.venv/bin/pytest tests/unit/test_core_category_gating.py -v`.
+
+- [ ] **Step 3 - Implement.** In `anonymizer_core_refactored_onnx.py`, after the `PLACEHOLDERS`/`CRITICAL_PII_KEYS` block (~l.610), add:
+
+```python
+# --- Per-category gating (GUI « Données à détecter » toggles) --------------
+# Maps each audit kind to one of the 7 toggleable categories. Any kind
+# ABSENT from this table is NOT filterable (default-deny → stays masked).
+# Non-toggleable categories (EMAIL/IBAN/IPP/VILLE/FAX/…) are not listed here.
+_CATEGORY_OF: dict[str, str] = {
+    "NOM": "NOM", "NOM_FORCE": "NOM", "NOM_GLOBAL": "NOM",
+    "NOM_EXTRACTED": "NOM", "NOM_INITIAL": "NOM",
+    "NER_PER": "NOM", "EDS_NOM": "NOM", "EDS_PRENOM": "NOM",
+    "DATE_NAISSANCE": "DATE_NAISSANCE", "DATE_NAISSANCE_GLOBAL": "DATE_NAISSANCE",
+    "ETAB": "ETAB", "ETAB_FINESS": "ETAB", "ETAB_SPACED": "ETAB",
+    "ETAB_GLOBAL": "ETAB", "NER_ORG": "ETAB", "EDS_HOPITAL": "ETAB",
+    "ADRESSE": "ADRESSE", "ADDR_FINESS": "ADRESSE", "EDS_ADRESSE": "ADRESSE",
+    "NIR": "NIR",
+    "TEL": "TEL",
+    "ADHERENT": "ADHERENT",
+}
+
+
+def _category_of(kind: str) -> str | None:
+    """Toggleable category of an audit kind, or None if it is not toggleable."""
+    return _CATEGORY_OF.get(kind)
+
+
+def _filter_audit_by_disabled(audit: list, disabled_kinds: set) -> list:
+    """Drop audit hits whose category is disabled (default-deny)."""
+    if not disabled_kinds:
+        return audit
+    return [h for h in audit if _category_of(h.kind) not in disabled_kinds]
+```
+
+Add the kwarg to `process_pdf` (signature ~l.4973-4987): append `disabled_kinds: set | None = None,`. After `cfg = load_dictionaries(config_path)` (~l.5002), add:
+```python
+    cfg["disabled_kinds"] = set(disabled_kinds or ())
+```
+Before the PDF-writing block (~l.5553, right before `if make_vector_redaction:`), add:
+```python
+    # Tier 1: remove the categories disabled by the user from the PDF deliverable.
+    anon.audit = _filter_audit_by_disabled(anon.audit, cfg.get("disabled_kinds") or set())
+```
+(Adapt `anon.audit` to the actual audit variable name at that point: read the surrounding code; it is the list of `PiiHit` passed to `redact_pdf_vector`/`redact_pdf_raster`.)
+
+- [ ] **Step 4 - Run, expect PASS:** `.venv/bin/pytest tests/unit/test_core_category_gating.py -v`.
+- [ ] **Step 5 - Non-regression:** `.venv/bin/pytest tests/unit/ -q` (expect prior count, 0 regression: the default `disabled_kinds=None` ⇒ no behavior change).
+- [ ] **Step 6 - Commit:** `git add anonymizer_core_refactored_onnx.py tests/unit/test_core_category_gating.py && git commit -m "feat(core): per-category gating infra + Tier 1 audit filter (P1-2)"`
+
+---
+
+### Task 2: Relaxing the residual rescan (NIR/TEL) - safety coupling D3
+
+**Files:** Modify `anonymizer_core_refactored_onnx.py` (`_residual_pii_patterns` ~l.5453-5458 + INSEE-names branch ~l.5470-5490). Test `tests/unit/test_core_category_gating.py` (extend).
+
+- [ ] **Step 1 - Failing test.** Add to `tests/unit/test_core_category_gating.py` a test that the residual-pattern builder skips NIR/TEL when disabled. First read the code around l.5449-5519 to expose the pattern-building as a testable helper `_build_residual_patterns(disabled_kinds)` (refactor the inline list into this helper). Test:
+
+```python
+def test_residual_patterns_skip_disabled_nir_tel():
+    labels_all = {lbl for _pat, lbl in core._build_residual_patterns(set())}
+    assert {"NIR", "EMAIL", "IBAN", "TEL"} <= labels_all
+    labels_no_nir = {lbl for _pat, lbl in core._build_residual_patterns({"NIR"})}
+    assert "NIR" not in labels_no_nir
+    assert "EMAIL" in labels_no_nir and "IBAN" in labels_no_nir  # non-toggleable kinds remain
+    labels_no_tel = {lbl for _pat, lbl in core._build_residual_patterns({"TEL"})}
+    assert "TEL" not in labels_no_tel
+```
+
+- [ ] **Step 2 - Run, expect FAIL.**
+- [ ] **Step 3 - Implement.** Refactor the inline `_residual_pii_patterns` (~l.5453-5458) into a module function `_build_residual_patterns(disabled_kinds: set) -> list[tuple]` that always includes EMAIL+IBAN, includes NIR only if `"NIR" not in disabled_kinds`, includes TEL only if `"TEL" not in disabled_kinds`. Call it in the residual check with `cfg.get("disabled_kinds") or set()`. Gate the opt-in INSEE-names branch (~l.5470) additionally under `"NOM" not in disabled`.
+- [ ] **Step 4 - Run, expect PASS.**
+- [ ] **Step 5 - Non-regression:** `.venv/bin/pytest tests/unit/ -q`.
+- [ ] **Step 6 - Commit:** `git commit -m "feat(core): relax the residual rescan for unchecked NIR/TEL (P1-2/D3)"`
+
+---
+
+### Task 3: Text gates (Tier 2 + Tier 3) - detection passes + selective_rescan
+
+**Files:** Modify `anonymizer_core_refactored_onnx.py` at the dispatcher sites listed below. Test `tests/unit/test_core_category_gating_behavior.py` (behavioral, end-to-end on `anonymise_document_regex`).
+
+**Sites to gate** (read each site before editing; pattern: fetch `disabled = cfg.get("disabled_kinds") or set()` at the top of the function, then skip the category's `.sub`/`PiiHit` sub-block if it is disabled):
+`_mask_line_by_regex` (~1670), `_kv_value_only_mask` (~2110, incl. NOM/label subs 2098-2106), uppercase-PERSON block (~1942-2008 → NOM), `_apply_extracted_names` (~2809 → early-return `text` unchanged if NOM is disabled), `_mask_with_hf` (~3136 → per placeholder NOM/ETAB/ADRESSE), `_mask_with_eds_pseudo` (~3208 → same, via EDS_LABEL_MAP), `selective_rescan` (~4159 → DATE_NAISSANCE 4203, ADRESSE 4205-4207, ETAB 4229-4251, ADHERENT 4200-4201, TEL 4191-4193, NIR 4187-4188), multiline phase-0 blocks DATE_NAISSANCE (~3014) / NIR (~3034).
+
+- [ ] **Step 1 - Failing behavioral tests.** Create `tests/unit/test_core_category_gating_behavior.py`. For each of the 7 categories, build a minimal `pages_text` containing a clear instance of that category + one instance of a DIFFERENT category, run `anonymise_document_regex(pages_text, [], cfg)` with the category disabled, and assert: the disabled category's value is PRESENT (in clear text) in the output, AND the other category is still masked. Example (NIR + TEL); adapt the others by reading the real regexes for realistic inputs:
+
+```python
+import anonymizer_core_refactored_onnx as core
+
+
+def _cfg(disabled):
+    cfg = core.load_dictionaries(None)
+    cfg["disabled_kinds"] = set(disabled)
+    return cfg
+
+
+def test_disabling_nir_leaves_nir_clear_but_masks_tel():
+    pages = ["NIR : 1 85 05 74 123 456 78\nTél : 05 59 12 34 56"]
+    out, _audit = core.anonymise_document_regex(pages, [], _cfg({"NIR"}))[:2]
+    text = "\n".join(out) if isinstance(out, list) else str(out)
+    assert "1 85 05 74 123 456 78" in text       # NIR unchecked → in clear text
+    assert "05 59 12 34 56" not in text          # TEL not unchecked → masked
+
+
+def test_all_enabled_is_unchanged_baseline():
+    pages = ["NIR : 1 85 05 74 123 456 78"]
+    out, _audit = core.anonymise_document_regex(pages, [], _cfg(set()))[:2]
+    text = "\n".join(out) if isinstance(out, list) else str(out)
+    assert "1 85 05 74 123 456 78" not in text   # everything enabled → masked (non-regression)
+```
+
+(Write one analogous test per category: NOM, DATE_NAISSANCE, ETAB, ADRESSE, ADHERENT, using inputs that the real regexes detect. Read the regex definitions to craft valid inputs. Verify the exact return shape of `anonymise_document_regex` first.)
+
+- [ ] **Step 2 - Run, expect FAIL** (categories still masked because text-gates absent).
+- [ ] **Step 3 - Implement** the gates at each site above. Apply the same guard everywhere: when `"CAT" in disabled`, skip that category's sub-block. Work site by site; after each, re-run the behavioral test for that category.
+- [ ] **Step 4 - Run, expect ALL PASS** (7 category tests + baseline).
+- [ ] **Step 5 - Non-regression + quality gate:** `.venv/bin/pytest tests/unit/ -q` and `.venv/bin/python scripts/evaluate_quality.py` (score must stay A+ with defaults; the synthetic regression gate must pass).
+- [ ] **Step 6 - Commit:** `git commit -m "feat(core): per-category text gates (Tier 2/3) + selective_rescan (P1-2)"`
+
+---
+
+### Task 4: Address guard in the PDF burn (`_search_pdf_address_lines`)
+
+**Files:** Modify `anonymizer_core_refactored_onnx.py` (~l.4572 vector, ~l.4744 raster; `_search_pdf_address_lines` is called independently of audit). Test: extend behavioral test (or a focused unit test on the redact function with ADRESSE disabled).
+
+- [ ] **Step 1 - Failing test:** assert that when ADRESSE is disabled, the independent address-line search is skipped (so addresses aren't burned). Read `redact_pdf_vector`/`redact_pdf_raster` to find how `disabled_kinds` reaches them (pass `cfg["disabled_kinds"]` or the set as a param; the functions already receive `cfg` or can).
+- [ ] **Step 2 - Run, expect FAIL.**
+- [ ] **Step 3 - Implement:** guard both `_search_pdf_address_lines(page)` calls with `if "ADRESSE" not in disabled_kinds:`.
+- [ ] **Step 4 - Run, expect PASS.**
+- [ ] **Step 5 - Non-regression:** `.venv/bin/pytest tests/unit/ -q`.
+- [ ] **Step 6 - Commit:** `git commit -m "feat(core): address guard in the PDF burn when the category is unchecked (P1-2)"`
+
+---
+
+### Task 5: GUI wiring - 7 booleans → engine
+
+**Files:** Modify `gui_v6/config_state.py` (7 bool fields + map to `disabled_kinds`), `gui_v6/engine_bridge.py` (`EngineSettings` + `build_engine_kwargs`), `gui_v6/tabs/tab_config.py` (the 7 `_mini_toggle` ~l.351-357 → `variable`+`command` on `ConfigState`). Tests `tests/unit/test_gui_v6_category_toggles.py`.
+
+- [ ] **Step 1 - Failing test.** Create `tests/unit/test_gui_v6_category_toggles.py`:
+
+```python
+from gui_v6.config_state import ConfigState
+
+
+def test_default_all_categories_enabled_means_no_disabled_kinds():
+    es = ConfigState().to_engine_settings()
+    assert es.disabled_kinds == frozenset()
+
+
+def test_unchecking_nir_and_etab_propagates_as_disabled_kinds():
+    cs = ConfigState()
+    cs.mask_nir = False
+    cs.mask_etab = False
+    es = cs.to_engine_settings()
+    assert es.disabled_kinds == frozenset({"NIR", "ETAB"})
+
+
+def test_build_engine_kwargs_passes_disabled_kinds():
+    from gui_v6.engine_bridge import EngineSettings, build_engine_kwargs
+    es = EngineSettings(disabled_kinds=frozenset({"TEL"}))
+    kwargs = build_engine_kwargs(es)
+    assert kwargs["disabled_kinds"] == frozenset({"TEL"})
+```
+
+- [ ] **Step 2 - Run, expect FAIL.**
+- [ ] **Step 3 - Implement.**
+  - `gui_v6/config_state.py`: add 7 bool fields (default True): `mask_noms, mask_ddn, mask_etab, mask_adresse, mask_nir, mask_tel, mask_adherent`. In `to_engine_settings`, build `disabled_kinds = frozenset(cat for enabled, cat in [(self.mask_noms,"NOM"),(self.mask_ddn,"DATE_NAISSANCE"),(self.mask_etab,"ETAB"),(self.mask_adresse,"ADRESSE"),(self.mask_nir,"NIR"),(self.mask_tel,"TEL"),(self.mask_adherent,"ADHERENT")] if not enabled)` and pass it to `EngineSettings`.
+  - `gui_v6/engine_bridge.py`: add `disabled_kinds: frozenset = frozenset()` to `EngineSettings`; in `build_engine_kwargs`, add `kwargs["disabled_kinds"] = settings.disabled_kinds`.
+  - `gui_v6/tabs/tab_config.py`: wire each of the 7 `_mini_toggle` to a `ctk.BooleanVar` bound to the matching `ConfigState` field with a `command` that writes it back. (Read the current `_mini_toggle` signature; follow the existing pattern used by other wired toggles in this tab.)
+- [ ] **Step 4 - Run, expect PASS** + `.venv/bin/python Pseudonymisation_Gui_V6.py --self-test`.
+- [ ] **Step 5 - GUI non-regression:** `.venv/bin/pytest tests/unit/ -k gui_v6 -q`.
+- [ ] **Step 6 - Commit:** `git commit -m "feat(gui): wire the 7 category toggles to the engine (P1-2)"`
+
+---
+
+## Self-review (coverage of spec P1-2 + map)
+- T1 audit filter (Task 1) · rescan relax NIR/TEL (Task 2) · text gates incl. selective_rescan + NER paths + phase-0 (Task 3) · address burn guard (Task 4) · GUI wiring (Task 5). ✓
+- Default-deny verified (Task 1 test `EMAIL/IBAN/VILLE/FAX → None`). EMAIL/IBAN/IPP/VILLE/FAX always masked. ✓
+- "Everything enabled = non-regression" baseline tested (Task 3) + `evaluate_quality` A+ gate. ✓
+- **Risk**: a forgotten text site ⇒ the category stays masked in the text (a red test catches it) but NEVER a cross-category leak (default-deny). The PDF deliverable is guaranteed by T1 (audit filter) alone.
+- **Mandatory Qwen review** on Tasks 1-4 (security core) before execution/after implementation.
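
Review note on the default-deny claim in the self-review: the property "no cross-category leak" can be checked exhaustively, since 7 toggles give only 128 combinations. The sketch below is a standalone illustration, not the engine code. `_CATEGORY_OF` is copied from Task 1, while the `PiiHit` stand-in (its field names are an assumption; only `.kind` matters to the filter), the `ALWAYS_MASKED` list and `check_default_deny` are hypothetical names introduced for this example.

```python
from collections import namedtuple
from itertools import combinations

# Stand-in for the engine's PiiHit. Field names are assumed for illustration;
# the Tier 1 filter in the plan only reads `.kind`.
PiiHit = namedtuple("PiiHit", "page kind original placeholder")

# Copied from the Task 1 mapping (kind -> toggleable category).
_CATEGORY_OF = {
    "NOM": "NOM", "NOM_FORCE": "NOM", "NOM_GLOBAL": "NOM",
    "NOM_EXTRACTED": "NOM", "NOM_INITIAL": "NOM",
    "NER_PER": "NOM", "EDS_NOM": "NOM", "EDS_PRENOM": "NOM",
    "DATE_NAISSANCE": "DATE_NAISSANCE", "DATE_NAISSANCE_GLOBAL": "DATE_NAISSANCE",
    "ETAB": "ETAB", "ETAB_FINESS": "ETAB", "ETAB_SPACED": "ETAB",
    "ETAB_GLOBAL": "ETAB", "NER_ORG": "ETAB", "EDS_HOPITAL": "ETAB",
    "ADRESSE": "ADRESSE", "ADDR_FINESS": "ADRESSE", "EDS_ADRESSE": "ADRESSE",
    "NIR": "NIR",
    "TEL": "TEL",
    "ADHERENT": "ADHERENT",
}

TOGGLEABLE = sorted(set(_CATEGORY_OF.values()))
# Kinds the plan says must stay masked whatever the toggles are.
ALWAYS_MASKED = ["EMAIL", "IBAN", "IPP", "VILLE", "FAX", "NER_LOC"]


def filter_audit_by_disabled(audit, disabled):
    """Same logic as the plan's _filter_audit_by_disabled (default-deny)."""
    if not disabled:
        return audit
    return [h for h in audit if _CATEGORY_OF.get(h.kind) not in disabled]


def check_default_deny():
    """Try every toggle combination; return how many were checked."""
    all_kinds = list(_CATEGORY_OF) + ALWAYS_MASKED
    audit = [PiiHit(1, kind, "x", f"[{kind}]") for kind in all_kinds]
    checked = 0
    for size in range(len(TOGGLEABLE) + 1):
        for combo in combinations(TOGGLEABLE, size):
            disabled = set(combo)
            kept = {h.kind for h in filter_audit_by_disabled(audit, disabled)}
            # Non-toggleable kinds survive every combination.
            assert set(ALWAYS_MASKED) <= kept
            # A mapped kind is dropped if and only if its category is disabled.
            for kind, cat in _CATEGORY_OF.items():
                assert (kind in kept) == (cat not in disabled)
            checked += 1
    return checked


if __name__ == "__main__":
    print(check_default_deny(), "toggle combinations checked")
```

A check of this shape could sit next to the Task 1 unit tests as a parametrized pytest case. It only covers the Tier 1 audit filter, so the text gates of Task 3 still need their own behavioral tests.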