# QW Suite Mai — Implementation Plan

> **For agentic workers:** REQUIRED SUB-SKILL: Use superpowers:subagent-driven-development (recommended) or superpowers:executing-plans to implement this plan task-by-task. Steps use checkbox (`- [ ]`) syntax for tracking.

**Goal:** Ship 3 quick wins (QW1 multi-monitor, QW2 LoopDetector, QW4 hybrid safety_checks) on the `feature/qw-suite-mai` branch before and around the GHT demo, using lightweight TDD, with no regression on existing behavior.

**Architecture:** Three isolated server modules (`monitor_router.py`, `loop_detector.py`, `safety_checks_provider.py`), plus an extension of the `pause_for_human` DSL, plus enrichment of the Agent V1 client (multi-monitor capture) and of the VWB frontend (`PauseDialog` + `PropertiesPanel` extension). Everything backward compatible, with env-var kill switches on QW2 and QW4.

**Tech Stack:** Python 3.12 (FastAPI/Uvicorn server), local Ollama (`medgemma:4b` for safety_checks), the already-loaded CLIP embedder (reused for LoopDetector signal A), `mss` + `screeninfo` for multi-monitor capture, React + Vite + TypeScript for the VWB frontend, pytest for server tests, a scripted manual test for the frontend.
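The `pause_for_human` DSL extension mentioned above could look like the sketch below. This is illustrative only: the parameter names `safety_level` and `safety_checks` are taken from the editor fields this plan adds to `PropertiesPanel`, but their exact shape and allowed values are not confirmed by this document.

```python
# Hypothetical shape of an extended pause_for_human action (QW4).
# Only `message` exists in the current DSL; everything else below is a
# labeled assumption, not the spec.
pause_action = {
    "type": "pause_for_human",
    "parameters": {
        "message": "Verify the UHCD record before continuing",
        "safety_level": "strict",  # e.g. "off" | "standard" | "strict" (assumed)
        "safety_checks": [  # checks the operator acknowledges in PauseDialog
            {"label": "Patient identity matches", "source": "declarative"},
            {"label": "No duplicate record open", "source": "llm"},
        ],
    },
}


def pending_checks(action: dict) -> list[str]:
    """Return the labels of the checks the operator must acknowledge."""
    return [c["label"] for c in action["parameters"].get("safety_checks", [])]
```

With this shape, the frontend only needs the label list to render a checklist; the `source` field distinguishes declarative checks from LLM-generated ones.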
**Spec source:** `docs/superpowers/specs/2026-05-05-qw-suite-mai-design.md`

---

## File Structure

| File | Action | Responsibility |
|---|---|---|
| `agent_v0/server_v1/monitor_router.py` | Create | Target-monitor resolution (QW1) |
| `tests/unit/test_monitor_router.py` | Create | QW1 router unit tests |
| `tests/integration/test_grounding_offset.py` | Create | QW1 offset tests |
| `core/execution/input_handler.py` | Modify | Per-monitor capture + propagated offsets (QW1) |
| `agent_v0/agent_v1/vision/capturer.py` | Modify | Enrich events with `monitor_index` + `monitors_geometry` (QW1 client) |
| `agent_v0/deploy/windows_client/agent_v1/vision/capturer.py` | Modify | Same (deployed Windows copy) |
| `agent_v0/server_v1/loop_detector.py` | Create | Composite loop detector (QW2) |
| `tests/unit/test_loop_detector.py` | Create | QW2 unit tests |
| `tests/integration/test_loop_detector_replay.py` | Create | QW2 integration tests |
| `agent_v0/server_v1/replay_engine.py` | Modify | `_create_replay_state` extension (QW2) + `pause_for_human` hook (QW4) |
| `agent_v0/server_v1/api_stream.py` | Modify | Router hook (QW1) + loop_detector hook (QW2) + `/replay/resume` extension (QW4) |
| `agent_v0/server_v1/safety_checks_provider.py` | Create | Hybrid declarative + contextual-LLM provider (QW4) |
| `tests/unit/test_safety_checks_provider.py` | Create | QW4 unit tests |
| `tests/integration/test_replay_resume_acknowledgments.py` | Create | QW4 integration tests |
| `visual_workflow_builder/frontend_v4/src/types.ts` | Modify | Extend `PauseAction.parameters` + `Execution` types (QW4) |
| `visual_workflow_builder/frontend_v4/src/components/PauseDialog.tsx` | Create | Pause component + ChecklistPanel (QW4 UX) |
| `visual_workflow_builder/frontend_v4/src/components/PropertiesPanel.tsx` | Modify | `safety_level` + `safety_checks` editor (QW4) |
| `docs/QW_SUITE_MAI.md` | Create | Concise delivery doc |
| `MEMORY.md` (~/.claude/projects/.../) | Modify | Link to the QW docs |
| `.qw-baseline.log` | Create | E2E baseline log (gitignored) |

---

## Section 0 — Preflight & Baseline

### Task 1: Create the backup branch and push it to Gitea

**Files:**
- No file changes, pure git operation

- [ ] **Step 1: Create the backup tag locally**
```bash
git tag -a backup-pre-qw-suite-mai-2026-05-05 -m "Backup before the QW Suite Mai 2026 sprint (multi-monitor + LoopDetector + safety_checks)"
```
- [ ] **Step 2: Create the backup branch from HEAD (current state of feature/qw-suite-mai, right after the spec commit)**
```bash
git branch backup/pre-qw-suite-mai-2026-05-05
```
- [ ] **Step 3: Push the branch and the tag to Gitea**
```bash
git push gitea backup/pre-qw-suite-mai-2026-05-05
git push gitea backup-pre-qw-suite-mai-2026-05-05
```
Expected output: `* [new branch] backup/pre-qw-suite-mai-2026-05-05 -> backup/pre-qw-suite-mai-2026-05-05`, and the same for the tag.
- [ ] **Step 4: Verify they are present on Gitea**
```bash
git ls-remote gitea | grep -E "(backup/pre-qw|backup-pre-qw)"
```
Expected: 2 lines (the branch and the tag).

### Task 2: Capture the E2E baseline before any change

**Files:**
- Create: `.qw-baseline.log` (gitignored; add it to `.gitignore` if missing)

- [ ] **Step 1: Verify that `.qw-baseline.log` is gitignored**
```bash
grep -E "^\.qw-baseline\.log" .gitignore || echo ".qw-baseline.log" >> .gitignore
```
- [ ] **Step 2: Activate the venv and run the reference suite**
```bash
source venv_v3/bin/activate
pytest tests/test_pipeline_e2e.py \
  tests/test_phase0_integration.py \
  tests/integration/test_stream_processor.py \
  -q 2>&1 | tee .qw-baseline.log
```
Expected: a log whose final line looks like `XXX passed in YY.YYs`, or a mix of `passed/failed/skipped`. This log becomes the **absolute reference** for non-regression.
- [ ] **Step 3: Extract the final counters for quick future comparison**
```bash
tail -3 .qw-baseline.log
```
Note down (or copy into a PR comment): number of passed / failed / skipped.

### Task 3: Smoke-test the existing Easily Assure demo workflow

**Files:** none (observable manual test)

- [ ] **Step 1: Start the full stack if it is not already running**
```bash
./svc.sh status
# If streaming/vwb-backend/vwb-frontend are down:
./svc.sh start
```
- [ ] **Step 2: Open VWB in the browser**
URL: `http://localhost:3002`, or via the reverse proxy `https://vwb.labs.laurinebazin.design`
- [ ] **Step 3: Select an existing workflow validated on Easily Assure**
Pick a workflow already demoed on 30/04 (see the `reference_demo_ght_mockup.md` memory). Ideally a complete UHCD record.
- [ ] **Step 4: Launch the replay and watch**
Click "→ Windows" to execute on Agent V1. Verify that the replay runs to completion with no visible error (clicks land in the right place, forms get filled).
- [ ] **Step 5: Archive a capture of the final state in /tmp**
```bash
# Capture the final screen if possible (on the target machine)
# Otherwise: record the smoke-test date and observation in .qw-baseline.log
echo "smoke_easily_assure: OK ($(date -Iseconds))" >> .qw-baseline.log
```

### Task 4: Verify the state of the VWB frontend ("all good" state)

**Files:** none

- [ ] **Step 1: Load a workflow in VWB**
In the browser: open an existing workflow that contains at least one `pause_for_human` action (see `types.ts:46` and `PropertiesPanel.tsx:1356`).
- [ ] **Step 2: Click the pause action in the canvas → check the properties editor**
The `PropertiesPanel` must show an editable `message` field. If the editor opens and you can type in it, it is OK.
- [ ] **Step 3: Take a "VWB OK" screenshot**
Keep this capture as a visual reference. It will be compared after the QW4 commit to verify zero UI regression.
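The baseline counters captured in Task 2 can be compared mechanically rather than noted mentally. A minimal sketch, assuming the standard pytest `-q` summary format (`N passed, M failed, K skipped in S.SSs`) on the last line of `.qw-baseline.log`:

```python
import re


def parse_pytest_summary(line: str) -> dict:
    """Extract passed/failed/skipped counts from a pytest summary line.

    Missing statuses default to 0, so two runs can be compared with a
    plain dict equality check.
    """
    counts = {"passed": 0, "failed": 0, "skipped": 0}
    for n, status in re.findall(r"(\d+) (passed|failed|skipped)", line):
        counts[status] = int(n)
    return counts
```

For example, `parse_pytest_summary("212 passed, 3 skipped in 41.07s")` returns `{"passed": 212, "failed": 0, "skipped": 3}`; comparing that dict against the one parsed from a post-change run makes the non-regression check a one-liner.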
---

## Section 1 — QW1 Multi-monitor

### Task 5: Unit tests `test_monitor_router.py` (red)

**Files:**
- Create: `tests/unit/test_monitor_router.py`

- [ ] **Step 1: Create the test file with the 4 cases**
```python
# tests/unit/test_monitor_router.py
"""Unit tests for MonitorRouter (QW1)."""
import pytest

from agent_v0.server_v1.monitor_router import resolve_target_monitor, MonitorTarget

# Reference geometry for the tests: 2 monitors side by side
TWO_MONITORS = [
    {"idx": 0, "x": 0, "y": 0, "w": 1920, "h": 1080, "primary": True},
    {"idx": 1, "x": 1920, "y": 0, "w": 1920, "h": 1080, "primary": False},
]


def test_resolve_uses_action_monitor_index_when_present():
    """If action.monitor_index is present and valid → target that monitor."""
    action = {"monitor_index": 1}
    session_state = {"monitors_geometry": TWO_MONITORS, "last_focused_monitor": 0}
    result = resolve_target_monitor(action, session_state)
    assert result.idx == 1
    assert result.offset_x == 1920
    assert result.offset_y == 0
    assert result.source == "action"


def test_resolve_falls_back_to_focused_monitor_when_action_missing():
    """If action.monitor_index is absent → fall back to the active focus."""
    action = {}  # no monitor_index
    session_state = {"monitors_geometry": TWO_MONITORS, "last_focused_monitor": 1}
    result = resolve_target_monitor(action, session_state)
    assert result.idx == 1
    assert result.source == "focus"


def test_resolve_falls_back_to_composite_when_geometry_empty():
    """If the geometry is empty (old Agent V1) → composite fallback (idx=-1, offset=0)."""
    action = {}
    session_state = {"monitors_geometry": [], "last_focused_monitor": None}
    result = resolve_target_monitor(action, session_state)
    assert result.source == "composite_fallback"
    assert result.offset_x == 0
    assert result.offset_y == 0


def test_resolve_falls_back_when_action_index_out_of_range():
    """If action.monitor_index is out of range (monitor unplugged) → focus fallback."""
    action = {"monitor_index": 5}  # does not exist
    session_state = {"monitors_geometry": TWO_MONITORS, "last_focused_monitor": 0}
    result = resolve_target_monitor(action, session_state)
    assert result.idx == 0
    assert result.source == "focus"
```
- [ ] **Step 2: Run them to verify they fail**
```bash
pytest tests/unit/test_monitor_router.py -v
```
Expected: `ImportError: cannot import name 'resolve_target_monitor' from 'agent_v0.server_v1.monitor_router'`, or `ModuleNotFoundError`.

### Task 6: Implement `monitor_router.py`

**Files:**
- Create: `agent_v0/server_v1/monitor_router.py`

- [ ] **Step 1: Write the full module**
```python
# agent_v0/server_v1/monitor_router.py
"""MonitorRouter — target-monitor resolution for replay (QW1).

Cascade strategy:
1. action.monitor_index (inherited from the source session) → target that monitor
2. session.last_focused_monitor (active focus seen at the last heartbeat) → fallback
3. composite (offset 0, 0) → backward compat

Emits the monitor_routed event on the lea:* bus with the source of the decision.
"""
from dataclasses import dataclass
from typing import Any, Dict, List, Optional


@dataclass
class MonitorTarget:
    """The resolved target monitor for a replay action."""
    idx: int
    offset_x: int
    offset_y: int
    w: int
    h: int
    source: str  # "action" | "focus" | "composite_fallback"


_COMPOSITE_FALLBACK = MonitorTarget(
    idx=-1, offset_x=0, offset_y=0, w=0, h=0, source="composite_fallback",
)


def _find_monitor(geometry: List[Dict[str, Any]], idx: int) -> Optional[Dict[str, Any]]:
    """Return the monitor with the given index, or None if absent."""
    for m in geometry:
        if m.get("idx") == idx:
            return m
    return None


def _to_target(monitor: Dict[str, Any], source: str) -> MonitorTarget:
    return MonitorTarget(
        idx=int(monitor["idx"]),
        offset_x=int(monitor.get("x", 0)),
        offset_y=int(monitor.get("y", 0)),
        w=int(monitor.get("w", 0)),
        h=int(monitor.get("h", 0)),
        source=source,
    )


def resolve_target_monitor(
    action: Dict[str, Any],
    session_state: Dict[str, Any],
) -> MonitorTarget:
    """Resolve the target monitor of a replay action.

    Args:
        action: The action dict (may contain `monitor_index`).
        session_state: Session state (must contain `monitors_geometry`
            and `last_focused_monitor`).

    Returns:
        MonitorTarget with the offset to apply to grounding coordinates.
    """
    geometry: List[Dict[str, Any]] = session_state.get("monitors_geometry") or []

    # 1. Explicit target via the action
    explicit_idx = action.get("monitor_index")
    if explicit_idx is not None and geometry:
        m = _find_monitor(geometry, int(explicit_idx))
        if m is not None:
            return _to_target(m, source="action")
        # Invalid index → fall through to the focus fallback

    # 2. Active-focus fallback
    focused_idx = session_state.get("last_focused_monitor")
    if focused_idx is not None and geometry:
        m = _find_monitor(geometry, int(focused_idx))
        if m is not None:
            return _to_target(m, source="focus")

    # 3. Composite fallback (backward compat — current mss.monitors[0] behavior)
    return _COMPOSITE_FALLBACK
```
- [ ] **Step 2: Re-run the tests; they must all pass**
```bash
pytest tests/unit/test_monitor_router.py -v
```
Expected: `4 passed`.
- [ ] **Step 3: Commit**
```bash
git add agent_v0/server_v1/monitor_router.py tests/unit/test_monitor_router.py
git commit -m "$(cat <<'EOF'
feat(qw1): MonitorRouter — target-monitor resolution for replay

Isolated module that picks the target monitor with a cascade strategy:
1. action.monitor_index (source session) → explicit target
2. session.last_focused_monitor → active-focus fallback
3. composite (offset 0,0) → backward compat (current behavior)

100% backward compatible: actions without monitor_index → composite
fallback identical to the current mss.monitors[0] behavior.

Tests: 4 cases (explicit target, focus fallback, composite fallback, invalid index).

Co-Authored-By: Claude Opus 4.7 (1M context)
EOF
)"
```

### Task 7: Integration tests `test_grounding_offset.py` (red)

**Files:**
- Create: `tests/integration/test_grounding_offset.py`

- [ ] **Step 1: Create the test file**
```python
# tests/integration/test_grounding_offset.py
"""Integration tests for multi-monitor offset propagation (QW1)."""
import pytest
from unittest.mock import patch, MagicMock

from core.execution import input_handler


@pytest.fixture
def mock_screen():
    """Mock an mss capture: return a dummy PIL Image."""
    from PIL import Image
    img = Image.new("RGB", (1920, 1080), color="white")
    return img


def test_capture_screen_default_returns_composite_when_no_idx(mock_screen):
    """_capture_screen() without monitor_idx → composite, offset (0, 0)."""
    with patch("core.execution.input_handler.mss") as mock_mss:
        ctx = mock_mss.mss.return_value.__enter__.return_value
        ctx.monitors = [{"left": 0, "top": 0, "width": 3840, "height": 1080}]
        ctx.grab.return_value = MagicMock(size=(3840, 1080), bgra=b"\x00" * (3840 * 1080 * 4))
        with patch("core.execution.input_handler.PILImage.frombytes", return_value=mock_screen):
            screen, w, h, ox, oy = input_handler._capture_screen()
    assert (w, h, ox, oy) == (3840, 1080, 0, 0)


def test_capture_screen_targets_specific_monitor_with_offset(mock_screen):
    """_capture_screen(monitor_idx=1) → targets monitors[2] (mss skips [0]), offset = monitor.left."""
    with patch("core.execution.input_handler.mss") as mock_mss:
        ctx = mock_mss.mss.return_value.__enter__.return_value
        # mss layout: [0]=composite, [1]=primary, [2]=secondary
        ctx.monitors = [
            {"left": 0, "top": 0, "width": 3840, "height": 1080},
            {"left": 0, "top": 0, "width": 1920, "height": 1080},
            {"left": 1920, "top": 0, "width": 1920, "height": 1080},
        ]
        ctx.grab.return_value = MagicMock(size=(1920, 1080), bgra=b"\x00" * (1920 * 1080 * 4))
        with patch("core.execution.input_handler.PILImage.frombytes", return_value=mock_screen):
            screen, w, h, ox, oy = input_handler._capture_screen(monitor_idx=1)
    assert (w, h, ox, oy) == (1920, 1080, 1920, 0)
```
- [ ] **Step 2: Run them to verify they fail**
```bash
pytest tests/integration/test_grounding_offset.py -v
```
Expected: `TypeError: _capture_screen() got an unexpected keyword argument 'monitor_idx'` or similar (the current signature takes no parameter).

### Task 8: Modify `input_handler.py` — per-monitor capture + offset propagation

**Files:**
- Modify: `core/execution/input_handler.py:416-429` (`_capture_screen`)
- Modify: `core/execution/input_handler.py:432-512` (`_grounding_ocr`)
- Modify: `core/execution/input_handler.py:515-579` (`_grounding_ui_tars`)
- Modify: `core/execution/input_handler.py:629-684` (`_grounding_vlm`)

- [ ] **Step 1: Import PIL.Image with an alias at the top of the file (if not already done)**
Verify that `from PIL import Image as PILImage` is imported at top level (otherwise add it, replacing the current lazy import inside `_capture_screen`).
- [ ] **Step 2: Rewrite `_capture_screen` to accept `monitor_idx`**
Replace the `_capture_screen` function (lines 416-429) with:
```python
def _capture_screen(monitor_idx=None):
    """Capture the screen and return (PIL.Image, width, height, offset_x, offset_y).

    Args:
        monitor_idx: Logical index 0..N-1 of the monitor to capture (cf. screeninfo).
            If None: composite capture (mss.monitors[0]) — legacy behavior.

    Returns:
        (image, w, h, offset_x, offset_y). offset = (0, 0) in composite mode.
    """
    try:
        import mss
        from PIL import Image as PILImage
        with mss.mss() as sct:
            if monitor_idx is None:
                # Current behavior: composite of all monitors
                monitor = sct.monitors[0]
                offset_x, offset_y = 0, 0
            else:
                # mss skips monitors[0] (composite). Logical index 0 → mss.monitors[1].
                mss_idx = int(monitor_idx) + 1
                if mss_idx >= len(sct.monitors):
                    logger.warning(
                        "mss.monitors[%d] out of range (n=%d) — composite fallback",
                        mss_idx, len(sct.monitors),
                    )
                    monitor = sct.monitors[0]
                    offset_x, offset_y = 0, 0
                else:
                    monitor = sct.monitors[mss_idx]
                    offset_x = int(monitor.get("left", 0))
                    offset_y = int(monitor.get("top", 0))
            screenshot = sct.grab(monitor)
            screen = PILImage.frombytes('RGB', screenshot.size, screenshot.bgra, 'raw', 'BGRX')
            return screen, monitor['width'], monitor['height'], offset_x, offset_y
    except Exception as e:
        logger.debug(f"Screen capture failed: {e}")
        return None, 0, 0, 0, 0
```
- [ ] **Step 3: Adapt `_grounding_ocr` to propagate the offset**
In `_grounding_ocr` (lines 432-512):
- Replace `screen, screen_w, screen_h = _capture_screen()` with `screen, screen_w, screen_h, ox, oy = _capture_screen(monitor_idx=anchor_bbox.get("monitor_idx") if anchor_bbox else None)`
- Add `ox, oy` to the coords before the return:
  - Before: `return {'x': best['x'], 'y': best['y'], ...}`
  - After: `return {'x': best['x'] + ox, 'y': best['y'] + oy, 'method': 'ocr', 'confidence': best['conf']}`
- [ ] **Step 4: Adapt `_grounding_ui_tars` the same way**
In `_grounding_ui_tars` (lines 515-579):
- Change the signature: `def _grounding_ui_tars(target_text, target_description="", monitor_idx=None):`
- Replace `screen, screen_w, screen_h = _capture_screen()` with `screen, screen_w, screen_h, ox, oy = _capture_screen(monitor_idx=monitor_idx)`
- Add the offset in the return: `return {'x': x + ox, 'y': y + oy, 'method': 'ui_tars', 'confidence': 0.85}`
- [ ] **Step 5: Adapt `_grounding_vlm` the same way**
In `_grounding_vlm` (lines 629-684):
- Change the signature: `def _grounding_vlm(target_text, target_description="", monitor_idx=None):`
- Replace the internal `_capture_screen()` call with `_capture_screen(monitor_idx=monitor_idx)`
- Add the offset to the return of the OCR-confirmed coords
- [ ] **Step 6: Modify `find_element_on_screen` to propagate `monitor_idx`**
In `find_element_on_screen` (signature at lines 312-317):
- Add the parameter `monitor_idx: Optional[int] = None`
- Pass it to the 3 cascade levels:
  - `_grounding_ocr(target_text, anchor_bbox=anchor_bbox)` → add a step that stores `monitor_idx` inside `anchor_bbox` if it is a dict, otherwise create a dict with just `monitor_idx`
  - `_grounding_ui_tars(target_text, target_description, monitor_idx=monitor_idx)`
  - `_grounding_vlm(target_text, target_description, monitor_idx=monitor_idx)`
- [ ] **Step 7: Run the integration tests**
```bash
pytest tests/integration/test_grounding_offset.py -v
```
Expected: `2 passed`.
- [ ] **Step 8: Re-run the baseline to verify non-regression**
```bash
pytest tests/test_pipeline_e2e.py \
  tests/test_phase0_integration.py \
  tests/integration/test_stream_processor.py \
  -q
```
Expected: same number of passed as in `.qw-baseline.log`.
- [ ] **Step 9: Commit**
```bash
git add core/execution/input_handler.py tests/integration/test_grounding_offset.py
git commit -m "$(cat <<'EOF'
feat(qw1): per-monitor capture + offset propagation in the grounding cascade

_capture_screen() accepts an optional monitor_idx (None = legacy composite).
Logical index 0..N-1 maps to mss.monitors[idx+1] (mss[0] = composite).

The 3 grounding levels (OCR, UI-TARS, VLM) propagate the offset returned
by the capture to translate monitor-local coordinates into absolute screen
coordinates (correct for pyautogui.click).

find_element_on_screen() accepts monitor_idx and forwards it to the 3 levels.

100% backward compatible: monitor_idx=None everywhere → strictly current behavior.

Co-Authored-By: Claude Opus 4.7 (1M context)
EOF
)"
```

### Task 9: Enrich the Agent V1 capturer client-side (`monitor_index` + `monitors_geometry` events)

**Files:**
- Modify: `agent_v0/agent_v1/vision/capturer.py` (at least 4 places using `sct.monitors[1]`)

- [ ] **Step 1: Read the whole file to understand the API**
```bash
wc -l agent_v0/agent_v1/vision/capturer.py
```
If > 300 lines, read it in 2 passes to target the changes.
- [ ] **Step 2: Import `screeninfo` (graceful fallback if absent)**
At the top of the file, after the existing imports:
```python
try:
    from screeninfo import get_monitors as _screeninfo_get_monitors
    _SCREENINFO_AVAILABLE = True
except ImportError:
    _SCREENINFO_AVAILABLE = False
```
- [ ] **Step 3: Add a `_get_monitors_geometry()` helper**
```python
def _get_monitors_geometry():
    """Return the list of physical monitors with their offsets.

    Returns:
        List[dict]: [{idx, x, y, w, h, primary}, ...]. Empty if screeninfo
        is unavailable (the server will use the composite fallback).
    """
    if not _SCREENINFO_AVAILABLE:
        return []
    try:
        monitors = _screeninfo_get_monitors()
        return [
            {
                "idx": i,
                "x": int(m.x),
                "y": int(m.y),
                "w": int(m.width),
                "h": int(m.height),
                "primary": bool(getattr(m, "is_primary", False)),
            }
            for i, m in enumerate(monitors)
        ]
    except Exception:
        return []


def _get_active_monitor_index():
    """Return the logical index of the monitor under the cursor (active focus).

    Returns:
        int, or None if it cannot be determined.
    """
    if not _SCREENINFO_AVAILABLE:
        return None
    try:
        import pyautogui
        cx, cy = pyautogui.position()
        for i, m in enumerate(_screeninfo_get_monitors()):
            if m.x <= cx < m.x + m.width and m.y <= cy < m.y + m.height:
                return i
    except Exception:
        return None
    return None
```
- [ ] **Step 4: Enrich every event payload sent to the server**
Identify the functions that send to the server (probably via the `feedback_bus` or an HTTP POST). For each one, add to the event:
```python
event_payload["monitor_index"] = _get_active_monitor_index()
event_payload["monitors_geometry"] = _get_monitors_geometry()
```
The simplest approach: create a helper at the top of the module and call it everywhere a heartbeat or event payload is built:
```python
def _enrich_with_monitor_info(payload: dict) -> dict:
    """Add monitor_index and monitors_geometry to the payload (in-place)."""
    payload["monitor_index"] = _get_active_monitor_index()
    payload["monitors_geometry"] = _get_monitors_geometry()
    return payload
```
Then call `_enrich_with_monitor_info(payload)` just before each send.
- [ ] **Step 5: Verify that `screeninfo` is in `requirements_agent_v1.txt`**
```bash
grep -i screeninfo agent_v0/agent_v1/requirements*.txt
```
If absent, add it to `requirements_agent_v1.txt`:
```
screeninfo>=0.8
```
(The module degrades gracefully when the package is not installed on old Agent V1 clients, so nothing blocks.)
- [ ] **Step 6: Local smoke test — verify the agent does not crash**
On the Linux machine (no need for Windows for this smoke test):
```bash
python -c "from agent_v0.agent_v1.vision.capturer import _get_monitors_geometry, _get_active_monitor_index; print(_get_monitors_geometry()); print(_get_active_monitor_index())"
```
Expected: a list of monitors (or `[]` if screeninfo is absent), and an int (or None).
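The focus detection above and the Task 8 offset propagation both reduce to pure functions over the geometry dicts, which can be unit-tested without screeninfo, mss, or pyautogui. A sketch (the helper names `monitor_index_for_point` and `to_absolute` are illustrative, not functions from the codebase):

```python
from typing import Optional


def monitor_index_for_point(geometry: list[dict], x: int, y: int) -> Optional[int]:
    """Return the idx of the monitor containing (x, y), or None.

    Uses half-open ranges [x, x+w) / [y, y+h), mirroring
    _get_active_monitor_index: a point on the shared edge of two
    side-by-side monitors belongs to the right-hand one.
    """
    for m in geometry:
        if m["x"] <= x < m["x"] + m["w"] and m["y"] <= y < m["y"] + m["h"]:
            return m["idx"]
    return None


def to_absolute(local_x: int, local_y: int, resolution: dict) -> tuple[int, int]:
    """Translate monitor-local grounding coords to absolute desktop coords.

    This is the translation Task 8 performs inside the grounding functions:
    coordinates found on a per-monitor capture get the monitor's offset
    added back so the click lands on the right spot of the virtual desktop.
    """
    return local_x + resolution["offset_x"], local_y + resolution["offset_y"]
```

With two 1920x1080 monitors side by side, a click found at (400, 300) on the secondary monitor (offset_x=1920) maps to (2320, 300); with the composite fallback (offset 0, 0), `to_absolute` is the identity.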
### Task 10: Propagate the change to the Windows deployment

**Files:**
- Modify: `agent_v0/deploy/windows_client/agent_v1/vision/capturer.py`

- [ ] **Step 1: Copy the source file to the deployment**
```bash
cp agent_v0/agent_v1/vision/capturer.py agent_v0/deploy/windows_client/agent_v1/vision/capturer.py
```
- [ ] **Step 2: Check the diff (nothing else should change)**
```bash
git diff agent_v0/deploy/windows_client/agent_v1/vision/capturer.py | head -80
```
- [ ] **Step 3: Update `requirements_agent_v1.txt` on the deploy side too**
```bash
grep screeninfo agent_v0/deploy/windows_client/agent_v1/requirements*.txt || \
  echo "screeninfo>=0.8" >> agent_v0/deploy/windows_client/agent_v1/requirements_agent_v1.txt
```

### Task 11: Hook MonitorRouter into `api_stream.py`

**Files:**
- Modify: `agent_v0/server_v1/api_stream.py` (~10 lines added in the branch that dispatches actions to the client)

- [ ] **Step 1: Locate where an action is sent to the client**
```bash
grep -n "next_action\|/replay/next\|return.*action" agent_v0/server_v1/api_stream.py | grep -v "^.*#" | head -20
```
The goal: find the endpoint that pops the action from the queue and returns it to Agent V1 (typically `/replay/next` or the response to the client's polling).
- [ ] **Step 2: Import the router at the top of the file**
```python
from agent_v0.server_v1.monitor_router import resolve_target_monitor
```
- [ ] **Step 3: Before returning the action to the client, enrich it with `monitor_resolution`**
In the function that builds the response to the client (just before the `return`):
```python
# QW1 — Resolve the target monitor and attach the info to the action
session_state = {
    "monitors_geometry": session.last_window_info.get("monitors_geometry", []),
    "last_focused_monitor": session.last_window_info.get("monitor_index"),
}
target = resolve_target_monitor(action, session_state)
action["monitor_resolution"] = {
    "idx": target.idx,
    "offset_x": target.offset_x,
    "offset_y": target.offset_y,
    "w": target.w,
    "h": target.h,
    "source": target.source,
}
# Observability event on the bus
try:
    from agent_v0.agent_v1.network.feedback_bus import emit_server_event
    emit_server_event("lea:monitor_routed", {
        "replay_id": replay_state.get("replay_id"),
        "action_id": action.get("action_id"),
        "idx": target.idx,
        "source": target.source,
    })
except Exception:
    pass  # the bus is optional, never block the replay
```
- [ ] **Step 4: Verify that `last_window_info` is actually populated server-side**
Find where `last_window_info` is updated from the Agent V1 heartbeats:
```bash
grep -n "last_window_info" agent_v0/server_v1/*.py | head -10
```
If the `monitor_index` and `monitors_geometry` sent by the agent are not stored in `session.last_window_info`, add their storage in the function that consumes heartbeats.
- [ ] **Step 5: Re-run the baseline for non-regression**
```bash
pytest tests/test_pipeline_e2e.py \
  tests/test_phase0_integration.py \
  tests/integration/test_stream_processor.py \
  -q
```
Expected: same result as the baseline.
- [ ] **Step 6: Commit**
```bash
git add agent_v0/agent_v1/vision/capturer.py \
  agent_v0/deploy/windows_client/agent_v1/vision/capturer.py \
  agent_v0/agent_v1/requirements_agent_v1.txt \
  agent_v0/deploy/windows_client/agent_v1/requirements_agent_v1.txt \
  agent_v0/server_v1/api_stream.py
git commit -m "$(cat <<'EOF'
feat(qw1): Agent V1 enrichment (monitor_index + monitors_geometry) + server hook

Agent V1 client side:
- _get_monitors_geometry() helper via screeninfo (falls back to [] if absent)
- _get_active_monitor_index() helper via the cursor position
- _enrich_with_monitor_info() added to every event/heartbeat payload
- screeninfo>=0.8 added to the requirements (source + deploy)

Server side, api_stream.py:
- import resolve_target_monitor
- Before each action is sent to the client: enrich action.monitor_resolution
- Bus event lea:monitor_routed for observability (idx, source)

100% backward compatible: with empty geometry, composite fallback identical
to the current mss.monitors[0] behavior.

Co-Authored-By: Claude Opus 4.7 (1M context)
EOF
)"
```

### Task 12: QW1 smoke demo on an Easily workflow

**Files:** none (observable manual test)

- [ ] **Step 1: Restart the streaming service**
```bash
./svc.sh restart streaming
sleep 2
./svc.sh status streaming
```
- [ ] **Step 2: Replay the Easily Assure workflow used in Task 3**
Check the server logs: `lea:monitor_routed` appears with `source=focus` (current workflows have no monitor_index, so the active focus is what gets used).
```bash
journalctl -u rpa-streaming -f | grep monitor_routed
```
- [ ] **Step 3: Watch the replay**
Clicks must still land in the right place (visual check, identical to the Task 3 smoke test). If there is an offset drift: the implicit kill switch is to re-checkout `backup/pre-qw-suite-mai-2026-05-05` and investigate.
- [ ] **Step 4: Push the branch to Gitea (remote backup after QW1)**
```bash
git push gitea feature/qw-suite-mai
```

---

## Section 2 — QW2 LoopDetector

### Task 13: Unit tests `test_loop_detector.py` (red)

**Files:**
- Create: `tests/unit/test_loop_detector.py`

- [ ] **Step 1: Create the test file with the 8 cases**
```python
# tests/unit/test_loop_detector.py
"""Unit tests for the composite LoopDetector (QW2)."""
import pytest
from unittest.mock import MagicMock

from agent_v0.server_v1.loop_detector import LoopDetector, LoopVerdict


@pytest.fixture
def detector():
    """LoopDetector with a mocked embedder (signal A always available)."""
    embedder = MagicMock()
    # By default: identical embeddings → similarity 1.0
    embedder.embed_image.return_value = [1.0, 0.0, 0.0]
    return LoopDetector(clip_embedder=embedder)


def _state(retried=0, n_screenshots=0, n_actions=0):
    return {
        "retried_actions": retried,
        "_screenshot_history": [[1.0, 0.0, 0.0]] * n_screenshots,
        "_action_history": [{"type": "click", "x_pct": 0.5, "y_pct": 0.5}] * n_actions,
    }


def test_screen_static_triggers_when_n_identical_embeddings(detector):
    """Signal A: 4 identical captures (similarity > 0.99) → detected."""
    state = _state(n_screenshots=4)
    verdict = detector.evaluate(state, screenshots=state["_screenshot_history"], actions=[])
    assert verdict.detected is True
    assert verdict.signal == "screen_static"


def test_screen_static_skipped_when_history_too_short(detector):
    """Signal A: fewer than N captures → no detection."""
    state = _state(n_screenshots=2)
    verdict = detector.evaluate(state, screenshots=state["_screenshot_history"], actions=[])
    # A is the only candidate but is skipped, and B/C are not met: detected=False
    assert verdict.detected is False


def test_action_repeat_triggers_when_n_identical_actions(detector):
    """Signal B: 3 identical consecutive actions → detected."""
    state = _state(n_actions=3)
    verdict = detector.evaluate(state, screenshots=[], actions=state["_action_history"])
    assert verdict.detected is True
    assert verdict.signal == "action_repeat"


def test_action_repeat_skipped_when_actions_differ(detector):
    """Signal B: different actions → no detection."""
    actions = [
        {"type": "click", "x_pct": 0.1, "y_pct": 0.1},
        {"type": "click", "x_pct": 0.2, "y_pct": 0.2},
        {"type": "click", "x_pct": 0.3, "y_pct": 0.3},
    ]
    verdict = detector.evaluate(_state(), screenshots=[], actions=actions)
    assert verdict.detected is False


def test_retry_threshold_triggers_at_3(detector):
    """Signal C: retried_actions >= 3 → detected."""
    state = _state(retried=3)
    verdict = detector.evaluate(state, screenshots=[], actions=[])
    assert verdict.detected is True
    assert verdict.signal == "retry_threshold"


def test_kill_switch_disables_all_signals(monkeypatch, detector):
    """If RPA_LOOP_DETECTOR_ENABLED=0 → always detected=False."""
    monkeypatch.setenv("RPA_LOOP_DETECTOR_ENABLED", "0")
    state = _state(retried=10, n_screenshots=10, n_actions=10)
    verdict = detector.evaluate(state, screenshots=state["_screenshot_history"],
                                actions=state["_action_history"])
    assert verdict.detected is False


def test_embedder_unavailable_skips_signal_A_continues_others():
    """If the CLIP embedder is None → signal A is skipped, B and C still run."""
    detector = LoopDetector(clip_embedder=None)
    # Trigger signal C
    state = _state(retried=3)
    verdict = detector.evaluate(state, screenshots=[], actions=[])
    assert verdict.detected is True
    assert verdict.signal == "retry_threshold"


def test_similarity_exception_does_not_crash(detector):
    """If signal A's similarity computation raises → log + verdict detected=False.

    Note: `screenshots` already holds embeddings, so embed_image is never
    called during evaluate(); mismatched embedding shapes are what make
    the similarity computation raise inside signal A.
    """
    history = [[1.0, 0.0, 0.0], [1.0, 0.0], [1.0, 0.0, 0.0], [1.0, 0.0]]
    state = _state(n_screenshots=4)
    # Must NOT raise: signal A goes inert for this tick
    verdict = detector.evaluate(state, screenshots=history, actions=[])
    # Signal A inert, B/C not met → detected False
    assert verdict.detected is False
```
- [ ] **Step 2: Run them to verify they fail**
```bash
pytest tests/unit/test_loop_detector.py -v
```

Expected: `ModuleNotFoundError: No module named 'agent_v0.server_v1.loop_detector'`.

### Task 14: Implement `loop_detector.py`

**Files:**
- Create: `agent_v0/server_v1/loop_detector.py`

- [ ] **Step 1: Write the complete module**

```python
# agent_v0/server_v1/loop_detector.py
"""Composite LoopDetector — detects Léa stalling during a replay (QW2).

Three independent signals:
- screen_static: N consecutive captures with CLIP similarity > threshold
- action_repeat: N consecutive identical actions (type + coords)
- retry_threshold: cumulative retry count >= threshold

A single positive signal → verdict.detected=True. The server then switches
the replay to paused_need_help with an explicit pause_reason.

Can be disabled via the env var RPA_LOOP_DETECTOR_ENABLED=0.
"""

import logging
import os
from dataclasses import dataclass, field
from typing import Any, Dict, List, Optional

logger = logging.getLogger(__name__)


@dataclass
class LoopVerdict:
    detected: bool = False
    reason: str = ""
    signal: str = ""  # "screen_static" | "action_repeat" | "retry_threshold" | ""
    evidence: Dict[str, Any] = field(default_factory=dict)


def _env_int(name: str, default: int) -> int:
    try:
        return int(os.environ.get(name, default))
    except (TypeError, ValueError):
        return default


def _env_float(name: str, default: float) -> float:
    try:
        return float(os.environ.get(name, default))
    except (TypeError, ValueError):
        return default


def _env_bool_enabled(name: str) -> bool:
    val = os.environ.get(name, "1").strip().lower()
    return val not in ("0", "false", "no", "off", "")


def _cosine_similarity(a, b) -> float:
    """Cosine similarity between two vectors (lists or np.array).

    Robust to zero vectors."""
    import numpy as np
    av = np.asarray(a, dtype=np.float32).flatten()
    bv = np.asarray(b, dtype=np.float32).flatten()
    na, nb = float(np.linalg.norm(av)), float(np.linalg.norm(bv))
    if na < 1e-8 or nb < 1e-8:
        return 0.0
    return float(np.dot(av, bv) / (na * nb))


class LoopDetector:
    def __init__(self, clip_embedder=None):
        self.clip_embedder = clip_embedder

    def evaluate(
        self,
        state: Dict[str, Any],
        screenshots: List[Any],
        actions: List[Dict[str, Any]],
    ) -> LoopVerdict:
        """Evaluate the 3 signals. Returns the first one triggered.

        Args:
            state: replay_state (used for retried_actions)
            screenshots: ring of CLIP embeddings (the last N)
            actions: ring of the last N executed actions
        """
        if not _env_bool_enabled("RPA_LOOP_DETECTOR_ENABLED"):
            return LoopVerdict(detected=False)

        # Signal A: screen_static
        verdict = self._check_screen_static(screenshots)
        if verdict.detected:
            return verdict

        # Signal B: action_repeat
        verdict = self._check_action_repeat(actions)
        if verdict.detected:
            return verdict

        # Signal C: retry_threshold
        verdict = self._check_retry_threshold(state)
        if verdict.detected:
            return verdict

        return LoopVerdict(detected=False)

    def _check_screen_static(self, screenshots: List[Any]) -> LoopVerdict:
        n_required = _env_int("RPA_LOOP_SCREEN_STATIC_N", 4)
        threshold = _env_float("RPA_LOOP_SCREEN_STATIC_THRESHOLD", 0.99)
        if self.clip_embedder is None or len(screenshots) < n_required:
            return LoopVerdict()
        try:
            recent = screenshots[-n_required:]
            sims = [_cosine_similarity(recent[i], recent[i + 1])
                    for i in range(len(recent) - 1)]
            min_sim = min(sims)
            if min_sim > threshold:
                return LoopVerdict(
                    detected=True,
                    reason="loop_detected",
                    signal="screen_static",
                    evidence={"min_similarity": round(min_sim, 4),
                              "n_captures": n_required,
                              "threshold": threshold},
                )
        except Exception as e:
            logger.warning("LoopDetector signal_A error (%s) — signal inert this tick", e)
        return LoopVerdict()

    def _check_action_repeat(self, actions: List[Dict[str, Any]]) -> LoopVerdict:
        n_required = _env_int("RPA_LOOP_ACTION_REPEAT_N", 3)
        if len(actions) < n_required:
            return LoopVerdict()
        recent = actions[-n_required:]

        def _signature(a: Dict[str, Any]) -> tuple:
            return (a.get("type"), a.get("x_pct"), a.get("y_pct"))

        sigs = [_signature(a) for a in recent]
        if all(s == sigs[0] for s in sigs):
            return LoopVerdict(
                detected=True,
                reason="loop_detected",
                signal="action_repeat",
                evidence={"signature": sigs[0], "count": n_required},
            )
        return LoopVerdict()

    def _check_retry_threshold(self, state: Dict[str, Any]) -> LoopVerdict:
        threshold = _env_int("RPA_LOOP_RETRY_THRESHOLD", 3)
        retried = int(state.get("retried_actions", 0))
        if retried >= threshold:
            return LoopVerdict(
                detected=True,
                reason="loop_detected",
                signal="retry_threshold",
                evidence={"retried_actions": retried, "threshold": threshold},
            )
        return LoopVerdict()
```

- [ ] **Step 2: Re-run the tests — they must all pass**

```bash
pytest tests/unit/test_loop_detector.py -v
```

Expected: `8 passed`.

- [ ] **Step 3: Commit**

```bash
git add agent_v0/server_v1/loop_detector.py tests/unit/test_loop_detector.py
git commit -m "$(cat <<'EOF'
feat(qw2): composite LoopDetector (screen_static + action_repeat + retry)

Isolated module, 3 independent signals:
- screen_static: CLIP similarity > 0.99 over N consecutive captures
- action_repeat: N identical actions (type+coords)
- retry_threshold: retried_actions >= threshold

First positive signal → LoopVerdict.detected=True (caller is responsible
for switching to paused_need_help).

Configurable env vars: RPA_LOOP_DETECTOR_ENABLED (kill-switch),
RPA_LOOP_SCREEN_STATIC_N/THRESHOLD, RPA_LOOP_ACTION_REPEAT_N,
RPA_LOOP_RETRY_THRESHOLD.

Tests: 8 cases (each signal in isolation, kill-switch, missing embedder,
exception).
Co-Authored-By: Claude Opus 4.7 (1M context)
EOF
)"
```

### Task 15: Extend `replay_engine.py` — history rings in `_create_replay_state`

**Files:**
- Modify: `agent_v0/server_v1/replay_engine.py:1452-1524` (`_create_replay_state`)

- [ ] **Step 1: Add the two keys at the end of the returned dict**

In `_create_replay_state` (around line 1523, just before the final `}` of the return), add:

```python
# QW2 — History rings for the LoopDetector (5 most recent max)
"_screenshot_history": [],  # CLIP embeddings of the N latest heartbeats
"_action_history": [],      # N latest executed actions (signature)
```

(They go right after the `"variables": {}` key.)

- [ ] **Step 2: Check that no replay_engine unit test expects these keys to be absent**

```bash
grep -rn "_create_replay_state\|_screenshot_history\|_action_history" tests/ | head -20
```

If a test does a strict `assert state == {...}`, adapt it to accept the two new keys (typically none does — this step is defensive).

### Task 16: Hook the `loop_detector` into `api_stream.py`

**Files:**
- Modify: `agent_v0/server_v1/api_stream.py:3159+` (`report_action_result`)

- [ ] **Step 1: Import at the top of the file**

```python
from agent_v0.server_v1.loop_detector import LoopDetector
```

- [ ] **Step 2: Instantiate the detector globally (lazy singleton)**

Near the module's other globals (look for where `active_processor` is defined):

```python
_loop_detector: Optional[LoopDetector] = None

def _get_loop_detector() -> LoopDetector:
    global _loop_detector
    if _loop_detector is None:
        embedder = active_processor._clip_embedder if active_processor else None
        _loop_detector = LoopDetector(clip_embedder=embedder)
    return _loop_detector
```

- [ ] **Step 3: Hook into `report_action_result` after the state update**

In `report_action_result` (line 3159+), locate where the `replay_state` is updated following the action report (just before the function's return).
# QW2 — Update the history rings
Add:

```python
# QW2 — Update the history rings
try:
    from PIL import Image
    ss_path = report.screenshot_path or replay_state.get("last_screenshot")
    if ss_path and os.path.isfile(ss_path) and active_processor and active_processor._clip_embedder:
        emb = active_processor._clip_embedder.embed_image(Image.open(ss_path))
        if emb is not None:
            replay_state["_screenshot_history"].append(emb.flatten().tolist())
            replay_state["_screenshot_history"] = replay_state["_screenshot_history"][-5:]
except Exception as e:
    logger.debug("LoopDetector: history embed failed: %s", e)

# Snapshot the current action's signature
replay_state["_action_history"].append({
    "type": report.action_type if hasattr(report, "action_type") else "",
    "x_pct": report.x_pct if hasattr(report, "x_pct") else None,
    "y_pct": report.y_pct if hasattr(report, "y_pct") else None,
})
replay_state["_action_history"] = replay_state["_action_history"][-5:]

# Evaluate the LoopDetector
try:
    verdict = _get_loop_detector().evaluate(
        replay_state,
        screenshots=replay_state["_screenshot_history"],
        actions=replay_state["_action_history"],
    )
    if verdict.detected:
        replay_state["status"] = "paused_need_help"
        replay_state["pause_reason"] = "loop_detected"
        replay_state["pause_message"] = (
            f"Léa semble bloquée — {verdict.signal} "
            f"(détail: {verdict.evidence})"
        )
        logger.warning(
            "LoopDetector: replay %s paused — signal=%s evidence=%s",
            replay_state["replay_id"], verdict.signal, verdict.evidence,
        )
        # Bus event
        try:
            from agent_v0.agent_v1.network.feedback_bus import emit_server_event
            emit_server_event("lea:loop_detected", {
                "replay_id": replay_state["replay_id"],
                "signal": verdict.signal,
                "evidence": verdict.evidence,
            })
        except Exception:
            pass
except Exception as e:
    logger.warning("LoopDetector: evaluation failed (non-blocking): %s", e)
```

- [ ] **Step 4: Re-run the baseline to check for regressions**

```bash
pytest tests/test_pipeline_e2e.py \
    tests/test_phase0_integration.py \
    tests/integration/test_stream_processor.py \
    -q
```

Expected: same number of passed as in `.qw-baseline.log`.

### Task 17: Integration tests `test_loop_detector_replay.py`

**Files:**
- Create: `tests/integration/test_loop_detector_replay.py`

- [ ] **Step 1: Create the file**

```python
# tests/integration/test_loop_detector_replay.py
"""Integration tests: a simulated looping replay switches to paused_need_help."""

import pytest
from unittest.mock import MagicMock, patch

from agent_v0.server_v1.loop_detector import LoopDetector


def test_replay_state_transitions_to_paused_on_screen_static():
    """Case: 4 identical screenshots → replay goes to paused_need_help."""
    embedder = MagicMock()
    embedder.embed_image.return_value = [1.0, 0.0, 0.0]  # constant
    detector = LoopDetector(clip_embedder=embedder)
    state = {
        "replay_id": "r_test",
        "status": "running",
        "retried_actions": 0,
        "_screenshot_history": [[1.0, 0.0, 0.0]] * 4,
        "_action_history": [
            {"type": "click", "x_pct": 0.1, "y_pct": 0.1},
            {"type": "type", "x_pct": 0.2, "y_pct": 0.2},
        ],
    }
    verdict = detector.evaluate(state, state["_screenshot_history"], state["_action_history"])
    # Simulate what api_stream would do with the verdict
    if verdict.detected:
        state["status"] = "paused_need_help"
        state["pause_reason"] = verdict.reason
        state["pause_message"] = f"signal={verdict.signal}"
    assert state["status"] == "paused_need_help"
    assert state["pause_reason"] == "loop_detected"
    assert "screen_static" in state["pause_message"]


def test_replay_state_transitions_on_action_repeat():
    """Case: 3 identical actions → paused_need_help with signal action_repeat."""
    detector = LoopDetector(clip_embedder=None)
    actions = [{"type": "click", "x_pct": 0.5, "y_pct": 0.5}] * 3
    state = {"replay_id": "r2", "status": "running", "retried_actions": 0,
             "_screenshot_history": [], "_action_history": actions}
    verdict = detector.evaluate(state, [], actions)
    assert verdict.detected and verdict.signal == "action_repeat"


def test_kill_switch_keeps_replay_running(monkeypatch):
    """With RPA_LOOP_DETECTOR_ENABLED=0 the replay keeps running even in a loop."""
    monkeypatch.setenv("RPA_LOOP_DETECTOR_ENABLED", "0")
    embedder = MagicMock()
    embedder.embed_image.return_value = [1.0, 0.0, 0.0]
    detector = LoopDetector(clip_embedder=embedder)
    state = {"retried_actions": 10,
             "_screenshot_history": [[1.0, 0.0, 0.0]] * 10,
             "_action_history": [{"type": "click", "x_pct": 0.5, "y_pct": 0.5}] * 10}
    verdict = detector.evaluate(state, state["_screenshot_history"], state["_action_history"])
    assert verdict.detected is False
```

- [ ] **Step 2: Run**

```bash
pytest tests/integration/test_loop_detector_replay.py -v
```

Expected: `3 passed`.

### Task 18: Commit QW2 + push + re-run baseline

- [ ] **Step 1: Re-run the full baseline**

```bash
pytest tests/test_pipeline_e2e.py \
    tests/test_phase0_integration.py \
    tests/integration/test_stream_processor.py \
    tests/unit/test_loop_detector.py \
    tests/integration/test_loop_detector_replay.py \
    -q
```

Expected: all passed, no new failures compared to `.qw-baseline.log`.

- [ ] **Step 2: Final QW2 commit**

```bash
git add agent_v0/server_v1/replay_engine.py \
    agent_v0/server_v1/api_stream.py \
    tests/integration/test_loop_detector_replay.py
git commit -m "$(cat <<'EOF'
feat(qw2): hook LoopDetector into api_stream + replay_state extension

replay_state enriched with _screenshot_history (5 latest CLIP embeddings)
and _action_history (5 latest action signatures).

report_action_result:
- updates both rings after each action
- evaluates the LoopDetector (lazy singleton)
- if detected → switch to paused_need_help with pause_reason="loop_detected"
  and bus event lea:loop_detected (signal + evidence)

All error paths (missing embedder, OOM, exception) log and let the replay
continue — the detection layer never blocks anything.
Co-Authored-By: Claude Opus 4.7 (1M context)
EOF
)"
```

- [ ] **Step 3: Push the branch to Gitea (remote backup after QW2)**

```bash
git push gitea feature/qw-suite-mai
```

---

## Section 3 — QW4 Hybrid Safety Checks

### Task 19: Unit tests `test_safety_checks_provider.py` (red)

**Files:**
- Create: `tests/unit/test_safety_checks_provider.py`

- [ ] **Step 1: Create the file with the 7 cases**

```python
# tests/unit/test_safety_checks_provider.py
"""Unit tests for SafetyChecksProvider (QW4)."""

import json
import pytest
from unittest.mock import patch, MagicMock

from agent_v0.server_v1.safety_checks_provider import build_pause_payload, PausePayload


def _action(safety_level=None, declarative_checks=None, message="Validation"):
    params = {"message": message}
    if safety_level:
        params["safety_level"] = safety_level
    if declarative_checks is not None:
        params["safety_checks"] = declarative_checks
    return {"type": "pause_for_human", "parameters": params}


def test_only_declarative_when_no_safety_level():
    """No safety_level → declarative checks only, no LLM call."""
    decl = [{"id": "c1", "label": "Vérifier IPP", "required": True}]
    with patch("agent_v0.server_v1.safety_checks_provider._call_llm_for_contextual_checks") as mock_llm:
        payload = build_pause_payload(_action(declarative_checks=decl), {}, last_screenshot=None)
    mock_llm.assert_not_called()
    assert len(payload.checks) == 1
    assert payload.checks[0]["source"] == "declarative"


def test_hybrid_appends_llm_checks_on_medical_critical(monkeypatch):
    """safety_level=medical_critical → LLM called, checks concatenated."""
    decl = [{"id": "c1", "label": "Vérifier IPP", "required": True}]
    llm_resp = [{"label": "Nom patient suspect à l'écran", "evidence": "vu un nom différent"}]
    with patch("agent_v0.server_v1.safety_checks_provider._call_llm_for_contextual_checks",
               return_value=llm_resp) as mock_llm:
        payload = build_pause_payload(
            _action(safety_level="medical_critical", declarative_checks=decl),
            {},
            last_screenshot="/tmp/fake.png",
        )
    mock_llm.assert_called_once()
    assert len(payload.checks) == 2
    assert payload.checks[0]["source"] == "declarative"
    assert payload.checks[1]["source"] == "llm_contextual"
    assert payload.checks[1]["evidence"] == "vu un nom différent"


def test_llm_timeout_falls_back_to_declarative_only():
    """LLM timeout → additional_checks=[], no crash, declaratives kept."""
    decl = [{"id": "c1", "label": "Vérifier IPP", "required": True}]
    with patch("agent_v0.server_v1.safety_checks_provider._call_llm_for_contextual_checks",
               return_value=[]) as mock_llm:
        payload = build_pause_payload(
            _action(safety_level="medical_critical", declarative_checks=decl),
            {},
            last_screenshot="/tmp/fake.png",
        )
    assert len(payload.checks) == 1
    assert payload.checks[0]["source"] == "declarative"


def test_llm_invalid_response_falls_back():
    """If _call_llm returns [] (parse failed internally) → safe fallback."""
    with patch("agent_v0.server_v1.safety_checks_provider._call_llm_for_contextual_checks",
               return_value=[]):
        payload = build_pause_payload(
            _action(safety_level="medical_critical", declarative_checks=[]),
            {},
            last_screenshot="/tmp/fake.png",
        )
    assert payload.checks == []


def test_kill_switch_disables_llm_call(monkeypatch):
    """RPA_SAFETY_CHECKS_LLM_ENABLED=0 → LLM never called."""
    monkeypatch.setenv("RPA_SAFETY_CHECKS_LLM_ENABLED", "0")
    decl = [{"id": "c1", "label": "X", "required": True}]
    with patch("agent_v0.server_v1.safety_checks_provider._call_llm_for_contextual_checks") as mock_llm:
        payload = build_pause_payload(
            _action(safety_level="medical_critical", declarative_checks=decl),
            {},
            last_screenshot="/tmp/fake.png",
        )
    mock_llm.assert_not_called()
    assert len(payload.checks) == 1


def test_max_checks_respected(monkeypatch):
    """RPA_SAFETY_CHECKS_LLM_MAX_CHECKS=2 → at most 2 LLM checks added."""
    monkeypatch.setenv("RPA_SAFETY_CHECKS_LLM_MAX_CHECKS", "2")
    decl = []
    llm_resp = [
        {"label": f"Check {i}", "evidence": f"e{i}"} for i in range(5)
    ]
    with patch("agent_v0.server_v1.safety_checks_provider._call_llm_for_contextual_checks",
               return_value=llm_resp[:2]):  # the provider already truncates
        payload = build_pause_payload(
            _action(safety_level="medical_critical", declarative_checks=decl),
            {},
            last_screenshot="/tmp/fake.png",
        )
    assert len(payload.checks) == 2


def test_empty_declarative_with_llm_returns_only_llm():
    """No declarative checks + LLM adds 2 → payload contains the 2."""
    llm_resp = [{"label": "Vérifier date", "evidence": "date 1900 suspecte"},
                {"label": "Vérifier devise", "evidence": "montant en USD au lieu d'EUR"}]
    with patch("agent_v0.server_v1.safety_checks_provider._call_llm_for_contextual_checks",
               return_value=llm_resp):
        payload = build_pause_payload(
            _action(safety_level="medical_critical", declarative_checks=[]),
            {},
            last_screenshot="/tmp/fake.png",
        )
    assert len(payload.checks) == 2
    assert all(c["source"] == "llm_contextual" for c in payload.checks)
```

- [ ] **Step 2: Run to verify they fail**

```bash
pytest tests/unit/test_safety_checks_provider.py -v
```

Expected: `ModuleNotFoundError`.

### Task 20: Implement `safety_checks_provider.py`

**Files:**
- Create: `agent_v0/server_v1/safety_checks_provider.py`

- [ ] **Step 1: Write the complete module**

```python
# agent_v0/server_v1/safety_checks_provider.py
"""SafetyChecksProvider — hybrid declarative + contextual LLM checks (QW4).

For a pause_for_human action:
- the declarative checks (workflow) are always included
- if safety_level == "medical_critical" and RPA_SAFETY_CHECKS_LLM_ENABLED=1,
  an LLM call (medgemma:4b by default) adds up to N contextual checks

Any failure on the LLM side (timeout, exception, parse) → additional_checks=[]:
the replay continues with the declarative checks only (safe fallback).
"""

import base64
import json
import logging
import os
import uuid
from dataclasses import dataclass, field
from typing import Any, Dict, List, Optional

logger = logging.getLogger(__name__)


@dataclass
class PausePayload:
    checks: List[Dict[str, Any]] = field(default_factory=list)
    pause_reason: str = ""
    message: str = ""


def _env(name: str, default: str) -> str:
    return os.environ.get(name, default).strip()


def _env_int(name: str, default: int) -> int:
    try:
        return int(os.environ.get(name, default))
    except (TypeError, ValueError):
        return default


def _env_bool_enabled(name: str) -> bool:
    val = os.environ.get(name, "1").strip().lower()
    return val not in ("0", "false", "no", "off", "")


def build_pause_payload(
    action: Dict[str, Any],
    replay_state: Dict[str, Any],
    last_screenshot: Optional[str],
) -> PausePayload:
    """Build the enriched pause payload for a pause_for_human action."""
    params = action.get("parameters") or {}
    message = params.get("message", "Validation requise")
    safety_level = params.get("safety_level")
    declarative = params.get("safety_checks") or []

    # Normalize the declarative checks
    checks: List[Dict[str, Any]] = []
    for d in declarative:
        checks.append({
            "id": d.get("id") or f"decl_{uuid.uuid4().hex[:6]}",
            "label": d.get("label", "Validation"),
            "required": bool(d.get("required", True)),
            "source": "declarative",
            "evidence": None,
        })

    # Append contextual LLM checks when applicable
    if safety_level == "medical_critical" and _env_bool_enabled("RPA_SAFETY_CHECKS_LLM_ENABLED"):
        try:
            additional = _call_llm_for_contextual_checks(
                action=action,
                replay_state=replay_state,
                last_screenshot=last_screenshot,
                existing_labels=[c["label"] for c in checks],
            )
        except Exception as e:
            logger.warning("safety_checks LLM exception (%s) — safe fallback", e)
            additional = []
        for a in additional:
            checks.append({
                "id": f"llm_{uuid.uuid4().hex[:6]}",
                "label": a.get("label", ""),
                "required": False,  # LLM checks are informational, not mandatory in V1
                "source": "llm_contextual",
                "evidence": a.get("evidence", ""),
            })

    return PausePayload(
        checks=checks,
        pause_reason="",
        message=message,
    )


def _call_llm_for_contextual_checks(
    action: Dict[str, Any],
    replay_state: Dict[str, Any],
    last_screenshot: Optional[str],
    existing_labels: List[str],
) -> List[Dict[str, str]]:
    """Call Ollama in strict JSON mode to generate 0-N contextual checks.

    Returns:
        List[{label, evidence}] (max RPA_SAFETY_CHECKS_LLM_MAX_CHECKS).
        [] on any failure (timeout, invalid JSON, exception).
    """
    import requests

    model = _env("RPA_SAFETY_CHECKS_LLM_MODEL", "medgemma:4b")
    timeout_s = _env_int("RPA_SAFETY_CHECKS_LLM_TIMEOUT_S", 5)
    max_checks = _env_int("RPA_SAFETY_CHECKS_LLM_MAX_CHECKS", 3)
    ollama_url = _env("OLLAMA_URL", "http://localhost:11434")

    params = action.get("parameters") or {}
    workflow_message = params.get("message", "")
    existing = ", ".join(existing_labels) if existing_labels else "aucun"

    # The prompt is kept in French: it is a runtime string shown to the model.
    prompt = f"""Tu es Léa, assistante médicale supervisée.
Avant de continuer le workflow, tu dois lister 0 à {max_checks} vérifications
supplémentaires que l'humain doit acquitter, en regardant l'écran actuel.

Contexte workflow : {workflow_message}
Checks déjà demandés : {existing}

NE répète PAS un check déjà demandé.
Si rien d'inhabituel à signaler, retourne {{"additional_checks": []}}.

Réponds UNIQUEMENT en JSON :
{{
  "additional_checks": [
    {{"label": "string court", "evidence": "ce que tu as vu d'inhabituel"}}
  ]
}}
"""

    payload = {
        "model": model,
        "prompt": prompt,
        "stream": False,
        "format": "json",
        "options": {"temperature": 0.1, "num_predict": 200},
    }
    if last_screenshot and os.path.isfile(last_screenshot):
        try:
            with open(last_screenshot, "rb") as f:
                payload["images"] = [base64.b64encode(f.read()).decode("ascii")]
        except Exception as e:
            logger.debug("safety_checks: screenshot read failed (%s) — calling without image", e)

    try:
        response = requests.post(
            f"{ollama_url}/api/generate",
            json=payload,
            timeout=timeout_s,
        )
        if response.status_code != 200:
            logger.warning("safety_checks LLM HTTP %s", response.status_code)
            return []
        text = response.json().get("response", "").strip()
    except requests.Timeout:
        logger.warning("safety_checks LLM timeout (%ss)", timeout_s)
        return []
    except Exception as e:
        logger.warning("safety_checks LLM network error: %s", e)
        return []

    # format=json normally guarantees valid JSON
    try:
        parsed = json.loads(text)
    except json.JSONDecodeError as e:
        logger.warning("safety_checks LLM invalid JSON (%s) — safe fallback", e)
        return []

    additional = parsed.get("additional_checks") or []
    if not isinstance(additional, list):
        return []

    # Filter + truncate
    valid = []
    for item in additional[:max_checks]:
        if isinstance(item, dict) and item.get("label"):
            valid.append({
                "label": str(item["label"])[:200],
                "evidence": str(item.get("evidence", ""))[:300],
            })
    return valid
```

- [ ] **Step 2: Re-run the tests — they must all pass**

```bash
pytest tests/unit/test_safety_checks_provider.py -v
```

Expected: `7 passed`.
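The tail of `_call_llm_for_contextual_checks` (parse, filter, truncate) can be exercised standalone on a simulated Ollama response. A minimal sketch, assuming the same limits as the module defaults; `extract_checks` and `fake` are illustrative names, not part of the provider:

```python
import json

def extract_checks(text: str, max_checks: int = 3) -> list:
    """Replicates the provider's parse/filter/truncate tail on a raw LLM string."""
    try:
        parsed = json.loads(text)
    except json.JSONDecodeError:
        return []  # invalid JSON: safe fallback, same as the provider
    additional = parsed.get("additional_checks") or []
    if not isinstance(additional, list):
        return []
    valid = []
    for item in additional[:max_checks]:  # truncate first, then validate each item
        if isinstance(item, dict) and item.get("label"):
            valid.append({"label": str(item["label"])[:200],
                          "evidence": str(item.get("evidence", ""))[:300]})
    return valid

# Simulated well-formed response: truncation keeps the first 3 slots,
# then the empty label inside that window is dropped.
fake = json.dumps({"additional_checks": [
    {"label": "Vérifier date", "evidence": "date 1900 suspecte"},
    {"label": ""},                 # dropped: empty label
    {"label": "Vérifier devise"},  # kept: evidence defaults to ""
    {"label": "Check 4"},          # outside the max_checks window
]})
print(len(extract_checks(fake)))   # → 2
print(extract_checks("not json"))  # → []
```

Note the order of operations: the slice runs before the label filter, so a malformed entry inside the window consumes one of the `max_checks` slots. That matches the module as written and is worth keeping in mind when tuning `RPA_SAFETY_CHECKS_LLM_MAX_CHECKS`.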
- [ ] **Step 3: Commit**

```bash
git add agent_v0/server_v1/safety_checks_provider.py tests/unit/test_safety_checks_provider.py
git commit -m "$(cat <<'EOF'
feat(qw4): hybrid SafetyChecksProvider, declarative + contextual LLM

build_pause_payload(action, state, last_screenshot) → PausePayload
- Always include the declarative checks (workflow.parameters.safety_checks)
- If safety_level=medical_critical AND RPA_SAFETY_CHECKS_LLM_ENABLED=1:
  LLM call (medgemma:4b by default) with strict format=json, 5s timeout,
  max 3 checks added (configurable via env vars)
- All error paths (timeout, HTTP, JSON parse, exception) log and return []
  (safe fallback: declaratives only)

Tests: 7 cases (declarative only, hybrid OK, timeout, invalid LLM response,
kill-switch, max_checks, empty declarative).

Co-Authored-By: Claude Opus 4.7 (1M context)
EOF
)"
```

### Task 21: Hook `safety_checks_provider` into `replay_engine.py`

**Files:**
- Modify: `agent_v0/server_v1/replay_engine.py:1452-1524` (`_create_replay_state`) — add keys
- Modify: `agent_v0/server_v1/api_stream.py` (pause_for_human branch, ~line 2918)

- [ ] **Step 1: Extend `_create_replay_state` with the audit keys**

In `_create_replay_state`, after the QW2 keys added in Task 15:

```python
# QW4 — Safety checks and acknowledgment audit
"safety_checks": [],        # list produced by SafetyChecksProvider
"checks_acknowledged": [],  # ids acknowledged via /replay/resume (audit trail)
"pause_reason": "",         # "loop_detected" | "" for V1
"pause_payload": None,      # full payload for debug/audit
```

- [ ] **Step 2: Locate the `pause_for_human` branch in api_stream.py**

```bash
grep -n "pause_for_human" agent_v0/server_v1/api_stream.py | head -10
```

Target: line 2918 (current comment: "pause_for_human ignorée (mode autonome)") and the block that follows for supervised mode (probably a few lines further down).
- [ ] **Step 3: Modify the supervised branch to call the provider**

Before the `replay_state["status"] = "paused_need_help"` update in the supervised branch:

```python
# QW4 — Build the enriched pause payload
from agent_v0.server_v1.safety_checks_provider import build_pause_payload
last_screenshot = replay_state.get("last_screenshot")
payload = build_pause_payload(action, replay_state, last_screenshot)
replay_state["safety_checks"] = payload.checks
replay_state["pause_payload"] = {
    "checks": payload.checks,
    "pause_reason": payload.pause_reason,
    "message": payload.message,
}
replay_state["pause_message"] = payload.message

# Observability bus event
try:
    from agent_v0.agent_v1.network.feedback_bus import emit_server_event
    emit_server_event("lea:safety_checks_generated", {
        "replay_id": replay_state.get("replay_id"),
        "count": len(payload.checks),
        "sources": [c["source"] for c in payload.checks],
    })
except Exception:
    pass
```

- [ ] **Step 4: Re-run the baseline**

```bash
pytest tests/test_pipeline_e2e.py \
    tests/test_phase0_integration.py \
    tests/integration/test_stream_processor.py \
    -q
```

Expected: same baseline result.
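The supervised-branch mutation above can be dry-run without FastAPI: a stub standing in for the provider's result is enough to show which keys land in `replay_state`. A minimal sketch, where `StubPayload` and `apply_pause` are hypothetical stand-ins mirroring `PausePayload` and the branch's state update:

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List

@dataclass
class StubPayload:  # stands in for safety_checks_provider.PausePayload
    checks: List[Dict[str, Any]] = field(default_factory=list)
    pause_reason: str = ""
    message: str = "Validation requise"

def apply_pause(replay_state: Dict[str, Any], payload: StubPayload) -> None:
    """Mirror of the QW4 supervised-branch mutation before paused_need_help."""
    replay_state["safety_checks"] = payload.checks
    replay_state["pause_payload"] = {
        "checks": payload.checks,
        "pause_reason": payload.pause_reason,
        "message": payload.message,
    }
    replay_state["pause_message"] = payload.message
    replay_state["status"] = "paused_need_help"

state = {"replay_id": "r1", "status": "running"}
apply_pause(state, StubPayload(checks=[{"id": "c1", "label": "Vérifier IPP",
                                        "required": True, "source": "declarative",
                                        "evidence": None}]))
# state now carries safety_checks, pause_payload, pause_message and the paused status
```

This is the shape the frontend `PauseDialog` consumes later: `safety_checks` drives the checklist, `pause_message` the header text.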
### Task 22: Integration tests `test_replay_resume_acknowledgments.py`

**Files:**
- Create: `tests/integration/test_replay_resume_acknowledgments.py`

- [ ] **Step 1: Create the file**

```python
# tests/integration/test_replay_resume_acknowledgments.py
"""Integration tests: /replay/resume validates safety_checks acknowledgments (QW4)."""

import pytest


def test_resume_accepts_when_all_required_acknowledged():
    """Paused state + all required acknowledged → resume OK."""
    state = {
        "status": "paused_need_help",
        "safety_checks": [
            {"id": "c1", "label": "X", "required": True, "source": "declarative", "evidence": None},
            {"id": "c2", "label": "Y", "required": True, "source": "declarative", "evidence": None},
        ],
        "checks_acknowledged": [],
    }
    # Simulate the server-side validation
    acknowledged = ["c1", "c2"]
    required_ids = {c["id"] for c in state["safety_checks"] if c["required"]}
    missing = required_ids - set(acknowledged)
    assert missing == set()  # nothing missing → resume OK


def test_resume_rejects_when_required_missing():
    """Paused state + one required not acknowledged → 400 required_checks_missing."""
    state = {
        "status": "paused_need_help",
        "safety_checks": [
            {"id": "c1", "label": "X", "required": True, "source": "declarative", "evidence": None},
            {"id": "c2", "label": "Y", "required": False, "source": "llm_contextual", "evidence": "..."},
        ],
        "checks_acknowledged": [],
    }
    acknowledged = ["c2"]  # only the optional one
    required_ids = {c["id"] for c in state["safety_checks"] if c["required"]}
    missing = required_ids - set(acknowledged)
    assert missing == {"c1"}  # c1 missing → resume must return 400


def test_resume_audit_trail_stored():
    """checks_acknowledged contains the received ids (audit)."""
    state = {
        "status": "paused_need_help",
        "safety_checks": [
            {"id": "c1", "required": True, "label": "X", "source": "declarative", "evidence": None},
        ],
        "checks_acknowledged": [],
    }
    acknowledged = ["c1"]
    state["checks_acknowledged"] = acknowledged
    state["status"] = "running"
    assert state["checks_acknowledged"] == ["c1"]
    assert state["status"] == "running"
```

- [ ] **Step 2: Run**

```bash
pytest tests/integration/test_replay_resume_acknowledgments.py -v
```

Expected: `3 passed`.

### Task 23: Modify the `/replay/resume` endpoint in `api_stream.py`

**Files:**
- Modify: `agent_v0/server_v1/api_stream.py:3974-3990+` (`/replay/resume`)

- [ ] **Step 1: Locate the function**

```bash
grep -n "def.*resume.*replay\|@app.post.*resume\|/replay/resume" agent_v0/server_v1/api_stream.py | head -5
```

Expected target: line ~3974.

- [ ] **Step 2: Extend the signature to accept `acknowledged_check_ids`**

Given the existing function:

```python
@app.post("/replay/resume")
async def resume_replay(...):
    ...
```

Extend the Pydantic body to optionally accept `acknowledged_check_ids: List[str] = []`. For example: if the current signature is `async def resume_replay(payload: ReplayResumeRequest):`, modify the `ReplayResumeRequest` model to add `acknowledged_check_ids: List[str] = []`.

- [ ] **Step 3: Verify the acknowledgments before actually resuming**

Inside the function, right after confirming `state["status"] == "paused_need_help"`:

```python
# QW4 — Verify the required safety_checks
safety_checks = state.get("safety_checks") or []
if safety_checks:
    required_ids = {c["id"] for c in safety_checks if c.get("required")}
    ack_set = set(payload.acknowledged_check_ids or [])
    missing = list(required_ids - ack_set)
    if missing:
        raise HTTPException(
            status_code=400,
            detail={"error": "required_checks_missing", "missing": missing},
        )
    # Audit trail
    state["checks_acknowledged"] = list(ack_set)
```

- [ ] **Step 4: Re-run baseline + integration tests**

```bash
pytest tests/test_pipeline_e2e.py \
    tests/test_phase0_integration.py \
    tests/integration/test_stream_processor.py \
    tests/integration/test_replay_resume_acknowledgments.py \
    -q
```

Expected: all passed, baseline preserved.
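The gate in Step 3 is plain set arithmetic, so it is easy to check in isolation. A minimal sketch, where `validate_resume` is a hypothetical helper mirroring the endpoint's logic (the endpoint itself raises `HTTPException` instead of returning a tuple):

```python
from typing import Dict, List, Tuple

def validate_resume(safety_checks: List[Dict], acknowledged: List[str]) -> Tuple[bool, List[str]]:
    """Returns (ok, missing_required_ids), mirroring the /replay/resume gate."""
    required_ids = {c["id"] for c in safety_checks if c.get("required")}
    missing = sorted(required_ids - set(acknowledged))
    return (not missing, missing)

checks = [
    {"id": "c1", "label": "Vérifier IPP", "required": True},
    {"id": "c2", "label": "Info LLM", "required": False},
]
print(validate_resume(checks, ["c2"]))  # required c1 missing → (False, ['c1'])
print(validate_resume(checks, ["c1"]))  # optional c2 may stay unchecked → (True, [])
print(validate_resume([], []))          # no checks at all → (True, []), backward compatible
```

The last call shows the backward-compatibility guarantee stated in the plan: workflows without `safety_checks` resume without any acknowledgment.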
- [ ] **Step 5: Commit** ```bash git add agent_v0/server_v1/replay_engine.py \ agent_v0/server_v1/api_stream.py \ tests/integration/test_replay_resume_acknowledgments.py git commit -m "$(cat <<'EOF' feat(qw4): hook safety_checks_provider + extension /replay/resume avec acquittements replay_state enrichi de safety_checks, checks_acknowledged, pause_reason, pause_payload (audit trail). Branche supervisée pause_for_human : - appel build_pause_payload() avant bascule paused_need_help - bus event lea:safety_checks_generated (count, sources) POST /replay/resume : - accepte body { acknowledged_check_ids: [...] } - vérifie tous les checks required acquittés, sinon 400 required_checks_missing - stocke checks_acknowledged comme audit trail Backward 100% : workflows sans safety_checks → resume sans acquittement requis. Co-Authored-By: Claude Opus 4.7 (1M context) EOF )" ``` ### Task 24: Étendre `types.ts` côté frontend VWB **Files:** - Modify: `visual_workflow_builder/frontend_v4/src/types.ts:46` et alentours, et le type `Execution` - [ ] **Step 1: Localiser le type `PauseAction`** ```bash grep -n "pause_for_human\|PauseAction\|safety_checks\|Execution" visual_workflow_builder/frontend_v4/src/types.ts | head -20 ``` - [ ] **Step 2: Ajouter les types `SafetyCheck` et étendre `PauseAction.parameters`** Au début du fichier, après les imports : ```typescript export type SafetyLevel = 'standard' | 'medical_critical'; export interface SafetyCheck { id: string; label: string; required: boolean; source: 'declarative' | 'llm_contextual'; evidence?: string | null; } ``` Étendre les params de l'action pause_for_human (chercher dans la définition `ActionDef` ou similaire à la ligne 135) : ```typescript { type: 'pause_for_human', label: 'Pause supervisée', ..., params: [ { key: 'message', label: 'Message', type: 'text' }, { key: 'safety_level', label: 'Niveau', type: 'select', options: ['standard', 'medical_critical'] }, { key: 'safety_checks', label: 'Checks à valider', type: 
'checks_editor' },
  ],
}
```

Extend the `Execution` type to carry the pause payload:

```typescript
export interface Execution {
  // ... existing fields ...
  pause_reason?: string;
  pause_message?: string;
  safety_checks?: SafetyCheck[];
}
```

- [ ] **Step 3: Check the TypeScript compilation**

```bash
cd visual_workflow_builder/frontend_v4 && npx tsc --noEmit 2>&1 | head -30
```

Expected: no errors (or only pre-existing errors outside this diff).

### Task 25: Create the `PauseDialog.tsx` component

**Files:**
- Create: `visual_workflow_builder/frontend_v4/src/components/PauseDialog.tsx`

- [ ] **Step 1: Write the component**

```tsx
// visual_workflow_builder/frontend_v4/src/components/PauseDialog.tsx
import { useState, useMemo } from 'react';
import type { SafetyCheck } from '../types';

interface Props {
  pauseMessage: string;
  pauseReason?: string;
  safetyChecks: SafetyCheck[];
  onResume: (acknowledgedIds: string[]) => Promise<void>;
  onCancel: () => void;
}

export default function PauseDialog({
  pauseMessage,
  pauseReason,
  safetyChecks,
  onResume,
  onCancel,
}: Props) {
  const [checked, setChecked] = useState<Record<string, boolean>>({});
  const [submitting, setSubmitting] = useState(false);
  const [error, setError] = useState<string | null>(null);

  const allRequiredOK = useMemo(() => {
    return safetyChecks
      .filter((c) => c.required)
      .every((c) => checked[c.id] === true);
  }, [safetyChecks, checked]);

  const toggle = (id: string) => {
    setChecked((prev) => ({ ...prev, [id]: !prev[id] }));
  };

  const handleResume = async () => {
    setSubmitting(true);
    setError(null);
    try {
      const acknowledgedIds = Object.entries(checked)
        .filter(([, v]) => v)
        .map(([k]) => k);
      await onResume(acknowledgedIds);
    } catch (e: any) {
      setError(e?.message || 'Erreur lors de la reprise');
    } finally {
      setSubmitting(false);
    }
  };

  // Backward compat: no checks → simple legacy bubble
  if (safetyChecks.length === 0) {
    return (
      <div>
        <p className="pause-message">{pauseMessage}</p>
        {pauseReason && <p className="pause-reason-banner">Raison : {pauseReason}</p>}
        <div className="pause-actions">
          <button onClick={handleResume} disabled={submitting}>Continuer</button>
          <button onClick={onCancel}>Annuler</button>
        </div>
      </div>
    );
  }

  return (
    <div className="pause-dialog-checks">
      <h3>Pause supervisée</h3>
      <p className="pause-message">{pauseMessage}</p>
      {pauseReason && (
        <div className="pause-reason-banner">Raison : {pauseReason}</div>
      )}
      <ul className="checklist-panel">
        {safetyChecks.map((c) => (
          <li key={c.id} className={`check-item${c.required ? ' required' : ''}`}>
            <label>
              <input
                type="checkbox"
                checked={checked[c.id] === true}
                onChange={() => toggle(c.id)}
              />
              {c.label}
              {c.required && <span className="badge badge-required">obligatoire</span>}
              {c.source === 'llm_contextual' && (
                <span className="badge badge-lea" title={c.evidence ?? undefined}>Léa</span>
              )}
            </label>
            {c.source === 'llm_contextual' && c.evidence && (
              <span className="check-evidence">→ {c.evidence}</span>
            )}
          </li>
        ))}
      </ul>
      {error && <div className="pause-error">{error}</div>}
      <div className="pause-actions">
        <button onClick={handleResume} disabled={!allRequiredOK || submitting}>
          Continuer
        </button>
        <button onClick={onCancel} disabled={submitting}>Annuler</button>
      </div>
    </div>
  );
}
```

- [ ] **Step 2: Add the minimal CSS (in the global CSS file or inline)**

Identify the active CSS file:

```bash
ls visual_workflow_builder/frontend_v4/src/*.css
```

Add:

```css
.pause-dialog-checks { padding: 16px; max-width: 480px; background: #fff; border: 2px solid #f59e0b; border-radius: 8px; }
.pause-dialog-checks h3 { margin: 0 0 8px; color: #92400e; }
.pause-message { margin: 0 0 12px; }
.pause-reason-banner { background: #fef3c7; padding: 8px; margin-bottom: 12px; border-radius: 4px; }
.checklist-panel { list-style: none; padding: 0; margin: 0 0 12px; }
.check-item { padding: 6px 0; border-bottom: 1px solid #f3f4f6; }
.check-item.required { background: #fef9c3; }
.check-item label { cursor: pointer; display: flex; align-items: center; gap: 6px; }
.badge { font-size: 10px; padding: 2px 6px; border-radius: 10px; margin-left: 6px; }
.badge-required { background: #dc2626; color: #fff; }
.badge-lea { background: #2563eb; color: #fff; cursor: help; }
.check-evidence { display: block; font-style: italic; color: #6b7280; margin-left: 24px; }
.pause-error { color: #dc2626; padding: 8px; background: #fef2f2; border-radius: 4px; margin-bottom: 8px; }
.pause-actions button:disabled { opacity: 0.5; cursor: not-allowed; }
```

- [ ] **Step 3: Wire the component into the existing pause rendering**

Locate where the pause is currently rendered:

```bash
grep -rn "pause_for_human\|paused_need_help\|Continuer\|onResume" visual_workflow_builder/frontend_v4/src/ | head -20
```

Replace the existing rendering with `<PauseDialog />`, fed with props taken from the execution state.

### Task 26: Extend `PropertiesPanel.tsx` — safety_level + safety_checks editor

**Files:**
- Modify: `visual_workflow_builder/frontend_v4/src/components/PropertiesPanel.tsx:1356`

- [ ] **Step 1: Locate the `case 'pause_for_human':` branch**

Around line 1356 (already identified). Read the ~50 lines that follow to see the existing editing pattern.
- [ ] **Step 2: Add the editors after the message field**

```tsx
// In the case 'pause_for_human': branch
<>
  {/* Existing message field — do not touch */}

  {/* QW4 — Safety level */}
  <label>Niveau</label>
  <select
    value={params.safety_level || 'standard'}
    onChange={(e) => updateParam('safety_level', e.target.value)}
  >
    <option value="standard">standard</option>
    <option value="medical_critical">medical_critical</option>
  </select>

  {/* QW4 — Editable list of declarative checks */}
  {(params.safety_checks || []).map((check: any, i: number) => (
    <div key={i}>
      <input
        value={check.id}
        placeholder="id"
        onChange={(e) => {
          const next = [...(params.safety_checks || [])];
          next[i] = { ...check, id: e.target.value };
          updateParam('safety_checks', next);
        }}
      />
      <input
        value={check.label}
        placeholder="label"
        onChange={(e) => {
          const next = [...(params.safety_checks || [])];
          next[i] = { ...check, label: e.target.value };
          updateParam('safety_checks', next);
        }}
      />
      <label>
        <input
          type="checkbox"
          checked={check.required === true}
          onChange={(e) => {
            const next = [...(params.safety_checks || [])];
            next[i] = { ...check, required: e.target.checked };
            updateParam('safety_checks', next);
          }}
        />
        required
      </label>
      <button
        onClick={() =>
          updateParam(
            'safety_checks',
            (params.safety_checks || []).filter((_: any, j: number) => j !== i)
          )
        }
      >
        ✕
      </button>
    </div>
  ))}
  <button
    onClick={() =>
      updateParam('safety_checks', [
        ...(params.safety_checks || []),
        { id: '', label: '', required: false, source: 'declarative' },
      ])
    }
  >
    + Ajouter un check
  </button>
</>
```

(Adapt `updateParam` to the actual name of the edit function used in the file — check the existing pattern around line ~1356.)

- [ ] **Step 3: Check the compilation**

```bash
cd visual_workflow_builder/frontend_v4 && npx tsc --noEmit 2>&1 | head -30
```

### Task 27: Manual VWB compat checklist

**Files:** none (observable manual test)

- [ ] **Step 1: Start the Vite frontend**

```bash
cd visual_workflow_builder/frontend_v4 && npm run dev
```

- [ ] **Step 2: Old workflow (no safety_checks) → simple bubble**

Open an existing workflow validated on 30/04. Run the replay. When the pause appears, the bubble must be identical to before (Continuer, Annuler, no checklist).

- [ ] **Step 3: New workflow with declarative safety_checks**

Create a workflow whose `pause_for_human` action carries 2 declarative safety_checks with `required: true`. Run it. Verify:
- the ChecklistPanel is displayed
- the Continuer button stays disabled until both boxes are checked
- no Ollama call in the server logs (check `journalctl -u rpa-streaming -f | grep -i ollama` during the replay window)

- [ ] **Step 4: `medical_critical` workflow with LLM**

Change the previous workflow to `safety_level: medical_critical`. Re-run. Verify:
- the server logs show a call to `medgemma:4b` within 5 s
- the ChecklistPanel shows the 2 declarative checks plus 0-3 `[Léa]` checks (with evidence as tooltip)
- if Ollama is down: no crash, just the 2 declarative checks (implicit kill-switch)

- [ ] **Step 5: Bad payload test**

Check all the optional boxes but leave one required unchecked → Continuer stays disabled. Force a direct POST to the server via curl:

```bash
curl -X POST http://localhost:5005/replay/resume \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer $RPA_API_TOKEN" \
  -d '{"replay_id":"...","acknowledged_check_ids":[]}'
```

Expected: `400 {"detail": {"error": "required_checks_missing", "missing": [...]}}`.
- [ ] **Step 6: Verify that workflows.db opens correctly**

```bash
sqlite3 visual_workflow_builder/backend/instance/workflows.db ".tables"
sqlite3 visual_workflow_builder/backend/instance/workflows.db "SELECT id, name FROM workflows LIMIT 3;"
```

Expected: no errors, schema intact.

### Task 28: Full-chain QW4 demo smoke test on Easily Assure

**Files:** none (observable manual test)

- [ ] **Step 1: Restart streaming + Vite frontend**

```bash
./svc.sh restart streaming
# The Vite frontend is still running from Task 27
```

- [ ] **Step 2: Modify ONE existing Easily Assure workflow to add a `medical_critical` pause**

In VWB, on a validated UHCD workflow, insert a `pause_for_human` action before the final validation step, with:
- `safety_level: medical_critical`
- `safety_checks: [{id:check_ipp, label:"IPP correct ?", required:true}, {id:check_diag, label:"Diagnostic confirmé ?", required:true}]`

- [ ] **Step 3: Run the replay on Agent V1 Windows**

Verify the full chain:
- the workflow runs up to the pause
- Léa emits `lea:safety_checks_generated` with declarative + LLM checks
- VWB displays `<PauseDialog />` with 2-5 checks
- the physician (you) ticks the checks
- Continuer sends the POST
- the replay resumes and finishes

- [ ] **Step 4: Check the audit trail in the logs**

```bash
journalctl -u rpa-streaming -n 200 | grep -E "checks_acknowledged|safety_checks_generated|safety_checks_llm_failed" | tail -10
```

Expected: clean trace.
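The "Ollama down → no crash" behavior exercised above boils down to a guarded merge of declarative and LLM checks. A minimal sketch, with hypothetical helper names (`build_checks`, `ask_llm`) and an assumed default-on env var; only `RPA_SAFETY_CHECKS_LLM_ENABLED`, the 5 s timeout, and the declarative-only fallback come from the plan:

```python
import os
from typing import Dict, List

OLLAMA_TIMEOUT_S = 5.0  # timeout from the plan; model is medgemma:4b via Ollama


def build_checks(declarative: List[Dict], ask_llm) -> List[Dict]:
    """Merge declarative checks with LLM-contextual ones, never failing."""
    checks = list(declarative)  # declarative checks are always kept
    if os.environ.get("RPA_SAFETY_CHECKS_LLM_ENABLED", "1") == "0":
        return checks  # explicit kill-switch: skip the LLM pass entirely
    try:
        # ask_llm stands in for the local medgemma:4b call
        checks.extend(ask_llm(timeout=OLLAMA_TIMEOUT_S))
    except Exception:
        # Ollama down or slow → safe fallback: declarative checks only
        pass
    return checks


def broken_llm(timeout: float) -> List[Dict]:
    raise ConnectionError("ollama unreachable")


declarative = [{"id": "check_ipp", "label": "IPP correct ?", "required": True}]
assert build_checks(declarative, broken_llm) == declarative
```

Either path (kill-switch or LLM failure) degrades to the declarative list, which is why the smoke test expects no crash when Ollama is unavailable.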
### Task 29: Final QW4 commit + push + full baseline re-run

- [ ] **Step 1: Re-run the full baseline + all QW tests**

```bash
pytest tests/test_pipeline_e2e.py \
  tests/test_phase0_integration.py \
  tests/integration/test_stream_processor.py \
  tests/unit/test_monitor_router.py \
  tests/integration/test_grounding_offset.py \
  tests/unit/test_loop_detector.py \
  tests/integration/test_loop_detector_replay.py \
  tests/unit/test_safety_checks_provider.py \
  tests/integration/test_replay_resume_acknowledgments.py \
  -q
```

Expected: all passed, baseline preserved.

- [ ] **Step 2: Final QW4 commit (frontend)**

```bash
git add visual_workflow_builder/frontend_v4/src/types.ts \
  visual_workflow_builder/frontend_v4/src/components/PauseDialog.tsx \
  visual_workflow_builder/frontend_v4/src/components/PropertiesPanel.tsx \
  visual_workflow_builder/frontend_v4/src/*.css
git commit -m "$(cat <<'EOF'
feat(vwb): PauseDialog + ChecklistPanel + PropertiesPanel extension for safety_checks

PauseDialog (new component):
- 2 modes depending on the payload: simple legacy bubble when safety_checks is empty, ChecklistPanel otherwise
- Continuer disabled until all required checks are ticked
- [obligatoire] and [Léa] badges (with evidence as tooltip)
- POST /replay/resume with acknowledged_check_ids

types.ts: SafetyCheck, SafetyLevel, Execution extension.

PropertiesPanel: safety_level editor (standard/medical_critical dropdown) + editable safety_checks list (id/label/required + add/remove).

100% backward compatible: existing workflows without safety_checks display the legacy bubble, identical to the current behavior.
Co-Authored-By: Claude Opus 4.7 (1M context)
EOF
)"
```

- [ ] **Step 3: Push the branch to Gitea (remote delivery of QW1+QW2+QW4)**

```bash
git push gitea feature/qw-suite-mai
```

---

## Section 4 — Documentation & MEMORY

### Task 30: Create the delivery doc + update MEMORY

**Files:**
- Create: `docs/QW_SUITE_MAI.md`
- Modify: `/home/dom/.claude/projects/-home-dom-ai-rpa-vision-v3/memory/MEMORY.md`

- [ ] **Step 1: Create `docs/QW_SUITE_MAI.md`**

````markdown
# QW Suite Mai 2026 — Delivery summary

RPA Vision V3 improvement sprint, branch `feature/qw-suite-mai`, inspired by a comparative exploration of 5 computer-use frameworks (Simular Agent-S, browser-use, OpenAI CUA, Coasty, Showlab OOTB).

## Three quick wins delivered

- **QW1 — Multi-screen**: capture/grounding by `monitor_index`, with fallback to the active focus, then to the composite. 100% backward compatible with existing workflows.
- **QW2 — Composite LoopDetector**: passive stagnation detection via 3 signals (CLIP screen_static + action_repeat + retry_threshold). Automatic switch to `paused_need_help`.
- **QW4 — Hybrid safety checks**: `pause_for_human` enriched with declarative checks (workflow) + contextual LLM checks (local `medgemma:4b`, 5 s timeout, safe fallback). VWB UX with an acknowledgeable ChecklistPanel.

## Kill-switches in case of trouble

```bash
systemctl edit rpa-streaming
# Add:
#   Environment=RPA_LOOP_DETECTOR_ENABLED=0
#   Environment=RPA_SAFETY_CHECKS_LLM_ENABLED=0
systemctl restart rpa-streaming
```

Full rollback: `git checkout backup/pre-qw-suite-mai-2026-05-05`.

## Design reference

`docs/superpowers/specs/2026-05-05-qw-suite-mai-design.md`

## Execution plan reference

`docs/superpowers/plans/2026-05-05-qw-suite-mai.md`
````

- [ ] **Step 2: Update MEMORY.md (add an index line)**

Add to `/home/dom/.claude/projects/-home-dom-ai-rpa-vision-v3/memory/MEMORY.md`, in an appropriate section (after the other specs/sessions):

```markdown
## ⭐ Sprint QW Suite Mai 2026 (multi-screen + LoopDetector + safety_checks)

See [docs/QW_SUITE_MAI.md](../../../docs/QW_SUITE_MAI.md) — branch `feature/qw-suite-mai`, 3 isolated server modules + VWB UI. Env-var kill-switches on QW2/QW4.
Spec: `docs/superpowers/specs/2026-05-05-qw-suite-mai-design.md`. Plan: `docs/superpowers/plans/2026-05-05-qw-suite-mai.md`.
```

- [ ] **Step 3: Final docs commit**

```bash
git add docs/QW_SUITE_MAI.md
git commit -m "$(cat <<'EOF'
docs(qw): QW Suite Mai 2026 delivery summary

Condensed doc for the 3 delivered quick wins (QW1 multi-screen, QW2 LoopDetector,
QW4 hybrid safety_checks) with kill-switch and rollback procedures. Points to the
full spec and execution plan.

Co-Authored-By: Claude Opus 4.7 (1M context)
EOF
)"
git push gitea feature/qw-suite-mai
```

---

## Recap of expected commits

```
 1. docs(qw): QW Suite Mai 2026 design spec (ALREADY DONE — commit 2a07d8084)
 2. feat(qw1): MonitorRouter — target screen resolution
 3. feat(qw1): per-monitor capture + offset propagation in the grounding cascade
 4. feat(qw1): Agent V1 enrichment + api_stream server hook
 5. feat(qw2): composite LoopDetector (3 signals + kill-switch)
 6. feat(qw2): LoopDetector hook in api_stream + replay_state extension
 7. feat(qw4): hybrid SafetyChecksProvider, declarative + contextual LLM
 8. feat(qw4): safety_checks_provider hook + /replay/resume extension
 9. feat(vwb): PauseDialog + ChecklistPanel + PropertiesPanel extension
10. docs(qw): QW Suite Mai 2026 delivery summary
```

10 expected commits (1 spec already done + 9 features/docs).