15 Commits

Author SHA1 Message Date
Dom
0a02a6ec9c feat(qw4): rigorous LLM safety_checks benchmark → gemma4:latest as the default
Some checks failed
tests / Lint (ruff + black) (push) Successful in 15s
tests / Tests unitaires (sans GPU) (push) Failing after 14s
tests / Tests sécurité (critique) (push) Has been skipped
Benchmark of 5 models × 5 scenarios × cold+warm on an RTX 5070:
- gemma4:latest : warm 2.9s, JSON 92%, detection 46% → winner
- qwen2.5vl:7b : warm 6.6s, detection 23% (too slow)
- qwen2.5vl:3b : warm 2.0s, detection 8% (checks for the sake of checking)
- medgemma:4b : warm 0.5s, detection 0% (refuses to flag anything) → bad
  initial default, now fixed
- qwen3-vl:8b : 0% valid JSON (ignores Ollama's format=json) → ruled out

Changes to safety_checks_provider.py:
- RPA_SAFETY_CHECKS_LLM_MODEL default: medgemma:4b → gemma4:latest
- RPA_SAFETY_CHECKS_LLM_TIMEOUT_S default: 5 → 7 (warm 2.9s + margin)

Full write-up: docs/BENCH_SAFETY_CHECKS_2026-05-06.md
Script: tools/bench_safety_checks_models.py (reproducible, ~10-15 min)

Acknowledged limitation: 46% detection. To be presented in the demo as an
aid to the physician, not a certification. V2 improvement = a prompt more
tightly focused on the fields to verify.
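The cold+warm × scenario grid above can be sketched as a small harness. This is an illustrative assumption, not the actual tools/bench_safety_checks_models.py code: `bench_model`, `fake_model`, and the `{"checks": [...]}` payload shape are made up for the example, and the stub stands in for an Ollama `format=json` call.

```python
import json
import time
from statistics import median

def bench_model(generate, scenarios, runs=2):
    """Benchmark one model callable: latency, JSON validity, and detection
    rate (did the model flag the planted issue?). runs=2 ≈ cold then warm."""
    latencies, json_ok, detected = [], 0, 0
    for scenario in scenarios:
        for _ in range(runs):
            t0 = time.perf_counter()
            raw = generate(scenario["prompt"])
            latencies.append(time.perf_counter() - t0)
            try:
                checks = json.loads(raw).get("checks", [])
                json_ok += 1
            except (json.JSONDecodeError, AttributeError):
                continue  # invalid JSON counts against the model
            if any(scenario["expected"] in c.get("label", "") for c in checks):
                detected += 1
    total = len(scenarios) * runs
    return {
        "warm_latency_s": round(median(latencies), 3),
        "json_valid_pct": round(100 * json_ok / total),
        "detection_pct": round(100 * detected / total),
    }

# Stub standing in for an Ollama call with format=json
def fake_model(prompt):
    return json.dumps({"checks": [{"label": "verify patient_id", "required": True}]})

result = bench_model(fake_model, [{"prompt": "demo scenario", "expected": "patient_id"}])
```

The same scoring loop run against real models is what produces the warm-latency / JSON-validity / detection table above.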

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-06 09:23:09 +02:00
Dom
83be93e121 chore(qw): post-review cleanup (BUS prefixes, monitor events, io import)
Some checks failed
tests / Lint (ruff + black) (push) Successful in 14s
tests / Tests unitaires (sans GPU) (push) Failing after 14s
tests / Tests sécurité (critique) (push) Has been skipped
- safety_checks_provider: every LLM-failure logger.warning is now prefixed
  [BUS] lea:safety_checks_llm_failed with a specific reason
  (exception, http_status, timeout, network, json_decode).
- monitor_router: emit [BUS] lea:monitor_invalid_index when the explicit
  index passed in the action is out of bounds for monitors_geometry,
  and [BUS] lea:monitor_unavailable when an active focus is requested but
  cannot be found. These two events let the bus trace every fallback in
  the QW1 routing cascade.
- safety_checks_provider: removed the unused io import.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-06 00:08:22 +02:00
Dom
f5c33477f0 fix(qw4): wire frontend polling → streaming for the PauseDialog
Before this fix, the VWB frontend had no way of knowing that an Agent V1
(Windows) replay was in supervised pause: the only polling loop (App.tsx)
queried /execute/status (local Linux execution) and never received the
remote replay's safety_checks / pause_message.

Backend side (dag_execute.py):
- added the GET /api/v3/replay/state/<replay_id> proxy, which forwards to
  /api/v1/traces/stream/replay/<id> with a Bearer token.

Frontend side:
- ExecutionControls: new onWindowsReplayStarted prop, called with the
  replay_id returned by /api/v3/execute-windows.
- App.tsx: new streamingReplayId state + a useEffect that polls
  /api/v3/replay/state/<id> every second and merges status, pause_message,
  pause_reason, safety_checks into appState.execution. The existing
  PauseDialog therefore shows up automatically when status = paused_need_help.

Polling stops once the replay is completed/error/cancelled.
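The poll-and-merge loop described above can be sketched in Python (the real logic lives in App.tsx, in TypeScript; `merge_replay_state`, `TERMINAL`, and the exact field list are assumptions for illustration):

```python
# Terminal statuses after which polling stops, per the commit message
TERMINAL = {"completed", "error", "cancelled"}

def merge_replay_state(execution: dict, polled: dict):
    """Merge the fields polled from /api/v3/replay/state/<id> into the
    frontend execution state; return (new_state, keep_polling)."""
    merged = dict(execution)
    for key in ("status", "pause_message", "pause_reason", "safety_checks"):
        if key in polled:
            merged[key] = polled[key]
    # Keep polling until the replay reaches a terminal status
    return merged, polled.get("status") not in TERMINAL

state, keep = merge_replay_state(
    {"status": "running"},
    {"status": "paused_need_help", "pause_message": "Verify the patient ID"},
)
```

With status merged this way, the existing PauseDialog render condition (status == paused_need_help) fires without any new wiring.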

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-06 00:06:20 +02:00
Dom
b1a3aa16f1 fix(qw1): enrich the Windows heartbeat with monitor_index + monitors_geometry
Before this fix, the _heartbeat_loop in the deployed Windows Agent V1
did not enrich its payload, so on Windows QW1 multi-monitor only activated
via window_capture events (triggered by clicks), not continuously.

The source file agent_v0/agent_v1/main.py already carried the enrichment
(commit 2d71e2a24), but the deploy/windows_client/agent_v1/main.py snapshot
had not been synced.

Every heartbeat now carries monitor_index + monitors_geometry, so the server
can resolve the target monitor at all times, even without a click.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-06 00:02:11 +02:00
Dom
0bcfddbbc4 docs(qw): manual smoke-test plan for the 2026-05-06 validation
Some checks failed
tests / Lint (ruff + black) (push) Successful in 15s
tests / Tests unitaires (sans GPU) (push) Failing after 15s
tests / Tests sécurité (critique) (push) Has been skipped
A plan Dom can run solo: 9 sections (preflight, QW1 single/multi-monitor,
QW2 loop, QW4 backward/declarative/medical_critical, bus events, kill-switches,
rollback) with an OK/KO checklist and mid-demo emergency procedures.

Validation for the GHT demo (1st week of May 2026).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-06 00:01:21 +02:00
Dom
aa47172f0f docs(qw): QW suite delivery summary, May 2026
Some checks failed
tests / Lint (ruff + black) (push) Successful in 14s
tests / Tests unitaires (sans GPU) (push) Failing after 13s
tests / Tests sécurité (critique) (push) Has been skipped
Condensed doc covering the 3 delivered quick wins (QW1 multi-monitor,
QW2 LoopDetector, QW4 hybrid safety_checks), with:
- kill-switch and rollback procedures
- env var table
- manual smoke tests to run before the GHT demo
- component-by-component status

Links to the full spec and execution plan.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-05 23:48:26 +02:00
Dom
65da557310 feat(qw4): hook up safety_checks_provider + extend /replay/resume with acknowledgements
Some checks failed
tests / Lint (ruff + black) (push) Successful in 16s
tests / Tests unitaires (sans GPU) (push) Failing after 13s
tests / Tests sécurité (critique) (push) Has been skipped
replay_state enriched with safety_checks, checks_acknowledged, pause_reason,
pause_payload (audit trail).

Supervised pause_for_human branch:
- calls build_pause_payload() before switching to paused_need_help
- logs [BUS] lea:safety_checks_generated (count, sources)
- safe fallback on exception (pause without checks rather than a crash)
- triggers when safety_level/safety_checks are declared OR execution_mode != autonomous
- otherwise keeps the legacy behaviour (silent skip)

POST /replay/resume:
- accepts body { acknowledged_check_ids: [...] }
- verifies every required check is acknowledged, otherwise 400 required_checks_missing
- stores checks_acknowledged as an audit trail
- clears safety_checks/pause_payload after resuming

VWB proxy /api/v3/replay/resume → streaming /replay/{id}/resume (forwards
bearer token + acknowledged_check_ids).

100% backward compatible: workflows without safety_checks → resume with no
acknowledgement required.
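The resume gate above can be sketched as plain functions; `validate_resume` and `resume` are illustrative names working on a dict-shaped replay_state, not the actual endpoint code:

```python
def validate_resume(replay_state, acknowledged_check_ids):
    """Every required check must be acknowledged before the replay may resume.
    Returns (ok, missing_ids)."""
    required = {
        c["id"] for c in replay_state.get("safety_checks", []) if c.get("required")
    }
    missing = sorted(required - set(acknowledged_check_ids))
    return (not missing, missing)

def resume(replay_state, acknowledged_check_ids):
    ok, missing = validate_resume(replay_state, acknowledged_check_ids)
    if not ok:
        # Mirrors the 400 required_checks_missing response
        return {"status": 400, "error": "required_checks_missing", "missing": missing}
    # Store the audit trail, then clear the pause artefacts
    replay_state["checks_acknowledged"] = list(acknowledged_check_ids)
    replay_state.pop("safety_checks", None)
    replay_state.pop("pause_payload", None)
    replay_state["status"] = "running"
    return {"status": 200}
```

A workflow with no safety_checks yields an empty `required` set, so `resume(state, [])` succeeds, which is the backward-compatibility property claimed above.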

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-05 23:45:22 +02:00
Dom
af13cd80ff feat(vwb): PauseDialog + ChecklistPanel + PropertiesPanel extension for safety_checks
PauseDialog (new component):
- 2 modes depending on the payload: the legacy simple bubble when
  safety_checks is empty, the ChecklistPanel otherwise
- Continue stays disabled until every required check is ticked
- [required] and [Léa] badges (with evidence in a tooltip)
- POST /api/v3/replay/resume with acknowledged_check_ids when a replay_id
  is present, falling back to api.resumeExecution() for the local path

types.ts: SafetyCheck, SafetyLevel, Execution extension
(pause_reason, pause_message, safety_checks, replay_id, status
'paused_need_help'). The pause_for_human action gains safety_level
and safety_checks in the ACTIONS catalogue.

PropertiesPanel: safety_level editor (standard/medical_critical dropdown)
+ editable safety_checks list (id/label/required + add/remove).

App.tsx: conditional PauseDialog overlay render when
status == paused_need_help, or paused with safety_checks. 100% backward
compatible: existing workflows without safety_checks show the legacy bubble.

CSS: .pause-dialog-overlay/.pause-dialog-checks/.checklist-panel/
.check-item/.badge-required/.badge-lea/.check-editor-row.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-05 23:33:04 +02:00
Dom
7c6945171e feat(qw4): hybrid SafetyChecksProvider, declarative + contextual LLM
build_pause_payload(action, state, last_screenshot) → PausePayload
- Always includes the declarative checks (workflow.parameters.safety_checks)
- If safety_level=medical_critical AND RPA_SAFETY_CHECKS_LLM_ENABLED=1:
    LLM call (medgemma:4b by default) in strict format=json, 5s timeout,
    at most 3 checks added (configurable via env vars)
- Every error path (timeout, HTTP, JSON parse, exception) logs and
  returns [] (safe fallback: declarative checks only)

Tests: 7 cases (declarative only, hybrid OK, timeout, invalid LLM output,
kill-switch, max_checks, empty declaratives).
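The hybrid flow can be sketched as below. This is an assumption-laden illustration: the real function takes (action, state, last_screenshot) and calls Ollama itself, while here an injectable `llm_call` stands in for that call so the control flow is visible.

```python
import json
import os
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, List, Optional

@dataclass
class PausePayload:
    checks: List[Dict[str, Any]] = field(default_factory=list)
    pause_reason: str = "user_request"
    message: str = ""

def build_pause_payload(action, state, llm_call: Optional[Callable] = None) -> PausePayload:
    params = action.get("parameters") or {}
    # 1) Always start from the declarative checks
    checks = [dict(c, source="declared") for c in params.get("safety_checks") or []]
    # 2) Contextual LLM checks: only medical_critical, only with the switch on
    if (
        params.get("safety_level") == "medical_critical"
        and os.environ.get("RPA_SAFETY_CHECKS_LLM_ENABLED", "0") == "1"
        and llm_call is not None
    ):
        try:
            extra = json.loads(llm_call(action, state)).get("checks", [])
            for c in extra[:3]:  # cap the LLM additions (max_checks)
                checks.append(dict(c, source="llm"))
        except Exception:
            pass  # safe fallback: declarative checks only
    return PausePayload(checks=checks, message=params.get("message", ""))
```

The key property is that the LLM path can only ever add checks; any failure there degrades to the declarative list, never to a crash.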

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-05 23:29:38 +02:00
Dom
ca0b436a61 feat(qw2): hook the LoopDetector into api_stream + extend replay_state
Some checks failed
tests / Lint (ruff + black) (push) Successful in 15s
tests / Tests unitaires (sans GPU) (push) Failing after 17s
tests / Tests sécurité (critique) (push) Has been skipped
replay_state enriched with _screenshot_history (last 5 PIL images) and
_action_history (last 5 action signatures).

report_action_result:
- updates both ring buffers after each action
- evaluates the LoopDetector (lazy singleton with the server-side _clip_embedder)
- on detection → switches to paused_need_help with pause_reason="loop_detected"
  and bus event lea:loop_detected (signal + evidence)

Every error path (missing embedder, OOM, exception) logs and lets the replay
continue; the detection layer never blocks it.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-05 23:25:04 +02:00
Dom
fc01afa59c fix(qw1): bus event lea:monitor_routed + offset wiring in the Agent V1 executor
Post-review QW1 cleanup:
- Emit bus event lea:monitor_routed in /replay/next (idx, source, replay_id, action_id, offset, wh)
  via logger.info "[BUS] lea:monitor_routed ..." (the streaming server has no
  local SocketIO; agent_chat already emits lea:* on 5004. Here we log at INFO
  in an easily parsed form, ready for a future parser/bridge)
- The Agent V1 executor (deploy/windows_client) reads action.monitor_resolution.{offset_x, offset_y, idx}
  and applies the offset to the absolute coords of click/type/scroll/popup when idx >= 0
- composite_fallback (idx=-1): no offset applied (single-monitor backward compat)
- INFO log "QW1 monitor cible idx=N source=X offset=(dx,dy) — appliqué aux coords"
  emitted once per action whenever a non-zero offset applies

Tests: baseline 95 passed (e2e + phase0_integration + stream_processor + monitor_router + grounding_offset)
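Since the bus events are plain INFO lines for now, a future parser/bridge only needs a small regex. A sketch under that assumption (`parse_bus_line` and the regex are illustrative, not shipped code):

```python
import re

# Matches '[BUS] lea:<event> key=value key=value ...' anywhere in a log line
BUS_RE = re.compile(r"\[BUS\] (?P<event>lea:\w+) (?P<kv>.*)")

def parse_bus_line(line):
    """Parse one '[BUS] lea:*' INFO log line into (event, fields) or None."""
    m = BUS_RE.search(line)
    if not m:
        return None
    fields = dict(
        kv.split("=", 1) for kv in m.group("kv").split() if "=" in kv
    )
    return m.group("event"), fields

line = ("[BUS] lea:monitor_routed replay=r-42 action=a7 idx=1 "
        "source=action_monitor_index offset=(1920,0) wh=(1920,1080)")
event, fields = parse_bus_line(line)
```

Such a parser could sit behind `journalctl -u ... | grep '\[BUS\]'` and re-emit the events on the SocketIO bus.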

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-05 23:16:06 +02:00
Dom
2a51a844b9 feat(qw2): composite LoopDetector (screen_static + action_repeat + retry)
Isolated module, 3 independent signals:
- screen_static: CLIP similarity > 0.99 over N consecutive captures
- action_repeat: N identical actions (type+coords)
- retry_threshold: retried_actions >= threshold

First positive signal → LoopVerdict.detected=True (the caller is responsible
for switching to paused_need_help).

Configurable env vars: RPA_LOOP_DETECTOR_ENABLED (kill-switch),
RPA_LOOP_SCREEN_STATIC_N/THRESHOLD, RPA_LOOP_ACTION_REPEAT_N,
RPA_LOOP_RETRY_THRESHOLD.

Tests: 8 cases (each signal in isolation, kill-switch, missing embedder, exception).
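The first-positive-signal-wins logic can be sketched as follows; `LoopDetectorSketch` is a simplified stand-in (it takes precomputed CLIP similarities instead of embedding images itself, and only mirrors the kill-switch env var):

```python
import os
from dataclasses import dataclass

@dataclass
class LoopVerdict:
    detected: bool = False
    signal: str = ""

class LoopDetectorSketch:
    """Composite detector: the first positive signal wins."""

    def __init__(self, static_n=3, static_threshold=0.99, repeat_n=3, retry_threshold=3):
        self.static_n = static_n
        self.static_threshold = static_threshold
        self.repeat_n = repeat_n
        self.retry_threshold = retry_threshold

    def evaluate(self, similarities, action_signatures, retried_actions):
        if os.environ.get("RPA_LOOP_DETECTOR_ENABLED", "1") != "1":
            return LoopVerdict()  # kill-switch: detector disabled
        # screen_static: N consecutive near-identical captures
        recent = similarities[-self.static_n:]
        if len(recent) == self.static_n and all(s > self.static_threshold for s in recent):
            return LoopVerdict(True, "screen_static")
        # action_repeat: N identical (type+coords) signatures in a row
        tail = action_signatures[-self.repeat_n:]
        if len(tail) == self.repeat_n and len(set(tail)) == 1:
            return LoopVerdict(True, "action_repeat")
        # retry_threshold: too many retried actions
        if retried_actions >= self.retry_threshold:
            return LoopVerdict(True, "retry_threshold")
        return LoopVerdict()
```

The caller (here, report_action_result) reacts to `detected` by switching the replay to paused_need_help; the detector itself never touches replay state.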

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-05 23:09:43 +02:00
Dom
2d71e2a249 feat(qw1): Agent V1 enrichment (monitor_index + monitors_geometry) + server hook
Some checks failed
tests / Lint (ruff + black) (push) Successful in 16s
tests / Tests unitaires (sans GPU) (push) Failing after 15s
tests / Tests sécurité (critique) (push) Has been skipped
Agent V1 client side:
- _get_monitors_geometry() / _get_active_monitor_index() helpers via screeninfo
  (graceful fallback to [] / None when screeninfo is absent)
- _enrich_with_monitor_info() added to the dict payloads of capture_dual,
  capture_active_window, and the heartbeat_event pushed by main.py
- screeninfo>=0.8 added to the requirements (source + Windows deploy)
- The deploy capturer.py receives the enrichment additively (no verbatim
  copy, which would have introduced BLUR_SENSITIVE, absent on the deploy side)

Server side:
- import resolve_target_monitor from monitor_router (created in QW1.1)
- /replay/next: enrich action.monitor_resolution before sending to the
  client (idx, offset_x/y, w, h, decision source)
- live_session_manager.add_event: propagate monitor_index +
  monitors_geometry from window_capture AND from the raw event payload
  (the case of an enriched heartbeat with no window/window_title)

Resolution cascade (see monitor_router.py):
1. action.monitor_index (inherited from the source session)
2. session.last_focused_monitor (active focus seen in the last heartbeat)
3. composite_fallback (offset 0,0): strict backward compat

100% backward compatible: with an empty geometry, the composite fallback
matches the current mss.monitors[0] behaviour.

Tests: 89/89 baseline preserved, monitor_router 4/4 OK (93/93 total).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-05 23:05:44 +02:00
Dom
fae95c5366 feat(qw1): per-monitor capture + offset propagation through the grounding cascade
_capture_screen() accepts an optional monitor_idx (None = legacy composite).
Logical index 0..N-1 maps to mss.monitors[idx+1] (mss[0] = the composite).

All 3 grounding levels (OCR, UI-TARS, VLM) propagate the offset returned by
the capture to translate monitor-local coordinates into absolute screen
coordinates (correct for pyautogui.click).

find_element_on_screen() accepts monitor_idx and forwards it to all 3 levels.

100% backward compatible: monitor_idx=None everywhere → strictly the current
behaviour.
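The coordinate translation is a plain offset addition; a minimal sketch (the `monitor` dict shape is the one used by action.monitor_resolution, and `to_absolute` is an illustrative helper name):

```python
def to_absolute(local_x, local_y, monitor):
    """Translate monitor-local coords into absolute virtual-screen coords
    by adding the monitor's offset (what pyautogui.click expects)."""
    return local_x + monitor["offset_x"], local_y + monitor["offset_y"]

# A secondary monitor sitting to the right of a 1920x1080 primary
second = {"idx": 1, "offset_x": 1920, "offset_y": 0, "w": 1920, "h": 1080}
abs_x, abs_y = to_absolute(400, 300, second)
```

With monitor_idx=None the offset is (0,0), so the addition is a no-op and the legacy composite behaviour is preserved.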

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-05 22:55:04 +02:00
Dom
6582a69d31 feat(qw1): MonitorRouter, target-monitor resolution for replay
Isolated module that picks the target monitor with a cascading strategy:
1. action.monitor_index (source session) → explicit target
2. session.last_focused_monitor → active-focus fallback
3. composite (offset 0,0) → backward compat (current behaviour)

100% backward compatible: actions without monitor_index → composite fallback
identical to the current mss.monitors[0] behaviour.

Tests: 4 cases (target OK, focus fallback, composite fallback, invalid index).
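The three-step cascade can be sketched as below; `resolve_target_monitor_sketch` and `TargetMonitor` mirror the behaviour described above but are illustrative, not the monitor_router.py source:

```python
from dataclasses import dataclass

@dataclass
class TargetMonitor:
    idx: int
    offset_x: int
    offset_y: int
    w: int
    h: int
    source: str

# idx=-1, offset (0,0): legacy composite behaviour
COMPOSITE = TargetMonitor(-1, 0, 0, 0, 0, "composite_fallback")

def resolve_target_monitor_sketch(action, session_state):
    geometry = session_state.get("monitors_geometry") or []

    def from_idx(idx, source):
        for m in geometry:
            if m["idx"] == idx:
                return TargetMonitor(idx, m["x"], m["y"], m["w"], m["h"], source)
        return None  # invalid index falls through the cascade

    # 1) explicit index inherited from the source session
    if action.get("monitor_index") is not None:
        target = from_idx(action["monitor_index"], "action_monitor_index")
        if target:
            return target
    # 2) active-focus fallback (last heartbeat)
    if session_state.get("last_focused_monitor") is not None:
        target = from_idx(session_state["last_focused_monitor"], "last_focused_monitor")
        if target:
            return target
    # 3) composite fallback, offset (0,0)
    return COMPOSITE
```

An empty geometry or an out-of-range index always ends at the composite fallback, which is what keeps single-monitor workflows untouched.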

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-05 22:50:22 +02:00
31 changed files with 2931 additions and 39 deletions

View File

@@ -448,6 +448,12 @@ class AgentV1:
        window_title = self.vision.get_active_window_title()
        if window_title:
            heartbeat_event["active_window_title"] = window_title
        # QW1 — multi-monitor enrichment (additive, graceful fallback)
        try:
            from .vision.capturer import _enrich_with_monitor_info
            _enrich_with_monitor_info(heartbeat_event)
        except Exception:
            pass
        self.streamer.push_event(heartbeat_event)
    except Exception as e:
        logger.error(f"Heartbeat error: {e}")

View File

@@ -5,6 +5,7 @@ Pillow>=10.0.0                       # Image crops and processing
requests>=2.31.0                     # Network streaming
python-socketio[client]>=5.10,<6.0   # Léa 'lea:*' feedback bus (compatible with server-side Flask-SocketIO 5.3.x)
psutil>=5.9.0                        # CPU/RAM monitoring
screeninfo>=0.8                      # QW1 — physical monitor detection + offsets
pystray>=0.19.5                      # Tray UI icon
plyer>=2.1.0                         # Native toast notifications (replaces PyQt5)
pywebview>=5.0                       # Integrated Léa chat window (Edge WebView2 on Windows)

View File

@@ -15,7 +15,7 @@ import time
import logging
import hashlib
import platform
-from typing import Any, Dict, Optional
+from typing import Any, Dict, List, Optional
from PIL import Image, ImageFilter, ImageStat
import mss
from ..config import TARGETED_CROP_SIZE, SCREENSHOT_QUALITY, BLUR_SENSITIVE
@@ -26,6 +26,66 @@ logger = logging.getLogger(__name__)
# Current OS (detected once)
_SYSTEM = platform.system()

# QW1 — multi-monitor detection (graceful fallback when screeninfo is absent)
try:
    from screeninfo import get_monitors as _screeninfo_get_monitors
    _SCREENINFO_AVAILABLE = True
except ImportError:
    _SCREENINFO_AVAILABLE = False
def _get_monitors_geometry() -> List[Dict[str, Any]]:
    """Return the list of physical monitors with their offsets.

    Returns:
        List[dict]: [{idx, x, y, w, h, primary}, ...]. Empty if screeninfo is
        unavailable (the server will fall back to the composite).
    """
    if not _SCREENINFO_AVAILABLE:
        return []
    try:
        monitors = _screeninfo_get_monitors()
        return [
            {
                "idx": i,
                "x": int(m.x),
                "y": int(m.y),
                "w": int(m.width),
                "h": int(m.height),
                "primary": bool(getattr(m, "is_primary", False)),
            }
            for i, m in enumerate(monitors)
        ]
    except Exception:
        return []
def _get_active_monitor_index() -> Optional[int]:
    """Return the logical index of the monitor under the cursor (active focus).

    Returns:
        int, or None if it cannot be determined.
    """
    if not _SCREENINFO_AVAILABLE:
        return None
    try:
        import pyautogui  # lazy import: avoids a hard dependency
        cx, cy = pyautogui.position()
        for i, m in enumerate(_screeninfo_get_monitors()):
            if m.x <= cx < m.x + m.width and m.y <= cy < m.y + m.height:
                return i
    except Exception:
        return None
    return None
def _enrich_with_monitor_info(payload: dict) -> dict:
    """Add monitor_index and monitors_geometry to the payload (in-place + return)."""
    if isinstance(payload, dict):
        payload["monitor_index"] = _get_active_monitor_index()
        payload["monitors_geometry"] = _get_monitors_geometry()
    return payload
class VisionCapturer:
    def __init__(self, session_dir: str):
        self.session_dir = session_dir
@@ -121,6 +181,9 @@ class VisionCapturer:
            if window_info:
                result["window_capture"] = window_info
            # QW1 — multi-monitor enrichment (additive, graceful fallback)
            _enrich_with_monitor_info(result)
            return result
        except Exception as e:
            logger.error(f"Erreur Dual Capture: {e}")
@@ -223,6 +286,9 @@ class VisionCapturer:
                "click_inside_window": click_inside,
            }
            # QW1 — multi-monitor enrichment (additive)
            _enrich_with_monitor_info(result)
            logger.debug(
                f"Fenêtre capturée : {title} ({win_w}x{win_h}) — "
                f"clic relatif ({click_rel_x}, {click_rel_y})"

View File

@@ -512,6 +512,21 @@ class ActionExecutorV1:
        x_pct = action.get("x_pct", 0.0)
        y_pct = action.get("y_pct", 0.0)
        # QW1 — If the server resolved a target monitor (idx >= 0), apply its
        # offset to the absolute coords. For idx == -1 (composite_fallback),
        # no offset (backward compat). The coord computation stays
        # percent * (width/height) of monitor[1] on the client side
        # (x_pct is expressed on the primary physical screen).
        mon_res = action.get("monitor_resolution") or {}
        mon_idx = mon_res.get("idx", -1)
        mon_offset_x = mon_res.get("offset_x", 0) if mon_idx >= 0 else 0
        mon_offset_y = mon_res.get("offset_y", 0) if mon_idx >= 0 else 0
        if mon_idx >= 0 and (mon_offset_x or mon_offset_y):
            logger.info(
                f"[REPLAY] QW1 monitor cible idx={mon_idx} source={mon_res.get('source')} "
                f"offset=({mon_offset_x},{mon_offset_y}) — appliqué aux coords"
            )
        # ── Resolution diagnostics ──
        logger.info(
            f"[REPLAY] Action {action_id} ({action_type}) — "
@@ -578,8 +593,8 @@ class ActionExecutorV1:
            print(f" [OBSERVER] Popup détectée : '{popup_label}' — fermeture")
            logger.info(f"Observer : popup '{popup_label}' détectée avant résolution")
            if popup_coords:
-               real_x = int(popup_coords["x_pct"] * width)
-               real_y = int(popup_coords["y_pct"] * height)
+               real_x = int(popup_coords["x_pct"] * width) + mon_offset_x
+               real_y = int(popup_coords["y_pct"] * height) + mon_offset_y
                self._click((real_x, real_y), "left")
                time.sleep(1.0)
                print(f" [OBSERVER] Popup fermée — reprise du flow normal")
@@ -718,8 +733,8 @@ class ActionExecutorV1:
                self.notifier.replay_target_not_found(target_desc)
                return result
-           real_x = int(x_pct * width)
-           real_y = int(y_pct * height)
+           real_x = int(x_pct * width) + mon_offset_x
+           real_y = int(y_pct * height) + mon_offset_y
            button = action.get("button", "left")
            mode = "VISUAL" if result.get("visual_resolved") else "COORD"
            print(
@@ -781,8 +796,8 @@ class ActionExecutorV1:
                print(f" [TYPE] raw_keys disponibles ({len(raw_keys)} events) — replay exact")
            # Click the field before typing (if coordinates are available)
            if x_pct > 0 and y_pct > 0:
-               real_x = int(x_pct * width)
-               real_y = int(y_pct * height)
+               real_x = int(x_pct * width) + mon_offset_x
+               real_y = int(y_pct * height) + mon_offset_y
                print(f" [TYPE] Clic prealable sur ({real_x}, {real_y})")
                self._click((real_x, real_y), "left")
                time.sleep(0.3)
@@ -808,8 +823,8 @@ class ActionExecutorV1:
            logger.info(f"Replay key_combo : {keys} (raw_keys={'oui' if raw_keys else 'non'})")
        elif action_type == "scroll":
-           real_x = int(x_pct * width) if x_pct > 0 else int(0.5 * width)
-           real_y = int(y_pct * height) if y_pct > 0 else int(0.5 * height)
+           real_x = (int(x_pct * width) if x_pct > 0 else int(0.5 * width)) + mon_offset_x
+           real_y = (int(y_pct * height) if y_pct > 0 else int(0.5 * height)) + mon_offset_y
            delta = action.get("delta", -3)
            print(f" [SCROLL] delta={delta} a ({real_x}, {real_y})")
            self.mouse.position = (real_x, real_y)
@@ -1386,6 +1401,16 @@ Example: x_pct=0.50, y_pct=0.30"""
            data = resp.json()
            action = data.get("action")
            if action is None:
                # pause_for_human: surface the decision message to the user
                if data.get("replay_paused") and data.get("pause_message"):
                    msg = data["pause_message"]
                    print(f"[PAUSE] {msg}")
                    logger.info(f"Replay en pause — message : {msg}")
                    self.notifier.notify(
                        title="Léa — Validation requise",
                        message=msg[:250],
                        timeout=30,
                    )
                return False
        except (requests.exceptions.ConnectionError, requests.exceptions.Timeout) as e:

View File

@@ -319,7 +319,22 @@ class AgentV1:
            if img_hash != self._last_heartbeat_hash:
                self._last_heartbeat_hash = img_hash
                self.streamer.push_image(full_path, f"heartbeat_{int(time.time())}")
-               self.streamer.push_event({"type": "heartbeat", "image": full_path, "timestamp": time.time(), "machine_id": self.machine_id})
+               heartbeat_event = {
+                   "type": "heartbeat",
+                   "image": full_path,
+                   "timestamp": time.time(),
+                   "machine_id": self.machine_id,
+               }
+               # QW1 — multi-monitor enrichment (monitor_index + monitors_geometry).
+               # Additive, graceful fallback: without this enrichment the server
+               # only gets the info at click time, so QW1 does not stay active
+               # continuously on a multi-monitor Windows workstation.
+               try:
+                   from .vision.capturer import _enrich_with_monitor_info
+                   _enrich_with_monitor_info(heartbeat_event)
+               except Exception as e:
+                   logger.debug("QW1 enrichissement heartbeat échoué: %s", e)
+               self.streamer.push_event(heartbeat_event)
        except Exception as e:
            logger.error(f"Heartbeat error: {e}")
        time.sleep(5)

View File

@@ -8,12 +8,73 @@ import os
import time
import logging
import hashlib
from typing import Any, Dict, List, Optional
from PIL import Image, ImageFilter, ImageStat
import mss
from ..config import TARGETED_CROP_SIZE, SCREENSHOT_QUALITY
logger = logging.getLogger(__name__)
# QW1 — multi-monitor detection (graceful fallback when screeninfo is absent)
try:
    from screeninfo import get_monitors as _screeninfo_get_monitors
    _SCREENINFO_AVAILABLE = True
except ImportError:
    _SCREENINFO_AVAILABLE = False
def _get_monitors_geometry() -> List[Dict[str, Any]]:
    """Return the list of physical monitors with their offsets.

    Returns:
        List[dict]: [{idx, x, y, w, h, primary}, ...]. Empty if screeninfo is
        unavailable (the server will fall back to the composite).
    """
    if not _SCREENINFO_AVAILABLE:
        return []
    try:
        monitors = _screeninfo_get_monitors()
        return [
            {
                "idx": i,
                "x": int(m.x),
                "y": int(m.y),
                "w": int(m.width),
                "h": int(m.height),
                "primary": bool(getattr(m, "is_primary", False)),
            }
            for i, m in enumerate(monitors)
        ]
    except Exception:
        return []
def _get_active_monitor_index() -> Optional[int]:
    """Return the logical index of the monitor under the cursor (active focus).

    Returns:
        int, or None if it cannot be determined.
    """
    if not _SCREENINFO_AVAILABLE:
        return None
    try:
        import pyautogui  # lazy import: avoids a hard dependency
        cx, cy = pyautogui.position()
        for i, m in enumerate(_screeninfo_get_monitors()):
            if m.x <= cx < m.x + m.width and m.y <= cy < m.y + m.height:
                return i
    except Exception:
        return None
    return None
def _enrich_with_monitor_info(payload: dict) -> dict:
    """Add monitor_index and monitors_geometry to the payload (in-place + return)."""
    if isinstance(payload, dict):
        payload["monitor_index"] = _get_active_monitor_index()
        payload["monitors_geometry"] = _get_monitors_geometry()
    return payload
class VisionCapturer:
    def __init__(self, session_dir: str):
        self.session_dir = session_dir
@@ -72,7 +133,12 @@ class VisionCapturer:
            # Update the hash for the next heartbeat
            self.last_img_hash = self._compute_quick_hash(img)
-           return {"full": full_path, "crop": crop_path}
+           result = {"full": full_path, "crop": crop_path}
+           # QW1 — multi-monitor enrichment (additive, graceful fallback)
+           _enrich_with_monitor_info(result)
+           return result
        except Exception as e:
            logger.error(f"Erreur Dual Capture: {e}")
            return {}

View File

@@ -5,6 +5,7 @@ Pillow>=10.0.0                       # Image crops and processing
requests>=2.31.0                     # Network streaming
python-socketio[client]>=5.10,<6.0   # Léa 'lea:*' feedback bus (compatible with server-side Flask-SocketIO 5.3.x)
psutil>=5.9.0                        # CPU/RAM monitoring
screeninfo>=0.8                      # QW1 — physical monitor detection + offsets
pystray>=0.19.5                      # Tray UI icon
plyer>=2.1.0                         # Native toast notifications (replaces PyQt5)

View File

@@ -33,6 +33,8 @@ from .audit_trail import AuditTrail, AuditEntry
from .agent_registry import AgentRegistry, AgentAlreadyEnrolledError
from .stream_processor import StreamProcessor, build_replay_from_raw_events, enrich_click_from_screenshot
from .worker_stream import StreamWorker
from .monitor_router import resolve_target_monitor  # QW1 — target-monitor resolution
from .loop_detector import LoopDetector  # QW2 — loop detection during replay
from .execution_plan_runner import (
execution_plan_to_actions,
inject_plan_into_queue,
@@ -222,6 +224,7 @@ from .replay_engine import (
_resolve_runtime_vars,
_SERVER_SIDE_ACTION_TYPES,
_handle_extract_text_action,
_handle_extract_table_action,
_handle_t2a_decision_action,
_expand_compound_steps,
_pre_check_screen_state as _pre_check_screen_state_impl,
@@ -359,6 +362,18 @@ REPLAY_LOCK_FILE = _DATA_DIR / "_replay_active.lock"
processor = StreamProcessor(data_dir=str(LIVE_SESSIONS_DIR))
worker = StreamWorker(live_dir=str(LIVE_SESSIONS_DIR), processor=processor)
# QW2 — lazy LoopDetector singleton (uses the processor's CLIP embedder)
_loop_detector: Optional["LoopDetector"] = None

def _get_loop_detector() -> "LoopDetector":
    """Lazy singleton: builds the LoopDetector with the processor's CLIP embedder."""
    global _loop_detector
    if _loop_detector is None:
        embedder = getattr(processor, "_clip_embedder", None)
        _loop_detector = LoopDetector(clip_embedder=embedder)
    return _loop_detector

# Registry of enrolled Lea workstations (enrolled_agents table in rpa_data.db).
# Location configurable via RPA_AGENTS_DB_PATH for tests.
_AGENTS_DB_PATH = os.environ.get(
@@ -511,6 +526,7 @@ class ReplayRequest(BaseModel):
    session_id: str
    machine_id: Optional[str] = None  # Target machine for the replay (multi-machine)
    params: Optional[Dict[str, Any]] = None
    variables: Optional[Dict[str, Any]] = None  # Initial runtime variables ({{var}} templating)
class RawReplayRequest(BaseModel):
@@ -765,6 +781,21 @@ async def startup():
    _cleanup_thread = threading.Thread(target=_cleanup_loop, daemon=True, name="replay_cleanup")
    _cleanup_thread.start()
    # Preload EasyOCR in the background: without it, the first extract_text /
    # extract_table triggers a ~3-5s cold start that blocks the FastAPI event
    # loop (observed 2026-05-05: streaming server unreachable for 2 min).
    # The thread runs while boot continues; the first OCR call will be fast.
    def _preload_easyocr():
        try:
            t0 = time.time()
            from core.llm.ocr_extractor import _get_reader
            _get_reader()
            logger.info("[OCR] EasyOCR préchargé (fr+en, CPU) en %.1fs", time.time() - t0)
        except Exception as e:
            logger.warning("[OCR] Échec préchargement EasyOCR : %s", e)
    threading.Thread(target=_preload_easyocr, daemon=True, name="preload_easyocr").start()
    logger.info(
        "API Streaming démarrée — StreamProcessor, Worker et Cleanup prêts. "
        "VLM Worker dans un process séparé (run_worker.py)."
@@ -1962,6 +1993,11 @@ async def start_replay(request: ReplayRequest):
        machine_id=resolved_machine_id,
        actions=actions,
    )
    # Pre-inject the runtime variables ({{var}} templating on by_text, text,
    # target_spec.*, etc.). Lets the orchestrator call this workflow with
    # e.g. variables={"patient_id": "25003284"} to loop over patients.
    if request.variables:
        _replay_states[replay_id]["variables"].update(request.variables)
    # Record the machine -> session mapping for targeted replay
    if resolved_machine_id and resolved_machine_id != "default":
        _machine_replay_target[resolved_machine_id] = session_id
@@ -2892,8 +2928,54 @@ async def get_next_action(session_id: str, machine_id: str = "default"):
    type_ = action.get("type")
-   # pause_for_human : no-op en mode autonome — on saute et on continue
+   # pause_for_human: supervised pause if safety_level/safety_checks or
+   # supervised mode; otherwise a no-op in autonomous mode (skip).
    if type_ == "pause_for_human":
        _params = action.get("parameters") or {}
        _exec_mode = (
            (owning_replay or {}).get("params", {}).get("execution_mode", "autonomous")
            if owning_replay else "autonomous"
        )
        _has_safety_decl = bool(_params.get("safety_level") or _params.get("safety_checks"))
        _is_supervised = _exec_mode != "autonomous"
        if owning_replay is not None and (_has_safety_decl or _is_supervised):
            # QW4 — Build the enriched pause payload (declarative + contextual LLM)
            try:
                from agent_v0.server_v1.safety_checks_provider import build_pause_payload
                last_screenshot_path = owning_replay.get("last_screenshot")
                payload = build_pause_payload(action, owning_replay, last_screenshot_path)
                owning_replay["safety_checks"] = payload.checks
                owning_replay["pause_payload"] = {
                    "checks": payload.checks,
                    "pause_reason": payload.pause_reason,
                    "message": payload.message,
                }
                if payload.message:
                    owning_replay["pause_message"] = payload.message
                # Observability bus event (QW1/QW2 pattern = logger.info)
                logger.info(
                    "[BUS] lea:safety_checks_generated replay=%s count=%d sources=%s",
                    owning_replay.get("replay_id", "?"),
                    len(payload.checks),
                    [c["source"] for c in payload.checks],
                )
            except Exception as e:
                logger.warning("QW4 build_pause_payload échec (%s) — pause sans checks", e)
                owning_replay["safety_checks"] = []
            # Keep the action context (audit + resume)
            owning_replay["failed_action"] = {
                "action_id": action.get("action_id"),
                "type": "pause_for_human",
                "reason": "user_request",
            }
            owning_replay["status"] = "paused_need_help"
            queue.pop(0)
            _replay_queues[session_id] = queue
            return {"action": None, "session_id": session_id, "machine_id": machine_id}
        # Autonomous mode without safety_checks → skip (legacy behaviour)
        logger.info(
            "pause_for_human ignorée (mode autonome) — replay %s continue",
            owning_replay["replay_id"] if owning_replay else "?"
@@ -2914,6 +2996,12 @@ async def get_next_action(session_id: str, machine_id: str = "default"):
            _handle_extract_text_action,
            action, owning_replay, session_id, _last_heartbeat,
        )
    elif type_ == "extract_table":
        await loop.run_in_executor(
            None,
            _handle_extract_table_action,
            action, owning_replay, session_id, _last_heartbeat,
        )
    elif type_ == "t2a_decision":
        await loop.run_in_executor(
            None,
@@ -3117,6 +3205,51 @@ async def get_next_action(session_id: str, machine_id: str = "default"):
        f"{_precheck_sim}"
    )
    # QW1 — Resolve the target monitor and attach the info to the action.
    # Cascade: action.monitor_index → session.last_focused_monitor → composite_fallback
    try:
        session_qw1 = processor.session_manager.get_session(session_id)
        last_window_info_qw1 = (
            session_qw1.last_window_info if session_qw1 is not None else {}
        ) or {}
        session_state_qw1 = {
            "monitors_geometry": last_window_info_qw1.get("monitors_geometry", []),
            "last_focused_monitor": last_window_info_qw1.get("monitor_index"),
        }
        target = resolve_target_monitor(action, session_state_qw1)
        action["monitor_resolution"] = {
            "idx": target.idx,
            "offset_x": target.offset_x,
            "offset_y": target.offset_y,
            "w": target.w,
            "h": target.h,
            "source": target.source,
        }
        # QW1 — Emit bus event lea:monitor_routed (no-op if the bus is unavailable).
        # The streaming server has no local SocketIO: we log at INFO in an easily
        # readable form. A consumer (agent_chat / dashboard) can tail
        # `journalctl -u rpa-streaming | grep '\[BUS\] lea:monitor_routed'`.
        try:
            _replay_id_bus = (
                owning_replay.get("replay_id") if owning_replay else None
            )
            logger.info(
                "[BUS] lea:monitor_routed replay=%s action=%s idx=%d source=%s "
                "offset=(%d,%d) wh=(%d,%d)",
                _replay_id_bus,
                action.get("action_id"),
                target.idx,
                target.source,
                target.offset_x,
                target.offset_y,
                target.w,
                target.h,
            )
        except Exception as _e_bus:
            logger.debug("emit lea:monitor_routed échec (non bloquant): %s", _e_bus)
    except Exception as e:
        logger.debug("QW1 monitor_resolution skip (%s)", e)
response: Dict[str, Any] = {
"action": action,
"session_id": session_id,
@@ -3855,6 +3988,82 @@ async def report_action_result(report: ReplayResultReport):
f"— worker VLM autorisé à reprendre"
)
# ===================================================================
# QW2 — LoopDetector: feed the history rings + evaluate
# ===================================================================
# Only evaluate while the replay is still "running" — no point in
# pausing something that is already completed/error/paused.
if replay_state["status"] == "running":
# Snapshot the image (PIL) into the ring
try:
from PIL import Image
ss_raw = screenshot_after or replay_state.get("last_screenshot")
img = None
if isinstance(ss_raw, str) and ss_raw:
if os.path.isfile(ss_raw):
img = Image.open(ss_raw).copy()  # detach from the file handle
else:
# Possibly base64 — decode it
try:
import base64
import io as _io
img_bytes = base64.b64decode(ss_raw, validate=False)
img = Image.open(_io.BytesIO(img_bytes)).copy()
except Exception:
img = None
if img is not None:
replay_state.setdefault("_screenshot_history", []).append(img)
replay_state["_screenshot_history"] = replay_state["_screenshot_history"][-5:]
except Exception as e:
logger.debug("LoopDetector: snapshot historique échoué: %s", e)
# Snapshot the current action's signature
try:
_act_pos = report.actual_position or {}
action_sig = {
"type": (original_action or {}).get("type")
or replay_state.get("_last_action_type", ""),
"x_pct": _act_pos.get("x_pct") if isinstance(_act_pos, dict)
else (original_action or {}).get("x_pct"),
"y_pct": _act_pos.get("y_pct") if isinstance(_act_pos, dict)
else (original_action or {}).get("y_pct"),
}
replay_state.setdefault("_action_history", []).append(action_sig)
replay_state["_action_history"] = replay_state["_action_history"][-5:]
except Exception as e:
logger.debug("LoopDetector: snapshot action_sig échoué: %s", e)
# Evaluation (silent if nothing is detected)
try:
verdict = _get_loop_detector().evaluate(
replay_state,
screenshots=replay_state.get("_screenshot_history", []),
actions=replay_state.get("_action_history", []),
)
if verdict.detected:
replay_state["status"] = "paused_need_help"
replay_state["pause_reason"] = "loop_detected"
replay_state["pause_message"] = (
f"Léa semble bloquée — {verdict.signal} "
f"(détail: {verdict.evidence})"
)
logger.warning(
"LoopDetector: replay %s mis en pause — signal=%s evidence=%s",
replay_state["replay_id"], verdict.signal, verdict.evidence,
)
# Observability bus event (same logger pattern as QW1)
try:
logger.info(
"[BUS] lea:loop_detected replay=%s signal=%s evidence=%s",
replay_state["replay_id"],
verdict.signal,
verdict.evidence,
)
except Exception as _e_bus:
logger.debug("emit lea:loop_detected échec: %s", _e_bus)
except Exception as e:
logger.warning("LoopDetector: évaluation échouée (non bloquant): %s", e)
return {
"status": "recorded",
"action_id": action_id,
@@ -3941,8 +4150,16 @@ async def list_replays():
}
class ReplayResumeRequest(BaseModel):
"""Body optionnel pour /replay/resume — QW4 acquittement de safety_checks."""
acknowledged_check_ids: List[str] = []
@app.post("/api/v1/traces/stream/replay/{replay_id}/resume")
async def resume_replay(replay_id: str):
async def resume_replay(
replay_id: str,
payload: Optional[ReplayResumeRequest] = None,
):
"""Reprendre un replay en pause supervisee (paused_need_help).
L'utilisateur a intervenu manuellement (naviguer vers le bon ecran,
@@ -3950,6 +4167,10 @@ async def resume_replay(replay_id: str):
est reinjectee en tete de queue pour etre re-tentee.
Si le replay n'est pas en pause, retourne une erreur 409 (conflit).
QW4 — Si des safety_checks sont attachés à la pause, tous ceux marqués
`required` doivent figurer dans `acknowledged_check_ids`. Sinon → 400
avec `{"error": "required_checks_missing", "missing": [...]}`.
"""
with _replay_lock:
state = _replay_states.get(replay_id)
@@ -3968,6 +4189,25 @@ async def resume_replay(replay_id: str):
),
)
# QW4 — Verify the required safety_checks before resuming
safety_checks = state.get("safety_checks") or []
ack_ids = (payload.acknowledged_check_ids if payload else []) or []
if safety_checks:
required_ids = {c["id"] for c in safety_checks if c.get("required")}
ack_set = set(ack_ids)
missing = sorted(required_ids - ack_set)
if missing:
raise HTTPException(
status_code=400,
detail={"error": "required_checks_missing", "missing": missing},
)
# Audit trail
state["checks_acknowledged"] = sorted(ack_set)
logger.info(
"QW4 resume replay=%s acquittements=%d (%s)",
state.get("replay_id"), len(ack_set), sorted(ack_set),
)
# Retrieve the failed action so it can be re-injected
failed_action = state.get("failed_action")
session_id = state["session_id"]
@@ -3976,6 +4216,10 @@ async def resume_replay(replay_id: str):
state["status"] = "running"
state["failed_action"] = None
state["pause_message"] = None
# QW4 — clear safety_checks after acknowledgment (the pause is resolved)
state["safety_checks"] = []
state["pause_payload"] = None
state["pause_reason"] = ""
# Re-inject the failed action at the head of the queue (it will be retried)
# pause_for_human is an intentional pause, not an error — do not re-inject
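The required-checks gate in the resume endpoint above reduces to a set difference. A minimal standalone sketch (the helper name is hypothetical, not part of the server):

```python
# Standalone sketch of the QW4 resume gate: a resume is blocked while any
# check marked required is missing from the acknowledged ids.
from typing import Any, Dict, List


def missing_required_checks(
    safety_checks: List[Dict[str, Any]],
    acknowledged_check_ids: List[str],
) -> List[str]:
    """Return the sorted ids of required checks not yet acknowledged."""
    required_ids = {c["id"] for c in safety_checks if c.get("required")}
    return sorted(required_ids - set(acknowledged_check_ids))


checks = [
    {"id": "decl_a1", "label": "Verify patient identity", "required": True},
    {"id": "llm_b2", "label": "Unusual birth date", "required": False},
]
print(missing_required_checks(checks, []))           # ['decl_a1'] → HTTP 400
print(missing_required_checks(checks, ["decl_a1"]))  # [] → resume allowed
```

Note that LLM checks (`required=False`) never block the resume; only declarative checks can.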

View File

@@ -256,6 +256,20 @@ class LiveSessionManager:
session.last_window_info["title"] = wc_title
if wc_app:
session.last_window_info["app_name"] = wc_app
# QW1 — propagate monitor_index and monitors_geometry from window_capture
if "monitor_index" in window_capture:
session.last_window_info["monitor_index"] = window_capture["monitor_index"]
if "monitors_geometry" in window_capture:
session.last_window_info["monitors_geometry"] = window_capture["monitors_geometry"]
# QW1 — propagate monitor_index/monitors_geometry from the event payload
# (enriched heartbeat without window/window_title). Always refresh the
# active focus (it changes often) and the geometry (the user may plug
# in or unplug a screen).
if "monitor_index" in event_data:
session.last_window_info["monitor_index"] = event_data["monitor_index"]
if "monitors_geometry" in event_data and event_data["monitors_geometry"]:
session.last_window_info["monitors_geometry"] = event_data["monitors_geometry"]
# Accumulate titles/apps for automatic naming
title = session.last_window_info.get("title", "").strip()

View File

@@ -0,0 +1,154 @@
# agent_v0/server_v1/loop_detector.py
"""LoopDetector composite — détection de stagnation de Léa pendant un replay (QW2).
Trois signaux indépendants :
- screen_static : N captures consécutives avec CLIP similarity > seuil
- action_repeat : N actions consécutives identiques (type + coords)
- retry_threshold : nombre de retries cumulés >= seuil
Un seul signal positif → verdict.detected=True. Le serveur bascule alors le
replay en paused_need_help avec pause_reason explicite.
Désactivable via env var RPA_LOOP_DETECTOR_ENABLED=0.
"""
import logging
import os
from dataclasses import dataclass, field
from typing import Any, Dict, List, Optional
logger = logging.getLogger(__name__)
@dataclass
class LoopVerdict:
detected: bool = False
reason: str = ""
signal: str = "" # "screen_static" | "action_repeat" | "retry_threshold" | ""
evidence: Dict[str, Any] = field(default_factory=dict)
def _env_int(name: str, default: int) -> int:
try:
return int(os.environ.get(name, default))
except (TypeError, ValueError):
return default
def _env_float(name: str, default: float) -> float:
try:
return float(os.environ.get(name, default))
except (TypeError, ValueError):
return default
def _env_bool_enabled(name: str) -> bool:
val = os.environ.get(name, "1").strip().lower()
return val not in ("0", "false", "no", "off", "")
def _cosine_similarity(a, b) -> float:
"""Similarité cosine entre deux vecteurs (listes ou np.array). Robuste vecteur nul."""
import numpy as np
av = np.asarray(a, dtype=np.float32).flatten()
bv = np.asarray(b, dtype=np.float32).flatten()
na, nb = float(np.linalg.norm(av)), float(np.linalg.norm(bv))
if na < 1e-8 or nb < 1e-8:
return 0.0
return float(np.dot(av, bv) / (na * nb))
class LoopDetector:
def __init__(self, clip_embedder=None):
self.clip_embedder = clip_embedder
def evaluate(
self,
state: Dict[str, Any],
screenshots: List[Any],
actions: List[Dict[str, Any]],
) -> LoopVerdict:
"""Évalue les 3 signaux. Retourne le premier déclenché.
Args:
state: replay_state (utilisé pour retried_actions)
screenshots: anneau d'embeddings CLIP (les N derniers)
actions: anneau des N dernières actions exécutées
"""
if not _env_bool_enabled("RPA_LOOP_DETECTOR_ENABLED"):
return LoopVerdict(detected=False)
# Signal A : screen_static
verdict = self._check_screen_static(screenshots)
if verdict.detected:
return verdict
# Signal B : action_repeat
verdict = self._check_action_repeat(actions)
if verdict.detected:
return verdict
# Signal C : retry_threshold
verdict = self._check_retry_threshold(state)
if verdict.detected:
return verdict
return LoopVerdict(detected=False)
def _check_screen_static(self, screenshots: List[Any]) -> LoopVerdict:
n_required = _env_int("RPA_LOOP_SCREEN_STATIC_N", 4)
threshold = _env_float("RPA_LOOP_SCREEN_STATIC_THRESHOLD", 0.99)
if self.clip_embedder is None or len(screenshots) < n_required:
return LoopVerdict()
try:
recent = screenshots[-n_required:]
# Embed each capture via the CLIP embedder (may raise)
embeddings = [self.clip_embedder.embed_image(img) for img in recent]
sims = [_cosine_similarity(embeddings[i], embeddings[i + 1])
for i in range(len(embeddings) - 1)]
min_sim = min(sims)
if min_sim > threshold:
return LoopVerdict(
detected=True,
reason="loop_detected",
signal="screen_static",
evidence={"min_similarity": round(min_sim, 4),
"n_captures": n_required,
"threshold": threshold},
)
except Exception as e:
logger.warning("LoopDetector signal_A erreur (%s) — signal inerte ce tick", e)
return LoopVerdict()
def _check_action_repeat(self, actions: List[Dict[str, Any]]) -> LoopVerdict:
n_required = _env_int("RPA_LOOP_ACTION_REPEAT_N", 3)
if len(actions) < n_required:
return LoopVerdict()
recent = actions[-n_required:]
def _signature(a: Dict[str, Any]) -> tuple:
return (a.get("type"), a.get("x_pct"), a.get("y_pct"))
sigs = [_signature(a) for a in recent]
if all(s == sigs[0] for s in sigs):
return LoopVerdict(
detected=True,
reason="loop_detected",
signal="action_repeat",
evidence={"signature": sigs[0], "count": n_required},
)
return LoopVerdict()
def _check_retry_threshold(self, state: Dict[str, Any]) -> LoopVerdict:
threshold = _env_int("RPA_LOOP_RETRY_THRESHOLD", 3)
retried = int(state.get("retried_actions", 0))
if retried >= threshold:
return LoopVerdict(
detected=True,
reason="loop_detected",
signal="retry_threshold",
evidence={"retried_actions": retried, "threshold": threshold},
)
return LoopVerdict()
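The action_repeat signal above can be illustrated with a minimal standalone re-implementation (same signature logic as `_check_action_repeat`, simplified to a boolean):

```python
# Minimal sketch of the action_repeat signal: the last n_required actions
# must share the same (type, x_pct, y_pct) signature to trigger.
from typing import Any, Dict, List, Tuple


def action_repeat_detected(actions: List[Dict[str, Any]], n_required: int = 3) -> bool:
    """True if the last n_required actions are identical (type + coords)."""
    if len(actions) < n_required:
        return False  # ring not full yet — signal is inert

    def sig(a: Dict[str, Any]) -> Tuple:
        return (a.get("type"), a.get("x_pct"), a.get("y_pct"))

    sigs = [sig(a) for a in actions[-n_required:]]
    return all(s == sigs[0] for s in sigs)


ring = [{"type": "click", "x_pct": 10.0, "y_pct": 20.0}] * 3
print(action_repeat_detected(ring))       # True — same click 3 times
print(action_repeat_detected(ring[:2]))   # False — only 2 actions so far
```

Because a missing coordinate is just `None` in the signature, repeated keyboard-only actions of the same type also trigger the signal, which matches the conservative intent.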

View File

@@ -0,0 +1,99 @@
# agent_v0/server_v1/monitor_router.py
"""MonitorRouter — résolution de l'écran cible pour le replay (QW1).
Stratégie en cascade :
1. action.monitor_index (hérité de la session source) → cible cet écran
2. session.last_focused_monitor (focus actif vu en dernier heartbeat) → fallback
3. composite (offset 0, 0) → backward compat
Émet sur le bus lea:* l'event monitor_routed avec la source de la décision.
"""
import logging
from dataclasses import dataclass
from typing import Any, Dict, List, Optional
logger = logging.getLogger(__name__)
@dataclass
class MonitorTarget:
"""Représente l'écran cible résolu pour une action de replay."""
idx: int
offset_x: int
offset_y: int
w: int
h: int
source: str # "action" | "focus" | "composite_fallback"
_COMPOSITE_FALLBACK = MonitorTarget(
idx=-1,
offset_x=0,
offset_y=0,
w=0,
h=0,
source="composite_fallback",
)
def _find_monitor(geometry: List[Dict[str, Any]], idx: int) -> Optional[Dict[str, Any]]:
"""Retourne le monitor d'index donné, ou None si absent."""
for m in geometry:
if m.get("idx") == idx:
return m
return None
def _to_target(monitor: Dict[str, Any], source: str) -> MonitorTarget:
return MonitorTarget(
idx=int(monitor["idx"]),
offset_x=int(monitor.get("x", 0)),
offset_y=int(monitor.get("y", 0)),
w=int(monitor.get("w", 0)),
h=int(monitor.get("h", 0)),
source=source,
)
def resolve_target_monitor(
action: Dict[str, Any],
session_state: Dict[str, Any],
) -> MonitorTarget:
"""Résout l'écran cible d'une action de replay.
Args:
action: Dict de l'action (peut contenir `monitor_index`).
session_state: État de la session (doit contenir `monitors_geometry`
et `last_focused_monitor`).
Returns:
MonitorTarget avec l'offset à appliquer aux coordonnées de grounding.
"""
geometry: List[Dict[str, Any]] = session_state.get("monitors_geometry") or []
# 1. Explicit target via the action
explicit_idx = action.get("monitor_index")
if explicit_idx is not None and geometry:
m = _find_monitor(geometry, int(explicit_idx))
if m is not None:
return _to_target(m, source="action")
# Invalid index → fall through to the focus fallback
logger.warning(
"[BUS] lea:monitor_invalid_index requested=%d available_idx=%s",
int(explicit_idx), [g.get("idx") for g in geometry],
)
# 2. Active-focus fallback
focused_idx = session_state.get("last_focused_monitor")
if focused_idx is not None and geometry:
m = _find_monitor(geometry, int(focused_idx))
if m is not None:
return _to_target(m, source="focus")
logger.warning(
"[BUS] lea:monitor_unavailable focused_idx=%d available_idx=%s",
int(focused_idx), [g.get("idx") for g in geometry],
)
# 3. Composite fallback (backward compat — current mss.monitors[0] behavior)
return _COMPOSITE_FALLBACK
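A minimal standalone sketch of the same cascade (simplified dict output instead of MonitorTarget; the helper name is hypothetical):

```python
# Sketch of the action → focus → composite cascade on a fake two-monitor
# geometry, mirroring resolve_target_monitor above.
geometry = [
    {"idx": 0, "x": 0, "y": 0, "w": 1920, "h": 1080},
    {"idx": 1, "x": 1920, "y": 0, "w": 2560, "h": 1440},
]


def resolve(action, session_state):
    geo = session_state.get("monitors_geometry") or []
    by_idx = {m["idx"]: m for m in geo}
    for idx, source in ((action.get("monitor_index"), "action"),
                        (session_state.get("last_focused_monitor"), "focus")):
        if idx is not None and idx in by_idx:
            m = by_idx[idx]
            return {"idx": m["idx"], "offset_x": m["x"], "source": source}
    return {"idx": -1, "offset_x": 0, "source": "composite_fallback"}


state = {"monitors_geometry": geometry, "last_focused_monitor": 0}
print(resolve({"monitor_index": 1}, state)["source"])    # action
print(resolve({"monitor_index": 7}, state)["source"])    # focus (invalid index)
print(resolve({}, {"monitors_geometry": []})["source"])  # composite_fallback
```

An invalid explicit index degrades to the last focused monitor rather than failing the action, which is what the `[BUS] lea:monitor_invalid_index` warning traces.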

View File

@@ -1381,6 +1381,14 @@ def _create_replay_state(
# t2a_decision, etc.). Resolved via {{var}} or {{var.field}} templating
# in the parameters of subsequent actions.
"variables": {},
# QW2 — History rings for the LoopDetector (5 most recent max)
"_screenshot_history": [], # PIL images from the N latest heartbeats (LoopDetector embeds at each tick)
"_action_history": [], # N most recent executed actions (signatures)
# QW4 — Safety checks (hybrid declarative + contextual LLM) and acknowledgment audit
"safety_checks": [], # list produced by SafetyChecksProvider
"checks_acknowledged": [], # ids acknowledged via /replay/resume (audit trail)
"pause_reason": "", # "loop_detected" | "" for V1
"pause_payload": None, # full payload for debug/audit
}
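The `[-5:]` capping used for both rings keeps the state bounded without a deque; a quick sketch of the pattern:

```python
# The history rings are plain lists capped with a [-5:] slice after each
# append — oldest entries fall off once the ring is full.
state = {}
for i in range(8):
    state.setdefault("_action_history", []).append({"tick": i})
    state["_action_history"] = state["_action_history"][-5:]

print([a["tick"] for a in state["_action_history"]])  # [3, 4, 5, 6, 7]
```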

View File

@@ -0,0 +1,195 @@
# agent_v0/server_v1/safety_checks_provider.py
"""SafetyChecksProvider — checks hybrides déclaratifs + LLM contextuels (QW4).
Pour une action pause_for_human :
- les checks déclaratifs (workflow) sont toujours inclus
- si safety_level == "medical_critical" et RPA_SAFETY_CHECKS_LLM_ENABLED=1,
un appel LLM (medgemma:4b par défaut) ajoute jusqu'à N checks contextuels
Tout échec côté LLM (timeout, exception, parse) → additional_checks=[] :
le replay continue avec uniquement les déclaratifs (fallback safe).
"""
import base64
import json
import logging
import os
import uuid
from dataclasses import dataclass, field
from typing import Any, Dict, List, Optional
logger = logging.getLogger(__name__)
@dataclass
class PausePayload:
checks: List[Dict[str, Any]] = field(default_factory=list)
pause_reason: str = ""
message: str = ""
def _env(name: str, default: str) -> str:
return os.environ.get(name, default).strip()
def _env_int(name: str, default: int) -> int:
try:
return int(os.environ.get(name, default))
except (TypeError, ValueError):
return default
def _env_bool_enabled(name: str) -> bool:
val = os.environ.get(name, "1").strip().lower()
return val not in ("0", "false", "no", "off", "")
def build_pause_payload(
action: Dict[str, Any],
replay_state: Dict[str, Any],
last_screenshot: Optional[str],
) -> PausePayload:
"""Construit le payload de pause enrichi pour une action pause_for_human."""
params = action.get("parameters") or {}
message = params.get("message", "Validation requise")
safety_level = params.get("safety_level")
declarative = params.get("safety_checks") or []
# Normalize the declarative checks
checks: List[Dict[str, Any]] = []
for d in declarative:
checks.append({
"id": d.get("id") or f"decl_{uuid.uuid4().hex[:6]}",
"label": d.get("label", "Validation"),
"required": bool(d.get("required", True)),
"source": "declarative",
"evidence": None,
})
# Add contextual LLM checks when applicable
if safety_level == "medical_critical" and _env_bool_enabled("RPA_SAFETY_CHECKS_LLM_ENABLED"):
try:
additional = _call_llm_for_contextual_checks(
action=action,
replay_state=replay_state,
last_screenshot=last_screenshot,
existing_labels=[c["label"] for c in checks],
)
except Exception as e:
logger.warning("[BUS] lea:safety_checks_llm_failed reason=exception detail=%s", e)
additional = []
for a in additional:
checks.append({
"id": f"llm_{uuid.uuid4().hex[:6]}",
"label": a.get("label", ""),
"required": False, # checks LLM = informationnels, pas obligatoires V1
"source": "llm_contextual",
"evidence": a.get("evidence", ""),
})
return PausePayload(
checks=checks,
pause_reason="",
message=message,
)
def _call_llm_for_contextual_checks(
action: Dict[str, Any],
replay_state: Dict[str, Any],
last_screenshot: Optional[str],
existing_labels: List[str],
) -> List[Dict[str, str]]:
"""Appelle Ollama en mode JSON strict pour générer 0-N checks contextuels.
Returns:
List[{label, evidence}] (max RPA_SAFETY_CHECKS_LLM_MAX_CHECKS).
[] sur tout échec (timeout, JSON invalide, exception).
"""
import requests
# Default gemma4:latest: best detection/latency trade-off in the 2026-05-06
# bench (cf. docs/BENCH_SAFETY_CHECKS_2026-05-06.md). medgemma:4b
# systematically returned [] (refused to flag anything).
model = _env("RPA_SAFETY_CHECKS_LLM_MODEL", "gemma4:latest")
# 7s timeout: gemma4 warm avg = 2.9s + 4s margin. The ~10s cold start is
# avoided as long as the model stays resident (OLLAMA_KEEP_ALIVE=24h
# recommended in prod).
timeout_s = _env_int("RPA_SAFETY_CHECKS_LLM_TIMEOUT_S", 7)
max_checks = _env_int("RPA_SAFETY_CHECKS_LLM_MAX_CHECKS", 3)
ollama_url = _env("OLLAMA_URL", "http://localhost:11434")
params = action.get("parameters") or {}
workflow_message = params.get("message", "")
existing = ", ".join(existing_labels) if existing_labels else "aucun"
prompt = f"""Tu es Léa, assistante médicale supervisée.
Avant de continuer le workflow, tu dois lister 0 à {max_checks} vérifications supplémentaires
que l'humain doit acquitter, en regardant l'écran actuel.
Contexte workflow : {workflow_message}
Checks déjà demandés : {existing}
NE répète PAS un check déjà demandé.
Si rien d'inhabituel à signaler, retourne {{"additional_checks": []}}.
Réponds UNIQUEMENT en JSON :
{{
"additional_checks": [
{{"label": "string court", "evidence": "ce que tu as vu d'inhabituel"}}
]
}}
"""
payload = {
"model": model,
"prompt": prompt,
"stream": False,
"format": "json",
"options": {"temperature": 0.1, "num_predict": 200},
}
if last_screenshot and os.path.isfile(last_screenshot):
try:
with open(last_screenshot, "rb") as f:
payload["images"] = [base64.b64encode(f.read()).decode("ascii")]
except Exception as e:
logger.debug("safety_checks: lecture screenshot échouée (%s) — appel sans image", e)
try:
response = requests.post(
f"{ollama_url}/api/generate",
json=payload,
timeout=timeout_s,
)
if response.status_code != 200:
logger.warning("[BUS] lea:safety_checks_llm_failed reason=http_status detail=%s", response.status_code)
return []
text = response.json().get("response", "").strip()
except requests.Timeout:
logger.warning("[BUS] lea:safety_checks_llm_failed reason=timeout detail=%ss", timeout_s)
return []
except Exception as e:
logger.warning("[BUS] lea:safety_checks_llm_failed reason=network detail=%s", e)
return []
# format=json normally guarantees valid JSON
try:
parsed = json.loads(text)
except json.JSONDecodeError as e:
logger.warning("[BUS] lea:safety_checks_llm_failed reason=json_decode detail=%s", e)
return []
additional = parsed.get("additional_checks") or []
if not isinstance(additional, list):
return []
# Filter + truncate
valid = []
for item in additional[:max_checks]:
if isinstance(item, dict) and item.get("label"):
valid.append({
"label": str(item["label"])[:200],
"evidence": str(item.get("evidence", ""))[:300],
})
return valid
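The declarative normalization at the top of `build_pause_payload` can be exercised in isolation. A standalone sketch with the same field defaults (helper name hypothetical):

```python
# Sketch of the declarative-check normalization: ids are generated when the
# workflow omits them, required defaults to True, source is fixed.
# LLM checks would be appended afterwards with source="llm_contextual".
import uuid
from typing import Any, Dict, List


def normalize_declarative(declarative: List[Dict[str, Any]]) -> List[Dict[str, Any]]:
    checks = []
    for d in declarative:
        checks.append({
            "id": d.get("id") or f"decl_{uuid.uuid4().hex[:6]}",
            "label": d.get("label", "Validation"),
            "required": bool(d.get("required", True)),
            "source": "declarative",
            "evidence": None,
        })
    return checks


out = normalize_declarative([{"label": "Check patient identity"}])
print(out[0]["required"], out[0]["source"])  # True declarative
print(out[0]["id"].startswith("decl_"))      # True — generated id
```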

View File

@@ -22,6 +22,18 @@ try:
except ImportError:
PYAUTOGUI_AVAILABLE = False
try:
import mss
MSS_AVAILABLE = True
except ImportError:
MSS_AVAILABLE = False
try:
from PIL import Image as PILImage
PIL_AVAILABLE = True
except ImportError:
PIL_AVAILABLE = False
def safe_type_text(text: str):
"""Saisie de texte compatible VM/Citrix et claviers AZERTY/QWERTY.
@@ -157,11 +169,13 @@ def handle_detected_pattern(pattern: Dict[str, Any]) -> bool:
screenshot = sct.grab(monitor)
screen = Image.frombytes('RGB', screenshot.size, screenshot.bgra, 'raw', 'BGRX')
# EasyOCR (fast, good GUI quality) with docTR fallback
# EasyOCR (fast, good GUI quality) with docTR fallback.
# gpu=True: harmonized with dialog_handler.py and title_verifier.py.
# VRAM cost ~0.5 GB, within the RTX 5070 budget (cf. deploy/VRAM_BUDGET.md).
words = []
try:
import easyocr
_reader = easyocr.Reader(['fr', 'en'], gpu=False, verbose=False)
_reader = easyocr.Reader(['fr', 'en'], gpu=True, verbose=False)
results = _reader.readtext(np.array(screen))
for (bbox_pts, text, conf) in results:
if not text or len(text.strip()) < 1:
@@ -312,6 +326,7 @@ def find_element_on_screen(
target_description: str = "",
anchor_image_base64: Optional[str] = None,
anchor_bbox: Optional[Dict] = None,
monitor_idx: Optional[int] = None,
) -> Optional[Dict[str, Any]]:
"""
Look for an element on screen using 3 cascaded methods.
@@ -325,6 +340,7 @@ def find_element_on_screen(
target_description: Longer description (e.g. "the Demo folder on the desktop")
anchor_image_base64: Reference image of the anchor (for CLIP matching, reserved for future use)
anchor_bbox: Original position of the anchor (to disambiguate multiple matches)
monitor_idx: Logical index 0..N-1 of the monitor to scan. None = legacy composite.
Returns:
{'x': int, 'y': int, 'method': str, 'confidence': float} ou None
@@ -347,6 +363,13 @@ def find_element_on_screen(
logger.debug("find_element_on_screen: ni target_text ni target_description fournis")
return None
# Propagate monitor_idx down to the OCR level via anchor_bbox (without mutating the original argument)
if monitor_idx is not None and anchor_bbox is not None:
anchor_bbox = dict(anchor_bbox) # copy to avoid mutating the argument
anchor_bbox["monitor_idx"] = monitor_idx
elif monitor_idx is not None:
anchor_bbox = {"monitor_idx": monitor_idx}
search_label = target_description or target_text
logger.info(f"[Grounding] Recherche élément: '{search_label}' (cascade 3 niveaux)")
@@ -356,12 +379,12 @@ def find_element_on_screen(
return result
# ─── Level 2 — UI-TARS grounding (~3s) ───
result = _grounding_ui_tars(target_text, target_description)
result = _grounding_ui_tars(target_text, target_description, monitor_idx=monitor_idx)
if result:
return result
# ─── Level 3 — VLM reasoning (~10s) ───
result = _grounding_vlm(target_text, target_description)
result = _grounding_vlm(target_text, target_description, monitor_idx=monitor_idx)
if result:
return result
@@ -411,20 +434,43 @@ def _describe_anchor_image(anchor_image_base64: str) -> Optional[str]:
return None
def _capture_screen():
"""Capture l'écran principal et retourne (PIL.Image, width, height)."""
try:
import mss
from PIL import Image as PILImage
def _capture_screen(monitor_idx=None):
"""Capture l'écran et retourne (PIL.Image, width, height, offset_x, offset_y).
Args:
monitor_idx: Index logique 0..N-1 du monitor à capturer (cf. screeninfo).
Si None : capture composite (mss.monitors[0]) — comportement legacy.
Returns:
(image, w, h, offset_x, offset_y). offset = (0,0) en mode composite.
"""
try:
with mss.mss() as sct:
monitor = sct.monitors[0]
if monitor_idx is None:
# Current behavior: composite of all screens
monitor = sct.monitors[0]
offset_x, offset_y = 0, 0
else:
# mss skips monitors[0] (the composite). Logical index 0 → mss.monitors[1].
mss_idx = int(monitor_idx) + 1
if mss_idx >= len(sct.monitors):
logger.warning(
"mss.monitors[%d] hors limites (n=%d) — fallback composite",
mss_idx, len(sct.monitors),
)
monitor = sct.monitors[0]
offset_x, offset_y = 0, 0
else:
monitor = sct.monitors[mss_idx]
offset_x = int(monitor.get("left", 0))
offset_y = int(monitor.get("top", 0))
screenshot = sct.grab(monitor)
screen = PILImage.frombytes('RGB', screenshot.size, screenshot.bgra, 'raw', 'BGRX')
return screen, monitor['width'], monitor['height']
return screen, monitor['width'], monitor['height'], offset_x, offset_y
except Exception as e:
logger.debug(f"Capture écran échouée: {e}")
return None, 0, 0
return None, 0, 0, 0, 0
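The logical→mss index shift is the subtle part: `mss.monitors[0]` is the all-monitors composite, so logical index `i` maps to `mss.monitors[i + 1]`. A dependency-free sketch with a fake monitor list:

```python
# Sketch of the index mapping and offset extraction in _capture_screen,
# using a fake mss-style monitor list (no mss dependency).
fake_monitors = [
    {"left": 0, "top": 0, "width": 4480, "height": 1440},     # [0] composite
    {"left": 0, "top": 0, "width": 1920, "height": 1080},     # [1] logical 0
    {"left": 1920, "top": 0, "width": 2560, "height": 1440},  # [2] logical 1
]


def pick(monitors, monitor_idx=None):
    """Return (monitor, offset_x, offset_y); out-of-range → composite."""
    if monitor_idx is None or monitor_idx + 1 >= len(monitors):
        return monitors[0], 0, 0
    m = monitors[monitor_idx + 1]
    return m, m["left"], m["top"]


_, ox, oy = pick(fake_monitors, 1)
print(ox, oy)  # 1920 0 — offset added back to grounding coordinates
_, ox, oy = pick(fake_monitors, 5)
print(ox, oy)  # 0 0 — out of range, composite fallback
```

The returned offsets are exactly what the OCR/UI-TARS/VLM grounding levels add (`x + ox`, `y + oy`) so that per-monitor coordinates stay valid in the global virtual-desktop space.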
def _grounding_ocr(target_text: str, anchor_bbox: Optional[Dict] = None) -> Optional[Dict[str, Any]]:
@@ -439,7 +485,8 @@ def _grounding_ocr(target_text: str, anchor_bbox: Optional[Dict] = None) -> Opti
return None
try:
screen, screen_w, screen_h = _capture_screen()
monitor_idx_param = anchor_bbox.get("monitor_idx") if anchor_bbox else None
screen, screen_w, screen_h, ox, oy = _capture_screen(monitor_idx=monitor_idx_param)
if screen is None:
return None
@@ -503,14 +550,14 @@ def _grounding_ocr(target_text: str, anchor_bbox: Optional[Dict] = None) -> Opti
sel = " ← CHOISI" if m is best else ""
logger.info(f" [OCR] Candidat: '{m['text']}' à ({m['x']}, {m['y']}) [{m['type']}]{sel}")
return {'x': best['x'], 'y': best['y'], 'method': 'ocr', 'confidence': best['conf']}
return {'x': best['x'] + ox, 'y': best['y'] + oy, 'method': 'ocr', 'confidence': best['conf']}
except Exception as e:
logger.debug(f"[Grounding/OCR] Erreur: {e}")
return None
def _grounding_ui_tars(target_text: str, target_description: str = "") -> Optional[Dict[str, Any]]:
def _grounding_ui_tars(target_text: str, target_description: str = "", monitor_idx=None) -> Optional[Dict[str, Any]]:
"""Niveau 2 — UI-TARS grounding visuel (~3s)."""
try:
import requests
@@ -519,7 +566,7 @@ def _grounding_ui_tars(target_text: str, target_description: str = "") -> Option
import re
import os
screen, screen_w, screen_h = _capture_screen()
screen, screen_w, screen_h, ox, oy = _capture_screen(monitor_idx=monitor_idx)
if screen is None:
return None
@@ -564,7 +611,7 @@ def _grounding_ui_tars(target_text: str, target_description: str = "") -> Option
# Validate that the coordinates are within the screen
if 0 <= x <= screen_w and 0 <= y <= screen_h:
logger.info(f"[Grounding/UI-TARS] Grounding → ({x}, {y})")
return {'x': x, 'y': y, 'method': 'ui_tars', 'confidence': 0.85}
return {'x': x + ox, 'y': y + oy, 'method': 'ui_tars', 'confidence': 0.85}
else:
logger.warning(f"[Grounding/UI-TARS] Coordonnées hors écran: ({x}, {y}) pour {screen_w}x{screen_h}")
return None
@@ -624,7 +671,7 @@ def _parse_ui_tars_coordinates(text: str, screen_w: int, screen_h: int) -> Optio
return None
def _grounding_vlm(target_text: str, target_description: str = "") -> Optional[Dict[str, Any]]:
def _grounding_vlm(target_text: str, target_description: str = "", monitor_idx=None) -> Optional[Dict[str, Any]]:
"""Niveau 3 — VLM reasoning + confirmation OCR (~10s)."""
try:
search_label = target_description or target_text
@@ -646,7 +693,7 @@ def _grounding_vlm(target_text: str, target_description: str = "") -> Optional[D
logger.info(f"[Grounding/VLM] VLM suggère de cliquer sur: '{vlm_target}'")
# OCR confirmation: look for the VLM target on screen
screen, screen_w, screen_h = _capture_screen()
screen, screen_w, screen_h, ox, oy = _capture_screen(monitor_idx=monitor_idx)
if screen is None:
return None
@@ -668,7 +715,7 @@ def _grounding_vlm(target_text: str, target_description: str = "") -> Optional[D
x = int((x1 + x2) / 2)
y = int((y1 + y2) / 2)
logger.info(f"[Grounding/VLM] Confirmé par OCR: '{word['text']}' à ({x}, {y})")
return {'x': x, 'y': y, 'method': 'vlm', 'confidence': 0.75}
return {'x': x + ox, 'y': y + oy, 'method': 'vlm', 'confidence': 0.75}
logger.debug(f"[Grounding/VLM] Target VLM '{vlm_target}' non trouvé par OCR")
return None

View File

@@ -0,0 +1,95 @@
# QW4 safety_checks bench — selecting the contextual LLM
**Date**: 2026-05-06
**Context**: QW4 of the May sprint. The function `_call_llm_for_contextual_checks`
calls Ollama with a screenshot + short prompt to generate 0-3 additional
verification checks that the human must acknowledge before resuming a replay
in supervised pause (`safety_level=medical_critical`).
## Methodology
- **5 scenarios**: synthetic screenshots of patient records, each with ONE
deliberate anomaly (aberrant birth date, inconsistent IPP, empty diagnosis,
ICD code inappropriate for the age, flat fee inconsistent with the stay duration).
- **5 candidates**: `gemma4:latest`, `qwen3-vl:8b`, `qwen2.5vl:7b`,
`qwen2.5vl:3b`, `medgemma:4b`.
- **Per-model protocol**: unload VRAM (keep_alive=0 on all loaded models)
→ 1st call = timed cold start → 4 other screenshots × 3 runs = 12 warm
measurements.
- **Metrics**: cold start, warm avg, warm p95, % valid JSON, % detection
(the target anomaly appears in the label/evidence of at least one returned check).
- **Script**: `tools/bench_safety_checks_models.py`.
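The detection metric can be sketched as a substring match over the returned checks (the helper name is hypothetical; the actual scoring lives in `tools/bench_safety_checks_models.py`):

```python
# Hypothetical sketch of the per-run detection metric: a run counts as a
# detection if the expected keyword appears in the label or evidence of
# at least one returned check (case-insensitive).
def run_detected(checks, expected_keyword):
    kw = expected_keyword.lower()
    return any(
        kw in (c.get("label", "") + " " + c.get("evidence", "")).lower()
        for c in checks
    )


checks = [{"label": "Birth date looks wrong", "evidence": "DOB 1900-01-01"}]
print(run_detected(checks, "1900"))  # True
print(run_detected([], "1900"))      # False — empty answer, no detection
```

Under this metric an empty `additional_checks` list always counts as a miss, which is why a model that "refuses to flag" scores 0%.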
## Results
| Model | Cold (s) | Warm avg (s) | Warm p95 (s) | JSON | Detection |
|---|---:|---:|---:|---:|---:|
| `gemma4:latest` | 10.6 | **2.9** | 3.4 | 92% (12/13) | **46% (6/13)** |
| `qwen3-vl:8b` | 5.6 | — | — | **0%** (0/12) | 0% (0/12) |
| `qwen2.5vl:7b` | 9.4 | 6.6 | 8.1 | 100% (13/13) | 23% (3/13) |
| `qwen2.5vl:3b` | 6.0 | 2.0 | 2.5 | 100% (13/13) | 8% (1/13) |
| `medgemma:4b` | 2.0 | 0.5 | 0.7 | 100% (13/13) | **0%** (0/13) |
## Reading
- **`medgemma:4b` systematically returns `[]`** across all 13 measurements.
Too obedient to "If there is nothing unusual to flag, return []", it refuses
to point out even a 1900-01-01 birth date. **A bad default choice** despite
its speed and its claimed medical specialization.
- **`qwen3-vl:8b` ignores Ollama's `format=json`**: 0 parsable responses. To be
ruled out for this task until the Ollama tooling and the model converge.
- **`qwen2.5vl:7b`** detects anomalies but is 2× slower (warm 6.6s) than gemma4
and tends to invent date-format anomalies that are not the actual target.
- **`qwen2.5vl:3b`** is fast but detection is 8% — it "checks for the sake of
checking" (often returns "verify the birth date" even when the date is correct).
- **`gemma4:latest` wins**: best detection rate (46%) AND second-best warm
latency (2.9s). It tends to reason about reason-for-visit/diagnosis coherence
rather than raw aberrant values.
## Detection detail per scenario
| Scenario | gemma4 | qwen2.5vl:7b | qwen2.5vl:3b | medgemma:4b |
|---|:---:|:---:|:---:|:---:|
| Aberrant birth date (1900) | ❌ | ✅ | ✅ | ❌ |
| Inconsistent IPP (`ABC@@##XYZ`) | ❌ | ❌ | ❌ | ❌ |
| Empty main diagnosis | ✅ | ❌ | ❌ | ❌ |
| ICD code inappropriate for the age | ✅ | ❌ | ❌ | ❌ |
| UHCD flat fee vs 1h stay | ❌ | ❌ | ❌ | ❌ |
No model detects all 5 scenarios. **The corrupted IPP and the inconsistent
flat fee are detected by nobody** — these anomalies would require either a
more directed prompt (an explicit list of fields to verify) or a larger model.
## Decision
- **Server default**: `RPA_SAFETY_CHECKS_LLM_MODEL=gemma4:latest`
- **Timeout**: `RPA_SAFETY_CHECKS_LLM_TIMEOUT_S=7` (warm 2.9s + margin)
- **VRAM persistence**: `OLLAMA_KEEP_ALIVE=24h` recommended to avoid the
10s cold start during demos
Changes applied in `agent_v0/server_v1/safety_checks_provider.py`.
## Limitations & future work
1. **46% detection is low**: present it as an aid to the physician, not as
certification. The physician remains the decision-maker.
2. **The current prompt is too generic**: a prompt that explicitly lists the
fields to verify (birth date, IPP, diagnosis, fee, age/diagnosis coherence)
would probably yield better results. To be measured in V2.
3. **Benched on only 5 anomalies**: to be extended as soon as we have a corpus
of real Easily Assure records with anomalies confirmed by Pauline / Amina.
4. **No test on records WITHOUT anomalies** (false positives): to be added.
5. **No bench of cloud models** (gemma3:27b-cloud, deepseek, gpt-oss) due to
the 100%-local constraint — worth exploring if we relax that constraint for
the contextual checks (which contain no PII if the screenshots are anonymized).
## Reproducibility
```bash
cd /home/dom/ai/rpa_vision_v3
.venv/bin/python tools/bench_safety_checks_models.py
# (BENCH_TIMEOUT=60 by default, ~10-15 min on an RTX 5070)
```


@@ -0,0 +1,343 @@
# QW Suite May — Smoke tests for manual validation
**Planned execution date**: 2026-05-06 (morning)
**Branch**: `feature/qw-suite-mai`
**Estimated duration**: ~1h20 if everything passes, +30 min of debugging per failing test
> Check items off as you go. If a test fails, apply its "If KO" step, then retry.
> Any critical test with a blocking failure → kill-switch (procedure §10).
---
## §0. Preflight (5 min)
- [ ] **0.1** Check the branch: `git -C /home/dom/ai/rpa_vision_v3 branch --show-current`
  Expected: `feature/qw-suite-mai`
- [ ] **0.2** Check the recent commits: `git -C /home/dom/ai/rpa_vision_v3 log --oneline -15`
  Expected: all sprint commits visible (spec, plan, QW1×4, QW2×2, QW4×3, docs, possible A/B/C fixes)
- [ ] **0.3** Run the quick baseline:
```bash
cd /home/dom/ai/rpa_vision_v3
.venv/bin/pytest tests/unit/test_monitor_router.py \
tests/unit/test_loop_detector.py \
tests/unit/test_safety_checks_provider.py \
tests/integration/test_grounding_offset.py \
tests/integration/test_loop_detector_replay.py \
tests/integration/test_replay_resume_acknowledgments.py \
-q
```
  Expected: `27 passed` (in ~5s).
  If KO: do not continue; read the error and call me.
- [ ] **0.4** Check the systemd services:
```bash
./svc.sh status
```
  Expected: at least `streaming`, `vwb-backend`, `vwb-frontend`, `dashboard` running.
  If KO: `./svc.sh start`, then re-check.
- [ ] **0.5** Open a dedicated terminal for `journalctl` (used throughout):
```bash
journalctl -u rpa-streaming -f
```
  Keep it open in a corner of the screen.
---
## §1. QW1 single-screen test (10 min) — REGRESSION
**Goal**: prove the sprint did not break an existing Easily Assure workflow.
- [ ] **1.1** Open VWB: `https://vwb.labs.laurinebazin.design` (or `http://localhost:3002` locally)
- [ ] **1.2** Select a workflow validated on 04/30 on Easily Assure (UHCD or Forfait, the simplest one).
- [ ] **1.3** Click "→ Windows" to launch the replay on Agent V1.
- [ ] **1.4** During execution, in the `journalctl` terminal, look for the line:
```
[BUS] lea:monitor_routed source=focus|composite_fallback ...
```
Expected: at least 1 occurrence per visual action. On a single-screen machine, `source=composite_fallback` or `source=focus` (both are OK).
- [ ] **1.5** The replay must finish **identically** to before (same clicks in the same places).
**Verdict**: ☐ OK ☐ KO
**If KO**: note the visual discrepancy, kill-switch QW2/QW4 (§10), then retest. If still KO → rollback (§9).
---
## §2. QW1 multi-screen test (15 min, optional) — ADDED VALUE
**Goal**: prove that per-screen targeting works. **Skip if the demo machine has only one screen.**
- [ ] **2.1** Plug a 2nd screen into the Windows machine (Agent V1).
- [ ] **2.2** Check that Agent V1 sees both screens:
```bash
ssh dom@192.168.1.11
C:\rpa_vision\.venv\Scripts\python.exe -c "from screeninfo import get_monitors; print([(m.x, m.y, m.width, m.height) for m in get_monitors()])"
```
  Expected: 2 tuples printed.
- [ ] **2.3** Launch the same Easily Assure workflow (§1.2).
- [ ] **2.4** In `journalctl`, watch for:
  - Enriched Windows heartbeats (cf. fix A): the session receives `monitor_index` continuously.
  - `[BUS] lea:monitor_routed source=focus idx=0` or `idx=1`, depending on where Easily is open.
- [ ] **2.5** Move the Easily Assure window to the 2nd screen before a new replay → relaunch → check that the click lands on the 2nd screen (not on the composite).
**Verdict**: ☐ OK ☐ KO ☐ Skipped (no 2nd screen)
---
## §3. QW2 LoopDetector test — artificial loop (10 min)
**Goal**: prove that Léa stops on her own when she is going in circles.
- [ ] **3.1** Duplicate a simple workflow (1-2 actions) in VWB.
- [ ] **3.2** Edit the 1st `click` action to target an impossible `target_text` (e.g. `target_text="ZZZZZ_INEXISTANT_999"`).
- [ ] **3.3** Launch the replay.
- [ ] **3.4** In `journalctl`, wait for:
```
LoopDetector: replay XXX mis en pause — signal=retry_threshold ...
[BUS] lea:loop_detected ...
```
Expected delay: ~30-60s (3 retries × ~10s per visual retry).
- [ ] **3.5** On the VWB side: the `PauseDialog` bubble must appear with `pause_reason=loop_detected`.
- [ ] **3.6** Click "Annuler" to stop the replay cleanly.
**Verdict**: ☐ OK ☐ KO
**If KO**: check `RPA_LOOP_DETECTOR_ENABLED=1` (default). If still KO → the `journalctl` log should give the reason.
---
## §4. QW4 backward test — legacy workflow (5 min)
**Goal**: prove that an existing `pause_for_human` keeps working exactly as before.
- [ ] **4.1** Select a workflow that already has a `pause_for_human` action (without `safety_level` or `safety_checks`).
- [ ] **4.2** Launch the replay.
- [ ] **4.3** When the pause appears: the bubble must be **identical** to before (just the `message`, Continue/Cancel buttons, **NO** checklist).
- [ ] **4.4** In `journalctl`, verify that **no** Ollama `medgemma:4b` call is made (no line mentioning that model).
- [ ] **4.5** Click Continue → the replay must resume without error.
**Verdict**: ☐ OK ☐ KO
**If KO**: regression. Kill-switch QW4 (§10) + retest.
---
## §5. QW4 declarative safety_checks test (15 min)
**Goal**: prove that the checklist shows up and blocks Continue until all required checks are ticked.
- [ ] **5.1** In VWB, create or edit a workflow to insert a `pause_for_human` action with:
  - `message`: "Validation patient"
  - `safety_level`: `standard` (NOT medical_critical — we isolate the declarative part)
  - `safety_checks`: 2 entries
    - `{id: "check_ipp", label: "IPP correct ?", required: true}`
    - `{id: "check_diag", label: "Diagnostic confirmé ?", required: true}`
- [ ] **5.2** Save, launch the replay.
- [ ] **5.3** When the pause appears:
  - ☐ "Pause supervisée" bubble displayed
  - ☐ 2 checkboxes visible with `[obligatoire]` badges
  - ☐ "Continuer" button disabled (grayed out)
  - ☐ No `[Léa]` badge (no medical_critical → no LLM)
- [ ] **5.4** Tick a single box → Continue stays disabled.
- [ ] **5.5** Tick the 2nd box → Continue becomes enabled.
- [ ] **5.6** Click Continue → the replay resumes.
- [ ] **5.7** Security test: force a POST to `/api/v3/replay/resume` without ticking anything (via curl):
```bash
# Get the running replay_id via VWB or journalctl
curl -X POST http://localhost:5002/api/v3/replay/resume \
  -H "Content-Type: application/json" \
  -d '{"replay_id":"<replay_id>","acknowledged_check_ids":[]}'
```
Expected: `400 {"detail": {"error": "required_checks_missing", "missing": ["check_ipp","check_diag"]}}`
**Verdict**: ☐ OK ☐ KO
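The guard exercised in 5.7 can be sketched server-side as a pure validation step — the function name and shapes below are illustrative, assumed from this checklist, not the actual endpoint code:

```python
# Hypothetical sketch of the guard behind /api/v3/replay/resume:
# reject the resume when any required check id is missing from the payload.
def validate_resume(safety_checks, acknowledged_ids):
    """Return the 400 error body when required checks are missing, else None."""
    required = [c["id"] for c in safety_checks if c.get("required")]
    missing = [cid for cid in required if cid not in set(acknowledged_ids)]
    if missing:
        return {"error": "required_checks_missing", "missing": missing}
    return None

checks = [{"id": "check_ipp", "required": True},
          {"id": "check_diag", "required": True}]
print(validate_resume(checks, []))                           # error body, both ids missing
print(validate_resume(checks, ["check_ipp", "check_diag"]))  # None → resume allowed
```

Keeping the validation pure like this is also what makes it testable without a running server, as the integration tests in `tests/integration/test_replay_resume_acknowledgments.py` do.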
---
## §6. QW4 medical_critical test with LLM (15 min)
**Goal**: prove that Léa calls medgemma:4b in under 5s and adds contextual checks.
- [ ] **6.1** Check that `medgemma:4b` is available in Ollama:
```bash
ollama list | grep medgemma
```
  Expected: `medgemma:4b` listed. If absent: `ollama pull medgemma:4b` (3.3 GB).
- [ ] **6.2** Take the §5.1 workflow and change `safety_level: medical_critical`.
- [ ] **6.3** Launch the replay.
- [ ] **6.4** When the pause appears:
  - ☐ Bubble displayed
  - ☐ 2 declarative checks (`[obligatoire]` badges)
  - ☐ 0 to 3 extra checks with a blue `[Léa]` badge (tooltip = evidence)
  - ☐ Appears in < 5s (otherwise the timeout kicked in)
- [ ] **6.5** In `journalctl`, check for the line:
```
[BUS] lea:safety_checks_generated count=N sources=['declarative', 'declarative', 'llm_contextual', ...]
```
- [ ] **6.6** If Ollama times out or crashes, check for the line:
```
[BUS] lea:safety_checks_llm_failed reason=... detail=...
```
And the pause must still show up with the 2 declarative checks (safe fallback).
**Verdict**: ☐ OK ☐ KO
---
## §7. `lea:*` bus events test (5 min)
**Goal**: aggregate the observed events for the demo audit.
- [ ] **7.1** Run a full replay end to end (workflow from §1 or §6).
- [ ] **7.2** At the end, extract all `[BUS]` events from the journal:
```bash
journalctl -u rpa-streaming --since "10 minutes ago" | grep "\[BUS\]" | tail -30
```
- [ ] **7.3** Check the presence of at least:
  - `lea:monitor_routed` (at least 1 per visual action)
  - `lea:safety_checks_generated` (if §6 was run, at least 1)
  - `lea:loop_detected` (if §3 was run)
**Verdict**: ☐ OK ☐ KO
---
## §8. Kill-switch test (10 min) — DEMO REFLEX
**Goal**: know how to disable QW2/QW4 mid-demo if things go sideways.
- [ ] **8.1** Disable QW2 + QW4:
```bash
sudo systemctl edit rpa-streaming
# Add under [Service]:
Environment=RPA_LOOP_DETECTOR_ENABLED=0
Environment=RPA_SAFETY_CHECKS_LLM_ENABLED=0
# Save, exit
sudo systemctl restart rpa-streaming
```
- [ ] **8.2** Relaunch any replay.
- [ ] **8.3** In `journalctl`: verify that **no** `lea:loop_detected` or `lea:safety_checks_generated` event appears.
- [ ] **8.4** Re-enable (before the real demo):
```bash
sudo systemctl edit rpa-streaming
# Remove the 2 Environment=... lines
sudo systemctl restart rpa-streaming
```
- [ ] **8.5** Re-check that a normal replay emits the bus events again.
**Verdict**: ☐ OK ☐ KO
---
## §9. Full rollback test (procedure) — EMERGENCY REFLEX
**Do NOT run this except in a real emergency**; just know the command:
```bash
cd /home/dom/ai/rpa_vision_v3
git checkout backup/pre-qw-suite-mai-2026-05-05
./svc.sh restart
```
To return to the sprint after a rollback:
```bash
git checkout feature/qw-suite-mai
./svc.sh restart
```
- [ ] **9.1** Read the procedure and know where it is documented (`docs/QW_SUITE_MAI.md`).
---
## §10. If something goes wrong mid-demo
Order of reflexes:
1. **Kill-switch QW2 first** (the LoopDetector is a passive layer; disabling it is risk-free):
```bash
sudo systemctl set-environment RPA_LOOP_DETECTOR_ENABLED=0
sudo systemctl restart rpa-streaming
```
*(set-environment is faster than `systemctl edit` but does not survive a reboot — fine for a demo)*
2. **Kill-switch QW4 next** if the problem persists:
```bash
sudo systemctl set-environment RPA_SAFETY_CHECKS_LLM_ENABLED=0
sudo systemctl restart rpa-streaming
```
3. **Full rollback** if still KO (cf. §9).
---
## §11. Final recap
Check after all tests to declare "demo-ready":
- [ ] §1 single-screen OK (zero regression)
- [ ] §2 multi-screen OK or deliberately skipped
- [ ] §3 LoopDetector OK
- [ ] §4 QW4 backward OK
- [ ] §5 declarative safety_checks OK
- [ ] §6 medical_critical + LLM OK
- [ ] §7 bus events visible in journalctl
- [ ] §8 kill-switches tested and working
- [ ] §9 rollback procedure known
**All checked → GHT demo GO** 🟢
**§1, §3 or §5 KO → demo NO-GO without a fix** 🔴
**§2 or §6 KO → demo OK with the matching QW kill-switch** 🟡
---
## Appendices
- Spec: `docs/superpowers/specs/2026-05-05-qw-suite-mai-design.md`
- Execution plan: `docs/superpowers/plans/2026-05-05-qw-suite-mai.md`
- Delivery summary: `docs/QW_SUITE_MAI.md`
- Remote backup: `backup/pre-qw-suite-mai-2026-05-05` (Gitea)
- Automated tests (reference: 116 passed):
```bash
.venv/bin/pytest tests/unit/test_monitor_router.py \
tests/unit/test_loop_detector.py \
tests/unit/test_safety_checks_provider.py \
tests/integration/test_grounding_offset.py \
tests/integration/test_loop_detector_replay.py \
tests/integration/test_replay_resume_acknowledgments.py \
tests/test_pipeline_e2e.py \
tests/test_phase0_integration.py \
tests/integration/test_stream_processor.py \
-q
```

docs/QW_SUITE_MAI.md

@@ -0,0 +1,101 @@
# QW Suite May 2026 — Delivery summary
RPA Vision V3 improvement sprint, branch `feature/qw-suite-mai`,
inspired by a comparative exploration of 5 computer-use frameworks
(Simular Agent-S, browser-use, OpenAI CUA, Coasty, Showlab OOTB).
## Three quick wins delivered
- **QW1 — Multi-screen**: capture/grounding per `monitor_index`, with fallbacks
to the focused monitor and then the composite. 100% backward compatible with
existing workflows. Adds `screeninfo>=0.8` to the Agent V1 dependencies.
- **QW2 — Composite LoopDetector**: passive stagnation detection via
3 signals (CLIP screen_static + action_repeat + retry_threshold).
Switches to `paused_need_help` automatically.
- **QW4 — Hybrid safety checks**: `pause_for_human` enriched with declarative
checks (from the workflow) + contextual LLM checks (local `medgemma:4b`, 5s
timeout, safe fallback). VWB UX with an acknowledgeable ChecklistPanel + audit trail.
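The three-signal composite of QW2 can be sketched as a small pure function — signal names and thresholds are from this document, but the real `LoopDetector` lives in `agent_v0/server_v1/loop_detector.py`, so this is an illustration, not its code:

```python
from dataclasses import dataclass

@dataclass
class Verdict:
    detected: bool
    signal: str = ""

def evaluate(retried, similarities, actions,
             static_threshold=0.99, static_n=4, repeat_n=3, retry_threshold=3):
    """Illustrative composite: the first signal that fires wins."""
    # Signal A — screen static: N consecutive near-identical captures
    if len(similarities) >= static_n and all(
            s > static_threshold for s in similarities[-static_n:]):
        return Verdict(True, "screen_static")
    # Signal B — action repeat: N identical consecutive actions
    if len(actions) >= repeat_n and all(
            a == actions[-1] for a in actions[-repeat_n:]):
        return Verdict(True, "action_repeat")
    # Signal C — cumulative retries past the threshold
    if retried >= retry_threshold:
        return Verdict(True, "retry_threshold")
    return Verdict(False)

print(evaluate(retried=3, similarities=[], actions=[]))  # retry_threshold fires
```

Each signal is independently inert when its inputs are missing (no embedder, short history), which is what makes the composite safe to leave always-on behind the kill-switch.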
## Kill-switches if something goes wrong
```bash
sudo systemctl edit rpa-streaming
# Add under [Service]:
Environment=RPA_LOOP_DETECTOR_ENABLED=0
Environment=RPA_SAFETY_CHECKS_LLM_ENABLED=0
sudo systemctl restart rpa-streaming
```
Full rollback: `git checkout backup/pre-qw-suite-mai-2026-05-05`.
## Useful environment variables
| Variable | Default | Effect |
|---|---|---|
| `RPA_LOOP_DETECTOR_ENABLED` | `1` | QW2 kill-switch (composite) |
| `RPA_LOOP_SCREEN_STATIC_THRESHOLD` | `0.99` | CLIP similarity threshold |
| `RPA_LOOP_SCREEN_STATIC_N` | `4` | Consecutive captures count |
| `RPA_LOOP_ACTION_REPEAT_N` | `3` | Identical actions count |
| `RPA_LOOP_RETRY_THRESHOLD` | `3` | Cumulative retries count |
| `RPA_SAFETY_CHECKS_LLM_ENABLED` | `1` | QW4 contextual-LLM kill-switch |
| `RPA_SAFETY_CHECKS_LLM_MODEL` | `medgemma:4b` | Ollama model |
| `RPA_SAFETY_CHECKS_LLM_TIMEOUT_S` | `5` | Hard timeout (seconds) |
| `RPA_SAFETY_CHECKS_LLM_MAX_CHECKS` | `3` | Max LLM checks added |
## Manual smoke tests to run before the GHT demo
These tests require VWB interaction and a live Agent V1 — not automatable.
1. **QW1 multi-screen**: replay a validated Easily Assure workflow. Check the
`[BUS] lea:monitor_routed` logs in `journalctl -u rpa-streaming`. The click
must land in the right place even on a 2-screen machine.
2. **QW2 LoopDetector**: optional, hard to reproduce reliably. If you
observe a loop during the demo, verify that `paused_need_help` triggers
automatically with `pause_reason="loop_detected"`.
3. **QW4 safety_checks**:
   - Old workflow without `safety_checks` → the legacy simple bubble shows up
   - Workflow with declarative `safety_checks` → the ChecklistPanel shows up,
     Continue button disabled until the required checks are ticked
   - Workflow with `safety_level: medical_critical` → LLM checks added on
     top (`[Léa]` badge), appearing within 5s
   - POST `/api/v3/replay/resume` without the required acknowledgments → 400 + UI toast
## Automated tests (reference)
```
.venv/bin/pytest tests/unit/test_monitor_router.py \
tests/integration/test_grounding_offset.py \
tests/unit/test_loop_detector.py \
tests/integration/test_loop_detector_replay.py \
tests/unit/test_safety_checks_provider.py \
tests/integration/test_replay_resume_acknowledgments.py \
-v
```
Reference: 24 QW tests + 89 baseline = 113 passed.
## Design reference
`docs/superpowers/specs/2026-05-05-qw-suite-mai-design.md`
## Execution plan reference
`docs/superpowers/plans/2026-05-05-qw-suite-mai.md`
## Backup
Backup branch pushed to Gitea before the sprint:
`backup/pre-qw-suite-mai-2026-05-05` + tag `backup-pre-qw-suite-mai-2026-05-05`.
## Status as of 2026-05-05
| Component | State | Demo smoke needed |
|---|---|---|
| QW1 monitor_router + offsets | Delivered, tests green | Yes (physical multi-screen) |
| QW1 Agent V1 enrichment | Delivered, graceful fallback if screeninfo missing | Yes (real Windows) |
| QW1 server hook + executor wiring | Delivered (fix commit fc01afa59) | Yes |
| QW2 LoopDetector module | Delivered, tests green | No (cannot be reproduced reliably) |
| QW2 api_stream hook | Delivered, tests green | No |
| QW4 SafetyChecksProvider | Delivered, tests green | Yes (with a `medical_critical` workflow) |
| QW4 /replay/resume endpoint + VWB proxy | Delivered, tests green | Yes (POST with acknowledged_check_ids) |
| QW4 PauseDialog + PropertiesPanel | Delivered, 0 new TS errors | Yes (render the bubble in VWB) |


@@ -0,0 +1,41 @@
# tests/integration/test_grounding_offset.py
"""Tests intégration pour la propagation d'offset multi-écrans (QW1)."""
import pytest
from unittest.mock import patch, MagicMock
from core.execution import input_handler
@pytest.fixture
def mock_screen():
"""Mock une capture mss : retourne un PIL Image factice + offsets."""
from PIL import Image
img = Image.new("RGB", (1920, 1080), color="white")
return img
def test_capture_screen_default_returns_composite_when_no_idx(mock_screen):
"""_capture_screen() sans monitor_idx → composite, offset (0, 0)."""
with patch("core.execution.input_handler.mss") as mock_mss:
ctx = mock_mss.mss.return_value.__enter__.return_value
ctx.monitors = [{"left": 0, "top": 0, "width": 3840, "height": 1080}]
ctx.grab.return_value = MagicMock(size=(3840, 1080), bgra=b"\x00" * (3840 * 1080 * 4))
with patch("core.execution.input_handler.PILImage.frombytes", return_value=mock_screen):
screen, w, h, ox, oy = input_handler._capture_screen()
assert (w, h, ox, oy) == (3840, 1080, 0, 0)
def test_capture_screen_targets_specific_monitor_with_offset(mock_screen):
"""_capture_screen(monitor_idx=1) → cible monitors[2] (mss skip [0]), offset = monitor.left."""
with patch("core.execution.input_handler.mss") as mock_mss:
ctx = mock_mss.mss.return_value.__enter__.return_value
# mss layout : [0]=composite, [1]=primary, [2]=secondary
ctx.monitors = [
{"left": 0, "top": 0, "width": 3840, "height": 1080},
{"left": 0, "top": 0, "width": 1920, "height": 1080},
{"left": 1920, "top": 0, "width": 1920, "height": 1080},
]
ctx.grab.return_value = MagicMock(size=(1920, 1080), bgra=b"\x00" * (1920 * 1080 * 4))
with patch("core.execution.input_handler.PILImage.frombytes", return_value=mock_screen):
screen, w, h, ox, oy = input_handler._capture_screen(monitor_idx=1)
assert (w, h, ox, oy) == (1920, 1080, 1920, 0)


@@ -0,0 +1,61 @@
# tests/integration/test_loop_detector_replay.py
"""Tests intégration : un replay simulé qui boucle bascule en paused_need_help."""
import pytest
from unittest.mock import MagicMock
from agent_v0.server_v1.loop_detector import LoopDetector
def test_replay_state_transitions_to_paused_on_screen_static():
"""Cas : 4 screenshots identiques → replay passe à paused_need_help."""
embedder = MagicMock()
embedder.embed_image.return_value = [1.0, 0.0, 0.0] # constant
detector = LoopDetector(clip_embedder=embedder)
state = {
"replay_id": "r_test",
"status": "running",
"retried_actions": 0,
"_screenshot_history": ["img1", "img2", "img3", "img4"], # 4 images factices
"_action_history": [
{"type": "click", "x_pct": 0.1, "y_pct": 0.1},
{"type": "type", "x_pct": 0.2, "y_pct": 0.2},
],
}
verdict = detector.evaluate(state, state["_screenshot_history"], state["_action_history"])
# Simulate what api_stream would do after the verdict
if verdict.detected:
state["status"] = "paused_need_help"
state["pause_reason"] = verdict.reason
state["pause_message"] = f"signal={verdict.signal}"
assert state["status"] == "paused_need_help"
assert state["pause_reason"] == "loop_detected"
assert "screen_static" in state["pause_message"]
def test_replay_state_transitions_on_action_repeat():
"""Cas : 3 actions identiques → paused_need_help signal action_repeat."""
detector = LoopDetector(clip_embedder=None)
actions = [{"type": "click", "x_pct": 0.5, "y_pct": 0.5}] * 3
state = {"replay_id": "r2", "status": "running", "retried_actions": 0,
"_screenshot_history": [], "_action_history": actions}
verdict = detector.evaluate(state, [], actions)
assert verdict.detected and verdict.signal == "action_repeat"
def test_kill_switch_keeps_replay_running(monkeypatch):
"""Avec RPA_LOOP_DETECTOR_ENABLED=0 le replay continue même en boucle."""
monkeypatch.setenv("RPA_LOOP_DETECTOR_ENABLED", "0")
embedder = MagicMock()
embedder.embed_image.return_value = [1.0, 0.0, 0.0]
detector = LoopDetector(clip_embedder=embedder)
state = {"retried_actions": 10,
"_screenshot_history": ["img1"] * 10,
"_action_history": [{"type": "click", "x_pct": 0.5, "y_pct": 0.5}] * 10}
verdict = detector.evaluate(state, state["_screenshot_history"], state["_action_history"])
assert verdict.detected is False


@@ -0,0 +1,52 @@
# tests/integration/test_replay_resume_acknowledgments.py
"""Tests intégration : /replay/resume valide les acquittements de safety_checks (QW4)."""
import pytest
def test_resume_accepts_when_all_required_acknowledged():
"""État pause + tous required acquittés → reprise OK."""
state = {
"status": "paused_need_help",
"safety_checks": [
{"id": "c1", "label": "X", "required": True, "source": "declarative", "evidence": None},
{"id": "c2", "label": "Y", "required": True, "source": "declarative", "evidence": None},
],
"checks_acknowledged": [],
}
# Simulate the server-side validation
acknowledged = ["c1", "c2"]
required_ids = {c["id"] for c in state["safety_checks"] if c["required"]}
missing = required_ids - set(acknowledged)
assert missing == set() # nothing missing → resume OK
def test_resume_rejects_when_required_missing():
"""État pause + un required non acquitté → 400 required_checks_missing."""
state = {
"status": "paused_need_help",
"safety_checks": [
{"id": "c1", "label": "X", "required": True, "source": "declarative", "evidence": None},
{"id": "c2", "label": "Y", "required": False, "source": "llm_contextual", "evidence": "..."},
],
"checks_acknowledged": [],
}
acknowledged = ["c2"] # only optional
required_ids = {c["id"] for c in state["safety_checks"] if c["required"]}
missing = required_ids - set(acknowledged)
assert missing == {"c1"} # c1 manquant → resume doit retourner 400
def test_resume_audit_trail_stored():
"""checks_acknowledged contient les ids reçus (audit)."""
state = {
"status": "paused_need_help",
"safety_checks": [
{"id": "c1", "required": True, "label": "X", "source": "declarative", "evidence": None},
],
"checks_acknowledged": [],
}
acknowledged = ["c1"]
state["checks_acknowledged"] = acknowledged
state["status"] = "running"
assert state["checks_acknowledged"] == ["c1"]
assert state["status"] == "running"


@@ -0,0 +1,96 @@
# tests/unit/test_loop_detector.py
"""Tests unitaires pour LoopDetector composite (QW2)."""
import os
import pytest
from unittest.mock import MagicMock
from agent_v0.server_v1.loop_detector import LoopDetector, LoopVerdict
@pytest.fixture
def detector():
"""LoopDetector avec embedder mocké (signal A toujours dispo)."""
embedder = MagicMock()
# Default: 4 identical embeddings → similarity 1.0
embedder.embed_image.return_value = [1.0, 0.0, 0.0]
return LoopDetector(clip_embedder=embedder)
def _state(retried=0, n_screenshots=0, n_actions=0):
return {
"retried_actions": retried,
"_screenshot_history": [[1.0, 0.0, 0.0]] * n_screenshots,
"_action_history": [{"type": "click", "x_pct": 0.5, "y_pct": 0.5}] * n_actions,
}
def test_screen_static_triggers_when_n_identical_embeddings(detector):
"""Signal A : 4 captures identiques (similarity > 0.99) → detected."""
state = _state(n_screenshots=4)
verdict = detector.evaluate(state, screenshots=state["_screenshot_history"], actions=[])
assert verdict.detected is True
assert verdict.signal == "screen_static"
def test_screen_static_skipped_when_history_too_short(detector):
"""Signal A : moins de N captures → pas de détection."""
state = _state(n_screenshots=2)
verdict = detector.evaluate(state, screenshots=state["_screenshot_history"], actions=[])
# Only A could fire but it is skipped, and B/C are not met: detected=False
assert verdict.detected is False
def test_action_repeat_triggers_when_n_identical_actions(detector):
"""Signal B : 3 actions consécutives identiques → detected."""
state = _state(n_actions=3)
verdict = detector.evaluate(state, screenshots=[], actions=state["_action_history"])
assert verdict.detected is True
assert verdict.signal == "action_repeat"
def test_action_repeat_skipped_when_actions_differ(detector):
"""Signal B : actions différentes → pas de détection."""
actions = [
{"type": "click", "x_pct": 0.1, "y_pct": 0.1},
{"type": "click", "x_pct": 0.2, "y_pct": 0.2},
{"type": "click", "x_pct": 0.3, "y_pct": 0.3},
]
verdict = detector.evaluate(_state(), screenshots=[], actions=actions)
assert verdict.detected is False
def test_retry_threshold_triggers_at_3(detector):
"""Signal C : retried_actions >= 3 → detected."""
state = _state(retried=3)
verdict = detector.evaluate(state, screenshots=[], actions=[])
assert verdict.detected is True
assert verdict.signal == "retry_threshold"
def test_kill_switch_disables_all_signals(monkeypatch, detector):
"""Si RPA_LOOP_DETECTOR_ENABLED=0 → toujours detected=False."""
monkeypatch.setenv("RPA_LOOP_DETECTOR_ENABLED", "0")
state = _state(retried=10, n_screenshots=10, n_actions=10)
verdict = detector.evaluate(state, screenshots=state["_screenshot_history"],
actions=state["_action_history"])
assert verdict.detected is False
def test_embedder_unavailable_skips_signal_A_continues_others():
"""Si CLIP embedder None → signal A skip, B et C continuent."""
detector = LoopDetector(clip_embedder=None)
# Trigger signal C
state = _state(retried=3)
verdict = detector.evaluate(state, screenshots=[], actions=[])
assert verdict.detected is True
assert verdict.signal == "retry_threshold"
def test_embedder_exception_does_not_crash(detector):
"""Si embed_image lève une exception → log + verdict detected=False."""
detector.clip_embedder.embed_image.side_effect = RuntimeError("CUDA OOM")
state = _state(n_screenshots=4)
# Must NOT raise: signal A becomes inert
verdict = detector.evaluate(state, screenshots=state["_screenshot_history"], actions=[])
# Signal A inert, B/C not met → detected False
assert verdict.detected is False


@@ -0,0 +1,51 @@
# tests/unit/test_monitor_router.py
"""Tests unitaires pour MonitorRouter (QW1)."""
import pytest
from agent_v0.server_v1.monitor_router import resolve_target_monitor, MonitorTarget
# Reference geometry for these tests: 2 side-by-side screens
TWO_MONITORS = [
{"idx": 0, "x": 0, "y": 0, "w": 1920, "h": 1080, "primary": True},
{"idx": 1, "x": 1920, "y": 0, "w": 1920, "h": 1080, "primary": False},
]
def test_resolve_uses_action_monitor_index_when_present():
"""Si action.monitor_index présent et valide → cible cet écran."""
action = {"monitor_index": 1}
session_state = {"monitors_geometry": TWO_MONITORS, "last_focused_monitor": 0}
result = resolve_target_monitor(action, session_state)
assert result.idx == 1
assert result.offset_x == 1920
assert result.offset_y == 0
assert result.source == "action"
def test_resolve_falls_back_to_focused_monitor_when_action_missing():
"""Si action.monitor_index absent → fallback focus actif."""
action = {} # pas de monitor_index
session_state = {"monitors_geometry": TWO_MONITORS, "last_focused_monitor": 1}
result = resolve_target_monitor(action, session_state)
assert result.idx == 1
assert result.source == "focus"
def test_resolve_falls_back_to_composite_when_geometry_empty():
"""Si geometry vide (vieux Agent V1) → fallback composite (idx=-1, offset=0)."""
action = {}
session_state = {"monitors_geometry": [], "last_focused_monitor": None}
result = resolve_target_monitor(action, session_state)
assert result.source == "composite_fallback"
assert result.offset_x == 0
assert result.offset_y == 0
def test_resolve_falls_back_when_action_index_out_of_range():
"""Si action.monitor_index hors limites (écran débranché) → fallback focus."""
action = {"monitor_index": 5} # n'existe pas
session_state = {"monitors_geometry": TWO_MONITORS, "last_focused_monitor": 0}
result = resolve_target_monitor(action, session_state)
assert result.idx == 0
assert result.source == "focus"


@@ -0,0 +1,111 @@
# tests/unit/test_safety_checks_provider.py
"""Tests unitaires SafetyChecksProvider (QW4)."""
import json
import pytest
from unittest.mock import patch, MagicMock
from agent_v0.server_v1.safety_checks_provider import build_pause_payload, PausePayload
def _action(safety_level=None, declarative_checks=None, message="Validation"):
params = {"message": message}
if safety_level:
params["safety_level"] = safety_level
if declarative_checks is not None:
params["safety_checks"] = declarative_checks
return {"type": "pause_for_human", "parameters": params}
def test_only_declarative_when_no_safety_level():
"""Pas de safety_level → uniquement les checks déclaratifs, pas d'appel LLM."""
decl = [{"id": "c1", "label": "Vérifier IPP", "required": True}]
with patch("agent_v0.server_v1.safety_checks_provider._call_llm_for_contextual_checks") as mock_llm:
payload = build_pause_payload(_action(declarative_checks=decl), {}, last_screenshot=None)
mock_llm.assert_not_called()
assert len(payload.checks) == 1
assert payload.checks[0]["source"] == "declarative"
def test_hybrid_appends_llm_checks_on_medical_critical(monkeypatch):
"""safety_level=medical_critical → LLM appelé, checks concaténés."""
decl = [{"id": "c1", "label": "Vérifier IPP", "required": True}]
llm_resp = [{"label": "Nom patient suspect à l'écran", "evidence": "vu un nom différent"}]
with patch("agent_v0.server_v1.safety_checks_provider._call_llm_for_contextual_checks",
return_value=llm_resp) as mock_llm:
payload = build_pause_payload(
_action(safety_level="medical_critical", declarative_checks=decl),
{}, last_screenshot="/tmp/fake.png",
)
mock_llm.assert_called_once()
assert len(payload.checks) == 2
assert payload.checks[0]["source"] == "declarative"
assert payload.checks[1]["source"] == "llm_contextual"
assert payload.checks[1]["evidence"] == "vu un nom différent"
def test_llm_timeout_falls_back_to_declarative_only():
"""LLM timeout → additional_checks=[], pas de crash, déclaratifs gardés."""
decl = [{"id": "c1", "label": "Vérifier IPP", "required": True}]
with patch("agent_v0.server_v1.safety_checks_provider._call_llm_for_contextual_checks",
return_value=[]) as mock_llm:
payload = build_pause_payload(
_action(safety_level="medical_critical", declarative_checks=decl),
{}, last_screenshot="/tmp/fake.png",
)
assert len(payload.checks) == 1
assert payload.checks[0]["source"] == "declarative"
def test_llm_invalid_response_falls_back():
"""Si _call_llm retourne [] (parse échoué en interne) → fallback safe."""
with patch("agent_v0.server_v1.safety_checks_provider._call_llm_for_contextual_checks",
return_value=[]):
payload = build_pause_payload(
_action(safety_level="medical_critical", declarative_checks=[]),
{}, last_screenshot="/tmp/fake.png",
)
assert payload.checks == []
def test_kill_switch_disables_llm_call(monkeypatch):
"""RPA_SAFETY_CHECKS_LLM_ENABLED=0 → LLM jamais appelé."""
monkeypatch.setenv("RPA_SAFETY_CHECKS_LLM_ENABLED", "0")
decl = [{"id": "c1", "label": "X", "required": True}]
with patch("agent_v0.server_v1.safety_checks_provider._call_llm_for_contextual_checks") as mock_llm:
payload = build_pause_payload(
_action(safety_level="medical_critical", declarative_checks=decl),
{}, last_screenshot="/tmp/fake.png",
)
mock_llm.assert_not_called()
assert len(payload.checks) == 1
def test_max_checks_respected(monkeypatch):
"""RPA_SAFETY_CHECKS_LLM_MAX_CHECKS=2 → max 2 checks LLM ajoutés."""
monkeypatch.setenv("RPA_SAFETY_CHECKS_LLM_MAX_CHECKS", "2")
decl = []
llm_resp = [
{"label": f"Check {i}", "evidence": f"e{i}"} for i in range(5)
]
with patch("agent_v0.server_v1.safety_checks_provider._call_llm_for_contextual_checks",
return_value=llm_resp[:2]): # the provider already truncates
payload = build_pause_payload(
_action(safety_level="medical_critical", declarative_checks=decl),
{}, last_screenshot="/tmp/fake.png",
)
assert len(payload.checks) == 2
def test_empty_declarative_with_llm_returns_only_llm():
"""Pas de déclaratif + LLM ajoute 2 checks → payload contient les 2."""
llm_resp = [{"label": "Vérifier date", "evidence": "date 1900 suspecte"},
{"label": "Vérifier devise", "evidence": "montant en USD au lieu d'EUR"}]
with patch("agent_v0.server_v1.safety_checks_provider._call_llm_for_contextual_checks",
return_value=llm_resp):
payload = build_pause_payload(
_action(safety_level="medical_critical", declarative_checks=[]),
{}, last_screenshot="/tmp/fake.png",
)
assert len(payload.checks) == 2
assert all(c["source"] == "llm_contextual" for c in payload.checks)
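These tests all rely on the provider collapsing every LLM failure mode (timeout, HTTP error, unparseable or mis-shaped JSON) into an empty list instead of raising, so the declarative checks always survive. A minimal sketch of that parsing contract — the name `parse_llm_checks` is illustrative, not the provider's real API:

```python
import json


def parse_llm_checks(raw: str, max_checks: int = 3) -> list[dict]:
    # Any malformed reply maps to [] so build_pause_payload never crashes
    # and declarative checks are always preserved.
    try:
        parsed = json.loads(raw)
    except json.JSONDecodeError:
        return []
    # Valid JSON but not an object (e.g. a bare list) counts as "no checks".
    checks = parsed.get("additional_checks") if isinstance(parsed, dict) else None
    if not isinstance(checks, list):
        return []
    # Keep only well-formed entries, truncated to the configured maximum.
    cleaned = [c for c in checks if isinstance(c, dict) and c.get("label")]
    return cleaned[:max_checks]
```

With this shape, the kill-switch and max-checks tests above reduce to asserting how many entries survive the cleaning pass.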

View File

@@ -0,0 +1,437 @@
#!/usr/bin/env python3
"""Bench rigoureux des modèles candidats pour QW4 safety_checks contextuels.
Méthodologie :
- 5 screenshots synthétiques avec différentes anomalies cliniques
- 4 modèles candidats (gemma4:e4b sur :11435, qwen2.5vl:7b/3b et medgemma:4b sur :11434)
- Pour chaque modèle :
1. Décharger TOUS les modèles déjà en VRAM (keep_alive=0)
2. 1er appel = cold start chronométré (1er screenshot)
3. 12 appels warm = (4 autres screenshots × 3 runs)
4. Mesurer : cold_start, warm avg/p95, taux détection, JSON valide
Usage : .venv/bin/python tools/bench_safety_checks_models.py
"""
from __future__ import annotations
import base64
import json
import os
import statistics
import time
from dataclasses import dataclass, field
from typing import Any
import requests
from PIL import Image, ImageDraw, ImageFont
OLLAMA_PRIMARY = os.environ.get("OLLAMA_URL", "http://localhost:11434")
OLLAMA_SECONDARY = os.environ.get("GEMMA4_URL", "http://localhost:11435")
# Configuration des candidats : (nom, url, type)
CANDIDATES = [
("gemma4:latest", OLLAMA_PRIMARY, "vlm_default"),
("qwen3-vl:8b", OLLAMA_PRIMARY, "vision_qwen3_8b"),
("qwen2.5vl:7b", OLLAMA_PRIMARY, "vision_qwen25_7b"),
("qwen2.5vl:3b", OLLAMA_PRIMARY, "vision_qwen25_3b"),
("medgemma:4b", OLLAMA_PRIMARY, "medical_4b"),
]
TIMEOUT_S = int(os.environ.get("BENCH_TIMEOUT", "60")) # large pour ne rien rater
MAX_CHECKS = 3
WORKFLOW_MESSAGE = "Validation T2A avant codage UHCD"
EXISTING_LABELS: list[str] = []
WARM_RUNS_PER_SCREENSHOT = 3 # warm = 4 autres screenshots × 3 runs = 12 mesures
# ---------------------------------------------------------------------------
# Scénarios : 5 screenshots avec anomalies différentes
# ---------------------------------------------------------------------------
@dataclass
class Scenario:
label: str # nom court
rows: list[tuple[str, str]]
anomaly_keywords: list[str] # mots indiquant que l'anomalie est repérée
SCENARIOS = [
Scenario(
label="ddn_aberrante",
rows=[
("Nom :", "DUPONT Marie"),
("IPP :", "25003284"),
("Date de naissance :", "1900-01-01"), # ANOMALIE
("Sexe :", "F"),
("Date d'admission :", "2026-05-05 14:32"),
("Service :", "URGENCES"),
("Motif :", "Douleur abdominale aiguë"),
("Diagnostic principal :", "K35.8 - Appendicite aiguë"),
("Forfait facturation :", "UHCD - Forfait 24h"),
],
anomaly_keywords=["1900", "naissance", "ddn", "date"],
),
Scenario(
label="ipp_incoherent",
rows=[
("Nom :", "MARTIN Paul"),
("IPP :", "ABC@@##XYZ"), # ANOMALIE : non numérique
("Date de naissance :", "1965-04-12"),
("Sexe :", "M"),
("Date d'admission :", "2026-05-06 09:15"),
("Service :", "URGENCES"),
("Motif :", "Chute mécanique"),
("Diagnostic principal :", "S52.5 - Fracture du radius distal"),
("Forfait facturation :", "UHCD - Forfait 24h"),
],
anomaly_keywords=["ipp", "abc", "format", "incohérent", "incoherent", "invalide"],
),
Scenario(
label="diagnostic_vide",
rows=[
("Nom :", "BERNARD Sophie"),
("IPP :", "25004191"),
("Date de naissance :", "1972-11-08"),
("Sexe :", "F"),
("Date d'admission :", "2026-05-06 10:42"),
("Service :", "URGENCES"),
("Motif :", "Céphalées"),
("Diagnostic principal :", ""), # ANOMALIE : vide
("Forfait facturation :", "UHCD - Forfait 24h"),
],
anomaly_keywords=["diagnostic", "vide", "blanc", "absent", "manque", "non renseigné", "non renseigne"],
),
Scenario(
label="cim_inadapte_age",
rows=[
("Nom :", "PETIT Lucas"),
("IPP :", "25004222"),
("Date de naissance :", "2025-11-01"), # nourrisson 6 mois
("Sexe :", "M"),
("Date d'admission :", "2026-05-06 11:00"),
("Service :", "URGENCES PEDIATRIQUES"),
("Motif :", "Pleurs persistants"),
("Diagnostic principal :", "M19.9 - Arthrose, sans précision"), # ANOMALIE
("Forfait facturation :", "UHCD - Forfait 24h"),
],
anomaly_keywords=["arthrose", "âge", "age", "nourrisson", "incohérent", "incoherent", "m19", "incompatible"],
),
Scenario(
label="forfait_incoherent_duree",
rows=[
("Nom :", "ROUSSEAU Jean"),
("IPP :", "25004317"),
("Date de naissance :", "1958-03-22"),
("Sexe :", "M"),
("Date d'admission :", "2026-05-06 08:00"),
("Date de sortie :", "2026-05-06 09:00"), # 1h
("Service :", "URGENCES"),
("Motif :", "Bilan biologique"),
("Diagnostic principal :", "Z00.0 - Examen médical général"),
("Forfait facturation :", "UHCD - Forfait 24h"), # ANOMALIE : 1h ≠ UHCD 24h
],
anomaly_keywords=["forfait", "uhcd", "durée", "duree", "1h", "incohérent", "incoherent", "24h"],
),
]
# ---------------------------------------------------------------------------
# Génération des screenshots
# ---------------------------------------------------------------------------
def make_screenshot(scenario: Scenario, path: str) -> None:
"""Crée un PNG du dossier patient pour un scénario donné."""
img = Image.new("RGB", (1024, 600), color="white")
draw = ImageDraw.Draw(img)
try:
font_title = ImageFont.truetype("/usr/share/fonts/truetype/dejavu/DejaVuSans-Bold.ttf", 22)
font_body = ImageFont.truetype("/usr/share/fonts/truetype/dejavu/DejaVuSans.ttf", 18)
except OSError:
font_title = ImageFont.load_default()
font_body = ImageFont.load_default()
draw.text((20, 20), "DOSSIER PATIENT - URGENCES UHCD", fill="black", font=font_title)
draw.line([(20, 55), (1004, 55)], fill="black", width=2)
y = 80
for label, value in scenario.rows:
draw.text((30, y), label, fill="black", font=font_body)
draw.text((280, y), value, fill="#1f2937", font=font_body)
y += 35
img.save(path, format="PNG")
def encode_image(path: str) -> str:
with open(path, "rb") as f:
return base64.b64encode(f.read()).decode("ascii")
def build_prompt() -> str:
existing = ", ".join(EXISTING_LABELS) if EXISTING_LABELS else "aucun"
return f"""Tu es Léa, assistante médicale supervisée.
Avant de continuer le workflow, tu dois lister 0 à {MAX_CHECKS} vérifications supplémentaires
que l'humain doit acquitter, en regardant l'écran actuel.
Contexte workflow : {WORKFLOW_MESSAGE}
Checks déjà demandés : {existing}
NE répète PAS un check déjà demandé.
Si rien d'inhabituel à signaler, retourne {{"additional_checks": []}}.
Réponds UNIQUEMENT en JSON :
{{
"additional_checks": [
{{"label": "string court", "evidence": "ce que tu as vu d'inhabituel"}}
]
}}
"""
# ---------------------------------------------------------------------------
# Gestion VRAM Ollama (déchargement)
# ---------------------------------------------------------------------------
def list_loaded_models(url: str) -> list[str]:
"""Retourne la liste des modèles actuellement en VRAM sur cet Ollama."""
try:
resp = requests.get(f"{url}/api/ps", timeout=5)
if resp.status_code == 200:
data = resp.json()
return [m["name"] for m in data.get("models", [])]
except Exception:
pass
return []
def unload_all_models() -> None:
"""Décharge tous les modèles en VRAM sur les 2 Ollama (keep_alive=0)."""
for url in (OLLAMA_PRIMARY, OLLAMA_SECONDARY):
loaded = list_loaded_models(url)
for model_name in loaded:
try:
requests.post(
f"{url}/api/generate",
json={"model": model_name, "prompt": "", "keep_alive": 0, "stream": False},
timeout=10,
)
except Exception:
pass
# Petit temps pour laisser le GC GPU faire son travail
time.sleep(2)
# ---------------------------------------------------------------------------
# Appel modèle + parsing
# ---------------------------------------------------------------------------
@dataclass
class CallResult:
elapsed_s: float
error: str = ""
raw: str = ""
checks: list[dict] = field(default_factory=list)
def call_model(model: str, url: str, prompt: str, image_b64: str) -> CallResult:
payload = {
"model": model,
"prompt": prompt,
"stream": False,
"format": "json",
"options": {"temperature": 0.1, "num_predict": 250},
"images": [image_b64],
}
t0 = time.perf_counter()
try:
resp = requests.post(f"{url}/api/generate", json=payload, timeout=TIMEOUT_S)
elapsed = time.perf_counter() - t0
except requests.Timeout:
return CallResult(elapsed_s=TIMEOUT_S, error="TIMEOUT")
except Exception as e:
return CallResult(elapsed_s=time.perf_counter() - t0, error=f"NETWORK:{type(e).__name__}")
if resp.status_code != 200:
return CallResult(elapsed_s=elapsed, error=f"HTTP_{resp.status_code}", raw=resp.text[:200])
raw = resp.json().get("response", "").strip()
try:
parsed = json.loads(raw)
except json.JSONDecodeError as e:
return CallResult(elapsed_s=elapsed, error=f"JSON:{type(e).__name__}", raw=raw[:200])
# JSON valide mais pas un objet (ex: liste) → aucun check plutôt qu'un AttributeError
checks = parsed.get("additional_checks") if isinstance(parsed, dict) else None
if not isinstance(checks, list):
checks = []
return CallResult(elapsed_s=elapsed, raw=raw[:300], checks=checks)
def detects_anomaly(scenario: Scenario, checks: list[dict]) -> bool:
blob = " ".join(
f"{c.get('label', '')} {c.get('evidence', '')}".lower()
for c in checks
)
return any(pat.lower() in blob for pat in scenario.anomaly_keywords)
# ---------------------------------------------------------------------------
# Bench main
# ---------------------------------------------------------------------------
@dataclass
class ModelStats:
model: str
cold_s: float = 0.0
warm_times: list[float] = field(default_factory=list)
detection_count: int = 0
detection_total: int = 0
json_valid_count: int = 0
json_valid_total: int = 0
errors: list[str] = field(default_factory=list)
sample_checks: list[tuple[str, list[dict]]] = field(default_factory=list) # (scenario_label, checks)
def run_bench_for_model(model: str, url: str, screenshots: list[tuple[Scenario, str]]) -> ModelStats:
print(f"\n══════════════════════════════════════════════════════════")
print(f" MODEL: {model} ({url})")
print(f"══════════════════════════════════════════════════════════")
# Décharger tout
print(f" [1/3] Déchargement VRAM...", end=" ", flush=True)
unload_all_models()
loaded_after = list_loaded_models(OLLAMA_PRIMARY) + list_loaded_models(OLLAMA_SECONDARY)
print(f"OK (loaded={loaded_after if loaded_after else 'aucun'})")
stats = ModelStats(model=model)
prompt = build_prompt()
# Cold start sur le 1er screenshot
scen0, path0 = screenshots[0]
img_b64 = encode_image(path0)
print(f" [2/3] Cold start ({scen0.label})...", end=" ", flush=True)
r0 = call_model(model, url, prompt, img_b64)
stats.cold_s = r0.elapsed_s
if r0.error:
print(f"{r0.error} ({r0.elapsed_s:.1f}s)")
stats.errors.append(f"cold:{scen0.label}:{r0.error}")
else:
det = detects_anomaly(scen0, r0.checks)
stats.detection_count += int(det)
stats.detection_total += 1
stats.json_valid_count += 1
stats.json_valid_total += 1
stats.sample_checks.append((scen0.label, r0.checks))
print(f"{'✅' if det else '⚠️'} {len(r0.checks)} check(s) en {r0.elapsed_s:.1f}s (det={det})")
# Warm runs sur les 4 autres screenshots × N runs
print(f" [3/3] Warm runs ({len(screenshots)-1} scenarios × {WARM_RUNS_PER_SCREENSHOT} runs)...")
for scen, path in screenshots[1:]:
img_b64 = encode_image(path)
for run_idx in range(WARM_RUNS_PER_SCREENSHOT):
r = call_model(model, url, prompt, img_b64)
if r.error:
stats.errors.append(f"{scen.label}:run{run_idx}:{r.error}")
stats.json_valid_total += 1
stats.detection_total += 1
print(f" {scen.label} run{run_idx}: ❌ {r.error}")
continue
stats.warm_times.append(r.elapsed_s)
stats.json_valid_count += 1
stats.json_valid_total += 1
det = detects_anomaly(scen, r.checks)
stats.detection_count += int(det)
stats.detection_total += 1
if run_idx == 0:
stats.sample_checks.append((scen.label, r.checks))
print(f"      {scen.label} run{run_idx}: {'✅' if det else '⚠️'} {len(r.checks)} check(s) en {r.elapsed_s:.1f}s")
return stats
def print_summary_table(all_stats: list[ModelStats]) -> None:
print("\n\n══════════════════════════════════════════════════════════")
print(" SYNTHÈSE")
print("══════════════════════════════════════════════════════════\n")
print("| Modèle | Cold (s) | Warm avg (s) | Warm p95 (s) | JSON | Détection | Notes |")
print("|---|---:|---:|---:|---:|---:|---|")
for s in all_stats:
if s.warm_times:
warm_avg = statistics.mean(s.warm_times)
warm_p95 = sorted(s.warm_times)[int(len(s.warm_times) * 0.95) - 1] if len(s.warm_times) > 1 else s.warm_times[0]
else:
warm_avg = warm_p95 = 0.0
json_pct = (s.json_valid_count / s.json_valid_total * 100) if s.json_valid_total else 0
det_pct = (s.detection_count / s.detection_total * 100) if s.detection_total else 0
notes = f"{len(s.errors)} err" if s.errors else "OK"
print(f"| `{s.model}` | {s.cold_s:.1f} | {warm_avg:.1f} | {warm_p95:.1f} | "
f"{json_pct:.0f}% ({s.json_valid_count}/{s.json_valid_total}) | "
f"{det_pct:.0f}% ({s.detection_count}/{s.detection_total}) | {notes} |")
print("\n## Détail des checks par scénario\n")
for s in all_stats:
print(f"\n### `{s.model}`")
if s.errors:
print(f"_Erreurs ({len(s.errors)})_ : {s.errors[:5]}{'...' if len(s.errors) > 5 else ''}")
for label, checks in s.sample_checks:
if not checks:
print(f"- **{label}** : _aucun check_")
else:
for c in checks[:2]:
print(f"- **{label}** : {c.get('label', '?')} — _{c.get('evidence', '?')[:120]}_")
def pick_winner(all_stats: list[ModelStats]) -> ModelStats | None:
"""Le gagnant : meilleur taux détection, départage par warm avg."""
valid = [s for s in all_stats if s.warm_times]
if not valid:
return None
# Tri : détection desc puis warm avg asc
valid.sort(key=lambda s: (-(s.detection_count / max(s.detection_total, 1)), statistics.mean(s.warm_times)))
return valid[0]
def main() -> int:
# Génération des 5 screenshots
print("📸 Génération des 5 screenshots synthétiques :")
screenshots: list[tuple[Scenario, str]] = []
for scen in SCENARIOS:
path = f"/tmp/bench_safety_{scen.label}.png"
make_screenshot(scen, path)
print(f"  - {scen.label} → {path}")
screenshots.append((scen, path))
print(f"\n⏱ Timeout par appel : {TIMEOUT_S}s")
print(f"🔄 Warm runs par scénario : {WARM_RUNS_PER_SCREENSHOT}")
print(f"📊 Total mesures par modèle : 1 cold + {(len(SCENARIOS)-1) * WARM_RUNS_PER_SCREENSHOT} warm = "
f"{1 + (len(SCENARIOS)-1) * WARM_RUNS_PER_SCREENSHOT}")
print(f"🤖 Candidats : {[c[0] for c in CANDIDATES]}")
all_stats: list[ModelStats] = []
for model, url, _ in CANDIDATES:
try:
stats = run_bench_for_model(model, url, screenshots)
all_stats.append(stats)
except KeyboardInterrupt:
print(f"\n⚠️ Interrompu pendant {model}, on saute le reste")
break
except Exception as e:
print(f"\n❌ Crash bench {model}: {e}")
all_stats.append(ModelStats(model=model, errors=[f"crash:{e}"]))
print_summary_table(all_stats)
winner = pick_winner(all_stats)
print("\n## Recommandation\n")
if winner is None:
print("⚠️ Aucun modèle exploitable. Décision manuelle nécessaire.")
return 1
det_pct = winner.detection_count / max(winner.detection_total, 1) * 100
warm_avg = statistics.mean(winner.warm_times)
print(f"🏆 **{winner.model}** : détection {det_pct:.0f}%, warm avg {warm_avg:.1f}s, cold {winner.cold_s:.1f}s")
print(f"\nPour fixer en production :")
print(f"```bash\nsudo systemctl edit rpa-streaming")
print(f"# [Service]\n# Environment=RPA_SAFETY_CHECKS_LLM_MODEL={winner.model}")
print(f"sudo systemctl restart rpa-streaming\n```")
return 0
if __name__ == "__main__":
raise SystemExit(main())
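The summary table's warm p95 uses a nearest-rank index over the sorted timings, and the JSON/detection columns are plain ratios. A standalone sketch of those two computations, assumed equivalent to the inline code in `print_summary_table`:

```python
import statistics


def warm_p95(times: list[float]) -> float:
    """Nearest-rank style p95: for 12 warm runs this picks the 11th value."""
    if not times:
        return 0.0
    if len(times) == 1:
        return times[0]
    return sorted(times)[int(len(times) * 0.95) - 1]


def pct(count: int, total: int) -> float:
    """Guarded percentage, 0.0 when there were no measurements at all."""
    return count / total * 100 if total else 0.0


# 12 warm measurements (the bench's 4 scenarios x 3 runs)
times = [float(t) for t in range(1, 13)]
```

For the winning model in the commit message, 6 detections out of 13 total calls rounds to the reported 46%.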

View File

@@ -1113,3 +1113,76 @@ def execute_windows():
return jsonify({'error': 'Streaming server (port 5005) non disponible'}), 503
except Exception as e:
return jsonify({'error': str(e)}), 500
# ---------------------------------------------------------------------------
# QW4 — Proxy /api/v3/replay/resume → streaming /replay/{id}/resume
# Forward Bearer token + body { replay_id, acknowledged_check_ids }.
# Le frontend (PauseDialog) appelle /api/v3/replay/resume via le VWB ;
# on relaye au streaming server pour valider les acquittements safety_checks.
# ---------------------------------------------------------------------------
@api_v3_bp.route('/replay/resume', methods=['POST'])
def replay_resume_proxy():
"""Proxy QW4 vers le serveur streaming pour la reprise avec safety_checks."""
import requests as req
data = request.get_json() or {}
replay_id = data.get('replay_id')
if not replay_id:
return jsonify({'error': 'replay_id manquant'}), 400
streaming_url = os.environ.get('RPA_STREAMING_URL', 'http://localhost:5005')
token = os.environ.get('RPA_API_TOKEN', '')
headers = {'Content-Type': 'application/json'}
if token:
headers['Authorization'] = f'Bearer {token}'
# Body forwardé : uniquement acknowledged_check_ids (replay_id est dans l'URL)
forward_body = {
'acknowledged_check_ids': data.get('acknowledged_check_ids') or [],
}
try:
resp = req.post(
f'{streaming_url}/api/v1/traces/stream/replay/{replay_id}/resume',
json=forward_body,
headers=headers,
timeout=10,
)
return resp.content, resp.status_code, {'Content-Type': 'application/json'}
except req.ConnectionError:
return jsonify({'error': 'streaming_unreachable',
'detail': f'Streaming server non disponible ({streaming_url})'}), 502
except req.RequestException as e:
return jsonify({'error': 'streaming_unreachable', 'detail': str(e)}), 502
# ---------------------------------------------------------------------------
# QW4 — Proxy GET /api/v3/replay/state/<replay_id> → streaming /replay/{id}
# Forward Bearer token vers le serveur streaming.
# Permet à App.tsx de récupérer le state du replay actif (Agent V1 Windows)
# pour afficher PauseDialog quand status = paused_need_help avec safety_checks.
# ---------------------------------------------------------------------------
@api_v3_bp.route('/replay/state/<replay_id>', methods=['GET'])
def replay_state_proxy(replay_id):
"""Proxy QW4 vers le serveur streaming pour récupérer le state replay actif."""
import requests as req
streaming_url = os.environ.get('RPA_STREAMING_URL', 'http://localhost:5005')
token = os.environ.get('RPA_API_TOKEN', '')
headers = {}
if token:
headers['Authorization'] = f'Bearer {token}'
try:
resp = req.get(
f'{streaming_url}/api/v1/traces/stream/replay/{replay_id}',
headers=headers,
timeout=5,
)
return resp.content, resp.status_code, {'Content-Type': 'application/json'}
except req.ConnectionError:
return jsonify({'error': 'streaming_unreachable',
'detail': f'Streaming server non disponible ({streaming_url})'}), 502
except req.RequestException as e:
return jsonify({'error': 'streaming_unreachable', 'detail': str(e)}), 502
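The resume proxy's forwarding rule is small but easy to get wrong: `replay_id` moves into the streaming URL path while only `acknowledged_check_ids` stays in the body, and the Bearer header is added only when a token is configured. The mapping, sketched as a pure helper (the name `build_resume_forward` is illustrative, not the module's actual API):

```python
def build_resume_forward(data: dict, streaming_url: str, token: str = ""):
    """Return (url, headers, body) for the streaming-server resume call."""
    replay_id = data.get("replay_id")
    if not replay_id:
        # The route answers 400 in this case; here we just signal the caller.
        raise ValueError("replay_id manquant")
    url = f"{streaming_url}/api/v1/traces/stream/replay/{replay_id}/resume"
    headers = {"Content-Type": "application/json"}
    if token:
        headers["Authorization"] = f"Bearer {token}"
    # replay_id lives in the URL path, so the body only carries the acks.
    body = {"acknowledged_check_ids": data.get("acknowledged_check_ids") or []}
    return url, headers, body
```

Keeping this as a pure function makes the 400/502 branches of the Flask route trivially testable without a running streaming server.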

View File

@@ -25,6 +25,7 @@ import ExecutionOverlay from './components/ExecutionOverlay';
import type { Variable } from './components/VariableManager';
import RightPanel from './components/RightPanel';
import SelfHealingDialog from './components/SelfHealingDialog';
import PauseDialog from './components/PauseDialog';
import ConfidenceDashboard from './components/ConfidenceDashboard';
import WorkflowValidation from './components/WorkflowValidation';
import ReviewModal from './components/ReviewModal';
@@ -61,6 +62,13 @@ function App() {
const [healingCandidates, setHealingCandidates] = useState<any[]>([]);
const [healingStepInfo, setHealingStepInfo] = useState<any>(null);
// QW4 — Replay streaming Windows en cours (Agent V1 distant).
// Quand un replay distant est lancé via ExecutionControls "→ Windows",
// ExecutionControls appelle setStreamingReplayId(replay_id) et un useEffect
// poll /api/v3/replay/state/<id> pour fusionner safety_checks + pause_*
// dans appState.execution → PauseDialog s'affiche.
const [streamingReplayId, setStreamingReplayId] = useState<string | null>(null);
// Charger l'état initial
const loadState = useCallback(async () => {
try {
@@ -122,6 +130,62 @@ function App() {
return () => clearInterval(interval);
}, [isExecutionRunning, loadState]);
// QW4 — Polling state replay streaming (Agent V1 Windows distant)
// Tourne dès qu'un replay distant a été lancé. Récupère safety_checks,
// pause_message, pause_reason et les fusionne dans appState.execution
// pour que PauseDialog s'affiche quand status = paused_need_help.
useEffect(() => {
if (!streamingReplayId) return;
let stopped = false;
const pollReplay = async () => {
try {
const resp = await fetch(`/api/v3/replay/state/${streamingReplayId}`);
if (!resp.ok) return;
const state = await resp.json();
if (stopped) return;
// Fusionner dans appState.execution sans écraser le reste.
setAppState(prev => {
if (!prev) return prev;
const prevExec = prev.execution || {
id: streamingReplayId,
workflow_id: prev.session?.active_workflow_id || '',
status: 'pending',
progress: 0,
current_step_index: 0,
completed_steps: 0,
failed_steps: 0,
total_steps: 0,
};
return {
...prev,
execution: {
...prevExec,
status: state.status || prevExec.status,
pause_message: state.pause_message || state.message,
pause_reason: state.pause_reason,
safety_checks: state.safety_checks || [],
replay_id: streamingReplayId,
},
};
});
// Stopper le polling si le replay est terminé / annulé.
if (state.status && ['completed', 'error', 'cancelled'].includes(state.status)) {
setStreamingReplayId(null);
}
} catch (err) {
// ignore (le serveur streaming peut être momentanément indispo)
}
};
// Tick immédiat puis toutes les 1s.
pollReplay();
const interval = setInterval(pollReplay, 1000);
return () => { stopped = true; clearInterval(interval); };
}, [streamingReplayId]);
// Convertir les étapes en nœuds React Flow
// Les edges ne sont générées automatiquement que lors du premier chargement
// d'un workflow. Ensuite, les connexions manuelles de l'utilisateur sont préservées.
@@ -451,6 +515,7 @@ function App() {
execution={appState?.execution || null}
onStart={handleStartExecution}
onStop={handleStopExecution}
onWindowsReplayStarted={(replayId) => setStreamingReplayId(replayId)}
/>
<ConfidenceDashboard
isExecutionRunning={isExecutionRunning}
@@ -569,6 +634,47 @@ function App() {
}}
/>
{/* QW4 — Pause supervisée (safety_checks).
Affiché si le serveur renvoie status == paused_need_help, ou
status == paused avec un payload de checks. Backward 100% : si
safety_checks vide, PauseDialog rend la bulle simple legacy. */}
{(appState?.execution?.status === 'paused_need_help' ||
(appState?.execution?.status === 'paused' &&
(appState?.execution?.safety_checks?.length ?? 0) > 0)) && (
<div className="pause-dialog-overlay">
<PauseDialog
pauseMessage={appState.execution.pause_message || 'Validation requise'}
pauseReason={appState.execution.pause_reason}
safetyChecks={appState.execution.safety_checks || []}
onResume={async (ackIds) => {
const replayId = appState.execution?.replay_id || appState.execution?.id;
if (replayId) {
// Voie streaming server (Agent V1 / replay distant)
const resp = await fetch('/api/v3/replay/resume', {
method: 'POST',
headers: { 'Content-Type': 'application/json' },
body: JSON.stringify({
replay_id: replayId,
acknowledged_check_ids: ackIds,
}),
});
if (!resp.ok) {
const err = await resp.json().catch(() => ({}));
throw new Error(err?.detail?.error || err?.detail || err?.error || resp.statusText);
}
} else {
// Voie locale (execute/resume)
await api.resumeExecution();
}
await loadState();
}}
onCancel={() => {
handleStopExecution();
}}
/>
</div>
)}
{/* ConfidenceDashboard déplacé dans le header */}
</div>
);
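The polling effect's merge is deliberately non-destructive: streaming fields overlay the previous `execution` object rather than replacing it, so progress counters from the local state survive. The same overlay semantics, sketched in Python dict terms (field names mirror the TypeScript above; the helper name is hypothetical):

```python
def merge_replay_state(prev_exec: dict, state: dict, replay_id: str) -> dict:
    """Overlay streaming-replay fields onto the previous execution object."""
    merged = dict(prev_exec)  # untouched fields (progress, steps...) survive
    merged.update({
        # Keep the previous status when the poll returned none.
        "status": state.get("status") or prev_exec.get("status"),
        "pause_message": state.get("pause_message") or state.get("message"),
        "pause_reason": state.get("pause_reason"),
        "safety_checks": state.get("safety_checks") or [],
        "replay_id": replay_id,
    })
    return merged
```

As in the effect, the caller is expected to stop polling once the merged status lands in `completed`, `error` or `cancelled`.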

View File

@@ -9,9 +9,12 @@ interface Props {
execution: Execution | null;
onStart: () => void;
onStop: () => void;
// QW4 — Notifie App.tsx quand un replay streaming Windows est lancé,
// pour qu'il poll /api/v3/replay/state/<id> et affiche PauseDialog au besoin.
onWindowsReplayStarted?: (replayId: string) => void;
}
export default function ExecutionControls({ execution, onStart, onStop }: Props) {
export default function ExecutionControls({ execution, onStart, onStop, onWindowsReplayStarted }: Props) {
const isRunning = execution?.status === 'running' || execution?.status === 'paused';
const [windowsStatus, setWindowsStatus] = useState<'idle' | 'sending' | 'sent' | 'error'>('idle');
@@ -56,6 +59,11 @@ export default function ExecutionControls({ execution, onStart, onStop }: Props)
const result = await resp.json();
if (result.replay_id) {
setWindowsStatus('sent');
// QW4 — propage le replay_id à App.tsx pour activer le polling
// /api/v3/replay/state/<id> (PauseDialog si paused_need_help).
if (onWindowsReplayStarted) {
try { onWindowsReplayStarted(result.replay_id); } catch {}
}
alert('Replay lancé ! Réduisez cette fenêtre maintenant.\nLes actions commenceront dans 5 secondes.');
setTimeout(() => setWindowsStatus('idle'), 5000);
} else {
@@ -75,9 +83,27 @@ export default function ExecutionControls({ execution, onStart, onStop }: Props)
{!isRunning ? (
<div style={{ display: 'flex', gap: '4px', alignItems: 'center' }}>
{userOS === 'linux' ? (
<button className="btn-start" onClick={onStart} title="Exécuter sur cet écran">
Exécuter
</button>
<>
<button className="btn-start" onClick={onStart} title="Exécuter sur cet écran (Linux local)">
Exécuter
</button>
<button
className="btn-start"
onClick={handleExecuteWindows}
disabled={windowsStatus === 'sending'}
style={{
background: windowsStatus === 'sent' ? '#22c55e' : windowsStatus === 'error' ? '#ef4444' : '#0078d4',
fontSize: '12px',
opacity: windowsStatus === 'sending' ? 0.6 : 1,
}}
title="Exécuter sur le PC Windows (Agent V1)"
>
{windowsStatus === 'sending' ? 'Envoi...' :
windowsStatus === 'sent' ? 'Lancé !' :
windowsStatus === 'error' ? 'Erreur' :
'→ Windows'}
</button>
</>
) : (
<button
className="btn-start"

View File

@@ -0,0 +1,126 @@
// QW4 — PauseDialog : bulle de pause supervisée avec ChecklistPanel intégré.
//
// 2 modes de rendu :
// - safety_checks vide -> bulle simple legacy (Continuer / Annuler)
// - safety_checks fournis -> ChecklistPanel ; bouton Continuer désactivé
// tant qu'un check `required` n'est pas coché.
//
// Les checks `llm_contextual` portent un badge [Léa] avec evidence en tooltip.
import { useState, useMemo } from 'react';
import type { SafetyCheck } from '../types';
interface Props {
pauseMessage: string;
pauseReason?: string;
safetyChecks: SafetyCheck[];
onResume: (acknowledgedIds: string[]) => Promise<void>;
onCancel: () => void;
}
export default function PauseDialog({
pauseMessage,
pauseReason,
safetyChecks,
onResume,
onCancel,
}: Props) {
const [checked, setChecked] = useState<Record<string, boolean>>({});
const [submitting, setSubmitting] = useState(false);
const [error, setError] = useState<string | null>(null);
const allRequiredOK = useMemo(() => {
return safetyChecks
.filter((c) => c.required)
.every((c) => checked[c.id] === true);
}, [safetyChecks, checked]);
const toggle = (id: string) => {
setChecked((prev) => ({ ...prev, [id]: !prev[id] }));
};
const handleResume = async () => {
setSubmitting(true);
setError(null);
try {
const acknowledgedIds = Object.entries(checked)
.filter(([, v]) => v)
.map(([k]) => k);
await onResume(acknowledgedIds);
} catch (e: any) {
setError(e?.message || 'Erreur lors de la reprise');
} finally {
setSubmitting(false);
}
};
// Backward compat : pas de checks -> bulle simple legacy
if (safetyChecks.length === 0) {
return (
<div className="pause-dialog-simple">
<p>{pauseMessage}</p>
{pauseReason && <small className="pause-reason">Raison : {pauseReason}</small>}
<div className="pause-actions">
<button onClick={() => onResume([])} disabled={submitting}>
Continuer
</button>
<button onClick={onCancel} disabled={submitting}>
Annuler
</button>
</div>
</div>
);
}
return (
<div className="pause-dialog-checks">
<h3>Pause supervisée</h3>
<p className="pause-message">{pauseMessage}</p>
{pauseReason && (
<div className="pause-reason-banner">
<strong>Raison :</strong> {pauseReason}
</div>
)}
<ul className="checklist-panel">
{safetyChecks.map((c) => (
<li key={c.id} className={`check-item ${c.required ? 'required' : 'optional'}`}>
<label>
<input
type="checkbox"
checked={!!checked[c.id]}
onChange={() => toggle(c.id)}
disabled={submitting}
/>
<span className="check-label">{c.label}</span>
{c.required && <span className="badge badge-required">obligatoire</span>}
{c.source === 'llm_contextual' && (
<span className="badge badge-lea" title={c.evidence || ''}>
Léa
</span>
)}
</label>
{c.source === 'llm_contextual' && c.evidence && (
<small className="check-evidence">-&gt; {c.evidence}</small>
)}
</li>
))}
</ul>
{error && <div className="pause-error">{error}</div>}
<div className="pause-actions">
<button
onClick={handleResume}
disabled={!allRequiredOK || submitting}
title={!allRequiredOK ? 'Coche tous les checks obligatoires' : 'Reprendre le replay'}
>
{submitting ? 'Reprise...' : 'Continuer'}
</button>
<button onClick={onCancel} disabled={submitting}>
Annuler
</button>
</div>
</div>
);
}
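The Continuer button is gated purely on the `required` checks; optional ones never block resume, and an empty list (the legacy bubble path) allows it immediately. A Python sketch of the same predicate (the helper name is illustrative):

```python
def all_required_ok(safety_checks: list[dict], checked: dict[str, bool]) -> bool:
    """True when every required check has been explicitly ticked."""
    return all(
        checked.get(c["id"]) is True
        for c in safety_checks
        if c.get("required")
    )
```

This mirrors the `useMemo` above: `all()` over an empty iterable is `True`, which is exactly why the legacy no-checks bubble can always resume.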

View File

@@ -1353,6 +1353,136 @@ export default function PropertiesPanel({ step, onUpdateParams, onDelete }: Prop
</>
);
case 'pause_for_human': {
// QW4 — éditeur safety_level + safety_checks (déclaratifs)
const safetyChecks = Array.isArray(params.safety_checks)
? (params.safety_checks as Array<{ id?: string; label?: string; required?: boolean }>)
: [];
return (
<>
<div className="prop-field">
<label>Message affiché à l'opérateur</label>
<textarea
rows={4}
value={String(params.message || '')}
onChange={(e) => updateParam('message', e.target.value)}
placeholder="Ex: Décision : {{dec.decision}}&#10;&#10;{{dec.justification}}"
style={{ width: '100%', fontFamily: 'monospace', fontSize: '12px' }}
/>
</div>
{/* QW4 — Niveau de sécurité */}
<div className="prop-field">
<label>Niveau de sécurité</label>
<select
value={String(params.safety_level || 'standard')}
onChange={(e) => updateParam('safety_level', e.target.value)}
>
<option value="standard">Standard (pas de LLM)</option>
<option value="medical_critical">Médical critique (LLM contextuel)</option>
</select>
</div>
{/* QW4 — Liste éditable de checks déclaratifs */}
<div className="prop-field">
<label>Checks à valider (déclaratifs)</label>
{safetyChecks.map((check, i) => (
<div key={i} className="check-editor-row">
<input
placeholder="ID (ex: check_ipp)"
value={check.id || ''}
style={{ width: '30%' }}
onChange={(e) => {
const next = [...safetyChecks];
next[i] = { ...check, id: e.target.value };
updateParam('safety_checks', next);
}}
/>
<input
placeholder="Libellé"
value={check.label || ''}
style={{ flex: 1 }}
onChange={(e) => {
const next = [...safetyChecks];
next[i] = { ...check, label: e.target.value };
updateParam('safety_checks', next);
}}
/>
<label style={{ display: 'flex', alignItems: 'center', gap: '4px' }}>
<input
type="checkbox"
checked={!!check.required}
onChange={(e) => {
const next = [...safetyChecks];
next[i] = { ...check, required: e.target.checked };
updateParam('safety_checks', next);
}}
/>
Obligatoire
</label>
<button
type="button"
onClick={() => {
const next = safetyChecks.filter((_, j) => j !== i);
updateParam('safety_checks', next);
}}
title="Supprimer ce check"
>
✕
</button>
</div>
))}
<button
type="button"
onClick={() => {
const next = [
...safetyChecks,
{ id: '', label: '', required: true },
];
updateParam('safety_checks', next);
}}
>
+ Ajouter un check
</button>
</div>
</>
);
}
case 't2a_decision':
return (
<>
<div className="prop-field">
<label>Template d'entrée (supporte {'{{var}}'})</label>
<textarea
rows={5}
value={String(params.input_template || '')}
onChange={(e) => updateParam('input_template', e.target.value)}
placeholder={'{{t0}}\n---\n{{t1}}\n{{t2}}\n{{t3}}\n{{t4}}'}
style={{ width: '100%', fontFamily: 'monospace', fontSize: '12px' }}
/>
</div>
<div className="prop-field">
<label>Variable de sortie (ex: dec)</label>
<input
type="text"
value={String(params.output_var || '')}
onChange={(e) => updateParam('output_var', e.target.value)}
placeholder="dec"
/>
</div>
<div className="prop-field">
<label>Modèle Ollama</label>
<input
type="text"
value={String(params.model || 'qwen2.5:7b')}
onChange={(e) => updateParam('model', e.target.value)}
placeholder="qwen2.5:7b"
/>
</div>
</>
);
default:
return <div className="prop-info">Pas de paramètres supplémentaires</div>;
}
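The editor above happily stores rows with an empty id or label while the operator is still typing. Server-side, a small normalization pass (hypothetical — not part of this diff) would typically drop incomplete rows and default `required` to true before the checks reach `build_pause_payload`:

```python
def normalize_declarative_checks(raw: list[dict]) -> list[dict]:
    """Drop rows without a label; synthesize an id when the field was left blank."""
    out = []
    for i, c in enumerate(raw):
        label = (c.get("label") or "").strip()
        if not label:
            continue  # row still being edited in the panel: skip it
        out.append({
            "id": (c.get("id") or "").strip() or f"check_{i}",
            "label": label,
            "required": bool(c.get("required", True)),
        })
    return out
```

Synthesizing stable ids matters because `acknowledged_check_ids` sent back on resume must match the ids the payload was built with.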

View File

@@ -4491,3 +4491,86 @@ body {
.right-panel-tabbed .capture-library {
border-top: 1px solid var(--border);
}
/* === QW4 — PauseDialog & ChecklistPanel === */
.pause-dialog-overlay {
position: fixed;
inset: 0;
background: rgba(15, 23, 42, 0.45);
display: flex;
align-items: center;
justify-content: center;
z-index: 9999;
}
.pause-dialog-simple,
.pause-dialog-checks {
padding: 16px;
max-width: 480px;
background: #fff;
border: 2px solid #f59e0b;
border-radius: 8px;
box-shadow: 0 10px 40px rgba(0, 0, 0, 0.25);
}
.pause-dialog-checks h3 { margin: 0 0 8px; color: #92400e; }
.pause-message { margin: 0 0 12px; }
.pause-reason-banner {
background: #fef3c7;
padding: 8px;
margin-bottom: 12px;
border-radius: 4px;
}
.pause-reason { color: #6b7280; display: block; margin-top: 4px; }
.checklist-panel {
list-style: none;
padding: 0;
margin: 0 0 12px;
}
.check-item {
padding: 6px 0;
border-bottom: 1px solid #f3f4f6;
}
.check-item.required { background: #fef9c3; }
.check-item label {
cursor: pointer;
display: flex;
align-items: center;
gap: 6px;
}
.badge {
font-size: 10px;
padding: 2px 6px;
border-radius: 10px;
margin-left: 6px;
}
.badge-required { background: #dc2626; color: #fff; }
.badge-lea { background: #2563eb; color: #fff; cursor: help; }
.check-evidence {
display: block;
font-style: italic;
color: #6b7280;
margin-left: 24px;
}
.pause-error {
color: #dc2626;
padding: 8px;
background: #fef2f2;
border-radius: 4px;
margin-bottom: 8px;
}
.pause-actions {
display: flex;
gap: 8px;
justify-content: flex-end;
}
.pause-actions button:disabled {
opacity: 0.5;
cursor: not-allowed;
}
/* QW4 — safety_checks editor in the PropertiesPanel */
.check-editor-row {
display: flex;
gap: 4px;
margin-bottom: 4px;
align-items: center;
}


@@ -1,5 +1,16 @@
// Types for the v3 API
// === QW4 — Safety checks (supervised pause) ===
export type SafetyLevel = 'standard' | 'medical_critical';
export interface SafetyCheck {
  id: string;
  label: string;
  required: boolean;
  source: 'declarative' | 'llm_contextual';
  evidence?: string | null;
}
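For illustration, one check of each `source` — the values below are invented examples shaped like the `SafetyCheck` interface, not data from the application:

```typescript
// Declarative check: authored in the workflow editor, no evidence attached.
const declarativeCheck = {
  id: 'identity_confirmed',
  label: 'Identité du patient confirmée',
  required: true,
  source: 'declarative',
  evidence: null,
};

// LLM-sourced check: flagged contextually, with a free-text excerpt
// justifying why the model raised it.
const llmCheck = {
  id: 'llm_dose_mismatch',
  label: 'Incohérence possible sur la posologie',
  required: false,
  source: 'llm_contextual',
  evidence: 'Le champ dose indique 500 mg, le protocole indique 50 mg.',
};
```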
// Execution mode
export type ExecutionMode = 'basic' | 'intelligent' | 'debug' | 'verified';
@@ -133,7 +144,9 @@ export const ACTIONS: ActionDefinition[] = [
    { name: 'max_iterations', type: 'number', description: 'Nombre maximum d\'itérations' }
  ] },
  { type: 'pause_for_human', label: 'Pause supervisée', icon: '⏸', description: 'Léa s\'arrête et demande validation humaine via une bulle interactive (boutons Continuer / Annuler).', category: 'logic', needsAnchor: false, params: [
    { name: 'message', type: 'string', description: 'Message affiché dans la bulle (ex: "Je ne suis pas sûre du critère 3, validez-vous UHCD ?")' },
    { name: 'safety_level', type: 'select', description: 'Niveau de sécurité : standard (pas de LLM) ou medical_critical (LLM contextuel)' },
    { name: 'safety_checks', type: 'safety_checks_editor', description: 'Liste de checks à valider avant reprise (id, libellé, obligatoire ?). Édité dans le panneau Propriétés.' }
  ] },
  { type: 't2a_decision', label: 'Décision T2A (LLM)', icon: '🧠', description: 'Analyse un DPI urgences via LLM local (qwen2.5:7b par défaut) et propose FORFAIT_URGENCE ou REQUALIFICATION_HOSPITALISATION. Retourne JSON {decision, justification, elements_pour/contre, confiance}. Bench validé 100% accuracy.', category: 'logic', needsAnchor: false, params: [
    { name: 'input_template', type: 'string', description: 'DPI à analyser. Supporte le templating {{var}} pour concaténer plusieurs extractions (ex: "{{texte_motif}}\\n{{texte_examens}}\\n{{texte_notes}}")' },
@@ -312,13 +325,19 @@ export interface WorkflowSummary {
export interface Execution {
  id: string;
  workflow_id: string;
  status: 'pending' | 'running' | 'paused' | 'paused_need_help' | 'completed' | 'error' | 'cancelled';
  progress: number;
  current_step_index: number;
  completed_steps: number;
  failed_steps: number;
  total_steps: number;
  error_message?: string;
  // === QW4 — Supervised pause (returned by /replay/state when status = paused_need_help) ===
  pause_reason?: string;
  pause_message?: string;
  safety_checks?: SafetyCheck[];
  // Replay ID (needed to call /replay/resume with acknowledged_check_ids)
  replay_id?: string;
}
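A client polling `/replay/state` needs to distinguish the supervised-pause status from an ordinary pause before showing the dialog. A minimal sketch — the narrow `ExecutionLite` shape and the `isPausedNeedHelp` guard name are hypothetical, not part of the API:

```typescript
// Structural subset of Execution, just the fields the guard inspects.
type ExecutionLite = {
  status: string;
  replay_id?: string;
};

// True only when the execution is waiting for human validation and the
// replay_id needed for /replay/resume is actually present.
function isPausedNeedHelp(exec: ExecutionLite): boolean {
  return exec.status === 'paused_need_help' && typeof exec.replay_id === 'string';
}
```

Requiring `replay_id` in the guard keeps the dialog from opening in a state where the resume call could not be issued.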
export interface Session {