rpa_vision_v3

Author	SHA1	Message	Date
Dom	6a300a4298	docs(coordination): add dgx spark multi-poste poc focus	2026-06-01 10:14:27 +02:00
Dom	0587036c17	docs(coordination): dispatch dgx spark poc readiness	2026-06-01 10:05:12 +02:00
Dom	f2a9e40502	docs(coordination): report c gamma dashboard promotion	2026-05-29 21:49:36 +02:00
Dom	34527b5cc5	feat(lea): add dashboard competence promotion dry run	2026-05-29 21:48:00 +02:00
Dom	bd3aaf7d64	docs(coordination): dispatch c gamma dashboard work	2026-05-29 19:04:58 +02:00
Dom	05a30f2d1d	docs(coordination): propose c gamma writeback decisions	2026-05-29 18:58:12 +02:00
Dom	47377226f2	feat(vwb): harden supervised verdict evidence	2026-05-29 18:54:54 +02:00
Dom	d515b22d1b	docs(coordination): report c beta supervision	2026-05-29 18:40:03 +02:00
Dom	aba849324a	feat(vwb): log supervised competence verdicts	2026-05-29 18:36:06 +02:00
Dom	7ad260d02f	docs(coordination): report c alpha preview	2026-05-29 18:15:30 +02:00
Dom	794a248dae	feat(vwb): preview lea competence workflows	2026-05-29 18:13:36 +02:00
Dom	8332b2cd37	docs(coordination): delegate yaml vwb supervision patch	2026-05-29 17:54:10 +02:00
Dom	9a45e61e2a	docs(coordination): report wait for state runtime	2026-05-29 17:26:35 +02:00
Dom	e66bc6d452	feat(vwb): execute wait for state	2026-05-29 17:22:35 +02:00
Dom	7b1f30af1a	fix(vwb): preserve static palette tools	2026-05-29 17:16:24 +02:00
Dom	488d14240a	docs(coordination): report vwb catalog patch	2026-05-29 17:11:02 +02:00
Dom	45b6da5e3f	feat(vwb): load palette from catalog	2026-05-29 17:09:47 +02:00
Dom	02211fddf2	docs(coordination): answer lea vwb mapping questions	2026-05-29 16:30:11 +02:00
Dom	ed36bc2b37	docs(coordination): share reflex vwb supervision findings	2026-05-29 14:33:57 +02:00
Dom	9677738f32	docs(coordination): request global review after vwb feedback	2026-05-29 14:05:40 +02:00
Dom	d422aa119c	docs(coordination): require claude qwen vision guardrails	2026-05-29 13:59:39 +02:00
Dom	7b943926db	docs(coordination): clarify vwb learning bridge	2026-05-29 13:46:22 +02:00
Dom	99f89317cb	feat(lea): substitute save menu gesture	2026-05-29 13:45:44 +02:00
Dom	6b8114eb97	docs(coordination): reframe lea direct competence flow	2026-05-29 13:41:18 +02:00
Dom	7ef98d8089	feat(lea): expose competence replay api	2026-05-29 13:40:15 +02:00
Dom	8ea4ed0ad2	docs(coordination): record supervised competence replay plan	2026-05-29 11:38:51 +02:00
Dom	a49f59b4d6	feat(competences): plan supervised replay tests	2026-05-29 11:38:12 +02:00
Dom	762e75a077	docs(coordination): record competence catalog integration	2026-05-29 11:29:18 +02:00
Dom	c1a144c673	feat(vwb): expose competence yaml catalog	2026-05-29 11:28:25 +02:00
Dom	e8a0fb0e42	feat(competences): extract batch candidates	2026-05-29 11:25:00 +02:00
Dom	4ba426c205	fix(replay): guard single in-flight dispatch Add a private in-flight helper for replay dispatch, block machine retargeting while an action is still pending on the previous session, and warn on duplicate in-flight entries for the same replay triplet. Freeze the Notepad runtime dialog success path and add integration coverage for single in-flight dispatch, watchdog late-report documentation, and the known concurrent-poll race as an xfail.	2026-05-25 11:00:59 +02:00
Dom	7bb8d543ab	feat(cognition): dataclasses Trace + SceneExpected + Precondition (Phase 2.1) Creates the 3 dataclasses of the Mandat/Protocoles/Scènes v0.3 model in core/cognition/, standalone (no runtime wiring), with explicit JSON serialization and offline tests. Groundwork for the following phases: - Phase 2.1 plan: Trace object (mandate_id, intention_id, scene_id, affordance_signature, expected_retour, level_of_delegation) - Workpack A: SceneExpected (monitor_index, app_name, title_patterns, title_anti, window_rect_hint, scene_role, accepted_transitions, stability_ms) + matches_title() helper - Workpack B: Precondition (kind, window_title_must_contain/anti, critic_question, verify_timeout_ms) + PreconditionRecovery (max_attempts, on_recovery_fail, actions) All dataclasses are frozen and immutable, with tolerant to_dict/from_dict (empty/None fields -> empty instance). Validation in __post_init__ for Precondition.kind and PreconditionRecovery.on_recovery_fail. No mandatory runtime dependency: if the object is not attached to an action, the current behavior applies as a fallback. No changes to executor / api_stream / replay_engine / grounding. Tests: 22/22 pass (JSON serialization, tolerant from_dict contracts, kind validation, matches_title/check_title helpers, anti-intention). Rollback tag: rollback/pre-cognition-dataclasses-2026-05-25_0610	2026-05-25 06:08:18 +02:00
Dom	debd7b423c	feat(evaluation): add local Ollama LeaBench adapter	2026-05-24 21:58:06 +02:00
Dom	6544ebe3f0	feat(evaluation): add 16 LeaBench cases from replay failures Extend LeaBench computer-use coverage with cases mined from data/training/replay_failures/. Adds 8 distinct categories: save_as visible, target absent (blank desktop / wrong window), start button, start-menu search, task-view wrong state, systray overflow, ambiguous tab labels, modal-blocker dialogs, and a wrong-window Lea-terminal case. - 16 new cases in benchmarks/computer_use/cases/leabench_extended_2026-05-24.jsonl - 0 duplicate case_id vs notepad_replay_failures_2026-05-24.jsonl - Validated with: python3 tools/lea_bench.py --cases ... --json - pytest tests/unit/test_computer_use_bench.py: 7 passed	2026-05-24 21:57:24 +02:00
Dom	10136f0ee0	feat(agent): add standalone anchor-relative resolver	2026-05-24 21:54:39 +02:00
Dom	054279feb4	feat(evaluation): add LeaBench model prompt packs	2026-05-24 21:53:24 +02:00
Dom	ea1f57afb1	feat(evaluation): add LeaBench computer-use scorer	2026-05-24 21:21:17 +02:00
Dom	345762330b	fix(agent): respect server visual reject before text fallback	2026-05-24 21:10:42 +02:00
Dom	b1b32187ba	fix(agent): P0.6 guard human corrections	2026-05-24 21:07:12 +02:00
Dom	ad24d16d83	fix(executor): P0.9 double-check stability after window transition Bug observed on replay_sess_56c10222 (2026-05-24 20:14): action 11 (click on 'Enregistrer', expected_after='Enregistrer sous') was marked success=True even though 2 actions later the observed window was 'NoMachine Desktop Viewer'. The post-verification polling probably matched 'Enregistrer sous' briefly, then the screen changed without being re-verified. Dom: "The contract is broken: Léa moves from one action to the next without checking that the previous one succeeded. We need a result check; if we don't know, we ask." Patch: right after the initial match, wait 0.5s and re-verify the active window. If it has diverged (race condition, auto-closed dialog, OS focus change) → matched=False, and the existing strict flow takes over with wrong_window + needs_human. Only affects cases where expected_after is defined AND no runtime_dialog was handled in the meantime (a runtime_dialog may legitimately change the window). Rollback tag: rollback/pre-P0.9-2026-05-24_2148	2026-05-24 20:24:46 +02:00
Dom	a76f3db682	feat(executor): P1 server-side DialogResolver as fallback for the local catalog Léa already had infrastructure for runtime dialogs (`_match_known_runtime_dialog` + `_handle_known_runtime_dialog`) but with a local catalog limited to 2 entries. The server-side DialogResolver R2 has 10 centralized entries. P1.MVP: `_try_dialog_resolver_server()` queries the `/api/v1/dialog/resolve` endpoint when the local catalog has not matched. The `DialogResolution` response is converted into a dialog_spec compatible with `_handle_known_runtime_dialog`, which reuses the existing cascade (server VLM grounding + local template matching). - Flag `RPA_DIALOG_RESOLVER_AGENT_ENABLED` (OFF by default), which allows a runtime rollback - Bearer auth via the existing `_auth_headers()` - 3s timeout, fail-safe on exception/503/no-match → human fallback intact - Zero regression on existing paths (the local catalog remains the first line) Local unit tests (6/6 OK): - flag OFF → None - server 503 → None - matched=False → None - policy=pause (UAC) → None - auto match + click_button → valid dialog_spec - network exception → None Rollback tag: rollback/pre-P1-2026-05-24_2105	2026-05-24 19:59:22 +02:00
Dom	9a029a221d	fix(executor): _capture_human_correction timeout 120s → 30s UX friction reported by Dom on a live replay (replay_sess_63a1313b): excessive latency of 2-3 minutes after an action failure before Léa resumes control. 120s is too long for a supervising human. 10s of inactivity remains the primary criterion (already in place), so: - human active: the correction is captured and the replay resumes in ~1s - human absent: we release after 30s instead of 120s 5 call sites + the function signature (default param) aligned. Rollback tag: rollback/pre-P0.8-2026-05-24_1912 Reference: message 2026-05-24_1910_claude-to-codex_p07-memory-sanity-fix-human-supervised-bug-frictions-ux.md	2026-05-24 19:14:12 +02:00
Dom	5ed1810ef3	fix(memory): reject coords (0,0) and coords outside [0,1] in memory_record_success Bug observed on replay_sess_63a1313b 2026-05-24 18:31-18:32: _capture_human_correction() on Léa's side returns human_actions without any real human click (root cause on the agent side still to be investigated = P0.6). As a knock-on effect, memory_record_success was called with coords (0.0, 0.0) and stored poisoned entries in target_memory.db. The existing sanity check rejected < 0 or > 1 but let (0,0) through, since it is mathematically valid. On the next replay, memory_lookup found the poisoned entry and made Léa click the top-left corner. Patch: explicit rejection of (0,0) + warning instead of debug for coords outside [0,1] (needed for runtime traceability). This is a downstream safety net: the real cause on Léa's side still has to be fixed (P0.6). Rollback tag: rollback/pre-P0.7-2026-05-24_1850	2026-05-24 19:01:18 +02:00
Dom	c9878f0a76	fix(validator-v2): override success=False only on TERMINATE Symptom observed on replay_sess_7a4c8e72 (24/05 17:57): - Action act_setup_sess_verify (type=verify_screen) fails 4x (+3 retries) - Logs: [VALIDATOR_V2] override success→False verdict=continue conf=0.30 failure_category=None reason='Aucun changement visible pour verify_screen (normal pour ce type d'action)' - Replay ends in status=error at 7/15 (regression vs 12/15 without V2) Cause: api_stream.py:3674 tested `if verdict != COMPLETE` (too broad) → any action that does not drastically change the screen (verify_screen, wait, key_combo Ctrl+S before the dialog opens, etc.) gets verdict=CONTINUE conf=0.30 from the PixelDiffChecker via the orchestrator's default_checker, which was treated as a failure to override. Fix: override ONLY on verdict=TERMINATE (certain failure with failure_category). CONTINUE = weak signal = we let the legacy pipeline decide. COMPLETE does not need handling here because we are already inside `if report.success:` (initial success is true). Effect: - non-interactive verify_screen/wait/key_combo → orchestrator returns CONTINUE conf=0.30 → V2 leaves report.success untouched (legacy behavior preserved) - click that misses (target like act_raw_6c1432b3) → OcrRoiChecker returns TERMINATE conf=0.85 failure_category=WRONG_APPLICATION → override OK R1 tests unchanged (TERMINATE branch tested explicitly). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-24 17:59:35 +02:00
Dom	08701761e6	merge(R2): DialogResolver MVP P0 (worktree a86565d0)	2026-05-24 17:53:35 +02:00
Dom	a13d6d0052	merge(R1): Validator MVP P0 (worktree a0dcb652)	2026-05-24 17:53:30 +02:00
Dom	84d2d4a667	feat(dialog): R2 MVP P0: DialogResolver + 10-entry catalog (flag OFF default) - agent_v0/server_v1/core/dialog/: compact catalog + stateless DialogResolver (title + evidence matching, strict auto/pause/skip trichotomy). - 10 P0 entries: confirm-save-overwrite, notepad-unsaved-changes, windows-file-explorer (fallback for replay 4c38dbb8), easily-save/overwrite/ confirm-action/clinical-warning, windows-uac, windows-hello-credui, edge-update. - Declarative validator `system_modals_cannot_be_overridden`: rejects any auto/skip override on SYSTEM modals (windows-/defender-). - Endpoint POST /api/v1/dialog/resolve behind flag RPA_DIALOG_RESOLVER_ENABLED (OFF by default → 503). No rewiring on the agent_v1 side (executor.py unchanged, P1 later). - 25 passing pytest tests (19 unit + 6 HTTP integration). Spec: docs/recherche/SPEC_POPUPS_CATALOGUE.md §2bis / §3. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-24 17:52:38 +02:00
Dom	1b4e64960b	feat(validator): R1 MVP P0: OcrRoiChecker + orchestrator (flag OFF default) Minimal core/validation/ package: - result.py: Verdict, FailureCategory, ValidationResult - pixel_diff_checker.py: wrapper around ReplayVerifier.verify_action - ocr_roi_checker.py: 80px ROI around the click, detects WRONG_APPLICATION via SUSPECT_TOKENS (edge/https/explorateur de fichiers/…) - orchestrator.py: Validator dispatches action_type → checkers + aggregation Wiring at api_stream.py:3646 behind RPA_VALIDATOR_V2_ENABLED (OFF by default). If verdict ≠ COMPLETE, override report.success=False and expose failure_category in result_entry. Zero regression with the flag OFF. Tests: - tests/unit/test_validator_v2.py: 13 tests (Checkers + Validator + serialization) - tests/integration/test_validator_step10.py: 2 tests reproducing the bug replay_sess_4c38dbb8 / act_raw_6c1432b3 (clicking Enregistrer switches to Explorateur de fichiers); Validator returns WRONG_APPLICATION Activation for live testing: RPA_VALIDATOR_V2_ENABLED=true See docs/recherche/SPEC_VALIDATOR_MATRICE.md, AXE_B2_DEEP_VALIDATOR.md. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-24 17:52:06 +02:00
Dom	bd100bc538	fix(critic): R0: wake up the gemma4 enrichment (semantic Critic) Symptom observed on replay_sess_4c38dbb8 (24/05): - 0/15 actions with expected_result filled in - Consequence: api_stream.py:3630 verify_with_critic() never called (conditional on a non-empty action.expected_result) - So the semantic Critic (Ollama) was disarmed in production; only the pixel-diff was running Root causes identified: 1. _GEMMA4_PORT=11435 hardcoded (legacy dedicated Docker container removed) → the /api/tags check times out silently → the function exits early 2. _CRITIC_MODEL="gemma4:e4b" hardcoded → model not installed 3. "think": True in the payload → "qwen2.5vl:7b-rpa" does not support thinking → 400 on every call → if not resp.ok: continue 4. Prompt without few-shot → qwen2.5vl chats instead of following the strict INTENTION/AVANT/APRES format → empty parsing Fix (stream_processor.py): - _GEMMA4_PORT default 11435 → 11434 (native Ollama) - _CRITIC_MODEL = os.environ.get("RPA_CRITIC_MODEL", "qwen2.5vl:7b-rpa") - Replaced 3 hardcoded "gemma4:e4b" → _CRITIC_MODEL - _unload_gemma4() → no-op (the legacy Docker container no longer exists) - Enrichment prompt: added a few-shot example (Cliquer Enregistrer) - "think": True → False (qwen2.5vl does not support it) Config .env.local: - RPA_VLM_MODEL=qwen2.5vl:7b → qwen2.5vl:7b-rpa (num_ctx=8192 variant, created via a Modelfile to allow partial GPU offload on an RTX 5070 12 GB; without it, the default num_ctx=128k = 12.5 GB required = OOM, full CPU fallback observed at 17:11 on 24/05) Validation: - Before the fix: 0/8 actions enriched (110 ms total = calls failing immediately with 400) - After the fix: 5/8 actions enriched in 35s (~7s/action, consistent with real qwen2.5vl VLM calls) systemd side effects (to be committed separately on the infra side): - OLLAMA_KEEP_ALIVE: 5m → 24h - t2a-viewer.service stopped + disabled (frees ~2.9 GB VRAM) Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-24 17:42:44 +02:00
Dom	1647e42d32	fix(agent_v1): headless keepalive when pystray cannot hold the main thread Symptom (3 incidents within 24h on 24/05): after restarting Lea remotely over SSH, the /replay/next polls resume for a while and then stop. Diagnosis: - agent_v1/ui/smart_tray.py:875 uses pystray.Icon.run() as the main loop - main.py:132-133 starts _replay_poll_loop and _background_heartbeat_loop as daemon threads - When Lea is launched via sshpass without an interactive Windows session, pystray fails (no accessible systray) and icon.run() returns - agent.run() returns, main() returns, the main thread terminates - The daemon threads die with the main thread (by Python design) Fix: _headless_keepalive() keeps the main thread alive via threading.Event when agent.run() returns while leaving agent.running=True (abnormal case). SIGTERM/SIGINT/SIGBREAK handlers for clean shutdown. Invisible in normal interactive mode (icon.run() never returns). No changes to smart_tray or the visual cascade. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-24 16:51:19 +02:00
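
The sanity check described in commit 5ed1810ef3 (reject (0,0) and anything outside [0,1] before writing to target memory) could look roughly like the sketch below. It is an illustration only, not the repository's code: the function name `sanitize_click_coords`, the logger name, and the choice to log the (0,0) case at warning level are all assumptions made for this example.

```python
import logging
from typing import Optional, Tuple

# Hypothetical logger name, chosen for this sketch only.
logger = logging.getLogger("target_memory_sketch")


def sanitize_click_coords(x: float, y: float) -> Optional[Tuple[float, float]]:
    """Return (x, y) if they look like usable normalized coords, else None.

    Hypothetical helper: the real check lives inside memory_record_success,
    whose actual signature is not shown in the commit message.
    """
    # Coordinates are normalized, so anything outside [0, 1] is invalid.
    if not (0.0 <= x <= 1.0 and 0.0 <= y <= 1.0):
        logger.warning("rejecting out-of-range coords (%s, %s)", x, y)
        return None
    # (0, 0) is in range but, per the commit, signals a missing human click.
    if x == 0.0 and y == 0.0:
        logger.warning("rejecting (0, 0) coords: no real click recorded")
        return None
    return (x, y)
```

A caller would skip the memory write whenever the helper returns None, so a poisoned entry never reaches the store and the next replay falls back to normal target resolution.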

413 Commits