Cuts the debug cycle of a workflow from 1-2 min (manual replay through
Windows + Léa V1 + mockup) to ~2-5 s (Linux mock client against the
streaming server at localhost:5005). Roughly 30-60× faster when a run takes ~2 s.
Architecture:
- tools/test_replay_e2e.py: CLI harness (~580 lines) that reproduces the
real chain: VWB /api/v3/execute-windows → streaming /replay/raw
→ /replay/next loop on the harness side, with resolve_target run against a
fixture screenshot → POST /replay/result. No server changes.
- tests/e2e/test_urgence_aiva_demo.py: pytest wrapper (smoke test).
- tests/e2e/urgence_aiva_demo_expected.yaml: reference generated by
--export-expected, used for automatic regression comparison.
- pytest.ini: adds the e2e marker.
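The harness-side loop described above can be sketched as follows. This is an illustration only, not the actual harness: the endpoint paths come from the description, but the payload fields (`action`, `action_id`, `action_type`), the end-of-replay signal, the injected `get`/`post` callables and the stand-in `resolve_target` are all assumptions.

```python
from typing import Callable

# Hypothetical transport signatures: get(path) -> dict, post(path, payload) -> dict.
Get = Callable[[str], dict]
Post = Callable[[str, dict], dict]


def resolve_target(action: dict, screenshot: str) -> dict:
    """Stand-in resolver; the real one locates the target on the fixture screenshot."""
    if action.get("action_type") == "wait":
        return {"method": "simulated", "status": "OK"}
    # Assumed fallback: centre of the screen, reported as a failure.
    return {"method": "fallback", "status": "FAIL", "x_pct": 0.5, "y_pct": 0.5}


def run_replay(get: Get, post: Post, screenshot: str, max_iter: int = 120) -> list[dict]:
    """Poll /replay/next, resolve each action locally, report to /replay/result."""
    results = []
    for _ in range(max_iter):
        reply = get("/replay/next")
        action = reply.get("action")
        if action is None:  # assumed end-of-replay signal
            break
        outcome = resolve_target(action, screenshot)
        outcome["action_id"] = action["action_id"]
        post("/replay/result", outcome)
        results.append(outcome)
    return results
```

Injecting the transport keeps the loop testable without a running server; a real run would presumably wrap HTTP calls to localhost:5005.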
Usage:
python tools/test_replay_e2e.py --execution-mode autonomous --max-iter 120 --verbose
python tools/test_replay_e2e.py --single-step 8 --shot <heartbeat>.png
python tools/test_replay_e2e.py --expected tests/e2e/urgence_aiva_demo_expected.yaml
pytest tests/e2e -v -m e2e
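For orientation, the flags used above could be declared with argparse roughly like this. The flag names come from the commands and the architecture notes; the types, defaults, help strings and the idea that --export-expected takes a path are assumptions, not the harness's actual definitions.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical declaration of the harness flags; types and defaults are assumed."""
    parser = argparse.ArgumentParser(prog="test_replay_e2e.py")
    parser.add_argument("--execution-mode", default="autonomous")
    parser.add_argument("--max-iter", type=int, default=120)
    parser.add_argument("--single-step", type=int, help="replay only this step")
    parser.add_argument("--shot", help="fixture screenshot (PNG)")
    parser.add_argument("--expected", help="YAML reference to compare against")
    parser.add_argument("--export-expected", help="write results as a new YAML reference")
    parser.add_argument("--verbose", action="store_true")
    return parser
```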
Output: Markdown table with columns step × method × score × pos × status × diag.
Known limitations (post-demo extensions):
- A single fixture screenshot serves the whole replay → realistic click_anchor
steps fail as soon as the replay moves past the screen captured in the fixture.
A step_id → fixture map is planned.
- extract_text/table/t2a_decision run server-side: observable but
not modifiable.
- No screenshot_after simulation → ReplayVerifier (Critic VLM) does not run.
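A minimal sketch of how such a table could be rendered from per-step records. The field names match the exported YAML reference; the cell formatting (two decimals, `-` for missing values) is an assumption, not the harness's exact output.

```python
COLUMNS = ["step", "method", "score", "pos", "status", "diag"]


def format_pos(x_pct, y_pct) -> str:
    """Render the click position, or '-' when the step has none."""
    if x_pct is None or y_pct is None:
        return "-"
    return f"({x_pct:.2f}, {y_pct:.2f})"


def to_markdown(steps: list[dict]) -> str:
    """Build a Markdown table with one row per step."""
    lines = [
        "| " + " | ".join(COLUMNS) + " |",
        "|" + "|".join(["---"] * len(COLUMNS)) + "|",
    ]
    for s in steps:
        cells = [
            str(s["order"]),
            s["method"] or "-",
            f"{s['score']:.2f}",
            format_pos(s["x_pct"], s["y_pct"]),
            s["status"],
            s["diag"].replace("|", "\\|"),  # keep pipes from breaking the row
        ]
        lines.append("| " + " | ".join(cells) + " |")
    return "\n".join(lines)
```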
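One possible shape for the planned step_id → fixture map, as a sketch only. The `_retryN` suffix handling is inferred from the action ids in the exported reference; the function name and the fixture paths in the usage note are hypothetical.

```python
import re


def pick_fixture(step_id: str, fixtures: dict[str, str], default: str) -> str:
    """Return the screenshot for a step, falling back to the single default fixture."""
    # Retries (e.g. "..._retry2") are assumed to reuse the fixture of the base step.
    base = re.sub(r"_retry\d+$", "", step_id)
    return fixtures.get(step_id, fixtures.get(base, default))
```

With `fixtures = {"step_288d0bceea90_1778162737752": "shots/patient_list.png"}`, every retry of that step would resolve to the same screenshot, and unmapped steps such as `wait_before_start` would keep using the default fixture.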
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
82 lines
1.8 KiB
YAML
workflow_session_id: test_e2e_sess_20260507T220822_c91f30
screenshot: /home/dom/ai/rpa_vision_v3/data/training/live_sessions/bg_DESKTOP-58D5CAC_windows/shots/heartbeat_1773792436.png
steps:
- order: 1
  action_id: wait_before_start
  action_type: wait
  by_text: ''
  method: simulated
  score: 0.0
  x_pct: null
  y_pct: null
  status: OK
  diag: wait simulé
  elapsed_ms: 1.013040542602539
- order: 2
  action_id: replay_free_74c2d90b
  action_type: pause:user_request
  by_text: ''
  method: ''
  score: 0.0
  x_pct: null
  y_pct: null
  status: PAUSED
  diag: 'Léa : j''ai trouvé ces dossiers : []. Pour la démo je vais traiter MOREL
    Catherin'
  elapsed_ms: 0.0
- order: 3
  action_id: step_288d0bceea90_1778162737752
  action_type: click
  by_text: '25003284'
  method: fallback
  score: 0.0
  x_pct: 0.5
  y_pct: 0.5
  status: FAIL
  diag: template_matching_failed
  elapsed_ms: 1064.7194385528564
- order: 4
  action_id: step_288d0bceea90_1778162737752_retry1
  action_type: click
  by_text: '25003284'
  method: fallback
  score: 0.0
  x_pct: 0.5
  y_pct: 0.5
  status: FAIL
  diag: template_matching_failed
  elapsed_ms: 1075.0248432159424
- order: 5
  action_id: wait_retry_381c1b
  action_type: wait
  by_text: ''
  method: simulated
  score: 0.0
  x_pct: null
  y_pct: null
  status: OK
  diag: wait simulé
  elapsed_ms: 12.79759407043457
- order: 6
  action_id: step_288d0bceea90_1778162737752_retry2
  action_type: click
  by_text: '25003284'
  method: fallback
  score: 0.0
  x_pct: 0.5
  y_pct: 0.5
  status: FAIL
  diag: template_matching_failed
  elapsed_ms: 1037.236213684082
- order: 7
  action_id: step_288d0bceea90_1778162737752_retry3
  action_type: click
  by_text: '25003284'
  method: fallback
  score: 0.0
  x_pct: 0.5
  y_pct: 0.5
  status: FAIL
  diag: template_matching_failed
  elapsed_ms: 1051.6366958618164
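A regression check against a reference like the one above could compare step records field by field. The sketch below assumes the steps have already been loaded into lists of dicts (for instance with a YAML parser) and that `elapsed_ms` is ignored because timings vary between runs; the function name, the tolerance and the message format are hypothetical, not the harness's actual comparison.

```python
VOLATILE = {"elapsed_ms"}  # assumed to be excluded: timings differ on every run


def diff_steps(expected: list[dict], actual: list[dict], tol: float = 1e-3) -> list[str]:
    """Return human-readable mismatches between expected and actual step records."""
    problems = []
    if len(expected) != len(actual):
        problems.append(f"step count: expected {len(expected)}, got {len(actual)}")
    for exp, act in zip(expected, actual):
        for key, want in exp.items():
            if key in VOLATILE:
                continue
            got = act.get(key)
            if isinstance(want, float) and isinstance(got, float):
                same = abs(want - got) <= tol  # tolerate float noise in scores/positions
            else:
                same = want == got
            if not same:
                problems.append(f"step {exp.get('order')}: {key} expected {want!r}, got {got!r}")
    return problems
```

An empty list means the run matches the reference; anything else can be reported alongside the Markdown table or turned into a pytest failure.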