feat: Windows visual replay operational - template matching + full VWB support

- "Windows" button in VWB to execute on the remote PC
- Multi-scale OpenCV template matching to locate visual anchors
- VWB→streaming server proxy with anchor loading (thumbnail, not full-size image)
- Windows executor fixes: lazy mss import, result reporting, debug prints
- Fix constant replay polling when no session is active
- VWB→executor type mapping (click_anchor→click, type_text→type)
- CORS for the streaming server, Windows capture in VWB
- Client-side heartbeat deduplication (perceptual hash)
- Cloud VLM mode configurable via RPA_VLM_MODEL
- Fix resolve_target: no ScreenAnalyzer fallback (too slow)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Author: Dom
Date: 2026-03-17 18:56:44 +01:00
Parent: dd149c1cbb
Commit: 371db69543
7 changed files with 361 additions and 15 deletions


@@ -321,10 +321,21 @@ Respond with just the role name, nothing else."""
"confidence": 0.3, "success": True,
}
prompt = """Classify this UI element. Reply with ONLY a JSON object.
# The system prompt constrains qwen3-vl's thinking and drastically
# reduces the number of tokens wasted on internal reasoning.
# Without a system prompt, the model thinks for 500-800 tokens and exhausts the budget.
# With one, it thinks for only 100-400 tokens and produces reliable JSON.
system_prompt = "You are a JSON-only UI classifier. No thinking. No explanation. Output raw JSON only."
prompt = """Classify this UI element. Reply with ONLY a JSON object, nothing else.
Types: button, text_input, checkbox, radio, dropdown, tab, link, icon, table_row, menu_item
Roles: primary_action, cancel, submit, form_input, search_field, navigation, settings, close, delete, edit, save
Example: {"type": "button", "role": "submit", "text": "OK"}
Example 1: {"type": "button", "role": "submit", "text": "OK"}
Example 2: {"type": "text_input", "role": "form_input", "text": ""}
Example 3: {"type": "icon", "role": "close", "text": "X"}
Your answer:"""
# Retry once if the response is empty
@@ -332,8 +343,9 @@ Your answer:"""
result = self.generate(
prompt,
image=element_image,
system_prompt=system_prompt,
temperature=0.1,
max_tokens=200,
max_tokens=300,
force_json=False
)
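The "retry once if the response is empty" logic noted in the diff can be sketched as a small wrapper. `generate` below is a stand-in for the project's VLM client method, and all names are illustrative assumptions:

```python
# Hedged sketch of a retry-once-on-empty-response pattern.
# `generate` is assumed to be a callable like the client's generate method.
def classify_with_retry(generate, prompt, image, system_prompt,
                        temperature=0.1, max_tokens=300):
    """Call the model, retrying exactly once if the reply is blank."""
    for _ in range(2):  # initial attempt + at most one retry
        result = generate(prompt, image=image, system_prompt=system_prompt,
                          temperature=temperature, max_tokens=max_tokens)
        if result and result.strip():
            return result
    return ""  # caller falls back to a default classification
```

A single bounded retry handles the occasional empty generation from a small VLM without risking an unbounded loop when the model consistently fails.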