Dom/rpa_vision_v3

Dom 73ddcdb29d

security-audit / Bandit (static scan) (push) Successful in 12s

security-audit / pip-audit (dependency CVEs) (push) Successful in 10s

security-audit / Secrets scan (grep) (push) Successful in 9s

tests / Lint (ruff + black) (push) Successful in 14s

tests / Unit tests (no GPU) (push) Failing after 14s

tests / Security tests (critical) (push) Has been skipped

feat: 3-level grounding chain + screen capture overhaul

Cascading grounding, used when CLIP/template matching fails:
1. OCR (docTR) → searches for the exact text on screen (~1s)
2. UI-TARS grounding → "click on X" → coordinates (~3s, 94% on ScreenSpot)
3. VLM reasoning → full reasoning + OCR confirmation (~10s)

find_element_on_screen() lives in input_handler.py (shared by VWB and Léa).
Wired into find_and_click() and execute_action() as a fallback.
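
A minimal sketch of the cascade idea described above: try each level from cheapest to most expensive and stop at the first one that returns coordinates. This is not the code in input_handler.py; `find_with_cascade`, `GroundingResult`, and the `fake_*` locators are hypothetical stand-ins for the real OCR, UI-TARS, and VLM calls.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Sequence, Tuple

Point = Tuple[int, int]
Locator = Callable[[str], Optional[Point]]


@dataclass
class GroundingResult:
    level: str   # name of the level that found the element
    point: Point  # (x, y) coordinates returned by that level


def find_with_cascade(
    target: str, locators: Sequence[Tuple[str, Locator]]
) -> Optional[GroundingResult]:
    """Try each locator in order and return the first hit, or None."""
    for name, locate in locators:
        try:
            point = locate(target)
        except Exception:
            # A crashing level is treated as a miss so the next level still runs.
            point = None
        if point is not None:
            return GroundingResult(name, point)
    return None


# Hypothetical stand-ins for the three levels (cheapest first).
def fake_ocr(target: str) -> Optional[Point]:
    return {"Save": (120, 40)}.get(target)


def fake_ui_tars(target: str) -> Optional[Point]:
    return {"gear icon": (300, 15)}.get(target)


def fake_vlm(target: str) -> Optional[Point]:
    raise RuntimeError("model unavailable")


CASCADE = [("ocr", fake_ocr), ("ui_tars", fake_ui_tars), ("vlm", fake_vlm)]
```

With these stand-ins, `find_with_cascade("Save", CASCADE)` is resolved by the OCR level, `"gear icon"` falls through to the UI-TARS level, and an unknown target returns `None` after all three levels miss.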

Screen capture overhaul:
- mss.monitors[0] (composite) to capture the VM in full screen
- FullscreenSelector rewritten: overlay via getBoundingClientRect()
- Bboxes and selection aligned with the image (computed in JS, not CSS)
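
The alignment arithmetic behind the points above can be sketched as follows. The commit does this in JS; the Python version below only illustrates the math and is not the project's code. It assumes the screenshot is stretched to fill the displayed rectangle exactly, that `rect` has the `left`/`top`/`width`/`height` fields of `getBoundingClientRect()`, and that `monitor` is a dict shaped like an entry of `mss.monitors` (index 0 is the composite of all monitors, whose `left`/`top` can be negative). The function names are hypothetical, and HiDPI scaling is ignored.

```python
from typing import Dict, Tuple


def display_to_image(
    client_x: float,
    client_y: float,
    rect: Dict[str, float],
    image_size: Tuple[int, int],
) -> Tuple[int, int]:
    """Map a pointer position over the displayed image to screenshot pixels."""
    img_w, img_h = image_size
    # Ratio between the captured image and its on-page rendering.
    scale_x = img_w / rect["width"]
    scale_y = img_h / rect["height"]
    x = (client_x - rect["left"]) * scale_x
    y = (client_y - rect["top"]) * scale_y
    # Clamp so a click on the border still lands inside the image.
    x = min(max(x, 0), img_w - 1)
    y = min(max(y, 0), img_h - 1)
    return int(x), int(y)


def image_to_screen(x: int, y: int, monitor: Dict[str, int]) -> Tuple[int, int]:
    """Convert screenshot pixels to absolute screen coordinates."""
    # The composite monitor's origin is offset when a display sits left of
    # or above the primary one, so the offset has to be added back.
    return monitor["left"] + x, monitor["top"] + y


# Example values (made up): a 1920x1080 capture rendered at 960x540.
rect = {"left": 100.0, "top": 50.0, "width": 960.0, "height": 540.0}
composite = {"left": -1920, "top": 0, "width": 3840, "height": 1080}

px = display_to_image(580, 320, rect, (1920, 1080))
screen_xy = image_to_screen(px[0], px[1], composite)
```

Computing the scale from the measured rectangle rather than from CSS values keeps the overlay aligned even when the page layout resizes the image.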

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

2026-04-21 09:31:38 +02:00

__init__.py

feat: VWB ↔ Léa unification, bidirectional import/export

2026-03-18 22:41:34 +01:00

capture.py

refactor(vwb): complete screen capture overhaul, now stable for good

2026-04-21 09:03:19 +02:00

dag_execute.py

feat(capture_server): auth Bearer + bind localhost + anti-path-traversal

2026-04-14 16:47:45 +02:00

execute.py

feat: 3-level grounding chain + screen capture overhaul