docs: complete execution cartography + ORA target_text fix + file-based InfiGUI worker

docs/CARTOGRAPHY.md:
- Complete map of the 2 execution paths (Legacy vs ORA)
- 12 grounding systems identified, 3 of them dead
- Trace of the target_text field from capture to click
- Existing functions that are not wired in (verify, recovery, ShadowLearningHook)
- VRAM budget, critical files, modification rules

ORA target_text fix (observe_reason_act.py:217):
- Detects nonsensical target_text values ("click_anchor")
- Calls _describe_anchor_image() (VLM) to describe the crop
- Same logic as the legacy path at execute.py:893

InfiGUI worker via /tmp files:
- File-based communication (no subprocess pipes, no HTTP)
- Independent process started before the backend
- Fixes the CUDA crash in Flask/FastAPI/uvicorn

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-26 12:37:43 +02:00
parent f73a2a59a9
commit 3d6868f029
6 changed files with 878 additions and 581 deletions
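
The file-based worker communication named in the commit message can be sketched as follows. This is only an illustration of the general technique (request and response files, atomic rename, polling), not the code in `infigui_worker.py`: the file names `infigui_req_<id>.json` / `infigui_resp_<id>.json`, the JSON payload, and every function name below are assumptions; the source only shows the `/tmp/infigui_*` prefix.

```python
import json
import os
import time
import uuid
from pathlib import Path
from typing import Callable, Optional


def write_atomic(path: Path, payload: dict) -> None:
    """Write JSON to a temp file, then rename it, so readers never see a partial file."""
    tmp = path.with_name(path.name + ".tmp")
    tmp.write_text(json.dumps(payload), encoding="utf-8")
    os.replace(tmp, path)


def send_request(workdir: Path, payload: dict) -> str:
    """Client side: drop a request file and return its (hypothetical) request id."""
    req_id = uuid.uuid4().hex
    write_atomic(workdir / f"infigui_req_{req_id}.json", payload)
    return req_id


def wait_response(workdir: Path, req_id: str, timeout: float = 5.0,
                  poll: float = 0.05) -> Optional[dict]:
    """Client side: poll for the matching response file until the timeout expires."""
    resp = workdir / f"infigui_resp_{req_id}.json"
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        if resp.exists():
            data = json.loads(resp.read_text(encoding="utf-8"))
            resp.unlink()  # consume the response so it is read only once
            return data
        time.sleep(poll)
    return None


def serve_once(workdir: Path, handler: Callable[[dict], dict]) -> int:
    """Worker side: answer every pending request once; returns how many were handled."""
    handled = 0
    for req in sorted(workdir.glob("infigui_req_*.json")):
        req_id = req.stem[len("infigui_req_"):]
        payload = json.loads(req.read_text(encoding="utf-8"))
        write_atomic(workdir / f"infigui_resp_{req_id}.json", handler(payload))
        req.unlink()
        handled += 1
    return handled
```

In such a design the worker process would call `serve_once` in a loop with a handler that runs the model, so the GPU context lives entirely outside the web server process.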
--- a/docs/CARTOGRAPHY.md
+++ b/docs/CARTOGRAPHY.md
@@ -0,0 +1,233 @@
+# Execution cartography: RPA Vision V3 (Léa)
+
+> **Date**: April 26, 2026  
+> **Purpose**: a complete map of what is wired in, what is not, and how data flows.  
+> **Rule**: READ THIS DOCUMENT BEFORE MODIFYING ANY CODE.
+
+---
+
+## 1. Entry point: two disjoint paths
+
+```
+POST /api/v3/execute/start  (execute.py:1528)
+  ├── execution_mode = "verified"  → run_workflow_verified()  ← ORA PATH
+  └── execution_mode = "basic"|"intelligent"|"debug"  → execute_workflow_thread()  ← LEGACY PATH
+```
+
+**There are TWO separate executors** that duplicate anchor loading, the step loop, grounding, and error handling. The only code they share is `input_handler.py`.
+
+---
+
+## 2. LEGACY path (basic/intelligent/debug modes)
+
+```
+[API] POST /execute/start (mode=intelligent)
+  → [execute.py:145] execute_workflow_thread()
+    → [execute.py:160] Loads steps from the DB
+    → LOOP over each step:
+      │
+      ├─ PRE-STEP REFLEX (intelligent/debug modes)
+      │   → [input_handler.py:79] check_screen_for_patterns()
+      │       → UIPatternLibrary.find_pattern(ocr_text)               ← WIRED
+      │   → [input_handler.py:129] handle_detected_pattern()
+      │       → full-screen EasyOCR + button click                    ← WIRED
+      │
+      ├─ ANCHOR LOADING [execute.py:222-256]
+      │   params['visual_anchor'] = {
+      │     screenshot: base64 of the crop,
+      │     bounding_box: {x, y, width, height},
+      │     target_text: anchor.target_text,      ← MAY BE EMPTY ("")
+      │     description: anchor.ocr_description   ← MAY BE EMPTY ("")
+      │   }
+      │
+      ├─ execute_action(action_type, params) [execute.py:278]
+      │   │
+      │   ├─ ACTION = click_anchor [execute.py:862-1096]
+      │   │  │
+      │   │  ├─ MODE basic: static coordinates (bbox center)
+      │   │  │
+      │   │  └─ MODE intelligent/debug:
+      │   │     ├─ target_text = anchor.target_text || step.label
+      │   │     │   If target_text == "click_anchor" and screenshot_base64:
+      │   │     │     → _describe_anchor_image() (VLM qwen2.5vl:3b)  ← WIRED
+      │   │     │
+      │   │     ├─ METHOD 1: Template matching (cv2)                 ← WIRED
+      │   │     ├─ METHOD 2: CLIP matching (RF-DETR + CLIP)          ← WIRED
+      │   │     ├─ METHOD 3: OCR → UI-TARS → VLM                     ← WIRED
+      │   │     └─ FAILURE: interactive self-healing                 ← WIRED
+      │   │
+      │   ├─ ACTION = type_text → safe_type_text()                   ← WIRED
+      │   ├─ ACTION = wait → sleep + pattern check                   ← WIRED
+      │   ├─ ACTION = keyboard_shortcut → pyautogui.hotkey()         ← WIRED
+      │   ├─ ACTION = ai_analyze_text → Ollama                       ← WIRED
+      │   ├─ ACTION = extract_text → docTR OCR                       ← WIRED
+      │   └─ ACTION = hover/scroll/focus → static coords             ← NO GROUNDING
+```
+
+---
+
+## 3. ORA path ("verified" mode)
+
+```
+[API] POST /execute/start (mode=verified)
+  → [execute.py:1349] run_workflow_verified()
+    → [execute.py:1380-1428] Loads steps + anchors (SAME logic as legacy)
+    → [execute.py:1433] ORALoop(verify_level='none', max_retries=2)
+    │                            ^^^^^^^^^^^^^^^^^^^
+    │                   VERIFICATION HARD-CODED OFF
+    │
+    → [ORA:1478] ora.run_workflow(steps=ora_steps)
+      │
+      LOOP over each step:
+        │
+        ├─ [ORA:1258] OBSERVE: screen capture + pHash + window title
+        │
+        ├─ [ORA:1263] DIALOG REFLEX (if pHash changed by > 10)
+        │   → DialogHandler.handle_if_dialog(screenshot)               ← WIRED
+        │     → full-screen EasyOCR → keywords of known dialogs
+        │     → InfiGUI worker (/tmp/infigui_*)
+        │     → OCR click fallback
+        │
+        ├─ [ORA:196] REASON: reason_workflow_step()
+        │   target_text = anchor.target_text || anchor.description
+        │   If empty or an action name → _describe_anchor_image()      ← FIXED Apr 26
+        │   If still empty → label (unless it is an action name)
+        │
+        ├─ [ORA:1306] ACT → _act_click()
+        │   │
+        │   ├─ RPA_USE_FAST_PIPELINE=1 (default)
+        │   │   → FastSmartThinkPipeline
+        │   │     → FastDetector (RF-DETR 120ms + EasyOCR 192ms)       ← WIRED
+        │   │     → SmartMatcher (text+type+position+neighbors <1ms)   ← WIRED
+        │   │     → SignatureStore.lookup() (learning)                 ← WIRED
+        │   │     → Score ≥ 0.90 → direct action                       ← WIRED
+        │   │     → Score 0.60-0.90 → ThinkArbiter
+        │   │       → UITarsGrounder → InfiGUI worker (/tmp)           ← WIRED
+        │   │     → Score < 0.60 → ThinkArbiter alone                  ← WIRED
+        │   │     → FAILURE → _try_fallback()
+        │   │       → GroundingPipeline                    ← NOT WIRED (never connected)
+        │   │
+        │   ├─ FALLBACK template matching (cv2, >0.75)                 ← WIRED
+        │   ├─ FALLBACK OCR (_grounding_ocr)                           ← WIRED
+        │   └─ LAST RESORT: static coords                              ← WIRED
+        │
+        ├─ [ORA:1337] TITLE VERIFICATION (post-action)
+        │   → TitleVerifier → EasyOCR crop 45px                        ← WIRED
+        │   *** READS NOTHING IN A VM (Windows title is inside the framebuffer) ← PROBLEM
+        │
+        ├─ [ORA:1358] VERIFY: verify(pre, post, decision)
+        │   *** DISABLED (verify_level='none') ***                     ← NOT WIRED
+        │
+        └─ [ORA:1362] RECOVERY (5 strategies)
+            *** NEVER REACHED ***                                      ← NOT WIRED
+            - _recover_element_not_found (wait+scroll+UI-TARS)
+            - _recover_overlay_blocking (pattern+Win+D)
+            - _recover_wrong_screen (Alt+Tab)
+            - _recover_no_effect (double-click+offset)
+            - _classify_error (4 types)
+```
+
+---
+
+## 4. Trace of the `target_text` field
+
+```
+CAPTURE (VWB CapturePanel → capture.py:201-263)
+  → OCR on the enlarged crop (docTR)
+  → VLM qwen2.5vl:3b describes the crop
+  → If both fail → target_text = ""
+  → No error is reported to the frontend
+
+STORAGE (DB)
+  → VisualAnchor.target_text (nullable) = "" if not provided
+
+LOADING (execute.py:1400-1428)
+  → IF anchor.target_text exists and is non-empty → injected into visual_anchor
+  → OTHERWISE → the 'target_text' key DOES NOT EXIST in the dict
+
+LEGACY (execute.py:893-907)
+  → target_text = anchor.get('target_text', '')
+  → IF empty AND the step label is an action name → _describe_anchor_image()   ← COMPENSATES
+  → OTHERWISE → falls back to step_label
+
+ORA (observe_reason_act.py:217), FIXED ON APRIL 26
+  → target_text = anchor.target_text || anchor.description
+  → IF empty or an action name → _describe_anchor_image()             ← ADDED
+  → IF still empty → label (unless it is an action name)
+```
+
+---
+
+## 5. Existing functions that are NOT WIRED IN
+
+| Function | File | Reason |
+|----------|------|--------|
+| `verify()` + `_classify_error()` + 5 `_recover_*()` | observe_reason_act.py | verify_level='none' is hard-coded |
+| `GroundingPipeline` (old) | pipeline.py | set_fallback_pipeline() is never called |
+| `TemplateMatcher` (centralized class) | template_matcher.py | Used only by the dead GroundingPipeline |
+| `ShadowLearningHook` | shadow_learning_hook.py | Never imported in any flow |
+| `CognitiveContext` | working_memory.py | Instruction mode only |
+| `VLM pre-check` | observe_reason_act.py | `if False:` is hard-coded |
+| hover/focus grounding | execute.py | Static coords only |
+| `grounding/server.py` (FastAPI :8200) | server.py | CUDA crash, replaced by the file-based worker |
+
+---
+
+## 6. The 12 grounding systems
+
+| # | System | File | Wired in? |
+|---|--------|------|-----------|
+| 1 | Template matching inline (legacy) | execute.py:914 | ✅ Legacy |
+| 2 | Template matching inline (ORA) | ORA:1475 | ✅ ORA fallback |
+| 3 | CLIP matching (IntelligentExecutor) | intelligent_executor.py | ✅ Legacy |
+| 4 | OCR docTR (_grounding_ocr) | input_handler.py:430 | ✅ Legacy + ORA |
+| 5 | UI-TARS Ollama (_grounding_ui_tars) | input_handler.py:513 | ✅ Legacy |
+| 6 | VLM reasoning (_grounding_vlm) | input_handler.py:627 | ✅ Legacy only |
+| 7 | FastDetector (RF-DETR + EasyOCR) | fast_detector.py | ✅ ORA |
+| 8 | SmartMatcher | smart_matcher.py | ✅ ORA |
+| 9 | ThinkArbiter → InfiGUI worker | think_arbiter.py + ui_tars_grounder.py | ✅ ORA |
+| 10 | DialogHandler → InfiGUI | dialog_handler.py | ✅ ORA reflex |
+| 11 | GroundingPipeline (old) | pipeline.py | ❌ Never connected |
+| 12 | TemplateMatcher class | template_matcher.py | ❌ Via the dead GroundingPipeline |
+
+---
+
+## 7. Dialog handling (2 parallel systems)
+
+| # | System | Pattern base | OCR | Click | Used by |
+|---|--------|--------------|-----|-------|---------|
+| 1 | UIPatternLibrary + handle_detected_pattern | 28 built-in patterns | docTR/EasyOCR | OCR button lookup | Legacy |
+| 2 | DialogHandler + KNOWN_DIALOGS | 15 known titles | full-screen EasyOCR | InfiGUI | ORA |
+
+---
+
+## 8. VRAM budget (current configuration)
+
+| Component | VRAM | Process |
+|-----------|------|---------|
+| InfiGUI-G1-3B (NF4) | 2.41 GB | Independent worker (/tmp) |
+| RF-DETR Medium | 0.8 GB | Flask process |
+| EasyOCR | ~1 GB (GPU) | Flask process |
+| Ollama qwen2.5vl:3b (when called) | ~3.2 GB | Ollama process |
+| Chrome + system | ~1.3 GB | - |
+| **Max total** | **~8.7 GB / 12 GB** | |
+
+---
+
+## 9. Critical files, in order of importance
+
+1. `core/execution/observe_reason_act.py`: ORA loop, _act_click, reason, verify
+2. `visual_workflow_builder/backend/api_v3/execute.py`: API, anchor loading, legacy executor
+3. `core/grounding/fast_pipeline.py`: FAST→SMART→THINK pipeline
+4. `core/grounding/ui_tars_grounder.py`: InfiGUI worker client
+5. `core/grounding/infigui_worker.py`: InfiGUI worker (independent process)
+6. `core/execution/input_handler.py`: OCR, UI-TARS Ollama, safe_type_text, patterns
+7. `core/grounding/dialog_handler.py`: ORA dialog handling
+8. `core/grounding/fast_detector.py`: RF-DETR + EasyOCR
+9. `core/grounding/smart_matcher.py`: contextual matching
+10. `core/knowledge/ui_patterns.py`: reflex patterns
+
+---
+
+> **Last updated**: April 26, 2026  
+> **Next action**: re-wire verify + recovery, converge the 2 executors, clean up dead code.
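
The `target_text` resolution order that section 4 of the added document gives for the ORA path can be sketched as below. This is a minimal illustration under stated assumptions, not the code at `observe_reason_act.py:217`: the `ACTION_NAMES` set, the helper names, and the `describe` callback (a stand-in for `_describe_anchor_image()`) are hypothetical.

```python
from typing import Callable, Optional

# Hypothetical set of action names that must never be used as a click target.
ACTION_NAMES = {"click_anchor", "type_text", "wait", "keyboard_shortcut"}


def is_usable(text: Optional[str]) -> bool:
    """True if text is non-empty and is not merely the name of an action."""
    return bool(text and text.strip()) and text.strip() not in ACTION_NAMES


def resolve_target_text(anchor: dict, label: str,
                        describe: Callable[[str], Optional[str]]) -> str:
    """Pick the text to ground on: anchor text, then VLM description, then label."""
    # 1. Prefer the text stored on the anchor (target_text, then description).
    text = anchor.get("target_text") or anchor.get("description") or ""
    if is_usable(text):
        return text.strip()

    # 2. Empty or an action name: ask the VLM stand-in to describe the crop.
    crop = anchor.get("screenshot")
    if crop:
        described = describe(crop)
        if is_usable(described):
            return described.strip()

    # 3. Still empty: fall back to the step label unless it is an action name.
    return label.strip() if is_usable(label) else ""
```

Returning an empty string in the last case, rather than an action name, lets the caller fall through to non-textual grounding (template matching or static coordinates) instead of searching the screen for the literal text "click_anchor".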