
Dom 3d6868f029 docs: complete execution map + ORA target_text fix + file-based InfiGUI worker

docs/CARTOGRAPHY.md:
- Complete map of the 2 execution paths (Legacy vs ORA)
- 12 grounding systems identified, 3 of them dead
- Trace of the target_text field from capture to click
- Existing functions that are not wired in (verify, recovery, ShadowLearningHook)
- VRAM budget, critical files, modification rules

ORA target_text fix (observe_reason_act.py:217):
- Detects nonsensical target_text values ("click_anchor")
- Calls _describe_anchor_image() (VLM) to describe the crop
- Same logic as the legacy path in execute.py:893

InfiGUI worker via files in /tmp:
- File-based communication (no subprocess pipes, no HTTP)
- Independent process started before the backend
- Resolves the CUDA crash in Flask/FastAPI/uvicorn

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

2026-04-26 12:37:43 +02:00


Execution cartography: RPA Vision V3 (Léa)

Date: April 26, 2026
Purpose: a complete map of what is wired in, what is not, and how data flows.
Rule: READ THIS DOCUMENT BEFORE MODIFYING ANY CODE.

1. Entry point: two disjoint paths

POST /api/v3/execute/start  (execute.py:1528)
  ├── execution_mode = "verified"  → run_workflow_verified()  ← ORA PATH
  └── execution_mode = "basic"|"intelligent"|"debug"  → execute_workflow_thread()  ← LEGACY PATH

There are TWO separate executors that duplicate anchor loading, the step loop, grounding, and error handling. The only code they share is input_handler.py.
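
The dispatch above can be sketched in a few lines. This is an illustration only: `select_executor` and `LEGACY_MODES` are hypothetical names, the function returns executor names as strings rather than calling the real functions, and rejecting unknown modes with `ValueError` is an assumption (this document does not say how execute.py handles them).

```python
# Hypothetical sketch of the mode dispatch; not the code in execute.py.
LEGACY_MODES = {"basic", "intelligent", "debug"}


def select_executor(execution_mode: str) -> str:
    """Return the name of the executor that handles the given mode."""
    if execution_mode == "verified":
        return "run_workflow_verified"      # ORA path
    if execution_mode in LEGACY_MODES:
        return "execute_workflow_thread"    # legacy path
    # Assumption: unknown modes are rejected rather than silently defaulted.
    raise ValueError(f"unknown execution_mode: {execution_mode!r}")
```

Keeping the dispatch in one small function makes the duplication visible: everything below it exists twice.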

2. LEGACY path (basic/intelligent/debug modes)

[API] POST /execute/start (mode=intelligent)
  → [execute.py:145] execute_workflow_thread()
    → [execute.py:160] Loads steps from the DB
    → LOOP over each step:
      │
      ├─ PRE-STEP REFLEX (intelligent/debug modes)
      │   → [input_handler.py:79] check_screen_for_patterns()
      │       → UIPatternLibrary.find_pattern(ocr_text)               ← WIRED
      │   → [input_handler.py:129] handle_detected_pattern()
      │       → EasyOCR full screen + button click                    ← WIRED
      │
      ├─ ANCHOR LOADING [execute.py:222-256]
      │   params['visual_anchor'] = {
      │     screenshot: base64 of the crop,
      │     bounding_box: {x, y, width, height},
      │     target_text: anchor.target_text,      ← MAY BE EMPTY ("")
      │     description: anchor.ocr_description   ← MAY BE EMPTY ("")
      │   }
      │
      ├─ execute_action(action_type, params) [execute.py:278]
      │   │
      │   ├─ ACTION = click_anchor [execute.py:862-1096]
      │   │  │
      │   │  ├─ basic MODE: static coordinates (bbox center)
      │   │  │
      │   │  └─ intelligent/debug MODE:
      │   │     ├─ target_text = anchor.target_text || step.label
      │   │     │   If target_text == "click_anchor" and screenshot_base64 is set:
      │   │     │     → _describe_anchor_image() (VLM qwen2.5vl:3b)  ← WIRED
      │   │     │
      │   │     ├─ METHOD 1: Template matching (cv2)                 ← WIRED
      │   │     ├─ METHOD 2: CLIP matching (RF-DETR + CLIP)          ← WIRED
      │   │     ├─ METHOD 3: OCR → UI-TARS → VLM                     ← WIRED
      │   │     └─ FAILURE: interactive self-healing                 ← WIRED
      │   │
      │   ├─ ACTION = type_text → safe_type_text()                   ← WIRED
      │   ├─ ACTION = wait → sleep + pattern check                   ← WIRED
      │   ├─ ACTION = keyboard_shortcut → pyautogui.hotkey()         ← WIRED
      │   ├─ ACTION = ai_analyze_text → Ollama                       ← WIRED
      │   ├─ ACTION = extract_text → docTR OCR                       ← WIRED
      │   └─ ACTION = hover/scroll/focus → static coords             ← NO GROUNDING
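
The click_anchor branch above is an ordered fallback cascade: each method is tried in turn and the first one that returns a point wins. A minimal sketch of that pattern follows. `ground_with_fallbacks` and the `Grounder` type are hypothetical, and treating an exception in one method as a plain miss is an assumption, not something this document states about execute.py.

```python
from typing import Callable, Optional, Sequence, Tuple

Point = Tuple[int, int]
Grounder = Callable[[str], Optional[Point]]


def ground_with_fallbacks(
    target_text: str,
    methods: Sequence[Tuple[str, Grounder]],
) -> Optional[Tuple[str, Point]]:
    """Try each grounding method in order; return (method name, point) or None."""
    for name, method in methods:
        try:
            point = method(target_text)
        except Exception:
            # Assumption: a crashing method counts as a miss so the cascade continues.
            point = None
        if point is not None:
            return name, point
    # Every method missed: the caller would start interactive self-healing here.
    return None
```

With stub methods, `ground_with_fallbacks("OK", [("template", lambda t: None), ("ocr", lambda t: (10, 20))])` returns `("ocr", (10, 20))`.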

3. ORA path ("verified" mode)

[API] POST /execute/start (mode=verified)
  → [execute.py:1349] run_workflow_verified()
    → [execute.py:1380-1428] Loads steps + anchors (SAME logic as legacy)
    → [execute.py:1433] ORALoop(verify_level='none', max_retries=2)
    │                            ^^^^^^^^^^^^^^^^^^^
    │                   VERIFICATION HARD-CODED OFF
    │
    → [ORA:1478] ora.run_workflow(steps=ora_steps)
      │
      LOOP over each step:
        │
        ├─ [ORA:1258] OBSERVE: screen capture + pHash + window title
        │
        ├─ [ORA:1263] DIALOG REFLEX (if pHash changed by > 10)
        │   → DialogHandler.handle_if_dialog(screenshot)               ← WIRED
        │     → EasyOCR full screen → known dialog keywords
        │     → InfiGUI worker (/tmp/infigui_*)
        │     → Fallback: OCR click
        │
        ├─ [ORA:196] REASON: reason_workflow_step()
        │   target_text = anchor.target_text || anchor.description
        │   If empty or an action name → _describe_anchor_image()      ← FIXED Apr 26
        │   If still empty → label (unless it is an action name)
        │
        ├─ [ORA:1306] ACT → _act_click()
        │   │
        │   ├─ RPA_USE_FAST_PIPELINE=1 (default)
        │   │   → FastSmartThinkPipeline
        │   │     → FastDetector (RF-DETR 120ms + EasyOCR 192ms)       ← WIRED
        │   │     → SmartMatcher (text+type+position+neighbors <1ms)   ← WIRED
        │   │     → SignatureStore.lookup() (learning)                 ← WIRED
        │   │     → Score ≥ 0.90 → direct action                       ← WIRED
        │   │     → Score 0.60-0.90 → ThinkArbiter
        │   │       → UITarsGrounder → InfiGUI worker (/tmp)           ← WIRED
        │   │     → Score < 0.60 → ThinkArbiter alone                  ← WIRED
        │   │     → FAILURE → _try_fallback()
        │   │       → GroundingPipeline                    ← NOT WIRED (never connected)
        │   │
        │   ├─ FALLBACK template matching (cv2, >0.75)                 ← WIRED
        │   ├─ FALLBACK OCR (_grounding_ocr)                           ← WIRED
        │   └─ LAST RESORT: static coords                              ← WIRED
        │
        ├─ [ORA:1337] TITLE CHECK (post-action)
        │   → TitleVerifier → EasyOCR 45px crop                        ← WIRED
        │   *** READS NOTHING IN A VM (Windows title is inside the framebuffer) ← PROBLEM
        │
        ├─ [ORA:1358] VERIFY: verify(pre, post, decision)
        │   *** DISABLED (verify_level='none') ***                     ← NOT WIRED
        │
        └─ [ORA:1362] RECOVERY (5 strategies)
            *** NEVER REACHED ***                                      ← NOT WIRED
            - _recover_element_not_found (wait+scroll+UI-TARS)
            - _recover_overlay_blocking (pattern+Win+D)
            - _recover_wrong_screen (Alt+Tab)
            - _recover_no_effect (double-click+offset)
            - _classify_error (4 types)
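
The score thresholds in the ACT step amount to a three-way routing decision. The sketch below only encodes the thresholds listed in the tree; `route_by_score` and the returned labels are hypothetical and do not mirror the real FastSmartThinkPipeline API.

```python
def route_by_score(score: float) -> str:
    """Map a SmartMatcher score to the next stage, using the thresholds from the tree."""
    if score >= 0.90:
        return "direct_action"        # confident match: act without further reasoning
    if score >= 0.60:
        return "think_arbiter"        # ambiguous match: ThinkArbiter arbitrates
    return "think_arbiter_alone"      # weak match: ThinkArbiter alone
```

A score of exactly 0.90 takes the direct path and a score of exactly 0.60 goes to the arbiter, which matches the "≥ 0.90" and "< 0.60" bounds in the tree.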

4. Trace of the `target_text` field

CAPTURE (VWB CapturePanel → capture.py:201-263)
  → OCR on the enlarged crop (docTR)
  → VLM qwen2.5vl:3b describes the crop
  → If both fail → target_text = ""
  → No error is reported to the frontend

STORAGE (DB)
  → VisualAnchor.target_text (nullable) = "" if not provided

LOADING (execute.py:1400-1428)
  → IF anchor.target_text exists and is non-empty → injected into visual_anchor
  → OTHERWISE → the 'target_text' key DOES NOT EXIST in the dict

LEGACY (execute.py:893-907)
  → target_text = anchor.get('target_text', '')
  → IF empty AND the step label is an action name → _describe_anchor_image()   ← COMPENSATES
  → OTHERWISE → falls back to step_label

ORA (observe_reason_act.py:217), FIXED ON APRIL 26
  → target_text = anchor.target_text || anchor.description
  → IF empty or an action name → _describe_anchor_image()             ← ADDED
  → IF still empty → label (unless it is an action name)
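
The ORA resolution order can be sketched as a pure function. Everything here is illustrative: `resolve_target_text`, `is_usable` and the contents of `ACTION_NAMES` are assumptions (the set is built from the action types named in this document, the real list may differ), and `describe` stands in for `_describe_anchor_image()` so that no VLM is needed to run the example.

```python
from typing import Callable, Dict

# Assumption: action names that must never be used as a click target.
ACTION_NAMES = {
    "click_anchor", "type_text", "wait", "keyboard_shortcut",
    "ai_analyze_text", "extract_text", "hover", "scroll", "focus",
}


def is_usable(text: str) -> bool:
    """True if text is non-blank and is not merely an action name."""
    stripped = (text or "").strip()
    return bool(stripped) and stripped not in ACTION_NAMES


def resolve_target_text(anchor: Dict[str, str], label: str,
                        describe: Callable[[str], str]) -> str:
    """Resolve the text to ground on: anchor fields, then VLM description, then label."""
    # .get() tolerates a missing 'target_text' key, as produced by the loader.
    candidate = anchor.get("target_text") or anchor.get("description") or ""
    if is_usable(candidate):
        return candidate
    screenshot = anchor.get("screenshot")
    if screenshot:
        described = describe(screenshot)
        if is_usable(described):
            return described
    return label if is_usable(label) else ""
```

Returning an empty string when nothing usable is found leaves the decision (static coordinates, failure) to the caller.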

5. Existing functions that are NOT WIRED

| Function | File | Reason |
|---|---|---|
| `verify()` + `_classify_error()` + 5 `_recover_*()` | observe_reason_act.py | verify_level='none' is hard-coded |
| `GroundingPipeline` (old) | pipeline.py | set_fallback_pipeline() is never called |
| `TemplateMatcher` (centralized class) | template_matcher.py | Used only by the dead GroundingPipeline |
| `ShadowLearningHook` | shadow_learning_hook.py | Never imported in any flow |
| `CognitiveContext` | working_memory.py | Instruction mode only |
| `VLM pre-check` | observe_reason_act.py | Hard-coded `if False:` |
| hover/focus grounding | execute.py | Static coordinates only |
| `grounding/server.py` (FastAPI :8200) | server.py | CUDA crash, replaced by the file-based worker |
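
For readers unfamiliar with file-based inter-process communication, the sketch below shows one way a client and a separate worker process can exchange requests through a shared directory. It is not the actual protocol of infigui_worker.py: the file names (`infigui_req_*.json`, `infigui_resp_*.json`), the JSON payloads and all function names are assumptions. The only facts taken from this document are that communication goes through files matching /tmp/infigui_* and that the worker is an independent process.

```python
import json
import os
import time
import uuid
from pathlib import Path
from typing import Callable, Optional


def write_json_atomic(path: Path, payload: dict) -> None:
    """Write to a temp file, then rename, so readers never see a partial file."""
    tmp = path.with_suffix(path.suffix + ".tmp")
    tmp.write_text(json.dumps(payload), encoding="utf-8")
    os.replace(tmp, path)


def submit_request(workdir: Path, payload: dict) -> str:
    """Client side: drop a request file and return its id."""
    request_id = uuid.uuid4().hex
    write_json_atomic(workdir / f"infigui_req_{request_id}.json", payload)
    return request_id


def serve_once(workdir: Path, handler: Callable[[dict], dict]) -> int:
    """Worker side: answer every pending request once; return how many were handled."""
    handled = 0
    for req in sorted(workdir.glob("infigui_req_*.json")):
        request_id = req.stem[len("infigui_req_"):]
        payload = json.loads(req.read_text(encoding="utf-8"))
        write_json_atomic(workdir / f"infigui_resp_{request_id}.json", handler(payload))
        req.unlink()
        handled += 1
    return handled


def wait_response(workdir: Path, request_id: str,
                  timeout: float = 5.0, poll: float = 0.05) -> Optional[dict]:
    """Client side: poll for the response file until the timeout expires."""
    resp = workdir / f"infigui_resp_{request_id}.json"
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        if resp.exists():
            data = json.loads(resp.read_text(encoding="utf-8"))
            resp.unlink()
            return data
        time.sleep(poll)
    return None
```

The write-then-rename step matters: without it, a poller could read a half-written JSON file. In a real deployment the worker would call `serve_once` in a loop inside its own process, which keeps the CUDA context out of the web server.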

6. The 12 grounding systems

| # | System | File | Wired? |
|---|---|---|---|
| 1 | Inline template matching (legacy) | execute.py:914 | ✅ Legacy |
| 2 | Inline template matching (ORA) | ORA:1475 | ✅ ORA fallback |
| 3 | CLIP matching (IntelligentExecutor) | intelligent_executor.py | ✅ Legacy |
| 4 | OCR docTR (_grounding_ocr) | input_handler.py:430 | ✅ Legacy + ORA |
| 5 | UI-TARS Ollama (_grounding_ui_tars) | input_handler.py:513 | ✅ Legacy |
| 6 | VLM reasoning (_grounding_vlm) | input_handler.py:627 | ✅ Legacy only |
| 7 | FastDetector (RF-DETR + EasyOCR) | fast_detector.py | ✅ ORA |
| 8 | SmartMatcher | smart_matcher.py | ✅ ORA |
| 9 | ThinkArbiter → InfiGUI worker | think_arbiter.py + ui_tars_grounder.py | ✅ ORA |
| 10 | DialogHandler → InfiGUI | dialog_handler.py | ✅ ORA reflex |
| 11 | GroundingPipeline (old) | pipeline.py | ❌ Never connected |
| 12 | TemplateMatcher class | template_matcher.py | ❌ Only via the dead GroundingPipeline |

7. Dialog handling (2 parallel systems)

| # | System | Pattern base | OCR | Click | Used by |
|---|---|---|---|---|---|
| 1 | UIPatternLibrary + handle_detected_pattern | 28 built-in patterns | docTR/EasyOCR | OCR button lookup | Legacy |
| 2 | DialogHandler + KNOWN_DIALOGS | 15 known titles | EasyOCR full screen | InfiGUI | ORA |

8. VRAM budget (current configuration)

| Component | VRAM | Process |
|---|---|---|
| InfiGUI-G1-3B (NF4) | 2.41 GB | Independent worker (/tmp) |
| RF-DETR Medium | 0.8 GB | Flask process |
| EasyOCR | ~1 GB (GPU) | Flask process |
| Ollama qwen2.5vl:3b (when called) | ~3.2 GB | Ollama process |
| Chrome + system | ~1.3 GB | - |
| Max total | ~8.7 GB / 12 GB | |
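
The total can be re-derived from the per-component figures, which is useful when a model is swapped. The sketch assumes that the "12 GB" in the last row is the capacity of the GPU; `vram_summary` and the dictionary keys are illustrative names.

```python
from typing import Dict, Tuple

# Per-component figures copied from the table above (GB, several are approximate).
VRAM_GB: Dict[str, float] = {
    "InfiGUI-G1-3B (NF4)": 2.41,
    "RF-DETR Medium": 0.8,
    "EasyOCR": 1.0,
    "Ollama qwen2.5vl:3b": 3.2,
    "Chrome + system": 1.3,
}


def vram_summary(budget: Dict[str, float], capacity_gb: float) -> Tuple[float, float]:
    """Return (used, headroom) in GB, rounded to two decimals."""
    used = round(sum(budget.values()), 2)
    return used, round(capacity_gb - used, 2)


used, headroom = vram_summary(VRAM_GB, capacity_gb=12.0)  # assumption: 12 GB card
print(f"used={used} GB, headroom={headroom} GB")
```

The components sum to 8.71 GB, consistent with the "~8.7 GB" in the table, which leaves about 3.29 GB of headroom on a 12 GB card. Since qwen2.5vl:3b is loaded only when called, the resident total without it is about 5.51 GB.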

9. Critical files, in order of importance

core/execution/observe_reason_act.py: ORA loop, _act_click, reason, verify
visual_workflow_builder/backend/api_v3/execute.py: API, anchor loading, legacy executor
core/grounding/fast_pipeline.py: FAST→SMART→THINK pipeline
core/grounding/ui_tars_grounder.py: InfiGUI worker client
core/grounding/infigui_worker.py: InfiGUI worker (independent process)
core/execution/input_handler.py: OCR, UI-TARS Ollama, safe_type_text, patterns
core/grounding/dialog_handler.py: ORA dialog handling
core/grounding/fast_detector.py: RF-DETR + EasyOCR
core/grounding/smart_matcher.py: contextual matching
core/knowledge/ui_patterns.py: reflex patterns

Last updated: April 26, 2026
Next action: re-wire verify + recovery, converge the 2 executors, clean up the dead code.
