feat(vwb): Intégration UI-DETR-1 + Toggle mode Basique/Intelligent/Debug

- Toggle 3 modes dans le header: Basique (coords fixes), Intelligent (vision IA), Debug (overlay) - Service UI-DETR-1 pour détection d'éléments UI (510MB model, ~800ms/image) - API endpoints: /api/ui-detection/detect, /preload, /status, /find-element - Overlay des bboxes détectées en mode Debug (miniature + plein écran) - Clic sur élément détecté pour le sélectionner comme ancre - Document de vision produit: docs/VISION_RPA_INTELLIGENT.md - Configuration CORS étendue pour ports locaux Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-23 14:13:32 +01:00
parent 483653a0b4
commit d8d086dac5
11 changed files with 1456 additions and 19 deletions
--- a/docs/VISION_RPA_INTELLIGENT.md
+++ b/docs/VISION_RPA_INTELLIGENT.md
@@ -0,0 +1,242 @@
 # RPA Vision - Architecture et Vision Produit
 > Document de référence - Janvier 2026
 ## Vision Globale
 RPA Vision n'est **pas** un simple enregistreur de macros. C'est un **agent RPA apprenant** qui fonctionne comme un stagiaire :
 1. **Il observe** - Capture les démonstrations humaines
 2. **Il apprend** - Stocke les patterns visuels (embeddings CLIP)
 3. **Il essaie** - Exécute avec supervision
 4. **Il se trompe** - Détecte ses erreurs
 5. **On le corrige** - Feedback humain
 6. **Il devient autonome** - Généralise à de nouveaux cas
 ## Architecture Technique
 ### Machine Cible
 - **CPU** : Ryzen 9 9050X
 - **RAM** : 128 Go DDR5 4040 MHz
 - **GPU** : NVIDIA RTX 5090 12 Go
 Cette configuration permet de tout faire tourner en local (pas de cloud).
 ### Composants Principaux
 ```
 rpa_vision_v3/
 ├── core/
 │   ├── learning/          # Mémoire + apprentissage continu
 │   ├── embedding/         # Représentation visuelle (CLIP)
 │   ├── healing/           # Auto-correction (self-healing)
 │   └── workflow/          # Orchestration
 │
 ├── agent_v0/              # Agents autonomes (produit final)
 │
 ├── web_dashboard/         # Interface de supervision (produit final)
 │
 └── visual_workflow_builder/  # VWB (outil transitoire)
 ```
 ### Pipeline de Détection (Hybride)
 ```
 ┌─────────────────────────────────────────────────────────────────┐
 │                        ÉCRAN CAPTURÉ                            │
 └─────────────────────────────────────────────────────────────────┘
                              │
                              ▼
              ┌───────────────────────────────┐
              │      MODE 1: UI MAP JSON      │
              │  (OmniParser / UI-DETR-1)     │
              │  → Liste de bboxes + scores   │
              └───────────────────────────────┘
                              │
                              ▼
              ┌───────────────────────────────┐
              │      DÉCIDEUR (LLM local)     │
              │  Choisit ID élément           │
              │  (pas de coords libres)       │
              └───────────────────────────────┘
                              │
                    ┌─────────┴─────────┐
                    │                   │
            Élément trouvé?      UI map bruitée?
                    │                   │
                    ▼                   ▼
        ┌─────────────────┐   ┌─────────────────┐
        │ REFINE          │   │ MODE 2: GROUND  │
        │ (crop + Moondream│   │ (SeeClick)      │
        │  ou OpenCV)     │   │ → (x,y) direct  │
        └─────────────────┘   └─────────────────┘
                    │                   │
                    └─────────┬─────────┘
                              ▼
              ┌───────────────────────────────┐
              │         VERIFY                │
              │  L'action a-t-elle eu effet?  │
              └───────────────────────────────┘
 ```
 ### Modèles Utilisés
 | Fonction | Modèle | Licence |
 |----------|--------|---------|
 | UI Map (détection) | UI-DETR-1 ou OmniParser | OK commercial |
 | Grounding (fallback) | SeeClick | OK commercial |
 | Embeddings visuels | CLIP | MIT |
 | Décideur | LLM local (Mistral/LLaMA 7-13B) | Apache/LLaMA |
 ### Terrain de jeu
 - Desktop natif Windows/Linux
 - Citrix / VDI (images compressées)
 - Petites icônes (cas difficile)
 ---
 ## Rôle de VWB (Visual Workflow Builder)
 VWB est un **outil transitoire**, pas le produit final.
 ### Utilité
 | Fonction | Description |
 |----------|-------------|
 | **Démo commerciale** | Interface visuelle impressionnante pour prospects |
 | **Bootstrap** | Créer rapidement des exemples d'apprentissage |
 | **Correction** | Humain corrige les erreurs de l'agent via UI |
 | **Accélérateur** | Génère des données d'entraînement validées |
 ### Évolution prévue
 | Aujourd'hui (VWB) | Demain (Produit final) |
 |-------------------|------------------------|
 | Interface drag & drop | Instructions texte/vocal |
 | Workflows manuels | Workflows générés par l'agent |
 | Humain dessine le chemin | Agent déduit le chemin |
 | VWB + Dashboard | Dashboard + Agents seuls |
 ---
 ## Modes d'Exécution VWB
 ### Toggle Global (3 modes)
 ```
 ┌─────────────────────────────────────────────┐
 │ ○ Basique  │ ● Intelligent │ ○ Debug      │
 └─────────────────────────────────────────────┘
 ```
 ### Comparaison des modes
 | Fonction | Basique | Intelligent | Debug |
 |----------|---------|-------------|-------|
 | Localisation | Coordonnées fixes | UI-DETR + CLIP | UI-DETR + CLIP |
 | Décision | Séquentiel strict | LLM choisit | LLM choisit |
 | Self-healing | OFF | ON | ON |
 | Vérification | Aucune | Après chaque action | Après chaque action |
 | Overlay visuel | Aucun | Aucun | Bboxes + scores |
 | Vitesse | Rapide | Plus lent | Plus lent |
 | Usage | Démo simple | Démo "magie" | Debug interne |
 ---
 ## Scénario de Démo Type
 ### Acte 1 : "Le robot classique"
 **[Mode: BASIQUE]**
 - Montrer un workflow simple : login → recherche → export
 - Exécution fluide, rapide, prévisible
 - Message : "Voici ce que font les RPA classiques"
 ### Acte 2 : "Le problème"
 **[Mode: BASIQUE]**
 - Modifier légèrement l'interface (bouton déplacé, thème différent)
 - Relancer le workflow
 - **ÇA CASSE**
 - Message : "Voilà pourquoi les RPA coûtent cher en maintenance"
 ### Acte 3 : "La magie"
 **[Mode: INTELLIGENT]**
 - Même workflow, même interface modifiée
 - L'agent cherche, trouve, s'adapte
 - **ÇA MARCHE**
 - Message : "Notre système apprend et s'adapte"
 ### Acte 4 : "Le futur"
 - Montrer le dashboard avec les agents
 - Message : "À terme, plus besoin de dessiner. Vous lui dites ce que vous voulez."
 ---
 ## Compréhension des Intentions
 Le système utilise une approche **hybride** :
 1. **Matching sémantique** : Compare l'instruction avec les workflows connus (embeddings)
 2. **Planification LLM** : Si pas de match direct, le LLM local décompose l'instruction en étapes
 ```
 Instruction: "Créer une facture pour le client Dupont"
                              │
                              ▼
              ┌───────────────────────────────┐
              │  Matching workflows connus    │
              │  "créer facture" → 85% match  │
              └───────────────────────────────┘
                              │
                    Match trouvé?
                    │         │
                   OUI       NON
                    │         │
                    ▼         ▼
            Exécuter      LLM planifie
            workflow      nouvelles étapes
            existant      (généralisation)
 ```
 ---
 ## Données d'Apprentissage
 VWB génère des données pour entraîner le moteur principal :
 ### Format d'export
 ```json
 {
  "screenshot_before": "base64...",
  "action": {
    "type": "click_anchor",
    "target_description": "Bouton Valider",
    "anchor_embedding": [0.12, -0.34, ...],
    "coordinates": {"x": 450, "y": 230}
  },
  "screenshot_after": "base64...",
  "success": true,
  "human_validated": true
 }
 ```
 ### Boucle d'apprentissage
 1. Agent propose une action
 2. Humain valide/corrige via VWB
 3. Correction stockée dans learning repository
 4. Modèle s'améliore (fine-tuning incrémental)
 ---
 ## Prochaines Étapes
 1. [x] Frontend VWB v4 avec React Flow
 2. [ ] Toggle Mode Basique/Intelligent/Debug
 3. [ ] Intégration UI-DETR-1 pour détection
 4. [ ] Intégration SeeClick en fallback
 5. [ ] Overlay Debug (affichage bboxes)
 6. [ ] Export données d'apprentissage
 7. [ ] Connexion au moteur principal
 ---
 *Document créé le 23 janvier 2026*
--- a/visual_workflow_builder/backend/api/ui_detection.py
+++ b/visual_workflow_builder/backend/api/ui_detection.py
@@ -0,0 +1,237 @@
 """
 API Blueprint pour la détection UI avec UI-DETR-1
 """
 from flask import Blueprint, request, jsonify
 from flask_cors import cross_origin
 import base64
 import io
 from PIL import Image
 ui_detection_bp = Blueprint('ui_detection', __name__, url_prefix='/api/ui-detection')
 # Import lazy du service (le modèle est lourd)
 _service = None
 def get_service():
    """Lazy loading du service de détection"""
    global _service
    if _service is None:
        from services.ui_detection_service import (
            detect_from_base64,
            detect_from_file,
            annotated_image_to_base64,
            preload_model
        )
        _service = {
            'detect_from_base64': detect_from_base64,
            'detect_from_file': detect_from_file,
            'annotated_image_to_base64': annotated_image_to_base64,
            'preload_model': preload_model
        }
    return _service
@ui_detection_bp.route('/detect', methods=['POST'])
@cross_origin()
 def detect_ui_elements():
    """
    Détecte les éléments UI dans une image
    Request body (JSON):
        - image_base64: Image encodée en base64 (requis)
        - threshold: Seuil de confiance (optionnel, défaut: 0.35)
        - annotate: Retourner l'image annotée (optionnel, défaut: false)
        - show_confidence: Afficher les scores sur l'image annotée (optionnel, défaut: false)
    Response:
        - success: bool
        - result: {
            elements: [...],
            count: int,
            processing_time_ms: float,
            image_size: {width, height},
            model: str,
            annotated_image_base64?: str (si annotate=true)
        }
    """
    try:
        data = request.get_json()
        if not data or 'image_base64' not in data:
            return jsonify({
                'success': False,
                'error': 'image_base64 est requis'
            }), 400
        image_base64 = data['image_base64']
        threshold = data.get('threshold', 0.35)
        annotate = data.get('annotate', False)
        show_confidence = data.get('show_confidence', False)
        # Valider le threshold
        threshold = max(0.1, min(1.0, float(threshold)))
        service = get_service()
        # Détecter les éléments
        result = service['detect_from_base64'](image_base64, threshold)
        response_data = result.to_dict()
        # Générer l'image annotée si demandé
        if annotate:
            # Décoder l'image originale
            if ',' in image_base64:
                image_base64_clean = image_base64.split(',')[1]
            else:
                image_base64_clean = image_base64
            image_bytes = base64.b64decode(image_base64_clean)
            image = Image.open(io.BytesIO(image_bytes))
            # Créer l'image annotée
            annotated_b64 = service['annotated_image_to_base64'](
                image, result,
                show_ids=True,
                show_confidence=show_confidence
            )
            response_data['annotated_image_base64'] = f"data:image/png;base64,{annotated_b64}"
        return jsonify({
            'success': True,
            'result': response_data
        })
    except Exception as e:
        import traceback
        traceback.print_exc()
        return jsonify({
            'success': False,
            'error': str(e)
        }), 500
@ui_detection_bp.route('/preload', methods=['POST'])
@cross_origin()
 def preload_model():
    """
    Précharge le modèle UI-DETR-1 en mémoire
    Utile pour éviter la latence du premier appel
    """
    try:
        service = get_service()
        service['preload_model']()
        return jsonify({
            'success': True,
            'message': 'Modèle en cours de chargement'
        })
    except Exception as e:
        return jsonify({
            'success': False,
            'error': str(e)
        }), 500
@ui_detection_bp.route('/status', methods=['GET'])
@cross_origin()
 def get_status():
    """
    Retourne le statut du service de détection
    """
    try:
        from services.ui_detection_service import _model, MODEL_PATH
        import os
        model_exists = os.path.exists(MODEL_PATH)
        model_loaded = _model is not None
        return jsonify({
            'success': True,
            'status': {
                'model_path': MODEL_PATH,
                'model_exists': model_exists,
                'model_loaded': model_loaded,
                'model_name': 'UI-DETR-1',
                'default_threshold': 0.35
            }
        })
    except Exception as e:
        return jsonify({
            'success': False,
            'error': str(e)
        }), 500
@ui_detection_bp.route('/find-element', methods=['POST'])
@cross_origin()
 def find_element():
    """
    Trouve un élément spécifique dans l'image en utilisant une ancre de référence
    Request body (JSON):
        - image_base64: Screenshot actuel
        - anchor_base64: Image de l'ancre à trouver
        - threshold: Seuil de confiance (optionnel)
    Response:
        - success: bool
        - result: {
            found: bool,
            element: {...} ou null,
            all_elements: [...],
            match_score: float
        }
    Note: Cette fonction utilise la détection + comparaison d'embedding CLIP
    """
    try:
        data = request.get_json()
        if not data or 'image_base64' not in data:
            return jsonify({
                'success': False,
                'error': 'image_base64 est requis'
            }), 400
        image_base64 = data['image_base64']
        anchor_base64 = data.get('anchor_base64')
        threshold = data.get('threshold', 0.35)
        service = get_service()
        # Détecter tous les éléments
        result = service['detect_from_base64'](image_base64, threshold)
        response = {
            'found': False,
            'element': None,
            'all_elements': [e.to_dict() for e in result.elements],
            'count': len(result.elements),
            'match_score': 0.0
        }
        # Si une ancre est fournie, essayer de la matcher
        if anchor_base64 and len(result.elements) > 0:
            # TODO: Intégrer CLIP pour le matching d'ancre
            # Pour l'instant, retourner le premier élément comme placeholder
            response['found'] = True
            response['element'] = result.elements[0].to_dict()
            response['match_score'] = 0.5  # Placeholder
        return jsonify({
            'success': True,
            'result': response
        })
    except Exception as e:
        import traceback
        traceback.print_exc()
        return jsonify({
            'success': False,
            'error': str(e)
        }), 500
--- a/visual_workflow_builder/backend/app.py
+++ b/visual_workflow_builder/backend/app.py
@@ -39,10 +39,10 @@ socketio = SocketIO(
    engineio_logger=True
 )
-# Enable CORS
+# Enable CORS - autoriser tous les ports locaux en développement
 CORS(app, resources={
    r"/api/*": {
-        "origins": os.getenv('CORS_ORIGINS', 'http://localhost:3000').split(','),
+        "origins": os.getenv('CORS_ORIGINS', 'http://localhost:3000,http://localhost:3001,http://localhost:3002,http://localhost:3003,http://localhost:3004,http://localhost:5173').split(','),
        "methods": ["GET", "POST", "PUT", "DELETE", "OPTIONS"],
        "allow_headers": ["Content-Type", "Authorization"]
    }
@@ -150,6 +150,14 @@ try:
 except ImportError as e:
    print(f"⚠️  Blueprint anchor_images désactivé: {e}")
 # API UI Detection - UI-DETR-1
 try:
    from api.ui_detection import ui_detection_bp
    app.register_blueprint(ui_detection_bp)
    print("✅ Blueprint ui_detection (UI-DETR-1) enregistré - /api/ui-detection/*")
 except ImportError as e:
    print(f"⚠️  Blueprint ui_detection désactivé: {e}")
 # ============================================================
 # API V3 - Thin Client Architecture (Source de Vérité Unique)
 # ============================================================
--- a/visual_workflow_builder/backend/services/ui_detection_service.py
+++ b/visual_workflow_builder/backend/services/ui_detection_service.py
@@ -0,0 +1,298 @@
 """
 Service de détection UI utilisant UI-DETR-1
 Détecte les éléments d'interface utilisateur dans un screenshot
 """
 import os
 import time
 import base64
 import io
 from typing import List, Dict, Any, Optional
 from dataclasses import dataclass
 import numpy as np
 from PIL import Image
 # Configuration du modèle
 MODEL_PATH = "/home/dom/ai/rpa_vision_v3/models/ui-detr-1/model.pth"
 CONFIDENCE_THRESHOLD = 0.35
 RESOLUTION = 1600
 # Instance globale du modèle (lazy loading)
 _model = None
 _model_loading = False
@dataclass
 class UIElement:
    """Élément UI détecté"""
    id: int
    bbox: Dict[str, int]  # x1, y1, x2, y2
    center: Dict[str, int]  # x, y
    confidence: float
    area: int
    def to_dict(self) -> Dict[str, Any]:
        return {
            "id": self.id,
            "bbox": self.bbox,
            "center": self.center,
            "confidence": round(self.confidence, 3),
            "area": self.area
        }
@dataclass
 class DetectionResult:
    """Résultat de détection"""
    elements: List[UIElement]
    processing_time_ms: float
    image_size: Dict[str, int]
    model_name: str = "UI-DETR-1"
    def to_dict(self) -> Dict[str, Any]:
        return {
            "elements": [e.to_dict() for e in self.elements],
            "count": len(self.elements),
            "processing_time_ms": round(self.processing_time_ms, 1),
            "image_size": self.image_size,
            "model": self.model_name
        }
 def load_model():
    """Charge le modèle UI-DETR-1 (lazy loading)"""
    global _model, _model_loading
    if _model is not None:
        return _model
    if _model_loading:
        # Attendre que le chargement soit terminé
        while _model_loading and _model is None:
            time.sleep(0.1)
        return _model
    _model_loading = True
    try:
        print(f"[UI-DETR-1] Chargement du modèle depuis {MODEL_PATH}...")
        start = time.time()
        from rfdetr.detr import RFDETRMedium
        if not os.path.exists(MODEL_PATH):
            raise FileNotFoundError(f"Modèle non trouvé: {MODEL_PATH}")
        _model = RFDETRMedium(pretrain_weights=MODEL_PATH, resolution=RESOLUTION)
        elapsed = time.time() - start
        print(f"[UI-DETR-1] Modèle chargé en {elapsed:.1f}s")
        return _model
    except Exception as e:
        print(f"[UI-DETR-1] Erreur chargement modèle: {e}")
        _model_loading = False
        raise
    finally:
        _model_loading = False
 def detect_ui_elements(
    image: Image.Image,
    threshold: float = CONFIDENCE_THRESHOLD
 ) -> DetectionResult:
    """
    Détecte les éléments UI dans une image
    Args:
        image: Image PIL
        threshold: Seuil de confiance (0-1)
    Returns:
        DetectionResult avec la liste des éléments détectés
    """
    start_time = time.time()
    # Charger le modèle
    model = load_model()
    # Convertir en numpy array RGB
    image_np = np.array(image.convert('RGB'))
    # Exécuter la détection
    detections = model.predict(image_np, threshold=threshold)
    # Parser les résultats
    elements = []
    boxes = detections.xyxy  # [x1, y1, x2, y2]
    scores = detections.confidence
    for i, (box, score) in enumerate(zip(boxes, scores)):
        x1, y1, x2, y2 = map(int, box)
        element = UIElement(
            id=i,
            bbox={"x1": x1, "y1": y1, "x2": x2, "y2": y2},
            center={"x": (x1 + x2) // 2, "y": (y1 + y2) // 2},
            confidence=float(score),
            area=(x2 - x1) * (y2 - y1)
        )
        elements.append(element)
    # Trier par position (haut-gauche vers bas-droite)
    elements.sort(key=lambda e: (e.bbox["y1"], e.bbox["x1"]))
    # Réassigner les IDs après tri
    for i, elem in enumerate(elements):
        elem.id = i
    processing_time = (time.time() - start_time) * 1000
    return DetectionResult(
        elements=elements,
        processing_time_ms=processing_time,
        image_size={"width": image.width, "height": image.height}
    )
 def detect_from_base64(
    image_base64: str,
    threshold: float = CONFIDENCE_THRESHOLD
 ) -> DetectionResult:
    """
    Détecte les éléments UI depuis une image base64
    Args:
        image_base64: Image encodée en base64 (avec ou sans préfixe data:image/...)
        threshold: Seuil de confiance
    Returns:
        DetectionResult
    """
    # Retirer le préfixe data:image/... si présent
    if ',' in image_base64:
        image_base64 = image_base64.split(',')[1]
    # Décoder
    image_bytes = base64.b64decode(image_base64)
    image = Image.open(io.BytesIO(image_bytes))
    return detect_ui_elements(image, threshold)
 def detect_from_file(
    file_path: str,
    threshold: float = CONFIDENCE_THRESHOLD
 ) -> DetectionResult:
    """
    Détecte les éléments UI depuis un fichier image
    Args:
        file_path: Chemin vers l'image
        threshold: Seuil de confiance
    Returns:
        DetectionResult
    """
    image = Image.open(file_path)
    return detect_ui_elements(image, threshold)
 def create_annotated_image(
    image: Image.Image,
    detection_result: DetectionResult,
    show_ids: bool = True,
    show_confidence: bool = False
 ) -> Image.Image:
    """
    Crée une image annotée avec les bboxes et IDs
    Args:
        image: Image originale
        detection_result: Résultat de détection
        show_ids: Afficher les numéros d'ID
        show_confidence: Afficher les scores de confiance
    Returns:
        Image annotée
    """
    from PIL import ImageDraw, ImageFont
    # Copier l'image
    annotated = image.copy()
    draw = ImageDraw.Draw(annotated)
    # Essayer de charger une police, sinon utiliser la police par défaut
    try:
        font = ImageFont.truetype("/usr/share/fonts/truetype/dejavu/DejaVuSans-Bold.ttf", 14)
        small_font = ImageFont.truetype("/usr/share/fonts/truetype/dejavu/DejaVuSans.ttf", 10)
    except:
        font = ImageFont.load_default()
        small_font = font
    # Couleurs pour les bboxes
    bbox_color = (233, 69, 96)  # Rouge/rose
    text_bg_color = (233, 69, 96)
    text_color = (255, 255, 255)
    for elem in detection_result.elements:
        bbox = elem.bbox
        x1, y1, x2, y2 = bbox["x1"], bbox["y1"], bbox["x2"], bbox["y2"]
        # Dessiner la bbox
        draw.rectangle([x1, y1, x2, y2], outline=bbox_color, width=2)
        if show_ids:
            # Texte à afficher
            label = str(elem.id)
            if show_confidence:
                label += f" ({elem.confidence:.0%})"
            # Mesurer le texte
            text_bbox = draw.textbbox((0, 0), label, font=font)
            text_width = text_bbox[2] - text_bbox[0]
            text_height = text_bbox[3] - text_bbox[1]
            # Position du label (en haut à gauche de la bbox)
            label_x = x1
            label_y = y1 - text_height - 4
            if label_y < 0:
                label_y = y1 + 2
            # Fond du label
            draw.rectangle(
                [label_x - 2, label_y - 2, label_x + text_width + 4, label_y + text_height + 2],
                fill=text_bg_color
            )
            # Texte du label
            draw.text((label_x, label_y), label, fill=text_color, font=font)
    return annotated
 def annotated_image_to_base64(
    image: Image.Image,
    detection_result: DetectionResult,
    show_ids: bool = True,
    show_confidence: bool = False
 ) -> str:
    """
    Crée une image annotée et la retourne en base64
    """
    annotated = create_annotated_image(image, detection_result, show_ids, show_confidence)
    buffer = io.BytesIO()
    annotated.save(buffer, format='PNG')
    buffer.seek(0)
    return base64.b64encode(buffer.read()).decode('utf-8')
 # Préchargement optionnel
 def preload_model():
    """Précharge le modèle en arrière-plan"""
    import threading
    thread = threading.Thread(target=load_model, daemon=True)
    thread.start()
--- a/visual_workflow_builder/frontend_v4/src/App.tsx
+++ b/visual_workflow_builder/frontend_v4/src/App.tsx
@@ -11,14 +11,15 @@ import type { Node, Edge, NodeTypes } from '@xyflow/react';
 import '@xyflow/react/dist/style.css';
 import * as api from './services/api';
-import type { AppState, Step, ActionType, Capture } from './types';
+import type { AppState, Step, ActionType, Capture, ExecutionMode } from './types';
-import { ACTIONS } from './types';
+import { ACTIONS, EXECUTION_MODES } from './types';
 import StepNode from './components/StepNode';
 import ToolPalette from './components/ToolPalette';
 import PropertiesPanel from './components/PropertiesPanel';
 import CapturePanel from './components/CapturePanel';
 import WorkflowList from './components/WorkflowList';
 import ExecutionControls from './components/ExecutionControls';
 import ExecutionModeToggle from './components/ExecutionModeToggle';
 const nodeTypes: NodeTypes = {
  step: StepNode,
@@ -30,6 +31,7 @@ function App() {
  const [edges, setEdges, onEdgesChange] = useEdgesState<Edge>([]);
  const [capture, setCapture] = useState<Capture | null>(null);
  const [error, setError] = useState<string | null>(null);
  const [executionMode, setExecutionMode] = useState<ExecutionMode>('basic');
  // Charger l'état initial
  const loadState = useCallback(async () => {
@@ -229,6 +231,10 @@ function App() {
      {/* Header */}
      <header className="header">
        <h1>VWB - Visual Workflow Builder</h1>
        <ExecutionModeToggle
          mode={executionMode}
          onChange={setExecutionMode}
        />
        <ExecutionControls
          execution={appState?.execution || null}
          onStart={handleStartExecution}
@@ -292,9 +298,16 @@ function App() {
            onCapture={handleCapture}
            onSelectAnchor={handleSelectAnchor}
            hasSelectedStep={!!appState?.session.selected_step_id}
            executionMode={executionMode}
          />
        </aside>
      </div>
      {/* Indicateur de mode flottant */}
      <div className={`mode-indicator ${executionMode}`}>
        <span>{EXECUTION_MODES[executionMode].icon}</span>
        <span>Mode {EXECUTION_MODES[executionMode].label}</span>
      </div>
    </div>
  );
 }
--- a/visual_workflow_builder/frontend_v4/src/components/CapturePanel.tsx
+++ b/visual_workflow_builder/frontend_v4/src/components/CapturePanel.tsx
@@ -1,11 +1,15 @@
 import { useState, useRef, useEffect } from 'react';
-import type { Capture } from '../types';
+import type { Capture, ExecutionMode } from '../types';
 import DetectionOverlay from './DetectionOverlay';
 import type { UIElement, DetectionResult } from '../services/uiDetection';
 interface Props {
  capture: Capture | null;
  onCapture: () => void;
  onSelectAnchor: (bbox: { x: number; y: number; width: number; height: number }, screenshotBase64?: string) => void;
  hasSelectedStep: boolean;
  executionMode?: ExecutionMode;
  onDetectionComplete?: (result: DetectionResult) => void;
 }
 interface LibraryItem {
@@ -14,12 +18,42 @@ interface LibraryItem {
  timestamp: Date;
 }
-export default function CapturePanel({ capture, onCapture, onSelectAnchor, hasSelectedStep }: Props) {
+export default function CapturePanel({
  capture,
  onCapture,
  onSelectAnchor,
  hasSelectedStep,
  executionMode = 'basic',
  onDetectionComplete
 }: Props) {
  const [isFullscreen, setIsFullscreen] = useState(false);
  const [library, setLibrary] = useState<LibraryItem[]>([]);
  const [currentCapture, setCurrentCapture] = useState<Capture | null>(null);
  const [timerSeconds, setTimerSeconds] = useState(0);
  const [countdown, setCountdown] = useState<number | null>(null);
  const [lastDetection, setLastDetection] = useState<DetectionResult | null>(null);
  const isDebugMode = executionMode === 'debug';
  const handleDetectionComplete = (result: DetectionResult) => {
    setLastDetection(result);
    if (onDetectionComplete) {
      onDetectionComplete(result);
    }
  };
  const handleElementClick = (element: UIElement) => {
    // En mode debug, cliquer sur un élément détecté le sélectionne comme ancre
    if (hasSelectedStep && currentCapture) {
      const bbox = {
        x: element.bbox.x1,
        y: element.bbox.y1,
        width: element.bbox.x2 - element.bbox.x1,
        height: element.bbox.y2 - element.bbox.y1,
      };
      onSelectAnchor(bbox, currentCapture.screenshot_base64);
    }
  };
  // Charger la bibliothèque depuis sessionStorage
  useEffect(() => {
@@ -99,13 +133,26 @@ export default function CapturePanel({ capture, onCapture, onSelectAnchor, hasSe
      {/* Aperçu de la capture */}
      {currentCapture && (
        <div className="capture-preview">
-          <img
+          {isDebugMode ? (
-            src={`data:image/png;base64,${currentCapture.screenshot_base64}`}
+            <DetectionOverlay
-            alt="Capture"
+              imageBase64={`data:image/png;base64,${currentCapture.screenshot_base64}`}
-            onClick={() => setIsFullscreen(true)}
+              enabled={true}
-          />
+              threshold={0.35}
              onDetectionComplete={handleDetectionComplete}
              onElementClick={handleElementClick}
            />
          ) : (
            <img
              src={`data:image/png;base64,${currentCapture.screenshot_base64}`}
              alt="Capture"
              onClick={() => setIsFullscreen(true)}
            />
          )}
          <p className="capture-info">
            {currentCapture.width}x{currentCapture.height}
            {isDebugMode && lastDetection && (
              <span className="detection-summary"> | {lastDetection.count} éléments détectés</span>
            )}
            <button onClick={() => setIsFullscreen(true)}>Plein écran</button>
          </p>
        </div>
@@ -147,6 +194,7 @@ export default function CapturePanel({ capture, onCapture, onSelectAnchor, hasSe
            setIsFullscreen(false);
          }}
          enabled={hasSelectedStep}
          debugMode={isDebugMode}
        />
      )}
    </div>
@@ -158,18 +206,68 @@ function FullscreenSelector({
  capture,
  onClose,
  onSelect,
-  enabled
+  enabled,
  debugMode = false
 }: {
  capture: Capture;
  onClose: () => void;
  onSelect: (bbox: { x: number; y: number; width: number; height: number }) => void;
  enabled: boolean;
  debugMode?: boolean;
 }) {
  const imgRef = useRef<HTMLImageElement>(null);
  const overlayRef = useRef<HTMLDivElement>(null);
  const [isSelecting, setIsSelecting] = useState(false);
  const [startPos, setStartPos] = useState({ x: 0, y: 0 });
  const [selection, setSelection] = useState({ x: 0, y: 0, width: 0, height: 0 });
  const [detectedElements, setDetectedElements] = useState<UIElement[]>([]);
  const [isDetecting, setIsDetecting] = useState(false);
  const [imageScale, setImageScale] = useState({ x: 1, y: 1 });
  // Lancer la détection en mode Debug
  useEffect(() => {
    if (!debugMode) return;
    const runDetection = async () => {
      setIsDetecting(true);
      try {
        const { detectUIElements } = await import('../services/uiDetection');
        const result = await detectUIElements(
          `data:image/png;base64,${capture.screenshot_base64}`,
          { threshold: 0.35 }
        );
        setDetectedElements(result.elements);
      } catch (err) {
        console.error('Erreur détection:', err);
      } finally {
        setIsDetecting(false);
      }
    };
    runDetection();
  }, [debugMode, capture.screenshot_base64]);
  // Calculer le scale quand l'image est chargée
  const handleImageLoad = () => {
    if (imgRef.current) {
      setImageScale({
        x: imgRef.current.width / imgRef.current.naturalWidth,
        y: imgRef.current.height / imgRef.current.naturalHeight
      });
    }
  };
  // Cliquer sur un élément détecté
  const handleElementClick = (elem: UIElement) => {
    if (!enabled) return;
    const bbox = {
      x: elem.bbox.x1,
      y: elem.bbox.y1,
      width: elem.bbox.x2 - elem.bbox.x1,
      height: elem.bbox.y2 - elem.bbox.y1,
    };
    onSelect(bbox);
  };
  useEffect(() => {
    const handleKeyDown = (e: KeyboardEvent) => {
@@ -232,7 +330,11 @@ function FullscreenSelector({
  return (
    <div className="fullscreen-modal">
      <div className="fullscreen-header">
-        <span>{enabled ? 'Dessinez un rectangle pour sélectionner l\'ancre' : 'Sélectionnez d\'abord une étape'}</span>
+        <span>
          {debugMode && isDetecting && '🔍 Détection en cours... '}
          {debugMode && !isDetecting && `🎯 ${detectedElements.length} éléments détectés - `}
          {enabled ? 'Dessinez un rectangle ou cliquez sur un élément détecté' : 'Sélectionnez d\'abord une étape'}
        </span>
        <button onClick={onClose}>Fermer (Échap)</button>
      </div>
      <div
@@ -241,12 +343,55 @@ function FullscreenSelector({
        onMouseMove={handleMouseMove}
        onMouseUp={handleMouseUp}
      >
-        <img
+        {/* Conteneur relatif pour positionner les bboxes par rapport à l'image */}
-          ref={imgRef}
+        <div style={{ position: 'relative', display: 'inline-block' }}>
-          src={`data:image/png;base64,${capture.screenshot_base64}`}
+          <img
-          alt="Capture plein écran"
+            ref={imgRef}
-          draggable={false}
+            src={`data:image/png;base64,${capture.screenshot_base64}`}
-        />
+            alt="Capture plein écran"
            draggable={false}
            onLoad={handleImageLoad}
            style={{ display: 'block' }}
          />
          {/* Overlay des éléments détectés en mode Debug */}
          {debugMode && detectedElements.map((elem) => (
            <div
              key={elem.id}
              className="fullscreen-detection-bbox"
              style={{
                position: 'absolute',
                left: elem.bbox.x1 * imageScale.x,
                top: elem.bbox.y1 * imageScale.y,
                width: (elem.bbox.x2 - elem.bbox.x1) * imageScale.x,
                height: (elem.bbox.y2 - elem.bbox.y1) * imageScale.y,
                border: '2px solid #e94560',
                background: 'rgba(233, 69, 96, 0.15)',
                cursor: enabled ? 'pointer' : 'default',
                zIndex: 10,
              }}
              onClick={(e) => {
                e.stopPropagation();
                handleElementClick(elem);
              }}
              title={`ID: ${elem.id} | Confiance: ${(elem.confidence * 100).toFixed(0)}%`}
            >
              <span style={{
                position: 'absolute',
                top: -20,
                left: 0,
                background: '#e94560',
                color: 'white',
                padding: '2px 6px',
                borderRadius: '3px',
                fontSize: '12px',
                fontWeight: 'bold',
              }}>
                {elem.id}
              </span>
            </div>
          ))}
        </div>
        {(isSelecting || selection.width > 0) && (
          <div
            ref={overlayRef}
--- a/visual_workflow_builder/frontend_v4/src/components/DetectionOverlay.tsx
+++ b/visual_workflow_builder/frontend_v4/src/components/DetectionOverlay.tsx
@@ -0,0 +1,120 @@
 /**
 * Overlay de détection UI
 * Affiche les bboxes détectées par UI-DETR-1 sur un screenshot
 */
 import { useState, useEffect } from 'react';
 import type { UIElement, DetectionResult } from '../services/uiDetection';
 import { detectUIElements } from '../services/uiDetection';
 interface DetectionOverlayProps {
  imageBase64: string | null;
  enabled: boolean;
  threshold?: number;
  onDetectionComplete?: (result: DetectionResult) => void;
  onElementClick?: (element: UIElement) => void;
 }
 export default function DetectionOverlay({
  imageBase64,
  enabled,
  threshold = 0.35,
  onDetectionComplete,
  onElementClick,
 }: DetectionOverlayProps) {
  const [elements, setElements] = useState<UIElement[]>([]);
  const [imageSize, setImageSize] = useState<{ width: number; height: number } | null>(null);
  const [loading, setLoading] = useState(false);
  const [error, setError] = useState<string | null>(null);
  const [processingTime, setProcessingTime] = useState<number | null>(null);
  const [hoveredElement, setHoveredElement] = useState<number | null>(null);
  useEffect(() => {
    if (!enabled || !imageBase64) {
      setElements([]);
      setImageSize(null);
      return;
    }
    const runDetection = async () => {
      setLoading(true);
      setError(null);
      try {
        const result = await detectUIElements(imageBase64, {
          threshold,
          annotate: false,
        });
        setElements(result.elements);
        setImageSize(result.image_size);
        setProcessingTime(result.processing_time_ms);
        if (onDetectionComplete) {
          onDetectionComplete(result);
        }
      } catch (err) {
        setError((err as Error).message);
        setElements([]);
      } finally {
        setLoading(false);
      }
    };
    runDetection();
  }, [imageBase64, enabled, threshold]);
  if (!enabled || !imageBase64) {
    return null;
  }
  return (
    <div className="detection-overlay-container">
      {/* Image de fond */}
      <img
        src={imageBase64.startsWith('data:') ? imageBase64 : `data:image/png;base64,${imageBase64}`}
        alt="Screenshot"
        className="detection-image"
      />
      {/* Overlay des bboxes */}
      <div className="detection-bboxes">
        {elements.map((elem) => (
          <div
            key={elem.id}
            className={`detection-bbox ${hoveredElement === elem.id ? 'hovered' : ''}`}
            style={{
              left: elem.bbox.x1,
              top: elem.bbox.y1,
              width: elem.bbox.x2 - elem.bbox.x1,
              height: elem.bbox.y2 - elem.bbox.y1,
            }}
            onMouseEnter={() => setHoveredElement(elem.id)}
            onMouseLeave={() => setHoveredElement(null)}
            onClick={() => onElementClick?.(elem)}
            title={`ID: ${elem.id} | Confiance: ${(elem.confidence * 100).toFixed(0)}%`}
          >
            <span className="detection-id">{elem.id}</span>
          </div>
        ))}
      </div>
      {/* Barre d'info */}
      <div className="detection-info-bar">
        {loading ? (
          <span className="detection-loading">🔍 Détection en cours...</span>
        ) : error ? (
          <span className="detection-error">❌ {error}</span>
        ) : (
          <>
            <span className="detection-count">🎯 {elements.length} éléments</span>
            {processingTime && (
              <span className="detection-time">⏱️ {processingTime.toFixed(0)}ms</span>
            )}
            <span className="detection-model">🧠 UI-DETR-1</span>
          </>
        )}
      </div>
    </div>
  );
 }
--- a/visual_workflow_builder/frontend_v4/src/components/ExecutionModeToggle.tsx
+++ b/visual_workflow_builder/frontend_v4/src/components/ExecutionModeToggle.tsx
@@ -0,0 +1,33 @@
 import type { ExecutionMode } from '../types';
 import { EXECUTION_MODES } from '../types';
 interface ExecutionModeToggleProps {
  mode: ExecutionMode;
  onChange: (mode: ExecutionMode) => void;
 }
 export default function ExecutionModeToggle({ mode, onChange }: ExecutionModeToggleProps) {
  const modes: ExecutionMode[] = ['basic', 'intelligent', 'debug'];
  return (
    <div className="execution-mode-toggle">
      <span className="mode-label">Mode:</span>
      <div className="mode-buttons">
        {modes.map((m) => {
          const config = EXECUTION_MODES[m];
          return (
            <button
              key={m}
              className={`mode-btn ${mode === m ? 'active' : ''} mode-${m}`}
              onClick={() => onChange(m)}
              title={config.description}
            >
              <span className="mode-icon">{config.icon}</span>
              <span className="mode-text">{config.label}</span>
            </button>
          );
        })}
      </div>
    </div>
  );
 }
--- a/visual_workflow_builder/frontend_v4/src/services/uiDetection.ts
+++ b/visual_workflow_builder/frontend_v4/src/services/uiDetection.ts
@@ -0,0 +1,138 @@
 /**
 * Service de détection UI (UI-DETR-1)
 */
 const API_BASE = 'http://localhost:5001';
 export interface UIElement {
  id: number;
  bbox: {
    x1: number;
    y1: number;
    x2: number;
    y2: number;
  };
  center: {
    x: number;
    y: number;
  };
  confidence: number;
  area: number;
 }
 export interface DetectionResult {
  elements: UIElement[];
  count: number;
  processing_time_ms: number;
  image_size: {
    width: number;
    height: number;
  };
  model: string;
  annotated_image_base64?: string;
 }
 export interface DetectionOptions {
  threshold?: number;
  annotate?: boolean;
  showConfidence?: boolean;
 }
 /**
 * Détecte les éléments UI dans une image
 */
 export async function detectUIElements(
  imageBase64: string,
  options: DetectionOptions = {}
 ): Promise<DetectionResult> {
  const response = await fetch(`${API_BASE}/api/ui-detection/detect`, {
    method: 'POST',
    headers: {
      'Content-Type': 'application/json',
    },
    body: JSON.stringify({
      image_base64: imageBase64,
      threshold: options.threshold ?? 0.35,
      annotate: options.annotate ?? false,
      show_confidence: options.showConfidence ?? false,
    }),
  });
  const data = await response.json();
  if (!data.success) {
    throw new Error(data.error || 'Erreur de détection');
  }
  return data.result;
 }
 /**
 * Précharge le modèle UI-DETR-1
 */
 export async function preloadModel(): Promise<void> {
  const response = await fetch(`${API_BASE}/api/ui-detection/preload`, {
    method: 'POST',
  });
  const data = await response.json();
  if (!data.success) {
    throw new Error(data.error || 'Erreur de préchargement');
  }
 }
 /**
 * Récupère le statut du service de détection
 */
 export async function getDetectionStatus(): Promise<{
  model_path: string;
  model_exists: boolean;
  model_loaded: boolean;
  model_name: string;
  default_threshold: number;
 }> {
  const response = await fetch(`${API_BASE}/api/ui-detection/status`);
  const data = await response.json();
  if (!data.success) {
    throw new Error(data.error || 'Erreur de statut');
  }
  return data.status;
 }
 /**
 * Trouve un élément spécifique en utilisant une ancre de référence
 */
 export async function findElement(
  imageBase64: string,
  anchorBase64?: string,
  threshold?: number
 ): Promise<{
  found: boolean;
  element: UIElement | null;
  all_elements: UIElement[];
  count: number;
  match_score: number;
 }> {
  const response = await fetch(`${API_BASE}/api/ui-detection/find-element`, {
    method: 'POST',
    headers: {
      'Content-Type': 'application/json',
    },
    body: JSON.stringify({
      image_base64: imageBase64,
      anchor_base64: anchorBase64,
      threshold: threshold ?? 0.35,
    }),
  });
  const data = await response.json();
  if (!data.success) {
    throw new Error(data.error || 'Erreur de recherche');
  }
  return data.result;
 }
--- a/visual_workflow_builder/frontend_v4/src/styles.css
+++ b/visual_workflow_builder/frontend_v4/src/styles.css
@@ -646,6 +646,70 @@ body {
  pointer-events: none;
 }
 /* Execution Mode Toggle */
 .execution-mode-toggle {
  display: flex;
  align-items: center;
  gap: 0.75rem;
  padding: 0.25rem;
  background: #0f3460;
  border-radius: 8px;
 }
 .mode-label {
  font-size: 0.8rem;
  color: #888;
  padding-left: 0.5rem;
 }
 .mode-buttons {
  display: flex;
  gap: 2px;
 }
 .mode-btn {
  display: flex;
  align-items: center;
  gap: 0.35rem;
  padding: 0.4rem 0.65rem;
  background: transparent;
  border: none;
  color: #888;
  border-radius: 6px;
  cursor: pointer;
  transition: all 0.15s;
  font-size: 0.8rem;
 }
 .mode-btn:hover {
  background: rgba(255, 255, 255, 0.1);
  color: #ccc;
 }
 .mode-btn.active {
  color: white;
 }
 .mode-btn.active.mode-basic {
  background: #4caf50;
 }
 .mode-btn.active.mode-intelligent {
  background: #e94560;
 }
 .mode-btn.active.mode-debug {
  background: #ff9800;
 }
 .mode-icon {
  font-size: 1rem;
 }
 .mode-text {
  font-weight: 500;
 }
 /* Execution Controls */
 .execution-controls {
  display: flex;
@@ -740,3 +804,121 @@ body {
 .react-flow__background {
  background: #1a1a2e;
 }
 /* Detection Overlay */
 .detection-overlay-container {
  position: relative;
  width: 100%;
  overflow: hidden;
 }
 .detection-image {
  width: 100%;
  display: block;
  border-radius: 4px;
 }
 .detection-bboxes {
  position: absolute;
  top: 0;
  left: 0;
  width: 100%;
  height: 100%;
  pointer-events: none;
 }
 .detection-bbox {
  position: absolute;
  border: 2px solid #e94560;
  background: rgba(233, 69, 96, 0.1);
  pointer-events: auto;
  cursor: pointer;
  transition: all 0.15s;
 }
 .detection-bbox:hover,
 .detection-bbox.hovered {
  border-color: #4caf50;
  background: rgba(76, 175, 80, 0.2);
  z-index: 10;
 }
 .detection-id {
  position: absolute;
  top: -18px;
  left: -2px;
  background: #e94560;
  color: white;
  font-size: 10px;
  font-weight: bold;
  padding: 2px 5px;
  border-radius: 3px;
  min-width: 16px;
  text-align: center;
 }
 .detection-bbox:hover .detection-id,
 .detection-bbox.hovered .detection-id {
  background: #4caf50;
 }
 .detection-info-bar {
  display: flex;
  justify-content: space-between;
  align-items: center;
  padding: 0.5rem;
  background: #0f3460;
  border-radius: 0 0 4px 4px;
  font-size: 0.75rem;
  gap: 0.5rem;
 }
 .detection-count {
  color: #4caf50;
 }
 .detection-time {
  color: #888;
 }
 .detection-model {
  color: #e94560;
 }
 .detection-loading {
  color: #ff9800;
 }
 .detection-error {
  color: #e94560;
 }
 /* Mode indicator */
 .mode-indicator {
  position: fixed;
  bottom: 1rem;
  right: 1rem;
  padding: 0.5rem 1rem;
  border-radius: 8px;
  font-size: 0.85rem;
  font-weight: 500;
  z-index: 100;
  display: flex;
  align-items: center;
  gap: 0.5rem;
 }
 .mode-indicator.basic {
  background: rgba(76, 175, 80, 0.9);
  color: white;
 }
 .mode-indicator.intelligent {
  background: rgba(233, 69, 96, 0.9);
  color: white;
 }
 .mode-indicator.debug {
  background: rgba(255, 152, 0, 0.9);
  color: white;
 }
--- a/visual_workflow_builder/frontend_v4/src/types.ts
+++ b/visual_workflow_builder/frontend_v4/src/types.ts
@@ -1,5 +1,26 @@
 // Types pour l'API v3
 // Mode d'exécution
 export type ExecutionMode = 'basic' | 'intelligent' | 'debug';
 export const EXECUTION_MODES: Record<ExecutionMode, { label: string; icon: string; description: string }> = {
  basic: {
    label: 'Basique',
    icon: '⚡',
    description: 'Coordonnées fixes, rapide et prévisible'
  },
  intelligent: {
    label: 'Intelligent',
    icon: '🧠',
    description: 'Vision IA, adaptatif, self-healing'
  },
  debug: {
    label: 'Debug',
    icon: '🔍',
    description: 'Intelligent + overlay détection'
  }
 };
 export type ActionType =
  | 'click_anchor'
  | 'double_click_anchor'