
Dom a27b74cf22 v1.0 - Stable version: multi-PC, UI-DETR-1 detection, 3 execution modes

- Frontend v4 reachable on the local network (192.168.1.40)
- Open ports: 3002 (frontend), 5001 (backend), 5004 (dashboard)
- Ollama running on GPU
- Interactive self-healing
- Confidence dashboard

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

2026-01-29 11:23:51 +01:00


Quick Start - Hybrid UI Detection

Installation

1. Install Ollama

# Linux
curl -fsSL https://ollama.ai/install.sh | sh

# macOS
brew install ollama

2. Start Ollama

ollama serve

3. Download the VLM model

ollama pull qwen3-vl:8b

Usage

Quick Test

./rpa_vision_v3/test_quick.sh

Programmatic Usage

from rpa_vision_v3.core.detection import create_detector

# Create the detector with its default configuration
detector = create_detector()

# Detect the UI elements in a screenshot
elements = detector.detect("screenshot.png")

# Print one line per element: type, role, label
for elem in elements:
    print(f"{elem.type:15s} | {elem.role:20s} | {elem.label}")

Complete Example

from rpa_vision_v3.core.detection import UIDetector, DetectionConfig

# Custom configuration
config = DetectionConfig(
    vlm_model="qwen3-vl:8b",
    confidence_threshold=0.7,
    min_region_size=10,
    max_region_size=600,
    use_vlm_classification=True
)

# Create the detector
detector = UIDetector(config)

# Run detection, passing the window context as a hint
elements = detector.detect("screenshot.png", window_context={
    "title": "My Application",
    "process": "myapp"
})

# Filter by element type
buttons = [e for e in elements if e.type == "button"]
text_inputs = [e for e in elements if e.type == "text_input"]

print(f"Found {len(buttons)} buttons and {len(text_inputs)} text inputs")
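When a screenshot yields dozens of elements, a per-type tally can be more convenient than one hand-written filter per type. The following is a minimal sketch, assuming only that each element exposes the `type`, `role`, and `label` attributes used in the examples above. `FakeElement` and `count_by_type` are hypothetical names introduced here so the snippet runs without Ollama; they are not part of rpa_vision_v3.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass
class FakeElement:
    """Hypothetical stand-in for a detected element (not the library's class)."""
    type: str
    role: str
    label: str


def count_by_type(elements) -> Counter:
    """Tally elements by their `type` attribute."""
    return Counter(e.type for e in elements)


sample = [
    FakeElement("button", "submit", "OK"),
    FakeElement("button", "cancel", "Cancel"),
    FakeElement("text_input", "search_field", "Search"),
]

counts = count_by_type(sample)
for elem_type, n in counts.most_common():
    print(f"{elem_type:15s} {n}")
```

The same function should work on the list returned by `detector.detect(...)`, provided the real elements carry a `type` attribute as the examples suggest.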

Available Tests

# Full test with validation
python3 rpa_vision_v3/examples/test_complete_real.py

# Basic hybrid detection test
python3 rpa_vision_v3/examples/test_hybrid_detection.py screenshot.png

# Simple VLM test
python3 rpa_vision_v3/examples/test_real_vlm_detection.py

Performance

OpenCV detection: ~10ms
VLM classification: ~1-2s per element
Total: ~30-60s for 20-50 elements
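The figures above can be combined into a rough estimate for a given screenshot. This sketch assumes elements are classified one after another (the source does not say whether calls are batched or parallel), so it gives loose bounds rather than a prediction; `estimate_seconds` is a hypothetical helper.

```python
def estimate_seconds(n_elements: int, vlm_s_per_element: float,
                     opencv_s: float = 0.010) -> float:
    """One OpenCV pass plus one sequential VLM call per element (assumption)."""
    return opencv_s + n_elements * vlm_s_per_element


# Bounds from the quoted figures: 20 elements at 1 s each, 50 elements at 2 s each.
low = estimate_seconds(20, 1.0)
high = estimate_seconds(50, 2.0)
print(f"between {low:.2f}s and {high:.2f}s")
```

The quoted ~30-60s total falls inside these bounds. In every case the VLM step dominates, which is why the tuning advice under Troubleshooting targets the number of elements and the model size.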

Detected Element Types

button - Buttons
text_input - Text fields
checkbox - Checkboxes
radio - Radio buttons
dropdown - Drop-down lists
tab - Tabs
link - Links
icon - Icons
menu_item - Menu items

Semantic Roles

primary_action - Primary action
cancel - Cancel
submit - Submit
form_input - Form input
search_field - Search field
navigation - Navigation
settings - Settings
close - Close
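Roles are useful when an automation step needs "the button that confirms the dialog" rather than a specific label. The sketch below picks the first element matching a list of roles in priority order. It assumes elements expose a `role` attribute holding one of the strings listed above; `FakeElement` and `first_with_role` are hypothetical names, not library API.

```python
from dataclasses import dataclass
from typing import Iterable, Optional, Sequence


@dataclass
class FakeElement:
    """Hypothetical stand-in for a detected element (not the library's class)."""
    type: str
    role: str
    label: str


def first_with_role(elements: Iterable[FakeElement],
                    roles: Sequence[str]) -> Optional[FakeElement]:
    """Return the first element whose role matches the earliest entry in `roles`."""
    elements = list(elements)  # allow several passes over any iterable
    for role in roles:
        for elem in elements:
            if elem.role == role:
                return elem
    return None


sample = [
    FakeElement("button", "cancel", "Cancel"),
    FakeElement("button", "primary_action", "Save"),
    FakeElement("icon", "close", "x"),
]

# Prefer an explicit submit control, fall back to the primary action.
target = first_with_role(sample, ["submit", "primary_action"])
print(target.label if target else "no confirming control found")
```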

Troubleshooting

Ollama not available

# Check the service
systemctl status ollama  # Linux
brew services list  # macOS

# Restart
ollama serve
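The same check can be done from Python before running a detection, which avoids a long timeout later. This is a minimal sketch that only tests whether something accepts TCP connections on Ollama's default port, 11434; it does not confirm that the listener is actually Ollama. `is_port_open` is a hypothetical helper.

```python
import socket

OLLAMA_HOST = "127.0.0.1"
OLLAMA_PORT = 11434  # Ollama's default port; change it if your setup overrides it


def is_port_open(host: str, port: int, timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within `timeout`."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers connection refused, timeouts, and unreachable hosts.
        return False


if is_port_open(OLLAMA_HOST, OLLAMA_PORT):
    print("Ollama port is reachable")
else:
    print("Nothing is listening on the Ollama port; try `ollama serve`")
```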

Model not found

ollama list
ollama pull qwen3-vl:8b
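A script can perform the equivalent of `ollama list` through Ollama's HTTP API, whose `/api/tags` endpoint returns the locally installed models. The sketch below assumes the default address `http://localhost:11434`; `installed_models`, `has_model`, and `fetch_tags` are hypothetical helper names.

```python
import json
import urllib.request

TAGS_URL = "http://localhost:11434/api/tags"  # default local Ollama address (assumption)


def installed_models(payload: dict) -> list:
    """Extract model names from an /api/tags response body."""
    return [m["name"] for m in payload.get("models", [])]


def has_model(payload: dict, name: str) -> bool:
    """True if `name` (for example "qwen3-vl:8b") is among the installed models."""
    return name in installed_models(payload)


def fetch_tags(url: str = TAGS_URL, timeout: float = 5.0) -> dict:
    """Query a running Ollama server; raises URLError if it is not reachable."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return json.load(resp)


# Offline illustration using a hand-written payload of the same shape.
example_payload = {"models": [{"name": "qwen3-vl:8b"}, {"name": "granite3.2-vision:2b"}]}
print(has_model(example_payload, "qwen3-vl:8b"))
```

Against a running server, `has_model(fetch_tags(), "qwen3-vl:8b")` would tell you whether `ollama pull` is still needed.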

Slow detection

Reduce max_elements in the config
Use a faster model (granite3.2-vision:2b)
Raise confidence_threshold to filter out more candidates
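Before changing any of these settings, it helps to measure how long a detection actually takes so the effect of each change is visible. The following is a small timing sketch; `timed` is a hypothetical helper, and `fake_detect` stands in for `detector.detect` so the snippet runs without Ollama.

```python
import time
from typing import Any, Callable, Tuple


def timed(fn: Callable[..., Any], *args: Any, **kwargs: Any) -> Tuple[Any, float]:
    """Call fn(*args, **kwargs) and return (result, elapsed_seconds)."""
    start = time.perf_counter()
    result = fn(*args, **kwargs)
    return result, time.perf_counter() - start


def fake_detect(path: str) -> list:
    """Stand-in for detector.detect: sleeps briefly and returns two items."""
    time.sleep(0.05)
    return ["button", "text_input"]


elements, seconds = timed(fake_detect, "screenshot.png")
print(f"{len(elements)} elements in {seconds:.2f}s")
```

With the real detector the call would be `timed(detector.detect, "screenshot.png")`, repeated once per configuration you want to compare.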

Few elements detected

Lower confidence_threshold (e.g. 0.5)
Decrease min_region_size (e.g. 10)
Increase max_region_size (e.g. 600)
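Lowering the threshold can be automated: try the strict value first and relax it only when too few elements come back. This is a sketch under assumptions: `detect_with_fallback` is a hypothetical helper that takes any callable mapping a threshold to a list of elements, and the stand-in detector below simply filters a fixed list of scores.

```python
from typing import Callable, List, Sequence, Tuple


def detect_with_fallback(
    detect_at: Callable[[float], List],
    thresholds: Sequence[float] = (0.7, 0.6, 0.5),
    min_elements: int = 5,
) -> Tuple[float, List]:
    """Try thresholds in order and stop at the first that yields enough elements.

    If none reaches `min_elements`, the result of the last attempt is returned.
    """
    if not thresholds:
        raise ValueError("thresholds must not be empty")
    for threshold in thresholds:
        elements = detect_at(threshold)
        if len(elements) >= min_elements:
            break
    return threshold, elements


# Stand-in detector: keeps the candidates whose score reaches the threshold.
scores = [0.95, 0.8, 0.65, 0.62, 0.55, 0.52, 0.4]


def fake_detect_at(threshold: float) -> List[float]:
    return [s for s in scores if s >= threshold]


used, found = detect_with_fallback(fake_detect_at)
print(f"threshold {used} kept {len(found)} candidates")
```

With the real library, `detect_at` could build a `DetectionConfig` with the given `confidence_threshold` and call `UIDetector(config).detect(...)`. Whether the remaining config fields have usable defaults is not confirmed by this guide.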

Documentation

Support

For more help, see the examples in rpa_vision_v3/examples/
