
# Quick Start - Hybrid UI Detection

## Installation

### 1. Install Ollama

```bash
# Linux
curl -fsSL https://ollama.ai/install.sh | sh

# macOS
brew install ollama
```

### 2. Start Ollama

```bash
ollama serve
```

### 3. Download the VLM Model

```bash
# Project default model (see .env.example)
ollama pull gemma4:latest

# Supported alternatives
# ollama pull qwen3-vl:8b
# ollama pull 0000/ui-tars-1.5-7b-q8_0:7b   # visual grounder
```

## Usage

### Quick Test

```bash
./rpa_vision_v3/test_quick.sh
```

### Programmatic Usage

```python
from rpa_vision_v3.core.detection import create_detector

# Create the detector with the default configuration
detector = create_detector()

# Detect the UI elements in a screenshot
elements = detector.detect("screenshot.png")

# Print one line per element: type, semantic role, label
for elem in elements:
    print(f"{elem.type:15s} | {elem.role:20s} | {elem.label}")
```

### Full Example

```python
from rpa_vision_v3.core.detection import UIDetector, DetectionConfig

# Custom configuration
config = DetectionConfig(
    vlm_model="qwen3-vl:8b",
    confidence_threshold=0.7,
    min_region_size=10,
    max_region_size=600,
    use_vlm_classification=True
)

# Create the detector
detector = UIDetector(config)

# Run detection, passing the window context as a hint
elements = detector.detect("screenshot.png", window_context={
    "title": "My Application",
    "process": "myapp"
})

# Filter by type
buttons = [e for e in elements if e.type == "button"]
text_inputs = [e for e in elements if e.type == "text_input"]

print(f"Found {len(buttons)} buttons and {len(text_inputs)} text inputs")
```

## Available Tests

```bash
# Full test with validation
python3 rpa_vision_v3/examples/test_complete_real.py

# Basic hybrid test
python3 rpa_vision_v3/examples/test_hybrid_detection.py screenshot.png

# Simple VLM test
python3 rpa_vision_v3/examples/test_real_vlm_detection.py
```

## Performance

- **OpenCV detection:** ~10 ms
- **VLM classification:** ~1-2 s per element
- **Total:** ~30-60 s for 20-50 elements
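
As a rough back-of-the-envelope check of the figures above, the sketch below assumes that elements are classified sequentially, with one VLM call per element, and that the OpenCV pass is a fixed cost. The function name `estimate_detection_seconds` is hypothetical and is not part of the project. Under that linear model the bounds come out as 20-40 s for 20 elements and 50-100 s for 50, so the quoted ~30-60 s total sits inside the range.

```python
def estimate_detection_seconds(n_elements, per_element_s=(1.0, 2.0), opencv_s=0.010):
    """Return (low, high) total time in seconds.

    Assumption: sequential classification, one VLM call per element,
    plus a fixed OpenCV detection cost.
    """
    low, high = per_element_s
    return (opencv_s + n_elements * low, opencv_s + n_elements * high)


for n in (20, 50):
    low, high = estimate_detection_seconds(n)
    print(f"{n} elements: {low:.0f}-{high:.0f} s")
```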

## Detected Element Types

- `button` - Buttons
- `text_input` - Text inputs
- `checkbox` - Checkboxes
- `radio` - Radio buttons
- `dropdown` - Dropdown lists
- `tab` - Tabs
- `link` - Links
- `icon` - Icons
- `menu_item` - Menu items

## Semantic Roles

- `primary_action` - Primary action
- `cancel` - Cancel
- `submit` - Submit
- `form_input` - Form input
- `search_field` - Search field
- `navigation` - Navigation
- `settings` - Settings
- `close` - Close
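
Types and roles can be combined to pick out a specific control. The sketch below is illustrative only: `Element` is a hypothetical stand-in carrying the `type`, `role`, and `label` attributes used earlier in this guide, and `find_by` is a helper invented for the example, not a project function.

```python
from dataclasses import dataclass


@dataclass
class Element:
    """Hypothetical stand-in for a detected element (not the project's class)."""
    type: str
    role: str
    label: str


def find_by(elements, elem_type=None, role=None):
    """Keep elements matching the given type and/or role (None means 'any')."""
    return [
        e for e in elements
        if (elem_type is None or e.type == elem_type)
        and (role is None or e.role == role)
    ]


sample = [
    Element("button", "submit", "Save"),
    Element("button", "cancel", "Cancel"),
    Element("text_input", "search_field", "Search"),
]

submit_buttons = find_by(sample, elem_type="button", role="submit")
print([e.label for e in submit_buttons])  # ['Save']
```

With real detector output, the same helper should apply as long as the returned objects expose `type` and `role` as plain strings, which is what the filtering examples in this guide suggest.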

## Troubleshooting

### Ollama Unavailable

```bash
# Check the service
systemctl status ollama  # Linux
brew services list  # macOS

# Restart
ollama serve
```
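
The same check can be scripted. This minimal sketch assumes Ollama is listening on its default address, `http://localhost:11434`, and probes the `/api/tags` endpoint of its HTTP API. The helper name `is_ollama_up` is hypothetical.

```python
import urllib.error
import urllib.request


def is_ollama_up(base_url="http://localhost:11434", timeout=2.0):
    """Return True if an HTTP server answers 200 on <base_url>/api/tags."""
    try:
        with urllib.request.urlopen(f"{base_url}/api/tags", timeout=timeout) as resp:
            return resp.status == 200
    except (urllib.error.URLError, OSError):
        # Connection refused, timeout, DNS failure, HTTP error status, etc.
        return False
```

Calling `is_ollama_up()` before creating the detector lets a script fail fast with a clear message instead of timing out during classification.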

### Model Not Found

```bash
ollama list
ollama pull gemma4:latest  # or the model named in your config, e.g. qwen3-vl:8b
```
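
To verify programmatically that the configured model has been pulled, the sketch below reads the model list from Ollama's `/api/tags` endpoint, which returns a JSON object with a `models` array whose entries carry a `name` field. The default address and both helper names are assumptions made for illustration.

```python
import json
import urllib.request


def fetch_tags(base_url="http://localhost:11434", timeout=2.0):
    """Fetch the parsed JSON payload of GET <base_url>/api/tags."""
    with urllib.request.urlopen(f"{base_url}/api/tags", timeout=timeout) as resp:
        return json.load(resp)


def installed_models(tags_payload):
    """Extract the set of model names from an /api/tags payload."""
    return {m["name"] for m in tags_payload.get("models", [])}


# Offline example with a hand-written payload of the same shape
payload = {"models": [{"name": "gemma4:latest"}, {"name": "qwen3-vl:8b"}]}
print("gemma4:latest" in installed_models(payload))  # True
```

Against a running server, `installed_models(fetch_tags())` gives the same set of names that `ollama list` prints.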

### Slow Detection

- Reduce `max_elements` in the config
- Use a faster model (`granite3.2-vision:2b`)
- Raise `confidence_threshold` to filter out more candidates

### Few Elements Detected

- Lower `confidence_threshold` (e.g. 0.5)
- Reduce `min_region_size` (e.g. 10)
- Increase `max_region_size` (e.g. 600)
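
To see why these three settings change the number of results, here is a toy filter. It assumes, without confirmation from the project code, that `min_region_size` and `max_region_size` bound a region's width and height in pixels and that `confidence_threshold` is a lower bound on the classifier score. `Candidate` and `keep` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    """Hypothetical candidate region: size in pixels plus a confidence score."""
    width: int
    height: int
    confidence: float


def keep(c, *, confidence_threshold, min_region_size, max_region_size):
    """Assumed semantics: both sides within the size bounds, score above threshold."""
    size_ok = (min(c.width, c.height) >= min_region_size
               and max(c.width, c.height) <= max_region_size)
    return size_ok and c.confidence >= confidence_threshold


candidates = [
    Candidate(120, 32, 0.82),  # typical button
    Candidate(16, 16, 0.55),   # small, low-confidence icon
    Candidate(500, 40, 0.90),  # wide text field
    Candidate(8, 8, 0.95),     # tiny noise region
]

strict = dict(confidence_threshold=0.7, min_region_size=20, max_region_size=300)
relaxed = dict(confidence_threshold=0.5, min_region_size=10, max_region_size=600)

print(sum(keep(c, **strict) for c in candidates))   # 1
print(sum(keep(c, **relaxed) for c in candidates))  # 3
```

Relaxing the settings recovers the small icon and the wide text field, while the 8x8 noise region stays excluded. The same trade-off works in reverse for the slow-detection case: stricter values mean fewer candidates to classify.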

## Documentation

- [Implementation summary](HYBRID_DETECTION_SUMMARY.md)
- [Ollama integration](docs/OLLAMA_INTEGRATION.md)
- [Full architecture](docs/specs/design.md)

## Support

For more help, see the examples in `rpa_vision_v3/examples/`.