# RPA Vision V3 - 100% Vision-Based Workflow Automation

## 📊 Status

🚀 **PRODUCTION-READY** - Phase 12 Complete (77% System Completion) ✅

**Latest Update**: 14 December 2024
- ✅ **10/13 Phases Completed** - Mature, functional system
- ✅ **Exceptional Performance** - 500-6250x faster than required
- ✅ **Enterprise Architecture** - 148k+ lines, 19 modules, 6 complete specs
- ✅ **Technical Innovations** - Self-healing, multi-modal, GPU management
- 📊 **Full Audit** - [Detailed report](AUDIT_COMPLET_SYSTEME_RPA_VISION_V3.md)

**Quick Test**: `bash test_clip.sh`

## 🎯 Vision

RPA based on **semantic understanding** of interfaces rather than on click coordinates. The system learns workflows by observing the user and automates them robustly through a 5-layer architecture.

## 🏗️ 5-Layer Architecture

```
RawSession (Layer 0)
    ↓
ScreenState (Layer 1) - 4 abstraction levels
    ↓
UIElement Detection (Layer 2) - Types + semantic roles
    ↓
State Embedding (Layer 3) - Multi-modal fusion
    ↓
Workflow Graph (Layer 4) - Nodes + Edges + Learning States
```

## 📁 Structure

```
rpa_vision_v3/
├── core/
│   ├── models/          # Layers 0-4: data structures
│   ├── capture/         # Layer 0: event + screenshot capture
│   ├── detection/       # Layer 2: semantic UI detection
│   ├── embedding/       # Layer 3: multi-modal fusion + FAISS
│   ├── graph/           # Layer 4: construction + matching + execution
│   └── persistence/     # Save/load
├── data/
│   ├── sessions/        # RawSessions
│   ├── screen_states/   # ScreenStates
│   ├── embeddings/      # .npy vectors
│   ├── faiss_index/     # FAISS index
│   └── workflows/       # Workflow Graphs
└── tests/               # Unit + integration tests
```

## 🚀 Quick Start

### Installation

```bash
# 1. Install Ollama
curl -fsSL https://ollama.ai/install.sh | sh  # Linux
# or
brew install ollama  # macOS

# 2. Start Ollama
ollama serve

# 3. Download the VLM model
ollama pull qwen3-vl:8b

# 4. Install Python dependencies
pip install -r requirements.txt
```

### Quick Test

```bash
# System diagnostic
python3 rpa_vision_v3/examples/diagnostic_vlm.py

# Detection test
./rpa_vision_v3/test_quick.sh
```

### Usage - UI Detection

```python
from rpa_vision_v3.core.detection import create_detector

# Create the detector
detector = create_detector()

# Detect the UI elements
elements = detector.detect("screenshot.png")

# Use the results
for elem in elements:
    print(f"{elem.type:15s} | {elem.role:20s} | {elem.label}")
```

### Usage - Workflow (Phase 4 - Upcoming)

```python
from rpa_vision_v3.core.models import RawSession, ScreenState, Workflow
from rpa_vision_v3.core.graph import GraphBuilder, NodeMatcher

# 1. Capture a session
session = RawSession(...)
# ... capture events and screenshots

# 2. Build the workflow automatically
builder = GraphBuilder(...)
workflow = builder.build_from_session(session)

# 3. Match the current state
matcher = NodeMatcher(...)
current_state = ScreenState(...)
match = matcher.match(current_state, workflow)

# 4. Execute the action
# NOTE: `executor` is not defined in this snippet; it is assumed to be an
# action executor created elsewhere.
if match:
    edges = workflow.get_outgoing_edges(match.node.node_id)
    if edges:  # guard against a node with no outgoing edge
        executor.execute_edge(edges[0], current_state)
```

## 📚 Documentation

### Main Guides
- **Quick Start**: `QUICK_START.md` - Getting started quickly
- **Next Steps**: `NEXT_STEPS.md` - Roadmap and Phase 4
- **Phase 3 Complete**: `PHASE3_COMPLETE.md` - Phase 3 summary

### Technical Documentation
- **Full spec**: `.kiro/specs/workflow-graph-implementation/`
- **Architecture**: `docs/reference/ARCHITECTURE_VISION_COMPLETE.md`
- **Hybrid Detection**: `HYBRID_DETECTION_SUMMARY.md`
- **Ollama Integration**: `docs/OLLAMA_INTEGRATION.md`

## 🎓 Key Concepts

### 100% Vision RPA
- ❌ No fixed (x, y) coordinates
- ✅ Semantic roles (primary_action, form_input, etc.)
- ✅ Matching by visual and textual similarity
- ✅ Robust to UI changes

### Progressive Learning

```
OBSERVATION (5+ executions)
    ↓
COACHING (10+ assists, success >90%)
    ↓
AUTO_CANDIDATE (20+ executions, success >95%)
    ↓
AUTO_CONFIRMÉ (user validation)
```

### State Embedding

Multi-modal fusion:
- 50% Image (full screenshot)
- 30% Text (detected text)
- 10% Title (window)
- 10% UI (detected elements)

## 🧪 Tests

```bash
# Unit tests
pytest tests/unit/

# Integration tests
pytest tests/integration/

# Performance tests
pytest tests/performance/ --benchmark-only
```

## 📈 Roadmap - 77% Complete (10/13 Phases)

### ✅ **Completed Phases**
- [x] **Phase 1-2**: Foundations + FAISS embeddings ✅
- [x] **Phase 4-6**: UI Detection + Workflow Graphs + Action Execution ✅
- [x] **Phase 7-8**: Learning System + Training System ✅
- [x] **Phase 10-12**: GPU Management + Performance + Monitoring ✅

### 🎯 **Remaining Phases**
- [ ] **Phase 3**: Final checkpoint (storage tests)
- [ ] **Phase 9**: Visual Workflow Builder (90% → 100%)
- [ ] **Phase 13**: End-to-end tests + final documentation

### 🚀 **Production-Ready Components**
- **Agent V0**: Cross-platform capture + Encryption ✅
- **Server API**: Processing pipeline + Web dashboard ✅
- **Analytics System**: Monitoring + Insights + Reporting ✅
- **Self-Healing**: Automatic adaptation + Recovery ✅

## 🤝 Contributing

See `.kiro/specs/workflow-graph-implementation/tasks.md` for the tasks in progress.

## 📄 License

Proprietary - All rights reserved
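
The State Embedding weights listed under Key Concepts (50% image, 30% text, 10% title, 10% UI) can be illustrated with a small standalone sketch. This is not the project's implementation: the modality keys, the function names, and the choice to L2-normalize each vector before and after the weighted sum are assumptions made for illustration, and the sketch also assumes that all modality vectors share the same dimension, which this README does not state.

```python
import math
from typing import Dict, List, Sequence

# Weights come from the State Embedding section; the keys are hypothetical names.
FUSION_WEIGHTS: Dict[str, float] = {"image": 0.5, "text": 0.3, "title": 0.1, "ui": 0.1}


def l2_normalize(vec: Sequence[float]) -> List[float]:
    """Scale a vector to unit length; a zero vector is returned unchanged."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm > 0 else list(vec)


def fuse_embeddings(parts: Dict[str, Sequence[float]]) -> List[float]:
    """Weighted sum of unit-normalized modality vectors, renormalized at the end.

    Assumption: every modality vector has the same dimension.
    """
    missing = set(FUSION_WEIGHTS) - set(parts)
    if missing:
        raise ValueError(f"missing modalities: {sorted(missing)}")
    dims = {len(parts[name]) for name in FUSION_WEIGHTS}
    if len(dims) != 1:
        raise ValueError("all modality vectors must share one dimension")
    fused = [0.0] * dims.pop()
    for name, weight in FUSION_WEIGHTS.items():
        for i, value in enumerate(l2_normalize(parts[name])):
            fused[i] += weight * value
    return l2_normalize(fused)


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two vectors of equal length (0.0 if either is zero)."""
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return sum(x * y for x, y in zip(a, b)) / (norm_a * norm_b)


if __name__ == "__main__":
    # Toy 2-dimensional vectors standing in for real model embeddings.
    state_a = fuse_embeddings(
        {"image": [1.0, 0.0], "text": [0.8, 0.2], "title": [1.0, 0.0], "ui": [0.9, 0.1]}
    )
    state_b = fuse_embeddings(
        {"image": [0.9, 0.1], "text": [0.7, 0.3], "title": [1.0, 0.0], "ui": [0.9, 0.1]}
    )
    print(cosine_similarity(state_a, state_b))
```

Normalizing each modality first keeps a vector with a large raw magnitude from dominating the sum, so the stated percentages act as the effective mixing weights. The fused unit vectors can then be compared with cosine similarity, which is the kind of comparison a similarity index such as FAISS is typically used to accelerate.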
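
The Progressive Learning ladder under Key Concepts can likewise be read as a small state machine. The sketch below is one possible interpretation, not the project's code: it assumes each threshold is the condition for leaving the stage it is attached to, that user validation is additionally required to reach the final stage, and it uses hypothetical counter names (`executions`, `assists`, `successes`, `user_validated`) plus an ASCII spelling `AUTO_CONFIRMED` for the last state.

```python
from dataclasses import dataclass
from enum import Enum


class LearningState(Enum):
    OBSERVATION = "observation"
    COACHING = "coaching"
    AUTO_CANDIDATE = "auto_candidate"
    AUTO_CONFIRMED = "auto_confirmed"  # ASCII spelling of the final stage (assumption)


@dataclass
class WorkflowStats:
    """Hypothetical counters; the field names are illustrative only."""

    executions: int = 0
    assists: int = 0
    successes: int = 0
    user_validated: bool = False

    @property
    def success_rate(self) -> float:
        """Fraction of successful executions, or 0.0 when nothing has run yet."""
        return self.successes / self.executions if self.executions else 0.0


def next_state(state: LearningState, stats: WorkflowStats) -> LearningState:
    """Return the stage reached after applying at most one promotion step."""
    if state is LearningState.OBSERVATION and stats.executions >= 5:
        return LearningState.COACHING
    if (
        state is LearningState.COACHING
        and stats.assists >= 10
        and stats.success_rate > 0.90
    ):
        return LearningState.AUTO_CANDIDATE
    if (
        state is LearningState.AUTO_CANDIDATE
        and stats.executions >= 20
        and stats.success_rate > 0.95
        and stats.user_validated
    ):
        return LearningState.AUTO_CONFIRMED
    return state  # thresholds not met: stay in the current stage


if __name__ == "__main__":
    stats = WorkflowStats(executions=6, successes=6)
    print(next_state(LearningState.OBSERVATION, stats))
```

Promoting one stage per call keeps each transition explicit and easy to log; a workflow that already satisfies later thresholds simply advances again on the next evaluation.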