Validated on a Windows PC (DESKTOP-58D5CAC, 2560x1600):
- 8 clicks resolved visually (1 anchor_template, 1 som_text_match, 6 som_vlm)
- Average score 0.75, average time 1.6 s
- Text typed correctly (bonjour, test word, date, email)
- 0 retries, 2 unverified actions (OK)
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Product Overview
RPA Vision V3 is a 100% vision-based workflow automation system that learns from user interactions and automates repetitive tasks through semantic understanding of user interfaces.
Core Concept
Unlike traditional RPA systems that rely on fixed coordinates, RPA Vision V3 uses:
- Semantic UI understanding through computer vision and vision-language models (VLMs)
- Multi-modal embeddings combining screenshots, text, and UI elements
- Progressive learning from observation to autonomous execution
- Robust matching that adapts to UI changes
Key Features
- Agent V0: Cross-platform capture tool for recording user sessions
- Hybrid Detection: Combines OpenCV, CLIP embeddings, and VLMs
- Visual Workflow Builder: Web-based interface for creating and editing workflows
- Self-Healing: Automatic adaptation when UI elements change
- Analytics System: Performance monitoring and insights
- Multi-modal Fusion: Combines visual, textual, and spatial information
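As an illustration of the multi-modal fusion idea, here is a minimal sketch that combines per-modality vectors into one embedding. The overview does not say how fusion is actually performed, so the approach (L2-normalize each modality, scale it by a weight, concatenate, then normalize again) and the names `l2_normalize` and `fuse` are assumptions made for this example.

```python
from math import sqrt
from typing import Dict, List


def l2_normalize(vector: List[float]) -> List[float]:
    """Scale a vector to unit length; a zero vector is returned unchanged."""
    norm = sqrt(sum(x * x for x in vector))
    return [x / norm for x in vector] if norm > 0 else list(vector)


def fuse(modalities: Dict[str, List[float]],
         weights: Dict[str, float]) -> List[float]:
    """Hypothetical fusion: weighted concatenation of normalized vectors.

    Modalities are visited in sorted name order so the layout of the fused
    vector is stable across calls. A missing weight defaults to 1.0.
    """
    fused: List[float] = []
    for name in sorted(modalities):
        weight = weights.get(name, 1.0)
        fused.extend(weight * x for x in l2_normalize(modalities[name]))
    return l2_normalize(fused)


# Example with made-up vectors for the three kinds of information.
embedding = fuse(
    {"visual": [3.0, 4.0], "text": [1.0, 0.0], "spatial": [0.0, 2.0]},
    {"visual": 1.0, "text": 0.5, "spatial": 0.5},
)
```

Normalizing each modality before weighting keeps one large-magnitude vector from dominating the others; the weights then express relative importance explicitly.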
Architecture Layers
- RawSession (Layer 0): Raw event capture (clicks, keystrokes, screenshots)
- ScreenState (Layer 1): Multi-modal analysis of screen content
- UIElement Detection (Layer 2): Semantic detection of interface elements
- State Embedding (Layer 3): Vector representation for similarity matching
- Workflow Graph (Layer 4): Executable workflow representation
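The layers above can be pictured as a chain of data structures. The sketch below is one possible shape, not the project's actual classes: every class name, field, and helper (`RawEvent`, `UIElement`, `ScreenState`, `WorkflowGraph`, `cosine_similarity`, `best_match`) is hypothetical, and cosine similarity is assumed as the matching metric for Layer 3.

```python
from dataclasses import dataclass, field
from math import sqrt
from typing import List, Tuple


@dataclass
class RawEvent:
    """Layer 0: one captured event (fields are illustrative)."""
    kind: str                 # e.g. "click" or "keystroke"
    timestamp: float
    screenshot_path: str


@dataclass
class UIElement:
    """Layer 2: a semantically labelled element found on screen."""
    label: str
    role: str                          # e.g. "button", "text_field"
    bbox: Tuple[int, int, int, int]    # x, y, width, height


@dataclass
class ScreenState:
    """Layers 1 and 3: analysed screen content plus its embedding vector."""
    elements: List[UIElement]
    embedding: List[float]


@dataclass
class WorkflowGraph:
    """Layer 4: screen states as nodes, actions as directed edges."""
    nodes: List[ScreenState] = field(default_factory=list)
    edges: List[Tuple[int, int, str]] = field(default_factory=list)  # (src, dst, action)


def cosine_similarity(a: List[float], b: List[float]) -> float:
    """Cosine of the angle between two equal-length vectors (0.0 if either is zero)."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = sqrt(sum(x * x for x in a))
    norm_b = sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def best_match(graph: WorkflowGraph, query: List[float]) -> Tuple[int, float]:
    """Return (node index, score) of the stored state closest to the query embedding."""
    if not graph.nodes:
        raise ValueError("graph has no states")
    scores = [cosine_similarity(node.embedding, query) for node in graph.nodes]
    best = max(range(len(scores)), key=scores.__getitem__)
    return best, scores[best]
```

Matching on embeddings rather than pixel coordinates is what lets a stored state still be recognized after a window moves or a layout shifts slightly.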
Learning Progression
- OBSERVATION: 5+ executions to learn patterns
- COACHING: 10+ assisted executions with >90% success
- AUTO_CANDIDATE: 20+ executions with >95% success rate
- AUTO_CONFIRMED: User-validated autonomous execution
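The progression above reads naturally as a small state machine. The following sketch assumes each threshold is the requirement for completing a stage and moving to the next one, and that the counts are executions within the current stage; the overview does not spell this out, so that interpretation, the `LearningStage` enum, and the `promote` function are all assumptions.

```python
from enum import Enum


class LearningStage(Enum):
    OBSERVATION = "OBSERVATION"
    COACHING = "COACHING"
    AUTO_CANDIDATE = "AUTO_CANDIDATE"
    AUTO_CONFIRMED = "AUTO_CONFIRMED"


def promote(stage: LearningStage,
            executions: int,
            success_rate: float,
            user_validated: bool = False) -> LearningStage:
    """Return the next stage if the (assumed) promotion criteria are met.

    executions   -- runs counted in the current stage (assumption)
    success_rate -- fraction in [0, 1]; thresholds are strict (>), per the list
    """
    if stage is LearningStage.OBSERVATION and executions >= 5:
        return LearningStage.COACHING
    if (stage is LearningStage.COACHING
            and executions >= 10 and success_rate > 0.90):
        return LearningStage.AUTO_CANDIDATE
    if (stage is LearningStage.AUTO_CANDIDATE
            and executions >= 20 and success_rate > 0.95
            and user_validated):
        # Fully autonomous execution always requires explicit user validation.
        return LearningStage.AUTO_CONFIRMED
    return stage  # criteria not met: stay where we are


# A workflow with 12 assisted runs at 95% success leaves coaching.
stage = promote(LearningStage.COACHING, executions=12, success_rate=0.95)
```

Keeping user validation as a separate flag means strong statistics alone never move a workflow into AUTO_CONFIRMED.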