# Product Overview

RPA Vision V3 is a 100% vision-based workflow automation system that learns from user interactions and automates repetitive tasks through semantic understanding of user interfaces.

## Core Concept

Unlike traditional RPA systems that rely on fixed coordinates, RPA Vision V3 uses:

- **Semantic UI understanding** through computer vision and VLM models
- **Multi-modal embeddings** combining screenshots, text, and UI elements
- **Progressive learning** from observation to autonomous execution
- **Robust matching** that adapts to UI changes

## Key Features

- **Agent V0**: Cross-platform capture tool for recording user sessions
- **Hybrid Detection**: Combines OpenCV, CLIP embeddings, and VLM models
- **Visual Workflow Builder**: Web-based interface for creating and editing workflows
- **Self-Healing**: Automatic adaptation when UI elements change
- **Analytics System**: Performance monitoring and insights
- **Multi-modal Fusion**: Combines visual, textual, and spatial information

## Architecture Layers

1. **RawSession (Layer 0)**: Raw event capture (clicks, keystrokes, screenshots)
2. **ScreenState (Layer 1)**: Multi-modal analysis of screen content
3. **UIElement Detection (Layer 2)**: Semantic detection of interface elements
4. **State Embedding (Layer 3)**: Vector representation for similarity matching
5. **Workflow Graph (Layer 4)**: Executable workflow representation

## Learning Progression

- **OBSERVATION**: 5+ executions to learn patterns
- **COACHING**: 10+ assisted executions with >90% success
- **AUTO_CANDIDATE**: 20+ executions with >95% success rate
- **AUTO_CONFIRMED**: User-validated autonomous execution
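
The learning progression can be sketched as a small function that maps execution statistics to a state. This is an illustrative sketch, not the project's actual implementation: the names `LearningState`, `WorkflowStats`, and `learning_state` are hypothetical, and it assumes that each threshold is the requirement to leave the corresponding stage and that execution counts are cumulative across stages.

```python
from dataclasses import dataclass
from enum import Enum


class LearningState(Enum):
    """The four stages named in the Learning Progression list."""
    OBSERVATION = "OBSERVATION"
    COACHING = "COACHING"
    AUTO_CANDIDATE = "AUTO_CANDIDATE"
    AUTO_CONFIRMED = "AUTO_CONFIRMED"


@dataclass
class WorkflowStats:
    """Hypothetical per-workflow counters (not taken from the codebase)."""
    executions: int = 0
    successes: int = 0
    user_validated: bool = False

    @property
    def success_rate(self) -> float:
        # Avoid division by zero for a workflow that has never run.
        return self.successes / self.executions if self.executions else 0.0


def learning_state(stats: WorkflowStats) -> LearningState:
    """Map statistics to a stage, assuming thresholds are exit criteria."""
    # Fewer than 5 executions: still learning patterns.
    if stats.executions < 5:
        return LearningState.OBSERVATION
    # Stays in coaching until 10+ executions with >90% success.
    if stats.executions < 10 or stats.success_rate <= 0.90:
        return LearningState.COACHING
    # Stays a candidate until 20+ executions with >95% success
    # and explicit user validation.
    if (
        stats.executions < 20
        or stats.success_rate <= 0.95
        or not stats.user_validated
    ):
        return LearningState.AUTO_CANDIDATE
    return LearningState.AUTO_CONFIRMED
```

Under these assumptions, a workflow with 25 executions and 24 successes remains `AUTO_CANDIDATE` until the user validates it, which matches the requirement that autonomous execution is user-confirmed.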