# Product Overview

RPA Vision V3 is a 100% vision-based workflow automation system that learns from user interactions and automates repetitive tasks through semantic understanding of user interfaces.

## Core Concept

Unlike traditional RPA systems that rely on fixed coordinates, RPA Vision V3 uses:
- **Semantic UI understanding** through computer vision and vision-language models (VLMs)
- **Multi-modal embeddings** combining screenshots, text, and UI elements
- **Progressive learning** from observation to autonomous execution
- **Robust matching** that adapts to UI changes

## Key Features

- **Agent V0**: Cross-platform capture tool for recording user sessions
- **Hybrid Detection**: Combines OpenCV, CLIP embeddings, and vision-language models
- **Visual Workflow Builder**: Web-based interface for creating and editing workflows
- **Self-Healing**: Automatic adaptation when UI elements change
- **Analytics System**: Performance monitoring and insights
- **Multi-modal Fusion**: Combines visual, textual, and spatial information
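
To make the multi-modal fusion idea concrete, here is a minimal sketch of one possible scheme: each modality vector is L2-normalized, scaled by a weight, concatenated, and compared with cosine similarity. The function names, the modality keys (`visual`, `text`, `spatial`), and the weights are illustrative assumptions, not the project's actual API or fusion method.

```python
import math
from typing import Dict, List, Sequence


def l2_normalize(vec: Sequence[float]) -> List[float]:
    """Scale a vector to unit length; an all-zero vector stays all-zero."""
    norm = math.sqrt(sum(x * x for x in vec))
    if norm == 0.0:
        return [0.0] * len(vec)
    return [x / norm for x in vec]


def fuse_embeddings(
    parts: Dict[str, Sequence[float]],
    weights: Dict[str, float],
) -> List[float]:
    """Hypothetical fusion: weighted concatenation of per-modality vectors.

    Modalities are visited in sorted key order so that two states built
    from the same modalities always produce comparable vectors.
    """
    fused: List[float] = []
    for name in sorted(parts):
        weight = weights.get(name, 1.0)  # unlisted modalities get weight 1.0
        fused.extend(weight * x for x in l2_normalize(parts[name]))
    return l2_normalize(fused)


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return sum(x * y for x, y in zip(a, b)) / (norm_a * norm_b)


# Example with made-up 2-dimensional vectors per modality.
weights = {"visual": 0.5, "text": 0.3, "spatial": 0.2}
recorded = fuse_embeddings(
    {"visual": [0.9, 0.1], "text": [0.2, 0.8], "spatial": [0.5, 0.5]}, weights
)
current = fuse_embeddings(
    {"visual": [0.8, 0.2], "text": [0.2, 0.8], "spatial": [0.4, 0.6]}, weights
)
print(f"similarity: {cosine_similarity(recorded, current):.3f}")
```

A score close to 1.0 would suggest that the current screen matches the recorded one even if pixels moved; the threshold for accepting a match is a tuning decision this sketch does not cover.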

## Architecture Layers

1. **RawSession (Layer 0)**: Raw event capture (clicks, keystrokes, screenshots)
2. **ScreenState (Layer 1)**: Multi-modal analysis of screen content
3. **UIElement Detection (Layer 2)**: Semantic detection of interface elements
4. **State Embedding (Layer 3)**: Vector representation for similarity matching
5. **Workflow Graph (Layer 4)**: Executable workflow representation
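
The layers above can be pictured as a chain of data structures, each built from the one below it. The sketch below is only an illustration: the class names follow the layer names, but every field and method is a hypothetical choice, not the project's actual schema.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class RawEvent:
    """One captured event in a Layer 0 session."""
    kind: str            # e.g. "click", "keystroke", "screenshot"
    timestamp: float
    payload: dict = field(default_factory=dict)


@dataclass
class RawSession:
    """Layer 0: raw event capture."""
    events: List[RawEvent] = field(default_factory=list)


@dataclass
class UIElement:
    """Layer 2: a semantically detected interface element."""
    label: str                       # e.g. "Submit"
    role: str                        # e.g. "button"
    bbox: Tuple[int, int, int, int]  # x, y, width, height


@dataclass
class ScreenState:
    """Layer 1, enriched with Layer 2 elements and a Layer 3 embedding."""
    screenshot_path: str
    text: str = ""
    elements: List[UIElement] = field(default_factory=list)
    embedding: List[float] = field(default_factory=list)


@dataclass
class WorkflowEdge:
    """A transition between two states, triggered by an action."""
    source: str
    target: str
    action: str


@dataclass
class WorkflowGraph:
    """Layer 4: states as nodes, actions as directed edges."""
    nodes: Dict[str, ScreenState] = field(default_factory=dict)
    edges: List[WorkflowEdge] = field(default_factory=list)

    def add_transition(self, source: str, target: str, action: str) -> None:
        if source not in self.nodes or target not in self.nodes:
            raise KeyError("both states must be added before linking them")
        self.edges.append(WorkflowEdge(source, target, action))

    def actions_from(self, node_id: str) -> List[WorkflowEdge]:
        return [edge for edge in self.edges if edge.source == node_id]


# Example: a two-state workflow with one recorded action.
graph = WorkflowGraph()
graph.nodes["login"] = ScreenState("login.png", text="Sign in")
graph.nodes["home"] = ScreenState("home.png", text="Dashboard")
graph.add_transition("login", "home", action="click:Sign in")
print([edge.action for edge in graph.actions_from("login")])
```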

## Learning Progression

- **OBSERVATION**: 5+ executions to learn patterns
- **COACHING**: 10+ assisted executions with a >90% success rate
- **AUTO_CANDIDATE**: 20+ executions with a >95% success rate
- **AUTO_CONFIRMED**: User-validated autonomous execution
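
The progression above can be sketched as a small promotion function. This is one reading of the thresholds, under two assumptions that the overview does not state: each threshold is the requirement for leaving that stage, and execution counts are tracked per stage rather than cumulatively. The names `Stage` and `next_stage` are hypothetical.

```python
from enum import Enum


class Stage(Enum):
    OBSERVATION = "OBSERVATION"
    COACHING = "COACHING"
    AUTO_CANDIDATE = "AUTO_CANDIDATE"
    AUTO_CONFIRMED = "AUTO_CONFIRMED"


def success_rate(successes: int, executions: int) -> float:
    """Fraction of successful executions; 0.0 when nothing has run yet."""
    return successes / executions if executions > 0 else 0.0


def next_stage(
    stage: Stage,
    executions: int,
    successes: int,
    user_validated: bool = False,
) -> Stage:
    """Return the stage a workflow should be in after the given results.

    Assumption: `executions` and `successes` count runs in the current stage.
    """
    rate = success_rate(successes, executions)
    if stage is Stage.OBSERVATION and executions >= 5:
        return Stage.COACHING
    if stage is Stage.COACHING and executions >= 10 and rate > 0.90:
        return Stage.AUTO_CANDIDATE
    if (
        stage is Stage.AUTO_CANDIDATE
        and executions >= 20
        and rate > 0.95
        and user_validated
    ):
        return Stage.AUTO_CONFIRMED
    return stage  # thresholds not met, or already AUTO_CONFIRMED


# Example: 10 assisted runs with 9 successes is exactly 90%, which does
# not satisfy the strict ">90%" requirement, so the workflow stays put.
print(next_stage(Stage.COACHING, executions=10, successes=9).value)
```

Treating the percentages as strict inequalities follows the `>` in the list; whether a workflow can also be demoted after failures is not described in this overview and is left out of the sketch.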