rpa_vision_v3/.kiro/steering/product.md
Dom a7de6a488b feat: working E2E replay (25/25 actions, 0 retries, SomEngine via server)
Validated on a Windows PC (DESKTOP-58D5CAC, 2560x1600):
- 8 clicks resolved visually (1 anchor_template, 1 som_text_match, 6 som_vlm)
- Average score 0.75, average time 1.6s
- Text typed correctly (bonjour, test word, date, email)
- 0 retries, 2 unverified actions (OK)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-31 14:04:41 +02:00

Product Overview

RPA Vision V3 is a 100% vision-based workflow automation system that learns from user interactions and automates repetitive tasks through semantic understanding of user interfaces.

Core Concept

Unlike traditional RPA systems that rely on fixed coordinates, RPA Vision V3 uses:

  • Semantic UI understanding through computer vision and vision-language models (VLMs)
  • Multi-modal embeddings combining screenshots, text, and UI elements
  • Progressive learning from observation to autonomous execution
  • Robust matching that adapts to UI changes

Key Features

  • Agent V0: Cross-platform capture tool for recording user sessions
  • Hybrid Detection: Combines OpenCV, CLIP embeddings, and VLMs
  • Visual Workflow Builder: Web-based interface for creating and editing workflows
  • Self-Healing: Automatic adaptation when UI elements change
  • Analytics System: Performance monitoring and insights
  • Multi-modal Fusion: Combines visual, textual, and spatial information
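
To make the Multi-modal Fusion feature concrete, here is a minimal sketch of one common way to combine per-modality match scores: a weighted average. The function name `fuse_scores`, the modality keys (`visual`, `textual`, `spatial`), and the equal default weights are all assumptions for illustration; the document does not say how RPA Vision V3 actually fuses its signals.

```python
from typing import Dict, Optional

# Hypothetical default: every modality counts equally.
DEFAULT_WEIGHTS = {"visual": 1.0, "textual": 1.0, "spatial": 1.0}


def fuse_scores(scores: Dict[str, float],
                weights: Optional[Dict[str, float]] = None) -> float:
    """Return the weighted average of the per-modality scores.

    Modalities missing from `weights` contribute nothing, and a
    candidate with no weighted modality scores 0.0.
    """
    weights = weights or DEFAULT_WEIGHTS
    total = sum(weights.get(name, 0.0) for name in scores)
    if total == 0:
        return 0.0
    weighted = sum(weights.get(name, 0.0) * value
                   for name, value in scores.items())
    return weighted / total


candidate = {"visual": 0.9, "textual": 0.6, "spatial": 0.3}
fused = fuse_scores(candidate)
visual_heavy = fuse_scores(candidate,
                           {"visual": 2.0, "textual": 1.0, "spatial": 1.0})
```

A weighted average keeps the fused score in the same range as its inputs, so it could be compared against a single confidence threshold; a real system might instead learn the weights or use a different fusion scheme.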

Architecture Layers

  1. RawSession (Layer 0): Raw event capture (clicks, keystrokes, screenshots)
  2. ScreenState (Layer 1): Multi-modal analysis of screen content
  3. UIElement Detection (Layer 2): Semantic detection of interface elements
  4. State Embedding (Layer 3): Vector representation for similarity matching
  5. Workflow Graph (Layer 4): Executable workflow representation
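
The five layers above can be sketched as plain data structures. This is an illustrative outline only: the class names follow the layer names, but every field, the `cosine_similarity` helper, and `closest_state` are assumptions, not the project's actual schema or API. Cosine similarity is used here as one standard choice for the "similarity matching" that Layer 3 mentions.

```python
import math
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple


@dataclass
class RawSession:
    """Layer 0: raw captured events (hypothetical fields)."""
    events: List[dict]        # clicks and keystrokes
    screenshots: List[str]    # paths to captured images


@dataclass
class UIElement:
    """Layer 2: one semantically detected interface element."""
    label: str
    role: str
    bbox: Tuple[int, int, int, int]  # x, y, width, height


@dataclass
class ScreenState:
    """Layer 1, carrying its Layer 2 elements and Layer 3 embedding."""
    screenshot: str
    text: str
    elements: List[UIElement] = field(default_factory=list)
    embedding: List[float] = field(default_factory=list)


@dataclass
class WorkflowGraph:
    """Layer 4: states as nodes, actions as (source, target, action) edges."""
    nodes: Dict[str, ScreenState] = field(default_factory=dict)
    edges: List[Tuple[str, str, str]] = field(default_factory=list)


def cosine_similarity(a: List[float], b: List[float]) -> float:
    """Cosine of the angle between two vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def closest_state(graph: WorkflowGraph,
                  embedding: List[float]) -> Optional[str]:
    """Return the name of the node most similar to `embedding`, or None."""
    if not graph.nodes:
        return None
    return max(graph.nodes,
               key=lambda name: cosine_similarity(
                   graph.nodes[name].embedding, embedding))
```

During replay, a lookup such as `closest_state(graph, current_embedding)` would identify which known screen the live screen most resembles, which is what lets matching tolerate small UI changes instead of relying on fixed coordinates.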

Learning Progression

  • OBSERVATION: 5+ executions to learn patterns
  • COACHING: 10+ assisted executions with a >90% success rate
  • AUTO_CANDIDATE: 20+ executions with >95% success rate
  • AUTO_CONFIRMED: User-validated autonomous execution
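
The progression above can be read as a small state machine. The sketch below assumes one particular reading: each bullet's threshold is what a workflow must reach in its current stage before it is promoted, the execution count is per stage, and user validation is the final gate into AUTO_CONFIRMED. The names `Stage` and `next_stage` are hypothetical, and the real promotion logic may differ.

```python
from enum import Enum


class Stage(Enum):
    OBSERVATION = 1
    COACHING = 2
    AUTO_CANDIDATE = 3
    AUTO_CONFIRMED = 4


def next_stage(stage: Stage, executions: int, success_rate: float,
               user_validated: bool = False) -> Stage:
    """Return the stage a workflow should be in after its latest run.

    `executions` and `success_rate` (0.0 to 1.0) are assumed to be
    measured within the current stage.
    """
    if stage is Stage.OBSERVATION and executions >= 5:
        return Stage.COACHING
    if (stage is Stage.COACHING and executions >= 10
            and success_rate > 0.90):
        return Stage.AUTO_CANDIDATE
    if (stage is Stage.AUTO_CANDIDATE and executions >= 20
            and success_rate > 0.95 and user_validated):
        return Stage.AUTO_CONFIRMED
    return stage  # thresholds not met: stay where we are
```

For example, `next_stage(Stage.COACHING, 12, 0.92)` promotes the workflow to `Stage.AUTO_CANDIDATE`, while `next_stage(Stage.AUTO_CANDIDATE, 25, 0.97)` leaves it unchanged until a user validates it.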