rpa_vision_v3/.kiro/specs/rpa-vision-excellence/requirements.md
Dom a7de6a488b feat: functional E2E replay, 25/25 actions, 0 retries, SomEngine via server
Validated on a Windows PC (DESKTOP-58D5CAC, 2560x1600):
- 8 clicks resolved visually (1 anchor_template, 1 som_text_match, 6 som_vlm)
- Mean score 0.75, mean time 1.6 s
- Text typed correctly (bonjour, test word, date, email)
- 0 retries, 2 unverified actions (OK)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-31 14:04:41 +02:00


Requirements Document

Introduction

This document defines the requirements for turning RPA Vision V3 into a production-grade, 100% vision-based RPA system. The goal is to improve training reliability, matching robustness, and the ability to adapt continuously to user-interface changes.

The current system has critical gaps: no validation of training quality, overly simplistic matching based solely on global similarity, no continuous learning, and insufficient handling of screen variants.

Glossary

  • Training_Quality_Validator: Component that evaluates the quality of workflows generated from training sessions
  • Hierarchical_Matcher: Multi-level matching system (window → region → element)
  • Continuous_Learner: Continuous-learning module that adapts workflows to changes
  • Variant_Manager: Manager for the legitimate variants of a single screen state
  • Drift_Detector: Detector of significant changes in the user interface
  • Cluster_Quality_Score: Quality metric for a DBSCAN cluster (silhouette, cohesion, separation)
  • Embedding_Prototype: Representative vector of a screen state (normalized mean of the cluster's embeddings)
  • Temporal_Context: Sequence of preceding states that influences the current match
  • UI_State: State of a UI element (enabled, disabled, checked, loading, error)
  • Spatial_Relation: Spatial relation between elements (above, below, left_of, right_of, inside)

Requirements

Requirement 1: Training Quality Validation

User Story: As an RPA developer, I want to validate the quality of trained workflows, so that I can ensure reliable replay execution.

Acceptance Criteria

  1. WHEN a workflow is built from a session THEN the Training_Quality_Validator SHALL compute cluster quality metrics including silhouette score, cohesion, and separation for each detected pattern
  2. WHEN cluster quality score falls below 0.7 THEN the Training_Quality_Validator SHALL flag the cluster as low-confidence and require additional training samples
  3. WHEN computing embedding prototypes THEN the Training_Quality_Validator SHALL detect and exclude outlier embeddings using IQR method with 1.5 threshold
  4. WHEN a workflow contains fewer than 3 observations per node THEN the Training_Quality_Validator SHALL mark the workflow as insufficient-data and prevent AUTO_CANDIDATE transition
  5. WHEN validating a workflow THEN the Training_Quality_Validator SHALL perform cross-validation by holding out 20% of observations and measuring match accuracy
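
Criterion 3 and the Embedding_Prototype definition can be illustrated with a minimal sketch. The specification does not say which quantity the IQR rule applies to; this example assumes it is each embedding's Euclidean distance to the cluster centroid, and it only applies the upper fence since distances are non-negative. The function names `filter_outliers_iqr` and `compute_prototype` are hypothetical.

```python
import math
import statistics


def _mean_vector(vectors):
    """Component-wise mean of equal-length vectors."""
    dim = len(vectors[0])
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]


def _l2_normalize(vec):
    """Scale a vector to unit length (returned unchanged if it is all zeros)."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm > 0 else list(vec)


def filter_outliers_iqr(embeddings, k=1.5):
    """Keep embeddings whose distance to the centroid is <= Q3 + k * IQR.

    Assumption: the IQR rule is applied to centroid distances.
    """
    if len(embeddings) < 4:
        # Too few samples for meaningful quartiles; keep everything.
        return list(embeddings)
    centroid = _mean_vector(embeddings)
    dists = [math.dist(e, centroid) for e in embeddings]
    q1, _, q3 = statistics.quantiles(dists, n=4)
    upper = q3 + k * (q3 - q1)
    return [e for e, d in zip(embeddings, dists) if d <= upper]


def compute_prototype(embeddings, k=1.5):
    """Normalized mean of the embeddings that survive outlier filtering."""
    kept = filter_outliers_iqr(embeddings, k)
    return _l2_normalize(_mean_vector(kept))
```

Filtering before averaging keeps a single stray screenshot from pulling the prototype away from the cluster it is meant to represent.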

Requirement 2: Hierarchical Matching System

User Story: As an RPA system, I want to match screens using multiple levels of granularity, so that I can achieve more robust and accurate state recognition.

Acceptance Criteria

  1. WHEN matching a screenshot THEN the Hierarchical_Matcher SHALL first match at window level using title pattern and process name with confidence weight 0.2
  2. WHEN window-level match succeeds THEN the Hierarchical_Matcher SHALL match at region level by comparing detected UI regions with stored region templates
  3. WHEN region-level match succeeds THEN the Hierarchical_Matcher SHALL match at element level by comparing individual UI elements within matched regions
  4. WHEN computing final match confidence THEN the Hierarchical_Matcher SHALL combine window, region, and element confidences using weighted formula: 0.2 × window + 0.3 × region + 0.5 × element
  5. WHEN temporal context is available THEN the Hierarchical_Matcher SHALL boost confidence for nodes that are valid successors of the previous matched node by 0.1
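
The weighted combination in criterion 4 and the temporal boost in criterion 5 can be sketched as follows. The weights and the 0.1 boost come from the criteria above; capping the result at 1.0 is an assumption of this sketch, and `combine_confidence` is a hypothetical name.

```python
# Weights and boost taken from the acceptance criteria.
WINDOW_WEIGHT = 0.2
REGION_WEIGHT = 0.3
ELEMENT_WEIGHT = 0.5
TEMPORAL_BOOST = 0.1


def combine_confidence(window, region, element, is_valid_successor=False):
    """Combine per-level confidences (each in [0, 1]) into one score.

    The boost is added when the candidate node is a valid successor of the
    previously matched node. Capping at 1.0 is an assumption, not a stated
    requirement.
    """
    score = (WINDOW_WEIGHT * window
             + REGION_WEIGHT * region
             + ELEMENT_WEIGHT * element)
    if is_valid_successor:
        score += TEMPORAL_BOOST
    return min(score, 1.0)
```

For example, confidences of 1.0 (window), 0.5 (region), and 0.8 (element) combine to 0.2 + 0.15 + 0.4 = 0.75, or 0.85 with the temporal boost.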

Requirement 3: Continuous Learning and Adaptation

User Story: As an RPA system, I want to continuously learn from new observations, so that I can adapt to UI changes without full retraining.

Acceptance Criteria

  1. WHEN a successful execution occurs THEN the Continuous_Learner SHALL update the node embedding prototype using exponential moving average with alpha 0.1
  2. WHEN match confidence drops below 0.85 for 3 consecutive executions THEN the Drift_Detector SHALL flag potential UI drift and notify the user
  3. WHEN UI drift is confirmed THEN the Continuous_Learner SHALL create a new variant for the affected node while preserving the original
  4. WHEN a node has more than 5 variants THEN the Continuous_Learner SHALL consolidate variants by re-clustering with updated parameters
  5. WHEN updating prototypes THEN the Continuous_Learner SHALL maintain version history allowing rollback to previous prototype versions
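
Criteria 1 and 2 can be illustrated with a short sketch: an exponential-moving-average prototype update with alpha 0.1, and a detector that flags drift after 3 consecutive confidences below 0.85. Re-normalizing the prototype after each update is an assumption (the glossary describes prototypes as normalized), and the names `ema_update` and `DriftDetector` are hypothetical.

```python
import math
from collections import deque


def ema_update(prototype, observation, alpha=0.1):
    """Blend a new observation into the prototype, then re-normalize.

    new = (1 - alpha) * prototype + alpha * observation
    """
    blended = [(1 - alpha) * p + alpha * o
               for p, o in zip(prototype, observation)]
    norm = math.sqrt(sum(x * x for x in blended))
    return [x / norm for x in blended] if norm > 0 else blended


class DriftDetector:
    """Flags potential drift when the last `window` confidences are all low."""

    def __init__(self, threshold=0.85, window=3):
        self.threshold = threshold
        self._recent = deque(maxlen=window)

    def record(self, confidence):
        """Record one execution's match confidence; return True on drift."""
        self._recent.append(confidence)
        window_full = len(self._recent) == self._recent.maxlen
        return window_full and all(c < self.threshold for c in self._recent)
```

A small alpha means a single unusual execution barely moves the prototype, while a sustained change shifts it gradually. The bounded deque makes a single high-confidence execution reset the "consecutive" condition.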

Requirement 4: Variant and State Management

User Story: As an RPA developer, I want the system to handle screen variants and dynamic states, so that workflows work reliably across different UI conditions.

Acceptance Criteria

  1. WHEN building a workflow THEN the Variant_Manager SHALL detect and group similar but distinct screen states as variants of the same logical node
  2. WHEN a variant differs by more than 0.3 similarity from the primary prototype THEN the Variant_Manager SHALL create a separate variant entry with its own embedding
  3. WHEN matching against a node with variants THEN the Variant_Manager SHALL match against all variants and return the best match with variant identifier
  4. WHEN detecting UI element states THEN the Variant_Manager SHALL identify and store element states including enabled, disabled, checked, unchecked, loading, and error
  5. WHEN an unexpected popup or modal appears THEN the Variant_Manager SHALL detect the overlay and pause execution for user decision or apply configured handling rule
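
Criterion 3 (match against every variant and return the best one with its identifier) can be sketched as below. Cosine similarity is an assumed metric, the variant store is modeled as a plain dictionary for illustration, and `best_variant` is a hypothetical name.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def best_variant(query, variants):
    """Return (variant_id, similarity) for the closest variant.

    `variants` maps a variant identifier to its embedding.
    """
    if not variants:
        raise ValueError("node has no variants to match against")
    scored = ((vid, cosine_similarity(query, emb))
              for vid, emb in variants.items())
    return max(scored, key=lambda pair: pair[1])
```

Returning the identifier alongside the score lets the caller record which variant was actually seen during execution.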

Requirement 5: Advanced UI Understanding

User Story: As an RPA system, I want to understand UI structure and relationships, so that I can locate elements more reliably even when positions change.

Acceptance Criteria

  1. WHEN detecting UI elements THEN the UI_Analyzer SHALL compute spatial relations between elements including above, below, left_of, right_of, and inside
  2. WHEN building a workflow THEN the UI_Analyzer SHALL group related elements into semantic containers such as forms, menus, toolbars, and dialogs
  3. WHEN resolving a target element THEN the UI_Analyzer SHALL use spatial relations as fallback when direct matching fails
  4. WHEN an element cannot be found by primary strategy THEN the UI_Analyzer SHALL search using anchor elements with known spatial relations
  5. WHEN detecting element states THEN the UI_Analyzer SHALL use visual features including color, opacity, and border style to determine enabled, disabled, or loading states
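
The spatial relations named in criterion 1 can be derived from bounding boxes. The sketch below assumes axis-aligned boxes in screen coordinates (origin at top-left, y increasing downward) and treats relations strictly, with no overlap on the relevant axis. `Box` and `spatial_relations` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Box:
    """Axis-aligned bounding box; (x, y) is the top-left corner."""
    x: float
    y: float
    w: float
    h: float

    @property
    def right(self):
        return self.x + self.w

    @property
    def bottom(self):
        return self.y + self.h


def spatial_relations(a, b):
    """Return the set of relations describing box `a` relative to box `b`."""
    # Containment is checked first and reported on its own.
    if (a.x >= b.x and a.y >= b.y
            and a.right <= b.right and a.bottom <= b.bottom):
        return {"inside"}
    relations = set()
    if a.bottom <= b.y:
        relations.add("above")
    if a.y >= b.bottom:
        relations.add("below")
    if a.right <= b.x:
        relations.add("left_of")
    if a.x >= b.right:
        relations.add("right_of")
    return relations
```

A label that sits to the left of its input field yields `{"left_of"}`, a relation that survives when both elements shift together on screen, which is what makes anchors useful as a fallback.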

Requirement 6: Training Session Quality

User Story: As an RPA developer, I want feedback on training session quality, so that I can improve my demonstrations for better workflow reliability.

Acceptance Criteria

  1. WHEN a training session is uploaded THEN the Session_Analyzer SHALL compute a quality score based on screenshot clarity, action consistency, and timing patterns
  2. WHEN screenshots have low contrast or blur THEN the Session_Analyzer SHALL flag affected frames and suggest re-recording
  3. WHEN action timing deviates from the mean by more than 2 standard deviations THEN the Session_Analyzer SHALL identify potentially problematic transitions
  4. WHEN duplicate or near-duplicate screenshots exceed 30% of session THEN the Session_Analyzer SHALL suggest optimizing capture frequency
  5. WHEN generating quality report THEN the Session_Analyzer SHALL provide actionable recommendations for improving training data
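
The timing check in criterion 3 can be sketched as a simple outlier test. This example assumes the intended rule is "more than 2 sample standard deviations away from the session mean"; `flag_timing_outliers` is a hypothetical name.

```python
import statistics


def flag_timing_outliers(durations, k=2.0):
    """Return indices of transitions whose duration is a timing outlier.

    A duration is flagged when it lies more than k sample standard
    deviations from the mean of all durations in the session.
    """
    if len(durations) < 2:
        return []  # a standard deviation needs at least two samples
    mean = statistics.fmean(durations)
    std = statistics.stdev(durations)
    if std == 0:
        return []  # perfectly regular timing, nothing to flag
    return [i for i, d in enumerate(durations) if abs(d - mean) > k * std]
```

The returned indices can be mapped back to the recorded transitions so that the quality report points at specific steps rather than at the session as a whole.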

Requirement 7: Execution Robustness

User Story: As an RPA system, I want robust execution handling, so that workflows can recover from transient failures and unexpected conditions.

Acceptance Criteria

  1. WHEN an action fails THEN the Execution_Engine SHALL retry with exponential backoff up to 3 times before marking as failed
  2. WHEN a target element is not found THEN the Execution_Engine SHALL wait up to configured timeout with periodic re-detection before failing
  3. WHEN screen state does not match expected post-condition THEN the Execution_Engine SHALL attempt recovery by re-matching current state to workflow graph
  4. WHEN execution encounters an unknown screen THEN the Execution_Engine SHALL pause and request user guidance with screenshot and context
  5. WHEN recovering from failure THEN the Execution_Engine SHALL log detailed diagnostics including screenshots, match scores, and attempted strategies
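
Criterion 1 (retry with exponential backoff, up to 3 times) can be sketched as follows. The base delay of 0.5 s and the doubling factor are assumptions, since the criteria do not fix them, and `run_with_retry` is a hypothetical name. The `sleep` parameter is injectable so the behavior can be checked without real waiting.

```python
import time


def run_with_retry(action, max_retries=3, base_delay=0.5, sleep=time.sleep):
    """Call `action`; on failure, retry up to `max_retries` times.

    The wait before retry n (0-based) is base_delay * 2**n. After the last
    retry fails, the exception propagates so the caller can mark the
    action as failed.
    """
    attempt = 0
    while True:
        try:
            return action()
        except Exception:
            if attempt >= max_retries:
                raise
            sleep(base_delay * 2 ** attempt)
            attempt += 1
```

With the defaults, a permanently failing action is invoked 4 times in total (1 initial attempt plus 3 retries), with waits of 0.5 s, 1 s, and 2 s in between.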