Self-Healing Workflows Design Document

Overview

This document specifies the design for an enhanced self-healing system for RPA Vision V3 that combines existing healing strategies with progressive tolerance relaxation during retries. The system enables workflows to automatically recover from failures by applying increasingly tolerant matching criteria and spatial search parameters.

Architecture

The self-healing system consists of two main integration points:

  1. Target Resolver Healing Integration: Progressive relaxation of matching criteria based on healing attempt counter
  2. Action Executor Retry Integration: Activation of healing mode during retry loops with exponential backoff

Core Components

┌─────────────────────────────────────────────────────────────────┐
│                    Self-Healing Architecture                     │
├─────────────────────────────────────────────────────────────────┤
│                                                                 │
│  ┌─────────────────┐    ┌─────────────────┐    ┌─────────────┐ │
│  │  Action         │    │  Target         │    │  Healing    │ │
│  │  Executor       │    │  Resolver       │    │  Profiles   │ │
│  │                 │    │                 │    │             │ │
│  │ • Retry Loop    │◄──►│ • healing_      │◄──►│ • min_ratio │ │
│  │ • Backoff       │    │   attempt       │    │ • pad_mul   │ │
│  │ • Counter Mgmt  │    │ • Role Aliases  │    │ • expand_   │ │
│  │                 │    │ • Fuzzy Thresh  │    │   roles     │ │
│  └─────────────────┘    └─────────────────┘    └─────────────┘ │
│           │                       │                       │     │
│           │                       │                       │     │
│  ┌─────────────────┐    ┌─────────────────┐    ┌─────────────┐ │
│  │  Existing       │    │  Spatial        │    │  Metrics    │ │
│  │  Healing        │    │  Search         │    │  Collection │ │
│  │  Strategies     │    │  Enhancement    │    │             │ │
│  │                 │    │                 │    │ • Success   │ │
│  │ • Semantic      │    │ • ROI Padding   │    │   Rates     │ │
│  │ • Spatial       │    │ • Container     │    │ • Attempt   │ │
│  │ • Timing        │    │   Detection     │    │   Counts    │ │
│  │ • Format        │    │                 │    │             │ │
│  └─────────────────┘    └─────────────────┘    └─────────────┘ │
│                                                                 │
└─────────────────────────────────────────────────────────────────┘

Components and Interfaces

1. Healing Profile System

The healing profile system provides progressive tolerance based on attempt count:

from dataclasses import dataclass

@dataclass
class HealingProfile:
    """Configuration for healing attempt tolerance levels"""
    min_ratio: float         # Fuzzy matching threshold (lower = more tolerant)
    pad_mul: float           # Spatial padding multiplier (higher = larger search area)
    expand_roles: bool       # Whether to use role aliases
    attempt_level: int       # Healing attempt level (0=strict, 1+=relaxed)

Healing Profiles by Attempt Level:

  • Level 0 (Normal): min_ratio=0.82, pad_mul=1.0, expand_roles=False
  • Level 1 (First Healing): min_ratio=0.78, pad_mul=1.3, expand_roles=True
  • Level 2+ (Desperate): min_ratio=0.72, pad_mul=1.7, expand_roles=True
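
The three levels above can be expressed as a small lookup. The following is a minimal sketch, not the project's actual `_healing_profile` implementation: the helper name `healing_profile_for` is hypothetical, and it assumes that any attempt counter of 2 or more maps to the Level 2+ profile and that negative counters are treated as strict.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class HealingProfile:
    """Tolerance settings for one healing attempt level."""
    min_ratio: float     # Fuzzy matching threshold
    pad_mul: float       # Spatial padding multiplier
    expand_roles: bool   # Whether to use role aliases
    attempt_level: int   # 0 = strict, 1+ = relaxed


# Values copied from the profile list above.
_PROFILES = (
    HealingProfile(min_ratio=0.82, pad_mul=1.0, expand_roles=False, attempt_level=0),
    HealingProfile(min_ratio=0.78, pad_mul=1.3, expand_roles=True, attempt_level=1),
    HealingProfile(min_ratio=0.72, pad_mul=1.7, expand_roles=True, attempt_level=2),
)


def healing_profile_for(attempt: int) -> HealingProfile:
    """Hypothetical helper: map a healing attempt counter to a profile.

    Assumption: negative counters are strict, counters >= 2 use Level 2+.
    """
    level = max(0, min(attempt, len(_PROFILES) - 1))
    return _PROFILES[level]
```

Keeping the profiles in one immutable table makes it easy to check that tolerance only ever increases from one level to the next.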

2. Role Alias System

Semantic role expansion for more tolerant matching:

ROLE_ALIASES = {
    "input": {"input", "textfield", "text_field", "form_input", "forminput", "edit", "textbox"},
    "button": {"button", "submit", "action", "cta"},
    "label": {"label", "text", "data_display"},
    "checkbox": {"checkbox", "check_box", "toggle"},
}

TYPE_ALIASES = {
    "text_input": {"text_input", "input", "textfield"},
    "button": {"button"},
}
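
As an illustration of how the alias table might be consulted, here is a hedged sketch. The helpers `accepted_roles` and `role_matches` are hypothetical names, and the case and whitespace normalisation is an assumption that the design does not specify.

```python
from typing import Dict, Set

ROLE_ALIASES: Dict[str, Set[str]] = {
    "input": {"input", "textfield", "text_field", "form_input", "forminput", "edit", "textbox"},
    "button": {"button", "submit", "action", "cta"},
    "label": {"label", "text", "data_display"},
    "checkbox": {"checkbox", "check_box", "toggle"},
}


def accepted_roles(target_role: str, expand_roles: bool) -> Set[str]:
    """Hypothetical helper: roles an element may carry to satisfy target_role."""
    role = target_role.strip().lower()  # assumption: roles compare case-insensitively
    if not expand_roles:
        return {role}
    # Roles without an alias entry only match themselves.
    return ROLE_ALIASES.get(role, set()) | {role}


def role_matches(target_role: str, element_role: str, expand_roles: bool) -> bool:
    """True if element_role is acceptable for target_role under the given mode."""
    return element_role.strip().lower() in accepted_roles(target_role, expand_roles)
```

With `expand_roles=False` (Level 0) a `submit` element does not satisfy a `button` target; from Level 1 onward it does.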

3. Enhanced Target Resolver

The TargetResolver class is enhanced with healing capabilities:

from typing import Any, Dict, List, Optional

class TargetResolver:
    def __init__(self):
        self.healing_attempt = 0  # Healing attempt counter (0 = strict matching)

    def _healing_profile(self) -> Dict[str, Any]:
        """Get tolerance profile based on healing attempt level"""

    def _find_element_by_text(self, text: str, ui_elements: List[UIElement],
                             min_ratio: float = 0.65) -> Optional[UIElement]:
        """Enhanced with configurable fuzzy threshold"""

    def _resolve_by_role(self, role: str, ...) -> Optional[ResolvedTarget]:
        """Enhanced with role alias expansion"""

    def _build_anchor_and_roi_and_container(self, target_spec, ui_elements):
        """Enhanced with configurable padding multipliers"""

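The configurable fuzzy threshold can be illustrated with the standard library. This is only a sketch: the design does not say which similarity measure `_find_element_by_text` uses, so `difflib.SequenceMatcher` is an assumption, and `best_text_match` is a hypothetical helper that works on plain strings rather than `UIElement` objects.

```python
from difflib import SequenceMatcher
from typing import Optional, Sequence


def best_text_match(text: str, candidates: Sequence[str], min_ratio: float) -> Optional[str]:
    """Return the candidate most similar to text, or None if it scores below min_ratio."""
    wanted = text.strip().lower()
    best: Optional[str] = None
    best_ratio = 0.0
    for candidate in candidates:
        # ratio() is 2*M/T: matched characters over total characters in both strings.
        ratio = SequenceMatcher(None, wanted, candidate.strip().lower()).ratio()
        if ratio > best_ratio:
            best, best_ratio = candidate, ratio
    return best if best_ratio >= min_ratio else None


labels = ["valider", "cancel"]
strict = best_text_match("validate", labels, min_ratio=0.82)   # Level 0 threshold
relaxed = best_text_match("validate", labels, min_ratio=0.78)  # Level 1 threshold
```

Here "validate" and "valider" share six matched characters out of fifteen in total, a ratio of 0.8, so the label is rejected at the strict threshold and accepted at the first healing level.
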
4. Enhanced Action Executor

The ActionExecutor integrates healing during retry loops:

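import time

class ActionExecutor:
    def execute_edge(self, edge: WorkflowEdge, screen_state: ScreenState) -> ExecutionResult:
        """Enhanced with healing activation during retries"""
        # retries, backoff_ms and context come from the edge/executor configuration;
        # how they are obtained is outside the scope of this excerpt.

        # Normal (strict) execution attempt
        result = self._execute_action(edge.action, screen_state, context, edge)

        # If failed and retries configured, activate healing
        if result.status != ExecutionStatus.SUCCESS and retries > 0:
            for i in range(retries):
                # Apply exponential backoff: backoff_ms, 2x, 4x, ...
                time.sleep((backoff_ms * (2 ** i)) / 1000.0)

                # Activate healing attempt on resolver (1 = first healing, 2+ = desperate)
                self.target_resolver.healing_attempt = i + 1

                try:
                    # Retry the action directly. Calling execute_edge here would
                    # restart the retry loop inside every retry.
                    # current_state: screen state re-captured before the retry (not shown)
                    result = self._execute_action(edge.action, current_state, context, edge)
                finally:
                    # Always reset healing attempt
                    self.target_resolver.healing_attempt = 0

                if result.status == ExecutionStatus.SUCCESS:
                    return result

        # Either the first attempt succeeded or all retries were exhausted
        return result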
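
The backoff expression in the retry loop yields a doubling schedule. The sketch below makes it explicit; the helper names `backoff_delays_s` and `total_backoff_ms` are hypothetical and the 200 ms base is only an example value.

```python
from typing import List


def backoff_delays_s(backoff_ms: int, retries: int) -> List[float]:
    """Sleep (in seconds) before retry i, using backoff_ms * 2**i / 1000."""
    return [(backoff_ms * (2 ** i)) / 1000.0 for i in range(retries)]


def total_backoff_ms(backoff_ms: int, retries: int) -> int:
    """Worst-case total sleep: the geometric sum backoff_ms * (2**retries - 1)."""
    return backoff_ms * (2 ** retries - 1)


delays = backoff_delays_s(200, 3)      # one delay per healing attempt 1, 2, 3
worst_case = total_backoff_ms(200, 3)  # milliseconds slept if every retry fails
```

Because the total grows geometrically, the retry count bounds both the healing level reached and the worst-case latency added to an edge.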

Data Models

HealingAttemptMetrics

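from dataclasses import dataclass
from datetime import datetime
from typing import Any, Dict

@dataclass
class HealingAttemptMetrics:
    """Metrics for healing attempt tracking"""
    attempt_level: int                 # Healing level at which the attempt ran
    success: bool                      # Whether the attempt resolved the target
    strategy_used: str                 # Name of the strategy that was applied
    original_criteria: Dict[str, Any]  # Criteria before relaxation
    relaxed_criteria: Dict[str, Any]   # Criteria actually applied
    duration_ms: float                 # Wall-clock duration of the attempt
    timestamp: datetime                # When the attempt was recorded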
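
One way such a record might be populated is sketched below. `measure_attempt` is a hypothetical helper, the strategy name `"fuzzy_text"` is an invented example value, and the use of a UTC timestamp and `time.perf_counter` for timing are assumptions rather than documented behaviour.

```python
import time
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Any, Callable, Dict


@dataclass
class HealingAttemptMetrics:
    attempt_level: int
    success: bool
    strategy_used: str
    original_criteria: Dict[str, Any]
    relaxed_criteria: Dict[str, Any]
    duration_ms: float
    timestamp: datetime


def measure_attempt(attempt_level: int, strategy_used: str,
                    original_criteria: Dict[str, Any], relaxed_criteria: Dict[str, Any],
                    resolve: Callable[[], bool]) -> HealingAttemptMetrics:
    """Hypothetical helper: run one resolution callable and time it."""
    started = time.perf_counter()
    success = bool(resolve())
    duration_ms = (time.perf_counter() - started) * 1000.0
    return HealingAttemptMetrics(
        attempt_level=attempt_level,
        success=success,
        strategy_used=strategy_used,
        original_criteria=original_criteria,
        relaxed_criteria=relaxed_criteria,
        duration_ms=duration_ms,
        timestamp=datetime.now(timezone.utc),  # assumption: timestamps stored in UTC
    )


record = measure_attempt(1, "fuzzy_text", {"min_ratio": 0.82}, {"min_ratio": 0.78},
                         resolve=lambda: True)
```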

ResolutionDetails Enhancement

The existing ResolvedTarget.resolution_details is enhanced with healing information:

resolution_details = {
    "healing_attempt": int,           # Current healing attempt level
    "healing_profile": Dict[str, Any], # Applied healing profile
    "role_aliases_used": List[str],   # Role aliases that were tried
    "fuzzy_threshold_used": float,    # Actual fuzzy threshold used
    "spatial_padding_used": float,    # Spatial padding multiplier used
}

Correctness Properties

A property is a characteristic or behavior that should hold true across all valid executions of a system. In essence, it is a formal statement about what the system should do. Properties serve as the bridge between human-readable specifications and machine-verifiable correctness guarantees.

Property 1: Healing Attempt Progression

For any target resolution that fails initially, when healing attempts are incremented, the tolerance criteria should become progressively more relaxed.
Validates: Requirements 7.2, 7.3, 7.4

Property 2: Healing Counter Reset

For any successful action execution after healing attempts, the healing attempt counter should be reset to zero.
Validates: Requirements 8.4
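
Property 2 can be enforced structurally rather than by convention. The sketch below wraps the set-and-reset of the counter in a context manager; `healing_scope` and `ResolverStub` are hypothetical names introduced for illustration, not part of the existing code.

```python
from contextlib import contextmanager
from typing import Iterator


class ResolverStub:
    """Stand-in for TargetResolver; only the counter matters here."""

    def __init__(self) -> None:
        self.healing_attempt = 0


@contextmanager
def healing_scope(resolver: ResolverStub, attempt: int) -> Iterator[None]:
    """Hypothetical helper: set the counter for one retry, then always reset it."""
    resolver.healing_attempt = attempt
    try:
        yield
    finally:
        # Runs on success, failure, and exceptions alike.
        resolver.healing_attempt = 0


resolver = ResolverStub()
with healing_scope(resolver, 2):
    level_inside = resolver.healing_attempt
level_after = resolver.healing_attempt
```

Because the reset sits in a `finally` clause, the counter returns to zero even when the retried action raises.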

Property 3: Role Alias Expansion

For any role-based target resolution with healing active, the system should accept elements matching role aliases when expand_roles is true.
Validates: Requirements 7.3

Property 4: Fuzzy Threshold Relaxation

For any text-based target resolution, the fuzzy matching threshold should decrease (become more tolerant) as healing attempt level increases.
Validates: Requirements 7.2

Property 5: Spatial Padding Expansion

For any spatial search operation during healing, the padding multiplier should never decrease as the healing attempt level increases, so the search area grows (or stays the same) with each level.
Validates: Requirements 7.4

Property 6: Backoff Timing Consistency

For any retry sequence with healing, the delay between attempts should follow an exponential backoff pattern regardless of healing success.
Validates: Requirements 8.2

Property 7: Healing Metrics Recording

For any healing attempt, the system should record metrics including attempt level, success status, and applied criteria.
Validates: Requirements 7.5

Error Handling

Healing Failure Scenarios

  1. Maximum Attempts Reached: Log all attempted healing profiles and strategies
  2. Invalid Healing Configuration: Fall back to strict matching with warning
  3. Role Alias Resolution Conflicts: Use first successful match with preference logging
  4. Spatial Search Boundary Violations: Clamp to screen boundaries with adjustment logging
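
The boundary clamping in scenario 4 could look like the sketch below. The design does not state how `pad_mul` is applied to a region of interest, so scaling the box around its centre is an assumption, and `pad_and_clamp` is a hypothetical helper.

```python
from typing import Tuple

Box = Tuple[int, int, int, int]  # (x, y, width, height) in pixels


def pad_and_clamp(roi: Box, pad_mul: float, screen_w: int, screen_h: int) -> Box:
    """Scale roi around its centre by pad_mul, then clamp it to the screen."""
    x, y, w, h = roi
    cx, cy = x + w / 2.0, y + h / 2.0
    new_w, new_h = w * pad_mul, h * pad_mul
    # Clamp each edge independently so the box never leaves the screen.
    left = max(0, int(round(cx - new_w / 2.0)))
    top = max(0, int(round(cy - new_h / 2.0)))
    right = min(screen_w, int(round(cx + new_w / 2.0)))
    bottom = min(screen_h, int(round(cy + new_h / 2.0)))
    return (left, top, max(0, right - left), max(0, bottom - top))


# A box near the bottom-right corner of a 2560x1600 screen, padded at Level 1.
clamped = pad_and_clamp((2500, 1550, 100, 100), pad_mul=1.3, screen_w=2560, screen_h=1600)
```

Clamping shrinks the padded box instead of shifting it, so the search area stays centred on the original anchor wherever the screen allows.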

Recovery Strategies

  1. Graceful Degradation: If healing system fails, continue with strict matching
  2. Profile Validation: Validate healing profiles before application
  3. Counter Synchronization: Ensure healing counter consistency across components
  4. Metrics Resilience: Continue operation even if metrics collection fails
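
Profile validation with graceful degradation might be sketched as follows. The acceptance bounds (`0 < min_ratio <= 1`, `pad_mul >= 1`) are assumptions chosen for illustration, and `validated_profile` is a hypothetical helper; the strict fallback values are the Level 0 settings.

```python
import logging
from typing import Any, Dict

logger = logging.getLogger(__name__)

# Level 0 settings, used whenever a profile cannot be trusted.
STRICT_PROFILE: Dict[str, Any] = {"min_ratio": 0.82, "pad_mul": 1.0, "expand_roles": False}


def validated_profile(profile: Any) -> Dict[str, Any]:
    """Return a cleaned copy of profile, or the strict profile if it is invalid."""
    try:
        min_ratio = float(profile["min_ratio"])
        pad_mul = float(profile["pad_mul"])
        expand_roles = profile["expand_roles"]
    except (KeyError, TypeError, ValueError):
        logger.warning("Malformed healing profile %r; using strict matching", profile)
        return dict(STRICT_PROFILE)
    # Assumed bounds: a ratio is a similarity in (0, 1]; padding never shrinks the ROI.
    if not (0.0 < min_ratio <= 1.0) or pad_mul < 1.0 or not isinstance(expand_roles, bool):
        logger.warning("Out-of-range healing profile %r; using strict matching", profile)
        return dict(STRICT_PROFILE)
    return {"min_ratio": min_ratio, "pad_mul": pad_mul, "expand_roles": expand_roles}
```

Returning a fresh copy of the strict profile keeps callers from mutating the shared fallback.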

Testing Strategy

Unit Tests

  • Test healing profile generation for different attempt levels
  • Test role alias expansion logic
  • Test fuzzy threshold adjustment
  • Test spatial padding calculations
  • Test healing counter management

Property-Based Tests

  • Property 1: Healing progression tolerance verification
  • Property 2: Counter reset consistency
  • Property 3: Role alias acceptance
  • Property 4: Fuzzy threshold relaxation
  • Property 5: Spatial padding expansion
  • Property 6: Backoff timing verification
  • Property 7: Metrics recording completeness
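
As a starting point for Properties 4 and 5, the check below verifies that a sequence of profiles never becomes stricter from one level to the next. It is a sketch using only the standard library rather than a property-testing framework, and `is_progressively_relaxed` is a hypothetical name.

```python
from typing import Mapping, Sequence


def is_progressively_relaxed(profiles: Sequence[Mapping[str, float]]) -> bool:
    """True if each level is at least as tolerant as the level before it."""
    for prev, cur in zip(profiles, profiles[1:]):
        if cur["min_ratio"] > prev["min_ratio"]:  # Property 4: threshold must not rise
            return False
        if cur["pad_mul"] < prev["pad_mul"]:      # Property 5: padding must not shrink
            return False
    return True


# The three documented levels, in order of attempt level.
DOCUMENTED = [
    {"min_ratio": 0.82, "pad_mul": 1.0},
    {"min_ratio": 0.78, "pad_mul": 1.3},
    {"min_ratio": 0.72, "pad_mul": 1.7},
]
documented_ok = is_progressively_relaxed(DOCUMENTED)
```

A property-based test would feed this check with generated profile tables and with the output of the real profile lookup for arbitrary attempt counters.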

Integration Tests

  • End-to-end healing scenarios with UI changes
  • Multi-attempt healing sequences
  • Healing with existing self-healing strategies
  • Performance impact measurement
  • Cross-component healing coordination

Performance Tests

  • Healing overhead measurement (target: under 1 ms per attempt, excluding backoff sleep)
  • Memory usage during extended healing sequences
  • Concurrent healing attempt handling
  • Cache interaction with healing profiles