# Self-Healing Workflows Design Document

## Overview

This document specifies the design for an enhanced self-healing system for RPA Vision V3 that combines existing healing strategies with progressive tolerance relaxation during retries. The system enables workflows to recover automatically from failures by applying increasingly tolerant matching criteria and spatial search parameters.

## Architecture

The self-healing system consists of two main integration points:

1. **Target Resolver Healing Integration**: Progressive relaxation of matching criteria based on the healing attempt counter
2. **Action Executor Retry Integration**: Activation of healing mode during retry loops with exponential backoff

### Core Components

```
┌─────────────────────────────────────────────────────────────────┐
│                    Self-Healing Architecture                    │
├─────────────────────────────────────────────────────────────────┤
│                                                                 │
│  ┌─────────────────┐    ┌─────────────────┐    ┌─────────────┐  │
│  │ Action          │    │ Target          │    │ Healing     │  │
│  │ Executor        │    │ Resolver        │    │ Profiles    │  │
│  │                 │    │                 │    │             │  │
│  │ • Retry Loop    │◄──►│ • healing_      │◄──►│ • min_ratio │  │
│  │ • Backoff       │    │   attempt       │    │ • pad_mul   │  │
│  │ • Counter Mgmt  │    │ • Role Aliases  │    │ • expand_   │  │
│  │                 │    │ • Fuzzy Thresh  │    │   roles     │  │
│  └─────────────────┘    └─────────────────┘    └─────────────┘  │
│           │                      │                    │         │
│           │                      │                    │         │
│  ┌─────────────────┐    ┌─────────────────┐    ┌─────────────┐  │
│  │ Existing        │    │ Spatial         │    │ Metrics     │  │
│  │ Healing         │    │ Search          │    │ Collection  │  │
│  │ Strategies      │    │ Enhancement     │    │             │  │
│  │                 │    │                 │    │ • Success   │  │
│  │ • Semantic      │    │ • ROI Padding   │    │   Rates     │  │
│  │ • Spatial       │    │ • Container     │    │ • Attempt   │  │
│  │ • Timing        │    │   Detection     │    │   Counts    │  │
│  │ • Format        │    │                 │    │             │  │
│  └─────────────────┘    └─────────────────┘    └─────────────┘  │
│                                                                 │
└─────────────────────────────────────────────────────────────────┘
```

## Components and Interfaces

### 1. Healing Profile System

The healing profile system provides progressive tolerance based on the attempt count:

```python
from dataclasses import dataclass


@dataclass
class HealingProfile:
    """Configuration for healing attempt tolerance levels"""
    min_ratio: float      # Fuzzy matching threshold
    pad_mul: float        # Spatial padding multiplier
    expand_roles: bool    # Whether to use role aliases
    attempt_level: int    # Healing attempt level (0=strict, 1+=relaxed)
```

**Healing Profiles by Attempt Level:**

- **Level 0 (Normal)**: `min_ratio=0.82, pad_mul=1.0, expand_roles=False`
- **Level 1 (First Healing)**: `min_ratio=0.78, pad_mul=1.3, expand_roles=True`
- **Level 2+ (Desperate)**: `min_ratio=0.72, pad_mul=1.7, expand_roles=True`

### 2. Role Alias System

Semantic role expansion for more tolerant matching:

```python
ROLE_ALIASES = {
    "input": {"input", "textfield", "text_field", "form_input", "forminput", "edit", "textbox"},
    "button": {"button", "submit", "action", "cta"},
    "label": {"label", "text", "data_display"},
    "checkbox": {"checkbox", "check_box", "toggle"},
}

TYPE_ALIASES = {
    "text_input": {"text_input", "input", "textfield"},
    "button": {"button"},
}
```

### 3. Enhanced Target Resolver

The `TargetResolver` class is enhanced with healing capabilities:

```python
from typing import Any, Dict, List, Optional


class TargetResolver:
    def __init__(self):
        self.healing_attempt = 0  # Healing attempt counter

    def _healing_profile(self) -> Dict[str, Any]:
        """Get tolerance profile based on healing attempt level"""

    def _find_element_by_text(self, text: str, ui_elements: List[UIElement],
                              min_ratio: float = 0.65) -> Optional[UIElement]:
        """Enhanced with configurable fuzzy threshold"""

    def _resolve_by_role(self, role: str, ...) -> Optional[ResolvedTarget]:
        """Enhanced with role alias expansion"""

    def _build_anchor_and_roi_and_container(self, target_spec, ui_elements):
        """Enhanced with configurable padding multipliers"""
```

### 4. Enhanced Action Executor

The `ActionExecutor` integrates healing during retry loops:

```python
import time


class ActionExecutor:
    def execute_edge(self, edge: WorkflowEdge, screen_state: ScreenState) -> ExecutionResult:
        """Enhanced with healing activation during retries"""
        # NOTE: context, retries, backoff_ms, and current_state (the screen
        # state to retry against) are assumed to be set up elsewhere in this
        # method; their source is not specified in this document.

        # Normal execution attempt
        result = self._execute_action(edge.action, screen_state, context, edge)

        # If failed and retries configured, activate healing
        if result.status != ExecutionStatus.SUCCESS and retries > 0:
            for i in range(retries):
                # Apply exponential backoff
                time.sleep((backoff_ms * (2 ** i)) / 1000.0)

                # Activate healing attempt on resolver
                self.target_resolver.healing_attempt = i + 1
                try:
                    # Retry the action itself with healing active. Calling
                    # execute_edge() here would re-enter this retry loop and
                    # recurse without bound on a persistent failure.
                    result = self._execute_action(edge.action, current_state, context, edge)
                finally:
                    # Always reset healing attempt
                    self.target_resolver.healing_attempt = 0

                if result.status == ExecutionStatus.SUCCESS:
                    return result

        # Return the last result, whether the first attempt succeeded
        # or every retry failed
        return result
```

## Data Models

### HealingAttemptMetrics

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Any, Dict


@dataclass
class HealingAttemptMetrics:
    """Metrics for healing attempt tracking"""
    attempt_level: int
    success: bool
    strategy_used: str
    original_criteria: Dict[str, Any]
    relaxed_criteria: Dict[str, Any]
    duration_ms: float
    timestamp: datetime
```

### ResolutionDetails Enhancement

The existing `ResolvedTarget.resolution_details` is enhanced with healing information:

```python
# Values below indicate the expected type of each field
resolution_details = {
    "healing_attempt": int,             # Current healing attempt level
    "healing_profile": Dict[str, Any],  # Applied healing profile
    "role_aliases_used": List[str],     # Role aliases that were tried
    "fuzzy_threshold_used": float,      # Actual fuzzy threshold used
    "spatial_padding_used": float,      # Spatial padding multiplier used
}
```

## Correctness Properties

*A property is a characteristic or behavior that should hold true across all valid executions of a system; in essence, it is a formal statement about what the system should do. Properties serve as the bridge between human-readable specifications and machine-verifiable correctness guarantees.*

### Property 1: Healing Attempt Progression

*For any* target resolution that fails initially, when healing attempts are incremented, the tolerance criteria should become progressively more relaxed.

**Validates: Requirements 7.2, 7.3, 7.4**

### Property 2: Healing Counter Reset

*For any* successful action execution after healing attempts, the healing attempt counter should be reset to zero.

**Validates: Requirements 8.4**

### Property 3: Role Alias Expansion

*For any* role-based target resolution with healing active, the system should accept elements matching role aliases when `expand_roles` is true.

**Validates: Requirements 7.3**

### Property 4: Fuzzy Threshold Relaxation

*For any* text-based target resolution, the fuzzy matching threshold should decrease (become more tolerant) as the healing attempt level increases.

**Validates: Requirements 7.2**

### Property 5: Spatial Padding Expansion

*For any* spatial search operation during healing, the padding multiplier should increase the search area proportionally to the healing attempt level.

**Validates: Requirements 7.4**

### Property 6: Backoff Timing Consistency

*For any* retry sequence with healing, the delay between attempts should follow an exponential backoff pattern regardless of healing success.

**Validates: Requirements 8.2**

### Property 7: Healing Metrics Recording

*For any* healing attempt, the system should record metrics including attempt level, success status, and applied criteria.

**Validates: Requirements 7.5**

## Error Handling

### Healing Failure Scenarios

1. **Maximum Attempts Reached**: Log all attempted healing profiles and strategies
2. **Invalid Healing Configuration**: Fall back to strict matching with a warning
3. **Role Alias Resolution Conflicts**: Use the first successful match and log the preference
4. **Spatial Search Boundary Violations**: Clamp to screen boundaries and log the adjustment

### Recovery Strategies

1. **Graceful Degradation**: If the healing system fails, continue with strict matching
2. **Profile Validation**: Validate healing profiles before application
3. **Counter Synchronization**: Ensure healing counter consistency across components
4. **Metrics Resilience**: Continue operation even if metrics collection fails

## Testing Strategy

### Unit Tests

- Test healing profile generation for different attempt levels
- Test role alias expansion logic
- Test fuzzy threshold adjustment
- Test spatial padding calculations
- Test healing counter management

### Property-Based Tests

- **Property 1**: Healing progression tolerance verification
- **Property 2**: Counter reset consistency
- **Property 3**: Role alias acceptance
- **Property 4**: Fuzzy threshold relaxation
- **Property 5**: Spatial padding expansion
- **Property 6**: Backoff timing verification
- **Property 7**: Metrics recording completeness

### Integration Tests

- End-to-end healing scenarios with UI changes
- Multi-attempt healing sequences
- Healing with existing self-healing strategies
- Performance impact measurement
- Cross-component healing coordination

### Performance Tests

- Healing overhead measurement (< 1 ms per attempt)
- Memory usage during extended healing sequences
- Concurrent healing attempt handling
- Cache interaction with healing profiles
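
The profile table, role alias expansion, and backoff schedule described in this document can be sketched in a few pure functions, which may also serve as a starting point for the unit and property-based tests listed above. This is a minimal sketch under stated assumptions, not the project's implementation: the helper names `profile_for_attempt`, `accepted_roles`, and `backoff_delays_ms` are hypothetical, the alias table is a subset of `ROLE_ALIASES`, and treating a negative attempt count as an invalid configuration that falls back to strict matching is one possible reading of the error-handling rules.

```python
from dataclasses import dataclass
from typing import List, Set


@dataclass(frozen=True)
class HealingProfile:
    """Tolerance settings for one healing attempt level."""
    min_ratio: float    # Fuzzy matching threshold
    pad_mul: float      # Spatial padding multiplier
    expand_roles: bool  # Whether to use role aliases
    attempt_level: int  # 0 = strict, 1+ = relaxed


# Values copied from "Healing Profiles by Attempt Level".
_PROFILES = (
    HealingProfile(min_ratio=0.82, pad_mul=1.0, expand_roles=False, attempt_level=0),
    HealingProfile(min_ratio=0.78, pad_mul=1.3, expand_roles=True, attempt_level=1),
    HealingProfile(min_ratio=0.72, pad_mul=1.7, expand_roles=True, attempt_level=2),
)

# Subset of the document's ROLE_ALIASES, kept short for the sketch.
ROLE_ALIASES = {
    "button": {"button", "submit", "action", "cta"},
    "checkbox": {"checkbox", "check_box", "toggle"},
}


def profile_for_attempt(healing_attempt: int) -> HealingProfile:
    """Hypothetical helper: map a healing attempt counter to a profile.

    Attempts above 2 reuse the level 2+ profile. A negative counter is
    treated as invalid configuration and falls back to strict matching
    (an assumption made for this sketch).
    """
    if healing_attempt < 0:
        return _PROFILES[0]
    return _PROFILES[min(healing_attempt, len(_PROFILES) - 1)]


def accepted_roles(role: str, profile: HealingProfile) -> Set[str]:
    """Hypothetical helper: roles that count as a match under a profile."""
    if not profile.expand_roles:
        return {role}
    # Always include the requested role, even if it has no alias entry.
    return ROLE_ALIASES.get(role, set()) | {role}


def backoff_delays_ms(backoff_ms: int, retries: int) -> List[int]:
    """Delay before each retry, doubling each time: backoff_ms * 2**i."""
    return [backoff_ms * (2 ** i) for i in range(retries)]


if __name__ == "__main__":
    for attempt in range(4):
        print(attempt, profile_for_attempt(attempt))
    print(sorted(accepted_roles("button", profile_for_attempt(1))))
    print(backoff_delays_ms(100, 3))
```

Keeping these as side-effect-free functions makes Properties 1, 3, 4, and 6 straightforward to check: for example, `min_ratio` never increases and `pad_mul` never decreases as the attempt counter grows.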