feat: functional E2E replay (25/25 actions, 0 retries, SomEngine via server)

Validated on a Windows PC (DESKTOP-58D5CAC, 2560x1600):
- 8 clicks resolved visually (1 anchor_template, 1 som_text_match, 6 som_vlm)
- Mean score 0.75, mean time 1.6 s
- Text typed correctly (bonjour, test word, date, email)
- 0 retries, 2 actions unverified (OK)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-31 14:04:41 +02:00
parent 5e0b53cfd1
commit a7de6a488b
79542 changed files with 6091757 additions and 1 deletions
--- a/.kiro/specs/multi-anchor-constraints/design.md
+++ b/.kiro/specs/multi-anchor-constraints/design.md
@@ -0,0 +1,347 @@
+# Multi-Anchor Constraints Design Document
+
+## Overview
+
+This document specifies the design for a multi-anchor constraint system that enables RPA Vision V3 to understand complex targeting instructions. The system combines multiple anchor references, hard constraints, and intelligent weighting to select optimal target elements with "combinatorial common sense."
+
+## Architecture
+
+The multi-anchor constraint system extends the existing TargetResolver with advanced targeting capabilities:
+
+```
+┌─────────────────────────────────────────────────────────────────┐
+│                Multi-Anchor Constraint Architecture              │
+├─────────────────────────────────────────────────────────────────┤
+│                                                                 │
+│  ┌─────────────────┐    ┌─────────────────┐    ┌─────────────┐ │
+│  │  Enhanced       │    │  Multi-Anchor   │    │  Hard       │ │
+│  │  TargetSpec     │    │  Resolver       │    │  Constraints│ │
+│  │                 │    │                 │    │             │ │
+│  │ • hard_         │◄──►│ • Anchor        │◄──►│ • Container │ │
+│  │   constraints   │    │   Evaluation    │    │   Filter    │ │
+│  │ • weights       │    │ • Best Combo    │    │ • Area      │ │
+│  │ • multi-anchor  │    │   Selection     │    │   Filter    │ │
+│  └─────────────────┘    └─────────────────┘    └─────────────┘ │
+│           │                       │                       │     │
+│           │                       │                       │     │
+│  ┌─────────────────┐    ┌─────────────────┐    ┌─────────────┐ │
+│  │  Container      │    │  Weighted       │    │  Tie-Break  │ │
+│  │  Resolver       │    │  Scoring        │    │  System     │ │
+│  │                 │    │                 │    │             │ │
+│  │ • Text-based    │    │ • Proximity     │    │ • Stable    │ │
+│  │   Container     │    │ • Alignment     │    │   Selection │ │
+│  │   Finding       │    │ • Container     │    │ • Multi-    │ │
+│  │ • Smallest      │    │ • ROI IOU       │    │   Criteria  │ │
+│  │   Container     │    │                 │    │             │ │
+│  └─────────────────┘    └─────────────────┘    └─────────────┘ │
+│                                                                 │
+└─────────────────────────────────────────────────────────────────┘
+```
+
+## Components and Interfaces
+
+### 1. Enhanced TargetSpec
+
+Extension of the existing TargetSpec dataclass with new fields:
+
+```python
+from dataclasses import dataclass, field
+from typing import Any, Dict, Optional
+
+@dataclass
+class TargetSpec:
+    # Existing fields
+    by_role: Optional[str] = None
+    by_text: Optional[str] = None
+    by_position: Optional[Dict[str, Any]] = None
+    selection_policy: str = "first"
+    context_hints: Dict[str, Any] = field(default_factory=dict)
+    
+    # New fields for Fiche #11
+    hard_constraints: Dict[str, Any] = field(default_factory=dict)
+    weights: Dict[str, float] = field(default_factory=dict)
+```
+
+**Usage Examples:**
+```python
+# Multi-anchor with container constraint
+target_spec = TargetSpec(
+    by_role="input",
+    context_hints={"near_text": ["Username", "Identifiant"]},
+    hard_constraints={"within_container_text": "Login"},
+    weights={"proximity": 0.45, "alignment": 0.35, "container": 0.20}
+)
+```
+
+### 2. Multi-Anchor Resolution System
+
+The core logic for evaluating multiple anchors and selecting the best combination:
+
+```python
+class MultiAnchorResolver:
+    def resolve_with_multiple_anchors(self, target_spec: TargetSpec, 
+                                    ui_elements: List[UIElement]) -> Optional[ResolvedTarget]:
+        """
+        Resolve target using multiple anchor evaluation
+        
+        Process:
+        1. Extract all anchor texts from context_hints
+        2. Find all anchor candidates for each text
+        3. For each anchor candidate, build ROI and score all target candidates
+        4. Apply hard constraints to filter candidates
+        5. Apply weighted scoring to rank candidates
+        6. Use stable tie-breaking for final selection
+        """
+```
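The process listed in the docstring above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the project's implementation: `Element`, `center_distance`, and `resolve` are hypothetical stand-ins, bounding boxes are assumed to be `(x, y, w, h)`, anchor matching is exact text equality, the ROI step is reduced to a plain center distance, and step 4 (hard constraints) is left out.

```python
from dataclasses import dataclass
from math import hypot
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class Element:
    """Hypothetical stand-in for UIElement; bbox is assumed to be (x, y, w, h)."""
    element_id: str
    role: str
    text: str
    bbox: Tuple[float, float, float, float]


def center_distance(a: Element, b: Element) -> float:
    """Euclidean distance between the bbox centers of two elements."""
    ax, ay, aw, ah = a.bbox
    bx, by, bw, bh = b.bbox
    return hypot((ax + aw / 2) - (bx + bw / 2), (ay + ah / 2) - (by + bh / 2))


def resolve(role: str, anchor_texts: List[str],
            elements: List[Element]) -> Optional[Element]:
    # Steps 1-2: every element whose text matches any anchor text is an anchor.
    anchors = [e for e in elements if e.text in anchor_texts]
    candidates = [e for e in elements if e.role == role]
    if not candidates:
        return None
    if not anchors:
        # All anchors missing: fall back to a stable anchor-less choice.
        return min(candidates, key=lambda e: e.element_id)
    # Steps 3, 5, 6: score each candidate by its distance to the nearest
    # anchor; element_id breaks ties so the result is reproducible.
    return min(
        candidates,
        key=lambda c: (min(center_distance(a, c) for a in anchors), c.element_id),
    )


elements = [
    Element("lbl_user", "label", "Username", (10, 10, 80, 20)),
    Element("in_user", "input", "", (100, 10, 150, 20)),
    Element("in_search", "input", "", (100, 300, 150, 20)),
]
chosen = resolve("input", ["Username", "Identifiant"], elements)
```

Here the input next to the "Username" label wins because it is closer to the only anchor found; the "Identifiant" variant simply matches nothing and is skipped.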
+
+### 3. Hard Constraints System
+
+Strict filtering system that eliminates candidates before scoring:
+
+```python
+class HardConstraintsFilter:
+    def apply_constraints(self, candidates: List[UIElement], 
+                         constraints: Dict[str, Any],
+                         ui_elements: List[UIElement]) -> List[UIElement]:
+        """
+        Apply hard constraints as strict filters
+        
+        Supported constraints:
+        - within_container_text: Only elements within specified container
+        - min_area: Only elements with area >= threshold
+        - max_distance: Only elements within distance from anchor
+        """
+        
+    def _container_bbox_from_text(self, text: str, 
+                                 ui_elements: List[UIElement]) -> Optional[BBox]:
+        """
+        Find container bounding box from text label
+        
+        Process:
+        1. Find element with matching text
+        2. If element is container type, use its bbox
+        3. If element is label, find smallest containing container
+        4. Return container bbox or None if not found
+        """
+```
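As a rough sketch of strict filtering before scoring, the example below applies `min_area` and `within_container_text`. The names `Element`, `area`, `contains`, and this `apply_constraints` signature are assumptions for illustration: the container bounding box is passed in already resolved, bounding boxes are assumed to be `(x, y, w, h)`, and `max_distance` is omitted.

```python
from dataclasses import dataclass
from typing import Any, Dict, List, Optional, Tuple

BBox = Tuple[float, float, float, float]  # assumed layout: (x, y, w, h)


@dataclass(frozen=True)
class Element:
    """Hypothetical stand-in for UIElement."""
    element_id: str
    bbox: BBox


def area(b: BBox) -> float:
    return b[2] * b[3]


def contains(outer: BBox, inner: BBox) -> bool:
    """True when `inner` lies entirely inside `outer`."""
    ox, oy, ow, oh = outer
    ix, iy, iw, ih = inner
    return ox <= ix and oy <= iy and ix + iw <= ox + ow and iy + ih <= oy + oh


def apply_constraints(candidates: List[Element], constraints: Dict[str, Any],
                      container_bbox: Optional[BBox] = None) -> List[Element]:
    kept = candidates
    if "min_area" in constraints:
        kept = [e for e in kept if area(e.bbox) >= constraints["min_area"]]
    # If the container could not be resolved (bbox is None), the container
    # constraint is skipped rather than rejecting every candidate.
    if "within_container_text" in constraints and container_bbox is not None:
        kept = [e for e in kept if contains(container_bbox, e.bbox)]
    return kept


candidates = [
    Element("in_user", (100, 10, 150, 20)),
    Element("in_search", (100, 300, 150, 20)),
    Element("tiny_icon", (5, 5, 2, 2)),
]
login_panel: BBox = (0, 0, 300, 200)
kept = apply_constraints(
    candidates, {"within_container_text": "Login", "min_area": 100}, login_panel
)
```

Only `in_user` survives: `in_search` lies outside the panel and `tiny_icon` is below the area threshold.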
+
+### 4. Weighted Scoring System
+
+Configurable scoring system with multiple criteria:
+
+```python
+class WeightedScorer:
+    def calculate_composite_score(self, element: UIElement,
+                                anchor: Optional[UIElement],
+                                roi_bbox: Optional[BBox],
+                                container_bbox: Optional[BBox],
+                                weights: Dict[str, float],
+                                base_score: float) -> float:
+        """
+        Calculate weighted composite score
+        
+        Components:
+        - proximity: Distance from anchor (if available)
+        - alignment: Horizontal/vertical alignment with anchor
+        - container: Preference for elements in preferred container
+        - roi_iou: Intersection over union with ROI
+        """
+        
+    DEFAULT_WEIGHTS = {
+        "proximity": 0.35,
+        "alignment": 0.25, 
+        "container": 0.15,
+        "roi_iou": 0.25
+    }
+```
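One plausible reading of the composite score is a weighted sum of per-criterion scores in the range 0 to 1. The sketch below assumes exactly that; `iou` and `composite_score` are hypothetical helpers, bounding boxes are assumed to be `(x, y, w, h)`, and the role of `base_score` is not modeled because the design does not say how it enters the sum.

```python
from typing import Dict, Optional, Tuple

BBox = Tuple[float, float, float, float]  # assumed layout: (x, y, w, h)

DEFAULT_WEIGHTS = {
    "proximity": 0.35,
    "alignment": 0.25,
    "container": 0.15,
    "roi_iou": 0.25,
}


def iou(a: BBox, b: BBox) -> float:
    """Intersection over union of two axis-aligned boxes."""
    ax, ay, aw, ah = a
    bx, by, bw, bh = b
    inter_w = max(0.0, min(ax + aw, bx + bw) - max(ax, bx))
    inter_h = max(0.0, min(ay + ah, by + bh) - max(ay, by))
    inter = inter_w * inter_h
    union = aw * ah + bw * bh - inter
    return inter / union if union > 0 else 0.0


def composite_score(components: Dict[str, float],
                    weights: Optional[Dict[str, float]] = None) -> float:
    """Weighted sum; a criterion missing from `components` contributes 0."""
    weights = DEFAULT_WEIGHTS if weights is None else weights
    return sum(w * components.get(name, 0.0) for name, w in weights.items())


roi: BBox = (0, 0, 100, 100)
element_bbox: BBox = (50, 0, 100, 100)
components = {
    "proximity": 0.8,
    "alignment": 1.0,
    "container": 1.0,
    "roi_iou": iou(roi, element_bbox),  # half-overlapping boxes give 1/3
}
score = composite_score(components)
```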
+
+### 5. Container Resolution System
+
+Text-based container finding with intelligent fallback:
+
+```python
+class ContainerResolver:
+    def find_container_by_text(self, text: str, 
+                              ui_elements: List[UIElement]) -> Optional[BBox]:
+        """
+        Find container by text with smart detection
+        
+        Process:
+        1. Find elements matching the text
+        2. Check if element is already a container type
+        3. If not, find smallest containing container
+        4. Return container bbox with preference for smallest
+        """
+        
+    CONTAINER_ROLES = {"panel", "container", "group", "form", "dialog", "window"}
+    CONTAINER_TYPES = {"panel", "container", "group", "form", "dialog", "window"}
+```
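The container lookup described above might look like the following. It is a sketch under assumptions: `Element` and `contains` are hypothetical, text matching is exact equality (the real matcher may be fuzzier), bounding boxes are `(x, y, w, h)`, and only the first matching label is considered.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

BBox = Tuple[float, float, float, float]  # assumed layout: (x, y, w, h)
CONTAINER_ROLES = {"panel", "container", "group", "form", "dialog", "window"}


@dataclass(frozen=True)
class Element:
    """Hypothetical stand-in for UIElement."""
    element_id: str
    role: str
    text: str
    bbox: BBox


def contains(outer: BBox, inner: BBox) -> bool:
    ox, oy, ow, oh = outer
    ix, iy, iw, ih = inner
    return ox <= ix and oy <= iy and ix + iw <= ox + ow and iy + ih <= oy + oh


def find_container_by_text(text: str, elements: List[Element]) -> Optional[BBox]:
    # Step 1: find elements matching the text.
    matches = [e for e in elements if e.text == text]
    if not matches:
        return None
    # Step 2: a match that is itself a container is used directly.
    for match in matches:
        if match.role in CONTAINER_ROLES:
            return match.bbox
    # Steps 3-4: otherwise pick the smallest container enclosing the label;
    # element_id breaks area ties so repeated calls agree.
    label = matches[0]
    enclosing = [e for e in elements
                 if e.role in CONTAINER_ROLES and contains(e.bbox, label.bbox)]
    if not enclosing:
        return None
    return min(enclosing, key=lambda e: (e.bbox[2] * e.bbox[3], e.element_id)).bbox


elements = [
    Element("win", "window", "", (0, 0, 1000, 800)),
    Element("login_form", "form", "", (50, 50, 300, 200)),
    Element("login_label", "label", "Login", (60, 55, 50, 20)),
]
login_bbox = find_container_by_text("Login", elements)
```

Both the window and the form enclose the "Login" label; preferring the smallest area selects the form.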
+
+### 6. Stable Tie-Breaking System
+
+Multi-criteria tie-breaking for reproducible results:
+
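+```python
+from typing import Tuple
+
+class TieBreaker:
+    def create_sort_key(self, element: UIElement, score: float) -> Tuple:
+        """
+        Create stable sort key for tie-breaking (use with an ascending sort)
+        
+        Criteria (in order):
+        1. Composite score (descending)
+        2. Element confidence (descending) 
+        3. Element area (descending)
+        4. Element ID (ascending for stability)
+        """
+        # Treat only a missing or None confidence as 1.0; a real 0.0 is kept.
+        confidence = getattr(element, "confidence", None)
+        confidence = 1.0 if confidence is None else float(confidence)
+        # Negate the descending criteria so that a single ascending sort
+        # applies all four criteria in the documented directions.
+        return (
+            -score,
+            -confidence,
+            -self._bbox_area(element.bbox),
+            str(element.element_id)
+        )
+```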
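One way to get the documented ordering from a single ascending sort is to negate the components that should sort in descending order. The sketch below assumes a hypothetical `Candidate` record that already carries the score, confidence, and area; it is an illustration rather than the project's code.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Candidate:
    """Hypothetical record holding the four tie-break inputs."""
    element_id: str
    score: float
    confidence: float
    area: float


def sort_key(c: Candidate) -> Tuple[float, float, float, str]:
    # Score, confidence, and area are negated so larger values sort first;
    # element_id stays ascending as the final, always-distinct criterion.
    return (-c.score, -c.confidence, -c.area, c.element_id)


def pick(candidates: List[Candidate]) -> Candidate:
    return min(candidates, key=sort_key)


candidates = [
    Candidate("b", 0.9, 0.8, 100.0),
    Candidate("a", 0.9, 0.8, 100.0),
    Candidate("c", 0.9, 0.7, 500.0),
]
winner = pick(candidates)
```

All three candidates share the same score; `c` loses on confidence despite its larger area, and `a` beats `b` on element ID. The result does not depend on input order.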
+
+## Data Models
+
+### Enhanced Resolution Details
+
+Extension of ResolvedTarget.resolution_details with multi-anchor information:
+
+```python
+resolution_details = {
+    # Existing fields
+    "healing_attempt": int,
+    "anchor_id": Optional[str],
+    "top3": List[Dict],
+    
+    # New fields for multi-anchor
+    "anchors_attempted": List[str],           # All anchor texts tried
+    "successful_anchor": Optional[str],       # Which anchor text succeeded
+    "hard_constraints_applied": Dict[str, Any], # Constraints that were applied
+    "candidates_filtered": int,               # How many candidates were filtered
+    "weights_used": Dict[str, float],         # Actual weights applied
+    "tie_break_criteria": Optional[str],      # Which tie-break criterion was used
+    "container_resolved": Optional[str],      # Container text that was resolved
+    "performance_metrics": Dict[str, float]   # Timing and efficiency metrics
+}
+```
+
+### Multi-Anchor Metrics
+
+```python
+@dataclass
+class MultiAnchorMetrics:
+    """Metrics for multi-anchor resolution performance"""
+    total_anchors_attempted: int
+    successful_anchor_index: int
+    candidates_before_constraints: int
+    candidates_after_constraints: int
+    scoring_duration_ms: float
+    container_resolution_duration_ms: float
+    total_resolution_duration_ms: float
+    cache_hits: int
+    cache_misses: int
+```
+
+## Correctness Properties
+
+*A property is a characteristic or behavior that should hold true across all valid executions of a system; in essence, it is a formal statement about what the system should do. Properties serve as the bridge between human-readable specifications and machine-verifiable correctness guarantees.*
+
+### Property 1: Multi-anchor evaluation completeness
+*For any* target specification with multiple anchor texts, all anchor texts should be attempted for resolution until one succeeds or all are exhausted
+**Validates: Requirements 1.1, 1.3**
+
+### Property 2: Hard constraint strictness
+*For any* set of hard constraints, no element that violates any constraint should be included in the final candidate set
+**Validates: Requirements 2.1, 2.4**
+
+### Property 3: Container resolution consistency
+*For any* container text specification, the same container should be resolved consistently across multiple calls with the same UI state
+**Validates: Requirements 4.1, 4.4**
+
+### Property 4: Weighted scoring monotonicity
+*For any* two elements where element A scores strictly higher than element B on every weighted criterion, and at least one weight is positive, element A should have a higher composite score than element B
+**Validates: Requirements 3.1, 3.2, 3.3, 3.4**
+
+### Property 5: Tie-breaking determinism
+*For any* UI state processed multiple times, when multiple elements have identical scores, the same element should always be selected
+**Validates: Requirements 5.5**
+
+### Property 6: Anchor fallback resilience
+*For any* target specification where some anchor texts are missing, resolution should continue with available anchors without failing
+**Validates: Requirements 1.3, 1.4**
+
+### Property 7: Constraint filtering completeness
+*For any* hard constraint specification, all elements that violate the constraint should be filtered out before scoring
+**Validates: Requirements 2.2, 2.3**
+
+### Property 8: Semantic variant equivalence
+*For any* set of semantic variant anchor texts, elements found by any variant should be treated as equivalent candidates
+**Validates: Requirements 6.1, 6.2**
+
+### Property 9: Performance optimization consistency
+*For any* multi-anchor resolution, UI element analysis should be reused between anchor evaluations to avoid redundant computation
+**Validates: Requirements 8.2, 8.4**
+
+### Property 10: Audit trail completeness
+*For any* multi-anchor resolution, the resolution details should contain complete information about anchors attempted, constraints applied, and scoring performed
+**Validates: Requirements 7.1, 7.2, 7.3, 7.5**
+
+## Error Handling
+
+### Multi-Anchor Failure Scenarios
+
+1. **All Anchors Missing**: Fall back to anchor-less resolution with logging
+2. **Invalid Container Text**: Log warning and continue without container constraint
+3. **Malformed Weights**: Validate and fall back to default weights
+4. **Empty Candidate Set**: Return None with detailed failure reason
+5. **Scoring Calculation Errors**: Use base score with error logging
+
+### Recovery Strategies
+
+1. **Graceful Degradation**: Continue with available anchors when some fail
+2. **Weight Validation**: Normalize weights to sum to 1.0 if invalid
+3. **Container Fallback**: Continue without container constraint if resolution fails
+4. **Performance Fallback**: Disable caching if cache operations fail
+5. **Logging Resilience**: Continue operation even if audit logging fails
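The weight handling described in the two lists above could be combined as follows. This is one possible interpretation, not a confirmed rule set: `validate_weights` is a hypothetical helper that falls back to the defaults when the weights are malformed (wrong type, empty, negative, non-finite, or summing to zero) and otherwise rescales them to sum to 1.0.

```python
import math
from typing import Any, Dict

DEFAULT_WEIGHTS = {
    "proximity": 0.35,
    "alignment": 0.25,
    "container": 0.15,
    "roi_iou": 0.25,
}


def validate_weights(weights: Any) -> Dict[str, float]:
    """Return usable weights: defaults if malformed, else normalized to 1.0."""
    if not isinstance(weights, dict) or not weights:
        return dict(DEFAULT_WEIGHTS)
    cleaned: Dict[str, float] = {}
    for name, value in weights.items():
        # bool is a subclass of int, so it is rejected explicitly.
        if isinstance(value, bool) or not isinstance(value, (int, float)):
            return dict(DEFAULT_WEIGHTS)
        if not math.isfinite(value) or value < 0:
            return dict(DEFAULT_WEIGHTS)
        cleaned[name] = float(value)
    total = sum(cleaned.values())
    if total <= 0:
        return dict(DEFAULT_WEIGHTS)
    return {name: value / total for name, value in cleaned.items()}


normalized = validate_weights({"proximity": 2, "alignment": 2})
fallback = validate_weights({"proximity": -1.0})
```

Returning a copy of the defaults keeps callers from mutating the shared constant by accident.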
+
+## Testing Strategy
+
+### Unit Tests
+
+- Test multi-anchor text extraction and candidate finding
+- Test hard constraint filtering with various constraint types
+- Test weighted scoring with different weight configurations
+- Test container resolution with various text patterns
+- Test tie-breaking with identical scores
+- Test performance optimizations (caching, reuse)
+
+### Property-Based Tests
+
+Using the Hypothesis framework, with at least 100 generated examples per property:
+
+- **Property 1**: Multi-anchor evaluation completeness
+- **Property 2**: Hard constraint strictness  
+- **Property 3**: Container resolution consistency
+- **Property 4**: Weighted scoring monotonicity
+- **Property 5**: Tie-breaking determinism
+- **Property 6**: Anchor fallback resilience
+- **Property 7**: Constraint filtering completeness
+- **Property 8**: Semantic variant equivalence
+- **Property 9**: Performance optimization consistency
+- **Property 10**: Audit trail completeness
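To make the shape of such a test concrete, here is a sketch of Property 4 (weighted scoring monotonicity). It deliberately swaps in the standard library's seeded `random` module for Hypothesis so that it runs without extra dependencies. `composite_score` is assumed to be a plain weighted sum, and all names are hypothetical.

```python
import random
from typing import Dict

CRITERIA = ("proximity", "alignment", "container", "roi_iou")


def composite_score(components: Dict[str, float], weights: Dict[str, float]) -> float:
    """Assumed scoring rule: a plain weighted sum over the four criteria."""
    return sum(weights[name] * components[name] for name in CRITERIA)


def check_monotonicity(iterations: int = 100, seed: int = 0) -> int:
    """Randomized check: strictly better on every criterion => higher score."""
    rng = random.Random(seed)  # fixed seed keeps failures reproducible
    checked = 0
    for _ in range(iterations):
        weights = {name: rng.uniform(0.05, 1.0) for name in CRITERIA}
        worse = {name: rng.uniform(0.0, 0.9) for name in CRITERIA}
        # `better` exceeds `worse` by at least 0.01 on every criterion.
        better = {name: worse[name] + rng.uniform(0.01, 0.1) for name in CRITERIA}
        assert composite_score(better, weights) > composite_score(worse, weights)
        checked += 1
    return checked


cases_checked = check_monotonicity()
```

With Hypothesis, the loop body would become a test function whose inputs are drawn from strategies, which also adds shrinking of failing cases.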
+
+### Integration Tests
+
+- End-to-end multi-anchor resolution with complex UI states
+- Performance benchmarks with large UI element sets
+- Cross-component integration with existing healing system
+- Real-world scenarios with Login/Settings panel disambiguation
+- Stress testing with many anchors and constraints
+
+### Performance Tests
+
+- Multi-anchor resolution should complete within 50ms for typical UI states
+- Memory usage should remain constant regardless of anchor count
+- Cache hit rates should exceed 80% for repeated container lookups
+- Scoring calculations should scale linearly with candidate count