feat: E2E replay working (25/25 actions, 0 retries, SomEngine via server)

Validated on a Windows PC (DESKTOP-58D5CAC, 2560x1600):
- 8 clicks resolved visually (1 anchor_template, 1 som_text_match, 6 som_vlm)
- Average score 0.75, average time 1.6 s
- Text typed correctly (bonjour, test word, date, email)
- 0 retries, 2 unverified actions (OK)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Dom
2026-03-31 14:04:41 +02:00
parent 5e0b53cfd1
commit a7de6a488b
79542 changed files with 6091757 additions and 1 deletions


@@ -0,0 +1,347 @@
# Multi-Anchor Constraints Design Document
## Overview
This document specifies the design for a multi-anchor constraint system that enables RPA Vision V3 to understand complex targeting instructions. The system combines multiple anchor references, hard constraints, and intelligent weighting to select optimal target elements with "combinatorial common sense."
## Architecture
The multi-anchor constraint system extends the existing TargetResolver with advanced targeting capabilities:
```
┌─────────────────────────────────────────────────────────────────┐
│ Multi-Anchor Constraint Architecture │
├─────────────────────────────────────────────────────────────────┤
│ │
│ ┌─────────────────┐ ┌─────────────────┐ ┌─────────────┐ │
│ │ Enhanced │ │ Multi-Anchor │ │ Hard │ │
│ │ TargetSpec │ │ Resolver │ │ Constraints│ │
│ │ │ │ │ │ │ │
│ │ • hard_ │◄──►│ • Anchor │◄──►│ • Container │ │
│ │ constraints │ │ Evaluation │ │ Filter │ │
│ │ • weights │ │ • Best Combo │ │ • Area │ │
│ │ • multi-anchor │ │ Selection │ │ Filter │ │
│ └─────────────────┘ └─────────────────┘ └─────────────┘ │
│ │ │ │ │
│ │ │ │ │
│ ┌─────────────────┐ ┌─────────────────┐ ┌─────────────┐ │
│ │ Container │ │ Weighted │ │ Tie-Break │ │
│ │ Resolver │ │ Scoring │ │ System │ │
│ │ │ │ │ │ │ │
│ │ • Text-based │ │ • Proximity │ │ • Stable │ │
│ │ Container │ │ • Alignment │ │ Selection │ │
│ │ Finding │ │ • Container │ │ • Multi- │ │
│ │ • Smallest │ │ • ROI IOU │ │ Criteria │ │
│ │ Container │ │ │ │ │ │
│ └─────────────────┘ └─────────────────┘ └─────────────┘ │
│ │
└─────────────────────────────────────────────────────────────────┘
```
## Components and Interfaces
### 1. Enhanced TargetSpec
Extension of the existing TargetSpec dataclass with new fields:
```python
from dataclasses import dataclass, field
from typing import Any, Dict, Optional


@dataclass
class TargetSpec:
    # Existing fields
    by_role: Optional[str] = None
    by_text: Optional[str] = None
    by_position: Optional[Dict[str, Any]] = None
    selection_policy: str = "first"
    context_hints: Dict[str, Any] = field(default_factory=dict)
    # New fields for Fiche #11
    hard_constraints: Dict[str, Any] = field(default_factory=dict)
    weights: Dict[str, float] = field(default_factory=dict)
```
**Usage Examples:**
```python
# Multi-anchor with container constraint
target_spec = TargetSpec(
    by_role="input",
    context_hints={"near_text": ["Username", "Identifiant"]},
    hard_constraints={"within_container_text": "Login"},
    weights={"proximity": 0.45, "alignment": 0.35, "container": 0.20},
)
```
### 2. Multi-Anchor Resolution System
The core logic for evaluating multiple anchors and selecting the best combination:
```python
from typing import List, Optional

# TargetSpec, UIElement and ResolvedTarget are the project's existing types.


class MultiAnchorResolver:
    def resolve_with_multiple_anchors(
        self, target_spec: TargetSpec, ui_elements: List[UIElement]
    ) -> Optional[ResolvedTarget]:
        """
        Resolve target using multiple anchor evaluation.

        Process:
        1. Extract all anchor texts from context_hints
        2. Find all anchor candidates for each text
        3. For each anchor candidate, build ROI and score all target candidates
        4. Apply hard constraints to filter candidates
        5. Apply weighted scoring to rank candidates
        6. Use stable tie-breaking for final selection
        """
```
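The process above can be sketched in a few lines. This is only an illustration under simplifying assumptions, not the project's implementation: the `Element` class and `resolve` function are hypothetical, scoring is reduced to distance from the anchor, and hard constraints (step 4) are left out.

```python
from dataclasses import dataclass
from math import hypot
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class Element:
    """Hypothetical stand-in for UIElement: id, role, text, centre point."""
    element_id: str
    role: str
    text: str
    cx: float
    cy: float


def resolve(role: str, anchor_texts: List[str],
            elements: List[Element]) -> Optional[Tuple[Element, str]]:
    """Try each anchor text in order; return (target, anchor text used)."""
    candidates = [e for e in elements if e.role == role]
    for text in anchor_texts:
        anchors = [e for e in elements if e.text == text]
        if not anchors or not candidates:
            continue  # anchor text not on screen: fall through to the next one
        # Rank every (anchor, candidate) pair by distance, then by element ID
        # so that equal distances always resolve to the same element.
        best = min(
            ((hypot(c.cx - a.cx, c.cy - a.cy), c.element_id, c)
             for a in anchors for c in candidates),
            key=lambda t: (t[0], t[1]),
        )
        return best[2], text
    return None


ui = [
    Element("lbl1", "label", "Identifiant", 100, 100),
    Element("in1", "input", "", 200, 100),
    Element("in2", "input", "", 200, 300),
]
# "Username" is absent from this screen, so the second variant is used.
result = resolve("input", ["Username", "Identifiant"], ui)
```

Here `result` pairs the input closest to the "Identifiant" label with the anchor text that matched, which is the kind of information the audit fields later in this document are meant to record.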
### 3. Hard Constraints System
Strict filtering system that eliminates candidates before scoring:
```python
from typing import Any, Dict, List, Optional

# UIElement and BBox are the project's existing types.


class HardConstraintsFilter:
    def apply_constraints(
        self,
        candidates: List[UIElement],
        constraints: Dict[str, Any],
        ui_elements: List[UIElement],
    ) -> List[UIElement]:
        """
        Apply hard constraints as strict filters.

        Supported constraints:
        - within_container_text: Only elements within specified container
        - min_area: Only elements with area >= threshold
        - max_distance: Only elements within distance from anchor
        """

    def _container_bbox_from_text(
        self, text: str, ui_elements: List[UIElement]
    ) -> Optional[BBox]:
        """
        Find container bounding box from text label.

        Process:
        1. Find element with matching text
        2. If element is container type, use its bbox
        3. If element is label, find smallest containing container
        4. Return container bbox or None if not found
        """
```
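As a rough sketch of strict filtering, the example below applies `min_area` and a container check before any scoring takes place. The `BBox` and `Element` tuples, the `contains` helper, and passing an already resolved container box are all assumptions made for illustration.

```python
from typing import Any, Dict, List, NamedTuple, Optional


class BBox(NamedTuple):
    """Hypothetical bounding box: top-left corner plus size."""
    x: float
    y: float
    w: float
    h: float


class Element(NamedTuple):
    element_id: str
    bbox: BBox


def contains(outer: BBox, inner: BBox) -> bool:
    """True when inner lies fully inside outer."""
    return (outer.x <= inner.x and outer.y <= inner.y
            and inner.x + inner.w <= outer.x + outer.w
            and inner.y + inner.h <= outer.y + outer.h)


def apply_constraints(candidates: List[Element], constraints: Dict[str, Any],
                      container: Optional[BBox] = None) -> List[Element]:
    """Drop every candidate that violates at least one constraint."""
    kept = []
    for el in candidates:
        area = el.bbox.w * el.bbox.h
        if "min_area" in constraints and area < constraints["min_area"]:
            continue
        if container is not None and not contains(container, el.bbox):
            continue
        kept.append(el)
    return kept


login_panel = BBox(0, 0, 400, 300)
candidates = [
    Element("user_input", BBox(50, 50, 200, 30)),       # inside, area 6000
    Element("tiny_icon", BBox(10, 10, 8, 8)),           # inside, area 64
    Element("settings_input", BBox(500, 50, 200, 30)),  # outside the panel
]
kept = apply_constraints(candidates, {"min_area": 100}, container=login_panel)
```

Only `user_input` survives: the icon is too small and the settings field lies outside the Login panel. Passing `container=None` skips the container check entirely.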
### 4. Weighted Scoring System
Configurable scoring system with multiple criteria:
```python
from typing import Dict, Optional

# UIElement and BBox are the project's existing types.


class WeightedScorer:
    # Used when the TargetSpec supplies no weights of its own
    DEFAULT_WEIGHTS = {
        "proximity": 0.35,
        "alignment": 0.25,
        "container": 0.15,
        "roi_iou": 0.25,
    }

    def calculate_composite_score(
        self,
        element: UIElement,
        anchor: Optional[UIElement],
        roi_bbox: Optional[BBox],
        container_bbox: Optional[BBox],
        weights: Dict[str, float],
        base_score: float,
    ) -> float:
        """
        Calculate weighted composite score.

        Components:
        - proximity: Distance from anchor (if available)
        - alignment: Horizontal/vertical alignment with anchor
        - container: Preference for elements in preferred container
        - roi_iou: Intersection over union with ROI
        """
```
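The document does not give the scoring formula, so the sketch below assumes the simplest reading: each component is normalised to [0, 1] and the composite is their weighted sum. The `iou` and `composite_score` helpers and the `(x1, y1, x2, y2)` box layout are hypothetical, and `base_score` is left out.

```python
from typing import Dict, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2), assumed layout


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    ix = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    iy = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = ix * iy
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def composite_score(components: Dict[str, float],
                    weights: Dict[str, float]) -> float:
    """Weighted sum; criteria absent from `components` contribute 0."""
    return sum(w * components.get(name, 0.0) for name, w in weights.items())


DEFAULT_WEIGHTS = {"proximity": 0.35, "alignment": 0.25,
                   "container": 0.15, "roi_iou": 0.25}

roi = (0.0, 0.0, 10.0, 10.0)
element_box = (5.0, 0.0, 15.0, 10.0)  # overlaps the right half of the ROI
components = {
    "proximity": 0.8,   # assumed to be normalised already
    "alignment": 1.0,
    "container": 1.0,
    "roi_iou": iou(element_box, roi),
}
score = composite_score(components, DEFAULT_WEIGHTS)
```

With this reading, an element that improves on any single criterion while holding the others fixed can never lose score, provided all weights are non-negative.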
### 5. Container Resolution System
Text-based container finding with intelligent fallback:
```python
from typing import List, Optional

# UIElement and BBox are the project's existing types.


class ContainerResolver:
    CONTAINER_ROLES = {"panel", "container", "group", "form", "dialog", "window"}
    CONTAINER_TYPES = {"panel", "container", "group", "form", "dialog", "window"}

    def find_container_by_text(
        self, text: str, ui_elements: List[UIElement]
    ) -> Optional[BBox]:
        """
        Find container by text with smart detection.

        Process:
        1. Find elements matching the text
        2. Check if element is already a container type
        3. If not, find smallest containing container
        4. Return container bbox with preference for smallest
        """
```
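A minimal sketch of the lookup described in the docstring, assuming exact text matching and `(x, y, w, h)` boxes. The `Element` tuple and the helper functions are hypothetical names used only for this example.

```python
from typing import List, NamedTuple, Optional, Tuple

Box = Tuple[float, float, float, float]  # (x, y, w, h), assumed layout
CONTAINER_ROLES = {"panel", "container", "group", "form", "dialog", "window"}


class Element(NamedTuple):
    role: str
    text: str
    bbox: Box


def contains(outer: Box, inner: Box) -> bool:
    """True when inner lies fully inside outer."""
    return (outer[0] <= inner[0] and outer[1] <= inner[1]
            and inner[0] + inner[2] <= outer[0] + outer[2]
            and inner[1] + inner[3] <= outer[1] + outer[3])


def find_container_by_text(text: str, elements: List[Element]) -> Optional[Box]:
    matches = [e for e in elements if e.text == text]
    if not matches:
        return None
    match = matches[0]
    if match.role in CONTAINER_ROLES:
        return match.bbox  # the text belongs to the container itself
    # Otherwise the text is a label: take the smallest container around it.
    enclosing = [e for e in elements
                 if e.role in CONTAINER_ROLES and contains(e.bbox, match.bbox)]
    if not enclosing:
        return None
    return min(enclosing, key=lambda e: e.bbox[2] * e.bbox[3]).bbox


ui = [
    Element("window", "", (0, 0, 1000, 800)),
    Element("form", "", (100, 100, 400, 300)),
    Element("label", "Login", (120, 110, 60, 20)),
]
container = find_container_by_text("Login", ui)
```

Both the window and the form enclose the "Login" label; preferring the smallest area selects the form, which keeps the constraint as tight as possible.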
### 6. Stable Tie-Breaking System
Multi-criteria tie-breaking for reproducible results:
```python
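from typing import Tuple

# UIElement is the project's existing type.


class TieBreaker:
    def create_sort_key(self, element: UIElement, score: float) -> Tuple:
        """
        Create stable sort key for tie-breaking.

        Criteria (in order):
        1. Composite score (descending)
        2. Element confidence (descending)
        3. Element area (descending)
        4. Element ID (ascending for stability)

        The key is meant for an ascending sort, so the three descending
        criteria are negated.
        """
        return (
            -score,
            # A missing, None or zero confidence falls back to 1.0
            -float(getattr(element, "confidence", 1.0) or 1.0),
            -self._bbox_area(element.bbox),
            str(element.element_id),
        )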
```
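To show the documented ordering in action, here is a small sketch that assumes candidates are ranked with a single ascending comparison, so the three descending criteria are negated in the key. The `Element` tuple and the `pick_best` helper are hypothetical.

```python
from typing import List, NamedTuple, Tuple


class Element(NamedTuple):
    element_id: str
    confidence: float
    area: float


def sort_key(element: Element, score: float) -> Tuple[float, float, float, str]:
    """Ascending key: negated values sort as descending, the ID as ascending."""
    return (-score, -element.confidence, -element.area, element.element_id)


def pick_best(scored: List[Tuple[Element, float]]) -> Element:
    """Return the element whose key sorts first."""
    return min(scored, key=lambda pair: sort_key(pair[0], pair[1]))[0]


scored = [
    (Element("btn_b", 0.9, 1200.0), 0.75),
    (Element("btn_a", 0.9, 1200.0), 0.75),  # identical except for the ID
    (Element("btn_c", 0.9, 1500.0), 0.70),  # larger, but lower score
]
best = pick_best(scored)
best_reversed = pick_best(list(reversed(scored)))
```

Because the element ID is the last criterion, the selection does not depend on the order in which candidates arrive, which is what makes the result reproducible.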
## Data Models
### Enhanced Resolution Details
Extension of ResolvedTarget.resolution_details with multi-anchor information:
```python
resolution_details = {
    # Existing fields
    "healing_attempt": int,
    "anchor_id": Optional[str],
    "top3": List[Dict],
    # New fields for multi-anchor
    "anchors_attempted": List[str],              # All anchor texts tried
    "successful_anchor": Optional[str],          # Which anchor text succeeded
    "hard_constraints_applied": Dict[str, Any],  # Constraints that were applied
    "candidates_filtered": int,                  # How many candidates were filtered
    "weights_used": Dict[str, float],            # Actual weights applied
    "tie_break_criteria": Optional[str],         # Which tie-break criterion was used
    "container_resolved": Optional[str],         # Container text that was resolved
    "performance_metrics": Dict[str, float],     # Timing and efficiency metrics
}
```
### Multi-Anchor Metrics
```python
from dataclasses import dataclass


@dataclass
class MultiAnchorMetrics:
    """Metrics for multi-anchor resolution performance"""
    total_anchors_attempted: int
    successful_anchor_index: int
    candidates_before_constraints: int
    candidates_after_constraints: int
    scoring_duration_ms: float
    container_resolution_duration_ms: float
    total_resolution_duration_ms: float
    cache_hits: int
    cache_misses: int
```
## Correctness Properties
*A property is a characteristic or behavior that should hold true across all valid executions of a system; in essence, it is a formal statement about what the system should do. Properties serve as the bridge between human-readable specifications and machine-verifiable correctness guarantees.*
### Property 1: Multi-anchor evaluation completeness
*For any* target specification with multiple anchor texts, all anchor texts should be attempted for resolution until one succeeds or all are exhausted
**Validates: Requirements 1.1, 1.3**
### Property 2: Hard constraint strictness
*For any* set of hard constraints, no element that violates any constraint should be included in the final candidate set
**Validates: Requirements 2.1, 2.4**
### Property 3: Container resolution consistency
*For any* container text specification, the same container should be resolved consistently across multiple calls with the same UI state
**Validates: Requirements 4.1, 4.4**
### Property 4: Weighted scoring monotonicity
*For any* two elements where element A is objectively better than element B on all weighted criteria, element A should have a higher composite score than element B
**Validates: Requirements 3.1, 3.2, 3.3, 3.4**
### Property 5: Tie-breaking determinism
*For any* UI state processed multiple times, when multiple elements have identical scores, the same element should always be selected
**Validates: Requirements 5.5**
### Property 6: Anchor fallback resilience
*For any* target specification where some anchor texts are missing, resolution should continue with available anchors without failing
**Validates: Requirements 1.3, 1.4**
### Property 7: Constraint filtering completeness
*For any* hard constraint specification, all elements that violate the constraint should be filtered out before scoring
**Validates: Requirements 2.2, 2.3**
### Property 8: Semantic variant equivalence
*For any* set of semantic variant anchor texts, elements found by any variant should be treated as equivalent candidates
**Validates: Requirements 6.1, 6.2**
### Property 9: Performance optimization consistency
*For any* multi-anchor resolution, UI element analysis should be reused between anchor evaluations to avoid redundant computation
**Validates: Requirements 8.2, 8.4**
### Property 10: Audit trail completeness
*For any* multi-anchor resolution, the resolution details should contain complete information about anchors attempted, constraints applied, and scoring performed
**Validates: Requirements 7.1, 7.2, 7.3, 7.5**
## Error Handling
### Multi-Anchor Failure Scenarios
1. **All Anchors Missing**: Fall back to anchor-less resolution with logging
2. **Invalid Container Text**: Log warning and continue without container constraint
3. **Malformed Weights**: Validate and fall back to default weights
4. **Empty Candidate Set**: Return None with detailed failure reason
5. **Scoring Calculation Errors**: Use base score with error logging
### Recovery Strategies
1. **Graceful Degradation**: Continue with available anchors when some fail
2. **Weight Validation**: Normalize weights to sum to 1.0 if invalid
3. **Container Fallback**: Continue without container constraint if resolution fails
4. **Performance Fallback**: Disable caching if cache operations fail
5. **Logging Resilience**: Continue operation even if audit logging fails
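Weight handling appears twice above (fall back to defaults, and normalize to sum to 1.0). One way to combine the two, offered here as an assumption rather than the documented behaviour, is to fall back when the weights are unusable and to normalize when they are merely unscaled. The `sanitize_weights` name is hypothetical.

```python
import math
from typing import Any, Dict

DEFAULT_WEIGHTS = {"proximity": 0.35, "alignment": 0.25,
                   "container": 0.15, "roi_iou": 0.25}


def sanitize_weights(weights: Dict[str, Any]) -> Dict[str, float]:
    """Return defaults for malformed weights, else weights scaled to sum 1.0."""
    values = list(weights.values())
    usable = bool(values) and all(
        isinstance(v, (int, float)) and math.isfinite(v) and v >= 0
        for v in values
    )
    total = sum(values) if usable else 0.0
    if not usable or total <= 0:
        return dict(DEFAULT_WEIGHTS)  # malformed: fall back to the defaults
    return {name: value / total for name, value in weights.items()}


scaled = sanitize_weights({"proximity": 2.0, "alignment": 2.0})
negative = sanitize_weights({"proximity": -1.0})
wrong_type = sanitize_weights({"proximity": "high"})
empty = sanitize_weights({})
```

In this sketch only the first call keeps the caller's weights (rescaled to 0.5 each); the other three return a copy of the defaults.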
## Testing Strategy
### Unit Tests
- Test multi-anchor text extraction and candidate finding
- Test hard constraint filtering with various constraint types
- Test weighted scoring with different weight configurations
- Test container resolution with various text patterns
- Test tie-breaking with identical scores
- Test performance optimizations (caching, reuse)
### Property-Based Tests
Using the Hypothesis framework with at least 100 iterations per property:
- **Property 1**: Multi-anchor evaluation completeness
- **Property 2**: Hard constraint strictness
- **Property 3**: Container resolution consistency
- **Property 4**: Weighted scoring monotonicity
- **Property 5**: Tie-breaking determinism
- **Property 6**: Anchor fallback resilience
- **Property 7**: Constraint filtering completeness
- **Property 8**: Semantic variant equivalence
- **Property 9**: Performance optimization consistency
- **Property 10**: Audit trail completeness
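As an illustration of what such a test checks, the sketch below exercises Property 4 (weighted scoring monotonicity). It deliberately swaps Hypothesis for a seeded loop over Python's standard `random` module so that it runs without extra dependencies; in a real suite the loop would be replaced by Hypothesis strategies. The weighted-sum `composite_score` is an assumed scoring formula.

```python
import random
from typing import Dict

WEIGHTS = {"proximity": 0.35, "alignment": 0.25,
           "container": 0.15, "roi_iou": 0.25}


def composite_score(components: Dict[str, float],
                    weights: Dict[str, float]) -> float:
    """Assumed formula: plain weighted sum of the component scores."""
    return sum(w * components[name] for name, w in weights.items())


def count_monotonicity_violations(iterations: int = 200, seed: int = 0) -> int:
    """Count random cases where a strictly better element does not score higher."""
    rng = random.Random(seed)
    violations = 0
    for _ in range(iterations):
        worse = {name: rng.uniform(0.0, 0.9) for name in WEIGHTS}
        # Element A beats element B by a positive margin on every criterion.
        better = {name: value + rng.uniform(0.01, 0.1)
                  for name, value in worse.items()}
        if composite_score(better, WEIGHTS) <= composite_score(worse, WEIGHTS):
            violations += 1
    return violations


violations = count_monotonicity_violations()
```

A weighted sum with positive weights should never produce a violation, so a non-zero count would point to a bug in the scorer rather than in the property.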
### Integration Tests
- End-to-end multi-anchor resolution with complex UI states
- Performance benchmarks with large UI element sets
- Cross-component integration with existing healing system
- Real-world scenarios with Login/Settings panel disambiguation
- Stress testing with many anchors and constraints
### Performance Tests
- Multi-anchor resolution should complete within 50ms for typical UI states
- Memory usage should remain constant regardless of anchor count
- Cache hit rates should exceed 80% for repeated container lookups
- Scoring calculations should scale linearly with candidate count