Dom a7de6a488b feat: functional E2E replay, 25/25 actions, 0 retries, SomEngine via server

Validated on a Windows PC (DESKTOP-58D5CAC, 2560x1600):
- 8 clicks resolved visually (1 anchor_template, 1 som_text_match, 6 som_vlm)
- Mean score 0.75, mean time 1.6s
- Text typed correctly (bonjour, test word, date, email)
- 0 retries, 2 unverified actions (OK)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

2026-03-31 14:04:41 +02:00

Multi-Anchor Constraints Design Document

Overview

This document specifies the design for a multi-anchor constraint system that enables RPA Vision V3 to understand complex targeting instructions. The system combines multiple anchor references, hard constraints, and intelligent weighting to select optimal target elements with "combinatorial common sense."

Architecture

The multi-anchor constraint system extends the existing TargetResolver with advanced targeting capabilities:

┌─────────────────────────────────────────────────────────────────┐
│                Multi-Anchor Constraint Architecture              │
├─────────────────────────────────────────────────────────────────┤
│                                                                 │
│  ┌─────────────────┐    ┌─────────────────┐    ┌─────────────┐ │
│  │  Enhanced       │    │  Multi-Anchor   │    │  Hard       │ │
│  │  TargetSpec     │    │  Resolver       │    │  Constraints│ │
│  │                 │    │                 │    │             │ │
│  │ • hard_         │◄──►│ • Anchor        │◄──►│ • Container │ │
│  │   constraints   │    │   Evaluation    │    │   Filter    │ │
│  │ • weights       │    │ • Best Combo    │    │ • Area      │ │
│  │ • multi-anchor  │    │   Selection     │    │   Filter    │ │
│  └─────────────────┘    └─────────────────┘    └─────────────┘ │
│           │                       │                       │     │
│           │                       │                       │     │
│  ┌─────────────────┐    ┌─────────────────┐    ┌─────────────┐ │
│  │  Container      │    │  Weighted       │    │  Tie-Break  │ │
│  │  Resolver       │    │  Scoring        │    │  System     │ │
│  │                 │    │                 │    │             │ │
│  │ • Text-based    │    │ • Proximity     │    │ • Stable    │ │
│  │   Container     │    │ • Alignment     │    │   Selection │ │
│  │   Finding       │    │ • Container     │    │ • Multi-    │ │
│  │ • Smallest      │    │ • ROI IOU       │    │   Criteria  │ │
│  │   Container     │    │                 │    │             │ │
│  └─────────────────┘    └─────────────────┘    └─────────────┘ │
│                                                                 │
└─────────────────────────────────────────────────────────────────┘

Components and Interfaces

1. Enhanced TargetSpec

Extension of the existing TargetSpec dataclass with new fields:

from dataclasses import dataclass, field
from typing import Any, Dict, Optional

@dataclass
class TargetSpec:
    # Existing fields
    by_role: Optional[str] = None
    by_text: Optional[str] = None
    by_position: Optional[Dict[str, Any]] = None
    selection_policy: str = "first"
    context_hints: Dict[str, Any] = field(default_factory=dict)
    
    # New fields for Fiche #11
    hard_constraints: Dict[str, Any] = field(default_factory=dict)
    weights: Dict[str, float] = field(default_factory=dict)

Usage Examples:

# Multi-anchor with container constraint
target_spec = TargetSpec(
    by_role="input",
    context_hints={"near_text": ["Username", "Identifiant"]},
    hard_constraints={"within_container_text": "Login"},
    weights={"proximity": 0.45, "alignment": 0.35, "container": 0.20}
)

2. Multi-Anchor Resolution System

The core logic for evaluating multiple anchors and selecting the best combination:

class MultiAnchorResolver:
    def resolve_with_multiple_anchors(self, target_spec: TargetSpec, 
                                    ui_elements: List[UIElement]) -> Optional[ResolvedTarget]:
        """
        Resolve target using multiple anchor evaluation
        
        Process:
        1. Extract all anchor texts from context_hints
        2. Find all anchor candidates for each text
        3. For each anchor candidate, build ROI and score all target candidates
        4. Apply hard constraints to filter candidates
        5. Apply weighted scoring to rank candidates
        6. Use stable tie-breaking for final selection
        """

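The anchor loop in the process above can be sketched as follows. This is a minimal illustration, not the project's implementation: `UIElement`, the `(x, y, width, height)` bbox layout, and `resolve_with_anchors` are hypothetical stand-ins, and center distance replaces the full ROI scoring, hard constraints, and weighted ranking.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

BBox = Tuple[float, float, float, float]  # assumed layout: (x, y, width, height)


@dataclass
class UIElement:  # hypothetical stand-in for the project's UIElement
    element_id: str
    role: str
    text: str
    bbox: BBox


def center(b: BBox) -> Tuple[float, float]:
    return (b[0] + b[2] / 2, b[1] + b[3] / 2)


def resolve_with_anchors(anchor_texts: List[str], role: str,
                         ui_elements: List[UIElement]
                         ) -> Optional[Tuple[UIElement, str]]:
    """Try each anchor text in order; return (target, anchor_text) or None."""
    targets = [e for e in ui_elements if e.role == role]
    for anchor_text in anchor_texts:
        # Step 2: every element whose text matches this anchor text.
        anchors = [e for e in ui_elements
                   if e.text.strip().lower() == anchor_text.lower()]
        best = None
        for anchor in anchors:
            ax, ay = center(anchor.bbox)
            for target in targets:
                tx, ty = center(target.bbox)
                dist = ((tx - ax) ** 2 + (ty - ay) ** 2) ** 0.5
                # Distance first, then element ID as a stable tie-break.
                key = (dist, target.element_id)
                if best is None or key < best[0]:
                    best = (key, target)
        if best is not None:
            return best[1], anchor_text  # first anchor text that resolves wins
    return None  # all anchor texts exhausted


elements = [
    UIElement("lbl_user", "label", "Identifiant", (10, 10, 80, 20)),
    UIElement("in_user", "input", "", (100, 10, 200, 20)),
    UIElement("in_search", "input", "", (100, 300, 200, 20)),
]
result = resolve_with_anchors(["Username", "Identifiant"], "input", elements)
```

Here "Username" matches nothing, so the loop falls through to "Identifiant" and picks the input closest to that label.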
3. Hard Constraints System

Strict filtering system that eliminates candidates before scoring:

class HardConstraintsFilter:
    def apply_constraints(self, candidates: List[UIElement], 
                         constraints: Dict[str, Any],
                         ui_elements: List[UIElement]) -> List[UIElement]:
        """
        Apply hard constraints as strict filters
        
        Supported constraints:
        - within_container_text: Only elements within specified container
        - min_area: Only elements with area >= threshold
        - max_distance: Only elements within distance from anchor
        """
        
    def _container_bbox_from_text(self, text: str, 
                                 ui_elements: List[UIElement]) -> Optional[BBox]:
        """
        Find container bounding box from text label
        
        Process:
        1. Find element with matching text
        2. If element is container type, use its bbox
        3. If element is label, find smallest containing container
        4. Return container bbox or None if not found
        """

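A minimal sketch of strict filtering for the three supported constraints. It assumes bboxes are `(x, y, width, height)` tuples, that the container bbox for `within_container_text` has already been resolved by the caller, and that distance is measured between bbox centers; none of these details are confirmed by the design. An unresolved container skips that constraint, mirroring the fallback listed under Error Handling.

```python
from dataclasses import dataclass
from math import hypot
from typing import Any, Dict, List, Optional, Tuple

BBox = Tuple[float, float, float, float]  # assumed layout: (x, y, width, height)


@dataclass
class UIElement:  # hypothetical stand-in for the project's UIElement
    element_id: str
    bbox: BBox


def bbox_area(b: BBox) -> float:
    return b[2] * b[3]


def bbox_contains(outer: BBox, inner: BBox) -> bool:
    return (outer[0] <= inner[0] and outer[1] <= inner[1]
            and inner[0] + inner[2] <= outer[0] + outer[2]
            and inner[1] + inner[3] <= outer[1] + outer[3])


def center_distance(a: BBox, b: BBox) -> float:
    return hypot((a[0] + a[2] / 2) - (b[0] + b[2] / 2),
                 (a[1] + a[3] / 2) - (b[1] + b[3] / 2))


def apply_constraints(candidates: List[UIElement],
                      constraints: Dict[str, Any],
                      container_bbox: Optional[BBox] = None,
                      anchor_bbox: Optional[BBox] = None) -> List[UIElement]:
    """Keep only candidates that satisfy every applicable hard constraint."""
    kept = []
    for el in candidates:
        if ("within_container_text" in constraints
                and container_bbox is not None
                and not bbox_contains(container_bbox, el.bbox)):
            continue
        if "min_area" in constraints and bbox_area(el.bbox) < constraints["min_area"]:
            continue
        if ("max_distance" in constraints and anchor_bbox is not None
                and center_distance(el.bbox, anchor_bbox) > constraints["max_distance"]):
            continue
        kept.append(el)
    return kept


candidates = [
    UIElement("a", (20, 20, 100, 20)),   # inside the container, large enough
    UIElement("b", (400, 20, 100, 20)),  # outside the container
    UIElement("c", (20, 60, 5, 5)),      # inside, but below min_area
]
login_bbox = (0, 0, 300, 200)
kept = apply_constraints(
    candidates, {"within_container_text": "Login", "min_area": 100},
    container_bbox=login_bbox)
```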
4. Weighted Scoring System

Configurable scoring system with multiple criteria:

class WeightedScorer:
    def calculate_composite_score(self, element: UIElement,
                                anchor: Optional[UIElement],
                                roi_bbox: Optional[BBox],
                                container_bbox: Optional[BBox],
                                weights: Dict[str, float],
                                base_score: float) -> float:
        """
        Calculate weighted composite score
        
        Components:
        - proximity: Distance from anchor (if available)
        - alignment: Horizontal/vertical alignment with anchor
        - container: Preference for elements in preferred container
        - roi_iou: Intersection over union with ROI
        """
        
    DEFAULT_WEIGHTS = {
        "proximity": 0.35,
        "alignment": 0.25, 
        "container": 0.15,
        "roi_iou": 0.25
    }
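
One way to combine the components is a normalized weighted sum. The sketch below assumes each component has already been mapped to the range 0.0 to 1.0 and that components without data (for example, proximity when no anchor exists) are simply omitted; how `base_score` enters the result is not specified here, so it is left out. `iou` and `composite_score` are illustrative names.

```python
from typing import Dict, Tuple

BBox = Tuple[float, float, float, float]  # assumed layout: (x, y, width, height)

DEFAULT_WEIGHTS = {
    "proximity": 0.35,
    "alignment": 0.25,
    "container": 0.15,
    "roi_iou": 0.25,
}


def iou(a: BBox, b: BBox) -> float:
    """Intersection over union of two axis-aligned boxes."""
    inter_w = max(0.0, min(a[0] + a[2], b[0] + b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[1] + a[3], b[1] + b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    union = a[2] * a[3] + b[2] * b[3] - inter
    return inter / union if union > 0 else 0.0


def composite_score(components: Dict[str, float],
                    weights: Dict[str, float] = DEFAULT_WEIGHTS) -> float:
    """Weighted mean over the components that are present.

    Dividing by the weight actually used keeps the score in [0, 1]
    when a component such as proximity is unavailable.
    """
    used_weight = sum(weights.get(name, 0.0) for name in components)
    if used_weight <= 0:
        return 0.0
    weighted = sum(weights.get(name, 0.0) * value
                   for name, value in components.items())
    return weighted / used_weight


overlap = iou((0, 0, 10, 10), (5, 0, 10, 10))  # half-overlapping boxes
better = composite_score({"proximity": 0.8, "alignment": 0.8,
                          "container": 1.0, "roi_iou": 0.6})
worse = composite_score({"proximity": 0.5, "alignment": 0.5,
                         "container": 0.0, "roi_iou": 0.3})
```

Because every weight is non-negative, an element that is better on every component can never score lower, which is the behavior Property 4 asks for.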

5. Container Resolution System

Text-based container finding with intelligent fallback:

class ContainerResolver:
    def find_container_by_text(self, text: str, 
                              ui_elements: List[UIElement]) -> Optional[BBox]:
        """
        Find container by text with smart detection
        
        Process:
        1. Find elements matching the text
        2. Check if element is already a container type
        3. If not, find smallest containing container
        4. Return container bbox with preference for smallest
        """
        
    CONTAINER_ROLES = {"panel", "container", "group", "form", "dialog", "window"}
    CONTAINER_TYPES = {"panel", "container", "group", "form", "dialog", "window"}
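
The lookup steps can be sketched as below. The matching rule (case-insensitive substring), the `(x, y, width, height)` bbox layout, and the `UIElement` fields are assumptions made for illustration; the real detection logic may differ.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

BBox = Tuple[float, float, float, float]  # assumed layout: (x, y, width, height)
CONTAINER_ROLES = {"panel", "container", "group", "form", "dialog", "window"}


@dataclass
class UIElement:  # hypothetical stand-in for the project's UIElement
    element_id: str
    role: str
    text: str
    bbox: BBox


def bbox_contains(outer: BBox, inner: BBox) -> bool:
    return (outer[0] <= inner[0] and outer[1] <= inner[1]
            and inner[0] + inner[2] <= outer[0] + outer[2]
            and inner[1] + inner[3] <= outer[1] + outer[3])


def find_container_by_text(text: str,
                           ui_elements: List[UIElement]) -> Optional[BBox]:
    """Return the smallest container bbox associated with the given text."""
    matches = [e for e in ui_elements if text.lower() in e.text.lower()]
    best = None
    for match in matches:
        if match.role in CONTAINER_ROLES:
            containers = [match]  # the match is itself a container
        else:
            # A label: consider every container that fully encloses it.
            containers = [c for c in ui_elements
                          if c.role in CONTAINER_ROLES
                          and bbox_contains(c.bbox, match.bbox)]
        for c in containers:
            # Smallest area first; element ID keeps the choice stable.
            key = (c.bbox[2] * c.bbox[3], c.element_id)
            if best is None or key < best[0]:
                best = (key, c.bbox)
    return best[1] if best is not None else None


ui = [
    UIElement("win", "window", "", (0, 0, 1000, 800)),
    UIElement("login_panel", "panel", "", (100, 100, 300, 200)),
    UIElement("login_label", "label", "Login", (110, 110, 60, 20)),
]
login_bbox = find_container_by_text("Login", ui)
```

Both the window and the panel enclose the label, so the smaller panel is returned.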

6. Stable Tie-Breaking System

Multi-criteria tie-breaking for reproducible results:

class TieBreaker:
    def create_sort_key(self, element: UIElement, score: float) -> Tuple:
        """
        Create stable sort key for tie-breaking
        
        Criteria (in order):
        1. Composite score (descending)
        2. Element confidence (descending) 
        3. Element area (descending)
        4. Element ID (ascending for stability)
        """
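        # Negate the numeric criteria so that a plain ascending sort yields
        # descending score, confidence, and area while the ID stays ascending.
        return (
            -score,
            -float(getattr(element, "confidence", 1.0) or 1.0),
            -self._bbox_area(element.bbox),
            str(element.element_id)
        )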
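
A short usage sketch, assuming the key is consumed by an ascending `sorted` call, so the three descending criteria are negated and the element ID stays ascending. `Element`, `sort_key`, and the sample values are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Element:  # hypothetical stand-in with only the fields the key needs
    element_id: str
    confidence: float
    bbox: Tuple[float, float, float, float]  # assumed (x, y, width, height)


def sort_key(element: Element, score: float) -> Tuple[float, float, float, str]:
    area = element.bbox[2] * element.bbox[3]
    # Negated numbers sort "largest first" under an ascending sort.
    return (-score, -element.confidence, -area, str(element.element_id))


el_a = Element("a", 0.9, (0, 0, 10, 10))
el_b = Element("b", 0.9, (50, 0, 10, 10))   # ties with el_a on everything but ID
el_c = Element("c", 0.9, (0, 50, 10, 10))   # lower score below

scored: List[Tuple[Element, float]] = [(el_b, 0.9), (el_a, 0.9), (el_c, 0.7)]
ranked = sorted(scored, key=lambda pair: sort_key(*pair))
winner = ranked[0][0]
```

Because element IDs are unique, the key defines a total order, so the winner does not depend on the order in which candidates were collected.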

Data Models

Enhanced Resolution Details

Extension of ResolvedTarget.resolution_details with multi-anchor information:

resolution_details = {
    # Existing fields
    "healing_attempt": int,
    "anchor_id": Optional[str],
    "top3": List[Dict],
    
    # New fields for multi-anchor
    "anchors_attempted": List[str],           # All anchor texts tried
    "successful_anchor": Optional[str],       # Which anchor text succeeded
    "hard_constraints_applied": Dict[str, Any], # Constraints that were applied
    "candidates_filtered": int,               # How many candidates were filtered
    "weights_used": Dict[str, float],         # Actual weights applied
    "tie_break_criteria": Optional[str],      # Which tie-break criterion was used
    "container_resolved": Optional[str],      # Container text that was resolved
    "performance_metrics": Dict[str, float]   # Timing and efficiency metrics
}

Multi-Anchor Metrics

@dataclass
class MultiAnchorMetrics:
    """Metrics for multi-anchor resolution performance"""
    total_anchors_attempted: int
    successful_anchor_index: int
    candidates_before_constraints: int
    candidates_after_constraints: int
    scoring_duration_ms: float
    container_resolution_duration_ms: float
    total_resolution_duration_ms: float
    cache_hits: int
    cache_misses: int

Correctness Properties

A property is a characteristic or behavior that should hold true across all valid executions of a system: essentially, a formal statement about what the system should do. Properties serve as the bridge between human-readable specifications and machine-verifiable correctness guarantees.

Property 1: Multi-anchor evaluation completeness

For any target specification with multiple anchor texts, all anchor texts should be attempted for resolution until one succeeds or all are exhausted. Validates: Requirements 1.1, 1.3

Property 2: Hard constraint strictness

For any set of hard constraints, no element that violates any constraint should be included in the final candidate set. Validates: Requirements 2.1, 2.4

Property 3: Container resolution consistency

For any container text specification, the same container should be resolved consistently across multiple calls with the same UI state. Validates: Requirements 4.1, 4.4

Property 4: Weighted scoring monotonicity

For any two elements where element A is objectively better than element B on all weighted criteria, element A should have a higher composite score than element B. Validates: Requirements 3.1, 3.2, 3.3, 3.4

Property 5: Tie-breaking determinism

For any UI state processed multiple times, when multiple elements have identical scores, the same element should always be selected. Validates: Requirements 5.5

Property 6: Anchor fallback resilience

For any target specification where some anchor texts are missing, resolution should continue with available anchors without failing. Validates: Requirements 1.3, 1.4

Property 7: Constraint filtering completeness

For any hard constraint specification, all elements that violate the constraint should be filtered out before scoring. Validates: Requirements 2.2, 2.3

Property 8: Semantic variant equivalence

For any set of semantic variant anchor texts, elements found by any variant should be treated as equivalent candidates. Validates: Requirements 6.1, 6.2

Property 9: Performance optimization consistency

For any multi-anchor resolution, UI element analysis should be reused between anchor evaluations to avoid redundant computation. Validates: Requirements 8.2, 8.4

Property 10: Audit trail completeness

For any multi-anchor resolution, the resolution details should contain complete information about anchors attempted, constraints applied, and scoring performed. Validates: Requirements 7.1, 7.2, 7.3, 7.5

Error Handling

Multi-Anchor Failure Scenarios

All Anchors Missing: Fall back to anchor-less resolution with logging
Invalid Container Text: Log warning and continue without container constraint
Malformed Weights: Validate and fall back to default weights
Empty Candidate Set: Return None with detailed failure reason
Scoring Calculation Errors: Use base score with error logging

Recovery Strategies

Graceful Degradation: Continue with available anchors when some fail
Weight Validation: Normalize weights to sum to 1.0 if invalid
Container Fallback: Continue without container constraint if resolution fails
Performance Fallback: Disable caching if cache operations fail
Logging Resilience: Continue operation even if audit logging fails
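
The weight handling described above could look like the following sketch. It is one possible reading of the two rules, not a confirmed behavior: malformed input (unknown keys, non-numeric, negative, or non-finite values, or an all-zero total) falls back to the defaults, while well-formed weights that do not sum to 1.0 are normalized. `validate_weights` is a hypothetical helper name.

```python
import math
from typing import Dict, Optional

DEFAULT_WEIGHTS = {
    "proximity": 0.35,
    "alignment": 0.25,
    "container": 0.15,
    "roi_iou": 0.25,
}


def validate_weights(weights: Optional[Dict[str, float]]) -> Dict[str, float]:
    """Return usable weights: defaults if malformed, otherwise normalized."""
    if not weights:
        return dict(DEFAULT_WEIGHTS)
    for name, value in weights.items():
        known = name in DEFAULT_WEIGHTS
        numeric = isinstance(value, (int, float)) and not isinstance(value, bool)
        # `and` short-circuits, so isfinite is only called on real numbers.
        if not (known and numeric and math.isfinite(value) and value >= 0):
            return dict(DEFAULT_WEIGHTS)
    total = sum(weights.values())
    if total <= 0:
        return dict(DEFAULT_WEIGHTS)
    return {name: value / total for name, value in weights.items()}


normalized = validate_weights({"proximity": 2, "alignment": 2})
fallback = validate_weights({"proximity": -1.0})
```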

Testing Strategy

Unit Tests

Test multi-anchor text extraction and candidate finding
Test hard constraint filtering with various constraint types
Test weighted scoring with different weight configurations
Test container resolution with various text patterns
Test tie-breaking with identical scores
Test performance optimizations (caching, reuse)

Property-Based Tests

Using the Hypothesis framework, with at least 100 generated examples per property:

Property 1: Multi-anchor evaluation completeness
Property 2: Hard constraint strictness
Property 3: Container resolution consistency
Property 4: Weighted scoring monotonicity
Property 5: Tie-breaking determinism
Property 6: Anchor fallback resilience
Property 7: Constraint filtering completeness
Property 8: Semantic variant equivalence
Property 9: Performance optimization consistency
Property 10: Audit trail completeness
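
To show the shape of such a check, here is a sketch of Property 5 (tie-breaking determinism). It deliberately uses the standard library's seeded `random` module as a stand-in for Hypothesis so that it runs without extra dependencies; in the real suite the generated candidates would come from Hypothesis strategies instead. The candidate tuple layout and the `select` function are assumptions.

```python
import random
from typing import List, Tuple

# (element_id, score, confidence, area): an assumed, simplified candidate shape
Candidate = Tuple[str, float, float, float]


def select(candidates: List[Candidate]) -> str:
    """Pick the winner: score, confidence, area descending, then ID ascending."""
    best = min(candidates, key=lambda c: (-c[1], -c[2], -c[3], c[0]))
    return best[0]


def check_tie_break_determinism(iterations: int = 100, seed: int = 0) -> bool:
    """Selection must not depend on the order candidates arrive in."""
    rng = random.Random(seed)
    for _ in range(iterations):
        count = rng.randint(1, 8)
        # Few distinct values per field, so ties are frequent.
        candidates = [
            (f"el_{i}",
             rng.choice([0.5, 0.9]),
             rng.choice([0.8, 1.0]),
             rng.choice([100.0, 400.0]))
            for i in range(count)
        ]
        expected = select(candidates)
        shuffled = candidates[:]
        rng.shuffle(shuffled)
        if select(shuffled) != expected:
            return False
    return True


property_holds = check_tie_break_determinism()
```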

Integration Tests

End-to-end multi-anchor resolution with complex UI states
Performance benchmarks with large UI element sets
Cross-component integration with existing healing system
Real-world scenarios with Login/Settings panel disambiguation
Stress testing with many anchors and constraints

Performance Tests

Multi-anchor resolution should complete within 50ms for typical UI states
Memory usage should remain constant regardless of anchor count
Cache hit rates should exceed 80% for repeated container lookups
Scoring calculations should scale linearly with candidate count
