Multi-Anchor Constraints Design Document
Overview
This document specifies the design for a multi-anchor constraint system that enables RPA Vision V3 to understand complex targeting instructions. The system combines multiple anchor references, hard constraints, and intelligent weighting to select optimal target elements with "combinatorial common sense."
Architecture
The multi-anchor constraint system extends the existing TargetResolver with advanced targeting capabilities:
┌─────────────────────────────────────────────────────────────────┐
│              Multi-Anchor Constraint Architecture               │
├─────────────────────────────────────────────────────────────────┤
│                                                                 │
│  ┌─────────────────┐    ┌─────────────────┐    ┌─────────────┐  │
│  │ Enhanced        │    │ Multi-Anchor    │    │ Hard        │  │
│  │ TargetSpec      │    │ Resolver        │    │ Constraints │  │
│  │                 │    │                 │    │             │  │
│  │ • hard_         │◄──►│ • Anchor        │◄──►│ • Container │  │
│  │   constraints   │    │   Evaluation    │    │   Filter    │  │
│  │ • weights       │    │ • Best Combo    │    │ • Area      │  │
│  │ • multi-anchor  │    │   Selection     │    │   Filter    │  │
│  └─────────────────┘    └─────────────────┘    └─────────────┘  │
│           │                      │                    │         │
│           │                      │                    │         │
│  ┌─────────────────┐    ┌─────────────────┐    ┌─────────────┐  │
│  │ Container       │    │ Weighted        │    │ Tie-Break   │  │
│  │ Resolver        │    │ Scoring         │    │ System      │  │
│  │                 │    │                 │    │             │  │
│  │ • Text-based    │    │ • Proximity     │    │ • Stable    │  │
│  │   Container     │    │ • Alignment     │    │   Selection │  │
│  │   Finding       │    │ • Container     │    │ • Multi-    │  │
│  │ • Smallest      │    │ • ROI IOU       │    │   Criteria  │  │
│  │   Container     │    │                 │    │             │  │
│  └─────────────────┘    └─────────────────┘    └─────────────┘  │
│                                                                 │
└─────────────────────────────────────────────────────────────────┘
Components and Interfaces
1. Enhanced TargetSpec
Extension of the existing TargetSpec dataclass with new fields:
from dataclasses import dataclass, field
from typing import Any, Dict, Optional

@dataclass
class TargetSpec:
    # Existing fields
    by_role: Optional[str] = None
    by_text: Optional[str] = None
    by_position: Optional[Dict[str, Any]] = None
    selection_policy: str = "first"
    context_hints: Dict[str, Any] = field(default_factory=dict)

    # New fields for Fiche #11
    hard_constraints: Dict[str, Any] = field(default_factory=dict)
    weights: Dict[str, float] = field(default_factory=dict)
Usage Examples:
# Multi-anchor with container constraint
target_spec = TargetSpec(
    by_role="input",
    context_hints={"near_text": ["Username", "Identifiant"]},
    hard_constraints={"within_container_text": "Login"},
    weights={"proximity": 0.45, "alignment": 0.35, "container": 0.20}
)
2. Multi-Anchor Resolution System
The core logic for evaluating multiple anchors and selecting the best combination:
class MultiAnchorResolver:
    def resolve_with_multiple_anchors(self, target_spec: TargetSpec,
                                      ui_elements: List[UIElement]) -> Optional[ResolvedTarget]:
        """
        Resolve target using multiple anchor evaluation

        Process:
        1. Extract all anchor texts from context_hints
        2. Find all anchor candidates for each text
        3. For each anchor candidate, build ROI and score all target candidates
        4. Apply hard constraints to filter candidates
        5. Apply weighted scoring to rank candidates
        6. Use stable tie-breaking for final selection
        """
3. Hard Constraints System
Strict filtering system that eliminates candidates before scoring:
class HardConstraintsFilter:
    def apply_constraints(self, candidates: List[UIElement],
                          constraints: Dict[str, Any],
                          ui_elements: List[UIElement]) -> List[UIElement]:
        """
        Apply hard constraints as strict filters

        Supported constraints:
        - within_container_text: Only elements within specified container
        - min_area: Only elements with area >= threshold
        - max_distance: Only elements within distance from anchor
        """

    def _container_bbox_from_text(self, text: str,
                                  ui_elements: List[UIElement]) -> Optional[BBox]:
        """
        Find container bounding box from text label

        Process:
        1. Find element with matching text
        2. If element is container type, use its bbox
        3. If element is label, find smallest containing container
        4. Return container bbox or None if not found
        """
4. Weighted Scoring System
Configurable scoring system with multiple criteria:
class WeightedScorer:
    def calculate_composite_score(self, element: UIElement,
                                  anchor: Optional[UIElement],
                                  roi_bbox: Optional[BBox],
                                  container_bbox: Optional[BBox],
                                  weights: Dict[str, float],
                                  base_score: float) -> float:
        """
        Calculate weighted composite score

        Components:
        - proximity: Distance from anchor (if available)
        - alignment: Horizontal/vertical alignment with anchor
        - container: Preference for elements in preferred container
        - roi_iou: Intersection over union with ROI
        """

    DEFAULT_WEIGHTS = {
        "proximity": 0.35,
        "alignment": 0.25,
        "container": 0.15,
        "roi_iou": 0.25
    }
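One plausible reading of the composite score is a weighted sum of component scores, each normalized to [0, 1]. The sketch below assumes that reading; how `base_score` enters the final value is not specified here, so it is left out. `iou` and `composite_score` are hypothetical helper names, boxes are assumed to be `(x, y, width, height)`, and the proximity, alignment, and container values are illustrative inputs rather than measured data.

```python
from typing import Dict, Tuple

BBox = Tuple[float, float, float, float]  # assumed layout: (x, y, width, height)

DEFAULT_WEIGHTS = {"proximity": 0.35, "alignment": 0.25, "container": 0.15, "roi_iou": 0.25}


def iou(a: BBox, b: BBox) -> float:
    """Intersection over union of two boxes, in [0, 1]."""
    ax, ay, aw, ah = a
    bx, by, bw, bh = b
    inter_w = max(0.0, min(ax + aw, bx + bw) - max(ax, bx))
    inter_h = max(0.0, min(ay + ah, by + bh) - max(ay, by))
    inter = inter_w * inter_h
    union = aw * ah + bw * bh - inter
    return inter / union if union > 0 else 0.0


def composite_score(components: Dict[str, float], weights: Dict[str, float]) -> float:
    """Weighted sum of component scores; components missing from the dict count as 0."""
    return sum(w * components.get(name, 0.0) for name, w in weights.items())


components = {
    "proximity": 0.8,   # illustrative value
    "alignment": 1.0,   # illustrative value
    "container": 1.0,   # illustrative value
    "roi_iou": iou((0, 0, 10, 10), (5, 0, 10, 10)),  # overlap 50 / union 150
}
score = composite_score(components, DEFAULT_WEIGHTS)
print(round(score, 4))  # 0.7633
```

Because the default weights sum to 1.0 and every component lies in [0, 1], the composite also stays in [0, 1] under this reading.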
5. Container Resolution System
Text-based container finding with intelligent fallback:
class ContainerResolver:
    def find_container_by_text(self, text: str,
                               ui_elements: List[UIElement]) -> Optional[BBox]:
        """
        Find container by text with smart detection

        Process:
        1. Find elements matching the text
        2. Check if element is already a container type
        3. If not, find smallest containing container
        4. Return container bbox with preference for smallest
        """

    CONTAINER_ROLES = {"panel", "container", "group", "form", "dialog", "window"}
    CONTAINER_TYPES = {"panel", "container", "group", "form", "dialog", "window"}
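The four steps above can be sketched as a standalone function. The `Element` dataclass is a hypothetical stand-in for `UIElement`, boxes are assumed to be `(x, y, width, height)`, and the matching rule (case-insensitive substring) is an assumption, since the matching rule is not specified here.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

BBox = Tuple[float, float, float, float]  # assumed layout: (x, y, width, height)
CONTAINER_ROLES = {"panel", "container", "group", "form", "dialog", "window"}


@dataclass(frozen=True)
class Element:
    """Hypothetical stand-in for UIElement."""
    element_id: str
    role: str
    text: str
    bbox: BBox


def area(e: Element) -> float:
    return e.bbox[2] * e.bbox[3]


def contains(outer: BBox, inner: BBox) -> bool:
    ox, oy, ow, oh = outer
    ix, iy, iw, ih = inner
    return ox <= ix and oy <= iy and ix + iw <= ox + ow and iy + ih <= oy + oh


def find_container_by_text(text: str, elements: List[Element]) -> Optional[BBox]:
    best: Optional[Element] = None
    for match in (e for e in elements if text.lower() in e.text.lower()):
        if match.role in CONTAINER_ROLES:
            found = [match]  # the matching element is itself a container
        else:
            found = [e for e in elements
                     if e.role in CONTAINER_ROLES and contains(e.bbox, match.bbox)]
        for c in found:
            # Smallest area wins; element_id keeps the choice deterministic.
            if best is None or (area(c), c.element_id) < (area(best), best.element_id):
                best = c
    return best.bbox if best else None


ui = [
    Element("win", "window", "", (0, 0, 1000, 800)),
    Element("form_login", "form", "", (100, 100, 400, 300)),
    Element("lbl_login", "label", "Login", (120, 110, 60, 20)),
]
print(find_container_by_text("Login", ui))  # (100, 100, 400, 300)
```

Both the window and the form enclose the "Login" label; the form is returned because it is the smaller of the two.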
6. Stable Tie-Breaking System
Multi-criteria tie-breaking for reproducible results:
class TieBreaker:
    def create_sort_key(self, element: UIElement, score: float) -> Tuple:
        """
        Create stable sort key for tie-breaking, for use with an ascending sort

        Criteria (in order):
        1. Composite score (descending)
        2. Element confidence (descending)
        3. Element area (descending)
        4. Element ID (ascending for stability)
        """
        # Numeric criteria are negated so that larger values sort first;
        # the element ID is left as-is so that it sorts ascending.
        return (
            -score,
            -float(getattr(element, "confidence", 1.0) or 1.0),
            -self._bbox_area(element.bbox),
            str(element.element_id)
        )
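A single sort key can only honor mixed directions (three descending criteria, one ascending) if the numeric fields are negated and the key is used with an ascending sort or `min`. The sketch below illustrates that idea on flattened tuples; the `Candidate` layout, `sort_key`, and `pick_best` are hypothetical names, and the candidate values are illustrative.

```python
from typing import List, Tuple

# Hypothetical flattened candidate: (element_id, score, confidence, area)
Candidate = Tuple[str, float, float, float]


def sort_key(c: Candidate) -> Tuple[float, float, float, str]:
    element_id, score, confidence, area = c
    # Negated numbers sort largest-first; the ID sorts ascending.
    return (-score, -confidence, -area, element_id)


def pick_best(candidates: List[Candidate]) -> str:
    return min(candidates, key=sort_key)[0]


candidates = [
    ("btn_b", 0.90, 0.80, 1200.0),
    ("btn_a", 0.90, 0.80, 1200.0),
    ("btn_c", 0.90, 0.70, 5000.0),
    ("btn_d", 0.85, 0.99, 9000.0),
]
print(pick_best(candidates))                  # btn_a
print(pick_best(list(reversed(candidates))))  # btn_a
```

`btn_a` and `btn_b` tie on score, confidence, and area, so the element ID decides, and the result does not depend on input order.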
Data Models
Enhanced Resolution Details
Extension of ResolvedTarget.resolution_details with multi-anchor information:
resolution_details = {
    # Existing fields
    "healing_attempt": int,
    "anchor_id": Optional[str],
    "top3": List[Dict],

    # New fields for multi-anchor
    "anchors_attempted": List[str],              # All anchor texts tried
    "successful_anchor": Optional[str],          # Which anchor text succeeded
    "hard_constraints_applied": Dict[str, Any],  # Constraints that were applied
    "candidates_filtered": int,                  # How many candidates were filtered
    "weights_used": Dict[str, float],            # Actual weights applied
    "tie_break_criteria": Optional[str],         # Which tie-break criterion was used
    "container_resolved": Optional[str],         # Container text that was resolved
    "performance_metrics": Dict[str, float]      # Timing and efficiency metrics
}
Multi-Anchor Metrics
@dataclass
class MultiAnchorMetrics:
    """Metrics for multi-anchor resolution performance"""
    total_anchors_attempted: int
    successful_anchor_index: int
    candidates_before_constraints: int
    candidates_after_constraints: int
    scoring_duration_ms: float
    container_resolution_duration_ms: float
    total_resolution_duration_ms: float
    cache_hits: int
    cache_misses: int
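The `*_duration_ms` fields could be filled by timing each phase with `time.perf_counter`. The context manager below is a hypothetical helper (`timed` is not part of the design) that accumulates elapsed milliseconds under a key matching the field name; the loop inside the `with` block is only a placeholder for real scoring work.

```python
import time
from contextlib import contextmanager
from typing import Dict, Iterator


@contextmanager
def timed(durations: Dict[str, float], key: str) -> Iterator[None]:
    """Add the elapsed wall-clock time of the block, in milliseconds, to durations[key]."""
    start = time.perf_counter()
    try:
        yield
    finally:
        elapsed_ms = (time.perf_counter() - start) * 1000.0
        durations[key] = durations.get(key, 0.0) + elapsed_ms


durations: Dict[str, float] = {}
with timed(durations, "scoring_duration_ms"):
    sum(i * i for i in range(1000))  # placeholder for the scoring phase

print(sorted(durations))  # ['scoring_duration_ms']
```

Accumulating rather than overwriting lets the same key be timed once per anchor and still report a single total.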
Correctness Properties
A property is a characteristic or behavior that should hold true across all valid executions of a system; in essence, it is a formal statement about what the system should do. Properties serve as the bridge between human-readable specifications and machine-verifiable correctness guarantees.
Property 1: Multi-anchor evaluation completeness
For any target specification with multiple anchor texts, the anchor texts should be attempted in turn until one succeeds or all are exhausted.
Validates: Requirements 1.1, 1.3
Property 2: Hard constraint strictness
For any set of hard constraints, no element that violates any constraint should be included in the final candidate set.
Validates: Requirements 2.1, 2.4
Property 3: Container resolution consistency
For any container text specification, the same container should be resolved consistently across multiple calls with the same UI state.
Validates: Requirements 4.1, 4.4
Property 4: Weighted scoring monotonicity
For any two elements where element A scores strictly better than element B on every weighted criterion, element A should have a higher composite score than element B.
Validates: Requirements 3.1, 3.2, 3.3, 3.4
Property 5: Tie-breaking determinism
For any UI state processed multiple times, when multiple elements have identical scores, the same element should always be selected.
Validates: Requirements 5.5
Property 6: Anchor fallback resilience
For any target specification where some anchor texts are missing, resolution should continue with available anchors without failing.
Validates: Requirements 1.3, 1.4
Property 7: Constraint filtering completeness
For any hard constraint specification, all elements that violate the constraint should be filtered out before scoring.
Validates: Requirements 2.2, 2.3
Property 8: Semantic variant equivalence
For any set of semantic variant anchor texts, elements found by any variant should be treated as equivalent candidates.
Validates: Requirements 6.1, 6.2
Property 9: Performance optimization consistency
For any multi-anchor resolution, UI element analysis should be reused between anchor evaluations to avoid redundant computation.
Validates: Requirements 8.2, 8.4
Property 10: Audit trail completeness
For any multi-anchor resolution, the resolution details should contain complete information about anchors attempted, constraints applied, and scoring performed.
Validates: Requirements 7.1, 7.2, 7.3, 7.5
Error Handling
Multi-Anchor Failure Scenarios
- All Anchors Missing: Fall back to anchor-less resolution with logging
- Invalid Container Text: Log warning and continue without container constraint
- Malformed Weights: Validate and fall back to default weights
- Empty Candidate Set: Return None with detailed failure reason
- Scoring Calculation Errors: Use base score with error logging
Recovery Strategies
- Graceful Degradation: Continue with available anchors when some fail
- Weight Validation: If weights are numeric but do not sum to 1.0, normalize them so that they do
- Container Fallback: Continue without container constraint if resolution fails
- Performance Fallback: Disable caching if cache operations fail
- Logging Resilience: Continue operation even if audit logging fails
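The two weight-related behaviors listed above (falling back to defaults for malformed weights, normalizing weights to sum to 1.0) can be combined as in the sketch below. Where the line between "malformed" and "needs normalizing" falls is an assumption: this version treats empty, non-numeric, negative, non-finite, or all-zero weights as malformed and normalizes everything else. `validate_weights` is a hypothetical helper name.

```python
import math
from typing import Dict, Mapping

DEFAULT_WEIGHTS = {"proximity": 0.35, "alignment": 0.25, "container": 0.15, "roi_iou": 0.25}


def validate_weights(weights: Mapping[str, object]) -> Dict[str, float]:
    """Return weights normalized to sum to 1.0, or the defaults when malformed."""
    if not weights:
        return dict(DEFAULT_WEIGHTS)
    cleaned: Dict[str, float] = {}
    for name, value in weights.items():
        # bool is a subclass of int, so it is rejected explicitly.
        if isinstance(value, bool) or not isinstance(value, (int, float)):
            return dict(DEFAULT_WEIGHTS)
        if not math.isfinite(value) or value < 0:
            return dict(DEFAULT_WEIGHTS)
        cleaned[name] = float(value)
    total = sum(cleaned.values())
    if total <= 0:
        return dict(DEFAULT_WEIGHTS)
    return {name: value / total for name, value in cleaned.items()}


print(validate_weights({"proximity": 2, "alignment": 2}))
# {'proximity': 0.5, 'alignment': 0.5}
print(validate_weights({"proximity": "high"}) == DEFAULT_WEIGHTS)  # True
```

Returning a fresh dict each time keeps callers from mutating the shared defaults.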
Testing Strategy
Unit Tests
- Test multi-anchor text extraction and candidate finding
- Test hard constraint filtering with various constraint types
- Test weighted scoring with different weight configurations
- Test container resolution with various text patterns
- Test tie-breaking with identical scores
- Test performance optimizations (caching, reuse)
Property-Based Tests
Using the Hypothesis framework with at least 100 generated examples per property:
- Property 1: Multi-anchor evaluation completeness
- Property 2: Hard constraint strictness
- Property 3: Container resolution consistency
- Property 4: Weighted scoring monotonicity
- Property 5: Tie-breaking determinism
- Property 6: Anchor fallback resilience
- Property 7: Constraint filtering completeness
- Property 8: Semantic variant equivalence
- Property 9: Performance optimization consistency
- Property 10: Audit trail completeness
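As an illustration of what such a test checks, the sketch below exercises Property 4 (weighted scoring monotonicity) against a weighted-sum scorer. It deliberately swaps Hypothesis for the standard library's `random` with a fixed seed so that it runs without extra dependencies; Hypothesis would additionally shrink failing inputs. The weighted-sum `composite_score` and the `check_monotonicity` harness are assumptions made for the example, not the project's code.

```python
import random
from typing import Dict

WEIGHTS = {"proximity": 0.35, "alignment": 0.25, "container": 0.15, "roi_iou": 0.25}


def composite_score(components: Dict[str, float], weights: Dict[str, float]) -> float:
    """Assumed scorer: weighted sum of component scores."""
    return sum(w * components[name] for name, w in weights.items())


def check_monotonicity(iterations: int = 200, seed: int = 0) -> int:
    """Return the number of generated cases in which the property held."""
    rng = random.Random(seed)
    passed = 0
    for _ in range(iterations):
        worse = {name: rng.uniform(0.0, 0.9) for name in WEIGHTS}
        # Element A beats element B on every criterion by a strictly positive margin.
        better = {name: value + rng.uniform(0.01, 0.1) for name, value in worse.items()}
        if composite_score(better, WEIGHTS) > composite_score(worse, WEIGHTS):
            passed += 1
    return passed


print(check_monotonicity(200))  # 200
```

With strictly positive weights, a margin of at least 0.01 on every component guarantees a score gap of at least 0.01, so every generated case passes.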
Integration Tests
- End-to-end multi-anchor resolution with complex UI states
- Performance benchmarks with large UI element sets
- Cross-component integration with existing healing system
- Real-world scenarios with Login/Settings panel disambiguation
- Stress testing with many anchors and constraints
Performance Tests
- Multi-anchor resolution should complete within 50ms for typical UI states
- Memory usage should remain constant regardless of anchor count
- Cache hit rates should exceed 80% for repeated container lookups
- Scoring calculations should scale linearly with candidate count
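The cache hit-rate target can be measured directly when lookups are memoized with `functools.lru_cache`, whose `cache_info()` reports hits and misses. The sketch below assumes a single frozen UI state held in a hypothetical `_CONTAINERS` mapping; a real cache would also need to be keyed on, or invalidated with, the UI state.

```python
from functools import lru_cache
from typing import Optional, Tuple

BBox = Tuple[int, int, int, int]

# Hypothetical frozen UI state: container text -> bbox
_CONTAINERS = {"Login": (100, 100, 400, 300), "Settings": (600, 100, 400, 300)}


@lru_cache(maxsize=128)
def container_bbox(text: str) -> Optional[BBox]:
    """Memoized container lookup; stands in for the real resolution logic."""
    return _CONTAINERS.get(text)


for _ in range(10):
    container_bbox("Login")
    container_bbox("Settings")

info = container_bbox.cache_info()
hit_rate = info.hits / (info.hits + info.misses)
print(info.hits, info.misses, hit_rate)  # 18 2 0.9
```

Twenty lookups over two distinct texts produce two misses and eighteen hits, which clears the 80% threshold.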