feat: working E2E replay (25/25 actions, 0 retries, SomEngine via server)

Validated on a Windows PC (DESKTOP-58D5CAC, 2560x1600):
- 8 clicks resolved visually (1 anchor_template, 1 som_text_match, 6 som_vlm)
- Mean score 0.75, mean time 1.6 s
- Text typed correctly (bonjour, test word, date, email)
- 0 retries, 2 unverified actions (OK)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Dom
2026-03-31 14:04:41 +02:00
parent 5e0b53cfd1
commit a7de6a488b
79542 changed files with 6091757 additions and 1 deletions

@@ -0,0 +1,491 @@
# Design Document
## Overview
This complete redesign of the Visual Workflow Builder turns the interface into a 100% vision-based system. It removes all traditional selectors and implements a purely visual selection approach, in line with the RPA Vision V3 architecture.
## Architecture
### System Architecture
```mermaid
graph TB
subgraph "Frontend (React/TypeScript)"
VWB[Visual Workflow Builder]
VS[Visual Selector Component]
VTP[Visual Target Preview]
RSV[Reference Screenshot Viewer]
end
subgraph "Backend APIs"
SCS[Screen Capture Service]
EDE[Element Detection Engine]
VTM[Visual Target Manager]
EGS[Embedding Generation Service]
end
subgraph "Core RPA Vision V3"
SC[Screen Capturer]
UD[UI Detector]
FE[Fusion Engine]
EM[Embedding Manager]
end
VWB --> VS
VS --> SCS
SCS --> SC
SCS --> UD
UD --> EDE
EDE --> FE
FE --> EM
EM --> VTM
VTM --> VTP
VTP --> RSV
```
### Component Hierarchy
```
VisualWorkflowBuilder/
├── VisualCanvas/                  # Main canvas with workflow nodes
│   ├── VisualNode/                # Node with visual preview
│   └── VisualConnection/          # Connections between nodes
├── VisualPalette/                 # Palette of visual actions
│   └── VisualActionCard/          # Action card with icon
├── VisualPropertiesPanel/         # Visual properties panel
│   ├── VisualTargetConfig/        # Visual target configuration
│   ├── ReferenceScreenshotView/   # Reference screenshot display
│   └── VisualMetadataDisplay/     # Visual metadata
├── VisualScreenSelector/          # Screen selector (complete redesign)
│   ├── ScreenCaptureView/         # Screen capture view
│   ├── ElementDetectionOverlay/   # Element detection overlay
│   └── BoundingBoxRenderer/       # Bounding box rendering
└── VisualValidationPanel/         # Visual validation panel
    ├── TargetPreview/             # Target preview
    └── ValidationFeedback/        # Validation feedback
```
## Components and Interfaces
### 1. VisualScreenSelector (Complete Redesign)
**Responsibilities:**
- Real-time screen capture
- Automatic detection of UI elements
- Display of the reference image with precise overlays
- Purely visual selection, without CSS/XPath selectors
**Interface:**
```typescript
interface VisualScreenSelectorProps {
  open: boolean;
  onClose: () => void;
  onElementSelected: (target: VisualTarget) => void;
  captureMode: 'fullscreen' | 'window' | 'region';
}

interface DetectedElement {
  id: string;
  bounds: BoundingBox;
  elementType: ElementType;
  confidence: number;
  textContent?: string;
  visualFeatures: VisualFeatures;
}

interface BoundingBox {
  x: number;
  y: number;
  width: number;
  height: number;
  screenScale: number;
  dpiScale: number;
}
```
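The design does not state which coordinate space `BoundingBox` uses or how `screenScale` and `dpiScale` combine. As a minimal sketch, assuming `x`, `y`, `width`, and `height` are physical capture pixels, `dpiScale` is the device pixel ratio at capture time, and `screenScale` is the ratio of displayed size to logical size, an overlay renderer might convert a box to CSS pixels like this. `CssRect` and `toCssRect` are hypothetical names, not part of the design.

```typescript
// Mirrors the BoundingBox interface above.
interface BoundingBox {
  x: number;
  y: number;
  width: number;
  height: number;
  screenScale: number;
  dpiScale: number;
}

// Hypothetical output type: a rectangle in CSS pixels for an overlay element.
interface CssRect {
  left: number;
  top: number;
  width: number;
  height: number;
}

// Assumption: physical pixels / dpiScale = logical pixels,
// and logical pixels * screenScale = displayed (CSS) pixels.
function toCssRect(box: BoundingBox): CssRect {
  if (box.dpiScale <= 0 || box.screenScale <= 0) {
    throw new RangeError('dpiScale and screenScale must be positive');
  }
  const factor = box.screenScale / box.dpiScale;
  return {
    left: box.x * factor,
    top: box.y * factor,
    width: box.width * factor,
    height: box.height * factor,
  };
}
```

If the real semantics of the two scale fields differ, only the `factor` line would need to change.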
### 2. ReferenceScreenshotView
**Responsibilities:**
- Display of the reference capture image
- Precise overlay of the selected elements
- Zoom and pan for detailed inspection
- Handling of different screen resolutions
**Interface:**
```typescript
interface ReferenceScreenshotViewProps {
  screenshot: string; // Base64 image data
  selectedElement?: DetectedElement;
  detectedElements: DetectedElement[];
  onElementHover: (element: DetectedElement | null) => void;
  onElementClick: (element: DetectedElement) => void;
  zoomLevel: number;
  panOffset: { x: number; y: number };
}
```
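To keep overlays aligned while the user zooms and pans, screenshot coordinates have to be mapped to view coordinates and back. The sketch below assumes zoom is applied first and `panOffset` is expressed in view pixels; the design does not specify this order, and `screenshotToView` and `viewToScreenshot` are hypothetical helper names.

```typescript
interface Point {
  x: number;
  y: number;
}

// Screenshot pixel -> position in the zoomed and panned view.
function screenshotToView(p: Point, zoomLevel: number, panOffset: Point): Point {
  return {
    x: p.x * zoomLevel + panOffset.x,
    y: p.y * zoomLevel + panOffset.y,
  };
}

// Inverse mapping, e.g. to translate a mouse position back to screenshot pixels.
function viewToScreenshot(p: Point, zoomLevel: number, panOffset: Point): Point {
  if (zoomLevel <= 0) {
    throw new RangeError('zoomLevel must be positive');
  }
  return {
    x: (p.x - panOffset.x) / zoomLevel,
    y: (p.y - panOffset.y) / zoomLevel,
  };
}
```

A hover handler would typically call `viewToScreenshot` on the pointer position before comparing it with `DetectedElement.bounds`.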
### 3. VisualTargetConfig
**Responsibilities:**
- Visual configuration of targets
- Complete removal of the CSS/XPath fields
- Interface based on visual metadata
- Real-time validation
**Interface:**
```typescript
interface VisualTargetConfigProps {
  target: VisualTarget;
  onTargetUpdate: (target: VisualTarget) => void;
  onValidate: () => Promise<ValidationResult>;
  showAdvancedOptions: boolean;
}

interface VisualTarget {
  id: string;
  screenshot: string; // Element image with overlay
  embedding: Float32Array;
  boundingBox: BoundingBox;
  confidence: number;
  metadata: VisualMetadata;
  contextualInfo: ContextualInfo;
  validationStatus: ValidationStatus;
}
```
### 4. ElementDetectionOverlay
**Responsibilities:**
- Real-time rendering of detection overlays
- Handling of hover/click interactions
- Display of confidence information
- Smooth transition animations
**Interface:**
```typescript
interface ElementDetectionOverlayProps {
  elements: DetectedElement[];
  hoveredElement?: DetectedElement;
  selectedElement?: DetectedElement;
  onElementInteraction: (element: DetectedElement, action: 'hover' | 'click') => void;
  overlayStyle: OverlayStyle;
}

interface OverlayStyle {
  hoverColor: string;
  selectedColor: string;
  borderWidth: number;
  opacity: number;
  animationDuration: number;
}
```
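Hover and click handling needs a rule for choosing one element when several detected boxes contain the pointer. One possible rule, shown here only as a sketch, is to pick the containing box with the smallest area so that a button wins over the panel around it. `Box`, `Detected`, and `hitTest` are hypothetical, reduced stand-ins for the interfaces above.

```typescript
interface Box {
  x: number;
  y: number;
  width: number;
  height: number;
}

// Reduced stand-in for DetectedElement: only the fields hit-testing needs.
interface Detected {
  id: string;
  bounds: Box;
}

// Returns the smallest element whose bounds contain (px, py), if any.
function hitTest(elements: Detected[], px: number, py: number): Detected | undefined {
  let best: Detected | undefined;
  let bestArea = Infinity;
  for (const el of elements) {
    const b = el.bounds;
    const inside =
      px >= b.x && px < b.x + b.width &&
      py >= b.y && py < b.y + b.height;
    if (!inside) continue;
    const area = b.width * b.height;
    if (area < bestArea) {
      best = el;
      bestArea = area;
    }
  }
  return best;
}
```

The pointer coordinates are expected in the same space as `bounds`, so any zoom, pan, or DPI conversion has to happen before the call.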
## Data Models
### VisualTarget (Enhanced)
```typescript
interface VisualTarget {
  // Identification
  id: string;
  signature: string;

  // Visual data
  screenshot: string; // Reference image with overlay
  embedding: Float32Array;
  boundingBox: BoundingBox;

  // Detection metadata
  confidence: number;
  elementType: ElementType;
  textContent?: string;

  // Contextual information
  contextualInfo: {
    screenSize: { width: number; height: number };
    captureTimestamp: string;
    surroundingElements: DetectedElement[];
    relativePosition: string;
  };

  // Validation and quality
  validationStatus: 'pending' | 'valid' | 'invalid' | 'needs_review';
  validationScore: number;
  lastValidated?: Date;

  // Visual metadata
  visualMetadata: {
    colorProfile: ColorProfile;
    textualFeatures: TextualFeatures;
    spatialFeatures: SpatialFeatures;
    accessibilityInfo: AccessibilityInfo;
  };
}
```
### ScreenCaptureResult
```typescript
interface ScreenCaptureResult {
  screenshot: string; // Base64 image data
  detectedElements: DetectedElement[];
  captureMetadata: {
    timestamp: Date;
    screenResolution: { width: number; height: number };
    dpiScale: number;
    captureRegion: BoundingBox;
    processingTime: number;
  };
  qualityMetrics: {
    imageQuality: number;
    detectionConfidence: number;
    elementCount: number;
  };
}
```
### ValidationResult
```typescript
interface ValidationResult {
  isValid: boolean;
  confidence: number;
  issues: ValidationIssue[];
  suggestions: string[];
  visualFeedback: {
    highlightAreas: BoundingBox[];
    warningMessages: string[];
    successIndicators: string[];
  };
}
```
## Correctness Properties
*A property is a characteristic or behavior that should hold true across all valid executions of a system: essentially, a formal statement about what the system should do. Properties serve as the bridge between human-readable specifications and machine-verifiable correctness guarantees.*
### Property 1: Selector Generation Prohibition
*For any* workflow configuration generated by the Visual_Workflow_Builder, the configuration should never contain CSS selector strings or XPath selector strings
**Validates: Requirements 1.3**
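Property 1 can be checked mechanically by walking a generated configuration and flagging anything that looks like a selector. The sketch below is a heuristic only: the forbidden key names and the "starts with `//` or `xpath=`" rule are assumptions chosen for illustration, and `findSelectorViolations` is a hypothetical helper rather than part of the design.

```typescript
// Assumed key names that would indicate a CSS or XPath selector field.
const FORBIDDEN_KEYS = /^(selector|cssSelector|css|xpath|xpathSelector)$/i;
// Assumed shape of an XPath-like string value.
const XPATH_LIKE = /^(\/\/|xpath=)/i;

// Returns the JSON-style paths of every suspected selector in the config.
function findSelectorViolations(value: unknown, path: string = '$'): string[] {
  if (typeof value === 'string') {
    return XPATH_LIKE.test(value) ? [path] : [];
  }
  if (Array.isArray(value)) {
    const found: string[] = [];
    value.forEach((item, i) => {
      found.push(...findSelectorViolations(item, `${path}[${i}]`));
    });
    return found;
  }
  if (value !== null && typeof value === 'object') {
    const record = value as Record<string, unknown>;
    const found: string[] = [];
    for (const key of Object.keys(record)) {
      const childPath = `${path}.${key}`;
      if (FORBIDDEN_KEYS.test(key)) {
        found.push(childPath); // flag the key itself and do not descend further
      } else {
        found.push(...findSelectorViolations(record[key], childPath));
      }
    }
    return found;
  }
  return [];
}
```

An empty result is what a property test would expect for every generated workflow; a non-empty result lists where to look.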
### Property 2: Visual Selection Method Exclusivity
*For any* action configuration interface, only visual selection methods should be available as configuration options
**Validates: Requirements 1.4**
### Property 3: Screen Capture Trigger
*For any* element selection request, the Visual_Selector should initiate screen capture automatically
**Validates: Requirements 2.1**
### Property 4: Reference Screenshot Display
*For any* screen capture result, the Visual_Selector should display the reference screenshot with detected elements highlighted
**Validates: Requirements 2.2**
### Property 5: Hover Bounding Box Response
*For any* mouse hover event over a detected element, the Visual_Selector should display a precise bounding box overlay
**Validates: Requirements 2.3**
### Property 6: Click Visual Target Creation
*For any* click event on a detected element, the Visual_Selector should automatically create a Visual_Target object
**Validates: Requirements 2.4**
### Property 7: Metadata Visibility
*For any* detected element, the Visual_Selector should display confidence scores and metadata information
**Validates: Requirements 2.5**
### Property 8: Zoom Pan Functionality
*For any* reference screenshot display, zoom and pan controls should function correctly and affect the display
**Validates: Requirements 2.6**
### Property 9: Visual Target Screenshot Association
*For any* Visual_Target in the workflow, the Visual_Workflow_Builder should display its associated reference screenshot
**Validates: Requirements 3.1**
### Property 10: Green Border Overlay
*For any* captured element in a reference screenshot, the element should be displayed with a green border overlay
**Validates: Requirements 3.2**
### Property 11: Action Target Image Display
*For any* configured action being viewed, the Visual_Workflow_Builder should show the target element image
**Validates: Requirements 3.3**
### Property 12: Contextual Information Inclusion
*For any* reference screenshot, contextual information including timestamp and screen size should be included and accurate
**Validates: Requirements 3.4**
### Property 13: Screenshot Enlargement
*For any* reference screenshot, enlargement controls should work correctly to provide detailed view
**Validates: Requirements 3.5**
### Property 14: Pixel Perfect Alignment
*For any* detected UI element, the bounding box should be aligned with the element within acceptable pixel tolerance
**Validates: Requirements 4.1**
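Property 14 leaves the "acceptable pixel tolerance" unspecified. A test could make it explicit by measuring the largest error over the four edges of the box, as in this sketch; `maxEdgeError`, `isAligned`, and the tolerance value passed in are assumptions for illustration.

```typescript
interface Rect {
  x: number;
  y: number;
  width: number;
  height: number;
}

// Largest absolute difference between corresponding edges (left, top, right, bottom).
function maxEdgeError(expected: Rect, actual: Rect): number {
  return Math.max(
    Math.abs(expected.x - actual.x),
    Math.abs(expected.y - actual.y),
    Math.abs(expected.x + expected.width - (actual.x + actual.width)),
    Math.abs(expected.y + expected.height - (actual.y + actual.height))
  );
}

// True when every edge of the detected box is within tolerancePx of the expected one.
function isAligned(expected: Rect, actual: Rect, tolerancePx: number): boolean {
  return maxEdgeError(expected, actual) <= tolerancePx;
}
```

Comparing edges rather than `x`, `y`, `width`, and `height` separately avoids counting the same offset twice when a box is shifted.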
### Property 15: Screen Scaling Compensation
*For any* overlay display across different screen configurations, the Visual_Selector should correctly account for screen scaling and resolution
**Validates: Requirements 4.2**
### Property 16: Real-time Bounding Box Updates
*For any* mouse hover state change, bounding boxes should update immediately in real-time
**Validates: Requirements 4.3**
### Property 17: Selection State Persistence
*For any* selected element, the bounding box should persist with correct coordinates after selection
**Validates: Requirements 4.4**
### Property 18: Multi-monitor Coordinate Mapping
*For any* multi-monitor setup, the Visual_Selector should handle coordinate mapping correctly across all displays
**Validates: Requirements 4.5**
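For Property 18, one common way to handle several displays is to describe each monitor by its origin in a shared virtual-desktop space, where origins may be negative for displays placed left of or above the primary one. The sketch below assumes that model; `Monitor`, `LocalPoint`, and `toMonitorLocal` are hypothetical names.

```typescript
// Assumed monitor description: origin and size in virtual-desktop pixels.
interface Monitor {
  id: string;
  originX: number;
  originY: number;
  width: number;
  height: number;
}

interface LocalPoint {
  monitorId: string;
  x: number;
  y: number;
}

// Maps a virtual-desktop point to the monitor that contains it, or null if none does.
function toMonitorLocal(monitors: Monitor[], globalX: number, globalY: number): LocalPoint | null {
  for (const m of monitors) {
    const inside =
      globalX >= m.originX && globalX < m.originX + m.width &&
      globalY >= m.originY && globalY < m.originY + m.height;
    if (inside) {
      return { monitorId: m.id, x: globalX - m.originX, y: globalY - m.originY };
    }
  }
  return null;
}
```

A `null` result is a natural trigger for the fallback to the primary monitor described under Error Handling.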
### Property 19: Screen Capture API Integration
*For any* screen capture operation, the Visual_Workflow_Builder should use the existing Screen_Capture_Service API with proper parameters
**Validates: Requirements 5.1**
### Property 20: Element Detection Integration
*For any* element detection operation, the Visual_Workflow_Builder should properly integrate with the Element_Detection_Engine
**Validates: Requirements 5.2**
### Property 21: Visual Target Manager Usage
*For any* Visual_Target storage or retrieval operation, the Visual_Workflow_Builder should use the Visual_Target_Manager
**Validates: Requirements 5.3**
### Property 22: Embedding System Integration
*For any* embedding generation, the Visual_Workflow_Builder should leverage the existing embedding generation system
**Validates: Requirements 5.4**
### Property 23: RPA Pipeline Compatibility
*For any* generated workflow, the Visual_Workflow_Builder should maintain compatibility with the core RPA Vision V3 pipeline
**Validates: Requirements 5.5**
### Property 24: Visual Target Thumbnails
*For any* Visual_Target in the interface, the Visual_Workflow_Builder should display thumbnail previews
**Validates: Requirements 6.2**
### Property 25: Visual Interaction Feedback
*For any* user interaction, the Visual_Workflow_Builder should provide appropriate visual feedback
**Validates: Requirements 6.4**
### Property 26: Confidence Score Display
*For any* Visual_Target, the Visual_Workflow_Builder should display accurate confidence scores
**Validates: Requirements 7.1**
### Property 27: Element Type Detection Display
*For any* detected element, the Visual_Workflow_Builder should show the correct element type (button, input, link, etc.)
**Validates: Requirements 7.2**
### Property 28: Contextual Information Display
*For any* Visual_Target, the Visual_Workflow_Builder should display accurate contextual information including surrounding elements and position
**Validates: Requirements 7.3**
### Property 29: Embedding Quality Metrics
*For any* Visual_Target, the Visual_Workflow_Builder should show meaningful embedding quality metrics
**Validates: Requirements 7.4**
### Property 30: Validation Status Display
*For any* Visual_Target, the Visual_Workflow_Builder should provide accurate and updated validation status
**Validates: Requirements 7.5**
### Property 31: Capture Button Response
*For any* "Capture Screen" button click, the Screen_Capture_Service should initiate real-time screenshot capture
**Validates: Requirements 8.1**
### Property 32: Interactive Element Detection Completeness
*For any* test screen with interactive elements, the Screen_Capture_Service should detect all interactive UI elements
**Validates: Requirements 8.2**
### Property 33: Sub-pixel Coordinate Precision
*For any* element coordinate data returned, the Screen_Capture_Service should provide sub-pixel precision
**Validates: Requirements 8.3**
### Property 34: Display Configuration Compatibility
*For any* screen resolution and DPI setting combination, the Screen_Capture_Service should handle the configuration correctly
**Validates: Requirements 8.4**
### Property 35: Capture Failure Fallback
*For any* screen capture failure scenario, the Screen_Capture_Service should provide working fallback mechanisms
**Validates: Requirements 8.5**
### Property 36: Selection Highlight Overlay
*For any* element selection, the Visual_Workflow_Builder should highlight the element with a colored overlay
**Validates: Requirements 9.1**
### Property 37: Action Preview Display
*For any* selected element, the Visual_Workflow_Builder should show a preview of the action that will be performed
**Validates: Requirements 9.2**
### Property 38: Validation Error Indicators
*For any* validation failure, the Visual_Workflow_Builder should display visual error indicators
**Validates: Requirements 9.3**
### Property 39: Real-time Detection Feedback
*For any* element detection operation, the Visual_Workflow_Builder should provide real-time feedback during processing
**Validates: Requirements 9.4**
### Property 40: Selection Testing Capability
*For any* element selection, the Visual_Workflow_Builder should allow testing the selection before saving
**Validates: Requirements 9.5**
### Property 41: Screen Capture Performance
*For any* screen capture operation, the Visual_Workflow_Builder should complete the capture in less than 2 seconds
**Validates: Requirements 10.1**
### Property 42: Element Detection Performance
*For any* element detection operation, the Visual_Workflow_Builder should complete detection in less than 3 seconds
**Validates: Requirements 10.2**
### Property 43: Async Operation Loading Indicators
*For any* asynchronous operation, the Visual_Workflow_Builder should provide loading indicators
**Validates: Requirements 10.3**
### Property 44: Screenshot Caching Efficiency
*For any* reference screenshot access, the Visual_Workflow_Builder should cache and retrieve screenshots efficiently
**Validates: Requirements 10.4**
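The design does not name a caching strategy for Property 44. A least-recently-used cache is one plausible choice, sketched here on top of the insertion order that `Map` guarantees; `ScreenshotCache` and its capacity parameter are hypothetical.

```typescript
// Small LRU cache keyed by Visual_Target id, holding Base64 screenshot strings.
class ScreenshotCache {
  private readonly entries = new Map<string, string>();

  constructor(private readonly capacity: number) {
    if (capacity < 1) {
      throw new RangeError('capacity must be at least 1');
    }
  }

  get(id: string): string | undefined {
    const value = this.entries.get(id);
    if (value === undefined) return undefined;
    // Re-insert so this entry becomes the most recently used.
    this.entries.delete(id);
    this.entries.set(id, value);
    return value;
  }

  set(id: string, screenshot: string): void {
    this.entries.delete(id);
    this.entries.set(id, screenshot);
    if (this.entries.size > this.capacity) {
      // The first key in iteration order is the least recently used.
      const oldest = this.entries.keys().next().value;
      if (oldest !== undefined) this.entries.delete(oldest);
    }
  }

  has(id: string): boolean {
    return this.entries.has(id);
  }
}
```

Because Base64 screenshots can be large, a production cache would more likely bound total bytes than entry count; the eviction logic stays the same.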
### Property 45: Responsive Image Optimization
*For any* screen size configuration, the Visual_Workflow_Builder should optimize image display appropriately
**Validates: Requirements 10.5**
## Error Handling
### Screen Capture Failures
- **Timeout Handling**: Implement 5-second timeout for capture operations with user notification
- **Permission Errors**: Graceful handling of screen capture permission denials
- **Multi-monitor Issues**: Fallback to primary monitor if multi-monitor capture fails
- **Resolution Conflicts**: Automatic scaling adjustment for resolution mismatches
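The 5-second capture timeout listed above can be sketched with `Promise.race`. The wrapper below is an illustration under assumptions: `CaptureTimeoutError` and `withTimeout` are hypothetical names, and the wrapped promise stands in for whatever call the Screen_Capture_Service actually exposes.

```typescript
// Hypothetical error type so callers can tell a timeout from other failures.
class CaptureTimeoutError extends Error {
  constructor(ms: number) {
    super(`Screen capture timed out after ${ms} ms`);
    this.name = 'CaptureTimeoutError';
  }
}

// Rejects with CaptureTimeoutError if `operation` has not settled within `ms`.
function withTimeout<T>(operation: Promise<T>, ms: number = 5000): Promise<T> {
  let timer: ReturnType<typeof setTimeout> | undefined;
  const timeout = new Promise<never>((_resolve, reject) => {
    timer = setTimeout(() => reject(new CaptureTimeoutError(ms)), ms);
  });
  return Promise.race([operation, timeout]).then(
    (value) => {
      clearTimeout(timer); // stop the timer once the operation wins the race
      return value;
    },
    (err) => {
      clearTimeout(timer);
      throw err;
    }
  );
}
```

Note that this only stops waiting; it does not cancel the underlying capture, which would need its own abort mechanism. The UI can check `err.name === 'CaptureTimeoutError'` to decide when to show the user notification.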
### Element Detection Failures
- **Low Confidence Elements**: Warning indicators for elements below 70% confidence
- **No Elements Detected**: Clear messaging and retry options when no elements are found
- **Overlapping Elements**: Disambiguation UI for overlapping or nested elements
- **Performance Degradation**: Progressive quality reduction for performance optimization
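The 70% confidence rule above amounts to partitioning the detected elements before rendering. This sketch assumes `confidence` is a fraction between 0 and 1, which the design does not state explicitly; `Scored` and `splitByConfidence` are hypothetical names.

```typescript
// Minimal shape required from a detected element.
interface Scored {
  id: string;
  confidence: number; // assumed to be in [0, 1]
}

// Splits elements into those shown normally and those shown with a warning indicator.
function splitByConfidence<T extends Scored>(
  elements: T[],
  threshold: number = 0.7
): { accepted: T[]; needsWarning: T[] } {
  const accepted: T[] = [];
  const needsWarning: T[] = [];
  for (const el of elements) {
    if (el.confidence >= threshold) {
      accepted.push(el);
    } else {
      needsWarning.push(el);
    }
  }
  return { accepted, needsWarning };
}
```

Keeping the low-confidence elements in a separate list, rather than dropping them, lets the overlay still offer them for selection with a visible warning.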
### Visual Target Validation
- **Invalid Targets**: Clear error messages for targets that fail validation
- **Outdated Screenshots**: Automatic re-capture suggestions for stale references
- **Coordinate Misalignment**: Real-time correction suggestions for misaligned targets
- **Embedding Failures**: Fallback to basic visual matching when embeddings fail
## Testing Strategy
### Unit Testing
- **Component Isolation**: Test each visual component independently with mock data
- **API Integration**: Mock all backend services for frontend component testing
- **Error Scenarios**: Comprehensive testing of error handling paths
- **Performance Boundaries**: Test performance limits and degradation scenarios
### Property-Based Testing
- **Visual Target Generation**: Test Visual_Target creation across various element types and screen configurations
- **Coordinate Precision**: Verify bounding box accuracy across different DPI and scaling settings
- **Screenshot Processing**: Test image processing pipeline with various image formats and sizes
- **Integration Workflows**: Verify end-to-end workflows from capture to execution
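As an illustration of the coordinate-precision item above, here is a hand-rolled property check that needs no test library. It generates random boxes and DPI scales with a seeded linear congruential generator and verifies that converting physical pixels to logical pixels and back does not drift. All names are hypothetical, and the DPI values and size ranges are arbitrary choices; a real suite would more likely use a property-testing library for generation and shrinking.

```typescript
// Linear congruential generator (Numerical Recipes constants), seeded for reproducibility.
function makeRng(seed: number): () => number {
  let state = seed >>> 0;
  return () => {
    state = (state * 1664525 + 1013904223) % 4294967296;
    return state / 4294967296; // value in [0, 1)
  };
}

interface Rect {
  x: number;
  y: number;
  width: number;
  height: number;
}

const toLogical = (r: Rect, dpi: number): Rect => ({
  x: r.x / dpi, y: r.y / dpi, width: r.width / dpi, height: r.height / dpi,
});

const toPhysical = (r: Rect, dpi: number): Rect => ({
  x: r.x * dpi, y: r.y * dpi, width: r.width * dpi, height: r.height * dpi,
});

// Property: physical -> logical -> physical returns the original box (within 1e-6 px).
// Returns every generated box that violates the property.
function findCounterexamples(runs: number, seed: number): Rect[] {
  const rng = makeRng(seed);
  const dpiScales = [1, 1.25, 1.5, 2, 3];
  const failures: Rect[] = [];
  for (let i = 0; i < runs; i++) {
    const rect: Rect = {
      x: Math.floor(rng() * 3840),
      y: Math.floor(rng() * 2160),
      width: 1 + Math.floor(rng() * 800),
      height: 1 + Math.floor(rng() * 600),
    };
    const dpi = dpiScales[Math.floor(rng() * dpiScales.length)] ?? 1;
    const back = toPhysical(toLogical(rect, dpi), dpi);
    const drift = Math.max(
      Math.abs(back.x - rect.x),
      Math.abs(back.y - rect.y),
      Math.abs(back.width - rect.width),
      Math.abs(back.height - rect.height)
    );
    if (drift > 1e-6) failures.push(rect);
  }
  return failures;
}
```

Fixing the seed makes any failure reproducible, which matters when the property is later pointed at the real scaling code instead of these stand-in conversions.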
### Integration Testing
- **Backend API Integration**: Test real integration with Screen_Capture_Service and Element_Detection_Engine
- **Cross-browser Compatibility**: Verify functionality across different browsers and versions
- **Multi-monitor Support**: Test coordinate mapping and capture across multiple display configurations
- **Performance Under Load**: Test system behavior with multiple concurrent capture operations
### User Acceptance Testing
- **Visual Selection Workflow**: End-to-end testing of the complete visual selection process
- **Reference Screenshot Accuracy**: Verify that reference screenshots accurately represent selected elements
- **Workflow Creation**: Test complete workflow creation using only visual selection methods
- **Error Recovery**: Test user experience during error scenarios and recovery processes