# Design Document

## Overview

This complete redesign of the Visual Workflow Builder turns the interface into a 100% vision-based system. It eliminates all traditional selectors (CSS, XPath) and implements a purely visual selection approach that conforms to the RPA Vision V3 architecture.

## Architecture

### System Architecture

```mermaid
graph TB
    subgraph "Frontend (React/TypeScript)"
        VWB[Visual Workflow Builder]
        VS[Visual Selector Component]
        VTP[Visual Target Preview]
        RSV[Reference Screenshot Viewer]
    end

    subgraph "Backend APIs"
        SCS[Screen Capture Service]
        EDE[Element Detection Engine]
        VTM[Visual Target Manager]
        EGS[Embedding Generation Service]
    end

    subgraph "Core RPA Vision V3"
        SC[Screen Capturer]
        UD[UI Detector]
        FE[Fusion Engine]
        EM[Embedding Manager]
    end

    VWB --> VS
    VS --> SCS
    SCS --> SC
    SCS --> UD
    UD --> EDE
    EDE --> FE
    FE --> EM
    EM --> VTM
    VTM --> VTP
    VTP --> RSV
```

### Component Hierarchy

```
VisualWorkflowBuilder/
├── VisualCanvas/                    # Main canvas with workflow nodes
│   ├── VisualNode/                  # Node with visual preview
│   └── VisualConnection/            # Connections between nodes
├── VisualPalette/                   # Palette of visual actions
│   └── VisualActionCard/            # Action card with icon
├── VisualPropertiesPanel/           # Visual properties panel
│   ├── VisualTargetConfig/          # Visual target configuration
│   ├── ReferenceScreenshotView/     # Reference capture display
│   └── VisualMetadataDisplay/       # Visual metadata
├── VisualScreenSelector/            # Screen selector (complete redesign)
│   ├── ScreenCaptureView/           # Screen capture view
│   ├── ElementDetectionOverlay/     # Element detection overlay
│   └── BoundingBoxRenderer/         # Bounding box rendering
└── VisualValidationPanel/           # Visual validation panel
    ├── TargetPreview/               # Target preview
    └── ValidationFeedback/          # Validation feedback
```

## Components and Interfaces

### 1. VisualScreenSelector (Complete Redesign)

**Responsibilities:**
- Real-time screen capture
- Automatic detection of UI elements
- Display of the reference image with precise overlays
- Purely visual selection, with no CSS/XPath selectors

**Interface:**
```typescript
interface VisualScreenSelectorProps {
  open: boolean;
  onClose: () => void;
  onElementSelected: (target: VisualTarget) => void;
  captureMode: 'fullscreen' | 'window' | 'region';
}

interface DetectedElement {
  id: string;
  bounds: BoundingBox;
  elementType: ElementType;
  confidence: number;
  textContent?: string;
  visualFeatures: VisualFeatures;
}

interface BoundingBox {
  x: number;
  y: number;
  width: number;
  height: number;
  screenScale: number;
  dpiScale: number;
}
```

### 2. ReferenceScreenshotView

**Responsibilities:**
- Display of the reference capture image
- Precise overlay of the selected elements
- Zoom and pan for detailed inspection
- Handling of different screen resolutions

**Interface:**
```typescript
interface ReferenceScreenshotViewProps {
  screenshot: string; // Base64 image data
  selectedElement?: DetectedElement;
  detectedElements: DetectedElement[];
  onElementHover: (element: DetectedElement | null) => void;
  onElementClick: (element: DetectedElement) => void;
  zoomLevel: number;
  panOffset: { x: number; y: number };
}
```

### 3. VisualTargetConfig

**Responsibilities:**
- Visual configuration of targets
- Complete removal of CSS/XPath fields
- Interface based on visual metadata
- Real-time validation

**Interface:**
```typescript
interface VisualTargetConfigProps {
  target: VisualTarget;
  onTargetUpdate: (target: VisualTarget) => void;
  onValidate: () => Promise<ValidationResult>;
  showAdvancedOptions: boolean;
}

interface VisualTarget {
  id: string;
  screenshot: string; // Element image with overlay
  embedding: Float32Array;
  boundingBox: BoundingBox;
  confidence: number;
  metadata: VisualMetadata;
  contextualInfo: ContextualInfo;
  validationStatus: ValidationStatus;
}
```

### 4. ElementDetectionOverlay

**Responsibilities:**
- Real-time rendering of detection overlays
- Handling of hover/click interactions
- Display of confidence information
- Smooth transition animations

**Interface:**
```typescript
interface ElementDetectionOverlayProps {
  elements: DetectedElement[];
  hoveredElement?: DetectedElement;
  selectedElement?: DetectedElement;
  onElementInteraction: (element: DetectedElement, action: 'hover' | 'click') => void;
  overlayStyle: OverlayStyle;
}

interface OverlayStyle {
  hoverColor: string;
  selectedColor: string;
  borderWidth: number;
  opacity: number;
  animationDuration: number;
}
```

## Data Models

### VisualTarget (Enhanced)

```typescript
interface VisualTarget {
  // Identification
  id: string;
  signature: string;

  // Visual data
  screenshot: string; // Reference image with overlay
  embedding: Float32Array;
  boundingBox: BoundingBox;

  // Detection metadata
  confidence: number;
  elementType: ElementType;
  textContent?: string;

  // Contextual information
  contextualInfo: {
    screenSize: { width: number; height: number };
    captureTimestamp: string;
    surroundingElements: DetectedElement[];
    relativePosition: string;
  };

  // Validation and quality
  validationStatus: 'pending' | 'valid' | 'invalid' | 'needs_review';
  validationScore: number;
  lastValidated?: Date;

  // Visual metadata
  visualMetadata: {
    colorProfile: ColorProfile;
    textualFeatures: TextualFeatures;
    spatialFeatures: SpatialFeatures;
    accessibilityInfo: AccessibilityInfo;
  };
}
```

### ScreenCaptureResult

```typescript
interface ScreenCaptureResult {
  screenshot: string; // Base64 image data
  detectedElements: DetectedElement[];
  captureMetadata: {
    timestamp: Date;
    screenResolution: { width: number; height: number };
    dpiScale: number;
    captureRegion: BoundingBox;
    processingTime: number;
  };
  qualityMetrics: {
    imageQuality: number;
    detectionConfidence: number;
    elementCount: number;
  };
}
```

### ValidationResult

```typescript
interface ValidationResult {
  isValid: boolean;
  confidence: number;
  issues: ValidationIssue[];
  suggestions: string[];
  visualFeedback: {
    highlightAreas: BoundingBox[];
    warningMessages: string[];
    successIndicators: string[];
  };
}
```

## Correctness Properties

*A property is a characteristic or behavior that should hold true across all valid executions of a system. In essence, it is a formal statement about what the system should do. Properties serve as the bridge between human-readable specifications and machine-verifiable correctness guarantees.*

### Property 1: Selector Generation Prohibition

*For any* workflow configuration generated by the Visual_Workflow_Builder, the configuration should never contain CSS selector strings or XPath selector strings
**Validates: Requirements 1.3**

### Property 2: Visual Selection Method Exclusivity

*For any* action configuration interface, only visual selection methods should be available as configuration options
**Validates: Requirements 1.4**

### Property 3: Screen Capture Trigger

*For any* element selection request, the Visual_Selector should initiate screen capture automatically
**Validates: Requirements 2.1**

### Property 4: Reference Screenshot Display

*For any* screen capture result, the Visual_Selector should display the reference screenshot with detected elements highlighted
**Validates: Requirements 2.2**

### Property 5: Hover Bounding Box Response

*For any* mouse hover event over a detected element, the Visual_Selector should display a precise bounding box overlay
**Validates: Requirements 2.3**

### Property 6: Click Visual Target Creation

*For any* click event on a detected element, the Visual_Selector should automatically create a Visual_Target object
**Validates: Requirements 2.4**

### Property 7: Metadata Visibility

*For any* detected element, the Visual_Selector should display confidence scores and metadata information
**Validates: Requirements 2.5**

### Property 8: Zoom Pan Functionality

*For any* reference screenshot display, zoom and pan controls should function
correctly and affect the display
**Validates: Requirements 2.6**

### Property 9: Visual Target Screenshot Association

*For any* Visual_Target in the workflow, the Visual_Workflow_Builder should display its associated reference screenshot
**Validates: Requirements 3.1**

### Property 10: Green Border Overlay

*For any* captured element in a reference screenshot, the element should be displayed with a green border overlay
**Validates: Requirements 3.2**

### Property 11: Action Target Image Display

*For any* configured action being viewed, the Visual_Workflow_Builder should show the target element image
**Validates: Requirements 3.3**

### Property 12: Contextual Information Inclusion

*For any* reference screenshot, contextual information including timestamp and screen size should be included and accurate
**Validates: Requirements 3.4**

### Property 13: Screenshot Enlargement

*For any* reference screenshot, enlargement controls should work correctly to provide detailed view
**Validates: Requirements 3.5**

### Property 14: Pixel Perfect Alignment

*For any* detected UI element, the bounding box should be aligned with the element within acceptable pixel tolerance
**Validates: Requirements 4.1**

### Property 15: Screen Scaling Compensation

*For any* overlay display across different screen configurations, the Visual_Selector should correctly account for screen scaling and resolution
**Validates: Requirements 4.2**

### Property 16: Real-time Bounding Box Updates

*For any* mouse hover state change, bounding boxes should update immediately in real-time
**Validates: Requirements 4.3**

### Property 17: Selection State Persistence

*For any* selected element, the bounding box should persist with correct coordinates after selection
**Validates: Requirements 4.4**

### Property 18: Multi-monitor Coordinate Mapping

*For any* multi-monitor setup, the Visual_Selector should handle coordinate mapping correctly across all displays
**Validates: Requirements 4.5**

### Property 19: Screen Capture API Integration

*For any* screen capture operation, the Visual_Workflow_Builder should use the existing Screen_Capture_Service API with proper parameters
**Validates: Requirements 5.1**

### Property 20: Element Detection Integration

*For any* element detection operation, the Visual_Workflow_Builder should properly integrate with the Element_Detection_Engine
**Validates: Requirements 5.2**

### Property 21: Visual Target Manager Usage

*For any* Visual_Target storage or retrieval operation, the Visual_Workflow_Builder should use the Visual_Target_Manager
**Validates: Requirements 5.3**

### Property 22: Embedding System Integration

*For any* embedding generation, the Visual_Workflow_Builder should leverage the existing embedding generation system
**Validates: Requirements 5.4**

### Property 23: RPA Pipeline Compatibility

*For any* generated workflow, the Visual_Workflow_Builder should maintain compatibility with the core RPA Vision V3 pipeline
**Validates: Requirements 5.5**

### Property 24: Visual Target Thumbnails

*For any* Visual_Target in the interface, the Visual_Workflow_Builder should display thumbnail previews
**Validates: Requirements 6.2**

### Property 25: Visual Interaction Feedback

*For any* user interaction, the Visual_Workflow_Builder should provide appropriate visual feedback
**Validates: Requirements 6.4**

### Property 26: Confidence Score Display

*For any* Visual_Target, the Visual_Workflow_Builder should display accurate confidence scores
**Validates: Requirements 7.1**

### Property 27: Element Type Detection Display

*For any* detected element, the Visual_Workflow_Builder should show the correct element type (button, input, link, etc.)
**Validates: Requirements 7.2**

### Property 28: Contextual Information Display

*For any* Visual_Target, the Visual_Workflow_Builder should display accurate contextual information including surrounding elements and position
**Validates: Requirements 7.3**

### Property 29: Embedding Quality Metrics

*For any* Visual_Target, the Visual_Workflow_Builder should show meaningful embedding quality metrics
**Validates: Requirements 7.4**

### Property 30: Validation Status Display

*For any* Visual_Target, the Visual_Workflow_Builder should provide accurate and updated validation status
**Validates: Requirements 7.5**

### Property 31: Capture Button Response

*For any* "Capture Screen" button click, the Screen_Capture_Service should initiate real-time screenshot capture
**Validates: Requirements 8.1**

### Property 32: Interactive Element Detection Completeness

*For any* test screen with interactive elements, the Screen_Capture_Service should detect all interactive UI elements
**Validates: Requirements 8.2**

### Property 33: Sub-pixel Coordinate Precision

*For any* element coordinate data returned, the Screen_Capture_Service should provide sub-pixel precision
**Validates: Requirements 8.3**

### Property 34: Display Configuration Compatibility

*For any* screen resolution and DPI setting combination, the Screen_Capture_Service should handle the configuration correctly
**Validates: Requirements 8.4**

### Property 35: Capture Failure Fallback

*For any* screen capture failure scenario, the Screen_Capture_Service should provide working fallback mechanisms
**Validates: Requirements 8.5**

### Property 36: Selection Highlight Overlay

*For any* element selection, the Visual_Workflow_Builder should highlight the element with a colored overlay
**Validates: Requirements 9.1**

### Property 37: Action Preview Display

*For any* selected element, the Visual_Workflow_Builder should show a preview of the action that will be performed
**Validates: Requirements 9.2**

### Property 38: Validation Error Indicators

*For any* validation failure, the Visual_Workflow_Builder should display visual error indicators
**Validates: Requirements 9.3**

### Property 39: Real-time Detection Feedback

*For any* element detection operation, the Visual_Workflow_Builder should provide real-time feedback during processing
**Validates: Requirements 9.4**

### Property 40: Selection Testing Capability

*For any* element selection, the Visual_Workflow_Builder should allow testing the selection before saving
**Validates: Requirements 9.5**

### Property 41: Screen Capture Performance

*For any* screen capture operation, the Visual_Workflow_Builder should complete the capture in less than 2 seconds
**Validates: Requirements 10.1**

### Property 42: Element Detection Performance

*For any* element detection operation, the Visual_Workflow_Builder should complete detection in less than 3 seconds
**Validates: Requirements 10.2**

### Property 43: Async Operation Loading Indicators

*For any* asynchronous operation, the Visual_Workflow_Builder should provide loading indicators
**Validates: Requirements 10.3**

### Property 44: Screenshot Caching Efficiency

*For any* reference screenshot access, the Visual_Workflow_Builder should cache and retrieve screenshots efficiently
**Validates: Requirements 10.4**

### Property 45: Responsive Image Optimization

*For any* screen size configuration, the Visual_Workflow_Builder should optimize image display appropriately
**Validates: Requirements 10.5**

## Error Handling

### Screen Capture Failures

- **Timeout Handling**: Implement 5-second timeout for capture operations with user notification
- **Permission Errors**: Graceful handling of screen capture permission denials
- **Multi-monitor Issues**: Fallback to primary monitor if multi-monitor capture fails
- **Resolution Conflicts**: Automatic scaling adjustment for resolution mismatches

### Element Detection Failures

- **Low Confidence Elements**: Warning indicators for elements below 70% confidence
- **No Elements Detected**: Clear messaging and retry options when no elements are found
- **Overlapping Elements**: Disambiguation UI for overlapping or nested elements
- **Performance Degradation**: Progressive quality reduction for performance optimization

### Visual Target Validation

- **Invalid Targets**: Clear error messages for targets that fail validation
- **Outdated Screenshots**: Automatic re-capture suggestions for stale references
- **Coordinate Misalignment**: Real-time correction suggestions for misaligned targets
- **Embedding Failures**: Fallback to basic visual matching when embeddings fail

## Testing Strategy

### Unit Testing

- **Component Isolation**: Test each visual component independently with mock data
- **API Integration**: Mock all backend services for frontend component testing
- **Error Scenarios**: Comprehensive testing of error handling paths
- **Performance Boundaries**: Test performance limits and degradation scenarios

### Property-Based Testing

- **Visual Target Generation**: Test Visual_Target creation across various element types and screen configurations
- **Coordinate Precision**: Verify bounding box accuracy across different DPI and scaling settings
- **Screenshot Processing**: Test image processing pipeline with various image formats and sizes
- **Integration Workflows**: Verify end-to-end workflows from capture to execution

### Integration Testing

- **Backend API Integration**: Test real integration with Screen_Capture_Service and Element_Detection_Engine
- **Cross-browser Compatibility**: Verify functionality across different browsers and versions
- **Multi-monitor Support**: Test coordinate mapping and capture across multiple display configurations
- **Performance Under Load**: Test system behavior with multiple concurrent capture operations

### User Acceptance Testing

- **Visual Selection Workflow**: End-to-end testing of the complete visual selection process
- **Reference Screenshot Accuracy**: Verify that reference screenshots accurately represent selected elements
- **Workflow Creation**: Test complete workflow creation using only visual selection methods
- **Error Recovery**: Test user experience during error scenarios and recovery processes
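
The Coordinate Precision item under Property-Based Testing (together with Properties 15 and 17) could be exercised with a check along the following lines. This is a minimal sketch, not the project's implementation: `Box` is a trimmed-down stand-in for `BoundingBox`, and `toOverlay`, `fromOverlay`, `seededRandom`, and `checkRoundTrip` are hypothetical names. It assumes that box coordinates are physical capture pixels and that `dpiScale` is the physical-to-logical pixel ratio, which this document does not specify. A small seeded generator stands in for a property-testing library so that the block stays self-contained.

```typescript
// Hypothetical, trimmed-down stand-in for the document's BoundingBox.
interface Box {
  x: number;
  y: number;
  width: number;
  height: number;
  dpiScale: number; // assumption: physical pixels per logical pixel
}

// Mirrors the zoomLevel / panOffset props of ReferenceScreenshotViewProps.
interface ViewState {
  zoomLevel: number;
  panOffset: { x: number; y: number };
}

// Map a box from capture pixels to on-screen overlay coordinates.
function toOverlay(box: Box, view: ViewState): Box {
  const k = view.zoomLevel / box.dpiScale;
  return {
    x: box.x * k + view.panOffset.x,
    y: box.y * k + view.panOffset.y,
    width: box.width * k,
    height: box.height * k,
    dpiScale: box.dpiScale,
  };
}

// Inverse mapping: overlay coordinates back to capture pixels.
function fromOverlay(box: Box, view: ViewState): Box {
  const k = box.dpiScale / view.zoomLevel;
  return {
    x: (box.x - view.panOffset.x) * k,
    y: (box.y - view.panOffset.y) * k,
    width: box.width * k,
    height: box.height * k,
    dpiScale: box.dpiScale,
  };
}

// Seeded linear congruential generator returning values in [0, 1),
// so that a failing run can be reproduced from its seed.
function seededRandom(seed: number): () => number {
  let state = seed >>> 0;
  return () => {
    state = (Math.imul(state, 1664525) + 1013904223) >>> 0;
    return state / 4294967296;
  };
}

// Property: for any box and view state, mapping to the overlay and back
// returns the original coordinates within tolerancePx.
function checkRoundTrip(
  runs: number,
  seed: number,
  tolerancePx: number
): { ok: boolean; worstError: number } {
  const rand = seededRandom(seed);
  let worstError = 0;
  for (let i = 0; i < runs; i++) {
    const box: Box = {
      x: rand() * 3840,
      y: rand() * 2160,
      width: 1 + rand() * 500,
      height: 1 + rand() * 500,
      dpiScale: 1 + Math.floor(rand() * 9) * 0.25, // 1.0 to 3.0 in 0.25 steps
    };
    const view: ViewState = {
      zoomLevel: 0.25 + rand() * 3.75,
      panOffset: { x: (rand() - 0.5) * 2000, y: (rand() - 0.5) * 2000 },
    };
    const back = fromOverlay(toOverlay(box, view), view);
    const error = Math.max(
      Math.abs(back.x - box.x),
      Math.abs(back.y - box.y),
      Math.abs(back.width - box.width),
      Math.abs(back.height - box.height)
    );
    worstError = Math.max(worstError, error);
  }
  return { ok: worstError <= tolerancePx, worstError };
}

console.log(checkRoundTrip(1000, 42, 1e-6));
```

Reporting `worstError` alongside the boolean makes it easier to compare a failure against whatever "acceptable pixel tolerance" Property 14 ends up fixing. In a real suite the generator would more likely come from a property-testing library, with the real overlay mapping substituted for these two helper functions.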