Design Document
Overview
This complete redesign of the Visual Workflow Builder turns the interface into a 100% vision-based system, eliminating all traditional selectors and implementing a purely visual selection approach consistent with the RPA Vision V3 architecture.
Architecture
System Architecture
graph TB
subgraph "Frontend (React/TypeScript)"
VWB[Visual Workflow Builder]
VS[Visual Selector Component]
VTP[Visual Target Preview]
RSV[Reference Screenshot Viewer]
end
subgraph "Backend APIs"
SCS[Screen Capture Service]
EDE[Element Detection Engine]
VTM[Visual Target Manager]
EGS[Embedding Generation Service]
end
subgraph "Core RPA Vision V3"
SC[Screen Capturer]
UD[UI Detector]
FE[Fusion Engine]
EM[Embedding Manager]
end
VWB --> VS
VS --> SCS
SCS --> SC
SCS --> UD
UD --> EDE
EDE --> FE
FE --> EM
EM --> VTM
VTM --> VTP
VTP --> RSV
Component Hierarchy
VisualWorkflowBuilder/
├── VisualCanvas/                    # Main canvas with workflow nodes
│   ├── VisualNode/                  # Node with visual preview
│   └── VisualConnection/            # Connections between nodes
├── VisualPalette/                   # Palette of visual actions
│   └── VisualActionCard/            # Action card with icon
├── VisualPropertiesPanel/           # Visual properties panel
│   ├── VisualTargetConfig/          # Visual target configuration
│   ├── ReferenceScreenshotView/     # Reference screenshot display
│   └── VisualMetadataDisplay/       # Visual metadata
├── VisualScreenSelector/            # Screen selector (complete redesign)
│   ├── ScreenCaptureView/           # Screen capture view
│   ├── ElementDetectionOverlay/     # Element detection overlay
│   └── BoundingBoxRenderer/         # Bounding box rendering
└── VisualValidationPanel/           # Visual validation panel
    ├── TargetPreview/               # Target preview
    └── ValidationFeedback/          # Validation feedback
Components and Interfaces
1. VisualScreenSelector (Complete Redesign)
Responsibilities:
- Real-time screen capture
- Automatic detection of UI elements
- Display of the reference image with precise overlays
- Purely visual selection, with no CSS/XPath selectors
Interface:
interface VisualScreenSelectorProps {
open: boolean;
onClose: () => void;
onElementSelected: (target: VisualTarget) => void;
captureMode: 'fullscreen' | 'window' | 'region';
}
interface DetectedElement {
id: string;
bounds: BoundingBox;
elementType: ElementType;
confidence: number;
textContent?: string;
visualFeatures: VisualFeatures;
}
interface BoundingBox {
x: number;
y: number;
width: number;
height: number;
screenScale: number;
dpiScale: number;
}
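A minimal sketch of how a pointer position might be resolved to one of the detected elements, for example to drive the hover overlay. It assumes the pointer coordinates are already expressed in the same pixel space as `bounds`. The helper name `elementAtPoint`, the trimmed-down types, and the rule of preferring the smallest box under the pointer (so that a button nested inside a panel wins) are illustrative assumptions, not part of the design.

```typescript
// Trimmed-down stand-ins for BoundingBox and DetectedElement (only the fields used here).
interface BoxLike { x: number; y: number; width: number; height: number; }
interface ElementLike { id: string; bounds: BoxLike; confidence: number; }

// True when the point lies inside the box (left/top inclusive, right/bottom exclusive).
function boxContains(b: BoxLike, px: number, py: number): boolean {
  return px >= b.x && px < b.x + b.width && py >= b.y && py < b.y + b.height;
}

// Hypothetical helper: returns the smallest-area element under the pointer, or null.
function elementAtPoint(elements: ElementLike[], px: number, py: number): ElementLike | null {
  let best: ElementLike | null = null;
  for (const el of elements) {
    if (!boxContains(el.bounds, px, py)) continue;
    const area = el.bounds.width * el.bounds.height;
    if (best === null || area < best.bounds.width * best.bounds.height) {
      best = el;
    }
  }
  return best;
}
```

Choosing the smallest containing box is one simple way to resolve nested elements; a real selector could instead weigh confidence or element type.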
2. ReferenceScreenshotView
Responsibilities:
- Display of the reference capture image
- Precise overlay of the selected elements
- Zoom and pan for detailed inspection
- Handling of different screen resolutions
Interface:
interface ReferenceScreenshotViewProps {
screenshot: string; // Base64 image data
selectedElement?: DetectedElement;
detectedElements: DetectedElement[];
onElementHover: (element: DetectedElement | null) => void;
onElementClick: (element: DetectedElement) => void;
zoomLevel: number;
panOffset: { x: number; y: number };
}
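To keep hover and click handling accurate while the screenshot is zoomed or panned, pointer positions have to be converted back into image pixels. The sketch below assumes the image is scaled by `zoomLevel` from its top-left corner and then shifted by `panOffset`; that transform order and the helper names `imageToView` and `viewToImage` are assumptions made for illustration.

```typescript
interface Point2D { x: number; y: number; }
interface ViewTransform { zoomLevel: number; panOffset: Point2D; }

// Image pixel -> position inside the viewer (scale first, then translate).
function imageToView(p: Point2D, t: ViewTransform): Point2D {
  return {
    x: p.x * t.zoomLevel + t.panOffset.x,
    y: p.y * t.zoomLevel + t.panOffset.y,
  };
}

// Viewer position (for example from a mouse event) -> image pixel (inverse transform).
function viewToImage(p: Point2D, t: ViewTransform): Point2D {
  return {
    x: (p.x - t.panOffset.x) / t.zoomLevel,
    y: (p.y - t.panOffset.y) / t.zoomLevel,
  };
}
```

With `zoomLevel` 2 and `panOffset` (10, 20), image pixel (5, 5) is drawn at viewer position (20, 30), and the inverse maps it back.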
3. VisualTargetConfig
Responsibilities:
- Visual configuration of targets
- Complete removal of CSS/XPath fields
- Interface based on visual metadata
- Real-time validation
Interface:
interface VisualTargetConfigProps {
target: VisualTarget;
onTargetUpdate: (target: VisualTarget) => void;
onValidate: () => Promise<ValidationResult>;
showAdvancedOptions: boolean;
}
interface VisualTarget {
id: string;
screenshot: string; // Image of the element with overlay
embedding: Float32Array;
boundingBox: BoundingBox;
confidence: number;
metadata: VisualMetadata;
contextualInfo: ContextualInfo;
validationStatus: ValidationStatus;
}
4. ElementDetectionOverlay
Responsibilities:
- Real-time rendering of detection overlays
- Handling of hover/click interactions
- Display of confidence information
- Smooth transition animations
Interface:
interface ElementDetectionOverlayProps {
elements: DetectedElement[];
hoveredElement?: DetectedElement;
selectedElement?: DetectedElement;
onElementInteraction: (element: DetectedElement, action: 'hover' | 'click') => void;
overlayStyle: OverlayStyle;
}
interface OverlayStyle {
hoverColor: string;
selectedColor: string;
borderWidth: number;
opacity: number;
animationDuration: number;
}
Data Models
VisualTarget (Enhanced)
interface VisualTarget {
// Identification
id: string;
signature: string;
// Visual data
screenshot: string; // Reference image with overlay
embedding: Float32Array;
boundingBox: BoundingBox;
// Detection metadata
confidence: number;
elementType: ElementType;
textContent?: string;
// Contextual information
contextualInfo: {
screenSize: { width: number; height: number };
captureTimestamp: string;
surroundingElements: DetectedElement[];
relativePosition: string;
};
// Validation and quality
validationStatus: 'pending' | 'valid' | 'invalid' | 'needs_review';
validationScore: number;
lastValidated?: Date;
// Visual metadata
visualMetadata: {
colorProfile: ColorProfile;
textualFeatures: TextualFeatures;
spatialFeatures: SpatialFeatures;
accessibilityInfo: AccessibilityInfo;
};
}
ScreenCaptureResult
interface ScreenCaptureResult {
screenshot: string; // Base64 image data
detectedElements: DetectedElement[];
captureMetadata: {
timestamp: Date;
screenResolution: { width: number; height: number };
dpiScale: number;
captureRegion: BoundingBox;
processingTime: number;
};
qualityMetrics: {
imageQuality: number;
detectionConfidence: number;
elementCount: number;
};
}
ValidationResult
interface ValidationResult {
isValid: boolean;
confidence: number;
issues: ValidationIssue[];
suggestions: string[];
visualFeedback: {
highlightAreas: BoundingBox[];
warningMessages: string[];
successIndicators: string[];
};
}
Correctness Properties
A property is a characteristic or behavior that should hold true across all valid executions of a system; in essence, it is a formal statement about what the system should do. Properties serve as the bridge between human-readable specifications and machine-verifiable correctness guarantees.
Property 1: Selector Generation Prohibition
For any workflow configuration generated by the Visual_Workflow_Builder, the configuration should never contain CSS selector strings or XPath selector strings.
Validates: Requirements 1.3
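One way such a prohibition could be checked mechanically is to walk a serialized workflow configuration and report any selector-like field. The sketch below is a heuristic under stated assumptions: the forbidden key names (`selector`, `cssSelector`, `xpath`), the value prefixes (`//`, `xpath=`, `css=`), and the function name `findSelectorViolations` are all hypothetical, since the actual workflow schema is not shown here.

```typescript
// Assumed (hypothetical) key names that would indicate a traditional selector.
const FORBIDDEN_KEYS = ["selector", "cssselector", "xpath"];
// Assumed value prefixes that look like an XPath or an explicit selector string.
const SELECTOR_LIKE_VALUE = /^(\/\/|xpath=|css=)/i;

// Recursively walks any JSON-like value and returns the paths of suspected selectors.
function findSelectorViolations(value: unknown, path: string = "$"): string[] {
  const found: string[] = [];
  if (typeof value === "string") {
    if (SELECTOR_LIKE_VALUE.test(value)) found.push(path);
  } else if (Array.isArray(value)) {
    value.forEach((item, i) => {
      found.push(...findSelectorViolations(item, `${path}[${i}]`));
    });
  } else if (value !== null && typeof value === "object") {
    const record = value as Record<string, unknown>;
    for (const key of Object.keys(record)) {
      const childPath = `${path}.${key}`;
      if (FORBIDDEN_KEYS.indexOf(key.toLowerCase()) !== -1) {
        found.push(childPath); // forbidden key: report it without descending further
      } else {
        found.push(...findSelectorViolations(record[key], childPath));
      }
    }
  }
  return found;
}
```

A property-based test could generate random workflows through the builder and assert that this function always returns an empty list.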
Property 2: Visual Selection Method Exclusivity
For any action configuration interface, only visual selection methods should be available as configuration options.
Validates: Requirements 1.4
Property 3: Screen Capture Trigger
For any element selection request, the Visual_Selector should initiate screen capture automatically.
Validates: Requirements 2.1
Property 4: Reference Screenshot Display
For any screen capture result, the Visual_Selector should display the reference screenshot with detected elements highlighted.
Validates: Requirements 2.2
Property 5: Hover Bounding Box Response
For any mouse hover event over a detected element, the Visual_Selector should display a precise bounding box overlay.
Validates: Requirements 2.3
Property 6: Click Visual Target Creation
For any click event on a detected element, the Visual_Selector should automatically create a Visual_Target object.
Validates: Requirements 2.4
Property 7: Metadata Visibility
For any detected element, the Visual_Selector should display confidence scores and metadata information.
Validates: Requirements 2.5
Property 8: Zoom Pan Functionality
For any reference screenshot display, zoom and pan controls should function correctly and affect the display.
Validates: Requirements 2.6
Property 9: Visual Target Screenshot Association
For any Visual_Target in the workflow, the Visual_Workflow_Builder should display its associated reference screenshot.
Validates: Requirements 3.1
Property 10: Green Border Overlay
For any captured element in a reference screenshot, the element should be displayed with a green border overlay.
Validates: Requirements 3.2
Property 11: Action Target Image Display
For any configured action being viewed, the Visual_Workflow_Builder should show the target element image.
Validates: Requirements 3.3
Property 12: Contextual Information Inclusion
For any reference screenshot, contextual information including timestamp and screen size should be included and accurate.
Validates: Requirements 3.4
Property 13: Screenshot Enlargement
For any reference screenshot, enlargement controls should work correctly to provide a detailed view.
Validates: Requirements 3.5
Property 14: Pixel Perfect Alignment
For any detected UI element, the bounding box should be aligned with the element within acceptable pixel tolerance.
Validates: Requirements 4.1
Property 15: Screen Scaling Compensation
For any overlay display across different screen configurations, the Visual_Selector should correctly account for screen scaling and resolution.
Validates: Requirements 4.2
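As an illustration of scaling compensation, the sketch below maps a bounding box expressed in captured-image pixels onto the rectangle where the screenshot is actually drawn. It assumes detection coordinates are in the pixel space of the captured image and that the image is displayed without cropping; the names `scaleBoxToDisplay`, `PixelBox`, and `Size2D` are hypothetical.

```typescript
interface PixelBox { x: number; y: number; width: number; height: number; }
interface Size2D { width: number; height: number; }

// Converts a box from captured-image pixels to on-screen overlay coordinates.
function scaleBoxToDisplay(box: PixelBox, captured: Size2D, displayed: Size2D): PixelBox {
  const sx = displayed.width / captured.width;   // horizontal scale factor
  const sy = displayed.height / captured.height; // vertical scale factor
  return {
    x: box.x * sx,
    y: box.y * sy,
    width: box.width * sx,
    height: box.height * sy,
  };
}
```

For a 2560x1600 capture shown at 1280x800, a box at (100, 200) of size 50x40 would be drawn at (50, 100) with size 25x20.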
Property 16: Real-time Bounding Box Updates
For any mouse hover state change, bounding boxes should update immediately in real-time.
Validates: Requirements 4.3
Property 17: Selection State Persistence
For any selected element, the bounding box should persist with correct coordinates after selection.
Validates: Requirements 4.4
Property 18: Multi-monitor Coordinate Mapping
For any multi-monitor setup, the Visual_Selector should handle coordinate mapping correctly across all displays.
Validates: Requirements 4.5
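Coordinate mapping across displays can be sketched as a lookup from virtual-desktop coordinates to monitor-local ones. The example assumes each monitor is described by its origin and size in a shared virtual-desktop space, where monitors placed left of or above the primary display have negative origins; `MonitorRect`, `LocalPoint`, and `toMonitorLocal` are hypothetical names.

```typescript
interface MonitorRect { id: string; x: number; y: number; width: number; height: number; }
interface LocalPoint { monitorId: string; x: number; y: number; }

// Finds the monitor containing a global point and returns the point relative to that monitor.
function toMonitorLocal(monitors: MonitorRect[], gx: number, gy: number): LocalPoint | null {
  for (const m of monitors) {
    const inside = gx >= m.x && gx < m.x + m.width && gy >= m.y && gy < m.y + m.height;
    if (inside) {
      return { monitorId: m.id, x: gx - m.x, y: gy - m.y };
    }
  }
  return null; // the point is outside every known display
}
```

Returning `null` for points outside all displays leaves the caller free to fall back to the primary monitor.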
Property 19: Screen Capture API Integration
For any screen capture operation, the Visual_Workflow_Builder should use the existing Screen_Capture_Service API with proper parameters.
Validates: Requirements 5.1
Property 20: Element Detection Integration
For any element detection operation, the Visual_Workflow_Builder should properly integrate with the Element_Detection_Engine.
Validates: Requirements 5.2
Property 21: Visual Target Manager Usage
For any Visual_Target storage or retrieval operation, the Visual_Workflow_Builder should use the Visual_Target_Manager.
Validates: Requirements 5.3
Property 22: Embedding System Integration
For any embedding generation, the Visual_Workflow_Builder should leverage the existing embedding generation system.
Validates: Requirements 5.4
Property 23: RPA Pipeline Compatibility
For any generated workflow, the Visual_Workflow_Builder should maintain compatibility with the core RPA Vision V3 pipeline.
Validates: Requirements 5.5
Property 24: Visual Target Thumbnails
For any Visual_Target in the interface, the Visual_Workflow_Builder should display thumbnail previews.
Validates: Requirements 6.2
Property 25: Visual Interaction Feedback
For any user interaction, the Visual_Workflow_Builder should provide appropriate visual feedback.
Validates: Requirements 6.4
Property 26: Confidence Score Display
For any Visual_Target, the Visual_Workflow_Builder should display accurate confidence scores.
Validates: Requirements 7.1
Property 27: Element Type Detection Display
For any detected element, the Visual_Workflow_Builder should show the correct element type (button, input, link, etc.).
Validates: Requirements 7.2
Property 28: Contextual Information Display
For any Visual_Target, the Visual_Workflow_Builder should display accurate contextual information including surrounding elements and position.
Validates: Requirements 7.3
Property 29: Embedding Quality Metrics
For any Visual_Target, the Visual_Workflow_Builder should show meaningful embedding quality metrics.
Validates: Requirements 7.4
Property 30: Validation Status Display
For any Visual_Target, the Visual_Workflow_Builder should provide accurate and updated validation status.
Validates: Requirements 7.5
Property 31: Capture Button Response
For any "Capture Screen" button click, the Screen_Capture_Service should initiate real-time screenshot capture.
Validates: Requirements 8.1
Property 32: Interactive Element Detection Completeness
For any test screen with interactive elements, the Screen_Capture_Service should detect all interactive UI elements.
Validates: Requirements 8.2
Property 33: Sub-pixel Coordinate Precision
For any element coordinate data returned, the Screen_Capture_Service should provide sub-pixel precision.
Validates: Requirements 8.3
Property 34: Display Configuration Compatibility
For any screen resolution and DPI setting combination, the Screen_Capture_Service should handle the configuration correctly.
Validates: Requirements 8.4
Property 35: Capture Failure Fallback
For any screen capture failure scenario, the Screen_Capture_Service should provide working fallback mechanisms.
Validates: Requirements 8.5
Property 36: Selection Highlight Overlay
For any element selection, the Visual_Workflow_Builder should highlight the element with a colored overlay.
Validates: Requirements 9.1
Property 37: Action Preview Display
For any selected element, the Visual_Workflow_Builder should show a preview of the action that will be performed.
Validates: Requirements 9.2
Property 38: Validation Error Indicators
For any validation failure, the Visual_Workflow_Builder should display visual error indicators.
Validates: Requirements 9.3
Property 39: Real-time Detection Feedback
For any element detection operation, the Visual_Workflow_Builder should provide real-time feedback during processing.
Validates: Requirements 9.4
Property 40: Selection Testing Capability
For any element selection, the Visual_Workflow_Builder should allow testing the selection before saving.
Validates: Requirements 9.5
Property 41: Screen Capture Performance
For any screen capture operation, the Visual_Workflow_Builder should complete the capture in less than 2 seconds.
Validates: Requirements 10.1
Property 42: Element Detection Performance
For any element detection operation, the Visual_Workflow_Builder should complete detection in less than 3 seconds.
Validates: Requirements 10.2
Property 43: Async Operation Loading Indicators
For any asynchronous operation, the Visual_Workflow_Builder should provide loading indicators.
Validates: Requirements 10.3
Property 44: Screenshot Caching Efficiency
For any reference screenshot access, the Visual_Workflow_Builder should cache and retrieve screenshots efficiently.
Validates: Requirements 10.4
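The caching strategy is not specified here. As one possible reading, a small least-recently-used cache keyed by Visual_Target id would bound memory while keeping recently viewed screenshots available. The class name `ScreenshotLruCache`, its capacity parameter, and the choice of LRU eviction are all assumptions.

```typescript
// Hypothetical in-memory LRU cache for Base64 screenshots, keyed by target id.
class ScreenshotLruCache {
  // A Map preserves insertion order, so the first key is always the least recently used.
  private entries = new Map<string, string>();
  private capacity: number;

  constructor(capacity: number) {
    this.capacity = capacity;
  }

  get(id: string): string | undefined {
    const data = this.entries.get(id);
    if (data === undefined) return undefined;
    // Re-insert to mark this entry as most recently used.
    this.entries.delete(id);
    this.entries.set(id, data);
    return data;
  }

  set(id: string, data: string): void {
    if (this.entries.has(id)) this.entries.delete(id);
    this.entries.set(id, data);
    if (this.entries.size > this.capacity) {
      // Evict the least recently used entry.
      const oldest = this.entries.keys().next().value;
      if (oldest !== undefined) this.entries.delete(oldest);
    }
  }

  has(id: string): boolean {
    return this.entries.has(id);
  }
}
```

Because Base64 screenshots can be large, a production cache would more likely bound total bytes than entry count; entry count keeps the sketch short.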
Property 45: Responsive Image Optimization
For any screen size configuration, the Visual_Workflow_Builder should optimize image display appropriately.
Validates: Requirements 10.5
Error Handling
Screen Capture Failures
- Timeout Handling: Implement 5-second timeout for capture operations with user notification
- Permission Errors: Graceful handling of screen capture permission denials
- Multi-monitor Issues: Fallback to primary monitor if multi-monitor capture fails
- Resolution Conflicts: Automatic scaling adjustment for resolution mismatches
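The timeout behaviour listed above could be sketched as a generic wrapper around any capture promise. The helper name `withTimeout` and its `label` parameter are hypothetical; the 5000 ms figure in the usage note comes from the timeout stated above, while the capture call itself is left abstract.

```typescript
// Rejects with a descriptive error if the wrapped operation does not settle within `ms` milliseconds.
function withTimeout<T>(operation: Promise<T>, ms: number, label: string): Promise<T> {
  return new Promise<T>((resolve, reject) => {
    const timer = setTimeout(() => {
      reject(new Error(`${label} timed out after ${ms} ms`));
    }, ms);
    operation.then(
      (value) => {
        clearTimeout(timer); // settled in time: cancel the pending timeout
        resolve(value);
      },
      (error) => {
        clearTimeout(timer);
        reject(error);
      }
    );
  });
}
```

A caller might wrap its capture request as `withTimeout(capturePromise, 5000, "screen capture")` and show the user notification in the rejection handler. Note that this only stops waiting; it does not cancel the underlying capture.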
Element Detection Failures
- Low Confidence Elements: Warning indicators for elements below 70% confidence
- No Elements Detected: Clear messaging and retry options when no elements are found
- Overlapping Elements: Disambiguation UI for overlapping or nested elements
- Performance Degradation: Progressive quality reduction for performance optimization
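The low-confidence warning can be illustrated by partitioning detected elements around the 70% threshold. The sketch assumes `confidence` is expressed in the range 0 to 1, so the threshold is written as 0.7; `splitByConfidence` and `ScoredElement` are hypothetical names.

```typescript
interface ScoredElement { id: string; confidence: number; }

// Assumed scale: confidence in [0, 1], so 70% corresponds to 0.7.
const LOW_CONFIDENCE_THRESHOLD = 0.7;

// Splits elements into those shown normally and those that should carry a warning indicator.
function splitByConfidence<T extends ScoredElement>(
  elements: T[],
  threshold: number = LOW_CONFIDENCE_THRESHOLD
): { trusted: T[]; needsWarning: T[] } {
  const trusted: T[] = [];
  const needsWarning: T[] = [];
  for (const el of elements) {
    if (el.confidence < threshold) {
      needsWarning.push(el); // strictly below the threshold
    } else {
      trusted.push(el);
    }
  }
  return { trusted, needsWarning };
}
```

An element at exactly 0.7 is treated as trusted here, matching the "below 70%" wording.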
Visual Target Validation
- Invalid Targets: Clear error messages for targets that fail validation
- Outdated Screenshots: Automatic re-capture suggestions for stale references
- Coordinate Misalignment: Real-time correction suggestions for misaligned targets
- Embedding Failures: Fallback to basic visual matching when embeddings fail
Testing Strategy
Unit Testing
- Component Isolation: Test each visual component independently with mock data
- API Integration: Mock all backend services for frontend component testing
- Error Scenarios: Comprehensive testing of error handling paths
- Performance Boundaries: Test performance limits and degradation scenarios
Property-Based Testing
- Visual Target Generation: Test Visual_Target creation across various element types and screen configurations
- Coordinate Precision: Verify bounding box accuracy across different DPI and scaling settings
- Screenshot Processing: Test image processing pipeline with various image formats and sizes
- Integration Workflows: Verify end-to-end workflows from capture to execution
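As a dependency-free illustration of the coordinate-precision item above, the sketch below generates random bounding boxes and checks that scaling by a DPI factor and then by its inverse returns the original box within a tolerance. The generator, the list of scale factors, the tolerance, and all function names are assumptions; a real suite would more likely use a dedicated property-testing library.

```typescript
interface TestBox { x: number; y: number; width: number; height: number; }

// Small deterministic pseudo-random generator (linear congruential) so runs are repeatable.
function makeRandom(seed: number): () => number {
  let state = seed >>> 0;
  return () => {
    state = (state * 1664525 + 1013904223) % 4294967296;
    return state / 4294967296; // value in [0, 1)
  };
}

function scaleBox(b: TestBox, factor: number): TestBox {
  return { x: b.x * factor, y: b.y * factor, width: b.width * factor, height: b.height * factor };
}

// Property: scaling by a factor and then by its inverse returns the original box within tolerance.
// Returns the number of generated cases that violate the property.
function countRoundTripFailures(runs: number, seed: number, tolerance: number = 1e-6): number {
  const rand = makeRandom(seed);
  const scales = [1, 1.25, 1.5, 1.75, 2, 3]; // assumed set of DPI scale factors
  let failures = 0;
  for (let i = 0; i < runs; i++) {
    const box: TestBox = {
      x: rand() * 4000,
      y: rand() * 4000,
      width: 1 + rand() * 500,
      height: 1 + rand() * 500,
    };
    const factor = scales[Math.floor(rand() * scales.length)];
    const back = scaleBox(scaleBox(box, factor), 1 / factor);
    const drift = Math.max(
      Math.abs(back.x - box.x),
      Math.abs(back.y - box.y),
      Math.abs(back.width - box.width),
      Math.abs(back.height - box.height)
    );
    if (drift > tolerance) failures++;
  }
  return failures;
}
```

Seeding the generator makes any failing case reproducible, which matters when a property fails only for rare inputs.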
Integration Testing
- Backend API Integration: Test real integration with Screen_Capture_Service and Element_Detection_Engine
- Cross-browser Compatibility: Verify functionality across different browsers and versions
- Multi-monitor Support: Test coordinate mapping and capture across multiple display configurations
- Performance Under Load: Test system behavior with multiple concurrent capture operations
User Acceptance Testing
- Visual Selection Workflow: End-to-end testing of the complete visual selection process
- Reference Screenshot Accuracy: Verify that reference screenshots accurately represent selected elements
- Workflow Creation: Test complete workflow creation using only visual selection methods
- Error Recovery: Test user experience during error scenarios and recovery processes