rpa_vision_v3/.kiro/specs/agent-workflow-improvements/tasks.md
Dom a7de6a488b feat: functional E2E replay, 25/25 actions, 0 retries, SomEngine via server
Validated on a Windows PC (DESKTOP-58D5CAC, 2560x1600):
- 8 clicks resolved visually (1 anchor_template, 1 som_text_match, 6 som_vlm)
- Average score 0.75, average time 1.6s
- Text typed correctly (bonjour, test word, date, email)
- 0 retries, 2 unverified actions (OK)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-31 14:04:41 +02:00

# Agent V0 - Workflow Improvements Tasks
## Overview
This document outlines the implementation tasks for the Agent V0 workflow improvements, organized by priority and dependencies. The tasks are structured to deliver value incrementally while maintaining system stability.
## Task Organization
### Priority Levels
- **P0 (Critical)**: Must-have features that address core workflow issues
- **P1 (Important)**: Significant improvements that enhance user experience
- **P2 (Nice-to-have)**: Advanced features that provide additional value
### Dependencies
Tasks are organized to minimize dependencies and allow parallel development where possible.
## Phase 1: Core Workflow Enhancements (P0)
### TASK-1.1: Dynamic Workflow Naming System
**Priority**: P0
**Estimated Effort**: 3 days
**Dependencies**: None
**Objective**: Enable users to provide meaningful names for their captured workflows
**Implementation Steps**:
1. **Create WorkflowNamer Component**
- [ ] Implement `WorkflowNamer` class in `agent_v0/workflow_namer.py`
- [ ] Add name validation and sanitization methods
- [ ] Implement default name generation with timestamps
- [ ] Add configuration options for naming patterns
2. **Create UI Dialog for Name Input**
- [ ] Implement `WorkflowNameDialog` in `agent_v0/ui_dialogs.py`
- [ ] Design user-friendly input interface
- [ ] Add validation feedback and error messages
- [ ] Implement cancel/default name handling
3. **Integrate with RawSession**
- [ ] Modify `RawSession` to accept workflow names
- [ ] Update session ID generation to include workflow name
- [ ] Propagate workflow name through session metadata
- [ ] Update file naming conventions
4. **Update TrayUI Integration**
- [ ] Modify `TrayUI` to prompt for workflow name on session start
- [ ] Handle user cancellation gracefully
- [ ] Update menu options to show current workflow name
- [ ] Add workflow name to status indicators
**Acceptance Criteria**:
- [ ] Users can input custom workflow names before starting capture
- [ ] Default names are generated when no input is provided
- [ ] Names are sanitized for filesystem compatibility
- [ ] Workflow names appear in all generated files and metadata
- [ ] UI provides clear feedback for invalid names
**Testing Requirements**:
- [ ] Unit tests for name validation and sanitization
- [ ] UI tests for dialog interaction
- [ ] Integration tests for end-to-end naming flow
- [ ] Edge case testing (empty names, special characters, long names)
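
The naming rules in TASK-1.1 (sanitization, default names with timestamps) can be sketched as below. This is only an illustration: the function names, the 64-character limit, and the `workflow_YYYYMMDD_HHMMSS` default pattern are assumptions, not part of the existing `agent_v0` code. The ASCII-only character set is a deliberately conservative choice; a real implementation may want to keep accented characters.

```python
import re
from datetime import datetime
from typing import Optional

MAX_NAME_LENGTH = 64  # assumed limit; the task does not specify one


def sanitize_workflow_name(raw: str) -> str:
    """Return a filesystem-safe version of a user-supplied name."""
    # Replace each run of characters outside [A-Za-z0-9_-] with one underscore.
    cleaned = re.sub(r"[^A-Za-z0-9_-]+", "_", raw.strip())
    # Drop separators at the edges, enforce the length limit, then trim again
    # in case the cut landed on a separator.
    return cleaned.strip("_-")[:MAX_NAME_LENGTH].rstrip("_-")


def default_workflow_name(now: datetime) -> str:
    """Build a timestamped fallback name (hypothetical pattern)."""
    return now.strftime("workflow_%Y%m%d_%H%M%S")


def resolve_workflow_name(raw: Optional[str], now: Optional[datetime] = None) -> str:
    """Use the sanitized user input, or a default name if nothing usable remains."""
    name = sanitize_workflow_name(raw or "")
    return name or default_workflow_name(now or datetime.now())
```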
---
### TASK-1.2: Enhanced Event Capture System
**Priority**: P0
**Estimated Effort**: 4 days
**Dependencies**: None
**Objective**: Capture complete user interactions including keyboard events and text input
**Implementation Steps**:
1. **Extend EventCaptor for Keyboard Support**
- [ ] Create `EnhancedEventCaptor` extending existing `EventCaptor`
- [ ] Implement keyboard event listeners using pynput
- [ ] Add text buffer management for continuous text input
- [ ] Implement modifier key tracking (Ctrl, Alt, Shift)
2. **Implement Key Combination Detection**
- [ ] Add detection for common key combinations (Ctrl+C, Ctrl+V, etc.)
- [ ] Implement special key handling (Enter, Tab, Escape)
- [ ] Add support for function keys and navigation keys
- [ ] Create configurable key combination mappings
3. **Add Sensitive Field Protection**
- [ ] Implement automatic password field detection
- [ ] Add configurable sensitive field patterns
- [ ] Implement text masking for sensitive inputs
- [ ] Add user override options for sensitive field handling
4. **Integrate Text Input with UI Elements**
- [ ] Associate text input with target UI elements
- [ ] Track focus changes and element transitions
- [ ] Implement text input validation and formatting
- [ ] Add support for multi-line text input
**Acceptance Criteria**:
- [ ] All keyboard events are captured and recorded
- [ ] Key combinations are detected and logged correctly
- [ ] Text input is associated with appropriate UI elements
- [ ] Sensitive fields are automatically masked
- [ ] No performance degradation during intensive typing
**Testing Requirements**:
- [ ] Unit tests for keyboard event handling
- [ ] Tests for key combination detection
- [ ] Sensitive field masking validation
- [ ] Performance tests for high-frequency input
- [ ] Cross-platform compatibility tests
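
A minimal sketch of the modifier tracking, text buffering, key combination detection, and masking described in TASK-1.2 follows. It is library-free on purpose: the listener wiring (for example with pynput) is omitted, and the `KeyTracker` class, its event dictionaries, and the `sensitive` flag are hypothetical names chosen for illustration. Keys are assumed to arrive as plain strings such as `"a"`, `"ctrl"`, or `"enter"`.

```python
from dataclasses import dataclass, field

MODIFIERS = {"ctrl", "alt", "shift"}


@dataclass
class KeyTracker:
    held: set = field(default_factory=set)      # modifiers currently pressed
    buffer: list = field(default_factory=list)  # pending plain-text characters
    events: list = field(default_factory=list)  # recorded events, in order
    sensitive: bool = False  # set by the caller when focus is in a sensitive field

    def press(self, key: str) -> None:
        if key in MODIFIERS:
            self.held.add(key)
        elif self.held - {"shift"}:
            # Ctrl or Alt is held: record a combination such as "ctrl+c".
            self.flush()
            combo = "+".join(sorted(self.held) + [key])
            self.events.append({"type": "key_combo", "keys": combo})
        elif len(key) == 1:
            # Printable character (assumed already shifted by the caller).
            self.buffer.append(key)
        else:
            # Special key such as enter, tab, or escape ends the text run.
            self.flush()
            self.events.append({"type": "special_key", "key": key})

    def release(self, key: str) -> None:
        self.held.discard(key)

    def flush(self) -> None:
        """Emit buffered text as one event, masked if the field is sensitive."""
        if not self.buffer:
            return
        text = "".join(self.buffer)
        if self.sensitive:
            text = "*" * len(text)
        self.events.append({"type": "text_input", "text": text, "masked": self.sensitive})
        self.buffer.clear()
```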
---
### TASK-1.3: Processing Monitoring System
**Priority**: P0
**Estimated Effort**: 3 days
**Dependencies**: TASK-1.1
**Objective**: Provide real-time visibility into session processing pipeline
**Implementation Steps**:
1. **Create ProcessingMonitor Component**
- [ ] Implement `ProcessingMonitor` class in `agent_v0/processing_monitor.py`
- [ ] Add structured logging with different severity levels
- [ ] Implement progress tracking with percentage completion
- [ ] Add status file management for persistent state
2. **Integrate with Processing Pipeline**
- [ ] Modify `server/processing_pipeline.py` to use monitor
- [ ] Add monitoring hooks at each processing stage
- [ ] Implement error handling and recovery logging
- [ ] Add performance metrics collection
3. **Create User Notification System**
- [ ] Implement progress callbacks for UI updates
- [ ] Add system notifications for completion/errors
- [ ] Create status display in tray UI
- [ ] Implement log file access from UI
4. **Add Status Persistence**
- [ ] Create JSON status files for each session
- [ ] Implement status file cleanup and rotation
- [ ] Add status history for troubleshooting
- [ ] Create status query API for external tools
**Acceptance Criteria**:
- [ ] Processing progress is visible to users in real-time
- [ ] All processing steps are logged with timestamps
- [ ] Errors are clearly communicated with actionable information
- [ ] Processing logs are accessible for troubleshooting
- [ ] Status information persists across application restarts
**Testing Requirements**:
- [ ] Unit tests for monitoring component
- [ ] Integration tests with processing pipeline
- [ ] Error handling and recovery tests
- [ ] Performance impact assessment
- [ ] UI notification testing
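
One possible shape for the progress tracking and JSON status persistence in TASK-1.3 is sketched below. The class layout, the `<session_id>.status.json` file name, the status values, and the stage names used by callers are all assumptions for illustration; the real `ProcessingMonitor` may differ. The write goes through a temporary file and a rename so a reader is unlikely to see a half-written status file.

```python
import json
import time
from pathlib import Path


class ProcessingMonitor:
    """Track pipeline stages for one session and persist the state as JSON."""

    def __init__(self, session_id, status_dir, stages):
        self.stages = list(stages)
        self.status_path = Path(status_dir) / f"{session_id}.status.json"
        self.state = {
            "session_id": session_id,
            "stage": None,
            "progress": 0,        # percentage of stages completed
            "status": "pending",  # pending | running | completed | failed
            "error": None,
            "updated_at": None,
        }

    def _save(self):
        self.state["updated_at"] = time.time()
        tmp = self.status_path.with_suffix(".tmp")
        tmp.write_text(json.dumps(self.state))
        tmp.replace(self.status_path)  # rename over the previous status file

    def stage_done(self, stage):
        """Mark a stage as finished and update the percentage."""
        done = self.stages.index(stage) + 1
        finished = done == len(self.stages)
        self.state.update(
            stage=stage,
            progress=round(100 * done / len(self.stages)),
            status="completed" if finished else "running",
        )
        self._save()

    def fail(self, stage, message):
        """Record a failure with a message the UI can show to the user."""
        self.state.update(stage=stage, status="failed", error=message)
        self._save()
```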
---
## Phase 2: Advanced Capture Features (P1)
### TASK-2.1: Targeted Screenshot System
**Priority**: P1
**Estimated Effort**: 4 days
**Dependencies**: TASK-1.2
**Objective**: Capture element-focused screenshots for improved UI detection
**Implementation Steps**:
1. **Create TargetedScreenshotCaptor**
- [ ] Implement `TargetedScreenshotCaptor` class
- [ ] Add region calculation around click positions
- [ ] Implement dual capture (full-screen + targeted)
- [ ] Add click position indicators in targeted captures
2. **Implement UI Element Detection**
- [ ] Add basic UI element boundary detection
- [ ] Implement element type classification (button, input, etc.)
- [ ] Add text extraction from UI elements
- [ ] Create element metadata structure
3. **Optimize Image Processing**
- [ ] Implement image compression and optimization
- [ ] Add configurable quality settings
- [ ] Implement automatic image resizing
- [ ] Add support for different image formats
4. **Integrate with Event System**
- [ ] Modify click event handling to use targeted capture
- [ ] Update event data structure for dual screenshots
- [ ] Add element information to event metadata
- [ ] Implement capture mode configuration
**Acceptance Criteria**:
- [ ] Each click generates both full-screen and targeted screenshots
- [ ] Targeted captures include appropriate context margin
- [ ] UI element information is extracted and stored
- [ ] Image optimization maintains acceptable quality
- [ ] Capture performance remains within acceptable limits
**Testing Requirements**:
- [ ] Unit tests for screenshot capture logic
- [ ] Image quality and compression tests
- [ ] UI element detection accuracy tests
- [ ] Performance benchmarks for capture operations
- [ ] Cross-platform screenshot compatibility
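
The region calculation around a click position in TASK-2.1 might look like the following sketch. The 400-pixel default size and the function names are assumptions; the idea is to center a fixed-size box on the click and shift it back inside the screen near the edges, so the targeted capture always has the same dimensions. The second helper gives the click position relative to the crop, which is what a click indicator would need.

```python
def targeted_region(click_x, click_y, screen_w, screen_h, size=400):
    """Return (left, top, right, bottom) of a box around the click, kept on screen."""
    w = min(size, screen_w)
    h = min(size, screen_h)
    # Center on the click, then clamp so the box does not leave the screen.
    left = min(max(click_x - w // 2, 0), screen_w - w)
    top = min(max(click_y - h // 2, 0), screen_h - h)
    return left, top, left + w, top + h


def click_offset(click_x, click_y, region):
    """Return the click position relative to the top-left corner of the region."""
    return click_x - region[0], click_y - region[1]
```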
---
### TASK-2.2: Workflow Organization System
**Priority**: P1
**Estimated Effort**: 3 days
**Dependencies**: TASK-1.1, TASK-1.3
**Objective**: Organize and provide easy access to generated workflows
**Implementation Steps**:
1. **Create WorkflowLocator Component**
- [ ] Implement `WorkflowLocator` class in `agent_v0/workflow_locator.py`
- [ ] Create organized directory structure for workflows
- [ ] Implement workflow indexing system
- [ ] Add metadata management for workflows
2. **Implement Workflow Storage Structure**
- [ ] Create `data/workflows/` directory hierarchy
- [ ] Implement per-workflow subdirectories
- [ ] Add screenshot organization (full/targeted)
- [ ] Create workflow metadata files
3. **Add Search and Discovery Features**
- [ ] Implement workflow search by name and tags
- [ ] Add filtering by date, type, and status
- [ ] Create workflow listing and browsing
- [ ] Add workflow statistics and analytics
4. **Integrate with UI**
- [ ] Add workflow folder access to tray menu
- [ ] Implement recent workflows display
- [ ] Add workflow browser dialog
- [ ] Create workflow export functionality
**Acceptance Criteria**:
- [ ] Workflows are organized in a clear directory structure
- [ ] Workflow index enables fast search and filtering
- [ ] Users can easily access and browse their workflows
- [ ] Workflow metadata is comprehensive and useful
- [ ] Export functionality supports multiple formats
**Testing Requirements**:
- [ ] Unit tests for workflow organization logic
- [ ] Search and filtering functionality tests
- [ ] Directory structure validation tests
- [ ] UI integration tests
- [ ] Performance tests for large workflow collections
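
As an illustration of the search and filtering in TASK-2.2, the sketch below filters an in-memory index by name, tags, date, and status. The `WorkflowEntry` fields and the `search` signature are hypothetical; the actual index format of `WorkflowLocator` is not defined in this document.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class WorkflowEntry:
    name: str
    created: date
    status: str = "ready"            # assumed status values, e.g. ready / failed
    tags: frozenset = frozenset()


def search(index, text="", tags=(), since=None, status=None):
    """Return matching entries, most recent first."""
    text = text.lower()
    wanted = set(tags)
    hits = [
        entry for entry in index
        if text in entry.name.lower()                  # case-insensitive name match
        and wanted <= entry.tags                       # entry carries every wanted tag
        and (since is None or entry.created >= since)
        and (status is None or entry.status == status)
    ]
    return sorted(hits, key=lambda entry: entry.created, reverse=True)
```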
---
## Phase 3: Integration and Polish (P2)
### TASK-3.1: Visual Workflow Builder Integration
**Priority**: P2
**Estimated Effort**: 3 days
**Dependencies**: TASK-2.2
**Objective**: Integrate enhanced workflows with Visual Workflow Builder
**Implementation Steps**:
1. **Update Import/Export System**
- [ ] Modify `visual_workflow_builder/backend/api/import_export.py`
- [ ] Add support for enhanced workflow format
- [ ] Implement targeted screenshot import
- [ ] Update workflow validation for new format
2. **Enhance Workflow Editor**
- [ ] Add support for displaying targeted screenshots
- [ ] Implement enhanced metadata display
- [ ] Add workflow name editing capabilities
- [ ] Create workflow organization browser
3. **Add Direct Access Integration**
- [ ] Implement "Open in Builder" functionality from agent
- [ ] Add automatic workflow import on generation
- [ ] Create workflow synchronization system
- [ ] Add builder launch from agent UI
4. **Update Documentation and Help**
- [ ] Update user documentation for new features
- [ ] Add tooltips and help text for enhanced features
- [ ] Create workflow organization guide
- [ ] Add troubleshooting documentation
**Acceptance Criteria**:
- [ ] Enhanced workflows can be imported into Visual Workflow Builder
- [ ] Targeted screenshots are displayed and usable in editor
- [ ] Direct access from agent to builder works seamlessly
- [ ] Documentation is complete and accurate
**Testing Requirements**:
- [ ] Integration tests between agent and builder
- [ ] Workflow import/export validation tests
- [ ] UI functionality tests in builder
- [ ] Documentation accuracy verification
---
### TASK-3.2: Performance Optimization
**Priority**: P2
**Estimated Effort**: 2 days
**Dependencies**: TASK-2.1
**Objective**: Optimize system performance with new features
**Implementation Steps**:
1. **Optimize Capture Performance**
- [ ] Implement asynchronous screenshot processing
- [ ] Add image processing thread pool
- [ ] Optimize memory usage during capture
- [ ] Implement capture queue management
2. **Optimize Storage Performance**
- [ ] Implement incremental workflow indexing
- [ ] Add lazy loading for workflow metadata
- [ ] Optimize file I/O operations
- [ ] Implement storage cleanup routines
3. **Add Performance Monitoring**
- [ ] Implement capture performance metrics
- [ ] Add memory usage monitoring
- [ ] Create performance benchmarking tools
- [ ] Add performance alerts and warnings
4. **Optimize UI Responsiveness**
- [ ] Implement non-blocking UI operations
- [ ] Add progress indicators for long operations
- [ ] Optimize UI update frequency
- [ ] Implement UI caching where appropriate
**Acceptance Criteria**:
- [ ] Capture performance overhead is less than 20%
- [ ] UI remains responsive during all operations
- [ ] Memory usage is optimized and stable
- [ ] Performance metrics are available for monitoring
**Testing Requirements**:
- [ ] Performance benchmark tests
- [ ] Memory usage profiling
- [ ] UI responsiveness tests
- [ ] Long-running operation tests
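
The asynchronous screenshot processing and capture queue from TASK-3.2 could be approached as in this sketch, which uses the standard `concurrent.futures.ThreadPoolExecutor`. The `CaptureQueue` class is hypothetical, and `zlib.compress` merely stands in for real image compression so the example stays self-contained.

```python
import zlib
from concurrent.futures import ThreadPoolExecutor


def process_capture(raw: bytes) -> bytes:
    """Stand-in for image compression; a real version would encode an image."""
    return zlib.compress(raw)


class CaptureQueue:
    """Hand captures to a worker pool so the event listener is not blocked."""

    def __init__(self, workers=2):
        self._pool = ThreadPoolExecutor(max_workers=workers)
        self._pending = []  # (capture_id, future) pairs in submission order

    def submit(self, capture_id, raw):
        """Queue one capture; returns immediately."""
        future = self._pool.submit(process_capture, raw)
        self._pending.append((capture_id, future))

    def drain(self):
        """Wait for all queued captures and return {capture_id: processed_bytes}."""
        results = {capture_id: future.result() for capture_id, future in self._pending}
        self._pending.clear()
        return results

    def close(self):
        self._pool.shutdown(wait=True)
```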
---
## Phase 4: Testing and Documentation (P1)
### TASK-4.1: Comprehensive Testing Suite
**Priority**: P1
**Estimated Effort**: 4 days
**Dependencies**: All previous tasks
**Objective**: Ensure system reliability and quality
**Implementation Steps**:
1. **Unit Test Coverage**
- [ ] Achieve >90% code coverage for new components
- [ ] Add tests for all public methods and functions
- [ ] Implement edge case and error condition tests
- [ ] Add performance regression tests
2. **Integration Testing**
- [ ] Test complete workflow capture to generation flow
- [ ] Validate cross-component interactions
- [ ] Test error handling and recovery scenarios
- [ ] Validate backward compatibility
3. **User Acceptance Testing**
- [ ] Create realistic user scenarios
- [ ] Test with different types of applications
- [ ] Validate workflow quality and usability
- [ ] Gather user feedback and iterate
4. **Cross-Platform Testing**
- [ ] Test on Windows, macOS, and Linux
- [ ] Validate platform-specific features
- [ ] Test with different screen resolutions
- [ ] Validate file system compatibility
**Acceptance Criteria**:
- [ ] All tests pass consistently across platforms
- [ ] Code coverage meets quality standards
- [ ] User scenarios work as expected
- [ ] No regressions in existing functionality
**Testing Requirements**:
- [ ] Automated test suite execution
- [ ] Continuous integration setup
- [ ] Performance regression detection
- [ ] User acceptance criteria validation
---
### TASK-4.2: Documentation and User Guides
**Priority**: P1
**Estimated Effort**: 3 days
**Dependencies**: TASK-4.1
**Objective**: Provide comprehensive documentation for new features
**Implementation Steps**:
1. **Technical Documentation**
- [ ] Update API documentation for new components
- [ ] Document configuration options and settings
- [ ] Create architecture diagrams and explanations
- [ ] Add troubleshooting guides
2. **User Documentation**
- [ ] Create user guide for workflow naming
- [ ] Document enhanced capture features
- [ ] Add workflow organization guide
- [ ] Create FAQ for common issues
3. **Developer Documentation**
- [ ] Document extension points and APIs
- [ ] Create development setup guide
- [ ] Add code examples and best practices
- [ ] Document testing procedures
4. **Migration Guide**
- [ ] Create migration guide for existing users
- [ ] Document backward compatibility features
- [ ] Add upgrade procedures and recommendations
- [ ] Create rollback procedures if needed
**Acceptance Criteria**:
- [ ] All new features are documented comprehensively
- [ ] User guides are clear and actionable
- [ ] Developer documentation enables contribution
- [ ] Migration path is well-defined and tested
**Testing Requirements**:
- [ ] Documentation accuracy verification
- [ ] User guide walkthrough testing
- [ ] Developer setup validation
- [ ] Migration procedure testing
---
## Implementation Timeline
### Sprint 1 (Weeks 1-2): Foundation
- TASK-1.1: Dynamic Workflow Naming System
- TASK-1.2: Enhanced Event Capture System (start)
### Sprint 2 (Weeks 3-4): Core Features
- TASK-1.2: Enhanced Event Capture System (complete)
- TASK-1.3: Processing Monitoring System
### Sprint 3 (Weeks 5-6): Advanced Features
- TASK-2.1: Targeted Screenshot System
- TASK-2.2: Workflow Organization System
### Sprint 4 (Weeks 7-8): Integration
- TASK-3.1: Visual Workflow Builder Integration
- TASK-3.2: Performance Optimization
### Sprint 5 (Weeks 9-10): Quality Assurance
- TASK-4.1: Comprehensive Testing Suite
- TASK-4.2: Documentation and User Guides
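
The sprint plan can be checked against the declared task dependencies with a short script. The dependency map and completion sprints below are transcribed from this document ("All previous tasks" for TASK-4.1 is expanded to the seven earlier tasks); the helper name `schedule_conflicts` is an assumption. `graphlib.TopologicalSorter` is part of the Python standard library from version 3.9.

```python
from graphlib import TopologicalSorter

# Task -> tasks it depends on, as listed under each "Dependencies" field.
DEPENDS_ON = {
    "TASK-1.1": [],
    "TASK-1.2": [],
    "TASK-1.3": ["TASK-1.1"],
    "TASK-2.1": ["TASK-1.2"],
    "TASK-2.2": ["TASK-1.1", "TASK-1.3"],
    "TASK-3.1": ["TASK-2.2"],
    "TASK-3.2": ["TASK-2.1"],
    "TASK-4.1": ["TASK-1.1", "TASK-1.2", "TASK-1.3", "TASK-2.1",
                 "TASK-2.2", "TASK-3.1", "TASK-3.2"],
    "TASK-4.2": ["TASK-4.1"],
}

# Sprint in which each task is planned to complete.
SPRINT_DONE = {
    "TASK-1.1": 1, "TASK-1.2": 2, "TASK-1.3": 2,
    "TASK-2.1": 3, "TASK-2.2": 3,
    "TASK-3.1": 4, "TASK-3.2": 4,
    "TASK-4.1": 5, "TASK-4.2": 5,
}


def schedule_conflicts(depends_on, sprint_done):
    """List (task, dependency) pairs where the dependency finishes in a later sprint."""
    return [
        (task, dep)
        for task, deps in depends_on.items()
        for dep in deps
        if sprint_done[dep] > sprint_done[task]
    ]


# One valid execution order; raises graphlib.CycleError if dependencies loop.
ORDER = list(TopologicalSorter(DEPENDS_ON).static_order())
```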
## Risk Management
### Technical Risks
- **Performance Impact**: Mitigate with incremental optimization and monitoring
- **Cross-Platform Compatibility**: Address with comprehensive testing
- **Integration Complexity**: Manage with clear interfaces and contracts
### Project Risks
- **Scope Creep**: Control with strict prioritization and change management
- **Resource Constraints**: Address with flexible sprint planning
- **User Adoption**: Mitigate with user feedback and iterative improvement
## Success Metrics
### Quantitative Metrics
- **Feature Adoption**: >80% of users use workflow naming
- **Capture Completeness**: >95% of events captured correctly
- **Performance**: <20% capture performance overhead
- **Quality**: >90% test coverage, <5% defect rate
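
The overhead target can be made concrete with a small calculation. The sketch below assumes overhead is measured as the relative increase in mean capture time compared with a baseline run without the new features; that definition and both helper names are assumptions, since the document does not say how the 20% figure is measured.

```python
import time


def mean_runtime(fn, repeats=20):
    """Average wall-clock seconds per call of fn over a number of repeats."""
    start = time.perf_counter()
    for _ in range(repeats):
        fn()
    return (time.perf_counter() - start) / repeats


def overhead_percent(baseline_s, enhanced_s):
    """Relative slowdown of the enhanced path versus the baseline, in percent."""
    return 100.0 * (enhanced_s - baseline_s) / baseline_s


def within_budget(baseline_s, enhanced_s, budget_percent=20.0):
    """True if the measured overhead stays under the budget."""
    return overhead_percent(baseline_s, enhanced_s) < budget_percent
```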
### Qualitative Metrics
- **User Satisfaction**: >4/5 rating in user surveys
- **Workflow Quality**: Improved workflow accuracy and usability
- **Developer Experience**: Positive feedback from development team
- **Documentation Quality**: Clear and comprehensive documentation
## Definition of Done
A task is considered complete when:
- [ ] All implementation steps are finished
- [ ] Code review is completed and approved
- [ ] Unit tests are written and passing
- [ ] Integration tests are passing
- [ ] Documentation is updated
- [ ] Performance impact is assessed and acceptable
- [ ] User acceptance criteria are met
- [ ] No regressions are introduced
## Maintenance and Support
### Ongoing Maintenance
- Regular performance monitoring and optimization
- Bug fixes and issue resolution
- User feedback incorporation
- Security updates and patches
### Future Enhancements
- AI-powered workflow optimization
- Cloud synchronization capabilities
- Advanced analytics and insights
- Collaborative workflow development
This task breakdown provides a comprehensive roadmap for implementing the Agent V0 workflow improvements while maintaining quality and system stability.