feat: functional E2E replay - 25/25 actions, 0 retries, SomEngine via server

Validated on a Windows PC (DESKTOP-58D5CAC, 2560x1600):
- 8 clicks resolved visually (1 anchor_template, 1 som_text_match, 6 som_vlm)
- Average score 0.75, average time 1.6 s
- Text typed correctly (bonjour, test word, date, email)
- 0 retries, 2 unverified actions (OK)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

# Design Document - GPU Resource Manager
## Overview
The GPU Resource Manager is a singleton component that orchestrates dynamic allocation of GPU resources among the ML models of the RPA Vision V3 system. It manages two main resources:
1. **Ollama VLM (qwen3-vl:8b)** - ~10.5 GB VRAM, used for UI classification during recording
2. **CLIP (ViT-B-32)** - ~500 MB VRAM, used for embedding matching
The manager optimizes VRAM usage by:
- Unloading the VLM when it is not needed (autopilot mode)
- Migrating CLIP to the GPU when VRAM becomes available
- Enforcing an idle timeout to release resources automatically
## Architecture
```mermaid
graph TB
    subgraph "GPU Resource Manager"
        GRM[GPUResourceManager]
        OM[OllamaManager]
        CM[CLIPManager]
        VM[VRAMMonitor]
        EE[EventEmitter]
    end
    subgraph "External Services"
        OL[Ollama API :11434]
        NV[nvidia-smi / pynvml]
    end
    subgraph "Consumers"
        EL[ExecutionLoop]
        UD[UIDetector]
        FE[FusionEngine]
    end
    GRM --> OM
    GRM --> CM
    GRM --> VM
    GRM --> EE
    OM --> OL
    VM --> NV
    EL --> GRM
    UD --> GRM
    FE --> GRM
```
## Components and Interfaces
### GPUResourceManager (Singleton)
```python
class GPUResourceManager:
    """Central manager for GPU resources."""

    # Lifecycle
    def __init__(self, config: GPUResourceConfig): ...
    def shutdown(self) -> None: ...

    # Mode Management
    def set_execution_mode(self, mode: ExecutionMode) -> None: ...
    def get_execution_mode(self) -> ExecutionMode: ...

    # VLM Management
    async def ensure_vlm_loaded(self) -> bool: ...
    async def ensure_vlm_unloaded(self) -> bool: ...
    def is_vlm_loaded(self) -> bool: ...
    def get_vlm_state(self) -> ModelState: ...

    # CLIP Management
    def get_clip_device(self) -> str: ...  # "cpu" or "cuda"
    async def migrate_clip_to_gpu(self) -> bool: ...
    async def migrate_clip_to_cpu(self) -> bool: ...

    # Monitoring
    def get_status(self) -> GPUResourceStatus: ...
    def get_vram_usage(self) -> VRAMInfo: ...

    # Events
    def on_resource_changed(self, callback: Callable) -> None: ...
    def on_mode_changed(self, callback: Callable) -> None: ...
    def on_idle_unload(self, callback: Callable) -> None: ...
```
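
A minimal consumer-side sketch of this interface. How the singleton instance is obtained is an assumption here (the design only mandates the singleton pattern), as is the import path, taken from task 1.1 of the implementation plan:

```python
# Usage sketch; direct construction stands in for whatever singleton
# accessor the implementation exposes.
import asyncio

from core.gpu.gpu_resource_manager import (
    ExecutionMode, GPUResourceConfig, GPUResourceManager,
)

async def main() -> None:
    manager = GPUResourceManager(GPUResourceConfig())
    # Entering AUTOPILOT frees the VLM and moves CLIP to GPU if VRAM allows.
    manager.set_execution_mode(ExecutionMode.AUTOPILOT)
    await manager.ensure_vlm_unloaded()  # blocks until the VRAM is freed

    status = manager.get_status()
    print(f"mode={status.execution_mode.value}, "
          f"clip on {status.clip_device}, free VRAM {status.vram.free_mb} MB")

    manager.shutdown()

if __name__ == "__main__":
    asyncio.run(main())
```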
### OllamaManager
```python
class OllamaManager:
    """Manages the lifecycle of Ollama models."""

    def __init__(self, endpoint: str = "http://localhost:11434"): ...
    async def load_model(self, model: str, keep_alive: str = "5m") -> bool: ...
    async def unload_model(self, model: str) -> bool: ...
    async def is_model_loaded(self, model: str) -> bool: ...
    async def list_loaded_models(self) -> List[str]: ...
    def is_available(self) -> bool: ...
```
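
Behind these methods, Ollama exposes load/unload through its HTTP API: a `/api/generate` request with no prompt loads a model and honours `keep_alive`, `keep_alive=0` requests an immediate eviction, and `/api/ps` lists resident models. A sketch using httpx (an assumed dependency):

```python
# Sketch of the Ollama calls behind load/unload/list.
import httpx

async def load_model(endpoint: str, model: str, keep_alive: str = "5m") -> bool:
    async with httpx.AsyncClient(timeout=30.0) as client:
        # An empty generate request loads the model and sets keep_alive.
        resp = await client.post(
            f"{endpoint}/api/generate",
            json={"model": model, "keep_alive": keep_alive},
        )
        return resp.status_code == 200

async def unload_model(endpoint: str, model: str) -> bool:
    async with httpx.AsyncClient(timeout=5.0) as client:
        # keep_alive=0 asks Ollama to evict the model immediately.
        resp = await client.post(
            f"{endpoint}/api/generate",
            json={"model": model, "keep_alive": 0},
        )
        return resp.status_code == 200

async def list_loaded_models(endpoint: str) -> list[str]:
    async with httpx.AsyncClient(timeout=5.0) as client:
        resp = await client.get(f"{endpoint}/api/ps")
        resp.raise_for_status()
        return [m["name"] for m in resp.json().get("models", [])]
```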
### CLIPManager
```python
class CLIPManager:
    """Manages CPU/GPU migration of the CLIP model."""

    def __init__(self, model_name: str = "ViT-B-32"): ...
    def get_current_device(self) -> str: ...
    async def migrate_to_device(self, device: str) -> bool: ...
    def get_model(self) -> Any: ...  # Returns the CLIP model
    def reinitialize_pipeline(self) -> None: ...
```
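
A sketch of the migration logic, assuming the CLIP weights live in a `torch.nn.Module`; the fallback branch implements Requirement 3.4, and the standalone function is illustrative rather than the final method:

```python
# CPU/GPU migration sketch with graceful fallback to CPU.
import logging

import torch

logger = logging.getLogger(__name__)

def migrate_to_device(model: torch.nn.Module, device: str) -> str:
    """Move the model, falling back to CPU on failure.

    Returns the device actually in use afterwards.
    """
    try:
        model.to(device)
        if device == "cpu":
            # Release cached blocks so the VLM can claim the VRAM.
            torch.cuda.empty_cache()
        return device
    except RuntimeError as exc:  # e.g. CUDA out of memory
        logger.error("CLIP migration to %s failed: %s", device, exc)
        model.to("cpu")
        torch.cuda.empty_cache()
        return "cpu"
```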
### VRAMMonitor
```python
class VRAMMonitor:
    """Monitors VRAM usage."""

    def __init__(self, poll_interval_ms: int = 1000): ...
    def get_vram_info(self) -> VRAMInfo: ...
    def get_available_vram_mb(self) -> int: ...
    def start_monitoring(self) -> None: ...
    def stop_monitoring(self) -> None: ...
    def on_vram_changed(self, callback: Callable, threshold_mb: int = 100) -> None: ...
```
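
A sketch of the pynvml query path, including the no-GPU fallback described under Error Handling; `VRAMInfo` is the dataclass defined in Data Models below:

```python
# VRAM query sketch; returns None when no NVIDIA GPU/driver is usable,
# which the caller treats as "force CPU-only mode".
from typing import Optional

import pynvml

def get_vram_info() -> Optional[VRAMInfo]:
    try:
        pynvml.nvmlInit()
    except pynvml.NVMLError:
        return None  # no GPU or driver: degraded, CPU-only operation
    try:
        handle = pynvml.nvmlDeviceGetHandleByIndex(0)
        mem = pynvml.nvmlDeviceGetMemoryInfo(handle)
        util = pynvml.nvmlDeviceGetUtilizationRates(handle)
        name = pynvml.nvmlDeviceGetName(handle)
        if isinstance(name, bytes):  # older pynvml versions return bytes
            name = name.decode()
        return VRAMInfo(
            total_mb=mem.total // (1024 * 1024),
            used_mb=mem.used // (1024 * 1024),
            free_mb=mem.free // (1024 * 1024),
            gpu_name=name,
            gpu_utilization_percent=util.gpu,
        )
    finally:
        pynvml.nvmlShutdown()
```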
## Data Models
```python
from enum import Enum
from dataclasses import dataclass
from typing import Optional, List
from datetime import datetime

class ExecutionMode(str, Enum):
    IDLE = "idle"
    RECORDING = "recording"
    AUTOPILOT = "autopilot"

class ModelState(str, Enum):
    UNLOADED = "unloaded"
    LOADING = "loading"
    LOADED = "loaded"
    UNLOADING = "unloading"
    ERROR = "error"

@dataclass
class VRAMInfo:
    total_mb: int
    used_mb: int
    free_mb: int
    gpu_name: str
    gpu_utilization_percent: int

@dataclass
class GPUResourceStatus:
    execution_mode: ExecutionMode
    vlm_state: ModelState
    vlm_model: str
    clip_device: str
    vram: VRAMInfo
    idle_timeout_seconds: int
    last_vlm_request: Optional[datetime]
    degraded_mode: bool
    degraded_reason: Optional[str]

@dataclass
class GPUResourceConfig:
    ollama_endpoint: str = "http://localhost:11434"
    vlm_model: str = "qwen3-vl:8b"
    clip_model: str = "ViT-B-32"
    idle_timeout_seconds: int = 300  # 5 minutes
    vram_threshold_for_clip_gpu_mb: int = 1024  # 1 GB
    max_load_retries: int = 3
    load_timeout_seconds: int = 30
    unload_timeout_seconds: int = 5

@dataclass
class ResourceChangedEvent:
    timestamp: datetime
    event_type: str  # "vram_changed", "model_loaded", "model_unloaded", "device_changed"
    details: dict
```
## Correctness Properties
*A property is a characteristic or behavior that should hold true across all valid executions of a system; essentially, a formal statement about what the system should do. Properties serve as the bridge between human-readable specifications and machine-verifiable correctness guarantees.*
### Property 1: Mode transition triggers VLM unload
*For any* GPU Resource Manager in RECORDING mode with VLM loaded, transitioning to AUTOPILOT mode should result in VLM being unloaded within 5 seconds.
**Validates: Requirements 1.1**
### Property 2: Mode transition triggers VLM load
*For any* GPU Resource Manager in AUTOPILOT mode with VLM unloaded, transitioning to RECORDING mode should result in VLM being loaded within 30 seconds.
**Validates: Requirements 1.2**
### Property 3: CLIP on GPU in AUTOPILOT
*For any* GPU Resource Manager in AUTOPILOT mode with available VRAM > 1GB, CLIP should be on GPU device.
**Validates: Requirements 1.3, 3.1**
### Property 4: VRAM decrease on VLM unload
*For any* VLM unload operation, the VRAM usage should decrease by at least 8 GB.
**Validates: Requirements 1.4**
### Property 5: Status query completeness
*For any* call to get_status(), the returned GPUResourceStatus should contain valid values for all fields including vram, vlm_state, clip_device, and execution_mode.
**Validates: Requirements 2.1**
### Property 6: CLIP migration ordering
*For any* VLM load request when CLIP is on GPU, CLIP should be migrated to CPU before VLM loading completes.
**Validates: Requirements 3.2**
### Property 7: Embedding pipeline consistency
*For any* CLIP device change, the embedding pipeline should produce valid embeddings after reinitialization.
**Validates: Requirements 3.3**
### Property 8: Idle timeout behavior
*For any* configured idle_timeout value, VLM should be unloaded after that duration of inactivity (not the default).
**Validates: Requirements 4.1, 4.3**
### Property 9: On-demand VLM loading
*For any* VLM request when VLM is unloaded, the request should complete successfully after VLM is loaded.
**Validates: Requirements 4.2**
### Property 10: ensure_vlm_loaded blocking
*For any* call to ensure_vlm_loaded(), the function should only return when is_vlm_loaded() returns True.
**Validates: Requirements 5.1**
### Property 11: ensure_vlm_unloaded blocking
*For any* call to ensure_vlm_unloaded(), the function should only return when is_vlm_loaded() returns False.
**Validates: Requirements 5.2**
### Property 12: get_clip_device validity
*For any* call to get_clip_device(), the return value should be either "cpu" or "cuda".
**Validates: Requirements 5.3**
### Property 13: Sequential operation processing
*For any* concurrent model operations, they should be processed sequentially without race conditions.
**Validates: Requirements 5.4**
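
Property 13 hinges on a single serialization point. A minimal sketch of that discipline, funneling every model operation through one `asyncio.Lock` (the class and method names are illustrative):

```python
# Serialization sketch for Property 13: concurrent callers queue on the
# lock and run strictly one after another.
import asyncio
from typing import Awaitable, Callable

class OperationQueue:
    def __init__(self) -> None:
        self._lock = asyncio.Lock()

    async def run(self, op: Callable[[], Awaitable[None]]) -> None:
        """Run a model operation; concurrent calls wait their turn."""
        async with self._lock:
            await op()

async def demo() -> None:
    queue = OperationQueue()
    # Two operations submitted concurrently still execute sequentially.
    await asyncio.gather(
        queue.run(lambda: asyncio.sleep(0.1)),  # stand-in for a VLM load
        queue.run(lambda: asyncio.sleep(0.1)),  # stand-in for a VLM unload
    )

asyncio.run(demo())
```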
## Error Handling
### Ollama Unavailable
- Detect via connection timeout or HTTP error
- Set `degraded_mode = True` with reason
- CLIP continues on CPU
- VLM operations return False with logged warning
- Periodic retry every 30 seconds
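
A sketch of that periodic retry, probing Ollama's `/api/tags` endpoint every 30 seconds; the flag attributes mirror `GPUResourceStatus`, and httpx is an assumed dependency:

```python
# Degraded-mode recovery sketch: toggle the flag based on a health probe.
import asyncio

import httpx

async def ollama_health_loop(manager, endpoint: str, interval: float = 30.0):
    """Probe Ollama periodically and update the degraded flag."""
    while True:
        try:
            async with httpx.AsyncClient(timeout=2.0) as client:
                resp = await client.get(f"{endpoint}/api/tags")
            resp.raise_for_status()
            manager.degraded_mode = False
            manager.degraded_reason = None
        except httpx.HTTPError:
            manager.degraded_mode = True
            manager.degraded_reason = "Ollama unreachable"
        await asyncio.sleep(interval)
```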
### GPU Not Available
- Detect via pynvml initialization failure
- Force CPU-only mode for all models
- Log warning at startup
- All GPU migration requests return False gracefully
### VRAM Insufficient
- Check available VRAM before operations
- Return error with current VRAM info
- Suggest unloading other models
### Load/Unload Timeout
- Implement timeout with cancellation
- Retry up to max_load_retries
- Mark model as ERROR state after failures
- Emit error event for monitoring
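
A sketch of the timeout-and-retry shape, built on `asyncio.wait_for`, which cancels a stuck attempt on timeout; the function name is illustrative:

```python
# Timeout/retry sketch: repeated failures end in the ERROR state.
import asyncio
from typing import Awaitable, Callable

from core.gpu.gpu_resource_manager import ModelState  # per task 1.1

async def load_with_retries(load_op: Callable[[], Awaitable[None]],
                            timeout: float = 30.0,
                            retries: int = 3) -> ModelState:
    for _ in range(retries):
        try:
            await asyncio.wait_for(load_op(), timeout=timeout)
            return ModelState.LOADED
        except asyncio.TimeoutError:
            # wait_for cancelled this attempt; retry if budget remains.
            continue
    return ModelState.ERROR
```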
## Testing Strategy
### Unit Tests
- Test each manager component in isolation
- Mock Ollama API responses
- Mock nvidia-smi/pynvml responses
- Test state machine transitions
- Test event emission
### Property-Based Tests (using Hypothesis)
- Generate random sequences of mode transitions
- Verify invariants hold after each transition
- Test concurrent operation handling
- Test timeout behavior with various configurations
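
A minimal sketch of such a test: Hypothesis draws random mode sequences, and a hermetic in-memory stand-in (`FakeManager`, an assumption here) replaces the real singleton so no GPU is needed:

```python
# Property-test sketch for Properties 1 and 2: the VLM load state must
# track the execution mode after any sequence of transitions.
from hypothesis import given, strategies as st

from core.gpu.gpu_resource_manager import ExecutionMode  # per task 1.1

class FakeManager:
    def __init__(self) -> None:
        self.vlm_loaded = False

    def set_execution_mode(self, mode: ExecutionMode) -> None:
        if mode == ExecutionMode.RECORDING:
            self.vlm_loaded = True
        elif mode == ExecutionMode.AUTOPILOT:
            self.vlm_loaded = False
        # IDLE: no automatic changes, per the design

@given(st.lists(st.sampled_from(ExecutionMode), min_size=1, max_size=20))
def test_vlm_follows_mode(modes):
    manager = FakeManager()
    for mode in modes:
        manager.set_execution_mode(mode)
        if mode == ExecutionMode.RECORDING:
            assert manager.vlm_loaded
        elif mode == ExecutionMode.AUTOPILOT:
            assert not manager.vlm_loaded
```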
### Integration Tests
- Test with real Ollama instance
- Test with real GPU (if available)
- Test full workflow: RECORDING → AUTOPILOT → RECORDING
- Measure actual VRAM changes
### Test Configuration
```python
# pytest configuration for property tests
HYPOTHESIS_SETTINGS = {
    "max_examples": 100,
    "deadline": 30000,  # 30 seconds for GPU operations
}
```
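
These values could be wired in through a Hypothesis settings profile, e.g. in `conftest.py` (the profile name "gpu" is an assumption):

```python
# Register and activate a profile carrying the settings above.
from hypothesis import settings

settings.register_profile("gpu", max_examples=100, deadline=30000)
settings.load_profile("gpu")
```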

# Requirements Document
## Introduction
The GPU Resource Manager is an optimization component for RPA Vision V3 that dynamically manages the allocation of GPU resources across the ML models (Ollama VLM, CLIP, OWL-ViT) according to the system's execution mode. The main goal is to maximize use of the available VRAM by unloading models that are not needed and moving lightweight models onto the GPU once VRAM is freed.
## Glossary
- **GPU_Resource_Manager**: Central component that orchestrates GPU resource allocation across the ML models
- **Ollama**: External service that hosts the VLM (Vision-Language) models and handles loading/unloading them in VRAM
- **VLM**: Vision-Language Model, a multimodal model used to classify UI elements (qwen3-vl:8b, ~10.5 GB VRAM)
- **CLIP**: Visual embedding model used for screenshot matching (~500 MB VRAM)
- **VRAM**: Video RAM, the GPU's dedicated memory
- **Execution_Mode**: Execution mode of the RPA (RECORDING, AUTOPILOT, IDLE)
- **Model_State**: State of a model (LOADED, UNLOADED, LOADING, UNLOADING)
## Requirements
### Requirement 1
**User Story:** As a system operator, I want the GPU resources to be automatically optimized based on the current execution mode, so that VRAM is used efficiently without manual intervention.
#### Acceptance Criteria
1. WHEN the system transitions to AUTOPILOT mode THEN the GPU_Resource_Manager SHALL unload the VLM model from Ollama within 5 seconds
2. WHEN the system transitions to RECORDING mode THEN the GPU_Resource_Manager SHALL load the VLM model into Ollama within 30 seconds
3. WHILE in AUTOPILOT mode THE GPU_Resource_Manager SHALL maintain CLIP model on GPU for accelerated matching
4. WHEN the VLM model is unloaded THEN the GPU_Resource_Manager SHALL verify VRAM usage decreased by at least 8 GB
5. IF the Ollama service is unavailable THEN the GPU_Resource_Manager SHALL log the error and continue operation in degraded mode
### Requirement 2
**User Story:** As a developer, I want to query the current GPU resource state, so that I can monitor and debug resource allocation issues.
#### Acceptance Criteria
1. WHEN a status query is requested THEN the GPU_Resource_Manager SHALL return current VRAM usage, loaded models, and execution mode
2. WHEN VRAM usage changes by more than 100 MB THEN the GPU_Resource_Manager SHALL emit a resource_changed event
3. WHEN a model state changes THEN the GPU_Resource_Manager SHALL log the transition with timestamp and duration
### Requirement 3
**User Story:** As a system operator, I want CLIP to automatically switch between CPU and GPU based on available VRAM, so that matching performance is optimized when resources allow.
#### Acceptance Criteria
1. WHEN VLM is unloaded AND VRAM available exceeds 1 GB THEN the GPU_Resource_Manager SHALL migrate CLIP model to GPU
2. WHEN VLM loading is requested AND CLIP is on GPU THEN the GPU_Resource_Manager SHALL migrate CLIP back to CPU before loading VLM
3. WHEN CLIP device changes THEN the GPU_Resource_Manager SHALL reinitialize the embedding pipeline with the new device
4. IF CLIP GPU migration fails THEN the GPU_Resource_Manager SHALL fallback to CPU mode and log the error
### Requirement 4
**User Story:** As a system operator, I want idle timeout management for VLM, so that VRAM is automatically freed when the model is not used.
#### Acceptance Criteria
1. WHILE VLM is loaded AND no VLM requests occur for 5 minutes THEN the GPU_Resource_Manager SHALL unload the VLM model
2. WHEN a VLM request arrives AND VLM is unloaded THEN the GPU_Resource_Manager SHALL load VLM on-demand before processing
3. WHERE idle_timeout is configured THEN the GPU_Resource_Manager SHALL use the configured timeout value instead of default
4. WHEN idle timeout triggers unload THEN the GPU_Resource_Manager SHALL emit an idle_unload event
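
A sketch of the idle watchdog behind this requirement; the `emit()` helper and the attribute names are assumptions standing in for the design's event system:

```python
# Idle-timeout sketch: a background task compares the timestamp of the
# last VLM request against the configured timeout.
import asyncio
from datetime import datetime, timedelta

async def idle_watchdog(manager, timeout_s: int, poll_s: float = 5.0):
    """Unload the VLM after timeout_s seconds without requests."""
    while True:
        await asyncio.sleep(poll_s)
        last = manager.last_vlm_request
        if (manager.is_vlm_loaded() and last is not None
                and datetime.now() - last > timedelta(seconds=timeout_s)):
            await manager.ensure_vlm_unloaded()
            manager.emit("idle_unload")  # event required by criterion 4
```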
### Requirement 5
**User Story:** As a developer, I want the GPU Resource Manager to provide a clean API for model lifecycle management, so that other components can request resources predictably.
#### Acceptance Criteria
1. WHEN ensure_vlm_loaded() is called THEN the GPU_Resource_Manager SHALL return only after VLM is fully loaded and ready
2. WHEN ensure_vlm_unloaded() is called THEN the GPU_Resource_Manager SHALL return only after VLM is fully unloaded
3. WHEN get_clip_device() is called THEN the GPU_Resource_Manager SHALL return the current device string ("cpu" or "cuda")
4. IF a model operation is already in progress THEN the GPU_Resource_Manager SHALL queue the request and process sequentially
### Requirement 6
**User Story:** As a system operator, I want graceful fallback when GPU operations fail, so that the system remains functional even with degraded performance.
#### Acceptance Criteria
1. IF GPU is not available THEN the GPU_Resource_Manager SHALL operate in CPU-only mode without errors
2. IF Ollama model loading fails after 3 retries THEN the GPU_Resource_Manager SHALL mark VLM as unavailable and notify listeners
3. WHEN operating in degraded mode THEN the GPU_Resource_Manager SHALL log periodic warnings about reduced functionality
4. IF VRAM is insufficient for requested operation THEN the GPU_Resource_Manager SHALL return an error with available VRAM information

# Implementation Plan
- [x] 1. Set up project structure and core interfaces
- [x] 1.1 Create core/gpu/gpu_resource_manager.py with GPUResourceManager class skeleton
- Define ExecutionMode, ModelState enums
- Define GPUResourceConfig, GPUResourceStatus, VRAMInfo dataclasses
- Implement singleton pattern
- _Requirements: 2.1, 5.1, 5.2, 5.3_
- [x] 1.2 Create core/gpu/ollama_manager.py with OllamaManager class
- Implement Ollama API client (load_model, unload_model, is_model_loaded)
- Add connection health check
- _Requirements: 1.1, 1.2, 1.5_
- [x] 1.3 Create core/gpu/vram_monitor.py with VRAMMonitor class
- Implement pynvml wrapper for VRAM queries
- Add fallback for systems without GPU
- _Requirements: 1.4, 2.1, 6.1_
- [x] 1.4 Write property test for OllamaManager
- **Property 10: ensure_vlm_loaded blocking**
- **Property 11: ensure_vlm_unloaded blocking**
- **Validates: Requirements 5.1, 5.2**
- [x] 2. Implement VLM lifecycle management
- [x] 2.1 Implement ensure_vlm_loaded() in GPUResourceManager
- Add async loading with timeout
- Implement retry logic (max 3 retries)
- Queue concurrent requests
- _Requirements: 5.1, 5.4, 6.2_
- [x] 2.2 Implement ensure_vlm_unloaded() in GPUResourceManager
- Add async unloading with timeout
- Verify VRAM decrease
- _Requirements: 5.2, 1.4_
- [x] 2.3 Write property test for VLM lifecycle
- **Property 4: VRAM decrease on VLM unload**
- **Validates: Requirements 1.4**
- [x] 2.4 Write property test for blocking behavior
- **Property 10: ensure_vlm_loaded blocking**
- **Property 11: ensure_vlm_unloaded blocking**
- **Validates: Requirements 5.1, 5.2**
- [x] 3. Implement CLIP device management
- [x] 3.1 Create core/gpu/clip_manager.py with CLIPManager class
- Implement device detection and migration
- Add pipeline reinitialization
- _Requirements: 3.1, 3.3, 3.4_
- [x] 3.2 Implement migrate_clip_to_gpu() and migrate_clip_to_cpu()
- Check VRAM availability before GPU migration
- Handle migration failures gracefully
- _Requirements: 3.1, 3.2, 3.4_
- [x] 3.3 Write property test for CLIP device
- **Property 12: get_clip_device validity**
- **Validates: Requirements 5.3**
- [x] 3.4 Write property test for embedding consistency
- **Property 7: Embedding pipeline consistency**
- **Validates: Requirements 3.3**
- [x] 4. Checkpoint - Ensure all tests pass
- Ensure all tests pass; ask the user if questions arise.
- [x] 5. Implement execution mode management
- [x] 5.1 Implement set_execution_mode() with automatic resource management
- AUTOPILOT: unload VLM, migrate CLIP to GPU
- RECORDING: load VLM, migrate CLIP to CPU
- IDLE: no automatic changes
- _Requirements: 1.1, 1.2, 1.3, 3.1, 3.2_
- [x] 5.2 Implement mode transition coordination
- Ensure CLIP migrates before VLM loads
- Handle concurrent mode changes
- _Requirements: 3.2, 5.4_
- [x] 5.3 Write property test for mode transitions
- **Property 1: Mode transition triggers VLM unload**
- **Property 2: Mode transition triggers VLM load**
- **Validates: Requirements 1.1, 1.2**
- [x] 5.4 Write property test for CLIP in AUTOPILOT
- **Property 3: CLIP on GPU in AUTOPILOT**
- **Validates: Requirements 1.3, 3.1**
- [x] 5.5 Write property test for migration ordering
- **Property 6: CLIP migration ordering**
- **Validates: Requirements 3.2**
- [x] 6. Implement idle timeout management
- [x] 6.1 Add idle timeout tracking in GPUResourceManager
- Track last VLM request timestamp
- Implement background timer for timeout check
- _Requirements: 4.1, 4.3_
- [x] 6.2 Implement on-demand VLM loading
- Intercept VLM requests when unloaded
- Load VLM before processing request
- _Requirements: 4.2_
- [x] 6.3 Write property test for idle timeout
- **Property 8: Idle timeout behavior**
- **Validates: Requirements 4.1, 4.3**
- [x] 6.4 Write property test for on-demand loading
- **Property 9: On-demand VLM loading**
- **Validates: Requirements 4.2**
- [x] 7. Implement monitoring and events
- [x] 7.1 Implement get_status() returning complete GPUResourceStatus
- Include all fields: vram, vlm_state, clip_device, execution_mode
- _Requirements: 2.1_
- [x] 7.2 Implement event emission system
- resource_changed, mode_changed, idle_unload events
- VRAM change threshold detection (100 MB)
- _Requirements: 2.2, 2.3, 4.4_
- [x] 7.3 Write property test for status completeness
- **Property 5: Status query completeness**
- **Validates: Requirements 2.1**
- [x] 8. Implement error handling and degraded mode
- [x] 8.1 Implement graceful degradation for missing GPU
- Detect GPU availability at startup
- Force CPU-only mode if no GPU
- _Requirements: 6.1_
- [x] 8.2 Implement Ollama unavailable handling
- Connection retry logic
- Degraded mode flag and reason
- _Requirements: 1.5, 6.2, 6.3_
- [x] 8.3 Implement VRAM insufficient error handling
- Check VRAM before operations
- Return informative errors
- _Requirements: 6.4_
- [x] 8.4 Write property test for sequential processing
- **Property 13: Sequential operation processing**
- **Validates: Requirements 5.4**
- [x] 9. Checkpoint - Ensure all tests pass
- Ensure all tests pass; ask the user if questions arise.
- [x] 10. Integration with existing components
- [x] 10.1 Integrate GPUResourceManager with ExecutionLoop
- Call set_execution_mode() on mode changes
- Use ensure_vlm_loaded() before VLM operations
- _Requirements: 1.1, 1.2, 4.2_
- [x] 10.2 Integrate with UIDetector
- Check VLM availability before classification
- Handle degraded mode gracefully
- _Requirements: 1.5, 6.2_
- [x] 10.3 Integrate with FusionEngine/CLIP embedding
- Use CLIPManager for device-aware embeddings
- Reinitialize on device change
- _Requirements: 3.3_
- [x] 10.4 Update core/config.py with GPU resource configuration
- Add GPUResourceConfig to AppConfig
- Support environment variables
- _Requirements: 4.3_
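
As a sketch of the environment-variable support in task 10.4 (the variable names are assumptions, not a final contract):

```python
# Env-override sketch for GPUResourceConfig; unset variables fall back
# to the dataclass defaults from the design document.
import os

from core.gpu.gpu_resource_manager import GPUResourceConfig  # per task 1.1

def gpu_config_from_env() -> GPUResourceConfig:
    return GPUResourceConfig(
        ollama_endpoint=os.getenv("OLLAMA_ENDPOINT", "http://localhost:11434"),
        vlm_model=os.getenv("GPU_VLM_MODEL", "qwen3-vl:8b"),
        idle_timeout_seconds=int(os.getenv("GPU_IDLE_TIMEOUT_S", "300")),
    )
```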
- [x] 11. Final Checkpoint - Ensure all tests pass
- Ensure all tests pass; ask the user if questions arise.