feat: functional E2E replay - 25/25 actions, 0 retries, SomEngine via server

Validated on a Windows PC (DESKTOP-58D5CAC, 2560x1600):
- 8 clicks resolved visually (1 anchor_template, 1 som_text_match, 6 som_vlm)
- Average score 0.75, average time 1.6 s
- Text typed correctly (bonjour, test word, date, email)
- 0 retries, 2 unverified actions (OK)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

# Design Document - GPU Resource Manager
## Overview
The GPU Resource Manager is a singleton component that orchestrates dynamic allocation of GPU resources among the ML models of the RPA Vision V3 system. It manages two main resources:
1. **Ollama VLM (qwen3-vl:8b)** - ~10.5 GB VRAM, used for UI classification during recording
2. **CLIP (ViT-B-32)** - ~500 MB VRAM, used for embedding matching
The manager optimizes VRAM usage by:
- Unloading the VLM when it is not needed (autopilot mode)
- Migrating CLIP to the GPU when VRAM becomes available
- Enforcing an idle timeout to release resources automatically
## Architecture
```mermaid
graph TB
    subgraph "GPU Resource Manager"
        GRM[GPUResourceManager]
        OM[OllamaManager]
        CM[CLIPManager]
        VM[VRAMMonitor]
        EE[EventEmitter]
    end
    subgraph "External Services"
        OL[Ollama API :11434]
        NV[nvidia-smi / pynvml]
    end
    subgraph "Consumers"
        EL[ExecutionLoop]
        UD[UIDetector]
        FE[FusionEngine]
    end
    GRM --> OM
    GRM --> CM
    GRM --> VM
    GRM --> EE
    OM --> OL
    VM --> NV
    EL --> GRM
    UD --> GRM
    FE --> GRM
```
## Components and Interfaces
### GPUResourceManager (Singleton)
```python
class GPUResourceManager:
    """Central manager for GPU resources."""

    # Lifecycle
    def __init__(self, config: GPUResourceConfig): ...
    def shutdown(self) -> None: ...

    # Mode Management
    def set_execution_mode(self, mode: ExecutionMode) -> None: ...
    def get_execution_mode(self) -> ExecutionMode: ...

    # VLM Management
    async def ensure_vlm_loaded(self) -> bool: ...
    async def ensure_vlm_unloaded(self) -> bool: ...
    def is_vlm_loaded(self) -> bool: ...
    def get_vlm_state(self) -> ModelState: ...

    # CLIP Management
    def get_clip_device(self) -> str: ...  # "cpu" or "cuda"
    async def migrate_clip_to_gpu(self) -> bool: ...
    async def migrate_clip_to_cpu(self) -> bool: ...

    # Monitoring
    def get_status(self) -> GPUResourceStatus: ...
    def get_vram_usage(self) -> VRAMInfo: ...

    # Events
    def on_resource_changed(self, callback: Callable) -> None: ...
    def on_mode_changed(self, callback: Callable) -> None: ...
    def on_idle_unload(self, callback: Callable) -> None: ...
```
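
A minimal consumer-side sketch of this interface. How the singleton instance is obtained is an assumption here (the design only mandates the singleton pattern), as is the import path, taken from task 1.1 of the implementation plan:

```python
# Usage sketch; direct construction stands in for whatever singleton
# accessor the implementation exposes.
import asyncio

from core.gpu.gpu_resource_manager import (
    ExecutionMode, GPUResourceConfig, GPUResourceManager,
)

async def main() -> None:
    manager = GPUResourceManager(GPUResourceConfig())
    # Entering AUTOPILOT frees the VLM and moves CLIP to GPU if VRAM allows.
    manager.set_execution_mode(ExecutionMode.AUTOPILOT)
    await manager.ensure_vlm_unloaded()  # blocks until the VRAM is freed

    status = manager.get_status()
    print(f"mode={status.execution_mode.value}, "
          f"clip on {status.clip_device}, free VRAM {status.vram.free_mb} MB")

    manager.shutdown()

if __name__ == "__main__":
    asyncio.run(main())
```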
### OllamaManager
```python
class OllamaManager:
    """Manages the lifecycle of Ollama models."""

    def __init__(self, endpoint: str = "http://localhost:11434"): ...
    async def load_model(self, model: str, keep_alive: str = "5m") -> bool: ...
    async def unload_model(self, model: str) -> bool: ...
    async def is_model_loaded(self, model: str) -> bool: ...
    async def list_loaded_models(self) -> List[str]: ...
    def is_available(self) -> bool: ...
```
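
Behind these methods, Ollama exposes load/unload through its HTTP API: a `/api/generate` request with no prompt loads a model and honours `keep_alive`, `keep_alive=0` requests an immediate eviction, and `/api/ps` lists resident models. A sketch using httpx (an assumed dependency):

```python
# Sketch of the Ollama calls behind load/unload/list.
import httpx

async def load_model(endpoint: str, model: str, keep_alive: str = "5m") -> bool:
    async with httpx.AsyncClient(timeout=30.0) as client:
        # An empty generate request loads the model and sets keep_alive.
        resp = await client.post(
            f"{endpoint}/api/generate",
            json={"model": model, "keep_alive": keep_alive},
        )
        return resp.status_code == 200

async def unload_model(endpoint: str, model: str) -> bool:
    async with httpx.AsyncClient(timeout=5.0) as client:
        # keep_alive=0 asks Ollama to evict the model immediately.
        resp = await client.post(
            f"{endpoint}/api/generate",
            json={"model": model, "keep_alive": 0},
        )
        return resp.status_code == 200

async def list_loaded_models(endpoint: str) -> list[str]:
    async with httpx.AsyncClient(timeout=5.0) as client:
        resp = await client.get(f"{endpoint}/api/ps")
        resp.raise_for_status()
        return [m["name"] for m in resp.json().get("models", [])]
```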
### CLIPManager
```python
class CLIPManager:
    """Manages CPU/GPU migration of the CLIP model."""

    def __init__(self, model_name: str = "ViT-B-32"): ...
    def get_current_device(self) -> str: ...
    async def migrate_to_device(self, device: str) -> bool: ...
    def get_model(self) -> Any: ...  # Returns the CLIP model
    def reinitialize_pipeline(self) -> None: ...
```
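
A sketch of the migration logic, assuming the CLIP weights live in a `torch.nn.Module`; the fallback branch implements Requirement 3.4, and the standalone function is illustrative rather than the final method:

```python
# CPU/GPU migration sketch with graceful fallback to CPU.
import logging

import torch

logger = logging.getLogger(__name__)

def migrate_to_device(model: torch.nn.Module, device: str) -> str:
    """Move the model, falling back to CPU on failure.

    Returns the device actually in use afterwards.
    """
    try:
        model.to(device)
        if device == "cpu":
            # Release cached blocks so the VLM can claim the VRAM.
            torch.cuda.empty_cache()
        return device
    except RuntimeError as exc:  # e.g. CUDA out of memory
        logger.error("CLIP migration to %s failed: %s", device, exc)
        model.to("cpu")
        torch.cuda.empty_cache()
        return "cpu"
```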
### VRAMMonitor
```python
class VRAMMonitor:
    """Monitors VRAM usage."""

    def __init__(self, poll_interval_ms: int = 1000): ...
    def get_vram_info(self) -> VRAMInfo: ...
    def get_available_vram_mb(self) -> int: ...
    def start_monitoring(self) -> None: ...
    def stop_monitoring(self) -> None: ...
    def on_vram_changed(self, callback: Callable, threshold_mb: int = 100) -> None: ...
```
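
A sketch of the pynvml query path, including the no-GPU fallback described under Error Handling; `VRAMInfo` is the dataclass defined in Data Models below:

```python
# VRAM query sketch; returns None when no NVIDIA GPU/driver is usable,
# which the caller treats as "force CPU-only mode".
from typing import Optional

import pynvml

def get_vram_info() -> Optional[VRAMInfo]:
    try:
        pynvml.nvmlInit()
    except pynvml.NVMLError:
        return None  # no GPU or driver: degraded, CPU-only operation
    try:
        handle = pynvml.nvmlDeviceGetHandleByIndex(0)
        mem = pynvml.nvmlDeviceGetMemoryInfo(handle)
        util = pynvml.nvmlDeviceGetUtilizationRates(handle)
        name = pynvml.nvmlDeviceGetName(handle)
        if isinstance(name, bytes):  # older pynvml versions return bytes
            name = name.decode()
        return VRAMInfo(
            total_mb=mem.total // (1024 * 1024),
            used_mb=mem.used // (1024 * 1024),
            free_mb=mem.free // (1024 * 1024),
            gpu_name=name,
            gpu_utilization_percent=util.gpu,
        )
    finally:
        pynvml.nvmlShutdown()
```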
## Data Models
```python
from enum import Enum
from dataclasses import dataclass
from typing import Optional, List
from datetime import datetime

class ExecutionMode(str, Enum):
    IDLE = "idle"
    RECORDING = "recording"
    AUTOPILOT = "autopilot"

class ModelState(str, Enum):
    UNLOADED = "unloaded"
    LOADING = "loading"
    LOADED = "loaded"
    UNLOADING = "unloading"
    ERROR = "error"

@dataclass
class VRAMInfo:
    total_mb: int
    used_mb: int
    free_mb: int
    gpu_name: str
    gpu_utilization_percent: int

@dataclass
class GPUResourceStatus:
    execution_mode: ExecutionMode
    vlm_state: ModelState
    vlm_model: str
    clip_device: str
    vram: VRAMInfo
    idle_timeout_seconds: int
    last_vlm_request: Optional[datetime]
    degraded_mode: bool
    degraded_reason: Optional[str]

@dataclass
class GPUResourceConfig:
    ollama_endpoint: str = "http://localhost:11434"
    vlm_model: str = "qwen3-vl:8b"
    clip_model: str = "ViT-B-32"
    idle_timeout_seconds: int = 300  # 5 minutes
    vram_threshold_for_clip_gpu_mb: int = 1024  # 1 GB
    max_load_retries: int = 3
    load_timeout_seconds: int = 30
    unload_timeout_seconds: int = 5

@dataclass
class ResourceChangedEvent:
    timestamp: datetime
    event_type: str  # "vram_changed", "model_loaded", "model_unloaded", "device_changed"
    details: dict
```
## Correctness Properties
*A property is a characteristic or behavior that should hold true across all valid executions of a system; essentially, a formal statement about what the system should do. Properties serve as the bridge between human-readable specifications and machine-verifiable correctness guarantees.*
### Property 1: Mode transition triggers VLM unload
*For any* GPU Resource Manager in RECORDING mode with VLM loaded, transitioning to AUTOPILOT mode should result in VLM being unloaded within 5 seconds.
**Validates: Requirements 1.1**
### Property 2: Mode transition triggers VLM load
*For any* GPU Resource Manager in AUTOPILOT mode with VLM unloaded, transitioning to RECORDING mode should result in VLM being loaded within 30 seconds.
**Validates: Requirements 1.2**
### Property 3: CLIP on GPU in AUTOPILOT
*For any* GPU Resource Manager in AUTOPILOT mode with available VRAM > 1GB, CLIP should be on GPU device.
**Validates: Requirements 1.3, 3.1**
### Property 4: VRAM decrease on VLM unload
*For any* VLM unload operation, the VRAM usage should decrease by at least 8 GB.
**Validates: Requirements 1.4**
### Property 5: Status query completeness
*For any* call to get_status(), the returned GPUResourceStatus should contain valid values for all fields including vram, vlm_state, clip_device, and execution_mode.
**Validates: Requirements 2.1**
### Property 6: CLIP migration ordering
*For any* VLM load request when CLIP is on GPU, CLIP should be migrated to CPU before VLM loading completes.
**Validates: Requirements 3.2**
### Property 7: Embedding pipeline consistency
*For any* CLIP device change, the embedding pipeline should produce valid embeddings after reinitialization.
**Validates: Requirements 3.3**
### Property 8: Idle timeout behavior
*For any* configured idle_timeout value, VLM should be unloaded after that duration of inactivity (not the default).
**Validates: Requirements 4.1, 4.3**
### Property 9: On-demand VLM loading
*For any* VLM request when VLM is unloaded, the request should complete successfully after VLM is loaded.
**Validates: Requirements 4.2**
### Property 10: ensure_vlm_loaded blocking
*For any* call to ensure_vlm_loaded(), the function should only return when is_vlm_loaded() returns True.
**Validates: Requirements 5.1**
### Property 11: ensure_vlm_unloaded blocking
*For any* call to ensure_vlm_unloaded(), the function should only return when is_vlm_loaded() returns False.
**Validates: Requirements 5.2**
### Property 12: get_clip_device validity
*For any* call to get_clip_device(), the return value should be either "cpu" or "cuda".
**Validates: Requirements 5.3**
### Property 13: Sequential operation processing
*For any* concurrent model operations, they should be processed sequentially without race conditions.
**Validates: Requirements 5.4**
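
Property 13 hinges on a single serialization point. A minimal sketch of that discipline, funneling every model operation through one `asyncio.Lock` (the class and method names are illustrative):

```python
# Serialization sketch for Property 13: concurrent callers queue on the
# lock and run strictly one after another.
import asyncio
from typing import Awaitable, Callable

class OperationQueue:
    def __init__(self) -> None:
        self._lock = asyncio.Lock()

    async def run(self, op: Callable[[], Awaitable[None]]) -> None:
        """Run a model operation; concurrent calls wait their turn."""
        async with self._lock:
            await op()

async def demo() -> None:
    queue = OperationQueue()
    # Two operations submitted concurrently still execute sequentially.
    await asyncio.gather(
        queue.run(lambda: asyncio.sleep(0.1)),  # stand-in for a VLM load
        queue.run(lambda: asyncio.sleep(0.1)),  # stand-in for a VLM unload
    )

asyncio.run(demo())
```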
## Error Handling
### Ollama Unavailable
- Detect via connection timeout or HTTP error
- Set `degraded_mode = True` with reason
- CLIP continues on CPU
- VLM operations return False with logged warning
- Periodic retry every 30 seconds
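
A sketch of that periodic retry, probing Ollama's `/api/tags` endpoint every 30 seconds; the flag attributes mirror `GPUResourceStatus`, and httpx is an assumed dependency:

```python
# Degraded-mode recovery sketch: toggle the flag based on a health probe.
import asyncio

import httpx

async def ollama_health_loop(manager, endpoint: str, interval: float = 30.0):
    """Probe Ollama periodically and update the degraded flag."""
    while True:
        try:
            async with httpx.AsyncClient(timeout=2.0) as client:
                resp = await client.get(f"{endpoint}/api/tags")
            resp.raise_for_status()
            manager.degraded_mode = False
            manager.degraded_reason = None
        except httpx.HTTPError:
            manager.degraded_mode = True
            manager.degraded_reason = "Ollama unreachable"
        await asyncio.sleep(interval)
```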
### GPU Not Available
- Detect via pynvml initialization failure
- Force CPU-only mode for all models
- Log warning at startup
- All GPU migration requests return False gracefully
### VRAM Insufficient
- Check available VRAM before operations
- Return error with current VRAM info
- Suggest unloading other models
### Load/Unload Timeout
- Implement timeout with cancellation
- Retry up to max_load_retries
- Mark model as ERROR state after failures
- Emit error event for monitoring
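
A sketch of the timeout-and-retry shape, built on `asyncio.wait_for`, which cancels a stuck attempt on timeout; the function name is illustrative:

```python
# Timeout/retry sketch: repeated failures end in the ERROR state.
import asyncio
from typing import Awaitable, Callable

from core.gpu.gpu_resource_manager import ModelState  # per task 1.1

async def load_with_retries(load_op: Callable[[], Awaitable[None]],
                            timeout: float = 30.0,
                            retries: int = 3) -> ModelState:
    for _ in range(retries):
        try:
            await asyncio.wait_for(load_op(), timeout=timeout)
            return ModelState.LOADED
        except asyncio.TimeoutError:
            # wait_for cancelled this attempt; retry if budget remains.
            continue
    return ModelState.ERROR
```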
## Testing Strategy
### Unit Tests
- Test each manager component in isolation
- Mock Ollama API responses
- Mock nvidia-smi/pynvml responses
- Test state machine transitions
- Test event emission
### Property-Based Tests (using Hypothesis)
- Generate random sequences of mode transitions
- Verify invariants hold after each transition
- Test concurrent operation handling
- Test timeout behavior with various configurations
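
A minimal sketch of such a test: Hypothesis draws random mode sequences, and a hermetic in-memory stand-in (`FakeManager`, an assumption here) replaces the real singleton so no GPU is needed:

```python
# Property-test sketch for Properties 1 and 2: the VLM load state must
# track the execution mode after any sequence of transitions.
from hypothesis import given, strategies as st

from core.gpu.gpu_resource_manager import ExecutionMode  # per task 1.1

class FakeManager:
    def __init__(self) -> None:
        self.vlm_loaded = False

    def set_execution_mode(self, mode: ExecutionMode) -> None:
        if mode == ExecutionMode.RECORDING:
            self.vlm_loaded = True
        elif mode == ExecutionMode.AUTOPILOT:
            self.vlm_loaded = False
        # IDLE: no automatic changes, per the design

@given(st.lists(st.sampled_from(ExecutionMode), min_size=1, max_size=20))
def test_vlm_follows_mode(modes):
    manager = FakeManager()
    for mode in modes:
        manager.set_execution_mode(mode)
        if mode == ExecutionMode.RECORDING:
            assert manager.vlm_loaded
        elif mode == ExecutionMode.AUTOPILOT:
            assert not manager.vlm_loaded
```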
### Integration Tests
- Test with real Ollama instance
- Test with real GPU (if available)
- Test full workflow: RECORDING → AUTOPILOT → RECORDING
- Measure actual VRAM changes
### Test Configuration
```python
# pytest configuration for property tests
HYPOTHESIS_SETTINGS = {
    "max_examples": 100,
    "deadline": 30000,  # 30 seconds for GPU operations
}
```
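
These values could be wired in through a Hypothesis settings profile, e.g. in `conftest.py` (the profile name "gpu" is an assumption):

```python
# Register and activate a profile carrying the settings above.
from hypothesis import settings

settings.register_profile("gpu", max_examples=100, deadline=30000)
settings.load_profile("gpu")
```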

# Requirements Document
## Introduction
The GPU Resource Manager is an optimization component for RPA Vision V3 that dynamically manages the allocation of GPU resources across the ML models (Ollama VLM, CLIP, OWL-ViT) according to the system's execution mode. The main goal is to maximize use of the available VRAM by unloading models that are not needed and moving lightweight models onto the GPU once VRAM is freed.
## Glossary
- **GPU_Resource_Manager**: Central component that orchestrates GPU resource allocation across the ML models
- **Ollama**: External service that hosts the VLM (Vision-Language) models and handles loading/unloading them in VRAM
- **VLM**: Vision-Language Model, a multimodal model used to classify UI elements (qwen3-vl:8b, ~10.5 GB VRAM)
- **CLIP**: Visual embedding model used for screenshot matching (~500 MB VRAM)
- **VRAM**: Video RAM, the GPU's dedicated memory
- **Execution_Mode**: Execution mode of the RPA (RECORDING, AUTOPILOT, IDLE)
- **Model_State**: State of a model (LOADED, UNLOADED, LOADING, UNLOADING)
## Requirements
### Requirement 1
**User Story:** As a system operator, I want the GPU resources to be automatically optimized based on the current execution mode, so that VRAM is used efficiently without manual intervention.
#### Acceptance Criteria
1. WHEN the system transitions to AUTOPILOT mode THEN the GPU_Resource_Manager SHALL unload the VLM model from Ollama within 5 seconds
2. WHEN the system transitions to RECORDING mode THEN the GPU_Resource_Manager SHALL load the VLM model into Ollama within 30 seconds
3. WHILE in AUTOPILOT mode THE GPU_Resource_Manager SHALL maintain CLIP model on GPU for accelerated matching
4. WHEN the VLM model is unloaded THEN the GPU_Resource_Manager SHALL verify VRAM usage decreased by at least 8 GB
5. IF the Ollama service is unavailable THEN the GPU_Resource_Manager SHALL log the error and continue operation in degraded mode
### Requirement 2
**User Story:** As a developer, I want to query the current GPU resource state, so that I can monitor and debug resource allocation issues.
#### Acceptance Criteria
1. WHEN a status query is requested THEN the GPU_Resource_Manager SHALL return current VRAM usage, loaded models, and execution mode
2. WHEN VRAM usage changes by more than 100 MB THEN the GPU_Resource_Manager SHALL emit a resource_changed event
3. WHEN a model state changes THEN the GPU_Resource_Manager SHALL log the transition with timestamp and duration
### Requirement 3
**User Story:** As a system operator, I want CLIP to automatically switch between CPU and GPU based on available VRAM, so that matching performance is optimized when resources allow.
#### Acceptance Criteria
1. WHEN VLM is unloaded AND VRAM available exceeds 1 GB THEN the GPU_Resource_Manager SHALL migrate CLIP model to GPU
2. WHEN VLM loading is requested AND CLIP is on GPU THEN the GPU_Resource_Manager SHALL migrate CLIP back to CPU before loading VLM
3. WHEN CLIP device changes THEN the GPU_Resource_Manager SHALL reinitialize the embedding pipeline with the new device
4. IF CLIP GPU migration fails THEN the GPU_Resource_Manager SHALL fallback to CPU mode and log the error
### Requirement 4
**User Story:** As a system operator, I want idle timeout management for VLM, so that VRAM is automatically freed when the model is not used.
#### Acceptance Criteria
1. WHILE VLM is loaded AND no VLM requests occur for 5 minutes THEN the GPU_Resource_Manager SHALL unload the VLM model
2. WHEN a VLM request arrives AND VLM is unloaded THEN the GPU_Resource_Manager SHALL load VLM on-demand before processing
3. WHERE idle_timeout is configured THEN the GPU_Resource_Manager SHALL use the configured timeout value instead of default
4. WHEN idle timeout triggers unload THEN the GPU_Resource_Manager SHALL emit an idle_unload event
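
A sketch of the idle watchdog behind this requirement; the `emit()` helper and the attribute names are assumptions standing in for the design's event system:

```python
# Idle-timeout sketch: a background task compares the timestamp of the
# last VLM request against the configured timeout.
import asyncio
from datetime import datetime, timedelta

async def idle_watchdog(manager, timeout_s: int, poll_s: float = 5.0):
    """Unload the VLM after timeout_s seconds without requests."""
    while True:
        await asyncio.sleep(poll_s)
        last = manager.last_vlm_request
        if (manager.is_vlm_loaded() and last is not None
                and datetime.now() - last > timedelta(seconds=timeout_s)):
            await manager.ensure_vlm_unloaded()
            manager.emit("idle_unload")  # event required by criterion 4
```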
### Requirement 5
**User Story:** As a developer, I want the GPU Resource Manager to provide a clean API for model lifecycle management, so that other components can request resources predictably.
#### Acceptance Criteria
1. WHEN ensure_vlm_loaded() is called THEN the GPU_Resource_Manager SHALL return only after VLM is fully loaded and ready
2. WHEN ensure_vlm_unloaded() is called THEN the GPU_Resource_Manager SHALL return only after VLM is fully unloaded
3. WHEN get_clip_device() is called THEN the GPU_Resource_Manager SHALL return the current device string ("cpu" or "cuda")
4. IF a model operation is already in progress THEN the GPU_Resource_Manager SHALL queue the request and process sequentially
### Requirement 6
**User Story:** As a system operator, I want graceful fallback when GPU operations fail, so that the system remains functional even with degraded performance.
#### Acceptance Criteria
1. IF GPU is not available THEN the GPU_Resource_Manager SHALL operate in CPU-only mode without errors
2. IF Ollama model loading fails after 3 retries THEN the GPU_Resource_Manager SHALL mark VLM as unavailable and notify listeners
3. WHEN operating in degraded mode THEN the GPU_Resource_Manager SHALL log periodic warnings about reduced functionality
4. IF VRAM is insufficient for requested operation THEN the GPU_Resource_Manager SHALL return an error with available VRAM information

# Implementation Plan
- [x] 1. Set up project structure and core interfaces
- [x] 1.1 Create core/gpu/gpu_resource_manager.py with GPUResourceManager class skeleton
- Define ExecutionMode, ModelState enums
- Define GPUResourceConfig, GPUResourceStatus, VRAMInfo dataclasses
- Implement singleton pattern
- _Requirements: 2.1, 5.1, 5.2, 5.3_
- [x] 1.2 Create core/gpu/ollama_manager.py with OllamaManager class
- Implement Ollama API client (load_model, unload_model, is_model_loaded)
- Add connection health check
- _Requirements: 1.1, 1.2, 1.5_
- [x] 1.3 Create core/gpu/vram_monitor.py with VRAMMonitor class
- Implement pynvml wrapper for VRAM queries
- Add fallback for systems without GPU
- _Requirements: 1.4, 2.1, 6.1_
- [x] 1.4 Write property test for OllamaManager
- **Property 10: ensure_vlm_loaded blocking**
- **Property 11: ensure_vlm_unloaded blocking**
- **Validates: Requirements 5.1, 5.2**
- [x] 2. Implement VLM lifecycle management
- [x] 2.1 Implement ensure_vlm_loaded() in GPUResourceManager
- Add async loading with timeout
- Implement retry logic (max 3 retries)
- Queue concurrent requests
- _Requirements: 5.1, 5.4, 6.2_
- [x] 2.2 Implement ensure_vlm_unloaded() in GPUResourceManager
- Add async unloading with timeout
- Verify VRAM decrease
- _Requirements: 5.2, 1.4_
- [x] 2.3 Write property test for VLM lifecycle
- **Property 4: VRAM decrease on VLM unload**
- **Validates: Requirements 1.4**
- [x] 2.4 Write property test for blocking behavior
- **Property 10: ensure_vlm_loaded blocking**
- **Property 11: ensure_vlm_unloaded blocking**
- **Validates: Requirements 5.1, 5.2**
- [x] 3. Implement CLIP device management
- [x] 3.1 Create core/gpu/clip_manager.py with CLIPManager class
- Implement device detection and migration
- Add pipeline reinitialization
- _Requirements: 3.1, 3.3, 3.4_
- [x] 3.2 Implement migrate_clip_to_gpu() and migrate_clip_to_cpu()
- Check VRAM availability before GPU migration
- Handle migration failures gracefully
- _Requirements: 3.1, 3.2, 3.4_
- [x] 3.3 Write property test for CLIP device
- **Property 12: get_clip_device validity**
- **Validates: Requirements 5.3**
- [x] 3.4 Write property test for embedding consistency
- **Property 7: Embedding pipeline consistency**
- **Validates: Requirements 3.3**
- [x] 4. Checkpoint - Ensure all tests pass
- Ensure all tests pass; ask the user if questions arise.
- [x] 5. Implement execution mode management
- [x] 5.1 Implement set_execution_mode() with automatic resource management
- AUTOPILOT: unload VLM, migrate CLIP to GPU
- RECORDING: load VLM, migrate CLIP to CPU
- IDLE: no automatic changes
- _Requirements: 1.1, 1.2, 1.3, 3.1, 3.2_
- [x] 5.2 Implement mode transition coordination
- Ensure CLIP migrates before VLM loads
- Handle concurrent mode changes
- _Requirements: 3.2, 5.4_
- [x] 5.3 Write property test for mode transitions
- **Property 1: Mode transition triggers VLM unload**
- **Property 2: Mode transition triggers VLM load**
- **Validates: Requirements 1.1, 1.2**
- [x] 5.4 Write property test for CLIP in AUTOPILOT
- **Property 3: CLIP on GPU in AUTOPILOT**
- **Validates: Requirements 1.3, 3.1**
- [x] 5.5 Write property test for migration ordering
- **Property 6: CLIP migration ordering**
- **Validates: Requirements 3.2**
- [x] 6. Implement idle timeout management
- [x] 6.1 Add idle timeout tracking in GPUResourceManager
- Track last VLM request timestamp
- Implement background timer for timeout check
- _Requirements: 4.1, 4.3_
- [x] 6.2 Implement on-demand VLM loading
- Intercept VLM requests when unloaded
- Load VLM before processing request
- _Requirements: 4.2_
- [x] 6.3 Write property test for idle timeout
- **Property 8: Idle timeout behavior**
- **Validates: Requirements 4.1, 4.3**
- [x] 6.4 Write property test for on-demand loading
- **Property 9: On-demand VLM loading**
- **Validates: Requirements 4.2**
- [x] 7. Implement monitoring and events
- [x] 7.1 Implement get_status() returning complete GPUResourceStatus
- Include all fields: vram, vlm_state, clip_device, execution_mode
- _Requirements: 2.1_
- [x] 7.2 Implement event emission system
- resource_changed, mode_changed, idle_unload events
- VRAM change threshold detection (100 MB)
- _Requirements: 2.2, 2.3, 4.4_
- [x] 7.3 Write property test for status completeness
- **Property 5: Status query completeness**
- **Validates: Requirements 2.1**
- [x] 8. Implement error handling and degraded mode
- [x] 8.1 Implement graceful degradation for missing GPU
- Detect GPU availability at startup
- Force CPU-only mode if no GPU
- _Requirements: 6.1_
- [x] 8.2 Implement Ollama unavailable handling
- Connection retry logic
- Degraded mode flag and reason
- _Requirements: 1.5, 6.2, 6.3_
- [x] 8.3 Implement VRAM insufficient error handling
- Check VRAM before operations
- Return informative errors
- _Requirements: 6.4_
- [x] 8.4 Write property test for sequential processing
- **Property 13: Sequential operation processing**
- **Validates: Requirements 5.4**
- [x] 9. Checkpoint - Ensure all tests pass
- Ensure all tests pass; ask the user if questions arise.
- [x] 10. Integration with existing components
- [x] 10.1 Integrate GPUResourceManager with ExecutionLoop
- Call set_execution_mode() on mode changes
- Use ensure_vlm_loaded() before VLM operations
- _Requirements: 1.1, 1.2, 4.2_
- [x] 10.2 Integrate with UIDetector
- Check VLM availability before classification
- Handle degraded mode gracefully
- _Requirements: 1.5, 6.2_
- [x] 10.3 Integrate with FusionEngine/CLIP embedding
- Use CLIPManager for device-aware embeddings
- Reinitialize on device change
- _Requirements: 3.3_
- [x] 10.4 Update core/config.py with GPU resource configuration
- Add GPUResourceConfig to AppConfig
- Support environment variables
- _Requirements: 4.3_
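
As a sketch of the environment-variable support in task 10.4 (the variable names are assumptions, not a final contract):

```python
# Env-override sketch for GPUResourceConfig; unset variables fall back
# to the dataclass defaults from the design document.
import os

from core.gpu.gpu_resource_manager import GPUResourceConfig  # per task 1.1

def gpu_config_from_env() -> GPUResourceConfig:
    return GPUResourceConfig(
        ollama_endpoint=os.getenv("OLLAMA_ENDPOINT", "http://localhost:11434"),
        vlm_model=os.getenv("GPU_VLM_MODEL", "qwen3-vl:8b"),
        idle_timeout_seconds=int(os.getenv("GPU_IDLE_TIMEOUT_S", "300")),
    )
```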
- [x] 11. Final Checkpoint - Ensure all tests pass
- Ensure all tests pass; ask the user if questions arise.