# Design Document - GPU Resource Manager

## Overview

The GPU Resource Manager is a singleton component that orchestrates dynamic allocation of GPU resources between the ML models of the RPA Vision V3 system. It primarily manages two resources:

1. **Ollama VLM (qwen3-vl:8b)** - ~10.5 GB VRAM, used for UI classification during recording
2. **CLIP (ViT-B-32)** - ~500 MB VRAM, used for embedding matching

The manager optimizes VRAM usage by:

- Unloading the VLM when it is not needed (autopilot mode)
- Migrating CLIP to the GPU when VRAM is available
- Enforcing an idle timeout to release resources automatically
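The resulting mode-to-resource policy can be summarized as a small mapping. This is an illustrative summary, not part of the interface; the IDLE row and the AUTOPILOT entry for CLIP (conditional on the VRAM threshold defined later in `GPUResourceConfig`) are inferred from the behavior described above:

```python
# Target resource placement per execution mode (illustrative summary).
# CLIP only moves to "cuda" when free VRAM exceeds
# vram_threshold_for_clip_gpu_mb; otherwise it stays on "cpu".
RESOURCE_POLICY = {
    "idle":      {"vlm": "unloaded", "clip": "cpu"},
    "recording": {"vlm": "loaded",   "clip": "cpu"},
    "autopilot": {"vlm": "unloaded", "clip": "cuda"},
}
```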
## Architecture

```mermaid
graph TB
    subgraph "GPU Resource Manager"
        GRM[GPUResourceManager]
        OM[OllamaManager]
        CM[CLIPManager]
        VM[VRAMMonitor]
        EE[EventEmitter]
    end

    subgraph "External Services"
        OL[Ollama API :11434]
        NV[nvidia-smi / pynvml]
    end

    subgraph "Consumers"
        EL[ExecutionLoop]
        UD[UIDetector]
        FE[FusionEngine]
    end

    GRM --> OM
    GRM --> CM
    GRM --> VM
    GRM --> EE

    OM --> OL
    VM --> NV

    EL --> GRM
    UD --> GRM
    FE --> GRM
```
## Components and Interfaces

### GPUResourceManager (Singleton)

```python
class GPUResourceManager:
    """Central manager for GPU resources."""

    # Lifecycle
    def __init__(self, config: GPUResourceConfig)
    def shutdown(self) -> None

    # Mode Management
    def set_execution_mode(self, mode: ExecutionMode) -> None
    def get_execution_mode(self) -> ExecutionMode

    # VLM Management
    async def ensure_vlm_loaded(self) -> bool
    async def ensure_vlm_unloaded(self) -> bool
    def is_vlm_loaded(self) -> bool
    def get_vlm_state(self) -> ModelState

    # CLIP Management
    def get_clip_device(self) -> str  # "cpu" or "cuda"
    async def migrate_clip_to_gpu(self) -> bool
    async def migrate_clip_to_cpu(self) -> bool

    # Monitoring
    def get_status(self) -> GPUResourceStatus
    def get_vram_usage(self) -> VRAMInfo

    # Events
    def on_resource_changed(self, callback: Callable) -> None
    def on_mode_changed(self, callback: Callable) -> None
    def on_idle_unload(self, callback: Callable) -> None
```
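A minimal consumer-side sketch of this interface, using the data models defined below; the `run_autopilot_step` helper and the surrounding `asyncio` scaffolding are illustrative, not part of the design:

```python
import asyncio

async def run_autopilot_step(manager: GPUResourceManager) -> None:
    # Switching modes drives all resource decisions; consumers never
    # load or unload models directly.
    manager.set_execution_mode(ExecutionMode.AUTOPILOT)

    # In AUTOPILOT the VLM is released, and CLIP may sit on the GPU.
    await manager.ensure_vlm_unloaded()
    device = manager.get_clip_device()  # "cpu" or "cuda"

    status = manager.get_status()
    print(f"mode={status.execution_mode} clip={device} "
          f"free VRAM={status.vram.free_mb} MB")

manager = GPUResourceManager(GPUResourceConfig())
asyncio.run(run_autopilot_step(manager))
manager.shutdown()
```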
### OllamaManager

```python
class OllamaManager:
    """Manages the lifecycle of Ollama models."""

    def __init__(self, endpoint: str = "http://localhost:11434")

    async def load_model(self, model: str, keep_alive: str = "5m") -> bool
    async def unload_model(self, model: str) -> bool
    async def is_model_loaded(self, model: str) -> bool
    async def list_loaded_models(self) -> List[str]
    def is_available(self) -> bool
```
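A sketch of how these methods could map onto Ollama's HTTP API: Ollama loads a model on an empty `/api/generate` request, honors its `keep_alive` parameter (`0` unloads the model immediately), and reports resident models via `/api/ps`. The choice of `httpx` and the module-level `OLLAMA` constant are assumptions:

```python
from typing import List
import httpx

OLLAMA = "http://localhost:11434"

async def load_model(model: str, keep_alive: str = "5m") -> bool:
    # An empty generate request pulls the model into VRAM and keeps it
    # resident for `keep_alive`.
    async with httpx.AsyncClient(timeout=30.0) as client:
        r = await client.post(f"{OLLAMA}/api/generate",
                              json={"model": model, "keep_alive": keep_alive})
        return r.status_code == 200

async def unload_model(model: str) -> bool:
    # keep_alive=0 tells Ollama to evict the model right away.
    async with httpx.AsyncClient(timeout=5.0) as client:
        r = await client.post(f"{OLLAMA}/api/generate",
                              json={"model": model, "keep_alive": 0})
        return r.status_code == 200

async def list_loaded_models() -> List[str]:
    # /api/ps reports the models currently resident in memory.
    async with httpx.AsyncClient(timeout=5.0) as client:
        r = await client.get(f"{OLLAMA}/api/ps")
        return [m["name"] for m in r.json().get("models", [])]
```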
### CLIPManager

```python
class CLIPManager:
    """Handles CPU/GPU migration of the CLIP model."""

    def __init__(self, model_name: str = "ViT-B-32")

    def get_current_device(self) -> str
    async def migrate_to_device(self, device: str) -> bool
    def get_model(self) -> Any  # Returns the CLIP model
    def reinitialize_pipeline(self) -> None
```
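A sketch of `migrate_to_device`, assuming the CLIP model is a PyTorch module (e.g. loaded via `open_clip`); running the move in `asyncio.to_thread` is an illustrative choice to keep the event loop responsive during the copy:

```python
import asyncio
import torch

async def migrate_to_device(model: torch.nn.Module, device: str) -> bool:
    # Refuse a GPU migration when no CUDA device is present.
    if device == "cuda" and not torch.cuda.is_available():
        return False
    try:
        # module.to() is synchronous; run it off the event loop.
        await asyncio.to_thread(model.to, device)
        if device == "cpu" and torch.cuda.is_available():
            # Release the now-unused CUDA allocations back to the driver.
            torch.cuda.empty_cache()
        return True
    except RuntimeError:  # e.g. CUDA out of memory
        return False
```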
### VRAMMonitor

```python
class VRAMMonitor:
    """Monitors VRAM usage."""

    def __init__(self, poll_interval_ms: int = 1000)

    def get_vram_info(self) -> VRAMInfo
    def get_available_vram_mb(self) -> int
    def start_monitoring(self) -> None
    def stop_monitoring(self) -> None
    def on_vram_changed(self, callback: Callable, threshold_mb: int = 100) -> None
```
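A sketch of `get_vram_info()` built on `pynvml` (the NVML bindings named in the architecture diagram), returning the `VRAMInfo` dataclass defined under Data Models. Initializing and shutting down NVML per call is simplistic; a real monitor would initialize once at startup:

```python
import pynvml

def get_vram_info(gpu_index: int = 0) -> VRAMInfo:
    # NVML reports memory in bytes; VRAMInfo uses MB.
    pynvml.nvmlInit()
    try:
        handle = pynvml.nvmlDeviceGetHandleByIndex(gpu_index)
        mem = pynvml.nvmlDeviceGetMemoryInfo(handle)
        util = pynvml.nvmlDeviceGetUtilizationRates(handle)
        name = pynvml.nvmlDeviceGetName(handle)
        return VRAMInfo(
            total_mb=mem.total // (1024 * 1024),
            used_mb=mem.used // (1024 * 1024),
            free_mb=mem.free // (1024 * 1024),
            # Older pynvml versions return bytes, newer ones str.
            gpu_name=name if isinstance(name, str) else name.decode(),
            gpu_utilization_percent=util.gpu,
        )
    finally:
        pynvml.nvmlShutdown()
```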
## Data Models

```python
from enum import Enum
from dataclasses import dataclass
from typing import Optional, List
from datetime import datetime

class ExecutionMode(str, Enum):
    IDLE = "idle"
    RECORDING = "recording"
    AUTOPILOT = "autopilot"

class ModelState(str, Enum):
    UNLOADED = "unloaded"
    LOADING = "loading"
    LOADED = "loaded"
    UNLOADING = "unloading"
    ERROR = "error"

@dataclass
class VRAMInfo:
    total_mb: int
    used_mb: int
    free_mb: int
    gpu_name: str
    gpu_utilization_percent: int

@dataclass
class GPUResourceStatus:
    execution_mode: ExecutionMode
    vlm_state: ModelState
    vlm_model: str
    clip_device: str
    vram: VRAMInfo
    idle_timeout_seconds: int
    last_vlm_request: Optional[datetime]
    degraded_mode: bool
    degraded_reason: Optional[str]

@dataclass
class GPUResourceConfig:
    ollama_endpoint: str = "http://localhost:11434"
    vlm_model: str = "qwen3-vl:8b"
    clip_model: str = "ViT-B-32"
    idle_timeout_seconds: int = 300  # 5 minutes
    vram_threshold_for_clip_gpu_mb: int = 1024  # 1 GB
    max_load_retries: int = 3
    load_timeout_seconds: int = 30
    unload_timeout_seconds: int = 5

@dataclass
class ResourceChangedEvent:
    timestamp: datetime
    event_type: str  # "vram_changed", "model_loaded", "model_unloaded", "device_changed"
    details: dict
```
## Correctness Properties

*A property is a characteristic or behavior that should hold true across all valid executions of a system: essentially, a formal statement about what the system should do. Properties serve as the bridge between human-readable specifications and machine-verifiable correctness guarantees.*

### Property 1: Mode transition triggers VLM unload

*For any* GPU Resource Manager in RECORDING mode with the VLM loaded, transitioning to AUTOPILOT mode should result in the VLM being unloaded within 5 seconds.

**Validates: Requirements 1.1**

### Property 2: Mode transition triggers VLM load

*For any* GPU Resource Manager in AUTOPILOT mode with the VLM unloaded, transitioning to RECORDING mode should result in the VLM being loaded within 30 seconds.

**Validates: Requirements 1.2**

### Property 3: CLIP on GPU in AUTOPILOT

*For any* GPU Resource Manager in AUTOPILOT mode with more than 1 GB of available VRAM, CLIP should be on the GPU device.

**Validates: Requirements 1.3, 3.1**

### Property 4: VRAM decrease on VLM unload

*For any* VLM unload operation, VRAM usage should decrease by at least 8 GB.

**Validates: Requirements 1.4**

### Property 5: Status query completeness

*For any* call to get_status(), the returned GPUResourceStatus should contain valid values for all fields, including vram, vlm_state, clip_device, and execution_mode.

**Validates: Requirements 2.1**

### Property 6: CLIP migration ordering

*For any* VLM load request made while CLIP is on the GPU, CLIP should be migrated to the CPU before VLM loading completes.

**Validates: Requirements 3.2**

### Property 7: Embedding pipeline consistency

*For any* CLIP device change, the embedding pipeline should produce valid embeddings after reinitialization.

**Validates: Requirements 3.3**

### Property 8: Idle timeout behavior

*For any* configured idle_timeout value, the VLM should be unloaded after that duration of inactivity (not after the default).

**Validates: Requirements 4.1, 4.3**

### Property 9: On-demand VLM loading

*For any* VLM request made while the VLM is unloaded, the request should complete successfully once the VLM has been loaded.

**Validates: Requirements 4.2**

### Property 10: ensure_vlm_loaded blocking

*For any* call to ensure_vlm_loaded(), the function should return only when is_vlm_loaded() returns True.

**Validates: Requirements 5.1**

### Property 11: ensure_vlm_unloaded blocking

*For any* call to ensure_vlm_unloaded(), the function should return only when is_vlm_loaded() returns False.

**Validates: Requirements 5.2**

### Property 12: get_clip_device validity

*For any* call to get_clip_device(), the return value should be either "cpu" or "cuda".

**Validates: Requirements 5.3**

### Property 13: Sequential operation processing

*For any* set of concurrent model operations, the operations should be processed sequentially, without race conditions.

**Validates: Requirements 5.4**
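As an illustration of how one of these properties might be driven by Hypothesis (the library named in the Testing Strategy), a sketch for Property 12; the strategy choices and test body are illustrative, not prescribed:

```python
from hypothesis import given, strategies as st

# Property 12: after any sequence of mode transitions, get_clip_device()
# still returns one of the two legal device strings.
@given(st.lists(st.sampled_from(list(ExecutionMode)), max_size=10))
def test_clip_device_always_valid(modes):
    manager = GPUResourceManager(GPUResourceConfig())
    try:
        for mode in modes:
            manager.set_execution_mode(mode)
        assert manager.get_clip_device() in ("cpu", "cuda")
    finally:
        manager.shutdown()
```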
## Error Handling

### Ollama Unavailable

- Detect via connection timeout or HTTP error (see the sketch after this list)
- Set `degraded_mode = True` with a reason
- CLIP continues on CPU
- VLM operations return False with a logged warning
- Periodic retry every 30 seconds
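A sketch of the detection step, assuming `httpx`; Ollama's root endpoint answers HTTP 200 ("Ollama is running") when the server is up, and the helper name `check_ollama_available` is illustrative:

```python
import logging
from typing import Optional, Tuple
import httpx

log = logging.getLogger("gpu_resource_manager")

def check_ollama_available(
    endpoint: str = "http://localhost:11434",
) -> Tuple[bool, Optional[str]]:
    """Return (available, degraded_reason)."""
    try:
        # A short timeout keeps this cheap enough to repeat every 30 s.
        r = httpx.get(endpoint, timeout=2.0)
        if r.status_code == 200:
            return True, None
        return False, f"Ollama returned HTTP {r.status_code}"
    except httpx.HTTPError as exc:
        log.warning("Ollama unavailable: %s", exc)
        return False, f"connection failed: {exc}"
```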
### GPU Not Available

- Detect via pynvml initialization failure
- Force CPU-only mode for all models
- Log a warning at startup
- All GPU migration requests return False gracefully

### VRAM Insufficient

- Check available VRAM before operations
- Return an error with the current VRAM info
- Suggest unloading other models
### Load/Unload Timeout

- Implement timeouts with cancellation (sketched below)
- Retry up to max_load_retries
- Mark the model as ERROR state after repeated failures
- Emit an error event for monitoring
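A sketch of the timeout-and-retry flow using `asyncio.wait_for`, which cancels the pending load when the deadline expires; the helper `load_with_retries` is illustrative, with defaults taken from `GPUResourceConfig`:

```python
import asyncio
import logging

log = logging.getLogger("gpu_resource_manager")

async def load_with_retries(manager: OllamaManager, model: str,
                            timeout_s: int = 30,
                            retries: int = 3) -> ModelState:
    for attempt in range(1, retries + 1):
        try:
            # wait_for cancels the underlying load task on timeout.
            ok = await asyncio.wait_for(manager.load_model(model),
                                        timeout=timeout_s)
            if ok:
                return ModelState.LOADED
        except asyncio.TimeoutError:
            log.warning("load of %s timed out (attempt %d/%d)",
                        model, attempt, retries)
    # All attempts exhausted: surface the failure as an ERROR state.
    return ModelState.ERROR
```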
## Testing Strategy

### Unit Tests

- Test each manager component in isolation
- Mock Ollama API responses
- Mock nvidia-smi/pynvml responses
- Test state machine transitions
- Test event emission

### Property-Based Tests (using Hypothesis)

- Generate random sequences of mode transitions
- Verify that invariants hold after each transition
- Test concurrent operation handling
- Test timeout behavior with various configurations

### Integration Tests

- Test against a real Ollama instance
- Test with a real GPU (if available)
- Test the full workflow: RECORDING → AUTOPILOT → RECORDING
- Measure actual VRAM changes
### Test Configuration
```python
# pytest configuration for property tests
HYPOTHESIS_SETTINGS = {
    "max_examples": 100,
    "deadline": 30000,  # 30 seconds for GPU operations
}
```
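These values can be fed to Hypothesis through its profile mechanism; a sketch (the profile name "gpu" is illustrative):

```python
from hypothesis import settings

# Register the shared settings once (e.g. in conftest.py) and select the
# profile, so every property test picks up the GPU-friendly deadline.
settings.register_profile("gpu", **HYPOTHESIS_SETTINGS)
settings.load_profile("gpu")
```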