# Requirements Document

## Introduction

The GPU Resource Manager is an optimization component for RPA Vision V3 that dynamically manages GPU resource allocation across the ML models (Ollama VLM, CLIP, OWL-ViT) according to the system's execution mode. Its primary goal is to maximize use of the available VRAM by unloading models that are not needed and moving lightweight models onto the GPU when VRAM is freed.

## Glossary

- **GPU_Resource_Manager**: Central component that orchestrates GPU resource allocation across the ML models
- **Ollama**: External service that hosts the VLM (Vision-Language) models and manages their loading/unloading in VRAM
- **VLM**: Vision-Language Model, a multimodal model used to classify UI elements (qwen3-vl:8b, ~10.5 GB VRAM)
- **CLIP**: Visual embedding model used for screenshot matching (~500 MB VRAM)
- **VRAM**: Video RAM, the GPU's dedicated memory
- **Execution_Mode**: Execution mode of the RPA system (RECORDING, AUTOPILOT, IDLE)
- **Model_State**: State of a model (LOADED, UNLOADED, LOADING, UNLOADING)

## Requirements

### Requirement 1

**User Story:** As a system operator, I want the GPU resources to be automatically optimized based on the current execution mode, so that VRAM is used efficiently without manual intervention.

#### Acceptance Criteria

1. WHEN the system transitions to AUTOPILOT mode THEN the GPU_Resource_Manager SHALL unload the VLM model from Ollama within 5 seconds
2. WHEN the system transitions to RECORDING mode THEN the GPU_Resource_Manager SHALL load the VLM model into Ollama within 30 seconds
3. WHILE in AUTOPILOT mode THE GPU_Resource_Manager SHALL keep the CLIP model on the GPU for accelerated matching
4. WHEN the VLM model is unloaded THEN the GPU_Resource_Manager SHALL verify that VRAM usage decreased by at least 8 GB
5. IF the Ollama service is unavailable THEN the GPU_Resource_Manager SHALL log the error and continue operating in degraded mode

### Requirement 2

**User Story:** As a developer, I want to query the current GPU resource state, so that I can monitor and debug resource allocation issues.

#### Acceptance Criteria

1. WHEN a status query is requested THEN the GPU_Resource_Manager SHALL return the current VRAM usage, loaded models, and execution mode
2. WHEN VRAM usage changes by more than 100 MB THEN the GPU_Resource_Manager SHALL emit a resource_changed event
3. WHEN a model state changes THEN the GPU_Resource_Manager SHALL log the transition with timestamp and duration

### Requirement 3

**User Story:** As a system operator, I want CLIP to automatically switch between CPU and GPU based on available VRAM, so that matching performance is optimized when resources allow.

#### Acceptance Criteria

1. WHEN VLM is unloaded AND available VRAM exceeds 1 GB THEN the GPU_Resource_Manager SHALL migrate the CLIP model to the GPU
2. WHEN VLM loading is requested AND CLIP is on the GPU THEN the GPU_Resource_Manager SHALL migrate CLIP back to the CPU before loading VLM
3. WHEN the CLIP device changes THEN the GPU_Resource_Manager SHALL reinitialize the embedding pipeline with the new device
4. IF CLIP GPU migration fails THEN the GPU_Resource_Manager SHALL fall back to CPU mode and log the error

### Requirement 4

**User Story:** As a system operator, I want idle timeout management for VLM, so that VRAM is automatically freed when the model is not used.

#### Acceptance Criteria

1. WHILE VLM is loaded AND no VLM requests occur for 5 minutes THE GPU_Resource_Manager SHALL unload the VLM model
2. WHEN a VLM request arrives AND VLM is unloaded THEN the GPU_Resource_Manager SHALL load VLM on demand before processing
3. WHERE idle_timeout is configured THE GPU_Resource_Manager SHALL use the configured timeout value instead of the default
4. WHEN the idle timeout triggers an unload THEN the GPU_Resource_Manager SHALL emit an idle_unload event

### Requirement 5

**User Story:** As a developer, I want the GPU Resource Manager to provide a clean API for model lifecycle management, so that other components can request resources predictably.

#### Acceptance Criteria

1. WHEN ensure_vlm_loaded() is called THEN the GPU_Resource_Manager SHALL return only after VLM is fully loaded and ready
2. WHEN ensure_vlm_unloaded() is called THEN the GPU_Resource_Manager SHALL return only after VLM is fully unloaded
3. WHEN get_clip_device() is called THEN the GPU_Resource_Manager SHALL return the current device string ("cpu" or "cuda")
4. IF a model operation is already in progress THEN the GPU_Resource_Manager SHALL queue the request and process it sequentially

### Requirement 6

**User Story:** As a system operator, I want graceful fallback when GPU operations fail, so that the system remains functional even with degraded performance.

#### Acceptance Criteria

1. IF the GPU is not available THEN the GPU_Resource_Manager SHALL operate in CPU-only mode without errors
2. IF Ollama model loading fails after 3 retries THEN the GPU_Resource_Manager SHALL mark VLM as unavailable and notify listeners
3. WHEN operating in degraded mode THEN the GPU_Resource_Manager SHALL log periodic warnings about the reduced functionality
4. IF VRAM is insufficient for the requested operation THEN the GPU_Resource_Manager SHALL return an error with the available VRAM information
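Requirement 3's CPU↔GPU migration and fallback logic can be sketched as follows. This is an illustrative sketch only, not the specified implementation: the `ClipDeviceManager` name and the injected `query_free_vram`/`migrate` callables are assumptions introduced so the sketch stays hardware-independent.

```python
from dataclasses import dataclass, field
from typing import Callable, List

GIB = 1024 ** 3


@dataclass
class ClipDeviceManager:
    """Hypothetical sketch of Requirement 3: move CLIP between CPU and GPU.

    `query_free_vram` returns free VRAM in bytes; `migrate` moves the model
    (and reinitializes the embedding pipeline) and may raise on failure.
    """
    query_free_vram: Callable[[], int]
    migrate: Callable[[str], None]
    device: str = "cpu"
    events: List[str] = field(default_factory=list)

    def on_vlm_unloaded(self) -> None:
        # Criterion 1: migrate CLIP to GPU only if more than 1 GB is free.
        if self.query_free_vram() > 1 * GIB:
            self._switch("cuda")

    def on_vlm_load_requested(self) -> None:
        # Criterion 2: free the GPU for the VLM before it loads.
        if self.device == "cuda":
            self._switch("cpu")

    def _switch(self, target: str) -> None:
        try:
            self.migrate(target)  # criterion 3: reinit pipeline on new device
            self.device = target
        except RuntimeError as err:
            # Criterion 4: fall back to CPU and record the error.
            self.device = "cpu"
            self.events.append(f"clip_migration_failed: {err}")
```

Injecting the VRAM query keeps the sketch testable without a GPU; a real implementation would likely read free memory via `torch.cuda.mem_get_info()` and move the model with `model.to(target)`.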
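Requirement 4's idle timeout can be expressed as a small timer that is "touched" on every VLM request and polled by the manager. The class name, the injectable clock, and the polling style are assumptions for illustration; only the 5-minute default and the configurable override come from the acceptance criteria.

```python
import time
from typing import Callable


class IdleTimer:
    """Hypothetical sketch of Requirement 4's idle timeout.

    Defaults to 300 s (5 minutes); a configured idle_timeout value takes
    precedence (criterion 3). The clock is injectable so the logic can be
    tested without real waiting.
    """

    def __init__(self, timeout_s: float = 300.0,
                 clock: Callable[[], float] = time.monotonic):
        self.timeout_s = timeout_s
        self._clock = clock
        self._last_request = clock()

    def touch(self) -> None:
        # Called on every VLM request (criterion 2 resets the idle window).
        self._last_request = self._clock()

    def expired(self) -> bool:
        # When this returns True, the manager unloads VLM (criterion 1)
        # and emits an idle_unload event (criterion 4).
        return self._clock() - self._last_request >= self.timeout_s
```

A monotonic clock is used rather than wall-clock time so that system clock adjustments cannot trigger or suppress an unload.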
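The lifecycle API of Requirement 5 could take roughly the following shape. Only the method names `ensure_vlm_loaded()`, `ensure_vlm_unloaded()`, and `get_clip_device()` come from the acceptance criteria; the stubbed `load_fn`/`unload_fn` callables (standing in for the real Ollama calls) and the lock-based serialization are assumptions of this sketch.

```python
import threading
from typing import Callable


class GpuResourceManager:
    """Hypothetical sketch of Requirement 5's model lifecycle API."""

    def __init__(self, load_fn: Callable[[], None],
                 unload_fn: Callable[[], None]):
        self._load_fn = load_fn
        self._unload_fn = unload_fn
        # Criterion 4: a lock serializes concurrent operations, so a caller
        # arriving mid-operation simply waits its turn.
        self._lock = threading.Lock()
        self._vlm_loaded = False
        self._clip_device = "cpu"

    def ensure_vlm_loaded(self) -> None:
        # Criterion 1: returns only once the VLM is fully loaded and ready.
        with self._lock:
            if not self._vlm_loaded:
                self._load_fn()
                self._vlm_loaded = True

    def ensure_vlm_unloaded(self) -> None:
        # Criterion 2: returns only once the VLM is fully unloaded.
        with self._lock:
            if self._vlm_loaded:
                self._unload_fn()
                self._vlm_loaded = False

    def get_clip_device(self) -> str:
        # Criterion 3: current device string, "cpu" or "cuda".
        return self._clip_device
```

A plain lock gives the sequential, blocking semantics the criteria describe; an implementation wanting non-blocking callers could instead push requests onto a single worker queue.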
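Requirement 6's retry-then-degrade behavior (criterion 2) can be sketched as a small helper. The function name, the `load_once` stub standing in for the real Ollama load call, and the string-based listener notification are illustrative assumptions; the 3-retry limit comes from the acceptance criteria.

```python
import logging
from typing import Callable, List

logger = logging.getLogger("gpu_resource_manager")


def load_vlm_with_retries(load_once: Callable[[], None],
                          listeners: List[Callable[[str], None]],
                          retries: int = 3) -> bool:
    """Hypothetical sketch of Requirement 6, criterion 2.

    Attempts the load up to `retries` times; on total failure, marks the
    VLM unavailable by notifying listeners and returns False.
    """
    for attempt in range(1, retries + 1):
        try:
            load_once()
            return True
        except RuntimeError as err:
            logger.warning("VLM load attempt %d/%d failed: %s",
                           attempt, retries, err)
    for notify in listeners:
        notify("vlm_unavailable")
    return False
```

Returning a boolean rather than raising lets the caller continue in degraded mode (criteria 1 and 3) instead of propagating the failure.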