rpa_vision_v3/.kiro/specs/gpu-resource-manager/tasks.md

# Implementation Plan

- [x] 1. Set up project structure and core interfaces
  - [x] 1.1 Create core/gpu/gpu_resource_manager.py with GPUResourceManager class skeleton
    - Define ExecutionMode, ModelState enums
    - Define GPUResourceConfig, GPUResourceStatus, VRAMInfo dataclasses
    - Implement singleton pattern
    - _Requirements: 2.1, 5.1, 5.2, 5.3_
  - [x] 1.2 Create core/gpu/ollama_manager.py with OllamaManager class
    - Implement Ollama API client (load_model, unload_model, is_model_loaded)
    - Add connection health check
    - _Requirements: 1.1, 1.2, 1.5_
  - [x] 1.3 Create core/gpu/vram_monitor.py with VRAMMonitor class
    - Implement pynvml wrapper for VRAM queries
    - Add fallback for systems without GPU
    - _Requirements: 1.4, 2.1, 6.1_
  - [x] 1.4 Write property test for OllamaManager
    - **Property 10: ensure_vlm_loaded blocking**
    - **Property 11: ensure_vlm_unloaded blocking**
    - **Validates: Requirements 5.1, 5.2**
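
The skeleton from task 1.1 could look roughly like the sketch below. The enum and dataclass names, the three execution modes, and the four status fields come from this plan. The `ModelState` members, the `VRAMInfo` fields, and the default values are assumptions made for illustration.

```python
from __future__ import annotations

import threading
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class ExecutionMode(Enum):
    IDLE = "idle"
    RECORDING = "recording"
    AUTOPILOT = "autopilot"


class ModelState(Enum):
    # Member names are assumed; the plan only names the enum itself.
    UNLOADED = "unloaded"
    LOADING = "loading"
    LOADED = "loaded"
    UNLOADING = "unloading"


@dataclass(frozen=True)
class VRAMInfo:
    # Field names are assumed.
    total_mb: int
    used_mb: int

    @property
    def free_mb(self) -> int:
        return self.total_mb - self.used_mb


@dataclass
class GPUResourceStatus:
    vram: Optional[VRAMInfo]
    vlm_state: ModelState
    clip_device: str
    execution_mode: ExecutionMode


class GPUResourceManager:
    _instance: Optional["GPUResourceManager"] = None
    _instance_lock = threading.Lock()

    def __new__(cls) -> "GPUResourceManager":
        # Double-checked locking so concurrent callers share one instance.
        if cls._instance is None:
            with cls._instance_lock:
                if cls._instance is None:
                    instance = super().__new__(cls)
                    instance._vlm_state = ModelState.UNLOADED
                    instance._clip_device = "cpu"
                    instance._mode = ExecutionMode.IDLE
                    cls._instance = instance
        return cls._instance

    def get_status(self) -> GPUResourceStatus:
        # vram stays None here; a real implementation would ask VRAMMonitor.
        return GPUResourceStatus(
            vram=None,
            vlm_state=self._vlm_state,
            clip_device=self._clip_device,
            execution_mode=self._mode,
        )
```

Initializing state inside `__new__` rather than `__init__` avoids resetting the shared instance each time `GPUResourceManager()` is called.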

- [x] 2. Implement VLM lifecycle management
  - [x] 2.1 Implement ensure_vlm_loaded() in GPUResourceManager
    - Add async loading with timeout
    - Implement retry logic (max 3 retries)
    - Queue concurrent requests
    - _Requirements: 5.1, 5.4, 6.2_
  - [x] 2.2 Implement ensure_vlm_unloaded() in GPUResourceManager
    - Add async unloading with timeout
    - Verify VRAM decrease
    - _Requirements: 5.2, 1.4_
  - [x] 2.3 Write property test for VLM lifecycle
    - **Property 4: VRAM decrease on VLM unload**
    - **Validates: Requirements 1.4**
  - [x] 2.4 Write property test for blocking behavior
    - **Property 10: ensure_vlm_loaded blocking**
    - **Property 11: ensure_vlm_unloaded blocking**
    - **Validates: Requirements 5.1, 5.2**
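
As an illustration of task 2.1, the sketch below combines a per-attempt timeout, bounded retries, and an `asyncio.Lock` that queues concurrent callers. It reads "max 3 retries" as one initial attempt plus up to three retries, which is an assumption. `load_with_retry`, `VLMLoader`, the default timeout, and the backoff values are hypothetical names and numbers, not the project's actual code.

```python
import asyncio
from typing import Awaitable, Callable


async def load_with_retry(
    load: Callable[[], Awaitable[None]],
    *,
    timeout_s: float = 30.0,   # assumed default
    max_retries: int = 3,      # from the plan
    backoff_s: float = 0.5,    # assumed default
) -> bool:
    """Return True once `load` succeeds, False after every attempt fails."""
    attempts = 1 + max_retries  # one initial try plus the retries
    for attempt in range(attempts):
        try:
            await asyncio.wait_for(load(), timeout=timeout_s)
            return True
        except (asyncio.TimeoutError, ConnectionError):
            if attempt < attempts - 1:
                # Exponential backoff between attempts.
                await asyncio.sleep(backoff_s * (2 ** attempt))
    return False


class VLMLoader:
    """Serializes load requests so only one load runs at a time."""

    def __init__(self, load: Callable[[], Awaitable[None]], **retry_kwargs) -> None:
        self._load = load
        self._retry_kwargs = retry_kwargs
        self._lock = asyncio.Lock()
        self.loaded = False

    async def ensure_vlm_loaded(self) -> bool:
        # Callers that arrive during a load wait on the lock, then see
        # `loaded` already set and return without loading again.
        async with self._lock:
            if not self.loaded:
                self.loaded = await load_with_retry(self._load, **self._retry_kwargs)
            return self.loaded
```

Creating the `VLMLoader` inside the running event loop keeps the lock bound to that loop on older Python versions.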

- [x] 3. Implement CLIP device management
  - [x] 3.1 Create core/gpu/clip_manager.py with CLIPManager class
    - Implement device detection and migration
    - Add pipeline reinitialization
    - _Requirements: 3.1, 3.3, 3.4_
  - [x] 3.2 Implement migrate_clip_to_gpu() and migrate_clip_to_cpu()
    - Check VRAM availability before GPU migration
    - Handle migration failures gracefully
    - _Requirements: 3.1, 3.2, 3.4_
  - [x] 3.3 Write property test for CLIP device
    - **Property 12: get_clip_device validity**
    - **Validates: Requirements 5.3**
  - [x] 3.4 Write property test for embedding consistency
    - **Property 7: Embedding pipeline consistency**
    - **Validates: Requirements 3.3**

- [x] 4. Checkpoint - Ensure all tests pass
  - Ensure all tests pass; ask the user if questions arise.

- [x] 5. Implement execution mode management
  - [x] 5.1 Implement set_execution_mode() with automatic resource management
    - AUTOPILOT: unload VLM, migrate CLIP to GPU
    - RECORDING: load VLM, migrate CLIP to CPU
    - IDLE: no automatic changes
    - _Requirements: 1.1, 1.2, 1.3, 3.1, 3.2_
  - [x] 5.2 Implement mode transition coordination
    - Ensure CLIP migrates before VLM loads
    - Handle concurrent mode changes
    - _Requirements: 3.2, 5.4_
  - [x] 5.3 Write property test for mode transitions
    - **Property 1: Mode transition triggers VLM unload**
    - **Property 2: Mode transition triggers VLM load**
    - **Validates: Requirements 1.1, 1.2**
  - [x] 5.4 Write property test for CLIP in AUTOPILOT
    - **Property 3: CLIP on GPU in AUTOPILOT**
    - **Validates: Requirements 1.3, 3.1**
  - [x] 5.5 Write property test for migration ordering
    - **Property 6: CLIP migration ordering**
    - **Validates: Requirements 3.2**
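
The mode rules in 5.1 and the ordering constraint in 5.2 can be sketched as follows. The VLM and CLIP operations are stubs that only record what they would do. `ModeCoordinator`, its attributes, and the `"cuda"` device string are hypothetical and used only for illustration.

```python
import asyncio
from enum import Enum
from typing import List


class ExecutionMode(Enum):
    IDLE = "idle"
    RECORDING = "recording"
    AUTOPILOT = "autopilot"


class ModeCoordinator:
    def __init__(self) -> None:
        self.mode = ExecutionMode.IDLE
        self.vlm_loaded = False
        self.clip_device = "cpu"
        self.log: List[str] = []       # records the order of operations
        self._lock = asyncio.Lock()    # serializes concurrent mode changes

    async def _load_vlm(self) -> None:
        self.vlm_loaded = True
        self.log.append("load_vlm")

    async def _unload_vlm(self) -> None:
        self.vlm_loaded = False
        self.log.append("unload_vlm")

    async def _migrate_clip(self, device: str) -> None:
        self.clip_device = device
        self.log.append(f"clip->{device}")

    async def set_execution_mode(self, mode: ExecutionMode) -> None:
        async with self._lock:
            if mode is ExecutionMode.AUTOPILOT:
                # Free VRAM first, then give the GPU to CLIP.
                await self._unload_vlm()
                await self._migrate_clip("cuda")
            elif mode is ExecutionMode.RECORDING:
                # CLIP must leave the GPU before the VLM loads.
                await self._migrate_clip("cpu")
                await self._load_vlm()
            # IDLE: no automatic resource changes.
            self.mode = mode
```

Holding one lock across the whole transition means a second mode change cannot interleave with the first, so the CLIP-before-VLM order is preserved under concurrency.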

- [x] 6. Implement idle timeout management
  - [x] 6.1 Add idle timeout tracking in GPUResourceManager
    - Track last VLM request timestamp
    - Implement background timer for timeout check
    - _Requirements: 4.1, 4.3_
  - [x] 6.2 Implement on-demand VLM loading
    - Intercept VLM requests when unloaded
    - Load VLM before processing request
    - _Requirements: 4.2_
  - [x] 6.3 Write property test for idle timeout
    - **Property 8: Idle timeout behavior**
    - **Validates: Requirements 4.1, 4.3**
  - [x] 6.4 Write property test for on-demand loading
    - **Property 9: On-demand VLM loading**
    - **Validates: Requirements 4.2**
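
A minimal sketch of the timestamp tracking in 6.1, assuming a monotonic clock and a timeout in seconds. `IdleTracker` and its method names are hypothetical. The clock is injectable so a property test can advance time without sleeping.

```python
import time
from typing import Callable


class IdleTracker:
    def __init__(
        self,
        timeout_s: float,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._timeout_s = timeout_s
        self._clock = clock
        self._last_request = clock()

    def touch(self) -> None:
        """Record that a VLM request just happened."""
        self._last_request = self._clock()

    def is_expired(self) -> bool:
        """True once the idle period has reached the timeout."""
        return self._clock() - self._last_request >= self._timeout_s
```

A background timer would call `is_expired()` periodically and trigger the unload. The on-demand path in 6.2 would call `touch()` on every request.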

- [x] 7. Implement monitoring and events
  - [x] 7.1 Implement get_status() returning complete GPUResourceStatus
    - Include all fields: vram, vlm_state, clip_device, execution_mode
    - _Requirements: 2.1_
  - [x] 7.2 Implement event emission system
    - resource_changed, mode_changed, idle_unload events
    - VRAM change threshold detection (100 MB)
    - _Requirements: 2.2, 2.3, 4.4_
  - [x] 7.3 Write property test for status completeness
    - **Property 5: Status query completeness**
    - **Validates: Requirements 2.1**
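
One way to read the 100 MB threshold in 7.2 is shown below: emit `resource_changed` only when VRAM usage has moved at least 100 MB since the last reported value. Whether the comparison is inclusive, and the payload keys, are assumptions. `ResourceEvents` is a hypothetical name.

```python
from typing import Callable, Dict, List, Optional

Listener = Callable[[dict], None]


class ResourceEvents:
    def __init__(self, threshold_mb: int = 100) -> None:
        self._threshold_mb = threshold_mb
        self._last_reported_mb: Optional[int] = None
        self._listeners: Dict[str, List[Listener]] = {}

    def on(self, event: str, listener: Listener) -> None:
        self._listeners.setdefault(event, []).append(listener)

    def emit(self, event: str, payload: dict) -> None:
        # Copy the list so listeners can unsubscribe while being called.
        for listener in list(self._listeners.get(event, [])):
            listener(payload)

    def observe_vram(self, used_mb: int) -> bool:
        """Feed a VRAM sample; return True if an event was emitted."""
        if self._last_reported_mb is None:
            # The first sample only sets the baseline.
            self._last_reported_mb = used_mb
            return False
        delta = used_mb - self._last_reported_mb
        if abs(delta) < self._threshold_mb:
            return False
        self._last_reported_mb = used_mb
        self.emit("resource_changed", {"used_mb": used_mb, "delta_mb": delta})
        return True
```

Comparing against the last reported value, not the previous sample, keeps a slow drift from slipping under the threshold indefinitely.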

- [x] 8. Implement error handling and degraded mode
  - [x] 8.1 Implement graceful degradation for missing GPU
    - Detect GPU availability at startup
    - Force CPU-only mode if no GPU
    - _Requirements: 6.1_
  - [x] 8.2 Implement handling for Ollama unavailability
    - Connection retry logic
    - Degraded mode flag and reason
    - _Requirements: 1.5, 6.2, 6.3_
  - [x] 8.3 Implement insufficient-VRAM error handling
    - Check VRAM before operations
    - Return informative errors
    - _Requirements: 6.4_
  - [x] 8.4 Write property test for sequential processing
    - **Property 13: Sequential operation processing**
    - **Validates: Requirements 5.4**

- [x] 9. Checkpoint - Ensure all tests pass
  - Ensure all tests pass; ask the user if questions arise.

- [x] 10. Integration with existing components
  - [x] 10.1 Integrate GPUResourceManager with ExecutionLoop
    - Call set_execution_mode() on mode changes
    - Use ensure_vlm_loaded() before VLM operations
    - _Requirements: 1.1, 1.2, 4.2_
  - [x] 10.2 Integrate with UIDetector
    - Check VLM availability before classification
    - Handle degraded mode gracefully
    - _Requirements: 1.5, 6.2_
  - [x] 10.3 Integrate with FusionEngine/CLIP embedding
    - Use CLIPManager for device-aware embeddings
    - Reinitialize on device change
    - _Requirements: 3.3_
  - [x] 10.4 Update core/config.py with GPU resource configuration
    - Add GPUResourceConfig to AppConfig
    - Support environment variables
    - _Requirements: 4.3_
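
Environment-variable support for `GPUResourceConfig` (task 10.4) might look like the sketch below. The field names, the `RPA_GPU_*` variable names, and the 300-second idle default are all assumptions. Only the 100 MB threshold and the three retries come from this plan.

```python
import os
from dataclasses import dataclass
from typing import Mapping


@dataclass(frozen=True)
class GPUResourceConfig:
    idle_timeout_s: float = 300.0        # assumed default
    vram_change_threshold_mb: int = 100  # from task 7.2
    max_load_retries: int = 3            # from task 2.1

    @classmethod
    def from_env(cls, env: Mapping[str, str] = os.environ) -> "GPUResourceConfig":
        """Build a config, letting environment variables override defaults."""
        defaults = cls()
        return cls(
            idle_timeout_s=float(
                env.get("RPA_GPU_IDLE_TIMEOUT_S", defaults.idle_timeout_s)
            ),
            vram_change_threshold_mb=int(
                env.get("RPA_GPU_VRAM_THRESHOLD_MB", defaults.vram_change_threshold_mb)
            ),
            max_load_retries=int(
                env.get("RPA_GPU_MAX_LOAD_RETRIES", defaults.max_load_retries)
            ),
        )
```

Passing the environment as a mapping keeps the parser testable without mutating `os.environ`.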

- [x] 11. Final Checkpoint - Ensure all tests pass
  - Ensure all tests pass; ask the user if questions arise.