# Implementation Plan

- [x] 1. Set up project structure and core interfaces
  - [x] 1.1 Create core/gpu/gpu_resource_manager.py with GPUResourceManager class skeleton
    - Define ExecutionMode, ModelState enums
    - Define GPUResourceConfig, GPUResourceStatus, VRAMInfo dataclasses
    - Implement singleton pattern
    - _Requirements: 2.1, 5.1, 5.2, 5.3_
  - [x] 1.2 Create core/gpu/ollama_manager.py with OllamaManager class
    - Implement Ollama API client (load_model, unload_model, is_model_loaded)
    - Add connection health check
    - _Requirements: 1.1, 1.2, 1.5_
  - [x] 1.3 Create core/gpu/vram_monitor.py with VRAMMonitor class
    - Implement pynvml wrapper for VRAM queries
    - Add fallback for systems without GPU
    - _Requirements: 1.4, 2.1, 6.1_
  - [x] 1.4 Write property test for OllamaManager
    - **Property 10: ensure_vlm_loaded blocking**
    - **Property 11: ensure_vlm_unloaded blocking**
    - **Validates: Requirements 5.1, 5.2**

- [x] 2. Implement VLM lifecycle management
  - [x] 2.1 Implement ensure_vlm_loaded() in GPUResourceManager
    - Add async loading with timeout
    - Implement retry logic (max 3 retries)
    - Queue concurrent requests
    - _Requirements: 5.1, 5.4, 6.2_
  - [x] 2.2 Implement ensure_vlm_unloaded() in GPUResourceManager
    - Add async unloading with timeout
    - Verify VRAM decrease
    - _Requirements: 5.2, 1.4_
  - [x] 2.3 Write property test for VLM lifecycle
    - **Property 4: VRAM decrease on VLM unload**
    - **Validates: Requirements 1.4**
  - [x] 2.4 Write property test for blocking behavior
    - **Property 10: ensure_vlm_loaded blocking**
    - **Property 11: ensure_vlm_unloaded blocking**
    - **Validates: Requirements 5.1, 5.2**

- [x] 3. Implement CLIP device management
  - [x] 3.1 Create core/gpu/clip_manager.py with CLIPManager class
    - Implement device detection and migration
    - Add pipeline reinitialization
    - _Requirements: 3.1, 3.3, 3.4_
  - [x] 3.2 Implement migrate_clip_to_gpu() and migrate_clip_to_cpu()
    - Check VRAM availability before GPU migration
    - Handle migration failures gracefully
    - _Requirements: 3.1, 3.2, 3.4_
  - [x] 3.3 Write property test for CLIP device
    - **Property 12: get_clip_device validity**
    - **Validates: Requirements 5.3**
  - [x] 3.4 Write property test for embedding consistency
    - **Property 7: Embedding pipeline consistency**
    - **Validates: Requirements 3.3**

- [x] 4. Checkpoint - Ensure all tests pass
  - Ensure all tests pass, ask the user if questions arise.

- [x] 5. Implement execution mode management
  - [x] 5.1 Implement set_execution_mode() with automatic resource management
    - AUTOPILOT: unload VLM, migrate CLIP to GPU
    - RECORDING: load VLM, migrate CLIP to CPU
    - IDLE: no automatic changes
    - _Requirements: 1.1, 1.2, 1.3, 3.1, 3.2_
  - [x] 5.2 Implement mode transition coordination
    - Ensure CLIP migrates before VLM loads
    - Handle concurrent mode changes
    - _Requirements: 3.2, 5.4_
  - [x] 5.3 Write property test for mode transitions
    - **Property 1: Mode transition triggers VLM unload**
    - **Property 2: Mode transition triggers VLM load**
    - **Validates: Requirements 1.1, 1.2**
  - [x] 5.4 Write property test for CLIP in AUTOPILOT
    - **Property 3: CLIP on GPU in AUTOPILOT**
    - **Validates: Requirements 1.3, 3.1**
  - [x] 5.5 Write property test for migration ordering
    - **Property 6: CLIP migration ordering**
    - **Validates: Requirements 3.2**

- [x] 6. Implement idle timeout management
  - [x] 6.1 Add idle timeout tracking in GPUResourceManager
    - Track last VLM request timestamp
    - Implement background timer for timeout check
    - _Requirements: 4.1, 4.3_
  - [x] 6.2 Implement on-demand VLM loading
    - Intercept VLM requests when unloaded
    - Load VLM before processing request
    - _Requirements: 4.2_
  - [x] 6.3 Write property test for idle timeout
    - **Property 8: Idle timeout behavior**
    - **Validates: Requirements 4.1, 4.3**
  - [x] 6.4 Write property test for on-demand loading
    - **Property 9: On-demand VLM loading**
    - **Validates: Requirements 4.2**

- [x] 7. Implement monitoring and events
  - [x] 7.1 Implement get_status() returning complete GPUResourceStatus
    - Include all fields: vram, vlm_state, clip_device, execution_mode
    - _Requirements: 2.1_
  - [x] 7.2 Implement event emission system
    - resource_changed, mode_changed, idle_unload events
    - VRAM change threshold detection (100 MB)
    - _Requirements: 2.2, 2.3, 4.4_
  - [x] 7.3 Write property test for status completeness
    - **Property 5: Status query completeness**
    - **Validates: Requirements 2.1**

- [x] 8. Implement error handling and degraded mode
  - [x] 8.1 Implement graceful degradation for missing GPU
    - Detect GPU availability at startup
    - Force CPU-only mode if no GPU
    - _Requirements: 6.1_
  - [x] 8.2 Implement Ollama unavailable handling
    - Connection retry logic
    - Degraded mode flag and reason
    - _Requirements: 1.5, 6.2, 6.3_
  - [x] 8.3 Implement VRAM insufficient error handling
    - Check VRAM before operations
    - Return informative errors
    - _Requirements: 6.4_
  - [x] 8.4 Write property test for sequential processing
    - **Property 13: Sequential operation processing**
    - **Validates: Requirements 5.4**

- [x] 9. Checkpoint - Ensure all tests pass
  - Ensure all tests pass, ask the user if questions arise.

- [x] 10. Integration with existing components
  - [x] 10.1 Integrate GPUResourceManager with ExecutionLoop
    - Call set_execution_mode() on mode changes
    - Use ensure_vlm_loaded() before VLM operations
    - _Requirements: 1.1, 1.2, 4.2_
  - [x] 10.2 Integrate with UIDetector
    - Check VLM availability before classification
    - Handle degraded mode gracefully
    - _Requirements: 1.5, 6.2_
  - [x] 10.3 Integrate with FusionEngine/CLIP embedding
    - Use CLIPManager for device-aware embeddings
    - Reinitialize on device change
    - _Requirements: 3.3_
  - [x] 10.4 Update core/config.py with GPU resource configuration
    - Add GPUResourceConfig to AppConfig
    - Support environment variables
    - _Requirements: 4.3_

- [x] 11. Final Checkpoint - Ensure all tests pass
  - Ensure all tests pass, ask the user if questions arise.
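
The mode rules in tasks 5.1 and 5.2 can be illustrated with a small sketch. It is not the project's code: `ModeCoordinator`, the injected step callables, and `demo()` are hypothetical names used only for illustration, and the enum values are assumed. Only the mode names and the ordering rules (VLM unloads before CLIP moves to the GPU in AUTOPILOT, CLIP moves to the CPU before the VLM loads in RECORDING, IDLE changes nothing) come from the plan. Concurrent mode changes are serialized here with an `asyncio.Lock`, which is one possible way to handle them.

```python
import asyncio
from enum import Enum
from typing import Awaitable, Callable, List

# A resource operation: an async callable taking no arguments (assumed shape).
Step = Callable[[], Awaitable[None]]


class ExecutionMode(Enum):
    IDLE = "idle"            # values are assumptions; the plan names only the modes
    AUTOPILOT = "autopilot"
    RECORDING = "recording"


class ModeCoordinator:
    """Hypothetical stand-in for the mode handling in GPUResourceManager."""

    def __init__(self, load_vlm: Step, unload_vlm: Step,
                 clip_to_gpu: Step, clip_to_cpu: Step) -> None:
        self._load_vlm = load_vlm
        self._unload_vlm = unload_vlm
        self._clip_to_gpu = clip_to_gpu
        self._clip_to_cpu = clip_to_cpu
        self._lock = asyncio.Lock()  # serializes concurrent mode changes
        self.mode = ExecutionMode.IDLE

    async def set_execution_mode(self, mode: ExecutionMode) -> None:
        async with self._lock:
            if mode is ExecutionMode.AUTOPILOT:
                # Free VRAM first, then give the GPU to CLIP.
                await self._unload_vlm()
                await self._clip_to_gpu()
            elif mode is ExecutionMode.RECORDING:
                # CLIP must leave the GPU before the VLM is loaded.
                await self._clip_to_cpu()
                await self._load_vlm()
            # IDLE: no automatic resource changes.
            self.mode = mode


async def demo() -> List[str]:
    """Run two overlapping mode changes and return the order of operations."""
    log: List[str] = []

    def step(name: str) -> Step:
        async def run() -> None:
            await asyncio.sleep(0)  # yield so the second request can interleave
            log.append(name)
        return run

    coordinator = ModeCoordinator(step("load_vlm"), step("unload_vlm"),
                                  step("clip_to_gpu"), step("clip_to_cpu"))
    await asyncio.gather(
        coordinator.set_execution_mode(ExecutionMode.AUTOPILOT),
        coordinator.set_execution_mode(ExecutionMode.RECORDING),
    )
    return log
```

Calling `asyncio.run(demo())` returns the operations in the order they ran. Because the lock is held for a whole transition, the second request's steps start only after the first request's steps have finished, which is the property that tasks 5.5 and 8.4 test for.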
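
Task 2.1 asks for async loading with a timeout and retry logic. A minimal sketch of that combination follows, under stated assumptions: `load_with_retry`, `VLMLoadError`, the backoff policy, and the default timeout are hypothetical, and "max 3 retries" is read here as one initial attempt plus up to three retries. The `load_model` argument stands in for whatever coroutine actually asks Ollama to load the model, and it is assumed to return `True` on success.

```python
import asyncio
from typing import Awaitable, Callable


class VLMLoadError(RuntimeError):
    """Raised when the model is still not loaded after all attempts (hypothetical)."""


async def load_with_retry(
    load_model: Callable[[], Awaitable[bool]],
    timeout_s: float = 30.0,   # assumed default, not taken from the plan
    max_retries: int = 3,      # from task 2.1
    backoff_s: float = 1.0,    # assumed linear backoff between attempts
) -> int:
    """Try to load the model; return the number of the attempt that succeeded."""
    attempts = max_retries + 1  # initial attempt plus retries (assumption)
    for attempt in range(1, attempts + 1):
        try:
            # wait_for cancels the load and raises a timeout error if it overruns.
            if await asyncio.wait_for(load_model(), timeout=timeout_s):
                return attempt
        except (asyncio.TimeoutError, ConnectionError):
            pass  # treat timeouts and connection failures as retryable
        if attempt < attempts:
            await asyncio.sleep(backoff_s * attempt)
    raise VLMLoadError(f"VLM not loaded after {attempts} attempts")
```

Raising a dedicated error after the last attempt gives the caller a single place to switch on the degraded-mode flag described in task 8.2, rather than inspecting individual timeouts.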