chore(dgx): snapshot consolidation WIP pour transfert poc DGX
Some checks failed
tests / Lint (ruff + black) (push) Failing after 1m44s
tests / Tests unitaires (sans GPU) (push) Failing after 1m49s
tests / Tests sécurité (critique) (push) Has been skipped

Regroupe le WIP non committé requis pour le clone/runtime DGX (Option A) :
- api_stream.py : préflight replay + smoke santé modèles + handler 403 WP-B
- de-hardcode VLM : vlm_config, gpu/*, vram_orchestrator, ollama_manager
- stream_processor, semantic_matcher, agent_chat (app/planner/intent)
- workflows.db (acquis ; le transfert artifacts le mettra à jour + rewrite chemins)
- docs : plans DGX, benchmarks VLM/grounders, recherche SOTA, coordination 8 juin

Snapshot destiné à la branche poc-dgx poussée sur Gitea pour cloner le DGX.
Scan anti-secret : clean. graphify (repo embarqué) exclu.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
This commit is contained in:
Dom
2026-06-08 16:33:58 +02:00
parent f18de016d7
commit 6d34b3cb68
204 changed files with 15744 additions and 47 deletions

View File

@@ -2,7 +2,7 @@
GPU Resource Management Module for RPA Vision V3
This module provides dynamic GPU resource allocation between ML models:
- Ollama VLM (gemma4:e4b par défaut, configurable via RPA_VLM_MODEL) for UI classification
- Ollama VLM (modèle central configurable via RPA_VLM_MODEL) for UI classification
- CLIP (ViT-B-32) for embedding matching
The GPUResourceManager optimizes VRAM usage by:

View File

@@ -2,7 +2,7 @@
GPU Resource Manager - Central orchestrator for GPU resource allocation
Manages dynamic allocation of GPU resources between:
- Ollama VLM (gemma4:e4b par défaut) - ~10 GB VRAM for UI classification
- Ollama VLM (modèle reasoning/VLM central) - ~10 GB VRAM for UI classification
- CLIP (ViT-B-32) - ~500 MB VRAM for embedding matching
Optimizes VRAM usage based on execution mode:
@@ -21,6 +21,8 @@ from datetime import datetime
from enum import Enum
from typing import Any, Callable, Dict, Iterator, List, Optional
from core.detection.vlm_config import get_reasoning_model
logger = logging.getLogger(__name__)
@@ -54,7 +56,7 @@ class VRAMInfo:
class GPUResourceConfig:
"""Configuration for GPU resource management."""
ollama_endpoint: str = "http://localhost:11434"
vlm_model: str = "gemma4:e4b"
vlm_model: str = field(default_factory=get_reasoning_model)
clip_model: str = "ViT-B-32"
idle_timeout_seconds: int = 300 # 5 minutes
vram_threshold_for_clip_gpu_mb: int = 1024 # 1 GB

View File

@@ -13,6 +13,8 @@ from typing import List, Optional
import aiohttp
from core.detection.vlm_config import get_reasoning_model
logger = logging.getLogger(__name__)
@@ -32,7 +34,7 @@ class OllamaManager:
def __init__(
self,
endpoint: str = "http://localhost:11434",
model: str = "gemma4:e4b",
model: Optional[str] = None,
default_keep_alive: str = "5m"
):
"""
@@ -44,7 +46,7 @@ class OllamaManager:
default_keep_alive: Default keep-alive duration
"""
self._endpoint = endpoint.rstrip("/")
self._model = model
self._model = model or get_reasoning_model()
self._default_keep_alive = default_keep_alive
self._session: Optional[aiohttp.ClientSession] = None