fix(grounding): disable VRAM orchestrator during execution + use qwen2.5vl:3b for description

The VRAM orchestrator was restarting Ollama mid-execution, which caused timeouts. It is now disabled while the workflow runs and remains available for manual switching before or after. The anchor description now uses qwen2.5vl:3b (3 GB) instead of 7b; it fits in VRAM without unloading CLIP or RF-DETR.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
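
As a rough illustration of the anchor-description call after this change, here is a minimal standalone sketch. It is not the project's code: the helper names `build_anchor_payload` and `describe_anchor`, the default prompt text, and the `data:` prefix check are assumptions made for the example, and it uses the standard library's `urllib` instead of `requests` so it runs without extra dependencies. The model name, the options, the 30-second timeout, and the `OLLAMA_URL` default are taken from the diff.

```python
import json
import os
import urllib.request
from typing import Optional

# Pinned small model, as in this commit (no environment override).
ANCHOR_MODEL = "qwen2.5vl:3b"


def build_anchor_payload(anchor_image_base64: str, prompt: str) -> dict:
    """Build the JSON body for Ollama's /api/generate (hypothetical helper)."""
    # Assumption: strip a data-URL prefix such as "data:image/png;base64,".
    if anchor_image_base64.startswith("data:"):
        anchor_image_base64 = anchor_image_base64.split(",", 1)[1]
    return {
        "model": ANCHOR_MODEL,
        "prompt": prompt,
        "images": [anchor_image_base64],
        "stream": False,
        "options": {"temperature": 0.1, "num_predict": 20},
    }


def describe_anchor(
    anchor_image_base64: str,
    prompt: str = "Describe this UI element in a few words.",  # assumed prompt
    timeout: float = 30.0,
) -> Optional[str]:
    """Ask the local Ollama server for a short description; None on failure."""
    ollama_url = os.environ.get("OLLAMA_URL", "http://localhost:11434")
    body = json.dumps(build_anchor_payload(anchor_image_base64, prompt))
    request = urllib.request.Request(
        f"{ollama_url}/api/generate",
        data=body.encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            data = json.loads(response.read())
    except (OSError, ValueError):
        # Covers connection errors, timeouts, HTTP errors and invalid JSON.
        return None
    text = data.get("response", "").strip()
    return text or None
```

Pinning the model in a constant mirrors the diff's choice to stop reading `RPA_REASONING_MODEL` for this call, so the description step cannot be switched back to the 7b model by configuration.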
2026-04-21 10:16:27 +02:00
parent d1b556b6cd
commit 203e5cc6c1
1 changed file with 3 additions and 6 deletions
--- a/core/execution/input_handler.py
+++ b/core/execution/input_handler.py
@@ -286,10 +286,6 @@ Si tu vois un dialogue ou une popup, indique quel bouton cliquer.
 Si l'écran est normal sans action nécessaire, réponds action="nothing".
 Réponds UNIQUEMENT le JSON, pas d'explication."""

-        from core.cognition.vram_orchestrator import get_orchestrator
-        orch = get_orchestrator()
-        orch.ensure_reasoning_ready()
-
        ollama_url = os.environ.get("OLLAMA_URL", "http://localhost:11434")
        model = os.environ.get("RPA_REASONING_MODEL", "qwen2.5vl:7b")

@@ -402,8 +398,9 @@ def _describe_anchor_image(anchor_image_base64: str) -> Optional[str]:
            anchor_image_base64 = anchor_image_base64.split(',', 1)[1]

        ollama_url = os.environ.get("OLLAMA_URL", "http://localhost:11434")
-        model = os.environ.get("RPA_REASONING_MODEL", "qwen2.5vl:7b")
+        model = "qwen2.5vl:3b"

+        logger.info(f"[Grounding] Description ancre via {model}...")
        response = requests.post(
            f"{ollama_url}/api/generate",
            json={
@@ -413,7 +410,7 @@ def _describe_anchor_image(anchor_image_base64: str) -> Optional[str]:
                "stream": False,
                "options": {"temperature": 0.1, "num_predict": 20}
            },
-            timeout=15
+            timeout=30
        )

        if response.status_code == 200: