perf(grounding): pHash-only reflex + max_new_tokens 64

Reflex check: triggered only when the pHash changes (unexpected popup);
no more systematic full-screen OCR at every step. Saves ~9 s per workflow.
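The change-detection reflex described above can be sketched in a few lines. This is an illustrative, dependency-free sketch only: the real pipeline hashes screenshots with pHash, while here a toy average-hash over a flat pixel list stands in, and all names (`ReflexGuard`, `THRESHOLD`, `should_run_ocr`) are hypothetical, not taken from the repo.

```python
THRESHOLD = 4  # Hamming distance above which the screen counts as changed

def average_hash(pixels):
    """Toy perceptual hash: one bit per pixel, set if the pixel is above the mean."""
    avg = sum(pixels) / len(pixels)
    return tuple(p > avg for p in pixels)

def hamming(a, b):
    """Number of differing bits between two hashes."""
    return sum(x != y for x, y in zip(a, b))

class ReflexGuard:
    """Runs the expensive check (full-screen OCR) only when the hash changes."""

    def __init__(self):
        self.last = None

    def should_run_ocr(self, pixels):
        h = average_hash(pixels)
        changed = self.last is None or hamming(h, self.last) > THRESHOLD
        self.last = h  # remember the current screen for the next step
        return changed
```

With this shape, an unchanged frame between steps short-circuits to `False` and the OCR pass is skipped, which is where the per-workflow saving comes from.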

Grounding server: max_new_tokens 256→64 (the response is ~20 tokens).

Validated: 5+ consecutive runs at 7/7, learning active
(CR_patient via fast_exact_text in 2.2 s, Feuille calcul via template in 83 ms).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Dom
2026-04-26 03:07:35 +02:00
parent 73cea2385e
commit 90007cc7c1
2 changed files with 25 additions and 20 deletions


@@ -360,7 +360,7 @@ def ground(req: GroundRequest):
     # Inference
     t0 = time.time()
     with torch.no_grad():
-        gen = _model.generate(**inputs, max_new_tokens=256)
+        gen = _model.generate(**inputs, max_new_tokens=64)
     infer_ms = (time.time() - t0) * 1000
     # Decoder