perf(grounding): pHash-only reflex + max_new_tokens 64

Reflex check: triggered only when the pHash changes (unexpected popup);
no more systematic full-screen OCR at every step. Saves ~9 s per workflow.
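The change-detection reflex described above can be sketched in a few lines. This is an illustrative, dependency-free sketch only: the real pipeline hashes screenshots with pHash, while here a toy average-hash over a flat pixel list stands in, and all names (`ReflexGuard`, `THRESHOLD`, `should_run_ocr`) are hypothetical, not taken from the repo.

```python
THRESHOLD = 4  # Hamming distance above which the screen counts as changed

def average_hash(pixels):
    """Toy perceptual hash: one bit per pixel, set if the pixel is above the mean."""
    avg = sum(pixels) / len(pixels)
    return tuple(p > avg for p in pixels)

def hamming(a, b):
    """Number of differing bits between two hashes."""
    return sum(x != y for x, y in zip(a, b))

class ReflexGuard:
    """Runs the expensive check (full-screen OCR) only when the hash changes."""

    def __init__(self):
        self.last = None

    def should_run_ocr(self, pixels):
        h = average_hash(pixels)
        changed = self.last is None or hamming(h, self.last) > THRESHOLD
        self.last = h  # remember the current screen for the next step
        return changed
```

With this shape, an unchanged frame between steps short-circuits to `False` and the OCR pass is skipped, which is where the per-workflow saving comes from.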

Grounding server: max_new_tokens 256→64 (the response is ~20 tokens).

Validated: 5+ consecutive runs at 7/7, learning active
(CR_patient via fast_exact_text in 2.2 s, Feuille calcul via template in 83 ms).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Dom
2026-04-26 03:07:35 +02:00
parent 73cea2385e
commit 90007cc7c1
2 changed files with 25 additions and 20 deletions


@@ -360,7 +360,7 @@ def ground(req: GroundRequest):
     # Inference
     t0 = time.time()
     with torch.no_grad():
-        gen = _model.generate(**inputs, max_new_tokens=256)
+        gen = _model.generate(**inputs, max_new_tokens=64)
     infer_ms = (time.time() - t0) * 1000
     # Decoder