perf(grounding): pHash-only reflex check + max_new_tokens 64
Reflex check: now triggered only when the pHash changes (unexpected popup), instead of a systematic full-screen OCR at every step. Saves ~9 s per workflow. Grounding server: max_new_tokens 256→64 (the response is ~20 tokens). Validated: 5+ consecutive 7/7 test runs, learning active (CR_patient via fast_exact_text in 2.2 s, Feuille calcul via template in 83 ms). Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@@ -360,7 +360,7 @@ def ground(req: GroundRequest):
     # Inference
     t0 = time.time()
     with torch.no_grad():
-        gen = _model.generate(**inputs, max_new_tokens=256)
+        gen = _model.generate(**inputs, max_new_tokens=64)
     infer_ms = (time.time() - t0) * 1000
 
     # Decoder
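The pHash-only reflex described in the commit message can be sketched as follows: hash each frame perceptually and run the expensive check (OCR, popup handling) only when the hash drifts past a threshold. This is a minimal illustration with a pure-Python average hash; the names (`average_hash`, `should_run_reflex_check`, `REFLEX_THRESHOLD`) are hypothetical and not the project's actual API.

```python
def average_hash(pixels):
    """64-bit average hash of an 8x8 grayscale frame (list of 64 ints, 0-255).
    Each bit is 1 if the pixel is at or above the frame's mean brightness."""
    mean = sum(pixels) / len(pixels)
    bits = 0
    for p in pixels:
        bits = (bits << 1) | (1 if p >= mean else 0)
    return bits

def hamming(a, b):
    """Number of differing bits between two hashes."""
    return bin(a ^ b).count("1")

# Illustrative threshold: small drift (anti-aliasing, cursor blink) is ignored;
# a larger change (e.g. an unexpected popup) triggers the reflex check.
REFLEX_THRESHOLD = 5

def should_run_reflex_check(prev_hash, frame_pixels):
    """Return (trigger, current_hash); trigger only on a real screen change."""
    cur = average_hash(frame_pixels)
    return hamming(prev_hash, cur) > REFLEX_THRESHOLD, cur
```

Because hashing is a few arithmetic operations per frame while full-screen OCR is seconds, gating the check on hash drift is where the ~9 s/workflow saving comes from.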