feat(evaluation): add local Ollama LeaBench adapter

2026-05-24 21:58:06 +02:00
parent 6544ebe3f0
commit debd7b423c
4 changed files with 498 additions and 0 deletions
--- a/benchmarks/computer_use/README.md
+++ b/benchmarks/computer_use/README.md
@@ -59,6 +59,16 @@ python3 tools/lea_bench.py \
  --json
 ```

+Generate predictions with a local Ollama instance:
+
+```bash
+python3 tools/lea_bench_ollama.py \
+  --cases benchmarks/computer_use/cases/notepad_replay_failures_2026-05-24.jsonl \
+  --repo-root . \
+  --model qwen2.5vl:7b-rpa \
+  --output benchmarks/computer_use/predictions/qwen25vl_notepad.jsonl
+```
+
 ## Strategic role

 This benchmark avoids choosing a model based on impressions. We measure: