fix(grounding): semantically derived grounding confidence (DETTE-019)

The score/confidence hard-coded to 0.85 in _resolve_by_grounding made the threshold guard (_RESOLUTION_MIN_SCORES["grounding"]=0.60) ineffective: 0.85 > 0.60, so every result was accepted. VLM grounding has no native model confidence (the prompt asks for {"x","y"} and no localization logprob is extracted; confirmed QG Qwen 2026-06-15). We derive a SEMANTIC confidence instead: is the target text at the position found? (_validate_text_at_position). Confirmed → 0.90, absent → 0.45 (below threshold → rejected), not verifiable → 0.70. This is a documented contextual confidence, NOT a model probability.

TDD: 5 tests (score varies / present accepted / absent rejected / score == confidence / neutral without by_text), RED→GREEN. Non-regression: 24 resolve_engine tests + qwen3vl wiring + legacy bbox, all green. E2E panel unchanged (15/15). OCR pre-check not affected. DETTE-018 (legacy path not guarded) remains separate.

refs DETTE-019
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-15 09:17:46 +02:00
parent c0e4c382be
commit 33c1e2e0d1
2 changed files with 186 additions and 2 deletions
--- a/agent_v0/server_v1/resolve_engine.py
+++ b/agent_v0/server_v1/resolve_engine.py
@@ -870,6 +870,50 @@ def _vlm_quick_find(
 # Resolution via direct VLM grounding (configurable via RPA_VLM_MODEL)
 # ---------------------------------------------------------------------------

+# DETTE-019: grounding confidence is DERIVED (NOT a native model confidence).
+# VLM grounding provides no usable confidence: the prompt asks for
+# {"x","y"} and no localization logprob is extracted (confirmed QG Qwen
+# 2026-06-15). The only REAL confidence signal is semantic: is the target text
+# actually at the position found? We derive it with the same OCR check as the
+# downstream pre-check (`_validate_text_at_position`). Approach validated by Dom.
+# ⚠ CONTEXTUAL confidence, not a model probability: do not display it
+# as "VLM confidence" on the dashboard.
+_GROUNDING_CONF_TEXT_CONFIRMED = 0.90   # target text found at the position
+_GROUNDING_CONF_UNVERIFIABLE = 0.70     # no verifiable text → neutral (> 0.60 threshold)
+_GROUNDING_CONF_TEXT_ABSENT = 0.45      # target text absent → < 0.60 threshold → rejected
+
+
+def _grounding_semantic_confidence(
+    screenshot_path: str,
+    x_pct: float,
+    y_pct: float,
+    by_text: str,
+    screen_width: int,
+    screen_height: int,
+) -> float:
+    """DERIVED (semantic) confidence for a grounding result (DETTE-019).
+
+    A contextual measure, NOT a model confidence: is the target text `by_text`
+    present at position (x_pct, y_pct)? Reuses the OCR guard from the
+    downstream pre-check (`_validate_text_at_position`).
+
+    - text confirmed            → CONFIRMED (accepted)
+    - text absent               → ABSENT (< threshold → rejected by
+                                  `_validate_resolution_quality`)
+    - no by_text / OCR failure  → UNVERIFIABLE (neutral, > threshold: no false rejection)
+    """
+    by_text = (by_text or "").strip()
+    if not by_text:
+        return _GROUNDING_CONF_UNVERIFIABLE
+    try:
+        is_valid, _observed, _ms = _validate_text_at_position(
+            screenshot_path, x_pct, y_pct, by_text, screen_width, screen_height,
+        )
+    except Exception as e:  # OCR unavailable: degrade gracefully, no penalty
+        logger.debug("Grounding confidence: semantic check unavailable (%s) → neutral", e)
+        return _GROUNDING_CONF_UNVERIFIABLE
+    return _GROUNDING_CONF_TEXT_CONFIRMED if is_valid else _GROUNDING_CONF_TEXT_ABSENT
+

 def _resolve_by_grounding(
    screenshot_path: str,
@@ -1113,6 +1157,13 @@ def _resolve_by_grounding(
            _grounding_model, description[:50], x_pct, y_pct, elapsed,
        )

+    # DETTE-019: semantically DERIVED confidence (is the target text at the
+    # position?) replaces the hard-coded score. Keeps score == confidence.
+    _conf = _grounding_semantic_confidence(
+        screenshot_path, round(x_pct, 6), round(y_pct, 6),
+        by_text, screen_width, screen_height,
+    )
+
    return {
        "resolved": True,
         # method guarded by _RESOLUTION_MIN_SCORES: in qwen3vl mode, "grounding"
@@ -1125,9 +1176,9 @@ def _resolve_by_grounding(
            "label": description[:60],
            "type": "grounding",
            "role": "grounding_vlm",
-            "confidence": 0.85,
+            "confidence": _conf,
        },
-        "score": 0.85,
+        "score": _conf,
    }
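
The three outcomes described in the commit message can be exercised without an OCR backend. The sketch below is not the project's test suite: it copies the constants from the diff, but `semantic_confidence`, the injected `validator` parameter, the stub validators and `passes_threshold` are hypothetical names introduced for illustration. The `>=` comparison against 0.60 is also an assumption, since `_validate_resolution_quality` is not shown here.

```python
from typing import Callable, Optional, Tuple

# Values copied from the diff above.
CONF_TEXT_CONFIRMED = 0.90
CONF_UNVERIFIABLE = 0.70
CONF_TEXT_ABSENT = 0.45
# _RESOLUTION_MIN_SCORES["grounding"] according to the commit message.
GROUNDING_MIN_SCORE = 0.60

# Hypothetical validator type: returns (is_valid, observed_text, elapsed_ms),
# mirroring how the diff unpacks the result of _validate_text_at_position.
Validator = Callable[[str, float, float, str, int, int], Tuple[bool, str, float]]


def semantic_confidence(
    validator: Validator,
    screenshot_path: str,
    x_pct: float,
    y_pct: float,
    by_text: Optional[str],
    screen_width: int,
    screen_height: int,
) -> float:
    """Same branching as the diff, with the OCR check injected (assumption)."""
    by_text = (by_text or "").strip()
    if not by_text:
        # Nothing to verify: neutral score, stays above the threshold.
        return CONF_UNVERIFIABLE
    try:
        is_valid, _observed, _ms = validator(
            screenshot_path, x_pct, y_pct, by_text, screen_width, screen_height
        )
    except Exception:
        # OCR unavailable: neutral score rather than a penalty.
        return CONF_UNVERIFIABLE
    return CONF_TEXT_CONFIRMED if is_valid else CONF_TEXT_ABSENT


def passes_threshold(score: float) -> bool:
    """Assumed form of the threshold guard (>= is a guess, not from the source)."""
    return score >= GROUNDING_MIN_SCORE


# Stub validators standing in for the OCR check.
def found(path, x, y, text, w, h):
    return True, text, 1.0


def missing(path, x, y, text, w, h):
    return False, "", 1.0


def broken(path, x, y, text, w, h):
    raise RuntimeError("OCR unavailable")
```

Calling `semantic_confidence(missing, "shot.png", 0.5, 0.5, "Save", 1920, 1080)` yields the ABSENT value, which `passes_threshold` rejects. With the previous fixed 0.85, no input could be rejected.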