Compare commits: c7b0649716...demo/ght-2 (148 commits)
| Author | SHA1 | Date |
|---|---|---|
| | f8dc3c3af4 | |
| | ca81850a20 | |
| | 35fd6cf4c5 | |
| | 7847a0e829 | |
| | 40440f1ca0 | |
| | 7233df2bb9 | |
| | f62fda575f | |
| | 22c0a2ba61 | |
| | 6fdedbfe9d | |
| | c969f93a23 | |
| | 1cbec2806e | |
| | 864530c851 | |
| | d1ebf62217 | |
| | 87dbe8c5ff | |
| | 0a02a6ec9c | |
| | 83be93e121 | |
| | f5c33477f0 | |
| | b1a3aa16f1 | |
| | 0bcfddbbc4 | |
| | aa47172f0f | |
| | 65da557310 | |
| | af13cd80ff | |
| | 7c6945171e | |
| | ca0b436a61 | |
| | fc01afa59c | |
| | 2a51a844b9 | |
| | 2d71e2a249 | |
| | fae95c5366 | |
| | 6582a69d31 | |
| | 5543e25f9d | |
| | 2a07d8084b | |
| | 35b27ae492 | |
| | b584bbabc3 | |
| | 8817f527e7 | |
| | 964856ab30 | |
| | a67d896104 | |
| | 90c1d8036f | |
| | 6261002039 | |
| | 0e6e61f2b1 | |
| | 41c1250c99 | |
| | 2af3bc3b93 | |
| | 6154423a91 | |
| | 41eba898c0 | |
| | 9452e86fd1 | |
| | 5e31cdf666 | |
| | 487bcb8618 | |
| | 3d6868f029 | |
| | f73a2a59a9 | |
| | 77faa03ec9 | |
| | 343d6fbe95 | |
| | cc64439738 | |
| | 90007cc7c1 | |
| | 73cea2385e | |
| | e2046837cf | |
| | b30d4b6656 | |
| | e4a48e78bf | |
| | ea36bba5cc | |
| | 9da589c8c2 | |
| | 16ff396dbf | |
| | e44fd7b328 | |
| | 66815b7a1a | |
| | c6b695eca8 | |
| | 99d2083dea | |
| | a718086140 | |
| | c82979e72b | |
| | 2185c41cc1 | |
| | 26804eb123 | |
| | d71d5df4a8 | |
| | 6829ad8e79 | |
| | 8903f35433 | |
| | 4ab2c15e5c | |
| | eba6fea779 | |
| | f04398d5a7 | |
| | 4ce9c47f45 | |
| | 9dfcdb5fb0 | |
| | 3efe15d2c7 | |
| | 9d87ed64c5 | |
| | 00134963e5 | |
| | 0ec5e2a25b | |
| | 0c5fffe951 | |
| | 5027ed9a23 | |
| | 6caab2c600 | |
| | 552e66dbf6 | |
| | de1026ee2e | |
| | 7b50725bf8 | |
| | 7feef3b6a9 | |
| | 0b06db222d | |
| | 74ee0dadee | |
| | 0b452f975a | |
| | 6ab385d671 | |
| | b3eab83a0f | |
| | 27490849a8 | |
| | cebbf0809a | |
| | 3e227d28ad | |
| | 8ce63fcba2 | |
| | 4202431421 | |
| | 4923623dd4 | |
| | 84181cc982 | |
| | 7355d315a3 | |
| | c50adab3a1 | |
| | 2fbb305f65 | |
| | ff581be397 | |
| | 203e5cc6c1 | |
| | d1b556b6cd | |
| | 729cd67743 | |
| | 73ddcdb29d | |
| | 14a9442343 | |
| | 5da4581e76 | |
| | cbe8dc95d2 | |
| | 04a14a56b2 | |
| | 2290f1846b | |
| | c57b40ae1d | |
| | bc21b27da7 | |
| | 6a2248ddcd | |
| | 82d7b38cff | |
| | 6c7f88c05d | |
| | 447fbb2c6e | |
| | 623be15bfe | |
| | 55d5aebbd2 | |
| | 73b731fef8 | |
| | ffd97ae9a5 | |
| | d168833609 | |
| | 23a06a744c | |
| | af4eae28b9 | |
| | c198c930a1 | |
| | e3efef2fe7 | |
| | 95fddeebb3 | |
| | 71523cebd3 | |
| | 3aa806a630 | |
| | 588c8f22c1 | |
| | 3d243d731d | |
| | 2431a6c9e9 | |
| | 969236da03 | |
| | f30461b88c | |
| | f34eca20f9 | |
| | 309dfd5287 | |
| | f5a672d7b9 | |
| | 1acea85fa6 | |
| | 4f61741420 | |
| | 2fa864b5c7 | |
| | 10739c33fa | |
| | 39bea1b042 | |
| | 26b4e6d8ce | |
| | 4fb84b1090 | |
| | 7f2bc6fe97 | |
| | eded968c70 | |
| | 53d29d9b24 | |
| | 690053bd57 | |
@@ -46,6 +46,14 @@ LOGS_PATH=logs
|
|||||||
UPLOADS_PATH=data/training/uploads
|
UPLOADS_PATH=data/training/uploads
|
||||||
SESSIONS_PATH=data/training/sessions
|
SESSIONS_PATH=data/training/sessions
|
||||||
|
|
||||||
|
# ============================================================================
|
||||||
|
# Feedback Bus (Léa speaks during execution)
|
||||||
|
# ============================================================================
|
||||||
|
# Unified SocketIO bus 'lea:*' (action_started, action_done, need_confirm, paused).
|
||||||
|
# Disabled by default. Set to 1 to enable real-time bubbles in ChatWindow.
|
||||||
|
# If the bus connection fails, execution continues normally (fail-safe).
|
||||||
|
LEA_FEEDBACK_BUS=0
|
||||||
|
|
||||||
# ============================================================================
|
# ============================================================================
|
||||||
# FAISS
|
# FAISS
|
||||||
# ============================================================================
|
# ============================================================================
|
||||||
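The `LEA_FEEDBACK_BUS=0` entry in the hunk above is an on/off switch read from the environment. The following is a minimal sketch of how such a flag can be parsed, assuming the truthy spellings that the Python side of this diff accepts (`1`, `true`, `yes`, `on`). The helper name `env_flag` is hypothetical and does not appear in the repository.

```python
import os

# Spellings treated as "enabled" (same set as the Python hunk in this diff).
_TRUTHY = ("1", "true", "yes", "on")


def env_flag(name: str, default: str = "0") -> bool:
    """Return True when the environment variable holds a truthy spelling."""
    return os.environ.get(name, default).lower() in _TRUTHY
```

With this shape, an unset variable behaves like `0`, so the bus stays off unless it is explicitly enabled.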
|
|||||||
@@ -33,6 +33,10 @@ env:
|
|||||||
# Execution modules sometimes read these vars; neutral values in CI.
|
# Execution modules sometimes read these vars; neutral values in CI.
|
||||||
RPA_VISION_CI: "1"
|
RPA_VISION_CI: "1"
|
||||||
RPA_AUTH_VAULT_PATH: "/tmp/ci_vault.enc"
|
RPA_AUTH_VAULT_PATH: "/tmp/ci_vault.enc"
|
||||||
|
# api_stream.py has a fail-closed P0-C check: if RPA_API_TOKEN is missing, it calls sys.exit(1)
|
||||||
|
# at module load. A dummy token is provided so that imports succeed in CI.
|
||||||
|
# (The token is never actually used; the tests mock the requests.)
|
||||||
|
RPA_API_TOKEN: "ci_test_token_not_used_for_real_auth_just_to_pass_import_check_0123456789"
|
||||||
|
|
||||||
jobs:
|
jobs:
|
||||||
# ----------------------------------------------------------------
|
# ----------------------------------------------------------------
|
||||||
@@ -69,9 +73,17 @@ jobs:
|
|||||||
- name: Ruff (lint rapide)
|
- name: Ruff (lint rapide)
|
||||||
run: |
|
run: |
|
||||||
if command -v ruff >/dev/null 2>&1; then
|
if command -v ruff >/dev/null 2>&1; then
|
||||||
# Ruff: limited to critical errors (E9, F63, F7, F82) to
|
# Ruff: critical errors only (E9 syntax, F63 invalid print,
|
||||||
# avoid noise. Dom can tighten this progressively.
|
# F7 syntax, F82 undefined in __all__).
|
||||||
|
# F821 (undefined name) deliberately excluded while the pre-existing
|
||||||
|
# technical debt is cleaned up (see docs/STATUS.md).
|
||||||
|
# Legacy directories excluded:
|
||||||
|
# - agent_v0/deploy/windows_client/: obsolete clone (marked OBSOLÈTE)
|
||||||
|
# - tests/property/: known broken tests (cf. MEMORY.md)
|
||||||
ruff check --select=E9,F63,F7,F82 --output-format=github \
|
ruff check --select=E9,F63,F7,F82 --output-format=github \
|
||||||
|
--exclude "agent_v0/deploy/windows_client" \
|
||||||
|
--exclude "tests/property" \
|
||||||
|
--exclude "tests/integration/test_visual_rpa_checkpoint.py" \
|
||||||
core/ agent_v0/ tests/ || {
|
core/ agent_v0/ tests/ || {
|
||||||
echo "::warning::Ruff a trouvé des erreurs critiques"
|
echo "::warning::Ruff a trouvé des erreurs critiques"
|
||||||
exit 1
|
exit 1
|
||||||
@@ -84,7 +96,10 @@ jobs:
|
|||||||
run: |
|
run: |
|
||||||
if command -v black >/dev/null 2>&1; then
|
if command -v black >/dev/null 2>&1; then
|
||||||
# --check: does not modify anything, only reports.
|
# --check: does not modify anything, only reports.
|
||||||
black --check --diff core/ agent_v0/ tests/ || {
|
# Legacy directories excluded (consistent with ruff).
|
||||||
|
black --check --diff \
|
||||||
|
--exclude "agent_v0/deploy/windows_client|tests/property" \
|
||||||
|
core/ agent_v0/ tests/ || {
|
||||||
echo "::warning::Black suggère un reformatage — non bloquant"
|
echo "::warning::Black suggère un reformatage — non bloquant"
|
||||||
exit 0
|
exit 0
|
||||||
}
|
}
|
||||||
|
|||||||
1
.gitignore
vendored
@@ -95,6 +95,7 @@ archives/
|
|||||||
|
|
||||||
# === Runtime data (sessions, learning, buffer, local config) ===
|
# === Runtime data (sessions, learning, buffer, local config) ===
|
||||||
data/
|
data/
|
||||||
|
**/capture_library.json
|
||||||
.hypothesis/
|
.hypothesis/
|
||||||
.deps_installed
|
.deps_installed
|
||||||
# Local SQLite buffers (streamer, cache)
|
# Local SQLite buffers (streamer, cache)
|
||||||
|
|||||||
@@ -185,6 +185,7 @@ Quelques tests legacy sont connus comme cassés — voir la mémoire projet et
|
|||||||
|
|
||||||
- [`docs/STATUS.md`](docs/STATUS.md): actual status per module
|
- [`docs/STATUS.md`](docs/STATUS.md): actual status per module
|
||||||
- [`docs/DEV_SETUP.md`](docs/DEV_SETUP.md): administration tasks (worktrees, build)
|
- [`docs/DEV_SETUP.md`](docs/DEV_SETUP.md): administration tasks (worktrees, build)
|
||||||
|
- [`docs/EXECUTION_LOOP_FLAGS.md`](docs/EXECUTION_LOOP_FLAGS.md): C1 vision-aware flags (`enable_ui_detection`, `enable_ocr`, `analyze_timeout_ms`, `window_info_provider`)
|
||||||
- [`docs/VISION_RPA_INTELLIGENT.md`](docs/VISION_RPA_INTELLIGENT.md): requirements specification
|
- [`docs/VISION_RPA_INTELLIGENT.md`](docs/VISION_RPA_INTELLIGENT.md): requirements specification
|
||||||
- [`docs/PLAN_ACTEUR_V1.md`](docs/PLAN_ACTEUR_V1.md): 3-level architecture (Macro / Meso / Micro)
|
- [`docs/PLAN_ACTEUR_V1.md`](docs/PLAN_ACTEUR_V1.md): 3-level architecture (Macro / Meso / Micro)
|
||||||
- [`docs/CONFORMITE_AI_ACT.md`](docs/CONFORMITE_AI_ACT.md): logging, blurring, retention
|
- [`docs/CONFORMITE_AI_ACT.md`](docs/CONFORMITE_AI_ACT.md): logging, blurring, retention
|
||||||
|
|||||||
@@ -133,6 +133,28 @@ def _streaming_headers() -> dict:
|
|||||||
headers["Authorization"] = f"Bearer {_STREAMING_API_TOKEN}"
|
headers["Authorization"] = f"Bearer {_STREAMING_API_TOKEN}"
|
||||||
return headers
|
return headers
|
||||||
|
|
||||||
|
|
||||||
|
# ============================================================
|
||||||
|
# Feedback Bus: real-time 'lea:*' events to ChatWindow
|
||||||
|
# ============================================================
|
||||||
|
LEA_FEEDBACK_BUS = os.environ.get("LEA_FEEDBACK_BUS", "0").lower() in ("1", "true", "yes", "on")
|
||||||
|
|
||||||
|
|
||||||
|
def _emit_lea(event: str, payload: Dict[str, Any]) -> None:
|
||||||
|
"""Emits 'lea:{event}' on the SocketIO bus. Silent no-op if the flag is off or on error."""
|
||||||
|
if not LEA_FEEDBACK_BUS:
|
||||||
|
return
|
||||||
|
try:
|
||||||
|
socketio.emit(f"lea:{event}", payload)
|
||||||
|
except Exception:
|
||||||
|
logger.debug("_emit_lea silenced", exc_info=True)
|
||||||
|
|
||||||
|
|
||||||
|
def _emit_dual(legacy_event: str, lea_event: str, payload: Dict[str, Any], **kwargs) -> None:
|
||||||
|
"""Emits the legacy event (dashboard compat) AND the lea:* alias (tkinter ChatWindow)."""
|
||||||
|
socketio.emit(legacy_event, payload, **kwargs)
|
||||||
|
_emit_lea(lea_event, payload)
|
||||||
|
|
||||||
execution_status = {
|
execution_status = {
|
||||||
"running": False,
|
"running": False,
|
||||||
"workflow": None,
|
"workflow": None,
|
||||||
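The two helpers in the hunk above follow a "legacy event first, optional alias second" pattern. The sketch below reproduces that pattern without Flask-SocketIO so it can run on its own. The factory `make_dual_emitter` and the injected `emit` callable are assumptions made for illustration; the diff itself calls `socketio.emit` directly at module level.

```python
import logging
from typing import Any, Callable, Dict

logger = logging.getLogger(__name__)


def make_dual_emitter(emit: Callable[..., None], bus_enabled: bool):
    """Build an emitter that sends the legacy event, then the 'lea:*' alias."""

    def emit_lea(event: str, payload: Dict[str, Any]) -> None:
        if not bus_enabled:
            return  # flag off: silent no-op
        try:
            emit(f"lea:{event}", payload)
        except Exception:
            # Fail-safe: a broken bus must never interrupt execution.
            logger.debug("emit_lea silenced", exc_info=True)

    def emit_dual(legacy_event: str, lea_event: str,
                  payload: Dict[str, Any], **kwargs) -> None:
        # The legacy event is not guarded: existing dashboards rely on it.
        emit(legacy_event, payload, **kwargs)
        emit_lea(lea_event, payload)

    return emit_dual
```

Only the alias is wrapped in the `try`/`except`, so a failure on the new bus leaves the legacy dashboard behaviour unchanged.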
@@ -623,7 +645,7 @@ def api_execute():
|
|||||||
}
|
}
|
||||||
|
|
||||||
# Notifier via WebSocket
|
# Notifier via WebSocket
|
||||||
socketio.emit('execution_started', {
|
_emit_dual('execution_started', 'action_started', {
|
||||||
"workflow": match.workflow_name,
|
"workflow": match.workflow_name,
|
||||||
"params": all_params
|
"params": all_params
|
||||||
})
|
})
|
||||||
@@ -1181,28 +1203,28 @@ def _execute_gesture(gesture):
|
|||||||
)
|
)
|
||||||
|
|
||||||
if resp.status_code == 200:
|
if resp.status_code == 200:
|
||||||
socketio.emit('execution_completed', {
|
_emit_dual('execution_completed', 'done', {
|
||||||
"workflow": gesture.name,
|
"workflow": gesture.name,
|
||||||
"success": True,
|
"success": True,
|
||||||
"message": f"Geste '{gesture.name}' ({'+'.join(gesture.keys)}) envoyé",
|
"message": f"Geste '{gesture.name}' ({'+'.join(gesture.keys)}) envoyé",
|
||||||
})
|
})
|
||||||
else:
|
else:
|
||||||
error = resp.text[:200]
|
error = resp.text[:200]
|
||||||
socketio.emit('execution_completed', {
|
_emit_dual('execution_completed', 'done', {
|
||||||
"workflow": gesture.name,
|
"workflow": gesture.name,
|
||||||
"success": False,
|
"success": False,
|
||||||
"message": f"Erreur: {error}",
|
"message": f"Erreur: {error}",
|
||||||
})
|
})
|
||||||
|
|
||||||
except http_requests.ConnectionError:
|
except http_requests.ConnectionError:
|
||||||
socketio.emit('execution_completed', {
|
_emit_dual('execution_completed', 'done', {
|
||||||
"workflow": gesture.name,
|
"workflow": gesture.name,
|
||||||
"success": False,
|
"success": False,
|
||||||
"message": "Serveur de streaming non disponible (port 5005).",
|
"message": "Serveur de streaming non disponible (port 5005).",
|
||||||
})
|
})
|
||||||
except Exception as e:
|
except Exception as e:
|
||||||
logger.error(f"Gesture execution error: {e}")
|
logger.error(f"Gesture execution error: {e}")
|
||||||
socketio.emit('execution_completed', {
|
_emit_dual('execution_completed', 'done', {
|
||||||
"workflow": gesture.name,
|
"workflow": gesture.name,
|
||||||
"success": False,
|
"success": False,
|
||||||
"message": f"Erreur: {str(e)}",
|
"message": f"Erreur: {str(e)}",
|
||||||
@@ -1661,6 +1683,52 @@ def handle_copilot_abort():
|
|||||||
})
|
})
|
||||||
|
|
||||||
|
|
||||||
|
# =============================================================================
|
||||||
|
# paused_need_help bubble: SocketIO handlers from ChatWindow (J3.5)
|
||||||
|
# =============================================================================
|
||||||
|
|
||||||
|
@socketio.on('lea:replay_resume')
|
||||||
|
def handle_lea_replay_resume(data):
|
||||||
|
"""Continue button: relay the resume to the streaming server."""
|
||||||
|
replay_id = (data or {}).get("replay_id")
|
||||||
|
if not replay_id:
|
||||||
|
_emit_lea("resume_acked", {"status": "error", "detail": "replay_id manquant"})
|
||||||
|
return
|
||||||
|
try:
|
||||||
|
resp = http_requests.post(
|
||||||
|
f"{STREAMING_SERVER_URL}/api/v1/traces/stream/replay/{replay_id}/resume",
|
||||||
|
headers=_streaming_headers(),
|
||||||
|
timeout=5,
|
||||||
|
)
|
||||||
|
if resp.ok:
|
||||||
|
logger.info(f"Replay {replay_id} resume relayé OK")
|
||||||
|
_emit_lea("resume_acked", {"replay_id": replay_id, "status": "ok"})
|
||||||
|
else:
|
||||||
|
detail = resp.text[:200]
|
||||||
|
logger.warning(f"Resume échoué (HTTP {resp.status_code}): {detail}")
|
||||||
|
_emit_lea("resume_acked", {
|
||||||
|
"replay_id": replay_id, "status": "error",
|
||||||
|
"http_status": resp.status_code, "detail": detail,
|
||||||
|
})
|
||||||
|
except Exception as e:
|
||||||
|
logger.warning(f"Resume relay error: {e}")
|
||||||
|
_emit_lea("resume_acked", {
|
||||||
|
"replay_id": replay_id, "status": "error", "detail": str(e),
|
||||||
|
})
|
||||||
|
|
||||||
|
|
||||||
|
@socketio.on('lea:replay_abort')
|
||||||
|
def handle_lea_replay_abort(data):
|
||||||
|
"""Cancel button: stop the local polling. The replay on the streaming side will be
|
||||||
|
cleaned up naturally at the next replay (cf. api_stream._replay_states stale)."""
|
||||||
|
global execution_status
|
||||||
|
replay_id = (data or {}).get("replay_id")
|
||||||
|
execution_status["running"] = False
|
||||||
|
execution_status["message"] = "Annulé par l'utilisateur"
|
||||||
|
logger.info(f"Replay {replay_id or '?'} abort par l'utilisateur (paused bubble)")
|
||||||
|
_emit_lea("abort_acked", {"replay_id": replay_id, "status": "ok"})
|
||||||
|
|
||||||
|
|
||||||
# =============================================================================
|
# =============================================================================
|
||||||
# Exécution de workflow
|
# Exécution de workflow
|
||||||
# =============================================================================
|
# =============================================================================
|
||||||
@@ -1730,14 +1798,20 @@ def _poll_replay_progress(replay_id: str, workflow_name: str, total_actions: int
|
|||||||
"""Track the progress of a remote replay via polling."""
|
"""Track the progress of a remote replay via polling."""
|
||||||
import time
|
import time
|
||||||
|
|
||||||
max_wait = 120 # 2 minutes max
|
max_wait_running = 120 # 2 min while actively executing
|
||||||
|
max_wait_paused = 600 # 10 min in supervised pause (the human can take their time)
|
||||||
poll_interval = 2.0
|
poll_interval = 2.0
|
||||||
elapsed = 0
|
elapsed = 0
|
||||||
|
was_paused = False
|
||||||
|
|
||||||
while elapsed < max_wait and execution_status.get("running"):
|
while execution_status.get("running"):
|
||||||
time.sleep(poll_interval)
|
time.sleep(poll_interval)
|
||||||
elapsed += poll_interval
|
elapsed += poll_interval
|
||||||
|
|
||||||
|
cap = max_wait_paused if was_paused else max_wait_running
|
||||||
|
if elapsed >= cap:
|
||||||
|
break
|
||||||
|
|
||||||
try:
|
try:
|
||||||
resp = http_requests.get(
|
resp = http_requests.get(
|
||||||
f"{STREAMING_SERVER_URL}/api/v1/traces/stream/replay/{replay_id}",
|
f"{STREAMING_SERVER_URL}/api/v1/traces/stream/replay/{replay_id}",
|
||||||
@@ -1753,7 +1827,26 @@ def _poll_replay_progress(replay_id: str, workflow_name: str, total_actions: int
|
|||||||
failed = data.get("failed_actions", 0)
|
failed = data.get("failed_actions", 0)
|
||||||
progress = int(10 + (completed / max(total_actions, 1)) * 80)
|
progress = int(10 + (completed / max(total_actions, 1)) * 80)
|
||||||
|
|
||||||
socketio.emit('execution_progress', {
|
if status == "paused_need_help" and not was_paused:
|
||||||
|
_emit_lea("paused", {
|
||||||
|
"workflow": workflow_name,
|
||||||
|
"replay_id": replay_id,
|
||||||
|
"completed": completed,
|
||||||
|
"total": total_actions,
|
||||||
|
"failed_action": data.get("failed_action"),
|
||||||
|
"reason": data.get("error") or "Action incertaine",
|
||||||
|
})
|
||||||
|
was_paused = True
|
||||||
|
elapsed = 0
|
||||||
|
elif was_paused and status != "paused_need_help":
|
||||||
|
_emit_lea("resumed", {
|
||||||
|
"workflow": workflow_name,
|
||||||
|
"replay_id": replay_id,
|
||||||
|
"status_after": status,
|
||||||
|
})
|
||||||
|
was_paused = False
|
||||||
|
|
||||||
|
_emit_dual('execution_progress', 'action_progress', {
|
||||||
"progress": progress,
|
"progress": progress,
|
||||||
"step": f"Action {completed}/{total_actions} exécutée",
|
"step": f"Action {completed}/{total_actions} exécutée",
|
||||||
"current": completed,
|
"current": completed,
|
||||||
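The polling hunks above replace a single 120 s limit with two caps: 120 s while the replay is running and 600 s while it waits in `paused_need_help`. The sketch below isolates that timing logic as two pure functions. The function names and the `"running"` status string are assumptions. Unlike the hunk, which resets `elapsed` only when the pause begins, this sketch also restarts the clock on resume; that is a choice of the sketch, made so a long pause does not immediately exceed the shorter running cap.

```python
from typing import Tuple

MAX_WAIT_RUNNING = 120.0  # seconds allowed while actions are executing
MAX_WAIT_PAUSED = 600.0   # seconds allowed while waiting for a human


def next_poll_state(elapsed: float, was_paused: bool, status: str) -> Tuple[float, bool]:
    """Update (elapsed, was_paused) after one status poll."""
    is_paused = status == "paused_need_help"
    if is_paused != was_paused:
        # Restart the clock on every pause/resume transition (sketch choice).
        return 0.0, is_paused
    return elapsed, was_paused


def timed_out(elapsed: float, was_paused: bool) -> bool:
    """Apply the cap that matches the current mode."""
    cap = MAX_WAIT_PAUSED if was_paused else MAX_WAIT_RUNNING
    return elapsed >= cap
```

Keeping the state transition separate from the cap check makes both rules easy to test without sleeping or contacting the streaming server.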
@@ -1922,7 +2015,7 @@ def execute_workflow_copilot(match, params: Dict[str, Any]):
|
|||||||
|
|
||||||
actions = _build_actions_from_workflow(match, params)
|
actions = _build_actions_from_workflow(match, params)
|
||||||
if not actions:
|
if not actions:
|
||||||
socketio.emit('copilot_complete', {
|
_emit_dual('copilot_complete', 'done', {
|
||||||
"workflow": workflow_name,
|
"workflow": workflow_name,
|
||||||
"status": "error",
|
"status": "error",
|
||||||
"message": "Aucune action exécutable dans ce workflow.",
|
"message": "Aucune action exécutable dans ce workflow.",
|
||||||
@@ -1959,7 +2052,7 @@ def execute_workflow_copilot(match, params: Dict[str, Any]):
|
|||||||
break
|
break
|
||||||
|
|
||||||
copilot_state["status"] = "waiting_approval"
|
copilot_state["status"] = "waiting_approval"
|
||||||
socketio.emit('copilot_step', {
|
_emit_dual('copilot_step', 'need_confirm', {
|
||||||
"workflow": workflow_name,
|
"workflow": workflow_name,
|
||||||
"step_index": idx,
|
"step_index": idx,
|
||||||
"total": total,
|
"total": total,
|
||||||
@@ -1982,7 +2075,7 @@ def execute_workflow_copilot(match, params: Dict[str, Any]):
|
|||||||
|
|
||||||
if waited >= max_wait:
|
if waited >= max_wait:
|
||||||
copilot_state["status"] = "aborted"
|
copilot_state["status"] = "aborted"
|
||||||
socketio.emit('copilot_complete', {
|
_emit_dual('copilot_complete', 'done', {
|
||||||
"workflow": workflow_name,
|
"workflow": workflow_name,
|
||||||
"status": "timeout",
|
"status": "timeout",
|
||||||
"message": f"Timeout : pas de réponse après {max_wait}s.",
|
"message": f"Timeout : pas de réponse après {max_wait}s.",
|
||||||
@@ -1999,7 +2092,7 @@ def execute_workflow_copilot(match, params: Dict[str, Any]):
|
|||||||
elif decision == "skipped":
|
elif decision == "skipped":
|
||||||
copilot_state["skipped"] += 1
|
copilot_state["skipped"] += 1
|
||||||
logger.info(f"Copilot skip étape {idx + 1}/{total}")
|
logger.info(f"Copilot skip étape {idx + 1}/{total}")
|
||||||
socketio.emit('copilot_step_result', {
|
_emit_dual('copilot_step_result', 'step_result', {
|
||||||
"step_index": idx,
|
"step_index": idx,
|
||||||
"total": total,
|
"total": total,
|
||||||
"status": "skipped",
|
"status": "skipped",
|
||||||
@@ -2034,7 +2127,7 @@ def execute_workflow_copilot(match, params: Dict[str, Any]):
|
|||||||
|
|
||||||
if action_success:
|
if action_success:
|
||||||
copilot_state["completed"] += 1
|
copilot_state["completed"] += 1
|
||||||
socketio.emit('copilot_step_result', {
|
_emit_dual('copilot_step_result', 'step_result', {
|
||||||
"step_index": idx,
|
"step_index": idx,
|
||||||
"total": total,
|
"total": total,
|
||||||
"status": "completed",
|
"status": "completed",
|
||||||
@@ -2042,7 +2135,7 @@ def execute_workflow_copilot(match, params: Dict[str, Any]):
|
|||||||
})
|
})
|
||||||
else:
|
else:
|
||||||
copilot_state["failed"] += 1
|
copilot_state["failed"] += 1
|
||||||
socketio.emit('copilot_step_result', {
|
_emit_dual('copilot_step_result', 'step_result', {
|
||||||
"step_index": idx,
|
"step_index": idx,
|
||||||
"total": total,
|
"total": total,
|
||||||
"status": "failed",
|
"status": "failed",
|
||||||
@@ -2051,7 +2144,7 @@ def execute_workflow_copilot(match, params: Dict[str, Any]):
|
|||||||
else:
|
else:
|
||||||
error = resp.text[:200]
|
error = resp.text[:200]
|
||||||
copilot_state["failed"] += 1
|
copilot_state["failed"] += 1
|
||||||
socketio.emit('copilot_step_result', {
|
_emit_dual('copilot_step_result', 'step_result', {
|
||||||
"step_index": idx,
|
"step_index": idx,
|
||||||
"total": total,
|
"total": total,
|
||||||
"status": "failed",
|
"status": "failed",
|
||||||
@@ -2060,7 +2153,7 @@ def execute_workflow_copilot(match, params: Dict[str, Any]):
|
|||||||
|
|
||||||
except http_requests.ConnectionError:
|
except http_requests.ConnectionError:
|
||||||
copilot_state["failed"] += 1
|
copilot_state["failed"] += 1
|
||||||
socketio.emit('copilot_step_result', {
|
_emit_dual('copilot_step_result', 'step_result', {
|
||||||
"step_index": idx,
|
"step_index": idx,
|
||||||
"total": total,
|
"total": total,
|
||||||
"status": "failed",
|
"status": "failed",
|
||||||
@@ -2070,7 +2163,7 @@ def execute_workflow_copilot(match, params: Dict[str, Any]):
|
|||||||
except Exception as e:
|
except Exception as e:
|
||||||
copilot_state["failed"] += 1
|
copilot_state["failed"] += 1
|
||||||
logger.error(f"Copilot action error: {e}")
|
logger.error(f"Copilot action error: {e}")
|
||||||
socketio.emit('copilot_step_result', {
|
_emit_dual('copilot_step_result', 'step_result', {
|
||||||
"step_index": idx,
|
"step_index": idx,
|
||||||
"total": total,
|
"total": total,
|
||||||
"status": "failed",
|
"status": "failed",
|
||||||
@@ -2098,7 +2191,7 @@ def execute_workflow_copilot(match, params: Dict[str, Any]):
|
|||||||
f"Copilot terminé : {completed} réussies, "
|
f"Copilot terminé : {completed} réussies, "
|
||||||
f"{skipped} passées, {failed} échouées sur {total} étapes."
|
f"{skipped} passées, {failed} échouées sur {total} étapes."
|
||||||
)
|
)
|
||||||
socketio.emit('copilot_complete', {
|
_emit_dual('copilot_complete', 'done', {
|
||||||
"workflow": workflow_name,
|
"workflow": workflow_name,
|
||||||
"status": "completed" if success else "partial",
|
"status": "completed" if success else "partial",
|
||||||
"message": message,
|
"message": message,
|
||||||
@@ -2175,7 +2268,7 @@ def execute_workflow(match, params):
|
|||||||
execution_status["progress"] = 10
|
execution_status["progress"] = 10
|
||||||
execution_status["message"] = f"Envoyé à l'Agent V1 ({target_session})"
|
execution_status["message"] = f"Envoyé à l'Agent V1 ({target_session})"
|
||||||
|
|
||||||
socketio.emit('execution_progress', {
|
_emit_dual('execution_progress', 'action_progress', {
|
||||||
"progress": 10,
|
"progress": 10,
|
||||||
"step": f"Replay envoyé à l'Agent V1 — {total_actions} actions en attente",
|
"step": f"Replay envoyé à l'Agent V1 — {total_actions} actions en attente",
|
||||||
"current": 0,
|
"current": 0,
|
||||||
@@ -2523,7 +2616,7 @@ def update_progress(progress: int, message: str, current: int, total: int):
|
|||||||
execution_status["progress"] = progress
|
execution_status["progress"] = progress
|
||||||
execution_status["message"] = message
|
execution_status["message"] = message
|
||||||
|
|
||||||
socketio.emit('execution_progress', {
|
_emit_dual('execution_progress', 'action_progress', {
|
||||||
"progress": progress,
|
"progress": progress,
|
||||||
"step": message,
|
"step": message,
|
||||||
"current": current,
|
"current": current,
|
||||||
@@ -2543,7 +2636,7 @@ def finish_execution(workflow_name: str, success: bool, message: str):
|
|||||||
if command_history:
|
if command_history:
|
||||||
command_history[-1]["status"] = "completed" if success else "failed"
|
command_history[-1]["status"] = "completed" if success else "failed"
|
||||||
|
|
||||||
socketio.emit('execution_completed', {
|
_emit_dual('execution_completed', 'done', {
|
||||||
"workflow": workflow_name,
|
"workflow": workflow_name,
|
||||||
"success": success,
|
"success": success,
|
||||||
"message": message
|
"message": message
|
||||||
|
|||||||
@@ -49,7 +49,10 @@ try:
|
|||||||
from PIL import Image as PILImage
|
from PIL import Image as PILImage
|
||||||
import pyautogui
|
import pyautogui
|
||||||
PYAUTOGUI_AVAILABLE = True
|
PYAUTOGUI_AVAILABLE = True
|
||||||
except ImportError:
|
except Exception:
|
||||||
|
# pyautogui can raise Xlib.error.DisplayConnectionError (not an ImportError)
|
||||||
|
# when X is not reachable, which is typical of a headless systemd service on the
|
||||||
|
# server side. The server does not need pyautogui (it is used on the agent client side).
|
||||||
PYAUTOGUI_AVAILABLE = False
|
PYAUTOGUI_AVAILABLE = False
|
||||||
PILImage = None
|
PILImage = None
|
||||||
pyautogui = None
|
pyautogui = None
|
||||||
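The hunk above widens `except ImportError` to `except Exception` because importing a GUI library can fail for reasons other than a missing package. A minimal sketch of the same guard as a reusable helper follows; the name `optional_import` is hypothetical, and the repository inlines the `try`/`except` instead.

```python
import importlib
from types import ModuleType
from typing import Optional


def optional_import(name: str) -> Optional[ModuleType]:
    """Import a module, returning None on any failure at import time."""
    try:
        return importlib.import_module(name)
    except Exception:
        # Broad on purpose: some GUI libraries raise non-ImportError
        # exceptions when no display is reachable.
        return None
```

Callers then check for `None` instead of relying on a separate availability flag.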
@@ -147,8 +150,10 @@ class AutonomousPlanner:
|
|||||||
"""Initialise le client VLM pour analyse intelligente."""
|
"""Initialise le client VLM pour analyse intelligente."""
|
||||||
if VLM_AVAILABLE and OllamaClient:
|
if VLM_AVAILABLE and OllamaClient:
|
||||||
try:
|
try:
|
||||||
self._vlm_client = OllamaClient(model="qwen2.5vl:7b")
|
from core.detection.vlm_config import get_vlm_model
|
||||||
logger.info("VLM client initialized (qwen2.5vl:7b)")
|
_planner_vlm = get_vlm_model()
|
||||||
|
self._vlm_client = OllamaClient(model=_planner_vlm)
|
||||||
|
logger.info("VLM client initialized (%s)", _planner_vlm)
|
||||||
except Exception as e:
|
except Exception as e:
|
||||||
logger.warning(f"Could not initialize VLM client: {e}")
|
logger.warning(f"Could not initialize VLM client: {e}")
|
||||||
self._vlm_client = None
|
self._vlm_client = None
|
||||||
|
|||||||
@@ -40,10 +40,18 @@ MACHINE_ID = os.environ.get(
|
|||||||
BASE_DIR = Path(__file__).resolve().parent
|
BASE_DIR = Path(__file__).resolve().parent
|
||||||
|
|
||||||
# Streaming server endpoint (port 5005)
|
# Streaming server endpoint (port 5005)
|
||||||
|
# SERVER_URL ALWAYS ends with /api/v1 (unified convention).
|
||||||
SERVER_URL = os.getenv("RPA_SERVER_URL", "http://localhost:5005/api/v1")
|
SERVER_URL = os.getenv("RPA_SERVER_URL", "http://localhost:5005/api/v1")
|
||||||
|
# Base without /api/v1, for root-level routes (/health)
|
||||||
|
SERVER_BASE = SERVER_URL.rsplit("/api/v1", 1)[0]
|
||||||
UPLOAD_ENDPOINT = f"{SERVER_URL}/traces/upload"
|
UPLOAD_ENDPOINT = f"{SERVER_URL}/traces/upload"
|
||||||
STREAMING_ENDPOINT = f"{SERVER_URL}/traces/stream"
|
STREAMING_ENDPOINT = f"{SERVER_URL}/traces/stream"
|
||||||
|
|
||||||
|
# Ollama host: SEPARATE from the RPA server.
|
||||||
|
# Ollama runs locally on the server machine and is never exposed through the reverse proxy.
|
||||||
|
# Default: localhost (local execution or direct LAN access).
|
||||||
|
OLLAMA_HOST = os.getenv("RPA_OLLAMA_HOST", "localhost")
|
||||||
|
|
||||||
# API authentication token (must match the server's token)
|
# API authentication token (must match the server's token)
|
||||||
# Configurable via the RPA_API_TOKEN environment variable
|
# Configurable via the RPA_API_TOKEN environment variable
|
||||||
API_TOKEN = os.environ.get("RPA_API_TOKEN", "")
|
API_TOKEN = os.environ.get("RPA_API_TOKEN", "")
|
||||||
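The config hunk above derives `SERVER_BASE` by cutting the trailing `/api/v1` off `SERVER_URL`, so that root-level routes such as `/health` can be reached. The sketch below shows how that `rsplit` behaves on a few inputs. The function names are hypothetical, and `health_url` is only an illustration of how the base might be used.

```python
API_PREFIX = "/api/v1"


def server_base(server_url: str) -> str:
    """Strip the last '/api/v1' occurrence and anything after it."""
    return server_url.rsplit(API_PREFIX, 1)[0]


def health_url(server_url: str) -> str:
    """Hypothetical helper: build the root-level /health URL."""
    return f"{server_base(server_url)}/health"
```

If the configured URL does not contain `/api/v1`, `rsplit` returns it unchanged, so the base silently equals the full URL. That is why the "always ends with /api/v1" convention stated in the comment matters.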
|
|||||||
@@ -94,6 +94,11 @@ class ActionExecutorV1:
|
|||||||
# pause supervisée au serveur (`paused_need_help`).
|
# pause supervisée au serveur (`paused_need_help`).
|
||||||
# Cf. core/system_dialog_guard.py
|
# Cf. core/system_dialog_guard.py
|
||||||
self._system_dialog_pause: Optional[Dict[str, Any]] = None
|
self._system_dialog_pause: Optional[Dict[str, Any]] = None
|
||||||
|
# Reference to the Léa V1 ChatWindow (Tkinter), used to display interactive
|
||||||
|
# paused bubbles when the server signals a supervised pause.
|
||||||
|
# Wired from main.py after both objects are instantiated.
|
||||||
|
# If None (headless mode / tests), falls back to self.notifier.
|
||||||
|
self._chat_window_ref = None
|
||||||
# Log de la resolution physique pour le diagnostic DPI
|
# Log de la resolution physique pour le diagnostic DPI
|
||||||
self._log_screen_info()
|
self._log_screen_info()
|
||||||
|
|
||||||
@@ -477,9 +482,15 @@ class ActionExecutorV1:
|
|||||||
},
|
},
|
||||||
headers=headers,
|
headers=headers,
|
||||||
timeout=10,
|
timeout=10,
|
||||||
|
allow_redirects=False,
|
||||||
)
|
)
|
||||||
|
|
||||||
if resp.ok:
|
if resp.status_code in (301, 302, 307, 308):
|
||||||
|
logger.warning(
|
||||||
|
f"Redirection {resp.status_code} sur POST {url} — "
|
||||||
|
f"verifiez RPA_SERVER_URL (https:// si redirect)"
|
||||||
|
)
|
||||||
|
elif resp.ok:
|
||||||
data = resp.json()
|
data = resp.json()
|
||||||
state = data.get("screen_state", "ok")
|
state = data.get("screen_state", "ok")
|
||||||
if state != "ok":
|
if state != "ok":
|
||||||
@@ -703,7 +714,11 @@ class ActionExecutorV1:
|
|||||||
f"attendu '{expected_title}' → mode apprentissage"
|
f"attendu '{expected_title}' → mode apprentissage"
|
||||||
)
|
)
|
||||||
try:
|
try:
|
||||||
self.notifier.replay_wrong_window(current_title, expected_title)
|
self.notifier.replay_learning_mode(
|
||||||
|
raison="wrong_window",
|
||||||
|
target_description=expected_title,
|
||||||
|
window_title=current_title,
|
||||||
|
)
|
||||||
except Exception:
|
except Exception:
|
||||||
pass
|
pass
|
||||||
|
|
||||||
@@ -935,9 +950,10 @@ class ActionExecutorV1:
|
|||||||
# et ne trouve toujours pas. L'humain doit montrer.
|
# et ne trouve toujours pas. L'humain doit montrer.
|
||||||
print(f" [POLICY] Retry échoué → mode apprentissage")
|
print(f" [POLICY] Retry échoué → mode apprentissage")
|
||||||
try:
|
try:
|
||||||
self.notifier.replay_target_not_found(
|
self.notifier.replay_learning_mode(
|
||||||
target_desc,
|
raison="retry_failed",
|
||||||
target_spec.get("window_title", ""),
|
target_description=target_desc,
|
||||||
|
window_title=target_spec.get("window_title", ""),
|
||||||
)
|
)
|
||||||
except Exception:
|
except Exception:
|
||||||
pass
|
pass
|
||||||
@@ -993,9 +1009,10 @@ class ActionExecutorV1:
|
|||||||
# passe en mode capture et enregistre ce que
|
# passe en mode capture et enregistre ce que
|
||||||
# l'humain fait (mini-workflow de correction).
|
# l'humain fait (mini-workflow de correction).
|
||||||
try:
|
try:
|
||||||
self.notifier.replay_target_not_found(
|
self.notifier.replay_learning_mode(
|
||||||
target_desc,
|
raison="supervise",
|
||||||
target_spec.get("window_title", ""),
|
target_description=target_desc,
|
||||||
|
window_title=target_spec.get("window_title", ""),
|
||||||
)
|
)
|
||||||
except Exception:
|
except Exception:
|
||||||
pass
|
pass
|
||||||
@@ -1221,7 +1238,9 @@ class ActionExecutorV1:
|
|||||||
f"je demande de l'aide"
|
f"je demande de l'aide"
|
||||||
)
|
)
|
||||||
try:
|
try:
|
||||||
self.notifier.replay_no_screen_change(action_type)
|
self.notifier.replay_learning_mode(
|
||||||
|
raison="no_screen_change",
|
||||||
|
)
|
||||||
except Exception:
|
except Exception:
|
||||||
pass
|
pass
|
||||||
|
|
||||||
@@ -1377,7 +1396,13 @@ class ActionExecutorV1:
|
|||||||
|
|
||||||
try:
|
try:
|
||||||
print(f" [SERVER-RESOLVE] Appel serveur {server_url}...")
|
print(f" [SERVER-RESOLVE] Appel serveur {server_url}...")
|
||||||
resp = _requests.post(url, json=payload, headers=headers, timeout=30)
|
resp = _requests.post(url, json=payload, headers=headers, timeout=30, allow_redirects=False)
|
||||||
|
if resp.status_code in (301, 302, 307, 308):
|
||||||
|
logger.warning(
|
||||||
|
f"Redirection {resp.status_code} sur POST {url} — "
|
||||||
|
f"verifiez RPA_SERVER_URL (https:// si redirect)"
|
||||||
|
)
|
||||||
|
return None
|
||||||
if not resp.ok:
|
if not resp.ok:
|
||||||
logger.warning(f"Server resolve HTTP {resp.status_code}")
|
logger.warning(f"Server resolve HTTP {resp.status_code}")
|
||||||
return None
|
return None
|
||||||
@@ -1521,7 +1546,7 @@ class ActionExecutorV1:
|
|||||||
if not vlm_description:
|
if not vlm_description:
|
||||||
return None
|
return None
|
||||||
|
|
||||||
ollama_host = os.environ.get("RPA_SERVER_HOST", "localhost")
|
ollama_host = os.environ.get("RPA_OLLAMA_HOST", "localhost")
|
||||||
ollama_url = f"http://{ollama_host}:11434/api/chat"
|
ollama_url = f"http://{ollama_host}:11434/api/chat"
|
||||||
|
|
||||||
prompt = (
|
prompt = (
|
||||||
@@ -1657,7 +1682,7 @@ Example: x_pct=0.50, y_pct=0.30"""
|
|||||||
if anchor_b64:
|
if anchor_b64:
|
||||||
images.append(anchor_b64)
|
images.append(anchor_b64)
|
||||||
|
|
||||||
ollama_host = os.environ.get("RPA_SERVER_HOST", "localhost")
|
ollama_host = os.environ.get("RPA_OLLAMA_HOST", "localhost")
|
||||||
ollama_url = f"http://{ollama_host}:11434/api/chat"
|
ollama_url = f"http://{ollama_host}:11434/api/chat"
|
||||||
|
|
||||||
# Prefill pour les modèles thinking (qwen3) — évite le mode réflexion >180s
|
# Prefill pour les modèles thinking (qwen3) — évite le mode réflexion >180s
|
||||||
@@ -1776,6 +1801,65 @@ Example: x_pct=0.50, y_pct=0.30"""
|
|||||||
self._last_conn_error_logged = False
|
self._last_conn_error_logged = False
|
||||||
|
|
||||||
data = resp.json()
|
data = resp.json()
|
||||||
|
|
||||||
|
# Plan B (8 May 2026, GHT demo): if the server signals a supervised
|
||||||
|
# pause, show the pause_message in the Léa V1 ChatWindow
|
||||||
|
# (Tkinter, already open on Windows) as an interactive bubble
|
||||||
|
# with Continuer / Annuler buttons. This lets the Windows user
|
||||||
|
# physically see what Léa is waiting for (pause_for_human or a failed
|
||||||
|
# resolution). Falls back to notifier.notify if the ChatWindow is not
|
||||||
|
# wired (headless mode / tests).
|
||||||
|
if data.get("replay_paused"):
|
||||||
|
pause_msg = data.get("pause_message") or "Léa a besoin de votre aide"
|
||||||
|
replay_id = data.get("replay_id") or ""
|
||||||
|
pause_key = (replay_id, pause_msg)
|
||||||
|
if getattr(self, "_last_pause_msg_shown", None) != pause_key:
|
||||||
|
self._last_pause_msg_shown = pause_key
|
||||||
|
completed = data.get("current_action_index", 0)
|
||||||
|
total = data.get("total_actions", "?")
|
||||||
|
payload = {
|
||||||
|
"replay_id": replay_id,
|
||||||
|
"workflow": "Replay en cours",
|
||||||
|
"reason": pause_msg,
|
||||||
|
"completed": completed,
|
||||||
|
"total": total,
|
||||||
|
}
|
||||||
|
# Custom topmost Tkinter toast: visible even when the
|
||||||
|
# ChatWindow is hidden via withdraw() by default. No dependency on
|
||||||
|
# plyer (Windows 11 Focus Assist filters system balloons).
|
||||||
|
try:
|
||||||
|
from ..ui.paused_toast import show_paused_toast
|
||||||
|
show_paused_toast(
|
||||||
|
title="Léa a besoin de votre aide",
|
||||||
|
message=pause_msg[:300],
|
||||||
|
)
|
||||||
|
except Exception:
|
||||||
|
logger.debug("paused_toast launch silenced", exc_info=True)
|
||||||
|
|
||||||
|
chat_window = getattr(self, "_chat_window_ref", None)
|
||||||
|
if chat_window is not None:
|
||||||
|
try:
|
||||||
|
# _add_paused_bubble is thread-safe (it uses root.after)
|
||||||
|
# and forces the window to be shown, plus a topmost toast
|
||||||
|
chat_window._add_paused_bubble(payload)
|
||||||
|
except Exception:
|
||||||
|
logger.debug(
|
||||||
|
"chat_window._add_paused_bubble pause silenced",
|
||||||
|
exc_info=True,
|
||||||
|
)
|
||||||
|
else:
|
||||||
|
# Notifier fallback (headless tests / chat closed)
|
||||||
|
try:
|
||||||
|
self.notifier.notify(
|
||||||
|
title="Léa — j'ai besoin de vous",
|
||||||
|
message=pause_msg[:300],
|
||||||
|
timeout=15,
|
||||||
|
bypass_rate_limit=True,
|
||||||
|
)
|
||||||
|
except Exception:
|
||||||
|
logger.debug("notifier.notify pause silenced", exc_info=True)
|
||||||
|
return False
|
||||||
|
|
||||||
action = data.get("action")
|
action = data.get("action")
|
||||||
if action is None:
|
if action is None:
|
||||||
return False
|
return False
|
||||||
@@ -1861,8 +1945,14 @@ Example: x_pct=0.50, y_pct=0.30"""
|
|||||||
json=report,
|
json=report,
|
||||||
headers=self._auth_headers(),
|
headers=self._auth_headers(),
|
||||||
timeout=10,
|
timeout=10,
|
||||||
|
allow_redirects=False,
|
||||||
)
|
)
|
||||||
if resp2.ok:
|
if resp2.status_code in (301, 302, 307, 308):
|
||||||
|
logger.warning(
|
||||||
|
f"Redirection {resp2.status_code} sur POST {replay_result_url} — "
|
||||||
|
f"verifiez RPA_SERVER_URL (https:// si redirect)"
|
||||||
|
)
|
||||||
|
elif resp2.ok:
|
||||||
server_resp = resp2.json()
|
server_resp = resp2.json()
|
||||||
msg = (
|
msg = (
|
||||||
f"Resultat rapporte : replay_status={server_resp.get('replay_status')}, "
|
f"Resultat rapporte : replay_status={server_resp.get('replay_status')}, "
|
||||||
@@ -2128,7 +2218,7 @@ Example: x_pct=0.50, y_pct=0.30"""
|
|||||||
"""
|
"""
|
||||||
import requests as _requests
|
import requests as _requests
|
||||||
|
|
||||||
ollama_host = os.environ.get("RPA_SERVER_HOST", "localhost")
|
ollama_host = os.environ.get("RPA_OLLAMA_HOST", "localhost")
|
||||||
ollama_url = f"http://{ollama_host}:11434/api/chat"
|
ollama_url = f"http://{ollama_host}:11434/api/chat"
|
||||||
|
|
||||||
prompt = (
|
prompt = (
|
||||||
@@ -2154,8 +2244,11 @@ Example: x_pct=0.50, y_pct=0.30"""
|
|||||||
},
|
},
|
||||||
{"role": "user", "content": prompt, "images": [screenshot_b64]},
|
{"role": "user", "content": prompt, "images": [screenshot_b64]},
|
||||||
]
|
]
|
||||||
|
# Prefill for "thinking" models (qwen3-vl): forces the output to start
|
||||||
|
# with this string, which avoids long blocks of internal reasoning.
|
||||||
|
prefill = "The button to click is: " if _is_thinking_popup else ""
|
||||||
if _is_thinking_popup:
|
if _is_thinking_popup:
|
||||||
messages_popup.append({"role": "assistant", "content": "The button to click is: "})
|
messages_popup.append({"role": "assistant", "content": prefill})
|
||||||
|
|
||||||
payload = {
|
payload = {
|
||||||
"model": _vlm_model_popup,
|
"model": _vlm_model_popup,
|
||||||
@@ -2268,7 +2361,7 @@ Example: x_pct=0.50, y_pct=0.30"""
|
|||||||
|
|
||||||
best_match = None
|
best_match = None
|
||||||
best_val = 0.0
|
best_val = 0.0
|
||||||
threshold = 0.50 # Seuil équilibré
|
threshold = 0.75 # GHT demo, 8 May: avoid false positives (italic placeholders, neighbouring tabs). Below this, falling back to human learning mode is better than a blind click.
|
||||||
|
|
||||||
# Essayer plusieurs tailles de police pour couvrir différentes résolutions
|
# Essayer plusieurs tailles de police pour couvrir différentes résolutions
|
||||||
for font_size in [14, 16, 18, 20, 22, 24, 12, 26, 28, 10]:
|
for font_size in [14, 16, 18, 20, 22, 24, 12, 26, 28, 10]:
|
||||||
@@ -2572,8 +2665,8 @@ Example: x_pct=0.50, y_pct=0.30"""
|
|||||||
f"inactivité={INACTIVITY_TIMEOUT}s, hotkey=Ctrl+Shift+L)"
|
f"inactivité={INACTIVITY_TIMEOUT}s, hotkey=Ctrl+Shift+L)"
|
||||||
)
|
)
|
||||||
print(
|
print(
|
||||||
f" [APPRENTISSAGE] Montre-moi comment faire.\n"
|
f" [APPRENTISSAGE] Je n'y arrive pas, montrez-moi comment faire.\n"
|
||||||
f" Quand tu as fini → Ctrl+Shift+L\n"
|
f" Quand vous avez fini → Ctrl+Shift+L\n"
|
||||||
f" (ou j'attends {INACTIVITY_TIMEOUT}s sans action)"
|
f" (ou j'attends {INACTIVITY_TIMEOUT}s sans action)"
|
||||||
)
|
)
|
||||||
|
|
||||||
|
|||||||
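The pause handling added to the executor above shows a given pause only once: it keys on the `(replay_id, pause_message)` pair and skips the notification when the key has not changed. A minimal sketch of that de-duplication in isolation, assuming a hypothetical `PauseNotifier` class and `show` callback (neither exists in the diff; the fallback message and the 300-character cut are copied from it):

```python
from typing import Callable, Optional, Tuple


class PauseNotifier:
    """Hypothetical helper: forwards a pause message at most once per key."""

    def __init__(self, show: Callable[[str], None]) -> None:
        self._show = show
        self._last_key: Optional[Tuple[str, str]] = None

    def handle(self, data: dict) -> bool:
        """Return True if the server response signalled a pause."""
        if not data.get("replay_paused"):
            return False
        message = data.get("pause_message") or "Léa a besoin de votre aide"
        key = (data.get("replay_id") or "", message)
        # Notify only when the (replay_id, message) pair changes, so that
        # repeated polls of the same paused replay stay silent.
        if key != self._last_key:
            self._last_key = key
            self._show(message[:300])
        return True
```

Unlike the executor code in the diff, which returns `False` to mean "no action executed", this sketch returns whether a pause was seen; the return convention is an illustration choice, not the project's.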
@@ -17,6 +17,7 @@ import threading
|
|||||||
from .config import (
|
from .config import (
|
||||||
SESSIONS_ROOT, AGENT_VERSION, SERVER_URL, MACHINE_ID, LOG_RETENTION_DAYS,
|
SESSIONS_ROOT, AGENT_VERSION, SERVER_URL, MACHINE_ID, LOG_RETENTION_DAYS,
|
||||||
SCREEN_RESOLUTION, DPI_SCALE, OS_THEME, API_TOKEN, MAX_SESSION_DURATION_S,
|
SCREEN_RESOLUTION, DPI_SCALE, OS_THEME, API_TOKEN, MAX_SESSION_DURATION_S,
|
||||||
|
STREAMING_ENDPOINT,
|
||||||
)
|
)
|
||||||
from .core.captor import EventCaptorV1
|
from .core.captor import EventCaptorV1
|
||||||
from .core.executor import ActionExecutorV1
|
from .core.executor import ActionExecutorV1
|
||||||
@@ -86,22 +87,23 @@ class AgentV1:
|
|||||||
self._state.set_on_stop(self.stop_session)
|
self._state.set_on_stop(self.stop_session)
|
||||||
|
|
||||||
# Client serveur pour le chat et les workflows
|
# Client serveur pour le chat et les workflows
|
||||||
|
# RPA_SERVER_HOST is gone: LeaServerClient derives everything from SERVER_URL
|
||||||
self._server_client = None
|
self._server_client = None
|
||||||
if LeaServerClient is not None:
|
if LeaServerClient is not None:
|
||||||
# Forcer le token API pour éviter les 401
|
# Forcer le token API pour éviter les 401
|
||||||
# (le token est set par start.bat dans l'environnement)
|
# (le token est set par start.bat dans l'environnement)
|
||||||
from .config import API_TOKEN as _token
|
from .config import API_TOKEN as _token
|
||||||
server_host = os.getenv("RPA_SERVER_HOST", "localhost")
|
self._server_client = LeaServerClient()
|
||||||
self._server_client = LeaServerClient(server_host=server_host)
|
|
||||||
if _token and not self._server_client._api_token:
|
if _token and not self._server_client._api_token:
|
||||||
self._server_client._api_token = _token
|
self._server_client._api_token = _token
|
||||||
logger.info("Token API forcé dans LeaServerClient")
|
logger.info("Token API forcé dans LeaServerClient")
|
||||||
|
|
||||||
# Fenetre de chat Lea (tkinter natif)
|
# Fenetre de chat Lea (tkinter natif)
|
||||||
|
# The host is derived from SERVER_URL (RPA_SERVER_HOST is gone)
|
||||||
server_host = (
|
server_host = (
|
||||||
self._server_client.server_host
|
self._server_client.server_host
|
||||||
if self._server_client is not None
|
if self._server_client is not None
|
||||||
else os.getenv("RPA_SERVER_HOST", "localhost")
|
else "localhost"
|
||||||
)
|
)
|
||||||
self._chat_window = ChatWindow(
|
self._chat_window = ChatWindow(
|
||||||
server_client=self._server_client,
|
server_client=self._server_client,
|
||||||
@@ -114,6 +116,14 @@ class AgentV1:
|
|||||||
# Executeur pour le replay (doit exister avant le poll)
|
# Executeur pour le replay (doit exister avant le poll)
|
||||||
self._executor = ActionExecutorV1()
|
self._executor = ActionExecutorV1()
|
||||||
|
|
||||||
|
# Wire ChatWindow → Executor for Plan B (pause_message → interactive bubble).
|
||||||
|
# Lets the executor display a paused bubble in the Léa V1 window
|
||||||
|
# when the server signals replay_paused=True via /replay/next.
|
||||||
|
try:
|
||||||
|
self._executor._chat_window_ref = self._chat_window
|
||||||
|
except Exception:
|
||||||
|
logger.debug("Wiring chat_window→executor échoué (non bloquant)", exc_info=True)
|
||||||
|
|
||||||
# Boucles permanentes (pas besoin de session active)
|
# Boucles permanentes (pas besoin de session active)
|
||||||
self.running = True
|
self.running = True
|
||||||
self._bg_vision = VisionCapturer(str(SESSIONS_ROOT / "_background"))
|
self._bg_vision = VisionCapturer(str(SESSIONS_ROOT / "_background"))
|
||||||
@@ -363,11 +373,11 @@ class AgentV1:
|
|||||||
continue
|
continue
|
||||||
self._last_bg_hash = img_hash
|
self._last_bg_hash = img_hash
|
||||||
|
|
||||||
# Envoyer au streaming server (avec token auth)
|
# Send to the streaming server (via the unified STREAMING_ENDPOINT)
|
||||||
headers = {"Authorization": f"Bearer {API_TOKEN}"} if API_TOKEN else {}
|
headers = {"Authorization": f"Bearer {API_TOKEN}"} if API_TOKEN else {}
|
||||||
with open(full_path, 'rb') as f:
|
with open(full_path, 'rb') as f:
|
||||||
req.post(
|
req.post(
|
||||||
f"{SERVER_URL}/traces/stream/image",
|
f"{STREAMING_ENDPOINT}/image",
|
||||||
params={
|
params={
|
||||||
"session_id": bg_session,
|
"session_id": bg_session,
|
||||||
"shot_id": f"heartbeat_{int(time.time())}",
|
"shot_id": f"heartbeat_{int(time.time())}",
|
||||||
@@ -376,6 +386,7 @@ class AgentV1:
|
|||||||
headers=headers,
|
headers=headers,
|
||||||
files={"file": ("screenshot.png", f, "image/png")},
|
files={"file": ("screenshot.png", f, "image/png")},
|
||||||
timeout=10,
|
timeout=10,
|
||||||
|
allow_redirects=False,
|
||||||
)
|
)
|
||||||
except Exception as e:
|
except Exception as e:
|
||||||
logger.debug(f"[HEARTBEAT] Erreur: {e}")
|
logger.debug(f"[HEARTBEAT] Erreur: {e}")
|
||||||
@@ -445,6 +456,12 @@ class AgentV1:
|
|||||||
window_title = self.vision.get_active_window_title()
|
window_title = self.vision.get_active_window_title()
|
||||||
if window_title:
|
if window_title:
|
||||||
heartbeat_event["active_window_title"] = window_title
|
heartbeat_event["active_window_title"] = window_title
|
||||||
|
# QW1: multi-monitor enrichment (additive, graceful fallback)
|
||||||
|
try:
|
||||||
|
from .vision.capturer import _enrich_with_monitor_info
|
||||||
|
_enrich_with_monitor_info(heartbeat_event)
|
||||||
|
except Exception:
|
||||||
|
pass
|
||||||
self.streamer.push_event(heartbeat_event)
|
self.streamer.push_event(heartbeat_event)
|
||||||
except Exception as e:
|
except Exception as e:
|
||||||
logger.error(f"Heartbeat error: {e}")
|
logger.error(f"Heartbeat error: {e}")
|
||||||
|
|||||||
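The changes above drop `RPA_SERVER_HOST` and state that the host is now derived from `SERVER_URL`; the diff does not show how `LeaServerClient` does this. One plausible way, sketched with the standard library only (the function name `host_from_server_url` and the example URLs are hypothetical, and the real client may behave differently):

```python
from urllib.parse import urlparse


def host_from_server_url(server_url: str, default: str = "localhost") -> str:
    """Return the hostname part of server_url, or default if none is found."""
    # urlparse only fills in .hostname when the URL has a scheme and a
    # network location (e.g. "http://host:port/path").
    hostname = urlparse(server_url).hostname
    return hostname or default
```

For example, `host_from_server_url("https://rpa.example.org:5005/api/v1")` yields the bare hostname without port or path, and an empty string falls back to the default.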
149
agent_v0/agent_v1/network/feedback_bus.py
Normal file
@@ -0,0 +1,149 @@
|
|||||||
|
# agent_v1/network/feedback_bus.py
|
||||||
|
"""SocketIO client for the Léa feedback bus.
|
||||||
|
|
||||||
|
Consumes the 'lea:*' events emitted by agent_chat (port 5004) and dispatches
|
||||||
|
them to ChatWindow for display as real-time bubbles.
|
||||||
|
|
||||||
|
Events listened to:
|
||||||
|
lea:action_started - start of a workflow or an action
|
||||||
|
lea:action_progress - progress within the workflow
|
||||||
|
lea:done - end of a workflow or a copilot run
|
||||||
|
lea:need_confirm - copilot step awaiting validation
|
||||||
|
lea:step_result - result of a copilot step
|
||||||
|
lea:paused - switch to paused_need_help (demo asset)
|
||||||
|
lea:resumed - exit from a supervised pause
|
||||||
|
|
||||||
|
Fail-safe: any connection or dispatch error is logged quietly and never
|
||||||
|
raised. The ChatWindow keeps working even if the bus is down
|
||||||
|
(behaviour strictly identical to pre-J3).
|
||||||
|
|
||||||
|
Usage :
|
||||||
|
bus = FeedbackBusClient(
|
||||||
|
server_url="http://localhost:5004",
|
||||||
|
token=os.environ.get("RPA_API_TOKEN", ""),
|
||||||
|
on_event=lambda event, payload: print(event, payload),
|
||||||
|
)
|
||||||
|
bus.start() # background connection, non-blocking
|
||||||
|
# ... ChatWindow runs ...
|
||||||
|
bus.stop()
|
||||||
|
"""
|
||||||
|
|
||||||
|
import logging
|
||||||
|
import threading
|
||||||
|
from typing import Callable, Optional
|
||||||
|
|
||||||
|
import socketio
|
||||||
|
|
||||||
|
logger = logging.getLogger(__name__)
|
||||||
|
|
||||||
|
LEA_EVENTS = (
|
||||||
|
'lea:action_started',
|
||||||
|
'lea:action_progress',
|
||||||
|
'lea:done',
|
||||||
|
'lea:need_confirm',
|
||||||
|
'lea:step_result',
|
||||||
|
'lea:paused',
|
||||||
|
'lea:resumed',
|
||||||
|
)
|
||||||
|
|
||||||
|
EventCallback = Callable[[str, dict], None]
|
||||||
|
|
||||||
|
|
||||||
|
class FeedbackBusClient:
|
||||||
|
"""Non-blocking SocketIO client for the 'lea:*' bus."""
|
||||||
|
|
||||||
|
def __init__(
|
||||||
|
self,
|
||||||
|
server_url: str,
|
||||||
|
token: Optional[str] = None,
|
||||||
|
on_event: Optional[EventCallback] = None,
|
||||||
|
):
|
||||||
|
self._url = server_url.rstrip('/')
|
||||||
|
self._token = token or None
|
||||||
|
self._on_event: EventCallback = on_event or (lambda e, p: None)
|
||||||
|
self._sio = socketio.Client(
|
||||||
|
reconnection=True,
|
||||||
|
reconnection_attempts=0, # 0 = unlimited
|
||||||
|
reconnection_delay=2,
|
||||||
|
reconnection_delay_max=30,
|
||||||
|
logger=False,
|
||||||
|
engineio_logger=False,
|
||||||
|
)
|
||||||
|
self._thread: Optional[threading.Thread] = None
|
||||||
|
self._register_handlers()
|
||||||
|
|
||||||
|
def _register_handlers(self) -> None:
|
||||||
|
@self._sio.event
|
||||||
|
def connect():
|
||||||
|
logger.info("FeedbackBus connecté à %s", self._url)
|
||||||
|
|
||||||
|
@self._sio.event
|
||||||
|
def disconnect():
|
||||||
|
logger.info("FeedbackBus déconnecté")
|
||||||
|
|
||||||
|
for ev in LEA_EVENTS:
|
||||||
|
self._sio.on(ev, lambda data, e=ev: self._dispatch(e, data))
|
||||||
|
|
||||||
|
def _dispatch(self, event: str, payload: Optional[dict]) -> None:
|
||||||
|
try:
|
||||||
|
self._on_event(event, payload or {})
|
||||||
|
except Exception:
|
||||||
|
logger.debug("FeedbackBus dispatch silenced", exc_info=True)
|
||||||
|
|
||||||
|
def start(self) -> None:
|
||||||
|
"""Start the connection in the background (idempotent, non-blocking)."""
|
||||||
|
if self._thread is not None and self._thread.is_alive():
|
||||||
|
return
|
||||||
|
self._thread = threading.Thread(
|
||||||
|
target=self._run, daemon=True, name="LeaFeedbackBus",
|
||||||
|
)
|
||||||
|
self._thread.start()
|
||||||
|
|
||||||
|
def _run(self) -> None:
|
||||||
|
headers = {}
|
||||||
|
if self._token:
|
||||||
|
headers['Authorization'] = f'Bearer {self._token}'
|
||||||
|
try:
|
||||||
|
self._sio.connect(self._url, headers=headers, wait=True)
|
||||||
|
self._sio.wait()
|
||||||
|
except Exception as e:
|
||||||
|
logger.warning(
|
||||||
|
"FeedbackBus connect échoué (%s) — ChatWindow continue normalement", e,
|
||||||
|
)
|
||||||
|
|
||||||
|
def stop(self) -> None:
|
||||||
|
"""Stop the connection cleanly (idempotent, fail-safe)."""
|
||||||
|
try:
|
||||||
|
if self._sio.connected:
|
||||||
|
self._sio.disconnect()
|
||||||
|
except Exception:
|
||||||
|
logger.debug("FeedbackBus stop silenced", exc_info=True)
|
||||||
|
|
||||||
|
@property
|
||||||
|
def connected(self) -> bool:
|
||||||
|
return bool(self._sio.connected)
|
||||||
|
|
||||||
|
# ------------------------------------------------------------------
|
||||||
|
# User actions from the paused_need_help bubble (J3.5)
|
||||||
|
# ------------------------------------------------------------------
|
||||||
|
|
||||||
|
def resume_replay(self, replay_id: str) -> bool:
|
||||||
|
"""Continuer (Continue) button: emits 'lea:replay_resume' to agent_chat.
|
||||||
|
|
||||||
|
Returns True if the event could be emitted, False otherwise (disconnected/error).
|
||||||
|
"""
|
||||||
|
return self._safe_emit("lea:replay_resume", {"replay_id": replay_id})
|
||||||
|
|
||||||
|
def abort_replay(self, replay_id: str) -> bool:
|
||||||
|
"""Annuler (Cancel) button: emits 'lea:replay_abort' to agent_chat."""
|
||||||
|
return self._safe_emit("lea:replay_abort", {"replay_id": replay_id})
|
||||||
|
|
||||||
|
def _safe_emit(self, event: str, payload: dict) -> bool:
|
||||||
|
try:
|
||||||
|
if not self._sio.connected:
|
||||||
|
return False
|
||||||
|
self._sio.emit(event, payload)
|
||||||
|
return True
|
||||||
|
except Exception:
|
||||||
|
logger.debug("FeedbackBus _safe_emit silenced", exc_info=True)
|
||||||
|
return False
|
||||||
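`FeedbackBusClient` invokes `on_event` from its background SocketIO thread, while the ChatWindow is a Tkinter UI with its own main loop. One common way to bridge the two is a thread-safe queue that the UI drains periodically. The sketch below assumes a hypothetical `EventInbox` class that is not part of the diff; only the `(event, payload)` callback shape is taken from it:

```python
import queue
from typing import Callable, Tuple

Event = Tuple[str, dict]


class EventInbox:
    """Hypothetical thread-safe buffer between the bus thread and the UI thread."""

    def __init__(self) -> None:
        self._queue: "queue.Queue[Event]" = queue.Queue()

    def put(self, event: str, payload: dict) -> None:
        # Safe to call from any thread; same signature as the on_event callback.
        self._queue.put((event, payload))

    def drain(self, handler: Callable[[str, dict], None]) -> int:
        """Deliver all pending events to handler; return how many were handled."""
        handled = 0
        while True:
            try:
                event, payload = self._queue.get_nowait()
            except queue.Empty:
                return handled
            handler(event, payload)
            handled += 1
```

Under these assumptions the client would be built with `on_event=inbox.put`, and the UI thread would call `inbox.drain(...)` from a callback rescheduled with Tkinter's `root.after`, so that widgets are only touched from the main loop.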
@@ -544,6 +544,28 @@ class TraceStreamer:
|
|||||||
except OSError as e:
|
except OSError as e:
|
||||||
logger.debug(f"Purge échouée : {path} — {e}")
|
logger.debug(f"Purge échouée : {path} — {e}")
|
||||||
|
|
||||||
|
# =========================================================================
|
||||||
|
# POST→GET redirect protection (INC-7)
|
||||||
|
# =========================================================================
|
||||||
|
|
||||||
|
@staticmethod
|
||||||
|
def _check_redirect(resp, url: str):
|
||||||
|
"""Detect and log a redirect on a POST.
|
||||||
|
|
||||||
|
When it follows a 301/302, the requests library turns a POST into a GET (RFC 7231).
|
||||||
|
With allow_redirects=False, the 3xx response is returned directly.
|
||||||
|
An explicit WARNING is logged so that the admin can fix the URL.
|
||||||
|
"""
|
||||||
|
if resp.status_code in (301, 302, 307, 308):
|
||||||
|
location = resp.headers.get("Location", "?")
|
||||||
|
logger.warning(
|
||||||
|
f"Redirection {resp.status_code} detectee sur POST {url} "
|
||||||
|
f"→ {location}. Verifiez que RPA_SERVER_URL utilise "
|
||||||
|
f"https:// si le serveur redirige."
|
||||||
|
)
|
||||||
|
return True
|
||||||
|
return False
|
||||||
|
|
||||||
# =========================================================================
|
# =========================================================================
|
||||||
# Envois HTTP
|
# Envois HTTP
|
||||||
# =========================================================================
|
# =========================================================================
|
||||||
@@ -551,15 +573,20 @@ class TraceStreamer:
|
|||||||
def _register_session(self):
|
def _register_session(self):
|
||||||
"""Enregistrer la session auprès du serveur (avec identifiant machine)."""
|
"""Enregistrer la session auprès du serveur (avec identifiant machine)."""
|
||||||
try:
|
try:
|
||||||
|
url = f"{STREAMING_ENDPOINT}/register"
|
||||||
resp = requests.post(
|
resp = requests.post(
|
||||||
f"{STREAMING_ENDPOINT}/register",
|
url,
|
||||||
params={
|
params={
|
||||||
"session_id": self.session_id,
|
"session_id": self.session_id,
|
||||||
"machine_id": self.machine_id,
|
"machine_id": self.machine_id,
|
||||||
},
|
},
|
||||||
headers=self._auth_headers(),
|
headers=self._auth_headers(),
|
||||||
timeout=3,
|
timeout=3,
|
||||||
|
allow_redirects=False,
|
||||||
)
|
)
|
||||||
|
if self._check_redirect(resp, url):
|
||||||
|
logger.warning("Enregistrement session échoué (redirect)")
|
||||||
|
return
|
||||||
if resp.ok:
|
if resp.ok:
|
||||||
logger.info(
|
logger.info(
|
||||||
f"Session {self.session_id} enregistrée sur le serveur "
|
f"Session {self.session_id} enregistrée sur le serveur "
|
||||||
@@ -579,15 +606,18 @@ class TraceStreamer:
|
|||||||
C'est la dernière chance de sauver les données de la session.
|
C'est la dernière chance de sauver les données de la session.
|
||||||
"""
|
"""
|
||||||
try:
|
try:
|
||||||
|
url = f"{STREAMING_ENDPOINT}/finalize"
|
||||||
resp = requests.post(
|
resp = requests.post(
|
||||||
f"{STREAMING_ENDPOINT}/finalize",
|
url,
|
||||||
params={
|
params={
|
||||||
"session_id": self.session_id,
|
"session_id": self.session_id,
|
||||||
"machine_id": self.machine_id,
|
"machine_id": self.machine_id,
|
||||||
},
|
},
|
||||||
headers=self._auth_headers(),
|
headers=self._auth_headers(),
|
||||||
timeout=30, # Le build workflow peut prendre du temps
|
timeout=30, # Le build workflow peut prendre du temps
|
||||||
|
allow_redirects=False,
|
||||||
)
|
)
|
||||||
|
self._check_redirect(resp, url)
|
||||||
if resp.ok:
|
if resp.ok:
|
||||||
result = resp.json()
|
result = resp.json()
|
||||||
logger.info(f"Session finalisée: {result}")
|
logger.info(f"Session finalisée: {result}")
|
||||||
@@ -601,6 +631,7 @@ class TraceStreamer:
|
|||||||
if not self._server_available:
|
if not self._server_available:
|
||||||
return False
|
return False
|
||||||
try:
|
try:
|
||||||
|
url = f"{STREAMING_ENDPOINT}/event"
|
||||||
payload = {
|
payload = {
|
||||||
"session_id": self.session_id,
|
"session_id": self.session_id,
|
||||||
"timestamp": time.time(),
|
"timestamp": time.time(),
|
||||||
@@ -608,11 +639,14 @@ class TraceStreamer:
|
|||||||
"machine_id": self.machine_id,
|
"machine_id": self.machine_id,
|
||||||
}
|
}
|
||||||
resp = requests.post(
|
resp = requests.post(
|
||||||
f"{STREAMING_ENDPOINT}/event",
|
url,
|
||||||
json=payload,
|
json=payload,
|
||||||
headers=self._auth_headers(),
|
headers=self._auth_headers(),
|
||||||
timeout=2,
|
timeout=2,
|
||||||
|
allow_redirects=False,
|
||||||
)
|
)
|
||||||
|
if self._check_redirect(resp, url):
|
||||||
|
return False
|
||||||
return resp.ok
|
return resp.ok
|
||||||
except Exception as e:
|
except Exception as e:
|
||||||
logger.debug(f"Streaming Event échoué: {e}")
|
logger.debug(f"Streaming Event échoué: {e}")
|
||||||
@@ -645,18 +679,22 @@ class TraceStreamer:
|
|||||||
"machine_id": self.machine_id,
|
"machine_id": self.machine_id,
|
||||||
}
|
}
|
||||||
|
|
||||||
|
url = f"{STREAMING_ENDPOINT}/image"
|
||||||
if jpeg_buf is not None:
|
if jpeg_buf is not None:
|
||||||
# Envoi du JPEG compressé (BytesIO, pas de fuite possible)
|
# Envoi du JPEG compressé (BytesIO, pas de fuite possible)
|
||||||
files = {
|
files = {
|
||||||
"file": (f"{shot_id}{suffix}", jpeg_buf, content_type)
|
"file": (f"{shot_id}{suffix}", jpeg_buf, content_type)
|
||||||
}
|
}
|
||||||
resp = requests.post(
|
resp = requests.post(
|
||||||
f"{STREAMING_ENDPOINT}/image",
|
url,
|
||||||
files=files,
|
files=files,
|
||||||
params=params,
|
params=params,
|
||||||
headers=self._auth_headers(),
|
headers=self._auth_headers(),
|
||||||
timeout=5,
|
timeout=5,
|
||||||
|
allow_redirects=False,
|
||||||
)
|
)
|
||||||
|
if self._check_redirect(resp, url):
|
||||||
|
return ImageSendResult.FAILED
|
||||||
if resp.ok:
|
if resp.ok:
|
||||||
self._purge_local_image(path)
|
self._purge_local_image(path)
|
||||||
return ImageSendResult.OK
|
return ImageSendResult.OK
|
||||||
@@ -668,12 +706,15 @@ class TraceStreamer:
|
|||||||
"file": (f"{shot_id}.png", f, "image/png")
|
"file": (f"{shot_id}.png", f, "image/png")
|
||||||
}
|
}
|
||||||
resp = requests.post(
|
resp = requests.post(
|
||||||
f"{STREAMING_ENDPOINT}/image",
|
url,
|
||||||
files=files,
|
files=files,
|
||||||
params=params,
|
params=params,
|
||||||
headers=self._auth_headers(),
|
headers=self._auth_headers(),
|
||||||
timeout=5,
|
timeout=5,
|
||||||
|
allow_redirects=False,
|
||||||
)
|
)
|
||||||
|
if self._check_redirect(resp, url):
|
||||||
|
return ImageSendResult.FAILED
|
||||||
if resp.ok:
|
if resp.ok:
|
||||||
self._purge_local_image(path)
|
self._purge_local_image(path)
|
||||||
return ImageSendResult.OK
|
return ImageSendResult.OK
|
||||||
|
|||||||
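With `allow_redirects=False`, `requests` hands back the 3xx response itself, and `Response.ok` is true for any status below 400, so a redirect has to be ruled out before `resp.ok` is trusted. That is what the `_check_redirect` calls above do for 301, 302, 307 and 308. The sketch below separates the two checks on plain status codes; the helper names are hypothetical, and including 303 in the set is an assumption of this example, not something the diff does:

```python
# Assumption: 303 is added here for completeness; the diff checks
# only 301, 302, 307 and 308.
REDIRECT_STATUSES = frozenset({301, 302, 303, 307, 308})


def is_redirect(status_code: int) -> bool:
    """True if the status code is one of the redirect codes listed above."""
    return status_code in REDIRECT_STATUSES


def post_succeeded(status_code: int) -> bool:
    """Treat a POST as successful only for 2xx responses.

    This is stricter than Response.ok, which also accepts 3xx codes.
    """
    return 200 <= status_code < 300
```

A caller in the style of `_send_event` could then return `post_succeeded(resp.status_code)` after logging when `is_redirect(resp.status_code)` holds.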
@@ -3,7 +3,9 @@ mss>=9.0.1 # Capture d'écran haute performance
|
|||||||
pynput>=1.7.7 # Clavier/Souris Cross-plateforme
|
pynput>=1.7.7 # Clavier/Souris Cross-plateforme
|
||||||
Pillow>=10.0.0 # Crops et processing image
|
Pillow>=10.0.0 # Crops et processing image
|
||||||
requests>=2.31.0 # Streaming réseau
|
requests>=2.31.0 # Streaming réseau
|
||||||
|
python-socketio[client]>=5.10,<6.0 # Léa 'lea:*' feedback bus (compatible with the Flask-SocketIO 5.3.x server)
|
||||||
psutil>=5.9.0 # Monitoring CPU/RAM
|
psutil>=5.9.0 # Monitoring CPU/RAM
|
||||||
|
screeninfo>=0.8 # QW1: detection of physical monitors + offsets
|
||||||
pystray>=0.19.5 # Icône Tray UI
|
pystray>=0.19.5 # Icône Tray UI
|
||||||
plyer>=2.1.0 # Notifications toast natives (remplace PyQt5)
|
plyer>=2.1.0 # Notifications toast natives (remplace PyQt5)
|
||||||
pywebview>=5.0 # Fenêtre de chat Léa intégrée (Edge WebView2 sur Windows)
|
pywebview>=5.0 # Fenêtre de chat Léa intégrée (Edge WebView2 sur Windows)
|
||||||
|
|||||||
0
agent_v0/agent_v1/tools/__init__.py
Normal file
87
agent_v0/agent_v1/tools/test_lea_toast.py
Normal file
@@ -0,0 +1,87 @@
|
|||||||
|
# agent_v1/tools/test_lea_toast.py
|
||||||
|
"""
|
||||||
|
Quick visual test of the Léa toast (GHT demo, 8 May 2026).
|
||||||
|
|
||||||
|
Runs three successive toast scenarios to validate the display on Windows:
|
||||||
|
1. Simple "supervised pause" toast
|
||||||
|
2. Toast with a long message (to check wraplength)
|
||||||
|
3. BLOCAGE-type toast (what the user sees when Léa is lost)
|
||||||
|
|
||||||
|
Usage on Windows:
|
||||||
|
C:\\rpa_vision\\.venv\\Scripts\\python.exe C:\\rpa_vision\\agent_v1\\tools\\test_lea_toast.py
|
||||||
|
|
||||||
|
You should see three successive toasts at the top right of the main
|
||||||
|
screen, ~6 s apart, with the Léa blue background, auto-dismissed after 15 s or on click.
|
||||||
|
"""
|
||||||
|
from __future__ import annotations
|
||||||
|
|
||||||
|
import sys
|
||||||
|
import time
|
||||||
|
from pathlib import Path
|
||||||
|
|
||||||
|
|
||||||
|
def _bootstrap_path() -> None:
|
||||||
|
"""Allow direct execution without -m: add C:\\rpa_vision to sys.path."""
|
||||||
|
here = Path(__file__).resolve()
|
||||||
|
# Walk up: tools -> agent_v1 -> rpa_vision (parent of the agent_v1 package)
|
||||||
|
rpa_root = here.parent.parent.parent
|
||||||
|
if str(rpa_root) not in sys.path:
|
||||||
|
sys.path.insert(0, str(rpa_root))
|
||||||
|
|
||||||
|
|
||||||
|
def main() -> int:
|
||||||
|
_bootstrap_path()
|
||||||
|
|
||||||
|
# Import after the path has been added (both variants work)
|
||||||
|
try:
|
||||||
|
from agent_v1.ui.paused_toast import show_paused_toast
|
||||||
|
except Exception as e: # pragma: no cover (debug only)
|
||||||
|
print(f"[TEST] ERREUR import agent_v1.ui.paused_toast : {e}")
|
||||||
|
return 1
|
||||||
|
|
||||||
|
scenarios = [
|
||||||
|
(
|
||||||
|
"Toast 1/3 : pause simple",
|
||||||
|
"Léa a besoin de votre aide",
|
||||||
|
"Test 1/3 — Pause supervisée. Cliquez sur 'Continuer' dans la chat.",
|
||||||
|
),
|
||||||
|
(
|
||||||
|
"Toast 2/3 : message long",
|
||||||
|
"Léa — j'attends votre validation",
|
||||||
|
(
|
||||||
|
"Test 2/3 — J'ai trouvé 11 dossiers correspondant à vos critères "
|
||||||
|
"(UHCD, Forfait 1, PE2). Je vais traiter le dossier de M. DUPONT "
|
||||||
|
"Jean en premier. Pouvez-vous valider que c'est le bon ordre "
|
||||||
|
"avant que je continue ?"
|
||||||
|
),
|
||||||
|
),
|
||||||
|
(
|
||||||
|
"Toast 3/3 : blocage cible non trouvée",
|
||||||
|
"Léa — je ne vois pas l'élément",
|
||||||
|
(
|
||||||
|
"Test 3/3 — Je n'arrive pas à trouver « Examens cliniques » à "
|
||||||
|
"l'écran. Pouvez-vous me montrer où cliquer ?"
|
||||||
|
),
|
||||||
|
),
|
||||||
|
]
|
||||||
|
|
||||||
|
for label, title, message in scenarios:
|
||||||
|
print(f"[TEST] {label}")
|
||||||
|
ok = show_paused_toast(title=title, message=message)
|
||||||
|
print(f" show_paused_toast() = {ok}")
|
||||||
|
if not ok:
|
||||||
|
print(f" ECHEC : {label}")
|
||||||
|
# Space the toasts out so that Dom can see each one distinctly
|
||||||
|
# (the internal rate limit is 3 s for identical messages, but here the
|
||||||
|
# messages differ, so the rate limit does not apply)
|
||||||
|
time.sleep(6)
|
||||||
|
|
||||||
|
print("[TEST] Attente 12s supplémentaires pour laisser le dernier toast vivre...")
|
||||||
|
time.sleep(12)
|
||||||
|
print("[TEST] OK — fin du test. Si vous avez vu 3 toasts bleus en haut-droite,")
|
||||||
|
print(" le mécanisme Léa pause est validé.")
|
||||||
|
return 0
|
||||||
|
|
||||||
|
|
||||||
|
if __name__ == "__main__":
|
||||||
|
sys.exit(main())
|
||||||
53
agent_v0/agent_v1/ui/_test_paused_toast.py
Normal file
@@ -0,0 +1,53 @@
|
|||||||
|
# agent_v1/ui/_test_paused_toast.py
|
||||||
|
"""
|
||||||
|
Isolated test of the paused toast, to be run directly on Windows.
|
||||||
|
|
||||||
|
Usage (on Windows, from C:\\rpa_vision, the parent of the agent_v1 package):
|
||||||
|
python -m agent_v1.ui._test_paused_toast
|
||||||
|
|
||||||
|
OR, more simply:
|
||||||
|
python C:\\rpa_vision\\agent_v1\\ui\\_test_paused_toast.py
|
||||||
|
|
||||||
|
The toast should appear at the top right of the main screen for ~15 s.
|
||||||
|
"""
|
||||||
|
from __future__ import annotations
|
||||||
|
|
||||||
|
import sys
|
||||||
|
import time
|
||||||
|
|
||||||
|
|
||||||
|
def main() -> int:
|
||||||
|
print("[TEST] Lancement du toast paused...")
|
||||||
|
|
||||||
|
try:
|
||||||
|
# Flexible import: try the relative import first, then the absolute one
|
||||||
|
try:
|
||||||
|
from .paused_toast import show_paused_toast
|
||||||
|
except ImportError:
|
||||||
|
from paused_toast import show_paused_toast
|
||||||
|
except Exception as e:
|
||||||
|
print(f"[TEST] ERREUR import : {e}")
|
||||||
|
return 1
|
||||||
|
|
||||||
|
ok = show_paused_toast(
|
||||||
|
title="Léa a besoin de votre aide",
|
||||||
|
message=(
|
||||||
|
"Test isolé — démo GHT 8 mai 2026.\n"
|
||||||
|
"Si vous voyez ce toast, le mécanisme de pause supervisée "
|
||||||
|
"fonctionne correctement."
|
||||||
|
),
|
||||||
|
)
|
||||||
|
print(f"[TEST] show_paused_toast() retour = {ok}")
|
||||||
|
|
||||||
|
if not ok:
|
||||||
|
print("[TEST] ÉCHEC : toast non déclenché.")
|
||||||
|
return 2
|
||||||
|
|
||||||
|
print("[TEST] Toast déclenché. Attente de 18s pour le voir s'afficher puis se fermer...")
|
||||||
|
time.sleep(18)
|
||||||
|
print("[TEST] OK — fin du test.")
|
||||||
|
return 0
|
||||||
|
|
||||||
|
|
||||||
|
if __name__ == "__main__":
|
||||||
|
sys.exit(main())
|
||||||
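Review note: the test script above tries a relative import and falls back to an absolute one, because a relative import raises `ImportError` when the file is run directly as a script. The same idea can be sketched with absolute module names only. `import_first` is a hypothetical helper, not something defined in this change set.

```python
import importlib
from types import ModuleType
from typing import Optional, Sequence


def import_first(candidates: Sequence[str]) -> ModuleType:
    """Return the first module in `candidates` that imports successfully."""
    last_error: Optional[ImportError] = None
    for name in candidates:
        try:
            return importlib.import_module(name)
        except ImportError as exc:
            # Remember the failure and move on to the next candidate.
            last_error = exc
    raise ImportError(f"none of {list(candidates)} could be imported") from last_error
```

A call such as `import_first(["agent_v1.ui.paused_toast", "paused_toast"])` would mirror the two launch modes listed in the docstring, assuming the package layout shown there.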
@@ -16,6 +16,15 @@ from typing import Any, Callable, Dict, Optional
|
|||||||
|
|
||||||
logger = logging.getLogger(__name__)
|
logger = logging.getLogger(__name__)
|
||||||
|
|
||||||
|
# FeedbackBus: fail-safe import (the ChatWindow must still run when python-socketio
|
||||||
|
# is not installed on the client machine, e.g. Pauline's older installation)
|
||||||
|
try:
|
||||||
|
from ..network.feedback_bus import FeedbackBusClient
|
||||||
|
_HAS_FEEDBACK_BUS = True
|
||||||
|
except Exception:
|
||||||
|
FeedbackBusClient = None # type: ignore
|
||||||
|
_HAS_FEEDBACK_BUS = False
|
||||||
|
|
||||||
# ---------------------------------------------------------------------------
|
# ---------------------------------------------------------------------------
|
||||||
# Theme — palette professionnelle claire
|
# Theme — palette professionnelle claire
|
||||||
# ---------------------------------------------------------------------------
|
# ---------------------------------------------------------------------------
|
||||||
@@ -42,6 +51,25 @@ SCROLLBAR_BG = "#E5E7EB" # Fond scrollbar
|
|||||||
SCROLLBAR_FG = "#9CA3AF" # Curseur scrollbar
|
SCROLLBAR_FG = "#9CA3AF" # Curseur scrollbar
|
||||||
MSG_BORDER_COLOR = "#D1D5DB" # Bordure subtile des bulles de messages
|
MSG_BORDER_COLOR = "#D1D5DB" # Bordure subtile des bulles de messages
|
||||||
|
|
||||||
|
# paused_need_help bubble (J3.5): non-blocking alert, a key demo asset
|
||||||
|
PAUSED_BG = "#FEF3C7" # Jaune pâle
|
||||||
|
PAUSED_BORDER = "#F59E0B" # Orange ambré
|
||||||
|
PAUSED_FG = "#92400E" # Brun foncé (lisible sur fond jaune)
|
||||||
|
PAUSED_BTN_RESUME_BG = "#22C55E" # Vert
|
||||||
|
PAUSED_BTN_RESUME_HOVER = "#16A34A"
|
||||||
|
PAUSED_BTN_ABORT_BG = "#9CA3AF" # Gris neutre (pas dramatique)
|
||||||
|
PAUSED_BTN_ABORT_HOVER = "#6B7280"
|
||||||
|
|
||||||
|
# "Léa exécute" bubble (J3.4): distinct from the regular chat bubbles
|
||||||
|
ACTION_BG = "#F1F5F9" # Gris très clair (différencie d'une réponse chat)
|
||||||
|
ACTION_BORDER = "#CBD5E1" # Gris pâle
|
||||||
|
ACTION_FG = "#1E293B" # Gris foncé
|
||||||
|
ACTION_META_FG = "#94A3B8" # Métadonnées en gris discret
|
||||||
|
ACTION_ICON_RUN = "#3B82F6" # Bleu (en cours)
|
||||||
|
ACTION_ICON_OK = "#22C55E" # Vert (succès)
|
||||||
|
ACTION_ICON_ERR = "#EF4444" # Rouge (échec)
|
||||||
|
ACTION_ICON_INFO = "#64748B" # Gris (neutre)
|
||||||
|
|
||||||
# Dimensions — confortables
|
# Dimensions — confortables
|
||||||
WIN_WIDTH = 600
|
WIN_WIDTH = 600
|
||||||
WIN_HEIGHT = 800
|
WIN_HEIGHT = 800
|
||||||
@@ -62,6 +90,80 @@ FONT_SEND_BTN = ("Segoe UI", 13)
|
|||||||
FONT_RESIZE_GRIP = ("Segoe UI", 10)
|
FONT_RESIZE_GRIP = ("Segoe UI", 10)
|
||||||
|
|
||||||
|
|
||||||
|
# ---------------------------------------------------------------------------
|
||||||
|
# Templates for the "Léa exécute" bubbles (J3.4)
|
||||||
|
# Each template takes a payload and returns (icon, icon_color, title).
|
||||||
|
# The labels are deliberately neutral: the business context comes from the
|
||||||
|
# payload (workflow, action, message) and is never hardcoded.
|
||||||
|
# ---------------------------------------------------------------------------
|
||||||
|
|
||||||
|
def _tpl_action_started(payload: Dict[str, Any]) -> tuple:
|
||||||
|
wf = payload.get("workflow") or "?"
|
||||||
|
return ("▶", ACTION_ICON_RUN, f"Démarrage : {wf}")
|
||||||
|
|
||||||
|
|
||||||
|
def _tpl_action_progress(payload: Dict[str, Any]) -> tuple:
|
||||||
|
cur = payload.get("current", "?")
|
||||||
|
tot = payload.get("total", "?")
|
||||||
|
step = payload.get("step")
|
||||||
|
title = step if step else f"Étape {cur}/{tot}"
|
||||||
|
return ("⋯", ACTION_ICON_RUN, str(title))
|
||||||
|
|
||||||
|
|
||||||
|
def _tpl_done(payload: Dict[str, Any]) -> tuple:
|
||||||
|
success = bool(payload.get("success", True))
|
||||||
|
msg = payload.get("message") or ("Terminé" if success else "Échec")
|
||||||
|
if success:
|
||||||
|
return ("✓", ACTION_ICON_OK, str(msg))
|
||||||
|
return ("✗", ACTION_ICON_ERR, str(msg))
|
||||||
|
|
||||||
|
|
||||||
|
def _tpl_need_confirm(payload: Dict[str, Any]) -> tuple:
|
||||||
|
action = payload.get("action") or {}
|
||||||
|
desc = action.get("description") if isinstance(action, dict) else None
|
||||||
|
title = desc or "Validation requise"
|
||||||
|
return ("?", ACTION_ICON_RUN, str(title))
|
||||||
|
|
||||||
|
|
||||||
|
def _tpl_step_result(payload: Dict[str, Any]) -> tuple:
|
||||||
|
status = (payload.get("status") or "").lower()
|
||||||
|
msg = payload.get("message") or status or "Étape terminée"
|
||||||
|
if status in ("ok", "success", "approved"):
|
||||||
|
return ("✓", ACTION_ICON_OK, str(msg))
|
||||||
|
if status in ("error", "failed"):
|
||||||
|
return ("✗", ACTION_ICON_ERR, str(msg))
|
||||||
|
return ("·", ACTION_ICON_INFO, str(msg))
|
||||||
|
|
||||||
|
|
||||||
|
def _tpl_resumed(payload: Dict[str, Any]) -> tuple:
|
||||||
|
return ("→", ACTION_ICON_OK, "Reprise")
|
||||||
|
|
||||||
|
|
||||||
|
_ACTION_TEMPLATES = {
|
||||||
|
"lea:action_started": _tpl_action_started,
|
||||||
|
"lea:action_progress": _tpl_action_progress,
|
||||||
|
"lea:done": _tpl_done,
|
||||||
|
"lea:need_confirm": _tpl_need_confirm,
|
||||||
|
"lea:step_result": _tpl_step_result,
|
||||||
|
"lea:resumed": _tpl_resumed,
|
||||||
|
}
|
||||||
|
|
||||||
|
|
||||||
|
def _extract_meta(payload: Dict[str, Any]) -> str:
|
||||||
|
"""Technical metadata for the bubble footer (workflow, step, short replay_id)."""
|
||||||
|
parts = []
|
||||||
|
wf = payload.get("workflow")
|
||||||
|
if wf:
|
||||||
|
parts.append(str(wf))
|
||||||
|
cur, tot = payload.get("current"), payload.get("total")
|
||||||
|
if cur is not None and tot is not None:
|
||||||
|
parts.append(f"étape {cur}/{tot}")
|
||||||
|
rid = payload.get("replay_id")
|
||||||
|
if rid:
|
||||||
|
parts.append(f"#{str(rid)[-6:]}")
|
||||||
|
return " • ".join(parts)
|
||||||
|
|
||||||
|
|
||||||
class ChatWindow:
|
class ChatWindow:
|
||||||
"""Fenetre de chat Lea en tkinter natif.
|
"""Fenetre de chat Lea en tkinter natif.
|
||||||
|
|
||||||
@@ -91,6 +193,8 @@ class ChatWindow:
|
|||||||
self._root = None
|
self._root = None
|
||||||
self._ready = threading.Event()
|
self._ready = threading.Event()
|
||||||
self._messages = [] # historique local
|
self._messages = [] # historique local
|
||||||
|
self._bus: Optional[Any] = None # FeedbackBusClient (J3.3, peut rester None)
|
||||||
|
self._active_paused_bubble: Optional[Dict[str, Any]] = None # bulle paused active (J3.5)
|
||||||
|
|
||||||
# S'abonner aux changements de l'etat partage
|
# S'abonner aux changements de l'etat partage
|
||||||
if self._shared_state is not None:
|
if self._shared_state is not None:
|
||||||
@@ -266,6 +370,9 @@ class ChatWindow:
|
|||||||
# Signaler que la fenetre est prete
|
# Signaler que la fenetre est prete
|
||||||
self._ready.set()
|
self._ready.set()
|
||||||
|
|
||||||
|
# Start the Léa feedback bus (real-time 'lea:*' events)
|
||||||
|
self._start_feedback_bus()
|
||||||
|
|
||||||
# Boucle tkinter
|
# Boucle tkinter
|
||||||
root.mainloop()
|
root.mainloop()
|
||||||
|
|
||||||
@@ -608,6 +715,12 @@ class ChatWindow:
|
|||||||
|
|
||||||
def _do_destroy(self) -> None:
|
def _do_destroy(self) -> None:
|
||||||
"""Detruit la fenetre (appele dans le thread tkinter)."""
|
"""Detruit la fenetre (appele dans le thread tkinter)."""
|
||||||
|
if self._bus is not None:
|
||||||
|
try:
|
||||||
|
self._bus.stop()
|
||||||
|
except Exception:
|
||||||
|
pass
|
||||||
|
self._bus = None
|
||||||
if self._root is not None:
|
if self._root is not None:
|
||||||
try:
|
try:
|
||||||
self._root.quit()
|
self._root.quit()
|
||||||
@@ -617,6 +730,260 @@ class ChatWindow:
|
|||||||
self._root = None
|
self._root = None
|
||||||
self._visible = False
|
self._visible = False
|
||||||
|
|
||||||
|
# ======================================================================
|
||||||
|
# FeedbackBus: real-time bubbles during execution (J3.3)
|
||||||
|
# ======================================================================
|
||||||
|
|
||||||
|
def _start_feedback_bus(self) -> None:
|
||||||
|
"""Start the connection to the 'lea:*' bus if the flag is set and the library is available."""
|
||||||
|
if not _HAS_FEEDBACK_BUS:
|
||||||
|
logger.debug("FeedbackBus non disponible (python-socketio manquant)")
|
||||||
|
return
|
||||||
|
flag = os.environ.get("LEA_FEEDBACK_BUS", "0").lower()
|
||||||
|
if flag not in ("1", "true", "yes", "on"):
|
||||||
|
return
|
||||||
|
try:
|
||||||
|
url = f"http://{self._server_host}:{self._chat_port}"
|
||||||
|
token = os.environ.get("RPA_API_TOKEN", "") or None
|
||||||
|
self._bus = FeedbackBusClient(url, token=token, on_event=self._on_lea_event)
|
||||||
|
self._bus.start()
|
||||||
|
logger.info("FeedbackBus demarre : %s", url)
|
||||||
|
except Exception:
|
||||||
|
logger.debug("FeedbackBus init silenced", exc_info=True)
|
||||||
|
self._bus = None
|
||||||
|
|
||||||
|
def _on_lea_event(self, event: str, payload: Dict[str, Any]) -> None:
|
||||||
|
"""Bus callback → Léa bubble. Thread-safe: the helpers go through root.after."""
|
||||||
|
payload = payload or {}
|
||||||
|
|
||||||
|
# J3.5: the supervised pause has its own interactive bubble
|
||||||
|
if event == "lea:paused":
|
||||||
|
self._add_paused_bubble(payload)
|
||||||
|
return
|
||||||
|
if event in ("lea:resumed", "lea:done"):
|
||||||
|
self._close_active_paused_bubble(reason=event)
|
||||||
|
# fall through to display the action bubble (see the dispatch below)
|
||||||
|
|
||||||
|
# Bus acks (resume_acked, abort_acked): silent on the UI side
|
||||||
|
if event in ("lea:resume_acked", "lea:abort_acked"):
|
||||||
|
return
|
||||||
|
|
||||||
|
# J3.4: styled "Léa exécute" bubble (separate from the regular chat bubbles)
|
||||||
|
rendered = _ACTION_TEMPLATES.get(event)
|
||||||
|
if rendered is None:
|
||||||
|
# Unknown event: display it as a neutral action bubble
|
||||||
|
self._add_action_bubble(
|
||||||
|
icon="·", icon_color=ACTION_ICON_INFO,
|
||||||
|
title=event.removeprefix("lea:"),
|
||||||
|
meta=_extract_meta(payload),
|
||||||
|
)
|
||||||
|
return
|
||||||
|
icon, icon_color, title = rendered(payload)
|
||||||
|
self._add_action_bubble(
|
||||||
|
icon=icon, icon_color=icon_color, title=title,
|
||||||
|
meta=_extract_meta(payload),
|
||||||
|
)
|
||||||
|
|
||||||
|
# ------------------------------------------------------------------
|
||||||
|
# Styled "Léa exécute" bubble (J3.4)
|
||||||
|
# ------------------------------------------------------------------
|
||||||
|
|
||||||
|
def _add_action_bubble(
|
||||||
|
self, icon: str, icon_color: str, title: str, meta: str = "",
|
||||||
|
) -> None:
|
||||||
|
if self._root is None:
|
||||||
|
return
|
||||||
|
self._root.after(0, lambda: self._render_action_bubble(icon, icon_color, title, meta))
|
||||||
|
|
||||||
|
def _render_action_bubble(
|
||||||
|
self, icon: str, icon_color: str, title: str, meta: str,
|
||||||
|
) -> None:
|
||||||
|
tk = self._tk
|
||||||
|
if getattr(self, "_msg_frame", None) is None:
|
||||||
|
return
|
||||||
|
now = datetime.now().strftime("%H:%M")
|
||||||
|
|
||||||
|
container = tk.Frame(self._msg_frame, bg=BG_COLOR)
|
||||||
|
container.pack(fill=tk.X, padx=MARGIN, pady=3)
|
||||||
|
|
||||||
|
inner = tk.Frame(
|
||||||
|
container, bg=ACTION_BG, padx=10, pady=6,
|
||||||
|
highlightbackground=ACTION_BORDER, highlightthickness=1,
|
||||||
|
)
|
||||||
|
inner.pack(anchor=tk.W, padx=(0, 70), fill=tk.X)
|
||||||
|
|
||||||
|
row = tk.Frame(inner, bg=ACTION_BG)
|
||||||
|
row.pack(fill=tk.X, anchor=tk.W)
|
||||||
|
|
||||||
|
tk.Label(
|
||||||
|
row, text=icon, bg=ACTION_BG, fg=icon_color,
|
||||||
|
font=("Segoe UI", 13, "bold"), padx=4,
|
||||||
|
).pack(side=tk.LEFT)
|
||||||
|
|
||||||
|
tk.Label(
|
||||||
|
row, text=title, bg=ACTION_BG, fg=ACTION_FG,
|
||||||
|
font=FONT_MSG, anchor="w", justify=tk.LEFT,
|
||||||
|
wraplength=MSG_WRAP_WIDTH - 60,
|
||||||
|
).pack(side=tk.LEFT, fill=tk.X, expand=True, padx=(2, 0))
|
||||||
|
|
||||||
|
if meta:
|
||||||
|
tk.Label(
|
||||||
|
inner, text=f"{meta} • {now}",
|
||||||
|
bg=ACTION_BG, fg=ACTION_META_FG,
|
||||||
|
font=FONT_TIMESTAMP, anchor="w",
|
||||||
|
).pack(fill=tk.X, anchor=tk.W, pady=(2, 0))
|
||||||
|
|
||||||
|
# ------------------------------------------------------------------
|
||||||
|
# Interactive paused_need_help bubble (J3.5)
|
||||||
|
# ------------------------------------------------------------------
|
||||||
|
|
||||||
|
def _add_paused_bubble(self, payload: Dict[str, Any]) -> None:
|
||||||
|
"""Add an interactive paused bubble (demo asset: Léa asks for help).
|
||||||
|
|
||||||||
|
IMPORTANT (8 May 2026, GHT demo): by default the window starts hidden
|
||||||
|
(`root.withdraw()`). It MUST be made visible and forced to the
|
||||||
|
foreground, otherwise Dom never sees the bubble. This runs in the tkinter
|
||||||
|
thread via `root.after(0, ...)`.
|
||||||
|
"""
|
||||||
|
if self._root is None:
|
||||||
|
return
|
||||||
|
|
||||||
|
def _show_and_render():
|
||||||
|
try:
|
||||||
|
self._do_show()
|
||||||
|
# Re-pin topmost to get in front of the active apps
|
||||||
|
self._root.attributes("-topmost", True)
|
||||||
|
self._root.lift()
|
||||||
|
# Topmost toast as a complement (visible even if the chat window is
|
||||||
|
# hidden behind an application window)
|
||||||
|
try:
|
||||||
|
from .paused_toast import show_paused_toast
|
||||||
|
reason = payload.get("reason") or "Action en attente."
|
||||||
|
show_paused_toast(
|
||||||
|
title="Léa a besoin de votre aide",
|
||||||
|
message=str(reason)[:300],
|
||||||
|
)
|
||||||
|
except Exception:
|
||||||
|
logger.debug("paused_toast launch silenced", exc_info=True)
|
||||||
|
except Exception:
|
||||||
|
logger.debug("force-show chat_window silenced", exc_info=True)
|
||||||
|
self._render_paused_bubble(payload)
|
||||||
|
|
||||||
|
self._root.after(0, _show_and_render)
|
||||||
|
|
||||||
|
def _render_paused_bubble(self, payload: Dict[str, Any]) -> None:
|
||||||
|
tk = self._tk
|
||||||
|
if getattr(self, "_msg_frame", None) is None:
|
||||||
|
return
|
||||||
|
|
||||||
|
replay_id = str(payload.get("replay_id", "") or "")
|
||||||
|
workflow = payload.get("workflow", "?")
|
||||||
|
reason = payload.get("reason") or "Action incertaine — j'ai besoin de votre validation."
|
||||||
|
completed = payload.get("completed", 0)
|
||||||
|
total = payload.get("total", "?")
|
||||||
|
now = datetime.now().strftime("%H:%M")
|
||||||
|
|
||||||
|
container = tk.Frame(self._msg_frame, bg=BG_COLOR)
|
||||||
|
container.pack(fill=tk.X, padx=MARGIN, pady=6)
|
||||||
|
|
||||||
|
inner = tk.Frame(
|
||||||
|
container, bg=PAUSED_BG, padx=14, pady=12,
|
||||||
|
highlightbackground=PAUSED_BORDER, highlightthickness=2,
|
||||||
|
)
|
||||||
|
inner.pack(anchor=tk.W, padx=(0, 50), fill=tk.X)
|
||||||
|
|
||||||
|
tk.Label(
|
||||||
|
inner, text=f"⏸ Pause supervisée • {now}",
|
||||||
|
bg=PAUSED_BG, fg=PAUSED_FG,
|
||||||
|
font=("Segoe UI", 12, "bold"), anchor="w",
|
||||||
|
).pack(fill=tk.X, anchor=tk.W)
|
||||||
|
|
||||||
|
tk.Label(
|
||||||
|
inner, text=reason, bg=PAUSED_BG, fg=PAUSED_FG,
|
||||||
|
font=FONT_MSG, wraplength=MSG_WRAP_WIDTH - 30,
|
||||||
|
anchor="w", justify=tk.LEFT,
|
||||||
|
).pack(fill=tk.X, anchor=tk.W, pady=(6, 0))
|
||||||
|
|
||||||
|
tk.Label(
|
||||||
|
inner, text=f"{workflow} — étape {completed}/{total}",
|
||||||
|
bg=PAUSED_BG, fg=TIMESTAMP_FG, font=FONT_TIMESTAMP, anchor="w",
|
||||||
|
).pack(fill=tk.X, anchor=tk.W, pady=(4, 8))
|
||||||
|
|
||||||
|
btn_frame = tk.Frame(inner, bg=PAUSED_BG)
|
||||||
|
btn_frame.pack(fill=tk.X, anchor=tk.W)
|
||||||
|
|
||||||
|
btn_resume = tk.Button(
|
||||||
|
btn_frame, text="Continuer",
|
||||||
|
bg=PAUSED_BTN_RESUME_BG, fg="white", font=FONT_QUICK_BTN,
|
||||||
|
padx=14, pady=4, bd=0, cursor="hand2",
|
||||||
|
activebackground=PAUSED_BTN_RESUME_HOVER, activeforeground="white",
|
||||||
|
command=lambda: self._on_paused_resume(replay_id),
|
||||||
|
)
|
||||||
|
btn_resume.pack(side=tk.LEFT, padx=(0, 8))
|
||||||
|
|
||||||
|
btn_abort = tk.Button(
|
||||||
|
btn_frame, text="Annuler",
|
||||||
|
bg=PAUSED_BTN_ABORT_BG, fg="white", font=FONT_QUICK_BTN,
|
||||||
|
padx=14, pady=4, bd=0, cursor="hand2",
|
||||||
|
activebackground=PAUSED_BTN_ABORT_HOVER, activeforeground="white",
|
||||||
|
command=lambda: self._on_paused_abort(replay_id),
|
||||||
|
)
|
||||||
|
btn_abort.pack(side=tk.LEFT)
|
||||||
|
|
||||||
|
self._active_paused_bubble = {
|
||||||
|
"container": container, "inner": inner,
|
||||||
|
"btn_resume": btn_resume, "btn_abort": btn_abort,
|
||||||
|
"replay_id": replay_id,
|
||||||
|
}
|
||||||
|
|
||||||
|
def _close_active_paused_bubble(self, reason: str) -> None:
|
||||||
|
if self._active_paused_bubble is None or self._root is None:
|
||||||
|
return
|
||||||
|
self._root.after(0, lambda: self._do_close_paused_bubble(reason))
|
||||||
|
|
||||||
|
def _do_close_paused_bubble(self, reason: str) -> None:
|
||||||
|
bubble = self._active_paused_bubble
|
||||||
|
if bubble is None:
|
||||||
|
return
|
||||||
|
try:
|
||||||
|
bubble["btn_resume"].config(state="disabled")
|
||||||
|
bubble["btn_abort"].config(state="disabled")
|
||||||
|
label_text = {
|
||||||
|
"lea:resumed": "→ Reprise",
|
||||||
|
"lea:done": "→ Terminé",
|
||||||
|
}.get(reason, f"→ {reason}")
|
||||||
|
self._tk.Label(
|
||||||
|
bubble["inner"], text=label_text,
|
||||||
|
bg=PAUSED_BG, fg=PAUSED_FG, font=FONT_TIMESTAMP, anchor="w",
|
||||||
|
).pack(fill="x", anchor="w", pady=(6, 0))
|
||||||
|
except Exception:
|
||||||
|
logger.debug("close paused bubble silenced", exc_info=True)
|
||||||
|
self._active_paused_bubble = None
|
||||||
|
|
||||||
|
def _on_paused_resume(self, replay_id: str) -> None:
|
||||||
|
if not replay_id or self._bus is None or not self._bus.connected:
|
||||||
|
self._add_lea_message("⚠ Bus indisponible — impossible de relancer")
|
||||||
|
return
|
||||||
|
self._bus.resume_replay(replay_id)
|
||||||
|
if self._active_paused_bubble:
|
||||||
|
try:
|
||||||
|
self._active_paused_bubble["btn_resume"].config(state="disabled")
|
||||||
|
self._active_paused_bubble["btn_abort"].config(state="disabled")
|
||||||
|
except Exception:
|
||||||
|
pass
|
||||||
|
|
||||||
|
def _on_paused_abort(self, replay_id: str) -> None:
|
||||||
|
if self._bus is None or not self._bus.connected:
|
||||||
|
self._add_lea_message("⚠ Bus indisponible — impossible d'annuler")
|
||||||
|
return
|
||||||
|
self._bus.abort_replay(replay_id)
|
||||||
|
if self._active_paused_bubble:
|
||||||
|
try:
|
||||||
|
self._active_paused_bubble["btn_resume"].config(state="disabled")
|
||||||
|
self._active_paused_bubble["btn_abort"].config(state="disabled")
|
||||||
|
except Exception:
|
||||||
|
pass
|
||||||
|
|
||||||
# ======================================================================
|
# ======================================================================
|
||||||
# Ajout de messages dans la zone de chat
|
# Ajout de messages dans la zone de chat
|
||||||
# ======================================================================
|
# ======================================================================
|
||||||
|
|||||||
@@ -293,6 +293,49 @@ def formatter_ecran_inchange(action_type: str = "") -> MessageUtilisateur:
|
|||||||
)
|
)
|
||||||
|
|
||||||
|
|
||||||
|
def formatter_mode_apprentissage(
|
||||||
|
raison: str = "",
|
||||||
|
description_cible: str = "",
|
||||||
|
titre_fenetre: Optional[str] = None,
|
||||||
|
) -> MessageUtilisateur:
|
||||||
|
"""Message shown when Léa enters learning mode (supervised pause).
|
||||||
|
|
||||||||
|
The user must understand that:
|
||||||
|
1. Léa is stuck and needs help
|
||||||
|
2. The user must take over and show how to do it
|
||||||
|
3. Ctrl+Shift+L signals that they have finished
|
||||||
|
|
||||||||
|
The tone is humble, clear and actionable. Not technical.
|
||||||
|
|
||||||
|
Example:
|
||||||
|
Léa a besoin d'aide
|
||||||
|
Je n'y arrive pas, montrez-moi comment faire.
|
||||||
|
Quand vous avez fini, appuyez sur Ctrl+Shift+L.
|
||||||
|
"""
|
||||||
|
cible = _nettoyer_description_cible(description_cible) if description_cible else ""
|
||||||
|
app = _extraire_nom_application(titre_fenetre or "") if titre_fenetre else ""
|
||||||
|
|
||||||
|
# Build a short context string when available
|
||||||
|
contexte = ""
|
||||||
|
if cible and app:
|
||||||
|
contexte = f" (« {cible} » dans {app})"
|
||||||
|
elif cible:
|
||||||
|
contexte = f" (« {cible} »)"
|
||||||
|
|
||||||
|
corps = (
|
||||||
|
f"Je n'y arrive pas{contexte}, montrez-moi comment faire. "
|
||||||
|
f"Quand vous avez fini, appuyez sur Ctrl+Shift+L."
|
||||||
|
)
|
||||||
|
|
||||||
|
return MessageUtilisateur(
|
||||||
|
niveau=NiveauMessage.BLOCAGE,
|
||||||
|
titre="Léa a besoin d'aide",
|
||||||
|
corps=corps,
|
||||||
|
duree_s=DUREE_PAR_NIVEAU[NiveauMessage.BLOCAGE],
|
||||||
|
persistent=True,
|
||||||
|
)
|
||||||
|
|
||||||
|
|
||||||
def formatter_connexion_perdue(hote_serveur: str = "") -> MessageUtilisateur:
|
def formatter_connexion_perdue(hote_serveur: str = "") -> MessageUtilisateur:
|
||||||
"""Message quand la connexion avec le serveur est perdue.
|
"""Message quand la connexion avec le serveur est perdue.
|
||||||
|
|
||||||
|
|||||||
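Review note: the context suffix built in `formatter_mode_apprentissage` has three cases (target and application, target only, nothing). The sketch below isolates that branching. `build_learning_mode_body` is a hypothetical stand-in: it skips the cleaning helpers (`_nettoyer_description_cible`, `_extraire_nom_application`), whose bodies are not shown in this diff, and the application name used in the example is made up.

```python
def build_learning_mode_body(cible: str = "", app: str = "") -> str:
    """Simplified stand-in for the body text; inputs are assumed to be already cleaned."""
    contexte = ""
    if cible and app:
        contexte = f" (« {cible} » dans {app})"
    elif cible:
        contexte = f" (« {cible} »)"
    # As in the diff, an application name without a target adds no context.
    return (
        f"Je n'y arrive pas{contexte}, montrez-moi comment faire. "
        "Quand vous avez fini, appuyez sur Ctrl+Shift+L."
    )


print(build_learning_mode_body("Examens cliniques", "Bloc-notes"))
```

The message stays readable in every case because the suffix is inserted before the comma rather than appended at the end.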
@@ -32,6 +32,7 @@ from .messages import (
|
|||||||
formatter_etape_workflow,
|
formatter_etape_workflow,
|
||||||
formatter_fenetre_incorrecte,
|
formatter_fenetre_incorrecte,
|
||||||
formatter_fin_workflow,
|
formatter_fin_workflow,
|
||||||
|
formatter_mode_apprentissage,
|
||||||
formatter_ralentissement,
|
formatter_ralentissement,
|
||||||
formatter_retry,
|
formatter_retry,
|
||||||
)
|
)
|
||||||
@@ -138,10 +139,28 @@ class NotificationManager:
|
|||||||
|
|
||||||
Les messages BLOCAGE bypass le rate limit pour garantir que
|
Les messages BLOCAGE bypass le rate limit pour garantir que
|
||||||
l'utilisateur voit qu'on a besoin de lui.
|
l'utilisateur voit qu'on a besoin de lui.
|
||||||
|
|
||||||
|
GHT demo, 8 May 2026: for BLOCAGE messages, a custom topmost Tkinter
|
||||||
|
toast (paused_toast) is triggered as well. Plyer stays silent
|
||||||
|
on Windows 11 when Focus Assist, Quiet Hours or a missing app ID
|
||||||
|
block the balloons. The custom toast is fully self-contained, so
|
||||||
|
Dom still sees the message during the demo.
|
||||||
"""
|
"""
|
||||||
bypass = msg.niveau == NiveauMessage.BLOCAGE
|
bypass = msg.niveau == NiveauMessage.BLOCAGE
|
||||||
# Log aussi pour tracer dans les logs fichiers
|
# Log aussi pour tracer dans les logs fichiers
|
||||||
self._log_message(msg)
|
self._log_message(msg)
|
||||||
|
|
||||||
|
# Custom Tkinter toast: BLOCAGE only, to avoid spamming
|
||||||
|
if msg.niveau == NiveauMessage.BLOCAGE:
|
||||||
|
try:
|
||||||
|
from .paused_toast import show_paused_toast
|
||||||
|
show_paused_toast(
|
||||||
|
title=str(msg.titre)[:80] or "Léa a besoin de votre aide",
|
||||||
|
message=str(msg.corps)[:300],
|
||||||
|
)
|
||||||
|
except Exception:
|
||||||
|
logger.debug("paused_toast (BLOCAGE) silenced", exc_info=True)
|
||||||
|
|
||||||
return self.notify(
|
return self.notify(
|
||||||
title=msg.titre,
|
title=msg.titre,
|
||||||
message=msg.corps,
|
message=msg.corps,
|
||||||
@@ -273,6 +292,20 @@ class NotificationManager:
|
|||||||
msg = formatter_ecran_inchange(action_type)
|
msg = formatter_ecran_inchange(action_type)
|
||||||
return self.notify_message(msg)
|
return self.notify_message(msg)
|
||||||
|
|
||||||
|
def replay_learning_mode(
|
||||||
|
self,
|
||||||
|
raison: str = "",
|
||||||
|
target_description: str = "",
|
||||||
|
window_title: Optional[str] = None,
|
||||||
|
) -> bool:
|
||||||
|
"""Notification shown when Léa enters learning mode.
|
||||||
|
|
||||||||
|
Léa is stuck and asks the user to show her how to proceed.
|
||||||
|
A humble, actionable message for a non-technical user.
|
||||||
|
"""
|
||||||
|
msg = formatter_mode_apprentissage(raison, target_description, window_title)
|
||||||
|
return self.notify_message(msg)
|
||||||
|
|
||||||
def replay_retry(self, action_type: str = "", tentative: int = 2) -> bool:
|
def replay_retry(self, action_type: str = "", tentative: int = 2) -> bool:
|
||||||
"""Notification quand Léa retente une action."""
|
"""Notification quand Léa retente une action."""
|
||||||
msg = formatter_retry(action_type, tentative)
|
msg = formatter_retry(action_type, tentative)
|
||||||
|
|||||||
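Review note: the `notify_message` change above adds a best-effort toast for BLOCAGE messages while keeping the primary notification path untouched. A minimal sketch of that pattern follows, with the two channels passed in as callables. `Level`, the `bypass_rate_limit` keyword and the function signature are assumptions made for illustration; the diff does not show how `bypass` is forwarded to `self.notify`.

```python
import logging
from enum import Enum
from typing import Callable

logger = logging.getLogger(__name__)


class Level(Enum):
    INFO = "info"
    BLOCAGE = "blocage"


def notify_message(
    level: Level,
    title: str,
    body: str,
    notify: Callable[..., bool],
    show_toast: Callable[..., bool],
) -> bool:
    """Send through the primary channel; add a best-effort toast for BLOCAGE."""
    if level is Level.BLOCAGE:
        try:
            show_toast(title=title[:80] or "Léa a besoin de votre aide", message=body[:300])
        except Exception:
            # The toast is only a complement: its failure must not block the primary channel.
            logger.debug("toast silenced", exc_info=True)
    return notify(title=title, message=body, bypass_rate_limit=level is Level.BLOCAGE)
```

The return value comes from the primary channel only, so a toast failure never changes what the caller observes.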
290
agent_v0/agent_v1/ui/paused_toast.py
Normal file
@@ -0,0 +1,290 @@
|
|||||||
|
# agent_v1/ui/paused_toast.py
|
||||||
|
"""
|
||||||
|
Custom Tkinter toast for the supervised pause (« Léa a besoin de votre aide »).
|
||||||
|
|
||||||||
|
GHT demo, 8 May 2026. Robust, fully self-contained fallback for when:
|
||||||
|
- plyer.notification stays silent on Windows 11 (Focus Assist, balloon tips
|
||||||
|
blocked by system policy),
|
||||||
|
- the Léa V1 ChatWindow is hidden by default via `withdraw()` (Dom does not see it),
|
||||||
|
- no other UI can guarantee that Dom will actually see the message.
|
||||||
|
|
||||||||
|
Strategy:
|
||||||
|
- topmost, overrideredirect Toplevel in the top-right corner of the primary screen,
|
||||||
|
- Léa blue background, title + message, auto-close after TOAST_DURATION_MS,
|
||||||
|
- thread-safe: can be called from any thread (the replay polling
|
||||||
|
runs in a daemon thread, not in the main thread),
|
||||||
|
- no external dependency (only tkinter from the stdlib),
|
||||||
|
- internal rate limit to avoid flooding (at most 1 toast every 3 s).
|
||||||
|
|
||||||||
|
If a Tk root already exists in the process (ChatWindow), the Toplevel is attached
|
||||||
|
to that root via `root.after(0, ...)`, the usual idiom for handing work to
|
||||||
|
the tkinter thread. Otherwise a dedicated Tk() is created in a daemon thread.
|
||||||
|
"""
|
||||||
|
from __future__ import annotations
|
||||||
|
|
||||||
|
import logging
|
||||||
|
import threading
|
||||||
|
import time
|
||||||
|
from typing import Any, Optional
|
||||||
|
|
||||||
|
logger = logging.getLogger(__name__)
|
||||||
|
|
||||||
|
# Colours consistent with the Léa theme (see chat_window.py)
|
||||||
|
TOAST_BG = "#2563EB" # Léa blue (HEADER_BG)
|
||||||
|
TOAST_FG = "#FFFFFF"
|
||||||
|
TOAST_TITLE_BG = "#1E40AF" # Darker blue for the title banner
|
||||||
|
TOAST_BORDER = "#1E3A8A"
|
||||||
|
|
||||||
|
TOAST_WIDTH = 380
|
||||||
|
TOAST_PAD_X = 18
|
||||||
|
TOAST_PAD_Y = 14
|
||||||
|
TOAST_DURATION_MS = 15000
|
||||||
|
TOAST_RATE_LIMIT_S = 3.0
|
||||||
|
|
||||||
|
_lock = threading.Lock()
|
||||||
|
_last_shown_at: float = 0.0
|
||||||
|
_last_message: str = ""
|
||||||
|
|
||||||
|
|
||||||
|
def _resolve_existing_root() -> Optional[Any]:
|
||||||
|
"""Try to retrieve the Tk root already created by the ChatWindow.
|
||||||
|
|
||||||||
|
The ChatWindow keeps a reference to its root on the instance but exposes
|
||||||
|
nothing global, so this relies on the internal `tk._default_root`. When no
|
||||||
|
root is found, the caller falls back to creating an independent Tk(), which
|
||||||
|
is assumed to be safe as long as each Tk() instance stays in its own
|
||||||
|
thread.
|
||||||
|
"""
|
||||||
|
try:
|
||||||
|
import tkinter as tk
|
||||||
|
# tk._default_root is internal, but it is the simplest way
|
||||||
|
# to share an existing mainloop. If ChatWindow is running, it will be
|
||||||
|
# its root.
|
||||||
|
root = getattr(tk, "_default_root", None)
|
||||||
|
if root is not None:
|
||||||
|
# Check that it is still alive
|
||||||
|
try:
|
||||||
|
root.winfo_exists()
|
||||||
|
return root
|
||||||
|
except Exception:
|
||||||
|
return None
|
||||||
|
return None
|
||||||
|
except Exception:
|
||||||
|
return None
|
||||||
|
|
||||||
|
|
||||||
|
def _build_toast(parent: Any, title: str, message: str) -> Any:
|
||||||
|
"""Build the toast Toplevel (called in the tkinter thread)."""
|
||||||
|
import tkinter as tk
|
||||||
|
|
||||||
|
top = tk.Toplevel(parent)
|
||||||
|
top.withdraw() # avoid a flash while the window is being built
|
||||||
|
top.overrideredirect(True) # no title bar
|
||||||
|
top.attributes("-topmost", True)
|
||||||
|
try:
|
||||||
|
# Small visibility tweak on Windows: slightly transparent alpha
|
||||||
|
top.attributes("-alpha", 0.97)
|
||||||
|
except Exception:
|
||||||
|
pass
|
||||||
|
|
||||||
|
# Visual border (dark outer frame)
|
||||||
|
outer = tk.Frame(top, bg=TOAST_BORDER, padx=2, pady=2)
|
||||||
|
outer.pack(fill="both", expand=True)
|
||||||
|
|
||||||
|
# Title banner
|
||||||
|
title_frame = tk.Frame(outer, bg=TOAST_TITLE_BG)
|
||||||
|
title_frame.pack(fill="x")
|
||||||
|
tk.Label(
|
||||||
|
title_frame,
|
||||||
|
text=f" ⏸ {title}",
|
||||||
|
bg=TOAST_TITLE_BG,
|
||||||
|
fg=TOAST_FG,
|
||||||
|
font=("Segoe UI", 12, "bold"),
|
||||||
|
anchor="w",
|
||||||
|
padx=10,
|
||||||
|
pady=8,
|
||||||
|
).pack(fill="x")
|
||||||
|
|
||||||
|
# Message body
|
||||||
|
body_frame = tk.Frame(outer, bg=TOAST_BG)
|
||||||
|
body_frame.pack(fill="both", expand=True)
|
||||||
|
tk.Label(
|
||||||
|
body_frame,
|
||||||
|
text=message,
|
||||||
|
bg=TOAST_BG,
|
||||||
|
fg=TOAST_FG,
|
||||||
|
font=("Segoe UI", 11),
|
||||||
|
wraplength=TOAST_WIDTH - 40,
|
||||||
|
justify="left",
|
||||||
|
anchor="w",
|
||||||
|
padx=TOAST_PAD_X,
|
||||||
|
pady=TOAST_PAD_Y,
|
||||||
|
).pack(fill="both", expand=True)
|
||||||
|
|
||||||
|
# Footer: "Cliquez pour fermer"
|
||||||
|
footer = tk.Label(
|
||||||
|
outer,
|
||||||
|
text="Cliquez pour fermer",
|
||||||
|
bg=TOAST_BG,
|
||||||
|
fg="#BFDBFE",
|
||||||
|
font=("Segoe UI", 9, "italic"),
|
||||||
|
anchor="e",
|
||||||
|
padx=10,
|
||||||
|
pady=4,
|
||||||
|
)
|
||||||
|
footer.pack(fill="x", side="bottom")
|
||||||
|
|
||||||
|
# Position: top-right corner of the primary screen
|
||||||
|
top.update_idletasks()
|
||||||
|
height = top.winfo_reqheight()
|
||||||
|
screen_w = top.winfo_screenwidth()
|
||||||
|
x = screen_w - TOAST_WIDTH - 16
|
||||||
|
y = 16
|
||||||
|
top.geometry(f"{TOAST_WIDTH}x{height}+{x}+{y}")
|
||||||
|
|
||||||
|
# Click anywhere to close
|
||||||
|
def _close(_=None):
|
||||||
|
try:
|
||||||
|
top.destroy()
|
||||||
|
except Exception:
|
||||||
|
pass
|
||||||
|
|
||||||
|
top.bind("<Button-1>", _close)
|
||||||
|
for child in (outer, title_frame, body_frame, footer):
|
||||||
|
try:
|
||||||
|
child.bind("<Button-1>", _close)
|
||||||
|
except Exception:
|
||||||
|
pass
|
||||||
|
|
||||||
|
# Show, then force focus so the toast is not missed while Focus Assist is on
|
||||||
|
top.deiconify()
|
||||||
|
top.lift()
|
||||||
|
try:
|
||||||
|
top.focus_force()
|
||||||
|
except Exception:
|
||||||
|
pass
|
||||||
|
|
||||||
|
# Re-pin topmost at 100 ms, 500 ms and 2 s (Windows sometimes clears -topmost
|
||||||
|
# when another app takes focus)
|
||||||
|
def _repin():
|
||||||
|
try:
|
||||||
|
top.attributes("-topmost", True)
|
||||||
|
top.lift()
|
||||||
|
except Exception:
|
||||||
|
pass
|
||||||
|
|
||||||
|
try:
|
||||||
|
top.after(100, _repin)
|
||||||
|
top.after(500, _repin)
|
||||||
|
top.after(2000, _repin)
|
||||||
|
except Exception:
|
||||||
|
pass
|
||||||
|
|
||||||
|
# Auto-close
|
||||||
|
try:
|
||||||
|
top.after(TOAST_DURATION_MS, _close)
|
||||||
|
except Exception:
|
||||||
|
pass
|
||||||
|
|
||||||
|
return top
|
||||||
|
|
||||||
|
|
||||||
|
def _show_in_dedicated_thread(title: str, message: str) -> None:
|
||||||
|
"""Create an independent Tk() in a daemon thread.
|
||||||
|
|
||||||
|
Used as a fallback when no Tk root exists. The thread lives for the
|
||||||
|
duration of the toast (~15 s), then exits cleanly.
|
||||||
|
"""
|
||||||
|
def _run():
|
||||||
|
try:
|
||||||
|
# DPI awareness (high-resolution Windows displays)
|
||||||
|
try:
|
||||||
|
import ctypes
|
||||||
|
ctypes.windll.shcore.SetProcessDpiAwareness(1)
|
||||||
|
except Exception:
|
||||||
|
pass
|
||||||
|
|
||||||
|
import tkinter as tk
|
||||||
|
|
||||||
|
root = tk.Tk()
|
||||||
|
root.withdraw()
|
||||||
|
try:
|
||||||
|
dpi = root.winfo_fpixels("1i")
|
||||||
|
root.tk.call("tk", "scaling", dpi / 72.0)
|
||||||
|
except Exception:
|
||||||
|
pass
|
||||||
|
|
||||||
|
top = _build_toast(root, title, message)
|
||||||
|
|
||||||
|
# Leave mainloop once the toast has been destroyed
|
||||||
|
def _watch():
|
||||||
|
try:
|
||||||
|
if not top.winfo_exists():
|
||||||
|
root.quit()
|
||||||
|
return
|
||||||
|
except Exception:
|
||||||
|
root.quit()
|
||||||
|
return
|
||||||
|
root.after(200, _watch)
|
||||||
|
|
||||||
|
root.after(200, _watch)
|
||||||
|
root.mainloop()
|
||||||
|
try:
|
||||||
|
root.destroy()
|
||||||
|
except Exception:
|
||||||
|
pass
|
||||||
|
except Exception:
|
||||||
|
logger.debug("paused_toast dedicated thread failed", exc_info=True)
|
||||||
|
|
||||||
|
t = threading.Thread(target=_run, daemon=True, name="paused-toast-tk")
|
||||||
|
t.start()
|
||||||
|
|
||||||
|
|
||||||
|
def show_paused_toast(
|
||||||
|
title: str = "Léa a besoin de votre aide",
|
||||||
|
message: str = "",
|
||||||
|
) -> bool:
|
||||||
|
"""Show a topmost "paused" toast.
|
||||||
|
|
||||||
|
Thread-safe, rate-limited, no external dependencies. Returns True if the
|
||||||
|
toast was triggered, False if it was skipped (rate limit or error).
|
||||||
|
"""
|
||||||
|
global _last_shown_at, _last_message
|
||||||
|
|
||||||
|
if not message:
|
||||||
|
message = "Action en attente de votre validation."
|
||||||
|
|
||||||
|
# Basic rate limit: prevent a polling loop from opening 50 toasts
|
||||||
|
now = time.monotonic()
|
||||||
|
with _lock:
|
||||||
|
same_message = (message == _last_message)
|
||||||
|
elapsed = now - _last_shown_at
|
||||||
|
if same_message and elapsed < TOAST_RATE_LIMIT_S:
|
||||||
|
logger.debug(
|
||||||
|
"paused_toast rate-limited (%.1fs since last identical)", elapsed
|
||||||
|
)
|
||||||
|
return False
|
||||||
|
_last_shown_at = now
|
||||||
|
_last_message = message
|
||||||
|
|
||||||
|
# Attempt 1: use the existing Tk root (ChatWindow) via after()
|
||||||
|
root = _resolve_existing_root()
|
||||||
|
if root is not None:
|
||||||
|
try:
|
||||||
|
root.after(0, lambda: _build_toast(root, title, message))
|
||||||
|
logger.info("paused_toast scheduled on existing Tk root")
|
||||||
|
return True
|
||||||
|
except Exception:
|
||||||
|
logger.debug("paused_toast existing-root path failed", exc_info=True)
|
||||||
|
|
||||||
|
# Attempt 2: create a Tk() in a daemon thread
|
||||||
|
try:
|
||||||
|
_show_in_dedicated_thread(title, message)
|
||||||
|
logger.info("paused_toast scheduled in dedicated thread")
|
||||||
|
return True
|
||||||
|
except Exception:
|
||||||
|
logger.error("paused_toast dedicated-thread path failed", exc_info=True)
|
||||||
|
return False
|
||||||
|
|
||||||
|
|
||||||
|
__all__ = ["show_paused_toast"]
|
||||||
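The rate limit applied in `show_paused_toast` can be read in isolation. The following is a minimal sketch, not the module's code: the class name `ToastRateLimiter`, its `allow` method, the injectable `clock`, and the 10-second default are all assumptions made for illustration (the real threshold is `TOAST_RATE_LIMIT_S`, whose value is not shown in this diff).

```python
import threading
import time


class ToastRateLimiter:
    """Hypothetical helper: suppress identical messages repeated too quickly."""

    def __init__(self, min_interval_s: float = 10.0, clock=time.monotonic):
        self._min_interval_s = min_interval_s  # assumed value, for illustration only
        self._clock = clock                    # injectable so tests can fake time
        self._lock = threading.Lock()
        self._last_shown_at = float("-inf")
        self._last_message = None

    def allow(self, message: str) -> bool:
        """Return True if the toast should be shown, False if it is rate-limited."""
        now = self._clock()
        with self._lock:
            same_message = message == self._last_message
            if same_message and (now - self._last_shown_at) < self._min_interval_s:
                return False
            # A new message, or the same one after the interval, resets the window.
            self._last_shown_at = now
            self._last_message = message
            return True
```

As in the diff, only an identical message is throttled; a different message is shown immediately and restarts the interval.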
@@ -2,12 +2,20 @@
|
|||||||
"""
|
"""
|
||||||
Advanced vision manager for Agent V1.
|
Advanced vision manager for Agent V1.
|
||||||
Optimized for fiber streaming with change detection.
|
Optimized for fiber streaming with change detection.
|
||||||
|
|
||||||
|
Available captures:
|
||||||
|
- Full screen (full): global context, 1920x1080+
|
||||||
|
- Targeted crop (crop): 80x80 around the click (VLM training)
|
||||||
|
- Active window (window): isolated image of the window plus metadata
|
||||||
|
(title, rect, relative click coordinates), cross-platform
|
||||||
"""
|
"""
|
||||||
|
|
||||||
import os
|
import os
|
||||||
import time
|
import time
|
||||||
import logging
|
import logging
|
||||||
import hashlib
|
import hashlib
|
||||||
|
import platform
|
||||||
|
from typing import Any, Dict, List, Optional
|
||||||
from PIL import Image, ImageFilter, ImageStat
|
from PIL import Image, ImageFilter, ImageStat
|
||||||
import mss
|
import mss
|
||||||
from ..config import TARGETED_CROP_SIZE, SCREENSHOT_QUALITY, BLUR_SENSITIVE
|
from ..config import TARGETED_CROP_SIZE, SCREENSHOT_QUALITY, BLUR_SENSITIVE
|
||||||
@@ -15,6 +23,69 @@ from .blur_sensitive import blur_sensitive_regions
|
|||||||
|
|
||||||
logger = logging.getLogger(__name__)
|
logger = logging.getLogger(__name__)
|
||||||
|
|
||||||
|
# Current OS (detected only once)
|
||||||
|
_SYSTEM = platform.system()
|
||||||
|
|
||||||
|
# QW1: multi-monitor detection (graceful fallback if screeninfo is missing)
|
||||||
|
try:
|
||||||
|
from screeninfo import get_monitors as _screeninfo_get_monitors
|
||||||
|
_SCREENINFO_AVAILABLE = True
|
||||||
|
except ImportError:
|
||||||
|
_SCREENINFO_AVAILABLE = False
|
||||||
|
|
||||||
|
|
||||||
|
def _get_monitors_geometry() -> List[Dict[str, Any]]:
|
||||||
|
"""Return the list of physical monitors with their offsets.
|
||||||
|
|
||||||
|
Returns:
|
||||||
|
List[dict]: [{idx, x, y, w, h, primary}, ...]. Empty if screeninfo is
|
||||||
|
unavailable (the server then uses its composite fallback).
|
||||||
|
"""
|
||||||
|
if not _SCREENINFO_AVAILABLE:
|
||||||
|
return []
|
||||||
|
try:
|
||||||
|
monitors = _screeninfo_get_monitors()
|
||||||
|
return [
|
||||||
|
{
|
||||||
|
"idx": i,
|
||||||
|
"x": int(m.x),
|
||||||
|
"y": int(m.y),
|
||||||
|
"w": int(m.width),
|
||||||
|
"h": int(m.height),
|
||||||
|
"primary": bool(getattr(m, "is_primary", False)),
|
||||||
|
}
|
||||||
|
for i, m in enumerate(monitors)
|
||||||
|
]
|
||||||
|
except Exception:
|
||||||
|
return []
|
||||||
|
|
||||||
|
|
||||||
|
def _get_active_monitor_index() -> Optional[int]:
|
||||||
|
"""Return the logical index of the monitor under the cursor (active focus).
|
||||||
|
|
||||||
|
Returns:
|
||||||
|
int, or None if it cannot be determined.
|
||||||
|
"""
|
||||||
|
if not _SCREENINFO_AVAILABLE:
|
||||||
|
return None
|
||||||
|
try:
|
||||||
|
import pyautogui # lazy import: avoids a hard dependency
|
||||||
|
cx, cy = pyautogui.position()
|
||||||
|
for i, m in enumerate(_screeninfo_get_monitors()):
|
||||||
|
if m.x <= cx < m.x + m.width and m.y <= cy < m.y + m.height:
|
||||||
|
return i
|
||||||
|
except Exception:
|
||||||
|
return None
|
||||||
|
return None
|
||||||
|
|
||||||
|
|
||||||
|
def _enrich_with_monitor_info(payload: dict) -> dict:
|
||||||
|
"""Add monitor_index and monitors_geometry to the payload (in place, and return it)."""
|
||||||
|
if isinstance(payload, dict):
|
||||||
|
payload["monitor_index"] = _get_active_monitor_index()
|
||||||
|
payload["monitors_geometry"] = _get_monitors_geometry()
|
||||||
|
return payload
|
||||||
|
|
||||||
class VisionCapturer:
|
class VisionCapturer:
|
||||||
def __init__(self, session_dir: str):
|
def __init__(self, session_dir: str):
|
||||||
self.session_dir = session_dir
|
self.session_dir = session_dir
|
||||||
@@ -27,6 +98,9 @@ class VisionCapturer:
|
|||||||
"""
|
"""
|
||||||
Capture the full screen.
|
Capture the full screen.
|
||||||
If force=False, first check whether the screen has changed.
|
If force=False, first check whether the screen has changed.
|
||||||
|
|
||||||
|
Enriches the metadata with the active window title
|
||||||
|
(useful for contextualizing heartbeats on the server side).
|
||||||
"""
|
"""
|
||||||
try:
|
try:
|
||||||
with mss.mss() as sct:
|
with mss.mss() as sct:
|
||||||
@@ -52,8 +126,24 @@ class VisionCapturer:
|
|||||||
logger.error(f"Erreur Context Capture: {e}")
|
logger.error(f"Erreur Context Capture: {e}")
|
||||||
return ""
|
return ""
|
||||||
|
|
||||||
|
def get_active_window_title(self) -> str:
|
||||||
|
"""Return the active window title (used to enrich heartbeats).
|
||||||
|
|
||||||
|
Graceful fallback: returns an empty string if unavailable.
|
||||||
|
"""
|
||||||
|
try:
|
||||||
|
from ..window_info_crossplatform import get_active_window_info
|
||||||
|
info = get_active_window_info()
|
||||||
|
return info.get("title", "")
|
||||||
|
except Exception:
|
||||||
|
return ""
|
||||||
|
|
||||||
def capture_dual(self, x: int, y: int, screenshot_id: str, anonymize=False) -> dict:
|
def capture_dual(self, x: int, y: int, screenshot_id: str, anonymize=False) -> dict:
|
||||||
"""Dual capture (Full + Crop), always taken (forced because it is tied to an action)."""
|
"""Triple capture (Full + Crop + Active window), always taken.
|
||||||
|
|
||||||
|
The active window is an ADDITION: if it fails, full + crop
|
||||||
|
are still returned (graceful fallback).
|
||||||
|
"""
|
||||||
try:
|
try:
|
||||||
with mss.mss() as sct:
|
with mss.mss() as sct:
|
||||||
full_path = os.path.join(self.shots_dir, f"{screenshot_id}_full.png")
|
full_path = os.path.join(self.shots_dir, f"{screenshot_id}_full.png")
|
||||||
@@ -82,11 +172,136 @@ class VisionCapturer:
|
|||||||
# Update the hash for the next heartbeat
|
# Update the hash for the next heartbeat
|
||||||
self.last_img_hash = self._compute_quick_hash(img)
|
self.last_img_hash = self._compute_quick_hash(img)
|
||||||
|
|
||||||
return {"full": full_path, "crop": crop_path}
|
result = {"full": full_path, "crop": crop_path}
|
||||||
|
|
||||||
|
# --- Active window capture ---
|
||||||
|
# Non-blocking addition: enriches the result with the image
|
||||||
|
# of the window alone plus metadata (title, rect, relative click)
|
||||||
|
window_info = self.capture_active_window(x, y, screenshot_id, full_img=img)
|
||||||
|
if window_info:
|
||||||
|
result["window_capture"] = window_info
|
||||||
|
|
||||||
|
# QW1: multi-monitor enrichment (additive, graceful fallback)
|
||||||
|
_enrich_with_monitor_info(result)
|
||||||
|
|
||||||
|
return result
|
||||||
except Exception as e:
|
except Exception as e:
|
||||||
logger.error(f"Erreur Dual Capture: {e}")
|
logger.error(f"Erreur Dual Capture: {e}")
|
||||||
return {}
|
return {}
|
||||||
|
|
||||||
|
def capture_active_window(
|
||||||
|
self,
|
||||||
|
x: int,
|
||||||
|
y: int,
|
||||||
|
screenshot_id: str,
|
||||||
|
full_img: Optional[Image.Image] = None,
|
||||||
|
) -> Optional[Dict[str, Any]]:
|
||||||
|
"""Capture the image of the active window alone, plus metadata.
|
||||||
|
|
||||||
|
Strategy:
|
||||||
|
1. Get the window rectangle from the OS API (pywin32 / xdotool / Quartz)
|
||||||
|
2. Crop it from the full-screen screenshot (more reliable than PrintWindow)
|
||||||
|
3. Compute the click coordinates relative to the window
|
||||||
|
|
||||||
|
Args:
|
||||||
|
x, y: click coordinates in screen pixels
|
||||||
|
screenshot_id: identifier used in the file name
|
||||||
|
full_img: full-screen screenshot already captured (optional; avoids a
|
||||||
|
second capture when called from capture_dual)
|
||||||
|
|
||||||
|
Returns:
|
||||||
|
Dict with window_image, window_title, app_name, window_rect, window_size,
|
||||||
|
click_in_window, click_inside_window, or None if no usable window is found.
|
||||||
|
"""
|
||||||
|
try:
|
||||||
|
from ..window_info_crossplatform import get_active_window_rect
|
||||||
|
|
||||||
|
rect_info = get_active_window_rect()
|
||||||
|
if not rect_info:
|
||||||
|
logger.debug("Fenêtre active introuvable — skip capture fenêtre")
|
||||||
|
return None
|
||||||
|
|
||||||
|
win_rect = rect_info["rect"] # [left, top, right, bottom]
|
||||||
|
win_left, win_top, win_right, win_bottom = win_rect
|
||||||
|
win_w, win_h = rect_info["size"] # [width, height]
|
||||||
|
title = rect_info.get("title", "unknown_window")
|
||||||
|
app_name = rect_info.get("app_name", "unknown_app")
|
||||||
|
|
||||||
|
# Skip windows that are too small (taskbars, system popups)
|
||||||
|
if win_w < 50 or win_h < 50:
|
||||||
|
logger.debug(f"Fenêtre trop petite ({win_w}x{win_h}) — skip")
|
||||||
|
return None
|
||||||
|
|
||||||
|
# Click coordinates relative to the window
|
||||||
|
click_rel_x = x - win_left
|
||||||
|
click_rel_y = y - win_top
|
||||||
|
|
||||||
|
# If the click falls outside the window, flag it but carry on
|
||||||
|
click_inside = (0 <= click_rel_x < win_w and 0 <= click_rel_y < win_h)
|
||||||
|
|
||||||
|
# --- Crop the window out of the full-screen image ---
|
||||||
|
if full_img is None:
|
||||||
|
# No screenshot provided: capture one (standalone case)
|
||||||
|
try:
|
||||||
|
with mss.mss() as sct:
|
||||||
|
monitor = sct.monitors[1]
|
||||||
|
sct_img = sct.grab(monitor)
|
||||||
|
full_img = Image.frombytes(
|
||||||
|
"RGB", sct_img.size, sct_img.bgra, "raw", "BGRX"
|
||||||
|
)
|
||||||
|
except Exception as e:
|
||||||
|
logger.error(f"Erreur capture plein écran pour fenêtre : {e}")
|
||||||
|
return None
|
||||||
|
|
||||||
|
# Clamp the crop to the bounds of the full-screen image
|
||||||
|
img_w, img_h = full_img.size
|
||||||
|
crop_left = max(0, win_left)
|
||||||
|
crop_top = max(0, win_top)
|
||||||
|
crop_right = min(img_w, win_right)
|
||||||
|
crop_bottom = min(img_h, win_bottom)
|
||||||
|
|
||||||
|
if crop_right <= crop_left or crop_bottom <= crop_top:
|
||||||
|
logger.debug("Fenêtre hors écran — skip capture fenêtre")
|
||||||
|
return None
|
||||||
|
|
||||||
|
window_img = full_img.crop((crop_left, crop_top, crop_right, crop_bottom))
|
||||||
|
|
||||||
|
# Blurring for AI Act compliance
|
||||||
|
if BLUR_SENSITIVE:
|
||||||
|
blur_sensitive_regions(window_img)
|
||||||
|
|
||||||
|
# Save
|
||||||
|
window_path = os.path.join(
|
||||||
|
self.shots_dir, f"{screenshot_id}_window.png"
|
||||||
|
)
|
||||||
|
window_img.save(window_path, "PNG", quality=SCREENSHOT_QUALITY)
|
||||||
|
|
||||||
|
result = {
|
||||||
|
"window_image": window_path,
|
||||||
|
"window_title": title,
|
||||||
|
"app_name": app_name,
|
||||||
|
"window_rect": win_rect,
|
||||||
|
"window_size": [win_w, win_h],
|
||||||
|
"click_in_window": [click_rel_x, click_rel_y],
|
||||||
|
"click_inside_window": click_inside,
|
||||||
|
}
|
||||||
|
|
||||||
|
# QW1: multi-monitor enrichment (additive)
|
||||||
|
_enrich_with_monitor_info(result)
|
||||||
|
|
||||||
|
logger.debug(
|
||||||
|
f"Fenêtre capturée : {title} ({win_w}x{win_h}) — "
|
||||||
|
f"clic relatif ({click_rel_x}, {click_rel_y})"
|
||||||
|
)
|
||||||
|
return result
|
||||||
|
|
||||||
|
except ImportError as e:
|
||||||
|
logger.debug(f"Module fenêtre indisponible : {e}")
|
||||||
|
return None
|
||||||
|
except Exception as e:
|
||||||
|
logger.error(f"Erreur capture fenêtre active : {e}")
|
||||||
|
return None
|
||||||
|
|
||||||
def _compute_quick_hash(self, img: Image) -> str:
|
def _compute_quick_hash(self, img: Image) -> str:
|
||||||
"""Compute a quick hash from a downscaled thumbnail to detect changes."""
|
"""Compute a quick hash from a downscaled thumbnail to detect changes."""
|
||||||
# Downscale the image to 64x64 to compare color masses (very fast)
|
# Downscale the image to 64x64 to compare color masses (very fast)
|
||||||
|
|||||||
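The geometry inside `capture_active_window` (clamping the window rectangle to the screenshot, then expressing the click relative to the window) can be checked without any screen capture. This is a minimal sketch under stated assumptions: the helper names `clamp_rect_to_image` and `click_relative_to_window` are hypothetical, and the sketch treats the right and bottom edges as exclusive. It also assumes the screenshot origin coincides with the virtual-desktop origin, which holds for a primary monitor at (0, 0).

```python
from typing import Optional, Tuple

Rect = Tuple[int, int, int, int]  # (left, top, right, bottom), right/bottom exclusive


def clamp_rect_to_image(rect: Rect, img_size: Tuple[int, int]) -> Optional[Rect]:
    """Intersect a window rect with the image bounds; None if nothing is visible."""
    left, top, right, bottom = rect
    img_w, img_h = img_size
    clamped = (max(0, left), max(0, top), min(img_w, right), min(img_h, bottom))
    if clamped[2] <= clamped[0] or clamped[3] <= clamped[1]:
        return None  # window lies entirely outside the captured image
    return clamped


def click_relative_to_window(x: int, y: int, rect: Rect) -> Tuple[int, int, bool]:
    """Return the click in window coordinates plus whether it lies inside the rect."""
    left, top, right, bottom = rect
    rel_x, rel_y = x - left, y - top
    inside = 0 <= rel_x < (right - left) and 0 <= rel_y < (bottom - top)
    return rel_x, rel_y, inside
```

The clamped tuple can be passed directly to `PIL.Image.Image.crop`, which takes a `(left, upper, right, lower)` box.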
@@ -17,7 +17,7 @@ from __future__ import annotations
|
|||||||
|
|
||||||
import platform
|
import platform
|
||||||
import subprocess
|
import subprocess
|
||||||
from typing import Dict, Optional
|
from typing import Any, Dict, Optional
|
||||||
|
|
||||||
|
|
||||||
def _run_cmd(cmd: list[str]) -> Optional[str]:
|
def _run_cmd(cmd: list[str]) -> Optional[str]:
|
||||||
@@ -51,6 +51,32 @@ def get_active_window_info() -> Dict[str, str]:
|
|||||||
return {"title": "unknown_window", "app_name": "unknown_app"}
|
return {"title": "unknown_window", "app_name": "unknown_app"}
|
||||||
|
|
||||||
|
|
||||||
|
def get_active_window_rect() -> Optional[Dict[str, Any]]:
|
||||||
|
"""
|
||||||
|
Returns the rectangle of the active window:
|
||||||
|
{
|
||||||
|
"title": "...",
|
||||||
|
"app_name": "...",
|
||||||
|
"rect": [left, top, right, bottom],
|
||||||
|
"position": [left, top],
|
||||||
|
"size": [width, height],
|
||||||
|
"hwnd": int # Windows only
|
||||||
|
}
|
||||||
|
|
||||||
|
Returns None if the window cannot be found or is minimized.
|
||||||
|
Detects the OS automatically and uses the appropriate method.
|
||||||
|
"""
|
||||||
|
system = platform.system()
|
||||||
|
|
||||||
|
if system == "Windows":
|
||||||
|
return _get_window_rect_windows()
|
||||||
|
elif system == "Linux":
|
||||||
|
return _get_window_rect_linux()
|
||||||
|
elif system == "Darwin":
|
||||||
|
return _get_window_rect_macos()
|
||||||
|
return None
|
||||||
|
|
||||||
|
|
||||||
def _get_window_info_linux() -> Dict[str, str]:
|
def _get_window_info_linux() -> Dict[str, str]:
|
||||||
"""
|
"""
|
||||||
Linux: uses xdotool (X11)
|
Linux: uses xdotool (X11)
|
||||||
@@ -178,6 +204,163 @@ def _get_window_info_macos() -> Dict[str, str]:
|
|||||||
}
|
}
|
||||||
|
|
||||||
|
|
||||||
|
def _get_window_rect_windows() -> Optional[Dict[str, Any]]:
|
||||||
|
"""
|
||||||
|
Windows: uses pywin32 to get the rectangle of the active window.
|
||||||
|
|
||||||
|
Returns None if the window is minimized (iconic) or if pywin32 or psutil is missing.
|
||||||
|
"""
|
||||||
|
try:
|
||||||
|
import win32gui
|
||||||
|
import win32process
|
||||||
|
import psutil
|
||||||
|
|
||||||
|
hwnd = win32gui.GetForegroundWindow()
|
||||||
|
if not hwnd:
|
||||||
|
return None
|
||||||
|
|
||||||
|
# Skip minimized windows (no visible content)
|
||||||
|
if win32gui.IsIconic(hwnd):
|
||||||
|
return None
|
||||||
|
|
||||||
|
title = win32gui.GetWindowText(hwnd) or "unknown_window"
|
||||||
|
|
||||||
|
# Window rectangle (absolute screen coordinates)
|
||||||
|
left, top, right, bottom = win32gui.GetWindowRect(hwnd)
|
||||||
|
width = right - left
|
||||||
|
height = bottom - top
|
||||||
|
|
||||||
|
# Skip windows with zero or nonsensical size
|
||||||
|
if width <= 0 or height <= 0:
|
||||||
|
return None
|
||||||
|
|
||||||
|
# Process name
|
||||||
|
_, pid = win32process.GetWindowThreadProcessId(hwnd)
|
||||||
|
try:
|
||||||
|
app_name = psutil.Process(pid).name()
|
||||||
|
except Exception:
|
||||||
|
app_name = "unknown_app"
|
||||||
|
|
||||||
|
return {
|
||||||
|
"title": title,
|
||||||
|
"app_name": app_name,
|
||||||
|
"rect": [left, top, right, bottom],
|
||||||
|
"position": [left, top],
|
||||||
|
"size": [width, height],
|
||||||
|
"hwnd": hwnd,
|
||||||
|
}
|
||||||
|
|
||||||
|
except ImportError:
|
||||||
|
return None
|
||||||
|
except Exception:
|
||||||
|
return None
|
||||||
|
|
||||||
|
|
||||||
|
def _get_window_rect_linux() -> Optional[Dict[str, Any]]:
|
||||||
|
"""
|
||||||
|
Linux (X11): uses xdotool to get the rectangle (and ps for the process name).
|
||||||
|
|
||||||
|
Requires: sudo apt-get install xdotool x11-utils
|
||||||
|
"""
|
||||||
|
try:
|
||||||
|
# Identifier of the active window
|
||||||
|
wid = _run_cmd(["xdotool", "getactivewindow"])
|
||||||
|
if not wid:
|
||||||
|
return None
|
||||||
|
|
||||||
|
title = _run_cmd(["xdotool", "getactivewindow", "getwindowname"]) or "unknown_window"
|
||||||
|
pid_str = _run_cmd(["xdotool", "getactivewindow", "getwindowpid"])
|
||||||
|
app_name = "unknown_app"
|
||||||
|
if pid_str:
|
||||||
|
app_name = _run_cmd(["ps", "-p", pid_str.strip(), "-o", "comm="]) or "unknown_app"
|
||||||
|
|
||||||
|
# Geometry via xdotool --shell (position + size)
|
||||||
|
geom_raw = _run_cmd(["xdotool", "getwindowgeometry", "--shell", wid])
|
||||||
|
if not geom_raw:
|
||||||
|
return None
|
||||||
|
|
||||||
|
vals: Dict[str, int] = {}
|
||||||
|
for line in geom_raw.strip().splitlines():
|
||||||
|
if "=" in line:
|
||||||
|
k, v = line.split("=", 1)
|
||||||
|
try:
|
||||||
|
vals[k.strip()] = int(v.strip())
|
||||||
|
except ValueError:
|
||||||
|
pass
|
||||||
|
|
||||||
|
if not {"X", "Y", "WIDTH", "HEIGHT"} <= vals.keys():
|
||||||
|
return None
|
||||||
|
|
||||||
|
x, y = vals["X"], vals["Y"]
|
||||||
|
w, h = vals["WIDTH"], vals["HEIGHT"]
|
||||||
|
|
||||||
|
return {
|
||||||
|
"title": title,
|
||||||
|
"app_name": app_name,
|
||||||
|
"rect": [x, y, x + w, y + h],
|
||||||
|
"position": [x, y],
|
||||||
|
"size": [w, h],
|
||||||
|
}
|
||||||
|
|
||||||
|
except Exception:
|
||||||
|
return None
|
||||||
|
|
||||||
|
|
||||||
|
def _get_window_rect_macos() -> Optional[Dict[str, Any]]:
|
||||||
|
"""
|
||||||
|
macOS: uses Quartz (CGWindowListCopyWindowInfo) to get the rectangle.
|
||||||
|
|
||||||
|
Requires: pip install pyobjc-framework-Quartz
|
||||||
|
"""
|
||||||
|
try:
|
||||||
|
from AppKit import NSWorkspace
|
||||||
|
from Quartz import (
|
||||||
|
CGWindowListCopyWindowInfo,
|
||||||
|
kCGWindowListOptionOnScreenOnly,
|
||||||
|
kCGNullWindowID,
|
||||||
|
)
|
||||||
|
|
||||||
|
active_app = NSWorkspace.sharedWorkspace().activeApplication()
|
||||||
|
app_name = active_app.get("NSApplicationName", "unknown_app")
|
||||||
|
|
||||||
|
window_list = CGWindowListCopyWindowInfo(
|
||||||
|
kCGWindowListOptionOnScreenOnly, kCGNullWindowID
|
||||||
|
)
|
||||||
|
|
||||||
|
for window in window_list:
|
||||||
|
owner_name = window.get("kCGWindowOwnerName", "")
|
||||||
|
if owner_name != app_name:
|
||||||
|
continue
|
||||||
|
|
||||||
|
bounds = window.get("kCGWindowBounds")
|
||||||
|
if not bounds:
|
||||||
|
continue
|
||||||
|
|
||||||
|
x = int(bounds.get("X", 0))
|
||||||
|
y = int(bounds.get("Y", 0))
|
||||||
|
w = int(bounds.get("Width", 0))
|
||||||
|
h = int(bounds.get("Height", 0))
|
||||||
|
if w <= 0 or h <= 0:
|
||||||
|
continue
|
||||||
|
|
||||||
|
title = window.get("kCGWindowName", "unknown_window") or "unknown_window"
|
||||||
|
|
||||||
|
return {
|
||||||
|
"title": title,
|
||||||
|
"app_name": app_name,
|
||||||
|
"rect": [x, y, x + w, y + h],
|
||||||
|
"position": [x, y],
|
||||||
|
"size": [w, h],
|
||||||
|
}
|
||||||
|
|
||||||
|
except ImportError:
|
||||||
|
return None
|
||||||
|
except Exception:
|
||||||
|
return None
|
||||||
|
|
||||||
|
return None
|
||||||
|
|
||||||
|
|
||||||
# Quick test
|
# Quick test
|
||||||
if __name__ == "__main__":
|
if __name__ == "__main__":
|
||||||
import time
|
import time
|
||||||
@@ -188,5 +371,10 @@ if __name__ == "__main__":
|
|||||||
|
|
||||||
for i in range(5):
|
for i in range(5):
|
||||||
info = get_active_window_info()
|
info = get_active_window_info()
|
||||||
|
rect = get_active_window_rect()
|
||||||
print(f"[{i+1}] App: {info['app_name']:20s} | Title: {info['title']}")
|
print(f"[{i+1}] App: {info['app_name']:20s} | Title: {info['title']}")
|
||||||
|
if rect:
|
||||||
|
print(f" Rect: {rect['rect']} | Size: {rect['size']}")
|
||||||
|
else:
|
||||||
|
print(" Rect: non disponible")
|
||||||
time.sleep(1)
|
time.sleep(1)
|
||||||
|
|||||||
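The Linux path parses the `KEY=VALUE` lines printed by `xdotool getwindowgeometry --shell` into a `[left, top, right, bottom]` rectangle. The parsing step alone can be sketched as below; the function name `parse_shell_geometry` and the sample text used to exercise it are assumptions for illustration, not part of the module.

```python
from typing import Dict, List, Optional


def parse_shell_geometry(raw: str) -> Optional[List[int]]:
    """Parse KEY=VALUE lines into [left, top, right, bottom], or None if incomplete."""
    vals: Dict[str, int] = {}
    for line in raw.strip().splitlines():
        key, sep, value = line.partition("=")
        if not sep:
            continue  # not a KEY=VALUE line
        try:
            vals[key.strip()] = int(value.strip())
        except ValueError:
            continue  # ignore values that are not integers
    if not {"X", "Y", "WIDTH", "HEIGHT"} <= vals.keys():
        return None
    x, y, w, h = vals["X"], vals["Y"], vals["WIDTH"], vals["HEIGHT"]
    return [x, y, x + w, y + h]
```

Keeping the parser separate from the `subprocess` call makes it testable on machines without X11.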
@@ -512,6 +512,21 @@ class ActionExecutorV1:
|
|||||||
x_pct = action.get("x_pct", 0.0)
|
x_pct = action.get("x_pct", 0.0)
|
||||||
y_pct = action.get("y_pct", 0.0)
|
y_pct = action.get("y_pct", 0.0)
|
||||||
|
|
||||||
|
# QW1: if the server resolved a target monitor (idx >= 0),
|
||||||
|
# apply its offset to the absolute coordinates. For idx == -1
|
||||||
|
# (composite_fallback), no offset is applied (backward compatibility).
|
||||||
|
# Coordinates are still computed as percent * (width/height) of monitor[1]
|
||||||
|
# on the client side (x_pct is expressed relative to the primary physical screen).
|
||||||
|
mon_res = action.get("monitor_resolution") or {}
|
||||||
|
mon_idx = mon_res.get("idx", -1)
|
||||||
|
mon_offset_x = mon_res.get("offset_x", 0) if mon_idx >= 0 else 0
|
||||||
|
mon_offset_y = mon_res.get("offset_y", 0) if mon_idx >= 0 else 0
|
||||||
|
if mon_idx >= 0 and (mon_offset_x or mon_offset_y):
|
||||||
|
logger.info(
|
||||||
|
f"[REPLAY] QW1 monitor cible idx={mon_idx} source={mon_res.get('source')} "
|
||||||
|
f"offset=({mon_offset_x},{mon_offset_y}) — appliqué aux coords"
|
||||||
|
)
|
||||||
|
|
||||||
# ── Resolution diagnostics ──
|
# ── Resolution diagnostics ──
|
||||||
logger.info(
|
logger.info(
|
||||||
f"[REPLAY] Action {action_id} ({action_type}) — "
|
f"[REPLAY] Action {action_id} ({action_type}) — "
|
||||||
@@ -578,8 +593,8 @@ class ActionExecutorV1:
|
|||||||
print(f" [OBSERVER] Popup détectée : '{popup_label}' — fermeture")
|
print(f" [OBSERVER] Popup détectée : '{popup_label}' — fermeture")
|
||||||
logger.info(f"Observer : popup '{popup_label}' détectée avant résolution")
|
logger.info(f"Observer : popup '{popup_label}' détectée avant résolution")
|
||||||
if popup_coords:
|
if popup_coords:
|
||||||
real_x = int(popup_coords["x_pct"] * width)
|
real_x = int(popup_coords["x_pct"] * width) + mon_offset_x
|
||||||
real_y = int(popup_coords["y_pct"] * height)
|
real_y = int(popup_coords["y_pct"] * height) + mon_offset_y
|
||||||
self._click((real_x, real_y), "left")
|
self._click((real_x, real_y), "left")
|
||||||
time.sleep(1.0)
|
time.sleep(1.0)
|
||||||
print(f" [OBSERVER] Popup fermée — reprise du flow normal")
|
print(f" [OBSERVER] Popup fermée — reprise du flow normal")
|
||||||
@@ -718,8 +733,8 @@ class ActionExecutorV1:
|
|||||||
self.notifier.replay_target_not_found(target_desc)
|
self.notifier.replay_target_not_found(target_desc)
|
||||||
return result
|
return result
|
||||||
|
|
||||||
real_x = int(x_pct * width)
|
real_x = int(x_pct * width) + mon_offset_x
|
||||||
real_y = int(y_pct * height)
|
real_y = int(y_pct * height) + mon_offset_y
|
||||||
button = action.get("button", "left")
|
button = action.get("button", "left")
|
||||||
mode = "VISUAL" if result.get("visual_resolved") else "COORD"
|
mode = "VISUAL" if result.get("visual_resolved") else "COORD"
|
||||||
print(
|
print(
|
||||||
@@ -781,8 +796,8 @@ class ActionExecutorV1:
|
|||||||
print(f" [TYPE] raw_keys disponibles ({len(raw_keys)} events) — replay exact")
|
print(f" [TYPE] raw_keys disponibles ({len(raw_keys)} events) — replay exact")
|
||||||
# Click the field before typing (if coordinates are available)
|
# Click the field before typing (if coordinates are available)
|
||||||
if x_pct > 0 and y_pct > 0:
|
if x_pct > 0 and y_pct > 0:
|
||||||
real_x = int(x_pct * width)
|
real_x = int(x_pct * width) + mon_offset_x
|
||||||
real_y = int(y_pct * height)
|
real_y = int(y_pct * height) + mon_offset_y
|
||||||
print(f" [TYPE] Clic prealable sur ({real_x}, {real_y})")
|
print(f" [TYPE] Clic prealable sur ({real_x}, {real_y})")
|
||||||
self._click((real_x, real_y), "left")
|
self._click((real_x, real_y), "left")
|
||||||
time.sleep(0.3)
|
time.sleep(0.3)
|
||||||
@@ -808,8 +823,8 @@ class ActionExecutorV1:
|
|||||||
logger.info(f"Replay key_combo : {keys} (raw_keys={'oui' if raw_keys else 'non'})")
|
logger.info(f"Replay key_combo : {keys} (raw_keys={'oui' if raw_keys else 'non'})")
|
||||||
|
|
||||||
elif action_type == "scroll":
|
elif action_type == "scroll":
|
||||||
real_x = int(x_pct * width) if x_pct > 0 else int(0.5 * width)
|
real_x = (int(x_pct * width) if x_pct > 0 else int(0.5 * width)) + mon_offset_x
|
||||||
real_y = int(y_pct * height) if y_pct > 0 else int(0.5 * height)
|
real_y = (int(y_pct * height) if y_pct > 0 else int(0.5 * height)) + mon_offset_y
|
||||||
delta = action.get("delta", -3)
|
delta = action.get("delta", -3)
|
||||||
print(f" [SCROLL] delta={delta} a ({real_x}, {real_y})")
|
print(f" [SCROLL] delta={delta} a ({real_x}, {real_y})")
|
||||||
self.mouse.position = (real_x, real_y)
|
self.mouse.position = (real_x, real_y)
|
||||||
@@ -1386,6 +1401,16 @@ Example: x_pct=0.50, y_pct=0.30"""
|
|||||||
data = resp.json()
|
data = resp.json()
|
||||||
action = data.get("action")
|
action = data.get("action")
|
||||||
if action is None:
|
if action is None:
|
||||||
|
# pause_for_human: show the decision message to the user
|
||||||
|
if data.get("replay_paused") and data.get("pause_message"):
|
||||||
|
msg = data["pause_message"]
|
||||||
|
print(f"[PAUSE] {msg}")
|
||||||
|
logger.info(f"Replay en pause — message : {msg}")
|
||||||
|
self.notifier.notify(
|
||||||
|
title="Léa — Validation requise",
|
||||||
|
message=msg[:250],
|
||||||
|
timeout=30,
|
||||||
|
)
|
||||||
return False
|
return False
|
||||||
|
|
||||||
except (requests.exceptions.ConnectionError, requests.exceptions.Timeout) as e:
|
except (requests.exceptions.ConnectionError, requests.exceptions.Timeout) as e:
|
||||||
|
|||||||
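The QW1 change in `ActionExecutorV1` repeats the same computation at each click site: percentage coordinates are scaled by the primary monitor size, then shifted by the target monitor's offset when the server resolved one. A minimal sketch of that rule as a single function follows; `to_screen_coords` is a hypothetical name, and only the `monitor_resolution` keys (`idx`, `offset_x`, `offset_y`) come from the diff.

```python
from typing import Any, Dict, Optional, Tuple


def to_screen_coords(
    x_pct: float,
    y_pct: float,
    width: int,
    height: int,
    monitor_resolution: Optional[Dict[str, Any]] = None,
) -> Tuple[int, int]:
    """Convert percentage coordinates to absolute pixels, applying a monitor offset."""
    mon = monitor_resolution or {}
    idx = mon.get("idx", -1)
    # idx == -1 means composite fallback: no offset, for backward compatibility.
    off_x = mon.get("offset_x", 0) if idx >= 0 else 0
    off_y = mon.get("offset_y", 0) if idx >= 0 else 0
    return int(x_pct * width) + off_x, int(y_pct * height) + off_y
```

Centralizing the rule this way would keep the click, type and scroll branches from drifting apart, though the diff itself inlines the arithmetic at each site.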
@@ -319,7 +319,22 @@ class AgentV1:
|
|||||||
if img_hash != self._last_heartbeat_hash:
|
if img_hash != self._last_heartbeat_hash:
|
||||||
self._last_heartbeat_hash = img_hash
|
self._last_heartbeat_hash = img_hash
|
||||||
self.streamer.push_image(full_path, f"heartbeat_{int(time.time())}")
|
self.streamer.push_image(full_path, f"heartbeat_{int(time.time())}")
|
||||||
self.streamer.push_event({"type": "heartbeat", "image": full_path, "timestamp": time.time(), "machine_id": self.machine_id})
|
heartbeat_event = {
|
||||||
|
"type": "heartbeat",
|
||||||
|
"image": full_path,
|
||||||
|
"timestamp": time.time(),
|
||||||
|
"machine_id": self.machine_id,
|
||||||
|
}
|
||||||
|
# QW1: multi-monitor enrichment (monitor_index + monitors_geometry)
|
||||||
|
# Additive, with graceful fallback: without this enrichment, the server
|
||||||
|
# only receives the info at click time, so QW1 is not active
|
||||||
|
# continuously on multi-monitor Windows workstations.
|
||||||
|
try:
|
||||||
|
from .vision.capturer import _enrich_with_monitor_info
|
||||||
|
_enrich_with_monitor_info(heartbeat_event)
|
||||||
|
except Exception as e:
|
||||||
|
logger.debug("QW1 enrichissement heartbeat échoué: %s", e)
|
||||||
|
self.streamer.push_event(heartbeat_event)
|
||||||
except Exception as e:
|
except Exception as e:
|
||||||
logger.error(f"Heartbeat error: {e}")
|
logger.error(f"Heartbeat error: {e}")
|
||||||
time.sleep(5)
|
time.sleep(5)
|
||||||
|
|||||||
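The `monitor_index` attached to each heartbeat comes from a point-in-rectangle test of the cursor against each monitor. The lookup can be sketched on plain dictionaries shaped like the `monitors_geometry` entries (`idx`, `x`, `y`, `w`, `h`); the function name `find_monitor_index` is an assumption, and the cursor position is passed in rather than read from `pyautogui`.

```python
from typing import Dict, List, Optional


def find_monitor_index(cx: int, cy: int, monitors: List[Dict[str, int]]) -> Optional[int]:
    """Return the idx of the monitor containing point (cx, cy), or None."""
    for m in monitors:
        # Half-open intervals: the right and bottom edges belong to the next monitor.
        if m["x"] <= cx < m["x"] + m["w"] and m["y"] <= cy < m["y"] + m["h"]:
            return m["idx"]
    return None
```

Monitors placed to the left of or above the primary one have negative offsets, which the half-open comparison handles without special cases.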
@@ -8,12 +8,73 @@ import os
|
|||||||
import time
|
import time
|
||||||
import logging
|
import logging
|
||||||
import hashlib
|
import hashlib
|
||||||
|
from typing import Any, Dict, List, Optional
|
||||||
from PIL import Image, ImageFilter, ImageStat
|
from PIL import Image, ImageFilter, ImageStat
|
||||||
import mss
|
import mss
|
||||||
from ..config import TARGETED_CROP_SIZE, SCREENSHOT_QUALITY
|
from ..config import TARGETED_CROP_SIZE, SCREENSHOT_QUALITY
|
||||||
|
|
||||||
logger = logging.getLogger(__name__)
|
logger = logging.getLogger(__name__)
|
||||||
|
|
||||||
|
# QW1: multi-monitor detection (graceful fallback if screeninfo is missing)
|
||||||
|
try:
|
||||||
|
from screeninfo import get_monitors as _screeninfo_get_monitors
|
||||||
|
_SCREENINFO_AVAILABLE = True
|
||||||
|
except ImportError:
|
||||||
|
_SCREENINFO_AVAILABLE = False
|
||||||
|
|
||||||
|
|
||||||
|
def _get_monitors_geometry() -> List[Dict[str, Any]]:
|
||||||
|
"""Return the list of physical monitors with their offsets.
|
||||||
|
|
||||||
|
Returns:
|
||||||
|
List[dict]: [{idx, x, y, w, h, primary}, ...]. Empty if screeninfo is
|
||||||
|
unavailable (the server then uses its composite fallback).
|
||||||
|
"""
|
||||||
|
if not _SCREENINFO_AVAILABLE:
|
||||||
|
return []
|
||||||
|
try:
|
||||||
|
monitors = _screeninfo_get_monitors()
|
||||||
|
return [
|
||||||
|
{
|
||||||
|
"idx": i,
|
||||||
|
"x": int(m.x),
|
||||||
|
"y": int(m.y),
|
||||||
|
"w": int(m.width),
|
||||||
|
"h": int(m.height),
|
||||||
|
"primary": bool(getattr(m, "is_primary", False)),
|
||||||
|
}
|
||||||
|
for i, m in enumerate(monitors)
|
||||||
|
]
|
||||||
|
except Exception:
|
||||||
|
return []
|
||||||
|
|
||||||
|
|
||||||
|
def _get_active_monitor_index() -> Optional[int]:
|
||||||
|
"""Retourne l'index logique du monitor où se trouve le curseur (focus actif).
|
||||||
|
|
||||||
|
Returns:
|
||||||
|
int ou None si indéterminable.
|
||||||
|
"""
|
||||||
|
if not _SCREENINFO_AVAILABLE:
|
||||||
|
return None
|
||||||
|
try:
|
||||||
|
import pyautogui # import paresseux : évite la dépendance dure
|
||||||
|
cx, cy = pyautogui.position()
|
||||||
|
for i, m in enumerate(_screeninfo_get_monitors()):
|
||||||
|
if m.x <= cx < m.x + m.width and m.y <= cy < m.y + m.height:
|
||||||
|
return i
|
||||||
|
except Exception:
|
||||||
|
return None
|
||||||
|
return None
|
||||||
|
|
||||||
|
|
||||||
|
def _enrich_with_monitor_info(payload: dict) -> dict:
|
||||||
|
"""Ajoute monitor_index et monitors_geometry au payload (in-place + return)."""
|
||||||
|
if isinstance(payload, dict):
|
||||||
|
payload["monitor_index"] = _get_active_monitor_index()
|
||||||
|
payload["monitors_geometry"] = _get_monitors_geometry()
|
||||||
|
return payload
|
||||||
|
|
||||||
class VisionCapturer:
|
class VisionCapturer:
|
||||||
def __init__(self, session_dir: str):
|
def __init__(self, session_dir: str):
|
||||||
self.session_dir = session_dir
|
self.session_dir = session_dir
|
||||||
@@ -72,7 +133,12 @@ class VisionCapturer:
|
|||||||
# Mise à jour du hash pour le prochain heartbeat
|
# Mise à jour du hash pour le prochain heartbeat
|
||||||
self.last_img_hash = self._compute_quick_hash(img)
|
self.last_img_hash = self._compute_quick_hash(img)
|
||||||
|
|
||||||
return {"full": full_path, "crop": crop_path}
|
result = {"full": full_path, "crop": crop_path}
|
||||||
|
|
||||||
|
# QW1 — enrichissement multi-écrans (additif, fallback gracieux)
|
||||||
|
_enrich_with_monitor_info(result)
|
||||||
|
|
||||||
|
return result
|
||||||
except Exception as e:
|
except Exception as e:
|
||||||
logger.error(f"Erreur Dual Capture: {e}")
|
logger.error(f"Erreur Dual Capture: {e}")
|
||||||
return {}
|
return {}
|
||||||
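The cursor-to-monitor test used by `_get_active_monitor_index` can be tried without `screeninfo` or `pyautogui`. The sketch below is not part of the commit: `find_monitor_index` and the sample layout are hypothetical, and only the `{idx, x, y, w, h, primary}` dict shape is taken from `_get_monitors_geometry`.

```python
from typing import Any, Dict, List, Optional


def find_monitor_index(
    monitors: List[Dict[str, Any]], cx: int, cy: int
) -> Optional[int]:
    """Return the idx of the monitor whose rectangle contains (cx, cy)."""
    for m in monitors:
        # Half-open bounds, the same test as in _get_active_monitor_index.
        if m["x"] <= cx < m["x"] + m["w"] and m["y"] <= cy < m["y"] + m["h"]:
            return m["idx"]
    return None


# Hypothetical layout: a 1920x1080 primary with a second screen to its left.
SAMPLE_MONITORS = [
    {"idx": 0, "x": 0, "y": 0, "w": 1920, "h": 1080, "primary": True},
    {"idx": 1, "x": -1920, "y": 0, "w": 1920, "h": 1080, "primary": False},
]

ON_PRIMARY = find_monitor_index(SAMPLE_MONITORS, 100, 100)
ON_LEFT = find_monitor_index(SAMPLE_MONITORS, -5, 500)
OFF_SCREEN = find_monitor_index(SAMPLE_MONITORS, 5000, 5000)
```

Because the bounds are half-open, a cursor exactly on the right or bottom edge of a monitor belongs to the neighbouring monitor, or to none if there is no neighbour. Negative offsets, as for a screen placed left of the primary, need no special handling.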
|
|||||||
@@ -3,7 +3,9 @@ mss>=9.0.1 # Capture d'écran haute performance
|
|||||||
pynput>=1.7.7 # Clavier/Souris Cross-plateforme
|
pynput>=1.7.7 # Clavier/Souris Cross-plateforme
|
||||||
Pillow>=10.0.0 # Crops et processing image
|
Pillow>=10.0.0 # Crops et processing image
|
||||||
requests>=2.31.0 # Streaming réseau
|
requests>=2.31.0 # Streaming réseau
|
||||||
|
python-socketio[client]>=5.10,<6.0 # Bus feedback Léa 'lea:*' (compat Flask-SocketIO 5.3.x serveur)
|
||||||
psutil>=5.9.0 # Monitoring CPU/RAM
|
psutil>=5.9.0 # Monitoring CPU/RAM
|
||||||
|
screeninfo>=0.8 # QW1 — détection des monitors physiques + offsets
|
||||||
pystray>=0.19.5 # Icône Tray UI
|
pystray>=0.19.5 # Icône Tray UI
|
||||||
plyer>=2.1.0 # Notifications toast natives (remplace PyQt5)
|
plyer>=2.1.0 # Notifications toast natives (remplace PyQt5)
|
||||||
|
|
||||||
|
|||||||
@@ -21,36 +21,33 @@ from typing import Any, Callable, Dict, List, Optional
|
|||||||
logger = logging.getLogger("lea_ui.server_client")
|
logger = logging.getLogger("lea_ui.server_client")
|
||||||
|
|
||||||
|
|
||||||
-def _get_server_host() -> str:
-    """Return the address of the Linux server.
+def _get_server_url() -> str:
+    """Return the URL of the RPA server (including /api/v1).

     Resolution order:
-    1. RPA_SERVER_HOST environment variable
-    2. agent_config.json config file ("server_host" key)
-    3. Fallback to localhost
+    1. Import from agent_v1.config (single source of truth)
+    2. RPA_SERVER_URL environment variable
+    3. Fallback to http://localhost:5005/api/v1
     """
-    # 1. Environment variable
-    host = os.environ.get("RPA_SERVER_HOST", "").strip()
-    if host:
-        return host
-
-    # 2. Config file
-    config_paths = [
-        os.path.join(os.path.dirname(__file__), "..", "agent_config.json"),
-        os.path.join(os.path.dirname(__file__), "..", "..", "agent_config.json"),
-    ]
-    for config_path in config_paths:
-        try:
-            with open(config_path, "r", encoding="utf-8") as f:
-                cfg = json.load(f)
-            host = cfg.get("server_host", "").strip()
-            if host:
-                return host
-        except (OSError, json.JSONDecodeError):
-            continue
+    # 1. Import from config.py (source of truth)
+    try:
+        from agent_v1.config import SERVER_URL
+        return SERVER_URL
+    except ImportError:
+        pass
+
+    # 2. Direct environment variable
+    url = os.environ.get("RPA_SERVER_URL", "").strip().rstrip("/")
+    if url:
+        return url

     # 3. Fallback
-    return "localhost"
+    return "http://localhost:5005/api/v1"


+def _get_server_base(server_url: str) -> str:
+    """Extract the base URL (without /api/v1) for root-level routes (/health)."""
+    return server_url.rsplit("/api/v1", 1)[0]
|
||||||
|
|
||||||
|
|
||||||
class LeaServerClient:
|
class LeaServerClient:
|
||||||
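The string handling in `_get_server_url` and `_get_server_base` is easy to check in isolation. This is a minimal sketch, not the committed code: `resolve_server_url` and `server_base` are hypothetical names, the environment is passed in as a mapping so nothing touches `os.environ`, the `agent_v1.config` import step is left out, and `rpa.example.test` is a placeholder host.

```python
from typing import Mapping

DEFAULT_URL = "http://localhost:5005/api/v1"


def resolve_server_url(env: Mapping[str, str]) -> str:
    """Environment-variable step and fallback only (config import omitted)."""
    url = env.get("RPA_SERVER_URL", "").strip().rstrip("/")
    return url or DEFAULT_URL


def server_base(server_url: str) -> str:
    """Drop the last "/api/v1" occurrence, as _get_server_base does."""
    return server_url.rsplit("/api/v1", 1)[0]


FROM_ENV = resolve_server_url({"RPA_SERVER_URL": " https://rpa.example.test/api/v1/ "})
FALLBACK = resolve_server_url({})
BASE = server_base(FROM_ENV)
```

A URL that does not contain `/api/v1` passes through `rsplit` unchanged, so in that case the API URL and the base URL would be identical. That is presumably why the diff treats "SERVER_URL always contains /api/v1" as a convention.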
@@ -67,19 +64,22 @@ class LeaServerClient:
|
|||||||
chat_port: int = 5004,
|
chat_port: int = 5004,
|
||||||
stream_port: int = 5005,
|
stream_port: int = 5005,
|
||||||
) -> None:
|
) -> None:
|
||||||
self._host = server_host or _get_server_host()
|
# URL unifiée : SERVER_URL contient TOUJOURS /api/v1 (convention INC-1).
|
||||||
|
# _stream_url = URL avec /api/v1 (pour les routes API)
|
||||||
|
# _stream_base = URL sans /api/v1 (pour /health uniquement)
|
||||||
|
self._stream_url = _get_server_url()
|
||||||
|
self._stream_base = _get_server_base(self._stream_url)
|
||||||
|
|
||||||
|
# Extraire le host depuis l'URL pour le chat et pour l'affichage
|
||||||
|
try:
|
||||||
|
from urllib.parse import urlparse
|
||||||
|
parsed = urlparse(self._stream_base)
|
||||||
|
self._host = parsed.hostname or "localhost"
|
||||||
|
except Exception:
|
||||||
|
self._host = server_host or "localhost"
|
||||||
|
|
||||||
self._chat_port = chat_port
|
self._chat_port = chat_port
|
||||||
self._stream_port = stream_port
|
self._stream_port = stream_port
|
||||||
|
|
||||||
# En prod, la base URL passe par le reverse proxy HTTPS
|
|
||||||
# (ex. https://lea.labs.laurinebazin.design). Si RPA_SERVER_URL est
|
|
||||||
# definie on l'utilise telle quelle, sinon on reconstruit http://host:port.
|
|
||||||
server_url = os.environ.get("RPA_SERVER_URL", "").strip().rstrip("/")
|
|
||||||
if server_url:
|
|
||||||
self._stream_base = server_url
|
|
||||||
else:
|
|
||||||
self._stream_base = f"http://{self._host}:{self._stream_port}"
|
|
||||||
|
|
||||||
self._chat_base = f"http://{self._host}:{self._chat_port}"
|
self._chat_base = f"http://{self._host}:{self._chat_port}"
|
||||||
|
|
||||||
# Etat de connexion
|
# Etat de connexion
|
||||||
@@ -103,8 +103,8 @@ class LeaServerClient:
|
|||||||
self._api_token = os.environ.get("RPA_API_TOKEN", "")
|
self._api_token = os.environ.get("RPA_API_TOKEN", "")
|
||||||
|
|
||||||
logger.info(
|
logger.info(
|
||||||
"LeaServerClient initialise : chat=%s, stream=%s",
|
"LeaServerClient initialise : chat=%s, stream_url=%s, stream_base=%s",
|
||||||
self._chat_base, self._stream_base,
|
self._chat_base, self._stream_url, self._stream_base,
|
||||||
)
|
)
|
||||||
|
|
||||||
# ---------------------------------------------------------------------------
|
# ---------------------------------------------------------------------------
|
||||||
@@ -154,7 +154,11 @@ class LeaServerClient:
|
|||||||
# ---------------------------------------------------------------------------
|
# ---------------------------------------------------------------------------
|
||||||
|
|
||||||
def check_connection(self) -> bool:
|
def check_connection(self) -> bool:
|
||||||
-        """Test the connection to the streaming server (port 5005)."""
+        """Test the connection to the streaming server (port 5005).
+
+        The health check uses _stream_base (without /api/v1) because the
+        /health route sits at the root of the FastAPI server, not under /api/v1.
+        """
|
||||||
try:
|
try:
|
||||||
import requests
|
import requests
|
||||||
resp = requests.get(
|
resp = requests.get(
|
||||||
@@ -227,7 +231,7 @@ class LeaServerClient:
|
|||||||
import requests
|
import requests
|
||||||
headers = self._auth_headers()
|
headers = self._auth_headers()
|
||||||
resp = requests.get(
|
resp = requests.get(
|
||||||
f"{self._stream_base}/api/v1/traces/stream/workflows",
|
f"{self._stream_url}/traces/stream/workflows",
|
||||||
headers=headers,
|
headers=headers,
|
||||||
timeout=10,
|
timeout=10,
|
||||||
)
|
)
|
||||||
@@ -284,7 +288,7 @@ class LeaServerClient:
|
|||||||
while self._polling:
|
while self._polling:
|
||||||
try:
|
try:
|
||||||
resp = req_lib.get(
|
resp = req_lib.get(
|
||||||
f"{self._stream_base}/api/v1/traces/stream/replay/next",
|
f"{self._stream_url}/traces/stream/replay/next",
|
||||||
params={"session_id": self._poll_session_id},
|
params={"session_id": self._poll_session_id},
|
||||||
headers=self._auth_headers(),
|
headers=self._auth_headers(),
|
||||||
timeout=5,
|
timeout=5,
|
||||||
@@ -318,7 +322,7 @@ class LeaServerClient:
|
|||||||
try:
|
try:
|
||||||
import requests
|
import requests
|
||||||
resp = requests.get(
|
resp = requests.get(
|
||||||
f"{self._stream_base}/api/v1/traces/stream/replays",
|
f"{self._stream_url}/traces/stream/replays",
|
||||||
headers=self._auth_headers(),
|
headers=self._auth_headers(),
|
||||||
timeout=5,
|
timeout=5,
|
||||||
)
|
)
|
||||||
@@ -346,7 +350,7 @@ class LeaServerClient:
|
|||||||
try:
|
try:
|
||||||
import requests
|
import requests
|
||||||
requests.post(
|
requests.post(
|
||||||
f"{self._stream_base}/api/v1/traces/stream/replay/result",
|
f"{self._stream_url}/traces/stream/replay/result",
|
||||||
json={
|
json={
|
||||||
"session_id": session_id,
|
"session_id": session_id,
|
||||||
"action_id": action_id,
|
"action_id": action_id,
|
||||||
|
|||||||
@@ -9,6 +9,7 @@ Inclut les endpoints de replay pour renvoyer des ordres d'exécution à l'Agent
|
|||||||
"""
|
"""
|
||||||
|
|
||||||
import atexit
|
import atexit
|
||||||
|
import contextlib
|
||||||
import json
|
import json
|
||||||
import logging
|
import logging
|
||||||
import os
|
import os
|
||||||
@@ -33,6 +34,8 @@ from .audit_trail import AuditTrail, AuditEntry
|
|||||||
from .agent_registry import AgentRegistry, AgentAlreadyEnrolledError
|
from .agent_registry import AgentRegistry, AgentAlreadyEnrolledError
|
||||||
from .stream_processor import StreamProcessor, build_replay_from_raw_events, enrich_click_from_screenshot
|
from .stream_processor import StreamProcessor, build_replay_from_raw_events, enrich_click_from_screenshot
|
||||||
from .worker_stream import StreamWorker
|
from .worker_stream import StreamWorker
|
||||||
|
from .monitor_router import resolve_target_monitor # QW1 — résolution écran cible
|
||||||
|
from .loop_detector import LoopDetector # QW2 — détection de boucle pendant replay
|
||||||
from .execution_plan_runner import (
|
from .execution_plan_runner import (
|
||||||
execution_plan_to_actions,
|
execution_plan_to_actions,
|
||||||
inject_plan_into_queue,
|
inject_plan_into_queue,
|
||||||
@@ -219,6 +222,11 @@ from .replay_engine import (
|
|||||||
_is_learned_workflow,
|
_is_learned_workflow,
|
||||||
_edge_to_normalized_actions,
|
_edge_to_normalized_actions,
|
||||||
_substitute_variables,
|
_substitute_variables,
|
||||||
|
_resolve_runtime_vars,
|
||||||
|
_SERVER_SIDE_ACTION_TYPES,
|
||||||
|
_handle_extract_text_action,
|
||||||
|
_handle_extract_table_action,
|
||||||
|
_handle_t2a_decision_action,
|
||||||
_expand_compound_steps,
|
_expand_compound_steps,
|
||||||
_pre_check_screen_state as _pre_check_screen_state_impl,
|
_pre_check_screen_state as _pre_check_screen_state_impl,
|
||||||
_detect_popup_hint as _detect_popup_hint_impl,
|
_detect_popup_hint as _detect_popup_hint_impl,
|
||||||
@@ -292,6 +300,20 @@ app.add_middleware(
|
|||||||
)
|
)
|
||||||
|
|
||||||
|
|
||||||
|
+@app.middleware("http")
+async def url_compat_rewrite(request: Request, call_next):
+    """Backward compatibility: rewrite legacy URLs that lack the /api/v1 prefix.
+
+    Some client agents (frozen Léa V1) send requests to /traces/stream/...
+    instead of /api/v1/traces/stream/... This middleware silently rewrites the path.
+    """
+    path = request.url.path
+    if path.startswith("/traces/stream/") and not path.startswith("/api/v1/"):
+        new_path = "/api/v1" + path
+        request.scope["path"] = new_path
+    return await call_next(request)
|
||||||
|
|
||||||
|
|
||||||
@app.middleware("http")
|
@app.middleware("http")
|
||||||
async def security_headers_middleware(request: Request, call_next):
|
async def security_headers_middleware(request: Request, call_next):
|
||||||
"""Ajouter les headers de sécurité sur toutes les réponses."""
|
"""Ajouter les headers de sécurité sur toutes les réponses."""
|
||||||
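The rewrite rule applied by `url_compat_rewrite` can be expressed as a pure function, which makes the edge cases easy to test without a running server. The sketch below is an illustration only: `rewrite_legacy_path` and the two constants are hypothetical names, not part of the commit.

```python
LEGACY_PREFIX = "/traces/stream/"
API_PREFIX = "/api/v1"


def rewrite_legacy_path(path: str) -> str:
    """Prefix legacy stream paths with /api/v1; leave every other path alone."""
    # A path that starts with LEGACY_PREFIX cannot also start with "/api/v1/",
    # so a single prefix check is enough here.
    if path.startswith(LEGACY_PREFIX):
        return API_PREFIX + path
    return path


LEGACY = rewrite_legacy_path("/traces/stream/replay/next")
CURRENT = rewrite_legacy_path("/api/v1/traces/stream/replay/next")
HEALTH = rewrite_legacy_path("/health")
```

Paths that already carry the prefix and root-level routes such as `/health` are returned untouched, so the rule is safe to apply to every request.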
@@ -341,6 +363,18 @@ REPLAY_LOCK_FILE = _DATA_DIR / "_replay_active.lock"
|
|||||||
processor = StreamProcessor(data_dir=str(LIVE_SESSIONS_DIR))
|
processor = StreamProcessor(data_dir=str(LIVE_SESSIONS_DIR))
|
||||||
worker = StreamWorker(live_dir=str(LIVE_SESSIONS_DIR), processor=processor)
|
worker = StreamWorker(live_dir=str(LIVE_SESSIONS_DIR), processor=processor)
|
||||||
|
|
||||||
|
+# QW2: lazy LoopDetector singleton (uses the processor's CLIP embedder)
+_loop_detector: Optional["LoopDetector"] = None
+
+
+def _get_loop_detector() -> "LoopDetector":
+    """Lazy singleton: create the LoopDetector with the processor's CLIP embedder."""
+    global _loop_detector
+    if _loop_detector is None:
+        embedder = getattr(processor, "_clip_embedder", None)
+        _loop_detector = LoopDetector(clip_embedder=embedder)
+    return _loop_detector
|
||||||
|
|
||||||
# Registre des postes Lea enroles (table enrolled_agents dans rpa_data.db)
|
# Registre des postes Lea enroles (table enrolled_agents dans rpa_data.db)
|
||||||
# Emplacement configurable via RPA_AGENTS_DB_PATH pour les tests.
|
# Emplacement configurable via RPA_AGENTS_DB_PATH pour les tests.
|
||||||
_AGENTS_DB_PATH = os.environ.get(
|
_AGENTS_DB_PATH = os.environ.get(
|
||||||
@@ -472,6 +506,33 @@ _pending_lock = threading.Lock()
|
|||||||
# Chaque session a une queue d'actions à exécuter et un état de replay
|
# Chaque session a une queue d'actions à exécuter et un état de replay
|
||||||
# =========================================================================
|
# =========================================================================
|
||||||
_replay_lock = threading.Lock()
|
_replay_lock = threading.Lock()
|
||||||
|
|
||||||
|
|
||||||
|
+# Async context manager to acquire _replay_lock without blocking the FastAPI
+# event loop. Complements commit 35b27ae49 (async lock on
+# /replay/next) and 87dbe8c5f (non-blocking get_replay_status): all the
+# `async def` endpoints that ran a synchronous `with _replay_lock:` froze
+# the event loop as soon as a long operation held the lock in another
+# thread. With this helper, the acquire goes through run_in_executor (the
+# event loop stays free to serve other requests while waiting). If the lock
+# is held for more than `timeout` seconds, we return 503 rather than freeze
+# the server.
+@contextlib.asynccontextmanager
+async def _async_replay_lock(timeout: float = 4.5):
+    import asyncio
+    loop = asyncio.get_event_loop()
+    acquired = await loop.run_in_executor(None, _replay_lock.acquire, True, timeout)
+    if not acquired:
+        raise HTTPException(
+            status_code=503,
+            detail=f"Serveur occupé (lock _replay tenu > {timeout}s) — réessayer",
+        )
+    try:
+        yield
+    finally:
+        _replay_lock.release()
|
||||||
|
|
||||||
|
|
||||||
# session_id -> liste d'actions en attente (FIFO)
|
# session_id -> liste d'actions en attente (FIFO)
|
||||||
_replay_queues: Dict[str, List[Dict[str, Any]]] = defaultdict(list)
|
_replay_queues: Dict[str, List[Dict[str, Any]]] = defaultdict(list)
|
||||||
# machine_id -> session_id (mapping pour le replay ciblé par machine)
|
# machine_id -> session_id (mapping pour le replay ciblé par machine)
|
||||||
@@ -493,6 +554,7 @@ class ReplayRequest(BaseModel):
|
|||||||
session_id: str
|
session_id: str
|
||||||
machine_id: Optional[str] = None # Machine cible pour le replay (multi-machine)
|
machine_id: Optional[str] = None # Machine cible pour le replay (multi-machine)
|
||||||
params: Optional[Dict[str, Any]] = None
|
params: Optional[Dict[str, Any]] = None
|
||||||
|
variables: Optional[Dict[str, Any]] = None # Variables runtime initiales (templating {{var}})
|
||||||
|
|
||||||
|
|
||||||
class RawReplayRequest(BaseModel):
|
class RawReplayRequest(BaseModel):
|
||||||
@@ -501,6 +563,11 @@ class RawReplayRequest(BaseModel):
|
|||||||
session_id: str = ""
|
session_id: str = ""
|
||||||
machine_id: Optional[str] = None # Machine cible (multi-machine)
|
machine_id: Optional[str] = None # Machine cible (multi-machine)
|
||||||
task_description: str = ""
|
task_description: str = ""
|
||||||
|
+    # Runtime parameters of the replay (read from replay_state.params on the pipeline side).
+    # Notably execution_mode: "autonomous" (default, pause_for_human is skipped)
+    # or "supervised" (pause_for_human blocks until human validation via the
+    # VWB PauseDialog). See replay_engine.py / api_stream.py:2964.
+    params: Optional[Dict[str, Any]] = None
|
||||||
|
|
||||||
|
|
||||||
class SingleActionRequest(BaseModel):
|
class SingleActionRequest(BaseModel):
|
||||||
@@ -747,6 +814,21 @@ async def startup():
|
|||||||
_cleanup_thread = threading.Thread(target=_cleanup_loop, daemon=True, name="replay_cleanup")
|
_cleanup_thread = threading.Thread(target=_cleanup_loop, daemon=True, name="replay_cleanup")
|
||||||
_cleanup_thread.start()
|
_cleanup_thread.start()
|
||||||
|
|
||||||
|
+    # Preload EasyOCR in the background: otherwise the first extract_text /
+    # extract_table triggers a cold start of ~3-5s that blocks the FastAPI
+    # event loop (observed 2026-05-05: streaming server unreachable for 2 min).
+    # The thread runs while boot continues; the first OCR call will be fast.
+    def _preload_easyocr():
+        try:
+            t0 = time.time()
+            from core.llm.ocr_extractor import _get_reader
+            _get_reader()
+            logger.info("[OCR] EasyOCR préchargé (fr+en, CPU) en %.1fs", time.time() - t0)
+        except Exception as e:
+            logger.warning("[OCR] Échec préchargement EasyOCR : %s", e)
+
+    threading.Thread(target=_preload_easyocr, daemon=True, name="preload_easyocr").start()
|
||||||
|
|
||||||
logger.info(
|
logger.info(
|
||||||
"API Streaming démarrée — StreamProcessor, Worker et Cleanup prêts. "
|
"API Streaming démarrée — StreamProcessor, Worker et Cleanup prêts. "
|
||||||
"VLM Worker dans un process séparé (run_worker.py)."
|
"VLM Worker dans un process séparé (run_worker.py)."
|
||||||
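The warm-up idea in `_preload_easyocr` is to pay the cost of a lazy singleton in a daemon thread during startup. It can be sketched without EasyOCR. Everything below is hypothetical: `get_reader` simulates an expensive load with a short sleep, and the lock around the singleton is an assumption about how such a loader could be made thread-safe, not a description of `core.llm.ocr_extractor._get_reader`.

```python
import threading
import time

_reader = None
_reader_lock = threading.Lock()


def get_reader():
    """Hypothetical lazy singleton standing in for an expensive model load."""
    global _reader
    with _reader_lock:
        if _reader is None:
            time.sleep(0.05)  # simulate the cold start
            _reader = object()
        return _reader


def preload() -> None:
    try:
        get_reader()
    except Exception as exc:  # a failed warm-up must not break startup
        print(f"preload failed: {exc}")


warmup = threading.Thread(target=preload, daemon=True, name="preload_reader")
warmup.start()
warmup.join()  # only for the demo; a server would keep booting instead

FIRST_CALL = get_reader()
SECOND_CALL = get_reader()
```

If a request arrives before the warm-up has finished, the lock makes that caller wait for the in-progress load rather than start a second one.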
@@ -1933,7 +2015,7 @@ async def start_replay(request: ReplayRequest):
|
|||||||
resolved_machine_id = target_machine_id or (session_obj.machine_id if session_obj else "default")
|
resolved_machine_id = target_machine_id or (session_obj.machine_id if session_obj else "default")
|
||||||
|
|
||||||
# Injecter les actions dans la queue de la session
|
# Injecter les actions dans la queue de la session
|
||||||
with _replay_lock:
|
async with _async_replay_lock():
|
||||||
_replay_queues[session_id] = list(actions) # Remplacer la queue existante
|
_replay_queues[session_id] = list(actions) # Remplacer la queue existante
|
||||||
_replay_states[replay_id] = _create_replay_state(
|
_replay_states[replay_id] = _create_replay_state(
|
||||||
replay_id=replay_id,
|
replay_id=replay_id,
|
||||||
@@ -1944,6 +2026,11 @@ async def start_replay(request: ReplayRequest):
|
|||||||
machine_id=resolved_machine_id,
|
machine_id=resolved_machine_id,
|
||||||
actions=actions,
|
actions=actions,
|
||||||
)
|
)
|
||||||
|
+        # Pre-inject runtime variables ({{var}} templating on by_text,
+        # text, target_spec.*, etc.). Lets the orchestrator call this
+        # workflow with e.g. variables={"patient_id": "25003284"} to loop.
+        if request.variables:
+            _replay_states[replay_id]["variables"].update(request.variables)
|
||||||
# Enregistrer le mapping machine -> session pour le replay ciblé
|
# Enregistrer le mapping machine -> session pour le replay ciblé
|
||||||
if resolved_machine_id and resolved_machine_id != "default":
|
if resolved_machine_id and resolved_machine_id != "default":
|
||||||
_machine_replay_target[resolved_machine_id] = session_id
|
_machine_replay_target[resolved_machine_id] = session_id
|
||||||
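The diff does not show `_resolve_runtime_vars`, so the sketch below is only one plausible way to implement `{{var}}` and `{{var.field}}` templating over an action dict. The function name `resolve_vars`, the regex, the `type_text` action type and the choice to leave unknown placeholders untouched are all assumptions.

```python
import re
from typing import Any, Dict

_PLACEHOLDER = re.compile(r"\{\{\s*([A-Za-z_]\w*)(?:\.([A-Za-z_]\w*))?\s*\}\}")


def resolve_vars(value: Any, variables: Dict[str, Any]) -> Any:
    """Recursively substitute {{var}} and {{var.field}} inside strings."""
    if isinstance(value, str):
        def _sub(match: "re.Match[str]") -> str:
            name, field = match.group(1), match.group(2)
            found = variables.get(name)
            if field is not None:
                found = found.get(field) if isinstance(found, dict) else None
            # Unknown placeholders are left as-is (an assumption).
            return match.group(0) if found is None else str(found)
        return _PLACEHOLDER.sub(_sub, value)
    if isinstance(value, dict):
        return {k: resolve_vars(v, variables) for k, v in value.items()}
    if isinstance(value, list):
        return [resolve_vars(v, variables) for v in value]
    return value


# Hypothetical action and variables, shaped like the payloads in the diff.
ACTION = {
    "type": "type_text",
    "parameters": {"text": "{{patient_id}}", "by_text": "Dossier {{dec.label}}"},
}
RESOLVED = resolve_vars(ACTION, {"patient_id": "25003284", "dec": {"label": "A"}})
```

The function returns a new structure instead of mutating the queued action, so the same action can be resolved again later with updated variables.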
@@ -2028,7 +2115,7 @@ async def start_raw_replay(request: RawReplayRequest):
|
|||||||
session_obj = processor.session_manager.get_session(session_id)
|
session_obj = processor.session_manager.get_session(session_id)
|
||||||
resolved_machine_id = target_machine_id or (session_obj.machine_id if session_obj else "default")
|
resolved_machine_id = target_machine_id or (session_obj.machine_id if session_obj else "default")
|
||||||
|
|
||||||
with _replay_lock:
|
async with _async_replay_lock():
|
||||||
# ── Nettoyage : annuler les replays bloqués pour cette machine ──
|
# ── Nettoyage : annuler les replays bloqués pour cette machine ──
|
||||||
# Un replay en paused_need_help bloque tous les suivants.
|
# Un replay en paused_need_help bloque tous les suivants.
|
||||||
# Quand on lance un nouveau replay, les anciens sont obsolètes.
|
# Quand on lance un nouveau replay, les anciens sont obsolètes.
|
||||||
@@ -2055,7 +2142,7 @@ async def start_raw_replay(request: RawReplayRequest):
|
|||||||
workflow_id=f"free_task:{task[:50]}",
|
workflow_id=f"free_task:{task[:50]}",
|
||||||
session_id=session_id,
|
session_id=session_id,
|
||||||
total_actions=len(actions),
|
total_actions=len(actions),
|
||||||
params={},
|
params=dict(request.params or {}),
|
||||||
machine_id=resolved_machine_id,
|
machine_id=resolved_machine_id,
|
||||||
actions=actions,
|
actions=actions,
|
||||||
)
|
)
|
||||||
@@ -2248,7 +2335,7 @@ async def replay_from_session(
|
|||||||
# ── 5. Injecter dans la queue de replay ──
|
# ── 5. Injecter dans la queue de replay ──
|
||||||
replay_id = f"replay_sess_{uuid.uuid4().hex[:8]}"
|
replay_id = f"replay_sess_{uuid.uuid4().hex[:8]}"
|
||||||
|
|
||||||
with _replay_lock:
|
async with _async_replay_lock():
|
||||||
_replay_queues[target_session_id] = list(actions)
|
_replay_queues[target_session_id] = list(actions)
|
||||||
_replay_states[replay_id] = _create_replay_state(
|
_replay_states[replay_id] = _create_replay_state(
|
||||||
replay_id=replay_id,
|
replay_id=replay_id,
|
||||||
@@ -2339,7 +2426,7 @@ async def enqueue_single_action(request: SingleActionRequest):
|
|||||||
|
|
||||||
action_id = action["action_id"]
|
action_id = action["action_id"]
|
||||||
|
|
||||||
with _replay_lock:
|
async with _async_replay_lock():
|
||||||
_replay_queues[session_id].append(action)
|
_replay_queues[session_id].append(action)
|
||||||
|
|
||||||
logger.info(
|
logger.info(
|
||||||
@@ -2505,7 +2592,7 @@ async def launch_replay_from_plan(request: PlanReplayRequest):
|
|||||||
or (session_obj.machine_id if session_obj else "default")
|
or (session_obj.machine_id if session_obj else "default")
|
||||||
)
|
)
|
||||||
|
|
||||||
with _replay_lock:
|
async with _async_replay_lock():
|
||||||
_replay_queues[target_session_id] = list(validated)
|
_replay_queues[target_session_id] = list(validated)
|
||||||
_replay_states[replay_id] = _create_replay_state(
|
_replay_states[replay_id] = _create_replay_state(
|
||||||
replay_id=replay_id,
|
replay_id=replay_id,
|
||||||
@@ -2744,8 +2831,29 @@ async def get_next_action(session_id: str, machine_id: str = "default"):
|
|||||||
|
|
||||||
Si la session de l'agent n'a pas d'actions en attente, cherche dans les
|
Si la session de l'agent n'a pas d'actions en attente, cherche dans les
|
||||||
autres queues de la MÊME machine (pas cross-machine).
|
autres queues de la MÊME machine (pas cross-machine).
|
||||||
|
|
||||||
|
+    Acquire timeout: if a slow server-side action (extract_text OCR,
+    t2a_decision LLM) holds the lock, the acquire gives up after 4.5s and we
+    return {action: None, server_busy: True} before the client times out at 5s.
+    Otherwise, actions would be popped on the server and then sent to
+    client sockets already closed by the timeout, and silently lost.
+
+    The acquire and the slow server-side actions run via
+    run_in_executor: otherwise the synchronous call blocks the FastAPI
+    event loop (single-threaded), and even the polls that should receive
+    server_busy stay blocked until the lock is released, which defeats the timeout.
|
||||||
"""
|
"""
|
||||||
with _replay_lock:
|
import asyncio
|
||||||
|
loop = asyncio.get_event_loop()
|
||||||
|
acquired = await loop.run_in_executor(None, _replay_lock.acquire, True, 4.5)
|
||||||
|
if not acquired:
|
||||||
|
return {
|
||||||
|
"action": None,
|
||||||
|
"session_id": session_id,
|
||||||
|
"machine_id": machine_id,
|
||||||
|
"server_busy": True,
|
||||||
|
}
|
||||||
|
try:
|
||||||
# Verifier si le replay est en pause supervisee (target_not_found).
|
# Verifier si le replay est en pause supervisee (target_not_found).
|
||||||
# Dans ce cas, NE PAS envoyer d'action — attendre l'intervention utilisateur.
|
# Dans ce cas, NE PAS envoyer d'action — attendre l'intervention utilisateur.
|
||||||
for state in _replay_states.values():
|
for state in _replay_states.values():
|
||||||
@@ -2810,6 +2918,7 @@ async def get_next_action(session_id: str, machine_id: str = "default"):
|
|||||||
break
|
break
|
||||||
if target_state:
|
if target_state:
|
||||||
queue = target_queue
|
queue = target_queue
|
||||||
|
owning_replay = target_state
|
||||||
_replay_queues[session_id] = target_queue
|
_replay_queues[session_id] = target_queue
|
||||||
del _replay_queues[target_sid]
|
del _replay_queues[target_sid]
|
||||||
target_state["session_id"] = session_id
|
target_state["session_id"] = session_id
|
||||||
@@ -2826,6 +2935,7 @@ async def get_next_action(session_id: str, machine_id: str = "default"):
|
|||||||
other_queue = _replay_queues.get(other_sid, [])
|
other_queue = _replay_queues.get(other_sid, [])
|
||||||
if other_queue:
|
if other_queue:
|
||||||
queue = other_queue
|
queue = other_queue
|
||||||
|
owning_replay = state
|
||||||
_replay_queues[session_id] = other_queue
|
_replay_queues[session_id] = other_queue
|
||||||
del _replay_queues[other_sid]
|
del _replay_queues[other_sid]
|
||||||
state["session_id"] = session_id
|
state["session_id"] = session_id
|
||||||
@@ -2836,9 +2946,148 @@ async def get_next_action(session_id: str, machine_id: str = "default"):
|
|||||||
if not queue:
|
if not queue:
|
||||||
return {"action": None, "session_id": session_id, "machine_id": machine_id}
|
return {"action": None, "session_id": session_id, "machine_id": machine_id}
|
||||||
|
|
||||||
-        # Peek at the next action WITHOUT removing it (for the pre-check)
+        # ── Processing loop: server-side actions (extract_text, t2a_decision)
+        # are executed entirely on the server until we reach a visual action
+        # to forward to Agent V1 or a pause_for_human that blocks the replay.
|
||||||
|
action = None
|
||||||
|
while queue:
|
||||||
action = queue[0]
|
action = queue[0]
|
||||||
|
|
||||||
|
# Résoudre les variables runtime ({{var}} et {{var.field}})
|
||||||
|
if owning_replay is not None:
|
||||||
|
runtime_vars = owning_replay.get("variables") or {}
|
||||||
|
if runtime_vars:
|
||||||
|
action = _resolve_runtime_vars(action, runtime_vars)
|
||||||
|
|
||||||
|
type_ = action.get("type")
|
||||||
|
|
||||||
|
# pause_for_human : pause supervisée si safety_level/safety_checks ou mode supervised,
|
||||||
|
# sinon no-op en mode autonome (skip).
|
||||||
|
if type_ == "pause_for_human":
|
||||||
|
_params = action.get("parameters") or {}
|
||||||
|
_exec_mode = (
|
||||||
|
(owning_replay or {}).get("params", {}).get("execution_mode", "autonomous")
|
||||||
|
if owning_replay else "autonomous"
|
||||||
|
)
|
||||||
|
_has_safety_decl = bool(_params.get("safety_level") or _params.get("safety_checks"))
|
||||||
|
_is_supervised = _exec_mode != "autonomous"
|
||||||
|
|
||||||
|
if owning_replay is not None and (_has_safety_decl or _is_supervised):
|
||||||
|
# QW4 — Construire le payload de pause enrichi (déclaratif + LLM contextuel)
|
||||||
|
try:
|
||||||
|
from agent_v0.server_v1.safety_checks_provider import build_pause_payload
|
||||||
|
last_screenshot_path = owning_replay.get("last_screenshot")
|
||||||
|
payload = build_pause_payload(action, owning_replay, last_screenshot_path)
|
||||||
|
owning_replay["safety_checks"] = payload.checks
|
||||||
|
owning_replay["pause_payload"] = {
|
||||||
|
"checks": payload.checks,
|
||||||
|
"pause_reason": payload.pause_reason,
|
||||||
|
"message": payload.message,
|
||||||
|
}
|
||||||
|
if payload.message:
|
||||||
|
owning_replay["pause_message"] = payload.message
|
||||||
|
# Bus event d'observabilité (pattern QW1/QW2 = logger.info)
|
||||||
|
logger.info(
|
||||||
|
"[BUS] lea:safety_checks_generated replay=%s count=%d sources=%s",
|
||||||
|
owning_replay.get("replay_id", "?"),
|
||||||
|
len(payload.checks),
|
||||||
|
[c["source"] for c in payload.checks],
|
||||||
|
)
|
||||||
|
except Exception as e:
|
||||||
|
logger.warning("QW4 build_pause_payload échec (%s) — pause sans checks", e)
|
||||||
|
owning_replay["safety_checks"] = []
|
||||||
|
|
||||||
|
# Conserver le contexte de l'action (audit + reprise)
|
||||||
|
owning_replay["failed_action"] = {
|
||||||
|
"action_id": action.get("action_id"),
|
||||||
|
"type": "pause_for_human",
|
||||||
|
"reason": "user_request",
|
||||||
|
}
|
||||||
|
owning_replay["status"] = "paused_need_help"
|
||||||
|
queue.pop(0)
|
||||||
|
_replay_queues[session_id] = queue
|
||||||
|
return {"action": None, "session_id": session_id, "machine_id": machine_id}
|
||||||
|
|
||||||
|
# Mode autonome sans safety_checks → skip (comportement legacy)
|
||||||
|
logger.info(
|
||||||
|
"pause_for_human ignorée (mode autonome) — replay %s continue",
|
||||||
|
owning_replay["replay_id"] if owning_replay else "?"
|
||||||
|
)
|
||||||
|
queue.pop(0)
|
||||||
|
_replay_queues[session_id] = queue
|
||||||
|
continue
|
||||||
|
|
||||||
|
+            # Server-side actions: run them OFF the event loop so they do not block
+            # the other polls (extract_text OCR ~5s, t2a_decision LLM ~8-13s).
+            # The lock stays held (consistent queue) but the event loop is free,
+            # so concurrent polls can receive {server_busy: True}.
+            #
+            # Hard 180s bound per action: a hang in EasyOCR / Ollama / I/O
+            # must NEVER be able to hold _replay_lock indefinitely, otherwise
+            # every endpoint under the lock (get_replay_status, /replay/next…)
+            # freezes the server. TimeoutError is caught by the except
+            # Exception below, then queue.pop(0) moves on to the next action.
|
||||||
|
if type_ in _SERVER_SIDE_ACTION_TYPES and owning_replay is not None:
|
||||||
|
try:
|
||||||
|
if type_ == "extract_text":
|
||||||
|
await asyncio.wait_for(
|
||||||
|
loop.run_in_executor(
|
||||||
|
None,
|
||||||
|
_handle_extract_text_action,
|
||||||
|
action, owning_replay, session_id, _last_heartbeat,
|
||||||
|
),
|
||||||
|
timeout=180,
|
||||||
|
)
|
||||||
|
elif type_ == "extract_table":
|
||||||
|
await asyncio.wait_for(
|
||||||
|
loop.run_in_executor(
|
||||||
|
None,
|
||||||
|
_handle_extract_table_action,
|
||||||
|
action, owning_replay, session_id, _last_heartbeat,
|
||||||
|
),
|
||||||
|
timeout=180,
|
||||||
|
)
|
||||||
|
elif type_ == "t2a_decision":
|
||||||
|
await asyncio.wait_for(
|
||||||
|
loop.run_in_executor(
|
||||||
|
None,
|
||||||
|
_handle_t2a_decision_action,
|
||||||
|
action, owning_replay,
|
||||||
|
),
|
||||||
|
timeout=180,
|
||||||
|
)
|
||||||
|
except Exception as e:
|
||||||
|
logger.warning(f"Action serveur {type_} a levé : {e!r}")  # repr: str() of a TimeoutError is empty
|
||||||
|
queue.pop(0)
|
||||||
|
_replay_queues[session_id] = queue
|
||||||
|
continue # action suivante
|
||||||
|
|
||||||
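Review note: the branch above bounds each server-side action by wrapping `loop.run_in_executor` in `asyncio.wait_for`. The sketch below isolates that pattern under stated assumptions: `slow_ocr` and `run_bounded` are hypothetical names, and the durations are shortened for illustration. One caveat: when the timeout fires, `wait_for` stops waiting, but the worker thread keeps running until the blocking call returns.

```python
import asyncio
import time


def slow_ocr(duration_s: float) -> str:
    """Hypothetical blocking work standing in for OCR or an LLM call."""
    time.sleep(duration_s)
    return "done"


async def run_bounded(duration_s: float, timeout_s: float) -> str:
    """Run blocking work in the default executor, bounded by a timeout."""
    loop = asyncio.get_running_loop()
    try:
        return await asyncio.wait_for(
            loop.run_in_executor(None, slow_ocr, duration_s),
            timeout=timeout_s,
        )
    except asyncio.TimeoutError:
        # The await is abandoned; the thread still finishes slow_ocr on its own.
        return "timeout"


fast_result = asyncio.run(run_bounded(0.01, 1.0))
slow_result = asyncio.run(run_bounded(0.3, 0.05))
```

Because the thread is not interrupted, a truly hung call still occupies one executor worker. The timeout frees the lock and the queue, not the thread.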
|
# Conditional click: if the action has a "condition" parameter, evaluate the variable
# Format: "dec.critere1_valide" → runtime_vars["dec"]["critere1_valide"]
|
||||||
|
condition_key = (action.get("parameters") or {}).get("condition")
|
||||||
|
if condition_key and owning_replay is not None:
|
||||||
|
runtime_vars = owning_replay.get("variables") or {}
|
||||||
|
parts = condition_key.split(".", 1)
|
||||||
|
if len(parts) == 2:
|
||||||
|
val = (runtime_vars.get(parts[0]) or {}).get(parts[1])
|
||||||
|
else:
|
||||||
|
val = runtime_vars.get(parts[0])
|
||||||
|
if not val:
|
||||||
|
logger.info("Clic conditionnel ignoré (%s=%s) — action %s",
|
||||||
|
condition_key, val, action.get("action_id", "?"))
|
||||||
|
queue.pop(0)
|
||||||
|
_replay_queues[session_id] = queue
|
||||||
|
continue
|
||||||
|
|
||||||
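Review note: the conditional-click lookup above can be read as a small pure function. The sketch below mirrors the hunk's logic. The function name `evaluate_condition` and the sample variables are assumptions made for illustration.

```python
from typing import Any, Dict, Optional


def evaluate_condition(condition_key: Optional[str], runtime_vars: Dict[str, Any]) -> Any:
    """Look up "dec.critere1_valide" as runtime_vars["dec"]["critere1_valide"]."""
    if not condition_key:
        return None
    parts = condition_key.split(".", 1)
    if len(parts) == 2:
        # `or {}` covers a variable that is missing or explicitly None
        return (runtime_vars.get(parts[0]) or {}).get(parts[1])
    return runtime_vars.get(parts[0])


# Hypothetical runtime variables, shaped like the example in the comment
variables = {"dec": {"critere1_valide": True, "critere2_valide": False}, "flag": 1}
click_1 = evaluate_condition("dec.critere1_valide", variables)
click_2 = evaluate_condition("dec.critere2_valide", variables)
```

A falsy result (`False`, `None`, `0`, an empty string) skips the click, which is why a missing variable behaves the same as an explicit `False`.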
|
# Visual action: leave the loop to hand it over to Agent V1
|
||||||
|
break
|
||||||
|
|
||||||
|
# If the queue was emptied by the server-side executions, there is nothing to send
|
||||||
|
if not queue or action is None:
|
||||||
|
return {"action": None, "session_id": session_id, "machine_id": machine_id}
|
||||||
|
finally:
|
||||||
|
_replay_lock.release()
|
||||||
|
|
||||||
# ---- Pre-check écran (optionnel, non bloquant) ----
|
# ---- Pre-check écran (optionnel, non bloquant) ----
|
||||||
# Ne s'applique qu'aux actions qui ont un from_node (actions de workflow,
|
# Ne s'applique qu'aux actions qui ont un from_node (actions de workflow,
|
||||||
# pas les wait/retry auto-injectés ni les actions Copilot/Agent Libre)
|
# pas les wait/retry auto-injectés ni les actions Copilot/Agent Libre)
|
||||||
@@ -2901,7 +3150,7 @@ async def get_next_action(session_id: str, machine_id: str = "default"):
|
|||||||
auth_actions = _auth_handler.get_auth_actions(auth_request)
|
auth_actions = _auth_handler.get_auth_actions(auth_request)
|
||||||
if auth_actions:
|
if auth_actions:
|
||||||
# Injecter les actions d'auth en tête de queue (avant l'action bloquée)
|
# Injecter les actions d'auth en tête de queue (avant l'action bloquée)
|
||||||
with _replay_lock:
|
async with _async_replay_lock():
|
||||||
current_q = _replay_queues.get(session_id, [])
|
current_q = _replay_queues.get(session_id, [])
|
||||||
_replay_queues[session_id] = auth_actions + current_q
|
_replay_queues[session_id] = auth_actions + current_q
|
||||||
logger.info(
|
logger.info(
|
||||||
@@ -2910,7 +3159,7 @@ async def get_next_action(session_id: str, machine_id: str = "default"):
|
|||||||
f"type={auth_request.auth_type} (confiance={auth_request.confidence:.2f})"
|
f"type={auth_request.auth_type} (confiance={auth_request.confidence:.2f})"
|
||||||
)
|
)
|
||||||
# Retourner la première action d'auth immédiatement
|
# Retourner la première action d'auth immédiatement
|
||||||
with _replay_lock:
|
async with _async_replay_lock():
|
||||||
first_auth = _replay_queues[session_id].pop(0)
|
first_auth = _replay_queues[session_id].pop(0)
|
||||||
return {
|
return {
|
||||||
"action": first_auth,
|
"action": first_auth,
|
||||||
@@ -2958,7 +3207,7 @@ async def get_next_action(session_id: str, machine_id: str = "default"):
|
|||||||
}
|
}
|
||||||
|
|
||||||
# Pre-check OK (ou skip) : retirer l'action de la queue et l'envoyer
|
# Pre-check OK (ou skip) : retirer l'action de la queue et l'envoyer
|
||||||
with _replay_lock:
|
async with _async_replay_lock():
|
||||||
current_queue = _replay_queues.get(session_id, [])
|
current_queue = _replay_queues.get(session_id, [])
|
||||||
if current_queue and current_queue[0].get("action_id") == action.get("action_id"):
|
if current_queue and current_queue[0].get("action_id") == action.get("action_id"):
|
||||||
current_queue.pop(0)
|
current_queue.pop(0)
|
||||||
@@ -3004,6 +3253,51 @@ async def get_next_action(session_id: str, machine_id: str = "default"):
|
|||||||
f"{_precheck_sim}"
|
f"{_precheck_sim}"
|
||||||
)
|
)
|
||||||
|
|
||||||
|
# QW1: resolve the target screen and attach the info to the action
# Cascade: action.monitor_index → session.last_focused_monitor → composite_fallback
|
||||||
|
try:
|
||||||
|
session_qw1 = processor.session_manager.get_session(session_id)
|
||||||
|
last_window_info_qw1 = (
|
||||||
|
session_qw1.last_window_info if session_qw1 is not None else {}
|
||||||
|
) or {}
|
||||||
|
session_state_qw1 = {
|
||||||
|
"monitors_geometry": last_window_info_qw1.get("monitors_geometry", []),
|
||||||
|
"last_focused_monitor": last_window_info_qw1.get("monitor_index"),
|
||||||
|
}
|
||||||
|
target = resolve_target_monitor(action, session_state_qw1)
|
||||||
|
action["monitor_resolution"] = {
|
||||||
|
"idx": target.idx,
|
||||||
|
"offset_x": target.offset_x,
|
||||||
|
"offset_y": target.offset_y,
|
||||||
|
"w": target.w,
|
||||||
|
"h": target.h,
|
||||||
|
"source": target.source,
|
||||||
|
}
|
||||||
|
# QW1: emit the bus event lea:monitor_routed (no-op if the bus is unavailable)
# The streaming server has no local SocketIO: we log at INFO level in a
# readable format. A consumer (agent_chat / dashboard) can tail
# `journalctl -u rpa-streaming | grep '\[BUS\] lea:monitor_routed'`.
|
||||||
|
try:
|
||||||
|
_replay_id_bus = (
|
||||||
|
owning_replay.get("replay_id") if owning_replay else None
|
||||||
|
)
|
||||||
|
logger.info(
|
||||||
|
"[BUS] lea:monitor_routed replay=%s action=%s idx=%d source=%s "
|
||||||
|
"offset=(%d,%d) wh=(%d,%d)",
|
||||||
|
_replay_id_bus,
|
||||||
|
action.get("action_id"),
|
||||||
|
target.idx,
|
||||||
|
target.source,
|
||||||
|
target.offset_x,
|
||||||
|
target.offset_y,
|
||||||
|
target.w,
|
||||||
|
target.h,
|
||||||
|
)
|
||||||
|
except Exception as _e_bus:
|
||||||
|
logger.debug("emit lea:monitor_routed échec (non bloquant): %s", _e_bus)
|
||||||
|
except Exception as e:
|
||||||
|
logger.debug("QW1 monitor_resolution skip (%s)", e)
|
||||||
|
|
||||||
response: Dict[str, Any] = {
|
response: Dict[str, Any] = {
|
||||||
"action": action,
|
"action": action,
|
||||||
"session_id": session_id,
|
"session_id": session_id,
|
||||||
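Review note: the QW1 hunk above calls `resolve_target_monitor`, whose body is not part of this diff. The sketch below shows one way the commented cascade could work. It is an illustration, not the project's implementation: the name `resolve_monitor`, the `x`/`y`/`w`/`h` keys of each geometry entry, the `"action"` and `"last_focused"` source labels, and the bounding-box fallback are all assumptions.

```python
from dataclasses import dataclass
from typing import Any, Dict, List


@dataclass
class MonitorTarget:
    idx: int
    offset_x: int
    offset_y: int
    w: int
    h: int
    source: str


def resolve_monitor(action: Dict[str, Any], session_state: Dict[str, Any]) -> MonitorTarget:
    """Hypothetical cascade: action, then last focused monitor, then composite fallback."""
    monitors: List[Dict[str, int]] = session_state.get("monitors_geometry") or []
    candidates = [
        (action.get("monitor_index"), "action"),
        (session_state.get("last_focused_monitor"), "last_focused"),
    ]
    for idx, source in candidates:
        if isinstance(idx, int) and 0 <= idx < len(monitors):
            m = monitors[idx]
            return MonitorTarget(idx, m["x"], m["y"], m["w"], m["h"], source)
    # Fallback: bounding box of every known monitor (0x0 at the origin if none)
    left = min((m["x"] for m in monitors), default=0)
    top = min((m["y"] for m in monitors), default=0)
    right = max((m["x"] + m["w"] for m in monitors), default=0)
    bottom = max((m["y"] + m["h"] for m in monitors), default=0)
    return MonitorTarget(-1, left, top, right - left, bottom - top, "composite_fallback")


geometry = [{"x": 0, "y": 0, "w": 1920, "h": 1080}, {"x": 1920, "y": 0, "w": 2560, "h": 1600}]
state = {"monitors_geometry": geometry, "last_focused_monitor": 1}
by_action = resolve_monitor({"monitor_index": 0}, state)
by_focus = resolve_monitor({}, state)
by_fallback = resolve_monitor({}, {"monitors_geometry": geometry})
```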
@@ -3045,7 +3339,7 @@ async def report_action_result(report: ReplayResultReport):
|
|||||||
)
|
)
|
||||||
|
|
||||||
# Trouver le replay correspondant à cette session
|
# Trouver le replay correspondant à cette session
|
||||||
with _replay_lock:
|
async with _async_replay_lock():
|
||||||
replay_state = None
|
replay_state = None
|
||||||
for state in _replay_states.values():
|
for state in _replay_states.values():
|
||||||
if state["session_id"] == session_id and state["status"] == "running":
|
if state["session_id"] == session_id and state["status"] == "running":
|
||||||
@@ -3078,7 +3372,7 @@ async def report_action_result(report: ReplayResultReport):
|
|||||||
# Mettre à jour le dernier screenshot reçu
|
# Mettre à jour le dernier screenshot reçu
|
||||||
screenshot_after = report.screenshot_after or report.screenshot
|
screenshot_after = report.screenshot_after or report.screenshot
|
||||||
if screenshot_after:
|
if screenshot_after:
|
||||||
with _replay_lock:
|
async with _async_replay_lock():
|
||||||
replay_state["last_screenshot"] = screenshot_after
|
replay_state["last_screenshot"] = screenshot_after
|
||||||
|
|
||||||
# === Vérification post-action ===
|
# === Vérification post-action ===
|
||||||
@@ -3149,7 +3443,7 @@ async def report_action_result(report: ReplayResultReport):
|
|||||||
|
|
||||||
# Stocker le screenshot actuel comme "before" pour la prochaine action
|
# Stocker le screenshot actuel comme "before" pour la prochaine action
|
||||||
if screenshot_after:
|
if screenshot_after:
|
||||||
with _replay_lock:
|
async with _async_replay_lock():
|
||||||
replay_state["_last_screenshot_before"] = screenshot_after
|
replay_state["_last_screenshot_before"] = screenshot_after
|
||||||
|
|
||||||
# [REPLAY] log structuré de la décision de vérification
|
# [REPLAY] log structuré de la décision de vérification
|
||||||
@@ -3171,7 +3465,7 @@ async def report_action_result(report: ReplayResultReport):
|
|||||||
)
|
)
|
||||||
|
|
||||||
# === Enregistrer le résultat ===
|
# === Enregistrer le résultat ===
|
||||||
with _replay_lock:
|
async with _async_replay_lock():
|
||||||
result_entry = {
|
result_entry = {
|
||||||
"action_id": action_id,
|
"action_id": action_id,
|
||||||
"success": report.success,
|
"success": report.success,
|
||||||
@@ -3331,7 +3625,7 @@ async def report_action_result(report: ReplayResultReport):
|
|||||||
except Exception as _mem_exc:
|
except Exception as _mem_exc:
|
||||||
logger.debug("Memory record skipped : %s", _mem_exc)
|
logger.debug("Memory record skipped : %s", _mem_exc)
|
||||||
|
|
||||||
with _replay_lock:
|
async with _async_replay_lock():
|
||||||
# === Logique de retry / success / failure ===
|
# === Logique de retry / success / failure ===
|
||||||
if report.success and (verification is None or verification.verified):
|
if report.success and (verification is None or verification.verified):
|
||||||
# Action réussie (vérification OK ou pas de vérification)
|
# Action réussie (vérification OK ou pas de vérification)
|
||||||
@@ -3742,6 +4036,82 @@ async def report_action_result(report: ReplayResultReport):
|
|||||||
f"— worker VLM autorisé à reprendre"
|
f"— worker VLM autorisé à reprendre"
|
||||||
)
|
)
|
||||||
|
|
||||||
|
# ===================================================================
|
||||||
|
# QW2: LoopDetector, feeding the ring buffers + evaluation
|
||||||
|
# ===================================================================
|
||||||
|
# We only evaluate while the replay is still "running": there is no point
# in pausing something that is already completed/error/paused.
|
||||||
|
if replay_state["status"] == "running":
|
||||||
|
# Snapshot image (PIL) dans l'anneau
|
||||||
|
try:
|
||||||
|
from PIL import Image
|
||||||
|
ss_raw = screenshot_after or replay_state.get("last_screenshot")
|
||||||
|
img = None
|
||||||
|
if isinstance(ss_raw, str) and ss_raw:
|
||||||
|
if os.path.isfile(ss_raw):
|
||||||
|
img = Image.open(ss_raw).copy() # détache du file handle
|
||||||
|
else:
|
||||||
|
# Possible base64 — décoder
|
||||||
|
try:
|
||||||
|
import base64
|
||||||
|
import io as _io
|
||||||
|
img_bytes = base64.b64decode(ss_raw, validate=False)
|
||||||
|
img = Image.open(_io.BytesIO(img_bytes)).copy()
|
||||||
|
except Exception:
|
||||||
|
img = None
|
||||||
|
if img is not None:
|
||||||
|
replay_state.setdefault("_screenshot_history", []).append(img)
|
||||||
|
replay_state["_screenshot_history"] = replay_state["_screenshot_history"][-5:]
|
||||||
|
except Exception as e:
|
||||||
|
logger.debug("LoopDetector: snapshot historique échoué: %s", e)
|
||||||
|
|
||||||
|
# Snapshot signature de l'action courante
|
||||||
|
try:
|
||||||
|
_act_pos = report.actual_position or {}
|
||||||
|
action_sig = {
|
||||||
|
"type": (original_action or {}).get("type")
|
||||||
|
or replay_state.get("_last_action_type", ""),
|
||||||
|
"x_pct": _act_pos.get("x_pct") if isinstance(_act_pos, dict)
|
||||||
|
else (original_action or {}).get("x_pct"),
|
||||||
|
"y_pct": _act_pos.get("y_pct") if isinstance(_act_pos, dict)
|
||||||
|
else (original_action or {}).get("y_pct"),
|
||||||
|
}
|
||||||
|
replay_state.setdefault("_action_history", []).append(action_sig)
|
||||||
|
replay_state["_action_history"] = replay_state["_action_history"][-5:]
|
||||||
|
except Exception as e:
|
||||||
|
logger.debug("LoopDetector: snapshot action_sig échoué: %s", e)
|
||||||
|
|
||||||
|
# Évaluation (silencieux si rien)
|
||||||
|
try:
|
||||||
|
verdict = _get_loop_detector().evaluate(
|
||||||
|
replay_state,
|
||||||
|
screenshots=replay_state.get("_screenshot_history", []),
|
||||||
|
actions=replay_state.get("_action_history", []),
|
||||||
|
)
|
||||||
|
if verdict.detected:
|
||||||
|
replay_state["status"] = "paused_need_help"
|
||||||
|
replay_state["pause_reason"] = "loop_detected"
|
||||||
|
replay_state["pause_message"] = (
|
||||||
|
f"Léa semble bloquée — {verdict.signal} "
|
||||||
|
f"(détail: {verdict.evidence})"
|
||||||
|
)
|
||||||
|
logger.warning(
|
||||||
|
"LoopDetector: replay %s mis en pause — signal=%s evidence=%s",
|
||||||
|
replay_state["replay_id"], verdict.signal, verdict.evidence,
|
||||||
|
)
|
||||||
|
# Bus event d'observabilité (logger pattern QW1)
|
||||||
|
try:
|
||||||
|
logger.info(
|
||||||
|
"[BUS] lea:loop_detected replay=%s signal=%s evidence=%s",
|
||||||
|
replay_state["replay_id"],
|
||||||
|
verdict.signal,
|
||||||
|
verdict.evidence,
|
||||||
|
)
|
||||||
|
except Exception as _e_bus:
|
||||||
|
logger.debug("emit lea:loop_detected échec: %s", _e_bus)
|
||||||
|
except Exception as e:
|
||||||
|
logger.warning("LoopDetector: évaluation échouée (non bloquant): %s", e)
|
||||||
|
|
||||||
return {
|
return {
|
||||||
"status": "recorded",
|
"status": "recorded",
|
||||||
"action_id": action_id,
|
"action_id": action_id,
|
||||||
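Review note: the QW2 hunk above keeps the last five screenshots and action signatures by appending to a list and re-slicing with `[-5:]`. A `collections.deque` with `maxlen` gives the same ring-buffer behavior without the reassignment. The sketch below is a possible alternative, not what the diff does, and the sample entries are made up.

```python
from collections import deque

# A bounded history: once full, each append drops the oldest entry
history = deque(maxlen=5)
for step in range(8):
    history.append({"type": "click", "step": step})

steps_kept = [entry["step"] for entry in history]
```

One trade-off to weigh: a `deque` is not JSON-serializable as is, so if `replay_state` is ever dumped to JSON the history would need `list(history)` first. The list-slicing approach in the diff avoids that concern.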
@@ -3767,7 +4137,7 @@ async def register_error_callback(config: ErrorCallbackConfig):
|
|||||||
replay_id = config.replay_id
|
replay_id = config.replay_id
|
||||||
callback_url = config.callback_url
|
callback_url = config.callback_url
|
||||||
|
|
||||||
with _replay_lock:
|
async with _async_replay_lock():
|
||||||
if replay_id not in _replay_states:
|
if replay_id not in _replay_states:
|
||||||
raise HTTPException(
|
raise HTTPException(
|
||||||
status_code=404,
|
status_code=404,
|
||||||
@@ -3791,10 +4161,26 @@ async def get_replay_status(replay_id: str):
|
|||||||
Quand le replay est en pause supervisee (paused_need_help), la reponse
|
Quand le replay est en pause supervisee (paused_need_help), la reponse
|
||||||
inclut le contexte complet de l'echec : action echouee, screenshot,
|
inclut le contexte complet de l'echec : action echouee, screenshot,
|
||||||
target_spec, et message utilisateur.
|
target_spec, et message utilisateur.
|
||||||
"""
|
|
||||||
with _replay_lock:
|
|
||||||
state = _replay_states.get(replay_id)
|
|
||||||
|
|
||||||
|
Poll-friendly endpoint: lock acquisition is timeboxed to 0.5 s.
If a slow server-side action (extract_text/extract_table/t2a_decision)
holds the lock, the poll returns after at most 0.5 s with status="busy"
rather than blocking the FastAPI event loop (which would freeze every
endpoint until the lock is released). This follows commit 35b27ae49, which
had already applied this pattern to /replay/next; QW4 rewired the
frontend polling to this endpoint → same class of bug, same remedy.
|
||||||
|
"""
|
||||||
|
import asyncio
|
||||||
|
loop = asyncio.get_running_loop()  # preferred inside a coroutine
|
||||||
|
acquired = await loop.run_in_executor(None, _replay_lock.acquire, True, 0.5)
|
||||||
|
if not acquired:
|
||||||
|
return {
|
||||||
|
"replay_id": replay_id,
|
||||||
|
"status": "busy",
|
||||||
|
"message": "Serveur occupé (action en cours), réessaie dans 1s",
|
||||||
|
}
|
||||||
|
try:
|
||||||
|
state = _replay_states.get(replay_id)
|
||||||
if not state:
|
if not state:
|
||||||
raise HTTPException(
|
raise HTTPException(
|
||||||
status_code=404, detail=f"Replay '{replay_id}' non trouvé"
|
status_code=404, detail=f"Replay '{replay_id}' non trouvé"
|
||||||
@@ -3813,12 +4199,14 @@ async def get_replay_status(replay_id: str):
|
|||||||
# Le failed_action contient deja screenshot_b64 et target_spec
|
# Le failed_action contient deja screenshot_b64 et target_spec
|
||||||
|
|
||||||
return result
|
return result
|
||||||
|
finally:
|
||||||
|
_replay_lock.release()
|
||||||
|
|
||||||
|
|
||||||
@app.get("/api/v1/traces/stream/replays")
|
@app.get("/api/v1/traces/stream/replays")
|
||||||
async def list_replays():
|
async def list_replays():
|
||||||
"""Lister tous les replays (actifs, terminés, en erreur)."""
|
"""Lister tous les replays (actifs, terminés, en erreur)."""
|
||||||
with _replay_lock:
|
async with _async_replay_lock():
|
||||||
# Filtrer les champs internes (préfixés par _)
|
# Filtrer les champs internes (préfixés par _)
|
||||||
return {
|
return {
|
||||||
"replays": [
|
"replays": [
|
||||||
@@ -3828,8 +4216,16 @@ async def list_replays():
|
|||||||
}
|
}
|
||||||
|
|
||||||
|
|
||||||
|
class ReplayResumeRequest(BaseModel):
"""Optional body for /replay/resume (QW4: acknowledgement of safety_checks)."""
|
||||||
|
acknowledged_check_ids: List[str] = []
|
||||||
|
|
||||||
|
|
||||||
@app.post("/api/v1/traces/stream/replay/{replay_id}/resume")
|
@app.post("/api/v1/traces/stream/replay/{replay_id}/resume")
|
||||||
async def resume_replay(replay_id: str):
|
async def resume_replay(
|
||||||
|
replay_id: str,
|
||||||
|
payload: Optional[ReplayResumeRequest] = None,
|
||||||
|
):
|
||||||
"""Reprendre un replay en pause supervisee (paused_need_help).
|
"""Reprendre un replay en pause supervisee (paused_need_help).
|
||||||
|
|
||||||
L'utilisateur a intervenu manuellement (naviguer vers le bon ecran,
|
L'utilisateur a intervenu manuellement (naviguer vers le bon ecran,
|
||||||
@@ -3837,8 +4233,12 @@ async def resume_replay(replay_id: str):
|
|||||||
est reinjectee en tete de queue pour etre re-tentee.
|
est reinjectee en tete de queue pour etre re-tentee.
|
||||||
|
|
||||||
Si le replay n'est pas en pause, retourne une erreur 409 (conflit).
|
Si le replay n'est pas en pause, retourne une erreur 409 (conflit).
|
||||||
|
|
||||||
|
QW4: if safety_checks are attached to the pause, every check marked
`required` must appear in `acknowledged_check_ids`. Otherwise → 400
with `{"error": "required_checks_missing", "missing": [...]}`.
|
||||||
"""
|
"""
|
||||||
with _replay_lock:
|
async with _async_replay_lock():
|
||||||
state = _replay_states.get(replay_id)
|
state = _replay_states.get(replay_id)
|
||||||
|
|
||||||
if not state:
|
if not state:
|
||||||
@@ -3855,6 +4255,25 @@ async def resume_replay(replay_id: str):
|
|||||||
),
|
),
|
||||||
)
|
)
|
||||||
|
|
||||||
|
# QW4 — Vérification des safety_checks required avant reprise
|
||||||
|
safety_checks = state.get("safety_checks") or []
|
||||||
|
ack_ids = (payload.acknowledged_check_ids if payload else []) or []
|
||||||
|
if safety_checks:
|
||||||
|
required_ids = {c["id"] for c in safety_checks if c.get("required")}
|
||||||
|
ack_set = set(ack_ids)
|
||||||
|
missing = sorted(required_ids - ack_set)
|
||||||
|
if missing:
|
||||||
|
raise HTTPException(
|
||||||
|
status_code=400,
|
||||||
|
detail={"error": "required_checks_missing", "missing": missing},
|
||||||
|
)
|
||||||
|
# Audit trail
|
||||||
|
state["checks_acknowledged"] = sorted(ack_set)
|
||||||
|
logger.info(
|
||||||
|
"QW4 resume replay=%s acquittements=%d (%s)",
|
||||||
|
state.get("replay_id"), len(ack_set), sorted(ack_set),
|
||||||
|
)
|
||||||
|
|
||||||
# Recuperer l'action echouee pour la reinjecter
|
# Recuperer l'action echouee pour la reinjecter
|
||||||
failed_action = state.get("failed_action")
|
failed_action = state.get("failed_action")
|
||||||
session_id = state["session_id"]
|
session_id = state["session_id"]
|
||||||
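Review note: the QW4 gate in the hunk above is a set difference between the required check ids and the acknowledged ids. The sketch below extracts it as a pure function. The helper name `missing_required_checks` and the sample check ids are hypothetical.

```python
from typing import Any, Dict, Iterable, List


def missing_required_checks(
    safety_checks: List[Dict[str, Any]], acknowledged_ids: Iterable[str]
) -> List[str]:
    """Return the sorted ids of required checks that were not acknowledged."""
    required_ids = {c["id"] for c in safety_checks if c.get("required")}
    return sorted(required_ids - set(acknowledged_ids))


# Hypothetical checks attached to a pause
checks = [
    {"id": "patient_identity", "required": True},
    {"id": "dose_review", "required": True},
    {"id": "notes", "required": False},
]
still_missing = missing_required_checks(checks, ["dose_review"])
```

An empty result means the resume may proceed. A non-empty one maps to the 400 response with `required_checks_missing`.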
@@ -3863,9 +4282,15 @@ async def resume_replay(replay_id: str):
|
|||||||
state["status"] = "running"
|
state["status"] = "running"
|
||||||
state["failed_action"] = None
|
state["failed_action"] = None
|
||||||
state["pause_message"] = None
|
state["pause_message"] = None
|
||||||
|
# QW4 — vider safety_checks après acquittement (la pause est résolue)
|
||||||
|
state["safety_checks"] = []
|
||||||
|
state["pause_payload"] = None
|
||||||
|
state["pause_reason"] = ""
|
||||||
|
|
||||||
# Reinjecter l'action echouee en tete de queue (sera re-tentee)
|
# Reinjecter l'action echouee en tete de queue (sera re-tentee)
|
||||||
if failed_action and failed_action.get("action_id"):
|
# pause_for_human est une pause intentionnelle, pas une erreur — ne pas réinjecter
|
||||||
|
if (failed_action and failed_action.get("action_id")
|
||||||
|
and failed_action.get("reason") != "user_request"):
|
||||||
# Reconstruire l'action a partir du retry_pending ou de l'original
|
# Reconstruire l'action a partir du retry_pending ou de l'original
|
||||||
original_action_id = failed_action["action_id"]
|
original_action_id = failed_action["action_id"]
|
||||||
# Chercher l'action originale dans les retry_pending
|
# Chercher l'action originale dans les retry_pending
|
||||||
@@ -3906,6 +4331,26 @@ async def resume_replay(replay_id: str):
|
|||||||
}
|
}
|
||||||
|
|
||||||
|
|
||||||
|
@app.post("/api/v1/traces/stream/replay/{replay_id}/cancel")
|
||||||
|
async def cancel_replay(replay_id: str):
"""Cancel a replay (whatever its status) and clear its queue."""
|
||||||
|
async with _async_replay_lock():
|
||||||
|
state = _replay_states.get(replay_id)
|
||||||
|
if not state:
|
||||||
|
raise HTTPException(status_code=404, detail=f"Replay '{replay_id}' non trouvé")
|
||||||
|
session_id = state["session_id"]
|
||||||
|
state["status"] = "cancelled"
|
||||||
|
state["failed_action"] = None
|
||||||
|
state["pause_message"] = None
|
||||||
|
_replay_queues[session_id] = []
|
||||||
|
keys_to_del = [k for k, v in _retry_pending.items() if v.get("replay_id") == replay_id]
|
||||||
|
for k in keys_to_del:
|
||||||
|
_retry_pending.pop(k, None)
|
||||||
|
|
||||||
|
logger.info("Replay %s annulé manuellement", replay_id)
|
||||||
|
return {"status": "cancelled", "replay_id": replay_id, "session_id": session_id}
|
||||||
|
|
||||||
|
|
||||||
# =========================================================================
|
# =========================================================================
|
||||||
# Visual Replay — Résolution visuelle des cibles (module resolve_engine)
|
# Visual Replay — Résolution visuelle des cibles (module resolve_engine)
|
||||||
# =========================================================================
|
# =========================================================================
|
||||||
@@ -3960,6 +4405,72 @@ async def resolve_target(request: ResolveTargetRequest):
|
|||||||
logger.error(f"Décodage screenshot échoué: {e}")
|
logger.error(f"Décodage screenshot échoué: {e}")
|
||||||
return _fallback_response(request, "decode_error", str(e))
|
return _fallback_response(request, "decode_error", str(e))
|
||||||
|
|
||||||
|
# Truncated-image detection + full-screen heartbeat fallback.
# Client bug observed on 2026-05-07 (Windows PC 192.168.1.11, agent V1):
# mss.monitors[1] sometimes returns a narrow strip such as 2560x60, 2560x108,
# 600x72, possibly the Windows taskbar mistaken for a monitor,
# or a corrupted mss state. Reproducible even on a PC with a single physical
# monitor. Exact cause not isolated on the client side (cf. session_20260506_handoff_v2.md).
# Heartbeats (capturer.py, a different code path from executor.py) remain
# full screen at 2560x1600. We compensate here by replacing the truncated image
# with the latest heartbeat before the _resolve_target_sync cascade.
|
||||||
|
effective_w = request.screen_width
|
||||||
|
effective_h = request.screen_height
|
||||||
|
# Generous threshold: a modern screen is 2560x1600 or larger. Anything below
# 1200x800 is suspect: client bug where mss.monitors[1] crops to the
# taskbar (2560x60), a windowed Edge (622x856), etc.
|
||||||
|
if img.height < 800 or img.width < 1200:
|
||||||
|
logger.warning(
|
||||||
|
"[RESOLVE_TARGET] Image client tronquée %dx%d (declared %dx%d) — "
|
||||||
|
"fallback heartbeat full screen",
|
||||||
|
img.width, img.height, effective_w, effective_h,
|
||||||
|
)
|
||||||
|
# Source 1: _last_heartbeat (in memory, populated by /stream/image)
|
||||||
|
candidate_path = None
|
||||||
|
candidate_age_s = None
|
||||||
|
latest_hb = max(
|
||||||
|
(h for h in _last_heartbeat.values() if h.get("path")),
|
||||||
|
key=lambda h: h.get("timestamp", 0),
|
||||||
|
default=None,
|
||||||
|
)
|
||||||
|
if latest_hb and os.path.isfile(latest_hb["path"]):
|
||||||
|
candidate_path = latest_hb["path"]
|
||||||
|
candidate_age_s = time.time() - latest_hb.get("timestamp", time.time())
|
||||||
|
else:
|
||||||
|
# Source 2: disk scan (useful after a server restart, before
# _last_heartbeat is repopulated, or if agent V1 is not polling)
|
||||||
|
try:
|
||||||
|
import glob as _glob
|
||||||
|
pattern = "/home/dom/ai/rpa_vision_v3/data/training/live_sessions/*/bg_*/shots/heartbeat_*.png"
|
||||||
|
all_files = _glob.glob(pattern)
|
||||||
|
files = [
|
||||||
|
f for f in all_files
|
||||||
|
if "_blurred" not in f and os.path.isfile(f)
|
||||||
|
]
|
||||||
|
logger.info(
|
||||||
|
"[RESOLVE_TARGET] Scan disque : %d match glob, %d non-blurred existants",
|
||||||
|
len(all_files), len(files),
|
||||||
|
)
|
||||||
|
if files:
|
||||||
|
files.sort(key=os.path.getmtime, reverse=True)
|
||||||
|
candidate_path = files[0]
|
||||||
|
candidate_age_s = time.time() - os.path.getmtime(candidate_path)
|
||||||
|
except Exception as e:
|
||||||
|
logger.warning("[RESOLVE_TARGET] Scan disque heartbeat échoué : %s", e)
|
||||||
|
|
||||||
|
if candidate_path:
|
||||||
|
try:
|
||||||
|
img = Image.open(candidate_path)
|
||||||
|
effective_w, effective_h = img.size
|
||||||
|
logger.info(
|
||||||
|
"[RESOLVE_TARGET] Heartbeat fallback OK : %s (%dx%d, age=%.1fs)",
|
||||||
|
candidate_path, effective_w, effective_h, candidate_age_s or -1,
|
||||||
|
)
|
||||||
|
except Exception as e:
|
||||||
|
logger.warning("[RESOLVE_TARGET] Ouverture heartbeat échouée : %s", e)
|
||||||
|
else:
|
||||||
|
logger.warning("[RESOLVE_TARGET] Aucun heartbeat disponible pour fallback")
|
||||||
|
|
||||||
# Sauver temporairement pour les analyseurs (ils attendent un chemin fichier)
|
# Sauver temporairement pour les analyseurs (ils attendent un chemin fichier)
|
||||||
with tempfile.NamedTemporaryFile(suffix=".jpg", delete=False) as tmp:
|
with tempfile.NamedTemporaryFile(suffix=".jpg", delete=False) as tmp:
|
||||||
img.save(tmp, format="JPEG", quality=90)
|
img.save(tmp, format="JPEG", quality=90)
|
||||||
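Review note: the fallback in the hunk above combines two small decisions: flagging a capture as truncated, and picking the most recent heartbeat that has a file path. The sketch below isolates both. The thresholds come from the hunk, while the function names and sample heartbeat entries are assumptions.

```python
from typing import Any, Dict, Optional

MIN_WIDTH, MIN_HEIGHT = 1200, 800  # thresholds taken from the hunk above


def is_truncated(width: int, height: int) -> bool:
    """A capture is suspect as soon as either dimension is under the threshold."""
    return width < MIN_WIDTH or height < MIN_HEIGHT


def latest_heartbeat(heartbeats: Dict[str, Dict[str, Any]]) -> Optional[Dict[str, Any]]:
    """Pick the most recent heartbeat that carries a file path, or None."""
    return max(
        (h for h in heartbeats.values() if h.get("path")),
        key=lambda h: h.get("timestamp", 0),
        default=None,
    )


# Hypothetical in-memory heartbeat registry keyed by session id
heartbeats = {
    "s1": {"path": "/tmp/hb_a.png", "timestamp": 100.0},
    "s2": {"path": "/tmp/hb_b.png", "timestamp": 250.0},
    "s3": {"timestamp": 999.0},  # no path: ignored
}
newest = latest_heartbeat(heartbeats)
```

As in the diff, the selection spans every session. If several agents stream at once, restricting the candidates to the requesting session may be worth considering so that another machine's screen is never substituted.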
@@ -3975,8 +4486,8 @@ async def resolve_target(request: ResolveTargetRequest):
|
|||||||
_resolve_target_sync,
|
_resolve_target_sync,
|
||||||
tmp_path,
|
tmp_path,
|
||||||
request.target_spec,
|
request.target_spec,
|
||||||
request.screen_width,
|
effective_w,
|
||||||
request.screen_height,
|
effective_h,
|
||||||
request.fallback_x_pct,
|
request.fallback_x_pct,
|
||||||
request.fallback_y_pct,
|
request.fallback_y_pct,
|
||||||
request.strict_mode,
|
request.strict_mode,
|
||||||
@@ -3992,6 +4503,44 @@ async def resolve_target(request: ResolveTargetRequest):
|
|||||||
request.fallback_y_pct,
|
request.fallback_y_pct,
|
||||||
)
|
)
|
||||||
|
|
||||||
|
# Post-cascade semantic pre-check: OCR a region around the
# resolved coordinate to verify that the expected by_text is actually
# present there. Catches cases where the cascade returns plausible coordinates
# that point at a different element (e.g. a click on "Dossier en cours"
# in the menu instead of "Synthèse Urgences" in the tab further down).
|
||||||
|
if result and result.get("resolved"):
|
||||||
|
_by_text = (request.target_spec.get("by_text") or "").strip()
|
||||||
|
if _by_text:
|
||||||
|
from agent_v0.server_v1.resolve_engine import _validate_text_at_position
|
||||||
|
_is_valid, _observed, _ocr_ms = _validate_text_at_position(
|
||||||
|
tmp_path,
|
||||||
|
float(result.get("x_pct", 0) or 0),
|
||||||
|
float(result.get("y_pct", 0) or 0),
|
||||||
|
_by_text,
|
||||||
|
effective_w,
|
||||||
|
effective_h,
|
||||||
|
)
|
||||||
|
if not _is_valid:
|
||||||
|
logger.warning(
|
||||||
|
"[REPLAY] Pre-check OCR REJET : '%s' attendu @ (%.4f, %.4f) "
|
||||||
|
"via %s mais OCR voit '%s' (%.0fms)",
|
||||||
|
_by_text[:40],
|
||||||
|
float(result.get("x_pct", 0) or 0),
|
||||||
|
float(result.get("y_pct", 0) or 0),
|
||||||
|
result.get("method", "?"),
|
||||||
|
_observed[:80],
|
||||||
|
_ocr_ms,
|
||||||
|
)
|
||||||
|
result = {
|
||||||
|
"resolved": False,
|
||||||
|
"method": "rejected_text_mismatch",
|
||||||
|
"reason": f"expected='{_by_text[:40]}' observed='{_observed[:60]}'",
|
||||||
|
"original_method": result.get("method"),
|
||||||
|
"original_score": result.get("score"),
|
||||||
|
"x_pct": None,
|
||||||
|
"y_pct": None,
|
||||||
|
}
|
||||||
|
|
||||||
# [REPLAY] log structuré de sortie résolution (après validation)
|
# [REPLAY] log structuré de sortie résolution (après validation)
|
||||||
logger.info(
|
logger.info(
|
||||||
f"[REPLAY] RESOLVE_EXIT session={request.session_id} "
|
f"[REPLAY] RESOLVE_EXIT session={request.session_id} "
|
||||||
@@ -4007,7 +4556,8 @@ async def resolve_target(request: ResolveTargetRequest):
|
|||||||
logger.error(f"[REPLAY] RESOLVE_EXCEPTION session={request.session_id} error={e}")
|
logger.error(f"[REPLAY] RESOLVE_EXCEPTION session={request.session_id} error={e}")
|
||||||
return _fallback_response(request, "analysis_error", str(e))
|
return _fallback_response(request, "analysis_error", str(e))
|
||||||
finally:
|
finally:
|
||||||
import os
|
# `os` is already imported at the top of the file: no local re-import
# (otherwise UnboundLocalError earlier in the function).
|
||||||
try:
|
try:
|
||||||
os.unlink(tmp_path)
|
os.unlink(tmp_path)
|
||||||
except OSError:
|
except OSError:
|
||||||
|
|||||||
@@ -256,6 +256,20 @@ class LiveSessionManager:
|
|||||||
session.last_window_info["title"] = wc_title
|
session.last_window_info["title"] = wc_title
|
||||||
if wc_app:
|
if wc_app:
|
||||||
session.last_window_info["app_name"] = wc_app
|
session.last_window_info["app_name"] = wc_app
|
||||||
|
# QW1: propagate monitor_index and monitors_geometry from window_capture
|
||||||
|
if "monitor_index" in window_capture:
|
||||||
|
session.last_window_info["monitor_index"] = window_capture["monitor_index"]
|
||||||
|
if "monitors_geometry" in window_capture:
|
||||||
|
session.last_window_info["monitors_geometry"] = window_capture["monitors_geometry"]
|
||||||
|
|
||||||
|
# QW1: propagate monitor_index/monitors_geometry from the event payload
# (enriched heartbeat without window/window_title). Always
# refresh the active focus (it changes often) and the geometry
# (the user may plug in or unplug a screen).
|
||||||
|
if "monitor_index" in event_data:
|
||||||
|
session.last_window_info["monitor_index"] = event_data["monitor_index"]
|
||||||
|
if "monitors_geometry" in event_data and event_data["monitors_geometry"]:
|
||||||
|
session.last_window_info["monitors_geometry"] = event_data["monitors_geometry"]
|
||||||
|
|
||||||
# Accumuler les titres/apps pour le nommage automatique
|
# Accumuler les titres/apps pour le nommage automatique
|
||||||
title = session.last_window_info.get("title", "").strip()
|
title = session.last_window_info.get("title", "").strip()
|
||||||
|
|||||||
154
agent_v0/server_v1/loop_detector.py
Normal file
@@ -0,0 +1,154 @@
|
|||||||
|
# agent_v0/server_v1/loop_detector.py
|
||||||
|
"""Composite LoopDetector: detects when Léa stagnates during a replay (QW2).

Three independent signals:
- screen_static   : N consecutive captures whose CLIP similarity is > threshold
- action_repeat   : N consecutive identical actions (type + coords)
- retry_threshold : cumulative retry count >= threshold

A single positive signal yields verdict.detected=True. The server then switches
the replay to paused_need_help with an explicit pause_reason.

Can be disabled with the env var RPA_LOOP_DETECTOR_ENABLED=0.
|
||||||
|
"""
|
||||||
|
|
||||||
|
import logging
|
||||||
|
import os
|
||||||
|
from dataclasses import dataclass, field
|
||||||
|
from typing import Any, Dict, List, Optional
|
||||||
|
|
||||||
|
logger = logging.getLogger(__name__)
|
||||||
|
|
||||||
|
|
||||||
|
@dataclass
|
||||||
|
class LoopVerdict:
|
||||||
|
detected: bool = False
|
||||||
|
reason: str = ""
|
||||||
|
signal: str = "" # "screen_static" | "action_repeat" | "retry_threshold" | ""
|
||||||
|
evidence: Dict[str, Any] = field(default_factory=dict)
|
||||||
|
|
||||||
|
|
||||||
|
def _env_int(name: str, default: int) -> int:
|
||||||
|
try:
|
||||||
|
return int(os.environ.get(name, default))
|
||||||
|
except (TypeError, ValueError):
|
||||||
|
return default
|
||||||
|
|
||||||
|
|
||||||
|
def _env_float(name: str, default: float) -> float:
|
||||||
|
try:
|
||||||
|
return float(os.environ.get(name, default))
|
||||||
|
except (TypeError, ValueError):
|
||||||
|
return default
|
||||||
|
|
||||||
|
|
||||||
|
def _env_bool_enabled(name: str) -> bool:
|
||||||
|
val = os.environ.get(name, "1").strip().lower()
|
||||||
|
return val not in ("0", "false", "no", "off", "")
|
||||||
|
|
||||||
|
|
||||||
|
def _cosine_similarity(a, b) -> float:
|
||||||
|
"""Cosine similarity between two vectors (lists or np.array). Robust to zero vectors."""
|
||||||
|
import numpy as np
|
||||||
|
av = np.asarray(a, dtype=np.float32).flatten()
|
||||||
|
bv = np.asarray(b, dtype=np.float32).flatten()
|
||||||
|
na, nb = float(np.linalg.norm(av)), float(np.linalg.norm(bv))
|
||||||
|
if na < 1e-8 or nb < 1e-8:
|
||||||
|
return 0.0
|
||||||
|
return float(np.dot(av, bv) / (na * nb))
|
||||||
|
|
||||||
|
|
||||||
|
class LoopDetector:
|
||||||
|
def __init__(self, clip_embedder=None):
|
||||||
|
self.clip_embedder = clip_embedder
|
||||||
|
|
||||||
|
def evaluate(
|
||||||
|
self,
|
||||||
|
state: Dict[str, Any],
|
||||||
|
screenshots: List[Any],
|
||||||
|
actions: List[Dict[str, Any]],
|
||||||
|
) -> LoopVerdict:
|
||||||
|
"""Evaluate the 3 signals and return the first one that fires.

Args:
state: replay_state (used for retried_actions)
screenshots: ring buffer of the last N captures (embedded with CLIP on each call)
actions: ring buffer of the last N executed actions
|
||||||
|
"""
|
||||||
|
if not _env_bool_enabled("RPA_LOOP_DETECTOR_ENABLED"):
|
||||||
|
return LoopVerdict(detected=False)
|
||||||
|
|
||||||
|
# Signal A : screen_static
|
||||||
|
verdict = self._check_screen_static(screenshots)
|
||||||
|
if verdict.detected:
|
||||||
|
return verdict
|
||||||
|
|
||||||
|
# Signal B : action_repeat
|
||||||
|
verdict = self._check_action_repeat(actions)
|
||||||
|
if verdict.detected:
|
||||||
|
return verdict
|
||||||
|
|
||||||
|
# Signal C : retry_threshold
|
||||||
|
verdict = self._check_retry_threshold(state)
|
||||||
|
if verdict.detected:
|
||||||
|
return verdict
|
||||||
|
|
||||||
|
return LoopVerdict(detected=False)
|
||||||
|
|
||||||
|
def _check_screen_static(self, screenshots: List[Any]) -> LoopVerdict:
|
||||||
|
n_required = _env_int("RPA_LOOP_SCREEN_STATIC_N", 4)
|
||||||
|
threshold = _env_float("RPA_LOOP_SCREEN_STATIC_THRESHOLD", 0.99)
|
||||||
|
|
||||||
|
if self.clip_embedder is None or len(screenshots) < n_required:
|
||||||
|
return LoopVerdict()
|
||||||
|
|
||||||
|
try:
|
||||||
|
recent = screenshots[-n_required:]
|
||||||
|
# Embed each capture with the CLIP embedder (may raise)
|
||||||
|
embeddings = [self.clip_embedder.embed_image(img) for img in recent]
|
||||||
|
sims = [_cosine_similarity(embeddings[i], embeddings[i + 1])
|
||||||
|
for i in range(len(embeddings) - 1)]
|
||||||
|
min_sim = min(sims)
|
||||||
|
if min_sim > threshold:
|
||||||
|
return LoopVerdict(
|
||||||
|
detected=True,
|
||||||
|
reason="loop_detected",
|
||||||
|
signal="screen_static",
|
||||||
|
evidence={"min_similarity": round(min_sim, 4),
|
||||||
|
"n_captures": n_required,
|
||||||
|
"threshold": threshold},
|
||||||
|
)
|
||||||
|
except Exception as e:
|
||||||
|
logger.warning("LoopDetector signal_A erreur (%s) — signal inerte ce tick", e)
|
||||||
|
return LoopVerdict()
|
||||||
|
|
||||||
|
def _check_action_repeat(self, actions: List[Dict[str, Any]]) -> LoopVerdict:
|
||||||
|
n_required = _env_int("RPA_LOOP_ACTION_REPEAT_N", 3)
|
||||||
|
if len(actions) < n_required:
|
||||||
|
return LoopVerdict()
|
||||||
|
recent = actions[-n_required:]
|
||||||
|
|
||||||
|
def _signature(a: Dict[str, Any]) -> tuple:
|
||||||
|
return (a.get("type"), a.get("x_pct"), a.get("y_pct"))
|
||||||
|
|
||||||
|
sigs = [_signature(a) for a in recent]
|
||||||
|
if all(s == sigs[0] for s in sigs):
|
||||||
|
return LoopVerdict(
|
||||||
|
detected=True,
|
||||||
|
reason="loop_detected",
|
||||||
|
signal="action_repeat",
|
||||||
|
evidence={"signature": sigs[0], "count": n_required},
|
||||||
|
)
|
||||||
|
return LoopVerdict()
|
||||||
|
|
||||||
|
def _check_retry_threshold(self, state: Dict[str, Any]) -> LoopVerdict:
|
||||||
|
threshold = _env_int("RPA_LOOP_RETRY_THRESHOLD", 3)
|
||||||
|
retried = int(state.get("retried_actions", 0))
|
||||||
|
if retried >= threshold:
|
||||||
|
return LoopVerdict(
|
||||||
|
detected=True,
|
||||||
|
reason="loop_detected",
|
||||||
|
signal="retry_threshold",
|
||||||
|
evidence={"retried_actions": retried, "threshold": threshold},
|
||||||
|
)
|
||||||
|
return LoopVerdict()
|
||||||
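The two cheapest signals described in `loop_detector.py` can be tried in isolation. The sketch below is a simplified stand-in, not the module itself: it uses plain Python lists instead of CLIP embeddings, and the helper names (`cosine`, `screen_is_static`, `actions_repeat`) are hypothetical. The default values mirror the ones read from the environment in the diff (4 captures at 0.99, 3 repeated actions).

```python
import math
from typing import Any, Dict, List, Sequence


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity; returns 0.0 when either vector is (near) zero."""
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    if na < 1e-8 or nb < 1e-8:
        return 0.0
    return sum(x * y for x, y in zip(a, b)) / (na * nb)


def screen_is_static(embeddings: List[Sequence[float]], n: int = 4,
                     threshold: float = 0.99) -> bool:
    """True when every consecutive pair among the last n embeddings exceeds threshold."""
    if n < 2 or len(embeddings) < n:
        return False
    recent = embeddings[-n:]
    sims = [cosine(recent[i], recent[i + 1]) for i in range(n - 1)]
    return min(sims) > threshold


def actions_repeat(actions: List[Dict[str, Any]], n: int = 3) -> bool:
    """True when the last n actions share the same (type, x_pct, y_pct) signature."""
    if n < 1 or len(actions) < n:
        return False
    sigs = [(a.get("type"), a.get("x_pct"), a.get("y_pct")) for a in actions[-n:]]
    return all(s == sigs[0] for s in sigs)


# Toy data (made up for illustration): a frozen screen and a repeated click.
frozen = [[1.0, 0.0, 0.5]] * 4
moving = [[1.0, 0.0], [0.0, 1.0], [1.0, 0.0], [0.0, 1.0]]
clicks = [{"type": "click", "x_pct": 0.5, "y_pct": 0.2}] * 3
```

Unlike the module, this sketch rejects `n < 2` up front instead of relying on an exception handler when the similarity list is empty.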
99
agent_v0/server_v1/monitor_router.py
Normal file
@@ -0,0 +1,99 @@
|
|||||||
|
# agent_v0/server_v1/monitor_router.py
|
||||||
|
"""MonitorRouter: resolves the target monitor for replay (QW1).

Cascading strategy:
1. action.monitor_index (inherited from the source session) → target that monitor
2. session.last_focused_monitor (active focus seen in the last heartbeat) → fallback
3. composite (offset 0, 0) → backward compatibility

Emits the monitor_routed event on the lea:* bus with the source of the decision.
|
||||||
|
"""
|
||||||
|
|
||||||
|
import logging
|
||||||
|
from dataclasses import dataclass
|
||||||
|
from typing import Any, Dict, List, Optional
|
||||||
|
|
||||||
|
logger = logging.getLogger(__name__)
|
||||||
|
|
||||||
|
|
||||||
|
@dataclass
|
||||||
|
class MonitorTarget:
|
||||||
|
"""Resolved target monitor for a replay action."""
|
||||||
|
idx: int
|
||||||
|
offset_x: int
|
||||||
|
offset_y: int
|
||||||
|
w: int
|
||||||
|
h: int
|
||||||
|
source: str # "action" | "focus" | "composite_fallback"
|
||||||
|
|
||||||
|
|
||||||
|
_COMPOSITE_FALLBACK = MonitorTarget(
|
||||||
|
idx=-1,
|
||||||
|
offset_x=0,
|
||||||
|
offset_y=0,
|
||||||
|
w=0,
|
||||||
|
h=0,
|
||||||
|
source="composite_fallback",
|
||||||
|
)
|
||||||
|
|
||||||
|
|
||||||
|
def _find_monitor(geometry: List[Dict[str, Any]], idx: int) -> Optional[Dict[str, Any]]:
|
||||||
|
"""Return the monitor with the given index, or None if absent."""
|
||||||
|
for m in geometry:
|
||||||
|
if m.get("idx") == idx:
|
||||||
|
return m
|
||||||
|
return None
|
||||||
|
|
||||||
|
|
||||||
|
def _to_target(monitor: Dict[str, Any], source: str) -> MonitorTarget:
|
||||||
|
return MonitorTarget(
|
||||||
|
idx=int(monitor["idx"]),
|
||||||
|
offset_x=int(monitor.get("x", 0)),
|
||||||
|
offset_y=int(monitor.get("y", 0)),
|
||||||
|
w=int(monitor.get("w", 0)),
|
||||||
|
h=int(monitor.get("h", 0)),
|
||||||
|
source=source,
|
||||||
|
)
|
||||||
|
|
||||||
|
|
||||||
|
def resolve_target_monitor(
|
||||||
|
action: Dict[str, Any],
|
||||||
|
session_state: Dict[str, Any],
|
||||||
|
) -> MonitorTarget:
|
||||||
|
"""Resolve the target monitor for a replay action.

Args:
action: Action dict (may contain `monitor_index`).
session_state: Session state (expected to contain `monitors_geometry`
and `last_focused_monitor`).

Returns:
MonitorTarget with the offset to apply to grounding coordinates.
|
||||||
|
"""
|
||||||
|
geometry: List[Dict[str, Any]] = session_state.get("monitors_geometry") or []
|
||||||
|
|
||||||
|
# 1. Explicit target from the action
|
||||||
|
explicit_idx = action.get("monitor_index")
|
||||||
|
if explicit_idx is not None and geometry:
|
||||||
|
m = _find_monitor(geometry, int(explicit_idx))
|
||||||
|
if m is not None:
|
||||||
|
return _to_target(m, source="action")
|
||||||
|
# Invalid index: fall through to the focus fallback
|
||||||
|
logger.warning(
|
||||||
|
"[BUS] lea:monitor_invalid_index requested=%d available_idx=%s",
|
||||||
|
int(explicit_idx), [g.get("idx") for g in geometry],
|
||||||
|
)
|
||||||
|
|
||||||
|
# 2. Fallback: active focus
|
||||||
|
focused_idx = session_state.get("last_focused_monitor")
|
||||||
|
if focused_idx is not None and geometry:
|
||||||
|
m = _find_monitor(geometry, int(focused_idx))
|
||||||
|
if m is not None:
|
||||||
|
return _to_target(m, source="focus")
|
||||||
|
logger.warning(
|
||||||
|
"[BUS] lea:monitor_unavailable focused_idx=%d available_idx=%s",
|
||||||
|
int(focused_idx), [g.get("idx") for g in geometry],
|
||||||
|
)
|
||||||
|
|
||||||
|
# 3. Composite fallback (backward compat: current mss.monitors[0] behavior)
|
||||||
|
return _COMPOSITE_FALLBACK
|
||||||
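The cascade in `monitor_router.py` (explicit action index, then last focused monitor, then composite) can be illustrated with a compact stand-in. This is a sketch under assumptions: `pick_monitor` is a hypothetical helper that returns a plain tuple rather than a `MonitorTarget`, and the two-monitor geometry is invented for the example.

```python
from typing import Any, Dict, Tuple

Target = Tuple[int, int, int, str]  # (idx, offset_x, offset_y, source)


def pick_monitor(action: Dict[str, Any], session_state: Dict[str, Any]) -> Target:
    """Resolve the target monitor: action index, then focus, then composite."""
    geometry = session_state.get("monitors_geometry") or []
    by_idx = {m.get("idx"): m for m in geometry}
    candidates = (
        (action.get("monitor_index"), "action"),
        (session_state.get("last_focused_monitor"), "focus"),
    )
    for idx, source in candidates:
        if idx is not None and idx in by_idx:
            m = by_idx[idx]
            return (idx, int(m.get("x", 0)), int(m.get("y", 0)), source)
    # Nothing usable: same values as the composite fallback in the diff.
    return (-1, 0, 0, "composite_fallback")


# Invented layout: a 1920x1080 primary with a 2560x1440 monitor to its right.
state = {
    "monitors_geometry": [
        {"idx": 0, "x": 0, "y": 0, "w": 1920, "h": 1080},
        {"idx": 1, "x": 1920, "y": 0, "w": 2560, "h": 1440},
    ],
    "last_focused_monitor": 0,
}
```

An action recorded on a monitor that is no longer plugged in (for example index 7) falls through to the focused monitor, and a session with no geometry at all ends on the composite fallback.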
@@ -33,7 +33,15 @@ _ALLOWED_ACTION_TYPES = {
|
|||||||
"file_open", "file_save", "file_close", "file_new", "file_dialog",
|
"file_open", "file_save", "file_close", "file_new", "file_dialog",
|
||||||
"double_click", "right_click", "drag",
|
"double_click", "right_click", "drag",
|
||||||
"verify_screen", # Hybrid replay: visual verification between groups
|
"verify_screen", # Hybrid replay: visual verification between groups
|
||||||
|
"pause_for_human", # Explicit supervised pause (intercepted by /replay/next)
|
||||||
|
"extract_text", # Server-side OCR on the last heartbeat → workflow variable
|
||||||
|
"t2a_decision", # LLM analysis of T2A billing → workflow variable
|
||||||
}
|
}
|
||||||
|
|
||||||
|
# Action types executed ON THE SERVER SIDE (never sent to Agent V1).
# The /replay/next pipeline handles them in an internal loop and moves on to
# the next action until it finds a visual action (to send to the client).
|
||||||
|
_SERVER_SIDE_ACTION_TYPES = {"extract_text", "t2a_decision"}
|
||||||
_MAX_ACTION_TEXT_LENGTH = 10000
|
_MAX_ACTION_TEXT_LENGTH = 10000
|
||||||
_MAX_KEYS_PER_COMBO = 10
|
_MAX_KEYS_PER_COMBO = 10
|
||||||
# Keys allowed in key_combo (modifiers + special keys + single characters)
|
# Keys allowed in key_combo (modifiers + special keys + single characters)
|
||||||
@@ -852,6 +860,30 @@ def _edge_to_normalized_actions(edge, params: Dict[str, Any]) -> List[Dict[str,
|
|||||||
keys = [action_params["key"]]
|
keys = [action_params["key"]]
|
||||||
normalized["keys"] = keys
|
normalized["keys"] = keys
|
||||||
|
|
||||||
|
elif action_type == "pause_for_human":
|
||||||
|
normalized["type"] = "pause_for_human"
|
||||||
|
normalized["parameters"] = {
|
||||||
|
"message": action_params.get("message", "Validation requise"),
|
||||||
|
}
|
||||||
|
return [normalized] # no target/coords for this logical action
|
||||||
|
|
||||||
|
elif action_type == "extract_text":
|
||||||
|
normalized["type"] = "extract_text"
|
||||||
|
normalized["parameters"] = {
|
||||||
|
"output_var": action_params.get("output_var", "extracted_text"),
|
||||||
|
"paragraph": bool(action_params.get("paragraph", True)),
|
||||||
|
}
|
||||||
|
return [normalized]
|
||||||
|
|
||||||
|
elif action_type == "t2a_decision":
|
||||||
|
normalized["type"] = "t2a_decision"
|
||||||
|
normalized["parameters"] = {
|
||||||
|
"input_template": action_params.get("input_template", ""),
|
||||||
|
"output_var": action_params.get("output_var", "t2a_result"),
|
||||||
|
"model": action_params.get("model"),
|
||||||
|
}
|
||||||
|
return [normalized]
|
||||||
|
|
||||||
else:
|
else:
|
||||||
logger.warning(f"Type d'action inconnu : {action_type}")
|
logger.warning(f"Type d'action inconnu : {action_type}")
|
||||||
return []
|
return []
|
||||||
@@ -886,6 +918,143 @@ def _substitute_variables(text: str, params: Dict[str, Any], defaults: Dict[str,
|
|||||||
return re.sub(r'\$\{(\w+)\}', replacer, text)
|
return re.sub(r'\$\{(\w+)\}', replacer, text)
|
||||||
|
|
||||||
|
|
||||||
|
# Regex for runtime templating: {{var}}, {{var.field}} or {{var.field.sub}}
|
||||||
|
_RUNTIME_VAR_PATTERN = re.compile(r'\{\{\s*(\w+)(?:\.([\w.]+))?\s*\}\}')
|
||||||
|
|
||||||
|
|
||||||
|
def _resolve_runtime_vars_in_str(text: str, variables: Dict[str, Any]) -> str:
|
||||||
|
"""Replace {{var}} and {{var.field}} with their values from the variables dict.

Missing variables/fields are left as-is (does not break the pipeline).
Non-str values (dict, list) are converted with str().
|
||||||
|
"""
|
||||||
|
def replacer(match):
|
||||||
|
var_name = match.group(1)
|
||||||
|
path = match.group(2)
|
||||||
|
if var_name not in variables:
|
||||||
|
return match.group(0)
|
||||||
|
value = variables[var_name]
|
||||||
|
if path:
|
||||||
|
for field in path.split('.'):
|
||||||
|
if isinstance(value, dict) and field in value:
|
||||||
|
value = value[field]
|
||||||
|
else:
|
||||||
|
return match.group(0)
|
||||||
|
return str(value)
|
||||||
|
|
||||||
|
return _RUNTIME_VAR_PATTERN.sub(replacer, text)
|
||||||
|
|
||||||
|
|
||||||
|
def _resolve_runtime_vars(value: Any, variables: Dict[str, Any]) -> Any:
|
||||||
|
"""Recursively resolve {{var}} and {{var.field}} in a value.

Supports str, dict, list. Other types are returned unchanged.
If variables is empty or None, value is returned unchanged.
|
||||||
|
"""
|
||||||
|
if not variables:
|
||||||
|
return value
|
||||||
|
if isinstance(value, str):
|
||||||
|
return _resolve_runtime_vars_in_str(value, variables)
|
||||||
|
if isinstance(value, dict):
|
||||||
|
return {k: _resolve_runtime_vars(v, variables) for k, v in value.items()}
|
||||||
|
if isinstance(value, list):
|
||||||
|
return [_resolve_runtime_vars(item, variables) for item in value]
|
||||||
|
return value
|
||||||
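The runtime templating rules above (replace `{{var}}` and `{{var.field}}`, leave unknown placeholders untouched) can be exercised with a small self-contained version. This is a sketch that reuses the regex from the diff; the function name `render` and the sample variables are assumptions made for the example.

```python
import re
from typing import Any, Dict

# Same pattern as _RUNTIME_VAR_PATTERN in the diff.
PATTERN = re.compile(r"\{\{\s*(\w+)(?:\.([\w.]+))?\s*\}\}")


def render(text: str, variables: Dict[str, Any]) -> str:
    """Substitute {{var}} / {{var.a.b}}; unresolved placeholders are kept verbatim."""
    def repl(match: "re.Match[str]") -> str:
        name, path = match.group(1), match.group(2)
        if name not in variables:
            return match.group(0)
        value = variables[name]
        for part in (path.split(".") if path else []):
            if isinstance(value, dict) and part in value:
                value = value[part]
            else:
                return match.group(0)
        return str(value)

    return PATTERN.sub(repl, text)


# Sample variables, shaped like what extract_text and t2a_decision store.
variables = {
    "extracted_text": "abc",
    "t2a_result": {"decision": "INDETERMINE", "confiance": "faible"},
}
```

Leaving unresolved placeholders in place, rather than substituting an empty string, keeps a missing variable visible in the text that reaches the next action.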
|
|
||||||
|
|
||||||
|
# =========================================================================
|
||||||
|
# Handlers for actions executed on the server side (extract_text, t2a_decision)
|
||||||
|
# =========================================================================
|
||||||
|
|
||||||
|
def _handle_extract_text_action(
|
||||||
|
action: Dict[str, Any],
|
||||||
|
replay_state: Dict[str, Any],
|
||||||
|
session_id: str,
|
||||||
|
last_heartbeat: Dict[str, Dict[str, Any]],
|
||||||
|
) -> bool:
|
||||||
|
"""Handle an extract_text action on the server side. Stores the OCR text in
replay_state["variables"][output_var]. Returns True on success.

Robust to failures: with no heartbeat or a failed OCR, stores "" and returns
False (the pipeline continues without blocking).
|
||||||
|
"""
|
||||||
|
params = action.get("parameters") or {}
|
||||||
|
output_var = (params.get("output_var") or "extracted_text").strip()
|
||||||
|
paragraph = bool(params.get("paragraph", True))
|
||||||
|
|
||||||
|
heartbeat = last_heartbeat.get(session_id) or {}
|
||||||
|
path = heartbeat.get("path")
|
||||||
|
text = ""
|
||||||
|
|
||||||
|
if path:
|
||||||
|
try:
|
||||||
|
from core.llm import extract_text_from_image
|
||||||
|
text = extract_text_from_image(path, paragraph=paragraph)
|
||||||
|
except Exception as e:
|
||||||
|
logger.warning("extract_text OCR échoué (%s) — variable '%s' = ''", e, output_var)
|
||||||
|
else:
|
||||||
|
logger.warning(
|
||||||
|
"extract_text : pas de heartbeat pour session %s — variable '%s' = ''",
|
||||||
|
session_id, output_var,
|
||||||
|
)
|
||||||
|
|
||||||
|
replay_state.setdefault("variables", {})[output_var] = text
|
||||||
|
logger.info(
|
||||||
|
"extract_text → variable '%s' (%d chars) replay %s",
|
||||||
|
output_var, len(text), replay_state.get("replay_id", "?"),
|
||||||
|
)
|
||||||
|
return bool(text)
|
||||||
|
|
||||||
|
|
||||||
|
def _handle_t2a_decision_action(
|
||||||
|
action: Dict[str, Any],
|
||||||
|
replay_state: Dict[str, Any],
|
||||||
|
) -> bool:
|
||||||
|
"""Handle a t2a_decision action on the server side. Stores the JSON result
in replay_state["variables"][output_var]. Returns True on success.

The DPI to analyze comes from action.parameters.input_template (already resolved
by _resolve_runtime_vars, so the {{var}} placeholders are filled in).
|
||||||
|
"""
|
||||||
|
params = action.get("parameters") or {}
|
||||||
|
output_var = (params.get("output_var") or "t2a_result").strip()
|
||||||
|
dpi_text = (params.get("input_template") or params.get("dpi") or "").strip()
|
||||||
|
model = params.get("model") or None # None → DEFAULT_MODEL
|
||||||
|
|
||||||
|
if not dpi_text:
|
||||||
|
logger.warning(
|
||||||
|
"t2a_decision : input vide — variable '%s' = {decision: 'INDETERMINE'}", output_var,
|
||||||
|
)
|
||||||
|
replay_state.setdefault("variables", {})[output_var] = {
|
||||||
|
"decision": "INDETERMINE",
|
||||||
|
"justification": "DPI vide ou non extrait",
|
||||||
|
"confiance": "faible",
|
||||||
|
"_error": "empty_input",
|
||||||
|
}
|
||||||
|
return False
|
||||||
|
|
||||||
|
try:
|
||||||
|
from core.llm import analyze_dpi, DEFAULT_MODEL
|
||||||
|
result = analyze_dpi(dpi_text, model=model or DEFAULT_MODEL)
|
||||||
|
except Exception as e:
|
||||||
|
logger.warning("t2a_decision : analyze_dpi exception %s", e)
|
||||||
|
result = {
|
||||||
|
"decision": "INDETERMINE",
|
||||||
|
"justification": f"Erreur analyse : {e}",
|
||||||
|
"confiance": "faible",
|
||||||
|
"_error": str(e),
|
||||||
|
}
|
||||||
|
|
||||||
|
replay_state.setdefault("variables", {})[output_var] = result
|
||||||
|
decision = result.get("decision", "?")
|
||||||
|
elapsed = result.get("_elapsed_s", "?")
|
||||||
|
logger.info(
|
||||||
|
"t2a_decision → variable '%s' decision=%s (%ss) replay %s",
|
||||||
|
output_var, decision, elapsed, replay_state.get("replay_id", "?"),
|
||||||
|
)
|
||||||
|
return "_error" not in result
|
||||||
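The comments on `_SERVER_SIDE_ACTION_TYPES` say that `/replay/next` handles these actions in an internal loop until it reaches an action meant for the client. That loop is not part of this diff, so the sketch below is only one plausible shape for it: `next_client_action`, the handler signature, and the fake OCR handler are all hypothetical.

```python
from typing import Any, Callable, Dict, List, Optional

SERVER_SIDE_ACTION_TYPES = {"extract_text", "t2a_decision"}

Handler = Callable[[Dict[str, Any], Dict[str, Any]], bool]


def next_client_action(
    queue: List[Dict[str, Any]],
    replay_state: Dict[str, Any],
    handlers: Dict[str, Handler],
) -> Optional[Dict[str, Any]]:
    """Run server-side actions in place; return the first action for the client."""
    while queue:
        action = queue.pop(0)
        if action.get("type") in SERVER_SIDE_ACTION_TYPES:
            # The boolean result is ignored: a failed handler does not block the replay.
            handlers[action["type"]](action, replay_state)
            continue
        return action
    return None


def fake_extract_text(action: Dict[str, Any], state: Dict[str, Any]) -> bool:
    """Stand-in for the OCR handler: stores a fixed string in the output variable."""
    var = action["parameters"]["output_var"]
    state.setdefault("variables", {})[var] = "dummy OCR text"
    return True


state: Dict[str, Any] = {}
queue = [
    {"type": "extract_text", "parameters": {"output_var": "dpi"}},
    {"type": "click", "x_pct": 0.4, "y_pct": 0.6},
]
sent = next_client_action(queue, state, {"extract_text": fake_extract_text})
```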
|
|
||||||
|
|
||||||
def _expand_compound_steps(
|
def _expand_compound_steps(
|
||||||
steps: List[Dict[str, Any]], base: Dict[str, Any], params: Dict[str, Any]
|
steps: List[Dict[str, Any]], base: Dict[str, Any], params: Dict[str, Any]
|
||||||
) -> List[Dict[str, Any]]:
|
) -> List[Dict[str, Any]]:
|
||||||
@@ -1208,6 +1377,18 @@ def _create_replay_state(
|
|||||||
# Fields for the supervised pause (target_not_found)
|
# Fields for the supervised pause (target_not_found)
|
||||||
"failed_action": None, # Context of the failed action (when paused_need_help)
|
"failed_action": None, # Context of the failed action (when paused_need_help)
|
||||||
"pause_message": None, # Message to display to the user
|
"pause_message": None, # Message to display to the user
|
||||||
|
# Runtime variables produced during the workflow (extract_text,
# t2a_decision, etc.). Resolved through {{var}} or {{var.field}} templating
# in the parameters of subsequent actions.
"variables": {},
# QW2: history ring buffers for LoopDetector (last 5 at most)
"_screenshot_history": [], # PIL images of the last N heartbeats (LoopDetector embeds them on each tick)
"_action_history": [], # last N executed actions (signature)
# QW4: safety checks (hybrid declarative + contextual LLM) and acknowledgement audit
"safety_checks": [], # list produced by SafetyChecksProvider
"checks_acknowledged": [], # ids acknowledged via /replay/resume (audit trail)
"pause_reason": "", # "loop_detected" | "" for V1
"pause_payload": None, # full payload for debug/audit
|
||||||
}
|
}
|
||||||
|
|
||||||
|
|
||||||
|
|||||||
@@ -1746,6 +1746,49 @@ def _resolve_target_sync(
|
|||||||
)
|
)
|
||||||
return result
|
return result
|
||||||
|
|
||||||
|
# ---------------------------------------------------------------
|
||||||
|
# Step 0.5: direct OCR (hybrid_text_direct), the fast path
# ---------------------------------------------------------------
# If the target text is non-empty, locate it by direct OCR before
# falling back to the VLM (~100-300 ms vs 2-23 s per VLM call).
# Reconnected on 2026-05-06: the _resolve_by_ocr_text function
# already existed but was called ONLY from the V4 runtime
# (precompiled resolve_order), which is not wired to the frontend
# (cf. project-quality-guardian audit, Case #5). The legacy cascade
# fell straight through to VLM Quick Find, hence replays at 23 s
# per visual action instead of the expected <500 ms.
# The method is relabeled "hybrid_text_direct" (threshold 0.80 in
# _RESOLUTION_MIN_SCORES; historical identifier on the Agent V1
# client side and in the Learning logs).
|
||||||
|
if by_text_strict:
|
||||||
|
ocr_result = _resolve_by_ocr_text(
|
||||||
|
screenshot_path=screenshot_path,
|
||||||
|
target_text=by_text_strict,
|
||||||
|
screen_width=screen_width,
|
||||||
|
screen_height=screen_height,
|
||||||
|
)
|
||||||
|
if ocr_result and ocr_result.get("score", 0) >= 0.80:
|
||||||
|
ocr_result["method"] = "hybrid_text_direct"
|
||||||
|
logger.info(
|
||||||
|
"Strict resolve OCR-DIRECT : OK '%s' → (%.4f, %.4f) score=%.2f",
|
||||||
|
by_text_strict[:40],
|
||||||
|
ocr_result.get("x_pct", 0),
|
||||||
|
ocr_result.get("y_pct", 0),
|
||||||
|
ocr_result.get("score", 0),
|
||||||
|
)
|
||||||
|
return ocr_result
|
||||||
|
elif ocr_result:
|
||||||
|
logger.info(
|
||||||
|
"Strict resolve OCR-DIRECT : '%s' trouvé score=%.2f < 0.80, passage VLM",
|
||||||
|
by_text_strict[:40],
|
||||||
|
ocr_result.get("score", 0),
|
||||||
|
)
|
||||||
|
else:
|
||||||
|
logger.info(
|
||||||
|
"Strict resolve OCR-DIRECT : '%s' non trouvé, passage VLM",
|
||||||
|
by_text_strict[:40],
|
||||||
|
)
|
||||||
|
|
||||||
# ---------------------------------------------------------------
|
# ---------------------------------------------------------------
|
||||||
# Step 1: VLM Quick Find (fallback, multi-image)
|
# Step 1: VLM Quick Find (fallback, multi-image)
|
||||||
# ---------------------------------------------------------------
|
# ---------------------------------------------------------------
|
||||||
@@ -2117,6 +2160,135 @@ _RESOLUTION_MIN_SCORES: Dict[str, float] = {
|
|||||||
_RESOLUTION_MAX_DRIFT: float = 0.20
|
_RESOLUTION_MAX_DRIFT: float = 0.20
|
||||||
|
|
||||||
|
|
||||||
|
# ===========================================================================
|
||||||
|
# Semantic pre-check: OCR validation of the position
# ===========================================================================
# Before dispatching a click, check that the expected text (by_text) is
# actually present in an OCR window around the resolved coordinate. This
# catches cases where the cascade returns a plausible coordinate that
# really points at another element (e.g. a click on "Dossier en cours" in
# the menu instead of "Synthèse Urgences" in the tab further down).
|
||||||
|
# ===========================================================================
|
||||||
|
|
||||||
|
_VALIDATION_OCR_READER = None
|
||||||
|
_VALIDATION_OCR_LOCK = threading.Lock()
|
||||||
|
_VALIDATION_OCR_FAILED = False
|
||||||
|
|
||||||
|
|
||||||
|
def _get_validation_ocr_reader():
|
||||||
|
"""Shared EasyOCR singleton for post-cascade validation.

Lazily loaded on the first request. On failure, the FAILED status is
cached so that loading is not retried on every call, which would block the flow.
|
||||||
|
"""
|
||||||
|
global _VALIDATION_OCR_READER, _VALIDATION_OCR_FAILED
|
||||||
|
if _VALIDATION_OCR_FAILED:
|
||||||
|
return None
|
||||||
|
with _VALIDATION_OCR_LOCK:
|
||||||
|
if _VALIDATION_OCR_READER is None and not _VALIDATION_OCR_FAILED:
|
||||||
|
try:
|
||||||
|
import easyocr # type: ignore
|
||||||
|
_VALIDATION_OCR_READER = easyocr.Reader(
|
||||||
|
['fr', 'en'], gpu=True, verbose=False
|
||||||
|
)
|
||||||
|
logger.info("[REPLAY] EasyOCR validator chargé (fr+en, GPU)")
|
||||||
|
except Exception as e:
|
||||||
|
logger.warning("[REPLAY] EasyOCR validator indisponible (%s) — pré-check désactivé", e)
|
||||||
|
_VALIDATION_OCR_FAILED = True
|
||||||
|
return None
|
||||||
|
return _VALIDATION_OCR_READER
|
||||||
|
|
||||||
|
|
||||||
|
def _normalize_for_match(s: str) -> str:
|
||||||
|
"""Normalization for robust text comparison: lowercase, accents
stripped, punctuation → space, repeated spaces collapsed.
|
||||||
|
"""
|
||||||
|
import unicodedata
|
||||||
|
decomposed = unicodedata.normalize('NFD', s.lower())
|
||||||
|
no_accents = ''.join(c for c in decomposed if unicodedata.category(c) != 'Mn')
|
||||||
|
cleaned = ''.join(c if c.isalnum() or c.isspace() else ' ' for c in no_accents)
|
||||||
|
return ' '.join(cleaned.split())
|
||||||
|
|
||||||
|
|
||||||
|
def _text_match_fuzzy(expected: str, observed: str, min_token_ratio: float = 0.60) -> bool:
|
||||||
|
"""Match that tolerates OCR imperfections.

1. Exact substring → match.
2. Otherwise: split into tokens of ≥3 characters and return True if at least
`min_token_ratio` of the expected tokens appear in observed.
E.g. "Coller ou saisir le dossier patient" → tokens
['coller', 'saisir', 'dossier', 'patient']; if OCR sees "u saisir
le dossier patient" → 3/4 = 75% present → match accepted.

Aims for a compromise between strict (false negatives on OCR errors) and
permissive (false positives on neighboring text).
|
||||||
|
"""
|
||||||
|
nexp = _normalize_for_match(expected)
|
||||||
|
nobs = _normalize_for_match(observed)
|
||||||
|
if not nexp:
|
||||||
|
return True
|
||||||
|
if nexp in nobs:
|
||||||
|
return True
|
||||||
|
tokens = [t for t in nexp.split() if len(t) >= 3]
|
||||||
|
if not tokens:
|
||||||
|
return False
|
||||||
|
matched = sum(1 for t in tokens if t in nobs)
|
||||||
|
return matched / len(tokens) >= min_token_ratio
|
||||||
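The docstring example for the fuzzy match can be checked end to end. The sketch below restates the normalization and token-ratio logic from the diff as standalone functions; the names `normalize` and `fuzzy_match` are shortened for the example and are not the module's own.

```python
import unicodedata


def normalize(s: str) -> str:
    """Lowercase, strip accents, turn punctuation into spaces, collapse whitespace."""
    decomposed = unicodedata.normalize("NFD", s.lower())
    no_accents = "".join(c for c in decomposed if unicodedata.category(c) != "Mn")
    cleaned = "".join(c if c.isalnum() or c.isspace() else " " for c in no_accents)
    return " ".join(cleaned.split())


def fuzzy_match(expected: str, observed: str, min_token_ratio: float = 0.60) -> bool:
    """Substring match first, then the share of expected tokens (>= 3 chars) found."""
    nexp, nobs = normalize(expected), normalize(observed)
    if not nexp or nexp in nobs:
        return True
    tokens = [t for t in nexp.split() if len(t) >= 3]
    if not tokens:
        return False
    matched = sum(1 for t in tokens if t in nobs)
    return matched / len(tokens) >= min_token_ratio
```

Each token is tested with `in` against the whole observed string, so a token also counts as found when it appears inside a longer word. That tolerates OCR output with missing spaces, at the cost of a few more false positives on short tokens.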
|
|
||||||
|
|
||||||
|
def _validate_text_at_position(
|
||||||
|
screenshot_path: str,
|
||||||
|
x_pct: float,
|
||||||
|
y_pct: float,
|
||||||
|
expected_text: str,
|
||||||
|
screen_width: int,
|
||||||
|
screen_height: int,
|
||||||
|
radius_px: int = 200,
|
||||||
|
) -> tuple:
|
||||||
|
"""Semantic pre-check: runs OCR on an area around (x_pct, y_pct) and
checks that `expected_text` is present there (substring or fuzzy 60%).

Returns (is_valid: bool, observed_text: str, elapsed_ms: float).

Policy on OCR failure (missing lib, exception): returns
(True, "", 0.0) so as not to block the flow. A rare false positive is
better than a blocking regression introduced by the validation itself.
|
||||||
|
"""
|
||||||
|
reader = _get_validation_ocr_reader()
|
||||||
|
if reader is None:
|
||||||
|
return True, "", 0.0
|
||||||
|
if not expected_text or not expected_text.strip():
|
||||||
|
return True, "", 0.0
|
||||||
|
try:
|
||||||
|
from PIL import Image
|
||||||
|
import numpy as np
|
||||||
|
|
||||||
|
t0 = time.time()
|
||||||
|
img = Image.open(screenshot_path).convert("RGB")
|
||||||
|
img_w, img_h = img.size
|
||||||
|
cx = int(x_pct * screen_width)
|
||||||
|
cy = int(y_pct * screen_height)
|
||||||
|
# Clamp to the image bounds (the screenshot may be larger than the
# logical window; use min(img_*, screen_*) to be safe).
|
||||||
|
max_x = min(img_w, screen_width)
|
||||||
|
max_y = min(img_h, screen_height)
|
||||||
|
x1 = max(0, cx - radius_px)
|
||||||
|
y1 = max(0, cy - radius_px)
|
||||||
|
x2 = min(max_x, cx + radius_px)
|
||||||
|
y2 = min(max_y, cy + radius_px)
|
||||||
|
if x2 - x1 < 10 or y2 - y1 < 10:
|
||||||
|
return True, "", 0.0
|
||||||
|
crop = img.crop((x1, y1, x2, y2))
|
||||||
|
results = reader.readtext(np.array(crop))
|
||||||
|
observed = " ".join(r[1] for r in results if r and len(r) >= 2)
|
||||||
|
elapsed_ms = (time.time() - t0) * 1000
|
||||||
|
is_valid = _text_match_fuzzy(expected_text, observed, min_token_ratio=0.60)
|
||||||
|
return is_valid, observed, elapsed_ms
|
||||||
|
except Exception as e:
|
||||||
|
logger.warning("[REPLAY] _validate_text_at_position erreur (%s) — pas de blocage", e)
|
||||||
|
return True, "", 0.0
|
||||||
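The crop-window arithmetic in `_validate_text_at_position` is easy to get wrong at the screen edges, so a worked version may help. This sketch isolates the clamping step in a hypothetical helper, `crop_box`, which returns `None` where the original skips validation because the window is under 10 px on a side.

```python
from typing import Optional, Tuple


def crop_box(
    x_pct: float, y_pct: float,
    screen_w: int, screen_h: int,
    img_w: int, img_h: int,
    radius_px: int = 200,
) -> Optional[Tuple[int, int, int, int]]:
    """Square window of radius_px around the point, clamped to image and screen bounds."""
    cx, cy = int(x_pct * screen_w), int(y_pct * screen_h)
    max_x, max_y = min(img_w, screen_w), min(img_h, screen_h)
    x1, y1 = max(0, cx - radius_px), max(0, cy - radius_px)
    x2, y2 = min(max_x, cx + radius_px), min(max_y, cy + radius_px)
    if x2 - x1 < 10 or y2 - y1 < 10:
        return None  # degenerate window: the original returns (True, "", 0.0) here
    return (x1, y1, x2, y2)
```

The center of a 1920x1080 screen gives a full 400x400 window, a corner gives a 200x200 one, and a point beyond the right edge of a narrower screenshot gives no window at all.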
|
|
||||||
|
|
||||||
def _validate_resolution_quality(
|
def _validate_resolution_quality(
|
||||||
result: Optional[Dict[str, Any]],
|
result: Optional[Dict[str, Any]],
|
||||||
fallback_x_pct: float,
|
fallback_x_pct: float,
|
||||||
@@ -2193,6 +2365,30 @@ def _validate_resolution_quality(
|
|||||||
dx = abs(resolved_x - fallback_x_pct)
|
dx = abs(resolved_x - fallback_x_pct)
|
||||||
dy = abs(resolved_y - fallback_y_pct)
|
dy = abs(resolved_y - fallback_y_pct)
|
||||||
if dx > _RESOLUTION_MAX_DRIFT or dy > _RESOLUTION_MAX_DRIFT:
|
if dx > _RESOLUTION_MAX_DRIFT or dy > _RESOLUTION_MAX_DRIFT:
|
||||||
|
# Exception: for "high-confidence" methods that have semantically
# identified the target (exact text via OCR or a near-perfect image
# via template), trust the visual position regardless of drift.
# Drift relative to the recording only reflects a layout change
# (scroll, resize, F11, UI redesign, different resolution),
# not a resolution error.
#
# - template_matching ≥ 0.95: image found pixel-perfect
# - hybrid_text_direct ≥ 0.80: exact text recognized by OCR
# (0.80 is already the acceptance threshold in _RESOLUTION_MIN_SCORES;
# above it the semantic signal is reliable).
|
||||||
|
_high_confidence_method = (
|
||||||
|
(method.startswith("template_matching") and score >= 0.95)
|
||||||
|
or (method == "hybrid_text_direct" and score >= 0.80)
|
||||||
|
)
|
||||||
|
if _high_confidence_method:
|
||||||
|
logger.info(
|
||||||
|
"[REPLAY] Drift (%.3f, %.3f) > %.2f IGNORÉ : score=%.3f "
|
||||||
|
"sur %s — résultat visuel fiable, on l'utilise",
|
||||||
|
dx, dy, _RESOLUTION_MAX_DRIFT, score, method,
|
||||||
|
)
|
||||||
|
return result
|
||||||
|
|
||||||
logger.warning(
|
logger.warning(
|
||||||
"[REPLAY] Resolution REJETÉE (drift trop grand) : "
|
"[REPLAY] Resolution REJETÉE (drift trop grand) : "
|
||||||
"method=%s resolved=(%.3f, %.3f) expected=(%.3f, %.3f) "
|
"method=%s resolved=(%.3f, %.3f) expected=(%.3f, %.3f) "
|
||||||
@@ -2201,6 +2397,10 @@ def _validate_resolution_quality(
|
|||||||
fallback_x_pct, fallback_y_pct,
|
fallback_x_pct, fallback_y_pct,
|
||||||
dx, dy, _RESOLUTION_MAX_DRIFT,
|
dx, dy, _RESOLUTION_MAX_DRIFT,
|
||||||
)
|
)
|
||||||
|
# 100% visual: NEVER click blindly at the recorded coordinates.
# resolved=False → the upper layer tries the next method
# (VLM Quick Find, SoM, grounding); if all of them fail, the agent
# goes through "visual_resolve_failed" → Policy → supervised pause.
|
||||||
return {
|
return {
|
||||||
"resolved": False,
|
"resolved": False,
|
||||||
"method": f"rejected_drift_{method}",
|
"method": f"rejected_drift_{method}",
|
||||||
|
|||||||
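Review note: the drift gate in the hunk above can be exercised in isolation with a small sketch. The threshold value `RESOLUTION_MAX_DRIFT = 0.15` and the result keys `x_pct`, `y_pct` and `score` are assumptions made for illustration; the diff only shows the names `_RESOLUTION_MAX_DRIFT`, `method` and `score`, not their values or the full result shape.

```python
from typing import Any, Dict

# Assumed value for illustration; the real constant is not shown in the diff.
RESOLUTION_MAX_DRIFT = 0.15


def is_high_confidence(method: str, score: float) -> bool:
    """Bypass rule as written in the hunk: two methods, two thresholds."""
    if method.startswith("template_matching"):
        return score >= 0.95
    return method == "hybrid_text_direct" and score >= 0.80


def validate_drift(result: Dict[str, Any], expected_x: float, expected_y: float) -> Dict[str, Any]:
    """Accept the result, or reject it when it drifted too far from the recording."""
    dx = abs(result["x_pct"] - expected_x)  # hypothetical key
    dy = abs(result["y_pct"] - expected_y)  # hypothetical key
    if dx <= RESOLUTION_MAX_DRIFT and dy <= RESOLUTION_MAX_DRIFT:
        return result
    if is_high_confidence(result["method"], result["score"]):
        return result  # layout changed, but the target was identified reliably
    # Never fall back to the recorded coordinates: report a failed resolution.
    return {"resolved": False, "method": f"rejected_drift_{result['method']}"}
```

With these assumptions, a `template_matching_*` hit at 0.97 survives any drift, while the same hit at 0.90 is rejected and handed to the next resolution method.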
195
agent_v0/server_v1/safety_checks_provider.py
Normal file
@@ -0,0 +1,195 @@
# agent_v0/server_v1/safety_checks_provider.py
"""SafetyChecksProvider: hybrid safety checks, declarative plus contextual LLM (QW4).

For a pause_for_human action:
- the declarative checks (from the workflow) are always included
- if safety_level == "medical_critical" and RPA_SAFETY_CHECKS_LLM_ENABLED is on
  (it defaults to enabled), an LLM call (gemma4:latest by default) adds up to
  N contextual checks

Any LLM-side failure (timeout, exception, parse error) → additional_checks=[]:
the replay continues with the declarative checks only (safe fallback).
"""

import base64
import json
import logging
import os
import uuid
from dataclasses import dataclass, field
from typing import Any, Dict, List, Optional

logger = logging.getLogger(__name__)


@dataclass
class PausePayload:
    checks: List[Dict[str, Any]] = field(default_factory=list)
    pause_reason: str = ""
    message: str = ""


def _env(name: str, default: str) -> str:
    return os.environ.get(name, default).strip()


def _env_int(name: str, default: int) -> int:
    try:
        return int(os.environ.get(name, default))
    except (TypeError, ValueError):
        return default


def _env_bool_enabled(name: str) -> bool:
    # Opt-out flag: unset means enabled; "0", "false", "no", "off" or "" disable it.
    val = os.environ.get(name, "1").strip().lower()
    return val not in ("0", "false", "no", "off", "")


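Review note: `_env_bool_enabled` is an opt-out flag, which is easy to misread. The sketch below copies its logic into a standalone function so the behavior can be checked; `RPA_DEMO_FLAG` is a made-up variable name used only for this illustration.

```python
import os


def env_bool_enabled(name: str) -> bool:
    """Same rule as _env_bool_enabled: anything but an explicit 'off' value enables."""
    val = os.environ.get(name, "1").strip().lower()
    return val not in ("0", "false", "no", "off", "")


# RPA_DEMO_FLAG is a hypothetical variable name for this demonstration.
os.environ.pop("RPA_DEMO_FLAG", None)
unset_result = env_bool_enabled("RPA_DEMO_FLAG")   # unset: enabled

os.environ["RPA_DEMO_FLAG"] = " Off "
off_result = env_bool_enabled("RPA_DEMO_FLAG")     # whitespace and case are ignored

os.environ["RPA_DEMO_FLAG"] = ""
empty_result = env_bool_enabled("RPA_DEMO_FLAG")   # set but empty: disabled
```

The consequence for deployment is that the LLM checks run on `medical_critical` pauses unless the variable is explicitly set to a disabling value.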
def build_pause_payload(
    action: Dict[str, Any],
    replay_state: Dict[str, Any],
    last_screenshot: Optional[str],
) -> PausePayload:
    """Build the enriched pause payload for a pause_for_human action."""
    params = action.get("parameters") or {}
    message = params.get("message", "Validation requise")
    safety_level = params.get("safety_level")
    declarative = params.get("safety_checks") or []

    # Normalize the declarative checks
    checks: List[Dict[str, Any]] = []
    for d in declarative:
        checks.append({
            "id": d.get("id") or f"decl_{uuid.uuid4().hex[:6]}",
            "label": d.get("label", "Validation"),
            "required": bool(d.get("required", True)),
            "source": "declarative",
            "evidence": None,
        })

    # Add the contextual LLM checks when applicable
    if safety_level == "medical_critical" and _env_bool_enabled("RPA_SAFETY_CHECKS_LLM_ENABLED"):
        try:
            additional = _call_llm_for_contextual_checks(
                action=action,
                replay_state=replay_state,
                last_screenshot=last_screenshot,
                existing_labels=[c["label"] for c in checks],
            )
        except Exception as e:
            logger.warning("[BUS] lea:safety_checks_llm_failed reason=exception detail=%s", e)
            additional = []

        for a in additional:
            checks.append({
                "id": f"llm_{uuid.uuid4().hex[:6]}",
                "label": a.get("label", ""),
                "required": False,  # LLM checks are informational, not mandatory, in V1
                "source": "llm_contextual",
                "evidence": a.get("evidence", ""),
            })

    return PausePayload(
        checks=checks,
        pause_reason="",
        message=message,
    )


def _call_llm_for_contextual_checks(
    action: Dict[str, Any],
    replay_state: Dict[str, Any],
    last_screenshot: Optional[str],
    existing_labels: List[str],
) -> List[Dict[str, str]]:
    """Call Ollama in strict JSON mode to generate 0-N contextual checks.

    Returns:
        List[{label, evidence}] (at most RPA_SAFETY_CHECKS_LLM_MAX_CHECKS).
        [] on any failure (timeout, invalid JSON, exception).
    """
    import requests

    # Default gemma4:latest: best detection/latency trade-off in the
    # 2026-05-06 bench (see docs/BENCH_SAFETY_CHECKS_2026-05-06.md). medgemma:4b
    # systematically returned [] (it refused to flag anything).
    model = _env("RPA_SAFETY_CHECKS_LLM_MODEL", "gemma4:latest")
    # 7 s timeout: gemma4 warm average = 2.9 s + 4 s margin. The ~10 s cold start
    # exceeds this budget, so keep the model resident (OLLAMA_KEEP_ALIVE=24h
    # recommended in production).
    timeout_s = _env_int("RPA_SAFETY_CHECKS_LLM_TIMEOUT_S", 7)
    max_checks = _env_int("RPA_SAFETY_CHECKS_LLM_MAX_CHECKS", 3)
    ollama_url = _env("OLLAMA_URL", "http://localhost:11434")

    params = action.get("parameters") or {}
    workflow_message = params.get("message", "")
    existing = ", ".join(existing_labels) if existing_labels else "aucun"

    prompt = f"""Tu es Léa, assistante médicale supervisée.
Avant de continuer le workflow, tu dois lister 0 à {max_checks} vérifications supplémentaires
que l'humain doit acquitter, en regardant l'écran actuel.

Contexte workflow : {workflow_message}
Checks déjà demandés : {existing}

NE répète PAS un check déjà demandé.
Si rien d'inhabituel à signaler, retourne {{"additional_checks": []}}.

Réponds UNIQUEMENT en JSON :
{{
  "additional_checks": [
    {{"label": "string court", "evidence": "ce que tu as vu d'inhabituel"}}
  ]
}}
"""

    payload = {
        "model": model,
        "prompt": prompt,
        "stream": False,
        "format": "json",
        "options": {"temperature": 0.1, "num_predict": 200},
    }

    if last_screenshot and os.path.isfile(last_screenshot):
        try:
            with open(last_screenshot, "rb") as f:
                payload["images"] = [base64.b64encode(f.read()).decode("ascii")]
        except Exception as e:
            logger.debug("safety_checks: lecture screenshot échouée (%s) — appel sans image", e)

    try:
        response = requests.post(
            f"{ollama_url}/api/generate",
            json=payload,
            timeout=timeout_s,
        )
        if response.status_code != 200:
            logger.warning("[BUS] lea:safety_checks_llm_failed reason=http_status detail=%s", response.status_code)
            return []
        text = response.json().get("response", "").strip()
    except requests.Timeout:
        logger.warning("[BUS] lea:safety_checks_llm_failed reason=timeout detail=%ss", timeout_s)
        return []
    except Exception as e:
        logger.warning("[BUS] lea:safety_checks_llm_failed reason=network detail=%s", e)
        return []

    # format=json normally guarantees valid JSON
    try:
        parsed = json.loads(text)
    except json.JSONDecodeError as e:
        logger.warning("[BUS] lea:safety_checks_llm_failed reason=json_decode detail=%s", e)
        return []

    # Valid JSON can still be a list or a scalar; only an object has .get()
    if not isinstance(parsed, dict):
        return []

    additional = parsed.get("additional_checks") or []
    if not isinstance(additional, list):
        return []

    # Truncate to max_checks, then keep only well-formed items
    valid = []
    for item in additional[:max_checks]:
        if isinstance(item, dict) and item.get("label"):
            valid.append({
                "label": str(item["label"])[:200],
                "evidence": str(item.get("evidence", ""))[:300],
            })
    return valid
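Review note: the response handling at the end of `_call_llm_for_contextual_checks` is the part most worth testing without a running Ollama. The sketch below isolates it as a pure function; the name `parse_additional_checks` is hypothetical and does not exist in the PR. It also guards against a top-level JSON value that is not an object, a case where `parsed.get` would raise.

```python
import json
from typing import Dict, List


def parse_additional_checks(text: str, max_checks: int = 3) -> List[Dict[str, str]]:
    """Turn the raw model output into a bounded, sanitized list of checks.

    Returns [] for anything that is not the expected JSON shape.
    """
    try:
        parsed = json.loads(text)
    except json.JSONDecodeError:
        return []
    if not isinstance(parsed, dict):
        return []  # e.g. the model answered with a bare list
    additional = parsed.get("additional_checks") or []
    if not isinstance(additional, list):
        return []
    valid: List[Dict[str, str]] = []
    # Truncation happens before filtering, as in the PR: malformed items
    # among the first max_checks entries reduce the final count.
    for item in additional[:max_checks]:
        if isinstance(item, dict) and item.get("label"):
            valid.append({
                "label": str(item["label"])[:200],
                "evidence": str(item.get("evidence", ""))[:300],
            })
    return valid
```

One design point to be aware of: because the slice is taken first, a response whose first three entries are malformed yields no checks even if later entries are valid. Filtering before slicing would change that, at the cost of trusting longer model output.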
@@ -1791,6 +1791,10 @@ class StreamProcessor:
        # Built workflows (used for matching)
        self._workflows: Dict[str, Any] = {}

        # Shadow learning: last UI pattern detected per session
        # Stores {session_id: {"pattern_name": str, "ocr_text": str, "screen_state": obj, "shot_id": str}}
        self._pending_ui_patterns: Dict[str, Dict[str, Any]] = {}

        # Load the existing workflows from disk
        self._load_persisted_workflows()

@@ -1975,6 +1979,9 @@ class StreamProcessor:
        - key_combo/key_press with modifiers only (ctrl, alt, shift, etc.)
        - key_combo/key_press with an empty key list
        - text_input with empty text

        Shadow learning: when a click follows a detected UI pattern,
        learn the dialog→button association.
        """
        if _is_parasitic_event(event_data):
            logger.debug(
@@ -1982,9 +1989,119 @@ class StreamProcessor:
                f"type={event_data.get('type')}, data={event_data.get('keys', event_data.get('text', ''))}"
            )
            return {"status": "event_filtered", "session_id": session_id, "reason": "parasitic"}

        # Shadow learning: if a UI pattern is pending and a click arrives
        if event_data.get("type") == "mouse_click":
            self._try_shadow_learn(session_id, event_data)

        self.session_manager.add_event(session_id, event_data)
        return {"status": "event_recorded", "session_id": session_id}

    def _try_shadow_learn(self, session_id: str, click_event: Dict[str, Any]):
        """Try to learn a UI pattern from a click observed in Shadow mode.

        When a screenshot contained a detected UI pattern (a dialog) and the
        user then clicks, extract the OCR text at the click point to learn
        the association: "when I see this text → click this button".
        """
        with self._data_lock:
            pending = self._pending_ui_patterns.pop(session_id, None)
        if not pending:
            return

        screen_state = pending.get("screen_state")
        if screen_state is None:
            return

        # Extract the click position (absolute pixels)
        pos = click_event.get("pos", [])
        if not pos or len(pos) != 2:
            return

        click_x, click_y = pos[0], pos[1]

        # Find the OCR text closest to the click point
        # via the ScreenState ui_elements (they carry bbox + label)
        clicked_label = self._find_label_at_position(screen_state, click_x, click_y)
        if not clicked_label:
            return

        # Extract the main trigger from the dialog's OCR text
        ocr_text = pending.get("ocr_text", "")
        # Use a short excerpt as the trigger (the first 80 characters)
        trigger_text = ocr_text[:80].strip().lower()
        if not trigger_text:
            return

        logger.info(
            f"Shadow learning: pattern '{pending['pattern_name']}' "
            f"→ utilisateur a cliqué '{clicked_label}' | trigger='{trigger_text[:40]}...'"
        )

        # Save the learned pattern
        try:
            from core.knowledge.ui_patterns import UIPatternLibrary
            lib = UIPatternLibrary()
            lib.save_learned_pattern({
                "category": "dialog",
                "triggers": [trigger_text],
                "action": "click",
                "target": clicked_label,
                "os": "windows",
                "confidence": 0.8,
            })
        except Exception as e:
            logger.warning(f"Shadow learning: échec sauvegarde pattern: {e}")

    @staticmethod
    def _find_label_at_position(screen_state, click_x: int, click_y: int) -> Optional[str]:
        """Find the label of the UI element closest to the click point.

        Walks the ScreenState ui_elements and returns the label of the first
        element whose bbox contains the point, or the label of the nearest
        element (by distance to its center) when none contains it.
        """
        ui_elements = getattr(screen_state, "ui_elements", [])
        if not ui_elements:
            return None

        best_label = None
        best_dist = float("inf")

        for elem in ui_elements:
            bbox = getattr(elem, "bbox", None)
            label = getattr(elem, "label", "")
            if not bbox or not label:
                continue

            # BBox = (x, y, width, height): extract the coordinates
            try:
                bx, by = bbox.x, bbox.y
                bw, bh = bbox.width, bbox.height
            except AttributeError:
                # Fallback when bbox is a list/tuple
                if hasattr(bbox, '__len__') and len(bbox) >= 4:
                    bx, by, bw, bh = bbox[0], bbox[1], bbox[2], bbox[3]
                else:
                    continue

            # Check whether the click falls inside the bbox
            if bx <= click_x <= bx + bw and by <= click_y <= by + bh:
                return label.strip()

            # Otherwise compute the distance to the center
            cx = bx + bw / 2
            cy = by + bh / 2
            dist = ((click_x - cx) ** 2 + (click_y - cy) ** 2) ** 0.5
            if dist < best_dist:
                best_dist = dist
                best_label = label.strip()

        # Only return the nearest element if it is reasonably close (< 100 px)
        if best_label and best_dist < 100:
            return best_label
        return None

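Review note: the lookup rule in `_find_label_at_position` (first containing bbox wins, otherwise nearest center within 100 px) is easy to check on plain tuples. This is a simplified sketch: elements are `(label, (x, y, width, height))` tuples rather than the project's ScreenState objects, and `find_label_at` is a hypothetical name.

```python
from math import hypot
from typing import List, Optional, Tuple

# Simplified element shape, assumed for this sketch only.
Element = Tuple[str, Tuple[float, float, float, float]]


def find_label_at(elements: List[Element], click_x: float, click_y: float,
                  max_dist: float = 100.0) -> Optional[str]:
    """Return the label under the click, else the nearest one within max_dist."""
    best_label: Optional[str] = None
    best_dist = float("inf")
    for label, (bx, by, bw, bh) in elements:
        if not label:
            continue
        # A click inside the box is an immediate match (list order decides ties).
        if bx <= click_x <= bx + bw and by <= click_y <= by + bh:
            return label.strip()
        # Otherwise track the element whose center is closest to the click.
        dist = hypot(click_x - (bx + bw / 2), click_y - (by + bh / 2))
        if dist < best_dist:
            best_dist, best_label = dist, label.strip()
    if best_label is not None and best_dist < max_dist:
        return best_label
    return None


buttons: List[Element] = [("OK", (100, 100, 80, 30)), ("Annuler", (200, 100, 80, 30))]
inside = find_label_at(buttons, 110, 110)    # inside the "OK" box
nearby = find_label_at(buttons, 195, 140)    # below both boxes, closer to "Annuler"
far_away = find_label_at(buttons, 600, 600)  # beyond the 100 px radius
```

Because the first containing box wins, a container element listed before its child buttons would shadow them; whether `ui_elements` can contain such nesting is not visible in this diff.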
    # =========================================================================
    # Screenshots
    # =========================================================================
@@ -2042,6 +2159,37 @@ class StreamProcessor:
                self._screen_states[session_id] = []
            self._screen_states[session_id].append(screen_state)

            # Enrich with the known UI patterns
            try:
                from core.knowledge.ui_patterns import UIPatternLibrary
                detected_text = getattr(screen_state.perception, "detected_text", [])
                if detected_text:
                    ocr_text = " ".join(str(t) for t in detected_text) if isinstance(detected_text, list) else str(detected_text)
                    lib = UIPatternLibrary()
                    pattern = lib.find_pattern(ocr_text)
                    if pattern:
                        result["ui_pattern"] = pattern["pattern"]
                        result["ui_pattern_action"] = pattern["action"]
                        result["ui_pattern_target"] = pattern["target"]
                        logger.info(f"Pattern UI détecté: {pattern['pattern']} → {pattern['target']}")

                        # Shadow learning: remember the pattern until the user's click arrives
                        with self._data_lock:
                            self._pending_ui_patterns[session_id] = {
                                "pattern_name": pattern["pattern"],
                                "ocr_text": ocr_text,
                                "screen_state": screen_state,
                                "shot_id": shot_id,
                            }
                    else:
                        # No known pattern → clear the pending entry (the screen changed)
                        with self._data_lock:
                            self._pending_ui_patterns.pop(session_id, None)
            except ImportError:
                pass
            except Exception as e:
                logger.debug(f"Pattern check: {e}")

            logger.info(
                f"Screenshot analysé: {shot_id} | "
                f"{result['ui_elements_count']} UI elements, "
643
core/analytics/process_mining_bridge.py
Normal file
@@ -0,0 +1,643 @@
"""
Bridge between Lea workflows (core) and PM4Py for process mining.
Generates BPMN diagrams and KPIs from Shadow traces.

Usage:
    from core.analytics.process_mining_bridge import (
        sessions_to_event_log,
        workflow_to_event_log,
        discover_bpmn,
        compute_kpis,
    )

    # From raw JSONL sessions
    df = sessions_to_event_log(sessions_data)
    result = discover_bpmn(df, output_dir="data/analytics/bpmn")
    kpis = compute_kpis(df)

    # From a core workflow (JSON dict)
    df = workflow_to_event_log(workflow_dict)
"""

import json
import logging
from datetime import datetime, timezone
from pathlib import Path
from typing import Any, Dict, List, Optional

import pandas as pd

logger = logging.getLogger(__name__)

# ---- Conditional PM4Py import -------------------------------------------

try:
    import pm4py
    PM4PY_AVAILABLE = True
except ImportError:
    PM4PY_AVAILABLE = False
    logger.warning("pm4py non installe -- le process mining est desactive")


def _sanitize_label(label: str) -> str:
    """
    Replace control characters (0x00-0x1F except tab, newline and carriage
    return) with a visible "<0xNN>" placeholder. These characters are
    invalid in XML and crash PM4Py.
    """
    return "".join(
        c if c in ("\t", "\n", "\r") or ord(c) >= 0x20 else f"<0x{ord(c):02x}>"
        for c in label
    )


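Review note: `_sanitize_label` substitutes a placeholder rather than dropping the character, which keeps shortcuts such as Ctrl+S distinguishable in the BPMN labels. A standalone copy of the same expression, with a sample input chosen for illustration:

```python
def sanitize_label(label: str) -> str:
    """Replace XML-invalid control characters with a readable <0xNN> marker."""
    return "".join(
        # Tab, newline and carriage return are legal in XML 1.0 and kept as-is.
        c if c in ("\t", "\n", "\r") or ord(c) >= 0x20 else f"<0x{ord(c):02x}>"
        for c in label
    )


# "\x13" is what a raw Ctrl+S keystroke looks like in a captured text event.
cleaned = sanitize_label("Saisie '\x13' - notepad.exe")
untouched = sanitize_label("Fenetre A\tB")
```

The angle brackets in the placeholder are ordinary text at this stage; they rely on the XML writer escaping them on export, which is the usual behavior of XML serializers but is not shown in this diff.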
# ---- Event types to ignore (noise) ---------------------------------------

_NOISE_EVENT_TYPES = frozenset({
    "heartbeat",
    "action_result",
    "screenshot",
})

# Event types that are meaningful for process mining
_RELEVANT_EVENT_TYPES = frozenset({
    "mouse_click",
    "text_input",
    "key_press",
    "key_combo",
    "window_focus_change",
})


# ===========================================================================
# Conversion: JSONL sessions -> PM4Py event log
# ===========================================================================


def _build_activity_label(event: dict) -> Optional[str]:
    """
    Build a readable activity label from a raw JSONL event.

    Rules:
    - mouse_click -> "Clic - <app_name> (<truncated window_title>)"
    - text_input -> "Saisie '<text>' - <app_name>"
    - key_press -> "Touche <key> - <app_name>"
    - key_combo -> "Raccourci <keys> - <app_name>"
    - window_focus_change -> "Fenetre <to.title> (<to.app_name>)"

    All labels are sanitized to replace control characters
    (e.g. \\x13 for Ctrl+S), which are invalid in XML/BPMN.
    """
    evt = event.get("event", event)
    etype = evt.get("type", "")

    if etype in _NOISE_EVENT_TYPES:
        return None

    # Window extraction
    window = evt.get("window", {})
    app_name = window.get("app_name", "inconnu")
    win_title = window.get("title", "")
    # Truncate the title to 40 characters
    short_title = (win_title[:40] + "...") if len(win_title) > 40 else win_title

    label: Optional[str] = None

    if etype == "mouse_click":
        label = f"Clic - {app_name} ({short_title})"

    elif etype == "text_input":
        text = evt.get("text", "")
        # Truncate the text to 20 characters to stay readable
        short_text = (text[:20] + "...") if len(text) > 20 else text
        label = f"Saisie '{short_text}' - {app_name}"

    elif etype == "key_press":
        key = evt.get("key", "?")
        label = f"Touche {key} - {app_name}"

    elif etype == "key_combo":
        keys = evt.get("keys", [])
        combo = "+".join(str(k) for k in keys)
        label = f"Raccourci {combo} - {app_name}"

    elif etype == "window_focus_change":
        to_info = evt.get("to", {})
        if not to_info:
            return None
        to_title = to_info.get("title", "?")
        to_app = to_info.get("app_name", "?")
        label = f"Fenetre {to_title} ({to_app})"

    else:
        # Unrecognized types: generic label
        label = f"{etype} - {app_name}"

    return _sanitize_label(label) if label else None


def _extract_timestamp(event: dict) -> Optional[float]:
    """Extract the Unix timestamp from a JSONL event."""
    # The timestamp may sit at the root level or in event.timestamp
    evt = event.get("event", event)
    ts = evt.get("timestamp") or event.get("timestamp")
    if ts is not None:
        return float(ts)
    # Fall back to the 't' field (simplified format)
    t = evt.get("t") or event.get("t")
    if t is not None:
        return float(t)
    return None


def sessions_to_event_log(
    sessions_data: List[dict],
    deduplicate_windows: bool = True,
) -> pd.DataFrame:
    """
    Convert raw session traces (JSONL events) into a PM4Py event log.

    Each relevant event becomes one row:
    - case:concept:name = session_id
    - concept:name = activity label (e.g. "Clic - Notepad.exe (Bloc-notes)")
    - time:timestamp = UTC timestamp

    Args:
        sessions_data: list of dicts, each dict being one parsed JSONL line.
        deduplicate_windows: if True, drop consecutive window_focus_change
            events that target the same window (typical Windows noise).

    Returns:
        DataFrame ready for PM4Py.
    """
    rows: List[Dict[str, Any]] = []

    # Group by session_id for deduplication
    sessions: Dict[str, List[dict]] = {}
    for event in sessions_data:
        sid = event.get("session_id", "unknown")
        sessions.setdefault(sid, []).append(event)

    for sid, events in sessions.items():
        # Sort by timestamp
        events.sort(key=lambda e: _extract_timestamp(e) or 0.0)
        last_window_label: Optional[str] = None

        for event in events:
            label = _build_activity_label(event)
            if label is None:
                continue

            ts = _extract_timestamp(event)
            if ts is None:
                continue

            # Deduplicate consecutive window changes
            evt = event.get("event", event)
            if deduplicate_windows and evt.get("type") == "window_focus_change":
                if label == last_window_label:
                    continue
                last_window_label = label
            else:
                last_window_label = None

            rows.append({
                "case:concept:name": sid,
                "concept:name": label,
                "time:timestamp": pd.Timestamp(
                    datetime.fromtimestamp(ts, tz=timezone.utc)
                ),
                "event_type": evt.get("type", ""),
                "app_name": evt.get("window", {}).get("app_name", ""),
            })

    if not rows:
        logger.warning("Aucun evenement pertinent trouve dans les sessions")
        return pd.DataFrame(columns=[
            "case:concept:name",
            "concept:name",
            "time:timestamp",
            "event_type",
            "app_name",
        ])

    df = pd.DataFrame(rows)
    df = df.sort_values(["case:concept:name", "time:timestamp"]).reset_index(drop=True)
    logger.info(
        "Event log cree : %d evenements, %d sessions, %d activites distinctes",
        len(df),
        df["case:concept:name"].nunique(),
        df["concept:name"].nunique(),
    )
    return df


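Review note: the window deduplication inside `sessions_to_event_log` only collapses focus changes that are strictly consecutive. The sketch below reproduces that rule on `(event_type, label)` pairs without pandas; `dedupe_focus_changes` is a hypothetical helper, not part of the PR.

```python
from typing import List, Optional, Tuple

Event = Tuple[str, str]  # (event_type, activity_label), already sorted by time


def dedupe_focus_changes(events: List[Event]) -> List[Event]:
    """Drop a window_focus_change that repeats the previous one's label."""
    kept: List[Event] = []
    last_window_label: Optional[str] = None
    for etype, label in events:
        if etype == "window_focus_change":
            if label == last_window_label:
                continue  # same window announced twice in a row: noise
            last_window_label = label
        else:
            # Any other event breaks the run, so the next focus change is kept
            # even if it points at the same window again.
            last_window_label = None
        kept.append((etype, label))
    return kept


trace: List[Event] = [
    ("window_focus_change", "Fenetre A (app)"),
    ("window_focus_change", "Fenetre A (app)"),
    ("mouse_click", "Clic - app (A)"),
    ("window_focus_change", "Fenetre A (app)"),
    ("window_focus_change", "Fenetre B (app)"),
]
deduped = dedupe_focus_changes(trace)
```

Resetting the marker on every non-focus event is a deliberate trade-off: it preserves "returned to the same window after acting" as a distinct step, at the cost of keeping some repeated focus events in busy traces.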
# ===========================================================================
# Conversion: core workflow (JSON dict) -> PM4Py event log
# ===========================================================================


def workflow_to_event_log(workflow_dict: dict) -> pd.DataFrame:
    """
    Convert a core workflow (JSON dict) into a PM4Py DataFrame.

    Uses the nodes and edges to reconstruct traces.
    Each path from an entry node to an end node = one case.

    Mapping:
    - case:concept:name = workflow_id + path suffix
    - concept:name = node.name
    - time:timestamp = created_at plus one second per step (synthetic ordering)
    """
    wf_id = workflow_dict.get("workflow_id", "wf_unknown")
    nodes = {n["node_id"]: n for n in workflow_dict.get("nodes", [])}
    edges = workflow_dict.get("edges", [])
    entry_nodes = workflow_dict.get("entry_nodes", [])
    created_at = workflow_dict.get("created_at", datetime.now(timezone.utc).isoformat())

    if not nodes or not edges:
        logger.warning("Workflow vide ou sans edges : %s", wf_id)
        return pd.DataFrame(columns=[
            "case:concept:name",
            "concept:name",
            "time:timestamp",
        ])

    # Build an adjacency map
    adjacency: Dict[str, List[dict]] = {}
    for edge in edges:
        from_node = edge.get("from_node") or edge.get("source_node", "")
        adjacency.setdefault(from_node, []).append(edge)

    # DFS traversal to enumerate paths (capped to avoid path explosion)
    MAX_PATHS = 100
    paths: List[List[str]] = []

    def _dfs(current: str, path: List[str], visited: set) -> None:
        if len(paths) >= MAX_PATHS:
            return
        if current in visited:
            # Loop detected: save the path as it stands
            paths.append(path[:])
            return
        visited.add(current)
        path.append(current)

        outgoing = adjacency.get(current, [])
        if not outgoing:
            # End node
            paths.append(path[:])
        else:
            for edge in outgoing:
                to_node = edge.get("to_node") or edge.get("target_node", "")
                if to_node:
                    _dfs(to_node, path, visited)
        path.pop()
        visited.discard(current)

    for entry in entry_nodes:
        if entry in nodes:
            _dfs(entry, [], set())

    # No usable entry nodes: try up to three nodes without incoming edges
    if not paths:
        target_nodes = set()
        for edge in edges:
            to_node = edge.get("to_node") or edge.get("target_node", "")
            target_nodes.add(to_node)
        root_nodes = [nid for nid in nodes if nid not in target_nodes]
        for root in root_nodes[:3]:
            _dfs(root, [], set())

    # Build the DataFrame
    rows: List[Dict[str, Any]] = []
    try:
        base_time = pd.Timestamp(datetime.fromisoformat(created_at))
    except (ValueError, TypeError):
        base_time = pd.Timestamp(datetime.now(timezone.utc))

    for i, path in enumerate(paths):
        case_id = f"{wf_id}_path_{i}"
        for step_idx, node_id in enumerate(path):
            node = nodes.get(node_id, {})
            rows.append({
                "case:concept:name": case_id,
                "concept:name": node.get("name", node_id),
                "time:timestamp": base_time + pd.Timedelta(seconds=step_idx),
            })

    df = pd.DataFrame(rows)
    if not df.empty:
        df = df.sort_values(["case:concept:name", "time:timestamp"]).reset_index(drop=True)
    logger.info(
        "Event log depuis workflow : %d evenements, %d chemins",
        len(df), len(paths),
    )
    return df


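Review note: the path enumeration in `workflow_to_event_log` is a backtracking DFS that cuts cycles and stops at a cap. The sketch below shows the same traversal on a plain adjacency map of node ids; the PR stores edge dicts instead, and `enumerate_paths` is a hypothetical name.

```python
from typing import Dict, List, Set


def enumerate_paths(adjacency: Dict[str, List[str]], entry: str,
                    max_paths: int = 100) -> List[List[str]]:
    """List entry-to-end paths, cutting each path where it would revisit a node."""
    paths: List[List[str]] = []

    def dfs(current: str, path: List[str], visited: Set[str]) -> None:
        if len(paths) >= max_paths:
            return  # cap reached: stop exploring
        if current in visited:
            paths.append(path[:])  # cycle: keep the path up to the repeat
            return
        visited.add(current)
        path.append(current)
        successors = adjacency.get(current, [])
        if not successors:
            paths.append(path[:])  # end node reached
        else:
            for nxt in successors:
                dfs(nxt, path, visited)
        # Backtrack so sibling branches start from a clean state.
        path.pop()
        visited.discard(current)

    dfs(entry, [], set())
    return paths


# "c" loops back to "a"; "d" is the only end node.
graph = {"a": ["b", "c"], "b": ["d"], "c": ["a"]}
all_paths = enumerate_paths(graph, "a")
capped = enumerate_paths(graph, "a", max_paths=1)
```

The `visited` set is scoped to the current path rather than the whole traversal, so a node shared by two branches appears in both paths; that is what makes each path a separate case in the resulting event log.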
|
# ===========================================================================
|
||||||
|
# Decouverte BPMN
|
||||||
|
# ===========================================================================
|
||||||
|
|
||||||
|
|
||||||
|
def discover_bpmn(
|
||||||
|
event_log_df: pd.DataFrame,
|
||||||
|
output_dir: str = "data/analytics/bpmn",
|
||||||
|
name: str = "process",
|
||||||
|
) -> dict:
|
||||||
|
"""
|
||||||
|
Decouvre un modele BPMN depuis un event log via Inductive Miner.
|
||||||
|
|
||||||
|
Args:
|
||||||
|
event_log_df: DataFrame au format PM4Py.
|
||||||
|
output_dir: repertoire de sortie pour les fichiers generes.
|
||||||
|
name: prefixe pour les noms de fichiers.
|
||||||
|
|
||||||
|
Returns:
|
||||||
|
{
|
||||||
|
'bpmn_xml_path': str,
|
||||||
|
'bpmn_image_path': str,
|
||||||
|
'petri_net_image_path': str,
|
||||||
|
'dfg_image_path': str,
|
||||||
|
'stats': {
|
||||||
|
'activities': int,
|
||||||
|
'variants': int,
|
||||||
|
'cases': int,
|
||||||
|
}
|
||||||
|
}
|
||||||
|
"""
|
||||||
|
if not PM4PY_AVAILABLE:
|
||||||
|
raise ImportError("pm4py n'est pas installe. Installez-le : pip install pm4py")
|
||||||
|
|
||||||
|
if event_log_df.empty:
|
||||||
|
raise ValueError("Event log vide, impossible de decouvrir un BPMN")
|
||||||
|
|
||||||
|
out = Path(output_dir)
|
||||||
|
out.mkdir(parents=True, exist_ok=True)
|
||||||
|
|
||||||
|
# Decouverte BPMN par Inductive Miner
|
||||||
|
bpmn_model = pm4py.discover_bpmn_inductive(event_log_df)
|
||||||
|
|
||||||
|
# Export BPMN XML
|
||||||
|
bpmn_xml_path = str(out / f"{name}.bpmn")
|
||||||
|
try:
|
||||||
|
pm4py.write_bpmn(bpmn_model, bpmn_xml_path)
|
||||||
|
except Exception as e:
|
||||||
|
# The PM4Py layout step can fail on labels containing special
# characters (accents, quotes, etc.). Fallback: use the internal
# exporter, which skips the layout.
|
||||||
|
logger.warning("Layout BPMN echoue (%s), export sans layout", e)
|
||||||
|
from pm4py.objects.bpmn.exporter import exporter as bpmn_exporter
|
||||||
|
bpmn_exporter.apply(bpmn_model, bpmn_xml_path)
|
||||||
|
logger.info("BPMN XML exporte : %s", bpmn_xml_path)
|
||||||
|
|
||||||
|
# Export the BPMN image (PNG), rendered large for readability
|
||||||
|
bpmn_image_path = str(out / f"{name}_bpmn.png")
|
||||||
|
try:
|
||||||
|
from pm4py.visualization.bpmn import visualizer as bpmn_vis
|
||||||
|
gviz = bpmn_vis.apply(bpmn_model, parameters={
|
||||||
|
"rankdir": "TB",
|
||||||
|
"font_size": "12",
|
||||||
|
})
|
||||||
|
gviz.graph_attr["dpi"] = "150"
|
||||||
|
gviz.graph_attr["size"] = "40,20!"
|
||||||
|
gviz.graph_attr["rankdir"] = "TB"
|
||||||
|
gviz.render(filename=str(out / f"{name}_bpmn"), format="png", cleanup=True)  # graphviz appends ".png" itself
|
||||||
|
logger.info("BPMN PNG exporte : %s", bpmn_image_path)
|
||||||
|
except Exception as e:
|
||||||
|
logger.warning("BPMN image fallback : %s", e)
|
||||||
|
try:
|
||||||
|
pm4py.save_vis_bpmn(bpmn_model, bpmn_image_path)
|
||||||
|
except Exception:
|
||||||
|
bpmn_image_path = None
|
||||||
|
|
||||||
|
# DFG (Directly-Follows Graph), rendered large
|
||||||
|
dfg_image_path = str(out / f"{name}_dfg.png")
|
||||||
|
try:
|
||||||
|
from pm4py.visualization.dfg import visualizer as dfg_vis
|
||||||
|
dfg, sa, ea = pm4py.discover_dfg(event_log_df)
|
||||||
|
gviz = dfg_vis.apply(dfg, activities_count=event_log_df["concept:name"].value_counts().to_dict(), parameters={
|
||||||
|
"start_activities": sa,
|
||||||
|
"end_activities": ea,
|
||||||
|
"rankdir": "TB",
|
||||||
|
"font_size": "11",
|
||||||
|
})
|
||||||
|
gviz.graph_attr["dpi"] = "150"
|
||||||
|
gviz.graph_attr["size"] = "40,20!"
|
||||||
|
gviz.graph_attr["rankdir"] = "TB"
|
||||||
|
gviz.render(filename=str(out / f"{name}_dfg"), format="png", cleanup=True)  # graphviz appends ".png" itself
|
||||||
|
logger.info("DFG PNG exporte : %s", dfg_image_path)
|
||||||
|
except Exception as e:
|
||||||
|
logger.warning("DFG image fallback : %s", e)
|
||||||
|
try:
|
||||||
|
pm4py.save_vis_dfg(*pm4py.discover_dfg(event_log_df), file_path=dfg_image_path)
|
||||||
|
except Exception:
|
||||||
|
dfg_image_path = None
|
||||||
|
|
||||||
|
# Petri net via the Inductive Miner (alternative visualization)
|
||||||
|
petri_image_path = str(out / f"{name}_petri.png")
|
||||||
|
try:
|
||||||
|
net, im, fm = pm4py.discover_petri_net_inductive(event_log_df)
|
||||||
|
pm4py.save_vis_petri_net(net, im, fm, file_path=petri_image_path)
|
||||||
|
logger.info("Petri net PNG exporte : %s", petri_image_path)
|
||||||
|
except Exception as e:
|
||||||
|
logger.warning("Impossible de generer le Petri net : %s", e)
|
||||||
|
petri_image_path = None
|
||||||
|
|
||||||
|
# Basic statistics
|
||||||
|
variants = pm4py.get_variants(event_log_df)
|
||||||
|
n_cases = event_log_df["case:concept:name"].nunique()
|
||||||
|
n_activities = event_log_df["concept:name"].nunique()
|
||||||
|
|
||||||
|
result = {
|
||||||
|
"bpmn_xml_path": bpmn_xml_path,
|
||||||
|
"bpmn_image_path": bpmn_image_path,
|
||||||
|
"petri_net_image_path": petri_image_path,
|
||||||
|
"dfg_image_path": dfg_image_path,
|
||||||
|
"stats": {
|
||||||
|
"activities": n_activities,
|
||||||
|
"variants": len(variants),
|
||||||
|
"cases": n_cases,
|
||||||
|
},
|
||||||
|
}
|
||||||
|
logger.info("Decouverte BPMN terminee : %s", result["stats"])
|
||||||
|
return result
|
||||||
|
|
||||||
|
|
||||||
|
# ===========================================================================
|
||||||
|
# Process mining KPIs
|
||||||
|
# ===========================================================================
|
||||||
|
|
||||||
|
|
||||||
|
def compute_kpis(event_log_df: pd.DataFrame) -> dict:
|
||||||
|
"""
|
||||||
|
Compute the process mining KPIs.
|
||||||
|
|
||||||
|
Returns:
|
||||||
|
{
|
||||||
|
'total_cases': int,
|
||||||
|
'total_events': int,
|
||||||
|
'unique_activities': int,
|
||||||
|
'variants_count': int,
|
||||||
|
'variants_top5': list,
|
||||||
|
'avg_case_duration_seconds': float,
|
||||||
|
'median_case_duration_seconds': float,
|
||||||
|
'avg_events_per_case': float,
|
||||||
|
'activity_stats': {
|
||||||
|
'<activity_name>': {
|
||||||
|
'count': int,
|
||||||
|
'avg_duration_seconds': float,
|
||||||
|
'min_duration_seconds': float,
|
||||||
|
'max_duration_seconds': float,
|
||||||
|
}
|
||||||
|
},
|
||||||
|
'bottlenecks': [...], # top 3 slowest activities
|
||||||
|
'app_distribution': { '<app_name>': int },
|
||||||
|
}
|
||||||
|
"""
|
||||||
|
if event_log_df.empty:
|
||||||
|
return {
|
||||||
|
"total_cases": 0,
|
||||||
|
"total_events": 0,
|
||||||
|
"unique_activities": 0,
|
||||||
|
"variants_count": 0,
|
||||||
|
"variants_top5": [],
|
||||||
|
"avg_case_duration_seconds": 0.0,
|
||||||
|
"median_case_duration_seconds": 0.0,
|
||||||
|
"avg_events_per_case": 0.0,
|
||||||
|
"activity_stats": {},
|
||||||
|
"bottlenecks": [],
|
||||||
|
"app_distribution": {},
|
||||||
|
}
|
||||||
|
|
||||||
|
df = event_log_df.copy()
|
||||||
|
|
||||||
|
# ---- Global metrics ----
|
||||||
|
total_cases = df["case:concept:name"].nunique()
|
||||||
|
total_events = len(df)
|
||||||
|
unique_activities = df["concept:name"].nunique()
|
||||||
|
|
||||||
|
# ---- Variants (PM4Py) ----
|
||||||
|
if PM4PY_AVAILABLE:
|
||||||
|
variants = pm4py.get_variants(df)
|
||||||
|
variants_count = len(variants)
|
||||||
|
# Top 5 variants by frequency
|
||||||
|
sorted_variants = sorted(variants.items(), key=lambda x: x[1], reverse=True)
|
||||||
|
variants_top5 = [
|
||||||
|
{"variant": " -> ".join(v), "count": c}
|
||||||
|
for v, c in sorted_variants[:5]
|
||||||
|
]
|
||||||
|
else:
|
||||||
|
variants_count = 0
|
||||||
|
variants_top5 = []
|
||||||
|
|
||||||
|
# ---- Duration per case ----
|
||||||
|
case_durations: List[float] = []
|
||||||
|
for _case_id, group in df.groupby("case:concept:name"):
|
||||||
|
ts = group["time:timestamp"]
|
||||||
|
if len(ts) >= 2:
|
||||||
|
duration = (ts.max() - ts.min()).total_seconds()
|
||||||
|
case_durations.append(duration)
|
||||||
|
|
||||||
|
avg_case_dur = float(pd.Series(case_durations).mean()) if case_durations else 0.0
|
||||||
|
median_case_dur = float(pd.Series(case_durations).median()) if case_durations else 0.0
|
||||||
|
avg_events_per_case = total_events / total_cases if total_cases > 0 else 0.0
|
||||||
|
|
||||||
|
# ---- Per-activity statistics ----
|
||||||
|
activity_stats: Dict[str, Dict[str, Any]] = {}
|
||||||
|
# Compute the time between each event and the next one in the same case
|
||||||
|
df_sorted = df.sort_values(["case:concept:name", "time:timestamp"])
|
||||||
|
df_sorted["next_timestamp"] = df_sorted.groupby("case:concept:name")[
|
||||||
|
"time:timestamp"
|
||||||
|
].shift(-1)
|
||||||
|
df_sorted["duration_to_next"] = (
|
||||||
|
df_sorted["next_timestamp"] - df_sorted["time:timestamp"]
|
||||||
|
).dt.total_seconds()
|
||||||
|
|
||||||
|
for activity, grp in df_sorted.groupby("concept:name"):
|
||||||
|
durations = grp["duration_to_next"].dropna()
|
||||||
|
# Drop outlier durations (> 5 min, most likely a pause)
|
||||||
|
durations = durations[durations <= 300]
|
||||||
|
stats: Dict[str, Any] = {
|
||||||
|
"count": len(grp),
|
||||||
|
"avg_duration_seconds": round(float(durations.mean()), 2) if len(durations) > 0 else 0.0,
|
||||||
|
"min_duration_seconds": round(float(durations.min()), 2) if len(durations) > 0 else 0.0,
|
||||||
|
"max_duration_seconds": round(float(durations.max()), 2) if len(durations) > 0 else 0.0,
|
||||||
|
}
|
||||||
|
activity_stats[activity] = stats
|
||||||
|
|
||||||
|
# ---- Bottlenecks (top 3 slowest activities) ----
|
||||||
|
bottlenecks = sorted(
|
||||||
|
[
|
||||||
|
{"activity": act, "avg_duration_seconds": s["avg_duration_seconds"]}
|
||||||
|
for act, s in activity_stats.items()
|
||||||
|
if s["avg_duration_seconds"] > 0
|
||||||
|
],
|
||||||
|
key=lambda x: x["avg_duration_seconds"],
|
||||||
|
reverse=True,
|
||||||
|
)[:3]
|
||||||
|
|
||||||
|
# ---- Distribution per application ----
|
||||||
|
app_distribution: Dict[str, int] = {}
|
||||||
|
if "app_name" in df.columns:
|
||||||
|
app_distribution = df["app_name"].value_counts().to_dict()
|
||||||
|
|
||||||
|
return {
|
||||||
|
"total_cases": total_cases,
|
||||||
|
"total_events": total_events,
|
||||||
|
"unique_activities": unique_activities,
|
||||||
|
"variants_count": variants_count,
|
||||||
|
"variants_top5": variants_top5,
|
||||||
|
"avg_case_duration_seconds": round(avg_case_dur, 2),
|
||||||
|
"median_case_duration_seconds": round(median_case_dur, 2),
|
||||||
|
"avg_events_per_case": round(avg_events_per_case, 1),
|
||||||
|
"activity_stats": activity_stats,
|
||||||
|
"bottlenecks": bottlenecks,
|
||||||
|
"app_distribution": app_distribution,
|
||||||
|
}
|
||||||
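Review note: the per-activity timing in `compute_kpis` pairs each event with the next event of the same case, drops gaps above 300 s, then ranks activities by average gap. The sketch below restates that logic with the standard library only, so it can be checked without pandas. It assumes events arrive as `(case_id, activity, timestamp)` tuples; `time_to_next_by_activity`, `top_bottlenecks`, and the sample data are hypothetical names used for illustration and are not part of the module.

```python
from collections import defaultdict
from datetime import datetime, timedelta

PAUSE_CUTOFF_SECONDS = 300  # same 5-minute cutoff as the diff above


def time_to_next_by_activity(events):
    """Map each activity to the list of gaps (seconds) to the next event in its case."""
    by_case = defaultdict(list)
    for case_id, activity, ts in events:
        by_case[case_id].append((ts, activity))

    durations = defaultdict(list)
    for rows in by_case.values():
        rows.sort(key=lambda row: row[0])
        # Pair every event with its successor; the last event of a case has none.
        for (ts, activity), (next_ts, _next_activity) in zip(rows, rows[1:]):
            gap = (next_ts - ts).total_seconds()
            if gap <= PAUSE_CUTOFF_SECONDS:
                durations[activity].append(gap)
    return dict(durations)


def top_bottlenecks(durations, n=3):
    """Return the n activities with the highest average gap, slowest first."""
    averages = {act: sum(gaps) / len(gaps) for act, gaps in durations.items() if gaps}
    ranked = sorted(averages.items(), key=lambda item: item[1], reverse=True)
    return [(act, round(avg, 2)) for act, avg in ranked if avg > 0][:n]


t0 = datetime(2024, 1, 1, 9, 0, 0)
events = [
    ("c1", "open", t0),
    ("c1", "fill", t0 + timedelta(seconds=10)),
    ("c1", "save", t0 + timedelta(seconds=40)),
    ("c2", "open", t0),
    ("c2", "fill", t0 + timedelta(seconds=20)),
    ("c2", "save", t0 + timedelta(seconds=620)),  # 600 s gap: treated as a pause
]
durations = time_to_next_by_activity(events)
bottlenecks = top_bottlenecks(durations)
```

Note that the final activity of each case never gets a duration, so an activity that only ever appears last (here `save`) cannot show up as a bottleneck under this definition.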
|
|
||||||
|
|
||||||
|
# ===========================================================================
|
||||||
|
# Helpers: loading JSONL sessions
|
||||||
|
# ===========================================================================
|
||||||
|
|
||||||
|
|
||||||
|
def load_jsonl_session(jsonl_path: str) -> List[dict]:
|
||||||
|
"""
|
||||||
|
Load a live_events.jsonl file into a list of dicts.
|
||||||
|
|
||||||
|
Blank or invalid lines are skipped.
|
||||||
|
"""
|
||||||
|
events: List[dict] = []
|
||||||
|
path = Path(jsonl_path)
|
||||||
|
if not path.exists():
|
||||||
|
raise FileNotFoundError(f"Fichier JSONL introuvable : {jsonl_path}")
|
||||||
|
|
||||||
|
with open(path, "r", encoding="utf-8") as f:
|
||||||
|
for line_num, line in enumerate(f, 1):
|
||||||
|
line = line.strip()
|
||||||
|
if not line:
|
||||||
|
continue
|
||||||
|
try:
|
||||||
|
events.append(json.loads(line))
|
||||||
|
except json.JSONDecodeError as e:
|
||||||
|
logger.warning("Ligne %d invalide dans %s : %s", line_num, jsonl_path, e)
|
||||||
|
|
||||||
|
logger.info("Charge %d evenements depuis %s", len(events), jsonl_path)
|
||||||
|
return events
|
||||||
|
|
||||||
|
|
||||||
|
def load_multiple_sessions(session_dirs: List[str]) -> List[dict]:
|
||||||
|
"""
|
||||||
|
Load several sessions from their directories.
|
||||||
|
|
||||||
|
Looks for a live_events.jsonl file in each directory.
|
||||||
|
"""
|
||||||
|
all_events: List[dict] = []
|
||||||
|
for session_dir in session_dirs:
|
||||||
|
jsonl_path = Path(session_dir) / "live_events.jsonl"
|
||||||
|
if jsonl_path.exists():
|
||||||
|
all_events.extend(load_jsonl_session(str(jsonl_path)))
|
||||||
|
else:
|
||||||
|
logger.warning("Pas de live_events.jsonl dans %s", session_dir)
|
||||||
|
return all_events
|
||||||
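Review note: `load_jsonl_session` skips blank lines and logs malformed ones instead of failing. The sketch below shows the same tolerant-read pattern on a throwaway file. `read_jsonl` is a hypothetical stand-in that returns the bad line numbers rather than logging them, and the sample content is invented for the example.

```python
import json
import tempfile
from pathlib import Path


def read_jsonl(path):
    """Return (events, bad_line_numbers) for a JSONL file, skipping blank lines."""
    events, bad_lines = [], []
    with open(path, "r", encoding="utf-8") as f:
        for line_num, line in enumerate(f, 1):
            line = line.strip()
            if not line:
                continue
            try:
                events.append(json.loads(line))
            except json.JSONDecodeError:
                bad_lines.append(line_num)
    return events, bad_lines


with tempfile.TemporaryDirectory() as tmp:
    sample = Path(tmp) / "live_events.jsonl"
    # Line 2 is blank and line 3 is malformed on purpose.
    sample.write_text('{"type": "click"}\n\n{broken\n{"type": "key"}\n', encoding="utf-8")
    events, bad_lines = read_jsonl(sample)
```

One caveat with this pattern: a line holding a valid JSON scalar such as `42` parses without error, so the result is only a list of dicts if every valid line is a JSON object.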
60
core/analytics/screen_change_detector.py
Normal file
@@ -0,0 +1,60 @@
|
|||||||
|
"""
|
||||||
|
Fast screen-change detection using a perceptual hash (pHash).
|
||||||
|
|
||||||
|
Uses imagehash to compute one perceptual hash per screenshot.
|
||||||
|
The Hamming distance between two hashes indicates the degree of change:
|
||||||
|
- < 5 : same screen (noise, moved cursor)
- 5-14 : minor change (scroll, popup, filled field)
- >= 15 : new screen (new window, navigation)
|
||||||
|
|
||||||
|
Performance: ~15 ms per hash on CPU for 2560x1600 screenshots.
|
||||||
|
"""
|
||||||
|
|
||||||
|
from PIL import Image
|
||||||
|
import imagehash
|
||||||
|
from typing import Tuple
|
||||||
|
from enum import Enum
|
||||||
|
|
||||||
|
|
||||||
|
class ScreenChangeLevel(Enum):
|
||||||
|
SAME = "same" # distance < 5
|
||||||
|
MINOR = "minor" # 5 <= distance < 15
|
||||||
|
MAJOR = "major" # distance >= 15
|
||||||
|
|
||||||
|
|
||||||
|
def compute_phash(image: Image.Image, hash_size: int = 8) -> imagehash.ImageHash:
|
||||||
|
"""Compute the pHash of a PIL image."""
|
||||||
|
return imagehash.phash(image, hash_size=hash_size)
|
||||||
|
|
||||||
|
|
||||||
|
def compare_screenshots(img1: Image.Image, img2: Image.Image, hash_size: int = 8) -> Tuple[int, ScreenChangeLevel]:
|
||||||
|
"""
|
||||||
|
Compare two screenshots and return the distance and the change level.
|
||||||
|
|
||||||
|
Returns:
|
||||||
|
(distance, level): Hamming distance and change level
|
||||||
|
"""
|
||||||
|
h1 = compute_phash(img1, hash_size)
|
||||||
|
h2 = compute_phash(img2, hash_size)
|
||||||
|
distance = h1 - h2
|
||||||
|
|
||||||
|
if distance < 5:
|
||||||
|
level = ScreenChangeLevel.SAME
|
||||||
|
elif distance < 15:
|
||||||
|
level = ScreenChangeLevel.MINOR
|
||||||
|
else:
|
||||||
|
level = ScreenChangeLevel.MAJOR
|
||||||
|
|
||||||
|
return distance, level
|
||||||
|
|
||||||
|
|
||||||
|
def compare_hashes(hash1: imagehash.ImageHash, hash2: imagehash.ImageHash) -> Tuple[int, ScreenChangeLevel]:
|
||||||
|
"""Compare two precomputed hashes."""
|
||||||
|
distance = hash1 - hash2
|
||||||
|
if distance < 5:
|
||||||
|
level = ScreenChangeLevel.SAME
|
||||||
|
elif distance < 15:
|
||||||
|
level = ScreenChangeLevel.MINOR
|
||||||
|
else:
|
||||||
|
level = ScreenChangeLevel.MAJOR
|
||||||
|
return distance, level
|
||||||
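Review note: the thresholds in `compare_screenshots` and `compare_hashes` act on a Hamming distance, that is, the number of differing bits between two hashes. The sketch below reproduces the bucketing on plain 64-bit integers so the boundaries can be checked without `imagehash` or Pillow. `ChangeLevel`, `hamming_distance`, and `classify` are hypothetical names for this illustration, and the bit patterns are arbitrary.

```python
from enum import Enum


class ChangeLevel(Enum):
    SAME = "same"    # distance < 5
    MINOR = "minor"  # 5 <= distance < 15
    MAJOR = "major"  # distance >= 15


def hamming_distance(hash_a: int, hash_b: int) -> int:
    """Count the bits that differ between two integer hashes."""
    return bin(hash_a ^ hash_b).count("1")


def classify(distance: int) -> ChangeLevel:
    """Apply the same cutoffs as the diff above."""
    if distance < 5:
        return ChangeLevel.SAME
    if distance < 15:
        return ChangeLevel.MINOR
    return ChangeLevel.MAJOR


base = 0xFFFF0000FFFF0000
d_same = hamming_distance(base, base ^ 0b111)    # flip 3 bits
d_minor = hamming_distance(base, base ^ 0xFF)    # flip 8 bits
d_major = hamming_distance(base, base ^ 0xFFFF)  # flip 16 bits
levels = [classify(d) for d in (d_same, d_minor, d_major)]
```

With the default `hash_size=8` the hash has 64 bits, so distances range from 0 to 64. The fixed cutoffs would likely need rescaling if a larger `hash_size` were passed.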
0
core/cognition/__init__.py
Normal file
191
core/cognition/vram_orchestrator.py
Normal file
@@ -0,0 +1,191 @@
|
|||||||
|
"""
|
||||||
|
VRAM orchestrator: handles loading and unloading models depending on the mode.
|
||||||
|
|
||||||
|
Two modes:
|
||||||
|
- SHADOW: streaming server and agent_chat active, reasoning VLM unloaded
|
||||||
|
- REPLAY: reasoning VLM (qwen2.5vl:7b) loaded, non-essential services stopped
|
||||||
|
|
||||||
|
Switching is automatic or manual depending on the context.
|
||||||
|
"""
|
||||||
|
|
||||||
|
import logging
|
||||||
|
import os
|
||||||
|
import subprocess
|
||||||
|
import time
|
||||||
|
from enum import Enum
|
||||||
|
from typing import Optional
|
||||||
|
|
||||||
|
logger = logging.getLogger(__name__)
|
||||||
|
|
||||||
|
OLLAMA_URL = os.environ.get("OLLAMA_URL", "http://localhost:11434")
|
||||||
|
REASONING_MODEL = os.environ.get("RPA_REASONING_MODEL", "qwen2.5vl:7b")
|
||||||
|
MIN_VRAM_FOR_REASONING = 5.0  # minimum free VRAM (GB) needed to load the reasoning model
|
||||||
|
|
||||||
|
|
||||||
|
class VRAMMode(Enum):
|
||||||
|
SHADOW = "shadow"
|
||||||
|
REPLAY = "replay"
|
||||||
|
|
||||||
|
|
||||||
|
class VRAMOrchestrator:
|
||||||
|
"""Manage VRAM to avoid conflicts between models."""
|
||||||
|
|
||||||
|
def __init__(self):
|
||||||
|
self._current_mode: Optional[VRAMMode] = None
|
||||||
|
self._stopped_services: list = []
|
||||||
|
|
||||||
|
def get_free_vram_gb(self) -> float:
|
||||||
|
"""Return the free VRAM in GB."""
|
||||||
|
try:
|
||||||
|
result = subprocess.run(
|
||||||
|
["nvidia-smi", "--query-gpu=memory.free", "--format=csv,noheader,nounits"],
|
||||||
|
capture_output=True, text=True, timeout=5
|
||||||
|
)
|
||||||
|
return float(result.stdout.strip().splitlines()[0]) / 1024  # first GPU only; nvidia-smi prints one line per GPU
|
||||||
|
except Exception:
|
||||||
|
return 0.0
|
||||||
|
|
||||||
|
def get_used_vram_gb(self) -> float:
|
||||||
|
"""Return the used VRAM in GB."""
|
||||||
|
try:
|
||||||
|
result = subprocess.run(
|
||||||
|
["nvidia-smi", "--query-gpu=memory.used", "--format=csv,noheader,nounits"],
|
||||||
|
capture_output=True, text=True, timeout=5
|
||||||
|
)
|
||||||
|
return float(result.stdout.strip().splitlines()[0]) / 1024  # first GPU only; nvidia-smi prints one line per GPU
|
||||||
|
except Exception:
|
||||||
|
return 0.0
|
||||||
|
|
||||||
|
def switch_to_replay(self) -> bool:
|
||||||
|
"""Switch to replay mode: free VRAM for the reasoning VLM.
|
||||||
|
|
||||||
|
1. Stop non-essential services (agent_chat)
|
||||||
|
2. Restart Ollama to unload the loaded models
|
||||||
|
3. Preload the reasoning model
|
||||||
|
"""
|
||||||
|
if self._current_mode == VRAMMode.REPLAY:
|
||||||
|
logger.info("Déjà en mode REPLAY")
|
||||||
|
return True
|
||||||
|
|
||||||
|
logger.info("Bascule en mode REPLAY...")
|
||||||
|
|
||||||
|
# Stop agent_chat if it is running
|
||||||
|
try:
|
||||||
|
result = subprocess.run(
|
||||||
|
["pgrep", "-f", "agent_chat"],
|
||||||
|
capture_output=True, text=True, timeout=5
|
||||||
|
)
|
||||||
|
pids = result.stdout.strip().split('\n')
|
||||||
|
for pid in pids:
|
||||||
|
if pid.strip():
|
||||||
|
subprocess.run(["kill", pid.strip()], timeout=5)
|
||||||
|
self._stopped_services.append(("agent_chat", pid.strip()))
|
||||||
|
logger.info(f"agent_chat stoppé (PID {pid.strip()})")
|
||||||
|
except Exception as e:
|
||||||
|
logger.debug(f"Pas d'agent_chat à stopper: {e}")
|
||||||
|
|
||||||
|
# Restart Ollama to free the memory
|
||||||
|
try:
|
||||||
|
subprocess.run(["sudo", "systemctl", "restart", "ollama"],
|
||||||
|
timeout=10, check=True)
|
||||||
|
time.sleep(2)
|
||||||
|
logger.info("Ollama redémarré")
|
||||||
|
except Exception as e:
|
||||||
|
logger.warning(f"Impossible de redémarrer Ollama: {e}")
|
||||||
|
|
||||||
|
# Check the available VRAM
|
||||||
|
free = self.get_free_vram_gb()
|
||||||
|
logger.info(f"VRAM libre: {free:.1f} Go")
|
||||||
|
|
||||||
|
if free < MIN_VRAM_FOR_REASONING:
|
||||||
|
logger.warning(f"VRAM insuffisante ({free:.1f} Go < {MIN_VRAM_FOR_REASONING} Go)")
|
||||||
|
return False
|
||||||
|
|
||||||
|
# Preload the reasoning model
|
||||||
|
try:
|
||||||
|
import requests
|
||||||
|
logger.info(f"Préchargement {REASONING_MODEL}...")
|
||||||
|
resp = requests.post(f"{OLLAMA_URL}/api/generate", json={
|
||||||
|
"model": REASONING_MODEL,
|
||||||
|
"prompt": "test",
|
||||||
|
"stream": False,
|
||||||
|
"options": {"num_predict": 1}
|
||||||
|
}, timeout=60)
|
||||||
|
if resp.status_code == 200:
|
||||||
|
logger.info(f"{REASONING_MODEL} chargé en VRAM")
|
||||||
|
free_after = self.get_free_vram_gb()
|
||||||
|
logger.info(f"VRAM libre après chargement: {free_after:.1f} Go")
|
||||||
|
except Exception as e:
|
||||||
|
logger.warning(f"Préchargement échoué: {e}")
|
||||||
|
|
||||||
|
self._current_mode = VRAMMode.REPLAY
|
||||||
|
return True
|
||||||
|
|
||||||
|
def switch_to_shadow(self) -> bool:
|
||||||
|
"""Switch to shadow mode: restart the observation services.
|
||||||
|
|
||||||
|
1. Restart Ollama (unloads the reasoning VLM)
|
||||||
|
2. Restart the stopped services
|
||||||
|
"""
|
||||||
|
if self._current_mode == VRAMMode.SHADOW:
|
||||||
|
logger.info("Déjà en mode SHADOW")
|
||||||
|
return True
|
||||||
|
|
||||||
|
logger.info("Bascule en mode SHADOW...")
|
||||||
|
|
||||||
|
# Restart Ollama
|
||||||
|
try:
|
||||||
|
subprocess.run(["sudo", "systemctl", "restart", "ollama"],
|
||||||
|
timeout=10, check=True)
|
||||||
|
time.sleep(2)
|
||||||
|
except Exception as e:
|
||||||
|
logger.warning(f"Impossible de redémarrer Ollama: {e}")
|
||||||
|
|
||||||
|
# Restart the stopped services
|
||||||
|
for service_name in dict.fromkeys(name for name, _pid in self._stopped_services):  # relaunch each service once, even if several PIDs were killed
|
||||||
|
try:
|
||||||
|
if service_name == "agent_chat":
|
||||||
|
subprocess.Popen(
|
||||||
|
["python3", "-m", "agent_chat.app"],
|
||||||
|
cwd="/home/dom/ai/rpa_vision_v3",
|
||||||
|
stdout=subprocess.DEVNULL,
|
||||||
|
stderr=subprocess.DEVNULL
|
||||||
|
)
|
||||||
|
logger.info(f"{service_name} relancé")
|
||||||
|
except Exception as e:
|
||||||
|
logger.warning(f"Impossible de relancer {service_name}: {e}")
|
||||||
|
|
||||||
|
self._stopped_services.clear()
|
||||||
|
self._current_mode = VRAMMode.SHADOW
|
||||||
|
return True
|
||||||
|
|
||||||
|
def ensure_reasoning_ready(self) -> bool:
|
||||||
|
"""Check that the reasoning VLM is ready; switch to replay mode if needed."""
|
||||||
|
free = self.get_free_vram_gb()
|
||||||
|
if free >= MIN_VRAM_FOR_REASONING:
|
||||||
|
return True
|
||||||
|
return self.switch_to_replay()
|
||||||
|
|
||||||
|
@property
|
||||||
|
def current_mode(self) -> Optional[str]:
|
||||||
|
return self._current_mode.value if self._current_mode else None
|
||||||
|
|
||||||
|
def status(self) -> dict:
|
||||||
|
return {
|
||||||
|
"mode": self.current_mode,
|
||||||
|
"vram_free_gb": round(self.get_free_vram_gb(), 1),
|
||||||
|
"vram_used_gb": round(self.get_used_vram_gb(), 1),
|
||||||
|
"reasoning_model": REASONING_MODEL,
|
||||||
|
"stopped_services": [s[0] for s in self._stopped_services],
|
||||||
|
}
|
||||||
|
|
||||||
|
|
||||||
|
# Singleton
|
||||||
|
_orchestrator: Optional[VRAMOrchestrator] = None
|
||||||
|
|
||||||
|
|
||||||
|
def get_orchestrator() -> VRAMOrchestrator:
|
||||||
|
global _orchestrator
|
||||||
|
if _orchestrator is None:
|
||||||
|
_orchestrator = VRAMOrchestrator()
|
||||||
|
return _orchestrator
|
||||||
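Review note: `get_free_vram_gb` and `get_used_vram_gb` convert the whole `nvidia-smi` output with a single `float(...)` call. With `--format=csv,noheader,nounits`, `nvidia-smi` prints one MiB value per GPU, one per line, so on a multi-GPU host that call raises `ValueError` and the method silently returns 0.0. The sketch below parses every line instead. `parse_vram_gib` is a hypothetical helper and the sample strings are invented to mimic that output.

```python
from typing import List


def parse_vram_gib(stdout: str) -> List[float]:
    """Convert nvidia-smi MiB values (one per GPU line) into GiB values."""
    values = []
    for line in stdout.splitlines():
        line = line.strip()
        if line:
            values.append(float(line) / 1024)
    return values


single_gpu = parse_vram_gib("8192\n")
multi_gpu = parse_vram_gib("8192\n4096\n")
no_gpu = parse_vram_gib("")

# A caller that wants a single figure still has to pick a policy, for example
# the GPU with the most free memory. That choice is an assumption of this sketch.
best_free = max(multi_gpu, default=0.0)
```

Returning the per-GPU list leaves the policy (first GPU, largest free block, sum) to the caller, which seems safer than baking one choice into the parser.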
260
core/cognition/working_memory.py
Normal file
@@ -0,0 +1,260 @@
|
|||||||
|
"""
|
||||||
|
Léa's working memory: cognitive context during execution.
|
||||||
|
|
||||||
|
Gives Léa awareness of "where she stands":
|
||||||
|
- Which objective she is pursuing
|
||||||
|
- Which screen she sees
|
||||||
|
- What she has just done
|
||||||
|
- What she must do next
|
||||||
|
- What she has learned along the way
|
||||||
|
|
||||||
|
Without it, every step is independent and Léa is amnesic between
two actions. With it, she reasons in context.
|
||||||
|
"""
|
||||||
|
|
||||||
|
import logging
|
||||||
|
from dataclasses import dataclass, field
|
||||||
|
from datetime import datetime
|
||||||
|
from typing import Any, Dict, List, Optional
|
||||||
|
|
||||||
|
logger = logging.getLogger(__name__)
|
||||||
|
|
||||||
|
|
||||||
|
@dataclass
|
||||||
|
class Observation:
|
||||||
|
"""What Léa observes on screen at a given moment."""
|
||||||
|
timestamp: datetime
|
||||||
|
window_title: str = ""
|
||||||
|
application: str = ""
|
||||||
|
ocr_text: str = ""
|
||||||
|
ui_pattern: Optional[str] = None
|
||||||
|
screen_description: str = ""
|
||||||
|
confidence: float = 0.0
|
||||||
|
|
||||||
|
|
||||||
|
@dataclass
|
||||||
|
class ActionRecord:
|
||||||
|
"""An action Léa has performed."""
|
||||||
|
timestamp: datetime
|
||||||
|
action_type: str
|
||||||
|
target: str = ""
|
||||||
|
result: str = ""
|
||||||
|
success: bool = True
|
||||||
|
duration_ms: float = 0
|
||||||
|
|
||||||
|
|
||||||
|
@dataclass
|
||||||
|
class CognitiveContext:
|
||||||
|
"""Full cognitive context: Léa's "thought" at a given moment.
|
||||||
|
|
||||||
|
This is the internal notepad that is fed back in at every decision.
|
||||||
|
The VLM receives this context so it can reason with full knowledge of the situation.
|
||||||
|
"""
|
||||||
|
|
||||||
|
# Overall objective (what Léa is trying to accomplish)
|
||||||
|
objective: str = ""
|
||||||
|
|
||||||
|
# Current step in the plan
|
||||||
|
current_step: int = 0
|
||||||
|
total_steps: int = 0
|
||||||
|
current_step_description: str = ""
|
||||||
|
|
||||||
|
# What Léa sees right now
|
||||||
|
current_observation: Optional[Observation] = None
|
||||||
|
|
||||||
|
# History of the last N actions (short-term memory)
|
||||||
|
action_history: List[ActionRecord] = field(default_factory=list)
|
||||||
|
max_history: int = 10
|
||||||
|
|
||||||
|
# What Léa has learned during this session
|
||||||
|
learned_facts: List[str] = field(default_factory=list)
|
||||||
|
|
||||||
|
# Plan: the remaining steps
|
||||||
|
remaining_steps: List[str] = field(default_factory=list)
|
||||||
|
|
||||||
|
# Emotional state / confidence
|
||||||
|
confidence: float = 1.0
|
||||||
|
needs_help: bool = False
|
||||||
|
help_reason: str = ""
|
||||||
|
|
||||||
|
# Timing
|
||||||
|
session_id: str = ""
|
||||||
|
machine_id: str = ""
|
||||||
|
started_at: Optional[datetime] = None
|
||||||
|
step_started_at: Optional[datetime] = None
|
||||||
|
step_durations: Dict[str, List[float]] = field(default_factory=dict)
|
||||||
|
|
||||||
|
# What Léa should see on screen (expected vs. actual comparison)
|
||||||
|
expected_screen: str = ""
|
||||||
|
|
||||||
|
def record_action(self, action_type: str, target: str = "",
|
||||||
|
result: str = "", success: bool = True,
|
||||||
|
duration_ms: float = 0):
|
||||||
|
"""Record an action in the history."""
|
||||||
|
self.action_history.append(ActionRecord(
|
||||||
|
timestamp=datetime.now(),
|
||||||
|
action_type=action_type,
|
||||||
|
target=target,
|
||||||
|
result=result,
|
||||||
|
success=success,
|
||||||
|
duration_ms=duration_ms,
|
||||||
|
))
|
||||||
|
if len(self.action_history) > self.max_history:
|
||||||
|
self.action_history = self.action_history[-self.max_history:]
|
||||||
|
|
||||||
|
if not success:
|
||||||
|
self.confidence = max(0.0, self.confidence - 0.2)
|
||||||
|
else:
|
||||||
|
self.confidence = min(1.0, self.confidence + 0.05)
|
||||||
|
|
||||||
|
def observe(self, window_title: str = "", application: str = "",
|
||||||
|
ocr_text: str = "", ui_pattern: Optional[str] = None,
|
||||||
|
screen_description: str = ""):
|
||||||
|
"""Update the current observation."""
|
||||||
|
self.current_observation = Observation(
|
||||||
|
timestamp=datetime.now(),
|
||||||
|
window_title=window_title,
|
||||||
|
application=application,
|
||||||
|
ocr_text=ocr_text,
|
||||||
|
ui_pattern=ui_pattern,
|
||||||
|
screen_description=screen_description,
|
||||||
|
)
|
||||||
|
|
||||||
|
def advance_step(self):
|
||||||
|
"""Move on to the next step of the plan."""
|
||||||
|
# Record the duration of the previous step
|
||||||
|
if self.step_started_at:
|
||||||
|
duration = (datetime.now() - self.step_started_at).total_seconds()
|
||||||
|
step_key = self.current_step_description or f"step_{self.current_step}"
|
||||||
|
self.step_durations.setdefault(step_key, []).append(duration)
|
||||||
|
|
||||||
|
self.current_step += 1
|
||||||
|
self.step_started_at = datetime.now()
|
||||||
|
if self.remaining_steps:
|
||||||
|
self.current_step_description = self.remaining_steps.pop(0)
|
||||||
|
|
||||||
|
def get_step_timing(self) -> Optional[Dict[str, Any]]:
|
||||||
|
"""Return timing information for the current step."""
|
||||||
|
if not self.step_started_at:
|
||||||
|
return None
|
||||||
|
|
||||||
|
elapsed = (datetime.now() - self.step_started_at).total_seconds()
|
||||||
|
step_key = self.current_step_description or f"step_{self.current_step}"
|
||||||
|
history = self.step_durations.get(step_key, [])
|
||||||
|
avg = sum(history) / len(history) if history else None
|
||||||
|
|
||||||
|
result = {"elapsed_seconds": elapsed}
|
||||||
|
if avg:
|
||||||
|
result["avg_previous"] = avg
|
||||||
|
result["is_slow"] = elapsed > avg * 2
|
||||||
|
return result
|
||||||
|
|
||||||
|
def set_expected_screen(self, description: str):
|
||||||
|
"""Set what Léa should see on screen for this step."""
|
||||||
|
self.expected_screen = description
|
||||||
|
|
||||||
|
def check_screen_matches_expected(self) -> Optional[bool]:
|
||||||
|
"""Compare the current observation with the expected screen."""
|
||||||
|
if not self.expected_screen or not self.current_observation:
|
||||||
|
return None
|
||||||
|
obs_text = (self.current_observation.window_title + " " +
|
||||||
|
self.current_observation.ocr_text).lower()
|
||||||
|
expected_words = self.expected_screen.lower().split()
|
||||||
|
matches = sum(1 for w in expected_words if w in obs_text)
|
||||||
|
return matches / max(len(expected_words), 1) > 0.3
|
||||||
|
|
||||||
|
def learn(self, fact: str):
|
||||||
|
"""Record a fact learned during execution."""
|
||||||
|
if fact not in self.learned_facts:
|
||||||
|
self.learned_facts.append(fact)
|
||||||
|
logger.info(f"Fait appris: {fact}")
|
||||||
|
|
||||||
|
def ask_for_help(self, reason: str):
|
||||||
|
"""Flag that Léa needs help."""
|
||||||
|
self.needs_help = True
|
||||||
|
self.help_reason = reason
|
||||||
|
self.confidence = max(0.0, self.confidence - 0.3)
|
||||||
|
logger.warning(f"Léa demande de l'aide: {reason}")
|
||||||
|
|
||||||
|
def to_prompt_context(self) -> str:
|
||||||
|
"""Build the context to inject into the VLM prompt.
|
||||||
|
|
||||||
|
This text is what gives the VLM awareness of the situation.
|
||||||
|
"""
|
||||||
|
lines = []
|
||||||
|
|
||||||
|
if self.objective:
|
||||||
|
lines.append(f"OBJECTIF : {self.objective}")
|
||||||
|
|
||||||
|
if self.current_step > 0:
|
||||||
|
lines.append(f"PROGRESSION : étape {self.current_step}/{self.total_steps}")
|
||||||
|
if self.current_step_description:
|
||||||
|
lines.append(f"ÉTAPE EN COURS : {self.current_step_description}")
|
||||||
|
|
||||||
|
if self.current_observation:
|
||||||
|
obs = self.current_observation
|
||||||
|
if obs.window_title:
|
||||||
|
lines.append(f"FENÊTRE ACTIVE : {obs.window_title}")
|
||||||
|
if obs.application:
|
||||||
|
lines.append(f"APPLICATION : {obs.application}")
|
||||||
|
if obs.ui_pattern:
|
||||||
|
lines.append(f"DIALOGUE DÉTECTÉ : {obs.ui_pattern}")
|
||||||
|
|
||||||
|
if self.action_history:
|
||||||
|
last_actions = self.action_history[-3:]
|
||||||
|
lines.append("DERNIÈRES ACTIONS :")
|
||||||
|
for a in last_actions:
|
||||||
|
status = "OK" if a.success else "ÉCHEC"
|
||||||
|
lines.append(f" - {a.action_type} '{a.target}' → {status}")
|
||||||
|
|
||||||
|
if self.learned_facts:
|
||||||
|
lines.append("FAITS APPRIS :")
|
||||||
|
for fact in self.learned_facts[-5:]:
|
||||||
|
lines.append(f" - {fact}")
|
||||||
|
|
||||||
|
if self.remaining_steps:
|
||||||
|
lines.append("PROCHAINES ÉTAPES :")
|
||||||
|
for step in self.remaining_steps[:3]:
|
||||||
|
lines.append(f" - {step}")
|
||||||
|
|
||||||
|
timing = self.get_step_timing()
|
||||||
|
if timing:
|
||||||
|
lines.append(f"TEMPS ÉTAPE : {timing['elapsed_seconds']:.1f}s")
|
||||||
|
if timing.get('avg_previous'):
|
||||||
|
lines.append(f"MOYENNE PRÉCÉDENTE : {timing['avg_previous']:.1f}s")
|
||||||
|
if timing.get('is_slow'):
|
||||||
|
lines.append("⚠ ÉTAPE ANORMALEMENT LENTE")
|
||||||
|
|
||||||
|
if self.expected_screen:
|
||||||
|
match = self.check_screen_matches_expected()
|
||||||
|
if match is False:
|
||||||
|
lines.append(f"⚠ ÉCRAN INATTENDU (attendu: {self.expected_screen})")
|
||||||
|
elif match is True:
|
||||||
|
lines.append(f"ÉCRAN CONFORME : {self.expected_screen}")
|
||||||
|
|
||||||
|
lines.append(f"CONFIANCE : {self.confidence:.0%}")
|
||||||
|
|
||||||
|
if self.needs_help:
|
||||||
|
lines.append(f"BESOIN D'AIDE : {self.help_reason}")
|
||||||
|
|
||||||
|
return "\n".join(lines)
|
||||||
|
|
||||||
|
def to_dict(self) -> Dict[str, Any]:
|
||||||
|
"""Serialize the context for storage/transport."""
|
||||||
|
return {
|
||||||
|
"objective": self.objective,
|
||||||
|
"current_step": self.current_step,
|
||||||
|
"total_steps": self.total_steps,
|
||||||
|
"current_step_description": self.current_step_description,
|
||||||
|
"confidence": self.confidence,
|
||||||
|
"needs_help": self.needs_help,
|
||||||
|
"help_reason": self.help_reason,
|
||||||
|
"action_count": len(self.action_history),
|
||||||
|
"learned_facts": self.learned_facts,
|
||||||
|
"remaining_steps": self.remaining_steps,
|
||||||
|
"last_observation": {
|
||||||
|
"window_title": self.current_observation.window_title,
|
||||||
|
"application": self.current_observation.application,
|
||||||
|
"ui_pattern": self.current_observation.ui_pattern,
|
||||||
|
} if self.current_observation else None,
|
||||||
|
}
|
||||||
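Review note: two small heuristics drive `CognitiveContext`: the confidence update in `record_action` (minus 0.2 on failure, plus 0.05 on success, clamped to [0, 1]) and the keyword overlap test in `check_screen_matches_expected` (more than 30% of the expected words must appear in the window title plus OCR text). The sketch below isolates both as pure functions so their behaviour is easy to probe. `update_confidence` and `screen_matches` are hypothetical names and the sample strings are invented.

```python
def update_confidence(confidence: float, success: bool) -> float:
    """Asymmetric update: failures cost four times what a success earns."""
    if success:
        return min(1.0, confidence + 0.05)
    return max(0.0, confidence - 0.2)


def screen_matches(expected: str, observed_text: str, threshold: float = 0.3) -> bool:
    """True when the share of expected words found in the observed text exceeds threshold."""
    words = expected.lower().split()
    haystack = observed_text.lower()
    hits = sum(1 for word in words if word in haystack)  # substring test, not whole-word
    return hits / max(len(words), 1) > threshold


after_failure = update_confidence(1.0, success=False)
after_recovery = update_confidence(after_failure, success=True)
capped = update_confidence(1.0, success=True)

one_of_three = screen_matches("Save dialog invoice", "Invoice 2024 - Notepad")
none_of_four = screen_matches("Save dialog confirm export", "Invoice 2024 - Notepad")
```

Because the match is a substring test, short expected words such as "a" or "de" match almost any screen. Filtering very short tokens before counting might make the 30% threshold more meaningful.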
@@ -58,8 +58,18 @@ class CLIPEmbedder(EmbedderBase):
|
|||||||
"Install it with: pip install open-clip-torch"
|
"Install it with: pip install open-clip-torch"
|
||||||
)
|
)
|
||||||
|
|
||||||
# Default to CPU to save GPU for vision models (Qwen3-VL, etc.)
|
|
||||||
if device is None:
|
if device is None:
|
||||||
|
try:
|
||||||
|
import torch
|
||||||
|
if torch.cuda.is_available():
|
||||||
|
free_vram = torch.cuda.mem_get_info()[0] / 1024**3
|
||||||
|
if free_vram > 1.5:
|
||||||
|
device = "cuda"
|
||||||
|
else:
|
||||||
|
device = "cpu"
|
||||||
|
else:
|
||||||
|
device = "cpu"
|
||||||
|
except Exception:
|
||||||
device = "cpu"
|
device = "cpu"
|
||||||
|
|
||||||
self.model_name = model_name
|
self.model_name = model_name
|
||||||
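The hunk above replaces the fixed CPU default with a check on free GPU memory. The sketch below separates the decision from the query so the rule can be tested without a GPU. The names `pick_device` and `detect_device` are illustrative and not part of the diff. The 1.5 GB threshold is taken from the hunk.

```python
from typing import Optional


def pick_device(cuda_available: bool, free_vram_gb: Optional[float],
                min_free_gb: float = 1.5) -> str:
    """Return "cuda" only when a GPU is present and has enough free memory."""
    if cuda_available and free_vram_gb is not None and free_vram_gb > min_free_gb:
        return "cuda"
    return "cpu"


def detect_device() -> str:
    """Query torch if it is installed; fall back to CPU on any failure."""
    try:
        import torch
        if not torch.cuda.is_available():
            return "cpu"
        # mem_get_info() returns (free_bytes, total_bytes) for the current device.
        free_bytes, _total_bytes = torch.cuda.mem_get_info()
        return pick_device(True, free_bytes / 1024**3)
    except Exception:
        return "cpu"
```

Note that the check runs once at construction time. Free memory can drop later if another model loads, so this is a best-effort guard and does not reserve anything.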
|
|||||||
@@ -10,6 +10,7 @@ from .error_handler import ErrorHandler, ErrorType, RecoveryStrategy
|
|||||||
from .workflow_runner import WorkflowRunner, RunResult, RunStatus, RunnerConfig
|
from .workflow_runner import WorkflowRunner, RunResult, RunStatus, RunnerConfig
|
||||||
from .dag_executor import DAGExecutor, WorkflowStep, StepType, StepStatus, DAGExecutionResult
|
from .dag_executor import DAGExecutor, WorkflowStep, StepType, StepStatus, DAGExecutionResult
|
||||||
from .llm_actions import LLMActionHandler
|
from .llm_actions import LLMActionHandler
|
||||||
|
from .observe_reason_act import ORALoop, Observation, Decision, VerificationResult, LoopResult
|
||||||
|
|
||||||
# Deferred import to avoid a circular import with pipeline
|
# Deferred import to avoid a circular import with pipeline
|
||||||
def _get_execution_loop():
|
def _get_execution_loop():
|
||||||
@@ -34,5 +35,11 @@ __all__ = [
|
|||||||
'StepStatus',
|
'StepStatus',
|
||||||
'DAGExecutionResult',
|
'DAGExecutionResult',
|
||||||
'LLMActionHandler',
|
'LLMActionHandler',
|
||||||
|
# ORA: Observe-Reason-Act loop with verification
|
||||||
|
'ORALoop',
|
||||||
|
'Observation',
|
||||||
|
'Decision',
|
||||||
|
'VerificationResult',
|
||||||
|
'LoopResult',
|
||||||
# ExecutionLoop is available by importing its module directly
|
# ExecutionLoop is available by importing its module directly
|
||||||
]
|
]
|
||||||
|
|||||||
@@ -654,7 +654,8 @@ class ActionExecutor:
|
|||||||
if PYAUTOGUI_AVAILABLE:
|
if PYAUTOGUI_AVAILABLE:
|
||||||
pyautogui.click(click_x, click_y)
|
pyautogui.click(click_x, click_y)
|
||||||
time.sleep(0.2)
|
time.sleep(0.2)
|
||||||
pyautogui.write(text, interval=0.05)
|
from .input_handler import safe_type_text
|
||||||
|
safe_type_text(text)
|
||||||
else:
|
else:
|
||||||
logger.info(f" (Simulated click at {click_x:.0f}, {click_y:.0f})")
|
logger.info(f" (Simulated click at {click_x:.0f}, {click_y:.0f})")
|
||||||
logger.info(f" (Simulated typing: {text[:50]}...)")
|
logger.info(f" (Simulated typing: {text[:50]}...)")
|
||||||
|
|||||||
757
core/execution/input_handler.py
Normal file
@@ -0,0 +1,757 @@
|
|||||||
|
"""
|
||||||
|
Shared module for text input and dialog handling.
|
||||||
|
|
||||||
|
Used by both executors:
|
||||||
|
- VWB executor (visual_workflow_builder/backend/api_v3/execute.py)
|
||||||
|
- Core executor (core/execution/action_executor.py)
|
||||||
|
|
||||||
|
Guarantees the same AZERTY/VM/Citrix behavior everywhere.
|
||||||
|
"""
|
||||||
|
|
||||||
|
import logging
|
||||||
|
import subprocess
|
||||||
|
import shutil
|
||||||
|
import time
|
||||||
|
from typing import Any, Dict, List, Optional
|
||||||
|
|
||||||
|
logger = logging.getLogger(__name__)
|
||||||
|
|
||||||
|
try:
|
||||||
|
import pyautogui
|
||||||
|
PYAUTOGUI_AVAILABLE = True
|
||||||
|
except Exception:
|
||||||
|
# pyautogui can raise Xlib.error.DisplayConnectionError (not an ImportError)
|
||||||
|
# when X is unreachable, which is typical of a server-side systemd service.
|
||||||
|
PYAUTOGUI_AVAILABLE = False
|
||||||
|
|
||||||
|
try:
|
||||||
|
import mss
|
||||||
|
MSS_AVAILABLE = True
|
||||||
|
except ImportError:
|
||||||
|
MSS_AVAILABLE = False
|
||||||
|
|
||||||
|
try:
|
||||||
|
from PIL import Image as PILImage
|
||||||
|
PIL_AVAILABLE = True
|
||||||
|
except ImportError:
|
||||||
|
PIL_AVAILABLE = False
|
||||||
|
|
||||||
|
|
||||||
|
def safe_type_text(text: str):
|
||||||
|
"""Saisie de texte compatible VM/Citrix et claviers AZERTY/QWERTY.
|
||||||
|
|
||||||
|
Priority:
|
||||||
|
1. xdotool type with a layout refresh → passes through SPICE/QEMU VMs
|
||||||
|
2. Clipboard (xclip) + Ctrl+V → fallback
|
||||||
|
3. pyautogui.write() → last resort
|
||||||
|
"""
|
||||||
|
if not text:
|
||||||
|
return
|
||||||
|
|
||||||
|
# Method 1: xdotool type with a keyboard layout refresh
|
||||||
|
if shutil.which('xdotool') and shutil.which('setxkbmap'):
|
||||||
|
try:
|
||||||
|
subprocess.run(['setxkbmap', 'fr'], timeout=2)
|
||||||
|
subprocess.run(
|
||||||
|
['xdotool', 'type', '--delay', '0', '--clearmodifiers', '--', text],
|
||||||
|
timeout=max(30, len(text) * 0.05),
|
||||||
|
check=True
|
||||||
|
)
|
||||||
|
logger.debug(f"Saisie via xdotool type ({len(text)} car.)")
|
||||||
|
return
|
||||||
|
except Exception as e:
|
||||||
|
logger.debug(f"xdotool type échoué: {e}")
|
||||||
|
|
||||||
|
# Method 2: clipboard
|
||||||
|
xclip = shutil.which('xclip')
|
||||||
|
if xclip and PYAUTOGUI_AVAILABLE:
|
||||||
|
try:
|
||||||
|
p = subprocess.Popen(
|
||||||
|
['xclip', '-selection', 'clipboard'],
|
||||||
|
stdin=subprocess.PIPE,
|
||||||
|
stdout=subprocess.DEVNULL,
|
||||||
|
stderr=subprocess.DEVNULL
|
||||||
|
)
|
||||||
|
p.stdin.write(text.encode('utf-8'))
|
||||||
|
p.stdin.close()
|
||||||
|
time.sleep(0.2)
|
||||||
|
pyautogui.hotkey('ctrl', 'v')
|
||||||
|
time.sleep(0.3)
|
||||||
|
logger.debug(f"Saisie via presse-papier ({len(text)} car.)")
|
||||||
|
return
|
||||||
|
except Exception as e:
|
||||||
|
logger.debug(f"xclip échoué: {e}")
|
||||||
|
|
||||||
|
# Method 3: pyautogui
|
||||||
|
if PYAUTOGUI_AVAILABLE:
|
||||||
|
logger.warning("Saisie via pyautogui.write() (AZERTY non garanti)")
|
||||||
|
pyautogui.write(text, interval=0.02)
|
||||||
|
else:
|
||||||
|
logger.warning(f"Aucune méthode de saisie disponible pour: {text[:50]}")
|
||||||
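The first method in `safe_type_text` passes the text to `xdotool type` after a `--` separator and scales the timeout with the text length. A minimal sketch of how that call could be assembled is shown here. `build_xdotool_call` is a hypothetical helper, not a function from the diff. The flags and the `max(30, len(text) * 0.05)` rule are copied from the code above.

```python
from typing import List, Tuple


def build_xdotool_call(text: str, per_char_s: float = 0.05,
                       min_timeout_s: float = 30.0) -> Tuple[List[str], float]:
    """Return the argv list and timeout for typing `text` with xdotool."""
    # "--" stops option parsing so text starting with "-" is typed literally.
    argv = ["xdotool", "type", "--delay", "0", "--clearmodifiers", "--", text]
    # Short texts get the floor; long texts get time proportional to length.
    timeout = max(min_timeout_s, len(text) * per_char_s)
    return argv, timeout


argv, timeout = build_xdotool_call("-v été")
```

Passing the text as a single list element (no shell) means quotes and spaces in the text need no escaping. The result can be handed to `subprocess.run(argv, timeout=timeout, check=True)`.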
|
|
||||||
|
|
||||||
|
def check_screen_for_patterns() -> Optional[Dict[str, Any]]:
|
||||||
|
"""Vérifie si l'écran contient un pattern UI connu (dialogue, popup).
|
||||||
|
|
||||||
|
Captures the screen, extracts its text via OCR, and looks for a pattern
|
||||||
|
in the UIPatternLibrary.
|
||||||
|
|
||||||
|
Returns:
|
||||||
|
Dict with the matched pattern, or None.
|
||||||
|
"""
|
||||||
|
try:
|
||||||
|
from core.knowledge.ui_patterns import UIPatternLibrary
|
||||||
|
import mss
|
||||||
|
from PIL import Image
|
||||||
|
|
||||||
|
lib = UIPatternLibrary()
|
||||||
|
|
||||||
|
with mss.mss() as sct:
|
||||||
|
monitor = sct.monitors[0]
|
||||||
|
screenshot = sct.grab(monitor)
|
||||||
|
screen = Image.frombytes('RGB', screenshot.size, screenshot.bgra, 'raw', 'BGRX')
|
||||||
|
|
||||||
|
try:
|
||||||
|
# Try docTR first (it may be importable from different paths)
|
||||||
|
try:
|
||||||
|
from services.ocr_service import ocr_extract_text
|
||||||
|
except ImportError:
|
||||||
|
from core.extraction.field_extractor import FieldExtractor
|
||||||
|
extractor = FieldExtractor()
|
||||||
|
ocr_extract_text = lambda img: extractor.extract_text_from_image(img)
|
||||||
|
|
||||||
|
ocr_text = ocr_extract_text(screen)
|
||||||
|
except ImportError:
|
||||||
|
logger.debug("OCR non disponible pour pattern check")
|
||||||
|
return None
|
||||||
|
|
||||||
|
if not ocr_text or len(ocr_text) < 5:
|
||||||
|
return None
|
||||||
|
|
||||||
|
pattern = lib.find_pattern(ocr_text)
|
||||||
|
if pattern and pattern['category'] in ('dialog', 'popup'):
|
||||||
|
print(f"🧠 [PatternCheck] Détecté: '{pattern['pattern']}' → {pattern['action']} '{pattern['target']}'")
|
||||||
|
return pattern
|
||||||
|
|
||||||
|
return None
|
||||||
|
|
||||||
|
except Exception as e:
|
||||||
|
print(f"⚠️ [PatternCheck] Erreur: {e}")
|
||||||
|
return None
|
||||||
|
|
||||||
|
|
||||||
|
def handle_detected_pattern(pattern: Dict[str, Any]) -> bool:
|
||||||
|
"""Gère automatiquement un pattern UI détecté.
|
||||||
|
|
||||||
|
Locates the target button via OCR (its actual on-screen position).
|
||||||
|
100% vision-based: no hardcoded coordinates.
|
||||||
|
|
||||||
|
Returns:
|
||||||
|
True if the pattern was handled successfully.
|
||||||
|
"""
|
||||||
|
if not PYAUTOGUI_AVAILABLE:
|
||||||
|
logger.warning("pyautogui non disponible — impossible de gérer le pattern")
|
||||||
|
return False
|
||||||
|
|
||||||
|
action = pattern.get('action')
|
||||||
|
target = pattern.get('target', '')
|
||||||
|
alternatives = pattern.get('alternatives', [])
|
||||||
|
|
||||||
|
if action == 'click':
|
||||||
|
candidates_labels = [target] + alternatives
|
||||||
|
print(f"🔧 [Réflexe/handle] Recherche bouton parmi: {candidates_labels}")
|
||||||
|
|
||||||
|
try:
|
||||||
|
import mss
|
||||||
|
import numpy as np
|
||||||
|
from PIL import Image
|
||||||
|
|
||||||
|
with mss.mss() as sct:
|
||||||
|
monitor = sct.monitors[0]
|
||||||
|
screenshot = sct.grab(monitor)
|
||||||
|
screen = Image.frombytes('RGB', screenshot.size, screenshot.bgra, 'raw', 'BGRX')
|
||||||
|
|
||||||
|
# EasyOCR (fast, good quality on GUIs) with a docTR fallback.
|
||||||
|
# gpu=True: aligned with dialog_handler.py and title_verifier.py.
|
||||||
|
# VRAM cost ~0.5 GB, within the RTX 5070 budget (see deploy/VRAM_BUDGET.md).
|
||||||
|
words = []
|
||||||
|
try:
|
||||||
|
import easyocr
|
||||||
|
_reader = easyocr.Reader(['fr', 'en'], gpu=True, verbose=False)
|
||||||
|
results = _reader.readtext(np.array(screen))
|
||||||
|
for (bbox_pts, text, conf) in results:
|
||||||
|
if not text or len(text.strip()) < 1:
|
||||||
|
continue
|
||||||
|
x1 = int(min(p[0] for p in bbox_pts))
|
||||||
|
y1 = int(min(p[1] for p in bbox_pts))
|
||||||
|
x2 = int(max(p[0] for p in bbox_pts))
|
||||||
|
y2 = int(max(p[1] for p in bbox_pts))
|
||||||
|
words.append({'text': text.strip(), 'bbox': [x1, y1, x2, y2]})
|
||||||
|
except ImportError:
|
||||||
|
try:
|
||||||
|
from services.ocr_service import ocr_extract_words
|
||||||
|
words = ocr_extract_words(screen) or []
|
||||||
|
except ImportError:
|
||||||
|
pass
|
||||||
|
|
||||||
|
print(f"🔧 [Réflexe/handle] {len(words)} mots OCR détectés")
|
||||||
|
|
||||||
|
# Collect every match and take the lowest one (buttons sit at the bottom of a dialog)
|
||||||
|
all_matches = []
|
||||||
|
|
||||||
|
for candidate in candidates_labels:
|
||||||
|
candidate_lower = candidate.lower()
|
||||||
|
for word in words:
|
||||||
|
word_text = word['text'].lower()
|
||||||
|
if len(word_text) < 2 or len(candidate_lower) < 2:
|
||||||
|
continue
|
||||||
|
# Exact match or substring containment
|
||||||
|
if word_text == candidate_lower or candidate_lower in word_text or word_text in candidate_lower:
|
||||||
|
x1, y1, x2, y2 = word['bbox']
|
||||||
|
all_matches.append({
|
||||||
|
'text': word['text'],
|
||||||
|
'x': int((x1 + x2) / 2),
|
||||||
|
'y': int((y1 + y2) / 2),
|
||||||
|
'candidate': candidate,
|
||||||
|
})
|
||||||
|
|
||||||
|
if all_matches:
|
||||||
|
best = max(all_matches, key=lambda m: m['y'])
|
||||||
|
print(f"✅ [Réflexe/handle] Clic sur '{best['text']}' à ({best['x']}, {best['y']})")
|
||||||
|
pyautogui.click(best['x'], best['y'])
|
||||||
|
time.sleep(1.0)
|
||||||
|
return True
|
||||||
|
|
||||||
|
print(f"⚠️ [Réflexe/handle] Bouton '{target}' introuvable parmi {[w['text'] for w in words[:15]]}")
|
||||||
|
return False
|
||||||
|
|
||||||
|
except Exception as e:
|
||||||
|
print(f"⚠️ [Réflexe/handle] Erreur: {e}")
|
||||||
|
return False
|
||||||
|
|
||||||
|
elif action == 'hotkey':
|
||||||
|
keys = target.split('+')
|
||||||
|
logger.info(f"Raccourci automatique: {target}")
|
||||||
|
pyautogui.hotkey(*keys)
|
||||||
|
time.sleep(0.5)
|
||||||
|
return True
|
||||||
|
|
||||||
|
return False
|
||||||
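The click branch of `handle_detected_pattern` matches OCR words against the candidate labels and then clicks the lowest match, on the assumption that dialog buttons sit below the dialog text. The selection step can be sketched on its own as follows. `pick_lowest_match` is a hypothetical name, and the word format (`text` plus `[x1, y1, x2, y2]` bbox) mirrors the dicts built in the code above.

```python
from typing import Dict, List, Optional


def pick_lowest_match(words: List[Dict], labels: List[str]) -> Optional[Dict]:
    """Return the center of the lowest OCR word matching any label, or None."""
    matches = []
    for label in labels:
        label_l = label.lower()
        for word in words:
            text_l = word["text"].lower()
            if len(text_l) < 2 or len(label_l) < 2:
                continue  # single characters match far too often
            if label_l in text_l or text_l in label_l:
                x1, y1, x2, y2 = word["bbox"]
                matches.append({"text": word["text"],
                                "x": (x1 + x2) // 2, "y": (y1 + y2) // 2})
    # Larger y means lower on screen, where dialog buttons usually sit.
    return max(matches, key=lambda m: m["y"]) if matches else None


words = [
    {"text": "Save changes?", "bbox": [100, 50, 300, 70]},   # dialog title
    {"text": "Save", "bbox": [120, 200, 180, 230]},          # button
    {"text": "Cancel", "bbox": [200, 200, 270, 230]},
]
best = pick_lowest_match(words, ["Save", "Enregistrer"])
```

Here both the title and the button contain "Save", and the lowest-match rule picks the button. The heuristic can misfire when matching text appears below the dialog, for example in a status bar.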
|
|
||||||
|
|
||||||
|
def vlm_reason_about_screen(objective: str = "", context: str = "") -> Optional[Dict[str, Any]]:
|
||||||
|
"""Demande au VLM de raisonner sur l'écran actuel et proposer une action.
|
||||||
|
|
||||||
|
Used when the reflexes (patterns) are not enough.
|
||||||
|
The VLM looks at the screen and decides what to do.
|
||||||
|
|
||||||
|
Args:
|
||||||
|
objective: What Léa is trying to do (e.g. "cliquer sur Enregistrer")
|
||||||
|
context: Additional context (e.g. "un dialogue est apparu")
|
||||||
|
|
||||||
|
Returns:
|
||||||
|
Dict with 'action', 'target', 'reasoning', or None if the VLM cannot help.
|
||||||
|
"""
|
||||||
|
try:
|
||||||
|
import mss
|
||||||
|
import requests
|
||||||
|
import json
|
||||||
|
import base64
|
||||||
|
import io
|
||||||
|
import os
|
||||||
|
from PIL import Image
|
||||||
|
|
||||||
|
with mss.mss() as sct:
|
||||||
|
monitor = sct.monitors[0]
|
||||||
|
screenshot = sct.grab(monitor)
|
||||||
|
screen = Image.frombytes('RGB', screenshot.size, screenshot.bgra, 'raw', 'BGRX')
|
||||||
|
|
||||||
|
buffer = io.BytesIO()
|
||||||
|
screen.save(buffer, format='JPEG', quality=70)
|
||||||
|
image_b64 = base64.b64encode(buffer.getvalue()).decode('utf-8')
|
||||||
|
|
||||||
|
prompt = f"""Analyse cet écran et dis-moi quoi faire.
|
||||||
|
|
||||||
|
Objectif : {objective or "Interagir avec l'interface visible"}
|
||||||
|
Contexte : {context or "Aucun contexte supplémentaire"}
|
||||||
|
|
||||||
|
Réponds en JSON strict :
|
||||||
|
{{
|
||||||
|
"action": "click" ou "type" ou "wait" ou "nothing",
|
||||||
|
"target": "texte exact du bouton ou champ à cliquer",
|
||||||
|
"reasoning": "explication courte de ton choix"
|
||||||
|
}}
|
||||||
|
|
||||||
|
Si tu vois un dialogue ou une popup, indique quel bouton cliquer.
|
||||||
|
Si l'écran est normal sans action nécessaire, réponds action="nothing".
|
||||||
|
Réponds UNIQUEMENT le JSON, pas d'explication."""
|
||||||
|
|
||||||
|
ollama_url = os.environ.get("OLLAMA_URL", "http://localhost:11434")
|
||||||
|
model = os.environ.get("RPA_REASONING_MODEL", "qwen2.5vl:7b")
|
||||||
|
|
||||||
|
response = requests.post(
|
||||||
|
f"{ollama_url}/api/generate",
|
||||||
|
json={
|
||||||
|
"model": model,
|
||||||
|
"prompt": prompt,
|
||||||
|
"images": [image_b64],
|
||||||
|
"stream": False,
|
||||||
|
"options": {"temperature": 0.1, "num_predict": 200}
|
||||||
|
},
|
||||||
|
timeout=30
|
||||||
|
)
|
||||||
|
|
||||||
|
if response.status_code != 200:
|
||||||
|
logger.warning(f"VLM reasoning failed: HTTP {response.status_code}")
|
||||||
|
return None
|
||||||
|
|
||||||
|
result = response.json()
|
||||||
|
text = result.get('response', '').strip()
|
||||||
|
|
||||||
|
import re
|
||||||
|
match = re.search(r'\{[\s\S]*\}', text)
|
||||||
|
if match:
|
||||||
|
parsed = json.loads(match.group())
|
||||||
|
logger.info(f"VLM reasoning: {parsed.get('action')} '{parsed.get('target')}' — {parsed.get('reasoning', '')[:80]}")
|
||||||
|
return parsed
|
||||||
|
|
||||||
|
logger.debug(f"VLM response not parseable: {text[:100]}")
|
||||||
|
return None
|
||||||
|
|
||||||
|
except Exception as e:
|
||||||
|
logger.debug(f"VLM reasoning failed: {e}")
|
||||||
|
return None
|
||||||
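`vlm_reason_about_screen` recovers the JSON answer with a greedy regex, because models often wrap the object in extra prose. The sketch below isolates that parsing step. `extract_json_object` is a hypothetical helper name, and the explicit `JSONDecodeError` handling is an addition (the original relies on its outer `except Exception`).

```python
import json
import re
from typing import Any, Dict, Optional


def extract_json_object(text: str) -> Optional[Dict[str, Any]]:
    """Parse the span from the first '{' to the last '}' in `text`."""
    match = re.search(r"\{[\s\S]*\}", text)  # greedy: first "{" to last "}"
    if not match:
        return None
    try:
        parsed = json.loads(match.group())
    except json.JSONDecodeError:
        return None
    # Only a JSON object is a usable decision; lists or scalars are rejected.
    return parsed if isinstance(parsed, dict) else None


reply = 'Here is my answer: {"action": "click", "target": "OK", "reasoning": "popup"} Done.'
decision = extract_json_object(reply)
```

Because the match is greedy, a reply containing two separate objects yields one invalid span and is rejected. That is a safe failure mode for an agent that should not act on an ambiguous answer.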
|
|
||||||
|
|
||||||
|
def find_element_on_screen(
|
||||||
|
target_text: str,
|
||||||
|
target_description: str = "",
|
||||||
|
anchor_image_base64: Optional[str] = None,
|
||||||
|
anchor_bbox: Optional[Dict] = None,
|
||||||
|
monitor_idx: Optional[int] = None,
|
||||||
|
) -> Optional[Dict[str, Any]]:
|
||||||
|
"""
|
||||||
|
Looks for an element on the screen using 3 methods in cascade.
|
||||||
|
|
||||||
|
Level 1: OCR (fast, ~1s): docTR to find the exact text
|
||||||
|
Level 2: UI-TARS grounding (~3s): specialized GUI model
|
||||||
|
Level 3: VLM reasoning (~10s): reasoning + OCR confirmation
|
||||||
|
|
||||||
|
Args:
|
||||||
|
target_text: Text of the element to find (e.g. "Demo", "Enregistrer")
|
||||||
|
target_description: Longer description (e.g. "le dossier Demo sur le bureau")
|
||||||
|
anchor_image_base64: Reference image of the anchor (for CLIP matching, reserved for future use)
|
||||||
|
anchor_bbox: Original position of the anchor (to disambiguate multiple matches)
|
||||||
|
monitor_idx: Logical index 0..N-1 of the monitor to search. None = legacy composite.
|
||||||
|
|
||||||
|
Returns:
|
||||||
|
{'x': int, 'y': int, 'method': str, 'confidence': float} or None
|
||||||
|
"""
|
||||||
|
# If target_text is empty or is just the action type,
|
||||||
|
# use the VLM to describe the anchor image
|
||||||
|
action_types = {'click_anchor', 'double_click_anchor', 'right_click_anchor',
|
||||||
|
'hover_anchor', 'focus_anchor', 'scroll_to_anchor'}
|
||||||
|
has_useful_text = target_text and target_text not in action_types
|
||||||
|
|
||||||
|
if not has_useful_text and anchor_image_base64:
|
||||||
|
desc = _describe_anchor_image(anchor_image_base64)
|
||||||
|
if desc:
|
||||||
|
logger.info(f"[Grounding] Ancre décrite par VLM: '{desc}'")
|
||||||
|
target_description = desc
|
||||||
|
if not has_useful_text:
|
||||||
|
target_text = desc
|
||||||
|
|
||||||
|
if not target_text and not target_description:
|
||||||
|
logger.debug("find_element_on_screen: ni target_text ni target_description fournis")
|
||||||
|
return None
|
||||||
|
|
||||||
|
# Pass monitor_idx down to the OCR level via anchor_bbox (without mutating the original argument)
|
||||||
|
if monitor_idx is not None and anchor_bbox is not None:
|
||||||
|
anchor_bbox = dict(anchor_bbox) # copy so the caller's argument is not mutated
|
||||||
|
anchor_bbox["monitor_idx"] = monitor_idx
|
||||||
|
elif monitor_idx is not None:
|
||||||
|
anchor_bbox = {"monitor_idx": monitor_idx}
|
||||||
|
|
||||||
|
search_label = target_description or target_text
|
||||||
|
logger.info(f"[Grounding] Recherche élément: '{search_label}' (cascade 3 niveaux)")
|
||||||
|
|
||||||
|
# ─── Level 1: OCR (fast, ~1s) ───
|
||||||
|
result = _grounding_ocr(target_text, anchor_bbox=anchor_bbox)
|
||||||
|
if result:
|
||||||
|
return result
|
||||||
|
|
||||||
|
# ─── Level 2: UI-TARS grounding (~3s) ───
|
||||||
|
result = _grounding_ui_tars(target_text, target_description, monitor_idx=monitor_idx)
|
||||||
|
if result:
|
||||||
|
return result
|
||||||
|
|
||||||
|
# ─── Level 3: VLM reasoning (~10s) ───
|
||||||
|
result = _grounding_vlm(target_text, target_description, monitor_idx=monitor_idx)
|
||||||
|
if result:
|
||||||
|
return result
|
||||||
|
|
||||||
|
logger.warning(f"[Grounding] ÉCHEC total pour '{search_label}' — aucune méthode n'a trouvé l'élément")
|
||||||
|
return None
|
||||||
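`find_element_on_screen` tries progressively slower locators and stops at the first hit. The control flow can be sketched generically as below. `locate_with_cascade` and the three `fake_*` locators are hypothetical stand-ins for the OCR, UI-TARS and VLM levels. Unlike the original, where each level catches its own exceptions, this sketch catches them in the loop.

```python
from typing import Callable, Dict, List, Optional, Tuple

Locator = Callable[[str], Optional[Dict]]


def locate_with_cascade(target: str,
                        levels: List[Tuple[str, Locator]]) -> Optional[Dict]:
    """Try each locator in order and return the first non-empty result."""
    for name, locator in levels:
        try:
            result = locator(target)
        except Exception:
            result = None  # a failing level must not block the next ones
        if result:
            return {**result, "method": name}
    return None


# Hypothetical stand-ins for the three levels.
def fake_ocr(target: str) -> Optional[Dict]:
    return None  # pretend OCR found nothing


def fake_ui_tars(target: str) -> Optional[Dict]:
    return {"x": 640, "y": 360, "confidence": 0.85}


def fake_vlm(target: str) -> Optional[Dict]:
    raise RuntimeError("backend unavailable")


hit = locate_with_cascade("Demo", [("ocr", fake_ocr),
                                   ("ui_tars", fake_ui_tars),
                                   ("vlm", fake_vlm)])
```

Ordering the levels from cheapest to most expensive keeps the common case fast (about 1 s for OCR according to the docstring) and pays the model latency only when simpler methods fail.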
|
|
||||||
|
|
||||||
|
def _describe_anchor_image(anchor_image_base64: str) -> Optional[str]:
|
||||||
|
"""Demande au VLM de décrire l'image de l'ancre en quelques mots.
|
||||||
|
|
||||||
|
Used when the label is empty: the VLM looks at the anchor crop
|
||||||
|
and describes what it sees ("folder icon named Demo", "Save button", etc.)
|
||||||
|
so that UI-TARS can search for that element on the full screen.
|
||||||
|
"""
|
||||||
|
try:
|
||||||
|
import requests
|
||||||
|
import os
|
||||||
|
|
||||||
|
if ',' in anchor_image_base64:
|
||||||
|
anchor_image_base64 = anchor_image_base64.split(',', 1)[1]
|
||||||
|
|
||||||
|
ollama_url = os.environ.get("OLLAMA_URL", "http://localhost:11434")
|
||||||
|
model = "qwen2.5vl:3b"
|
||||||
|
|
||||||
|
logger.info(f"[Grounding] Description ancre via {model}...")
|
||||||
|
response = requests.post(
|
||||||
|
f"{ollama_url}/api/generate",
|
||||||
|
json={
|
||||||
|
"model": model,
|
||||||
|
"prompt": "Describe this UI element in 5 words maximum. Just the element name, nothing else. Example: 'folder icon named Demo' or 'Save button' or 'Chrome browser icon'",
|
||||||
|
"images": [anchor_image_base64],
|
||||||
|
"stream": False,
|
||||||
|
"options": {"temperature": 0.1, "num_predict": 20}
|
||||||
|
},
|
||||||
|
timeout=30
|
||||||
|
)
|
||||||
|
|
||||||
|
if response.status_code == 200:
|
||||||
|
desc = response.json().get('response', '').strip().strip('"').strip("'")
|
||||||
|
if desc and len(desc) > 2:
|
||||||
|
return desc
|
||||||
|
|
||||||
|
return None
|
||||||
|
|
||||||
|
except Exception as e:
|
||||||
|
logger.warning(f"[Grounding] Description ancre échouée: {e}")
|
||||||
|
return None
|
||||||
|
|
||||||
|
|
||||||
|
def _capture_screen(monitor_idx=None):
|
||||||
|
"""Capture l'écran et retourne (PIL.Image, width, height, offset_x, offset_y).
|
||||||
|
|
||||||
|
Args:
|
||||||
|
monitor_idx: Logical index 0..N-1 of the monitor to capture (cf. screeninfo).
|
||||||
|
If None: composite capture (mss.monitors[0]), the legacy behavior.
|
||||||
|
|
||||||
|
Returns:
|
||||||
|
(image, w, h, offset_x, offset_y). offset = (0,0) in composite mode.
|
||||||
|
"""
|
||||||
|
try:
|
||||||
|
with mss.mss() as sct:
|
||||||
|
if monitor_idx is None:
|
||||||
|
# Current behavior: composite of all screens
|
||||||
|
monitor = sct.monitors[0]
|
||||||
|
offset_x, offset_y = 0, 0
|
||||||
|
else:
|
||||||
|
# Skip mss monitors[0] (the composite). Logical index 0 → mss.monitors[1].
|
||||||
|
mss_idx = int(monitor_idx) + 1
|
||||||
|
if mss_idx >= len(sct.monitors):
|
||||||
|
logger.warning(
|
||||||
|
"mss.monitors[%d] hors limites (n=%d) — fallback composite",
|
||||||
|
mss_idx, len(sct.monitors),
|
||||||
|
)
|
||||||
|
monitor = sct.monitors[0]
|
||||||
|
offset_x, offset_y = 0, 0
|
||||||
|
else:
|
||||||
|
monitor = sct.monitors[mss_idx]
|
||||||
|
offset_x = int(monitor.get("left", 0))
|
||||||
|
offset_y = int(monitor.get("top", 0))
|
||||||
|
|
||||||
|
screenshot = sct.grab(monitor)
|
||||||
|
screen = PILImage.frombytes('RGB', screenshot.size, screenshot.bgra, 'raw', 'BGRX')
|
||||||
|
return screen, monitor['width'], monitor['height'], offset_x, offset_y
|
||||||
|
except Exception as e:
|
||||||
|
logger.debug(f"Capture écran échouée: {e}")
|
||||||
|
return None, 0, 0, 0, 0
|
||||||
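`_capture_screen` maps a logical monitor index onto the `mss` monitor list, where entry 0 is the composite of all screens, and returns the offsets needed to convert capture-local coordinates back to global ones. A sketch of that mapping on plain dicts follows. `select_monitor` is a hypothetical helper, and the guard against negative indices is an addition that the original does not have.

```python
from typing import Dict, List, Optional, Tuple


def select_monitor(monitors: List[Dict[str, int]],
                   monitor_idx: Optional[int]) -> Tuple[Dict[str, int], int, int]:
    """Return (monitor, offset_x, offset_y) for a logical index.

    `monitors` follows the mss layout: entry 0 is the composite of all
    screens, entries 1..N are the physical monitors.
    """
    if monitor_idx is None:
        return monitors[0], 0, 0
    mss_idx = int(monitor_idx) + 1  # logical 0 maps to monitors[1]
    if mss_idx < 1 or mss_idx >= len(monitors):
        return monitors[0], 0, 0  # out of range: fall back to the composite
    mon = monitors[mss_idx]
    return mon, mon.get("left", 0), mon.get("top", 0)


# Example layout: two 1920x1080 screens side by side.
layout = [
    {"left": 0, "top": 0, "width": 3840, "height": 1080},
    {"left": 0, "top": 0, "width": 1920, "height": 1080},
    {"left": 1920, "top": 0, "width": 1920, "height": 1080},
]
mon, ox, oy = select_monitor(layout, 1)
# A hit at (100, 200) inside that capture is at (100 + ox, 200 + oy) globally.
```

Returning the offsets alongside the monitor is what lets the grounding functions add `ox` and `oy` to their results, so click coordinates stay valid on multi-monitor setups.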
|
|
||||||
|
|
||||||
|
def _grounding_ocr(target_text: str, anchor_bbox: Optional[Dict] = None) -> Optional[Dict[str, Any]]:
|
||||||
|
"""Niveau 1 — Cherche le texte par OCR (docTR). ~1s.
|
||||||
|
|
||||||
|
Collects ALL matches and picks the most relevant one:
|
||||||
|
- If anchor_bbox is provided → the one closest to the original position
|
||||||
|
- Otherwise → the one closest to the screen center (content area)
|
||||||
|
"""
|
||||||
|
logger.debug(f"[Grounding/OCR] target='{target_text}' bbox={anchor_bbox}")
|
||||||
|
if not target_text:
|
||||||
|
return None
|
||||||
|
|
||||||
|
try:
|
||||||
|
monitor_idx_param = anchor_bbox.get("monitor_idx") if anchor_bbox else None
|
||||||
|
screen, screen_w, screen_h, ox, oy = _capture_screen(monitor_idx=monitor_idx_param)
|
||||||
|
if screen is None:
|
||||||
|
return None
|
||||||
|
|
||||||
|
try:
|
||||||
|
from services.ocr_service import ocr_extract_words
|
||||||
|
except ImportError:
|
||||||
|
from core.extraction.field_extractor import FieldExtractor
|
||||||
|
extractor = FieldExtractor()
|
||||||
|
def ocr_extract_words(img):
|
||||||
|
return extractor.extract_words_from_image(img)
|
||||||
|
|
||||||
|
words = ocr_extract_words(screen)
|
||||||
|
if not words:
|
||||||
|
logger.debug("[Grounding/OCR] Aucun mot détecté")
|
||||||
|
return None
|
||||||
|
|
||||||
|
target_lower = target_text.lower()
|
||||||
|
all_matches = []
|
||||||
|
|
||||||
|
# Collect every match
|
||||||
|
for word in words:
|
||||||
|
word_lower = word['text'].lower()
|
||||||
|
x1, y1, x2, y2 = word['bbox']
|
||||||
|
cx, cy = int((x1 + x2) / 2), int((y1 + y2) / 2)
|
||||||
|
|
||||||
|
if word_lower == target_lower:
|
||||||
|
all_matches.append({'text': word['text'], 'x': cx, 'y': cy, 'type': 'exact', 'conf': 0.95})
|
||||||
|
elif len(word_lower) >= 3 and len(target_lower) >= 3:
|
||||||
|
if target_lower in word_lower or word_lower in target_lower:
|
||||||
|
# Penalize partial matches that are too short relative to the target
|
||||||
|
ratio = len(word_lower) / max(len(target_lower), 1)
|
||||||
|
conf = 0.80 if ratio > 0.5 else 0.50
|
||||||
|
all_matches.append({'text': word['text'], 'x': cx, 'y': cy, 'type': 'partial', 'conf': conf})
|
||||||
|
|
||||||
|
# Match with the first letter missing (OCR often drops it)
|
||||||
|
if not all_matches and len(target_lower) > 3:
|
||||||
|
partial = target_lower[1:]
|
||||||
|
for word in words:
|
||||||
|
if partial in word['text'].lower():
|
||||||
|
x1, y1, x2, y2 = word['bbox']
|
||||||
|
all_matches.append({'text': word['text'], 'x': int((x1+x2)/2), 'y': int((y1+y2)/2), 'type': 'partial_cut', 'conf': 0.70})
|
||||||
|
|
||||||
|
if not all_matches:
|
||||||
|
logger.debug(f"[Grounding/OCR] '{target_text}' non trouvé parmi {len(words)} mots")
|
||||||
|
return None
|
||||||
|
|
||||||
|
# Pick the best match
|
||||||
|
if len(all_matches) == 1:
|
||||||
|
best = all_matches[0]
|
||||||
|
elif anchor_bbox and 'x' in anchor_bbox and 'y' in anchor_bbox:
|
||||||
|
# Take the match closest to the anchor's original position
|
||||||
|
orig_x = anchor_bbox.get('x', 0) + anchor_bbox.get('width', 0) / 2
|
||||||
|
orig_y = anchor_bbox.get('y', 0) + anchor_bbox.get('height', 0) / 2
|
||||||
|
best = min(all_matches, key=lambda m: ((m['x'] - orig_x)**2 + (m['y'] - orig_y)**2))
|
||||||
|
else:
|
||||||
|
# Take the most central match (content area, not title bars)
|
||||||
|
center_x, center_y = screen_w / 2, screen_h / 2
|
||||||
|
best = min(all_matches, key=lambda m: ((m['x'] - center_x)**2 + (m['y'] - center_y)**2))
|
||||||
|
|
||||||
|
for m in all_matches:
|
||||||
|
sel = " ← CHOISI" if m is best else ""
|
||||||
|
logger.info(f" [OCR] Candidat: '{m['text']}' à ({m['x']}, {m['y']}) [{m['type']}]{sel}")
|
||||||
|
|
||||||
|
return {'x': best['x'] + ox, 'y': best['y'] + oy, 'method': 'ocr', 'confidence': best['conf']}
|
||||||
|
|
||||||
|
except Exception as e:
|
||||||
|
logger.debug(f"[Grounding/OCR] Erreur: {e}")
|
||||||
|
return None
|
||||||
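When several OCR words match, `_grounding_ocr` ranks them by distance to a reference point: the anchor's original center if known, otherwise the screen center. The ranking can be sketched as below. `pick_closest` is a hypothetical name. This sketch assumes the anchor is used only when the dict actually has `x` and `y` keys, so a dict that carries nothing but `monitor_idx` falls back to the screen center.

```python
from typing import Dict, List, Optional


def pick_closest(matches: List[Dict], screen_w: int, screen_h: int,
                 anchor_bbox: Optional[Dict] = None) -> Optional[Dict]:
    """Pick the match nearest the anchor center, else nearest the screen center."""
    if not matches:
        return None
    if anchor_bbox and "x" in anchor_bbox and "y" in anchor_bbox:
        ref_x = anchor_bbox["x"] + anchor_bbox.get("width", 0) / 2
        ref_y = anchor_bbox["y"] + anchor_bbox.get("height", 0) / 2
    else:
        ref_x, ref_y = screen_w / 2, screen_h / 2
    # Squared distance is enough for ranking; no square root needed.
    return min(matches, key=lambda m: (m["x"] - ref_x) ** 2 + (m["y"] - ref_y) ** 2)


matches = [{"text": "Demo", "x": 50, "y": 20},
           {"text": "Demo", "x": 900, "y": 500}]
near_center = pick_closest(matches, 1920, 1080)
near_anchor = pick_closest(matches, 1920, 1080,
                           {"x": 40, "y": 10, "width": 20, "height": 20})
only_monitor = pick_closest(matches, 1920, 1080, {"monitor_idx": 1})
```

Without the key check, a bbox holding only `monitor_idx` would default the reference point to (0, 0) and favor whatever match sits nearest the top-left corner.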
|
|
||||||
|
|
||||||
|
def _grounding_ui_tars(target_text: str, target_description: str = "", monitor_idx=None) -> Optional[Dict[str, Any]]:
|
||||||
|
"""Niveau 2 — UI-TARS grounding visuel (~3s)."""
|
||||||
|
try:
|
||||||
|
import requests
|
||||||
|
import base64
|
||||||
|
import io
|
||||||
|
import re
|
||||||
|
import os
|
||||||
|
|
||||||
|
screen, screen_w, screen_h, ox, oy = _capture_screen(monitor_idx=monitor_idx)
|
||||||
|
if screen is None:
|
||||||
|
return None
|
||||||
|
|
||||||
|
# Encode the screenshot as base64
|
||||||
|
buffer = io.BytesIO()
|
||||||
|
screen.save(buffer, format='JPEG', quality=70)
|
||||||
|
image_b64 = base64.b64encode(buffer.getvalue()).decode('utf-8')
|
||||||
|
|
||||||
|
# Build the prompt for UI-TARS
|
||||||
|
click_target = target_description or target_text
|
||||||
|
prompt = f"click on {click_target}"
|
||||||
|
|
||||||
|
ollama_url = os.environ.get("OLLAMA_URL", "http://localhost:11434")
|
||||||
|
model = "0000/ui-tars-1.5-7b-q8_0:7b"
|
||||||
|
|
||||||
|
logger.info(f"[Grounding/UI-TARS] Envoi à {model}: '{prompt}'")
|
||||||
|
|
||||||
|
response = requests.post(
|
||||||
|
f"{ollama_url}/api/generate",
|
||||||
|
json={
|
||||||
|
"model": model,
|
||||||
|
"prompt": prompt,
|
||||||
|
"images": [image_b64],
|
||||||
|
"stream": False,
|
||||||
|
"options": {"temperature": 0.1, "num_predict": 50}
|
||||||
|
},
|
||||||
|
timeout=30
|
||||||
|
)
|
||||||
|
|
||||||
|
if response.status_code != 200:
|
||||||
|
logger.warning(f"[Grounding/UI-TARS] HTTP {response.status_code}")
|
||||||
|
return None
|
||||||
|
|
||||||
|
result = response.json()
|
||||||
|
text = result.get('response', '').strip()
|
||||||
|
logger.debug(f"[Grounding/UI-TARS] Réponse brute: {text[:200]}")
|
||||||
|
|
||||||
|
# Parse the coordinates returned by UI-TARS
|
||||||
|
coords = _parse_ui_tars_coordinates(text, screen_w, screen_h)
|
||||||
|
if coords:
|
||||||
|
x, y = coords
|
||||||
|
# Check that the coordinates fall inside the screen (valid pixels are 0..w-1, 0..h-1)
|
||||||
|
if 0 <= x < screen_w and 0 <= y < screen_h:
|
||||||
|
logger.info(f"[Grounding/UI-TARS] Grounding → ({x}, {y})")
|
||||||
|
return {'x': x + ox, 'y': y + oy, 'method': 'ui_tars', 'confidence': 0.85}
|
||||||
|
else:
|
||||||
|
logger.warning(f"[Grounding/UI-TARS] Coordonnées hors écran: ({x}, {y}) pour {screen_w}x{screen_h}")
|
||||||
|
return None
|
||||||
|
|
||||||
|
logger.debug(f"[Grounding/UI-TARS] Pas de coordonnées parsées dans: {text[:100]}")
|
||||||
|
return None
|
||||||
|
|
||||||
|
except Exception as e:
|
||||||
|
logger.debug(f"[Grounding/UI-TARS] Erreur: {e}")
|
||||||
|
return None
|
||||||
|
|
||||||
|
|
||||||
|
def _parse_ui_tars_coordinates(text: str, screen_w: int, screen_h: int) -> Optional[tuple]:
|
||||||
|
"""Parse les coordonnées retournées par UI-TARS.
|
||||||
|
|
||||||
|
UI-TARS may return:
|
||||||
|
- Normalized coordinates (0-1000): "click at (500, 300)"
|
||||||
|
- Pixel coordinates: "click at (960, 540)"
|
||||||
|
- Format (x, y) or [x, y] or x,y
|
||||||
|
- Format "Action: click\nCoordinate: (500, 300)" or "[500, 300]"
|
||||||
|
|
||||||
|
Returns:
|
||||||
|
(x_pixel, y_pixel) or None
|
||||||
|
"""
|
||||||
|
import re
|
||||||
|
|
||||||
|
# Look for coordinate patterns
|
||||||
|
patterns = [
|
||||||
|
r'Coordinate:\s*\[?\(?\s*(\d+(?:\.\d+)?)\s*,\s*(\d+(?:\.\d+)?)\s*\)?\]?',
|
||||||
|
r'click\s+(?:at\s+)?\[?\(?\s*(\d+(?:\.\d+)?)\s*,\s*(\d+(?:\.\d+)?)\s*\)?\]?',
|
||||||
|
r'\(\s*(\d+(?:\.\d+)?)\s*,\s*(\d+(?:\.\d+)?)\s*\)',
|
||||||
|
r'\[\s*(\d+(?:\.\d+)?)\s*,\s*(\d+(?:\.\d+)?)\s*\]',
|
||||||
|
]
|
||||||
|
|
||||||
|
for pattern in patterns:
|
||||||
|
match = re.search(pattern, text, re.IGNORECASE)
|
||||||
|
if match:
|
||||||
|
raw_x = float(match.group(1))
|
||||||
|
raw_y = float(match.group(2))
|
||||||
|
|
||||||
|
# UI-TARS often uses coordinates normalized to 0-1000
|
||||||
|
if raw_x <= 1000 and raw_y <= 1000 and (raw_x > 1 or raw_y > 1):
|
||||||
|
# Probably normalized to 1000
|
||||||
|
x = int(raw_x * screen_w / 1000)
|
||||||
|
y = int(raw_y * screen_h / 1000)
|
||||||
|
elif raw_x <= 1.0 and raw_y <= 1.0:
|
||||||
|
# Normalized to 0-1
|
||||||
|
x = int(raw_x * screen_w)
|
||||||
|
y = int(raw_y * screen_h)
|
||||||
|
else:
|
||||||
|
# Raw pixels
|
||||||
|
x = int(raw_x)
|
||||||
|
y = int(raw_y)
|
||||||
|
|
||||||
|
return (x, y)
|
||||||
|
|
||||||
|
return None
|
||||||
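`_parse_ui_tars_coordinates` guesses the coordinate system from the magnitude of the numbers. The scaling rule can be sketched in a reduced form that handles only bracketed pairs. `to_pixels` and the single `_PAIR` regex are illustrative simplifications, not the function from the diff, which tries four patterns in turn.

```python
import re
from typing import Optional, Tuple

# One "(x, y)" or "[x, y]" pair, with integer or decimal numbers.
_PAIR = re.compile(r"[\(\[]\s*(\d+(?:\.\d+)?)\s*,\s*(\d+(?:\.\d+)?)\s*[\)\]]")


def to_pixels(text: str, screen_w: int, screen_h: int) -> Optional[Tuple[int, int]]:
    """Extract the first bracketed pair and convert it to pixel coordinates."""
    match = _PAIR.search(text)
    if not match:
        return None
    raw_x, raw_y = float(match.group(1)), float(match.group(2))
    if raw_x <= 1.0 and raw_y <= 1.0:
        # Both values within 0-1: treat as fractions of the screen.
        return int(raw_x * screen_w), int(raw_y * screen_h)
    if raw_x <= 1000 and raw_y <= 1000:
        # Both values within 0-1000: treat as per-mille of the screen.
        return int(raw_x * screen_w / 1000), int(raw_y * screen_h / 1000)
    # Anything larger can only be raw pixels.
    return int(raw_x), int(raw_y)


normalized = to_pixels("click at (500, 300)", 1920, 1080)
fraction = to_pixels("Coordinate: [0.5, 0.25]", 1920, 1080)
pixels = to_pixels("click at (1500, 900)", 1920, 1080)
```

The heuristic is inherently ambiguous: on a 1920x1080 screen, a genuine pixel answer such as (960, 540) falls under the 0-1000 rule and gets rescaled. If the model's output convention is known, it is safer to fix the scale than to infer it.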
|
|
||||||
|
|
||||||
|
def _grounding_vlm(target_text: str, target_description: str = "", monitor_idx=None) -> Optional[Dict[str, Any]]:
|
||||||
|
"""Niveau 3 — VLM reasoning + confirmation OCR (~10s)."""
|
||||||
|
try:
|
||||||
|
search_label = target_description or target_text
|
||||||
|
|
||||||
|
vlm_result = vlm_reason_about_screen(
|
||||||
|
objective=f"Cliquer sur {search_label}",
|
||||||
|
context=f"Je cherche l'élément '{target_text}' sur l'écran pour cliquer dessus"
|
||||||
|
)
|
||||||
|
|
||||||
|
if not vlm_result:
|
||||||
|
logger.debug("[Grounding/VLM] VLM n'a pas retourné de résultat")
|
||||||
|
return None
|
||||||
|
|
||||||
|
if vlm_result.get('action') != 'click' or not vlm_result.get('target'):
|
||||||
|
logger.debug(f"[Grounding/VLM] VLM action={vlm_result.get('action')}, pas un clic")
|
||||||
|
return None
|
||||||
|
|
||||||
|
vlm_target = vlm_result['target']
|
||||||
|
logger.info(f"[Grounding/VLM] VLM suggère de cliquer sur: '{vlm_target}'")
|
||||||
|
|
||||||
|
# OCR confirmation: look for the VLM's target on the screen
|
||||||
|
screen, screen_w, screen_h, ox, oy = _capture_screen(monitor_idx=monitor_idx)
|
||||||
|
if screen is None:
|
||||||
|
return None
|
||||||
|
|
||||||
|
try:
|
||||||
|
try:
|
||||||
|
from services.ocr_service import ocr_extract_words
|
||||||
|
except ImportError:
|
||||||
|
from core.extraction.field_extractor import FieldExtractor
|
||||||
|
extractor = FieldExtractor()
|
||||||
|
def ocr_extract_words(img):
|
||||||
|
return extractor.extract_words_from_image(img)
|
||||||
|
|
||||||
|
words = ocr_extract_words(screen)
|
||||||
|
|
||||||
|
vlm_target_lower = vlm_target.lower()
|
||||||
|
for word in words:
|
||||||
|
if vlm_target_lower in word['text'].lower() or word['text'].lower() in vlm_target_lower:
|
||||||
|
x1, y1, x2, y2 = word['bbox']
|
||||||
|
x = int((x1 + x2) / 2)
|
||||||
|
y = int((y1 + y2) / 2)
|
||||||
|
logger.info(f"[Grounding/VLM] Confirmé par OCR: '{word['text']}' à ({x}, {y})")
|
||||||
|
return {'x': x + ox, 'y': y + oy, 'method': 'vlm', 'confidence': 0.75}
|
||||||
|
|
||||||
|
logger.debug(f"[Grounding/VLM] Target VLM '{vlm_target}' non trouvé par OCR")
|
||||||
|
return None
|
||||||
|
|
||||||
|
except Exception as e:
|
||||||
|
logger.debug(f"[Grounding/VLM] OCR de confirmation échoué: {e}")
|
||||||
|
return None
|
||||||
|
|
||||||
|
except Exception as e:
|
||||||
|
logger.debug(f"[Grounding/VLM] Erreur: {e}")
|
||||||
|
return None
|
||||||
|
|
||||||
|
|
||||||
|
def post_execution_cleanup(execution_mode: str = 'debug'):
|
||||||
|
"""Vérifie l'écran après exécution et gère les dialogues restants.
|
||||||
|
|
||||||
|
Appelé après la dernière étape d'un workflow pour laisser l'écran propre.
|
||||||
|
"""
|
||||||
|
if execution_mode not in ('intelligent', 'debug'):
|
||||||
|
return
|
||||||
|
|
||||||
|
logger.info("Vérification écran final...")
|
||||||
|
time.sleep(1.0)
|
||||||
|
for _ in range(3):
|
||||||
|
detected = check_screen_for_patterns()
|
||||||
|
if detected:
|
||||||
|
logger.info(f"Dialogue résiduel détecté: {detected.get('pattern')}")
|
||||||
|
handle_detected_pattern(detected)
|
||||||
|
time.sleep(1.0)
|
||||||
|
else:
|
||||||
|
vlm_result = vlm_reason_about_screen(
|
||||||
|
objective="Vérifier que l'écran est propre après l'exécution",
|
||||||
|
context="Le workflow vient de se terminer"
|
||||||
|
)
|
||||||
|
if vlm_result and vlm_result.get('action') in ('click', 'type'):
|
||||||
|
logger.info(f"VLM post-workflow: {vlm_result.get('action')} '{vlm_result.get('target')}'")
|
||||||
|
break
|
||||||
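The OCR confirmation step of the VLM grounding function can be summarized in a small standalone sketch. It is not the project's code: `confirm_target_by_ocr` is a hypothetical helper, and the word shape (`{'text': ..., 'bbox': (x1, y1, x2, y2)}`) is assumed from how the hunk reads `word['text']` and `word['bbox']`. The one deliberate difference is that empty OCR tokens are skipped, because an empty string is a substring of every target.

```python
from typing import Dict, List, Optional, Tuple


def confirm_target_by_ocr(
    vlm_target: str,
    words: List[Dict],
    offset: Tuple[int, int] = (0, 0),
) -> Optional[Dict]:
    """Return the click point of the first OCR word matching the VLM target.

    Hypothetical helper: `words` is assumed to be a list of dicts shaped like
    {'text': str, 'bbox': (x1, y1, x2, y2)}; `offset` is the monitor origin.
    """
    target = vlm_target.strip().lower()
    if not target:
        return None
    ox, oy = offset
    for word in words:
        text = str(word.get("text", "")).strip().lower()
        if not text:
            # An empty token would satisfy `text in target` for any target.
            continue
        if target in text or text in target:
            x1, y1, x2, y2 = word["bbox"]
            # Click at the center of the word's bounding box.
            x = int((x1 + x2) / 2)
            y = int((y1 + y2) / 2)
            return {"x": x + ox, "y": y + oy, "method": "vlm", "confidence": 0.75}
    return None


words = [
    {"text": "", "bbox": (0, 0, 10, 10)},
    {"text": "Enregistrer", "bbox": (100, 200, 180, 220)},
]
hit = confirm_target_by_ocr("enregistrer", words, offset=(1920, 0))
miss = confirm_target_by_ocr("Annuler", words)
```

Very short OCR tokens (a single letter, for instance) still match through `text in target`; a minimum-length threshold would be a further design choice that the diff does not make.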
@@ -40,12 +40,16 @@ class LLMActionHandler:
|
|||||||
def __init__(
|
def __init__(
|
||||||
self,
|
self,
|
||||||
ollama_endpoint: str = "http://localhost:11434",
|
ollama_endpoint: str = "http://localhost:11434",
|
||||||
model: str = "qwen3-vl:8b",
|
model: str = None,
|
||||||
temperature: float = 0.1,
|
temperature: float = 0.1,
|
||||||
timeout: int = 120,
|
timeout: int = 120,
|
||||||
):
|
):
|
||||||
self.endpoint = ollama_endpoint.rstrip("/")
|
self.endpoint = ollama_endpoint.rstrip("/")
|
||||||
|
if model is not None:
|
||||||
self.model = model
|
self.model = model
|
||||||
|
else:
|
||||||
|
from core.detection.vlm_config import get_vlm_model
|
||||||
|
self.model = get_vlm_model()
|
||||||
self.temperature = temperature
|
self.temperature = temperature
|
||||||
self.timeout = timeout
|
self.timeout = timeout
|
||||||
|
|
||||||
|
|||||||
2008
core/execution/observe_reason_act.py
Normal file
File diff suppressed because it is too large
@@ -1697,12 +1697,6 @@ class TargetResolver:
|
|||||||
|
|
||||||
return best_elem, tie_break_criterion
|
return best_elem, tie_break_criterion
|
||||||
|
|
||||||
# Spatial analyzer (lazy load) - Exigence 5.3
|
|
||||||
self._spatial_analyzer: Optional[SpatialAnalyzer] = None
|
|
||||||
self._spatial_relations_cache: Dict[str, List[SpatialRelation]] = {}
|
|
||||||
|
|
||||||
logger.info(f"TargetResolver initialized (threshold={similarity_threshold}, spatial={use_spatial_fallback})")
|
|
||||||
|
|
||||||
# =========================================================================
|
# =========================================================================
|
||||||
# Résolution principale
|
# Résolution principale
|
||||||
# =========================================================================
|
# =========================================================================
|
||||||
|
|||||||
@@ -22,7 +22,7 @@ logger = logging.getLogger(__name__)
|
|||||||
|
|
||||||
# Ollama configuration (consistent with the rest of the project)
|
# Ollama configuration (consistent with the rest of the project)
|
||||||
OLLAMA_DEFAULT_URL = os.environ.get("OLLAMA_URL", "http://localhost:11434")
|
OLLAMA_DEFAULT_URL = os.environ.get("OLLAMA_URL", "http://localhost:11434")
|
||||||
OLLAMA_DEFAULT_MODEL = os.environ.get("VLM_MODEL", "qwen3-vl:8b")
|
OLLAMA_DEFAULT_MODEL = os.environ.get("RPA_VLM_MODEL", os.environ.get("VLM_MODEL", "gemma4:e4b"))
|
||||||
|
|
||||||
|
|
||||||
class FieldExtractor:
|
class FieldExtractor:
|
||||||
|
|||||||
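The new default in this hunk chains two environment lookups, so `RPA_VLM_MODEL` takes precedence over the older `VLM_MODEL`, and `gemma4:e4b` is used only when neither is set. A minimal sketch of that precedence follows. `resolve_vlm_model` is a hypothetical name used for illustration; it is not claimed to match the project's `get_vlm_model()`, whose implementation is not shown in this diff.

```python
import os
from typing import Mapping, Optional

DEFAULT_VLM_MODEL = "gemma4:e4b"  # default value taken from the hunk above


def resolve_vlm_model(
    explicit: Optional[str] = None,
    env: Optional[Mapping[str, str]] = None,
) -> str:
    """Pick the VLM model name: explicit argument, then env vars, then default.

    Hypothetical helper; `env` is injectable so the precedence can be tested
    without touching the real process environment.
    """
    if explicit is not None:
        return explicit
    env = os.environ if env is None else env
    # Same nesting as the diff: the new variable wins over the legacy one.
    return env.get("RPA_VLM_MODEL", env.get("VLM_MODEL", DEFAULT_VLM_MODEL))


only_default = resolve_vlm_model(env={})
legacy_only = resolve_vlm_model(env={"VLM_MODEL": "qwen3-vl:8b"})
both_set = resolve_vlm_model(env={"VLM_MODEL": "qwen3-vl:8b", "RPA_VLM_MODEL": "gemma4:e4b"})
explicit = resolve_vlm_model("custom:latest", env={"RPA_VLM_MODEL": "gemma4:e4b"})
```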
@@ -2,7 +2,7 @@
|
|||||||
GPU Resource Management Module for RPA Vision V3
|
GPU Resource Management Module for RPA Vision V3
|
||||||
|
|
||||||
This module provides dynamic GPU resource allocation between ML models:
|
This module provides dynamic GPU resource allocation between ML models:
|
||||||
- Ollama VLM (qwen3-vl:8b) for UI classification
|
- Ollama VLM (gemma4:e4b by default, configurable via RPA_VLM_MODEL) for UI classification
|
||||||
- CLIP (ViT-B-32) for embedding matching
|
- CLIP (ViT-B-32) for embedding matching
|
||||||
|
|
||||||
The GPUResourceManager optimizes VRAM usage by:
|
The GPUResourceManager optimizes VRAM usage by:
|
||||||
|
|||||||
@@ -2,7 +2,7 @@
|
|||||||
GPU Resource Manager - Central orchestrator for GPU resource allocation
|
GPU Resource Manager - Central orchestrator for GPU resource allocation
|
||||||
|
|
||||||
Manages dynamic allocation of GPU resources between:
|
Manages dynamic allocation of GPU resources between:
|
||||||
- Ollama VLM (qwen3-vl:8b) - ~10.5 GB VRAM for UI classification
|
- Ollama VLM (gemma4:e4b by default) - ~10 GB VRAM for UI classification
|
||||||
- CLIP (ViT-B-32) - ~500 MB VRAM for embedding matching
|
- CLIP (ViT-B-32) - ~500 MB VRAM for embedding matching
|
||||||
|
|
||||||
Optimizes VRAM usage based on execution mode:
|
Optimizes VRAM usage based on execution mode:
|
||||||
@@ -12,13 +12,14 @@ Optimizes VRAM usage based on execution mode:
|
|||||||
"""
|
"""
|
||||||
|
|
||||||
import asyncio
|
import asyncio
|
||||||
|
import contextlib
|
||||||
import logging
|
import logging
|
||||||
import threading
|
import threading
|
||||||
import time
|
import time
|
||||||
from dataclasses import dataclass, field
|
from dataclasses import dataclass, field
|
||||||
from datetime import datetime
|
from datetime import datetime
|
||||||
from enum import Enum
|
from enum import Enum
|
||||||
from typing import Any, Callable, Dict, List, Optional
|
from typing import Any, Callable, Dict, Iterator, List, Optional
|
||||||
|
|
||||||
logger = logging.getLogger(__name__)
|
logger = logging.getLogger(__name__)
|
||||||
|
|
||||||
@@ -53,7 +54,7 @@ class VRAMInfo:
|
|||||||
class GPUResourceConfig:
|
class GPUResourceConfig:
|
||||||
"""Configuration for GPU resource management."""
|
"""Configuration for GPU resource management."""
|
||||||
ollama_endpoint: str = "http://localhost:11434"
|
ollama_endpoint: str = "http://localhost:11434"
|
||||||
vlm_model: str = "qwen3-vl:8b"
|
vlm_model: str = "gemma4:e4b"
|
||||||
clip_model: str = "ViT-B-32"
|
clip_model: str = "ViT-B-32"
|
||||||
idle_timeout_seconds: int = 300 # 5 minutes
|
idle_timeout_seconds: int = 300 # 5 minutes
|
||||||
vram_threshold_for_clip_gpu_mb: int = 1024 # 1 GB
|
vram_threshold_for_clip_gpu_mb: int = 1024 # 1 GB
|
||||||
@@ -127,6 +128,12 @@ class GPUResourceManager:
|
|||||||
self._operation_queue: asyncio.Queue = asyncio.Queue()
|
self._operation_queue: asyncio.Queue = asyncio.Queue()
|
||||||
self._operation_lock = asyncio.Lock()
|
self._operation_lock = asyncio.Lock()
|
||||||
|
|
||||||
|
# Synchronous inference lock: serializes concurrent GPU calls
# (ScreenAnalyzer.analyze, UIDetector, CLIP.encode) between
# ExecutionLoop and stream_processor to avoid VRAM saturation
# on an RTX 5070 (12 GB). Only one analyze call runs on the GPU at a time.
|
||||||
|
self._inference_lock = threading.Lock()
|
||||||
|
|
||||||
# Event callbacks
|
# Event callbacks
|
||||||
self._on_resource_changed: List[Callable[[ResourceChangedEvent], None]] = []
|
self._on_resource_changed: List[Callable[[ResourceChangedEvent], None]] = []
|
||||||
self._on_mode_changed: List[Callable[[ExecutionMode], None]] = []
|
self._on_mode_changed: List[Callable[[ExecutionMode], None]] = []
|
||||||
@@ -208,6 +215,44 @@ class GPUResourceManager:
|
|||||||
"""Get the current execution mode."""
|
"""Get the current execution mode."""
|
||||||
return self._execution_mode
|
return self._execution_mode
|
||||||
|
|
||||||
|
# =========================================================================
|
||||||
|
# Inference serialization (sync)
|
||||||
|
# =========================================================================
|
||||||
|
|
||||||
|
    @contextlib.contextmanager
    def acquire_inference(self, timeout: Optional[float] = None) -> Iterator[bool]:
        """
        Synchronous context manager that serializes GPU inference calls.

        Guarantees that only one inference call (ScreenAnalyzer.analyze,
        UIDetector.detect, CLIP.encode…) runs on the GPU at a time.
        Avoids VRAM saturation when ExecutionLoop and stream_processor
        call analyze() simultaneously on an RTX 5070 (12 GB).

        Args:
            timeout: Maximum wait time (seconds). None = block until acquired.

        Yields:
            True if the lock was acquired, False on timeout.

        Example:
            >>> with gpu_manager.acquire_inference(timeout=30.0) as acquired:
            ...     if not acquired:
            ...         logger.warning("GPU lock timeout")
            ...     state = analyzer.analyze(path)
        """
        if timeout is None:
            self._inference_lock.acquire()
            acquired = True
        else:
            acquired = self._inference_lock.acquire(timeout=timeout)

        try:
            yield acquired
        finally:
            if acquired:
                self._inference_lock.release()
|
||||||
|
|
||||||
# =========================================================================
|
# =========================================================================
|
||||||
# VLM Management
|
# VLM Management
|
||||||
# =========================================================================
|
# =========================================================================
|
||||||
|
|||||||
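The timeout-aware lock pattern added in this hunk can be exercised in isolation. The sketch below reproduces the same acquire/yield/release shape in a small stand-in class (`InferenceGate` and `run_inference` are hypothetical names, not part of the project) and shows one possible caller policy: skip the work when the lock is not obtained, so that nothing runs on the GPU unserialized. Whether a caller should skip or proceed after a timeout is a design decision the diff leaves to call sites.

```python
import contextlib
import threading
from typing import Callable, Iterator, Optional, TypeVar

T = TypeVar("T")


class InferenceGate:
    """Stand-in for the manager's inference lock (illustrative only)."""

    def __init__(self) -> None:
        self._lock = threading.Lock()

    @contextlib.contextmanager
    def acquire(self, timeout: Optional[float] = None) -> Iterator[bool]:
        if timeout is None:
            acquired = self._lock.acquire()  # blocks until available
        else:
            acquired = self._lock.acquire(timeout=timeout)
        try:
            yield acquired
        finally:
            # Only release what was actually acquired.
            if acquired:
                self._lock.release()


def run_inference(gate: InferenceGate, work: Callable[[], T], timeout: float) -> Optional[T]:
    """Run `work` under the gate, or return None if the lock times out."""
    with gate.acquire(timeout=timeout) as acquired:
        if not acquired:
            return None
        return work()


gate = InferenceGate()
free_result = run_inference(gate, lambda: 42, timeout=0.1)

# While the gate is held, a second caller times out instead of running.
with gate.acquire() as outer:
    busy_result = run_inference(gate, lambda: 42, timeout=0.05)

after_release = run_inference(gate, lambda: 7, timeout=0.05)
```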
@@ -32,7 +32,7 @@ class OllamaManager:
|
|||||||
def __init__(
|
def __init__(
|
||||||
self,
|
self,
|
||||||
endpoint: str = "http://localhost:11434",
|
endpoint: str = "http://localhost:11434",
|
||||||
model: str = "qwen3-vl:8b",
|
model: str = "gemma4:e4b",
|
||||||
default_keep_alive: str = "5m"
|
default_keep_alive: str = "5m"
|
||||||
):
|
):
|
||||||
"""
|
"""
|
||||||
|
|||||||
@@ -173,6 +173,10 @@ class GraphBuilder:
|
|||||||
clustering_eps: float = 0.08,
|
clustering_eps: float = 0.08,
|
||||||
clustering_min_samples: int = 2,
|
clustering_min_samples: int = 2,
|
||||||
enable_quality_validation: bool = True,
|
enable_quality_validation: bool = True,
|
||||||
|
ui_detector: Optional[Any] = None,
|
||||||
|
screen_analyzer: Optional[Any] = None,
|
||||||
|
enable_ui_enrichment: bool = True,
|
||||||
|
element_proximity_max_px: float = 50.0,
|
||||||
):
|
):
|
||||||
"""
|
"""
|
||||||
Initialiser le GraphBuilder.
|
Initialiser le GraphBuilder.
|
||||||
@@ -185,6 +189,17 @@ class GraphBuilder:
|
|||||||
clustering_eps: Epsilon for DBSCAN (maximum distance between points)
clustering_min_samples: Minimum number of samples for a cluster
enable_quality_validation: Enable quality validation
ui_detector: Optional UIDetector. If provided, it is used by the
    lazily initialized analyzer. Otherwise, falls back to the
    shared singleton (`get_screen_analyzer()`).
screen_analyzer: ScreenAnalyzer instance to use directly.
    If None, lazily initialized via the shared C1 singleton.
enable_ui_enrichment: Enables visual enrichment of the
    ScreenStates in `_create_screen_states` (OCR + UIDetector).
    False = legacy behavior (ui_elements=[], detected_text=[]).
element_proximity_max_px: Maximum distance (in pixels) between a
    click and the nearest bbox for a UIElement to be
    considered the target. Beyond that, the click stays unanchored.
|
||||||
"""
|
"""
|
||||||
self.embedding_builder = embedding_builder or StateEmbeddingBuilder()
|
self.embedding_builder = embedding_builder or StateEmbeddingBuilder()
|
||||||
self.faiss_manager = faiss_manager
|
self.faiss_manager = faiss_manager
|
||||||
@@ -193,22 +208,73 @@ class GraphBuilder:
|
|||||||
self.clustering_eps = clustering_eps
|
self.clustering_eps = clustering_eps
|
||||||
self.clustering_min_samples = clustering_min_samples
|
self.clustering_min_samples = clustering_min_samples
|
||||||
self.enable_quality_validation = enable_quality_validation
|
self.enable_quality_validation = enable_quality_validation
|
||||||
self._screen_analyzer = None # ScreenAnalyzer (lazy import)
|
self.enable_ui_enrichment = enable_ui_enrichment
|
||||||
|
self.element_proximity_max_px = element_proximity_max_px
|
||||||
|
# Explicit UIDetector (optional), injected into the lazily initialized analyzer.
|
||||||
|
self._ui_detector = ui_detector
|
||||||
|
# ScreenAnalyzer instance. If provided, it is used as is;
# otherwise, fall back to the shared singleton (lazy init).
|
||||||
|
self._screen_analyzer = screen_analyzer
|
||||||
|
|
||||||
logger.info(
|
logger.info(
|
||||||
f"GraphBuilder initialized: "
|
f"GraphBuilder initialized: "
|
||||||
f"min_repetitions={min_pattern_repetitions}, "
|
f"min_repetitions={min_pattern_repetitions}, "
|
||||||
f"eps={clustering_eps}, "
|
f"eps={clustering_eps}, "
|
||||||
f"min_samples={clustering_min_samples}, "
|
f"min_samples={clustering_min_samples}, "
|
||||||
f"quality_validation={enable_quality_validation}"
|
f"quality_validation={enable_quality_validation}, "
|
||||||
|
f"ui_enrichment={enable_ui_enrichment}"
|
||||||
)
|
)
|
||||||
|
|
||||||
|
# ------------------------------------------------------------------
|
||||||
|
# Lazy resolution of the ScreenAnalyzer (C1 singleton by default)
|
||||||
|
# ------------------------------------------------------------------
|
||||||
|
|
||||||
|
def _get_screen_analyzer(self):
|
||||||
|
"""
|
||||||
|
Return the ScreenAnalyzer instance to use.

Priority:
1. Instance injected via the constructor (`screen_analyzer=…`).
2. Shared singleton `get_screen_analyzer()` (C1), which avoids loading
   the models on the GPU twice when ExecutionLoop and stream_processor both run.
3. As a last resort (circular import, tests), a locally created instance.
|
||||||
|
"""
|
||||||
|
if self._screen_analyzer is not None:
|
||||||
|
return self._screen_analyzer
|
||||||
|
|
||||||
|
try:
|
||||||
|
from core.pipeline import get_screen_analyzer
|
||||||
|
|
||||||
|
self._screen_analyzer = get_screen_analyzer(
|
||||||
|
ui_detector=self._ui_detector,
|
||||||
|
)
|
||||||
|
return self._screen_analyzer
|
||||||
|
except Exception as e:
|
||||||
|
logger.warning(
|
||||||
|
f"Impossible d'obtenir le ScreenAnalyzer singleton "
|
||||||
|
f"({e}); fallback sur une instance locale."
|
||||||
|
)
|
||||||
|
try:
|
||||||
|
from core.pipeline.screen_analyzer import ScreenAnalyzer
|
||||||
|
|
||||||
|
self._screen_analyzer = ScreenAnalyzer(
|
||||||
|
ui_detector=self._ui_detector,
|
||||||
|
)
|
||||||
|
return self._screen_analyzer
|
||||||
|
except Exception as e2:
|
||||||
|
logger.error(
|
||||||
|
f"Impossible d'instancier ScreenAnalyzer: {e2}. "
|
||||||
|
"Enrichissement UI désactivé."
|
||||||
|
)
|
||||||
|
return None
|
||||||
|
|
||||||
def build_from_session(
|
def build_from_session(
|
||||||
self,
|
self,
|
||||||
session: RawSession,
|
session: RawSession,
|
||||||
workflow_name: Optional[str] = None,
|
workflow_name: Optional[str] = None,
|
||||||
precomputed_states: Optional[List["ScreenState"]] = None,
|
precomputed_states: Optional[List["ScreenState"]] = None,
|
||||||
precomputed_embeddings: Optional[List] = None,
|
precomputed_embeddings: Optional[List] = None,
|
||||||
|
sequential: bool = False,
|
||||||
) -> Workflow:
|
) -> Workflow:
|
||||||
"""
|
"""
|
||||||
Construire un Workflow complet depuis une RawSession.
|
Construire un Workflow complet depuis une RawSession.
|
||||||
@@ -216,7 +282,7 @@ class GraphBuilder:
|
|||||||
Processus:
|
Processus:
|
||||||
1. Créer ScreenStates depuis screenshots (ou utiliser precomputed_states)
|
1. Créer ScreenStates depuis screenshots (ou utiliser precomputed_states)
|
||||||
2. Calculer embeddings pour chaque état (ou réutiliser precomputed_embeddings)
|
2. Calculer embeddings pour chaque état (ou réutiliser precomputed_embeddings)
|
||||||
3. Détecter patterns via clustering
|
3. Détecter patterns via clustering (ou mode séquentiel)
|
||||||
4. Construire nodes depuis clusters
|
4. Construire nodes depuis clusters
|
||||||
5. Construire edges depuis transitions
|
5. Construire edges depuis transitions
|
||||||
|
|
||||||
@@ -228,6 +294,10 @@ class GraphBuilder:
|
|||||||
precomputed_embeddings: Embeddings déjà calculés (streaming).
|
precomputed_embeddings: Embeddings déjà calculés (streaming).
|
||||||
Si fourni et de la bonne longueur (= len(screen_states)),
|
Si fourni et de la bonne longueur (= len(screen_states)),
|
||||||
saute l'étape 2 (pas de recalcul CLIP).
|
saute l'étape 2 (pas de recalcul CLIP).
|
||||||
|
sequential: If True, create one node per screen state (no
    DBSCAN clustering). Suited to single-pass recordings of a
    workflow, where each screenshot is a distinct step with
    its associated actions.
|
||||||
|
|
||||||
Returns:
|
Returns:
|
||||||
Workflow construit avec nodes et edges
|
Workflow construit avec nodes et edges
|
||||||
@@ -242,6 +312,7 @@ class GraphBuilder:
|
|||||||
f"Building workflow from session {session.session_id} "
|
f"Building workflow from session {session.session_id} "
|
||||||
f"with {len(precomputed_states or session.screenshots)} "
|
f"with {len(precomputed_states or session.screenshots)} "
|
||||||
f"{'precomputed states' if precomputed_states else 'screenshots'}"
|
f"{'precomputed states' if precomputed_states else 'screenshots'}"
|
||||||
|
f"{' (mode séquentiel)' if sequential else ''}"
|
||||||
)
|
)
|
||||||
|
|
||||||
# Étape 1: Créer ScreenStates (ou réutiliser ceux pré-calculés)
|
# Étape 1: Créer ScreenStates (ou réutiliser ceux pré-calculés)
|
||||||
@@ -266,7 +337,16 @@ class GraphBuilder:
|
|||||||
embeddings = self._compute_embeddings(screen_states)
|
embeddings = self._compute_embeddings(screen_states)
|
||||||
logger.debug(f"Computed {len(embeddings)} embeddings")
|
logger.debug(f"Computed {len(embeddings)} embeddings")
|
||||||
|
|
||||||
# Étape 3: Détecter patterns
|
# Étape 3: Détecter patterns ou mode séquentiel
|
||||||
|
if sequential:
|
||||||
|
# Sequential mode: each screen state is a distinct node.
# No clustering, which is essential for single-pass recordings
# where the sequence of actions must be reproduced faithfully.
|
||||||
|
clusters = {i: [i] for i in range(len(screen_states))}
|
||||||
|
logger.info(
|
||||||
|
f"Mode séquentiel: {len(clusters)} nodes (1 par état)"
|
||||||
|
)
|
||||||
|
else:
|
||||||
clusters = self._detect_patterns(embeddings, screen_states)
|
clusters = self._detect_patterns(embeddings, screen_states)
|
||||||
logger.info(f"Detected {len(clusters)} patterns")
|
logger.info(f"Detected {len(clusters)} patterns")
|
||||||
|
|
||||||
@@ -275,7 +355,10 @@ class GraphBuilder:
|
|||||||
logger.info(f"Built {len(nodes)} workflow nodes")
|
logger.info(f"Built {len(nodes)} workflow nodes")
|
||||||
|
|
||||||
# Étape 5: Construire edges (passer les embeddings pour éviter recalcul)
|
# Étape 5: Construire edges (passer les embeddings pour éviter recalcul)
|
||||||
edges = self._build_edges(nodes, screen_states, session, embeddings=embeddings)
|
edges = self._build_edges(
|
||||||
|
nodes, screen_states, session, embeddings=embeddings,
|
||||||
|
sequential=sequential,
|
||||||
|
)
|
||||||
logger.info(f"Built {len(edges)} workflow edges")
|
logger.info(f"Built {len(edges)} workflow edges")
|
||||||
|
|
||||||
# Créer Workflow
|
# Créer Workflow
|
||||||
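The sequential branch above replaces pattern detection with an identity mapping: cluster `i` contains only state `i`. The sketch below isolates that mapping. `sequential_clusters` mirrors the dict comprehension from the diff, while `consecutive_transitions` is a hypothetical illustration of what "one step after another" implies for edges; the actual edge construction lives in `_build_edges`, which this diff does not show.

```python
from typing import Dict, List, Tuple


def sequential_clusters(n_states: int) -> Dict[int, List[int]]:
    """One cluster per screen state, in recording order (no DBSCAN)."""
    return {i: [i] for i in range(n_states)}


def consecutive_transitions(clusters: Dict[int, List[int]]) -> List[Tuple[int, int]]:
    """Assumption for illustration: link each node to the next one."""
    ids = sorted(clusters)
    return list(zip(ids, ids[1:]))


clusters = sequential_clusters(4)
transitions = consecutive_transitions(clusters)
```

With clustering, two visually similar screenshots can collapse into one node; in sequential mode they stay separate, which is why the docstring recommends it for single-pass recordings.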
@@ -395,11 +478,28 @@ class GraphBuilder:
|
|||||||
if event.screenshot_id:
|
if event.screenshot_id:
|
||||||
screenshot_to_event[event.screenshot_id] = event
|
screenshot_to_event[event.screenshot_id] = event
|
||||||
|
|
||||||
|
# Fetch the shared analyzer (once) if enrichment is enabled.
# The C1 singleton guarantees that UIDetector/CLIP are not reloaded needlessly.
|
||||||
|
analyzer = None
|
||||||
|
if self.enable_ui_enrichment:
|
||||||
|
analyzer = self._get_screen_analyzer()
|
||||||
|
|
||||||
|
# Shared cache (C1): reuse analyses when the same screenshot is
# processed several times (rare during construction, useful in tests).
|
||||||
|
try:
|
||||||
|
from core.pipeline import get_screen_state_cache
|
||||||
|
|
||||||
|
state_cache = get_screen_state_cache()
|
||||||
|
except Exception as e:
|
||||||
|
logger.debug(f"ScreenStateCache indisponible ({e}); aucun cache utilisé.")
|
||||||
|
state_cache = None
|
||||||
|
|
||||||
|
enriched_count = 0
|
||||||
for i, screenshot in enumerate(session.screenshots):
|
for i, screenshot in enumerate(session.screenshots):
|
||||||
# Trouver l'événement associé
|
# Trouver l'événement associé
|
||||||
event = screenshot_to_event.get(screenshot.screenshot_id)
|
event = screenshot_to_event.get(screenshot.screenshot_id)
|
||||||
|
|
||||||
# Créer WindowContext depuis l'événement
|
# Construire WindowContext depuis l'événement (si dispo)
|
||||||
screen_env = session.environment.get("screen", {})
|
screen_env = session.environment.get("screen", {})
|
||||||
screen_res = screen_env.get("primary_resolution", [1920, 1080])
|
screen_res = screen_env.get("primary_resolution", [1920, 1080])
|
||||||
if event and event.window:
|
if event and event.window:
|
||||||
@@ -427,59 +527,127 @@ class GraphBuilder:
|
|||||||
os_language=session.environment.get("os_language", "unknown"),
|
os_language=session.environment.get("os_language", "unknown"),
|
||||||
)
|
)
|
||||||
|
|
||||||
# Créer RawLevel
|
# Absolute path of the screenshot
|
||||||
# Construire chemin absolu : data/training/sessions/{session_id}/{session_id}/{relative_path}
|
screenshot_absolute_path = (
|
||||||
screenshot_absolute_path = f"data/training/sessions/{session.session_id}/{session.session_id}/{screenshot.relative_path}"
|
f"data/training/sessions/{session.session_id}/"
|
||||||
|
f"{session.session_id}/{screenshot.relative_path}"
|
||||||
|
)
|
||||||
screenshot_path = Path(screenshot_absolute_path)
|
screenshot_path = Path(screenshot_absolute_path)
|
||||||
|
|
||||||
|
# Timestamp
|
||||||
|
if isinstance(screenshot.captured_at, str):
|
||||||
|
timestamp = datetime.fromisoformat(
|
||||||
|
screenshot.captured_at.replace('Z', '+00:00')
|
||||||
|
)
|
||||||
|
else:
|
||||||
|
timestamp = screenshot.captured_at
|
||||||
|
|
||||||
|
# ------------------------------------------------------------
# Visual enrichment: delegate to the shared ScreenAnalyzer
# ------------------------------------------------------------
# The analyzer returns a complete ScreenState with:
# - raw (image + file_size)
# - perception (OCR + embedding ref)
# - ui_elements (UIDetector detection)
# These levels are reused to rebuild a final state with the
# WindowContext and the metadata taken from the raw session (the
# business data that the analyzer does not know about).
# ------------------------------------------------------------
|
||||||
|
detected_text: List[str] = []
|
||||||
|
text_method = "none"
|
||||||
|
ui_elements: List = []
|
||||||
raw = RawLevel(
|
raw = RawLevel(
|
||||||
screenshot_path=str(screenshot_path),
|
screenshot_path=str(screenshot_path),
|
||||||
capture_method="mss",
|
capture_method="mss",
|
||||||
file_size_bytes=screenshot_path.stat().st_size if screenshot_path.exists() else 0
|
file_size_bytes=(
|
||||||
|
screenshot_path.stat().st_size
|
||||||
|
if screenshot_path.exists()
|
||||||
|
else 0
|
||||||
|
),
|
||||||
)
|
)
|
||||||
|
|
||||||
# Créer PerceptionLevel — enrichir avec OCR si le screenshot existe
|
if analyzer is not None and screenshot_path.exists():
|
||||||
detected_text = []
|
|
||||||
text_method = "none"
|
|
||||||
|
|
||||||
if screenshot_path.exists():
|
|
||||||
try:
|
try:
|
||||||
if self._screen_analyzer is None:
|
# Construire l'info fenêtre pour donner le contexte à
|
||||||
from core.pipeline.screen_analyzer import ScreenAnalyzer
|
# l'UIDetector (certains détecteurs s'en servent pour
|
||||||
self._screen_analyzer = ScreenAnalyzer(session_id=session.session_id)
|
# filtrer hors-fenêtre).
|
||||||
extracted = self._screen_analyzer._extract_text(str(screenshot_path))
|
window_info = {
|
||||||
if extracted:
|
"app_name": window.app_name,
|
||||||
detected_text = extracted
|
"title": window.window_title,
|
||||||
text_method = self._screen_analyzer._get_ocr_method_name()
|
"screen_resolution": list(window.screen_resolution or []),
|
||||||
except Exception as e:
|
}
|
||||||
logger.debug(f"OCR échoué pour {screenshot_path}: {e}")
|
|
||||||
|
|
||||||
|
analyzed = analyzer.analyze(
|
||||||
|
str(screenshot_path),
|
||||||
|
window_info=window_info,
|
||||||
|
enable_ocr=True,
|
||||||
|
enable_ui_detection=True,
|
||||||
|
session_id=session.session_id,
|
||||||
|
)
|
||||||
|
detected_text = list(analyzed.perception.detected_text or [])
|
||||||
|
text_method = (
|
||||||
|
analyzed.perception.text_detection_method or "none"
|
||||||
|
)
|
||||||
|
ui_elements = list(analyzed.ui_elements or [])
|
||||||
|
# Keep the OCR/UI metrics if present (debug)
|
||||||
|
analyzer_metadata = dict(analyzed.metadata or {})
|
||||||
|
raw = analyzed.raw  # keep the actually measured file_size
|
||||||
|
if ui_elements:
|
||||||
|
enriched_count += 1
|
||||||
|
except Exception as e:
|
||||||
|
logger.warning(
|
||||||
|
f"Enrichissement visuel échoué pour {screenshot_path}: {e}. "
|
||||||
|
"Fallback sur ScreenState minimal."
|
||||||
|
)
|
||||||
|
analyzer_metadata = {"analyzer_error": str(e)}
|
||||||
|
else:
|
||||||
|
analyzer_metadata = {}
|
||||||
|
if self.enable_ui_enrichment and not screenshot_path.exists():
|
||||||
|
logger.debug(
|
||||||
|
f"Screenshot introuvable: {screenshot_path} "
|
||||||
|
"— ui_elements restera vide"
|
||||||
|
)
|
||||||
|
|
||||||
|
# PerceptionLevel: vector_id is computed deterministically.
|
||||||
perception = PerceptionLevel(
|
perception = PerceptionLevel(
|
||||||
embedding=EmbeddingRef(
|
embedding=EmbeddingRef(
|
||||||
provider="openclip_ViT-B-32",
|
provider="openclip_ViT-B-32",
|
||||||
vector_id=f"data/embeddings/screens/{session.session_id}_state_{i:04d}.npy",
|
vector_id=(
|
||||||
dimensions=512
|
f"data/embeddings/screens/"
|
||||||
|
f"{session.session_id}_state_{i:04d}.npy"
|
||||||
|
),
|
||||||
|
dimensions=512,
|
||||||
),
|
),
|
||||||
detected_text=detected_text,
|
detected_text=detected_text,
|
||||||
text_detection_method=text_method,
|
text_detection_method=text_method,
|
||||||
confidence_avg=0.85 if detected_text else 0.0
|
confidence_avg=0.85 if detected_text else 0.0,
|
||||||
)
|
)
|
||||||
|
|
||||||
# Créer ContextLevel
|
# ContextLevel (business data)
|
||||||
context = ContextLevel(
|
context = ContextLevel(
|
||||||
current_workflow_candidate=None,
|
current_workflow_candidate=None,
|
||||||
workflow_step=i,
|
workflow_step=i,
|
||||||
user_id=session.user.get("id", "unknown"),
|
user_id=session.user.get("id", "unknown"),
|
||||||
tags=list(session.context.get("tags", [])) if isinstance(session.context.get("tags"), list) else [],
|
tags=(
|
||||||
business_variables={}
|
list(session.context.get("tags", []))
|
||||||
|
if isinstance(session.context.get("tags"), list)
|
||||||
|
else []
|
||||||
|
),
|
||||||
|
business_variables={},
|
||||||
)
|
)
|
||||||
|
|
||||||
# Parser timestamp
|
# Metadata : on garde le lien événement/session + éventuels
|
||||||
if isinstance(screenshot.captured_at, str):
|
# compteurs remontés par l'analyzer.
|
||||||
timestamp = datetime.fromisoformat(screenshot.captured_at.replace('Z', '+00:00'))
|
metadata = {
|
||||||
else:
|
"screenshot_id": screenshot.screenshot_id,
|
||||||
timestamp = screenshot.captured_at
|
"event_type": event.type if event else None,
|
||||||
|
"event_time": event.t if event else None,
|
||||||
|
}
|
||||||
|
# Propagate the analyzer's useful indicators without overwriting the base metadata.
|
||||||
|
for key in ("ocr_ms", "ui_ms", "analyzer_error"):
|
||||||
|
if key in analyzer_metadata:
|
||||||
|
metadata[key] = analyzer_metadata[key]
|
||||||
|
|
||||||
# Créer ScreenState complet
|
|
||||||
state = ScreenState(
|
state = ScreenState(
|
||||||
screen_state_id=f"{session.session_id}_state_{i:04d}",
|
screen_state_id=f"{session.session_id}_state_{i:04d}",
|
||||||
timestamp=timestamp,
|
timestamp=timestamp,
|
||||||
@@ -488,17 +656,17 @@ class GraphBuilder:
|
|||||||
raw=raw,
|
raw=raw,
|
||||||
perception=perception,
|
perception=perception,
|
||||||
context=context,
|
context=context,
|
||||||
metadata={
|
metadata=metadata,
|
||||||
"screenshot_id": screenshot.screenshot_id,
|
ui_elements=ui_elements,
|
||||||
"event_type": event.type if event else None,
|
|
||||||
"event_time": event.t if event else None
|
|
||||||
},
|
|
||||||
ui_elements=[] # Sera rempli par UIDetector si disponible
|
|
||||||
)
|
)
|
||||||
|
|
||||||
screen_states.append(state)
|
screen_states.append(state)
|
||||||
|
|
||||||
logger.info(f"Created {len(screen_states)} enriched screen states")
|
logger.info(
|
||||||
|
f"Created {len(screen_states)} enriched screen states "
|
||||||
|
f"({enriched_count} avec UI détectée, "
|
||||||
|
f"ui_enrichment={self.enable_ui_enrichment})"
|
||||||
|
)
|
||||||
return screen_states
|
return screen_states
|
||||||
|
|
||||||
def _compute_embeddings(
|
def _compute_embeddings(
|
||||||
@@ -924,6 +1092,99 @@ class GraphBuilder:
|
|||||||
constraints.sort(key=lambda c: role_counts.get(c.get("role", ""), 0), reverse=True)
|
constraints.sort(key=lambda c: role_counts.get(c.get("role", ""), 0), reverse=True)
|
||||||
return constraints[:8]
|
return constraints[:8]
|
||||||
|
|
||||||
|
# ------------------------------------------------------------------
|
||||||
|
# Spatial association: click → UIElement
|
||||||
|
# ------------------------------------------------------------------
|
||||||
|
|
||||||
|
def _find_clicked_element(
|
||||||
|
self,
|
||||||
|
event: Event,
|
||||||
|
ui_elements: List[Any],
|
||||||
|
) -> Optional[Any]:
|
||||||
|
"""
|
||||||
|
Identify the UIElement targeted by a click, using spatial proximity.

Rule:
1. If a bbox contains the click position (edges included) → match;
   when several do, the smallest one wins.
2. Otherwise, take the nearest bbox (Euclidean distance to its edge),
   provided it is within `element_proximity_max_px`.
3. Otherwise, no anchoring is possible → None.

This association turns a "blind" click (raw coordinates) into a
"smart" click (role + label), so the matcher can find the element
again even if the resolution or the position changes.

Args:
event: `mouse_click` event (with `data["pos"] = [x, y]`).
ui_elements: List of the UIElements detected on the source screen.

Returns:
The most relevant UIElement, or None if nothing matches.
|
||||||
|
"""
|
||||||
|
if not ui_elements:
|
||||||
|
return None
|
||||||
|
if not event or event.type != "mouse_click":
|
||||||
|
return None
|
||||||
|
|
||||||
|
pos = event.data.get("pos") if event.data else None
|
||||||
|
if not pos or len(pos) < 2:
|
||||||
|
return None
|
||||||
|
|
||||||
|
try:
|
||||||
|
click_x = float(pos[0])
|
||||||
|
click_y = float(pos[1])
|
||||||
|
except (TypeError, ValueError):
|
||||||
|
return None
|
||||||
|
|
||||||
|
best_contained = None
|
||||||
|
best_contained_area = None
|
||||||
|
best_near = None
|
||||||
|
best_near_distance = None
|
||||||
|
|
||||||
|
for element in ui_elements:
|
||||||
|
bbox = getattr(element, "bbox", None)
|
||||||
|
if bbox is None:
|
||||||
|
continue
|
||||||
|
|
||||||
|
# Defensive coordinate extraction (BBox object or tuple); hasattr avoids indexing a non-subscriptable BBox
|
||||||
|
try:
|
||||||
|
bx = int(bbox.x if hasattr(bbox, "x") else bbox[0])
|
||||||
|
by = int(bbox.y if hasattr(bbox, "y") else bbox[1])
|
||||||
|
bw = int(bbox.width if hasattr(bbox, "width") else bbox[2])
|
||||||
|
bh = int(bbox.height if hasattr(bbox, "height") else bbox[3])
|
||||||
|
except (AttributeError, IndexError, TypeError):
|
||||||
|
continue
|
||||||
|
|
||||||
|
# Case 1: the position is inside the bbox (edges included).
|
||||||
|
if bx <= click_x <= bx + bw and by <= click_y <= by + bh:
|
||||||
|
# Select the smallest containing bbox (the most specific element)
|
||||||
|
area = max(1, bw * bh)
|
||||||
|
if best_contained is None or area < best_contained_area:
|
||||||
|
best_contained = element
|
||||||
|
best_contained_area = area
|
||||||
|
continue
|
||||||
|
|
||||||
|
# Case 2: compute the distance to the nearest edge.
|
||||||
|
dx = max(bx - click_x, 0, click_x - (bx + bw))
|
||||||
|
dy = max(by - click_y, 0, click_y - (by + bh))
|
||||||
|
distance = (dx * dx + dy * dy) ** 0.5
|
||||||
|
|
||||||
|
if best_near is None or distance < best_near_distance:
|
||||||
|
best_near = element
|
||||||
|
best_near_distance = distance
|
||||||
|
|
||||||
|
if best_contained is not None:
|
||||||
|
return best_contained
|
||||||
|
|
||||||
|
if (
|
||||||
|
best_near is not None
|
||||||
|
and best_near_distance is not None
|
||||||
|
and best_near_distance <= self.element_proximity_max_px
|
||||||
|
):
|
||||||
|
return best_near
|
||||||
|
|
||||||
|
return None
|
||||||
|
|
||||||
# Patterns d'erreur courants pour la détection fail_fast
|
# Patterns d'erreur courants pour la détection fail_fast
|
||||||
_ERROR_PATTERNS = [
|
_ERROR_PATTERNS = [
|
||||||
"erreur", "error", "échec", "failed", "impossible",
|
"erreur", "error", "échec", "failed", "impossible",
|
||||||
@@ -937,12 +1198,14 @@ class GraphBuilder:
|
|||||||
screen_states: List[ScreenState],
|
screen_states: List[ScreenState],
|
||||||
session: RawSession,
|
session: RawSession,
|
||||||
embeddings: Optional[List[np.ndarray]] = None,
|
embeddings: Optional[List[np.ndarray]] = None,
|
||||||
|
sequential: bool = False,
|
||||||
) -> List[WorkflowEdge]:
|
) -> List[WorkflowEdge]:
|
||||||
"""
|
"""
|
||||||
Construire WorkflowEdges depuis les transitions observées.
|
Construire WorkflowEdges depuis les transitions observées.
|
||||||
|
|
||||||
Algorithme:
|
Algorithme:
|
||||||
1. Mapper chaque ScreenState vers son node (via embedding similarity)
|
1. Mapper chaque ScreenState vers son node (via embedding similarity)
|
||||||
|
En mode séquentiel, le mapping est direct (state i → node i).
|
||||||
2. Identifier les transitions (state_i -> state_j où node change)
|
2. Identifier les transitions (state_i -> state_j où node change)
|
||||||
3. Extraire l'action depuis l'événement entre les deux états
|
3. Extraire l'action depuis l'événement entre les deux états
|
||||||
4. Créer WorkflowEdge avec action, pré-conditions et post-conditions
|
4. Créer WorkflowEdge avec action, pré-conditions et post-conditions
|
||||||
@@ -960,6 +1223,7 @@ class GraphBuilder:
|
|||||||
screen_states: ScreenStates
|
screen_states: ScreenStates
|
||||||
session: Session brute (pour événements)
|
session: Session brute (pour événements)
|
||||||
embeddings: Embeddings pré-calculés (évite un recalcul dans _map_states_to_nodes)
|
embeddings: Embeddings pré-calculés (évite un recalcul dans _map_states_to_nodes)
|
||||||
|
sequential: Mode séquentiel — chaque paire consécutive = transition
|
||||||
|
|
||||||
Returns:
|
Returns:
|
||||||
Liste de WorkflowEdges
|
Liste de WorkflowEdges
|
||||||
@@ -975,7 +1239,19 @@ class GraphBuilder:
|
|||||||
node_by_id = {node.node_id: node for node in nodes}
|
node_by_id = {node.node_id: node for node in nodes}
|
||||||
|
|
||||||
# Étape 1: Mapper chaque état vers son node
|
# Étape 1: Mapper chaque état vers son node
|
||||||
state_to_node = self._map_states_to_nodes(screen_states, nodes, embeddings=embeddings)
|
if sequential:
|
||||||
|
# Mode séquentiel : mapping direct state[i] → node[i]
|
||||||
|
state_to_node = {}
|
||||||
|
for i, state in enumerate(screen_states):
|
||||||
|
if i < len(nodes):
|
||||||
|
state_to_node[state.screen_state_id] = nodes[i].node_id
|
||||||
|
logger.debug(
|
||||||
|
f"Mode séquentiel: {len(state_to_node)} states mappés directement"
|
||||||
|
)
|
||||||
|
else:
|
||||||
|
state_to_node = self._map_states_to_nodes(
|
||||||
|
screen_states, nodes, embeddings=embeddings
|
||||||
|
)
|
||||||
|
|
||||||
# Étape 2: Récupérer la résolution d'écran pour normaliser les coordonnées
|
# Étape 2: Récupérer la résolution d'écran pour normaliser les coordonnées
|
||||||
screen_env = session.environment.get("screen", {})
|
screen_env = session.environment.get("screen", {})
|
||||||
@@ -989,8 +1265,11 @@ class GraphBuilder:
|
|||||||
current_node_id = state_to_node.get(current_state.screen_state_id)
|
current_node_id = state_to_node.get(current_state.screen_state_id)
|
||||||
next_node_id = state_to_node.get(next_state.screen_state_id)
|
next_node_id = state_to_node.get(next_state.screen_state_id)
|
||||||
|
|
||||||
# Si les deux états sont dans des nodes différents, c'est une transition
|
# En mode séquentiel, chaque paire consécutive est une transition
|
||||||
if current_node_id and next_node_id and current_node_id != next_node_id:
|
# En mode clustering, uniquement si les nodes sont différents
|
||||||
|
if current_node_id and next_node_id and (
|
||||||
|
sequential or current_node_id != next_node_id
|
||||||
|
):
|
||||||
# Trouver TOUS les événements entre les deux états
|
# Trouver TOUS les événements entre les deux états
|
||||||
transition_events = self._find_transition_events(
|
transition_events = self._find_transition_events(
|
||||||
current_state, next_state, session.events
|
current_state, next_state, session.events
|
||||||
@@ -1012,6 +1291,7 @@ class GraphBuilder:
|
|||||||
target_node=target_node,
|
target_node=target_node,
|
||||||
all_events=transition_events,
|
all_events=transition_events,
|
||||||
screen_resolution=screen_resolution,
|
screen_resolution=screen_resolution,
|
||||||
|
source_state=current_state,
|
||||||
)
|
)
|
||||||
edges.append(edge)
|
edges.append(edge)
|
||||||
|
|
||||||
@@ -1094,6 +1374,32 @@ class GraphBuilder:
|
|||||||
|
|
||||||
return state_to_node
|
return state_to_node
|
||||||
|
|
||||||
|
def _get_state_time(self, state: ScreenState, fallback: float = 0) -> float:
|
||||||
|
"""Extract the timestamp of a ScreenState.
|
||||||
|
|
||||||
|
Priority:
|
||||||
|
1. metadata['event_time'] (set by _create_screen_states)
|
||||||
|
2. metadata['shot_timestamp'] (set by reprocessing)
|
||||||
|
3. state.timestamp converted to epoch seconds if it is a datetime
|
||||||
|
4. fallback
|
||||||
|
|
||||||
|
Note: event_time can be 0.0 (relative timestamps), so the check
|
||||||
|
is `is not None` rather than `> 0`.
|
||||||
|
"""
|
||||||
|
if state.metadata:
|
||||||
|
et = state.metadata.get("event_time")
|
||||||
|
if et is not None:
|
||||||
|
return float(et)
|
||||||
|
st = state.metadata.get("shot_timestamp")
|
||||||
|
if st is not None:
|
||||||
|
return float(st)
|
||||||
|
if state.timestamp:
|
||||||
|
try:
|
||||||
|
return state.timestamp.timestamp()
|
||||||
|
except (AttributeError, OSError):
|
||||||
|
pass
|
||||||
|
return fallback
|
||||||
|
|
||||||
def _find_transition_events(
|
def _find_transition_events(
|
||||||
self,
|
self,
|
||||||
current_state: ScreenState,
|
current_state: ScreenState,
|
||||||
@@ -1108,6 +1414,9 @@ class GraphBuilder:
|
|||||||
C'est essentiel pour le replay : une transition peut nécessiter
|
C'est essentiel pour le replay : une transition peut nécessiter
|
||||||
plusieurs actions (ex: Win+R → taper "notepad" → Entrée).
|
plusieurs actions (ex: Win+R → taper "notepad" → Entrée).
|
||||||
|
|
||||||
|
Timestamps : utilise _get_state_time() qui supporte plusieurs
|
||||||
|
sources (event_time, shot_timestamp, datetime).
|
||||||
|
|
||||||
Args:
|
Args:
|
||||||
current_state: État source
|
current_state: État source
|
||||||
next_state: État cible
|
next_state: État cible
|
||||||
@@ -1117,8 +1426,8 @@ class GraphBuilder:
|
|||||||
Liste ordonnée (par timestamp) de tous les événements d'action
|
Liste ordonnée (par timestamp) de tous les événements d'action
|
||||||
entre les deux états. Peut être vide.
|
entre les deux états. Peut être vide.
|
||||||
"""
|
"""
|
||||||
current_time = current_state.metadata.get("event_time", 0)
|
current_time = self._get_state_time(current_state, fallback=0)
|
||||||
next_time = next_state.metadata.get("event_time", float('inf'))
|
next_time = self._get_state_time(next_state, fallback=float('inf'))
|
||||||
|
|
||||||
action_events = []
|
action_events = []
|
||||||
for event in events:
|
for event in events:
|
||||||
@@ -1155,6 +1464,7 @@ class GraphBuilder:
|
|||||||
target_node: Optional[WorkflowNode] = None,
|
target_node: Optional[WorkflowNode] = None,
|
||||||
all_events: Optional[List[Event]] = None,
|
all_events: Optional[List[Event]] = None,
|
||||||
screen_resolution: Tuple[int, int] = (1920, 1080),
|
screen_resolution: Tuple[int, int] = (1920, 1080),
|
||||||
|
source_state: Optional[ScreenState] = None,
|
||||||
) -> WorkflowEdge:
|
) -> WorkflowEdge:
|
||||||
"""
|
"""
|
||||||
Créer un WorkflowEdge depuis une transition observée.
|
Créer un WorkflowEdge depuis une transition observée.
|
||||||
@@ -1180,12 +1490,24 @@ class GraphBuilder:
|
|||||||
# Si on a plusieurs événements, créer une action compound
|
# Si on a plusieurs événements, créer une action compound
|
||||||
events_to_use = all_events or ([event] if event else [])
|
events_to_use = all_events or ([event] if event else [])
|
||||||
|
|
||||||
|
# UIElements de l'écran source — sert à ancrer les clics sur un vrai
|
||||||
|
# élément UI (rôle, texte, bbox) plutôt que sur une coordonnée brute.
|
||||||
|
source_ui_elements = (
|
||||||
|
list(source_state.ui_elements)
|
||||||
|
if source_state and source_state.ui_elements
|
||||||
|
else []
|
||||||
|
)
|
||||||
|
|
||||||
if len(events_to_use) > 1:
|
if len(events_to_use) > 1:
|
||||||
action = self._build_compound_action(
|
action = self._build_compound_action(
|
||||||
events_to_use, screen_resolution
|
events_to_use, screen_resolution,
|
||||||
|
source_ui_elements=source_ui_elements,
|
||||||
)
|
)
|
||||||
elif len(events_to_use) == 1:
|
elif len(events_to_use) == 1:
|
||||||
action = self._build_single_action(events_to_use[0])
|
action = self._build_single_action(
|
||||||
|
events_to_use[0],
|
||||||
|
source_ui_elements=source_ui_elements,
|
||||||
|
)
|
||||||
else:
|
else:
|
||||||
action = Action(
|
action = Action(
|
||||||
type="unknown",
|
type="unknown",
|
||||||
@@ -1235,15 +1557,29 @@ class GraphBuilder:
|
|||||||
metadata=edge_metadata,
|
metadata=edge_metadata,
|
||||||
)
|
)
|
||||||
|
|
||||||
def _build_single_action(self, event: Event) -> Action:
|
def _build_single_action(
|
||||||
|
self,
|
||||||
|
event: Event,
|
||||||
|
source_ui_elements: Optional[List[Any]] = None,
|
||||||
|
) -> Action:
|
||||||
"""
|
"""
|
||||||
Construire une Action simple depuis un seul événement.
|
Construire une Action simple depuis un seul événement.
|
||||||
|
|
||||||
Rétrocompatible avec l'ancien format : un type d'action direct
|
Pour un clic, si `source_ui_elements` est fourni, on tente d'ancrer
|
||||||
(mouse_click, key_press, text_input) avec ses paramètres.
|
l'action sur l'UIElement le plus proche (par proximité spatiale).
|
||||||
|
Le TargetSpec devient alors discriminant :
|
||||||
|
- `by_role` = rôle sémantique de l'élément (ex: "primary_action")
|
||||||
|
- `by_text` = label détecté (ex: "Valider")
|
||||||
|
- `selection_policy` = "by_similarity" (laisse le matcher scorer)
|
||||||
|
- `context_hints["anchor_element_id"]` = traçabilité
|
||||||
|
- `context_hints["anchor_bbox"]` = invariant spatial debug
|
||||||
|
|
||||||
|
À défaut d'ancrage (pas d'UIElement ou clic hors de toute bbox
|
||||||
|
proche), on retombe sur `by_role="unknown_element"` (legacy).
|
||||||
"""
|
"""
|
||||||
action_type = event.type
|
action_type = event.type
|
||||||
action_params = {}
|
action_params: Dict[str, Any] = {}
|
||||||
|
target_spec: Optional[TargetSpec] = None
|
||||||
|
|
||||||
if action_type == "mouse_click":
|
if action_type == "mouse_click":
|
||||||
action_params = {
|
action_params = {
|
||||||
@@ -1251,39 +1587,111 @@ class GraphBuilder:
|
|||||||
"position": event.data.get("pos", [0, 0]),
|
"position": event.data.get("pos", [0, 0]),
|
||||||
"wait_after_ms": 500,
|
"wait_after_ms": 500,
|
||||||
}
|
}
|
||||||
target_role = "unknown_element"
|
target_spec = self._build_click_target_spec(
|
||||||
|
event, source_ui_elements or []
|
||||||
|
)
|
||||||
|
|
||||||
elif action_type == "key_press":
|
elif action_type == "key_press":
|
||||||
action_params = {
|
action_params = {
|
||||||
"keys": event.data.get("keys", []),
|
"keys": event.data.get("keys", []),
|
||||||
"wait_after_ms": 200,
|
"wait_after_ms": 200,
|
||||||
}
|
}
|
||||||
target_role = "keyboard_input"
|
target_spec = TargetSpec(
|
||||||
|
by_role="keyboard_input",
|
||||||
|
selection_policy="first",
|
||||||
|
fallback_strategy="visual_similarity",
|
||||||
|
)
|
||||||
|
|
||||||
elif action_type == "text_input":
|
elif action_type == "text_input":
|
||||||
action_params = {
|
action_params = {
|
||||||
"text": event.data.get("text", ""),
|
"text": event.data.get("text", ""),
|
||||||
"wait_after_ms": 300,
|
"wait_after_ms": 300,
|
||||||
}
|
}
|
||||||
target_role = "text_field"
|
target_spec = TargetSpec(
|
||||||
|
by_role="text_field",
|
||||||
|
selection_policy="first",
|
||||||
|
fallback_strategy="visual_similarity",
|
||||||
|
)
|
||||||
else:
|
else:
|
||||||
action_params = {}
|
action_params = {}
|
||||||
target_role = "unknown"
|
target_spec = TargetSpec(
|
||||||
|
by_role="unknown",
|
||||||
|
selection_policy="first",
|
||||||
|
fallback_strategy="visual_similarity",
|
||||||
|
)
|
||||||
|
|
||||||
return Action(
|
return Action(
|
||||||
type=action_type,
|
type=action_type,
|
||||||
target=TargetSpec(
|
target=target_spec,
|
||||||
by_role=target_role,
|
parameters=action_params,
|
||||||
|
)
|
||||||
|
|
||||||
|
def _build_click_target_spec(
|
||||||
|
self,
|
||||||
|
event: Event,
|
||||||
|
source_ui_elements: List[Any],
|
||||||
|
) -> TargetSpec:
|
||||||
|
"""
|
||||||
|
Build a TargetSpec for a click, trying to anchor it to a UIElement
|
||||||
|
detected on the source screen.
|
||||||
|
|
||||||
|
Always returns a valid TargetSpec:
|
||||||
|
- anchored (role + text + context_hints) if a nearby element exists;
|
||||||
|
- `unknown_element` fallback otherwise (legacy behavior).
|
||||||
|
"""
|
||||||
|
clicked = self._find_clicked_element(event, source_ui_elements)
|
||||||
|
|
||||||
|
if clicked is None:
|
||||||
|
return TargetSpec(
|
||||||
|
by_role="unknown_element",
|
||||||
selection_policy="first",
|
selection_policy="first",
|
||||||
fallback_strategy="visual_similarity",
|
fallback_strategy="visual_similarity",
|
||||||
),
|
)
|
||||||
parameters=action_params,
|
|
||||||
|
# Defensive extraction of the element's attributes.
|
||||||
|
role = getattr(clicked, "role", None) or "unknown_element"
|
||||||
|
label = getattr(clicked, "label", None) or None
|
||||||
|
element_id = getattr(clicked, "element_id", None)
|
||||||
|
|
||||||
|
# Traceability context: `context_hints` is the only free-form dict
|
||||||
|
# available in TargetSpec (there is no dedicated `metadata` field).
|
||||||
|
context_hints: Dict[str, Any] = {}
|
||||||
|
if element_id:
|
||||||
|
context_hints["anchor_element_id"] = str(element_id)
|
||||||
|
|
||||||
|
bbox = getattr(clicked, "bbox", None)
|
||||||
|
if bbox is not None:
|
||||||
|
try:
|
||||||
|
context_hints["anchor_bbox"] = {
|
||||||
|
"x": int(bbox.x if hasattr(bbox, "x") else bbox[0]),
|
||||||
|
"y": int(bbox.y if hasattr(bbox, "y") else bbox[1]),
|
||||||
|
"width": int(bbox.width if hasattr(bbox, "width") else bbox[2]),
|
||||||
|
"height": int(bbox.height if hasattr(bbox, "height") else bbox[3]),
|
||||||
|
}
|
||||||
|
except (AttributeError, IndexError, TypeError):
|
||||||
|
pass
|
||||||
|
|
||||||
|
# Center (useful as a fallback anchor when the matcher fails)
|
||||||
|
center = getattr(clicked, "center", None)
|
||||||
|
if center is not None:
|
||||||
|
try:
|
||||||
|
context_hints["anchor_center"] = [int(center[0]), int(center[1])]
|
||||||
|
except (IndexError, TypeError):
|
||||||
|
pass
|
||||||
|
|
||||||
|
return TargetSpec(
|
||||||
|
by_role=role,
|
||||||
|
by_text=label,
|
||||||
|
selection_policy="by_similarity",
|
||||||
|
fallback_strategy="visual_similarity",
|
||||||
|
context_hints=context_hints,
|
||||||
)
|
)
|
||||||
|
|
||||||
def _build_compound_action(
|
def _build_compound_action(
|
||||||
self,
|
self,
|
||||||
events: List[Event],
|
events: List[Event],
|
||||||
screen_resolution: Tuple[int, int] = (1920, 1080),
|
screen_resolution: Tuple[int, int] = (1920, 1080),
|
||||||
|
source_ui_elements: Optional[List[Any]] = None,
|
||||||
) -> Action:
|
) -> Action:
|
||||||
"""
|
"""
|
||||||
Construire une Action compound (multi-étapes) depuis plusieurs événements.
|
Construire une Action compound (multi-étapes) depuis plusieurs événements.
|
||||||
@@ -1360,21 +1768,33 @@ class GraphBuilder:
|
|||||||
# La cible du compound = cible de la dernière action (le clic final, etc.)
|
# La cible du compound = cible de la dernière action (le clic final, etc.)
|
||||||
last_event = events[-1]
|
last_event = events[-1]
|
||||||
if last_event.type == "mouse_click":
|
if last_event.type == "mouse_click":
|
||||||
target_role = "unknown_element"
|
# On tente d'ancrer le clic final aux UIElements détectés,
|
||||||
|
# comme dans _build_single_action.
|
||||||
|
target_spec = self._build_click_target_spec(
|
||||||
|
last_event, source_ui_elements or []
|
||||||
|
)
|
||||||
elif last_event.type == "text_input":
|
elif last_event.type == "text_input":
|
||||||
target_role = "text_field"
|
target_spec = TargetSpec(
|
||||||
|
by_role="text_field",
|
||||||
|
selection_policy="first",
|
||||||
|
fallback_strategy="visual_similarity",
|
||||||
|
)
|
||||||
elif last_event.type == "key_press":
|
elif last_event.type == "key_press":
|
||||||
target_role = "keyboard_input"
|
target_spec = TargetSpec(
|
||||||
|
by_role="keyboard_input",
|
||||||
|
selection_policy="first",
|
||||||
|
fallback_strategy="visual_similarity",
|
||||||
|
)
|
||||||
else:
|
else:
|
||||||
target_role = "unknown"
|
target_spec = TargetSpec(
|
||||||
|
by_role="unknown",
|
||||||
|
selection_policy="first",
|
||||||
|
fallback_strategy="visual_similarity",
|
||||||
|
)
|
||||||
|
|
||||||
return Action(
|
return Action(
|
||||||
type="compound",
|
type="compound",
|
||||||
target=TargetSpec(
|
target=target_spec,
|
||||||
by_role=target_role,
|
|
||||||
selection_policy="first",
|
|
||||||
fallback_strategy="visual_similarity",
|
|
||||||
),
|
|
||||||
parameters={
|
parameters={
|
||||||
"steps": steps,
|
"steps": steps,
|
||||||
"step_count": len(steps),
|
"step_count": len(steps),
|
||||||
|
|||||||
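Review note on the click anchoring added in `_find_clicked_element`: the rule (smallest containing bbox first, otherwise nearest bbox within a pixel budget) can be checked in isolation with the minimal sketch below. It assumes plain `(x, y, width, height)` tuples rather than the project's BBox objects, and `edge_distance`, `pick_bbox` and `max_px` are hypothetical names used only for illustration.

```python
from typing import List, Optional, Tuple

# Assumption: bboxes are plain (x, y, width, height) tuples.
BBox = Tuple[int, int, int, int]


def edge_distance(click: Tuple[float, float], bbox: BBox) -> float:
    """Euclidean distance from a point to a rectangle; 0.0 when inside or on the edge."""
    cx, cy = click
    x, y, w, h = bbox
    dx = max(x - cx, 0.0, cx - (x + w))
    dy = max(y - cy, 0.0, cy - (y + h))
    return (dx * dx + dy * dy) ** 0.5


def pick_bbox(click: Tuple[float, float], bboxes: List[BBox], max_px: float) -> Optional[BBox]:
    """Smallest containing bbox first, else the nearest bbox within max_px, else None."""
    containing = [b for b in bboxes if edge_distance(click, b) == 0.0]
    if containing:
        # The smallest area is treated as the most specific element.
        return min(containing, key=lambda b: max(1, b[2] * b[3]))
    if not bboxes:
        return None
    nearest = min(bboxes, key=lambda b: edge_distance(click, b))
    return nearest if edge_distance(click, nearest) <= max_px else None
```

With a window `(0, 0, 100, 100)` and a button `(10, 10, 20, 20)`, a click at `(15, 15)` resolves to the button, while a click 5 px to the right of the window resolves to the window as long as `max_px` is at least 5.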
20
core/grounding/__init__.py
Normal file
@@ -0,0 +1,20 @@
|
|||||||
|
# core/grounding: UI element localization module
|
||||||
|
#
|
||||||
|
# Centralizes the visual grounding methods: template matching,
|
||||||
|
# OCR, VLM, etc. Each method produces a uniform GroundingResult.
|
||||||
|
#
|
||||||
|
# The grounding server (server.py) runs in a separate process
|
||||||
|
# on port 8200. The client (UITarsGrounder) calls it over HTTP.
|
||||||
|
# The pipeline (GroundingPipeline) orchestrates template → OCR → UI-TARS → static.
|
||||||
|
|
||||||
|
from core.grounding.template_matcher import TemplateMatcher, MatchResult
|
||||||
|
from core.grounding.target import GroundingTarget, GroundingResult
|
||||||
|
from core.grounding.ui_tars_grounder import UITarsGrounder
|
||||||
|
from core.grounding.pipeline import GroundingPipeline
|
||||||
|
|
||||||
|
__all__ = [
|
||||||
|
'TemplateMatcher', 'MatchResult',
|
||||||
|
'GroundingTarget', 'GroundingResult',
|
||||||
|
'UITarsGrounder',
|
||||||
|
'GroundingPipeline',
|
||||||
|
]
|
||||||
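Review note on the `template → OCR → UI-TARS → static` ordering described in `core/grounding/__init__.py`: the real `GroundingPipeline` interface is not shown in this diff, so the following is only a sketch of the general fallback-cascade idea. `run_cascade`, the stage names and the returned points are all hypothetical.

```python
from typing import Callable, List, Optional, Tuple

Point = Tuple[int, int]
# Assumption: each stage is a (name, zero-argument locator) pair.
Stage = Tuple[str, Callable[[], Optional[Point]]]


def run_cascade(stages: List[Stage]) -> Tuple[Optional[str], Optional[Point]]:
    """Try each stage in order and return the first (name, point) that succeeds."""
    for name, locate in stages:
        try:
            point = locate()
        except Exception:
            point = None  # a failing stage simply hands over to the next one
        if point is not None:
            return name, point
    return None, None


# Placeholder stages: the template stage finds nothing, so OCR answers.
stages: List[Stage] = [
    ("template", lambda: None),
    ("ocr", lambda: (412, 318)),
    ("ui_tars", lambda: (410, 320)),
    ("static", lambda: (400, 300)),
]
winner = run_cascade(stages)
```

Putting the cheap stages first means the slower model-based stage only runs when template matching and OCR both come back empty.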
256
core/grounding/dialog_handler.py
Normal file
@@ -0,0 +1,256 @@
|
|||||||
|
"""
|
||||||
|
core/grounding/dialog_handler.py: smart dialog handling
|
||||||
|
|
||||||
|
When an unexpected dialog appears (pHash changes after an action):
|
||||||
|
1. Read the window title text (EasyOCR over the full screen, ~500ms)
|
||||||
|
2. If the title is known (Enregistrer sous, Confirmer, etc.) → known action
|
||||||
|
3. Ask InfiGUI to click the right button (~3s)
|
||||||
|
4. Check that the dialog has disappeared (pHash)
|
||||||
|
|
||||||
|
No predefined patterns for the buttons. InfiGUI understands the
|
||||||
|
dialog visually and clicks in the right place.
|
||||||
|
|
||||||
|
Usage:
|
||||||
|
from core.grounding.dialog_handler import DialogHandler
|
||||||
|
|
||||||
|
handler = DialogHandler()
|
||||||
|
result = handler.handle_if_dialog(screenshot_pil)
|
||||||
|
if result['handled']:
|
||||||
|
print(f"Dialogue '{result['title']}' géré → {result['action']}")
|
||||||
|
"""
|
||||||
|
|
||||||
|
from __future__ import annotations
|
||||||
|
|
||||||
|
import time
|
||||||
|
from typing import Any, Dict, Optional
|
||||||
|
|
||||||
|
|
||||||
|
# Known titles → which action to request from InfiGUI.
|
||||||
|
#
|
||||||
|
# IMPORTANT: dict order = matching priority.
|
||||||
|
# The OCR is full-screen and often captures the text of the parent dialog AND of
|
||||||
|
# the modal popup shown on top of it (e.g. "Enregistrer sous" stays visible behind
|
||||||
|
# "Confirmer l'enregistrement"). Modal popups MUST match before the main
|
||||||
|
# windows, otherwise Léa clicks the button of the parent, which does not have
|
||||||
|
# focus.
|
||||||
|
KNOWN_DIALOGS = {
|
||||||
|
# ── Modal confirmation popups (HIGH priority) ──────────────────
|
||||||
|
"voulez-vous le remplacer": {"target": "Oui", "description": "Clique sur Oui pour confirmer le remplacement du fichier"},
|
||||||
|
"do you want to replace": {"target": "Yes", "description": "Click Yes to confirm file replacement"},
|
||||||
|
"existe déjà": {"target": "Oui", "description": "Clique sur Oui, le fichier existe déjà et doit être remplacé"},
|
||||||
|
"already exists": {"target": "Yes", "description": "Click Yes, the file already exists"},
|
||||||
|
"remplacer": {"target": "Oui", "description": "Clique sur le bouton Oui pour confirmer le remplacement du fichier"},
|
||||||
|
"replace": {"target": "Yes", "description": "Click Yes to confirm file replacement"},
|
||||||
|
"écraser": {"target": "Oui", "description": "Clique sur Oui pour écraser le fichier"},
|
||||||
|
"overwrite": {"target": "Yes", "description": "Click Yes to overwrite"},
|
||||||
|
"confirmer l'enregistrement": {"target": "Oui", "description": "Clique sur Oui dans le popup de confirmation d'enregistrement"},
|
||||||
|
"confirmer": {"target": "Oui", "description": "Clique sur le bouton Oui dans le dialogue de confirmation"},
|
||||||
|
# ── Warnings/errors (high priority, single OK button) ───────
|
||||||
|
"erreur": {"target": "OK", "description": "Clique sur OK pour fermer le message d'erreur"},
|
||||||
|
"error": {"target": "OK", "description": "Click OK to close the error message"},
|
||||||
|
"avertissement": {"target": "OK", "description": "Clique sur OK pour fermer l'avertissement"},
|
||||||
|
"warning": {"target": "OK", "description": "Click OK to close the warning"},
|
||||||
|
# ── Main save dialogs (LOW priority: parent windows) ─
|
||||||
|
"voulez-vous enregistrer": {"target": "Enregistrer", "description": "Clique sur Enregistrer pour sauvegarder les modifications"},
|
||||||
|
"do you want to save": {"target": "Save", "description": "Click Save to save changes"},
|
||||||
|
"enregistrer sous": {"target": "Enregistrer", "description": "Clique sur le bouton Enregistrer dans le dialogue Enregistrer sous"},
|
||||||
|
"save as": {"target": "Save", "description": "Click the Save button in the Save As dialog"},
|
||||||
|
}
|
||||||
|
|
||||||
|
|
||||||
|
class DialogHandler:
|
||||||
|
"""Smart dialog handling via on-screen title text + InfiGUI."""
|
||||||
|
|
||||||
|
def __init__(self):
|
||||||
|
self._easyocr_reader = None
|
||||||
|
|
||||||
|
def handle_if_dialog(
|
||||||
|
self,
|
||||||
|
screenshot_pil,
|
||||||
|
previous_title: str = "",
|
||||||
|
) -> Dict[str, Any]:
|
||||||
|
"""Check whether the screen shows a dialog and handle it.
|
||||||
|
|
||||||
|
Args:
|
||||||
|
screenshot_pil: Current PIL screenshot.
|
||||||
|
previous_title: Window title before the action (for comparison; currently unused).
|
||||||
|
|
||||||
|
Returns:
|
||||||
|
Dict with 'handled' (bool) and 'title'; on success also 'dialog_type', 'action', 'position', 'time_ms'; otherwise 'reason'.
|
||||||
|
"""
|
||||||
|
t0 = time.time()
|
||||||
|
|
||||||
|
# 1. Read the window title
|
||||||
|
title = self._read_title(screenshot_pil)
|
||||||
|
if not title or len(title) < 3:
|
||||||
|
return {'handled': False, 'title': '', 'reason': 'Titre illisible'}
|
||||||
|
|
||||||
|
print(f"🔍 [Dialog] Titre lu: '{title}'")
|
||||||
|
|
||||||
|
# 2. Check whether it is a known dialog
|
||||||
|
matched_dialog = None
|
||||||
|
for key, action_info in KNOWN_DIALOGS.items():
|
||||||
|
if key in title.lower():
|
||||||
|
matched_dialog = (key, action_info)
|
||||||
|
break
|
||||||
|
|
||||||
|
if not matched_dialog:
|
||||||
|
# Not a known dialog: the workflow continues normally
|
||||||
|
return {'handled': False, 'title': title, 'reason': 'Pas un dialogue connu'}
|
||||||
|
|
||||||
|
dialog_key, action_info = matched_dialog
|
||||||
|
target = action_info['target']
|
||||||
|
description = action_info['description']
|
||||||
|
|
||||||
|
print(f"🧠 [Dialog] Dialogue détecté: '{dialog_key}' → clic '{target}'")
|
||||||
|
|
||||||
|
# 3. Ask InfiGUI to click the button
|
||||||
|
click_result = self._click_via_infigui(
|
||||||
|
target, description, screenshot_pil
|
||||||
|
)
|
||||||
|
|
||||||
|
dt = (time.time() - t0) * 1000
|
||||||
|
|
||||||
|
if click_result:
|
||||||
|
print(f"✅ [Dialog] Clic '{target}' à ({click_result['x']}, {click_result['y']}) ({dt:.0f}ms)")
|
||||||
|
return {
|
||||||
|
'handled': True,
|
||||||
|
'title': title,
|
||||||
|
'dialog_type': dialog_key,
|
||||||
|
'action': f"click '{target}'",
|
||||||
|
'position': (click_result['x'], click_result['y']),
|
||||||
|
'time_ms': dt,
|
||||||
|
}
|
||||||
|
else:
|
||||||
|
# InfiGUI did not find the button: fall back to a direct click via OCR
|
||||||
|
print(f"⚠️ [Dialog] InfiGUI n'a pas trouvé '{target}', essai OCR direct")
|
||||||
|
ocr_result = self._click_via_ocr(target, screenshot_pil)
|
||||||
|
dt = (time.time() - t0) * 1000
|
||||||
|
|
||||||
|
if ocr_result:
|
||||||
|
print(f"✅ [Dialog] OCR clic '{target}' à ({ocr_result[0]}, {ocr_result[1]}) ({dt:.0f}ms)")
|
||||||
|
return {
|
||||||
|
'handled': True,
|
||||||
|
'title': title,
|
||||||
|
'dialog_type': dialog_key,
|
||||||
|
'action': f"click '{target}' (OCR)",
|
||||||
|
'position': ocr_result,
|
||||||
|
'time_ms': dt,
|
||||||
|
}
|
||||||
|
|
||||||
|
print(f"❌ [Dialog] Impossible de cliquer '{target}' ({dt:.0f}ms)")
|
||||||
|
return {
|
||||||
|
'handled': False,
|
||||||
|
'title': title,
|
||||||
|
'dialog_type': dialog_key,
|
||||||
|
'reason': f"Bouton '{target}' introuvable",
|
||||||
|
'time_ms': dt,
|
||||||
|
}
|
||||||
|
|
||||||
|
# ------------------------------------------------------------------
|
||||||
|
# Title reading
|
||||||
|
# ------------------------------------------------------------------
|
||||||
|
|
||||||
|
def _read_title(self, screenshot_pil) -> str:
|
||||||
|
"""Read ALL visible text via full-screen EasyOCR (~500ms).
|
||||||
|
|
||||||
|
In a QEMU VM, the Windows title bar sits inside the framebuffer,
|
||||||
|
not at the very top of the screen. OCR runs on the full screen and the
|
||||||
|
known-dialog keywords are then searched in the complete text.
|
||||||
|
"""
|
||||||
|
try:
|
||||||
|
import numpy as np
|
||||||
|
|
||||||
|
reader = self._get_easyocr()
|
||||||
|
if reader is None:
|
||||||
|
return ""
|
||||||
|
|
||||||
|
results = reader.readtext(np.array(screenshot_pil))
|
||||||
|
full_text = ' '.join(r[1] for r in results if r[1].strip())
|
||||||
|
return full_text
|
||||||
|
|
||||||
|
except Exception as e:
|
||||||
|
print(f"⚠️ [Dialog] Erreur lecture écran: {e}")
|
||||||
|
return ""
|
||||||
|
|
||||||
|
# ------------------------------------------------------------------
|
||||||
|
# Click via InfiGUI (grounding server)
|
||||||
|
# ------------------------------------------------------------------
|
||||||
|
|
||||||
|
def _click_via_infigui(
|
||||||
|
self, target: str, description: str, screenshot_pil
|
||||||
|
) -> Optional[Dict]:
|
||||||
|
"""Ask InfiGUI (one-shot subprocess) to locate the button, then click it."""
|
||||||
|
try:
|
||||||
|
from core.grounding.ui_tars_grounder import UITarsGrounder
|
||||||
|
|
||||||
|
grounder = UITarsGrounder.get_instance()
|
||||||
|
result = grounder.ground(
|
||||||
|
target_text=target,
|
||||||
|
target_description=description,
|
||||||
|
screen_pil=screenshot_pil,
|
||||||
|
)
|
||||||
|
|
||||||
|
if result and result.x is not None:
|
||||||
|
import pyautogui
|
||||||
|
pyautogui.click(result.x, result.y)
|
||||||
|
return {'x': result.x, 'y': result.y}
|
||||||
|
|
||||||
|
return None
|
||||||
|
|
||||||
|
except Exception as e:
|
||||||
|
print(f"⚠️ [Dialog/InfiGUI] Erreur: {e}")
|
||||||
|
return None
|
||||||
|
|
||||||
|
# ------------------------------------------------------------------
|
||||||
|
# Click via OCR (fast fallback)
|
||||||
|
# ------------------------------------------------------------------
|
||||||
|
|
||||||
|
def _click_via_ocr(self, target: str, screenshot_pil) -> Optional[tuple]:
|
||||||
|
"""Find the button via OCR and click it."""
|
||||||
|
try:
|
||||||
|
import numpy as np
|
||||||
|
|
||||||
|
reader = self._get_easyocr()
|
||||||
|
if reader is None:
|
||||||
|
return None
|
||||||
|
|
||||||
|
results = reader.readtext(np.array(screenshot_pil))
|
||||||
|
|
||||||
|
target_lower = target.lower()
|
||||||
|
matches = []
|
||||||
|
for (bbox_pts, text, conf) in results:
|
||||||
|
if text.strip() and (target_lower in text.lower() or text.strip().lower() in target_lower):
|
||||||
|
x = int(sum(p[0] for p in bbox_pts) / 4)
|
||||||
|
y = int(sum(p[1] for p in bbox_pts) / 4)
|
||||||
|
matches.append((x, y, text))
|
||||||
|
|
||||||
|
if matches:
|
||||||
|
# Take the lowest match on screen (buttons sit at the bottom of the dialog)
|
||||||
|
best = max(matches, key=lambda m: m[1])
|
||||||
|
import pyautogui
|
||||||
|
pyautogui.click(best[0], best[1])
|
||||||
|
return (best[0], best[1])
|
||||||
|
|
||||||
|
return None
|
||||||
|
|
||||||
|
except Exception as e:
|
||||||
|
print(f"⚠️ [Dialog/OCR] Erreur: {e}")
|
||||||
|
return None
|
||||||
|
|
||||||
|
# ------------------------------------------------------------------
|
||||||
|
# EasyOCR singleton
|
||||||
|
# ------------------------------------------------------------------
|
||||||
|
|
||||||
|
def _get_easyocr(self):
|
||||||
|
if self._easyocr_reader is not None:
|
||||||
|
return self._easyocr_reader
|
||||||
|
|
||||||
|
try:
|
||||||
|
import easyocr
|
||||||
|
self._easyocr_reader = easyocr.Reader(
|
||||||
|
['fr', 'en'], gpu=True, verbose=False
|
||||||
|
)
|
||||||
|
return self._easyocr_reader
|
||||||
|
except ImportError:
|
||||||
|
return None
|
||||||
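The selection rule used by `_click_via_ocr` above can be tried in isolation, without EasyOCR or pyautogui. The following is a minimal sketch under stated assumptions: `pick_click_point` and `fake_results` are hypothetical names introduced for illustration, and the tuples only mimic the `(corner points, text, confidence)` shape that the method unpacks from `reader.readtext()`.

```python
from typing import List, Optional, Sequence, Tuple

# One OCR result: (corner points, recognized text, confidence).
# This mirrors the shape unpacked in _click_via_ocr; the data below is made up.
OcrResult = Tuple[Sequence[Sequence[float]], str, float]


def pick_click_point(target: str, results: List[OcrResult]) -> Optional[Tuple[int, int]]:
    """Return the centre of the lowest OCR box whose text matches the target."""
    target_lower = target.lower()
    matches = []
    for bbox_pts, text, _conf in results:
        text_lower = text.lower()
        # Same two-way substring test as the method above
        if target_lower in text_lower or text_lower in target_lower:
            # Centre of the box = mean of its corner points
            x = int(sum(p[0] for p in bbox_pts) / len(bbox_pts))
            y = int(sum(p[1] for p in bbox_pts) / len(bbox_pts))
            matches.append((x, y))
    if not matches:
        return None
    # Larger y means lower on screen
    return max(matches, key=lambda m: m[1])


# Hypothetical OCR output for a dialog with a title and two buttons
fake_results = [
    ([[100, 50], [300, 50], [300, 80], [100, 80]], "Enregistrer le fichier ?", 0.97),
    ([[120, 400], [220, 400], [220, 430], [120, 430]], "Enregistrer", 0.95),
    ([[260, 400], [340, 400], [340, 430], [260, 430]], "Annuler", 0.96),
]
print(pick_click_point("Enregistrer", fake_results))
```

Both the title and the button contain the target text, and the "lowest match wins" rule is what selects the button. Note that the two-way substring test also lets a very short OCR fragment match a longer target, which may be worth guarding against.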
239
core/grounding/element_signature.py
Normal file
@@ -0,0 +1,239 @@
|
|||||||
|
"""
|
||||||
|
core/grounding/element_signature.py: learned UI element signatures

Every successfully clicked element enriches its signature:
  - OCR text, type, relative position, contextual neighbors
  - success/failure counts, mean confidence
  - observed variants (resolutions, positions)

Signatures are stored in SQLite for fast lookup.
Same pattern as TargetMemoryStore (validated in production).

Usage:
    from core.grounding.element_signature import SignatureStore

    store = SignatureStore()

    # After a successful click
    store.record_success("btn_valider", "notepad_1920x1080", element, confidence=0.92)

    # At replay time
    sig = store.lookup("btn_valider", "notepad_1920x1080")
    if sig:
        print(f"Known signature: {sig['text']} position={sig['relative_position']}")
|
||||||
|
"""
|
||||||
|
|
||||||
|
from __future__ import annotations
|
||||||
|
|
||||||
|
import hashlib
|
||||||
|
import json
|
||||||
|
import os
|
||||||
|
import sqlite3
|
||||||
|
import threading
|
||||||
|
import time
|
||||||
|
from typing import Any, Dict, List, Optional
|
||||||
|
|
||||||
|
from core.grounding.fast_types import DetectedUIElement
|
||||||
|
|
||||||
|
# Default path of the database
|
||||||
|
_DEFAULT_DB = os.path.join(
|
||||||
|
os.path.dirname(os.path.dirname(os.path.dirname(__file__))),
|
||||||
|
"data", "learning", "element_signatures.db",
|
||||||
|
)
|
||||||
|
|
||||||
|
|
||||||
|
class SignatureStore:
    """SQLite storage for learned UI element signatures."""
|
||||||
|
|
||||||
|
def __init__(self, db_path: str = _DEFAULT_DB):
|
||||||
|
self.db_path = db_path
|
||||||
|
self._lock = threading.Lock()
|
||||||
|
self._ensure_db()
|
||||||
|
|
||||||
|
    def _ensure_db(self):
        """Create the database and the table if needed."""
|
||||||
|
os.makedirs(os.path.dirname(self.db_path), exist_ok=True)
|
||||||
|
with sqlite3.connect(self.db_path) as conn:
|
||||||
|
conn.execute("""
|
||||||
|
CREATE TABLE IF NOT EXISTS signatures (
|
||||||
|
target_key TEXT NOT NULL,
|
||||||
|
screen_context TEXT NOT NULL,
|
||||||
|
text TEXT DEFAULT '',
|
||||||
|
element_type TEXT DEFAULT 'element',
|
||||||
|
relative_position TEXT DEFAULT '',
|
||||||
|
neighbors TEXT DEFAULT '[]',
|
||||||
|
success_count INTEGER DEFAULT 0,
|
||||||
|
fail_count INTEGER DEFAULT 0,
|
||||||
|
avg_confidence REAL DEFAULT 0.0,
|
||||||
|
last_seen TEXT DEFAULT '',
|
||||||
|
variants TEXT DEFAULT '[]',
|
||||||
|
PRIMARY KEY (target_key, screen_context)
|
||||||
|
)
|
||||||
|
""")
|
||||||
|
conn.execute("""
|
||||||
|
CREATE INDEX IF NOT EXISTS idx_target_key
|
||||||
|
ON signatures(target_key)
|
||||||
|
""")
|
||||||
|
|
||||||
|
# ------------------------------------------------------------------
|
||||||
|
# Lookup
|
||||||
|
# ------------------------------------------------------------------
|
||||||
|
|
||||||
|
    def lookup(self, target_key: str, screen_context: str = "") -> Optional[Dict[str, Any]]:
        """Look up a known signature.

        If no row matches the exact context, the row with the most successes
        for the same target_key is returned instead.

        Args:
            target_key: Unique key of the target (hash of text + description).
            screen_context: Screen context (hash of window title + resolution).

        Returns:
            Dict with the signature fields, or None.
        """
|
||||||
|
with sqlite3.connect(self.db_path) as conn:
|
||||||
|
conn.row_factory = sqlite3.Row
|
||||||
|
            # Try the exact context first
|
||||||
|
row = conn.execute(
|
||||||
|
"SELECT * FROM signatures WHERE target_key = ? AND screen_context = ?",
|
||||||
|
(target_key, screen_context),
|
||||||
|
).fetchone()
|
||||||
|
|
||||||
|
            # Fallback: ignore the context and take the variant with the most successes
|
||||||
|
if row is None and screen_context:
|
||||||
|
row = conn.execute(
|
||||||
|
"SELECT * FROM signatures WHERE target_key = ? ORDER BY success_count DESC LIMIT 1",
|
||||||
|
(target_key,),
|
||||||
|
).fetchone()
|
||||||
|
|
||||||
|
if row is None:
|
||||||
|
return None
|
||||||
|
|
||||||
|
return {
|
||||||
|
"target_key": row["target_key"],
|
||||||
|
"screen_context": row["screen_context"],
|
||||||
|
"text": row["text"],
|
||||||
|
"element_type": row["element_type"],
|
||||||
|
"relative_position": row["relative_position"],
|
||||||
|
"neighbors": json.loads(row["neighbors"]),
|
||||||
|
"success_count": row["success_count"],
|
||||||
|
"fail_count": row["fail_count"],
|
||||||
|
"avg_confidence": row["avg_confidence"],
|
||||||
|
"last_seen": row["last_seen"],
|
||||||
|
"variants": json.loads(row["variants"]),
|
||||||
|
}
|
||||||
|
|
||||||
|
# ------------------------------------------------------------------
|
||||||
|
    # Recording
|
||||||
|
# ------------------------------------------------------------------
|
||||||
|
|
||||||
|
    def record_success(
        self,
        target_key: str,
        screen_context: str,
        element: DetectedUIElement,
        confidence: float,
    ):
        """Record a success: create the signature or enrich the existing one."""
        with self._lock:
            existing = self.lookup(target_key, screen_context)
            # lookup() may fall back to a row from another screen context.
            # Only an exact-context row can be updated in place; otherwise the
            # UPDATE below would match nothing and the success would be lost.
            if existing and existing["screen_context"] != screen_context:
                existing = None
            now = time.strftime("%Y-%m-%dT%H:%M:%S")

            if existing:
                # Enrich the existing signature (running mean of the confidence)
                n = existing["success_count"]
                new_avg = (existing["avg_confidence"] * n + confidence) / (n + 1)

                # Append the observed variant
                variants = existing["variants"]
                variant = {
                    "position": element.relative_position,
                    "center": list(element.center),
                    "confidence": confidence,
                    "timestamp": now,
                }
                variants.append(variant)
                # Keep at most the last 20 variants
                variants = variants[-20:]

                # Merge the neighbors (order-preserving union, capped at 10)
                neighbors = list(dict.fromkeys(existing["neighbors"] + element.neighbors))[:10]

                with sqlite3.connect(self.db_path) as conn:
                    conn.execute("""
                        UPDATE signatures SET
                            success_count = success_count + 1,
                            avg_confidence = ?,
                            last_seen = ?,
                            neighbors = ?,
                            variants = ?,
                            relative_position = ?
                        WHERE target_key = ? AND screen_context = ?
                    """, (
                        new_avg, now,
                        json.dumps(neighbors),
                        json.dumps(variants),
                        element.relative_position,
                        target_key, screen_context,
                    ))
            else:
                # Create a new signature
                with sqlite3.connect(self.db_path) as conn:
                    conn.execute("""
                        INSERT INTO signatures
                        (target_key, screen_context, text, element_type, relative_position,
                         neighbors, success_count, fail_count, avg_confidence, last_seen, variants)
                        VALUES (?, ?, ?, ?, ?, ?, 1, 0, ?, ?, ?)
                    """, (
                        target_key, screen_context,
                        element.ocr_text,
                        element.element_type,
                        element.relative_position,
                        json.dumps(element.neighbors[:10]),
                        confidence, now,
                        json.dumps([{
                            "position": element.relative_position,
                            "center": list(element.center),
                            "confidence": confidence,
                            "timestamp": now,
                        }]),
                    ))

            print(f"📝 [Signature] '{target_key}' {'enriched' if existing else 'created'} "
                  f"(conf={confidence:.2f}, ctx='{screen_context[:30]}')")
|
||||||
|
|
||||||
|
    def record_failure(self, target_key: str, screen_context: str):
        """Record a failure for an existing signature (no-op if none exists)."""
|
||||||
|
with self._lock:
|
||||||
|
with sqlite3.connect(self.db_path) as conn:
|
||||||
|
conn.execute("""
|
||||||
|
UPDATE signatures SET fail_count = fail_count + 1, last_seen = ?
|
||||||
|
WHERE target_key = ? AND screen_context = ?
|
||||||
|
""", (time.strftime("%Y-%m-%dT%H:%M:%S"), target_key, screen_context))
|
||||||
|
|
||||||
|
# ------------------------------------------------------------------
|
||||||
|
    # Utilities
|
||||||
|
# ------------------------------------------------------------------
|
||||||
|
|
||||||
|
@staticmethod
|
||||||
|
    def make_target_key(text: str, description: str = "") -> str:
        """Build a unique key for a target."""
|
||||||
|
raw = f"{text.lower().strip()}|{description.lower().strip()}"
|
||||||
|
return hashlib.md5(raw.encode()).hexdigest()[:16]
|
||||||
|
|
||||||
|
@staticmethod
|
||||||
|
    def make_screen_context(window_title: str, resolution: tuple = (0, 0)) -> str:
        """Build a screen-context key."""
|
||||||
|
raw = f"{window_title.lower().strip()}|{resolution[0]}x{resolution[1]}"
|
||||||
|
return hashlib.md5(raw.encode()).hexdigest()[:12]
|
||||||
|
|
||||||
|
    def get_stats(self) -> Dict[str, Any]:
        """Statistics about the signature database."""
|
||||||
|
with sqlite3.connect(self.db_path) as conn:
|
||||||
|
total = conn.execute("SELECT COUNT(*) FROM signatures").fetchone()[0]
|
||||||
|
reliable = conn.execute(
|
||||||
|
"SELECT COUNT(*) FROM signatures WHERE success_count >= 3 AND fail_count = 0"
|
||||||
|
).fetchone()[0]
|
||||||
|
return {
|
||||||
|
"total_signatures": total,
|
||||||
|
"reliable": reliable,
|
||||||
|
"db_path": self.db_path,
|
||||||
|
}
|
||||||
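Two small pieces of `SignatureStore` can be checked without a database: the running mean used when a signature is enriched, and the normalisation applied before hashing a target key. The sketch below is illustrative only; `update_mean` and `make_key` are hypothetical stand-ins that reproduce the arithmetic of `record_success` and the body of `make_target_key`.

```python
import hashlib


def update_mean(avg: float, n: int, value: float) -> float:
    """Fold one more observation into a mean computed over n observations."""
    return (avg * n + value) / (n + 1)


def make_key(text: str, description: str = "") -> str:
    """Normalise case and surrounding whitespace, then keep a short hash prefix."""
    raw = f"{text.lower().strip()}|{description.lower().strip()}"
    return hashlib.md5(raw.encode()).hexdigest()[:16]


# Three successive successes with made-up confidences
avg, n = 0.0, 0
for conf in (0.92, 0.88, 0.96):
    avg = update_mean(avg, n, conf)
    n += 1
print(round(avg, 2), n)

# Case and padding do not change the key
print(make_key("Valider") == make_key("  VALIDER "))
```

The incremental form avoids storing every past confidence: only the current mean and the success count are needed, which is what the `avg_confidence` and `success_count` columns hold.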
326
core/grounding/fast_detector.py
Normal file
@@ -0,0 +1,326 @@
|
|||||||
|
"""
|
||||||
|
core/grounding/fast_detector.py: FAST layer, quick detection of UI elements

Captures the screen, detects all UI elements with RF-DETR (~120ms),
and enriches each element with OCR text and spatial context.

Produces a ScreenSnapshot that the SmartMatcher can consume.

Usage:
    from core.grounding.fast_detector import FastDetector

    detector = FastDetector()
    snapshot = detector.detect()
    print(f"{len(snapshot.elements)} elements in {snapshot.total_time_ms:.0f}ms")
|
||||||
|
"""
|
||||||
|
|
||||||
|
from __future__ import annotations
|
||||||
|
|
||||||
|
import math
|
||||||
|
import time
|
||||||
|
from typing import Any, Dict, List, Optional, Tuple
|
||||||
|
|
||||||
|
from core.grounding.fast_types import DetectedUIElement, ScreenSnapshot
|
||||||
|
|
||||||
|
|
||||||
|
class FastDetector:
    """Fast detection of every UI element visible on screen.

    Combines RF-DETR (bounding-box detection) with OCR (EasyOCR, falling
    back to docTR when EasyOCR is not installed) to produce an enriched
    ScreenSnapshot.

    The RF-DETR model is a singleton loaded on the first call (~1s);
    subsequent calls are fast (~120ms).
    """
|
||||||
|
|
||||||
|
def __init__(self, detection_threshold: float = 0.30):
|
||||||
|
self.detection_threshold = detection_threshold
|
||||||
|
self._last_snapshot: Optional[ScreenSnapshot] = None
|
||||||
|
self._last_phash: str = ""
|
||||||
|
|
||||||
|
def detect(
|
||||||
|
self,
|
||||||
|
screenshot_pil: Optional[Any] = None,
|
||||||
|
phash: str = "",
|
||||||
|
window_title: str = "",
|
||||||
|
    ) -> ScreenSnapshot:
        """Detect and enrich all UI elements on the screen.

        Args:
            screenshot_pil: PIL image. If None, the screen is captured via mss.
            phash: Perceptual hash used as cache key. If it equals the previous one, the cached snapshot is reused.
            window_title: Title of the active window.

        Returns:
            ScreenSnapshot with all enriched elements.
        """
|
||||||
|
t0 = time.time()
|
||||||
|
|
||||||
|
        # Cache: same screen, same result
        if phash and phash == self._last_phash and self._last_snapshot is not None:
            print("⚡ [FAST] Cache hit (identical pHash)")
            return self._last_snapshot
|
||||||
|
|
||||||
|
        # Capture the screen if no image was provided
|
||||||
|
if screenshot_pil is None:
|
||||||
|
screenshot_pil = self._capture_screen()
|
||||||
|
if screenshot_pil is None:
|
||||||
|
return ScreenSnapshot(elements=[], ocr_words=[], resolution=(0, 0))
|
||||||
|
|
||||||
|
w, h = screenshot_pil.size
|
||||||
|
|
||||||
|
        # --- RF-DETR detection (~120ms) ---
|
||||||
|
t_det = time.time()
|
||||||
|
raw_elements = self._detect_rfdetr(screenshot_pil)
|
||||||
|
detection_ms = (time.time() - t_det) * 1000
|
||||||
|
|
||||||
|
        # --- OCR on the full screenshot (words are assigned to elements during enrichment) ---
|
||||||
|
t_ocr = time.time()
|
||||||
|
ocr_words = self._ocr_extract(screenshot_pil)
|
||||||
|
ocr_ms = (time.time() - t_ocr) * 1000
|
||||||
|
|
||||||
|
        # --- Enrichment: assign text, neighbors and position ---
|
||||||
|
enriched = self._enrich_elements(raw_elements, ocr_words, w, h)
|
||||||
|
|
||||||
|
total_ms = (time.time() - t0) * 1000
|
||||||
|
|
||||||
|
snapshot = ScreenSnapshot(
|
||||||
|
elements=enriched,
|
||||||
|
ocr_words=ocr_words,
|
||||||
|
resolution=(w, h),
|
||||||
|
window_title=window_title,
|
||||||
|
phash=phash,
|
||||||
|
detection_time_ms=detection_ms,
|
||||||
|
ocr_time_ms=ocr_ms,
|
||||||
|
total_time_ms=total_ms,
|
||||||
|
)
|
||||||
|
|
||||||
|
        # Store in the cache
|
||||||
|
if phash:
|
||||||
|
self._last_phash = phash
|
||||||
|
self._last_snapshot = snapshot
|
||||||
|
|
||||||
|
print(f"⚡ [FAST] {len(enriched)} éléments détectés en {total_ms:.0f}ms "
|
||||||
|
f"(det={detection_ms:.0f}ms, ocr={ocr_ms:.0f}ms)")
|
||||||
|
|
||||||
|
return snapshot
|
||||||
|
|
||||||
|
# ------------------------------------------------------------------
|
||||||
|
    # RF-DETR detection
|
||||||
|
# ------------------------------------------------------------------
|
||||||
|
|
||||||
|
    def _detect_rfdetr(self, image) -> List[DetectedUIElement]:
        """Detect elements with RF-DETR (reuses the existing singleton)."""
        try:
            import sys
            # Register the backend path once instead of on every call
            backend_path = 'visual_workflow_builder/backend'
            if backend_path not in sys.path:
                sys.path.insert(0, backend_path)
            from services.ui_detection_service import detect_ui_elements
|
||||||
|
|
||||||
|
result = detect_ui_elements(image, threshold=self.detection_threshold)
|
||||||
|
|
||||||
|
elements = []
|
||||||
|
for e in result.elements:
|
||||||
|
x1 = e.bbox["x1"]
|
||||||
|
y1 = e.bbox["y1"]
|
||||||
|
x2 = e.bbox["x2"]
|
||||||
|
y2 = e.bbox["y2"]
|
||||||
|
elements.append(DetectedUIElement(
|
||||||
|
id=e.id,
|
||||||
|
bbox=(x1, y1, x2, y2),
|
||||||
|
center=(e.center["x"], e.center["y"]),
|
||||||
|
confidence=e.confidence,
|
||||||
|
))
|
||||||
|
|
||||||
|
return elements
|
||||||
|
|
||||||
|
except Exception as ex:
|
||||||
|
print(f"⚠️ [FAST/detect] RF-DETR erreur: {ex}")
|
||||||
|
return []
|
||||||
|
|
||||||
|
# ------------------------------------------------------------------
|
||||||
|
# OCR
|
||||||
|
# ------------------------------------------------------------------
|
||||||
|
|
||||||
|
    _easyocr_reader = None  # EasyOCR singleton (loaded once)
|
||||||
|
|
||||||
|
    def _ocr_extract(self, image) -> List[Dict[str, Any]]:
        """Extract visible words with EasyOCR (GPU, ~500ms).

        Falls back to docTR if EasyOCR is not installed.
        """
|
||||||
|
try:
|
||||||
|
import numpy as np
|
||||||
|
import easyocr
|
||||||
|
|
||||||
|
            # Singleton: load the reader only once
            if FastDetector._easyocr_reader is None:
                print("🔍 [FAST/ocr] Loading EasyOCR (GPU)...")
|
||||||
|
FastDetector._easyocr_reader = easyocr.Reader(
|
||||||
|
['fr', 'en'], gpu=True, verbose=False
|
||||||
|
)
|
||||||
|
|
||||||
|
results = FastDetector._easyocr_reader.readtext(np.array(image))
|
||||||
|
|
||||||
|
words = []
|
||||||
|
for (bbox_pts, text, conf) in results:
|
||||||
|
if not text or len(text.strip()) < 1:
|
||||||
|
continue
|
||||||
|
# bbox_pts = [[x1,y1],[x2,y1],[x2,y2],[x1,y2]]
|
||||||
|
x1 = int(min(p[0] for p in bbox_pts))
|
||||||
|
y1 = int(min(p[1] for p in bbox_pts))
|
||||||
|
x2 = int(max(p[0] for p in bbox_pts))
|
||||||
|
y2 = int(max(p[1] for p in bbox_pts))
|
||||||
|
words.append({
|
||||||
|
'text': text.strip(),
|
||||||
|
'bbox': [x1, y1, x2, y2],
|
||||||
|
'confidence': float(conf),
|
||||||
|
})
|
||||||
|
|
||||||
|
return words
|
||||||
|
|
||||||
|
except ImportError:
|
||||||
|
# Fallback docTR
|
||||||
|
try:
|
||||||
|
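                import sys
                # Register the backend path once instead of on every call
                backend_path = 'visual_workflow_builder/backend'
                if backend_path not in sys.path:
                    sys.path.insert(0, backend_path)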
|
||||||
|
from services.ocr_service import ocr_extract_words
|
||||||
|
return ocr_extract_words(image) or []
|
||||||
|
except Exception:
|
||||||
|
return []
|
||||||
|
except Exception as ex:
|
||||||
|
print(f"⚠️ [FAST/ocr] EasyOCR erreur: {ex}")
|
||||||
|
return []
|
||||||
|
|
||||||
|
# ------------------------------------------------------------------
|
||||||
|
    # Enrichment
|
||||||
|
# ------------------------------------------------------------------
|
||||||
|
|
||||||
|
def _enrich_elements(
|
||||||
|
self,
|
||||||
|
elements: List[DetectedUIElement],
|
||||||
|
ocr_words: List[Dict[str, Any]],
|
||||||
|
screen_w: int,
|
||||||
|
screen_h: int,
|
||||||
|
    ) -> List[DetectedUIElement]:
        """Enrich each element with OCR text, neighbors and relative position."""

        for elem in elements:
            # 1. Assign OCR text by bounding-box intersection
            elem.ocr_text = self._assign_ocr_text(elem, ocr_words)

            # 2. Relative position on screen (3x3 grid)
            elem.relative_position = self._compute_relative_position(
                elem.center, screen_w, screen_h
            )

            # 3. Classify the element type (size + aspect-ratio heuristic)
            elem.element_type = self._classify_element_type(elem)

        # 4. Compute neighbors (OCR text of nearby elements); this needs a second
        #    pass because every element's ocr_text must already be set
        for elem in elements:
            elem.neighbors = self._find_neighbors(elem, elements)

        return elements
|
||||||
|
|
||||||
|
def _assign_ocr_text(
|
||||||
|
self,
|
||||||
|
elem: DetectedUIElement,
|
||||||
|
ocr_words: List[Dict[str, Any]],
|
||||||
|
    ) -> str:
        """Assign OCR text to an element by geometric intersection."""
        x1, y1, x2, y2 = elem.bbox
        # Grow the bbox by 20% on each side to capture surrounding text
|
||||||
|
margin_x = int((x2 - x1) * 0.2)
|
||||||
|
margin_y = int((y2 - y1) * 0.2)
|
||||||
|
ex1, ey1 = x1 - margin_x, y1 - margin_y
|
||||||
|
ex2, ey2 = x2 + margin_x, y2 + margin_y
|
||||||
|
|
||||||
|
texts = []
|
||||||
|
for word in ocr_words:
|
||||||
|
wb = word.get('bbox', [0, 0, 0, 0])
|
||||||
|
if len(wb) < 4:
|
||||||
|
continue
|
||||||
|
wx1, wy1, wx2, wy2 = wb[0], wb[1], wb[2], wb[3]
|
||||||
|
# Intersection ?
|
||||||
|
if wx1 < ex2 and wx2 > ex1 and wy1 < ey2 and wy2 > ey1:
|
||||||
|
text = word.get('text', '').strip()
|
||||||
|
if text and len(text) > 1:
|
||||||
|
texts.append(text)
|
||||||
|
|
||||||
|
return ' '.join(texts)
|
||||||
|
|
||||||
|
@staticmethod
|
||||||
|
def _compute_relative_position(
|
||||||
|
center: Tuple[int, int],
|
||||||
|
screen_w: int,
|
||||||
|
screen_h: int,
|
||||||
|
    ) -> str:
        """Compute the relative position in a 3x3 grid."""
|
||||||
|
cx, cy = center
|
||||||
|
col = "left" if cx < screen_w / 3 else ("right" if cx > 2 * screen_w / 3 else "center")
|
||||||
|
row = "top" if cy < screen_h / 3 else ("bottom" if cy > 2 * screen_h / 3 else "middle")
|
||||||
|
return f"{row}_{col}"
|
||||||
|
|
||||||
|
@staticmethod
|
||||||
|
    def _classify_element_type(elem: DetectedUIElement) -> str:
        """Classify the element type with a size/aspect-ratio heuristic."""
        w, h = elem.width, elem.height
        if w == 0 or h == 0:
            return "element"
        ratio = w / h
        area = w * h

        # Small and roughly square: icon
        if area < 5000 and 0.5 < ratio < 2.0:
            return "icon"
        # Wide and thin: input field or button
        if ratio > 3.0 and h < 60:
            return "input"
        if ratio > 2.0 and h < 50:
            return "button"
        # Large block: content area
        if area > 50000:
            return "container"

        return "element"
|
||||||
|
|
||||||
|
@staticmethod
|
||||||
|
def _find_neighbors(
|
||||||
|
elem: DetectedUIElement,
|
||||||
|
all_elements: List[DetectedUIElement],
|
||||||
|
max_neighbors: int = 5,
|
||||||
|
    ) -> List[str]:
        """Find the OCR texts of nearby elements (radius = 1.5x the element diagonal)."""
|
||||||
|
diag = math.sqrt(elem.width**2 + elem.height**2)
|
||||||
|
radius = max(diag * 1.5, 100) # minimum 100px
|
||||||
|
|
||||||
|
neighbors = []
|
||||||
|
for other in all_elements:
|
||||||
|
if other.id == elem.id or not other.ocr_text:
|
||||||
|
continue
|
||||||
|
dx = other.center[0] - elem.center[0]
|
||||||
|
dy = other.center[1] - elem.center[1]
|
||||||
|
dist = math.sqrt(dx**2 + dy**2)
|
||||||
|
if dist < radius:
|
||||||
|
neighbors.append(other.ocr_text)
|
||||||
|
|
||||||
|
return neighbors[:max_neighbors]
|
||||||
|
|
||||||
|
# ------------------------------------------------------------------
|
||||||
|
    # Screen capture
|
||||||
|
# ------------------------------------------------------------------
|
||||||
|
|
||||||
|
@staticmethod
|
||||||
|
    def _capture_screen():
        """Capture the screen via mss."""
|
||||||
|
try:
|
||||||
|
import mss
|
||||||
|
from PIL import Image
|
||||||
|
|
||||||
|
with mss.mss() as sct:
|
||||||
|
                mon = sct.monitors[0]  # index 0 is the virtual screen spanning all monitors
|
||||||
|
grab = sct.grab(mon)
|
||||||
|
return Image.frombytes('RGB', grab.size, grab.bgra, 'raw', 'BGRX')
|
||||||
|
except Exception as ex:
|
||||||
|
print(f"⚠️ [FAST/capture] Erreur: {ex}")
|
||||||
|
return None
|
||||||
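The two geometric helpers of `FastDetector` (3x3 grid position and bbox intersection with a 20% margin) are easy to verify on hand-picked coordinates. The following is a minimal standalone sketch; `grid_cell` and `overlaps_with_margin` are hypothetical names that restate the logic of `_compute_relative_position` and `_assign_ocr_text`, and the coordinates are invented.

```python
from typing import Tuple

Box = Tuple[int, int, int, int]  # (x1, y1, x2, y2) in pixels


def grid_cell(center: Tuple[int, int], screen_w: int, screen_h: int) -> str:
    """Name the cell of a 3x3 grid that contains the point, as "row_col"."""
    cx, cy = center
    col = "left" if cx < screen_w / 3 else ("right" if cx > 2 * screen_w / 3 else "center")
    row = "top" if cy < screen_h / 3 else ("bottom" if cy > 2 * screen_h / 3 else "middle")
    return f"{row}_{col}"


def overlaps_with_margin(elem: Box, word: Box, margin: float = 0.2) -> bool:
    """True if the word box intersects the element box grown by `margin` per side."""
    x1, y1, x2, y2 = elem
    mx, my = int((x2 - x1) * margin), int((y2 - y1) * margin)
    wx1, wy1, wx2, wy2 = word
    return wx1 < x2 + mx and wx2 > x1 - mx and wy1 < y2 + my and wy2 > y1 - my


print(grid_cell((960, 540), 1920, 1080))
print(grid_cell((1800, 1000), 1920, 1080))

button = (100, 100, 200, 140)  # 100x40 px, so the margin is 20 px by 8 px
print(overlaps_with_margin(button, (205, 110, 260, 130)))  # starts inside the margin
print(overlaps_with_margin(button, (230, 110, 260, 130)))  # starts beyond the margin
```

The centre of a 1920x1080 screen lands in `middle_center`, so stored `relative_position` values always carry both a row and a column name.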
216
core/grounding/fast_pipeline.py
Normal file
@@ -0,0 +1,216 @@
|
|||||||
|
"""
|
||||||
|
core/grounding/fast_pipeline.py: FAST → SMART → THINK pipeline

Central orchestrator: detects the elements (FAST), matches them against the target (SMART),
and asks the VLM to decide when the score is too low (THINK).

Confidence thresholds:
    ≥ 0.90     → direct action (FAST/SMART)
    0.60-0.90  → VLM confirms (THINK)
    < 0.60     → VLM searches on its own (THINK)

The former GroundingPipeline is used as a fallback if everything fails.

Usage:
    from core.grounding.fast_pipeline import FastSmartThinkPipeline
    from core.grounding.target import GroundingTarget

    pipeline = FastSmartThinkPipeline()
    result = pipeline.locate(GroundingTarget(text="Valider"))
    if result:
        print(f"({result.x}, {result.y}) via {result.method} in {result.time_ms:.0f}ms")
|
||||||
|
"""
|
||||||
|
|
||||||
|
from __future__ import annotations
|
||||||
|
|
||||||
|
import time
|
||||||
|
import threading
|
||||||
|
from typing import Optional
|
||||||
|
|
||||||
|
from core.grounding.target import GroundingTarget, GroundingResult
|
||||||
|
from core.grounding.fast_types import LocateResult
|
||||||
|
from core.grounding.fast_detector import FastDetector
|
||||||
|
from core.grounding.smart_matcher import SmartMatcher
|
||||||
|
from core.grounding.think_arbiter import ThinkArbiter
|
||||||
|
from core.grounding.element_signature import SignatureStore
|
||||||
|
|
||||||
|
|
||||||
|
# Singleton
|
||||||
|
_instance: Optional[FastSmartThinkPipeline] = None
|
||||||
|
_instance_lock = threading.Lock()
|
||||||
|
|
||||||
|
|
||||||
|
class FastSmartThinkPipeline:
    """FAST → SMART → THINK pipeline for locating UI elements.

    Each call to locate() follows the cascade:
        1. FAST: RF-DETR detection + OCR enrichment (~120ms+1s)
        2. SMART: text/type/position/neighbor matching (< 1ms)
        3. THINK: the VLM arbitrates when the score is too low (~3-5s)
        4. Fallback: former pipeline if everything fails
    """
|
||||||
|
|
||||||
|
def __init__(
|
||||||
|
self,
|
||||||
|
confidence_direct: float = 0.90,
|
||||||
|
confidence_think: float = 0.60,
|
||||||
|
enable_think: bool = True,
|
||||||
|
enable_learning: bool = True,
|
||||||
|
):
|
||||||
|
self.confidence_direct = confidence_direct
|
||||||
|
self.confidence_think = confidence_think
|
||||||
|
self.enable_think = enable_think
|
||||||
|
self.enable_learning = enable_learning
|
||||||
|
|
||||||
|
self._detector = FastDetector()
|
||||||
|
self._matcher = SmartMatcher()
|
||||||
|
self._arbiter = ThinkArbiter()
|
||||||
|
self._signatures = SignatureStore()
|
||||||
|
self._fallback_pipeline = None
|
||||||
|
|
||||||
|
@classmethod
|
||||||
|
    def get_instance(cls) -> FastSmartThinkPipeline:
        """Return the singleton instance."""
|
||||||
|
global _instance
|
||||||
|
if _instance is None:
|
||||||
|
with _instance_lock:
|
||||||
|
if _instance is None:
|
||||||
|
_instance = cls()
|
||||||
|
return _instance
|
||||||
|
|
||||||
|
    def set_fallback_pipeline(self, pipeline) -> None:
        """Register the former pipeline as a safety net."""
|
||||||
|
self._fallback_pipeline = pipeline
|
||||||
|
|
||||||
|
# ------------------------------------------------------------------
|
||||||
|
    # Main API
|
||||||
|
# ------------------------------------------------------------------
|
||||||
|
|
||||||
|
def locate(
|
||||||
|
self,
|
||||||
|
target: GroundingTarget,
|
||||||
|
screenshot_pil=None,
|
||||||
|
phash: str = "",
|
||||||
|
window_title: str = "",
|
||||||
|
    ) -> Optional[GroundingResult]:
        """Locate a UI element through the FAST → SMART → THINK cascade.

        Args:
            target: What to look for (text, description, original bbox).
            screenshot_pil: PIL image. If None, the screen is captured via mss.
            phash: Perceptual hash used as cache key.
            window_title: Title of the active window.

        Returns:
            GroundingResult compatible with the existing pipeline, or None.
        """
|
||||||
|
t0 = time.time()
|
||||||
|
|
||||||
|
        # --- FAST: detect all elements ---
|
||||||
|
snapshot = self._detector.detect(
|
||||||
|
screenshot_pil=screenshot_pil,
|
||||||
|
phash=phash,
|
||||||
|
window_title=window_title,
|
||||||
|
)
|
||||||
|
|
||||||
|
if not snapshot.elements:
|
||||||
|
print(f"⚡ [Pipeline] FAST : aucun élément détecté")
|
||||||
|
return self._try_fallback(target)
|
||||||
|
|
||||||
|
        # --- Look up the learned signature ---
|
||||||
|
target_key = SignatureStore.make_target_key(
|
||||||
|
target.text or "", target.description or ""
|
||||||
|
)
|
||||||
|
screen_ctx = SignatureStore.make_screen_context(
|
||||||
|
window_title, snapshot.resolution
|
||||||
|
)
|
||||||
|
signature = self._signatures.lookup(target_key, screen_ctx)
|
||||||
|
|
||||||
|
        # --- SMART: match against the target ---
|
||||||
|
candidate = self._matcher.match(snapshot, target, signature)
|
||||||
|
|
||||||
|
if candidate:
|
||||||
|
dt = (time.time() - t0) * 1000
|
||||||
|
|
||||||
|
            # Score high enough: act directly
|
||||||
|
if candidate.score >= self.confidence_direct:
|
||||||
|
print(f"✅ [Pipeline] FAST→SMART direct : '{candidate.element.ocr_text}' "
|
||||||
|
f"score={candidate.score:.3f} ({candidate.method}) "
|
||||||
|
f"→ ({candidate.element.center[0]}, {candidate.element.center[1]}) "
|
||||||
|
f"en {dt:.0f}ms")
|
||||||
|
|
||||||
|
                # Learning
|
||||||
|
if self.enable_learning:
|
||||||
|
self._signatures.record_success(
|
||||||
|
target_key, screen_ctx,
|
||||||
|
candidate.element, candidate.score,
|
||||||
|
)
|
||||||
|
|
||||||
|
return GroundingResult(
|
||||||
|
x=candidate.element.center[0],
|
||||||
|
y=candidate.element.center[1],
|
||||||
|
method=f"fast_{candidate.method}",
|
||||||
|
confidence=candidate.score,
|
||||||
|
time_ms=dt,
|
||||||
|
)
|
||||||
|
|
||||||
|
            # Medium score: ask the VLM to confirm
            if candidate.score >= self.confidence_think and self.enable_think:
                print(f"🤔 [Pipeline] SMART score={candidate.score:.3f}, asking THINK to confirm")
                think_result = self._arbiter.arbitrate(
                    target,
                    candidates=[candidate],
                    screenshot_pil=screenshot_pil,
                )
|
||||||
|
dt = (time.time() - t0) * 1000
|
||||||
|
|
||||||
|
if think_result:
|
||||||
|
                    # The VLM confirmed the candidate
|
||||||
|
if self.enable_learning:
|
||||||
|
self._signatures.record_success(
|
||||||
|
target_key, screen_ctx,
|
||||||
|
candidate.element, think_result.confidence,
|
||||||
|
)
|
||||||
|
return GroundingResult(
|
||||||
|
x=think_result.x, y=think_result.y,
|
||||||
|
method="smart_think_confirmed",
|
||||||
|
confidence=think_result.confidence,
|
||||||
|
time_ms=dt,
|
||||||
|
)
|
||||||
|
|
||||||
|
        # --- THINK: score too low or no candidate, the VLM searches on its own ---
|
||||||
|
if self.enable_think:
|
||||||
|
score_info = f"score={candidate.score:.3f}" if candidate else "aucun candidat"
|
||||||
|
print(f"🤔 [Pipeline] {score_info} — THINK recherche complète")
|
||||||
|
think_result = self._arbiter.arbitrate(
|
||||||
|
target, candidates=[], screenshot_pil=screenshot_pil,
|
||||||
|
)
|
||||||
|
dt = (time.time() - t0) * 1000
|
||||||
|
|
||||||
|
if think_result:
|
||||||
|
return GroundingResult(
|
||||||
|
x=think_result.x, y=think_result.y,
|
||||||
|
method="think_vlm",
|
||||||
|
confidence=think_result.confidence,
|
||||||
|
time_ms=dt,
|
||||||
|
)
|
||||||
|
|
||||||
|
        # --- Fallback: former pipeline ---
|
||||||
|
return self._try_fallback(target)
|
||||||
|
|
||||||
|
# ------------------------------------------------------------------
|
||||||
|
# Fallback
|
||||||
|
# ------------------------------------------------------------------
|
||||||
|
|
||||||
|
    def _try_fallback(self, target: GroundingTarget) -> Optional[GroundingResult]:
        """Try the former pipeline as a last resort."""
|
||||||
|
if self._fallback_pipeline is None:
|
||||||
|
print(f"❌ [Pipeline] Aucune méthode n'a trouvé '{target.text}'")
|
||||||
|
return None
|
||||||
|
|
||||||
|
print(f"⚠️ [Pipeline] Fallback ancien pipeline pour '{target.text}'")
|
||||||
|
try:
|
||||||
|
return self._fallback_pipeline.locate(target)
|
||||||
|
except Exception as ex:
|
||||||
|
print(f"⚠️ [Pipeline] Fallback échoué: {ex}")
|
||||||
|
return None
|
||||||
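The threshold cascade of `locate()` can be summarised as a pure function of the SMART score. This is a hedged sketch rather than the pipeline itself: `route` is a hypothetical helper, its defaults mirror `confidence_direct=0.90` and `confidence_think=0.60`, and it only names the first branch entered. In the real method a failed THINK confirmation falls through to the full THINK search, and a failed search falls through to the fallback pipeline.

```python
from typing import Optional


def route(
    score: Optional[float],
    direct: float = 0.90,
    think: float = 0.60,
    enable_think: bool = True,
) -> str:
    """Name the first branch of the cascade taken for a SMART score (None = no candidate)."""
    if score is not None and score >= direct:
        return "direct"          # act on the SMART candidate immediately
    if score is not None and score >= think and enable_think:
        return "think_confirm"   # the VLM is asked to confirm the candidate
    if enable_think:
        return "think_search"    # the VLM searches without a candidate
    return "fallback"            # former pipeline, if one is registered


for s in (0.95, 0.75, 0.30, None):
    print(s, "->", route(s))
print(route(0.75, enable_think=False))
```

With THINK disabled, any score below the direct threshold goes straight to the fallback, which matches the two `self.enable_think` guards in `locate()`.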
81
core/grounding/fast_types.py
Normal file
@@ -0,0 +1,81 @@
|
|||||||
|
"""
|
||||||
|
core/grounding/fast_types.py: data structures for the FAST→SMART→THINK pipeline

Used exclusively by the fast localisation pipeline.
Compatible with the existing GroundingTarget/GroundingResult through conversion.
|
||||||
|
"""
|
||||||
|
|
||||||
|
from __future__ import annotations
|
||||||
|
|
||||||
|
from dataclasses import dataclass, field
|
||||||
|
from typing import Any, Dict, List, Optional, Tuple
|
||||||
|
|
||||||
|
|
||||||
|
@dataclass
|
||||||
|
class DetectedUIElement:
    """UI element detected by the FAST layer (RF-DETR), then enriched with OCR."""
    id: int
    bbox: Tuple[int, int, int, int]   # (x1, y1, x2, y2) in absolute pixels
    center: Tuple[int, int]           # (cx, cy)
    confidence: float                 # detector confidence (0-1)
    element_type: str = "element"     # "button", "input", "icon", "text", "container", "element"
    ocr_text: str = ""                # OCR text extracted from the region
    neighbors: List[str] = field(default_factory=list)  # texts of nearby elements
    relative_position: str = ""       # "top_left", "middle_center", "bottom_right", etc.
|
||||||
|
|
||||||
|
@property
|
||||||
|
def width(self) -> int:
|
||||||
|
return self.bbox[2] - self.bbox[0]
|
||||||
|
|
||||||
|
@property
|
||||||
|
def height(self) -> int:
|
||||||
|
return self.bbox[3] - self.bbox[1]
|
||||||
|
|
||||||
|
@property
|
||||||
|
def area(self) -> int:
|
||||||
|
return self.width * self.height
|
||||||
|
|
||||||
|
|
||||||
|
@dataclass
class ScreenSnapshot:
    """Complete state of the screen at time t: the output of the FAST layer."""
    elements: List[DetectedUIElement]
    ocr_words: List[Dict[str, Any]]  # raw OCR words [{text, bbox}]
    resolution: Tuple[int, int]  # (width, height)
    window_title: str = ""
    phash: str = ""
    detection_time_ms: float = 0.0
    ocr_time_ms: float = 0.0
    total_time_ms: float = 0.0
|
||||||
|
|
||||||
|
|
||||||
|
@dataclass
class MatchCandidate:
    """Result of SMART matching for one candidate element."""
    element: DetectedUIElement
    score: float  # combined score (0-1)
    score_detail: Dict[str, float] = field(default_factory=dict)
    method: str = ""  # "exact_text", "fuzzy_text", "position", etc.
|
||||||
|
|
||||||
|
|
||||||
|
@dataclass
class LocateResult:
    """Final result of the FAST→SMART→THINK pipeline."""
    x: int
    y: int
    confidence: float
    method: str  # "fast_exact", "fast_fuzzy", "smart_vote", "think_vlm"
    time_ms: float
    tier: str = "fast"  # "fast", "smart", "think"
    element: Optional[DetectedUIElement] = None
    candidates_count: int = 0

    def to_grounding_result(self):
        """Convert to a GroundingResult for compatibility."""
        # Local import, kept as in the original
        from core.grounding.target import GroundingResult
        return GroundingResult(
            x=self.x, y=self.y,
            method=self.method,
            confidence=self.confidence,
            time_ms=self.time_ms,
        )
|
||||||
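The geometry helpers on `DetectedUIElement` can be checked in isolation. The block below is a minimal sketch, not the project's code: `Box` is a hypothetical stand-in that keeps only the `bbox` field, and its `center` property is an assumption (the real class stores `center` as a plain field filled in by the detector).

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass
class Box:
    """Hypothetical stand-in for DetectedUIElement, reduced to its bbox."""
    bbox: Tuple[int, int, int, int]  # (x1, y1, x2, y2) in absolute pixels

    @property
    def width(self) -> int:
        return self.bbox[2] - self.bbox[0]

    @property
    def height(self) -> int:
        return self.bbox[3] - self.bbox[1]

    @property
    def area(self) -> int:
        return self.width * self.height

    @property
    def center(self) -> Tuple[int, int]:
        # Assumption: integer midpoint of the box
        x1, y1, x2, y2 = self.bbox
        return ((x1 + x2) // 2, (y1 + y2) // 2)


box = Box(bbox=(100, 200, 180, 230))
print(box.width, box.height, box.area, box.center)  # 80 30 2400 (140, 215)
```

Because the bbox is `(x1, y1, x2, y2)` rather than `(x, y, width, height)`, width and height are differences of corners; the workflow's `original_bbox` dict used elsewhere in this diff follows the other convention.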
210
core/grounding/infigui_worker.py
Normal file
@@ -0,0 +1,210 @@
|
|||||||
|
#!/usr/bin/env python3
|
||||||
|
"""
|
||||||
|
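InfiGUI worker: standalone process, one request per run.

Loads the model, reads a JSON request from stdin, runs inference, and prints the JSON result to stdout (see main()). It also writes /tmp/infigui_ready once the model is loaded. The REQUEST_FILE and RESPONSE_FILE constants below are not used by the code in this file.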
|
||||||
|
|
||||||
|
Lancement :
|
||||||
|
cd ~/ai/rpa_vision_v3
|
||||||
|
.venv/bin/python3 -m core.grounding.infigui_worker
|
||||||
|
"""
|
||||||
|
|
||||||
|
import json
|
||||||
|
import math
|
||||||
|
import os
|
||||||
|
import re
|
||||||
|
import sys
|
||||||
|
import time
|
||||||
|
import gc
|
||||||
|
import warnings
|
||||||
|
|
||||||
|
warnings.filterwarnings("ignore")
|
||||||
|
|
||||||
|
import torch
|
||||||
|
|
||||||
|
REQUEST_FILE = "/tmp/infigui_request.json"
|
||||||
|
RESPONSE_FILE = "/tmp/infigui_response.json"
|
||||||
|
READY_FILE = "/tmp/infigui_ready"
|
||||||
|
|
||||||
|
|
||||||
|
def load_model():
|
||||||
|
"""Load InfiGUI-G1-3B in 4-bit NF4."""
|
||||||
|
torch.cuda.empty_cache()
|
||||||
|
gc.collect()
|
||||||
|
|
||||||
|
from transformers import Qwen2_5_VLForConditionalGeneration, AutoProcessor, BitsAndBytesConfig
|
||||||
|
|
||||||
|
model_id = "InfiX-ai/InfiGUI-G1-3B"
|
||||||
|
print(f"[infigui-worker] Chargement {model_id}...")
|
||||||
|
|
||||||
|
bnb = BitsAndBytesConfig(
|
||||||
|
load_in_4bit=True, bnb_4bit_quant_type="nf4",
|
||||||
|
bnb_4bit_compute_dtype=torch.bfloat16, bnb_4bit_use_double_quant=True,
|
||||||
|
)
|
||||||
|
model = Qwen2_5_VLForConditionalGeneration.from_pretrained(
|
||||||
|
model_id, quantization_config=bnb, device_map={"": "cuda:0"},
|
||||||
|
)
|
||||||
|
model.eval()
|
||||||
|
processor = AutoProcessor.from_pretrained(
|
||||||
|
model_id, padding_side="left",
|
||||||
|
min_pixels=100 * 28 * 28, max_pixels=5600 * 28 * 28,
|
||||||
|
)
|
||||||
|
|
||||||
|
vram = torch.cuda.memory_allocated() / 1e9
|
||||||
|
print(f"[infigui-worker] Prêt — VRAM: {vram:.2f}GB")
|
||||||
|
|
||||||
|
# "Ready" signal
|
||||||
|
with open(READY_FILE, "w") as f:
|
||||||
|
f.write(f"ready {vram:.2f}GB")
|
||||||
|
|
||||||
|
return model, processor
|
||||||
|
|
||||||
|
|
||||||
|
def infer(model, processor, req):
|
||||||
|
"""Run one inference.

Modes:
- text only (target/description): standard grounding
- fused (anchor_image_path present): the anchor crop is also passed as a
  reference image, and the model must find that element in the
  screenshot. This avoids the two-pass describe→ground flow.
"""
|
||||||
|
from PIL import Image
|
||||||
|
from qwen_vl_utils import process_vision_info
|
||||||
|
|
||||||
|
target = req.get("target", "")
|
||||||
|
description = req.get("description", "")
|
||||||
|
label = f"{target} — {description}" if description else target
|
||||||
|
|
||||||
|
# Main image (full screenshot)
|
||||||
|
image_path = req.get("image_path", "")
|
||||||
|
if image_path and os.path.exists(image_path):
|
||||||
|
img = Image.open(image_path).convert("RGB")
|
||||||
|
else:
|
||||||
|
import mss
|
||||||
|
with mss.mss() as sct:
|
||||||
|
grab = sct.grab(sct.monitors[0])
|
||||||
|
img = Image.frombytes("RGB", grab.size, grab.bgra, "raw", "BGRX")
|
||||||
|
|
||||||
|
# Anchor image (optional): fused describe+ground mode
|
||||||
|
anchor_image_path = req.get("anchor_image_path", "")
|
||||||
|
anchor_img = None
|
||||||
|
if anchor_image_path and os.path.exists(anchor_image_path):
|
||||||
|
anchor_img = Image.open(anchor_image_path).convert("RGB")
|
||||||
|
|
||||||
|
if not label.strip() and anchor_img is None:
|
||||||
|
return {"x": None, "y": None, "error": "target ou anchor_image requis"}
|
||||||
|
|
||||||
|
W, H = img.size
|
||||||
|
factor = 28
|
||||||
|
rH = max(factor, round(H / factor) * factor)
|
||||||
|
rW = max(factor, round(W / factor) * factor)
|
||||||
|
|
||||||
|
system = (
|
||||||
|
"You FIRST think about the reasoning process as an internal monologue "
|
||||||
|
"and then provide the final answer.\n"
|
||||||
|
"The reasoning process MUST BE enclosed within <think> </think> tags."
|
||||||
|
)
|
||||||
|
|
||||||
|
# Build the prompt according to the mode
|
||||||
|
if anchor_img is not None:
|
||||||
|
# Fused mode: Image1 = anchor crop, Image2 = screenshot
|
||||||
|
hint = f' Hint: this element looks like "{label}".' if label.strip() else ""
|
||||||
|
user_text = (
|
||||||
|
f"The first image is a small crop of a UI element captured previously. "
|
||||||
|
f"The second image is the current screen ({rW}x{rH}).{hint}\n"
|
||||||
|
f"Locate on the second image the UI element that visually matches the first image. "
|
||||||
|
f"Output the coordinates using JSON format: "
|
||||||
|
f'[{{"point_2d": [x, y]}}, ...]'
|
||||||
|
)
|
||||||
|
messages = [
|
||||||
|
{"role": "system", "content": system},
|
||||||
|
{"role": "user", "content": [
|
||||||
|
{"type": "image", "image": anchor_img},
|
||||||
|
{"type": "image", "image": img},
|
||||||
|
{"type": "text", "text": user_text},
|
||||||
|
]},
|
||||||
|
]
|
||||||
|
else:
|
||||||
|
# Standard mode: text only
|
||||||
|
user_text = (
|
||||||
|
f'The screen\'s resolution is {rW}x{rH}.\n'
|
||||||
|
f'Locate the UI element(s) for "{label}", '
|
||||||
|
f'output the coordinates using JSON format: '
|
||||||
|
f'[{{"point_2d": [x, y]}}, ...]'
|
||||||
|
)
|
||||||
|
messages = [
|
||||||
|
{"role": "system", "content": system},
|
||||||
|
{"role": "user", "content": [
|
||||||
|
{"type": "image", "image": img},
|
||||||
|
{"type": "text", "text": user_text},
|
||||||
|
]},
|
||||||
|
]
|
||||||
|
|
||||||
|
text = processor.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
|
||||||
|
image_inputs, video_inputs = process_vision_info(messages)
|
||||||
|
inputs = processor(
|
||||||
|
text=[text], images=image_inputs, videos=video_inputs,
|
||||||
|
padding=True, return_tensors="pt",
|
||||||
|
).to(model.device)
|
||||||
|
|
||||||
|
t0 = time.time()
|
||||||
|
with torch.no_grad():
|
||||||
|
gen = model.generate(**inputs, max_new_tokens=512)
|
||||||
|
infer_ms = (time.time() - t0) * 1000
|
||||||
|
|
||||||
|
trimmed = [o[len(i):] for i, o in zip(inputs.input_ids, gen)]
|
||||||
|
raw = processor.batch_decode(
|
||||||
|
trimmed, skip_special_tokens=True, clean_up_tokenization_spaces=False,
|
||||||
|
)[0].strip()
|
||||||
|
|
||||||
|
mode_str = "fused" if anchor_img is not None else "text"
|
||||||
|
print(f"[infigui-worker] [{mode_str}] '{label[:40]}' ({infer_ms:.0f}ms)")
|
||||||
|
|
||||||
|
    # Parse the point_2d JSON answer (everything after the </think> block)
    json_part = raw.split("</think>")[-1] if "</think>" in raw else raw
    json_part = json_part.replace("```json", "").replace("```", "").strip()

    px, py = None, None
    try:
        parsed = json.loads(json_part)
        if isinstance(parsed, list) and len(parsed) > 0:
            pt = parsed[0].get("point_2d", [])
            if len(pt) >= 2:
                # Rescale from the resolution announced in the prompt to real pixels
                px, py = int(pt[0] * W / rW), int(pt[1] * H / rH)
    except (json.JSONDecodeError, AttributeError, TypeError):
        # Malformed JSON or unexpected structure: fall back to a regex search
        m = re.search(r'"point_2d"\s*:\s*\[(\d+),\s*(\d+)\]', raw)
        if m:
            px = int(int(m.group(1)) * W / rW)
            py = int(int(m.group(2)) * H / rH)

    return {
        "x": px, "y": py,
        "method": "infigui",
        # Compare against None: x == 0 is a valid coordinate
        "confidence": 0.90 if px is not None else 0.0,
        "time_ms": round(infer_ms, 1),
    }
|
||||||
|
|
||||||
|
|
||||||
|
def main():
|
||||||
|
"""One-shot mode: read a request from stdin, run inference, write the result to stdout."""
|
||||||
|
# Read the request
|
||||||
|
input_data = sys.stdin.read().strip()
|
||||||
|
if not input_data:
|
||||||
|
print(json.dumps({"x": None, "y": None, "error": "pas de requête"}))
|
||||||
|
return
|
||||||
|
|
||||||
|
try:
|
||||||
|
req = json.loads(input_data)
|
||||||
|
except json.JSONDecodeError:
|
||||||
|
print(json.dumps({"x": None, "y": None, "error": "JSON invalide"}))
|
||||||
|
return
|
||||||
|
|
||||||
|
model, processor = load_model()
|
||||||
|
result = infer(model, processor, req)
|
||||||
|
print(json.dumps(result))
|
||||||
|
|
||||||
|
|
||||||
|
if __name__ == "__main__":
|
||||||
|
main()
|
||||||
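The answer-parsing step of `infer()` (strip the `<think>` block, read `point_2d`, rescale to screen pixels) can be exercised without a GPU. The sketch below is an illustration under assumptions, not the worker's code: `parse_point` is a hypothetical helper name, and it returns `None` where the worker returns `x`/`y` set to `None`.

```python
import json
import re
from typing import Optional, Tuple

FENCE = "`" * 3  # Markdown code fence the model sometimes wraps its JSON in


def parse_point(raw: str, W: int, H: int, rW: int, rH: int) -> Optional[Tuple[int, int]]:
    """Extract the first point_2d from a model answer and rescale it.

    (rW, rH) is the resolution announced in the prompt, (W, H) the real screen size.
    """
    answer = raw.split("</think>")[-1]  # keep only what follows the reasoning
    answer = answer.replace(FENCE + "json", "").replace(FENCE, "").strip()
    try:
        point = json.loads(answer)[0]["point_2d"]
        x, y = float(point[0]), float(point[1])
    except (ValueError, KeyError, IndexError, TypeError):
        # Not valid JSON, or not the expected shape: try a regex on the raw text
        m = re.search(r'"point_2d"\s*:\s*\[\s*(\d+)\s*,\s*(\d+)\s*\]', raw)
        if m is None:
            return None
        x, y = float(m.group(1)), float(m.group(2))
    return int(x * W / rW), int(y * H / rH)


raw = '<think>The button is in the middle.</think>\n[{"point_2d": [966, 546]}]'
print(parse_point(raw, W=1920, H=1080, rW=1932, rH=1092))  # (960, 540)
```

Returning `None` rather than a falsy coordinate keeps `(0, 0)` distinguishable from "not found", which matters when the result is later tested for success.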
190
core/grounding/pipeline.py
Normal file
@@ -0,0 +1,190 @@
|
|||||||
|
"""
|
||||||
|
core/grounding/pipeline.py: cascading grounding pipeline

Orchestrates the localization methods in this order:
1. Template matching (TemplateMatcher, local, ~80ms)
2. OCR (docTR via input_handler, local, ~1s)
3. UI-TARS (HTTP to the grounding server, ~3s)
4. Static fallback (original coordinates from the workflow)

Each method is tried in turn, and the first one that succeeds returns its
result. This balances speed (template matching) against robustness
(UI-TARS for elements whose position or appearance has changed).
|
||||||
|
|
||||||
|
Usage:
|
||||||
|
from core.grounding.pipeline import GroundingPipeline
|
||||||
|
from core.grounding.target import GroundingTarget
|
||||||
|
|
||||||
|
pipeline = GroundingPipeline()
|
||||||
|
result = pipeline.locate(GroundingTarget(
|
||||||
|
text="Valider",
|
||||||
|
description="bouton vert en bas",
|
||||||
|
template_b64=screenshot_b64,
|
||||||
|
original_bbox={"x": 100, "y": 200, "width": 80, "height": 30},
|
||||||
|
))
|
||||||
|
if result:
|
||||||
|
print(f"Found at ({result.x}, {result.y}) via {result.method}")
|
||||||
|
"""
|
||||||
|
|
||||||
|
from __future__ import annotations
|
||||||
|
|
||||||
|
import time
|
||||||
|
from typing import Optional
|
||||||
|
|
||||||
|
from core.grounding.target import GroundingTarget, GroundingResult
|
||||||
|
|
||||||
|
|
||||||
|
class GroundingPipeline:
|
||||||
|
"""Cascading localization pipeline: template -> OCR -> UI-TARS -> static."""
|
||||||
|
|
||||||
|
def __init__(self, template_threshold: float = 0.75, enable_uitars: bool = True):
|
||||||
|
self.template_threshold = template_threshold
|
||||||
|
self.enable_uitars = enable_uitars
|
||||||
|
|
||||||
|
def locate(self, target: GroundingTarget) -> Optional[GroundingResult]:
|
||||||
|
"""Locate a UI element by trying each method in cascade.

Args:
    target: description of the element to locate

Returns:
    GroundingResult, or None if no method finds the element
"""
|
||||||
|
t0 = time.time()
|
||||||
|
|
||||||
|
# --- Method 1: template matching (~80ms) ---
|
||||||
|
result = self._try_template(target)
|
||||||
|
if result:
|
||||||
|
print(f"[GroundingPipeline] Localise via {result.method} en "
|
||||||
|
f"{(time.time() - t0) * 1000:.0f}ms")
|
||||||
|
return result
|
||||||
|
|
||||||
|
# --- Method 2: OCR text (~1s) ---
|
||||||
|
result = self._try_ocr(target)
|
||||||
|
if result:
|
||||||
|
print(f"[GroundingPipeline] Localise via {result.method} en "
|
||||||
|
f"{(time.time() - t0) * 1000:.0f}ms")
|
||||||
|
return result
|
||||||
|
|
||||||
|
# --- Method 3: UI-TARS via HTTP server (~3s) ---
|
||||||
|
if self.enable_uitars:
|
||||||
|
result = self._try_uitars(target)
|
||||||
|
if result:
|
||||||
|
print(f"[GroundingPipeline] Localise via {result.method} en "
|
||||||
|
f"{(time.time() - t0) * 1000:.0f}ms")
|
||||||
|
return result
|
||||||
|
|
||||||
|
# --- Method 4: static fallback ---
|
||||||
|
result = self._try_static(target)
|
||||||
|
if result:
|
||||||
|
print(f"[GroundingPipeline] Localise via {result.method} en "
|
||||||
|
f"{(time.time() - t0) * 1000:.0f}ms")
|
||||||
|
return result
|
||||||
|
|
||||||
|
print(f"[GroundingPipeline] ECHEC: '{target.text}' introuvable "
|
||||||
|
f"(toutes methodes epuisees, {(time.time() - t0) * 1000:.0f}ms)")
|
||||||
|
return None
|
||||||
|
|
||||||
|
# ------------------------------------------------------------------
|
||||||
|
# Individual methods
|
||||||
|
# ------------------------------------------------------------------
|
||||||
|
|
||||||
|
def _try_template(self, target: GroundingTarget) -> Optional[GroundingResult]:
|
||||||
|
"""Template matching: fast and exact, but sensitive to visual changes."""
|
||||||
|
if not target.template_b64:
|
||||||
|
return None
|
||||||
|
|
||||||
|
try:
|
||||||
|
from core.grounding.template_matcher import TemplateMatcher
|
||||||
|
matcher = TemplateMatcher(threshold=self.template_threshold)
|
||||||
|
match = matcher.match_screen(anchor_b64=target.template_b64)
|
||||||
|
if match:
|
||||||
|
print(f"[GroundingPipeline/template] score={match.score:.3f} "
|
||||||
|
f"pos=({match.x},{match.y}) ({match.time_ms:.0f}ms)")
|
||||||
|
return GroundingResult(
|
||||||
|
x=match.x,
|
||||||
|
y=match.y,
|
||||||
|
method='template',
|
||||||
|
confidence=match.score,
|
||||||
|
time_ms=match.time_ms,
|
||||||
|
)
|
||||||
|
else:
|
||||||
|
diag = matcher.match_screen_diagnostic(anchor_b64=target.template_b64)
|
||||||
|
print(f"[GroundingPipeline/template] pas de match — best={diag}")
|
||||||
|
except Exception as e:
|
||||||
|
print(f"[GroundingPipeline/template] ERREUR: {e}")
|
||||||
|
|
||||||
|
return None
|
||||||
|
|
||||||
|
def _try_ocr(self, target: GroundingTarget) -> Optional[GroundingResult]:
|
||||||
|
"""OCR: search for the target text on screen via docTR."""
|
||||||
|
if not target.text:
|
||||||
|
return None
|
||||||
|
|
||||||
|
try:
|
||||||
|
from core.execution.input_handler import _grounding_ocr
|
||||||
|
bbox = target.original_bbox if target.original_bbox else None
|
||||||
|
result = _grounding_ocr(target.text, anchor_bbox=bbox)
|
||||||
|
if result:
|
||||||
|
print(f"[GroundingPipeline/OCR] '{target.text}' -> ({result['x']}, {result['y']})")
|
||||||
|
return GroundingResult(
|
||||||
|
x=result['x'],
|
||||||
|
y=result['y'],
|
||||||
|
method='ocr',
|
||||||
|
confidence=result.get('confidence', 0.80),
|
||||||
|
time_ms=result.get('time_ms', 0),
|
||||||
|
)
|
||||||
|
else:
|
||||||
|
print(f"[GroundingPipeline/OCR] '{target.text}' non trouve")
|
||||||
|
except Exception as e:
|
||||||
|
print(f"[GroundingPipeline/OCR] ERREUR: {e}")
|
||||||
|
|
||||||
|
return None
|
||||||
|
|
||||||
|
def _try_uitars(self, target: GroundingTarget) -> Optional[GroundingResult]:
|
||||||
|
"""UI-TARS via HTTP server: robust, handles layout changes."""
|
||||||
|
if not target.text and not target.description:
|
||||||
|
return None
|
||||||
|
|
||||||
|
try:
|
||||||
|
from core.grounding.ui_tars_grounder import UITarsGrounder
|
||||||
|
grounder = UITarsGrounder.get_instance()
|
||||||
|
result = grounder.ground(
|
||||||
|
target_text=target.text,
|
||||||
|
target_description=target.description,
|
||||||
|
)
|
||||||
|
if result:
|
||||||
|
print(f"[GroundingPipeline/UI-TARS] ({result.x}, {result.y}) "
|
||||||
|
f"conf={result.confidence:.2f} ({result.time_ms:.0f}ms)")
|
||||||
|
return result
|
||||||
|
else:
|
||||||
|
print(f"[GroundingPipeline/UI-TARS] pas de resultat")
|
||||||
|
except Exception as e:
|
||||||
|
print(f"[GroundingPipeline/UI-TARS] ERREUR: {e}")
|
||||||
|
|
||||||
|
return None
|
||||||
|
|
||||||
|
def _try_static(self, target: GroundingTarget) -> Optional[GroundingResult]:
|
||||||
|
"""Fallback: original workflow coordinates (center of the bounding box)."""
|
||||||
|
bbox = target.original_bbox
|
||||||
|
if not bbox:
|
||||||
|
return None
|
||||||
|
|
||||||
|
w = bbox.get('width', 0)
|
||||||
|
h = bbox.get('height', 0)
|
||||||
|
if not w or not h:
|
||||||
|
return None
|
||||||
|
|
||||||
|
x = int(bbox.get('x', 0) + w / 2)
|
||||||
|
y = int(bbox.get('y', 0) + h / 2)
|
||||||
|
|
||||||
|
print(f"[GroundingPipeline/static] fallback ({x}, {y}) "
|
||||||
|
f"depuis bbox {bbox}")
|
||||||
|
|
||||||
|
return GroundingResult(
|
||||||
|
x=x,
|
||||||
|
y=y,
|
||||||
|
method='static_fallback',
|
||||||
|
confidence=0.30,
|
||||||
|
time_ms=0.0,
|
||||||
|
)
|
||||||
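The cascade in `GroundingPipeline.locate` reduces to "try each locator in order, return the first hit, swallow errors". The sketch below shows that pattern on its own. `locate_in_cascade`, `bbox_center`, and `failing_template` are hypothetical names introduced for illustration; only the bbox-center arithmetic mirrors `_try_static` above.

```python
from typing import Callable, Optional, Sequence, Tuple

Point = Tuple[int, int]
Locator = Callable[[], Optional[Point]]


def locate_in_cascade(locators: Sequence[Tuple[str, Locator]]) -> Optional[Tuple[str, Point]]:
    """Try each (name, locator) pair in order and return the first success."""
    for name, locator in locators:
        try:
            point = locator()
        except Exception as exc:  # a failing method must not stop the cascade
            print(f"[cascade/{name}] error: {exc}")
            continue
        if point is not None:
            return name, point
    return None


def bbox_center(bbox: dict) -> Optional[Point]:
    """Center of an {x, y, width, height} box, or None if it has no size."""
    w, h = bbox.get("width", 0), bbox.get("height", 0)
    if not w or not h:
        return None
    return int(bbox.get("x", 0) + w / 2), int(bbox.get("y", 0) + h / 2)


def failing_template() -> Optional[Point]:
    raise RuntimeError("no template available")


result = locate_in_cascade([
    ("template", failing_template),  # raises, so it is skipped
    ("ocr", lambda: None),           # finds nothing
    ("static_fallback", lambda: bbox_center({"x": 100, "y": 200, "width": 80, "height": 30})),
])
print(result)  # ('static_fallback', (140, 215))
```

Ordering the locators from cheapest to most expensive keeps the common case fast; the static fallback always comes last because it ignores the current screen entirely.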
113
core/grounding/server.py
Normal file
@@ -0,0 +1,113 @@
|
|||||||
|
"""Minimal grounding server: single-threaded Flask, same CUDA context."""
|
||||||
|
import base64, io, json, math, os, re, time, gc
|
||||||
|
import torch
|
||||||
|
from flask import Flask, request, jsonify
|
||||||
|
from PIL import Image
|
||||||
|
|
||||||
|
app = Flask(__name__)
|
||||||
|
|
||||||
|
MODEL_ID = os.environ.get("GROUNDING_MODEL", "InfiX-ai/InfiGUI-G1-3B")
|
||||||
|
MIN_PIXELS = 100 * 28 * 28
|
||||||
|
MAX_PIXELS = 5600 * 28 * 28
|
||||||
|
_model = None
|
||||||
|
_processor = None
|
||||||
|
|
||||||
|
def _smart_resize(h, w, factor=28):
|
||||||
|
h_bar = max(factor, round(h/factor)*factor)
|
||||||
|
w_bar = max(factor, round(w/factor)*factor)
|
||||||
|
if h_bar*w_bar > MAX_PIXELS:
|
||||||
|
beta = math.sqrt((h*w)/MAX_PIXELS)
|
||||||
|
h_bar = math.floor(h/beta/factor)*factor
|
||||||
|
w_bar = math.floor(w/beta/factor)*factor
|
||||||
|
elif h_bar*w_bar < MIN_PIXELS:
|
||||||
|
beta = math.sqrt(MIN_PIXELS/(h*w))
|
||||||
|
h_bar = math.ceil(h*beta/factor)*factor
|
||||||
|
w_bar = math.ceil(w*beta/factor)*factor
|
||||||
|
return h_bar, w_bar
|
||||||
|
|
||||||
|
def load_model():
|
||||||
|
global _model, _processor
|
||||||
|
if _model is not None:
|
||||||
|
return
|
||||||
|
from transformers import Qwen2_5_VLForConditionalGeneration, AutoProcessor, BitsAndBytesConfig
|
||||||
|
torch.cuda.empty_cache(); gc.collect()
|
||||||
|
print(f"[grounding] Chargement {MODEL_ID}...")
|
||||||
|
bnb = BitsAndBytesConfig(load_in_4bit=True, bnb_4bit_quant_type="nf4",
|
||||||
|
bnb_4bit_compute_dtype=torch.bfloat16, bnb_4bit_use_double_quant=True)
|
||||||
|
_model = Qwen2_5_VLForConditionalGeneration.from_pretrained(
|
||||||
|
MODEL_ID, quantization_config=bnb, device_map="auto")
|
||||||
|
_model.eval()
|
||||||
|
_processor = AutoProcessor.from_pretrained(MODEL_ID, min_pixels=MIN_PIXELS, max_pixels=MAX_PIXELS, padding_side="left")
|
||||||
|
print(f"[grounding] Prêt — VRAM: {torch.cuda.memory_allocated()/1e9:.2f}GB")
|
||||||
|
|
||||||
|
@app.route('/health')
|
||||||
|
def health():
|
||||||
|
return jsonify({"status": "ok", "model": MODEL_ID, "model_loaded": _model is not None,
|
||||||
|
"cuda_available": torch.cuda.is_available(),
|
||||||
|
"vram_allocated_gb": round(torch.cuda.memory_allocated()/1e9, 2)})
|
||||||
|
|
||||||
|
@app.route('/ground', methods=['POST'])
|
||||||
|
def ground():
|
||||||
|
if _model is None:
|
||||||
|
return jsonify({"error": "Modèle pas chargé"}), 503
|
||||||
|
from qwen_vl_utils import process_vision_info
|
||||||
|
data = request.get_json(silent=True) or {}
|
||||||
|
target = data.get('target_text', '')
|
||||||
|
desc = data.get('target_description', '')
|
||||||
|
label = f"{target} — {desc}" if desc else target
|
||||||
|
if not label.strip():
|
||||||
|
return jsonify({"error": "target_text requis"}), 400
|
||||||
|
|
||||||
|
# Image
|
||||||
|
if data.get('image_b64'):
|
||||||
|
raw = data['image_b64'].split(',')[1] if ',' in data['image_b64'] else data['image_b64']
|
||||||
|
img = Image.open(io.BytesIO(base64.b64decode(raw))).convert('RGB')
|
||||||
|
else:
|
||||||
|
import mss
|
||||||
|
with mss.mss() as sct:
|
||||||
|
grab = sct.grab(sct.monitors[0])
|
||||||
|
img = Image.frombytes('RGB', grab.size, grab.bgra, 'raw', 'BGRX')
|
||||||
|
|
||||||
|
W, H = img.size
|
||||||
|
rH, rW = _smart_resize(H, W)
|
||||||
|
|
||||||
|
user_text = f'The screen\'s resolution is {rW}x{rH}.\nLocate the UI element(s) for "{label}", output the coordinates using JSON format: [{{"point_2d": [x, y]}}, ...]'
|
||||||
|
system = "You FIRST think about the reasoning process as an internal monologue and then provide the final answer.\nThe reasoning process MUST BE enclosed within <think> </think> tags."
|
||||||
|
|
||||||
|
messages = [{"role": "system", "content": system},
|
||||||
|
{"role": "user", "content": [{"type": "image", "image": img}, {"type": "text", "text": user_text}]}]
|
||||||
|
|
||||||
|
text = _processor.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
|
||||||
|
image_inputs, video_inputs = process_vision_info(messages)
|
||||||
|
inputs = _processor(text=[text], images=image_inputs, videos=video_inputs, padding=True, return_tensors="pt").to(_model.device)
|
||||||
|
|
||||||
|
t0 = time.time()
|
||||||
|
with torch.no_grad():
|
||||||
|
gen = _model.generate(**inputs, max_new_tokens=512)
|
||||||
|
infer_ms = (time.time()-t0)*1000
|
||||||
|
|
||||||
|
trimmed = [o[len(i):] for i,o in zip(inputs.input_ids, gen)]
|
||||||
|
raw = _processor.batch_decode(trimmed, skip_special_tokens=True, clean_up_tokenization_spaces=False)[0].strip()
|
||||||
|
print(f"[grounding] '{label[:40]}' → {raw[:100]} ({infer_ms:.0f}ms)")
|
||||||
|
|
||||||
|
# Parse the point_2d JSON answer
|
||||||
|
json_part = raw.split("</think>")[-1] if "</think>" in raw else raw
|
||||||
|
json_part = json_part.replace("```json","").replace("```","").strip()
|
||||||
|
px, py = None, None
|
||||||
|
try:
|
||||||
|
parsed = json.loads(json_part)
|
||||||
|
if isinstance(parsed, list) and len(parsed) > 0:
|
||||||
|
pt = parsed[0].get("point_2d", [])
|
||||||
|
if len(pt) >= 2:
|
||||||
|
px, py = int(pt[0]*W/rW), int(pt[1]*H/rH)
|
||||||
|
except (json.JSONDecodeError, AttributeError, TypeError):
|
||||||
|
m = re.search(r'"point_2d"\s*:\s*\[(\d+),\s*(\d+)\]', raw)
|
||||||
|
if m:
|
||||||
|
px, py = int(int(m.group(1))*W/rW), int(int(m.group(2))*H/rH)
|
||||||
|
|
||||||
|
return jsonify({"x": px, "y": py, "method": "infigui", "confidence": 0.90 if px is not None else 0.0,
|
||||||
|
"time_ms": round(infer_ms, 1), "raw_output": raw[:300]})
|
||||||
|
|
||||||
|
if __name__ == '__main__':
|
||||||
|
load_model()
|
||||||
|
app.run(host='0.0.0.0', port=8200, threaded=False)
|
||||||
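The resolution reported to the model and the mapping back to screen pixels can be worked through by hand. The sketch below re-implements the arithmetic of `_smart_resize` from the server diff as a standalone function; `to_screen` is a hypothetical helper name for the rescaling that `ground()` does inline.

```python
import math
from typing import Tuple

FACTOR = 28
MIN_PIXELS = 100 * FACTOR * FACTOR
MAX_PIXELS = 5600 * FACTOR * FACTOR


def smart_resize(h: int, w: int, factor: int = FACTOR) -> Tuple[int, int]:
    """Round (h, w) to multiples of `factor`, keeping the area within bounds."""
    h_bar = max(factor, round(h / factor) * factor)
    w_bar = max(factor, round(w / factor) * factor)
    if h_bar * w_bar > MAX_PIXELS:
        beta = math.sqrt((h * w) / MAX_PIXELS)  # shrink factor
        h_bar = math.floor(h / beta / factor) * factor
        w_bar = math.floor(w / beta / factor) * factor
    elif h_bar * w_bar < MIN_PIXELS:
        beta = math.sqrt(MIN_PIXELS / (h * w))  # growth factor
        h_bar = math.ceil(h * beta / factor) * factor
        w_bar = math.ceil(w * beta / factor) * factor
    return h_bar, w_bar


def to_screen(x_model: float, y_model: float,
              screen_wh: Tuple[int, int], model_wh: Tuple[int, int]) -> Tuple[int, int]:
    """Map a point from the model's resolution back to real screen pixels."""
    W, H = screen_wh
    rW, rH = model_wh
    return int(x_model * W / rW), int(y_model * H / rH)


rH, rW = smart_resize(1080, 1920)
print(rW, rH)  # 1932 1092
print(to_screen(966, 546, (1920, 1080), (rW, rH)))  # (960, 540)
```

For a 1920x1080 screen both sides are simply rounded to the nearest multiple of 28, since 1932 x 1092 already lies between the two pixel bounds. Note that the worker in this same diff rounds to multiples of 28 without the min/max clamp, so the two code paths may announce different resolutions for very large or very small images.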
156
core/grounding/shadow_learning_hook.py
Normal file
@@ -0,0 +1,156 @@
|
|||||||
|
"""
|
||||||
|
core/grounding/shadow_learning_hook.py: Shadow learning hook

Connects the ShadowObserver to the SignatureStore: every click observed during
a Shadow session enriches the element-signature database.

The human clicks somewhere → we detect which UI element is under the click →
we store its signature (text, type, position, neighbors) for replay.

This module is an OPTIONAL hook: it does not modify the ShadowObserver,
it plugs into it through a callback.
|
||||||
|
|
||||||
|
Usage:
|
||||||
|
from core.grounding.shadow_learning_hook import ShadowLearningHook
|
||||||
|
|
||||||
|
hook = ShadowLearningHook()
|
||||||
|
|
||||||
|
# In the ShadowObserver or the capture API:
|
||||||
|
hook.on_click_observed(
|
||||||
|
click_x=542, click_y=318,
|
||||||
|
screenshot_pil=screen,
|
||||||
|
window_title="Bloc-notes",
|
||||||
|
target_label="Bouton Valider",
|
||||||
|
)
|
||||||
|
"""
|
||||||
|
|
||||||
|
from __future__ import annotations
|
||||||
|
|
||||||
|
import threading
|
||||||
|
import time
|
||||||
|
from typing import Any, Dict, Optional
|
||||||
|
|
||||||
|
from core.grounding.element_signature import SignatureStore
|
||||||
|
from core.grounding.fast_types import DetectedUIElement
|
||||||
|
|
||||||
|
|
||||||
|
class ShadowLearningHook:
|
||||||
|
"""Learning hook for Shadow mode.

On every observed human click, detects the element under the click
and enriches the SignatureStore.
"""
|
||||||
|
|
||||||
|
def __init__(self, signature_store: Optional[SignatureStore] = None):
|
||||||
|
self._store = signature_store or SignatureStore()
|
||||||
|
self._detector = None  # Lazy-loaded so RF-DETR is not loaded at startup
|
||||||
|
self._lock = threading.Lock()
|
||||||
|
|
||||||
|
def on_click_observed(
|
||||||
|
self,
|
||||||
|
click_x: int,
|
||||||
|
click_y: int,
|
||||||
|
screenshot_pil: Optional[Any] = None,
|
||||||
|
window_title: str = "",
|
||||||
|
target_label: str = "",
|
||||||
|
target_description: str = "",
|
||||||
|
) -> Optional[Dict[str, Any]]:
|
||||||
|
"""Called when a human click is observed during a Shadow session.

Args:
    click_x, click_y: Click position (screen pixels).
    screenshot_pil: PIL screenshot taken at the time of the click.
    window_title: Title of the active window.
    target_label: Label of the step (if known).
    target_description: Description of the element (if known).

Returns:
    Dict describing the created/enriched signature, or None on failure.
"""
|
||||||
|
t0 = time.time()
|
||||||
|
|
||||||
|
try:
|
||||||
|
# Lazy-load the detector
|
||||||
|
if self._detector is None:
|
||||||
|
from core.grounding.fast_detector import FastDetector
|
||||||
|
self._detector = FastDetector()
|
||||||
|
|
||||||
|
# Detect the elements on screen
|
||||||
|
snapshot = self._detector.detect(screenshot_pil=screenshot_pil)
|
||||||
|
|
||||||
|
if not snapshot.elements:
|
||||||
|
print(f"📝 [Shadow/learn] Aucun élément détecté à ({click_x}, {click_y})")
|
||||||
|
return None
|
||||||
|
|
||||||
|
# Find the element under the click
|
||||||
|
clicked_element = self._find_element_at(click_x, click_y, snapshot.elements)
|
||||||
|
|
||||||
|
if clicked_element is None:
|
||||||
|
print(f"📝 [Shadow/learn] Aucun élément sous ({click_x}, {click_y})")
|
||||||
|
return None
|
||||||
|
|
||||||
|
# Build the target key
|
||||||
|
target_key = SignatureStore.make_target_key(
|
||||||
|
target_label or clicked_element.ocr_text,
|
||||||
|
target_description,
|
||||||
|
)
|
||||||
|
screen_ctx = SignatureStore.make_screen_context(
|
||||||
|
window_title, snapshot.resolution,
|
||||||
|
)
|
||||||
|
|
||||||
|
# Record the signature
|
||||||
|
self._store.record_success(
|
||||||
|
target_key=target_key,
|
||||||
|
screen_context=screen_ctx,
|
||||||
|
element=clicked_element,
|
||||||
|
confidence=1.0,  # The human clicked → maximum confidence
|
||||||
|
)
|
||||||
|
|
||||||
|
dt = (time.time() - t0) * 1000
|
||||||
|
print(f"📝 [Shadow/learn] Signature '{clicked_element.ocr_text}' "
|
||||||
|
f"type={clicked_element.element_type} "
|
||||||
|
f"pos={clicked_element.relative_position} "
|
||||||
|
f"voisins={clicked_element.neighbors[:3]} ({dt:.0f}ms)")
|
||||||
|
|
||||||
|
return {
|
||||||
|
"target_key": target_key,
|
||||||
|
"text": clicked_element.ocr_text,
|
||||||
|
"element_type": clicked_element.element_type,
|
||||||
|
"relative_position": clicked_element.relative_position,
|
||||||
|
"neighbors": clicked_element.neighbors,
|
||||||
|
"center": clicked_element.center,
|
||||||
|
}
|
||||||
|
|
||||||
|
except Exception as e:
|
||||||
|
print(f"⚠️ [Shadow/learn] Erreur: {e}")
|
||||||
|
return None
|
||||||
|
|
||||||
|
@staticmethod
|
||||||
|
def _find_element_at(
|
||||||
|
x: int, y: int,
|
||||||
|
elements: list,
|
||||||
|
margin: int = 20,
|
||||||
|
) -> Optional[DetectedUIElement]:
|
||||||
|
"""Find the element whose bbox contains the point (x, y).

If no bbox contains the point, fall back to the element whose center is
nearest to it, provided that center lies within `margin` pixels.
"""
|
||||||
|
# Exact match: the click is inside the bbox
|
||||||
|
for elem in elements:
|
||||||
|
x1, y1, x2, y2 = elem.bbox
|
||||||
|
if x1 <= x <= x2 and y1 <= y <= y2:
|
||||||
|
return elem
|
||||||
|
|
||||||
|
# Proximity match: the click is close to the center
|
||||||
|
best_elem = None
|
||||||
|
best_dist = float('inf')
|
||||||
|
|
||||||
|
for elem in elements:
|
||||||
|
dx = abs(elem.center[0] - x)
|
||||||
|
dy = abs(elem.center[1] - y)
|
||||||
|
dist = (dx**2 + dy**2) ** 0.5
|
||||||
|
if dist < margin and dist < best_dist:
|
||||||
|
best_dist = dist
|
||||||
|
best_elem = elem
|
||||||
|
|
||||||
|
return best_elem
|
||||||
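The click-to-element lookup in `_find_element_at` is a two-pass search: containment first, then nearest center within a margin. The sketch below reproduces that logic on bare bbox tuples so it can be run without the detector. `find_box_at` is a hypothetical name, and it returns an index instead of a `DetectedUIElement`.

```python
import math
from typing import List, Optional, Tuple

BBox = Tuple[int, int, int, int]  # (x1, y1, x2, y2)


def find_box_at(x: int, y: int, boxes: List[BBox], margin: int = 20) -> Optional[int]:
    """Return the index of the box under (x, y), or of the nearest center within margin."""
    # Pass 1: the first box that contains the point wins
    for i, (x1, y1, x2, y2) in enumerate(boxes):
        if x1 <= x <= x2 and y1 <= y <= y2:
            return i

    # Pass 2: nearest box center, strictly closer than `margin` pixels
    best, best_dist = None, float("inf")
    for i, (x1, y1, x2, y2) in enumerate(boxes):
        cx, cy = (x1 + x2) / 2, (y1 + y2) / 2
        dist = math.hypot(cx - x, cy - y)
        if dist < margin and dist < best_dist:
            best, best_dist = i, dist
    return best


boxes = [(100, 100, 200, 140), (300, 100, 330, 120)]
print(find_box_at(150, 120, boxes))  # 0, the click is inside the first box
print(find_box_at(332, 112, boxes))  # 1, just outside the second box but near its center
print(find_box_at(600, 600, boxes))  # None
```

Since the fallback measures distance to the center rather than to the nearest edge, a click just outside a wide element can miss it even when it is only a few pixels from the border. Also, when boxes overlap, the first containing box in list order is returned, not the smallest one.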
263
core/grounding/smart_matcher.py
Normal file
@@ -0,0 +1,263 @@
|
|||||||
|
"""
|
||||||
|
core/grounding/smart_matcher.py: SMART layer, deterministic/probabilistic matching

Given a ScreenSnapshot (all detected elements) and a GroundingTarget
(what we are looking for), finds the matching element with a confidence score.

Matching pipeline (short-circuits on the first high-confidence match):
1. Exact text (2ms) → score 0.95
2. Fuzzy text ratio (5ms) → score 0.70-0.90
3. Type + position (2ms) → bonus/penalty
4. Contextual neighbors (5ms) → bonus
5. Combined score → MatchCandidate
|
||||||
|
|
||||||
|
Usage:
|
||||||
|
from core.grounding.smart_matcher import SmartMatcher
|
||||||
|
from core.grounding.fast_types import ScreenSnapshot
|
||||||
|
from core.grounding.target import GroundingTarget
|
||||||
|
|
||||||
|
matcher = SmartMatcher()
|
||||||
|
candidate = matcher.match(snapshot, GroundingTarget(text="Valider"))
|
||||||
|
if candidate and candidate.score >= 0.90:
|
||||||
|
print(f"Direct match: ({candidate.element.center}) score={candidate.score}")
|
||||||
|
"""
|
||||||
|
|
||||||
|
from __future__ import annotations
|
||||||
|
|
||||||
|
import re
|
||||||
|
from difflib import SequenceMatcher
|
||||||
|
from typing import Dict, List, Optional
|
||||||
|
|
||||||
|
from core.grounding.fast_types import DetectedUIElement, MatchCandidate, ScreenSnapshot
|
||||||
|
from core.grounding.target import GroundingTarget
|
||||||
|
|
||||||
|
|
||||||
|
class SmartMatcher:
|
||||||
|
"""Smart matching between a target and the detected elements.

Combines several signals (text, type, position, neighbors) into a single
confidence score for each candidate.
"""
|
||||||
|
|
||||||
|
def __init__(
|
||||||
|
self,
|
||||||
|
weight_text: float = 0.50,
|
||||||
|
weight_type: float = 0.10,
|
||||||
|
weight_position: float = 0.15,
|
||||||
|
weight_neighbors: float = 0.25,
|
||||||
|
):
|
||||||
|
self.w_text = weight_text
|
||||||
|
self.w_type = weight_type
|
||||||
|
self.w_position = weight_position
|
||||||
|
self.w_neighbors = weight_neighbors
|
||||||
|
|
||||||
|
def match(
|
||||||
|
self,
|
||||||
|
snapshot: ScreenSnapshot,
|
||||||
|
target: GroundingTarget,
|
||||||
|
signature: Optional[Dict] = None,
|
||||||
|
) -> Optional[MatchCandidate]:
|
||||||
|
        """Find the BEST element matching the target.

        Returns:
            The MatchCandidate with the highest score, or None if nothing matches.
        """
|
||||||
|
candidates = self.match_all(snapshot, target, signature)
|
||||||
|
if not candidates:
|
||||||
|
return None
|
||||||
|
return candidates[0]
|
||||||
|
|
||||||
|
def match_all(
|
||||||
|
self,
|
||||||
|
snapshot: ScreenSnapshot,
|
||||||
|
target: GroundingTarget,
|
||||||
|
signature: Optional[Dict] = None,
|
||||||
|
) -> List[MatchCandidate]:
|
||||||
|
        """Find ALL candidates, sorted by descending score.

        Args:
            snapshot: Screen state (detected elements + OCR).
            target: What we are looking for (text, description, original bbox).
            signature: Learned signature (optional, enriches the matching).

        Returns:
            List of MatchCandidate sorted by descending score.
        """
|
||||||
|
if not snapshot.elements:
|
||||||
|
return []
|
||||||
|
|
||||||
|
target_text = (target.text or "").strip()
|
||||||
|
target_desc = (target.description or "").strip()
|
||||||
|
search_text = target_text or target_desc
|
||||||
|
|
||||||
|
if not search_text:
|
||||||
|
return []
|
||||||
|
|
||||||
|
candidates = []
|
||||||
|
search_lower = self._normalize(search_text)
|
||||||
|
|
||||||
|
for elem in snapshot.elements:
|
||||||
|
score_detail: Dict[str, float] = {}
|
||||||
|
method = ""
|
||||||
|
|
||||||
|
# --- 1. Score texte ---
|
||||||
|
text_score = self._score_text(search_lower, elem.ocr_text)
|
||||||
|
score_detail["text"] = text_score
|
||||||
|
|
||||||
|
if text_score >= 0.95:
|
||||||
|
method = "exact_text"
|
||||||
|
elif text_score >= 0.70:
|
||||||
|
method = "fuzzy_text"
|
||||||
|
|
||||||
|
# --- 2. Score type (si signature connue) ---
|
||||||
|
type_score = 0.5 # neutre par défaut
|
||||||
|
if signature and signature.get("element_type"):
|
||||||
|
if elem.element_type == signature["element_type"]:
|
||||||
|
type_score = 1.0
|
||||||
|
elif elem.element_type == "element":
|
||||||
|
type_score = 0.5 # non classifié, neutre
|
||||||
|
else:
|
||||||
|
type_score = 0.2
|
||||||
|
score_detail["type"] = type_score
|
||||||
|
|
||||||
|
# --- 3. Score position (si bbox d'origine connue) ---
|
||||||
|
position_score = 0.5 # neutre
|
||||||
|
if target.original_bbox:
|
||||||
|
position_score = self._score_position(
|
||||||
|
elem.center, target.original_bbox,
|
||||||
|
snapshot.resolution[0], snapshot.resolution[1],
|
||||||
|
)
|
||||||
|
elif signature and signature.get("relative_position"):
|
||||||
|
if elem.relative_position == signature["relative_position"]:
|
||||||
|
position_score = 0.9
|
||||||
|
else:
|
||||||
|
position_score = 0.3
|
||||||
|
score_detail["position"] = position_score
|
||||||
|
|
||||||
|
# --- 4. Score voisins (si signature connue) ---
|
||||||
|
neighbor_score = 0.5 # neutre
|
||||||
|
if signature and signature.get("neighbors"):
|
||||||
|
neighbor_score = self._score_neighbors(
|
||||||
|
elem.neighbors, signature["neighbors"]
|
||||||
|
)
|
||||||
|
score_detail["neighbors"] = neighbor_score
|
||||||
|
|
||||||
|
# --- Score combiné ---
|
||||||
|
combined = (
|
||||||
|
self.w_text * text_score
|
||||||
|
+ self.w_type * type_score
|
||||||
|
+ self.w_position * position_score
|
||||||
|
+ self.w_neighbors * neighbor_score
|
||||||
|
)
|
||||||
|
|
||||||
|
            # Minimum threshold: drop the candidate when the text barely matches (text score < 0.30)
|
||||||
|
if text_score < 0.30:
|
||||||
|
continue
|
||||||
|
|
||||||
|
if not method:
|
||||||
|
method = "combined"
|
||||||
|
|
||||||
|
candidates.append(MatchCandidate(
|
||||||
|
element=elem,
|
||||||
|
score=combined,
|
||||||
|
score_detail=score_detail,
|
||||||
|
method=method,
|
||||||
|
))
|
||||||
|
|
||||||
|
# Trier par score décroissant
|
||||||
|
candidates.sort(key=lambda c: c.score, reverse=True)
|
||||||
|
|
||||||
|
return candidates
|
||||||
|
|
||||||
|
# ------------------------------------------------------------------
|
||||||
|
# Scoring texte
|
||||||
|
# ------------------------------------------------------------------
|
||||||
|
|
||||||
|
def _score_text(self, search: str, ocr_text: str) -> float:
|
||||||
|
"""Score de similarité textuelle (0-1)."""
|
||||||
|
if not ocr_text:
|
||||||
|
return 0.0
|
||||||
|
|
||||||
|
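        ocr_lower = self._normalize(ocr_text).strip()
        if not ocr_lower:
            # OCR text made only of whitespace/separators would otherwise pass
            # the inclusion test below ("" is contained in every string)
            return 0.0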
|
||||||
|
|
||||||
|
# Match exact
|
||||||
|
if search == ocr_lower:
|
||||||
|
return 1.0
|
||||||
|
|
||||||
|
# Inclusion (l'un contient l'autre)
|
||||||
|
if search in ocr_lower or ocr_lower in search:
|
||||||
|
overlap = min(len(search), len(ocr_lower))
|
||||||
|
total = max(len(search), len(ocr_lower))
|
||||||
|
if total > 0:
|
||||||
|
return 0.70 + 0.25 * (overlap / total)
|
||||||
|
|
||||||
|
# Fuzzy matching (SequenceMatcher, standard library)
|
||||||
|
ratio = SequenceMatcher(None, search, ocr_lower).ratio()
|
||||||
|
if ratio >= 0.60:
|
||||||
|
return 0.50 + 0.40 * ratio
|
||||||
|
|
||||||
|
return ratio * 0.3
|
||||||
|
|
||||||
|
# ------------------------------------------------------------------
|
||||||
|
# Scoring position
|
||||||
|
# ------------------------------------------------------------------
|
||||||
|
|
||||||
|
@staticmethod
|
||||||
|
def _score_position(
|
||||||
|
center: tuple,
|
||||||
|
original_bbox: dict,
|
||||||
|
screen_w: int,
|
||||||
|
screen_h: int,
|
||||||
|
) -> float:
|
||||||
|
"""Score de proximité par rapport à la position d'origine (0-1)."""
|
||||||
|
if not original_bbox:
|
||||||
|
return 0.5
|
||||||
|
|
||||||
|
orig_x = original_bbox.get("x", 0) + original_bbox.get("width", 0) / 2
|
||||||
|
orig_y = original_bbox.get("y", 0) + original_bbox.get("height", 0) / 2
|
||||||
|
|
||||||
|
dx = abs(center[0] - orig_x) / max(screen_w, 1)
|
||||||
|
dy = abs(center[1] - orig_y) / max(screen_h, 1)
|
||||||
|
distance_norm = (dx**2 + dy**2) ** 0.5
|
||||||
|
|
||||||
|
        # distance 0 → score 1.0; normalized distance 0.5 (half a screen) → score 0.0
|
||||||
|
return max(0.0, 1.0 - distance_norm * 2.0)
|
||||||
|
|
||||||
|
# ------------------------------------------------------------------
|
||||||
|
# Scoring voisins
|
||||||
|
# ------------------------------------------------------------------
|
||||||
|
|
||||||
|
@staticmethod
|
||||||
|
def _score_neighbors(
|
||||||
|
current_neighbors: List[str],
|
||||||
|
expected_neighbors: List[str],
|
||||||
|
) -> float:
|
||||||
|
"""Score Jaccard sur les ensembles de mots voisins (0-1)."""
|
||||||
|
if not expected_neighbors:
|
||||||
|
return 0.5
|
||||||
|
|
||||||
|
current_set = {n.lower().strip() for n in current_neighbors if n}
|
||||||
|
expected_set = {n.lower().strip() for n in expected_neighbors if n}
|
||||||
|
|
||||||
|
if not current_set and not expected_set:
|
||||||
|
return 0.5
|
||||||
|
|
||||||
|
intersection = current_set & expected_set
|
||||||
|
union = current_set | expected_set
|
||||||
|
|
||||||
|
if not union:
|
||||||
|
return 0.5
|
||||||
|
|
||||||
|
return len(intersection) / len(union)
|
||||||
|
|
||||||
|
# ------------------------------------------------------------------
|
||||||
|
# Utilitaires
|
||||||
|
# ------------------------------------------------------------------
|
||||||
|
|
||||||
|
@staticmethod
|
||||||
|
def _normalize(text: str) -> str:
|
||||||
|
"""Normalise un texte pour la comparaison."""
|
||||||
|
text = text.lower().strip()
|
||||||
|
text = re.sub(r'[_\-\./\\]', ' ', text)
|
||||||
|
text = re.sub(r'\s+', ' ', text)
|
||||||
|
return text
|
||||||
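Review note: the text tiers and the default weights of `SmartMatcher` can be checked in isolation. The sketch below is a standalone re-implementation of `_normalize` and `_score_text` as they appear in this diff, not the module itself. The names `normalize`, `score_text` and `combine` are illustrative, and the early return for OCR text that normalizes to an empty string is an added guard that is not part of the diff.

```python
import re
from difflib import SequenceMatcher


def normalize(text: str) -> str:
    """Lowercase, turn common separators into spaces, collapse whitespace."""
    text = text.lower().strip()
    text = re.sub(r"[_\-\./\\]", " ", text)
    return re.sub(r"\s+", " ", text)


def score_text(search: str, ocr_text: str) -> float:
    """Three tiers: exact (1.0), inclusion (0.70-0.95), fuzzy (0.74-0.90 or below 0.18)."""
    if not ocr_text:
        return 0.0
    ocr = normalize(ocr_text).strip()
    if not ocr:  # added guard (assumption): separator-only OCR text never matches
        return 0.0
    if search == ocr:
        return 1.0
    if search in ocr or ocr in search:
        overlap = min(len(search), len(ocr))
        total = max(len(search), len(ocr))
        return 0.70 + 0.25 * (overlap / total)
    ratio = SequenceMatcher(None, search, ocr).ratio()
    if ratio >= 0.60:
        return 0.50 + 0.40 * ratio
    return ratio * 0.3


def combine(text: float, type_: float = 0.5,
            position: float = 0.5, neighbors: float = 0.5) -> float:
    """Weighted sum using the default weights of SmartMatcher.__init__."""
    return 0.50 * text + 0.10 * type_ + 0.15 * position + 0.25 * neighbors


exact = score_text(normalize("Valider"), "Valider")
partial = score_text(normalize("Valider"), "Valider la commande")
typo = score_text(normalize("Valider"), "Valjder")
neutral_combined = combine(exact)  # all non-text signals left at their neutral 0.5
```

With these defaults, an exact text match with every other signal at its neutral 0.5 combines to 0.75, so the `score >= 0.90` check in the module's usage example can only pass when the type, position or neighbor signals also score above neutral.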
48
core/grounding/target.py
Normal file
@@ -0,0 +1,48 @@
|
|||||||
|
"""
core/grounding/target.py: shared types for visual grounding

Dataclasses describing a target to locate (GroundingTarget) and
the result of a localization (GroundingResult).

These types are the common building block for all grounding modules:
template matching, OCR, VLM, CLIP, etc.
"""
|
||||||
|
|
||||||
|
from __future__ import annotations
|
||||||
|
|
||||||
|
from dataclasses import dataclass, field
|
||||||
|
from typing import Dict, Optional
|
||||||
|
|
||||||
|
|
||||||
|
@dataclass
|
||||||
|
class GroundingTarget:
|
||||||
|
    """Description of a UI element to locate on screen.

    Attributes:
        text          : visible text of the element (button, label, etc.)
        description   : free-form semantic description (e.g. "the Valider button at the bottom right")
        template_b64  : visual capture of the element, base64-encoded PNG/JPEG
        original_bbox : original position at capture time {x, y, width, height}
    """
|
||||||
|
text: str = ""
|
||||||
|
description: str = ""
|
||||||
|
template_b64: str = ""
|
||||||
|
original_bbox: Optional[Dict[str, int]] = field(default=None)
|
||||||
|
|
||||||
|
|
||||||
|
@dataclass
|
||||||
|
class GroundingResult:
|
||||||
|
    """Result of locating a UI element.

    Attributes:
        x          : X coordinate of the center of the found element (screen pixels)
        y          : Y coordinate of the center of the found element (screen pixels)
        method     : method that produced the result ('template', 'ocr', 'vlm', 'clip', etc.)
        confidence : confidence score [0.0, 1.0]
        time_ms    : search time in milliseconds
    """
|
||||||
|
x: int
|
||||||
|
y: int
|
||||||
|
method: str
|
||||||
|
confidence: float
|
||||||
|
time_ms: float
|
||||||
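Review note: `original_bbox` is the field that `SmartMatcher._score_position` (earlier in this diff) turns into a proximity score. The sketch below restates that computation as free functions so the falloff can be checked by hand. The names `bbox_center` and `position_score` and the 1920x1080 screen are assumptions made for illustration.

```python
from typing import Dict, Tuple


def bbox_center(bbox: Dict[str, int]) -> Tuple[float, float]:
    """Center of an {x, y, width, height} box; missing keys count as 0."""
    return (
        bbox.get("x", 0) + bbox.get("width", 0) / 2,
        bbox.get("y", 0) + bbox.get("height", 0) / 2,
    )


def position_score(center: Tuple[float, float], bbox: Dict[str, int],
                   screen_w: int, screen_h: int) -> float:
    """Linear falloff: 1.0 at the original center, 0.0 from a normalized distance of 0.5."""
    ox, oy = bbox_center(bbox)
    dx = abs(center[0] - ox) / max(screen_w, 1)
    dy = abs(center[1] - oy) / max(screen_h, 1)
    return max(0.0, 1.0 - 2.0 * (dx ** 2 + dy ** 2) ** 0.5)


bbox = {"x": 900, "y": 500, "width": 120, "height": 40}       # center (960, 520)
same_spot = position_score((960, 520), bbox, 1920, 1080)
nearby = position_score((960 + 192, 520), bbox, 1920, 1080)   # 10% of the width away
far = position_score((0, 520), bbox, 1920, 1080)              # half the width away
```

An element that moved by 10% of the screen width keeps a score of about 0.8, while anything half a screen away or farther is clamped to 0.0.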
350
core/grounding/template_matcher.py
Normal file
@@ -0,0 +1,350 @@
|
|||||||
|
"""
core/grounding/template_matcher.py: centralized template matching

Provides a TemplateMatcher class that locates a visual anchor (template image)
in a screenshot via cv2.matchTemplate. Supports single-scale and multi-scale.

Replaces the duplicated implementations in:
  - core/execution/observe_reason_act.py (~1348-1375)
  - visual_workflow_builder/backend/api_v3/execute.py (~930-963)
  - visual_workflow_builder/backend/catalog_routes_v2_vlm.py (~339-381)
  - visual_workflow_builder/backend/services/intelligent_executor.py (~131-210)
  - core/detection/omniparser_adapter.py (~330)

Usage:
    from core.grounding import TemplateMatcher, MatchResult

    matcher = TemplateMatcher(threshold=0.75)
    result = matcher.match_screen(anchor_b64="...")
    if result:
        print(f"Found at ({result.x}, {result.y}) score={result.score:.3f}")
"""
|
||||||
|
|
||||||
|
from __future__ import annotations
|
||||||
|
|
||||||
|
import base64
|
||||||
|
import io
|
||||||
|
import logging
|
||||||
|
import time
|
||||||
|
from dataclasses import dataclass
|
||||||
|
from typing import List, Optional, Tuple
|
||||||
|
|
||||||
|
logger = logging.getLogger(__name__)
|
||||||
|
|
||||||
|
# Imports optionnels — le module se charge même sans cv2/PIL/mss
|
||||||
|
try:
|
||||||
|
import cv2
|
||||||
|
_CV2 = True
|
||||||
|
except ImportError:
|
||||||
|
_CV2 = False
|
||||||
|
|
||||||
|
try:
|
||||||
|
import numpy as np
|
||||||
|
_NP = True
|
||||||
|
except ImportError:
|
||||||
|
_NP = False
|
||||||
|
|
||||||
|
try:
|
||||||
|
from PIL import Image
|
||||||
|
_PIL = True
|
||||||
|
except ImportError:
|
||||||
|
_PIL = False
|
||||||
|
|
||||||
|
try:
|
||||||
|
import mss as mss_lib
|
||||||
|
_MSS = True
|
||||||
|
except ImportError:
|
||||||
|
_MSS = False
|
||||||
|
|
||||||
|
|
||||||
|
# ---------------------------------------------------------------------------
|
||||||
|
# Résultat d'un match
|
||||||
|
# ---------------------------------------------------------------------------
|
||||||
|
|
||||||
|
@dataclass
|
||||||
|
class MatchResult:
|
||||||
|
"""Résultat d'un template matching."""
|
||||||
|
x: int
|
||||||
|
y: int
|
||||||
|
score: float
|
||||||
|
method: str # 'template' | 'template_multiscale'
|
||||||
|
time_ms: float
|
||||||
|
scale: float = 1.0 # Échelle à laquelle le meilleur match a été trouvé
|
||||||
|
|
||||||
|
|
||||||
|
# ---------------------------------------------------------------------------
|
||||||
|
# TemplateMatcher
|
||||||
|
# ---------------------------------------------------------------------------
|
||||||
|
|
||||||
|
class TemplateMatcher:
|
||||||
|
    """Locate a visual anchor in a screenshot via template matching.

    Parameters:
        threshold  : minimum score to accept a match (default 0.75)
        multiscale : enable multi-scale matching (default False)
        scales     : list of scales to try in multi-scale mode
        grayscale  : convert to grayscale before matching (default False)

    The cv2 method is fixed to cv2.TM_CCOEFF_NORMED; it is not a constructor parameter.
    """
|
||||||
|
|
||||||
|
    # Default scales for multi-scale mode, ordered by decreasing likelihood (1.0 first).
    # Note: _match_multiscale tries every scale and keeps the best score; it does not
    # stop at the first scale that reaches the threshold.
|
||||||
|
DEFAULT_SCALES: List[float] = [1.0, 0.95, 1.05, 0.9, 1.1, 0.85, 1.15, 0.8, 1.2]
|
||||||
|
|
||||||
|
def __init__(
|
||||||
|
self,
|
||||||
|
threshold: float = 0.75,
|
||||||
|
multiscale: bool = False,
|
||||||
|
scales: Optional[List[float]] = None,
|
||||||
|
grayscale: bool = False,
|
||||||
|
):
|
||||||
|
self.threshold = threshold
|
||||||
|
self.multiscale = multiscale
|
||||||
|
self.scales = scales or self.DEFAULT_SCALES
|
||||||
|
self.grayscale = grayscale
|
||||||
|
# cv2.TM_CCOEFF_NORMED est la méthode utilisée partout dans le projet
|
||||||
|
self._cv2_method = cv2.TM_CCOEFF_NORMED if _CV2 else None
|
||||||
|
|
||||||
|
# ------------------------------------------------------------------
|
||||||
|
# API publique
|
||||||
|
# ------------------------------------------------------------------
|
||||||
|
|
||||||
|
def match_screen(
|
||||||
|
self,
|
||||||
|
anchor_b64: Optional[str] = None,
|
||||||
|
anchor_pil: Optional["Image.Image"] = None,
|
||||||
|
screen_pil: Optional["Image.Image"] = None,
|
||||||
|
) -> Optional[MatchResult]:
|
||||||
|
"""Cherche l'ancre dans le screenshot courant (ou fourni).
|
||||||
|
|
||||||
|
L'ancre peut être passée en base64 ou en PIL Image.
|
||||||
|
Le screenshot est capturé via mss si non fourni.
|
||||||
|
|
||||||
|
Retourne un MatchResult ou None si aucun match >= seuil.
|
||||||
|
"""
|
||||||
|
if not (_CV2 and _NP and _PIL):
|
||||||
|
logger.debug("[TemplateMatcher] cv2/numpy/PIL non disponible")
|
||||||
|
return None
|
||||||
|
|
||||||
|
# --- Préparer l'ancre ---
|
||||||
|
anchor_img = self._decode_anchor(anchor_b64, anchor_pil)
|
||||||
|
if anchor_img is None:
|
||||||
|
return None
|
||||||
|
|
||||||
|
# --- Préparer le screenshot ---
|
||||||
|
if screen_pil is None:
|
||||||
|
screen_pil = self._capture_screen()
|
||||||
|
if screen_pil is None:
|
||||||
|
return None
|
||||||
|
|
||||||
|
# --- Convertir en arrays cv2 ---
|
||||||
|
screen_cv = cv2.cvtColor(np.array(screen_pil), cv2.COLOR_RGB2BGR)
|
||||||
|
anchor_cv = cv2.cvtColor(np.array(anchor_img), cv2.COLOR_RGB2BGR)
|
||||||
|
|
||||||
|
# --- Matching ---
|
||||||
|
if self.multiscale:
|
||||||
|
return self._match_multiscale(screen_cv, anchor_cv)
|
||||||
|
else:
|
||||||
|
return self._match_single(screen_cv, anchor_cv)
|
||||||
|
|
||||||
|
def match_in_region(
|
||||||
|
self,
|
||||||
|
region_cv: "np.ndarray",
|
||||||
|
anchor_cv: "np.ndarray",
|
||||||
|
threshold: Optional[float] = None,
|
||||||
|
) -> Optional[MatchResult]:
|
||||||
|
"""Match dans une région déjà découpée (arrays BGR).
|
||||||
|
|
||||||
|
Utilisé par les pipelines qui font leur propre capture/découpe.
|
||||||
|
"""
|
||||||
|
if not (_CV2 and _NP):
|
||||||
|
return None
|
||||||
|
|
||||||
|
thr = threshold if threshold is not None else self.threshold
|
||||||
|
|
||||||
|
if self.multiscale:
|
||||||
|
return self._match_multiscale(region_cv, anchor_cv, threshold_override=thr)
|
||||||
|
else:
|
||||||
|
return self._match_single(region_cv, anchor_cv, threshold_override=thr)
|
||||||
|
|
||||||
|
def match_screen_diagnostic(
|
||||||
|
self,
|
||||||
|
anchor_b64: Optional[str] = None,
|
||||||
|
anchor_pil: Optional["Image.Image"] = None,
|
||||||
|
screen_pil: Optional["Image.Image"] = None,
|
||||||
|
) -> str:
|
||||||
|
"""Retourne un diagnostic textuel (score + position) même sans match."""
|
||||||
|
if not (_CV2 and _NP and _PIL):
|
||||||
|
return "cv2/numpy/PIL non dispo"
|
||||||
|
|
||||||
|
anchor_img = self._decode_anchor(anchor_b64, anchor_pil)
|
||||||
|
if anchor_img is None:
|
||||||
|
return "ancre non décodable"
|
||||||
|
|
||||||
|
if screen_pil is None:
|
||||||
|
screen_pil = self._capture_screen()
|
||||||
|
if screen_pil is None:
|
||||||
|
return "capture écran échouée"
|
||||||
|
|
||||||
|
screen_cv = cv2.cvtColor(np.array(screen_pil), cv2.COLOR_RGB2BGR)
|
||||||
|
anchor_cv = cv2.cvtColor(np.array(anchor_img), cv2.COLOR_RGB2BGR)
|
||||||
|
|
||||||
|
if anchor_cv.shape[0] >= screen_cv.shape[0] or anchor_cv.shape[1] >= screen_cv.shape[1]:
|
||||||
|
return f"ancre {anchor_cv.shape[:2]} >= écran {screen_cv.shape[:2]}"
|
||||||
|
|
||||||
|
s_img, a_img = self._maybe_grayscale(screen_cv, anchor_cv)
|
||||||
|
result_tm = cv2.matchTemplate(s_img, a_img, self._cv2_method)
|
||||||
|
_, max_val, _, max_loc = cv2.minMaxLoc(result_tm)
|
||||||
|
return f"{max_val:.3f} pos={max_loc}"
|
||||||
|
|
||||||
|
# ------------------------------------------------------------------
|
||||||
|
# Méthodes internes
|
||||||
|
# ------------------------------------------------------------------
|
||||||
|
|
||||||
|
def _match_single(
|
||||||
|
self,
|
||||||
|
screen_cv: "np.ndarray",
|
||||||
|
anchor_cv: "np.ndarray",
|
||||||
|
threshold_override: Optional[float] = None,
|
||||||
|
) -> Optional[MatchResult]:
|
||||||
|
"""Template matching single-scale."""
|
||||||
|
threshold = threshold_override if threshold_override is not None else self.threshold
|
||||||
|
|
||||||
|
if anchor_cv.shape[0] >= screen_cv.shape[0] or anchor_cv.shape[1] >= screen_cv.shape[1]:
|
||||||
|
logger.debug("[TemplateMatcher] Ancre plus grande que le screen")
|
||||||
|
return None
|
||||||
|
|
||||||
|
s_img, a_img = self._maybe_grayscale(screen_cv, anchor_cv)
|
||||||
|
|
||||||
|
t0 = time.time()
|
||||||
|
result_tm = cv2.matchTemplate(s_img, a_img, self._cv2_method)
|
||||||
|
_, max_val, _, max_loc = cv2.minMaxLoc(result_tm)
|
||||||
|
elapsed_ms = (time.time() - t0) * 1000
|
||||||
|
|
||||||
|
logger.debug(
|
||||||
|
"[TemplateMatcher] score=%.3f pos=%s (%.0fms)",
|
||||||
|
max_val, max_loc, elapsed_ms,
|
||||||
|
)
|
||||||
|
|
||||||
|
if max_val >= threshold:
|
||||||
|
cx = max_loc[0] + anchor_cv.shape[1] // 2
|
||||||
|
cy = max_loc[1] + anchor_cv.shape[0] // 2
|
||||||
|
return MatchResult(
|
||||||
|
x=cx,
|
||||||
|
y=cy,
|
||||||
|
score=float(max_val),
|
||||||
|
method='template',
|
||||||
|
time_ms=elapsed_ms,
|
||||||
|
scale=1.0,
|
||||||
|
)
|
||||||
|
return None
|
||||||
|
|
||||||
|
def _match_multiscale(
|
||||||
|
self,
|
||||||
|
screen_cv: "np.ndarray",
|
||||||
|
anchor_cv: "np.ndarray",
|
||||||
|
threshold_override: Optional[float] = None,
|
||||||
|
) -> Optional[MatchResult]:
|
||||||
|
"""Template matching multi-scale."""
|
||||||
|
threshold = threshold_override if threshold_override is not None else self.threshold
|
||||||
|
|
||||||
|
best_score = -1.0
|
||||||
|
best_loc = None
|
||||||
|
best_scale = 1.0
|
||||||
|
best_anchor_shape = anchor_cv.shape
|
||||||
|
|
||||||
|
t0 = time.time()
|
||||||
|
|
||||||
|
for scale in self.scales:
|
||||||
|
if scale == 1.0:
|
||||||
|
scaled = anchor_cv
|
||||||
|
else:
|
||||||
|
new_w = int(anchor_cv.shape[1] * scale)
|
||||||
|
new_h = int(anchor_cv.shape[0] * scale)
|
||||||
|
if new_w < 8 or new_h < 8:
|
||||||
|
continue
|
||||||
|
if new_h >= screen_cv.shape[0] or new_w >= screen_cv.shape[1]:
|
||||||
|
continue
|
||||||
|
scaled = cv2.resize(anchor_cv, (new_w, new_h), interpolation=cv2.INTER_AREA)
|
||||||
|
|
||||||
|
if scaled.shape[0] >= screen_cv.shape[0] or scaled.shape[1] >= screen_cv.shape[1]:
|
||||||
|
continue
|
||||||
|
|
||||||
|
s_img, a_img = self._maybe_grayscale(screen_cv, scaled)
|
||||||
|
result_tm = cv2.matchTemplate(s_img, a_img, self._cv2_method)
|
||||||
|
_, max_val, _, max_loc = cv2.minMaxLoc(result_tm)
|
||||||
|
|
||||||
|
if max_val > best_score:
|
||||||
|
best_score = max_val
|
||||||
|
best_loc = max_loc
|
||||||
|
best_scale = scale
|
||||||
|
best_anchor_shape = scaled.shape
|
||||||
|
|
||||||
|
elapsed_ms = (time.time() - t0) * 1000
|
||||||
|
|
||||||
|
logger.debug(
|
||||||
|
"[TemplateMatcher/multiscale] best_score=%.3f scale=%.2f (%.0fms)",
|
||||||
|
best_score, best_scale, elapsed_ms,
|
||||||
|
)
|
||||||
|
|
||||||
|
if best_score >= threshold and best_loc is not None:
|
||||||
|
cx = best_loc[0] + best_anchor_shape[1] // 2
|
||||||
|
cy = best_loc[1] + best_anchor_shape[0] // 2
|
||||||
|
return MatchResult(
|
||||||
|
x=cx,
|
||||||
|
y=cy,
|
||||||
|
score=float(best_score),
|
||||||
|
method='template_multiscale',
|
||||||
|
time_ms=elapsed_ms,
|
||||||
|
scale=best_scale,
|
||||||
|
)
|
||||||
|
return None
|
||||||
|
|
||||||
|
def _maybe_grayscale(
|
||||||
|
self,
|
||||||
|
screen: "np.ndarray",
|
||||||
|
anchor: "np.ndarray",
|
||||||
|
) -> Tuple["np.ndarray", "np.ndarray"]:
|
||||||
|
"""Convertit en niveaux de gris si self.grayscale est True."""
|
||||||
|
if not self.grayscale:
|
||||||
|
return screen, anchor
|
||||||
|
s = cv2.cvtColor(screen, cv2.COLOR_BGR2GRAY) if len(screen.shape) == 3 else screen
|
||||||
|
a = cv2.cvtColor(anchor, cv2.COLOR_BGR2GRAY) if len(anchor.shape) == 3 else anchor
|
||||||
|
return s, a
|
||||||
|
|
||||||
|
@staticmethod
|
||||||
|
def _decode_anchor(
|
||||||
|
anchor_b64: Optional[str],
|
||||||
|
anchor_pil: Optional["Image.Image"],
|
||||||
|
) -> Optional["Image.Image"]:
|
||||||
|
"""Décode l'ancre depuis base64 ou retourne le PIL directement."""
|
||||||
|
if anchor_pil is not None:
|
||||||
|
return anchor_pil
|
||||||
|
|
||||||
|
if anchor_b64 is None:
|
||||||
|
logger.debug("[TemplateMatcher] Ni anchor_b64 ni anchor_pil fourni")
|
||||||
|
return None
|
||||||
|
|
||||||
|
try:
|
||||||
|
raw = anchor_b64.split(',')[1] if ',' in anchor_b64 else anchor_b64
|
||||||
|
data = base64.b64decode(raw)
|
||||||
|
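            # Force 3-channel RGB: palette or grayscale images would otherwise make
            # the later cv2.cvtColor(..., cv2.COLOR_RGB2BGR) call fail
            return Image.open(io.BytesIO(data)).convert("RGB")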
|
||||||
|
except Exception as e:
|
||||||
|
logger.debug("[TemplateMatcher] Erreur décodage ancre: %s", e)
|
||||||
|
return None
|
||||||
|
|
||||||
|
@staticmethod
|
||||||
|
def _capture_screen() -> Optional["Image.Image"]:
|
||||||
|
        """Capture the full screen via mss (monitor 0 = all screens combined)."""
|
||||||
|
if not _MSS:
|
||||||
|
logger.debug("[TemplateMatcher] mss non disponible")
|
||||||
|
return None
|
||||||
|
|
||||||
|
try:
|
||||||
|
with mss_lib.mss() as sct:
|
||||||
|
mon = sct.monitors[0]
|
||||||
|
grab = sct.grab(mon)
|
||||||
|
return Image.frombytes('RGB', grab.size, grab.bgra, 'raw', 'BGRX')
|
||||||
|
except Exception as e:
|
||||||
|
logger.debug("[TemplateMatcher] Erreur capture écran: %s", e)
|
||||||
|
return None
|
||||||
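Review note: the bookkeeping around `cv2.matchTemplate` in `_match_multiscale` (which scales are skipped, and how the click point is derived from the top-left corner) can be exercised without OpenCV. The sketch below is a simplified, pure-Python restatement; `scaled_size`, `match_center` and `MIN_SIDE` are illustrative names, and unlike the diff it applies the minimum-size check to scale 1.0 as well.

```python
from typing import List, Optional, Tuple

DEFAULT_SCALES: List[float] = [1.0, 0.95, 1.05, 0.9, 1.1, 0.85, 1.15, 0.8, 1.2]
MIN_SIDE = 8  # scaled anchors narrower or shorter than this are skipped


def scaled_size(anchor_wh: Tuple[int, int], screen_wh: Tuple[int, int],
                scale: float) -> Optional[Tuple[int, int]]:
    """Scaled (width, height) of the anchor, or None when the scale must be skipped."""
    new_w = int(anchor_wh[0] * scale)
    new_h = int(anchor_wh[1] * scale)
    if new_w < MIN_SIDE or new_h < MIN_SIDE:
        return None  # too small to carry a usable pattern
    if new_w >= screen_wh[0] or new_h >= screen_wh[1]:
        return None  # template must be smaller than the searched image
    return new_w, new_h


def match_center(top_left: Tuple[int, int], size_wh: Tuple[int, int]) -> Tuple[int, int]:
    """Center of a match from its top-left corner and the size of the scaled anchor."""
    return top_left[0] + size_wh[0] // 2, top_left[1] + size_wh[1] // 2


# A 100x40 anchor on a 1920x1080 screen: every default scale is usable.
usable = [s for s in DEFAULT_SCALES if scaled_size((100, 40), (1920, 1080), s)]
half = scaled_size((100, 40), (1920, 1080), 0.5)
center = match_center((300, 200), half)
```

The center uses the size of the scaled anchor, not the original one, which is why the diff tracks `best_anchor_shape` alongside `best_loc`.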
103
core/grounding/think_arbiter.py
Normal file
@@ -0,0 +1,103 @@
|
|||||||
|
"""
core/grounding/think_arbiter.py: THINK layer, VLM arbiter (InfiGUI via subprocess)

Called ONLY when the SmartMatcher is not confident enough.
Uses the InfiGUI subprocess worker (no HTTP server).

Usage:
    from core.grounding.think_arbiter import ThinkArbiter

    arbiter = ThinkArbiter()
    result = arbiter.arbitrate(target, candidates, screenshot)
"""
|
||||||
|
|
||||||
|
from __future__ import annotations
|
||||||
|
|
||||||
|
import time
|
||||||
|
from typing import Any, Dict, List, Optional
|
||||||
|
|
||||||
|
from core.grounding.fast_types import LocateResult, MatchCandidate
|
||||||
|
from core.grounding.target import GroundingTarget
|
||||||
|
|
||||||
|
|
||||||
|
class ThinkArbiter:
|
||||||
|
"""Arbitre VLM — appelle InfiGUI via subprocess worker."""
|
||||||
|
|
||||||
|
def __init__(self):
|
||||||
|
self._grounder = None
|
||||||
|
|
||||||
|
def _get_grounder(self):
|
||||||
|
if self._grounder is None:
|
||||||
|
from core.grounding.ui_tars_grounder import UITarsGrounder
|
||||||
|
self._grounder = UITarsGrounder.get_instance()
|
||||||
|
return self._grounder
|
||||||
|
|
||||||
|
@property
|
||||||
|
def available(self) -> bool:
|
||||||
|
        """Always available: the worker is started on demand."""
|
||||||
|
return True
|
||||||
|
|
||||||
|
def arbitrate(
|
||||||
|
self,
|
||||||
|
target: GroundingTarget,
|
||||||
|
candidates: List[MatchCandidate],
|
||||||
|
screenshot_pil: Optional[Any] = None,
|
||||||
|
) -> Optional[LocateResult]:
|
||||||
|
        """Ask the VLM to decide.

        If target.template_b64 is provided, switch to fused mode:
        the crop is passed to InfiGUI as a reference image, which avoids
        a VRAM-expensive Ollama qwen2.5vl description.
        """
|
||||||
|
t0 = time.time()
|
||||||
|
|
||||||
|
# Décodage du crop d'ancre si disponible (mode fusionné)
|
||||||
|
anchor_pil = None
|
||||||
|
if target.template_b64:
|
||||||
|
try:
|
||||||
|
import base64
|
||||||
|
import io
|
||||||
|
from PIL import Image
|
||||||
|
|
||||||
|
raw_b64 = target.template_b64
|
||||||
|
if ',' in raw_b64:
|
||||||
|
raw_b64 = raw_b64.split(',', 1)[1]
|
||||||
|
anchor_pil = Image.open(io.BytesIO(base64.b64decode(raw_b64))).convert("RGB")
|
||||||
|
except Exception as ex:
|
||||||
|
print(f"⚠️ [THINK] Décodage anchor échoué: {ex}")
|
||||||
|
anchor_pil = None
|
||||||
|
|
||||||
|
try:
|
||||||
|
grounder = self._get_grounder()
|
||||||
|
result = grounder.ground(
|
||||||
|
target_text=target.text or "",
|
||||||
|
target_description=target.description or "",
|
||||||
|
screen_pil=screenshot_pil,
|
||||||
|
anchor_pil=anchor_pil,
|
||||||
|
)
|
||||||
|
|
||||||
|
dt = (time.time() - t0) * 1000
|
||||||
|
|
||||||
|
if result is None:
|
||||||
|
label = target.text or "<crop>"
|
||||||
|
print(f"🤔 [THINK] VLM n'a pas trouvé '{label}' ({dt:.0f}ms)")
|
||||||
|
return None
|
||||||
|
|
||||||
|
method = "think_vlm_fused" if anchor_pil is not None else "think_vlm"
|
||||||
|
locate = LocateResult(
|
||||||
|
x=result.x,
|
||||||
|
y=result.y,
|
||||||
|
confidence=result.confidence,
|
||||||
|
method=method,
|
||||||
|
time_ms=dt,
|
||||||
|
tier="think",
|
||||||
|
candidates_count=len(candidates),
|
||||||
|
)
|
||||||
|
|
||||||
|
print(f"🤔 [THINK/{method}] ({result.x}, {result.y}) conf={result.confidence:.2f} ({dt:.0f}ms)")
|
||||||
|
return locate
|
||||||
|
|
||||||
|
except Exception as ex:
|
||||||
|
dt = (time.time() - t0) * 1000
|
||||||
|
print(f"⚠️ [THINK] Erreur: {ex} ({dt:.0f}ms)")
|
||||||
|
return None
|
||||||
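Review note: the module docstring says the arbiter is called only when the SmartMatcher is not confident enough. The calling code is not part of this section, so the sketch below is only one plausible shape for that hand-off. `Candidate`, `Located` and `locate` are hypothetical stand-ins, the 0.90 threshold is borrowed from the usage example in `smart_matcher.py`, and the `"smart"` tier label is an assumption.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class Candidate:
    """Hypothetical stand-in for MatchCandidate (only the fields used here)."""
    x: int
    y: int
    score: float


@dataclass
class Located:
    """Hypothetical stand-in for LocateResult."""
    x: int
    y: int
    confidence: float
    tier: str


def locate(candidates: List[Candidate],
           arbitrate: Callable[[List[Candidate]], Optional[Located]],
           direct_threshold: float = 0.90) -> Optional[Located]:
    """Accept the best SMART candidate when confident, otherwise defer to THINK."""
    best = max(candidates, key=lambda c: c.score, default=None)
    if best is not None and best.score >= direct_threshold:
        return Located(best.x, best.y, best.score, tier="smart")
    # Not confident enough: the slower VLM arbiter decides and may return None.
    return arbitrate(candidates)


calls: List[int] = []


def fake_arbiter(cands: List[Candidate]) -> Optional[Located]:
    """Test double that records how many candidates it was handed."""
    calls.append(len(cands))
    return Located(5, 6, 0.8, tier="think")


confident = locate([Candidate(1, 2, 0.95)], fake_arbiter)
unsure = locate([Candidate(1, 2, 0.75)], fake_arbiter)
```

Passing the arbiter as a callable keeps the expensive VLM path out of the fast path and makes the fallback easy to test with a stub.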
174
core/grounding/title_verifier.py
Normal file
@@ -0,0 +1,174 @@
|
|||||||
|
"""
core/grounding/title_verifier.py: post-action verification via window title

After each action (click, double-click), checks that the active window
changed as expected by reading the title via OCR on a 45px crop
at the top of the screen.

Lightweight (~120ms), non-blocking (failure = warning + retry, not stop).

Usage:
    from core.grounding.title_verifier import TitleVerifier

    verifier = TitleVerifier()
    title = verifier.read_title(screenshot_pil)
    changed = verifier.has_title_changed(title_before, title_after)
"""
|
||||||
|
|
||||||
|
from __future__ import annotations
|
||||||
|
|
||||||
|
import time
|
||||||
|
from difflib import SequenceMatcher
|
||||||
|
from typing import Optional
|
||||||
|
|
||||||
|
|
||||||
|
class TitleVerifier:
|
||||||
|
"""Vérifie le titre de la fenêtre active via OCR sur crop."""
|
||||||
|
|
||||||
|
# Hauteur du crop pour la barre de titre Windows
|
||||||
|
TITLE_BAR_HEIGHT = 45
|
||||||
|
|
||||||
|
def __init__(self):
|
||||||
|
self._ocr_fn = None # Lazy load
|
||||||
|
|
||||||
|
def read_title(self, screenshot_pil) -> str:
|
||||||
|
        """Read the active window title via OCR on the top crop.

        Args:
            screenshot_pil: PIL image of the full screenshot.

        Returns:
            Title text (may be empty if OCR fails).
        """
|
||||||
|
t0 = time.time()
|
||||||
|
|
||||||
|
try:
|
||||||
|
w, h = screenshot_pil.size
|
||||||
|
# Crop la barre de titre (45px du haut)
|
||||||
|
title_crop = screenshot_pil.crop((0, 0, w, min(self.TITLE_BAR_HEIGHT, h)))
|
||||||
|
|
||||||
|
# OCR sur le petit crop
|
||||||
|
ocr_fn = self._get_ocr()
|
||||||
|
if ocr_fn is None:
|
||||||
|
return ""
|
||||||
|
|
||||||
|
text = ocr_fn(title_crop)
|
||||||
|
dt = (time.time() - t0) * 1000
|
||||||
|
|
||||||
|
# Nettoyer le texte
|
||||||
|
title = text.strip() if text else ""
|
||||||
|
if title:
|
||||||
|
print(f"📋 [TitleVerify] Titre lu: '{title[:60]}' ({dt:.0f}ms)")
|
||||||
|
|
||||||
|
return title
|
||||||
|
|
||||||
|
except Exception as e:
|
||||||
|
print(f"⚠️ [TitleVerify] Erreur lecture titre: {e}")
|
||||||
|
return ""
|
||||||
|
|
||||||
|
def has_title_changed(self, title_before: str, title_after: str) -> bool:
|
||||||
|
        """Check whether the title changed significantly."""
|
||||||
|
if not title_before and not title_after:
|
||||||
|
return False
|
||||||
|
if not title_before or not title_after:
|
||||||
|
return True # Un des deux est vide = changement
|
||||||
|
|
||||||
|
# Comparaison fuzzy — les titres peuvent avoir des variations mineures
|
||||||
|
ratio = SequenceMatcher(None, title_before.lower(), title_after.lower()).ratio()
|
||||||
|
return ratio < 0.85 # Changement si < 85% similaire
|
||||||
|
|
||||||
|
def verify_action(
|
||||||
|
self,
|
||||||
|
screenshot_before,
|
||||||
|
screenshot_after,
|
||||||
|
action_type: str,
|
||||||
|
) -> dict:
|
||||||
|
        """Check that an action produced the expected effect on the title.

        Args:
            screenshot_before: PIL screenshot taken before the action.
            screenshot_after: PIL screenshot taken after the action.
            action_type: Action type. The values handled explicitly below are
                "type_text", "keyboard_shortcut", "wait_for_anchor", "hover"
                (title check skipped) and "double_click_anchor" (title must change).

        Returns:
            Dict with success, title_before, title_after, changed, reason.
        """
|
||||||
|
# Les actions qui ne changent pas le titre
|
||||||
|
if action_type in ('type_text', 'keyboard_shortcut', 'wait_for_anchor', 'hover'):
|
||||||
|
return {
|
||||||
|
'success': True,
|
||||||
|
'title_before': '',
|
||||||
|
'title_after': '',
|
||||||
|
'changed': False,
|
||||||
|
'reason': f"Action '{action_type}' — vérification titre non requise",
|
||||||
|
}
|
||||||
|
|
||||||
|
title_before = self.read_title(screenshot_before)
|
||||||
|
title_after = self.read_title(screenshot_after)
|
||||||
|
changed = self.has_title_changed(title_before, title_after)
|
||||||
|
|
||||||
|
# Pour un double-clic (ouverture fichier/dossier), le titre DOIT changer
|
||||||
|
# Mais seulement si les titres lus sont significatifs (> 3 chars)
|
||||||
|
# docTR sur un crop 45px dans une VM peut donner du bruit ('o', 'a', etc.)
|
||||||
|
if action_type in ('double_click_anchor',) and not changed:
|
||||||
|
if len(title_before) > 3 and len(title_after) > 3:
|
||||||
|
return {
|
||||||
|
'success': False,
|
||||||
|
'title_before': title_before,
|
||||||
|
'title_after': title_after,
|
||||||
|
'changed': False,
|
||||||
|
'reason': f"Double-clic sans changement de titre ('{title_after[:40]}')",
|
||||||
|
}
|
||||||
|
# Titres trop courts = bruit OCR, on ne peut pas conclure
|
||||||
|
return {
|
||||||
|
'success': True,
|
||||||
|
'title_before': title_before,
|
||||||
|
'title_after': title_after,
|
||||||
|
'changed': False,
|
||||||
|
'reason': f"Titre trop court pour vérifier ('{title_after}')",
|
||||||
|
}
|
||||||
|
|
||||||
|
# Pour un clic simple, le changement est optionnel
|
||||||
|
return {
|
||||||
|
'success': True,
|
||||||
|
'title_before': title_before,
|
||||||
|
'title_after': title_after,
|
||||||
|
'changed': changed,
|
||||||
|
'reason': 'Titre changé' if changed else 'Titre identique (acceptable)',
|
||||||
|
}

    _easyocr_reader = None  # Shared singleton

    def _get_ocr(self):
        """Lazily load the OCR function (EasyOCR first, docTR as fallback)."""
        if self._ocr_fn is not None:
            return self._ocr_fn

        # EasyOCR (fast, good quality on GUIs)
        try:
            import easyocr
            import numpy as np

            if TitleVerifier._easyocr_reader is None:
                TitleVerifier._easyocr_reader = easyocr.Reader(
                    ['fr', 'en'], gpu=True, verbose=False
                )

            def _easyocr_extract_text(img):
                results = TitleVerifier._easyocr_reader.readtext(np.array(img))
                return ' '.join(r[1] for r in results if r[1].strip())

            self._ocr_fn = _easyocr_extract_text
            return self._ocr_fn
        except ImportError:
            pass

        # docTR fallback
        try:
            import sys
            sys.path.insert(0, 'visual_workflow_builder/backend')
            from services.ocr_service import ocr_extract_text
            self._ocr_fn = ocr_extract_text
            return self._ocr_fn
        except ImportError:
            return None
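The title comparison used by `has_title_changed` can be tried in isolation. The following is a minimal sketch, assuming the same 0.85 threshold as the method above; the standalone function name `titles_differ` and the sample titles are illustrative and not part of this commit.

```python
from difflib import SequenceMatcher


def titles_differ(before: str, after: str, threshold: float = 0.85) -> bool:
    """Return True when two window titles differ significantly."""
    if not before and not after:
        return False  # nothing readable on either side
    if not before or not after:
        return True  # one title appeared or disappeared
    ratio = SequenceMatcher(None, before.lower(), after.lower()).ratio()
    return ratio < threshold


# A single misread character ("m" read as "rn") stays above the threshold.
same_window = titles_differ("Documents - Explorateur", "Docurnents - Explorateur")
# Opening another application changes most of the title.
new_window = titles_differ("Documents - Explorateur", "rapport.txt - Bloc-notes")
```

A ratio-based check tolerates small OCR misreads, which an exact string comparison would report as a change.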
161
core/grounding/ui_tars_grounder.py
Normal file
@@ -0,0 +1,161 @@
"""
core/grounding/ui_tars_grounder.py: grounding via a one-shot InfiGUI script

Each call launches a Python subprocess that loads the model, runs inference, and exits.
Slow (~15 s) but reliable: no CUDA crash from a persistent process.
"""

from __future__ import annotations

import json
import os
import subprocess
import sys
import threading
import time
from typing import Optional

from core.grounding.target import GroundingResult

_instance: Optional[UITarsGrounder] = None
_instance_lock = threading.Lock()


class UITarsGrounder:
    """Grounding via a one-shot InfiGUI script."""

    def __init__(self):
        self._lock = threading.Lock()
        self._project_root = os.path.abspath(
            os.path.join(os.path.dirname(__file__), "..", "..")
        )

    @classmethod
    def get_instance(cls) -> UITarsGrounder:
        global _instance
        if _instance is None:
            with _instance_lock:
                if _instance is None:
                    _instance = cls()
        return _instance

    @property
    def available(self) -> bool:
        return True  # Always available: the script is launched on demand

    def ground(
        self,
        target_text: str = "",
        target_description: str = "",
        screen_pil=None,
        anchor_pil=None,
    ) -> Optional[GroundingResult]:
        """Locate a UI element via a one-shot InfiGUI script.

        Args:
            target_text: textual name of the target (may be empty if anchor_pil is provided).
            target_description: free-form semantic description.
            screen_pil: full screenshot (PIL.Image).
            anchor_pil: visual crop of the previously captured anchor (PIL.Image).
                If provided, the worker switches to fused mode: Image1=crop, Image2=screen,
                "find on image 2 the visual element from image 1".
        """
        t0 = time.time()

        try:
            with self._lock:
                # Save the main image
                image_path = "/tmp/infigui_screen.png"
                if screen_pil is not None:
                    screen_pil.save(image_path)

                # Save the anchor image (fused mode)
                anchor_image_path = ""
                if anchor_pil is not None:
                    anchor_image_path = "/tmp/infigui_anchor.png"
                    anchor_pil.save(anchor_image_path)

                # Build the JSON request
                req = json.dumps({
                    "target": target_text,
                    "description": target_description,
                    "image_path": image_path,
                    "anchor_image_path": anchor_image_path,
                })

                mode_str = "fused" if anchor_pil is not None else "text"
                label_short = target_text[:30] if target_text else "<crop only>"
                print(f"🎯 [InfiGUI] Lancement one-shot [{mode_str}]: '{label_short}'")

                # Launch the one-shot script.
                # IMPORTANT: when run from a systemd service whose parent process has
                # already loaded CUDA, the subprocess inherits a broken GPU state
                # (No CUDA GPUs available).
                # Workaround: start_new_session=True (new session) plus forcing
                # CUDA_VISIBLE_DEVICES=0 explicitly to bypass what is inherited from the parent.
                _child_env = {**os.environ}
                _child_env["PYTHONDONTWRITEBYTECODE"] = "1"
                _child_env["CUDA_VISIBLE_DEVICES"] = "0"
                _child_env["NVIDIA_VISIBLE_DEVICES"] = "all"
                # Remove variables that could point at the parent's state
                _child_env.pop("PYTORCH_NVML_BASED_CUDA_CHECK", None)

                result = subprocess.run(
                    [sys.executable, "-m", "core.grounding.infigui_worker"],
                    input=req + "\n",
                    capture_output=True,
                    text=True,
                    timeout=60,
                    cwd=self._project_root,
                    env=_child_env,
                    start_new_session=True,  # new session, isolated from the parent
                    close_fds=True,
                )

                if result.returncode != 0:
                    stderr_lines = (result.stderr or '').strip().split('\n')
                    # Show the last meaningful lines of stderr
                    last_err = [l for l in stderr_lines[-5:] if l.strip()]
                    print(f"⚠️ [InfiGUI] Script échoué (code {result.returncode})")
                    for l in last_err:
                        print(f"   ❌ {l}")
                    return None

                # Parse the output: look for the JSON result line
                data = None
                for line in result.stdout.strip().split("\n"):
                    line = line.strip()
                    if not line:
                        continue
                    try:
                        parsed = json.loads(line)
                        # Only JSON objects can carry the result; skip bare numbers or lists
                        if isinstance(parsed, dict) and "x" in parsed:
                            data = parsed
                    except json.JSONDecodeError:
                        continue

                if data is None:
                    print(f"⚠️ [InfiGUI] Pas de réponse JSON dans la sortie")
                    return None

                dt = (time.time() - t0) * 1000

                if data.get("x") is not None:
                    method_name = "infigui_fused" if anchor_pil is not None else "infigui"
                    print(f"🎯 [InfiGUI/{method_name}] ({data['x']}, {data['y']}) "
                          f"conf={data.get('confidence', 0):.2f} ({dt:.0f}ms)")
                    return GroundingResult(
                        x=data["x"], y=data["y"],
                        method=method_name,
                        confidence=data.get("confidence", 0.90),
                        time_ms=dt,
                    )
                else:
                    print(f"⚠️ [InfiGUI] Pas trouvé ({dt:.0f}ms)")
                    return None

        except subprocess.TimeoutExpired:
            print(f"⚠️ [InfiGUI] Timeout 60s")
            return None
        except Exception as e:
            print(f"⚠️ [InfiGUI] Erreur: {e}")
            return None
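The way `ground` picks the result out of the worker's stdout can be sketched on its own. This is a minimal illustration, assuming the worker prints free-form log lines mixed with JSON lines; the helper name `parse_worker_output` and the sample output are hypothetical, since the actual output of `core.grounding.infigui_worker` is not shown in this diff.

```python
import json
from typing import Optional


def parse_worker_output(stdout: str) -> Optional[dict]:
    """Return the last JSON object on stdout that carries an "x" key."""
    data = None
    for line in stdout.splitlines():
        line = line.strip()
        if not line:
            continue
        try:
            parsed = json.loads(line)
        except json.JSONDecodeError:
            continue  # plain log line, not JSON
        if isinstance(parsed, dict) and "x" in parsed:
            data = parsed  # keep scanning: the last match wins
    return data


# Hypothetical worker output: two noise lines, then the result line.
sample = "\n".join([
    "loading model...",
    '{"status": "ready"}',
    '{"x": 412, "y": 87, "confidence": 0.93}',
])
result = parse_worker_output(sample)
```

Scanning line by line keeps the parent process tolerant of warnings or progress messages that a model-loading script may print before its result.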
0
core/knowledge/__init__.py
Normal file
523
core/knowledge/ui_patterns.py
Normal file
@@ -0,0 +1,523 @@
"""
Knowledge base of user-interface patterns.

Gives Léa "native reflexes": when she recognizes a known UI pattern
(OK/Cancel dialog, menu, toolbar), she immediately knows what to do
without having to learn it by observation.

Sources:
    - GUI-R1 dataset (3K annotated examples, ritzzai/GUI-R1)
    - Common Windows/Linux patterns
    - Universal UI conventions

Usage:
    from core.knowledge.ui_patterns import UIPatternLibrary
    lib = UIPatternLibrary()
    match = lib.find_pattern("Voulez-vous enregistrer ?")
    # → {'action': 'click', 'target': 'Enregistrer', 'typical_zone': 'dialog_center', ...}
"""

import json
import logging
from dataclasses import dataclass, field
from pathlib import Path
from typing import Any, Dict, List, Optional, Tuple

logger = logging.getLogger(__name__)


@dataclass
class UIPattern:
    """A known interface pattern."""
    name: str
    category: str
    triggers: List[str]
    action: str
    target: str
    typical_zone: str
    typical_bbox: Optional[List[float]] = None
    os: str = "any"
    confidence: float = 0.9
    metadata: Dict[str, Any] = field(default_factory=dict)


# Native Windows patterns: basic reflexes
BUILTIN_PATTERNS: List[Dict[str, Any]] = [
    # === CONFIRMATION DIALOGS ===
    {
        "name": "dialog_save",
        "category": "dialog",
        "triggers": [
            "voulez-vous enregistrer", "do you want to save",
            "save changes", "enregistrer les modifications",
            "enregistrer sous", "save as",
            "sauvegarder", "unsaved changes",
        ],
        "action": "click",
        "target": "Enregistrer",
        "alternatives": ["Save", "Oui", "Yes"],
        "typical_zone": "dialog_center",
        "typical_bbox": [0.35, 0.55, 0.50, 0.65],
        "os": "any",
    },
    {
        "name": "dialog_cancel",
        "category": "dialog",
        "triggers": [
            "annuler", "cancel", "abandonner", "discard",
        ],
        "action": "click",
        "target": "Annuler",
        "alternatives": ["Cancel", "Non", "No"],
        "typical_zone": "dialog_center",
        "typical_bbox": [0.50, 0.55, 0.65, 0.65],
        "os": "any",
    },
    {
        "name": "dialog_ok",
        "category": "dialog",
        "triggers": [
            "ok", "d'accord", "compris", "information",
            "erreur", "error", "warning", "avertissement",
        ],
        "action": "click",
        "target": "OK",
        "alternatives": ["Fermer", "Close", "Compris"],
        "typical_zone": "dialog_center",
        "typical_bbox": [0.45, 0.60, 0.55, 0.70],
        "os": "any",
    },
    {
        "name": "dialog_yes_no",
        "category": "dialog",
        "triggers": [
            "êtes-vous sûr", "are you sure", "confirmer",
            "confirm", "supprimer", "delete",
        ],
        "action": "click",
        "target": "Oui",
        "alternatives": ["Yes", "Confirmer", "Confirm"],
        "typical_zone": "dialog_center",
        "typical_bbox": [0.35, 0.60, 0.45, 0.68],
        "os": "any",
    },
    {
        "name": "dialog_overwrite",
        "category": "dialog",
        "triggers": [
            "voulez-vous remplacer", "voulez-vous écraser",
            "remplacer le fichier", "replace existing",
            "fichier existe déjà", "already exists",
            "overwrite", "écraser",
        ],
        "action": "click",
        "target": "Oui",
        "alternatives": ["Yes", "Remplacer", "Replace", "Confirmer"],
        "typical_zone": "dialog_center",
        "os": "any",
    },
    {
        "name": "dialog_dont_save",
        "category": "dialog",
        "triggers": [
            "ne pas enregistrer", "don't save",
            "ne pas sauvegarder", "quitter sans enregistrer",
            "discard changes",
        ],
        "action": "click",
        "target": "Ne pas enregistrer",
        "alternatives": ["Don't Save", "Ne pas sauvegarder", "Non"],
        "typical_zone": "dialog_center",
        "os": "any",
    },

    # === WINDOW NAVIGATION ===
    {
        "name": "window_close",
        "category": "window",
        "triggers": ["fermer la fenêtre", "close window"],
        "action": "click",
        "target": "X",
        "typical_zone": "titlebar",
        "typical_bbox": [0.96, 0.0, 1.0, 0.04],
        "os": "windows",
    },
    {
        "name": "window_minimize",
        "category": "window",
        "triggers": ["minimiser", "minimize"],
        "action": "click",
        "target": "_",
        "typical_zone": "titlebar",
        "typical_bbox": [0.90, 0.0, 0.94, 0.04],
        "os": "windows",
    },
    {
        "name": "window_maximize",
        "category": "window",
        "triggers": ["maximiser", "maximize", "agrandir"],
        "action": "click",
        "target": "□",
        "typical_zone": "titlebar",
        "typical_bbox": [0.94, 0.0, 0.96, 0.04],
        "os": "windows",
    },

    # === MENUS ===
    {
        "name": "menu_file",
        "category": "menu",
        "triggers": ["menu fichier", "menu file", "ouvrir fichier", "open file"],
        "action": "click",
        "target": "Fichier",
        "alternatives": ["File"],
        "typical_zone": "menu_toolbar",
        "typical_bbox": [0.0, 0.03, 0.06, 0.06],
        "os": "any",
    },
    {
        "name": "menu_edit",
        "category": "menu",
        "triggers": ["édition", "edit", "modifier"],
        "action": "click",
        "target": "Édition",
        "alternatives": ["Edit"],
        "typical_zone": "menu_toolbar",
        "typical_bbox": [0.06, 0.03, 0.12, 0.06],
        "os": "any",
    },

    # === FORMS ===
    {
        "name": "form_submit",
        "category": "form",
        "triggers": [
            "valider", "submit", "envoyer", "send",
            "connexion", "login", "se connecter", "sign in",
        ],
        "action": "click",
        "target": "Valider",
        "alternatives": ["Submit", "Envoyer", "Connexion", "Login", "OK"],
        "typical_zone": "content",
        "typical_bbox": [0.35, 0.70, 0.65, 0.80],
        "os": "any",
    },
    {
        "name": "form_search",
        "category": "form",
        "triggers": ["rechercher", "search", "chercher", "find"],
        "action": "click",
        "target": "Rechercher",
        "alternatives": ["Search", "🔍", "Go"],
        "typical_zone": "menu_toolbar",
        "typical_bbox": [0.30, 0.03, 0.70, 0.06],
        "os": "any",
    },

    # === WEB NAVIGATION ===
    {
        "name": "cookie_accept",
        "category": "popup",
        "triggers": [
            "accepter les cookies", "accept cookies",
            "utilise des cookies", "uses cookies",
            "j'accepte", "accept all", "tout accepter",
            "consent", "consentement",
        ],
        "action": "click",
        "target": "Accepter",
        "alternatives": ["Accept", "Accept All", "Tout accepter", "J'accepte"],
        "typical_zone": "content",
        "typical_bbox": [0.30, 0.80, 0.70, 0.90],
        "os": "any",
    },

    # === UNIVERSAL SHORTCUTS ===
    {
        "name": "shortcut_save",
        "category": "shortcut",
        "triggers": ["sauvegarder", "enregistrer", "save"],
        "action": "hotkey",
        "target": "ctrl+s",
        "typical_zone": "keyboard",
        "os": "any",
    },
    {
        "name": "shortcut_undo",
        "category": "shortcut",
        "triggers": ["annuler action", "undo", "défaire"],
        "action": "hotkey",
        "target": "ctrl+z",
        "typical_zone": "keyboard",
        "os": "any",
    },
    {
        "name": "shortcut_copy",
        "category": "shortcut",
        "triggers": ["copier", "copy"],
        "action": "hotkey",
        "target": "ctrl+c",
        "typical_zone": "keyboard",
        "os": "any",
    },
    {
        "name": "shortcut_paste",
        "category": "shortcut",
        "triggers": ["coller", "paste"],
        "action": "hotkey",
        "target": "ctrl+v",
        "typical_zone": "keyboard",
        "os": "any",
    },
]


class UIPatternLibrary:
    """Library of known UI patterns.

    Gives Léa "native reflexes": when a pattern is recognized
    in the OCR text or the visual context, she immediately
    knows what to do.
    """

    # Default paths of the additional pattern files
    _PROJECT_ROOT = Path(__file__).resolve().parent.parent.parent
    _GUI_R1_PATTERNS_PATH = _PROJECT_ROOT / "data" / "gui_r1_ui_patterns.json"
    _LEARNED_PATTERNS_PATH = _PROJECT_ROOT / "data" / "learned_patterns.json"

    def __init__(self, extra_patterns_path: Optional[str] = None):
        self._patterns: List[UIPattern] = []
        self._load_builtin()

        # Load the patterns extracted from GUI-R1 (static, generated once)
        self._load_from_file(str(self._GUI_R1_PATTERNS_PATH))

        # Load the patterns learned through Shadow observation (dynamic)
        self._load_from_file(str(self._LEARNED_PATTERNS_PATH))

        # Custom file provided explicitly
        if extra_patterns_path:
            self._load_from_file(extra_patterns_path)

        logger.info(f"UIPatternLibrary: {len(self._patterns)} patterns chargés")

    def _load_builtin(self):
        for p in BUILTIN_PATTERNS:
            self._patterns.append(UIPattern(
                name=p["name"],
                category=p["category"],
                triggers=p["triggers"],
                action=p["action"],
                target=p["target"],
                typical_zone=p.get("typical_zone", "content"),
                typical_bbox=p.get("typical_bbox"),
                os=p.get("os", "any"),
                metadata={
                    "alternatives": p.get("alternatives", []),
                    "source": "builtin",
                },
            ))

    def _load_from_file(self, path: str):
        filepath = Path(path)
        if not filepath.exists():
            logger.debug(f"Fichier patterns non trouvé (OK si premier lancement): {path}")
            return
        try:
            # Read as UTF-8, matching the encoding used when the files are written
            with open(filepath, encoding="utf-8") as f:
                data = json.load(f)
            for p in data.get("patterns", []):
                # Build metadata, including source/learned_at/gui_r1_id when present
                meta = dict(p.get("metadata", {}))
                if "source" in p:
                    meta["source"] = p["source"]
                if "learned_at" in p:
                    meta["learned_at"] = p["learned_at"]
                if "gui_r1_id" in p:
                    meta["gui_r1_id"] = p["gui_r1_id"]
                self._patterns.append(UIPattern(
                    name=p["name"],
                    category=p.get("category", "custom"),
                    triggers=p.get("triggers", []),
                    action=p.get("action", "click"),
                    target=p.get("target", ""),
                    typical_zone=p.get("typical_zone", "content"),
                    typical_bbox=p.get("typical_bbox"),
                    os=p.get("os", "any"),
                    confidence=p.get("confidence", 0.9),
                    metadata=meta,
                ))
            logger.info(f"Chargé {len(data.get('patterns', []))} patterns depuis {path}")
        except Exception as e:
            logger.error(f"Erreur chargement patterns: {e}")

    def find_pattern(
        self,
        text: str,
        os_filter: Optional[str] = None,
    ) -> Optional[Dict[str, Any]]:
        """Look for a UI pattern in some text (OCR, window title, etc.).

        Args:
            text: Text to analyze (may contain OCR noise)
            os_filter: Filter by OS ("windows", "linux", None=all)

        Returns:
            Dict with action, target, confidence, etc., or None
        """
        import re

        text_lower = text.lower()
        best_match = None
        best_score = 0

        for pattern in self._patterns:
            if os_filter and pattern.os not in ("any", os_filter):
                continue

            score = 0
            matched_trigger = None
            for trigger in pattern.triggers:
                if len(trigger) <= 3:
                    # Short triggers ("ok") must match as whole words
                    if re.search(r'\b' + re.escape(trigger) + r'\b', text_lower):
                        trigger_score = len(trigger) / max(len(text_lower), 1)
                        if trigger_score > score:
                            score = trigger_score
                            matched_trigger = trigger
                elif trigger in text_lower:
                    trigger_score = len(trigger) / max(len(text_lower), 1)
                    if trigger_score > score:
                        score = trigger_score
                        matched_trigger = trigger

            if score > best_score and matched_trigger is not None:
                best_score = score
                best_match = {
                    "pattern": pattern.name,
                    "category": pattern.category,
                    "action": pattern.action,
                    "target": pattern.target,
                    "alternatives": pattern.metadata.get("alternatives", []),
                    "typical_zone": pattern.typical_zone,
                    "typical_bbox": pattern.typical_bbox,
                    "confidence": min(pattern.confidence * (1 + score), 1.0),
                    "matched_trigger": matched_trigger,
                    "os": pattern.os,
                }

        return best_match

    def find_by_category(self, category: str) -> List[Dict[str, Any]]:
        """Return all patterns of a category."""
        return [
            {
                "name": p.name,
                "action": p.action,
                "target": p.target,
                "triggers": p.triggers,
                "typical_zone": p.typical_zone,
            }
            for p in self._patterns
            if p.category == category
        ]

    def get_dialog_handler(self, dialog_text: str) -> Optional[Dict[str, Any]]:
        """Shortcut: look for a dialog pattern."""
        match = self.find_pattern(dialog_text)
        if match and match["category"] == "dialog":
            return match
        # No dialog match: fall back to the best match of any category (or None)
        return match

    def add_pattern(self, pattern_dict: Dict[str, Any]):
        """Add a pattern dynamically (e.g. learned by observation)."""
        self._patterns.append(UIPattern(
            name=pattern_dict["name"],
            category=pattern_dict.get("category", "learned"),
            triggers=pattern_dict.get("triggers", []),
            action=pattern_dict.get("action", "click"),
            target=pattern_dict.get("target", ""),
            typical_zone=pattern_dict.get("typical_zone", "content"),
            typical_bbox=pattern_dict.get("typical_bbox"),
            os=pattern_dict.get("os", "any"),
            confidence=pattern_dict.get("confidence", 0.7),
            metadata={"source": "learned"},
        ))

    def save_to_file(self, path: str):
        """Save all patterns (builtin + learned) to a file."""
        data = {
            "patterns": [
                {
                    "name": p.name,
                    "category": p.category,
                    "triggers": p.triggers,
                    "action": p.action,
                    "target": p.target,
                    "typical_zone": p.typical_zone,
                    "typical_bbox": p.typical_bbox,
                    "os": p.os,
                    "confidence": p.confidence,
                    "metadata": p.metadata,
                }
                for p in self._patterns
            ]
        }
        with open(path, "w", encoding="utf-8") as f:
            json.dump(data, f, indent=2, ensure_ascii=False)
        logger.info(f"Sauvegardé {len(self._patterns)} patterns dans {path}")

    def save_learned_pattern(self, pattern_dict: Dict[str, Any]):
        """Persist a pattern learned through Shadow observation to learned_patterns.json.

        The pattern is added in memory AND saved to disk.
        The file is created if it does not exist; otherwise existing patterns are preserved.
        """
        from datetime import datetime as dt

        # Load the existing file or create the structure
        filepath = self._LEARNED_PATTERNS_PATH
        filepath.parent.mkdir(parents=True, exist_ok=True)

        existing: Dict[str, Any] = {"patterns": []}
        if filepath.exists():
            try:
                with open(filepath, encoding="utf-8") as f:
                    existing = json.load(f)
            except (json.JSONDecodeError, OSError):
                logger.warning(f"Fichier {filepath} corrompu, recréation")

        # Make sure we do not duplicate (same triggers + same target)
        new_triggers = set(t.lower() for t in pattern_dict.get("triggers", []))
        new_target = pattern_dict.get("target", "").lower()
        for existing_p in existing.get("patterns", []):
            existing_triggers = set(t.lower() for t in existing_p.get("triggers", []))
            if existing_triggers == new_triggers and existing_p.get("target", "").lower() == new_target:
                logger.debug(f"Pattern déjà connu, skip: triggers={new_triggers}, target={new_target}")
                return

        # Number automatically and build the full entry
        count = len(existing.get("patterns", []))
        entry = {
            "name": pattern_dict.get("name", f"learned_dialog_{count + 1:03d}"),
            "category": pattern_dict.get("category", "dialog"),
            "triggers": pattern_dict.get("triggers", []),
            "action": pattern_dict.get("action", "click"),
            "target": pattern_dict.get("target", ""),
            "os": pattern_dict.get("os", "windows"),
            "source": "shadow_learning",
            "learned_at": dt.now().isoformat(timespec="seconds"),
            "confidence": pattern_dict.get("confidence", 0.8),
        }

        # Add in memory (with the auto-generated name)
        self.add_pattern(entry)
        existing.setdefault("patterns", []).append(entry)

        with open(filepath, "w", encoding="utf-8") as f:
            json.dump(existing, f, indent=2, ensure_ascii=False)
        logger.info(f"Pattern appris sauvegardé: {entry['name']} → {entry['target']}")

    @property
    def stats(self) -> Dict[str, Any]:
        from collections import Counter
        cats = Counter(p.category for p in self._patterns)
        return {"total": len(self._patterns), "by_category": dict(cats)}
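The trigger scoring inside `find_pattern` can be illustrated with a small standalone function. This is a sketch under the same rules as the method above (whole-word match for triggers of three characters or fewer, substring match otherwise, score equal to trigger length divided by text length); the name `score_trigger` and the sample texts are illustrative and not part of this commit.

```python
import re


def score_trigger(trigger: str, text: str) -> float:
    """Score one trigger against a text; 0.0 means no match."""
    text_lower = text.lower()
    if len(trigger) <= 3:
        # Short triggers must match as whole words to avoid false positives
        found = re.search(r"\b" + re.escape(trigger) + r"\b", text_lower) is not None
    else:
        found = trigger in text_lower
    if not found:
        return 0.0
    return len(trigger) / max(len(text_lower), 1)


long_hit = score_trigger("voulez-vous enregistrer", "Voulez-vous enregistrer ?")
short_hit = score_trigger("ok", "Cliquez sur OK")
short_miss = score_trigger("ok", "Facebook")
```

Because the score is the fraction of the text covered by the trigger, a long and specific trigger outranks a short generic one found in the same text.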
15
core/llm/__init__.py
Normal file
@@ -0,0 +1,15 @@
"""LLM modules (Ollama clients and business decision helpers) + OCR extractor."""

from .t2a_decision import (
    PROMPT_TEMPLATE,
    DEFAULT_MODEL,
    analyze_dpi,
)
from .ocr_extractor import extract_text_from_image

__all__ = [
    "PROMPT_TEMPLATE",
    "DEFAULT_MODEL",
    "analyze_dpi",
    "extract_text_from_image",
]
71
core/llm/ocr_extractor.py
Normal file
@@ -0,0 +1,71 @@
|
|||||||
|
"""Extracteur OCR — texte depuis une image (screenshot d'écran).
|
||||||
|
|
||||||
|
Utilise EasyOCR fr+en. Singleton (chargement modèle ~3s au premier appel).
|
||||||
|
|
||||||
|
Conçu pour le pipeline streaming serveur (action `extract_text`) : récupère
|
||||||
|
un screenshot fresh (dernier heartbeat ou capture forcée), applique l'OCR,
|
||||||
|
retourne le texte concaténé pour analyse downstream (ex: t2a_decision).
|
||||||
|
"""
|
||||||
|
|
||||||
|
from __future__ import annotations
|
||||||
|
|
||||||
|
import logging
|
||||||
|
from pathlib import Path
|
||||||
|
from typing import Optional, Tuple
|
||||||
|
|
||||||
|
logger = logging.getLogger(__name__)
|
||||||
|
|
||||||
|
_easyocr_reader = None
|
||||||
|
|
||||||
|
|
||||||
|
def _get_reader():
|
||||||
|
"""Initialise EasyOCR fr+en au premier appel (singleton)."""
|
||||||
|
global _easyocr_reader
|
||||||
|
if _easyocr_reader is None:
|
||||||
|
import easyocr
|
||||||
|
try:
|
||||||
|
_easyocr_reader = easyocr.Reader(['fr', 'en'], gpu=True, verbose=False)
|
||||||
|
logger.info("EasyOCR initialisé (fr+en, GPU)")
|
||||||
|
except Exception as e:
|
||||||
|
logger.warning("EasyOCR GPU indisponible (%s), fallback CPU", e)
|
||||||
|
_easyocr_reader = easyocr.Reader(['fr', 'en'], gpu=False, verbose=False)
|
||||||
|
return _easyocr_reader
|
||||||
|
|
||||||
|
|
||||||
|
def extract_text_from_image(
    image_path: str,
    region: Optional[Tuple[int, int, int, int]] = None,
    paragraph: bool = True,
) -> str:
    """Extract text from an image with EasyOCR.

    Args:
        image_path: path of the PNG on disk.
        region: (x, y, w, h) to crop before OCR. None = whole image.
        paragraph: True to group lines into paragraphs (readable),
            False for separate blocks (granular).

    Returns:
        Concatenated text. Lines / paragraphs are separated by a newline.
        On error, returns an empty string and logs a warning.
    """
    path = Path(image_path)
    if not path.exists():
        logger.warning("extract_text: file not found %s", image_path)
        return ""

    try:
        from PIL import Image
        import numpy as np

        img = Image.open(path)
        if region:
            x, y, w, h = region
            # PIL expects (left, upper, right, lower), not (x, y, w, h).
            img = img.crop((x, y, x + w, y + h))

        reader = _get_reader()
        results = reader.readtext(np.array(img), detail=0, paragraph=paragraph)
        return "\n".join(str(r).strip() for r in results if r)
    except Exception as e:
        logger.warning("extract_text failed on %s: %s", image_path, e)
        return ""
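The `region` argument of `extract_text_from_image` is given as (x, y, w, h), whereas PIL's `Image.crop` takes a (left, upper, right, lower) box. The sketch below isolates that conversion. The helper name `region_to_box` and the clamping to the image bounds are assumptions made for illustration; the function in the diff passes the unclamped box straight to `crop`.

```python
from typing import Tuple

Region = Tuple[int, int, int, int]  # (x, y, w, h), as in extract_text_from_image
Box = Tuple[int, int, int, int]     # (left, upper, right, lower), as PIL expects


def region_to_box(region: Region, width: int, height: int) -> Box:
    """Convert (x, y, w, h) to a crop box clamped to a width x height image.

    Hypothetical helper: clamping is an assumption added for this sketch.
    """
    x, y, w, h = region
    if w <= 0 or h <= 0:
        raise ValueError("region width and height must be positive")
    left = max(0, min(x, width))
    upper = max(0, min(y, height))
    right = max(left, min(x + w, width))
    lower = max(upper, min(y + h, height))
    return (left, upper, right, lower)
```

Clamping keeps a region that spills over the screenshot edge from producing padded black borders, which may otherwise add noise to the OCR input.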
168
core/llm/t2a_decision.py
Normal file
@@ -0,0 +1,168 @@
"""Decision support for emergency-department T2A/PMSI billing via a local LLM.

Decides whether an emergency-department visit falls under:
  - FORFAIT_URGENCE (simple visit, discharge home)
  - REQUALIFICATION_HOSPITALISATION (MCO stay, valued at 1k-5k€+)

The prompt enforces literal extraction of facts from the DPI (electronic
patient record), with no invention, and honest calibration of confidence.
Validated on 15 synthetic DPIs: qwen2.5:7b reaches 100% accuracy at
~5 s per case with 4.7 GB of VRAM.

See docs/clients/ght_sud_95/ and demo/facturation_urgences/RESULTATS.md for the
comparative benchmark of the 11 LLMs evaluated.
"""

from __future__ import annotations

import json
import logging
import os
import time
import urllib.error
import urllib.request
from typing import Any, Dict

logger = logging.getLogger(__name__)

OLLAMA_URL = os.environ.get("OLLAMA_URL", "http://localhost:11434/api/generate")
DEFAULT_MODEL = os.environ.get("T2A_MODEL", "qwen2.5:7b")
DEFAULT_TIMEOUT = 60  # seconds

PROMPT_TEMPLATE = """Tu es médecin DIM (Département d'Information Médicale), expert en facturation T2A/PMSI aux urgences hospitalières en France.

Analyse le dossier patient ci-dessous pour déterminer si le passage relève :
- FORFAIT_URGENCE : passage simple, retour à domicile, sans surveillance prolongée ni soins continus
- REQUALIFICATION_HOSPITALISATION : séjour MCO requis selon les 3 critères PMSI/ATIH

LES 3 CRITÈRES UHCD (au moins 2 sur 3 validés ⇒ REQUALIFICATION) :
1. Pathologie potentiellement évolutive (instabilité hémodynamique, terrain à risque, traitement nécessitant adaptation)
2. Surveillance médicale et paramédicale prolongée (constantes itératives, observations IDE/médecin, durée > 6 h)
3. Examens complémentaires ou actes thérapeutiques (biologie, imagerie, sutures, gestes techniques)

INSTRUCTIONS STRICTES :
1. N'utilise QUE des éléments littéralement présents dans le dossier patient. N'invente AUCUN critère.
2. Pour CHAQUE critère (1, 2, 3), tu DOIS produire un texte de preuve qui contient AU MOINS UNE CITATION LITTÉRALE du dossier entre guillemets français « ... ». Exemple : « FC à 110 bpm, TA 92/60 ».
3. Si le critère est NON validé, ne renvoie JAMAIS un fallback creux : explique factuellement ce qui manque, en citant le dossier (ex: « Sortie à H+2 », « Aucun acte technique au compte-rendu »).
4. Le texte de chaque preuve fait 2-3 phrases : (i) la citation littérale, (ii) l'analyse PMSI, (iii) la conclusion validé/non validé.
5. Calcule la durée totale du passage en heures (admission → sortie/transfert) à partir des horaires du dossier.
6. Module ta confiance honnêtement :
   - "elevee" uniquement si tous les indices convergent
   - "moyenne" si éléments ambivalents
   - "faible" si information manquante ou très atypique

Réponds STRICTEMENT en JSON valide, sans texte avant ni après :
{{
  "duree_passage_heures": <nombre>,
  "elements_pour_hospitalisation": [<phrases littéralement extraites du dossier>],
  "elements_pour_forfait": [<phrases littéralement extraites du dossier>],
  "decision": "FORFAIT_URGENCE" | "REQUALIFICATION_HOSPITALISATION",
  "decision_court": "UHCD" | "Forfait Urgences",
  "preuve_critere1": "<2-3 phrases incluant AU MOINS UNE citation littérale entre « » (motif, symptôme, terrain à risque, traitement). Si non validé : factualise ce qui manque en citant le dossier.>",
  "critere1_valide": true | false,
  "preuve_critere2": "<2-3 phrases incluant AU MOINS UNE citation littérale entre « » (constantes, observations IDE, durée surveillance). Si non validé : factualise.>",
  "critere2_valide": true | false,
  "preuve_critere3": "<2-3 phrases incluant AU MOINS UNE citation littérale entre « » (actes/examens : biologie, imagerie, suture, etc.). Si non validé : factualise.>",
  "critere3_valide": true | false,
  "justification": "<2-3 phrases synthétiques s'appuyant explicitement sur les preuves ci-dessus, avec au moins une citation>",
  "confiance": "elevee" | "moyenne" | "faible"
}}

DOSSIER PATIENT :
{dpi}
"""


def analyze_dpi(
    dpi_text: str,
    model: str = DEFAULT_MODEL,
    timeout: int = DEFAULT_TIMEOUT,
    ollama_url: str = OLLAMA_URL,
) -> Dict[str, Any]:
    """Submit an emergency-department DPI to an Ollama LLM and return the JSON decision.

    Args:
        dpi_text: Patient record text (concatenated tabs or raw DPI).
        model: Ollama model to use (default qwen2.5:7b, 100% accuracy on the bench).
        timeout: HTTP timeout in seconds.
        ollama_url: Ollama endpoint (default localhost:11434/api/generate).

    Returns:
        Dict with:
            decision: "FORFAIT_URGENCE" | "REQUALIFICATION_HOSPITALISATION"
            elements_pour_hospitalisation: List[str]
            elements_pour_forfait: List[str]
            duree_passage_heures: float
            justification: str
            confiance: "elevee" | "moyenne" | "faible"
            (plus the other fields requested by the prompt: decision_court,
            preuve_critere1..3, critere1..3_valide)
            _elapsed_s: float (latency)
            _model: str
            _eval_count: token count reported by Ollama, if any
        On error:
            {"_error": str, "_elapsed_s": float, "_model": str}  (network / Ollama unavailable)
            {"_parse_error": True, "_raw": str, "_elapsed_s": float, "_model": str}  (invalid JSON)
    """
    payload = {
        "model": model,
        "prompt": PROMPT_TEMPLATE.format(dpi=dpi_text),
        "stream": False,
        "format": "json",
        "keep_alive": "5m",
        "options": {
            "temperature": 0.1,
            "num_predict": 1500,
            "num_ctx": 16384,
        },
    }
    data = json.dumps(payload).encode("utf-8")
    req = urllib.request.Request(
        ollama_url,
        data=data,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    t0 = time.time()
    try:
        with urllib.request.urlopen(req, timeout=timeout) as resp:
            body = json.loads(resp.read().decode("utf-8"))
    except (urllib.error.URLError, TimeoutError, ConnectionError, ValueError) as e:
        # ValueError also covers a non-JSON or badly encoded HTTP body.
        elapsed = round(time.time() - t0, 1)
        logger.warning("analyze_dpi: Ollama request failed (%s) after %.1fs", e, elapsed)
        return {"_error": str(e), "_elapsed_s": elapsed, "_model": model}

    elapsed = time.time() - t0

    # "response" / "thinking" may be absent or null depending on the model.
    raw_response = (body.get("response") or "").strip()
    raw_thinking = (body.get("thinking") or "").strip()

    # Fallback: if the answer is empty, try the last {...} span of the reasoning trace.
    candidates = [raw_response]
    if not raw_response and raw_thinking:
        last_close = raw_thinking.rfind("}")
        last_open = raw_thinking.rfind("{", 0, last_close)
        if last_open != -1 and last_close != -1:
            candidates.append(raw_thinking[last_open:last_close + 1])

    parsed = None
    for cand in candidates:
        # Strip a Markdown code fence if the model wrapped its JSON in one.
        cleaned = cand
        if cleaned.startswith("```"):
            cleaned = cleaned.split("\n", 1)[-1]
        if cleaned.endswith("```"):
            cleaned = cleaned.rsplit("```", 1)[0]
        cleaned = cleaned.strip()
        try:
            obj = json.loads(cleaned)
        except json.JSONDecodeError:
            continue
        # Only a JSON object can carry the metadata keys added below.
        if isinstance(obj, dict):
            parsed = obj
            break

    if parsed is None:
        return {
            "_parse_error": True,
            "_raw": (raw_response or raw_thinking)[:500],
            "_elapsed_s": round(elapsed, 1),
            "_model": model,
        }

    parsed["_elapsed_s"] = round(elapsed, 1)
    parsed["_model"] = model
    parsed["_eval_count"] = body.get("eval_count")
    return parsed
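The prompt states that at least 2 of the 3 UHCD criteria must be validated for a REQUALIFICATION, but nothing in `analyze_dpi` checks that the model's `decision` agrees with its own `critere*_valide` flags. A caller might add a guard along these lines. This is a minimal sketch: `check_decision` is a hypothetical helper, not part of the module, and it assumes the key names requested by the prompt.

```python
from typing import Any, Dict, List

ERROR_KEYS = ("_error", "_parse_error")  # keys set by analyze_dpi on failure


def check_decision(result: Dict[str, Any]) -> List[str]:
    """Return the problems found in a decision dict (empty list if none)."""
    for key in ERROR_KEYS:
        if result.get(key):
            return [f"call failed: {key}"]

    flags = [result.get(f"critere{i}_valide") for i in (1, 2, 3)]
    if not all(isinstance(f, bool) for f in flags):
        return ["critere*_valide flags missing or not boolean"]

    problems: List[str] = []
    # Rule stated in the prompt: at least 2 of 3 criteria => requalification.
    expected = (
        "REQUALIFICATION_HOSPITALISATION" if sum(flags) >= 2 else "FORFAIT_URGENCE"
    )
    if result.get("decision") != expected:
        problems.append(
            f"decision {result.get('decision')!r} != expected {expected!r}"
        )
    if result.get("confiance") not in ("elevee", "moyenne", "faible"):
        problems.append("unknown confiance value")
    return problems
```

A non-empty list could be used to route the case to manual review rather than trusting the model's output as-is.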
@@ -137,10 +137,14 @@ class WorkflowPipeline:
             else:
                 logger.warning(f"UI Detector not available: {e}")
 
-        # 6. Graph Builder
+        # 6. Graph Builder: receives the UIDetector to enrich the
+        # ScreenStates with ui_elements + OCR during _create_screen_states.
+        # Without it, the TargetSpecs cannot be anchored (by_role=unknown).
         self.graph_builder = GraphBuilder(
             embedding_builder=self.embedding_builder,
-            faiss_manager=self.faiss_manager
+            faiss_manager=self.faiss_manager,
+            ui_detector=self.ui_detector,
+            enable_ui_enrichment=enable_ui_detection,
         )
         logger.info("✓ Graph Builder initialized")
 
@@ -1,327 +0,0 @@
|
|||||||
e)a, field_namg(datin_loggsanitize_fordator.valieturn r()
|
|
||||||
or_validatet_inputalidator = g""
|
|
||||||
v
|
|
||||||
"iséesnées sanit Don
|
|
||||||
Returns:
|
|
||||||
amp
|
|
||||||
chNom du ame: field_ntiser
|
|
||||||
s à saniata: Donnée d
|
|
||||||
|
|
||||||
Args:ging.
|
|
||||||
le loges pours donnéSanitise de """
|
|
||||||
-> str:
|
|
||||||
"data") me: str = nay, field_ta: An(da_loggingize_for sanita
|
|
||||||
|
|
||||||
|
|
||||||
defarsed_dat return p
|
|
||||||
")
|
|
||||||
errors)}t.uljoin(res {'; '.ed:ion failalidator(f"JSON vlidationErrise InputVa ralid:
|
|
||||||
is_vat.not resul if
|
|
||||||
")
|
|
||||||
"json_datafield_name=e, th=max_sizr, max_lengring(json_stalidate_stvalidator.vt =
|
|
||||||
resuldata)s(parsed_on.dump = js json_strtor()
|
|
||||||
put_validaet_in gidator =s
|
|
||||||
vales injectionur lontenu poider le c
|
|
||||||
# Valt")
|
|
||||||
dicng orbe strimust N data "JSOionError(putValidat raise In se:
|
|
||||||
|
|
||||||
elson_data_data = jparsed")
|
|
||||||
size}max_ze of { maximum siexceedsN data rror(f"JSOValidationEaise Input r_size:
|
|
||||||
lized) > maxlen(seria if a)
|
|
||||||
s(json_dat json.dumpalized =eri sialisée
|
|
||||||
ére sla taillrifier # Véct):
|
|
||||||
ata, di_de(jsonncsinsta elif i
|
|
||||||
t: {e}") JSON formaidror(f"InvalErdationalise InputV raie:
|
|
||||||
ror as JSONDecodeErt json. excep n_data)
|
|
||||||
loads(jsojson.= d_data parse
|
|
||||||
try:
|
|
||||||
size}")
|
|
||||||
{max_mum size of axiceeds m data exONor(f"JSrrtionEputValidaise In ra
|
|
||||||
max_size:a) >(json_datf len i
|
|
||||||
data, str):json_isinstance( if ""
|
|
||||||
" invalides
|
|
||||||
sont ess donnéSi letionError: InputValida s:
|
|
||||||
Raise
|
|
||||||
|
|
||||||
ON validéess JS Donnéeurns:
|
|
||||||
|
|
||||||
Ret s
|
|
||||||
n caractèremale exille maax_size: Tai mou dict)
|
|
||||||
string nnées JSON (: Do_data json
|
|
||||||
|
|
||||||
Args: .
|
|
||||||
nnées JSONdo Valide des "
|
|
||||||
|
|
||||||
"") -> dict:= 10000x_size: int t], man[str, dicnion_data: Uput(jsoe_json_inalidat
|
|
||||||
|
|
||||||
|
|
||||||
def ved_pathurn normaliz ret
|
|
||||||
|
|
||||||
")ath}malized_pories: {norwed directllon apath not ior(f"File ionErratlide InputVa rais ):
|
|
||||||
rslowed_di_dir in al for allowedr)d_diallowe.startswith(_obj)str(pathot any( if n)
|
|
||||||
alized_pathPath(normpath_obj = :
|
|
||||||
_dirsif allowed
|
|
||||||
i spécifiésautorisés soires répertrifier lesVé
|
|
||||||
# ")
|
|
||||||
xt}n: {file_extensio engerous filer(f"DaolationErroyVi Securit raisensions:
|
|
||||||
xtegerous_ext in danf file_e()
|
|
||||||
ix.lowerath).suffied_pnormalizxt = Path( file_e p', '.sh'}
|
|
||||||
.ph', ' '.jscr', '.vbs', '.s, '.cmd',xe', '.bat'{'.ensions = ngerous_exte dauses
|
|
||||||
angereons densies exter l Vérifi
|
|
||||||
#_path}")
|
|
||||||
{file detected:attemptl raversa t"Pathrror(fationEyViol Securitise ra"/"):
|
|
||||||
ith(path.startswd_or normalizelized_path in norma ".." ifl
|
|
||||||
rsaraveh tives de patntat les teVérifier # )
|
|
||||||
|
|
||||||
_pathle.normpath(fih = os.pathpatrmalized_ noin
|
|
||||||
ser le chem# Normali
|
|
||||||
ng")
|
|
||||||
t be a strile path mus"Fir(dationErroalise InputV raitr):
|
|
||||||
th, se_pailsinstance(ft i if no
|
|
||||||
"""
|
|
||||||
ngereux dae chemin estError: Si lionnputValidat I
|
|
||||||
aises:
|
|
||||||
R
|
|
||||||
sénormalit min validé e Che
|
|
||||||
Returns:
|
|
||||||
|
|
||||||
orisésutres ars: Répertoilowed_di al valider
|
|
||||||
n àhemie_path: C filgs:
|
|
||||||
Ar
|
|
||||||
chier.
|
|
||||||
hemin de fialide un c V"
|
|
||||||
" ":
|
|
||||||
trne) -> s No] =str]List[ional[rs: Optwed_di: str, allole_pathath_input(fifile_plidate_vae
|
|
||||||
|
|
||||||
|
|
||||||
def ized_valuresult.sanitreturn
|
|
||||||
|
|
||||||
.errors)}").join(resulte}: {'; 'field_named for {dation failf"ValinError(idatio InputValserai is_valid:
|
|
||||||
t.ul not res
|
|
||||||
if_name)
|
|
||||||
_html, fieldength, allow, max_lring(valuealidate_stidator.vval = resultor()
|
|
||||||
idatt_input_valator = ge"
|
|
||||||
valid""ue
|
|
||||||
échotionlidai la vaor: SdationErrnputVali Is:
|
|
||||||
se
|
|
||||||
Rai
|
|
||||||
nitisée sa Valeureturns:
|
|
||||||
R
|
|
||||||
p
|
|
||||||
du chamm d_name: No fiel HTML
|
|
||||||
oriser leow_html: Aut all ximale
|
|
||||||
Longueur mamax_length: r
|
|
||||||
r à valideue: Valeu val Args:
|
|
||||||
|
|
||||||
|
|
||||||
ée string.e une entranitisalide et s
|
|
||||||
V"""r:
|
|
||||||
t") -> st= "inpue: str e, field_namalsool = Fw_html: b allo
|
|
||||||
1000, ength: int =max_lvalue: str, ut(ing_inpvalidate_str
|
|
||||||
|
|
||||||
|
|
||||||
def r_instancern _validato)
|
|
||||||
retudator(alie = InputVancinstalidator_ _v one:
|
|
||||||
tance is Nor_insf _validat
|
|
||||||
itancer_insal _validatolob"
|
|
||||||
g""r
|
|
||||||
alidateuu vstance d Inturns:
|
|
||||||
Re
|
|
||||||
r.
|
|
||||||
teuida du valobaleinstance glourne l' Ret""
|
|
||||||
"or:
|
|
||||||
lidatputVa-> Inr() dato_valit_inputef geNone
|
|
||||||
|
|
||||||
|
|
||||||
d= ] putValidatoronal[Inance: Optilidator_instidateur
|
|
||||||
_va du val globalencesta
|
|
||||||
# In )
|
|
||||||
|
|
||||||
}"
|
|
||||||
_valuezedue: {saniti f"Val . "
|
|
||||||
field_name}ype} in {ation_tvioltected: {iolation dey vf"Securit rning(
|
|
||||||
ger.wa logame)
|
|
||||||
e, field_ng(valuor_logginf.sanitize_f selalue =tized_v sani""
|
|
||||||
té."ride sécuion violatg une Lo """:
|
|
||||||
ny) -> Nonevalue: A_name: str, ldier, fn_type: stolatioon(self, viati_violitylog_secur _
|
|
||||||
def _}]"
|
|
||||||
e_(data).__namntable:{typeme}[unpri{field_nareturn f"
|
|
||||||
ion:cept Except ex
|
|
||||||
ata_str
|
|
||||||
turn d re
|
|
||||||
tr)
|
|
||||||
scape(data_s html.e data_str =
|
|
||||||
dangereuxres es caractèhapper l # Éc
|
|
||||||
."
|
|
||||||
"..r[:200] + ata_stata_str = d d
|
|
||||||
0:r) > 20ata_st if len(d s
|
|
||||||
our les log taille pr la # Limite
|
|
||||||
|
|
||||||
ta)r(dastr = st data_ else:
|
|
||||||
|
|
||||||
, ':')),'s=('eparatore, s_ascii=Trunsurea, e(dat.dumps json = data_str
|
|
||||||
ct, list)): (dia,nstance(datsi if i
|
|
||||||
try:le
|
|
||||||
aila tter lg et limi en strinonvertir # C
|
|
||||||
]"
|
|
||||||
{len(data)}_}:size=a).__name_(dattypeme}[{{field_naturn f" re :
|
|
||||||
))istta, (dict, ltance(daisinsif el )}]"
|
|
||||||
lue(datave_vasensitish:{hash_e}[haield_namf"{f return
|
|
||||||
> 20:d len(data)str) ane(data, sinstanc if is
|
|
||||||
ensiblenées ss donhasher lerisé, En mode sécu # itive:
|
|
||||||
ensself.log_s not if ""
|
|
||||||
|
|
||||||
"r logging pouestisénées saniDon
|
|
||||||
Returns:
|
|
||||||
|
|
||||||
pom du chameld_name: N fi er
|
|
||||||
itis sanes àata: Donné d gs:
|
|
||||||
Ar
|
|
||||||
sécurisé.
|
|
||||||
le logging pouronnéess dnitise de Sa ""
|
|
||||||
" ) -> str:
|
|
||||||
ata"tr = "dd_name: sy, fiel: Anlf, dataging(seogze_for_lef saniti
|
|
||||||
dngs)
|
|
||||||
ors, warninitized, err sa_valid,ult(isationReslid return Va
|
|
||||||
s) == 0error= len(valid is_
|
|
||||||
itized)
|
|
||||||
, san7F]', ''\x1F\x0C\x0E-\x0B8\x0-\x0r'[\x0e.sub(= r sanitized ôle
|
|
||||||
ntrctères de cocaraoyer les # Nett
|
|
||||||
|
|
||||||
anitized).escape(s = html sanitized :
|
|
||||||
allow_html if not ire
|
|
||||||
si nécessatizer HTML# Sani
|
|
||||||
)
|
|
||||||
"SQL patternspicious Noains suntld_name} cofiepend(f"{ngs.ap warni else:
|
|
||||||
|
|
||||||
value)e,nam", field_ attemptionjectQL inlation("NoSecurity_vioog_s._l self ")
|
|
||||||
ernection pattl NoSQL injs potentiae} containd_nam{fiel(f"penderrors.ap
|
|
||||||
_mode:lf.strictse if lue):
|
|
||||||
(vaern.searchif patt ns:
|
|
||||||
atterf._nosql_prn in selte for patSQL
|
|
||||||
njections Nofier les i # Véri
|
|
||||||
")
|
|
||||||
QL pattern Suspiciousontains seld_name} c{fiappend(f"arnings. w:
|
|
||||||
else e)
|
|
||||||
, valu_nameeld, fipt"ection attem"SQL injiolation(security_vg_loself._ )
|
|
||||||
on pattern"L injectiotential SQontains p_name} c"{fieldppend(f.aors err e:
|
|
||||||
.strict_modself if alue):
|
|
||||||
rn.search(vatteif p patterns:
|
|
||||||
sql_f._eln spattern i for ons SQL
|
|
||||||
tir les injecVérifie #
|
|
||||||
|
|
||||||
x_length] value[:matized = sani ers")
|
|
||||||
th} charact{max_lengcated to _name} trunf"{fieldend(s.app warning else:
|
|
||||||
|
|
||||||
}")ax_length{mf length oimum eeds maxe} exc"{field_nam(fpend errors.ap ct_mode:
|
|
||||||
f self.stri ih:
|
|
||||||
lengtalue) > max_ if len(vueur
|
|
||||||
longVérifier la
|
|
||||||
# s)
|
|
||||||
ors, warningne, errt(False, NoonResulidati return Val tring")
|
|
||||||
t be a smusd_name} f"{fielrs.append( erro
|
|
||||||
, str):ce(valueisinstan if not
|
|
||||||
ue
|
|
||||||
d = valanitize sgs = []
|
|
||||||
nin war
|
|
||||||
errors = []"
|
|
||||||
"" alidation
|
|
||||||
vt de Résulta eturns:
|
|
||||||
R
|
|
||||||
s
|
|
||||||
our les logdu champ pNom : ld_name fie HTML
|
|
||||||
toriser le w_html: Au allo e
|
|
||||||
aximalgueur mh: Lonengt max_lder
|
|
||||||
valiue: Valeur à val:
|
|
||||||
Args
|
|
||||||
.
|
|
||||||
tèresde carac chaîne Valide une"
|
|
||||||
"" lt:
|
|
||||||
esuValidationRput") -> : str = "infield_name= False, tml: bool allow_h ,
|
|
||||||
000h: int = 1 max_lengtstr,f, value: (selring validate_st def
|
|
||||||
ERNS]
|
|
||||||
TTN_PAJECTIOlf.NOSQL_INttern in seor paE) fCASe.IGNOREttern, re(pa.compil= [rerns patteself._nosql_ RNS]
|
|
||||||
TE_PATL_INJECTION in self.SQfor patternNORECASE) re.IGtern,compile(pate. = [rerns_sql_pattf. selformance
|
|
||||||
pour pers patterns lepiler # Com
|
|
||||||
ata
|
|
||||||
ive_d.log_sensitive = configsit_sen self.log
|
|
||||||
ationinput_valid.strict_se configels not None _mode istrictct_mode if striict_mode = self.str nfig()
|
|
||||||
security_coig = get_ conf""
|
|
||||||
"g)
|
|
||||||
selon confi auto (None =strictde: Mode strict_mo
|
|
||||||
Args:
|
|
||||||
|
|
||||||
ur.datese le vali Initiali """
|
|
||||||
:
|
|
||||||
one)l] = N[boo: Optionalt_mode stric_(self,it_def __in
|
|
||||||
]
|
|
||||||
)"
|
|
||||||
\.|db\.is r"(th
|
|
||||||
\})",\s*\$.* r"(\{
|
|
||||||
meout\b)",etTil\b|\bs\(|\bevaction\s*"(funr nin)",
|
|
||||||
in|\$gt|\$lt|\$\$e|\$regex|\$n"(\$where| r [
|
|
||||||
TTERNS =CTION_PAL_INJEOSQ N n NoSQL
|
|
||||||
ctiour injengereux poatterns da # P]
|
|
||||||
|
|
||||||
"
|
|
||||||
b)\qlbsp_executes"(\
|
|
||||||
r",dshell\b)bxp_cm r"(\
|
|
||||||
)",[\'\";]r"( )\b)",
|
|
||||||
ONERRORAD|T|ONLOBSCRIP|VIPTAVASCRSCRIPT|J(\b( r" */)",
|
|
||||||
--|#|/\*|\ r"( ",
|
|
||||||
+)s*=\s*\d\AND)\s+\d+(UNION|OR|\b r"(
|
|
||||||
b)",\UTE)EXEC|EXECE|ALTER|OP|CREATDRELETE|ERT|UPDATE|Db(SELECT|INS r"(\
|
|
||||||
RNS = [N_PATTE_INJECTIOSQL
|
|
||||||
SQLnjection ereux pour irns dangtte# Pa
|
|
||||||
|
|
||||||
""teur."s utilisaeur d'entréeidatVal"" "ator:
|
|
||||||
Valids Inputclas
|
|
||||||
|
|
||||||
pass
|
|
||||||
""
|
|
||||||
ée."tectécurité déolation de s"Vi"" Error):
|
|
||||||
tValidationnError(InpuyViolatioSecurit
|
|
||||||
|
|
||||||
class pass
|
|
||||||
"
|
|
||||||
rée.""nton d'ealidatieur de v""Err "
|
|
||||||
ion):r(ExceptidationErroputValass In= []
|
|
||||||
|
|
||||||
|
|
||||||
clf.warnings sel:
|
|
||||||
None isarnings self.w ifors = []
|
|
||||||
elf.err sne:
|
|
||||||
is Nororser if self.
|
|
||||||
lf):init__(seost_def __p
|
|
||||||
r]
|
|
||||||
[sts: Listningwar[str]
|
|
||||||
istrs: L erroue: Any
|
|
||||||
ed_val sanitiz: bool
|
|
||||||
lid
|
|
||||||
is_va"""
|
|
||||||
une entrée.dation d' de valitat"Résul""lt:
|
|
||||||
ationResuclass Validaclass
|
|
||||||
dat
|
|
||||||
|
|
||||||
@_)
|
|
||||||
ame_etLogger(__ngging.g
|
|
||||||
logger = lolue
|
|
||||||
ive_vaash_sensitonfig, h_cecurityimport get_srity_config .secu
|
|
||||||
|
|
||||||
from dataclassrtpoimdataclasses
|
|
||||||
from Union, SetOptional,, List, Any, Dict import ng
|
|
||||||
from typirt Pathimpoib thlfrom pajson
|
|
||||||
|
|
||||||
import l htmortlogging
|
|
||||||
impe
|
|
||||||
import port r
|
|
||||||
imrt ospo"
|
|
||||||
|
|
||||||
im"ggées
|
|
||||||
"données loization des 7.4: Sanit
|
|
||||||
Exigence s chiers de fin des chemintioalida3: VExigence 7.
|
|
||||||
SQL/NoSQLonsti injeccontre lesion ectotence 7.2: PrExigé.
|
|
||||||
a sécuritur lteur polisatrées utiion des envalidat
|
|
||||||
Système de m
|
|
||||||
stedation Syut Vali"""
|
|
||||||
Inp
|
|
||||||
@@ -1,100 +0,0 @@
{
  "workflow_id": "demo_calculator",
  "name": "Demo - Calculatrice",
  "description": "Ouvre la calculatrice et effectue un calcul simple",
  "version": "1.0.0",
  "created_at": "2024-11-29T10:00:00",
  "updated_at": "2024-11-29T10:00:00",
  "learning_state": "OBSERVATION",
  "execution_count": 0,
  "entry_nodes": ["start"],
  "end_nodes": ["end"],
  "nodes": [
    {
      "node_id": "start",
      "name": "Desktop",
      "description": "Écran de départ",
      "template": {
        "title_pattern": ".*"
      },
      "is_entry": true,
      "is_end": false,
      "metadata": {}
    },
    {
      "node_id": "calc_open",
      "name": "Calculatrice ouverte",
      "description": "La calculatrice est visible",
      "template": {
        "title_pattern": ".*(calc|gnome-calculator).*"
      },
      "is_entry": false,
      "is_end": false,
      "metadata": {}
    },
    {
      "node_id": "end",
      "name": "Calcul effectué",
      "description": "Le calcul est affiché",
      "template": {
        "title_pattern": ".*"
      },
      "is_entry": false,
      "is_end": true,
      "metadata": {}
    }
  ],
  "edges": [
    {
      "edge_id": "open_calc",
      "source_node": "start",
      "target_node": "calc_open",
      "action": {
        "type": "compound",
        "target": {
          "by_role": null,
          "selection_policy": "first"
        },
        "parameters": {
          "steps": [
            {"type": "key_press", "key": "super"},
            {"type": "wait", "duration_ms": 500},
            {"type": "text_input", "text": "calculator"},
            {"type": "key_press", "key": "Return"}
          ]
        }
      },
      "constraints": {
        "timeout_ms": 5000
      },
      "confidence_threshold": 0.7
    },
    {
      "edge_id": "do_calc",
      "source_node": "calc_open",
      "target_node": "end",
      "action": {
        "type": "text_input",
        "target": {
          "by_role": "button",
          "selection_policy": "first"
        },
        "parameters": {
          "text": "${expression}=",
          "defaults": {
            "expression": "2+2"
          }
        }
      },
      "constraints": {
        "timeout_ms": 3000
      },
      "confidence_threshold": 0.8
    }
  ],
  "metadata": {
    "author": "RPA Vision V3",
    "tags": ["demo", "calculator"],
    "difficulty": "easy"
  }
}
19
deploy/configs/config_dev_windows.txt
Normal file
@@ -0,0 +1,19 @@
# ============================================================
# Lea configuration: dev / project lead workstation (Windows)
# ============================================================
#
# Workstation: project lead's dev PC
# Goal: broaden Windows knowledge, evaluate robustness
# Server: 192.168.1.40:5005 (RTX 5070)
#
# ============================================================

RPA_SERVER_URL=http://192.168.1.40:5005/api/v1
RPA_API_TOKEN=86031addb338e449fccdb1a983f61807aec15d42d482b9c7748ad607dc23caab
RPA_MACHINE_ID=DEV_WINDOWS
RPA_USER_LABEL=Dev

# --- Advanced settings (do not change unless instructed) ---
# RPA_OLLAMA_HOST=localhost
RPA_BLUR_SENSITIVE=false
RPA_LOG_RETENTION_DAYS=180
18
deploy/configs/config_pc_fixe_lan.txt
Normal file
@@ -0,0 +1,18 @@
# ============================================================
# Lea configuration: Windows desktop PC (LAN)
# ============================================================
#
# Workstation: Dom's Windows desktop PC
# Server: 192.168.1.40:5005 (RTX 5070)
#
# ============================================================

RPA_SERVER_URL=http://192.168.1.40:5005/api/v1
RPA_API_TOKEN=86031addb338e449fccdb1a983f61807aec15d42d482b9c7748ad607dc23caab
RPA_MACHINE_ID=PC_WINDOWS_dOM
RPA_USER_LABEL=Dom

# --- Advanced settings (do not change unless instructed) ---
# RPA_OLLAMA_HOST=localhost
RPA_BLUR_SENSITIVE=false
RPA_LOG_RETENTION_DAYS=180
19
deploy/configs/config_tim_pauline.txt
Normal file
@@ -0,0 +1,19 @@
# ============================================================
# Lea configuration: Pauline's TIM workstation (Anoust LAN)
# ============================================================
#
# Workstation: Pauline's PC (emergency department TIM)
# Goal: learn the business application (OSIRIS DPI)
# Server: 192.168.1.40:5005 (RTX 5070)
#
# ============================================================

RPA_SERVER_URL=http://192.168.1.40:5005/api/v1
RPA_API_TOKEN=86031addb338e449fccdb1a983f61807aec15d42d482b9c7748ad607dc23caab
RPA_MACHINE_ID=TIM_PAULINE
RPA_USER_LABEL=Pauline

# --- Advanced settings (do not change unless instructed) ---
# RPA_OLLAMA_HOST=localhost
RPA_BLUR_SENSITIVE=true
RPA_LOG_RETENTION_DAYS=180
18
deploy/configs/config_vm_lan.txt
Normal file
@@ -0,0 +1,18 @@
# ============================================================
# Lea configuration: Windows VM (LAN)
# ============================================================
#
# Workstation: Windows 11 VM on the local network
# Server: 192.168.1.40:5005 (RTX 5070)
#
# ============================================================

RPA_SERVER_URL=http://192.168.1.40:5005/api/v1
RPA_API_TOKEN=86031addb338e449fccdb1a983f61807aec15d42d482b9c7748ad607dc23caab
RPA_MACHINE_ID=windows_vm
RPA_USER_LABEL=Dom2

# --- Advanced settings (do not change unless instructed) ---
# RPA_OLLAMA_HOST=localhost
RPA_BLUR_SENSITIVE=false
RPA_LOG_RETENTION_DAYS=180
@@ -22,6 +22,6 @@ USER_NAME=Prenom Nom
 USER_EMAIL=prenom.nom@aivanov.com
 USER_ID=
 
-# Server connection (default values already pre-filled)
-SERVER_URL=https://lea.labs.laurinebazin.design/api/v1
-API_TOKEN=86031addb338e449fccdb1a983f61807aec15d42d482b9c7748ad607dc23caab
+# Server connection (replace the CONFIGURE_ME values before use)
+SERVER_URL=CONFIGURE_ME
+API_TOKEN=CONFIGURE_ME
@@ -8,36 +8,33 @@
 #
 # Lines starting with # are comments (ignored).
 #
+# IMPORTANT: replace every CONFIGURE_ME value
+# before starting Lea. The agent will refuse to start otherwise.
+#
+# To get a pre-filled config.txt, use the Fleet
+# dashboard (Menu → Fleet → Download an agent's ZIP).
+#
 # ============================================================
 
-# Lea server address (full URL including /api/v1)
-RPA_SERVER_URL=https://lea.labs.laurinebazin.design/api/v1
+# Lea server address (required; replace before use)
+# Examples:
+#   Internal LAN: http://192.168.1.40:5005/api/v1
+#   Internet: https://lea.labs.laurinebazin.design/api/v1
+#   Local dev: http://localhost:5005/api/v1
+RPA_SERVER_URL=CONFIGURE_ME
 
 # Authentication key (provided by the administrator)
-RPA_API_TOKEN=86031addb338e449fccdb1a983f61807aec15d42d482b9c7748ad607dc23caab
+RPA_API_TOKEN=CONFIGURE_ME
 
-# Server name (without https://, without /api/v1)
-RPA_SERVER_HOST=lea.labs.laurinebazin.design
+# Ollama host (default localhost; do not change unless the setup is non-standard)
+# RPA_OLLAMA_HOST=localhost
 
-# ============================================================
-# Advanced settings (do not change unless instructed)
-# ============================================================
-
-# Blur text areas in captures on the CLIENT side.
-#
-# SINCE APRIL 2026: CLIENT-SIDE BLUR IS DISABLED BY DEFAULT.
-# Blurring of sensitive data (names, addresses, phone numbers, NIR, email)
-# is now performed on the SERVER side via EDS-NLP + OCR in the
-# core/anonymisation/pii_blur.py module.
-#
-# Advantages of server-side blur:
-# - Precisely targets PII (PERSON/LOCATION/PHONE/NIR/EMAIL)
-# - No longer breaks CIM codes, PMSI amounts, technical identifiers
-# - Two versions stored: _raw (training) + _blurred (display)
-#
-# Only set back to 'true' if a specific deployment explicitly requires it
-# (e.g. unencrypted network between agent and server).
+# Unique identifier of this workstation
+RPA_MACHINE_ID=CONFIGURE_ME
+
+# Name of the associated staff member
+RPA_USER_LABEL=CONFIGURE_ME
+
+# --- Advanced settings (do not change unless instructed) ---
 RPA_BLUR_SENSITIVE=false
-
-# Log retention period in days (minimum 180 for compliance)
 RPA_LOG_RETENTION_DAYS=180
28
deploy/systemd/rpa-mockup-easily.service
Normal file
@@ -0,0 +1,28 @@
|
|||||||
|
[Unit]
Description=Maquette Easily Assure (démo GHT Sud 95) - serveur statique HTTP
After=network-online.target
Wants=network-online.target

[Service]
Type=simple
User=dom
Group=dom
WorkingDirectory=/home/dom/ai/rpa_vision_v3/docs/clients/ght_sud_95/mockup_easily_assure
ExecStart=/usr/bin/python3 -m http.server 8765 --bind 0.0.0.0

Restart=on-failure
RestartSec=3
TimeoutStopSec=10

NoNewPrivileges=true
PrivateTmp=true
ProtectSystem=strict
ProtectHome=read-only
ReadOnlyPaths=/home/dom/ai/rpa_vision_v3/docs/clients/ght_sud_95/mockup_easily_assure

StandardOutput=journal
StandardError=journal
SyslogIdentifier=rpa-mockup-easily

[Install]
WantedBy=multi-user.target
46
deploy/systemd/rpa-streaming.service
Normal file
@@ -0,0 +1,46 @@
|
|||||||
|
[Unit]
Description=RPA Vision V3 - Streaming Server (FastAPI, port 5005)
Documentation=https://lea.labs.laurinebazin.design
After=network-online.target
Wants=network-online.target

[Service]
Type=simple

# ---- Runtime ----
User=dom
Group=dom
WorkingDirectory=/home/dom/ai/rpa_vision_v3
EnvironmentFile=/home/dom/ai/rpa_vision_v3/.env.local
Environment="PYTHONUNBUFFERED=1"
Environment="RPA_SERVICE_NAME=rpa-streaming"
# Persistent grounding service: socket and image directory shared via /run/rpa/.
Environment="RPA_GROUNDING_SOCKET=/run/rpa/grounding.sock"
Environment="RPA_GROUNDING_IMG_DIR=/run/rpa"

# Launched as a Python module (same command as svc.sh)
ExecStart=/home/dom/ai/rpa_vision_v3/.venv/bin/python3 -m agent_v0.server_v1.api_stream

# ---- Resilience ----
Restart=on-failure
RestartSec=5
TimeoutStopSec=30
# Send SIGTERM first, then SIGKILL after TimeoutStopSec
KillMode=mixed
KillSignal=SIGTERM

# ---- Hardening (reasonable for a dev/prod workstation) ----
NoNewPrivileges=true
PrivateTmp=true
# /run/rpa/ shared with rpa-grounding (socket + images)
RuntimeDirectory=rpa
RuntimeDirectoryMode=0755
RuntimeDirectoryPreserve=yes

# Logs -> journald
StandardOutput=journal
StandardError=journal
SyslogIdentifier=rpa-streaming

[Install]
WantedBy=multi-user.target
@@ -7,32 +7,39 @@ Wants=network-online.target
|
|||||||
Type=simple
|
Type=simple
|
||||||
|
|
||||||
# ---- Runtime ----
|
# ---- Runtime ----
|
||||||
User=rpa
|
User=dom
|
||||||
Group=rpa
|
Group=dom
|
||||||
WorkingDirectory=/opt/rpa_vision_v3/server
|
WorkingDirectory=/home/dom/ai/rpa_vision_v3
|
||||||
EnvironmentFile=/etc/rpa_vision_v3/rpa_vision_v3.env
|
EnvironmentFile=/home/dom/ai/rpa_vision_v3/.env.local
|
||||||
Environment="PYTHONUNBUFFERED=1"
|
Environment="PYTHONUNBUFFERED=1"
|
||||||
Environment="ENVIRONMENT=production"
|
Environment="ENVIRONMENT=production"
|
||||||
Environment="RPA_SERVICE_NAME=rpa-vision-v3-api"
|
Environment="RPA_SERVICE_NAME=rpa-vision-v3-api"
|
||||||
|
# Service grounding persistant — socket + répertoire d'images partagés via /run/rpa/.
|
||||||
|
# Si le service rpa-grounding n'est pas démarré, le client retombe automatiquement
|
||||||
|
# sur le subprocess one-shot (cf. ui_tars_grounder.py).
|
||||||
|
Environment="RPA_GROUNDING_SOCKET=/run/rpa/grounding.sock"
|
||||||
|
Environment="RPA_GROUNDING_IMG_DIR=/run/rpa"
|
||||||
|
|
||||||
# Sécurité : valide les secrets (exit !=0 => systemd restart)
|
ExecStart=/home/dom/ai/rpa_vision_v3/.venv/bin/python3 server/api_upload.py
|
||||||
ExecStart=/opt/rpa_vision_v3/venv_v3/bin/python api_upload.py
|
|
||||||
|
|
||||||
# ---- Resilience ----
|
# ---- Resilience ----
|
||||||
Restart=on-failure
|
Restart=on-failure
|
||||||
RestartSec=3
|
RestartSec=3
|
||||||
TimeoutStopSec=30
|
TimeoutStopSec=30
|
||||||
|
|
||||||
# ---- Hardening (raisonnable pour un MVP) ----
|
# ---- Hardening ----
|
||||||
NoNewPrivileges=true
|
NoNewPrivileges=true
|
||||||
PrivateTmp=true
|
PrivateTmp=true
|
||||||
ProtectSystem=strict
|
# /run/rpa/ partagé avec rpa-grounding pour le socket et les images grounding.
|
||||||
ProtectHome=true
|
# Le service rpa-grounding crée le répertoire ; ici on l'expose au /run du service.
|
||||||
ReadWritePaths=/opt/rpa_vision_v3/data /opt/rpa_vision_v3/logs
|
RuntimeDirectory=rpa
|
||||||
|
RuntimeDirectoryMode=0755
|
||||||
|
RuntimeDirectoryPreserve=yes
|
||||||
|
|
||||||
# Logs -> journald
|
# Logs -> journald
|
||||||
StandardOutput=journal
|
StandardOutput=journal
|
||||||
StandardError=journal
|
StandardError=journal
|
||||||
|
SyslogIdentifier=rpa-vision-v3-api
|
||||||
|
|
||||||
[Install]
|
[Install]
|
||||||
WantedBy=multi-user.target
|
WantedBy=multi-user.target
|
||||||
@@ -3,8 +3,8 @@ Description=RPA Vision V3 - Artifact retention / rotation
|
|||||||
|
|
||||||
[Service]
|
[Service]
|
||||||
Type=oneshot
|
Type=oneshot
|
||||||
User=rpa
|
User=dom
|
||||||
Group=rpa
|
Group=dom
|
||||||
WorkingDirectory=/opt/rpa_vision_v3
|
WorkingDirectory=/home/dom/ai/rpa_vision_v3
|
||||||
EnvironmentFile=/etc/rpa_vision_v3/rpa_vision_v3.env
|
EnvironmentFile=/home/dom/ai/rpa_vision_v3/.env.local
|
||||||
ExecStart=/opt/rpa_vision_v3/venv_v3/bin/python -m core.system.artifact_retention
|
ExecStart=/home/dom/ai/rpa_vision_v3/.venv/bin/python3 -m core.system.artifact_retention
|
||||||
|
|||||||
@@ -5,14 +5,17 @@ Wants=network-online.target
|
|||||||
|
|
||||||
[Service]
|
[Service]
|
||||||
Type=simple
|
Type=simple
|
||||||
User=rpa
|
User=dom
|
||||||
Group=rpa
|
Group=dom
|
||||||
WorkingDirectory=/opt/rpa_vision_v3
|
WorkingDirectory=/home/dom/ai/rpa_vision_v3
|
||||||
EnvironmentFile=/etc/rpa_vision_v3/rpa_vision_v3.env
|
EnvironmentFile=/home/dom/ai/rpa_vision_v3/.env.local
|
||||||
Environment="PYTHONUNBUFFERED=1"
|
Environment="PYTHONUNBUFFERED=1"
|
||||||
Environment="ENVIRONMENT=production"
|
Environment="ENVIRONMENT=production"
|
||||||
Environment="RPA_SERVICE_NAME=rpa-vision-v3-dashboard"
|
Environment="RPA_SERVICE_NAME=rpa-vision-v3-dashboard"
|
||||||
ExecStart=/opt/rpa_vision_v3/venv_v3/bin/python web_dashboard/app.py
|
# Service grounding persistant
|
||||||
|
Environment="RPA_GROUNDING_SOCKET=/run/rpa/grounding.sock"
|
||||||
|
Environment="RPA_GROUNDING_IMG_DIR=/run/rpa"
|
||||||
|
ExecStart=/home/dom/ai/rpa_vision_v3/.venv/bin/python3 web_dashboard/app.py
|
||||||
|
|
||||||
Restart=on-failure
|
Restart=on-failure
|
||||||
RestartSec=3
|
RestartSec=3
|
||||||
@@ -20,12 +23,10 @@ TimeoutStopSec=30
|
|||||||
|
|
||||||
NoNewPrivileges=true
|
NoNewPrivileges=true
|
||||||
PrivateTmp=true
|
PrivateTmp=true
|
||||||
ProtectSystem=strict
|
|
||||||
ProtectHome=true
|
|
||||||
ReadWritePaths=/opt/rpa_vision_v3/data /opt/rpa_vision_v3/logs
|
|
||||||
|
|
||||||
StandardOutput=journal
|
StandardOutput=journal
|
||||||
StandardError=journal
|
StandardError=journal
|
||||||
|
SyslogIdentifier=rpa-vision-v3-dashboard
|
||||||
|
|
||||||
[Install]
|
[Install]
|
||||||
WantedBy=multi-user.target
|
WantedBy=multi-user.target
|
||||||
@@ -8,9 +8,9 @@ OnFailure=rpa-vision-v3-recover.service
|
|||||||
|
|
||||||
[Service]
|
[Service]
|
||||||
Type=oneshot
|
Type=oneshot
|
||||||
WorkingDirectory=/opt/rpa_vision_v3
|
WorkingDirectory=/home/dom/ai/rpa_vision_v3
|
||||||
EnvironmentFile=/etc/rpa_vision_v3/rpa_vision_v3.env
|
EnvironmentFile=/home/dom/ai/rpa_vision_v3/.env.local
|
||||||
ExecStart=/opt/rpa_vision_v3/server/healthcheck.sh
|
ExecStart=/home/dom/ai/rpa_vision_v3/server/healthcheck.sh
|
||||||
|
|
||||||
[Install]
|
[Install]
|
||||||
WantedBy=multi-user.target
|
WantedBy=multi-user.target
|
||||||
@@ -5,4 +5,4 @@ Description=RPA Vision V3 - Recover stack (restart services)
|
|||||||
Type=oneshot
|
Type=oneshot
|
||||||
# Important: nécessite root pour systemctl
|
# Important: nécessite root pour systemctl
|
||||||
User=root
|
User=root
|
||||||
ExecStart=/bin/bash -lc 'systemctl restart rpa-vision-v3-api.service rpa-vision-v3-dashboard.service rpa-vision-v3-worker.service || true'
|
ExecStart=/bin/bash -lc 'systemctl restart rpa-streaming.service rpa-vision-v3-api.service rpa-vision-v3-dashboard.service rpa-vision-v3-worker.service || true'
|
||||||
|
|||||||
@@ -5,12 +5,15 @@ Wants=network-online.target
|
|||||||
|
|
||||||
[Service]
|
[Service]
|
||||||
Type=simple
|
Type=simple
|
||||||
User=rpa
|
User=dom
|
||||||
Group=rpa
|
Group=dom
|
||||||
WorkingDirectory=/opt/rpa_vision_v3/server
|
WorkingDirectory=/home/dom/ai/rpa_vision_v3
|
||||||
EnvironmentFile=/etc/rpa_vision_v3/rpa_vision_v3.env
|
EnvironmentFile=/home/dom/ai/rpa_vision_v3/.env.local
|
||||||
Environment="PYTHONUNBUFFERED=1"
|
Environment="PYTHONUNBUFFERED=1"
|
||||||
ExecStart=/opt/rpa_vision_v3/venv_v3/bin/python worker_daemon.py
|
# Service grounding persistant — socket + répertoire d'images partagés via /run/rpa/.
|
||||||
|
Environment="RPA_GROUNDING_SOCKET=/run/rpa/grounding.sock"
|
||||||
|
Environment="RPA_GROUNDING_IMG_DIR=/run/rpa"
|
||||||
|
ExecStart=/home/dom/ai/rpa_vision_v3/.venv/bin/python3 server/worker_daemon.py
|
||||||
|
|
||||||
Restart=on-failure
|
Restart=on-failure
|
||||||
RestartSec=3
|
RestartSec=3
|
||||||
@@ -18,12 +21,14 @@ TimeoutStopSec=60
|
|||||||
|
|
||||||
NoNewPrivileges=true
|
NoNewPrivileges=true
|
||||||
PrivateTmp=true
|
PrivateTmp=true
|
||||||
ProtectSystem=strict
|
# /run/rpa/ partagé avec rpa-grounding (socket + images)
|
||||||
ProtectHome=true
|
RuntimeDirectory=rpa
|
||||||
ReadWritePaths=/opt/rpa_vision_v3/data /opt/rpa_vision_v3/logs
|
RuntimeDirectoryMode=0755
|
||||||
|
RuntimeDirectoryPreserve=yes
|
||||||
|
|
||||||
StandardOutput=journal
|
StandardOutput=journal
|
||||||
StandardError=journal
|
StandardError=journal
|
||||||
|
SyslogIdentifier=rpa-vision-v3-worker
|
||||||
|
|
||||||
[Install]
|
[Install]
|
||||||
WantedBy=multi-user.target
|
WantedBy=multi-user.target
|
||||||
@@ -1,4 +1,11 @@
|
|||||||
# /etc/rpa_vision_v3/rpa_vision_v3.env
|
# /home/dom/ai/rpa_vision_v3/.env.local
|
||||||
|
# Chargé par tous les services systemd via EnvironmentFile=
|
||||||
|
#
|
||||||
|
# IMPORTANT : format systemd EnvironmentFile
|
||||||
|
# - Pas de "export" devant les variables
|
||||||
|
# - Pas de guillemets autour des valeurs (sauf si espaces)
|
||||||
|
# - Commentaires avec #
|
||||||
|
# - Une variable par ligne : CLE=valeur
|
||||||
|
|
||||||
# --- Secrets (OBLIGATOIRES en prod) ---
|
# --- Secrets (OBLIGATOIRES en prod) ---
|
||||||
ENCRYPTION_PASSWORD=CHANGE_ME
|
ENCRYPTION_PASSWORD=CHANGE_ME
|
||||||
@@ -7,33 +14,45 @@ SECRET_KEY=CHANGE_ME
|
|||||||
# --- Runtime ---
|
# --- Runtime ---
|
||||||
ENVIRONMENT=production
|
ENVIRONMENT=production
|
||||||
|
|
||||||
# --- Fiche #24 - Observabilité ---
|
# --- Token API fixe (streaming server + agent) ---
|
||||||
# Label Prometheus (surcouche). En prod, les unités systemd posent déjà une valeur par service.
|
# Générer avec : python3 -c "import secrets; print(secrets.token_hex(32))"
|
||||||
# RPA_SERVICE_NAME=rpa-vision-v3
|
# OBLIGATOIRE : si vide en prod, le serveur de streaming refuse de démarrer
|
||||||
|
# (fail-closed P0-C). Pour désactiver l'auth en dev local : RPA_AUTH_DISABLED=true
|
||||||
|
RPA_API_TOKEN=CHANGE_ME
|
||||||
|
|
||||||
# Worker mode:
|
# --- Auth dashboard Flask (port 5001, Fix P0-A) ---
|
||||||
|
# HTTP Basic Auth obligatoire sur tous les endpoints sauf healthchecks.
|
||||||
|
# OBLIGATOIRE en prod. Pour désactiver en dev : DASHBOARD_AUTH_DISABLED=true
|
||||||
|
DASHBOARD_USER=lea
|
||||||
|
DASHBOARD_PASSWORD=CHANGE_ME
|
||||||
|
|
||||||
|
# --- Worker mode ---
|
||||||
# thread -> worker intégré à l'API
|
# thread -> worker intégré à l'API
|
||||||
# external -> worker dans rpa-vision-v3-worker.service (recommandé prod)
|
# external -> worker dans rpa-vision-v3-worker.service (recommandé prod)
|
||||||
# disabled -> API upload only
|
# disabled -> API upload only
|
||||||
RPA_PROCESSING_WORKER=external
|
RPA_PROCESSING_WORKER=external
|
||||||
|
|
||||||
# Ports (healthcheck.sh les utilise)
|
# --- Ports (healthcheck.sh les utilise) ---
|
||||||
RPA_API_HOST=127.0.0.1
|
RPA_API_HOST=127.0.0.1
|
||||||
RPA_API_PORT=8000
|
RPA_API_PORT=8000
|
||||||
RPA_DASHBOARD_HOST=127.0.0.1
|
RPA_DASHBOARD_HOST=127.0.0.1
|
||||||
RPA_DASHBOARD_PORT=5001
|
RPA_DASHBOARD_PORT=5001
|
||||||
RPA_CHECK_DASHBOARD=1
|
RPA_CHECK_DASHBOARD=1
|
||||||
|
|
||||||
# Worker heartbeat (si worker external)
|
# --- Worker heartbeat ---
|
||||||
RPA_WORKER_HEARTBEAT_PATH=data/runtime/health/worker_heartbeat.json
|
RPA_WORKER_HEARTBEAT_PATH=data/runtime/health/worker_heartbeat.json
|
||||||
RPA_WORKER_HEARTBEAT_MAX_AGE_S=60
|
RPA_WORKER_HEARTBEAT_MAX_AGE_S=60
|
||||||
|
|
||||||
# Retention / rotation
|
# --- Retention / rotation ---
|
||||||
RPA_DATA_DIR=data
|
RPA_DATA_DIR=data
|
||||||
RPA_RETENTION_FAILURE_CASES_DAYS=14
|
RPA_RETENTION_FAILURE_CASES_DAYS=14
|
||||||
RPA_RETENTION_ARCHIVE_FAILURE_CASES=true
|
RPA_RETENTION_ARCHIVE_FAILURE_CASES=true
|
||||||
RPA_RETENTION_WATCHDOG_DAYS=7
|
RPA_RETENTION_WATCHDOG_DAYS=7
|
||||||
RPA_RETENTION_GUARD_REPORTS_DAYS=30
|
RPA_RETENTION_GUARD_REPORTS_DAYS=30
|
||||||
|
|
||||||
# Healthcheck - disque
|
# --- Healthcheck - disque ---
|
||||||
RPA_MIN_FREE_MB=1024
|
RPA_MIN_FREE_MB=1024
|
||||||
|
|
||||||
|
# --- VLM (modèle de vision local) ---
|
||||||
|
RPA_VLM_MODEL=qwen3-vl:8b
|
||||||
|
VLM_MODEL=qwen3-vl:8b
|
||||||
|
|||||||
897
docs/AUDIT_20260404.md
Normal file
@@ -0,0 +1,897 @@
|
|||||||
|
# Full Audit: RPA Vision V3

**Date**: 4 April 2026
**Auditor**: Claude Sonnet 4.6 + 5 specialized exploration agents
**Scope**: Entire project (source code, tests, security, deployment, quality)
**Environment**: Ubuntu 24.04, Python 3.12.3, NVIDIA RTX 5070 (12 GB VRAM)

---

## Table of contents

1. [Executive summary](#1-executive-summary)
2. [Key metrics](#2-key-metrics)
3. [Architecture](#3-architecture)
4. [Core modules: detailed analysis](#4-core-modules-detailed-analysis)
5. [Web components](#5-web-components)
6. [Agent V0/V1: Streaming](#6-agent-v0v1-streaming)
7. [Tests](#7-tests)
8. [Security](#8-security)
9. [Deployment & infrastructure](#9-deployment--infrastructure)
10. [Code quality](#10-code-quality)
11. [Performance](#11-performance)
12. [Dependency management](#12-dependency-management)
13. [Documentation](#13-documentation)
14. [Disk space](#14-disk-space)
15. [Strengths](#15-strengths)
16. [Weaknesses & risks](#16-weaknesses--risks)
17. [Recommendations](#17-recommendations)
18. [Overall score](#18-overall-score)

---

## 1. Executive summary

RPA Vision V3 is an RPA automation system based 100% on vision (no accessibility APIs, no DOM selectors). It uses CLIP, FAISS, Ollama (local VLM), SomEngine (YOLO + docTR), and template matching to identify and interact with interface elements.

**Status**: Phase 0 complete, Phase 1 (streaming agent) being stabilized.
**Maturity**: Advanced prototype / pre-production.
**Main risk**: Production tokens hardcoded in the source code.

The project is functional: visual replay works on Windows, the VWB can be used to build workflows, and the monitoring dashboard is operational. However, technical debt is accumulating (monolithic files, 47 GB of duplicated venvs, dead code), and critical security flaws must be fixed before any production rollout.

---

## 2. Key metrics

### Code volume

| Metric | Value |
|--------|-------|
| Python files (excluding venvs/archives) | 1,094 |
| Source lines of code | 190,382 |
| Test lines | 63,114 |
| TypeScript/JavaScript lines (frontend) | 39,868 (103 files) |
| **Total lines of code** | **~293,000** |
| Test/source ratio | 33.2% |
| Commits | 123 |
| Sole contributor | Dom |
| Development period | 7 Jan → 4 April 2026 (88 days) |

### Source code breakdown by module

| Module | Lines | % of total |
|--------|-------|------------|
| `core/` | 74,555 | 39.2% |
| `visual_workflow_builder/` | 45,830 | 24.1% |
| `agent_v0/` | 23,637 | 12.4% |
| `scripts/` | 16,525 | 8.7% |
| `deploy/` | 7,097 | 3.7% |
| `agent_chat/` | 6,937 | 3.6% |
| `examples/` | 4,510 | 2.4% |
| `server/` | 2,897 | 1.5% |
| `web_dashboard/` | 2,430 | 1.3% |
| Other (cli, gui, i18n, etc.) | 5,964 | 3.1% |

### core/ submodules (top 10 by size)

| Submodule | Lines | Role |
|-----------|-------|------|
| `execution/` | 12,503 | Action execution, DAG, target resolver |
| `visual/` | 5,493 | Screen analyzer, SomEngine, visual matching |
| `analytics/` | 5,230 | Metrics, reports, statistics |
| `workflow/` | 4,328 | Workflow management, scheduler |
| `detection/` | 4,202 | UI detector, Ollama client, VLM config |
| `models/` | 3,492 | Data models (workflow graph, etc.) |
| `security/` | 3,365 | API tokens, rate limiting, audit trail |
| `embedding/` | 2,914 | CLIP embedder, FAISS manager |
| `system/` | 2,862 | Safety switch, auto-heal, hooks |
| `corrections/` | 2,780 | BBOX corrections, sniper mode |

---

## 3. Architecture

### 5-layer architecture

```
RawSession → ScreenState → UIElement → StateEmbedding → WorkflowGraph
   (1)           (2)          (3)            (4)             (5)
```

1. **RawSession**: Raw capture (screenshots + mouse/keyboard events)
2. **ScreenState**: Analyzed screen state (detected elements, OCR)
3. **UIElement**: Identified interface elements (buttons, fields, menus)
4. **StateEmbedding**: CLIP/FAISS vectors for similarity search
5. **WorkflowGraph**: Executable workflow graph

### Services (8 services, managed by `svc.sh`)

| Port | Service | Type | Framework |
|------|---------|------|-----------|
| 8000 | API Server (upload/processing) | required | FastAPI |
| 5001 | Web Dashboard | required | Flask + SocketIO |
| 5002 | VWB Backend | required | Flask + SQLAlchemy |
| 5003 | Monitoring | optional | Flask |
| 5004 | Agent Chat | optional | Flask + SocketIO |
| 5005 | Streaming Server (Agent V1) | optional | FastAPI |
| 5099 | Worker (polling) | optional | Python script |
| 3002 | VWB Frontend | required | React 19 + Vite |

### Entry points

| File | Role |
|------|------|
| `run.sh` | Orchestrator: launches components according to flags |
| `svc.sh` | Service manager (systemd + legacy PID) |
| `cli.py` | Interactive CLI (660 lines) |
| `services.conf` | Source of truth for ports and commands |

### Main flow diagram

```
[Agent V1 Windows]
    ↓ (capture screenshots + events)
    ↓ HTTP POST /upload_batch
[Streaming Server :5005]
    ↓ stream_processor.py
    ↓ (ScreenAnalyzer → CLIP → FAISS → GraphBuilder)
[Core Pipeline]
    ↓ build_replay() → resolve_target()
    ↓ (SomEngine → VLM grounding → template matching)
[Replay Engine]
    ↓ HTTP → Agent V1
    ↓ executor.py
[Agent V1 Windows]
    ↓ PyAutoGUI (Bézier mouse + char-by-char typing)
```

---

## 4. Core modules: detailed analysis

### 4.1 Detection (`core/detection/`, 10 files, 4,202 lines)

| File | Lines | Role |
|------|-------|------|
| `ui_detector.py` | ~800 | Main detector (CLIP + template matching) |
| `ollama_client.py` | ~600 | Ollama client for the VLM (gemma4:e4b) |
| `vlm_config.py` | ~200 | VLM configuration (model, endpoint) |
| `screen_analyzer.py` | ~500 | Full analysis of a screenshot |
| `som_engine.py` | ~315 | Set-of-Mark (YOLO + docTR), thread-safe singleton |
| `owl_detector.py` | ~300 | OWL-ViT v2 for zero-shot detection |
| `template_matcher.py` | ~400 | OpenCV template matching |

**Resolution strategy** (cascade):
1. **VLM grounding** (Qwen2.5-VL on GPU): for elements with OCR text
2. **Template matching** (OpenCV): for icons without text
3. **SomEngine + VLM**: multi-step fallback

**Heavy imports**: `torch`, `transformers`, `open_clip_torch`, `cv2`, `PIL`

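The resolution cascade described in section 4.1 can be pictured as an ordered list of strategies where the first one to return a match wins. The sketch below only illustrates that control flow: `Match`, `resolve_with_cascade`, and the three stub strategies are hypothetical names, not the project's actual `target_resolver.py` API, and the stubs return fixed coordinates instead of calling a VLM, OpenCV, or SomEngine.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Sequence


@dataclass(frozen=True)
class Match:
    """Hypothetical result type: screen coordinates plus the strategy that found them."""
    x: int
    y: int
    strategy: str


# A strategy receives a target description and returns a Match, or None on failure.
Strategy = Callable[[dict], Optional[Match]]


def resolve_with_cascade(target: dict, strategies: Sequence[Strategy]) -> Optional[Match]:
    """Try each strategy in order and return the first successful match."""
    for strategy in strategies:
        match = strategy(target)
        if match is not None:
            return match
    return None


# Stand-ins for the three stages listed in the audit (assumed behavior, fixed outputs).
def vlm_grounding(target: dict) -> Optional[Match]:
    # Applicable only when the element carries OCR text.
    if target.get("ocr_text"):
        return Match(100, 200, "vlm_grounding")
    return None


def template_matching(target: dict) -> Optional[Match]:
    # Applicable only when a reference crop of the icon is available.
    if target.get("template"):
        return Match(50, 60, "template_matching")
    return None


def som_engine_vlm(target: dict) -> Optional[Match]:
    # Last-resort fallback; this stub always reports failure.
    return None


CASCADE = (vlm_grounding, template_matching, som_engine_vlm)
```

Keeping the order in a single tuple makes the priority explicit and lets a test swap in cheaper strategies without touching the loop.
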
### 4.2 Execution (`core/execution/`, 15 files, 12,503 lines)

**Critical files**:

| File | Lines | Role |
|------|-------|------|
| `target_resolver.py` | 3,495 | Multi-strategy target resolution |
| `execution_loop.py` | 1,361 | Main execution loop |
| `action_executor.py` | 1,171 | Executor for individual actions |
| `dag_executor.py` | ~800 | DAG execution (parallel workflows) |
| `llm_actions.py` | ~600 | LLM actions (analysis, translation, extraction) |
| `memory_cache.py` | 1,059 | In-memory cache for optimization |

**⚠️ `target_resolver.py`** is the most complex file in the core. It implements 5+ resolution strategies: OCR text, visual anchoring, template matching, SomEngine, VLM grounding. Its maintainability should be watched closely.

**⚠️ `dag_executor.py:532`** uses `eval()` to evaluate workflow conditions:
```python
result = bool(eval(condition, {"__builtins__": {}}, eval_context))
```
Setting `__builtins__` to `{}` reduces the risk but does not eliminate it: a bypass remains possible through attribute access on any object in scope, for example `().__class__.__base__.__subclasses__()`.

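One common way to avoid `eval()` for workflow conditions is to parse the expression with the standard `ast` module and interpret only a small whitelist of node types. The sketch below is an illustration under assumptions, not the project's code: `safe_condition` is a hypothetical name, and it supports only constants, variable names from the context, comparisons, `and`/`or`, and `not`. Whether that subset covers the conditions actually used in `dag_executor.py` is not known from the audit.

```python
import ast
import operator

# Whitelisted comparison operators; anything else is rejected.
_COMPARE = {
    ast.Eq: operator.eq, ast.NotEq: operator.ne,
    ast.Lt: operator.lt, ast.LtE: operator.le,
    ast.Gt: operator.gt, ast.GtE: operator.ge,
}


def safe_condition(expression: str, context: dict) -> bool:
    """Evaluate a restricted boolean expression without calling eval()."""
    tree = ast.parse(expression, mode="eval")
    return bool(_evaluate(tree.body, context))


def _evaluate(node, context):
    if isinstance(node, ast.Constant):
        return node.value
    if isinstance(node, ast.Name):
        if node.id not in context:
            raise ValueError(f"unknown variable: {node.id}")
        return context[node.id]
    if isinstance(node, ast.BoolOp):
        # Operands are evaluated eagerly (no short-circuit); fine for pure lookups.
        values = [_evaluate(value, context) for value in node.values]
        return all(values) if isinstance(node.op, ast.And) else any(values)
    if isinstance(node, ast.UnaryOp) and isinstance(node.op, ast.Not):
        return not _evaluate(node.operand, context)
    if isinstance(node, ast.Compare):
        left = _evaluate(node.left, context)
        for op, comparator in zip(node.ops, node.comparators):
            if type(op) not in _COMPARE:
                raise ValueError(f"operator not allowed: {type(op).__name__}")
            right = _evaluate(comparator, context)
            if not _COMPARE[type(op)](left, right):
                return False
            left = right
        return True
    # Attribute access, calls, subscripts, lambdas, etc. are all refused.
    raise ValueError(f"expression not allowed: {type(node).__name__}")
```

Because attribute access and calls are never interpreted, the `__subclasses__` style of escape is rejected at parse-tree level instead of relying on an emptied `__builtins__`.
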
### 4.3 GPU (`core/gpu/`, 6 files, 1,735 lines)

| File | Role |
|------|------|
| `gpu_resource_manager.py` | GPU orchestrator (RECORDING/AUTOPILOT/IDLE modes) |
| `ollama_manager.py` | Lifecycle management of Ollama models (async) |
| `clip_manager.py` | CLIP model management (lazy load, GPU↔CPU) |

**GPU architecture**:
- **RECORDING** mode: VLM on GPU, CLIP on CPU
- **AUTOPILOT** mode: VLM unloaded, CLIP on GPU
- CLIP VRAM threshold: 1,024 MB
- Inactivity timeout: 300 s

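The mode table above can be read as a mapping from mode to model placement. The following sketch encodes it, with several assumptions labeled in the comments: the audit does not say what IDLE does, nor how the 1,024 MB threshold or the 300 s timeout are applied, so the interpretations here (and the names `GpuMode`, `placement`, `should_go_idle`) are hypothetical.

```python
from enum import Enum


class GpuMode(Enum):
    RECORDING = "recording"
    AUTOPILOT = "autopilot"
    IDLE = "idle"


CLIP_VRAM_THRESHOLD_MB = 1024  # value quoted in the audit
IDLE_TIMEOUT_S = 300           # value quoted in the audit


def placement(mode: GpuMode, free_vram_mb: int) -> dict:
    """Return where each model would live for a given mode."""
    if mode is GpuMode.RECORDING:
        return {"vlm": "gpu", "clip": "cpu"}
    if mode is GpuMode.AUTOPILOT:
        # Assumption: CLIP moves to the GPU only if enough VRAM is free.
        clip = "gpu" if free_vram_mb >= CLIP_VRAM_THRESHOLD_MB else "cpu"
        return {"vlm": "unloaded", "clip": clip}
    # Assumption: IDLE keeps nothing on the GPU.
    return {"vlm": "unloaded", "clip": "cpu"}


def should_go_idle(seconds_since_last_use: float) -> bool:
    """Assumption: the timeout triggers a switch to IDLE after 300 s without activity."""
    return seconds_since_last_use >= IDLE_TIMEOUT_S
```
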
### 4.4 Authentication (`core/auth/`, 5 files, 1,223 lines)

| File | Role |
|------|------|
| `credential_vault.py` | Encrypted vault (Fernet AES + PBKDF2, 600k iterations) |
| `totp_generator.py` | TOTP per RFC 6238 (30 s, 6 digits) |
| `auth_handler.py` | Multi-factor authentication orchestration |

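For reference, a TOTP generator with the parameters quoted in the table (30-second step, 6 digits) fits in a few lines of standard-library Python. This is a generic sketch of RFC 6238 with HMAC-SHA1, not a copy of `totp_generator.py`; the function name and signature are assumptions.

```python
import hashlib
import hmac
import struct
import time
from typing import Optional


def totp(secret: bytes, timestamp: Optional[float] = None,
         step: int = 30, digits: int = 6) -> str:
    """Compute an RFC 6238 TOTP code using HMAC-SHA1."""
    if timestamp is None:
        timestamp = time.time()
    # The moving factor is the number of whole time steps since the Unix epoch.
    counter = int(timestamp // step)
    digest = hmac.new(secret, struct.pack(">Q", counter), hashlib.sha1).digest()
    # Dynamic truncation (RFC 4226): the low nibble of the last byte picks the offset.
    offset = digest[-1] & 0x0F
    code = struct.unpack(">I", digest[offset:offset + 4])[0] & 0x7FFFFFFF
    return str(code % 10 ** digits).zfill(digits)
```

With the RFC 6238 test secret `b"12345678901234567890"` and a timestamp of 59, the RFC's 8-digit value is `94287082`, so the 6-digit code is its last six digits, `287082`.
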
**⚠️ Insecure fallback**: if `cryptography` is not installed, the vault falls back to plain base64 encoding, which provides no confidentiality.

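A fail-closed alternative to the base64 fallback is to refuse to start when the encryption backend cannot be imported. The sketch below shows the pattern only; `VaultUnavailableError` and `load_cipher_backend` are hypothetical names, and how `credential_vault.py` actually performs its import is not shown in the audit.

```python
import importlib


class VaultUnavailableError(RuntimeError):
    """Raised when the encryption backend is missing (hypothetical error type)."""


def load_cipher_backend(module_name: str = "cryptography.fernet"):
    """Import the encryption module or fail loudly; never degrade to an encoding."""
    try:
        return importlib.import_module(module_name)
    except ImportError as exc:
        raise VaultUnavailableError(
            f"{module_name} is required for the credential vault; "
            "refusing to store secrets without encryption"
        ) from exc
```

The module name is a parameter only so the behavior can be exercised in tests without depending on whether `cryptography` is installed.
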
### 4.5 Federation (`core/federation/`, 3 files, 1,339 lines)

Export/import of anonymized LearningPacks between instances. Global FAISS merge. Dedicated REST endpoints.

### 4.6 Graph Builder (`core/graph/`, 4 files, 1,949 lines)

Builds the WorkflowGraph from recording sessions. `graph_builder.py` (1,616 lines) accepts `precomputed_states` to skip ScreenAnalyzer.

### 4.7 Other notable modules

| Module | Files | Lines | Role |
|--------|-------|-------|------|
| `healing/` | 13 | 2,343 | Self-correction, learning packs |
| `monitoring/` | 8 | 1,967 | Triggers, chain manager, scheduler |
| `security/` | 10 | 3,365 | API tokens, rate limiting, audit trail |
| `pipeline/` | 4 | 1,695 | Main processing pipeline |
| `training/` | 6 | 1,999 | Training and adaptation |
| `analytics/` | 25 | 5,230 | Reporting, metrics, dashboard data |

---

## 5. Web components

### 5.1 Visual Workflow Builder (VWB)

**Backend** (`visual_workflow_builder/backend/`):
- Framework: Flask + SQLAlchemy + Flask-SocketIO
- Database: `workflows.db` (SQLite)
- Main routes: `catalog_routes_v2_vlm.py` (2,836 lines, **monolithic**)
- API v3: `dag_execute.py` (1,058 lines), `execute.py` (1,173 lines)
- VLM provider: `vlm_provider.py`, an Ollama interface for visual detection
- Available actions: 15+ categories (data, intelligence, navigation, validation, vision_ui)

**Frontend**:
- Framework: React 19 + TypeScript + MUI 7 + Redux Toolkit
- Flow editor: `@xyflow/react` v12
- WebSocket: `socket.io-client`
- 103 TS/TSX files (39,868 lines)
- **⚠️ 2 frontend directories**: `frontend/` (1.3 GB with node_modules) and `frontend_v4/` (79 MB)

### 5.2 Web Dashboard (`web_dashboard/`)

- Framework: Flask + SocketIO
- Single file: `app.py` (2,430 lines, **monolithic**)
- 65 Flask routes
- Features: session monitoring, replay, metrics, streaming proxy
- **⚠️ `cors_allowed_origins="*"`**: no CORS restriction

### 5.3 Agent Chat (`agent_chat/`)

- Framework: Flask + SocketIO (6,937 lines, 8 files)
- `app.py` (2,570 lines, **monolithic**)
- `autonomous_planner.py`: autonomous workflow planning
- Conversational interface for driving the RPA

---

## 6. Agent V0/V1: Streaming

### 6.1 Agent V1 client (`agent_v0/agent_v1/`)

Deployed on the target Windows machine. Lightweight, no GPU.

| File | Role |
|------|------|
| `main.py` | Entry point, configuration |
| `core/executor.py` | Action execution (PyAutoGUI, Bézier, char-by-char) |
| `vision/capturer.py` | Screenshot capture (mss) |
| `network/streamer.py` | Streaming to the server (HTTP batch upload) |
| `ui/notifications.py` | User notifications |
| `window_info_crossplatform.py` | Active window info (Windows/Linux) |

### 6.2 Streaming server (`agent_v0/server_v1/`)

Runs on the GPU server (RTX 5070).

| File | Lines | Role |
|------|-------|------|
| `api_stream.py` | **5,612** | FastAPI API (27 endpoints) + replay + resolution + admin |
| `stream_processor.py` | **4,656** | Central orchestrator (analysis, CLIP, FAISS, graph) |
| `live_session_manager.py` | ~600 | In-memory session management |
| `worker_stream.py` | ~400 | Polling worker + direct API |
| `replay_failure_logger.py` | ~200 | Replay failure logger |
| `vm_controller.py` | ~150 | VM control (virsh) |

**⚠️ `api_stream.py` and `stream_processor.py`** total **10,268 lines** between them. These are the two most urgent files to split up.

---

## 7. Tests

### 7.1 Overview

| Metric | Value |
|--------|-------|
| Tests collected (excluding property) | 1,463 |
| Tests passed | **1,401** |
| Tests failed | **9** |
| Tests skipped | 43 |
| Tests xfailed | 4 |
| Tests xpassed | 1 |
| Total duration | 318 s (~5 min 18 s) |
| **Success rate** | **95.8%** of collected tests (99.4% of passed + failed) |

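The two success rates in the table can be reproduced from the raw counts. The short calculation below uses only the figures from the table; the variable names are illustrative.

```python
# Counts taken from the overview table.
collected, passed, failed = 1463, 1401, 9
skipped, xfailed, xpassed = 43, 4, 1

# Share of all collected tests that passed.
rate_of_collected = 100 * passed / collected

# Share of tests with a pass/fail verdict that passed (skips and xfail/xpass left out).
rate_of_decided = 100 * passed / (passed + failed)

# For comparison: leaving out skipped tests only.
rate_excluding_skips = 100 * passed / (collected - skipped)

# Sum of the five outcome rows, to compare with the collected total.
accounted_for = passed + failed + skipped + xfailed + xpassed
```

Rounded to one decimal, these give 95.8%, 99.4%, and 98.7% respectively, so the 99.4% figure corresponds to passed / (passed + failed). The five outcome rows add up to 1,458, which is 5 fewer than the 1,463 collected; the table does not say where the difference comes from.
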
### 7.2 Répartition des fichiers de test
|
||||||
|
|
||||||
|
| Catégorie | Fichiers | Rôle |
|
||||||
|
|-----------|----------|------|
|
||||||
|
| `unit/` | 70 | Tests unitaires isolés |
|
||||||
|
| `integration/` | 47 | Tests d'intégration (services, API) |
|
||||||
|
| `smoke/` | 1 | Smoke test E2E minimal |
|
||||||
|
| `performance/` | 1 | Benchmarks |
|
||||||
|
| `property/` | 7 | Tests basés sur propriétés (Hypothesis) — **CASSÉS** |
|
||||||
|
| Racine `tests/` | 10 | Tests E2E pipeline, correction packs, coaching |
|
||||||
|
| `utils/` | 1 | Utilitaires de test |
|
||||||
|
|
||||||
|
### 7.3 Tests en échec (9 tests)
|
||||||
|
|
||||||
|
| Test | Raison |
|
||||||
|
|------|--------|
|
||||||
|
| `test_diagnostic_actions_manquantes_vwb` (×3) | Actions VWB manquantes dans le catalogue |
|
||||||
|
| `test_fiche11_multi_anchor_constraints` (×1) | Déterminisme tie-breaking non garanti |
|
||||||
|
| `test_vwb_actions_09jan2026` (×5) | Mock executor obsolète |
|
||||||
|
|
||||||
|
### 7.4 Uncollectable tests (collection errors)

| File | Error |
|------|-------|
| `tests/property/*.py` (7 files) | Broken imports (modules removed or renamed) |
| `tests/integration/test_visual_rpa_checkpoint.py` | Imports `VisualMetadata`, which does not exist |

### 7.5 Coverage by core module

| Module | Coverage | Module | Coverage |
|--------|----------|--------|----------|
| `models/` | Excellent (129 imports) | `execution/` | Excellent (50 imports) |
| `workflow/` | Excellent (49 imports) | `capture/` | Good (29 imports) |
| `visual/` | Good (21 imports) | `detection/` | Good (19 imports) |
| `embedding/` | Good (18 imports) | `pipeline/` | Good (23 imports) |
| `healing/` | Moderate (10 imports) | `analytics/` | Moderate (11 imports) |
| `auth/` | Low (3 imports) | `security/` | Very low (1 import) |
| `gpu/` | Very low (2 imports) | `extraction/` | Very low (2 imports) |
| **`supervision/`** | **NONE** | **`matching/`** | **NONE** |
| **`variants/`** | **NONE** | | |

3 of the 31 modules have **no tests**: `supervision`, `matching`, `variants`.

### 7.6 pytest configuration

```ini
testpaths = tests
addopts = -q --tb=short --strict-markers
markers = unit, integration, performance, slow, smoke, fiche1..fiche10
filterwarnings = ignore::DeprecationWarning
```

**⚠️ The Makefile points to `venv_v3/bin/pytest`** instead of `.venv/bin/pytest` (the active venv).

### 7.7 Underused pytest markers

Only 6 of the 10 `fiche` markers are actually used (`fiche4`, `fiche6`, `fiche7`, `fiche8`, `fiche9`, `fiche10`). The `fiche1`, `fiche2`, `fiche3`, and `fiche5` markers are declared but never applied to any test.

---

## 8. Security

### 8.1 CRITICAL vulnerabilities

#### 🔴 Cloud API keys in plaintext in `.env.local`

**File**: `.env.local` (gitignored, but present on disk)

The file contains, in plaintext:
- `ANTHROPIC_API_KEY=sk-ant-api03-...` (full Anthropic key)
- `OPENAI_API_KEY=sk-proj-...` (full OpenAI key)
- `GOOGLE_API_KEY=AIzaSy...` (full Google key)
- `DEEPSEEK_API_KEY=3d7b...` (full Deepseek key)
- `ENCRYPTION_PASSWORD`, `SECRET_KEY`, `RPA_TOKEN_ADMIN`, `AUTOHEAL_ADMIN_TOKEN`, `RPA_API_TOKEN`

**Impact**: If the disk is compromised or the file leaks (backup, copy), every cloud key is exposed. The Anthropic and OpenAI keys carry a direct financial cost.

**Remediation**:
- Revoke and regenerate all cloud keys immediately
- Use a secrets manager (Vault, system credential stores)
- At a minimum, set permissions with `chmod 600` and restrict ownership to `dom:dom`

#### 🔴 Hardcoded production tokens

**File**: `core/security/api_tokens.py:93-94`

```python
# Temporary fix: Add production tokens directly
prod_admin_token = "73cf0db73f9a5064e79afebba96c85338be65cc2060b9c1d42c3ea5dd7d4e490"
prod_readonly_token = "7eea1de415cc69c02381ce09ff63aeebf3e1d9b476d54aa6730ba9de849e3dc6"
```

These tokens, one of them an **admin** token, are in the source code and visible in git. They give full access to the streaming API (port 5005), which is exposed on the Internet via `lea.labs.laurinebazin.design`.

**Impact**: An attacker can take full control of the RPA agent and execute arbitrary actions on the target machine.

**Immediate remediation**: Revoke these tokens, generate new ones, and load them from `.env` instead of the source code.

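As a minimal sketch of the remediation, tokens can be read from the environment at startup, with a hard failure when one is missing. The helper names `load_required_token` and `is_admin` are hypothetical and do not come from `api_tokens.py`; only the variable name `RPA_TOKEN_ADMIN` appears in the audit.

```python
import hmac
import os


def load_required_token(name: str) -> str:
    """Return the token stored in environment variable `name`, or fail loudly."""
    value = os.environ.get(name, "").strip()
    if not value:
        # No fallback value: a missing token must stop the service from starting.
        raise RuntimeError(f"{name} is not set; refusing to start without it")
    return value


def is_admin(presented: str, expected: str) -> bool:
    """Compare two tokens in constant time to avoid timing side channels."""
    return hmac.compare_digest(presented.encode("utf-8"), expected.encode("utf-8"))
```

Failing at startup, rather than falling back to a built-in value, keeps a misconfigured deployment from silently running with a known token.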
#### 🔴 `eval()` in the DAG executor

**File**: `core/execution/dag_executor.py:532`

```python
result = bool(eval(condition, {"__builtins__": {}}, eval_context))
```

Even with `__builtins__: {}`, `eval()` can be bypassed through Python introspection. If `condition` comes from user input (workflow JSON), this is a code injection vulnerability.

**Remediation**: Replace it with a safe AST-based parser or a restricted grammar.

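One way to read "safe AST-based parser" is a small evaluator that walks the parsed expression and accepts only a whitelist of node types. The sketch below is an illustration under assumptions: the function name `evaluate_condition` is hypothetical, and the supported grammar (constants, context variables, comparisons, `and`/`or`/`not`) is a guess at what workflow conditions need, not the grammar the DAG executor actually uses.

```python
import ast
import operator

# Whitelisted comparison operators; any other operator is rejected.
_COMPARATORS = {
    ast.Eq: operator.eq, ast.NotEq: operator.ne,
    ast.Lt: operator.lt, ast.LtE: operator.le,
    ast.Gt: operator.gt, ast.GtE: operator.ge,
}


def evaluate_condition(condition: str, context: dict) -> bool:
    """Evaluate a restricted boolean expression against `context`."""
    tree = ast.parse(condition, mode="eval")
    return bool(_eval_node(tree.body, context))


def _eval_node(node, context):
    if isinstance(node, ast.Constant):
        return node.value
    if isinstance(node, ast.Name):
        if node.id not in context:
            raise ValueError(f"unknown variable: {node.id}")
        return context[node.id]
    if isinstance(node, ast.BoolOp):
        # Operands are all evaluated (no short-circuit), which is fine for pure lookups.
        values = [_eval_node(v, context) for v in node.values]
        return all(values) if isinstance(node.op, ast.And) else any(values)
    if isinstance(node, ast.UnaryOp) and isinstance(node.op, ast.Not):
        return not _eval_node(node.operand, context)
    if isinstance(node, ast.Compare):
        left = _eval_node(node.left, context)
        for op, comparator in zip(node.ops, node.comparators):
            if type(op) not in _COMPARATORS:
                raise ValueError(f"operator not allowed: {type(op).__name__}")
            right = _eval_node(comparator, context)
            if not _COMPARATORS[type(op)](left, right):
                return False
            left = right  # supports chained comparisons such as 0 < x < 10
        return True
    # Attribute access, calls, subscripts, lambdas, etc. all end up here.
    raise ValueError(f"expression not allowed: {type(node).__name__}")
```

Because attribute access and calls are never evaluated, introspection chains such as `().__class__` are rejected before any code runs.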
#### 🔴 Default signing key

**File**: `core/security/api_tokens.py:80`

```python
self.secret_key = os.getenv("TOKEN_SECRET_KEY", "dev-token-secret-change-in-production")
```

In production, if the environment variable is not set, the token signing key is a publicly known value.

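A fail-fast loader is one possible fix: it refuses both a missing variable and the known placeholder. This is a sketch, not the project's code; `load_secret_key` is a hypothetical helper and the 32-character minimum is an arbitrary illustrative threshold.

```python
import os

# The placeholder quoted in the finding; it must never be accepted as a real key.
_KNOWN_DEFAULT = "dev-token-secret-change-in-production"


def load_secret_key(env=os.environ) -> str:
    """Return TOKEN_SECRET_KEY, raising if it is missing, default, or too short."""
    key = env.get("TOKEN_SECRET_KEY", "")
    if not key or key == _KNOWN_DEFAULT:
        raise RuntimeError("TOKEN_SECRET_KEY must be set to a private value")
    if len(key) < 32:  # illustrative floor, adjust to the project's policy
        raise RuntimeError("TOKEN_SECRET_KEY is too short")
    return key
```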
### 8.2 HIGH vulnerabilities

#### 🟠 Unsafe `pickle.load()` deserialization

**Files**:
- `core/embedding/faiss_manager.py:517,534`
- `core/visual/visual_embedding_manager.py`

```python
with open(metadata_path, 'rb') as f:
    pickle.load(f)  # No restriction
```

Unrestricted `pickle.load()` allows arbitrary code execution when a `.pkl` file is compromised (here, the FAISS metadata file). An attacker who can place a malicious `.pkl` file in `data/embeddings/` gains code execution.

**Remediation**: Migrate the metadata to JSON or msgpack, or verify file integrity with an HMAC.

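Both remediation options can be combined: store metadata as JSON and attach an HMAC so a tampered file is rejected before it is used. The sketch below assumes the metadata is JSON-serializable; `save_metadata` and `load_metadata` are hypothetical names, and how the key is provisioned is left out.

```python
import hashlib
import hmac
import json
from pathlib import Path


def save_metadata(path: Path, metadata: dict, key: bytes) -> None:
    """Write metadata as JSON together with an HMAC-SHA256 tag."""
    payload = json.dumps(metadata, sort_keys=True)
    tag = hmac.new(key, payload.encode("utf-8"), hashlib.sha256).hexdigest()
    path.write_text(json.dumps({"hmac": tag, "payload": payload}), encoding="utf-8")


def load_metadata(path: Path, key: bytes) -> dict:
    """Read metadata, raising ValueError if the HMAC tag does not match."""
    envelope = json.loads(path.read_text(encoding="utf-8"))
    payload = envelope["payload"]
    expected = hmac.new(key, payload.encode("utf-8"), hashlib.sha256).hexdigest()
    if not hmac.compare_digest(envelope["hmac"], expected):
        raise ValueError("metadata integrity check failed")
    return json.loads(payload)
```

Unlike pickle, parsing JSON never executes code, so even a file that passes the integrity check can only produce plain data.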
#### 🟠 `shell=True` in subprocess calls (11 occurrences)

**File**: `agent_v0/server_v1/vm_controller.py` (10 occurrences)

```python
subprocess.run(f"virsh start {self.domain_name}", shell=True, check=True)
```

If `domain_name` is user-controlled, this is a command injection vulnerability.

**Other occurrences**:
- `web_dashboard/app.py:1851`: `lsof -ti :{port} | xargs -r kill`
- `visual_workflow_builder/backend/catalog_routes_v2_vlm.py:2181`: `os.system('echo ...')`

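Passing the command as an argument list removes the shell from the picture, so characters such as `;` or `$()` in `domain_name` are never interpreted. The sketch below is a hedged illustration: `virsh_command` and `run_virsh` are hypothetical helpers, and the allowed actions and the name pattern are assumptions rather than the controller's real policy.

```python
import re
import subprocess

# Assumed naming policy for libvirt domains; tighten or relax as needed.
_DOMAIN_RE = re.compile(r"[A-Za-z0-9_.+-]+")
_ALLOWED_ACTIONS = {"start", "shutdown", "destroy"}


def virsh_command(action: str, domain_name: str) -> list[str]:
    """Build a virsh argument list, rejecting suspicious input."""
    if action not in _ALLOWED_ACTIONS:
        raise ValueError(f"unsupported action: {action!r}")
    # A leading dash would be parsed by virsh as an option.
    if domain_name.startswith("-") or not _DOMAIN_RE.fullmatch(domain_name):
        raise ValueError(f"invalid domain name: {domain_name!r}")
    return ["virsh", action, domain_name]


def run_virsh(action: str, domain_name: str) -> None:
    # No shell=True: the list is passed to the OS as separate arguments.
    subprocess.run(virsh_command(action, domain_name), check=True)
```

The same list form applies to other one-off commands, for example `["xdg-open", path]` in place of an interpolated shell string.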
#### 🟠 `os.system()` with unsanitized variables

- `agent_v0/agent_v1/ui/smart_tray.py:557`: `os.system(f'xdg-open "{sessions_path}"')`

If `sessions_path` contains quotes or shell metacharacters, injection is possible.

#### 🟠 Permissive CORS

- `web_dashboard/app.py:41`: `cors_allowed_origins="*"` (accepts every origin)
- The streaming server uses a configurable allowlist (better)

#### 🟠 Logs containing partial tokens

**File**: `core/security/api_tokens.py:73-76`

```python
logger.info(f"RPA_TOKEN_ADMIN value: {admin_token[:8]}...")
```

The first 8 characters of the token are logged. That is not enough for a direct compromise, but it reduces the entropy an attacker still has to guess.

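If the log line exists only to confirm which token was loaded, a short hash fingerprint can serve the same purpose without printing any token characters. This is a sketch that assumes tokens are long random values (as the 64-hex-character tokens in this audit are); `token_fingerprint` and `log_token_loaded` are hypothetical helpers.

```python
import hashlib
import logging

logger = logging.getLogger(__name__)


def token_fingerprint(token: str) -> str:
    """Return the first 8 hex characters of the token's SHA-256 digest."""
    return hashlib.sha256(token.encode("utf-8")).hexdigest()[:8]


def log_token_loaded(name: str, token: str) -> None:
    # Identifies the token in logs without revealing any of its characters.
    logger.info("%s loaded (sha256 fingerprint %s)", name, token_fingerprint(token))
```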
### 8.3 MEDIUM vulnerabilities

| Issue | Files | Impact |
|-------|-------|--------|
| Bare `except:` (69 occurrences) | Whole project | Hides errors, hampers debugging |
| `except Exception:` (191 occurrences) | Whole project | Too broad, catches unexpected errors |
| Base64 fallback in the credential vault | `core/auth/credential_vault.py` | No real encryption without `cryptography` |
| Fixed bearer token (no rotation) | `core/security/api_tokens.py` | A compromised token means permanent access |
| Partial token logging (first 8 characters) | `core/security/api_tokens.py:73-76` | Reduces entropy |
| Non-thread-safe VLM global variables | `core/detection/vlm_config.py` | Possible race condition |

### 8.4 Security strengths

- Credential Vault with Fernet AES + PBKDF2 (600k iterations, in line with OWASP 2023)
- TOTP (RFC 6238) for 2FA
- Configurable rate limiting
- Audit trail (180-day retention)
- Blurring of sensitive data in replays
- HTTPS via Let's Encrypt in production
- Bearer token required on exposed endpoints

---

## 9. Deployment & Infrastructure

### 9.1 Service management

- **`svc.sh`**: Centralized manager (systemd, with PID files as fallback)
- **`services.conf`**: Source of truth (8 services, ports, commands)
- **7 systemd services** in `deploy/systemd/` (user-level)

### 9.2 Windows packaging

- `deploy/build_package.sh`: Checks 26 required files
- "Léa" package for non-technical collaborators
- Recording auto-stop (1 h maximum, notification at 50 min)
- DPI awareness (`SetProcessDpiAwareness(2)`)

### 9.3 Internet exposure

| URL | Service | Auth |
|-----|---------|------|
| `lea.labs.laurinebazin.design` | Streaming :5005 | Bearer token |
| `vwb.labs.laurinebazin.design` | VWB frontend :3002 | HTTP Basic (lea/Medecin2026!) |

Reverse proxy: NPM (Nginx Proxy Manager) running in Docker.

### 9.4 Duplication in deploy/

The `deploy/build/Lea/` directory contains a **full copy** of agent V1 (executor.py, chat_window.py, etc.) that has **diverged** from the source code:
- `executor.py`: 1,576 lines (deploy) vs 1,653 lines (source); the deploy copy lacks the `NotificationManager`
- `TARGETED_CROP_SIZE`: 400×400 (deploy) vs 80×80 (source)

---

## 10. Code quality

### 10.1 Monolithic files (> 2,000 lines)

| File | Lines | Mixed responsibilities |
|------|-------|------------------------|
| `api_stream.py` | 5,612 | API + replay + resolution + admin + healthcheck |
| `stream_processor.py` | 4,656 | Orchestration + cleanup + replay builder + enrichment |
| `target_resolver.py` | 3,495 | 5+ resolution strategies mixed together |
| `catalog_routes_v2_vlm.py` | 2,836 | API routes + VLM logic + actions |
| `agent_chat/app.py` | 2,570 | Flask server + chat logic + WebSocket |
| `web_dashboard/app.py` | 2,430 | Dashboard + 65 routes + proxy |

### 10.2 Debug print() calls in production

| Area | Number of `print()` calls |
|------|---------------------------|
| `visual_workflow_builder/` | ~1,500 |
| `scripts/` | ~800 |
| `examples/` | ~600 |
| `core/` | ~500 |
| `agent_v0/` | ~400 |
| `deploy/` | ~300 |
| `agent_chat/` | ~150 |
| `cli.py` | 130 |
| **Total** | **~4,350** |

Most of them come from demo and diagnostic scripts, but ~500 are in the core and ~400 in the agent, both of which run in production.

### 10.3 TODO / FIXME / HACK

**50 markers** in active code (excluding venvs). The files with the most markers:

| File | Count | Example |
|------|-------|---------|
| `stream_processor.py` | 12 | Cleanup, refactoring, edge cases |
| `auto_heal_manager.py` | 4 | Recovery logic |
| `cli.py` | 3 | Missing features |
| `api_stream.py` | 3 | Pending optimizations |

### 10.4 Code consistency

#### Real bug: diverging `_MODIFIER_ONLY_KEYS`

```python
# core/graph/graph_builder.py (17 entries)
_MODIFIER_ONLY_KEYS = {
    "ctrl", "ctrl_l", "ctrl_r",
    "alt", "alt_l", "alt_r",
    "shift", "shift_l", "shift_r",
    "win", "cmd", "cmd_l", "cmd_r",
    "meta", "super", "super_l", "super_r",
}

# agent_v0/server_v1/stream_processor.py (25 entries)
_MODIFIER_ONLY_KEYS = {
    "ctrl", "ctrl_l", "ctrl_r", "control", "control_l", "control_r",
    "alt", "alt_l", "alt_r", "alt_gr",
    "shift", "shift_l", "shift_r",
    "win", "win_l", "win_r", "cmd", "cmd_l", "cmd_r",
    "meta", "meta_l", "meta_r", "super", "super_l", "super_r",
}
```

`graph_builder.py` does not recognize `control`, `control_l`, `control_r`, `alt_gr`, `win_l`, `win_r`, `meta_l`, or `meta_r` as modifiers. This can cause phantom actions in workflows built from sessions recorded on Windows.

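A single shared definition would remove the divergence. The sketch below uses the larger of the two sets as the source of truth; the module location, the public name `MODIFIER_ONLY_KEYS`, and the `is_modifier_only` helper (including its lowercasing of input) are assumptions for illustration.

```python
# Hypothetical shared module, e.g. core/input/modifier_keys.py,
# imported by both graph_builder.py and stream_processor.py.

MODIFIER_ONLY_KEYS = frozenset({
    "ctrl", "ctrl_l", "ctrl_r", "control", "control_l", "control_r",
    "alt", "alt_l", "alt_r", "alt_gr",
    "shift", "shift_l", "shift_r",
    "win", "win_l", "win_r", "cmd", "cmd_l", "cmd_r",
    "meta", "meta_l", "meta_r", "super", "super_l", "super_r",
})


def is_modifier_only(key: str) -> bool:
    """Return True if `key` names a modifier key (case-insensitive by assumption)."""
    return key.strip().lower() in MODIFIER_ONLY_KEYS
```

Using a `frozenset` keeps the shared constant immutable, so neither importing module can drift by mutating it locally.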
### 10.5 Circular imports

**No circular imports detected** between the submodules of `core/`. This is a positive sign of a sound layered architecture.

### 10.6 Dead code

- `_a_trier/`: **561 MB**, 261 orphaned, unsorted Python files
- `archives/`: 21 MB of archived code
- `scripts/`: 39 files (16,525 lines) of diagnostic and validation scripts dated January 2026, probably obsolete
- `examples/`: 29 demo files, some with broken imports
- 2 VWB frontends (`frontend/` at 1.3 GB and `frontend_v4/` at 79 MB)
- `visual_workflow_builder/backend/app_lightweight.py` (1,451 lines) and `app_catalogue_simple.py` (1,370 lines): alternatives that appear to be unused

---

## 11. Performance

### 11.1 Measured performance (31 March 2026)

| Method | Accuracy | Speed | Usage |
|--------|----------|-------|-------|
| Template matching 80×80 | dist=0.000 (perfect) | 0.1 s | Icons without text |
| Qwen2.5-VL grounding (GPU) | dist<0.04 (exact) | 2-5 s | Elements with OCR text |
| SomEngine CPU (build_replay) | 80% detection | 1.4 s | Recording enrichment |

### 11.2 Windows E2E replay (best result)

- 19/20 actions correct (Word opened, text typed, document saved)
- 0 retries
- Average time: 2.4 s per click
- Weak point: icons without OCR text on different screens

### 11.3 Tests (execution time)

- 1,457 tests in ~318 s (5 min 18 s) with `-m "not slow"`
- 6 tests marked `@slow` (GPU-dependent)

---

## 12. Dependency management

### 12.1 Main requirements.txt

176 pinned dependencies, including:

| Category | Key packages |
|----------|--------------|
| ML/AI | `torch==2.9.1`, `transformers==4.57.3`, `open_clip_torch==3.2.0`, `timm==1.0.24` |
| Vision | `opencv-python==4.12.0.88`, `pillow==12.1.0`, `python-doctr==1.0.1` |
| Search | `faiss-cpu==1.13.2`, `scikit-learn==1.8.0` |
| Web | `fastapi==0.128.0`, `Flask==3.0.0`, `uvicorn==0.40.0` |
| Automation | `PyAutoGUI==0.9.54`, `pynput==1.8.1`, `mss==10.1.0` |
| GUI | `PyQt5==5.15.11` |
| Security | `cryptography==46.0.3` |
| NVIDIA | `nvidia-cublas-cu12`, `nvidia-cudnn-cu12`, etc. (CUDA 12.8) |

### 12.2 Multiple requirements files

7 `requirements*.txt` files (excluding archives) for different subprojects. Risk of them drifting out of sync.

### 12.3 Minimal setup.py

```python
install_requires=["numpy", "pillow", "faiss-cpu", "scikit-learn", "open_clip_torch"]
```

This does not reflect the real dependencies (torch, transformers, fastapi, flask, etc. are missing). The `setup.py` is vestigial.

### 12.4 No pyproject.toml

The project uses `setup.py` + `pytest.ini` instead of the modern standard, `pyproject.toml`. No linter is configured (ruff, black, and mypy are not part of any CI).

---

## 13. Documentation

### 13.1 Volume

- **136 files** in `docs/` (including ~100 session and correction reports from January 2026)
- Structured documentation in `docs/reference/`, `docs/specs/`, `docs/fiches/`, `docs/guides/`
- `docs/README.md`: a well-organized index

### 13.2 Key documents

| Document | Content |
|----------|---------|
| `docs/reference/ARCHITECTURE_VISION_COMPLETE.md` | Complete 5-layer architecture |
| `docs/specs/requirements.md` | 15 requirements, 89 acceptance criteria |
| `docs/specs/design.md` | Detailed design, 20 correctness properties |
| `docs/specs/tasks.md` | Implementation plan: 13 phases, 60+ tasks |
| `docs/CONFORMITE_AI_ACT.md` | Compliance with the EU AI Act |
| `docs/PLAYBOOK_DSI_RSSI.md` | Playbook for the CIO/CISO |
| `docs/DOSSIER_COMMISSAIRE_AUX_APPORTS.md` | Financial valuation dossier |

### 13.3 Points of attention

- ~100 dated session report files (January 2026) clutter the `docs/` directory
- No auto-generated API documentation (Swagger/OpenAPI is not configured despite the use of FastAPI)
- No formal CONTRIBUTING.md or CHANGELOG.md
- Code comments are in French (consistent with the project convention)

---

## 14. Disk space

### 14.1 Total size: 61 GB

| Item | Size | % |
|------|------|---|
| `.venv/` (main) | 9.0 GB | 14.8% |
| `visual_workflow_builder/backend/venv` | 8.3 GB | 13.6% |
| `venv_v3/` (legacy) | 7.8 GB | 12.8% |
| `venv/` (legacy) | 7.5 GB | 12.3% |
| `visual_workflow_builder/venv` | 7.3 GB | 12.0% |
| `agent_v0/.venv` | 7.1 GB | 11.6% |
| **Total venvs** | **47.0 GB** | **77.0%** |
| `data/` | 3.2 GB | 5.2% |
| `frontend/node_modules` | 1.3 GB | 2.1% |
| `.git/` | 633 MB | 1.0% |
| `_a_trier/` | 561 MB | 0.9% |
| `models/` | 511 MB | 0.8% |
| Source code + docs + rest | ~400 MB | 0.7% |

### 14.2 Duplicated venvs: a critical problem

**6 virtual environments** for a single project, totaling **47 GB**. Each one probably contains its own copy of PyTorch (~2 GB), transformers, etc.

**Active venvs**:
- `.venv/`: main (used by pytest and svc.sh)
- `visual_workflow_builder/backend/venv`: VWB backend

**Venvs that are probably unnecessary**:
- `venv/`: old, probably never cleaned up
- `venv_v3/`: old (referenced in the Makefile but no longer used)
- `visual_workflow_builder/venv`: probably superseded by `backend/venv`
- `agent_v0/.venv`: agent V1 is deployed separately on Windows

**Recommendation**: Delete the unused venvs to free ~30 GB.

---

## 15. Strengths

1. **Clear 5-layer architecture**: Clean separation of responsibilities, 30 core submodules with no circular imports
2. **100% vision**: A single, consistent approach with no shortcuts (accessibility API, DOM selectors)
3. **Substantial test suite**: 1,463 tests, 95.8% success, critical modules covered
4. **Well-designed SomEngine**: 315 lines, thread-safe singleton, lazy loading, documentation
5. **Sophisticated GPU management**: RECORDING/AUTOPILOT modes, automatic VRAM arbitration
6. **Solid cryptography**: Fernet AES + PBKDF2 600k, TOTP RFC 6238
7. **Regulatory compliance**: 180-day retention, blurring, audit trail, AI Act dossier
8. **Robust Windows packaging**: Check of the 26 required files, auto-stop, DPI awareness
9. **Anti-detection**: Bézier mouse movement + character-by-character typing
10. **Conventional commits**: `feat:/fix:/refactor:/chore:` prefixes are respected
11. **Infrastructure as Code**: systemd services, svc.sh, services.conf
12. **Smart resolution cascade**: VLM → template matching → SomEngine (fail-safe)

---

## 16. Weaknesses & Risks

### 16.1 Critical risks (P0)

| # | Risk | Impact | File |
|---|------|--------|------|
| 1 | **Cloud API keys in plaintext** (Anthropic, OpenAI, Google, Deepseek) | Financial compromise + API access | `.env.local` |
| 2 | Admin tokens hardcoded in the source | Full compromise of the Internet-exposed API | `core/security/api_tokens.py:93-94` |
| 3 | `eval()` on workflow conditions | Arbitrary code injection | `core/execution/dag_executor.py:532` |
| 4 | Default signing key | Token forgery in production | `core/security/api_tokens.py:80` |

### 16.2 High risks (P1)

| # | Risk | Impact |
|---|------|--------|
| 5 | Unrestricted `pickle.load()` | Code execution via malicious `.pkl` files |
| 6 | 11 `subprocess(shell=True)` calls with variables | Command injection |
| 7 | `_MODIFIER_ONLY_KEYS` diverges between modules | Phantom actions in workflows |
| 8 | Duplicated, diverging executor (source vs deploy) | Different behavior in production |
| 9 | 36+ modified files not committed | Potential loss of work |

### 16.3 Medium risks (P2)

| # | Risk | Impact |
|---|------|--------|
| 10 | Monolithic files (api_stream.py: 5,612 lines) | Maintainability, regression risk |
| 11 | 47 GB of venvs (77% of disk space) | Disk space, confusion |
| 12 | 4,350 print() calls in production | No structured logging, debug output in production |
| 13 | 69 bare `except:`, 191 `except Exception:` | Hidden errors |
| 14 | 7 broken property test files | False sense of coverage |
| 15 | Makefile points to the wrong venv | Broken DX |
| 16 | `setup.py` does not reflect the real dependencies | Broken installation |
| 17 | CORS `*` on the dashboard | No cross-origin restriction |

### 16.4 Technical debt (P3)

| # | Issue | Volume |
|---|-------|--------|
| 18 | Unsorted `_a_trier/` | 561 MB, 261 Python files |
| 19 | Dated diagnostic scripts (Jan 2026) | 39 files, 16,525 lines |
| 20 | 2 VWB frontends | 1.3 GB vs 79 MB |
| 21 | ~100 session reports in docs/ | Documentation clutter |
| 22 | 50 TODO/FIXME markers in active code | Unfinished work |
| 23 | No CI/CD (linter, automated tests) | Quality is not checked automatically |
| 24 | No pyproject.toml | Fragmented configuration |

---

## 17. Recommendations

### Immediate (this week): Security & risk of loss

| # | Action | Effort | Impact |
|---|--------|--------|--------|
| 1 | **Revoke all cloud API keys** (Anthropic, OpenAI, Google, Deepseek in `.env.local`) and regenerate them | 1h | 🔴 Critical |
| 2 | **Remove the hardcoded tokens** from `api_tokens.py` and load them only from `.env` | 30min | 🔴 Critical |
| 3 | **Replace `eval()` with a restricted parser** (`ast.literal_eval` only handles literals, so it is not enough for conditions that reference variables) | 2h | 🔴 Critical |
| 4 | **Commit the 36+ modified files** or stash them | 15min | 🔴 Loss of work |
| 5 | **Remove the default value** for `TOKEN_SECRET_KEY` | 15min | 🔴 Critical |
| 6 | **Fix `cors_allowed_origins="*"`** in web_dashboard | 10min | 🟠 High |

### Short term (1-2 weeks): Consistency & hygiene

| # | Action | Effort | Impact |
|---|--------|--------|--------|
| 7 | Unify `_MODIFIER_ONLY_KEYS` in a shared module | 1h | 🟠 Real bug |
| 8 | Fix the Makefile (`venv_v3` → `.venv`) | 5min | 🟡 DX |
| 9 | Delete the 4 unused venvs (~30 GB) | 10min | 🟡 Disk space |
| 10 | Replace `subprocess(shell=True)` with argument lists | 2h | 🟠 Injection |
| 11 | Replace `pickle.load()` with JSON/msgpack in faiss_manager | 2h | 🟠 Security |
| 12 | Remove the diverging copy in `deploy/build/Lea/` | 1h | 🟠 Consistency |
| 13 | Fix the 9 failing tests | 4h | 🟡 Quality |

### Medium term (1-2 months): Maintainability

| # | Action | Effort | Impact |
|---|--------|--------|--------|
| 14 | Split `api_stream.py` (5,612 lines) into 4+ modules | 2d | 🟡 Maintainability |
| 15 | Split `stream_processor.py` (4,656 lines) | 2d | 🟡 Maintainability |
| 16 | Replace `print()` calls with `logging` (core + agent) | 1d | 🟡 Observability |
| 17 | Clean up `_a_trier/` (561 MB) | 2h | 🟡 Hygiene |
| 18 | Delete or archive the January 2026 diagnostic scripts | 1h | 🟡 Hygiene |
| 19 | Migrate to `pyproject.toml` | 2h | 🟡 Standards |
| 20 | Set up CI (ruff + pytest + pre-commit) | 4h | 🟡 Quality |
| 21 | Enable Swagger/OpenAPI for FastAPI | 1h | 🟡 Documentation |
| 22 | Repair or remove the 7 property test files | 4h | 🟡 Coverage |

### Long term (3+ months): Scalability

| # | Action |
|---|--------|
| 23 | Containerize with Docker (multi-stage builds) |
| 24 | Implement API token rotation |
| 25 | Add automated health checks for every service |
| 26 | Set up a complete CI/CD pipeline (build → test → deploy) |
| 27 | Implement Prometheus/Grafana monitoring |

---

## 18. Overall score

| Axis | Score | Comment |
|------|-------|---------|
| **Functionality** | 8/10 | Complete pipeline, working replay, operational VWB |
| **Architecture** | 7/10 | 5 well-separated layers, but monolithic files |
| **Tests** | 7/10 | 1,463 tests, 95.8% success, but property tests are broken |
| **Security** | 2/10 | Cloud API keys in plaintext + hardcoded tokens + eval() + pickle + shell=True |
| **Consistency** | 5/10 | Code duplication, multiple venvs, divergences |
| **Technical debt** | 4/10 | 4,350 print() calls, 561 MB unsorted, giant files |
| **Documentation** | 6/10 | Good structure but cluttered by session reports |
| **Deployment** | 6/10 | Working systemd + svc.sh, but no CI/CD |
| **Performance** | 8/10 | 2.4 s per click, smart cascade, well-managed GPU |
| **DX (Developer Experience)** | 5/10 | Broken Makefile, confusing venvs, no linter |
| **Overall** | **5.7/10** | Functionally solid; security and housekeeping are urgent |

### Verdict

RPA Vision V3 is an ambitious project that is technically impressive in its vision (100% vision-based, no selectors). The pipeline works, replay is operational, and the 5-layer architecture is well thought out.

However, **production rollout is blocked** by the critical security flaws (hardcoded tokens, eval(), default key). The P0 actions must be handled **before any further exposure on the Internet**.

The technical debt (monolithic files, 47 GB of venvs, 4,350 print() calls) does not prevent the system from working, but it will significantly slow future development. A 1-2 week cleanup sprint would bring a substantial return on investment.

---

*Generated on 4 April 2026 by Claude Sonnet 4.6. Multi-agent audit (5 parallel agents: architecture, core, tests, web, security)*