feat: VLM-first visual replay, separate worker, Léa package, AZERTY, HTTPS security

Visual replay pipeline:
- VLM-first: the agent calls Ollama directly to locate elements
- Template matching as fallback (strict 0.90 threshold)
- Immediate stop if an element is not found (no blind clicks)
- Replay from a raw session (/replay-session) without waiting for the VLM
- Post-action verification (screenshot hash before/after)
- Popup handling (Enter/Escape/Tab+Enter)

Separate VLM worker:
- run_worker.py: process separate from the HTTP server
- Communication through files (_worker_queue.txt + _replay_active.lock)
- The HTTP server no longer runs any VLM work, so it always stays responsive
- systemd service rpa-worker.service

Keyboard capture:
- raw_keys (vk + press/release) for exact, layout-independent replay
- AZERTY fix: ToUnicodeEx + AltGr detection
- Enter captured as \n, Tab as \t
- Filtering of lone modifiers (stray Ctrl/Alt/Shift)
- Merging of consecutive text_input events, key_combo deduplication

Security & Internet:
- HTTPS via Let's Encrypt (lea.labs + vwb.labs.laurinebazin.design)
- Fixed API token in .env.local
- HTTP Basic Auth on VWB
- Security headers (HSTS, CSP, nosniff)
- CORS restricted to the public domains, no more wildcard

Infrastructure:
- DPI awareness (SetProcessDpiAwareness), Python + Rust
- System metadata (dpi_scale, window_bounds, monitors, os_theme)
- Multi-scale template matching [0.5, 2.0]
- Dynamic resolution (no more hardcoded 1920x1080)
- VLM prefill fix (47x speedup, 3.5s instead of 180s)

Modules:
- core/auth/: credential vault (Fernet AES), TOTP (RFC 6238), auth handler
- core/federation/: anonymized LearningPack export/import, global FAISS
- deploy/: Léa package (config.txt, Lea.bat, install.bat, LISEZMOI.txt)

UX:
- OS filtering (VWB + Chat show only the workflows for the current OS)
- Persistent library (local cache + SQLite)
- Hybrid clustering (window title + DBSCAN)
- EdgeConstraints + PostConditions populated
- GraphBuilder compound actions (all keystrokes)

Rust agent:
- Bearer token auth (network.rs)
- sysinfo.rs (DPI, resolution, window bounds via the Win32 API)
- config.txt read automatically
- Chrome/Brave/Firefox support (not just Edge)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
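
The post-action verification listed under the visual replay pipeline (screenshot hash before/after) could look roughly like the sketch below. The commit message does not say which hash is used or how screenshots are represented, so SHA-256 over raw image bytes and the function names `screenshot_hash` and `action_changed_screen` are assumptions made for illustration only.

```python
import hashlib


def screenshot_hash(image_bytes: bytes) -> str:
    """Return a hex digest of a screenshot (assumption: SHA-256 over raw bytes)."""
    return hashlib.sha256(image_bytes).hexdigest()


def action_changed_screen(before: bytes, after: bytes) -> bool:
    """Hypothetical check: True if the screen content differs after an action."""
    return screenshot_hash(before) != screenshot_hash(after)
```

An exact byte hash flags any pixel change, including a blinking cursor or a clock tick, so a real pipeline might prefer a perceptual hash or a region-limited comparison; that choice is not documented in this commit.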
@@ -158,16 +158,35 @@ class LiveSessionManager:
         session.events.append(event_data)
         session.last_activity = datetime.now()
         # Extract the window context if present
+        # Format 1: {"window": {"title": ..., "app_name": ...}} (Python agent)
+        # Format 2: {"window_title": "...", "screen_resolution": [w, h]} (Rust agent)
         window = event_data.get("window")
         if window and isinstance(window, dict):
             session.last_window_info = window
-            # Accumulate titles/apps for automatic naming
-            title = window.get("title", "").strip()
-            app_name = window.get("app_name", "").strip()
-            if title and title != "Unknown":
-                session.window_titles_seen[title] = session.window_titles_seen.get(title, 0) + 1
-            if app_name and app_name != "unknown":
-                session.app_names_seen[app_name] = session.app_names_seen.get(app_name, 0) + 1
+        elif event_data.get("window_title"):
+            # Rust agent format: extract the title and the resolution
+            info = {
+                "title": event_data["window_title"],
+                "app_name": session.last_window_info.get("app_name", "unknown"),
+            }
+            # Propagate the resolution if the agent provided it
+            screen_res = event_data.get("screen_resolution")
+            if screen_res and isinstance(screen_res, list) and len(screen_res) == 2:
+                info["screen_resolution"] = screen_res
+            # Propagate the graphical-environment metadata
+            for meta_key in ("dpi_scale", "monitor_index", "window_bounds",
+                             "monitors", "os_theme", "os_language"):
+                meta_val = event_data.get(meta_key)
+                if meta_val is not None:
+                    info[meta_key] = meta_val
+            session.last_window_info = info
+        # Accumulate titles/apps for automatic naming
+        title = session.last_window_info.get("title", "").strip()
+        app_name = session.last_window_info.get("app_name", "").strip()
+        if title and title != "Unknown":
+            session.window_titles_seen[title] = session.window_titles_seen.get(title, 0) + 1
+        if app_name and app_name != "unknown":
+            session.app_names_seen[app_name] = session.app_names_seen.get(app_name, 0) + 1
         self._maybe_persist(session_id)
 
     def add_screenshot(self, session_id: str, shot_id: str, file_path: str) -> None:
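
For readers who want to try the two event formats handled in the hunk above outside of `LiveSessionManager`, here is a minimal standalone sketch. The function name `extract_window_info`, the constant `META_KEYS`, and the `previous` parameter (standing in for `session.last_window_info`) are hypothetical and do not appear in the commit.

```python
from typing import Any, Dict

# Optional metadata keys copied from the event, as listed in the diff.
META_KEYS = ("dpi_scale", "monitor_index", "window_bounds",
             "monitors", "os_theme", "os_language")


def extract_window_info(event_data: Dict[str, Any],
                        previous: Dict[str, Any]) -> Dict[str, Any]:
    """Hypothetical helper: normalize both agent formats into one dict."""
    window = event_data.get("window")
    if window and isinstance(window, dict):
        # Format 1 (Python agent): the nested dict is used as-is.
        return window
    if event_data.get("window_title"):
        # Format 2 (Rust agent): flat keys; app_name is carried over.
        info: Dict[str, Any] = {
            "title": event_data["window_title"],
            "app_name": previous.get("app_name", "unknown"),
        }
        screen_res = event_data.get("screen_resolution")
        if screen_res and isinstance(screen_res, list) and len(screen_res) == 2:
            info["screen_resolution"] = screen_res
        for key in META_KEYS:
            value = event_data.get(key)
            if value is not None:
                info[key] = value
        return info
    # No window context in this event: keep what was known before.
    return previous
```

As in the diff, a Rust-format event rebuilds the dict from scratch, so a resolution seen in an earlier event is not retained unless the new event carries it again.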
@@ -227,16 +246,41 @@ class LiveSessionManager:
                 "captured_at": datetime.now().isoformat(),
             })
 
+        # Actual resolution taken from the events (sent by the Rust/Python agent),
+        # falling back to 1920x1080 if unavailable
+        screen_res = session.last_window_info.get("screen_resolution", [1920, 1080])
+
+        # Dynamic graphical-environment metadata
+        screen_info: Dict[str, Any] = {"primary_resolution": screen_res}
+        dpi_scale = session.last_window_info.get("dpi_scale")
+        if dpi_scale is not None:
+            screen_info["dpi_scale"] = dpi_scale
+        monitors = session.last_window_info.get("monitors")
+        if monitors is not None:
+            screen_info["monitors"] = monitors
+        monitor_index = session.last_window_info.get("monitor_index")
+        if monitor_index is not None:
+            screen_info["monitor_index"] = monitor_index
+
+        env_info: Dict[str, Any] = {
+            "os": platform.system().lower(),
+            "hostname": socket.gethostname(),
+            "machine_id": session.machine_id,
+            "screen": screen_info,
+        }
+        # Propagate os_theme / os_language if available
+        os_theme = session.last_window_info.get("os_theme")
+        if os_theme is not None:
+            env_info["os_theme"] = os_theme
+        os_language = session.last_window_info.get("os_language")
+        if os_language is not None:
+            env_info["os_language"] = os_language
+
         return {
             "schema_version": "rawsession_v1",
             "session_id": session.session_id,
             "agent_version": "agent_v1_stream",
-            "environment": {
-                "os": platform.system().lower(),
-                "hostname": socket.gethostname(),
-                "machine_id": session.machine_id,
-                "screen": {"primary_resolution": [1920, 1080]},
-            },
+            "environment": env_info,
             "user": {"id": "remote_agent"},
             "context": {
                 "workflow": session.last_window_info.get("title", ""),
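
The environment block assembled in the second hunk can be reproduced in isolation with the sketch below. It is a loop-based variant written for illustration, not the code from the commit: `build_environment` and `DEFAULT_RESOLUTION` are hypothetical names, and `last_window_info` stands in for `session.last_window_info`.

```python
import platform
import socket
from typing import Any, Dict

DEFAULT_RESOLUTION = (1920, 1080)  # same fallback value as in the diff


def build_environment(last_window_info: Dict[str, Any],
                      machine_id: str) -> Dict[str, Any]:
    """Hypothetical helper: build the "environment" block of a raw session."""
    screen: Dict[str, Any] = {
        "primary_resolution": last_window_info.get(
            "screen_resolution", list(DEFAULT_RESOLUTION)),
    }
    # Optional screen metadata is copied only when the agent reported it.
    for key in ("dpi_scale", "monitors", "monitor_index"):
        if last_window_info.get(key) is not None:
            screen[key] = last_window_info[key]

    env: Dict[str, Any] = {
        "os": platform.system().lower(),
        "hostname": socket.gethostname(),
        "machine_id": machine_id,
        "screen": screen,
    }
    # Optional OS-level metadata, again only when present.
    for key in ("os_theme", "os_language"):
        if last_window_info.get(key) is not None:
            env[key] = last_window_info[key]
    return env
```

Calling `build_environment({}, "m1")` yields a screen entry containing only the 1920x1080 fallback, which matches the previously hardcoded behavior for agents that send no resolution.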