feat: VLM-first visual replay, separate worker, Léa package, AZERTY, HTTPS security

Visual replay pipeline:
- VLM-first: the agent calls Ollama directly to locate elements
- Template matching as a fallback (strict 0.90 threshold)
- Immediate stop if an element is not found (no blind clicks)
- Replay from the raw session (/replay-session) without waiting for the VLM
- Post-action verification (screenshot hash before/after)
- Popup handling (Enter/Escape/Tab+Enter)
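
The locate-then-verify steps listed above could be sketched roughly as follows. Only the 0.90 threshold and the VLM-first ordering come from this commit message; `Match`, `ElementNotFound`, `locate`, `screen_changed` and the use of SHA-256 are hypothetical names and choices made for illustration, not the project's actual code.

```python
import hashlib
from dataclasses import dataclass
from typing import Callable, Optional

TEMPLATE_THRESHOLD = 0.90  # strict fallback threshold named in the commit message


@dataclass
class Match:
    """Hypothetical result type: a screen position and a confidence score."""
    x: int
    y: int
    score: float


class ElementNotFound(RuntimeError):
    """Raised so the replay stops instead of clicking blindly."""


def locate(vlm_find: Callable[[], Optional[Match]],
           template_find: Callable[[], Optional[Match]]) -> Match:
    # Ask the VLM first; template matching is only a fallback.
    match = vlm_find()
    if match is not None:
        return match
    match = template_find()
    if match is not None and match.score >= TEMPLATE_THRESHOLD:
        return match
    raise ElementNotFound("element not found, stopping replay")


def screen_changed(before: bytes, after: bytes) -> bool:
    # Post-action check: compare hashes of the two screenshots.
    # SHA-256 over the raw bytes is an assumption; the commit only says "hash".
    return hashlib.sha256(before).digest() != hashlib.sha256(after).digest()
```

A caller would take one screenshot before the action and one after it, and treat an unchanged hash as a sign that the action had no visible effect.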

Separate VLM worker:
- run_worker.py: a process distinct from the HTTP server
- File-based communication (_worker_queue.txt + _replay_active.lock)
- The HTTP server no longer runs any VLM work, so it stays responsive
- systemd service rpa-worker.service
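
As an illustration of file-based communication between the two processes, here is a minimal sketch. The two file names come from the commit message; everything else is assumed: one session ID per line in the queue file, the lock file meaning "a replay is running, the worker should wait", and the helper names `enqueue` and `drain`.

```python
import os
from pathlib import Path
from typing import List

QUEUE_NAME = "_worker_queue.txt"    # file names taken from the commit message
LOCK_NAME = "_replay_active.lock"


def enqueue(base: Path, session_id: str) -> None:
    """Server side: append a session ID and return at once (no VLM work here)."""
    with open(base / QUEUE_NAME, "a", encoding="utf-8") as fh:
        fh.write(session_id + "\n")


def drain(base: Path) -> List[str]:
    """Worker side: claim all pending IDs, unless a replay is active."""
    if (base / LOCK_NAME).exists():
        return []  # assumed meaning of the lock file: replay in progress
    queue = base / QUEUE_NAME
    if not queue.exists():
        return []
    claimed = base / (QUEUE_NAME + ".claimed")
    os.replace(queue, claimed)  # rename so later writes land in a fresh queue file
    lines = claimed.read_text(encoding="utf-8").splitlines()
    claimed.unlink()
    return [line.strip() for line in lines if line.strip()]
```

The rename keeps the claim step short, but this sketch does not try to cover every race between a writer and the worker.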

Keyboard capture:
- raw_keys (vk + press/release) for exact, layout-independent replay
- AZERTY fix: ToUnicodeEx + AltGr detection
- Enter captured as \n, Tab as \t
- Filtering of lone modifiers (stray Ctrl/Alt/Shift)
- Merging of consecutive text_input events, key_combo deduplication
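
The last two cleanup steps above might look like the sketch below. The event shape (`type`, `text`, `keys`) is a guess, `clean_events` is a hypothetical name, and "deduplication" is interpreted here as dropping an immediately repeated combo; the real implementation may differ on all three points.

```python
from typing import Any, Dict, List

MODIFIERS = {"ctrl", "alt", "shift"}


def clean_events(events: List[Dict[str, Any]]) -> List[Dict[str, Any]]:
    """Drop lone modifiers, merge adjacent text_input, drop repeated key_combo."""
    out: List[Dict[str, Any]] = []
    for ev in events:
        kind = ev.get("type")
        if kind == "key_combo":
            keys = [k.lower() for k in ev.get("keys", [])]
            if keys and all(k in MODIFIERS for k in keys):
                continue  # modifier pressed on its own: noise
            prev = out[-1] if out else None
            if prev and prev.get("type") == "key_combo" and prev.get("keys") == ev.get("keys"):
                continue  # same combo as the previous event
        elif kind == "text_input" and out and out[-1].get("type") == "text_input":
            # Extend the previous text_input instead of adding a new event.
            merged = out[-1].get("text", "") + ev.get("text", "")
            out[-1] = {**out[-1], "text": merged}
            continue
        out.append(dict(ev))
    return out
```

Because lone modifiers are dropped before merging, a stray Shift between two text fragments does not split them into separate events.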

Security & Internet:
- HTTPS via Let's Encrypt (lea.labs + vwb.labs.laurinebazin.design)
- Fixed API token in .env.local
- HTTP Basic Auth on VWB
- Security headers (HSTS, CSP, nosniff)
- CORS restricted to the public domains, no more wildcard
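
A framework-neutral sketch of the header, CORS and token checks listed above. The header names are standard HTTP; the header values, the placeholder origin, and the function names are assumptions for illustration only, as is the Bearer scheme on the server side (the message mentions it for the Rust agent).

```python
import hmac
from typing import Dict, Optional

# Placeholder allow-list; the real public domains live in the deployment config.
ALLOWED_ORIGINS = {"https://app.example.org"}


def security_headers() -> Dict[str, str]:
    """HSTS, CSP and nosniff, with illustrative values."""
    return {
        "Strict-Transport-Security": "max-age=31536000; includeSubDomains",
        "Content-Security-Policy": "default-src 'self'",
        "X-Content-Type-Options": "nosniff",
    }


def cors_headers(origin: Optional[str]) -> Dict[str, str]:
    """Echo the origin only when it is on the allow-list (no wildcard)."""
    if origin in ALLOWED_ORIGINS:
        return {"Access-Control-Allow-Origin": origin, "Vary": "Origin"}
    return {}


def token_ok(authorization: Optional[str], expected: str) -> bool:
    """Check an 'Authorization: Bearer <token>' value in constant time."""
    prefix = "Bearer "
    if not authorization or not authorization.startswith(prefix):
        return False
    supplied = authorization[len(prefix):]
    return hmac.compare_digest(supplied.encode(), expected.encode())
```

Using `hmac.compare_digest` avoids leaking how many leading characters of the token matched through response timing.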

Infrastructure:
- DPI awareness (SetProcessDpiAwareness) in Python + Rust
- System metadata (dpi_scale, window_bounds, monitors, os_theme)
- Multi-scale template matching [0.5, 2.0]
- Dynamic resolution (no more hardcoded 1920x1080)
- VLM prefill fix (47x speedup, 3.5s instead of 180s)
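
To make the multi-scale and dynamic-resolution items concrete, here is a small sketch. Only the [0.5, 2.0] range comes from the commit message; the number of steps, the even spacing, and the helper names `scale_steps` and `map_point` are assumptions.

```python
from typing import List, Tuple


def scale_steps(lo: float = 0.5, hi: float = 2.0, count: int = 7) -> List[float]:
    """Evenly spaced scale factors over [lo, hi] to try during template matching."""
    step = (hi - lo) / (count - 1)
    return [round(lo + i * step, 3) for i in range(count)]


def map_point(x: int, y: int,
              recorded: Tuple[int, int],
              current: Tuple[int, int]) -> Tuple[int, int]:
    """Map a point recorded at one resolution onto the current resolution."""
    sx = current[0] / recorded[0]
    sy = current[1] / recorded[1]
    return round(x * sx), round(y * sy)


print(scale_steps())                                       # [0.5, 0.75, 1.0, 1.25, 1.5, 1.75, 2.0]
print(map_point(960, 540, (1920, 1080), (2560, 1440)))     # (1280, 720)
```

Reading the resolution from the session rather than assuming 1920x1080 is what lets a mapping like this stay correct on other displays.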

Modules:
- core/auth/: credential vault (Fernet AES), TOTP (RFC 6238), auth handler
- core/federation/: anonymized LearningPack export/import, global FAISS
- deploy/: Léa package (config.txt, Lea.bat, install.bat, LISEZMOI.txt)
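
For reference, TOTP as specified in RFC 6238 fits in a few lines of standard-library Python. This is a generic sketch of the algorithm (HMAC-SHA1, 30-second step), not the contents of core/auth/; the function name `totp` and its parameters are chosen here for illustration.

```python
import hashlib
import hmac
import struct
import time
from typing import Optional


def totp(secret: bytes, at: Optional[float] = None,
         step: int = 30, digits: int = 6) -> str:
    """Time-based one-time password per RFC 6238, using HMAC-SHA1."""
    now = time.time() if at is None else at
    counter = int(now // step)                       # number of elapsed time steps
    mac = hmac.new(secret, struct.pack(">Q", counter), hashlib.sha1).digest()
    offset = mac[-1] & 0x0F                          # dynamic truncation (RFC 4226)
    code = struct.unpack(">I", mac[offset:offset + 4])[0] & 0x7FFFFFFF
    return str(code % 10 ** digits).zfill(digits)


# RFC 6238 Appendix B test vector (SHA-1, 8 digits, T = 59 s)
print(totp(b"12345678901234567890", at=59, digits=8))  # 94287082
```

Authenticator apps usually exchange the secret as Base32 text, so a real handler would decode it with `base64.b32decode` before calling a function like this.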

UX:
- OS filtering (VWB + Chat show only the workflows for the current OS)
- Persistent library (local cache + SQLite)
- Hybrid clustering (window title + DBSCAN)
- EdgeConstraints + PostConditions populated
- GraphBuilder compound actions (all keystrokes)

Rust agent:
- Bearer token auth (network.rs)
- sysinfo.rs (DPI, resolution, window bounds via Win32 API)
- config.txt read automatically
- Chrome/Brave/Firefox support (not just Edge)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Dom
2026-03-26 10:19:18 +01:00
parent fe5e0ba83d
commit d5deac3029
162 changed files with 25669 additions and 557 deletions

@@ -158,16 +158,35 @@ class LiveSessionManager:
         session.events.append(event_data)
         session.last_activity = datetime.now()
         # Extract the window context if present
+        # Format 1: {"window": {"title": ..., "app_name": ...}} (Python agent)
+        # Format 2: {"window_title": "...", "screen_resolution": [w, h]} (Rust agent)
         window = event_data.get("window")
         if window and isinstance(window, dict):
             session.last_window_info = window
-            # Accumulate titles/apps for automatic naming
-            title = window.get("title", "").strip()
-            app_name = window.get("app_name", "").strip()
-            if title and title != "Unknown":
-                session.window_titles_seen[title] = session.window_titles_seen.get(title, 0) + 1
-            if app_name and app_name != "unknown":
-                session.app_names_seen[app_name] = session.app_names_seen.get(app_name, 0) + 1
+        elif event_data.get("window_title"):
+            # Rust agent format: extract the title and the resolution
+            info = {
+                "title": event_data["window_title"],
+                "app_name": session.last_window_info.get("app_name", "unknown"),
+            }
+            # Propagate the resolution if the agent provided it
+            screen_res = event_data.get("screen_resolution")
+            if screen_res and isinstance(screen_res, list) and len(screen_res) == 2:
+                info["screen_resolution"] = screen_res
+            # Propagate the graphical-environment metadata
+            for meta_key in ("dpi_scale", "monitor_index", "window_bounds",
+                             "monitors", "os_theme", "os_language"):
+                meta_val = event_data.get(meta_key)
+                if meta_val is not None:
+                    info[meta_key] = meta_val
+            session.last_window_info = info
+        # Accumulate titles/apps for automatic naming
+        title = session.last_window_info.get("title", "").strip()
+        app_name = session.last_window_info.get("app_name", "").strip()
+        if title and title != "Unknown":
+            session.window_titles_seen[title] = session.window_titles_seen.get(title, 0) + 1
+        if app_name and app_name != "unknown":
+            session.app_names_seen[app_name] = session.app_names_seen.get(app_name, 0) + 1
         self._maybe_persist(session_id)
     def add_screenshot(self, session_id: str, shot_id: str, file_path: str) -> None:
@@ -227,16 +246,41 @@ class LiveSessionManager:
                 "captured_at": datetime.now().isoformat(),
             })
+        # Actual resolution taken from the events (sent by the Rust/Python agent),
+        # falling back to 1920x1080 if unavailable
+        screen_res = session.last_window_info.get("screen_resolution", [1920, 1080])
+        # Dynamic graphical-environment metadata
+        screen_info: Dict[str, Any] = {"primary_resolution": screen_res}
+        dpi_scale = session.last_window_info.get("dpi_scale")
+        if dpi_scale is not None:
+            screen_info["dpi_scale"] = dpi_scale
+        monitors = session.last_window_info.get("monitors")
+        if monitors is not None:
+            screen_info["monitors"] = monitors
+        monitor_index = session.last_window_info.get("monitor_index")
+        if monitor_index is not None:
+            screen_info["monitor_index"] = monitor_index
+        env_info: Dict[str, Any] = {
+            "os": platform.system().lower(),
+            "hostname": socket.gethostname(),
+            "machine_id": session.machine_id,
+            "screen": screen_info,
+        }
+        # Propagate os_theme / os_language if available
+        os_theme = session.last_window_info.get("os_theme")
+        if os_theme is not None:
+            env_info["os_theme"] = os_theme
+        os_language = session.last_window_info.get("os_language")
+        if os_language is not None:
+            env_info["os_language"] = os_language
         return {
             "schema_version": "rawsession_v1",
             "session_id": session.session_id,
             "agent_version": "agent_v1_stream",
-            "environment": {
-                "os": platform.system().lower(),
-                "hostname": socket.gethostname(),
-                "machine_id": session.machine_id,
-                "screen": {"primary_resolution": [1920, 1080]},
-            },
+            "environment": env_info,
             "user": {"id": "remote_agent"},
             "context": {
                 "workflow": session.last_window_info.get("title", ""),