feat: VLM-first visual replay, separate worker, Léa package, AZERTY, HTTPS security

Visual replay pipeline:
- VLM-first: the agent calls Ollama directly to locate UI elements
- Template matching as a fallback (strict threshold 0.90)
- Immediate stop if an element is not found (no blind clicking)
- Replay directly from the raw session (/replay-session) without waiting for the VLM
- Post-action check (screenshot hash before/after)
- Popup handling (Enter/Escape/Tab+Enter)

Separate VLM worker:
- run_worker.py: a process separate from the HTTP server
- File-based communication (_worker_queue.txt + _replay_active.lock)
- The HTTP server never runs VLM calls anymore, so it always stays responsive
- systemd service rpa-worker.service

Keyboard capture:
- raw_keys (vk + press/release) for exact, layout-independent replay
- AZERTY fix: ToUnicodeEx + AltGr detection
- Enter captured as \n, Tab as \t
- Filtering of lone modifiers (spurious Ctrl/Alt/Shift)
- Merging of consecutive text_input events, key_combo deduplication

Security & Internet:
- HTTPS via Let's Encrypt (lea.labs + vwb.labs.laurinebazin.design)
- Fixed API token in .env.local
- HTTP Basic Auth on VWB
- Security headers (HSTS, CSP, nosniff)
- CORS limited to the public domains, no more wildcard

Infrastructure:
- DPI awareness (SetProcessDpiAwareness) in Python + Rust
- System metadata (dpi_scale, window_bounds, monitors, os_theme)
- Multi-scale template matching [0.5, 2.0]
- Dynamic resolution (no more hardcoded 1920x1080)
- VLM prefill fix (47x speedup, 3.5 s instead of 180 s)

Modules:
- core/auth/: credential vault (Fernet AES), TOTP (RFC 6238), auth handler
- core/federation/: anonymized LearningPack export/import, global FAISS
- deploy/: Léa package (config.txt, Lea.bat, install.bat, LISEZMOI.txt)

UX:
- OS filtering (VWB and Chat only show the workflows for the current OS)
- Persistent library (local cache + SQLite)
- Hybrid clustering (window title + DBSCAN)
- EdgeConstraints and PostConditions are now populated
- GraphBuilder compound actions (all keystrokes included)

Rust agent:
- Bearer token auth (network.rs)
- sysinfo.rs (DPI, resolution, window bounds via Win32 API)
- config.txt read automatically
- Chrome/Brave/Firefox support (not just Edge)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
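The keyboard-capture bullets (merging consecutive text_input events, deduplicating key_combo) can be illustrated as a single pass over the recorded events. This is a minimal sketch under stated assumptions: the event shape (dicts with a "type" key plus "text" or "keys") and the function name merge_key_events are hypothetical, not the recorder's real schema, and "deduplication" is interpreted here as dropping a key_combo that immediately repeats the previous one.

```python
from typing import Any, Dict, List

# Hypothetical event shape (assumption, not the real capture format):
#   {"type": "text_input", "text": "abc"}
#   {"type": "key_combo", "keys": ["ctrl", "c"]}
Event = Dict[str, Any]


def merge_key_events(events: List[Event]) -> List[Event]:
    """Merge consecutive text_input events and drop back-to-back identical key_combo events."""
    merged: List[Event] = []
    for event in events:
        previous = merged[-1] if merged else None

        if (
            previous is not None
            and event["type"] == "text_input"
            and previous["type"] == "text_input"
        ):
            # merged only holds copies, so extending in place leaves the input untouched.
            previous["text"] += event["text"]
            continue

        if (
            previous is not None
            and event["type"] == "key_combo"
            and previous["type"] == "key_combo"
            and previous["keys"] == event["keys"]
        ):
            # Same combo reported twice in a row (assumed duplicate): keep the first.
            continue

        merged.append(dict(event))
    return merged
```

For example, text_input events "Hel" and "lo\n" collapse into one "Hello\n" event, and two back-to-back ctrl+s combos collapse into one. A real implementation would probably also compare timestamps, so that a deliberate double Ctrl+V is not merged away.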
2026-03-26 10:19:18 +01:00
parent fe5e0ba83d
commit d5deac3029
162 changed files with 25669 additions and 557 deletions
--- a/core/embedding/fusion_engine.py
+++ b/core/embedding/fusion_engine.py
@@ -125,18 +125,32 @@ class FusionEngine:
                      weights: Dict[str, float]) -> np.ndarray:
        """
        Simple weighted fusion: weighted sum of the vectors
-        
+
        fused = w1*v1 + w2*v2 + w3*v3 + w4*v4
+
+        The weights are renormalized over the modalities actually
+        present, so that the effective weights sum to 1.0.
+        Example: if only image (0.5) and text (0.3) are provided,
+        the weights become image=0.625, text=0.375.
        """
        # Initialize the result vector
        first_vector = next(iter(embeddings.values()))
        fused = np.zeros_like(first_vector, dtype=np.float32)
-        
-        # Weighted sum
+
+        # Sum the weights of the modalities present, used for renormalization
+        present_weight_sum = sum(
+            weights.get(modality, 0.0) for modality in embeddings
+        )
+
+        # Weighted sum with renormalization
        for modality, vector in embeddings.items():
-            weight = weights.get(modality, 0.0)
-            fused += weight * vector
-        
+            raw_weight = weights.get(modality, 0.0)
+            if present_weight_sum > 1e-10:
+                effective_weight = raw_weight / present_weight_sum
+            else:
+                effective_weight = 1.0 / len(embeddings)
+            fused += effective_weight * vector
+
        return fused
    
    def _fuse_concat_projection(self,
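The renormalization added to the weighted fusion can be checked in isolation with plain Python lists instead of NumPy arrays. This is a standalone sketch, not the FusionEngine method itself: the function name fuse_weighted and the list-based vectors are assumptions for illustration. It mirrors the diff's logic: divide each weight by the sum of the weights of the modalities present, and fall back to uniform weights when that sum is about zero.

```python
from typing import Dict, List


def fuse_weighted(
    embeddings: Dict[str, List[float]], weights: Dict[str, float]
) -> List[float]:
    """Weighted sum of equally sized vectors, renormalized over the present modalities.

    Hypothetical list-based stand-in for the NumPy version in the diff.
    """
    dim = len(next(iter(embeddings.values())))
    fused = [0.0] * dim

    # Only the weights of modalities actually present count toward the total.
    present_weight_sum = sum(weights.get(m, 0.0) for m in embeddings)

    for modality, vector in embeddings.items():
        if present_weight_sum > 1e-10:
            effective_weight = weights.get(modality, 0.0) / present_weight_sum
        else:
            # No usable weights for the present modalities: uniform average.
            effective_weight = 1.0 / len(embeddings)
        for i, value in enumerate(vector):
            fused[i] += effective_weight * value
    return fused
```

With weights {"image": 0.5, "text": 0.3, "title": 0.1, "ui": 0.1} and only image=[1.0, 0.0] and text=[0.0, 1.0] present, the present weights sum to 0.8, so the effective weights are 0.5/0.8 = 0.625 and 0.3/0.8 = 0.375, matching the docstring example. Without renormalization the fused vector would simply shrink whenever a modality is missing.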
--- a/core/embedding/state_embedding_builder.py
+++ b/core/embedding/state_embedding_builder.py
@@ -112,7 +112,7 @@ class StateEmbeddingBuilder:
            metadata={
                "screen_state_id": screen_state.screen_state_id,
                "timestamp": screen_state.timestamp.isoformat(),
-                "window_title": getattr(screen_state.window, 'title', ''),
+                "window_title": getattr(screen_state.window, 'window_title', ''),
                "created_at": datetime.now().isoformat()
            }
        )
@@ -160,15 +160,16 @@ class StateEmbeddingBuilder:
            if ui_emb is not None:
                embeddings["ui"] = ui_emb
        
-        # If no embedding was computed, create default vectors
+        # If no embedding was computed, return a single zero vector
+        # (DBSCAN will treat it as noise, which is the intended behavior)
        if not embeddings:
-            # Use the default dimension (512)
            default_dim = 512
+            logger.warning(
+                "No embedding computed for this ScreenState: "
+                "returning a zero vector (DBSCAN will treat it as noise)"
+            )
            embeddings = {
-                "image": np.random.randn(default_dim).astype(np.float32),
-                "text": np.random.randn(default_dim).astype(np.float32),
-                "title": np.random.randn(default_dim).astype(np.float32),
-                "ui": np.random.randn(default_dim).astype(np.float32)
+                "image": np.zeros(default_dim, dtype=np.float32)
            }
        
        return embeddings
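Whether the zero-vector fallback really ends up as noise depends on the distance metric, which this diff does not show. A minimal sketch, assuming a cosine metric with the common convention that a zero-norm vector has similarity 0 to everything: the zero vector then sits at distance 1.0 from every point, so with eps below 1.0 it cannot gather any neighbor and DBSCAN labels it noise (as long as min_samples is at least 2). Under Euclidean distance, by contrast, several zero vectors would be at distance 0 from one another and could form a cluster of their own. The helpers cosine_distance and neighbors_within_eps are hypothetical.

```python
import math
from typing import List, Sequence


def cosine_distance(u: Sequence[float], v: Sequence[float]) -> float:
    """1 - cosine similarity; a zero-norm vector is treated as similarity 0 (assumed convention)."""
    norm_u = math.sqrt(sum(x * x for x in u))
    norm_v = math.sqrt(sum(x * x for x in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 1.0
    dot = sum(a * b for a, b in zip(u, v))
    return 1.0 - dot / (norm_u * norm_v)


def neighbors_within_eps(points: List[List[float]], index: int, eps: float) -> List[int]:
    """Indices of the other points within eps of points[index] (the point itself excluded)."""
    return [
        j
        for j, other in enumerate(points)
        if j != index and cosine_distance(points[index], other) <= eps
    ]
```

With two zero vectors and two nearly parallel non-zero vectors, the zero vectors have no neighbor at eps=0.3 (not even each other), while the two non-zero vectors are neighbors. The old random fallback had no such guarantee: a random vector could land anywhere and be absorbed into an unrelated cluster.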
@@ -243,7 +244,7 @@ class StateEmbeddingBuilder:
        
        try:
            embedder = self.embedders["title"]
-            title = getattr(screen_state.window, 'title', '')
+            title = getattr(screen_state.window, 'window_title', '')
            
            if not title:
                return None
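The two window_title changes fix a bug that getattr with a default value hides: if the window object has no attribute named title, the call returns '' instead of raising, so the window_title metadata was empty and the title embedding was presumably always skipped (the code returns None on an empty title). A small illustration, assuming the window object exposes window_title; the WindowInfo class is hypothetical and only stands in for whatever screen_state.window actually is.

```python
from dataclasses import dataclass


@dataclass
class WindowInfo:
    """Hypothetical stand-in for screen_state.window."""

    window_title: str


window = WindowInfo(window_title="Invoice - Notepad")

# Old lookup: wrong attribute name, silently falls back to the default ''.
old_title = getattr(window, "title", "")

# New lookup: matches the attribute the object actually defines.
new_title = getattr(window, "window_title", "")

# Without a default, the wrong name would have surfaced immediately.
try:
    getattr(window, "title")
    typo_detected = False
except AttributeError:
    typo_detected = True
```

Where the attribute is expected to always exist, dropping the default (or reading window.window_title directly) makes this kind of rename fail loudly instead of silently.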