
Dom d5deac3029 feat: VLM-first visual replay, separate worker, Léa package, AZERTY, HTTPS security

Visual replay pipeline:
- VLM-first: the agent calls Ollama directly to locate elements
- Template matching as fallback (strict 0.90 threshold)
- Immediate stop if an element is not found (no blind clicks)
- Replay from the raw session (/replay-session) without waiting for the VLM
- Post-action verification (screenshot hash before/after)
- Popup handling (Enter/Escape/Tab+Enter)
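
The replay order listed above (VLM first, strict template matching as fallback, stop when nothing is found, then compare screenshot hashes) can be sketched roughly as follows. Every name here (`locate`, `replay_click`, and the `vlm_find`, `template_find`, `click` and `grab` callables) is hypothetical and not taken from the repository; only the 0.90 threshold and the before/after hash check come from the commit message.

```python
import hashlib
from typing import Callable, Optional, Tuple

Point = Tuple[int, int]
TEMPLATE_THRESHOLD = 0.90  # strict threshold quoted in the commit message


class ElementNotFound(RuntimeError):
    """Raised to stop the replay instead of clicking blindly."""


def locate(target: str,
           vlm_find: Callable[[str], Optional[Point]],
           template_find: Callable[[str], Optional[Tuple[Point, float]]]) -> Point:
    """Try the VLM first, then template matching; never guess a position."""
    point = vlm_find(target)
    if point is not None:
        return point
    match = template_find(target)  # assumed to return (point, score) or None
    if match is not None and match[1] >= TEMPLATE_THRESHOLD:
        return match[0]
    raise ElementNotFound(target)


def screen_hash(pixels: bytes) -> str:
    """Hash raw screenshot bytes so two captures can be compared cheaply."""
    return hashlib.sha256(pixels).hexdigest()


def replay_click(target: str, vlm_find, template_find,
                 click: Callable[[Point], None],
                 grab: Callable[[], bytes]) -> bool:
    """Click the target and report whether the screen changed afterwards."""
    point = locate(target, vlm_find, template_find)
    before = screen_hash(grab())
    click(point)
    return screen_hash(grab()) != before
```

An exact hash only tells whether anything at all changed on screen; a real implementation would probably need to tolerate cursor blinks or clocks, which this sketch does not attempt.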

Separate VLM worker:
- run_worker.py: process separate from the HTTP server
- File-based communication (_worker_queue.txt + _replay_active.lock)
- The HTTP server no longer runs any VLM work, so it always stays responsive
- systemd service rpa-worker.service
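
To illustrate the file-based hand-off between the HTTP server and the worker, here is a minimal sketch. The two file names come from the commit message; the functions `enqueue` and `drain`, the one-job-per-line format, and the reading of the lock file as "a replay is running, so the worker should wait" are all assumptions.

```python
import os
from pathlib import Path
from typing import List

QUEUE_FILE = "_worker_queue.txt"    # file names quoted in the commit message
LOCK_FILE = "_replay_active.lock"


def enqueue(workdir: Path, session_id: str) -> None:
    """HTTP server side: append one job per line and return immediately."""
    with open(workdir / QUEUE_FILE, "a", encoding="utf-8") as fh:
        fh.write(session_id + "\n")


def drain(workdir: Path) -> List[str]:
    """Worker side: claim every queued job, unless a replay is running."""
    if (workdir / LOCK_FILE).exists():
        return []  # assumed meaning of the lock: the replay has priority
    queue = workdir / QUEUE_FILE
    claimed = workdir / (QUEUE_FILE + ".claimed")
    try:
        # Rename first so that later appends land in a fresh queue file.
        os.replace(queue, claimed)
    except FileNotFoundError:
        return []  # nothing queued
    lines = claimed.read_text(encoding="utf-8").splitlines()
    claimed.unlink()
    return [line.strip() for line in lines if line.strip()]
```

The rename keeps the server from ever waiting on the worker, which matches the stated goal of an always-responsive HTTP server. The sketch does not handle a writer that still holds the old file open during the rename.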

Keyboard capture:
- raw_keys (vk + press/release) for exact, layout-independent replay
- AZERTY fix: ToUnicodeEx + AltGr detection
- Enter captured as \n, Tab as \t
- Filtering of lone modifiers (stray Ctrl/Alt/Shift)
- Merging of consecutive text_input events, key_combo deduplication
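
The last item (merging consecutive text_input events and deduplicating key_combo events) could look something like the sketch below. The event schema (dicts with a `type` field plus `text` or `keys`) and the function name `compact_events` are hypothetical; the commit message names only the two event types.

```python
from typing import Dict, List

Event = Dict[str, str]


def compact_events(events: List[Event]) -> List[Event]:
    """Merge adjacent text_input events and drop adjacent duplicate key_combo events."""
    out: List[Event] = []
    for ev in events:
        prev = out[-1] if out else None
        if prev and ev["type"] == "text_input" and prev["type"] == "text_input":
            prev["text"] += ev["text"]  # "a" followed by "b" becomes "ab"
            continue
        if prev and ev["type"] == "key_combo" and prev == ev:
            continue  # assumption: an identical adjacent combo is a duplicate
        out.append(dict(ev))  # copy so the caller's events are left untouched
    return out
```

Treating every identical adjacent combo as a duplicate would also swallow a deliberate double press; a real implementation would likely compare timestamps as well, which this sketch leaves out.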

Security & Internet:
- HTTPS via Let's Encrypt (lea.labs + vwb.labs.laurinebazin.design)
- Fixed API token in .env.local
- HTTP Basic Auth on VWB
- Security headers (HSTS, CSP, nosniff)
- CORS restricted to public domains, no more wildcard

Infrastructure:
- DPI awareness (SetProcessDpiAwareness) in Python + Rust
- System metadata (dpi_scale, window_bounds, monitors, os_theme)
- Multi-scale template matching [0.5, 2.0]
- Dynamic resolution (no more hardcoded 1920x1080)
- VLM prefill fix (47x speedup, 3.5s instead of 180s)

Modules:
- core/auth/: credential vault (Fernet AES), TOTP (RFC 6238), auth handler
- core/federation/: anonymized LearningPack export/import, global FAISS
- deploy/: Léa package (config.txt, Lea.bat, install.bat, LISEZMOI.txt)
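
For readers unfamiliar with the TOTP mention under core/auth/, the RFC 6238 computation can be sketched with the standard library alone. This is not the repository's code: the function name `totp` and its parameters are illustrative, and it assumes the RFC's default HMAC-SHA1 variant with a 30-second step.

```python
import hashlib
import hmac
import struct
import time
from typing import Optional


def totp(secret: bytes, at: Optional[float] = None,
         step: int = 30, digits: int = 6) -> str:
    """Time-based one-time password (RFC 6238) using HMAC-SHA1."""
    now = time.time() if at is None else at
    counter = int(now // step)  # number of time steps since the Unix epoch
    digest = hmac.new(secret, struct.pack(">Q", counter), hashlib.sha1).digest()
    offset = digest[-1] & 0x0F  # dynamic truncation (RFC 4226, section 5.3)
    code = struct.unpack(">I", digest[offset:offset + 4])[0] & 0x7FFFFFFF
    return str(code % 10 ** digits).zfill(digits)


# RFC 6238 test vector: ASCII secret "12345678901234567890" at T = 59 s
print(totp(b"12345678901234567890", at=59, digits=8))  # 94287082
```

In practice the shared secret is usually exchanged in Base32, so a caller would decode it with `base64.b32decode` before passing it in.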

UX:
- OS filtering (VWB + Chat show only the workflows for the current OS)
- Persistent library (local cache + SQLite)
- Hybrid clustering (window title + DBSCAN)
- EdgeConstraints + PostConditions populated
- GraphBuilder compound actions (all keystrokes)

Rust agent:
- Bearer token auth (network.rs)
- sysinfo.rs (DPI, resolution, window bounds via the Win32 API)
- config.txt read automatically
- Chrome/Brave/Firefox support (not just Edge)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

2026-03-26 10:19:18 +01:00

| Name | Last commit | Date |
|---|---|---|
| agent_chat | feat: VLM-first visual replay, separate worker, Léa package, AZERTY, HTTPS security | 2026-03-26 10:19:18 +01:00 |
| agent_rust | feat: VLM-first visual replay, separate worker, Léa package, AZERTY, HTTPS security | 2026-03-26 10:19:18 +01:00 |
| agent_v0 | feat: VLM-first visual replay, separate worker, Léa package, AZERTY, HTTPS security | 2026-03-26 10:19:18 +01:00 |
| core | feat: VLM-first visual replay, separate worker, Léa package, AZERTY, HTTPS security | 2026-03-26 10:19:18 +01:00 |
| data/training/workflows | feat: math expression extraction + parameterizable calculator workflow | 2026-03-14 18:39:56 +01:00 |
| deploy | feat: VLM-first visual replay, separate worker, Léa package, AZERTY, HTTPS security | 2026-03-26 10:19:18 +01:00 |
| docs | feat: VLM-first visual replay, separate worker, Léa package, AZERTY, HTTPS security | 2026-03-26 10:19:18 +01:00 |
| examples | feat: unified chat, GestureCatalog, Copilot, Léa UI, data extraction, replay verification | 2026-03-15 10:02:09 +01:00 |
| gui | v1.0 - Stable version: multi-PC, UI-DETR-1 detection, 3 execution modes | 2026-01-29 11:23:51 +01:00 |
| i18n | v1.0 - Stable version: multi-PC, UI-DETR-1 detection, 3 execution modes | 2026-01-29 11:23:51 +01:00 |
| models | v1.0 - Stable version: multi-PC, UI-DETR-1 detection, 3 execution modes | 2026-01-29 11:23:51 +01:00 |
| scripts | feat: unified chat, GestureCatalog, Copilot, Léa UI, data extraction, replay verification | 2026-03-15 10:02:09 +01:00 |
| server | v1.0 - Stable version: multi-PC, UI-DETR-1 detection, 3 execution modes | 2026-01-29 11:23:51 +01:00 |
| tests | feat: VLM-first visual replay, separate worker, Léa package, AZERTY, HTTPS security | 2026-03-26 10:19:18 +01:00 |
| visual_workflow_builder | feat: VLM-first visual replay, separate worker, Léa package, AZERTY, HTTPS security | 2026-03-26 10:19:18 +01:00 |
| web_dashboard | feat: unified chat, GestureCatalog, Copilot, Léa UI, data extraction, replay verification | 2026-03-15 10:02:09 +01:00 |
| __init__.py | v1.0 - Stable version: multi-PC, UI-DETR-1 detection, 3 execution modes | 2026-01-29 11:23:51 +01:00 |
| .env.example | v1.0 - Stable version: multi-PC, UI-DETR-1 detection, 3 execution modes | 2026-01-29 11:23:51 +01:00 |
| .gitignore | feat: VLM-first visual replay, separate worker, Léa package, AZERTY, HTTPS security | 2026-03-26 10:19:18 +01:00 |
| agent_config.json | v1.0 - Stable version: multi-PC, UI-DETR-1 detection, 3 execution modes | 2026-01-29 11:23:51 +01:00 |
| AGENT_CONVERSATIONNEL_VISION.md | Refactor: Rename command_interface to agent_chat | 2026-01-15 15:13:26 +01:00 |
| ANALYSE_MOAT_RPA_VISION_V3.md | docs: Add complete MOAT analysis for RPA Vision V3 | 2026-01-18 18:10:52 +01:00 |
| cli.py | v1.0 - Stable version: multi-PC, UI-DETR-1 detection, 3 execution modes | 2026-01-29 11:23:51 +01:00 |
| Makefile | v1.0 - Stable version: multi-PC, UI-DETR-1 detection, 3 execution modes | 2026-01-29 11:23:51 +01:00 |
| monitoring_server.py | feat: unified chat, GestureCatalog, Copilot, Léa UI, data extraction, replay verification | 2026-03-15 10:02:09 +01:00 |
| PITCH_INVESTISSEURS_RPA_VISION_V3.md | Docs: Security audit and investor pitch | 2026-01-15 00:31:37 +01:00 |
| pytest.ini | v1.0 - Stable version: multi-PC, UI-DETR-1 detection, 3 execution modes | 2026-01-29 11:23:51 +01:00 |
| QUICK_START.md | v1.0 - Stable version: multi-PC, UI-DETR-1 detection, 3 execution modes | 2026-01-29 11:23:51 +01:00 |
| README.md | v1.0 - Stable version: multi-PC, UI-DETR-1 detection, 3 execution modes | 2026-01-29 11:23:51 +01:00 |
| requirements.txt | chore: consolidate venvs: single .venv with complete requirements.txt | 2026-03-17 07:52:25 +01:00 |
| run_gui.py | v1.0 - Stable version: multi-PC, UI-DETR-1 detection, 3 execution modes | 2026-01-29 11:23:51 +01:00 |
| run.sh | feat: unified chat, GestureCatalog, Copilot, Léa UI, data extraction, replay verification | 2026-03-15 10:02:09 +01:00 |
| services.conf | feat: VLM-first visual replay, separate worker, Léa package, AZERTY, HTTPS security | 2026-03-26 10:19:18 +01:00 |
| setup.py | v1.0 - Stable version: multi-PC, UI-DETR-1 detection, 3 execution modes | 2026-01-29 11:23:51 +01:00 |
| status.sh | v1.0 - Stable version: multi-PC, UI-DETR-1 detection, 3 execution modes | 2026-01-29 11:23:51 +01:00 |
| svc.sh | feat: VLM-first visual replay, separate worker, Léa package, AZERTY, HTTPS security | 2026-03-26 10:19:18 +01:00 |

README.md

RPA Vision V3 - 100% Vision-Based Workflow Automation

📊 Status

🚀 PRODUCTION-READY - Phase 12 Complete (77% System Completion) ✅

Latest Update: 14 December 2024

✅ 10/13 Phases Completed - Mature, functional system
✅ Exceptional Performance - 500-6250x faster than required
✅ Enterprise Architecture - 148k+ lines, 19 modules, 6 complete specs
✅ Technical Innovations - Self-healing, multi-modal, GPU management
📊 Full Audit - Detailed report

Quick Test: bash test_clip.sh

🎯 Vision

RPA based on a semantic understanding of interfaces, not on click coordinates.

The system learns workflows by observing the user and automates them robustly through a 5-layer architecture.

🏗️ 5-Layer Architecture

RawSession (Layer 0)
    ↓
ScreenState (Layer 1) - 4 levels of abstraction
    ↓
UIElement Detection (Layer 2) - Types + semantic roles
    ↓
State Embedding (Layer 3) - Multi-modal fusion
    ↓
Workflow Graph (Layer 4) - Nodes + Edges + Learning States

📁 Structure

rpa_vision_v3/
├── core/
│   ├── models/          # Layers 0-4: data structures
│   ├── capture/         # Layer 0: event capture + screenshots
│   ├── detection/       # Layer 2: semantic UI detection
│   ├── embedding/       # Layer 3: multi-modal fusion + FAISS
│   ├── graph/           # Layer 4: construction + matching + execution
│   └── persistence/     # Saving/loading
├── data/
│   ├── sessions/        # RawSessions
│   ├── screen_states/   # ScreenStates
│   ├── embeddings/      # .npy vectors
│   ├── faiss_index/     # FAISS index
│   └── workflows/       # Workflow Graphs
└── tests/               # Unit + integration tests

🚀 Quick Start

Installation

# 1. Install Ollama
curl -fsSL https://ollama.ai/install.sh | sh  # Linux
# or
brew install ollama  # macOS

# 2. Start Ollama
ollama serve

# 3. Download the VLM model
ollama pull qwen3-vl:8b

# 4. Install the Python dependencies
pip install -r requirements.txt

Quick Test

# System diagnostic
python3 rpa_vision_v3/examples/diagnostic_vlm.py

# Detection test
./rpa_vision_v3/test_quick.sh

Usage - UI Detection

from rpa_vision_v3.core.detection import create_detector

# Create the detector
detector = create_detector()

# Detect the UI elements
elements = detector.detect("screenshot.png")

# Use the results
for elem in elements:
    print(f"{elem.type:15s} | {elem.role:20s} | {elem.label}")

Usage - Workflow (Phase 4 - Coming Soon)

from rpa_vision_v3.core.models import RawSession, ScreenState, Workflow
from rpa_vision_v3.core.graph import GraphBuilder, NodeMatcher

# 1. Capture a session
session = RawSession(...)
# ... capture events and screenshots

# 2. Build the workflow automatically
builder = GraphBuilder(...)
workflow = builder.build_from_session(session)

# 3. Match the current state
matcher = NodeMatcher(...)
current_state = ScreenState(...)
match = matcher.match(current_state, workflow)

# 4. Execute the action
# NOTE: `executor` is not defined in this snippet; it is assumed to be an
# action executor created elsewhere.
if match:
    edges = workflow.get_outgoing_edges(match.node.node_id)
    if edges:  # guard against a node with no outgoing edge
        executor.execute_edge(edges[0], current_state)

📚 Documentation

Main Guides

Quick Start: QUICK_START.md - Getting started quickly
Next Steps: NEXT_STEPS.md - Roadmap and Phase 4
Phase 3 Complete: PHASE3_COMPLETE.md - Phase 3 summary

Technical Documentation

Full spec: .kiro/specs/workflow-graph-implementation/
Architecture: docs/reference/ARCHITECTURE_VISION_COMPLETE.md
Hybrid Detection: HYBRID_DETECTION_SUMMARY.md
Ollama Integration: docs/OLLAMA_INTEGRATION.md

🎓 Key Concepts

100% Vision-Based RPA

❌ No fixed (x, y) coordinates
✅ Semantic roles (primary_action, form_input, etc.)
✅ Matching by visual and textual similarity
✅ Robust to UI changes

Progressive Learning

OBSERVATION (5+ executions)
    ↓
COACHING (10+ assists, success >90%)
    ↓
AUTO_CANDIDATE (20+ executions, success >95%)
    ↓
AUTO_CONFIRMÉ (user validation)
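
One possible reading of this progression is a small state machine, sketched below. The README does not say whether each threshold is an entry or an exit condition. This sketch assumes each parenthetical must be satisfied before leaving that state, and that the final step also requires explicit user validation. `LearningState`, `Stats` and `next_state` are hypothetical names, and `AUTO_CONFIRMED` is an ASCII spelling of the last state.

```python
from dataclasses import dataclass
from enum import Enum


class LearningState(Enum):
    OBSERVATION = 1
    COACHING = 2
    AUTO_CANDIDATE = 3
    AUTO_CONFIRMED = 4


@dataclass
class Stats:
    executions: int = 0
    assists: int = 0
    successes: int = 0
    user_validated: bool = False

    @property
    def success_rate(self) -> float:
        return self.successes / self.executions if self.executions else 0.0


def next_state(state: LearningState, s: Stats) -> LearningState:
    """Advance by at most one step when the thresholds for `state` are met."""
    if state is LearningState.OBSERVATION and s.executions >= 5:
        return LearningState.COACHING
    if (state is LearningState.COACHING
            and s.assists >= 10 and s.success_rate > 0.90):
        return LearningState.AUTO_CANDIDATE
    if (state is LearningState.AUTO_CANDIDATE and s.executions >= 20
            and s.success_rate > 0.95 and s.user_validated):
        return LearningState.AUTO_CONFIRMED
    return state  # thresholds not met, or already in the final state
```

Advancing one step at a time keeps each promotion observable, which seems in keeping with a scheme that ends in explicit user validation.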

State Embedding

Multi-modal fusion:

50% Image (full screenshot)
30% Text (detected text)
10% Title (window)
10% UI (detected elements)
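
As an illustration of these weights, the sketch below fuses four per-modality vectors by weighted sum. Only the 50/30/10/10 split comes from the README. That the modalities share one dimension, that each vector is L2-normalized before and after fusion, and that fusion is a sum rather than a concatenation are all assumptions, and `fuse` is a hypothetical name.

```python
import math
from typing import Dict, List

# Weights from the README; the keys are illustrative names.
WEIGHTS = {"image": 0.5, "text": 0.3, "title": 0.1, "ui": 0.1}


def l2_normalize(v: List[float]) -> List[float]:
    """Scale a vector to unit length; a zero vector is returned unchanged."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm else list(v)


def fuse(parts: Dict[str, List[float]]) -> List[float]:
    """Weighted sum of the normalized modality vectors, renormalized."""
    dim = len(parts["image"])
    fused = [0.0] * dim
    for name, weight in WEIGHTS.items():
        vec = l2_normalize(parts[name])
        if len(vec) != dim:
            raise ValueError(f"{name}: expected dimension {dim}, got {len(vec)}")
        for i, x in enumerate(vec):
            fused[i] += weight * x
    return l2_normalize(fused)
```

Normalizing each modality first keeps the stated percentages meaningful, since otherwise a vector with a large norm would dominate regardless of its weight.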

🧪 Tests

# Unit tests
pytest tests/unit/

# Integration tests
pytest tests/integration/

# Performance tests
pytest tests/performance/ --benchmark-only

📈 Roadmap - 77% Complete (10/13 Phases)

✅ Completed Phases

Phase 1-2: Foundations + FAISS embeddings ✅
Phase 4-6: UI detection + Workflow Graphs + Action Execution ✅
Phase 7-8: Learning System + Training System ✅
Phase 10-12: GPU Management + Performance + Monitoring ✅

🎯 Remaining Phases

Phase 3: Final checkpoint (storage tests)
Phase 9: Visual Workflow Builder (90% → 100%)
Phase 13: End-to-end tests + final documentation

🚀 Production-Ready Components

Agent V0: Cross-platform capture + encryption ✅
Server API: Processing pipeline + web dashboard ✅
Analytics System: Monitoring + insights + reporting ✅
Self-Healing: Automatic adaptation + recovery ✅

🤝 Contributing

See .kiro/specs/workflow-graph-implementation/tasks.md for the tasks in progress.

📄 License

Languages

Python 82.6%, TypeScript 11.8%, HTML 2.7%, Shell 1.2%, CSS 1.1%, Other 0.4%