
Dom cecdf417b7 fix: strict step checks + routing by machine_id

Critical fixes after an E2E test that showed clicks landing in the wrong place:

1. Routing by machine_id (api_stream.py)
   When 2 machines share the same session_id (agent_demo_user),
   the actions of a replay meant for the VM must NO LONGER be dispatched
   to the physical PC. The server now checks that the replay_state belongs to
   the polling machine before consuming the queue.
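The ownership check in item 1 can be sketched as follows. This is a minimal illustration, not the code in api_stream.py: the `ReplayState` shape, the in-memory `replays` dict keyed by session_id, and `poll_next_action` are all hypothetical names chosen for the example.

```python
from collections import deque
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class ReplayState:
    # Hypothetical structure: the real replay_state in api_stream.py may differ.
    replay_id: str
    machine_id: str  # machine the replay was started for
    queue: deque = field(default_factory=deque)


def poll_next_action(replays: dict, session_id: str, machine_id: str) -> Optional[dict]:
    """Return the next queued action for the polling machine, or None.

    Several machines may share one session_id, so the session alone is not
    enough: a replay is only consumed by the machine it was started for.
    """
    state = replays.get(session_id)
    if state is None or state.machine_id != machine_id:
        # Not this machine's replay: leave the queue untouched.
        return None
    return state.queue.popleft() if state.queue else None
```

With a replay started for `vm-01` under the shared session `agent_demo_user`, a poll from the physical PC returns `None` and leaves the queue intact; only the VM dequeues the action.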

2. IRBuilder extracts expected_window_before/after (ir_builder.py)
   For each click/type/key_combo action, it stores the window title
   at the moment of the click (before) and the title of the next event (after).
   These fields feed the strict check at runtime.
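A rough sketch of the extraction described in item 2, assuming recorded events are plain dicts with `type` and `window_title` keys (an assumption; the actual IRBuilder event model is not shown here). `annotate_expected_windows` is a hypothetical helper name.

```python
ACTION_TYPES = {"click", "type", "key_combo"}


def annotate_expected_windows(events: list) -> list:
    """Attach expected_window_before/after to each click/type/key_combo event.

    Assumption: each event is a dict with "type" and "window_title" keys.
    "before" is the window title at the moment of the action; "after" is the
    title recorded on the next event of any type (None for the last event).
    """
    actions = []
    for i, event in enumerate(events):
        if event.get("type") not in ACTION_TYPES:
            continue
        following = events[i + 1] if i + 1 < len(events) else None
        actions.append({
            **event,
            "expected_window_before": event.get("window_title"),
            "expected_window_after": following.get("window_title") if following else None,
        })
    return actions
```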

3. ExecutionCompiler creates a title_match SuccessCondition (execution_compiler.py)
   When expected_window_after is set, it creates a STRICT success condition
   with method="title_match" and expected_title. No more plain
   "the screen changed" check: the resulting window is verified.

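Item 3 can be pictured with a small sketch. Only `method="title_match"` and `expected_title` come from the commit; the `SuccessCondition` fields beyond those, the `strict` flag placement, and the `"screen_changed"` fallback method name are assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class SuccessCondition:
    # Hypothetical shape: only method="title_match" and expected_title come from the commit.
    method: str
    expected_title: Optional[str] = None
    strict: bool = False


def compile_success_condition(action: dict) -> SuccessCondition:
    """Build a strict title check when the recorded action knows its resulting window."""
    expected = action.get("expected_window_after")
    if expected:
        return SuccessCondition(method="title_match", expected_title=expected, strict=True)
    # Assumed fallback: the older, weaker "the screen changed" check.
    return SuccessCondition(method="screen_changed")
```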
4. The Runner propagates expected_window_before and success_strict
   The success_strict flag tells the agent that the post-action check
   MUST be strict (STOP on mismatch instead of a warning).

5. Strict UIA on parent_path (executor.py)
   _resolve_via_uia_local REJECTS a match if the element found is not
   in the right parent window (this avoids, e.g., confusing the taskbar
   "Rechercher" (Search) box with the File Explorer "Rechercher" box).
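The parent-window rejection in item 5 amounts to filtering candidates on both label and ancestry. In this sketch, candidates are assumed to be dicts with a `name` and a `parent_path` list of ancestor names, and the window check is an assumed substring match; `resolve_strict` is a hypothetical stand-in, not `_resolve_via_uia_local` itself.

```python
from typing import Optional


def in_expected_window(parent_path: list, expected_window: str) -> bool:
    """True if one of the element's ancestors carries the expected window title."""
    return any(expected_window in name for name in parent_path)


def resolve_strict(candidates: list, label: str, expected_window: str) -> Optional[dict]:
    """Return the first candidate with the right label AND the right parent window.

    A same-named element in another window (e.g. the taskbar) is rejected
    instead of being used as a fallback.
    """
    for candidate in candidates:
        if candidate["name"] == label and in_expected_window(candidate["parent_path"], expected_window):
            return candidate
    return None
```

Returning `None` rather than the first same-named element is the point: an unresolved target stops the step, while a wrong target silently clicks elsewhere.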

6. Strict, blocking pre/post checks (executor.py)
   - expected_window_before is read first from the action (V4 plan)
   - Post-check: if success_strict=True and the check times out, result.success=False
     → the replay stops instead of continuing with warnings.
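The strict versus lenient post-check from items 4 and 6 can be sketched as a polling loop. `get_title` is an injected callable standing in for whatever reads the foreground window title, and the substring match, default timeout, and poll interval are assumptions, not values taken from executor.py.

```python
import logging
import time
from typing import Callable

log = logging.getLogger(__name__)


def post_check(get_title: Callable[[], str], expected_title: str, strict: bool,
               timeout_s: float = 5.0, poll_s: float = 0.2) -> bool:
    """Poll the foreground window title until it matches or the timeout expires.

    Returns the value to store in result.success. On timeout, strict mode
    returns False (the replay stops); lenient mode logs a warning and returns True.
    """
    deadline = time.monotonic() + timeout_s
    while True:
        title = get_title()
        if expected_title in title:  # assumed substring match on the title
            return True
        if time.monotonic() >= deadline:
            break
        time.sleep(poll_s)
    if strict:
        log.error("post-check failed: expected %r, got %r", expected_title, title)
        return False
    log.warning("post-check mismatch ignored: expected %r, got %r", expected_title, title)
    return True
```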

Validated on the VM:
- The replay stops cleanly when step 2 ends up in "Propriétés de Internet"
  (Internet Properties) instead of "blocnote.txt - Bloc-notes" (Notepad)
- No more blind clicks or typing in the wrong place

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

2026-04-10 14:05:23 +02:00

agent_chat

feat: VLM-first visual replay, separate worker, Léa package, AZERTY, HTTPS security

2026-03-26 10:19:18 +01:00

agent_rust/lea_uia

feat: lea_uia, a Rust helper for Windows UI Automation (cross-compiled)

2026-04-10 09:30:45 +02:00

agent_v0

fix: strict step checks + routing by machine_id

2026-04-10 14:05:23 +02:00

core

fix: strict step checks + routing by machine_id

2026-04-10 14:05:23 +02:00

data/training/workflows

feat: math expression extraction + parameterizable calculator workflow

2026-03-14 18:39:56 +01:00

deploy

feat: complete MACRO/MESO/MICRO pipeline: Critic, Observer, Policy, Recovery, Learning, Audit Trail, TaskPlanner

2026-04-09 21:03:25 +02:00

docs

docs: April 5 consolidation, complete status review

2026-04-05 21:25:10 +02:00

examples

feat: unified chat, GestureCatalog, Copilot, Léa UI, data extraction, replay verification

2026-03-15 10:02:09 +01:00

gui

v1.0 - Stable version: multi-PC, UI-DETR-1 detection, 3 execution modes

2026-01-29 11:23:51 +01:00

i18n

v1.0 - Stable version: multi-PC, UI-DETR-1 detection, 3 execution modes

2026-01-29 11:23:51 +01:00

models

v1.0 - Stable version: multi-PC, UI-DETR-1 detection, 3 execution modes

2026-01-29 11:23:51 +01:00

scripts

feat: unified chat, GestureCatalog, Copilot, Léa UI, data extraction, replay verification

2026-03-15 10:02:09 +01:00

server

v1.0 - Stable version: multi-PC, UI-DETR-1 detection, 3 execution modes

2026-01-29 11:23:51 +01:00

tests

feat: complete V4 wiring: UIA strategy + surface profile

2026-04-10 11:02:51 +02:00

visual_workflow_builder

feat: VLM-first visual replay, separate worker, Léa package, AZERTY, HTTPS security

2026-03-26 10:19:18 +01:00

web_dashboard

feat: Léa chat + enriched IRBuilder (complete V4 strategies)

2026-04-10 09:01:13 +02:00

__init__.py

v1.0 - Stable version: multi-PC, UI-DETR-1 detection, 3 execution modes

2026-01-29 11:23:51 +01:00

.env.example

v1.0 - Stable version: multi-PC, UI-DETR-1 detection, 3 execution modes

2026-01-29 11:23:51 +01:00

.gitignore

feat: VLM-first visual replay, separate worker, Léa package, AZERTY, HTTPS security

2026-03-26 10:19:18 +01:00

agent_config.json

v1.0 - Stable version: multi-PC, UI-DETR-1 detection, 3 execution modes

2026-01-29 11:23:51 +01:00

AGENT_CONVERSATIONNEL_VISION.md

Refactor: Rename command_interface to agent_chat

2026-01-15 15:13:26 +01:00

ANALYSE_MOAT_RPA_VISION_V3.md

docs: Add complete MOAT analysis of RPA Vision V3

2026-01-18 18:10:52 +01:00

cli.py

v1.0 - Stable version: multi-PC, UI-DETR-1 detection, 3 execution modes

2026-01-29 11:23:51 +01:00

Makefile

v1.0 - Stable version: multi-PC, UI-DETR-1 detection, 3 execution modes

2026-01-29 11:23:51 +01:00

monitoring_server.py

feat: unified chat, GestureCatalog, Copilot, Léa UI, data extraction, replay verification

2026-03-15 10:02:09 +01:00

PITCH_INVESTISSEURS_RPA_VISION_V3.md

Docs: Security audit and investor pitch

2026-01-15 00:31:37 +01:00

pytest.ini

feat: complete MACRO/MESO/MICRO pipeline: Critic, Observer, Policy, Recovery, Learning, Audit Trail, TaskPlanner

2026-04-09 21:03:25 +02:00

QUICK_START.md

v1.0 - Stable version: multi-PC, UI-DETR-1 detection, 3 execution modes

2026-01-29 11:23:51 +01:00

README.md

v1.0 - Stable version: multi-PC, UI-DETR-1 detection, 3 execution modes

2026-01-29 11:23:51 +01:00

requirements.txt

chore: consolidate venvs: single .venv with a complete requirements.txt

2026-03-17 07:52:25 +01:00

run_gui.py

v1.0 - Stable version: multi-PC, UI-DETR-1 detection, 3 execution modes

2026-01-29 11:23:51 +01:00

run.sh

feat: unified chat, GestureCatalog, Copilot, Léa UI, data extraction, replay verification

2026-03-15 10:02:09 +01:00

services.conf

feat: VLM-first visual replay, separate worker, Léa package, AZERTY, HTTPS security

2026-03-26 10:19:18 +01:00

setup.py

v1.0 - Stable version: multi-PC, UI-DETR-1 detection, 3 execution modes

2026-01-29 11:23:51 +01:00

status.sh

v1.0 - Stable version: multi-PC, UI-DETR-1 detection, 3 execution modes

2026-01-29 11:23:51 +01:00

svc.sh

feat: VLM-first visual replay, separate worker, Léa package, AZERTY, HTTPS security

2026-03-26 10:19:18 +01:00

README.md

RPA Vision V3 - 100% Vision-Based Workflow Automation

📊 Status

🚀 PRODUCTION-READY - Phase 12 Complete (77% System Completion) ✅

Latest Update: December 14, 2024

✅ 10/13 Phases Completed - Mature, functional system
✅ Exceptional Performance - 500-6250x faster than required
✅ Enterprise Architecture - 148k+ lines, 19 modules, 6 complete specs
✅ Technical Innovations - Self-healing, Multi-modal, GPU management
📊 Full Audit - Detailed report

Quick Test: bash test_clip.sh

🎯 Vision

RPA based on a semantic understanding of interfaces, not on click coordinates.

The system learns workflows by observing the user and automates them robustly thanks to a 5-layer architecture.

🏗️ 5-Layer Architecture

RawSession (Layer 0)
    ↓
ScreenState (Layer 1) - 4 levels of abstraction
    ↓
UIElement Detection (Layer 2) - Types + semantic roles
    ↓
State Embedding (Layer 3) - Multi-modal fusion
    ↓
Workflow Graph (Layer 4) - Nodes + Edges + Learning States

📁 Structure

rpa_vision_v3/
├── core/
│   ├── models/          # Layers 0-4: data structures
│   ├── capture/         # Layer 0: event + screenshot capture
│   ├── detection/       # Layer 2: semantic UI detection
│   ├── embedding/       # Layer 3: multi-modal fusion + FAISS
│   ├── graph/           # Layer 4: construction + matching + execution
│   └── persistence/     # Save/load
├── data/
│   ├── sessions/        # RawSessions
│   ├── screen_states/   # ScreenStates
│   ├── embeddings/      # .npy vectors
│   ├── faiss_index/     # FAISS index
│   └── workflows/       # Workflow Graphs
└── tests/               # Unit + integration tests

🚀 Quick Start

Installation

# 1. Install Ollama
curl -fsSL https://ollama.ai/install.sh | sh  # Linux
# or
brew install ollama  # macOS

# 2. Start Ollama (runs in the foreground; keep it open in a separate terminal)
ollama serve

# 3. Download the VLM model
ollama pull qwen3-vl:8b

# 4. Install Python dependencies
pip install -r requirements.txt

Quick Test

# System diagnostic
python3 rpa_vision_v3/examples/diagnostic_vlm.py

# Detection test
./rpa_vision_v3/test_quick.sh

Usage - UI Detection

from rpa_vision_v3.core.detection import create_detector

# Create the detector
detector = create_detector()

# Detect UI elements
elements = detector.detect("screenshot.png")

# Use the results
for elem in elements:
    print(f"{elem.type:15s} | {elem.role:20s} | {elem.label}")

Usage - Workflow (Phase 4 - Upcoming)

from rpa_vision_v3.core.models import RawSession, ScreenState, Workflow
from rpa_vision_v3.core.graph import GraphBuilder, NodeMatcher

# 1. Capture a session
session = RawSession(...)
# ... capture events and screenshots

# 2. Build the workflow automatically
builder = GraphBuilder(...)
workflow = builder.build_from_session(session)

# 3. Match the current state
matcher = NodeMatcher(...)
current_state = ScreenState(...)
match = matcher.match(current_state, workflow)

# 4. Execute the action
# (`executor` is an action executor created beforehand; its construction is not shown here)
if match:
    edges = workflow.get_outgoing_edges(match.node.node_id)
    if edges:  # a terminal node has no outgoing edge
        executor.execute_edge(edges[0], current_state)

📚 Documentation

Main Guides

Quick Start: QUICK_START.md - Getting started quickly
Next Steps: NEXT_STEPS.md - Roadmap and Phase 4
Phase 3 Complete: PHASE3_COMPLETE.md - Phase 3 summary

Technical Documentation

Full spec: .kiro/specs/workflow-graph-implementation/
Architecture: docs/reference/ARCHITECTURE_VISION_COMPLETE.md
Hybrid Detection: HYBRID_DETECTION_SUMMARY.md
Ollama Integration: docs/OLLAMA_INTEGRATION.md

🎓 Key Concepts

100% Vision-Based RPA

❌ No fixed (x, y) coordinates
✅ Semantic roles (primary_action, form_input, etc.)
✅ Matching by visual and textual similarity
✅ Robust to UI changes
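To make the "roles, not coordinates" idea concrete, here is a minimal sketch that locates a target by semantic role plus fuzzy label similarity and only reads its position from the current detection. The `Element` class, `find_target`, and the 0.6 threshold are hypothetical; `difflib.SequenceMatcher` is used here as a simple stand-in for the system's actual visual and textual similarity.

```python
from dataclasses import dataclass
from difflib import SequenceMatcher
from typing import Optional


@dataclass
class Element:
    role: str   # e.g. "primary_action", "form_input"
    label: str
    bbox: tuple  # (x, y, w, h) from the current detection, never stored in the workflow


def find_target(elements: list, role: str, label: str, min_score: float = 0.6) -> Optional[Element]:
    """Pick the element with the right role whose label best matches the recorded one."""
    best, best_score = None, 0.0
    for el in elements:
        if el.role != role:
            continue
        score = SequenceMatcher(None, el.label.lower(), label.lower()).ratio()
        if score > best_score:
            best, best_score = el, score
    return best if best_score >= min_score else None
```

Because the position comes from the current screen, a button that moved after a UI change is still found as long as its role and label are recognizable.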

Progressive Learning

OBSERVATION (5+ executions)
    ↓
COACHING (10+ assisted runs, success >90%)
    ↓
AUTO_CANDIDATE (20+ executions, success >95%)
    ↓
AUTO_CONFIRMÉ (user validation)
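One possible reading of this progression, written as a small promotion function. It assumes the numbers next to each state are what must be accumulated before leaving it, with counters reset on each promotion, and that AUTO_CANDIDATE additionally needs user validation to become confirmed. `LearningState`, `WorkflowStats`, and `next_state` are hypothetical names, not the project's API.

```python
from dataclasses import dataclass
from enum import Enum


class LearningState(Enum):
    # Hypothetical names mirroring the diagram (AUTO_CONFIRMED = AUTO_CONFIRMÉ).
    OBSERVATION = "observation"
    COACHING = "coaching"
    AUTO_CANDIDATE = "auto_candidate"
    AUTO_CONFIRMED = "auto_confirmed"


@dataclass
class WorkflowStats:
    state: LearningState = LearningState.OBSERVATION
    runs: int = 0        # runs counted since entering the current state (assumed)
    successes: int = 0   # successful runs since entering the current state
    user_validated: bool = False

    @property
    def success_rate(self) -> float:
        return self.successes / self.runs if self.runs else 0.0


def next_state(stats: WorkflowStats) -> LearningState:
    """Return the state the workflow should move to, under the assumed thresholds."""
    s = stats.state
    if s is LearningState.OBSERVATION and stats.runs >= 5:
        return LearningState.COACHING
    if s is LearningState.COACHING and stats.runs >= 10 and stats.success_rate > 0.90:
        return LearningState.AUTO_CANDIDATE
    if (s is LearningState.AUTO_CANDIDATE and stats.runs >= 20
            and stats.success_rate > 0.95 and stats.user_validated):
        return LearningState.AUTO_CONFIRMED
    return s
```

The thresholds are strict inequalities on the success rate, so 9 successes out of 10 (exactly 90%) keeps a workflow in COACHING.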

State Embedding

Multi-modal fusion:

50% Image (full screenshot)
30% Text (detected text)
10% Title (window)
10% UI (detected elements)
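A minimal sketch of how these weights could combine per-modality vectors, assuming a weighted sum of L2-normalized vectors that all share one dimension (the actual fusion in core/embedding may instead concatenate or project; the encoders are not shown). `fuse_state_embedding` is a hypothetical name.

```python
import numpy as np

# Weights from the README; the per-modality encoders are not shown.
WEIGHTS = {"image": 0.5, "text": 0.3, "title": 0.1, "ui": 0.1}


def _l2_normalize(v: np.ndarray) -> np.ndarray:
    norm = np.linalg.norm(v)
    return v / norm if norm > 0 else v


def fuse_state_embedding(parts: dict) -> np.ndarray:
    """Weighted sum of L2-normalized modality vectors, re-normalized to unit length.

    Assumes every modality vector has the same dimension.
    """
    if not parts:
        raise ValueError("at least one modality vector is required")
    fused = sum(WEIGHTS[name] * _l2_normalize(np.asarray(vec, dtype=np.float32))
                for name, vec in parts.items())
    return _l2_normalize(fused)
```

Normalizing each modality first keeps one encoder's scale from drowning out the weights; the unit-length result also means that, if the FAISS index uses inner product, search scores equal cosine similarity.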

🧪 Tests

# Unit tests
pytest tests/unit/

# Integration tests
pytest tests/integration/

# Performance tests (--benchmark-only is provided by the pytest-benchmark plugin)
pytest tests/performance/ --benchmark-only

📈 Roadmap - 77% Complete (10/13 Phases)

✅ Completed Phases

Phase 1-2: Foundations + FAISS Embeddings ✅
Phase 4-6: UI Detection + Workflow Graphs + Action Execution ✅
Phase 7-8: Learning System + Training System ✅
Phase 10-12: GPU Management + Performance + Monitoring ✅

🎯 Remaining Phases

Phase 3: Final checkpoint (storage tests)
Phase 9: Visual Workflow Builder (90% → 100%)
Phase 13: End-to-end tests + final documentation

🚀 Production-Ready Components

Agent V0: Cross-platform capture + encryption ✅
Server API: Processing pipeline + web dashboard ✅
Analytics System: Monitoring + insights + reporting ✅
Self-Healing: Automatic adaptation + recovery ✅

🤝 Contribution

See .kiro/specs/workflow-graph-implementation/tasks.md for the tasks in progress.

📄 License

Languages: Python 82.6%, TypeScript 11.8%, HTML 2.7%, Shell 1.2%, CSS 1.1%, Other 0.4%