rpa_vision_v3/docs/specs/requirements.md
Dom a27b74cf22 v1.0 - Stable version: multi-PC, UI-DETR-1 detection, 3 execution modes
- Frontend v4 accessible on the local network (192.168.1.40)
- Open ports: 3002 (frontend), 5001 (backend), 5004 (dashboard)
- Ollama GPU operational
- Interactive self-healing
- Confidence dashboard

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-29 11:23:51 +01:00


Requirements Document - Workflow Graph Implementation

Introduction

This document defines the requirements for implementing the Workflow Graph architecture of the RPA Vision V2 system. The system progressively transforms raw screenshots into learned semantic workflows, enabling automation based on visual understanding rather than on click coordinates.

The architecture follows a 5-layer approach: RawSession (raw capture) → ScreenState (multi-modal analysis) → UIElement Detection (semantic detection) → State Embedding (multi-modal fusion) → Workflow Graph (graph modeling with progressive learning).

Glossary

  • System: The RPA Vision V2 system
  • ScreenState: Structured representation of a screen state at 4 levels (Raw, Perception, Sémantique UI, Contexte Métier)
  • UIElement: Detected interface element with type, role, and embeddings
  • State Embedding: Single vector (fingerprint) fusing all modalities of a screen
  • WorkflowNode: Screen-state template within a workflow graph
  • WorkflowEdge: Transition (action) between two nodes
  • Workflow Graph: Complete graph modeling a workflow with learning states
  • Learning State: Progression state (OBSERVATION, COACHING, AUTO_CANDIDATE, AUTO_CONFIRMÉ)
  • RawSession: Raw recording of user events with screenshots
  • Embedding: Numeric vector representing one modality (image, text, UI)
  • FAISS Index: Similarity-search index for embeddings
  • VLM: Vision-Language Model

Requirements

Requirement 1

User Story: As a system developer, I want to faithfully capture user sessions with all events and screenshots, so that I can analyze and learn workflows.

Acceptance Criteria

  1. WHEN THE System captures a user session THEN THE System SHALL record all mouse events with precise timestamps and window context
  2. WHEN THE System captures a user session THEN THE System SHALL record all keyboard events with key combinations and window context
  3. WHEN THE System captures a user session THEN THE System SHALL take screenshots at each significant event with unique identifiers
  4. WHEN THE System saves a RawSession THEN THE System SHALL serialize it to JSON format with schema version "rawsession_v1"
  5. WHEN THE System loads a RawSession THEN THE System SHALL deserialize it from JSON and validate schema compatibility

Requirement 2

User Story: As a system developer, I want to transform each screenshot into a structured 4-level ScreenState, so that I have a rich, exploitable representation of the screen state.

Acceptance Criteria

  1. WHEN THE System processes a screenshot THEN THE System SHALL create a ScreenState with Raw level containing image path and metadata
  2. WHEN THE System processes a screenshot THEN THE System SHALL create Perception level with image embedding using OpenCLIP
  3. WHEN THE System processes a screenshot THEN THE System SHALL detect text using VLM and create text embeddings
  4. WHEN THE System processes a screenshot THEN THE System SHALL detect UI elements and create Sémantique UI level
  5. WHEN THE System processes a screenshot THEN THE System SHALL extract window context and create Contexte Métier level
  6. WHEN THE System saves a ScreenState THEN THE System SHALL serialize it to JSON with all 4 levels preserved

Requirement 3

User Story: As a system developer, I want to detect UI elements semantically, with types and roles, so that I can identify and manipulate them independently of their exact position.

Acceptance Criteria

  1. WHEN THE System detects UI elements THEN THE System SHALL identify regions of interest using VLM
  2. WHEN THE System detects UI elements THEN THE System SHALL classify each element with a semantic type (button, text_input, checkbox, etc.)
  3. WHEN THE System detects UI elements THEN THE System SHALL assign a semantic role to each element (primary_action, cancel, form_input, etc.)
  4. WHEN THE System detects UI elements THEN THE System SHALL extract visual features (dominant color, shape, size category)
  5. WHEN THE System detects UI elements THEN THE System SHALL generate dual embeddings (image embedding and text embedding) for each element
  6. WHEN THE System detects UI elements THEN THE System SHALL compute a confidence score for each detection
  7. WHEN THE System saves UIElements THEN THE System SHALL serialize them to JSON with all attributes and embedding references

Requirement 4

User Story: As a system developer, I want to fuse all modalities of a screen into a single State Embedding, so that I can compare and match screen states robustly.

Acceptance Criteria

  1. WHEN THE System creates a State Embedding THEN THE System SHALL compute image embedding from the full screenshot
  2. WHEN THE System creates a State Embedding THEN THE System SHALL compute text embedding from all detected text concatenated
  3. WHEN THE System creates a State Embedding THEN THE System SHALL compute title embedding from window title
  4. WHEN THE System creates a State Embedding THEN THE System SHALL compute UI embedding by averaging all UIElement embeddings
  5. WHEN THE System creates a State Embedding THEN THE System SHALL fuse all embeddings using weighted combination with configurable weights
  6. WHEN THE System creates a State Embedding THEN THE System SHALL normalize the final embedding vector
  7. WHEN THE System compares two State Embeddings THEN THE System SHALL compute cosine similarity between vectors
  8. WHEN THE System saves a State Embedding THEN THE System SHALL store the vector in FAISS index and save metadata to JSON
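The fusion and comparison steps of criteria 5-7 can be sketched in plain Python. The modality names and the idea that weights come from configuration are assumptions; only the weighted combination, normalization, and cosine similarity are mandated by the criteria:

```python
import math

def l2_normalize(vec):
    """Scale a vector to unit length (criterion 6)."""
    norm = math.sqrt(sum(x * x for x in vec)) or 1.0
    return [x / norm for x in vec]

def fuse_state_embedding(modalities, weights):
    """Weighted combination of per-modality embeddings (criterion 5).

    `modalities` and `weights` map a modality name (e.g. image, text,
    title, ui) to its vector and its configurable weight.
    """
    dim = len(next(iter(modalities.values())))
    fused = [0.0] * dim
    for name, vec in modalities.items():
        w = weights.get(name, 0.0)
        for i, x in enumerate(l2_normalize(vec)):
            fused[i] += w * x
    return l2_normalize(fused)

def cosine_similarity(a, b):
    """Compare two State Embeddings (criterion 7)."""
    return sum(x * y for x, y in zip(l2_normalize(a), l2_normalize(b)))
```

In production the vectors would be numpy arrays and the per-modality embeddings would share a projection into a common dimensionality; this sketch assumes same-length vectors.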

Requirement 5

User Story: As a system developer, I want to model workflows as explicit graphs with Nodes and Edges, so that states and transitions are represented clearly.

Acceptance Criteria

  1. WHEN THE System creates a WorkflowNode THEN THE System SHALL define a screen template with window constraints
  2. WHEN THE System creates a WorkflowNode THEN THE System SHALL define required text patterns for matching
  3. WHEN THE System creates a WorkflowNode THEN THE System SHALL define required UI elements with roles and types
  4. WHEN THE System creates a WorkflowNode THEN THE System SHALL compute an embedding prototype from sample ScreenStates
  5. WHEN THE System creates a WorkflowNode THEN THE System SHALL set a minimum similarity threshold for matching
  6. WHEN THE System saves a WorkflowNode THEN THE System SHALL serialize it to JSON with all template constraints

Requirement 6

User Story: As a system developer, I want to define transitions between nodes as WorkflowEdges with semantic actions, so that I can specify how to navigate the workflow.

Acceptance Criteria

  1. WHEN THE System creates a WorkflowEdge THEN THE System SHALL define source and target nodes
  2. WHEN THE System creates a WorkflowEdge THEN THE System SHALL define action type (mouse_click, key_press, text_input, compound)
  3. WHEN THE System creates a WorkflowEdge THEN THE System SHALL define target element by semantic role rather than coordinates
  4. WHEN THE System creates a WorkflowEdge THEN THE System SHALL define selection policy for target element (first, last, by_similarity)
  5. WHEN THE System creates a WorkflowEdge THEN THE System SHALL define pre-conditions and post-conditions for validation
  6. WHEN THE System creates a WorkflowEdge THEN THE System SHALL track execution statistics (success count, failure count, avg time)
  7. WHEN THE System saves a WorkflowEdge THEN THE System SHALL serialize it to JSON with all action details and stats
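A WorkflowEdge carrying the attributes of criteria 1-6 might look like the following data structure (field names are illustrative assumptions; the enumerated values come from the criteria):

```python
import json
from dataclasses import dataclass, field, asdict

@dataclass
class WorkflowEdge:
    source_node: str
    target_node: str
    action_type: str                 # mouse_click, key_press, text_input, compound
    target_role: str                 # criterion 3: semantic role, not coordinates
    selection_policy: str = "first"  # criterion 4: first, last, by_similarity
    pre_conditions: list = field(default_factory=list)
    post_conditions: list = field(default_factory=list)
    # Criterion 6: execution statistics.
    success_count: int = 0
    failure_count: int = 0
    avg_time_ms: float = 0.0

    def to_json(self) -> str:
        """Criterion 7: serialize action details and stats to JSON."""
        return json.dumps(asdict(self))
```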

Requirement 7

User Story: As a system developer, I want to assemble Nodes and Edges into a complete Workflow Graph with metadata, so that I have a full representation of the workflow.

Acceptance Criteria

  1. WHEN THE System creates a Workflow Graph THEN THE System SHALL define entry nodes and end nodes
  2. WHEN THE System creates a Workflow Graph THEN THE System SHALL validate that all edges reference existing nodes
  3. WHEN THE System creates a Workflow Graph THEN THE System SHALL detect cycles and branching in the graph
  4. WHEN THE System creates a Workflow Graph THEN THE System SHALL assign a unique workflow_id
  5. WHEN THE System creates a Workflow Graph THEN THE System SHALL initialize learning state to OBSERVATION
  6. WHEN THE System saves a Workflow Graph THEN THE System SHALL serialize it to JSON with all nodes, edges and metadata
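The structural checks of criteria 2-3 are standard graph algorithms; a minimal sketch (representing edges as (source, target) id pairs, an assumption):

```python
def validate_edges(node_ids, edges):
    """Criterion 2: every edge must reference existing nodes."""
    ids = set(node_ids)
    return all(src in ids and dst in ids for src, dst in edges)

def has_cycle(node_ids, edges):
    """Criterion 3: detect cycles with a colored depth-first search."""
    adjacency = {n: [] for n in node_ids}
    for src, dst in edges:
        adjacency[src].append(dst)
    WHITE, GREY, BLACK = 0, 1, 2   # unvisited / on the DFS stack / done
    color = {n: WHITE for n in node_ids}

    def dfs(node):
        color[node] = GREY
        for nxt in adjacency[node]:
            if color[nxt] == GREY:   # back edge -> cycle
                return True
            if color[nxt] == WHITE and dfs(nxt):
                return True
        color[node] = BLACK
        return False

    return any(color[n] == WHITE and dfs(n) for n in node_ids)
```

Branching detection (also criterion 3) reduces to checking for nodes with more than one outgoing edge in the same adjacency map.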

Requirement 8

User Story: As a system developer, I want to implement the progressive learning states (OBSERVATION, COACHING, AUTO_CANDIDATE, AUTO_CONFIRMÉ), so that the system can learn gradually.

Acceptance Criteria

  1. WHEN THE System initializes a workflow THEN THE System SHALL set learning state to OBSERVATION
  2. WHEN THE System has observed a workflow 5 times with similarity > 0.90 THEN THE System SHALL transition to COACHING state
  3. WHEN THE System has assisted a workflow 10 times with success rate > 0.90 THEN THE System SHALL transition to AUTO_CANDIDATE state
  4. WHEN THE System has executed a workflow 20 times in AUTO_CANDIDATE with success rate > 0.95 THEN THE System SHALL be eligible for AUTO_CONFIRMÉ state
  5. WHEN THE System transitions learning state THEN THE System SHALL log the transition with reason and timestamp
  6. WHEN THE System is in AUTO_CONFIRMÉ state and confidence drops below 0.90 THEN THE System SHALL rollback to COACHING state

Requirement 9

User Story: As a system developer, I want to match the current ScreenState against existing WorkflowNodes, so that the system can recognize which workflow state it is in.

Acceptance Criteria

  1. WHEN THE System matches a ScreenState THEN THE System SHALL compute State Embedding for current screen
  2. WHEN THE System matches a ScreenState THEN THE System SHALL search FAISS index for similar node prototypes
  3. WHEN THE System matches a ScreenState THEN THE System SHALL validate window constraints for candidate nodes
  4. WHEN THE System matches a ScreenState THEN THE System SHALL validate required text patterns for candidate nodes
  5. WHEN THE System matches a ScreenState THEN THE System SHALL validate required UI elements for candidate nodes
  6. WHEN THE System matches a ScreenState THEN THE System SHALL return best matching node with confidence score
  7. WHEN THE System matches a ScreenState and no node matches above threshold THEN THE System SHALL return null match
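The matching pipeline can be sketched with a brute-force scan standing in for the FAISS search of criterion 2, and a pluggable predicate standing in for the window/text/UI checks of criteria 3-5 (both substitutions are assumptions made to keep the sketch self-contained):

```python
import math

def cosine(a, b):
    na = math.sqrt(sum(x * x for x in a)) or 1.0
    nb = math.sqrt(sum(x * x for x in b)) or 1.0
    return sum(x * y for x, y in zip(a, b)) / (na * nb)

def match_screen_state(state_embedding, nodes, constraints_ok=lambda node_id: True):
    """Return (best_node_id, score), or (None, 0.0) when nothing clears its threshold.

    `nodes` maps node_id -> (prototype_vector, min_similarity), mirroring the
    per-node threshold of Requirement 5, criterion 5.
    """
    best_id, best_score = None, 0.0
    for node_id, (prototype, threshold) in nodes.items():
        score = cosine(state_embedding, prototype)
        # Criteria 3-5 collapse into constraints_ok here; criterion 7 is the null match.
        if score >= threshold and constraints_ok(node_id) and score > best_score:
            best_id, best_score = node_id, score
    return best_id, best_score
```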

Requirement 10

User Story: As a system developer, I want to execute the actions defined in WorkflowEdges by finding UIElements by role, so that workflow automation is robust.

Acceptance Criteria

  1. WHEN THE System executes a WorkflowEdge THEN THE System SHALL find target UIElement by semantic role in current ScreenState
  2. WHEN THE System executes a mouse_click action THEN THE System SHALL click on the center of the matched UIElement
  3. WHEN THE System executes a text_input action THEN THE System SHALL type text into the matched UIElement
  4. WHEN THE System executes a compound action THEN THE System SHALL execute all steps in sequence
  5. WHEN THE System executes an action THEN THE System SHALL wait for post-conditions to be satisfied
  6. WHEN THE System executes an action THEN THE System SHALL verify transition to expected target node
  7. WHEN THE System executes an action and post-conditions fail THEN THE System SHALL log failure and rollback if possible
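Criteria 1-2 (find by role, click the center) can be sketched as follows. The element dictionaries and the (left, top, width, height) bounding-box convention are assumptions:

```python
def find_element_by_role(elements, role, policy="first"):
    """Criterion 1: pick the target UIElement by semantic role.

    `policy` mirrors the selection policy of Requirement 6 (first / last);
    by_similarity would need element embeddings and is omitted here.
    """
    matches = [e for e in elements if e["role"] == role]
    if not matches:
        return None
    return matches[-1] if policy == "last" else matches[0]

def click_point(element):
    """Criterion 2: the center of the matched element's bounding box."""
    x, y, w, h = element["bbox"]   # (left, top, width, height) - an assumption
    return (x + w // 2, y + h // 2)
```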

Requirement 11

User Story: As a system developer, I want to automatically detect repeated patterns in RawSessions, so that Workflow Graphs can be built without manual intervention.

Acceptance Criteria

  1. WHEN THE System analyzes a RawSession THEN THE System SHALL group events by window context
  2. WHEN THE System analyzes a RawSession THEN THE System SHALL create ScreenStates for all screenshots
  3. WHEN THE System analyzes a RawSession THEN THE System SHALL compute State Embeddings for all ScreenStates
  4. WHEN THE System analyzes a RawSession THEN THE System SHALL detect repeated sequences using embedding similarity
  5. WHEN THE System detects a repeated sequence THEN THE System SHALL cluster similar ScreenStates into candidate nodes
  6. WHEN THE System detects a repeated sequence THEN THE System SHALL identify transitions as candidate edges
  7. WHEN THE System detects a repeated sequence with 3+ repetitions THEN THE System SHALL propose a Workflow Graph
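Criteria 5 and 7 can be sketched with a greedy similarity clustering followed by an n-gram count over the resulting cluster labels. The greedy strategy and the 0.90 clustering threshold are assumptions; only the 3-repetition minimum comes from criterion 7:

```python
import math
from collections import Counter

def cosine(a, b):
    na = math.sqrt(sum(x * x for x in a)) or 1.0
    nb = math.sqrt(sum(x * x for x in b)) or 1.0
    return sum(x * y for x, y in zip(a, b)) / (na * nb)

def cluster_states(embeddings, threshold=0.90):
    """Criterion 5: assign each state to the first similar-enough cluster."""
    labels, centroids = [], []
    for emb in embeddings:
        for i, centroid in enumerate(centroids):
            if cosine(emb, centroid) >= threshold:
                labels.append(i)
                break
        else:
            centroids.append(emb)        # no match: open a new candidate node
            labels.append(len(centroids) - 1)
    return labels

def repeated_sequences(labels, length=2, min_repeats=3):
    """Criterion 7: label sub-sequences seen at least min_repeats times."""
    grams = Counter(tuple(labels[i:i + length])
                    for i in range(len(labels) - length + 1))
    return [list(g) for g, n in grams.items() if n >= min_repeats]
```

A real implementation would update centroids incrementally and mine variable-length sequences; this fixed-length n-gram count only illustrates the repetition test.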

Requirement 12

User Story: As a system developer, I want to persist all artifacts (ScreenStates, Embeddings, Workflow Graphs) in a structured way, so that they can be reloaded and analyzed.

Acceptance Criteria

  1. WHEN THE System saves a ScreenState THEN THE System SHALL write JSON file with schema version
  2. WHEN THE System saves embeddings THEN THE System SHALL write numpy arrays to .npy files
  3. WHEN THE System saves embeddings THEN THE System SHALL add vectors to FAISS index
  4. WHEN THE System saves a Workflow Graph THEN THE System SHALL write JSON file with all nodes and edges
  5. WHEN THE System loads a Workflow Graph THEN THE System SHALL deserialize JSON and reconstruct graph structure
  6. WHEN THE System loads embeddings THEN THE System SHALL load FAISS index and metadata mappings
  7. WHEN THE System saves artifacts THEN THE System SHALL organize files by date and workflow_id

Requirement 13

User Story: As a system developer, I want to validate the quality of State Embeddings, so that I can ensure they discriminate well between different states.

Acceptance Criteria

  1. WHEN THE System validates State Embeddings THEN THE System SHALL compute intra-node similarity (states of same node should be similar)
  2. WHEN THE System validates State Embeddings THEN THE System SHALL compute inter-node similarity (states of different nodes should be dissimilar)
  3. WHEN THE System validates State Embeddings THEN THE System SHALL compute embedding quality score as ratio of intra/inter similarity
  4. WHEN THE System validates State Embeddings and quality score is below 0.70 THEN THE System SHALL log warning
  5. WHEN THE System validates State Embeddings THEN THE System SHALL report discriminative power metric
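One way to read criteria 1-3 is the ratio of mean intra-node to mean inter-node similarity; the exact formula is not fixed by the requirement, so this is a sketch of that interpretation:

```python
import math
from itertools import combinations

def cosine(a, b):
    na = math.sqrt(sum(x * x for x in a)) or 1.0
    nb = math.sqrt(sum(x * x for x in b)) or 1.0
    return sum(x * y for x, y in zip(a, b)) / (na * nb)

def embedding_quality(clusters):
    """Quality score = mean intra-node / mean inter-node similarity.

    `clusters` maps node_id -> list of state embeddings. States of the
    same node should be similar (criterion 1), states of different nodes
    dissimilar (criterion 2); a score below 0.70 warrants a warning
    (criterion 4).
    """
    intra = [cosine(a, b)
             for states in clusters.values()
             for a, b in combinations(states, 2)]
    inter = [cosine(a, b)
             for (_, s1), (_, s2) in combinations(clusters.items(), 2)
             for a in s1 for b in s2]
    mean = lambda xs: sum(xs) / len(xs) if xs else 0.0
    inter_mean = mean(inter)
    return mean(intra) / inter_mean if inter_mean else float("inf")
```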

Requirement 14

User Story: As a system developer, I want to handle matching and execution errors robustly, so that the system is resilient to UI changes.

Acceptance Criteria

  1. WHEN THE System fails to match a ScreenState to any node THEN THE System SHALL log the unmatched state with screenshot
  2. WHEN THE System fails to find a target UIElement by role THEN THE System SHALL try fallback strategies (visual similarity, position)
  3. WHEN THE System fails to execute an action THEN THE System SHALL log the failure with context
  4. WHEN THE System detects UI change (similarity drop) THEN THE System SHALL pause execution and notify user
  5. WHEN THE System is in AUTO_CONFIRMÉ and confidence drops THEN THE System SHALL rollback to COACHING state
  6. WHEN THE System encounters repeated failures on same edge THEN THE System SHALL mark edge as problematic

Requirement 15

User Story: As a system developer, I want to optimize system performance, so that matching and execution are fast (< 400ms).

Acceptance Criteria

  1. WHEN THE System computes State Embedding THEN THE System SHALL complete in less than 100ms
  2. WHEN THE System matches ScreenState against nodes THEN THE System SHALL complete FAISS search in less than 50ms
  3. WHEN THE System detects UI elements THEN THE System SHALL complete detection in less than 200ms
  4. WHEN THE System executes an action THEN THE System SHALL complete execution in less than 50ms
  5. WHEN THE System processes a ScreenState end-to-end THEN THE System SHALL complete in less than 400ms total
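The per-stage budgets in criteria 1-4 can be checked with a small timing wrapper; the stage names and helper are illustrative, only the millisecond budgets come from the criteria:

```python
import time

BUDGETS_MS = {                    # per-stage budgets from criteria 1-4
    "state_embedding": 100,
    "faiss_search": 50,
    "ui_detection": 200,
    "action_execution": 50,
}

def timed(stage, fn, *args, **kwargs):
    """Run one pipeline stage and report whether it met its latency budget."""
    start = time.perf_counter()
    result = fn(*args, **kwargs)
    elapsed_ms = (time.perf_counter() - start) * 1000.0
    # 400 ms is the end-to-end fallback budget (criterion 5).
    return result, elapsed_ms, elapsed_ms <= BUDGETS_MS.get(stage, 400)
```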