Geniusia_v2/.kiro/specs/workflow-graph-implementation/design.md

# Design Document - Workflow Graph Implementation

## Overview

This document describes the detailed design for implementing the Workflow Graph architecture for RPA Vision V2. The system turns raw screenshots into learned semantic workflows through 5 layers of progressive abstraction.

**Philosophy**: "Observe → Understand → Learn → Act"

The system does NOT work with click coordinates. It works with a semantic understanding of interfaces: element types, roles, and visual and textual context.

**5-Layer Architecture**:
```
RawSession (Layer 0) → ScreenState (Layer 1) → UIElement Detection (Layer 2)
→ State Embedding (Layer 3) → Workflow Graph (Layer 4)
```

**Reference**: This design builds on `docs/reference/ARCHITECTURE_VISION_COMPLETE.md` and `docs/reference/ARCHITECTURE_ENRICHISSEMENTS.md`.

## Architecture

### Overall Architecture

```
┌─────────────────────────────────────────────────────────────┐
│                     Layer 4: Workflow Graph                 │
│  WorkflowNode + WorkflowEdge + Learning States              │
│  ┌──────────────┐  ┌──────────────┐  ┌──────────────────┐  │
│  │ Graph Builder│  │ Node Matcher │  │ Action Executor  │  │
│  └──────────────┘  └──────────────┘  └──────────────────┘  │
└─────────────────────────────────────────────────────────────┘
                            ↕
┌─────────────────────────────────────────────────────────────┐
│                  Layer 3: State Embedding                   │
│  Multi-Modal Fusion (Image + Text + UI + Context)           │
│  ┌──────────────┐  ┌──────────────┐  ┌──────────────────┐  │
│  │ Fusion Engine│  │ FAISS Index  │  │ Similarity Comp  │  │
│  └──────────────┘  └──────────────┘  └──────────────────┘  │
└─────────────────────────────────────────────────────────────┘
                            ↕
┌─────────────────────────────────────────────────────────────┐
│               Layer 2: UIElement Detection                  │
│  Semantic Detection (Type + Role + Embeddings)              │
│  ┌──────────────┐  ┌──────────────┐  ┌──────────────────┐  │
│  │ VLM Detector │  │ Classifier   │  │ Embedding Gen    │  │
│  └──────────────┘  └──────────────┘  └──────────────────┘  │
└─────────────────────────────────────────────────────────────┘
                            ↕
┌─────────────────────────────────────────────────────────────┐
│                  Layer 1: ScreenState                       │
│  Multi-Modal Analysis (4 Levels)                            │
│  ┌──────────────┐  ┌──────────────┐  ┌──────────────────┐  │
│  │ Raw Capture  │  │ Perception   │  │ Semantic + Ctx   │  │
│  └──────────────┘  └──────────────┘  └──────────────────┘  │
└─────────────────────────────────────────────────────────────┘
                            ↕
┌─────────────────────────────────────────────────────────────┐
│                   Layer 0: RawSession                       │
│  Raw Capture (Events + Screenshots)                         │
│  ┌──────────────┐  ┌──────────────┐  ┌──────────────────┐  │
│  │ Event Logger │  │ Screenshot   │  │ Session Manager  │  │
│  └──────────────┘  └──────────────┘  └──────────────────┘  │
└─────────────────────────────────────────────────────────────┘
```


### Directory Structure

```
geniusia2/
├── core/
│   ├── models/
│   │   ├── raw_session.py          # Layer 0: RawSession
│   │   ├── screen_state.py         # Layer 1: ScreenState
│   │   ├── ui_element.py           # Layer 2: UIElement
│   │   ├── state_embedding.py      # Layer 3: State Embedding
│   │   └── workflow_graph.py       # Layer 4: Workflow Graph
│   ├── capture/
│   │   ├── event_capture.py        # Event capture
│   │   └── screenshot_capture.py   # Screenshot capture
│   ├── detection/
│   │   ├── ui_detector.py          # UI detection with VLM
│   │   └── text_detector.py        # Text detection
│   ├── embedding/
│   │   ├── fusion_engine.py        # Multi-modal fusion
│   │   ├── faiss_manager.py        # FAISS index management
│   │   └── similarity.py           # Similarity computations
│   ├── graph/
│   │   ├── graph_builder.py        # Graph construction
│   │   ├── node_matcher.py         # Node matching
│   │   ├── action_executor.py      # Action execution
│   │   └── learning_manager.py     # Learning state management
│   └── persistence/
│       ├── json_serializer.py      # JSON serialization
│       └── storage_manager.py      # Storage management
└── data/
    ├── sessions/                   # RawSessions
    ├── screen_states/              # ScreenStates
    ├── embeddings/                 # Embeddings (.npy)
    ├── faiss_index/                # FAISS index
    └── workflows/                  # Workflow Graphs
```

## Components and Interfaces

### Layer 0: RawSession

**Responsibility**: Faithful capture of all user events, together with screenshots.

**Main Class**: `RawSession`

```python
from __future__ import annotations  # Event and Screenshot are defined elsewhere

from dataclasses import dataclass
from datetime import datetime
from typing import Any, Dict, List, Optional


@dataclass
class RawSession:
    session_id: str
    agent_version: str
    environment: Dict[str, Any]
    user: Dict[str, str]
    context: Dict[str, str]
    started_at: datetime
    ended_at: Optional[datetime]
    events: List[Event]
    screenshots: List[Screenshot]
    # Fields with defaults must follow fields without defaults in a dataclass
    schema_version: str = "rawsession_v1"

    def add_event(self, event: Event) -> None:
        """Add an event to the session"""

    def add_screenshot(self, screenshot: Screenshot) -> None:
        """Add a screenshot to the session"""

    def to_json(self) -> Dict[str, Any]:
        """Serialize to JSON"""

    @classmethod
    def from_json(cls, data: Dict[str, Any]) -> 'RawSession':
        """Deserialize from JSON"""
```

**Event Class**:

```python
@dataclass
class Event:
    t: float  # Relative timestamp in seconds
    type: str  # mouse_click, key_press, etc.
    window: WindowContext
    screenshot_id: Optional[str]
    # Type-specific fields
    data: Dict[str, Any]
```


### Layer 1: ScreenState

**Responsibility**: Structured representation of a screen at 4 levels of abstraction.

**Main Class**: `ScreenState`

```python
@dataclass
class ScreenState:
    screen_state_id: str
    timestamp: datetime
    session_id: str
    window: WindowContext

    # Level 1: Raw
    raw: RawLevel

    # Level 2: Perception
    perception: PerceptionLevel

    # Level 3: UI Semantics
    ui_elements: List[UIElement]

    # Level 4: Business Context
    context: ContextLevel

    metadata: Dict[str, Any]

    def to_json(self) -> Dict[str, Any]:
        """Serialize to JSON"""

    @classmethod
    def from_json(cls, data: Dict[str, Any]) -> 'ScreenState':
        """Deserialize from JSON"""
```

**Levels**:

```python
@dataclass
class RawLevel:
    screenshot_path: str
    capture_method: str
    file_size_bytes: int

@dataclass
class PerceptionLevel:
    embedding: EmbeddingRef
    detected_text: List[str]
    text_detection_method: str
    confidence_avg: float

@dataclass
class ContextLevel:
    current_workflow_candidate: Optional[str]
    workflow_step: Optional[int]
    user_id: str
    tags: List[str]
    business_variables: Dict[str, Any]
```

### Layer 2: UIElement Detection

**Responsibility**: Semantic detection of UI elements with types, roles and embeddings.

**Main Class**: `UIElement`

```python
@dataclass
class UIElement:
    element_id: str
    type: str  # button, text_input, checkbox, etc.
    role: str  # primary_action, cancel, form_input, etc.
    bbox: Tuple[int, int, int, int]
    center: Tuple[int, int]

    label: str
    label_confidence: float

    embeddings: UIElementEmbeddings
    visual_features: VisualFeatures

    tags: List[str]
    confidence: float

    metadata: Dict[str, Any]

    def to_json(self) -> Dict[str, Any]:
        """Serialize to JSON"""
```

**Dual Embeddings**:

```python
@dataclass
class UIElementEmbeddings:
    image: EmbeddingRef  # Embedding of the cropped image
    text: EmbeddingRef   # Embedding of the detected text
```

**UI Detector**:

```python
class UIDetector:
    def __init__(self, vlm_model: str, clip_model: str):
        self.vlm = VLMClient(vlm_model)
        self.clip = CLIPEmbedder(clip_model)

    def detect_elements(self, screenshot: np.ndarray,
                       window_context: WindowContext) -> List[UIElement]:
        """Detect all UI elements in a screenshot"""
        # 1. Propose regions of interest with the VLM
        # 2. Characterize each element (crop, OCR, embeddings)
        # 3. Classify type and role
        # 4. Return the list of UIElements
```


### Layer 3: State Embedding

**Responsibility**: Multi-modal fusion into a single vector (the screen fingerprint).

**Main Class**: `StateEmbedding`

```python
@dataclass
class StateEmbedding:
    embedding_id: str
    vector_id: str  # Path to the .npy file
    dimensions: int
    fusion_method: str  # "weighted" or "concat_projection"

    components: Dict[str, EmbeddingComponent]
    metadata: Dict[str, Any]

    def get_vector(self) -> np.ndarray:
        """Load the vector from its file"""

    def compute_similarity(self, other: 'StateEmbedding') -> float:
        """Compute cosine similarity with another embedding"""
```

**Fusion Engine**:

```python
from typing import Dict, Optional

import numpy as np


def normalize(v: np.ndarray) -> np.ndarray:
    """Assumed helper (not defined in the original design): L2-normalize a vector"""
    norm = np.linalg.norm(v)
    return v / norm if norm > 0 else v


class FusionEngine:
    def __init__(self, method: str = "weighted",
                 weights: Optional[Dict[str, float]] = None):
        self.method = method
        self.weights = weights or {
            "image": 0.5,
            "text": 0.3,
            "title": 0.1,
            "ui": 0.1
        }

    def fuse(self,
             img_emb: np.ndarray,
             text_emb: np.ndarray,
             title_emb: np.ndarray,
             ui_emb: np.ndarray) -> np.ndarray:
        """Fuse all embeddings into a single vector"""
        if self.method == "weighted":
            return self._weighted_fusion(img_emb, text_emb, title_emb, ui_emb)
        elif self.method == "concat_projection":
            return self._concat_projection(img_emb, text_emb, title_emb, ui_emb)
        # Fail loudly rather than implicitly returning None
        raise ValueError(f"Unknown fusion method: {self.method}")

    def _weighted_fusion(self, img_emb, text_emb, title_emb, ui_emb) -> np.ndarray:
        """Simple weighted fusion (all inputs must share the same dimension)"""
        fused = (
            self.weights["image"] * normalize(img_emb) +
            self.weights["text"] * normalize(text_emb) +
            self.weights["title"] * normalize(title_emb) +
            self.weights["ui"] * normalize(ui_emb)
        )
        return normalize(fused)
```
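
The final renormalization in the weighted fusion matters: a weighted sum of unit vectors generally has a norm below 1, so it has to be rescaled before it can be compared by cosine similarity. The following is a minimal pure-Python sketch of that behaviour, using toy 3-dimensional vectors in place of real embeddings. The helper names `l2_norm`, `unit` and `weighted_fusion` are illustrative and not part of the design.

```python
import math
from typing import Dict, List

Vector = List[float]


def l2_norm(v: Vector) -> float:
    return math.sqrt(sum(x * x for x in v))


def unit(v: Vector) -> Vector:
    """Return v scaled to unit L2 norm (zero vectors are returned as-is)."""
    n = l2_norm(v)
    return [x / n for x in v] if n > 0 else list(v)


def weighted_fusion(parts: Dict[str, Vector], weights: Dict[str, float]) -> Vector:
    """Weighted sum of unit-normalized modality vectors, renormalized at the end."""
    dim = len(next(iter(parts.values())))
    fused = [0.0] * dim
    for name, vec in parts.items():
        u = unit(vec)
        for i in range(dim):
            fused[i] += weights[name] * u[i]
    return unit(fused)


# Default weights from the design; the vectors are made-up toy data.
weights = {"image": 0.5, "text": 0.3, "title": 0.1, "ui": 0.1}
parts = {
    "image": [1.0, 0.0, 0.0],
    "text": [0.0, 2.0, 0.0],
    "title": [0.0, 0.0, 3.0],
    "ui": [1.0, 1.0, 0.0],
}
fused = weighted_fusion(parts, weights)
print(round(l2_norm(fused), 6))
```

Because each modality is normalized before weighting, the weights control the relative influence of each modality regardless of the raw magnitude of its embedding.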

**FAISS Manager**:

```python
class FAISSManager:
    def __init__(self, index_path: str, dimension: int = 512):
        self.index_path = index_path
        self.dimension = dimension
        self.index = self._load_or_create_index()
        self.metadata_store: Dict[int, Dict[str, Any]] = {}

    def add_embedding(self, embedding: np.ndarray,
                     metadata: Dict[str, Any]) -> int:
        """Add an embedding to the index"""
        idx = self.index.ntotal
        # FAISS expects float32 arrays of shape (n, dimension)
        self.index.add(embedding.reshape(1, -1).astype(np.float32))
        self.metadata_store[idx] = metadata
        return idx

    def search_similar(self, query: np.ndarray,
                      k: int = 5) -> List[SearchResult]:
        """Find the k nearest neighbours"""
        distances, indices = self.index.search(
            query.reshape(1, -1).astype(np.float32), k)
        results = []
        for dist, idx in zip(distances[0], indices[0]):
            if idx < 0:
                continue  # FAISS pads with -1 when fewer than k vectors are indexed
            results.append(SearchResult(
                id=int(idx),
                distance=float(dist),
                # Assumes an L2 index (squared distances) over unit-norm vectors,
                # where cos = 1 - d² / 2. With an inner-product index, dist is
                # already the cosine similarity.
                similarity=1.0 - float(dist) / 2.0,
                metadata=self.metadata_store.get(int(idx), {})
            ))
        return results
```
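
How an index distance maps to a cosine similarity depends on the index type, which this document does not specify. As a sketch under the assumption of an L2 index that returns squared distances over unit-norm vectors, the identity d² = 2 − 2·cos gives the conversion below. The function names are illustrative, and the example is pure Python so that it does not depend on FAISS.

```python
import math
from typing import List


def cosine(a: List[float], b: List[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def squared_l2(a: List[float], b: List[float]) -> float:
    return sum((x - y) ** 2 for x, y in zip(a, b))


def similarity_from_squared_l2(d2: float) -> float:
    """Cosine similarity recovered from a squared L2 distance between unit vectors."""
    return 1.0 - d2 / 2.0


# Two unit vectors (toy data)
a = [0.6, 0.8]
b = [1.0, 0.0]
d2 = squared_l2(a, b)

print(cosine(a, b), similarity_from_squared_l2(d2))
```

With an inner-product index over the same normalized vectors, no conversion is needed because the returned score is the cosine itself.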


### Layer 4: Workflow Graph

**Responsibility**: Modelling workflows as graphs with progressive learning.

**WorkflowNode Class**:

```python
@dataclass
class WorkflowNode:
    node_id: str
    label: str
    description: str
    screen_template: ScreenTemplate
    metadata: Dict[str, Any]

    def matches(self, screen_state: ScreenState,
                state_embedding: StateEmbedding) -> Tuple[bool, float]:
        """Check whether a ScreenState matches this node"""
        # 1. Check window constraints
        # 2. Check required text
        # 3. Check required UI elements
        # 4. Check embedding similarity
        # Return (match, confidence)

@dataclass
class ScreenTemplate:
    window: WindowConstraints
    required_text_any: List[str]
    required_ui_elements: List[UIElementConstraint]
    embedding_prototype: EmbeddingPrototype
    optional_elements: List[UIElementConstraint]

@dataclass
class EmbeddingPrototype:
    provider: str
    vector_id: str
    min_cosine_similarity: float
    sample_count: int
```
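
The four checks listed in `WorkflowNode.matches` could be combined as in the sketch below. It uses flattened, hypothetical stand-ins (`SimpleTemplate`, `SimpleScreen`) rather than the real `ScreenTemplate` and `ScreenState`, and it assumes that the returned confidence is simply the embedding similarity, which the design does not state.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class SimpleTemplate:
    """Hypothetical, flattened stand-in for ScreenTemplate."""
    app_names: List[str]
    required_text_any: List[str]
    required_roles: List[str]
    min_cosine_similarity: float


@dataclass
class SimpleScreen:
    """Hypothetical, flattened stand-in for ScreenState."""
    app_name: str
    detected_text: List[str]
    roles: List[str] = field(default_factory=list)


def matches(template: SimpleTemplate, screen: SimpleScreen,
            similarity: float) -> Tuple[bool, float]:
    """Apply the four checks in order and stop at the first failure."""
    # 1. Window constraints
    if screen.app_name not in template.app_names:
        return False, 0.0
    # 2. Required text: at least one of the listed strings must be present
    if template.required_text_any and not any(
            text in screen.detected_text for text in template.required_text_any):
        return False, 0.0
    # 3. Required UI elements: every required role must be present
    if any(role not in screen.roles for role in template.required_roles):
        return False, 0.0
    # 4. Embedding similarity against the node prototype
    if similarity < template.min_cosine_similarity:
        return False, similarity
    return True, similarity


template = SimpleTemplate(["app"], ["Submit", "Input"], ["primary_action"], 0.85)
screen = SimpleScreen("app", ["Button", "Input", "Submit"], ["primary_action"])
print(matches(template, screen, 0.91))
```

Running the cheap structural checks before the similarity threshold means a screen from the wrong application is rejected without relying on the embedding at all.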

**WorkflowEdge Class**:

```python
@dataclass
class WorkflowEdge:
    edge_id: str
    from_node: str
    to_node: str
    action: Action
    constraints: EdgeConstraints
    post_conditions: PostConditions
    stats: EdgeStats
    metadata: Dict[str, Any]

    def can_execute(self, current_state: ScreenState) -> Tuple[bool, str]:
        """Check whether the edge can be executed"""
        # Check pre-conditions

    def execute(self, executor: ActionExecutor) -> ExecutionResult:
        """Execute this edge's action"""

@dataclass
class Action:
    type: str  # mouse_click, key_press, text_input, compound
    target: TargetSpec
    parameters: Dict[str, Any]

@dataclass
class TargetSpec:
    role: str  # Semantic role of the target element
    selection_policy: str  # first, last, by_similarity
    fallback_strategy: str  # visual_similarity, position
```

**Workflow Class**:

```python
@dataclass
class Workflow:
    workflow_id: str
    name: str
    description: str
    version: int

    learning_state: str  # OBSERVATION, COACHING, AUTO_CANDIDATE, AUTO_CONFIRMÉ

    created_at: datetime
    updated_at: datetime

    entry_nodes: List[str]
    end_nodes: List[str]

    nodes: List[WorkflowNode]
    edges: List[WorkflowEdge]

    safety_rules: SafetyRules
    stats: WorkflowStats
    learning: LearningConfig
    metadata: Dict[str, Any]

    def get_node(self, node_id: str) -> Optional[WorkflowNode]:
        """Retrieve a node by ID"""

    def get_outgoing_edges(self, node_id: str) -> List[WorkflowEdge]:
        """Retrieve all outgoing edges of a node"""

    def to_json(self) -> Dict[str, Any]:
        """Serialize to JSON"""
```


### Graph Builder

**Responsibility**: Automatically build Workflow Graphs from RawSessions.

```python
class GraphBuilder:
    def __init__(self,
                 faiss_manager: FAISSManager,
                 fusion_engine: FusionEngine,
                 ui_detector: UIDetector):
        self.faiss = faiss_manager
        self.fusion = fusion_engine
        self.ui_detector = ui_detector

    def build_from_session(self, session: RawSession) -> Optional[Workflow]:
        """Build a workflow from a session"""
        # 1. Create ScreenStates for all screenshots
        screen_states = self._create_screen_states(session)

        # 2. Compute State Embeddings
        embeddings = self._compute_embeddings(screen_states)

        # 3. Detect repeated patterns
        patterns = self._detect_patterns(screen_states, embeddings)

        # 4. Build nodes and edges
        if patterns:
            return self._build_workflow(patterns)
        return None

    def _detect_patterns(self,
                        screen_states: List[ScreenState],
                        embeddings: List[StateEmbedding]) -> List[Pattern]:
        """Detect repeated sequences"""
        # Cluster the embeddings
        # Identify recurring transitions
        # Return the detected patterns
```
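
The design only states that pattern detection clusters embeddings and identifies recurring transitions; it does not name an algorithm. One simple possibility, shown here purely as an illustration, is greedy threshold clustering on cosine similarity followed by counting transitions between consecutive cluster labels. The function names, the 0.85 threshold and the `min_count` parameter are assumptions.

```python
import math
from collections import Counter
from typing import Dict, List, Tuple


def cosine(a: List[float], b: List[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def assign_clusters(vectors: List[List[float]], threshold: float = 0.85) -> List[int]:
    """Give each vector the label of the first cluster representative it resembles."""
    representatives: List[List[float]] = []
    labels: List[int] = []
    for vec in vectors:
        for idx, rep in enumerate(representatives):
            if cosine(vec, rep) >= threshold:
                labels.append(idx)
                break
        else:
            # No representative is close enough: start a new cluster
            representatives.append(vec)
            labels.append(len(representatives) - 1)
    return labels


def recurring_transitions(labels: List[int],
                          min_count: int = 2) -> Dict[Tuple[int, int], int]:
    """Count cluster-to-cluster transitions and keep those seen at least min_count times."""
    counts = Counter((a, b) for a, b in zip(labels, labels[1:]) if a != b)
    return {pair: n for pair, n in counts.items() if n >= min_count}


# Toy 2-D embeddings alternating between two screens
vectors = [[1.0, 0.0], [0.0, 1.0], [0.99, 0.05], [0.05, 0.99], [1.0, 0.01]]
labels = assign_clusters(vectors)
print(labels, recurring_transitions(labels))
```

In this sketch each cluster would become a candidate node and each recurring transition a candidate edge.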

### Node Matcher

**Responsibility**: Match the current ScreenState against existing WorkflowNodes.

```python
class NodeMatcher:
    def __init__(self,
                 faiss_manager: FAISSManager,
                 fusion_engine: FusionEngine):
        self.faiss = faiss_manager
        self.fusion = fusion_engine

    def match(self,
             screen_state: ScreenState,
             workflow: Workflow) -> Optional[NodeMatch]:
        """Find the node matching the current ScreenState"""
        # 1. Compute the State Embedding
        state_emb = self._compute_state_embedding(screen_state)

        # 2. Search FAISS for similar prototypes
        candidates = self.faiss.search_similar(state_emb.get_vector(), k=5)

        # 3. Filter by workflow_id
        candidates = [c for c in candidates
                     if c.metadata.get('workflow_id') == workflow.workflow_id]

        # 4. Validate constraints for each candidate
        for candidate in candidates:
            node = workflow.get_node(candidate.metadata['node_id'])
            if node:
                matches, confidence = node.matches(screen_state, state_emb)
                if matches:
                    return NodeMatch(
                        node=node,
                        confidence=confidence,
                        embedding_similarity=candidate.similarity
                    )

        return None

@dataclass
class NodeMatch:
    node: WorkflowNode
    confidence: float
    embedding_similarity: float
```


### Action Executor

**Responsibility**: Execute the actions defined in WorkflowEdges.

```python
class ActionExecutor:
    def __init__(self, input_controller):
        self.input = input_controller

    def execute_edge(self,
                    edge: WorkflowEdge,
                    current_state: ScreenState) -> ExecutionResult:
        """Execute an edge's action"""
        # 1. Check pre-conditions
        can_execute, reason = edge.can_execute(current_state)
        if not can_execute:
            return ExecutionResult(success=False, reason=reason)

        # 2. Find the target element by role
        target_element = self._find_target_element(
            edge.action.target,
            current_state
        )
        if not target_element:
            return ExecutionResult(success=False,
                                 reason="Target element not found")

        # 3. Execute the action
        if edge.action.type == "mouse_click":
            self._execute_click(target_element, edge.action.parameters)
        elif edge.action.type == "text_input":
            self._execute_text_input(target_element, edge.action.parameters)
        elif edge.action.type == "compound":
            self._execute_compound(edge.action, current_state)
        else:
            # Fail explicitly instead of silently skipping unhandled types (e.g. key_press)
            return ExecutionResult(success=False,
                                 reason=f"Unsupported action type: {edge.action.type}")

        # 4. Wait for post-conditions
        success = self._wait_for_postconditions(edge.post_conditions)

        return ExecutionResult(success=success)

    def _find_target_element(self,
                            target: TargetSpec,
                            state: ScreenState) -> Optional[UIElement]:
        """Find a UI element by semantic role"""
        candidates = [el for el in state.ui_elements
                     if el.role == target.role]

        if not candidates:
            return None

        if target.selection_policy == "first":
            return candidates[0]
        elif target.selection_policy == "last":
            return candidates[-1]
        elif target.selection_policy == "by_similarity":
            # Use embedding similarity
            return self._select_by_similarity(candidates, target)

        return candidates[0]

@dataclass
class ExecutionResult:
    success: bool
    reason: Optional[str] = None
    execution_time_ms: Optional[float] = None
```
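
The `_wait_for_postconditions` step is not detailed in this document. One plausible shape, sketched here under that caveat, is a polling loop that re-evaluates a condition until it holds or a timeout expires. The `wait_for` helper and its `poll_interval` parameter are hypothetical; in practice the condition would re-capture the screen and match it against the expected node.

```python
import time
from typing import Callable


def wait_for(condition: Callable[[], bool], timeout_seconds: float,
             poll_interval: float = 0.05) -> bool:
    """Poll condition() until it returns True or the timeout elapses."""
    deadline = time.monotonic() + timeout_seconds
    while True:
        if condition():
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(poll_interval)


# Toy condition that becomes true on its third evaluation
calls = {"n": 0}


def becomes_true() -> bool:
    calls["n"] += 1
    return calls["n"] >= 3


print(wait_for(becomes_true, timeout_seconds=1.0, poll_interval=0.01))
```

Using `time.monotonic()` keeps the timeout unaffected by system clock adjustments, and the condition is always evaluated at least once, even with a zero timeout.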

### Learning Manager

**Responsibility**: Manage learning states and transitions.

```python
class LearningManager:
    def __init__(self):
        self.workflows: Dict[str, Workflow] = {}

    def update_workflow_stats(self,
                             workflow_id: str,
                             execution_result: ExecutionResult) -> None:
        """Update statistics after an execution"""
        workflow = self.workflows[workflow_id]
        workflow.stats.total_executions += 1

        if execution_result.success:
            workflow.stats.success_count += 1
        else:
            workflow.stats.failure_count += 1

        # Check whether a state transition is needed
        self._check_state_transition(workflow)

    def _check_state_transition(self, workflow: Workflow) -> None:
        """Check for and perform state transitions when needed"""
        current_state = workflow.learning_state

        if current_state == "OBSERVATION":
            if self._can_transition_to_coaching(workflow):
                self._transition_to(workflow, "COACHING")

        elif current_state == "COACHING":
            if self._can_transition_to_auto_candidate(workflow):
                self._transition_to(workflow, "AUTO_CANDIDATE")

        elif current_state == "AUTO_CANDIDATE":
            if self._can_transition_to_auto_confirmed(workflow):
                # Requires user validation
                self._request_user_approval(workflow)

        elif current_state == "AUTO_CONFIRMÉ":
            if self._should_rollback(workflow):
                self._transition_to(workflow, "COACHING")

    def _can_transition_to_coaching(self, workflow: Workflow) -> bool:
        """Check criteria for OBSERVATION → COACHING"""
        return (workflow.stats.observed_runs >= 5 and
                workflow.stats.avg_similarity >= 0.90)

    def _can_transition_to_auto_candidate(self, workflow: Workflow) -> bool:
        """Check criteria for COACHING → AUTO_CANDIDATE"""
        return (workflow.stats.assist_runs >= 10 and
                workflow.stats.success_rate >= 0.90)

    def _can_transition_to_auto_confirmed(self, workflow: Workflow) -> bool:
        """Check criteria for AUTO_CANDIDATE → AUTO_CONFIRMÉ"""
        return (workflow.stats.auto_candidate_runs >= 20 and
                workflow.stats.success_rate >= 0.95)

    def _should_rollback(self, workflow: Workflow) -> bool:
        """Check whether a rollback is needed"""
        return workflow.stats.recent_confidence < 0.90
```
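
The transition rules above can also be read as a single pure function from the current state and statistics to the next state. The sketch below restates them that way, using the thresholds given in this section. The `Stats` class is a reduced, hypothetical stand-in for `WorkflowStats`, and the `user_approved` flag models the user validation step as a plain boolean.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Stats:
    """Reduced, hypothetical stand-in for WorkflowStats."""
    observed_runs: int = 0
    avg_similarity: float = 0.0
    assist_runs: int = 0
    auto_candidate_runs: int = 0
    success_rate: float = 0.0
    recent_confidence: float = 1.0


def next_state(state: str, s: Stats, user_approved: bool = False) -> Optional[str]:
    """Return the state to move to, or None when no transition applies."""
    if state == "OBSERVATION" and s.observed_runs >= 5 and s.avg_similarity >= 0.90:
        return "COACHING"
    if state == "COACHING" and s.assist_runs >= 10 and s.success_rate >= 0.90:
        return "AUTO_CANDIDATE"
    if (state == "AUTO_CANDIDATE" and s.auto_candidate_runs >= 20
            and s.success_rate >= 0.95 and user_approved):
        return "AUTO_CONFIRMÉ"
    if state == "AUTO_CONFIRMÉ" and s.recent_confidence < 0.90:
        return "COACHING"  # rollback
    return None


print(next_state("OBSERVATION", Stats(observed_runs=5, avg_similarity=0.92)))
```

Keeping the rule free of side effects makes each threshold easy to unit test in isolation.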


## Data Models

### Complete JSON Format

**RawSession**:
```json
{
  "schema_version": "rawsession_v1",
  "session_id": "sess_2025-11-22T10-15-00_user1",
  "agent_version": "0.2.0",
  "environment": {
    "platform": "linux",
    "hostname": "dev-machine",
    "screen": {"primary_resolution": [1920, 1080]}
  },
  "user": {"id": "user1", "label": "Developer User"},
  "context": {"customer": "Demo", "training_label": "workflow_test"},
  "started_at": "2025-11-22T10:15:00Z",
  "ended_at": "2025-11-22T10:30:00Z",
  "events": [
    {
      "t": 0.523,
      "type": "mouse_click",
      "button": "left",
      "pos": [800, 400],
      "window": {"title": "App", "app_name": "app.exe"},
      "screenshot_id": "shot_0001"
    }
  ],
  "screenshots": [
    {
      "screenshot_id": "shot_0001",
      "relative_path": "shots/shot_0001.png",
      "captured_at": "2025-11-22T10:15:00.523Z"
    }
  ]
}
```

**ScreenState**:
```json
{
  "screen_state_id": "screen_2025-11-22T10-15-32.123Z",
  "timestamp": "2025-11-22T10:15:32.123Z",
  "session_id": "sess_2025-11-22T10-15-00_user1",
  "window": {
    "app_name": "app",
    "window_title": "Main Window",
    "screen_resolution": [1920, 1080]
  },
  "raw": {
    "screenshot_path": "data/screens/2025-11-22/10-15-32.png",
    "capture_method": "mss",
    "file_size_bytes": 245678
  },
  "perception": {
    "embedding": {
      "provider": "openclip_ViT-B-32",
      "vector_id": "data/embeddings/screens/screen_2025-11-22T10-15-32.123Z.npy",
      "dimensions": 512
    },
    "detected_text": ["Button", "Input", "Submit"],
    "text_detection_method": "qwen_vl",
    "confidence_avg": 0.92
  },
  "ui_elements": [
    {
      "element_id": "el_btn_001",
      "type": "button",
      "role": "primary_action",
      "bbox": [100, 200, 200, 240],
      "center": [150, 220],
      "label": "Submit",
      "label_confidence": 0.96,
      "embeddings": {
        "image": {
          "provider": "openclip_ViT-B-32",
          "vector_id": "data/embeddings/elements/el_btn_001_img.npy",
          "dimensions": 512
        },
        "text": {
          "provider": "openclip_ViT-B-32",
          "vector_id": "data/embeddings/elements/el_btn_001_txt.npy",
          "dimensions": 512
        }
      },
      "visual_features": {
        "dominant_color": "#4CAF50",
        "has_icon": false,
        "shape": "rectangle",
        "size_category": "medium"
      },
      "tags": ["action", "primary"],
      "confidence": 0.94
    }
  ],
  "context": {
    "current_workflow_candidate": null,
    "workflow_step": null,
    "user_id": "user1",
    "tags": ["demo"],
    "business_variables": {}
  },
  "metadata": {
    "processing_time_ms": 245,
    "ui_elements_count": 5
  }
}
```


**Workflow Graph**:
```json
{
  "workflow_id": "WF_demo_workflow",
  "name": "Demo Workflow",
  "description": "Simple demo workflow for testing",
  "version": 1,
  "learning_state": "OBSERVATION",
  "created_at": "2025-11-22T10:45:00Z",
  "updated_at": "2025-11-22T10:45:00Z",
  "entry_nodes": ["N1_start"],
  "end_nodes": ["N3_end"],
  "nodes": [
    {
      "node_id": "N1_start",
      "label": "Start Screen",
      "description": "Initial screen with form",
      "screen_template": {
        "window": {
          "app_name_any_of": ["app"],
          "title_contains_any_of": ["Main"]
        },
        "required_text_any": ["Submit", "Input"],
        "required_ui_elements": [
          {
            "role": "primary_action",
            "type_any_of": ["button"],
            "min_count": 1
          }
        ],
        "embedding_prototype": {
          "provider": "openclip_ViT-B-32",
          "vector_id": "data/embeddings/workflows/WF_demo/N1_prototype.npy",
          "min_cosine_similarity": 0.85,
          "sample_count": 5
        }
      },
      "metadata": {
        "created_at": "2025-11-22T10:45:00Z",
        "observation_count": 5
      }
    }
  ],
  "edges": [
    {
      "edge_id": "E1_submit",
      "from_node": "N1_start",
      "to_node": "N2_processing",
      "action": {
        "type": "mouse_click",
        "target": {
          "role": "primary_action",
          "selection_policy": "first",
          "fallback_strategy": "visual_similarity"
        },
        "parameters": {
          "click_offset": [0, 0],
          "wait_after_ms": 500
        }
      },
      "constraints": {
        "max_delay_seconds": 5,
        "pre_conditions": ["element:primary_action_visible"],
        "post_conditions": ["window_title_changed"]
      },
      "post_conditions": {
        "expected_node": "N2_processing",
        "min_similarity": 0.85,
        "timeout_seconds": 5
      },
      "stats": {
        "manual_executions": 5,
        "assist_executions": 0,
        "auto_executions": 0,
        "success_count": 5,
        "failure_count": 0,
        "avg_execution_time_ms": 1200
      }
    }
  ],
  "safety_rules": {
    "forbidden_text_clicks": ["Delete", "Remove"],
    "forbidden_roles": ["delete_action"],
    "require_confirmation_for": ["irreversible_action"]
  },
  "stats": {
    "observed_runs": 5,
    "assist_runs": 0,
    "auto_candidate_runs": 0,
    "auto_confirmed_runs": 0,
    "success_rate_overall": 1.0,
    "avg_duration_seconds": 30.5
  },
  "learning": {
    "state": "OBSERVATION",
    "thresholds": {
      "min_observed_runs_for_coaching": 5,
      "min_assist_runs_for_auto_candidate": 10,
      "min_auto_candidate_runs_for_auto_confirmed": 20
    }
  }
}
```

## Error Handling

### Matching Failures

**Scenario**: No node matches the current ScreenState.

**Handling**:
1. Log the unmatched ScreenState along with its screenshot
2. Compute similarity against all existing nodes
3. If the similarity is close (0.75-0.84), suggest updating the node
4. If the similarity is low (<0.75), suggest creating a new node
5. Notify the user and pause execution
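
The similarity bands in steps 3 and 4 can be expressed as a small triage function. This is a sketch only: the function name and return labels are illustrative, and the 0.85 match threshold is taken from the `min_cosine_similarity` value used in the example workflow, so it is an assumption for any other node.

```python
def triage_unmatched(best_similarity: float, match_threshold: float = 0.85) -> str:
    """Map the best similarity against existing nodes to a suggested action."""
    if best_similarity >= match_threshold:
        return "match"
    if best_similarity >= 0.75:
        return "suggest_node_update"
    return "suggest_new_node"


print(triage_unmatched(0.80))
```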

### UI Detection Failures

**Scenario**: The target element cannot be found by role.

**Handling**:
1. Log the failure with its context (node, edge, role searched for)
2. Try the fallback strategy (visual similarity)
3. If the fallback fails, try an approximate position
4. If everything fails, notify the user and ask for a correction
5. Update the node template with the feedback

### Post-Condition Violations

**Scenario**: Post-conditions are not satisfied after execution.

**Handling**:
1. Log the violation with details (expected vs. actual)
2. Wait for the configured timeout
3. If still not satisfied, mark the execution as failed
4. Increment the failure counter for this edge
5. If failures repeat (>3), mark the edge as problematic
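
Steps 4 and 5 amount to a per-edge failure counter with a threshold. A minimal sketch follows, assuming cumulative (not consecutive) failures, since the text does not say which is meant. The `EdgeFailureTracker` class is hypothetical.

```python
from collections import defaultdict
from typing import DefaultDict, Set


class EdgeFailureTracker:
    """Counts failures per edge and flags edges that exceed a threshold."""

    def __init__(self, max_failures: int = 3) -> None:
        self.max_failures = max_failures
        self.failures: DefaultDict[str, int] = defaultdict(int)
        self.problematic: Set[str] = set()

    def record(self, edge_id: str, success: bool) -> None:
        if success:
            return
        self.failures[edge_id] += 1
        # "More than 3" failures: the fourth failure flags the edge
        if self.failures[edge_id] > self.max_failures:
            self.problematic.add(edge_id)


tracker = EdgeFailureTracker()
for _ in range(3):
    tracker.record("E1_submit", success=False)
flagged_after_three = "E1_submit" in tracker.problematic
tracker.record("E1_submit", success=False)
flagged_after_four = "E1_submit" in tracker.problematic
print(flagged_after_three, flagged_after_four)
```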

### Detected UI Changes

**Scenario**: Embedding similarity drops significantly.

**Handling**:
1. Detect the change (similarity < 0.70 against the prototype)
2. Capture a new screenshot for analysis
3. Pause automatic execution
4. Notify the user of the detected change
5. Offer to re-learn the affected node


## Test Strategy

### Unit Tests

**Components to Test**:

1. **RawSession**:
   - JSON serialization/deserialization
   - Adding events and screenshots
   - Schema validation

2. **ScreenState**:
   - Creation of the 4 levels
   - JSON serialization/deserialization
   - Structure validation

3. **UIElement**:
   - Type and role detection
   - Dual embedding generation
   - Visual feature computation

4. **StateEmbedding**:
   - Weighted fusion
   - Concatenation-based fusion
   - Cosine similarity computation

5. **FAISSManager**:
   - Adding embeddings
   - Similarity search
   - Index save/load

6. **WorkflowNode**:
   - Matching against a ScreenState
   - Constraint validation
   - Confidence computation

7. **WorkflowEdge**:
   - Pre-condition validation
   - Action execution
   - Post-condition verification

8. **LearningManager**:
   - State transitions
   - Metric computation
   - Rollback detection

**Framework**: pytest with fixtures

### Integration Tests

**Scenarios**:

1. **Full RawSession → Workflow Pipeline**:
   - Capture a session
   - Create ScreenStates
   - Detect UI elements
   - Compute embeddings
   - Build the workflow graph
   - Verify the graph structure

2. **Matching and Execution**:
   - Load an existing workflow
   - Match the current ScreenState
   - Find the outgoing edge
   - Execute the action
   - Verify the transition

3. **Progressive Learning**:
   - Simulate 5 observations → COACHING
   - Simulate 10 assisted runs → AUTO_CANDIDATE
   - Simulate 20 executions → AUTO_CONFIRMÉ
   - Verify the transitions

4. **Error Handling**:
   - Simulate a matching failure
   - Simulate an element not found
   - Simulate a post-condition violation
   - Verify recovery

### Performance Tests

**Target Metrics**:

| Operation | Target Time | Test Method |
|-----------|-------------|-------------|
| Compute State Embedding | < 100ms | pytest-benchmark |
| FAISS Search | < 50ms | pytest-benchmark |
| UI Detection | < 200ms | pytest-benchmark |
| Action Execution | < 50ms | pytest-benchmark |
| End-to-End Processing | < 400ms | pytest-benchmark |

**Tools**: pytest-benchmark, cProfile

### End-to-End Tests

**Test Workflow**:

1. **Full Learning Cycle**:
   - Capture 5 sessions of a simple workflow
   - Verify automatic construction of the graph
   - Verify the OBSERVATION → COACHING transition
   - Run in assisted mode 10 times
   - Verify the COACHING → AUTO_CANDIDATE transition

2. **UI Robustness**:
   - Learn a workflow on UI version 1
   - Change the UI slightly (colors, positions)
   - Verify that matching still works
   - Change the UI significantly
   - Verify that the change is detected

3. **Multi-Workflow**:
   - Learn 3 different workflows
   - Verify workflow isolation
   - Verify correct matching for each workflow
   - Verify there is no confusion between workflows


## Correctness Properties

*A property is a characteristic or behavior that should hold true across all valid executions of a system; essentially, a formal statement about what the system should do. Properties serve as the bridge between human-readable specifications and machine-verifiable correctness guarantees.*

### Property 1: RawSession Serialization Round Trip

*For any* valid RawSession object, serializing to JSON then deserializing should produce an equivalent RawSession with all events and screenshots preserved.

**Validates: Requirements 1.4, 1.5**
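A round-trip check of this kind can be sketched with dataclasses and the standard `json` module. The field names below (`session_id`, `events`, `timestamp`, `kind`, `screenshot_id`) are assumptions made for illustration; the real RawSession schema may differ.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import List


@dataclass
class RawEvent:
    # Hypothetical event record, used only for this illustration.
    timestamp: float
    kind: str
    screenshot_id: str


@dataclass
class RawSession:
    session_id: str
    events: List[RawEvent] = field(default_factory=list)

    def to_json(self) -> str:
        # asdict() recursively converts nested dataclasses to plain dicts.
        return json.dumps(asdict(self), sort_keys=True)

    @classmethod
    def from_json(cls, payload: str) -> "RawSession":
        data = json.loads(payload)
        events = [RawEvent(**event) for event in data["events"]]
        return cls(session_id=data["session_id"], events=events)


def round_trips(session: RawSession) -> bool:
    """True when serialize -> deserialize yields an equal object."""
    return RawSession.from_json(session.to_json()) == session
```

Dataclass equality compares every field, so a dropped or reordered event makes `round_trips` return `False`. A property-based testing library could generate the sessions instead of hand-written cases.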

### Property 2: ScreenState Multi-Level Consistency

*For any* ScreenState, all 4 levels (Raw, Perception, UI Semantics, Context) should reference the same screenshot and timestamp.

**Validates: Requirements 2.1, 2.2, 2.3, 2.4, 2.5**

### Property 3: UIElement Detection Confidence Bounds

*For any* detected UIElement, the confidence score should be between 0.0 and 1.0, and elements with confidence below threshold should not be included in results.

**Validates: Requirements 3.6**

### Property 4: State Embedding Normalization

*For any* State Embedding vector, the L2 norm should be 1.0 (normalized vector).

**Validates: Requirements 4.6**

### Property 5: State Embedding Similarity Symmetry

*For any* two State Embeddings A and B, similarity(A, B) should equal similarity(B, A).

**Validates: Requirements 4.7**

### Property 6: State Embedding Similarity Bounds

*For any* two State Embeddings, the cosine similarity should be between -1.0 and 1.0.

**Validates: Requirements 4.7**
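Properties 4 to 6 can be checked together with a few lines of arithmetic. The sketch below uses plain Python lists to stay dependency-free; production code would presumably operate on NumPy arrays. The function names are illustrative, and the final clamp is an assumption added to absorb floating-point rounding.

```python
import math
from typing import List, Sequence


def l2_normalize(vec: Sequence[float]) -> List[float]:
    """Scale vec so that its L2 norm is 1.0 (Property 4)."""
    norm = math.sqrt(sum(x * x for x in vec))
    if norm == 0.0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in vec]


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity, symmetric (Property 5) and within [-1, 1] (Property 6)."""
    if len(a) != len(b):
        raise ValueError("embeddings must have the same dimension")
    a_n, b_n = l2_normalize(a), l2_normalize(b)
    dot = sum(x * y for x, y in zip(a_n, b_n))
    # Clamp so rounding error can never push the result outside the bounds.
    return max(-1.0, min(1.0, dot))
```

Symmetry holds exactly here because each product `x * y` is commutative and the summation order is the same in both directions.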

### Property 7: WorkflowNode Matching Consistency

*For any* WorkflowNode and ScreenState, if the node matches with confidence C1, then matching again immediately should return the same confidence C1 (deterministic).

**Validates: Requirements 9.1, 9.2, 9.3, 9.4, 9.5, 9.6**

### Property 8: WorkflowEdge Pre-Condition Validation

*For any* WorkflowEdge, if pre-conditions are not satisfied, execution should not proceed and should return failure.

**Validates: Requirements 10.5**

### Property 9: Learning State Monotonic Progression

*For any* Workflow in learning state S, transitioning to state S' should only happen if S' is the next state in the progression (OBSERVATION → COACHING → AUTO_CANDIDATE → AUTO_CONFIRMÉ), except for rollback to COACHING.

**Validates: Requirements 8.1, 8.2, 8.3, 8.4**

### Property 10: Learning State Rollback Condition

*For any* Workflow in AUTO_CONFIRMÉ state, if confidence drops below 0.90, the system should rollback to COACHING state.

**Validates: Requirements 8.6**
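Properties 9 and 10 together describe a small state machine. The following sketch encodes it under two assumptions that the properties do not settle: rollback to COACHING is accepted from any later state, and the rollback test is a strict "less than" comparison against 0.90. The enum member `AUTO_CONFIRME` is an ASCII alias for the AUTO_CONFIRMÉ state.

```python
from enum import Enum


class LearningState(Enum):
    OBSERVATION = "OBSERVATION"
    COACHING = "COACHING"
    AUTO_CANDIDATE = "AUTO_CANDIDATE"
    AUTO_CONFIRME = "AUTO_CONFIRMÉ"  # ASCII member name, original state name as value


# Enum iteration follows definition order, which is the progression order.
PROGRESSION = list(LearningState)
ROLLBACK_CONFIDENCE = 0.90  # threshold stated in Property 10


def is_allowed_transition(current: LearningState, target: LearningState) -> bool:
    """Allow one step forward, or a rollback to COACHING from a later state."""
    cur, tgt = PROGRESSION.index(current), PROGRESSION.index(target)
    if tgt == cur + 1:
        return True
    # Assumption: rollback is legal from any state after COACHING.
    return target is LearningState.COACHING and cur > tgt


def state_after_confidence_check(state: LearningState, confidence: float) -> LearningState:
    """Apply the Property 10 rollback rule; other states are left unchanged."""
    if state is LearningState.AUTO_CONFIRME and confidence < ROLLBACK_CONFIDENCE:
        return LearningState.COACHING
    return state
```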

### Property 11: FAISS Index Consistency

*For any* embedding added to FAISS index with metadata M, searching for that exact embedding should return it as the top result with metadata M.

**Validates: Requirements 4.8, 12.3, 12.6**
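The expected behavior can be illustrated without FAISS by an exhaustive inner-product search over stored vectors. `BruteForceIndex` below is a hypothetical stand-in written for this document, not part of the FAISS API; it assumes embeddings are L2-normalized, so an exact match scores highest.

```python
from typing import Any, Dict, List, Sequence, Tuple


class BruteForceIndex:
    """Exhaustive inner-product search; a stand-in for the FAISS index."""

    def __init__(self, dimension: int) -> None:
        self.dimension = dimension
        self._vectors: List[List[float]] = []
        self._metadata: List[Dict[str, Any]] = []

    def add(self, embedding: Sequence[float], metadata: Dict[str, Any]) -> None:
        if len(embedding) != self.dimension:
            raise ValueError("embedding has the wrong dimension")
        # Vector and metadata share the same position in both lists.
        self._vectors.append(list(embedding))
        self._metadata.append(dict(metadata))

    def search(self, query: Sequence[float], k: int = 5) -> List[Tuple[float, Dict[str, Any]]]:
        """Return up to k (score, metadata) pairs, best score first."""
        scored = [
            (sum(q * v for q, v in zip(query, vec)), meta)
            for vec, meta in zip(self._vectors, self._metadata)
        ]
        scored.sort(key=lambda item: item[0], reverse=True)
        return scored[:k]
```

The property then reads: after `add(e, m)`, the first element of `search(e, k=1)` carries metadata `m`. With a real FAISS index, the metadata lookup would go through the integer ids returned by the search.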

### Property 12: Workflow Graph Structural Validity

*For any* Workflow Graph, all edges should reference existing nodes (no dangling references).

**Validates: Requirements 7.2**
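A structural check for this property only needs the set of node ids and the edge endpoints. In this sketch, edges are represented as `(source_id, target_id)` tuples, which is an assumption about the data model made for brevity.

```python
from typing import Iterable, List, Tuple

Edge = Tuple[str, str]  # (source node id, target node id)


def find_dangling_edges(node_ids: Iterable[str], edges: Iterable[Edge]) -> List[Edge]:
    """Return every edge whose source or target is not a known node."""
    known = set(node_ids)
    return [
        (source, target)
        for source, target in edges
        if source not in known or target not in known
    ]
```

A graph satisfies the property when the returned list is empty; returning the offending edges rather than a boolean makes failures easier to diagnose.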

### Property 13: UIElement Role Uniqueness Per Type

*For any* ScreenState, if multiple UIElements have the same role, they should have different element_ids.

**Validates: Requirements 3.3**

### Property 14: Embedding Prototype Sample Count

*For any* WorkflowNode with embedding prototype, the sample_count should be at least 1 and the prototype vector should exist.

**Validates: Requirements 5.4**

### Property 15: Action Execution Timeout

*For any* WorkflowEdge execution, if post-conditions are not satisfied within timeout_seconds, the execution should be marked as failed.

**Validates: Requirements 10.6**
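One way to realize this property is a polling loop bounded by a monotonic deadline. The helper below is a sketch: `check` is a hypothetical callable that evaluates the post-conditions, and the polling interval is an assumed parameter that the property does not specify.

```python
import time
from typing import Callable


def wait_for_post_conditions(
    check: Callable[[], bool],
    timeout_seconds: float,
    poll_interval: float = 0.05,
) -> bool:
    """Poll check() until it succeeds or timeout_seconds elapse.

    Returns True on success and False when the execution should be
    marked as failed.
    """
    # time.monotonic() is unaffected by system clock adjustments.
    deadline = time.monotonic() + timeout_seconds
    while True:
        if check():
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(poll_interval)
```

The check runs at least once even with a zero timeout, so an already satisfied post-condition is never reported as a failure.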

### Property 16: Pattern Detection Minimum Repetitions

*For any* detected workflow pattern, it should have been observed at least 3 times before being proposed as a Workflow Graph.

**Validates: Requirements 11.7**

### Property 17: State Embedding Component Weights Sum

*For any* weighted fusion configuration, the sum of all component weights (image + text + title + ui) should equal 1.0.

**Validates: Requirements 4.5**
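The weight constraint can be enforced at the point where fusion happens. The sketch below assumes that weighted fusion means a per-component weighted sum followed by L2 re-normalization; this section does not spell out the formula, so treat the arithmetic as illustrative. `fuse_weighted` and `validate_weights` are hypothetical names.

```python
import math
from typing import Dict, List, Sequence

COMPONENTS = ("image", "text", "title", "ui")


def validate_weights(weights: Dict[str, float], tol: float = 1e-9) -> None:
    """Raise ValueError unless the four component weights sum to 1.0."""
    if set(weights) != set(COMPONENTS):
        raise ValueError("weights must cover exactly: " + ", ".join(COMPONENTS))
    if not math.isclose(sum(weights.values()), 1.0, abs_tol=tol):
        raise ValueError("component weights must sum to 1.0")


def fuse_weighted(
    components: Dict[str, Sequence[float]], weights: Dict[str, float]
) -> List[float]:
    """Weighted sum of component embeddings, re-normalized to unit length."""
    validate_weights(weights)
    dim = len(components[COMPONENTS[0]])
    fused = [0.0] * dim
    for name in COMPONENTS:
        vec = components[name]
        if len(vec) != dim:
            raise ValueError("all components must share one dimension")
        for i, value in enumerate(vec):
            fused[i] += weights[name] * value
    norm = math.sqrt(sum(x * x for x in fused))
    # A zero vector cannot be normalized; return it unchanged in that case.
    return [x / norm for x in fused] if norm > 0.0 else fused
```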

### Property 18: Workflow JSON Serialization Round Trip

*For any* valid Workflow Graph, serializing to JSON then deserializing should produce an equivalent Workflow with all nodes and edges preserved.

**Validates: Requirements 7.6, 12.4, 12.5**

### Property 19: Performance Constraint - State Embedding

*For any* ScreenState, computing the State Embedding should complete in less than 100ms.

**Validates: Requirements 15.1**

### Property 20: Performance Constraint - End-to-End

*For any* ScreenState processing (detection + embedding + matching), the total time should be less than 400ms.

**Validates: Requirements 15.5**


## Security Considerations

### Data Validation

- Every loaded JSON file must be validated against its schema
- Loaded embeddings must have the expected dimensions
- workflow_ids and node_ids must be validated (format, uniqueness)

### Workflow Isolation

- Each workflow must have its own space in FAISS
- Embeddings from different workflows must not interfere with one another
- Metadata must include workflow_id for filtering

### Safety Rules

- Forbidden actions (forbidden_text_clicks, forbidden_roles) must be blocked
- Irreversible actions must require confirmation
- Rollback must be possible for the last 3 actions
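The first safety rule can be sketched as a guard evaluated before any action runs. Only the names `forbidden_text_clicks` and `forbidden_roles` come from this document; the `SafetyRules` container, the action type string `"click"`, and the case-insensitive exact text match are assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import FrozenSet


@dataclass(frozen=True)
class SafetyRules:
    forbidden_text_clicks: FrozenSet[str]  # stored lowercased
    forbidden_roles: FrozenSet[str]


def is_action_allowed(
    rules: SafetyRules, action_type: str, target_text: str, target_role: str
) -> bool:
    """Return False when the action hits a forbidden role or forbidden click text."""
    if target_role in rules.forbidden_roles:
        return False
    # Assumption: text rules apply to clicks only, compared case-insensitively.
    if action_type == "click" and target_text.strip().lower() in rules.forbidden_text_clicks:
        return False
    return True
```

Keeping the guard as a pure function makes it straightforward to unit test and to call from the executor before each action.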

### Logging and Audit

- All state transitions must be logged
- All execution failures must be logged with their context
- Detected UI changes must be logged with screenshots

## Performance Optimization

### Embeddings

- Use batch processing to compute several embeddings at once
- Cache prototype embeddings
- Use FP16 (half-precision) weights for CLIP models

### FAISS

- Use an IVF index for large sets (>10k embeddings)
- Periodically optimize the index (compaction)
- Use the GPU for search when available

### UI Detection

- Cap screenshot resolution (max 1920x1080)
- Use ROI detection to shrink the processed area
- Cache detection results for similar frames

### Workflow Matching

- Pre-filter candidates by window context
- Stop early when confidence is very high (>0.95)
- Cache the last matched node
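Pre-filtering and early stopping can be combined in a single matching loop. In this sketch, "window context" is reduced to a window title compared for equality, and prototypes are assumed to be L2-normalized so the dot product equals the cosine similarity. `NodePrototype` and `match_node` are hypothetical names; the 0.85 and 0.95 thresholds come from this document.

```python
from dataclasses import dataclass
from typing import List, Optional, Sequence, Tuple


@dataclass
class NodePrototype:
    node_id: str
    window_title: str           # simplified stand-in for the window context
    prototype: Sequence[float]  # assumed L2-normalized


def match_node(
    embedding: Sequence[float],
    window_title: str,
    nodes: List[NodePrototype],
    min_similarity: float = 0.85,
    early_stop: float = 0.95,
) -> Optional[Tuple[str, float]]:
    """Return (node_id, score) for the best match, or None if none qualifies."""
    best: Optional[Tuple[str, float]] = None
    for node in nodes:
        if node.window_title != window_title:
            continue  # pre-filter: skip nodes from another window
        score = sum(a * b for a, b in zip(embedding, node.prototype))
        if score > early_stop:
            return node.node_id, score  # confident enough, stop scanning
        if score >= min_similarity and (best is None or score > best[1]):
            best = (node.node_id, score)
    return best
```

Early stopping trades a possibly better match later in the list for lower latency, which is acceptable only when prototypes of distinct nodes are well separated.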

## Deployment Considerations

### Dependencies

- Python 3.9+
- PyTorch 2.0+
- OpenCLIP
- FAISS (CPU or GPU)
- Transformers (Hugging Face)
- NumPy, Pillow
- pytest, pytest-benchmark

### Data Layout

```
data/
├── sessions/
│   └── YYYY-MM-DD/
│       └── sess_*.json
├── screen_states/
│   └── YYYY-MM-DD/
│       └── screen_*.json
├── embeddings/
│   ├── screens/
│   │   └── *.npy
│   ├── elements/
│   │   └── *.npy
│   └── states/
│       └── *.npy
├── faiss_index/
│   ├── index.faiss
│   └── metadata.json
└── workflows/
    └── WF_*/
        ├── workflow.json
        └── prototypes/
            └── *.npy
```

### Configuration

```python
CONFIG = {
    "models": {
        "clip": "ViT-B-32",
        "vlm": "qwen2.5-vl:3b"
    },
    "embedding": {
        "dimension": 512,
        "fusion_method": "weighted",
        "weights": {
            "image": 0.5,
            "text": 0.3,
            "title": 0.1,
            "ui": 0.1
        }
    },
    "matching": {
        "min_similarity": 0.85,
        "faiss_k": 5
    },
    "learning": {
        "observation_threshold": 5,
        "coaching_threshold": 10,
        "auto_candidate_threshold": 20,
        "min_success_rate": 0.90,
        "rollback_confidence": 0.90
    },
    "performance": {
        "max_embedding_time_ms": 100,
        "max_detection_time_ms": 200,
        "max_total_time_ms": 400
    }
}
```

## Implementation Plan

### Phase 1: Foundations (Weeks 1-2)

**Goal**: Data structures and serialization

- Implement the base classes (RawSession, ScreenState, UIElement, etc.)
- Implement JSON serialization/deserialization
- Unit tests on the data structures
- Schema validation

**Deliverables**:
- `geniusia2/core/models/*.py`
- Complete unit tests

### Phase 2: Embeddings and FAISS (Weeks 3-4)

**Goal**: A working embedding system

- Implement FusionEngine
- Implement FAISSManager
- Implement similarity computations
- Performance tests

**Deliverables**:
- `geniusia2/core/embedding/*.py`
- Performance benchmarks

### Phase 3: UI Detection (Weeks 5-6)

**Goal**: Semantic detection of UI elements

- Integrate the VLM for detection
- Implement type/role classification
- Generate dual embeddings
- Tests with real screenshots

**Deliverables**:
- `geniusia2/core/detection/*.py`
- Test dataset with screenshots

### Phase 4: Workflow Graph (Weeks 7-9)

**Goal**: Graph construction and matching

- Implement WorkflowNode and WorkflowEdge
- Implement GraphBuilder
- Implement NodeMatcher
- Integration tests

**Deliverables**:
- `geniusia2/core/graph/*.py`
- Complete integration tests

### Phase 5: Execution and Learning (Weeks 10-12)

**Goal**: Action execution and learning states

- Implement ActionExecutor
- Implement LearningManager
- Implement state transitions
- End-to-end tests

**Deliverables**:
- `geniusia2/core/graph/action_executor.py`
- `geniusia2/core/graph/learning_manager.py`
- End-to-end tests

### Phase 6: Optimization and Production (Weeks 13-14)

**Goal**: Optimization and deployment

- Optimize performance (caching, batching)
- Add monitoring and logging
- Complete documentation
- Load tests

**Deliverables**:
- Optimized system
- User documentation
- Deployment guide