# Complete Vision Architecture - RPA Vision V2

**Created**: November 22, 2024  
**Version**: 1.0  
**Status**: 📐 Architecture Reference Document

---

## 🎯 Overview

This document describes the complete architecture of the RPA Vision V2 system, from the raw capture of user events through to the automatic execution of learned workflows.

### Philosophy: "Observe → Understand → Learn → Act"

The system progressively turns **raw captures** into **actionable knowledge** through 5 layers of abstraction:

```
┌─────────────────────────────────────────────────────────────┐
│ Layer 0: RawSession                                         │
│ Raw event capture (clicks, keystrokes, screenshots)         │
└─────────────────────────────────────────────────────────────┘
                            ↓
┌─────────────────────────────────────────────────────────────┐
│ Layer 1: ScreenState                                        │
│ Multi-modal analysis (image, text, UI elements, context)    │
└─────────────────────────────────────────────────────────────┘
                            ↓
┌─────────────────────────────────────────────────────────────┐
│ Layer 2: UIElement Detection                                │
│ Semantic detection of interface elements                    │
└─────────────────────────────────────────────────────────────┘
                            ↓
┌─────────────────────────────────────────────────────────────┐
│ Layer 3: State Embedding                                    │
│ Multi-modal fusion into a single vector (fingerprint)       │
└─────────────────────────────────────────────────────────────┘
                            ↓
┌─────────────────────────────────────────────────────────────┐
│ Layer 4: Workflow Graph                                     │
│ Graph modeling (Nodes, Edges, Learning States)              │
└─────────────────────────────────────────────────────────────┘
```

---

## 📊 Layer 0: RawSession - Raw Capture

### Objective

Faithfully record every user interaction with precise timestamps and full context.

### JSON Structure

**Format**: `rawsession_v1`

```json
{
  "schema_version": "rawsession_v1",
  "session_id": "sess_2025-11-21T10-15-00_formateurX",
  "agent_version": "0.1.0",
  
  "environment": {
    "platform": "windows",
    "hostname": "PC-FORMATEUR-X",
    "screen": {
      "primary_resolution": [1920, 1080],
      "display_scale": 1.25
    }
  },
  
  "user": {
    "id": "formateurX",
    "label": "Formateur X démo clinique"
  },
  
  "context": {
    "customer": "Clinique Demo",
    "training_label": "Facturation_T2A_demo",
    "notes": "Session de formation interne"
  },
  
  "started_at": "2025-11-21T10:15:00Z",
  "ended_at": "2025-11-21T10:32:45Z",
  
  "events": [
    {
      "t": 0.523,
      "type": "mouse_click",
      "button": "left",
      "pos": [1620, 920],
      "window": {
        "title": "Factures - Clinique Demo",
        "app_name": "logiciel_facturation.exe"
      },
      "screenshot_id": "shot_0001"
    },
    {
      "t": 1.850,
      "type": "key_press",
      "keys": ["CTRL", "F"],
      "window": {
        "title": "Factures - Clinique Demo",
        "app_name": "logiciel_facturation.exe"
      },
      "screenshot_id": "shot_0002"
    },
    {
      "t": 3.200,
      "type": "mouse_scroll",
      "delta": -120,
      "pos": [800, 600],
      "window": {
        "title": "Factures - Clinique Demo",
        "app_name": "logiciel_facturation.exe"
      },
      "screenshot_id": null
    }
  ],
  
  "screenshots": [
    {
      "screenshot_id": "shot_0001",
      "relative_path": "shots/shot_0001.png",
      "captured_at": "2025-11-21T10:15:00.523Z"
    },
    {
      "screenshot_id": "shot_0002",
      "relative_path": "shots/shot_0002.png",
      "captured_at": "2025-11-21T10:15:01.850Z"
    }
  ]
}
```
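
The example above suggests that `t` is an offset in seconds from `started_at` (0.523 s lines up with the `captured_at` of `shot_0001`). Under that assumption, a minimal sketch of turning a RawSession into an absolute timeline could look like this; `event_timeline` is a hypothetical helper name, not part of the existing code base.

```python
from datetime import datetime, timedelta


def event_timeline(session: dict) -> list[dict]:
    """Attach an absolute UTC timestamp and a screenshot path to each event."""
    # Assumption: "t" is the offset in seconds from "started_at".
    started = datetime.fromisoformat(session["started_at"].replace("Z", "+00:00"))
    shots = {s["screenshot_id"]: s["relative_path"] for s in session.get("screenshots", [])}
    timeline = []
    for event in session["events"]:
        timeline.append({
            "type": event["type"],
            "at": started + timedelta(seconds=event["t"]),
            # Events without a screenshot (screenshot_id null) map to None.
            "screenshot_path": shots.get(event.get("screenshot_id")),
        })
    return timeline


session = {
    "started_at": "2025-11-21T10:15:00Z",
    "events": [
        {"t": 0.523, "type": "mouse_click", "screenshot_id": "shot_0001"},
        {"t": 3.200, "type": "mouse_scroll", "screenshot_id": None},
    ],
    "screenshots": [
        {"screenshot_id": "shot_0001", "relative_path": "shots/shot_0001.png"},
    ],
}
timeline = event_timeline(session)
```

Resolving screenshots through a dictionary keeps the lookup independent of the order in which `events` and `screenshots` are stored.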

### Supported Event Types

| Type | Description | Specific fields |
|------|-------------|-------------------|
| `mouse_click` | Mouse click | `button`, `pos`, `window` |
| `mouse_move` | Mouse movement | `pos`, `window` |
| `mouse_scroll` | Mouse scroll | `delta`, `pos`, `window` |
| `key_press` | Key press (one or more keys) | `keys`, `window` |
| `key_release` | Key release | `keys`, `window` |
| `text_input` | Text entry | `text`, `window` |
| `window_change` | Window change | `from_window`, `to_window` |
| `screen_change` | Screen change detected | `similarity_score` |

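The table of event types can be read as a per-type contract on required fields. The sketch below encodes it as a lookup and reports what an event is missing; `REQUIRED_FIELDS` and `missing_fields` are illustrative names, and treating every listed field as mandatory is an assumption.

```python
# Fields listed per event type in the table; assumed mandatory for this sketch.
REQUIRED_FIELDS = {
    "mouse_click": {"button", "pos", "window"},
    "mouse_move": {"pos", "window"},
    "mouse_scroll": {"delta", "pos", "window"},
    "key_press": {"keys", "window"},
    "key_release": {"keys", "window"},
    "text_input": {"text", "window"},
    "window_change": {"from_window", "to_window"},
    "screen_change": {"similarity_score"},
}


def missing_fields(event: dict) -> set[str]:
    """Return the type-specific fields that are absent from an event."""
    event_type = event.get("type")
    if event_type not in REQUIRED_FIELDS:
        raise ValueError(f"unsupported event type: {event_type!r}")
    return REQUIRED_FIELDS[event_type] - set(event)


complete = {"t": 0.523, "type": "mouse_click", "button": "left", "pos": [1620, 920], "window": {}}
incomplete = {"t": 1.850, "type": "key_press", "keys": ["CTRL", "F"]}
```
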
### 💡 Proposed Improvements

**1. Performance metadata**
```json
"performance": {
  "cpu_usage_percent": 45.2,
  "memory_usage_mb": 2048,
  "capture_latency_ms": 12
}
```

**2. Multi-screen support**
```json
"environment": {
  "screens": [
    {"id": 0, "resolution": [1920, 1080], "is_primary": true},
    {"id": 1, "resolution": [1920, 1080], "is_primary": false}
  ]
}
```

**3. Event versioning**
```json
"events": [
  {
    "event_version": "1.0",
    "t": 0.523,
    ...
  }
]
```

---

## 📊 Layer 1: ScreenState - Multi-Modal Analysis

### Objective

Turn a raw screenshot into a structured representation with 4 levels of abstraction.

### The 4 Levels

#### Level 1: Raw (what the machine sees)
- Raw screenshot (image)
- Resolution, aspect ratio
- Capture metadata

#### Level 2: Perception (what vision infers)
- Multi-modal embeddings (CLIP, Pix2Struct, VLM)
- Detected text (OCR/VLM)
- Identified regions of interest

#### Level 3: UI Semantics (what the system understands)
- List of structured UIElements
- Types, roles, labels
- Local per-element embeddings

#### Level 4: Business Context (session/application)
- Active application and window
- Current workflow (if identified)
- Business variables

### JSON Structure

```json
{
  "screen_state_id": "screen_2025-11-21T10-15-32.123Z",
  "timestamp": "2025-11-21T10:15:32.123Z",
  "session_id": "session_abc123",
  
  "window": {
    "app_name": "logiciel_facturation",
    "window_title": "Factures - Clinique Demo",
    "screen_resolution": [1920, 1080],
    "workspace": "main"
  },
  
  "raw": {
    "screenshot_path": "data/screens/2025-11-21/10-15-32_factures.png",
    "capture_method": "mss",
    "file_size_bytes": 245678
  },
  
  "perception": {
    "embedding": {
      "provider": "openclip_ViT-B-32",
      "vector_id": "data/embeddings/screens/screen_2025-11-21T10-15-32.123Z.npy",
      "dimensions": 512
    },
    "detected_text": [
      "Factures",
      "Patient",
      "Montant",
      "Statut",
      "À valider",
      "Validée"
    ],
    "text_detection_method": "qwen_vl",
    "confidence_avg": 0.92
  },
  
  "ui_elements": [
    {
      "element_id": "el_row_001",
      "type": "table_row",
      "role": "invoice_row",
      "bbox": [100, 250, 1800, 280],
      "label": "DUPONT Jean - 120,00 € - À valider",
      "embedding": {
        "provider": "openclip_ViT-B-32",
        "vector_id": "data/embeddings/elements/screen_..._el_row_001.npy"
      },
      "tags": ["invoice", "pending"],
      "confidence": 0.94
    },
    {
      "element_id": "el_btn_open",
      "type": "button",
      "role": "open_invoice",
      "bbox": [1750, 250, 1850, 280],
      "label": "Ouvrir",
      "embedding": null,
      "tags": ["action", "primary"],
      "confidence": 0.98
    }
  ],
  
  "context": {
    "current_workflow_candidate": null,
    "workflow_step": null,
    "user_id": "dom",
    "tags": ["facturation"],
    "business_variables": {}
  },
  
  "metadata": {
    "processing_time_ms": 245,
    "ui_elements_count": 12,
    "text_regions_count": 45
  }
}
```

### 💡 Proposed Improvements

**1. Regions of interest (ROI)**
```json
"perception": {
  "regions_of_interest": [
    {
      "roi_id": "roi_001",
      "bbox": [100, 200, 1800, 800],
      "type": "content_area",
      "importance": 0.9
    }
  ]
}
```

**2. Change history**
```json
"change_detection": {
  "previous_state_id": "screen_2025-11-21T10-15-30.123Z",
  "similarity_score": 0.87,
  "changed_regions": [
    {"bbox": [1500, 900, 1600, 940], "change_type": "new_element"}
  ]
}
```

**3. Quality metrics**
```json
"quality_metrics": {
  "image_sharpness": 0.92,
  "text_readability": 0.88,
  "ui_element_detection_confidence": 0.91
}
```

---

## 📊 Layer 2: UIElement Detection - Semantic Detection

### Objective

Turn a screenshot into a list of usable semantic UI objects (buttons, fields, etc.).

### Detection Pipeline

```
Screenshot
    ↓
┌─────────────────────────────────────┐
│ Step 1: Propose regions of interest│
│ - Heuristics (text areas)           │
│ - VLM (clickable areas)             │
│ - UI pattern detection              │
└─────────────────────────────────────┘
    ↓
┌─────────────────────────────────────┐
│ Step 2: Characterize elements       │
│ - Image crop → image embedding      │
│ - OCR/VLM → text + text embedding   │
│ - Position + dimensions             │
└─────────────────────────────────────┘
    ↓
┌─────────────────────────────────────┐
│ Step 3: Classify type + role        │
│ - Type: button, input, checkbox...  │
│ - Role: primary_action, cancel...   │
│ - Stable ID (hash)                  │
└─────────────────────────────────────┘
    ↓
List of UIElements
```
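
Step 3 calls for a stable, hash-based ID. The document does not say which fields go into the hash, so the sketch below is one possible scheme: hash the application, type, role, normalized label, and a grid-snapped bounding box so that a few pixels of jitter do not change the ID. The function name, the 20-pixel grid, and the choice of SHA-1 are all assumptions.

```python
import hashlib


def stable_element_id(app_name, element_type, role, label, bbox, grid=20):
    """Derive a repeatable element ID from semantic fields and a coarse position."""
    # Snap each coordinate to a coarse grid so small layout jitter is ignored.
    # Values that sit close to a grid boundary can still flip between cells.
    snapped = [round(value / grid) for value in bbox]
    key = "|".join([app_name, element_type, role, label.strip().lower(), *map(str, snapped)])
    digest = hashlib.sha1(key.encode("utf-8")).hexdigest()[:12]
    return f"el_{element_type}_{digest}"


id_a = stable_element_id(
    "logiciel_facturation", "button", "validate_invoice",
    "Valider la facture", [1500, 900, 1600, 940],
)
# Same button detected two pixels off: the snapped bbox is identical.
id_b = stable_element_id(
    "logiciel_facturation", "button", "validate_invoice",
    "Valider la facture", [1502, 901, 1601, 939],
)
id_c = stable_element_id(
    "logiciel_facturation", "button", "cancel",
    "Annuler", [1500, 900, 1600, 940],
)
```

Including the role and label in the key means two different buttons at the same position (for example after a screen change) get different IDs.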

### UIElement Structure

```json
{
  "element_id": "el_btn_valider_001",
  "type": "button",
  "role": "validate_invoice",
  "bbox": [1500, 900, 1600, 940],
  "center": [1550, 920],
  
  "label": "Valider la facture",
  "label_confidence": 0.96,
  
  "embeddings": {
    "image": {
      "provider": "openclip_ViT-B-32",
      "vector_id": "data/embeddings/elements/el_btn_valider_001_img.npy",
      "dimensions": 512
    },
    "text": {
      "provider": "openclip_ViT-B-32",
      "vector_id": "data/embeddings/elements/el_btn_valider_001_txt.npy",
      "dimensions": 512
    }
  },
  
  "visual_features": {
    "dominant_color": "#4CAF50",
    "has_icon": false,
    "shape": "rectangle",
    "size_category": "medium"
  },
  
  "tags": ["primary_action", "billing", "validation"],
  "confidence": 0.94,
  
  "metadata": {
    "detection_method": "qwen_vl",
    "detection_time_ms": 45
  }
}
```
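
In the structure above, `center` is consistent with `bbox` being `[x1, y1, x2, y2]`. Assuming that convention, the following sketch computes the center and maps a click position from a RawSession back to the element it landed on. Both helper names are hypothetical, and preferring the smallest containing element is a design assumption for nested elements such as a button inside a table row.

```python
def bbox_center(bbox):
    """Center of an [x1, y1, x2, y2] bounding box, in integer pixels."""
    x1, y1, x2, y2 = bbox
    return [(x1 + x2) // 2, (y1 + y2) // 2]


def element_at(pos, elements):
    """Return the smallest element whose bbox contains pos, or None."""
    x, y = pos

    def contains(bbox):
        return bbox[0] <= x <= bbox[2] and bbox[1] <= y <= bbox[3]

    def area(element):
        x1, y1, x2, y2 = element["bbox"]
        return (x2 - x1) * (y2 - y1)

    hits = [element for element in elements if contains(element["bbox"])]
    return min(hits, key=area, default=None)


elements = [
    {"element_id": "el_row_001", "bbox": [100, 250, 1800, 280]},
    {"element_id": "el_btn_open", "bbox": [1750, 250, 1850, 280]},
]
clicked = element_at([1760, 265], elements)
```
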

### Supported Element Types

| Type | Description | Typical roles |
|------|-------------|----------------|
| `button` | Clickable button | `primary_action`, `cancel`, `submit` |
| `text_input` | Input field | `search_field`, `form_input` |
| `checkbox` | Checkbox | `verification`, `selection` |
| `radio` | Radio button | `option_selection` |
| `dropdown` | Drop-down menu | `category_selector` |
| `tab` | Tab | `navigation` |
| `link` | Hyperlink | `navigation`, `external_link` |
| `icon` | Icon | `action_trigger`, `status_indicator` |
| `table_row` | Table row | `data_row`, `selectable_item` |
| `menu_item` | Menu item | `action`, `submenu` |

### 💡 Proposed Improvements

**1. Element hierarchy**
```json
{
  "element_id": "el_form_001",
  "type": "form",
  "children": [
    {"element_id": "el_input_001", "type": "text_input"},
    {"element_id": "el_btn_submit", "type": "button"}
  ]
}
```

**2. Element states**
```json
{
  "state": {
    "enabled": true,
    "visible": true,
    "focused": false,
    "selected": false,
    "value": "DUPONT Jean"
  }
}
```

**3. Relationships between elements**
```json
{
  "relationships": [
    {
      "type": "label_for",
      "target_element_id": "el_input_001"
    },
    {
      "type": "part_of_group",
      "group_id": "form_patient"
    }
  ]
}
```

---

## 📊 Layer 3: State Embedding - Multi-Modal Fusion

### Objective

Create a unique "fingerprint" of the screen by fusing all modalities (image, text, UI, context).

### Components of the State Embedding

```python
state_emb = fusion(
    img_emb,      # CLIP embedding of the full screenshot
    text_emb,     # Embedding of the concatenated detected text
    title_emb,    # Embedding of the window title
    ui_emb,       # Mean of the UI element embeddings
    ctx_emb       # Encoded workflow/business context
)
```

### Fusion Methods

#### Option A: Weighted Fusion (recommended to start)

```python
state_emb = normalize(
    0.5 * img_emb +      # Global visual (50%)
    0.3 * text_emb +     # Detected text (30%)
    0.1 * title_emb +    # Window title (10%)
    0.1 * ui_emb         # UI elements (10%)
)
```

**Advantages**:
- Simple to implement
- No training required
- Weights can be tuned by hand

#### Option B: Concatenation + Projection
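As a concrete illustration of Option A, here is a dependency-free sketch of the weighted fusion. It uses plain Python lists so it runs anywhere; a real implementation would more likely use NumPy. Normalizing each component before weighting is an assumption added here (the formula above applies the weights to the raw embeddings), and `weighted_fusion` is a hypothetical name.

```python
import math


def normalize(vec):
    """Scale a vector to unit L2 norm (zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(v * v for v in vec))
    return [v / norm for v in vec] if norm > 0 else list(vec)


def weighted_fusion(components, weights):
    """Fuse same-sized embeddings into one unit-norm state embedding."""
    sizes = {len(vec) for vec in components.values()}
    if len(sizes) != 1:
        raise ValueError("all component embeddings must share one dimension")
    fused = [0.0] * sizes.pop()
    for name, vec in components.items():
        # Assumption: normalize each component first so the weights express
        # relative influence regardless of the raw vector magnitude.
        unit = normalize(vec)
        fused = [acc + weights[name] * u for acc, u in zip(fused, unit)]
    return normalize(fused)


OPTION_A_WEIGHTS = {"image": 0.5, "text": 0.3, "title": 0.1, "ui": 0.1}

# Toy 2-dimensional vectors; the CLIP embeddings in this document have 512.
state_emb = weighted_fusion(
    {"image": [1.0, 0.0], "text": [0.0, 2.0], "title": [3.0, 0.0], "ui": [0.0, 1.0]},
    OPTION_A_WEIGHTS,
)
```

Because the result is normalized, cosine similarity between two state embeddings reduces to a dot product.
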

```python
z = concat([
    normalize(img_emb),
    normalize(text_emb),
    normalize(title_emb),
    normalize(ui_emb),
    normalize(ctx_emb)
])

state_emb = projection_layer(z)  # MLP or PCA
```

**Advantages**:
- Preserves all the information
- Can be refined with training
- More expressive

### JSON Structure

```json
{
  "state_embedding": {
    "embedding_id": "state_emb_2025-11-21T10-15-32.123Z",
    "vector_id": "data/embeddings/states/state_2025-11-21T10-15-32.123Z.npy",
    "dimensions": 512,
    "fusion_method": "weighted",
    
    "components": {
      "image": {
        "weight": 0.5,
        "vector_id": "data/embeddings/screens/screen_..._img.npy"
      },
      "text": {
        "weight": 0.3,
        "vector_id": "data/embeddings/screens/screen_..._text.npy",
        "source_text": "Factures | Patient | Montant | Statut | ..."
      },
      "title": {
        "weight": 0.1,
        "vector_id": "data/embeddings/screens/screen_..._title.npy",
        "source_text": "Factures - Clinique Demo"
      },
      "ui_elements": {
        "weight": 0.1,
        "aggregation": "mean",
        "element_count": 12
      }
    },
    
    "metadata": {
      "computation_time_ms": 78,
      "provider": "openclip_ViT-B-32"
    }
  }
}
```

### Uses of the State Embedding

| Use | Description | Typical threshold |
|-------|-------------|---------------|
| **Node Matching** | Recognize which node the system is currently in | > 0.85 |
| **UI Change Detection** | Detect significant changes | < 0.70 |
| **Workflow Similarity** | Find similar workflows | > 0.75 |
| **Historical Search** | Search for similar past states | > 0.80 |

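The Node Matching row can be sketched as a nearest-prototype search with a cutoff. The code below assumes the thresholds are cosine similarities (the node template later in this document uses `min_cosine_similarity`); `match_node` is a hypothetical helper, and the linear scan stands in for what an index such as FAISS would do at scale.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def match_node(state_emb, prototypes, threshold=0.85):
    """Return (node_id, score) for the best prototype, or (None, score) below threshold."""
    scores = {node_id: cosine_similarity(state_emb, proto) for node_id, proto in prototypes.items()}
    best_id = max(scores, key=scores.get)
    best_score = scores[best_id]
    return (best_id, best_score) if best_score > threshold else (None, best_score)


# Toy 2-dimensional prototypes for two nodes.
prototypes = {"N1_liste_factures": [1.0, 0.0], "N2_detail_facture": [0.0, 1.0]}
confident = match_node([0.9, 0.1], prototypes)
ambiguous = match_node([0.7, 0.7], prototypes)
```

Returning the score even when no node matches lets the caller tell an unknown screen apart from a borderline match.
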
### 💡 Proposed Improvements

**1. Adaptive contextual embeddings**
```python
# Adjust the weights to the context; fall back to the Option A weights otherwise
if workflow_type == "form_filling":
    weights = {"text": 0.5, "ui": 0.3, "image": 0.2}
elif workflow_type == "visual_inspection":
    weights = {"image": 0.7, "ui": 0.2, "text": 0.1}
else:
    weights = {"image": 0.5, "text": 0.3, "title": 0.1, "ui": 0.1}
```

**2. Temporal embeddings**
```json
{
  "temporal_context": {
    "previous_states": [
      {"state_id": "...", "time_delta_s": -2.5, "similarity": 0.92}
    ],
    "trajectory_embedding": "data/embeddings/trajectories/traj_001.npy"
  }
}
```

**3. Quality metrics**
```json
{
  "quality_metrics": {
    "component_alignment": 0.89,
    "information_preservation": 0.94,
    "discriminative_power": 0.87
  }
}
```

---

## 📊 Layer 4: Workflow Graph - Graph Modeling

### Objective

Model workflows as explicit graphs (Nodes + Edges) with a formalized learning progression.

### Key Concepts

**WorkflowNode** = Screen-state template  
**WorkflowEdge** = Transition (action) between two nodes  
**Workflow** = Complete graph with a learning state

### WorkflowNode Structure

```json
{
  "node_id": "N1_liste_factures",
  "label": "Liste des factures",
  "description": "Écran principal listant les factures avec statut à valider / validée.",
  
  "screen_template": {
    "window": {
      "app_name_any_of": ["logiciel_facturation"],
      "title_contains_any_of": ["Factures", "Liste des factures"]
    },
    
    "required_text_any": [
      "Factures",
      "Patient",
      "Montant",
      "Statut"
    ],
    
    "required_ui_elements": [
      {
        "role": "invoice_row",
        "type_any_of": ["table_row", "list_item"],
        "min_count": 1
      },
      {
        "role": "open_invoice",
        "type_any_of": ["button"],
        "label_contains_any_of": ["Ouvrir", "Détail"]
      }
    ],
    
    "embedding_prototype": {
      "provider": "openclip_ViT-B-32",
      "vector_id": "data/embeddings/workflows/WF_validation_facture/N1_prototype.npy",
      "min_cosine_similarity": 0.85,
      "sample_count": 15
    },
    
    "optional_elements": [
      {
        "role": "search_field",
        "type": "text_input"
      }
    ]
  },
  
  "metadata": {
    "created_at": "2025-11-21T10:30:00Z",
    "updated_at": "2025-11-21T10:30:00Z",
    "observation_count": 15,
    "tags": ["facturation", "liste"]
  }
}
```
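
To show how the rule-based part of a `screen_template` might be evaluated against a ScreenState, here is a minimal sketch. It covers only the window, text, and required-element rules (the embedding check is left out). Reading `required_text_any` as "at least one of these strings" and defaulting `min_count` to 1 are assumptions, and `matches_template` is a hypothetical name.

```python
def matches_template(screen_state, template):
    """Check the window, text, and required-element rules of a screen template."""
    window, rules = screen_state["window"], template["window"]
    if window["app_name"] not in rules["app_name_any_of"]:
        return False
    if not any(part in window["window_title"] for part in rules["title_contains_any_of"]):
        return False

    # Assumption: "required_text_any" is satisfied by at least one detected string.
    required_text = template.get("required_text_any", [])
    detected = set(screen_state["perception"]["detected_text"])
    if required_text and not detected.intersection(required_text):
        return False

    for rule in template.get("required_ui_elements", []):
        wanted_labels = rule.get("label_contains_any_of", [])
        count = sum(
            1
            for el in screen_state["ui_elements"]
            if el["role"] == rule["role"]
            and el["type"] in rule["type_any_of"]
            and (not wanted_labels or any(s in el["label"] for s in wanted_labels))
        )
        if count < rule.get("min_count", 1):  # Assumption: default of 1
            return False
    return True


template = {
    "window": {"app_name_any_of": ["logiciel_facturation"], "title_contains_any_of": ["Factures"]},
    "required_text_any": ["Factures", "Patient"],
    "required_ui_elements": [
        {"role": "invoice_row", "type_any_of": ["table_row", "list_item"], "min_count": 1},
        {"role": "open_invoice", "type_any_of": ["button"], "label_contains_any_of": ["Ouvrir"]},
    ],
}
screen_state = {
    "window": {"app_name": "logiciel_facturation", "window_title": "Factures - Clinique Demo"},
    "perception": {"detected_text": ["Factures", "Montant"]},
    "ui_elements": [
        {"type": "table_row", "role": "invoice_row", "label": "DUPONT Jean - 120,00 € - À valider"},
        {"type": "button", "role": "open_invoice", "label": "Ouvrir"},
    ],
}
other_app = {**screen_state, "window": {"app_name": "notepad", "window_title": "Factures"}}
```

Running these cheap checks before the embedding comparison is one way to reject obviously wrong screens early.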

### WorkflowEdge Structure

```json
{
  "edge_id": "E1_ouvrir_facture",
  "from_node": "N1_liste_factures",
  "to_node": "N2_detail_facture",
  
  "action": {
    "type": "mouse_click",
    "strategy": "row_then_button",
    
    "target": {
      "role": "invoice_row",
      "selection_policy": "first_pending",
      "fallback_strategy": "visual_similarity"
    },
    
    "secondary_target": {
      "role": "open_invoice",
      "optional": true
    },
    
    "parameters": {
      "click_offset": [0, 0],
      "double_click": false,
      "wait_after_ms": 500
    }
  },
  
  "constraints": {
    "max_delay_seconds": 5,
    "pre_conditions": [
      "element:invoice_row_visible",
      "element:invoice_row_status=pending"
    ],
    "post_conditions": [
      "window_title_changed",
      "new_ui_elements_detected"
    ]
  },
  
  "post_conditions": {
    "expected_node": "N2_detail_facture",
    "min_similarity": 0.85,
    "timeout_seconds": 5
  },
  
  "stats": {
    "manual_executions": 12,
    "assist_executions": 0,
    "auto_executions": 0,
    "success_count": 12,
    "failure_count": 0,
    "avg_execution_time_ms": 1250,
    "last_executed_at": "2025-11-21T10:30:00Z"
  },
  
  "metadata": {
    "created_at": "2025-11-21T10:30:00Z",
    "updated_at": "2025-11-21T10:30:00Z"
  }
}
```

### Complete Workflow Structure

```json
{
  "workflow_id": "WF_validation_facture",
  "name": "Validation facture consultation",
  "description": "Ouvrir une facture en attente, la contrôler et la valider.",
  "version": 1,
  
  "learning_state": "OBSERVATION",
  
  "created_at": "2025-11-21T10:45:00Z",
  "updated_at": "2025-11-21T10:45:00Z",
  
  "entry_nodes": ["N1_liste_factures"],
  "end_nodes": ["N5_liste_factures_maj"],
  
  "nodes": [
    {
      "node_id": "N1_liste_factures",
      "label": "Liste des factures",
      "description": "Écran principal listant les factures.",
      "screen_template": { /* ... */ }
    },
    {
      "node_id": "N2_detail_facture",
      "label": "Détail facture",
      "description": "Écran détaillé d'une facture unique.",
      "screen_template": { /* ... */ }
    },
    {
      "node_id": "N3_controle_facture",
      "label": "Contrôle facture",
      "description": "Écran de contrôle / vérification avant validation.",
      "screen_template": { /* ... */ }
    },
    {
      "node_id": "N4_popup_confirmation",
      "label": "Popup confirmation",
      "description": "Fenêtre modale de confirmation définitive.",
      "screen_template": { /* ... */ }
    },
    {
      "node_id": "N5_liste_factures_maj",
      "label": "Liste factures mise à jour",
      "description": "Retour à la liste, facture marquée 'Validée'.",
      "screen_template": {
        "similar_to_node": "N1_liste_factures",
        "additional_constraints": {
          "must_have_row_with_status": "Validée"
        }
      }
    }
  ],
  
  "edges": [
    {
      "edge_id": "E1_ouvrir_facture",
      "from_node": "N1_liste_factures",
      "to_node": "N2_detail_facture",
      "action": { /* ... */ }
    },
    {
      "edge_id": "E2_valider_depuis_detail",
      "from_node": "N2_detail_facture",
      "to_node_candidates": ["N3_controle_facture", "N4_popup_confirmation"],
      "action": {
        "type": "mouse_click",
        "target": {
          "role": "validate_button"
        }
      },
      "branching": {
        "type": "conditional",
        "detection_method": "screen_similarity",
        "learned_probability": {
          "N3_controle_facture": 0.7,
          "N4_popup_confirmation": 0.3
        }
      }
    },
    {
      "edge_id": "E3_confirmer_controle",
      "from_node": "N3_controle_facture",
      "to_node": "N4_popup_confirmation",
      "action": {
        "type": "compound",
        "steps": [
          {
            "type": "mouse_click",
            "target": {
              "role": "checkbox_verification"
            },
            "repeat": "all"
          },
          {
            "type": "mouse_click",
            "target": {
              "role": "confirm_validation"
            }
          }
        ]
      }
    },
    {
      "edge_id": "E4_confirmer_definitivement",
      "from_node": "N4_popup_confirmation",
      "to_node": "N5_liste_factures_maj",
      "action": {
        "type": "mouse_click",
        "target": {
          "role": "confirm_yes"
        }
      }
    }
  ],
  
  "safety_rules": {
    "forbidden_text_clicks": [
      "Supprimer",
      "Annuler la facture",
      "Effacer"
    ],
    "forbidden_roles": [
      "delete_action",
      "dangerous_action"
    ],
    "max_amount_without_manual_check": 1000.0,
    "require_confirmation_for": [
      "delete",
      "irreversible_action"
    ]
  },
  
  "stats": {
    "observed_runs": 15,
    "assist_runs": 0,
    "auto_candidate_runs": 0,
    "auto_confirmed_runs": 0,
    "success_rate_overall": 1.0,
    "avg_duration_seconds": 45.2,
    "total_executions": 15
  },
  
  "learning": {
    "state": "OBSERVATION",
    
    "thresholds": {
      "min_observed_runs_for_assist": 5,
      "min_assist_runs_for_auto_candidate": 10,
      "min_assist_success_rate_for_auto_candidate": 0.90,
      "min_auto_candidate_runs_for_auto_confirmed": 20,
      "min_auto_candidate_success_rate_for_auto_confirmed": 0.95
    },
    
    "progression": {
      "current_phase": "OBSERVATION",
      "progress_percent": 100.0,
      "next_phase": "COACHING",
      "requirements_met": true,
      "user_approval_required": true
    }
  },
  
  "metadata": {
    "created_by": "system",
    "customer": "Clinique Demo",
    "application": "logiciel_facturation",
    "tags": ["facturation", "T2A", "validation"]
  }
}
```
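
The `safety_rules` block can be enforced by a small gate that runs before every automated click. The sketch below is one possible reading: case-insensitive substring matching on labels and a strict "greater than" test on the amount are assumptions, and `check_action` is a hypothetical name.

```python
def check_action(safety_rules, role, label, amount=None):
    """Return (allowed, reason) for a planned click on an element."""
    if role in safety_rules.get("forbidden_roles", []):
        return False, f"forbidden role: {role}"
    for text in safety_rules.get("forbidden_text_clicks", []):
        # Assumption: case-insensitive substring match on the element label.
        if text.lower() in label.lower():
            return False, f"forbidden text: {text}"
    limit = safety_rules.get("max_amount_without_manual_check")
    if amount is not None and limit is not None and amount > limit:
        return False, "amount requires a manual check"
    return True, "ok"


safety_rules = {
    "forbidden_text_clicks": ["Supprimer", "Annuler la facture", "Effacer"],
    "forbidden_roles": ["delete_action", "dangerous_action"],
    "max_amount_without_manual_check": 1000.0,
}
allowed = check_action(safety_rules, "validate_button", "Valider la facture", amount=120.0)
blocked_text = check_action(safety_rules, "menu_item", "Supprimer la ligne")
blocked_amount = check_action(safety_rules, "validate_button", "Valider la facture", amount=1500.0)
```

Returning a reason alongside the decision makes it straightforward to write the refusal to an audit log.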

### Learning States

```
┌──────────────────────────────────────────────────────────────┐
│ OBSERVATION (Shadow)                                         │
│ - Records ScreenStates + actions                             │
│ - Detects repeated sequences                                 │
│ - Builds the graph                                           │
│ Criteria: ≥5 similar executions                              │
└──────────────────────────────────────────────────────────────┘
                            ↓
┌──────────────────────────────────────────────────────────────┐
│ COACHING (Assist)                                            │
│ - Recognizes the start of the workflow                       │
│ - Suggests upcoming steps in advance                         │
│ - The user executes, the system observes                     │
│ Criteria: ≥10 assisted executions, success >90%              │
└──────────────────────────────────────────────────────────────┘
                            ↓
┌──────────────────────────────────────────────────────────────┐
│ AUTO_CANDIDATE (supervised semi-automatic)                   │
│ - Executes automatically                                     │
│ - Asks for confirmation at every step                        │
│ - Pauses on an unexpected screen                             │
│ Criteria: ≥20 executions, success >95%, user validation      │
└──────────────────────────────────────────────────────────────┘
                            ↓
┌──────────────────────────────────────────────────────────────┐
│ AUTO_CONFIRMED (Autopilot)                                   │
│ - Executes without asking                                    │
│ - Whitelist enabled                                          │
│ - Demotion if the UI changes or confidence drops             │
│ Criteria: explicit user validation                           │
└──────────────────────────────────────────────────────────────┘
```
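
The progression above maps naturally onto a small state machine driven by the `learning.thresholds` block. The sketch below handles promotion only (demotion is not modeled). The per-phase success-rate keys `assist_success_rate` and `auto_candidate_success_rate` are hypothetical, since the example `stats` only expose `success_rate_overall`; the final state is spelled `AUTO_CONFIRMED`, following the `auto_confirmed_runs` key; and user approval is required only for the last step here, although the JSON example also flags it for leaving OBSERVATION.

```python
def next_learning_state(state, stats, thresholds, user_approved=False):
    """Return the learning state after checking the promotion criteria."""
    if state == "OBSERVATION":
        ready = stats["observed_runs"] >= thresholds["min_observed_runs_for_assist"]
        return "COACHING" if ready else state
    if state == "COACHING":
        ready = (
            stats["assist_runs"] >= thresholds["min_assist_runs_for_auto_candidate"]
            # Hypothetical stat: success rate measured over assisted runs only.
            and stats["assist_success_rate"] >= thresholds["min_assist_success_rate_for_auto_candidate"]
        )
        return "AUTO_CANDIDATE" if ready else state
    if state == "AUTO_CANDIDATE":
        ready = (
            stats["auto_candidate_runs"] >= thresholds["min_auto_candidate_runs_for_auto_confirmed"]
            # Hypothetical stat: success rate measured over supervised auto runs only.
            and stats["auto_candidate_success_rate"]
            >= thresholds["min_auto_candidate_success_rate_for_auto_confirmed"]
            and user_approved
        )
        return "AUTO_CONFIRMED" if ready else state
    return state  # AUTO_CONFIRMED has no further promotion


thresholds = {
    "min_observed_runs_for_assist": 5,
    "min_assist_runs_for_auto_candidate": 10,
    "min_assist_success_rate_for_auto_candidate": 0.90,
    "min_auto_candidate_runs_for_auto_confirmed": 20,
    "min_auto_candidate_success_rate_for_auto_confirmed": 0.95,
}
stats = {
    "observed_runs": 15, "assist_runs": 0, "assist_success_rate": 0.0,
    "auto_candidate_runs": 0, "auto_candidate_success_rate": 0.0,
}
after_observation = next_learning_state("OBSERVATION", stats, thresholds)
after_coaching = next_learning_state("COACHING", stats, thresholds)
```

Each call advances by at most one phase, so a workflow cannot jump from OBSERVATION straight to automatic execution.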

### 💡 Proposed Improvements

**1. Graphs with loops and conditions**
```json
{
  "edge_id": "E_loop_search",
  "from_node": "N_search_results",
  "to_node": "N_search_results",
  "condition": {
    "type": "while",
    "expression": "not found_target_invoice",
    "max_iterations": 10
  }
}
```

**2. Reusable sub-workflows**
```json
{
  "workflow_id": "WF_validation_facture",
  "sub_workflows": [
    {
      "sub_workflow_id": "SUB_login",
      "entry_edge": "E0_start",
      "exit_edge": "E1_after_login"
    }
  ]
}
```

**3. Per-node confidence metrics**
```json
{
  "node_id": "N2_detail_facture",
  "confidence_metrics": {
    "recognition_accuracy": 0.96,
    "false_positive_rate": 0.02,
    "avg_matching_time_ms": 45
  }
}
```

---

## 🔄 Complete Processing Pipeline

### From RawSession to Learned Workflow

```
1. CAPTURE
   RawSession recorded
   ↓
2. ANALYSIS
   For each screenshot_id:
     - Create ScreenState (4 levels)
     - Detect UIElements
     - Compute state_embedding
   ↓
3. PATTERN DETECTION
   Analyze the event sequence:
     - Group by window
     - Detect repetitions
     - Identify recurring transitions
   ↓
4. GRAPH CONSTRUCTION
   Create Workflow:
     - Nodes = similar ScreenStates grouped together
     - Edges = actions between nodes
     - Stats = execution counters
   ↓
5. LEARNING
   Update learning_state:
     - OBSERVATION → COACHING (if criteria are met)
     - Compute prototype embeddings
     - Refine similarity thresholds
   ↓
6. EXECUTION
   Replay workflow:
     - Match state_emb → node
     - Find UIElement by role
     - Execute action
     - Verify post-conditions
```
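
Steps 3 and 4 can be illustrated with a small transition counter: once each observed run has been reduced to a sequence of node IDs, recurring transitions become edges and their relative frequencies give branching probabilities like the `learned_probability` of edge `E2_valider_depuis_detail`. The function name and the input format (one list of node IDs per run) are assumptions.

```python
from collections import Counter, defaultdict


def build_edges(runs):
    """Count node-to-node transitions and convert them to probabilities per source node."""
    counts = defaultdict(Counter)
    for run in runs:
        # Pair every node with its successor within the same run.
        for src, dst in zip(run, run[1:]):
            counts[src][dst] += 1
    edges = {}
    for src, targets in counts.items():
        total = sum(targets.values())
        edges[src] = {dst: n / total for dst, n in targets.items()}
    return edges


# Ten hypothetical observed runs: 7 go through the control screen, 3 skip it.
runs = [["N1", "N2", "N3", "N4", "N5"]] * 7 + [["N1", "N2", "N4", "N5"]] * 3
edges = build_edges(runs)
```

With these ten runs, `edges["N2"]` reproduces the 0.7 / 0.3 split shown for the conditional branch in the workflow example.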

### Concrete Example: "T2A Invoice Validation"

#### Step 1: Capture (RawSession)
```
Session: 15 minutes
Events: 45 events
Screenshots: 12 captures
```

#### Step 2: Analysis (ScreenStates)
```
12 ScreenStates created
Average of 8 UIElements per state
State embeddings computed
```

#### Step 3: Detection
```
Pattern detected: N1 → N2 → N3 → N4 → N5
Repetitions: 3 times
Average similarity: 0.92
```

#### Step 4: Construction
```
Workflow created: WF_validation_facture
Nodes: 5 (N1 to N5)
Edges: 4 (E1 to E4)
Learning state: OBSERVATION
```

#### Step 5: Learning
```
After 5 executions:
  → Move to COACHING
  → Prototype embeddings computed
  → Thresholds refined

After 15 assisted executions:
  → Move to AUTO_CANDIDATE
  → Confidence: 96%
```

#### Step 6: Execution
```
Detection: "I am in N1"
Action: Click on role=invoice_row
Verification: Transition to N2 (similarity 0.94)
Success: ✓
```

---

## 📐 Mapping to Existing Code

### Current Files vs Target Architecture

| Current Component | File | Target Architecture |
|------------------|---------|-------------------|
| Event Capture | `event_capture.py` | → RawSession |
| Screen Capture | `enriched_screen_capture.py` | → ScreenState (raw) |
| UI Detection | `ui_element_detector.py` | → UIElement Detection |
| Embeddings | `multimodal_embedding_manager.py` | → State Embedding |
| Workflow Detection | `workflow_detector.py` | → Workflow Graph |
| Matching | `enhanced_workflow_matcher.py` | → Node Matching |

### Required Changes

**1. RawSession**
- ✅ Already partially implemented (SessionManager)
- 🔧 To enrich: environment and performance metadata

**2. ScreenState**
- ✅ Foundations exist (EnrichedScreenState)
- 🔧 To structure: 4 explicit levels
- 🔧 To add: business context

**3. UIElement**
- ✅ Models exist (UIElement, ui_element_models.py)
- 🔧 To enrich: semantic roles, dual embeddings

**4. State Embedding**
- ✅ Embeddings exist (MultiModalEmbeddingManager)
- 🔧 To implement: multi-modal fusion

**5. Workflow Graph**
- 🔧 To create: Nodes/Edges structure
- 🔧 To migrate: current workflows (lists of steps)
- 🔧 To implement: learning states

---

## 🚀 Progressive Migration Plan

### Phase 1: Foundations (Weeks 1-2)

**Objective**: Data structures and JSON formats

- [ ] Define complete JSON schemas
- [ ] Create Python classes (ScreenState, UIElement, WorkflowNode, etc.)
- [ ] Implement serialization/deserialization
- [ ] Unit tests for the structures

**Deliverables**:
- `geniusia2/core/models/screen_state.py`
- `geniusia2/core/models/workflow_graph.py`
- JSON schemas in `docs/schemas/`

### Phase 2: UIElement Detection (Weeks 3-4)

**Objective**: Robust detection pipeline

- [ ] Implement region-of-interest detection
- [ ] Integrate a VLM for clickable areas
- [ ] Compute dual embeddings (image + text)
- [ ] Classify types and roles

**Deliverables**:
- `geniusia2/core/ui_element_pipeline.py`
- Tests on real screenshots

### Phase 3: State Embedding (Weeks 5-6)

**Objective**: Multi-modal fusion

- [ ] Implement weighted fusion
- [ ] Compute title/text embeddings
- [ ] Aggregate UI embeddings
- [ ] Benchmark quality (similarity, discrimination)

**Deliverables**:
- `geniusia2/core/state_embedding_fusion.py`
- Quality metrics

### Phase 4: Workflow Graph (Weeks 7-9)

**Objective**: Graph modeling

- [ ] Create WorkflowNode with templates
- [ ] Create WorkflowEdge with actions
- [ ] Implement node matching (state_emb → node)
- [ ] Migrate existing workflows

**Deliverables**:
- `geniusia2/core/workflow_graph_builder.py`
- Migration script

### Phase 5: Learning States (Weeks 10-12)

**Objective**: Learning progression

- [ ] Implement the state machine
- [ ] Compute progression metrics
- [ ] Integrate into the GUI (indicators)
- [ ] End-to-end tests

**Deliverables**:
- `geniusia2/core/learning_state_manager.py`
- Updated GUI

### Phase 6: Production (Weeks 13-14)

**Objective**: Deployment and monitoring

- [ ] User testing
- [ ] Performance optimizations
- [ ] User documentation
- [ ] Monitoring and metrics

---

## 📊 Success Metrics

### Detection Quality

| Metric | Target | Measure |
|----------|-------|--------|
| UIElement precision | >90% | TP / (TP + FP) |
| UIElement recall | >85% | TP / (TP + FN) |
| Node Matching precision | >95% | Nodes correctly identified |
| Processing time | <500ms | Per ScreenState |

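The precision and recall formulas in the table can be checked against the targets with a few lines of code. The counts below are made up for illustration, and `meets_detection_targets` is a hypothetical helper.

```python
def precision_recall(tp, fp, fn):
    """Precision = TP / (TP + FP); recall = TP / (TP + FN)."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


def meets_detection_targets(tp, fp, fn, min_precision=0.90, min_recall=0.85):
    """True when both UIElement targets from the table are exceeded."""
    precision, recall = precision_recall(tp, fp, fn)
    return precision > min_precision and recall > min_recall


# Illustrative counts: 92 correct detections, 8 spurious, 12 missed.
precision, recall = precision_recall(tp=92, fp=8, fn=12)
```
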
### Learning Quality

| Metric | Target | Measure |
|----------|-------|--------|
| Workflows detected | >80% | Real patterns detected |
| False positives | <5% | Incorrect workflows |
| Learning time | <10 runs | To move to COACHING |
| AUTO success rate | >95% | Successful executions |

### System Performance

| Metric | Target | Measure |
|----------|-------|--------|
| Capture latency | <50ms | Time from event to screenshot |
| Analysis latency | <300ms | Screenshot → ScreenState |
| Matching latency | <100ms | State → Node |
| Memory | <2GB | RAM used |

---

## 🔒 Security Considerations

### Data Protection

**1. Capture encryption**
```json
{
  "raw": {
    "screenshot_path": "encrypted://data/screens/...",
    "encryption_method": "AES-256-GCM",
    "key_id": "key_2025_11"
  }
}
```

**2. Anonymization of sensitive data**
```json
{
  "privacy": {
    "pii_detected": true,
    "anonymized_fields": ["patient_name", "ssn"],
    "anonymization_method": "hash_sha256"
  }
}
```
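
A minimal sketch of field-level hashing with the standard library follows. It deliberately swaps the plain `hash_sha256` named above for a keyed HMAC-SHA256, on the assumption that low-entropy values such as patient names could otherwise be recovered by hashing candidate values; this is a variation for illustration, not the method the document specifies. The function names and the example key are hypothetical.

```python
import hashlib
import hmac


def pseudonymize(value: str, secret_key: bytes) -> str:
    """Replace a value with a keyed SHA-256 digest (hex, 64 characters)."""
    return hmac.new(secret_key, value.encode("utf-8"), hashlib.sha256).hexdigest()


def anonymize_fields(record: dict, fields: list[str], secret_key: bytes) -> dict:
    """Return a copy of record with the listed fields replaced by digests."""
    cleaned = dict(record)
    for field in fields:
        if cleaned.get(field) is not None:
            cleaned[field] = pseudonymize(str(cleaned[field]), secret_key)
    return cleaned


key = b"example-key-load-from-a-secret-store"  # placeholder, not a real key
record = {"patient_name": "DUPONT Jean", "amount": "120,00"}
cleaned = anonymize_fields(record, ["patient_name", "ssn"], key)
```

The same input and key always produce the same digest, so records for one patient can still be linked after hashing.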

### Safety Rules

**1. Action validation**
- Whitelist of authorized applications
- Blacklist of dangerous roles
- Confirmation for irreversible actions

**2. Automatic rollback**
- Save state before the action
- Failure detection
- Automatic restoration

**3. Audit trail**
- Immutable logs of every action
- Full traceability
- GDPR compliance

---

## 📚 References and Resources

### Vision Models

- **CLIP**: Contrastive Language-Image Pre-training
- **Pix2Struct**: Screenshot parsing as pretraining
- **Qwen-VL**: Vision-Language Model
- **OWL-v2**: Open-vocabulary object detection

### Embedding Techniques

- **Cosine Similarity**: Similarity measure
- **FAISS**: Fast indexing and search
- **PCA**: Dimensionality reduction
- **t-SNE**: Visualization

### Graph Architectures

- **State Machines**: Finite-state machines
- **Petri Nets**: Workflow modeling
- **DAG**: Directed Acyclic Graphs

---

## 📝 Conclusion

This architecture provides a solid foundation for turning RPA Vision V2 from a simple capture/replay system into a **cognitive learning system** that can:

✅ **Understand** interfaces at a semantic level  
✅ **Learn** workflows progressively  
✅ **Adapt** to UI changes  
✅ **Execute** robustly and securely  

The migration can proceed **incrementally** without breaking what already exists, adding the abstraction layers one at a time.

---

**Document created**: November 22, 2024  
**Author**: Collaborative architecture  
**Version**: 1.0  
**Status**: ✅ Complete Architecture Reference