Dom/Geniusia_v2


Dom dcd4de9945 Initial commit

2026-03-05 00:20:25 +01:00


Complete Architecture Vision - RPA Vision V2

Created: 22 November 2024
Version: 1.0
Status: 📐 Architectural Reference Document

🎯 Overview

This document describes the complete architecture of the RPA Vision V2 system, from the raw capture of user events through to the automatic execution of learned workflows.

Philosophy: "Observe → Understand → Learn → Act"

The system progressively turns raw captures into actionable knowledge through 5 layers of abstraction:

┌─────────────────────────────────────────────────────────────┐
│ Layer 0: RawSession                                         │
│ Raw capture of events (clicks, keystrokes, screenshots)     │
└─────────────────────────────────────────────────────────────┘
                            ↓
┌─────────────────────────────────────────────────────────────┐
│ Layer 1: ScreenState                                        │
│ Multi-modal analysis (image, text, UI elements, context)    │
└─────────────────────────────────────────────────────────────┘
                            ↓
┌─────────────────────────────────────────────────────────────┐
│ Layer 2: UIElement Detection                                │
│ Semantic detection of interface elements                    │
└─────────────────────────────────────────────────────────────┘
                            ↓
┌─────────────────────────────────────────────────────────────┐
│ Layer 3: State Embedding                                    │
│ Multi-modal fusion into a single vector (fingerprint)       │
└─────────────────────────────────────────────────────────────┘
                            ↓
┌─────────────────────────────────────────────────────────────┐
│ Layer 4: Workflow Graph                                     │
│ Graph modeling (Nodes, Edges, Learning States)              │
└─────────────────────────────────────────────────────────────┘

📊 Layer 0: RawSession - Raw Capture

Objective

Faithfully record every user interaction with precise timestamps and full context.

JSON Structure

Format: rawsession_v1

{
  "schema_version": "rawsession_v1",
  "session_id": "sess_2025-11-21T10-15-00_formateurX",
  "agent_version": "0.1.0",
  
  "environment": {
    "platform": "windows",
    "hostname": "PC-FORMATEUR-X",
    "screen": {
      "primary_resolution": [1920, 1080],
      "display_scale": 1.25
    }
  },
  
  "user": {
    "id": "formateurX",
    "label": "Formateur X démo clinique"
  },
  
  "context": {
    "customer": "Clinique Demo",
    "training_label": "Facturation_T2A_demo",
    "notes": "Session de formation interne"
  },
  
  "started_at": "2025-11-21T10:15:00Z",
  "ended_at": "2025-11-21T10:32:45Z",
  
  "events": [
    {
      "t": 0.523,
      "type": "mouse_click",
      "button": "left",
      "pos": [1620, 920],
      "window": {
        "title": "Factures - Clinique Demo",
        "app_name": "logiciel_facturation.exe"
      },
      "screenshot_id": "shot_0001"
    },
    {
      "t": 1.850,
      "type": "key_press",
      "keys": ["CTRL", "F"],
      "window": {
        "title": "Factures - Clinique Demo",
        "app_name": "logiciel_facturation.exe"
      },
      "screenshot_id": "shot_0002"
    },
    {
      "t": 3.200,
      "type": "mouse_scroll",
      "delta": -120,
      "pos": [800, 600],
      "window": {
        "title": "Factures - Clinique Demo",
        "app_name": "logiciel_facturation.exe"
      },
      "screenshot_id": null
    }
  ],
  
  "screenshots": [
    {
      "screenshot_id": "shot_0001",
      "relative_path": "shots/shot_0001.png",
      "captured_at": "2025-11-21T10:15:00.523Z"
    },
    {
      "screenshot_id": "shot_0002",
      "relative_path": "shots/shot_0002.png",
      "captured_at": "2025-11-21T10:15:01.850Z"
    }
  ]
}

Supported Event Types

Type	Description	Specific fields
`mouse_click`	Mouse click	`button`, `pos`, `window`
`mouse_move`	Mouse movement	`pos`, `window`
`mouse_scroll`	Mouse scroll	`delta`, `pos`, `window`
`key_press`	Key press (one or more keys)	`keys`, `window`
`key_release`	Key release	`keys`, `window`
`text_input`	Text entry	`text`, `window`
`window_change`	Window change	`from_window`, `to_window`
`screen_change`	Screen change detected	`similarity_score`
💡 Proposed Improvements

1. Adding performance metadata

"performance": {
  "cpu_usage_percent": 45.2,
  "memory_usage_mb": 2048,
  "capture_latency_ms": 12
}

2. Multi-screen support

"environment": {
  "screens": [
    {"id": 0, "resolution": [1920, 1080], "is_primary": true},
    {"id": 1, "resolution": [1920, 1080], "is_primary": false}
  ]
}

3. Event versioning

"events": [
  {
    "event_version": "1.0",
    "t": 0.523,
    ...
  }
]

📊 Layer 1: ScreenState - Multi-Modal Analysis

Objective

Turn a raw screenshot into a structured representation with 4 levels of abstraction.

The 4 Levels

Level 1: Raw (what the machine sees)

Raw screenshot (image)
Resolution, aspect ratio
Capture metadata

Level 2: Perception (what vision infers)

Multi-modal embeddings (CLIP, Pix2Struct, VLM)
Detected text (OCR/VLM)
Identified regions of interest

Level 3: UI Semantics (what the system understands)

List of structured UIElements
Types, roles, labels
Local per-element embeddings

Level 4: Business Context (Session/Application)

Active application and window
Current workflow (if identified)
Business variables

JSON Structure

{
  "screen_state_id": "screen_2025-11-21T10-15-32.123Z",
  "timestamp": "2025-11-21T10:15:32.123Z",
  "session_id": "session_abc123",
  
  "window": {
    "app_name": "logiciel_facturation",
    "window_title": "Factures - Clinique Demo",
    "screen_resolution": [1920, 1080],
    "workspace": "main"
  },
  
  "raw": {
    "screenshot_path": "data/screens/2025-11-21/10-15-32_factures.png",
    "capture_method": "mss",
    "file_size_bytes": 245678
  },
  
  "perception": {
    "embedding": {
      "provider": "openclip_ViT-B-32",
      "vector_id": "data/embeddings/screens/screen_2025-11-21T10-15-32.123Z.npy",
      "dimensions": 512
    },
    "detected_text": [
      "Factures",
      "Patient",
      "Montant",
      "Statut",
      "À valider",
      "Validée"
    ],
    "text_detection_method": "qwen_vl",
    "confidence_avg": 0.92
  },
  
  "ui_elements": [
    {
      "element_id": "el_row_001",
      "type": "table_row",
      "role": "invoice_row",
      "bbox": [100, 250, 1800, 280],
      "label": "DUPONT Jean - 120,00 € - À valider",
      "embedding": {
        "provider": "openclip_ViT-B-32",
        "vector_id": "data/embeddings/elements/screen_..._el_row_001.npy"
      },
      "tags": ["invoice", "pending"],
      "confidence": 0.94
    },
    {
      "element_id": "el_btn_open",
      "type": "button",
      "role": "open_invoice",
      "bbox": [1750, 250, 1850, 280],
      "label": "Ouvrir",
      "embedding": null,
      "tags": ["action", "primary"],
      "confidence": 0.98
    }
  ],
  
  "context": {
    "current_workflow_candidate": null,
    "workflow_step": null,
    "user_id": "dom",
    "tags": ["facturation"],
    "business_variables": {}
  },
  
  "metadata": {
    "processing_time_ms": 245,
    "ui_elements_count": 12,
    "text_regions_count": 45
  }
}
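
A small sanity check can make the four levels explicit when loading a ScreenState. The sketch below assumes the levels map onto the top-level keys `raw`, `perception`, `ui_elements` and `context`, as the sample suggests; the mapping table and the function name are hypothetical.

```python
from typing import Dict, List

# Assumed mapping between the four levels and the top-level JSON keys.
LEVEL_KEYS: Dict[str, str] = {
    "Level 1 (raw)": "raw",
    "Level 2 (perception)": "perception",
    "Level 3 (UI semantics)": "ui_elements",
    "Level 4 (business context)": "context",
}


def missing_levels(screen_state: dict) -> List[str]:
    """Return the names of the levels whose JSON section is absent."""
    return [level for level, key in LEVEL_KEYS.items() if key not in screen_state]


complete = {"raw": {}, "perception": {}, "ui_elements": [], "context": {}}
partial = {"raw": {}, "perception": {}}
```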

💡 Proposed Improvements

1. Adding regions of interest (ROI)

"perception": {
  "regions_of_interest": [
    {
      "roi_id": "roi_001",
      "bbox": [100, 200, 1800, 800],
      "type": "content_area",
      "importance": 0.9
    }
  ]
}

2. Change history

"change_detection": {
  "previous_state_id": "screen_2025-11-21T10-15-30.123Z",
  "similarity_score": 0.87,
  "changed_regions": [
    {"bbox": [1500, 900, 1600, 940], "change_type": "new_element"}
  ]
}

3. Quality metrics

"quality_metrics": {
  "image_sharpness": 0.92,
  "text_readability": 0.88,
  "ui_element_detection_confidence": 0.91
}

📊 Layer 2: UIElement Detection - Semantic Detection

Objective

Turn a screenshot into a list of usable semantic UI objects (buttons, fields, etc.).

Detection Pipeline

Screenshot
    ↓
┌─────────────────────────────────────┐
│ Step 1: Propose Regions of Interest │
│ - Heuristics (text areas)           │
│ - VLM (clickable areas)             │
│ - UI pattern detection              │
└─────────────────────────────────────┘
    ↓
┌─────────────────────────────────────┐
│ Step 2: Characterize Elements       │
│ - Image crop → image embedding      │
│ - OCR/VLM → text + text embedding   │
│ - Position + dimensions             │
└─────────────────────────────────────┘
    ↓
┌─────────────────────────────────────┐
│ Step 3: Classify Type + Role        │
│ - Type: button, input, checkbox...  │
│ - Role: primary_action, cancel...   │
│ - Stable ID (hash)                  │
└─────────────────────────────────────┘
    ↓
List of UIElements

UIElement Structure

{
  "element_id": "el_btn_valider_001",
  "type": "button",
  "role": "validate_invoice",
  "bbox": [1500, 900, 1600, 940],
  "center": [1550, 920],
  
  "label": "Valider la facture",
  "label_confidence": 0.96,
  
  "embeddings": {
    "image": {
      "provider": "openclip_ViT-B-32",
      "vector_id": "data/embeddings/elements/el_btn_valider_001_img.npy",
      "dimensions": 512
    },
    "text": {
      "provider": "openclip_ViT-B-32",
      "vector_id": "data/embeddings/elements/el_btn_valider_001_txt.npy",
      "dimensions": 512
    }
  },
  
  "visual_features": {
    "dominant_color": "#4CAF50",
    "has_icon": false,
    "shape": "rectangle",
    "size_category": "medium"
  },
  
  "tags": ["primary_action", "billing", "validation"],
  "confidence": 0.94,
  
  "metadata": {
    "detection_method": "qwen_vl",
    "detection_time_ms": 45
  }
}
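
Two derived fields in this structure lend themselves to a short sketch: `center` and the stable ID mentioned in step 3 of the pipeline. The code assumes `bbox` is `[x_min, y_min, x_max, y_max]`, which agrees with the sample (`[1500, 900, 1600, 940]` gives `[1550, 920]`). The hashing scheme is purely illustrative and is not the one that produced `el_btn_valider_001`.

```python
import hashlib
from typing import List


def bbox_center(bbox: List[int]) -> List[int]:
    """Center of a box given as [x_min, y_min, x_max, y_max] (assumed layout)."""
    x_min, y_min, x_max, y_max = bbox
    return [(x_min + x_max) // 2, (y_min + y_max) // 2]


def stable_element_id(el_type: str, role: str, label: str) -> str:
    """Hypothetical stable ID: a short hash of type, role and normalized label.

    Position is left out on purpose so the ID survives small layout shifts.
    """
    key = "|".join([el_type, role, label.strip().lower()])
    digest = hashlib.sha1(key.encode("utf-8")).hexdigest()[:8]
    return f"el_{el_type}_{digest}"


center = bbox_center([1500, 900, 1600, 940])
first_id = stable_element_id("button", "validate_invoice", "Valider la facture")
second_id = stable_element_id("button", "validate_invoice", "  valider la facture ")
```

Excluding the position from the hash is a design choice: two visually identical buttons on the same screen would then collide, so a real scheme may need an extra disambiguator such as the parent element.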

Supported Element Types

Type	Description	Typical roles
`button`	Clickable button	`primary_action`, `cancel`, `submit`
`text_input`	Input field	`search_field`, `form_input`
`checkbox`	Checkbox	`verification`, `selection`
`radio`	Radio button	`option_selection`
`dropdown`	Drop-down menu	`category_selector`
`tab`	Tab	`navigation`
`link`	Hyperlink	`navigation`, `external_link`
`icon`	Icon	`action_trigger`, `status_indicator`
`table_row`	Table row	`data_row`, `selectable_item`
`menu_item`	Menu item	`action`, `submenu`

💡 Proposed Improvements

1. Element hierarchy

{
  "element_id": "el_form_001",
  "type": "form",
  "children": [
    {"element_id": "el_input_001", "type": "text_input"},
    {"element_id": "el_btn_submit", "type": "button"}
  ]
}

2. Element states

{
  "state": {
    "enabled": true,
    "visible": true,
    "focused": false,
    "selected": false,
    "value": "DUPONT Jean"
  }
}

3. Relationships between elements

{
  "relationships": [
    {
      "type": "label_for",
      "target_element_id": "el_input_001"
    },
    {
      "type": "part_of_group",
      "group_id": "form_patient"
    }
  ]
}

📊 Layer 3: State Embedding - Multi-Modal Fusion

Objective

Create a unique "fingerprint" of the screen by fusing all modalities (image, text, UI, context).

State Embedding Components

state_emb = fusion(
    img_emb,      # CLIP embedding of the full screenshot
    text_emb,     # Embedding of the concatenated detected text
    title_emb,    # Embedding of the window title
    ui_emb,       # Mean of the UI element embeddings
    ctx_emb       # Encoded workflow/business context
)

Fusion Methods

Option A: Weighted Fusion (recommended to start with)

state_emb = normalize(
    0.5 * img_emb +      # Global visual (50%)
    0.3 * text_emb +     # Detected text (30%)
    0.1 * title_emb +    # Window title (10%)
    0.1 * ui_emb         # UI elements (10%)
)

Advantages:

Simple to implement
No training required
Weights can be tuned manually
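
The weighted fusion of Option A can be sketched in a few lines. This version uses plain Python lists instead of NumPy to stay dependency-free, and it L2-normalizes each component before weighting so that the weights reflect relative influence; that pre-normalization is an assumption added here, not part of the formula above. All names are illustrative.

```python
import math
from typing import Dict, List

Vector = List[float]

# Weights taken from Option A.
WEIGHTS: Dict[str, float] = {"image": 0.5, "text": 0.3, "title": 0.1, "ui": 0.1}


def l2_normalize(v: Vector) -> Vector:
    """Scale a vector to unit length; a zero vector is returned unchanged."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm > 0 else list(v)


def weighted_fusion(components: Dict[str, Vector],
                    weights: Dict[str, float] = WEIGHTS) -> Vector:
    """Weighted sum of the component embeddings, then L2-normalized."""
    dim = len(next(iter(components.values())))
    fused = [0.0] * dim
    for name, weight in weights.items():
        vec = components[name]
        if len(vec) != dim:
            raise ValueError(f"{name} has {len(vec)} dimensions, expected {dim}")
        # Assumption: normalize each input before applying its weight.
        for i, x in enumerate(l2_normalize(vec)):
            fused[i] += weight * x
    return l2_normalize(fused)


# Toy 2-D vectors standing in for 512-D embeddings.
state_emb = weighted_fusion({
    "image": [1.0, 0.0],
    "text": [0.0, 1.0],
    "title": [1.0, 0.0],
    "ui": [0.0, 2.0],
})
```

With these toy inputs the first axis collects the image and title weights (0.6) and the second the text and UI weights (0.4), so the fused vector leans toward the first axis.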

Option B: Concatenation + Projection

z = concat([
    normalize(img_emb),
    normalize(text_emb),
    normalize(title_emb),
    normalize(ui_emb),
    normalize(ctx_emb)
])

state_emb = projection_layer(z)  # MLP or PCA

Advantages:

Preserves all the information
Can be refined with training
More expressive
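
For comparison, here is a minimal sketch of Option B in which a fixed linear map stands in for the projection layer. In practice the projection would be a trained MLP or a fitted PCA; the matrix below is a hypothetical placeholder chosen only to make the example checkable.

```python
import math
from typing import Dict, List

Vector = List[float]
ORDER = ("image", "text", "title", "ui", "ctx")  # concatenation order


def l2_normalize(v: Vector) -> Vector:
    """Scale a vector to unit length; a zero vector is returned unchanged."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm > 0 else list(v)


def concat_and_project(components: Dict[str, Vector],
                       projection: List[Vector]) -> Vector:
    """Concatenate the normalized components, then apply a linear projection."""
    z: Vector = []
    for name in ORDER:
        z.extend(l2_normalize(components[name]))
    if any(len(row) != len(z) for row in projection):
        raise ValueError("each projection row must match the concatenated length")
    projected = [sum(w * x for w, x in zip(row, z)) for row in projection]
    return l2_normalize(projected)


# Toy example: five 2-D components (10-D concatenation) projected down to 2-D.
components = {name: [1.0, 0.0] for name in ORDER}
components["text"] = [0.0, 3.0]
# Hypothetical projection: average the first, then the second coordinate
# of every component.
projection = [[0.2, 0.0] * 5, [0.0, 0.2] * 5]
state_emb = concat_and_project(components, projection)
```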

JSON Structure

{
  "state_embedding": {
    "embedding_id": "state_emb_2025-11-21T10-15-32.123Z",
    "vector_id": "data/embeddings/states/state_2025-11-21T10-15-32.123Z.npy",
    "dimensions": 512,
    "fusion_method": "weighted",
    
    "components": {
      "image": {
        "weight": 0.5,
        "vector_id": "data/embeddings/screens/screen_..._img.npy"
      },
      "text": {
        "weight": 0.3,
        "vector_id": "data/embeddings/screens/screen_..._text.npy",
        "source_text": "Factures | Patient | Montant | Statut | ..."
      },
      "title": {
        "weight": 0.1,
        "vector_id": "data/embeddings/screens/screen_..._title.npy",
        "source_text": "Factures - Clinique Demo"
      },
      "ui_elements": {
        "weight": 0.1,
        "aggregation": "mean",
        "element_count": 12
      }
    },
    
    "metadata": {
      "computation_time_ms": 78,
      "provider": "openclip_ViT-B-32"
    }
  }
}

Uses of the State Embedding

Usage	Description	Typical threshold (cosine similarity)
Node Matching	Recognize which node the current screen belongs to	> 0.85
UI Change Detection	Detect significant changes	< 0.70
Workflow Similarity	Find similar workflows	> 0.75
Historical Search	Search for similar past states	> 0.80
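
Node matching against the 0.85 threshold can be sketched as a nearest-prototype search using cosine similarity. The function names and the dictionary of prototypes are assumptions for illustration. The comparison uses `>=` to follow a `min_cosine_similarity` reading of the threshold; switch it to a strict `>` if that is the intended rule.

```python
import math
from typing import Dict, List, Optional, Tuple

Vector = List[float]


def cosine_similarity(a: Vector, b: Vector) -> float:
    """Cosine of the angle between two vectors; 0.0 if either has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def match_node(state_emb: Vector,
               prototypes: Dict[str, Vector],
               min_similarity: float = 0.85) -> Tuple[Optional[str], float]:
    """Return (node_id, similarity) for the closest prototype, or (None, best)."""
    best_id: Optional[str] = None
    best_sim = -1.0
    for node_id, prototype in prototypes.items():
        sim = cosine_similarity(state_emb, prototype)
        if sim > best_sim:
            best_id, best_sim = node_id, sim
    if best_sim >= min_similarity:
        return best_id, best_sim
    return None, best_sim


# Toy 2-D prototypes standing in for 512-D node prototypes.
prototypes = {"N1_liste_factures": [1.0, 0.0], "N2_detail_facture": [0.0, 1.0]}
matched, matched_sim = match_node([0.9, 0.1], prototypes)
unmatched, unmatched_sim = match_node([1.0, 1.0], prototypes)
```

A linear scan is fine for a handful of nodes; an index such as FAISS only becomes worthwhile with many prototypes.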

💡 Proposed Improvements

1. Adaptive contextual embeddings

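# Adjust the weights according to the context
if workflow_type == "form_filling":
    weights = {"text": 0.5, "ui": 0.3, "image": 0.2}
elif workflow_type == "visual_inspection":
    weights = {"image": 0.7, "ui": 0.2, "text": 0.1}
else:
    # Fall back to the Option A defaults so `weights` is always defined
    weights = {"image": 0.5, "text": 0.3, "title": 0.1, "ui": 0.1}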

2. Temporal embeddings

{
  "temporal_context": {
    "previous_states": [
      {"state_id": "...", "time_delta_s": -2.5, "similarity": 0.92}
    ],
    "trajectory_embedding": "data/embeddings/trajectories/traj_001.npy"
  }
}

3. Quality metrics

{
  "quality_metrics": {
    "component_alignment": 0.89,
    "information_preservation": 0.94,
    "discriminative_power": 0.87
  }
}

📊 Layer 4: Workflow Graph - Graph Modeling

Objective

Model workflows as explicit graphs (Nodes + Edges) with a formalized learning progression.

Key Concepts

WorkflowNode = Screen-state template
WorkflowEdge = Transition (action) between two nodes
Workflow = Complete graph with learning state

WorkflowNode Structure

{
  "node_id": "N1_liste_factures",
  "label": "Liste des factures",
  "description": "Écran principal listant les factures avec statut à valider / validée.",
  
  "screen_template": {
    "window": {
      "app_name_any_of": ["logiciel_facturation"],
      "title_contains_any_of": ["Factures", "Liste des factures"]
    },
    
    "required_text_any": [
      "Factures",
      "Patient",
      "Montant",
      "Statut"
    ],
    
    "required_ui_elements": [
      {
        "role": "invoice_row",
        "type_any_of": ["table_row", "list_item"],
        "min_count": 1
      },
      {
        "role": "open_invoice",
        "type_any_of": ["button"],
        "label_contains_any_of": ["Ouvrir", "Détail"]
      }
    ],
    
    "embedding_prototype": {
      "provider": "openclip_ViT-B-32",
      "vector_id": "data/embeddings/workflows/WF_validation_facture/N1_prototype.npy",
      "min_cosine_similarity": 0.85,
      "sample_count": 15
    },
    
    "optional_elements": [
      {
        "role": "search_field",
        "type": "text_input"
      }
    ]
  },
  
  "metadata": {
    "created_at": "2025-11-21T10:30:00Z",
    "updated_at": "2025-11-21T10:30:00Z",
    "observation_count": 15,
    "tags": ["facturation", "liste"]
  }
}
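
The rule-based part of a `screen_template` can be checked before any embedding comparison. The sketch below covers the `window` block and `required_text_any`. It assumes that `required_text_any` means "at least one of these strings was detected" and that the window is given in the ScreenState form (`app_name`, `window_title`); both are readings of the field names, not confirmed semantics.

```python
from typing import Iterable


def window_matches(template_window: dict, window: dict) -> bool:
    """Check the `window` part of a screen_template against a ScreenState window."""
    apps = template_window.get("app_name_any_of", [])
    titles = template_window.get("title_contains_any_of", [])
    app_ok = not apps or window.get("app_name") in apps
    title = window.get("window_title", "")
    title_ok = not titles or any(fragment in title for fragment in titles)
    return app_ok and title_ok


def text_matches(required_text_any: Iterable[str],
                 detected_text: Iterable[str]) -> bool:
    """Assumed semantics: at least one required string is in the detected text."""
    required = list(required_text_any)
    detected = set(detected_text)
    return not required or any(text in detected for text in required)


template_window = {
    "app_name_any_of": ["logiciel_facturation"],
    "title_contains_any_of": ["Factures", "Liste des factures"],
}
invoice_list = {"app_name": "logiciel_facturation",
                "window_title": "Factures - Clinique Demo"}
other_app = {"app_name": "notepad", "window_title": "Factures.txt"}
```

Running these cheap checks first lets the matcher skip the embedding comparison for screens that obviously belong to another application.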

WorkflowEdge Structure

{
  "edge_id": "E1_ouvrir_facture",
  "from_node": "N1_liste_factures",
  "to_node": "N2_detail_facture",
  
  "action": {
    "type": "mouse_click",
    "strategy": "row_then_button",
    
    "target": {
      "role": "invoice_row",
      "selection_policy": "first_pending",
      "fallback_strategy": "visual_similarity"
    },
    
    "secondary_target": {
      "role": "open_invoice",
      "optional": true
    },
    
    "parameters": {
      "click_offset": [0, 0],
      "double_click": false,
      "wait_after_ms": 500
    }
  },
  
  "constraints": {
    "max_delay_seconds": 5,
    "pre_conditions": [
      "element:invoice_row_visible",
      "element:invoice_row_status=pending"
    ],
    "post_conditions": [
      "window_title_changed",
      "new_ui_elements_detected"
    ]
  },
  
  "post_conditions": {
    "expected_node": "N2_detail_facture",
    "min_similarity": 0.85,
    "timeout_seconds": 5
  },
  
  "stats": {
    "manual_executions": 12,
    "assist_executions": 0,
    "auto_executions": 0,
    "success_count": 12,
    "failure_count": 0,
    "avg_execution_time_ms": 1250,
    "last_executed_at": "2025-11-21T10:30:00Z"
  },
  
  "metadata": {
    "created_at": "2025-11-21T10:30:00Z",
    "updated_at": "2025-11-21T10:30:00Z"
  }
}

Complete Workflow Structure

{
  "workflow_id": "WF_validation_facture",
  "name": "Validation facture consultation",
  "description": "Ouvrir une facture en attente, la contrôler et la valider.",
  "version": 1,
  
  "learning_state": "OBSERVATION",
  
  "created_at": "2025-11-21T10:45:00Z",
  "updated_at": "2025-11-21T10:45:00Z",
  
  "entry_nodes": ["N1_liste_factures"],
  "end_nodes": ["N5_liste_factures_maj"],
  
  "nodes": [
    {
      "node_id": "N1_liste_factures",
      "label": "Liste des factures",
      "description": "Écran principal listant les factures.",
      "screen_template": { /* ... */ }
    },
    {
      "node_id": "N2_detail_facture",
      "label": "Détail facture",
      "description": "Écran détaillé d'une facture unique.",
      "screen_template": { /* ... */ }
    },
    {
      "node_id": "N3_controle_facture",
      "label": "Contrôle facture",
      "description": "Écran de contrôle / vérification avant validation.",
      "screen_template": { /* ... */ }
    },
    {
      "node_id": "N4_popup_confirmation",
      "label": "Popup confirmation",
      "description": "Fenêtre modale de confirmation définitive.",
      "screen_template": { /* ... */ }
    },
    {
      "node_id": "N5_liste_factures_maj",
      "label": "Liste factures mise à jour",
      "description": "Retour à la liste, facture marquée 'Validée'.",
      "screen_template": {
        "similar_to_node": "N1_liste_factures",
        "additional_constraints": {
          "must_have_row_with_status": "Validée"
        }
      }
    }
  ],
  
  "edges": [
    {
      "edge_id": "E1_ouvrir_facture",
      "from_node": "N1_liste_factures",
      "to_node": "N2_detail_facture",
      "action": { /* ... */ }
    },
    {
      "edge_id": "E2_valider_depuis_detail",
      "from_node": "N2_detail_facture",
      "to_node_candidates": ["N3_controle_facture", "N4_popup_confirmation"],
      "action": {
        "type": "mouse_click",
        "target": {
          "role": "validate_button"
        }
      },
      "branching": {
        "type": "conditional",
        "detection_method": "screen_similarity",
        "learned_probability": {
          "N3_controle_facture": 0.7,
          "N4_popup_confirmation": 0.3
        }
      }
    },
    {
      "edge_id": "E3_confirmer_controle",
      "from_node": "N3_controle_facture",
      "to_node": "N4_popup_confirmation",
      "action": {
        "type": "compound",
        "steps": [
          {
            "type": "mouse_click",
            "target": {
              "role": "checkbox_verification"
            },
            "repeat": "all"
          },
          {
            "type": "mouse_click",
            "target": {
              "role": "confirm_validation"
            }
          }
        ]
      }
    },
    {
      "edge_id": "E4_confirmer_definitivement",
      "from_node": "N4_popup_confirmation",
      "to_node": "N5_liste_factures_maj",
      "action": {
        "type": "mouse_click",
        "target": {
          "role": "confirm_yes"
        }
      }
    }
  ],
  
  "safety_rules": {
    "forbidden_text_clicks": [
      "Supprimer",
      "Annuler la facture",
      "Effacer"
    ],
    "forbidden_roles": [
      "delete_action",
      "dangerous_action"
    ],
    "max_amount_without_manual_check": 1000.0,
    "require_confirmation_for": [
      "delete",
      "irreversible_action"
    ]
  },
  
  "stats": {
    "observed_runs": 15,
    "assist_runs": 0,
    "auto_candidate_runs": 0,
    "auto_confirmed_runs": 0,
    "success_rate_overall": 1.0,
    "avg_duration_seconds": 45.2,
    "total_executions": 15
  },
  
  "learning": {
    "state": "OBSERVATION",
    
    "thresholds": {
      "min_observed_runs_for_assist": 5,
      "min_assist_runs_for_auto_candidate": 10,
      "min_assist_success_rate_for_auto_candidate": 0.90,
      "min_auto_candidate_runs_for_auto_confirmed": 20,
      "min_auto_candidate_success_rate_for_auto_confirmed": 0.95
    },
    
    "progression": {
      "current_phase": "OBSERVATION",
      "progress_percent": 100.0,
      "next_phase": "COACHING",
      "requirements_met": true,
      "user_approval_required": true
    }
  },
  
  "metadata": {
    "created_by": "system",
    "customer": "Clinique Demo",
    "application": "logiciel_facturation",
    "tags": ["facturation", "T2A", "validation"]
  }
}

Learning States

┌──────────────────────────────────────────────────────────────┐
│ OBSERVATION (Shadow)                                         │
│ - Records ScreenStates + actions                             │
│ - Detects repeated sequences                                 │
│ - Builds the graph                                           │
│ Criteria: ≥5 similar executions                              │
└──────────────────────────────────────────────────────────────┘
                            ↓
┌──────────────────────────────────────────────────────────────┐
│ COACHING (Assist)                                            │
│ - Recognizes the start of the workflow                       │
│ - Suggests upcoming steps in advance                         │
│ - User executes, system observes                             │
│ Criteria: ≥10 assisted executions, success >90%              │
└──────────────────────────────────────────────────────────────┘
                            ↓
┌──────────────────────────────────────────────────────────────┐
│ AUTO_CANDIDATE (supervised semi-automatic)                   │
│ - Executes automatically                                     │
│ - Asks for confirmation at each step                         │
│ - Pauses on an unexpected screen                             │
│ Criteria: ≥20 executions, success >95%, user validation      │
└──────────────────────────────────────────────────────────────┘
                            ↓
┌──────────────────────────────────────────────────────────────┐
│ AUTO_CONFIRMED (Autopilot)                                   │
│ - Executes without asking                                    │
│ - Whitelist enabled                                          │
│ - Demoted if the UI changes or confidence drops              │
│ Criteria: explicit user validation                           │
└──────────────────────────────────────────────────────────────┘
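
The promotion logic can be sketched as a small state machine driven by the `thresholds` block of the workflow JSON. Two things here are assumptions: the per-phase success rates (`assist_success_rate`, `auto_candidate_success_rate`) are hypothetical stats fields, and user approval is required for every promotion as a conservative default. The final state is spelled AUTO_CONFIRMED, following the `auto_confirmed` threshold keys. Demotion is not covered.

```python
from typing import Dict

THRESHOLDS: Dict[str, float] = {
    "min_observed_runs_for_assist": 5,
    "min_assist_runs_for_auto_candidate": 10,
    "min_assist_success_rate_for_auto_candidate": 0.90,
    "min_auto_candidate_runs_for_auto_confirmed": 20,
    "min_auto_candidate_success_rate_for_auto_confirmed": 0.95,
}


def next_learning_state(state: str, stats: Dict[str, float],
                        thresholds: Dict[str, float] = THRESHOLDS,
                        user_approved: bool = False) -> str:
    """Return the state after evaluating promotion criteria (no demotion here)."""
    if not user_approved:  # assumption: every promotion needs user approval
        return state
    t = thresholds
    if state == "OBSERVATION":
        ready = stats.get("observed_runs", 0) >= t["min_observed_runs_for_assist"]
        return "COACHING" if ready else state
    if state == "COACHING":
        ready = (
            stats.get("assist_runs", 0) >= t["min_assist_runs_for_auto_candidate"]
            and stats.get("assist_success_rate", 0.0)
            >= t["min_assist_success_rate_for_auto_candidate"]
        )
        return "AUTO_CANDIDATE" if ready else state
    if state == "AUTO_CANDIDATE":
        ready = (
            stats.get("auto_candidate_runs", 0)
            >= t["min_auto_candidate_runs_for_auto_confirmed"]
            and stats.get("auto_candidate_success_rate", 0.0)
            >= t["min_auto_candidate_success_rate_for_auto_confirmed"]
        )
        return "AUTO_CONFIRMED" if ready else state
    return state  # AUTO_CONFIRMED or an unknown state: nothing to promote


observed = {"observed_runs": 15}
coached = {"assist_runs": 15, "assist_success_rate": 0.96}
```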

💡 Proposed Improvements

1. Graphs with loops and conditions

{
  "edge_id": "E_loop_search",
  "from_node": "N_search_results",
  "to_node": "N_search_results",
  "condition": {
    "type": "while",
    "expression": "not found_target_invoice",
    "max_iterations": 10
  }
}

2. Reusable sub-workflows

{
  "workflow_id": "WF_validation_facture",
  "sub_workflows": [
    {
      "sub_workflow_id": "SUB_login",
      "entry_edge": "E0_start",
      "exit_edge": "E1_after_login"
    }
  ]
}

3. Per-node confidence metrics

{
  "node_id": "N2_detail_facture",
  "confidence_metrics": {
    "recognition_accuracy": 0.96,
    "false_positive_rate": 0.02,
    "avg_matching_time_ms": 45
  }
}

🔄 Complete Processing Pipeline

From RawSession to Learned Workflow

1. CAPTURE
   RawSession recorded
   ↓
2. ANALYSIS
   For each screenshot_id:
     - Create ScreenState (4 levels)
     - Detect UIElements
     - Compute state_embedding
   ↓
3. PATTERN DETECTION
   Analyze the event sequence:
     - Group by window
     - Detect repetitions
     - Identify recurring transitions
   ↓
4. GRAPH CONSTRUCTION
   Create Workflow:
     - Nodes = groups of similar ScreenStates
     - Edges = Actions between nodes
     - Stats = Execution counters
   ↓
5. LEARNING
   Update learning_state:
     - OBSERVATION → COACHING (if criteria are met)
     - Compute prototype embeddings
     - Refine similarity thresholds
   ↓
6. EXECUTION
   Replay workflow:
     - Match state_emb → node
     - Find UIElement by role
     - Execute action
     - Verify post-conditions
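
Steps 3 and 4 can be illustrated with a very small sketch: once each ScreenState has been assigned a node ID, recurring transitions are simply the consecutive pairs that show up often enough. The function names and the `min_count` parameter are assumptions; real pattern detection would also have to group by window and tolerate noise.

```python
from collections import Counter
from typing import Dict, List, Tuple

Transition = Tuple[str, str]


def count_transitions(node_sequence: List[str]) -> Counter:
    """Count each consecutive (from_node, to_node) pair in the observed sequence."""
    return Counter(zip(node_sequence, node_sequence[1:]))


def recurring_transitions(node_sequence: List[str],
                          min_count: int = 2) -> Dict[Transition, int]:
    """Keep only transitions seen at least `min_count` times (candidate edges)."""
    counts = count_transitions(node_sequence)
    return {pair: n for pair, n in counts.items() if n >= min_count}


# Three repetitions of the same five-screen pattern.
sequence = ["N1", "N2", "N3", "N4", "N5"] * 3
edges = recurring_transitions(sequence, min_count=3)
```

With `min_count=3` the four forward transitions are kept as candidate edges, while the N5 → N1 wrap-around between repetitions (seen only twice) is filtered out.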

Concrete Example: "T2A Invoice Validation"

Step 1: Capture (RawSession)

Session: 15 minutes
Events: 45 events
Screenshots: 12 captures

Step 2: Analysis (ScreenStates)

12 ScreenStates created
Average of 8 UIElements per state
State embeddings computed

Step 3: Detection

Pattern detected: N1 → N2 → N3 → N4 → N5
Repetitions: 3
Average similarity: 0.92

Step 4: Construction

Workflow created: WF_validation_facture
Nodes: 5 (N1 to N5)
Edges: 4 (E1 to E4)
Learning state: OBSERVATION

Step 5: Learning

After 5 executions:
  → Transition to COACHING
  → Prototype embeddings computed
  → Thresholds refined

After 15 assisted executions:
  → Transition to AUTO_CANDIDATE
  → Confidence: 96%

Step 6: Execution

Detection: "I am in N1"
Action: Click on role=invoice_row
Verification: Transition to N2 (similarity 0.94)
Success: ✓

📐 Mapping to Existing Code

Current Files vs Target Architecture

Current Component	File	Target Architecture
Event Capture	`event_capture.py`	→ RawSession
Screen Capture	`enriched_screen_capture.py`	→ ScreenState (raw)
UI Detection	`ui_element_detector.py`	→ UIElement Detection
Embeddings	`multimodal_embedding_manager.py`	→ State Embedding
Workflow Detection	`workflow_detector.py`	→ Workflow Graph
Matching	`enhanced_workflow_matcher.py`	→ Node Matching

Required Changes

1. RawSession

✅ Already partially implemented (SessionManager)
🔧 To extend: environment and performance metadata

2. ScreenState

✅ Foundations exist (EnrichedScreenState)
🔧 To structure: 4 explicit levels
🔧 To add: business context

3. UIElement

✅ Models exist (UIElement, ui_element_models.py)
🔧 To extend: semantic roles, dual embeddings

4. State Embedding

✅ Embeddings exist (MultiModalEmbeddingManager)
🔧 To implement: multi-modal fusion

5. Workflow Graph

🔧 To create: Nodes/Edges structure
🔧 To migrate: current workflows (lists of steps)
🔧 To implement: learning states

🚀 Progressive Migration Plan

Phase 1: Foundations (Weeks 1-2)

Objective: Data structures and JSON formats

Define complete JSON schemas
Create Python classes (ScreenState, UIElement, WorkflowNode, etc.)
Implement serialization/deserialization
Unit tests on the structures

Deliverables:

geniusia2/core/models/screen_state.py
geniusia2/core/models/workflow_graph.py
JSON schemas in docs/schemas/

Phase 2: UIElement Detection (Weeks 3-4)

Objective: Robust detection pipeline

Implement region-of-interest detection
Integrate a VLM for clickable areas
Compute dual embeddings (image + text)
Classify types and roles

Deliverables:

geniusia2/core/ui_element_pipeline.py
Tests with real screenshots

Phase 3: State Embedding (Weeks 5-6)

Objective: Multi-modal fusion

Implement weighted fusion
Compute title/text embeddings
Aggregate UI embeddings
Benchmark quality (similarity, discrimination)

Deliverables:

geniusia2/core/state_embedding_fusion.py
Quality metrics

Phase 4: Workflow Graph (Weeks 7-9)

Objective: Graph modeling

Create WorkflowNode with templates
Create WorkflowEdge with actions
Implement node matching (state_emb → node)
Migrate existing workflows

Deliverables:

geniusia2/core/workflow_graph_builder.py
Migration script

Phase 5: Learning States (Weeks 10-12)

Objective: Learning progression

Implement the state machine
Compute progression metrics
Integrate into the GUI (indicators)
End-to-end tests

Deliverables:

geniusia2/core/learning_state_manager.py
Updated GUI

Phase 6: Production (Weeks 13-14)

Objective: Deployment and monitoring

User testing
Performance optimizations
User documentation
Monitoring and metrics

📊 Success Metrics

Detection Quality

Metric	Target	Measurement
UIElement precision	>90%	TP / (TP + FP)
UIElement recall	>85%	TP / (TP + FN)
Node Matching accuracy	>95%	Nodes correctly identified
Processing time	<500ms	Per ScreenState
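
The precision and recall formulas in the table translate directly into code. The sketch below also checks them against the 90% and 85% targets; the function names and the example counts are made up for illustration.

```python
def precision(tp: int, fp: int) -> float:
    """TP / (TP + FP); defined as 0.0 when nothing was detected."""
    return tp / (tp + fp) if (tp + fp) else 0.0


def recall(tp: int, fn: int) -> float:
    """TP / (TP + FN); defined as 0.0 when there was nothing to detect."""
    return tp / (tp + fn) if (tp + fn) else 0.0


def meets_detection_targets(tp: int, fp: int, fn: int,
                            min_precision: float = 0.90,
                            min_recall: float = 0.85) -> bool:
    """True when both UIElement targets from the table are exceeded."""
    return precision(tp, fp) > min_precision and recall(tp, fn) > min_recall


# Illustrative counts: 92 correct detections, 8 spurious, 10 missed.
ok = meets_detection_targets(tp=92, fp=8, fn=10)
```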

Learning Quality

Metric	Target	Measurement
Workflows detected	>80%	Real patterns detected
False positives	<5%	Incorrect workflows
Learning time	<10 runs	To move to COACHING
AUTO success rate	>95%	Successful executions

System Performance

Metric	Target	Measurement
Capture latency	<50ms	Time from event → screenshot
Analysis latency	<300ms	Screenshot → ScreenState
Matching latency	<100ms	State → Node
Memory	<2GB	RAM used

🔒 Security Considerations

Data Protection

1. Capture encryption

{
  "raw": {
    "screenshot_path": "encrypted://data/screens/...",
    "encryption_method": "AES-256-GCM",
    "key_id": "key_2025_11"
  }
}

2. Anonymization of sensitive data

{
  "privacy": {
    "pii_detected": true,
    "anonymized_fields": ["patient_name", "ssn"],
    "anonymization_method": "hash_sha256"
  }
}
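
A `hash_sha256` anonymization step could look like the sketch below, which replaces the listed fields with a SHA-256 hex digest. The salt is an addition not present in the JSON above: unsalted hashes of low-entropy values such as names or social security numbers are easy to reverse by enumeration. Field and function names are hypothetical.

```python
import hashlib
from typing import Dict, Iterable


def anonymize_fields(record: Dict[str, object], fields: Iterable[str],
                     salt: str) -> Dict[str, object]:
    """Return a copy of `record` with the given fields replaced by salted SHA-256."""
    anonymized = dict(record)
    for field in fields:
        value = anonymized.get(field)
        if value is None:
            continue  # nothing to hide for missing or empty fields
        payload = (salt + str(value)).encode("utf-8")
        anonymized[field] = hashlib.sha256(payload).hexdigest()
    return anonymized


record = {"patient_name": "DUPONT Jean", "amount": "120,00 €"}
masked = anonymize_fields(record, ["patient_name", "ssn"], salt="demo-salt")
```

Because the same salt always yields the same digest, records for one patient can still be linked after anonymization; whether that is acceptable depends on the GDPR analysis for the deployment.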

Safety Rules

1. Action validation

Whitelist of authorized applications
Blacklist of dangerous roles
Confirmation for irreversible actions

2. Automatic rollback

Save state before action
Failure detection
Automatic restore

3. Audit trail

Immutable logs of all actions
Full traceability
GDPR compliance
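
Action validation against the workflow's `safety_rules` block can be sketched as a single gate called before each click. The three outcomes (`block`, `manual_check`, `allow`) and the case-insensitive substring match on labels are assumptions made for this example; the rule values are copied from the workflow JSON.

```python
from typing import Optional

SAFETY_RULES = {
    "forbidden_text_clicks": ["Supprimer", "Annuler la facture", "Effacer"],
    "forbidden_roles": ["delete_action", "dangerous_action"],
    "max_amount_without_manual_check": 1000.0,
}


def check_action(label: str, role: str, amount: Optional[float],
                 rules: dict = SAFETY_RULES) -> str:
    """Return 'block', 'manual_check' or 'allow' for a candidate click."""
    if role in rules["forbidden_roles"]:
        return "block"
    lowered = label.lower()
    # Assumed matching rule: case-insensitive substring on the element label.
    if any(text.lower() in lowered for text in rules["forbidden_text_clicks"]):
        return "block"
    if amount is not None and amount > rules["max_amount_without_manual_check"]:
        return "manual_check"
    return "allow"


small_invoice = check_action("Valider la facture", "validate_invoice", 120.0)
large_invoice = check_action("Valider la facture", "validate_invoice", 1500.0)
delete_click = check_action("Supprimer", "delete_action", None)
```

Checking the forbidden rules before the amount means a dangerous action is never downgraded to a mere manual check.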

📚 References and Resources

Vision Models

CLIP: Contrastive Language-Image Pre-training
Pix2Struct: Screenshot parsing as pretraining
Qwen-VL: Vision-Language Model
OWL-v2: Open-vocabulary object detection

Embedding Techniques

Cosine Similarity: Similarity measure
FAISS: Fast indexing and search
PCA: Dimensionality reduction
t-SNE: Visualization

Graph Architectures

State Machines: Finite-state machines
Petri Nets: Workflow modeling
DAG: Directed Acyclic Graphs

📝 Conclusion

This architecture provides a solid foundation for turning RPA Vision V2 from a simple capture/replay system into a cognitive learning system able to:

✅ Understand interfaces at a semantic level
✅ Learn workflows progressively
✅ Adapt to UI changes
✅ Execute robustly and safely

The migration can be done incrementally without breaking existing functionality, by adding the abstraction layers one at a time.

Document created: 22 November 2024
Author: Collaborative architecture
Version: 1.0
Status: ✅ Complete Architectural Reference
