# Design Document - Workflow Graph Implementation

## Overview

This document describes the detailed design of the Workflow Graph architecture for RPA Vision V2. The system transforms raw screen captures into learned semantic workflows through 5 layers of progressive abstraction.

**Philosophy**: "Observe → Understand → Learn → Act"

The system does NOT work with click coordinates, but with a semantic understanding of interfaces: element types, roles, visual and textual context.

**5-Layer Architecture**:

```
RawSession (Layer 0) → ScreenState (Layer 1) → UIElement Detection (Layer 2)
→ State Embedding (Layer 3) → Workflow Graph (Layer 4)
```

**Reference**: This design builds on `docs/reference/ARCHITECTURE_VISION_COMPLETE.md` and `docs/reference/ARCHITECTURE_ENRICHISSEMENTS.md`.
## Architecture

### Global Architecture

```
┌─────────────────────────────────────────────────────────────┐
│                   Layer 4: Workflow Graph                   │
│        WorkflowNode + WorkflowEdge + Learning States        │
│  ┌──────────────┐  ┌──────────────┐  ┌──────────────────┐   │
│  │ Graph Builder│  │ Node Matcher │  │ Action Executor  │   │
│  └──────────────┘  └──────────────┘  └──────────────────┘   │
└─────────────────────────────────────────────────────────────┘
                              ↕
┌─────────────────────────────────────────────────────────────┐
│                  Layer 3: State Embedding                   │
│      Multi-Modal Fusion (Image + Text + UI + Context)       │
│  ┌──────────────┐  ┌──────────────┐  ┌──────────────────┐   │
│  │ Fusion Engine│  │ FAISS Index  │  │ Similarity Comp  │   │
│  └──────────────┘  └──────────────┘  └──────────────────┘   │
└─────────────────────────────────────────────────────────────┘
                              ↕
┌─────────────────────────────────────────────────────────────┐
│                Layer 2: UIElement Detection                 │
│        Semantic Detection (Type + Role + Embeddings)        │
│  ┌──────────────┐  ┌──────────────┐  ┌──────────────────┐   │
│  │ VLM Detector │  │ Classifier   │  │ Embedding Gen    │   │
│  └──────────────┘  └──────────────┘  └──────────────────┘   │
└─────────────────────────────────────────────────────────────┘
                              ↕
┌─────────────────────────────────────────────────────────────┐
│                    Layer 1: ScreenState                     │
│               Multi-Modal Analysis (4 Levels)               │
│  ┌──────────────┐  ┌──────────────┐  ┌──────────────────┐   │
│  │ Raw Capture  │  │ Perception   │  │ Semantic + Ctx   │   │
│  └──────────────┘  └──────────────┘  └──────────────────┘   │
└─────────────────────────────────────────────────────────────┘
                              ↕
┌─────────────────────────────────────────────────────────────┐
│                    Layer 0: RawSession                      │
│              Raw Capture (Events + Screenshots)             │
│  ┌──────────────┐  ┌──────────────┐  ┌──────────────────┐   │
│  │ Event Logger │  │ Screenshot   │  │ Session Manager  │   │
│  └──────────────┘  └──────────────┘  └──────────────────┘   │
└─────────────────────────────────────────────────────────────┘
```
### Directory Structure

```
geniusia2/
├── core/
│   ├── models/
│   │   ├── raw_session.py        # Layer 0: RawSession
│   │   ├── screen_state.py       # Layer 1: ScreenState
│   │   ├── ui_element.py         # Layer 2: UIElement
│   │   ├── state_embedding.py    # Layer 3: State Embedding
│   │   └── workflow_graph.py     # Layer 4: Workflow Graph
│   ├── capture/
│   │   ├── event_capture.py      # Event capture
│   │   └── screenshot_capture.py # Screenshot capture
│   ├── detection/
│   │   ├── ui_detector.py        # UI detection with VLM
│   │   └── text_detector.py      # Text detection
│   ├── embedding/
│   │   ├── fusion_engine.py      # Multi-modal fusion
│   │   ├── faiss_manager.py      # FAISS index management
│   │   └── similarity.py         # Similarity computations
│   ├── graph/
│   │   ├── graph_builder.py      # Graph construction
│   │   ├── node_matcher.py       # Node matching
│   │   ├── action_executor.py    # Action execution
│   │   └── learning_manager.py   # Learning state management
│   └── persistence/
│       ├── json_serializer.py    # JSON serialization
│       └── storage_manager.py    # Storage management
└── data/
    ├── sessions/                 # RawSessions
    ├── screen_states/            # ScreenStates
    ├── embeddings/               # Embeddings (.npy)
    ├── faiss_index/              # FAISS index
    └── workflows/                # Workflow Graphs
```
## Components and Interfaces

### Layer 0: RawSession

**Responsibility**: Faithful capture of all user events, with screenshots.

**Main Class**: `RawSession`

```python
@dataclass
class RawSession:
    session_id: str
    agent_version: str
    environment: Dict[str, Any]
    user: Dict[str, str]
    context: Dict[str, str]
    started_at: datetime
    # Fields with defaults must come after the required fields
    ended_at: Optional[datetime] = None
    schema_version: str = "rawsession_v1"
    events: List[Event] = field(default_factory=list)
    screenshots: List[Screenshot] = field(default_factory=list)

    def add_event(self, event: Event) -> None:
        """Append an event to the session"""

    def add_screenshot(self, screenshot: Screenshot) -> None:
        """Append a screenshot to the session"""

    def to_json(self) -> Dict[str, Any]:
        """Serialize to JSON"""

    @classmethod
    def from_json(cls, data: Dict[str, Any]) -> 'RawSession':
        """Deserialize from JSON"""
```

**Event Class**:

```python
@dataclass
class Event:
    t: float                   # Timestamp relative to session start, in seconds
    type: str                  # mouse_click, key_press, etc.
    window: WindowContext
    screenshot_id: Optional[str]
    # Type-specific fields
    data: Dict[str, Any]
```
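As a sketch of how the relative timestamp `t` can be derived from wall-clock capture times (a minimal stand-alone version of `Event`; the `make_event` helper is illustrative, not part of the codebase):

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Any, Dict, Optional

@dataclass
class Event:
    t: float                            # seconds elapsed since session start
    type: str                           # mouse_click, key_press, ...
    data: Dict[str, Any] = field(default_factory=dict)
    screenshot_id: Optional[str] = None

def make_event(session_start: datetime, now: datetime,
               event_type: str, **data: Any) -> Event:
    """Build an Event whose `t` is relative to the session start."""
    return Event(t=(now - session_start).total_seconds(), type=event_type, data=data)

start = datetime(2025, 11, 22, 10, 15, 0, tzinfo=timezone.utc)
click = make_event(start,
                   datetime(2025, 11, 22, 10, 15, 0, 523000, tzinfo=timezone.utc),
                   "mouse_click", button="left", pos=[800, 400])
print(round(click.t, 3))  # 0.523
```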

### Layer 1: ScreenState

**Responsibility**: Structured representation of a screen at 4 levels of abstraction.

**Main Class**: `ScreenState`

```python
@dataclass
class ScreenState:
    screen_state_id: str
    timestamp: datetime
    session_id: str
    window: WindowContext

    # Level 1: Raw
    raw: RawLevel

    # Level 2: Perception
    perception: PerceptionLevel

    # Level 3: UI Semantics
    ui_elements: List[UIElement]

    # Level 4: Business Context
    context: ContextLevel

    metadata: Dict[str, Any]

    def to_json(self) -> Dict[str, Any]:
        """Serialize to JSON"""

    @classmethod
    def from_json(cls, data: Dict[str, Any]) -> 'ScreenState':
        """Deserialize from JSON"""
```

**Levels**:

```python
@dataclass
class RawLevel:
    screenshot_path: str
    capture_method: str
    file_size_bytes: int

@dataclass
class PerceptionLevel:
    embedding: EmbeddingRef
    detected_text: List[str]
    text_detection_method: str
    confidence_avg: float

@dataclass
class ContextLevel:
    current_workflow_candidate: Optional[str]
    workflow_step: Optional[int]
    user_id: str
    tags: List[str]
    business_variables: Dict[str, Any]
```
### Layer 2: UIElement Detection

**Responsibility**: Semantic detection of UI elements with types, roles, and embeddings.

**Main Class**: `UIElement`

```python
@dataclass
class UIElement:
    element_id: str
    type: str                  # button, text_input, checkbox, etc.
    role: str                  # primary_action, cancel, form_input, etc.
    bbox: Tuple[int, int, int, int]
    center: Tuple[int, int]

    label: str
    label_confidence: float

    embeddings: UIElementEmbeddings
    visual_features: VisualFeatures

    tags: List[str]
    confidence: float

    metadata: Dict[str, Any]

    def to_json(self) -> Dict[str, Any]:
        """Serialize to JSON"""
```

**Dual Embeddings**:

```python
@dataclass
class UIElementEmbeddings:
    image: EmbeddingRef        # Embedding of the cropped image
    text: EmbeddingRef         # Embedding of the detected text
```

**UI Detector**:

```python
class UIDetector:
    def __init__(self, vlm_model: str, clip_model: str):
        self.vlm = VLMClient(vlm_model)
        self.clip = CLIPEmbedder(clip_model)

    def detect_elements(self, screenshot: np.ndarray,
                        window_context: WindowContext) -> List[UIElement]:
        """Detect all UI elements in a screenshot"""
        # 1. Propose regions of interest with the VLM
        # 2. Characterize each element (crop, OCR, embeddings)
        # 3. Classify type and role
        # 4. Return the list of UIElements
```
### Layer 3: State Embedding

**Responsibility**: Multi-modal fusion into a single vector (a screen fingerprint).

**Main Class**: `StateEmbedding`

```python
@dataclass
class StateEmbedding:
    embedding_id: str
    vector_id: str             # Path to the .npy file
    dimensions: int
    fusion_method: str         # "weighted" or "concat_projection"

    components: Dict[str, EmbeddingComponent]
    metadata: Dict[str, Any]

    def get_vector(self) -> np.ndarray:
        """Load the vector from file"""

    def compute_similarity(self, other: 'StateEmbedding') -> float:
        """Compute cosine similarity with another embedding"""
```
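The `compute_similarity` contract above is plain cosine similarity. A minimal, dependency-light sketch (the `cosine_similarity` helper is illustrative, not part of the codebase):

```python
import numpy as np

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine similarity; robust to non-normalized inputs and zero vectors."""
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    return float(np.dot(a, b) / denom) if denom else 0.0

v = np.array([1.0, 0.0, 0.0])
w = np.array([1.0, 1.0, 0.0])
print(round(cosine_similarity(v, w), 4))  # 0.7071
```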

**Fusion Engine**:

```python
class FusionEngine:
    def __init__(self, method: str = "weighted",
                 weights: Optional[Dict[str, float]] = None):
        self.method = method
        self.weights = weights or {
            "image": 0.5,
            "text": 0.3,
            "title": 0.1,
            "ui": 0.1
        }

    def fuse(self,
             img_emb: np.ndarray,
             text_emb: np.ndarray,
             title_emb: np.ndarray,
             ui_emb: np.ndarray) -> np.ndarray:
        """Fuse all embeddings into a single vector"""
        if self.method == "weighted":
            return self._weighted_fusion(img_emb, text_emb, title_emb, ui_emb)
        elif self.method == "concat_projection":
            return self._concat_projection(img_emb, text_emb, title_emb, ui_emb)
        raise ValueError(f"Unknown fusion method: {self.method}")

    def _weighted_fusion(self, img_emb, text_emb, title_emb, ui_emb) -> np.ndarray:
        """Simple weighted fusion"""
        fused = (
            self.weights["image"] * normalize(img_emb) +
            self.weights["text"] * normalize(text_emb) +
            self.weights["title"] * normalize(title_emb) +
            self.weights["ui"] * normalize(ui_emb)
        )
        return normalize(fused)
```
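The weighted fusion relies on a `normalize` helper that the block above assumes but does not define. A self-contained sketch of both, using the default weights (`normalize` and `weighted_fusion` here are illustrative stand-ins):

```python
import numpy as np

def normalize(v: np.ndarray) -> np.ndarray:
    """Scale a vector to unit L2 norm (zero vectors pass through unchanged)."""
    n = np.linalg.norm(v)
    return v / n if n else v

def weighted_fusion(parts: dict, weights: dict) -> np.ndarray:
    """Normalize each component, combine with weights, renormalize the result."""
    fused = sum(weights[k] * normalize(v) for k, v in parts.items())
    return normalize(fused)

rng = np.random.default_rng(0)
parts = {k: rng.normal(size=512) for k in ("image", "text", "title", "ui")}
weights = {"image": 0.5, "text": 0.3, "title": 0.1, "ui": 0.1}
fused = weighted_fusion(parts, weights)
print(round(float(np.linalg.norm(fused)), 6))  # 1.0
```

The final `normalize` matters: it guarantees Property 4 (unit-norm state embeddings) regardless of how the weighted sum interferes.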

**FAISS Manager**:

```python
class FAISSManager:
    def __init__(self, index_path: str, dimension: int = 512):
        self.index_path = index_path
        self.dimension = dimension
        self.index = self._load_or_create_index()
        self.metadata_store: Dict[int, Dict[str, Any]] = {}

    def add_embedding(self, embedding: np.ndarray,
                      metadata: Dict[str, Any]) -> int:
        """Add an embedding to the index"""
        idx = self.index.ntotal
        self.index.add(embedding.reshape(1, -1))
        self.metadata_store[idx] = metadata
        return idx

    def search_similar(self, query: np.ndarray,
                       k: int = 5) -> List[SearchResult]:
        """Find the k nearest neighbors"""
        distances, indices = self.index.search(query.reshape(1, -1), k)
        results = []
        for dist, idx in zip(distances[0], indices[0]):
            results.append(SearchResult(
                id=int(idx),
                distance=float(dist),
                # IndexFlatL2 returns squared L2 distances; for unit vectors,
                # d² = 2 - 2·cos, hence cos = 1 - d²/2
                similarity=1.0 - float(dist) / 2.0,
                metadata=self.metadata_store.get(int(idx), {})
            ))
        return results
```
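To make the search behavior concrete without depending on the `faiss` package, here is a brute-force stand-in that produces the same exact cosine ranking FAISS would return over unit vectors (`BruteForceIndex` is purely illustrative):

```python
import numpy as np

class BruteForceIndex:
    """Tiny stand-in for FAISSManager: exact cosine search over unit vectors."""
    def __init__(self, dimension: int = 512):
        self.vectors = np.empty((0, dimension), dtype=np.float32)
        self.metadata = []

    def add(self, vec: np.ndarray, meta: dict) -> int:
        vec = vec / np.linalg.norm(vec)                 # store unit vectors only
        self.vectors = np.vstack([self.vectors, vec.astype(np.float32)])
        self.metadata.append(meta)
        return len(self.metadata) - 1

    def search(self, query: np.ndarray, k: int = 5):
        query = query / np.linalg.norm(query)
        sims = self.vectors @ query                     # cosine = dot of unit vectors
        order = np.argsort(-sims)[:k]                   # highest similarity first
        return [(float(sims[i]), self.metadata[i]) for i in order]

idx = BruteForceIndex(dimension=4)
idx.add(np.array([1.0, 0.0, 0.0, 0.0]), {"node_id": "N1"})
idx.add(np.array([0.0, 1.0, 0.0, 0.0]), {"node_id": "N2"})
best_sim, best_meta = idx.search(np.array([0.9, 0.1, 0.0, 0.0]), k=1)[0]
print(best_meta["node_id"])  # N1
```

At production scale, FAISS replaces the O(n) matrix product with an approximate index; the interface and ranking semantics stay the same.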

### Layer 4: Workflow Graph

**Responsibility**: Modeling workflows as graphs with progressive learning.

**WorkflowNode Class**:

```python
@dataclass
class WorkflowNode:
    node_id: str
    label: str
    description: str
    screen_template: ScreenTemplate
    metadata: Dict[str, Any]

    def matches(self, screen_state: ScreenState,
                state_embedding: StateEmbedding) -> Tuple[bool, float]:
        """Check whether a ScreenState matches this node"""
        # 1. Check window constraints
        # 2. Check required text
        # 3. Check required UI elements
        # 4. Check embedding similarity
        # Return (match, confidence)

@dataclass
class ScreenTemplate:
    window: WindowConstraints
    required_text_any: List[str]
    required_ui_elements: List[UIElementConstraint]
    embedding_prototype: EmbeddingPrototype
    optional_elements: List[UIElementConstraint]

@dataclass
class EmbeddingPrototype:
    provider: str
    vector_id: str
    min_cosine_similarity: float
    sample_count: int
```

**WorkflowEdge Class**:

```python
@dataclass
class WorkflowEdge:
    edge_id: str
    from_node: str
    to_node: str
    action: Action
    constraints: EdgeConstraints
    post_conditions: PostConditions
    stats: EdgeStats
    metadata: Dict[str, Any]

    def can_execute(self, current_state: ScreenState) -> Tuple[bool, str]:
        """Check whether this edge can be executed"""
        # Check pre-conditions

    def execute(self, executor: ActionExecutor) -> ExecutionResult:
        """Execute this edge's action"""

@dataclass
class Action:
    type: str                  # mouse_click, key_press, text_input, compound
    target: TargetSpec
    parameters: Dict[str, Any]

@dataclass
class TargetSpec:
    role: str                  # Semantic role of the target element
    selection_policy: str      # first, last, by_similarity
    fallback_strategy: str     # visual_similarity, position
```
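As a sketch of how `TargetSpec.selection_policy` resolves among role-matched candidates (the `FakeElement` stand-in and `select_target` helper are illustrative; `by_similarity` is omitted here because it needs embeddings):

```python
from dataclasses import dataclass
from typing import List, Optional

@dataclass
class FakeElement:             # minimal stand-in for UIElement
    element_id: str
    role: str

def select_target(candidates: List[FakeElement], policy: str) -> Optional[FakeElement]:
    """Apply a TargetSpec.selection_policy over role-matched candidates."""
    if not candidates:
        return None
    if policy == "last":
        return candidates[-1]
    # "first" and any unknown policy fall back to the first candidate
    return candidates[0]

btns = [FakeElement("b1", "primary_action"), FakeElement("b2", "primary_action")]
print(select_target(btns, "last").element_id)  # b2
```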

**Workflow Class**:

```python
@dataclass
class Workflow:
    workflow_id: str
    name: str
    description: str
    version: int

    learning_state: str        # OBSERVATION, COACHING, AUTO_CANDIDATE, AUTO_CONFIRMÉ

    created_at: datetime
    updated_at: datetime

    entry_nodes: List[str]
    end_nodes: List[str]

    nodes: List[WorkflowNode]
    edges: List[WorkflowEdge]

    safety_rules: SafetyRules
    stats: WorkflowStats
    learning: LearningConfig
    metadata: Dict[str, Any]

    def get_node(self, node_id: str) -> Optional[WorkflowNode]:
        """Get a node by ID"""

    def get_outgoing_edges(self, node_id: str) -> List[WorkflowEdge]:
        """Get all outgoing edges of a node"""

    def to_json(self) -> Dict[str, Any]:
        """Serialize to JSON"""
```

### Graph Builder

**Responsibility**: Automatically build Workflow Graphs from RawSessions.

```python
class GraphBuilder:
    def __init__(self,
                 faiss_manager: FAISSManager,
                 fusion_engine: FusionEngine,
                 ui_detector: UIDetector):
        self.faiss = faiss_manager
        self.fusion = fusion_engine
        self.ui_detector = ui_detector

    def build_from_session(self, session: RawSession) -> Optional[Workflow]:
        """Build a workflow from a session"""
        # 1. Create ScreenStates for all screenshots
        screen_states = self._create_screen_states(session)

        # 2. Compute State Embeddings
        embeddings = self._compute_embeddings(screen_states)

        # 3. Detect repeated patterns
        patterns = self._detect_patterns(screen_states, embeddings)

        # 4. Build nodes and edges
        if patterns:
            return self._build_workflow(patterns)
        return None

    def _detect_patterns(self,
                         screen_states: List[ScreenState],
                         embeddings: List[StateEmbedding]) -> List[Pattern]:
        """Detect repeated sequences"""
        # Cluster the embeddings
        # Identify recurring transitions
        # Return the detected patterns
```

### Node Matcher

**Responsibility**: Match the current ScreenState against existing WorkflowNodes.

```python
class NodeMatcher:
    def __init__(self,
                 faiss_manager: FAISSManager,
                 fusion_engine: FusionEngine):
        self.faiss = faiss_manager
        self.fusion = fusion_engine

    def match(self,
              screen_state: ScreenState,
              workflow: Workflow) -> Optional[NodeMatch]:
        """Find the node matching the current ScreenState"""
        # 1. Compute the State Embedding
        state_emb = self._compute_state_embedding(screen_state)

        # 2. Search FAISS for similar prototypes
        candidates = self.faiss.search_similar(state_emb.get_vector(), k=5)

        # 3. Filter by workflow_id
        candidates = [c for c in candidates
                      if c.metadata.get('workflow_id') == workflow.workflow_id]

        # 4. Validate constraints for each candidate
        for candidate in candidates:
            node = workflow.get_node(candidate.metadata['node_id'])
            if node:
                matches, confidence = node.matches(screen_state, state_emb)
                if matches:
                    return NodeMatch(
                        node=node,
                        confidence=confidence,
                        embedding_similarity=candidate.similarity
                    )

        return None

@dataclass
class NodeMatch:
    node: WorkflowNode
    confidence: float
    embedding_similarity: float
```

### Action Executor

**Responsibility**: Execute the actions defined on WorkflowEdges.

```python
class ActionExecutor:
    def __init__(self, input_controller):
        self.input = input_controller

    def execute_edge(self,
                     edge: WorkflowEdge,
                     current_state: ScreenState) -> ExecutionResult:
        """Execute an edge's action"""
        # 1. Check pre-conditions
        can_execute, reason = edge.can_execute(current_state)
        if not can_execute:
            return ExecutionResult(success=False, reason=reason)

        # 2. Find the target element by role
        target_element = self._find_target_element(
            edge.action.target,
            current_state
        )
        if not target_element:
            return ExecutionResult(success=False,
                                   reason="Target element not found")

        # 3. Execute the action
        if edge.action.type == "mouse_click":
            self._execute_click(target_element, edge.action.parameters)
        elif edge.action.type == "text_input":
            self._execute_text_input(target_element, edge.action.parameters)
        elif edge.action.type == "compound":
            self._execute_compound(edge.action, current_state)

        # 4. Wait for post-conditions
        success = self._wait_for_postconditions(edge.post_conditions)

        return ExecutionResult(success=success)

    def _find_target_element(self,
                             target: TargetSpec,
                             state: ScreenState) -> Optional[UIElement]:
        """Find a UI element by semantic role"""
        candidates = [el for el in state.ui_elements
                      if el.role == target.role]

        if not candidates:
            return None

        if target.selection_policy == "first":
            return candidates[0]
        elif target.selection_policy == "last":
            return candidates[-1]
        elif target.selection_policy == "by_similarity":
            # Use embedding similarity
            return self._select_by_similarity(candidates, target)

        return candidates[0]

@dataclass
class ExecutionResult:
    success: bool
    reason: Optional[str] = None
    execution_time_ms: Optional[float] = None
```

### Learning Manager

**Responsibility**: Manage learning states and their transitions.

```python
class LearningManager:
    def __init__(self):
        self.workflows: Dict[str, Workflow] = {}

    def update_workflow_stats(self,
                              workflow_id: str,
                              execution_result: ExecutionResult) -> None:
        """Update statistics after an execution"""
        workflow = self.workflows[workflow_id]
        workflow.stats.total_executions += 1

        if execution_result.success:
            workflow.stats.success_count += 1
        else:
            workflow.stats.failure_count += 1

        # Check whether a state transition is needed
        self._check_state_transition(workflow)

    def _check_state_transition(self, workflow: Workflow) -> None:
        """Check for and perform state transitions when needed"""
        current_state = workflow.learning_state

        if current_state == "OBSERVATION":
            if self._can_transition_to_coaching(workflow):
                self._transition_to(workflow, "COACHING")

        elif current_state == "COACHING":
            if self._can_transition_to_auto_candidate(workflow):
                self._transition_to(workflow, "AUTO_CANDIDATE")

        elif current_state == "AUTO_CANDIDATE":
            if self._can_transition_to_auto_confirmed(workflow):
                # Requires user approval
                self._request_user_approval(workflow)

        elif current_state == "AUTO_CONFIRMÉ":
            if self._should_rollback(workflow):
                self._transition_to(workflow, "COACHING")

    def _can_transition_to_coaching(self, workflow: Workflow) -> bool:
        """Check the criteria for OBSERVATION → COACHING"""
        return (workflow.stats.observed_runs >= 5 and
                workflow.stats.avg_similarity >= 0.90)

    def _can_transition_to_auto_candidate(self, workflow: Workflow) -> bool:
        """Check the criteria for COACHING → AUTO_CANDIDATE"""
        return (workflow.stats.assist_runs >= 10 and
                workflow.stats.success_rate >= 0.90)

    def _can_transition_to_auto_confirmed(self, workflow: Workflow) -> bool:
        """Check the criteria for AUTO_CANDIDATE → AUTO_CONFIRMÉ"""
        return (workflow.stats.auto_candidate_runs >= 20 and
                workflow.stats.success_rate >= 0.95)

    def _should_rollback(self, workflow: Workflow) -> bool:
        """Check whether a rollback is needed"""
        return workflow.stats.recent_confidence < 0.90
```
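The transition criteria can be condensed into a small pure function, which makes the thresholds easy to unit-test in isolation (`next_state` is an illustrative sketch, not the LearningManager API):

```python
def next_state(state: str, stats: dict) -> str:
    """Return the next learning state if the thresholds are met, else the current one."""
    if state == "OBSERVATION" and stats.get("observed_runs", 0) >= 5 \
            and stats.get("avg_similarity", 0.0) >= 0.90:
        return "COACHING"
    if state == "COACHING" and stats.get("assist_runs", 0) >= 10 \
            and stats.get("success_rate", 0.0) >= 0.90:
        return "AUTO_CANDIDATE"
    if state == "AUTO_CANDIDATE" and stats.get("auto_candidate_runs", 0) >= 20 \
            and stats.get("success_rate", 0.0) >= 0.95:
        return "AUTO_CONFIRMÉ"   # in the real system, still gated by user approval
    return state

print(next_state("OBSERVATION", {"observed_runs": 5, "avg_similarity": 0.93}))  # COACHING
print(next_state("COACHING", {"assist_runs": 3, "success_rate": 1.0}))          # COACHING
```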

## Data Models

### Complete JSON Format

**RawSession**:
```json
{
  "schema_version": "rawsession_v1",
  "session_id": "sess_2025-11-22T10-15-00_user1",
  "agent_version": "0.2.0",
  "environment": {
    "platform": "linux",
    "hostname": "dev-machine",
    "screen": {"primary_resolution": [1920, 1080]}
  },
  "user": {"id": "user1", "label": "Developer User"},
  "context": {"customer": "Demo", "training_label": "workflow_test"},
  "started_at": "2025-11-22T10:15:00Z",
  "ended_at": "2025-11-22T10:30:00Z",
  "events": [
    {
      "t": 0.523,
      "type": "mouse_click",
      "button": "left",
      "pos": [800, 400],
      "window": {"title": "App", "app_name": "app.exe"},
      "screenshot_id": "shot_0001"
    }
  ],
  "screenshots": [
    {
      "screenshot_id": "shot_0001",
      "relative_path": "shots/shot_0001.png",
      "captured_at": "2025-11-22T10:15:00.523Z"
    }
  ]
}
```

**ScreenState**:
```json
{
  "screen_state_id": "screen_2025-11-22T10-15-32.123Z",
  "timestamp": "2025-11-22T10:15:32.123Z",
  "session_id": "sess_2025-11-22T10-15-00_user1",
  "window": {
    "app_name": "app",
    "window_title": "Main Window",
    "screen_resolution": [1920, 1080]
  },
  "raw": {
    "screenshot_path": "data/screens/2025-11-22/10-15-32.png",
    "capture_method": "mss",
    "file_size_bytes": 245678
  },
  "perception": {
    "embedding": {
      "provider": "openclip_ViT-B-32",
      "vector_id": "data/embeddings/screens/screen_2025-11-22T10-15-32.123Z.npy",
      "dimensions": 512
    },
    "detected_text": ["Button", "Input", "Submit"],
    "text_detection_method": "qwen_vl",
    "confidence_avg": 0.92
  },
  "ui_elements": [
    {
      "element_id": "el_btn_001",
      "type": "button",
      "role": "primary_action",
      "bbox": [100, 200, 200, 240],
      "center": [150, 220],
      "label": "Submit",
      "label_confidence": 0.96,
      "embeddings": {
        "image": {
          "provider": "openclip_ViT-B-32",
          "vector_id": "data/embeddings/elements/el_btn_001_img.npy",
          "dimensions": 512
        },
        "text": {
          "provider": "openclip_ViT-B-32",
          "vector_id": "data/embeddings/elements/el_btn_001_txt.npy",
          "dimensions": 512
        }
      },
      "visual_features": {
        "dominant_color": "#4CAF50",
        "has_icon": false,
        "shape": "rectangle",
        "size_category": "medium"
      },
      "tags": ["action", "primary"],
      "confidence": 0.94
    }
  ],
  "context": {
    "current_workflow_candidate": null,
    "workflow_step": null,
    "user_id": "user1",
    "tags": ["demo"],
    "business_variables": {}
  },
  "metadata": {
    "processing_time_ms": 245,
    "ui_elements_count": 5
  }
}
```

**Workflow Graph**:
```json
{
  "workflow_id": "WF_demo_workflow",
  "name": "Demo Workflow",
  "description": "Simple demo workflow for testing",
  "version": 1,
  "learning_state": "OBSERVATION",
  "created_at": "2025-11-22T10:45:00Z",
  "updated_at": "2025-11-22T10:45:00Z",
  "entry_nodes": ["N1_start"],
  "end_nodes": ["N3_end"],
  "nodes": [
    {
      "node_id": "N1_start",
      "label": "Start Screen",
      "description": "Initial screen with form",
      "screen_template": {
        "window": {
          "app_name_any_of": ["app"],
          "title_contains_any_of": ["Main"]
        },
        "required_text_any": ["Submit", "Input"],
        "required_ui_elements": [
          {
            "role": "primary_action",
            "type_any_of": ["button"],
            "min_count": 1
          }
        ],
        "embedding_prototype": {
          "provider": "openclip_ViT-B-32",
          "vector_id": "data/embeddings/workflows/WF_demo/N1_prototype.npy",
          "min_cosine_similarity": 0.85,
          "sample_count": 5
        }
      },
      "metadata": {
        "created_at": "2025-11-22T10:45:00Z",
        "observation_count": 5
      }
    }
  ],
  "edges": [
    {
      "edge_id": "E1_submit",
      "from_node": "N1_start",
      "to_node": "N2_processing",
      "action": {
        "type": "mouse_click",
        "target": {
          "role": "primary_action",
          "selection_policy": "first",
          "fallback_strategy": "visual_similarity"
        },
        "parameters": {
          "click_offset": [0, 0],
          "wait_after_ms": 500
        }
      },
      "constraints": {
        "max_delay_seconds": 5,
        "pre_conditions": ["element:primary_action_visible"],
        "post_conditions": ["window_title_changed"]
      },
      "post_conditions": {
        "expected_node": "N2_processing",
        "min_similarity": 0.85,
        "timeout_seconds": 5
      },
      "stats": {
        "manual_executions": 5,
        "assist_executions": 0,
        "auto_executions": 0,
        "success_count": 5,
        "failure_count": 0,
        "avg_execution_time_ms": 1200
      }
    }
  ],
  "safety_rules": {
    "forbidden_text_clicks": ["Delete", "Remove"],
    "forbidden_roles": ["delete_action"],
    "require_confirmation_for": ["irreversible_action"]
  },
  "stats": {
    "observed_runs": 5,
    "assist_runs": 0,
    "auto_candidate_runs": 0,
    "auto_confirmed_runs": 0,
    "success_rate_overall": 1.0,
    "avg_duration_seconds": 30.5
  },
  "learning": {
    "state": "OBSERVATION",
    "thresholds": {
      "min_observed_runs_for_coaching": 5,
      "min_assist_runs_for_auto_candidate": 10,
      "min_auto_candidate_runs_for_auto_confirmed": 20
    }
  }
}
```

## Error Handling

### Matching Failures

**Scenario**: No node matches the current ScreenState.

**Handling**:
1. Log the unmatched ScreenState along with its screenshot
2. Compute the similarity against all existing nodes
3. If the similarity is close (0.75-0.84), suggest updating the node
4. If the similarity is low (<0.75), suggest creating a new node
5. Notify the user and pause execution
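The similarity bands in steps 3-4 can be captured in one decision helper (an illustrative sketch; the band boundaries come from the steps above and the 0.85 node-matching threshold used elsewhere in this design):

```python
def matching_failure_action(best_similarity: float) -> str:
    """Map the best similarity against existing nodes to a suggested remediation."""
    if best_similarity >= 0.85:
        return "match"                 # should have matched; inspect the constraints
    if best_similarity >= 0.75:
        return "suggest_node_update"   # close: propose updating the existing node
    return "suggest_new_node"          # far: propose creating a new node

print(matching_failure_action(0.80))  # suggest_node_update
print(matching_failure_action(0.60))  # suggest_new_node
```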

### UI Detection Failures

**Scenario**: Target element not found by role.

**Handling**:
1. Log the failure with context (node, edge, requested role)
2. Try the fallback strategy (visual similarity)
3. If the fallback fails, try the approximate position
4. If everything fails, notify the user and ask for a correction
5. Update the node's template with the feedback

### Post-Condition Violations

**Scenario**: Post-conditions not satisfied after execution.

**Handling**:
1. Log the violation with details (expected vs. actual)
2. Wait for the configured timeout
3. If still not satisfied, mark the execution as failed
4. Increment the failure counter for this edge
5. After repeated failures (>3), mark the edge as problematic

### Detected UI Changes

**Scenario**: Embedding similarity drops significantly.

**Handling**:
1. Detect the change (similarity < 0.70 vs. the prototype)
2. Capture a new screenshot for analysis
3. Pause automatic execution
4. Notify the user of the detected change
5. Propose re-learning the affected node
## Testing Strategy

### Unit Tests

**Components to Test**:

1. **RawSession**:
   - JSON serialization/deserialization
   - Adding events and screenshots
   - Schema validation

2. **ScreenState**:
   - Creation of the 4 levels
   - JSON serialization/deserialization
   - Structure validation

3. **UIElement**:
   - Type and role detection
   - Dual embedding generation
   - Visual feature computation

4. **StateEmbedding**:
   - Weighted fusion
   - Concatenation-based fusion
   - Cosine similarity computation

5. **FAISSManager**:
   - Adding embeddings
   - Similarity search
   - Index save/load

6. **WorkflowNode**:
   - Matching against a ScreenState
   - Constraint validation
   - Confidence computation

7. **WorkflowEdge**:
   - Pre-condition validation
   - Action execution
   - Post-condition verification

8. **LearningManager**:
   - State transitions
   - Metric computation
   - Rollback detection

**Framework**: pytest with fixtures

### Integration Tests

**Scenarios**:

1. **Full RawSession → Workflow Pipeline**:
   - Capture a session
   - Create ScreenStates
   - Detect UI elements
   - Compute embeddings
   - Build the workflow graph
   - Verify the graph structure

2. **Matching and Execution**:
   - Load an existing workflow
   - Match the current ScreenState
   - Find the outgoing edge
   - Execute the action
   - Verify the transition

3. **Progressive Learning**:
   - Simulate 5 observations → COACHING
   - Simulate 10 assisted runs → AUTO_CANDIDATE
   - Simulate 20 executions → AUTO_CONFIRMÉ
   - Verify the transitions

4. **Error Handling**:
   - Simulate a matching failure
   - Simulate a missing target element
   - Simulate a post-condition violation
   - Verify recovery

### Performance Tests

**Target Metrics**:

| Operation | Target Time | Test Method |
|-----------|-------------|-----------------|
| Compute State Embedding | < 100 ms | pytest-benchmark |
| FAISS Search | < 50 ms | pytest-benchmark |
| UI Detection | < 200 ms | pytest-benchmark |
| Action Execution | < 50 ms | pytest-benchmark |
| End-to-End Processing | < 400 ms | pytest-benchmark |

**Tools**: pytest-benchmark, cProfile

### End-to-End Tests

**Test Workflow**:

1. **Full Learning Cycle**:
   - Capture 5 sessions of a simple workflow
   - Verify automatic graph construction
   - Verify the OBSERVATION → COACHING transition
   - Execute in assisted mode 10 times
   - Verify the COACHING → AUTO_CANDIDATE transition

2. **UI Robustness**:
   - Learn a workflow on UI version 1
   - Slightly modify the UI (colors, positions)
   - Verify that matching still works
   - Significantly modify the UI
   - Verify change detection

3. **Multi-Workflow**:
   - Learn 3 different workflows
   - Verify workflow isolation
   - Verify correct matching for each workflow
   - Verify there is no cross-workflow confusion
## Correctness Properties
|
|
|
|
*A property is a characteristic or behavior that should hold true across all valid executions of a system-essentially, a formal statement about what the system should do. Properties serve as the bridge between human-readable specifications and machine-verifiable correctness guarantees.*
|
|
|
|
### Property 1: RawSession Serialization Round Trip

*For any* valid RawSession object, serializing to JSON then deserializing should produce an equivalent RawSession with all events and screenshots preserved.

**Validates: Requirements 1.4, 1.5**
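
The round trip can be exercised directly; the sketch below uses a simplified stand-in for RawSession (the real model also carries screenshots and richer event data):

```python
import json
from dataclasses import dataclass, asdict, field

# Simplified stand-in for RawSession; the real class holds more fields.
@dataclass
class RawSession:
    session_id: str
    events: list = field(default_factory=list)

def to_json(session: RawSession) -> str:
    return json.dumps(asdict(session), sort_keys=True)

def from_json(raw: str) -> RawSession:
    return RawSession(**json.loads(raw))

original = RawSession("sess_001", [{"type": "click", "t": 1.5}])
restored = from_json(to_json(original))
assert restored == original  # round trip preserves all fields
```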

### Property 2: ScreenState Multi-Level Consistency

*For any* ScreenState, all 4 levels (Raw, Perception, Sémantique UI, Contexte) should reference the same screenshot and timestamp.

**Validates: Requirements 2.1, 2.2, 2.3, 2.4, 2.5**

### Property 3: UIElement Detection Confidence Bounds

*For any* detected UIElement, the confidence score should be between 0.0 and 1.0, and elements with confidence below threshold should not be included in results.

**Validates: Requirements 3.6**

### Property 4: State Embedding Normalization

*For any* State Embedding vector, the L2 norm should be 1.0 (normalized vector).

**Validates: Requirements 4.6**
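
A minimal normalization helper showing the invariant (the function name is illustrative, not the project's actual API):

```python
import numpy as np

def normalize(vec: np.ndarray) -> np.ndarray:
    """Scale vec to unit L2 norm, the invariant this property asserts."""
    norm = np.linalg.norm(vec)
    if norm == 0.0:
        raise ValueError("cannot normalize a zero vector")
    return vec / norm

embedding = normalize(np.random.rand(512).astype(np.float32))
assert abs(float(np.linalg.norm(embedding)) - 1.0) < 1e-5
```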

### Property 5: State Embedding Similarity Symmetry

*For any* two State Embeddings A and B, similarity(A, B) should equal similarity(B, A).

**Validates: Requirements 4.7**

### Property 6: State Embedding Similarity Bounds

*For any* two State Embeddings, the cosine similarity should be between -1.0 and 1.0.

**Validates: Requirements 4.7**
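
Both properties follow from the definition of cosine similarity; a direct check:

```python
import numpy as np

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine of the angle between a and b; symmetric and bounded in [-1, 1]."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

a, b = np.random.rand(512), np.random.rand(512)
assert abs(cosine_similarity(a, b) - cosine_similarity(b, a)) < 1e-9  # Property 5
assert -1.0 - 1e-9 <= cosine_similarity(a, b) <= 1.0 + 1e-9           # Property 6
```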

### Property 7: WorkflowNode Matching Consistency

*For any* WorkflowNode and ScreenState, if the node matches with confidence C1, then matching again immediately should return the same confidence C1 (deterministic).

**Validates: Requirements 9.1, 9.2, 9.3, 9.4, 9.5, 9.6**

### Property 8: WorkflowEdge Pre-Condition Validation

*For any* WorkflowEdge, if pre-conditions are not satisfied, execution should not proceed and should return failure.

**Validates: Requirements 10.5**

### Property 9: Learning State Monotonic Progression

*For any* Workflow in learning state S, transitioning to state S' should only happen if S' is the next state in the progression (OBSERVATION → COACHING → AUTO_CANDIDATE → AUTO_CONFIRMÉ), except for rollback to COACHING.

**Validates: Requirements 8.1, 8.2, 8.3, 8.4**
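
The progression rule can be expressed as a tiny guard function; this is a sketch using the state names from this document, not the real LearningManager API:

```python
# Ordered learning states as defined in this document.
PROGRESSION = ["OBSERVATION", "COACHING", "AUTO_CANDIDATE", "AUTO_CONFIRMÉ"]

def can_transition(current: str, target: str) -> bool:
    """Allow only the immediate next state, or a rollback to COACHING."""
    i = PROGRESSION.index(current)
    nxt = PROGRESSION[i + 1] if i + 1 < len(PROGRESSION) else None
    if target == nxt:
        return True  # forward moves go only to the immediate next state
    # rollback: any state beyond COACHING may fall back to COACHING
    return target == "COACHING" and i > PROGRESSION.index("COACHING")

assert can_transition("OBSERVATION", "COACHING")
assert not can_transition("OBSERVATION", "AUTO_CANDIDATE")  # no skipping
assert can_transition("AUTO_CONFIRMÉ", "COACHING")          # rollback allowed
```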

### Property 10: Learning State Rollback Condition

*For any* Workflow in AUTO_CONFIRMÉ state, if confidence drops below 0.90, the system should rollback to COACHING state.

**Validates: Requirements 8.6**

### Property 11: FAISS Index Consistency

*For any* embedding added to FAISS index with metadata M, searching for that exact embedding should return it as the top result with metadata M.

**Validates: Requirements 4.8, 12.3, 12.6**
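
The invariant is easiest to see on a brute-force stand-in: an embedding added with metadata M must come back as the top hit for an exact-match query. A FAISS `IndexFlatIP` over normalized vectors gives the same semantics at scale; the class below is only an illustration:

```python
import numpy as np

class TinyIndex:
    """Brute-force cosine index illustrating the FAISS consistency property."""

    def __init__(self):
        self.vectors, self.metadata = [], []

    def add(self, vec: np.ndarray, meta: dict) -> None:
        self.vectors.append(vec / np.linalg.norm(vec))
        self.metadata.append(meta)

    def search(self, query: np.ndarray, k: int = 1):
        q = query / np.linalg.norm(query)
        scores = [float(np.dot(q, v)) for v in self.vectors]
        order = np.argsort(scores)[::-1][:k]
        return [(scores[i], self.metadata[i]) for i in order]

idx = TinyIndex()
emb = np.random.rand(512).astype(np.float32)
idx.add(emb, {"workflow_id": "WF_001", "node_id": "N1"})
idx.add(np.random.rand(512).astype(np.float32), {"node_id": "N2"})
score, meta = idx.search(emb)[0]
assert meta["node_id"] == "N1"  # the exact query returns its own metadata
```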

### Property 12: Workflow Graph Structural Validity

*For any* Workflow Graph, all edges should reference existing nodes (no dangling references).

**Validates: Requirements 7.2**

### Property 13: UIElement Role Uniqueness Per Type

*For any* ScreenState, if multiple UIElements have the same role, they should have different element_ids.

**Validates: Requirements 3.3**

### Property 14: Embedding Prototype Sample Count

*For any* WorkflowNode with embedding prototype, the sample_count should be at least 1 and the prototype vector should exist.

**Validates: Requirements 5.4**

### Property 15: Action Execution Timeout

*For any* WorkflowEdge execution, if post-conditions are not satisfied within timeout_seconds, the execution should be marked as failed.

**Validates: Requirements 10.6**

### Property 16: Pattern Detection Minimum Repetitions

*For any* detected workflow pattern, it should have been observed at least 3 times before being proposed as a Workflow Graph.

**Validates: Requirements 11.7**

### Property 17: State Embedding Component Weights Sum

*For any* weighted fusion configuration, the sum of all component weights (image + text + title + ui) should equal 1.0.

**Validates: Requirements 4.5**
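
A small validator sketch for this property (the function name is illustrative); the example values are the fusion weights from the Configuration section:

```python
import math

def validate_fusion_weights(weights: dict) -> None:
    """Reject fusion configs whose component weights do not sum to 1.0."""
    total = sum(weights.values())
    if not math.isclose(total, 1.0, rel_tol=1e-9):
        raise ValueError(f"fusion weights sum to {total}, expected 1.0")

validate_fusion_weights({"image": 0.5, "text": 0.3, "title": 0.1, "ui": 0.1})
```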

### Property 18: Workflow JSON Serialization Round Trip

*For any* valid Workflow Graph, serializing to JSON then deserializing should produce an equivalent Workflow with all nodes and edges preserved.

**Validates: Requirements 7.6, 12.4, 12.5**

### Property 19: Performance Constraint - State Embedding

*For any* ScreenState, computing the State Embedding should complete in less than 100ms.

**Validates: Requirements 15.1**

### Property 20: Performance Constraint - End-to-End

*For any* ScreenState processing (detection + embedding + matching), the total time should be less than 400ms.

**Validates: Requirements 15.5**

## Security Considerations

### Data Validation

- All loaded JSON must be validated against its schema
- Loaded embeddings must have the expected dimensions
- workflow_ids and node_ids must be validated (format, uniqueness)
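
A stdlib-only validation sketch for the id and dimension checks; the `WF_*` format is assumed from the data layout in this document, and the function names are hypothetical:

```python
import re

def validate_workflow_id(workflow_id: str) -> bool:
    """Check the WF_* identifier format used in the data layout."""
    return re.fullmatch(r"WF_[A-Za-z0-9_]+", workflow_id) is not None

def validate_embedding_dim(vec, expected_dim: int = 512) -> bool:
    """Reject embeddings whose dimension differs from the configured one."""
    return len(vec) == expected_dim

assert validate_workflow_id("WF_invoice_entry")
assert not validate_workflow_id("invoice entry")  # bad format is rejected
assert validate_embedding_dim([0.0] * 512)
```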

### Workflow Isolation

- Each workflow must have its own namespace in FAISS
- Embeddings from different workflows must not interfere
- Metadata must include the workflow_id for filtering

### Safety Rules

- Forbidden actions (forbidden_text_clicks, forbidden_roles) must be blocked
- Irreversible actions must require confirmation
- Rollback must be possible for the last 3 actions
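
The three-action rollback window can be kept with a bounded deque; a minimal sketch (action names are made up for illustration):

```python
from collections import deque

# Keep only the last 3 executed actions available for rollback.
rollback_stack = deque(maxlen=3)
for action in ["open_form", "fill_name", "fill_email", "submit"]:
    rollback_stack.append(action)

# The oldest action has been evicted; only the last 3 remain undoable.
assert list(rollback_stack) == ["fill_name", "fill_email", "submit"]
```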

### Logging and Audit

- All state transitions must be logged
- All execution failures must be logged with context
- Detected UI changes must be logged with screenshots

## Performance Optimization

### Embeddings

- Use batch processing to compute multiple embeddings at once
- Cache prototype embeddings
- Use FP16 quantization for CLIP models

### FAISS

- Use an IVF index for large collections (>10k embeddings)
- Periodically optimize the index (compaction)
- Use the GPU for search when available
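
The flat-vs-IVF decision can be captured in a small helper; this sketch returns FAISS `index_factory` strings, using the common "about 4·√N cells" heuristic for the IVF list count (the threshold and heuristic are assumptions, not tuned values):

```python
def choose_index_factory(n_embeddings: int) -> str:
    """Pick a FAISS index_factory string from the collection size."""
    if n_embeddings <= 10_000:
        return "Flat"  # exact search stays fast at this scale
    # Common heuristic: roughly 4 * sqrt(N) IVF cells for larger sets.
    nlist = int(4 * n_embeddings ** 0.5)
    return f"IVF{nlist},Flat"

assert choose_index_factory(1_000) == "Flat"
assert choose_index_factory(100_000) == "IVF1264,Flat"
```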

### UI Detection

- Cap screenshot resolution (max 1920x1080)
- Use ROI detection to shrink the processing area
- Cache detection results for similar frames

### Workflow Matching

- Pre-filter candidates by window context
- Use early stopping when confidence is very high (>0.95)
- Cache the last matched node
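
The first two ideas combine into a simple matching loop; this is a sketch with hypothetical candidate fields (`window_title`, `prototype`), not the real NodeMatcher interface:

```python
import numpy as np

def cosine(a, b):
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def match_node(state_emb, candidates, window_title,
               min_sim=0.85, early_stop=0.95):
    """Pre-filter by window context, stop early on a very confident hit."""
    best_node, best_sim = None, 0.0
    for node in candidates:
        if node["window_title"] != window_title:  # context pre-filter
            continue
        sim = cosine(state_emb, node["prototype"])
        if sim > early_stop:
            return node, sim                      # early stopping
        if sim >= min_sim and sim > best_sim:
            best_node, best_sim = node, sim
    return best_node, best_sim
```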

## Deployment Considerations

### Dependencies

- Python 3.9+
- PyTorch 2.0+
- OpenCLIP
- FAISS (CPU or GPU)
- Transformers (Hugging Face)
- NumPy, Pillow
- pytest, pytest-benchmark

### Data Layout

```
data/
├── sessions/
│   └── YYYY-MM-DD/
│       └── sess_*.json
├── screen_states/
│   └── YYYY-MM-DD/
│       └── screen_*.json
├── embeddings/
│   ├── screens/
│   │   └── *.npy
│   ├── elements/
│   │   └── *.npy
│   └── states/
│       └── *.npy
├── faiss_index/
│   ├── index.faiss
│   └── metadata.json
└── workflows/
    └── WF_*/
        ├── workflow.json
        └── prototypes/
            └── *.npy
```

### Configuration

```python
CONFIG = {
    "models": {
        "clip": "ViT-B-32",
        "vlm": "qwen2.5-vl:3b"
    },
    "embedding": {
        "dimension": 512,
        "fusion_method": "weighted",
        "weights": {
            "image": 0.5,
            "text": 0.3,
            "title": 0.1,
            "ui": 0.1
        }
    },
    "matching": {
        "min_similarity": 0.85,
        "faiss_k": 5
    },
    "learning": {
        "observation_threshold": 5,
        "coaching_threshold": 10,
        "auto_candidate_threshold": 20,
        "min_success_rate": 0.90,
        "rollback_confidence": 0.90
    },
    "performance": {
        "max_embedding_time_ms": 100,
        "max_detection_time_ms": 200,
        "max_total_time_ms": 400
    }
}
```

## Implementation Plan

### Phase 1: Foundations (Weeks 1-2)

**Goal**: Data structures and serialization

- Implement the base classes (RawSession, ScreenState, UIElement, etc.)
- Implement JSON serialization/deserialization
- Unit tests for the data structures
- Schema validation

**Deliverables**:
- `geniusia2/core/models/*.py`
- Complete unit tests

### Phase 2: Embeddings and FAISS (Weeks 3-4)

**Goal**: Working embedding system

- Implement FusionEngine
- Implement FAISSManager
- Implement similarity computations
- Performance tests

**Deliverables**:
- `geniusia2/core/embedding/*.py`
- Performance benchmarks

### Phase 3: UI Detection (Weeks 5-6)

**Goal**: Semantic detection of UI elements

- Integrate a VLM for detection
- Implement type/role classification
- Generate dual embeddings
- Tests with real screenshots

**Deliverables**:
- `geniusia2/core/detection/*.py`
- Test dataset with screenshots

### Phase 4: Workflow Graph (Weeks 7-9)

**Goal**: Graph construction and matching

- Implement WorkflowNode and WorkflowEdge
- Implement GraphBuilder
- Implement NodeMatcher
- Integration tests

**Deliverables**:
- `geniusia2/core/graph/*.py`
- Complete integration tests

### Phase 5: Execution and Learning (Weeks 10-12)

**Goal**: Action execution and learning states

- Implement ActionExecutor
- Implement LearningManager
- Implement the state transitions
- End-to-end tests

**Deliverables**:
- `geniusia2/core/graph/action_executor.py`
- `geniusia2/core/graph/learning_manager.py`
- End-to-end tests

### Phase 6: Optimization and Production (Weeks 13-14)

**Goal**: Optimization and deployment

- Optimize performance (caching, batching)
- Add monitoring and logging
- Complete the documentation
- Load tests

**Deliverables**:
- Optimized system
- User documentation
- Deployment guide