feat: working E2E replay: 25/25 actions, 0 retries, SomEngine via server

Validated on a Windows PC (DESKTOP-58D5CAC, 2560x1600):
- 8 clicks resolved visually (1 anchor_template, 1 som_text_match, 6 som_vlm)
- Average score 0.75, average time 1.6s
- Text typed correctly (bonjour, test word, date, email)
- 0 retries, 2 unverified actions (OK)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Author: Dom
Date: 2026-03-31 14:04:41 +02:00
Commit: a7de6a488b (parent: 5e0b53cfd1)
79542 changed files with 6091757 additions and 1 deletion

# Design Document: Admin Monitoring System
## Overview
This design document describes the architecture and implementation of a comprehensive monitoring and administration system for RPA Vision V3. The system extends the existing web dashboard with workflow chain management, trigger configuration, Prometheus metrics integration, centralized logging, and log download capabilities.
## Architecture
```mermaid
graph TB
    subgraph "Admin Dashboard"
        UI[Web Interface]
        API[Flask API]
        WS[WebSocket Handler]
    end
    subgraph "Monitoring Core"
        Logger[Centralized Logger]
        Metrics[Prometheus Metrics]
        Collector[Metrics Collector]
    end
    subgraph "Management"
        ChainMgr[Chain Manager]
        TriggerMgr[Trigger Manager]
        LogExporter[Log Exporter]
    end
    subgraph "Storage"
        ChainStore[(Chains JSON)]
        TriggerStore[(Triggers JSON)]
        LogStore[(Log Files)]
    end

    UI --> API
    UI --> WS
    API --> ChainMgr
    API --> TriggerMgr
    API --> Logger
    API --> LogExporter
    API --> Metrics
    ChainMgr --> ChainStore
    TriggerMgr --> TriggerStore
    Logger --> LogStore
    Logger --> Metrics
    Collector --> Metrics
    WS --> Collector
```
## Components and Interfaces
### 1. Centralized Logger (`core/monitoring/logger.py`)
```python
@dataclass
class LogEntry:
    timestamp: datetime
    level: str  # INFO, WARNING, ERROR, DEBUG
    component: str
    message: str
    workflow_id: Optional[str] = None
    node_id: Optional[str] = None
    metadata: Dict[str, Any] = field(default_factory=dict)

    def to_dict(self) -> Dict[str, Any]: ...

class RPALogger:
    def __init__(self, component: str, log_file: Optional[str] = None): ...
    def info(self, message: str, workflow_id: Optional[str] = None, **metadata): ...
    def warning(self, message: str, workflow_id: Optional[str] = None, **metadata): ...
    def error(self, message: str, workflow_id: Optional[str] = None, **metadata): ...
    def debug(self, message: str, workflow_id: Optional[str] = None, **metadata): ...
    def workflow_start(self, workflow_id: str, **metadata): ...
    def workflow_end(self, workflow_id: str, success: bool, duration: float): ...
    def get_recent_logs(self, limit: int = 100) -> List[LogEntry]: ...
    def export_logs(self, start_time: Optional[datetime] = None, end_time: Optional[datetime] = None) -> str: ...

def get_logger(component: str) -> RPALogger: ...
```
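The serialized form of a `LogEntry` must carry the four mandatory fields (timestamp, level, component, message) plus optional workflow context. A minimal sketch of `to_dict`, under the assumption that timestamps are encoded as ISO 8601 strings (the encoding is not fixed by this design):

```python
from dataclasses import dataclass, field
from datetime import datetime
from typing import Any, Dict, Optional

@dataclass
class LogEntry:
    timestamp: datetime
    level: str  # INFO, WARNING, ERROR, DEBUG
    component: str
    message: str
    workflow_id: Optional[str] = None
    node_id: Optional[str] = None
    metadata: Dict[str, Any] = field(default_factory=dict)

    def to_dict(self) -> Dict[str, Any]:
        # ISO-8601 timestamp encoding is an assumption of this sketch
        return {
            "timestamp": self.timestamp.isoformat(),
            "level": self.level,
            "component": self.component,
            "message": self.message,
            "workflow_id": self.workflow_id,
            "node_id": self.node_id,
            "metadata": self.metadata,
        }

entry = LogEntry(datetime(2024, 11, 29, 14, 30, 15), "INFO", "execution",
                 "Workflow started", workflow_id="wf_001", node_id="login_node")
print(entry.to_dict()["timestamp"])  # 2024-11-29T14:30:15
```

This shape matches the Log Entry Format shown under Data Models, which keeps the export pipeline a straight `json.dumps` over `to_dict()` output.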
### 2. Prometheus Metrics (`core/monitoring/metrics.py`)
```python
# Counters
workflow_executions_total = Counter(
    'workflow_executions_total',
    'Total workflow executions',
    ['workflow_id', 'status']
)

log_entries_total = Counter(
    'log_entries_total',
    'Total log entries',
    ['level', 'component']
)

chain_executions_total = Counter(
    'chain_executions_total',
    'Total chain executions',
    ['chain_id', 'status']
)

trigger_fires_total = Counter(
    'trigger_fires_total',
    'Total trigger fires',
    ['trigger_type', 'workflow_id']
)

# Histograms
workflow_duration_seconds = Histogram(
    'workflow_duration_seconds',
    'Workflow execution duration',
    ['workflow_id']
)

# Gauges
active_workflows = Gauge('active_workflows', 'Number of active workflows')
error_rate = Gauge('error_rate', 'Current error rate percentage')
```
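The `/metrics` endpoint must return the Prometheus text exposition format. In practice `prometheus_client.generate_latest()` produces it; the toy formatter below is only there to make the expected wire format concrete, and is not part of the design:

```python
# Illustrative only: shows the Prometheus text exposition format that
# prometheus_client's generate_latest() emits for a labeled counter.
def render_counter(name: str, doc: str, samples: dict) -> str:
    lines = [f"# HELP {name} {doc}", f"# TYPE {name} counter"]
    for labels, value in sorted(samples.items()):
        # labels is a tuple of (label_name, label_value) pairs
        label_str = ",".join(f'{k}="{v}"' for k, v in sorted(labels))
        lines.append(f"{name}{{{label_str}}} {float(value)}")
    return "\n".join(lines) + "\n"

text = render_counter(
    "workflow_executions_total",
    "Total workflow executions",
    {(("workflow_id", "wf_login"), ("status", "success")): 3},
)
print(text)
# # HELP workflow_executions_total Total workflow executions
# # TYPE workflow_executions_total counter
# workflow_executions_total{status="success",workflow_id="wf_login"} 3.0
```

Any output that parses under this grammar satisfies the `/metrics` contract, regardless of which client library generates it.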
### 3. Chain Manager (`core/monitoring/chain_manager.py`)
```python
@dataclass
class WorkflowChain:
    chain_id: str
    name: str
    workflows: List[str]  # Ordered list of workflow_ids
    status: str  # active, inactive, running
    created_at: datetime
    last_execution: Optional[datetime] = None
    success_rate: float = 0.0

class ChainManager:
    def __init__(self, storage_path: Path): ...
    def list_chains(self) -> List[WorkflowChain]: ...
    def get_chain(self, chain_id: str) -> Optional[WorkflowChain]: ...
    def create_chain(self, name: str, workflows: List[str]) -> WorkflowChain: ...
    def validate_workflows_exist(self, workflow_ids: List[str]) -> bool: ...
    def execute_chain(self, chain_id: str, on_progress: Callable) -> ChainExecutionResult: ...
    def delete_chain(self, chain_id: str) -> bool: ...
```
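The core of `execute_chain` is the stop-on-failure rule from Requirement 1.4: workflows run in order, and the first failure ends the chain and records the failure point. A standalone sketch of that rule (the `ChainExecutionResult` fields and the `run_workflow` callback are assumptions of this sketch, not part of the interface above):

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional

@dataclass
class ChainExecutionResult:
    # Hypothetical result shape; only success/failed_at are implied by the design
    success: bool
    executed: List[str] = field(default_factory=list)
    failed_at: Optional[str] = None

def execute_chain(workflows: List[str],
                  run_workflow: Callable[[str], bool],
                  on_progress: Callable[[str], None]) -> ChainExecutionResult:
    result = ChainExecutionResult(success=True)
    for wf_id in workflows:
        on_progress(wf_id)  # in the real system this feeds the WebSocket
        if not run_workflow(wf_id):
            # Stop at the failed workflow; later workflows never run
            result.success = False
            result.failed_at = wf_id
            return result
        result.executed.append(wf_id)
    return result

res = execute_chain(["wf_login", "wf_data_entry", "wf_submit"],
                    run_workflow=lambda wf: wf != "wf_data_entry",
                    on_progress=lambda wf: None)
print(res.executed, res.failed_at)  # ['wf_login'] wf_data_entry
```

Returning the failure point rather than raising keeps partial progress available to the API layer for the WebSocket progress report.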
### 4. Trigger Manager (`core/monitoring/trigger_manager.py`)
```python
@dataclass
class Trigger:
    trigger_id: str
    trigger_type: str  # schedule, file, manual
    workflow_id: str
    config: Dict[str, Any]
    enabled: bool
    created_at: datetime
    last_fired: Optional[datetime] = None

class TriggerManager:
    def __init__(self, storage_path: Path): ...
    def list_triggers(self) -> List[Trigger]: ...
    def get_trigger(self, trigger_id: str) -> Optional[Trigger]: ...
    def create_trigger(self, trigger_type: str, workflow_id: str, config: Dict) -> Trigger: ...
    def validate_config(self, trigger_type: str, config: Dict) -> bool: ...
    def enable_trigger(self, trigger_id: str) -> bool: ...
    def disable_trigger(self, trigger_id: str) -> bool: ...
```
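For schedule triggers, `validate_config` has to reject a missing or non-positive interval and a malformed time window before the trigger is saved (Requirement 2.2). A possible sketch, where the required keys mirror the trigger storage example below and the exact bounds are assumptions:

```python
from typing import Any, Dict

def validate_schedule_config(config: Dict[str, Any]) -> bool:
    # interval_seconds is mandatory and must be a positive integer
    interval = config.get("interval_seconds")
    if not isinstance(interval, int) or interval <= 0:
        return False
    # start_time/end_time are an optional HH:MM window (assumption of this sketch)
    for key in ("start_time", "end_time"):
        value = config.get(key)
        if value is not None:
            try:
                hours, minutes = map(int, str(value).split(":"))
            except ValueError:
                return False
            if not (0 <= hours < 24 and 0 <= minutes < 60):
                return False
    return True

print(validate_schedule_config(
    {"interval_seconds": 3600, "start_time": "08:00", "end_time": "18:00"}))  # True
print(validate_schedule_config({"interval_seconds": 0}))  # False
```

Validating at creation time means a fired trigger never has to re-check its own config, which keeps the trigger loop simple.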
### 5. Log Exporter (`core/monitoring/log_exporter.py`)
```python
class LogExporter:
    def __init__(self, logs_path: Path): ...
    def export_to_zip(
        self,
        start_time: Optional[datetime] = None,
        end_time: Optional[datetime] = None
    ) -> io.BytesIO: ...
    def get_execution_logs(self, start: datetime, end: datetime) -> List[Dict]: ...
    def get_error_logs(self, start: datetime, end: datetime) -> List[Dict]: ...
    def get_metrics_summary(self) -> Dict: ...
```
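Returning `io.BytesIO` lets the archive be streamed to the browser without touching disk. A minimal sketch of `export_to_zip`; the three file names come from Requirement 5.2, while the in-memory list/dict parameters stand in for the real log stores:

```python
import io
import json
import zipfile
from typing import Dict, List

def export_to_zip(execution_logs: List[Dict], error_logs: List[Dict],
                  metrics: Dict) -> io.BytesIO:
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", zipfile.ZIP_DEFLATED) as archive:
        # The three required members from Requirement 5.2
        archive.writestr("execution_logs.json", json.dumps(execution_logs, indent=2))
        archive.writestr("error_logs.json", json.dumps(error_logs, indent=2))
        archive.writestr("metrics.json", json.dumps(metrics, indent=2))
    buffer.seek(0)  # rewind so the web layer can stream the buffer from the start
    return buffer

zip_bytes = export_to_zip([{"message": "Workflow started"}], [], {"error_rate": 0.0})
print(zipfile.ZipFile(zip_bytes).namelist())
# ['execution_logs.json', 'error_logs.json', 'metrics.json']
```

The `seek(0)` matters: without it, a downstream `send_file`-style call would read from the end of the buffer and return an empty body.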
### 6. API Endpoints (additions to `web_dashboard/app.py`)
```python
# Chains API
@app.route('/api/chains')
def api_chains(): ...
@app.route('/api/chains', methods=['POST'])
def create_chain(): ...
@app.route('/api/chains/<chain_id>/execute', methods=['POST'])
def execute_chain(chain_id): ...
# Triggers API
@app.route('/api/triggers')
def api_triggers(): ...
@app.route('/api/triggers', methods=['POST'])
def create_trigger(): ...
@app.route('/api/triggers/<trigger_id>/toggle', methods=['POST'])
def toggle_trigger(trigger_id): ...
# Logs API
@app.route('/api/logs/download')
def download_logs(): ...
# Metrics API
@app.route('/metrics')
def prometheus_metrics(): ...
```
## Data Models
### Chain Storage Format (`data/chains/*.json`)
```json
{
  "chain_id": "chain_001",
  "name": "Complete Process",
  "workflows": ["wf_login", "wf_data_entry", "wf_submit"],
  "status": "active",
  "created_at": "2024-11-29T10:00:00",
  "last_execution": "2024-11-29T14:30:00",
  "success_rate": 92.5,
  "execution_history": [
    {
      "timestamp": "2024-11-29T14:30:00",
      "success": true,
      "duration": 45.2,
      "failed_at": null
    }
  ]
}
```
### Trigger Storage Format (`data/triggers/*.json`)
```json
{
  "trigger_id": "trigger_001",
  "trigger_type": "schedule",
  "workflow_id": "wf_login",
  "config": {
    "interval_seconds": 3600,
    "start_time": "08:00",
    "end_time": "18:00"
  },
  "enabled": true,
  "created_at": "2024-11-29T10:00:00",
  "last_fired": "2024-11-29T14:00:00"
}
```
### Log Entry Format
```json
{
  "timestamp": "2024-11-29T14:30:15.123",
  "level": "INFO",
  "component": "execution",
  "message": "Workflow started",
  "workflow_id": "wf_001",
  "node_id": "login_node",
  "metadata": {
    "trigger": "schedule",
    "user": "system"
  }
}
```
## Correctness Properties
*A property is a characteristic or behavior that should hold true across all valid executions of a system: essentially, a formal statement about what the system should do. Properties serve as the bridge between human-readable specifications and machine-verifiable correctness guarantees.*
### Property 1: Chain listing completeness
*For any* set of chains stored in the system, the chains API endpoint SHALL return all chains with their complete workflow sequences and status information.
**Validates: Requirements 1.1**
### Property 2: Chain workflow validation
*For any* chain creation request with workflow references, if any referenced workflow does not exist, the creation SHALL fail with a validation error.
**Validates: Requirements 1.2**
### Property 3: Chain execution stops on failure
*For any* chain execution where a workflow fails, the chain execution SHALL stop at the failed workflow and not execute subsequent workflows.
**Validates: Requirements 1.4**
### Property 4: Trigger listing completeness
*For any* set of triggers stored in the system, the triggers API endpoint SHALL return all triggers with type, workflow_id, and enabled status.
**Validates: Requirements 2.1**
### Property 5: Trigger state persistence
*For any* trigger enable/disable operation, the new state SHALL be persisted and returned correctly on subsequent queries.
**Validates: Requirements 2.3**
### Property 6: Prometheus metrics format validity
*For any* request to the /metrics endpoint, the response SHALL be valid Prometheus exposition format parseable by Prometheus.
**Validates: Requirements 3.1**
### Property 7: Workflow execution counter increment
*For any* workflow execution (success or failure), the workflow_executions_total counter SHALL increment by exactly 1 with correct labels.
**Validates: Requirements 3.2**
### Property 8: Workflow duration histogram recording
*For any* completed workflow execution with a measured duration, the workflow_duration_seconds histogram SHALL record that duration.
**Validates: Requirements 3.3**
### Property 9: Log entry structure completeness
*For any* log entry created by the logging system, the entry SHALL contain timestamp, level, component, and message fields.
**Validates: Requirements 4.1**
### Property 10: Workflow log metadata inclusion
*For any* log entry created with workflow context, the entry metadata SHALL include workflow_id and node_id when provided.
**Validates: Requirements 4.2**
### Property 11: Log filtering correctness
*For any* log query with filter parameters, all returned entries SHALL match the specified filter criteria.
**Validates: Requirements 4.3**
### Property 12: Log counter synchronization
*For any* log entry written, the corresponding Prometheus log counter SHALL be incremented by 1.
**Validates: Requirements 4.4**
### Property 13: ZIP archive validity
*For any* log download request, the response SHALL be a valid ZIP archive.
**Validates: Requirements 5.1**
### Property 14: ZIP archive contents
*For any* log download, the ZIP archive SHALL contain execution_logs.json, error_logs.json, and metrics.json files.
**Validates: Requirements 5.2**
### Property 15: Date range filtering
*For any* log download with date range parameters, all log entries in the archive SHALL have timestamps within the specified range.
**Validates: Requirements 5.4**
## Error Handling
### Chain Errors
- `ChainNotFoundError`: Chain ID does not exist
- `WorkflowNotFoundError`: Referenced workflow does not exist
- `ChainExecutionError`: Error during chain execution with failure point
### Trigger Errors
- `TriggerNotFoundError`: Trigger ID does not exist
- `InvalidTriggerConfigError`: Trigger configuration is invalid
- `WorkflowNotFoundError`: Target workflow does not exist
### Log Errors
- `LogExportError`: Error generating log archive
- `InvalidDateRangeError`: Start date is after end date
## Testing Strategy
### Property-Based Testing Library
The implementation will use **Hypothesis** for Python property-based testing.
### Test Configuration
- Minimum 100 iterations per property test
- Each property test tagged with: `**Feature: admin-monitoring, Property {number}: {property_text}**`
### Unit Tests
- Test individual component methods
- Test API endpoint responses
- Test error handling paths
### Property-Based Tests
Each correctness property will have a corresponding property-based test:
1. **Property 1-2**: Generate random chain configurations, verify listing and validation
2. **Property 3**: Generate chains with failing workflows, verify execution stops
3. **Property 4-5**: Generate random triggers, verify listing and state persistence
4. **Property 6-8**: Generate workflow executions, verify metrics format and values
5. **Property 9-12**: Generate log entries, verify structure and counter sync
6. **Property 13-15**: Generate log data, verify ZIP contents and filtering
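As a concrete illustration of the shape these tests take, here is Property 15 (date range filtering) checked over random inputs. This sketch uses stdlib `random` in place of Hypothesis so it stays dependency-free; the real suite would express the same check with `@given` strategies:

```python
import random
from datetime import datetime, timedelta

def filter_by_range(entries, start, end):
    # Keep only entries whose timestamp falls inside [start, end]
    return [e for e in entries if start <= e["timestamp"] <= end]

random.seed(0)
base = datetime(2024, 11, 29)
for _ in range(100):  # 100 iterations, matching the test configuration above
    entries = [{"timestamp": base + timedelta(minutes=random.randint(0, 10_000))}
               for _ in range(20)]
    start = base + timedelta(minutes=random.randint(0, 5_000))
    end = start + timedelta(minutes=random.randint(0, 5_000))
    kept = filter_by_range(entries, start, end)
    # Property 15: every entry in the filtered output is within the range
    assert all(start <= e["timestamp"] <= end for e in kept)
print("Property 15 sketch holds on 100 random cases")
```

Each property test in the suite follows this generate-then-assert pattern; only the generators and the asserted invariant change.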
### Integration Tests
- End-to-end chain execution flow
- Trigger firing and workflow execution
- Log download with various filters

# Requirements Document
## Introduction
This document specifies the requirements for a comprehensive monitoring and administration system for RPA Vision V3. The system will provide an enriched admin interface with workflow chain management, trigger configuration, integrated Prometheus metrics, centralized logging, and log download.
## Glossary
- **Admin Dashboard**: Web administration interface for managing and monitoring the RPA system
- **Prometheus**: Open-source monitoring and alerting system for collecting and storing metrics
- **Workflow Chain**: Ordered sequence of workflows executed consecutively
- **Trigger**: Automatic mechanism that starts the execution of a workflow in response to an event
- **Log Entry**: Structured log record containing timestamp, level, component, and message
- **Metrics Endpoint**: HTTP endpoint exposing metrics in Prometheus format
## Requirements
### Requirement 1
**User Story:** As an administrator, I want to view and manage workflow chains, so that I can orchestrate complex multi-workflow processes.
#### Acceptance Criteria
1. WHEN an administrator accesses the chains section, THE Admin Dashboard SHALL display all configured workflow chains with their status and workflow sequence
2. WHEN an administrator creates a new chain, THE Admin Dashboard SHALL validate that all referenced workflows exist before saving
3. WHEN an administrator executes a chain, THE Admin Dashboard SHALL execute workflows in sequence and report progress via WebSocket
4. WHEN a workflow in a chain fails, THE Admin Dashboard SHALL stop chain execution and report the failure point
### Requirement 2
**User Story:** As an administrator, I want to configure and manage triggers, so that workflows can be automatically executed based on events.
#### Acceptance Criteria
1. WHEN an administrator accesses the triggers section, THE Admin Dashboard SHALL display all configured triggers with type, target workflow, and enabled status
2. WHEN an administrator creates a schedule trigger, THE Admin Dashboard SHALL validate the interval configuration and associate it with a workflow
3. WHEN an administrator enables or disables a trigger, THE Admin Dashboard SHALL update the trigger state and persist the change
4. WHEN a trigger fires, THE Admin Dashboard SHALL log the event and initiate the associated workflow execution
### Requirement 3
**User Story:** As a system operator, I want Prometheus metrics exposed, so that I can monitor system health and performance.
#### Acceptance Criteria
1. WHEN a monitoring system requests metrics, THE Admin Dashboard SHALL expose a /metrics endpoint returning Prometheus-formatted data
2. WHEN a workflow executes, THE Admin Dashboard SHALL increment workflow execution counters with workflow_id and status labels
3. WHEN a workflow completes, THE Admin Dashboard SHALL record execution duration in a histogram metric
4. WHEN the system state changes, THE Admin Dashboard SHALL update gauge metrics for active workflows and error rates
### Requirement 4
**User Story:** As a developer, I want a centralized logging system, so that I can track and debug system behavior across components.
#### Acceptance Criteria
1. WHEN a component logs an event, THE Logging System SHALL create a structured log entry with timestamp, level, component, and message
2. WHEN a workflow-related event occurs, THE Logging System SHALL include workflow_id and node_id in the log entry metadata
3. WHEN logs are requested via API, THE Admin Dashboard SHALL return recent log entries filtered by optional parameters
4. WHEN the logging system writes entries, THE Logging System SHALL also update Prometheus log counters by level and component
### Requirement 5
**User Story:** As an administrator, I want to download logs, so that I can analyze system behavior offline or share with support.
#### Acceptance Criteria
1. WHEN an administrator requests log download, THE Admin Dashboard SHALL generate a ZIP archive containing structured log files
2. WHEN generating the ZIP archive, THE Admin Dashboard SHALL include execution_logs.json, error_logs.json, and metrics.json files
3. WHEN the download completes, THE Admin Dashboard SHALL return the ZIP file with appropriate content-type and filename headers
4. WHEN an administrator specifies a date range, THE Admin Dashboard SHALL filter logs to include only entries within that range
### Requirement 6
**User Story:** As an administrator, I want the admin interface updated with new sections, so that I can access all monitoring features from one place.
#### Acceptance Criteria
1. WHEN an administrator loads the dashboard, THE Admin Dashboard SHALL display navigation links for Workflows, Chains, Triggers, Metrics, and Logs sections
2. WHEN an administrator switches sections, THE Admin Dashboard SHALL load section data dynamically via API calls
3. WHEN displaying the logs section, THE Admin Dashboard SHALL show a download button and refresh functionality
4. WHEN displaying metrics, THE Admin Dashboard SHALL show real-time system metrics with auto-refresh capability

# Implementation Plan
- [x] 1. Set up monitoring module structure
  - Create `core/monitoring/` directory with `__init__.py`
  - Define base interfaces and types
  - _Requirements: 3.1, 4.1_
- [x] 2. Implement centralized logging system
  - [x] 2.1 Create LogEntry dataclass and RPALogger class
    - Implement structured log entries with timestamp, level, component, message
    - Add workflow_id and node_id metadata support
    - Implement info, warning, error, debug methods
    - _Requirements: 4.1, 4.2_
  - [ ]* 2.2 Write property test for log entry structure
    - **Property 9: Log entry structure completeness**
    - **Validates: Requirements 4.1**
  - [ ]* 2.3 Write property test for workflow metadata inclusion
    - **Property 10: Workflow log metadata inclusion**
    - **Validates: Requirements 4.2**
  - [x] 2.4 Implement get_logger factory function
    - Create singleton pattern for component loggers
    - Configure file and console handlers
    - _Requirements: 4.1_
- [x] 3. Implement Prometheus metrics integration
  - [x] 3.1 Create metrics module with Counter, Histogram, Gauge definitions
    - Define workflow_executions_total counter
    - Define workflow_duration_seconds histogram
    - Define log_entries_total counter
    - Define active_workflows and error_rate gauges
    - _Requirements: 3.1, 3.2, 3.3, 3.4_
  - [ ]* 3.2 Write property test for metrics format validity
    - **Property 6: Prometheus metrics format validity**
    - **Validates: Requirements 3.1**
  - [x] 3.3 Integrate metrics with logger
    - Increment log_entries_total on each log write
    - _Requirements: 4.4_
  - [ ]* 3.4 Write property test for log counter synchronization
    - **Property 12: Log counter synchronization**
    - **Validates: Requirements 4.4**
- [ ] 4. Checkpoint - Ensure all tests pass
  - Ensure all tests pass; ask the user if questions arise.
- [x] 5. Implement Chain Manager
  - [x] 5.1 Create WorkflowChain dataclass and ChainManager class
    - Implement chain storage and retrieval
    - Implement list_chains and get_chain methods
    - _Requirements: 1.1_
  - [ ]* 5.2 Write property test for chain listing completeness
    - **Property 1: Chain listing completeness**
    - **Validates: Requirements 1.1**
  - [x] 5.3 Implement chain creation with workflow validation
    - Implement create_chain method
    - Implement validate_workflows_exist method
    - _Requirements: 1.2_
  - [ ]* 5.4 Write property test for chain workflow validation
    - **Property 2: Chain workflow validation**
    - **Validates: Requirements 1.2**
  - [x] 5.5 Implement chain execution logic
    - Execute workflows in sequence
    - Stop on failure and report failure point
    - _Requirements: 1.3, 1.4_
  - [ ]* 5.6 Write property test for chain execution failure handling
    - **Property 3: Chain execution stops on failure**
    - **Validates: Requirements 1.4**
- [x] 6. Implement Trigger Manager
  - [x] 6.1 Create Trigger dataclass and TriggerManager class
    - Implement trigger storage and retrieval
    - Implement list_triggers and get_trigger methods
    - _Requirements: 2.1_
  - [ ]* 6.2 Write property test for trigger listing completeness
    - **Property 4: Trigger listing completeness**
    - **Validates: Requirements 2.1**
  - [x] 6.3 Implement trigger creation with config validation
    - Implement create_trigger method
    - Implement validate_config method for schedule triggers
    - _Requirements: 2.2_
  - [x] 6.4 Implement trigger enable/disable functionality
    - Implement enable_trigger and disable_trigger methods
    - Persist state changes to storage
    - _Requirements: 2.3_
  - [ ]* 6.5 Write property test for trigger state persistence
    - **Property 5: Trigger state persistence**
    - **Validates: Requirements 2.3**
- [ ] 7. Checkpoint - Ensure all tests pass
  - Ensure all tests pass; ask the user if questions arise.
- [x] 8. Implement Log Exporter
  - [x] 8.1 Create LogExporter class with ZIP generation
    - Implement export_to_zip method
    - Generate execution_logs.json, error_logs.json, metrics.json
    - _Requirements: 5.1, 5.2_
  - [ ]* 8.2 Write property test for ZIP archive validity
    - **Property 13: ZIP archive validity**
    - **Validates: Requirements 5.1**
  - [ ]* 8.3 Write property test for ZIP archive contents
    - **Property 14: ZIP archive contents**
    - **Validates: Requirements 5.2**
  - [x] 8.4 Implement date range filtering for log export
    - Filter logs by start_time and end_time parameters
    - _Requirements: 5.4_
  - [ ]* 8.5 Write property test for date range filtering
    - **Property 15: Date range filtering**
    - **Validates: Requirements 5.4**
- [x] 9. Implement workflow metrics tracking
  - [x] 9.1 Add workflow execution counter increment
    - Increment counter on workflow start/end
    - Include workflow_id and status labels
    - _Requirements: 3.2_
  - [ ]* 9.2 Write property test for counter increment
    - **Property 7: Workflow execution counter increment**
    - **Validates: Requirements 3.2**
  - [x] 9.3 Add workflow duration histogram recording
    - Record duration on workflow completion
    - _Requirements: 3.3_
  - [ ]* 9.4 Write property test for histogram recording
    - **Property 8: Workflow duration histogram recording**
    - **Validates: Requirements 3.3**
- [x] 10. Update web dashboard API
  - [x] 10.1 Add chains API endpoints
    - GET /api/chains - list all chains
    - POST /api/chains - create new chain
    - POST /api/chains/<id>/execute - execute chain
    - _Requirements: 1.1, 1.2, 1.3_
  - [x] 10.2 Add triggers API endpoints
    - GET /api/triggers - list all triggers
    - POST /api/triggers - create new trigger
    - POST /api/triggers/<id>/toggle - enable/disable trigger
    - _Requirements: 2.1, 2.2, 2.3_
  - [x] 10.3 Add logs download endpoint
    - GET /api/logs/download - download logs as ZIP
    - Support date range query parameters
    - _Requirements: 5.1, 5.3, 5.4_
  - [x] 10.4 Add Prometheus metrics endpoint
    - GET /metrics - return Prometheus-formatted metrics
    - _Requirements: 3.1_
  - [ ]* 10.5 Write property test for log filtering via API
    - **Property 11: Log filtering correctness**
    - **Validates: Requirements 4.3**
- [x] 11. Update admin interface HTML
  - [x] 11.1 Add navigation links for new sections
    - Add Chains, Triggers, Metrics links to navbar
    - _Requirements: 6.1_
  - [x] 11.2 Create chains section with dynamic loading
    - Display chain cards with workflow sequence
    - Add create and execute buttons
    - _Requirements: 6.2_
  - [x] 11.3 Create triggers section with dynamic loading
    - Display trigger cards with type and status
    - Add enable/disable toggle
    - _Requirements: 6.2_
  - [x] 11.4 Update logs section with download button
    - Add download ZIP button
    - Add refresh functionality
    - _Requirements: 6.3_
  - [x] 11.5 Create metrics section with real-time display
    - Show key metrics with auto-refresh
    - _Requirements: 6.4_
- [x] 12. Final Checkpoint - Ensure all tests pass
  - Ensure all tests pass; ask the user if questions arise.