Design Document: Admin Monitoring System
Overview
This design document describes the architecture and implementation of a comprehensive monitoring and administration system for RPA Vision V3. The system extends the existing web dashboard with workflow chain management, trigger configuration, Prometheus metrics integration, centralized logging, and log download capabilities.
Architecture
graph TB
    subgraph "Admin Dashboard"
        UI[Web Interface]
        API[Flask API]
        WS[WebSocket Handler]
    end
    subgraph "Monitoring Core"
        Logger[Centralized Logger]
        Metrics[Prometheus Metrics]
        Collector[Metrics Collector]
    end
    subgraph "Management"
        ChainMgr[Chain Manager]
        TriggerMgr[Trigger Manager]
        LogExporter[Log Exporter]
    end
    subgraph "Storage"
        ChainStore[(Chains JSON)]
        TriggerStore[(Triggers JSON)]
        LogStore[(Log Files)]
    end
    UI --> API
    UI --> WS
    API --> ChainMgr
    API --> TriggerMgr
    API --> Logger
    API --> LogExporter
    API --> Metrics
    ChainMgr --> ChainStore
    TriggerMgr --> TriggerStore
    Logger --> LogStore
    Logger --> Metrics
    Collector --> Metrics
    WS --> Collector
Components and Interfaces
1. Centralized Logger (core/monitoring/logger.py)
@dataclass
class LogEntry:
    timestamp: datetime
    level: str  # INFO, WARNING, ERROR, DEBUG
    component: str
    message: str
    workflow_id: Optional[str] = None
    node_id: Optional[str] = None
    metadata: Dict[str, Any] = field(default_factory=dict)

    def to_dict(self) -> Dict[str, Any]: ...

class RPALogger:
    def __init__(self, component: str, log_file: Optional[str] = None): ...
    def info(self, message: str, workflow_id: Optional[str] = None, **metadata): ...
    def warning(self, message: str, workflow_id: Optional[str] = None, **metadata): ...
    def error(self, message: str, workflow_id: Optional[str] = None, **metadata): ...
    def debug(self, message: str, workflow_id: Optional[str] = None, **metadata): ...
    def workflow_start(self, workflow_id: str, **metadata): ...
    def workflow_end(self, workflow_id: str, success: bool, duration: float): ...
    def get_recent_logs(self, limit: int = 100) -> List[LogEntry]: ...
    def export_logs(self, start_time: Optional[datetime] = None, end_time: Optional[datetime] = None) -> str: ...

def get_logger(component: str) -> RPALogger: ...
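A minimal in-memory sketch of how the logger could buffer entries. The file-backed persistence and Prometheus counter hook are omitted, and `max_entries` is an assumed ring-buffer bound, not part of the interface above:

```python
from dataclasses import dataclass, field
from datetime import datetime
from typing import Any, Dict, List, Optional

@dataclass
class LogEntry:
    timestamp: datetime
    level: str
    component: str
    message: str
    workflow_id: Optional[str] = None
    node_id: Optional[str] = None
    metadata: Dict[str, Any] = field(default_factory=dict)

class RPALogger:
    """In-memory sketch; the real implementation would also write to log_file."""
    def __init__(self, component: str, max_entries: int = 10_000):
        self.component = component
        self._entries: List[LogEntry] = []
        self._max = max_entries

    def _log(self, level, message, workflow_id=None, node_id=None, **metadata):
        entry = LogEntry(datetime.now(), level, self.component, message,
                         workflow_id, node_id, metadata)
        self._entries.append(entry)
        if len(self._entries) > self._max:
            self._entries.pop(0)  # drop oldest entry once the buffer is full
        return entry

    def info(self, message, workflow_id=None, **metadata):
        return self._log("INFO", message, workflow_id, **metadata)

    def error(self, message, workflow_id=None, **metadata):
        return self._log("ERROR", message, workflow_id, **metadata)

    def get_recent_logs(self, limit: int = 100) -> List[LogEntry]:
        return self._entries[-limit:]

logger = RPALogger("execution")
logger.info("Workflow started", workflow_id="wf_001", trigger="schedule")
```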
2. Prometheus Metrics (core/monitoring/metrics.py)
# Counters
workflow_executions_total = Counter(
    'workflow_executions_total',
    'Total workflow executions',
    ['workflow_id', 'status']
)
log_entries_total = Counter(
    'log_entries_total',
    'Total log entries',
    ['level', 'component']
)
chain_executions_total = Counter(
    'chain_executions_total',
    'Total chain executions',
    ['chain_id', 'status']
)
trigger_fires_total = Counter(
    'trigger_fires_total',
    'Total trigger fires',
    ['trigger_type', 'workflow_id']
)

# Histograms
workflow_duration_seconds = Histogram(
    'workflow_duration_seconds',
    'Workflow execution duration',
    ['workflow_id']
)

# Gauges
active_workflows = Gauge('active_workflows', 'Number of active workflows')
error_rate = Gauge('error_rate', 'Current error rate percentage')
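To illustrate how the executor could drive these metrics (Properties 7 and 8), here is a hedged sketch of a context-manager instrumentation helper. The `_FakeCounter`/`_FakeHistogram` classes are stand-ins mimicking only the `labels(...).inc()` / `labels(...).observe()` surface of prometheus_client, so the snippet runs without the library installed; `track_workflow` itself is an assumed helper, not part of the design above:

```python
import time
from collections import defaultdict
from contextlib import contextmanager

class _FakeCounter:
    """Stand-in for prometheus_client.Counter (labels().inc() only)."""
    def __init__(self):
        self.values = defaultdict(float)
    def labels(self, *label_values):
        counter, key = self, label_values
        class _Child:
            def inc(self, amount=1.0):
                counter.values[key] += amount
        return _Child()

class _FakeHistogram:
    """Stand-in for prometheus_client.Histogram (labels().observe() only)."""
    def __init__(self):
        self.observations = defaultdict(list)
    def labels(self, *label_values):
        hist, key = self, label_values
        class _Child:
            def observe(self, value):
                hist.observations[key].append(value)
        return _Child()

workflow_executions_total = _FakeCounter()
workflow_duration_seconds = _FakeHistogram()

@contextmanager
def track_workflow(workflow_id: str):
    """Increment the execution counter and record duration around one run."""
    start = time.perf_counter()
    status = "success"
    try:
        yield
    except Exception:
        status = "failure"
        raise
    finally:
        # Property 7: exactly one increment per execution, success or failure.
        workflow_executions_total.labels(workflow_id, status).inc()
        # Property 8: every completed execution records its duration.
        workflow_duration_seconds.labels(workflow_id).observe(
            time.perf_counter() - start)

with track_workflow("wf_login"):
    pass  # the real workflow body would run here
```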
3. Chain Manager (core/monitoring/chain_manager.py)
@dataclass
class WorkflowChain:
    chain_id: str
    name: str
    workflows: List[str]  # Ordered list of workflow_ids
    status: str  # active, inactive, running
    created_at: datetime
    last_execution: Optional[datetime] = None
    success_rate: float = 0.0

class ChainManager:
    def __init__(self, storage_path: Path): ...
    def list_chains(self) -> List[WorkflowChain]: ...
    def get_chain(self, chain_id: str) -> Optional[WorkflowChain]: ...
    def create_chain(self, name: str, workflows: List[str]) -> WorkflowChain: ...
    def validate_workflows_exist(self, workflow_ids: List[str]) -> bool: ...
    def execute_chain(self, chain_id: str, on_progress: Callable) -> ChainExecutionResult: ...
    def delete_chain(self, chain_id: str) -> bool: ...
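The stop-on-failure contract (Property 3) can be sketched as a free function over an ordered workflow list; `run_workflow` is a hypothetical callable standing in for the real workflow executor, and this `ChainExecutionResult` shape is an assumption:

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional

@dataclass
class ChainExecutionResult:
    success: bool
    executed: List[str] = field(default_factory=list)
    failed_at: Optional[str] = None

def execute_chain(workflows: List[str],
                  run_workflow: Callable[[str], bool],
                  on_progress: Callable[[str, bool], None] = lambda w, ok: None
                  ) -> ChainExecutionResult:
    """Run workflows in order; stop at the first failure (Property 3)."""
    result = ChainExecutionResult(success=True)
    for wf_id in workflows:
        ok = run_workflow(wf_id)
        result.executed.append(wf_id)
        on_progress(wf_id, ok)
        if not ok:
            # Record the failure point and skip all subsequent workflows.
            result.success = False
            result.failed_at = wf_id
            break
    return result
```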
4. Trigger Manager (core/monitoring/trigger_manager.py)
@dataclass
class Trigger:
    trigger_id: str
    trigger_type: str  # schedule, file, manual
    workflow_id: str
    config: Dict[str, Any]
    enabled: bool
    created_at: datetime
    last_fired: Optional[datetime] = None

class TriggerManager:
    def __init__(self, storage_path: Path): ...
    def list_triggers(self) -> List[Trigger]: ...
    def get_trigger(self, trigger_id: str) -> Optional[Trigger]: ...
    def create_trigger(self, trigger_type: str, workflow_id: str, config: Dict) -> Trigger: ...
    def validate_config(self, trigger_type: str, config: Dict) -> bool: ...
    def enable_trigger(self, trigger_id: str) -> bool: ...
    def disable_trigger(self, trigger_id: str) -> bool: ...
    def delete_trigger(self, trigger_id: str) -> bool: ...
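As an example of what `validate_config` could check for the `schedule` trigger type, here is a hedged sketch. The rules (positive `interval_seconds`, optional `HH:MM` window with start before end) are assumptions inferred from the storage format below, not a normative spec:

```python
from typing import Any, Dict

def validate_schedule_config(config: Dict[str, Any]) -> bool:
    """Sketch of validation for the 'schedule' trigger type."""
    interval = config.get("interval_seconds")
    if not isinstance(interval, (int, float)) or interval <= 0:
        return False
    window = []
    for key in ("start_time", "end_time"):
        value = config.get(key)
        if value is None:
            continue  # both window bounds are optional in this sketch
        try:
            hours, minutes = value.split(":")
            window.append(int(hours) * 60 + int(minutes))
        except (ValueError, AttributeError):
            return False  # not an HH:MM string
        if not (0 <= window[-1] < 24 * 60):
            return False  # out of the valid time-of-day range
    if len(window) == 2 and window[0] >= window[1]:
        return False  # window must start before it ends
    return True
```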
5. Log Exporter (core/monitoring/log_exporter.py)
class LogExporter:
    def __init__(self, logs_path: Path): ...
    def export_to_zip(
        self,
        start_time: Optional[datetime] = None,
        end_time: Optional[datetime] = None
    ) -> io.BytesIO: ...
    def get_execution_logs(self, start: datetime, end: datetime) -> List[Dict]: ...
    def get_error_logs(self, start: datetime, end: datetime) -> List[Dict]: ...
    def get_metrics_summary(self) -> Dict: ...
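One way `export_to_zip` could assemble the archive entirely in memory (Properties 13 and 14), sketched here as a free function; the already-filtered log lists are assumed to come from `get_execution_logs`/`get_error_logs`:

```python
import io
import json
import zipfile

def export_to_zip(execution_logs, error_logs, metrics_summary) -> io.BytesIO:
    """Build the ZIP archive described in Property 14 in memory."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", zipfile.ZIP_DEFLATED) as archive:
        # Property 14: the archive always contains these three files.
        archive.writestr("execution_logs.json", json.dumps(execution_logs, indent=2))
        archive.writestr("error_logs.json", json.dumps(error_logs, indent=2))
        archive.writestr("metrics.json", json.dumps(metrics_summary, indent=2))
    buffer.seek(0)  # rewind so the web layer can stream it as a download
    return buffer
```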
6. API Endpoints (additions to web_dashboard/app.py)
# Chains API
@app.route('/api/chains')
def api_chains(): ...
@app.route('/api/chains', methods=['POST'])
def create_chain(): ...
@app.route('/api/chains/<chain_id>/execute', methods=['POST'])
def execute_chain(chain_id): ...
# Triggers API
@app.route('/api/triggers')
def api_triggers(): ...
@app.route('/api/triggers', methods=['POST'])
def create_trigger(): ...
@app.route('/api/triggers/<trigger_id>/toggle', methods=['POST'])
def toggle_trigger(trigger_id): ...
# Logs API
@app.route('/api/logs/download')
def download_logs(): ...
# Metrics API
@app.route('/metrics')
def prometheus_metrics(): ...
Data Models
Chain Storage Format (data/chains/*.json)
{
  "chain_id": "chain_001",
  "name": "Complete Process",
  "workflows": ["wf_login", "wf_data_entry", "wf_submit"],
  "status": "active",
  "created_at": "2024-11-29T10:00:00",
  "last_execution": "2024-11-29T14:30:00",
  "success_rate": 92.5,
  "execution_history": [
    {
      "timestamp": "2024-11-29T14:30:00",
      "success": true,
      "duration": 45.2,
      "failed_at": null
    }
  ]
}
Trigger Storage Format (data/triggers/*.json)
{
  "trigger_id": "trigger_001",
  "trigger_type": "schedule",
  "workflow_id": "wf_login",
  "config": {
    "interval_seconds": 3600,
    "start_time": "08:00",
    "end_time": "18:00"
  },
  "enabled": true,
  "created_at": "2024-11-29T10:00:00",
  "last_fired": "2024-11-29T14:00:00"
}
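As an example of how the scheduler could interpret this config, here is a hedged sketch of the fire decision. The semantics (fire only inside the `start_time`/`end_time` window, and only after `interval_seconds` have elapsed since `last_fired`) are assumptions, not a normative spec:

```python
from datetime import datetime, timedelta
from typing import Optional

def should_fire(now: datetime,
                last_fired: Optional[datetime],
                interval_seconds: int,
                start_time: str = "00:00",
                end_time: str = "23:59") -> bool:
    """Decide whether a schedule trigger fires at `now` (sketch)."""
    def minutes(hhmm: str) -> int:
        h, m = hhmm.split(":")
        return int(h) * 60 + int(m)

    # Outside the configured daily window: never fire.
    now_minutes = now.hour * 60 + now.minute
    if not (minutes(start_time) <= now_minutes <= minutes(end_time)):
        return False
    # Never fired before: fire immediately inside the window.
    if last_fired is None:
        return True
    # Otherwise fire only once the interval has elapsed.
    return now - last_fired >= timedelta(seconds=interval_seconds)
```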
Log Entry Format
{
  "timestamp": "2024-11-29T14:30:15.123",
  "level": "INFO",
  "component": "execution",
  "message": "Workflow started",
  "workflow_id": "wf_001",
  "node_id": "login_node",
  "metadata": {
    "trigger": "schedule",
    "user": "system"
  }
}
Correctness Properties
A property is a characteristic or behavior that should hold true across all valid executions of a system: essentially, a formal statement about what the system should do. Properties serve as the bridge between human-readable specifications and machine-verifiable correctness guarantees.
Property 1: Chain listing completeness
For any set of chains stored in the system, the chains API endpoint SHALL return all chains with their complete workflow sequences and status information. Validates: Requirements 1.1
Property 2: Chain workflow validation
For any chain creation request with workflow references, if any referenced workflow does not exist, the creation SHALL fail with a validation error. Validates: Requirements 1.2
Property 3: Chain execution stops on failure
For any chain execution where a workflow fails, the chain execution SHALL stop at the failed workflow and not execute subsequent workflows. Validates: Requirements 1.4
Property 4: Trigger listing completeness
For any set of triggers stored in the system, the triggers API endpoint SHALL return all triggers with type, workflow_id, and enabled status. Validates: Requirements 2.1
Property 5: Trigger state persistence
For any trigger enable/disable operation, the new state SHALL be persisted and returned correctly on subsequent queries. Validates: Requirements 2.3
Property 6: Prometheus metrics format validity
For any request to the /metrics endpoint, the response SHALL be valid Prometheus exposition format parseable by Prometheus. Validates: Requirements 3.1
Property 7: Workflow execution counter increment
For any workflow execution (success or failure), the workflow_executions_total counter SHALL increment by exactly 1 with correct labels. Validates: Requirements 3.2
Property 8: Workflow duration histogram recording
For any completed workflow execution with a measured duration, the workflow_duration_seconds histogram SHALL record that duration. Validates: Requirements 3.3
Property 9: Log entry structure completeness
For any log entry created by the logging system, the entry SHALL contain timestamp, level, component, and message fields. Validates: Requirements 4.1
Property 10: Workflow log metadata inclusion
For any log entry created with workflow context, the entry metadata SHALL include workflow_id and node_id when provided. Validates: Requirements 4.2
Property 11: Log filtering correctness
For any log query with filter parameters, all returned entries SHALL match the specified filter criteria. Validates: Requirements 4.3
Property 12: Log counter synchronization
For any log entry written, the corresponding Prometheus log counter SHALL be incremented by 1. Validates: Requirements 4.4
Property 13: ZIP archive validity
For any log download request, the response SHALL be a valid ZIP archive. Validates: Requirements 5.1
Property 14: ZIP archive contents
For any log download, the ZIP archive SHALL contain execution_logs.json, error_logs.json, and metrics.json files. Validates: Requirements 5.2
Property 15: Date range filtering
For any log download with date range parameters, all log entries in the archive SHALL have timestamps within the specified range. Validates: Requirements 5.4
Error Handling
Chain Errors
- ChainNotFoundError: Chain ID does not exist
- WorkflowNotFoundError: Referenced workflow does not exist
- ChainExecutionError: Error during chain execution with failure point
Trigger Errors
- TriggerNotFoundError: Trigger ID does not exist
- InvalidTriggerConfigError: Trigger configuration is invalid
- WorkflowNotFoundError: Target workflow does not exist
Log Errors
- LogExportError: Error generating log archive
- InvalidDateRangeError: Start date is after end date
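These error types could be declared as a small exception hierarchy; the shared `MonitoringError` base class and the IDs carried on each exception are assumptions for illustration:

```python
class MonitoringError(Exception):
    """Assumed common base class for admin-monitoring errors."""

class ChainNotFoundError(MonitoringError):
    def __init__(self, chain_id: str):
        super().__init__(f"Chain not found: {chain_id}")
        self.chain_id = chain_id

class WorkflowNotFoundError(MonitoringError):
    def __init__(self, workflow_id: str):
        super().__init__(f"Workflow not found: {workflow_id}")
        self.workflow_id = workflow_id

class InvalidDateRangeError(MonitoringError):
    def __init__(self, start, end):
        super().__init__(f"Start date {start} is after end date {end}")
        self.start, self.end = start, end
```

A common base lets the Flask layer map any `MonitoringError` to a structured JSON error response in one handler.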
Testing Strategy
Property-Based Testing Library
The implementation will use Hypothesis for Python property-based testing.
Test Configuration
- Minimum 100 iterations per property test
- Each property test tagged with:
**Feature: admin-monitoring, Property {number}: {property_text}**
Unit Tests
- Test individual component methods
- Test API endpoint responses
- Test error handling paths
Property-Based Tests
Each correctness property will have a corresponding property-based test:
- Property 1-2: Generate random chain configurations, verify listing and validation
- Property 3: Generate chains with failing workflows, verify execution stops
- Property 4-5: Generate random triggers, verify listing and state persistence
- Property 6-8: Generate workflow executions, verify metrics format and values
- Property 9-12: Generate log entries, verify structure and counter sync
- Property 13-15: Generate log data, verify ZIP contents and filtering
Integration Tests
- End-to-end chain execution flow
- Trigger firing and workflow execution
- Log download with various filters