feat: functional E2E replay (25/25 actions, 0 retries, SomEngine via server)

Validated on a Windows PC (DESKTOP-58D5CAC, 2560x1600):
- 8 clicks resolved visually (1 anchor_template, 1 som_text_match, 6 som_vlm)
- Average score 0.75, average time 1.6 s
- Text typed correctly (bonjour, test word, date, email)
- 0 retries, 2 unverified actions (OK)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Dom
2026-03-31 14:04:41 +02:00
parent 5e0b53cfd1
commit a7de6a488b
79542 changed files with 6091757 additions and 1 deletion


@@ -0,0 +1,292 @@
# Design Document - Hybrid Auto-Heal (Fiche #22)
## Overview
The Hybrid Auto-Heal system strikes a balance between service continuity and safety. It has three parts:
- a state machine that manages transitions between execution modes;
- circuit breakers that prevent infinite failure loops;
- a versioning system that allows learning to be rolled back.

The architecture builds on existing systems: Fiche #19 for failure capture, Fiche #18 for persistent learning and Fiche #16 for reports. On top of them it adds an intelligent supervision and protection layer.
## Architecture
```mermaid
graph TB
    subgraph "Auto-Heal Hybrid System"
        AHM[Auto Heal Manager]
        CB[Circuit Breaker]
        VS[Versioned Store]
        PC[Policy Config]
    end
    subgraph "Execution Layer"
        EL[Execution Loop]
        AE[Action Executor]
        TR[Target Resolver]
    end
    subgraph "Learning Layer"
        TMS[Target Memory Store]
        FAISS[FAISS Index]
        PROTO[Prototypes]
    end
    subgraph "Integration Layer"
        FCR[Failure Case Recorder]
        SR[Simulation Report]
        PM[Precision Metrics]
    end
    EL --> AHM
    AHM --> CB
    AHM --> VS
    AHM --> PC
    AHM --> FCR
    AHM --> SR
    AHM --> PM
    VS --> TMS
    VS --> FAISS
    VS --> PROTO
    AHM --> AE
    AE --> TR
```
## Components and Interfaces
### 1. Auto Heal Manager (core/system/auto_heal_manager.py)
**Responsibility:** Central manager for execution states and safety policies.
```python
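from __future__ import annotations  # lets ExecutionResult stay a forward reference

from enum import Enum
from pathlib import Path
from typing import Any, Dict, Tuple


class ExecutionState(Enum):
    RUNNING = "running"
    DEGRADED = "degraded"
    QUARANTINED = "quarantined"
    ROLLBACK = "rollback"
    PAUSED = "paused"


class AutoHealManager:
    def __init__(self, policy_path: Path = Path("data/config/auto_heal_policy.json")) -> None: ...
    # Hook called before each step: returns (allowed, reason)
    def should_execute_step(self, workflow_id: str, step_id: str) -> Tuple[bool, str]: ...
    # Hook called after each step; ExecutionResult is the execution layer's result type
    def on_step_result(self, workflow_id: str, step_id: str, result: ExecutionResult) -> None: ...
    def get_mode(self, workflow_id: str) -> ExecutionState: ...
    def force_transition(self, workflow_id: str, new_state: ExecutionState, reason: str) -> None: ...
    def get_status_report(self) -> Dict[str, Any]: ...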
```
**Integration with the execution loop:**
```python
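# In execution_loop.py or action_executor.py (run_step is an illustrative wrapper name)
def run_step(workflow_id: str, step_id: str) -> ExecutionResult:
    # Before execution: ask the manager whether this step may run
    allowed, reason = auto_heal_manager.should_execute_step(workflow_id, step_id)
    if not allowed:
        return ExecutionResult(status=ExecutionStatus.BLOCKED, message=reason)
    # Execute the action...
    result = execute_action(...)
    # After execution: report the outcome so the manager can update the workflow state
    auto_heal_manager.on_step_result(workflow_id, step_id, result)
    return result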
```
### 2. Circuit Breaker (core/system/circuit_breaker.py)
**Responsibility:** Anti-loop mechanism based on sliding windows.
```python
from typing import Any, Dict


class CircuitBreaker:
    def __init__(self, policy: Dict[str, Any]) -> None: ...
    def record_failure(self, workflow_id: str, step_id: str, failure_type: str) -> None: ...
    def record_success(self, workflow_id: str, step_id: str) -> None: ...
    def should_trigger_degraded(self, workflow_id: str, step_id: str) -> bool: ...
    def should_trigger_quarantine(self, workflow_id: str) -> bool: ...
    def should_trigger_global_pause(self) -> bool: ...
    def get_failure_counts(self, workflow_id: str) -> Dict[str, int]: ...
```
**Sliding windows:**
- Step level: 3 consecutive failures → DEGRADED
- Workflow level: 10 failures within 10 minutes → QUARANTINED
- Global level: 30 failures within 10 minutes → PAUSE (optional)
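The three thresholds can be implemented with two kinds of state:
- a per-step counter of consecutive failures;
- deques of failure timestamps, one per workflow plus one global.

This is a minimal sketch of the `CircuitBreaker` interface above, not the project's `core/system/circuit_breaker.py`. It makes these assumptions:
- A success resets the consecutive-failure streak of its step.
- The global window reuses `workflow_fail_window_s`, because the policy file defines no separate global window.
- `failure_type` is accepted but not used.
- The `clock` parameter is an addition so that tests can control time.

`get_failure_counts` is omitted.

```python
import time
from collections import defaultdict, deque
from typing import Any, Callable, Deque, Dict, Tuple


class CircuitBreaker:
    """Sliding-window failure counter (illustrative sketch)."""

    def __init__(self, policy: Dict[str, Any], clock: Callable[[], float] = time.monotonic) -> None:
        self.streak_limit = policy.get("step_fail_streak_to_degraded", 3)
        # Assumption: the global window reuses the workflow window length.
        self.window_s = policy.get("workflow_fail_window_s", 600)
        self.workflow_max = policy.get("workflow_fail_max_in_window", 10)
        self.global_max = policy.get("global_fail_max_in_window", 30)
        self.clock = clock  # injectable clock (hypothetical, for testing)
        self.streaks: Dict[Tuple[str, str], int] = defaultdict(int)
        self.workflow_failures: Dict[str, Deque[float]] = defaultdict(deque)
        self.global_failures: Deque[float] = deque()

    def _prune(self, window: Deque[float]) -> None:
        # Drop timestamps that have slid out of the window.
        cutoff = self.clock() - self.window_s
        while window and window[0] <= cutoff:
            window.popleft()

    def record_failure(self, workflow_id: str, step_id: str, failure_type: str) -> None:
        # failure_type is not used by this sketch.
        now = self.clock()
        self.streaks[(workflow_id, step_id)] += 1
        self.workflow_failures[workflow_id].append(now)
        self.global_failures.append(now)
        self._prune(self.workflow_failures[workflow_id])
        self._prune(self.global_failures)

    def record_success(self, workflow_id: str, step_id: str) -> None:
        # Assumption: any success resets the consecutive-failure streak of that step.
        self.streaks[(workflow_id, step_id)] = 0

    def should_trigger_degraded(self, workflow_id: str, step_id: str) -> bool:
        return self.streaks[(workflow_id, step_id)] >= self.streak_limit

    def should_trigger_quarantine(self, workflow_id: str) -> bool:
        window = self.workflow_failures[workflow_id]
        self._prune(window)
        return len(window) >= self.workflow_max

    def should_trigger_global_pause(self) -> bool:
        self._prune(self.global_failures)
        return len(self.global_failures) >= self.global_max
```

Deques keep each check cheap: expired timestamps are popped from the left, so the cost is proportional to the failures that left the window, not to the full history.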
### 3. Versioned Store (core/learning/versioned_store.py)
**Responsibility:** Versioning system for reversible learning.
```python
from __future__ import annotations  # VersionInfo is defined in the data models

from pathlib import Path
from typing import List, Optional


class VersionedStore:
    def __init__(self, base_path: Path = Path("data")) -> None: ...
    def snapshot_version(self, workflow_id: str) -> str: ...
    def rollback_to_previous(self, workflow_id: str, version: Optional[str] = None) -> bool: ...
    def list_versions(self, workflow_id: str) -> List[VersionInfo]: ...
    def cleanup_old_versions(self, workflow_id: str, keep_count: int = 5) -> None: ...

    # Per-component versioning
    def version_prototypes(self, workflow_id: str, version: str) -> None: ...
    def version_faiss_index(self, workflow_id: str, version: str) -> None: ...
    def version_target_memory(self, workflow_id: str, version: str) -> None: ...
```
**Versioning layout:**
```
data/
├── learning/
│   └── prototypes/
│       ├── v001/          # Version snapshots
│       └── v002/
├── faiss_index/
│   └── workflow_<id>/
│       ├── v001/          # Versioned indices
│       └── v002/
└── target_memory_snapshots/
    ├── v001.db            # SQLite snapshots
    └── v002.db
```
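The simplest component of this layout is the SQLite target memory (`target_memory_snapshots/vNNN.db`). This is a minimal sketch for that component only, not the project's `VersionedStore`. It makes these assumptions:
- `TargetMemorySnapshots` is a hypothetical class name.
- There is a single live database path.
- A snapshot is taken *before* each learning update, so the newest snapshot is the default rollback target.

It uses the standard library's `sqlite3.Connection.backup`, which can copy a database that is in use.

```python
import re
import sqlite3
from pathlib import Path
from typing import List, Optional


class TargetMemorySnapshots:
    """Hypothetical helper: versions a live SQLite DB as vNNN.db files."""

    def __init__(self, live_db: Path, snapshot_dir: Path) -> None:
        self.live_db = live_db
        self.snapshot_dir = snapshot_dir
        self.snapshot_dir.mkdir(parents=True, exist_ok=True)

    def list_versions(self) -> List[str]:
        # Zero-padded names sort correctly as strings.
        return sorted(p.stem for p in self.snapshot_dir.glob("v*.db")
                      if re.fullmatch(r"v\d{3}", p.stem))

    @staticmethod
    def _copy(src_path: Path, dst_path: Path) -> None:
        src, dst = sqlite3.connect(src_path), sqlite3.connect(dst_path)
        try:
            src.backup(dst)  # replaces the whole content of dst
        finally:
            dst.close()
            src.close()

    def snapshot(self) -> str:
        versions = self.list_versions()
        version = f"v{(int(versions[-1][1:]) + 1) if versions else 1:03d}"
        self._copy(self.live_db, self.snapshot_dir / f"{version}.db")
        return version

    def rollback(self, version: Optional[str] = None) -> bool:
        versions = self.list_versions()
        version = version or (versions[-1] if versions else None)
        if version not in versions:
            return False
        self._copy(self.snapshot_dir / f"{version}.db", self.live_db)
        return True

    def cleanup(self, keep_count: int = 5) -> None:
        versions = self.list_versions()
        for version in versions[: max(len(versions) - keep_count, 0)]:
            (self.snapshot_dir / f"{version}.db").unlink()
```

Prototypes and FAISS index directories would need a directory copy (for example `shutil.copytree`) instead of the SQLite backup API. The three copies must share one version id to satisfy Property 4.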
### 4. Policy Configuration (data/config/auto_heal_policy.json)
**Configuration structure:**
```json
{
"mode": "hybrid",
"step_fail_streak_to_degraded": 3,
"workflow_fail_window_s": 600,
"workflow_fail_max_in_window": 10,
"global_fail_max_in_window": 30,
"min_confidence_normal": 0.72,
"min_confidence_degraded": 0.82,
"min_margin_top1_top2_degraded": 0.08,
"disable_learning_in_degraded": true,
"rollback_on_regression": true,
"regression_window_steps": 50,
"regression_fail_ratio": 0.20,
"quarantine_duration_s": 1800,
"max_versions_to_keep": 5
}
```
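Task 1.1 of the implementation plan calls for a `PolicyConfig` class with validation and hot reload. This is a minimal sketch under these assumptions:
- Defaults mirror the JSON above, and unknown keys are rejected.
- The validation rules are illustrative choices: the allowed modes come from Requirement 6.3, and the degraded threshold must be at least the normal one.
- Hot reload polls the file's modification time and keeps the last valid configuration when a new file fails to parse or validate.

`PolicyWatcher` is a hypothetical name.

```python
import json
import os
from dataclasses import dataclass, fields
from pathlib import Path
from typing import Optional


@dataclass(frozen=True)
class PolicyConfig:
    mode: str = "hybrid"
    step_fail_streak_to_degraded: int = 3
    workflow_fail_window_s: int = 600
    workflow_fail_max_in_window: int = 10
    global_fail_max_in_window: int = 30
    min_confidence_normal: float = 0.72
    min_confidence_degraded: float = 0.82
    min_margin_top1_top2_degraded: float = 0.08
    disable_learning_in_degraded: bool = True
    rollback_on_regression: bool = True
    regression_window_steps: int = 50
    regression_fail_ratio: float = 0.20
    quarantine_duration_s: int = 1800
    max_versions_to_keep: int = 5

    def __post_init__(self) -> None:
        # Illustrative validation rules (assumptions, not from the spec).
        if self.mode not in {"hybrid", "conservative", "aggressive"}:
            raise ValueError(f"unknown mode: {self.mode!r}")
        if not 0.0 <= self.min_confidence_normal <= self.min_confidence_degraded <= 1.0:
            raise ValueError("expected 0 <= min_confidence_normal <= min_confidence_degraded <= 1")
        if self.step_fail_streak_to_degraded < 1 or self.workflow_fail_window_s <= 0:
            raise ValueError("thresholds and windows must be positive")

    @classmethod
    def from_file(cls, path: Path) -> "PolicyConfig":
        data = json.loads(path.read_text(encoding="utf-8"))  # JSONDecodeError is a ValueError
        unknown = set(data) - {f.name for f in fields(cls)}
        if unknown:
            raise ValueError(f"unknown policy keys: {sorted(unknown)}")
        return cls(**data)


class PolicyWatcher:
    """Re-reads the file when its mtime changes; keeps the last valid config otherwise."""

    def __init__(self, path: Path) -> None:
        self.path = path
        self._mtime: Optional[int] = None
        self.config = PolicyConfig()

    def current(self) -> PolicyConfig:
        try:
            mtime = os.stat(self.path).st_mtime_ns
        except FileNotFoundError:
            return self.config
        if mtime != self._mtime:
            try:
                self.config = PolicyConfig.from_file(self.path)
            except ValueError:
                pass  # invalid file: keep serving the previous configuration
            self._mtime = mtime
        return self.config
```

Callers read `watcher.current()` at each decision point, which satisfies "no restart required" (Requirement 6.5). The cost is one `stat` call per read.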
## Data Models
### ExecutionStateInfo
```python
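from __future__ import annotations  # ExecutionState is defined with AutoHealManager

from dataclasses import dataclass
from datetime import datetime
from typing import Optional


@dataclass
class ExecutionStateInfo:
    workflow_id: str
    current_state: ExecutionState
    state_since: datetime
    failure_count: int
    last_failure: Optional[datetime]
    confidence_threshold: float
    learning_enabled: bool
    quarantine_until: Optional[datetime]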
```
### FailureWindow
```python
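from __future__ import annotations  # FailureEvent is defined elsewhere
from dataclasses import dataclass
from datetime import datetime
from typing import List


@dataclass
class FailureWindow:
    window_start: datetime
    window_duration_s: int
    failures: List[FailureEvent]

    def add_failure(self, failure: FailureEvent) -> None: ...
    def get_failure_count(self) -> int: ...
    def cleanup_expired(self) -> None: ...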
```
### VersionInfo
```python
from dataclasses import dataclass
from datetime import datetime
from typing import List, Optional


@dataclass
class VersionInfo:
    version_id: str
    created_at: datetime
    workflow_id: str
    success_rate_before: float
    success_rate_after: Optional[float]
    components_versioned: List[str]  # ["prototypes", "faiss", "memory"]
```
## Correctness Properties
*A property is a characteristic or behavior that should hold true across all valid executions of a system: essentially, a formal statement about what the system should do. Properties serve as the bridge between human-readable specifications and machine-verifiable correctness guarantees.*
### Property 1: State Transition Consistency
*For any* workflow execution state, transitions should follow valid state machine rules and maintain consistency across all system components.
**Validates: Requirements 1.1, 1.2, 1.3, 1.4, 1.5, 1.6**
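Property 1 presupposes an explicit table of valid transitions, which the design does not list. The table below is an assumption derived from Requirement 1; for example, it lets PAUSED exit only through a manual return to RUNNING. With such a table, the property can be checked in two ways:
- per transition, by rejecting any edge that is not in the table;
- structurally, by verifying that no state is a dead end from which RUNNING is unreachable.

```python
from enum import Enum
from typing import Dict, FrozenSet


class ExecutionState(Enum):
    RUNNING = "running"
    DEGRADED = "degraded"
    QUARANTINED = "quarantined"
    ROLLBACK = "rollback"
    PAUSED = "paused"


S = ExecutionState
# Assumed transition table: the design does not enumerate the allowed edges.
ALLOWED: Dict[ExecutionState, FrozenSet[ExecutionState]] = {
    S.RUNNING: frozenset({S.DEGRADED, S.QUARANTINED, S.ROLLBACK, S.PAUSED}),
    S.DEGRADED: frozenset({S.RUNNING, S.QUARANTINED, S.ROLLBACK, S.PAUSED}),
    S.QUARANTINED: frozenset({S.RUNNING, S.ROLLBACK, S.PAUSED}),
    S.ROLLBACK: frozenset({S.RUNNING, S.DEGRADED, S.PAUSED}),
    S.PAUSED: frozenset({S.RUNNING}),  # manual intervention only
}


class InvalidTransition(ValueError):
    pass


def transition(current: ExecutionState, new: ExecutionState) -> ExecutionState:
    """Return the new state, or raise if the edge is not in the table."""
    if new not in ALLOWED[current]:
        raise InvalidTransition(f"{current.name} -> {new.name} is not allowed")
    return new


def can_reach_running(start: ExecutionState) -> bool:
    """Depth-first search over ALLOWED: is RUNNING reachable from start?"""
    seen, stack = {start}, [start]
    while stack:
        state = stack.pop()
        if state is S.RUNNING:
            return True
        for nxt in ALLOWED[state] - seen:
            seen.add(nxt)
            stack.append(nxt)
    return False
```

`all(can_reach_running(s) for s in ExecutionState)` holds for this table. A property test of Property 1 can assert it alongside the per-transition check.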
### Property 2: Circuit Breaker Threshold Enforcement
*For any* sequence of step failures, when thresholds are exceeded, the circuit breaker should trigger appropriate state transitions within the configured time windows.
**Validates: Requirements 2.1, 2.2, 2.3**
### Property 3: Degraded Mode Safety
*For any* workflow in DEGRADED state, all execution decisions should use increased confidence thresholds and learning updates should be disabled.
**Validates: Requirements 3.1, 3.2, 3.3, 3.4, 3.5, 3.6**
### Property 4: Rollback Consistency
*For any* rollback operation, all versioned components (prototypes, FAISS indices, target memory) should be restored to the same consistent version point.
**Validates: Requirements 4.1, 4.2, 4.3, 4.5, 4.6**
### Property 5: Hybrid Storage Integrity
*For any* execution decision, audit records should always be written to JSONL, and SQLite records should only be written for validated successes when not in DEGRADED mode.
**Validates: Requirements 5.1, 5.2, 5.3, 5.4**
### Property 6: Configuration Consistency
*For any* configuration change, all system components should apply the new settings consistently without requiring restart.
**Validates: Requirements 6.1, 6.2, 6.3, 6.4, 6.5**
### Property 7: Integration Compatibility
*For any* existing system integration point, the auto-healing system should maintain backward compatibility and enhance functionality without breaking existing workflows.
**Validates: Requirements 7.1, 7.2, 7.3, 7.4, 7.5**
## Error Handling
### Failure Classification
1. **TARGET_NOT_FOUND**: UI element not found
2. **POSTCONDITION_FAILED**: Postconditions not satisfied
3. **WATCHDOG_TIMEOUT**: Watchdog timeout
4. **LOW_CONFIDENCE**: Insufficient FAISS confidence
5. **RUNTIME_DRIFT**: Resolution or scale change
### Recovery Strategies
1. **Immediate**: Retry with normal parameters
2. **Degraded**: Retry with raised thresholds
3. **Quarantine**: Stop the workflow and capture its state
4. **Rollback**: Restore the previous version
5. **Manual**: Human intervention required
### Error Propagation
- Step errors propagate up to the workflow level
- Workflow errors can trigger global actions
- Every error generates a FailureCase (Fiche #19)
- Critical errors generate reports (Fiche #16)
## Testing Strategy
### Unit Tests
- Test individual state transitions
- Test circuit breaker thresholds
- Test versioning operations
- Test the policy configuration
### Property Tests
- Test state consistency properties
- Test threshold invariants
- Test rollback integrity
- Test hybrid storage consistency
### Integration Tests
- Test with the existing systems (Fiche #19, #18, #16)
- Test the execution hooks
- Test degradation scenarios
- Test full rollbacks
### Scenario Tests
- Simulate 3 consecutive failures → DEGRADED
- Simulate 10 failures in 10 min → QUARANTINED
- Simulate learning degradation → ROLLBACK
- Test recovery after quarantine

The testing strategy uses four kinds of tests:
- unit tests for specific cases;
- property tests that validate universal invariants;
- integration tests that check compatibility with the existing systems;
- scenario tests that validate end-to-end behavior.


@@ -0,0 +1,139 @@
# Requirements Document
## Introduction
A hybrid auto-healing system that preserves service continuity while guaranteeing safety. The system behaves according to the risk level:
- It keeps running as long as it is safe.
- It slows down and tightens its criteria when the situation is unclear.
- It stops locally when continuing is dangerous.
## Glossary
- **Auto_Heal_Manager**: Central manager for execution states and safety policies
- **Circuit_Breaker**: Anti-loop mechanism that monitors consecutive failures
- **Versioned_Store**: Versioning system for reversible learning
- **Execution_State**: State of a workflow (RUNNING, DEGRADED, QUARANTINED, ROLLBACK, PAUSED)
- **Failure_Window**: Sliding window used to count failures over a given period
- **Confidence_Threshold**: Confidence threshold for execution decisions
## Requirements
### Requirement 1: Execution States with a State Machine
**User Story:** As an RPA system, I want to manage different execution states for each workflow, so that I can maintain continuity while guaranteeing safety.
#### Acceptance Criteria
1. THE Auto_Heal_Manager SHALL maintain execution states for each workflow: RUNNING, DEGRADED, QUARANTINED, ROLLBACK, PAUSED
2. WHEN a workflow is in RUNNING state, THE Auto_Heal_Manager SHALL allow normal execution with standard confidence thresholds
3. WHEN a workflow transitions to DEGRADED state, THE Auto_Heal_Manager SHALL increase confidence thresholds and disable learning updates
4. WHEN a workflow is QUARANTINED, THE Auto_Heal_Manager SHALL prevent execution and create a FailureCase record
5. WHEN a workflow enters ROLLBACK state, THE Auto_Heal_Manager SHALL restore previous stable versions of prototypes and FAISS indices
6. WHEN a workflow is PAUSED, THE Auto_Heal_Manager SHALL halt execution until manual intervention
### Requirement 2: Anti-Loop Circuit Breaker
**User Story:** As a system administrator, I want a circuit breaker mechanism that prevents infinite failure loops, so that system stability is protected.
#### Acceptance Criteria
1. WHEN 3 consecutive step failures occur, THE Circuit_Breaker SHALL transition the workflow to DEGRADED state
2. WHEN 10 failures occur within a 10-minute window for a workflow, THE Circuit_Breaker SHALL transition it to QUARANTINED state
3. WHEN 30 global failures occur within a 10-minute window, THE Circuit_Breaker SHALL optionally trigger system-wide PAUSE
4. WHEN a circuit breaker triggers, THE Auto_Heal_Manager SHALL create a FailureCase record with Fiche #19 integration
5. WHEN a circuit breaker triggers, THE Auto_Heal_Manager SHALL generate a mini report using Fiche #16 if a scenario is available
### Requirement 3: Conservative Degraded Mode
**User Story:** As an RPA system, I want a degraded mode that keeps executing with stricter criteria, so that continuity is maintained while risk is reduced.
#### Acceptance Criteria
1. WHEN in DEGRADED mode, THE Auto_Heal_Manager SHALL increase the min_confidence threshold by 0.1
2. WHEN in DEGRADED mode, THE Auto_Heal_Manager SHALL enforce a minimum top1-top2 margin for FAISS matches
3. WHEN in DEGRADED mode, THE Auto_Heal_Manager SHALL disable learning updates (no prototype updates, no memory writes)
4. WHEN in DEGRADED mode, THE Auto_Heal_Manager SHALL require hard_constraints to be satisfied when they are provided
5. WHEN in DEGRADED mode, THE Auto_Heal_Manager SHALL refuse clicks if the ambiguous flag is true
6. WHEN in DEGRADED mode, THE Auto_Heal_Manager SHALL only proceed with clear and unambiguous targets
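Criteria 3.1, 3.2, 3.4 and 3.5 can be read as a single gate applied to each candidate match. The sketch below assumes the target resolver supplies:
- the top-1 and top-2 similarity scores;
- an `ambiguous` flag;
- an optional hard-constraint verdict.

`MatchCandidate`, `accept_match` and every reason string other than `LOW_CONFIDENCE` are hypothetical. The default thresholds are the values from `auto_heal_policy.json`, where 0.82 is 0.72 raised by the 0.1 of criterion 3.1. Outside DEGRADED mode, only the normal threshold applies (criterion 1.2).

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class MatchCandidate:
    top1_score: float
    top2_score: Optional[float]                 # None when only one candidate was found
    ambiguous: bool = False
    hard_constraints_ok: Optional[bool] = None  # None: no hard_constraints were provided


def accept_match(match: MatchCandidate, degraded: bool,
                 min_conf_normal: float = 0.72,
                 min_conf_degraded: float = 0.82,
                 min_margin_degraded: float = 0.08) -> Tuple[bool, str]:
    """Return (accepted, reason) for a candidate target (illustrative sketch)."""
    if not degraded:
        ok = match.top1_score >= min_conf_normal
        return (True, "ok") if ok else (False, "LOW_CONFIDENCE")
    # DEGRADED mode: raised threshold (3.1)
    if match.top1_score < min_conf_degraded:
        return False, "LOW_CONFIDENCE"
    # top1-top2 margin (3.2)
    if match.top2_score is not None and match.top1_score - match.top2_score < min_margin_degraded:
        return False, "AMBIGUOUS_MARGIN"
    # ambiguous flag (3.5)
    if match.ambiguous:
        return False, "AMBIGUOUS_FLAG"
    # hard constraints, only when provided (3.4)
    if match.hard_constraints_ok is False:
        return False, "HARD_CONSTRAINTS_FAILED"
    return True, "ok"
```

For example, a match scored 0.90 with a runner-up at 0.85 passes the degraded threshold but is refused because the margin is under 0.08.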
### Requirement 4: Reversible Learning with Rollback
**User Story:** As a learning system, I want to be able to revert to previous stable versions, so that I can recover from a performance degradation caused by bad learning.
#### Acceptance Criteria
1. WHEN consolidating prototypes, THE Versioned_Store SHALL create versioned snapshots in data/learning/prototypes/vNN/
2. WHEN updating FAISS indices, THE Versioned_Store SHALL version indices in data/faiss_index/workflow_<id>/vNN/
3. WHEN updating target memory, THE Versioned_Store SHALL create SQLite snapshots or WAL checkpoints
4. WHEN success rate drops by >20% over 50 steps after an update, THE Auto_Heal_Manager SHALL trigger automatic rollback
5. WHEN rollback is triggered, THE Versioned_Store SHALL restore prototypes, FAISS indices, and target memory to the previous stable version
6. THE Versioned_Store SHALL maintain rollback capability to at least 5 previous versions
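Criterion 4.4 does not say whether a ">20%" drop is absolute (percentage points) or relative to the baseline. The sketch makes these assumptions:
- The drop is absolute, measured against `success_rate_before` from `VersionInfo`.
- It is judged only once `regression_window_steps` post-update outcomes are available, and then on the most recent 50 steps.
- `RegressionDetector` is a hypothetical name.

```python
from collections import deque
from typing import Deque


class RegressionDetector:
    """Hypothetical helper for criterion 4.4 (automatic rollback trigger)."""

    def __init__(self, success_rate_before: float,
                 window_steps: int = 50, max_drop: float = 0.20) -> None:
        self.baseline = success_rate_before
        self.window_steps = window_steps
        self.max_drop = max_drop
        # Keeps only the most recent `window_steps` post-update outcomes.
        self.outcomes: Deque[bool] = deque(maxlen=window_steps)

    def record(self, success: bool) -> None:
        self.outcomes.append(success)

    def should_rollback(self) -> bool:
        # Do not judge before a full window of post-update steps has been observed.
        if len(self.outcomes) < self.window_steps:
            return False
        rate_after = sum(self.outcomes) / len(self.outcomes)
        # Assumption: ">20%" means an absolute drop of more than 0.20.
        return self.baseline - rate_after > self.max_drop
```

With a baseline of 0.95, 36 successes out of 50 (0.72) trigger a rollback, while 40 out of 50 (0.80) do not. A relative interpretation would compare `(baseline - rate_after) / baseline` to the same ratio instead.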
### Requirement 5: Hybrid Decision Storage
**User Story:** As an audit system, I want hybrid storage that keeps full traceability while optimizing performance, so that both compliance and efficiency are ensured.
#### Acceptance Criteria
1. THE Auto_Heal_Manager SHALL write all decisions in JSONL format for a complete audit trail
2. THE Auto_Heal_Manager SHALL write to SQLite only for validated successes (postconditions OK)
3. WHEN in DEGRADED mode, THE Auto_Heal_Manager SHALL NOT write success records to SQLite
4. THE Auto_Heal_Manager SHALL maintain decision metadata including confidence scores and execution state
5. THE Auto_Heal_Manager SHALL provide a query interface for decision history and patterns
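Criteria 5.1 to 5.4 split each decision between two sinks. This is a minimal sketch, not the project's storage code. It assumes:
- a `successes` table schema;
- state strings matching the `ExecutionState` values (`"degraded"`);
- a hypothetical `DecisionLog` class.

```python
import json
import sqlite3
import time
from pathlib import Path
from typing import Any, Dict


class DecisionLog:
    """Hypothetical hybrid store: JSONL audit trail + SQLite for validated successes."""

    def __init__(self, jsonl_path: Path, db: sqlite3.Connection) -> None:
        self.jsonl_path = jsonl_path
        self.db = db
        self.db.execute(
            "CREATE TABLE IF NOT EXISTS successes ("
            " ts REAL, workflow_id TEXT, step_id TEXT, confidence REAL, state TEXT)"
        )

    def record(self, workflow_id: str, step_id: str, state: str,
               confidence: float, postconditions_ok: bool) -> bool:
        """Log one decision; return True if it was also written to SQLite."""
        entry: Dict[str, Any] = {
            "ts": time.time(), "workflow_id": workflow_id, "step_id": step_id,
            "state": state, "confidence": confidence,
            "postconditions_ok": postconditions_ok,
        }
        # 5.1 / 5.4: every decision, with its metadata, goes to the append-only JSONL file.
        with self.jsonl_path.open("a", encoding="utf-8") as f:
            f.write(json.dumps(entry) + "\n")
        # 5.2 / 5.3: SQLite only for validated successes, and never in DEGRADED mode.
        if postconditions_ok and state != "degraded":
            with self.db:  # commits the insert on success
                self.db.execute(
                    "INSERT INTO successes VALUES (?, ?, ?, ?, ?)",
                    (entry["ts"], workflow_id, step_id, confidence, state),
                )
            return True
        return False
```

Keeping SQLite limited to validated, non-degraded successes means the learning store only ever sees trustworthy examples. The JSONL file still allows every refused or failed decision to be replayed for audit.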
### Requirement 6: Flexible Policy Configuration
**User Story:** As an administrator, I want to configure the auto-healing thresholds and policies, so that the behavior can be adapted to the specific needs of the environment.
#### Acceptance Criteria
1. THE Auto_Heal_Manager SHALL load configuration from data/config/auto_heal_policy.json
2. THE configuration SHALL include step failure thresholds, time windows, and confidence levels
3. THE configuration SHALL support different modes: hybrid, conservative, aggressive
4. THE configuration SHALL allow customization of degraded mode behavior and rollback triggers
5. WHEN configuration changes, THE Auto_Heal_Manager SHALL apply new settings without restart
### Requirement 7: Integration with Existing Systems
**User Story:** As a developer, I want auto-healing to integrate seamlessly with the existing systems, so that compatibility is preserved and existing features are reused.
#### Acceptance Criteria
1. THE Auto_Heal_Manager SHALL integrate with Fiche #19 FailureCase recording for automatic capture
2. THE Auto_Heal_Manager SHALL integrate with Fiche #16 simulation reports when scenarios are available
3. THE Auto_Heal_Manager SHALL integrate with Fiche #18 persistent learning for rollback decisions
4. THE Auto_Heal_Manager SHALL integrate with Fiche #10 precision metrics for confidence scoring
5. THE Auto_Heal_Manager SHALL provide hooks in execution loop for should_execute_step() and on_step_result()
### Requirement 8: Monitoring and Metrics
**User Story:** As an administrator, I want to monitor the state and performance of the auto-healing system, so that I can detect problems and tune configurations.
#### Acceptance Criteria
1. THE Auto_Heal_Manager SHALL maintain sliding window counters for step, workflow, and global failures
2. THE Auto_Heal_Manager SHALL track state transition history and durations
3. THE Auto_Heal_Manager SHALL provide real-time status via get_mode() and health endpoints
4. THE Auto_Heal_Manager SHALL generate alerts for quarantine events and rollback triggers
5. THE Auto_Heal_Manager SHALL expose metrics for monitoring dashboard integration
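Criterion 8.2 (transition history and durations) and the status side of 8.3 can be sketched with a small tracker that accumulates time per state. `TransitionHistory`, its injectable `clock` and the shape of `snapshot()` are assumptions made for illustration. Such a snapshot could feed `get_status_report()` or a health endpoint.

```python
import time
from collections import defaultdict
from typing import Callable, Dict, List, Tuple


class TransitionHistory:
    """Hypothetical tracker: transition log plus cumulative seconds spent in each state."""

    def __init__(self, initial_state: str = "running",
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.clock = clock
        self.current = initial_state
        self.since = clock()
        self.events: List[Tuple[float, str, str, str]] = []  # (ts, from, to, reason)
        self.durations: Dict[str, float] = defaultdict(float)

    def transition(self, new_state: str, reason: str) -> None:
        now = self.clock()
        self.durations[self.current] += now - self.since  # close the current interval
        self.events.append((now, self.current, new_state, reason))
        self.current, self.since = new_state, now

    def snapshot(self) -> Dict[str, object]:
        # Include the time spent so far in the current (still open) state.
        durations = dict(self.durations)
        durations[self.current] = durations.get(self.current, 0.0) + self.clock() - self.since
        return {"state": self.current, "transitions": len(self.events),
                "seconds_in_state": durations}
```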
### Requirement 9: State Transition Triggers
**User Story:** As a monitoring system, I want clear triggers for state transitions, so that I can react predictably to system conditions.
#### Acceptance Criteria
1. WHEN TARGET_NOT_FOUND occurs repeatedly, THE Auto_Heal_Manager SHALL trigger appropriate state transitions
2. WHEN POSTCONDITION_FAILED occurs repeatedly, THE Auto_Heal_Manager SHALL trigger degraded or quarantine states
3. WHEN WATCHDOG_TIMEOUT occurs, THE Auto_Heal_Manager SHALL consider it as a failure event
4. WHEN node_match_confidence is low (FAISS top1-top2 too close), THE Auto_Heal_Manager SHALL trigger degraded mode
5. WHEN runtime drift is detected (resolution/scale changes), THE Auto_Heal_Manager SHALL adapt thresholds accordingly
### Requirement 10: Testing and Validation
**User Story:** As a developer, I want to be able to test the auto-healing system easily, so that I can validate its behavior in different failure scenarios.
#### Acceptance Criteria
1. THE system SHALL provide test scenarios for forcing TARGET_NOT_FOUND failures
2. THE system SHALL demonstrate DEGRADED mode activation after 3 consecutive failures
3. THE system SHALL demonstrate QUARANTINED mode activation after threshold breaches
4. THE system SHALL demonstrate rollback functionality with intentionally degraded prototypes
5. THE system SHALL provide validation tools for configuration and state transitions


@@ -0,0 +1,246 @@
# Implementation Plan: Hybrid Auto-Heal (Fiche #22)
## Overview
Implementation of the hybrid auto-healing system, which balances continuity and safety. The plan is incremental:
1. configuration and base models;
2. circuit breaker;
3. main manager;
4. versioning system;
5. integration with execution;
6. full test coverage.
## Tasks
- [ ] 1. Set up configuration and base models
  - Create policy configuration structure
  - Define execution state enums and data models
  - Set up base directory structure for versioning
  - _Requirements: 6.1, 6.2, 6.3_
  - [x] 1.1 Create policy configuration system
    - Implement auto_heal_policy.json structure
    - Create PolicyConfig class with validation
    - Add configuration loading and hot-reload capability
    - _Requirements: 6.1, 6.4, 6.5_
  - [ ]* 1.2 Write property test for configuration consistency
    - **Property 6: Configuration Consistency**
    - **Validates: Requirements 6.1, 6.2, 6.3, 6.4, 6.5**
  - [x] 1.3 Implement base data models
    - Create ExecutionStateInfo, FailureWindow, VersionInfo dataclasses
    - Implement ExecutionState enum with validation
    - Add serialization/deserialization methods
    - _Requirements: 1.1, 2.1, 4.1_
  - [ ]* 1.4 Write unit tests for data models
    - Test state transitions and validation
    - Test failure window operations
    - Test version info management
    - _Requirements: 1.1, 2.1, 4.1_
- [ ] 2. Implement Circuit Breaker system
  - [ ] 2.1 Create CircuitBreaker class with sliding windows
    - Implement failure counting with time windows
    - Add step-level, workflow-level, and global-level tracking
    - Create threshold checking methods
    - _Requirements: 2.1, 2.2, 2.3_
  - [ ]* 2.2 Write property test for circuit breaker thresholds
    - **Property 2: Circuit Breaker Threshold Enforcement**
    - **Validates: Requirements 2.1, 2.2, 2.3**
  - [ ] 2.3 Implement failure window management
    - Create sliding window data structure
    - Add automatic cleanup of expired failures
    - Implement efficient failure counting
    - _Requirements: 2.1, 2.2, 2.3_
  - [ ]* 2.4 Write unit tests for circuit breaker
    - Test sliding window behavior
    - Test threshold triggering
    - Test failure cleanup
    - _Requirements: 2.1, 2.2, 2.3_
- [ ] 3. Create Versioned Store system
  - [ ] 3.1 Implement VersionedStore class
    - Create version snapshot functionality
    - Implement rollback operations
    - Add version listing and cleanup
    - _Requirements: 4.1, 4.2, 4.3, 4.6_
  - [ ] 3.2 Implement component versioning
    - Add prototypes versioning (data/learning/prototypes/vNN/)
    - Add FAISS index versioning (data/faiss_index/workflow_<id>/vNN/)
    - Add target memory versioning (SQLite snapshots)
    - _Requirements: 4.1, 4.2, 4.3_
  - [ ]* 3.3 Write property test for rollback consistency
    - **Property 4: Rollback Consistency**
    - **Validates: Requirements 4.1, 4.2, 4.3, 4.5, 4.6**
  - [ ]* 3.4 Write unit tests for versioned store
    - Test snapshot creation
    - Test rollback operations
    - Test version cleanup
    - _Requirements: 4.1, 4.2, 4.3, 4.6_
- [ ] 4. Checkpoint - Ensure core components work
  - Ensure all tests pass, ask the user if questions arise.
- [ ] 5. Implement Auto Heal Manager
  - [ ] 5.1 Create AutoHealManager class
    - Implement state machine for execution states
    - Add should_execute_step() and on_step_result() methods
    - Create state transition logic
    - _Requirements: 1.1, 1.2, 1.3, 1.4, 1.5, 1.6_
  - [ ] 5.2 Implement degraded mode logic
    - Add confidence threshold adjustment
    - Implement learning disable functionality
    - Add strict validation for ambiguous targets
    - _Requirements: 3.1, 3.2, 3.3, 3.4, 3.5, 3.6_
  - [ ]* 5.3 Write property test for state transitions
    - **Property 1: State Transition Consistency**
    - **Validates: Requirements 1.1, 1.2, 1.3, 1.4, 1.5, 1.6**
  - [ ]* 5.4 Write property test for degraded mode safety
    - **Property 3: Degraded Mode Safety**
    - **Validates: Requirements 3.1, 3.2, 3.3, 3.4, 3.5, 3.6**
  - [ ] 5.5 Implement hybrid storage system
    - Add JSONL audit logging for all decisions
    - Add SQLite logging only for validated successes
    - Implement storage filtering based on execution state
    - _Requirements: 5.1, 5.2, 5.3, 5.4, 5.5_
  - [ ]* 5.6 Write property test for hybrid storage integrity
    - **Property 5: Hybrid Storage Integrity**
    - **Validates: Requirements 5.1, 5.2, 5.3, 5.4**
- [ ] 6. Integrate with existing systems
  - [ ] 6.1 Add Fiche #19 FailureCase integration
    - Integrate with failure_case_recorder.py
    - Add automatic capture on quarantine events
    - Include auto-heal context in failure records
    - _Requirements: 7.1, 2.4_
  - [ ] 6.2 Add Fiche #16 simulation report integration
    - Generate mini reports on circuit breaker triggers
    - Include scenario data when available
    - Add auto-heal specific report sections
    - _Requirements: 7.2, 2.5_
  - [ ] 6.3 Add Fiche #18 persistent learning integration
    - Integrate with target_memory_store.py for rollback decisions
    - Add learning success rate monitoring
    - Implement regression detection
    - _Requirements: 7.3, 4.4_
  - [ ] 6.4 Add Fiche #10 precision metrics integration
    - Use precision metrics for confidence scoring
    - Integrate with metrics_engine.py
    - Add auto-heal metrics to precision dashboard
    - _Requirements: 7.4_
  - [ ]* 6.5 Write property test for integration compatibility
    - **Property 7: Integration Compatibility**
    - **Validates: Requirements 7.1, 7.2, 7.3, 7.4, 7.5**
- [ ] 7. Add execution loop hooks
  - [ ] 7.1 Modify action_executor.py integration
    - Add should_execute_step() call before action execution
    - Add on_step_result() call after action completion
    - Implement execution blocking for quarantined workflows
    - _Requirements: 7.5, 1.4_
  - [ ] 7.2 Implement trigger detection
    - Add TARGET_NOT_FOUND detection and counting
    - Add POSTCONDITION_FAILED detection and counting
    - Add WATCHDOG_TIMEOUT detection
    - Add low confidence detection from FAISS matches
    - _Requirements: 9.1, 9.2, 9.3, 9.4_
  - [ ]* 7.3 Write integration tests for execution hooks
    - Test execution flow with auto-heal manager
    - Test blocking of quarantined workflows
    - Test state transitions during execution
    - _Requirements: 7.5, 1.4, 1.5_
- [ ] 8. Implement monitoring and metrics
  - [ ] 8.1 Add status reporting
    - Implement get_status_report() method
    - Add real-time state monitoring
    - Create health endpoint integration
    - _Requirements: 8.3, 8.4_
  - [ ] 8.2 Add metrics collection
    - Track state transition history and durations
    - Maintain sliding window counters
    - Add performance metrics for auto-heal operations
    - _Requirements: 8.1, 8.2, 8.5_
  - [ ]* 8.3 Write unit tests for monitoring
    - Test status reporting accuracy
    - Test metrics collection
    - Test alert generation
    - _Requirements: 8.1, 8.2, 8.3, 8.4_
- [ ] 9. Create test scenarios and validation
  - [ ] 9.1 Implement forced failure scenarios
    - Create test workflow with non-existent targets
    - Add configuration for test mode
    - Implement failure injection capabilities
    - _Requirements: 10.1, 10.2_
  - [ ] 9.2 Create degraded mode test
    - Force 3 consecutive TARGET_NOT_FOUND failures
    - Verify transition to DEGRADED state
    - Validate increased confidence thresholds
    - _Requirements: 10.2, 3.1, 3.2_
  - [ ] 9.3 Create quarantine test
    - Force 10 failures in 10-minute window
    - Verify transition to QUARANTINED state
    - Validate FailureCase creation
    - _Requirements: 10.3, 2.2, 2.4_
  - [ ] 9.4 Create rollback test
    - Intentionally degrade prototype quality
    - Trigger automatic rollback on performance drop
    - Verify restoration of previous versions
    - _Requirements: 10.4, 4.4, 4.5_
  - [ ]* 9.5 Write comprehensive integration tests
    - Test complete auto-heal workflow scenarios
    - Test interaction with all integrated systems
    - Test configuration changes and hot-reload
    - _Requirements: 10.1, 10.2, 10.3, 10.4, 10.5_
- [ ] 10. Final checkpoint and documentation
  - [ ] 10.1 Create demo script
    - Implement demo_auto_heal_hybrid.py
    - Show all state transitions and features
    - Include performance metrics and monitoring
    - _Requirements: 10.1, 10.2, 10.3, 10.4, 10.5_
  - [ ] 10.2 Update system integration
    - Update run.sh and launch scripts
    - Add auto-heal configuration to deployment
    - Update monitoring dashboard integration
    - _Requirements: 7.5, 8.5_
  - [ ] 10.3 Final validation
    - Run complete test suite
    - Validate all integration points
    - Test production deployment scenario
    - _Requirements: 10.5_
- [ ] 11. Final checkpoint - Ensure all tests pass
  - Ensure all tests pass, ask the user if questions arise.
## Notes
- Tasks marked with `*` are optional and can be skipped for faster MVP
- Each task references specific requirements for traceability
- Checkpoints ensure incremental validation
- Property tests validate universal correctness properties
- Unit tests validate specific examples and edge cases
- Integration tests verify compatibility with existing systems (Fiche #19, #18, #16, #10)
- The implementation follows the hybrid approach: continue when safe, degrade when uncertain, stop when dangerous