Initial commit

2026-03-05 00:20:25 +01:00
commit dcd4de9945
1954 changed files with 669380 additions and 0 deletions
--- a/docs/archive/old-summaries/EMBEDDING_SYSTEM_INTEGRATION_GUIDE.md
+++ b/docs/archive/old-summaries/EMBEDDING_SYSTEM_INTEGRATION_GUIDE.md
@@ -0,0 +1,293 @@
+# Embedding System Integration Guide
+
+## Overview
+
+The new embedding system is ready to be integrated into GeniusIA v2. This guide explains how to use it.
+
+## Architecture
+
+```
+┌─────────────────────────────────────────────────────────┐
+│                    Orchestrator                         │
+│  - Manages workflows                                    │
+│  - Collects fine-tuning examples                        │
+└────────────────┬────────────────────────────────────────┘
+                 │
+                 ▼
+┌─────────────────────────────────────────────────────────┐
+│              EmbeddingManager                           │
+│  - Model selection (CLIP recommended)                   │
+│  - LRU cache (1000 entries)                             │
+│  - Automatic fallback                                   │
+└────────────────┬────────────────────────────────────────┘
+                 │
+        ┌────────┴────────┐
+        ▼                 ▼
+┌──────────────┐  ┌──────────────────┐
+│ CLIPEmbedder │  │ LightweightFine  │
+│              │  │ Tuner            │
+│ - Embeddings │  │ - Collection     │
+│ - Fine-tune  │  │ - Auto trigger   │
+└──────────────┘  └──────────────────┘
+        │
+        ▼
+┌──────────────┐
+│  FAISSIndex  │
+│ - Search     │
+│ - Persistence│
+└──────────────┘
+```
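+
+The diagram shows an automatic fallback in `EmbeddingManager`, but the guide does not say how it works. The sketch below assumes that "fallback" means trying the preferred embedder first and moving on to the next one when it raises. `FallbackEmbedder`, `broken` and `toy` are hypothetical stand-ins, not the geniusia2 API.
+
+```python
+# Hypothetical sketch of an embedder fallback chain (not the geniusia2 API).
+from typing import Callable, List, Optional, Sequence
+
+Embedder = Callable[[object], List[float]]
+
+
+class FallbackEmbedder:
+    """Tries each embedder in order and returns the first successful result."""
+
+    def __init__(self, embedders: Sequence[Embedder]) -> None:
+        if not embedders:
+            raise ValueError("at least one embedder is required")
+        self.embedders = list(embedders)
+        self.last_used: Optional[int] = None  # index of the embedder that answered last
+
+    def embed(self, item: object) -> List[float]:
+        errors = []
+        for i, embedder in enumerate(self.embedders):
+            try:
+                vector = embedder(item)
+            except Exception as exc:  # remember the failure, then try the next embedder
+                errors.append(exc)
+                continue
+            self.last_used = i
+            return vector
+        raise RuntimeError(f"all embedders failed: {errors!r}")
+
+
+def broken(_item: object) -> List[float]:
+    raise RuntimeError("model not loaded")
+
+
+def toy(item: object) -> List[float]:
+    return [float(len(str(item)))]
+
+
+chain = FallbackEmbedder([broken, toy])
+vector = chain.embed("abc")  # produced by toy, the second embedder
+```
+
+Every embedder in such a chain must return vectors of the same dimension. Otherwise the FAISS index hits the dimension mismatch covered under Troubleshooting.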
+
+## Usage in the Orchestrator
+
+### 1. Initialization
+
+```python
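+# time and PIL.Image are also used by the Orchestrator methods in the sections below
+import time
+
+from PIL import Image
+
+from geniusia2.core.embedders import EmbeddingManager, LightweightFineTuner, FAISSIndex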
+
+class Orchestrator:
+    def __init__(self, config):
+        # Initialize embedding system
+        self.embedding_manager = EmbeddingManager(
+            model_name="clip",  # Recommended
+            cache_size=1000,
+            fallback_enabled=True
+        )
+        
+        # Initialize FAISS index
+        self.faiss_index = FAISSIndex(
+            dimension=self.embedding_manager.get_dimension()
+        )
+        
+        # Initialize fine-tuner
+        self.fine_tuner = LightweightFineTuner(
+            embedder=self.embedding_manager.embedder,
+            trigger_threshold=10,  # Fine-tune every 10 examples
+            max_examples=1000
+        )
+        
+        # Load checkpoint if exists
+        self.fine_tuner.load_checkpoint("orchestrator_finetuning")
+```
+
+### 2. Generating Embeddings
+
+```python
+def analyze_screenshot(self, screenshot_pil: Image.Image):
+    """Analyze a screenshot and generate its embedding."""
+    # Generate the embedding (cached automatically)
+    embedding = self.embedding_manager.embed(screenshot_pil)
+    
+    return embedding
+```
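+
+`embed()` caches results automatically, and the architecture diagram sizes the cache at 1000 LRU entries. The sketch below shows one way such a cache can work. It assumes entries are keyed by a hash of the raw image bytes (for a PIL image, something like `image.tobytes()`), so identical screenshots hit the same entry. `LRUEmbeddingCache` and `fake_compute` are hypothetical and do not describe the actual `EmbeddingManager` internals.
+
+```python
+# Hypothetical sketch of a content-keyed LRU embedding cache (not the EmbeddingManager internals).
+import hashlib
+from collections import OrderedDict
+from typing import Callable, List
+
+
+class LRUEmbeddingCache:
+    def __init__(self, compute: Callable[[bytes], List[float]], capacity: int = 1000) -> None:
+        self.compute = compute
+        self.capacity = capacity
+        self._entries: "OrderedDict[str, List[float]]" = OrderedDict()  # content hash -> embedding
+
+    def embed(self, image_bytes: bytes, use_cache: bool = True) -> List[float]:
+        if not use_cache:
+            return self.compute(image_bytes)
+        key = hashlib.sha256(image_bytes).hexdigest()  # identical bytes give identical keys
+        if key in self._entries:
+            self._entries.move_to_end(key)  # mark as most recently used
+            return self._entries[key]
+        vector = self.compute(image_bytes)
+        self._entries[key] = vector
+        if len(self._entries) > self.capacity:
+            self._entries.popitem(last=False)  # evict the least recently used entry
+        return vector
+
+
+calls: List[bytes] = []
+
+
+def fake_compute(data: bytes) -> List[float]:
+    calls.append(data)  # record every real (uncached) computation
+    return [float(len(data))]
+
+
+cache = LRUEmbeddingCache(fake_compute, capacity=2)
+cache.embed(b"a")
+cache.embed(b"a")    # cache hit: fake_compute is not called again
+cache.embed(b"bb")
+cache.embed(b"ccc")  # capacity exceeded: b"a" is evicted
+```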
+
+### 3. Finding Similar Workflows
+
+```python
+def find_similar_workflows(self, screenshot_pil: Image.Image, k=5):
+    """Find similar workflows via FAISS."""
+    # Generate embedding
+    embedding = self.embedding_manager.embed(screenshot_pil)
+    
+    # Search in FAISS
+    results = self.faiss_index.search(embedding, k=k)
+    
+    return results
+```
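+
+The guide does not show what `faiss_index.search()` returns. Conceptually, an exact (flat) index performs a brute-force nearest-neighbor scan. The pure-Python sketch below illustrates that idea. It assumes cosine similarity, which is what inner-product search computes on L2-normalized vectors. `knn_search` is hypothetical, and its `(workflow_id, score)` pairs may differ from the real return format. For comparison, a `faiss.IndexFlatL2` reports squared L2 distances (lower means closer) together with integer ids.
+
+```python
+# Hypothetical brute-force k-NN sketch (illustrates the idea, not the FAISSIndex API).
+import math
+from typing import Dict, List, Tuple
+
+
+def normalize(v: List[float]) -> List[float]:
+    norm = math.sqrt(sum(x * x for x in v))
+    return [x / norm for x in v] if norm else list(v)
+
+
+def knn_search(query: List[float], index: Dict[str, List[float]], k: int = 5) -> List[Tuple[str, float]]:
+    """Return the k stored vectors most similar to query, by cosine similarity."""
+    q = normalize(query)
+    scored = [
+        (workflow_id, sum(a * b for a, b in zip(q, normalize(vec))))
+        for workflow_id, vec in index.items()
+    ]
+    scored.sort(key=lambda pair: pair[1], reverse=True)  # highest similarity first
+    return scored[:k]
+
+
+index = {
+    "open_invoice": [1.0, 0.0],
+    "send_email": [0.0, 1.0],
+    "print_report": [0.7, 0.7],
+}
+results = knn_search([0.9, 0.1], index, k=2)
+```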
+
+### 4. Adding Fine-tuning Examples
+
+```python
+def on_workflow_accepted(self, screenshot_pil: Image.Image, workflow_id: str):
+    """Called when the user accepts a workflow."""
+    # Add positive example for fine-tuning
+    self.fine_tuner.add_positive_example(
+        image=screenshot_pil,
+        workflow_id=workflow_id,
+        metadata={'timestamp': time.time()}
+    )
+    
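+    # Save a checkpoint every 5 accepted examples. training_count counts completed
+    # trainings, not examples, so a local counter is used instead
+    # (_accepted_count is a hypothetical attribute introduced for this purpose).
+    self._accepted_count = getattr(self, "_accepted_count", 0) + 1
+    if self._accepted_count % 5 == 0:
+        self.fine_tuner.save_checkpoint("orchestrator_finetuning")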
+
+def on_workflow_rejected(self, screenshot_pil: Image.Image, workflow_id: str):
+    """Called when the user rejects a workflow."""
+    # Add negative example for fine-tuning
+    self.fine_tuner.add_negative_example(
+        image=screenshot_pil,
+        workflow_id=workflow_id,
+        metadata={'timestamp': time.time()}
+    )
+```
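+
+The fine-tuner is configured with `trigger_threshold=10` and `max_examples=1000`. The sketch below shows how such a collector could decide when to retrain. It assumes the threshold counts new examples since the last training, and that `max_examples` caps the buffer by dropping the oldest examples. `ExampleCollector` is hypothetical. Its method names mirror the guide, but the signatures are simplified, and training itself is delegated to a callback.
+
+```python
+# Hypothetical sketch of threshold-triggered example collection (not LightweightFineTuner).
+from collections import deque
+from typing import Callable, Deque, List, Tuple
+
+Example = Tuple[str, int]  # (workflow_id, +1 for accepted / -1 for rejected)
+
+
+class ExampleCollector:
+    def __init__(self, train: Callable[[List[Example]], None],
+                 trigger_threshold: int = 10, max_examples: int = 1000) -> None:
+        self.train = train
+        self.trigger_threshold = trigger_threshold
+        self.examples: Deque[Example] = deque(maxlen=max_examples)  # oldest dropped first
+        self.new_since_training = 0
+        self.training_count = 0
+
+    def _add(self, example: Example) -> None:
+        self.examples.append(example)
+        self.new_since_training += 1
+        if self.new_since_training >= self.trigger_threshold:
+            self.train(list(self.examples))  # train on the whole current buffer
+            self.new_since_training = 0
+            self.training_count += 1
+
+    def add_positive_example(self, workflow_id: str) -> None:
+        self._add((workflow_id, +1))
+
+    def add_negative_example(self, workflow_id: str) -> None:
+        self._add((workflow_id, -1))
+
+
+batches: List[List[Example]] = []
+collector = ExampleCollector(batches.append, trigger_threshold=3, max_examples=4)
+for i in range(7):
+    collector.add_positive_example(f"wf{i}")
+```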
+
+### 5. Saving on Shutdown
+
+```python
+def shutdown(self):
+    """Called when the application shuts down."""
+    # Wait for any ongoing fine-tuning
+    self.fine_tuner.wait_for_training(timeout=30)
+    
+    # Save checkpoint
+    self.fine_tuner.save_checkpoint("orchestrator_finetuning")
+    
+    # Save FAISS index
+    self.faiss_index.save("data/workflow_embeddings")
+```
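+
+`shutdown()` waits up to 30 seconds for an in-progress fine-tuning run. The sketch below shows that pattern with the standard `threading` module. It assumes training runs in a daemon thread and that `wait_for_training` amounts to `Thread.join(timeout)`. `BackgroundTrainer` is hypothetical. `join()` returns silently when the timeout expires, so the sketch checks `is_alive()` to report whether training actually finished.
+
+```python
+# Hypothetical sketch of background training with a bounded wait (not LightweightFineTuner).
+import threading
+import time
+from typing import Callable, Optional
+
+
+class BackgroundTrainer:
+    def __init__(self) -> None:
+        self.training_thread: Optional[threading.Thread] = None
+
+    def start(self, job: Callable[[], None]) -> None:
+        # daemon=True: an unfinished training run never keeps the process alive at exit
+        self.training_thread = threading.Thread(target=job, daemon=True)
+        self.training_thread.start()
+
+    def wait_for_training(self, timeout: float) -> bool:
+        """Return True if no training is running once the wait ends."""
+        thread = self.training_thread
+        if thread is None:
+            return True
+        thread.join(timeout)
+        return not thread.is_alive()
+
+
+trainer = BackgroundTrainer()
+trainer.start(lambda: time.sleep(0.05))
+finished = trainer.wait_for_training(timeout=2.0)
+```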
+
+## Migrating from the Old System
+
+### Old Code (EmbeddingsManager)
+
+```python
+# Old
+from .embeddings_manager import EmbeddingsManager
+
+embeddings = EmbeddingsManager()
+embedding = embeddings.encode_image(numpy_image)  # numpy BGR
+```
+
+### New Code (EmbeddingManager)
+
+```python
+# New
+from .embedders import EmbeddingManager
+from PIL import Image
+import cv2
+
+embedding_manager = EmbeddingManager(model_name="clip")
+
+# Convert the BGR numpy array to an RGB PIL image
+image_rgb = cv2.cvtColor(numpy_image, cv2.COLOR_BGR2RGB)
+pil_image = Image.fromarray(image_rgb)
+
+embedding = embedding_manager.embed(pil_image)
+```
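+
+For a 3-channel image, `cv2.COLOR_BGR2RGB` only reverses the channel order. The same conversion can therefore be done with NumPy slicing when OpenCV is not otherwise needed. The sketch below assumes `numpy_image` is an `H x W x 3` `uint8` array. The slice is a view with a negative stride, so as a precaution `np.ascontiguousarray` copies it into a regular C-ordered buffer before it goes to `Image.fromarray`. `bgr_to_rgb` is a hypothetical helper.
+
+```python
+# Hypothetical helper: BGR -> RGB without OpenCV, by reversing the channel axis.
+import numpy as np
+
+
+def bgr_to_rgb(image_bgr: np.ndarray) -> np.ndarray:
+    """Reverse the channel axis of an H x W x 3 BGR image."""
+    if image_bgr.ndim != 3 or image_bgr.shape[2] != 3:
+        raise ValueError("expected an H x W x 3 array")
+    return np.ascontiguousarray(image_bgr[:, :, ::-1])
+
+
+bgr = np.zeros((1, 2, 3), dtype=np.uint8)
+bgr[0, 0] = [255, 0, 0]  # pure blue, in BGR order
+rgb = bgr_to_rgb(bgr)    # ready for PIL.Image.fromarray(rgb)
+```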
+
+### Compatibility in VisionAnalysis
+
+The code in `vision_analysis.py` already supports both systems:
+
+```python
+# Automatically detects which system is in use
+if self._use_new_system:
+    # New system
+    region_rgb = cv2.cvtColor(region, cv2.COLOR_BGR2RGB)
+    pil_image = Image.fromarray(region_rgb)
+    embedding = self.embeddings.embed(pil_image)
+else:
+    # Old system
+    embedding = self.embeddings.encode_image(region)
+```
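+
+The snippet relies on a `_use_new_system` flag but does not show how that flag is set. One plausible approach, sketched here as an assumption, is duck typing. Any object exposing `embed()` is treated as the new `EmbeddingManager`, and any object exposing `encode_image()` as the old `EmbeddingsManager`. `OldManager`, `NewManager` and `uses_new_system` are hypothetical stand-ins.
+
+```python
+# Hypothetical sketch of API detection by duck typing (not the vision_analysis.py code).
+class OldManager:  # stand-in for the legacy EmbeddingsManager
+    def encode_image(self, image):
+        return ["old", image]
+
+
+class NewManager:  # stand-in for the new EmbeddingManager
+    def embed(self, image, use_cache=True):
+        return ["new", image]
+
+
+def uses_new_system(embeddings) -> bool:
+    """True if the object exposes the new embed() API, False for the old encode_image()."""
+    if callable(getattr(embeddings, "embed", None)):
+        return True  # checked first, so a transitional class exposing both counts as new
+    if callable(getattr(embeddings, "encode_image", None)):
+        return False
+    raise TypeError(f"unsupported embeddings object: {type(embeddings).__name__}")
+```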
+
+## Recommended Configuration
+
+```python
+config = {
+    "embedding": {
+        "model": "clip",  # "clip" or "pix2struct" (not recommended)
+        "cache_size": 1000,
+        "fallback_enabled": True
+    },
+    "fine_tuning": {
+        "enabled": True,
+        "trigger_threshold": 10,  # Fine-tune every 10 examples
+        "max_examples": 1000,
+        "checkpoint_dir": "data/fine_tuning"
+    },
+    "faiss": {
+        "index_path": "data/workflow_embeddings"
+    }
+}
+```
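+
+The configuration keys line up with the constructor arguments shown in the Initialization section. The sketch below shows that mapping and assumes the dictionary above is loaded as-is. `embedding_kwargs` and `fine_tuner_kwargs` are hypothetical helpers. `checkpoint_dir` and `index_path` are left out because the guide does not show which constructor, if any, accepts them.
+
+```python
+# Hypothetical helpers mapping the config dict to constructor keyword arguments.
+from typing import Any, Dict
+
+SUPPORTED_MODELS = {"clip", "pix2struct"}
+
+
+def embedding_kwargs(config: Dict[str, Any]) -> Dict[str, Any]:
+    section = config["embedding"]
+    model = section.get("model", "clip")
+    if model not in SUPPORTED_MODELS:
+        raise ValueError(f"unknown embedding model: {model!r}")
+    return {
+        "model_name": model,
+        "cache_size": section.get("cache_size", 1000),
+        "fallback_enabled": section.get("fallback_enabled", True),
+    }
+
+
+def fine_tuner_kwargs(config: Dict[str, Any]) -> Dict[str, Any]:
+    section = config["fine_tuning"]
+    return {
+        "trigger_threshold": section.get("trigger_threshold", 10),
+        "max_examples": section.get("max_examples", 1000),
+    }
+
+
+config = {
+    "embedding": {"model": "clip", "cache_size": 1000, "fallback_enabled": True},
+    "fine_tuning": {"enabled": True, "trigger_threshold": 10, "max_examples": 1000},
+}
+# e.g. EmbeddingManager(**embedding_kwargs(config))
+```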
+
+## Metrics and Monitoring
+
+### Cache Statistics
+
+```python
+stats = embedding_manager.get_stats()
+print(f"Cache hit rate: {stats['cache_hit_rate']:.1%}")
+print(f"Cache size: {stats['cache_size']}/{stats['cache_capacity']}")
+```
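+
+`get_stats()` reports a hit rate, a current size and a capacity. The sketch below shows how these three numbers relate. It assumes the hit rate is hits divided by total lookups, guarded against division by zero before the first lookup. `CacheStats` is hypothetical, though its `as_dict()` keys match the ones read above.
+
+```python
+# Hypothetical sketch of cache statistics bookkeeping (not EmbeddingManager.get_stats()).
+from dataclasses import dataclass
+
+
+@dataclass
+class CacheStats:
+    capacity: int
+    size: int = 0
+    hits: int = 0
+    misses: int = 0
+
+    def record(self, hit: bool) -> None:
+        if hit:
+            self.hits += 1
+        else:
+            self.misses += 1
+            self.size = min(self.size + 1, self.capacity)  # a miss inserts one entry
+
+    def as_dict(self) -> dict:
+        lookups = self.hits + self.misses
+        return {
+            "cache_hit_rate": self.hits / lookups if lookups else 0.0,
+            "cache_size": self.size,
+            "cache_capacity": self.capacity,
+        }
+
+
+stats = CacheStats(capacity=1000)
+for hit in (False, True, True, False):
+    stats.record(hit)
+```
+
+With the format string above, a hit rate of 0.5 prints as `50.0%`.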
+
+### Fine-tuning Statistics
+
+```python
+stats = fine_tuner.get_stats()
+print(f"Examples collected: {stats['total_examples']}")
+print(f"Trainings completed: {stats['training_count']}")
+print(f"Is training: {stats['is_training']}")
+
+# Metrics history
+for metrics in stats['metrics_history']:
+    print(f"Training #{metrics['training_number']}: "
+          f"loss={metrics['loss']:.4f}, "
+          f"duration={metrics['duration_seconds']:.1f}s")
+```
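+
+The metrics history is a list of per-run dictionaries. The sketch below summarizes it and assumes each entry carries the `training_number`, `loss` and `duration_seconds` keys used in the loop above. `summarize_history` is hypothetical. Each run trains on a different buffer, so the first-to-last loss change is only a rough trend indicator, not a strict comparison.
+
+```python
+# Hypothetical summary of metrics_history entries (keys taken from the loop above).
+from typing import Any, Dict, List
+
+
+def summarize_history(history: List[Dict[str, float]]) -> Dict[str, Any]:
+    if not history:
+        return {"runs": 0}
+    runs = sorted(history, key=lambda m: m["training_number"])  # chronological order
+    total = sum(m["duration_seconds"] for m in runs)
+    return {
+        "runs": len(runs),
+        "first_loss": runs[0]["loss"],
+        "last_loss": runs[-1]["loss"],
+        "loss_change": runs[-1]["loss"] - runs[0]["loss"],  # negative means the loss went down
+        "total_duration_seconds": total,
+        "mean_duration_seconds": total / len(runs),
+    }
+
+
+history = [
+    {"training_number": 2, "loss": 0.30, "duration_seconds": 40.0},
+    {"training_number": 1, "loss": 0.45, "duration_seconds": 50.0},
+]
+summary = summarize_history(history)
+```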
+
+## Expected Performance
+
+### CLIP (Recommended)
+- **Embedding**: ~20 ms per image (batched)
+- **Cache hit**: <1 ms
+- **Fine-tuning**: 30 s to 2 min for 10 to 100 examples
+- **Memory**: ~2 GB (model) + ~500 MB (FAISS for 10k embeddings)
+
+### Pix2Struct (Not Recommended)
+- **Embedding**: ~2900 ms per image (146x slower)
+- **Discrimination**: 9x less precise than CLIP
+- **Memory**: ~4 GB (model)
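+
+Two of these figures can be cross-checked with simple arithmetic. The slowdown ratio follows from the two per-image latencies. With the rounded values it gives 145, close to the quoted 146x, which presumably comes from unrounded timings. The raw vector storage of a flat FAISS index is `n * d * 4` bytes for float32 vectors. The sketch assumes `d = 512`, the output size of the common CLIP ViT-B/32 model; the guide does not say which CLIP variant is used. Under that assumption, 10k vectors take about 20 MB, so most of the ~500 MB estimate would have to come from other overhead or a much larger collection.
+
+```python
+# Back-of-the-envelope checks; d = 512 is an assumption (CLIP ViT-B/32 output size).
+BYTES_PER_FLOAT32 = 4
+
+
+def flat_index_bytes(n_vectors: int, dimension: int) -> int:
+    """Raw vector storage of an exact (flat) index: n x d float32 values."""
+    return n_vectors * dimension * BYTES_PER_FLOAT32
+
+
+clip_ms, pix2struct_ms = 20, 2900
+slowdown = pix2struct_ms / clip_ms            # 145.0 with the rounded latencies
+
+raw_mb = flat_index_bytes(10_000, 512) / 1e6  # vectors only, no index overhead
+```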
+
+## Troubleshooting
+
+### Problem: Dimension mismatch in FAISS
+
+```python
+# Solution: rebuild the index
+if faiss_index.rebuild_if_needed(new_dimension):
+    logger.warning("FAISS index rebuilt due to dimension change")
+```
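+
+`rebuild_if_needed()` returns whether the index had to be rebuilt. The sketch below shows what such a method might do. It assumes the index simply discards its vectors when the embedding dimension changes (for example after switching embedding models), because vectors of different sizes cannot be compared. `TinyIndex` is hypothetical. After a rebuild, the stored items have to be re-embedded with the new model.
+
+```python
+# Hypothetical sketch of a dimension-checked index (not the FAISSIndex implementation).
+from typing import Dict, List
+
+
+class TinyIndex:
+    def __init__(self, dimension: int) -> None:
+        self.dimension = dimension
+        self.vectors: Dict[str, List[float]] = {}
+
+    def add(self, key: str, vector: List[float]) -> None:
+        if len(vector) != self.dimension:
+            raise ValueError(f"expected {self.dimension} values, got {len(vector)}")
+        self.vectors[key] = vector
+
+    def rebuild_if_needed(self, new_dimension: int) -> bool:
+        """Reset the index when the dimension changes; return True if it was rebuilt."""
+        if new_dimension == self.dimension:
+            return False
+        self.dimension = new_dimension
+        self.vectors.clear()  # old vectors are incompatible and must be re-embedded
+        return True
+
+
+index = TinyIndex(dimension=2)
+index.add("wf1", [0.1, 0.2])
+rebuilt = index.rebuild_if_needed(3)
+```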
+
+### Problem: Fine-tuning blocks the application
+
+```python
+# Check that fine-tuning runs in a separate background (daemon) thread
+assert fine_tuner.training_thread.daemon
+```
+
+### Problem: Cache is not working
+
+```python
+# Make sure use_cache=True (the default)
+embedding = embedding_manager.embed(image, use_cache=True)
+```
+
+## Tests
+
+Run the full test suite:
+
+```bash
+# Test the core system
+geniusia2/venv/bin/python test_embedding_system.py
+
+# Benchmark CLIP vs Pix2Struct
+geniusia2/venv/bin/python test_pix2struct_vs_clip.py
+```
+
+## Next Steps
+
+1. ✅ Integrate into `Orchestrator.__init__()`
+2. ✅ Connect to workflow events (accept/reject)
+3. ✅ Add saving on shutdown
+4. ✅ Test under real-world conditions
+5. ✅ Monitor fine-tuning metrics
+
+## Support
+
+For questions, see:
+- `PIX2STRUCT_BENCHMARK_RESULTS.md` - Benchmark results
+- `.kiro/specs/embedding-improvement/` - Full spec
+- The tests in `test_embedding_system.py`
+