Geniusia_v2/.kiro/specs/embedding-improvement/requirements.md
2026-03-05 00:20:25 +01:00


Requirements - Improving the Embedding and Fine-tuning System

Introduction

This document specifies the requirements for improving the visual embedding system used for workflow matching. The system must fix the current FAISS problems, introduce an abstraction that supports multiple models (CLIP, Pix2Struct), and implement lightweight incremental fine-tuning to improve accuracy over time.

Glossary

  • Embedding: Vector representation of a captured screen image
  • FAISS: Facebook AI library for vector similarity search
  • CLIP: OpenAI vision-language model used for general-purpose embeddings
  • Pix2Struct: Google model specialized in understanding user interfaces
  • Lightweight Fine-tuning: Incremental adjustment of the model's last layer
  • Embedder: Abstract interface for generating image embeddings
  • Workflow Match: Correspondence between the current situation and a known workflow

Requirements

Requirement 1

User Story: As a developer, I want the FAISS embedding system to work correctly, so that workflow matching can function reliably.

Acceptance Criteria

  1. WHEN the system generates embeddings THEN the FAISS index SHALL store them without dimension errors
  2. WHEN the system searches for similar embeddings THEN FAISS SHALL return results with valid similarity scores
  3. WHEN embeddings are saved to disk THEN the system SHALL persist both the index and metadata correctly
  4. WHEN embeddings are loaded from disk THEN the system SHALL restore the exact same index state
  5. WHEN the embedding dimension changes THEN the system SHALL rebuild the index automatically
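
The criteria above can be illustrated with a small sketch. It is not FAISS: `VectorIndex` is a hypothetical brute-force cosine index written with the standard library only, and it stores vectors as plain lists. It is meant to show the behavior the criteria ask for (dimension checks, valid scores, save/load round trip, rebuild on dimension change), under the assumption that a real implementation would wrap a FAISS index the same way.

```python
import json
import math
import os


class VectorIndex:
    """Brute-force cosine index; a stand-in for a FAISS index, not FAISS itself."""

    def __init__(self, dim):
        self.dim = dim
        self.vectors = []   # unit-length vectors
        self.metadata = []  # one metadata dict per vector, same order

    @staticmethod
    def _normalize(vec):
        norm = math.sqrt(sum(x * x for x in vec))
        if norm == 0.0:
            raise ValueError("cannot index a zero vector")
        return [x / norm for x in vec]

    def rebuild(self, new_dim):
        # Old vectors cannot be compared with the new ones, so they are dropped.
        # A real system would re-embed the source images at this point.
        self.dim = new_dim
        self.vectors = []
        self.metadata = []

    def add(self, vec, meta):
        if len(vec) != self.dim:
            self.rebuild(len(vec))  # criterion 5: dimension changed
        self.vectors.append(self._normalize(vec))
        self.metadata.append(meta)

    def search(self, query, k=5):
        if len(query) != self.dim:
            raise ValueError(f"query has dimension {len(query)}, index expects {self.dim}")
        q = self._normalize(query)
        scored = [
            (sum(a * b for a, b in zip(q, vec)), meta)
            for vec, meta in zip(self.vectors, self.metadata)
        ]
        scored.sort(key=lambda pair: pair[0], reverse=True)
        return scored[:k]

    def save(self, path):
        # Index and metadata go into one file, written via a temp file so a
        # crash cannot leave a half-written index behind.
        state = {"dim": self.dim, "vectors": self.vectors, "metadata": self.metadata}
        tmp_path = path + ".tmp"
        with open(tmp_path, "w", encoding="utf-8") as fh:
            json.dump(state, fh)
        os.replace(tmp_path, path)

    @classmethod
    def load(cls, path):
        with open(path, encoding="utf-8") as fh:
            state = json.load(fh)
        index = cls(state["dim"])
        index.vectors = state["vectors"]
        index.metadata = state["metadata"]
        return index
```

Keeping the vectors and their metadata in a single saved state is one way to avoid the two drifting apart; with FAISS the index file and the metadata file would need to be written together.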

Requirement 2

User Story: As a developer, I want an abstraction layer for embedding models, so that I can easily switch between CLIP and Pix2Struct.

Acceptance Criteria

  1. WHEN creating an embedder THEN the system SHALL provide a common interface for all models
  2. WHEN generating embeddings THEN the interface SHALL accept PIL images and return numpy arrays
  3. WHEN switching models THEN the system SHALL maintain backward compatibility with existing workflows
  4. WHEN a model fails to load THEN the system SHALL fall back to CLIP automatically
  5. WHEN using different models THEN the system SHALL normalize embeddings to comparable scales
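
A minimal sketch of what such an abstraction could look like follows. All names here (`Embedder`, `StubClipEmbedder`, `UnavailablePix2StructEmbedder`, `create_embedder`, `REGISTRY`) are hypothetical, the embedders are stubs rather than real models, and the sketch uses plain lists and opaque image objects where the requirement calls for numpy arrays and PIL images, so that it runs with the standard library alone.

```python
import math
from abc import ABC, abstractmethod


class Embedder(ABC):
    """Common interface: every model exposes the same embed() method."""

    name = "base"

    @abstractmethod
    def _embed_raw(self, image):
        """Return an unnormalized list of floats for one image."""

    def embed(self, image):
        # L2-normalize so that vectors from different models share a scale.
        raw = self._embed_raw(image)
        norm = math.sqrt(sum(x * x for x in raw))
        return [x / norm for x in raw] if norm else list(raw)


class StubClipEmbedder(Embedder):
    """Placeholder for a CLIP-backed embedder (no real model involved)."""

    name = "clip"

    def _embed_raw(self, image):
        return [float(len(image)), 1.0, 2.0]


class UnavailablePix2StructEmbedder(Embedder):
    """Placeholder that simulates a model that fails to load."""

    name = "pix2struct"

    def __init__(self):
        raise RuntimeError("model weights not found")

    def _embed_raw(self, image):
        return []


REGISTRY = {"clip": StubClipEmbedder, "pix2struct": UnavailablePix2StructEmbedder}


def create_embedder(model_name):
    """Build the requested embedder, falling back to CLIP if loading fails."""
    try:
        return REGISTRY[model_name]()
    except Exception:
        # Covers unknown names and load failures; a failure of CLIP itself
        # is deliberately left to propagate.
        return REGISTRY["clip"]()


embedder = create_embedder("pix2struct")  # falls back to the CLIP stub
vector = embedder.embed(b"fake image bytes")
```

Putting normalization in the base class rather than in each model is a design choice: callers never see an unnormalized vector, whichever model is active.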

Requirement 3

User Story: As a user, I want Pix2Struct to be available as an embedding model, so that UI matching accuracy improves over CLIP.

Acceptance Criteria

  1. WHEN Pix2Struct is selected THEN the system SHALL load the model on first use
  2. WHEN generating embeddings with Pix2Struct THEN the system SHALL use GPU if available
  3. WHEN Pix2Struct generates embeddings THEN they SHALL have consistent dimensions
  4. WHEN comparing Pix2Struct and CLIP on the same set of screens THEN Pix2Struct SHALL show better UI element recognition than CLIP
  5. WHEN Pix2Struct is unavailable THEN the system SHALL fall back to CLIP without errors
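
Criteria 1 and 2 (load on first use, prefer GPU) could be sketched as below. `LazyModel` and `pick_device` are hypothetical helpers, and the loader is a stand-in function rather than a real Pix2Struct load. With PyTorch, the availability flag would typically come from `torch.cuda.is_available()`.

```python
import threading


class LazyModel:
    """Defers an expensive model load until the first call to get()."""

    def __init__(self, loader):
        self._loader = loader
        self._model = None
        self._lock = threading.Lock()

    def get(self):
        # Double-checked locking: only the first caller pays the load cost,
        # and concurrent callers cannot trigger a second load.
        if self._model is None:
            with self._lock:
                if self._model is None:
                    self._model = self._loader()
        return self._model


def pick_device(gpu_available):
    """Return the device name to use; the flag is supplied by the caller."""
    return "cuda" if gpu_available else "cpu"


load_calls = []


def fake_loader():
    # Stand-in for loading model weights onto the chosen device.
    load_calls.append(1)
    return {"model": "pix2struct-stub", "device": pick_device(gpu_available=False)}


lazy = LazyModel(fake_loader)
first = lazy.get()   # triggers the load
second = lazy.get()  # reuses the already loaded model
```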

Requirement 4

User Story: As a user, I want the system to learn from my usage patterns, so that workflow matching improves over time.

Acceptance Criteria

  1. WHEN a workflow is successfully executed THEN the system SHALL collect the screen embeddings as positive examples
  2. WHEN a workflow suggestion is rejected THEN the system SHALL collect the screen embeddings as negative examples
  3. WHEN 10 new examples are collected THEN the system SHALL trigger a lightweight fine-tuning update
  4. WHEN fine-tuning runs THEN the system SHALL complete within 2 minutes
  5. WHEN fine-tuning completes THEN the system SHALL update the model weights without restarting
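
The collection and trigger logic in criteria 1 to 3 could look roughly like this. `FeedbackBuffer` and its `on_batch` callback are hypothetical names; the threshold of 10 comes from criterion 3, and the fine-tuning step itself is left to the callback.

```python
class FeedbackBuffer:
    """Collects labeled embeddings and hands them off in batches."""

    def __init__(self, on_batch, threshold=10):
        self.on_batch = on_batch    # called with a list of (embedding, label)
        self.threshold = threshold
        self.examples = []

    def record(self, embedding, accepted):
        # Executed workflows count as positive (1), rejected suggestions as negative (0).
        label = 1 if accepted else 0
        self.examples.append((embedding, label))
        if len(self.examples) >= self.threshold:
            batch, self.examples = self.examples, []
            self.on_batch(batch)


received = []
buffer = FeedbackBuffer(on_batch=received.append)

for i in range(9):
    buffer.record([float(i), 0.0], accepted=True)
pending_before = len(buffer.examples)  # nine examples, no trigger yet

buffer.record([9.0, 0.0], accepted=False)  # tenth example triggers the batch
```

The buffer is emptied before the callback runs, so examples that arrive while fine-tuning is in progress start a fresh batch instead of being lost.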

Requirement 5

User Story: As a user, I want fine-tuning to happen in the background, so that it doesn't interrupt my work.

Acceptance Criteria

  1. WHEN fine-tuning is triggered THEN the system SHALL run it in a separate thread
  2. WHEN fine-tuning is running THEN the system SHALL continue using the current model
  3. WHEN fine-tuning completes THEN the system SHALL swap to the new model atomically
  4. WHEN fine-tuning fails THEN the system SHALL log the error and keep the current model
  5. WHEN the system shuts down during fine-tuning THEN the system SHALL save partial progress
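
One possible shape for criteria 1 to 4 is sketched below, using only `threading` and `logging`. `ModelHolder`, `fine_tune_in_background`, and the `train_fn` callback are hypothetical, and models are represented by plain strings. Criterion 5 (saving partial progress on shutdown) is not covered by this sketch.

```python
import logging
import threading

logger = logging.getLogger("embedding.finetune")


class ModelHolder:
    """Holds the active model; readers always see a complete model."""

    def __init__(self, model):
        self._model = model
        self._lock = threading.Lock()

    def current(self):
        with self._lock:
            return self._model

    def swap(self, new_model):
        # A single reference assignment under the lock: the swap is atomic
        # from the point of view of any caller of current().
        with self._lock:
            self._model = new_model


def fine_tune_in_background(holder, train_fn, batch):
    """Run train_fn in a separate thread and swap the model in on success."""

    def worker():
        try:
            new_model = train_fn(holder.current(), batch)
        except Exception:
            logger.exception("fine-tuning failed; keeping current model")
            return
        holder.swap(new_model)

    thread = threading.Thread(target=worker, name="fine-tune")
    thread.start()
    return thread


def failing_train(model, batch):
    raise RuntimeError("simulated training failure")


holder = ModelHolder("v1")
fine_tune_in_background(holder, lambda model, batch: model + "+tuned", []).join()
after_success = holder.current()

fine_tune_in_background(holder, failing_train, []).join()
after_failure = holder.current()  # unchanged, the error was only logged
```

Training works on its own copy of the model and the holder is only touched at the end, which is what lets the application keep serving the current model while fine-tuning runs.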

Requirement 6

User Story: As a developer, I want the embedding system to be efficient, so that it doesn't slow down the application.

Acceptance Criteria

  1. WHEN generating embeddings THEN the system SHALL cache results for identical images
  2. WHEN the cache exceeds 1000 entries THEN the system SHALL evict least recently used entries
  3. WHEN using GPU THEN the system SHALL batch multiple embedding requests
  4. WHEN GPU is unavailable THEN the system SHALL use CPU without errors
  5. WHEN measuring performance THEN embedding generation SHALL take less than 200 ms per image
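
Criteria 1 and 2 describe a least-recently-used cache keyed by image content. A minimal sketch, assuming images are available as raw bytes and using a hypothetical `EmbeddingCache` class around a `compute` callback:

```python
import hashlib
from collections import OrderedDict


class EmbeddingCache:
    """LRU cache keyed by a hash of the image bytes."""

    def __init__(self, compute, max_entries=1000):
        self._compute = compute      # function: image bytes -> embedding
        self._max = max_entries
        self._entries = OrderedDict()

    def get(self, image_bytes):
        key = hashlib.sha256(image_bytes).hexdigest()
        if key in self._entries:
            self._entries.move_to_end(key)  # mark as most recently used
            return self._entries[key]
        value = self._compute(image_bytes)
        self._entries[key] = value
        if len(self._entries) > self._max:
            self._entries.popitem(last=False)  # evict least recently used
        return value

    def __len__(self):
        return len(self._entries)


compute_calls = []


def fake_embed(image_bytes):
    # Stand-in for a real embedding call.
    compute_calls.append(image_bytes)
    return [float(len(image_bytes))]


cache = EmbeddingCache(fake_embed, max_entries=2)
cache.get(b"a")
cache.get(b"b")
cache.get(b"a")  # hit: served from the cache
cache.get(b"c")  # cache is full, so b (least recently used) is evicted
cache.get(b"a")  # still cached
cache.get(b"b")  # recomputed after eviction
```

Hashing the bytes means only pixel-identical images share an entry; near-duplicate screens would still be recomputed.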

Requirement 7

User Story: As a user, I want to see which embedding model is being used, so that I can understand system behavior.

Acceptance Criteria

  1. WHEN the system starts THEN the system SHALL log which embedding model is loaded
  2. WHEN fine-tuning completes THEN the system SHALL log the improvement metrics
  3. WHEN switching models THEN the system SHALL notify the user via logs
  4. WHEN embeddings are generated THEN the system SHALL include model metadata
  5. WHEN debugging THEN the system SHALL provide embedding visualization tools