Requirements - Embedding System Improvement and Fine-tuning

Introduction

This document specifies the requirements for improving the visual embedding system used for workflow matching. The system must fix the current FAISS issues (see Requirement 1), introduce an abstraction layer that supports multiple models (CLIP, Pix2Struct), and implement lightweight incremental fine-tuning so that matching accuracy improves over time.

Glossary

Embedding: Vector representation of a screenshot
FAISS: Facebook AI Similarity Search, a library for vector similarity search
CLIP: OpenAI vision-language model used for general-purpose embeddings
Pix2Struct: Google model specialized in understanding user interfaces
Lightweight Fine-tuning: Incremental adjustment of the model's last layer
Embedder: Abstract interface for generating image embeddings
Workflow Match: Correspondence between the current situation and a known workflow

Requirements

Requirement 1

User Story: As a developer, I want the FAISS embedding system to work correctly, so that workflow matching can function reliably.

Acceptance Criteria

WHEN the system generates embeddings THEN the FAISS index SHALL store them without dimension errors
WHEN the system searches for similar embeddings THEN FAISS SHALL return results with valid similarity scores
WHEN embeddings are saved to disk THEN the system SHALL persist both the index and metadata correctly
WHEN embeddings are loaded from disk THEN the system SHALL restore the exact same index state
WHEN the embedding dimension changes THEN the system SHALL rebuild the index automatically
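A minimal sketch of the five criteria above, assuming vectors are L2-normalized and scored by inner product (cosine similarity). To stay dependency-free it uses a brute-force in-memory list as a stand-in for a FAISS `IndexFlatIP`; with FAISS, the index itself would be persisted with `faiss.write_index`/`faiss.read_index` and the metadata kept in a sidecar file, since FAISS stores only vectors. The class name `DimensionAwareIndex` and the JSON file layout are hypothetical.

```python
from __future__ import annotations

import json
import math
from pathlib import Path


class DimensionAwareIndex:
    """Cosine-similarity index that rebuilds itself when the embedding size changes.

    Brute-force stand-in for a FAISS IndexFlatIP (hypothetical wrapper).
    """

    def __init__(self, dim: int):
        self.dim = dim
        self.vectors: list[list[float]] = []
        self.metadata: list[dict] = []

    @staticmethod
    def _normalize(vec: list[float]) -> list[float]:
        norm = math.sqrt(sum(x * x for x in vec))
        if norm == 0:
            raise ValueError("cannot index a zero vector")
        return [x / norm for x in vec]

    def rebuild(self, new_dim: int) -> None:
        # Vectors of another size come from another model and cannot be
        # compared with new ones, so the index is reset to the new size.
        self.dim = new_dim
        self.vectors = []
        self.metadata = []

    def add(self, vec: list[float], meta: dict) -> None:
        if len(vec) != self.dim:
            self.rebuild(len(vec))
        self.vectors.append(self._normalize(vec))
        self.metadata.append(meta)

    def search(self, query: list[float], k: int = 5) -> list[tuple[float, dict]]:
        if len(query) != self.dim:
            raise ValueError(f"query has dim {len(query)}, index expects {self.dim}")
        q = self._normalize(query)
        # Inner product of unit vectors = cosine similarity, in [-1, 1] up to rounding.
        scored = [
            (sum(a * b for a, b in zip(q, v)), m)
            for v, m in zip(self.vectors, self.metadata)
        ]
        scored.sort(key=lambda item: item[0], reverse=True)
        return scored[:k]

    def save(self, path: str) -> None:
        # Vectors and metadata go into one file so they cannot drift apart.
        payload = {"dim": self.dim, "vectors": self.vectors, "metadata": self.metadata}
        Path(path).write_text(json.dumps(payload))

    @classmethod
    def load(cls, path: str) -> DimensionAwareIndex:
        payload = json.loads(Path(path).read_text())
        index = cls(payload["dim"])
        index.vectors = payload["vectors"]
        index.metadata = payload["metadata"]
        return index
```

Resetting on a dimension change is the simplest safe behavior, because vectors from different models are not comparable; a fuller rebuild would re-embed the stored screenshots with the new model.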

Requirement 2

User Story: As a developer, I want an abstraction layer for embedding models, so that I can easily switch between CLIP and Pix2Struct.

Acceptance Criteria

WHEN creating an embedder THEN the system SHALL provide a common interface for all models
WHEN generating embeddings THEN the interface SHALL accept PIL images and return numpy arrays
WHEN switching models THEN the system SHALL maintain backward compatibility with existing workflows
WHEN a model fails to load THEN the system SHALL fall back to CLIP automatically
WHEN using different models THEN the system SHALL normalize embeddings to comparable scales
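One possible shape for the abstraction, sketched under assumptions: `Embedder`, `create_embedder`, and the two placeholder models are hypothetical, images are raw `bytes`, and vectors are Python lists so the sketch runs without dependencies (the criteria call for PIL images in and numpy arrays out). The base class L2-normalizes every output to unit length, and the factory falls back to CLIP when the requested model fails to load.

```python
from __future__ import annotations

import logging
import math
from abc import ABC, abstractmethod

logger = logging.getLogger("embeddings")


class Embedder(ABC):
    """Common interface for every embedding model (hypothetical)."""

    name = "base"

    @abstractmethod
    def _raw_embed(self, image: bytes) -> list[float]:
        """Model-specific forward pass."""

    def embed(self, image: bytes) -> list[float]:
        # L2-normalize so every model returns unit-length vectors.
        vec = self._raw_embed(image)
        norm = math.sqrt(sum(x * x for x in vec)) or 1.0
        return [x / norm for x in vec]


class PlaceholderClipEmbedder(Embedder):
    name = "clip"

    def _raw_embed(self, image: bytes) -> list[float]:
        # Placeholder for a real CLIP forward pass: folds bytes into 4 buckets.
        buckets = [0.0] * 4
        for i, byte in enumerate(image):
            buckets[i % 4] += byte
        return buckets


class UnavailablePix2StructEmbedder(Embedder):
    name = "pix2struct"

    def __init__(self) -> None:
        # Simulates a model whose weights cannot be loaded.
        raise RuntimeError("Pix2Struct weights not found")

    def _raw_embed(self, image: bytes) -> list[float]:
        raise NotImplementedError


def create_embedder(name: str, registry: dict[str, type[Embedder]]) -> Embedder:
    """Instantiate the requested model, falling back to CLIP on any load error."""
    try:
        embedder = registry[name]()
    except Exception as exc:
        logger.warning("could not load %r (%s); falling back to CLIP", name, exc)
        embedder = registry["clip"]()
    logger.info("embedding model loaded: %s", embedder.name)
    return embedder
```

Unit length makes scores comparable in scale, but it does not make vectors from two different models comparable with each other, so embeddings stored under one model still need re-embedding after a switch.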

Requirement 3

User Story: As a user, I want Pix2Struct to be available as an embedding model, so that UI matching is more accurate than with CLIP.

Acceptance Criteria

WHEN Pix2Struct is selected THEN the system SHALL load the model on first use
WHEN generating embeddings with Pix2Struct THEN the system SHALL use GPU if available
WHEN Pix2Struct generates embeddings THEN they SHALL have consistent dimensions
WHEN comparing Pix2Struct and CLIP on the same set of UI screenshots THEN Pix2Struct SHALL achieve better UI element recognition
WHEN Pix2Struct is unavailable THEN the system SHALL fall back to CLIP without errors
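A sketch of lazy loading with device selection, plus one way to obtain fixed-size Pix2Struct vectors. Assumptions: `LazyModel` and `masked_mean_pool` are hypothetical helpers, the `gpu_available` callable would be something like `torch.cuda.is_available` in a PyTorch build, and an image embedding is taken as the average of the encoder's patch states over unpadded patches (this spec does not say how Pix2Struct vectors are extracted). Because the average always has the encoder's hidden size, however many patches are real, the output dimension stays constant.

```python
from __future__ import annotations

import threading


class LazyModel:
    """Loads a model on first use, on GPU when available (hypothetical wrapper)."""

    def __init__(self, loader, gpu_available):
        self._loader = loader                # hypothetical: device str -> model
        self._gpu_available = gpu_available  # e.g. torch.cuda.is_available
        self._model = None
        self._lock = threading.Lock()
        self.device = None

    def get(self):
        if self._model is None:
            with self._lock:
                # Re-check inside the lock so concurrent callers load only once.
                if self._model is None:
                    self.device = "cuda" if self._gpu_available() else "cpu"
                    self._model = self._loader(self.device)
        return self._model


def masked_mean_pool(patch_states: list[list[float]], attention_mask: list[int]) -> list[float]:
    """Average per-patch vectors, skipping padded patches (mask value 0).

    Returns one vector of the hidden size regardless of how many patches were real.
    """
    hidden = len(patch_states[0])
    total = [0.0] * hidden
    count = 0
    for vec, keep in zip(patch_states, attention_mask):
        if keep:
            count += 1
            for i, x in enumerate(vec):
                total[i] += x
    if count == 0:
        raise ValueError("no unmasked patches")
    return [x / count for x in total]
```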

Requirement 4

User Story: As a user, I want the system to learn from my usage patterns, so that workflow matching improves over time.

Acceptance Criteria

WHEN a workflow is successfully executed THEN the system SHALL collect the screen embeddings as positive examples
WHEN a workflow suggestion is rejected THEN the system SHALL collect the screen embeddings as negative examples
WHEN 10 new examples are collected THEN the system SHALL trigger a lightweight fine-tuning update
WHEN fine-tuning runs THEN the system SHALL complete within 2 minutes
WHEN fine-tuning completes THEN the system SHALL update the model weights without restarting
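A minimal sketch of the collection and trigger logic, assuming a hypothetical `FeedbackBuffer` that counts examples since the last run and a hypothetical `run_with_budget` helper that stops training once the 2-minute budget is spent. The budget is checked between steps, so each training step must be short for the limit to hold.

```python
from __future__ import annotations

import time
from dataclasses import dataclass, field


@dataclass
class FeedbackBuffer:
    """Collects labelled screen embeddings and signals when to fine-tune (hypothetical)."""

    trigger_every: int = 10
    positives: list = field(default_factory=list)
    negatives: list = field(default_factory=list)
    new_since_last_run: int = 0

    def record(self, embedding, accepted: bool) -> bool:
        """Store one example; return True when a fine-tuning run should start."""
        (self.positives if accepted else self.negatives).append(embedding)
        self.new_since_last_run += 1
        if self.new_since_last_run >= self.trigger_every:
            self.new_since_last_run = 0
            return True
        return False


def run_with_budget(train_step, max_steps: int, budget_s: float = 120.0) -> int:
    """Call train_step() until max_steps or the time budget is used up.

    Returns the number of steps actually completed.
    """
    deadline = time.monotonic() + budget_s
    steps = 0
    while steps < max_steps and time.monotonic() < deadline:
        train_step()
        steps += 1
    return steps
```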

Requirement 5

User Story: As a user, I want fine-tuning to happen in the background, so that it doesn't interrupt my work.

Acceptance Criteria

WHEN fine-tuning is triggered THEN the system SHALL run it in a separate thread
WHEN fine-tuning is running THEN the system SHALL continue using the current model
WHEN fine-tuning completes THEN the system SHALL swap to the new model atomically
WHEN fine-tuning fails THEN the system SHALL log the error and keep the current model
WHEN the system shuts down during fine-tuning THEN the system SHALL save partial progress
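A sketch of background fine-tuning with an atomic swap, using only `threading`. Assumptions: `ModelHolder`, `BackgroundFineTuner`, and the `train_fn(base, examples, stop_event, save_checkpoint)` contract are hypothetical; `train_fn` is expected to train a copy rather than mutate the live model, to check `stop_event` between steps, and to call `save_checkpoint` before returning early on shutdown.

```python
from __future__ import annotations

import logging
import threading

logger = logging.getLogger("finetune")


class ModelHolder:
    """Holds the live model; readers always see either the old or the new one."""

    def __init__(self, model):
        self._model = model
        self._lock = threading.Lock()

    def get(self):
        with self._lock:
            return self._model

    def swap(self, new_model) -> None:
        with self._lock:
            self._model = new_model


class BackgroundFineTuner:
    def __init__(self, holder: ModelHolder, train_fn, save_checkpoint):
        self.holder = holder
        self.train_fn = train_fn
        self.save_checkpoint = save_checkpoint
        self.stop_event = threading.Event()
        self._thread = None

    def start(self, examples) -> bool:
        """Launch a run in a separate thread; ignored if one is already running."""
        if self._thread is not None and self._thread.is_alive():
            return False
        self._thread = threading.Thread(target=self._run, args=(examples,), daemon=True)
        self._thread.start()
        return True

    def _run(self, examples) -> None:
        base = self.holder.get()  # the live model keeps serving requests meanwhile
        try:
            new_model = self.train_fn(base, examples, self.stop_event, self.save_checkpoint)
        except Exception:
            logger.exception("fine-tuning failed; keeping current model")
            return
        if new_model is not None and not self.stop_event.is_set():
            self.holder.swap(new_model)

    def wait(self, timeout=None) -> None:
        if self._thread is not None:
            self._thread.join(timeout)

    def shutdown(self, timeout: float = 5.0) -> None:
        """Ask training to stop (it should checkpoint) and wait for the thread."""
        self.stop_event.set()
        self.wait(timeout)
```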

Requirement 6

User Story: As a developer, I want the embedding system to be efficient, so that it doesn't slow down the application.

Acceptance Criteria

WHEN generating embeddings THEN the system SHALL cache results for identical images
WHEN the cache exceeds 1000 entries THEN the system SHALL evict the least recently used entries
WHEN using GPU THEN the system SHALL batch multiple embedding requests
WHEN GPU is unavailable THEN the system SHALL use CPU without errors
WHEN measuring performance THEN embedding generation SHALL take less than 200 ms per image
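A sketch of an LRU cache in front of a batched embedder, assuming a hypothetical `embed_batch(list_of_images)` callable and images passed as encoded bytes. Keys are SHA-256 digests, so "identical" here means byte-identical; for PIL images one could hash `image.tobytes()` together with the image size and mode. All cache misses within one call are embedded in a single batch.

```python
from __future__ import annotations

import hashlib
from collections import OrderedDict
from typing import Callable


class CachedEmbedder:
    """LRU-cached, batched front end for an embedding function (hypothetical)."""

    def __init__(self, embed_batch: Callable[[list[bytes]], list[list[float]]],
                 max_entries: int = 1000):
        self._embed_batch = embed_batch
        self._cache: OrderedDict[str, list[float]] = OrderedDict()
        self.max_entries = max_entries

    def embed_many(self, images: list[bytes]) -> list[list[float]]:
        keys = [hashlib.sha256(img).hexdigest() for img in images]
        results: dict[str, list[float]] = {}
        missing: dict[str, bytes] = {}  # insertion-ordered, deduplicated misses
        for key, img in zip(keys, images):
            if key in self._cache:
                self._cache.move_to_end(key)  # mark as recently used
                results[key] = self._cache[key]
            elif key not in missing:
                missing[key] = img
        if missing:
            # One batched call for all misses, so a GPU model can amortize overhead.
            vectors = self._embed_batch(list(missing.values()))
            for key, vec in zip(missing, vectors):
                results[key] = vec
                self._cache[key] = vec
        # Evict least recently used entries once over capacity.
        while len(self._cache) > self.max_entries:
            self._cache.popitem(last=False)
        return [results[k] for k in keys]
```

Results are collected before eviction, so a call larger than the cache capacity still returns every vector.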

Requirement 7

User Story: As a user, I want to see which embedding model is being used, so that I can understand system behavior.

Acceptance Criteria

WHEN the system starts THEN the system SHALL log which embedding model is loaded
WHEN fine-tuning completes THEN the system SHALL log the improvement metrics
WHEN switching models THEN the system SHALL notify the user via logs
WHEN embeddings are generated THEN the system SHALL include model metadata
WHEN debugging THEN the system SHALL provide embedding visualization tools
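A sketch of attaching model metadata to each embedding, assuming a hypothetical `EmbeddingRecord` type that carries the model name and a version string bumped after each fine-tuning run, so vectors produced by older weights are caught before they are compared. The metric logged by `log_finetune_metrics` (top-1 match accuracy before and after) is an assumption; the spec does not name one.

```python
from __future__ import annotations

import logging
from dataclasses import dataclass

logger = logging.getLogger("embeddings")


@dataclass(frozen=True)
class EmbeddingRecord:
    """An embedding plus the metadata needed to compare it safely (hypothetical)."""

    vector: tuple[float, ...]
    model_name: str
    model_version: str  # e.g. bumped after every fine-tuning run

    @property
    def dim(self) -> int:
        return len(self.vector)

    def compatible_with(self, other: EmbeddingRecord) -> bool:
        return (self.model_name, self.model_version, self.dim) == (
            other.model_name, other.model_version, other.dim)

    def similarity(self, other: EmbeddingRecord) -> float:
        # Refuse to compare vectors produced by different models or weights.
        if not self.compatible_with(other):
            raise ValueError(
                f"{self.model_name}@{self.model_version} vs "
                f"{other.model_name}@{other.model_version}: re-embed first")
        # Inner product; equals cosine similarity for unit-length vectors.
        return sum(a * b for a, b in zip(self.vector, other.vector))


def log_finetune_metrics(model_name: str, before: float, after: float) -> None:
    """Log a before/after metric (top-1 match accuracy is an assumed choice)."""
    logger.info("fine-tuned %s: top-1 match accuracy %.3f -> %.3f (%+.3f)",
                model_name, before, after, after - before)
```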
