perf(ocr): EasyOCR remplace docTR dans FastDetector + TitleVerifier

FastDetector : EasyOCR GPU en singleton (~192ms vs 1300ms docTR = 6.8x)
- "Corbeille" lu correctement (docTR lisait "Gorbeille")
- "Google Chrome" en deux mots propres
- Détection complète (RF-DETR + OCR) en 313ms à chaud
- Fallback docTR si EasyOCR non disponible

TitleVerifier : EasyOCR pour le crop titre (fallback docTR)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
This commit is contained in:
Dom
2026-04-26 03:32:43 +02:00
parent cc64439738
commit 343d6fbe95
2 changed files with 67 additions and 15 deletions

View File

@@ -137,11 +137,33 @@ class TitleVerifier:
'reason': 'Titre changé' if changed else 'Titre identique (acceptable)',
}
_easyocr_reader = None # Singleton partagé
def _get_ocr(self):
"""Lazy load de la fonction OCR."""
"""Lazy load de la fonction OCR (EasyOCR prioritaire, fallback docTR)."""
if self._ocr_fn is not None:
return self._ocr_fn
# EasyOCR (rapide, bonne qualité GUI)
try:
import easyocr
import numpy as np
if TitleVerifier._easyocr_reader is None:
TitleVerifier._easyocr_reader = easyocr.Reader(
['fr', 'en'], gpu=True, verbose=False
)
def _easyocr_extract_text(img):
results = TitleVerifier._easyocr_reader.readtext(np.array(img))
return ' '.join(r[1] for r in results if r[1].strip())
self._ocr_fn = _easyocr_extract_text
return self._ocr_fn
except ImportError:
pass
# Fallback docTR
try:
import sys
sys.path.insert(0, 'visual_workflow_builder/backend')
@@ -149,10 +171,4 @@ class TitleVerifier:
self._ocr_fn = ocr_extract_text
return self._ocr_fn
except ImportError:
try:
from core.extraction.field_extractor import FieldExtractor
extractor = FieldExtractor()
self._ocr_fn = extractor.extract_text_from_image
return self._ocr_fn
except ImportError:
return None
return None