feat(extraction): extract_dossier_from_image - OCR->VLM->quality orchestrator (injectable)
Chains ocr_fn -> tokens_from_grid -> map_roles -> assess_quality. The OCR function and the VLM client are injectable (testable offline; the OCR import is lazy, so the module stays pure). This is the building block that the runtime extract_dossier handler will call. 4 tests (35 in total for role_mapper).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
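The claim that the chain is testable offline can be illustrated with a minimal, self-contained sketch. The real `tokens_from_grid`, `map_roles`, `assess_quality` and `VlmClient` are not shown in this hunk, so everything below is hypothetical: `toy_tokens_from_grid`, `FakeVlmClient.map_roles`, `toy_extract`, the grid and field dict shapes, and the `"ok"`/`"incomplete"` statuses are assumptions that only mimic the chaining and the two injection points (`ocr_fn` and the VLM client).

```python
from typing import Callable, Dict, List, Optional, Sequence

# Assumed shape: an OCR grid is a list of rows, each row a list of word dicts.
Grid = Sequence[Sequence[dict]]


def toy_tokens_from_grid(grid: Grid) -> List[dict]:
    # Hypothetical stand-in for tokens_from_grid: flatten the rows into one token list.
    return [cell for row in grid for cell in row]


class FakeVlmClient:
    """Offline VLM double with a hypothetical interface: canned role -> value answers."""

    def __init__(self, answer: Dict[str, str]) -> None:
        self.answer = answer

    def map_roles(self, tokens: List[dict]) -> List[dict]:
        # Strict anchoring: a role is kept only if its value is an OCR token.
        texts = {t["text"] for t in tokens}
        return [
            {"role": role, "value": value, "confidence": 0.9}
            for role, value in self.answer.items()
            if value in texts
        ]


def toy_extract(
    image_path: str,
    vlm_client: FakeVlmClient,
    ocr_fn: Callable[[str], Grid],
    required_roles: Optional[Sequence[str]] = None,
    min_confidence: float = 0.6,
) -> dict:
    grid = ocr_fn(image_path)              # 1. OCR -> grid
    tokens = toy_tokens_from_grid(grid)    # 2. grid -> tokens
    fields = vlm_client.map_roles(tokens)  # 3. tokens -> roles (VLM)
    # 4. Toy quality rule: every required role must be found with enough confidence.
    found = {f["role"] for f in fields if f["confidence"] >= min_confidence}
    missing = set(required_roles or ()) - found
    status = "incomplete" if missing else "ok"
    return {"fields": fields, "status": status, "n_tokens": len(tokens)}


def fake_ocr(image_path: str) -> Grid:
    # Canned grid: no image is read, so the whole chain runs offline.
    return [[{"text": "DUPONT"}, {"text": "Jean"}], [{"text": "12/03/1980"}]]


result = toy_extract(
    "capture.png",
    FakeVlmClient({"last_name": "DUPONT", "first_name": "Jean", "city": "Lyon"}),
    ocr_fn=fake_ocr,
    required_roles=["last_name", "first_name"],
)
```

Here `city` is dropped because "Lyon" never appears in the OCR output, and the status is `"ok"` because both required roles are anchored on OCR tokens.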
@@ -247,3 +247,33 @@ def map_roles(
    data = parse_vlm_json(raw)
    vlm_fields = data.get("champs", []) if isinstance(data, dict) else []
    return reconstruct_fields(tokens, vlm_fields)


def extract_dossier_from_image(
    image_path: str,
    vlm_client: VlmClient,
    roles: Optional[Sequence[str]] = None,
    ocr_fn: Optional[Callable[[str], Sequence[Sequence[dict]]]] = None,
    min_confidence: float = 0.6,
    required_roles: Optional[Sequence[str]] = None,
) -> dict:
    """Orchestrate dossier extraction from a screen capture: OCR → roles → quality.

    Chains `ocr_fn` (OCR grid) → `tokens_from_grid` → `map_roles` (VLM, strict
    anchoring) → `assess_quality`. This is the building block that the runtime
    handler `_handle_extract_dossier_action` will call, with the real OCR and the
    real vLLM client. `ocr_fn` and `vlm_client` are INJECTABLE (testable offline).

    `ocr_fn` defaults to `core.llm.ocr_extractor.extract_grid_from_image` (LAZY
    import: the module stays pure when the OCR is injected in tests).

    Returns:
        {fields: List[MappedField], status: str, n_tokens: int}
    """
    if ocr_fn is None:
        from core.llm.ocr_extractor import extract_grid_from_image as ocr_fn
    grid = ocr_fn(image_path)
    tokens = tokens_from_grid(grid)
    fields = map_roles(image_path, tokens, vlm_client, roles)
    status = assess_quality(fields, required_roles=required_roles, min_confidence=min_confidence)
    return {"fields": fields, "status": status, "n_tokens": len(tokens)}
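The lazy default import can be checked in isolation. The sketch below assumes a placeholder package name, `hypothetical_ocr_backend`, and a reduced helper, `run_ocr`; neither is part of the commit. It reuses the same `from ... import ... as ocr_fn` pattern as the function above: with an injected `ocr_fn` the backend is never imported, and only the default path triggers the import (which fails here, since the placeholder package does not exist).

```python
import sys
from typing import Callable, Optional, Sequence

Grid = Sequence[Sequence[dict]]


def run_ocr(image_path: str, ocr_fn: Optional[Callable[[str], Grid]] = None) -> Grid:
    if ocr_fn is None:
        # Lazy default: the import only runs at call time, when nothing is injected.
        # "hypothetical_ocr_backend" is a placeholder, not a real package.
        from hypothetical_ocr_backend import extract_grid_from_image as ocr_fn
    return ocr_fn(image_path)


# Injected OCR: works without the backend installed and never imports it.
grid = run_ocr("capture.png", ocr_fn=lambda path: [[{"text": path}]])
backend_imported = "hypothetical_ocr_backend" in sys.modules

# Default path: the import happens only now, and fails because the package is absent.
try:
    run_ocr("capture.png")
    backend_missing = False
except ModuleNotFoundError:
    backend_missing = True
```

This mirrors the commit's point that the module stays free of the OCR dependency as long as the OCR function is injected.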