refactor: reorganize reference data, add extraction modules, remove obsolete code

- Reorganize data/referentiels/ into pdfs/, dicts/, user/ (unified structure)
- Fix "Source absente" (missing source) badges on the referentiels admin page
- Re-index COCOA 2025 (555 → 1451 chunks, 94% coverage)
- Fix VRAM OOM: force embeddings onto the CPU via T2A_EMBED_CPU
- New modules: document_router, docx_extractor, image_extractor, ocr_engine
- Completeness module (quality/completude.py + YAML config)
- DIM template (dimensional summary)
- Gunicorn config + t2a-viewer systemd service
- Remove t2a_install_rag_cleanup/ (obsolete copy)
- Remove scripts/ and scripts_t2a_v2/ (old benchmarks)
- Remove 81 _doc.txt test files
- Ollama cache: configurable TTL, YAML loader fixes
- Dashboard: template improvements (base, index, detail, cpam, validation)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
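
The VRAM fix listed above forces embeddings onto the CPU through the T2A_EMBED_CPU environment variable. A minimal sketch of how such a switch might be read follows; the accepted truthy values, the helper name `embedding_device`, and the `"cuda"` fallback are assumptions for illustration, not taken from the repository.

```python
import os


def embedding_device(env=None, gpu_available=True):
    """Return "cpu" or "cuda" for the embedding model (hypothetical helper)."""
    # Allow a dict to be injected so the function is testable without
    # touching the real process environment.
    env = os.environ if env is None else env
    # Assumption: values such as "1" or "true" turn the CPU override on.
    flag = env.get("T2A_EMBED_CPU", "").strip().lower()
    force_cpu = flag in {"1", "true", "yes", "on"}
    if force_cpu or not gpu_available:
        return "cpu"
    return "cuda"


if __name__ == "__main__":
    print(embedding_device({"T2A_EMBED_CPU": "1"}))
    print(embedding_device({}))
```

Keeping embeddings on the CPU trades some indexing speed for leaving GPU memory to the generation model, which is one plausible way to avoid the out-of-memory error the commit mentions.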
dom
2026-03-07 16:48:10 +01:00
parent 2578afb6ff
commit 4e2b4bd946
210 changed files with 6939 additions and 22104 deletions
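
The test change in the diff that follows switches the patched extractor from a two-element return value to a three-element one that also carries extraction statistics. A self-contained sketch of that mock shape is given here; the `ExtractionStats` dataclass is a stand-in built from the three fields visible in the diff, a plain list of spans stands in for `PageTracker`, and `make_extract_mock` is a hypothetical helper.

```python
from dataclasses import dataclass, field
from unittest.mock import MagicMock


@dataclass
class ExtractionStats:
    """Stand-in for the project's class; only the fields seen in the diff."""
    total_pages: int = 0
    chars_per_page: list = field(default_factory=list)
    total_chars: int = 0


def make_extract_mock(text):
    """Build a mock extractor returning (text, page spans, stats)."""
    mock = MagicMock()
    # A single page spanning the whole text; stands in for PageTracker.
    page_spans = [(0, len(text))]
    stats = ExtractionStats(
        total_pages=1,
        chars_per_page=[len(text)],
        total_chars=len(text),
    )
    mock.return_value = (text, page_spans, stats)
    return mock


mock_extract = make_extract_mock("episode 1\nepisode 2")
# Callers now unpack three values instead of two.
text, spans, stats = mock_extract("dossier.pdf")
```

Any caller that still unpacks two values from the patched function would raise a `ValueError`, which is why the mock has to change together with the patched name.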


@@ -192,7 +192,7 @@ class TestSplitDocuments:
 # --- process_pdf integration test ---
 class TestProcessPdfMulti:
-    @patch("src.main.extract_text_with_pages")
+    @patch("src.main.extract_document_with_pages")
     @patch("src.main.extract_medical_info")
     @patch("src.main._run_edsnlp", return_value=None)
     @patch("src.main._use_edsnlp", False)
@@ -203,9 +203,14 @@ class TestProcessPdfMulti:
         from src.main import process_pdf
         from src.config import DossierMedical, Diagnostic
         from src.extraction.page_tracker import PageTracker
+        from src.extraction.pdf_extractor import ExtractionStats
-        # Mock extract_text_with_pages returning a multi-episode Trackare text
-        mock_extract.return_value = (TRACKARE_MULTI, PageTracker([(0, len(TRACKARE_MULTI))]))
+        # Mock extract_document_with_pages returning a multi-episode Trackare text
+        mock_extract.return_value = (
+            TRACKARE_MULTI,
+            PageTracker([(0, len(TRACKARE_MULTI))]),
+            ExtractionStats(total_pages=1, chars_per_page=[len(TRACKARE_MULTI)], total_chars=len(TRACKARE_MULTI)),
+        )
         # Mock extract_medical_info returning a minimal DossierMedical
         mock_medical.return_value = DossierMedical(