tests: DLBCL alias + Trackare guard + e2e on real PDFs + CRH gold set + enriched benchmark

- 11 unit tests: TestAliasAndConclusionBonus (7) + TestTrackareSymptomGuard (4)
- e2e tests on real PDFs (skipped if absent): meningitis A87.0 + DLBCL C83.3 as top1
- Enriched CRH gold set: 5 cases (2 real cases added: 115_23066188, 132_23080179)
- Synthesis benchmark: conclusion recovered from the source_excerpt of DAS/treatment entries
- .gitignore: anti-PHI protection (real_crh_pdfs/, data/crh_samples/*.pdf)
- docs/PHI_POLICY.md: 7 PHI security rules
- Debug reports: case 132 REVIEW (guard active), top errors, DIM pack

1043 tests pass, 0 regressions.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
dom
2026-02-24 14:35:57 +01:00
parent 06a1be5425
commit cad0dd22b1
16 changed files with 1513 additions and 11 deletions

@@ -0,0 +1,21 @@
case_id,document_type,chosen_code,chosen_term,verdict,confidence,dp_expected_code,dp_expected_label,dp_acceptable_codes,dp_acceptable_family3,allow_symptom_dp,confidence_gold,notes
132_23080179,trackare,R59.0,Adénopathie,REVIEW,medium,C83.3,Lymphome diffus à grandes cellules B,,C83,False,probable,
74_23141536,crh,D50,Anémie,REVIEW,medium,I25.1,Syndrome coronarien aigu,I25.1|I25.5,I25,False,probable,
99_23033146,trackare,E66.83,Obésité (IMC 30.408),REVIEW,medium,,,,,,,
106_23056475,trackare,I26.9,Embolie pulmonaire,REVIEW,medium,I26.9,Embolie pulmonaire,I26.0|I26.9,I26,False,certain,
111_23061304,trackare,N19,Insuffisance rénale,REVIEW,medium,,,,,,,
112_23065936,trackare,I25.5,Cardiopathie ischémique,REVIEW,medium,,,,,,,
120_23033508,trackare,N85.7,Hématome,REVIEW,medium,,,,,,,
139_23087691,trackare,M16.7,Coxarthrose,REVIEW,medium,,,,,,,
140_23090475,trackare,Z54.8,Convalescence,REVIEW,medium,,,,,,,
149_23089771,trackare,H16.0,C omprend décollement de la (de la) : • conjonctive,REVIEW,medium,,,,,,,
153_23102610,trackare,T83.5,Infection urinaire,REVIEW,medium,,,,,,,
159_23107113,trackare,I26.9,Embolie pulmonaire,REVIEW,medium,,,,,,,
160_23099448,trackare,E88.1,Lipodystrophie,REVIEW,medium,,,,,,,
170_23077016,trackare,K59.0,Constipation,REVIEW,medium,,,,,,,
174_23080042,trackare,Q40.1,Hernie hiatale ce,REVIEW,medium,,,,,,,
183_23087212,trackare,T83.5,Infection urinaire,REVIEW,medium,,,,,,,
192_23132490,trackare,D50,Anémie,REVIEW,medium,,,,,,,
200_23149959,trackare,I80.2,Thrombose veineuse profonde,REVIEW,medium,,,,,,,
225_23160703,trackare,N85.7,Hématome,REVIEW,medium,,,,,,,
25_23127187,trackare,N19,Insuffisance rénale,REVIEW,medium,,,,,,,

@@ -0,0 +1,6 @@
case_id,document_type,chosen_code,chosen_term,verdict,confidence,expected_code,acceptable_codes,acceptable_family3,strict_match,acceptable_match,family3_match,symptom_not_allowed,raw_pool_size,filtered_pool_size,topk_size,evidence_count,review_reason_tag,top1_score,top2_score,delta_top1_top2,top3_codes,top3_terms
132_23080179,trackare,R59.0,Adénopathie,REVIEW,medium,C83.3,,C83,False,False,False,True,23,0,0,2,other,0,0,0,,
74_23141536,crh,D50,Anémie,REVIEW,medium,I25.1,I25.1|I25.5,I25,False,False,False,False,3,3,3,1,low_delta,4.0,4.0,0.0,D50|I25.1|Z95.5,Anémie|SCA (Syndrome Coronarien Aigu)|Stent vasculaire
115_23066188,trackare,A87.0,Méningite à entérovirus,CONFIRMED,high,A87.0,,A87,True,True,True,False,6,0,0,1,other,0,0,0,,
106_23056475,trackare,I26.9,Embolie pulmonaire,REVIEW,medium,I26.9,I26.0|I26.9,I26,True,True,True,False,10,7,7,1,low_delta,6.0,5.0,1.0,I26.9|I26.9|Q53.9,Embolie pulmonaire|Embolie pulmonaire|Cryptorchidie
73_23139637,trackare,R06.0,Dyspnée,REVIEW,medium,R06.0,,R06,True,True,True,False,1,1,1,1,mono_fragile,1.0,0,1.0,R06.0,Dyspnée

@@ -0,0 +1,5 @@
{"case_id": "132_23080179", "document_type": "trackare", "chosen_code": "R59.0", "chosen_term": "Adénopathie", "verdict": "REVIEW", "confidence": "medium", "expected_code": "C83.3", "acceptable_codes": "", "acceptable_family3": "C83", "strict_match": false, "acceptable_match": false, "family3_match": false, "symptom_not_allowed": true, "raw_pool_size": 23, "filtered_pool_size": 0, "topk_size": 0, "evidence_count": 2, "review_reason_tag": "other", "top1_score": 0, "top2_score": 0, "delta_top1_top2": 0, "top3_codes": "", "top3_terms": ""}
{"case_id": "74_23141536", "document_type": "crh", "chosen_code": "D50", "chosen_term": "Anémie", "verdict": "REVIEW", "confidence": "medium", "expected_code": "I25.1", "acceptable_codes": "I25.1|I25.5", "acceptable_family3": "I25", "strict_match": false, "acceptable_match": false, "family3_match": false, "symptom_not_allowed": false, "raw_pool_size": 3, "filtered_pool_size": 3, "topk_size": 3, "evidence_count": 1, "review_reason_tag": "low_delta", "top1_score": 4.0, "top2_score": 4.0, "delta_top1_top2": 0.0, "top3_codes": "D50|I25.1|Z95.5", "top3_terms": "Anémie|SCA (Syndrome Coronarien Aigu)|Stent vasculaire"}
{"case_id": "115_23066188", "document_type": "trackare", "chosen_code": "A87.0", "chosen_term": "Méningite à entérovirus", "verdict": "CONFIRMED", "confidence": "high", "expected_code": "A87.0", "acceptable_codes": "", "acceptable_family3": "A87", "strict_match": true, "acceptable_match": true, "family3_match": true, "symptom_not_allowed": false, "raw_pool_size": 6, "filtered_pool_size": 0, "topk_size": 0, "evidence_count": 1, "review_reason_tag": "other", "top1_score": 0, "top2_score": 0, "delta_top1_top2": 0, "top3_codes": "", "top3_terms": ""}
{"case_id": "106_23056475", "document_type": "trackare", "chosen_code": "I26.9", "chosen_term": "Embolie pulmonaire", "verdict": "REVIEW", "confidence": "medium", "expected_code": "I26.9", "acceptable_codes": "I26.0|I26.9", "acceptable_family3": "I26", "strict_match": true, "acceptable_match": true, "family3_match": true, "symptom_not_allowed": false, "raw_pool_size": 10, "filtered_pool_size": 7, "topk_size": 7, "evidence_count": 1, "review_reason_tag": "low_delta", "top1_score": 6.0, "top2_score": 5.0, "delta_top1_top2": 1.0, "top3_codes": "I26.9|I26.9|Q53.9", "top3_terms": "Embolie pulmonaire|Embolie pulmonaire|Cryptorchidie"}
{"case_id": "73_23139637", "document_type": "trackare", "chosen_code": "R06.0", "chosen_term": "Dyspnée", "verdict": "REVIEW", "confidence": "medium", "expected_code": "R06.0", "acceptable_codes": "", "acceptable_family3": "R06", "strict_match": true, "acceptable_match": true, "family3_match": true, "symptom_not_allowed": false, "raw_pool_size": 1, "filtered_pool_size": 1, "topk_size": 1, "evidence_count": 1, "review_reason_tag": "mono_fragile", "top1_score": 1.0, "top2_score": 0, "delta_top1_top2": 1.0, "top3_codes": "R06.0", "top3_terms": "Dyspnée"}

@@ -0,0 +1,15 @@
# NUKE-3: Top errors, CRH gold set
**Date**: 2026-02-24 14:34
**Cases**: 5
| # | Case ID | Chosen | Expected | Strict | Accept. | Verdict | Conf. | Delta | Reason |
|---|---------|--------|---------|--------|---------|---------|-------|-------|--------|
| 1 | 132_23080179 | R59.0 | C83.3 | FAIL | FAIL | REVIEW | medium | 0 | other |
| 2 | 74_23141536 | D50 | I25.1 | FAIL | FAIL | REVIEW | medium | 0.0 | low_delta |
| 3 | 115_23066188 | A87.0 | A87.0 | OK | OK | CONFIRMED | high | 0 | other |
| 4 | 106_23056475 | I26.9 | I26.9 | OK | OK | REVIEW | medium | 1.0 | low_delta |
| 5 | 73_23139637 | R06.0 | R06.0 | OK | OK | REVIEW | medium | 1.0 | mono_fragile |
---
*Generated on 2026-02-24 14:34*

@@ -0,0 +1,40 @@
{
"case_id": "115_23066188",
"document_type": "trackare",
"gold": {
"dp_expected": {
"code": "A87.0",
"label": "Méningite à entérovirus"
},
"dp_acceptable_codes": [],
"dp_acceptable_family3": [
"A87"
],
"allow_symptom_dp": false,
"confidence": "probable"
},
"prediction": {
"chosen_code": "A87.0",
"chosen_term": "Méningite à entérovirus",
"verdict": "CONFIRMED",
"confidence": "high",
"reason": "DP Trackare — source d'autorité",
"review_reason_tag": "other",
"evidence": [
"Source: Trackare (codage établissement)"
],
"evidence_count": 1
},
"pool_stats": {
"raw_pool_size": 6,
"filtered_pool_size": 0,
"topk_size": 0
},
"top_candidates": [],
"match_eval": {
"strict_match": true,
"acceptable_match": true,
"family3_match": true,
"symptom_not_allowed": false
}
}

@@ -0,0 +1,39 @@
# Case Debug: 115_23066188
**Type**: trackare
**Verdict**: CONFIRMED
**Confidence**: high
**Chosen code**: A87.0
**Reason** : DP Trackare — source d'autorité
**Evidence**: 1 excerpt(s)
**Pool**: 6 raw → 0 candidates
**Expected DP**: A87.0 (Méningite à entérovirus)
**Gold confidence**: probable
**Match**: strict=OK, acceptable=OK, symptom not allowed=-
## Gold vs Prediction
| | Gold | NUKE-3 |
|---|------|--------|
| Code | A87.0 | A87.0 |
| Label | Méningite à entérovirus | Méningite à entérovirus |
| Acceptable codes | - | - |
| Family3 | A87 | - |
| Confidence | probable | high |
| Symptom allowed | no | - |
## Top candidates
| Rank | Code | Score | Term | Flags | Section |
|------|------|-------|------|-------|---------|
## Evidence
1. Source: Trackare (codage établissement)
## Bug hypothesis
**Empty pool**: no DP candidate was extracted. Check the CIM-10 extraction for this document.
---
*Generated on 2026-02-24 14:00*

@@ -0,0 +1,41 @@
{
"case_id": "132_23080179",
"document_type": "trackare",
"gold": {
"dp_expected": {
"code": "C83.3",
"label": "Lymphome diffus à grandes cellules B"
},
"dp_acceptable_codes": [],
"dp_acceptable_family3": [
"C83"
],
"allow_symptom_dp": false,
"confidence": "probable"
},
"prediction": {
"chosen_code": "R59.0",
"chosen_term": "Adénopathie",
"verdict": "REVIEW",
"confidence": "medium",
"reason": "Trackare symptôme vs CRH diagnostic — vérification DIM requise",
"review_reason_tag": "other",
"evidence": [
"Source: Trackare (codage établissement)",
"Alerte: Trackare code un symptôme (R*) mais le CRH mentionne un diagnostic étiologique"
],
"evidence_count": 2
},
"pool_stats": {
"raw_pool_size": 23,
"filtered_pool_size": 0,
"topk_size": 0
},
"top_candidates": [],
"match_eval": {
"strict_match": false,
"acceptable_match": false,
"family3_match": false,
"symptom_not_allowed": true
}
}

@@ -0,0 +1,40 @@
# Case Debug: 132_23080179
**Type**: trackare
**Verdict**: REVIEW
**Confidence**: medium
**Chosen code**: R59.0
**Reason** : Trackare symptôme vs CRH diagnostic — vérification DIM requise
**Evidence**: 2 excerpt(s)
**Pool**: 23 raw → 0 candidates
**Expected DP**: C83.3 (Lymphome diffus à grandes cellules B)
**Gold confidence**: probable
**Match**: strict=FAIL, acceptable=FAIL, symptom not allowed=YES
## Gold vs Prediction
| | Gold | NUKE-3 |
|---|------|--------|
| Code | C83.3 | R59.0 |
| Label | Lymphome diffus à grandes cellules B | Adénopathie |
| Acceptable codes | - | - |
| Family3 | C83 | - |
| Confidence | probable | medium |
| Symptom allowed | no | - |
## Top candidates
| Rank | Code | Score | Term | Flags | Section |
|------|------|-------|------|-------|---------|
## Evidence
1. Source: Trackare (codage établissement)
2. Alerte: Trackare code un symptôme (R*) mais le CRH mentionne un diagnostic étiologique
## Bug hypothesis
**Empty pool**: no DP candidate was extracted. Check the CIM-10 extraction for this document.
---
*Generated on 2026-02-24 14:33*
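
The behaviour seen in the two Trackare case reports (115 is CONFIRMED/high, 132 is downgraded to REVIEW/medium) can be illustrated with a short guard function. This is a hedged sketch only: `trackare_symptom_guard` and its `crh_has_etiological_dx` parameter are hypothetical names, and how the real pipeline detects an etiological diagnosis in the CRH is not shown in this commit. The evidence strings are copied verbatim from the JSON reports.

```python
def trackare_symptom_guard(trackare_code: str, crh_has_etiological_dx: bool) -> dict:
    """Decide verdict and confidence for a DP code taken from Trackare.

    Assumption: a symptom code is any CIM-10 code in chapter R.
    """
    evidence = ["Source: Trackare (codage établissement)"]
    if trackare_code.startswith("R") and crh_has_etiological_dx:
        # Trackare coded a symptom while the CRH names a cause:
        # keep the Trackare code but send the case to DIM review.
        evidence.append(
            "Alerte: Trackare code un symptôme (R*) mais le CRH "
            "mentionne un diagnostic étiologique"
        )
        verdict, confidence = "REVIEW", "medium"
    else:
        # Otherwise Trackare is treated as the authoritative source.
        verdict, confidence = "CONFIRMED", "high"
    return {
        "chosen_code": trackare_code,
        "verdict": verdict,
        "confidence": confidence,
        "evidence": evidence,
        "evidence_count": len(evidence),
    }


# Case 132: R59.0 in Trackare, lymphoma described in the CRH.
guarded = trackare_symptom_guard("R59.0", crh_has_etiological_dx=True)
# Case 115: A87.0 in Trackare, no symptom code involved.
confirmed = trackare_symptom_guard("A87.0", crh_has_etiological_dx=False)
```

Note that the guard does not replace the chosen code: case 132 still reports R59.0, which is why its strict and acceptable matches remain FAIL while the verdict becomes REVIEW.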