feat(referentials): ATIH 2018 validation of medical codes
Adds a post-extraction validation layer against the official 2018
reference tables of the ATIH (Agence Technique de l'Information sur
l'Hospitalisation). Zero tolerance on T2A codes: an invalid code is
flagged, and a nearest-neighbour correction (Levenshtein ≤ 1) is
suggested.
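A minimal sketch of a Levenshtein ≤ 1 suggestion step, assuming the valid codes fit in an in-memory set. The names `within_one_edit` and `suggest_codes` are hypothetical and are not the actual `nearest_cim10` implementation, which this commit message does not show.

```python
def within_one_edit(a: str, b: str) -> bool:
    """Return True if the Levenshtein distance between a and b is at most 1."""
    if a == b:
        return True
    if abs(len(a) - len(b)) > 1:
        return False
    # Make `a` the shorter (or equal-length) string.
    if len(a) > len(b):
        a, b = b, a
    # Skip the common prefix.
    i = 0
    while i < len(a) and a[i] == b[i]:
        i += 1
    if len(a) == len(b):
        # One substitution: everything after the mismatch must be identical.
        return a[i + 1:] == b[i + 1:]
    # One insertion into `a`: skipping b[i] must realign the two strings.
    return a[i:] == b[i + 1:]


def suggest_codes(code: str, valid_codes: set[str]) -> list[str]:
    """Hypothetical helper: list valid codes within one edit of an invalid code."""
    if code in valid_codes:
        return []  # already valid, nothing to suggest
    return sorted(c for c in valid_codes if within_one_edit(code, c))


# Illustrative reference set only; the real one comes from the ATIH tables.
demo_codes = {"E119", "E109", "I10"}
suggestions = suggest_codes("E110", demo_codes)
```

A linear scan like this is probably acceptable for roughly 11,600 codes; an index would only matter at much larger scale.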
Contents:
- pipeline/referentials.py: public API is_valid_{cim10,ccam,ghm,ghs},
get_cim10_libelle, nearest_cim10, ghm_to_ghs. CLI --build/--test/--stats.
- pipeline/validation.py: annotates an extraction JSON with a per-page
`_validation` block (valid/invalid codes, suggestions, and GHM↔GHS
cross-checks).
- referentials/sources/: raw public ATIH data (CIM-10 ClaML 2019 as a
substitute, CCAM v5 2018, GHM v2018, February 2018 tariffs).
- referentials/atih_2018.sqlite: ready-to-use SQLite database
(11,623 CIM-10 · 8,147 CCAM · 2,593 GHM · 5,329 GHM→GHS pairs).
- tests/test_referentials.py: 11 unit tests (11/11 pass).
- annotate_validation.py: script that annotates all V2 JSON files in
place and produces validation_report.md.
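To illustrate how code lookups and the GHM↔GHS cross-check might work against a SQLite reference database, here is a sketch using an in-memory database. The table and column names (`cim10`, `ghm_ghs`, `code`, `libelle`, `ghm`, `ghs`), the `demo_*` function names, and the GHM/GHS rows are all assumptions made for illustration; the actual schema of atih_2018.sqlite and the signatures in pipeline/referentials.py are not shown in this commit.

```python
import sqlite3


def build_demo_db() -> sqlite3.Connection:
    """Create an in-memory database with a hypothetical schema and placeholder rows."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE cim10 (code TEXT PRIMARY KEY, libelle TEXT);
        CREATE TABLE ghm_ghs (ghm TEXT, ghs TEXT, PRIMARY KEY (ghm, ghs));
        """
    )
    conn.execute(
        "INSERT INTO cim10 VALUES (?, ?)",
        ("I10", "Hypertension essentielle (primitive)"),
    )
    # Placeholder identifiers, not real GHM/GHS codes.
    conn.executemany(
        "INSERT INTO ghm_ghs VALUES (?, ?)",
        [("GHM_A", "GHS_1"), ("GHM_A", "GHS_2")],
    )
    return conn


def demo_is_valid_cim10(conn: sqlite3.Connection, code: str) -> bool:
    """A code is valid if it exists in the reference table."""
    row = conn.execute("SELECT 1 FROM cim10 WHERE code = ?", (code,)).fetchone()
    return row is not None


def demo_ghm_to_ghs(conn: sqlite3.Connection, ghm: str) -> list[str]:
    """Return every GHS paired with the given GHM (one GHM may map to several)."""
    rows = conn.execute(
        "SELECT ghs FROM ghm_ghs WHERE ghm = ? ORDER BY ghs", (ghm,)
    ).fetchall()
    return [r[0] for r in rows]


def demo_ghm_ghs_consistent(conn: sqlite3.Connection, ghm: str, ghs: str) -> bool:
    """Cross-check: the extracted GHS must be one of those allowed for the GHM."""
    return ghs in demo_ghm_to_ghs(conn, ghm)


conn = build_demo_db()
```

Parameterised queries (`?` placeholders) keep the lookups safe even when the extracted codes contain unexpected characters.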
CIM-10 note: ATIH publishes the 2018 version only as a PDF, so ClaML
2019 is used as a substitute (known gap ≈ 60 codes out of 11,600).
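As an illustration of how codes and labels could be pulled out of a ClaML file, the sketch below reads `Class` elements and their `Rubric`/`Label` children, which the ClaML DTD in this commit defines. The `kind="category"` and `kind="preferred"` values and the function name `extract_categories` are assumptions; the real build step behind `--build` is not shown here.

```python
import xml.etree.ElementTree as ET

# Tiny hand-written sample following the ClaML element names; not real ATIH data.
SAMPLE = """<ClaML version="2.0.0">
  <Title name="CIM-10-demo"/>
  <ClassKinds><ClassKind name="category"/></ClassKinds>
  <RubricKinds><RubricKind name="preferred"/></RubricKinds>
  <Class code="I10" kind="category">
    <Rubric kind="preferred">
      <Label xml:lang="fr">Hypertension essentielle (primitive)</Label>
    </Rubric>
  </Class>
</ClaML>"""


def extract_categories(xml_text: str) -> dict[str, str]:
    """Map each category code to its preferred label (empty string if none)."""
    root = ET.fromstring(xml_text)
    result: dict[str, str] = {}
    for cls in root.iter("Class"):
        if cls.get("kind") != "category":  # assumed ClassKind name
            continue
        label = ""
        for rubric in cls.findall("Rubric"):
            if rubric.get("kind") == "preferred":  # assumed RubricKind name
                lab = rubric.find("Label")
                if lab is not None:
                    # Labels may contain nested elements, so join all text nodes.
                    label = "".join(lab.itertext()).strip()
                break
        result[cls.get("code")] = label
    return result


categories = extract_categories(SAMPLE)
```

For a file of about 150,000 lines, `ET.iterparse` would likely be a better fit than loading the whole document at once.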
PMSI suffix handling: `*` (CMA excluded by the DP) and `+N` (PMSI
extension) are stripped before validation; only the root code is
compared against the reference table.
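A possible way to strip these suffixes, assuming they appear only at the end of the code, that `N` is one or more digits, and that both suffixes may be combined in either order. The function name `strip_pmsi_suffixes` is hypothetical.

```python
import re

# Trailing run of `*` and/or `+<digits>` markers (assumed suffix forms).
_PMSI_SUFFIX = re.compile(r"(?:\*|\+\d+)+$")


def strip_pmsi_suffixes(code: str) -> str:
    """Return the root code with trailing `*` and `+N` suffixes removed."""
    return _PMSI_SUFFIX.sub("", code.strip())


root_codes = [strip_pmsi_suffixes(c) for c in ("E119*", "I10+1", "E119+0*", "I10")]
```

The root code returned here is what would then be passed to the validity check.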
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
BIN
referentials/atih_2018.sqlite
Normal file
Binary file not shown.
BIN
referentials/sources/ccam_2018_v5.xlsx
Normal file
Binary file not shown.
BIN
referentials/sources/cim.json.gz
Normal file
Binary file not shown.
BIN
referentials/sources/cim10_claml_2019.zip
Normal file
Binary file not shown.
283
referentials/sources/cim10_claml_2019_extracted/ClaML.dtd
Normal file
@@ -0,0 +1,283 @@
<?xml version="1.0" encoding="UTF-8"?>
<!ENTITY % rubric.simple "#PCDATA | Reference | Term">
<!ENTITY % rubric.complex "%rubric.simple; | Para | Include |
    IncludeDescendants| Fragment | List | Table">

<!ELEMENT ClaML (
    Meta*,
    Identifier*,
    Title,
    Authors?,
    Variants?,
    ClassKinds,
    UsageKinds?,
    RubricKinds,
    Modifier*,
    ModifierClass*,
    Class*)
>
<!ATTLIST ClaML
    version CDATA #REQUIRED
>

<!ELEMENT Variants (Variant+)>
<!ELEMENT Variant (#PCDATA)>
<!ATTLIST Variant
    name ID #REQUIRED
>

<!ELEMENT Meta EMPTY>
<!ATTLIST Meta
    name CDATA #REQUIRED
    value CDATA #REQUIRED
    variants IDREFS #IMPLIED
>

<!ELEMENT Identifier EMPTY>
<!ATTLIST Identifier
    authority NMTOKEN #IMPLIED
    uid CDATA #REQUIRED
>

<!ELEMENT Title (#PCDATA)>
<!ATTLIST Title
    name NMTOKEN #REQUIRED
    version CDATA #IMPLIED
    date CDATA #IMPLIED
>

<!ELEMENT Authors (Author* )>
<!ELEMENT Author (#PCDATA)>
<!ATTLIST Author
    name ID #REQUIRED
>

<!ELEMENT ClassKinds (ClassKind+)>
<!ELEMENT RubricKinds (RubricKind+)>
<!ELEMENT UsageKinds (UsageKind+)>

<!ELEMENT ClassKind (Display*)>
<!ATTLIST ClassKind
    name ID #REQUIRED
>

<!ELEMENT RubricKind (Display*)>
<!ATTLIST RubricKind
    name ID #REQUIRED
    inherited (true|false) "true"
>

<!ELEMENT UsageKind EMPTY>
<!ATTLIST UsageKind
    name ID #REQUIRED
    mark CDATA #REQUIRED
>

<!ELEMENT Display (#PCDATA)>
<!ATTLIST Display
    xml:lang NMTOKEN #REQUIRED
    variants IDREF #IMPLIED
>

<!ELEMENT Modifier (
    Meta*,
    SubClass*,
    Rubric*,
    History*)
>
<!ATTLIST Modifier
    code NMTOKEN #REQUIRED
    variants IDREFS #IMPLIED
>

<!ELEMENT ModifierClass (
    Meta*,
    SuperClass,
    SubClass*,
    Rubric*,
    History*)
>
<!ATTLIST ModifierClass
    modifier NMTOKEN #REQUIRED
    code NMTOKEN #REQUIRED
    usage IDREF #IMPLIED
    variants IDREFS #IMPLIED
>

<!ELEMENT Class (
    Meta*,
    SuperClass*,
    SubClass*,
    ModifiedBy*,
    ExcludeModifier*,
    Rubric*,
    History*)
>
<!ATTLIST Class
    code CDATA #REQUIRED
    kind IDREF #REQUIRED
    usage IDREF #IMPLIED
    variants IDREFS #IMPLIED
>

<!ELEMENT ModifiedBy (
    Meta*,
    ValidModifierClass*)
>
<!ATTLIST ModifiedBy
    code NMTOKEN #REQUIRED
    all (true|false) "true"
    position CDATA #IMPLIED
    variants IDREFS #IMPLIED
>

<!ELEMENT ExcludeModifier EMPTY>
<!ATTLIST ExcludeModifier
    code NMTOKEN #REQUIRED
    variants IDREFS #IMPLIED
>

<!ELEMENT ValidModifierClass EMPTY>
<!ATTLIST ValidModifierClass
    code NMTOKEN #REQUIRED
    variants IDREFS #IMPLIED
>

<!ELEMENT Rubric (
    Label+,
    History*)
>
<!ATTLIST Rubric
    id ID #IMPLIED
    kind IDREF #REQUIRED
    usage IDREF #IMPLIED
>

<!ELEMENT Label (%rubric.complex;)*>
<!ATTLIST Label
    xml:lang NMTOKEN #REQUIRED
    xml:space (default|preserve) "default"
    variants IDREFS #IMPLIED
>

<!ELEMENT History (#PCDATA)>
<!ATTLIST History
    author IDREF #REQUIRED
    date NMTOKEN #REQUIRED
>

<!ELEMENT SuperClass EMPTY>
<!ATTLIST SuperClass
    code CDATA #REQUIRED
    variants IDREFS #IMPLIED
>

<!ELEMENT SubClass EMPTY>
<!ATTLIST SubClass
    code CDATA #REQUIRED
    variants IDREFS #IMPLIED
>

<!ELEMENT Reference (#PCDATA)>
<!ATTLIST Reference
    class CDATA #IMPLIED
    authority NMTOKEN #IMPLIED
    uid NMTOKEN #IMPLIED
    code CDATA #IMPLIED
    usage IDREF #IMPLIED
    variants IDREFS #IMPLIED
>

<!ELEMENT Para (%rubric.simple;)*>
<!ATTLIST Para
    class CDATA #IMPLIED
>

<!ELEMENT Fragment (%rubric.simple;)*>
<!ATTLIST Fragment
    class CDATA #IMPLIED
    usage IDREF #IMPLIED
    type (item | list) "item"
>

<!ELEMENT Include EMPTY>
<!ATTLIST Include
    class CDATA #IMPLIED
    rubric IDREF #REQUIRED
>

<!ELEMENT IncludeDescendants EMPTY>
<!ATTLIST IncludeDescendants
    code NMTOKEN #REQUIRED
    kind IDREF #REQUIRED
>

<!ELEMENT List (ListItem+)>
<!ATTLIST List
    class CDATA #IMPLIED
>

<!ELEMENT ListItem (
    %rubric.simple;
    | Para
    | Include
    | List
    | Table)*
>
<!ATTLIST ListItem
    class CDATA #IMPLIED
>

<!ELEMENT Table (
    Caption?,
    THead?,
    TBody?,
    TFoot?)
>
<!ATTLIST Table
    class CDATA #IMPLIED
>

<!ELEMENT Caption (%rubric.simple;)*>
<!ATTLIST Caption
    class CDATA #IMPLIED
>

<!ELEMENT THead (Row+)>
<!ATTLIST THead
    class CDATA #IMPLIED
>

<!ELEMENT TBody (Row+)>
<!ATTLIST TBody
    class CDATA #IMPLIED
>

<!ELEMENT TFoot (Row+)>
<!ATTLIST TFoot
    class CDATA #IMPLIED
>

<!ELEMENT Row (Cell*)>
<!ATTLIST Row
    class CDATA #IMPLIED
>

<!ELEMENT Cell (
    %rubric.simple;
    | Para
    | Include
    | List
    | Table)*
>
<!ATTLIST Cell
    class CDATA #IMPLIED
    rowspan CDATA #IMPLIED
    colspan CDATA #IMPLIED
>

<!ELEMENT Term (#PCDATA)>
<!ATTLIST Term
    class CDATA #IMPLIED
>
154659
referentials/sources/cim10_claml_2019_extracted/cim10_claml_2019.xml
Normal file
File diff suppressed because it is too large
BIN
referentials/sources/cim_libelle.json.gz
Normal file
Binary file not shown.
BIN
referentials/sources/ghm_intermediaire.json.gz
Normal file
Binary file not shown.
BIN
referentials/sources/ghs_prive.json.gz
Normal file
Binary file not shown.
BIN
referentials/sources/ghs_public.json.gz
Normal file
Binary file not shown.
BIN
referentials/sources/regroupement_ghm_v2018.xlsx
Normal file
Binary file not shown.
BIN
referentials/sources/tarif_arrete_fev_2018.xlsx
Normal file
Binary file not shown.