Add a post-extraction validation layer that checks codes against the
official 2018 reference tables of the ATIH (Agence Technique de
l'Information sur l'Hospitalisation). Zero tolerance on T2A codes: an
invalid code is flagged, and a nearest-neighbour correction
(Levenshtein ≤ 1) is suggested.
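The Levenshtein ≤ 1 suggestion step can be illustrated with a minimal sketch. The actual `nearest_cim10` implementation is not shown in this commit message, so the helper names `within_one_edit` and `suggest`, and the three-code sample set, are assumptions made for illustration only.

```python
from typing import Iterable, List


def within_one_edit(a: str, b: str) -> bool:
    """Return True if the Levenshtein distance between a and b is at most 1."""
    if a == b:
        return True
    if abs(len(a) - len(b)) > 1:
        return False
    if len(a) > len(b):
        a, b = b, a  # make `a` the shorter (or equal-length) string
    # Skip the common prefix.
    i = 0
    while i < len(a) and a[i] == b[i]:
        i += 1
    if len(a) == len(b):
        # Exactly one substitution: everything after position i must match.
        return a[i + 1:] == b[i + 1:]
    # Exactly one insertion: skipping b[i] must make the tails match.
    return a[i:] == b[i + 1:]


def suggest(code: str, valid_codes: Iterable[str]) -> List[str]:
    """Hypothetical helper: list valid codes within one edit of an invalid code."""
    valid = set(valid_codes)
    if code in valid:
        return []  # already valid, nothing to suggest
    return sorted(c for c in valid if within_one_edit(code, c))


if __name__ == "__main__":
    sample = {"I10", "E119", "J189"}  # toy sample, not the real reference table
    print(suggest("I1O", sample))  # letter O typed instead of digit 0
    print(suggest("E19", sample))  # one missing character
```

A linear scan like this is enough for a sketch; against the full table of about 11,600 codes it remains cheap because each comparison exits early on a length mismatch.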
Contents:
- pipeline/referentials.py: public API is_valid_{cim10,ccam,ghm,ghs},
get_cim10_libelle, nearest_cim10, ghm_to_ghs. CLI --build/--test/--stats.
- pipeline/validation.py: annotates an extraction JSON with a
`_validation` block per page (valid/invalid codes + suggestions +
GHM↔GHS cross-checks).
- referentials/sources/: raw public ATIH data (CIM-10 ClaML 2019 as a
substitute, CCAM v5 2018, GHM v2018, Feb. 2018 tariffs).
- referentials/atih_2018.sqlite: ready-to-use SQLite database
(11,623 CIM-10 · 8,147 CCAM · 2,593 GHM · 5,329 GHM→GHS pairs).
- tests/test_referentials.py: 11 unit tests (11/11 pass).
- annotate_validation.py: script that annotates all V2 JSON files in
place and produces validation_report.md.
Note on CIM-10: the ATIH 2018 version is published only as a PDF, so
ClaML 2019 is used as a substitute (known gap ≈ 60 codes out of 11,600).
Handling of PMSI suffixes: `*` (CMA excluded by the DP, the principal
diagnosis) and `+N` (PMSI extension) are stripped before validation;
only the root code is compared against the reference table.
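The suffix handling could look roughly like the sketch below. It assumes that `N` in `+N` is one or more digits and that suffixes only appear at the end of the code; the function names `strip_pmsi_suffixes` and `is_valid_root` are hypothetical and do not come from pipeline/referentials.py.

```python
import re
from typing import Iterable

# Assumption: a suffix is either a literal "*" or "+" followed by digits,
# and one or more suffixes may be chained at the end of the code.
_SUFFIX_RE = re.compile(r"(?:\*|\+\d+)+$")


def strip_pmsi_suffixes(raw: str) -> str:
    """Return the root code with trailing `*` and `+N` markers removed."""
    return _SUFFIX_RE.sub("", raw.strip())


def is_valid_root(raw: str, valid_codes: Iterable[str]) -> bool:
    """Hypothetical check: compare only the root code with the reference set."""
    return strip_pmsi_suffixes(raw) in set(valid_codes)


if __name__ == "__main__":
    sample = {"I10", "E119", "J189"}  # toy sample, not the real reference table
    for raw in ("E119*", "J189+1", "I10", "Z999*"):
        print(raw, strip_pmsi_suffixes(raw), is_valid_root(raw, sample))
```

Anchoring the pattern with `$` keeps the root untouched: a `+` or `*` anywhere else in the string is left in place and therefore fails validation instead of being silently dropped.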
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
284 lines
4.3 KiB
XML
<?xml version="1.0" encoding="UTF-8"?>
<!ENTITY % rubric.simple "#PCDATA | Reference | Term">
<!ENTITY % rubric.complex "%rubric.simple; | Para | Include |
    IncludeDescendants| Fragment | List | Table">

<!ELEMENT ClaML (
    Meta*,
    Identifier*,
    Title,
    Authors?,
    Variants?,
    ClassKinds,
    UsageKinds?,
    RubricKinds,
    Modifier*,
    ModifierClass*,
    Class*)
>
<!ATTLIST ClaML
    version CDATA #REQUIRED
>

<!ELEMENT Variants (Variant+)>
<!ELEMENT Variant (#PCDATA)>
<!ATTLIST Variant
    name ID #REQUIRED
>

<!ELEMENT Meta EMPTY>
<!ATTLIST Meta
    name CDATA #REQUIRED
    value CDATA #REQUIRED
    variants IDREFS #IMPLIED
>

<!ELEMENT Identifier EMPTY>
<!ATTLIST Identifier
    authority NMTOKEN #IMPLIED
    uid CDATA #REQUIRED
>

<!ELEMENT Title (#PCDATA)>
<!ATTLIST Title
    name NMTOKEN #REQUIRED
    version CDATA #IMPLIED
    date CDATA #IMPLIED
>

<!ELEMENT Authors (Author* )>
<!ELEMENT Author (#PCDATA)>
<!ATTLIST Author
    name ID #REQUIRED
>

<!ELEMENT ClassKinds (ClassKind+)>
<!ELEMENT RubricKinds (RubricKind+)>
<!ELEMENT UsageKinds (UsageKind+)>

<!ELEMENT ClassKind (Display*)>
<!ATTLIST ClassKind
    name ID #REQUIRED
>

<!ELEMENT RubricKind (Display*)>
<!ATTLIST RubricKind
    name ID #REQUIRED
    inherited (true|false) "true"
>

<!ELEMENT UsageKind EMPTY>
<!ATTLIST UsageKind
    name ID #REQUIRED
    mark CDATA #REQUIRED
>

<!ELEMENT Display (#PCDATA)>
<!ATTLIST Display
    xml:lang NMTOKEN #REQUIRED
    variants IDREF #IMPLIED
>

<!ELEMENT Modifier (
    Meta*,
    SubClass*,
    Rubric*,
    History*)
>
<!ATTLIST Modifier
    code NMTOKEN #REQUIRED
    variants IDREFS #IMPLIED
>

<!ELEMENT ModifierClass (
    Meta*,
    SuperClass,
    SubClass*,
    Rubric*,
    History*)
>
<!ATTLIST ModifierClass
    modifier NMTOKEN #REQUIRED
    code NMTOKEN #REQUIRED
    usage IDREF #IMPLIED
    variants IDREFS #IMPLIED
>

<!ELEMENT Class (
    Meta*,
    SuperClass*,
    SubClass*,
    ModifiedBy*,
    ExcludeModifier*,
    Rubric*,
    History*)
>
<!ATTLIST Class
    code CDATA #REQUIRED
    kind IDREF #REQUIRED
    usage IDREF #IMPLIED
    variants IDREFS #IMPLIED
>

<!ELEMENT ModifiedBy (
    Meta*,
    ValidModifierClass*)
>
<!ATTLIST ModifiedBy
    code NMTOKEN #REQUIRED
    all (true|false) "true"
    position CDATA #IMPLIED
    variants IDREFS #IMPLIED
>

<!ELEMENT ExcludeModifier EMPTY>
<!ATTLIST ExcludeModifier
    code NMTOKEN #REQUIRED
    variants IDREFS #IMPLIED
>

<!ELEMENT ValidModifierClass EMPTY>
<!ATTLIST ValidModifierClass
    code NMTOKEN #REQUIRED
    variants IDREFS #IMPLIED
>

<!ELEMENT Rubric (
    Label+,
    History*)
>
<!ATTLIST Rubric
    id ID #IMPLIED
    kind IDREF #REQUIRED
    usage IDREF #IMPLIED
>

<!ELEMENT Label (%rubric.complex;)*>
<!ATTLIST Label
    xml:lang NMTOKEN #REQUIRED
    xml:space (default|preserve) "default"
    variants IDREFS #IMPLIED
>

<!ELEMENT History (#PCDATA)>
<!ATTLIST History
    author IDREF #REQUIRED
    date NMTOKEN #REQUIRED
>

<!ELEMENT SuperClass EMPTY>
<!ATTLIST SuperClass
    code CDATA #REQUIRED
    variants IDREFS #IMPLIED
>

<!ELEMENT SubClass EMPTY>
<!ATTLIST SubClass
    code CDATA #REQUIRED
    variants IDREFS #IMPLIED
>

<!ELEMENT Reference (#PCDATA)>
<!ATTLIST Reference
    class CDATA #IMPLIED
    authority NMTOKEN #IMPLIED
    uid NMTOKEN #IMPLIED
    code CDATA #IMPLIED
    usage IDREF #IMPLIED
    variants IDREFS #IMPLIED
>

<!ELEMENT Para (%rubric.simple;)*>
<!ATTLIST Para
    class CDATA #IMPLIED
>

<!ELEMENT Fragment (%rubric.simple;)*>
<!ATTLIST Fragment
    class CDATA #IMPLIED
    usage IDREF #IMPLIED
    type (item | list) "item"
>

<!ELEMENT Include EMPTY>
<!ATTLIST Include
    class CDATA #IMPLIED
    rubric IDREF #REQUIRED
>

<!ELEMENT IncludeDescendants EMPTY>
<!ATTLIST IncludeDescendants
    code NMTOKEN #REQUIRED
    kind IDREF #REQUIRED
>

<!ELEMENT List (ListItem+)>
<!ATTLIST List
    class CDATA #IMPLIED
>

<!ELEMENT ListItem (
    %rubric.simple;
    | Para
    | Include
    | List
    | Table)*
>
<!ATTLIST ListItem
    class CDATA #IMPLIED
>

<!ELEMENT Table (
    Caption?,
    THead?,
    TBody?,
    TFoot?)
>
<!ATTLIST Table
    class CDATA #IMPLIED
>

<!ELEMENT Caption (%rubric.simple;)*>
<!ATTLIST Caption
    class CDATA #IMPLIED
>

<!ELEMENT THead (Row+)>
<!ATTLIST THead
    class CDATA #IMPLIED
>

<!ELEMENT TBody (Row+)>
<!ATTLIST TBody
    class CDATA #IMPLIED
>

<!ELEMENT TFoot (Row+)>
<!ATTLIST TFoot
    class CDATA #IMPLIED
>

<!ELEMENT Row (Cell*)>
<!ATTLIST Row
    class CDATA #IMPLIED
>

<!ELEMENT Cell (
    %rubric.simple;
    | Para
    | Include
    | List
    | Table)*
>
<!ATTLIST Cell
    class CDATA #IMPLIED
    rowspan CDATA #IMPLIED
    colspan CDATA #IMPLIED
>

<!ELEMENT Term (#PCDATA)>
<!ATTLIST Term
    class CDATA #IMPLIED
>
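To show how a document following this DTD might be read when building a code table, here is a minimal sketch using Python's standard `xml.etree.ElementTree`. The element and attribute names (`Class`, `code`, `kind`, `Rubric`, `Label`) come from the DTD above; the kind values `"category"` and `"preferred"`, the function name `preferred_labels`, and the inline sample document are assumptions, since the DTD only declares `kind` as an IDREF and does not fix its values.

```python
import xml.etree.ElementTree as ET
from typing import Dict

# Tiny hand-written sample shaped after the DTD; not real ATIH data.
SAMPLE = """<ClaML version="2.0.0">
  <Title name="sample"/>
  <ClassKinds><ClassKind name="block"/><ClassKind name="category"/></ClassKinds>
  <RubricKinds><RubricKind name="preferred"/></RubricKinds>
  <Class code="I10-I15" kind="block">
    <SubClass code="I10"/>
    <Rubric kind="preferred"><Label xml:lang="fr">Maladies hypertensives</Label></Rubric>
  </Class>
  <Class code="I10" kind="category">
    <SuperClass code="I10-I15"/>
    <Rubric kind="preferred"><Label xml:lang="fr">Hypertension essentielle <Term>(primitive)</Term></Label></Rubric>
  </Class>
</ClaML>"""


def preferred_labels(claml_text: str,
                     class_kind: str = "category",
                     rubric_kind: str = "preferred") -> Dict[str, str]:
    """Map each Class code of the given kind to its first matching Rubric label."""
    root = ET.fromstring(claml_text)
    labels: Dict[str, str] = {}
    for cls in root.iter("Class"):
        if cls.get("kind") != class_kind:
            continue
        for rubric in cls.findall("Rubric"):
            if rubric.get("kind") != rubric_kind:
                continue
            label = rubric.find("Label")
            if label is not None:
                # Label has mixed content (Term, Reference, ...): gather all
                # text nodes, then collapse runs of whitespace.
                text = "".join(label.itertext())
                labels[cls.get("code")] = " ".join(text.split())
            break  # keep only the first rubric of the requested kind
    return labels


if __name__ == "__main__":
    print(preferred_labels(SAMPLE))
```

Using `itertext()` rather than `label.text` matters here because the DTD allows `Label` to mix character data with child elements such as `Term` and `Reference`; reading `.text` alone would truncate the label at the first child.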