Aivanov_scan_ogc/referentials/sources/cim10_claml_2019_extracted/ClaML.dtd
Dom 6df590ae95 feat(referentials): ATIH 2018 validation of medical codes
Adds a post-extraction validation layer that checks codes against the
official 2018 reference data of the ATIH (Agence Technique de
l'Information sur l'Hospitalisation). Zero tolerance on T2A codes: any
invalid code is flagged, and a nearest-neighbor correction
(Levenshtein ≤ 1) is suggested.
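
A minimal sketch of the "Levenshtein ≤ 1" suggestion step, for illustration only. The commit lists `nearest_cim10` in the public API but does not show its signature, so the helper names `within_one_edit` and `suggest_corrections` and the sample codes below are assumptions, not the project's implementation.

```python
def within_one_edit(a: str, b: str) -> bool:
    """Return True when the Levenshtein distance between a and b is at most 1."""
    if a == b:
        return True
    if abs(len(a) - len(b)) > 1:
        return False
    # Make `a` the shorter (or equal-length) string.
    if len(a) > len(b):
        a, b = b, a
    # Skip the common prefix.
    i = 0
    while i < len(a) and a[i] == b[i]:
        i += 1
    if len(a) == len(b):
        # One substitution: everything after the mismatch must be identical.
        return a[i + 1:] == b[i + 1:]
    # One insertion/deletion: dropping b[i] must give back a.
    return a[i:] == b[i + 1:]


def suggest_corrections(code: str, valid_codes: set[str]) -> list[str]:
    """Hypothetical helper: list valid codes within one edit of an invalid code."""
    if code in valid_codes:
        return []  # already valid, nothing to suggest
    return sorted(c for c in valid_codes if within_one_edit(code, c))


# Illustrative sample set, not the real referential.
sample = {"I10", "I11", "E119", "J189"}
print(suggest_corrections("I1O", sample))  # letter O typed instead of zero
print(suggest_corrections("E19", sample))  # one missing character
```

A linear scan like this is only a sketch; for a referential of about 11,600 codes it stays cheap, but the actual lookup strategy used by the project is not described in the commit.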

Contents:
- pipeline/referentials.py : public API is_valid_{cim10,ccam,ghm,ghs},
  get_cim10_libelle, nearest_cim10, ghm_to_ghs. CLI --build/--test/--stats.
- pipeline/validation.py    : annotates an extraction JSON with a
  `_validation` block per page (valid/invalid codes + suggestions +
  GHM↔GHS cross-checks).
- referentials/sources/     : raw public ATIH data (CIM-10 ClaML
  2019 as a substitute, CCAM v5 2018, GHM v2018, Feb. 2018 tariffs).
- referentials/atih_2018.sqlite : ready-to-use SQLite database
  (11,623 CIM-10 · 8,147 CCAM · 2,593 GHM · 5,329 GHM→GHS pairs).
- tests/test_referentials.py : 11 unit tests (11/11 pass).
- annotate_validation.py    : script that annotates all V2 JSON files in
  place and produces validation_report.md.

Note on CIM-10: the ATIH 2018 version is published only as a PDF, so
ClaML 2019 is used as a substitute (known gap ≈ 60 codes out of 11,600).

Handling of PMSI suffixes: `*` (CMA excluded by the DP) and `+N`
(PMSI extension) are stripped before validation; only the root code
is compared against the reference data.
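
The suffix handling above could look roughly like the following sketch. It assumes that `N` in `+N` is one or more digits and that the suffixes only appear at the end of the code; the names `strip_pmsi_suffixes` and `is_valid_root` are hypothetical and are not taken from pipeline/referentials.py.

```python
import re

# Assumption: a suffix is either "*" or "+<digits>", possibly combined,
# and always sits at the very end of the code.
_PMSI_SUFFIX = re.compile(r"(?:\+\d+|\*)+$")


def strip_pmsi_suffixes(code: str) -> str:
    """Return the root code with trailing `*` and `+N` markers removed."""
    return _PMSI_SUFFIX.sub("", code.strip())


def is_valid_root(code: str, referential: set[str]) -> bool:
    """Hypothetical check: compare only the root code to the referential."""
    return strip_pmsi_suffixes(code) in referential


# Illustrative values, not the real ATIH data.
referential = {"E119", "I200"}
print(strip_pmsi_suffixes("E119*"))           # E119
print(strip_pmsi_suffixes("I200+0"))          # I200
print(is_valid_root("I200+0*", referential))  # True
```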

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-24 15:06:01 +02:00


<?xml version="1.0" encoding="UTF-8"?>
<!ENTITY % rubric.simple "#PCDATA | Reference | Term">
<!ENTITY % rubric.complex "%rubric.simple; | Para | Include |
IncludeDescendants | Fragment | List | Table">
<!ELEMENT ClaML (
Meta*,
Identifier*,
Title,
Authors?,
Variants?,
ClassKinds,
UsageKinds?,
RubricKinds,
Modifier*,
ModifierClass*,
Class*)
>
<!ATTLIST ClaML
version CDATA #REQUIRED
>
<!ELEMENT Variants (Variant+)>
<!ELEMENT Variant (#PCDATA)>
<!ATTLIST Variant
name ID #REQUIRED
>
<!ELEMENT Meta EMPTY>
<!ATTLIST Meta
name CDATA #REQUIRED
value CDATA #REQUIRED
variants IDREFS #IMPLIED
>
<!ELEMENT Identifier EMPTY>
<!ATTLIST Identifier
authority NMTOKEN #IMPLIED
uid CDATA #REQUIRED
>
<!ELEMENT Title (#PCDATA)>
<!ATTLIST Title
name NMTOKEN #REQUIRED
version CDATA #IMPLIED
date CDATA #IMPLIED
>
<!ELEMENT Authors (Author* )>
<!ELEMENT Author (#PCDATA)>
<!ATTLIST Author
name ID #REQUIRED
>
<!ELEMENT ClassKinds (ClassKind+)>
<!ELEMENT RubricKinds (RubricKind+)>
<!ELEMENT UsageKinds (UsageKind+)>
<!ELEMENT ClassKind (Display*)>
<!ATTLIST ClassKind
name ID #REQUIRED
>
<!ELEMENT RubricKind (Display*)>
<!ATTLIST RubricKind
name ID #REQUIRED
inherited (true|false) "true"
>
<!ELEMENT UsageKind EMPTY>
<!ATTLIST UsageKind
name ID #REQUIRED
mark CDATA #REQUIRED
>
<!ELEMENT Display (#PCDATA)>
<!ATTLIST Display
xml:lang NMTOKEN #REQUIRED
variants IDREF #IMPLIED
>
<!ELEMENT Modifier (
Meta*,
SubClass*,
Rubric*,
History*)
>
<!ATTLIST Modifier
code NMTOKEN #REQUIRED
variants IDREFS #IMPLIED
>
<!ELEMENT ModifierClass (
Meta*,
SuperClass,
SubClass*,
Rubric*,
History*)
>
<!ATTLIST ModifierClass
modifier NMTOKEN #REQUIRED
code NMTOKEN #REQUIRED
usage IDREF #IMPLIED
variants IDREFS #IMPLIED
>
<!ELEMENT Class (
Meta*,
SuperClass*,
SubClass*,
ModifiedBy*,
ExcludeModifier*,
Rubric*,
History*)
>
<!ATTLIST Class
code CDATA #REQUIRED
kind IDREF #REQUIRED
usage IDREF #IMPLIED
variants IDREFS #IMPLIED
>
<!ELEMENT ModifiedBy (
Meta*,
ValidModifierClass*)
>
<!ATTLIST ModifiedBy
code NMTOKEN #REQUIRED
all (true|false) "true"
position CDATA #IMPLIED
variants IDREFS #IMPLIED
>
<!ELEMENT ExcludeModifier EMPTY>
<!ATTLIST ExcludeModifier
code NMTOKEN #REQUIRED
variants IDREFS #IMPLIED
>
<!ELEMENT ValidModifierClass EMPTY>
<!ATTLIST ValidModifierClass
code NMTOKEN #REQUIRED
variants IDREFS #IMPLIED
>
<!ELEMENT Rubric (
Label+,
History*)
>
<!ATTLIST Rubric
id ID #IMPLIED
kind IDREF #REQUIRED
usage IDREF #IMPLIED
>
<!ELEMENT Label (%rubric.complex;)*>
<!ATTLIST Label
xml:lang NMTOKEN #REQUIRED
xml:space (default|preserve) "default"
variants IDREFS #IMPLIED
>
<!ELEMENT History (#PCDATA)>
<!ATTLIST History
author IDREF #REQUIRED
date NMTOKEN #REQUIRED
>
<!ELEMENT SuperClass EMPTY>
<!ATTLIST SuperClass
code CDATA #REQUIRED
variants IDREFS #IMPLIED
>
<!ELEMENT SubClass EMPTY>
<!ATTLIST SubClass
code CDATA #REQUIRED
variants IDREFS #IMPLIED
>
<!ELEMENT Reference (#PCDATA)>
<!ATTLIST Reference
class CDATA #IMPLIED
authority NMTOKEN #IMPLIED
uid NMTOKEN #IMPLIED
code CDATA #IMPLIED
usage IDREF #IMPLIED
variants IDREFS #IMPLIED
>
<!ELEMENT Para (%rubric.simple;)*>
<!ATTLIST Para
class CDATA #IMPLIED
>
<!ELEMENT Fragment (%rubric.simple;)*>
<!ATTLIST Fragment
class CDATA #IMPLIED
usage IDREF #IMPLIED
type (item | list) "item"
>
<!ELEMENT Include EMPTY>
<!ATTLIST Include
class CDATA #IMPLIED
rubric IDREF #REQUIRED
>
<!ELEMENT IncludeDescendants EMPTY>
<!ATTLIST IncludeDescendants
code NMTOKEN #REQUIRED
kind IDREF #REQUIRED
>
<!ELEMENT List (ListItem+)>
<!ATTLIST List
class CDATA #IMPLIED
>
<!ELEMENT ListItem (
%rubric.simple;
| Para
| Include
| List
| Table)*
>
<!ATTLIST ListItem
class CDATA #IMPLIED
>
<!ELEMENT Table (
Caption?,
THead?,
TBody?,
TFoot?)
>
<!ATTLIST Table
class CDATA #IMPLIED
>
<!ELEMENT Caption (%rubric.simple;)*>
<!ATTLIST Caption
class CDATA #IMPLIED
>
<!ELEMENT THead (Row+)>
<!ATTLIST THead
class CDATA #IMPLIED
>
<!ELEMENT TBody (Row+)>
<!ATTLIST TBody
class CDATA #IMPLIED
>
<!ELEMENT TFoot (Row+)>
<!ATTLIST TFoot
class CDATA #IMPLIED
>
<!ELEMENT Row (Cell*)>
<!ATTLIST Row
class CDATA #IMPLIED
>
<!ELEMENT Cell (
%rubric.simple;
| Para
| Include
| List
| Table)*
>
<!ATTLIST Cell
class CDATA #IMPLIED
rowspan CDATA #IMPLIED
colspan CDATA #IMPLIED
>
<!ELEMENT Term (#PCDATA)>
<!ATTLIST Term
class CDATA #IMPLIED
>
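
To illustrate the structure this DTD declares (a `Class` carrying a `code` and `kind`, with `Rubric` children that hold `Label` text), here is a small Python sketch that reads class codes and labels from a ClaML document. The sample document, the rubric kind name `preferred`, and the function `class_labels` are assumptions made for the example; `xml.etree.ElementTree` parses the XML but does not validate it against the DTD.

```python
import xml.etree.ElementTree as ET

# Tiny made-up document that follows the element order declared for ClaML.
SAMPLE = """<ClaML version="2.0.0">
  <Title name="DEMO"/>
  <ClassKinds><ClassKind name="category"/></ClassKinds>
  <RubricKinds><RubricKind name="preferred"/></RubricKinds>
  <Class code="X00" kind="category">
    <Rubric kind="preferred">
      <Label xml:lang="en">Demo <Term>entry</Term></Label>
    </Rubric>
  </Class>
</ClaML>"""


def class_labels(xml_text: str, rubric_kind: str = "preferred") -> dict[str, str]:
    """Map each Class code to the text of its first Rubric of the given kind."""
    root = ET.fromstring(xml_text)
    labels = {}
    for cls in root.iter("Class"):
        for rubric in cls.findall("Rubric"):
            if rubric.get("kind") != rubric_kind:
                continue
            label = rubric.find("Label")
            if label is not None:
                # Label has mixed content, so join text across child elements.
                labels[cls.get("code")] = "".join(label.itertext()).strip()
            break
    return labels


print(class_labels(SAMPLE))  # {'X00': 'Demo entry'}
```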