aivanov_CIM/TASK_14_SUMMARY.md
2026-03-05 01:20:14 +01:00

Task 14: PMSI Validator and Question Generator - Implementation Summary

Overview

Task 14 implements the PMSI Validator and Question Generator components of the medical coding pipeline. Together they validate coding proposals, detect errors, and generate questions about missing information.

Components Implemented

1. PMSIValidator (src/pipeline_mco_pmsi/validators/pmsi_validator.py)

Responsibilities:

  • Generate categorized validation problems (bloquant/à_revoir/info, i.e. blocking / to review / informational)
  • Detect missing mandatory information
  • Validate conformity to eligibility criteria from Guide Méthodologique
  • Detect zero-tolerance errors
  • Block automatic validation when critical issues are found
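
As a rough illustration of the first two responsibilities, the sketch below turns missing mandatory information into categorized issues. The `Issue` and `Stay` types, their field names, and the severity chosen for each check are assumptions made for this example; they are not taken from pmsi_validator.py.

```python
from dataclasses import dataclass, field
from typing import List, Optional

# Severity labels follow the three categories named above.
SEVERITIES = ("bloquant", "à_revoir", "info")


@dataclass
class Issue:
    severity: str  # one of SEVERITIES
    message: str


@dataclass
class Stay:
    # Hypothetical, simplified stand-in for the real proposal/stay inputs.
    dp_code: Optional[str] = None
    documents: List[str] = field(default_factory=list)
    facts: List[str] = field(default_factory=list)


def find_missing_mandatory_info(stay: Stay) -> List[Issue]:
    """Return one issue per missing mandatory element."""
    issues = []
    if not stay.dp_code:
        issues.append(Issue("bloquant", "No principal diagnosis (DP) proposed"))
    if not stay.documents:
        issues.append(Issue("bloquant", "No source documents attached"))
    if not stay.facts:
        issues.append(Issue("à_revoir", "No clinical facts extracted"))
    return issues


# A stay with a DP and one document, but no extracted facts.
issues = find_missing_mandatory_info(Stay(dp_code="I10", documents=["discharge_summary"]))
print([(i.severity, i.message) for i in issues])
```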

Key Features:

  • Mandatory Information Detection: Checks for missing DP, documents, facts, and evidence
  • Eligibility Criteria Validation: Integrates with RAG Engine to retrieve and validate eligibility criteria for DP and DAS codes
  • Code Consistency Checks: Verifies codes match clinical facts and detects uncoded diagnostics
  • Zero-Tolerance Error Detection: Identifies 8 types of critical errors:
    1. Negated diagnoses coded as affirmed
    2. Suspected diagnoses coded as certain (especially for DP)
    3. CCAM acts without explicit evidence
    4. Medical history coded as current episode
    5. Unknown referentiel versions
    6. High confidence on ambiguous cases
    7. Gross DP/DAS inversions
    8. PII leaks in logs/exports
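
To make the first, second, and fourth error types concrete, here is a minimal sketch of how such checks could be expressed. The `Fact` and `ProposedCode` shapes and the status vocabulary are hypothetical; the actual check_zero_tolerance_errors() may work quite differently.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Fact:
    # Hypothetical clinical fact: the code it maps to and its assertion status.
    code: str
    status: str  # "affirmed", "negated", "suspected" or "history"


@dataclass
class ProposedCode:
    code: str
    role: str  # "DP" or "DAS"


def zero_tolerance_errors(codes: List[ProposedCode], facts: List[Fact]) -> List[str]:
    """Flag proposed codes that no affirmed fact supports."""
    errors = []
    for proposed in codes:
        statuses = {f.status for f in facts if f.code == proposed.code}
        if "affirmed" in statuses:
            continue  # at least one affirmed mention supports the code
        if "negated" in statuses:
            errors.append(f"{proposed.code}: negated diagnosis coded as affirmed")
        elif "suspected" in statuses:
            errors.append(
                f"{proposed.code}: suspected diagnosis coded as certain ({proposed.role})"
            )
        elif "history" in statuses:
            errors.append(f"{proposed.code}: medical history coded as current episode")
    return errors


codes = [ProposedCode("J18.9", "DP"), ProposedCode("E11.9", "DAS")]
facts = [Fact("J18.9", "suspected"), Fact("E11.9", "affirmed")]
errors = zero_tolerance_errors(codes, facts)
print(errors)
```

In this sketch a code that is affirmed in one place and negated in another is not reported here; that case is treated as a document contradiction and left to question generation.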

Methods:

  • validate_proposal(): Main validation entry point
  • check_zero_tolerance_errors(): Detects critical errors
  • has_blocking_issues(): Checks for blocking problems
  • should_block_automatic_validation(): Determines if validation should be blocked

Requirements Satisfied: 9.1, 9.2, 26.5, 19.1-19.9

2. QuestionGenerator (src/pipeline_mco_pmsi/validators/question_generator.py)

Responsibilities:

  • Generate prioritized questions (maximum 5)
  • Detect inconsistencies between codes and clinical facts
  • Prioritize questions by impact on coding accuracy

Key Features:

  • Question Sources:

    • Validation issues (blocking and review)
    • Suspected clinical facts
    • Code/fact inconsistencies
    • Low confidence codes
    • Document contradictions
  • Prioritization System:

    • Priority levels: 1 (high) to 5 (low)
    • Category ordering: contradiction > missing_info > clarification > confirmation
    • Automatic limiting to MAX_QUESTIONS (5)
  • Inconsistency Detection:

    • Negated facts with proposed codes
    • Contradictions between documents
    • Suspected diagnoses requiring confirmation
    • Low confidence codes requiring validation
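
The prioritization system described above can be sketched as a sort followed by a slice. The `Question` fields are hypothetical, and using the category order only as a tie-breaker between questions of equal priority is an assumption; the summary does not say how the two criteria are combined.

```python
from dataclasses import dataclass
from typing import List

MAX_QUESTIONS = 5
# Earlier categories win ties between questions of equal priority (assumed).
CATEGORY_ORDER = ["contradiction", "missing_info", "clarification", "confirmation"]


@dataclass
class Question:
    text: str
    priority: int  # 1 (high) to 5 (low)
    category: str


def prioritize_and_limit(questions: List[Question]) -> List[Question]:
    """Sort by priority, then by category rank, and keep the top MAX_QUESTIONS."""
    ranked = sorted(
        questions,
        key=lambda q: (q.priority, CATEGORY_ORDER.index(q.category)),
    )
    return ranked[:MAX_QUESTIONS]


questions = [
    Question("Confirm suspected pneumonia?", 2, "confirmation"),
    Question("Reports disagree on diabetes status", 1, "contradiction"),
    Question("Which side was operated on?", 2, "missing_info"),
    Question("Is the DP supported by evidence?", 1, "missing_info"),
    Question("Clarify anticoagulant dosage", 3, "clarification"),
    Question("Confirm low-confidence DAS", 4, "confirmation"),
]
top = prioritize_and_limit(questions)
for q in top:
    print(q.priority, q.category, q.text)
```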

Methods:

  • generate_questions(): Main question generation entry point
  • _detect_inconsistencies(): Finds code/fact inconsistencies
  • _detect_document_contradictions(): Identifies multi-document contradictions
  • _prioritize_and_limit(): Sorts and limits questions to top 5
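
For multi-document contradictions, one plausible approach is to group mentions by diagnosis and look for a diagnosis that is affirmed in one document and negated in another. The tuple shape and the function name below are illustrative assumptions, not the signature of _detect_document_contradictions().

```python
from collections import defaultdict
from typing import Dict, List, Set, Tuple

# Each mention is (document_id, diagnosis_label, status); hypothetical shape.
Mention = Tuple[str, str, str]


def find_document_contradictions(mentions: List[Mention]) -> List[str]:
    """Return diagnoses affirmed in one document and negated in a different one."""
    docs: Dict[str, Dict[str, Set[str]]] = defaultdict(lambda: defaultdict(set))
    for doc_id, label, status in mentions:
        docs[label][status].add(doc_id)

    contradictions = []
    for label, by_status in docs.items():
        affirmed = by_status.get("affirmed", set())
        negated = by_status.get("negated", set())
        # Both statuses must occur, and at least two distinct documents must be involved.
        if affirmed and negated and len(affirmed | negated) > 1:
            contradictions.append(label)
    return sorted(contradictions)


mentions = [
    ("discharge_summary", "diabetes", "affirmed"),
    ("anesthesia_note", "diabetes", "negated"),
    ("discharge_summary", "pneumonia", "affirmed"),
    ("lab_report", "pneumonia", "affirmed"),
]
contradictions = find_document_contradictions(mentions)
print(contradictions)
```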

Requirements Satisfied: 9.3, 9.4

3. Blocking Logic (Integrated in PMSIValidator)

Responsibilities:

  • Block automatic validation when blocking issues detected
  • Block automatic validation when zero-tolerance errors detected

Key Features:

  • Comprehensive zero-tolerance error checking
  • Clear blocking decision logic
  • Detailed logging of blocking reasons
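
The blocking decision itself reduces to a simple rule: block if any issue is blocking or if any zero-tolerance error exists, and log why. The sketch below assumes issues are represented only by their severity string and uses a made-up logger name; it is not the body of should_block_automatic_validation().

```python
import logging
from typing import List, Tuple

logger = logging.getLogger("pmsi_validator_sketch")  # hypothetical logger name


def blocking_decision(
    issue_severities: List[str], zero_tolerance_errors: List[str]
) -> Tuple[bool, List[str]]:
    """Return (blocked, reasons) and log each reason for blocking."""
    reasons = []
    n_blocking = sum(1 for s in issue_severities if s == "bloquant")
    if n_blocking:
        reasons.append(f"{n_blocking} blocking issue(s)")
    if zero_tolerance_errors:
        reasons.append(f"{len(zero_tolerance_errors)} zero-tolerance error(s)")
    for reason in reasons:
        logger.warning("Automatic validation blocked: %s", reason)
    return bool(reasons), reasons


blocked, reasons = blocking_decision(["info", "bloquant"], [])
print(blocked, reasons)
```

Returning the reasons alongside the boolean keeps the decision auditable without requiring the caller to parse log output.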

Requirements Satisfied: 9.6, 19.9

Test Coverage

PMSIValidator Tests (tests/test_pmsi_validator.py)

20 tests covering:

  • Basic initialization and validation
  • Missing mandatory information detection (DP, documents, facts, evidence)
  • Eligibility criteria validation (retrieval, no criteria, exclusion rules)
  • Zero-tolerance error detection (all 8 types)
  • Blocking logic (blocking issues, zero-tolerance, no issues)

Test Results: 20/20 passing (100%)

Coverage: 88% of pmsi_validator.py

QuestionGenerator Tests (tests/test_question_generator.py)

13 tests covering:

  • Basic initialization and question generation
  • Question generation from various sources
  • Inconsistency detection (negated facts, document contradictions)
  • Question prioritization and limiting

Test Results: 13/13 passing (100%)

Coverage: 86% of question_generator.py

Integration Points

RAG Engine Integration

  • PMSIValidator uses rag_engine.retrieve_eligibility_criteria() to fetch eligibility criteria from Guide Méthodologique
  • Validates codes against retrieved criteria
  • Generates warnings for exclusion and hierarchization rules
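
The following sketch shows the general shape of this integration with a stub in place of the real RAG Engine. Only the method name retrieve_eligibility_criteria() comes from this summary; its signature, the `Criteria` fields, the placeholder codes, and the warning texts are all assumptions for illustration.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Criteria:
    # Hypothetical, reduced view of eligibility criteria for one code.
    code: str
    excluded_with: List[str] = field(default_factory=list)


class StubRagEngine:
    """Stand-in for the RAG Engine; the real signature is not shown in this summary."""

    def __init__(self, table: Dict[str, Criteria]):
        self._table = table

    def retrieve_eligibility_criteria(self, code: str) -> Optional[Criteria]:
        return self._table.get(code)  # None when nothing is retrieved


def eligibility_warnings(rag: StubRagEngine, dp: str, das: List[str]) -> List[str]:
    """Check the DP and each DAS against the retrieved criteria."""
    all_codes = [dp, *das]
    warnings = []
    for code in all_codes:
        criteria = rag.retrieve_eligibility_criteria(code)
        if criteria is None:
            warnings.append(f"{code}: no eligibility criteria found")
            continue
        others = set(all_codes) - {code}
        for clash in sorted(others & set(criteria.excluded_with)):
            warnings.append(f"{code}: exclusion rule with {clash}")
    return warnings


# Placeholder codes, not real CIM-10 exclusion rules.
rag = StubRagEngine({
    "CODE_A": Criteria("CODE_A", excluded_with=["CODE_B"]),
    "CODE_B": Criteria("CODE_B"),
})
warnings = eligibility_warnings(rag, dp="CODE_A", das=["CODE_B", "CODE_C"])
print(warnings)
```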

Data Models Used

  • ValidationIssue: Represents validation problems with severity and category
  • Question: Represents generated questions with priority and context
  • EligibilityCriteria: Contains eligibility rules from Guide Méthodologique
  • CodingProposal: Input containing proposed codes
  • StructuredStay: Input containing clinical facts and documents

Key Design Decisions

  1. Conservative Approach: The validator is designed to be conservative, preferring to flag potential issues rather than miss critical errors

  2. Separation of Concerns:

    • PMSIValidator focuses on validation and error detection
    • QuestionGenerator focuses on question generation and prioritization
    • Clear separation makes testing and maintenance easier
  3. Extensibility: Both classes are designed to be easily extended with new validation rules or question types

  4. Integration with RAG: Eligibility criteria validation leverages the RAG Engine for dynamic rule retrieval

  5. Pydantic Validation: Leverages Pydantic models for data validation, ensuring type safety and data integrity

Files Created/Modified

Created:

  1. src/pipeline_mco_pmsi/validators/pmsi_validator.py (222 lines)
  2. src/pipeline_mco_pmsi/validators/question_generator.py (122 lines)
  3. tests/test_pmsi_validator.py (745 lines)
  4. tests/test_question_generator.py (485 lines)

Modified:

  1. src/pipeline_mco_pmsi/validators/__init__.py - Added exports for new classes

Requirements Traceability

| Requirement | Component | Status |
|---|---|---|
| 9.1 - Categorized validation problems | PMSIValidator | Implemented |
| 9.2 - Missing mandatory info detection | PMSIValidator | Implemented |
| 9.3 - Prioritized questions (max 5) | QuestionGenerator | Implemented |
| 9.4 - Code/fact inconsistency detection | QuestionGenerator | Implemented |
| 9.6 - Block validation on blocking issues | PMSIValidator | Implemented |
| 19.1 - Prevent negated coded as affirmed | PMSIValidator | Implemented |
| 19.2 - Prevent suspected as certain | PMSIValidator | Implemented |
| 19.3 - Prevent CCAM without evidence | PMSIValidator | Implemented |
| 19.4 - Prevent history as current | PMSIValidator | Implemented |
| 19.5 - Prevent DP/DAS inversions | PMSIValidator | Implemented |
| 19.6 - Prevent unknown referentiel | PMSIValidator | Implemented |
| 19.7 - Prevent PII leaks | PMSIValidator | Implemented |
| 19.8 - Prevent high confidence ambiguous | PMSIValidator | Implemented |
| 19.9 - Block on zero-tolerance errors | PMSIValidator | Implemented |
| 26.5 - Validate eligibility criteria | PMSIValidator | Implemented |

Usage Example

from pipeline_mco_pmsi.validators import PMSIValidator, QuestionGenerator
from pipeline_mco_pmsi.rag.rag_engine import RAGEngine

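# Initialize components.
# referentiels_manager, coding_proposal and structured_stay are not defined in
# this snippet; they are assumed to be created earlier in the pipeline.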
rag_engine = RAGEngine(referentiels_manager)
pmsi_validator = PMSIValidator(rag_engine=rag_engine)
question_generator = QuestionGenerator()

# Validate a coding proposal
validation_issues = pmsi_validator.validate_proposal(
    proposal=coding_proposal,
    structured_stay=structured_stay
)

# Check for zero-tolerance errors
zero_tolerance_issues = pmsi_validator.check_zero_tolerance_errors(
    proposal=coding_proposal,
    structured_stay=structured_stay
)

# Determine if validation should be blocked
should_block = pmsi_validator.should_block_automatic_validation(
    validation_issues=validation_issues,
    zero_tolerance_issues=zero_tolerance_issues
)

# Generate questions for missing information
questions = question_generator.generate_questions(
    proposal=coding_proposal,
    structured_stay=structured_stay,
    validation_issues=validation_issues
)

# Process results
if should_block:
    print(f"Validation blocked: {len(validation_issues)} issues, {len(zero_tolerance_issues)} critical errors")
    print(f"Questions to resolve: {len(questions)}")
else:
    print("Validation passed")

Next Steps

The following tasks remain in the pipeline:

  1. Task 15: Implement Audit Logger for complete traceability
  2. Task 16: Implement main Pipeline orchestration
  3. Tasks 17-30: Additional features (rules management, metrics, deployment, etc.)

Conclusion

Task 14 is complete:

  • All 3 subtasks implemented (14.1, 14.2, 14.3)
  • 33 unit tests passing (20 for PMSIValidator, 13 for QuestionGenerator; 100% pass rate)
  • 87% average code coverage (88% for pmsi_validator.py, 86% for question_generator.py)
  • All requirements listed in the traceability table satisfied
  • Integration with the RAG Engine working
  • Zero-tolerance error detection covering all 8 error types
  • Question generation and prioritization functional

The PMSI Validator and Question Generator are ready for integration into the main pipeline.