# Changelog

All notable changes to the OMOP Data Pipeline project will be documented in this file.

The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.0.0/),
and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html).

## [0.1.0] - 2024-01-XX

### Added
- Initial release of OMOP CDM 5.4 Data Pipeline
- Complete OMOP CDM 5.4 schema implementation (30+ tables)
- Staging schema for raw data ingestion
- Audit schema for ETL tracking and data quality metrics
- Extractor component for batch and incremental extraction
- Concept Mapper with LRU caching and multi-level mapping strategy
- Transformer for all major OMOP tables (PERSON, VISIT_OCCURRENCE, CONDITION_OCCURRENCE, etc.)
- Validator with comprehensive data quality checks
- Loader with bulk insert and UPSERT capabilities
- Orchestrator for coordinating the complete ETL flow
- Parallel processing with ThreadPoolExecutor
- Error Handler with retry logic, circuit breaker, and checkpoint/resume
- Command-line interface with comprehensive commands
- Vocabulary Loader for OMOP vocabularies
- Configuration management with YAML and environment variables
- Comprehensive logging with file rotation
- Database connection pooling with retry logic
- Pydantic models for all OMOP tables
- PostgreSQL sequences for ID generation

### Features
- Automated concept mapping with fallback strategies
- Batch processing with configurable batch sizes
- Multi-threaded parallel processing
- Transaction management with automatic rollback
- Foreign key validation before loading
- Date validation and parsing
- Referential integrity checks
- OMOP compliance validation
- Unmapped code tracking
- Execution statistics and audit trail
- Progress bars for long-running operations
- Verbose logging mode

### Documentation
- README with quick start guide
- User guide with detailed instructions
- Architecture documentation
- Transformation rules documentation
- API documentation in code
- Configuration examples

### Requirements
- Python 3.12+
- PostgreSQL 16.11+
- SQLAlchemy 2.0+
- Pydantic 2.5+
- Click 8.1+
- Other dependencies in requirements.txt

## [Unreleased]

### Planned
- Property-based tests with Hypothesis
- Integration tests for complete ETL flow
- Performance benchmarking suite
- Docker containerization
- CI/CD pipeline
- Data Quality Dashboard integration
- Additional source data formats (HL7, FHIR)
- Incremental ETL mode
- Data lineage tracking
- Web-based monitoring dashboard
- REST API for programmatic access