Dissertation Announcement: Mengying Yan | Duke Department of Biostatistics and Bioinformatics

Speaker

Mengying Yan

Advisor/Mentor: Ben Goldstein

Dissertation Title: Outcome Observability in Electronic Health Record-Based Clinical Prediction Models

Abstract: Electronic health records (EHRs) are a rich, real-world data source widely used to develop clinical prediction models (CPMs). However, outcomes in EHR data are often not fully observed, as individuals can receive care at multiple institutions or otherwise fall outside the system's observational reach. This observability problem, particularly when it differs across demographic subgroups (differential observability), can introduce a critical source of bias in EHR-based CPMs. As a result, models may systematically underestimate risk for vulnerable groups, reinforcing existing health inequities.

While observability is a challenge across all EHR-based studies, this dissertation focuses on outcome observability within CPM development. Specifically, it addresses three key questions: how differential observability can bias CPMs, how to estimate the degree of observability, and how to build robust CPMs despite incomplete outcome information. First, after formally defining observability, we demonstrate how differential observability induces algorithmic bias in CPMs. Next, we propose a novel method to estimate and assess the extent of observability using a fully observed external data source. By reweighting the external data to resemble the target EHR population, the approach provides estimates of both overall and differential observability without requiring direct patient-level linkage. Finally, we address the challenge of constructing CPMs for long-term outcomes with a limited observation window. We frame this as a positive-unlabeled (PU) problem and employ an adversarial domain adaptation method that aligns historical, fully labeled data with a more contemporary, partially labeled target cohort to maintain predictive performance despite shifting patient populations.

This dissertation underscores the importance of accounting for observability when leveraging EHR data for CPMs. By defining and measuring outcome observability, as well as adapting modeling strategies to mitigate its impact, it provides a comprehensive framework for handling observability in EHR data when building clinical prediction models.
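
As a rough illustration of the reweighting idea mentioned in the abstract, the sketch below uses simple post-stratification on a single categorical covariate. This is an assumption made for illustration, not the dissertation's actual estimator: the function names, the age strata, and the toy data are all hypothetical. It reweights outcome rates from a fully observed external source to the EHR case mix, then takes the ratio of the observed EHR outcome rate to that expected rate as a crude observability estimate.

```python
from collections import Counter


def stratum_shares(strata):
    """Return the share of records falling in each stratum."""
    counts = Counter(strata)
    total = sum(counts.values())
    return {s: n / total for s, n in counts.items()}


def stratum_rates(strata, outcomes):
    """Return the outcome rate (events / records) within each stratum."""
    n = Counter(strata)
    events = Counter()
    for s, y in zip(strata, outcomes):
        events[s] += y
    return {s: events[s] / n[s] for s in n}


def estimate_observability(ehr_strata, ehr_outcomes, ext_strata, ext_outcomes):
    """Observed EHR outcome rate divided by the rate expected after
    reweighting external stratum rates to the EHR case mix.

    Hypothetical helper: assumes every EHR stratum also appears in the
    external data (otherwise a KeyError is raised).
    """
    shares = stratum_shares(ehr_strata)
    ext_rates = stratum_rates(ext_strata, ext_outcomes)
    expected = sum(shares[s] * ext_rates[s] for s in shares)
    observed = sum(ehr_outcomes) / len(ehr_outcomes)
    return observed / expected


# Toy external data with fully observed outcomes (1 = event, 0 = no event).
ext_strata = ["young"] * 10 + ["old"] * 10
ext_outcomes = [1] * 2 + [0] * 8 + [1] * 6 + [0] * 4

# Toy EHR data in which some events are assumed to go unrecorded.
ehr_strata = ["young"] * 5 + ["old"] * 5
ehr_outcomes = [1, 0, 0, 0, 0, 1, 0, 0, 0, 0]

obs = estimate_observability(ehr_strata, ehr_outcomes, ext_strata, ext_outcomes)
print(f"Estimated overall observability: {obs:.2f}")
```

Running the same function separately within each demographic subgroup would give subgroup-specific ratios, and comparing them is one simple way to look for differential observability. No patient-level linkage is needed here, since only stratum-level shares and rates cross between the two data sources.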

Event Series

B&B Dissertation Defense