Realizing the Potential of EHR Data for Clinical Research: Overcoming Noisiness, Privacy Constraints and Heterogeneity

Seminar Series

Wednesday, March 10, 2021 - 02:00
Zoom Conference
Chuan Hong, PhD

Abstract: Healthcare data such as electronic health records (EHR) linked with genomic data have opened new research opportunities along with unique challenges, such as imperfect outcome data, high dimensionality, data privacy, and heterogeneity across sites. In this talk, I present several novel statistical methods to overcome these challenges in EHR-linked clinical research. Specifically, we develop semi-supervised learning methods to make inferences about both the accuracy of multiple surrogate outcomes and the effect of biomarkers on the true phenotype. Our algorithms account for misclassification errors contained in the outcome data, and reduce the number of gold-standard labels needed by leveraging information from a large unlabeled dataset. We derive asymptotic properties of the proposed estimators and provide theoretical guarantees for their performance. We illustrate the proposed methods through an EHR-based study evaluating biomarkers of cardiovascular disease among patients with rheumatoid arthritis.

Speaker: Chuan Hong, PhD
Instructor, Biomedical Informatics 
Harvard University

Zoom: https://duke.zoom.us/j/99550106979?pwd=SmRDdzI0NVpyM0xHTEdIOXZ1OGQ1UT09

Passcode: 995986