Reconsider Machine Learning Method for Variable Selection and Validation with High Dimensional Data

July 23, 2024
11:00 am to 1:00 pm
Hock Plaza CRTP Classroom 214

Event sponsored by:

Biostatistics and Bioinformatics
School of Medicine (SOM)


Allison, Tasha


Lu Liu
Mentor/Advisor: Sin-Ho Jung, PhD

Abstract: The big data trend influences how people think and inspires new research directions. Recent feats of machine learning have captured wide attention because of their remarkable performance in big data analysis, including text analysis and image processing. Machine learning is also a popular topic in clinical medicine for analyzing electronic health records and medical image data, for which traditional statistical models are not adequate. However, machine learning is not a panacea, and its drawbacks, such as loss of interpretability and excessive variable selection, may restrict its application. We must also recognize that for many clinical prediction analyses, a simpler approach, the generalized linear model, is sufficient. In this dissertation, we propose using standard regression methods, without any penalization, combined with a stepwise variable selection procedure to overcome the over-selection issue of popular machine learning methods. For model validation, we propose a permutation approach to estimate the performance of various validation methods. Finally, we propose a repeated sieving approach, extending standard regression with stepwise variable selection, to handle high-dimensional modeling.
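For readers unfamiliar with stepwise selection on an unpenalized generalized linear model, here is a minimal sketch of a generic forward stepwise procedure for logistic regression. The code assumes AIC as the entry and stopping criterion. The abstract does not say which criterion, search direction, or stopping rule the dissertation uses, so this is not the author's method, and it does not cover the permutation validation or repeated sieving approaches. The function names `fit_logistic` and `forward_stepwise` and the simulated data are hypothetical.

```python
import numpy as np

def fit_logistic(X, y, n_iter=50, tol=1e-8):
    """Unpenalized logistic regression fitted by Newton-Raphson (IRLS).
    X must already include an intercept column. Returns (coefficients, log-likelihood)."""
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-(X @ beta)))
        w = np.clip(p * (1.0 - p), 1e-10, None)   # IRLS weights
        grad = X.T @ (y - p)                       # score vector
        hess = X.T @ (X * w[:, None])              # Fisher information
        delta = np.linalg.solve(hess, grad)
        beta = beta + delta
        if np.max(np.abs(delta)) < tol:
            break
    eta = X @ beta
    loglik = np.sum(y * eta - np.logaddexp(0.0, eta))  # numerically stable log-likelihood
    return beta, loglik

def forward_stepwise(X, y, max_vars=None):
    """Generic forward selection: at each step add the variable that most lowers AIC;
    stop when no candidate lowers AIC (assumed criterion, for illustration only)."""
    n, p = X.shape
    max_vars = p if max_vars is None else min(max_vars, p)
    intercept = np.ones((n, 1))
    selected = []
    _, ll = fit_logistic(intercept, y)
    best_aic = 2 * 1 - 2 * ll                      # intercept-only model
    while len(selected) < max_vars:
        scores = []
        for j in range(p):
            if j in selected:
                continue
            design = np.hstack([intercept, X[:, selected + [j]]])
            _, ll = fit_logistic(design, y)
            scores.append((2 * design.shape[1] - 2 * ll, j))
        aic, j = min(scores)
        if aic >= best_aic:                        # no improvement: stop
            break
        best_aic = aic
        selected.append(j)
    return selected, best_aic

# Simulated example: only the first two of ten predictors affect the outcome.
rng = np.random.default_rng(0)
n, p = 500, 10
X = rng.standard_normal((n, p))
logit = 2.0 * X[:, 0] - 2.0 * X[:, 1]
y = (rng.random(n) < 1.0 / (1.0 + np.exp(-logit))).astype(float)
selected, aic = forward_stepwise(X, y)
print("selected columns:", selected)
```

Because the fitted models carry no penalty, coefficients keep their usual interpretation; over-selection is controlled here only by the stopping rule, which is the design choice the abstract's proposal centers on.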

B&B Dissertation Defense