Imputation and Causal Inference in Genomics

Seminar Series

Thursday, August 13, 2020 - 12:00
Audrey Qiuyan Fu, PhD

Abstract: Genomic data can be complex, large, noisy and sparse. Here I will discuss two problems we have worked on. The first problem concerns the highly sparse data from experiments that measure gene expression in single cells. These data contain a large number of zeros (often >80%); many of these zeros are artifacts rather than indications of no expression. Underlying these data are complex regulatory relationships among genes, as well as potentially many cell types with different gene expression profiles. We take a deep learning approach and design imputation methods based on autoencoders. We generate synthetic data from real single-cell data to evaluate performance, and explore several statistical properties of our methods and of competing ones.
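
To make the evaluation idea concrete, here is a minimal sketch of a dropout-and-recover benchmark: start from expression values treated as ground truth, zero out a fraction of entries to mimic technical zeros, impute, and score the error on the entries that were dropped. Everything in it is an assumption made for illustration, not the speaker's method: the simulated "truth" stands in for real single-cell data, the 80% dropout rate echoes the figure quoted above, and the gene-mean imputer is a deliberately simple placeholder where an autoencoder (or any competing method) would be plugged in. All function names are hypothetical.

```python
import random


def simulate_truth(n_cells, n_genes, rng):
    """Hypothetical ground truth: each gene has its own mean expression level."""
    gene_means = [rng.uniform(2.0, 10.0) for _ in range(n_genes)]
    return [[max(0.0, rng.gauss(m, 1.0)) for m in gene_means]
            for _ in range(n_cells)]


def apply_dropout(truth, rate, rng):
    """Zero out entries at random to mimic technical zeros.

    Returns the observed matrix and a boolean mask of dropped entries.
    """
    observed, dropped = [], []
    for row in truth:
        mask = [rng.random() < rate for _ in row]
        observed.append([0.0 if d else x for x, d in zip(row, mask)])
        dropped.append(mask)
    return observed, dropped


def impute_gene_mean(observed):
    """Placeholder imputer: replace each zero by the gene's mean over nonzero cells."""
    n_genes = len(observed[0])
    means = []
    for j in range(n_genes):
        nonzero = [row[j] for row in observed if row[j] > 0]
        means.append(sum(nonzero) / len(nonzero) if nonzero else 0.0)
    return [[x if x > 0 else means[j] for j, x in enumerate(row)]
            for row in observed]


def mse_on_dropped(estimate, truth, dropped):
    """Mean squared error restricted to the entries that were zeroed out."""
    errors = [(e - t) ** 2
              for est_row, true_row, mask_row in zip(estimate, truth, dropped)
              for e, t, d in zip(est_row, true_row, mask_row) if d]
    return sum(errors) / len(errors)


if __name__ == "__main__":
    rng = random.Random(0)
    truth = simulate_truth(n_cells=200, n_genes=20, rng=rng)
    observed, dropped = apply_dropout(truth, rate=0.8, rng=rng)
    imputed = impute_gene_mean(observed)
    print("MSE, zeros left as is:", mse_on_dropped(observed, truth, dropped))
    print("MSE, gene-mean imputer:", mse_on_dropped(imputed, truth, dropped))
```

Because the score is computed only on entries whose true values are known, any imputation method can be compared on the same footing by swapping out `impute_gene_mean`.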

The second problem deals with causal inference: can we identify potential biological mechanisms directly from genomic data? For example, which genes regulate which other genes? And which genes are targeted by drugs? Genetic variation makes this inference possible (under certain assumptions), as it provides randomization among the individuals. This is known as the principle of Mendelian randomization in genetic epidemiology, which has typically been used to establish a mediation effect. We extend the interpretation of this principle to capture more causal relationships. We represent these relationships with causal graphs, and also develop an algorithm for learning such graphs based on the PC algorithm, a classical algorithm in computer science for inferring directed acyclic graphs. We use this approach to study gene regulation and drug responses.
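
As a rough illustration of why a genetic variant helps orient an edge between two genes, the sketch below simulates two toy models and applies the kind of marginal and conditional independence checks that constraint-based methods such as the PC algorithm rely on. It is not the speaker's algorithm. The model labels, effect sizes, allele frequency of 0.3, and linear-Gaussian noise are all assumptions chosen for the example, and partial correlation is used as a simple stand-in for a formal conditional independence test.

```python
import math
import random


def pearson(x, y):
    """Sample Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def partial_corr(x, y, z):
    """Correlation of x and y after linearly adjusting both for z."""
    rxy, rxz, ryz = pearson(x, y), pearson(x, z), pearson(y, z)
    return (rxy - rxz * ryz) / math.sqrt((1 - rxz ** 2) * (1 - ryz ** 2))


def simulate(model, n=10000, seed=0):
    """Simulate a genotype V and two expression levels T1, T2 (toy models)."""
    rng = random.Random(seed)
    # Genotype: count of minor alleles (0, 1 or 2); frequency 0.3 is assumed.
    v = [sum(rng.random() < 0.3 for _ in range(2)) for _ in range(n)]
    if model == "V->T1->T2":      # T1 regulates T2
        t1 = [g + rng.gauss(0, 1) for g in v]
        t2 = [a + rng.gauss(0, 1) for a in t1]
    elif model == "V->T1<-T2":    # T2 regulates T1
        t2 = [rng.gauss(0, 1) for _ in range(n)]
        t1 = [g + b + rng.gauss(0, 1) for g, b in zip(v, t2)]
    else:
        raise ValueError(f"unknown model: {model}")
    return v, t1, t2


if __name__ == "__main__":
    for model in ("V->T1->T2", "V->T1<-T2"):
        v, t1, t2 = simulate(model)
        print(model,
              "cor(V, T2) =", round(pearson(v, t2), 3),
              "cor(V, T2 | T1) =", round(partial_corr(v, t2, t1), 3))
```

In the first model the variant is associated with T2 marginally but not once T1 is accounted for; in the second the pattern reverses. T1 and T2 are correlated in both models, so it is the variant that separates the two directions.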

Zoom link:

Speaker: Audrey Qiuyan Fu, PhD
Assistant Professor
Department of Statistical Science
Institute for Bioinformatics and Evolutionary Studies
Institute for Modeling Collaboration & Innovation
University of Idaho