Event sponsored by:
Computational Biology and Bioinformatics (CBB)
Biostatistics and Bioinformatics
Center for Advanced Genomic Technologies
Developmental and Stem Cell Biology Program
Duke Center for Genomic and Computational Biology (GCB)
Molecular Genetics and Microbiology (MGM)
School of Medicine (SOM)
University Program in Genetics & Genomics (UPGG)
Contact:
Franklin, Monica
Speaker:
Gurkan Yardimci
The advent of single-cell genomics has enhanced our ability to study heterogeneous cell populations: (1) to track the course of temporal processes, such as cellular differentiation, (2) to identify novel and rare cell states, and (3) to characterize the heterogeneity of complex tissues, such as tumors. In this talk, I will present three methods to study such cell populations using multimodal single-cell omics assays.
Our first algorithm, EpiConfig, is an interpretable multimodal topic model that learns an unsupervised clustering of single cells while modeling cross-modality relationships. We applied EpiConfig to a collection of sc-RNA+ATAC-seq assays that jointly measure the transcriptome and chromatin accessibility of single cells from healthy and cancerous cell populations. EpiConfig is as accurate as widely used sc-multiomics clustering methods; it learns sets of unimodal and cross-modality features, called topics, that correspond to specific cell types and states. We developed a Shiny app for interpreting these topics to obtain biological insights into different cell states, and we show that cross-modality features reflect 3D genome interactions.
Our second method, RIDDLER, identifies copy number variation (CNV) events in single-cell datasets. CNV is a widely studied type of genomic structural variation that can have direct and indirect effects on gene dosage and may drive cancer progression. RIDDLER is a single-cell-resolution CNV detection algorithm based on outlier-aware generalized linear modeling. We demonstrate the effectiveness of our algorithm on cancer cell line models, where it achieves better agreement with sc-WGS-derived CNVs than competing methods. RIDDLER accurately reconstructs the clonal heterogeneity of the cell population, in accordance with sc-WGS-derived clones, and can be applied to sc-ATAC-seq, sc-WGS, and sc-methylation datasets.
Lastly, I will introduce scPrePrint, an algorithm for studying transcription factor (TF) binding to chromatin using sc-ATAC-seq data. scPrePrint uses an unsupervised deep neural network to identify TF binding sites by modeling the ATAC footprint left on the binding site while correcting for ATAC sequence bias. We deployed scPrePrint on publicly available sc-ATAC-seq data from cell lines and show that our footprint calls are concordant with ChIP-seq data.
CBB Monday Seminar Series