Develop Novel Statistical and Computational Methods for Omics Data Analysis

August 14, 2023
2:00 pm to 3:00 pm
Hock Plaza, Room 214

Event sponsored by:

Biostatistics and Bioinformatics


Allison, Tasha



Qi Gao, PhD Candidate

Recent advances in sequencing technologies have enabled the measurement of gene expression and other omics profiles at multi-cell, single-cell, or subcellular resolution. However, these advances also pose challenges for data analysis. Examples include identifying differentially expressed feature gene sets with high accuracy and benchmarking computational methods for various analysis tasks on data with complex heterogeneity. This dissertation focuses on developing novel statistical and computational methods to address these challenges.

In project 1, we developed SifiNet, a versatile pipeline that identifies cell-subpopulation-specific feature genes, annotates cell subpopulations, and reveals their relationships. The major advantage of SifiNet is that it bypasses cell clustering and so avoids the bias that inaccurate clustering can introduce. As a result, SifiNet achieves significantly higher accuracy in feature gene identification and cell annotation than traditional two-step methods that rely on clustering. SifiNet can analyze both single-cell RNA sequencing (scRNA-seq) and single-cell assay for transposase-accessible chromatin using sequencing (scATAC-seq) data, providing insight into multiomic cellular profiles.

In project 2, we developed GeneScape, a novel scRNA-seq data simulator that can simulate complex cellular heterogeneity. Existing scRNA-seq data simulators have limited ability to simulate data with complex or subtle cellular heterogeneity. This limitation is most severe for cells that exhibit both cell-type and cell-state differences, such as differences in cell cycle, senescence level, or DNA-damage level. GeneScape can successfully simulate gene expression for cells with such complex heterogeneity structures.

In project 3, we developed GeneScape-S (GeneScape-Spatial), a simulator for spatially resolved transcriptomics (SRT) data. Existing SRT-specific simulators cannot fulfill customized needs such as simulating multi-layer data, mimicking local tissue heterogeneity, and accommodating mixed cell-type structures in low-resolution spots. To fill these gaps, GeneScape-S preserves the expression and spatial patterns of real SRT data and offers functions designed specifically for these needs. GeneScape-S also incorporates the features of GeneScape to simulate complex heterogeneity.

Mentor: Jichun Xie

Zoom Link: | Meeting Number: 917 5826 9666 | Passcode: 079995