Genome Academy is a series of stand-alone workshops on genome topics offered to Duke faculty, postdocs, graduate students and staff for little to no fee. Sabbatical scholars and other collaborating visitors may request registration and will be accommodated on a space-available basis. The workshops, taught by faculty and staff from various departments, range from 101-style introductions to genomic technologies, computational approaches and mass spectrometry analyses to more focused topics in molecular analysis. They are intended to introduce Duke community members to the field and build capacity in areas that further their own research.
There is no enrollment cap for online courses. Enrollment for in-person courses is capped at 20 students, and registration closes 7 days before each class. You will receive an enrollment confirmation 3 days before each class. In the event you are unable to attend your registered course(s), please contact Matthew Franco. Many in-person courses have a waitlist and we can offer your spot to another person.
Spring 2024 Courses
Introduction to DNA Sequencing
Instructor: Devi Swain Lenz, PhD
Date and Time: March 20, 1-2 PM
Location: 2240 CIEMAS
Cost: Free
During the past two decades, a new generation of high-throughput DNA sequencers has transformed biomedical and biotechnology research. These new technologies have fostered the development of a wide range of applications to basic and clinical research, including SNP discovery, transcriptome profiling, genome sequencing, and epigenetics. The goal of this introductory course is to teach the basic principles of next generation sequencing technology (NGS) and to present an overview of various library preparations and their applications. Advantages and limitations of various methods will be discussed and compared across technologies/platforms (Illumina, PacBio, Oxford Nanopore, and startup technologies). This course will also provide an introduction to primary data analysis and data quality assessment steps. Attendees will become familiar with NGS technology terms and fundamentals, NGS data format and quality, and will acquire a better understanding of how to choose a suitable NGS sequencing method or instrument for their study.
Blood, Sweat, and Tears: Proteomics of Biofluids and Secretomes
Instructor: Matthew Foster, Ph.D.
Date and Time: April 17, 1:00pm-3:30pm
Location: CIEMAS 2240
Cost: Free
Biofluids, including plasma, urine and cerebrospinal fluid, are among the most common sample types utilized for protein biomarker discovery. Similarly, proteins secreted by cells or organoids in vitro can be used to identify signaling factors or response to injury. The proteomic analysis of these samples can be challenged by pre-analytic variability and by the extreme range of protein concentrations. This workshop will focus on best practices for sample collection and storage, experimental design, sample preparation, and data acquisition and analysis as they relate to the targeted and non-targeted mass spectrometry-based proteomic analysis of biofluids and secretomes. Limitations of mass spectrometry-based versus other approaches (e.g. antibody-based assays) will also be discussed.
Multimodal Single Cell Data Analysis – Gene and Cell Surface Protein Expression
Instructor: Vaibhav Jain
Date and Time: April 30, 2:15-3:15
Location: CIEMAS 2240
Cost: Free
The workshop aims to provide participants with an in-depth understanding of single-cell data analysis techniques, focusing on gene and cell surface protein expression. The workshop will cover the process of data generation, including an explanation of what single-cell data is and how it is generated. Participants will learn to utilize tools like 10x cellranger to convert sequencing data, map reads to reference transcriptomes, and generate cell-by-gene matrices. Furthermore, the workshop will delve into exploring data using R-based tools such as Seurat, enabling participants to gain insights into the complexities of single-cell expression profiles and extract meaningful biological information. Through practical demonstrations and hands-on exercises, attendees will acquire the necessary skills to navigate and analyze multimodal single-cell data effectively.
Spatial 10X Xenium In Situ Sequencing Data Analysis
Instructor: Vaibhav Jain
Date and Time: May 3, 12:30 - 1:30 PM
Location: CIEMAS 2240
Cost: Free
This workshop will introduce learners to Spatial 10X Xenium In Situ Sequencing Data Analysis and review how data are generated. In particular, we will review Xenium explorer to learn about onboard analysis results and also highlight R-based tools like Seurat and SpaceXR.
Introduction to Mass Spectrometry-Based Proteomics *WAITLIST ONLY*
Instructor: Erik Soderblom, Ph.D.
Date and Time: May 13, 9:00am-12:00pm
Location: French Science 4233
Cost: Free
Liquid chromatography coupled with tandem mass spectrometry (LC/MS/MS) continues to be the key technology for the qualitative and quantitative analysis of peptides and proteins for both basic and clinical research projects. This Genome Academy session is designed as an introduction for researchers needing to expand their knowledge of the use of LC/MS/MS-based methods for proteomics, and thus help researchers better understand how these technologies can help inform their research goals. Background material in basic protein chemistries will be provided, with an emphasis on how to use the physicochemical characteristics of these biomolecules for sample preparation specifically for LC/MS/MS analyses. In addition, the fundamentals of liquid chromatography and mass spectrometry will be discussed to enable students to understand the nuances of the experimental designs required to address their specific project. Real-world examples will be used to illustrate sample preparation and analysis strategies, including basic identification projects, characterization of Post-Translational Modifications and differential expression analyses (including 'omic biomarker discovery and targeted biomarker verification). Finally, the use of open source software tools for interpretation of these datasets will be discussed.
On Demand Courses
A high-performance computing cluster empowers users to harness computational power far beyond that of a single machine. This online course module teaches users how to run and manage computational analyses on the HARDAC cluster, Duke's high-performance computing resource tailor-designed for computational genomics. The course module consists of a set of interactive activities that take users from connecting to HARDAC all the way to running powerful array jobs. To follow the activities included in this course, users must be granted access to HARDAC, which is available to everyone on Duke's campus through the Computational Solutions service center. To inquire about access, please contact the Computational Solutions team.
Custom software is a common component of almost any computational workflow. Installing and using it on a shared high-performance computing environment such as HARDAC presents challenges due to frequently conflicting recursive dependency chains and versions. A number of mechanisms have been developed to isolate software environments, their versions, and their dependencies from each other, and understanding these is key to using already installed software packages and to installing one's own. This course teaches the use of Environment Modules as provided on HARDAC in part 1, and Conda Environments in part 2. A third installment is planned for teaching the use of Singularity containers. To follow the material in this class effectively, you should have access to HARDAC, and you should have taken (or have full command over the material presented in) the "High Performance Computing (HPC) / SLURM Best Practices for HARDAC" online course.
View Part I Lecture: Environment Modules
View Part II Lecture: Conda Environments
Past Course Offerings
Instructors: Dr. Erik Soderblom and Dr. Will Thompson, Duke Proteomics and Metabolomics
Cost: Free
Liquid chromatography coupled with mass spectrometry (LC/MS) is a versatile tool for the qualitative and quantitative characterization of peptides, proteins and metabolites for both basic and clinical research projects. One of the most important considerations in being able to translate LC-MS datasets into meaningful biological observations is to effectively use open source software packages and/or online resources geared toward LC-MS based datasets. This GCB Academy session is designed as a complement to GCB Academy course “Fundamentals of Mass Spectrometry for Proteomic and Metabolomic Analyses” (Nov 7th) and GCB Academy course “Experimental Design: Get the most out of your proteome” (Nov 8th) and is intended for users of the Proteomics and Metabolomics Shared Resource who have generated or plan to generate LC/MS based Proteomic or Metabolomic Datasets with the Shared Resource. The first portion of the course will focus on the effective use of Scaffold to characterize qualitative proteomic datasets. This will include an overview of Scaffold and features such as interpretation of spectral matches at a protein or peptide level, gene ontology classification, homology matching, spectral count data, and data export. The second portion of the course will cover common proteomic and metabolomic data analysis strategies from supplemental data (typically .xlsx file formats from Rosetta Elucidator) provided as part of the Shared Resource’s quantitative proteomic workflows. This will include an overview of the typical features of a quantitative data return document, various data summarization levels, calculating peptide/protein relative fold-changes and p-values, exporting data for motif analysis (PTM specific datasets), and performing Principal Component Analysis (PCA) and 2D Clustering within JMP Pro.
Instructor: Matt Foster
Cost: Free
This course will provide an in-depth overview of experimental design, focusing on proteomic analysis of protein post-translational modifications (PTMs) and protein expression in (but not limited to) mammalian cells, tissues and biofluids. Topics will be aimed at getting maximum biological information from your samples. We will discuss methods for enriching subproteomes and PTMs; best practices for ensuring sample integrity and avoiding common contaminants that will be carried downstream; and how to be aware of additional factors that might influence reproducibility across biological replicates. In addition, we will discuss where discovery-based or targeted proteomic analyses may be most appropriate. Feel free to bring specific questions about your favorite proteins, model systems, or biological matrices. Prerequisite: Fundamentals of Mass Spectrometry for Proteomic and Metabolomic Analyses (encouraged, but not required).
Instructor: Arthur Moseley
Cost: Free
Liquid chromatography coupled with tandem mass spectrometry (LC/MS/MS) continues to be the key technology for the qualitative and quantitative analysis of peptides, proteins and metabolites for both basic and clinical research projects. This GCB Academy session is designed as an introduction for researchers needing to expand their knowledge of the use of LC/MS/MS-based methods for proteomics and metabolomics, and thus help researchers better understand how these technologies can help inform their research goals. Background material in basic protein/metabolite chemistries will be provided, with an emphasis on how to use the physicochemical characteristics of these biomolecules for sample preparation specifically for LC/MS/MS analyses. In addition, the fundamentals of liquid chromatography and mass spectrometry will be discussed to enable students to understand the nuances of the experimental designs required to address their specific project. Real-world examples will be used to illustrate sample preparation and analysis strategies, including basic identification projects, characterization of Post-Translational Modifications and differential expression analyses (including 'omic biomarker discovery and targeted biomarker verification).
Instructor: Tom Burke
Cost: Free
This seminar will offer an introductory overview of key considerations and best practices in establishing and maintaining clinical biospecimen collections for genomic and precision medicine research. Topics covered will include: basic concepts in biobank and cohort research; role of standardization, harmonization, and quality control; maintaining unique sample identification and robust chain-of-custody tracking; need for secure information and inventory management systems for samples and data; important considerations in repository design; and an overview of biobanking resources at Duke and beyond.
This half-day tutorial will provide you with a better understanding of the data processing and analysis methods that are used in RNA-seq analysis. We will cover topics such as data quality control, normalization, and calling differentially expressed genes. We will provide hands-on experience that will allow you to go back to your lab and work with your own data.
*Pre-requisites: "Introduction to Unix" and "Introduction to Scientific Computing for Genomics" (or equivalent experience)
Course: Introduction to DNA Sequencing
Presenter: Devi Swain Lenz, PhD
Cost: Free
During the past two decades, a new generation of high-throughput DNA sequencers has transformed biomedical and biotechnology research. These new technologies have fostered the development of a wide range of applications to basic and clinical research, including SNP discovery, transcriptome profiling, genome sequencing, and epigenetics. The goal of this introductory course is to teach the basic principles of next generation sequencing technology (NGS) and to present an overview of various library preparations and their applications. Advantages and limitations of various methods will be discussed and compared across technologies/platforms (Illumina, PacBio, Oxford Nanopore, and startup technologies). This course will also provide an introduction to primary data analysis and data quality assessment steps. Attendees will become familiar with NGS technology terms and fundamentals, NGS data format and quality, and will acquire a better understanding of how to choose a suitable NGS sequencing method or instrument for their study.
Course: Introduction to Mass Spectrometry based Proteomics
Presenter: Dr. Erik Soderblom
Liquid chromatography coupled with tandem mass spectrometry (LC/MS/MS) continues to be the key technology for the qualitative and quantitative analysis of peptides and proteins for both basic and clinical research projects. This GCB Academy session is designed as an introduction for researchers needing to expand their knowledge of the use of LC/MS/MS-based methods for proteomics, and thus help researchers better understand how these technologies can help inform their research goals. Background material in basic protein chemistries will be provided, with an emphasis on how to use the physicochemical characteristics of these biomolecules for sample preparation specifically for LC/MS/MS analyses. In addition, the fundamentals of liquid chromatography and mass spectrometry will be discussed to enable students to understand the nuances of the experimental designs required to address their specific project. Real-world examples will be used to illustrate sample preparation and analysis strategies, including basic identification projects, characterization of Post-Translational Modifications and differential expression analyses (including 'omic biomarker discovery and targeted biomarker verification). Finally, the use of open source software tools for interpretation of these datasets will be discussed.
Instructors: Jennifer Modliszewski and Holly Dressman
Cost: $50 for faculty, postdocs and staff; free for grad students
This 4-hour tutorial will first spend time discussing important considerations for the design of your study and the collection of your samples. It will also introduce you to the data processing and analysis methods that are used in 16S microbiome analysis. We will cover topics such as data quality control, diversity indices, and calling differentially abundant microflora. We will provide hands-on experience that will allow you to go back to your lab and work with your own data.
Instructor: Wei Chen
Cost: $50 for faculty, postdocs and staff; free for grad students
This 4-hour tutorial will provide you with a better understanding of the data processing and analysis methods that are used in RNA-seq analysis. We will cover topics such as data quality control, normalization, and calling differentially expressed genes. We will provide hands-on experience that will allow you to go back to your lab and work with your own data.
*Pre-requisites: "Introduction to Unix" and "Introduction to Scientific Computing for Genomics" (or equivalent experience).
Instructor: Hilmar Lapp
Cost: $200 for faculty, postdocs and staff; Free for grad students
Computing has become an integral and indispensable part of genomic biology. This course teaches basic skills in scientific computing, with a focus on applications for genomic science, aimed at making you more productive, your computational work more reliable, and your research easier to reproduce and extend, including by your future self. The course includes introductions to (1) using Unix shell commands to efficiently find, organize, and stage data for analysis; (2) basic data types, control flows, functions, and 3rd party packages for the Python programming language commonly encountered in scientific computing; (3) using version control to manage with confidence the numerous directions research code takes from inception to publication; and (4) effectively using a high-performance computing cluster to run computational analyses. The format of the course is inspired by the acclaimed Software Carpentry-style bootcamps. Hence, this is a fully hands-on workshop, and students are expected to bring a laptop.
*Prerequisites: “Introduction to Unix” (or equivalent experience)
Course Website: https://duke-gcb.github.io/SciComp-Nov-2019/
Instructor: Hélène Fradin
Cost: $50 for faculty, postdocs and staff; free for grad students
This 4-hour hands-on tutorial will provide you with experience working with data from a single-cell RNA-Seq experiment. We will cover quality control, filtering, normalization, clustering, differential expression and marker identification analysis.
*Pre-requisites: Must have previously taken the GCB Academy “RNA-Seq Analysis” course.
Instructors: Dr. Erik Soderblom and Dr. Will Thompson, Duke Proteomics and Metabolomics
Cost: Free
Liquid chromatography coupled with tandem mass spectrometry (LC-MS/MS) is a versatile tool for the qualitative and quantitative characterization of peptides, proteins and metabolites for both basic and clinical research projects. One of the most important considerations in being able to translate LC-MS datasets into meaningful biological observations is to effectively use open source software packages and other online resources to perform proper data analysis, interpretation, and forward-looking experimental design. This GCB Academy session is designed as a complement to “Introduction to Mass Spec Technologies for Proteomics and Metabolomics” and is intended for users of the Proteomics and Metabolomics Shared Resource who have generated or plan to generate LC/MS based Proteomic or Metabolomic Datasets with the Shared Resource. The first section of the course will focus on the effective use of Scaffold to characterize qualitative proteomic datasets. This will include an overview of Scaffold and features such as interpretation of spectral matches at a protein or peptide level, gene ontology classification, homology matching, spectral count data, and data export. The second section will cover the interpretation and meta-analysis of data from the quantitative proteomics and metabolomics pipelines (typically .xlsx file formats from Rosetta Elucidator) provided as part of the Shared Resource’s quantitative workflows. This will include an overview of the typical features of a quantitative data return document, various data summarization levels (e.g. peptide versus protein), calculating peptide/protein relative fold-changes and p-values, exporting data for motif analysis (PTM specific datasets), and performing Principal Component Analysis (PCA) and 2D Clustering within JMP Pro (SAS Institute, Cary, NC). Finally, we will cover the use of Skyline as a tool for targeted quantitative proteomics workflows. This will include utilizing Skyline for verification following LC-MS based discovery experiments, as well as a brief introduction to using Skyline to design and interpret targeted proteomics and metabolomics analysis. This portion will utilize hands-on analysis of raw data collected in the Shared Resource. For advanced training in Skyline, the tutorials on the Skyline software web site are highly recommended (https://skyline.gs.washington.edu/labkey/wiki/home/software/Skyline/page.view?name=tutorials).
Prerequisites:
1) GCB Academy course: “Introduction to Mass Spec Technologies for Proteomics and Metabolomics” (optional, but preferred)
2) Personal laptop with Scaffold (http://www.proteomesoftware.com/products/scaffold/download/), Skyline (https://skyline.gs.washington.edu/labkey/project/home/software/Skyline/…?), and JMP Pro (downloaded from Duke OIT, https://software.oit.duke.edu/comp-print/software/index.php) pre-installed.
Instructor: Will Thompson
Cost: Free
Metabolomics has emerged as a powerful approach for characterization of molecular systems and also development of biomarkers for disease progression or diagnosis. Broadly, metabolomics is the characterization of small molecules by mass spectrometry and can include "unbiased" or non-targeted techniques as well as "targeted" methods. The measurement of metabolites by mass spectrometry is also directly translatable to the clinic; many common assays such as amino acids, acylcarnitines, vitamin D epimers, steroid hormones, and drugs of abuse are all clinical mass spec assays. Whether developing a novel assay or using a validated metabolite assay, the most important aspect for a successful metabolomics study is deciding which technique to use and understanding the data each approach will likely be able to provide. In this course, we will discuss sample types which are amenable to metabolomics, and utilize case studies to discuss the critical differences between targeted and non-targeted metabolomics and why an investigator might choose one over the other. We will use example datasets to demonstrate techniques for analysis of high dimensional metabolomic data. We will also cover the methods needed for accurate quantification, how to enable longitudinal translation of metabolomics assays, and how a targeted mass spec assay may differ in utilization from a clinical ELISA.
Instructors: Holly Dressman & Olaf Mueller
Cost: Free
An introductory discussion of what is involved in designing and analyzing a microbiome study. We will cover study design and sample collection, storage, preparation and sequencing of 16S rDNA, and provide a 3-4 hour session on basic QIIME data analysis.
Course: Microbiome Workshop
Presenter: Dr. Josh Granek & Dr. So Young Kim
This 3-hour workshop will be an introduction to microbiome research. We will cover the basics of study design, sample collection, preparation, sequencing, and analysis of a microbiome high-throughput amplicon sequencing (e.g. 16S rRNA) experiment.
Instructor: Dr. Holly Dressman, Co-Director, Sequencing and Genomic Technologies
Cost: FREE
PCR, quantitative PCR and droplet-digital PCR technologies will be discussed along with examples on which technology would best fit your research.
Instructor: Sunil Suchindran
Cost: Free
This course has two objectives. First, it seeks to develop an understanding of risk prediction and classification in the Omics setting. Second, for researchers who plan to develop risk models, this course seeks to provide concrete steps for study design, analysis, and interpretation. To accomplish these goals, we will discuss how different aspects of a statistical model can provide measures of association or measures of predictive accuracy. This distinction is important in understanding how developing a model for association/etiology/causal inference is conceptually different from using the model to predict. We will then discuss risk models in the conventional setting: larger sample sizes with a smaller number of predictors. We will cover study design, statistical models, and performance metrics. The course seeks to develop an appreciation of challenging considerations in the field, but also seeks to provide clear steps on how to proceed. Finally, we will review areas of active research and in what direction the field is moving. After establishing foundations, we will move into the Omics realm, which is characterized by smaller sample sizes and thousands of predictors. Prediction models in Omics often use machine-learning techniques, so we will cover some common machine-learning techniques and what makes them different from more conventional models. We will review current best practices with an emphasis on estimating performance. This course will not include any hands-on coding because of time limitations, but this will be the topic of a future course. The course focuses on understanding the most important aspects of risk prediction and classification.
Instructor: Erik Soderblom
Cost: Free
Liquid chromatography coupled with mass spectrometry (LC/MS) is a versatile tool for the qualitative and quantitative characterization of peptides, proteins and metabolites for both basic and clinical research projects. One of the most important considerations in being able to translate LC-MS datasets into meaningful biological observations is to effectively use open source software packages and/or online resources geared toward LC-MS based datasets. This GCB Academy session is designed as a complement to GCB Academy course “Fundamentals of Mass Spectrometry for Proteomic and Metabolomic Analyses” (Nov. 7) and GCB Academy course “Experimental Design: Get the most out of your proteome” (Nov. 8) and is intended for users of the Proteomics and Metabolomics Shared Resource who have generated or plan to generate LC/MS based Proteomic Datasets with the Shared Resource. The first portion of the course will focus on the effective use of Scaffold to characterize qualitative proteomic datasets. This will include an overview of Scaffold and features such as interpretation of spectral matches at a protein or peptide level, gene ontology classification, homology matching, spectral count data, and data export. The second portion of the course will cover common proteomic data analysis strategies from supplemental data (typically .xlsx file formats from Rosetta Elucidator) provided as part of the Shared Resource’s quantitative proteomic workflows. This will include an overview of the typical features of a quantitative data return document, various data summarization levels, calculating peptide/protein relative fold-changes and p-values, exporting data for motif analysis (PTM specific datasets), and performing Principal Component Analysis (PCA) and 2D Clustering within JMP Pro.
Instructors: Dr. Erik Soderblom and Dr. Will Thompson, Duke Proteomics and Metabolomics
Cost: Free
Critical review of a Proteomics data analysis presents unique challenges because of the complex workflows involved in going from raw mass spectrometry data to results interpretation. Using tools discussed in the “Bioinformatics Tools” course, this class will work to ‘deconstruct’ a proteomics experiment which has had flaws in the analysis and interpretation. By finding the errors in data analysis and interpretation, the goal of this case study will be to become more aware of many common pitfalls in proteomics data analysis, and enhance your skills in reviewing proteomics datasets which are becoming much more common in the peer-reviewed literature. The material will be guided, but hands-on participation is expected. Laptops required. Prerequisites: Attendance at “Fundamentals” and “Experimental Design” classes recommended but not required, attendance at “Bioinformatics Tools” highly recommended.
Instructor: Susanne Haga
Cost: Free
This 90-minute course will provide attendees with an overview of general principles of genetics, genomics and molecular biology, and clinical applications and technologies currently used in clinical practice. In particular, the course will provide an overview of genomics, genome-wide association studies and other large initiatives and a range of testing technologies for diagnosis and treatment. Introduction of new technologies such as liquid biopsies will also be briefly discussed.
2.5 days
Instructors: Dr. Olivier Fédrigo, Director, and Dr. Nicolas Devos, Associate Director, Sequencing and Genomic Technologies Shared Resource
Cost: $200
In this 3-day workshop, participants will prepare stranded RNA-Seq libraries and will have the opportunity to generate and analyze expression data. This hands-on workshop consists of two parts: 1) sample preparation and data generation (wet lab) and 2) data analysis. In the first part, participants will be trained in estimating RNA sample quality, generating stranded directional RNA-Seq libraries, and assessing RNA-Seq library quality. In the second part, participants will learn how to perform basic bioinformatics analyses on the RNA-Seq data, including data QC, mapping reads, and differential expression analysis. For more in-depth analyses, the GCB Academy course on RNA-Seq analysis is recommended.
Pre-requisites: Attendees should have basic laboratory skills such as lab safety principles, best RNA practices, pipetting, and dilutions.
Instructor: Dr. Holly Dressman, Co-Director, Sequencing and Genomic Technologies
Cost: FREE
Single cell expression profiling can be facilitated through the automation of the Fluidigm C1 System. The system allows one to capture single cells and explore gene expression profiling through the use of qPCR or RNA sequencing analysis. The technology will be discussed as well as how it can be applied in your research. Pre-requisites: Basic understanding of molecular biology. PCR, quantitative PCR and droplet-digital PCR technologies will be discussed along with examples on which technology would best fit your research.
Instructor: Dr. Holly Dressman, Co-Director, Sequencing and Genomic Technologies
Cost: FREE
Single cell expression profiling can be facilitated through the automation of the Fluidigm C1 System. The system allows one to capture single cells and explore gene expression profiling through the use of qPCR or RNA sequencing analysis. The technology will be discussed as well as how it can be applied in your research. Pre-requisites: Basic understanding of molecular biology.
Instructor: Dr. David Corcoran, Director, Genomic Analysis and Bioinformatics Shared Resource
Cost: $50 for faculty, postdocs, and staff; free for graduate students
This hands-on tutorial will introduce the data processing steps for the purpose of calling variants from whole exome sequencing data. We will go step-by-step through the best practices guide from the Genome Analysis Toolkit. After completing this tutorial, you should feel comfortable calling variants from data generated in your own labs.
Pre-requisites: "Introduction to Unix" and "Introduction to Scientific Computing for Genomics" (or equivalent experience).