Course Descriptions

This course provides a formal introduction to the basic theory and methods of probability and statistics. It covers topics in probability theory with an emphasis on those needed in statistics, including probability and sample spaces, independence, conditional probability, random variables, parametric families of distributions, and sampling distributions. Core concepts are mastered through mathematical exploration and simulation. Prerequisite(s): 2 semesters of calculus, including multivariate calculus, or its equivalent. Familiarity with linear algebra is helpful. Credits: 3

This course provides an advanced formal introduction to the basic theory and methods of probability and statistics. It covers topics in probability theory with an emphasis on those needed in statistics, including probability and sample spaces, independence, conditional probability, random variables, parametric families of distributions, and sampling distributions. Core concepts are mastered through mathematical exploration and simulation. Prerequisite(s): 2 semesters of calculus, including multivariate calculus, or its equivalent. Familiarity with linear algebra is helpful. Director of Graduate Studies permission is required. Credits: 3

This course provides an introduction to study design, descriptive statistics, and analysis of statistical models with one or two predictor variables. Topics include principles of study design, basic study designs, descriptive statistics, sampling, contingency tables, one- and two-way analysis of variance, simple linear regression, and analysis of covariance. Both parametric and non-parametric techniques are explored. Computational exercises will use the R and SAS packages. Prerequisite(s): 2 semesters of calculus or its equivalent (multivariate calculus preferred). Familiarity with linear algebra is helpful. Credits: 3

This course provides an advanced introduction to study design, descriptive statistics, and analysis of statistical models with one or two variables. Topics include principles of study design, basic study designs, descriptive statistics, sampling, contingency tables, one- and two-way analysis of variance, simple linear regression, and analysis of covariance. Both parametric and non-parametric techniques are explored. Computational exercises will use the R and SAS packages. Prerequisite(s): 2 semesters of calculus or its equivalent (multivariate calculus preferred). Familiarity with linear algebra is helpful. Director of Graduate Studies permission is required. Credits: 3

This course provides an introduction to biology at a level suitable for practicing biostatisticians and directed practice in techniques of statistical collaboration and communication. With an emphasis on the connection between biomedical content and statistical approach, this course helps unify the statistical concepts and applications learned in BIOSTAT 701 and BIOSTAT 702. Biomedical topics are organized around the fundamental mechanisms of disease from both evolutionary and mechanistic perspectives. In addition, students learn how to read and interpret research and clinical research papers. Core concepts and skills are mastered through individual reading and class discussion of selected biomedical papers, team-based case studies, and practical sessions introducing the art of collaborative statistics. Credits: 3

This course provides an advanced introduction to biology at a level suitable for practicing biostatisticians and directed practice in techniques of statistical collaboration and communication. With an emphasis on the connection between biomedical content and statistical approach, this course helps unify the statistical concepts and applications learned in BIOSTAT 701 and BIOSTAT 702. Biomedical topics are organized around the fundamental mechanisms of disease from both evolutionary and mechanistic perspectives. In addition, students learn how to read and interpret research and clinical research papers. Core concepts and skills are mastered through individual reading and class discussion of selected biomedical papers, team-based case studies, and practical sessions introducing the art of collaborative statistics. Director of Graduate Studies permission is required. Credits: 3

The lab is an extension of the course and operates like a seminar in which journal articles are used as a basis for discussion. The primary focus is on teaching students how to dissect a research article from a statistical and scientific perspective. Students also have the opportunity to present on material covered in the co-requisite course and to practice the communication skills that are a core focus of the program. Corequisite(s): BIOSTAT 703/BIOSTAT 703A or permission of the director of graduate studies. Credits: 0

This course provides a formal introduction to the basic theory and methods of probability and statistics. It covers topics in statistical inference, including classical and Bayesian methods, and statistical models for discrete, continuous, and categorical outcomes. Core concepts are mastered through mathematical exploration and simulations. Credits: 3

This course provides a formal introduction to the basic theory and methods of probability and statistics. It covers topics in statistical inference, including classical and Bayesian methods, and statistical models for discrete, continuous, and categorical outcomes. Core concepts are mastered through mathematical exploration and simulations. Director of Graduate Studies permission is required. Credits: 3

This course provides an introduction to general linear models and the concept of experimental designs. Topics include linear regression models, analysis of variance, mixed-effects models, generalized linear models (GLM) including binary, multinomial responses and log-linear models, basic models for survival analysis and regression models for censored survival data, and model assessment, validation and prediction. Core concepts are mastered through statistical methods application and analysis of practical research problems encountered by program faculty and demonstrated in practicum experiences in concert with BIOSTAT 706/BIOSTAT 706A. Computational examples and exercises will use the SAS and R packages. Credits: 3

This course provides an advanced introduction to general linear models and the concept of experimental designs. Topics include linear regression models, analysis of variance, mixed-effects models, generalized linear models (GLM) including binary, multinomial responses and log-linear models, basic models for survival analysis and regression models for censored survival data, and model assessment, validation, and prediction. Core concepts are mastered through statistical methods application and analysis of practical research problems encountered by program faculty and demonstrated in practicum experiences in concert with BIOSTAT 706/BIOSTAT 706A. Computational examples and exercises will use the SAS and R packages. Director of Graduate Studies permission is required. Credits: 3

This course revisits the topics covered in BIOSTAT 703 in the context of high-throughput, high-dimensional studies such as genomics and transcriptomics. The course will be based on the reading of both the textbook and research papers. Students will learn the biology and technology underlying the generation of "big data" and the computational and statistical challenges associated with the analysis of such data sets. As with BIOSTAT 703, there will be a strong emphasis on the development of communication skills via written and oral presentations. Credits: 3

This course revisits the advanced topics covered in BIOSTAT 703 in the context of high-throughput, high-dimensional studies such as genomics and transcriptomics. The course will be based on the reading of both the textbook and research papers. Students will learn the biology and technology underlying the generation of “big data” and the computational and statistical challenges associated with the analysis of such data sets. As with BIOSTAT 703, there will be a strong emphasis on the development of communication skills via written and oral presentations. Director of Graduate Studies permission is required. Credits: 3

This course surveys machine learning methods for biological applications, with emphasis on probabilistic approaches and applications in genetics and genomics. Topics include neural networks, probabilistic graphical models, Bayes' nets, Markov models, decision trees and random forests, support vector machines, clustering, Bayesian regression, Markov chain Monte Carlo, and methods for training and validating models. Coursework will include practical programming assignments in R. Students will be expected to become familiar with basic concepts from graph theory and algorithms. Prerequisite(s): BIOSTAT 706/706A or permission of the director of graduate studies. Credits: 3

Topics include history/background and process for clinical trials, key concepts for good statistics practice (GSP)/good clinical practice (GCP), regulatory requirements for pharmaceutical/clinical development, basic considerations for clinical trials, designs for clinical trials, classification of clinical trials, power analysis for sample size calculation, statistical analysis for efficacy evaluation, statistical analysis for safety assessment, implementation of a clinical protocol, statistical analysis plan, data safety monitoring, adaptive design methods in clinical trials (general concepts, group sequential design, dose-finding design, and phase I/II or phase II/III seamless design), and controversial issues in clinical trials. Prerequisite(s): BIOSTAT 706/706A or permission of the director of graduate studies. Credits: 3

Methods for causal inference, including confounding and selection bias in observational or quasi-experimental research designs, propensity score methodology, instrumental variables, and methods for non-compliance in randomized clinical trials. Prerequisite(s): BIOSTAT 706/706A or permission of the Director of Graduate Studies. Credits: 3

Topics from current and classical methods for assessing familiality and heritability, linkage analysis of Mendelian and complex traits, family-based and population-based association studies, genetic heterogeneity, epistasis, and gene-environmental interactions. Computational methods and applications in current research areas. The course will include a simple overview of genetic data, terminology, and essential population genetic results. Topics will include sampling designs in human genetics, gene frequency estimation, segregation analysis, linkage analysis, tests of association, and detection of errors in genetic data. Prerequisite(s): BIOSTAT 706/706A or permission of the Director of Graduate Studies. Credits: 3

Introduction to concepts and techniques used in the analysis of time-to-event data, including censoring, hazard rates, estimation of survival curves, regression techniques, and applications to clinical trials. Interval censoring, informative censoring, time-dependent covariates; nonparametric and semi-parametric methods. Prerequisite(s): BIOSTAT 706/706A or permission of the Director of Graduate Studies. Credits: 3

Topics include linear and non-linear mixed models; generalized estimating equations; subject-specific versus population average interpretation; and hierarchical models. Prerequisite(s): BIOSTAT 706/706A or permission of the Director of Graduate Studies. Credits: 3

The class introduces the concept of the exponential family of distributions and link functions, and their use in generalizing the standard linear regression to accommodate various outcome types. The theoretical framework will be presented but detailed practical analyses will be performed as well, including logistic regression and Poisson regression with extensions. The majority of the course will deal with the independent observations framework. However, there will be substantial discussion of longitudinal/clustered data where correlations within clusters are expected. To deal with such data the Generalized Estimating Equations and the Generalized Linear Mixed models will be introduced. An introduction to a Bayesian analysis approach will be presented, time permitting. Prerequisite(s): BIOSTAT 706/706A or permission of the Director of Graduate Studies. Credits: 3

Completed during a student’s final year of study, the master’s project is performed under the direction of a faculty mentor and is intended to demonstrate general mastery of biostatistical practice. Prerequisite(s): BIOSTAT 706/706A. Credits: 3 in Fall Semester and 3 in Spring Semester.

This class is an introduction to programming in R, targeted at those with minimal programming knowledge. Students will learn the core ideas of programming (functions, objects, data structures, input and output, debugging, and logical design) through writing code to assist in numerical and graphical statistical analyses. Students will learn how to write maintainable code, and to test code for correctness. They will then learn how to set up stochastic simulations and how to work with and filter large data sets. Since code is also an important form of communication among scientists, students will learn how to comment and organize code to achieve reproducibility. Prerequisite(s): None; familiarity with linear algebra is helpful. Permission of the Director of Graduate Studies. Credits: 3

This class is an advanced introduction to programming in R, targeted at those with minimal programming knowledge. Students will learn the core ideas of programming (functions, objects, data structures, input and output, debugging, and logical design) through writing code to assist in numerical and graphical statistical analyses. Students will learn how to write maintainable code, and to test code for correctness. They will then learn how to set up stochastic simulations and how to work with and filter large data sets. Since code is also an important form of communication among scientists, students will learn how to comment and organize code to achieve reproducibility. Prerequisite(s): None; familiarity with linear algebra is helpful. Permission of the director of graduate studies. Credits: 3

This class is an introduction to programming in SAS, targeted at those with minimal programming knowledge. Topics build from data management programming to statistical programming. Algorithms and data structures are emphasized. Prerequisite(s): None; familiarity with linear algebra is helpful. Credits: 3

This class is an advanced introduction to programming in SAS, targeted at those with minimal programming knowledge. Topics build from data management programming to statistical programming. Algorithms and data structures are emphasized. Prerequisite(s): None; familiarity with linear algebra is helpful. Permission of the director of graduate studies. Credits: 3

This is a first course in Bayesian statistical analysis for graduate students in biostatistics. The fundamentals of Bayesian inference are introduced, including Bayes’ Theorem and prior and posterior distributions. Bayesian inference is compared and contrasted with frequentist methods through application to common problems in biostatistics. Inference based on conjugate families is presented, along with a computation-based introduction to Markov chain Monte Carlo methods. Bayesian regression models are introduced, including model checking and selection, followed by an introduction to Bayesian hierarchical regression models. The course format emphasizes applied data analysis and is more heavily weighted toward heuristics and computation-based exploration of Bayesian methods than toward an intense mathematical treatment. Students should have a working knowledge of probability theory, likelihood, and applied frequentist data analysis including linear and logistic regression, and an understanding of how calculus is used in biostatistical applications. Prerequisite: None. Credits: 3

This course will teach students how to analyze biomedical data from a Bayesian inference perspective with a strong emphasis on using real-world data, including electronic health records, wearables, and imaging data. This is a second course in Bayesian inference and will start by introducing the hierarchical model as a flexible framework for analyzing complex data structures. This includes missing data, and spatial and longitudinal data. Then Bayesian machine learning approaches will be introduced, including regularization for high-dimensional data and scalable inference techniques for big data, including variational inference. Additional topics may be discussed from the Bayesian perspective, including causal inference, meta-analysis, and time-to-event data. While an applied course, the methods will be introduced from a mathematical perspective, allowing students to obtain a fundamental understanding of the introduced models. Students will learn computational skills for implementing Bayesian models using R and Stan. By the end of this course, students will be well-equipped to tackle complex problems in biomedical research using Bayesian inference. Prerequisite: BIOSTAT 724 or equivalent course with instructor permission. Credits: 3

Independent Study is a semester-long course focused on mentored research in the practice of biostatistics. Students work with an assigned mentor. This course is only open to students by permission of the Director of Graduate Studies. Credits: 1, 2, or 3

Continuation is a semester-based, noncredit-bearing enrollment status used when a student is continuing scholarly activities with the same mentor. This course is only open to students by permission of the Director of Graduate Studies. Credits: 0

The student gains a holistic view of career options and the tools they will need to succeed as professionals in the world of work. The course will focus on resume development, cover letters, creating and maintaining a professional digital presence, and successfully conducting informational interviews. Credit: 1

The student develops knowledge of their own strengths and learns how to leverage their abilities for professional development. The course will focus on personal strengths, teamwork, interpersonal communication, in-person networking events, interviewing, and salary negotiations. Credit: 1

A data scientist needs to master several different tools to obtain, process, analyze, visualize, and interpret large biomedical data sets such as electronic health records, medical images, and genomic sequences. It is also critical that the data scientist masters the best practices associated with using these tools, so the results are robust and reproducible. The course covers foundational tools that will allow students to assemble a data science toolkit, including the Unix shell, text editors, regular expressions, relational and NoSQL databases, and the Python programming language for data munging, visualization, and machine learning. Best practices that students will learn include the Findable, Accessible, Interoperable, and Reusable (FAIR) practices for data stewardship, as well as reproducible analysis with literate programming, version control, and containerization. Credits: 3

This course will build on the foundation laid in software tools for data science. The course will explore the flow of a typical data science project from importing, cleaning, transforming and visualizing datasets to modeling and communicating results, within the context of R programming. While the course will include best practices, syntax and idioms specific to R, the focus will be on the process of conducting analysis in a reproducible fashion, writing readable, well-documented code and creating a coherent presentation of results. Credits: 3

This course describes the challenges faced by analysts with the increasing importance of large data sets, and the strategies that have been developed in response to these challenges. The core topics are how to manage data and how to make computation scalable. The data management module covers guidelines for working with open data, and the concepts and practical skills for working with in-memory, relational, and NoSQL databases. The scalable computing module focuses on asynchronous, concurrent, parallel, and distributed computing, as well as the construction of effective workflows following DevOps practices. Applications to the analysis of structured, semi-structured, and unstructured data, especially from biomedical contexts, will be interleaved into the course. The course examples are primarily in Python, and fluency in Python is assumed. Credits: 3

This course will highlight how biomedical data science blends the field of biostatistics with the field of computer science through the introduction of 3 to 5 case studies. Students will be introduced to analytic programs typically encountered in biomedical data science and will implement the data science and statistical skills introduced in their previous coursework. Credits: 3

This course focuses on theoretical and algorithmic foundations of bandits and reinforcement learning, involving topics including upper confidence bound methods, Thompson sampling, linear and deep contextual bandits, Markov decision process, Q-learning, policy gradient methods, etc. The course targets graduate-level students with a solid mathematical background (linear algebra, probability and statistics, and basic calculus), and a strong research interest in bandits and reinforcement learning. Prerequisite(s): linear algebra, probability and statistics, and basic calculus, or consent of the instructor and director of graduate studies. Credits: 3

Advanced seminar on topics at the research frontiers in biostatistics. Readings of current biostatistical research and presentations by faculty and advanced students of current research in their area of specialization. Credit: 1

Introduction to linear models and linear inference from the coordinate-free viewpoint. Topics: identifiability and estimability, key properties of and results for finite-dimensional vector spaces, linear transformations, self-adjoint transformations, spectral theorem, properties and geometry of orthogonal projectors, Cochran's theorem, estimation and inference for normal models, distributional properties of quadratic forms, minimum variance linear unbiased estimation, Gauss-Markov theorem and estimation, calculus of differentials, analysis of variance and covariance. Prerequisite(s): real analysis and linear algebra, or consent of the instructor and Director of Graduate Studies. Credits: 3

Introduction to decision theory and optimality criteria, sufficiency, methods for point estimation, and confidence interval and hypothesis testing methods and theory. Prerequisite: Biostatistics 704 or equivalent. Instructor consent and permission of the Director of Graduate Studies required. Credits: 3

The theory for M- and Z-estimators and applications. Semi-parametric models, geometry of efficient score functions and efficient influence functions, construction of semi-parametric efficient estimators. Introduction to the bootstrap: consistency, inconsistency and remedy, correction for bias, and double bootstrap. U-statistics and rank and permutation tests. Prerequisite(s): STA 711 and BIOSTAT 906 or permission of the Director of Graduate Studies. Credits: 3

Introduction to probabilistic graphical models and structured prediction, with applications in genetics and genomics. Hidden Markov Models, conditional random fields, stochastic grammars, Bayesian hierarchical models, neural networks, and approaches to integrative modeling. Algorithms for exact and approximate inference. Applications in DNA/RNA analysis, phylogenetics, sequence alignment, gene expression, allelic phasing and imputation, genome/epigenome annotation, and gene regulation. Prerequisite(s): Permission of the Director of Graduate Studies. Credits: 3

The goal of this course is to provide motivated Ph.D. and master’s students with background knowledge of high-dimensional statistics/machine learning for their research, especially in their methodology and theory development. Discussions cover theory, methodology, and applications. Selected topics in this course include the basics of high-dimensional statistics, matrix and tensor modeling, concentration inequalities, nonconvex optimization, and applications in genomics and biomedical informatics. Prerequisite: Knowledge of probability, inference, and basic algebra is required. Credits: 3

Topology of R^n, continuous functions, uniform convergence, compactness, infinite series, theory of differentiation, and integration. Not open to students who have had Mathematics 431.
Prerequisite: Mathematics 221.

Algebraic and topological structure of the real number system; rigorous development of one-variable calculus including continuous, differentiable, and Riemann integrable functions and the Fundamental Theorem of Calculus; uniform convergence of a sequence of functions; contributions of Newton, Leibniz, Cauchy, Riemann, and Weierstrass. An assignment will ask the student to relate this course to their research.