Course Descriptions

This course provides a formal introduction to the basic theory and methods of probability and statistics. It covers topics in probability theory with an emphasis on those needed in statistics, including probability and sample spaces, independence, conditional probability, random variables, parametric families of distributions, and sampling distributions. Core concepts are mastered through mathematical exploration and linkage with the applied concepts studied in BIOSTAT 702.
Prerequisite(s): 2 semesters of calculus or its equivalent (multivariate calculus preferred). Familiarity with linear algebra is helpful. Corequisite(s): BIOSTAT 702, BIOSTAT 703.
Credits: 3

This course provides an introduction to study design, descriptive statistics, and analysis of statistical models with one or two predictor variables. Topics include principles of study design, basic study designs, descriptive statistics, sampling, contingency tables, one- and two-way analysis of variance, simple linear regression, and analysis of covariance. Both parametric and non-parametric techniques are explored. Core concepts are mastered through team-based case studies and analysis of authentic research problems encountered by program faculty and demonstrated in practicum experiences in concert with BIOSTAT 703. Computational exercises will use the R and SAS packages.
Prerequisite(s): 2 semesters of calculus or its equivalent (multivariate calculus preferred). Familiarity with linear algebra is helpful. Corequisite(s): BIOSTAT 701, BIOSTAT 703, BIOSTAT 721.
Credits: 3

This course provides an introduction to biology at a level suitable for practicing biostatisticians and directed practice in techniques of statistical collaboration and communication. With an emphasis on the connection between biomedical content and statistical approach, this course helps unify the statistical concepts and applications learned in BIOSTAT 701 and BIOSTAT 702. In addition to didactic sessions on biomedical issues, students are introduced to different areas of biostatistical practice at Duke University Medical Center. Biomedical topics are organized around the fundamental mechanisms of disease from both evolutionary and mechanistic perspectives, illustrated using examples from infectious disease, cancer, and chronic/degenerative disease. In addition, students learn how to read and interpret research and clinical trial papers. Core concepts and skills are mastered through individual reading and class discussion of selected biomedical papers, team-based case studies, and practical sessions introducing the art of collaborative statistics.
Corequisite(s): BIOSTAT 701, BIOSTAT 702.
Credits: 3

The lab is an extension of BIOSTAT 703 and is run like a journal club. It teaches students how to dissect a research article from a statistical and scientific perspective. The lab gives students the opportunity to present material covered in the corequisite course and to practice the communication skills that are a core tenet of the program.
Corequisite(s): BIOSTAT 703 or permission of the director of graduate studies.
Credits: 0

This course provides a formal introduction to the basic theory and methods of probability and statistics. It covers topics in statistical inference, including classical and Bayesian methods, and statistical models for discrete, continuous, and categorical outcomes. Core concepts are mastered through mathematical exploration, simulations, and linkage with the applied concepts studied in BIOSTAT 705.
Prerequisite(s): BIOSTAT 701 or its equivalent. Corequisite(s): BIOSTAT 705, BIOSTAT 706.
Credits: 3

This course provides an introduction to general linear models and the concept of experimental designs. Topics include linear regression models, analysis of variance, mixed-effects models, generalized linear models (GLMs) for binary and multinomial responses and log-linear models, basic models for survival analysis and regression models for censored survival data, and model assessment, validation, and prediction. Core concepts are mastered through the application of statistical methods and the analysis of practical research problems encountered by program faculty and demonstrated in practicum experiences in concert with BIOSTAT 706. Computational examples and exercises will use the SAS and R packages.
Prerequisite(s): BIOSTAT 702 or its equivalent. Corequisite(s): BIOSTAT 704, BIOSTAT 706, BIOSTAT 722/821.
Credits: 3

This course revisits the topics covered in BIOSTAT 703 in the context of high-throughput, high-dimensional studies such as genomics and transcriptomics. The course will be based on readings from both the textbook and research papers. Students will learn the biology and technology underlying the generation of “big data,” and the computational and statistical challenges associated with the analysis of such data sets. As with BIOSTAT 703, there will be a strong emphasis on the development of communication skills via written and oral presentations.
Prerequisite(s): BIOSTAT 703. Corequisite(s): BIOSTAT 704, BIOSTAT 705.
Credits: 3

This course surveys a number of techniques for high-dimensional data analysis useful for data mining, machine learning, and genomic applications, among others. Topics include principal and independent component analysis, multidimensional scaling, tree-based classifiers, clustering techniques, support vector machines and networks, and techniques for model validation. Core concepts are mastered through the analysis and interpretation of several real high-dimensional genomics datasets.
Prerequisite(s): BIOSTAT 701, 702, 704, 705, and 721 or 722/821 or their equivalents, or permission of the director of graduate studies.
Credits: 3

Topics include: history/background and process for clinical trials, key concepts for good statistical practice (GSP)/good clinical practice (GCP), regulatory requirements for pharmaceutical/clinical development, basic considerations for clinical trials, designs for clinical trials, classification of clinical trials, power analysis for sample size calculation, statistical analysis for efficacy evaluation, statistical analysis for safety assessment, implementation of a clinical protocol, statistical analysis plans, data safety monitoring, adaptive design methods in clinical trials (general concepts, group sequential design, dose-finding design, and phase I/II or phase II/III seamless design), and controversial issues in clinical trials.
Prerequisite(s): BIOSTAT 701, 702, 704, 705, and 721 or 722/821 or their equivalents, or permission of the director of graduate studies.
Credits: 3

Methods for causal inference, including confounding and selection bias in observational or quasi-experimental research designs, propensity score methodology, instrumental variables, and methods for non-compliance in randomized clinical trials.
Prerequisite(s): BIOSTAT 701, 702, 704, 705, and 721 or 722/821 or their equivalents, or permission of the director of graduate studies.
Credits: 3

Topics from current and classical methods for assessing familiality and heritability, linkage analysis of Mendelian and complex traits, family-based and population-based association studies, genetic heterogeneity, epistasis, and gene-environment interactions. Computational methods and applications in current research areas. The course will include a brief overview of genetic data, terminology, and essential population genetic results. Topics will include sampling designs in human genetics, gene frequency estimation, segregation analysis, linkage analysis, tests of association, and detection of errors in genetic data.
Prerequisite(s): BIOSTAT 701, 702, 704, 705, and 721 or 722/821 or their equivalents, or permission of the director of graduate studies.
Credits: 3

Introduction to concepts and techniques used in the analysis of time-to-event data, including censoring, hazard rates, estimation of survival curves, regression techniques, and applications to clinical trials. Interval censoring, informative censoring, competing risks, multiple events and multiple endpoints, time-dependent covariates; nonparametric and semiparametric methods.
Prerequisite(s): BIOSTAT 701, 702, 704, 705, and 721 or 722/821 or their equivalents, or permission of the director of graduate studies.
Credits: 3

Topics in categorical modeling and data analysis/contingency tables; measures of association and testing; logistic regression; log-linear models; computational methods including iterative proportional fitting; models for sparse data; Poisson regression; models for ordinal categorical data, and longitudinal analysis. 
Prerequisite(s): BIOSTAT 701, 702, 704, 705, and 721 or 722/821 or their equivalents, or permission of the director of graduate studies. 
Credits: 3

Topics include linear and nonlinear mixed models; generalized estimating equations; subject-specific versus population-average interpretation; and hierarchical models.
Prerequisite(s): BIOSTAT 701, 702, 704, 705, and 721 or 722/821 or their equivalents, or permission of the director of graduate studies.
Credits: 3

The class introduces the exponential family of distributions and link functions, and their use in generalizing standard linear regression to accommodate various outcome types. The theoretical framework will be presented, and detailed practical analyses will be performed as well, including logistic regression and Poisson regression with extensions. The majority of the course will deal with the independent-observations framework. However, there will be substantial discussion of longitudinal/clustered data, where correlations within clusters are expected. To deal with such data, generalized estimating equations and generalized linear mixed models will be introduced. An introduction to Bayesian analysis will be presented, time permitting.
Prerequisite(s): BIOSTAT 701, 702, 704, 705, and 721 or 722/821 or their equivalents, or permission of the director of graduate studies.
Credits: 3

Completed during a student’s final year of study, the master’s project is performed under the direction of a faculty mentor and is intended to demonstrate general mastery of biostatistical practice.
Prerequisite(s): BIOSTAT 701 through BIOSTAT 706.
Credits: 3 in Fall Semester and 3 in Spring Semester

This class is an introduction to programming in R, targeted at statistics majors with minimal programming knowledge. It will give students the skills to grasp how statistical software works, tweak it to suit their needs, recombine existing pieces of code, and, when needed, create their own programs. Students will learn the core ideas of programming (functions, objects, data structures, input and output, debugging, and logical design) through writing code to assist in numerical and graphical statistical analyses. Students will learn how to write maintainable code and how to test code for correctness. They will then learn how to set up stochastic simulations and how to work with and filter large data sets. Since code is also an important form of communication among scientists, students will learn how to comment and organize code to achieve reproducibility. Programming techniques and their application will be closely connected with the methods and examples presented in the corequisite course.
The primary programming package used in this course will be R.
Prerequisite(s): None; familiarity with linear algebra is helpful. Corequisite(s): BIOSTAT 702.
Credits: 3

This class is an introduction to programming in SAS, targeted at statistics majors with minimal programming knowledge. It will give students the skills to grasp how statistical software works, tweak it to suit their needs, recombine existing pieces of code, and, when needed, create their own programs. Students will learn the core ideas of programming (data step, procedures, macros, ODS, input and output, debugging, and logical design) through writing code to assist in numerical and graphical statistical analyses. Students will learn how to write maintainable code and how to test code for correctness. They will then learn how to set up stochastic simulations and how to work with and filter large data sets. Since code is also an important form of communication among scientists, students will learn how to comment and organize code to achieve reproducibility. Programming techniques and their application will be closely connected with the methods and examples presented in the corequisite course. The primary programming package used in this course will be SAS.
Prerequisite(s): None; familiarity with linear algebra is helpful. Corequisite(s): BIOSTAT 705.
Credits: 3

Independent Study is a semester-long course focused on mentored research in the practice of biostatistics. Students work with an assigned mentor. This course is only open to students by permission of the director of graduate studies.
Credits: 1, 2, or 3

Continuation is a semester-based, noncredit-bearing enrollment status used when a student is continuing scholarly activities with the same mentor. This course is only open to students by permission of the director of graduate studies.
Credits: 0

The purpose of this course is to give the student a holistic view of career choices and development and the tools they will need to succeed as professionals in the world of work. The fall semester will focus on resume development, creating a professional presence, networking techniques, what American employers expect in the workplace, creating and maintaining a professional digital presence, and learning how to conduct and succeed at informational interviews. Practicums in this semester include an informational interviewing and networking practicum with invited guests. Students participate in a professional “etiquette dinner” and a “dress for success” module, as well as an employer panel.
Corequisite(s): BIOSTAT 701 through BIOSTAT 703.
Credit: 1

The purpose of this course is to further develop the student’s job-seeking ability and the practical aspects of a job/internship search or of interviewing for a PhD program. The goal is to learn these skills once and use them for a lifetime. Modules include written and oral communication skills, interviewing with videotaped practice and review, negotiating techniques, potential career choices in the biostatistics marketplace, and working on a team. This semester includes writing and interviewing practicums and a panel of relevant industry speakers. Students will leave this course with the knowledge to manage their careers now and in the future.
Prerequisite: BIOSTAT 801.
Credit: 1

A data scientist needs to master several different tools to obtain, process, analyze, visualize, and interpret large biomedical data sets such as electronic health records, medical images, and genomic sequences. It is also critical that the data scientist master the best practices associated with using these tools, so that the results are robust and reproducible. The course covers foundational tools that will allow students to assemble a data science toolkit, including the Unix shell, text editors, regular expressions, relational and NoSQL databases, and the Python programming language for data munging, visualization, and machine learning. Best practices that students will learn include the Findable, Accessible, Interoperable, and Reusable (FAIR) practices for data stewardship, as well as reproducible analysis with literate programming, version control, and containerization.
Prerequisite: BIOSTAT 721 and permission of the director of graduate studies.
Credits: 3

This course will build on the foundation laid in Software Tools for Data Science. The course will explore the flow of a typical data science project, from importing, cleaning, transforming, and visualizing datasets to modeling and communicating results, within the context of R programming. While the course will include best practices, syntax, and idioms specific to R, the focus will be on the process of conducting analysis in a reproducible fashion, writing readable, well-documented code, and creating a coherent presentation of results.
Prerequisite: BIOSTAT 722 or BIOSTAT 821 or permission of the director of graduate studies.
Credits: 3

This course describes the challenges faced by analysts as large data sets grow in importance, and the strategies that have been developed in response to these challenges. The core topics are how to manage data and how to make computation scalable. The data management module covers guidelines for working with open data, and the concepts and practical skills for working with in-memory, relational, and NoSQL databases. The scalable computing module focuses on asynchronous, concurrent, parallel, and distributed computing, as well as the construction of effective workflows following DevOps practices. Applications to the analysis of structured, semi-structured, and unstructured data, especially from biomedical contexts, will be interleaved into the course. The course examples are primarily in Python, and fluency in Python is assumed.
Prerequisite(s): BIOSTAT 821 or permission of the director of graduate studies.
Credits: 3

This course will highlight how biomedical data science blends the field of biostatistics with the field of computer science through the introduction of 3 to 5 case studies. Students will be introduced to analytic programs typically encountered in biomedical data science and will implement the data science and statistical skills introduced in their previous course work.
Prerequisite(s): BIOSTAT 707, 821, 822, and 823 or permission of the director of graduate studies.
Credits: 3

Introduction to linear models and linear inference from the coordinate-free viewpoint. Topics: identifiability and estimability, key properties of and results for finite-dimensional vector spaces, linear transformations, self-adjoint transformations, spectral theorem, properties and geometry of orthogonal projectors, Cochran’s theorem, estimation and inference for normal models, distributional properties of quadratic forms, minimum variance linear unbiased estimation, Gauss-Markov theorem and estimation, calculus of differentials, analysis of variance and covariance.
Prerequisite(s): Biostatistics 702, 704, 705, real analysis, and linear algebra, or consent of the instructor and director of graduate studies.
Credits: 3

Introduction to decision theory and optimality criteria, sufficiency, methods for point estimation, and the methods and theory of confidence intervals and hypothesis testing.
Prerequisite: BIOSTAT 704 or equivalent, consent of the instructor, and permission of the director of graduate studies.
Credits: 3

The theory of M- and Z-estimators and applications. Semiparametric models, geometry of efficient score functions and efficient influence functions, construction of semiparametric efficient estimators. Introduction to the bootstrap: consistency, inconsistency and remedy, correction for bias, and double bootstrap. U-statistics and rank and permutation tests.
Prerequisite: STA 711 and BIOSTAT 906, or permission of the director of graduate studies.
Credits: 3

Introduction to probabilistic graphical models and structured prediction, with applications in genetics and genomics. Hidden Markov Models, conditional random fields, stochastic grammars, Bayesian hierarchical models, neural networks, and approaches to integrative modeling. Algorithms for exact and approximate inference. Applications in DNA/RNA analysis, phylogenetics, sequence alignment, gene expression, allelic phasing and imputation, genome/epigenome annotation, and gene regulation.
Prerequisite: Permission of the director of graduate studies.
Credits: 3

The goal of this course is to provide motivated Ph.D. and master’s students with background knowledge of high-dimensional statistics/machine learning for their research, especially in their methodology and theory development. Discussions cover theory, methodology, and applications. Selected topics in this course include the basics of high-dimensional statistics, matrix and tensor modeling, concentration inequalities, nonconvex optimization, and applications in genomics and biomedical informatics.
Prerequisite: Knowledge of probability, inference, and basic algebra is required.
Credits: 3

Topology of R^n, continuous functions, uniform convergence, compactness, infinite series, and the theory of differentiation and integration. Not open to students who have had Mathematics 431.
Prerequisite: Mathematics 221.

Algebraic and topological structure of the real number system; rigorous development of one-variable calculus including continuous, differentiable, and Riemann integrable functions and the Fundamental Theorem of Calculus; uniform convergence of a sequence of functions; contributions of Newton, Leibniz, Cauchy, Riemann, and Weierstrass. An assignment will ask the student to relate this course to their research.