MSCI Biostatistics II Syllabus


Key:

Hn chapter n in Regression Modeling Strategies, second edition, by Harrell
Bn section n in Biostatistics for Biomedical Research notes by Harrell
An section n in The Analysis of Biological Data by Whitlock and Schluter
Ln lecture n

General Introduction

Simple and Multiple Regression Models (B10, A17-18, L1)

  1. Background (B10 pp. 1-2)
  2. Stratification vs. matching vs. regression (B10.1)
  3. Purposes of statistical modeling: prediction, estimation & hypothesis testing (B10.2)
  4. Advantages of modeling (B10.3)
  5. Nonparametric regression (B10.4)
  6. Simple linear regression (B10.5.1-10.5.8, A17)
  7. Proper transformations and percentiling (B10.6, L2)
  8. Multiple linear regression (B10.7, A18)
  9. Multiple regression with a binary predictor (B10.8)
  10. Correlation coefficient revisited (B8.4, B8.5.2, B10.9)
  11. Using regression for ANOVA (B10.10-B10.10.3)
  12. Two-way ANOVA (B10.10.4)
  13. Heterogeneity of treatment effect (differential treatment effect; interactions) (B10.10.6, B13.6.1)

Introduction to Regression Modeling Strategies (L7, H1)

  1. Review of hypothesis testing vs. estimation vs. prediction
  2. Examples of multivariable prediction problems
  3. Study planning considerations
  4. Choice of model

General Methods for Multivariable Models (H2)

  1. Notation for general regression models
  2. Model formulations
  3. Interpreting model parameters
    1. nominal predictors
    2. interactions
  4. Categorization of continuous variables is not a solution to nonlinearity; demonstration: getRs('catgNoise.s')
  5. Relaxing linearity assumption for continuous predictors (L8)
    1. nonparametric smoothing
    2. simple nonlinear terms
    3. splines for estimating shape of regression function and determining predictor transformations
    4. cubic spline functions
    5. restricted cubic splines
    6. advantages of splines over other methods such as nonparametric regression
  6. Recursive partitioning and tree models in a nutshell
  7. Tests of association
  8. Assessment of model fit
    1. regression assumptions
    2. modeling and testing interactions

Missing Data (H3, L9)

  1. Types of missing data
  2. Prelude to modeling
  3. Problems with alternatives to imputation
  4. Strategies for developing imputations
  5. Multiple imputation

Multivariable Modeling Strategies (H4 omitting 4.2, 4.7.3-4, 4.7.7)

  1. Spending d.f.
  2. Pre-specification of predictor complexity
  3. Variable selection
  4. Overfitting and number of predictors
  5. Shrinkage
  6. Collinearity
  7. Data reduction
  8. Overly influential observations
  9. Comparing two models - see also this
  10. Overall modeling strategies

Bootstrap, Validating and Describing the Model (H5, L14)

  1. Describing the fitted model
  2. Bootstrap
  3. Model validation
  4. Internal vs. external model validation (B10.11)
  5. Bootstrapping ranks of predictors
  6. How to break bad habits

Case Study in Longitudinal Modeling (B15, H7)

Binary Logistic Model (H10, L15)

  1. Model
  2. Odds ratios
  3. Student presentations
  4. Special residual plots
  5. Applications of general methods
  6. Graphically presenting the model
  7. Overview of Bayesian logistic model (B6.10.3)
  8. Case study 1 (H11)
  9. Case study 2 (H12)

Risk-Based Diagnostic Assessment (B19)

Analysis of Covariance in Randomized Studies (B13-13.7)

See also blog article 1, blog article 2

Proportional Odds Ordinal Logistic Models (B7.6, B4.1.2, B5.12.4-.5, H13.1-13.3)

  1. Model
  2. Odds ratios
  3. Applications of general methods
  4. Example for an ordinal clinical outcome
  5. Power (B7.8.3-.6)

Brief Introduction to Survival Analysis

  1. Basics (H17.1-17.3)
  2. Cox proportional hazards regression model (H20.1.1-20.1.2)
  3. Case study in parametric survival modeling (notes Chapter 14)
  4. Case study in Cox survival modeling (notes Chapter 15)

Optional: Challenges of High-Dimensional Data Analysis (B19, A pp. 456-458)