Regression Modeling Strategies

Course Overview

All standard regression models have assumptions that must be verified for the model to have power to test hypotheses and for it to be able to predict accurately. Of the principal assumptions (linearity, additivity, distributional), this course will emphasize methods for assessing and satisfying the first two. For the last, emphasis is placed on semiparametric ordinal regression models that do not assume a distribution.

Practical but powerful tools are presented for validating model assumptions and presenting model results. The course provides methods for estimating the shape of the relationship between predictors and response by augmenting the design matrix with restricted cubic splines, a widely applicable technique.
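As a rough illustration of what augmenting the design matrix with restricted cubic splines means, the sketch below expands one predictor into a linear column plus k - 2 nonlinear columns, using the common truncated power form of the basis. It is written in plain Python for self-containment (the course itself uses R), and the function names and knot locations are hypothetical choices made for this example, not part of the course materials.

```python
def rcs_basis(x, knots):
    """Return the restricted cubic spline basis row for a scalar x.

    knots: knot locations t_1 < ... < t_k (k >= 3).
    The row holds k - 1 values: x itself plus k - 2 nonlinear terms.
    """
    t = sorted(knots)
    k = len(t)
    if k < 3:
        raise ValueError("need at least 3 knots")

    def cube_plus(u):
        # Truncated cubic: u^3 when u > 0, otherwise 0
        return u ** 3 if u > 0 else 0.0

    gap = t[-1] - t[-2]
    scale = (t[-1] - t[0]) ** 2  # puts nonlinear terms on the scale of x
    row = [float(x)]
    for j in range(k - 2):
        # The two correction terms cancel the cubic and quadratic parts
        # beyond the last knot, so the fit is linear in both tails.
        term = (cube_plus(x - t[j])
                - cube_plus(x - t[-2]) * (t[-1] - t[j]) / gap
                + cube_plus(x - t[-1]) * (t[-2] - t[j]) / gap)
        row.append(term / scale)
    return row


def augment_design_matrix(xs, knots):
    """Replace a single predictor column by its spline basis columns."""
    return [rcs_basis(x, knots) for x in xs]


knots = [1.0, 3.0, 5.0, 7.0]  # hypothetical knot locations
X = augment_design_matrix([0.0, 2.0, 4.0, 6.0, 8.0, 9.0, 10.0], knots)
```

With four knots each value of the predictor becomes three columns. An ordinary regression fit on those columns can then estimate a smooth nonlinear relationship that is constrained to be linear below the first knot and above the last.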

Even when assumptions are satisfied, overfitting can ruin a model’s predictive ability for future observations. Methods for data reduction will be introduced to deal with the common case where the number of potential predictors is large in comparison with the number of observations. Methods of model validation (bootstrap and cross-validation) will be covered, as will auxiliary topics such as modeling interaction surfaces, variable selection, overly influential observations, collinearity, predictive accuracy, variable importance, shrinkage, model interpretation, and chunk tests. A brief introduction to the rms package in R for handling these problems will also be given. The course also introduces the Bayesian approach to modeling.
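To make the idea of bootstrap validation concrete, here is a minimal sketch of an optimism-corrected R-squared for a one-predictor least-squares fit. It is plain Python on simulated data and is only an illustration under those assumptions; the helper names, sample size, and number of resamples are hypothetical, and it is not the rms implementation.

```python
import random


def fit_line(xs, ys):
    """Least-squares intercept and slope for one predictor."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return my - slope * mx, slope


def r_squared(coef, xs, ys):
    """R^2 of a fixed fitted line evaluated on (xs, ys)."""
    a, b = coef
    my = sum(ys) / len(ys)
    sse = sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))
    sst = sum((y - my) ** 2 for y in ys)
    return 1.0 - sse / sst


def optimism_corrected_r2(xs, ys, n_boot=200, seed=1):
    """Return (apparent R^2, bootstrap optimism-corrected R^2)."""
    rng = random.Random(seed)
    n = len(xs)
    apparent = r_squared(fit_line(xs, ys), xs, ys)
    optimism = 0.0
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]  # resample with replacement
        bx = [xs[i] for i in idx]
        by = [ys[i] for i in idx]
        coef = fit_line(bx, by)  # refit on the bootstrap sample
        # Optimism: performance on the data used for fitting minus
        # performance of the same fit on the original sample.
        optimism += r_squared(coef, bx, by) - r_squared(coef, xs, ys)
    return apparent, apparent - optimism / n_boot


# Simulated data (hypothetical): y depends linearly on x plus noise
sim = random.Random(0)
xs = [sim.uniform(0, 10) for _ in range(30)]
ys = [2 + 0.5 * x + sim.gauss(0, 2) for x in xs]
apparent, corrected = optimism_corrected_r2(xs, ys)
```

The apparent value is computed on the same data used to fit the model, while the corrected value subtracts the average optimism seen across resamples. With many candidate predictors and few observations, the gap between the two typically grows.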

The methods covered will apply to almost any regression model, including:

Statistical models will be contrasted with machine learning so that the student can make an informed choice of predictive tools.

The 4-day course also has a session introducing causal inference with special attention to how causal inference should inform model specification.

Course Outline

Numbers in brackets refer to section numbers in the course notes.

Day 1

Session 1: Introduction 9:00-10:45

Session 2: General Aspects of Fitting Regression Models 11:00-12:00, 1:00-2:30

Session 3: Multivariable Modeling Strategies 2:45-4:00


Day 2

Session 4: Describing Model Fits and Model Validation 9:00-10:30

Session 5: Binary Logistic Regression 10:45-12:00

Session 6: Ordinal Logistic Regression 12:45-1:50

Note: Order for May 2026 was reversed for Sessions 7-8

Session 7: Regression Models for Continuous Y 2:00-2:50

Session 8: Modeling Longitudinal Responses using Generalized Least Squares 3:00-4:00


Day 3

Session 9: Causal Models for Variable Selection 9:00-10:15

Session 10: Semiparametric Ordinal Longitudinal Models 10:25-12:00

Session 11: Bayesian Modeling 1:00-2:30

Session 12: Parametric Survival Models 2:40-4:00


Day 4

Session 13: Cox Proportional Hazards Regression Model 9:00-10:20

Session 14: Ordinal Semiparametric Regression for Survival Analysis 10:30-12:00

Session 15: General Likelihood Ratio Test and Profile Confidence Limits 1:00-1:45

Session 16: Wrap-up: RMS Summary and Discussion 1:55-4:00

Target Audience

Statisticians and persons from other quantitative disciplines who are interested in multivariable regression analysis of univariate responses, in developing, validating, and graphically describing multivariable predictive models, and in covariable adjustment in clinical trials. The course will be of particular interest to:

A good command of ordinary multiple regression is a prerequisite. The one-day pre-RMS provides this prerequisite.

Learning Outcomes

Students will:

Instructional Methods

Extensive and tested handouts will be given to students. The course will be informal enough for students to be able to ask questions throughout the day. The style will be a mixture of lecture and presentation of moderately comprehensive case studies. Handouts make heavy use of graphics to facilitate learning.

The presentation and handouts show output from R functions, but software use is not covered in detail in the course. Students who are interested in later using free R software to run examples presented in the case studies may do so by installing the rms package available at www.r-project.org.

Presenters

Prof. Frank E. Harrell Jr.

Dr. Harrell is Professor of Biostatistics and Founding Chair of the Department of Biostatistics at Vanderbilt University School of Medicine.

He is the author of the book Regression Modeling Strategies, Second Edition (Springer, 2015) and teaches courses in biostatistical modeling. He is a Fellow of the American Statistical Association and received the ASA’s W. J. Dixon Award for Excellence in Statistical Consulting in 2014. He is active on BlueSky and Twitter under @f2harrell and leads datamethods.org for in-depth discussion of data-related methodologies.

Drew G. Levy PhD

Dr. Levy has a PhD in Epidemiology from the University of Washington (Seattle) and heads Good Science, Inc. He is the moderator for the 4-day course and the instructor for the causal inference part of the course.

Textbook

Harrell, F.E. (2015). Regression Modeling Strategies with Applications to Linear Models, Logistic and Ordinal Regression, and Survival Analysis, Second Edition. New York: Springer.

Handouts

Handouts are here.

Software