Regression Modeling Strategies

With Applications to Linear Models, Logistic and Ordinal Regression, and Survival Analysis

Book Information and Updates

Second Edition

  • Available for ordering here
  • Changes from the first edition
  • ISBN 978-3-319-19424-0
  • Errata
  • R code for all examples in the book’s 2nd edition. Numbers in file names are chapter numbers.
  • Alternate R Code
  • Reviews
    • Steve Miller’s blog
    • Ewout Steyerberg’s review in Biometrics vol. 72 no. 3, September 2016, p. 1006-7. doi:10.1111/biom.12569
    • James E. Helmreich in Journal of Statistical Software vol. 70, 2016. here

First Edition

  • REGRESSION MODELING STRATEGIES with Applications to Linear Models, Logistic Regression, and Survival Analysis by FE Harrell. The book was published June 5, 2001 by Springer New York, ISBN 0-387-95232-2. Click here to see the text from the book’s back cover. Click here to see the preface and table of contents for the book manuscript in .pdf format. Click here to obtain a partial index to the book in .pdf format, and go here to see a sample chapter from the book (Note: This material is Copyright 2001-2004 Springer-Verlag and may not be reproduced).
  • Changes and additions for the second edition
  • Reviews of the book:
    • Statistical Methods in Biomedical Research
    • Biometrics 58:477, June 2002
    • Bulletin of the Swiss Statistical Society (appearing also in Statistical Methods in Medical Research)
    • International Journal of Epidemiology 31(3):699-700, June 2002. Note: This otherwise excellent review states that the book recommends selecting variables to include in the model on the basis of their frequency of selection by a bootstrap procedure. This is definitely not the case.
    • Journal of the American Statistical Association 98:257-258, March 2003
    • Medical Decision Making, 23(2):182-183, April 2003
    • Technometrics 45:170, May 2003
    • Statistics in Medicine 22:2531-2532, 15 Aug 2003
    • Springer
    • Chemistry
  • Errata for the first and second printings. The book had its third printing in December 2002 and its fourth printing in December 2003. The sixth printing was in December 2005.
  • New versions of the R code that make examples in the book that rely on the Design package work with the rms package

Short Courses

4-day Short Course

  • Click here
  • Click here for a detailed course description
  • To be added: description of the pre-seminar workshop on R and RStudio
  • Contact Frank Harrell for information about the course

Full Semester Course

  • First offered in the Vanderbilt University Department of Biostatistics graduate program in Spring 2013 (Jan-Apr). It is taught yearly by Prof. Harrell.

Materials

  • See full semester course for up-to-date material
  • Survey of new approaches to regression and tree-based modeling (referred to in Chapter 4 of the second edition)
  • Syllabus for a 1-day short course “Modern Approaches to Predictive Modeling and Covariable Adjustment in Randomized Clinical Trials”

Discussion Board

  • datamethods and here
  • stats.stackexchange.com
  • To be added from rmsdisc: An older discussion board for readers and the author to discuss questions, issues, controversies, and new research related to the text

Datasets

Quizzes

  • Quizzes (with answer sheets) on concepts in the text and on prerequisites are available to instructors by e-mailing the author

Software

  • To be added from BioMod#FittingDemos: Interactive scripts demonstrating various curve fitting criteria and showing the flexibility of restricted cubic splines (see also this)
  • Warren Sarle’s SAS macros and examples for bootstrapping and jackknifing. See Warren’s cautionary note on bootstrap confidence intervals, with a good example related to R^2 in multiple regression. The example shows that when the estimate of R^2 is badly biased, bootstrap confidence limits are badly displaced to the right. Included in the notes is the standard error of R^2 and information about adjusted R^2.
  • The penalized package in R
  • function for binary logistic model external validation
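The note on R^2 cited above concerns upward bias in the apparent R^2. As a rough illustration only (not taken from Warren Sarle’s notes or from the book), the Python sketch below computes the usual adjusted R^2 and the expected apparent R^2 when no predictor is related to the response. It assumes ordinary least squares with an intercept and normal errors; the function names are hypothetical.

```python
def adjusted_r2(r2, n, p):
    """Adjusted R^2 for a model with p predictors plus an intercept, fit to n observations."""
    return 1.0 - (1.0 - r2) * (n - 1) / (n - p - 1)


def expected_null_r2(n, p):
    """Expected apparent R^2 when none of the p predictors is related to the response.

    Assumption: ordinary least squares with an intercept and normal errors.
    """
    return p / (n - 1)


# Illustrative values: 10 pure-noise predictors and 50 observations.
n, p = 50, 10
null_r2 = expected_null_r2(n, p)
print("expected apparent R^2 under no association:", null_r2)
print("adjusted R^2 at that value:", adjusted_r2(null_r2, n, p))
```

With these illustrative values the apparent R^2 is expected to be about 0.2 even though the predictors carry no information, and the adjustment pulls that value back to essentially zero.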

Studies of Methods Used in the Text

  • Recent simulation experiments conducted by Carl Moons and Frank Harrell indicate that the performance of transcan for multiple imputation is about halfway between single conditional mean imputation and MICE (see below), consistent with the findings from Faris PD, Ghali WA, et al (2002): Multiple imputation versus data enhancement for dealing with missing data in observational health care outcome analyses. J Clin Epidemiology 55:184-191. Suboptimal performance of transcan for multiple imputation is probably because transcan fits the flexible additive imputation models and then draws all multiple imputations from the fitted models. A new function in the Hmisc package, aregImpute, uses the bootstrap to re-fit additive nonparametric imputation models for each of the multiple imputations. Results for aregImpute are very promising (see below).
  • Validation of binary logistic models
    • Simulation studies
    • Steyerberg EW, Harrell FE, Borsboom GJJM, Eijkemans MJC, Vergouwe Y, Habbema JDF (2001): Internal validation of predictive models: Efficiency of some procedures for logistic regression analysis. Journal of Clinical Epidemiology 54:774-781.
    • Steyerberg EW et al (2003): Internal and external validation of predictive models: A simulation study of bias and precision in small samples. Journal of Clinical Epidemiology 56:441-447.
    • Vergouwe Y, Steyerberg EW, Eijkemans MJC, Habbema JDF (2005): Substantial effective sample sizes were required for external validation studies of predictive logistic regression models. Journal of Clinical Epidemiology 58:475-483.
  • Studying the degrees of freedom spending strategy that uses generalized Spearman rho^2, in terms of preserving type I error and sigma^2 in ordinary least squares
  • Prediction Error in Cox Models Varying Number of Predictors
  • Shrinkage and problems with stepwise variable selection: See Steyerberg EW, Eijkemans MJC, Harrell FE, Habbema JDF (2001): Prognostic modeling with logistic regression analysis: In search of a sensible strategy in small data sets. Medical Decision Making 21:45-56.
  • Model simplification and stepwise variable selection: See Ambler G, Brady AR, Royston P (2002): Simplifying a prognostic model: a simulation study based on clinical data. Statistics in Medicine 21:3803-3822. The authors studied the performance of the model simplification strategy discussed in the book, and compared it with more traditional variable selection methods, finding that standard variable selection can work well when there is a large proportion of irrelevant variables.
  • New case study on penalized maximum likelihood estimation for binary logistic modeling: Moons KGM, Donders ART, Steyerberg EW, Harrell FE (2004): Penalized maximum likelihood estimation to directly adjust diagnostic and prognostic prediction models for overoptimism: a clinical example. J Clinical Epidemiology 57:1262-1270.
  • Choosing the penalty
  • Interactive demonstrations of curve fitting, effects of categorization, etc.
  • Peter Ellis’ article about overanalysis of time series data
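The first item in the list above contrasts drawing every multiple imputation from one fitted model with re-fitting the imputation model on a bootstrap sample for each imputation. The Python sketch below illustrates only that contrast. It does not reproduce transcan or aregImpute: it assumes a straight-line imputation model with normal residuals, and every function name in it is hypothetical.

```python
import random


def fit_line(xs, ys):
    """Least-squares intercept, slope, and residual SD for y ~ x."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = my - slope * mx
    rss = sum((y - intercept - slope * x) ** 2 for x, y in zip(xs, ys))
    return intercept, slope, (rss / (n - 2)) ** 0.5


def draw(fit, x_missing, rng):
    """Draw one set of imputed y values from a fitted line plus normal noise."""
    a, b, sd = fit
    return [a + b * x + rng.gauss(0, sd) for x in x_missing]


def impute_single_fit(xs, ys, x_missing, m, rng):
    """Fit once, then draw all m imputations from that one fitted model."""
    fit = fit_line(xs, ys)
    return [fit] * m, [draw(fit, x_missing, rng) for _ in range(m)]


def impute_bootstrap_refit(xs, ys, x_missing, m, rng):
    """Re-fit the model on a fresh bootstrap sample for each of the m imputations."""
    n = len(xs)
    fits, draws = [], []
    for _ in range(m):
        idx = [rng.randrange(n) for _ in range(n)]  # resample cases with replacement
        fit = fit_line([xs[i] for i in idx], [ys[i] for i in idx])
        fits.append(fit)
        draws.append(draw(fit, x_missing, rng))
    return fits, draws


# Simulated complete cases (illustrative values only).
rng = random.Random(1)
xs = [rng.uniform(0, 10) for _ in range(40)]
ys = [2.0 + 0.5 * x + rng.gauss(0, 1) for x in xs]
x_missing = [1.0, 5.0, 9.0]  # predictor values for cases whose y is missing

single_fits, single_draws = impute_single_fit(xs, ys, x_missing, 5, rng)
boot_fits, boot_draws = impute_bootstrap_refit(xs, ys, x_missing, 5, rng)
print("distinct fitted models, single fit:", len(set(single_fits)))
print("distinct fitted models, bootstrap re-fit:", len(set(boot_fits)))
```

In the single-fit version the imputations differ only through residual noise, whereas the bootstrap version also varies the fitted coefficients from one imputation to the next, so uncertainty in the imputation model is carried into the imputed values.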

R Help

Multiple Imputation

  • Joseph Schafer’s Multiple imputation FAQ
  • Napier University’s imputation page
  • Online Multiple Imputation and R MICE Software by Stef van Buuren and Karin Oudshoorn
  • To subscribe to the Impute E-mail discussion group led by Juned Siddique of Northwestern University, click here.
  • A paper containing a good overview of multiple imputation and a comparison of some software packages is Horton NJ, Lipsitz SR, The American Statistician 55:244-254; 2001.
  • An excellent recent survey of missing data methods is Schafer, JL and Graham JW, Psychological Methods 7:147-177; 2002.
  • See also Analysis of biases in SPSS by Paul von Hippel, The American Statistician 58:160-164; 2004.
  • Notes from Tim Hesterberg on why the response variable must be used when doing multiple imputation. Tim’s notes include code to do several simulations illustrating his points.
  • Comparisons of aregImpute with other imputation algorithms
    • Moons KGM, Donders RART, Stijnen T, Harrell FE, J Clinical Epidemiology 59:1092-1101; 2006.
    • Horton NJ, Kleinman KP, The American Statistician 61:79-90; 2007.

General Statistical Information