R rms Package

Author

Frank Harrell

Published

January 14, 2025

Regression Modeling Strategies

News 2025-01-14

rms 7.0 is a milestone release of the package, now in its 34\(^{th}\) year, with greatly improved fitting functions and key new statistical analysis capabilities. Many of the computational improvements related to logistic and ordinal regression are detailed here. A list of all the major user-visible improvements follows.

  • Semiparametric ordinal regression modeling using the lrm and orm functions now has no limitations on the number of intercepts in the model, so ordinal models are now even more appropriate for continuous Y. For example, a model with 20 covariates fitted to N=300,000 observations of continuous Y (no ties), so that there are 299,999 intercepts in the model, fits in 2.5 seconds. This is achievable when a Newton-type fitting algorithm (Newton-Raphson or Levenberg-Marquardt) is used. The standard error of one of the parameter estimates may then be computed in 0.3s, thanks to the sparse-matrix methods described in the next point.
  • The lrm and orm functions now both optimally use sparse matrices, taking full advantage of the incredibly fast and comprehensive Matrix package, not only to compute final variances and covariances of parameter estimates, but also to do Newton-type parameter estimate updating while solving for maximum likelihood estimates. The information matrix is stored in a list of 3 submatrices that are as small as possible. The covariance matrix is not computed for the model fits but is computed on demand by the vcov function. Needed portions of the covariance matrix for the parameters (the inverse of the information matrix) can be computed in a fraction of a second even for a 300000 \(\times\) 300000 matrix.
  • rms has a new function infoMxop (information matrix operations) that facilitates inverting regular and sparse information matrices, obtaining parts of the inverse, and computing matrix products of the inverse and a user-specified matrix, for Newton updating, getting standard errors of predicted values, etc. infoMxop is used by vcov.lrm and vcov.orm.
  • Likelihood calculations for orm, as has always been the case for lrm, are all done in Fortran for speed. Link functions are now hard-coded in Fortran, so users can no longer specify customized link functions. The author is glad to add new link functions on demand.
  • orm has an option to use Levenberg-Marquardt optimization in addition to Newton-Raphson with step-halving.
  • lrm implements many different optimizers, including Hessian-free ones.
  • orm now supports weights and penalties.
  • The Mean, Quantile, and ExProb function generators now use fast sparse matrix operations with the \(\delta\) method to more quickly get variances of estimated means, quantiles, and exceedance probabilities from ordinal models.
  • bootcov: changed the way non-sampled ordinal Y levels are handled, now using linear interpolation/extrapolation of intercepts. But it may be better to use the new Hmisc package ordGroupBoot function to mildly bin Y so that no bootstrap sample will omit any distinct Y data values in the first place.
  • contrast.rms: implemented conf.type='profile' to compute likelihood profile confidence intervals for general contrasts, and corresponding likelihood ratio \(\chi^2\) tests.
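As a hedged sketch of how these pieces fit together (assuming the rms 7.0 API described above; exact defaults and argument names may differ), fitting a semiparametric model to continuous Y and extracting quantities on demand might look like:

```r
# Sketch under the rms 7.0 API; requires the rms package to be installed
require(rms)

set.seed(1)
n  <- 1000                      # small n for illustration; scales to N=300,000
x1 <- rnorm(n); x2 <- rnorm(n)
y  <- x1 + 0.5 * x2 + rnorm(n)  # continuous Y: one intercept per distinct value

f <- orm(y ~ x1 + x2)           # n - 1 intercepts, handled via sparse matrices

# The covariance matrix is not stored in the fit; vcov computes the needed
# portion of the inverse information matrix on demand (using infoMxop)
v <- vcov(f)

# Profile-likelihood confidence interval and LR test for a general contrast,
# using the new conf.type='profile' option in contrast.rms
contrast(f, list(x1 = 1), list(x1 = 0), conf.type = 'profile')
```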

Earlier Updates

rms 6.8-0 has a non-downward-compatible change to the orm function that improves how unique numeric values are determined for dependent variables. Previous versions could give different results on different hardware due to the behavior of the R unique function for floating point vectors. Now unique values are determined by the y.precision argument, which defaults to multiplying values by \(10^5\) before rounding. Details are in this report by Shawn Garbett of the Vanderbilt Department of Biostatistics.
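The default rounding scheme can be illustrated in base R (a sketch of the idea; orm's internal implementation may differ in detail):

```r
# Two values that differ only past the 5th decimal place collapse to a
# single distinct Y level under the default y.precision behavior
y <- c(0.1234567, 0.1234571, 0.2)
distinct <- unique(round(y * 1e5) / 1e5)
length(distinct)   # 2 distinct levels instead of 3
```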

Version 6.8-0 also has an important new function for relative explained variation, rexVar.

rms 6.7-0 appeared on CRAN 2023-05-08 and represents a major update. The most significant new feature is automatic computation of all likelihood ratio (LR) \(\chi^2\) chunk test statistics that can be inferred from the model design when the model is fitted using lrm, orm, psm, cph, or Glm. I’ve been meaning to do this for more than 10 years because LR tests are more accurate than the default anova.rms Wald tests. LR tests do not suffer from the Hauck-Donner effect, in which a predictor with an infinite regression coefficient drives the Wald \(\chi^2\) to zero because the standard error blows up.
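A minimal sketch of requesting LR chunk tests (assuming the test='LR' option of anova.rms added in this release; requires the rms package):

```r
# Sketch: LR chunk tests from anova.rms instead of the default Wald tests
require(rms)

set.seed(1)
n   <- 200
x   <- rnorm(n)
sex <- factor(sample(c('female', 'male'), n, replace = TRUE))
y   <- ifelse(plogis(x) > runif(n), 1, 0)

f <- lrm(y ~ rcs(x, 4) + sex)
anova(f, test = 'LR')   # all LR chunk tests inferable from the design
```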

An example of a full LR anova is here.

Also new is the implementation of LR tests when doing multiple imputation, using the method of Chan and Meng. This uses a new feature in Hmisc::fit.mult.impute where, besides testing on individual completed datasets, the log likelihood is also computed from a stacked dataset of all completed datasets. Specifying lrt=TRUE to fit.mult.impute will take the necessary actions to get LR tests with processMI, including setting the method argument to 'stack', which makes final regression coefficient estimates come from a single fit of the stacked dataset.

Note that Chan and Meng do not answer their email, so the method seems to have stagnated.

There are new rms functions or options relating to this:

  • LRupdate: update LR test-related stats after processMI is run (including pseudo \(R^2\) measures)
  • processMI.fit.mult.impute: added processing of anova result from fit.mult.impute(..., lrt=TRUE)
  • prmiInfo: print (or render as html) imputation parameters on the result of processMI(..., 'anova')
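The full multiple-imputation LR test workflow might be sketched as follows (function names are from the text above; argument details and call signatures may differ from the installed versions of rms and Hmisc):

```r
# Sketch of the multiple-imputation LR test workflow; requires rms and Hmisc
require(rms)
require(Hmisc)

set.seed(1)
d <- data.frame(x1 = rnorm(100), x2 = rnorm(100))
d$y <- ifelse(plogis(d$x1) > runif(100), 1, 0)
d$x2[sample(100, 15)] <- NA          # introduce missingness to impute

a <- aregImpute(~ y + x1 + x2, data = d, n.impute = 10)

# lrt=TRUE stacks the completed datasets so that log likelihoods (and final
# coefficients) come from a single fit of the stacked data
f <- fit.mult.impute(y ~ x1 + x2, lrm, a, data = d, lrt = TRUE)

an <- processMI(f, 'anova')   # combine LR tests via the Chan-Meng method
an <- LRupdate(f, an)         # update LR-based stats such as pseudo R^2
prmiInfo(an)                  # show imputation-adjustment parameters
```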

This new rms requires installing the latest Hmisc from CRAN.

Documentation | CRAN | GitHub | Online

Evolution

rms is an R package that is a replacement for the Design package. The package accompanies FE Harrell’s book Regression Modeling Strategies. It began in 1991 as the S-Plus Design package.

Bug Reports

Please use Issues on GitHub.