6 R Software

R allows interaction spline functions, wide variety of predictor parameterizations, wide variety of models, unifying model formula language, model validation by resampling.

R is comprehensive:

Easy to write R functions for new models $\rightarrow$ wide variety of modern regression models implemented (trees, nonparametric, ACE, AVAS, survival models for multiple events)
Designs can be generated for any model $\rightarrow$ all handle “class” var, interactions, nonlinear expansions
Single R objects (e.g., fit object) can be self-documenting $\rightarrow$ automatic hypothesis tests, predictions for new data
Superior graphics
Classes and generic functions

6.1 The `R` Modeling Language

R statistical modeling language:

Code

response ~ terms

y ~ age + sex            # age + sex main effects
y ~ age + sex + age:sex  # add second-order interaction
y ~ age*sex              # second-order interaction +
                         # all main effects
y ~ (age + sex + pressure)^2
                         # age+sex+pressure+age:sex+age:pressure...
y ~ (age + sex + pressure)^2 - sex:pressure
                         # all main effects and all 2nd order
                         # interactions except sex:pressure
y ~ (age + race)*sex     # age+race+sex+age:sex+race:sex
y ~ treatment*(age*race + age*sex) # no interact. with race,sex
sqrt(y) ~ sex*sqrt(age) + race
# functions, with dummy variables generated if
# race is an R factor (classification) variable
y ~ sex + poly(age,2)    # poly generates orthogonal polynomials
race.sex <- interaction(race,sex)
y ~ age + race.sex       # for when you want dummy variables for
                         # all combinations of the factors

The formula for a regression model is given to a modeling function, e.g.

Code

lrm(y ~ rcs(x,4))

is read “use a logistic regression model to model y as a function of x, representing x by a restricted cubic spline with 4 default knots”¹.

¹ lrm andrcs are in the rms package.

update function: re-fit model with changes in terms or data:

Code

f  <- lrm(y ~ rcs(x,4) + x2 + x3)
f2 <- update(f, subset=sex=="male")
f3 <- update(f, .~.-x2)         # remove x2 from model
f4 <- update(f, .~. + rcs(x5,5))# add rcs(x5,5) to model
f5 <- update(f, y2 ~ .)         # same terms, new response var.

6.2 User-Contributed Functions

R is high-level object-oriented language.
R runs on all platforms, and can be run on chromebooks using RStudio Server
Multitude of user-contributed functions freely available
International community of users

Some R functions:

See Venables and Ripley
Hierarchical clustering: hclust
Principal components: princomp, prcomp
Canonical correlation: cancor
Nonparametric transform-both-sides additive models:
ace, avas
Parametric transform-both-sides additive models:
areg, areg.boot (Hmisc package in R))
Rank correlation methods:
rcorr, hoeffd, spearman2 (Hmisc)
Variable clustering: varclus (Hmisc)
Single imputation: transcan (Hmisc)
Multiple imputation: aregImpute (Hmisc)
Restricted cubic splines:
rcspline.eval (Hmisc)
Re-state restricted spline in simpler form:
rcspline.restate (Hmisc)

6.3 The `rms` Package

Big Picture

rms package handles many popular standard models
It makes it easier to do the right thing
There are more extensible, general approaches, especially using Bayesian methods
- brms package
- rstanarm package
- rstanarm survival analysis system

FAQ

datadist function to compute predictor distribution summaries

Code

y ~ sex + lsp(age,c(20,30,40,50,60)) +
    sex %ia% lsp(age,c(20,30,40,50,60))

E.g. restrict age $\times$ cholesterol interaction to be of form $AF(B) + BG(A)$:

Code

y ~ lsp(age,30) + rcs(cholesterol,4) +
    lsp(age,30) %ia% rcs(cholesterol,4)

Special fitting functions to simplify procedures described in these notes:

`rms` Fitting Functions
Function	Purpose	Related `R` Functions
`ols`	Ordinary least squares linear model	`lm`
`lrm`	Binary and ordinal logistic regression model with options for penalize MLE, meant for discrete $Y$	`glm`
`orm`	Ordinal semi-parametric regression model for continuous $Y$ and several link functions, penalization, left, right, and interval censoring	`polr`,`lrm`
`psm`	Accelerated failure time parametric survival models	`survreg`
`cph`	Cox proportional hazards regression	`coxph`
`bj`	Buckley-James censored least squares model	`survreg`,`lm`
`Glm`	`rms` version of `glm`	`glm`
`Gls`	`rms` version of `gls`	`gls` (`nlme` package)
`Rq`	`rms` version of `rq`	`rq` (`quantreg` package)
`adapt_orm`	Use AIC to find the best link function and number of knots for a single continuous predictor using `orm`; a replacement for nonparametric smoothers that handles censoring	`orm`

`rms` Transformation Functions
Function	Purpose
`asis`	No post-transformation (seldom used explicitly)
`rcs`	Restricted cubic splines
`pol`	Polynomial using standard notation
`lsp`	Linear spline
`catg`	Categorical predictor (seldom)
`scored`	Ordinal categorical variables
`matrx`	Keep variables as group for `anova` and `fastbw`
`strat`	Non-modeled stratification factors (used for `cph` only)
`Ocens`	Used by `orm` to code left, right, interval censoring of $Y$

The transformation functions work also with regular R functions, e.g. when predict() is called the predicted values are computed by looking up the knot locations for rcs.

Below notice that there are three graphic models implemented for depicting the effects of predictors in the fitted model: lattice graphics, a ggplot method using the ggplot2 package (which has an option to convert the result to plotly), and a direct plotly method. plotly is used to create somewhat interactive graphics with drill-down capability, and the rms package takes advantage of this capability. plotly graphics are best used with RStudio Rmarkdown or Quarto html output. The ggplot and plotly (plotp; see later in this section) methods are recommended.

`rms` After-fit Functions
Function	Purpose	Related Functions
`print`	Print parameters and statistics of fit
`coef`	Fitted regression coefficients
`formula`	Formula used in the fit
`specs`	Detailed specifications of fit
`vcov`	Fetch covariance matrix
`logLik`	Fetch maximized log-likelihood
`AIC`	Fetch AIC with option to put on chi-square basis
`lrtest`	Likelihood ratio test for two nested models
`univarLR`	Compute all univariable LR $\chi^{2}$
`robcov`	Robust covariance matrix estimates
`bootcov`	Bootstrap covariance matrix estimates and bootstrap distribution of estimates
`pentrace`	Find optimum penalty factors by tracing effective AIC for a grid of penalties
`effective.df`	Print effective d.f. for each type of variable in model, for penalized fit or `pentrace` results
`summary`	Summary of effects of predictors
`plot.summary`	Plot continuously shaded confidence bars for results of `summary`
`anova`	Wald and LR tests of most meaningful hypotheses
`plot.anova`	Graphical depiction of anova
`contrast`	General contrasts, C.L., tests (both Wald- and likelihood-based)
`gendata`	Easily generate predictor combinations
`predict`	Obtain predicted values or design matrix
`Predict`	Obtain predicted values and confidence limits easily varying a subset of predictors and others set at default values
`plot.Predict`	Plot the result of `Predict` using `lattice`
`ggplot.Predict`	Plot the result of `Predict` using `ggplot2`
`plotp.Predict`	Plot the result of `Predict` using `plotly`
`fastbw`	Fast backward step-down variable selection	`step`
`residuals`	(or `resid`) Residuals, influence stats from fit
`sensuc`	Sensitivity analysis for unmeasured confounder
`which.influence`	Which observations are overly influential	`residuals`
`impactPO`	Assess the impact of the proportional odds assumption for `lrm`	`ordParallel`
`latex`	$\LaTeX$ representation of fitted model	`Function`
`Function`	`R` function analytic representation of $X\hat{\beta}$ from a fitted regression model	`latex`
`Hazard`	`R` function analytic representation of a fitted hazard function (for `psm`)
`Survival`	`R` function analytic representation of fitted survival function (for `psm`, `cph`)
`Quantile`	`R` function analytic representation of fitted function for quantiles of survival time (for `psm`, `cph`)
`Mean`	`R` function analytic representation of fitted function for mean survival time or for ordinal logistic
`nomogram`	Draws a nomogram for the fitted model	`latex`, `plot`
`survest`	Estimate survival probabilities (`psm`, `cph`)	`survfit`
`survplot`	Plot survival curves (`psm`, `cph`)	`plot.survfit`
`survplotp`	Plot survival curves with `plotly` features	`survplot`
`validate`	Validate indexes of model fit using resampling
`val.prob`	External validation of a probability model	`lrm`
`val.surv`	External validation of a survival model	`calibrate`
`calibrate`	Estimate calibration curve using resampling	`val.prob`
`vif`	Variance inflation factors for fitted model
`naresid`	Bring elements corresponding to missing data back into predictions and residuals
`naprint`	Print summary of missing values
`impute`	Impute missing values	`aregImpute`

The orm function has special after-fit functions:

`orm` After-fit Functions
Function	Purpose	Related Functions
`intCalibration`	Internal calibration, useful for assessing goodness-of-fit	`calibrate`
`ordParallel`	Check parallelism assumption, e.g., proportional odds	`impactPO`
`plotIntercepts`	Line plot of intercept estimates
`ordESS`	Effective sample size with special handling of censored observations
`Olinks`	Compute deviance of model under four link functions

`rmsb`: Bayesian Regression Modeling Strategies Package, Focusing on Semiparametric Univariate and Longitudinal Models
Function	Purpose
`blrm`	Bayesian binary and ordinal logistic model
`stackMI`	Bayesian posterior stacking for multiple imputation
`stanDx`	Stan diagnostics on fit
`stanDxplot`	Trace plots to check posterior sampling convergence
`PostF`	Creates R function for computing posterior probabilities
`plot.rmsb`	Plot posterior densities, intervals, point summaries
`compareBmods`	Compare two models using LOO-cv
`HPDint`	Compute highest posterior density interval
`distSym`	Compute meaure of symmetry of posterior distribution

An extensive overview of Bayesian capabilities of the rmsb package may be found at hbiostat.org/R/rmsb/blrm.html.

Global options prType and grType control printed and some graphical output, respectively as shown in example code below. The default is plain output and static graphics. If using plotly interactive graphics through ggplot or plotp or with anova or summary functions it is best to do so with RStudio html output or html notebooks. If using html output you must be producing an html document or notebook. When setting grType to use $\LaTeX$ or html it is highly recommended that you use the knitr package.

Example:

treat: categorical variable with levels "a","b","c"
num.diseases: ordinal variable, 0-4
age: continuous
Restricted cubic spline
cholesterol: continuous
(3 missings; use median)
log(cholesterol+10)
Allow treat $\times$ cholesterol interaction
Program to fit logistic model, test all effects in design, estimate effects (e.g. inter-quartile range odds ratios), plot estimated transformations

Code

require(rms)                     # make new functions available
options(prType='html')           # print, summary, anova, validate: html output
                                 # others: 'latex', 'plain'
options(grType='plotly')         # plotly graphics for ggplot, anova, summary
                                 # default is 'base' for static graphics
ddist <- datadist(cholesterol, treat, num.diseases, age)
# Could have used ddist <- datadist(data.frame.name)
options(datadist="ddist")        # defines data dist. to rms
cholesterol <- impute(cholesterol)
fit <- lrm(y ~ treat + scored(num.diseases) + rcs(age) +
                 log(cholesterol+10) + treat:log(cholesterol+10),
                     x=TRUE, y=TRUE)  # needed for robcov, anova test='LR'
fit    # outputs plain, LaTeX, or html markup
describe(y ~ treat + scored(num.diseases) + rcs(age))
# or use describe(formula(fit)) for all variables used in fit
# describe function (in Hmisc) gets simple statistics on variables
# fit <- robcov(fit)             # Would make all statistics that follow
                                 # use a robust covariance matrix
                                 # would need x=T, y=T in lrm()
specs(fit)                       # Describe the design characteristics
anova(fit)                       # Wald tests, plain, LaTex, or html
anova(fit, test='LR')            # Likelihood ratio tests
anova(fit, treat, cholesterol)   # Test these 2 by themselves
plot(anova(fit))                 # Summarize anova graphically
summary(fit)                     # Estimate effects using default ranges
                                 # prints plain, LaTeX, or html
plot(summary(fit))               # Graphical display of effects with C.I.
summary(fit, treat="b", age=60)  # Specify reference cell and adjustment val
summary(fit, age=c(50,70))       # Estimate effect of increasing age from
                                 # 50 to 70
summary(fit, age=c(50,60,70))    # Increase age from 50 to 70, adjust to
                                 # 60 when estimating effects of other
                                 # factors
# If had not defined datadist, would have to define ranges for all var.

# Estimate and test treatment (b-a) effect averaged over 3 cholesterols
contrast(fit, list(treat='b', cholesterol=c(150,200,250)),
              list(treat='a', cholesterol=c(150,200,250)),
         type='average')
# Add conf.type='profile' got get more accurate profile likilihood CIs
# as well as likelihood ratio tests for contrasts.
# See the help file for contrast.rms for several examples of
# how to obtain joint tests of multiple contrasts and how to get
# double differences (interaction contrasts)

p <- Predict(fit, age=seq(20,80,length=100), treat, conf.int=FALSE)
plot(p)                          # Plot relationship between age and log
# or ggplot(p), plotp(p)         # odds, separate curve for each treat,
                                 # no C.I.
plot(p, ~ age | treat)           # Same but 2 panels
ggplot(p, groups=FALSE)
bplot(Predict(fit, age, cholesterol, np=50))
                                 # 3-dimensional perspective plot for age,
                                 # cholesterol, and log odds using default
                                 # ranges for both variables
plot(Predict(fit, num.diseases, fun=function(x) 1/(1+exp(-x)), conf.int=.9),
     ylab="Prob")                # Plot estimated probabilities instead of
                                 # log odds (or use ggplot())
                                 # can also use plotp() for plotly
# Again, if no datadist were defined, would have to tell plot all limits
logit <- predict(fit, expand.grid(treat="b",num.dis=1:3,age=c(20,40,60),
                 cholesterol=seq(100,300,length=10)))
# Could also obtain list of predictor settings interactively}
logit <- predict(fit, gendata(fit, nobs=12))

# Since age doesn't interact with anything, we can quickly and
# interactively try various transformations of age, taking the spline
# function of age as the gold standard. We are seeking a linearizing
# transformation.

ag <- 10:80
logit <- predict(fit, expand.grid(treat="a", num.dis=0, age=ag,
                 cholesterol=median(cholesterol)), type="terms")[,"age"]
# Note: if age interacted with anything, this would be the age
#        "main effect" ignoring interaction terms
# Could also use
#    logit <- Predict(f, age=ag, ...)$yhat,
# which allows evaluation of the shape for any level of interacting
#  factors.  When age does not interact with anything, the result from
# predict(f, ..., type="terms") would equal the result from
# Predict if all other terms were ignored

# Could also specify
#    logit <- predict(fit, gendata(fit, age=ag, cholesterol=...))
# Un-mentioned variables set to reference values

plot(ag^.5, logit)               # try square root vs. spline transform.
plot(ag^1.5, logit)              # try 1.5 power

latex(fit)                       # fit in math notation
# Draw a nomogram for the model fit
plot(nomogram(fit))

# Compose R function to evaluate linear predictors analytically
g <- Function(fit)
g(treat='b', cholesterol=260, age=50)
# Letting num.diseases default to reference value

To examine interactions in a simpler way, you may want to group age into tertiles:

Code

age.tertile <- cut2(age, g=3)
# For automatic ranges later, add age.tertile to datadist input
fit <- lrm(y ~ age.tertile * rcs(cholesterol))

6.4 Other Functions

processMI: works with Hmisc::fit.mult.impute to process resampling validation statistics and likelihood ratio $\chi^2$ statistics accounting for multiple imputation
supsmu: Friedman’s “super smoother”
lowess: Cleveland’s scatterplot smoother
glm: generalized linear models (see Glm)
gam: Generalized additive models
rpart: Like original CART with surrogate splits for missings, censored data extension (Atkinson & Therneau)
validate.rpart: in rms; validates recursive partitioning with respect to certain accuracy indexes
loess: multi-dimensional scatterplot smoother

Code

f <- loess(y ~ age * pressure)
plot(f)                           # cross-sectional plots
ages <- seq(20,70,length=40)
pressures <- seq(80,200,length=40)
pred <- predict(f, expand.grid(age=ages, pressure=pressures))
persp(ages, pressures, pred)      # 3-d plot

```{r setup, include=FALSE} require(Hmisc) getRs('qbookfun.r') ``` # R Software {#sec-r} `R` allows interaction spline functions, wide variety of predictor parameterizations, wide variety of models, unifying model formula language, model validation by resampling. `R` is comprehensive: * Easy to write `R` functions for new models $\rightarrow$ wide variety of modern regression models implemented (trees, nonparametric, ACE, AVAS, survival models for multiple events) * Designs can be generated for any model $\rightarrow$ all handle "class" var, interactions, nonlinear expansions * Single `R` objects (e.g., fit object) can be self-documenting $\rightarrow$ automatic hypothesis tests, predictions for new data * Superior graphics * Classes and generic functions ## The `R` Modeling Language `R` statistical modeling language: ```{r eval=FALSE} response ~ terms y ~ age + sex # age + sex main effects y ~ age + sex + age:sex # add second-order interaction y ~ age*sex # second-order interaction + # all main effects y ~ (age + sex + pressure)^2 # age+sex+pressure+age:sex+age:pressure... y ~ (age + sex + pressure)^2 - sex:pressure # all main effects and all 2nd order # interactions except sex:pressure y ~ (age + race)*sex # age+race+sex+age:sex+race:sex y ~ treatment*(age*race + age*sex) # no interact. with race,sex sqrt(y) ~ sex*sqrt(age) + race # functions, with dummy variables generated if # race is an R factor (classification) variable y ~ sex + poly(age,2) # poly generates orthogonal polynomials race.sex <- interaction(race,sex) y ~ age + race.sex # for when you want dummy variables for # all combinations of the factors ``` The formula for a regression model is given to a modeling function, e.g. ```{r eval=FALSE} lrm(y ~ rcs(x,4)) ``` is read "use a logistic regression model to model y as a function of x, representing x by a restricted cubic spline with 4 default knots"^[`lrm` and`rcs` are in the `rms` package.]. `update` function: re-fit model with changes in terms or data: ```{r eval=FALSE} f <- lrm(y ~ rcs(x,4) + x2 + x3) f2 <- update(f, subset=sex=="male") f3 <- update(f, .~.-x2) # remove x2 from model f4 <- update(f, .~. + rcs(x5,5))# add rcs(x5,5) to model f5 <- update(f, y2 ~ .) # same terms, new response var. ``` ## User-Contributed Functions * `R` is high-level object-oriented language. * `R` runs on all platforms, and can be run on chromebooks using [RStudio Server](https://posit.co/products/open-source/rstudio-server) * Multitude of user-contributed functions freely available * International community of users Some `R` functions: * See Venables and Ripley * Hierarchical clustering: `hclust` * Principal components: `princomp, prcomp` * Canonical correlation: `cancor` * Nonparametric transform-both-sides additive models: `ace, avas` * Parametric transform-both-sides additive models: `areg`, `areg.boot` (`Hmisc` package in `R`)) * Rank correlation methods: `rcorr`, `hoeffd`, `spearman2` (`Hmisc`) * Variable clustering: `varclus` (`Hmisc`) * Single imputation: `transcan` (`Hmisc`) * Multiple imputation: `aregImpute` (`Hmisc`) * Restricted cubic splines: `rcspline.eval` (`Hmisc`) * Re-state restricted spline in simpler form: `rcspline.restate` (`Hmisc`) ## The `rms` Package **Big Picture** * `rms` package handles many popular standard models * It makes it easier to do the right thing * There are more extensible, general approaches, especially using Bayesian methods + `brms` package + `rstanarm` package + `rstanarm` survival analysis system `r mrg(rmsdisc())` [[FAQ](http://datamethods.org/rms)]{.aside} * `datadist` function to compute predictor distribution summaries ```{r eval=FALSE} y ~ sex + lsp(age,c(20,30,40,50,60)) + sex %ia% lsp(age,c(20,30,40,50,60)) ``` E.g. restrict age $\times$ cholesterol interaction to be of form $AF(B) + BG(A)$: ```{r eval=FALSE} y ~ lsp(age,30) + rcs(cholesterol,4) + lsp(age,30) %ia% rcs(cholesterol,4) ``` Special fitting functions to simplify procedures described in these notes: | Function | Purpose | Related `R` Functions| |-----|-------------------------------|---| | `ols` | Ordinary least squares linear model | `lm` | | `lrm` | Binary and ordinal logistic regression model with options for penalize MLE, meant for discrete $Y$ | `glm` | | `orm` | Ordinal semi-parametric regression model for continuous $Y$ and several link functions, penalization, left, right, and interval censoring |`polr`,`lrm` | | `psm` | Accelerated failure time parametric survival models | `survreg` | | `cph` | Cox proportional hazards regression | `coxph` | | `bj` | Buckley-James censored least squares model | `survreg`,`lm` | | `Glm` | `rms` version of `glm` | `glm` | | `Gls` | `rms` version of `gls` | `gls` (`nlme` package) | | `Rq` | `rms` version of `rq` | `rq` (`quantreg` package) | | `adapt_orm` | Use AIC to find the best link function and number of knots for a single continuous predictor using `orm`; a replacement for nonparametric smoothers that handles censoring | `orm` | : `rms` Fitting Functions | Function | Purpose | Related `R` Functions | |-----|-----| | `asis` | No post-transformation (seldom used explicitly) | `I` | | `rcs` | Restricted cubic splines | `ns` | | `pol` | Polynomial using standard notation | `poly` | | `lsp` | Linear spline | | | `catg` | Categorical predictor (seldom) | `factor` | | `scored` | Ordinal categorical variables | `ordered` | | `matrx` | Keep variables as group for `anova` and `fastbw` | `matrix` | | `strat` | Non-modeled stratification factors (used for `cph` only) | `strata` | | `Ocens` | Used by `orm` to code left, right, interval censoring of $Y$ | : `rms` Transformation Functions The transformation functions work also with regular R functions, e.g. when `predict()` is called the predicted values are computed by looking up the knot locations for `rcs`. Below notice that there are three graphic models implemented for depicting the effects of predictors in the fitted model: `lattice` graphics, a `ggplot` method using the `ggplot2` package (which has an option to convert the result to `plotly`), and a direct `plotly` method. `plotly` is used to create somewhat interactive graphics with drill-down capability, and the `rms` package takes advantage of this capability. `plotly` graphics are best used with `RStudio Rmarkdown` or `Quarto` html output. The `ggplot` and `plotly` (`plotp`; see later in this section) methods are recommended. | Function | Purpose | Related Functions | |--------------|--------------------------------|-----| | `print` | Print parameters and statistics of fit | | | `coef` | Fitted regression coefficients | | | `formula` | Formula used in the fit | | | `specs` | Detailed specifications of fit | | | `vcov` | Fetch covariance matrix | | | `logLik` | Fetch maximized log-likelihood | | | `AIC` | Fetch AIC with option to put on chi-square basis | | | `lrtest` | Likelihood ratio test for two nested models | | | `univarLR` | Compute all univariable LR $\chi^{2}$ | | | `robcov` | Robust covariance matrix estimates | | | `bootcov` | Bootstrap covariance matrix estimates and bootstrap distribution of estimates | | | `pentrace` | Find optimum penalty factors by tracing effective AIC for a grid of penalties | | | `effective.df`| Print effective d.f. for each type of variable in model, for penalized fit or `pentrace` results | | | `summary` | Summary of effects of predictors | | | `plot.summary` | Plot continuously shaded confidence bars for results of `summary`| | | `anova` | Wald and LR tests of most meaningful hypotheses | | | `plot.anova` | Graphical depiction of anova | | | `contrast` | General contrasts, C.L., tests (both Wald- and likelihood-based) | | | `gendata` | Easily generate predictor combinations | | | `predict` | Obtain predicted values or design matrix | | | `Predict` | Obtain predicted values and confidence limits easily varying a subset of predictors and others set at default values| | | `plot.Predict` | Plot the result of `Predict` using `lattice` | | | `ggplot.Predict` | Plot the result of `Predict` using `ggplot2` | | | `plotp.Predict` | Plot the result of `Predict` using `plotly` | | | `fastbw` | Fast backward step-down variable selection | `step` | | `residuals` | (or `resid`) Residuals, influence stats from fit | | | `sensuc` | Sensitivity analysis for unmeasured confounder | | | `which.influence` | Which observations are overly influential |`residuals` | | `impactPO` | Assess the impact of the proportional odds assumption for `lrm` | `ordParallel` | | `latex` | $\LaTeX$ representation of fitted model | `Function` | | `Function`| `R` function analytic representation of $X\hat{\beta}$ from a fitted regression model| `latex` | | `Hazard` | `R` function analytic representation of a fitted hazard function (for `psm`) | | | `Survival` | `R` function analytic representation of fitted survival function (for `psm`, `cph`) | | | `Quantile` | `R` function analytic representation of fitted function for quantiles of survival time (for `psm`, `cph`) | | | `Mean` | `R` function analytic representation of fitted function for mean survival time or for ordinal logistic | | | `nomogram` | Draws a nomogram for the fitted model | `latex`, `plot` | | `survest` | Estimate survival probabilities (`psm`, `cph`) |`survfit` | | `survplot` | Plot survival curves (`psm`, `cph`) | `plot.survfit` | | `survplotp` | Plot survival curves with `plotly` features | `survplot` | | `validate` | Validate indexes of model fit using resampling | | | `val.prob` | External validation of a probability model | `lrm` | | `val.surv` | External validation of a survival model | `calibrate` | | `calibrate` | Estimate calibration curve using resampling | `val.prob` | | `vif` | Variance inflation factors for fitted model | | | `naresid` | Bring elements corresponding to missing data back into predictions and residuals | | | `naprint` | Print summary of missing values | | | `impute` | Impute missing values | `aregImpute` | : `rms` After-fit Functions The `orm` function has special after-fit functions: | Function | Purpose | Related Functions | |----------|--------------------------------|-----| | `intCalibration` | Internal calibration, useful for assessing goodness-of-fit | `calibrate` | | `ordParallel` | Check parallelism assumption, e.g., proportional odds | `impactPO` | | `plotIntercepts` | Line plot of intercept estimates | | | `ordESS` | Effective sample size with special handling of censored observations | | `Olinks` | Compute deviance of model under four link functions | : `orm` After-fit Functions | Function | Purpose | |----------------|-----------------------------------------------------------| | `blrm` | Bayesian binary and ordinal logistic model | | `stackMI` | Bayesian posterior stacking for multiple imputation | | `stanDx` | Stan diagnostics on fit | | `stanDxplot` | Trace plots to check posterior sampling convergence | | `PostF` | Creates R function for computing posterior probabilities | | `plot.rmsb` | Plot posterior densities, intervals, point summaries | | `compareBmods` | Compare two models using LOO-cv | | `HPDint` | Compute highest posterior density interval | | `distSym` | Compute meaure of symmetry of posterior distribution | : `rmsb`: Bayesian Regression Modeling Strategies Package, Focusing on Semiparametric Univariate and Longitudinal Models An extensive overview of Bayesian capabilities of the `rmsb` package may be found at [hbiostat.org/R/rmsb/blrm.html](https://hbiostat.org/R/rmsb/blrm.html). Global options `prType` and `grType` control printed and some graphical output, respectively as shown in example code below. The default is plain output and static graphics. If using `plotly` interactive graphics through `ggplot` or `plotp` or with `anova` or `summary` functions it is best to do so with RStudio `html` output or `html` notebooks. If using `html` output you must be producing an `html` document or notebook. When setting `grType` to use $\LaTeX$ or `html` it is highly recommended that you use the `knitr` package. Example: * `treat`: categorical variable with levels `"a","b","c"` * `num.diseases`: ordinal variable, 0-4 * `age`: continuous Restricted cubic spline * `cholesterol`: continuous (3 missings; use median) `log(cholesterol+10)` * Allow `treat` $\times$ `cholesterol` interaction * Program to fit logistic model, test all effects in design, estimate effects (e.g. inter-quartile range odds ratios), plot estimated transformations  ```{r eval=FALSE} require(rms) # make new functions available options(prType='html') # print, summary, anova, validate: html output # others: 'latex', 'plain' options(grType='plotly') # plotly graphics for ggplot, anova, summary # default is 'base' for static graphics ddist <- datadist(cholesterol, treat, num.diseases, age) # Could have used ddist <- datadist(data.frame.name) options(datadist="ddist") # defines data dist. to rms cholesterol <- impute(cholesterol) fit <- lrm(y ~ treat + scored(num.diseases) + rcs(age) + log(cholesterol+10) + treat:log(cholesterol+10), x=TRUE, y=TRUE) # needed for robcov, anova test='LR' fit # outputs plain, LaTeX, or html markup describe(y ~ treat + scored(num.diseases) + rcs(age)) # or use describe(formula(fit)) for all variables used in fit # describe function (in Hmisc) gets simple statistics on variables # fit <- robcov(fit) # Would make all statistics that follow # use a robust covariance matrix # would need x=T, y=T in lrm() specs(fit) # Describe the design characteristics anova(fit) # Wald tests, plain, LaTex, or html anova(fit, test='LR') # Likelihood ratio tests anova(fit, treat, cholesterol) # Test these 2 by themselves plot(anova(fit)) # Summarize anova graphically summary(fit) # Estimate effects using default ranges # prints plain, LaTeX, or html plot(summary(fit)) # Graphical display of effects with C.I. summary(fit, treat="b", age=60) # Specify reference cell and adjustment val summary(fit, age=c(50,70)) # Estimate effect of increasing age from # 50 to 70 summary(fit, age=c(50,60,70)) # Increase age from 50 to 70, adjust to # 60 when estimating effects of other # factors # If had not defined datadist, would have to define ranges for all var. # Estimate and test treatment (b-a) effect averaged over 3 cholesterols contrast(fit, list(treat='b', cholesterol=c(150,200,250)), list(treat='a', cholesterol=c(150,200,250)), type='average') # Add conf.type='profile' got get more accurate profile likilihood CIs # as well as likelihood ratio tests for contrasts. # See the help file for contrast.rms for several examples of # how to obtain joint tests of multiple contrasts and how to get # double differences (interaction contrasts) p <- Predict(fit, age=seq(20,80,length=100), treat, conf.int=FALSE) plot(p) # Plot relationship between age and log # or ggplot(p), plotp(p) # odds, separate curve for each treat, # no C.I. plot(p, ~ age | treat) # Same but 2 panels ggplot(p, groups=FALSE) bplot(Predict(fit, age, cholesterol, np=50)) # 3-dimensional perspective plot for age, # cholesterol, and log odds using default # ranges for both variables plot(Predict(fit, num.diseases, fun=function(x) 1/(1+exp(-x)), conf.int=.9), ylab="Prob") # Plot estimated probabilities instead of # log odds (or use ggplot()) # can also use plotp() for plotly # Again, if no datadist were defined, would have to tell plot all limits logit <- predict(fit, expand.grid(treat="b",num.dis=1:3,age=c(20,40,60), cholesterol=seq(100,300,length=10))) # Could also obtain list of predictor settings interactively} logit <- predict(fit, gendata(fit, nobs=12)) # Since age doesn't interact with anything, we can quickly and # interactively try various transformations of age, taking the spline # function of age as the gold standard. We are seeking a linearizing # transformation. ag <- 10:80 logit <- predict(fit, expand.grid(treat="a", num.dis=0, age=ag, cholesterol=median(cholesterol)), type="terms")[,"age"] # Note: if age interacted with anything, this would be the age # "main effect" ignoring interaction terms # Could also use # logit <- Predict(f, age=ag, ...)$yhat, # which allows evaluation of the shape for any level of interacting # factors. When age does not interact with anything, the result from # predict(f, ..., type="terms") would equal the result from # Predict if all other terms were ignored # Could also specify # logit <- predict(fit, gendata(fit, age=ag, cholesterol=...)) # Un-mentioned variables set to reference values plot(ag^.5, logit) # try square root vs. spline transform. plot(ag^1.5, logit) # try 1.5 power latex(fit) # fit in math notation # Draw a nomogram for the model fit plot(nomogram(fit)) # Compose R function to evaluate linear predictors analytically g <- Function(fit) g(treat='b', cholesterol=260, age=50) # Letting num.diseases default to reference value ``` To examine interactions in a simpler way, you may want to group age into tertiles: ```{r eval=FALSE} age.tertile <- cut2(age, g=3) # For automatic ranges later, add age.tertile to datadist input fit <- lrm(y ~ age.tertile * rcs(cholesterol)) ``` ## Other Functions * `processMI`: works with `Hmisc::fit.mult.impute` to process resampling validation statistics and likelihood ratio $\chi^2$ statistics accounting for multiple imputation * `supsmu`: Friedman's "super smoother" * `lowess`: Cleveland's scatterplot smoother * `glm`: generalized linear models (see `Glm`) * `gam`: Generalized additive models * `rpart`: Like original CART with surrogate splits for missings, censored data extension (Atkinson \& Therneau) * `validate.rpart`: in `rms`; validates recursive partitioning with respect to certain accuracy indexes * `loess`: multi-dimensional scatterplot smoother ```{r eval=FALSE} f <- loess(y ~ age * pressure) plot(f) # cross-sectional plots ages <- seq(20,70,length=40) pressures <- seq(80,200,length=40) pred <- predict(f, expand.grid(age=ages, pressure=pressures)) persp(ages, pressures, pred) # 3-d plot ```

6.1 The R Modeling Language

6.2 User-Contributed Functions

6.3 The rms Package

6.4 Other Functions

6.1 The `R` Modeling Language

6.3 The `rms` Package