R allows interaction spline functions, wide variety of predictor parameterizations, wide variety of models, unifying model formula language, model validation by resampling.
R is comprehensive:
Easy to write R functions for new models \(\rightarrow\) wide variety of modern regression models implemented (trees, nonparametric, ACE, AVAS, survival models for multiple events)
Designs can be generated for any model \(\rightarrow\) all handle “class” var, interactions, nonlinear expansions
Single R objects (e.g., fit object) can be self-documenting \(\rightarrow\) automatic hypothesis tests, predictions for new data
Superior graphics
Classes and generic functions
6.1 The R Modeling Language
R statistical modeling language:
Code
response ~ termsy ~ age + sex # age + sex main effectsy ~ age + sex + age:sex # add second-order interactiony ~ age*sex # second-order interaction +# all main effectsy ~ (age + sex + pressure)^2# age+sex+pressure+age:sex+age:pressure...y ~ (age + sex + pressure)^2- sex:pressure# all main effects and all 2nd order# interactions except sex:pressurey ~ (age + race)*sex # age+race+sex+age:sex+race:sexy ~ treatment*(age*race + age*sex) # no interact. with race,sexsqrt(y) ~ sex*sqrt(age) + race# functions, with dummy variables generated if# race is an R factor (classification) variabley ~ sex +poly(age,2) # poly generates orthogonal polynomialsrace.sex <-interaction(race,sex)y ~ age + race.sex # for when you want dummy variables for# all combinations of the factors
The formula for a regression model is given to a modeling function, e.g.
Code
lrm(y ~rcs(x,4))
is read “use a logistic regression model to model y as a function of x, representing x by a restricted cubic spline with 4 default knots”1.
1lrm andrcs are in the rms package.
update function: re-fit model with changes in terms or data:
Code
f <-lrm(y ~rcs(x,4) + x2 + x3)f2 <-update(f, subset=sex=="male")f3 <-update(f, .~.-x2) # remove x2 from modelf4 <-update(f, .~. +rcs(x5,5))# add rcs(x5,5) to modelf5 <-update(f, y2 ~ .) # same terms, new response var.
6.2 User-Contributed Functions
R is high-level object-oriented language.
R runs on all platforms, and can be run on chromebooks using RStudio Server
Multitude of user-contributed functions freely available
International community of users
Some R functions:
See Venables and Ripley
Hierarchical clustering: hclust
Principal components: princomp, prcomp
Canonical correlation: cancor
Nonparametric transform-both-sides additive models: ace, avas
Parametric transform-both-sides additive models: areg, areg.boot (Hmisc package in R))
datadist function to compute predictor distribution summaries
Code
y ~ sex +lsp(age,c(20,30,40,50,60)) + sex %ia%lsp(age,c(20,30,40,50,60))
E.g. restrict age \(\times\) cholesterol interaction to be of form \(AF(B) + BG(A)\):
Code
y ~lsp(age,30) +rcs(cholesterol,4) +lsp(age,30) %ia%rcs(cholesterol,4)
Special fitting functions by Harrell to simplify procedures described in these notes:
rms Fitting Functions
Function
Purpose
Related R Functions
ols
Ordinary least squares linear model
lm
lrm
Binary and ordinal logistic regression model with options for penalize MLE
glm
orm
Ordinal semi-parametric regression model for continuous \(Y\) and several link functions
polr,lrm
psm
Accelerated failure time parametric survival models
survreg
cph
Cox proportional hazards regression
coxph
bj
Buckley-James censored least squares model
survreg,lm
Glm
rms version of glm
glm
Gls
rms version of gls
gls (nlme package)
Rq
rms version of rq
rq (quantreg package)
rms Transformation Functions
Function
Purpose
asis
No post-transformation (seldom used explicitly)
rcs
Restricted cubic splines
pol
Polynomial using standard notation
lsp
Linear spline
catg
Categorical predictor (seldom)
scored
Ordinal categorical variables
matrx
Keep variables as group for anova and fastbw
strat
Non-modeled stratification factors (used for cph only)
The transformation functions work also with regular R functions, e.g. when predict() is called the predicted values are computed by looking up the knot locations for rcs.
Below notice that there are three graphic models implemented for depicting the effects of predictors in the fitted model: lattice graphics, a ggplot method using the ggplot2 package (which has an option to convert the result to plotly), and a direct plotly method. plotly is used to create somewhat interactive graphics with drill-down capability, and the rms package takes advantage of this capability. plotly graphics are best used with RStudio Rmarkdown html output.
rms After-fit Functions
Function
Purpose
Related Functions
print
Print parameters and statistics of fit
coef
Fitted regression coefficients
formula
Formula used in the fit
specs
Detailed specifications of fit
vcov
Fetch covariance matrix
logLik
Fetch maximized log-likelihood
AIC
Fetch AIC with option to put on chi-square basis
lrtest
Likelihood ratio test for two nested models
univarLR
Compute all univariable LR \(\chi^{2}\)
robcov
Robust covariance matrix estimates
bootcov
Bootstrap covariance matrix estimates and bootstrap distribution of estimates
pentrace
Find optimum penalty factors by tracing effective AIC for a grid of penalties
effective.df
Print effective d.f. for each type of variable in model, for penalized fit or pentrace results
summary
Summary of effects of predictors
plot.summary
Plot continuously shaded confidence bars for results of summary
anova
Wald and LR tests of most meaningful hypotheses
plot.anova
Graphical depiction of anova
contrast
General contrasts, C.L., tests
gendata
Easily generate predictor combinations
predict
Obtain predicted values or design matrix
Predict
Obtain predicted values and confidence limits easily varying a subset of predictors and others set at default values
plot.Predict
Plot the result of Predict using lattice
ggplot.Predict
Plot the result of Predict using ggplot2
plotp.Predict
Plot the result of Predict using plotly
fastbw
Fast backward step-down variable selection
step
residuals
(or resid) Residuals, influence stats from fit
sensuc
Sensitivity analysis for unmeasured confounder
which.influence
Which observations are overly influential
residuals
latex
\(\LaTeX\) representation of fitted model
Function
Function
R function analytic representation of \(X\hat{\beta}\) from a fitted regression model
latex
Hazard
R function analytic representation of a fitted hazard function (for psm)
Survival
R function analytic representation of fitted survival function (for psm, cph)
Quantile
R function analytic representation of fitted function for quantiles of survival time (for psm, cph)
Mean
R function analytic representation of fitted function for mean survival time or for ordinal logistic
nomogram
Draws a nomogram for the fitted model
latex, plot
survest
Estimate survival probabilities (psm, cph)
survfit
survplot
Plot survival curves (psm, cph)
plot.survfit
survplotp
Plot survival curves with plotly features
survplot
validate
Validate indexes of model fit using resampling
val.prob
External validation of a probability model
lrm
val.surv
External validation of a survival model
calibrate
calibrate
Estimate calibration curve using resampling
val.prob
vif
Variance inflation factors for fitted model
naresid
Bring elements corresponding to missing data back into predictions and residuals
naprint
Print summary of missing values
impute
Impute missing values
aregImpute
rmsb: Bayesian Regression Modeling Strategies Package, Focusing on Semiparametric Univariate and Longitudinal Models
Function
Purpose
blrm
Bayesian binary and ordinal logistic model
stackMI
Bayesian posterior stacking for multiple imputation
stanDx
Stan diagnostics on fit
stanDxplot
Trace plots to check posterior sampling convergence
PostF
Creates R function for computing posterior probabilities
plot.rmsb
Plot posterior densities, intervals, point summaries
compareBmods
Compare two models using LOO-cv
HPDint
Compute highest posterior density interval
distSym
Compute meaure of symmetry of posterior distribution
Global options prType and grType control printed and some graphical output, respectively as shown in example code below. The default is plain output and static graphics. If using plotly interactive graphics through ggplot or plotp or with anova or summary functions it is best to do so with RStudio html output or html notebooks. If using html output you must be producing an html document or notebook. When setting grType to use \(\LaTeX\) or html it is highly recommended that you use the knitr package.
Example:
treat: categorical variable with levels "a","b","c"
num.diseases: ordinal variable, 0-4
age: continuous Restricted cubic spline
cholesterol: continuous (3 missings; use median) log(cholesterol+10)
Allow treat\(\times\)cholesterol interaction
Program to fit logistic model, test all effects in design, estimate effects (e.g. inter-quartile range odds ratios), plot estimated transformations
Code
require(rms) # make new functions availableoptions(prType='html') # print, summary, anova, validate: html output# others: 'latex', 'plain'options(grType='plotly') # plotly graphics for ggplot, anova, summary# default is 'base' for static graphicsddist <-datadist(cholesterol, treat, num.diseases, age)# Could have used ddist <- datadist(data.frame.name)options(datadist="ddist") # defines data dist. to rmscholesterol <-impute(cholesterol)fit <-lrm(y ~ treat +scored(num.diseases) +rcs(age) +log(cholesterol+10) + treat:log(cholesterol+10),x=TRUE, y=TRUE) # needed for robcov, anova test='LR'fit # outputs plain, LaTeX, or html markupdescribe(y ~ treat +scored(num.diseases) +rcs(age))# or use describe(formula(fit)) for all variables used in fit# describe function (in Hmisc) gets simple statistics on variables# fit <- robcov(fit) # Would make all statistics that follow# use a robust covariance matrix# would need x=T, y=T in lrm()specs(fit) # Describe the design characteristicsanova(fit) # Wald tests, plain, LaTex, or htmlanova(fit, test='LR') # Likelihood ratio testsanova(fit, treat, cholesterol) # Test these 2 by themselvesplot(anova(fit)) # Summarize anova graphicallysummary(fit) # Estimate effects using default ranges# prints plain, LaTeX, or htmlplot(summary(fit)) # Graphical display of effects with C.I.summary(fit, treat="b", age=60) # Specify reference cell and adjustment valsummary(fit, age=c(50,70)) # Estimate effect of increasing age from# 50 to 70summary(fit, age=c(50,60,70)) # Increase age from 50 to 70, adjust to# 60 when estimating effects of other# factors# If had not defined datadist, would have to define ranges for all var.# Estimate and test treatment (b-a) effect averaged over 3 cholesterolscontrast(fit, list(treat='b', cholesterol=c(150,200,250)),list(treat='a', cholesterol=c(150,200,250)),type='average')# Add conf.type='profile' got get more accurate profile likilihood CIs# as well as likelihood ratio tests for contrasts.# See the help file for contrast.rms for several examples of# how to obtain joint tests of multiple contrasts and how to get# double differences (interaction contrasts)p <-Predict(fit, age=seq(20,80,length=100), treat, conf.int=FALSE)plot(p) # Plot relationship between age and log# or ggplot(p), plotp(p) # odds, separate curve for each treat,# no C.I.plot(p, ~ age | treat) # Same but 2 panelsggplot(p, groups=FALSE)bplot(Predict(fit, age, cholesterol, np=50))# 3-dimensional perspective plot for age,# cholesterol, and log odds using default# ranges for both variablesplot(Predict(fit, num.diseases, fun=function(x) 1/(1+exp(-x)), conf.int=.9),ylab="Prob") # Plot estimated probabilities instead of# log odds (or use ggplot())# can also use plotp() for plotly# Again, if no datadist were defined, would have to tell plot all limitslogit <-predict(fit, expand.grid(treat="b",num.dis=1:3,age=c(20,40,60),cholesterol=seq(100,300,length=10)))# Could also obtain list of predictor settings interactively}logit <-predict(fit, gendata(fit, nobs=12))# Since age doesn't interact with anything, we can quickly and# interactively try various transformations of age, taking the spline# function of age as the gold standard. We are seeking a linearizing# transformation.ag <-10:80logit <-predict(fit, expand.grid(treat="a", num.dis=0, age=ag,cholesterol=median(cholesterol)), type="terms")[,"age"]# Note: if age interacted with anything, this would be the age# "main effect" ignoring interaction terms# Could also use# logit <- Predict(f, age=ag, ...)$yhat,# which allows evaluation of the shape for any level of interacting# factors. When age does not interact with anything, the result from# predict(f, ..., type="terms") would equal the result from# Predict if all other terms were ignored# Could also specify# logit <- predict(fit, gendata(fit, age=ag, cholesterol=...))# Un-mentioned variables set to reference valuesplot(ag^.5, logit) # try square root vs. spline transform.plot(ag^1.5, logit) # try 1.5 powerlatex(fit) # fit in math notation# Draw a nomogram for the model fitplot(nomogram(fit))# Compose R function to evaluate linear predictors analyticallyg <-Function(fit)g(treat='b', cholesterol=260, age=50)# Letting num.diseases default to reference value
To examine interactions in a simpler way, you may want to group age into tertiles:
Code
age.tertile <-cut2(age, g=3)# For automatic ranges later, add age.tertile to datadist inputfit <-lrm(y ~ age.tertile *rcs(cholesterol))
6.4 Other Functions
processMI: works with Hmisc::fit.mult.impute to process resampling validation statistics and likelihood ratio \(\chi^2\) statistics accounting for multiple imputation
supsmu: Friedman’s “super smoother”
lowess: Cleveland’s scatterplot smoother
glm: generalized linear models (see Glm)
gam: Generalized additive models
rpart: Like original CART with surrogate splits for missings, censored data extension (Atkinson & Therneau)
validate.rpart: in rms; validates recursive partitioning with respect to certain accuracy indexes
loess: multi-dimensional scatterplot smoother
Code
f <-loess(y ~ age * pressure)plot(f) # cross-sectional plotsages <-seq(20,70,length=40)pressures <-seq(80,200,length=40)pred <-predict(f, expand.grid(age=ages, pressure=pressures))persp(ages, pressures, pred) # 3-d plot
```{r setup, include=FALSE}require(Hmisc)getRs('qbookfun.r')```# R Software {#sec-r}`R` allows interaction spline functions, wide variety ofpredictor parameterizations, wide variety of models, unifying modelformula language, model validation by resampling. `R` is comprehensive:* Easy to write `R` functions for new models $\rightarrow$ wide variety of modern regression models implemented (trees, nonparametric, ACE, AVAS, survival models for multiple events)* Designs can be generated for any model $\rightarrow$ all handle "class" var, interactions, nonlinear expansions* Single `R` objects (e.g., fit object) can be self-documenting $\rightarrow$ automatic hypothesis tests, predictions for new data* Superior graphics* Classes and generic functions## The `R` Modeling Language`R` statistical modeling language:```{r eval=FALSE}response ~ termsy ~ age + sex # age + sex main effectsy ~ age + sex + age:sex # add second-order interactiony ~ age*sex # second-order interaction + # all main effectsy ~ (age + sex + pressure)^2 # age+sex+pressure+age:sex+age:pressure...y ~ (age + sex + pressure)^2 - sex:pressure # all main effects and all 2nd order # interactions except sex:pressurey ~ (age + race)*sex # age+race+sex+age:sex+race:sexy ~ treatment*(age*race + age*sex) # no interact. with race,sexsqrt(y) ~ sex*sqrt(age) + race# functions, with dummy variables generated if# race is an R factor (classification) variabley ~ sex + poly(age,2) # poly generates orthogonal polynomialsrace.sex <- interaction(race,sex)y ~ age + race.sex # for when you want dummy variables for # all combinations of the factors```The formula for a regression model is given to a modelingfunction, e.g.```{r eval=FALSE}lrm(y ~ rcs(x,4))```is read "use a logistic regression model to model y as a function of x, representing x bya restricted cubic spline with 4 default knots"^[`lrm` and`rcs` are in the `rms` package.].`update` function: re-fit model with changes in terms or data:```{r eval=FALSE}f <- lrm(y ~ rcs(x,4) + x2 + x3)f2 <- update(f, subset=sex=="male")f3 <- update(f, .~.-x2) # remove x2 from modelf4 <- update(f, .~. + rcs(x5,5))# add rcs(x5,5) to modelf5 <- update(f, y2 ~ .) # same terms, new response var.```## User-Contributed Functions* `R` is high-level object-oriented language. * `R` runs on all platforms, and can be run on chromebooks using [RStudio Server](https://www.rstudio.com/products/rstudio/#rstudio-server)* Multitude of user-contributed functions freely available* International community of usersSome `R` functions:* See Venables and Ripley * Hierarchical clustering: `hclust`* Principal components: `princomp, prcomp`* Canonical correlation: `cancor`* Nonparametric transform-both-sides additive models: <br> `ace, avas`* Parametric transform-both-sides additive models: <br> `areg`,`areg.boot` (`Hmisc` package in `R`))* Rank correlation methods: <br> `rcorr`, `hoeffd`,`spearman2` (`Hmisc`)* Variable clustering: `varclus` (`Hmisc`)* Single imputation: `transcan` (`Hmisc`)* Multiple imputation: `aregImpute` (`Hmisc`)* Restricted cubic splines: <br>`rcspline.eval` (`Hmisc`)* Re-state restricted spline in simpler form:<br> `rcspline.restate` (`Hmisc`)## The `rms` Package**Big Picture*** `rms` package handles many popular standard models* It makes it easier to do the right thing* There are more extensible, general approaches, especially using Bayesian methods + `brms` package + `rstanarm` package + `rstanarm` survival analysis system`r mrg(rmsdisc())`[[FAQ](http://datamethods.org/rms)]{.aside}* `datadist` function to compute predictor distribution summaries ```{r eval=FALSE}y ~ sex + lsp(age,c(20,30,40,50,60)) + sex %ia% lsp(age,c(20,30,40,50,60))```E.g. restrict age $\times$cholesterol interaction to be of form $AF(B) + BG(A)$:```{r eval=FALSE}y ~ lsp(age,30) + rcs(cholesterol,4) + lsp(age,30) %ia% rcs(cholesterol,4)```Special fitting functions by Harrell to simplifyprocedures described in these notes:| Function | Purpose | Related `R` Functions||-----|-------------------------------|---|| `ols` | Ordinary least squares linear model | `lm` || `lrm` | Binary and ordinal logistic regression model with options for penalize MLE| `glm` || `orm` | Ordinal semi-parametric regression model for continuous $Y$ and several link functions |`polr`,`lrm` || `psm` | Accelerated failure time parametric survival models | `survreg` || `cph` | Cox proportional hazards regression | `coxph` || `bj` | Buckley-James censored least squares model | `survreg`,`lm` || `Glm` | `rms` version of `glm` | `glm` || `Gls` | `rms` version of `gls` | `gls` (`nlme` package) || `Rq` | `rms` version of `rq` | `rq` (`quantreg` package) |: `rms` Fitting Functions| Function | Purpose | Related `R` Functions ||-----|-----|| `asis` | No post-transformation (seldom used explicitly) | `I` || `rcs` | Restricted cubic splines | `ns` || `pol` | Polynomial using standard notation | `poly` || `lsp` | Linear spline | || `catg` | Categorical predictor (seldom) | `factor` || `scored` | Ordinal categorical variables | `ordered` || `matrx` | Keep variables as group for `anova` and `fastbw` | `matrix` || `strat` | Non-modeled stratification factors (used for `cph` only) | `strata` |: `rms` Transformation FunctionsThe transformation functions work also with regular R functions, e.g. when `predict()` is called the predicted values are computed by looking up the knot locations for `rcs`.Below notice that there are three graphic models implemented fordepicting the effects of predictors in the fitted model: `lattice`graphics, a `ggplot` method using the `ggplot2` package (whichhas an option to convert the result to `plotly`), and a direct`plotly` method. `plotly` is used to create somewhatinteractive graphics with drill-down capability, and the `rms`package takes advantage of this capability. `plotly` graphics arebest used with RStudio Rmarkdown html output.| Function | Purpose | Related Functions ||----------|--------------------------------|-----|| `print` | Print parameters and statistics of fit | || `coef` | Fitted regression coefficients | || `formula` | Formula used in the fit | || `specs` | Detailed specifications of fit | || `vcov` | Fetch covariance matrix | || `logLik` | Fetch maximized log-likelihood | || `AIC` | Fetch AIC with option to put on chi-square basis | || `lrtest` | Likelihood ratio test for two nested models | || univarLR | Compute all univariable LR $\chi^{2}$ | || `robcov` | Robust covariance matrix estimates | || `bootcov` | Bootstrap covariance matrix estimates and bootstrap distribution of estimates | || `pentrace` | Find optimum penalty factors by tracing effective AIC for a grid of penalties | || `effective.df`| Print effective d.f. for each type of variable in model, for penalized fit or `pentrace` results | || `summary` | Summary of effects of predictors | || `plot.summary` | Plot continuously shaded confidence bars for results of `summary`| || `anova` | Wald and LR tests of most meaningful hypotheses | || `plot.anova` | Graphical depiction of anova | || `contrast` | General contrasts, C.L., tests | || `gendata` | Easily generate predictor combinations | || `predict` | Obtain predicted values or design matrix | || `Predict` | Obtain predicted values and confidence limits easily varying a subset of predictors and others set at default values| || `plot.Predict` | Plot the result of `Predict` using `lattice`| || `ggplot.Predict` | Plot the result of `Predict` using `ggplot2` | || `plotp.Predict` | Plot the result of `Predict` using `plotly` | || `fastbw` | Fast backward step-down variable selection | `step` || `residuals` | (or `resid`) Residuals, influence stats from fit | || `sensuc` | Sensitivity analysis for unmeasured confounder | || `which.influence` | Which observations are overly influential |`residuals` || `latex` | $\LaTeX$ representation of fitted model | `Function` || `Function`| `R` function analytic representation of $X\hat{\beta}$ from a fitted regression model| `latex` || `Hazard` | `R` function analytic representation of a fitted hazard function (for `psm`) | || `Survival` | `R` function analytic representation of fitted survival function (for `psm`, `cph`) | || `Quantile` | `R` function analytic representation of fitted function for quantiles of survival time (for `psm`, `cph`) | || `Mean` | `R` function analytic representation of fitted function for mean survival time or for ordinal logistic | || `nomogram` | Draws a nomogram for the fitted model | `latex`, `plot` || `survest` | Estimate survival probabilities (`psm`, `cph`) |`survfit` || `survplot` | Plot survival curves (`psm`, `cph`) | `plot.survfit` || `survplotp` | Plot survival curves with `plotly` features | `survplot` || `validate` | Validate indexes of model fit using resampling | || `val.prob` | External validation of a probability model | `lrm` || `val.surv` | External validation of a survival model | `calibrate` || `calibrate` | Estimate calibration curve using resampling | `val.prob` || `vif` | Variance inflation factors for fitted model | || `naresid` | Bring elements corresponding to missing data back into predictions and residuals | || `naprint` | Print summary of missing values | || `impute` | Impute missing values | `aregImpute` |: `rms` After-fit Functions| Function | Purpose ||-----|-----|| `blrm` | Bayesian binary and ordinal logistic model || `stackMI` | Bayesian posterior stacking for multiple imputation || `stanDx` | Stan diagnostics on fit || `stanDxplot` | Trace plots to check posterior sampling convergence || `PostF` | Creates R function for computing posterior probabilities || `plot.rmsb` | Plot posterior densities, intervals, point summaries || `compareBmods` | Compare two models using LOO-cv || `HPDint` | Compute highest posterior density interval || `distSym` | Compute meaure of symmetry of posterior distribution |: `rmsb`: Bayesian Regression Modeling Strategies Package, Focusing on Semiparametric Univariate and Longitudinal ModelsAn extensive overview of Bayesian capabilities of the `rmsb` packagemay be found at [hbiostat.org/R/rmsb/blrm.html](https://hbiostat.org/R/rmsb/blrm.html).Global options `prType` and `grType` control printed and somegraphical output, respectively as shown in example code below. Thedefault is plain output and static graphics. If using `plotly`interactive graphics through `ggplot` or `plotp` or with`anova` or `summary` functions it is best todo so with RStudio `html` outputor `html` notebooks. If using `html` output you must beproducing an `html` document or notebook. When setting `grType`to use $\LaTeX$ or `html` it is highly recommended that you use the`knitr` package.Example:* `treat`: categorical variable with levels `"a","b","c"`* `num.diseases`: ordinal variable, 0-4* `age`: continuous <br> Restricted cubic spline* `cholesterol`: continuous <br> (3 missings; use median) <br>`log(cholesterol+10)`* Allow `treat` $\times$ `cholesterol` interaction* Program to fit logistic model, test all effects in design, estimate effects (e.g. inter-quartile range odds ratios), plot estimated transformations<!-- NEW: x=TRUE y=TRUE and LR tests -->```{r eval=FALSE}require(rms) # make new functions availableoptions(prType='html') # print, summary, anova, validate: html output # others: 'latex', 'plain'options(grType='plotly') # plotly graphics for ggplot, anova, summary # default is 'base' for static graphicsddist <- datadist(cholesterol, treat, num.diseases, age)# Could have used ddist <- datadist(data.frame.name)options(datadist="ddist") # defines data dist. to rmscholesterol <- impute(cholesterol)fit <- lrm(y ~ treat + scored(num.diseases) + rcs(age) + log(cholesterol+10) + treat:log(cholesterol+10), x=TRUE, y=TRUE) # needed for robcov, anova test='LR'fit # outputs plain, LaTeX, or html markupdescribe(y ~ treat + scored(num.diseases) + rcs(age))# or use describe(formula(fit)) for all variables used in fit# describe function (in Hmisc) gets simple statistics on variables# fit <- robcov(fit) # Would make all statistics that follow # use a robust covariance matrix # would need x=T, y=T in lrm()specs(fit) # Describe the design characteristicsanova(fit) # Wald tests, plain, LaTex, or htmlanova(fit, test='LR') # Likelihood ratio testsanova(fit, treat, cholesterol) # Test these 2 by themselvesplot(anova(fit)) # Summarize anova graphicallysummary(fit) # Estimate effects using default ranges # prints plain, LaTeX, or htmlplot(summary(fit)) # Graphical display of effects with C.I.summary(fit, treat="b", age=60) # Specify reference cell and adjustment valsummary(fit, age=c(50,70)) # Estimate effect of increasing age from # 50 to 70summary(fit, age=c(50,60,70)) # Increase age from 50 to 70, adjust to # 60 when estimating effects of other # factors# If had not defined datadist, would have to define ranges for all var.# Estimate and test treatment (b-a) effect averaged over 3 cholesterolscontrast(fit, list(treat='b', cholesterol=c(150,200,250)), list(treat='a', cholesterol=c(150,200,250)), type='average')# Add conf.type='profile' got get more accurate profile likilihood CIs# as well as likelihood ratio tests for contrasts.# See the help file for contrast.rms for several examples of# how to obtain joint tests of multiple contrasts and how to get# double differences (interaction contrasts)p <- Predict(fit, age=seq(20,80,length=100), treat, conf.int=FALSE)plot(p) # Plot relationship between age and log# or ggplot(p), plotp(p) # odds, separate curve for each treat, # no C.I.plot(p, ~ age | treat) # Same but 2 panelsggplot(p, groups=FALSE)bplot(Predict(fit, age, cholesterol, np=50)) # 3-dimensional perspective plot for age, # cholesterol, and log odds using default # ranges for both variablesplot(Predict(fit, num.diseases, fun=function(x) 1/(1+exp(-x)), conf.int=.9), ylab="Prob") # Plot estimated probabilities instead of # log odds (or use ggplot()) # can also use plotp() for plotly# Again, if no datadist were defined, would have to tell plot all limitslogit <- predict(fit, expand.grid(treat="b",num.dis=1:3,age=c(20,40,60), cholesterol=seq(100,300,length=10)))# Could also obtain list of predictor settings interactively}logit <- predict(fit, gendata(fit, nobs=12))# Since age doesn't interact with anything, we can quickly and# interactively try various transformations of age, taking the spline# function of age as the gold standard. We are seeking a linearizing# transformation.ag <- 10:80logit <- predict(fit, expand.grid(treat="a", num.dis=0, age=ag, cholesterol=median(cholesterol)), type="terms")[,"age"]# Note: if age interacted with anything, this would be the age# "main effect" ignoring interaction terms# Could also use# logit <- Predict(f, age=ag, ...)$yhat,# which allows evaluation of the shape for any level of interacting# factors. When age does not interact with anything, the result from# predict(f, ..., type="terms") would equal the result from# Predict if all other terms were ignored# Could also specify# logit <- predict(fit, gendata(fit, age=ag, cholesterol=...))# Un-mentioned variables set to reference valuesplot(ag^.5, logit) # try square root vs. spline transform.plot(ag^1.5, logit) # try 1.5 powerlatex(fit) # fit in math notation# Draw a nomogram for the model fitplot(nomogram(fit))# Compose R function to evaluate linear predictors analyticallyg <- Function(fit)g(treat='b', cholesterol=260, age=50)# Letting num.diseases default to reference value```To examine interactions in a simpler way, you may want to group ageinto tertiles:```{r eval=FALSE}age.tertile <- cut2(age, g=3)# For automatic ranges later, add age.tertile to datadist inputfit <- lrm(y ~ age.tertile * rcs(cholesterol))```## Other Functions* `processMI`: works with `Hmisc::fit.mult.impute` to process resampling validation statistics and likelihood ratio $\chi^2$ statistics accounting for multiple imputation* `supsmu`: Friedman's "super smoother" * `lowess`: Cleveland's scatterplot smoother* `glm`: generalized linear models (see `Glm`)* `gam`: Generalized additive models* `rpart`: Like original CART with surrogate splits for missings, censored data extension (Atkinson \& Therneau)* `validate.rpart`: in `rms`; validates recursive partitioning with respect to certain accuracy indexes* `loess`: multi-dimensional scatterplot smoother```{r eval=FALSE}f <- loess(y ~ age * pressure)plot(f) # cross-sectional plotsages <- seq(20,70,length=40)pressures <- seq(80,200,length=40)pred <- predict(f, expand.grid(age=ages, pressure=pressures))persp(ages, pressures, pred) # 3-d plot```