Overview and Setup

blrm is for Bayesian binary and ordinal proportional odds logistic regression. It is the analog of rms::lrm and for ordinal responses is intended for outcomes with up to a few dozen ordinal levels. The Bayesian approach has a number of advantage over traditional frequentist models, including

  1. the use of exact calculations (to within simulation error) instead of large sample (e.g., normal theory) approximations to p-values and confidence intervals
  2. exact and more intuitive inference when random effects are included in the model
  3. the ability to make probability statements about parameters and combinations of parameters, which includes computations or the probabilities of assertions such as “the effect of \(x_1\) exceeds 1.2 and the effect of \(x_2\) exceeds 0.7”
  4. capturing more sources of uncertainty. For example, the blrm function automatically computes higheset posterior density intervals on a variety of statistical indexes such as the Brier score and AUROC (\(c\)-index). Note: By default these intervals are computed using only 400 posterior draws to save time. For a blrm fit object f you can specify how many samples to draw, to get more accurate intervals, by specifying for example print(f, ns=2000).
  5. the ability to incorporate external information or beliefs about parameters using prior distributions

blrm uses normal priors for the \(\beta\) parameters, and the user may specify the standard deviation of these priors separately for each model parameters. The default is \(\sigma=100\) which is nearly flat. When there are random effects, the analyst may specify the mean of the exponential distribution that serves as the prior for the standard deviation of the random effects, with the default mean being 1.0, a reasonable number for the logit scale. There is also an option to use a half-\(t\) distribution for the SD of random effects.

blrm uses Stan code written by Ben Goodrich of Columbia University. To use blrm you must prepare by doing the following:

  1. Install the rstan package
  2. Run the following R commands to compile Stan code used by rms whenever one of the .stan files changes here
require(rms)    # On my system: stancompiled='~/R/stan'
options(stancompiled='whatever directory to hold compiled Stan code')
stanCompile()

This will download and compile the .stan programs, and will abort if you did not install rstan first. Each program takes about a minute to compile, and you do not need to do this step for each of your project directories. Instead, store the compiled R objects centrally.

It is recommended that you put something like options(stancompiled='~/R/stan') in your home directory’s .Rprofile file.

require(rms)
knitrSet(lang='markdown', w=7, h=7, fig.path='png/')
options(mc.cores = parallel::detectCores())   # use max # CPUs
options(stancompiled='~/R/stan', prType='html')
# Run the following once to store compiled Stan code in a central
# place for all projects
# stanCompile('~/R/stan')



You’ll see file='...' in the longer running of the blrm calls below. If the file already exists and none of the data nor the options sent to rstan nor the underlying Stan code have changed from what were used to create the fit object stored in that file (as judged by their md5 hash), that saved fit object is returned immediately without running the rstan code, often saving a great deal of execution time. This works well with the workflow of long R markdown reports making it so that only portions of Bayesian analysis that have changed are run. Note that using the knitr cache=TRUE option does not work well as the cache files for this script were about a GB in size, and the system does not accurately recognize when a model fit hasn’t changed and doesn’t need to be run when rstan is involved.

A restored fit object does not contain the rstan object, saving tens of megabytes of storage. Standard Stan diagnostics are stored in the fit object separately, and it is assumed that if the user wanted to run rstan::loo loo=TRUE would have been specified to blrm so that loo is run and its (small) result stored in the blrm fit object. If you want to run pairs to get more graphical diagnostics, intead of relying on the rstan object always being available, specify pairs=TRUE or pairs='filename.png' to blrm to graph the pair plots. The latter is recommended, and one can put knitr::include_graphics('filename'png') in the R code chunk to render the graph in the report even if blrm was not re-run.

When file='filename.rds' is specified to blrm and the file does not exist or analysis inputs have changed since the file was created, the blrm fit object will be saved in saveRDS .rds binary format in your current working directory at the completion of blrm’s work. The rstan component is omitted from the saved file.

The rms utility function fitIf is another way to efficiently manage Bayesian simulation workflow. The fitIf approach runs the analysis only if the file doesn’t already exist. If it exists but the data or the model have changed, the fitIf approach is not intelligent enough to re-run the analysis (unlike the file='...' approach above). Whether using fitIf or file=, you lose the ability to run the rms::stanDxplot(..., rstan=TRUE) on restored objects, so the stanDxplot function tries to find an existing trace plot image file corresponding to the current R markdown chunk when the fit object no longer has the rstan component. For most purposes this doesn’t matter, because running using the defaults shows the needed non-burnin samples which are always stored in fit objects.

The blrm AR(1) approach is experimental and is recommended only for statistical research at present; see warnings in examples below.

Proportional Odds and Binary Logistic Regression

Example: 10-level Ordinal Outcome

Simulate a dataset with three predictors and one 10-level ordinal outcome. Run the frequentist proportional odds model then the Bayesian one.

set.seed(1)
n <- 500
x1 <- runif(n, -1, 1)
x2 <- runif(n, -1, 1)
x3 <- sample(0 : 1, n, TRUE)
y <- x1 + 0.5 * x2 + x3 + rnorm(n)
y <- as.integer(cut2(y, g=10))
dd <- datadist(x1, x2, x3); options(datadist='dd')
f <- lrm(y ~ x1 + pol(x2, 2) + x3, eps=1e-7) # eps to check against rstan
f

Logistic Regression Model

 lrm(formula = y ~ x1 + pol(x2, 2) + x3, eps = 1e-07)
 

Frequencies of Responses

  1  2  3  4  5  6  7  8  9 10 
 50 50 50 50 50 50 50 50 50 50 
 
Model Likelihood
Ratio Test
Discrimination
Indexes
Rank Discrim.
Indexes
Obs 500 LR χ2 235.04 R2 0.379 C 0.740
max |∂log L/∂β| 4×10-14 d.f. 4 g 1.561 Dxy 0.480
Pr(>χ2) <0.0001 gr 4.762 γ 0.480
gp 0.302 τa 0.433
Brier 0.181
β S.E. Wald Z Pr(>|Z|)
y≥2   2.0552  0.1951 10.53 <0.0001
y≥3   1.0832  0.1678 6.45 <0.0001
y≥4   0.4006  0.1606 2.49 0.0126
y≥5  -0.1790  0.1599 -1.12 0.2629
y≥6  -0.7246  0.1626 -4.46 <0.0001
y≥7  -1.2867  0.1689 -7.62 <0.0001
y≥8  -1.8982  0.1788 -10.62 <0.0001
y≥9  -2.6288  0.1953 -13.46 <0.0001
y≥10  -3.6408  0.2282 -15.95 <0.0001
x1   1.5694  0.1532 10.24 <0.0001
x2   0.7901  0.1368 5.78 <0.0001
x22  -0.2213  0.2578 -0.86 0.3908
x3   1.6341  0.1707 9.57 <0.0001

This Two

Yes

Computing Environment

 R version 4.0.0 (2020-04-24)
 Platform: x86_64-pc-linux-gnu (64-bit)
 Running under: Pop!_OS 20.04 LTS
 
 Matrix products: default
 BLAS:   /usr/lib/x86_64-linux-gnu/blas/libblas.so.3.9.0
 LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.9.0
 
 attached base packages:
 [1] stats     graphics  grDevices utils     datasets  methods   base     
 
 other attached packages:
 [1] rms_6.0-0       SparseM_1.78    Hmisc_4.4-1     ggplot2_3.3.0  
 [5] Formula_1.2-3   survival_3.1-12 lattice_0.20-41
 
To cite R in publication use:

R Core Team (2020). R: A Language and Environment for Statistical Computing. R Foundation for Statistical Computing, Vienna, Austria. https://www.R-project.org/.