rmsb Package Examples Using cmdstan

Author
Affiliation

Department of Biostatistics
Vanderbilt University School of Medicine
VICTR Research Methods Program

Published

April 13, 2025

1 Overview and Setup

The rmsb package is the Bayesian companion to the rms package, and uses many of its functions for post-model-fitting analysis and graphics. The sole fitting function in rmsb at present is blrm. blrm is for Bayesian binary and ordinal proportional odds logistic regression and the Peterson & Harrell (1990) partial proportional odds model that relaxes the proportional odds assumption. It is the analog and generalization of rms::lrm and, for ordinal responses, is intended for outcomes with up to several hundred ordinal levels. The Bayesian approach has a number of advantages over traditional frequentist models, including

  1. the use of exact calculations (to within simulation error) instead of large sample (e.g., normal theory) approximations to p-values and confidence intervals
  2. exact and more intuitive inference when random effects are included in the model
  3. the ability to make probability statements about parameters and combinations of parameters, which includes computations of the probabilities of assertions such as “the effect of \(x_1\) exceeds 1.2 and the effect of \(x_2\) exceeds 0.7”
  4. capturing more sources of uncertainty. For example, the blrm function automatically computes highest posterior density intervals for a variety of statistical indexes such as the Brier score and AUROC (\(c\)-index). Note: By default these intervals are computed from only 400 posterior draws to save time. For a blrm fit object f you can obtain more accurate intervals by requesting more draws, e.g., print(f, ns=2000).
  5. the ability to incorporate external information or beliefs about linear combinations of parameters using prior distributions on contrasts
  6. by using the Stan language to specify the likelihood components, one can not only sample from posterior distributions but also quickly compute maximum likelihood estimates (Bayesian posterior modes)

blrm uses uninformative priors for \(\beta\) parameters that are not involved in contrasts appearing in the pcontrast argument. For the other \(\beta\)s, priors are put on contrasts specified in pcontrast, capitalizing on Stan's automatic propagation of priors on linear combinations of parameters back to priors on the individual parameters. This allows priors to be specified on a more user-friendly scale, and allows one to put priors on specific comparisons, nonlinear effects, interactions, parts of interactions, and more. This versatility is shown in a series of examples here. One example is the common situation in which there is a time \(\times\) treatment interaction and the prior for the treatment effect needs to apply at a specific time.

When there are random effects, the analyst may specify the mean of the exponential distribution that serves as the prior for the standard deviation of the random effects, with the default mean being 1.0, a reasonable value on the logit scale. A half-\(t\) distribution may be used instead for the SD of the random effects.

Regarding uncertainties about statistical indexes of model performance such as AUROC, the Bayesian posterior distributions computed by rmsb account for uncertainty in the regression coefficients, i.e., for the limited ability of the sample at hand to uncover the true unknown data-generating coefficients, and for how strongly those coefficients can predict outcomes in the sample. The posterior distributions do not pertain to sample-to-sample variability of model fit or model validation.

blrm uses Stan code written by Ben Goodrich of Columbia University and Frank Harrell. When using rstan, this code is precompiled when the rmsb package is built; you must install the rstan package to use rmsb with that backend. Alternatively, you can use cmdstan and the cmdstanr R package. This is generally recommended, as cmdstan and cmdstanr make use of the latest versions of Stan. Here are the steps I used for installing these on Linux, using the latest version of cmdstan:

cd ~
sudo R
install.packages('cmdstanr',
                  repos = c("https://stan-dev.r-universe.dev",
                            "https://cloud.r-project.org"),
                  lib='/usr/local/rpackage')
# Default location for cmdstan is in `~/.cmdstan` in the user's home directory
# This is not such a good idea
cmdstanr::install_cmdstan(cores=10, dir='/usr/local/bin')
# When running sudo, default is /root/.cmdstan which is not easily accessible

Add the following line to .zshrc or .bashrc so that cmdstanr will know where to find cmdstan:

export CMDSTAN=/usr/local/bin/cmdstan-2.36.0

This did not work consistently. Instead, I define the location in the variable cmdstan.loc in ~/.Rprofile.

cmdstanr will try to put compiled Stan code in the same directory where cmdstan is stored. With the above setup, that location is not user-writable, so it will not work. The blrm function by default puts compiled code in the .rmsb directory under the user’s home directory. Specify options(rmsbdir='some other location') to change this. The default behavior puts all compiled code in one user-wide location. If you want to archive a project and reproduce its results even if the Stan code inside rmsb is later updated, you may want to specify a project-specific location for rmsbdir.
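A project-specific setting might look like the following (the .rmsb subdirectory name is just a convention here, not a requirement), placed in the project's .Rprofile or setup chunk:

```r
# Keep compiled Stan code with the project so that archived results
# remain reproducible even if rmsb's internal Stan code changes later
options(rmsbdir = file.path(getwd(), '.rmsb'))
```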

Here is a typical setup code chunk for using rmsb. When using rstan, options(auto_write=TRUE) is not recommended, because when running a series of regressions the rstan package tends to recompile Stan code even when the code hasn’t changed.

```{r setup}
require(rmsb)
options(mc.cores = parallel::detectCores() - 1)   # use max # CPUs less 1
# The default number of chains cmdstanr uses is 4, so it doesn't do any good
# to specify more than 4 cores in most cases.
```

If you want to use cmdstan by default, avoiding having to use the backend argument to blrm, you can add this to the setup chunk:

options(rmsb.backend='cmdstan')

A separate run of this vignette uses rstan instead of cmdstan. The variable wstan below selects which backend to use. So that the file= arguments to blrm can store results in different files for the two backends, a small function rfile is defined below. It checks wstan and, when cmdstan is in use, appends "c" to the main part of the file name. rfile also automatically quotes the base file name specified by the user.

wstan <- c('cmdstan', 'rstan')[1]
rfile <- function(f)
  paste0(as.character(substitute(f)), if(wstan == 'cmdstan') 'c', '.rds')
require(rmsb)
knitrSet(lang='quarto', w=7, h=7)
options(prType='html')     # for rms output: print, anova, summary, etc.
options(mc.cores = parallel::detectCores() - 1)   # use max # CPUs less 1
if(wstan == 'cmdstan') {
  Sys.getenv('CMDSTAN')
  # cmdstanr::cmdstan_path()
  # Sometimes RStudio doesn't find the environment variable
  cat('cmdstan.loc=', cmdstan.loc, '\n')
  cmdstanr::set_cmdstan_path(cmdstan.loc)
  options(rmsb.backend='cmdstan')
    }
cmdstan.loc= /usr/local/bin/cmdstan-2.36.0 
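As a quick check of what rfile produces (restating its definition from above so the snippet is self-contained): the unquoted base name is converted to a string, and a "c" suffix is added only when wstan is 'cmdstan'.

```r
wstan <- 'cmdstan'
rfile <- function(f)
  paste0(as.character(substitute(f)), if(wstan == 'cmdstan') 'c', '.rds')
rfile(bs)    # "bsc.rds" -- bs need not exist; only its name is used
wstan <- 'rstan'
rfile(bs)    # "bs.rds"  -- the if() yields NULL, which paste0 drops
```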

1.1 Running Fits Only When Something Changes

You’ll see file='...' in the longer-running blrm calls below (the code actually uses the rfile function just defined to construct the file name). If the file already exists and neither the data, the options sent to rstan/cmdstanr, nor the underlying Stan code has changed from what was used to create the fit object stored in that file (as judged by their md5 hashes), that saved fit object is returned immediately without running the Stan code, often saving a great deal of execution time. This works well with the workflow of long R Markdown reports, making it so that only the portions of a Bayesian analysis that have changed are re-run. Note that the knitr cache=TRUE option does not work well: the cache files for this script were about a GB in size, and when rstan is involved the caching system does not reliably recognize when a model fit hasn’t changed and need not be re-run.

A restored fit object does not contain the rstan or cmdstanr object, saving tens of megabytes of storage. Standard Stan diagnostics are stored separately in the fit object, and it is assumed that if the user wanted to run rstan::loo, loo=TRUE would have been specified to blrm so that loo is run and its (small) result stored in the blrm fit object. If you want to run pairs to get more graphical diagnostics, instead of relying on the rstan object always being available, specify pairs=TRUE or pairs='filename.png' to blrm to graph the pairs plots. The latter is recommended, and one can put knitr::include_graphics('filename.png') in the R code chunk to render the graph in the report even if blrm was not re-run.

When file='filename.rds' is specified to blrm and the file does not exist or the analysis inputs have changed since the file was created, the blrm fit object will be saved via saveRDS in binary .rds format in your current working directory at the completion of blrm’s work. The rstan component is omitted from the saved file.

The utility function fitIf available from here is another way to manage a Bayesian simulation workflow efficiently. The fitIf approach runs the analysis only if the file doesn’t already exist. If the file exists but the data or the model have changed, fitIf is not intelligent enough to re-run the analysis (unlike the file='...' approach above). Whether using fitIf or file=, you lose the ability to run rms::stanDxplot(..., rstan=TRUE) on restored objects, so the stanDxplot function tries to find an existing trace-plot image file corresponding to the current R Markdown chunk when the fit object no longer has the rstan component. For most purposes this doesn’t matter, because running with the defaults shows the needed post-burn-in samples, which are always stored in fit objects. The file='filename.rds' approach is preferred.

1.2 Priors

It is often difficult to specify priors for parameters, especially when there are nonlinear effects (e.g., splines) and interactions in the model. We need a way to specify priors on the original \(X\) and \(Y\) scales. Fortunately Stan provides an elegant solution.

As discussed here, Stan allows one to specify priors on transformations of model parameters, and these priors propagate back to the original parameters. It is easier to specify a prior for the effect of increasing age from 30 to 60 than it is to specify a prior for the age slope. It may be difficult to specify a prior for an age \(\times\) treatment interaction, but much easier to specify a prior for how different the treatment effect is for a 30-year-old and a 60-year-old. By specifying priors on one or more contrasts one can easily encode outside information / information borrowing / shrinkage.

The rms contrast function provides a general way to implement contrasts up to double differences, and more details about computations are provided in that link. The approach used for specifying priors for contrasts in rmsb::blrm uses the same process but is even more general. Both contrast and blrm compute design matrices at user-specified predictor settings, and the contrast matrices (matrices multiplied by \(\hat{\beta}\)) are simply differences in such design matrices. Thinking of contrasts as differences in predicted values frees the user from having to care about how parameters map to estimands, and allows an R predict(fit, type='x') function to do the hard work. See here for several interesting examples.
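The idea can be illustrated outside of rms with a plain quadratic regression: the effect of raising x from 30 to 60 is the difference of two design-matrix rows multiplied by the coefficient vector, which equals the difference in predicted values. This sketch uses base R's lm and model.matrix in place of lrm and predict(..., type='x') purely to show the principle.

```r
set.seed(1)
d   <- data.frame(x = runif(100, 20, 80))
d$y <- 0.05 * d$x + rnorm(100)
fit <- lm(y ~ poly(x, 2, raw = TRUE), data = d)

# Design rows at the two predictor settings of interest
X  <- model.matrix(~ poly(x, 2, raw = TRUE), data.frame(x = c(60, 30)))
co <- drop((X[1, ] - X[2, ]) %*% coef(fit))   # contrast: x = 60 vs x = 30

# Identical to the difference in predicted values
all.equal(co, unname(diff(predict(fit, data.frame(x = c(30, 60))))))
```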

Beginning with rmsb version 1.0-0, prior standard deviations are not specified for individual parameters, and priors are only specified for contrasts through the pcontrast and npcontrast arguments. Since blrm transforms the design matrix through a QR decomposition to speed up posterior sampling, it also transforms the contrast matrices the same way so that priors for contrasts are in the right space inside the Stan code. Parameters not used in any contrast will have non-informative prior distributions.

For non-proportional odds effects in a constrained partial PO model, priors are specified through contrasts using the npcontrast argument, which is a list of the same structure as pcontrast but corresponds to the model given to the second argument to blrm.

Instead of specifying prior standard deviations directly, it is convenient to solve for the SD that corresponds to a specified probability for a specified interval of an effect in the model. Effects in the PO model are usually expressed as odds ratios (OR). For the case where the prior median for the OR is 1.0 (prior mean or median log(OR) = 0.0) it is useful to solve for the prior SD \(\sigma\) so that \(\Pr(\text{OR} > r) = a = \Pr(\text{OR} < \frac{1}{r})\), leading to \(\sigma = \frac{|\log(r)|}{\Phi^{-1}(1-a)}\), computed by the psigma function below. Another function, ., is defined as an abbreviation for list() for later use with pcontrast.

psigma <- function(r, a, inline=FALSE, pr=! inline) {
  # Solve for the prior SD of log(OR) such that Pr(OR > r) = a when r > 1
  # (or Pr(OR < r) = a when r < 1), for a normal prior centered at 0
  sigma <- abs(log(r)) / qnorm(1 - a)
  dir <- if(r > 1.) '>' else '<'
  x <- if(inline) paste0('$\\Pr(\\text{OR}', dir, r, ') =', a,
                         ' \\Rightarrow \\sigma=', round(sigma, 3), '$')
  else paste0('Pr(OR ', dir, ' ', r, ') = ', a, ' ⇒ σ=', round(sigma, 3))
  if(inline) return(x)
  if(pr) {
    cat('\n', x, '\n\n', sep='')
    return(invisible(sigma))
  }
  sigma
}
. <- function(...) list(...)
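As a sanity check of the formula behind psigma, in base R alone: with \(r = 2\) and \(a = 0.05\), \(\sigma = |\log 2| / \Phi^{-1}(0.95) \approx 0.421\), and under a normal(0, \(\sigma\)) prior for log(OR) the prior probability that OR exceeds 2 is 0.05 by construction.

```r
r <- 2; a <- 0.05
sigma <- abs(log(r)) / qnorm(1 - a)   # the psigma computation
round(sigma, 3)                       # 0.421
1 - pnorm(log(r), mean = 0, sd = sigma)   # 0.05: Pr(log OR > log r)
```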

1.3 Example: 10-level Ordinal Outcome

Simulate a dataset with three predictors and one 10-level ordinal outcome. Run the frequentist proportional odds model then the Bayesian one.

set.seed(1)
n <- 500
x1 <- runif(n, -1, 1)
x2 <- runif(n, -1, 1)
x3 <- sample(0 : 1, n, TRUE)
y <- x1 + 0.5 * x2 + x3 + rnorm(n)
y <- as.integer(cut2(y, g=10))
dd <- datadist(x1, x2, x3); options(datadist='dd')
f <- lrm(y ~ x1 + pol(x2, 2) + x3, eps=1e-7) # eps to check against Stan
f

Logistic Regression Model

lrm(formula = y ~ x1 + pol(x2, 2) + x3, eps = 1e-07)

Frequencies of Responses

 1  2  3  4  5  6  7  8  9 10 
50 50 50 50 50 50 50 50 50 50 
Obs 500   max |∂log L/∂β| 9×10⁻⁷
Model Likelihood Ratio Test: LR χ2 235.04, d.f. 4, Pr(>χ2) <0.0001
Discrimination Indexes: R2 0.379, R2(4,500) 0.370, R2(4,495) 0.373, Brier 0.183
Rank Discrim. Indexes: C 0.740, Dxy 0.480, γ 0.480, τa 0.433
β S.E. Wald Z Pr(>|Z|)
y≥2   2.0552  0.1951 10.53 <0.0001
y≥3   1.0832  0.1678 6.45 <0.0001
y≥4   0.4006  0.1606 2.49 0.0126
y≥5  -0.1790  0.1599 -1.12 0.2629
y≥6  -0.7246  0.1626 -4.46 <0.0001
y≥7  -1.2867  0.1689 -7.62 <0.0001
y≥8  -1.8982  0.1788 -10.62 <0.0001
y≥9  -2.6288  0.1953 -13.46 <0.0001
y≥10  -3.6408  0.2282 -15.95 <0.0001
x1   1.5694  0.1532 10.24 <0.0001
x2   0.7901  0.1368 5.78 <0.0001
x22  -0.2213  0.2578 -0.86 0.3908
x3   1.6341  0.1707 9.57 <0.0001

Before getting posterior distributions of parameters, use Stan to just get maximum likelihood estimates and compare them with those from lrm. Do this for increasingly flat priors for the \(\beta\)s associated with x2. Running method='optimizing' is a quick way to study the effect of priors on the posterior modes for non-intercepts when there are no random effects in the model.

# Define a function that creates a `pcontrast` argument for `blrm`
# Prior skepticism is encoded by making changes in Y small
# as x2 goes from -1 to 0 and from 0 to 1
# Pr(OR > 2) = p
con <- function(p)
  list(sd=psigma(2, p),
       c1=.(x2=0), c2=.(x2=-1), c3=.(x2=1), c4=.(x2=0),
       contrast=expression(c1 - c2, c3 - c4))
k <- NULL
for(p in c(.01, .05, .1, .2)) {
    g <- blrm(y ~ x1 + pol(x2, 2) + x3, method='optimizing',
              pcontrast=con(p))
    cat('-2 log likelihood:', g$deviance, '\n')
    k <- rbind(k, g$coefficients)
}

Pr(OR > 2) = 0.01 ⇒ σ=0.298

-2 log likelihood: 2051.08 

Pr(OR > 2) = 0.05 ⇒ σ=0.421

-2 log likelihood: 2048.26 

Pr(OR > 2) = 0.1 ⇒ σ=0.541

-2 log likelihood: 2047.16 

Pr(OR > 2) = 0.2 ⇒ σ=0.824

-2 log likelihood: 2046.7 
k
         y>=2     y>=3      y>=4       y>=5       y>=6      y>=7      y>=8
[1,] 1.985119 1.023839 0.3514193 -0.2191947 -0.7563559 -1.310276 -1.913121
[2,] 2.006409 1.041019 0.3645887 -0.2095343 -0.7499753 -1.307015 -1.913131
[3,] 2.020027 1.052577 0.3741372 -0.2017828 -0.7438895 -1.302538 -1.910403
[4,] 2.037175 1.067335 0.3867145 -0.1911465 -0.7351502 -1.295593 -1.905395
          y>=9     y>=10       x1        x2        x2^2       x3
[1,] -2.632091 -3.630891 1.563146 0.5559472 -0.08210495 1.610512
[2,] -2.636521 -3.640501 1.564845 0.6520621 -0.12059023 1.620211
[3,] -2.636113 -3.642703 1.565855 0.6993637 -0.14720869 1.624592
[4,] -2.633695 -3.643145 1.567955 0.7479708 -0.18219874 1.629478
# Compare with ordinary MLEs and deviance
f$deviance
[1] 2302.585 2067.549
coef(f)
      y>=2       y>=3       y>=4       y>=5       y>=6       y>=7       y>=8 
 2.0551529  1.0831900  0.4005868 -0.1790073 -0.7246249 -1.2866643 -1.8982180 
      y>=9      y>=10         x1         x2       x2^2         x3 
-2.6287996 -3.6408392  1.5694325  0.7900802 -0.2212900  1.6340800 

Fit the model with Dirichlet priors on intercepts and wide normal priors on the \(\beta\)s. Show the model fit summary. Note that the indexes of predictive discrimination/accuracy include 0.95 highest posterior density intervals. In frequentist inference we pretend that quantities such as AUROC and \(R^2\) are estimated without error, which is far from the case.

In several places you will see an index named Symmetry. This is a measure of the symmetry of a posterior distribution. Values farther from 1.0 indicate asymmetry, which means that the use of standard errors and of a normal approximation to the posterior distribution is not justified. The symmetry index is the ratio of the gap between the 0.95 quantile of the posterior distribution and the posterior mean to the gap between the posterior mean and the 0.05 quantile.
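A minimal sketch of that definition, applied to a vector of posterior draws (this reimplements the index in base R for illustration; it is not the rmsb internal function):

```r
# Symmetry index: upper-tail gap divided by lower-tail gap
sym <- function(draws)
  (quantile(draws, 0.95) - mean(draws)) /
  (mean(draws) - quantile(draws, 0.05))

set.seed(2)
sym(rnorm(1e5))   # close to 1: symmetric distribution
sym(rexp(1e5))    # well above 1: long right tail
```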

bs <- blrm(y ~ x1 + pol(x2, 2) + x3, file=rfile(bs))
bs

Bayesian Proportional Odds Ordinal Logistic Model

Dirichlet Priors With Concentration Parameter 0.233 for Intercepts

blrm(formula = y ~ x1 + pol(x2, 2) + x3, file = rfile(bs))

Frequencies of Responses

 1  2  3  4  5  6  7  8  9 10 
50 50 50 50 50 50 50 50 50 50 
Obs 500   Draws 4000   Chains 4   Time 2.1s   p 4
Mixed Calibration/Discrimination Indexes: LOO log L -1047.06±13.16, LOO IC 2094.12±26.33, Effective p 13.29±0.24, B 0.077 [0.076, 0.079]
Discrimination Indexes: g 1.572 [1.372, 1.777], gp 0.113 [0.088, 0.137], EV 0.147 [0.107, 0.189], v 1.9 [1.441, 2.418], vp 0.014 [0.008, 0.019]
Rank Discrim. Indexes: C 0.738 [0.736, 0.74], Dxy 0.477 [0.471, 0.481]
Mode β Mean β Median β S.E. Lower Upper Pr(β>0) Symmetry
y≥2   2.0550   2.0638   2.0583  0.1981   1.6764   2.4432  1.0000  1.06
y≥3   1.0831   1.0848   1.0851  0.1702   0.7413   1.4121  1.0000  1.02
y≥4   0.4005   0.3986   0.3973  0.1656   0.0832   0.7330  0.9932  1.04
y≥5  -0.1791  -0.1830  -0.1823  0.1631  -0.5201   0.1192  0.1295  1.03
y≥6  -0.7247  -0.7307  -0.7316  0.1666  -1.0651  -0.4222  0.0000  1.01
y≥7  -1.2867  -1.2937  -1.2913  0.1709  -1.6307  -0.9696  0.0000  0.97
y≥8  -1.8983  -1.9070  -1.9037  0.1815  -2.2590  -1.5659  0.0000  0.97
y≥9  -2.6289  -2.6404  -2.6394  0.1998  -3.0137  -2.2335  0.0000  0.97
y≥10  -3.6409  -3.6611  -3.6612  0.2315  -4.1410  -3.2376  0.0000  0.96
x1   1.5698   1.5762   1.5748  0.1559   1.2529   1.8582  1.0000  0.99
x2   0.7895   0.7956   0.8005  0.1398   0.5086   1.0669  1.0000  0.97
x22  -0.2209  -0.2210  -0.2173  0.2530  -0.7158   0.2645  0.1942  0.99
x3   1.6340   1.6416   1.6383  0.1700   1.3286   1.9925  1.0000  1.01
# Show more detailed analysis of model performance measures
blrmStats(bs, pl=TRUE)
Indexes computed for a random sample of 400 of 4000 posterior draws

gp, B, EV, and vp are for intercept 1 out of 9 intercepts

           Dxy     C     g    gp     B    EV     v    vp
Mean     0.477 0.738 1.576 0.115 0.077 0.149 1.908 0.014
SE       0.003 0.001 0.109 0.014 0.001 0.023 0.262 0.003
Lower    0.471 0.736 1.365 0.092 0.076 0.111 1.416 0.008
Upper    0.480 0.740 1.786 0.142 0.079 0.194 2.443 0.020
Symmetry 0.514 0.514 0.953 0.966 1.262 0.918 1.057 1.063

Dxy: 2*(C - 0.5)   C: concordance probability
g: Gini mean |difference| on linear predictor (lp)
gp: Gini on predicted probability        B: Brier score
EV: explained variation on prob. scale   v: var(lp)   vp: var(prob)

Show basic Stan diagnostics. Had stanDxplot(bs, rstan=TRUE) been used, the intercepts would have been shifted from what is in bs because covariate means are subtracted before passing data to rstan.

stanDxplot(bs)
stanDx(bs)
Iterations: 2000 on each of 4 chains, with 4000 posterior distribution samples saved

For each parameter, n_eff is a crude measure of effective sample size
and Rhat is the potential scale reduction factor on split chains
(at convergence, Rhat=1)


Checking sampler transitions for divergences.
No divergent transitions found.

Checking E-BFMI - sampler transitions HMC potential energy.
E-BFMI satisfactory.

Rank-normalized split effective sample size satisfactory for all parameters.

Rank-normalized split R-hat values satisfactory for all parameters.

Processing complete, no problems detected.

EBFMI: 0.981 1.1 1.011 1.068 

   Parameter  Rhat ESS bulk ESS tail
1   alpha[1] 1.001     6891     2910
2   alpha[2] 1.001     7052     3223
3   alpha[3] 1.000     7563     2903
4   alpha[4] 1.000     7839     2739
5   alpha[5] 1.002     8378     2986
6   alpha[6] 1.000     7066     2857
7   alpha[7] 1.002     7631     2751
8   alpha[8] 1.001     7180     3147
9   alpha[9] 1.001     7355     3488
10   beta[1] 1.000     6543     3792
11   beta[2] 1.001     9449     2608
12   beta[3] 1.001     9401     2950
13   beta[4] 1.001     8428     3293

Here are the posterior distributions, calculated using kernel density estimates from posterior draws. Posterior modes, shown as vertical lines, are parameter values that maximize the log posterior density (using the Stan optimizer in the original model fit), so they do not necessarily coincide with the peaks of the kernel density estimates.
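The distinction can be seen with base R alone: the peak of a kernel density estimate computed from draws is itself subject to simulation and bandwidth error, so it need not sit exactly at the true mode (0 for a standard normal in this stand-in example):

```r
set.seed(3)
draws <- rnorm(4000)                 # stand-in for posterior draws
dens  <- density(draws)              # kernel density estimate
peak  <- dens$x[which.max(dens$y)]   # location of the KDE peak
peak                                 # near, but not exactly at, the true mode 0
```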

plot(bs)
# Also show 2-d posterior density contour for two collinear terms
plot(bs, c('x2', 'x2^2'), bivar=TRUE)   # assumes ellipse
plot(bs, c('x2', 'x2^2'), bivar=TRUE, bivarmethod='kernel')   # kernel density

# Print frequentist side-by-side with Bayesian posterior mean, median, mode

cbind(MLE=coef(f), t(bs$param))
             MLE       mode       mean     median
y>=2   2.0551529  2.0550341  2.0638114  2.0583246
y>=3   1.0831900  1.0830941  1.0847773  1.0851496
y>=4   0.4005868  0.4005041  0.3986415  0.3972764
y>=5  -0.1790073 -0.1790749 -0.1830026 -0.1823254
y>=6  -0.7246249 -0.7246922 -0.7307361 -0.7316492
y>=7  -1.2866643 -1.2867349 -1.2937189 -1.2913036
y>=8  -1.8982180 -1.8982859 -1.9069993 -1.9037481
y>=9  -2.6287996 -2.6288759 -2.6404052 -2.6394494
y>=10 -3.6408392 -3.6409159 -3.6610893 -3.6611716
x1     1.5694325  1.5697726  1.5762275  1.5747686
x2     0.7900802  0.7894890  0.7955602  0.8004948
x2^2  -0.2212900 -0.2208656 -0.2210175 -0.2173066
x3     1.6340800  1.6339856  1.6416067  1.6383403
# Compare covariance matrix of posterior draws with MLE
round(diag(vcov(f)) / diag(vcov(bs)), 2)
 y>=2  y>=3  y>=4  y>=5  y>=6  y>=7  y>=8  y>=9 y>=10    x1    x2  x2^2    x3 
 0.97  0.97  0.94  0.96  0.95  0.98  0.97  0.96  0.97  0.97  0.96  1.04  1.01 
range(vcov(f) / vcov(bs))
[1] -1.882570  8.948802

Next show frequentist and Bayesian contrasts. For the Bayesian contrast the point estimate is the posterior mean, and the 0.95 highest posterior density interval is computed. Instead of a p-value, the posterior probability that the contrast is positive is computed.

contrast(f,  list(x1=0, x3=1), list(x1=.25, x3=0))
            x2 Contrast      S.E.     Lower    Upper    Z Pr(>|z|)
1 -0.001559955 1.241722 0.1720205 0.9045679 1.578876 7.22        0

Confidence intervals are 0.95 individual intervals
k <- contrast(bs, list(x1=0:1, x3=1), list(x1=.25, x3=0))
k
            x2 Contrast      S.E.     Lower    Upper Pr(Contrast>0)
1 -0.001559955 1.247550 0.1719986 0.9239994 1.597536              1
2 -0.001559955 2.823777 0.2122353 2.3927610 3.221752              1

Intervals are 0.95 highest posterior density intervals
Contrast is the posterior mean 

For Bayesian contrasts we can also plot the posterior densities for the contrasts, and 2-d highest-density contour.

plot(k)
plot(k, bivar=TRUE)   # applicable when exactly 2 contrasts
plot(k, bivar=TRUE, bivarmethod='kernel')

Compute posterior probabilities for various assertions about unknown true parameter values. The PostF function is a function generator that effectively evaluates the assertion to a 0/1 value and computes the mean of these binary values over posterior draws. As is the case with inference about the quadratic effect of x2 below, when the assertion does not evaluate to a binary 0/1 or logical TRUE/FALSE value, it is taken as a quantity that is derived from one or more model parameters, and a posterior density is drawn for the derived parameter. We use that to get a posterior distribution on the vertex of the quadratic x2 effect.

P <- PostF(bs, pr=TRUE)   # show new short legal R names
 Original Name Short Name
 y>=2          a1        
 y>=3          a2        
 y>=4          a3        
 y>=5          a4        
 y>=6          a5        
 y>=7          a6        
 y>=8          a7        
 y>=9          a8        
 y>=10         a9        
 x1            b1        
 x2            b2        
 x2^2          b3        
 x3            b4        
P(b3 > 0 & b1 > 1.5)
[1] 0.1365
P(b3 > 0)
[1] 0.19425
P(abs(b3) < 0.25)        # evidence for small |nonlinearity|
[1] 0.51925
mean(bs$draws[, 'x2^2'] > 0, na.rm=TRUE)    # longhand calculation
[1] 0.19425
# Plot posterior distribution for the vertex of the quadratic x2 effect
# This distribution should be wide because the relationship is linear
# (true value of b3 is zero)
plot(P(-b2 / (2 * b3)))

# Recreate the P function using original parameter names
# (which may not be legal R name)
P <- PostF(bs, name='orig')
P(`x2^2` > 0)
[1] 0.19425
P(`x2^2` > 0 & x1 > 1.5)
[1] 0.1365
# Remove rstan results from fit.  Compute  savings in object size.
# Note: this will only be accurate when running the fits for
# the first time (not when restoring shortened forms of them from
# disc)
# Result: 33.8MB before, 0.5MB after
s1 <- format(object.size(bs), 'MB')
bs$rstan <- NULL
s2 <- format(object.size(bs), 'MB')
cat('Before:', s1, '  After:', s2, '\n')
Before: 0.5 Mb   After: 0.5 Mb 

1.4 Bayesian Wilcoxon Test

Since the proportional odds (PO) ordinal logistic model is a generalization of the Wilcoxon/Kruskal-Wallis tests, one can use Bayesian proportional odds regression to get the Bayesian equivalent of the Wilcoxon test. Even without adjusting for covariates (which is impossible with the Wilcoxon test), there are advantages to putting this in a modeling framework, as detailed in Section 7.6 of BBR. A major advantage is estimation ability: one can estimate group-specific means, quantiles, and exceedance probabilities, and Bayesian inference provides exact uncertainty intervals (highest posterior density intervals in what follows) for these.

The PO model does not require there to be any ties among the Y values, so it handles continuous data very well (the orm function in the rms package efficiently handles many thousands of distinct Y levels requiring many thousands of intercepts in the model). Let’s re-analyze the calprotectin data in Section 7.3.1 of BBR to mimic the frequentist PO analysis in Section 7.6.

# Fecal Calprotectin: 2500 is above detection limit
# When detection limits occur at a single value, the PO
# model easily handles this in terms of estimating group
# differences (but not for estimating the mean Y)
calpro <- c(2500, 244, 2500, 726, 86, 2500, 61, 392, 2500, 114, 1226,
            2500, 168, 910, 627, 2500, 781, 57, 483, 30, 925, 1027,
            2500, 2500, 38, 18)

# Endoscopy score: 1 = No/Mild, 2=Mod/Severe Disease
# Would have been far better to code dose as 4 ordinal levels
endo <- c(2, 1, 2, 2, 2, 1, 1, 2, 2, 1, 2, 2, 1, 2, 2, 2, 2, 1, 2,
          2, 2, 2, 2, 2, 1, 1)
endo <- factor(endo, 1 : 2,
               c("No or Mild Activity", "Moderate or Severe Activity"))

dd <- datadist(endo, calpro); options(datadist='dd')
bcalpro <- blrm(calpro ~ endo, file=rfile(bcalpro))
print(bcalpro, intercepts=TRUE)

Bayesian Proportional Odds Ordinal Logistic Model

Dirichlet Priors With Concentration Parameter 0.134 for Intercepts

blrm(formula = calpro ~ endo, file = rfile(bcalpro))

Frequencies of Responses

  18   30   38   57   61   86  114  168  244  392  483  627  726  781  910  925 
   1    1    1    1    1    1    1    1    1    1    1    1    1    1    1    1 
1027 1226 2500 
   1    1    8 
Obs 26   Draws 4000   Chains 4   Time 0.6s   p 1
Mixed Calibration/Discrimination Indexes: LOO log L -93.83±8.66, LOO IC 187.65±17.33, Effective p 30.72±3.64, B 0.037 [0.034, 0.046]
Discrimination Indexes: g 1.299 [0.477, 2.275], gp 0.046 [0, 0.142], EV 0.065 [0, 0.213], v 2.137 [0.023, 4.768], vp 0.004 [0, 0.023]
Rank Discrim. Indexes: C 0.854 [0.854, 0.854], Dxy 0.708 [0.708, 0.708]
Mode β Mean β Median β S.E. Lower Upper Pr(β>0) Symmetry
y≥30   2.0969   2.4674   2.3325  1.2099   0.2768   4.9138  0.9950  1.44
y≥38   1.3395   1.4515   1.3966  0.8728  -0.2724   3.0826  0.9650  1.21
y≥57   0.8678   0.8817   0.8660  0.7454  -0.4712   2.3783  0.8875  1.07
y≥61   0.4733   0.4438   0.4249  0.6930  -0.8883   1.7864  0.7270  1.05
y≥86   0.1122   0.0438   0.0525  0.6871  -1.2201   1.4767  0.5278  1.01
y≥114  -0.1956  -0.2825  -0.2919  0.6771  -1.6217   0.9994  0.3432  0.95
y≥168  -0.4710  -0.5747  -0.5704  0.6812  -1.9675   0.6697  0.2028  0.96
y≥244  -0.7653  -0.8841  -0.8791  0.7023  -2.1857   0.5480  0.1010  0.95
y≥392  -1.0953  -1.2254  -1.2066  0.7511  -2.7176   0.2024  0.0460  0.93
y≥483  -1.4155  -1.5623  -1.5325  0.8031  -3.1772  -0.0045  0.0210  0.90
y≥627  -1.6849  -1.8344  -1.7976  0.8404  -3.4894  -0.2076  0.0112  0.91
y≥726  -1.9226  -2.0814  -2.0489  0.8677  -3.8740  -0.4825  0.0058  0.90
y≥781  -2.1399  -2.3106  -2.2788  0.8895  -4.0391  -0.5500  0.0027  0.89
y≥910  -2.3438  -2.5294  -2.4838  0.9029  -4.3629  -0.8405  0.0015  0.89
y≥925  -2.5396  -2.7413  -2.7016  0.9243  -4.6106  -1.0303  0.0008  0.89
y≥1027  -2.7312  -2.9489  -2.9066  0.9377  -4.8348  -1.2171  0.0005  0.90
y≥1226  -2.9224  -3.1629  -3.1171  0.9531  -4.9752  -1.3100  0.0003  0.89
y≥2500  -3.1165  -3.3855  -3.3528  0.9654  -5.2310  -1.5021  0.0000  0.90
endo=Moderate or Severe Activity   2.7585   2.9168   2.8846  0.9846   1.1468   5.0134  0.9998  1.13
# print.blrm defaults to not showing intercepts when there are more than 9
summary(bcalpro)
Effects   Response: calpro

                                                         Low High  Effect   S.E.    Lower 0.95  Upper 0.95
 endo - No or Mild Activity:Moderate or Severe Activity    2    1  -2.9170  0.9846  -5.0130     -1.1470
  Odds Ratio                                               2    1   0.0541           0.006649    0.3176

One can see that the posterior probability of a positive group difference exceeds 0.99.

Now compute the posterior mean estimates of the mean and median calprotectin levels for the unknown data-generating process, stratified by group, and compare them with the sample estimates.

# Sample estimates
tapply(calpro, endo, mean)
        No or Mild Activity Moderate or Severe Activity 
                    400.000                    1372.944 
tapply(calpro, endo, median)
        No or Mild Activity Moderate or Severe Activity 
                       87.5                       976.0 
# Now compute estimates and 0.95 HPD intervals assuming PO
# The first method is exact
newdata <- data.frame(endo=levels(endo))
bar <- Mean(bcalpro)
predict(bcalpro, newdata, fun=bar)
$linear.predictors
[1]  310.3338 1346.3339

$lower
[1]  80.89445 935.76497

$upper
[1]  625.0841 1734.2748
quant <- Quantile(bcalpro)
med <- function(lp, ...) quant(lp=lp, ...)
Predict(bcalpro, endo, fun=med)
                         endo      yhat    lower     upper
1         No or Mild Activity  93.92177  18.0000  236.4783
2 Moderate or Severe Activity 980.57654 597.9356 1571.4520

Response variable (y):  

Limits are 0.95 confidence limits

The contrast function in rms now allows one to get posterior distributions of differences in nonlinearly transformed parameters, as follows.

k <- contrast(bcalpro, list(endo=levels(endo)[1]),
                       list(endo=levels(endo)[2]), fun=bar)
k

Posterior Summaries for First X Settings

  Posterior Mean Posterior Median Lower 0.95 HPD Upper 0.95 HPD
1       310.3338         277.6756       80.89445       625.0841

Posterior Summaries for Second X Settings

  Posterior Mean Posterior Median Lower 0.95 HPD Upper 0.95 HPD
1       1346.334         1346.689        935.765       1734.275

Posterior Summaries of First - Second

  Posterior Mean Posterior Median Lower 0.95 HPD Upper 0.95 HPD
1          -1036        -1044.717      -1549.205      -519.4963
plot(k, which='diff')