rmsb Package Examples Using cmdstan
1 Overview and Setup
The rmsb
package is the Bayesian companion to the rms
package, and uses many of its functions for post-model-fitting analysis and graphics. The sole fitting function in rmsb
at present is blrm
. blrm
is for Bayesian binary and ordinal proportional odds logistic regression and the Peterson & Harrell (1990) partial proportional odds model that relaxes the proportional odds assumption. It is the analog and generalization of rms::lrm
and for ordinal responses is intended for outcomes with up to several hundred ordinal levels. The Bayesian approach has a number of advantages over traditional frequentist models, including
- the use of exact calculations (to within simulation error) instead of large sample (e.g., normal theory) approximations to p-values and confidence intervals
- exact and more intuitive inference when random effects are included in the model
- the ability to make probability statements about parameters and combinations of parameters, which includes computations of the probabilities of assertions such as “the effect of \(x_1\) exceeds 1.2 and the effect of \(x_2\) exceeds 0.7”
- capturing more sources of uncertainty. For example, the
blrm
function automatically computes highest posterior density intervals on a variety of statistical indexes such as the Brier score and AUROC (\(c\)-index). Note: by default these intervals are computed using only 400 posterior draws to save time. For a blrm fit object f you can specify how many samples to draw, to get more accurate intervals, by specifying for example print(f, ns=2000).
- the ability to incorporate external information or beliefs about linear combinations of parameters using prior distributions on contrasts
- by using Stan language to specify the likelihood components, one can not only do posterior distribution sampling but can quickly compute maximum likelihood estimates (Bayesian posterior modes)
blrm
uses uninformative priors for \(\beta\) parameters that are not used in contrasts appearing in the pcontrast
argument. For other \(\beta\)s, priors are put on contrasts specified in pcontrast
, capitalizing on Stan automatically propagating priors on linear combinations of parameters to priors on individual parameters. This allows priors to be specified on a more user-friendly scale, and allows one to put priors on specific comparisons, nonlinear effects, interactions, parts of interactions, and more. This versatility is shown in a series of examples here. One example is the common situation in which there is a time \(\times\) treatment interaction and the prior for the treatment effect needs to be applied to a specific time.
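As a hedged sketch of that time \(\times\) treatment situation (variable names tx and week, the spline, and the prior SD value are illustrative and not from this vignette's examples), a prior on the treatment contrast at a single follow-up time can be specified with the pcontrast structure used in the examples below:

```{r}
# Skeptical prior on the treatment effect at week 12 only; because the model
# has a time x treatment interaction, the contrast is a difference in predicted
# values between the two treatments at week 12
pcon <- list(sd       = 0.5,                     # prior SD for the week-12 log odds ratio
             c1       = list(tx='B', week=12),
             c2       = list(tx='A', week=12),
             contrast = expression(c1 - c2))
# f <- blrm(y ~ tx * rcs(week, 4) + ..., pcontrast=pcon)   # illustrative call
```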
When there are random effects, the analyst may specify the mean of the exponential distribution that serves as the prior for the standard deviation of the random effects, with the default mean being 1.0, a reasonable number for the logit scale. You can also use a half-\(t\) distribution for the SD of random effects.
Regarding uncertainties about statistical indexes of model performance such as AUROC, the Bayesian posterior distributions computed by rmsb
account for coefficient uncertainty, i.e., the sample at hand's limited ability to uncover the true unknown data-generating coefficients, and for how strongly those coefficients can predict outcomes in the sample. The posterior distributions do not pertain to sample-to-sample variability of model fit or model validation.
blrm
uses Stan code written by Ben Goodrich of Columbia University and Frank Harrell. This code is precompiled when the rmsb
package is built when using the rstan backend, for which the rstan package must be installed. You can use cmdstan
and the cmdstanr
R package instead. This is generally recommended, as cmdstan
and cmdstanr
make use of the latest versions of Stan. Here are the steps I used for installing these on Linux, using the latest version of cmdstan
as of 2023-09-25 (2.33.1):
cd ~
sudo R
install.packages('cmdstanr', repos='https://mc-stan.org/r-packages',
lib='/usr/local/lib/R/site-library')
# Default location for cmdstan is in `~/.cmdstan` in the user's home directory
# This is not such a good idea
cmdstanr::install_cmdstan(cores=10, dir='/usr/local/bin')
# When running sudo, default is /root/.cmdstan which is not easily accessible
Add the following line in .zshrc
or .bashrc
so that cmdstanr
will know where to find cmdstan
:
export CMDSTAN=/usr/local/bin/cmdstan-2.34.1
Instead of that, I have the location defined in cmdstan.loc
in ~/.Rprofile
.
cmdstanr
will try to put compiled Stan code in the same directory where cmdstan
is stored. With the above setup, this location is not user-writeable, so it will not work. The blrm
function by default puts compiled code in the .rmsb
directory under the user's home directory. Specify options(rmsbdir='some other location')
to change this. The default behavior puts all compiled code in one central location used across your projects. If you desire to archive a project and reproduce the result even if the Stan code inside rmsb
is updated, you may want to specify a project-specific location for rmsbdir
.
Here is a typical setup code chunk for using rmsb
. When using rstan
, it is not recommended to use options(auto_write=TRUE)
as when running a series of regressions, the rstan
package tends to recompile Stan code even when the code doesn’t change.
```{r setup}
require(rmsb)
options(mc.cores = parallel::detectCores() - 1) # use max # CPUs less 1
```
If you want to use cmdstan
by default, avoiding having to use the backend
argument to blrm
, you can add this to the setup chunk:
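```{r}
options(rmsb.backend='cmdstan')
```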
This vignette is run separately to use rstan
instead of cmdstanr
. The variable wstan
below defines which system to run. So that the file=
arguments to blrm
can store results in different files, a little function rfile
is created below. It checks wstan
and when using cmdstan
adds "c"
to the end of the main part of the file name. rfile
also automatically quotes the base file name specified by the user.
wstan <- c('cmdstan', 'rstan')[1]
rfile <- function(f)
  paste0(as.character(substitute(f)), if(wstan == 'cmdstan') 'c', '.rds')
require(rmsb)
knitrSet(lang='quarto', w=7, h=7)
options(prType='html') # for rms output: print, anova, summary, etc.
options(mc.cores = parallel::detectCores() - 1) # use max # CPUs less 1
if(wstan == 'cmdstan') {
Sys.getenv('CMDSTAN')
# cmdstanr::cmdstan_path()
# Sometimes RStudio doesn't find the environment variable
cat('cmdstan.loc=', cmdstan.loc, '\n')
cmdstanr::set_cmdstan_path(cmdstan.loc)
options(rmsb.backend='cmdstan')
}
cmdstan.loc= /usr/local/bin/cmdstan-2.34.1
1.1 Running Fits Only When Something Changes
You’ll see file='...'
in the longer-running blrm
calls below (actually the code uses rfile
just defined to construct the file name). If the file already exists and none of the data nor the options sent to rstan/cmdstanr
nor the underlying Stan code have changed from what were used to create the fit object stored in that file (as judged by their md5
hash), that saved fit object is returned immediately without running the rstan
code, often saving a great deal of execution time. This works well with the workflow of long R Markdown reports, making it so that only the portions of the Bayesian analysis that have changed are re-run. Note that using the knitr
cache=TRUE
option does not work well as the cache files for this script were about a GB in size, and the system does not accurately recognize when a model fit hasn’t changed and doesn’t need to be run when rstan
is involved.
A restored fit object does not contain the rstan
or cmdstanr
object, saving tens of megabytes of storage. Standard Stan diagnostics are stored in the fit object separately, and it is assumed that if the user wanted to run rstan::loo
loo=TRUE
would have been specified to blrm
so that loo
is run and its (small) result stored in the blrm
fit object. If you want to run pairs
to get more graphical diagnostics, instead of relying on the rstan
object always being available, specify pairs=TRUE
or pairs='filename.png'
to blrm
to graph the pair plots. The latter is recommended, and one can put knitr::include_graphics('filename.png')
in the R code chunk to render the graph in the report even if blrm
was not re-run.
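For example (a minimal sketch; the model formula and file names here are illustrative):

```{r eval=FALSE}
# Save pair plots to a PNG at fit time, then render the stored image in the
# report even when the fit is later restored from disk and blrm is not re-run
f <- blrm(y ~ x1 + pol(x2, 2) + x3, file=rfile(f), pairs='f-pairs.png')
knitr::include_graphics('f-pairs.png')
```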
When file='filename.rds'
is specified to blrm
and the file does not exist or analysis inputs have changed since the file was created, the blrm
fit object will be saved with saveRDS in .rds binary format in your current working directory at the completion of blrm
’s work. The rstan
component is omitted from the saved file.
The utility function fitIf
available from here is another way to efficiently manage Bayesian simulation workflow. The fitIf
approach runs the analysis only if the file doesn’t already exist. If it exists but the data or the model have changed, the fitIf
approach is not intelligent enough to re-run the analysis (unlike the file='...'
approach above). Whether using fitIf
or file=
, you lose the ability to run the rms::stanDxplot(..., rstan=TRUE)
on restored objects, so the stanDxplot
function tries to find an existing trace plot image file corresponding to the current R markdown chunk when the fit object no longer has the rstan
component. For most purposes this doesn't matter, because running with the defaults shows the needed post-burn-in samples, which are always stored in fit objects. The file='filename.rds'
approach is preferred.
1.2 Priors
It is often difficult to specify priors for parameters, especially when there are nonlinear effects (e.g., splines) and interactions in the model. We need a way to specify priors on the original \(X\) and \(Y\) scales. Fortunately Stan
provides an elegant solution.
As discussed here Stan
allows one to specify priors on transformations of model parameters, and these priors propagate back to the original parameters. It is easier to specify a prior for the effect of increasing age from 30 to 60 than it is to specify a prior for the age slope. It may be difficult to specify a prior for an age \(\times\) treatment interaction, but much easier to specify a prior for how different the treatment effect is for a 30-year-old and a 60-year-old. By specifying priors on one or more contrasts one can easily encode outside information / information borrowing / shrinkage.
The rms
contrast
function provides a general way to implement contrasts up to double differences, and more details about computations are provided in that link. The approach used for specifying priors for contrasts in rmsb::blrm uses the same process but is even more general. Both contrast and blrm compute design matrices at user-specified predictor settings, and the contrast matrices (matrices multiplied by \(\hat{\beta}\)) are simply differences in such design matrices. Thinking of contrasts as differences in predicted values frees the user from having to care about how parameters map to estimands, and allows an R predict(fit, type='x') function to do the hard work. See here for several interesting examples.
Beginning with rmsb
version 1.0-0, prior standard deviations are not specified for individual parameters, and priors are only specified for contrasts through the pcontrast
and npcontrast
arguments. Since blrm
transforms the design matrix through a QR decomposition to speed up posterior sampling, it also transforms the contrast matrices the same way so that priors for contrasts are in the right space inside the Stan
code. Parameters not used in any contrast will have non-informative prior distributions.
For non-proportional odds effects in a constrained partial PO model, priors are specified through contrasts using the npcontrast
argument, which is a list of the same structure as pcontrast
but corresponds to the model given to the second argument to blrm
.
Instead of specifying prior standard deviations directly, it is convenient to solve for SDs that correspond to a specified probability for a specified interval of an effect in the model. Effects for the PO model are usually expressed as odds ratios (OR). For the case where the prior median for the OR is 1.0 (prior mean or median log(OR)=0.0) it is useful to solve for the prior SD \(\sigma\) so that \(\Pr(\text{OR} > r) = a = \Pr(\text{OR} < \frac{1}{r})\), leading to \(\sigma = \frac{|\log(r)|}{\Phi^{-1}(1-a)}\), computed by the psigma
function below. Another function .
is defined as an abbreviation for list()
for later usage with pcontrast
.
psigma <- function(r, a, inline=FALSE, pr=! inline) {
  sigma <- abs(log(r)) / qnorm(1 - a)
  dir   <- if(r > 1.) '>' else '<'
  x     <- if(inline) paste0('$\\Pr(\\text{OR}', dir, r, ') =', a,
                             ' \\Rightarrow \\sigma=', round(sigma, 3), '$')
           else paste0('Pr(OR ', dir, ' ', r, ') = ', a, ' ⇒ σ=', round(sigma, 3))
  if(inline) return(x)
  if(pr) {
    cat('\n', x, '\n\n', sep='')
    return(invisible(sigma))
  }
  sigma
}
. <- function(...) list(...)
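For example, solving for the prior SD that gives \(\Pr(\text{OR} > 2) = 0.05\) (a value used again in the next section):

```{r}
psigma(2, 0.05)   # prints Pr(OR > 2) = 0.05 ⇒ σ=0.421 and invisibly returns the SD
```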
1.3 Example: 10-level Ordinal Outcome
Simulate a dataset with three predictors and one 10-level ordinal outcome. Run the frequentist proportional odds model then the Bayesian one.
set.seed(1)
n  <- 500
x1 <- runif(n, -1, 1)
x2 <- runif(n, -1, 1)
x3 <- sample(0 : 1, n, TRUE)
y  <- x1 + 0.5 * x2 + x3 + rnorm(n)
y  <- as.integer(cut2(y, g=10))
dd <- datadist(x1, x2, x3); options(datadist='dd')
f  <- lrm(y ~ x1 + pol(x2, 2) + x3, eps=1e-7)   # eps to check against Stan
f
Logistic Regression Model
lrm(formula = y ~ x1 + pol(x2, 2) + x3, eps = 1e-07)
Frequencies of Responses
 1  2  3  4  5  6  7  8  9 10
50 50 50 50 50 50 50 50 50 50
 | Model Likelihood Ratio Test | Discrimination Indexes | Rank Discrim. Indexes |
---|---|---|---|
Obs 500 | LR χ2 235.04 | R2 0.379 | C 0.740 |
max |∂log L/∂β| 2×10-14 | d.f. 4 | R24,500 0.370 | Dxy 0.480 |
Pr(>χ2) <0.0001 | R24,495 0.373 | γ 0.480 | |
Brier 0.181 | τa 0.433 |
 | β | S.E. | Wald Z | Pr(>|Z|) |
---|---|---|---|---|
y≥2 | 2.0552 | 0.1951 | 10.53 | <0.0001 |
y≥3 | 1.0832 | 0.1678 | 6.45 | <0.0001 |
y≥4 | 0.4006 | 0.1606 | 2.49 | 0.0126 |
y≥5 | -0.1790 | 0.1599 | -1.12 | 0.2629 |
y≥6 | -0.7246 | 0.1626 | -4.46 | <0.0001 |
y≥7 | -1.2867 | 0.1689 | -7.62 | <0.0001 |
y≥8 | -1.8982 | 0.1788 | -10.62 | <0.0001 |
y≥9 | -2.6288 | 0.1953 | -13.46 | <0.0001 |
y≥10 | -3.6408 | 0.2282 | -15.95 | <0.0001 |
x1 | 1.5694 | 0.1532 | 10.24 | <0.0001 |
x2 | 0.7901 | 0.1368 | 5.78 | <0.0001 |
x22 | -0.2213 | 0.2578 | -0.86 | 0.3908 |
x3 | 1.6341 | 0.1707 | 9.57 | <0.0001 |
Before getting posterior distributions of parameters, use Stan
to just get maximum likelihood estimates and compare them with those from lrm
. Do this for increasingly flat priors for the \(\beta\)s associated with x2
. Running method='optimizing'
is a quick way to study the effect of priors on the posterior modes for non-intercepts when there are no random effects in the model.
# Define a function that creates a `pcontrast` for `blrm`
# Skepticism of prior is specified by making changes in Y be small
# as x2 goes from -1 to 0 to 1
# Pr(OR > 2) = p
con <- function(p) list(sd=psigma(2, p),
                        c1=.(x2=0), c2=.(x2=-1), c3=.(x2=1), c4=.(x2=0),
                        contrast=expression(c1-c2, c3-c4))
k <- NULL
for(p in c(.01, .05, .1, .2)) {
  g <- blrm(y ~ x1 + pol(x2, 2) + x3, method='optimizing',
            pcontrast=con(p))
  cat('-2 log likelihood:', g$deviance, '\n')
  k <- rbind(k, g$coefficients)
}
Pr(OR > 2) = 0.01 ⇒ σ=0.298
Initial log joint probability = -1480.52
Iter log prob ||dx|| ||grad|| alpha alpha0 # evals Notes
46 -1025.54 0.00435668 0.00391851 1 1 53
Optimization terminated normally:
Convergence detected: relative gradient magnitude is below tolerance
Finished in 0.1 seconds.
-2 log likelihood: 2051.08
Pr(OR > 2) = 0.05 ⇒ σ=0.421
Initial log joint probability = -1320.96
Iter log prob ||dx|| ||grad|| alpha alpha0 # evals Notes
37 -1024.13 0.00285089 0.00769745 1 1 44
Optimization terminated normally:
Convergence detected: relative gradient magnitude is below tolerance
Finished in 0.1 seconds.
-2 log likelihood: 2048.26
Pr(OR > 2) = 0.1 ⇒ σ=0.541
Initial log joint probability = -1319.01
Iter log prob ||dx|| ||grad|| alpha alpha0 # evals Notes
36 -1023.58 0.00285098 0.00490071 1 1 40
Optimization terminated normally:
Convergence detected: relative gradient magnitude is below tolerance
Finished in 0.1 seconds.
-2 log likelihood: 2047.16
Pr(OR > 2) = 0.2 ⇒ σ=0.824
Initial log joint probability = -1372.21
Iter log prob ||dx|| ||grad|| alpha alpha0 # evals Notes
39 -1023.35 0.00198249 0.00154428 0.8894 0.8894 46
Optimization terminated normally:
Convergence detected: relative gradient magnitude is below tolerance
Finished in 0.1 seconds.
-2 log likelihood: 2046.7
k
y>=2 y>=3 y>=4 y>=5 y>=6 y>=7 y>=8
[1,] 1.985206 1.023936 0.3514663 -0.2191457 -0.7563312 -1.310259 -1.913104
[2,] 2.006682 1.041192 0.3647719 -0.2093801 -0.7498310 -1.306896 -1.913078
[3,] 2.020167 1.052687 0.3743067 -0.2015923 -0.7436885 -1.302302 -1.910143
[4,] 2.037455 1.067615 0.3869954 -0.1908566 -0.7348045 -1.295228 -1.905025
y>=9 y>=10 x1 x2 x2^2 x3
[1,] -2.632084 -3.630934 1.562997 0.5561742 -0.08257481 1.610746
[2,] -2.636508 -3.640498 1.564886 0.6512892 -0.12194449 1.620939
[3,] -2.635823 -3.642373 1.565875 0.6992980 -0.14707952 1.624125
[4,] -2.633275 -3.642765 1.567856 0.7480966 -0.18288428 1.629388
# Compare with ordinary MLEs and deviance
f$deviance
[1] 2302.585 2067.549
coef(f)
y>=2 y>=3 y>=4 y>=5 y>=6 y>=7 y>=8
2.0551529 1.0831900 0.4005868 -0.1790073 -0.7246249 -1.2866643 -1.8982180
y>=9 y>=10 x1 x2 x2^2 x3
-2.6287996 -3.6408392 1.5694325 0.7900802 -0.2212900 1.6340800
Fit the model with Dirichlet priors on intercepts and wide normal priors on the \(\beta\)s. Show the model fit summary. Note that the indexes of predictive discrimination/accuracy include 0.95 highest posterior density intervals. In frequentist inference we pretend that quantities such as AUROC and \(R^2\) are estimated without error, which is far from the case.
In several places you will see an index named Symmetry
. This is a measure of the symmetry of a posterior distribution. Values farther from 1.0 indicate asymmetry, which indicates that the use of standard errors and the use of a normal approximation for the posterior distribution are not justified. The symmetry index is the ratio of the gap between the posterior mean and the 0.95 quantile of the posterior distribution to the gap between the 0.05 quantile and the mean.
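As a minimal sketch of that definition (this is not the rmsb internal code; the helper name sym is ours), the index for one parameter could be computed directly from its posterior draws:

```{r}
# Symmetry index: (0.95 quantile - posterior mean) / (posterior mean - 0.05 quantile)
sym <- function(draws) {
  q <- quantile(draws, c(0.05, 0.95))
  m <- mean(draws)
  unname((q[2] - m) / (m - q[1]))
}
# e.g. sym(bs$draws[, 'x1']) for the fit bs created below
```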
bs <- blrm(y ~ x1 + pol(x2, 2) + x3, file=rfile(bs))
bs
Bayesian Proportional Odds Ordinal Logistic Model
Dirichlet Priors With Concentration Parameter 0.233 for Intercepts
blrm(formula = y ~ x1 + pol(x2, 2) + x3, file = rfile(bs))
Frequencies of Responses
 1  2  3  4  5  6  7  8  9 10
50 50 50 50 50 50 50 50 50 50
 | Mixed Calibration/Discrimination Indexes | Discrimination Indexes | Rank Discrim. Indexes |
---|---|---|---|
Obs 500 | LOO log L -1046.84±13.18 | g 1.581 [1.331, 1.781] | C 0.739 [0.736, 0.74] |
Draws 4000 | LOO IC 2093.69±26.36 | gp 0.113 [0.084, 0.139] | Dxy 0.477 [0.471, 0.48] |
Chains 4 | Effective p 13.08±0.24 | EV 0.148 [0.104, 0.194] | |
Time 2s | B 0.077 [0.076, 0.079] | v 1.924 [1.322, 2.398] | |
p 4 | vp 0.014 [0.007, 0.02] |
 | Mode β | Mean β | Median β | S.E. | Lower | Upper | Pr(β>0) | Symmetry |
---|---|---|---|---|---|---|---|---|
y≥2 | 2.0551 | 2.0668 | 2.0645 | 0.1974 | 1.6809 | 2.4586 | 1.0000 | 1.04 |
y≥3 | 1.0831 | 1.0877 | 1.0863 | 0.1669 | 0.7508 | 1.4021 | 1.0000 | 1.03 |
y≥4 | 0.4005 | 0.4022 | 0.4035 | 0.1614 | 0.0702 | 0.7001 | 0.9945 | 1.01 |
y≥5 | -0.1791 | -0.1802 | -0.1774 | 0.1564 | -0.4826 | 0.1330 | 0.1168 | 1.01 |
y≥6 | -0.7247 | -0.7286 | -0.7271 | 0.1580 | -1.0490 | -0.4344 | 0.0000 | 1.01 |
y≥7 | -1.2868 | -1.2925 | -1.2917 | 0.1657 | -1.6196 | -0.9700 | 0.0000 | 0.98 |
y≥8 | -1.8984 | -1.9054 | -1.9020 | 0.1754 | -2.2400 | -1.5597 | 0.0000 | 0.98 |
y≥9 | -2.6290 | -2.6387 | -2.6303 | 0.1937 | -3.0138 | -2.2525 | 0.0000 | 0.95 |
y≥10 | -3.6410 | -3.6566 | -3.6498 | 0.2267 | -4.1007 | -3.2076 | 0.0000 | 0.94 |
x1 | 1.5698 | 1.5789 | 1.5772 | 0.1530 | 1.2852 | 1.8862 | 1.0000 | 1.00 |
x2 | 0.7899 | 0.7969 | 0.7982 | 0.1431 | 0.5060 | 1.0706 | 1.0000 | 1.01 |
x22 | -0.2210 | -0.2216 | -0.2250 | 0.2534 | -0.7293 | 0.2736 | 0.1870 | 1.01 |
x3 | 1.6341 | 1.6417 | 1.6409 | 0.1666 | 1.3236 | 1.9725 | 1.0000 | 1.00 |
# Show more detailed analysis of model performance measures
blrmStats(bs, pl=TRUE)
Indexes computed for a random sample of 400 of 4000 posterior draws
gp, B, EV, and vp are for intercept 1 out of 9 intercepts
Dxy C g gp B EV v vp
Mean 0.477 0.738 1.580 0.115 0.077 0.149 1.920 0.014
SE 0.003 0.001 0.110 0.014 0.001 0.023 0.265 0.003
Lower 0.471 0.736 1.369 0.089 0.076 0.108 1.435 0.008
Upper 0.480 0.740 1.787 0.142 0.079 0.192 2.430 0.020
Symmetry 0.447 0.447 0.942 1.045 1.609 1.088 1.040 1.279
Dxy: 2*(C - 0.5) C: concordance probability
g: Gini mean |difference| on linear predictor (lp)
gp: Gini on predicted probability B: Brier score
EV: explained variation on prob. scale v: var(lp) vp: var(prob)
Show basic Stan diagnostics. Had stanDxplot(bs, rstan=TRUE)
been used, intercepts would have been shifted from what is shown for bs
because of subtractions of covariate means before passing data to rstan
.
stanDxplot(bs)
stanDx(bs)
Iterations: 2000 on each of 4 chains, with 4000 posterior distribution samples saved
For each parameter, n_eff is a crude measure of effective sample size
and Rhat is the potential scale reduction factor on split chains
(at convergence, Rhat=1)
Checking sampler transitions treedepth.
Treedepth satisfactory for all transitions.
Checking sampler transitions for divergences.
No divergent transitions found.
Checking E-BFMI - sampler transitions HMC potential energy.
E-BFMI satisfactory.
Effective sample size satisfactory.
Split R-hat values satisfactory all parameters.
Processing complete, no problems detected.
EBFMI: 1.046 1.056 1.024 0.986
Parameter Rhat ESS bulk ESS tail
1 alpha[1] 1.002 8165 3414
2 alpha[2] 1.001 7251 3211
3 alpha[3] 1.001 7722 3015
4 alpha[4] 1.001 8138 3105
5 alpha[5] 1.000 9198 3077
6 alpha[6] 1.001 9772 2680
7 alpha[7] 1.003 8624 2964
8 alpha[8] 1.000 6899 3241
9 alpha[9] 1.004 6955 3372
10 beta[1] 1.001 6679 3232
11 beta[2] 1.001 9296 2841
12 beta[3] 1.001 8582 2741
13 beta[4] 1.000 6528 3476
Here are the posterior distributions, calculated using kernel density estimates from posterior draws. Posterior modes, shown as vertical lines, are parameter values that maximize the log posterior density (using the Stan optimizer in the original model fit), so they do not necessarily coincide with the peaks of the kernel density estimates.
plot(bs)
# Also show 2-d posterior density contour for two collinear terms
plot(bs, c('x2', 'x2^2'), bivar=TRUE) # assumes ellipse
plot(bs, c('x2', 'x2^2'), bivar=TRUE, bivarmethod='kernel') # kernel density
# Print frequentist side-by-side with Bayesian posterior mean, median, mode
cbind(MLE=coef(f), t(bs$param))
MLE mode mean median
y>=2 2.0551529 2.0550884 2.0667650 2.0644883
y>=3 1.0831900 1.0831184 1.0876989 1.0863129
y>=4 0.4005868 0.4004984 0.4021547 0.4035196
y>=5 -0.1790073 -0.1791006 -0.1801865 -0.1774422
y>=6 -0.7246249 -0.7247323 -0.7285530 -0.7270796
y>=7 -1.2866643 -1.2867886 -1.2924845 -1.2916686
y>=8 -1.8982180 -1.8983616 -1.9053575 -1.9020143
y>=9 -2.6287996 -2.6289616 -2.6387408 -2.6302800
y>=10 -3.6408392 -3.6410316 -3.6566111 -3.6498110
x1 1.5694325 1.5697733 1.5789189 1.5772318
x2 0.7900802 0.7898633 0.7969237 0.7981532
x2^2 -0.2212900 -0.2209874 -0.2215606 -0.2250196
x3 1.6340800 1.6341296 1.6416724 1.6409405
# Compare covariance matrix of posterior draws with MLE
round(diag(vcov(f)) / diag(vcov(bs)), 2)
y>=2 y>=3 y>=4 y>=5 y>=6 y>=7 y>=8 y>=9 y>=10 x1 x2 x2^2 x3
0.98 1.01 0.99 1.05 1.06 1.04 1.04 1.02 1.01 1.00 0.91 1.03 1.05
range(vcov(f) / vcov(bs))
[1] -1.414811 1.984495
Next show frequentist and Bayesian contrasts. For the Bayesian contrast the point estimate is the posterior mean, and the 0.95 highest posterior density interval is computed. Instead of a p-value, the posterior probability that the contrast is positive is computed.
contrast(f, list(x1=0, x3=1), list(x1=.25, x3=0))
x2 Contrast S.E. Lower Upper Z Pr(>|z|)
1 -0.001559955 1.241722 0.1720205 0.9045679 1.578876 7.22 0
Confidence intervals are 0.95 individual intervals
k <- contrast(bs, list(x1=0:1, x3=1), list(x1=.25, x3=0))
k
x2 Contrast S.E. Lower Upper Pr(Contrast>0)
1 -0.001559955 1.246943 0.1665426 0.9097081 1.563316 1
2 -0.001559955 2.825862 0.2131043 2.4133918 3.242280 1
Intervals are 0.95 highest posterior density intervals
Contrast is the posterior mean
For Bayesian contrasts we can also plot the posterior densities for the contrasts, and a 2-d highest-density contour.
plot(k)
plot(k, bivar=TRUE) # applicable when exactly 2 contrasts
plot(k, bivar=TRUE, bivarmethod='kernel')
Compute posterior probabilities for various assertions about unknown true parameter values. The PostF
function is a function generator that effectively evaluates the assertion to a 0/1 value and computes the mean of these binary values over posterior draws. As is the case with inference about the quadratic effect of x2
below, when the assertion does not evaluate to a binary 0/1 or logical TRUE/FALSE
value, it is taken as a quantity that is derived from one or more model parameters, and a posterior density is drawn for the derived parameter. We use that to get a posterior distribution on the vertex of the quadratic x2
effect.
P <- PostF(bs, pr=TRUE)   # show new short legal R names
Original Name Short Name
y>=2 a1
y>=3 a2
y>=4 a3
y>=5 a4
y>=6 a5
y>=7 a6
y>=8 a7
y>=9 a8
y>=10 a9
x1 b1
x2 b2
x2^2 b3
x3 b4
P(b3 > 0 & b1 > 1.5)
[1] 0.13625
P(b3 > 0)
[1] 0.187
P(abs(b3) < 0.25) # evidence for small |nonlinearity|
[1] 0.51025
mean(bs$draws[, 'x2^2'] > 0, na.rm=TRUE) # longhand calculation
[1] 0.187
# Plot posterior distribution for the vertex of the quadratic x2 effect
# This distribution should be wide because the relationship is linear
# (true value of b3 is zero)
plot(P(-b2 / (2 * b3)))
# Recreate the P function using original parameter names
# (which may not be legal R names)
P <- PostF(bs, name='orig')
P(`x2^2` > 0)
[1] 0.187
P(`x2^2` > 0 & x1 > 1.5)
[1] 0.13625
# Remove rstan results from fit. Compute savings in object size.
# Note: this will only be accurate when running the fits for
# the first time (not when restoring shortened forms of them from
# disc)
# Result: 33.8MB before, 0.5MB after
s1 <- format(object.size(bs), 'MB')
bs$rstan <- NULL
s2 <- format(object.size(bs), 'MB')
cat('Before:', s1, ' After:', s2, '\n')
Before: 0.5 Mb After: 0.5 Mb
1.4 Bayesian Wilcoxon Test
Since the proportional odds ordinal (PO) logistic model is a generalization of Wilcoxon/Kruskal-Wallis tests one can use Bayesian proportional odds regression to get the Bayesian equivalent to the Wilcoxon test. Even if not adjusting for covariates (impossible with the Wilcoxon test) there are advantages to putting this in a modeling framework as detailed in Section 7.6 of BBR. A major advantage is estimation ability. One can estimate group-specific means, quantiles, and exceedance probabilities. And Bayesian inference provides exact uncertainty intervals (highest posterior density intervals in what follows) for these.
The PO model does not require there to be any ties among the Y values, so it handles continuous data very well (the orm
function in the rms
package efficiently handles many thousands of distinct Y levels requiring many thousands of intercepts in the model). Let’s re-analyze the calprotectin data in Section 7.3.1 of BBR to mimic the frequentist PO analysis in Section 7.6.
# Fecal Calprotectin: 2500 is above detection limit
# When detection limits occur at a single value, the PO
# model easily handles this in terms of estimating group
# differences (but not for estimating the mean Y)
calpro <- c(2500, 244, 2500, 726, 86, 2500, 61, 392, 2500, 114, 1226,
            2500, 168, 910, 627, 2500, 781, 57, 483, 30, 925, 1027,
            2500, 2500, 38, 18)
# Endoscopy score: 1 = No/Mild, 2=Mod/Severe Disease
# Would have been far better to code dose as 4 ordinal levels
endo <- c(2, 1, 2, 2, 2, 1, 1, 2, 2, 1, 2, 2, 1, 2, 2, 2, 2, 1, 2,
          2, 2, 2, 2, 2, 1, 1)
endo <- factor(endo, 1 : 2,
               c("No or Mild Activity", "Moderate or Severe Activity"))
dd <- datadist(endo, calpro); options(datadist='dd')
bcalpro <- blrm(calpro ~ endo, file=rfile(bcalpro))
print(bcalpro, intercepts=TRUE)
Bayesian Proportional Odds Ordinal Logistic Model
Dirichlet Priors With Concentration Parameter 0.134 for Intercepts
blrm(formula = calpro ~ endo, file = rfile(bcalpro))
Frequencies of Responses
  18   30   38   57   61   86  114  168  244  392  483  627  726  781  910  925
   1    1    1    1    1    1    1    1    1    1    1    1    1    1    1    1
1027 1226 2500
   1    1    8
 | Mixed Calibration/Discrimination Indexes | Discrimination Indexes | Rank Discrim. Indexes |
---|---|---|---|
Obs 26 | LOO log L -96.27±8.93 | g 1.335 [0.526, 2.101] | C 0.854 [0.854, 0.854] |
Draws 4000 | LOO IC 192.54±17.85 | gp 0.048 [0.001, 0.137] | Dxy 0.708 [0.708, 0.708] |
Chains 4 | Effective p 33.18±3.99 | EV 0.069 [0.001, 0.215] | |
Time 0.6s | B 0.037 [0.034, 0.047] | v 2.204 [0.211, 4.674] | |
p 1 | vp 0.005 [0, 0.021] |
 | Mode β | Mean β | Median β | S.E. | Lower | Upper | Pr(β>0) | Symmetry |
---|---|---|---|---|---|---|---|---|
y≥30 | 2.0969 | 2.4310 | 2.2632 | 1.2174 | 0.4298 | 5.0078 | 0.9955 | 1.46 |
y≥38 | 1.3395 | 1.4116 | 1.3679 | 0.8412 | -0.1289 | 3.1941 | 0.9675 | 1.20 |
y≥57 | 0.8678 | 0.8578 | 0.8321 | 0.7258 | -0.5813 | 2.3007 | 0.8862 | 1.10 |
y≥61 | 0.4733 | 0.4196 | 0.4195 | 0.6801 | -0.8435 | 1.8342 | 0.7305 | 1.06 |
y≥86 | 0.1122 | 0.0320 | 0.0398 | 0.6747 | -1.2452 | 1.4131 | 0.5245 | 0.98 |
y≥114 | -0.1956 | -0.2875 | -0.2733 | 0.6860 | -1.5930 | 1.0995 | 0.3425 | 0.94 |
y≥168 | -0.4710 | -0.5852 | -0.5730 | 0.6907 | -1.8922 | 0.8220 | 0.2023 | 0.94 |
y≥244 | -0.7653 | -0.8995 | -0.8936 | 0.7197 | -2.2990 | 0.5434 | 0.1065 | 0.91 |
y≥392 | -1.0953 | -1.2446 | -1.2174 | 0.7591 | -2.6415 | 0.2876 | 0.0412 | 0.91 |
y≥483 | -1.4156 | -1.5833 | -1.5405 | 0.8185 | -3.1823 | -0.0641 | 0.0177 | 0.90 |
y≥627 | -1.6849 | -1.8630 | -1.8120 | 0.8500 | -3.4730 | -0.2315 | 0.0073 | 0.90 |
y≥726 | -1.9227 | -2.1004 | -2.0721 | 0.8699 | -3.8808 | -0.5029 | 0.0037 | 0.87 |
y≥781 | -2.1400 | -2.3255 | -2.2953 | 0.8868 | -4.1110 | -0.6828 | 0.0022 | 0.87 |
y≥910 | -2.3439 | -2.5394 | -2.5013 | 0.8986 | -4.3217 | -0.8495 | 0.0010 | 0.87 |
y≥925 | -2.5396 | -2.7495 | -2.7185 | 0.9199 | -4.6301 | -1.0576 | 0.0008 | 0.89 |
y≥1027 | -2.7313 | -2.9610 | -2.9153 | 0.9341 | -4.7583 | -1.1651 | 0.0005 | 0.88 |
y≥1226 | -2.9224 | -3.1735 | -3.1371 | 0.9486 | -4.9342 | -1.2973 | 0.0005 | 0.88 |
y≥2500 | -3.1166 | -3.3893 | -3.3525 | 0.9573 | -5.1991 | -1.5130 | 0.0000 | 0.89 |
endo=Moderate or Severe Activity | 2.7586 | 2.9293 | 2.8951 | 0.9891 | 1.0156 | 4.8402 | 0.9992 | 1.11 |
# print.blrm defaults to not showing intercepts if more than 9 of them
summary(bcalpro)
Effects   Response: calpro
 | Low | High | Δ | Effect | S.E. | Lower 0.95 | Upper 0.95 |
---|---|---|---|---|---|---|---|
endo --- No or Mild Activity:Moderate or Severe Activity | 2 | 1 | -2.92900 | 0.9891 | -4.840000 | -1.0160 | |
Odds Ratio | 2 | 1 | 0.05343 | 0.007906 | 0.3622 |
One can see that the posterior probability of a positive group difference exceeds 0.99.
Now compute the posterior mean estimate of the mean and median calprotectin levels for the unknown data generating process, stratified by group and compare with sample estimates.
# Sample estimates
tapply(calpro, endo, mean)
No or Mild Activity Moderate or Severe Activity
400.000 1372.944
tapply(calpro, endo, median)
No or Mild Activity Moderate or Severe Activity
87.5 976.0
# Now compute estimates and 0.95 HPD intervals assuming PO
# The first method is exact
newdata <- data.frame(endo=levels(endo))
bar     <- Mean(bcalpro)
predict(bcalpro, newdata, fun=bar)
$linear.predictors
[1] 307.3402 1347.6587
$lower
[1] 72.04265 933.21485
$upper
[1] 609.484 1773.567
quant <- Quantile(bcalpro)
med   <- function(lp, ...) quant(lp=lp, ...)
Predict(bcalpro, endo, fun=med)
endo yhat lower upper
1 No or Mild Activity 92.58508 18.0000 226.9042
2 Moderate or Severe Activity 984.15874 591.3656 1630.5648
Response variable (y):
Limits are 0.95 confidence limits
The contrast
function in rms
now allows one to get posterior distributions of differences in nonlinearly transformed parameters, as follows.
k <- contrast(bcalpro, list(endo=levels(endo)[1]),
              list(endo=levels(endo)[2]), fun=bar)
k
Posterior Summaries for First X Settings
Posterior Mean Posterior Median Lower 0.95 HPD Upper 0.95 HPD
1 307.3402 271.7195 72.04265 609.484
Posterior Summaries for Second X Settings
Posterior Mean Posterior Median Lower 0.95 HPD Upper 0.95 HPD
1 1347.659 1345.319 933.2149 1773.567
Posterior Summaries of First - Second
Posterior Mean Posterior Median Lower 0.95 HPD Upper 0.95 HPD
1 -1040.318 -1047.142 -1558.114 -523.2424
plot(k, which='diff')
Peeking ahead to a later section, we can use the constrained partial proportional odds model to assess the proportional odds assumption. Let's assume that departures from proportional odds (constant increments in log odds) are linear in the square root of the calprotectin level. There is only one predictor (endo
), so there is only one variable that might act non-proportionally. Hence the second formula has the same right-hand-side as the first formula in the blrm
call.
bcalp <- blrm(calpro ~ endo, ~ endo, cppo=sqrt)
Initial log joint probability = -40.16
Iter log prob ||dx|| ||grad|| alpha alpha0 # evals Notes
Error evaluating model log probability: Non-finite function evaluation.
Error evaluating model log probability: Non-finite function evaluation.
Error evaluating model log probability: Non-finite function evaluation.
99 -26.2443 0.0046237 0.150498 0.5264 0.5264 124
Iter log prob ||dx|| ||grad|| alpha alpha0 # evals Notes
199 -26.2301 0.00151754 0.00187871 0.08565 0.9691 261
Iter log prob ||dx|| ||grad|| alpha alpha0 # evals Notes
258 -26.23 0.0210887 0.00244513 1 1 339
Optimization terminated normally:
Convergence detected: relative gradient magnitude is below tolerance
Finished in 0.1 seconds.
Running MCMC with 4 chains, at most 11 in parallel...
Chain 1 finished in 0.9 seconds.
Chain 2 finished in 0.8 seconds.
Chain 3 finished in 0.9 seconds.
Chain 4 finished in 0.9 seconds.
All 4 chains finished successfully.
Mean chain execution time: 0.9 seconds.
Total execution time: 1.1 seconds.
# use cppo=function(y) y for linear x by y interaction
bcalp
Bayesian Constrained Partial Proportional Odds Ordinal Logistic Model
Dirichlet Priors With Concentration Parameter 0.134 for Intercepts
blrm(formula = calpro ~ endo, ppo = ~endo, cppo = sqrt)
Frequencies of Responses
  18   30   38   57   61   86  114  168  244  392  483  627  726  781  910  925
   1    1    1    1    1    1    1    1    1    1    1    1    1    1    1    1
1027 1226 2500
   1    1    8
 | Mixed Calibration/Discrimination Indexes | Discrimination Indexes | Rank Discrim. Indexes |
---|---|---|---|
Obs 26 | LOO log L -82.06±9.35 | g 2.003 [0.884, 3.194] | C 0.854 [0.854, 0.854] |
Draws 4000 | LOO IC 164.13±18.7 | gp 0.062 [0.001, 0.17] | Dxy 0.708 [0.708, 0.708] |
Chains 4 | Effective p 18.6±3.26 | EV 0.102 [0.001, 0.282] | |
Time 1.4s | B 0.038 [0.034, 0.056] | v 4.908 [0.388, 10.535] | |
p 1 | vp 0.007 [0, 0.033] |
 | Mode β | Mean β | Median β | S.E. | Lower | Upper | Pr(β>0) | Symmetry |
---|---|---|---|---|---|---|---|---|
endo=Moderate or Severe Activity | 4.0092 | 5.1964 | 5.0990 | 1.4119 | 2.5152 | 8.0632 | 1.0000 | 1.15 |
endo=Moderate or Severe Activity x f(y) | -0.0597 | -0.1108 | -0.1072 | 0.0444 | -0.1970 | -0.0281 | 0.0005 | 0.79 |
The probability that the effect of moderate or severe activity decreases for higher levels of calprotectin exceeds 0.99. Let's graph the logit of the group-stratified empirical CDFs to see some visual evidence for this.
Ecdf(~ calpro, group=endo, fun=qlogis)
Non-parallelism is readily seen, indicating non-proportional odds. Note that allowing for a systematic departure from proportional odds is like having unequal group variances in a 2-sample \(t\)-test, but is more general.
With the constrained partial proportional odds model one can examine for a given y the evidence for a group effect in the tail of the distribution of Y beyond y. For ordinal logistic models the group effect is captured by the relative odds that Y \(\geq\) y. We can use the rms
contrast
function to do the calculations for varying y. contrast
will evaluate the cppo
function as needed (which is a scaled and centered \(\sqrt{}\)).
ys <- seq(100, 2500, by=100)
k  <- contrast(bcalp, list(endo='Moderate or Severe Activity'),
               list(endo='No or Mild Activity'), ycut=ys)
xl <- 'Calprotectin Cutoff'
par(mfrow=c(1,2))
with(k, plot(ycut, Contrast, xlab=xl, ylab='log OR', type='l'))
with(k, plot(ycut, PP, xlab=xl, ylab='P(OR > 1)', type='l'))
1.5 Binary Regression with Restricted Cubic Splines
Turn to the support
dataset and fit a binary logistic model to predict the probability of in-hospital death of critically ill adults. blrm
keeps posterior sampling efficient by orthonormalizing the design matrix before doing the sampling (this is done internally in the Stan code). This allows for arbitrary collinearities, for example in the basis functions used in restricted cubic splines. When there are such collinearities, expect to see some disagreements in estimates between blrm
and lrm
, because the latter does not do orthonormalization (only normalization to mean 0 variance 1). Collinearity implies that there are many different solutions to the equations, all giving almost the same predicted values.
getHdata(support)
dd <- datadist(support); options(datadist='dd')
f  <- lrm(hospdead ~ dzgroup + rcs(crea, 5) + rcs(meanbp, 5),
          data=support, eps=1e-4, x=TRUE, y=TRUE)
specs(f)
lrm(formula = hospdead ~ dzgroup + rcs(crea, 5) + rcs(meanbp,
5), data = support, x = TRUE, y = TRUE, eps = 1e-04)
Label Assumption
dzgroup dzgroup category
crea Serum creatinine Day 3 rcspline
meanbp Mean Arterial Blood Pressure Day 3 rcspline
Parameters
dzgroup ARF/MOSF w/Sepsis COPD CHF Cirrhosis Coma Colon Cancer Lung Cancer MOSF w/Malig
crea 0.59998 0.8999 1.2 1.7998 5.5996
meanbp 47 65.725 78 106 128.05
d.f.
dzgroup 7
crea 4
meanbp 4
f
Logistic Regression Model
lrm(formula = hospdead ~ dzgroup + rcs(crea, 5) + rcs(meanbp, 5), data = support, x = TRUE, y = TRUE, eps = 1e-04)
Frequencies of Missing Values Due to Each Variable
hospdead  dzgroup     crea   meanbp
       0        0        3        0
 | Model Likelihood Ratio Test | Discrimination Indexes | Rank Discrim. Indexes |
---|---|---|---|
Obs 997 | LR χ2 278.86 | R2 0.360 | C 0.820 |
0 744 | d.f. 15 | R215,997 0.233 | Dxy 0.640 |
1 253 | Pr(>χ2) <0.0001 | R215,566.4 0.372 | γ 0.641 |
max |∂log L/∂β| 3×10-9 | Brier 0.138 | τa 0.243 |
 | β | S.E. | Wald Z | Pr(>|Z|) |
---|---|---|---|---|
Intercept | 6.9789 | 1.5068 | 4.63 | <0.0001 |
dzgroup=COPD | -0.6235 | 0.3382 | -1.84 | 0.0653 |
dzgroup=CHF | -2.3452 | 0.4855 | -4.83 | <0.0001 |
dzgroup=Cirrhosis | 0.3552 | 0.3626 | 0.98 | 0.3273 |
dzgroup=Coma | 1.6365 | 0.3273 | 5.00 | <0.0001 |
dzgroup=Colon Cancer | -0.9825 | 0.6296 | -1.56 | 0.1187 |
dzgroup=Lung Cancer | 0.0808 | 0.3424 | 0.24 | 0.8135 |
dzgroup=MOSF w/Malig | 0.7160 | 0.2664 | 2.69 | 0.0072 |
crea | -2.9954 | 1.1432 | -2.62 | 0.0088 |
crea' | 261.5451 | 94.6489 | 2.76 | 0.0057 |
crea'' | -576.5474 | 224.5354 | -2.57 | 0.0102 |
crea''' | 338.1780 | 149.7214 | 2.26 | 0.0239 |
meanbp | -0.1035 | 0.0231 | -4.47 | <0.0001 |
meanbp' | 0.2508 | 0.1923 | 1.30 | 0.1922 |
meanbp'' | -0.4008 | 0.7294 | -0.55 | 0.5827 |
meanbp''' | 0.0499 | 0.7022 | 0.07 | 0.9433 |
# Compute the apparent standard error of Dxy (not accounting for overfitting)
# for comparison with the Bayesian HPD interval for Dxy
rcorr.cens(predict(f), support$hospdead)
C Index Dxy S.D. n missing
8.202431e-01 6.404862e-01 2.998483e-02 9.970000e+02 3.000000e+00
uncensored Relevant Pairs Concordant Uncertain
9.970000e+02 3.764640e+05 3.087920e+05 0.000000e+00
Function(f)
function (dzgroup = "ARF/MOSF w/Sepsis", crea = 1.1999512, meanbp = 78)
{
6.9789403 - 0.62348171 * (dzgroup == "COPD") - 2.3451715 *
(dzgroup == "CHF") + 0.35516492 * (dzgroup == "Cirrhosis") +
1.6364993 * (dzgroup == "Coma") - 0.98246981 * (dzgroup ==
"Colon Cancer") + 0.08076708 * (dzgroup == "Lung Cancer") +
0.71602209 * (dzgroup == "MOSF w/Malig") - 2.9954464 *
crea + 10.463337 * pmax(crea - 0.59997559, 0)^3 - 23.065276 *
pmax(crea - 0.89990234, 0)^3 + 13.529103 * pmax(crea -
1.1999512, 0)^3 - 0.90432144 * pmax(crea - 1.7998047,
0)^3 - 0.022843078 * pmax(crea - 5.5996094, 0)^3 - 0.10352776 *
meanbp + 3.8184801e-05 * pmax(meanbp - 47, 0)^3 - 6.1008621e-05 *
pmax(meanbp - 65.725, 0)^3 + 7.5995675e-06 * pmax(meanbp -
78, 0)^3 + 1.4835639e-05 * pmax(meanbp - 106, 0)^3 +
3.8861354e-07 * pmax(meanbp - 128.05, 0)^3
}
<environment: 0x3489205f8>
bsup <- blrm(hospdead ~ dzgroup + rcs(crea, 5) + rcs(meanbp, 5),
             data=support, file=rfile(bsup))
specs(bsup)
blrm(formula = hospdead ~ dzgroup + rcs(crea, 5) + rcs(meanbp,
5), data = support, file = rfile(bsup))
Label Assumption
dzgroup dzgroup category
crea Serum creatinine Day 3 rcspline
meanbp Mean Arterial Blood Pressure Day 3 rcspline
Parameters
dzgroup ARF/MOSF w/Sepsis COPD CHF Cirrhosis Coma Colon Cancer Lung Cancer MOSF w/Malig
crea 0.59998 0.8999 1.2 1.7998 5.5996
meanbp 47 65.725 78 106 128.05
d.f.
dzgroup 7
crea 4
meanbp 4
Function(bsup) # by default uses posterior mode parameter values
function (dzgroup = "ARF/MOSF w/Sepsis", crea = 1.1999512, meanbp = 78)
{
7.2091072 - 0.65111303 * (dzgroup == "COPD") - 2.4606359 *
(dzgroup == "CHF") + 0.34510828 * (dzgroup == "Cirrhosis") +
1.6721639 * (dzgroup == "Coma") - 1.11907 * (dzgroup ==
"Colon Cancer") + 0.063989481 * (dzgroup == "Lung Cancer") +
0.73134427 * (dzgroup == "MOSF w/Malig") - 3.0387295 *
crea + 10.621004 * pmax(crea - 0.59997559, 0)^3 - 23.411858 *
pmax(crea - 0.89990234, 0)^3 + 13.730783 * pmax(crea -
1.1999512, 0)^3 - 0.91662855 * pmax(crea - 1.7998047,
0)^3 - 0.023300522 * pmax(crea - 5.5996094, 0)^3 - 0.10730655 *
meanbp + 4.0697434e-05 * pmax(meanbp - 47, 0)^3 - 6.7903832e-05 *
pmax(meanbp - 65.725, 0)^3 + 1.2293481e-05 * pmax(meanbp -
78, 0)^3 + 1.443495e-05 * pmax(meanbp - 106, 0)^3 + 4.7796827e-07 *
pmax(meanbp - 128.05, 0)^3
}
<environment: 0x34c7cc478>
# To add an intercept use e.g. Function(bsup, intercept=coef(g, 'mode')[5])
bsup
Bayesian Logistic Model
Dirichlet Priors With Concentration Parameter 0.541 for Intercepts
blrm(formula = hospdead ~ dzgroup + rcs(crea, 5) + rcs(meanbp, 5), data = support, file = rfile(bsup))
Frequencies of Missing Values Due to Each Variable
hospdead  dzgroup     crea   meanbp
       0        0        3        0
 | Mixed Calibration/Discrimination Indexes | Discrimination Indexes | Rank Discrim. Indexes |
---|---|---|---|
Obs 997 | LOO log L -442.1±18.39 | g 1.822 [1.538, 2.09] | C 0.813 [0.808, 0.819] |
0 744 | LOO IC 884.2±36.78 | gp 0.248 [0.23, 0.273] | Dxy 0.626 [0.615, 0.638] |
1 253 | Effective p 16.62±1.16 | EV 0.29 [0.243, 0.332] | |
Draws 4000 | B 0.14 [0.138, 0.142] | v 2.836 [1.892, 3.679] | |
Chains 4 | vp 0.055 [0.046, 0.064] | ||
Time 4.1s | |||
p 15 |
 | Mode β | Mean β | Median β | S.E. | Lower | Upper | Pr(β>0) | Symmetry |
---|---|---|---|---|---|---|---|---|
Intercept | 6.9786 | 7.2091 | 7.1906 | 1.5247 | 4.2257 | 10.1458 | 1.0000 | 1.06 |
dzgroup=COPD | -0.6234 | -0.6511 | -0.6424 | 0.3425 | -1.3636 | -0.0177 | 0.0257 | 0.93 |
dzgroup=CHF | -2.3458 | -2.4606 | -2.4243 | 0.5227 | -3.4718 | -1.4389 | 0.0000 | 0.82 |
dzgroup=Cirrhosis | 0.3552 | 0.3451 | 0.3482 | 0.3728 | -0.4583 | 1.0186 | 0.8250 | 0.97 |
dzgroup=Coma | 1.6366 | 1.6722 | 1.6673 | 0.3451 | 1.0172 | 2.3684 | 1.0000 | 1.05 |
dzgroup=Colon Cancer | -0.9832 | -1.1191 | -1.0664 | 0.6817 | -2.4735 | 0.1212 | 0.0318 | 0.80 |
dzgroup=Lung Cancer | 0.0809 | 0.0640 | 0.0647 | 0.3401 | -0.5677 | 0.7668 | 0.5762 | 0.97 |
dzgroup=MOSF w/Malig | 0.7161 | 0.7313 | 0.7280 | 0.2786 | 0.1904 | 1.2858 | 0.9968 | 1.02 |
crea | -2.9954 | -3.0387 | -3.0476 | 1.1083 | -5.1718 | -0.8517 | 0.0020 | 0.97 |
crea' | 261.5393 | 265.4862 | 264.2944 | 92.0751 | 88.3029 | 439.4957 | 0.9985 | 1.01 |
crea'' | -576.5328 | -585.2107 | -582.6774 | 219.0139 | -986.1243 | -151.2353 | 0.0032 | 0.98 |
crea''' | 338.1678 | 343.2193 | 343.3486 | 146.6114 | 56.2586 | 620.0492 | 0.9912 | 1.01 |
meanbp | -0.1035 | -0.1073 | -0.1065 | 0.0237 | -0.1524 | -0.0605 | 0.0000 | 0.92 |
meanbp' | 0.2508 | 0.2673 | 0.2641 | 0.1984 | -0.1096 | 0.6511 | 0.9105 | 1.02 |
meanbp'' | -0.4006 | -0.4461 | -0.4350 | 0.7531 | -1.8836 | 1.0370 | 0.2865 | 0.98 |
meanbp''' | 0.0497 | 0.0808 | 0.0832 | 0.7253 | -1.2647 | 1.5434 | 0.5398 | 1.02 |
stanDx(bsup)
Iterations: 2000 on each of 4 chains, with 4000 posterior distribution samples saved
For each parameter, n_eff is a crude measure of effective sample size
and Rhat is the potential scale reduction factor on split chains
(at convergence, Rhat=1)
Checking sampler transitions treedepth.
Treedepth satisfactory for all transitions.
Checking sampler transitions for divergences.
No divergent transitions found.
Checking E-BFMI - sampler transitions HMC potential energy.
E-BFMI satisfactory.
Effective sample size satisfactory.
Split R-hat values satisfactory all parameters.
Processing complete, no problems detected.
EBFMI: 1.019 1.044 0.929 1.026
Parameter Rhat ESS bulk ESS tail
1 alpha[1] 1.001 3903 2842
2 beta[1] 1.002 6691 2883
3 beta[2] 1.001 4000 2502
4 beta[3] 1.000 7157 2958
5 beta[4] 1.001 5485 2940
6 beta[5] 0.999 6273 2876
7 beta[6] 1.003 7425 3154
8 beta[7] 1.001 7707 3274
9 beta[8] 1.000 6137 2834
10 beta[9] 1.002 7197 2994
11 beta[10] 1.003 6880 3251
12 beta[11] 1.001 7389 3140
13 beta[12] 1.000 5268 3267
14 beta[13] 1.000 4528 3135
15 beta[14] 1.001 4935 2923
16 beta[15] 1.000 5609 3420
stanDxplot(bsup)
plot(bsup)
Show approximate relative explained variation (REV) and compare this with Wald statistics from the frequentist lrm
model.
plot(rexVar(bsup, support))
plot(anova(f, test='LR'), 'proportion chisq')
singular information matrix in lrm.fit (rank= 12 ). Offending variable(s):
crea'''
Note: To get a Bayesian equivalent of a likelihood ratio test for comparing two models use the rmsb
function compareBmods
.
Now compute odds ratios over default inter-quartile ranges for continuous predictors, based on posterior mode parameters. Also show 0.95 HPD intervals. Note that unlike the print
method, the plot
method for summary
doesn’t actually compute HPD intervals, but approximates them by assuming normality and using the standard deviation of the posterior samples. Compare the plot with the ordinary lrm
result.
s <- summary(bsup)
s
Effects   Response: hospdead
 | Low | High | Δ | Effect | S.E. | Lower 0.95 | Upper 0.95 |
---|---|---|---|---|---|---|---|
crea | 0.8999 | 1.9 | 1.00 | 1.30100 | 0.2522 | 0.83160 | 1.80000 |
Odds Ratio | 0.8999 | 1.9 | 1.00 | 3.67300 | 2.29700 | 6.04700 | |
meanbp | 64.7500 | 107.0 | 42.25 | -0.44560 | 0.2295 | -0.87970 | 0.01253 |
Odds Ratio | 64.7500 | 107.0 | 42.25 | 0.64040 | 0.41490 | 1.01300 | |
dzgroup --- COPD:ARF/MOSF w/Sepsis | 1.0000 | 2.0 | -0.65110 | 0.3425 | -1.36400 | -0.01766 | |
Odds Ratio | 1.0000 | 2.0 | 0.52150 | 0.25570 | 0.98250 | ||
dzgroup --- CHF:ARF/MOSF w/Sepsis | 1.0000 | 3.0 | -2.46100 | 0.5227 | -3.47200 | -1.43900 | |
Odds Ratio | 1.0000 | 3.0 | 0.08538 | 0.03106 | 0.23720 | ||
dzgroup --- Cirrhosis:ARF/MOSF w/Sepsis | 1.0000 | 4.0 | 0.34510 | 0.3728 | -0.45830 | 1.01900 | |
Odds Ratio | 1.0000 | 4.0 | 1.41200 | 0.63240 | 2.76900 | ||
dzgroup --- Coma:ARF/MOSF w/Sepsis | 1.0000 | 5.0 | 1.67200 | 0.3451 | 1.01700 | 2.36800 | |
Odds Ratio | 1.0000 | 5.0 | 5.32400 | 2.76600 | 10.68000 | ||
dzgroup --- Colon Cancer:ARF/MOSF w/Sepsis | 1.0000 | 6.0 | -1.11900 | 0.6817 | -2.47300 | 0.12120 | |
Odds Ratio | 1.0000 | 6.0 | 0.32660 | 0.08429 | 1.12900 | ||
dzgroup --- Lung Cancer:ARF/MOSF w/Sepsis | 1.0000 | 7.0 | 0.06399 | 0.3401 | -0.56770 | 0.76680 | |
Odds Ratio | 1.0000 | 7.0 | 1.06600 | 0.56680 | 2.15300 | ||
dzgroup --- MOSF w/Malig:ARF/MOSF w/Sepsis | 1.0000 | 8.0 | 0.73130 | 0.2786 | 0.19040 | 1.28600 | |
Odds Ratio | 1.0000 | 8.0 | 2.07800 | 1.21000 | 3.61800 |
plot(s)
plot(summary(bsup))
Draw partial effect plots with 0.95 HPD intervals. Point estimates are posterior modes (which can be easily changed).
ggplot(Predict(bsup))
Compute estimated mortality probabilities at all levels of dzgroup
adjusting covariates to medians/modes. Need funint=FALSE
to tell Predict
that fun
is a simple 1-1 function of the linear predictor (unlike Mean
, Quantile
, etc.).
Predict(bsup, dzgroup, fun=plogis, funint=FALSE)
dzgroup crea meanbp yhat lower upper
1 ARF/MOSF w/Sepsis 1.199951 78 0.11632147 0.063245920 0.17586539
2 COPD 1.199951 78 0.06614979 0.023049916 0.11084229
3 CHF 1.199951 78 0.01236042 0.001862634 0.02504638
4 Cirrhosis 1.199951 78 0.15934393 0.064660453 0.26115460
5 Coma 1.199951 78 0.40782523 0.223143658 0.57618452
6 Colon Cancer 1.199951 78 0.04788483 0.003750662 0.10476836
7 Lung Cancer 1.199951 78 0.12457605 0.059036390 0.20223857
8 MOSF w/Malig 1.199951 78 0.21530408 0.100855426 0.32769271
Response variable (y):
Adjust to: crea=1.2 meanbp=78
Limits are 0.95 confidence limits
Draw a nomogram from posterior mode parameter values.
p <- nomogram(bsup, fun=plogis, funlabel='P(death)')
plot(p)
For comparison here is a nomogram based on maximum likelihood estimates of parameters rather than posterior modes.
plot(nomogram(f, fun=plogis, funlabel='P(death)'))
2 Partial Proportional Odds Model
The proportional odds (PO) assumption is a parallelism assumption reflecting the belief that the effect of baseline variables on, say, \(Y \geq 3\) is the same as the effect on \(Y \geq 4\). To relax that assumption, Peterson & Harrell (1990) developed the partial proportional odds (PPO) model. The blrm
function accepts a second model formula in the argument named ppo
that specifies the subset of predictors for which PO is not to be assumed but for which the model is effectively polytomous (multinomial). Note that for frequentist modeling the R VGAM
package handles the PPO model (as will be shown below). VGAM
is more flexible than blrm in allowing for all sorts of model restrictions.
The presence of the second formula triggers fitting a PPO model. The default is the unconstrained PPO model, which has \((k - 2) \times q\) extra parameters for a \(k\)-category \(Y\), where \(q\) is the number of columns in the design matrix induced by the second formula. This is normally too many parameters. More typically, the constrained PPO model (Peterson & Harrell (1990) Eq. 6) is used. This model is fitted when a function is provided as the cppo
argument. Generalizing Peterson and Harrell Eq. 6 the cppo
function can be a continuous function of \(Y\) as well as being a discontinuous function such as an indicator variable that allows a category of \(Y\) (typically the first or last) to have a special effect of selected covariates. When \(Y\) is continuous, cppo
would typically be the continuous value of \(Y\) or a monotonic function of it. This induces a model that is akin to having a treatment \(\times\) time interaction in a Cox proportional hazards model, or systematic heteroskedasticity in a linear model. When Y is very skewed, it may be more reasonable to use something like cppo = function(y) y^(1/3)
. Note that post-fitting functions for estimation and prediction are not implemented for the unconstrained partial PO model.
Note that the cppo
function used to trigger the use of the constrained PPO model is never evaluated at the lowest value of Y. That is because in the data likelihood, the probability element for observations at the minimum Y is one minus the probability element at the second lowest value of Y. So you can’t give a special effect at the minimum Y. If for example you have a scale ordered as death, sick, well and you want to allow for a special effect of a covariate on death, reverse the order of levels in the factor variable representing Y and specify cppo=function(y) y == 'dead'
.
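For instance, a hedged sketch with illustrative variable names (status, tx, d), not data from this vignette:

```{r}
# Hypothetical outcome ordered dead < sick < well (illustrative data)
status  <- factor(c('dead', 'sick', 'well', 'well', 'sick'),
                  levels=c('dead', 'sick', 'well'))
# Reverse the level order so that 'dead' is the highest category, which allows
# a special covariate effect for 'dead' via the cppo function
status2 <- factor(status, levels=rev(levels(status)))   # now well < sick < dead
# f <- blrm(status2 ~ tx, ~ tx, cppo=function(y) y == 'dead', data=d)   # illustrative call
```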
2.1 Unconstrained Partial PO Model
Let’s look first at the unconstrained PPO model, which is more suitable for categorical Y with not too many categories. Consider a \(2\times 3\) table of proportions (2 treatment groups, 3 ordered outcome levels) where the treatment effect is not in PO. We will fit a PO model and see how well it tries to reconstruct the 6 proportions, then fit a PPO model. As of 2020-05-11 blrm
has not implemented predict
-type functions for PPO models, so you will see predictions done the long way (which better shows how the PPO model works). Note: the VGAM
vgam
function parameterizes PPO effects by \(y\)-specific covariate effects whereas blrm
, like Peterson and Harrell, parameterizes the model by estimating increments in the log odds for the covariate effect for \(Y \geq y\) over and above the effect for \(Y \geq 2\), where \(Y\) has values 1, 2, …, \(k\). Bayesian model specification is detailed here.
Not shown here is that blrm
also allows for random effects with PPO models to handle longitudinal data if a compound symmetry correlation pattern is reasonable.
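A hedged sketch of such a fit (variable names y, tx, time, id, and d are illustrative; cluster() is blrm's notation for per-subject random intercepts):

```{r eval=FALSE}
# Random intercepts for subjects combined with a constrained partial PO
# (non-proportional) effect for treatment in a longitudinal ordinal model
f <- blrm(y ~ tx * time + cluster(id),   # cluster() gives per-subject random intercepts
          ~ tx, cppo=function(y) y,      # treatment allowed to act non-proportionally
          data=d, file=rfile(f))
```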
p0 <- c(.4, .2, .4)
p1 <- c(.3, .1, .6)
m  <- 50           # observations per cell
m0 <- p0 * m       # from proportions to frequencies
m1 <- p1 * m
x  <- c(rep(0, m), rep(1, m))
y0 <- c(rep(1, m0[1]), rep(2, m0[2]), rep(3, m0[3]))
y1 <- c(rep(1, m1[1]), rep(2, m1[2]), rep(3, m1[3]))
y  <- c(y0, y1)
table(x, y)
y
x 1 2 3
0 20 10 20
1 15 5 30
# A PO model cannot reproduce the original proportions
f <- lrm(y ~ x)
f
Logistic Regression Model
lrm(formula = y ~ x)
 | Model Likelihood Ratio Test | Discrimination Indexes | Rank Discrim. Indexes |
---|---|---|---|
Obs 100 | LR χ2 2.96 | R2 0.034 | C 0.574 |
1 35 | d.f. 1 | R21,100 0.019 | Dxy 0.149 |
2 15 | Pr(>χ2) 0.0855 | R21,82.9 0.023 | γ 0.290 |
3 50 | Brier 0.226 | τa 0.091 | |
max |∂log L/∂β| 9×10-9 |
 | β | S.E. | Wald Z | Pr(>|Z|) |
---|---|---|---|---|
y≥2 | 0.3177 | 0.2725 | 1.17 | 0.2437 |
y≥3 | -0.3177 | 0.2725 | -1.17 | 0.2437 |
x | 0.6601 | 0.3864 | 1.71 | 0.0876 |
predict(f, data.frame(x=c(0, 1)), type='fitted.ind')
y=1 y=2 y=3
1 0.4212365 0.1575270 0.4212365
2 0.2733366 0.1418996 0.5847638
require(VGAM)
fv <- vgam(y ~ x, cumulative(reverse=TRUE, parallel=TRUE))
coef(fv)
(Intercept):1 (Intercept):2 x
0.3177002 -0.3176987 0.6600579
predict(fv, data.frame(x=c(0, 1)), type='response')
1 2 3
1 0.4212363 0.1575270 0.4212367
2 0.2733369 0.1418997 0.5847635
# Now fit a PPO model that will reproduce all cell proportions
fvppo <- vgam(y ~ x, cumulative(reverse=TRUE, parallel=FALSE))
coef(fvppo)
(Intercept):1 (Intercept):2 x:1 x:2
0.4054651 -0.4054651 0.4418328 0.8109302
predict(fvppo, data.frame(x=c(0, 1)), type='response') # perfect recovery
1 2 3
1 0.4 0.2 0.4
2 0.3 0.1 0.6
# Function to manually compute cell probabilities
pprop <- function(co, type, centered=FALSE) {
  x <- if(centered) c(-0.5, 0.5) else 0:1
  switch(type,
    vgam = {
      pge2 <- plogis(co[1] + x * co[3])
      peq3 <- plogis(co[2] + x * co[4])
      rbind(c(1 - pge2[1], pge2[1] - peq3[1], peq3[1]),
            c(1 - pge2[2], pge2[2] - peq3[2], peq3[2]))
    }, blrm = {
      pge2 <- plogis(co[1] + x * co[3])
      peq3 <- plogis(co[2] + x * (co[3] + co[4]))
      rbind(c(1 - pge2[1], pge2[1] - peq3[1], peq3[1]),
            c(1 - pge2[2], pge2[2] - peq3[2], peq3[2]))
    } )
}
co <- coef(vgam(y ~ x, cumulative(reverse=TRUE, parallel=FALSE)))
pprop(co, type='vgam')
[,1] [,2] [,3]
[1,] 0.4 0.2 0.4
[2,] 0.3 0.1 0.6
# Now try blrm
# First fit an unconstrained partial PO model
b <- blrm(y ~ x, ~ x, method='opt')
Initial log joint probability = -114.692
Iter log prob ||dx|| ||grad|| alpha alpha0 # evals Notes
16 -102.474 0.00286257 0.000959234 1 1 19
Optimization terminated normally:
Convergence detected: relative gradient magnitude is below tolerance
Finished in 0.1 seconds.
coef(b)
y>=2 y>=3 x x:y>=3
0.405330 -0.405536 0.442030 0.368990
pprop(coef(b), type='blrm')
[,1] [,2] [,3]
[1,] 0.4000324 0.1999846 0.3999830
[2,] 0.2999870 0.1000085 0.6000045
# Fit a constrained PPO model
b <- blrm(y ~ x, ~ x, cppo=function(y) y, method='opt')
Initial log joint probability = -109.168
Iter log prob ||dx|| ||grad|| alpha alpha0 # evals Notes
19 -96.9501 0.00097211 0.000261165 0.7506 0.7506 23
Optimization terminated normally:
Convergence detected: relative gradient magnitude is below tolerance
Finished in 0.1 seconds.
coef(b)
y>=2 y>=3 x x x f(y)
0.4055070 -0.4054533 -0.2965560 0.3691440
pprop(coef(b), type='blrm')
[,1] [,2] [,3]
[1,] 0.3999899 0.2000072 0.4000028
[2,] 0.4727892 0.1096672 0.4175436
# First mimic PO model by penalizing PPO term to nearly zero
# Quickly get maximum likelihood estimates (posterior modes)
b <- blrm(y ~ x, ~x, priorsdppo=0.01, method='opt')
Initial log joint probability = -105.482
Iter log prob ||dx|| ||grad|| alpha alpha0 # evals Notes
20 -94 0.00182497 0.000148781 0.9861 0.9861 29
Optimization terminated normally:
Convergence detected: relative gradient magnitude is below tolerance
Finished in 0.1 seconds.
coef(b)
y>=2 y>=3 x x:y>=3
3.177030e-01 -3.176982e-01 6.600460e-01 1.560706e-05
pprop(coef(b), type='blrm')
[,1] [,2] [,3]
[1,] 0.4212356 0.1575275 0.4212368
[2,] 0.2733387 0.1418968 0.5847645
# Now really fit PPO model, at first only getting MLE
# Do full posterior sampling
b <- blrm(y ~ x, ~ x, priorsdppo=1000, method='opt')
Initial log joint probability = -116.995
Iter log prob ||dx|| ||grad|| alpha alpha0 # evals Notes
16 -104.777 0.00286617 0.000960758 1 1 19
Optimization terminated normally:
Convergence detected: relative gradient magnitude is below tolerance
Finished in 0.1 seconds.
coef(b) # also the posterior mode
y>=2 y>=3 x x:y>=3
0.4053560 -0.4055613 0.4419820 0.3690740
coef(b)[3] + coef(b)[4]
x
0.811056
bppo <- blrm(y ~ x, ~ x, priorsdppo=1000, file=rfile(bppo))
# take differences in last 2 coefficients to get our scheme
# Check recovery of proportions, using posterior mode/mean/median
pprop(coef(bppo, 'mode'), type='blrm')
[,1] [,2] [,3]
[1,] 0.4000262 0.1999969 0.3999769
[2,] 0.2999916 0.1000013 0.6000071
pprop(coef(bppo, 'mean'), type='blrm')
[,1] [,2] [,3]
[1,] 0.3965603 0.2098856 0.3935541
[2,] 0.2887990 0.1149495 0.5962515
pprop(coef(bppo, 'median'), type='blrm')
[,1] [,2] [,3]
[1,] 0.3963650 0.2079793 0.3956557
[2,] 0.2909916 0.1139354 0.5950730
bppo
Bayesian Partial Proportional Odds Ordinal Logistic Model
Dirichlet Priors With Concentration Parameter 0.541 for Intercepts
blrm(formula = y ~ x, ppo = ~x, priorsdppo = 1000, file = rfile(bppo))
Mixed Calibration/ Discrimination Indexes |
Discrimination Indexes |
Rank Discrim. Indexes |
|
---|---|---|---|
Obs 100 | LOO log L -101.73±4.63 | g 0.266 [0, 0.585] | C 0.607 [0.355, 0.645] |
1 35 | LOO IC 203.47±9.25 | gp 0.059 [0.001, 0.129] | Dxy 0.213 [-0.29, 0.29] |
2 15 | Effective p 4.04±0.36 | EV 0.021 [0, 0.073] | |
3 50 | B 0.229 [0.225, 0.237] | v 0.098 [0, 0.339] | |
Draws 4000 | vp 0.005 [0, 0.016] | ||
Chains 4 | |||
Time 0.6s | |||
p 1 |
Mode β | Mean β | Median β | S.E. | Lower | Upper | Pr(β>0) | Symmetry | |
---|---|---|---|---|---|---|---|---|
y≥2 | 0.4054 | 0.4198 | 0.4206 | 0.2857 | -0.1776 | 0.9563 | 0.9275 | 1.03 |
y≥3 | -0.4056 | -0.4324 | -0.4236 | 0.2891 | -0.9956 | 0.1230 | 0.0622 | 0.98 |
x | 0.4420 | 0.4814 | 0.4699 | 0.4330 | -0.3961 | 1.2988 | 0.8692 | 1.07 |
x:y≥3 | 0.3691 | 0.3409 | 0.3386 | 0.3100 | -0.2381 | 0.9707 | 0.8648 | 1.01 |
2.2 Constrained Partial PO Model
Consider the same dataset analyzed above. Specify a constrained PPO model that in this particular case is really unconstrained because it has a total of two parameters to handle the group effect, and there are only \(k=3\) levels of Y.
# cppo function specifies that there is a special effect of x for y=3
bcppo <- blrm(y ~ x, ~ x, cppo=function(y) y == 3, file=rfile(bcppo))
cppo  <- bcppo$cppo
cppo
function (y)
y == 3
<environment: 0x34f02d890>
b <- coef(bcppo, 'mode')
rbind(Mode=b, Mean=coef(bcppo))
y>=2 y>=3 x x x f(y)
Mode 0.4053560 -0.4055614 0.4419820 0.3690740
Mean 0.4201117 -0.4333377 0.4753288 0.3440668
# Compute the 4 cumulative probabilities using the posterior mode (MLE)
# b[3] and b[4] are multiplied by x (hence drop out of the x=0 row)
L <- rbind('x=0' = c(b[1], b[2]),
           'x=1' = c(b[1] + b[3] + cppo(2) * b[4], b[2] + b[3] + cppo(3) * b[4]))
plogis(L)
y>=2 y>=3
x=0 0.5999738 0.3999769
x=1 0.7000084 0.6000071
Now consider the severity of nausea data from Peterson & Harrell (1990).
d0 <- data.frame(tx=0, y=c(rep(0, 43), rep(1, 39), rep(2, 13), rep(3, 22),
                           rep(4, 15), rep(5, 29)))
d1 <- data.frame(tx=1, y=c(rep(0, 7), rep(1, 7), rep(2, 3), rep(3, 12),
                           rep(4, 15), rep(5, 14)))
d  <- rbind(d0, d1)
d$tx <- factor(d$tx, 0:1, c('No cisplatin', 'cisplatin'))
dd <- datadist(d); options(datadist='dd')
with(d, table(tx, y))
y
tx 0 1 2 3 4 5
No cisplatin 43 39 13 22 15 29
cisplatin 7 7 3 12 15 14
# Allow for a different effect of tx at y=5
g <- function(y) y == 5   # max(y)=5 and y is discrete
# Check against maximum likelihood estimates in Peterson & Harrell
f <- blrm(y ~ tx, ~ tx, cppo=g, data=d, method='opt')
Initial log joint probability = -387.608
Iter log prob ||dx|| ||grad|| alpha alpha0 # evals Notes
23 -367.405 0.00315823 0.00395241 1 1 27
Optimization terminated normally:
Convergence detected: relative gradient magnitude is below tolerance
Finished in 0.1 seconds.
# Compute the treatment effect log(OR) for y=1, 2, 3, 4, 5
k <- coef(f)
k
y>=1 y>=2 y>=3 y>=4
0.99762818 -0.01286082 -0.32561932 -1.01304282
y>=5 tx=cisplatin tx=cisplatin x f(y)
-1.54661927 1.06393152 -0.62925705
k[6] + g(1:5) * k[7]   # matches paper
[1] 1.0639315 1.0639315 1.0639315 1.0639315 0.4346745
# Now get posterior distributions of parameters
fp <- blrm(y ~ tx, ~ tx, cppo=g, data=d, file=rfile(bnausea))
Initial log joint probability = -387.608
Iter log prob ||dx|| ||grad|| alpha alpha0 # evals Notes
23 -367.405 0.00315823 0.00395241 1 1 27
Optimization terminated normally:
Convergence detected: relative gradient magnitude is below tolerance
Finished in 0.1 seconds.
Running MCMC with 4 chains, at most 11 in parallel...
Chain 2 finished in 0.6 seconds.
Chain 3 finished in 0.5 seconds.
Chain 1 finished in 0.6 seconds.
Chain 4 finished in 0.5 seconds.
All 4 chains finished successfully.
Mean chain execution time: 0.6 seconds.
Total execution time: 0.7 seconds.
rbind(coef(f), coef(fp, 'mode'), coef(fp, 'mean'))
y>=1 y>=2 y>=3 y>=4 y>=5 tx=cisplatin
[1,] 0.9976282 -0.012860819 -0.3256193 -1.013043 -1.546619 1.063932
[2,] 0.9976282 -0.012860819 -0.3256193 -1.013043 -1.546619 1.063932
[3,] 1.0101354 -0.009347481 -0.3262213 -1.016829 -1.569046 1.079022
tx=cisplatin x f(y)
[1,] -0.6292571
[2,] -0.6292571
[3,] -0.6558878
k <- coef(fp)   # posterior means
k[6] + g(1:5) * k[7]   # close to paper
[1] 1.0790225 1.0790225 1.0790225 1.0790225 0.4231347
dat <- data.frame(tx=levels(d$tx))
# Get posterior mean and 0.95 HPD intervals for treatment
# effects at all levels of y
contrast(fp, list(tx='cisplatin'), list(tx='No cisplatin'), ycut=1:5)
Contrast S.E. Lower Upper Pr(Contrast>0)
1 y=1 1.0790225 0.2844305 0.5399123 1.645820 1.0000
2* y=2 1.0790225 0.2844305 0.5399123 1.645820 1.0000
3* y=3 1.0790225 0.2844305 0.5399123 1.645820 1.0000
4* y=4 1.0790225 0.2844305 0.5399123 1.645820 1.0000
5 y=5 0.4231347 0.3525891 -0.3211363 1.062355 0.8842
Redundant contrasts are denoted by *
Intervals are 0.95 highest posterior density intervals
Contrast is the posterior mean
Predict(fp, tx)
tx yhat lower upper
1 No cisplatin -0.009347481 -0.3202163 0.2767592
2 cisplatin 1.069675010 0.5691994 1.6162737
Response variable (y): log odds
Limits are 0.95 confidence limits
If the response variable is discrete and has character strings for the ordered factor levels, you can use these strings in the cppo function definition. For example, suppose that Y was a factor variable with levels "ok", "in pain", "stroke", and "death". To allow a treatment to have a different effect for the last two levels while adjusting for a covariate age that is assumed to operate in proportional odds, one can code
f <- blrm(y ~ age + tx, ~ tx, cppo=function(y) y %in% c('stroke', 'death'))
There are special arguments to some of the rms functions for getting estimates or predictions from partial PO models. For the Predict function you can specify ycut or kint to give the response variable value such that the logit or probability of \(Y\geq y\) is being estimated. For contrast (full name contrast.rms) you can specify the argument ycut as a scalar or vector. When it is a scalar, that value of the y cutoff is used for all contrasts. When it is a vector, it is assumed either (1) to have length equal to the number of contrasts being specified, so that the appropriate value of y is used for each contrast, or (2) when only one contrast is requested, the contrast is repeated length(ycut) times to estimate the effect of varying y. This was done in the last part of the example above.
When the response variable is continuous, it is more flexible to specify ycut to Predict than to specify kint, because with ycut you can specify a value of Y that did not occur in the data (when the cppo function is continuous), taking into account the degree of non-proportional odds at that value.
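For concreteness, here is a minimal sketch (not run in the original document) of using ycut with the nausea fit fp from above; the output will of course depend on the posterior draws.
# Treatment effect specifically on the P(Y >= 5) scale
Predict(fp, tx, ycut=5)
# A single contrast evaluated at one y cutoff (scalar ycut)
contrast(fp, list(tx='cisplatin'), list(tx='No cisplatin'), ycut=5)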
3 Longitudinal Data Examples: Random Effects
3.1 Schizophrenia Dataset from the mixor Package
The R mixor package fits frequentist random effects proportional odds models. Let's analyze the first dataset discussed in the mixor package vignette. The outcome is a 4-level severity of illness scale. Unfortunately, mixor is no longer on CRAN, so we access the dataset on the Vanderbilt Department of Biostatistics data repository and use the ordinal package to fit the random intercepts model.
require(ordinal)
getHdata(schizophrenia)
d <- schizophrenia
f <- clmm2(factor(imps79o) ~ sqrt(Week) * TxDrug, random=factor(id),
           link='logistic', Hess=TRUE, data=d)
summary(f)
Cumulative Link Mixed Model fitted with the Laplace approximation
Call:
clmm2(location = factor(imps79o) ~ sqrt(Week) * TxDrug, random = factor(id),
data = d, Hess = TRUE, link = "logistic")
Random effects:
Var Std.Dev
factor(id) 3.606788 1.899154
Location coefficients:
Estimate Std. Error z value Pr(>|z|)
sqrt(Week) -0.7678 0.0145 -52.8160 < 2.22e-16
TxDrug -0.0681 0.0143 -4.7666 1.8733e-06
sqrt(Week):TxDrug -1.1982 0.0146 -82.1808 < 2.22e-16
No scale coefficients
Threshold coefficients:
Estimate Std. Error z value
1|2 -5.8488 0.0143 -408.0444
2|3 -2.8233 0.0144 -196.6312
3|4 -0.7143 0.0828 -8.6313
log-likelihood: -1708.111
AIC: 3430.221
Condition number of Hessian: 46.31894
sqrt(diag(vcov(f)))
1|2 2|3 3|4 sqrt(Week)
0.01433368 0.01435845 0.08275618 0.01453735
TxDrug sqrt(Week):TxDrug
0.01428859 0.01458059 0.01362287
Fit the same model using the Bayesian approach.
bmixor <- blrm(imps79o ~ sqrt(Week) * TxDrug + cluster(id), data=d,
               file=rfile(bmixor))
Running MCMC with 4 chains, at most 11 in parallel...
Chain 3 finished in 8.3 seconds.
Chain 1 finished in 8.6 seconds.
Chain 2 finished in 8.5 seconds.
Chain 4 finished in 8.6 seconds.
All 4 chains finished successfully.
Mean chain execution time: 8.5 seconds.
Total execution time: 8.7 seconds.
bmixor
Bayesian Proportional Odds Ordinal Logistic Model
Dirichlet Priors With Concentration Parameter 0.455 for Intercepts
blrm(formula = imps79o ~ sqrt(Week) * TxDrug + cluster(id), data = d, file = rfile(bmixor))
Frequencies of Responses
1 2 3 4 190 474 412 527
Mixed Calibration/ Discrimination Indexes |
Discrimination Indexes |
Rank Discrim. Indexes |
|
---|---|---|---|
Obs 1603 | B 0.198 [0.194, 0.201] | g 1.942 [1.759, 2.106] | C 0.769 [0.769, 0.771] |
Draws 4000 | gp 0.338 [0.313, 0.355] | Dxy 0.538 [0.537, 0.541] | |
Chains 4 | EV 0.397 [0.348, 0.436] | ||
Time 10.4s | v 2.977 [2.416, 3.472] | ||
p 3 | vp 0.094 [0.082, 0.104] | ||
Cluster on id |
|||
Clusters 437 | |||
σγ 1.9378 [1.7124, 2.1695] |
Mean β | Median β | S.E. | Lower | Upper | Pr(β>0) | Symmetry | |
---|---|---|---|---|---|---|---|
y≥2 | 5.8517 | 5.8505 | 0.3296 | 5.1667 | 6.4509 | 1.0000 | 1.00 |
y≥3 | 2.8202 | 2.8212 | 0.2872 | 2.2496 | 3.3629 | 1.0000 | 1.02 |
y≥4 | 0.7032 | 0.7051 | 0.2753 | 0.1836 | 1.2561 | 0.9948 | 1.03 |
Week | -0.7647 | -0.7641 | 0.1333 | -1.0306 | -0.5136 | 0.0000 | 0.98 |
TxDrug | -0.0583 | -0.0500 | 0.3146 | -0.6694 | 0.5673 | 0.4310 | 0.97 |
Week × TxDrug | -1.2075 | -1.2068 | 0.1525 | -1.5099 | -0.9072 | 0.0000 | 1.03 |
The posterior median for \(\sigma_\gamma\) is 1.93776, which compares well with the ordinal estimate of 1.899. The \(\hat{\beta}\)s also compare very well. Note that the model is stated differently, which makes two of the intercepts have different meanings across packages.
3.2 Simulated Random Effects Longitudinal Data
Let’s generate some data with repeatedly measured outcome per subject where the outcome is binary and the random effects have a \(N(0, 0.25^2)\) distribution. 500 subjects have 10 measurements each.
n  <- 500               # subjects
set.seed(2)
re <- rnorm(n) * 0.25   # worked fine also with rnorm(n) * 4
X  <- runif(n)          # baseline covariate, will be duplicated over repeats
m  <- 10                # measurements per subject

id <- rep(1 : n, each = m)
x  <- X[id]
L  <- x + re[id]        # actual logit
y  <- ifelse(runif(n * m) <= plogis(L), 1, 0)
f  <- lrm(y ~ x, x=TRUE, y=TRUE)   # ordinary fit
f
# now use cluster sandwich covariance estimator:
Logistic Regression Model
lrm(formula = y ~ x, x = TRUE, y = TRUE)
Model Likelihood Ratio Test |
Discrimination Indexes |
Rank Discrim. Indexes |
|
---|---|---|---|
Obs 5000 | LR χ2 93.67 | R2 0.025 | C 0.581 |
0 1908 | d.f. 1 | R21,5000 0.018 | Dxy 0.162 |
1 3092 | Pr(>χ2) <0.0001 | R21,3539.7 0.026 | γ 0.162 |
max |∂log L/∂β| 1×10-13 | Brier 0.232 | τa 0.076 |
β | S.E. | Wald Z | Pr(>|Z|) | |
---|---|---|---|---|
Intercept | 0.0087 | 0.0566 | 0.15 | 0.8783 |
x | 0.9432 | 0.0983 | 9.60 | <0.0001 |
g <- robcov(f, id)   # covariance matrix adjusted for clustering
g
Logistic Regression Model
lrm(formula = y ~ x, x = TRUE, y = TRUE)
Model Likelihood Ratio Test |
Discrimination Indexes |
Rank Discrim. Indexes |
|
---|---|---|---|
Obs 5000 | LR χ2 93.67 | R2 0.025 | C 0.581 |
0 1908 | d.f. 1 | R21,5000 0.018 | Dxy 0.162 |
1 3092 | Pr(>χ2) <0.0001 | R21,3539.7 0.026 | γ 0.162 |
Cluster on id |
Brier 0.232 | τa 0.076 | |
Clusters 500 | |||
max |∂log L/∂β| 1×10-13 |
β | S.E. | Wald Z | Pr(>|Z|) | |
---|---|---|---|---|
Intercept | 0.0087 | 0.0613 | 0.14 | 0.8876 |
x | 0.9432 | 0.1056 | 8.93 | <0.0001 |
We first fit an inappropriate Bayesian model in which the random effects are omitted.
# Note: loo defaults to FALSE when n > 1000 as in this case
# Need loo for compareBmods
breo <- blrm(y ~ x, loo=TRUE, file=rfile(breo))
Initial log joint probability = -5481.46
Iter log prob ||dx|| ||grad|| alpha alpha0 # evals Notes
14 -3277.38 0.040852 2.66905e-05 1 1 15
Optimization terminated normally:
Convergence detected: relative gradient magnitude is below tolerance
Finished in 0.1 seconds.
Running MCMC with 4 chains, at most 11 in parallel...
Chain 4 finished in 5.7 seconds.
Chain 2 finished in 5.8 seconds.
Chain 1 finished in 5.9 seconds.
Chain 3 finished in 6.2 seconds.
All 4 chains finished successfully.
Mean chain execution time: 5.9 seconds.
Total execution time: 6.3 seconds.
breo
Bayesian Logistic Model
Dirichlet Priors With Concentration Parameter 0.541 for Intercepts
blrm(formula = y ~ x, loo = TRUE, file = rfile(breo))
Mixed Calibration/ Discrimination Indexes |
Discrimination Indexes |
Rank Discrim. Indexes |
|
---|---|---|---|
Obs 5000 | LOO log L -3279.46±18.95 | g 0.329 [0.258, 0.393] | C 0.581 [0.581, 0.581] |
0 1908 | LOO IC 6558.91±37.89 | gp 0.077 [0.06, 0.091] | Dxy 0.162 [0.162, 0.162] |
1 3092 | Effective p 2.09±0.02 | EV 0.019 [0.012, 0.026] | |
Draws 4000 | B 0.232 [0.232, 0.232] | v 0.083 [0.048, 0.114] | |
Chains 4 | vp 0.004 [0.003, 0.006] | ||
Time 9.1s | |||
p 1 |
Mode β | Mean β | Median β | S.E. | Lower | Upper | Pr(β>0) | Symmetry | |
---|---|---|---|---|---|---|---|---|
Intercept | 0.0087 | 0.0082 | 0.0083 | 0.0594 | -0.1121 | 0.1193 | 0.5575 | 1.03 |
x | 0.9432 | 0.9445 | 0.9461 | 0.1027 | 0.7528 | 1.1569 | 1.0000 | 1.02 |
Now use a proper Bayesian random effects model. The prior distribution for the standard deviation \(\sigma_{\gamma}\) of the random effects (\(\gamma\)s) is assumed to be exponential when psigma=2, and we will use the default mean of 1.0.
bre <- blrm(y ~ x + cluster(id), psigma=2, loo=TRUE, file=rfile(bre))
Running MCMC with 4 chains, at most 11 in parallel...
Chain 3 finished in 19.5 seconds.
Chain 2 finished in 19.8 seconds.
Chain 1 finished in 20.4 seconds.
Chain 4 finished in 21.0 seconds.
All 4 chains finished successfully.
Mean chain execution time: 20.2 seconds.
Total execution time: 21.2 seconds.
bre
Bayesian Logistic Model
Dirichlet Priors With Concentration Parameter 0.541 for Intercepts
blrm(formula = y ~ x + cluster(id), psigma = 2, loo = TRUE, file = rfile(bre))
Mixed Calibration/ Discrimination Indexes |
Discrimination Indexes |
Rank Discrim. Indexes |
|
---|---|---|---|
Obs 5000 | LOO log L -3273.23±19.25 | g 0.333 [0.267, 0.411] | C 0.581 [0.581, 0.581] |
0 1908 | LOO IC 6546.47±38.5 | gp 0.078 [0.062, 0.095] | Dxy 0.162 [0.162, 0.162] |
1 3092 | Effective p 83.32±0.67 | EV 0.019 [0.011, 0.028] | |
Draws 4000 | B 0.232 [0.232, 0.232] | v 0.085 [0.048, 0.121] | |
Chains 4 | vp 0.005 [0.003, 0.006] | ||
Time 24.6s | |||
p 1 | |||
Cluster on id |
|||
Clusters 500 | |||
σγ 0.2926 [0.1418, 0.4154] |
Mean β | Median β | S.E. | Lower | Upper | Pr(β>0) | Symmetry | |
---|---|---|---|---|---|---|---|
Intercept | 0.0106 | 0.0105 | 0.0624 | -0.1110 | 0.1325 | 0.5698 | 1.02 |
x | 0.9583 | 0.9568 | 0.1074 | 0.7665 | 1.1812 | 1.0000 | 1.00 |
plot(bre)
Before delving more into the random effects model, let’s compare this new model with the previous model that erroneously omitted the random effects.
compareBmods(breo, bre)
Method: stacking
------
weight
model1 0.051
model2 0.949
Roughly speaking, of the two models, the one with random effects has a probability of about 0.95 of being the correct one. See rstan::loo and loo::loo.array for details.
Now let’s get into more details from the random effects model fit.
# Plot distribution of the 500 estimated random effects (posterior medians)
hist(bre$gammas, xlab='Estimated Random Effects', nclass=40)
Now generate similar data except with a bimodal random effects distribution. This will fool the normal random-effects prior into estimating a wider variance for a single normal distribution, but will still result in estimated random effects that are somewhat realistic.
n  <- 500
set.seed(3)
re <- c(rnorm(n/2, mean=-1.75), rnorm(n/2, mean=1.75)) * 0.25
cat('SD of real random effects:', round(sd(re), 4), '\n')
SD of real random effects: 0.5115
X <- runif(n)   # baseline covariate, will be duplicated over repeats
m <- 10         # measurements per subject

id <- rep(1 : n, each = m)
x  <- X[id]
L  <- x + re[id]   # actual logit
y  <- ifelse(runif(n * m) <= plogis(L), 1, 0)
breb <- blrm(y ~ x + cluster(id), file=rfile(breb))
Running MCMC with 4 chains, at most 11 in parallel...
Chain 2 finished in 18.6 seconds.
Chain 4 finished in 18.4 seconds.
Chain 1 finished in 18.7 seconds.
Chain 3 finished in 18.8 seconds.
All 4 chains finished successfully.
Mean chain execution time: 18.6 seconds.
Total execution time: 19.0 seconds.
breb
Bayesian Logistic Model
Dirichlet Priors With Concentration Parameter 0.541 for Intercepts
blrm(formula = y ~ x + cluster(id), file = rfile(breb))
Mixed Calibration/ Discrimination Indexes |
Discrimination Indexes |
Rank Discrim. Indexes |
|
---|---|---|---|
Obs 5000 | B 0.233 [0.233, 0.234] | g 0.347 [0.262, 0.432] | C 0.578 [0.578, 0.578] |
0 1928 | gp 0.08 [0.061, 0.099] | Dxy 0.156 [0.156, 0.156] | |
1 3072 | EV 0.021 [0.012, 0.031] | ||
Draws 4000 | v 0.092 [0.052, 0.14] | ||
Chains 4 | vp 0.005 [0.003, 0.007] | ||
Time 22.3s | |||
p 1 | |||
Cluster on id |
|||
Clusters 500 | |||
σγ 0.5866 [0.4965, 0.6855] |
Mean β | Median β | S.E. | Lower | Upper | Pr(β>0) | Symmetry | |
---|---|---|---|---|---|---|---|
Intercept | -0.0124 | -0.0122 | 0.0811 | -0.1777 | 0.1367 | 0.4410 | 0.99 |
x | 1.0833 | 1.0802 | 0.1506 | 0.8025 | 1.3836 | 1.0000 | 1.05 |
par(mfrow=c(2, 2))
hist(breb$gammas, xlab='Estimated Random Effects', nclass=40, main='')
hist(re, xlab='Real Random Effects', nclass=40, main='')
plot(re, breb$gammas, xlab='Real', ylab='Estimated')
abline(a=0, b=1)
3.3 Absorbing State in Mixed Effects Ordinal Regression
blrm is not designed to handle this situation, but let's see how it performs.
For an ordinal outcome y=0, 1, 2, 3, 4, 5, suppose that y=5 represents an absorbing state such as death. Suppose that subjects are observed for 10 days, and if death occurs within those days, all later values of y for that subject are set to 5. Generate repeated outcomes under a \(N(0, 0.25^2)\) random effects model with two treatments: a and b. The b:a odds ratio is 0.65 and the cell probabilities are 0.3, 0.3, 0.1, 0.1, 0.1, 0.1 corresponding to y=0-5, when the random effect is zero.
# Generate data as if there is no absorbing state
n <- 1000
set.seed(6)
pa <- c(.3, .3, .1, .1, .1, .1)       # P(Y=0-5 | tx=a, random effect=0)
pb <- pomodm(p=pa, odds.ratio=0.65)   # P(Y=0-5 | tx=b, re=0)   # Hmisc
round(pb, 3)
[1] 0.397 0.300 0.084 0.078 0.072 0.067
re <- rnorm(n) * 0.25
tx <- c(rep('a', n/2), rep('b', n/2))   # will be duplicated over repeats
m  <- 10                                # measurements per subject

id   <- rep(1 : n, each = m)
time <- rep(1 : m, n)
or   <- exp(log(0.65) * (tx[id] == 'b') + re[id])
y    <- integer(n * m)
for(j in 1 : (n * m)) {
  p    <- pomodm(p=pa, odds.ratio=or[j])
  y[j] <- sample(0:5, 1, p, replace=TRUE)
}
Tx <- tx[id]
table(Tx, y)
y
Tx 0 1 2 3 4 5
a 1517 1448 506 516 491 522
b 2008 1524 424 379 320 345
The first Bayesian proportional odds model fitted is the one that exactly matches the data generation model, as we have not yet imposed an absorbing state, so that outcomes with y < 5 can appear after a y=5 outcome for the subject.
bst <- blrm(y ~ Tx + cluster(id), file=rfile(bst))
Running MCMC with 4 chains, at most 11 in parallel...
Chain 3 finished in 63.6 seconds.
Chain 4 finished in 65.6 seconds.
Chain 2 finished in 66.1 seconds.
Chain 1 finished in 66.4 seconds.
All 4 chains finished successfully.
Mean chain execution time: 65.4 seconds.
Total execution time: 66.6 seconds.
bst
Bayesian Proportional Odds Ordinal Logistic Model
Dirichlet Priors With Concentration Parameter 0.345 for Intercepts
blrm(formula = y ~ Tx + cluster(id), file = rfile(bst))
Frequencies of Responses
0 1 2 3 4 5 3525 2972 930 895 811 867
Mixed Calibration/ Discrimination Indexes |
Discrimination Indexes |
Rank Discrim. Indexes |
|
---|---|---|---|
Obs 10000 | B 0.226 [0.226, 0.226] | g 0.239 [0.203, 0.278] | C 0.594 [0.594, 0.594] |
Draws 4000 | gp 0.054 [0.047, 0.064] | Dxy 0.189 [0.189, 0.189] | |
Chains 4 | EV 0.013 [0.009, 0.017] | ||
Time 72.6s | v 0.057 [0.041, 0.077] | ||
p 1 | vp 0.003 [0.002, 0.004] | ||
Cluster on id |
|||
Clusters 1000 | |||
σγ 0.3085 [0.2425, 0.373] |
Mean β | Median β | S.E. | Lower | Upper | Pr(β>0) | Symmetry | |
---|---|---|---|---|---|---|---|
y≥1 | 0.8704 | 0.8705 | 0.0310 | 0.8075 | 0.9300 | 1.0000 | 1.04 |
y≥2 | -0.3992 | -0.3996 | 0.0305 | -0.4551 | -0.3351 | 0.0000 | 1.02 |
y≥3 | -0.8561 | -0.8563 | 0.0313 | -0.9156 | -0.7934 | 0.0000 | 1.01 |
y≥4 | -1.4119 | -1.4116 | 0.0341 | -1.4777 | -1.3431 | 0.0000 | 1.01 |
y≥5 | -2.1785 | -2.1784 | 0.0416 | -2.2617 | -2.1000 | 0.0000 | 0.97 |
Tx=b | -0.4776 | -0.4775 | 0.0409 | -0.5569 | -0.3950 | 0.0000 | 0.95 |
stanDx(bst)
Iterations: 2000 on each of 4 chains, with 4000 posterior distribution samples saved
For each parameter, n_eff is a crude measure of effective sample size
and Rhat is the potential scale reduction factor on split chains
(at convergence, Rhat=1)
Checking sampler transitions treedepth.
Treedepth satisfactory for all transitions.
Checking sampler transitions for divergences.
No divergent transitions found.
Checking E-BFMI - sampler transitions HMC potential energy.
E-BFMI satisfactory.
Effective sample size satisfactory.
Split R-hat values satisfactory all parameters.
Processing complete, no problems detected.
EBFMI: 0.636 0.684 0.671 0.729
Parameter Rhat ESS bulk ESS tail
1 alpha[1] 1.002 4542 3203
2 alpha[2] 1.000 4363 3062
3 alpha[3] 1.000 4339 2828
4 alpha[4] 1.001 3965 2693
5 alpha[5] 1.001 4644 3087
6 beta[1] 1.001 3795 2899
7 sigmag[1] 1.003 990 2061
stanDxplot(bst, 'ALL')
If time were to be added to the above model, you'll see that its regression coefficient is very small (\(\hat{\beta}=0.009\) in this case), in alignment with the data generating model.
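A sketch of the fit being referred to (it is not run in this document); the object name bstt is hypothetical.
bstt <- blrm(y ~ Tx + time + cluster(id))   # hypothetical fit adding time on the original data
coef(bstt)['time']                          # the text reports a value of about 0.009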
Now assume that state y=5 is an absorbing state. Change observations after the first y=5 within subject to also have y=5.
require(data.table)
g <- function(x) if(length(x)) min(x, na.rm=TRUE) else 99L
u <- data.table(id, time, Tx, y, key='id')
# Add variable 'first' which is time of first y=5 for subject (99 if never)
w <- u[, .(first=g(time[y == 5])), by=id]
d <- u[w]

# Show distribution of first time of y=5
table(d[time == 1, first])
1 2 3 4 5 6 7 8 9 10 99
82 70 66 70 50 55 39 51 43 43 431
# Set all observations after the first y=5 to also have y=5
z <- d
z[time > first, y := 5]
table(u$y); table(d$y); table(z$y)
0 1 2 3 4 5
3525 2972 930 895 811 867
0 1 2 3 4 5
2628 2072 652 609 558 3481
0 1 2 3 4 5
2628 2072 652 609 558 3481
bcf <- blrm(y ~ Tx + cluster(id), data=z, file=rfile(bcf))
Running MCMC with 4 chains, at most 11 in parallel...
Chain 2 finished in 50.8 seconds.
Chain 4 finished in 52.5 seconds.
Chain 1 finished in 53.1 seconds.
Chain 3 finished in 53.6 seconds.
All 4 chains finished successfully.
Mean chain execution time: 52.5 seconds.
Total execution time: 53.8 seconds.
bcf
Bayesian Proportional Odds Ordinal Logistic Model
Dirichlet Priors With Concentration Parameter 0.345 for Intercepts
blrm(formula = y ~ Tx + cluster(id), data = z, file = rfile(bcf))
Frequencies of Responses
0 1 2 3 4 5 2628 2072 652 609 558 3481
Mixed Calibration/ Discrimination Indexes |
Discrimination Indexes |
Rank Discrim. Indexes |
|
---|---|---|---|
Obs 10000 | B 0.252 [0.248, 0.256] | g 0.379 [0.265, 0.495] | C 0.608 [0.608, 0.608] |
Draws 4000 | gp 0.089 [0.061, 0.114] | Dxy 0.216 [0.216, 0.216] | |
Chains 4 | EV 0.034 [0.016, 0.055] | ||
Time 59.9s | v 0.147 [0.068, 0.242] | ||
p 1 | vp 0.008 [0.004, 0.013] | ||
Cluster on id |
|||
Clusters 1000 | |||
σγ 1.7571 [1.6515, 1.8622] |
Mean β | Median β | S.E. | Lower | Upper | Pr(β>0) | Symmetry | |
---|---|---|---|---|---|---|---|
y≥1 | 2.0791 | 2.0791 | 0.0851 | 1.9229 | 2.2618 | 1.0000 | 0.98 |
y≥2 | 0.8482 | 0.8486 | 0.0836 | 0.6833 | 1.0098 | 1.0000 | 0.97 |
y≥3 | 0.4531 | 0.4526 | 0.0834 | 0.2915 | 0.6163 | 1.0000 | 0.99 |
y≥4 | 0.0467 | 0.0478 | 0.0829 | -0.1089 | 0.2132 | 0.7130 | 0.96 |
y≥5 | -0.3886 | -0.3874 | 0.0827 | -0.5468 | -0.2249 | 0.0000 | 1.00 |
Tx=b | -0.7589 | -0.7606 | 0.1137 | -0.9796 | -0.5409 | 0.0000 | 1.05 |
hist(bcf$gammas, xlab='Estimated Random Effects', nclass=40, main='')
The regression coefficient for treatment is too large in absolute value (the true value is log(0.65) = -0.43). The standard deviation of the random effects is also much too large (the true value is 0.25), reflecting increased dependence of outcomes within subject due to the duplication of y=5 records. However, the data being analyzed were not formally generated from the model with a treatment odds ratio of 0.65. Repeated correlated ordinal outcomes were generated with that odds ratio and a random effect standard deviation of 0.25, but the outcomes were then overridden in the following fashion: once y=5 first occurs for a subject, all of that subject's later records are overridden to y=5.
The histogram of estimated subject random effects (posterior medians) shows some bimodality with a heavy right tail due to the y=5 absorbing state. Let's also plot the random effects against the time of death (99 if the subject did not die, recoded here to 15).
t5 <- subset(z, time == 1)$first
t5 <- ifelse(t5 == 99, 15, t5)
plot(t5, bcf$gammas, xlab='Time of y=5', ylab='Random Effect')
What happens when time is added to this model?
bcft <- blrm(y ~ Tx + time + cluster(id), data=z, file=rfile(bcft))
Running MCMC with 4 chains, at most 11 in parallel...
Chain 1 finished in 54.9 seconds.
Chain 4 finished in 54.8 seconds.
Chain 3 finished in 56.5 seconds.
Chain 2 finished in 62.1 seconds.
All 4 chains finished successfully.
Mean chain execution time: 57.1 seconds.
Total execution time: 62.2 seconds.
bcft
Bayesian Proportional Odds Ordinal Logistic Model
Dirichlet Priors With Concentration Parameter 0.345 for Intercepts
blrm(formula = y ~ Tx + time + cluster(id), data = z, file = rfile(bcft))
Frequencies of Responses
0 1 2 3 4 5 2628 2072 652 609 558 3481
Mixed Calibration/ Discrimination Indexes |
Discrimination Indexes |
Rank Discrim. Indexes |
|
---|---|---|---|
Obs 10000 | B 0.244 [0.24, 0.248] | g 0.987 [0.913, 1.067] | C 0.636 [0.634, 0.636] |
Draws 4000 | gp 0.208 [0.193, 0.221] | Dxy 0.271 [0.269, 0.272] | |
Chains 4 | EV 0.14 [0.122, 0.157] | ||
Time 68.6s | v 0.743 [0.632, 0.869] | ||
p 2 | vp 0.033 [0.029, 0.037] | ||
Cluster on id |
|||
Clusters 1000 | |||
σγ 1.9434 [1.839, 2.0604] |
Mean β | Median β | S.E. | Lower | Upper | Pr(β>0) | Symmetry | |
---|---|---|---|---|---|---|---|
y≥1 | 0.8640 | 0.8651 | 0.1002 | 0.6692 | 1.0536 | 1.0000 | 0.96 |
y≥2 | -0.4449 | -0.4431 | 0.0994 | -0.6469 | -0.2607 | 0.0000 | 0.97 |
y≥3 | -0.8784 | -0.8768 | 0.0997 | -1.0816 | -0.6931 | 0.0000 | 0.95 |
y≥4 | -1.3392 | -1.3377 | 0.1009 | -1.5320 | -1.1370 | 0.0000 | 0.96 |
y≥5 | -1.8612 | -1.8583 | 0.1022 | -2.0551 | -1.6577 | 0.0000 | 0.96 |
Tx=b | -0.8413 | -0.8388 | 0.1302 | -1.0864 | -0.5879 | 0.0000 | 0.97 |
time | 0.2612 | 0.2610 | 0.0075 | 0.2469 | 0.2765 | 1.0000 | 1.03 |
We see that the slope for time is very large, yet the treatment effect and the random-effects standard deviation are still much too large.
Look at random effects again.
hist(bcft$gammas, xlab='Estimated Random Effects', nclass=40, main='')
plot(t5, bcft$gammas, xlab='Time of y=5', ylab='Random Effect')
Next we truncate patient records so that y=5 is not carried forward.
zt  <- z[time <= first]
bnc <- blrm(y ~ Tx + cluster(id), data=zt, file=rfile(bnc))
Running MCMC with 4 chains, at most 11 in parallel...
Chain 1 finished in 45.1 seconds.
Chain 4 finished in 45.5 seconds.
Chain 3 finished in 45.7 seconds.
Chain 2 finished in 46.7 seconds.
All 4 chains finished successfully.
Mean chain execution time: 45.8 seconds.
Total execution time: 46.9 seconds.
bnc
Bayesian Proportional Odds Ordinal Logistic Model
Dirichlet Priors With Concentration Parameter 0.345 for Intercepts
blrm(formula = y ~ Tx + cluster(id), data = zt, file = rfile(bnc))
Frequencies of Responses
0 1 2 3 4 5 2628 2072 652 609 558 569
Mixed Calibration/ Discrimination Indexes |
Discrimination Indexes |
Rank Discrim. Indexes |
|
---|---|---|---|
Obs 7088 | B 0.231 [0.231, 0.231] | g 0.237 [0.189, 0.286] | C 0.592 [0.592, 0.592] |
Draws 4000 | gp 0.054 [0.043, 0.064] | Dxy 0.185 [0.185, 0.185] | |
Chains 4 | EV 0.013 [0.008, 0.018] | ||
Time 51.9s | v 0.057 [0.035, 0.081] | ||
p 1 | vp 0.003 [0.002, 0.004] | ||
Cluster on id |
|||
Clusters 1000 | |||
σγ 0.3124 [0.2157, 0.3992] |
Mean β | Median β | S.E. | Lower | Upper | Pr(β>0) | Symmetry | |
---|---|---|---|---|---|---|---|
y≥1 | 0.8326 | 0.8325 | 0.0402 | 0.7531 | 0.9087 | 1.0000 | 1.01 |
y≥2 | -0.4155 | -0.4153 | 0.0377 | -0.4910 | -0.3442 | 0.0000 | 1.06 |
y≥3 | -0.8783 | -0.8789 | 0.0386 | -0.9564 | -0.8064 | 0.0000 | 1.01 |
y≥4 | -1.4311 | -1.4316 | 0.0414 | -1.5142 | -1.3541 | 0.0000 | 0.95 |
y≥5 | -2.2167 | -2.2165 | 0.0497 | -2.3158 | -2.1204 | 0.0000 | 0.99 |
Tx=b | -0.4763 | -0.4754 | 0.0496 | -0.5698 | -0.3734 | 0.0000 | 0.97 |
Finally, add time to the above model.
bnct <- blrm(y ~ Tx + time + cluster(id), data=zt, file=rfile(bnct))
Running MCMC with 4 chains, at most 11 in parallel...
Chain 1 finished in 44.3 seconds.
Chain 4 finished in 45.2 seconds.
Chain 2 finished in 45.4 seconds.
Chain 3 finished in 47.1 seconds.
All 4 chains finished successfully.
Mean chain execution time: 45.5 seconds.
Total execution time: 47.4 seconds.
bnct
Bayesian Proportional Odds Ordinal Logistic Model
Dirichlet Priors With Concentration Parameter 0.345 for Intercepts
blrm(formula = y ~ Tx + time + cluster(id), data = zt, file = rfile(bnct))
Frequencies of Responses
0 1 2 3 4 5 2628 2072 652 609 558 569
Mixed Calibration/ Discrimination Indexes |
Discrimination Indexes |
Rank Discrim. Indexes |
|
---|---|---|---|
Obs 7088 | B 0.231 [0.231, 0.232] | g 0.249 [0.201, 0.296] | C 0.548 [0.547, 0.55] |
Draws 4000 | gp 0.057 [0.046, 0.068] | Dxy 0.096 [0.095, 0.101] | |
Chains 4 | EV 0.013 [0.008, 0.018] | ||
Time 52.2s | v 0.058 [0.036, 0.08] | ||
p 2 | vp 0.003 [0.002, 0.004] | ||
Cluster on id |
|||
Clusters 1000 | |||
σγ 0.3171 [0.2183, 0.4052] |
Mean β | Median β | S.E. | Lower | Upper | Pr(β>0) | Symmetry | |
---|---|---|---|---|---|---|---|
y≥1 | 0.8147 | 0.8138 | 0.0524 | 0.7182 | 0.9246 | 1.0000 | 0.98 |
y≥2 | -0.4348 | -0.4346 | 0.0516 | -0.5351 | -0.3328 | 0.0000 | 1.05 |
y≥3 | -0.8978 | -0.8977 | 0.0529 | -0.9961 | -0.7890 | 0.0000 | 1.02 |
y≥4 | -1.4516 | -1.4522 | 0.0556 | -1.5657 | -1.3526 | 0.0000 | 0.98 |
y≥5 | -2.2376 | -2.2375 | 0.0623 | -2.3584 | -2.1146 | 0.0000 | 0.96 |
Tx=b | -0.4789 | -0.4789 | 0.0486 | -0.5735 | -0.3840 | 0.0000 | 0.99 |
time | 0.0044 | 0.0045 | 0.0080 | -0.0115 | 0.0197 | 0.7105 | 0.98 |
The time effect is very weak, and adding it did not change the already-accurate (with respect to the first part of the data generating mechanism) treatment effect posterior mean.
4 Censored Data
The blrm function handles left-, right-, and interval-censored ordinal categorical or continuous Y. This opens up numerous possibilities, for example
- one can analyze the usual right-censored time-to-event outcome but assuming, for example, proportional odds instead of proportional hazards
- there can be more than one lower limit of detectability for a lab measurement
- in a longitudinal study certain ranges of the outcome scale may not be assessed on certain days
As an example of the third situation, suppose that Y is defined as follows:
Level of Y | Meaning |
---|---|
0 | best quality of life |
1 | very good QOL |
2 | fair QOL |
3 | poor QOL |
4 | myocardial infarction |
5 | stroke |
6 | death |
Suppose that Y were assessed weekly and that the clinical events of MI, stroke, or death are always known when they occur. But suppose that QOL is only assessed once per month. Instead of dealing with complex missing data methods, consider Y to be partially assessed by the use of left censoring. On weeks of non-assessment of QOL consider Y to just be known to be < 4 when the participant is event-free.
blrm uses the Ocens function ("ordinal censoring") to handle censored Y. The notation is Ocens(a, b), where a is the lowest value that Y might be and b is the highest value it might be for a certain observation. If a=b, the observation is uncensored. If in a given week a patient has a clinical event, that event overrides any level of QOL and will result in Y=5, for example, for stroke. For a week in which a clinical event does not occur, we know that Y < 4. When QOL is assessed, we know which of Y=0,1,2,3 pertains. When QOL is not assessed, Y = Ocens(0, 4). It should be clear that if QOL were not assessed for any participant, the dependent variable is really a 4-level outcome (3 clinical outcomes, with Y=0 denoting that no bad outcome occurred for the participant).
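A minimal sketch of the notation with made-up values, using the inclusive-interval form that also appears in the simulation below:
a <- c(2, 1, 5)    # lower limits
b <- c(2, 2, 6)    # upper limits
Y <- Ocens(a, b)   # uncensored Y=2; Y known only to be in [1,2]; Y known only to be in [5,6]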
In full-likelihood models such as our extended PO model, censored data are easily handled. One just has to compute the contribution to the log-likelihood for each observation from the information it provides. An observation interval-censored with Y \(\in [3,4]\) has a likelihood equal to the model's probability that Y is between 3 and 4 inclusive. For a cumulative probability model this is easily derived from \(P(Y \geq 3) - P(Y > 4)\), which is a difference in expits.
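A tiny worked example of that computation, using made-up values (alpha3 is the intercept for \(Y\geq 3\), alpha5 the intercept for \(Y\geq 5\), and xb the linear predictor for the observation):
alpha3 <- 0.9; alpha5 <- -1.0; xb <- 0.4            # hypothetical numbers for illustration
lik <- plogis(alpha3 + xb) - plogis(alpha5 + xb)    # P(3 <= Y <= 4) = P(Y >= 3) - P(Y > 4)
log(lik)                                            # this observation's log-likelihood contribution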
Here is an example where we simulate a dataset with a 6-level ordinal response variable with no censored values, and fit the PO model to it. Then the dataset is duplicated and all the observations with y=1 or 2 are left censored at <= 2 (which is the same as interval censoring on [1,2]), all those with y=3 or 4 are interval censored in [3,4], and all those with y=5 or 6 are right censored at y >= 5. The model is re-fitted to see if the posterior mean regression coefficients remain relatively unchanged.
set.seed(1)
n <- 500
x <- rnorm(n)
y <- as.integer(cut2(x + rnorm(n), g=6))
f <- blrm(y ~ x, file=rfile(bnocens))
Initial log joint probability = -1158.27
Iter log prob ||dx|| ||grad|| alpha alpha0 # evals Notes
16 -756.799 0.000200797 0.00309593 0.8894 0.8894 19
Optimization terminated normally:
Convergence detected: relative gradient magnitude is below tolerance
Finished in 0.2 seconds.
Running MCMC with 4 chains, at most 11 in parallel...
Chain 1 finished in 1.0 seconds.
Chain 2 finished in 1.0 seconds.
Chain 3 finished in 1.0 seconds.
Chain 4 finished in 1.0 seconds.
All 4 chains finished successfully.
Mean chain execution time: 1.0 seconds.
Total execution time: 1.4 seconds.
m  <- length(x)
y2 <- y[1 : m]
a  <- b <- y2
# Left censor obs with y <= 2
i    <- y2 <= 2
a[i] <- 1
b[i] <- 2
# Interval censor obs with y = 3 or 4
i    <- y2 == 3 | y2 == 4
a[i] <- 3
b[i] <- 4
# Right censor obs with y = 5 or 6
i    <- y2 >= 5
a[i] <- 5
b[i] <- 6
table(y2, paste0('[', a, ',', b, ']'))
y2 [1,2] [3,4] [5,6]
1 84 0 0
2 83 0 0
3 0 83 0
4 0 84 0
5 0 0 83
6 0 0 83
Y <- Ocens(c(y, a), c(y, b))
x <- c(x, x[1 : m])
g <- blrm(Y ~ x, file=rfile(bcens))
Initial log joint probability = -1812.75
Iter log prob ||dx|| ||grad|| alpha alpha0 # evals Notes
17 -1183.86 0.00654922 0.0027525 0.9676 0.9676 22
Optimization terminated normally:
Convergence detected: relative gradient magnitude is below tolerance
Finished in 0.2 seconds.
Running MCMC with 4 chains, at most 11 in parallel...
Chain 1 finished in 2.0 seconds.
Chain 2 finished in 1.9 seconds.
Chain 3 finished in 1.9 seconds.
Chain 4 finished in 2.0 seconds.
All 4 chains finished successfully.
Mean chain execution time: 2.0 seconds.
Total execution time: 2.3 seconds.
rbind('No cens:mode'=coef(f, 'mode'),
'No cens:mean'=coef(f, 'mean'),
'Cens:mode' =coef(g, 'mode'),
'Cens:mean' =coef(g, 'mean'))
y>=2 y>=3 y>=4 y>=5 y>=6 x
No cens:mode 2.199384 0.9533240 -0.012507468 -1.015330 -2.288416 1.532672
No cens:mean 2.206683 0.9550262 -0.013765782 -1.017101 -2.296361 1.538970
Cens:mode 2.231817 0.9764970 -0.006108156 -1.024490 -2.313503 1.581117
Cens:mean 2.237361 0.9767551 -0.008692016 -1.026865 -2.316603 1.585905
5 Multiple Imputation
When possible, full joint Bayesian modeling of (possibly missing) covariates and the outcome variable should be used to get exact inference in the presence of missing covariate values. Another good approach is to use multiple imputation with stacking of posterior draws after running the Bayesian model on each completed dataset. When doing posterior inference on the stacked posterior draws, the uncertainty from multiple imputation is fully taken into account, as is the resulting change in the shape of the posterior distribution. Frequentist inference requires complex adjustments, as multiple imputation alters the sampling distribution of model parameter estimates. For example, regression coefficient estimates that have a normal distribution with complete data may have a \(t\)-like distribution after multiple imputation.
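The stacking itself is conceptually simple. A minimal sketch, assuming draws1, ..., draws5 are matrices of posterior draws for the same parameters, one matrix per completed dataset (these object names are hypothetical):
stacked <- rbind(draws1, draws2, draws3, draws4, draws5)    # pool the draws across imputations
colMeans(stacked)                                           # posterior means reflecting imputation uncertainty
apply(stacked, 2, quantile, probs=c(0.025, 0.975))          # simple equal-tailed intervals from the stack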
blrm works with the stackMI function to make posterior stacking easier. It works with Hmisc::aregImpute and the mice package. stackMI is the analog of Hmisc::fit.mult.impute but is much simpler due to the use of the Bayesian paradigm.
Here is an example adapted from the aregImpute help file. The aregImpute result is stored so that the random seed-initiated multiple imputation process will not be re-run unless necessary. That also allows the Stan code not to be re-run until the underlying data or the code in the chunks changes.
set.seed(2)
n  <- 1000
x1 <- factor(sample(c('a','b','c'), n, TRUE))
x2 <- (x1=='b') + 3*(x1=='c') + rnorm(n, 0, 2)
x3 <- rnorm(n)
xbeta <- 0.35 * (x2 + 1 * (x1 == 'c') + 0.2 * x3)
y  <- ifelse(runif(n) <= plogis(xbeta), 1, 0)
x1[1:250]   <- NA
x2[251:350] <- NA
d  <- data.frame(x1, x2, x3, y, stringsAsFactors=TRUE)

mithere <- file.exists('mi.rds')
mi <- if(mithere) readRDS('mi.rds') else
        aregImpute(~ y + x1 + x2 + x3, nk=3, data=d, B=10, n.impute=5, pr=FALSE)
if(! mithere) saveRDS(mi, 'mi.rds', compress='xz')
mi
Multiple Imputation using Bootstrap and PMM
aregImpute(formula = ~y + x1 + x2 + x3, data = d, n.impute = 5,
nk = 3, pr = FALSE, B = 10)
n: 1000 p: 4 Imputations: 5 nk: 3
Number of NAs:
y x1 x2 x3
0 250 100 0
type d.f.
y l 1
x1 c 2
x2 s 1
x3 s 2
Transformation of Target Variables Forced to be Linear
R-squares for Predicting Non-Missing Values for Each Variable
Using Last Imputations of Predictors
x1 x2
0.283 0.376
# Note: The following model will be re-run every time aregImpute runs
# because imputations have randomness
bmi <- stackMI(y ~ x1 + x2 + x3, blrm, mi, data=d, refresh=50, file=rfile(bmi))
Initial log joint probability = -897.95
Iter log prob ||dx|| ||grad|| alpha alpha0 # evals Notes
26 -575.969 0.00271884 0.00409143 1 1 29
Optimization terminated normally:
Convergence detected: relative gradient magnitude is below tolerance
Finished in 0.2 seconds.
Initial log joint probability = -1031.45
Iter log prob ||dx|| ||grad|| alpha alpha0 # evals Notes
18 -585.022 0.0014463 0.00534971 1 1 19
Optimization terminated normally:
Convergence detected: relative gradient magnitude is below tolerance
Finished in 0.2 seconds.
Initial log joint probability = -1100.47
Iter log prob ||dx|| ||grad|| alpha alpha0 # evals Notes
20 -580.146 0.0550563 0.0061046 1 1 24
Optimization terminated normally:
Convergence detected: relative gradient magnitude is below tolerance
Finished in 0.2 seconds.
Initial log joint probability = -857.204
Iter log prob ||dx|| ||grad|| alpha alpha0 # evals Notes
17 -582.573 0.00756208 0.0020154 1 1 18
Optimization terminated normally:
Convergence detected: relative gradient magnitude is below tolerance
Finished in 0.2 seconds.
Initial log joint probability = -1207.4
Iter log prob ||dx|| ||grad|| alpha alpha0 # evals Notes
25 -582.707 0.00282346 0.00289274 0.5782 0.05782 29
Optimization terminated normally:
Convergence detected: relative gradient magnitude is below tolerance
Finished in 0.2 seconds.
stanDx(bmi)
Diagnostics for each of 5 imputations
Iterations: 2000 on each of 4 chains, with 4000 posterior distribution samples saved
For each parameter, n_eff is a crude measure of effective sample size
and Rhat is the potential scale reduction factor on split chains
(at convergence, Rhat=1)
Imputation 1
Checking sampler transitions treedepth.
Treedepth satisfactory for all transitions.
Checking sampler transitions for divergences.
No divergent transitions found.
Checking E-BFMI - sampler transitions HMC potential energy.
E-BFMI satisfactory.
Effective sample size satisfactory.
Split R-hat values satisfactory all parameters.
Processing complete, no problems detected.
EBFMI: 1.233 1.147 1.063 1.064
Parameter Rhat ESS bulk ESS tail
1 alpha[1] 1.001 4649 3371
2 beta[1] 1.001 5218 3302
3 beta[2] 1.001 4651 3242
4 beta[3] 1.001 4611 3191
5 beta[4] 1.002 4595 3331
Imputation 2
Checking sampler transitions treedepth.
Treedepth satisfactory for all transitions.
Checking sampler transitions for divergences.
No divergent transitions found.
Checking E-BFMI - sampler transitions HMC potential energy.
E-BFMI satisfactory.
Effective sample size satisfactory.
Split R-hat values satisfactory all parameters.
Processing complete, no problems detected.
EBFMI: 1.055 1.069 1.179 1.132
Parameter Rhat ESS bulk ESS tail
1 alpha[1] 1.001 4131 3145
2 beta[1] 1.002 4727 3208
3 beta[2] 1.002 4356 2971
4 beta[3] 1.001 4552 3325
5 beta[4] 1.001 4748 3219
Imputation 3
Checking sampler transitions treedepth.
Treedepth satisfactory for all transitions.
Checking sampler transitions for divergences.
No divergent transitions found.
Checking E-BFMI - sampler transitions HMC potential energy.
E-BFMI satisfactory.
Effective sample size satisfactory.
Split R-hat values satisfactory all parameters.
Processing complete, no problems detected.
EBFMI: 1.186 1.116 1.125 1.006
Parameter Rhat ESS bulk ESS tail
1 alpha[1] 1.000 4860 3448
2 beta[1] 1.001 4216 3166
3 beta[2] 1.001 4049 2961
4 beta[3] 1.000 4080 3163
5 beta[4] 1.000 4449 3547
Imputation 4
Checking sampler transitions treedepth.
Treedepth satisfactory for all transitions.
Checking sampler transitions for divergences.
No divergent transitions found.
Checking E-BFMI - sampler transitions HMC potential energy.
E-BFMI satisfactory.
Effective sample size satisfactory.
Split R-hat values satisfactory all parameters.
Processing complete, no problems detected.
EBFMI: 1.171 1.003 1.065 1.124
Parameter Rhat ESS bulk ESS tail
1 alpha[1] 1.000 4079 3293
2 beta[1] 1.002 4910 3119
3 beta[2] 1.000 4283 3038
4 beta[3] 1.001 4492 2828
5 beta[4] 1.001 4773 2995
Imputation 5
Checking sampler transitions treedepth.
Treedepth satisfactory for all transitions.
Checking sampler transitions for divergences.
No divergent transitions found.
Checking E-BFMI - sampler transitions HMC potential energy.
E-BFMI satisfactory.
Effective sample size satisfactory.
Split R-hat values satisfactory all parameters.
Processing complete, no problems detected.
EBFMI: 1.099 1.138 1.136 1.064
Parameter Rhat ESS bulk ESS tail
1 alpha[1] 1.000 4390 3031
2 beta[1] 1.001 5146 3391
3 beta[2] 1.001 4121 2936
4 beta[3] 1.000 4653 2761
5 beta[4] 1.000 5132 3209
stanDxplot(bmi, which='x1=c', rev=TRUE, stripsize=5)
plot(bmi)
One can see that the 5 individual posterior distributions for the frequently missing variables x1 and x2 vary a lot, but not so for the never-missing variable x3.
Computations done on the bmi object will automatically use the full stacked posterior distribution.
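For example (a sketch only; not run in this document), the usual rms post-fit functions may be applied to bmi and will draw on all 5 × 4000 stacked posterior samples:
dd <- datadist(d); options(datadist='dd')    # set up for Predict
Predict(bmi, x2)                             # predictions with 0.95 HPD limits
contrast(bmi, list(x1='c'), list(x1='a'))    # contrast between two levels of x1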
6 Scaling and Interpretation of Priors on \(\beta\)s
blrm orthonormalizes the data design matrix using the QR decomposition to greatly improve posterior distribution sampling in the case of collinearities among predictors (especially among spline basis functions). This created complexities for rmsb prior to version 1.0-0. With version 1.0-0, priors are specified only on contrasts, and these contrasts are automatically translated to QR space, so users do not need to account for QR. The \(\tau\) parameters are held out of QR via the keepsep argument, and the prior variance for these regression coefficients is provided directly; in the future this will be handled through contrasts also.
7 Speed of blrm For Large Numbers of Y Levels
When there is a large number of intercepts in the model, the speed of blrm will decrease. What about the speed of using blrm just to get (potentially penalized) maximum likelihood estimates? Let's try fitting a progressively more continuous dependent variable.
set.seed(1)
n <- 1000
x <- rnorm(n)
y <- x + rnorm(n)
for(g in c(2, 4, 8, 16, 32, 64, 128, 256)) {
  cat('\n', g, 'distinct values of y\n')
  yg <- cut2(y, g=g)
  print(system.time(f <- blrm(yg ~ x, method='optimizing')))
}
2 distinct values of y
Initial log joint probability = -753.369
Iter log prob ||dx|| ||grad|| alpha alpha0 # evals Notes
10 -512.881 0.0154697 0.00680246 1 1 13
Optimization terminated normally:
Convergence detected: relative gradient magnitude is below tolerance
Finished in 0.2 seconds.
user system elapsed
0.061 0.039 0.440
4 distinct values of y
Initial log joint probability = -1528.9
Iter log prob ||dx|| ||grad|| alpha alpha0 # evals Notes
16 -1112.68 0.000413208 0.00760905 1 1 17
Optimization terminated normally:
Convergence detected: relative gradient magnitude is below tolerance
Finished in 0.2 seconds.
user system elapsed
0.059 0.056 0.424
8 distinct values of y
Initial log joint probability = -2614
Iter log prob ||dx|| ||grad|| alpha alpha0 # evals Notes
21 -1760.76 0.001458 0.00459888 1 1 23
Optimization terminated normally:
Convergence detected: relative gradient magnitude is below tolerance
Finished in 0.2 seconds.
user system elapsed
0.059 0.057 0.425
16 distinct values of y
Initial log joint probability = -3416.31
Iter log prob ||dx|| ||grad|| alpha alpha0 # evals Notes
18 -2417.06 0.00259355 0.00659989 1 1 19
Optimization terminated normally:
Convergence detected: relative gradient magnitude is below tolerance
Finished in 0.2 seconds.
user system elapsed
0.059 0.043 0.424
32 distinct values of y
Initial log joint probability = -4182.99
Iter log prob ||dx|| ||grad|| alpha alpha0 # evals Notes
18 -3054.09 0.00391769 0.00415608 1 1 19
Optimization terminated normally:
Convergence detected: relative gradient magnitude is below tolerance
Finished in 0.2 seconds.
user system elapsed
0.060 0.049 0.427
64 distinct values of y
Initial log joint probability = -4691.63
Iter log prob ||dx|| ||grad|| alpha alpha0 # evals Notes
14 -3622.95 0.00535222 0.00972658 1 1 18
Optimization terminated normally:
Convergence detected: relative gradient magnitude is below tolerance
Finished in 0.2 seconds.
user system elapsed
0.061 0.047 0.430
128 distinct values of y
Initial log joint probability = -5148.01
Iter log prob ||dx|| ||grad|| alpha alpha0 # evals Notes
15 -4024.05 0.0029433 0.00422298 1 1 18
Optimization terminated normally:
Convergence detected: relative gradient magnitude is below tolerance
Finished in 0.2 seconds.
user system elapsed
0.063 0.047 0.423
256 distinct values of y
Initial log joint probability = -5141.45
Iter log prob ||dx|| ||grad|| alpha alpha0 # evals Notes
16 -4044.65 0.00106451 0.0025487 0.6525 0.6525 19
Optimization terminated normally:
Convergence detected: relative gradient magnitude is below tolerance
Finished in 0.2 seconds.
user system elapsed
0.071 0.098 0.442
This is impressive. For g=256, compare with the execution time of the Newton-Raphson method making optimum use of sparse matrices. Also compare coefficients. When sampling is done, the default Dirichlet distribution concentration parameter for the intercepts is selected to make the posterior means agree with maximum likelihood estimates, sacrificing some performance of posterior modes. When method='optimizing', a concentration parameter of 1.0 for the Dirichlet prior distribution for the intercepts is used instead, which seems to make optimization agree more closely with maximum likelihood estimates. This optimization is used to get posterior modes when random effects are not present.
system.time(g <- orm(yg ~ x))
user system elapsed
0.039 0.001 0.042
plot(coef(g), coef(f), xlab='Coefficients from orm',
ylab='Coefficients from blrm')
abline(a=0, b=1, col=gray(0.8))
See how long it takes to do posterior sampling with Stan when there are 16, 64, or 128 levels of y.
for(g in c(16, 64, 128)) {
cat('\n', g, 'distinct values of y\n')
  yg <- cut2(y, g=g)
  print(system.time(h <- blrm(yg ~ x)))
}
16 distinct values of y
Initial log joint probability = -3126.42
Iter log prob ||dx|| ||grad|| alpha alpha0 # evals Notes
15 -2417.06 0.00539103 0.0116098 1 1 19
Optimization terminated normally:
Convergence detected: relative gradient magnitude is below tolerance
Finished in 0.2 seconds.
Running MCMC with 4 chains, at most 11 in parallel...
Chain 1 finished in 2.2 seconds.
Chain 2 finished in 2.3 seconds.
Chain 3 finished in 2.3 seconds.
Chain 4 finished in 2.3 seconds.
All 4 chains finished successfully.
Mean chain execution time: 2.3 seconds.
Total execution time: 2.6 seconds.
user system elapsed
12.299 0.879 7.679
64 distinct values of y
Initial log joint probability = -4585.05
Iter log prob ||dx|| ||grad|| alpha alpha0 # evals Notes
14 -3622.95 0.00377665 0.0107938 1 1 17
Optimization terminated normally:
Convergence detected: relative gradient magnitude is below tolerance
Finished in 0.2 seconds.
Running MCMC with 4 chains, at most 11 in parallel...
Chain 1 finished in 2.7 seconds.
Chain 2 finished in 2.8 seconds.
Chain 3 finished in 2.9 seconds.
Chain 4 finished in 2.9 seconds.
All 4 chains finished successfully.
Mean chain execution time: 2.8 seconds.
Total execution time: 3.3 seconds.
user system elapsed
14.960 0.874 8.881
128 distinct values of y
Initial log joint probability = -5077.25
Iter log prob ||dx|| ||grad|| alpha alpha0 # evals Notes
14 -4024.05 0.00391544 0.00557172 1 1 16
Optimization terminated normally:
Convergence detected: relative gradient magnitude is below tolerance
Finished in 0.2 seconds.
Running MCMC with 4 chains, at most 11 in parallel...
Chain 1 finished in 4.1 seconds.
Chain 2 finished in 4.6 seconds.
Chain 3 finished in 4.6 seconds.
Chain 4 finished in 4.4 seconds.
All 4 chains finished successfully.
Mean chain execution time: 4.4 seconds.
Total execution time: 4.8 seconds.
user system elapsed
21.938 0.913 11.133
8 Computing Environment
R version 4.3.2 (2023-10-31)
Platform: aarch64-apple-darwin20 (64-bit)
Running under: macOS Sonoma 14.3.1
Matrix products: default
BLAS: /Library/Frameworks/R.framework/Versions/4.3-arm64/Resources/lib/libRblas.0.dylib
LAPACK: /Library/Frameworks/R.framework/Versions/4.3-arm64/Resources/lib/libRlapack.dylib; LAPACK version 3.11.0
attached base packages: splines stats4 stats graphics grDevices utils datasets methods base
other attached packages: data.table_1.15.2 ordinal_2023.12-4 VGAM_1.1-10 rmsb_1.1-0 rms_6.8-0 Hmisc_5.1-2
To cite R in publications use: R Core Team (2023). R: A Language and Environment for Statistical Computing. R Foundation for Statistical Computing, Vienna, Austria. https://www.R-project.org/.
To cite the Hmisc package in publications use: Harrell Jr F (2024). Hmisc: Harrell Miscellaneous. R package version 5.1-2, https://hbiostat.org/R/Hmisc/.
To cite the rms package in publications use: Harrell Jr FE (2024). rms: Regression Modeling Strategies. R package version 6.8-0, https://github.com/harrelfe/rms, https://hbiostat.org/R/rms/.
To cite the rmsb package in publications use: Harrell F (2024). rmsb: Bayesian Regression Modeling Strategies. R package version 1.1-0, https://hbiostat.org/R/rmsb/.
To cite the VGAM package in publications use: Yee TW (2015). Vector Generalized Linear and Additive Models: With an Implementation in R. Springer, New York, USA.
To cite the data.table package in publications use: Barrett T, Dowle M, Srinivasan A, Gorecki J, Chirico M, Hocking T (2024). data.table: Extension of 'data.frame'. R package version 1.15.2, https://CRAN.R-project.org/package=data.table.