rmsb Package Examples Using cmdstan
1 Overview and Setup
The rmsb
package is the Bayesian companion to the rms
package, and uses many of its functions for post-model-fitting analysis and graphics. The sole fitting function in rmsb
at present is blrm
. blrm
is for Bayesian binary and ordinal proportional odds logistic regression and the Peterson & Harrell (1990) partial proportional odds model that relaxes the proportional odds assumption. It is the analog and generalization of rms::lrm
and for ordinal responses is intended for outcomes with up to several hundred ordinal levels. The Bayesian approach has a number of advantages over traditional frequentist models, including
- the use of exact calculations (to within simulation error) instead of large sample (e.g., normal theory) approximations to p-values and confidence intervals
- exact and more intuitive inference when random effects are included in the model
- the ability to make probability statements about parameters and combinations of parameters, which includes computations of the probabilities of assertions such as “the effect of \(x_1\) exceeds 1.2 and the effect of \(x_2\) exceeds 0.7”
- capturing more sources of uncertainty. For example, the
blrm
function automatically computes highest posterior density intervals on a variety of statistical indexes such as the Brier score and AUROC (\(c\)-index). Note: by default these intervals are computed using only 400 posterior draws to save time. For a blrm fit object f you can specify how many samples to draw, to get more accurate intervals, by specifying for example print(f, ns=2000).
- the ability to incorporate external information or beliefs about linear combinations of parameters using prior distributions on contrasts
- by using Stan language to specify the likelihood components, one can not only do posterior distribution sampling but can quickly compute maximum likelihood estimates (Bayesian posterior modes)
blrm
uses uninformative priors for \(\beta\) parameters that are not used in contrasts appearing in the pcontrast
argument. For other \(\beta\)s, priors are put on contrasts specified in pcontrast
, capitalizing on Stan automatically propagating priors on linear combinations of parameters to priors on individual parameters. This allows priors to be specified on a more user-friendly scale, and allows one to put priors on specific comparisons, nonlinear effects, interactions, parts of interactions, and more. This versatility is shown in a series of examples here. One example is the common situation in which there is a time \(\times\) treatment interaction and the prior for the treatment effect needs to be applied to a specific time.
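As a hedged sketch of that time \(\times\) treatment situation (variable names tx and week, the spline, and the prior SD value are illustrative and not from this vignette's examples), a prior on the treatment contrast at a single follow-up time can be specified with the pcontrast structure used in the examples below:

```{r}
# Skeptical prior on the treatment effect at week 12 only; because the model
# has a time x treatment interaction, the contrast is a difference in predicted
# values between the two treatments at week 12
pcon <- list(sd       = 0.5,                     # prior SD for the week-12 log odds ratio
             c1       = list(tx='B', week=12),
             c2       = list(tx='A', week=12),
             contrast = expression(c1 - c2))
# f <- blrm(y ~ tx * rcs(week, 4) + ..., pcontrast=pcon)   # illustrative call
```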
When there are random effects, the analyst may specify the mean of the exponential distribution that serves as the prior for the standard deviation of the random effects, with the default mean being 1.0, a reasonable number for the logit scale. You can also use a half-\(t\) distribution for the SD of random effects.
Regarding uncertainties about statistical indexes of model performance such as AUROC, the Bayesian posterior distributions computed by rmsb
account for coefficient uncertainty, i.e., the sample at hand's limited ability to uncover the true unknown data-generating coefficients, and for how strongly those coefficients can predict outcomes in the sample. The posterior distributions do not pertain to sample-to-sample variability of model fit or model validation.
blrm
uses Stan code written by Ben Goodrich of Columbia University and Frank Harrell. This code is precompiled when the rmsb
package is built when using the rstan backend, for which the rstan package must be installed. You can use cmdstan
and the cmdstanr
R package instead. This is generally recommended, as cmdstan
and cmdstanr
make use of the latest versions of Stan. Here are the steps I used for installing these on Linux, using the latest version of cmdstan
as of 2023-09-25 (2.33.1):
cd ~
sudo R
install.packages('cmdstanr', repos='https://mc-stan.org/r-packages',
lib='/usr/local/lib/R/site-library')
# Default location for cmdstan is in `~/.cmdstan` in the user's home directory
# This is not such a good idea
cmdstanr::install_cmdstan(cores=10, dir='/usr/local/bin')
# When running sudo, default is /root/.cmdstan which is not easily accessible
Add the following line in .zshrc
or .bashrc
so that cmdstanr
will know where to find cmdstan
:
export CMDSTAN=/usr/local/bin/cmdstan-2.34.1
Instead of that, I have the location defined in cmdstan.loc
in ~/.Rprofile
.
cmdstanr
will try to put compiled Stan code in the same directory where cmdstan
is stored. With the above setup, this location is not user-writeable, so it will not work. The blrm
function by default puts compiled code in the .rmsb
directory under the user's home directory. Specify options(rmsbdir='some other location')
to change this. The default behavior puts all compiled code in one central location used across your projects. If you desire to archive a project and reproduce the result even if the Stan code inside rmsb
is updated, you may want to specify a project-specific location for rmsbdir
.
Here is a typical setup code chunk for using rmsb
. When using rstan
, it is not recommended to use options(auto_write=TRUE)
as when running a series of regressions, the rstan
package tends to recompile Stan code even when the code doesn’t change.
```{r setup}
require(rmsb)
options(mc.cores = parallel::detectCores() - 1) # use max # CPUs less 1
```
If you want to use cmdstan
by default, avoiding having to use the backend
argument to blrm
, you can add this to the setup chunk:
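```{r}
options(rmsb.backend='cmdstan')
```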
This vignette is run separately to use rstan
instead of cmdstanr
. The variable wstan
below defines which system to run. So that the file=
arguments to blrm
can store results in different files, a little function rfile
is created below. It checks wstan
and when using cmdstan
adds "c"
to the end of the main part of the file name. rfile
also automatically quotes the base file name specified by the user.
wstan <- c('cmdstan', 'rstan')[1]
rfile <- function(f)
  paste0(as.character(substitute(f)), if(wstan == 'cmdstan') 'c', '.rds')
require(rmsb)
knitrSet(lang='quarto', w=7, h=7)
options(prType='html') # for rms output: print, anova, summary, etc.
options(mc.cores = parallel::detectCores() - 1) # use max # CPUs less 1
if(wstan == 'cmdstan') {
Sys.getenv('CMDSTAN')
# cmdstanr::cmdstan_path()
# Sometimes RStudio doesn't find the environment variable
cat('cmdstan.loc=', cmdstan.loc, '\n')
cmdstanr::set_cmdstan_path(cmdstan.loc)
options(rmsb.backend='cmdstan')
}
cmdstan.loc= /usr/local/bin/cmdstan-2.34.1
1.1 Running Fits Only When Something Changes
You’ll see file='...'
in the longer-running blrm
calls below (actually the code uses rfile
just defined to construct the file name). If the file already exists and none of the data nor the options sent to rstan/cmdstanr
nor the underlying Stan code have changed from what were used to create the fit object stored in that file (as judged by their md5
hash), that saved fit object is returned immediately without running the rstan
code, often saving a great deal of execution time. This works well with the workflow of long R Markdown reports, making it so that only the portions of the Bayesian analysis that have changed are re-run. Note that using the knitr
cache=TRUE
option does not work well as the cache files for this script were about a GB in size, and the system does not accurately recognize when a model fit hasn’t changed and doesn’t need to be run when rstan
is involved.
A restored fit object does not contain the rstan
or cmdstanr
object, saving tens of megabytes of storage. Standard Stan diagnostics are stored in the fit object separately, and it is assumed that if the user wanted to run rstan::loo
loo=TRUE
would have been specified to blrm
so that loo
is run and its (small) result stored in the blrm
fit object. If you want to run pairs
to get more graphical diagnostics, instead of relying on the rstan
object always being available, specify pairs=TRUE
or pairs='filename.png'
to blrm
to graph the pair plots. The latter is recommended, and one can put knitr::include_graphics('filename.png')
in the R code chunk to render the graph in the report even if blrm
was not re-run.
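For example (a minimal sketch; the model formula and file names here are illustrative):

```{r eval=FALSE}
# Save pair plots to a PNG at fit time, then render the stored image in the
# report even when the fit is later restored from disk and blrm is not re-run
f <- blrm(y ~ x1 + pol(x2, 2) + x3, file=rfile(f), pairs='f-pairs.png')
knitr::include_graphics('f-pairs.png')
```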
When file='filename.rds'
is specified to blrm
and the file does not exist or analysis inputs have changed since the file was created, the blrm
fit object will be saved with saveRDS in .rds binary format in your current working directory at the completion of blrm
’s work. The rstan
component is omitted from the saved file.
The utility function fitIf
available from here is another way to efficiently manage Bayesian simulation workflow. The fitIf
approach runs the analysis only if the file doesn’t already exist. If it exists but the data or the model have changed, the fitIf
approach is not intelligent enough to re-run the analysis (unlike the file='...'
approach above). Whether using fitIf
or file=
, you lose the ability to run the rms::stanDxplot(..., rstan=TRUE)
on restored objects, so the stanDxplot
function tries to find an existing trace plot image file corresponding to the current R markdown chunk when the fit object no longer has the rstan
component. For most purposes this doesn't matter, because running with the defaults shows the needed post-burn-in samples, which are always stored in fit objects. The file='filename.rds'
approach is preferred.
1.2 Priors
It is often difficult to specify priors for parameters, especially when there are nonlinear effects (e.g., splines) and interactions in the model. We need a way to specify priors on the original \(X\) and \(Y\) scales. Fortunately Stan
provides an elegant solution.
As discussed here Stan
allows one to specify priors on transformations of model parameters, and these priors propagate back to the original parameters. It is easier to specify a prior for the effect of increasing age from 30 to 60 than it is to specify a prior for the age slope. It may be difficult to specify a prior for an age \(\times\) treatment interaction, but much easier to specify a prior for how different the treatment effect is for a 30-year-old and a 60-year-old. By specifying priors on one or more contrasts one can easily encode outside information / information borrowing / shrinkage.
The rms
contrast
function provides a general way to implement contrasts up to double differences, and more details about computations are provided in that link. The approach used for specifying priors for contrasts in rmsb::blrm uses the same process but is even more general. Both contrast and blrm compute design matrices at user-specified predictor settings, and the contrast matrices (matrices multiplied by \(\hat{\beta}\)) are simply differences in such design matrices. Thinking of contrasts as differences in predicted values frees the user from having to care about how parameters map to estimands, and allows an R predict(fit, type='x') function to do the hard work. See here for several interesting examples.
Beginning with rmsb
version 1.0-0, prior standard deviations are not specified for individual parameters, and priors are only specified for contrasts through the pcontrast
and npcontrast
arguments. Since blrm
transforms the design matrix through a QR decomposition to speed up posterior sampling, it also transforms the contrast matrices the same way so that priors for contrasts are in the right space inside the Stan
code. Parameters not used in any contrast will have non-informative prior distributions.
For non-proportional odds effects in a constrained partial PO model, priors are specified through contrasts using the npcontrast
argument, which is a list of the same structure as pcontrast
but corresponds to the model given to the second argument to blrm
.
Instead of specifying prior standard deviations directly, it is convenient to solve for SDs that correspond to a specified probability for a specified interval of an effect in the model. Effects for the PO model are usually expressed as odds ratios (OR). For the case where the prior median for the OR is 1.0 (prior mean or median log(OR)=0.0) it is useful to solve for the prior SD \(\sigma\) so that \(\Pr(\text{OR} > r) = a = \Pr(\text{OR} < \frac{1}{r})\), leading to \(\sigma = \frac{|\log(r)|}{\Phi^{-1}(1-a)}\), computed by the psigma
function below. Another function .
is defined as an abbreviation for list()
for later usage with pcontrast
.
psigma <- function(r, a, inline=FALSE, pr=! inline) {
  sigma <- abs(log(r)) / qnorm(1 - a)
  dir   <- if(r > 1.) '>' else '<'
  x     <- if(inline) paste0('$\\Pr(\\text{OR}', dir, r, ') =', a,
                             ' \\Rightarrow \\sigma=', round(sigma, 3), '$')
           else paste0('Pr(OR ', dir, ' ', r, ') = ', a, ' ⇒ σ=', round(sigma, 3))
  if(inline) return(x)
  if(pr) {
    cat('\n', x, '\n\n', sep='')
    return(invisible(sigma))
  }
  sigma
}
. <- function(...) list(...)
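For example, solving for the prior SD that gives \(\Pr(\text{OR} > 2) = 0.05\) (a value used again in the next section):

```{r}
psigma(2, 0.05)   # prints Pr(OR > 2) = 0.05 ⇒ σ=0.421 and invisibly returns the SD
```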
1.3 Example: 10-level Ordinal Outcome
Simulate a dataset with three predictors and one 10-level ordinal outcome. Run the frequentist proportional odds model then the Bayesian one.
set.seed(1)
n  <- 500
x1 <- runif(n, -1, 1)
x2 <- runif(n, -1, 1)
x3 <- sample(0 : 1, n, TRUE)
y  <- x1 + 0.5 * x2 + x3 + rnorm(n)
y  <- as.integer(cut2(y, g=10))
dd <- datadist(x1, x2, x3); options(datadist='dd')
f  <- lrm(y ~ x1 + pol(x2, 2) + x3, eps=1e-7)   # eps to check against Stan
f
Logistic Regression Model
lrm(formula = y ~ x1 + pol(x2, 2) + x3, eps = 1e-07)
Frequencies of Responses
 1  2  3  4  5  6  7  8  9 10
50 50 50 50 50 50 50 50 50 50
 | Model Likelihood Ratio Test | Discrimination Indexes | Rank Discrim. Indexes |
---|---|---|---|
Obs 500 | LR χ2 235.04 | R2 0.379 | C 0.740 |
max |∂log L/∂β| 2×10-14 | d.f. 4 | R24,500 0.370 | Dxy 0.480 |
Pr(>χ2) <0.0001 | R24,495 0.373 | γ 0.480 | |
Brier 0.181 | τa 0.433 |
 | β | S.E. | Wald Z | Pr(>|Z|) |
---|---|---|---|---|
y≥2 | 2.0552 | 0.1951 | 10.53 | <0.0001 |
y≥3 | 1.0832 | 0.1678 | 6.45 | <0.0001 |
y≥4 | 0.4006 | 0.1606 | 2.49 | 0.0126 |
y≥5 | -0.1790 | 0.1599 | -1.12 | 0.2629 |
y≥6 | -0.7246 | 0.1626 | -4.46 | <0.0001 |
y≥7 | -1.2867 | 0.1689 | -7.62 | <0.0001 |
y≥8 | -1.8982 | 0.1788 | -10.62 | <0.0001 |
y≥9 | -2.6288 | 0.1953 | -13.46 | <0.0001 |
y≥10 | -3.6408 | 0.2282 | -15.95 | <0.0001 |
x1 | 1.5694 | 0.1532 | 10.24 | <0.0001 |
x2 | 0.7901 | 0.1368 | 5.78 | <0.0001 |
x22 | -0.2213 | 0.2578 | -0.86 | 0.3908 |
x3 | 1.6341 | 0.1707 | 9.57 | <0.0001 |
Before getting posterior distributions of parameters, use Stan
to just get maximum likelihood estimates and compare them with those from lrm
. Do this for increasingly flat priors for the \(\beta\)s associated with x2
. Running method='optimizing'
is a quick way to study the effect of priors on the posterior modes for non-intercepts when there are no random effects in the model.
# Define a function that creates a `pcontrast` for `blrm`
# Skepticism of prior is specified by making changes in Y be small
# as x2 goes from -1 to 0 to 1
# Pr(OR > 2) = p
con <- function(p) list(sd=psigma(2, p),
                        c1=.(x2=0), c2=.(x2=-1), c3=.(x2=1), c4=.(x2=0),
                        contrast=expression(c1-c2, c3-c4))
k <- NULL
for(p in c(.01, .05, .1, .2)) {
  g <- blrm(y ~ x1 + pol(x2, 2) + x3, method='optimizing',
            pcontrast=con(p))
  cat('-2 log likelihood:', g$deviance, '\n')
  k <- rbind(k, g$coefficients)
}
Pr(OR > 2) = 0.01 ⇒ σ=0.298
Initial log joint probability = -1480.52
Iter log prob ||dx|| ||grad|| alpha alpha0 # evals Notes
46 -1025.54 0.00435668 0.00391851 1 1 53
Optimization terminated normally:
Convergence detected: relative gradient magnitude is below tolerance
Finished in 0.1 seconds.
-2 log likelihood: 2051.08
Pr(OR > 2) = 0.05 ⇒ σ=0.421
Initial log joint probability = -1320.96
Iter log prob ||dx|| ||grad|| alpha alpha0 # evals Notes
37 -1024.13 0.00285089 0.00769745 1 1 44
Optimization terminated normally:
Convergence detected: relative gradient magnitude is below tolerance
Finished in 0.1 seconds.
-2 log likelihood: 2048.26
Pr(OR > 2) = 0.1 ⇒ σ=0.541
Initial log joint probability = -1319.01
Iter log prob ||dx|| ||grad|| alpha alpha0 # evals Notes
36 -1023.58 0.00285098 0.00490071 1 1 40
Optimization terminated normally:
Convergence detected: relative gradient magnitude is below tolerance
Finished in 0.1 seconds.
-2 log likelihood: 2047.16
Pr(OR > 2) = 0.2 ⇒ σ=0.824
Initial log joint probability = -1372.21
Iter log prob ||dx|| ||grad|| alpha alpha0 # evals Notes
39 -1023.35 0.00198249 0.00154428 0.8894 0.8894 46
Optimization terminated normally:
Convergence detected: relative gradient magnitude is below tolerance
Finished in 0.1 seconds.
-2 log likelihood: 2046.7
k
y>=2 y>=3 y>=4 y>=5 y>=6 y>=7 y>=8
[1,] 1.985206 1.023936 0.3514663 -0.2191457 -0.7563312 -1.310259 -1.913104
[2,] 2.006682 1.041192 0.3647719 -0.2093801 -0.7498310 -1.306896 -1.913078
[3,] 2.020167 1.052687 0.3743067 -0.2015923 -0.7436885 -1.302302 -1.910143
[4,] 2.037455 1.067615 0.3869954 -0.1908566 -0.7348045 -1.295228 -1.905025
y>=9 y>=10 x1 x2 x2^2 x3
[1,] -2.632084 -3.630934 1.562997 0.5561742 -0.08257481 1.610746
[2,] -2.636508 -3.640498 1.564886 0.6512892 -0.12194449 1.620939
[3,] -2.635823 -3.642373 1.565875 0.6992980 -0.14707952 1.624125
[4,] -2.633275 -3.642765 1.567856 0.7480966 -0.18288428 1.629388
# Compare with ordinary MLEs and deviance
f$deviance
[1] 2302.585 2067.549
coef(f)
y>=2 y>=3 y>=4 y>=5 y>=6 y>=7 y>=8
2.0551529 1.0831900 0.4005868 -0.1790073 -0.7246249 -1.2866643 -1.8982180
y>=9 y>=10 x1 x2 x2^2 x3
-2.6287996 -3.6408392 1.5694325 0.7900802 -0.2212900 1.6340800
Fit the model with Dirichlet priors on intercepts and wide normal priors on the \(\beta\)s. Show the model fit summary. Note that the indexes of predictive discrimination/accuracy include 0.95 highest posterior density intervals. In frequentist inference we pretend that quantities such as AUROC and \(R^2\) are estimated without error, which is far from the case.
In several places you will see an index named Symmetry
. This is a measure of the symmetry of a posterior distribution. Values farther from 1.0 indicate asymmetry, which indicates that the use of standard errors and the use of a normal approximation for the posterior distribution are not justified. The symmetry index is the ratio of the gap between the posterior mean and the 0.95 quantile of the posterior distribution to the gap between the 0.05 quantile and the mean.
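As a minimal sketch of that definition (this is not the rmsb internal code; the helper name sym is ours), the index for one parameter could be computed directly from its posterior draws:

```{r}
# Symmetry index: (0.95 quantile - posterior mean) / (posterior mean - 0.05 quantile)
sym <- function(draws) {
  q <- quantile(draws, c(0.05, 0.95))
  m <- mean(draws)
  unname((q[2] - m) / (m - q[1]))
}
# e.g. sym(bs$draws[, 'x1']) for the fit bs created below
```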
bs <- blrm(y ~ x1 + pol(x2, 2) + x3, file=rfile(bs))
bs
Bayesian Proportional Odds Ordinal Logistic Model
Dirichlet Priors With Concentration Parameter 0.233 for Intercepts
blrm(formula = y ~ x1 + pol(x2, 2) + x3, file = rfile(bs))
Frequencies of Responses
 1  2  3  4  5  6  7  8  9 10
50 50 50 50 50 50 50 50 50 50
 | Mixed Calibration/Discrimination Indexes | Discrimination Indexes | Rank Discrim. Indexes |
---|---|---|---|
Obs 500 | LOO log L -1046.84±13.18 | g 1.581 [1.331, 1.781] | C 0.739 [0.736, 0.74] |
Draws 4000 | LOO IC 2093.69±26.36 | gp 0.113 [0.084, 0.139] | Dxy 0.477 [0.471, 0.48] |
Chains 4 | Effective p 13.08±0.24 | EV 0.148 [0.104, 0.194] | |
Time 2s | B 0.077 [0.076, 0.079] | v 1.924 [1.322, 2.398] | |
p 4 | vp 0.014 [0.007, 0.02] |
 | Mode β | Mean β | Median β | S.E. | Lower | Upper | Pr(β>0) | Symmetry |
---|---|---|---|---|---|---|---|---|
y≥2 | 2.0551 | 2.0668 | 2.0645 | 0.1974 | 1.6809 | 2.4586 | 1.0000 | 1.04 |
y≥3 | 1.0831 | 1.0877 | 1.0863 | 0.1669 | 0.7508 | 1.4021 | 1.0000 | 1.03 |
y≥4 | 0.4005 | 0.4022 | 0.4035 | 0.1614 | 0.0702 | 0.7001 | 0.9945 | 1.01 |
y≥5 | -0.1791 | -0.1802 | -0.1774 | 0.1564 | -0.4826 | 0.1330 | 0.1168 | 1.01 |
y≥6 | -0.7247 | -0.7286 | -0.7271 | 0.1580 | -1.0490 | -0.4344 | 0.0000 | 1.01 |
y≥7 | -1.2868 | -1.2925 | -1.2917 | 0.1657 | -1.6196 | -0.9700 | 0.0000 | 0.98 |
y≥8 | -1.8984 | -1.9054 | -1.9020 | 0.1754 | -2.2400 | -1.5597 | 0.0000 | 0.98 |
y≥9 | -2.6290 | -2.6387 | -2.6303 | 0.1937 | -3.0138 | -2.2525 | 0.0000 | 0.95 |
y≥10 | -3.6410 | -3.6566 | -3.6498 | 0.2267 | -4.1007 | -3.2076 | 0.0000 | 0.94 |
x1 | 1.5698 | 1.5789 | 1.5772 | 0.1530 | 1.2852 | 1.8862 | 1.0000 | 1.00 |
x2 | 0.7899 | 0.7969 | 0.7982 | 0.1431 | 0.5060 | 1.0706 | 1.0000 | 1.01 |
x22 | -0.2210 | -0.2216 | -0.2250 | 0.2534 | -0.7293 | 0.2736 | 0.1870 | 1.01 |
x3 | 1.6341 | 1.6417 | 1.6409 | 0.1666 | 1.3236 | 1.9725 | 1.0000 | 1.00 |
# Show more detailed analysis of model performance measures
blrmStats(bs, pl=TRUE)
Indexes computed for a random sample of 400 of 4000 posterior draws
gp, B, EV, and vp are for intercept 1 out of 9 intercepts
Dxy C g gp B EV v vp
Mean 0.477 0.738 1.580 0.115 0.077 0.149 1.920 0.014
SE 0.003 0.001 0.110 0.014 0.001 0.023 0.265 0.003
Lower 0.471 0.736 1.369 0.089 0.076 0.108 1.435 0.008
Upper 0.480 0.740 1.787 0.142 0.079 0.192 2.430 0.020
Symmetry 0.447 0.447 0.942 1.045 1.609 1.088 1.040 1.279
Dxy: 2*(C - 0.5) C: concordance probability
g: Gini mean |difference| on linear predictor (lp)
gp: Gini on predicted probability B: Brier score
EV: explained variation on prob. scale v: var(lp) vp: var(prob)
Show basic Stan diagnostics. Had stanDxplot(bs, rstan=TRUE)
been used, intercepts would have been shifted from what is shown for bs
because of subtractions of covariate means before passing data to rstan
.
stanDxplot(bs)
stanDx(bs)
Iterations: 2000 on each of 4 chains, with 4000 posterior distribution samples saved
For each parameter, n_eff is a crude measure of effective sample size
and Rhat is the potential scale reduction factor on split chains
(at convergence, Rhat=1)
Checking sampler transitions treedepth.
Treedepth satisfactory for all transitions.
Checking sampler transitions for divergences.
No divergent transitions found.
Checking E-BFMI - sampler transitions HMC potential energy.
E-BFMI satisfactory.
Effective sample size satisfactory.
Split R-hat values satisfactory all parameters.
Processing complete, no problems detected.
EBFMI: 1.046 1.056 1.024 0.986
Parameter Rhat ESS bulk ESS tail
1 alpha[1] 1.002 8165 3414
2 alpha[2] 1.001 7251 3211
3 alpha[3] 1.001 7722 3015
4 alpha[4] 1.001 8138 3105
5 alpha[5] 1.000 9198 3077
6 alpha[6] 1.001 9772 2680
7 alpha[7] 1.003 8624 2964
8 alpha[8] 1.000 6899 3241
9 alpha[9] 1.004 6955 3372
10 beta[1] 1.001 6679 3232
11 beta[2] 1.001 9296 2841
12 beta[3] 1.001 8582 2741
13 beta[4] 1.000 6528 3476
Here are the posterior distributions, calculated using kernel density estimates from posterior draws. Posterior modes, shown as vertical lines, are parameter values that maximize the log posterior density (using the Stan optimizer in the original model fit), so they do not necessarily coincide with the peaks of the kernel density estimates.
plot(bs)
# Also show 2-d posterior density contour for two collinear terms
plot(bs, c('x2', 'x2^2'), bivar=TRUE) # assumes ellipse
plot(bs, c('x2', 'x2^2'), bivar=TRUE, bivarmethod='kernel') # kernel density
# Print frequentist side-by-side with Bayesian posterior mean, median, mode
cbind(MLE=coef(f), t(bs$param))
MLE mode mean median
y>=2 2.0551529 2.0550884 2.0667650 2.0644883
y>=3 1.0831900 1.0831184 1.0876989 1.0863129
y>=4 0.4005868 0.4004984 0.4021547 0.4035196
y>=5 -0.1790073 -0.1791006 -0.1801865 -0.1774422
y>=6 -0.7246249 -0.7247323 -0.7285530 -0.7270796
y>=7 -1.2866643 -1.2867886 -1.2924845 -1.2916686
y>=8 -1.8982180 -1.8983616 -1.9053575 -1.9020143
y>=9 -2.6287996 -2.6289616 -2.6387408 -2.6302800
y>=10 -3.6408392 -3.6410316 -3.6566111 -3.6498110
x1 1.5694325 1.5697733 1.5789189 1.5772318
x2 0.7900802 0.7898633 0.7969237 0.7981532
x2^2 -0.2212900 -0.2209874 -0.2215606 -0.2250196
x3 1.6340800 1.6341296 1.6416724 1.6409405
# Compare covariance matrix of posterior draws with MLE
round(diag(vcov(f)) / diag(vcov(bs)), 2)
y>=2 y>=3 y>=4 y>=5 y>=6 y>=7 y>=8 y>=9 y>=10 x1 x2 x2^2 x3
0.98 1.01 0.99 1.05 1.06 1.04 1.04 1.02 1.01 1.00 0.91 1.03 1.05
range(vcov(f) / vcov(bs))
[1] -1.414811 1.984495
Next show frequentist and Bayesian contrasts. For the Bayesian contrast the point estimate is the posterior mean, and the 0.95 highest posterior density interval is computed. Instead of a p-value, the posterior probability that the contrast is positive is computed.
contrast(f, list(x1=0, x3=1), list(x1=.25, x3=0))
x2 Contrast S.E. Lower Upper Z Pr(>|z|)
1 -0.001559955 1.241722 0.1720205 0.9045679 1.578876 7.22 0
Confidence intervals are 0.95 individual intervals
k <- contrast(bs, list(x1=0:1, x3=1), list(x1=.25, x3=0))
k
x2 Contrast S.E. Lower Upper Pr(Contrast>0)
1 -0.001559955 1.246943 0.1665426 0.9097081 1.563316 1
2 -0.001559955 2.825862 0.2131043 2.4133918 3.242280 1
Intervals are 0.95 highest posterior density intervals
Contrast is the posterior mean
For Bayesian contrasts we can also plot the posterior densities for the contrasts, and a 2-d highest-density contour.
plot(k)
plot(k, bivar=TRUE) # applicable when exactly 2 contrasts
plot(k, bivar=TRUE, bivarmethod='kernel')
Compute posterior probabilities for various assertions about unknown true parameter values. The PostF
function is a function generator that effectively evaluates the assertion to a 0/1 value and computes the mean of these binary values over posterior draws. As is the case with inference about the quadratic effect of x2
below, when the assertion does not evaluate to a binary 0/1 or logical TRUE/FALSE
value, it is taken as a quantity that is derived from one or more model parameters, and a posterior density is drawn for the derived parameter. We use that to get a posterior distribution on the vertex of the quadratic x2
effect.
P <- PostF(bs, pr=TRUE)   # show new short legal R names
Original Name Short Name
y>=2 a1
y>=3 a2
y>=4 a3
y>=5 a4
y>=6 a5
y>=7 a6
y>=8 a7
y>=9 a8
y>=10 a9
x1 b1
x2 b2
x2^2 b3
x3 b4
P(b3 > 0 & b1 > 1.5)
[1] 0.13625
P(b3 > 0)
[1] 0.187
P(abs(b3) < 0.25) # evidence for small |nonlinearity|
[1] 0.51025
mean(bs$draws[, 'x2^2'] > 0, na.rm=TRUE) # longhand calculation
[1] 0.187
# Plot posterior distribution for the vertex of the quadratic x2 effect
# This distribution should be wide because the relationship is linear
# (true value of b3 is zero)
plot(P(-b2 / (2 * b3)))
# Recreate the P function using original parameter names
# (which may not be legal R names)
P <- PostF(bs, name='orig')
P(`x2^2` > 0)
[1] 0.187
P(`x2^2` > 0 & x1 > 1.5)
[1] 0.13625
# Remove rstan results from fit. Compute savings in object size.
# Note: this will only be accurate when running the fits for
# the first time (not when restoring shortened forms of them from
# disc)
# Result: 33.8MB before, 0.5MB after
s1 <- format(object.size(bs), 'MB')
bs$rstan <- NULL
s2 <- format(object.size(bs), 'MB')
cat('Before:', s1, ' After:', s2, '\n')
Before: 0.5 Mb After: 0.5 Mb
1.4 Bayesian Wilcoxon Test
Since the proportional odds ordinal (PO) logistic model is a generalization of Wilcoxon/Kruskal-Wallis tests one can use Bayesian proportional odds regression to get the Bayesian equivalent to the Wilcoxon test. Even if not adjusting for covariates (impossible with the Wilcoxon test) there are advantages to putting this in a modeling framework as detailed in Section 7.6 of BBR. A major advantage is estimation ability. One can estimate group-specific means, quantiles, and exceedance probabilities. And Bayesian inference provides exact uncertainty intervals (highest posterior density intervals in what follows) for these.
The PO model does not require there to be any ties among the Y values, so it handles continuous data very well (the orm
function in the rms
package efficiently handles many thousands of distinct Y levels requiring many thousands of intercepts in the model). Let’s re-analyze the calprotectin data in Section 7.3.1 of BBR to mimic the frequentist PO analysis in Section 7.6.
# Fecal Calprotectin: 2500 is above detection limit
# When detection limits occur at a single value, the PO
# model easily handles this in terms of estimating group
# differences (but not for estimating the mean Y)
calpro <- c(2500, 244, 2500, 726, 86, 2500, 61, 392, 2500, 114, 1226,
            2500, 168, 910, 627, 2500, 781, 57, 483, 30, 925, 1027,
            2500, 2500, 38, 18)
# Endoscopy score: 1 = No/Mild, 2=Mod/Severe Disease
# Would have been far better to code dose as 4 ordinal levels
endo <- c(2, 1, 2, 2, 2, 1, 1, 2, 2, 1, 2, 2, 1, 2, 2, 2, 2, 1, 2,
          2, 2, 2, 2, 2, 1, 1)
endo <- factor(endo, 1 : 2,
               c("No or Mild Activity", "Moderate or Severe Activity"))
dd <- datadist(endo, calpro); options(datadist='dd')
bcalpro <- blrm(calpro ~ endo, file=rfile(bcalpro))
print(bcalpro, intercepts=TRUE)
Bayesian Proportional Odds Ordinal Logistic Model
Dirichlet Priors With Concentration Parameter 0.134 for Intercepts
blrm(formula = calpro ~ endo, file = rfile(bcalpro))
Frequencies of Responses
  18   30   38   57   61   86  114  168  244  392  483  627  726  781  910  925
   1    1    1    1    1    1    1    1    1    1    1    1    1    1    1    1
1027 1226 2500
   1    1    8
 | Mixed Calibration/Discrimination Indexes | Discrimination Indexes | Rank Discrim. Indexes |
---|---|---|---|
Obs 26 | LOO log L -96.27±8.93 | g 1.335 [0.526, 2.101] | C 0.854 [0.854, 0.854] |
Draws 4000 | LOO IC 192.54±17.85 | gp 0.048 [0.001, 0.137] | Dxy 0.708 [0.708, 0.708] |
Chains 4 | Effective p 33.18±3.99 | EV 0.069 [0.001, 0.215] | |
Time 0.6s | B 0.037 [0.034, 0.047] | v 2.204 [0.211, 4.674] | |
p 1 | vp 0.005 [0, 0.021] |
 | Mode β | Mean β | Median β | S.E. | Lower | Upper | Pr(β>0) | Symmetry |
---|---|---|---|---|---|---|---|---|
y≥30 | 2.0969 | 2.4310 | 2.2632 | 1.2174 | 0.4298 | 5.0078 | 0.9955 | 1.46 |
y≥38 | 1.3395 | 1.4116 | 1.3679 | 0.8412 | -0.1289 | 3.1941 | 0.9675 | 1.20 |
y≥57 | 0.8678 | 0.8578 | 0.8321 | 0.7258 | -0.5813 | 2.3007 | 0.8862 | 1.10 |
y≥61 | 0.4733 | 0.4196 | 0.4195 | 0.6801 | -0.8435 | 1.8342 | 0.7305 | 1.06 |
y≥86 | 0.1122 | 0.0320 | 0.0398 | 0.6747 | -1.2452 | 1.4131 | 0.5245 | 0.98 |
y≥114 | -0.1956 | -0.2875 | -0.2733 | 0.6860 | -1.5930 | 1.0995 | 0.3425 | 0.94 |
y≥168 | -0.4710 | -0.5852 | -0.5730 | 0.6907 | -1.8922 | 0.8220 | 0.2023 | 0.94 |
y≥244 | -0.7653 | -0.8995 | -0.8936 | 0.7197 | -2.2990 | 0.5434 | 0.1065 | 0.91 |
y≥392 | -1.0953 | -1.2446 | -1.2174 | 0.7591 | -2.6415 | 0.2876 | 0.0412 | 0.91 |
y≥483 | -1.4156 | -1.5833 | -1.5405 | 0.8185 | -3.1823 | -0.0641 | 0.0177 | 0.90 |
y≥627 | -1.6849 | -1.8630 | -1.8120 | 0.8500 | -3.4730 | -0.2315 | 0.0073 | 0.90 |
y≥726 | -1.9227 | -2.1004 | -2.0721 | 0.8699 | -3.8808 | -0.5029 | 0.0037 | 0.87 |
y≥781 | -2.1400 | -2.3255 | -2.2953 | 0.8868 | -4.1110 | -0.6828 | 0.0022 | 0.87 |
y≥910 | -2.3439 | -2.5394 | -2.5013 | 0.8986 | -4.3217 | -0.8495 | 0.0010 | 0.87 |
y≥925 | -2.5396 | -2.7495 | -2.7185 | 0.9199 | -4.6301 | -1.0576 | 0.0008 | 0.89 |
y≥1027 | -2.7313 | -2.9610 | -2.9153 | 0.9341 | -4.7583 | -1.1651 | 0.0005 | 0.88 |
y≥1226 | -2.9224 | -3.1735 | -3.1371 | 0.9486 | -4.9342 | -1.2973 | 0.0005 | 0.88 |
y≥2500 | -3.1166 | -3.3893 | -3.3525 | 0.9573 | -5.1991 | -1.5130 | 0.0000 | 0.89 |
endo=Moderate or Severe Activity | 2.7586 | 2.9293 | 2.8951 | 0.9891 | 1.0156 | 4.8402 | 0.9992 | 1.11 |
# print.blrm defaults to not showing intercepts if more than 9 of them
summary(bcalpro)
Effects   Response: calpro
 | Low | High | Δ | Effect | S.E. | Lower 0.95 | Upper 0.95 |
---|---|---|---|---|---|---|---|
endo --- No or Mild Activity:Moderate or Severe Activity | 2 | 1 | -2.92900 | 0.9891 | -4.840000 | -1.0160 | |
Odds Ratio | 2 | 1 | 0.05343 | 0.007906 | 0.3622 |
One can see that the posterior probability of a positive group difference exceeds 0.99.
Now compute the posterior mean estimate of the mean and median calprotectin levels for the unknown data generating process, stratified by group and compare with sample estimates.
# Sample estimates
tapply(calpro, endo, mean)
No or Mild Activity Moderate or Severe Activity
400.000 1372.944
tapply(calpro, endo, median)
No or Mild Activity Moderate or Severe Activity
87.5 976.0
# Now compute estimates and 0.95 HPD intervals assuming PO
# The first method is exact
newdata <- data.frame(endo=levels(endo))
bar     <- Mean(bcalpro)
predict(bcalpro, newdata, fun=bar)
$linear.predictors
[1] 307.3402 1347.6587
$lower
[1] 72.04265 933.21485
$upper
[1] 609.484 1773.567
quant <- Quantile(bcalpro)
med   <- function(lp, ...) quant(lp=lp, ...)
Predict(bcalpro, endo, fun=med)
endo yhat lower upper
1 No or Mild Activity 92.58508 18.0000 226.9042
2 Moderate or Severe Activity 984.15874 591.3656 1630.5648
Response variable (y):
Limits are 0.95 confidence limits
The contrast
function in rms
now allows one to get posterior distributions of differences in nonlinearly transformed parameters, as follows.
k <- contrast(bcalpro, list(endo=levels(endo)[1]),
              list(endo=levels(endo)[2]), fun=bar)
k
Posterior Summaries for First X Settings
Posterior Mean Posterior Median Lower 0.95 HPD Upper 0.95 HPD
1 307.3402 271.7195 72.04265 609.484
Posterior Summaries for Second X Settings
Posterior Mean Posterior Median Lower 0.95 HPD Upper 0.95 HPD
1 1347.659 1345.319 933.2149 1773.567
Posterior Summaries of First - Second
Posterior Mean Posterior Median Lower 0.95 HPD Upper 0.95 HPD
1 -1040.318 -1047.142 -1558.114 -523.2424
plot(k, which='diff')
Peeking ahead to a later section, we can use the constrained partial proportional odds model to assess the proportional odds assumption. Let's assume that departures from proportional odds (constant increments in log odds) are linear in the square root of the calprotectin level. There is only one predictor (endo
), so there is only one variable that might act non-proportionally. Hence the second formula has the same right-hand-side as the first formula in the blrm
call.
bcalp <- blrm(calpro ~ endo, ~ endo, cppo=sqrt)
Initial log joint probability = -40.16
Iter log prob ||dx|| ||grad|| alpha alpha0 # evals Notes
Error evaluating model log probability: Non-finite function evaluation.
Error evaluating model log probability: Non-finite function evaluation.
Error evaluating model log probability: Non-finite function evaluation.
99 -26.2443 0.0046237 0.150498 0.5264 0.5264 124
Iter log prob ||dx|| ||grad|| alpha alpha0 # evals Notes
199 -26.2301 0.00151754 0.00187871 0.08565 0.9691 261
Iter log prob ||dx|| ||grad|| alpha alpha0 # evals Notes
258 -26.23 0.0210887 0.00244513 1 1 339
Optimization terminated normally:
Convergence detected: relative gradient magnitude is below tolerance
Finished in 0.1 seconds.
Running MCMC with 4 chains, at most 11 in parallel...
Chain 1 finished in 0.9 seconds.
Chain 2 finished in 0.8 seconds.
Chain 3 finished in 0.9 seconds.
Chain 4 finished in 0.9 seconds.
All 4 chains finished successfully.
Mean chain execution time: 0.9 seconds.
Total execution time: 1.1 seconds.
# use cppo=function(y) y for linear x by y interaction
bcalp
Bayesian Constrained Partial Proportional Odds Ordinal Logistic Model
Dirichlet Priors With Concentration Parameter 0.134 for Intercepts
blrm(formula = calpro ~ endo, ppo = ~endo, cppo = sqrt)
Frequencies of Responses
  18   30   38   57   61   86  114  168  244  392  483  627  726  781  910  925
   1    1    1    1    1    1    1    1    1    1    1    1    1    1    1    1
1027 1226 2500
   1    1    8
 | Mixed Calibration/Discrimination Indexes | Discrimination Indexes | Rank Discrim. Indexes |
---|---|---|---|
Obs 26 | LOO log L -82.06±9.35 | g 2.003 [0.884, 3.194] | C 0.854 [0.854, 0.854] |
Draws 4000 | LOO IC 164.13±18.7 | gp 0.062 [0.001, 0.17] | Dxy 0.708 [0.708, 0.708] |
Chains 4 | Effective p 18.6±3.26 | EV 0.102 [0.001, 0.282] | |
Time 1.4s | B 0.038 [0.034, 0.056] | v 4.908 [0.388, 10.535] | |
p 1 | vp 0.007 [0, 0.033] |
 | Mode β | Mean β | Median β | S.E. | Lower | Upper | Pr(β>0) | Symmetry |
---|---|---|---|---|---|---|---|---|
endo=Moderate or Severe Activity | 4.0092 | 5.1964 | 5.0990 | 1.4119 | 2.5152 | 8.0632 | 1.0000 | 1.15 |
endo=Moderate or Severe Activity x f(y) | -0.0597 | -0.1108 | -0.1072 | 0.0444 | -0.1970 | -0.0281 | 0.0005 | 0.79 |
The probability that the effect of moderate or severe activity decreases for higher levels of calprotectin exceeds 0.99. Let's graph the logit of the group-stratified empirical CDFs to see some visual evidence for this.
Ecdf(~ calpro, group=endo, fun=qlogis)
Non-parallelism is readily seen, indicating non-proportional odds. Note that allowing for a systematic departure from proportional odds is like having unequal group variances in a 2-sample \(t\)-test, but is more general.
With the constrained partial proportional odds model one can examine for a given y the evidence for a group effect in the tail of the distribution of Y beyond y. For ordinal logistic models the group effect is captured by the relative odds that Y \(\geq\) y. We can use the rms
contrast
function to do the calculations for varying y. contrast
will evaluate the cppo
function as needed (which is a scaled and centered \(\sqrt{}\)).
ys <- seq(100, 2500, by=100)
k  <- contrast(bcalp, list(endo='Moderate or Severe Activity'),
               list(endo='No or Mild Activity'), ycut=ys)
xl <- 'Calprotectin Cutoff'
par(mfrow=c(1,2))
with(k, plot(ycut, Contrast, xlab=xl, ylab='log OR', type='l'))
with(k, plot(ycut, PP, xlab=xl, ylab='P(OR > 1)', type='l'))
1.5 Binary Regression with Restricted Cubic Splines
Turn to the support
dataset and fit a binary logistic model to predict the probability of in-hospital death of critically ill adults. blrm
keeps posterior sampling efficient by orthonormalizing the design matrix before doing the sampling (this is done internally in the Stan code). This allows for arbitrary collinearities, for example in the basis functions used in restricted cubic splines. When there are such collinearities, expect to see some disagreements in estimates between blrm
and lrm
, because the latter does not do orthonormalization (only normalization to mean 0 variance 1). Collinearity implies that there are many different solutions to the equations, all giving almost the same predicted values.
getHdata(support)
dd <- datadist(support); options(datadist='dd')
f  <- lrm(hospdead ~ dzgroup + rcs(crea, 5) + rcs(meanbp, 5),
          data=support, eps=1e-4, x=TRUE, y=TRUE)
specs(f)
lrm(formula = hospdead ~ dzgroup + rcs(crea, 5) + rcs(meanbp,
5), data = support, x = TRUE, y = TRUE, eps = 1e-04)
Label Assumption
dzgroup dzgroup category
crea Serum creatinine Day 3 rcspline
meanbp Mean Arterial Blood Pressure Day 3 rcspline
Parameters
dzgroup ARF/MOSF w/Sepsis COPD CHF Cirrhosis Coma Colon Cancer Lung Cancer MOSF w/Malig
crea 0.59998 0.8999 1.2 1.7998 5.5996
meanbp 47 65.725 78 106 128.05
d.f.
dzgroup 7
crea 4
meanbp 4
f
Logistic Regression Model
lrm(formula = hospdead ~ dzgroup + rcs(crea, 5) + rcs(meanbp, 5), data = support, x = TRUE, y = TRUE, eps = 1e-04)
Frequencies of Missing Values Due to Each Variable
hospdead  dzgroup     crea   meanbp
       0        0        3        0
 | Model Likelihood Ratio Test | Discrimination Indexes | Rank Discrim. Indexes |
---|---|---|---|
Obs 997 | LR χ2 278.86 | R2 0.360 | C 0.820 |
0 744 | d.f. 15 | R215,997 0.233 | Dxy 0.640 |
1 253 | Pr(>χ2) <0.0001 | R215,566.4 0.372 | γ 0.641 |
max |∂log L/∂β| 3×10-9 | Brier 0.138 | τa 0.243 |
 | β | S.E. | Wald Z | Pr(>|Z|) |
---|---|---|---|---|
Intercept | 6.9789 | 1.5068 | 4.63 | <0.0001 |
dzgroup=COPD | -0.6235 | 0.3382 | -1.84 | 0.0653 |
dzgroup=CHF | -2.3452 | 0.4855 | -4.83 | <0.0001 |
dzgroup=Cirrhosis | 0.3552 | 0.3626 | 0.98 | 0.3273 |
dzgroup=Coma | 1.6365 | 0.3273 | 5.00 | <0.0001 |
dzgroup=Colon Cancer | -0.9825 | 0.6296 | -1.56 | 0.1187 |
dzgroup=Lung Cancer | 0.0808 | 0.3424 | 0.24 | 0.8135 |
dzgroup=MOSF w/Malig | 0.7160 | 0.2664 | 2.69 | 0.0072 |
crea | -2.9954 | 1.1432 | -2.62 | 0.0088 |
crea' | 261.5451 | 94.6489 | 2.76 | 0.0057 |
crea'' | -576.5474 | 224.5354 | -2.57 | 0.0102 |
crea''' | 338.1780 | 149.7214 | 2.26 | 0.0239 |
meanbp | -0.1035 | 0.0231 | -4.47 | <0.0001 |
meanbp' | 0.2508 | 0.1923 | 1.30 | 0.1922 |
meanbp'' | -0.4008 | 0.7294 | -0.55 | 0.5827 |
meanbp''' | 0.0499 | 0.7022 | 0.07 | 0.9433 |
# Compute the apparent standard error of Dxy (not accounting for overfitting)
# for comparison with the Bayesian HPD interval for Dxy
rcorr.cens(predict(f), support$hospdead)
C Index Dxy S.D. n missing
8.202431e-01 6.404862e-01 2.998483e-02 9.970000e+02 3.000000e+00
uncensored Relevant Pairs Concordant Uncertain
9.970000e+02 3.764640e+05 3.087920e+05 0.000000e+00
Function(f)
function (dzgroup = "ARF/MOSF w/Sepsis", crea = 1.1999512, meanbp = 78)
{
6.9789403 - 0.62348171 * (dzgroup == "COPD") - 2.3451715 *
(dzgroup == "CHF") + 0.35516492 * (dzgroup == "Cirrhosis") +
1.6364993 * (dzgroup == "Coma") - 0.98246981 * (dzgroup ==
"Colon Cancer") + 0.08076708 * (dzgroup == "Lung Cancer") +
0.71602209 * (dzgroup == "MOSF w/Malig") - 2.9954464 *
crea + 10.463337 * pmax(crea - 0.59997559, 0)^3 - 23.065276 *
pmax(crea - 0.89990234, 0)^3 + 13.529103 * pmax(crea -
1.1999512, 0)^3 - 0.90432144 * pmax(crea - 1.7998047,
0)^3 - 0.022843078 * pmax(crea - 5.5996094, 0)^3 - 0.10352776 *
meanbp + 3.8184801e-05 * pmax(meanbp - 47, 0)^3 - 6.1008621e-05 *
pmax(meanbp - 65.725, 0)^3 + 7.5995675e-06 * pmax(meanbp -
78, 0)^3 + 1.4835639e-05 * pmax(meanbp - 106, 0)^3 +
3.8861354e-07 * pmax(meanbp - 128.05, 0)^3
}
<environment: 0x3489205f8>
bsup <- blrm(hospdead ~ dzgroup + rcs(crea, 5) + rcs(meanbp, 5),
             data=support, file=rfile(bsup))
specs(bsup)
blrm(formula = hospdead ~ dzgroup + rcs(crea, 5) + rcs(meanbp,
5), data = support, file = rfile(bsup))
Label Assumption
dzgroup dzgroup category
crea Serum creatinine Day 3 rcspline
meanbp Mean Arterial Blood Pressure Day 3 rcspline
Parameters
dzgroup ARF/MOSF w/Sepsis COPD CHF Cirrhosis Coma Colon Cancer Lung Cancer MOSF w/Malig
crea 0.59998 0.8999 1.2 1.7998 5.5996
meanbp 47 65.725 78 106 128.05
d.f.
dzgroup 7
crea 4
meanbp 4
Function(bsup) # by default uses posterior mode parameter values
function (dzgroup = "ARF/MOSF w/Sepsis", crea = 1.1999512, meanbp = 78)
{
7.2091072 - 0.65111303 * (dzgroup == "COPD") - 2.4606359 *
(dzgroup == "CHF") + 0.34510828 * (dzgroup == "Cirrhosis") +
1.6721639 * (dzgroup == "Coma") - 1.11907 * (dzgroup ==
"Colon Cancer") + 0.063989481 * (dzgroup == "Lung Cancer") +
0.73134427 * (dzgroup == "MOSF w/Malig") - 3.0387295 *
crea + 10.621004 * pmax(crea - 0.59997559, 0)^3 - 23.411858 *
pmax(crea - 0.89990234, 0)^3 + 13.730783 * pmax(crea -
1.1999512, 0)^3 - 0.91662855 * pmax(crea - 1.7998047,
0)^3 - 0.023300522 * pmax(crea - 5.5996094, 0)^3 - 0.10730655 *
meanbp + 4.0697434e-05 * pmax(meanbp - 47, 0)^3 - 6.7903832e-05 *
pmax(meanbp - 65.725, 0)^3 + 1.2293481e-05 * pmax(meanbp -
78, 0)^3 + 1.443495e-05 * pmax(meanbp - 106, 0)^3 + 4.7796827e-07 *
pmax(meanbp - 128.05, 0)^3
}
<environment: 0x34c7cc478>
# To add an intercept use e.g. Function(bsup, intercept=coef(g, 'mode')[5])
bsup
Bayesian Logistic Model
Dirichlet Priors With Concentration Parameter 0.541 for Intercepts
blrm(formula = hospdead ~ dzgroup + rcs(crea, 5) + rcs(meanbp, 5), data = support, file = rfile(bsup))
Frequencies of Missing Values Due to Each Variable
hospdead  dzgroup     crea   meanbp
       0        0        3        0
 | Mixed Calibration/Discrimination Indexes | Discrimination Indexes | Rank Discrim. Indexes |
---|---|---|---|
Obs 997 | LOO log L -442.1±18.39 | g 1.822 [1.538, 2.09] | C 0.813 [0.808, 0.819] |
0 744 | LOO IC 884.2±36.78 | gp 0.248 [0.23, 0.273] | Dxy 0.626 [0.615, 0.638] |
1 253 | Effective p 16.62±1.16 | EV 0.29 [0.243, 0.332] | |
Draws 4000 | B 0.14 [0.138, 0.142] | v 2.836 [1.892, 3.679] | |
Chains 4 | vp 0.055 [0.046, 0.064] | ||
Time 4.1s | |||
p 15 |
 | Mode β | Mean β | Median β | S.E. | Lower | Upper | Pr(β>0) | Symmetry |
---|---|---|---|---|---|---|---|---|
Intercept | 6.9786 | 7.2091 | 7.1906 | 1.5247 | 4.2257 | 10.1458 | 1.0000 | 1.06 |
dzgroup=COPD | -0.6234 | -0.6511 | -0.6424 | 0.3425 | -1.3636 | -0.0177 | 0.0257 | 0.93 |
dzgroup=CHF | -2.3458 | -2.4606 | -2.4243 | 0.5227 | -3.4718 | -1.4389 | 0.0000 | 0.82 |
dzgroup=Cirrhosis | 0.3552 | 0.3451 | 0.3482 | 0.3728 | -0.4583 | 1.0186 | 0.8250 | 0.97 |
dzgroup=Coma | 1.6366 | 1.6722 | 1.6673 | 0.3451 | 1.0172 | 2.3684 | 1.0000 | 1.05 |
dzgroup=Colon Cancer | -0.9832 | -1.1191 | -1.0664 | 0.6817 | -2.4735 | 0.1212 | 0.0318 | 0.80 |
dzgroup=Lung Cancer | 0.0809 | 0.0640 | 0.0647 | 0.3401 | -0.5677 | 0.7668 | 0.5762 | 0.97 |
dzgroup=MOSF w/Malig | 0.7161 | 0.7313 | 0.7280 | 0.2786 | 0.1904 | 1.2858 | 0.9968 | 1.02 |
crea | -2.9954 | -3.0387 | -3.0476 | 1.1083 | -5.1718 | -0.8517 | 0.0020 | 0.97 |
crea' | 261.5393 | 265.4862 | 264.2944 | 92.0751 | 88.3029 | 439.4957 | 0.9985 | 1.01 |
crea'' | -576.5328 | -585.2107 | -582.6774 | 219.0139 | -986.1243 | -151.2353 | 0.0032 | 0.98 |
crea''' | 338.1678 | 343.2193 | 343.3486 | 146.6114 | 56.2586 | 620.0492 | 0.9912 | 1.01 |
meanbp | -0.1035 | -0.1073 | -0.1065 | 0.0237 | -0.1524 | -0.0605 | 0.0000 | 0.92 |
meanbp' | 0.2508 | 0.2673 | 0.2641 | 0.1984 | -0.1096 | 0.6511 | 0.9105 | 1.02 |
meanbp'' | -0.4006 | -0.4461 | -0.4350 | 0.7531 | -1.8836 | 1.0370 | 0.2865 | 0.98 |
meanbp''' | 0.0497 | 0.0808 | 0.0832 | 0.7253 | -1.2647 | 1.5434 | 0.5398 | 1.02 |
stanDx(bsup)
Iterations: 2000 on each of 4 chains, with 4000 posterior distribution samples saved
For each parameter, n_eff is a crude measure of effective sample size
and Rhat is the potential scale reduction factor on split chains
(at convergence, Rhat=1)
Checking sampler transitions treedepth.
Treedepth satisfactory for all transitions.
Checking sampler transitions for divergences.
No divergent transitions found.
Checking E-BFMI - sampler transitions HMC potential energy.
E-BFMI satisfactory.
Effective sample size satisfactory.
Split R-hat values satisfactory all parameters.
Processing complete, no problems detected.
EBFMI: 1.019 1.044 0.929 1.026
Parameter Rhat ESS bulk ESS tail
1 alpha[1] 1.001 3903 2842
2 beta[1] 1.002 6691 2883
3 beta[2] 1.001 4000 2502
4 beta[3] 1.000 7157 2958
5 beta[4] 1.001 5485 2940
6 beta[5] 0.999 6273 2876
7 beta[6] 1.003 7425 3154
8 beta[7] 1.001 7707 3274
9 beta[8] 1.000 6137 2834
10 beta[9] 1.002 7197 2994
11 beta[10] 1.003 6880 3251
12 beta[11] 1.001 7389 3140
13 beta[12] 1.000 5268 3267
14 beta[13] 1.000 4528 3135
15 beta[14] 1.001 4935 2923
16 beta[15] 1.000 5609 3420
stanDxplot(bsup)
plot(bsup)
Show approximate relative explained variation (REV) and compare this with Wald statistics from the frequentist lrm
model.
plot(rexVar(bsup, support))
plot(anova(f, test='LR'), 'proportion chisq')
singular information matrix in lrm.fit (rank= 12 ). Offending variable(s):
crea'''
Note: To get a Bayesian equivalent of a likelihood ratio test for comparing two models use the rmsb
function compareBmods
.
Now compute odds ratios over default inter-quartile ranges for continuous predictors, based on posterior mode parameters. Also show 0.95 HPD intervals. Note that unlike the print
method, the plot
method for summary
doesn’t actually compute HPD intervals, but approximates them by assuming normality and using the standard deviation of the posterior samples. Compare the plot with the ordinary lrm
result.
s <- summary(bsup)
s
Effects   Response: hospdead
 | Low | High | Δ | Effect | S.E. | Lower 0.95 | Upper 0.95 |
---|---|---|---|---|---|---|---|
crea | 0.8999 | 1.9 | 1.00 | 1.30100 | 0.2522 | 0.83160 | 1.80000 |
Odds Ratio | 0.8999 | 1.9 | 1.00 | 3.67300 | 2.29700 | 6.04700 | |
meanbp | 64.7500 | 107.0 | 42.25 | -0.44560 | 0.2295 | -0.87970 | 0.01253 |
Odds Ratio | 64.7500 | 107.0 | 42.25 | 0.64040 | 0.41490 | 1.01300 | |
dzgroup --- COPD:ARF/MOSF w/Sepsis | 1.0000 | 2.0 | -0.65110 | 0.3425 | -1.36400 | -0.01766 | |
Odds Ratio | 1.0000 | 2.0 | 0.52150 | 0.25570 | 0.98250 | ||
dzgroup --- CHF:ARF/MOSF w/Sepsis | 1.0000 | 3.0 | -2.46100 | 0.5227 | -3.47200 | -1.43900 | |
Odds Ratio | 1.0000 | 3.0 | 0.08538 | 0.03106 | 0.23720 | ||
dzgroup --- Cirrhosis:ARF/MOSF w/Sepsis | 1.0000 | 4.0 | 0.34510 | 0.3728 | -0.45830 | 1.01900 | |
Odds Ratio | 1.0000 | 4.0 | 1.41200 | 0.63240 | 2.76900 | ||
dzgroup --- Coma:ARF/MOSF w/Sepsis | 1.0000 | 5.0 | 1.67200 | 0.3451 | 1.01700 | 2.36800 | |
Odds Ratio | 1.0000 | 5.0 | 5.32400 | 2.76600 | 10.68000 | ||
dzgroup --- Colon Cancer:ARF/MOSF w/Sepsis | 1.0000 | 6.0 | -1.11900 | 0.6817 | -2.47300 | 0.12120 | |
Odds Ratio | 1.0000 | 6.0 | 0.32660 | 0.08429 | 1.12900 | ||
dzgroup --- Lung Cancer:ARF/MOSF w/Sepsis | 1.0000 | 7.0 | 0.06399 | 0.3401 | -0.56770 | 0.76680 | |
Odds Ratio | 1.0000 | 7.0 | 1.06600 | 0.56680 | 2.15300 | ||
dzgroup --- MOSF w/Malig:ARF/MOSF w/Sepsis | 1.0000 | 8.0 | 0.73130 | 0.2786 | 0.19040 | 1.28600 | |
Odds Ratio | 1.0000 | 8.0 | 2.07800 | 1.21000 | 3.61800 |
plot(s)
plot(summary(bsup))
Draw partial effect plots with 0.95 HPD intervals. Point estimates are posterior modes (which can be easily changed).
ggplot(Predict(bsup))
Compute estimated mortality probabilities at all levels of dzgroup
adjusting covariates to medians/modes. Need funint=FALSE
to tell Predict
that fun
is a simple 1-1 function of the linear predictor (unlike Mean
, Quantile
, etc.).
Predict(bsup, dzgroup, fun=plogis, funint=FALSE)
dzgroup crea meanbp yhat lower upper
1 ARF/MOSF w/Sepsis 1.199951 78 0.11632147 0.063245920 0.17586539
2 COPD 1.199951 78 0.06614979 0.023049916 0.11084229
3 CHF 1.199951 78 0.01236042 0.001862634 0.02504638
4 Cirrhosis 1.199951 78 0.15934393 0.064660453 0.26115460
5 Coma 1.199951 78 0.40782523 0.223143658 0.57618452
6 Colon Cancer 1.199951 78 0.04788483 0.003750662 0.10476836
7 Lung Cancer 1.199951 78 0.12457605 0.059036390 0.20223857
8 MOSF w/Malig 1.199951 78 0.21530408 0.100855426 0.32769271
Response variable (y):
Adjust to: crea=1.2 meanbp=78
Limits are 0.95 confidence limits
Draw a nomogram from posterior mode parameter values.
p <- nomogram(bsup, fun=plogis, funlabel='P(death)')
plot(p)
For comparison here is a nomogram based on maximum likelihood estimates of parameters rather than posterior modes.
plot(nomogram(f, fun=plogis, funlabel='P(death)'))
2 Partial Proportional Odds Model
The proportional odds (PO) assumption is a parallelism assumption reflecting the belief that the effect of baseline variables on, say, \(Y \geq 3\) is the same as the effect on \(Y \geq 4\). To relax that assumption, Peterson & Harrell (1990) developed the partial proportional odds (PPO) model. The blrm
function accepts a second model formula in the argument named ppo
that specifies the subset of predictors for which PO is not to be assumed but for which the model is effectively polytomous (multinomial). Note that for frequentist modeling the R VGAM
package handles the PPO model (as will be shown below). VGAM
is more flexible than blrm in allowing for all sorts of model restrictions.
The presence of the second formula triggers fitting a PPO model. The default is the unconstrained PPO model, which has \((k - 2) \times q\) extra parameters for a \(k\)-category \(Y\), where \(q\) is the number of columns in the design matrix induced by the second formula. This is normally too many parameters. More typically, the constrained PPO model (Peterson & Harrell (1990) Eq. 6) is used. This model is fitted when a function is provided as the cppo
argument. Generalizing Peterson and Harrell Eq. 6 the cppo
function can be a continuous function of \(Y\) as well as being a discontinuous function such as an indicator variable that allows a category of \(Y\) (typically the first or last) to have a special effect of selected covariates. When \(Y\) is continuous, cppo
would typically be the continuous value of \(Y\) or a monotonic function of it. This induces a model that is akin to having a treatment \(\times\) time interaction in a Cox proportional hazards model, or systematic heteroskedasticity in a linear model. When Y is very skewed, it may be more reasonable to use something like cppo = function(y) y^(1/3)
. Note that post-fitting functions for estimation and prediction are not implemented for the unconstrained partial PO model.
Note that the cppo
function used to trigger the use of the constrained PPO model is never evaluated at the lowest value of Y. That is because in the data likelihood, the probability element for observations at the minimum Y is one minus the probability element at the second lowest value of Y. So you can’t give a special effect at the minimum Y. If for example you have a scale ordered as death, sick, well and you want to allow for a special effect of a covariate on death, reverse the order of levels in the factor variable representing Y and specify cppo=function(y) y == 'dead'
.
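For instance, a hedged sketch with illustrative variable names (status, tx, d), not data from this vignette:

```{r}
# Hypothetical outcome ordered dead < sick < well (illustrative data)
status  <- factor(c('dead', 'sick', 'well', 'well', 'sick'),
                  levels=c('dead', 'sick', 'well'))
# Reverse the level order so that 'dead' is the highest category, which allows
# a special covariate effect for 'dead' via the cppo function
status2 <- factor(status, levels=rev(levels(status)))   # now well < sick < dead
# f <- blrm(status2 ~ tx, ~ tx, cppo=function(y) y == 'dead', data=d)   # illustrative call
```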
2.1 Unconstrained Partial PO Model
Let’s look first at the unconstrained PPO model, which is more suitable for categorical Y with not too many categories. Consider a \(2\times 3\) table of proportions (2 treatment groups, 3 ordered outcome levels) where the treatment effect is not in PO. We will fit a PO model and see how well it tries to reconstruct the 6 proportions, then fit a PPO model. As of 2020-05-11 blrm
has not implemented predict
-type functions for PPO models, so you will see predictions done the long way (which better shows how the PPO model works). Note: the VGAM
vgam
function parameterizes PPO effects by \(y\)-specific covariate effects whereas blrm
, like Peterson and Harrell, parameterizes the model by estimating increments in the log odds for the covariate effect for \(Y \geq y\) over and above the effect for \(Y \geq 2\), where \(Y\) has values 1, 2, …, \(k\). Bayesian model specification is detailed here.
Not shown here is that blrm
also allows for random effects with PPO models to handle longitudinal data if a compound symmetry correlation pattern is reasonable.
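A hedged sketch of such a fit (variable names y, tx, time, id, and d are illustrative; cluster() is blrm's notation for per-subject random intercepts):

```{r eval=FALSE}
# Random intercepts for subjects combined with a constrained partial PO
# (non-proportional) effect for treatment in a longitudinal ordinal model
f <- blrm(y ~ tx * time + cluster(id),   # cluster() gives per-subject random intercepts
          ~ tx, cppo=function(y) y,      # treatment allowed to act non-proportionally
          data=d, file=rfile(f))
```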
p0 <- c(.4, .2, .4)
p1 <- c(.3, .1, .6)
m  <- 50           # observations per cell
m0 <- p0 * m       # from proportions to frequencies
m1 <- p1 * m
x  <- c(rep(0, m), rep(1, m))
y0 <- c(rep(1, m0[1]), rep(2, m0[2]), rep(3, m0[3]))
y1 <- c(rep(1, m1[1]), rep(2, m1[2]), rep(3, m1[3]))
y  <- c(y0, y1)
table(x, y)
y
x 1 2 3
0 20 10 20
1 15 5 30
# A PO model cannot reproduce the original proportions
f <- lrm(y ~ x)
f
Logistic Regression Model
lrm(formula = y ~ x)
 | Model Likelihood Ratio Test | Discrimination Indexes | Rank Discrim. Indexes |
---|---|---|---|
Obs 100 | LR χ2 2.96 | R2 0.034 | C 0.574 |
1 35 | d.f. 1 | R21,100 0.019 | Dxy 0.149 |
2 15 | Pr(>χ2) 0.0855 | R21,82.9 0.023 | γ 0.290 |
3 50 | Brier 0.226 | τa 0.091 | |
max |∂log L/∂β| 9×10-9 |
 | β | S.E. | Wald Z | Pr(>|Z|) |
---|---|---|---|---|
y≥2 | 0.3177 | 0.2725 | 1.17 | 0.2437 |
y≥3 | -0.3177 | 0.2725 | -1.17 | 0.2437 |
x | 0.6601 | 0.3864 | 1.71 | 0.0876 |
predict(f, data.frame(x=c(0, 1)), type='fitted.ind')
y=1 y=2 y=3
1 0.4212365 0.1575270 0.4212365
2 0.2733366 0.1418996 0.5847638
require(VGAM)
fv <- vgam(y ~ x, cumulative(reverse=TRUE, parallel=TRUE))
coef(fv)
(Intercept):1 (Intercept):2 x
0.3177002 -0.3176987 0.6600579
predict(fv, data.frame(x=c(0, 1)), type='response')
1 2 3
1 0.4212363 0.1575270 0.4212367
2 0.2733369 0.1418997 0.5847635
# Now fit a PPO model that will reproduce all cell proportions
fvppo <- vgam(y ~ x, cumulative(reverse=TRUE, parallel=FALSE))
coef(fvppo)
(Intercept):1 (Intercept):2 x:1 x:2
0.4054651 -0.4054651 0.4418328 0.8109302
predict(fvppo, data.frame(x=c(0, 1)), type='response') # perfect recovery
1 2 3
1 0.4 0.2 0.4
2 0.3 0.1 0.6
# Function to manually compute cell probabilities
pprop <- function(co, type, centered=FALSE) {
  x <- if(centered) c(-0.5, 0.5) else 0:1
  switch(type,
    vgam = {
      pge2 <- plogis(co[1] + x * co[3])
      peq3 <- plogis(co[2] + x * co[4])
      rbind(c(1 - pge2[1], pge2[1] - peq3[1], peq3[1]),
            c(1 - pge2[2], pge2[2] - peq3[2], peq3[2]))
    }, blrm = {
      pge2 <- plogis(co[1] + x * co[3])
      peq3 <- plogis(co[2] + x * (co[3] + co[4]))
      rbind(c(1 - pge2[1], pge2[1] - peq3[1], peq3[1]),
            c(1 - pge2[2], pge2[2] - peq3[2], peq3[2]))
    } )
}
co <- coef(vgam(y ~ x, cumulative(reverse=TRUE, parallel=FALSE)))
pprop(co, type='vgam')
[,1] [,2] [,3]
[1,] 0.4 0.2 0.4
[2,] 0.3 0.1 0.6
# Now try blrm
# First fit an unconstrained partial PO model
b <- blrm(y ~ x, ~ x, method='opt')
Initial log joint probability = -114.692
Iter log prob ||dx|| ||grad|| alpha alpha0 # evals Notes
16 -102.474 0.00286257 0.000959234 1 1 19
Optimization terminated normally:
Convergence detected: relative gradient magnitude is below tolerance
Finished in 0.1 seconds.
coef(b)
y>=2 y>=3 x x:y>=3
0.405330 -0.405536 0.442030 0.368990
pprop(coef(b), type='blrm')
[,1] [,2] [,3]
[1,] 0.4000324 0.1999846 0.3999830
[2,] 0.2999870 0.1000085 0.6000045
# Fit a constrained PPO model
b <- blrm(y ~ x, ~ x, cppo=function(y) y, method='opt')
Initial log joint probability = -109.168
Iter log prob ||dx|| ||grad|| alpha alpha0 # evals Notes
19 -96.9501 0.00097211 0.000261165 0.7506 0.7506 23
Optimization terminated normally:
Convergence detected: relative gradient magnitude is below tolerance
Finished in 0.1 seconds.
coef(b)
y>=2 y>=3 x x x f(y)
0.4055070 -0.4054533 -0.2965560 0.3691440
pprop(coef(b), type='blrm')
[,1] [,2] [,3]
[1,] 0.3999899 0.2000072 0.4000028
[2,] 0.4727892 0.1096672 0.4175436
# First mimic PO model by penalizing PPO term to nearly zero
# Quickly get maximum likelihood estimates (posterior modes)
b <- blrm(y ~ x, ~x, priorsdppo=0.01, method='opt')
Initial log joint probability = -105.482
Iter log prob ||dx|| ||grad|| alpha alpha0 # evals Notes
20 -94 0.00182497 0.000148781 0.9861 0.9861 29
Optimization terminated normally:
Convergence detected: relative gradient magnitude is below tolerance
Finished in 0.1 seconds.
coef(b)
y>=2 y>=3 x x:y>=3
3.177030e-01 -3.176982e-01 6.600460e-01 1.560706e-05
pprop(coef(b), type='blrm')
[,1] [,2] [,3]
[1,] 0.4212356 0.1575275 0.4212368
[2,] 0.2733387 0.1418968 0.5847645
# Now really fit PPO model, at first only getting MLE
# Do full posterior sampling
b <- blrm(y ~ x, ~ x, priorsdppo=1000, method='opt')
Initial log joint probability = -116.995
Iter log prob ||dx|| ||grad|| alpha alpha0 # evals Notes
16 -104.777 0.00286617 0.000960758 1 1 19
Optimization terminated normally:
Convergence detected: relative gradient magnitude is below tolerance
Finished in 0.1 seconds.
coef(b) # also the posterior mode
y>=2 y>=3 x x:y>=3
0.4053560 -0.4055613 0.4419820 0.3690740
coef(b)[3] + coef(b)[4]
x
0.811056
bppo <- blrm(y ~ x, ~ x, priorsdppo=1000, file=rfile(bppo))
# take differences in last 2 coefficients to get our scheme
# Check recovery of proportions, using posterior mode/mean/median
pprop(coef(bppo, 'mode'), type='blrm')
[,1] [,2] [,3]
[1,] 0.4000262 0.1999969 0.3999769
[2,] 0.2999916 0.1000013 0.6000071
pprop(coef(bppo, 'mean'), type='blrm')
[,1] [,2] [,3]
[1,] 0.3965603 0.2098856 0.3935541
[2,] 0.2887990 0.1149495 0.5962515
pprop(coef(bppo, 'median'), type='blrm')
[,1] [,2] [,3]
[1,] 0.3963650 0.2079793 0.3956557
[2,] 0.2909916 0.1139354 0.5950730
bppo
Bayesian Partial Proportional Odds Ordinal Logistic Model
Dirichlet Priors With Concentration Parameter 0.541 for Intercepts
blrm(formula = y ~ x, ppo = ~x, priorsdppo = 1000, file = rfile(bppo))
Mixed Calibration/ Discrimination Indexes |
Discrimination Indexes |
Rank Discrim. Indexes |
|
---|---|---|---|
Obs 100 | LOO log L -101.73±4.63 | g 0.266 [0, 0.585] | C 0.607 [0.355, 0.645] |
1 35 | LOO IC 203.47±9.25 | gp 0.059 [0.001, 0.129] | Dxy 0.213 [-0.29, 0.29] |
2 15 | Effective p 4.04±0.36 | EV 0.021 [0, 0.073] | |
3 50 | B 0.229 [0.225, 0.237] | v 0.098 [0, 0.339] | |
Draws 4000 | vp 0.005 [0, 0.016] | ||
Chains 4 | |||
Time 0.6s | |||
p 1 |
Mode β | Mean β | Median β | S.E. | Lower | Upper | Pr(β>0) | Symmetry | |
---|---|---|---|---|---|---|---|---|
y≥2 | 0.4054 | 0.4198 | 0.4206 | 0.2857 | -0.1776 | 0.9563 | 0.9275 | 1.03 |
y≥3 | -0.4056 | -0.4324 | -0.4236 | 0.2891 | -0.9956 | 0.1230 | 0.0622 | 0.98 |
x | 0.4420 | 0.4814 | 0.4699 | 0.4330 | -0.3961 | 1.2988 | 0.8692 | 1.07 |
x:y≥3 | 0.3691 | 0.3409 | 0.3386 | 0.3100 | -0.2381 | 0.9707 | 0.8648 | 1.01 |
2.2 Constrained Partial PO Model
Consider the same dataset analyzed above. Specify a constrained PPO model that in this particular case is really unconstrained because it has a total of two parameters to handle the group effect, and there are only \(k=3\) levels of Y.
# cppo function specifies that there is a special effect of x for y=3
bcppo <- blrm(y ~ x, ~ x, cppo=function(y) y == 3, file=rfile(bcppo))
cppo  <- bcppo$cppo
cppo
function (y)
y == 3
<environment: 0x34f02d890>
b <- coef(bcppo, 'mode')
rbind(Mode=b, Mean=coef(bcppo))
y>=2 y>=3 x x x f(y)
Mode 0.4053560 -0.4055614 0.4419820 0.3690740
Mean 0.4201117 -0.4333377 0.4753288 0.3440668
# Compute the 4 cumulative probabilities using the posterior mode (MLE)
# b[3] and b[4] are multiplied by x (hence drop out of the x=0 row)
L <- rbind('x=0' = c(b[1], b[2]),
           'x=1' = c(b[1] + b[3] + cppo(2) * b[4], b[2] + b[3] + cppo(3) * b[4]))
plogis(L)
y>=2 y>=3
x=0 0.5999738 0.3999769
x=1 0.7000084 0.6000071
Now consider the severity of nausea data from Peterson & Harrell (1990).
d0 <- data.frame(tx=0, y=c(rep(0, 43), rep(1, 39), rep(2, 13), rep(3, 22),
                           rep(4, 15), rep(5, 29)))
d1 <- data.frame(tx=1, y=c(rep(0, 7), rep(1, 7), rep(2, 3), rep(3, 12),
                           rep(4, 15), rep(5, 14)))
d  <- rbind(d0, d1)
d$tx <- factor(d$tx, 0:1, c('No cisplatin', 'cisplatin'))
dd <- datadist(d); options(datadist='dd')
with(d, table(tx, y))
y
tx 0 1 2 3 4 5
No cisplatin 43 39 13 22 15 29
cisplatin 7 7 3 12 15 14
# Allow for a different effect of tx at y=5
g <- function(y) y == 5   # max(y)=5 and y is discrete
# Check against maximum likelihood estimates in Peterson & Harrell
f <- blrm(y ~ tx, ~ tx, cppo=g, data=d, method='opt')
Initial log joint probability = -387.608
Iter log prob ||dx|| ||grad|| alpha alpha0 # evals Notes
23 -367.405 0.00315823 0.00395241 1 1 27
Optimization terminated normally:
Convergence detected: relative gradient magnitude is below tolerance
Finished in 0.1 seconds.
# Compute the treatment effect log(OR) for y=1, 2, 3, 4, 5
k <- coef(f)
k
y>=1 y>=2 y>=3 y>=4
0.99762818 -0.01286082 -0.32561932 -1.01304282
y>=5 tx=cisplatin tx=cisplatin x f(y)
-1.54661927 1.06393152 -0.62925705
k[6] + g(1:5) * k[7]   # matches paper
[1] 1.0639315 1.0639315 1.0639315 1.0639315 0.4346745
# Now get posterior distributions of parameters
fp <- blrm(y ~ tx, ~ tx, cppo=g, data=d, file=rfile(bnausea))
Initial log joint probability = -387.608
Iter log prob ||dx|| ||grad|| alpha alpha0 # evals Notes
23 -367.405 0.00315823 0.00395241 1 1 27
Optimization terminated normally:
Convergence detected: relative gradient magnitude is below tolerance
Finished in 0.1 seconds.
Running MCMC with 4 chains, at most 11 in parallel...
Chain 2 finished in 0.6 seconds.
Chain 3 finished in 0.5 seconds.
Chain 1 finished in 0.6 seconds.
Chain 4 finished in 0.5 seconds.
All 4 chains finished successfully.
Mean chain execution time: 0.6 seconds.
Total execution time: 0.7 seconds.
rbind(coef(f), coef(fp, 'mode'), coef(fp, 'mean'))
y>=1 y>=2 y>=3 y>=4 y>=5 tx=cisplatin
[1,] 0.9976282 -0.012860819 -0.3256193 -1.013043 -1.546619 1.063932
[2,] 0.9976282 -0.012860819 -0.3256193 -1.013043 -1.546619 1.063932
[3,] 1.0101354 -0.009347481 -0.3262213 -1.016829 -1.569046 1.079022
tx=cisplatin x f(y)
[1,] -0.6292571
[2,] -0.6292571
[3,] -0.6558878
k <- coef(fp)   # posterior means
k[6] + g(1:5) * k[7]   # close to paper
[1] 1.0790225 1.0790225 1.0790225 1.0790225 0.4231347
dat <- data.frame(tx=levels(d$tx))
# Get posterior mean and 0.95 HPD intervals for treatment
# effects at all levels of y
contrast(fp, list(tx='cisplatin'), list(tx='No cisplatin'), ycut=1:5)
Contrast S.E. Lower Upper Pr(Contrast>0)
1 y=1 1.0790225 0.2844305 0.5399123 1.645820 1.0000
2* y=2 1.0790225 0.2844305 0.5399123 1.645820 1.0000
3* y=3 1.0790225 0.2844305 0.5399123 1.645820 1.0000
4* y=4 1.0790225 0.2844305 0.5399123 1.645820 1.0000
5 y=5 0.4231347 0.3525891 -0.3211363 1.062355 0.8842
Redundant contrasts are denoted by *
Intervals are 0.95 highest posterior density intervals
Contrast is the posterior mean
Predict(fp, tx)
tx yhat lower upper
1 No cisplatin -0.009347481 -0.3202163 0.2767592
2 cisplatin 1.069675010 0.5691994 1.6162737
Response variable (y): log odds
Limits are 0.95 confidence limits
If the response variable is discrete and has character strings for the ordered factor levels, you can use these strings in the cppo function definition. For example, suppose that Y was a factor variable with levels "ok", "in pain", "stroke", and "death". To allow a treatment to have a different effect for the last two levels while adjusting for a covariate age that is assumed to operate in proportional odds, one can code
f <- blrm(y ~ age + tx, ~ tx, cppo=function(y) y %in% c('stroke', 'death'))
There are special arguments to some of the rms functions for getting estimates or predictions from partial PO models. For the Predict function you can specify ycut or kint to give the response variable value such that the logit or probability of \(Y\geq y\) is being estimated. For contrast (full name contrast.rms) you can specify the argument ycut as a scalar or vector. When it is a scalar, that value of the y cutoff is used for all contrasts. When it is a vector, it is assumed either (1) to have length equal to the number of contrasts being specified, so that the appropriate value of y is used for each contrast, or (2) when only one contrast is requested, the contrast is repeated length(ycut) times to estimate the effect of varying y. This was done in the last part of the example above.
When the response variable is continuous, it is more flexible to specify ycut to Predict than to specify kint, because with ycut you can specify a value of Y that did not occur in the data (when the cppo function is continuous), taking into account the degree of non-proportional odds at that value.
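For concreteness, here is a minimal sketch (not run in the original document) of using ycut with the nausea fit fp from above; the output will of course depend on the posterior draws.
# Treatment effect specifically on the P(Y >= 5) scale
Predict(fp, tx, ycut=5)
# A single contrast evaluated at one y cutoff (scalar ycut)
contrast(fp, list(tx='cisplatin'), list(tx='No cisplatin'), ycut=5)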
3 Longitudinal Data Examples: Random Effects
3.1 Schizophrenia Dataset from the mixor Package
The R mixor package fits frequentist random effects proportional odds models. Let's analyze the first dataset discussed in the mixor package vignette. The outcome is a 4-level severity of illness scale. Unfortunately, mixor is no longer on CRAN, so we access the dataset on the Vanderbilt Department of Biostatistics data repository and use the ordinal package to fit the random intercepts model.
require(ordinal)
getHdata(schizophrenia)
d <- schizophrenia
f <- clmm2(factor(imps79o) ~ sqrt(Week) * TxDrug, random=factor(id),
           link='logistic', Hess=TRUE, data=d)
summary(f)
Cumulative Link Mixed Model fitted with the Laplace approximation
Call:
clmm2(location = factor(imps79o) ~ sqrt(Week) * TxDrug, random = factor(id),
data = d, Hess = TRUE, link = "logistic")
Random effects:
Var Std.Dev
factor(id) 3.606788 1.899154
Location coefficients:
Estimate Std. Error z value Pr(>|z|)
sqrt(Week) -0.7678 0.0145 -52.8160 < 2.22e-16
TxDrug -0.0681 0.0143 -4.7666 1.8733e-06
sqrt(Week):TxDrug -1.1982 0.0146 -82.1808 < 2.22e-16
No scale coefficients
Threshold coefficients:
Estimate Std. Error z value
1|2 -5.8488 0.0143 -408.0444
2|3 -2.8233 0.0144 -196.6312
3|4 -0.7143 0.0828 -8.6313
log-likelihood: -1708.111
AIC: 3430.221
Condition number of Hessian: 46.31894
sqrt(diag(vcov(f)))
1|2 2|3 3|4 sqrt(Week)
0.01433368 0.01435845 0.08275618 0.01453735
TxDrug sqrt(Week):TxDrug
0.01428859 0.01458059 0.01362287
Fit the same model using the Bayesian approach.
bmixor <- blrm(imps79o ~ sqrt(Week) * TxDrug + cluster(id), data=d,
               file=rfile(bmixor))
Running MCMC with 4 chains, at most 11 in parallel...
Chain 3 finished in 8.3 seconds.
Chain 1 finished in 8.6 seconds.
Chain 2 finished in 8.5 seconds.
Chain 4 finished in 8.6 seconds.
All 4 chains finished successfully.
Mean chain execution time: 8.5 seconds.
Total execution time: 8.7 seconds.
bmixor
Bayesian Proportional Odds Ordinal Logistic Model
Dirichlet Priors With Concentration Parameter 0.455 for Intercepts
blrm(formula = imps79o ~ sqrt(Week) * TxDrug + cluster(id), data = d, file = rfile(bmixor))
Frequencies of Responses
1 2 3 4 190 474 412 527
Mixed Calibration/ Discrimination Indexes |
Discrimination Indexes |
Rank Discrim. Indexes |
|
---|---|---|---|
Obs 1603 | B 0.198 [0.194, 0.201] | g 1.942 [1.759, 2.106] | C 0.769 [0.769, 0.771] |
Draws 4000 | gp 0.338 [0.313, 0.355] | Dxy 0.538 [0.537, 0.541] | |
Chains 4 | EV 0.397 [0.348, 0.436] | ||
Time 10.4s | v 2.977 [2.416, 3.472] | ||
p 3 | vp 0.094 [0.082, 0.104] | ||
Cluster on id |
|||
Clusters 437 | |||
σγ 1.9378 [1.7124, 2.1695] |
Mean β | Median β | S.E. | Lower | Upper | Pr(β>0) | Symmetry | |
---|---|---|---|---|---|---|---|
y≥2 | 5.8517 | 5.8505 | 0.3296 | 5.1667 | 6.4509 | 1.0000 | 1.00 |
y≥3 | 2.8202 | 2.8212 | 0.2872 | 2.2496 | 3.3629 | 1.0000 | 1.02 |
y≥4 | 0.7032 | 0.7051 | 0.2753 | 0.1836 | 1.2561 | 0.9948 | 1.03 |
Week | -0.7647 | -0.7641 | 0.1333 | -1.0306 | -0.5136 | 0.0000 | 0.98 |
TxDrug | -0.0583 | -0.0500 | 0.3146 | -0.6694 | 0.5673 | 0.4310 | 0.97 |
Week × TxDrug | -1.2075 | -1.2068 | 0.1525 | -1.5099 | -0.9072 | 0.0000 | 1.03 |
The posterior median for \(\sigma_\gamma\) is 1.93776, which compares well with the ordinal estimate of 1.899. The \(\hat{\beta}\)s also compare very well. Note that the model is stated differently, which makes two of the intercepts have different meanings across packages.
3.2 Simulated Random Effects Longitudinal Data
Let’s generate some data with repeatedly measured outcome per subject where the outcome is binary and the random effects have a \(N(0, 0.25^2)\) distribution. 500 subjects have 10 measurements each.
n  <- 500               # subjects
set.seed(2)
re <- rnorm(n) * 0.25   # worked fine also with rnorm(n) * 4
X  <- runif(n)          # baseline covariate, will be duplicated over repeats
m  <- 10                # measurements per subject

id <- rep(1 : n, each = m)
x  <- X[id]
L  <- x + re[id]        # actual logit
y  <- ifelse(runif(n * m) <= plogis(L), 1, 0)
f  <- lrm(y ~ x, x=TRUE, y=TRUE)   # ordinary fit
f
# now use cluster sandwich covariance estimator:
Logistic Regression Model
lrm(formula = y ~ x, x = TRUE, y = TRUE)
Model Likelihood Ratio Test |
Discrimination Indexes |
Rank Discrim. Indexes |
|
---|---|---|---|
Obs 5000 | LR χ2 93.67 | R2 0.025 | C 0.581 |
0 1908 | d.f. 1 | R21,5000 0.018 | Dxy 0.162 |
1 3092 | Pr(>χ2) <0.0001 | R21,3539.7 0.026 | γ 0.162 |
max |∂log L/∂β| 1×10-13 | Brier 0.232 | τa 0.076 |
β | S.E. | Wald Z | Pr(>|Z|) | |
---|---|---|---|---|
Intercept | 0.0087 | 0.0566 | 0.15 | 0.8783 |
x | 0.9432 | 0.0983 | 9.60 | <0.0001 |
g <- robcov(f, id)   # covariance matrix adjusted for clustering
g
Logistic Regression Model
lrm(formula = y ~ x, x = TRUE, y = TRUE)
Model Likelihood Ratio Test |
Discrimination Indexes |
Rank Discrim. Indexes |
|
---|---|---|---|
Obs 5000 | LR χ2 93.67 | R2 0.025 | C 0.581 |
0 1908 | d.f. 1 | R21,5000 0.018 | Dxy 0.162 |
1 3092 | Pr(>χ2) <0.0001 | R21,3539.7 0.026 | γ 0.162 |
Cluster on id |
Brier 0.232 | τa 0.076 | |
Clusters 500 | |||
max |∂log L/∂β| 1×10-13 |
β | S.E. | Wald Z | Pr(>|Z|) | |
---|---|---|---|---|
Intercept | 0.0087 | 0.0613 | 0.14 | 0.8876 |
x | 0.9432 | 0.1056 | 8.93 | <0.0001 |
We first fit an inappropriate Bayesian model in which the random effects are omitted.
# Note: loo defaults to FALSE when n > 1000 as in this case
# Need loo for compareBmods
breo <- blrm(y ~ x, loo=TRUE, file=rfile(breo))
Initial log joint probability = -5481.46
Iter log prob ||dx|| ||grad|| alpha alpha0 # evals Notes
14 -3277.38 0.040852 2.66905e-05 1 1 15
Optimization terminated normally:
Convergence detected: relative gradient magnitude is below tolerance
Finished in 0.1 seconds.
Running MCMC with 4 chains, at most 11 in parallel...
Chain 4 finished in 5.7 seconds.
Chain 2 finished in 5.8 seconds.
Chain 1 finished in 5.9 seconds.
Chain 3 finished in 6.2 seconds.
All 4 chains finished successfully.
Mean chain execution time: 5.9 seconds.
Total execution time: 6.3 seconds.
breo
Bayesian Logistic Model
Dirichlet Priors With Concentration Parameter 0.541 for Intercepts
blrm(formula = y ~ x, loo = TRUE, file = rfile(breo))
Mixed Calibration/ Discrimination Indexes |
Discrimination Indexes |
Rank Discrim. Indexes |
|
---|---|---|---|
Obs 5000 | LOO log L -3279.46±18.95 | g 0.329 [0.258, 0.393] | C 0.581 [0.581, 0.581] |
0 1908 | LOO IC 6558.91±37.89 | gp 0.077 [0.06, 0.091] | Dxy 0.162 [0.162, 0.162] |
1 3092 | Effective p 2.09±0.02 | EV 0.019 [0.012, 0.026] | |
Draws 4000 | B 0.232 [0.232, 0.232] | v 0.083 [0.048, 0.114] | |
Chains 4 | vp 0.004 [0.003, 0.006] | ||
Time 9.1s | |||
p 1 |
Mode β | Mean β | Median β | S.E. | Lower | Upper | Pr(β>0) | Symmetry | |
---|---|---|---|---|---|---|---|---|
Intercept | 0.0087 | 0.0082 | 0.0083 | 0.0594 | -0.1121 | 0.1193 | 0.5575 | 1.03 |
x | 0.9432 | 0.9445 | 0.9461 | 0.1027 | 0.7528 | 1.1569 | 1.0000 | 1.02 |
Now use a proper Bayesian random effects model. The prior distribution for the standard deviation \(\sigma_{\gamma}\) of the random effects (\(\gamma\)s) is assumed to be exponential when psigma=2, and we will use the default mean of 1.0.
bre <- blrm(y ~ x + cluster(id), psigma=2, loo=TRUE, file=rfile(bre))
Running MCMC with 4 chains, at most 11 in parallel...
Chain 3 finished in 19.5 seconds.
Chain 2 finished in 19.8 seconds.
Chain 1 finished in 20.4 seconds.
Chain 4 finished in 21.0 seconds.
All 4 chains finished successfully.
Mean chain execution time: 20.2 seconds.
Total execution time: 21.2 seconds.
bre
Bayesian Logistic Model
Dirichlet Priors With Concentration Parameter 0.541 for Intercepts
blrm(formula = y ~ x + cluster(id), psigma = 2, loo = TRUE, file = rfile(bre))
Mixed Calibration/ Discrimination Indexes |
Discrimination Indexes |
Rank Discrim. Indexes |
|
---|---|---|---|
Obs 5000 | LOO log L -3273.23±19.25 | g 0.333 [0.267, 0.411] | C 0.581 [0.581, 0.581] |
0 1908 | LOO IC 6546.47±38.5 | gp 0.078 [0.062, 0.095] | Dxy 0.162 [0.162, 0.162] |
1 3092 | Effective p 83.32±0.67 | EV 0.019 [0.011, 0.028] | |
Draws 4000 | B 0.232 [0.232, 0.232] | v 0.085 [0.048, 0.121] | |
Chains 4 | vp 0.005 [0.003, 0.006] | ||
Time 24.6s | |||
p 1 | |||
Cluster on id |
|||
Clusters 500 | |||
σγ 0.2926 [0.1418, 0.4154] |
Mean β | Median β | S.E. | Lower | Upper | Pr(β>0) | Symmetry | |
---|---|---|---|---|---|---|---|
Intercept | 0.0106 | 0.0105 | 0.0624 | -0.1110 | 0.1325 | 0.5698 | 1.02 |
x | 0.9583 | 0.9568 | 0.1074 | 0.7665 | 1.1812 | 1.0000 | 1.00 |
plot(bre)
Before delving more into the random effects model, let’s compare this new model with the previous model that erroneously omitted the random effects.
compareBmods(breo, bre)
Method: stacking
------
weight
model1 0.051
model2 0.949
Roughly speaking, of the two models, the one with random effects has a probability of about 0.95 of being the correct one. See rstan::loo and loo::loo.array for details.
Now let’s get into more details from the random effects model fit.
# Plot distribution of the 500 estimated random effects (posterior medians)
hist(bre$gammas, xlab='Estimated Random Effects', nclass=40)
Now generate similar data except with a bimodal random effects distribution. This will fool the normal random-effects prior into estimating a wider variance for a single normal distribution, but will still result in estimated random effects that are somewhat realistic.
n  <- 500
set.seed(3)
re <- c(rnorm(n/2, mean=-1.75), rnorm(n/2, mean=1.75)) * 0.25
cat('SD of real random effects:', round(sd(re), 4), '\n')
SD of real random effects: 0.5115
X <- runif(n)   # baseline covariate, will be duplicated over repeats
m <- 10         # measurements per subject

id <- rep(1 : n, each = m)
x  <- X[id]
L  <- x + re[id]   # actual logit
y  <- ifelse(runif(n * m) <= plogis(L), 1, 0)
breb <- blrm(y ~ x + cluster(id), file=rfile(breb))
Running MCMC with 4 chains, at most 11 in parallel...
Chain 2 finished in 18.6 seconds.
Chain 4 finished in 18.4 seconds.
Chain 1 finished in 18.7 seconds.
Chain 3 finished in 18.8 seconds.
All 4 chains finished successfully.
Mean chain execution time: 18.6 seconds.
Total execution time: 19.0 seconds.
breb
Bayesian Logistic Model
Dirichlet Priors With Concentration Parameter 0.541 for Intercepts
blrm(formula = y ~ x + cluster(id), file = rfile(breb))
Mixed Calibration/ Discrimination Indexes |
Discrimination Indexes |
Rank Discrim. Indexes |
|
---|---|---|---|
Obs 5000 | B 0.233 [0.233, 0.234] | g 0.347 [0.262, 0.432] | C 0.578 [0.578, 0.578] |
0 1928 | gp 0.08 [0.061, 0.099] | Dxy 0.156 [0.156, 0.156] | |
1 3072 | EV 0.021 [0.012, 0.031] | ||
Draws 4000 | v 0.092 [0.052, 0.14] | ||
Chains 4 | vp 0.005 [0.003, 0.007] | ||
Time 22.3s | |||
p 1 | |||
Cluster on id |
|||
Clusters 500 | |||
σγ 0.5866 [0.4965, 0.6855] |
Mean β | Median β | S.E. | Lower | Upper | Pr(β>0) | Symmetry | |
---|---|---|---|---|---|---|---|
Intercept | -0.0124 | -0.0122 | 0.0811 | -0.1777 | 0.1367 | 0.4410 | 0.99 |
x | 1.0833 | 1.0802 | 0.1506 | 0.8025 | 1.3836 | 1.0000 | 1.05 |
par(mfrow=c(2, 2))
hist(breb$gammas, xlab='Estimated Random Effects', nclass=40, main='')
hist(re, xlab='Real Random Effects', nclass=40, main='')
plot(re, breb$gammas, xlab='Real', ylab='Estimated')
abline(a=0, b=1)
3.3 Absorbing State in Mixed Effects Ordinal Regression
blrm is not designed to handle this situation, but let's see how it performs.
For an ordinal outcome y=0, 1, 2, 3, 4, 5, suppose that y=5 represents an absorbing state such as death. Suppose that subjects are observed for 10 days, and if death occurs within those days, all later values of y for that subject are set to 5. Generate repeated outcomes under a \(N(0, 0.25^2)\) random effects model with two treatments: a and b. The b:a odds ratio is 0.65 and the cell probabilities are 0.3, 0.3, 0.1, 0.1, 0.1, 0.1 corresponding to y=0-5, when the random effect is zero.
# Generate data as if there is no absorbing state
n <- 1000
set.seed(6)
pa <- c(.3, .3, .1, .1, .1, .1)       # P(Y=0-5 | tx=a, random effect=0)
pb <- pomodm(p=pa, odds.ratio=0.65)   # P(Y=0-5 | tx=b, re=0)   # Hmisc
round(pb, 3)
[1] 0.397 0.300 0.084 0.078 0.072 0.067
re <- rnorm(n) * 0.25
tx <- c(rep('a', n/2), rep('b', n/2))   # will be duplicated over repeats
m  <- 10                                # measurements per subject

id   <- rep(1 : n, each = m)
time <- rep(1 : m, n)
or   <- exp(log(0.65) * (tx[id] == 'b') + re[id])
y    <- integer(n * m)
for(j in 1 : (n * m)) {
  p    <- pomodm(p=pa, odds.ratio=or[j])
  y[j] <- sample(0:5, 1, p, replace=TRUE)
}
Tx <- tx[id]
table(Tx, y)
y
Tx 0 1 2 3 4 5
a 1517 1448 506 516 491 522
b 2008 1524 424 379 320 345
The first Bayesian proportional odds model fitted is the one that exactly matches the data generation model, as we have not yet imposed an absorbing state, so that outcomes with y < 5 can appear after a y=5 outcome for the subject.
bst <- blrm(y ~ Tx + cluster(id), file=rfile(bst))
Running MCMC with 4 chains, at most 11 in parallel...
Chain 3 finished in 63.6 seconds.
Chain 4 finished in 65.6 seconds.
Chain 2 finished in 66.1 seconds.
Chain 1 finished in 66.4 seconds.
All 4 chains finished successfully.
Mean chain execution time: 65.4 seconds.
Total execution time: 66.6 seconds.
bst
Bayesian Proportional Odds Ordinal Logistic Model
Dirichlet Priors With Concentration Parameter 0.345 for Intercepts
blrm(formula = y ~ Tx + cluster(id), file = rfile(bst))
Frequencies of Responses
0 1 2 3 4 5 3525 2972 930 895 811 867
Mixed Calibration/ Discrimination Indexes |
Discrimination Indexes |
Rank Discrim. Indexes |
|
---|---|---|---|
Obs 10000 | B 0.226 [0.226, 0.226] | g 0.239 [0.203, 0.278] | C 0.594 [0.594, 0.594] |
Draws 4000 | gp 0.054 [0.047, 0.064] | Dxy 0.189 [0.189, 0.189] | |
Chains 4 | EV 0.013 [0.009, 0.017] | ||
Time 72.6s | v 0.057 [0.041, 0.077] | ||
p 1 | vp 0.003 [0.002, 0.004] | ||
Cluster on id |
|||
Clusters 1000 | |||
σγ 0.3085 [0.2425, 0.373] |
Mean β | Median β | S.E. | Lower | Upper | Pr(β>0) | Symmetry | |
---|---|---|---|---|---|---|---|
y≥1 | 0.8704 | 0.8705 | 0.0310 | 0.8075 | 0.9300 | 1.0000 | 1.04 |
y≥2 | -0.3992 | -0.3996 | 0.0305 | -0.4551 | -0.3351 | 0.0000 | 1.02 |
y≥3 | -0.8561 | -0.8563 | 0.0313 | -0.9156 | -0.7934 | 0.0000 | 1.01 |
y≥4 | -1.4119 | -1.4116 | 0.0341 | -1.4777 | -1.3431 | 0.0000 | 1.01 |
y≥5 | -2.1785 | -2.1784 | 0.0416 | -2.2617 | -2.1000 | 0.0000 | 0.97 |
Tx=b | -0.4776 | -0.4775 | 0.0409 | -0.5569 | -0.3950 | 0.0000 | 0.95 |
stanDx(bst)
Iterations: 2000 on each of 4 chains, with 4000 posterior distribution samples saved
For each parameter, n_eff is a crude measure of effective sample size
and Rhat is the potential scale reduction factor on split chains
(at convergence, Rhat=1)
Checking sampler transitions treedepth.
Treedepth satisfactory for all transitions.
Checking sampler transitions for divergences.
No divergent transitions found.
Checking E-BFMI - sampler transitions HMC potential energy.
E-BFMI satisfactory.
Effective sample size satisfactory.
Split R-hat values satisfactory all parameters.
Processing complete, no problems detected.
EBFMI: 0.636 0.684 0.671 0.729
Parameter Rhat ESS bulk ESS tail
1 alpha[1] 1.002 4542 3203
2 alpha[2] 1.000 4363 3062
3 alpha[3] 1.000 4339 2828
4 alpha[4] 1.001 3965 2693
5 alpha[5] 1.001 4644 3087
6 beta[1] 1.001 3795 2899
7 sigmag[1] 1.003 990 2061
stanDxplot(bst, 'ALL')
If time were to be added to the above model, you'll see that its regression coefficient is very small (\(\hat{\beta}=0.009\) in this case), in alignment with the data generating model.
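A sketch of the fit being referred to (it is not run in this document); the object name bstt is hypothetical.
bstt <- blrm(y ~ Tx + time + cluster(id))   # hypothetical fit adding time on the original data
coef(bstt)['time']                          # the text reports a value of about 0.009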
Now assume that state y=5 is an absorbing state. Change observations after the first y=5 within subject to also have y=5.
require(data.table)
g <- function(x) if(length(x)) min(x, na.rm=TRUE) else 99L
u <- data.table(id, time, Tx, y, key='id')
# Add variable 'first' which is time of first y=5 for subject (99 if never)
w <- u[, .(first=g(time[y == 5])), by=id]
d <- u[w]

# Show distribution of first time of y=5
table(d[time == 1, first])
1 2 3 4 5 6 7 8 9 10 99
82 70 66 70 50 55 39 51 43 43 431
# Set all observations after the first y=5 to also have y=5
z <- d
z[time > first, y := 5]
table(u$y); table(d$y); table(z$y)
0 1 2 3 4 5
3525 2972 930 895 811 867
0 1 2 3 4 5
2628 2072 652 609 558 3481
0 1 2 3 4 5
2628 2072 652 609 558 3481
bcf <- blrm(y ~ Tx + cluster(id), data=z, file=rfile(bcf))
Running MCMC with 4 chains, at most 11 in parallel...
Chain 2 finished in 50.8 seconds.
Chain 4 finished in 52.5 seconds.
Chain 1 finished in 53.1 seconds.
Chain 3 finished in 53.6 seconds.
All 4 chains finished successfully.
Mean chain execution time: 52.5 seconds.
Total execution time: 53.8 seconds.
bcf
Bayesian Proportional Odds Ordinal Logistic Model
Dirichlet Priors With Concentration Parameter 0.345 for Intercepts
blrm(formula = y ~ Tx + cluster(id), data = z, file = rfile(bcf))
Frequencies of Responses
0 1 2 3 4 5 2628 2072 652 609 558 3481
Mixed Calibration/ Discrimination Indexes |
Discrimination Indexes |
Rank Discrim. Indexes |
|
---|---|---|---|
Obs 10000 | B 0.252 [0.248, 0.256] | g 0.379 [0.265, 0.495] | C 0.608 [0.608, 0.608] |
Draws 4000 | gp 0.089 [0.061, 0.114] | Dxy 0.216 [0.216, 0.216] | |
Chains 4 | EV 0.034 [0.016, 0.055] | ||
Time 59.9s | v 0.147 [0.068, 0.242] | ||
p 1 | vp 0.008 [0.004, 0.013] | ||
Cluster on id |
|||
Clusters 1000 | |||
σγ 1.7571 [1.6515, 1.8622] |
Mean β | Median β | S.E. | Lower | Upper | Pr(β>0) | Symmetry | |
---|---|---|---|---|---|---|---|
y≥1 | 2.0791 | 2.0791 | 0.0851 | 1.9229 | 2.2618 | 1.0000 | 0.98 |
y≥2 | 0.8482 | 0.8486 | 0.0836 | 0.6833 | 1.0098 | 1.0000 | 0.97 |
y≥3 | 0.4531 | 0.4526 | 0.0834 | 0.2915 | 0.6163 | 1.0000 | 0.99 |
y≥4 | 0.0467 | 0.0478 | 0.0829 | -0.1089 | 0.2132 | 0.7130 | 0.96 |
y≥5 | -0.3886 | -0.3874 | 0.0827 | -0.5468 | -0.2249 | 0.0000 | 1.00 |
Tx=b | -0.7589 | -0.7606 | 0.1137 | -0.9796 | -0.5409 | 0.0000 | 1.05 |
hist(bcf$gammas, xlab='Estimated Random Effects', nclass=40, main='')
The regression coefficient for treatment is too large in absolute value (the true value is log(0.65) = -0.43). The standard deviation of the random effects is also much too large (the true value is 0.25), reflecting increased dependence of outcomes within subject due to the duplication of y=5 records. However, the data being analyzed were not formally generated from the model with a treatment odds ratio of 0.65. Repeated correlated ordinal outcomes were generated with that odds ratio and a random effect standard deviation of 0.25, but the outcomes were then overridden in the following fashion: once y=5 first occurs for a subject, all of that subject's later records are overridden to y=5.
The histogram of estimated subject random effects (posterior medians) shows some bimodality with a heavy right tail due to the y=5 absorbing state. Let's also plot the random effects against the time of death (99 if the subject did not die, recoded here to 15).
t5 <- subset(z, time == 1)$first
t5 <- ifelse(t5 == 99, 15, t5)
plot(t5, bcf$gammas, xlab='Time of y=5', ylab='Random Effect')
What happens when time is added to this model?
bcft <- blrm(y ~ Tx + time + cluster(id), data=z, file=rfile(bcft))
Running MCMC with 4 chains, at most 11 in parallel...
Chain 1 finished in 54.9 seconds.
Chain 4 finished in 54.8 seconds.
Chain 3 finished in 56.5 seconds.
Chain 2 finished in 62.1 seconds.
All 4 chains finished successfully.
Mean chain execution time: 57.1 seconds.
Total execution time: 62.2 seconds.
bcft
Bayesian Proportional Odds Ordinal Logistic Model
Dirichlet Priors With Concentration Parameter 0.345 for Intercepts
blrm(formula = y ~ Tx + time + cluster(id), data = z, file = rfile(bcft))
Frequencies of Responses
0 1 2 3 4 5 2628 2072 652 609 558 3481
Mixed Calibration/ Discrimination Indexes |
Discrimination Indexes |
Rank Discrim. Indexes |
|
---|---|---|---|
Obs 10000 | B 0.244 [0.24, 0.248] | g 0.987 [0.913, 1.067] | C 0.636 [0.634, 0.636] |
Draws 4000 | gp 0.208 [0.193, 0.221] | Dxy 0.271 [0.269, 0.272] | |
Chains 4 | EV 0.14 [0.122, 0.157] | ||
Time 68.6s | v 0.743 [0.632, 0.869] | ||
p 2 | vp 0.033 [0.029, 0.037] | ||
Cluster on id |
|||
Clusters 1000 | |||
σγ 1.9434 [1.839, 2.0604] |
Mean β | Median β | S.E. | Lower | Upper | Pr(β>0) | Symmetry | |
---|---|---|---|---|---|---|---|
y≥1 | 0.8640 | 0.8651 | 0.1002 | 0.6692 | 1.0536 | 1.0000 | 0.96 |
y≥2 | -0.4449 | -0.4431 | 0.0994 | -0.6469 | -0.2607 | 0.0000 | 0.97 |
y≥3 | -0.8784 | -0.8768 | 0.0997 | -1.0816 | -0.6931 | 0.0000 | 0.95 |
y≥4 | -1.3392 | -1.3377 | 0.1009 | -1.5320 | -1.1370 | 0.0000 | 0.96 |
y≥5 | -1.8612 | -1.8583 | 0.1022 | -2.0551 | -1.6577 | 0.0000 | 0.96 |
Tx=b | -0.8413 | -0.8388 | 0.1302 | -1.0864 | -0.5879 | 0.0000 | 0.97 |
time | 0.2612 | 0.2610 | 0.0075 | 0.2469 | 0.2765 | 1.0000 | 1.03 |
We see that the slope for time is very large, yet the treatment effect and the random-effects standard deviation are still much too large.
Look at random effects again.
hist(bcft$gammas, xlab='Estimated Random Effects', nclass=40, main='')
plot(t5, bcft$gammas, xlab='Time of y=5', ylab='Random Effect')
Next we truncate patient records so that y=5 is not carried forward.
zt  <- z[time <= first]
bnc <- blrm(y ~ Tx + cluster(id), data=zt, file=rfile(bnc))
Running MCMC with 4 chains, at most 11 in parallel...
Chain 1 finished in 45.1 seconds.
Chain 4 finished in 45.5 seconds.
Chain 3 finished in 45.7 seconds.
Chain 2 finished in 46.7 seconds.
All 4 chains finished successfully.
Mean chain execution time: 45.8 seconds.
Total execution time: 46.9 seconds.
bnc
Bayesian Proportional Odds Ordinal Logistic Model
Dirichlet Priors With Concentration Parameter 0.345 for Intercepts
blrm(formula = y ~ Tx + cluster(id), data = zt, file = rfile(bnc))
Frequencies of Responses
0 1 2 3 4 5 2628 2072 652 609 558 569
Mixed Calibration/ Discrimination Indexes |
Discrimination Indexes |
Rank Discrim. Indexes |
|
---|---|---|---|
Obs 7088 | B 0.231 [0.231, 0.231] | g 0.237 [0.189, 0.286] | C 0.592 [0.592, 0.592] |
Draws 4000 | gp 0.054 [0.043, 0.064] | Dxy 0.185 [0.185, 0.185] | |
Chains 4 | EV 0.013 [0.008, 0.018] | ||
Time 51.9s | v 0.057 [0.035, 0.081] | ||
p 1 | vp 0.003 [0.002, 0.004] | ||
Cluster on id |
|||
Clusters 1000 | |||
σγ 0.3124 [0.2157, 0.3992] |
Mean β | Median β | S.E. | Lower | Upper | Pr(β>0) | Symmetry | |
---|---|---|---|---|---|---|---|
y≥1 | 0.8326 | 0.8325 | 0.0402 | 0.7531 | 0.9087 | 1.0000 | 1.01 |
y≥2 | -0.4155 | -0.4153 | 0.0377 | -0.4910 | -0.3442 | 0.0000 | 1.06 |
y≥3 | -0.8783 | -0.8789 | 0.0386 | -0.9564 | -0.8064 | 0.0000 | 1.01 |
y≥4 | -1.4311 | -1.4316 | 0.0414 | -1.5142 | -1.3541 | 0.0000 | 0.95 |
y≥5 | -2.2167 | -2.2165 | 0.0497 | -2.3158 | -2.1204 | 0.0000 | 0.99 |
Tx=b | -0.4763 | -0.4754 | 0.0496 | -0.5698 | -0.3734 | 0.0000 | 0.97 |
Finally, add time to the above model.
bnct <- blrm(y ~ Tx + time + cluster(id), data=zt, file=rfile(bnct))
Running MCMC with 4 chains, at most 11 in parallel...
Chain 1 finished in 44.3 seconds.
Chain 4 finished in 45.2 seconds.
Chain 2 finished in 45.4 seconds.
Chain 3 finished in 47.1 seconds.
All 4 chains finished successfully.
Mean chain execution time: 45.5 seconds.
Total execution time: 47.4 seconds.
bnct
Bayesian Proportional Odds Ordinal Logistic Model
Dirichlet Priors With Concentration Parameter 0.345 for Intercepts
blrm(formula = y ~ Tx + time + cluster(id), data = zt, file = rfile(bnct))
Frequencies of Responses
0 1 2 3 4 5 2628 2072 652 609 558 569
Mixed Calibration/ Discrimination Indexes |
Discrimination Indexes |
Rank Discrim. Indexes |
|
---|---|---|---|
Obs 7088 | B 0.231 [0.231, 0.232] | g 0.249 [0.201, 0.296] | C 0.548 [0.547, 0.55] |
Draws 4000 | gp 0.057 [0.046, 0.068] | Dxy 0.096 [0.095, 0.101] | |
Chains 4 | EV 0.013 [0.008, 0.018] | ||
Time 52.2s | v 0.058 [0.036, 0.08] | ||
p 2 | vp 0.003 [0.002, 0.004] | ||
Cluster on id |
|||
Clusters 1000 | |||
σγ 0.3171 [0.2183, 0.4052] |
Mean β | Median β | S.E. | Lower | Upper | Pr(β>0) | Symmetry | |
---|---|---|---|---|---|---|---|
y≥1 | 0.8147 | 0.8138 | 0.0524 | 0.7182 | 0.9246 | 1.0000 | 0.98 |
y≥2 | -0.4348 | -0.4346 | 0.0516 | -0.5351 | -0.3328 | 0.0000 | 1.05 |
y≥3 | -0.8978 | -0.8977 | 0.0529 | -0.9961 | -0.7890 | 0.0000 | 1.02 |
y≥4 | -1.4516 | -1.4522 | 0.0556 | -1.5657 | -1.3526 | 0.0000 | 0.98 |
y≥5 | -2.2376 | -2.2375 | 0.0623 | -2.3584 | -2.1146 | 0.0000 | 0.96 |
Tx=b | -0.4789 | -0.4789 | 0.0486 | -0.5735 | -0.3840 | 0.0000 | 0.99 |
time | 0.0044 | 0.0045 | 0.0080 | -0.0115 | 0.0197 | 0.7105 | 0.98 |
The time effect is very weak, and adding it did not change the already-accurate (with respect to the first part of the data generating mechanism) treatment effect posterior mean.
4 Censored Data
The blrm function handles left-, right-, and interval-censored ordinal categorical or continuous Y. This opens up numerous possibilities, for example
- one can analyze the usual right-censored time-to-event outcome but assuming, for example, proportional odds instead of proportional hazards
- there can be more than one lower limit of detectability for a lab measurement
- in a longitudinal study certain ranges of the outcome scale may not be assessed on certain days
As an example of the third situation, suppose that Y is defined as follows:
Level of Y | Meaning |
---|---|
0 | best quality of life |
1 | very good QOL |
2 | fair QOL |
3 | poor QOL |
4 | myocardial infarction |
5 | stroke |
6 | death |
Suppose that Y were assessed weekly and that the clinical events of MI, stroke, or death are always known when they occur. But suppose that QOL is only assessed once per month. Instead of dealing with complex missing data methods, consider Y to be partially assessed by the use of left censoring. On weeks of non-assessment of QOL consider Y to just be known to be < 4 when the participant is event-free.
blrm uses the Ocens function ("ordinal censoring") to handle censored Y. The notation is Ocens(a, b), where a is the lowest value that Y might be and b is the highest value it might be for a certain observation. If a=b, the observation is uncensored. If in a given week a patient has a clinical event, that event overrides any level of QOL and will result in Y=5, for example, for stroke. For a week in which a clinical event does not occur, we know that Y < 4. When QOL is assessed, we know which of Y=0,1,2,3 pertains. When QOL is not assessed, Y = Ocens(0, 4). It should be clear that if QOL were not assessed for any participant, the dependent variable is really a 4-level outcome (3 clinical outcomes, with Y=0 denoting that no bad outcome occurred for the participant).
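A minimal sketch of the notation with made-up values, using the inclusive-interval form that also appears in the simulation below:
a <- c(2, 1, 5)    # lower limits
b <- c(2, 2, 6)    # upper limits
Y <- Ocens(a, b)   # uncensored Y=2; Y known only to be in [1,2]; Y known only to be in [5,6]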
In full-likelihood models such as our extended PO model, censored data are easily handled. One just has to compute the contribution to the log-likelihood for each observation from the information it provides. An observation interval-censored with Y \(\in [3,4]\) has a likelihood equal to the model's probability that Y is between 3 and 4 inclusive. For a cumulative probability model this is easily derived from \(P(Y \geq 3) - P(Y > 4)\), which is a difference in expits.
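A tiny worked example of that computation, using made-up values (alpha3 is the intercept for \(Y\geq 3\), alpha5 the intercept for \(Y\geq 5\), and xb the linear predictor for the observation):
alpha3 <- 0.9; alpha5 <- -1.0; xb <- 0.4            # hypothetical numbers for illustration
lik <- plogis(alpha3 + xb) - plogis(alpha5 + xb)    # P(3 <= Y <= 4) = P(Y >= 3) - P(Y > 4)
log(lik)                                            # this observation's log-likelihood contribution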
Here is an example where we simulate a dataset with a 6-level ordinal response variable with no censored values, and fit the PO model to it. Then the dataset is duplicated and all the observations with y=1 or 2 are left censored at <= 2 (which is the same as interval censoring on [1,2]), all those with y=3 or 4 are interval censored in [3,4], and all those with y=5 or 6 are right censored at y >= 5. The model is re-fitted to see if the posterior mean regression coefficients remain relatively unchanged.
set.seed(1)
n <- 500
x <- rnorm(n)
y <- as.integer(cut2(x + rnorm(n), g=6))
f <- blrm(y ~ x, file=rfile(bnocens))
Initial log joint probability = -1158.27
Iter log prob ||dx|| ||grad|| alpha alpha0 # evals Notes
16 -756.799 0.000200797 0.00309593 0.8894 0.8894 19
Optimization terminated normally:
Convergence detected: relative gradient magnitude is below tolerance
Finished in 0.2 seconds.
Running MCMC with 4 chains, at most 11 in parallel...
Chain 1 finished in 1.0 seconds.
Chain 2 finished in 1.0 seconds.
Chain 3 finished in 1.0 seconds.
Chain 4 finished in 1.0 seconds.
All 4 chains finished successfully.
Mean chain execution time: 1.0 seconds.
Total execution time: 1.4 seconds.
m  <- length(x)
y2 <- y[1 : m]
a  <- b <- y2
# Left censor obs with y <= 2
i    <- y2 <= 2
a[i] <- 1
b[i] <- 2
# Interval censor obs with y = 3 or 4
i    <- y2 == 3 | y2 == 4
a[i] <- 3
b[i] <- 4
# Right censor obs with y = 5 or 6
i    <- y2 >= 5
a[i] <- 5
b[i] <- 6
table(y2, paste0('[', a, ',', b, ']'))
y2 [1,2] [3,4] [5,6]
1 84 0 0
2 83 0 0
3 0 83 0
4 0 84 0
5 0 0 83
6 0 0 83
Y <- Ocens(c(y, a), c(y, b))
x <- c(x, x[1 : m])
g <- blrm(Y ~ x, file=rfile(bcens))
Initial log joint probability = -1812.75
Iter log prob ||dx|| ||grad|| alpha alpha0 # evals Notes
17 -1183.86 0.00654922 0.0027525 0.9676 0.9676 22
Optimization terminated normally:
Convergence detected: relative gradient magnitude is below tolerance
Finished in 0.2 seconds.
Running MCMC with 4 chains, at most 11 in parallel...
Chain 1 finished in 2.0 seconds.
Chain 2 finished in 1.9 seconds.
Chain 3 finished in 1.9 seconds.
Chain 4 finished in 2.0 seconds.
All 4 chains finished successfully.
Mean chain execution time: 2.0 seconds.
Total execution time: 2.3 seconds.
rbind('No cens:mode'=coef(f, 'mode'),
'No cens:mean'=coef(f, 'mean'),
'Cens:mode' =coef(g, 'mode'),
'Cens:mean' =coef(g, 'mean'))
y>=2 y>=3 y>=4 y>=5 y>=6 x
No cens:mode 2.199384 0.9533240 -0.012507468 -1.015330 -2.288416 1.532672
No cens:mean 2.206683 0.9550262 -0.013765782 -1.017101 -2.296361 1.538970
Cens:mode 2.231817 0.9764970 -0.006108156 -1.024490 -2.313503 1.581117
Cens:mean 2.237361 0.9767551 -0.008692016 -1.026865 -2.316603 1.585905
5 Multiple Imputation
When possible, full joint Bayesian modeling of (possibly missing) covariates and the outcome variable should be used to get exact inference in the presence of missing covariate values. Another good approach is to use multiple imputation with stacking of posterior draws after running the Bayesian model on each completed dataset. When doing posterior inference on the stacked posterior draws, the uncertainty from multiple imputation is fully taken into account, as is the resulting change in the shape of the posterior distribution. Frequentist inference requires complex adjustments, as multiple imputation alters the sampling distribution of model parameter estimates. For example, regression coefficient estimates that have a normal distribution with complete data may have a \(t\)-like distribution after multiple imputation.
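The stacking itself is conceptually simple. A minimal sketch, assuming draws1, ..., draws5 are matrices of posterior draws for the same parameters, one matrix per completed dataset (these object names are hypothetical):
stacked <- rbind(draws1, draws2, draws3, draws4, draws5)    # pool the draws across imputations
colMeans(stacked)                                           # posterior means reflecting imputation uncertainty
apply(stacked, 2, quantile, probs=c(0.025, 0.975))          # simple equal-tailed intervals from the stack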
blrm works with the stackMI function to make posterior stacking easier. It works with Hmisc::aregImpute and the mice package. stackMI is the analog of Hmisc::fit.mult.impute but is much simpler due to the use of the Bayesian paradigm.
Here is an example adapted from the aregImpute help file. The aregImpute result is stored so that the random seed-initiated multiple imputation process will not be re-run unless necessary. That also allows the Stan code not to be re-run until the underlying data or the code in the chunks changes.
set.seed(2)
n  <- 1000
x1 <- factor(sample(c('a','b','c'), n, TRUE))
x2 <- (x1=='b') + 3*(x1=='c') + rnorm(n, 0, 2)
x3 <- rnorm(n)
xbeta <- 0.35 * (x2 + 1 * (x1 == 'c') + 0.2 * x3)
y  <- ifelse(runif(n) <= plogis(xbeta), 1, 0)
x1[1:250]   <- NA
x2[251:350] <- NA
d  <- data.frame(x1, x2, x3, y, stringsAsFactors=TRUE)

mithere <- file.exists('mi.rds')
mi <- if(mithere) readRDS('mi.rds') else
        aregImpute(~ y + x1 + x2 + x3, nk=3, data=d, B=10, n.impute=5, pr=FALSE)
if(! mithere) saveRDS(mi, 'mi.rds', compress='xz')
mi
Multiple Imputation using Bootstrap and PMM
aregImpute(formula = ~y + x1 + x2 + x3, data = d, n.impute = 5,
nk = 3, pr = FALSE, B = 10)
n: 1000 p: 4 Imputations: 5 nk: 3
Number of NAs:
y x1 x2 x3
0 250 100 0
type d.f.
y l 1
x1 c 2
x2 s 1
x3 s 2
Transformation of Target Variables Forced to be Linear
R-squares for Predicting Non-Missing Values for Each Variable
Using Last Imputations of Predictors
x1 x2
0.283 0.376
# Note: The following model will be re-run every time aregImpute runs
# because imputations have randomness
bmi <- stackMI(y ~ x1 + x2 + x3, blrm, mi, data=d, refresh=50, file=rfile(bmi))
Initial log joint probability = -897.95
Iter log prob ||dx|| ||grad|| alpha alpha0 # evals Notes
26 -575.969 0.00271884 0.00409143 1 1 29
Optimization terminated normally:
Convergence detected: relative gradient magnitude is below tolerance
Finished in 0.2 seconds.
Initial log joint probability = -1031.45
Iter log prob ||dx|| ||grad|| alpha alpha0 # evals Notes
18 -585.022 0.0014463 0.00534971 1 1 19
Optimization terminated normally:
Convergence detected: relative gradient magnitude is below tolerance
Finished in 0.2 seconds.
Initial log joint probability = -1100.47
Iter log prob ||dx|| ||grad|| alpha alpha0 # evals Notes
20 -580.146 0.0550563 0.0061046 1 1 24
Optimization terminated normally:
Convergence detected: relative gradient magnitude is below tolerance
Finished in 0.2 seconds.
Initial log joint probability = -857.204
Iter log prob ||dx|| ||grad|| alpha alpha0 # evals Notes
17 -582.573 0.00756208 0.0020154 1 1 18
Optimization terminated normally:
Convergence detected: relative gradient magnitude is below tolerance
Finished in 0.2 seconds.
Initial log joint probability = -1207.4
Iter log prob ||dx|| ||grad|| alpha alpha0 # evals Notes
25 -582.707 0.00282346 0.00289274 0.5782 0.05782 29
Optimization terminated normally:
Convergence detected: relative gradient magnitude is below tolerance
Finished in 0.2 seconds.
stanDx(bmi)
Diagnostics for each of 5 imputations
Iterations: 2000 on each of 4 chains, with 4000 posterior distribution samples saved
For each parameter, n_eff is a crude measure of effective sample size
and Rhat is the potential scale reduction factor on split chains
(at convergence, Rhat=1)
Imputation 1
Checking sampler transitions treedepth.
Treedepth satisfactory for all transitions.
Checking sampler transitions for divergences.
No divergent transitions found.
Checking E-BFMI - sampler transitions HMC potential energy.
E-BFMI satisfactory.
Effective sample size satisfactory.
Split R-hat values satisfactory all parameters.
Processing complete, no problems detected.
EBFMI: 1.233 1.147 1.063 1.064
Parameter Rhat ESS bulk ESS tail
1 alpha[1] 1.001 4649 3371
2 beta[1] 1.001 5218 3302
3 beta[2] 1.001 4651 3242
4 beta[3] 1.001 4611 3191
5 beta[4] 1.002 4595 3331
Imputation 2
Checking sampler transitions treedepth.
Treedepth satisfactory for all transitions.
Checking sampler transitions for divergences.
No divergent transitions found.
Checking E-BFMI - sampler transitions HMC potential energy.
E-BFMI satisfactory.
Effective sample size satisfactory.
Split R-hat values satisfactory all parameters.
Processing complete, no problems detected.
EBFMI: 1.055 1.069 1.179 1.132
Parameter Rhat ESS bulk ESS tail
1 alpha[1] 1.001 4131 3145
2 beta[1] 1.002 4727 3208
3 beta[2] 1.002 4356 2971
4 beta[3] 1.001 4552 3325
5 beta[4] 1.001 4748 3219
Imputation 3
Checking sampler transitions treedepth.
Treedepth satisfactory for all transitions.
Checking sampler transitions for divergences.
No divergent transitions found.
Checking E-BFMI - sampler transitions HMC potential energy.
E-BFMI satisfactory.
Effective sample size satisfactory.
Split R-hat values satisfactory all parameters.
Processing complete, no problems detected.
EBFMI: 1.186 1.116 1.125 1.006
Parameter Rhat ESS bulk ESS tail
1 alpha[1] 1.000 4860 3448
2 beta[1] 1.001 4216 3166
3 beta[2] 1.001 4049 2961
4 beta[3] 1.000 4080 3163
5 beta[4] 1.000 4449 3547
Imputation 4
Checking sampler transitions treedepth.
Treedepth satisfactory for all transitions.
Checking sampler transitions for divergences.
No divergent transitions found.
Checking E-BFMI - sampler transitions HMC potential energy.
E-BFMI satisfactory.
Effective sample size satisfactory.
Split R-hat values satisfactory all parameters.
Processing complete, no problems detected.
EBFMI: 1.171 1.003 1.065 1.124
Parameter Rhat ESS bulk ESS tail
1 alpha[1] 1.000 4079 3293
2 beta[1] 1.002 4910 3119
3 beta[2] 1.000 4283 3038
4 beta[3] 1.001 4492 2828
5 beta[4] 1.001 4773 2995
Imputation 5
Checking sampler transitions treedepth.
Treedepth satisfactory for all transitions.
Checking sampler transitions for divergences.
No divergent transitions found.
Checking E-BFMI - sampler transitions HMC potential energy.
E-BFMI satisfactory.
Effective sample size satisfactory.
Split R-hat values satisfactory all parameters.
Processing complete, no problems detected.
EBFMI: 1.099 1.138 1.136 1.064
Parameter Rhat ESS bulk ESS tail
1 alpha[1] 1.000 4390 3031
2 beta[1] 1.001 5146 3391
3 beta[2] 1.001 4121 2936
4 beta[3] 1.000 4653 2761
5 beta[4] 1.000 5132 3209
stanDxplot(bmi, which='x1=c', rev=TRUE, stripsize=5)
plot(bmi)
One can see that the 5 individual posterior distributions for the frequently missing variables x1 and x2 vary a lot, but not so for the never-missing variable x3.
Computations done on the bmi object will automatically use the full stacked posterior distribution.
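For example (a sketch only; not run in this document), the usual rms post-fit functions may be applied to bmi and will draw on all 5 × 4000 stacked posterior samples:
dd <- datadist(d); options(datadist='dd')    # set up for Predict
Predict(bmi, x2)                             # predictions with 0.95 HPD limits
contrast(bmi, list(x1='c'), list(x1='a'))    # contrast between two levels of x1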
6 Scaling and Interpretation of Priors on \(\beta\)s
blrm orthonormalizes the data design matrix using the QR decomposition to greatly improve posterior distribution sampling in the case of collinearities among predictors (especially among spline basis functions). This created complexities for rmsb prior to version 1.0-0. With version 1.0-0, priors are specified only on contrasts, and these contrasts are automatically translated to QR space, so users do not need to account for QR. The \(\tau\) parameters are held out of QR via the keepsep argument, and the prior variance for these regression coefficients is provided directly; in the future this will be handled through contrasts also.
7 Speed of blrm For Large Numbers of Y Levels
When there is a large number of intercepts in the model, the speed of blrm will decrease. What about the speed of using blrm just to get (potentially penalized) maximum likelihood estimates? Let's try fitting a progressively more continuous dependent variable.
set.seed(1)
n <- 1000
x <- rnorm(n)
y <- x + rnorm(n)
for(g in c(2, 4, 8, 16, 32, 64, 128, 256)) {
  cat('\n', g, 'distinct values of y\n')
  yg <- cut2(y, g=g)
  print(system.time(f <- blrm(yg ~ x, method='optimizing')))
}
2 distinct values of y
Initial log joint probability = -753.369
Iter log prob ||dx|| ||grad|| alpha alpha0 # evals Notes
10 -512.881 0.0154697 0.00680246 1 1 13
Optimization terminated normally:
Convergence detected: relative gradient magnitude is below tolerance
Finished in 0.2 seconds.
user system elapsed
0.061 0.039 0.440
4 distinct values of y
Initial log joint probability = -1528.9
Iter log prob ||dx|| ||grad|| alpha alpha0 # evals Notes
16 -1112.68 0.000413208 0.00760905 1 1 17
Optimization terminated normally:
Convergence detected: relative gradient magnitude is below tolerance
Finished in 0.2 seconds.
user system elapsed
0.059 0.056 0.424
8 distinct values of y
Initial log joint probability = -2614
Iter log prob ||dx|| ||grad|| alpha alpha0 # evals Notes
21 -1760.76 0.001458 0.00459888 1 1 23
Optimization terminated normally:
Convergence detected: relative gradient magnitude is below tolerance
Finished in 0.2 seconds.
user system elapsed
0.059 0.057 0.425
16 distinct values of y
Initial log joint probability = -3416.31
Iter log prob ||dx|| ||grad|| alpha alpha0 # evals Notes
18 -2417.06 0.00259355 0.00659989 1 1 19
Optimization terminated normally:
Convergence detected: relative gradient magnitude is below tolerance
Finished in 0.2 seconds.
user system elapsed
0.059 0.043 0.424
32 distinct values of y
Initial log joint probability = -4182.99
Iter log prob ||dx|| ||grad|| alpha alpha0 # evals Notes
18 -3054.09 0.00391769 0.00415608 1 1 19
Optimization terminated normally:
Convergence detected: relative gradient magnitude is below tolerance
Finished in 0.2 seconds.
user system elapsed
0.060 0.049 0.427
64 distinct values of y
Initial log joint probability = -4691.63
Iter log prob ||dx|| ||grad|| alpha alpha0 # evals Notes
14 -3622.95 0.00535222 0.00972658 1 1 18
Optimization terminated normally:
Convergence detected: relative gradient magnitude is below tolerance
Finished in 0.2 seconds.
user system elapsed
0.061 0.047 0.430
128 distinct values of y
Initial log joint probability = -5148.01
Iter log prob ||dx|| ||grad|| alpha alpha0 # evals Notes
15 -4024.05 0.0029433 0.00422298 1 1 18
Optimization terminated normally:
Convergence detected: relative gradient magnitude is below tolerance
Finished in 0.2 seconds.
user system elapsed
0.063 0.047 0.423
256 distinct values of y
Initial log joint probability = -5141.45
Iter log prob ||dx|| ||grad|| alpha alpha0 # evals Notes
16 -4044.65 0.00106451 0.0025487 0.6525 0.6525 19
Optimization terminated normally:
Convergence detected: relative gradient magnitude is below tolerance
Finished in 0.2 seconds.
user system elapsed
0.071 0.098 0.442
This is impressive. For g=256, compare with the execution time of the Newton-Raphson method making optimum use of sparse matrices. Also compare coefficients. When sampling is done, the default Dirichlet distribution concentration parameter for the intercepts is selected to make the posterior means agree with maximum likelihood estimates, sacrificing some performance of posterior modes. When method='optimizing', a concentration parameter of 1.0 for the Dirichlet prior distribution for the intercepts is used instead, which seems to make optimization agree more closely with maximum likelihood estimates. This optimization is used to get posterior modes when random effects are not present.
system.time(g <- orm(yg ~ x))
user system elapsed
0.039 0.001 0.042
plot(coef(g), coef(f), xlab='Coefficients from orm',
ylab='Coefficients from blrm')
abline(a=0, b=1, col=gray(0.8))
See how long it takes to do posterior sampling with Stan when there are 16, 64, or 128 levels of y.
for(g in c(16, 64, 128)) {
cat('\n', g, 'distinct values of y\n')
  yg <- cut2(y, g=g)
  print(system.time(h <- blrm(yg ~ x)))
}
16 distinct values of y
Initial log joint probability = -3126.42
Iter log prob ||dx|| ||grad|| alpha alpha0 # evals Notes
15 -2417.06 0.00539103 0.0116098 1 1 19
Optimization terminated normally:
Convergence detected: relative gradient magnitude is below tolerance
Finished in 0.2 seconds.
Running MCMC with 4 chains, at most 11 in parallel...
Chain 1 finished in 2.2 seconds.
Chain 2 finished in 2.3 seconds.
Chain 3 finished in 2.3 seconds.
Chain 4 finished in 2.3 seconds.
All 4 chains finished successfully.
Mean chain execution time: 2.3 seconds.
Total execution time: 2.6 seconds.
user system elapsed
12.299 0.879 7.679
64 distinct values of y
Initial log joint probability = -4585.05
Iter log prob ||dx|| ||grad|| alpha alpha0 # evals Notes
14 -3622.95 0.00377665 0.0107938 1 1 17
Optimization terminated normally:
Convergence detected: relative gradient magnitude is below tolerance
Finished in 0.2 seconds.
Running MCMC with 4 chains, at most 11 in parallel...
Chain 1 finished in 2.7 seconds.
Chain 2 finished in 2.8 seconds.
Chain 3 finished in 2.9 seconds.
Chain 4 finished in 2.9 seconds.
All 4 chains finished successfully.
Mean chain execution time: 2.8 seconds.
Total execution time: 3.3 seconds.
user system elapsed
14.960 0.874 8.881
128 distinct values of y
Initial log joint probability = -5077.25
Iter log prob ||dx|| ||grad|| alpha alpha0 # evals Notes
14 -4024.05 0.00391544 0.00557172 1 1 16
Optimization terminated normally:
Convergence detected: relative gradient magnitude is below tolerance
Finished in 0.2 seconds.
Running MCMC with 4 chains, at most 11 in parallel...
Chain 1 finished in 4.1 seconds.
Chain 2 finished in 4.6 seconds.
Chain 3 finished in 4.6 seconds.
Chain 4 finished in 4.4 seconds.
All 4 chains finished successfully.
Mean chain execution time: 4.4 seconds.
Total execution time: 4.8 seconds.
user system elapsed
21.938 0.913 11.133
8 Computing Environment
R version 4.3.2 (2023-10-31)
Platform: aarch64-apple-darwin20 (64-bit)
Running under: macOS Sonoma 14.3.1
Matrix products: default
BLAS: /Library/Frameworks/R.framework/Versions/4.3-arm64/Resources/lib/libRblas.0.dylib
LAPACK: /Library/Frameworks/R.framework/Versions/4.3-arm64/Resources/lib/libRlapack.dylib; LAPACK version 3.11.0
attached base packages: splines stats4 stats graphics grDevices utils datasets methods base
other attached packages: data.table_1.15.2 ordinal_2023.12-4 VGAM_1.1-10 rmsb_1.1-0 rms_6.8-0 Hmisc_5.1-2
To cite R in publications use: R Core Team (2023). R: A Language and Environment for Statistical Computing. R Foundation for Statistical Computing, Vienna, Austria. https://www.R-project.org/.
To cite the Hmisc package in publications use: Harrell Jr F (2024). Hmisc: Harrell Miscellaneous. R package version 5.1-2, https://hbiostat.org/R/Hmisc/.
To cite the rms package in publications use: Harrell Jr FE (2024). rms: Regression Modeling Strategies. R package version 6.8-0, https://github.com/harrelfe/rms, https://hbiostat.org/R/rms/.
To cite the rmsb package in publications use: Harrell F (2024). rmsb: Bayesian Regression Modeling Strategies. R package version 1.1-0, https://hbiostat.org/R/rmsb/.
To cite the VGAM package in publications use: Yee TW (2015). Vector Generalized Linear and Additive Models: With an Implementation in R. Springer, New York, USA.
To cite the data.table package in publications use: Barrett T, Dowle M, Srinivasan A, Gorecki J, Chirico M, Hocking T (2024). data.table: Extension of 'data.frame'. R package version 1.15.2, https://CRAN.R-project.org/package=data.table.