# rmsb Package Examples

## Overview and Setup
The `rmsb` package is the Bayesian companion to the `rms` package, and uses many of its functions for post-model-fitting analysis and graphics. The sole fitting function in `rmsb` at present is `blrm`, which fits Bayesian binary and ordinal proportional odds logistic regression models as well as the Peterson & Harrell (1990) partial proportional odds model that relaxes the proportional odds assumption. It is the analog and generalization of `rms::lrm`, and for ordinal responses it is intended for outcomes with up to several hundred ordinal levels. The Bayesian approach has a number of advantages over traditional frequentist models, including
- the use of exact calculations (to within simulation error) instead of large-sample (e.g., normal theory) approximations to p-values and confidence intervals
- exact and more intuitive inference when random effects are included in the model
- the ability to make probability statements about parameters and combinations of parameters, including probabilities of assertions such as “the effect of \(x_1\) exceeds 1.2 and the effect of \(x_2\) exceeds 0.7”
- capturing more sources of uncertainty. For example, the `blrm` function automatically computes highest posterior density intervals for a variety of statistical indexes such as the Brier score and AUROC (\(c\)-index). Note: by default these intervals are computed using only 400 posterior draws to save time. For a `blrm` fit object `f` you can specify how many draws to use, to get more accurate intervals, by specifying for example `print(f, ns=2000)`.
- the ability to incorporate external information or beliefs about parameters using prior distributions
- by using the Stan language to specify the likelihood components, one can not only sample from the posterior distribution but also quickly compute maximum likelihood estimates (Bayesian posterior modes)
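As a sketch of the probability-statement point above, a joint posterior probability can be computed directly from the posterior draws stored in a fit. Here `fit` is a hypothetical `blrm` fit object; `blrm` stores the draws in the `draws` component, a matrix with one column per parameter:

```r
# Hedged sketch: P(effect of x1 > 1.2 and effect of x2 > 0.7)
# computed as the proportion of posterior draws satisfying the assertion
draws <- fit$draws
mean(draws[, 'x1'] > 1.2 & draws[, 'x2'] > 0.7)
```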
`blrm` uses normal priors for the \(\beta\) parameters, and the user may specify the standard deviation of these priors separately for each model parameter. The default is \(\sigma=100\), which is nearly flat. When there are random effects, the analyst may specify the mean of the exponential distribution that serves as the prior for the standard deviation of the random effects; the default mean is 1.0, a reasonable number for the logit scale. You can also use a half-\(t\) distribution for the SD of random effects.
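For concreteness, here is a hedged sketch of specifying these priors. The `priorsd` argument appears later in this document; `rsdmean` is assumed here to be the argument controlling the mean of the exponential prior for the random-effects SD, and the data frame `d` and cluster variable `id` are hypothetical:

```r
# Normal(0, 2.5) priors for the betas; exponential prior with mean 1.0
# for the SD of the random effects induced by cluster(id)
f <- blrm(y ~ x1 + x2 + cluster(id), data=d,
          priorsd=2.5, rsdmean=1.0)
```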
Regarding uncertainties about statistical indexes of model performance such as AUROC, the Bayesian posterior distributions computed by `rmsb` account for coefficient uncertainty: the sample at hand’s ability to uncover the true unknown data-generating coefficients, and how strongly those coefficients can predict outcomes in the sample. The posterior distributions do not pertain to sample-to-sample variability of model fit or model validation.
`blrm` uses Stan code written by Ben Goodrich of Columbia University and Frank Harrell. This code is precompiled when the `rmsb` package is built. You must install the `rstan` package to use `rmsb`.
Here is a typical setup code chunk for using `rmsb`. It is not recommended to use `options(auto_write=TRUE)`, because when running a series of regressions the `rstan` package tends to recompile Stan code even when the code doesn’t change.
```{r setup}
require(rmsb)
options(mc.cores = parallel::detectCores())  # use max # CPUs
knitrSet(lang='markdown', w=7, h=7, fig.path='png/')
options(prType='html')   # for rms output: print, anova, summary, etc.
# Function to print an object, inside a verbatim quote if
# results='asis' is in effect
# Works for all output formats in R markdown
vprint <- function(x, ...) {
  r    <- knitr::opts_current$get('results')
  asis <- length(r) && r == 'asis'
  if(! asis) return(invisible(print(x, ...)))
  r <- c('\n\n```', capture.output(print(x, ...)), '```\n\n')
  cat(r, sep='\n')
  invisible()
}
```
## Running Fits Only When Something Changes
You’ll see `file='...'` in the longer-running `blrm` calls below. If the file already exists, and none of the data nor the options sent to `rstan` nor the underlying Stan code have changed from what was used to create the fit object stored in that file (as judged by their `md5` hash), that saved fit object is returned immediately without running the `rstan` code, often saving a great deal of execution time. This works well with the workflow of long R markdown reports, making it so that only the portions of a Bayesian analysis that have changed are run. Note that the knitr `cache=TRUE` option does not work well: the cache files for this script were about a GB in size, and the system does not accurately recognize when a model fit hasn’t changed and doesn’t need to be re-run when `rstan` is involved.
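A minimal sketch of this workflow (the model formula and variables are placeholders): the first execution samples with `rstan` and writes the file; later executions return the stored fit unless the data, `rstan` options, or Stan code have changed:

```r
# First run: samples and saves f1.rds in the working directory
# Later runs: returns the stored fit immediately if nothing changed
f1 <- blrm(y ~ x1 + x2, file='f1.rds')
```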
A restored fit object does not contain the `rstan` object, saving tens of megabytes of storage. Standard Stan diagnostics are stored separately in the fit object, and it is assumed that if the user wanted to run `rstan::loo`, `loo=TRUE` would have been specified to `blrm` so that `loo` is run and its (small) result stored in the `blrm` fit object. If you want to run `pairs` to get more graphical diagnostics, instead of relying on the `rstan` object always being available, specify `pairs=TRUE` or `pairs='filename.png'` to `blrm` to graph the pair plots. The latter is recommended, and one can put `knitr::include_graphics('filename.png')` in the R code chunk to render the graph in the report even if `blrm` was not re-run.
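Put together, a sketch of the recommended pattern (model and file names are placeholders):

```r
# Cache the fit and write the pair plots to a file during the fit
f <- blrm(y ~ x1 + x2, file='f.rds', pairs='f-pairs.png')
# In a later chunk, render the saved image even when blrm was not re-run:
knitr::include_graphics('f-pairs.png')
```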
When `file='filename.rds'` is specified to `blrm` and the file does not exist or the analysis inputs have changed since the file was created, the `blrm` fit object will be saved with `saveRDS` in `.rds` binary format in your current working directory at the completion of `blrm`’s work. The `rstan` component is omitted from the saved file.
The utility function `fitIf`, available from here, is another way to efficiently manage Bayesian simulation workflow. The `fitIf` approach runs the analysis only if the file doesn’t already exist. If the file exists but the data or the model have changed, `fitIf` is not intelligent enough to re-run the analysis (unlike the `file='...'` approach above). Whether using `fitIf` or `file=`, you lose the ability to run `rms::stanDxplot(..., rstan=TRUE)` on restored objects, so the `stanDxplot` function tries to find an existing trace plot image file corresponding to the current R markdown chunk when the fit object no longer has the `rstan` component. For most purposes this doesn’t matter, because running with the defaults shows the needed non-burn-in samples, which are always stored in fit objects. The `file='filename.rds'` approach is preferred.
## Proportional Odds and Binary Logistic Regression

### Example: 10-level Ordinal Outcome
Simulate a dataset with three predictors and one 10-level ordinal outcome. Run the frequentist proportional odds model then the Bayesian one.
```{r}
set.seed(1)
n  <- 500
x1 <- runif(n, -1, 1)
x2 <- runif(n, -1, 1)
x3 <- sample(0 : 1, n, TRUE)
y  <- x1 + 0.5 * x2 + x3 + rnorm(n)
y  <- as.integer(cut2(y, g=10))
dd <- datadist(x1, x2, x3); options(datadist='dd')
f  <- lrm(y ~ x1 + pol(x2, 2) + x3, eps=1e-7) # eps to check against rstan
f
```
Logistic Regression Model

lrm(formula = y ~ x1 + pol(x2, 2) + x3, eps = 1e-07)

Frequencies of Responses

     1  2  3  4  5  6  7  8  9 10
    50 50 50 50 50 50 50 50 50 50

|  | Model Likelihood Ratio Test | Discrimination Indexes | Rank Discrim. Indexes |
|---|---|---|---|
| Obs 500 | LR χ² 235.04 | R² 0.379 | C 0.740 |
| max \|∂log L/∂β\| 4×10⁻¹⁴ | d.f. 4 | g 1.561 | Dxy 0.480 |
|  | Pr(>χ²) <0.0001 | gr 4.762 | γ 0.480 |
|  |  | gp 0.302 | τₐ 0.433 |
|  |  | Brier 0.181 |  |
|  | β | S.E. | Wald Z | Pr(>\|Z\|) |
|---|---|---|---|---|
| y≥2 | 2.0552 | 0.1951 | 10.53 | <0.0001 |
| y≥3 | 1.0832 | 0.1678 | 6.45 | <0.0001 |
| y≥4 | 0.4006 | 0.1606 | 2.49 | 0.0126 |
| y≥5 | -0.1790 | 0.1599 | -1.12 | 0.2629 |
| y≥6 | -0.7246 | 0.1626 | -4.46 | <0.0001 |
| y≥7 | -1.2867 | 0.1689 | -7.62 | <0.0001 |
| y≥8 | -1.8982 | 0.1788 | -10.62 | <0.0001 |
| y≥9 | -2.6288 | 0.1953 | -13.46 | <0.0001 |
| y≥10 | -3.6408 | 0.2282 | -15.95 | <0.0001 |
| x1 | 1.5694 | 0.1532 | 10.24 | <0.0001 |
| x2 | 0.7901 | 0.1368 | 5.78 | <0.0001 |
| x2² | -0.2213 | 0.2578 | -0.86 | 0.3908 |
| x3 | 1.6341 | 0.1707 | 9.57 | <0.0001 |
Before getting posterior distributions of parameters, use `rstan` just to get maximum likelihood estimates and compare them with those from `lrm`. Do this for increasingly flat priors for the \(\beta\)s. The default prior SD for `blrm` is 100. Running `method='optimizing'` is a quick way to study the effect of priors on the posterior modes for non-intercepts when there are no random effects in the model. It is also a way to get a sense of the scaling of the parameters used in the actual calculations (but see a later section for how to show standard deviations of variables on the transformed scale given to Stan). Except for variables listed in the `keepsep` argument to `blrm`, prior standard deviations apply to the QR-orthonormalized version of the design matrix.
```{r}
for(psd in c(0.25, 1, 10, 100, 10000)) {
  cat('\nPrior SD:', psd, '\n')
  g <- blrm(y ~ x1 + pol(x2, 2) + x3, method='optimizing', priorsd=psd)
  cat('-2 log likelihood:', g$deviance, '\n')
  print(g$coefficients)
}
```
Prior SD: 0.25
-2 log likelihood: 2269.378
y>=2 y>=3 y>=4 y>=5 y>=6 y>=7
2.191046301 1.378643586 0.838324839 0.395207927 -0.011602895 -0.418570617
y>=8 y>=9 y>=10 x1 x2 x2^2
-0.862048892 -1.402835097 -2.215654082 0.024804025 0.013114821 -0.003458598
x3
0.026136325
Prior SD: 1
-2 log likelihood: 2234.618
y>=2 y>=3 y>=4 y>=5 y>=6 y>=7
2.12797711 1.29550040 0.73699395 0.27627257 -0.14878702 -0.57642046
y>=8 y>=9 y>=10 x1 x2 x2^2
-1.04203046 -1.60705055 -2.44522185 0.31969267 0.16828142 -0.04299895
x3
0.33452830
Prior SD: 10
-2 log likelihood: 2076.618
y>=2 y>=3 y>=4 y>=5 y>=6 y>=7 y>=8
2.0518358 1.0895535 0.4153739 -0.1563173 -0.6939620 -1.2471658 -1.8490486
y>=9 y>=10 x1 x2 x2^2 x3
-2.5683076 -3.5681608 1.5016729 0.7577512 -0.2091988 1.5632871
Prior SD: 100
-2 log likelihood: 2086.232
y>=2 y>=3 y>=4 y>=5 y>=6 y>=7 y>=8
2.0546496 1.0827924 0.4002833 -0.1792460 -0.7247696 -1.2867285 -1.8981788
y>=9 y>=10 x1 x2 x2^2 x3
-2.6286338 -3.6405325 1.5685655 0.7895744 -0.2202702 1.6335934
Prior SD: 10000
-2 log likelihood: 2122.98
y>=2 y>=3 y>=4 y>=5 y>=6 y>=7 y>=8
2.0551518 1.0831610 0.4005549 -0.1790191 -0.7246383 -1.2866789 -1.8982289
y>=9 y>=10 x1 x2 x2^2 x3
-2.6288108 -3.6408433 1.5695440 0.7896154 -0.2213424 1.6341990
# Compare with ordinary MLEs and deviance
f$deviance
[1] 2302.585 2067.549
coef(f)
y>=2 y>=3 y>=4 y>=5 y>=6 y>=7 y>=8
2.0551529 1.0831900 0.4005868 -0.1790073 -0.7246249 -1.2866643 -1.8982180
y>=9 y>=10 x1 x2 x2^2 x3
-2.6287996 -3.6408392 1.5694325 0.7900802 -0.2212900 1.6340800
Fit the model with Dirichlet priors on intercepts and wide normal priors on the \(\beta\)s. Show the model fit summary. Note that the indexes of predictive discrimination/accuracy include 0.95 highest posterior density intervals. In frequentist inference we pretend that quantities such as AUROC and \(R^2\) are estimated without error, which is far from the case.
In several places you will see an index named `Symmetry`. This is a measure of the symmetry of a posterior distribution. Values farther from 1.0 indicate asymmetry, in which case the use of standard errors and of a normal approximation to the posterior distribution is not justified. The symmetry index is the ratio of the gap between the posterior mean and the 0.95 quantile of the posterior distribution to the gap between the 0.05 quantile and the mean.
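The definition above can be written as a small function operating on the posterior draws of one parameter (a sketch for illustration; `rmsb` computes this internally):

```r
# Symmetry index: (0.95 quantile - mean) / (mean - 0.05 quantile)
symmetry <- function(draws) {
  q <- quantile(draws, c(0.05, 0.95))
  as.numeric((q[2] - mean(draws)) / (mean(draws) - q[1]))
}
set.seed(3)
symmetry(rnorm(1e5))   # near 1 for a symmetric posterior
symmetry(rexp(1e5))    # well above 1 for a right-skewed posterior
```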
```{r}
bs <- blrm(y ~ x1 + pol(x2, 2) + x3, file='bs.rds')
bs
```
Bayesian Proportional Odds Ordinal Logistic Model

Dirichlet Priors With Concentration Parameter 0.233 for Intercepts

blrm(formula = y ~ x1 + pol(x2, 2) + x3, file = "bs.rds")

Frequencies of Responses

     1  2  3  4  5  6  7  8  9 10
    50 50 50 50 50 50 50 50 50 50

|  | Mixed Calibration/Discrimination Indexes | Discrimination Indexes | Rank Discrim. Indexes |
|---|---|---|---|
| Obs 500 | LOO log L -1046.97±13.15 | g 1.58 [1.381, 1.808] | C 0.738 [0.735, 0.74] |
| Draws 4000 | LOO IC 2093.95±26.3 | gp 0.115 [0.088, 0.138] | Dxy 0.477 [0.47, 0.48] |
| Chains 4 | Effective p 13.2±0.24 | EV 0.149 [0.109, 0.192] |  |
| p 4 | B 0.077 [0.075, 0.079] | v 1.919 [1.454, 2.498] |  |
|  |  | vp 0.014 [0.008, 0.02] |  |
|  | Mode β | Mean β | Median β | S.E. | Lower | Upper | Pr(β>0) | Symmetry |
|---|---|---|---|---|---|---|---|---|
| y≥2 | 2.0546 | 2.0608 | 2.0583 | 0.1944 | 1.6855 | 2.4402 | 1.0000 | 1.02 |
| y≥3 | 1.0828 | 1.0835 | 1.0835 | 0.1650 | 0.7730 | 1.4219 | 1.0000 | 1.05 |
| y≥4 | 0.4003 | 0.3981 | 0.4010 | 0.1577 | 0.0583 | 0.6769 | 0.9928 | 1.01 |
| y≥5 | -0.1792 | -0.1825 | -0.1832 | 0.1611 | -0.4942 | 0.1368 | 0.1290 | 0.99 |
| y≥6 | -0.7248 | -0.7302 | -0.7304 | 0.1637 | -1.0486 | -0.4109 | 0.0000 | 0.94 |
| y≥7 | -1.2867 | -1.2946 | -1.2941 | 0.1682 | -1.6322 | -0.9839 | 0.0000 | 0.96 |
| y≥8 | -1.8982 | -1.9063 | -1.9035 | 0.1755 | -2.2450 | -1.5606 | 0.0000 | 0.97 |
| y≥9 | -2.6286 | -2.6387 | -2.6355 | 0.1953 | -3.0125 | -2.2507 | 0.0000 | 0.97 |
| y≥10 | -3.6405 | -3.6543 | -3.6511 | 0.2333 | -4.1259 | -3.2100 | 0.0000 | 0.98 |
| x1 | 1.5686 | 1.5746 | 1.5725 | 0.1561 | 1.2745 | 1.8775 | 1.0000 | 1.03 |
| x2 | 0.7896 | 0.7917 | 0.7942 | 0.1382 | 0.5340 | 1.0735 | 1.0000 | 1.01 |
| x2² | -0.2203 | -0.2220 | -0.2226 | 0.2507 | -0.6892 | 0.2776 | 0.1880 | 1.00 |
| x3 | 1.6336 | 1.6404 | 1.6373 | 0.1748 | 1.2839 | 1.9686 | 1.0000 | 1.06 |
# Show more detailed analysis of model performance measures
blrmStats(bs, pl=TRUE)
Indexes computed for a random sample of 400 of 4000 posterior draws
gp, B, EV, and vp are for intercept 1 out of 9 intercepts
Dxy C g gp B EV v vp
Mean 0.477 0.738 1.571 0.114 0.077 0.147 1.897 0.014
SE 0.003 0.002 0.112 0.014 0.001 0.023 0.271 0.003
Lower 0.471 0.735 1.383 0.088 0.076 0.110 1.416 0.008
Upper 0.481 0.740 1.806 0.141 0.079 0.196 2.432 0.020
Symmetry 0.423 0.423 1.097 1.167 1.482 1.226 1.186 1.314
Dxy: 2*(C - 0.5) C: concordance probability
g: Gini mean |difference| on linear predictor (lp)
gp: Gini on predicted probability B: Brier score
EV: explained variation on prob. scale v: var(lp) vp: var(prob)
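To reduce the Monte Carlo error in these interval limits, more of the stored posterior draws can be used. This sketch assumes `blrmStats` accepts an `ns` argument controlling the number of draws, analogous to the `ns` argument mentioned earlier for `print`:

```r
# Use 2000 of the 4000 stored draws instead of the default 400
blrmStats(bs, ns=2000, pl=TRUE)
```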
Show basic Stan diagnostics. Had `stanDxplot(bs, rstan=TRUE)` been used, the intercepts would have been shifted from what is in `bs` because covariate means are subtracted before the data are passed to `rstan`.
stanDxplot(bs)
stanDx(bs)
Iterations: 2000 on each of 4 chains, with 4000 posterior distribution samples saved
For each parameter, n_eff is a crude measure of effective sample size
and Rhat is the potential scale reduction factor on split chains
(at convergence, Rhat=1)
n_eff Rhat
y>=2 7163 1.000
y>=3 5962 1.000
y>=4 7114 1.000
y>=5 8307 0.999
y>=6 10963 0.999
y>=7 9538 0.999
y>=8 8619 0.999
y>=9 8206 0.999
y>=10 7256 0.999
x1 6646 0.999
x2 9568 0.999
x2^2 8848 1.000
x3 5968 1.000
Here are the posterior distributions, calculated using kernel density estimates from the posterior draws. Posterior modes, shown as vertical lines, are the parameter values that maximize the log posterior density (using `rstan::optimizing` in the original model fit), so they do not necessarily coincide with the peaks of the kernel density estimates.
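The same marginal posterior can be examined directly from the stored draws. This sketch assumes, consistent with the comparison below, that `bs$draws` holds the draws (one column per parameter) and `bs$param` holds the mode/mean/median by parameter:

```r
# Kernel density estimate for x1, with the posterior mode marked
plot(density(bs$draws[, 'x1']), main='Posterior density of x1')
abline(v=bs$param['mode', 'x1'], lty=2)
```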
plot(bs)
# Also show 2-d posterior density contour for two collinear terms
plot(bs, c('x2', 'x2^2'), bivar=TRUE) # assumes ellipse
plot(bs, c('x2', 'x2^2'), bivar=TRUE, bivarmethod='kernel') # kernel density
# Print frequentist side-by-side with Bayesian posterior mean, median, mode
cbind(MLE=coef(f), t(bs$param))
MLE mode mean median
y>=2 2.0551529 2.0546496 2.0608013 2.0583224
y>=3 1.0831900 1.0827924 1.0835423 1.0835161
y>=4 0.4005868 0.4002833 0.3980935 0.4010254
y>=5 -0.1790073 -0.1792460 -0.1825232 -0.1832366
y>=6 -0.7246249 -0.7247696 -0.7302207 -0.7303855
y>=7 -1.2866643 -1.2867285 -1.2945701 -1.2941131
y>=8 -1.8982180 -1.8981788 -1.9063342 -1.9034973
y>=9 -2.6287996 -2.6286338 -2.6387192 -2.6354563
y>=10 -3.6408392 -3.6405325 -3.6543224 -3.6510901
x1 1.5694325 1.5685655 1.5746271 1.5724682
x2 0.7900802 0.7895744 0.7916821 0.7942466
x2^2 -0.2212900 -0.2202702 -0.2220469 -0.2225900
x3 1.6340800 1.6335934 1.6404096 1.6372988
# Compare covariance matrix of posterior draws with MLE
round(diag(vcov(f)) / diag(vcov(bs)), 2)
y>=2 y>=3 y>=4 y>=5 y>=6 y>=7 y>=8 y>=9 y>=10 x1 x2 x2^2 x3
1.01 1.04 1.04 0.98 0.99 1.01 1.04 1.00 0.96 0.96 0.98 1.06 0.95
range(vcov(f) / vcov(bs))
[1] -0.8151463 1.5075811