Simulating Bayesian Operating Characteristics of Bayesian Sequential RCT
1 Introduction
There are many types of operating characteristics to consider for a statistical assessment of evidence for treatment effects, for example:
1. Bayesian or frequentist power, or the frequentist type I assertion probability α, if the study goes to its planned end and there is only one look at the data at that time. This is a pre-study tendency.
2. Pre-study tendencies for various successes or failures to happen sometime during the course of the study if the data are analyzed sequentially. In the frequentist setting, α in this case is the probability of ever rejecting H0 when the treatment is ignorable (no benefit and no harm).
3. Reliability of post-study evidence quantification when there is a single data look.
4. Reliability of the evidence quantification at the decision point in a sequential study, i.e., at the first data look at which an evidence threshold is exceeded, what is the reliability of a frequentist null hypothesis test or of a Bayesian posterior probability computed at that look?
5. Properties of a futility assessment done in mid-course, e.g., a frequentist conditional power calculation or a Bayesian posterior predictive probability of ultimate study success.
Once the study has begun and the first analysis is done, purely pre-study operating characteristics such as 1. and 2. are no longer relevant. In this report we are concerned with 2. and 4.
Consider a two-arm therapeutic randomized controlled trial (RCT) with a 1:1 randomization ratio and a univariate ordinal outcome. Though primary analyses should always be covariate-adjusted, we will not consider covariate adjustment in these simulations. The trial design considered here is fully sequential, with unlimited data looks, so as to minimize the expected sample size. We will estimate the probability that the posterior probability, under various priors, reaches given targets for efficacy and harm/inefficacy, as a function of the sequential look number. These operating characteristics are useful for study planning, especially with regard to avoiding a futile trial from the outset. Once data are available, however, the simulations in the last part of this document are the most important: they address the reliability of evidence at the decision point, i.e., at the moment of stopping for efficacy, inefficacy, or harm.
For simplicity we align the looks with completion of follow-up of patients in the trial, e.g., a look after the 50th patient is followed will assume that patients 1-49 have also completed follow-up.
The simulations presented here assume that a normal distribution is an adequate approximation to the distribution of the treatment effect estimate, here a treatment B : treatment A log odds ratio. The data model is a proportional odds ordinal logistic model fitted with the lrm function in the R rms package. Simulations are done by the estSeqSim and gbayesSeqSim functions in the Hmisc package. As implemented in the Hmisc gbayes function, the posterior distribution used to approximate the true posterior (thus avoiding another simulation loop for MCMC posterior draws) is normal, making for very quick approximate posterior probability calculations when the prior distribution for the log OR is also Gaussian, as it is here. Three distinct priors are used, across four assertion/prior combinations:
- a skeptical prior for assessing evidence for efficacy
- a flat prior for assessing evidence for efficacy
- a flat prior for studying posterior probabilities of inefficacy/harm
- an optimistic prior for inefficacy/harm
See here for more simulations and graphical presentations of them.
1.1 Technical Note
Our simulations involve simulating ordinal data for the planned maximum sample size and then progressively revealing more and more of this sample as sequential looks progress. As with a real study, the data in the sequential looks overlap and this needs to be taken into account for certain analyses of posterior probability paths.
Unlike comparing means with a two-sample t-test, the formula for the variance of a log odds ratio is not a simple function of the group sample sizes. So one cannot compute the variance of a log odds ratio computed on one patient from treatment A and one from treatment B then divide that variance by the number of A-B patient pairs to get the variance of a log odds ratio for an arbitrary number of patients. Were that not the case, our simulations would have been even faster because one could simulate from a normal distribution instead of actually fitting the logistic model for each sample and look.
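This can be seen in the binary special case of the ordinal outcome, where the standard Woolf variance estimate for a log odds ratio from a 2×2 table with cell counts a, b, c, d is 1/a + 1/b + 1/c + 1/d: the variance depends on the outcome split, not just on the group sample sizes. A minimal sketch, in Python for illustration:

```python
# Woolf variance of a log odds ratio from a 2x2 table with cell counts a, b, c, d
def var_log_or(a, b, c, d):
    return 1/a + 1/b + 1/c + 1/d

# Two trials, both with 50 patients per arm, but different outcome frequencies
v_balanced = var_log_or(25, 25, 25, 25)   # events split evenly; variance 0.16
v_skewed   = var_log_or(5, 45, 5, 45)     # rare events in both arms; much larger
# The variance differs even though n is identical in both trials, so it cannot
# be obtained by scaling a single-pair variance by the number of patient pairs.
```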
2 Simulation Parameters
Unlike here, we take treatment benefit to mean that the odds ratio (OR) < 1. Since a large number of true ORs are used in the simulation, we simulate only 500 clinical trials per OR and borrow information across ORs to compute Bayesian operating characteristics. ORs vary from benefit to harm over this sequence: 0.4, 0.45, …, 1.25. The maximum sample size is 1000 patients, with the first look taken at the 25th patient. Looks occur after every patient until the 100th, then, for faster simulations, only after every 5th patient until the maximum sample size of 1000. Posterior probabilities are computed for the following assertion and prior distribution combinations:
- Efficacy: P(OR < 1) with a skeptical prior P(OR < 0.5) = 0.025
- Efficacy: P(OR < 1) with a flat prior (log(OR) has mean 0 and SD 100)
- Inefficacy/harm: P(OR > 1) with a flat prior (log(OR) has mean 0 and SD 100)
- Inefficacy/harm: P(OR > 1) with an optimistic prior (log(OR) has mean log(0.85) = -0.1625 and SD of 0.5)
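The SD of the skeptical prior follows from solving P(OR < 0.5) = 0.025 on the log scale; by symmetry of the zero-mean normal prior this is equivalent to P(OR > 2) = 0.025, giving σ = log(2)/z₀.₉₇₅. A quick check, in Python for illustration:

```python
import math
from statistics import NormalDist

# Skeptical prior for log(OR): mean 0, SD chosen so that P(OR > 2) = 0.025
z     = NormalDist().inv_cdf(0.975)   # upper 0.025 standard normal quantile, ~1.96
sigma = math.log(2) / z               # ~0.3537

# check: under N(0, sigma), P(log OR > log 2) recovers the stated tail probability
tail = 1 - NormalDist(0, sigma).cdf(math.log(2))
```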
For computing the probabilities of hitting various targets, the following posterior probability thresholds are used:
- Efficacy: P(OR < 1) > 0.95
- Inefficacy/harm: P(OR > 1) > 0.9
3 Simulation to Estimate Probabilities of Stopping for Different Reasons
First define the functions needed by the estSeqSim and gbayesSeqSim functions in the Hmisc package.
# Use a low-level logistic regression call to speed up simulations
lfit <- function(x, y) {
  # lrm.fit.bare will be in version 6.1-0 of rms; until then use lrm.fit
  f   <- rms::lrm.fit.bare(x, y)
  cof <- f$coefficients
  k   <- length(cof)
  # Return the last coefficient (the treatment log OR) and its variance
  c(cof[k], f$var[k, k])
}
# Data generation function
gdat <- function(beta, n1, n2) {
  # Cell probabilities for a 7-category ordinal outcome for the control group
  p  <- rev(c(2, 1, 2, 7, 8, 38, 42) / 100)

  # Compute cell probabilities for the treated group
  p2 <- pomodm(p=p, odds.ratio=exp(beta))
  y1 <- sample(1 : 7, n1, p,  replace=TRUE)
  y2 <- sample(1 : 7, n2, p2, replace=TRUE)
  list(y1=y1, y2=y2)
}
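The pomodm call above applies the proportional odds assumption to derive the treated-group cell probabilities from the control-group probabilities and the odds ratio. A rough Python sketch of the underlying idea, assuming the convention that the odds of Y ≥ j are multiplied by the odds ratio at every cutoff j (Hmisc's exact convention may differ):

```python
import math

def plogis(x): return 1 / (1 + math.exp(-x))
def qlogis(p): return math.log(p / (1 - p))

def po_shift(p, odds_ratio):
    """Shift ordinal cell probabilities by an odds ratio under proportional odds.
    Convention assumed here: odds of Y >= j are multiplied by odds_ratio for
    every cutoff j; this is an illustration, not a reimplementation of pomodm."""
    k = len(p)
    exceed  = [sum(p[j:]) for j in range(1, k)]     # P(Y >= j) for j = 2..k
    exceed2 = [plogis(qlogis(c) + math.log(odds_ratio)) for c in exceed]
    return ([1 - exceed2[0]] +
            [exceed2[i] - exceed2[i + 1] for i in range(k - 2)] +
            [exceed2[-1]])

# Control-group cell probabilities from gdat above
p  = [0.42, 0.38, 0.08, 0.07, 0.02, 0.01, 0.02]
p2 = po_shift(p, 0.5)   # under this convention, mass moves toward lower categories
```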
Now run the simulations. The result is saved so that the model fits are not run each time this report is compiled.
ors   <- seq(0.4, 1.25, by=0.05)
looks <- c(25 : 100, seq(105, 1000, by=5))

if(! simdone) {
  set.seed(3)
  # estSeqSim took 38m for 2.3M model fits; plan on 0.001s per OR/simulation/look
  print(system.time(est <- estSeqSim(log(ors), looks, gdat, lfit, nsim=500, progress=TRUE)))
  saveRDS(est, 'seqsim.rds', compress='xz')
} else est <- readRDS('seqsim.rds')
# Define assertions and the priors to be used for them
# Assertion 1: log(OR) < 0 under skeptical prior with mean 0 and sigma such that P(OR > 2) = 0.025
# Assertion 2: log(OR) < 0 under flat prior
# Assertion 3: log(OR) > 0 under flat prior (sigma=100)
# Assertion 4: log(OR) > 0 under optimistic prior with mean log(0.85), sigma=0.5
asserts <- list(list('Efficacy',                   '<', 0, cutprior=log(2), tailprob=0.025),
                list('Efficacy flat',              '<', 0, mu=0,         sigma=100),
                list('Inefficacy/harm flat',       '>', 0, mu=0,         sigma=100),
                list('Inefficacy/harm optimistic', '>', 0, mu=log(0.85), sigma=0.5))

s <- gbayesSeqSim(est, asserts=asserts)
head(s)
sim parameter look est vest p1 mean1 sd1
1 1 -0.9162907 25 -0.5608048 0.6702457 0.6070526 -0.08819149 0.3246568
2 1 -0.9162907 26 -0.7216567 0.6351371 0.6432983 -0.11872801 0.3232548
3 1 -0.9162907 27 -0.5308043 0.6012535 0.6118197 -0.09140265 0.3217666
4 1 -0.9162907 28 -0.6291448 0.5922355 0.6335884 -0.10969854 0.3213456
5 1 -0.9162907 29 -0.7187434 0.5847377 0.6534119 -0.12664488 0.3209870
6 1 -0.9162907 30 -0.8010207 0.5784064 0.6715144 -0.14241269 0.3206779
p2 mean2 sd2 p3 mean3 sd3 p4
1 0.7533229 -0.5607672 0.8186579 0.2466771 -0.5607672 0.8186579 0.2628995
2 0.8173968 -0.7216109 0.7969296 0.1826032 -0.7216109 0.7969296 0.2246521
3 0.7531798 -0.5307724 0.7753821 0.2468202 -0.5307724 0.7753821 0.2597402
4 0.7931801 -0.6291075 0.7695456 0.2068199 -0.6291075 0.7695456 0.2363885
5 0.8263650 -0.7187014 0.7646591 0.1736350 -0.7187014 0.7646591 0.2158080
6 0.8538774 -0.8009744 0.7605084 0.1461226 -0.8009744 0.7605084 0.1976082
mean4 sd4
1 -0.2707199 0.4267123
2 -0.3204430 0.4235439
3 -0.2706787 0.4202129
4 -0.3010270 0.4192764
5 -0.3291055 0.4184808
6 -0.3552087 0.4177965
attr(s, 'asserts')
label cutprior tailprob mu sigma assertion
1 Efficacy 0.6931472 0.025 0.0000000 0.353653 < 0
2 Efficacy flat NA NA 0.0000000 100.000000 < 0
3 Inefficacy/harm flat NA NA 0.0000000 100.000000 > 0
4 Inefficacy/harm optimistic NA NA -0.1625189 0.500000 > 0
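Since the gbayes approximation is just the conjugate normal-normal update, the posterior columns above can be verified by hand. For example, row 1 of head(s) has est = -0.5608048 and vest = 0.6702457; combining these with the skeptical prior (mean 0, sigma 0.353653 from the table above) reproduces mean1, sd1, and p1. A sketch in Python:

```python
import math
from statistics import NormalDist

def norm_posterior(mu0, sd0, est, se):
    """Conjugate normal update: prior N(mu0, sd0^2), estimate est with SE se."""
    w0, w1 = 1 / sd0**2, 1 / se**2    # precisions of prior and data
    var  = 1 / (w0 + w1)
    mean = var * (w0 * mu0 + w1 * est)
    return mean, math.sqrt(var)

# Row 1 of head(s), with the skeptical prior sigma from attr(s, 'asserts')
est, vest, sigma = -0.5608048, 0.6702457, 0.353653
mean1, sd1 = norm_posterior(0, sigma, est, math.sqrt(vest))
p1 = NormalDist(mean1, sd1).cdf(0)    # P(log OR < 0 | data) under the skeptical prior
# mean1 ~ -0.0882, sd1 ~ 0.3247, p1 ~ 0.607, matching the printed output
```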
alabels <- attr(s, 'alabels')   # named vector to map p1 p2 p3 p4 to labels
First let’s examine the effect of the priors by making two pairwise comparisons: differences in posterior probabilities of efficacy under skeptical vs. flat prior, and differences in posterior probabilities of inefficacy under flat and optimistic priors.
w <- data.table(s)
u <- w[, .(p12max=max(abs(p1 - p2)), p12mean=mean(abs(p1 - p2)),
           p34max=max(abs(p3 - p4)), p34mean=mean(abs(p3 - p4))), by=.(look)]
z <- melt(u, measure.vars=c('p12max', 'p12mean', 'p34max', 'p34mean'),
          variable.name='which', value.name='diff')
k <- c(p12max='Efficacy max',   p12mean='Efficacy mean',
       p34max='Inefficacy max', p34mean='Inefficacy mean')
z[, w := k[which]]
ggplot(z, aes(x=look, y=diff, color=w)) + geom_line() +
  guides(color=guide_legend(title='Comparison')) +
  xlab('Look') + ylab('Absolute Difference in Posterior Probabilities')
Compute the cumulative probability of hitting assertion-specific targets as data looks advance.
# Reshape results into a taller and thinner data table so we can plot over the 4 assertions
ps <- names(alabels)
w  <- melt(w,
           measure.vars=list(ps, paste0('mean', 1:4), paste0('sd', 1:4)),
           variable.name='assert', value.name=c('p', 'mean', 'sd'))
w[, assert := alabels[assert]]
head(w)
sim parameter look est vest assert p mean
1: 1 -0.9162907 25 -0.5608048 0.6702457 Efficacy 0.6070526 -0.08819149
2: 1 -0.9162907 26 -0.7216567 0.6351371 Efficacy 0.6432983 -0.11872801
3: 1 -0.9162907 27 -0.5308043 0.6012535 Efficacy 0.6118197 -0.09140265
4: 1 -0.9162907 28 -0.6291448 0.5922355 Efficacy 0.6335884 -0.10969854
5: 1 -0.9162907 29 -0.7187434 0.5847377 Efficacy 0.6534119 -0.12664488
6: 1 -0.9162907 30 -0.8010207 0.5784064 Efficacy 0.6715144 -0.14241269
sd
1: 0.3246568
2: 0.3232548
3: 0.3217666
4: 0.3213456
5: 0.3209870
6: 0.3206779
# Define targets
target <- c(Efficacy                     = 0.95,
            'Efficacy flat'              = 0.95,
            'Inefficacy/harm flat'       = 0.9,
            'Inefficacy/harm optimistic' = 0.9)
w[, target := target[assert]]   # spreads targets to all rows
# hit = 0/1 indicator of hitting the target at or before a given look;
# equals 1 if the cumulative number of target hits > 0 at that look
u <- w[, .(hit = 1 * (cumsum(p > target) > 0), look=look), by=.(sim, parameter, assert)]

# To compute for each assertion/prior/parameter combination the
# first look at which the target was hit, run the following
# (first is set to infinity if never hit)
# f <- w[, .(first=min(look[p > target])), by=.(sim, parameter, assert)]
# with(subset(f, assert == 'Efficacy' & parameter == 0), mean(is.finite(first)))
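The cumsum(p > target) > 0 construction converts a posterior probability path into a "hit at or before this look" indicator: once the target has ever been crossed, the indicator stays 1 even if the probability later drops back below the threshold. A tiny illustration in Python with a hypothetical probability path:

```python
# Hypothetical posterior probability path over 4 looks, with a 0.95 efficacy target
path   = [0.80, 0.96, 0.90, 0.97]
target = 0.95

hits = []
crossed = 0
for p in path:
    crossed += p > target          # cumulative count of looks exceeding the target
    hits.append(int(crossed > 0))  # 1 once the target has ever been exceeded
# hits == [0, 1, 1, 1]: look 3 falls back below 0.95 but still counts as stopped
```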
Next estimate the probability of hitting the posterior probability targets as a function of the true parameter generating the data, the look number, and the assertion. We use simple stratified proportions for this purpose because there is an adequate number of replications per condition. (With fewer replications we would have used logistic regression to interpolate probability estimates for more precision.) Estimates are made for a subset of the true odds ratios.
# Estimate the probability at OR 0.4 0.5 ... using simple proportions
ors  <- round(exp(u$parameter), 2)
us   <- u[ors == round(ors, 1), ]   # subset data table to ORs in increments of 0.1
prop <- us[, .(p=mean(hit)), by=.(parameter, look, assert)]
ggplot(prop, aes(x=look, y=p, color=factor(exp(parameter)))) + geom_line() +
  facet_wrap(~ assert) +
  xlab('Look') + ylab('Proportion Stopping') +
  guides(color=guide_legend(title='True OR')) +
  theme(legend.position = 'bottom')