5 Posterior Probabilities With Sequential Analysis
In a Bayesian analysis) It is entirely appropriate to collect data until a point has been proven or disproven, or until the data collector runs out of time, money, or patience. — Edwards et al. (1963)
… sequential designs are often disappointing in what they tell us about the treatment beyond the fact that it works. How well it works is usually not well established. It is sometimes given that the reason that results from sequential trials may be unreliable is precisely because they are sequential. However, I favor a different explanation. They are unreliable when and if they stop early, because in that case, they deliver less information. Many of these difficulties are inherited by flexible designs, and indeed, it has even been claimed that they bring no advantages compared with sequential designs. … I want to make it quite clear that I have no objection with flexible designs where they help us avoid pursuing dead-ends. The analogous technique from the sequential design world is stopping for futility. In my opinion of all the techniques of sequential analysis, this is the most useful. — Senn (2013)
- Bayesian approach allows aggressive sequential testing and early study termination
- Point estimate at termination is pulled back by prior → perfect calibration
- Things go wrong if
- model incorrectly specified (this also hurts frequentist)
- prior used in analysis conflicts with prior used by study’s judge
- What if assess efficacy at will and terminate the instant P(efficacy) > 0.95?
- PP on average above 0.95 but remains perfectly calibrated as long as no subtitution of prior
- Math and Bayes’ theorem dictate perfect calibration independent of stopping rule, but simulation exposes logic flow
- Example:
- one-arm study with maximum n=500
- look after each subject
- assume the E raw data measurement with σ=1
- efficacy = mean μ>0
5.1 Skeptical Prior
- Prior taken to allow harm equally likely as benefits (mean 0)
- Favors no effect (peak of prior density function at 0)
- Low probability of large effects
- Mixtures of normals: flexible way to specify priors
- 1:1 mix of two normals, each with mean 0
- SD of first: P(μ>1)=0.1
- SD of second: P(μ>0.25)=0.05
Code
<- 1 / qnorm(1 - 0.1)
sd1 <- 0.25 / qnorm(1 - 0.05) sd2
Code
spar(bty='l')
<- 0.5 # 1:1 mixture
wt <- function(x) wt * dnorm(x, 0, sd1) + (1 - wt) * dnorm(x, 0, sd2)
pdensity <- seq(-3, 3, length=200)
x plot(x, pdensity(x), type='l', xlab=expression(mu), ylab='Prior Density')
- Simulate 50,000 different (on μ) studies; reflects real-world case where μ is unknown
- Frequentists emphasize whether if nature throws the null at you, you will not declare the truth non-null
- Bayesian inference: whatever nature throws at you, can we reconstruct what happened?
- Given prior beliefs, Bayes tries to reveal hidden truths by updating your beliefs
- Simulations reflect this:
- draw μ from the prior
- generate data assuming that μ
- do Bayesian analysis that doesn’t know μ but knows the prior generating μ
- sample sizes analyzed: 1, 2, 3, …, 500
- for speed, the code generates all 500 data points but they are revealed one-at-a-time
- Stop the instant P(efficacy) ≥ 0.95 or P(futility) ≥ 0.9
- Futility: μ < 0.05
Code
<- function(N, prior.mu=0, prior.sd, wt, mucut=0, mucutf=0.05,
simseq postcut=0.95, postcutf=0.9,
ignore=20, nsim=1000) {
<- rep(prior.mu, length=2)
prior.mu <- rep(prior.sd, length=2)
prior.sd <- prior.sd[1]; sd2 <- prior.sd[2]
sd1 <- sd1 ^ 2
v1 <- sd2 ^ 2
v2 <- 1 : N
j <- Mu <- PostN <- Post <- Postf <- postfe <- postmean <- numeric(nsim)
cmean <- stoppedi <- stoppedf <- stoppedfu <- stopfe <- status <-
stopped integer(nsim)
<- - (1 : ignore)
notignored
# Derive function to compute posterior mean
<- gbayesMixPost(NA, NA, d0=prior.mu[1], d1=prior.mu[2],
pmean v0=v1, v1=v2, mix=wt, what='postmean')
for(i in 1 : nsim) {
# See http://stats.stackexchange.com/questions/70855
<- if(wt == 1) 1 else sample(1 : 2, size=1, prob=c(wt, 1. - wt))
component <- prior.mu[component] + rnorm(1) * prior.sd[component]
mu # mu <- rnorm(1, mean=prior.mu, sd=prior.sd) if only 1 component
<- mu
Mu[i] <- rnorm(N, mean=mu, sd=1)
y <- cumsum(y) / j # all N means for N sequential analyses
ybar <- gbayesMixPost(ybar, 1. / j,
pcdf d0=prior.mu[1], d1=prior.mu[2],
v0=v1, v1=v2, mix=wt, what='cdf')
<- 1 - pcdf(mucut)
post <- post[N]
PostN[i] <- pcdf(mucutf)
postf <- stopped[i] <-
s if(max(post) < postcut) N else min(which(post >= postcut))
<- post[s] # posterior at stopping
Post[i] <- ybar[s] # observed mean at stopping
cmean[i] # If want to compute posterior median at stopping:
# pcdfs <- pcdf(mseq, x=ybar[s], v=1. / s)
# postmed[i] <- approx(pcdfs, mseq, xout=0.5, rule=2)$y
# if(abs(postmed[i]) == max(mseq)) stop(paste('program error', i))
<- pmean(x=ybar[s], v=1. / s)
postmean[i]
# Compute stopping time if ignore the first "ignore" looks
<- if(max(post[notignored]) < postcut) N
stoppedi[i] else
+ min(which(post[notignored] >= postcut))
ignore
# Compute stopping time if also allow to stop for futility:
# posterior probability mu < 0.05 > 0.9
<- if(max(post) < postcut & max(postf) < postcutf) N
stoppedf[i] else
min(which(post >= postcut | postf >= postcutf))
# Compute stopping time for pure futility analysis
<- if(max(postf) < postcutf) N else min(which(postf >= postcutf))
s <- postf[s]
Postf[i] <- s
stoppedfu[i]
## Another way to do this: find first look that stopped for either
## efficacy or futility. Record status: 0:not stopped early,
## 1:stopped early for futility, 2:stopped early for efficacy
## Stopping time: stopfe, post prob at stop: postfe
<- post >= postcut | postf >= postcutf
stp <- stopfe[i] <- if(any(stp)) min(which(stp)) else N
s <- if(any(stp)) ifelse(postf[s] >= postcutf, 1, 2) else 0
status[i] <- if(any(stp)) ifelse(status[i] == 2, post[s],
postfe[i] else post[N]
postf[s])
}list(mu=Mu, post=Post, postn=PostN, postf=Postf,
stopped=stopped, stoppedi=stoppedi,
stoppedf=stoppedf, stoppedfu=stoppedfu,
cmean=cmean, postmean=postmean,
postfe=postfe, status=status, stopfe=stopfe)
}
Code
<- 50000
nsim <- function() {
g set.seed(1)
simseq(500, prior.mu=0, prior.sd=c(sd1, sd2), wt=wt, postcut=0.95,
postcutf=0.9, nsim=nsim)
}<- runifChanged(g, sd1, sd2, wt, nsim)
z <- z$mu
mu <- z$post
post <- z$postn
postn <- z$stopped
st <- z$stoppedi
sti <- z$stoppedf
stf <- z$stoppedfu
stfu <- z$cmean
cmean <- z$postmean
postmean<- z$postf
postf <- z$status
status <- z$postfe
postfe <- function(x) formatNP(mean(x), digits=3)
rmean <- status == 2
k <- status == 1 kf
- 20393 trials stopped early for efficacy, proportion=0.408
- of 24907 trials with μ>0, proportion=0.786
- of 2367 trials with μ∈[0.15, 0.2], proportion=0.895
- of 12210 trials with μ≥0.2, proportion=0.981
- 28438 trials stopped early for futility
- 1169 trials went to completion (n=500)
- Average PP of efficacy at stopping for efficacy: 0.961
- Of trials stopped early for efficacy, proportion with μ > 0: 0.960
- Average PP of futility at stopping for futility: 0.920
- Of trials stopped early for futility, proportion with μ < 0.05: 0.923
- Perfect calibration of PP at stopping point
- Check calibration of P(efficacy) if did not stop for either reason (1169 trials)
- Assessed using smooth calibration curve exactly as in risk models
- Relate actual P(μ>0) (from prior) to PP(μ>0)
Code
spar(left=1)
<- val.prob(postn[status == 0], mu[status == 0] > 0, m=200,
v legendloc=c(.6, .4),
xlab='Posterior Probability of Efficacy at 500th Look',
ylab='Proportion of Trials\nwith Efficacy > 0')
- Histogram showing distribution of true μ when stopped early for efficacy
Code
spar(bty='l')
hist(mu[k], nclass=50, xlim=c(-1,4), xlab='Efficacy', main='')
abline(v=0, col='red', lwd=0.5)
<- mean(mu[k] <= 0)
regret text(-0.5, 500, paste0('Proportion regret=', round(regret, 3)), srt=90, adj=0)
- Check calibration of posterior mean when stop early for efficacy
- Relate posterior and ordinary sample mean to true μ using “super smoother”
Code
spar(bty='l')
plot(0, 0, xlab='Estimated Efficacy',
ylab='True Efficacy', type='n', xlim=c(-2, 4), ylim=c(-2, 4))
abline(a=0, b=1, col=gray(.9), lwd=4)
lines(supsmu(cmean[k], mu[k]))
lines(supsmu(postmean[k], mu[k]), col='blue')
text(2, .4, 'Sample mean')
text(-1, .8, 'Posterior mean', col='blue')
- Frequentist adjustment to sample mean for early stopping is complex, as is CL
- Automatic for Bayesian posterior mean, median, mode, credible interval
Median Stopping Time as a Function of True μ
- Expect to stop earlier when μ generating the data was larger
- Use a parametric survival model (log-normal) to flexibly relate μ to the stopping time
- No early stopping for efficacy → right-censor at sample size of 500
Code
<- datadist(mu); options(datadist='dd')
dd <- psm(Surv(st, st < 500) ~ rcs(mu, 6), dist='lognormal')
f plot(Predict(f, mu, fun=exp, conf.int=FALSE), xlim=c(-.25, 1), ylim=c(0, 500),
ylab='Median Stopping Time',
abline=list(list(v=0, col=gray(.9))))