# 5  Posterior Probabilities With Sequential Analysis

In a Bayesian analysis) It is entirely appropriate to collect data until a point has been proven or disproven, or until the data collector runs out of time, money, or patience. — Edwards et al. (1963)

… sequential designs are often disappointing in what they tell us about the treatment beyond the fact that it works. How well it works is usually not well established. It is sometimes given that the reason that results from sequential trials may be unreliable is precisely because they are sequential. However, I favor a different explanation. They are unreliable when and if they stop early, because in that case, they deliver less information. Many of these difficulties are inherited by flexible designs, and indeed, it has even been claimed that they bring no advantages compared with sequential designs. … I want to make it quite clear that I have no objection with flexible designs where they help us avoid pursuing dead-ends. The analogous technique from the sequential design world is stopping for futility. In my opinion of all the techniques of sequential analysis, this is the most useful. — Senn (2013)

• Bayesian approach allows aggressive sequential testing and early study termination
• Point estimate at termination is pulled back by prior → perfect calibration
• Things go wrong if
• model incorrectly specified (this also hurts frequentist)
• prior used in analysis conflicts with prior used by study’s judge
• What if assess efficacy at will and terminate the instant P(efficacy) > 0.95?
• PP on average above 0.95 but remains perfectly calibrated as long as no subtitution of prior
• Math and Bayes’ theorem dictate perfect calibration independent of stopping rule, but simulation exposes logic flow
• Example:
• one-arm study with maximum n=500
• look after each subject
• assume the E raw data measurement with σ=1
• efficacy = mean μ>0

## 5.1 Skeptical Prior

• Prior taken to allow harm equally likely as benefits (mean 0)
• Favors no effect (peak of prior density function at 0)
• Low probability of large effects
• Mixtures of normals: flexible way to specify priors
• 1:1 mix of two normals, each with mean 0
• SD of first: P(μ>1)=0.1
• SD of second: P(μ>0.25)=0.05
Code
``````sd1 <- 1    / qnorm(1 - 0.1)
sd2 <- 0.25 / qnorm(1 - 0.05)``````
Code
``````spar(bty='l')
wt  <- 0.5   # 1:1 mixture
pdensity <- function(x) wt * dnorm(x, 0, sd1) + (1 - wt) * dnorm(x, 0, sd2)
x <- seq(-3, 3, length=200)
plot(x, pdensity(x), type='l', xlab=expression(mu), ylab='Prior Density')``````
• Simulate 50,000 different (on μ) studies; reflects real-world case where μ is unknown
• Frequentists emphasize whether if nature throws the null at you, you will not declare the truth non-null
• Bayesian inference: whatever nature throws at you, can we reconstruct what happened?
• Given prior beliefs, Bayes tries to reveal hidden truths by updating your beliefs
• Simulations reflect this:
• draw μ from the prior
• generate data assuming that μ
• do Bayesian analysis that doesn’t know μ but knows the prior generating μ
• sample sizes analyzed: 1, 2, 3, …, 500
• for speed, the code generates all 500 data points but they are revealed one-at-a-time
• Stop the instant P(efficacy) ≥ 0.95 or P(futility) ≥ 0.9
• Futility: μ < 0.05
Code
``````simseq <- function(N, prior.mu=0, prior.sd, wt, mucut=0, mucutf=0.05,
postcut=0.95, postcutf=0.9,
ignore=20, nsim=1000) {
prior.mu <- rep(prior.mu, length=2)
prior.sd <- rep(prior.sd, length=2)
sd1 <- prior.sd[1]; sd2 <- prior.sd[2]
v1 <- sd1 ^ 2
v2 <- sd2 ^ 2
j <- 1 : N
cmean <- Mu <- PostN <- Post <- Postf <- postfe <- postmean <- numeric(nsim)
stopped <- stoppedi <- stoppedf <- stoppedfu <- stopfe <- status <-
integer(nsim)
notignored <- - (1 : ignore)

# Derive function to compute posterior mean
pmean <- gbayesMixPost(NA, NA, d0=prior.mu[1], d1=prior.mu[2],
v0=v1, v1=v2, mix=wt, what='postmean')

for(i in 1 : nsim) {
# See http://stats.stackexchange.com/questions/70855
component <- if(wt == 1) 1 else sample(1 : 2, size=1, prob=c(wt, 1. - wt))
mu <- prior.mu[component] + rnorm(1) * prior.sd[component]
# mu <- rnorm(1, mean=prior.mu, sd=prior.sd) if only 1 component

Mu[i] <- mu
y  <- rnorm(N, mean=mu, sd=1)
ybar <- cumsum(y) / j    # all N means for N sequential analyses
pcdf <- gbayesMixPost(ybar, 1. / j,
d0=prior.mu[1], d1=prior.mu[2],
v0=v1, v1=v2, mix=wt, what='cdf')
post  <- 1 - pcdf(mucut)
PostN[i] <- post[N]
postf <- pcdf(mucutf)
s <- stopped[i] <-
if(max(post) < postcut) N else min(which(post >= postcut))
Post[i]  <- post[s]   # posterior at stopping
cmean[i] <- ybar[s]   # observed mean at stopping
# If want to compute posterior median at stopping:
#    pcdfs <- pcdf(mseq, x=ybar[s], v=1. / s)
#    postmed[i] <- approx(pcdfs, mseq, xout=0.5, rule=2)\$y
#    if(abs(postmed[i]) == max(mseq)) stop(paste('program error', i))
postmean[i] <- pmean(x=ybar[s], v=1. / s)

# Compute stopping time if ignore the first "ignore" looks
stoppedi[i] <- if(max(post[notignored]) < postcut) N
else
ignore + min(which(post[notignored] >= postcut))

# Compute stopping time if also allow to stop for futility:
# posterior probability mu < 0.05 > 0.9
stoppedf[i] <- if(max(post) < postcut & max(postf) < postcutf) N
else
min(which(post >= postcut | postf >= postcutf))

# Compute stopping time for pure futility analysis
s <- if(max(postf) < postcutf) N else min(which(postf >= postcutf))
Postf[i] <- postf[s]
stoppedfu[i] <- s

## Another way to do this: find first look that stopped for either
## efficacy or futility.  Record status: 0:not stopped early,
## 1:stopped early for futility, 2:stopped early for efficacy
## Stopping time: stopfe, post prob at stop: postfe

stp <- post >= postcut | postf >= postcutf
s <- stopfe[i] <- if(any(stp)) min(which(stp)) else N
status[i] <- if(any(stp)) ifelse(postf[s] >= postcutf, 1, 2) else 0
postfe[i] <- if(any(stp)) ifelse(status[i] == 2, post[s],
postf[s]) else post[N]
}
list(mu=Mu, post=Post, postn=PostN, postf=Postf,
stopped=stopped, stoppedi=stoppedi,
stoppedf=stoppedf, stoppedfu=stoppedfu,
cmean=cmean, postmean=postmean,
postfe=postfe, status=status, stopfe=stopfe)
}``````
Code
``````nsim <- 50000
g <- function() {
set.seed(1)
simseq(500, prior.mu=0, prior.sd=c(sd1, sd2), wt=wt, postcut=0.95,
postcutf=0.9, nsim=nsim)
}
z       <- runifChanged(g, sd1, sd2, wt, nsim)
mu      <- z\$mu
post    <- z\$post
postn   <- z\$postn
st      <- z\$stopped
sti     <- z\$stoppedi
stf     <- z\$stoppedf
stfu    <- z\$stoppedfu
cmean   <- z\$cmean
postmean<- z\$postmean
postf   <- z\$postf
status  <- z\$status
postfe  <- z\$postfe
rmean <- function(x) formatNP(mean(x), digits=3)
k  <- status == 2
kf <- status == 1``````
• 20393 trials stopped early for efficacy, proportion=0.408
• of 24907 trials with μ>0, proportion=0.786
• of 2367 trials with μ∈[0.15, 0.2], proportion=0.895
• of 12210 trials with μ≥0.2, proportion=0.981
• 28438 trials stopped early for futility
• 1169 trials went to completion (n=500)
• Average PP of efficacy at stopping for efficacy: 0.961
• Of trials stopped early for efficacy, proportion with μ > 0: 0.960
• Average PP of futility at stopping for futility: 0.920
• Of trials stopped early for futility, proportion with μ < 0.05: 0.923
• Perfect calibration of PP at stopping point
• Check calibration of P(efficacy) if did not stop for either reason (1169 trials)
• Assessed using smooth calibration curve exactly as in risk models
• Relate actual P(μ>0) (from prior) to PP(μ>0)
Code
``````spar(left=1)
v <- val.prob(postn[status == 0], mu[status == 0] > 0, m=200,
legendloc=c(.6, .4),
xlab='Posterior Probability of Efficacy at 500th Look',
ylab='Proportion of Trials\nwith Efficacy > 0')``````
• Histogram showing distribution of true μ when stopped early for efficacy
Code
``````spar(bty='l')
hist(mu[k], nclass=50, xlim=c(-1,4), xlab='Efficacy', main='')
abline(v=0, col='red', lwd=0.5)
regret <- mean(mu[k] <= 0)
text(-0.5, 500, paste0('Proportion regret=', round(regret, 3)), srt=90, adj=0)``````
• Check calibration of posterior mean when stop early for efficacy
• Relate posterior and ordinary sample mean to true μ using “super smoother”
Code
``````spar(bty='l')
plot(0, 0, xlab='Estimated Efficacy',
ylab='True Efficacy', type='n', xlim=c(-2, 4), ylim=c(-2, 4))
abline(a=0, b=1, col=gray(.9), lwd=4)
lines(supsmu(cmean[k],    mu[k]))
lines(supsmu(postmean[k], mu[k]), col='blue')
text(2, .4, 'Sample mean')
text(-1, .8, 'Posterior mean', col='blue')``````
• Frequentist adjustment to sample mean for early stopping is complex, as is CL
• Automatic for Bayesian posterior mean, median, mode, credible interval

### Median Stopping Time as a Function of True μ

• Expect to stop earlier when μ generating the data was larger
• Use a parametric survival model (log-normal) to flexibly relate μ to the stopping time
• No early stopping for efficacy → right-censor at sample size of 500
Code
``````dd <- datadist(mu); options(datadist='dd')
f <- psm(Surv(st, st < 500) ~ rcs(mu, 6), dist='lognormal')
plot(Predict(f, mu, fun=exp, conf.int=FALSE), xlim=c(-.25, 1), ylim=c(0, 500),
ylab='Median Stopping Time',
abline=list(list(v=0, col=gray(.9))))``````