5  Posterior Probabilities With Sequential Analysis

In a Bayesian analysis, it is entirely appropriate to collect data until a point has been proven or disproven, or until the data collector runs out of time, money, or patience. — Edwards et al. (1963)

… sequential designs are often disappointing in what they tell us about the treatment beyond the fact that it works. How well it works is usually not well established. It is sometimes given that the reason that results from sequential trials may be unreliable is precisely because they are sequential. However, I favor a different explanation. They are unreliable when and if they stop early, because in that case, they deliver less information. Many of these difficulties are inherited by flexible designs, and indeed, it has even been claimed that they bring no advantages compared with sequential designs. … I want to make it quite clear that I have no objection with flexible designs where they help us avoid pursuing dead-ends. The analogous technique from the sequential design world is stopping for futility. In my opinion of all the techniques of sequential analysis, this is the most useful. — Senn (2013)

5.1 Skeptical Prior

  • Prior taken so that harm is as likely as benefit (mean 0)
  • Favors no effect (peak of prior density function at 0)
  • Low probability of large effects
  • Mixtures of normals: flexible way to specify priors
  • 1:1 mix of two normals, each with mean 0
  • SD of first component chosen so that P(μ > 1) = 0.1
  • SD of second component chosen so that P(μ > 0.25) = 0.05 (verified in the check after the code below)
Code
sd1 <- 1    / qnorm(1 - 0.1)    # SD such that P(mu > 1)    = 0.1
sd2 <- 0.25 / qnorm(1 - 0.05)   # SD such that P(mu > 0.25) = 0.05
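As a quick sanity check (a minimal sketch using only base R), the chosen SDs reproduce the tail probabilities that defined them:
Code
# Each mixture component is normal with mean 0; verify the defining tail probabilities
pnorm(1,    mean=0, sd=sd1, lower.tail=FALSE)   # should be 0.10
pnorm(0.25, mean=0, sd=sd2, lower.tail=FALSE)   # should be 0.05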
Code
spar(bty='l')
wt  <- 0.5   # 1:1 mixture
pdensity <- function(x) wt * dnorm(x, 0, sd1) + (1 - wt) * dnorm(x, 0, sd2)
x <- seq(-3, 3, length=200)
plot(x, pdensity(x), type='l', xlab=expression(mu), ylab='Prior Density')

Skeptical prior distribution for the unknown mean in a single-arm study: a 1:1 mixture of two zero-mean normals with SDs of 0.780 and 0.152, respectively.
  • Simulate 50,000 studies, each with its own true μ; this reflects the real-world case where μ is unknown
  • Frequentists emphasize ensuring that, if nature throws the null at you, you will seldom declare the truth to be non-null
  • Bayesian inference: whatever nature throws at you, can we reconstruct what happened?
  • Given prior beliefs, Bayes tries to reveal hidden truths by updating your beliefs
  • Simulations reflect this:
    • draw μ from the prior
    • generate data assuming that μ
    • do Bayesian analysis that doesn’t know μ but knows the prior generating μ
    • sample sizes analyzed: 1, 2, 3, …, 500
    • for speed, the code generates all 500 data points at once, but they are revealed one at a time (a single-trial sketch follows this list)
  • Stop the instant P(efficacy) ≥ 0.95 or P(futility) ≥ 0.9
  • Futility: μ < 0.05
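Here is a minimal single-trial sketch of the scheme just described, assuming the Hmisc package is loaded and using sd1, sd2, and wt from above; the seed and the 0.95 efficacy cutoff are illustrative. The full simulation function simseq below wraps this logic, adding futility stopping and bookkeeping.
Code
# One simulated trial: draw mu from the mixture prior, generate all 500
# observations, then compute P(mu > 0) at each of the 500 sequential looks
set.seed(2)
comp <- sample(1 : 2, size=1, prob=c(wt, 1 - wt))   # which mixture component
mu   <- rnorm(1, mean=0, sd=c(sd1, sd2)[comp])      # true mean for this trial
y    <- rnorm(500, mean=mu, sd=1)
ybar <- cumsum(y) / (1 : 500)                       # running means = 500 looks
pcdf <- gbayesMixPost(ybar, 1 / (1 : 500), d0=0, d1=0,
                      v0=sd1^2, v1=sd2^2, mix=wt, what='cdf')
post <- 1 - pcdf(0)                                 # P(mu > 0) at each look
which(post >= 0.95)[1]                              # first look crossing 0.95; NA if never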
Code
simseq <- function(N, prior.mu=0, prior.sd, wt, mucut=0, mucutf=0.05,
                postcut=0.95, postcutf=0.9,
                ignore=20, nsim=1000) {
  prior.mu <- rep(prior.mu, length=2)
  prior.sd <- rep(prior.sd, length=2)
  sd1 <- prior.sd[1]; sd2 <- prior.sd[2]
  v1 <- sd1 ^ 2
  v2 <- sd2 ^ 2
  j <- 1 : N
  cmean <- Mu <- PostN <- Post <- Postf <- postfe <- postmean <- numeric(nsim)
  stopped <- stoppedi <- stoppedf <- stoppedfu <- stopfe <- status <-
    integer(nsim)
  notignored <- - (1 : ignore)

  # Derive function to compute posterior mean
  pmean <- gbayesMixPost(NA, NA, d0=prior.mu[1], d1=prior.mu[2],
                                 v0=v1, v1=v2, mix=wt, what='postmean')
  
  for(i in 1 : nsim) {
    # See http://stats.stackexchange.com/questions/70855
    component <- if(wt == 1) 1 else sample(1 : 2, size=1, prob=c(wt, 1. - wt))
    mu <- prior.mu[component] + rnorm(1) * prior.sd[component]
    # mu <- rnorm(1, mean=prior.mu, sd=prior.sd) if only 1 component
    
    Mu[i] <- mu
    y  <- rnorm(N, mean=mu, sd=1)
    ybar <- cumsum(y) / j    # all N means for N sequential analyses
    pcdf <- gbayesMixPost(ybar, 1. / j,
                          d0=prior.mu[1], d1=prior.mu[2],
                          v0=v1, v1=v2, mix=wt, what='cdf')
    post  <- 1 - pcdf(mucut)
    PostN[i] <- post[N]
    postf <- pcdf(mucutf)
    s <- stopped[i] <-
      if(max(post) < postcut) N else min(which(post >= postcut))
    Post[i]  <- post[s]   # posterior at stopping
    cmean[i] <- ybar[s]   # observed mean at stopping
    # If want to compute posterior median at stopping:
    #    pcdfs <- pcdf(mseq, x=ybar[s], v=1. / s)
    #    postmed[i] <- approx(pcdfs, mseq, xout=0.5, rule=2)$y
    #    if(abs(postmed[i]) == max(mseq)) stop(paste('program error', i))
    postmean[i] <- pmean(x=ybar[s], v=1. / s)
    
    # Compute stopping time if ignore the first "ignore" looks
    stoppedi[i] <- if(max(post[notignored]) < postcut) N
    else
      ignore + min(which(post[notignored] >= postcut))
    
    # Compute stopping time if also allowed to stop for futility:
    # stop when the posterior probability that mu < 0.05 exceeds 0.9
    stoppedf[i] <- if(max(post) < postcut & max(postf) < postcutf) N
    else
      min(which(post >= postcut | postf >= postcutf))
    
    # Compute stopping time for pure futility analysis
    s <- if(max(postf) < postcutf) N else min(which(postf >= postcutf))
    Postf[i] <- postf[s]
    stoppedfu[i] <- s

    ## Another way to do this: find first look that stopped for either
    ## efficacy or futility.  Record status: 0:not stopped early,
    ## 1:stopped early for futility, 2:stopped early for efficacy
    ## Stopping time: stopfe, post prob at stop: postfe

    stp <- post >= postcut | postf >= postcutf
    s <- stopfe[i] <- if(any(stp)) min(which(stp)) else N
    status[i] <- if(any(stp)) ifelse(postf[s] >= postcutf, 1, 2) else 0
    postfe[i] <- if(any(stp)) ifelse(status[i] == 2, post[s],
                                     postf[s]) else post[N]
  }
  list(mu=Mu, post=Post, postn=PostN, postf=Postf,
       stopped=stopped, stoppedi=stoppedi,
       stoppedf=stoppedf, stoppedfu=stoppedfu,
       cmean=cmean, postmean=postmean,
       postfe=postfe, status=status, stopfe=stopfe)
}
Code
nsim <- 50000
g <- function() {
    set.seed(1)
    simseq(500, prior.mu=0, prior.sd=c(sd1, sd2), wt=wt, postcut=0.95,
           postcutf=0.9, nsim=nsim)
}
z       <- runifChanged(g, sd1, sd2, wt, nsim)
mu      <- z$mu
post    <- z$post
postn   <- z$postn
st      <- z$stopped
sti     <- z$stoppedi
stf     <- z$stoppedf
stfu    <- z$stoppedfu
cmean   <- z$cmean
postmean<- z$postmean
postf   <- z$postf
status  <- z$status
postfe  <- z$postfe
rmean <- function(x) formatNP(mean(x), digits=3)
k  <- status == 2
kf <- status == 1
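The summaries reported in the list below can be recomputed from these objects; the following is a sketch, and these particular expressions are illustrative rather than necessarily the ones behind the reported numbers.
Code
# Example summaries from the simulation output (sketch; assumes objects above)
sum(k)                 # number of trials stopped early for efficacy
rmean(mu[k]  > 0)      # of those, proportion with true mu > 0
rmean(postfe[k])       # average posterior P(efficacy) at stopping for efficacy
sum(kf)                # number of trials stopped early for futility
rmean(postfe[kf])      # average posterior P(futility) at stopping for futility
rmean(mu[kf] < 0.05)   # of those, proportion with true mu < 0.05
sum(status == 0)       # trials that ran to completion (n = 500)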
  • 20393 trials stopped early for efficacy, proportion=0.408
    • of 24907 trials with μ>0, proportion=0.786
    • of 2367 trials with μ∈[0.15, 0.2], proportion=0.895
    • of 12210 trials with μ≥0.2, proportion=0.981
  • 28438 trials stopped early for futility
  • 1169 trials went to completion (n=500)
  • Average PP of efficacy at stopping for efficacy: 0.961
  • Of trials stopped early for efficacy, proportion with μ > 0: 0.960
  • Average PP of futility at stopping for futility: 0.920
  • Of trials stopped early for futility, proportion with μ < 0.05: 0.923
  • Perfect calibration of PP at stopping point
  • Check calibration of P(efficacy) for trials that did not stop for either reason (1169 trials)
    • Assessed using smooth calibration curve exactly as in risk models
    • Relate the actual frequency of μ > 0 (μ having been drawn from the prior) to the stated PP(μ > 0)
Code
spar(left=1)
v <- val.prob(postn[status == 0], mu[status == 0] > 0, m=200,
              legendloc=c(.6, .4),
         xlab='Posterior Probability of Efficacy at 500th Look',
         ylab='Proportion of Trials\nwith Efficacy > 0')

  • Histogram showing distribution of true μ when stopped early for efficacy
Code
spar(bty='l')
hist(mu[k], nclass=50, xlim=c(-1,4), xlab='Efficacy', main='')
abline(v=0, col='red', lwd=0.5)
regret <- mean(mu[k] <= 0)
text(-0.5, 500, paste0('Proportion regret=', round(regret, 3)), srt=90, adj=0)

  • Check calibration of the posterior mean when stopping early for efficacy
    • Relate posterior and ordinary sample mean to true μ using “super smoother”
Code
spar(bty='l')
plot(0, 0, xlab='Estimated Efficacy',
     ylab='True Efficacy', type='n', xlim=c(-2, 4), ylim=c(-2, 4))
abline(a=0, b=1, col=gray(.9), lwd=4)
lines(supsmu(cmean[k],    mu[k]))
lines(supsmu(postmean[k], mu[k]), col='blue')
text(2, .4, 'Sample mean')
text(-1, .8, 'Posterior mean', col='blue')

  • Frequentist adjustment of the sample mean for early stopping is complex, as is adjustment of the confidence limits
  • Automatic for the Bayesian posterior mean, median, mode, and credible interval (see the sketch below)
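For example, an equal-tailed credible interval at the moment of stopping can be read directly off the posterior CDF. A minimal sketch for one trial stopped early for efficacy, following the pattern of the commented-out posterior-median code in simseq; the mseq grid and the 0.95 level are assumptions:
Code
# 0.95 credible interval at the moment of stopping for one efficacy-stopped trial
i    <- which(status == 2)[1]    # first trial stopped early for efficacy
s    <- st[i]                    # its stopping time (number of observations)
mseq <- seq(-3, 3, length=1000)  # grid of candidate mu values
pcdf <- gbayesMixPost(NA, NA, d0=0, d1=0, v0=sd1^2, v1=sd2^2,
                      mix=wt, what='cdf')
cdfs <- pcdf(mseq, x=cmean[i], v=1 / s)              # posterior CDF at stopping
approx(cdfs, mseq, xout=c(0.025, 0.975), rule=2)$y   # equal-tailed 0.95 limits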

Median Stopping Time as a Function of True μ

  • Expect to stop earlier when μ generating the data was larger
  • Use a parametric survival model (log-normal) to flexibly relate μ to the stopping time
  • No early stopping for efficacy → right-censor at sample size of 500
Code
dd <- datadist(mu); options(datadist='dd')
f <- psm(Surv(st, st < 500) ~ rcs(mu, 6), dist='lognormal')
plot(Predict(f, mu, fun=exp, conf.int=FALSE), xlim=c(-.25, 1), ylim=c(0, 500),
     ylab='Median Stopping Time',
     abline=list(list(v=0, col=gray(.9))))

Estimated median stopping time for efficacy as a function of the true value of μ, using a log-normal survival-time distribution and a restricted cubic spline in μ with 6 default knots. Studies never stopping for efficacy are right-censored at the maximum sample size of 500.