Simulating Operating Characteristics of a Sequential Longitudinal Markov Ordinal Outcome Trial

1 Summary

The performance of pairwise treatment comparison tests is simulated for 3-level longitudinal ordinal responses. The setup is a special case of longitudinal continuous response assessments with an override for deaths, which are counted as the highest response category. Here the continuous daily assessment is dichotomized so that on a given day the response Y=0 for fully recovered and alive, Y=1 for alive but not fully recovered, and Y=2 for dead. Simulations are done for estimating frequentist test power for both a longitudinal discrete time Markov proportional odds ordinal logistic model and a Cox model for time until recovery, the latter considering death at any time to represent no recovery by 28 days (time to recovery right censored at 28d). Y=0 and 2 are absorbing states, so once a patient recovers she is assumed to stay recovered. Under ideal conditions (constant treatment effect over time, proportional odds holds for the treatment effect, proportional hazards holds for time to recovery) the transition model odds ratio equals the Cox model hazard ratio, and the two tests provide identical test statistics and power. Note that for the general case where there are many levels of Y, the power of the longitudinal ordinal model will greatly exceed that of the time to recovery Cox test as demonstrated here. Note also that when Y has 3 levels and two of them are absorbing, the Markov transition model has no need to use the previous state as a covariate (since it’s constant), which is another reason that the Cox-ordinal longitudinal model equivalence will not hold in general. For the current limited-Y case the Markov model provides added flexibility and a simpler interpretation in the presence of competing events, by estimating these quantities:

  • the probability that Y \(\geq 1\) as a function of time and treatment (being not recovered or dead)
  • the probability that Y=1 as a function of time and treatment (being alive but not recovered)
  • the probability of being dead as a function of time and treatment
  • the mean number of days unrecovered (days with Y>0) by treatment

The last quantity is particularly clinically interpretable, being the expected number of days within the 28 day follow-up window for which the patient has died or is alive but not yet recovered.

For comparing two equal size treatment groups of \(n=89\) patients each, the frequentist power at \(\alpha=0.05\) for both statistical tests is given below, along with the hazard and odds ratios to be detected, and the corresponding reduction in mean days unrecovered.

Hazard Ratio   Reduction in Days Unrecovered   Cox Power
1.5            2.0                             0.71
1.6            2.3                             0.82
1.66667        2.5                             0.88
1.75           2.7                             0.93

As shown below, the Cox model analysis for time to recovery in essence makes the proportional odds assumption for the outcomes of recovered, not recovered, and dead. When the proportional odds assumption is violated, the Cox model hazard ratio for time to recovery (more precisely, its reciprocal) is almost exactly equal to the “average” odds ratio from the PO model that falsely assumes proportional odds. Like the PO model in this case, the Cox model (and its special case, the logrank test) cannot recover a pure effect of a treatment on recovery unless the treatment affects death the same way it affects recovery (i.e., proportional odds holds). The same examples show that when the partial PO model is used, it is by design able to accurately recover the separate recovery and mortality treatment effects.

2 Overview

Consider a clinical trial for the treatment of Stevens-Johnson syndrome / toxic epidermal necrolysis, a rare and life-threatening severe cutaneous adverse reaction induced primarily by drugs. Our goal is to estimate Bayesian operating characteristics for a two treatment arm comparison with five pre-planned data looks with optional stopping for efficacy, inferiority, or futility. A first-order discrete time Markov Bayesian partial proportional odds (PO) model is used, with daily patient assessment on days 1, 2, …, 28 after randomization. The underlying daily measurement is the percent body surface area for which skin has not re-attached (re-epithelialization), with death considered as worse than any level of skin detachment. The most powerful analysis will treat % BSA involvement as a continuous ordinal measurement with death override. But we now concern ourselves with a simpler daily outcome: Y=0 (alive and completely recovered), Y=1 (alive but some skin remains detached), and Y=2 (dead). Consider Y=0 and Y=2 to be absorbing states, so per-patient serial data records stop once either of these is achieved.

Once Markov transition probabilities are transformed into state occupancy probabilities (SOPs), these can be used as primary estimands or a combination of the 28 days of SOPs can be summarized. In the study the primary estimand is the difference between treatments in the mean time not recovered, which is the difference in the sums over the 28 days of the probability that Y>0. Were there to be no deaths, the mean time not recovered is just one day less than the mean time to recovery. For the purpose of the simulations presented here the estimand will be the transition odds ratio in the Markov PO model. Note that the PO model places death as worse than non-recovery, but makes no assumption about the spacing of categories, i.e., going from death to not recovered is not assumed to be the same as going from not recovered to recovered.
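To make the primary estimand concrete, here is a minimal sketch; sopA and sopB are hypothetical 28 x 3 matrices of daily state occupancy probabilities for the two arms (columns for Y=0, 1, 2), as produced later by soprobMarkovOrd.

# Mean days not recovered = sum over the 28 days of P(Y > 0)
# (column 1 holds P(Y = 0); sopA and sopB are placeholders for illustration)
meanDaysUnrecovered <- function(sop) sum(1 - sop[, 1])
# Primary estimand: between-treatment difference in mean days not recovered
# meanDaysUnrecovered(sopB) - meanDaysUnrecovered(sopA)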

It is common practice to analyze such data by collapsing each patient’s raw data into a single number—time to recovery—and then applying a Cox model comparison or its competing risk analog. For these analyses, death is considered a censoring event, and patients who die are censored by treating them as non-recovered on day 28. Thus a death at day 2 is considered as serious as a death at day 27 and these are considered as serious as being alive without recovery on day 28. It can be difficult to interpret time until a good outcome when the outcome can be interrupted by a bad outcome. State transition models provide estimates of probabilities of easy-to-interpret conditions, e.g. day-specific P(dead), P(alive but not recovered), or P(either being dead or being alive but not recovered). The easy generalization of such models to fully use the continuous version of the daily outcome (% body surface area involvement) is another advantage.

As an aside, it is customary to summarize death-censored time-to-recovery using the median. The median requires continuous measurements, and ties in the day of recovery across patients make the median simultaneously inefficient and non-robust. Adding one patient to the study can alter the median by a whole day. One would need hourly assessments for the median number of days to have good statistical behavior. These problems do not exist for the mean time to recovery.
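A tiny illustration with hypothetical day-resolution recovery times shows the median’s fragility:

# Hypothetical recovery days; one added patient moves the median a full day
x <- c(3, 4, 6, 7)
median(x)                 # 5
median(c(x, 3))           # 4
mean(x); mean(c(x, 3))    # 5 vs 4.6; the mean moves far less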

More statistical background and statistical computing details are available here. As detailed here, the compound symmetric correlation pattern assumption of a simpler random effects ordinal model is severely violated for designs like ours, hence the use of the Markov serial correlation pattern. Note that even when the proportional odds model is strongly violated for treatment, a simple function of the odds ratio (OR) from the model almost exactly reproduces the Wilcoxon statistic or concordance probability. The Wilcoxon statistic, when scaled to 0-1 (i.e., the concordance probability), equals \(\frac{\mathrm{OR}^{0.66}}{1 + \mathrm{OR}^{0.66}}\) to within an average absolute error of 0.002. The most important non-PO effect to model is due to the possible change in the mix of outcomes over follow-up time—an effect not necessarily involving treatment. This is modeled using the Peterson and Harrell (1990) partial proportional odds model.
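As a quick illustration of the approximation just quoted (concApprox is a name introduced here for illustration only, not part of the analysis):

# Concordance probability approximated from the PO model odds ratio
concApprox <- function(or) or ^ 0.66 / (1 + or ^ 0.66)
round(concApprox(c(1, 1.5, 2)), 3)   # 0.500 0.567 0.612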

olooks   <- c(30, 54, 78, 102, 126, 150, 174, 198, 222, 246, 267)
kl       <- length(olooks)
colooks1 <- paste(olooks[- kl], collapse=', ')
colooks  <- paste0(colooks1, ', and ', olooks[kl])
looks    <- olooks * 2 / 3
clooks1  <- paste(looks[- kl], collapse=', ')
clooks   <- paste0(clooks1, ', and ', looks[kl])

The new study in question is a 3-arm study with a planned maximum sample size of 267 patients with looks after follow-up is complete on 30, 54, 78, 102, 126, 150, 174, 198, 222, 246, and 267 (the target overall sample size) patients. Looks are scheduled after the first 30 patients have completed follow-up, then every 6m over an additional 5y period (23.7 patients per 6m rounded to 24, with 21 enrolled in the final 6m). Here we consider 2-arm comparisons so the combined sample sizes are 20, 36, 52, 68, 84, 100, 116, 132, 148, 164, and 178 (final look at planned maximum sample size).

3 R Setup

outfmt <- if(knitr::is_html_output ()) 'html'  else 'pdf'
markup <- if(knitr::is_latex_output()) 'latex' else 'html'
require(rms)          # also engages the Hmisc package
require(data.table)   # used to compute various summary measures
if(outfmt == 'html') require(plotly)
fdev <- switch(outfmt, html='png', pdf='pdf')
knitrSet(lang='markdown', w=7, h=7, dev=fdev, fig.path=paste0(fdev, '/sim2-'))
options(prType=markup)
## If producing html, ggplot2 graphics are converted to plotly graphics
## so that hovertext will show extra information
## Assumes aes(..., label=txt) used
ggp <- if(outfmt != 'html')
  function(ggobject, ...) ggobject else Hmisc::ggplotlyr

# When outputting to pdf, make LaTeX place figures right at the point       
# in which they are created; requires float.sty (see yaml header)
if(outfmt == 'pdf') knitr::opts_chunk$set(fig.pos = 'H', out.extra='')

# Function in Hmisc that will sense chunk has results='asis' and
# print in Rmarkdown verbatim mode if needed
pr <- markupSpecs$markdown$pr

# Define the squote, equote, pr, findstart, sop12, and dosim functions
source('funs.r')

# Read the source code for the hashCheck and runifChanged functions from
# https://github.com/harrelfe/rscripts/blob/master/hashCheck.r
# This makes it easy to see if any objects changed that require re-running
# a simulation, and reports on any changes
getRs('hashCheck.r', put='source')

# Load runParallel function from github
# Makes parallel package easier to use and does recombination
# over batches
getRs('runParallel.r', put='source')

# Quick way to get the variance-covariance matrix from a vgam fit, per author Thomas Yee
# See ~/r/VGAM/notes.md
vgamcov <- function(fit) {
  covun <- chol2inv(fit@R)
  cnames <- names(fit@coefficients)
  dimnames(covun) <- list(cnames, cnames)
  covun
}

4 Simulation Model

To be able to simulate serial 3-level ordinal outcomes (Y=0,1,2 above) it is necessary to have estimates of control arm daily outcome probabilities. Our estimates come from the corticosteroid arm of the trial by Wang et al, Figure 3 Panel D. The cumulative incidence estimates of complete re-epithelialization there correspond to survivors. The new trial’s follow-up period for the primary Y=0,1,2 endpoint is 28 days. We assume the 21d mortality for the severely ill patient population being studied is 0.2, and assume an exponential distribution for the time until death such that \(S(21) = 0.8\), leading to a cumulative incidence of death by day \(t\) of \(1 - \exp(-\lambda t)\) where \(\lambda = \frac{-\log(0.8)}{21} \approx 0.0106\). We smooth the recovery probabilities of Wang et al and interpolate the estimates to days 1, 2, …, 28, then scale each day’s estimate by the probability of the event upon which it is conditioned (being alive).

Wang’s estimate for 28d recovery probability (0.615) is too low. Apply a multiplicative correction to yield 0.8.

d <- read.csv('wangFig3D.csv', col.names=c('t', 'p'), header=FALSE)
d$p <- d$p / 100
s <- with(d, supsmu(t, p))
w <- approx(s, xout=1:28)
t <- w$x
p <- w$y
lambda <- -log(0.8) / 21
pdeath <- 1 - exp(- lambda * (1 : 28))
precoveredAlive  <- p * (1 - pdeath) * 0.7 / 0.622892705
precoveredAlive2 <- p * (1 - pdeath) * 0.8 / 0.622892705

pnotrecoverdAlive  <- 1 - pdeath - precoveredAlive
pnotrecoverdAlive2 <- 1 - pdeath - precoveredAlive2

type <- rep(c('death', 'alive and not recovered',
              'alive and recovered with multiplier',
              'dead or not recovered', 'Wang recovery if survive'), each=28)
y    <- c(pdeath, pnotrecoverdAlive2, precoveredAlive2, 1 - precoveredAlive2, p)
g <- ggplot(data.frame(Day=1:28, Probability=y),
            aes(x=Day, y=Probability, color=type)) + geom_line()
ggp(g)

Next compute the mean time to recovery in the control arm, and under various hazard ratios. Power of the ordinary Cox two-sample test is estimated two ways. First, exponential distributions are assumed so that all that matters is the number of recovered patients. Second, the Cox test is simulated by drawing recovery time samples from the Wang control arm adjusted to 0.8 final recovery probability, and assuming proportional hazards to get the corresponding active treatment arm simulated recovery times. Times are simulated by computing the step function inverse of the Wang cumulative distribution function.

The adjusted Wang cumulative recovery curve and the curve that would result from a treatment benefit of HR=1.667 are found below.

s <- 1 - precoveredAlive2
plot(1:28, 1 - s, xlab='', ylab='Cumulative Probability of Recovery', type='l', ylim=c(0, 1))
lines(1:28, y <- 1 - s ^ 1.6666666, col='blue')
title(sub='Black: adjusted estimates from Wang for control\nBlue: estimates for treated group, using HR=1.667', adj=0)
title(sub=paste(round(max(1 - s), 2), 'at 28d\n', round(max(y), 2), 'at 28d'), adj=1)

# Approximate power assuming exponential distributions (just need expected # events by tx)
cpow <- function(hr, nevents1, nevents2, alpha=0.05) {
  zcrit <- - qnorm(alpha / 2)
  sd    <- sqrt((1. / nevents1) + (1. / nevents2))
  z     <- abs(log(hr)) / sd
  1. - (pnorm(zcrit - z) - pnorm(- zcrit - z))
}

# Simulated power based on Wang
pval <- function(hr, n1, pl=FALSE) {
  u  <- runif(n1)
  y1 <- approx(c(precoveredAlive2, 1), 1:29, xout=u, method='constant', f=1)$y
  e1 <- ifelse(y1 == 29, 0, 1)
  y1 <- pmin(y1, 28)
  u  <- runif(n1)
  y2 <- approx(c(1. - (1. - precoveredAlive2) ^ hr, 1), 1:29, xout=u, method='constant', f=1)$y
  e2 <- ifelse(y2 == 29, 0, 1)
  y2 <- pmin(y2, 28)
  y  <- c(y1, y2)
  e  <- c(e1, e2)
  tx <- c(rep(0, n1), rep(1, n1))
  if(pl) {
    plot(survfit(Surv(y, e) ~ tx))
    points(1:28, 1. - precoveredAlive2)
  }
  f  <- coxph(Surv(y, e) ~ tx)
  lr <- diff(f$loglik)
  1. - pchisq(2. * lr, 1)
}

# Test the function on large random samples
pval(1.5, 20000, pl=TRUE)
[1] 0
# Also estimate power using Jon Fintzi's delayed exponential model

pvaljf <- function(hr, n1, pl=FALSE) {
  tx <- rep(0:1, each=n1)
  y  <- c(7 + ceiling(rexp(n1, log(5) / (28 - 7))),
          7 + ceiling(rexp(n1, log(5) / (28 - 7) * hr)))
  e <- ifelse(y > 28, 0, 1)
  y[e == 0] <- 28   # non-recovery
  if(pl) {
    plot(survfit(Surv(y, e) ~ tx))
    points(1:28, 1. - precoveredAlive2)
  }
  f <- coxph(Surv(y, e) ~ tx)
  lr <- diff(f$loglik)
  1. - pchisq(2. * lr, 1)
}
pvaljf(1.5, 20000, pl=TRUE)
[1] 0
simpow <- function(hr, n1, fun=pval, nsim=1000) {
  pvals <- numeric(nsim)
  for(i in 1 : nsim) pvals[i] <- fun(hr, n1)
  mean(pvals < 0.05)
}

n1 <- 178 / 2

# Estimate power with JF's methods for HR=1.5
simpow(1.5, n1, pvaljf)
[1] 0.704
hrs <- c(1, 1.25, 1.5, 1.6, 1.6666667, 1.75, 2)

coxsim <- function() {
  s <- 1 - precoveredAlive2
  mtr <- pow <- spow <- numeric(0)
  for(h in hrs) {
    s2   <- s ^ h
    mtr  <- c(mtr, sum(s2))
    pow  <- c(pow, cpow(h, n1 * max(1. - s), n1 * max(1. - s2)))
    spow <- c(spow, simpow(h, n1, nsim=2000))
  }
  diff <- mtr[1] - mtr
  w <- data.frame(HR=hrs, `Mean Time<br>To Recovery`=mtr, Difference=diff,
                  Power=pow, `Simulated<br>Power`=spow, check.names=FALSE)
  w
}
seed <- 3
set.seed(seed)
w   <- runifChanged(coxsim, seed, n1, hrs, precoveredAlive2, simpow, pval)

knitr::kable(w, digits=3,
   caption='Hazard ratio, mean time to recovery, and power of Cox test')
Table 4.1: Hazard ratio, mean time to recovery, and power of Cox test
   HR  Mean Time To Recovery  Difference  Power  Simulated Power
1.000                 18.897       0.000  0.050            0.052
1.250                 17.781       1.116  0.274            0.254
1.500                 16.898       1.999  0.704            0.708
1.600                 16.596       2.301  0.827            0.825
1.667                 16.408       2.489  0.885            0.877
1.750                 16.187       2.710  0.935            0.930
2.000                 15.605       3.292  0.991            0.993

NOTE: Too many simulations failed to have the VGAM::vglm fit converge to trust the results with a target cumulative incidence of 0.8. For what follows 0.7 is used.

Compute parameters of a Markov ordinal model that provide the best compromise fit to the SOPs for death and for alive-not-recovered over 28d. Since there are only 3 states and two of them are absorbing, the previous state is always Y=1, so previous state is ignored in the transition functions g1 and g2. g1 and g2 are the same for control patients. g1 parameterizes a constant treatment effect over time, while g2 parameterizes a linearly increasing effect such that there is no effect at day 1 and the maximum effect is reached at day 10. The parameter for g1 is the log transition odds ratio for treatment; for g2 it is the log odds ratio for treatment at 10d.

# Define function that computes logit of transition probabilities
# Allow for non-PO with regard to a quadratic time effect

g1 <- function(yprev, t, gap, X, parameter=0, extra) {
  if(! all(yprev == 1)) {prn(yprev); stop('program logic error')}
    kappa <- extra[1:4]
    lp <- matrix(0.,
                 nrow=length(yprev), ncol=2,
                 dimnames=list(as.character(yprev), c('1', '2')))
    # 2 columns = no. distinct y less 1 = length of intercepts
    # lp[yp, ] is a 2-vector of linear predictors, one per intercept
    # kappa[1:2] are the PO linear and quadratic time trend parameters and
    # kappa[3:4] are the non-PO extra values for death
    for(yp in yprev)
      lp[as.character(yp), ] <- kappa[1] * t + kappa[2] * t ^ 2 +
                                cbind(0, 1) * (kappa[3] * t  + kappa[4] * t ^ 2) +
                                parameter * (X == 2)
  lp
}

g2 <- function(yprev, t, gap, X, parameter=0, extra) {
  if(! all(yprev == 1)) {prn(yprev); stop('program logic error')}
    kappa <- extra[1:4]
    lp <- matrix(0.,
                 nrow=length(yprev), ncol=2,
                 dimnames=list(as.character(yprev), c('1', '2')))
    # 2 columns = no. distinct y less 1 = length of intercepts
    # lp[yp, ] is a 2-vector of linear predictors, one per intercept
    # kappa[1:2] are the PO linear and quadratic time trend parameters and
    # kappa[3:4] are the non-PO extra values for death
    for(yp in yprev)
      lp[as.character(yp), ] <- kappa[1] * t + kappa[2] * t ^ 2 +
                                cbind(0, 1) * (kappa[3] * t  + kappa[4] * t ^ 2) +
                                parameter * (X == 2) * pmin(t - 1, 9) / 9
  lp
}


# SOP targets for control patients:
target <- cbind(precoveredAlive, pnotrecoverdAlive, pdeath)
rownames(target) <- as.character(1:28)

seed      <- 6
int.try   <- c(12, -4.4)
extra.try <- c(-1, .02, .9, -.02)
fun       <- function(...) findstart(intercepts=int.try, extra=extra.try,
                                     initial=1, absorb=c(0, 2),
                                     times=1:28, y=0:2, g=g1, seed=seed)

sparams   <- runifChanged(fun, seed, target, g1, int.try, extra.try, findstart, intMarkovOrd,
                          file='sparams.rds', .inclfun.=FALSE)

sop12(y=0:2, times=1:28, initial=1, absorb=c(0,2),
      intercepts=sparams$intercepts, g=g1, extra=sparams$extra, which=1)
# Save the optimal values of the extra vector as default values for the extra argument
# so we don't need to specify them in later steps
formals(g1)$extra <- sparams$extra
formals(g2)$extra <- sparams$extra
saveRDS(list(g1=g1, g2=g2), 'gsim2.rds')

# Save intercepts
ints <- sparams$intercepts

Parameter values were found that made the Markov PO model accurately reflect the existing data, with a total absolute error of 0.59 over 84 state occupancy probabilities. Note that a strong non-proportional odds component for the absolute follow-up time effect was necessary to reflect the changing relative frequencies of the three outcomes over time.

5 Transition Odds Ratios vs. Change in Mean Time Unrecovered

The primary treatment effect in a Markov PO model is stated in terms of the transition odds ratio (OR), i.e., the ratio of odds of \(Y(t) \geq y | Y(t-1) = a\) for treatment B vs. treatment A (with only one non-absorbing state the \(Y(t-1) = a\) condition is superfluous). The primary result will be stated in terms of the B - A difference in mean time unhealed (which includes time not alive), so we need to be able to relate one to the other. Two simulations are done. In the first, the treatment effect is constant over time, both in the simulated data and in the model used to analyze them. In the second, both the simulated data and the final analysis allow for a treatment \(\times\) time interaction. For these simulations we define the OR \(\exp(\beta)\) to be the treatment effect at 10d, assume the OR stays there after 10d, and assume that the log odds ratio is linear in time from \(t=1\) to \(t=10\) with the OR being 1.0 at \(t=1\) (no treatment effect on the first day post randomization). So the treatment effect on day \(t\) on the log odds scale is \(\frac{\min(t - 1, 9)}{9} \beta\). But the fitted model for these second simulations uses a quadratic time \(\times\) treatment interaction, reflecting imperfect knowledge about a time-varying treatment effect.
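As a check on this parameterization, the assumed effect can be written as a small function (txEffect is a name introduced here only for illustration):

# Treatment effect on the log odds scale on day t under the time-varying model
txEffect <- function(t, beta) beta * pmin(t - 1, 9) / 9
txEffect(c(1, 5, 10, 28), log(0.6))   # 0 on day 1, reaching log(0.6) = -0.51 by day 10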

In addition to computation of days unrecovered from the true model, simulate a sample of 20,000 patients for each model and report the mean and median time to recovery. Note that because of censoring, the median is not valid when its estimate exceeds 27d. It is also very problematic because the variable being summarized is not continuous. The mean does not have a problem with discrete data.

Vary the treatment effect \(\beta\) and compute the difference and ratio of mean days not recovered, separately for the two true treatment effect models. Note that time to recovery is defined so that it is expected to be approximately one day longer than time unrecovered: a patient first observed recovered on day 5 has a recovery time of 5 but only 4 days (days 1-4) with Y>0.

seed <- 17
set.seed(seed)
ors <- sort(c(1/1.6, seq(0.1, 1, by=0.05)))
m <- length(ors)
# Define time to recovery, with death at any time treated as censoring at 28d
tcl <- function(y) {
  if(any(y == 2)) return(list(rtime=28L, event=0L))   # death
  if(any(y == 0)) return(list(rtime=min(which(y == 0)), event=1L))
    # which works since time=1,2,3,...
  list(rtime=length(y), event=0L)     # censored; length(y) should be 28
}
fun <- function() {
  W <- list()
  for(wg in 1 : 2) {
    mnr <- ttrmean <- ttrmedian <- numeric(m)
    i <- 0
    for(or in ors) {
      gg  <- switch(wg, g1, g2)
      sop <- soprobMarkovOrd(0:2, 1:28, initial=1, absorb=c(0, 2),
                             intercepts=ints, g=gg, X=2, parameter=log(or))
      i <- i + 1
      mnr[i] <- sum(1. - sop[, 1])   # mean time not recovered
      s <- simMarkovOrd(n=20000, y=0:2, 1:28, initial=1, X=c(group=2),
                        absorb=c(0, 2), intercepts=ints, g=gg, parameter=log(or))
      setDT(s, key='id')
      ts <- s[, tcl(y), by=id]
      ttrmean  [i] <- mean(ts$rtime)
      ttrmedian[i] <- median(ts$rtime)
    }
    W[[wg]] <- data.frame(OR=ors,
                          `Mean Days Unrecovered` = mnr,
                          `Difference in Days`    = mnr[m] - mnr,
                          Ratio                   = mnr / mnr[m],
                          `Mean TTR`              = ttrmean,
                          `Median TTR`            = ttrmedian,
                          check.names=FALSE)
  }
  W
}

W <- runifChanged(fun, seed, ors, tcl, g1, g2, ints, .inclfun.=FALSE)

for(wg in 1 : 2) {
  cap <- switch(wg, 'Constant treatment effect', 'Changing treatment effect')
  print(knitr::kable(W[[wg]], digits=3, caption=cap))
  }
Table 5.1: Constant treatment effect

   OR  Mean Days Unrecovered  Difference in Days  Ratio  Mean TTR  Median TTR
0.100                 11.468               8.612  0.571    12.459          12
0.150                 12.482               7.598  0.622    13.423          13
0.200                 13.288               6.792  0.662    14.271          14
0.250                 13.977               6.102  0.696    14.946          15
0.300                 14.591               5.489  0.727    15.560          15
0.350                 15.151               4.929  0.755    16.080          15
0.400                 15.669               4.410  0.780    16.566          16
0.450                 16.155               3.925  0.805    17.010          16
0.500                 16.612               3.468  0.827    17.491          17
0.550                 17.043               3.036  0.849    17.904          17
0.600                 17.452               2.627  0.869    18.329          17
0.625                 17.649               2.430  0.879    18.530          18
0.650                 17.841               2.239  0.889    18.680          18
0.700                 18.210               1.870  0.907    18.995          18
0.750                 18.560               1.519  0.924    19.321          18
0.800                 18.894               1.185  0.941    19.668          19
0.850                 19.212               0.867  0.957    19.987          19
0.900                 19.515               0.564  0.972    20.203          19
0.950                 19.804               0.275  0.986    20.513          20
1.000                 20.080               0.000  1.000    20.803          20
Table 5.2: Changing treatment effect

   OR  Mean Days Unrecovered  Difference in Days  Ratio  Mean TTR  Median TTR
0.100                 12.326               7.753  0.614    13.230          13
0.150                 13.212               6.868  0.658    14.144          14
0.200                 13.929               6.151  0.694    14.922          14
0.250                 14.547               5.533  0.724    15.441          15
0.300                 15.099               4.980  0.752    16.073          15
0.350                 15.605               4.475  0.777    16.514          16
0.400                 16.074               4.006  0.800    16.952          16
0.450                 16.513               3.567  0.822    17.336          16
0.500                 16.927               3.153  0.843    17.760          17
0.550                 17.318               2.762  0.862    18.196          17
0.600                 17.689               2.391  0.881    18.553          18
0.625                 17.867               2.212  0.890    18.697          18
0.650                 18.041               2.038  0.898    18.881          18
0.700                 18.376               1.703  0.915    19.186          18
0.750                 18.695               1.384  0.931    19.491          18
0.800                 18.999               1.080  0.946    19.748          19
0.850                 19.289               0.791  0.961    20.077          19
0.900                 19.565               0.515  0.974    20.244          19
0.950                 19.828               0.251  0.987    20.536          20
1.000                 20.080               0.000  1.000    20.896          20

We may focus on a transition OR=0.6, which translates to a 2.6-day reduction in mean days not recovered (2.4d for the time-varying treatment effect model), a ratio of mean times of 0.87 (0.88 for the time-varying model), and a hazard ratio of 1/0.6 = 1.667.

6 Effect of OR=0.6 on Time to Recovery Distribution

From our model based on Wang et al’s result (after multiplicative factor applied), derive the time to recovery distribution under the null and under OR=0.6 by drawing two samples of size n=20,000.

set.seed(5)
s1 <- simMarkovOrd(n=20000, y=0:2, 1:28, initial=1, X=c(group=1),
                   absorb=c(0, 2), intercepts=ints, g=g1, parameter=0.)
s2 <- simMarkovOrd(n=20000, y=0:2, 1:28, initial=1, X=c(group=2),
                   absorb=c(0, 2), intercepts=ints, g=g1, parameter=log(0.6))
s  <- rbind(s1, s2)
s$group <- factor(s$group, 1:2, c('control', 'active'))
setDT(s, key=c('group','id'))
ts <- s[, tcl(y), by=.(group, id)]
f  <- npsurv(Surv(rtime, event) ~ group, data=ts)
w  <- round(1. - summary(f, times=28)$surv, 3)
survplot(f, fun=function(y) 1. - y, ylab='Cumulative Probability of Recovery')

An OR of 0.6 moves the cumulative probability of recovery by 28d from 0.702 to 0.855.

7 Effect of Death on Time to Recovery

The interpretation of Cox model parameters, when time to recovery is censored at 28d for death at any time, is worth exploring. Under the proportional odds (PO) assumption, a treatment that affects mortality to the same degree that it affects non-recovery will produce fewer deaths and a shorter time to recovery. If there is non-PO to the extent that there is no treatment effect on mortality in the Markov PO model, deaths are random events, and even though time to recovery is censored at 28d, the random deaths may be expected to weaken the treatment effect. Put that to the test by simulating, for each degree of non-PO, a 20000-patient dataset with 10000 in each treatment group. Estimate the hazard ratio for recovery for each case and compare its reciprocal to the group 2:1 OR for \(Y\geq 1\) and the OR for \(Y=2\). The main OR (for \(Y \geq 1\)) is 0.75 in all cases. The ORs estimated from a PO Markov model and from a partial PO model are also presented. PPO in the table refers to a fitted partial PO model that unlinks the death effect from the recovery effect.

file <- 'orhrdeath.rds'
require(VGAM)
seed <- 11
set.seed(seed)
n <- 20000
g3 <- function(yprev, t, gap, X, parameter=0, extra) {
  if(! all(yprev == 1)) stop('program logic error')
    kappa <- extra[1:4]
    tau   <- extra[5]   # amount of non-PO for treatment for Y=2
    lp <- matrix(0.,
                 nrow=length(yprev), ncol=2,
                 dimnames=list(as.character(yprev), c('1', '2')))
    # 2 columns = no. distinct y less 1 = length of intercepts
    # lp[yp, ] is a 2-vector of linear predictors, one per intercept
    # kappa[1:2] are the PO linear and quadratic time trend parameters and
    # kappa[3:4] are the non-PO extra values for death
    for(yp in yprev)
      lp[as.character(yp), ] <- kappa[1] * t + kappa[2] * t ^ 2 +
                                cbind(0, 1) * (kappa[3] * t  + kappa[4] * t ^ 2) +
                                (X == 2) * cbind(parameter, parameter + tau)
  lp
}

formals(g3)$extra <- c(sparams$extra, 0.)

hashobj <- hashCheck(seed, n, g3, ints, file=file)
hash    <- hashobj$hash
z       <- hashobj$result
if(! length(z)) {
  s1 <- simMarkovOrd(n=n / 2, y=0:2, 1:28, initial=1, X=c(group=1),
                     absorb=c(0, 2), intercepts=ints, g=g3, parameter=0)
  setDT(s1, key='id')
  t1 <- s1[, tcl(y), by=id]

  for(tau in c(-0.3, 0, 0.3, 0.6, 0.9, 1.5)) {
    formals(g3)$extra <- c(sparams$extra, tau)
    # tau ignored when X=1
    s2 <- simMarkovOrd(n=n / 2, y=0:2, 1:28, initial=1, X=c(group=2),
                       absorb=c(0, 2), intercepts=ints, g=g3, parameter=log(0.75))
    setDT(s2, key='id')
    s12 <- rbind(s1, s2)
    s12[, group := factor(group)]
    f <- try(vglm(y ~ pol(time, 2) + group, cumulative(parallel=FALSE ~ pol(time, 2), reverse=TRUE),
              data=s12), silent=TRUE)
    or.overall <- if(inherits(f, 'try-error')) NA else exp(coef(f)['group2'])
    f <- try(vglm(y ~ pol(time, 2) + group, cumulative(parallel=FALSE ~ pol(time, 2) + group, reverse=TRUE),
              data=s12), silent=TRUE)
    if(inherits(f, 'try-error')) or1 <- or2 <- NA else {
      or1 <- exp(coef(f)['group2:1'])
      or2 <- exp(coef(f)['group2:2'])
    }
  
    t2 <- s2[, tcl(y), by=id]
    w  <- rbind(cbind(t1, group='1'), cbind(t2, group='2'))
    f  <- survival::coxph(Surv(rtime, event) ~ group, data=w)
    z  <- rbind(z, data.frame(tau=tau,
                              'True OR for<br>not recovered'=0.75,
                              'True OR<br>for death'=exp(log(0.75) + tau),
                              'Estimated OR<br>under PO'=or.overall,
                              'PPO Estimated OR<br>for not recovered'=or1,
                              'PPO Estimated OR<br>for death'=or2,
                              '1/HR'=round(exp(-coef(f)), 3),
                              check.names=FALSE))
  }
  setattr(z, 'hash', hash)
  saveRDS(z, file=file, compress='xz')
}
knitr::kable(z, digits=3, row.names=FALSE)
 tau  True OR for not recovered  True OR for death  Estimated OR under PO  PPO Estimated OR for not recovered  PPO Estimated OR for death   1/HR
-0.3                        0.75              0.556                  0.711                                  NA                          NA  0.675
 0.0                        0.75              0.750                  0.754                               0.761                       0.732  0.733
 0.3                        0.75              1.012                  0.810                               0.748                       0.971  0.814
 0.6                        0.75              1.367                  0.883                               0.760                       1.304  0.913
 0.9                        0.75              1.845                  0.993                               0.762                       1.762  1.075
 1.5                        0.75              3.361                  1.351                               0.755                       3.259  1.638

See that when the treatment effect on death is in the wrong direction (tau > -log(0.75)), the reciprocal of the Cox model estimate of the hazard ratio for time to recovery is moved towards the null and away from the effect of treatment on recovery per se. When the treatment benefits mortality more than it benefits recovery, the Cox hazard ratio is more impressive than the OR for recovery. So analysis of time to recovery is mixing in the mortality effect, similar to the way the PO model does. One could go almost as far as saying that the competing risk Cox model analysis for time to recovery also assumes PO. The time to recovery HR (actually its reciprocal, to convert to a “larger is bad” framework) is very close to the average OR for treatment that falsely assumes PO.

8 Clinical Trial Simulation

In addition to simulating longitudinal responses under a Markov partial proportional odds model, the estSeqMarkovOrd function also simulates, at the last data look, a Cox proportional hazards \(\chi^2\) statistic for time to recovery. Time to recovery is the first time at which Y=0; if Y=0 never occurs, or Y=2 occurs at any time, time to recovery is censored at 28d. The purpose of this additional simulation is to compare the frequentist power of a two-sample comparison of time until recovery with the frequentist power of the transition odds ratio at days 10-28 in the Markov proportional odds model.

8.1 Simulate a Single Large Trial

The simulation model is given above and is encapsulated in the g functions previously defined. For g2 the analysis model is not informed of the linearly increasing treatment effect from day 1 to day 10 that stays constant from days 10-28. Instead, the analysis model used on each simulated dataset has a quadratic time \(\times\) treatment interaction.

Let’s simulate a single 10000-patient trial to check that we can recover a true log odds ratio of log(0.5) = -0.69 overall (model 1) or at \(t=10\) (model 2).

require(VGAM)
n <- 10000
for(wg in 1 : 2) {
  cat(if(wg == 1) 'Constant treatment effect\n\n' else '\nTime-varying treatment effect\n\n')
  set.seed(8)
  g <- if(wg == 1) g1 else g2
  s1 <- simMarkovOrd(n=n / 2, y=0:2, 1:28, initial=1, X=c(group=1),
                     absorb=c(0, 2), intercepts=ints, g=g, parameter=0)
  s2 <- simMarkovOrd(n=n / 2, y=0:2, 1:28, initial=1, X=c(group=2),
                     absorb=c(0, 2), intercepts=ints, g=g, parameter=log(0.5))
  s2$id <- s2$id + n
  s <- rbind(s1, s2)
  s$yprev <- as.factor(s$yprev)
  s$group <- as.factor(s$group)
  form <-  y ~ pol(time, 2) + group
  if(wg == 2) form <- y ~ pol(time, 2) * group
  f <- vglm(form, cumulative(parallel=FALSE ~ pol(time, 2), reverse=TRUE), data=s)
  k <- coef(f)
  # Compute treatment effect; is a constant for model 1
  if(wg == 1) cat('beta =', round(k['group2'], 3), '\n')
  else {
    # Plot estimated treatment effect over time
    time <- 1 : 28
    effect <- k[7] + time * k[8] + time * time * k[9]
    plot(time, effect, type='l')
    lines(c(1, 10, 28), c(0, log(0.5), log(0.5)), col=gray(0.8))
    # Repeat, doing this another way
    contrvgam <- function(fit) {  #, n1=NULL, n2=NULL, parameter=NULL) {   # for customized vgam contrasts if non-PO in effect
        beta   <- coef(fit)
        i      <- c('group2', 'pol(time, 2)time:group2', 'pol(time, 2)time^2:group2')
        b      <- beta[i]
        effect <- b[1] + 10 * b[2] + 100 * b[3] 
        # Treatment effect at 10 days (quadratic time x treatment interaction)
        # if(length(n1)) return(list(Contrast=effect, SE=approxse(or=exp(parameter), n1=n1, n2=n2)))
        v      <- vgamcov(fit)
        if(any(i %nin% rownames(v))) {saveRDS(fit, 'problemfit.rds'); stop()}
        v      <- v[i, i]
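        # vr = Var(b[1] + 10 b[2] + 100 b[3]), the quadratic form in the
        # 3 x 3 covariance submatrix v for the day-10 contrast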
        vr     <-       v[1, 1] +  100 * v[2, 2] + 10000 * v[3, 3] + 20 * v[1, 2] +
                  200 * v[1, 3] + 2000 * v[2, 3]
        list(Contrast=effect, SE=sqrt(vr))
    }
  print(contrvgam(f))
  }
}
Constant treatment effect
beta = -0.638 

Time-varying treatment effect
$Contrast
   group2 
-1.129932 

$SE
[1] 0.00553126

The partial PO Markov model was able to recover the treatment effect in this large simulated dataset, as the fitted quadratic effect was not far from the true linear spline time-related treatment effect shown in gray above.

8.2 Simulation to Approximate the Standard Error of the Treatment Effect

Let’s see if we can satisfactorily approximate the standard error of the treatment effect. This would speed up simulations in the future. We model on the scale of the log of the \(\hat{\beta}\) standard error. Do this separately for model 1 (one treatment parameter) and model 2 (three treatment parameters).

# Try seeing how predictable is the standard error of the main contrast, for varying true effects and sample sizes
file <- 'vn.rds'
seed <- 1
set.seed(seed)
ors     <- seq(0.1, 1, by=0.1)
combos  <- expand.grid(or=ors, n=c(50, 100, 150, 200, seq(300, 2000, by=100)), model=1:2)
hashobj <- hashCheck(seed, combos, g1, g2, ints, file=file)
hash    <- hashobj$hash
w       <- hashobj$result
if(! length(w)) {
  w <- combos
  v <- numeric(nrow(w))
  for(i in 1 : nrow(w)) {
    n <- w[i, 'n']
    parameter <- log(w[i, 'or'])
    m <- w[i, 'model']
    g <- if(m == 1) g1 else g2
    s1 <- simMarkovOrd(n=n / 2, y=0:2, 1:28, initial=1, X=c(group=1),
                       absorb=c(0, 2), intercepts=ints, g=g, parameter=0)
    s2 <- simMarkovOrd(n=n / 2, y=0:2, 1:28, initial=1, X=c(group=2),
                       absorb=c(0, 2), intercepts=ints, g=g, parameter=parameter)
    s <- rbind(s1, s2)
    s$group <- as.factor(s$group)
    form <-  y ~ pol(time, 2) + group
    if(m == 2) form <- y ~ pol(time, 2) * group
    f <- try(vglm(form, cumulative(parallel=FALSE ~ pol(time, 2), reverse=TRUE), data=s), silent=TRUE)
    v[i] <- if(inherits(f, 'try-error')) NA else if(m == 1) vcov(f)['group2', 'group2'] else contrvgam(f)$SE ^ 2
  }
  w$v <- v
  setattr(w, 'hash', hash)
  saveRDS(w, file, compress='xz')
}
ratio <- subset(w, model == 2)$v / subset(w, model == 1)$v
pr('Median model 2 : model 1 ratio of variances of treatment effect', inline=median(ratio))

Median model 2 : model 1 ratio of variances of treatment effect NA
w <- subset(w, v < 1)   # remove outliers from vgam problems
v <- w$v
ggplot(w, aes(x=n, y=v * n, shape=factor(model), color=or)) + geom_point()

Judging by the median variance ratio and the graph above, the 3-parameter treatment contrast is less than half as efficient as the 1-parameter contrast assuming constant treatment effect. Judging by the flatness of the above trends in the variance of \(\hat{\beta}\) multiplied by the sample size, the dominant form of the variance is \(\frac{k}{n}\).

# Fit various models to log(v)
h <- function(form) {
  z <- NULL
  for(type in c('ols', 'robust')) {
    h <- switch(type,
                robust = MASS::rlm(form, data=dat),
                ols    = ols(form, data=dat))
    r <- resid(h)
    ssr <- var(fitted(h))
    r2 <- ssr / (ssr + var(r))
    z <- rbind(z, data.frame(type=type, model=as.character(form)[3], 
                             stat=c('Mean', 'Median', 'P(|error| > 0.1)'),
                             error=c(mean(abs(r)), Median=median(abs(r)),
                               mean(abs(r) > 0.1)),
                             r2=r2,
                             check.names=FALSE))
  }
  z
}

for(wg in 1 : 2) {
  cat('\nModel', wg, '\n\n')
  dat <- subset(w, model == wg)
  z <- rbind(
    h(log(v) ~ log(n)),
    h(log(v) ~ pol(log(n), 2)),
    h(log(v) ~ log(n) + log(or)),
    h(log(v) ~ log(n) * log(or)),
    h(log(v) ~ pol(log(n), 2) + pol(log(or), 2)),
    h(log(v) ~ pol(log(n), 2) * pol(log(or), 2)) )

  print(ggplot(z, aes(x=error, y=model, shape=type)) + geom_point() +
        facet_wrap(~ stat, scales='free_x') + labs(caption=paste('Model', wg)))
  }

Model 1 

Model 2 

The full models (two quadratics) without interaction, estimated using robust regression, best predict the log of the standard error of the treatment log odds ratio. These models have \(R^{2} > 0.99\) and less than a 0.12 chance of making an error larger than 0.1 on the log scale (a fold-change error of 0.9). The median absolute log error is < 0.03 (fold-change error of 0.97 in estimating the variance). Write a function for using these approximations later.

for(wg in 1 : 2) {
  dat <- subset(w, model == wg)
  approxvar <- MASS::rlm(log(v) ~ pol(log(n), 2) + pol(log(or), 2), data=dat)

  # Derive a function to compute standard error from n and or
  b <- coef(approxvar)
  if(wg == 1) beta.approxvar1 <- b else beta.approxvar2 <- b
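  # NOTE: approxse() reads the global beta.approxvar, which must be assigned
  # before use, e.g. beta.approxvar <- beta.approxvar1 for model 1 (see the
  # commented-out line before the model 2 simulation below)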
  approxse <- function(n, or, n1=NULL, n2=NULL) {
    b <- beta.approxvar
    if(length(n1)) n <- 4. / ((1. / n1) + (1. / n2))
    x <- log(n)
    y <- log(or)
    xb <- b[1] + b[2] * x + b[3] * x ^ 2 + b[4] * y + b[5] * y ^ 2 
    sqrt(exp(xb))
  }
}

8.3 Simulation of Frequentist Trials

For each true OR we simulate 1000 clinical trials, with ORs of 0.3, 0.4, 0.5, 0.55, 0.6, 0.65, 0.7, 0.8, and 1.0. Markov ordinal analysis is done after 20, 36, 52, 68, 84, 100, 116, 132, 148, 164, and 178 patients have completed follow-up. Frequentist power is computed at \(\alpha=0.05\).

For each trial the treatments are randomly assigned with probability 1/2 each. The high-level estSeqMarkovOrd function calls the simMarkovOrd function to simulate each trial. Because the contrast of interest needs to take into account a treatment \(\times\) time interaction, the groupContrast argument must be specified below.

Note: dosim calls estSeqMarkovOrd which fits data using the rms package lrm function when the model is a proportional odds model. Since there is non-proportional odds, the VGAM package’s vgam or vglm function is used to fit the partial proportional odds model instead. When doing Bayesian analysis, the rmsb package blrm function will fit a constrained partial PO model.

ppo       <- ~ pol(time, 2)    # partial PO model with PO exception for time
ors       <- c(0.3, 0.4, 0.5, 0.55, 0.6, 0.65, 0.7, 0.8, 1.0)

# Define time to recovery, with death at any time treated as censoring at 28d
tc <- function(y) {
  if(any(y == 2)) return(c(28, 0))   # death
  if(any(y == 0)) return(c(min(which(y == 0)), 1))
    # which works since time=1,2,3,...
  c(length(y), 0)             # censored; length(y) should be 28
}

dosimit <- function(model, g, formula, file)
  dosim(y=0:2, times=1:28, g=g, initial=c('1'=1.0),  # one initial state, sampled w/prob 1.0
        absorb=c(0, 2),
        intercepts=ints,
        ors=ors, formula=formula,
        ppo=ppo, groupContrast=if(model == 2) contrvgam,
        timecriterion=tc, coxzph=FALSE, seed=3,
        nsim=1000, looks=looks,
        maxest=2, maxvest=10,
        file=file)

pr('Model 1')
Model 1 
sim1 <- dosimit(1, g1, y ~ pol(time, 2) + group, file='simflooks1.rds')
Failed simulations 

                                                                                               failure
1:                                                                                        |contrast|>2
2:                                        Error in lm.fit(X.vlm, y = z.vlm, ...) : NA/NaN/Inf in 'y'\n
3:              Error in tapplymat1(ccump, "diff") : \n  NA/NaN/Inf in foreign function call (arg 1)\n
4: Simulated data for simulation with sample size 20 has 2 distinct y values instead of the required 3
5: Simulated data for simulation with sample size 36 has 2 distinct y values instead of the required 3
6: Simulated data for simulation with sample size 52 has 2 distinct y values instead of the required 3
    Freq
1:   131
2:  3485
3: 16997
4:   168
5:    10
6:     1
Median of 1000 logistic regression coefficients for each OR 

     (Intercept):1 (Intercept):2 pol(time, 2)time:1 pol(time, 2)time:2
0.3         5.0555       -3.7088            -0.1263            -0.2686
0.4         4.7127       -3.6245            -0.0437            -0.3069
0.5         4.7273       -3.8868            -0.0757            -0.2698
0.55        4.8425       -3.6212            -0.0772            -0.2361
0.6         5.0698       -3.8890            -0.1264            -0.2352
0.65        4.8132       -3.8957            -0.0644            -0.1989
0.7         4.7954       -3.8270            -0.0586            -0.2001
0.8         4.7900       -3.8663            -0.0979            -0.2190
1           4.9208       -4.0593            -0.1362            -0.2037
     pol(time, 2)time^2:1 pol(time, 2)time^2:2  group2
0.3               -0.0047               0.0176 -0.8863
0.4               -0.0088               0.0180 -0.7568
0.5               -0.0064               0.0162 -0.5733
0.55              -0.0060               0.0142 -0.5798
0.6               -0.0034               0.0160 -0.4835
0.65              -0.0059               0.0130 -0.4316
0.7               -0.0070               0.0124 -0.2652
0.8               -0.0042               0.0130 -0.2146
1                 -0.0026               0.0132  0.0322
Standard deviation of simulated group effects, scaled MAD, and square root of median of estimated variances 

      OR look    SD   MAD SDest
 1: 0.30   20 0.397 0.188 0.513
 2: 0.30   36 0.282 0.130 0.244
 3: 0.30   52 0.239 0.120 0.186
 4: 0.30   68 0.208 0.104 0.156
 5: 0.30   84 0.210 0.089 0.138
 6: 0.30  100 0.204 0.080 0.123
 7: 0.30  116 0.206 0.086 0.006
 8: 0.30  132 0.196 0.085 0.001
 9: 0.30  148 0.192 0.087 0.000
10: 0.30  164 0.185 0.087 0.000
11: 0.30  178 0.180 0.089 0.000
12: 0.40   20 0.345 0.209 0.339
13: 0.40   36 0.163 0.176 0.004
14: 0.40   52 0.127 0.174 0.000
15: 0.40   68 0.123 0.174 0.000
16: 0.40   84 0.118 0.174 0.000
17: 0.40  100 0.117 0.174 0.000
18: 0.40  116 0.119 0.174 0.000
19: 0.40  132 0.117 0.174 0.000
20: 0.40  148 0.117 0.174 0.000
21: 0.40  164 0.116 0.174 0.000
22: 0.40  178 0.119 0.174 0.000
23: 0.50   20 0.413 0.215 0.389
24: 0.50   36 0.235 0.161 0.004
25: 0.50   52 0.182 0.181 0.000
26: 0.50   68 0.171 0.206 0.000
27: 0.50   84 0.175 0.205 0.000
28: 0.50  100 0.174 0.205 0.000
29: 0.50  116 0.172 0.203 0.000
30: 0.50  132 0.173 0.206 0.000
31: 0.50  148 0.171 0.206 0.000
32: 0.50  164 0.172 0.206 0.000
33: 0.50  178 0.171 0.206 0.000
34: 0.55   20 0.463 0.362 0.518
35: 0.55   36 0.292 0.168 0.010
36: 0.55   52 0.214 0.164 0.001
37: 0.55   68 0.184 0.151 0.000
38: 0.55   84 0.171 0.142 0.000
39: 0.55  100 0.159 0.143 0.000
40: 0.55  116 0.155 0.141 0.000
41: 0.55  132 0.148 0.140 0.000
42: 0.55  148 0.145 0.140 0.000
43: 0.55  164 0.143 0.140 0.000
44: 0.55  178 0.141 0.140 0.000
45: 0.60   20 0.494 0.510 0.519
46: 0.60   36 0.314 0.240 0.368
47: 0.60   52 0.265 0.232 0.005
48: 0.60   68 0.237 0.230 0.004
49: 0.60   84 0.220 0.205 0.003
50: 0.60  100 0.205 0.204 0.002
51: 0.60  116 0.195 0.182 0.000
52: 0.60  132 0.188 0.168 0.000
53: 0.60  148 0.184 0.156 0.000
54: 0.60  164 0.179 0.158 0.000
55: 0.60  178 0.177 0.145 0.000
56: 0.65   20 0.394 0.205 0.319
57: 0.65   36 0.213 0.095 0.000
58: 0.65   52 0.165 0.096 0.000
59: 0.65   68 0.141 0.095 0.000
60: 0.65   84 0.130 0.095 0.000
61: 0.65  100 0.121 0.076 0.000
62: 0.65  116 0.116 0.052 0.000
63: 0.65  132 0.114 0.054 0.000
64: 0.65  148 0.112 0.052 0.000
65: 0.65  164 0.110 0.051 0.000
66: 0.65  178 0.110 0.051 0.000
67: 0.70   20 0.404 0.256 0.017
68: 0.70   36 0.190 0.165 0.000
69: 0.70   52 0.144 0.163 0.000
70: 0.70   68 0.134 0.163 0.000
71: 0.70   84 0.133 0.163 0.000
72: 0.70  100 0.135 0.163 0.000
73: 0.70  116 0.132 0.163 0.000
74: 0.70  132 0.131 0.163 0.000
75: 0.70  148 0.131 0.163 0.000
76: 0.70  164 0.131 0.163 0.000
77: 0.70  178 0.130 0.163 0.000
78: 0.80   20 0.511 0.323 0.507
79: 0.80   36 0.296 0.211 0.011
80: 0.80   52 0.219 0.181 0.006
81: 0.80   68 0.189 0.181 0.005
82: 0.80   84 0.179 0.181 0.005
83: 0.80  100 0.173 0.181 0.004
84: 0.80  116 0.163 0.181 0.004
85: 0.80  132 0.162 0.181 0.004
86: 0.80  148 0.158 0.178 0.003
87: 0.80  164 0.155 0.181 0.003
88: 0.80  178 0.151 0.165 0.003
89: 1.00   20 0.523 0.441 0.516
90: 1.00   36 0.330 0.245 0.365
91: 1.00   52 0.252 0.227 0.006
92: 1.00   68 0.226 0.226 0.002
93: 1.00   84 0.203 0.224 0.001
94: 1.00  100 0.187 0.203 0.001
95: 1.00  116 0.177 0.202 0.001
96: 1.00  132 0.166 0.202 0.000
97: 1.00  148 0.163 0.202 0.000
98: 1.00  164 0.157 0.202 0.000
99: 1.00  178 0.155 0.199 0.000
      OR look    SD   MAD SDest
examSim(sim1, desc='Partial PO Markov model with constant treatment effect')
Number of successful simulations out of 1000 

      OR look   n
 1: 0.30   20 551
 2: 0.30   36 646
 3: 0.30   52 702
 4: 0.30   68 741
 5: 0.30   84 754
 6: 0.30  100 774
 7: 0.30  116 791
 8: 0.30  132 800
 9: 0.30  148 815
10: 0.30  164 822
11: 0.30  178 831
12: 0.40   20 471
13: 0.40   36 545
14: 0.40   52 636
15: 0.40   68 699
16: 0.40   84 761
17: 0.40  100 817
18: 0.40  116 848
19: 0.40  132 872
20: 0.40  148 886
21: 0.40  164 902
22: 0.40  178 915
23: 0.50   20 399
24: 0.50   36 472
25: 0.50   52 525
26: 0.50   68 585
27: 0.50   84 646
28: 0.50  100 688
29: 0.50  116 734
30: 0.50  132 759
31: 0.50  148 779
32: 0.50  164 796
33: 0.50  178 808
34: 0.55   20 528
35: 0.55   36 635
36: 0.55   52 736
37: 0.55   68 802
38: 0.55   84 856
39: 0.55  100 895
40: 0.55  116 915
41: 0.55  132 931
42: 0.55  148 945
43: 0.55  164 957
44: 0.55  178 962
45: 0.60   20 569
46: 0.60   36 633
47: 0.60   52 698
48: 0.60   68 770
49: 0.60   84 798
50: 0.60  100 835
51: 0.60  116 860
52: 0.60  132 871
53: 0.60  148 881
54: 0.60  164 884
55: 0.60  178 888
56: 0.65   20 460
57: 0.65   36 586
58: 0.65   52 688
59: 0.65   68 760
60: 0.65   84 816
61: 0.65  100 840
62: 0.65  116 863
63: 0.65  132 876
64: 0.65  148 887
65: 0.65  164 891
66: 0.65  178 893
67: 0.70   20 501
68: 0.70   36 614
69: 0.70   52 742
70: 0.70   68 822
71: 0.70   84 879
72: 0.70  100 915
73: 0.70  116 943
74: 0.70  132 961
75: 0.70  148 974
76: 0.70  164 981
77: 0.70  178 984
78: 0.80   20 542
79: 0.80   36 605
80: 0.80   52 708
81: 0.80   68 777
82: 0.80   84 818
83: 0.80  100 847
84: 0.80  116 869
85: 0.80  132 888
86: 0.80  148 894
87: 0.80  164 901
88: 0.80  178 905
89: 1.00   20 629
90: 1.00   36 745
91: 1.00   52 841
92: 1.00   68 910
93: 1.00   84 943
94: 1.00  100 961
95: 1.00  116 975
96: 1.00  132 984
97: 1.00  148 986
98: 1.00  164 992
99: 1.00  178 993
      OR look   n
Comparison of OR and HR 

     OR   HR
1: 0.30 0.34
2: 0.40 0.43
3: 0.50 0.52
4: 0.55 0.56
5: 0.60 0.60
6: 0.65 0.65
7: 0.70 0.70
8: 0.80 0.80
9: 1.00 1.00
Power of Cox test at last look for time until Y=1 

     OR power
1: 0.30  1.00
2: 0.40  1.00
3: 0.50  0.96
4: 0.55  0.92
5: 0.60  0.84
6: 0.65  0.69
7: 0.70  0.53
8: 0.80  0.26
9: 1.00  0.04
Power of Markov proportional odds model test 

      OR look power
 1: 0.30   20  0.70
 2: 0.30   36  0.93
 3: 0.30   52  0.99
 4: 0.30   68  1.00
 5: 0.30   84  1.00
 6: 0.30  100  1.00
 7: 0.30  116  1.00
 8: 0.30  132  1.00
 9: 0.30  148  1.00
10: 0.30  164  1.00
11: 0.30  178  1.00
12: 0.40   20  0.70
13: 0.40   36  0.93
14: 0.40   52  0.98
15: 0.40   68  0.99
16: 0.40   84  1.00
17: 0.40  100  1.00
18: 0.40  116  1.00
19: 0.40  132  1.00
20: 0.40  148  1.00
21: 0.40  164  1.00
22: 0.40  178  1.00
23: 0.50   20  0.53
24: 0.50   36  0.88
25: 0.50   52  0.97
26: 0.50   68  0.99
27: 0.50   84  0.99
28: 0.50  100  1.00
29: 0.50  116  1.00
30: 0.50  132  1.00
31: 0.50  148  1.00
32: 0.50  164  1.00
33: 0.50  178  1.00
34: 0.55   20  0.44
35: 0.55   36  0.71
36: 0.55   52  0.84
37: 0.55   68  0.90
38: 0.55   84  0.93
39: 0.55  100  0.95
40: 0.55  116  0.96
41: 0.55  132  0.97
42: 0.55  148  0.98
43: 0.55  164  0.98
44: 0.55  178  0.99
45: 0.60   20  0.36
46: 0.60   36  0.60
47: 0.60   52  0.72
48: 0.60   68  0.79
49: 0.60   84  0.84
50: 0.60  100  0.87
51: 0.60  116  0.90
52: 0.60  132  0.92
53: 0.60  148  0.94
54: 0.60  164  0.95
55: 0.60  178  0.95
56: 0.65   20  0.54
57: 0.65   36  0.81
58: 0.65   52  0.89
59: 0.65   68  0.92
60: 0.65   84  0.93
61: 0.65  100  0.94
62: 0.65  116  0.95
63: 0.65  132  0.96
64: 0.65  148  0.96
65: 0.65  164  0.97
66: 0.65  178  0.97
67: 0.70   20  0.56
68: 0.70   36  0.87
69: 0.70   52  0.96
70: 0.70   68  0.98
71: 0.70   84  0.98
72: 0.70  100  0.99
73: 0.70  116  0.99
74: 0.70  132  1.00
75: 0.70  148  1.00
76: 0.70  164  1.00
77: 0.70  178  0.99
78: 0.80   20  0.43
79: 0.80   36  0.64
80: 0.80   52  0.72
81: 0.80   68  0.76
82: 0.80   84  0.78
83: 0.80  100  0.80
84: 0.80  116  0.81
85: 0.80  132  0.82
86: 0.80  148  0.83
87: 0.80  164  0.84
88: 0.80  178  0.84
89: 1.00   20  0.35
90: 1.00   36  0.52
91: 1.00   52  0.59
92: 1.00   68  0.63
93: 1.00   84  0.64
94: 1.00  100  0.64
95: 1.00  116  0.65
96: 1.00  132  0.65
97: 1.00  148  0.65
98: 1.00  164  0.65
99: 1.00  178  0.65
      OR look power
Spearman rho correlation between Markov and Cox model chi-square: 0.24

Figure 8.1: Time to Y=1 by group and OR, over simulations. Partial PO Markov model with constant treatment effect.

Figure 8.2: Cumulative incidence of Y=1. Partial PO Markov model with constant treatment effect.

Figure 8.3: Scatter plot of Cox and PO model Z statistics. Partial PO Markov model with constant treatment effect.

For many of the simulations in this constant treatment effect model the vgam function did not converge properly, creating impossibly large maximum likelihood estimates or variance estimates. In the latter part of the output above these outliers have been removed. There you can see the extremely good accuracy of the standard error estimates from the maximum likelihood estimation process.

The transition odds ratios are equal to the hazard ratios in the time-to-recovery analyses and at the last data look the power of the transition model treatment test and the Cox proportional hazards model test are the same. The two \(\chi^2\) statistics are virtually the same.

There is approximately 0.94 power to detect an OR or HR of 0.55 and 0.85 power to detect 0.60, all at the planned maximum sample size of 178 for two treatment arms. As shown earlier, ORs of 0.55 and 0.6 correspond to reductions in mean time unrecovered of 3.2 and 2.8 days, respectively.

Now look at model 2, in which the treatment effect is modeled as a quadratic function of time but the true treatment effect is linear up until day 10, then flat. The transition model treatment contrast is made at 10d using the quadratic time \(\times\) treatment interaction.

pr('Model 2')
Model 2 
# beta.approxvar <- beta.approxvar2
sim2 <- dosimit(2, g2, y ~ pol(time, 2) * group, file='simflooks2.rds')
Failed simulations 

                                                                                               failure
1:                                                                                        |contrast|>2
2:                                        Error in lm.fit(X.vlm, y = z.vlm, ...) : NA/NaN/Inf in 'y'\n
3:              Error in tapplymat1(ccump, "diff") : \n  NA/NaN/Inf in foreign function call (arg 1)\n
4: Simulated data for simulation with sample size 20 has 2 distinct y values instead of the required 3
5:                                                                                         variance>10
6: Simulated data for simulation with sample size 36 has 2 distinct y values instead of the required 3
7: Simulated data for simulation with sample size 52 has 2 distinct y values instead of the required 3
8: Simulated data for simulation with sample size 68 has 2 distinct y values instead of the required 3
    Freq
1:  3271
2:   843
3: 16518
4:   129
5:     7
6:     5
7:     1
8:     1
Median of 1000 logistic regression coefficients for each OR 

     (Intercept):1 (Intercept):2 pol(time, 2)time:1 pol(time, 2)time:2
0.3         9.9569       -4.1555            -0.7476            -0.1236
0.4         4.4796       -3.8188            -0.0656            -0.2756
0.5         4.5616       -3.5543            -0.0700            -0.3195
0.55        4.2697       -4.0137             0.0019            -0.1969
0.6         5.2952       -3.7688            -0.1867            -0.3462
0.65        4.7171       -3.3079            -0.1121            -0.3683
0.7         4.5102       -3.7812            -0.0416            -0.2238
0.8         4.5040       -3.9113            -0.0412            -0.2397
1           5.0149       -4.0618            -0.1554            -0.1913
     pol(time, 2)time^2:1 pol(time, 2)time^2:2  group2 pol(time, 2)time:group2
0.3                0.0162               0.0074  0.0696                 -0.0050
0.4               -0.0050               0.0191 -0.6262                  0.1763
0.5               -0.0054               0.0211 -0.4226                  0.1163
0.55              -0.0083               0.0132 -0.0180                  0.0002
0.6                0.0006               0.0192 -0.0754                  0.0095
0.65              -0.0032               0.0195 -0.6275                  0.2031
0.7               -0.0071               0.0160 -0.1959                  0.0224
0.8               -0.0062               0.0151 -0.1542                  0.0389
1                 -0.0020               0.0112  0.0725                 -0.0013
     pol(time, 2)time^2:group2
0.3                    -0.0034
0.4                    -0.0156
0.5                    -0.0098
0.55                   -0.0064
0.6                    -0.0029
0.65                   -0.0121
0.7                    -0.0034
0.8                    -0.0044
1                      -0.0014
Standard deviation of simulated group effects, scaled MAD, and square root of median of estimated variances 

      OR look    SD   MAD SDest
 1: 0.30   20 0.704 0.750 0.873
 2: 0.30   36 0.586 0.528 0.606
 3: 0.30   52 0.559 0.514 0.488
 4: 0.30   68 0.528 0.501 0.420
 5: 0.30   84 0.512 0.482 0.377
 6: 0.30  100 0.517 0.450 0.345
 7: 0.30  116 0.517 0.426 0.317
 8: 0.30  132 0.512 0.416 0.298
 9: 0.30  148 0.504 0.412 0.280
10: 0.30  164 0.499 0.427 0.268
11: 0.30  178 0.498 0.438 0.256
12: 0.40   20 0.587 0.243 0.591
13: 0.40   36 0.422 0.178 0.334
14: 0.40   52 0.355 0.172 0.237
15: 0.40   68 0.364 0.166 0.188
16: 0.40   84 0.361 0.164 0.153
17: 0.40  100 0.368 0.181 0.129
18: 0.40  116 0.372 0.194 0.116
19: 0.40  132 0.355 0.194 0.102
20: 0.40  148 0.371 0.197 0.064
21: 0.40  164 0.365 0.195 0.045
22: 0.40  178 0.366 0.197 0.042
23: 0.50   20 0.727 0.669 0.749
24: 0.50   36 0.499 0.565 0.314
25: 0.50   52 0.427 0.324 0.208
26: 0.50   68 0.404 0.326 0.150
27: 0.50   84 0.403 0.230 0.080
28: 0.50  100 0.392 0.218 0.051
29: 0.50  116 0.384 0.202 0.047
30: 0.50  132 0.367 0.201 0.033
31: 0.50  148 0.365 0.197 0.031
32: 0.50  164 0.354 0.195 0.006
33: 0.50  178 0.351 0.197 0.005
34: 0.55   20 0.768 0.563 0.865
35: 0.55   36 0.567 0.378 0.548
36: 0.55   52 0.446 0.362 0.316
37: 0.55   68 0.394 0.298 0.236
38: 0.55   84 0.363 0.234 0.183
39: 0.55  100 0.332 0.221 0.162
40: 0.55  116 0.306 0.184 0.135
41: 0.55  132 0.298 0.163 0.122
42: 0.55  148 0.287 0.159 0.114
43: 0.55  164 0.288 0.151 0.108
44: 0.55  178 0.279 0.133 0.103
45: 0.60   20 0.790 0.676 0.877
46: 0.60   36 0.603 0.383 0.554
47: 0.60   52 0.481 0.319 0.323
48: 0.60   68 0.422 0.305 0.239
49: 0.60   84 0.395 0.283 0.185
50: 0.60  100 0.362 0.281 0.132
51: 0.60  116 0.341 0.270 0.108
52: 0.60  132 0.325 0.269 0.100
53: 0.60  148 0.326 0.266 0.094
54: 0.60  164 0.320 0.261 0.089
55: 0.60  178 0.314 0.258 0.082
56: 0.65   20 0.709 0.479 0.722
57: 0.65   36 0.486 0.380 0.300
58: 0.65   52 0.395 0.362 0.207
59: 0.65   68 0.374 0.360 0.046
60: 0.65   84 0.372 0.351 0.040
61: 0.65  100 0.358 0.352 0.034
62: 0.65  116 0.354 0.348 0.031
63: 0.65  132 0.353 0.346 0.029
64: 0.65  148 0.347 0.344 0.027
65: 0.65  164 0.356 0.346 0.025
66: 0.65  178 0.353 0.345 0.024
67: 0.70   20 0.737 0.466 0.822
68: 0.70   36 0.424 0.319 0.174
69: 0.70   52 0.330 0.215 0.120
70: 0.70   68 0.290 0.224 0.103
71: 0.70   84 0.281 0.218 0.092
72: 0.70  100 0.278 0.216 0.084
73: 0.70  116 0.276 0.215 0.078
74: 0.70  132 0.269 0.216 0.073
75: 0.70  148 0.268 0.219 0.069
76: 0.70  164 0.264 0.220 0.065
77: 0.70  178 0.261 0.222 0.063
78: 0.80   20 0.710 0.569 0.814
79: 0.80   36 0.463 0.287 0.272
80: 0.80   52 0.351 0.200 0.120
81: 0.80   68 0.310 0.168 0.093
82: 0.80   84 0.276 0.162 0.078
83: 0.80  100 0.257 0.159 0.069
84: 0.80  116 0.242 0.131 0.063
85: 0.80  132 0.235 0.156 0.059
86: 0.80  148 0.230 0.156 0.056
87: 0.80  164 0.220 0.149 0.053
88: 0.80  178 0.213 0.162 0.051
89: 1.00   20 0.801 0.781 0.886
90: 1.00   36 0.563 0.440 0.577
91: 1.00   52 0.439 0.378 0.451
92: 1.00   68 0.368 0.278 0.384
93: 1.00   84 0.322 0.267 0.292
94: 1.00  100 0.293 0.261 0.223
95: 1.00  116 0.273 0.257 0.206
96: 1.00  132 0.253 0.252 0.193
97: 1.00  148 0.241 0.233 0.182
98: 1.00  164 0.229 0.228 0.172
99: 1.00  178 0.225 0.223 0.165
      OR look    SD   MAD SDest
examSim(sim2, desc='Partial PO Markov model with varying treatment effect')
Number of successful simulations out of 1000 

      OR look   n
 1: 0.30   20 556
 2: 0.30   36 670
 3: 0.30   52 710
 4: 0.30   68 736
 5: 0.30   84 753
 6: 0.30  100 757
 7: 0.30  116 774
 8: 0.30  132 783
 9: 0.30  148 780
10: 0.30  164 781
11: 0.30  178 779
12: 0.40   20 448
13: 0.40   36 553
14: 0.40   52 621
15: 0.40   68 687
16: 0.40   84 730
17: 0.40  100 764
18: 0.40  116 786
19: 0.40  132 805
20: 0.40  148 831
21: 0.40  164 848
22: 0.40  178 863
23: 0.50   20 470
24: 0.50   36 566
25: 0.50   52 654
26: 0.50   68 711
27: 0.50   84 763
28: 0.50  100 784
29: 0.50  116 818
30: 0.50  132 842
31: 0.50  148 864
32: 0.50  164 870
33: 0.50  178 874
34: 0.55   20 535
35: 0.55   36 625
36: 0.55   52 668
37: 0.55   68 725
38: 0.55   84 753
39: 0.55  100 782
40: 0.55  116 801
41: 0.55  132 817
42: 0.55  148 833
43: 0.55  164 847
44: 0.55  178 855
45: 0.60   20 588
46: 0.60   36 729
47: 0.60   52 815
48: 0.60   68 871
49: 0.60   84 918
50: 0.60  100 942
51: 0.60  116 954
52: 0.60  132 969
53: 0.60  148 971
54: 0.60  164 976
55: 0.60  178 981
56: 0.65   20 416
57: 0.65   36 585
58: 0.65   52 699
59: 0.65   68 802
60: 0.65   84 859
61: 0.65  100 892
62: 0.65  116 927
63: 0.65  132 953
64: 0.65  148 963
65: 0.65  164 975
66: 0.65  178 975
67: 0.70   20 367
68: 0.70   36 538
69: 0.70   52 661
70: 0.70   68 757
71: 0.70   84 838
72: 0.70  100 885
73: 0.70  116 909
74: 0.70  132 933
75: 0.70  148 936
76: 0.70  164 954
77: 0.70  178 960
78: 0.80   20 429
79: 0.80   36 585
80: 0.80   52 723
81: 0.80   68 783
82: 0.80   84 844
83: 0.80  100 869
84: 0.80  116 890
85: 0.80  132 897
86: 0.80  148 901
87: 0.80  164 903
88: 0.80  178 905
89: 1.00   20 555
90: 1.00   36 705
91: 1.00   52 807
92: 1.00   68 854
93: 1.00   84 910
94: 1.00  100 932
95: 1.00  116 942
96: 1.00  132 952
97: 1.00  148 957
98: 1.00  164 957
99: 1.00  178 955
      OR look   n
Comparison of OR and HR 

     OR   HR
1: 0.30 0.39
2: 0.40 0.47
3: 0.50 0.55
4: 0.55 0.59
5: 0.60 0.63
6: 0.65 0.68
7: 0.70 0.72
8: 0.80 0.80
9: 1.00 1.00
Power of Cox test at last look for time until Y=0 (recovery) 

     OR power
1: 0.30  1.00
2: 0.40  0.99
3: 0.50  0.94
4: 0.55  0.85
5: 0.60  0.76
6: 0.65  0.61
7: 0.70  0.48
8: 0.80  0.23
9: 1.00  0.06
Power of Markov proportional odds model test 

      OR look power
 1: 0.30   20  0.06
 2: 0.30   36  0.20
 3: 0.30   52  0.37
 4: 0.30   68  0.51
 5: 0.30   84  0.64
 6: 0.30  100  0.73
 7: 0.30  116  0.80
 8: 0.30  132  0.86
 9: 0.30  148  0.89
10: 0.30  164  0.90
11: 0.30  178  0.92
12: 0.40   20  0.13
13: 0.40   36  0.30
14: 0.40   52  0.47
15: 0.40   68  0.62
16: 0.40   84  0.71
17: 0.40  100  0.77
18: 0.40  116  0.83
19: 0.40  132  0.85
20: 0.40  148  0.88
21: 0.40  164  0.89
22: 0.40  178  0.90
23: 0.50   20  0.19
24: 0.50   36  0.45
25: 0.50   52  0.61
26: 0.50   68  0.73
27: 0.50   84  0.78
28: 0.50  100  0.82
29: 0.50  116  0.83
30: 0.50  132  0.85
31: 0.50  148  0.85
32: 0.50  164  0.86
33: 0.50  178  0.88
34: 0.55   20  0.16
35: 0.55   36  0.34
36: 0.55   52  0.46
37: 0.55   68  0.59
38: 0.55   84  0.65
39: 0.55  100  0.70
40: 0.55  116  0.74
41: 0.55  132  0.77
42: 0.55  148  0.80
43: 0.55  164  0.82
44: 0.55  178  0.83
45: 0.60   20  0.11
46: 0.60   36  0.29
47: 0.60   52  0.42
48: 0.60   68  0.52
49: 0.60   84  0.58
50: 0.60  100  0.61
51: 0.60  116  0.64
52: 0.60  132  0.65
53: 0.60  148  0.67
54: 0.60  164  0.70
55: 0.60  178  0.73
56: 0.65   20  0.22
57: 0.65   36  0.36
58: 0.65   52  0.50
59: 0.65   68  0.56
60: 0.65   84  0.60
61: 0.65  100  0.63
62: 0.65  116  0.66
63: 0.65  132  0.67
64: 0.65  148  0.69
65: 0.65  164  0.70
66: 0.65  178  0.71
67: 0.70   20  0.20
68: 0.70   36  0.42
69: 0.70   52  0.49
70: 0.70   68  0.53
71: 0.70   84  0.57
72: 0.70  100  0.64
73: 0.70  116  0.67
74: 0.70  132  0.68
75: 0.70  148  0.71
76: 0.70  164  0.71
77: 0.70  178  0.72
78: 0.80   20  0.23
79: 0.80   36  0.36
80: 0.80   52  0.42
81: 0.80   68  0.52
82: 0.80   84  0.56
83: 0.80  100  0.60
84: 0.80  116  0.61
85: 0.80  132  0.62
86: 0.80  148  0.62
87: 0.80  164  0.61
88: 0.80  178  0.62
89: 1.00   20  0.07
90: 1.00   36  0.15
91: 1.00   52  0.24
92: 1.00   68  0.27
93: 1.00   84  0.29
94: 1.00  100  0.31
95: 1.00  116  0.32
96: 1.00  132  0.33
97: 1.00  148  0.34
98: 1.00  164  0.34
99: 1.00  178  0.35
      OR look power
Spearman rho correlation between Markov and Cox model chi-square: 0.21

Figure 8.4: Time to Y=0 (recovery) by group and OR, over simulations. Partial PO Markov model with varying treatment effect.

Figure 8.5: Cumulative incidence of Y=0 (recovery). Partial PO Markov model with varying treatment effect.

Figure 8.6: Scatter plot of Cox and PO model Z statistics. Partial PO Markov model with varying treatment effect.

Devoting 3 parameters to the treatment effect as done above is very expensive, with a major reduction in power when testing the transition odds ratio at a specific day. This is entirely consistent with the increased variance of the treatment contrast in the second model, as sketched below. The next section checks whether this inefficiency carries over to mean time unrecovered.
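
The source of the extra variance is easy to see: under model 1 the contrast variance is just the sampling variance of the group coefficient, while under model 2 the day-10 contrast has variance \(x' V x\) with \(x = (1, 10, 100)\) and \(V\) the \(3 \times 3\) covariance block of the treatment-related coefficients. A minimal sketch, with a purely hypothetical \(V\) chosen only to be positive definite:

# Var(logOR(10)) = x' V x with x = (1, 10, 100); V is hypothetical here
contrastVar <- function(V, t) {
  x <- c(1, t, t ^ 2)
  drop(t(x) %*% V %*% x)
}
V <- matrix(c( 0.09,    -0.012,    0.0009,
              -0.012,    0.002,   -0.00016,
               0.0009,  -0.00016,  0.000014), nrow=3, byrow=TRUE)
contrastVar(V, 10)   # compare with the (1,1) element alone, the model 1 analogue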

8.4 Number of Treatment Parameters and Efficiency of Reduction in Days Unrecovered

It is possible that, unlike a treatment contrast of transition odds at a given day, an integrative summary such as the mean number of days unrecovered will not inherit the full imprecision / power loss seen in the variance of the day-specific treatment contrast when a time \(\times\) treatment interaction is included in the model. We explore that here by simulating one dataset with 150 patients in each group, under both model 1 and model 2. Bayesian Markov proportional odds models are fitted, and the posterior distribution of the reduction in mean number of days unrecovered is estimated from 4000 posterior samples. The data are simulated under a constant treatment effect with OR=0.75 (\(\beta=-0.29\)) and are analyzed as such in model 1, or using a quadratic time \(\times\) treatment interaction in model 2, requesting mean times in states Y=1-2 over days 1-28. A constrained partial proportional odds model is used, allowing non-PO with respect to time. With only 3 levels of Y there is a single departure-from-PO parameter per covariate, so the constrained partial PO model is identical to an unconstrained partial PO model. The process is repeated for 4 simulated trials.

require(rmsb)
stanSet()
# Function to compute unconditional probabilities (state occupancy probabilities) for a single
# treatment and model for each posterior draw, then compute all 4000 posterior samples of mean time unrecovered
mtime <- function(fit, tx) {
  u <- soprobMarkovOrdm(fit, data.frame(group=tx, yprev=1), times=1:28, ylevels=0:2, absorb=c(0, 2))
  # Being in state Y=1,2 is equivalent to not being in state 0 (first column of u)
  apply(u, 1, function(z) sum(1. - z[, 1]))
}
ppos <- function(x, eps=0) mean(x > eps)
n <- 300
mt <- NULL
for(isim in 1 : 4) {
  set.seed(10 + isim)
  cat('\nSimulation', isim, '\n\n')
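  # Simulate the two arms separately; parameter is that arm's true treatment log(OR)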
  s1 <- simMarkovOrd(n=n / 2, y=0:2, 1:28, initial=1, X=c(group=1),
                                     absorb=c(0, 2), intercepts=ints, g=g1, parameter=0)
  s2 <- simMarkovOrd(n=n / 2, y=0:2, 1:28, initial=1, X=c(group=2),
                                 absorb=c(0, 2), intercepts=ints, g=g1, parameter=log(0.75))
  s       <- rbind(s1, s2)
  s$group <- as.factor(s$group)
  for(wg in 1 : 2) {
    cat(if(wg == 1) 'Constant treatment effect\n\n' else '\nTime-varying treatment effect\n\n')
    form <-  y ~ pol(time, 2) + group
    if(wg == 2) form <- y ~ pol(time, 2) * group
    fi <- paste0('bcost', wg, isim, '.rds')
    b <- blrm(form, ~ pol(time, 2), cppo=function(y) y, data=s, file=fi)
    print(coef(b), digits=3)
    mt <- rbind(mt,  data.frame(model=wg, simulation=isim, delta=mtime(b, '1') - mtime(b, '2')))
  }
}

Simulation 1 

Constant treatment effect

         y>=1          y>=2          time        time^2       group=2 
     12.20085      -4.06449      -1.00093       0.02300      -0.49255 
  time x f(y) time^2 x f(y) 
      0.21257      -0.00399 

Time-varying treatment effect

            y>=1             y>=2             time           time^2 
        12.83350         -3.78811         -1.08131          0.02529 
         group=2   time * group=2 time^2 * group=2      time x f(y) 
        -1.51404          0.13024         -0.00375          0.22362 
   time^2 x f(y) 
        -0.00433 

Simulation 2 

Constant treatment effect

         y>=1          y>=2          time        time^2       group=2 
     10.46410      -4.02131      -0.87605       0.02134      -0.19547 
  time x f(y) time^2 x f(y) 
      0.16880      -0.00332 

Time-varying treatment effect

            y>=1             y>=2             time           time^2 
        10.24848         -4.25329         -0.84882          0.02060 
         group=2   time * group=2 time^2 * group=2      time x f(y) 
         0.22388         -0.05393          0.00151          0.16925 
   time^2 x f(y) 
        -0.00333 

Simulation 3 

Constant treatment effect

         y>=1          y>=2          time        time^2       group=2 
     11.67600      -4.61384      -0.98478       0.02317      -0.26654 
  time x f(y) time^2 x f(y) 
      0.22939      -0.00486 

Time-varying treatment effect

            y>=1             y>=2             time           time^2 
        11.90480         -4.48874         -1.02025          0.02436 
         group=2   time * group=2 time^2 * group=2      time x f(y) 
        -0.68300          0.06240         -0.00206          0.23303 
   time^2 x f(y) 
        -0.00498 

Simulation 4 

Constant treatment effect

         y>=1          y>=2          time        time^2       group=2 
     10.65499      -3.92185      -0.85677       0.02010      -0.42973 
  time x f(y) time^2 x f(y) 
      0.16838      -0.00316 

Time-varying treatment effect

            y>=1             y>=2             time           time^2 
       10.534281        -4.019671        -0.845205         0.019876 
         group=2   time * group=2 time^2 * group=2      time x f(y) 
       -0.227813        -0.019296         0.000374         0.167874 
   time^2 x f(y) 
       -0.003159 
setDT(mt, key=c('simulation', 'model'))
cat('\nPosterior probabilities of > 0 reduction in days unrecovered:\n\n')

Posterior probabilities of > 0 reduction in days unrecovered:
mt[, ppos(delta), by=.(simulation, model)]
   simulation model      V1
1:          1     1 1.00000
2:          1     2 0.99975
3:          2     1 0.94025
4:          2     2 0.92100
5:          3     1 0.98475
6:          3     2 0.97625
7:          4     1 0.99975
8:          4     2 0.99900
cat('\nPosterior probabilities of > 1d reduction in days unrecovered:\n\n')

Posterior probabilities of > 1d reduction in days unrecovered:
mt[, ppos(delta, 1), by=.(simulation, model)]
   simulation model      V1
1:          1     1 0.99225
2:          1     2 0.99225
3:          2     1 0.54050
4:          2     2 0.52000
5:          3     1 0.73350
6:          3     2 0.72450
7:          4     1 0.97375
8:          4     2 0.95750
ggplot(mt, aes(x=delta, color=paste('Model', model))) + geom_density() +
  facet_wrap(~ paste('Simulation', simulation), nrow=2) +
  xlab('Reduction in Days Unrecovered') + labs(color='')

As judged by the posterior probabilities of any reduction in expected days unrecovered and by the width of the posterior distributions for the treatment difference in expected days unrecovered, there is a small to moderate loss in efficiency for the integrative summary when using two more treatment parameters in the transition model.

Now instead of including treatment \(\times\) time interactions, assume a constant treatment effect over time but allow the treatment effect to vary over levels of Y, i.e., allow the treatment effect to fail the proportional odds assumption. We need a g function that simulates such data; we use the g3 function defined earlier. The increment in the treatment log odds ratio for the Y=2 cutoff is taken as 0.5, meaning that a treatment that is effective for reducing P(Y=1) is less effective for reducing mortality.
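
To make the non-PO effect concrete, the implied treatment odds ratios at the two cutoffs follow directly from these two numbers:

# Implied treatment ORs at the two cutoffs under the non-PO simulation below
beta <- log(0.75)   # common treatment log(OR), applying to the Y>=1 cutoff
tau  <- 0.5         # extra log(OR) increment for the Y>=2 (death) cutoff
exp(beta)           # 0.75: OR for the Y>=1 cutoff (recovery-related effect)
exp(beta + tau)     # about 1.24: OR for the Y>=2 cutoff, i.e., little to no mortality benefit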

mt <- NULL
formals(g3)$extra <- c(sparams$extra, 0.5)
## tau=0.5 above; tau ignored when X=1

for(isim in 1 : 4) {
  set.seed(20 + isim)
  cat('\nSimulation', isim, '\n\n')
  s1 <- simMarkovOrd(n=n / 2, y=0:2, 1:28, initial=1, X=c(group=1),
                                     absorb=c(0, 2), intercepts=ints, g=g3, parameter=0)
  s2 <- simMarkovOrd(n=n / 2, y=0:2, 1:28, initial=1, X=c(group=2),
                                 absorb=c(0, 2), intercepts=ints, g=g3, parameter=log(0.75))
  s       <- rbind(s1, s2)
  s$group <- as.factor(s$group)
  for(wg in 1 : 2) {
    cat(if(wg == 1) 'Constant treatment effect with PO\n\n' else '\nConstant treatment effect with non-PO\n\n')
    form <-  y ~ pol(time, 2) + group
    ppo  <- ~ pol(time, 2)
    if(wg == 2) ppo <- ~ pol(time, 2) + group
    fi <- paste0('bcostppo', wg, isim, '.rds')
    b <- blrm(form, ppo, cppo=function(y) y, data=s, file=fi)
    print(coef(b), digits=3)
    mt <- rbind(mt,  data.frame(model=wg, simulation=isim, delta=mtime(b, '1') - mtime(b, '2')))
  }
}

Simulation 1 

Constant treatment effect with PO

         y>=1          y>=2          time        time^2       group=2 
     10.46401      -3.94949      -0.86297       0.02036      -0.11443 
  time x f(y) time^2 x f(y) 
      0.17833      -0.00329 

Constant treatment effect with non-PO

          y>=1           y>=2           time         time^2        group=2 
      10.55681       -4.07600       -0.86894        0.02053       -0.20462 
   time x f(y)  time^2 x f(y) group=2 x f(y) 
       0.18014       -0.00334        0.07581 

Simulation 2 

Constant treatment effect with PO

         y>=1          y>=2          time        time^2       group=2 
      11.2604       -4.0016       -0.9464        0.0225       -0.2445 
  time x f(y) time^2 x f(y) 
       0.2061       -0.0042 

Constant treatment effect with non-PO

          y>=1           y>=2           time         time^2        group=2 
      11.26427       -4.07955       -0.94130        0.02230       -0.31264 
   time x f(y)  time^2 x f(y) group=2 x f(y) 
       0.20443       -0.00413        0.05540 

Simulation 3 

Constant treatment effect with PO

         y>=1          y>=2          time        time^2       group=2 
     11.10324      -4.09555      -0.89898       0.02025      -0.10019 
  time x f(y) time^2 x f(y) 
      0.18825      -0.00329 

Constant treatment effect with non-PO

          y>=1           y>=2           time         time^2        group=2 
       11.1554        -4.2576        -0.8952         0.0201        -0.2350 
   time x f(y)  time^2 x f(y) group=2 x f(y) 
        0.1864        -0.0032         0.1090 

Simulation 4 

Constant treatment effect with PO

         y>=1          y>=2          time        time^2       group=2 
     11.03819      -3.58853      -0.89603       0.02059       0.00159 
  time x f(y) time^2 x f(y) 
      0.16940      -0.00265 

Constant treatment effect with non-PO

          y>=1           y>=2           time         time^2        group=2 
      11.22055       -3.78456       -0.90786        0.02092       -0.18644 
   time x f(y)  time^2 x f(y) group=2 x f(y) 
       0.17187       -0.00271        0.13216 
setDT(mt, key=c('simulation', 'model'))
cat('\nPosterior probabilities of > 0 reduction in days unrecovered:\n\n')

Posterior probabilities of > 0 reduction in days unrecovered:
mt[, ppos(delta), by=.(simulation, model)]
   simulation model      V1
1:          1     1 0.82025
2:          1     2 0.78225
3:          2     1 0.97050
4:          2     2 0.97100
5:          3     1 0.79200
6:          3     2 0.73650
7:          4     1 0.49475
8:          4     2 0.48975
cat('\nPosterior probabilities of > 1d reduction in days unrecovered:\n\n')

Posterior probabilities of > 1d reduction in days unrecovered:
mt[, ppos(delta, 1), by=.(simulation, model)]
   simulation model      V1
1:          1     1 0.30300
2:          1     2 0.27275
3:          2     1 0.69450
4:          2     2 0.66000
5:          3     1 0.24925
6:          3     2 0.19800
7:          4     1 0.07900
8:          4     2 0.07675
ggplot(mt, aes(x=delta, color=paste('Model', model))) + geom_density() +
  facet_wrap(~ paste('Simulation', simulation), nrow=2) +
  xlab('Reduction in Days Unrecovered') + labs(color='')

9 Simulation of Bayesian Sequential Trials

When considering Bayesian operating characteristics, as detailed here, it is important to note that the Bayesian approach does not condition on unknowns, so the probability of deciding in favor of efficacy were the true efficacy zero is not a probability that makes sense in the Bayesian paradigm. What a Bayesian analysis must control is the probability of making a wrong decision given the data used at the decision point, which is one minus the posterior probability of efficacy. This probability can be miscalibrated for only three reasons:

  • the data model is incorrect (a problem shared by the frequentist paradigm)
  • the study design or measurements are biased (also shared by frequentist)
  • the prior is disagreeable (most often caused by an investigator assuming a prior that is too optimistic in favor of a positive treatment benefit)

So the unique aspect to examine in the Bayesian system is the prior distribution. Once the prior is agreed upon, the accuracy of posterior probabilities follows automatically, provided the first two bullet points above are not problematic.

For those who insist on applying frequentist concepts to interpretation of Bayesian results, Bayesian power and the Bayesian equivalent of type I assertion probability \(\alpha\) under the null (OR=1.0) are given below.

Since we are using the normal approximation to the data likelihood, the time-consuming part of the simulations has already been done. What remains is to translate the log odds ratio estimates and their estimated variances into posterior distributions. We use flat priors for all parameters except the treatment effect, to which a skeptical normal prior is applied; the posterior distribution for the treatment effect is then also normal. The skeptical prior is set up as follows. We assume it is equally likely that the treatment causes harm as that it benefits the patient, so the mean of the normal prior distribution for the treatment effect log(OR) is zero. The standard deviation is chosen so that P(OR < 0.25) = P(OR > 4) = 0.05. Three other priors are used for assessing efficacy: a more skeptical one, an even more skeptical one, and one that is nearly flat.
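
The implied prior standard deviations follow from the tail condition: with a zero-mean normal prior on log(OR), \(\sigma = \log(4) / \Phi^{-1}(1 - p_{\mathrm{tail}})\). The values below reproduce the sigma column of the asserts attribute printed further down.

# sigma solves P(OR < 1/4) = P(OR > 4) = ptail for a zero-mean normal prior on log(OR)
log(4) / qnorm(1 - 0.05)    # 0.8428  original skeptical prior
log(4) / qnorm(1 - 0.01)    # 0.5959  more skeptical
log(4) / qnorm(1 - 0.001)   # 0.4486  extremely skeptical
log(4) / qnorm(1 - 0.4)     # 5.4719  nearly flat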

The data model assumes a constant treatment effect on the transition OR scale.

Posterior probabilities will be computed for the following assertions:

  • Efficacy: P(OR < 1)
  • Efficacy: P(OR < 1) with a target of 0.975 instead of 0.95 (all other efficacy assessments target 0.95)
  • Efficacy, more skeptical prior: P(OR < 0.25) = P(OR > 4) = 0.01
  • Efficacy, extremely skeptical prior: P(OR < 0.25) = P(OR > 4) = 0.001
  • Efficacy, nearly flat prior: P(OR < 1) with prior P(OR < 0.25) = P(OR > 4) = 0.4.
  • Inefficacy/harm: P(OR > 1)

Bayesian power is taken to be the probability that the posterior probability of a beneficial effect exceeds 0.95 at the end of the study. We also compute the probability of finding a greater than 0.9 probability of a harmful effect.

asserts <-
  list(list('Efficacy',                       '<', 0, cutprior=log(4), tailprob=0.05),
       list('Efficacy, target 0.975',         '<', 0, cutprior=log(4), tailprob=0.01),
       list('Efficacy, more skeptical prior', '<', 0, cutprior=log(4), tailprob=0.01),
       list('Efficacy, extremely skeptical prior', '<', 0, cutprior=log(4), tailprob=0.001),
       list('Efficacy, flat prior',           '<', 0, cutprior=log(4), tailprob=0.4),
       list('Inefficacy/harm',                '>', 0, cutprior=log(4), tailprob=0.05))
  
s <- gbayesSeqSim(sim1$sim, asserts=asserts)
setDT(s, key=c('OR', 'sim', 'look'))
s[, c('parameter', 'loghr', 'lrchisq') := NULL]
head(s)
   sim look        est       vest  OR        p1      mean1       sd1        p2
1: 1 1   20 -0.8169981 0.29389416 0.3 0.8975074 -0.5778956 0.4559421 0.8675242
2: 1 1   36 -0.9746272 0.16515420 0.3 0.9846224 -0.7907690 0.3660584 0.9762244
3: 1 1   52 -0.7542444 0.10862971 0.3 0.9834661 -0.6541980 0.3069537 0.9773868
4: 1 1   68 -0.6649038 0.07748615 0.3 0.9883388 -0.5995063 0.2643197 0.9847735
5: 1 1   84 -0.8250990 0.06798828 0.3 0.9987487 -0.7530237 0.2490969 0.9981283
6: 1 1  100 -0.9018082 0.05831168 0.3 0.9998347 -0.8333935 0.2321377 0.9997311
        mean2       sd2        p3      mean3       sd3        p4      mean4
1: -0.4470292 0.4010079 0.8675242 -0.4470292 0.4010079 0.8316695 -0.3320634
2: -0.6652380 0.3357485 0.9762244 -0.6652380 0.3357485 0.9622470 -0.5353166
3: -0.5775643 0.2884154 0.9773868 -0.5775643 0.2884154 0.9674230 -0.4898378
4: -0.5458067 0.2522040 0.9847735 -0.5458067 0.2522040 0.9788030 -0.4800643
5: -0.6925123 0.2388789 0.9981283 -0.6925123 0.2388789 0.9968888 -0.6167417
6: -0.7746110 0.2238012 0.9997311 -0.7746110 0.2238012 0.9994962 -0.6992104
         sd4        p5      mean5       sd5           p6      mean6       sd6
1: 0.3456173 0.9331540 -0.8090568 0.5394789 0.1024925729 -0.5778956 0.4559421
2: 0.3011833 0.9916138 -0.9692808 0.4052755 0.0153775728 -0.7907690 0.3660584
3: 0.2656100 0.9888228 -0.7515178 0.3289940 0.0165338881 -0.6541980 0.3069537
4: 0.2365279 0.9914728 -0.6631875 0.2780038 0.0116612321 -0.5995063 0.2643197
5: 0.2254320 0.9992133 -0.8232297 0.2604501 0.0012513476 -0.7530237 0.2490969
6: 0.2126300 0.9999046 -0.9000553 0.2412433 0.0001652869 -0.8333935 0.2321377
attr(s, 'asserts')
                                label cutprior tailprob mu     sigma assertion
1                            Efficacy 1.386294    0.050  0 0.8428071       < 0
2              Efficacy, target 0.975 1.386294    0.010  0 0.5959102       < 0
3      Efficacy, more skeptical prior 1.386294    0.010  0 0.5959102       < 0
4 Efficacy, extremely skeptical prior 1.386294    0.001  0 0.4486052       < 0
5                Efficacy, flat prior 1.386294    0.400  0 5.4719172       < 0
6                     Inefficacy/harm 1.386294    0.050  0 0.8428071       > 0
alabels <- attr(s, 'alabels')  # named vector mapping p1-p6 to assertion labels
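
As a check on the machinery, the first row of head(s) above can be reproduced by hand: with a zero-mean normal prior and a normal approximation to the likelihood, the posterior is normal with precision-weighted mean and variance.

# Reproduce the first (n=20) row by hand
est  <- -0.8169981           # estimated log(OR) at the first look
vest <-  0.29389416          # its estimated variance
sig2 <- 0.8428071 ^ 2        # variance of the zero-mean skeptical prior
pvar  <- 1 / (1 / vest + 1 / sig2)   # posterior variance
pmean <- (est / vest) * pvar         # posterior mean (prior mean is zero)
c(mean1=pmean, sd1=sqrt(pvar), p1=pnorm(0, pmean, sqrt(pvar)))
# -0.5779 0.4559 0.8975, agreeing with mean1, sd1, p1 above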

Compute the Bayesian power: the probability of hitting assertion-specific targets at the planned study end.

# Reshape results into taller and thinner data table so can plot over multiple assertions

ps <- names(alabels)
m  <- melt(s,
           measure.vars=list(ps, paste0('mean', 1:5), paste0('sd', 1:5)),
           variable.name='assert', value.name=c('p', 'mean', 'sd'))
m[, assert := alabels[assert]]
setkey(m, OR, assert, sim, look)
head(m)
   sim look        est       vest  OR      mean6       sd6   assert         p
1: 1 1   20 -0.8169981 0.29389416 0.3 -0.5778956 0.4559421 Efficacy 0.8975074
2: 1 1   36 -0.9746272 0.16515420 0.3 -0.7907690 0.3660584 Efficacy 0.9846224
3: 1 1   52 -0.7542444 0.10862971 0.3 -0.6541980 0.3069537 Efficacy 0.9834661
4: 1 1   68 -0.6649038 0.07748615 0.3 -0.5995063 0.2643197 Efficacy 0.9883388
5: 1 1   84 -0.8250990 0.06798828 0.3 -0.7530237 0.2490969 Efficacy 0.9987487
6: 1 1  100 -0.9018082 0.05831168 0.3 -0.8333935 0.2321377 Efficacy 0.9998347
         mean        sd
1: -0.5778956 0.4559421
2: -0.7907690 0.3660584
3: -0.6541980 0.3069537
4: -0.5995063 0.2643197
5: -0.7530237 0.2490969
6: -0.8333935 0.2321377
# Re-check frequentist power
z <- m[assert == 'Efficacy' & look == max(looks), ]
pr('Frequentist power',
   z[, .(Power=mean(abs(est / sqrt(vest)) > 1.959964, na.rm=TRUE)), by=OR])

Frequentist power 

     OR Power
1: 0.30 1.000
2: 0.40 1.000
3: 0.50 1.000
4: 0.55 0.988
5: 0.60 0.955
6: 0.65 0.972
7: 0.70 0.995
8: 0.80 0.844
9: 1.00 0.651
# Check Bayesian power if only looked at the end
pr('Bayesian power with one data look at n=178',
   z[, .(Power=mean(p > 0.95, na.rm=TRUE)), by=OR])

Bayesian power with one data look at n=178 

     OR Power
1: 0.30 1.000
2: 0.40 1.000
3: 0.50 1.000
4: 0.55 0.994
5: 0.60 0.976
6: 0.65 0.975
7: 0.70 0.996
8: 0.80 0.785
9: 1.00 0.288
# Define targets
ptarget <- c('Efficacy'                            = 0.95,
             'Efficacy, target 0.975'              = 0.975,
             'Efficacy, more skeptical prior'      = 0.95,
             'Efficacy, extremely skeptical prior' = 0.95,
             'Efficacy, flat prior'                = 0.95,
             'Inefficacy/harm'                     = 0.9 )

m[, ptarget := ptarget[assert]]   # spreads targets to all rows

# Show the posterior probability of efficacy paths for 12 of the simulated RCTs
w <- m[assert == 'Efficacy' & sim < '1 20', ]
setkey(w, OR, sim, look)
ggplot(w, aes(look, p, color=factor(sim))) + geom_line() +
  facet_wrap(~ OR) + xlab('Sample Size at Data Look') +
  ylab('Posterior Probability of Efficacy') +
  labs(color='', title='Representative posterior probability of efficacy paths',
       subtitle='12 simulated RCTs, one panel per true OR') +
  guides(color=FALSE)
# Target is hit if the posterior probability of the assertion ever reaches
# the probability target.   This happens if and only if the maximum
# post. prob. exceeds the target, so compute pmax for each combination of
# OR, assert, and sim

pm <- m[, .(pmax = max(p, na.rm=TRUE), ptarget=ptarget[1]), by=.(OR, assert, sim)]
# Compute mean maximum post prob
pmm <- pm[, .(meanpp = mean(pmax, na.rm=TRUE)), by=.(OR, assert)]
ggplot(pmm, aes(x=OR, y=meanpp, color=assert)) + geom_point() +
  labs(color='', title='Mean Maximum Posterior Probability over Data Looks',
       subtitle='Evidence for harm is never great since no OR > 1 were tested')

# Power = proportion of simulations whose maximum posterior probability over all looks exceeds the target
u <- pm[, .(Power = mean(pmax > ptarget, na.rm=TRUE)), by=.(OR, assert)]
z <- u
z$txt <- with(z, paste0('OR:', OR, '<br>', assert, '<br>Hit probability:', round(Power, 3)))
ggp(ggplot(z, aes(x=OR, y=Power, color=assert, label=txt)) +
  geom_line() +
  xlab('OR') + ylab('Proportion Hitting Posterior Probability Target') +
  guides(color=guide_legend(title='Assertion'))) 

Figure 9.1: Probability of hitting posterior probability targets for various assertions, based on the posterior distribution of the odds ratio. Posterior probability targets could also be constructed for state occupancy probabilities.

From the table below, Bayesian power based on multiple data looks is 0.960 if the true OR=0.6 and the original prior distribution is used. Under the more skeptical prior, power drops to 0.955. The Bayesian equivalent of the type I assertion probability \(\alpha\) is found by viewing the row with OR=1.0, where the proportion of simulated clinical trials that would hit the 0.95 posterior probability efficacy target at any look is 0.143, and is 0.123 for the more skeptical prior. The sacrifice in Bayesian power from moving to the more skeptical prior distribution is a difference of 0.005 at OR=0.6. The loss is a little greater at OR=0.8. Instead of using a posterior probability cutoff of 0.95, when 0.975 is used in conjunction with the more skeptical prior, the probability of ever hitting the efficacy target when OR=1.0 is 0.062.

uu <- dcast(u, OR ~ assert, value.var='Power')
knitr::kable(uu, digits=3, caption='Bayesian power: probability of hitting the target at any look')
Table 9.1: Bayesian power: probability of hitting the target at any look
| OR   | Efficacy | Efficacy, extremely skeptical prior | Efficacy, flat prior | Efficacy, more skeptical prior | Efficacy, target 0.975 | Inefficacy/harm |
|-----:|---------:|------------------------------------:|---------------------:|-------------------------------:|-----------------------:|----------------:|
| 0.30 |    0.871 |                                0.861 |                0.874 |                          0.866 |                  0.859 |           0.000 |
| 0.40 |    0.930 |                                0.919 |                0.931 |                          0.925 |                  0.919 |           0.001 |
| 0.50 |    0.845 |                                0.842 |                0.848 |                          0.844 |                  0.841 |           0.001 |
| 0.55 |    0.971 |                                0.970 |                0.972 |                          0.970 |                  0.965 |           0.003 |
| 0.60 |    0.897 |                                0.890 |                0.898 |                          0.895 |                  0.879 |           0.004 |
| 0.65 |    0.922 |                                0.918 |                0.924 |                          0.921 |                  0.912 |           0.002 |
| 0.70 |    0.992 |                                0.992 |                0.992 |                          0.992 |                  0.992 |           0.013 |
| 0.80 |    0.781 |                                0.765 |                0.793 |                          0.776 |                  0.740 |           0.114 |
| 1.00 |    0.341 |                                0.328 |                0.359 |                          0.332 |                  0.310 |           0.461 |

10 Futility Analysis

Futility is based on the probability that, given the current posterior probability \(P_c\) at one of the first data looks, the posterior probability of efficacy will never reach 0.95: symbolically, \(P(\max P(\mathrm{OR} < 1) < 0.95 \mid P_c)\), where the maximum is taken over the remaining data looks. This failure probability will be estimated as a function of \(P_c\) by simulation, and is not conditional on a value of the OR.

More details about futility calculations may be found here.

For looks at n=20, 36, 52, 68, 84, 100, 116, 132, 148, 164, save the posterior probabilities of efficacy and use them to predict, with a binary logistic model, the probability that the posterior probability at the final look (n=178) is \(\geq 0.95\).

w <- s[, .(OR, sim, look, p1)]
# Save final post prob and put on each record
w[, p := p1[.N], by=.(OR, sim)]
w <- w[look < max(looks),]
w[, n := factor(look)]
dd <- datadist(w); options(datadist='dd')
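# Clamp probabilities away from 0 and 1 so the probit transform below never
# returns -Inf or Inf, which would break the spline fit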
cqnorm <- function(x) qnorm(pmax(pmin(x, 1. - 1e-5), 1e-5))
f <- lrm(p >= 0.95 ~ rcs(cqnorm(p1), 4) * n, data=w, maxit=30)
ggp(ggplot(Predict(f, p1, n, fun=plogis, conf.int=FALSE), colorscale=function(...) {}) +
  scale_x_continuous(breaks=seq(0, 1, by=0.1), minor_breaks=seq(0, 1, by=0.05)) +
  scale_y_continuous(breaks=seq(0, 1, by=0.1), minor_breaks=seq(0, 1, by=0.05)) +
  xlab('Early Posterior Probability') +
  ylab('P(Final posterior probability >= 0.95)'))

For there to be better than a 50/50 chance of ultimate success the posterior probability must exceed 0.4 at the first look (n=20 for two treatment arms) or 0.86 at n=100.

See what proportion of trials, separately for each true unknown OR, would have stopped early for futility, here meaning that at some look the predicted probability that the final posterior probability reaches 0.95 is less than 0.05.

# Create an R function that computes the prob. that the last post. prob. exceeds 0.95
success <- Function(f)
failure  <- function(pp, n) 1. - plogis(success(pp, as.character(n)))
failure(0.4, 20); failure(0.86, 100)
[1] 0.40343
[1] 0.5749181
# Put failure probability on each look
w[, fail := failure(p1, look)]
z <- w[, .(failed=any(fail > 0.95)), by=.(OR, sim)]
pr('Proportion of trials stopped early for futility',
   z[, .(propfailed = mean(failed, na.rm=TRUE)), by=OR])

Proportion of trials stopped early for futility 

     OR propfailed
1: 0.30      0.000
2: 0.40      0.000
3: 0.50      0.000
4: 0.55      0.008
5: 0.60      0.020
6: 0.65      0.024
7: 0.70      0.006
8: 0.80      0.354
9: 1.00      0.794

There is a substantial (0.48) risk of stopping early for futility were the true OR to be \(\geq 0.8\). Under the null, thankfully 0.90 of trials would stop early for futility, here futility meaning low chance of ever declaring efficacy.

Were the Bayesian operating characteristics to be re-run with stopping early when the ultimate success probability is \(< 0.05\), the \(\alpha\) probability would be lower, but the Bayesian power would also be noticeably lower were OR to be \(\geq 0.65\) (where 0.13 of trials would have stopped early without claiming efficacy).

11 More Information

12 Computing Environment

 R version 4.1.0 (2021-05-18)
 Platform: x86_64-pc-linux-gnu (64-bit)
 Running under: Pop!_OS 21.04
 
 Matrix products: default
 BLAS:   /usr/lib/x86_64-linux-gnu/blas/libblas.so.3.9.0
 LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.9.0
 
 attached base packages:
 [1] splines   stats4    stats     graphics  grDevices utils     datasets 
 [8] methods   base     
 
 other attached packages:
  [1] rstan_2.21.2         StanHeaders_2.21.0-7 rmsb_0.0.2          
  [4] VGAM_1.1-5           plotly_4.9.3         data.table_1.13.6   
  [7] rms_6.2-1            SparseM_1.81         Hmisc_4.6-0         
 [10] ggplot2_3.3.3        Formula_1.2-4        survival_3.2-11     
 [13] lattice_0.20-44     
 
To cite R in publications use:

R Core Team (2021). R: A Language and Environment for Statistical Computing. R Foundation for Statistical Computing, Vienna, Austria. https://www.R-project.org/.

To cite the Hmisc package in publications use:

Harrell Jr F (2021). Hmisc: Harrell Miscellaneous. R package version 4.6-0, https://hbiostat.org/R/Hmisc/.

To cite the rms package in publications use:

Harrell Jr FE (2021). rms: Regression Modeling Strategies. https://hbiostat.org/R/rms/, https://github.com/harrelfe/rms.

To cite the rmsb package in publications use:

Harrell F (2021). rmsb: Bayesian Regression Modeling Strategies. R package version 0.0.2, https://hbiostat.org/R/rmsb/.

To cite the VGAM package in publications use:

Yee TW (2015). Vector Generalized Linear and Additive Models: With an Implementation in R. Springer, New York, USA.

To cite the data.table package in publications use:

Dowle M, Srinivasan A (2020). data.table: Extension of 'data.frame'. R package version 1.13.6, https://CRAN.R-project.org/package=data.table.

To cite the ggplot2 package in publications use:

Wickham H (2016). ggplot2: Elegant Graphics for Data Analysis. Springer-Verlag New York. ISBN 978-3-319-24277-4, https://ggplot2.tidyverse.org.

To cite the rstan package in publications use:

Stan Development Team (2020). “RStan: the R interface to Stan.” R package version 2.21.2, http://mc-stan.org/.

To cite the survival package in publications use:

Therneau T (2021). A Package for Survival Analysis in R. R package version 3.2-11, https://CRAN.R-project.org/package=survival.