Longitudinal Ordinal Analysis for ORCHID

Introduction

ORCHID was a randomized clinical trial of Hydroxychloroquine vs. placebo for treatment of hospitalized COVID-19 patients. The main publication is here. The primary analysis for the trial used a covariate-adjusted proportional odds model for the 7-level WHO COVID ordinal scale measured at 14 days. There was little evidence for any effectiveness of the drug, despite the fact that the U.S. stockpiled 63,000,000 doses of it. The purpose of the reanalyses done here is to learn from a longitudinal analysis using all available ordinal outcomes. These are assessed at days 1, 2, 3, 4, 7, 14, 28 after randomization. It will be seen that even with this more powerful longitudinal analysis, there remains very little evidence for drug effectiveness.

After much experimentation with various longitudinal models (and also see here), a first-order Markov state transition model for ordinal outcome states provides the best fit to the correlation structure found in the data. The Markov model uses the previous ordinal state as a (categorical) predictor of the current state. It allows the correlation between states on two successive days to be arbitrarily high within patient, and it elegantly handles absorbing states such as death. Excellent handling of the correlation structure is done without the complexities of random effects. See below for justification of this statement. Go here for a list of papers about Markov longitudinal modeling of binary and ordinal outcomes.

Markov models in their usual form assume that the outcome variable is also assessed at baseline (time zero). Such is the case for ORCHID. The fact that some of the outcome levels (e.g., death) are not used at baseline is of no consequence.

The only obstacle to using transition models is that the primary model parameters (here, in a proportional odds model) relate to the probability of being in the current state or higher given the previous state. For day 1 post randomization, this is no problem. But we need to restate the model to obtain time-unconditional probabilities of being in state y or worse (“state occupancy probabilities”). This is done through recursive (through time) matrix multiplications to “uncondition” the transition probabilities to get probabilities of being in the current state regardless of previous states. This are simple to interpret quantities that are functions of all model parameters. Because of that it is difficult to get confidence intervals and statistical tests using frequentist methods. In examples below it is seen that the Bayesian approach provides exact inference, taking all uncertainties into account. Graphs will show the probability of the patient being at home as a function of time and treatment. Home is almost, but not quite, an absorbing state. So it is possible that such graphs will increase up to a point then slightly decrease (that didn’t happen in this study). If we consider home to be an absorbing state, then the graph is a cumulative incidence graph.

Various ways of restating the model results are also given below. One of these is the mean restricted time to the patient going home.

Note that the transition model needs to take absolute time (days from randomization) into account, as well as previous state. That is because the mix of outcomes changes over time in a way that simple transition odds ratios can’t capture.

To be added to this report is an example of using the fitted model to simulate longitudinal ordinal data, and using the simulation to estimate probabilities of more complex quantities than can easily be obtained by the analytic solutions shown here. For example, one can simulate data to compute the probability that patients on a given treatment with given baseline covariates will be alive on day 28 if they are on a ventilator on day 14.

require(rmsb)
require(data.table)
require(plotly)
knitrSet(lang='markdown', w=7, h=7, dev='png', fig.path='png/a-')
options(prType='html')

# Get around a bug in tooltip construction with ggplotly
# See https://stackoverflow.com/questions/66316337
ggp <- function(obj) {
  g <- ggplotly(obj, tooltip='label')
    d <- g$x$data
    for(i in 1 : length(d)) {
      w <- d[[i]]$text
    if(length(w)) d[[i]]$text <- gsub('txt: ', '', w)
        }
    g$x$data <- d
    g
    }
    
d <- readRDS('source/orchid.rds')
d[, gap := day - shift(day), by=id]
d[is.na(gap), gap := 1]
label(d$gap) <- 'Days since last measurement'
d[, table(day, gap)]
##     gap
## day    1   3   7  14
##   1  479   0   0   0
##   2  479   0   0   0
##   3  478   0   0   0
##   4  475   0   0   0
##   7    0 474   0   0
##   14   0   0 461   0
##   28   0   0   0 447
# Also fetch ORCHID dataset for all days, carrying death forward
dall <- readRDS('source/orchidAllDays.rds')
ordLevels <-
    c('death',
    'hospitalized on invasive mechanical ventilation/ECMO',
    'hospitalized on non-invasive ventilation or high flow nasal cannula',
    'hospitalized on supplemental oxygen',
    'hospitalized not on supplemental oxygen',
    'not hospitalized with limitation of activity',
    'not hospitalized without limitation in activity')

Descriptive Statistics

html(describe(d))
d

18 Variables   3293 Observations

id: Subject ID
nmissingdistinct
32930479
lowest :ORCH-0001ORCH-0002ORCH-0003ORCH-0004ORCH-0005
highest:ORCH-0475ORCH-0476ORCH-0477ORCH-0478ORCH-0479

age
image
nmissingdistinctInfoMeanGmd.05.10.25.50.75.90.95
3293070156.5118.6729354457687885
lowest : 18 20 22 23 24 , highest: 85 86 87 88 89
icu
nmissingdistinctInfoSumMeanGmd
3293020.4566160.18710.3042

bmi: Body Mass Index
image
nmissingdistinctInfoMeanGmd.05.10.25.50.75.90.95
3060233396132.810.0120.8022.6826.8431.1436.9344.4650.22
lowest : 7.098765 8.65051911.42857117.68978917.968750
highest:67.52077667.82719175.05481578.69406996.559285

y0: Baseline status
image
nmissingdistinctInfoMeanGmd
3293040.854.1140.8686
 Value          2     3     4     5
 Frequency    214   364  1549  1166
 Proportion 0.065 0.111 0.470 0.354
 

deathday: Days from randomization to death
image
nmissingdistinctInfoMeanGmd.05.10.25.50.75.90.95
3182975250.99615.389.578 5 6 815202830
lowest : 2 3 5 6 7 , highest: 28 29 30 32 34
pfratio: PaO2 / FiO2
image
nmissingdistinctInfoMeanGmd.05.10.25.50.75.90.95
4032890530.999143.894 45.00 53.33 80.00127.50220.00267.86295.24
lowest : 27.36842 30.00000 44.00000 45.00000 50.00000
highest:290.47619295.23810322.50000343.33333438.09524

race
image
nmissingdistinct
329303
 Value      black white other
 Frequency    789  1451  1053
 Proportion 0.240 0.441 0.320
 

sfratio
image
nmissingdistinctInfoMeanGmd.05.10.25.50.75.90.95
31251681610.999348.6118.6124.3178.0283.3383.3438.1452.4461.9
lowest : 0.9666667 30.0000000 69.0000000 78.0000000 79.0000000
highest:457.1428571461.9047619466.6666667471.4285714476.1904762

sofa_gcs
image
nmissingdistinctInfoMeanGmd.05.10.25.50.75.90.95
32930170.9633.0882.6580112469
lowest : 0 1 2 3 4 , highest: 12 13 14 16 17
 Value          0     1     2     3     4     5     6     7     8     9    10    11
 Frequency    189   741   891   503   390   182    84    67    33    61    25    14
 Proportion 0.057 0.225 0.271 0.153 0.118 0.055 0.026 0.020 0.010 0.019 0.008 0.004
                                         
 Value         12    13    14    16    17
 Frequency     54    33    12     7     7
 Proportion 0.016 0.010 0.004 0.002 0.002
 

sofa_nogcs
image
nmissingdistinctInfoMeanGmd.05.10.25.50.75.90.95
32930130.9592.7732.1880112467
lowest : 0 1 2 3 4 , highest: 8 9 11 13 14
 Value          0     1     2     3     4     5     6     7     8     9    11    13
 Frequency    189   783   918   539   359   165    82   110    47    75    12     7
 Proportion 0.057 0.238 0.279 0.164 0.109 0.050 0.025 0.033 0.014 0.023 0.004 0.002
                 
 Value         14
 Frequency      7
 Proportion 0.002
 

time_to_recovery
image
nmissingdistinctInfoMeanGmd.05.10.25.50.75.90.95
32930290.989.13210.45 0 0 0 5152828
lowest : 0 1 2 3 4 , highest: 24 25 26 27 28
sex
nmissingdistinct
329302
 Value      female   male
 Frequency    1448   1845
 Proportion   0.44   0.56
 

trt: Treatment
nmissingdistinct
329302
 Value                 placebo hydroxychloroquine
 Frequency                1631               1662
 Proportion              0.495              0.505
 

day: Days since randomization
image
nmissingdistinctInfoMeanGmd
3293070.988.2178.698
lowest : 1 2 3 4 7 , highest: 3 4 7 14 28
 Value          1     2     3     4     7    14    28
 Frequency    479   479   478   475   474   461   447
 Proportion 0.145 0.145 0.145 0.144 0.144 0.140 0.136
 

y: Ordinal Outcome
image
nmissingdistinctInfoMeanGmd
3293070.9634.7121.804
lowest : 1 2 3 4 5 , highest: 3 4 5 6 7
 Value          1     2     3     4     5     6     7
 Frequency     50   357   297   817   526   757   489
 Proportion 0.015 0.108 0.090 0.248 0.160 0.230 0.148
 

yprev: Previous ordinal outcome
image
nmissingdistinctInfoMeanGmd
3293060.9524.4861.572
lowest : 2 3 4 5 6 , highest: 3 4 5 6 7
 Value          2     3     4     5     6     7
 Frequency    366   352  1022   683   610   260
 Proportion 0.111 0.107 0.310 0.207 0.185 0.079
 

gap: Days since last measurement
image
nmissingdistinctInfoMeanGmd
3293040.7963.8924.216
 Value          1     3     7    14
 Frequency   1911   474   461   447
 Proportion 0.580 0.144 0.140 0.136
 

Event Chart for Random Sample of Patients

A status timeline is shown for each of 65 randomly chosen patients. The patients are sorted so that the worst outcomes at the last follow-up day are at the top. One day is subtract from the time variable to make follow-up start at day 0.

ssamp <- sample(unique(d$id), 65, FALSE)
dr        <- subset(d, id %in% ssamp)
dr$id     <- as.integer(as.factor(dr$id))
dr$status <- factor(dr$y, levels=7:1)
dr$day    <- dr$day - 1
multEventChart(status ~ day + id, data=dr,
               absorb='1', sortbylast = TRUE) +
  theme_classic() +
  theme(legend.position='bottom')

Summary of Successive State Transitions

gg <- propsTrans(y ~ day + id, data=dall, maxsize=6, labels=ordLevels) +
        theme(legend.position='bottom',
        axis.text.x=element_text(angle=90, hjust=1))
ggp(gg)
# For a nice event chart for this type of data see https://livefreeordichotomize.com/2020/05/21/survival-model-detective-1

Category Proportions Over Time

The following plot shows how the mix of outcomes changes over time stratified also by treatment.

gg <- propsPO(y ~ trt + day, nrow=1, data=dall) +
        theme(legend.position='bottom',
                      axis.text.x=element_text(angle=90, hjust=1))
ggp(gg)

Single-Day Analysis

To provide a result similar to the ORCHID’s pre-specified primary analysis, we use a proportional odds model for day 14 status in isolation.

d14 <- subset(dall, day == 14)
d14$y0 <- as.factor(d14$y0)
f <- lrm(y ~ trt + y0 + rcs(age, 3) + race + sofa_nogcs + sex, data=d14)
f

Logistic Regression Model

 lrm(formula = y ~ trt + y0 + rcs(age, 3) + race + sofa_nogcs + 
     sex, data = d14)
 

Frequencies of Responses

   1   2   3   4   5   6   7 
  32  42  12  40  37 169 147 
 
Model Likelihood
Ratio Test
Discrimination
Indexes
Rank Discrim.
Indexes
Obs 479 LR χ2 176.28 R2 0.320 C 0.731
max |∂log L/∂β| 2×10-12 d.f. 10 g 1.360 Dxy 0.461
Pr(>χ2) <0.0001 gr 3.895 γ 0.462
gp 0.180 τa 0.349
Brier 0.106
β S.E. Wald Z Pr(>|Z|)
y≥2   4.2312  0.8255 5.13 <0.0001
y≥3   3.0594  0.8146 3.76 0.0002
y≥4   2.8138  0.8141 3.46 0.0005
y≥5   2.1481  0.8141 2.64 0.0083
y≥6   1.6364  0.8134 2.01 0.0442
y≥7  -0.2346  0.8054 -0.29 0.7708
trt=hydroxychloroquine   0.0437  0.1701 0.26 0.7973
y0=3  -0.1120  0.4477 -0.25 0.8025
y0=4   0.9315  0.4550 2.05 0.0406
y0=5   1.5653  0.4925 3.18 0.0015
age  -0.0098  0.0127 -0.77 0.4395
age’  -0.0222  0.0153 -1.45 0.1475
race=white  -0.4846  0.2244 -2.16 0.0308
race=other   0.0313  0.2452 0.13 0.8985
sofa_nogcs  -0.2699  0.0588 -4.59 <0.0001
sex=male  -0.1971  0.1820 -1.08 0.2788

Serial Conditional Markov Ordinal Model

Here we use a first order Markov process where the PO model on a given day is conditional on the ordinal outcome on the previous day as well as on the other variables. The 1-day lagged outcome status is variable yprev. Note that the gaps (1 1 1 3 7 14) between measurement times (1 2 3 4 7 14 28) has a linear correlation with the measurement times of 0.998, so including day as an effect modifier of previous status is similar to allowing the influence of the previous state to be a function of the gap duration.

First do a consolidated analysis in which all observed gap-previous state combinations are modeled as indicator variables. The reference cell here is gap=1 previous state 2. Then look at effects of previous state stratified by distinct levels of gap. Interpretation is clouded by all effects being relative to previous state 2.

d$yprev <- factor(d$yprev)
dd <- datadist(d); options(datadist='dd')
ugaps <- sort(unique(d$gap))
z <- with(d, interaction(gap, yprev, drop=TRUE))
f <- lrm(y ~ z + day + rcs(age, 3) + race + sofa_nogcs + sex, data=d)
co <- coef(f)[grep('z=', names(coef(f)))]
names(co) <- sub('z=', '', names(co))
n  <- names(co)
x  <- strsplit(n, '\\.')
g  <- sapply(x, function(u) as.numeric(u[1]))    # gaps
yp <- sapply(x, function(u) as.numeric(u[2]))    # yprev
ef  <- data.frame(gap=g, yp=yp, effect=unname(co))
# Reference cell in model was gap=1 yprev=2 so add a zero for this
ef <- rbind(data.frame(gap=1, yp=2, effect=0), ef)
print(ef, digits=3)
   gap yp effect
1    1  2  0.000
2    3  2 -0.796
3    7  2 -1.626
4   14  2 -5.056
5    1  3  2.365
6    3  3  2.057
7    7  3  1.790
8   14  3  1.104
9    1  4  4.959
10   3  4  6.356
11   7  4  5.427
12  14  4  4.052
13   1  5  7.388
14   3  5  8.255
15   7  5  7.345
16  14  5  5.390
17   1  6  9.311
18   3  6  9.715
19   7  6  8.069
20  14  6  5.656
21   7  7  9.385
22  14  7  7.396
# Fit gap trend separately by yprev
ggplot(ef, aes(x=gap, y=effect, col=factor(yp))) + geom_line() + geom_point()

n <- c('yprev=3', 'yprev=4', 'yprev=5', 'yprev=6', 'yprev=7')
k <- matrix(NA, nrow=length(ugaps), ncol=length(n), dimnames=list(ugaps, n))
i <- 0
for(gp in ugaps) {
  i <- i + 1
  s <- subset(d, gap == gp)
  s$yprev <- s$yprev[drop=TRUE]
  form <- if(gp == 1) y ~ day + yprev + rcs(age, 3) + race + sofa_nogcs + sex else
    y ~ yprev + rcs(age, 3) + race + sofa_nogcs + sex
  form <- y ~ yprev
  if(gp == 1) form <- y ~ day + yprev
  f <- lrm(form,  data=s)
  m <- intersect(n, names(coef(f)))
  k[i, m] <- coef(f)[m]
}
round(k, 2)
   yprev=3 yprev=4 yprev=5 yprev=6 yprev=7
1     4.28    7.99   11.34   23.54      NA
3     2.43    4.80    5.92    6.78      NA
7     1.71    3.19    4.71    5.45    6.72
14    1.68    3.32    4.47    4.91    6.64
z <- zp <- NULL
xx <- seq(1, 14, by=.1)
w <- data.frame(ugaps=xx)
for(j in 1 : 5) {
  y <- k[, j]
  mods <- if(j == 5) 'power fixed' else  c('linear spline', 'exponential', 'power', 'power fixed')
  for(model in mods) {
  form <- switch(model,
                 'linear spline' = y ~ lsp(ugaps, 3),
                 'exponential'   = log(y) ~ ugaps,
                 'power'         = log(y) ~ log(ugaps),
                 'power fixed'   = NA)
  if(model != 'power fixed') {
    f <- ols(form)
    if(model == 'power') print(coef(f))
    p <- predict(f, w)
    if(model != 'linear spline') p <- exp(p)
  } else {
    const <- exp(mean(log(y / (ugaps ^ -0.32)), na.rm=TRUE))
    p <- const * (xx ^ -0.32)
  }
  prev <- paste('Previous state:', j + 2)
  z  <- rbind(z,  data.frame(model=model, gap=xx,    y=p, yprev=prev))
  zp <- rbind(zp, data.frame(model=model, gap=ugaps, y=y, yprev=prev)) 
  }
}
 Intercept      ugaps 
 1.3760590 -0.3708071 
 Intercept      ugaps 
 2.0091496 -0.3570896 
Intercept     ugaps 
 2.320715 -0.356863 
 Intercept      ugaps 
 2.9231801 -0.5866748 
ggplot(z, aes(x=gap, y=y, color=model)) + geom_line() +
  geom_point(data=zp, aes(x=gap, y=y)) +
  facet_wrap(~ yprev)

By combining the patterns of both approaches, time gap is effectively modeled as a linear spline with knot at 3, which allows for a very different effect at the most command gap of 1d. But the use of two terms for the gap effect will result in a singularity when interacting gap with previous state, because some previous states do not occur at some gaps. The power function specifying a gap effect of a constant times \(d^{-0.32}\) where \(d\) is the time gap fits very well except for previous state 6, which is the only state in the above graph for which you can see a difference between the fitted power function (which estimated an exponent of -0.58) and this function. By using a single parameter (the exponent) to capture the gap effect for all possible gaps one can reduce the number of parameters to estimate by omitting previous state main effects from the model (see below). But again the fact that we are subtracting effects for being in previous state 2 makes this confusing. This will be explored in a better way in the VIOLET dataset where artifical gaps can be created in a way that is independent of absolute time, since in VIOLET all consecutive days had outcome assessments.

In formulating our model we need the following considerations.

  • transitions depend on a flexible function of time
  • the effect of the previous state may change with time or gap time
  • the treatment effect may change with time
  • there may be a weird effect where the effect of treatment may differ according to the previous outcome state

Due to a singularity (caused by collinearity), a model could not be fitted that allowed previous state to interact with both day and gap. Also, a singularity resulted when the linear spline is used for gap. There are no records with yprev=7 for gap < 7. So force gap to be linear for now. Let’s fit two models to see which one more strongly interacts, then use that one by itself.

f1 <- lrm(y ~ gap * yprev + pol(day,2) * (trt + rcs(age, 3) +
                 race + sofa_nogcs + sex) +
         yprev * trt,
         data=d)
f2 <- lrm(y ~ gap  + day * (yprev + trt + rcs(age, 3) +
                 race + sofa_nogcs + sex) +
         yprev * trt,
         data=d)
AIC(f1); AIC(f2)   # first model way better

[1] 7004.827 [1] 7060.269

f <- f1
anova(f)
Wald Statistics for y
χ2 d.f. P
gap (Factor+Higher Order Factors) 110.32 6 <0.0001
All Interactions 108.29 5 <0.0001
yprev (Factor+Higher Order Factors) 1878.87 15 <0.0001
All Interactions 114.56 10 <0.0001
day (Factor+Higher Order Factors) 129.38 16 <0.0001
All Interactions 45.62 14 <0.0001
Nonlinear (Factor+Higher Order Factors) 79.28 8 <0.0001
trt (Factor+Higher Order Factors) 7.15 8 0.5204
All Interactions 7.11 7 0.4177
age (Factor+Higher Order Factors) 45.95 6 <0.0001
All Interactions 18.47 4 0.0010
Nonlinear (Factor+Higher Order Factors) 3.94 3 0.2683
race (Factor+Higher Order Factors) 12.28 6 0.0560
All Interactions 8.56 4 0.0732
sofa_nogcs (Factor+Higher Order Factors) 29.11 3 <0.0001
All Interactions 9.20 2 0.0100
sex (Factor+Higher Order Factors) 1.19 3 0.7560
All Interactions 0.41 2 0.8151
gap × yprev (Factor+Higher Order Factors) 108.29 5 <0.0001
day × trt (Factor+Higher Order Factors) 2.45 2 0.2932
Nonlinear 1.24 1 0.2663
Nonlinear Interaction : f(A,B) vs. AB 1.24 1 0.2663
day × age (Factor+Higher Order Factors) 18.47 4 0.0010
Nonlinear 10.50 3 0.0148
Nonlinear Interaction : f(A,B) vs. AB 10.50 3 0.0148
f(A,B) vs. Af(B) + Bg(A) 2.82 1 0.0933
Nonlinear Interaction in day vs. Af(B) 9.85 2 0.0073
Nonlinear Interaction in age vs. Bg(A) 3.43 2 0.1799
day × race (Factor+Higher Order Factors) 8.56 4 0.0732
Nonlinear 7.10 2 0.0287
Nonlinear Interaction : f(A,B) vs. AB 7.10 2 0.0287
day × sofa_nogcs (Factor+Higher Order Factors) 9.20 2 0.0100
Nonlinear 8.86 1 0.0029
Nonlinear Interaction : f(A,B) vs. AB 8.86 1 0.0029
day × sex (Factor+Higher Order Factors) 0.41 2 0.8151
Nonlinear 0.00 1 0.9892
Nonlinear Interaction : f(A,B) vs. AB 0.00 1 0.9892
yprev × trt (Factor+Higher Order Factors) 6.07 5 0.2990
TOTAL NONLINEAR 80.29 10 <0.0001
TOTAL INTERACTION 153.84 24 <0.0001
TOTAL NONLINEAR + INTERACTION 234.98 26 <0.0001
TOTAL 2512.52 39 <0.0001
ggplot(Predict(f, gap, yprev))

There was little evidence for a treatment effect at any time, and little evidence that the effect of treatment on state transitions varied with time. There is also little evidence that the treatment effect differed for different previous states.

The graph above shows that being at home without disability is almost an absorbing state, and that the effect of previous state varies with time gap.

For now let’s simplify the model by having previous state only interact with gap (despite some evidence for the age effect varying over time), and removing the interactions with treatment. Now allow the effect of absolute time to be quadratic1. Compare the variances of parameter estimates with cluster sandwich robust estimates.

f <- lrm(y ~ gap * yprev + pol(day, 2) + trt + rcs(age, 3) + race + sofa_nogcs + sex,
         data=d, x=TRUE, y=TRUE)
f

Logistic Regression Model

 lrm(formula = y ~ gap * yprev + pol(day, 2) + trt + rcs(age, 
     3) + race + sofa_nogcs + sex, data = d, x = TRUE, y = TRUE)
 

Frequencies of Responses

   1   2   3   4   5   6   7 
  50 357 297 817 526 757 489 
 
Model Likelihood
Ratio Test
Discrimination
Indexes
Rank Discrim.
Indexes
Obs 3293 LR χ2 4766.76 R2 0.787 C 0.908
max |∂log L/∂β| 2×10-9 d.f. 20 g 4.186 Dxy 0.816
Pr(>χ2) <0.0001 gr 65.746 γ 0.821
gp 0.306 τa 0.667
Brier 0.058
β S.E. Wald Z Pr(>|Z|)
y≥2   2.4712  0.3281 7.53 <0.0001
y≥3   -1.0620  0.3185 -3.33 0.0009
y≥4   -2.9688  0.3342 -8.88 <0.0001
y≥5   -6.0554  0.3481 -17.40 <0.0001
y≥6   -7.8957  0.3556 -22.20 <0.0001
y≥7  -10.8037  0.3689 -29.28 <0.0001
gap   -0.2184  0.0839 -2.60 0.0093
yprev=3   2.2491  0.2230 10.09 <0.0001
yprev=4   4.8417  0.2303 21.02 <0.0001
yprev=5   7.2783  0.2560 28.43 <0.0001
yprev=6   9.3438  0.2776 33.66 <0.0001
yprev=7   9.2858  0.5779 16.07 <0.0001
day   0.2792  0.0392 7.12 <0.0001
day2   -0.0052  0.0007 -7.01 <0.0001
trt=hydroxychloroquine   0.0134  0.0683 0.20 0.8444
age   -0.0086  0.0049 -1.75 0.0800
age’   -0.0034  0.0058 -0.60 0.5515
race=white   -0.0641  0.0877 -0.73 0.4650
race=other   0.0784  0.0951 0.82 0.4098
sofa_nogcs   -0.0886  0.0205 -4.31 <0.0001
sex=male   -0.0612  0.0719 -0.85 0.3948
gap × yprev=3   0.2064  0.0613 3.37 0.0008
gap × yprev=4   0.3304  0.0385 8.59 <0.0001
gap × yprev=5   0.2404  0.0392 6.13 <0.0001
gap × yprev=6   0.0953  0.0332 2.87 0.0041
gap × yprev=7   0.2302  0.0569 4.04 <0.0001
anova(f)
Wald Statistics for y
χ2 d.f. P
gap (Factor+Higher Order Factors) 106.77 6 <0.0001
All Interactions 104.49 5 <0.0001
yprev (Factor+Higher Order Factors) 1891.62 10 <0.0001
All Interactions 104.49 5 <0.0001
day 84.64 2 <0.0001
Nonlinear 49.14 1 <0.0001
trt 0.04 1 0.8444
age 26.83 2 <0.0001
Nonlinear 0.35 1 0.5515
race 3.14 2 0.2078
sofa_nogcs 18.61 1 <0.0001
sex 0.72 1 0.3948
gap × yprev (Factor+Higher Order Factors) 104.49 5 <0.0001
TOTAL NONLINEAR 49.55 2 <0.0001
TOTAL NONLINEAR + INTERACTION 186.88 7 <0.0001
TOTAL 2507.19 20 <0.0001
anova(f, age, race, sex)   # combined effects of 3 variables
Wald Statistics for y
χ2 d.f. P
age 26.83 2 <0.0001
Nonlinear 0.35 1 0.5515
race 3.14 2 0.2078
sex 0.72 1 0.3948
TOTAL 33.50 5 <0.0001
g <- robcov(f, d$id)
htmlVerbatim(round(diag(vcov(f)) / diag(vcov(g)), 2))
                   y>=2                   y>=3                   y>=4 
                   1.40                   1.31                   1.11 
                   y>=5                   y>=6                   y>=7 
                   0.97                   0.96                   0.91 
                    gap                yprev=3                yprev=4 
                   1.20                   0.85                   0.79 
                yprev=5                yprev=6                yprev=7 
                   0.75                   0.79                   0.84 
                    day                  day^2 trt=hydroxychloroquine 
                   1.52                   0.72                   1.22 
                    age                   age'             race=white 
                   1.53                   1.26                   1.24 
             race=other             sofa_nogcs               sex=male 
                   1.22                   1.16                   1.24 
          gap * yprev=3          gap * yprev=4          gap * yprev=5 
                   0.21                   0.52                   0.52 
          gap * yprev=6          gap * yprev=7 
                   0.57                   0.72 
 

The two variance estimates are similar, giving an indirect indication that the assumption of conditional independence (conditioning on previous state) is reasonable. The variance ratio is 1.2 for treatment (ordinary variance estimate is larger than the intra-cluster correlation-robust estimate).

Proportional Odds Assumption

The treatment is too weak to detect any non-proportional odds. Let’s check the PO assumption on the two strongest variables: gap and yprev. We omit important interactions between them and make day linear to be able to limit the number of regression coefficient estimates to examine as a function of y-cutoff. Expect results for the first cutoff, which corresponds to predicting deaths, to be unstable due to there being only 50 deaths.

n <- c('yprev=3', 'yprev=4', 'yprev=5', 'yprev=6', 'yprev=7', 'day', 'gap')
z <- se <- matrix(NA, nrow=6, ncol=length(n))
dimnames(z) <- dimnames(se) <- list(paste0('y>=', 2:7), n)
for(ycut in 2 : 7) {
    g <- lrm(y >= ycut ~ gap + yprev + day + trt + rcs(age, 3) + race +
                     sofa_nogcs + sex,
                     data=d, tol=1e-14, maxit=30)
    z [ycut - 1, n] <- coef(g)[n]
    se[ycut - 1, n] <- sqrt(diag(vcov(g)[n, n])) 
}
round(z, 2)
     yprev=3 yprev=4 yprev=5 yprev=6 yprev=7   day   gap
y>=2    0.41    1.70    2.76    2.71   10.61 -0.30  0.35
y>=3    3.29    5.09    6.43    6.20   13.45 -0.05  0.11
y>=4    1.20    4.82    6.59    6.72   12.62  0.24 -0.32
y>=5    1.94    3.35    6.70    8.10    8.02  0.41 -0.53
y>=6    2.58    3.82    5.01    8.42    7.42  0.64 -0.93
y>=7    2.32    3.06    3.75    4.50    6.08  0.28 -0.35
round(se, 2)
     yprev=3 yprev=4 yprev=5 yprev=6 yprev=7  day  gap
y>=2    0.47    0.45    0.69    0.64   27.72 0.24 0.47
y>=3    0.24    0.28    0.54    0.54   30.24 0.10 0.20
y>=4    0.26    0.27    0.41    0.55   20.70 0.08 0.16
y>=5    0.42    0.39    0.42    0.54    1.07 0.07 0.13
y>=6    0.55    0.51    0.52    0.56    0.64 0.07 0.14
y>=7    1.08    1.02    1.02    1.02    1.03 0.11 0.20

The effect of day and gap are roughly linear in the y-cutoff.
yprev may be predicting death differently from how it predicts the nonfatal outcomes.

The VGAM package vglm function would not converge for a partial PO model that allowed for non-PO in yprev.

Computing Unconditional Estimates

One can uncondition the previous model to obtain unconditional treatment effects, although these will be covariate- and time-dependent. We specify covariate settings (here we use representative values for each covariate, just for convenience, using a method described below) and vary the treatment assignment. Note the following.

  1. The initial state (baseline ordinal status) is always conditioned on, so is not marginalized out. Calculations can be done separately for each initial state.
  2. Given this initial state, use the fitted model to get the conditional distribution of Y at (post-randomization) day 1 given the initial state, covariate settings, and treatment.
  3. Likewise for later days.
  4. Death is an absorbing state so once death occurs all future status values are also set to death.
  5. Were time to be continuous one can find the first time for which the probability of being in state y or worse exceeds 0.5, which would define the median time until reaching that state. For the ORCHID study, gaps between measurements are too long for this unless time-interpolation is used. It may be better to focus on estimating the probability of being in state y or worse as a function of days since randomization, treatment, and covariates (covariates include the initial state).
  6. This approach does not factor in uncertainties in parameter estimates, so provides only a point estimate. The simple Bayesian solution to this is provided later.

Start with computation of a set of representative covariate values that will be used when getting estimates related to transformed parameters. This is done by using the global transition model fitted above, evaluating it at day 14 for a previous state of y=4, and finding covariate settings that give predicted values that are close to the unadjusted estimates for placebo. Of the qualifying settings, one that is subjectively closer to the center of the data distribution is chosen. For simplicity we drop weak categorical predictors sex and race. To better fit we allow time to have a nonlinear effect. First let’s see the distribution of day by yprev.

ggplot(d, aes(x=day)) + geom_histogram() + facet_wrap(~ yprev)
# For yprev=7 there are only 2 distinct assessment days (14, 28)
# So need interaction to be restricted to be linear
f <- lrm(y ~ gap * yprev + rcs(day, 4) + trt + rcs(age, 3) + sofa_nogcs,
         data=d)
newdat <- subset(d, day == 1)
newdat <- transform(newdat, day=14, yprev='4', gap=1)
xb1 <- predict(f, newdat)  # uses first intercept by default

# Do the same when covariates are omitted
fu <- lrm(y ~ gap * yprev + rcs(day, 4), data=d)
xb2 <- predict(fu, newdat[1, ])
diff <- abs(xb1 - xb2)
w <- newdat[, c('age', 'sofa_nogcs', 'trt')]
w$diff <- abs(xb1 - xb2)
setDT(w)
wi <- w[order(diff), ]
# Compute mean and median baseline covariates
co <- all.vars(~ age + sofa_nogcs)
setDT(newdat)
h <- function(x) c(mean=mean(x), median=median(x))
u <- function(x) { s <- sapply(x, h); w <- as.list(s)
                   names(w) <- outer(rownames(s), colnames(s), paste, sep='.'); w}
newdat[, u(.SD), .SDcols=co]
   mean.age median.age mean.sofa_nogcs median.sofa_nogcs
1: 56.85386         57        2.814196                 2
# Seek age=57, sofa 2 or 3
wi[1:30, ]
    age sofa_nogcs                trt        diff
 1:  47          7 hydroxychloroquine 0.001365465
 2:  63          5 hydroxychloroquine 0.001857581
 3:  76          3 hydroxychloroquine 0.002198086
 4:  75          3            placebo 0.002726674
 5:  88          1 hydroxychloroquine 0.005283669
 6:  70          4 hydroxychloroquine 0.006873129
 7:  82          2            placebo 0.007587458
 8:  48          7 hydroxychloroquine 0.008507628
 9:  63          5            placebo 0.010995304
10:  45          7            placebo 0.011599411
11:  75          3 hydroxychloroquine 0.011864397
12:  75          3 hydroxychloroquine 0.011864397
13:  83          2 hydroxychloroquine 0.012547716
14:  56          6            placebo 0.014215440
15:  64          5 hydroxychloroquine 0.014776646
16:  64          5 hydroxychloroquine 0.014776646
17:  77          3 hydroxychloroquine 0.016282989
18:  77          3 hydroxychloroquine 0.016282989
19:  48          7            placebo 0.017645351
20:  89          1            placebo 0.017952035
21:  89          1            placebo 0.017952035
22:  89          1            placebo 0.017952035
23:  87          1 hydroxychloroquine 0.019381651
24:  68          4 hydroxychloroquine 0.020411089
25:  71          4 hydroxychloroquine 0.020655362
26:  61          5 hydroxychloroquine 0.023464890
27:  61          5 hydroxychloroquine 0.023464890
28:  80          2 hydroxychloroquine 0.029746227
29:  67          4 hydroxychloroquine 0.033890653
30:  72          4 hydroxychloroquine 0.034516065
    age sofa_nogcs                trt        diff
# There are no good compromises so use marginal representative values
repcov <- list(age=57, sofa_nogcs=3)

There were no representative covariate combinations that were sufficiently close to unadjusted estimates and close to the center of covariate distributions, so we chose marginally central values.

Compute \(\Pr(Y_s \geq y | X, Y_0)\) analytically for a single value of \((y, Y_0, X, s)\), using this recursive formula where \(k\) is the number of outcome states and \(t\) is time:

\[\Pr(Y_{t} = y | X) = \sum_{j=1}^{k} \Pr(Y_{t} = y | X, Y_{t-1} = j) \Pr(Y_{t-1} = j | X)\] Note that recent updates to the R Hmisc package make the uprob function below obsolete. The soprobMarkovOrd function will compute state occupancy probabilities for given covariate settings. For Bayesian models Hmisc has soprobMarkovOrdm.

# Compute unconditional probabilities
# Output is a matrix with s rows corresponding to t=times(1), times(2), ...
# Each row has columns corresponding to the probabilities of being in each of the states
# times(1) must be 1
# 
# fit is the model fit which must contain the variable named day and the
# previous state, which must be named by the value of pvarname
# data is a data frame or data table that contains covariate settings
# other than day and what's in pvarname.  data has one row.
# times is the vector of days for which to evaluate state occupancy probabilities
# ylevels are the ordered levels of the outcome variable
# Set gap=TRUE if predictor gap is in the model and gap needs to be computed
# At t=times[1] gap is taken as times[1]

uprob <- function(fit, data, times, ylevels, pvarname='yprev', gap=FALSE) {
  k <- length(ylevels)
  s <- length(times)
  P <- matrix(NA, nrow=s, ncol=k)
  colnames(P) <- as.character(ylevels)
  rownames(P) <- as.character(times)
  # Never uncondition on initial state
  data$day <- times[1]
  if(gap) data$gap <- times[1]
  p <- predict(fit, data, type='fitted.ind')
  P[1, ] <- p
  # Matrix of conditional probabilities of Y conditioning on previous time Y
  # Columns = k conditional probabilities conditional on a single previous state
  # Rows    = all possible previous states
  cp <- matrix(NA, nrow=k, ncol=k, 
               dimnames=list(paste('t-1', ylevels), paste('t', ylevels)))
  data <- as.list(data)
  data[[pvarname]] <- as.character(ylevels[-1])
  # Above: don't request estimates after absorbing state
  edata <- expand.grid(data)
  for(it in 2 : s) {
    edata$day <- times[it]
    if(gap) edata$gap <- times[it] - times[it - 1]
    # edata$gap <- c(1,1,1,1,3,3,3,7,7,7,7,7,7,7,rep(14, 14))[times[it]]
    # y=1 = death = absorbing state
    cp <- rbind(c(1., rep(0., k - 1)), predict(fit, edata, type='fitted.ind'))
    # Compute unconditional probabilities of being in all possible states
    # at current time t
    P[it, ] <- t(cp) %*% P[it - 1, ]
    }
  P
}

dat <- data.frame(yprev='4', age=repcov$age, sofa_nogcs=repcov$sofa_nogcs)
  
times <- c(1, 2, 3, 4, 7, 14, 28)
tlev  <- levels(d$trt)

U <- NULL
for(trt in tlev) {   # separately for treatments
    dat$trt <- trt
    u <- uprob(f, dat, times, 1:7, gap=TRUE)
  cat('\n', trt, '\n\n')
  print(round(u, 3))
  U <- rbind(U, data.frame(trt, day=times, p=u, phome=u[, 6] + u[, 7]))
}

 placebo 

       1     2     3     4     5     6     7
1  0.001 0.037 0.172 0.646 0.118 0.024 0.001
2  0.007 0.085 0.157 0.491 0.173 0.077 0.008
3  0.016 0.100 0.130 0.395 0.194 0.141 0.024
4  0.028 0.107 0.117 0.339 0.189 0.184 0.036
7  0.033 0.076 0.077 0.223 0.209 0.282 0.100
14 0.038 0.051 0.038 0.104 0.150 0.394 0.225
28 0.042 0.035 0.014 0.036 0.056 0.374 0.444

 hydroxychloroquine 

       1     2     3     4     5     6     7
1  0.001 0.036 0.171 0.646 0.119 0.025 0.001
2  0.007 0.084 0.156 0.491 0.174 0.078 0.009
3  0.016 0.098 0.129 0.394 0.195 0.142 0.024
4  0.027 0.106 0.116 0.338 0.190 0.186 0.036
7  0.033 0.075 0.076 0.222 0.209 0.284 0.101
14 0.037 0.050 0.038 0.103 0.149 0.395 0.229
28 0.041 0.034 0.013 0.035 0.055 0.373 0.448
Rorchid <- U

Plot the unconditional probabilities of being at home at each time, for representative age and sofa score.

xl <- xlab('Day'); yl <- ylab('Probability of Being at Home')
U$txt <- with(U, paste0(trt, '<br>Day:', day, '<br>', round(phome, 3)))
gg <- ggplot(U, aes(x=day, y=phome, color=trt, label=txt)) +
      geom_line() +
      xl + yl + labs(color='')
ggp(gg)

Repeat but estimate probabilities at days not observed in the data.

U$type <- 'Actual Times'
for(trt in tlev) {   # separately for treatments
    dat$trt <- trt
    u <- uprob(f, dat, 1:28, 1:7, gap=TRUE)
  cat('\n', trt, '\n\n')
  print(round(u, 3))
  lab <- paste(trt, 'day 1-28')
  U <- rbind(U, data.frame(trt, day=1:28, p=u, phome=u[, 6] + u[, 7],
                                                 type='Days 1-28', txt=''))
}

 placebo 

       1     2     3     4     5     6     7
1  0.001 0.037 0.172 0.646 0.118 0.024 0.001
2  0.007 0.085 0.157 0.491 0.173 0.077 0.008
3  0.016 0.100 0.130 0.395 0.194 0.141 0.024
4  0.028 0.107 0.117 0.339 0.189 0.184 0.036
5  0.046 0.126 0.119 0.317 0.179 0.184 0.030
6  0.084 0.162 0.138 0.309 0.162 0.129 0.014
7  0.170 0.216 0.168 0.275 0.111 0.056 0.004
8  0.342 0.264 0.164 0.173 0.042 0.013 0.001
9  0.609 0.231 0.091 0.058 0.008 0.002 0.000
10 0.867 0.102 0.022 0.009 0.001 0.000 0.000
11 0.980 0.017 0.002 0.001 0.000 0.000 0.000
12 0.999 0.001 0.000 0.000 0.000 0.000 0.000
13 1.000 0.000 0.000 0.000 0.000 0.000 0.000
14 1.000 0.000 0.000 0.000 0.000 0.000 0.000
15 1.000 0.000 0.000 0.000 0.000 0.000 0.000
16 1.000 0.000 0.000 0.000 0.000 0.000 0.000
17 1.000 0.000 0.000 0.000 0.000 0.000 0.000
18 1.000 0.000 0.000 0.000 0.000 0.000 0.000
19 1.000 0.000 0.000 0.000 0.000 0.000 0.000
20 1.000 0.000 0.000 0.000 0.000 0.000 0.000
21 1.000 0.000 0.000 0.000 0.000 0.000 0.000
22 1.000 0.000 0.000 0.000 0.000 0.000 0.000
23 1.000 0.000 0.000 0.000 0.000 0.000 0.000
24 1.000 0.000 0.000 0.000 0.000 0.000 0.000
25 1.000 0.000 0.000 0.000 0.000 0.000 0.000
26 1.000 0.000 0.000 0.000 0.000 0.000 0.000
27 1.000 0.000 0.000 0.000 0.000 0.000 0.000
28 1.000 0.000 0.000 0.000 0.000 0.000 0.000

 hydroxychloroquine 

       1     2     3     4     5     6     7
1  0.001 0.036 0.171 0.646 0.119 0.025 0.001
2  0.007 0.084 0.156 0.491 0.174 0.078 0.009
3  0.016 0.098 0.129 0.394 0.195 0.142 0.024
4  0.027 0.106 0.116 0.338 0.190 0.186 0.036
5  0.045 0.124 0.118 0.316 0.180 0.186 0.030
6  0.083 0.161 0.137 0.309 0.164 0.132 0.015
7  0.167 0.214 0.168 0.277 0.113 0.057 0.005
8  0.337 0.264 0.165 0.176 0.043 0.014 0.001
9  0.604 0.233 0.093 0.060 0.008 0.002 0.000
10 0.863 0.104 0.023 0.009 0.001 0.000 0.000
11 0.979 0.018 0.002 0.001 0.000 0.000 0.000
12 0.999 0.001 0.000 0.000 0.000 0.000 0.000
13 1.000 0.000 0.000 0.000 0.000 0.000 0.000
14 1.000 0.000 0.000 0.000 0.000 0.000 0.000
15 1.000 0.000 0.000 0.000 0.000 0.000 0.000
16 1.000 0.000 0.000 0.000 0.000 0.000 0.000
17 1.000 0.000 0.000 0.000 0.000 0.000 0.000
18 1.000 0.000 0.000 0.000 0.000 0.000 0.000
19 1.000 0.000 0.000 0.000 0.000 0.000 0.000
20 1.000 0.000 0.000 0.000 0.000 0.000 0.000
21 1.000 0.000 0.000 0.000 0.000 0.000 0.000
22 1.000 0.000 0.000 0.000 0.000 0.000 0.000
23 1.000 0.000 0.000 0.000 0.000 0.000 0.000
24 1.000 0.000 0.000 0.000 0.000 0.000 0.000
25 1.000 0.000 0.000 0.000 0.000 0.000 0.000
26 1.000 0.000 0.000 0.000 0.000 0.000 0.000
27 1.000 0.000 0.000 0.000 0.000 0.000 0.000
28 1.000 0.000 0.000 0.000 0.000 0.000 0.000
# Plotly doesn't work will with both color and linetype
ggplot(U, aes(x=day, y=phome, color=trt, linetype=type)) + geom_line() +
    xl + yl + labs(color='', linetype='')

Everything has gone haywire. This is caused by the fact that both gap and day needed to be in the model to have the right main effects and interactions, but these two variables are collinear. We are requesting predictions above for combinations of day and gap that did not occur in the data, i.e., gap=1 when day> 4. Collinearities cause difficulties when geting predictions for out-of-sample data in which the collinearities disagree with those in the sample used in fitting the model.

So when the design is such that one needs to interact gap with previous state and gap is not randomly assigned or is otherwise tangled with absolute time, it is not possible to get estimates for unobserved times.

Before giving up, we’ll try a model below that does not contain gap, and that flexibly models the day main effect using a spline,

Markov Model Without Covariate Adjustment

Let’s repeat the exercise with no covariate adjustment to check how we are estimating the proportion of patients at home at each observed day. We relax the model to allow it to fit any time trend except that the interaction between day and previous state is restricted to be linear in day. We are making proportional odds assumptions for day, previous state, and treatment. The PO assumption will be less consequential for treatment because of its weak effect.

dfull <- dall[y0 == 4, .(day, y, trt, id)]
# Compute days since previous measurements
s <- dfull[, .(phome = mean(y >= 6)), by=.(trt, day)]
s[, type := 'Empirical']
empirical <- s

# It would be nice to just have day in the model and not gap (which can be
# derived from day), but the collinearities between yprev and day prevent
# this, and we must interact day with yprev - see the crosstab below

d[, table(day, yprev)]
    yprev
day    2   3   4   5   6   7
  1   32  55 224 168   0   0
  2   48  70 209 142  10   0
  3   52  73 189 113  51   0
  4   63  57 157  95 103   0
  7   69  50 132  75 148   0
  14  60  35  71  53 129 113
  28  42  12  40  37 169 147
# Treat day as numeric but with all needed variables added (less one)
# to make it be modeled as categorical except for the interaction term
# Using records truncated at death
fplain  <- lrm(y ~ gap * yprev + pol(day, 2) + trt, data=d)
# The following has a slightly greater LR chi-square:
fplain2 <- lrm(y ~ rcs(day, 4) + yprev + day %ia% yprev + trt, data=d)
# lrm(y ~ sqrt(day) * yprev + trt, data=d) also fits well
fplain

Logistic Regression Model

 lrm(formula = y ~ gap * yprev + pol(day, 2) + trt, data = d)
 

Frequencies of Responses

   1   2   3   4   5   6   7 
  50 357 297 817 526 757 489 
 
Model Likelihood
Ratio Test
Discrimination
Indexes
Rank Discrim.
Indexes
Obs 3293 LR χ2 4705.63 R2 0.783 C 0.905
max |∂log L/∂β| 1×10-9 d.f. 14 g 4.121 Dxy 0.810
Pr(>χ2) <0.0001 gr 61.595 γ 0.825
gp 0.305 τa 0.662
Brier 0.058
β S.E. Wald Z Pr(>|Z|)
y≥2   1.2998  0.1895 6.86 <0.0001
y≥3   -2.1762  0.1821 -11.95 <0.0001
y≥4   -4.0680  0.2103 -19.34 <0.0001
y≥5   -7.1255  0.2356 -30.25 <0.0001
y≥6   -8.9419  0.2482 -36.03 <0.0001
y≥7  -11.8166  0.2703 -43.71 <0.0001
gap   -0.2125  0.0837 -2.54 0.0111
yprev=3   2.4481  0.2163 11.32 <0.0001
yprev=4   5.1622  0.2192 23.56 <0.0001
yprev=5   7.6626  0.2407 31.83 <0.0001
yprev=6   9.7256  0.2657 36.60 <0.0001
yprev=7   9.6987  0.5701 17.01 <0.0001
day   0.2786  0.0391 7.13 <0.0001
day2   -0.0051  0.0007 -6.94 <0.0001
trt=hydroxychloroquine   -0.0131  0.0679 -0.19 0.8470
gap × yprev=3   0.2093  0.0605 3.46 0.0005
gap × yprev=4   0.3010  0.0381 7.91 <0.0001
gap × yprev=5   0.2109  0.0388 5.44 <0.0001
gap × yprev=6   0.0820  0.0330 2.48 0.0130
gap × yprev=7   0.2155  0.0566 3.80 0.0001
anova(fplain)
Wald Statistics for y
χ2 d.f. P
gap (Factor+Higher Order Factors) 93.36 6 <0.0001
All Interactions 90.84 5 <0.0001
yprev (Factor+Higher Order Factors) 2192.50 10 <0.0001
All Interactions 90.84 5 <0.0001
day 83.98 2 <0.0001
Nonlinear 48.22 1 <0.0001
trt 0.04 1 0.8470
gap × yprev (Factor+Higher Order Factors) 90.84 5 <0.0001
TOTAL NONLINEAR + INTERACTION 170.18 6 <0.0001
TOTAL 2506.66 14 <0.0001
dat <- data.frame(yprev='4')
  
times <- c(1, 2, 3, 4, 7, 14, 28)
# gaps  <- c(1, diff(times))
tlev  <- levels(d$trt)

for(trt in tlev) {   # separately for treatments
    dat$trt <- trt
    dat$gap <- 1
    u <- uprob(fplain2, dat, times, 1:7, gap=TRUE)
  cat('\n', trt, '\n\n')
  print(round(u, 3))
  s <- rbind(s, data.frame(trt, day=times, p=u, phome=u[, 6] + u[, 7],
                                                 type='Model'), fill=TRUE)
}

 placebo 

       1     2     3     4     5     6     7
1  0.001 0.036 0.168 0.643 0.124 0.027 0.002
2  0.010 0.092 0.161 0.485 0.164 0.078 0.009
3  0.024 0.113 0.132 0.387 0.184 0.137 0.024
4  0.038 0.110 0.104 0.309 0.192 0.200 0.048
7  0.048 0.087 0.071 0.210 0.194 0.284 0.104
14 0.054 0.059 0.039 0.096 0.135 0.365 0.253
28 0.061 0.042 0.014 0.037 0.053 0.339 0.454

 hydroxychloroquine 

       1     2     3     4     5     6     7
1  0.001 0.037 0.169 0.642 0.123 0.026 0.002
2  0.010 0.093 0.163 0.485 0.163 0.077 0.009
3  0.025 0.114 0.133 0.387 0.182 0.135 0.024
4  0.039 0.112 0.105 0.310 0.191 0.197 0.047
7  0.050 0.089 0.072 0.212 0.194 0.282 0.102
14 0.056 0.060 0.039 0.097 0.136 0.364 0.248
28 0.063 0.043 0.015 0.037 0.054 0.341 0.448
ggplot(s, aes(x=day, y=phome, color=trt, linetype=type)) + geom_line() + xl + yl +
    labs(color='', linetype='')

Now add estimates for all days 1-28.

for(trt in tlev) {   # separately for treatments
    dat$trt <- trt
    u <- uprob(fplain, dat, 1:28, 1:7, gap=TRUE)
  cat('\n', trt, '\n\n')
  print(round(u, 3))
  s <- rbind(s, data.frame(trt=paste(trt, 'days 1-28'), day=1:28, p=u,
                                                 phome=u[, 6] + u[, 7],
                                                 type='Model', times='Days 1-28'), fill=TRUE)
}

 placebo 

       1     2     3     4     5     6     7
1  0.001 0.033 0.155 0.643 0.136 0.030 0.002
2  0.009 0.086 0.154 0.483 0.173 0.085 0.010
3  0.022 0.108 0.131 0.386 0.185 0.142 0.026
4  0.035 0.107 0.108 0.314 0.187 0.200 0.049
5  0.046 0.094 0.088 0.254 0.181 0.255 0.082
6  0.053 0.076 0.071 0.204 0.169 0.303 0.124
7  0.058 0.056 0.055 0.160 0.152 0.341 0.177
8  0.060 0.039 0.042 0.122 0.132 0.366 0.238
9  0.062 0.025 0.029 0.091 0.111 0.377 0.305
10 0.063 0.015 0.020 0.065 0.090 0.374 0.373
11 0.064 0.008 0.012 0.045 0.071 0.361 0.439
12 0.064 0.004 0.007 0.030 0.055 0.340 0.500
13 0.064 0.002 0.004 0.019 0.043 0.315 0.552
14 0.064 0.001 0.002 0.013 0.034 0.290 0.597
15 0.064 0.000 0.001 0.009 0.027 0.266 0.633
16 0.064 0.000 0.001 0.006 0.022 0.244 0.663
17 0.064 0.000 0.000 0.005 0.019 0.224 0.688
18 0.064 0.000 0.000 0.004 0.016 0.208 0.708
19 0.064 0.000 0.000 0.003 0.015 0.194 0.724
20 0.064 0.000 0.000 0.003 0.013 0.182 0.738
21 0.064 0.000 0.000 0.003 0.012 0.172 0.749
22 0.064 0.000 0.000 0.002 0.011 0.164 0.758
23 0.064 0.000 0.000 0.002 0.011 0.158 0.765
24 0.064 0.000 0.000 0.002 0.010 0.153 0.770
25 0.064 0.000 0.000 0.002 0.010 0.150 0.774
26 0.064 0.000 0.000 0.002 0.010 0.148 0.776
27 0.064 0.000 0.000 0.002 0.010 0.147 0.777
28 0.064 0.000 0.000 0.002 0.010 0.147 0.777

 hydroxychloroquine 

       1     2     3     4     5     6     7
1  0.001 0.033 0.157 0.643 0.135 0.030 0.002
2  0.009 0.087 0.156 0.483 0.171 0.083 0.010
3  0.023 0.110 0.132 0.386 0.183 0.140 0.025
4  0.036 0.109 0.109 0.315 0.186 0.197 0.048
5  0.047 0.096 0.089 0.256 0.181 0.251 0.080
6  0.054 0.078 0.072 0.206 0.169 0.300 0.121
7  0.059 0.058 0.057 0.162 0.153 0.338 0.173
8  0.062 0.040 0.043 0.124 0.133 0.364 0.234
9  0.064 0.026 0.030 0.093 0.112 0.376 0.300
10 0.065 0.016 0.020 0.066 0.091 0.374 0.368
11 0.065 0.009 0.013 0.046 0.072 0.361 0.433
12 0.066 0.005 0.007 0.031 0.056 0.341 0.494
13 0.066 0.002 0.004 0.020 0.044 0.317 0.547
14 0.066 0.001 0.002 0.013 0.034 0.292 0.592
15 0.066 0.000 0.001 0.009 0.027 0.268 0.629
16 0.066 0.000 0.001 0.006 0.022 0.246 0.659
17 0.066 0.000 0.000 0.005 0.019 0.226 0.684
18 0.066 0.000 0.000 0.004 0.017 0.210 0.704
19 0.066 0.000 0.000 0.003 0.015 0.195 0.720
20 0.066 0.000 0.000 0.003 0.013 0.184 0.734
21 0.066 0.000 0.000 0.003 0.012 0.174 0.745
22 0.066 0.000 0.000 0.002 0.012 0.166 0.754
23 0.066 0.000 0.000 0.002 0.011 0.159 0.761
24 0.066 0.000 0.000 0.002 0.010 0.154 0.767
25 0.066 0.000 0.000 0.002 0.010 0.151 0.771
26 0.066 0.000 0.000 0.002 0.010 0.149 0.773
27 0.066 0.000 0.000 0.002 0.010 0.148 0.774
28 0.066 0.000 0.000 0.002 0.010 0.149 0.773
ggplot(s, aes(x=day, y=phome, color=trt, linetype=type)) + geom_line() + xl + yl +
    labs(color='', linetype='')

Again there is a problem when estimating days not occurring in the data. Try a model that models day flexibly as a main effect, but that interacts it only linearly with previous state.

dat <- data.frame(yprev='4')
  
U <- empirical
for(trt in tlev) {   # separately for treatments
    dat$trt <- trt
    u <- uprob(fplain2, dat, 1:28, 1:7, gap=FALSE)
  U <- rbind(U, data.frame(trt, day=1:28, p=u, phome=u[, 6] + u[, 7],
                                                 type='Model days 1-28'), fill=TRUE)
}
ggplot(U, aes(x=day, y=phome, color=trt, linetype=type)) + geom_line() +
    xl + yl + labs(color='', linetype='')

We see the same problem as before.

Checking Validity of Method for Unequal Time Spacing Using VIOLET 2

We saw above that it is not valid to use a simple model fitted on sparse times to estimate state occupancy probabilities at unobserved times. Let’s now check that estimates are valid when observation timings are irregular but estimates are requested at actually occurring times. For this purpose we use the VIOLET 2 trial, which had a 4-level ordinal outcome scale measured on days 1-28 (records terminate at death). We’ll fit a non-covariate-adjusted (except for treatment) model using only days 1 2 3 4 7 14 28. Then we will request state occupancy probabilities at only those days, then at all days. But first start with fitting using all days and estimating all days.

Something strange happened with regard to ventilator use on day 28, so restrict analysis to first 27 days.

dcf <- readRDS('dcf.rds')
setDT(dcf)
dcf <- dcf[day < 28]

# Compute empirical proportions of being at home for those starting In Hospital, not on vent
dfull <- dcf[, .(id, day, bstatus, status, treatment, ddeath)]
emp <- dfull[bstatus == 'In Hospital',
             .(phome = mean(status == 'Home', na.rm=TRUE)),
             by=.(treatment, day)]
emp[, type := 'Empirical']
emp[, day  := unclass(day)]

dv <- dfull[day <= ddeath, ]   # don't carry death forward
s <- dv$bstatus
levels(s) <- list('Vent/ARDS'='ARDS', 'Vent/ARDS'='On Vent',
                                    'In Hospital/Facility'='In Hospital')
dv[, table(bstatus, s)]
             s
bstatus       Vent/ARDS In Hospital/Facility
  ARDS             2062                    0
  On Vent          8828                    0
  In Hospital         0                22229
dv[, bstatus := s]
setkey(dv, id, day)
dv[, pstatus := shift(status), by=id]
dv[day == 1, pstatus := bstatus]
dv[, pstatus := pstatus[drop=TRUE]]   # drop unused previous status Dead
ggplot(dv, aes(x=day)) + geom_histogram(binwidth=1) + facet_wrap(~ pstatus)
dd <- datadist(dv); options(datadist='dd')

# d2 <- dv[bstatus == 'In Hospital/Facility', ]
# Find model with best AIC
for(k in 3 : 10) {
  h <- lrm(status ~ rcs(day, k) * pstatus + treatment, data=dv, maxit=40, tol=1e-14)
  if(k == 7) f27 <- h
  cat(k, 'knots\n')
  print(AIC(h))
}
3 knots
[1] 12877.46
4 knots
[1] 12842.47
5 knots
[1] 12832.9
6 knots
[1] 12826.01
7 knots
[1] 12819.5
8 knots
[1] 12823.38
9 knots
[1] 12822.33
10 knots
[1] 12823.41
# Use 7 knots
dat <- data.frame(pstatus='In Hospital/Facility')
  
#times <- c(1, 2, 3, 4, 7, 14, 28)
tlev  <- unique(dv$treatment)
ylev  <- levels(dv$status)
times <- 1:28

R <- emp
for(trt in tlev) {   # separately for treatments
    dat$treatment <- trt
    u <- uprob(f27, dat, times, ylev, pvarname='pstatus')
  cat('\n', trt, '\n\n')
  print(round(u, 3))
  w <- data.table(treatment=trt, day=times, p=u, phome=u[, 4], type='Model')
  R <- rbind(R, w, fill=TRUE)
}

 Placebo 

    Dead Vent/ARDS In Hospital/Facility  Home
1  0.000     0.018                0.925 0.056
2  0.001     0.027                0.846 0.127
3  0.002     0.030                0.757 0.211
4  0.003     0.031                0.663 0.304
5  0.004     0.030                0.574 0.392
6  0.005     0.029                0.499 0.467
7  0.007     0.029                0.440 0.525
8  0.008     0.029                0.394 0.569
9  0.010     0.029                0.359 0.603
10 0.012     0.029                0.328 0.631
11 0.014     0.029                0.303 0.655
12 0.016     0.029                0.280 0.676
13 0.018     0.028                0.259 0.695
14 0.020     0.028                0.241 0.711
15 0.022     0.028                0.225 0.725
16 0.024     0.028                0.212 0.737
17 0.026     0.028                0.199 0.747
18 0.028     0.029                0.188 0.755
19 0.030     0.029                0.178 0.763
20 0.032     0.029                0.169 0.770
21 0.034     0.028                0.161 0.776
22 0.036     0.028                0.154 0.782
23 0.038     0.027                0.147 0.787
24 0.040     0.027                0.141 0.792
25 0.042     0.026                0.135 0.797
26 0.045     0.025                0.129 0.801
27 0.047     0.025                0.123 0.805
28 0.049     0.024                0.117 0.809

 Vitamin D 

    Dead Vent/ARDS In Hospital/Facility  Home
1  0.000     0.018                0.925 0.056
2  0.001     0.027                0.845 0.127
3  0.002     0.030                0.756 0.212
4  0.003     0.030                0.662 0.305
5  0.004     0.030                0.573 0.394
6  0.005     0.029                0.497 0.468
7  0.007     0.028                0.438 0.527
8  0.008     0.028                0.393 0.570
9  0.010     0.029                0.357 0.604
10 0.012     0.029                0.327 0.632
11 0.014     0.029                0.301 0.657
12 0.016     0.028                0.278 0.678
13 0.018     0.028                0.258 0.697
14 0.019     0.028                0.240 0.713
15 0.021     0.027                0.224 0.727
16 0.023     0.028                0.210 0.739
17 0.025     0.028                0.198 0.749
18 0.028     0.028                0.187 0.757
19 0.030     0.029                0.177 0.765
20 0.032     0.028                0.168 0.772
21 0.034     0.028                0.160 0.778
22 0.036     0.028                0.153 0.784
23 0.038     0.027                0.146 0.789
24 0.040     0.026                0.140 0.794
25 0.042     0.026                0.134 0.798
26 0.044     0.025                0.128 0.803
27 0.046     0.025                0.122 0.807
28 0.048     0.024                0.116 0.811
ggplot(R, aes(x=day, y=phome, color=treatment, linetype=type)) + geom_line() +
    xl + yl + labs(color='', linetype='')
# Fit is barely better beyond day 17 if ran lrm on data table d2

The model would probably fit better had a partial proportional odds model been used to relax the proportional odds assumption for day.

Now fit a model as if data collected was restricted to certain days. Need to recompute lags to not let a lag be based on a data that’s no longer sampled.

times <- c(1, 2, 3, 4, 7, 14, 27)
# Compute empirical proportions of being at home for those starting In Hospital, not on vent
dfull <- dcf[day %in% times, .(id, day, bstatus, status, treatment, ddeath)]
emps <- dfull[bstatus == 'In Hospital',
                            .(phome = mean(status == 'Home', na.rm=TRUE)),
                 by=.(treatment, day)]
emps[, type := 'Empirical on selected days']
emps[, day  := unclass(day)]

dv <- dfull[day <= ddeath, ]   # don't carry death forward
s  <- dv[, bstatus]
levels(s) <- list('Vent/ARDS'='ARDS', 'Vent/ARDS'='On Vent',
                                    'In Hospital/Facility'='In Hospital')
dv[, table(bstatus, s)]
             s
bstatus       Vent/ARDS In Hospital/Facility
  ARDS              611                    0
  On Vent          2408                    0
  In Hospital         0                 5925
dv[, bstatus := s]
setkey(dv, id, day)
dv[, pstatus := shift(status), by=id]
dv[day == 1, pstatus := bstatus]
dv[, pstatus := pstatus[drop=TRUE]]   # drop unused previous status Dead
dv[, gap := ifelse(day == 1, 1, day - shift(day)), by=id]
dv[, table(day, gap)]
    gap
day     1    3    7   13
  1  1352    0    0    0
  2  1336    0    0    0
  3  1321    0    0    0
  4  1303    0    0    0
  7     0 1268    0    0
  14    0    0 1212    0
  27    0    0    0 1159
ggplot(dv, aes(x=day)) + geom_histogram(binwidth=1) + facet_wrap(~ pstatus)
dd <- datadist(dv); options(datadist='dd')

# d2 <- dv[bstatus == 'In Hospital/Facility', ]
# Find transformation giving the best AIC
# Can't use a spline because of too many tied times
for(k in c(0 : 3)) {
    cat('Transformation', k, '\n')
    form <- if(k == 0) status ~ (day + gap) * pstatus + treatment
    else
        if(k == 1) status ~ (sqrt(day) + sqrt(gap)) * pstatus + treatment
    else
        if(k == 2) status ~ (pol(day, 2) + pol(gap, 2)) * pstatus + treatment
    else
        status ~ (pol(sqrt(day), 2) + pol(sqrt(day), 2)) * pstatus + treatment
    h <- lrm(form,  data=dv, maxit=50, tol=1e-14)
    if(k == 2) fsel <- h   # choose quadratic
    print(AIC(h))
}
Transformation 0 
[1] 8098.67
Transformation 1 
[1] 8008.668
Transformation 2 
[1] 7864.54
Transformation 3 
[1] 7896.565
dat <- data.frame(pstatus='In Hospital/Facility')
  
tlev  <- unique(dv$treatment)
ylev  <- levels(dv$status)

R <- rbind(R, emps, fill=TRUE)
for(trt in tlev) {   # separately for treatments
    dat$treatment <- trt
    u <- uprob(fsel, dat, times, ylev, pvarname='pstatus', gap=TRUE)
  cat('\n', trt, '\n\n')
  print(round(u, 3))
  w <- data.table(treatment=trt, day=times, p=u, phome=u[, 4], type='Model selected days')
  R <- rbind(R, w, fill=TRUE)
}

 Placebo 

    Dead Vent/ARDS In Hospital/Facility  Home
1  0.000     0.039                0.884 0.077
2  0.003     0.059                0.781 0.157
3  0.006     0.066                0.684 0.245
4  0.009     0.064                0.585 0.342
7  0.011     0.051                0.412 0.526
14 0.012     0.029                0.253 0.705
27 0.013     0.025                0.172 0.790

 Vitamin D 

    Dead Vent/ARDS In Hospital/Facility  Home
1  0.000     0.038                0.883 0.079
2  0.002     0.057                0.779 0.161
3  0.005     0.064                0.680 0.251
4  0.009     0.062                0.580 0.349
7  0.011     0.049                0.405 0.536
14 0.011     0.028                0.246 0.714
27 0.013     0.024                0.166 0.798
times <- 1:28
dat <- data.frame(pstatus='In Hospital/Facility')
for(trt in tlev) {   # separately for treatments
    dat$treatment <- trt
    u <- uprob(fsel, dat, times, ylev, pvarname='pstatus', gap=TRUE)
  cat('\n', trt, '\n\n')
  print(round(u, 3))
  w <- data.table(treatment=trt, day=times, p=u, phome=u[, 4], type='Model selected days, predict all')
  R <- rbind(R, w, fill=TRUE)
}

 Placebo 

    Dead Vent/ARDS In Hospital/Facility  Home
1  0.000     0.039                0.884 0.077
2  0.003     0.059                0.781 0.157
3  0.006     0.066                0.684 0.245
4  0.009     0.064                0.585 0.342
5  0.013     0.069                0.774 0.145
6  0.051     0.168                0.585 0.195
7  0.258     0.141                0.402 0.199
8  0.471     0.116                0.231 0.181
9  0.668     0.093                0.104 0.134
10 0.819     0.072                0.035 0.074
11 0.911     0.052                0.009 0.028
12 0.958     0.032                0.002 0.008
13 0.981     0.017                0.000 0.002
14 0.993     0.006                0.000 0.000
15 0.998     0.002                0.000 0.000
16 1.000     0.000                0.000 0.000
17 1.000     0.000                0.000 0.000
18 1.000     0.000                0.000 0.000
19 1.000     0.000                0.000 0.000
20 1.000     0.000                0.000 0.000
21 1.000     0.000                0.000 0.000
22 1.000     0.000                0.000 0.000
23 1.000     0.000                0.000 0.000
24 1.000     0.000                0.000 0.000
25 1.000     0.000                0.000 0.000
26 1.000     0.000                0.000 0.000
27 1.000     0.000                0.000 0.000
28 1.000     0.000                0.000 0.000

 Vitamin D 

    Dead Vent/ARDS In Hospital/Facility  Home
1  0.000     0.038                0.883 0.079
2  0.002     0.057                0.779 0.161
3  0.005     0.064                0.680 0.251
4  0.009     0.062                0.580 0.349
5  0.012     0.067                0.773 0.148
6  0.050     0.169                0.581 0.199
7  0.261     0.142                0.396 0.201
8  0.476     0.116                0.226 0.182
9  0.673     0.094                0.101 0.133
10 0.822     0.073                0.033 0.072
11 0.912     0.052                0.008 0.027
12 0.957     0.033                0.002 0.007
13 0.981     0.017                0.001 0.002
14 0.993     0.007                0.000 0.000
15 0.998     0.002                0.000 0.000
16 1.000     0.000                0.000 0.000
17 1.000     0.000                0.000 0.000
18 1.000     0.000                0.000 0.000
19 1.000     0.000                0.000 0.000
20 1.000     0.000                0.000 0.000
21 1.000     0.000                0.000 0.000
22 1.000     0.000                0.000 0.000
23 1.000     0.000                0.000 0.000
24 1.000     0.000                0.000 0.000
25 1.000     0.000                0.000 0.000
26 1.000     0.000                0.000 0.000
27 1.000     0.000                0.000 0.000
28 1.000     0.000                0.000 0.000
ggplot(R, aes(x=day, y=phome, color=treatment, linetype=type)) + geom_line() +
    xl + yl + labs(color='', linetype='')

This confirms that the Markov model likely handles uneven time spacings, but predictions should be limited to days appearing in the data. As before the problem is due to our attempt to make completely out-of-sample predictions on days not in the data when (1) both day and gap are important but (2) day and gap are collinear.

Better Modeling of Gap Effect in VIOLET 2

The completeness of VIOLET daily follow-up allows us to do another natural experiment where we create a full distribution of gaps by randomly sampling days within patients. For each patient randomly select a day t between 2 and L = last day of data - 6 days. Generate a random gap time r between 1 and L - 2. Remove days t, t+1, … t+r-1.

w <- dcf
w$gap <- NULL
setkey(w, id, day)
w <- w[!is.na(status), ]
# Recode baseline status like follow-up status
levels(w$bstatus) <- list('Vent/ARDS'='ARDS', 'Vent/ARDS'='On Vent',
                                            'In Hospital/Facility'='In Hospital')
w[, .(died=any(status=='Dead')), by=id][, table(died)]
died
FALSE  TRUE 
 1156   195 
u <- data.table(id=unique(w$id), key='id')
n <- nrow(u)
set.seed(1)
u[, start := sample(2 : 20, n, replace=TRUE)]
u[, width := sample(1 : 15, n, replace=TRUE)]
u[, end   := start + width - 1]
u[end < 27, table(start, width)] 
     width
start  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15
   2   4  8  3  1  5  5  6  3  5  6  3  7  6  5  2
   3   6  4  4  7  6  5  4  3  6  2  6  6 13  3  2
   4   5  6  6  3  9  3  3  5  4  3  6  8  4  2  2
   5   6  4  5  4  4  6  5  5  7  4  1  3  8  3  3
   6   3  6  3  0  4  2  3  9  7  7  8  8  9  2  4
   7   1  3  5  2  3  6  7  5  6  7  5  4  7  3  4
   8   2  9  4  8  3  3  3  5  6  1  2  5  6  9  5
   9   3  3  4  5  8  3  5  4  8  5  7  3  7  4  5
   10 10  6  5  4  4  6  2  4  5  7  5  5  4  7  8
   11  4  7  4  5  6  7 10  6  6  9  3  3  4  2  1
   12  5  2  6  2  4  3  8  1  3  6  4  8  4  6  4
   13  6  3  6  4  6  4  2  4  3  8  2  7  7  2  0
   14  3  7  7  7  4  7  3  4  5  5  5  4  4  0  0
   15  8  4  8  8  7  2  7  4  4  5  4  4  0  0  0
   16  5  3  3  5  6  2  2  6  4  5  2  0  0  0  0
   17  3  9  4  6  7  1  5  4  2 10  0  0  0  0  0
   18  1  4  7  5  4  6  5  2  5  0  0  0  0  0  0
   19  3  6  4  2  0  3  4  3  0  0  0  0  0  0  0
   20  6  4  6  4  3  4  9  0  0  0  0  0  0  0  0
z <- w[u]
w <- z[end >= 27 | (day < start | day > end),]
# Recreate previous state and gap with new gaps
w[, pstatus := ifelse(day == 1, as.character(bstatus), shift(as.character(status))), by=id]
w[, gap     := ifelse(day == 1, 1L, day - shift(day)), by=id]
# Don't carry death forward 
w[,  nd := cumsum(status == 'Dead'), by=id]
w <- w[nd < 2, ]
w[, sum(nd > 0)]    # original number of deaths
[1] 195
w[day > 1, table(gap)]
gap
    1     2     3     4     5     6     7     8     9    10    11    12    13 
22788    76    92    84    77    89    73    85    73    78    84    61    71 
   14    15    16 
   78    45    36 
with(w, ggfreqScatter(day, gap))
dd <- datadist(w); options(datadist='dd')
f0 <- lrm(status ~ pstatus + rcs(day, 7) + treatment + rcs(age, 4) + sofa + lips,
         data=w, tol=1e-13)
f1 <- lrm(status ~ rcs(gap, 4) + pstatus + rcs(day, 4) + treatment + rcs(age, 4) + sofa + lips,
         data=w, tol=1e-13)
f2 <- lrm(status ~ gap * pstatus + rcs(day, 4) + treatment + rcs(age, 4) + sofa + lips,
         data=w, tol=1e-13)
f3 <- lrm(status ~ lsp(gap, 2) * pstatus + rcs(day, 4) + treatment + rcs(age, 4) + sofa + lips,
         data=w, tol=1e-13)
f4 <- lrm(status ~ lsp(gap, c(2,4)) * pstatus + rcs(day, 4) + treatment + rcs(age, 4) + sofa + lips,
         data=w, tol=1e-13)
f5 <- lrm(status ~ rcs(gap, 4) * pstatus + rcs(day, 4) + treatment + rcs(age, 4) + sofa + lips,
         data=w, tol=1e-13)
c(f0=AIC(f0), f1=AIC(f1), f2=AIC(f2), f3=AIC(f3), f4=AIC(f4), f5=AIC(f5))
      f0       f1       f2       f3       f4       f5 
12012.40 11652.23 11625.38 11597.44 11601.45 11600.24 
ggplot(Predict(f1, gap, conf.int=FALSE))
ggplot(Predict(f5, gap, pstatus, conf.int=FALSE))

anova(f5)
Wald Statistics for status
χ2 d.f. P
gap (Factor+Higher Order Factors) 504.72 9 <0.0001
All Interactions 129.33 6 <0.0001
Nonlinear (Factor+Higher Order Factors) 38.42 6 <0.0001
pstatus (Factor+Higher Order Factors) 5172.52 8 <0.0001
All Interactions 129.33 6 <0.0001
day 112.31 3 <0.0001
Nonlinear 19.27 2 <0.0001
treatment 0.40 1 0.5253
age 47.59 3 <0.0001
Nonlinear 2.34 2 0.3098
sofa 29.49 1 <0.0001
lips 7.27 1 0.0070
gap × pstatus (Factor+Higher Order Factors) 129.33 6 <0.0001
Nonlinear 15.44 4 0.0039
Nonlinear Interaction : f(A,B) vs. AB 15.44 4 0.0039
TOTAL NONLINEAR 64.92 10 <0.0001
TOTAL NONLINEAR + INTERACTION 179.57 12 <0.0001
TOTAL 5683.23 20 <0.0001

Before drawing conclusions let’s handle non-PO in time.

require(VGAM)
fpol <- vglm(ordered(status) ~ pstatus + day + treatment + rcs(age, 4) + sofa + lips,
          cumulative(reverse=TRUE, parallel=TRUE), data=w)
fpo3 <- vglm(ordered(status) ~ pstatus + rcs(day, 3) + treatment + rcs(age, 4) + sofa + lips,
          cumulative(reverse=TRUE, parallel=TRUE), data=w)
fpo5 <- vglm(ordered(status) ~ pstatus + rcs(day, 5) + treatment + rcs(age, 4) + sofa + lips,
          cumulative(reverse=TRUE, parallel=TRUE), data=w)
f0l <- vglm(ordered(status) ~ pstatus + day + treatment + rcs(age, 4) + sofa + lips,
          cumulative(reverse=TRUE, parallel=FALSE ~ day), data=w)
f03 <- vglm(ordered(status) ~ pstatus + rcs(day, 3) + treatment + rcs(age, 4) + sofa + lips,
          cumulative(reverse=TRUE, parallel=FALSE ~ rcs(day, 3)), data=w) 
f05 <- vglm(ordered(status) ~ pstatus + rcs(day, 5) + treatment + rcs(age, 4) + sofa + lips,
          cumulative(reverse=TRUE, parallel=FALSE ~ rcs(day, 5)), data=w)
f1 <- vglm(ordered(status) ~ pstatus + rcs(gap, 4) + rcs(day, 3) + treatment + rcs(age, 4) + sofa + lips,
          cumulative(reverse=TRUE, parallel=FALSE ~ rcs(day, 3)), data=w)
f2 <- vglm(ordered(status) ~ gap * pstatus + rcs(day, 3) + treatment + rcs(age, 4) + sofa + lips,
          cumulative(reverse=TRUE, parallel=FALSE ~ rcs(day, 3)), data=w)
f3 <- vglm(ordered(status) ~ rcs(gap, 3) * pstatus + rcs(day, 3) + treatment + rcs(age, 4) + sofa + lips,
          cumulative(reverse=TRUE, parallel=FALSE ~ rcs(day, 3)), data=w)
f4 <- vglm(ordered(status) ~ rcs(gap, 4) * pstatus + rcs(day, 3) + treatment + rcs(age, 4) + sofa + lips,
          cumulative(reverse=TRUE, parallel=FALSE ~ rcs(day, 3)), data=w)

c(fpol=AIC(fpol), fpo3=AIC(fpo3), fpo5=AIC(fpo5), f0l=AIC(f0l), f03=AIC(f03), f05=AIC(f05), f1=AIC(f1), f2=AIC(f2), f3=AIC(f3), f4=AIC(f4))
    fpol     fpo3     fpo5      f0l      f03      f05       f1       f2 
12044.31 12023.26 12023.96 12039.35 11988.62 11992.68 11639.42 11612.38 
      f3       f4 
11590.84 11592.26 

Comparing AICs indicates the following:

  • fpol vs. fpo3: effect of day is nonlinear in a PO model
  • fpo3 vs. fpo5: 3 knots are adequate for day in PO model
  • fpo5 vs. f05: moderate non-PO for day
  • f05 vs. f0l: non-PO effect of day or main effect of day or both are nonlinear
  • f05 vs. f03: 3 knots works better for day for combined main and non-PO effects
  • f1 vs. f03: main effect is needed for gap
  • f2 vs. f1: previous state interacts weekly with gap
  • f2, f3,f4: previous state interaction with gap is better represented by 3 knots

Odds Ratios for ORCHID

Now get treatment odds ratios for state occupancy probabilities, as a function of time, as opposed to odds of transitions over consecutive time periods.

odds <- function(x) x / (1. - x)
setDT(Rorchid, key=c('trt', 'day'))
b    <- Rorchid[trt == 'hydroxychloroquine', phome]
a    <- Rorchid[trt == 'placebo',            phome]
days <- Rorchid[trt == 'placebo',            day]
or   <- odds(b) / odds(a)
plot(days, or, type='b', xlab='Day', ylab='hydroxy : placebo OR', ylim=c(1, 1.05))

abline(h=exp(coef(f)['trt=hydroxychloroquine']), col=gray(0.85))
text(27, 1.007,
         'Horizontal line is conditional\ntransition OR from unified PO model', adj=1)

Bayesian Markov Model for ORCHID

We parallel the regular proportional odds model with a Bayesian Markov proportional hazards model.

stanSet()
bmarkov <- blrm(y ~ gap * yprev + pol(day, 2) + trt + rcs(age, 3) +
                sofa_nogcs,
                data=d, refresh=100, file='bmarkov.rds')
htmlVerbatim(stanDx(bmarkov))

Iterations: 2000 on each of 4 chains, with 4000 posterior distribution samples saved

For each parameter, n_eff is a crude measure of effective sample size and Rhat is the potential scale reduction factor on split chains (at convergence, Rhat=1)

                        n_eff  Rhat
 y>=2                    2662 1.001
 y>=3                    2219 1.001
 y>=4                    2691 1.000
 y>=5                    3632 1.001
 y>=6                    3459 1.001
 y>=7                    2280 1.001
 gap                     2625 1.000
 yprev=3                 3087 1.000
 yprev=4                 3970 1.000
 yprev=5                 3637 1.001
 yprev=6                 2769 1.001
 yprev=7                 2578 1.000
 day                     5192 1.000
 day^2                   5053 1.000
 trt=hydroxychloroquine  5442 1.000
 age                     5505 1.000
 age'                    5513 1.000
 sofa_nogcs              4865 1.000
 gap * yprev=3           4791 1.000
 gap * yprev=4           4223 1.000
 gap * yprev=5           5252 0.999
 gap * yprev=6           5020 0.999
 gap * yprev=7           5482 0.999
 
stanDxplot(bmarkov)
bmarkov

Bayesian Proportional Odds Ordinal Logistic Model

Dirichlet Priors With Concentration Parameter 0.308 for Intercepts

 blrm(formula = y ~ gap * yprev + pol(day, 2) + trt + rcs(age, 
     3) + sofa_nogcs, data = d, refresh = 100, file = "bmarkov.rds")
 

Frequencies of Responses

   1   2   3   4   5   6   7 
  50 357 297 817 526 757 489 
 
Mixed Calibration/
Discrimination Indexes
Discrimination
Indexes
Rank Discrim.
Indexes
Obs 3293 B 0.084 [0.082, 0.085] g 4.179 [4.02, 4.356] C 0.908 [0.907, 0.908]
Draws 4000 gp 0.442 [0.437, 0.447] Dxy 0.816 [0.814, 0.817]
Chains 4 EV 0.625 [0.604, 0.644]
p 17 v 13.45 [12.436, 14.582]
vp 0.155 [0.151, 0.161]
Mode β Mean β Median β S.E. Lower Upper Pr(β>0) Symmetry
y≥2   2.4622   2.4679   2.4675  0.3170   1.8609   3.0937  1.0000  1.03
y≥3   -1.0702   -1.0803   -1.0814  0.3078   -1.6511   -0.4478  0.0000  0.96
y≥4   -2.9740   -2.9829   -2.9839  0.3222   -3.6155   -2.3538  0.0000  1.00
y≥5   -6.0546   -6.0639   -6.0613  0.3389   -6.7592   -5.4398  0.0000  0.98
y≥6   -7.8915   -7.8991   -7.8943  0.3454   -8.5890   -7.2479  0.0000  0.98
y≥7  -10.7955  -10.8085  -10.7999  0.3591  -11.5214  -10.1165  0.0000  0.97
gap   -0.2174   -0.2170   -0.2165  0.0872   -0.3858   -0.0496  0.0058  0.99
yprev=3   2.2423   2.2524   2.2523  0.2273   1.8178   2.6953  1.0000  1.03
yprev=4   4.8495   4.8570   4.8511  0.2362   4.4060   5.3202  1.0000  1.01
yprev=5   7.2669   7.2735   7.2694  0.2573   6.7586   7.7535  1.0000  1.01
yprev=6   9.3374   9.3438   9.3395  0.2811   8.8091   9.9018  1.0000  1.01
yprev=7   9.3061   9.3066   9.3182  0.5846   8.1136   10.4645  1.0000  0.99
day   0.2781   0.2788   0.2787  0.0400   0.2036   0.3577  1.0000  1.01
day2   -0.0052   -0.0052   -0.0052  0.0008   -0.0068   -0.0038  0.0000  1.00
trt=hydroxychloroquine   0.0113   0.0126   0.0116  0.0694   -0.1308   0.1394  0.5685  0.98
age   -0.0092   -0.0092   -0.0093  0.0050   -0.0185   0.0007  0.0330  1.01
age’   -0.0031   -0.0031   -0.0031  0.0058   -0.0147   0.0078  0.2962  1.02
sofa_nogcs   -0.0901   -0.0900   -0.0903  0.0204   -0.1294   -0.0495  0.0000  1.02
gap × yprev=3   0.2080   0.2063   0.2079  0.0618   0.0791   0.3202  0.9998  0.96
gap × yprev=4   0.3269   0.3258   0.3255  0.0390   0.2493   0.3995  1.0000  0.98
gap × yprev=5   0.2403   0.2396   0.2397  0.0394   0.1638   0.3170  1.0000  1.00
gap × yprev=6   0.0943   0.0940   0.0942  0.0329   0.0327   0.1580  0.9975  0.99
gap × yprev=7   0.2276   0.2288   0.2284  0.0570   0.1243   0.3456  1.0000  1.02

We need to be able to compute posterior distributions on state occupancy probabilities to take into account all sources of uncertainty. The follow function computes all needed derived quantities, separately for each posterior draw.

# Compute unconditional probabilities
# Output is an array of dimension number of draws by s rows corresponding to
# t=times(1), times(2), ...
# times(s), and final dimension is the number of Y categories
# times(1) must be 1
# 
# fit is the Bayesian model fit from blrm which must contain the variable named day and the
# previous state, which must be named by the value of pvarname
# data is a data frame or data table that contains covariate settings
# other than day and what's in pvarname.  data has one row.
# times is the vector of days for which to evaluate state occupancy probabilities
# ylevels are the ordered levels of the outcome variable
# Set gap=TRUE to compute time gaps when predictor gap is in the model

uprobB <- function(fit, data, times, ylevels, pvarname='yprev', gap=FALSE) {
    if(! length(fit$draws)) stop('fit does not have posterior draws')
    nd <- nrow(fit$draws)
  k  <- length(ylevels)
  s  <- length(times)
  P  <- array(NA, c(nd, s, k),
                        dimnames=list(paste('draw', 1 : nd), as.character(times), 
                                                    as.character(ylevels)))
  # Never uncondition on initial state
  data$day <- times[1]
  if(gap) data$gap <- times[1]
  p <- predict(fit, data, type='fitted.ind', posterior.summary='all')
  P[, 1, ] <- p
  # cp: matrix of conditional probabilities of Y conditioning on previous time Y
  # Columns = k conditional probabilities conditional on a single previous state
  # Rows    = all possible previous states
  # This is for a single posterior draw
  cp <- matrix(NA, nrow=k, ncol=k, 
               dimnames=list(paste('t-1', ylevels), paste('t', ylevels)))
  data <- as.list(data)
  data[[pvarname]] <- as.character(ylevels[-1])
  # don't request estimates after absorbing state
  edata <- expand.grid(data)
  for(it in 2 : s) {
    edata$day <- times[it]
    if(gap) edata$gap <- times[it] - times[it - 1]
    # y=first level = absorbing state
    pp <- predict(fit, edata, type='fitted.ind', posterior.summary='all')
    for(idraw in 1 : nd) {
        cp <- rbind(c(1., rep(0., k - 1)), pp[idraw, ,])
      # Compute unconditional probabilities of being in all possible states
      # at current time t
      P[idraw, it, ] <- t(cp) %*% P[idraw, it - 1, ]
    }
  }
  P
}

times <- c(1,2,3,4,7,14,28)
ylev <- 1 : 7
tlev <- levels(d$trt)
dat <- data.frame(yprev='4', age=repcov$age, sofa_nogcs=repcov$sofa_nogcs)
R   <- list()
for(trt in tlev) {   # separately for treatments
    dat$trt <- trt
    u <- uprobB(bmarkov, dat, times, ylev, pvarname='yprev', gap=TRUE)
    R[[trt]] <- u
}

# Compute summary measure per treatment, separately for each time and posterior draw
# P(home) = P(Y=6 or Y=7)

s <- function(u) u[, 6] + u[, 7]
# w <- apply(R$placebo, 1, s)    # to test
r <- lapply(R, function(x) apply(x, 1, s))

For each treatment and time compute the posterior mean P(home) and 0.95 highest posterior density intervals.

g <- function(x) {
    w <- c(mean(x), HPDint(x))
    names(w) <- c('mean', 'lower', 'upper')
    w
}

rs <- lapply(r, function(x) as.data.frame(t(apply(x, 1, g))))
for(trt in tlev) rs[[trt]]$trt <- trt
rs <- do.call(rbind, rs)
# Fetch day from end of row names (follows .)
rs$day <- as.numeric(sub('^.*\\.(.*?)', '\\2', rownames(rs), ))
xl <- xlab('Day'); yl <- ylab('P(being at home)')
rs$txt <- with(rs, paste0(trt, '<br>Day:', day, '<br>', round(mean, 3)))
gg <- ggplot(rs, aes(x=day, y=mean, color=trt, label=txt)) + geom_line() + 
    geom_ribbon(aes(ymin = lower, ymax = upper), alpha=0.2, linetype=0) +
    xl + yl + labs(color='')
ggp(gg)

Instead of individual per-treatment estimates summarize the difference and odds ratio for state occupancy probabilities.

Dif <- r$hydroxychloroquine - r$placebo
dif <- as.data.frame(t(apply(Dif, 1, g)))
dif$day <- as.numeric(sub('^.*\\.(.*?)', '\\2', rownames(dif), ))
dif$txt <- with(dif, paste0('Day:', day, '<br>', round(mean, 3)))
gg <- ggplot(dif, aes(x=day, y=mean, label=txt)) + geom_line() +
    geom_ribbon(aes(ymin=lower, ymax=upper), alpha=0.2, linetype=0) +
    xl + ylab('Hydroxy - Placebo Difference in P(home)')
ggp(gg)
or <- odds(r$hydroxychloroquine) / odds(r$placebo)
or <- as.data.frame(t(apply(or, 1, g)))
or$day <- as.numeric(sub('^.*\\.(.*?)', '\\2', rownames(or), ))
or$txt <- with(or, paste0('Day:', day, '<br>', round(mean, 3)))
gg <- ggplot(or, aes(x=day, y=mean, label=txt)) + geom_line() +
    geom_ribbon(aes(ymin=lower, ymax=upper), alpha=0.2, linetype=0) +
    xl + ylab('Hydroxy - Placebo OR for Being at Home')
ggp(gg)

Evidence for Efficacy Over Time for One Covariate Combination

Now compute the Bayesian evidence measure for a positive difference in P(home). Note that in the two scatterplots relating posterior draws for differences in occupancy probabilities (P(home)), plotting one day vs. another, no points are on the upper left or lower right quadrants, so the evidence for a positive treatment effect does not vary over the days of follow-up.

plot(Dif[1, ], Dif[5, ], xlab='Posterior Draws for Day 1', ylab='Day 7', cex=.3)
abline(h=0, v=0, col=gray(0.8))
plot(Dif[3, ], Dif[6, ], xlab='Posterior Draws for Day 3', ylab='Day 14', cex=.3)
abline(h=0, v=0, col=gray(0.8))
pp <- apply(Dif, 1, function(x) mean(x > 0))
w <- data.frame(day = as.integer(names(pp)), pp)
w$txt <- with(w, paste0('Day:', day, '<br>', round(pp, 3)))
gg <- ggplot(w, aes(x=day, y=pp, label=txt)) + geom_line() +
    xl + ylab('P(increase in P(home) with hydroxychloroquine)')
ggp(gg)

The evidence does vary when computing the posterior probability that treatment has a benefit exceeding an absolute risk difference of 0.02. This is primarily due to the near impossibility of being discharged on the first or second day.

pp <- apply(Dif, 1, function(x) mean(x > 0.02))
w <- data.frame(day = as.integer(names(pp)), pp)
w$txt <- with(w, paste0('Day:', day, '<br>', round(pp, 3)))
gg <- ggplot(w, aes(x=day, y=pp, label=txt)) + geom_line() +
    xl + ylab('P(increase in P(home) with hydroxychloroquine > 0.02)')
ggp(gg)

Posterior Distributions of Median and Restricted Mean Time to Home

Even though the data collection times in ORCHID were not granular enough to compute median time until the patient returned home, we compute this anyway, and compute per-treatment posterior means and discrete posterior probability distributions for these medians. For a given posterior sample, the median is the first day for which P(home) >= 0.5 for a treatment. We arbitrary use 30d as the median when the probability never exceeds 0.5. Recall that all these calculations are conditional on the patient being in state 4 (hospitalized on supplemental oxygen) pre-randomization.

days <- c(1,2,3,4,7,14,28)
g <- function(p) if(all(p < 0.5)) 30 else min(days[p >= 0.5])
rmed <- lapply(r, function(x) data.frame(med = apply(x, 2, g)))
for(trt in tlev) rmed[[trt]]$trt <- trt
rmed <- do.call(rbind, rmed)
with(rmed, tapply(med, trt, mean))   # posterior mean for median
hydroxychloroquine            placebo 
                14                 14 
with(rmed, table(med, trt))          # discrete distribution for median
    trt
med  hydroxychloroquine placebo
  14               4000    4000

The sharpness of the posterior distribution of the median time to home is just a reflection of what an easy question we are asking: what is the median of time to home after essentially rounding it to the nearest 7d.

Follow-up was not long enough to compute the mean time until returning home, but we can do inference for the mean restricted time until home (restricted to 28d) assuming that home is an absorbing state (which is approximately but not exactly true). Let \(t_{i}, i=1, 2, \dots, 7\) represent the assessment times (days 1, 2, 3, 4, 7, 14, 28) for time until home \(T\). Under the \(T \leq 28\) restriction, the mean restricted time to home is \(\sum_{i=1}^{7} t_{i} \Pr(T = t_{i}) = \sum_{i=1}^{7} t_{i} [\Pr(T \leq t_{i}) - [i > 1] \Pr(T \leq t_{i-1})]\).

g <- function(p) sum(days * (p - c(0., p[-length(p)])))
rmean <- lapply(r, function(x) data.frame(mean = apply(x, 2, g)))
for(trt in tlev) rmean[[trt]]$trt <- trt
rmean <- do.call(rbind, rmean)
with(rmean, tapply(mean, trt, mean))   # posterior means for mean restricted time to home
hydroxychloroquine            placebo 
          10.55112           10.57485 
ggplot(rmean, aes(x=mean, color=trt)) + geom_density() +
    labs(color='') + xlab('Restricted Mean Time to Home (days)') + ylab('')

Compute the posterior probability that the mean restricted time to home is shorter for hydroxy.

a <- rmean[grep('placebo', rownames(rmean)), 'mean']
b <- rmean[grep('hydrox',  rownames(rmean)), 'mean']
plot(density(b - a), main='', xlab='Difference in Mean Restricted Time to Home', ylab='')
abline(v=0, col=gray(0.85))
mean(b < a)
[1] 0.56825

Markov Model with Random Effects

Let’s see what difference it makes to add patient-level random effects to the Markov state transition model.

bmarkovre <- blrm(y ~ gap * yprev + pol(day, 2) + trt + rcs(age, 3) +
                sofa_nogcs + cluster(id),
                data=d, refresh=100, file='bmarkovre.rds')
htmlVerbatim(stanDx(bmarkovre))

Iterations: 2000 on each of 4 chains, with 4000 posterior distribution samples saved

For each parameter, n_eff is a crude measure of effective sample size and Rhat is the potential scale reduction factor on split chains (at convergence, Rhat=1)

                        n_eff  Rhat
 y>=2                    2058 1.000
 y>=3                    1266 1.002
 y>=4                    1485 1.001
 y>=5                    2911 1.000
 y>=6                    2458 1.000
 y>=7                    1585 1.001
 gap                     1950 1.001
 yprev=3                 1880 1.000
 yprev=4                 3299 1.000
 yprev=5                 2750 1.001
 yprev=6                 1627 1.002
 yprev=7                 2037 1.000
 day                     5880 0.999
 day^2                   5076 1.001
 trt=hydroxychloroquine  5040 0.999
 age                     4770 1.001
 age'                    5632 0.999
 sofa_nogcs              5487 0.999
 gap * yprev=3           4795 1.000
 gap * yprev=4           5105 0.999
 gap * yprev=5           4534 0.999
 gap * yprev=6           5255 1.000
 gap * yprev=7           5623 0.999
 sigmag                  1384 1.000
 
stanDxplot(bmarkovre)
bmarkovre

Bayesian Proportional Odds Ordinal Logistic Model

Dirichlet Priors With Concentration Parameter 0.308 for Intercepts

 blrm(formula = y ~ gap * yprev + pol(day, 2) + trt + rcs(age, 
     3) + sofa_nogcs + cluster(id), data = d, refresh = 100, file = "bmarkovre.rds")
 

Frequencies of Responses

   1   2   3   4   5   6   7 
  50 357 297 817 526 757 489 
 
Mixed Calibration/
Discrimination Indexes
Discrimination
Indexes
Rank Discrim.
Indexes
Obs 3293 B 0.084 [0.082, 0.085] g 4.184 [4.003, 4.327] C 0.908 [0.907, 0.908]
Draws 4000 gp 0.442 [0.437, 0.447] Dxy 0.816 [0.814, 0.817]
Chains 4 EV 0.626 [0.604, 0.644]
p 17 v 13.469 [12.35, 14.435]
Cluster on id vp 0.156 [0.15, 0.16]
Clusters 479
σγ 0.0532 [1e-04, 0.1583]
Mean β Median β S.E. Lower Upper Pr(β>0) Symmetry
y≥2   2.4855   2.4782  0.3151   1.8645   3.0789  1.0000  1.00
y≥3   -1.0601   -1.0671  0.3088   -1.6477   -0.4493  0.0003  1.01
y≥4   -2.9656   -2.9664  0.3227   -3.5882   -2.3428  0.0000  1.03
y≥5   -6.0504   -6.0497  0.3357   -6.7091   -5.3733  0.0000  1.06
y≥6   -7.8895   -7.8913  0.3462   -8.6008   -7.2125  0.0000  1.03
y≥7  -10.8014  -10.8012  0.3590  -11.4987  -10.0604  0.0000  1.04
gap   -0.2173   -0.2159  0.0838   -0.3727   -0.0480  0.0048  1.03
yprev=3   2.2465   2.2485  0.2255   1.8144   2.6793  1.0000  0.99
yprev=4   4.8489   4.8438  0.2330   4.3954   5.3076  1.0000  1.02
yprev=5   7.2664   7.2617  0.2616   6.7623   7.7755  1.0000  1.01
yprev=6   9.3376   9.3361  0.2844   8.7519   9.8522  1.0000  0.98
yprev=7   9.2789   9.2782  0.5826   8.0628   10.3473  1.0000  1.02
day   0.2793   0.2797  0.0387   0.2083   0.3589  1.0000  1.04
day2   -0.0052   -0.0052  0.0007   -0.0067   -0.0038  0.0000  1.00
trt=hydroxychloroquine   0.0118   0.0111  0.0684   -0.1196   0.1475  0.5662  1.01
age   -0.0093   -0.0092  0.0049   -0.0183   0.0011  0.0310  1.00
age’   -0.0031   -0.0031  0.0057   -0.0150   0.0076  0.2855  1.00
sofa_nogcs   -0.0913   -0.0909  0.0207   -0.1296   -0.0490  0.0000  0.93
gap × yprev=3   0.2067   0.2060  0.0608   0.0911   0.3231  0.9995  1.01
gap × yprev=4   0.3279   0.3274  0.0386   0.2517   0.4014  1.0000  1.01
gap × yprev=5   0.2417   0.2421  0.0397   0.1665   0.3205  1.0000  0.98
gap × yprev=6   0.0954   0.0950  0.0338   0.0319   0.1614  0.9972  1.03
gap × yprev=7   0.2321   0.2324  0.0575   0.1205   0.3480  1.0000  1.03

The very low standard deviation of the random effects (0.053) lends credence to the idea that the state transitions are conditionally independent and that random effects are not needed. Also, the posterior mean and median regression parameter values and standard deviations of their posterior distributions changed very little.

Serial Transition Model With Non-Proportional Odds

Return to the Markov model without random effects. Let’s use a partial proportional odds model in which we add departures from the proportional odds assumption with regard to two variables: previous state and time (but not their interaction). We are using the constrained partial proportional odds model in which the constraint imposes linearity in the outcome category (but not for the effect of the previous outcome level).

bmarkovppo <- blrm(y ~ gap * yprev + pol(day, 2) + trt + rcs(age, 3) +
                sofa_nogcs, ~ day + yprev, cppo=function(y) y,
                data=d, refresh=100, file='bmarkovppo.rds')
htmlVerbatim(stanDx(bmarkovppo))

Iterations: 2000 on each of 4 chains, with 4000 posterior distribution samples saved

For each parameter, n_eff is a crude measure of effective sample size and Rhat is the potential scale reduction factor on split chains (at convergence, Rhat=1)

                        n_eff  Rhat
 y>=2                     524 1.009
 y>=3                     480 1.008
 y>=4                     441 1.009
 y>=5                     589 1.005
 y>=6                     795 1.001
 y>=7                     812 1.003
 gap                      487 1.010
 yprev=3                  978 1.004
 yprev=4                  595 1.006
 yprev=5                  761 1.004
 yprev=6                  902 1.005
 yprev=7                  631 1.009
 day                     2249 1.000
 day^2                   1787 1.001
 trt=hydroxychloroquine  2581 1.000
 age                     2304 1.002
 age'                    2419 1.000
 sofa_nogcs              1996 1.001
 gap * yprev=3           1883 1.003
 gap * yprev=4           2312 0.999
 gap * yprev=5           2078 1.000
 gap * yprev=6           1248 1.001
 gap * yprev=7           1702 1.001
 day x f(y)               485 1.008
 yprev=3 x f(y)           886 1.002
 yprev=4 x f(y)           519 1.008
 yprev=5 x f(y)          1056 1.003
 yprev=6 x f(y)           852 1.005
 yprev=7 x f(y)           549 1.008
 
stanDxplot(bmarkovppo)
bmarkovppo

Bayesian Constrained Partial Proportional Odds Ordinal Logistic Model

Dirichlet Priors With Concentration Parameter 0.308 for Intercepts

 blrm(formula = y ~ gap * yprev + pol(day, 2) + trt + rcs(age, 
     3) + sofa_nogcs, ppo = ~day + yprev, cppo = function(y) y, 
     data = d, refresh = 100, file = "bmarkovppo.rds")
 

Frequencies of Responses

   1   2   3   4   5   6   7 
  50 357 297 817 526 757 489 
 
Mixed Calibration/
Discrimination Indexes
Discrimination
Indexes
Rank Discrim.
Indexes
Obs 3293 B 0.081 [0.08, 0.082] g 4.556 [4.243, 4.85] C 0.886 [0.88, 0.894]
Draws 4000 gp 0.447 [0.441, 0.452] Dxy 0.773 [0.76, 0.788]
Chains 4 EV 0.645 [0.624, 0.667]
p 17 v 15.982 [13.802, 18.078]
vp 0.16 [0.155, 0.165]
Mode β Mean β Median β S.E. Lower Upper Pr(β>0) Symmetry
y≥2   2.8421   2.8598   2.8643  0.3338   2.2264   3.5288  1.0000  1.00
y≥3   -1.4108   -1.4103   -1.4075  0.3250   -2.0104   -0.7244  0.0000  1.04
y≥4   -3.5968   -3.6227   -3.6207  0.3719   -4.3768   -2.8930  0.0000  0.98
y≥5   -7.0605   -7.1393   -7.1315  0.4372   -8.0427   -6.3284  0.0000  0.95
y≥6   -9.2730   -9.4218   -9.4100  0.5459  -10.5459   -8.4240  0.0000  0.88
y≥7  -12.3128  -12.5330  -12.4950  0.6901  -13.9470  -11.2632  0.0000  0.86
gap   0.0851   0.0841   0.0851  0.0888   -0.0953   0.2499  0.8238  1.00
yprev=3   2.8052   2.8386   2.8390  0.3517   2.1649   3.5282  1.0000  1.04
yprev=4   5.8843   5.9514   5.9393  0.3274   5.3706   6.6288  1.0000  1.08
yprev=5   9.0448   9.1057   9.0961  0.3553   8.3830   9.7680  1.0000  1.06
yprev=6   13.6666   13.6957   13.6902  0.5387   12.7090   14.8397  1.0000  1.06
yprev=7   12.0749   12.3731   12.3376  1.0384   10.3061   14.4209  1.0000  1.08
day   0.3580   0.3579   0.3578  0.0420   0.2780   0.4427  1.0000  1.01
day2   -0.0080   -0.0080   -0.0080  0.0007   -0.0094   -0.0066  0.0000  1.00
trt=hydroxychloroquine   0.0171   0.0142   0.0139  0.0710   -0.1138   0.1567  0.5768  1.00
age   -0.0108   -0.0109   -0.0109  0.0053   -0.0217   -0.0014  0.0192  0.99
age’   -0.0022   -0.0021   -0.0020  0.0061   -0.0141   0.0095  0.3665  1.04
sofa_nogcs   -0.0980   -0.0981   -0.0975  0.0203   -0.1376   -0.0598  0.0000  0.98
gap × yprev=3   -0.0861   -0.0817   -0.0823  0.0499   -0.1778   0.0124  0.0472  1.00
gap × yprev=4   -0.1911   -0.1875   -0.1887  0.0377   -0.2616   -0.1127  0.0000  1.06
gap × yprev=5   -0.2893   -0.2861   -0.2866  0.0471   -0.3745   -0.1887  0.0000  1.03
gap × yprev=6   -0.4627   -0.4601   -0.4604  0.0441   -0.5441   -0.3748  0.0000  1.01
gap × yprev=7   -0.3354   -0.3330   -0.3340  0.0635   -0.4538   -0.2100  0.0000  1.07
day x f(y)   0.1202   0.1197   0.1198  0.0068   0.1074   0.1338  1.0000  1.02
yprev=3 x f(y)   -0.1045   -0.0802   -0.0785  0.2495   -0.5887   0.3980  0.3650  1.03
yprev=4 x f(y)   0.1093   0.2008   0.1768  0.1969   -0.1366   0.6030  0.8655  1.50
yprev=5 x f(y)   -0.9969   -0.8857   -0.8974  0.2864   -1.4536   -0.3492  0.0035  1.12
yprev=6 x f(y)   -2.7648   -2.6253   -2.6180  0.3922   -3.4177   -1.8523  0.0000  0.94
yprev=7 x f(y)   -1.6179   -1.6577   -1.6263  0.6149   -2.9022   -0.5181  0.0018  0.87
# htmlVerbatim(compareBmods(bmarkovppo, bmarkov))
# Note: bmarkovppo fit with loo=TRUE was not saved because it was 171MB for some reason
# Output:
# Method: stacking
# ------
#        weight
#  model1 0.957 
#  model2 0.043 
#  
#  Compute posterior means divided by SE
round(coef(bmarkovppo) / sqrt(diag(vcov(bmarkovppo))), 2)
              y>=2                   y>=3                   y>=4 
              8.57                  -4.34                  -9.74 
              y>=5                   y>=6                   y>=7 
            -16.33                 -17.26                 -18.16 
               gap                yprev=3                yprev=4 
              0.95                   8.07                  18.18 
           yprev=5                yprev=6                yprev=7 
             25.63                  25.42                  11.92 
               day                  day^2 trt=hydroxychloroquine 
              8.53                 -10.99                   0.20 
               age                   age'             sofa_nogcs 
             -2.07                  -0.34                  -4.84 
     gap * yprev=3          gap * yprev=4          gap * yprev=5 
             -1.64                  -4.98                  -6.07 
     gap * yprev=6          gap * yprev=7             day x f(y) 
            -10.44                  -5.24                  17.72 
    yprev=3 x f(y)         yprev=4 x f(y)         yprev=5 x f(y) 
             -0.32                   1.02                  -3.09 
    yprev=6 x f(y)         yprev=7 x f(y) 
             -6.69                  -2.70 

There were some minor sampling/convergence problems especially with respect to the parameters for departures from proportional odds.

The model comparison results listed in the code above puts a great likelihood on the partial proportional odds model fitting better than the proportional odds model. The result of compareBmods, not saved in the code because the loo=TRUE option makes the fit object much bigger, is here (but from a slightly different fit):

Method: stacking
------
       weight
 model1 0.957 
 model2 0.043 

Based on posterior mean parameter values and standard deviations of posterior distributions (standard errors), the most important departures from PO seem to be time-related and the effect of previously being in state 6.

Final Model With Non-PO With Respect to Time

Our final model will have as non-proportional odds only the effect of time, but we will now allow the time effect to be nonlinear (quadratic). As before the cppo function assumes that the PO departure is proportional to the numeric 1-7 outcome codes. A fully unconstrained partial proportional odds model would have too many parameters.

bmarkovppof <- blrm(y ~ gap * yprev + pol(day, 2) + trt + rcs(age, 3) +
                sofa_nogcs, ~ pol(day, 2), cppo=function(y) y,
                data=d, refresh=100, file='bmarkovppof.rds')
htmlVerbatim(stanDx(bmarkovppof))

Iterations: 2000 on each of 4 chains, with 4000 posterior distribution samples saved

For each parameter, n_eff is a crude measure of effective sample size and Rhat is the potential scale reduction factor on split chains (at convergence, Rhat=1)

                        n_eff  Rhat
 y>=2                    3155 1.001
 y>=3                    2772 1.001
 y>=4                    3497 1.000
 y>=5                    4565 1.000
 y>=6                    3112 1.000
 y>=7                    2744 1.000
 gap                     3701 1.000
 yprev=3                 3236 1.001
 yprev=4                 4312 1.000
 yprev=5                 3439 1.001
 yprev=6                 2914 1.001
 yprev=7                 3443 1.000
 day                     5998 1.000
 day^2                   4539 1.000
 trt=hydroxychloroquine  7707 0.999
 age                     6236 1.000
 age'                    6518 1.000
 sofa_nogcs              5643 1.000
 gap * yprev=3           5416 0.999
 gap * yprev=4           7418 1.000
 gap * yprev=5           6227 0.999
 gap * yprev=6           4029 1.000
 gap * yprev=7           4523 1.000
 day x f(y)              2652 1.000
 day^2 x f(y)            4295 1.001
 
stanDxplot(bmarkovppof)
bmarkovppof

Bayesian Constrained Partial Proportional Odds Ordinal Logistic Model

Dirichlet Priors With Concentration Parameter 0.308 for Intercepts

 blrm(formula = y ~ gap * yprev + pol(day, 2) + trt + rcs(age, 
     3) + sofa_nogcs, ppo = ~pol(day, 2), cppo = function(y) y, 
     data = d, refresh = 100, file = "bmarkovppof.rds")
 

Frequencies of Responses

   1   2   3   4   5   6   7 
  50 357 297 817 526 757 489 
 
Mixed Calibration/
Discrimination Indexes
Discrimination
Indexes
Rank Discrim.
Indexes
Obs 3293 B 0.081 [0.08, 0.082] g 3.805 [3.639, 3.941] C 0.884 [0.877, 0.892]
Draws 4000 gp 0.438 [0.433, 0.443] Dxy 0.768 [0.754, 0.784]
Chains 4 EV 0.615 [0.596, 0.635]
p 17 v 11.362 [10.498, 12.293]
vp 0.153 [0.149, 0.159]
Mode β Mean β Median β S.E. Lower Upper Pr(β>0) Symmetry
y≥2   4.4378   4.4540   4.4456  0.3640   3.7344   5.1346  1.0000  0.94
y≥3   -0.4172   -0.4190   -0.4177  0.3262   -1.1102   0.1631  0.0980  0.95
y≥4   -3.2193   -3.2292   -3.2277  0.3334   -3.8614   -2.5610  0.0000  0.97
y≥5   -7.1431   -7.1566   -7.1519  0.3534   -7.8163   -6.4290  0.0000  0.96
y≥6  -10.1971  -10.2173  -10.2167  0.3874  -10.9379   -9.4161  0.0000  0.98
y≥7  -15.0652  -15.0933  -15.0933  0.4446  -15.9659  -14.2205  0.0000  0.99
gap   -0.0655   -0.0663   -0.0672  0.0887   -0.2413   0.1038  0.2285  0.98
yprev=3   2.7151   2.7239   2.7242  0.2236   2.2572   3.1256  1.0000  1.00
yprev=4   5.8959   5.9090   5.9062  0.2396   5.4725   6.3953  1.0000  0.99
yprev=5   8.9304   8.9504   8.9518  0.2800   8.4281   9.5012  1.0000  1.05
yprev=6   11.2825   11.3124   11.3110  0.3179   10.6828   11.9249  1.0000  1.06
yprev=7   9.9164   9.9414   9.9370  0.6189   8.7525   11.1980  1.0000  0.94
day   0.3923   0.3917   0.3912  0.0425   0.3121   0.4794  1.0000  1.07
day2   -0.0053   -0.0053   -0.0053  0.0007   -0.0067   -0.0038  0.0000  0.97
trt=hydroxychloroquine   0.0257   0.0264   0.0260  0.0716   -0.1048   0.1767  0.6408  1.03
age   -0.0105   -0.0105   -0.0104  0.0052   -0.0214   -0.0011  0.0205  0.96
age’   -0.0027   -0.0027   -0.0027  0.0059   -0.0136   0.0097  0.3157  1.01
sofa_nogcs   -0.1082   -0.1084   -0.1085  0.0201   -0.1475   -0.0689  0.0000  1.06
gap × yprev=3   -0.0726   -0.0704   -0.0707  0.0460   -0.1642   0.0143  0.0640  1.04
gap × yprev=4   -0.2130   -0.2101   -0.2110  0.0366   -0.2816   -0.1406  0.0000  1.08
gap × yprev=5   -0.3967   -0.3938   -0.3936  0.0411   -0.4736   -0.3125  0.0000  1.01
gap × yprev=6   -0.5680   -0.5657   -0.5660  0.0383   -0.6388   -0.4888  0.0000  1.08
gap × yprev=7   -0.3339   -0.3299   -0.3308  0.0601   -0.4405   -0.2046  0.0000  1.05
day x f(y)   0.3454   0.3457   0.3457  0.0199   0.3057   0.3837  1.0000  0.99
day2 x f(y)   -0.0066   -0.0066   -0.0066  0.0006   -0.0078   -0.0054  0.0000  1.05

There is strong evidence that the departure from PO with respect to day of follow-up is nonlinear.

Final Check

Let’s fit a Bayesian transition model without covariates (other than treatment and initial state) to once again check against empirical proportions of being at home.

bmarkovppon <- blrm(y ~ gap * yprev + pol(day, 2) + trt, ~ pol(day, 2),
                                        cppo=function(y) y,
                      data=d, refresh=100, file='bmarkovppon.rds')
htmlVerbatim(stanDx(bmarkovppon))

Iterations: 2000 on each of 4 chains, with 4000 posterior distribution samples saved

For each parameter, n_eff is a crude measure of effective sample size and Rhat is the potential scale reduction factor on split chains (at convergence, Rhat=1)

                        n_eff  Rhat
 y>=2                    3039 1.000
 y>=3                    2804 1.001
 y>=4                    3198 1.001
 y>=5                    4132 1.000
 y>=6                    3108 1.000
 y>=7                    2452 1.000
 gap                     3223 1.002
 yprev=3                 2929 1.000
 yprev=4                 3561 1.000
 yprev=5                 3773 1.000
 yprev=6                 2980 0.999
 yprev=7                 3157 1.000
 day                     5230 1.000
 day^2                   4511 0.999
 trt=hydroxychloroquine  5522 1.000
 gap * yprev=3           4918 0.999
 gap * yprev=4           5557 1.000
 gap * yprev=5           4704 1.000
 gap * yprev=6           3538 1.000
 gap * yprev=7           3973 1.001
 day x f(y)              2504 1.000
 day^2 x f(y)            4184 0.999
 
bmarkovppon

Bayesian Constrained Partial Proportional Odds Ordinal Logistic Model

Dirichlet Priors With Concentration Parameter 0.308 for Intercepts

 blrm(formula = y ~ gap * yprev + pol(day, 2) + trt, ppo = ~pol(day, 
     2), cppo = function(y) y, data = d, refresh = 100, file = "bmarkovppon.rds")
 

Frequencies of Responses

   1   2   3   4   5   6   7 
  50 357 297 817 526 757 489 
 
Mixed Calibration/
Discrimination Indexes
Discrimination
Indexes
Rank Discrim.
Indexes
Obs 3293 B 0.083 [0.082, 0.084] g 3.723 [3.602, 3.865] C 0.884 [0.875, 0.895]
Draws 4000 gp 0.436 [0.431, 0.442] Dxy 0.768 [0.751, 0.79]
Chains 4 EV 0.611 [0.59, 0.632]
p 14 v 10.918 [10.193, 11.744]
vp 0.152 [0.147, 0.157]
Mode β Mean β Median β S.E. Lower Upper Pr(β>0) Symmetry
y≥2   3.0582   3.0613   3.0559  0.2542   2.5698   3.5447  1.0000  1.04
y≥3   -1.7124   -1.7184   -1.7170  0.2084   -2.1268   -1.3098  0.0000  1.02
y≥4   -4.4747   -4.4817   -4.4835  0.2230   -4.9316   -4.0574  0.0000  1.00
y≥5   -8.3492   -8.3536   -8.3546  0.2506   -8.8241   -7.8498  0.0000  0.99
y≥6  -11.3550  -11.3588  -11.3569  0.2932  -11.9503  -10.7758  0.0000  0.96
y≥7  -16.1466  -16.1465  -16.1463  0.3684  -16.9045  -15.4516  0.0000  0.97
gap   -0.0601   -0.0610   -0.0611  0.0872   -0.2403   0.1043  0.2380  0.98
yprev=3   2.9563   2.9623   2.9583  0.2233   2.5177   3.4066  1.0000  1.02
yprev=4   6.2331   6.2370   6.2370  0.2320   5.7860   6.6868  1.0000  1.00
yprev=5   9.3574   9.3595   9.3578  0.2661   8.8260   9.8676  1.0000  1.03
yprev=6   11.7001   11.6994   11.6996  0.3149   11.0668   12.2847  1.0000  0.98
yprev=7   10.3873   10.3768   10.3723  0.6030   9.1319   11.4835  1.0000  1.04
day   0.3894   0.3892   0.3891  0.0416   0.3068   0.4689  1.0000  1.01
day2   -0.0053   -0.0053   -0.0053  0.0008   -0.0067   -0.0037  0.0000  1.01
trt=hydroxychloroquine   0.0029   0.0029   0.0016  0.0692   -0.1220   0.1461  0.5115  1.03
gap × yprev=3   -0.0700   -0.0689   -0.0696  0.0473   -0.1603   0.0202  0.0735  1.00
gap × yprev=4   -0.2336   -0.2304   -0.2308  0.0363   -0.3013   -0.1599  0.0000  1.00
gap × yprev=5   -0.4206   -0.4170   -0.4168  0.0413   -0.4960   -0.3342  0.0000  1.00
gap × yprev=6   -0.5733   -0.5692   -0.5690  0.0388   -0.6436   -0.4933  0.0000  1.00
gap × yprev=7   -0.3432   -0.3370   -0.3363  0.0610   -0.4566   -0.2188  0.0000  0.99
day x f(y)   0.3382   0.3374   0.3372  0.0190   0.3011   0.3736  1.0000  1.03
day2 x f(y)   -0.0064   -0.0064   -0.0064  0.0006   -0.0076   -0.0053  0.0000  0.99
dat <- data.frame(yprev='4')
R   <- list()
for(trt in tlev) {   # separately for treatments
    dat$trt <- trt
    u <- uprobB(bmarkovppon, dat, times, ylev, pvarname='yprev', gap=TRUE)
    R[[trt]] <- u
}

# Compute summary measure per treatment, separately for each time and posterior draw
# P(home) = P(Y=6 or Y=7)

s <- function(u) u[, 6] + u[, 7]
# w <- apply(R$placebo, 1, s)    # to test
r <- lapply(R, function(x) apply(x, 1, s))

# For each treatment and time compute the posterior mean P(home)

rs <- lapply(r, function(x) data.frame(phome = apply(x, 1, mean)))
for(trt in tlev) rs[[trt]]$trt <- trt
rs <- do.call(rbind, rs)
# Fetch day from end of row names (follows .)
rs$day <- as.numeric(sub('^.*\\.(.*?)', '\\2', rownames(rs), ))
rs$type <- 'Model'
rs <- rbind(rs, empirical)
ggplot(rs, aes(x=day, y=phome, color=trt, linetype=type)) + geom_line() + xl + yl +
    labs(color='', linetype='')

The agreement isn’t perfect but it’s probably the case that the sample size is not large enough to trust the empirical proportions.

A Different Markov Ordinal Model for Unequal Gaps

Consider a different approach in which we model serial dependence by attenuating the previous state effect by multiplying it by the reciprocal of gap time raised to a positive power. This has the properties of (1) forcing Y to be the same as the previous Y as the gap time approaches zero and (2) forcing independence across outcomes within patient as the gap time goes to infinity. To induce very high correlations in Y within patient with very small gaps, effects must approach \(\pm \infty\) on the logic scale, and the reciprocal raised to a power allows this. It is important to note that the effects being modeled must account for strong non-proportional odds. It is easy to see why when considering continuous Y with Gaussian residuals. A continuous-time AR(1) process is the Ornstein-Uhlenbeck process, and this process specifies that the variance of Y changes with the time gap. When the gap is very small, the conditional distribution of Y given the previous Y has a very small variance because in that case Y is very predictable from the previous Y. The analog to changing variance is changing the spacings of intercepts in a PO model. We accomplish that with a constrained partial PO model, the constraint being, as before, that the departure from PO is linear in Y. The question is whether we also need non-PO in absolute follow-up time. Logically, we do, but collinearity with gap may get in the way. We model serial dependence as a reciprocal of the cube root of the gap. The exponent really needs to be estimated from the data, which will require a nonlinear model. Our formulation does not allow main effects of previous states separate from the time gap to be included, so we are forcing the exponential function of gap to always be multiplied by effects of previous states.

A variety of functions for gap discounting of previous states, but none of them resulted in reasonable estimates of occupancy probabilities beyond the first gap > 1d. The last attempt is below. All of this points to the need for a true continuous time model when there is a need to estimate occupancy probabilities at unobserved times.

d$ypn <- NULL
rm(trt)
w <- d
ypn <- as.numeric(as.character(w$yprev))
w <- as.data.frame(w)
alpha <- 1.
lambda <- 0.15
gamma <- 2
X <- with(w, 1 * cbind(ypn == 2, ypn == 3, ypn == 4, ypn == 5, ypn == 6, ypn == 7))
# w$yp  <- X
w$ypr <- X / (w$gap ^ alpha)
w$ypr <- X * exp(-lambda * w$gap)
w$ypr <- X * (gamma + exp(-lambda * w$gap))
# model.matrix problem if yp is logical instead of numeric
bmarkovppor <- blrm(y ~ ypr + pol(day, 2) + trt, ~ pol(day, 2) + ypr,
                                        cppo=function(y) y,
                      data=w, refresh=100, file='bmarkovppor.rds')
b <- bmarkovppor
htmlVerbatim(stanDx(b))

Iterations: 2000 on each of 4 chains, with 4000 posterior distribution samples saved

For each parameter, n_eff is a crude measure of effective sample size and Rhat is the potential scale reduction factor on split chains (at convergence, Rhat=1)

                        n_eff  Rhat
 y>=2                    1225 0.999
 y>=3                    1046 1.000
 y>=4                     960 1.000
 y>=5                    1272 1.000
 y>=6                    2215 1.000
 y>=7                    2524 1.001
 ypr[1]                  2123 1.001
 ypr[2]                  1581 1.000
 ypr[3]                   963 1.000
 ypr[4]                  1009 1.001
 ypr[5]                  2231 1.002
 ypr[6]                  1613 1.001
 day                     5208 1.000
 day^2                   6050 0.999
 trt=hydroxychloroquine  5684 1.000
 day x f(y)              1075 1.002
 day^2 x f(y)            2040 1.000
 ypr[1] x f(y)           1789 1.002
 ypr[2] x f(y)           2679 1.000
 ypr[3] x f(y)           1364 1.000
 ypr[4] x f(y)           1313 1.000
 ypr[5] x f(y)           2125 1.002
 ypr[6] x f(y)           3964 1.000
 
b

Bayesian Constrained Partial Proportional Odds Ordinal Logistic Model

Dirichlet Priors With Concentration Parameter 0.308 for Intercepts

 blrm(formula = y ~ ypr + pol(day, 2) + trt, ppo = ~pol(day, 2) + 
     ypr, cppo = function(y) y, data = w, refresh = 100, file = "bmarkovppor.rds")
 

Frequencies of Responses

   1   2   3   4   5   6   7 
  50 357 297 817 526 757 489 
 
Mixed Calibration/
Discrimination Indexes
Discrimination
Indexes
Rank Discrim.
Indexes
Obs 3293 B 0.084 [0.083, 0.085] g 4.511 [4.2, 4.819] C 0.884 [0.879, 0.893]
Draws 4000 gp 0.449 [0.443, 0.453] Dxy 0.769 [0.759, 0.785]
Chains 4 EV 0.651 [0.632, 0.671]
p 9 v 15.665 [13.51, 17.901]
vp 0.162 [0.157, 0.167]
Mode β Mean β Median β S.E. Lower Upper Pr(β>0) Symmetry
y≥2   9.0466   9.0329   9.0314  5.5668   -1.1526   20.2638  0.9502  1.02
y≥3   1.9489   1.9434   1.9365  3.9782   -5.9721   9.5441  0.6892  1.02
y≥4   -3.1405   -3.1685   -3.1378  2.7501   -8.7864   2.0215  0.1207  0.95
y≥5   -9.5137   -9.5646   -9.5349  2.4891  -14.7573   -4.9412  0.0000  0.96
y≥6  -14.3301  -14.4224  -14.4053  3.4199  -20.9956   -7.8009  0.0000  1.02
y≥7  -20.1491  -20.2917  -20.3525  4.9035  -29.9257  -11.1951  0.0000  1.04
ypr[1]   0.3317   0.3105   0.2920  0.8221   -1.2516   1.9675  0.6440  1.03
ypr[2]   1.1997   1.2103   1.1923  0.8127   -0.2407   2.9797  0.9380  1.03
ypr[3]   2.2590   2.2739   2.2575  0.8139   0.7930   4.0192  0.9970  1.02
ypr[4]   3.4183   3.4317   3.4179  0.8209   1.9777   5.2350  1.0000  1.03
ypr[5]   4.9959   4.9895   4.9714  0.8181   3.4314   6.6575  1.0000  1.03
ypr[6]   4.5479   4.5939   4.5781  0.9122   2.8510   6.4027  1.0000  1.02
day   0.3656   0.3669   0.3666  0.0471   0.2702   0.4574  1.0000  1.03
day2   -0.0083   -0.0083   -0.0083  0.0009   -0.0102   -0.0065  0.0000  0.99
trt=hydroxychloroquine   0.0146   0.0132   0.0140  0.0708   -0.1190   0.1548  0.5748  0.98
day x f(y)   0.3538   0.3560   0.3559  0.0626   0.2336   0.4760  1.0000  1.01
day2 x f(y)   -0.0073   -0.0074   -0.0074  0.0012   -0.0096   -0.0050  0.0000  0.95
ypr[1] x f(y)   1.4566   1.4369   1.4428  0.9882   -0.5276   3.2874  0.9285  0.99
ypr[2] x f(y)   1.3604   1.3655   1.3598  0.9896   -0.5180   3.3209  0.9132  1.00
ypr[3] x f(y)   1.4934   1.5060   1.5046  0.9894   -0.4427   3.3649  0.9382  0.98
ypr[4] x f(y)   0.8827   0.9027   0.8946  0.9878   -1.0460   2.7810  0.8170  0.98
ypr[5] x f(y)   0.0410   0.0775   0.0692  1.0024   -1.8831   1.9663  0.5275  1.00
ypr[6] x f(y)   0.6509   0.6510   0.6322  1.0204   -1.2392   2.6901  0.7375  0.99

Now get state occupancy probabilities for observed days, then for all days. We must modify the uprobB function to handle the special nature of gap dependencies.

uprobB <- function(fit, data, times, ylevels, pvarname='yprev', gap=FALSE) {
    if(! length(fit$draws)) stop('fit does not have posterior draws')
    nd <- nrow(fit$draws)
  k  <- length(ylevels)
  s  <- length(times)
  P  <- array(NA, c(nd, s, k),
                        dimnames=list(paste('draw', 1 : nd), as.character(times), 
                                                    as.character(ylevels)))
  # Never uncondition on initial state
  wdata <- data
  wdata$day <- times[1]
  if(gap) {
    gp <- times[1]
    wdata$gap <- gp
    ypn <- as.numeric(as.character(wdata$yprev))
    X <- 1. * cbind(ypn == 2, ypn == 3, ypn == 4, ypn == 5, ypn == 6, ypn == 7)
    # wdata$yp  <- X
    wdata$ypr <- X * (gamma + exp(-lambda * gp))     # X / (gp ^ alpha)
  }
  p <- predict(fit, wdata, type='fitted.ind', posterior.summary='all')
  P[, 1, ] <- p
  # cp: matrix of conditional probabilities of Y conditioning on previous time Y
  # Columns = k conditional probabilities conditional on a single previous state
  # Rows    = all possible previous states
  # This is for a single posterior draw
  cp <- matrix(NA, nrow=k, ncol=k, 
               dimnames=list(paste('t-1', ylevels), paste('t', ylevels)))
  data <- as.list(data)
  data[[pvarname]] <- as.character(ylevels[-1])
  # don't request estimates after absorbing state
  # When data contains a matrix with one row, expand.grid will treat each column of that matrix
  # as a new row in the output.   We don't want that.  Instead, create data frame without
  # expand.grid and add ypr as a matrix to it later
  # edata <- expand.grid(data)
  edata <- as.data.frame(data)  
  
  for(it in 2 : s) {
    edata$day <- times[it]
    if(gap) {
      gp        <- times[it] - times[it - 1]
      edata$gap <- gp
      ypn       <- as.numeric(as.character(edata[[pvarname]]))
      X         <- 1. * matrix(c(ypn == 2, ypn == 3, ypn == 4, ypn == 5, ypn == 6, ypn == 7), ncol=6)
      # edata$yp  <- X
      edata$ypr <- X * (gamma + exp(-lambda * gp))   # X / (gp ^ alpha)
    }
    # y=first level = absorbing state
    pp <- predict(fit, edata, type='fitted.ind', posterior.summary='all')
    for(idraw in 1 : nd) {
        cp <- rbind(c(1., rep(0., k - 1)), pp[idraw, ,])
      # Compute unconditional probabilities of being in all possible states
      # at current time t
      P[idraw, it, ] <- t(cp) %*% P[idraw, it - 1, ]
    }
  }
  P
}

times <- c(1,2,3,4,7,14,28)
ylev <- 1 : 7
tlev <- levels(d$trt)
dat <- data.frame(yprev='4', age=repcov$age, sofa_nogcs=repcov$sofa_nogcs)
R   <- list()
for(trt in tlev) {   # separately for treatments
    dat$trt <- trt
    u <- uprobB(b, dat, times, ylev, pvarname='yprev', gap=TRUE)
    R[[trt]] <- u
}

Look at state occupancy probabilities for placebo, summarized by the posterior mean probabilities.

sop <- round(apply(R[[1]], 2:3, mean), 2)
sop
      1    2    3    4    5    6    7
1  0.00 0.02 0.14 0.70 0.11 0.02 0.00
2  0.00 0.06 0.16 0.53 0.18 0.07 0.00
3  0.01 0.09 0.13 0.40 0.21 0.15 0.01
4  0.02 0.11 0.10 0.29 0.19 0.26 0.02
7  0.04 0.10 0.07 0.20 0.14 0.37 0.07
14 0.06 0.09 0.03 0.11 0.05 0.31 0.36
28 0.08 0.06 0.02 0.06 0.03 0.25 0.50

Now repeat, requesting probabilities for all consecutive days.

times <- 1 : 28
ylev <- 1 : 7
tlev <- levels(d$trt)
dat <- data.frame(yprev='4', age=repcov$age, sofa_nogcs=repcov$sofa_nogcs)
R   <- list()
for(trt in tlev) {   # separately for treatments
    dat$trt <- trt
    u <- uprobB(b, dat, times, ylev, pvarname='yprev', gap=TRUE)
    R[[trt]] <- u
}
round(apply(R[[1]], 2:3, mean), 2)
      1    2    3    4    5    6    7
1  0.00 0.02 0.14 0.70 0.11 0.02 0.00
2  0.00 0.06 0.16 0.53 0.18 0.07 0.00
3  0.01 0.09 0.13 0.40 0.21 0.15 0.01
4  0.02 0.11 0.10 0.29 0.19 0.26 0.02
5  0.04 0.11 0.07 0.20 0.15 0.37 0.06
6  0.06 0.09 0.05 0.13 0.10 0.43 0.15
7  0.07 0.08 0.03 0.07 0.06 0.40 0.29
8  0.09 0.06 0.02 0.04 0.03 0.30 0.46
9  0.11 0.04 0.01 0.02 0.02 0.20 0.61
10 0.12 0.02 0.01 0.01 0.01 0.12 0.71
11 0.12 0.02 0.00 0.01 0.00 0.08 0.77
12 0.13 0.01 0.00 0.00 0.00 0.05 0.81
13 0.13 0.01 0.00 0.00 0.00 0.03 0.83
14 0.13 0.00 0.00 0.00 0.00 0.02 0.84
15 0.13 0.00 0.00 0.00 0.00 0.02 0.85
16 0.13 0.00 0.00 0.00 0.00 0.01 0.85
17 0.13 0.00 0.00 0.00 0.00 0.01 0.85
18 0.13 0.00 0.00 0.00 0.00 0.01 0.86
19 0.14 0.00 0.00 0.00 0.00 0.01 0.86
20 0.14 0.00 0.00 0.00 0.00 0.01 0.86
21 0.14 0.00 0.00 0.00 0.00 0.01 0.86
22 0.14 0.00 0.00 0.00 0.00 0.01 0.86
23 0.14 0.00 0.00 0.00 0.00 0.01 0.86
24 0.14 0.00 0.00 0.00 0.00 0.01 0.86
25 0.14 0.00 0.00 0.00 0.00 0.01 0.86
26 0.14 0.00 0.00 0.00 0.00 0.01 0.86
27 0.14 0.00 0.00 0.00 0.00 0.01 0.86
28 0.14 0.00 0.00 0.00 0.00 0.01 0.85

Computing Environment

 R version 4.1.0 (2021-05-18)
 Platform: x86_64-pc-linux-gnu (64-bit)
 Running under: Pop!_OS 20.10
 
 Matrix products: default
 BLAS:   /usr/lib/x86_64-linux-gnu/blas/libblas.so.3.9.0
 LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.9.0
 
 attached base packages:
 [1] splines   stats4    stats     graphics  grDevices utils     datasets 
 [8] methods   base     
 
 other attached packages:
  [1] rstan_2.19.3         StanHeaders_2.21.0-1 VGAM_1.1-3          
  [4] plotly_4.9.3         data.table_1.13.0    rmsb_0.0.2          
  [7] rms_6.2-1            SparseM_1.78         Hmisc_4.6-0         
 [10] ggplot2_3.3.2        Formula_1.2-3        survival_3.2-11     
 [13] lattice_0.20-44     
 
To cite R in publications use:

R Core Team (2021). R: A Language and Environment for Statistical Computing. R Foundation for Statistical Computing, Vienna, Austria. https://www.R-project.org/.

To cite the Hmisc package in publications use:

Harrell Jr F (2021). Hmisc: Harrell Miscellaneous. R package version 4.6-0, https://hbiostat.org/R/Hmisc/.

To cite the rms package in publications use:

Harrell Jr FE (2021). rms: Regression Modeling Strategies. https://hbiostat.org/R/rms/, https://github.com/harrelfe/rms.

To cite the rmsb package in publications use:

Harrell F (2021). rmsb: Bayesian Regression Modeling Strategies. R package version 0.0.2, https://hbiostat.org/R/rmsb/.

To cite the VGAM package in publications use:

Yee TW (2015). Vector Generalized Linear and Additive Models: With an Implementation in R. Springer, New York, USA.

To cite the data.table package in publications use:

Dowle M, Srinivasan A (2020). data.table: Extension of 'data.frame'. R package version 1.13.0, https://CRAN.R-project.org/package=data.table.

To cite the ggplot2 package in publications use:

Wickham H (2016). ggplot2: Elegant Graphics for Data Analysis. Springer-Verlag New York. ISBN 978-3-319-24277-4, https://ggplot2.tidyverse.org.

To cite the rstan package in publications use:

Stan Development Team (2020). “RStan: the R interface to Stan.” R package version 2.19.3, http://mc-stan.org/.

To cite the survival package in publications use:

Therneau T (2021). A Package for Survival Analysis in R. R package version 3.2-11, https://CRAN.R-project.org/package=survival.


  1. A singularity occurred when trying to allow gap to be a linear spline.↩︎