Longitudinal Ordinal Analysis for ORCHID

Introduction

ORCHID was a randomized clinical trial of Hydroxychloroquine vs. placebo for treatment of hospitalized COVID-19 patients. The main publication is here. The primary analysis for the trial used a covariate-adjusted proportional odds model for the 7-level WHO COVID ordinal scale measured at 14 days. There was little evidence for any effectiveness of the drug, despite the fact that the U.S. stockpiled 63,000,000 doses of it. The purpose of the reanalyses done here is to learn from a longitudinal analysis using all available ordinal outcomes. These are assessed at days 1, 2, 3, 4, 7, 14, 28 after randomization. It will be seen that even with this more powerful longitudinal analysis, there remains very little evidence for drug effectiveness.

After much experimentation with various longitudinal models (and also see here), a first-order Markov state transition model for ordinal outcome states provides the best fit to the correlation structure found in the data. The Markov model uses the previous ordinal state as a (categorical) predictor of the current state. It allows the correlation between states on two successive days to be arbitrarily high within patient, and it elegantly handles absorbing states such as death. Excellent handling of the correlation structure is done without the complexities of random effects. See below for justification of this statement. Go here for a list of papers about Markov longitudinal modeling of binary and ordinal outcomes.

Markov models in their usual form assume that the outcome variable is also assessed at baseline (time zero). Such is the case for ORCHID. The fact that some of the outcome levels (e.g., death) are not used at baseline is of no consequence.

The only obstacle to using transition models is that the primary model parameters (here, in a proportional odds model) relate to the probability of being in the current state or higher given the previous state. For day 1 post randomization, this is no problem. But we need to restate the model to obtain time-unconditional probabilities of being in state y or worse (“state occupancy probabilities”). This is done through recursive (through time) matrix multiplications to “uncondition” the transition probabilities to get probabilities of being in the current state regardless of previous states. This are simple to interpret quantities that are functions of all model parameters. Because of that it is difficult to get confidence intervals and statistical tests using frequentist methods. In examples below it is seen that the Bayesian approach provides exact inference, taking all uncertainties into account. Graphs will show the probability of the patient being at home as a function of time and treatment. Home is almost, but not quite, an absorbing state. So it is possible that such graphs will increase up to a point then slightly decrease (that didn’t happen in this study). If we consider home to be an absorbing state, then the graph is a cumulative incidence graph.

Various ways of restating the model results are also given below. One of these is the mean restricted time to the patient going home.

Note that the transition model needs to take absolute time (days from randomization) into account, as well as previous state. That is because the mix of outcomes changes over time in a way that simple transition odds ratios can’t capture.

To be added to this report is an example of using the fitted model to simulate longitudinal ordinal data, and using the simulation to estimate probabilities of more complex quantities than can easily be obtained by the analytic solutions shown here. For example, one can simulate data to compute the probability that patients on a given treatment with given baseline covariates will be alive on day 28 if they are on a ventilator on day 14.

require(rmsb)

require(data.table)

require(plotly)

knitrSet(lang='markdown', w=7, h=7, dev='png', fig.path='png/a-')
options(prType='html')

# Get around a bug in tooltip construction with ggplotly
# See https://stackoverflow.com/questions/66316337
ggp <- function(obj) {
  g <- ggplotly(obj, tooltip='label')
    d <- g$x$data
    for(i in 1 : length(d)) {
      w <- d[[i]]$text
    if(length(w)) d[[i]]$text <- gsub('txt: ', '', w)
        }
    g$x$data <- d
    g
    }
    
d <- readRDS('source/orchid.rds')
d[, gap := day - shift(day), by=id]
d[is.na(gap), gap := 1]
label(d$gap) <- 'Days since last measurement'
d[, table(day, gap)]

##     gap
## day    1   3   7  14
##   1  479   0   0   0
##   2  479   0   0   0
##   3  478   0   0   0
##   4  475   0   0   0
##   7    0 474   0   0
##   14   0   0 461   0
##   28   0   0   0 447

# Also fetch ORCHID dataset for all days, carrying death forward
dall <- readRDS('source/orchidAllDays.rds')
ordLevels <-
    c('death',
    'hospitalized on invasive mechanical ventilation/ECMO',
    'hospitalized on non-invasive ventilation or high flow nasal cannula',
    'hospitalized on supplemental oxygen',
    'hospitalized not on supplemental oxygen',
    'not hospitalized with limitation of activity',
    'not hospitalized without limitation in activity')

Descriptive Statistics

html(describe(d))

d

18 Variables 3293 Observations

id: Subject ID

n	missing	distinct
3293	0	479

lowest :	ORCH-0001	ORCH-0002	ORCH-0003	ORCH-0004	ORCH-0005
highest:	ORCH-0475	ORCH-0476	ORCH-0477	ORCH-0478	ORCH-0479

age

n	missing	distinct	Info	Mean	Gmd	.05	.10	.25	.50	.75	.90	.95
3293	0	70	1	56.51	18.67	29	35	44	57	68	78	85

lowest : 18 20 22 23 24 , highest: 85 86 87 88 89

icu

n	missing	distinct	Info	Sum	Mean	Gmd
3293	0	2	0.456	616	0.1871	0.3042

bmi: Body Mass Index

n	missing	distinct	Info	Mean	Gmd	.05	.10	.25	.50	.75	.90	.95
3060	233	396	1	32.8	10.01	20.80	22.68	26.84	31.14	36.93	44.46	50.22

lowest :	7.098765	8.650519	11.428571	17.689789	17.968750
highest:	67.520776	67.827191	75.054815	78.694069	96.559285

y0: Baseline status

n	missing	distinct	Info	Mean	Gmd
3293	0	4	0.85	4.114	0.8686

 Value          2     3     4     5
 Frequency    214   364  1549  1166
 Proportion 0.065 0.111 0.470 0.354

deathday: Days from randomization to death

n	missing	distinct	Info	Mean	Gmd	.05	.10	.25	.50	.75	.90	.95
318	2975	25	0.996	15.38	9.578	5	6	8	15	20	28	30

lowest : 2 3 5 6 7 , highest: 28 29 30 32 34

pfratio: PaO2 / FiO2

n	missing	distinct	Info	Mean	Gmd	.05	.10	.25	.50	.75	.90	.95
403	2890	53	0.999	143.8	94	45.00	53.33	80.00	127.50	220.00	267.86	295.24

lowest :	27.36842	30.00000	44.00000	45.00000	50.00000
highest:	290.47619	295.23810	322.50000	343.33333	438.09524

race

n	missing	distinct
3293	0	3

 Value      black white other
 Frequency    789  1451  1053
 Proportion 0.240 0.441 0.320

sfratio

n	missing	distinct	Info	Mean	Gmd	.05	.10	.25	.50	.75	.90	.95
3125	168	161	0.999	348.6	118.6	124.3	178.0	283.3	383.3	438.1	452.4	461.9

lowest :	0.9666667	30.0000000	69.0000000	78.0000000	79.0000000
highest:	457.1428571	461.9047619	466.6666667	471.4285714	476.1904762

sofa_gcs

n	missing	distinct	Info	Mean	Gmd	.05	.10	.25	.50	.75	.90	.95
3293	0	17	0.963	3.088	2.658	0	1	1	2	4	6	9

lowest : 0 1 2 3 4 , highest: 12 13 14 16 17

 Value          0     1     2     3     4     5     6     7     8     9    10    11
 Frequency    189   741   891   503   390   182    84    67    33    61    25    14
 Proportion 0.057 0.225 0.271 0.153 0.118 0.055 0.026 0.020 0.010 0.019 0.008 0.004
                                         
 Value         12    13    14    16    17
 Frequency     54    33    12     7     7
 Proportion 0.016 0.010 0.004 0.002 0.002

sofa_nogcs

n	missing	distinct	Info	Mean	Gmd	.05	.10	.25	.50	.75	.90	.95
3293	0	13	0.959	2.773	2.188	0	1	1	2	4	6	7

lowest : 0 1 2 3 4 , highest: 8 9 11 13 14

 Value          0     1     2     3     4     5     6     7     8     9    11    13
 Frequency    189   783   918   539   359   165    82   110    47    75    12     7
 Proportion 0.057 0.238 0.279 0.164 0.109 0.050 0.025 0.033 0.014 0.023 0.004 0.002
                 
 Value         14
 Frequency      7
 Proportion 0.002

time_to_recovery

n	missing	distinct	Info	Mean	Gmd	.05	.10	.25	.50	.75	.90	.95
3293	0	29	0.98	9.132	10.45	0	0	0	5	15	28	28

lowest : 0 1 2 3 4 , highest: 24 25 26 27 28

sex

n	missing	distinct
3293	0	2

 Value      female   male
 Frequency    1448   1845
 Proportion   0.44   0.56

trt: Treatment

n	missing	distinct
3293	0	2

 Value                 placebo hydroxychloroquine
 Frequency                1631               1662
 Proportion              0.495              0.505

day: Days since randomization

n	missing	distinct	Info	Mean	Gmd
3293	0	7	0.98	8.217	8.698

lowest : 1 2 3 4 7 , highest: 3 4 7 14 28

 Value          1     2     3     4     7    14    28
 Frequency    479   479   478   475   474   461   447
 Proportion 0.145 0.145 0.145 0.144 0.144 0.140 0.136

y: Ordinal Outcome

n	missing	distinct	Info	Mean	Gmd
3293	0	7	0.963	4.712	1.804

lowest : 1 2 3 4 5 , highest: 3 4 5 6 7

 Value          1     2     3     4     5     6     7
 Frequency     50   357   297   817   526   757   489
 Proportion 0.015 0.108 0.090 0.248 0.160 0.230 0.148

yprev: Previous ordinal outcome

n	missing	distinct	Info	Mean	Gmd
3293	0	6	0.952	4.486	1.572

lowest : 2 3 4 5 6 , highest: 3 4 5 6 7

 Value          2     3     4     5     6     7
 Frequency    366   352  1022   683   610   260
 Proportion 0.111 0.107 0.310 0.207 0.185 0.079

gap: Days since last measurement

n	missing	distinct	Info	Mean	Gmd
3293	0	4	0.796	3.892	4.216

 Value          1     3     7    14
 Frequency   1911   474   461   447
 Proportion 0.580 0.144 0.140 0.136

Event Chart for Random Sample of Patients

A status timeline is shown for each of 65 randomly chosen patients. The patients are sorted so that the worst outcomes at the last follow-up day are at the top. One day is subtract from the time variable to make follow-up start at day 0.

ssamp <- sample(unique(d$id), 65, FALSE)
dr        <- subset(d, id %in% ssamp)
dr$id     <- as.integer(as.factor(dr$id))
dr$status <- factor(dr$y, levels=7:1)
dr$day    <- dr$day - 1
multEventChart(status ~ day + id, data=dr,
               absorb='1', sortbylast = TRUE) +
  theme_classic() +
  theme(legend.position='bottom')

Summary of Successive State Transitions

gg <- propsTrans(y ~ day + id, data=dall, maxsize=6, labels=ordLevels) +
        theme(legend.position='bottom',
        axis.text.x=element_text(angle=90, hjust=1))
ggp(gg)

# For a nice event chart for this type of data see https://livefreeordichotomize.com/2020/05/21/survival-model-detective-1

Category Proportions Over Time

The following plot shows how the mix of outcomes changes over time stratified also by treatment.

gg <- propsPO(y ~ trt + day, nrow=1, data=dall) +
        theme(legend.position='bottom',
                      axis.text.x=element_text(angle=90, hjust=1))
ggp(gg)

Single-Day Analysis

To provide a result similar to the ORCHID’s pre-specified primary analysis, we use a proportional odds model for day 14 status in isolation.

d14 <- subset(dall, day == 14)
d14$y0 <- as.factor(d14$y0)
f <- lrm(y ~ trt + y0 + rcs(age, 3) + race + sofa_nogcs + sex, data=d14)
f

Logistic Regression Model

 lrm(formula = y ~ trt + y0 + rcs(age, 3) + race + sofa_nogcs + 
     sex, data = d14)

Frequencies of Responses

   1   2   3   4   5   6   7 
  32  42  12  40  37 169 147

	Model Likelihood Ratio Test	Discrimination Indexes	Rank Discrim. Indexes
Obs 479	LR χ² 176.28	R² 0.320	C 0.731
max \|∂log L/∂β\| 2×10^-12	d.f. 10	g 1.360	D_xy 0.461
	Pr(>χ²) <0.0001	g_r 3.895	γ 0.462
		g_p 0.180	τ_a 0.349
		Brier 0.106

	β	S.E.	Wald Z	Pr(>\|Z\|)
y≥2	4.2312	0.8255	5.13	<0.0001
y≥3	3.0594	0.8146	3.76	0.0002
y≥4	2.8138	0.8141	3.46	0.0005
y≥5	2.1481	0.8141	2.64	0.0083
y≥6	1.6364	0.8134	2.01	0.0442
y≥7	-0.2346	0.8054	-0.29	0.7708
trt=hydroxychloroquine	0.0437	0.1701	0.26	0.7973
y0=3	-0.1120	0.4477	-0.25	0.8025
y0=4	0.9315	0.4550	2.05	0.0406
y0=5	1.5653	0.4925	3.18	0.0015
age	-0.0098	0.0127	-0.77	0.4395
age’	-0.0222	0.0153	-1.45	0.1475
race=white	-0.4846	0.2244	-2.16	0.0308
race=other	0.0313	0.2452	0.13	0.8985
sofa_nogcs	-0.2699	0.0588	-4.59	<0.0001
sex=male	-0.1971	0.1820	-1.08	0.2788

Serial Conditional Markov Ordinal Model

Here we use a first order Markov process where the PO model on a given day is conditional on the ordinal outcome on the previous day as well as on the other variables. The 1-day lagged outcome status is variable yprev. Note that the gaps (1 1 1 3 7 14) between measurement times (1 2 3 4 7 14 28) has a linear correlation with the measurement times of 0.998, so including day as an effect modifier of previous status is similar to allowing the influence of the previous state to be a function of the gap duration.

First do a consolidated analysis in which all observed gap-previous state combinations are modeled as indicator variables. The reference cell here is gap=1 previous state 2. Then look at effects of previous state stratified by distinct levels of gap. Interpretation is clouded by all effects being relative to previous state 2.

d$yprev <- factor(d$yprev)
dd <- datadist(d); options(datadist='dd')
ugaps <- sort(unique(d$gap))
z <- with(d, interaction(gap, yprev, drop=TRUE))
f <- lrm(y ~ z + day + rcs(age, 3) + race + sofa_nogcs + sex, data=d)
co <- coef(f)[grep('z=', names(coef(f)))]
names(co) <- sub('z=', '', names(co))
n  <- names(co)
x  <- strsplit(n, '\\.')
g  <- sapply(x, function(u) as.numeric(u[1]))    # gaps
yp <- sapply(x, function(u) as.numeric(u[2]))    # yprev
ef  <- data.frame(gap=g, yp=yp, effect=unname(co))
# Reference cell in model was gap=1 yprev=2 so add a zero for this
ef <- rbind(data.frame(gap=1, yp=2, effect=0), ef)
print(ef, digits=3)

   gap yp effect
1    1  2  0.000
2    3  2 -0.796
3    7  2 -1.626
4   14  2 -5.056
5    1  3  2.365
6    3  3  2.057
7    7  3  1.790
8   14  3  1.104
9    1  4  4.959
10   3  4  6.356
11   7  4  5.427
12  14  4  4.052
13   1  5  7.388
14   3  5  8.255
15   7  5  7.345
16  14  5  5.390
17   1  6  9.311
18   3  6  9.715
19   7  6  8.069
20  14  6  5.656
21   7  7  9.385
22  14  7  7.396

# Fit gap trend separately by yprev
ggplot(ef, aes(x=gap, y=effect, col=factor(yp))) + geom_line() + geom_point()

n <- c('yprev=3', 'yprev=4', 'yprev=5', 'yprev=6', 'yprev=7')
k <- matrix(NA, nrow=length(ugaps), ncol=length(n), dimnames=list(ugaps, n))
i <- 0
for(gp in ugaps) {
  i <- i + 1
  s <- subset(d, gap == gp)
  s$yprev <- s$yprev[drop=TRUE]
  form <- if(gp == 1) y ~ day + yprev + rcs(age, 3) + race + sofa_nogcs + sex else
    y ~ yprev + rcs(age, 3) + race + sofa_nogcs + sex
  form <- y ~ yprev
  if(gp == 1) form <- y ~ day + yprev
  f <- lrm(form,  data=s)
  m <- intersect(n, names(coef(f)))
  k[i, m] <- coef(f)[m]
}

round(k, 2)

   yprev=3 yprev=4 yprev=5 yprev=6 yprev=7
1     4.28    7.99   11.34   23.54      NA
3     2.43    4.80    5.92    6.78      NA
7     1.71    3.19    4.71    5.45    6.72
14    1.68    3.32    4.47    4.91    6.64

z <- zp <- NULL
xx <- seq(1, 14, by=.1)
w <- data.frame(ugaps=xx)
for(j in 1 : 5) {
  y <- k[, j]
  mods <- if(j == 5) 'power fixed' else  c('linear spline', 'exponential', 'power', 'power fixed')
  for(model in mods) {
  form <- switch(model,
                 'linear spline' = y ~ lsp(ugaps, 3),
                 'exponential'   = log(y) ~ ugaps,
                 'power'         = log(y) ~ log(ugaps),
                 'power fixed'   = NA)
  if(model != 'power fixed') {
    f <- ols(form)
    if(model == 'power') print(coef(f))
    p <- predict(f, w)
    if(model != 'linear spline') p <- exp(p)
  } else {
    const <- exp(mean(log(y / (ugaps ^ -0.32)), na.rm=TRUE))
    p <- const * (xx ^ -0.32)
  }
  prev <- paste('Previous state:', j + 2)
  z  <- rbind(z,  data.frame(model=model, gap=xx,    y=p, yprev=prev))
  zp <- rbind(zp, data.frame(model=model, gap=ugaps, y=y, yprev=prev)) 
  }
}

 Intercept      ugaps 
 1.3760590 -0.3708071 
 Intercept      ugaps 
 2.0091496 -0.3570896 
Intercept     ugaps 
 2.320715 -0.356863 
 Intercept      ugaps 
 2.9231801 -0.5866748

ggplot(z, aes(x=gap, y=y, color=model)) + geom_line() +
  geom_point(data=zp, aes(x=gap, y=y)) +
  facet_wrap(~ yprev)

By combining the patterns of both approaches, time gap is effectively modeled as a linear spline with knot at 3, which allows for a very different effect at the most command gap of 1d. But the use of two terms for the gap effect will result in a singularity when interacting gap with previous state, because some previous states do not occur at some gaps. The power function specifying a gap effect of a constant times \(d^{-0.32}\) where \(d\) is the time gap fits very well except for previous state 6, which is the only state in the above graph for which you can see a difference between the fitted power function (which estimated an exponent of -0.58) and this function. By using a single parameter (the exponent) to capture the gap effect for all possible gaps one can reduce the number of parameters to estimate by omitting previous state main effects from the model (see below). But again the fact that we are subtracting effects for being in previous state 2 makes this confusing. This will be explored in a better way in the VIOLET dataset where artifical gaps can be created in a way that is independent of absolute time, since in VIOLET all consecutive days had outcome assessments.

In formulating our model we need the following considerations.

transitions depend on a flexible function of time
the effect of the previous state may change with time or gap time
the treatment effect may change with time
there may be a weird effect where the effect of treatment may differ according to the previous outcome state

Due to a singularity (caused by collinearity), a model could not be fitted that allowed previous state to interact with both day and gap. Also, a singularity resulted when the linear spline is used for gap. There are no records with yprev=7 for gap < 7. So force gap to be linear for now. Let’s fit two models to see which one more strongly interacts, then use that one by itself.

f1 <- lrm(y ~ gap * yprev + pol(day,2) * (trt + rcs(age, 3) +
                 race + sofa_nogcs + sex) +
         yprev * trt,
         data=d)
f2 <- lrm(y ~ gap  + day * (yprev + trt + rcs(age, 3) +
                 race + sofa_nogcs + sex) +
         yprev * trt,
         data=d)
AIC(f1); AIC(f2)   # first model way better

[1] 7004.827 [1] 7060.269

f <- f1
anova(f)

	χ²	d.f.	P
Wald Statistics for `y`
gap (Factor+Higher Order Factors)	110.32	6	<0.0001
All Interactions	108.29	5	<0.0001
yprev (Factor+Higher Order Factors)	1878.87	15	<0.0001
All Interactions	114.56	10	<0.0001
day (Factor+Higher Order Factors)	129.38	16	<0.0001
All Interactions	45.62	14	<0.0001
Nonlinear (Factor+Higher Order Factors)	79.28	8	<0.0001
trt (Factor+Higher Order Factors)	7.15	8	0.5204
All Interactions	7.11	7	0.4177
age (Factor+Higher Order Factors)	45.95	6	<0.0001
All Interactions	18.47	4	0.0010
Nonlinear (Factor+Higher Order Factors)	3.94	3	0.2683
race (Factor+Higher Order Factors)	12.28	6	0.0560
All Interactions	8.56	4	0.0732
sofa_nogcs (Factor+Higher Order Factors)	29.11	3	<0.0001
All Interactions	9.20	2	0.0100
sex (Factor+Higher Order Factors)	1.19	3	0.7560
All Interactions	0.41	2	0.8151
gap × yprev (Factor+Higher Order Factors)	108.29	5	<0.0001
day × trt (Factor+Higher Order Factors)	2.45	2	0.2932
Nonlinear	1.24	1	0.2663
Nonlinear Interaction : f(A,B) vs. AB	1.24	1	0.2663
day × age (Factor+Higher Order Factors)	18.47	4	0.0010
Nonlinear	10.50	3	0.0148
Nonlinear Interaction : f(A,B) vs. AB	10.50	3	0.0148
f(A,B) vs. Af(B) + Bg(A)	2.82	1	0.0933
Nonlinear Interaction in day vs. Af(B)	9.85	2	0.0073
Nonlinear Interaction in age vs. Bg(A)	3.43	2	0.1799
day × race (Factor+Higher Order Factors)	8.56	4	0.0732
Nonlinear	7.10	2	0.0287
Nonlinear Interaction : f(A,B) vs. AB	7.10	2	0.0287
day × sofa_nogcs (Factor+Higher Order Factors)	9.20	2	0.0100
Nonlinear	8.86	1	0.0029
Nonlinear Interaction : f(A,B) vs. AB	8.86	1	0.0029
day × sex (Factor+Higher Order Factors)	0.41	2	0.8151
Nonlinear	0.00	1	0.9892
Nonlinear Interaction : f(A,B) vs. AB	0.00	1	0.9892
yprev × trt (Factor+Higher Order Factors)	6.07	5	0.2990
TOTAL NONLINEAR	80.29	10	<0.0001
TOTAL INTERACTION	153.84	24	<0.0001
TOTAL NONLINEAR + INTERACTION	234.98	26	<0.0001
TOTAL	2512.52	39	<0.0001

ggplot(Predict(f, gap, yprev))

There was little evidence for a treatment effect at any time, and little evidence that the effect of treatment on state transitions varied with time. There is also little evidence that the treatment effect differed for different previous states.

The graph above shows that being at home without disability is almost an absorbing state, and that the effect of previous state varies with time gap.

For now let’s simplify the model by having previous state only interact with gap (despite some evidence for the age effect varying over time), and removing the interactions with treatment. Now allow the effect of absolute time to be quadratic¹. Compare the variances of parameter estimates with cluster sandwich robust estimates.

f <- lrm(y ~ gap * yprev + pol(day, 2) + trt + rcs(age, 3) + race + sofa_nogcs + sex,
         data=d, x=TRUE, y=TRUE)
f

Logistic Regression Model

 lrm(formula = y ~ gap * yprev + pol(day, 2) + trt + rcs(age, 
     3) + race + sofa_nogcs + sex, data = d, x = TRUE, y = TRUE)

Frequencies of Responses

   1   2   3   4   5   6   7 
  50 357 297 817 526 757 489

	Model Likelihood Ratio Test	Discrimination Indexes	Rank Discrim. Indexes
Obs 3293	LR χ² 4766.76	R² 0.787	C 0.908
max \|∂log L/∂β\| 2×10^-9	d.f. 20	g 4.186	D_xy 0.816
	Pr(>χ²) <0.0001	g_r 65.746	γ 0.821
		g_p 0.306	τ_a 0.667
		Brier 0.058

	β	S.E.	Wald Z	Pr(>\|Z\|)
y≥2	2.4712	0.3281	7.53	<0.0001
y≥3	-1.0620	0.3185	-3.33	0.0009
y≥4	-2.9688	0.3342	-8.88	<0.0001
y≥5	-6.0554	0.3481	-17.40	<0.0001
y≥6	-7.8957	0.3556	-22.20	<0.0001
y≥7	-10.8037	0.3689	-29.28	<0.0001
gap	-0.2184	0.0839	-2.60	0.0093
yprev=3	2.2491	0.2230	10.09	<0.0001
yprev=4	4.8417	0.2303	21.02	<0.0001
yprev=5	7.2783	0.2560	28.43	<0.0001
yprev=6	9.3438	0.2776	33.66	<0.0001
yprev=7	9.2858	0.5779	16.07	<0.0001
day	0.2792	0.0392	7.12	<0.0001
day²	-0.0052	0.0007	-7.01	<0.0001
trt=hydroxychloroquine	0.0134	0.0683	0.20	0.8444
age	-0.0086	0.0049	-1.75	0.0800
age’	-0.0034	0.0058	-0.60	0.5515
race=white	-0.0641	0.0877	-0.73	0.4650
race=other	0.0784	0.0951	0.82	0.4098
sofa_nogcs	-0.0886	0.0205	-4.31	<0.0001
sex=male	-0.0612	0.0719	-0.85	0.3948
gap × yprev=3	0.2064	0.0613	3.37	0.0008
gap × yprev=4	0.3304	0.0385	8.59	<0.0001
gap × yprev=5	0.2404	0.0392	6.13	<0.0001
gap × yprev=6	0.0953	0.0332	2.87	0.0041
gap × yprev=7	0.2302	0.0569	4.04	<0.0001

anova(f)

	χ²	d.f.	P
Wald Statistics for `y`
gap (Factor+Higher Order Factors)	106.77	6	<0.0001
All Interactions	104.49	5	<0.0001
yprev (Factor+Higher Order Factors)	1891.62	10	<0.0001
All Interactions	104.49	5	<0.0001
day	84.64	2	<0.0001
Nonlinear	49.14	1	<0.0001
trt	0.04	1	0.8444
age	26.83	2	<0.0001
Nonlinear	0.35	1	0.5515
race	3.14	2	0.2078
sofa_nogcs	18.61	1	<0.0001
sex	0.72	1	0.3948
gap × yprev (Factor+Higher Order Factors)	104.49	5	<0.0001
TOTAL NONLINEAR	49.55	2	<0.0001
TOTAL NONLINEAR + INTERACTION	186.88	7	<0.0001
TOTAL	2507.19	20	<0.0001

anova(f, age, race, sex)   # combined effects of 3 variables

	χ²	d.f.	P
Wald Statistics for `y`
age	26.83	2	<0.0001
Nonlinear	0.35	1	0.5515
race	3.14	2	0.2078
sex	0.72	1	0.3948
TOTAL	33.50	5	<0.0001

g <- robcov(f, d$id)
htmlVerbatim(round(diag(vcov(f)) / diag(vcov(g)), 2))

                   y>=2                   y>=3                   y>=4 
                   1.40                   1.31                   1.11 
                   y>=5                   y>=6                   y>=7 
                   0.97                   0.96                   0.91 
                    gap                yprev=3                yprev=4 
                   1.20                   0.85                   0.79 
                yprev=5                yprev=6                yprev=7 
                   0.75                   0.79                   0.84 
                    day                  day^2 trt=hydroxychloroquine 
                   1.52                   0.72                   1.22 
                    age                   age'             race=white 
                   1.53                   1.26                   1.24 
             race=other             sofa_nogcs               sex=male 
                   1.22                   1.16                   1.24 
          gap * yprev=3          gap * yprev=4          gap * yprev=5 
                   0.21                   0.52                   0.52 
          gap * yprev=6          gap * yprev=7 
                   0.57                   0.72

The two variance estimates are similar, giving an indirect indication that the assumption of conditional independence (conditioning on previous state) is reasonable. The variance ratio is 1.2 for treatment (ordinary variance estimate is larger than the intra-cluster correlation-robust estimate).

Proportional Odds Assumption

The treatment is too weak to detect any non-proportional odds. Let’s check the PO assumption on the two strongest variables: gap and yprev. We omit important interactions between them and make day linear to be able to limit the number of regression coefficient estimates to examine as a function of y-cutoff. Expect results for the first cutoff, which corresponds to predicting deaths, to be unstable due to there being only 50 deaths.

n <- c('yprev=3', 'yprev=4', 'yprev=5', 'yprev=6', 'yprev=7', 'day', 'gap')
z <- se <- matrix(NA, nrow=6, ncol=length(n))
dimnames(z) <- dimnames(se) <- list(paste0('y>=', 2:7), n)
for(ycut in 2 : 7) {
    g <- lrm(y >= ycut ~ gap + yprev + day + trt + rcs(age, 3) + race +
                     sofa_nogcs + sex,
                     data=d, tol=1e-14, maxit=30)
    z [ycut - 1, n] <- coef(g)[n]
    se[ycut - 1, n] <- sqrt(diag(vcov(g)[n, n])) 
}
round(z, 2)

     yprev=3 yprev=4 yprev=5 yprev=6 yprev=7   day   gap
y>=2    0.41    1.70    2.76    2.71   10.61 -0.30  0.35
y>=3    3.29    5.09    6.43    6.20   13.45 -0.05  0.11
y>=4    1.20    4.82    6.59    6.72   12.62  0.24 -0.32
y>=5    1.94    3.35    6.70    8.10    8.02  0.41 -0.53
y>=6    2.58    3.82    5.01    8.42    7.42  0.64 -0.93
y>=7    2.32    3.06    3.75    4.50    6.08  0.28 -0.35

round(se, 2)

     yprev=3 yprev=4 yprev=5 yprev=6 yprev=7  day  gap
y>=2    0.47    0.45    0.69    0.64   27.72 0.24 0.47
y>=3    0.24    0.28    0.54    0.54   30.24 0.10 0.20
y>=4    0.26    0.27    0.41    0.55   20.70 0.08 0.16
y>=5    0.42    0.39    0.42    0.54    1.07 0.07 0.13
y>=6    0.55    0.51    0.52    0.56    0.64 0.07 0.14
y>=7    1.08    1.02    1.02    1.02    1.03 0.11 0.20

The effect of day and gap are roughly linear in the y-cutoff.
yprev may be predicting death differently from how it predicts the nonfatal outcomes.

The VGAM package vglm function would not converge for a partial PO model that allowed for non-PO in yprev.

Computing Unconditional Estimates

One can uncondition the previous model to obtain unconditional treatment effects, although these will be covariate- and time-dependent. We specify covariate settings (here we use representative values for each covariate, just for convenience, using a method described below) and vary the treatment assignment. Note the following.

The initial state (baseline ordinal status) is always conditioned on, so is not marginalized out. Calculations can be done separately for each initial state.
Given this initial state, use the fitted model to get the conditional distribution of Y at (post-randomization) day 1 given the initial state, covariate settings, and treatment.
Likewise for later days.
Death is an absorbing state so once death occurs all future status values are also set to death.
Were time to be continuous one can find the first time for which the probability of being in state y or worse exceeds 0.5, which would define the median time until reaching that state. For the ORCHID study, gaps between measurements are too long for this unless time-interpolation is used. It may be better to focus on estimating the probability of being in state y or worse as a function of days since randomization, treatment, and covariates (covariates include the initial state).
This approach does not factor in uncertainties in parameter estimates, so provides only a point estimate. The simple Bayesian solution to this is provided later.

Start with computation of a set of representative covariate values that will be used when getting estimates related to transformed parameters. This is done by using the global transition model fitted above, evaluating it at day 14 for a previous state of y=4, and finding covariate settings that give predicted values that are close to the unadjusted estimates for placebo. Of the qualifying settings, one that is subjectively closer to the center of the data distribution is chosen. For simplicity we drop weak categorical predictors sex and race. To better fit we allow time to have a nonlinear effect. First let’s see the distribution of day by yprev.

ggplot(d, aes(x=day)) + geom_histogram() + facet_wrap(~ yprev)

# For yprev=7 there are only 2 distinct assessment days (14, 28)
# So need interaction to be restricted to be linear
f <- lrm(y ~ gap * yprev + rcs(day, 4) + trt + rcs(age, 3) + sofa_nogcs,
         data=d)
newdat <- subset(d, day == 1)
newdat <- transform(newdat, day=14, yprev='4', gap=1)
xb1 <- predict(f, newdat)  # uses first intercept by default

# Do the same when covariates are omitted
fu <- lrm(y ~ gap * yprev + rcs(day, 4), data=d)
xb2 <- predict(fu, newdat[1, ])
diff <- abs(xb1 - xb2)
w <- newdat[, c('age', 'sofa_nogcs', 'trt')]
w$diff <- abs(xb1 - xb2)
setDT(w)
wi <- w[order(diff), ]
# Compute mean and median baseline covariates
co <- all.vars(~ age + sofa_nogcs)
setDT(newdat)
h <- function(x) c(mean=mean(x), median=median(x))
u <- function(x) { s <- sapply(x, h); w <- as.list(s)
                   names(w) <- outer(rownames(s), colnames(s), paste, sep='.'); w}
newdat[, u(.SD), .SDcols=co]

   mean.age median.age mean.sofa_nogcs median.sofa_nogcs
1: 56.85386         57        2.814196                 2

# Seek age=57, sofa 2 or 3
wi[1:30, ]

    age sofa_nogcs                trt        diff
 1:  47          7 hydroxychloroquine 0.001365465
 2:  63          5 hydroxychloroquine 0.001857581
 3:  76          3 hydroxychloroquine 0.002198086
 4:  75          3            placebo 0.002726674
 5:  88          1 hydroxychloroquine 0.005283669
 6:  70          4 hydroxychloroquine 0.006873129
 7:  82          2            placebo 0.007587458
 8:  48          7 hydroxychloroquine 0.008507628
 9:  63          5            placebo 0.010995304
10:  45          7            placebo 0.011599411
11:  75          3 hydroxychloroquine 0.011864397
12:  75          3 hydroxychloroquine 0.011864397
13:  83          2 hydroxychloroquine 0.012547716
14:  56          6            placebo 0.014215440
15:  64          5 hydroxychloroquine 0.014776646
16:  64          5 hydroxychloroquine 0.014776646
17:  77          3 hydroxychloroquine 0.016282989
18:  77          3 hydroxychloroquine 0.016282989
19:  48          7            placebo 0.017645351
20:  89          1            placebo 0.017952035
21:  89          1            placebo 0.017952035
22:  89          1            placebo 0.017952035
23:  87          1 hydroxychloroquine 0.019381651
24:  68          4 hydroxychloroquine 0.020411089
25:  71          4 hydroxychloroquine 0.020655362
26:  61          5 hydroxychloroquine 0.023464890
27:  61          5 hydroxychloroquine 0.023464890
28:  80          2 hydroxychloroquine 0.029746227
29:  67          4 hydroxychloroquine 0.033890653
30:  72          4 hydroxychloroquine 0.034516065
    age sofa_nogcs                trt        diff

# There are no good compromises so use marginal representative values
repcov <- list(age=57, sofa_nogcs=3)

There were no representative covariate combinations that were sufficiently close to unadjusted estimates and close to the center of covariate distributions, so we chose marginally central values.

Compute \(\Pr(Y_s \geq y | X, Y_0)\) analytically for a single value of \((y, Y_0, X, s)\), using this recursive formula where \(k\) is the number of outcome states and \(t\) is time:

\[\Pr(Y_{t} = y | X) = \sum_{j=1}^{k} \Pr(Y_{t} = y | X, Y_{t-1} = j) \Pr(Y_{t-1} = j | X)\] Note that recent updates to the R Hmisc package make the uprob function below obsolete. The soprobMarkovOrd function will compute state occupancy probabilities for given covariate settings. For Bayesian models Hmisc has soprobMarkovOrdm.

# Compute unconditional probabilities
# Output is a matrix with s rows corresponding to t=times(1), times(2), ...
# Each row has columns corresponding to the probabilities of being in each of the states
# times(1) must be 1
# 
# fit is the model fit which must contain the variable named day and the
# previous state, which must be named by the value of pvarname
# data is a data frame or data table that contains covariate settings
# other than day and what's in pvarname.  data has one row.
# times is the vector of days for which to evaluate state occupancy probabilities
# ylevels are the ordered levels of the outcome variable
# Set gap=TRUE if predictor gap is in the model and gap needs to be computed
# At t=times[1] gap is taken as times[1]

uprob <- function(fit, data, times, ylevels, pvarname='yprev', gap=FALSE) {
  k <- length(ylevels)
  s <- length(times)
  P <- matrix(NA, nrow=s, ncol=k)
  colnames(P) <- as.character(ylevels)
  rownames(P) <- as.character(times)
  # Never uncondition on initial state
  data$day <- times[1]
  if(gap) data$gap <- times[1]
  p <- predict(fit, data, type='fitted.ind')
  P[1, ] <- p
  # Matrix of conditional probabilities of Y conditioning on previous time Y
  # Columns = k conditional probabilities conditional on a single previous state
  # Rows    = all possible previous states
  cp <- matrix(NA, nrow=k, ncol=k, 
               dimnames=list(paste('t-1', ylevels), paste('t', ylevels)))
  data <- as.list(data)
  data[[pvarname]] <- as.character(ylevels[-1])
  # Above: don't request estimates after absorbing state
  edata <- expand.grid(data)
  for(it in 2 : s) {
    edata$day <- times[it]
    if(gap) edata$gap <- times[it] - times[it - 1]
    # edata$gap <- c(1,1,1,1,3,3,3,7,7,7,7,7,7,7,rep(14, 14))[times[it]]
    # y=1 = death = absorbing state
    cp <- rbind(c(1., rep(0., k - 1)), predict(fit, edata, type='fitted.ind'))
    # Compute unconditional probabilities of being in all possible states
    # at current time t
    P[it, ] <- t(cp) %*% P[it - 1, ]
    }
  P
}

dat <- data.frame(yprev='4', age=repcov$age, sofa_nogcs=repcov$sofa_nogcs)
  
times <- c(1, 2, 3, 4, 7, 14, 28)
tlev  <- levels(d$trt)

U <- NULL
for(trt in tlev) {   # separately for treatments
    dat$trt <- trt
    u <- uprob(f, dat, times, 1:7, gap=TRUE)
  cat('\n', trt, '\n\n')
  print(round(u, 3))
  U <- rbind(U, data.frame(trt, day=times, p=u, phome=u[, 6] + u[, 7]))
}


 placebo 

       1     2     3     4     5     6     7
1  0.001 0.037 0.172 0.646 0.118 0.024 0.001
2  0.007 0.085 0.157 0.491 0.173 0.077 0.008
3  0.016 0.100 0.130 0.395 0.194 0.141 0.024
4  0.028 0.107 0.117 0.339 0.189 0.184 0.036
7  0.033 0.076 0.077 0.223 0.209 0.282 0.100
14 0.038 0.051 0.038 0.104 0.150 0.394 0.225
28 0.042 0.035 0.014 0.036 0.056 0.374 0.444

 hydroxychloroquine 

       1     2     3     4     5     6     7
1  0.001 0.036 0.171 0.646 0.119 0.025 0.001
2  0.007 0.084 0.156 0.491 0.174 0.078 0.009
3  0.016 0.098 0.129 0.394 0.195 0.142 0.024
4  0.027 0.106 0.116 0.338 0.190 0.186 0.036
7  0.033 0.075 0.076 0.222 0.209 0.284 0.101
14 0.037 0.050 0.038 0.103 0.149 0.395 0.229
28 0.041 0.034 0.013 0.035 0.055 0.373 0.448

Rorchid <- U

Plot the unconditional probabilities of being at home at each time, for representative age and sofa score.

xl <- xlab('Day'); yl <- ylab('Probability of Being at Home')
U$txt <- with(U, paste0(trt, '<br>Day:', day, '<br>', round(phome, 3)))
gg <- ggplot(U, aes(x=day, y=phome, color=trt, label=txt)) +
      geom_line() +
      xl + yl + labs(color='')
ggp(gg)

Repeat but estimate probabilities at days not observed in the data.

U$type <- 'Actual Times'
for(trt in tlev) {   # separately for treatments
    dat$trt <- trt
    u <- uprob(f, dat, 1:28, 1:7, gap=TRUE)
  cat('\n', trt, '\n\n')
  print(round(u, 3))
  lab <- paste(trt, 'day 1-28')
  U <- rbind(U, data.frame(trt, day=1:28, p=u, phome=u[, 6] + u[, 7],
                                                 type='Days 1-28', txt=''))
}


 placebo 

       1     2     3     4     5     6     7
1  0.001 0.037 0.172 0.646 0.118 0.024 0.001
2  0.007 0.085 0.157 0.491 0.173 0.077 0.008
3  0.016 0.100 0.130 0.395 0.194 0.141 0.024
4  0.028 0.107 0.117 0.339 0.189 0.184 0.036
5  0.046 0.126 0.119 0.317 0.179 0.184 0.030
6  0.084 0.162 0.138 0.309 0.162 0.129 0.014
7  0.170 0.216 0.168 0.275 0.111 0.056 0.004
8  0.342 0.264 0.164 0.173 0.042 0.013 0.001
9  0.609 0.231 0.091 0.058 0.008 0.002 0.000
10 0.867 0.102 0.022 0.009 0.001 0.000 0.000
11 0.980 0.017 0.002 0.001 0.000 0.000 0.000
12 0.999 0.001 0.000 0.000 0.000 0.000 0.000
13 1.000 0.000 0.000 0.000 0.000 0.000 0.000
14 1.000 0.000 0.000 0.000 0.000 0.000 0.000
15 1.000 0.000 0.000 0.000 0.000 0.000 0.000
16 1.000 0.000 0.000 0.000 0.000 0.000 0.000
17 1.000 0.000 0.000 0.000 0.000 0.000 0.000
18 1.000 0.000 0.000 0.000 0.000 0.000 0.000
19 1.000 0.000 0.000 0.000 0.000 0.000 0.000
20 1.000 0.000 0.000 0.000 0.000 0.000 0.000
21 1.000 0.000 0.000 0.000 0.000 0.000 0.000
22 1.000 0.000 0.000 0.000 0.000 0.000 0.000
23 1.000 0.000 0.000 0.000 0.000 0.000 0.000
24 1.000 0.000 0.000 0.000 0.000 0.000 0.000
25 1.000 0.000 0.000 0.000 0.000 0.000 0.000
26 1.000 0.000 0.000 0.000 0.000 0.000 0.000
27 1.000 0.000 0.000 0.000 0.000 0.000 0.000
28 1.000 0.000 0.000 0.000 0.000 0.000 0.000

 hydroxychloroquine 

       1     2     3     4     5     6     7
1  0.001 0.036 0.171 0.646 0.119 0.025 0.001
2  0.007 0.084 0.156 0.491 0.174 0.078 0.009
3  0.016 0.098 0.129 0.394 0.195 0.142 0.024
4  0.027 0.106 0.116 0.338 0.190 0.186 0.036
5  0.045 0.124 0.118 0.316 0.180 0.186 0.030
6  0.083 0.161 0.137 0.309 0.164 0.132 0.015
7  0.167 0.214 0.168 0.277 0.113 0.057 0.005
8  0.337 0.264 0.165 0.176 0.043 0.014 0.001
9  0.604 0.233 0.093 0.060 0.008 0.002 0.000
10 0.863 0.104 0.023 0.009 0.001 0.000 0.000
11 0.979 0.018 0.002 0.001 0.000 0.000 0.000
12 0.999 0.001 0.000 0.000 0.000 0.000 0.000
13 1.000 0.000 0.000 0.000 0.000 0.000 0.000
14 1.000 0.000 0.000 0.000 0.000 0.000 0.000
15 1.000 0.000 0.000 0.000 0.000 0.000 0.000
16 1.000 0.000 0.000 0.000 0.000 0.000 0.000
17 1.000 0.000 0.000 0.000 0.000 0.000 0.000
18 1.000 0.000 0.000 0.000 0.000 0.000 0.000
19 1.000 0.000 0.000 0.000 0.000 0.000 0.000
20 1.000 0.000 0.000 0.000 0.000 0.000 0.000
21 1.000 0.000 0.000 0.000 0.000 0.000 0.000
22 1.000 0.000 0.000 0.000 0.000 0.000 0.000
23 1.000 0.000 0.000 0.000 0.000 0.000 0.000
24 1.000 0.000 0.000 0.000 0.000 0.000 0.000
25 1.000 0.000 0.000 0.000 0.000 0.000 0.000
26 1.000 0.000 0.000 0.000 0.000 0.000 0.000
27 1.000 0.000 0.000 0.000 0.000 0.000 0.000
28 1.000 0.000 0.000 0.000 0.000 0.000 0.000

# Plotly doesn't work will with both color and linetype
ggplot(U, aes(x=day, y=phome, color=trt, linetype=type)) + geom_line() +
    xl + yl + labs(color='', linetype='')

Everything has gone haywire. This is caused by the fact that both gap and day needed to be in the model to have the right main effects and interactions, but these two variables are collinear. We are requesting predictions above for combinations of day and gap that did not occur in the data, i.e., gap=1 when day> 4. Collinearities cause difficulties when geting predictions for out-of-sample data in which the collinearities disagree with those in the sample used in fitting the model.

So when the design is such that one needs to interact gap with previous state and gap is not randomly assigned or is otherwise tangled with absolute time, it is not possible to get estimates for unobserved times.

Before giving up, we’ll try a model below that does not contain gap, and that flexibly models the day main effect using a spline,

Markov Model Without Covariate Adjustment

Let’s repeat the exercise with no covariate adjustment to check how we are estimating the proportion of patients at home at each observed day. We relax the model to allow it to fit any time trend except that the interaction between day and previous state is restricted to be linear in day. We are making proportional odds assumptions for day, previous state, and treatment. The PO assumption will be less consequential for treatment because of its weak effect.

dfull <- dall[y0 == 4, .(day, y, trt, id)]
# Compute days since previous measurements
s <- dfull[, .(phome = mean(y >= 6)), by=.(trt, day)]
s[, type := 'Empirical']
empirical <- s

# It would be nice to just have day in the model and not gap (which can be
# derived from day), but the collinearities between yprev and day prevent
# this, and we must interact day with yprev - see the crosstab below

d[, table(day, yprev)]

    yprev
day    2   3   4   5   6   7
  1   32  55 224 168   0   0
  2   48  70 209 142  10   0
  3   52  73 189 113  51   0
  4   63  57 157  95 103   0
  7   69  50 132  75 148   0
  14  60  35  71  53 129 113
  28  42  12  40  37 169 147

# Treat day as numeric but with all needed variables added (less one)
# to make it be modeled as categorical except for the interaction term
# Using records truncated at death
fplain  <- lrm(y ~ gap * yprev + pol(day, 2) + trt, data=d)
# The following has a slightly greater LR chi-square:
fplain2 <- lrm(y ~ rcs(day, 4) + yprev + day %ia% yprev + trt, data=d)
# lrm(y ~ sqrt(day) * yprev + trt, data=d) also fits well
fplain

Logistic Regression Model

 lrm(formula = y ~ gap * yprev + pol(day, 2) + trt, data = d)

Frequencies of Responses

   1   2   3   4   5   6   7 
  50 357 297 817 526 757 489

	Model Likelihood Ratio Test	Discrimination Indexes	Rank Discrim. Indexes
Obs 3293	LR χ² 4705.63	R² 0.783	C 0.905
max \|∂log L/∂β\| 1×10^-9	d.f. 14	g 4.121	D_xy 0.810
	Pr(>χ²) <0.0001	g_r 61.595	γ 0.825
		g_p 0.305	τ_a 0.662
		Brier 0.058

	β	S.E.	Wald Z	Pr(>\|Z\|)
y≥2	1.2998	0.1895	6.86	<0.0001
y≥3	-2.1762	0.1821	-11.95	<0.0001
y≥4	-4.0680	0.2103	-19.34	<0.0001
y≥5	-7.1255	0.2356	-30.25	<0.0001
y≥6	-8.9419	0.2482	-36.03	<0.0001
y≥7	-11.8166	0.2703	-43.71	<0.0001
gap	-0.2125	0.0837	-2.54	0.0111
yprev=3	2.4481	0.2163	11.32	<0.0001
yprev=4	5.1622	0.2192	23.56	<0.0001
yprev=5	7.6626	0.2407	31.83	<0.0001
yprev=6	9.7256	0.2657	36.60	<0.0001
yprev=7	9.6987	0.5701	17.01	<0.0001
day	0.2786	0.0391	7.13	<0.0001
day²	-0.0051	0.0007	-6.94	<0.0001
trt=hydroxychloroquine	-0.0131	0.0679	-0.19	0.8470
gap × yprev=3	0.2093	0.0605	3.46	0.0005
gap × yprev=4	0.3010	0.0381	7.91	<0.0001
gap × yprev=5	0.2109	0.0388	5.44	<0.0001
gap × yprev=6	0.0820	0.0330	2.48	0.0130
gap × yprev=7	0.2155	0.0566	3.80	0.0001

anova(fplain)

	χ²	d.f.	P
Wald Statistics for `y`
gap (Factor+Higher Order Factors)	93.36	6	<0.0001
All Interactions	90.84	5	<0.0001
yprev (Factor+Higher Order Factors)	2192.50	10	<0.0001
All Interactions	90.84	5	<0.0001
day	83.98	2	<0.0001
Nonlinear	48.22	1	<0.0001
trt	0.04	1	0.8470
gap × yprev (Factor+Higher Order Factors)	90.84	5	<0.0001
TOTAL NONLINEAR + INTERACTION	170.18	6	<0.0001
TOTAL	2506.66	14	<0.0001

dat <- data.frame(yprev='4')
  
times <- c(1, 2, 3, 4, 7, 14, 28)
# gaps  <- c(1, diff(times))
tlev  <- levels(d$trt)

for(trt in tlev) {   # separately for treatments
    dat$trt <- trt
    dat$gap <- 1
    u <- uprob(fplain2, dat, times, 1:7, gap=TRUE)
  cat('\n', trt, '\n\n')
  print(round(u, 3))
  s <- rbind(s, data.frame(trt, day=times, p=u, phome=u[, 6] + u[, 7],
                                                 type='Model'), fill=TRUE)
}


 placebo 

       1     2     3     4     5     6     7
1  0.001 0.036 0.168 0.643 0.124 0.027 0.002
2  0.010 0.092 0.161 0.485 0.164 0.078 0.009
3  0.024 0.113 0.132 0.387 0.184 0.137 0.024
4  0.038 0.110 0.104 0.309 0.192 0.200 0.048
7  0.048 0.087 0.071 0.210 0.194 0.284 0.104
14 0.054 0.059 0.039 0.096 0.135 0.365 0.253
28 0.061 0.042 0.014 0.037 0.053 0.339 0.454

 hydroxychloroquine 

       1     2     3     4     5     6     7
1  0.001 0.037 0.169 0.642 0.123 0.026 0.002
2  0.010 0.093 0.163 0.485 0.163 0.077 0.009
3  0.025 0.114 0.133 0.387 0.182 0.135 0.024
4  0.039 0.112 0.105 0.310 0.191 0.197 0.047
7  0.050 0.089 0.072 0.212 0.194 0.282 0.102
14 0.056 0.060 0.039 0.097 0.136 0.364 0.248
28 0.063 0.043 0.015 0.037 0.054 0.341 0.448

ggplot(s, aes(x=day, y=phome, color=trt, linetype=type)) + geom_line() + xl + yl +
    labs(color='', linetype='')

Now add estimates for all days 1-28.

for(trt in tlev) {   # separately for treatments
    dat$trt <- trt
    u <- uprob(fplain, dat, 1:28, 1:7, gap=TRUE)
  cat('\n', trt, '\n\n')
  print(round(u, 3))
  s <- rbind(s, data.frame(trt=paste(trt, 'days 1-28'), day=1:28, p=u,
                                                 phome=u[, 6] + u[, 7],
                                                 type='Model', times='Days 1-28'), fill=TRUE)
}


 placebo 

       1     2     3     4     5     6     7
1  0.001 0.033 0.155 0.643 0.136 0.030 0.002
2  0.009 0.086 0.154 0.483 0.173 0.085 0.010
3  0.022 0.108 0.131 0.386 0.185 0.142 0.026
4  0.035 0.107 0.108 0.314 0.187 0.200 0.049
5  0.046 0.094 0.088 0.254 0.181 0.255 0.082
6  0.053 0.076 0.071 0.204 0.169 0.303 0.124
7  0.058 0.056 0.055 0.160 0.152 0.341 0.177
8  0.060 0.039 0.042 0.122 0.132 0.366 0.238
9  0.062 0.025 0.029 0.091 0.111 0.377 0.305
10 0.063 0.015 0.020 0.065 0.090 0.374 0.373
11 0.064 0.008 0.012 0.045 0.071 0.361 0.439
12 0.064 0.004 0.007 0.030 0.055 0.340 0.500
13 0.064 0.002 0.004 0.019 0.043 0.315 0.552
14 0.064 0.001 0.002 0.013 0.034 0.290 0.597
15 0.064 0.000 0.001 0.009 0.027 0.266 0.633
16 0.064 0.000 0.001 0.006 0.022 0.244 0.663
17 0.064 0.000 0.000 0.005 0.019 0.224 0.688
18 0.064 0.000 0.000 0.004 0.016 0.208 0.708
19 0.064 0.000 0.000 0.003 0.015 0.194 0.724
20 0.064 0.000 0.000 0.003 0.013 0.182 0.738
21 0.064 0.000 0.000 0.003 0.012 0.172 0.749
22 0.064 0.000 0.000 0.002 0.011 0.164 0.758
23 0.064 0.000 0.000 0.002 0.011 0.158 0.765
24 0.064 0.000 0.000 0.002 0.010 0.153 0.770
25 0.064 0.000 0.000 0.002 0.010 0.150 0.774
26 0.064 0.000 0.000 0.002 0.010 0.148 0.776
27 0.064 0.000 0.000 0.002 0.010 0.147 0.777
28 0.064 0.000 0.000 0.002 0.010 0.147 0.777

 hydroxychloroquine 

       1     2     3     4     5     6     7
1  0.001 0.033 0.157 0.643 0.135 0.030 0.002
2  0.009 0.087 0.156 0.483 0.171 0.083 0.010
3  0.023 0.110 0.132 0.386 0.183 0.140 0.025
4  0.036 0.109 0.109 0.315 0.186 0.197 0.048
5  0.047 0.096 0.089 0.256 0.181 0.251 0.080
6  0.054 0.078 0.072 0.206 0.169 0.300 0.121
7  0.059 0.058 0.057 0.162 0.153 0.338 0.173
8  0.062 0.040 0.043 0.124 0.133 0.364 0.234
9  0.064 0.026 0.030 0.093 0.112 0.376 0.300
10 0.065 0.016 0.020 0.066 0.091 0.374 0.368
11 0.065 0.009 0.013 0.046 0.072 0.361 0.433
12 0.066 0.005 0.007 0.031 0.056 0.341 0.494
13 0.066 0.002 0.004 0.020 0.044 0.317 0.547
14 0.066 0.001 0.002 0.013 0.034 0.292 0.592
15 0.066 0.000 0.001 0.009 0.027 0.268 0.629
16 0.066 0.000 0.001 0.006 0.022 0.246 0.659
17 0.066 0.000 0.000 0.005 0.019 0.226 0.684
18 0.066 0.000 0.000 0.004 0.017 0.210 0.704
19 0.066 0.000 0.000 0.003 0.015 0.195 0.720
20 0.066 0.000 0.000 0.003 0.013 0.184 0.734
21 0.066 0.000 0.000 0.003 0.012 0.174 0.745
22 0.066 0.000 0.000 0.002 0.012 0.166 0.754
23 0.066 0.000 0.000 0.002 0.011 0.159 0.761
24 0.066 0.000 0.000 0.002 0.010 0.154 0.767
25 0.066 0.000 0.000 0.002 0.010 0.151 0.771
26 0.066 0.000 0.000 0.002 0.010 0.149 0.773
27 0.066 0.000 0.000 0.002 0.010 0.148 0.774
28 0.066 0.000 0.000 0.002 0.010 0.149 0.773

ggplot(s, aes(x=day, y=phome, color=trt, linetype=type)) + geom_line() + xl + yl +
    labs(color='', linetype='')

Again there is a problem when estimating days not occurring in the data. Try a model that models day flexibly as a main effect, but that interacts it only linearly with previous state.

dat <- data.frame(yprev='4')
  
U <- empirical
for(trt in tlev) {   # separately for treatments
    dat$trt <- trt
    u <- uprob(fplain2, dat, 1:28, 1:7, gap=FALSE)
  U <- rbind(U, data.frame(trt, day=1:28, p=u, phome=u[, 6] + u[, 7],
                                                 type='Model days 1-28'), fill=TRUE)
}
ggplot(U, aes(x=day, y=phome, color=trt, linetype=type)) + geom_line() +
    xl + yl + labs(color='', linetype='')

We see the same problem as before.

Checking Validity of Method for Unequal Time Spacing Using VIOLET 2

We saw above that it is not valid to use a simple model fitted on sparse times to estimate state occupancy probabilities at unobserved times. Let’s now check that estimates are valid when observation timings are irregular but estimates are requested at actually occurring times. For this purpose we use the VIOLET 2 trial, which had a 4-level ordinal outcome scale measured on days 1-28 (records terminate at death). We’ll fit a non-covariate-adjusted (except for treatment) model using only days 1 2 3 4 7 14 28. Then we will request state occupancy probabilities at only those days, then at all days. But first start with fitting using all days and estimating all days.

Something strange happened with regard to ventilator use on day 28, so restrict analysis to first 27 days.

dcf <- readRDS('dcf.rds')
setDT(dcf)
dcf <- dcf[day < 28]

# Compute empirical proportions of being at home for those starting In Hospital, not on vent
dfull <- dcf[, .(id, day, bstatus, status, treatment, ddeath)]
emp <- dfull[bstatus == 'In Hospital',
             .(phome = mean(status == 'Home', na.rm=TRUE)),
             by=.(treatment, day)]
emp[, type := 'Empirical']
emp[, day  := unclass(day)]

dv <- dfull[day <= ddeath, ]   # don't carry death forward
s <- dv$bstatus
levels(s) <- list('Vent/ARDS'='ARDS', 'Vent/ARDS'='On Vent',
                                    'In Hospital/Facility'='In Hospital')
dv[, table(bstatus, s)]

             s
bstatus       Vent/ARDS In Hospital/Facility
  ARDS             2062                    0
  On Vent          8828                    0
  In Hospital         0                22229

dv[, bstatus := s]
setkey(dv, id, day)
dv[, pstatus := shift(status), by=id]
dv[day == 1, pstatus := bstatus]
dv[, pstatus := pstatus[drop=TRUE]]   # drop unused previous status Dead
ggplot(dv, aes(x=day)) + geom_histogram(binwidth=1) + facet_wrap(~ pstatus)

dd <- datadist(dv); options(datadist='dd')

# d2 <- dv[bstatus == 'In Hospital/Facility', ]
# Find model with best AIC
for(k in 3 : 10) {
  h <- lrm(status ~ rcs(day, k) * pstatus + treatment, data=dv, maxit=40, tol=1e-14)
  if(k == 7) f27 <- h
  cat(k, 'knots\n')
  print(AIC(h))
}

3 knots
[1] 12877.46
4 knots
[1] 12842.47
5 knots
[1] 12832.9
6 knots
[1] 12826.01
7 knots
[1] 12819.5
8 knots
[1] 12823.38
9 knots
[1] 12822.33
10 knots
[1] 12823.41

# Use 7 knots
dat <- data.frame(pstatus='In Hospital/Facility')
  
#times <- c(1, 2, 3, 4, 7, 14, 28)
tlev  <- unique(dv$treatment)
ylev  <- levels(dv$status)
times <- 1:28

R <- emp
for(trt in tlev) {   # separately for treatments
    dat$treatment <- trt
    u <- uprob(f27, dat, times, ylev, pvarname='pstatus')
  cat('\n', trt, '\n\n')
  print(round(u, 3))
  w <- data.table(treatment=trt, day=times, p=u, phome=u[, 4], type='Model')
  R <- rbind(R, w, fill=TRUE)
}


 Placebo 

    Dead Vent/ARDS In Hospital/Facility  Home
1  0.000     0.018                0.925 0.056
2  0.001     0.027                0.846 0.127
3  0.002     0.030                0.757 0.211
4  0.003     0.031                0.663 0.304
5  0.004     0.030                0.574 0.392
6  0.005     0.029                0.499 0.467
7  0.007     0.029                0.440 0.525
8  0.008     0.029                0.394 0.569
9  0.010     0.029                0.359 0.603
10 0.012     0.029                0.328 0.631
11 0.014     0.029                0.303 0.655
12 0.016     0.029                0.280 0.676
13 0.018     0.028                0.259 0.695
14 0.020     0.028                0.241 0.711
15 0.022     0.028                0.225 0.725
16 0.024     0.028                0.212 0.737
17 0.026     0.028                0.199 0.747
18 0.028     0.029                0.188 0.755
19 0.030     0.029                0.178 0.763
20 0.032     0.029                0.169 0.770
21 0.034     0.028                0.161 0.776
22 0.036     0.028                0.154 0.782
23 0.038     0.027                0.147 0.787
24 0.040     0.027                0.141 0.792
25 0.042     0.026                0.135 0.797
26 0.045     0.025                0.129 0.801
27 0.047     0.025                0.123 0.805
28 0.049     0.024                0.117 0.809

 Vitamin D 

    Dead Vent/ARDS In Hospital/Facility  Home
1  0.000     0.018                0.925 0.056
2  0.001     0.027                0.845 0.127
3  0.002     0.030                0.756 0.212
4  0.003     0.030                0.662 0.305
5  0.004     0.030                0.573 0.394
6  0.005     0.029                0.497 0.468
7  0.007     0.028                0.438 0.527
8  0.008     0.028                0.393 0.570
9  0.010     0.029                0.357 0.604
10 0.012     0.029                0.327 0.632
11 0.014     0.029                0.301 0.657
12 0.016     0.028                0.278 0.678
13 0.018     0.028                0.258 0.697
14 0.019     0.028                0.240 0.713
15 0.021     0.027                0.224 0.727
16 0.023     0.028                0.210 0.739
17 0.025     0.028                0.198 0.749
18 0.028     0.028                0.187 0.757
19 0.030     0.029                0.177 0.765
20 0.032     0.028                0.168 0.772
21 0.034     0.028                0.160 0.778
22 0.036     0.028                0.153 0.784
23 0.038     0.027                0.146 0.789
24 0.040     0.026                0.140 0.794
25 0.042     0.026                0.134 0.798
26 0.044     0.025                0.128 0.803
27 0.046     0.025                0.122 0.807
28 0.048     0.024                0.116 0.811

ggplot(R, aes(x=day, y=phome, color=treatment, linetype=type)) + geom_line() +
    xl + yl + labs(color='', linetype='')
# Fit is barely better beyond day 17 if ran lrm on data table d2

The model would probably fit better had a partial proportional odds model been used to relax the proportional odds assumption for day.

Now fit a model as if data collected was restricted to certain days. Need to recompute lags to not let a lag be based on a data that’s no longer sampled.

times <- c(1, 2, 3, 4, 7, 14, 27)
# Compute empirical proportions of being at home for those starting In Hospital, not on vent
dfull <- dcf[day %in% times, .(id, day, bstatus, status, treatment, ddeath)]
emps <- dfull[bstatus == 'In Hospital',
                            .(phome = mean(status == 'Home', na.rm=TRUE)),
                 by=.(treatment, day)]
emps[, type := 'Empirical on selected days']
emps[, day  := unclass(day)]

dv <- dfull[day <= ddeath, ]   # don't carry death forward
s  <- dv[, bstatus]
levels(s) <- list('Vent/ARDS'='ARDS', 'Vent/ARDS'='On Vent',
                                    'In Hospital/Facility'='In Hospital')
dv[, table(bstatus, s)]

             s
bstatus       Vent/ARDS In Hospital/Facility
  ARDS              611                    0
  On Vent          2408                    0
  In Hospital         0                 5925

dv[, bstatus := s]
setkey(dv, id, day)
dv[, pstatus := shift(status), by=id]
dv[day == 1, pstatus := bstatus]
dv[, pstatus := pstatus[drop=TRUE]]   # drop unused previous status Dead
dv[, gap := ifelse(day == 1, 1, day - shift(day)), by=id]
dv[, table(day, gap)]

    gap
day     1    3    7   13
  1  1352    0    0    0
  2  1336    0    0    0
  3  1321    0    0    0
  4  1303    0    0    0
  7     0 1268    0    0
  14    0    0 1212    0
  27    0    0    0 1159

ggplot(dv, aes(x=day)) + geom_histogram(binwidth=1) + facet_wrap(~ pstatus)

dd <- datadist(dv); options(datadist='dd')

# d2 <- dv[bstatus == 'In Hospital/Facility', ]
# Find transformation giving the best AIC
# Can't use a spline because of too many tied times
for(k in c(0 : 3)) {
    cat('Transformation', k, '\n')
    form <- if(k == 0) status ~ (day + gap) * pstatus + treatment
    else
        if(k == 1) status ~ (sqrt(day) + sqrt(gap)) * pstatus + treatment
    else
        if(k == 2) status ~ (pol(day, 2) + pol(gap, 2)) * pstatus + treatment
    else
        status ~ (pol(sqrt(day), 2) + pol(sqrt(day), 2)) * pstatus + treatment
    h <- lrm(form,  data=dv, maxit=50, tol=1e-14)
    if(k == 2) fsel <- h   # choose quadratic
    print(AIC(h))
}

Transformation 0 
[1] 8098.67
Transformation 1 
[1] 8008.668
Transformation 2 
[1] 7864.54
Transformation 3 
[1] 7896.565

dat <- data.frame(pstatus='In Hospital/Facility')
  
tlev  <- unique(dv$treatment)
ylev  <- levels(dv$status)

R <- rbind(R, emps, fill=TRUE)
for(trt in tlev) {   # separately for treatments
    dat$treatment <- trt
    u <- uprob(fsel, dat, times, ylev, pvarname='pstatus', gap=TRUE)
  cat('\n', trt, '\n\n')
  print(round(u, 3))
  w <- data.table(treatment=trt, day=times, p=u, phome=u[, 4], type='Model selected days')
  R <- rbind(R, w, fill=TRUE)
}


 Placebo 

    Dead Vent/ARDS In Hospital/Facility  Home
1  0.000     0.039                0.884 0.077
2  0.003     0.059                0.781 0.157
3  0.006     0.066                0.684 0.245
4  0.009     0.064                0.585 0.342
7  0.011     0.051                0.412 0.526
14 0.012     0.029                0.253 0.705
27 0.013     0.025                0.172 0.790

 Vitamin D 

    Dead Vent/ARDS In Hospital/Facility  Home
1  0.000     0.038                0.883 0.079
2  0.002     0.057                0.779 0.161
3  0.005     0.064                0.680 0.251
4  0.009     0.062                0.580 0.349
7  0.011     0.049                0.405 0.536
14 0.011     0.028                0.246 0.714
27 0.013     0.024                0.166 0.798

times <- 1:28
dat <- data.frame(pstatus='In Hospital/Facility')
for(trt in tlev) {   # separately for treatments
    dat$treatment <- trt
    u <- uprob(fsel, dat, times, ylev, pvarname='pstatus', gap=TRUE)
  cat('\n', trt, '\n\n')
  print(round(u, 3))
  w <- data.table(treatment=trt, day=times, p=u, phome=u[, 4], type='Model selected days, predict all')
  R <- rbind(R, w, fill=TRUE)
}


 Placebo 

    Dead Vent/ARDS In Hospital/Facility  Home
1  0.000     0.039                0.884 0.077
2  0.003     0.059                0.781 0.157
3  0.006     0.066                0.684 0.245
4  0.009     0.064                0.585 0.342
5  0.013     0.069                0.774 0.145
6  0.051     0.168                0.585 0.195
7  0.258     0.141                0.402 0.199
8  0.471     0.116                0.231 0.181
9  0.668     0.093                0.104 0.134
10 0.819     0.072                0.035 0.074
11 0.911     0.052                0.009 0.028
12 0.958     0.032                0.002 0.008
13 0.981     0.017                0.000 0.002
14 0.993     0.006                0.000 0.000
15 0.998     0.002                0.000 0.000
16 1.000     0.000                0.000 0.000
17 1.000     0.000                0.000 0.000
18 1.000     0.000                0.000 0.000
19 1.000     0.000                0.000 0.000
20 1.000     0.000                0.000 0.000
21 1.000     0.000                0.000 0.000
22 1.000     0.000                0.000 0.000
23 1.000     0.000                0.000 0.000
24 1.000     0.000                0.000 0.000
25 1.000     0.000                0.000 0.000
26 1.000     0.000                0.000 0.000
27 1.000     0.000                0.000 0.000
28 1.000     0.000                0.000 0.000

 Vitamin D 

    Dead Vent/ARDS In Hospital/Facility  Home
1  0.000     0.038                0.883 0.079
2  0.002     0.057                0.779 0.161
3  0.005     0.064                0.680 0.251
4  0.009     0.062                0.580 0.349
5  0.012     0.067                0.773 0.148
6  0.050     0.169                0.581 0.199
7  0.261     0.142                0.396 0.201
8  0.476     0.116                0.226 0.182
9  0.673     0.094                0.101 0.133
10 0.822     0.073                0.033 0.072
11 0.912     0.052                0.008 0.027
12 0.957     0.033                0.002 0.007
13 0.981     0.017                0.001 0.002
14 0.993     0.007                0.000 0.000
15 0.998     0.002                0.000 0.000
16 1.000     0.000                0.000 0.000
17 1.000     0.000                0.000 0.000
18 1.000     0.000                0.000 0.000
19 1.000     0.000                0.000 0.000
20 1.000     0.000                0.000 0.000
21 1.000     0.000                0.000 0.000
22 1.000     0.000                0.000 0.000
23 1.000     0.000                0.000 0.000
24 1.000     0.000                0.000 0.000
25 1.000     0.000                0.000 0.000
26 1.000     0.000                0.000 0.000
27 1.000     0.000                0.000 0.000
28 1.000     0.000                0.000 0.000

ggplot(R, aes(x=day, y=phome, color=treatment, linetype=type)) + geom_line() +
    xl + yl + labs(color='', linetype='')

This confirms that the Markov model likely handles uneven time spacings, but predictions should be limited to days appearing in the data. As before the problem is due to our attempt to make completely out-of-sample predictions on days not in the data when (1) both day and gap are important but (2) day and gap are collinear.

Better Modeling of Gap Effect in VIOLET 2

The completeness of VIOLET daily follow-up allows us to do another natural experiment where we create a full distribution of gaps by randomly sampling days within patients. For each patient randomly select a day t between 2 and L = last day of data - 6 days. Generate a random gap time r between 1 and L - 2. Remove days t, t+1, … t+r-1.

w <- dcf
w$gap <- NULL

setkey(w, id, day)
w <- w[!is.na(status), ]
# Recode baseline status like follow-up status
levels(w$bstatus) <- list('Vent/ARDS'='ARDS', 'Vent/ARDS'='On Vent',
                                            'In Hospital/Facility'='In Hospital')
w[, .(died=any(status=='Dead')), by=id][, table(died)]

died
FALSE  TRUE 
 1156   195

u <- data.table(id=unique(w$id), key='id')
n <- nrow(u)
set.seed(1)
u[, start := sample(2 : 20, n, replace=TRUE)]
u[, width := sample(1 : 15, n, replace=TRUE)]
u[, end   := start + width - 1]
u[end < 27, table(start, width)]

     width
start  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15
   2   4  8  3  1  5  5  6  3  5  6  3  7  6  5  2
   3   6  4  4  7  6  5  4  3  6  2  6  6 13  3  2
   4   5  6  6  3  9  3  3  5  4  3  6  8  4  2  2
   5   6  4  5  4  4  6  5  5  7  4  1  3  8  3  3
   6   3  6  3  0  4  2  3  9  7  7  8  8  9  2  4
   7   1  3  5  2  3  6  7  5  6  7  5  4  7  3  4
   8   2  9  4  8  3  3  3  5  6  1  2  5  6  9  5
   9   3  3  4  5  8  3  5  4  8  5  7  3  7  4  5
   10 10  6  5  4  4  6  2  4  5  7  5  5  4  7  8
   11  4  7  4  5  6  7 10  6  6  9  3  3  4  2  1
   12  5  2  6  2  4  3  8  1  3  6  4  8  4  6  4
   13  6  3  6  4  6  4  2  4  3  8  2  7  7  2  0
   14  3  7  7  7  4  7  3  4  5  5  5  4  4  0  0
   15  8  4  8  8  7  2  7  4  4  5  4  4  0  0  0
   16  5  3  3  5  6  2  2  6  4  5  2  0  0  0  0
   17  3  9  4  6  7  1  5  4  2 10  0  0  0  0  0
   18  1  4  7  5  4  6  5  2  5  0  0  0  0  0  0
   19  3  6  4  2  0  3  4  3  0  0  0  0  0  0  0
   20  6  4  6  4  3  4  9  0  0  0  0  0  0  0  0

z <- w[u]
w <- z[end >= 27 | (day < start | day > end),]
# Recreate previous state and gap with new gaps
w[, pstatus := ifelse(day == 1, as.character(bstatus), shift(as.character(status))), by=id]
w[, gap     := ifelse(day == 1, 1L, day - shift(day)), by=id]
# Don't carry death forward 
w[,  nd := cumsum(status == 'Dead'), by=id]
w <- w[nd < 2, ]
w[, sum(nd > 0)]    # original number of deaths

[1] 195

w[day > 1, table(gap)]

gap
    1     2     3     4     5     6     7     8     9    10    11    12    13 
22788    76    92    84    77    89    73    85    73    78    84    61    71 
   14    15    16 
   78    45    36

with(w, ggfreqScatter(day, gap))
dd <- datadist(w); options(datadist='dd')
f0 <- lrm(status ~ pstatus + rcs(day, 7) + treatment + rcs(age, 4) + sofa + lips,
         data=w, tol=1e-13)
f1 <- lrm(status ~ rcs(gap, 4) + pstatus + rcs(day, 4) + treatment + rcs(age, 4) + sofa + lips,
         data=w, tol=1e-13)
f2 <- lrm(status ~ gap * pstatus + rcs(day, 4) + treatment + rcs(age, 4) + sofa + lips,
         data=w, tol=1e-13)
f3 <- lrm(status ~ lsp(gap, 2) * pstatus + rcs(day, 4) + treatment + rcs(age, 4) + sofa + lips,
         data=w, tol=1e-13)
f4 <- lrm(status ~ lsp(gap, c(2,4)) * pstatus + rcs(day, 4) + treatment + rcs(age, 4) + sofa + lips,
         data=w, tol=1e-13)
f5 <- lrm(status ~ rcs(gap, 4) * pstatus + rcs(day, 4) + treatment + rcs(age, 4) + sofa + lips,
         data=w, tol=1e-13)
c(f0=AIC(f0), f1=AIC(f1), f2=AIC(f2), f3=AIC(f3), f4=AIC(f4), f5=AIC(f5))

      f0       f1       f2       f3       f4       f5 
12012.40 11652.23 11625.38 11597.44 11601.45 11600.24

ggplot(Predict(f1, gap, conf.int=FALSE))
ggplot(Predict(f5, gap, pstatus, conf.int=FALSE))

anova(f5)

	χ²	d.f.	P
Wald Statistics for `status`
gap (Factor+Higher Order Factors)	504.72	9	<0.0001
All Interactions	129.33	6	<0.0001
Nonlinear (Factor+Higher Order Factors)	38.42	6	<0.0001
pstatus (Factor+Higher Order Factors)	5172.52	8	<0.0001
All Interactions	129.33	6	<0.0001
day	112.31	3	<0.0001
Nonlinear	19.27	2	<0.0001
treatment	0.40	1	0.5253
age	47.59	3	<0.0001
Nonlinear	2.34	2	0.3098
sofa	29.49	1	<0.0001
lips	7.27	1	0.0070
gap × pstatus (Factor+Higher Order Factors)	129.33	6	<0.0001
Nonlinear	15.44	4	0.0039
Nonlinear Interaction : f(A,B) vs. AB	15.44	4	0.0039
TOTAL NONLINEAR	64.92	10	<0.0001
TOTAL NONLINEAR + INTERACTION	179.57	12	<0.0001
TOTAL	5683.23	20	<0.0001

Before drawing conclusions let’s handle non-PO in time.

require(VGAM)

fpol <- vglm(ordered(status) ~ pstatus + day + treatment + rcs(age, 4) + sofa + lips,
          cumulative(reverse=TRUE, parallel=TRUE), data=w)
fpo3 <- vglm(ordered(status) ~ pstatus + rcs(day, 3) + treatment + rcs(age, 4) + sofa + lips,
          cumulative(reverse=TRUE, parallel=TRUE), data=w)
fpo5 <- vglm(ordered(status) ~ pstatus + rcs(day, 5) + treatment + rcs(age, 4) + sofa + lips,
          cumulative(reverse=TRUE, parallel=TRUE), data=w)
f0l <- vglm(ordered(status) ~ pstatus + day + treatment + rcs(age, 4) + sofa + lips,
          cumulative(reverse=TRUE, parallel=FALSE ~ day), data=w)
f03 <- vglm(ordered(status) ~ pstatus + rcs(day, 3) + treatment + rcs(age, 4) + sofa + lips,
          cumulative(reverse=TRUE, parallel=FALSE ~ rcs(day, 3)), data=w) 
f05 <- vglm(ordered(status) ~ pstatus + rcs(day, 5) + treatment + rcs(age, 4) + sofa + lips,
          cumulative(reverse=TRUE, parallel=FALSE ~ rcs(day, 5)), data=w)
f1 <- vglm(ordered(status) ~ pstatus + rcs(gap, 4) + rcs(day, 3) + treatment + rcs(age, 4) + sofa + lips,
          cumulative(reverse=TRUE, parallel=FALSE ~ rcs(day, 3)), data=w)
f2 <- vglm(ordered(status) ~ gap * pstatus + rcs(day, 3) + treatment + rcs(age, 4) + sofa + lips,
          cumulative(reverse=TRUE, parallel=FALSE ~ rcs(day, 3)), data=w)
f3 <- vglm(ordered(status) ~ rcs(gap, 3) * pstatus + rcs(day, 3) + treatment + rcs(age, 4) + sofa + lips,
          cumulative(reverse=TRUE, parallel=FALSE ~ rcs(day, 3)), data=w)
f4 <- vglm(ordered(status) ~ rcs(gap, 4) * pstatus + rcs(day, 3) + treatment + rcs(age, 4) + sofa + lips,
          cumulative(reverse=TRUE, parallel=FALSE ~ rcs(day, 3)), data=w)

c(fpol=AIC(fpol), fpo3=AIC(fpo3), fpo5=AIC(fpo5), f0l=AIC(f0l), f03=AIC(f03), f05=AIC(f05), f1=AIC(f1), f2=AIC(f2), f3=AIC(f3), f4=AIC(f4))

    fpol     fpo3     fpo5      f0l      f03      f05       f1       f2 
12044.31 12023.26 12023.96 12039.35 11988.62 11992.68 11639.42 11612.38 
      f3       f4 
11590.84 11592.26

Comparing AICs indicates the following:

fpol vs. fpo3: effect of day is nonlinear in a PO model
fpo3 vs. fpo5: 3 knots are adequate for day in PO model
fpo5 vs. f05: moderate non-PO for day
f05 vs. f0l: non-PO effect of day or main effect of day or both are nonlinear
f05 vs. f03: 3 knots works better for day for combined main and non-PO effects
f1 vs. f03: main effect is needed for gap
f2 vs. f1: previous state interacts weekly with gap
f2, f3,f4: previous state interaction with gap is better represented by 3 knots

Odds Ratios for ORCHID

Now get treatment odds ratios for state occupancy probabilities, as a function of time, as opposed to odds of transitions over consecutive time periods.

odds <- function(x) x / (1. - x)
setDT(Rorchid, key=c('trt', 'day'))
b    <- Rorchid[trt == 'hydroxychloroquine', phome]
a    <- Rorchid[trt == 'placebo',            phome]
days <- Rorchid[trt == 'placebo',            day]
or   <- odds(b) / odds(a)
plot(days, or, type='b', xlab='Day', ylab='hydroxy : placebo OR', ylim=c(1, 1.05))

abline(h=exp(coef(f)['trt=hydroxychloroquine']), col=gray(0.85))
text(27, 1.007,
         'Horizontal line is conditional\ntransition OR from unified PO model', adj=1)

Bayesian Markov Model for ORCHID

We parallel the regular proportional odds model with a Bayesian Markov proportional hazards model.

stanSet()

bmarkov <- blrm(y ~ gap * yprev + pol(day, 2) + trt + rcs(age, 3) +
                sofa_nogcs,
                data=d, refresh=100, file='bmarkov.rds')
htmlVerbatim(stanDx(bmarkov))

Iterations: 2000 on each of 4 chains, with 4000 posterior distribution samples saved

For each parameter, n_eff is a crude measure of effective sample size and Rhat is the potential scale reduction factor on split chains (at convergence, Rhat=1)

                        n_eff  Rhat
 y>=2                    2662 1.001
 y>=3                    2219 1.001
 y>=4                    2691 1.000
 y>=5                    3632 1.001
 y>=6                    3459 1.001
 y>=7                    2280 1.001
 gap                     2625 1.000
 yprev=3                 3087 1.000
 yprev=4                 3970 1.000
 yprev=5                 3637 1.001
 yprev=6                 2769 1.001
 yprev=7                 2578 1.000
 day                     5192 1.000
 day^2                   5053 1.000
 trt=hydroxychloroquine  5442 1.000
 age                     5505 1.000
 age'                    5513 1.000
 sofa_nogcs              4865 1.000
 gap * yprev=3           4791 1.000
 gap * yprev=4           4223 1.000
 gap * yprev=5           5252 0.999
 gap * yprev=6           5020 0.999
 gap * yprev=7           5482 0.999

stanDxplot(bmarkov)
bmarkov

Bayesian Proportional Odds Ordinal Logistic Model

Dirichlet Priors With Concentration Parameter 0.308 for Intercepts

 blrm(formula = y ~ gap * yprev + pol(day, 2) + trt + rcs(age, 
     3) + sofa_nogcs, data = d, refresh = 100, file = "bmarkov.rds")

Frequencies of Responses

   1   2   3   4   5   6   7 
  50 357 297 817 526 757 489

	Mixed Calibration/ Discrimination Indexes	Discrimination Indexes	Rank Discrim. Indexes
Obs 3293	B 0.084 [0.082, 0.085]	g 4.179 [4.02, 4.356]	C 0.908 [0.907, 0.908]
Draws 4000		g_p 0.442 [0.437, 0.447]	D_xy 0.816 [0.814, 0.817]
Chains 4		EV 0.625 [0.604, 0.644]
p 17		v 13.45 [12.436, 14.582]
		vp 0.155 [0.151, 0.161]

	Mode β	Mean β	Median β	S.E.	Lower	Upper	Pr(β>0)	Symmetry
y≥2	2.4622	2.4679	2.4675	0.3170	1.8609	3.0937	1.0000	1.03
y≥3	-1.0702	-1.0803	-1.0814	0.3078	-1.6511	-0.4478	0.0000	0.96
y≥4	-2.9740	-2.9829	-2.9839	0.3222	-3.6155	-2.3538	0.0000	1.00
y≥5	-6.0546	-6.0639	-6.0613	0.3389	-6.7592	-5.4398	0.0000	0.98
y≥6	-7.8915	-7.8991	-7.8943	0.3454	-8.5890	-7.2479	0.0000	0.98
y≥7	-10.7955	-10.8085	-10.7999	0.3591	-11.5214	-10.1165	0.0000	0.97
gap	-0.2174	-0.2170	-0.2165	0.0872	-0.3858	-0.0496	0.0058	0.99
yprev=3	2.2423	2.2524	2.2523	0.2273	1.8178	2.6953	1.0000	1.03
yprev=4	4.8495	4.8570	4.8511	0.2362	4.4060	5.3202	1.0000	1.01
yprev=5	7.2669	7.2735	7.2694	0.2573	6.7586	7.7535	1.0000	1.01
yprev=6	9.3374	9.3438	9.3395	0.2811	8.8091	9.9018	1.0000	1.01
yprev=7	9.3061	9.3066	9.3182	0.5846	8.1136	10.4645	1.0000	0.99
day	0.2781	0.2788	0.2787	0.0400	0.2036	0.3577	1.0000	1.01
day²	-0.0052	-0.0052	-0.0052	0.0008	-0.0068	-0.0038	0.0000	1.00
trt=hydroxychloroquine	0.0113	0.0126	0.0116	0.0694	-0.1308	0.1394	0.5685	0.98
age	-0.0092	-0.0092	-0.0093	0.0050	-0.0185	0.0007	0.0330	1.01
age’	-0.0031	-0.0031	-0.0031	0.0058	-0.0147	0.0078	0.2962	1.02
sofa_nogcs	-0.0901	-0.0900	-0.0903	0.0204	-0.1294	-0.0495	0.0000	1.02
gap × yprev=3	0.2080	0.2063	0.2079	0.0618	0.0791	0.3202	0.9998	0.96
gap × yprev=4	0.3269	0.3258	0.3255	0.0390	0.2493	0.3995	1.0000	0.98
gap × yprev=5	0.2403	0.2396	0.2397	0.0394	0.1638	0.3170	1.0000	1.00
gap × yprev=6	0.0943	0.0940	0.0942	0.0329	0.0327	0.1580	0.9975	0.99
gap × yprev=7	0.2276	0.2288	0.2284	0.0570	0.1243	0.3456	1.0000	1.02

We need to be able to compute posterior distributions on state occupancy probabilities to take into account all sources of uncertainty. The follow function computes all needed derived quantities, separately for each posterior draw.

# Compute unconditional probabilities
# Output is an array of dimension number of draws by s rows corresponding to
# t=times(1), times(2), ...
# times(s), and final dimension is the number of Y categories
# times(1) must be 1
# 
# fit is the Bayesian model fit from blrm which must contain the variable named day and the
# previous state, which must be named by the value of pvarname
# data is a data frame or data table that contains covariate settings
# other than day and what's in pvarname.  data has one row.
# times is the vector of days for which to evaluate state occupancy probabilities
# ylevels are the ordered levels of the outcome variable
# Set gap=TRUE to compute time gaps when predictor gap is in the model

uprobB <- function(fit, data, times, ylevels, pvarname='yprev', gap=FALSE) {
    if(! length(fit$draws)) stop('fit does not have posterior draws')
    nd <- nrow(fit$draws)
  k  <- length(ylevels)
  s  <- length(times)
  P  <- array(NA, c(nd, s, k),
                        dimnames=list(paste('draw', 1 : nd), as.character(times), 
                                                    as.character(ylevels)))
  # Never uncondition on initial state
  data$day <- times[1]
  if(gap) data$gap <- times[1]
  p <- predict(fit, data, type='fitted.ind', posterior.summary='all')
  P[, 1, ] <- p
  # cp: matrix of conditional probabilities of Y conditioning on previous time Y
  # Columns = k conditional probabilities conditional on a single previous state
  # Rows    = all possible previous states
  # This is for a single posterior draw
  cp <- matrix(NA, nrow=k, ncol=k, 
               dimnames=list(paste('t-1', ylevels), paste('t', ylevels)))
  data <- as.list(data)
  data[[pvarname]] <- as.character(ylevels[-1])
  # don't request estimates after absorbing state
  edata <- expand.grid(data)
  for(it in 2 : s) {
    edata$day <- times[it]
    if(gap) edata$gap <- times[it] - times[it - 1]
    # y=first level = absorbing state
    pp <- predict(fit, edata, type='fitted.ind', posterior.summary='all')
    for(idraw in 1 : nd) {
        cp <- rbind(c(1., rep(0., k - 1)), pp[idraw, ,])
      # Compute unconditional probabilities of being in all possible states
      # at current time t
      P[idraw, it, ] <- t(cp) %*% P[idraw, it - 1, ]
    }
  }
  P
}

times <- c(1,2,3,4,7,14,28)
ylev <- 1 : 7
tlev <- levels(d$trt)
dat <- data.frame(yprev='4', age=repcov$age, sofa_nogcs=repcov$sofa_nogcs)
R   <- list()
for(trt in tlev) {   # separately for treatments
    dat$trt <- trt
    u <- uprobB(bmarkov, dat, times, ylev, pvarname='yprev', gap=TRUE)
    R[[trt]] <- u
}

# Compute summary measure per treatment, separately for each time and posterior draw
# P(home) = P(Y=6 or Y=7)

s <- function(u) u[, 6] + u[, 7]
# w <- apply(R$placebo, 1, s)    # to test
r <- lapply(R, function(x) apply(x, 1, s))

For each treatment and time compute the posterior mean P(home) and 0.95 highest posterior density intervals.

g <- function(x) {
    w <- c(mean(x), HPDint(x))
    names(w) <- c('mean', 'lower', 'upper')
    w
}

rs <- lapply(r, function(x) as.data.frame(t(apply(x, 1, g))))
for(trt in tlev) rs[[trt]]$trt <- trt
rs <- do.call(rbind, rs)
# Fetch day from end of row names (follows .)
rs$day <- as.numeric(sub('^.*\\.(.*?)', '\\2', rownames(rs), ))
xl <- xlab('Day'); yl <- ylab('P(being at home)')
rs$txt <- with(rs, paste0(trt, '<br>Day:', day, '<br>', round(mean, 3)))
gg <- ggplot(rs, aes(x=day, y=mean, color=trt, label=txt)) + geom_line() + 
    geom_ribbon(aes(ymin = lower, ymax = upper), alpha=0.2, linetype=0) +
    xl + yl + labs(color='')
ggp(gg)

Instead of individual per-treatment estimates summarize the difference and odds ratio for state occupancy probabilities.

Dif <- r$hydroxychloroquine - r$placebo
dif <- as.data.frame(t(apply(Dif, 1, g)))
dif$day <- as.numeric(sub('^.*\\.(.*?)', '\\2', rownames(dif), ))
dif$txt <- with(dif, paste0('Day:', day, '<br>', round(mean, 3)))
gg <- ggplot(dif, aes(x=day, y=mean, label=txt)) + geom_line() +
    geom_ribbon(aes(ymin=lower, ymax=upper), alpha=0.2, linetype=0) +
    xl + ylab('Hydroxy - Placebo Difference in P(home)')
ggp(gg)

or <- odds(r$hydroxychloroquine) / odds(r$placebo)
or <- as.data.frame(t(apply(or, 1, g)))
or$day <- as.numeric(sub('^.*\\.(.*?)', '\\2', rownames(or), ))
or$txt <- with(or, paste0('Day:', day, '<br>', round(mean, 3)))
gg <- ggplot(or, aes(x=day, y=mean, label=txt)) + geom_line() +
    geom_ribbon(aes(ymin=lower, ymax=upper), alpha=0.2, linetype=0) +
    xl + ylab('Hydroxy - Placebo OR for Being at Home')
ggp(gg)

Evidence for Efficacy Over Time for One Covariate Combination

Now compute the Bayesian evidence measure for a positive difference in P(home). Note that in the two scatterplots relating posterior draws for differences in occupancy probabilities (P(home)), plotting one day vs. another, no points are on the upper left or lower right quadrants, so the evidence for a positive treatment effect does not vary over the days of follow-up.

plot(Dif[1, ], Dif[5, ], xlab='Posterior Draws for Day 1', ylab='Day 7', cex=.3)
abline(h=0, v=0, col=gray(0.8))
plot(Dif[3, ], Dif[6, ], xlab='Posterior Draws for Day 3', ylab='Day 14', cex=.3)
abline(h=0, v=0, col=gray(0.8))
pp <- apply(Dif, 1, function(x) mean(x > 0))
w <- data.frame(day = as.integer(names(pp)), pp)
w$txt <- with(w, paste0('Day:', day, '<br>', round(pp, 3)))
gg <- ggplot(w, aes(x=day, y=pp, label=txt)) + geom_line() +
    xl + ylab('P(increase in P(home) with hydroxychloroquine)')
ggp(gg)

The evidence does vary when computing the posterior probability that treatment has a benefit exceeding an absolute risk difference of 0.02. This is primarily due to the near impossibility of being discharged on the first or second day.

pp <- apply(Dif, 1, function(x) mean(x > 0.02))
w <- data.frame(day = as.integer(names(pp)), pp)
w$txt <- with(w, paste0('Day:', day, '<br>', round(pp, 3)))
gg <- ggplot(w, aes(x=day, y=pp, label=txt)) + geom_line() +
    xl + ylab('P(increase in P(home) with hydroxychloroquine > 0.02)')
ggp(gg)

Posterior Distributions of Median and Restricted Mean Time to Home

Even though the data collection times in ORCHID were not granular enough to compute median time until the patient returned home, we compute this anyway, and compute per-treatment posterior means and discrete posterior probability distributions for these medians. For a given posterior sample, the median is the first day for which P(home) >= 0.5 for a treatment. We arbitrary use 30d as the median when the probability never exceeds 0.5. Recall that all these calculations are conditional on the patient being in state 4 (hospitalized on supplemental oxygen) pre-randomization.

days <- c(1,2,3,4,7,14,28)
g <- function(p) if(all(p < 0.5)) 30 else min(days[p >= 0.5])
rmed <- lapply(r, function(x) data.frame(med = apply(x, 2, g)))
for(trt in tlev) rmed[[trt]]$trt <- trt
rmed <- do.call(rbind, rmed)
with(rmed, tapply(med, trt, mean))   # posterior mean for median

hydroxychloroquine            placebo 
                14                 14

with(rmed, table(med, trt))          # discrete distribution for median

    trt
med  hydroxychloroquine placebo
  14               4000    4000

The sharpness of the posterior distribution of the median time to home is just a reflection of what an easy question we are asking: what is the median of time to home after essentially rounding it to the nearest 7d.

Follow-up was not long enough to compute the mean time until returning home, but we can do inference for the mean restricted time until home (restricted to 28d) assuming that home is an absorbing state (which is approximately but not exactly true). Let \(t_{i}, i=1, 2, \dots, 7\) represent the assessment times (days 1, 2, 3, 4, 7, 14, 28) for time until home \(T\). Under the \(T \leq 28\) restriction, the mean restricted time to home is \(\sum_{i=1}^{7} t_{i} \Pr(T = t_{i}) = \sum_{i=1}^{7} t_{i} [\Pr(T \leq t_{i}) - [i > 1] \Pr(T \leq t_{i-1})]\).

g <- function(p) sum(days * (p - c(0., p[-length(p)])))
rmean <- lapply(r, function(x) data.frame(mean = apply(x, 2, g)))
for(trt in tlev) rmean[[trt]]$trt <- trt
rmean <- do.call(rbind, rmean)
with(rmean, tapply(mean, trt, mean))   # posterior means for mean restricted time to home

hydroxychloroquine            placebo 
          10.55112           10.57485

ggplot(rmean, aes(x=mean, color=trt)) + geom_density() +
    labs(color='') + xlab('Restricted Mean Time to Home (days)') + ylab('')

Compute the posterior probability that the mean restricted time to home is shorter for hydroxy.

a <- rmean[grep('placebo', rownames(rmean)), 'mean']
b <- rmean[grep('hydrox',  rownames(rmean)), 'mean']
plot(density(b - a), main='', xlab='Difference in Mean Restricted Time to Home', ylab='')
abline(v=0, col=gray(0.85))
mean(b < a)

[1] 0.56825

Markov Model with Random Effects

Let’s see what difference it makes to add patient-level random effects to the Markov state transition model.

bmarkovre <- blrm(y ~ gap * yprev + pol(day, 2) + trt + rcs(age, 3) +
                sofa_nogcs + cluster(id),
                data=d, refresh=100, file='bmarkovre.rds')
htmlVerbatim(stanDx(bmarkovre))

Iterations: 2000 on each of 4 chains, with 4000 posterior distribution samples saved

For each parameter, n_eff is a crude measure of effective sample size and Rhat is the potential scale reduction factor on split chains (at convergence, Rhat=1)

                        n_eff  Rhat
 y>=2                    2058 1.000
 y>=3                    1266 1.002
 y>=4                    1485 1.001
 y>=5                    2911 1.000
 y>=6                    2458 1.000
 y>=7                    1585 1.001
 gap                     1950 1.001
 yprev=3                 1880 1.000
 yprev=4                 3299 1.000
 yprev=5                 2750 1.001
 yprev=6                 1627 1.002
 yprev=7                 2037 1.000
 day                     5880 0.999
 day^2                   5076 1.001
 trt=hydroxychloroquine  5040 0.999
 age                     4770 1.001
 age'                    5632 0.999
 sofa_nogcs              5487 0.999
 gap * yprev=3           4795 1.000
 gap * yprev=4           5105 0.999
 gap * yprev=5           4534 0.999
 gap * yprev=6           5255 1.000
 gap * yprev=7           5623 0.999
 sigmag                  1384 1.000

stanDxplot(bmarkovre)
bmarkovre

Bayesian Proportional Odds Ordinal Logistic Model

Dirichlet Priors With Concentration Parameter 0.308 for Intercepts

 blrm(formula = y ~ gap * yprev + pol(day, 2) + trt + rcs(age, 
     3) + sofa_nogcs + cluster(id), data = d, refresh = 100, file = "bmarkovre.rds")

Frequencies of Responses

   1   2   3   4   5   6   7 
  50 357 297 817 526 757 489

	Mixed Calibration/ Discrimination Indexes	Discrimination Indexes	Rank Discrim. Indexes
Obs 3293	B 0.084 [0.082, 0.085]	g 4.184 [4.003, 4.327]	C 0.908 [0.907, 0.908]
Draws 4000		g_p 0.442 [0.437, 0.447]	D_xy 0.816 [0.814, 0.817]
Chains 4		EV 0.626 [0.604, 0.644]
p 17		v 13.469 [12.35, 14.435]
Cluster on `id`		vp 0.156 [0.15, 0.16]
Clusters 479
σ_γ 0.0532 [1e-04, 0.1583]

	Mean β	Median β	S.E.	Lower	Upper	Pr(β>0)	Symmetry
y≥2	2.4855	2.4782	0.3151	1.8645	3.0789	1.0000	1.00
y≥3	-1.0601	-1.0671	0.3088	-1.6477	-0.4493	0.0003	1.01
y≥4	-2.9656	-2.9664	0.3227	-3.5882	-2.3428	0.0000	1.03
y≥5	-6.0504	-6.0497	0.3357	-6.7091	-5.3733	0.0000	1.06
y≥6	-7.8895	-7.8913	0.3462	-8.6008	-7.2125	0.0000	1.03
y≥7	-10.8014	-10.8012	0.3590	-11.4987	-10.0604	0.0000	1.04
gap	-0.2173	-0.2159	0.0838	-0.3727	-0.0480	0.0048	1.03
yprev=3	2.2465	2.2485	0.2255	1.8144	2.6793	1.0000	0.99
yprev=4	4.8489	4.8438	0.2330	4.3954	5.3076	1.0000	1.02
yprev=5	7.2664	7.2617	0.2616	6.7623	7.7755	1.0000	1.01
yprev=6	9.3376	9.3361	0.2844	8.7519	9.8522	1.0000	0.98
yprev=7	9.2789	9.2782	0.5826	8.0628	10.3473	1.0000	1.02
day	0.2793	0.2797	0.0387	0.2083	0.3589	1.0000	1.04
day²	-0.0052	-0.0052	0.0007	-0.0067	-0.0038	0.0000	1.00
trt=hydroxychloroquine	0.0118	0.0111	0.0684	-0.1196	0.1475	0.5662	1.01
age	-0.0093	-0.0092	0.0049	-0.0183	0.0011	0.0310	1.00
age’	-0.0031	-0.0031	0.0057	-0.0150	0.0076	0.2855	1.00
sofa_nogcs	-0.0913	-0.0909	0.0207	-0.1296	-0.0490	0.0000	0.93
gap × yprev=3	0.2067	0.2060	0.0608	0.0911	0.3231	0.9995	1.01
gap × yprev=4	0.3279	0.3274	0.0386	0.2517	0.4014	1.0000	1.01
gap × yprev=5	0.2417	0.2421	0.0397	0.1665	0.3205	1.0000	0.98
gap × yprev=6	0.0954	0.0950	0.0338	0.0319	0.1614	0.9972	1.03
gap × yprev=7	0.2321	0.2324	0.0575	0.1205	0.3480	1.0000	1.03

The very low standard deviation of the random effects (0.053) lends credence to the idea that the state transitions are conditionally independent and that random effects are not needed. Also, the posterior mean and median regression parameter values and standard deviations of their posterior distributions changed very little.

Serial Transition Model With Non-Proportional Odds

Return to the Markov model without random effects. Let’s use a partial proportional odds model in which we add departures from the proportional odds assumption with regard to two variables: previous state and time (but not their interaction). We are using the constrained partial proportional odds model in which the constraint imposes linearity in the outcome category (but not for the effect of the previous outcome level).

bmarkovppo <- blrm(y ~ gap * yprev + pol(day, 2) + trt + rcs(age, 3) +
                sofa_nogcs, ~ day + yprev, cppo=function(y) y,
                data=d, refresh=100, file='bmarkovppo.rds')
htmlVerbatim(stanDx(bmarkovppo))

Iterations: 2000 on each of 4 chains, with 4000 posterior distribution samples saved

For each parameter, n_eff is a crude measure of effective sample size and Rhat is the potential scale reduction factor on split chains (at convergence, Rhat=1)

                        n_eff  Rhat
 y>=2                     524 1.009
 y>=3                     480 1.008
 y>=4                     441 1.009
 y>=5                     589 1.005
 y>=6                     795 1.001
 y>=7                     812 1.003
 gap                      487 1.010
 yprev=3                  978 1.004
 yprev=4                  595 1.006
 yprev=5                  761 1.004
 yprev=6                  902 1.005
 yprev=7                  631 1.009
 day                     2249 1.000
 day^2                   1787 1.001
 trt=hydroxychloroquine  2581 1.000
 age                     2304 1.002
 age'                    2419 1.000
 sofa_nogcs              1996 1.001
 gap * yprev=3           1883 1.003
 gap * yprev=4           2312 0.999
 gap * yprev=5           2078 1.000
 gap * yprev=6           1248 1.001
 gap * yprev=7           1702 1.001
 day x f(y)               485 1.008
 yprev=3 x f(y)           886 1.002
 yprev=4 x f(y)           519 1.008
 yprev=5 x f(y)          1056 1.003
 yprev=6 x f(y)           852 1.005
 yprev=7 x f(y)           549 1.008

stanDxplot(bmarkovppo)
bmarkovppo

Bayesian Constrained Partial Proportional Odds Ordinal Logistic Model

Dirichlet Priors With Concentration Parameter 0.308 for Intercepts

 blrm(formula = y ~ gap * yprev + pol(day, 2) + trt + rcs(age, 
     3) + sofa_nogcs, ppo = ~day + yprev, cppo = function(y) y, 
     data = d, refresh = 100, file = "bmarkovppo.rds")

Frequencies of Responses

   1   2   3   4   5   6   7 
  50 357 297 817 526 757 489

	Mixed Calibration/ Discrimination Indexes	Discrimination Indexes	Rank Discrim. Indexes
Obs 3293	B 0.081 [0.08, 0.082]	g 4.556 [4.243, 4.85]	C 0.886 [0.88, 0.894]
Draws 4000		g_p 0.447 [0.441, 0.452]	D_xy 0.773 [0.76, 0.788]
Chains 4		EV 0.645 [0.624, 0.667]
p 17		v 15.982 [13.802, 18.078]
		vp 0.16 [0.155, 0.165]

	Mode β	Mean β	Median β	S.E.	Lower	Upper	Pr(β>0)	Symmetry
y≥2	2.8421	2.8598	2.8643	0.3338	2.2264	3.5288	1.0000	1.00
y≥3	-1.4108	-1.4103	-1.4075	0.3250	-2.0104	-0.7244	0.0000	1.04
y≥4	-3.5968	-3.6227	-3.6207	0.3719	-4.3768	-2.8930	0.0000	0.98
y≥5	-7.0605	-7.1393	-7.1315	0.4372	-8.0427	-6.3284	0.0000	0.95
y≥6	-9.2730	-9.4218	-9.4100	0.5459	-10.5459	-8.4240	0.0000	0.88
y≥7	-12.3128	-12.5330	-12.4950	0.6901	-13.9470	-11.2632	0.0000	0.86
gap	0.0851	0.0841	0.0851	0.0888	-0.0953	0.2499	0.8238	1.00
yprev=3	2.8052	2.8386	2.8390	0.3517	2.1649	3.5282	1.0000	1.04
yprev=4	5.8843	5.9514	5.9393	0.3274	5.3706	6.6288	1.0000	1.08
yprev=5	9.0448	9.1057	9.0961	0.3553	8.3830	9.7680	1.0000	1.06
yprev=6	13.6666	13.6957	13.6902	0.5387	12.7090	14.8397	1.0000	1.06
yprev=7	12.0749	12.3731	12.3376	1.0384	10.3061	14.4209	1.0000	1.08
day	0.3580	0.3579	0.3578	0.0420	0.2780	0.4427	1.0000	1.01
day²	-0.0080	-0.0080	-0.0080	0.0007	-0.0094	-0.0066	0.0000	1.00
trt=hydroxychloroquine	0.0171	0.0142	0.0139	0.0710	-0.1138	0.1567	0.5768	1.00
age	-0.0108	-0.0109	-0.0109	0.0053	-0.0217	-0.0014	0.0192	0.99
age’	-0.0022	-0.0021	-0.0020	0.0061	-0.0141	0.0095	0.3665	1.04
sofa_nogcs	-0.0980	-0.0981	-0.0975	0.0203	-0.1376	-0.0598	0.0000	0.98
gap × yprev=3	-0.0861	-0.0817	-0.0823	0.0499	-0.1778	0.0124	0.0472	1.00
gap × yprev=4	-0.1911	-0.1875	-0.1887	0.0377	-0.2616	-0.1127	0.0000	1.06
gap × yprev=5	-0.2893	-0.2861	-0.2866	0.0471	-0.3745	-0.1887	0.0000	1.03
gap × yprev=6	-0.4627	-0.4601	-0.4604	0.0441	-0.5441	-0.3748	0.0000	1.01
gap × yprev=7	-0.3354	-0.3330	-0.3340	0.0635	-0.4538	-0.2100	0.0000	1.07
day x f(y)	0.1202	0.1197	0.1198	0.0068	0.1074	0.1338	1.0000	1.02
yprev=3 x f(y)	-0.1045	-0.0802	-0.0785	0.2495	-0.5887	0.3980	0.3650	1.03
yprev=4 x f(y)	0.1093	0.2008	0.1768	0.1969	-0.1366	0.6030	0.8655	1.50
yprev=5 x f(y)	-0.9969	-0.8857	-0.8974	0.2864	-1.4536	-0.3492	0.0035	1.12
yprev=6 x f(y)	-2.7648	-2.6253	-2.6180	0.3922	-3.4177	-1.8523	0.0000	0.94
yprev=7 x f(y)	-1.6179	-1.6577	-1.6263	0.6149	-2.9022	-0.5181	0.0018	0.87

# htmlVerbatim(compareBmods(bmarkovppo, bmarkov))
# Note: bmarkovppo fit with loo=TRUE was not saved because it was 171MB for some reason
# Output:
# Method: stacking
# ------
#        weight
#  model1 0.957 
#  model2 0.043 
#  
#  Compute posterior means divided by SE
round(coef(bmarkovppo) / sqrt(diag(vcov(bmarkovppo))), 2)

              y>=2                   y>=3                   y>=4 
              8.57                  -4.34                  -9.74 
              y>=5                   y>=6                   y>=7 
            -16.33                 -17.26                 -18.16 
               gap                yprev=3                yprev=4 
              0.95                   8.07                  18.18 
           yprev=5                yprev=6                yprev=7 
             25.63                  25.42                  11.92 
               day                  day^2 trt=hydroxychloroquine 
              8.53                 -10.99                   0.20 
               age                   age'             sofa_nogcs 
             -2.07                  -0.34                  -4.84 
     gap * yprev=3          gap * yprev=4          gap * yprev=5 
             -1.64                  -4.98                  -6.07 
     gap * yprev=6          gap * yprev=7             day x f(y) 
            -10.44                  -5.24                  17.72 
    yprev=3 x f(y)         yprev=4 x f(y)         yprev=5 x f(y) 
             -0.32                   1.02                  -3.09 
    yprev=6 x f(y)         yprev=7 x f(y) 
             -6.69                  -2.70

There were some minor sampling/convergence problems especially with respect to the parameters for departures from proportional odds.

The model comparison results listed in the code above puts a great likelihood on the partial proportional odds model fitting better than the proportional odds model. The result of compareBmods, not saved in the code because the loo=TRUE option makes the fit object much bigger, is here (but from a slightly different fit):

Method: stacking
------
       weight
 model1 0.957 
 model2 0.043

Based on posterior mean parameter values and standard deviations of posterior distributions (standard errors), the most important departures from PO seem to be time-related and the effect of previously being in state 6.

Final Model With Non-PO With Respect to Time

Our final model will have as non-proportional odds only the effect of time, but we will now allow the time effect to be nonlinear (quadratic). As before the cppo function assumes that the PO departure is proportional to the numeric 1-7 outcome codes. A fully unconstrained partial proportional odds model would have too many parameters.

bmarkovppof <- blrm(y ~ gap * yprev + pol(day, 2) + trt + rcs(age, 3) +
                sofa_nogcs, ~ pol(day, 2), cppo=function(y) y,
                data=d, refresh=100, file='bmarkovppof.rds')
htmlVerbatim(stanDx(bmarkovppof))

Iterations: 2000 on each of 4 chains, with 4000 posterior distribution samples saved

For each parameter, n_eff is a crude measure of effective sample size and Rhat is the potential scale reduction factor on split chains (at convergence, Rhat=1)

                        n_eff  Rhat
 y>=2                    3155 1.001
 y>=3                    2772 1.001
 y>=4                    3497 1.000
 y>=5                    4565 1.000
 y>=6                    3112 1.000
 y>=7                    2744 1.000
 gap                     3701 1.000
 yprev=3                 3236 1.001
 yprev=4                 4312 1.000
 yprev=5                 3439 1.001
 yprev=6                 2914 1.001
 yprev=7                 3443 1.000
 day                     5998 1.000
 day^2                   4539 1.000
 trt=hydroxychloroquine  7707 0.999
 age                     6236 1.000
 age'                    6518 1.000
 sofa_nogcs              5643 1.000
 gap * yprev=3           5416 0.999
 gap * yprev=4           7418 1.000
 gap * yprev=5           6227 0.999
 gap * yprev=6           4029 1.000
 gap * yprev=7           4523 1.000
 day x f(y)              2652 1.000
 day^2 x f(y)            4295 1.001

stanDxplot(bmarkovppof)
bmarkovppof

Bayesian Constrained Partial Proportional Odds Ordinal Logistic Model

Dirichlet Priors With Concentration Parameter 0.308 for Intercepts

 blrm(formula = y ~ gap * yprev + pol(day, 2) + trt + rcs(age, 
     3) + sofa_nogcs, ppo = ~pol(day, 2), cppo = function(y) y, 
     data = d, refresh = 100, file = "bmarkovppof.rds")

Frequencies of Responses

   1   2   3   4   5   6   7 
  50 357 297 817 526 757 489

	Mixed Calibration/ Discrimination Indexes	Discrimination Indexes	Rank Discrim. Indexes
Obs 3293	B 0.081 [0.08, 0.082]	g 3.805 [3.639, 3.941]	C 0.884 [0.877, 0.892]
Draws 4000		g_p 0.438 [0.433, 0.443]	D_xy 0.768 [0.754, 0.784]
Chains 4		EV 0.615 [0.596, 0.635]
p 17		v 11.362 [10.498, 12.293]
		vp 0.153 [0.149, 0.159]

	Mode β	Mean β	Median β	S.E.	Lower	Upper	Pr(β>0)	Symmetry
y≥2	4.4378	4.4540	4.4456	0.3640	3.7344	5.1346	1.0000	0.94
y≥3	-0.4172	-0.4190	-0.4177	0.3262	-1.1102	0.1631	0.0980	0.95
y≥4	-3.2193	-3.2292	-3.2277	0.3334	-3.8614	-2.5610	0.0000	0.97
y≥5	-7.1431	-7.1566	-7.1519	0.3534	-7.8163	-6.4290	0.0000	0.96
y≥6	-10.1971	-10.2173	-10.2167	0.3874	-10.9379	-9.4161	0.0000	0.98
y≥7	-15.0652	-15.0933	-15.0933	0.4446	-15.9659	-14.2205	0.0000	0.99
gap	-0.0655	-0.0663	-0.0672	0.0887	-0.2413	0.1038	0.2285	0.98
yprev=3	2.7151	2.7239	2.7242	0.2236	2.2572	3.1256	1.0000	1.00
yprev=4	5.8959	5.9090	5.9062	0.2396	5.4725	6.3953	1.0000	0.99
yprev=5	8.9304	8.9504	8.9518	0.2800	8.4281	9.5012	1.0000	1.05
yprev=6	11.2825	11.3124	11.3110	0.3179	10.6828	11.9249	1.0000	1.06
yprev=7	9.9164	9.9414	9.9370	0.6189	8.7525	11.1980	1.0000	0.94
day	0.3923	0.3917	0.3912	0.0425	0.3121	0.4794	1.0000	1.07
day²	-0.0053	-0.0053	-0.0053	0.0007	-0.0067	-0.0038	0.0000	0.97
trt=hydroxychloroquine	0.0257	0.0264	0.0260	0.0716	-0.1048	0.1767	0.6408	1.03
age	-0.0105	-0.0105	-0.0104	0.0052	-0.0214	-0.0011	0.0205	0.96
age’	-0.0027	-0.0027	-0.0027	0.0059	-0.0136	0.0097	0.3157	1.01
sofa_nogcs	-0.1082	-0.1084	-0.1085	0.0201	-0.1475	-0.0689	0.0000	1.06
gap × yprev=3	-0.0726	-0.0704	-0.0707	0.0460	-0.1642	0.0143	0.0640	1.04
gap × yprev=4	-0.2130	-0.2101	-0.2110	0.0366	-0.2816	-0.1406	0.0000	1.08
gap × yprev=5	-0.3967	-0.3938	-0.3936	0.0411	-0.4736	-0.3125	0.0000	1.01
gap × yprev=6	-0.5680	-0.5657	-0.5660	0.0383	-0.6388	-0.4888	0.0000	1.08
gap × yprev=7	-0.3339	-0.3299	-0.3308	0.0601	-0.4405	-0.2046	0.0000	1.05
day x f(y)	0.3454	0.3457	0.3457	0.0199	0.3057	0.3837	1.0000	0.99
day² x f(y)	-0.0066	-0.0066	-0.0066	0.0006	-0.0078	-0.0054	0.0000	1.05

There is strong evidence that the departure from PO with respect to day of follow-up is nonlinear.

Final Check

Let’s fit a Bayesian transition model without covariates (other than treatment and initial state) to once again check against empirical proportions of being at home.

bmarkovppon <- blrm(y ~ gap * yprev + pol(day, 2) + trt, ~ pol(day, 2),
                                        cppo=function(y) y,
                      data=d, refresh=100, file='bmarkovppon.rds')
htmlVerbatim(stanDx(bmarkovppon))

Iterations: 2000 on each of 4 chains, with 4000 posterior distribution samples saved

For each parameter, n_eff is a crude measure of effective sample size and Rhat is the potential scale reduction factor on split chains (at convergence, Rhat=1)

                        n_eff  Rhat
 y>=2                    3039 1.000
 y>=3                    2804 1.001
 y>=4                    3198 1.001
 y>=5                    4132 1.000
 y>=6                    3108 1.000
 y>=7                    2452 1.000
 gap                     3223 1.002
 yprev=3                 2929 1.000
 yprev=4                 3561 1.000
 yprev=5                 3773 1.000
 yprev=6                 2980 0.999
 yprev=7                 3157 1.000
 day                     5230 1.000
 day^2                   4511 0.999
 trt=hydroxychloroquine  5522 1.000
 gap * yprev=3           4918 0.999
 gap * yprev=4           5557 1.000
 gap * yprev=5           4704 1.000
 gap * yprev=6           3538 1.000
 gap * yprev=7           3973 1.001
 day x f(y)              2504 1.000
 day^2 x f(y)            4184 0.999

bmarkovppon

Bayesian Constrained Partial Proportional Odds Ordinal Logistic Model

Dirichlet Priors With Concentration Parameter 0.308 for Intercepts

 blrm(formula = y ~ gap * yprev + pol(day, 2) + trt, ppo = ~pol(day, 
     2), cppo = function(y) y, data = d, refresh = 100, file = "bmarkovppon.rds")

Frequencies of Responses

   1   2   3   4   5   6   7 
  50 357 297 817 526 757 489

	Mixed Calibration/ Discrimination Indexes	Discrimination Indexes	Rank Discrim. Indexes
Obs 3293	B 0.083 [0.082, 0.084]	g 3.723 [3.602, 3.865]	C 0.884 [0.875, 0.895]
Draws 4000		g_p 0.436 [0.431, 0.442]	D_xy 0.768 [0.751, 0.79]
Chains 4		EV 0.611 [0.59, 0.632]
p 14		v 10.918 [10.193, 11.744]
		vp 0.152 [0.147, 0.157]

	Mode β	Mean β	Median β	S.E.	Lower	Upper	Pr(β>0)	Symmetry
y≥2	3.0582	3.0613	3.0559	0.2542	2.5698	3.5447	1.0000	1.04
y≥3	-1.7124	-1.7184	-1.7170	0.2084	-2.1268	-1.3098	0.0000	1.02
y≥4	-4.4747	-4.4817	-4.4835	0.2230	-4.9316	-4.0574	0.0000	1.00
y≥5	-8.3492	-8.3536	-8.3546	0.2506	-8.8241	-7.8498	0.0000	0.99
y≥6	-11.3550	-11.3588	-11.3569	0.2932	-11.9503	-10.7758	0.0000	0.96
y≥7	-16.1466	-16.1465	-16.1463	0.3684	-16.9045	-15.4516	0.0000	0.97
gap	-0.0601	-0.0610	-0.0611	0.0872	-0.2403	0.1043	0.2380	0.98
yprev=3	2.9563	2.9623	2.9583	0.2233	2.5177	3.4066	1.0000	1.02
yprev=4	6.2331	6.2370	6.2370	0.2320	5.7860	6.6868	1.0000	1.00
yprev=5	9.3574	9.3595	9.3578	0.2661	8.8260	9.8676	1.0000	1.03
yprev=6	11.7001	11.6994	11.6996	0.3149	11.0668	12.2847	1.0000	0.98
yprev=7	10.3873	10.3768	10.3723	0.6030	9.1319	11.4835	1.0000	1.04
day	0.3894	0.3892	0.3891	0.0416	0.3068	0.4689	1.0000	1.01
day²	-0.0053	-0.0053	-0.0053	0.0008	-0.0067	-0.0037	0.0000	1.01
trt=hydroxychloroquine	0.0029	0.0029	0.0016	0.0692	-0.1220	0.1461	0.5115	1.03
gap × yprev=3	-0.0700	-0.0689	-0.0696	0.0473	-0.1603	0.0202	0.0735	1.00
gap × yprev=4	-0.2336	-0.2304	-0.2308	0.0363	-0.3013	-0.1599	0.0000	1.00
gap × yprev=5	-0.4206	-0.4170	-0.4168	0.0413	-0.4960	-0.3342	0.0000	1.00
gap × yprev=6	-0.5733	-0.5692	-0.5690	0.0388	-0.6436	-0.4933	0.0000	1.00
gap × yprev=7	-0.3432	-0.3370	-0.3363	0.0610	-0.4566	-0.2188	0.0000	0.99
day x f(y)	0.3382	0.3374	0.3372	0.0190	0.3011	0.3736	1.0000	1.03
day² x f(y)	-0.0064	-0.0064	-0.0064	0.0006	-0.0076	-0.0053	0.0000	0.99

dat <- data.frame(yprev='4')
R   <- list()
for(trt in tlev) {   # separately for treatments
    dat$trt <- trt
    u <- uprobB(bmarkovppon, dat, times, ylev, pvarname='yprev', gap=TRUE)
    R[[trt]] <- u
}

# Compute summary measure per treatment, separately for each time and posterior draw
# P(home) = P(Y=6 or Y=7)

s <- function(u) u[, 6] + u[, 7]
# w <- apply(R$placebo, 1, s)    # to test
r <- lapply(R, function(x) apply(x, 1, s))

# For each treatment and time compute the posterior mean P(home)

rs <- lapply(r, function(x) data.frame(phome = apply(x, 1, mean)))
for(trt in tlev) rs[[trt]]$trt <- trt
rs <- do.call(rbind, rs)
# Fetch day from end of row names (follows .)
rs$day <- as.numeric(sub('^.*\\.(.*?)', '\\2', rownames(rs), ))
rs$type <- 'Model'
rs <- rbind(rs, empirical)
ggplot(rs, aes(x=day, y=phome, color=trt, linetype=type)) + geom_line() + xl + yl +
    labs(color='', linetype='')

The agreement isn’t perfect but it’s probably the case that the sample size is not large enough to trust the empirical proportions.

A Different Markov Ordinal Model for Unequal Gaps

Consider a different approach in which we model serial dependence by attenuating the previous state effect by multiplying it by the reciprocal of gap time raised to a positive power. This has the properties of (1) forcing Y to be the same as the previous Y as the gap time approaches zero and (2) forcing independence across outcomes within patient as the gap time goes to infinity. To induce very high correlations in Y within patient with very small gaps, effects must approach \(\pm \infty\) on the logic scale, and the reciprocal raised to a power allows this. It is important to note that the effects being modeled must account for strong non-proportional odds. It is easy to see why when considering continuous Y with Gaussian residuals. A continuous-time AR(1) process is the Ornstein-Uhlenbeck process, and this process specifies that the variance of Y changes with the time gap. When the gap is very small, the conditional distribution of Y given the previous Y has a very small variance because in that case Y is very predictable from the previous Y. The analog to changing variance is changing the spacings of intercepts in a PO model. We accomplish that with a constrained partial PO model, the constraint being, as before, that the departure from PO is linear in Y. The question is whether we also need non-PO in absolute follow-up time. Logically, we do, but collinearity with gap may get in the way. We model serial dependence as a reciprocal of the cube root of the gap. The exponent really needs to be estimated from the data, which will require a nonlinear model. Our formulation does not allow main effects of previous states separate from the time gap to be included, so we are forcing the exponential function of gap to always be multiplied by effects of previous states.

A variety of functions for gap discounting of previous states, but none of them resulted in reasonable estimates of occupancy probabilities beyond the first gap > 1d. The last attempt is below. All of this points to the need for a true continuous time model when there is a need to estimate occupancy probabilities at unobserved times.

d$ypn <- NULL

rm(trt)
w <- d
ypn <- as.numeric(as.character(w$yprev))
w <- as.data.frame(w)
alpha <- 1.
lambda <- 0.15
gamma <- 2
X <- with(w, 1 * cbind(ypn == 2, ypn == 3, ypn == 4, ypn == 5, ypn == 6, ypn == 7))
# w$yp  <- X
w$ypr <- X / (w$gap ^ alpha)
w$ypr <- X * exp(-lambda * w$gap)
w$ypr <- X * (gamma + exp(-lambda * w$gap))
# model.matrix problem if yp is logical instead of numeric
bmarkovppor <- blrm(y ~ ypr + pol(day, 2) + trt, ~ pol(day, 2) + ypr,
                                        cppo=function(y) y,
                      data=w, refresh=100, file='bmarkovppor.rds')
b <- bmarkovppor
htmlVerbatim(stanDx(b))

Iterations: 2000 on each of 4 chains, with 4000 posterior distribution samples saved

For each parameter, n_eff is a crude measure of effective sample size and Rhat is the potential scale reduction factor on split chains (at convergence, Rhat=1)

                        n_eff  Rhat
 y>=2                    1225 0.999
 y>=3                    1046 1.000
 y>=4                     960 1.000
 y>=5                    1272 1.000
 y>=6                    2215 1.000
 y>=7                    2524 1.001
 ypr[1]                  2123 1.001
 ypr[2]                  1581 1.000
 ypr[3]                   963 1.000
 ypr[4]                  1009 1.001
 ypr[5]                  2231 1.002
 ypr[6]                  1613 1.001
 day                     5208 1.000
 day^2                   6050 0.999
 trt=hydroxychloroquine  5684 1.000
 day x f(y)              1075 1.002
 day^2 x f(y)            2040 1.000
 ypr[1] x f(y)           1789 1.002
 ypr[2] x f(y)           2679 1.000
 ypr[3] x f(y)           1364 1.000
 ypr[4] x f(y)           1313 1.000
 ypr[5] x f(y)           2125 1.002
 ypr[6] x f(y)           3964 1.000

Bayesian Constrained Partial Proportional Odds Ordinal Logistic Model

Dirichlet Priors With Concentration Parameter 0.308 for Intercepts

 blrm(formula = y ~ ypr + pol(day, 2) + trt, ppo = ~pol(day, 2) + 
     ypr, cppo = function(y) y, data = w, refresh = 100, file = "bmarkovppor.rds")

Frequencies of Responses

   1   2   3   4   5   6   7 
  50 357 297 817 526 757 489

	Mixed Calibration/ Discrimination Indexes	Discrimination Indexes	Rank Discrim. Indexes
Obs 3293	B 0.084 [0.083, 0.085]	g 4.511 [4.2, 4.819]	C 0.884 [0.879, 0.893]
Draws 4000		g_p 0.449 [0.443, 0.453]	D_xy 0.769 [0.759, 0.785]
Chains 4		EV 0.651 [0.632, 0.671]
p 9		v 15.665 [13.51, 17.901]
		vp 0.162 [0.157, 0.167]

	Mode β	Mean β	Median β	S.E.	Lower	Upper	Pr(β>0)	Symmetry
y≥2	9.0466	9.0329	9.0314	5.5668	-1.1526	20.2638	0.9502	1.02
y≥3	1.9489	1.9434	1.9365	3.9782	-5.9721	9.5441	0.6892	1.02
y≥4	-3.1405	-3.1685	-3.1378	2.7501	-8.7864	2.0215	0.1207	0.95
y≥5	-9.5137	-9.5646	-9.5349	2.4891	-14.7573	-4.9412	0.0000	0.96
y≥6	-14.3301	-14.4224	-14.4053	3.4199	-20.9956	-7.8009	0.0000	1.02
y≥7	-20.1491	-20.2917	-20.3525	4.9035	-29.9257	-11.1951	0.0000	1.04
ypr[1]	0.3317	0.3105	0.2920	0.8221	-1.2516	1.9675	0.6440	1.03
ypr[2]	1.1997	1.2103	1.1923	0.8127	-0.2407	2.9797	0.9380	1.03
ypr[3]	2.2590	2.2739	2.2575	0.8139	0.7930	4.0192	0.9970	1.02
ypr[4]	3.4183	3.4317	3.4179	0.8209	1.9777	5.2350	1.0000	1.03
ypr[5]	4.9959	4.9895	4.9714	0.8181	3.4314	6.6575	1.0000	1.03
ypr[6]	4.5479	4.5939	4.5781	0.9122	2.8510	6.4027	1.0000	1.02
day	0.3656	0.3669	0.3666	0.0471	0.2702	0.4574	1.0000	1.03
day²	-0.0083	-0.0083	-0.0083	0.0009	-0.0102	-0.0065	0.0000	0.99
trt=hydroxychloroquine	0.0146	0.0132	0.0140	0.0708	-0.1190	0.1548	0.5748	0.98
day x f(y)	0.3538	0.3560	0.3559	0.0626	0.2336	0.4760	1.0000	1.01
day² x f(y)	-0.0073	-0.0074	-0.0074	0.0012	-0.0096	-0.0050	0.0000	0.95
ypr[1] x f(y)	1.4566	1.4369	1.4428	0.9882	-0.5276	3.2874	0.9285	0.99
ypr[2] x f(y)	1.3604	1.3655	1.3598	0.9896	-0.5180	3.3209	0.9132	1.00
ypr[3] x f(y)	1.4934	1.5060	1.5046	0.9894	-0.4427	3.3649	0.9382	0.98
ypr[4] x f(y)	0.8827	0.9027	0.8946	0.9878	-1.0460	2.7810	0.8170	0.98
ypr[5] x f(y)	0.0410	0.0775	0.0692	1.0024	-1.8831	1.9663	0.5275	1.00
ypr[6] x f(y)	0.6509	0.6510	0.6322	1.0204	-1.2392	2.6901	0.7375	0.99

Now get state occupancy probabilities for observed days, then for all days. We must modify the uprobB function to handle the special nature of gap dependencies.

uprobB <- function(fit, data, times, ylevels, pvarname='yprev', gap=FALSE) {
    if(! length(fit$draws)) stop('fit does not have posterior draws')
    nd <- nrow(fit$draws)
  k  <- length(ylevels)
  s  <- length(times)
  P  <- array(NA, c(nd, s, k),
                        dimnames=list(paste('draw', 1 : nd), as.character(times), 
                                                    as.character(ylevels)))
  # Never uncondition on initial state
  wdata <- data
  wdata$day <- times[1]
  if(gap) {
    gp <- times[1]
    wdata$gap <- gp
    ypn <- as.numeric(as.character(wdata$yprev))
    X <- 1. * cbind(ypn == 2, ypn == 3, ypn == 4, ypn == 5, ypn == 6, ypn == 7)
    # wdata$yp  <- X
    wdata$ypr <- X * (gamma + exp(-lambda * gp))     # X / (gp ^ alpha)
  }
  p <- predict(fit, wdata, type='fitted.ind', posterior.summary='all')
  P[, 1, ] <- p
  # cp: matrix of conditional probabilities of Y conditioning on previous time Y
  # Columns = k conditional probabilities conditional on a single previous state
  # Rows    = all possible previous states
  # This is for a single posterior draw
  cp <- matrix(NA, nrow=k, ncol=k, 
               dimnames=list(paste('t-1', ylevels), paste('t', ylevels)))
  data <- as.list(data)
  data[[pvarname]] <- as.character(ylevels[-1])
  # don't request estimates after absorbing state
  # When data contains a matrix with one row, expand.grid will treat each column of that matrix
  # as a new row in the output.   We don't want that.  Instead, create data frame without
  # expand.grid and add ypr as a matrix to it later
  # edata <- expand.grid(data)
  edata <- as.data.frame(data)  
  
  for(it in 2 : s) {
    edata$day <- times[it]
    if(gap) {
      gp        <- times[it] - times[it - 1]
      edata$gap <- gp
      ypn       <- as.numeric(as.character(edata[[pvarname]]))
      X         <- 1. * matrix(c(ypn == 2, ypn == 3, ypn == 4, ypn == 5, ypn == 6, ypn == 7), ncol=6)
      # edata$yp  <- X
      edata$ypr <- X * (gamma + exp(-lambda * gp))   # X / (gp ^ alpha)
    }
    # y=first level = absorbing state
    pp <- predict(fit, edata, type='fitted.ind', posterior.summary='all')
    for(idraw in 1 : nd) {
        cp <- rbind(c(1., rep(0., k - 1)), pp[idraw, ,])
      # Compute unconditional probabilities of being in all possible states
      # at current time t
      P[idraw, it, ] <- t(cp) %*% P[idraw, it - 1, ]
    }
  }
  P
}

times <- c(1,2,3,4,7,14,28)
ylev <- 1 : 7
tlev <- levels(d$trt)
dat <- data.frame(yprev='4', age=repcov$age, sofa_nogcs=repcov$sofa_nogcs)
R   <- list()
for(trt in tlev) {   # separately for treatments
    dat$trt <- trt
    u <- uprobB(b, dat, times, ylev, pvarname='yprev', gap=TRUE)
    R[[trt]] <- u
}

Look at state occupancy probabilities for placebo, summarized by the posterior mean probabilities.

sop <- round(apply(R[[1]], 2:3, mean), 2)
sop

      1    2    3    4    5    6    7
1  0.00 0.02 0.14 0.70 0.11 0.02 0.00
2  0.00 0.06 0.16 0.53 0.18 0.07 0.00
3  0.01 0.09 0.13 0.40 0.21 0.15 0.01
4  0.02 0.11 0.10 0.29 0.19 0.26 0.02
7  0.04 0.10 0.07 0.20 0.14 0.37 0.07
14 0.06 0.09 0.03 0.11 0.05 0.31 0.36
28 0.08 0.06 0.02 0.06 0.03 0.25 0.50

Now repeat, requesting probabilities for all consecutive days.

times <- 1 : 28
ylev <- 1 : 7
tlev <- levels(d$trt)
dat <- data.frame(yprev='4', age=repcov$age, sofa_nogcs=repcov$sofa_nogcs)
R   <- list()
for(trt in tlev) {   # separately for treatments
    dat$trt <- trt
    u <- uprobB(b, dat, times, ylev, pvarname='yprev', gap=TRUE)
    R[[trt]] <- u
}
round(apply(R[[1]], 2:3, mean), 2)

      1    2    3    4    5    6    7
1  0.00 0.02 0.14 0.70 0.11 0.02 0.00
2  0.00 0.06 0.16 0.53 0.18 0.07 0.00
3  0.01 0.09 0.13 0.40 0.21 0.15 0.01
4  0.02 0.11 0.10 0.29 0.19 0.26 0.02
5  0.04 0.11 0.07 0.20 0.15 0.37 0.06
6  0.06 0.09 0.05 0.13 0.10 0.43 0.15
7  0.07 0.08 0.03 0.07 0.06 0.40 0.29
8  0.09 0.06 0.02 0.04 0.03 0.30 0.46
9  0.11 0.04 0.01 0.02 0.02 0.20 0.61
10 0.12 0.02 0.01 0.01 0.01 0.12 0.71
11 0.12 0.02 0.00 0.01 0.00 0.08 0.77
12 0.13 0.01 0.00 0.00 0.00 0.05 0.81
13 0.13 0.01 0.00 0.00 0.00 0.03 0.83
14 0.13 0.00 0.00 0.00 0.00 0.02 0.84
15 0.13 0.00 0.00 0.00 0.00 0.02 0.85
16 0.13 0.00 0.00 0.00 0.00 0.01 0.85
17 0.13 0.00 0.00 0.00 0.00 0.01 0.85
18 0.13 0.00 0.00 0.00 0.00 0.01 0.86
19 0.14 0.00 0.00 0.00 0.00 0.01 0.86
20 0.14 0.00 0.00 0.00 0.00 0.01 0.86
21 0.14 0.00 0.00 0.00 0.00 0.01 0.86
22 0.14 0.00 0.00 0.00 0.00 0.01 0.86
23 0.14 0.00 0.00 0.00 0.00 0.01 0.86
24 0.14 0.00 0.00 0.00 0.00 0.01 0.86
25 0.14 0.00 0.00 0.00 0.00 0.01 0.86
26 0.14 0.00 0.00 0.00 0.00 0.01 0.86
27 0.14 0.00 0.00 0.00 0.00 0.01 0.86
28 0.14 0.00 0.00 0.00 0.00 0.01 0.85

Computing Environment

 R version 4.1.0 (2021-05-18)
 Platform: x86_64-pc-linux-gnu (64-bit)
 Running under: Pop!_OS 20.10
 
 Matrix products: default
 BLAS:   /usr/lib/x86_64-linux-gnu/blas/libblas.so.3.9.0
 LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.9.0
 
 attached base packages:
 [1] splines   stats4    stats     graphics  grDevices utils     datasets 
 [8] methods   base     
 
 other attached packages:
  [1] rstan_2.19.3         StanHeaders_2.21.0-1 VGAM_1.1-3          
  [4] plotly_4.9.3         data.table_1.13.0    rmsb_0.0.2          
  [7] rms_6.2-1            SparseM_1.78         Hmisc_4.6-0         
 [10] ggplot2_3.3.2        Formula_1.2-3        survival_3.2-11     
 [13] lattice_0.20-44

To cite R in publications use:

R Core Team (2021). R: A Language and Environment for Statistical Computing. R Foundation for Statistical Computing, Vienna, Austria. https://www.R-project.org/.

To cite the Hmisc package in publications use:

Harrell Jr F (2021). Hmisc: Harrell Miscellaneous. R package version 4.6-0, https://hbiostat.org/R/Hmisc/.

To cite the rms package in publications use:

Harrell Jr FE (2021). rms: Regression Modeling Strategies. https://hbiostat.org/R/rms/, https://github.com/harrelfe/rms.

To cite the rmsb package in publications use:

Harrell F (2021). rmsb: Bayesian Regression Modeling Strategies. R package version 0.0.2, https://hbiostat.org/R/rmsb/.

To cite the VGAM package in publications use:

Yee TW (2015). Vector Generalized Linear and Additive Models: With an Implementation in R. Springer, New York, USA.

To cite the data.table package in publications use:

Dowle M, Srinivasan A (2020). data.table: Extension of 'data.frame'. R package version 1.13.0, https://CRAN.R-project.org/package=data.table.

To cite the ggplot2 package in publications use:

Wickham H (2016). ggplot2: Elegant Graphics for Data Analysis. Springer-Verlag New York. ISBN 978-3-319-24277-4, https://ggplot2.tidyverse.org.

To cite the rstan package in publications use:

Stan Development Team (2020). “RStan: the R interface to Stan.” R package version 2.19.3, http://mc-stan.org/.

To cite the survival package in publications use:

Therneau T (2021). A Package for Survival Analysis in R. R package version 3.2-11, https://CRAN.R-project.org/package=survival.

A singularity occurred when trying to allow gap to be a linear spline.↩︎