Longitudinal Ordinal Analysis for ORCHID
Introduction
ORCHID was a randomized clinical trial of Hydroxychloroquine vs. placebo for treatment of hospitalized COVID-19 patients. The main publication is here. The primary analysis for the trial used a covariate-adjusted proportional odds model for the 7-level WHO COVID ordinal scale measured at 14 days. There was little evidence for any effectiveness of the drug, despite the fact that the U.S. stockpiled 63,000,000 doses of it. The purpose of the reanalyses done here is to learn from a longitudinal analysis using all available ordinal outcomes. These are assessed at days 1, 2, 3, 4, 7, 14, 28 after randomization. It will be seen that even with this more powerful longitudinal analysis, there remains very little evidence for drug effectiveness.
After much experimentation with various longitudinal models (and also see here), a first-order Markov state transition model for ordinal outcome states provides the best fit to the correlation structure found in the data. The Markov model uses the previous ordinal state as a (categorical) predictor of the current state. It allows the correlation between states on two successive days to be arbitrarily high within patient, and it elegantly handles absorbing states such as death. Excellent handling of the correlation structure is done without the complexities of random effects. See below for justification of this statement. Go here for a list of papers about Markov longitudinal modeling of binary and ordinal outcomes.
Markov models in their usual form assume that the outcome variable is also assessed at baseline (time zero). Such is the case for ORCHID. The fact that some of the outcome levels (e.g., death) are not used at baseline is of no consequence.
The only obstacle to using transition models is that the primary model parameters (here, in a proportional odds model) relate to the probability of being in the current state or higher given the previous state. For day 1 post randomization, this is no problem. But we need to restate the model to obtain time-unconditional probabilities of being in state y or worse (“state occupancy probabilities”). This is done through recursive (through time) matrix multiplications to “uncondition” the transition probabilities to get probabilities of being in the current state regardless of previous states. This are simple to interpret quantities that are functions of all model parameters. Because of that it is difficult to get confidence intervals and statistical tests using frequentist methods. In examples below it is seen that the Bayesian approach provides exact inference, taking all uncertainties into account. Graphs will show the probability of the patient being at home as a function of time and treatment. Home is almost, but not quite, an absorbing state. So it is possible that such graphs will increase up to a point then slightly decrease (that didn’t happen in this study). If we consider home to be an absorbing state, then the graph is a cumulative incidence graph.
Various ways of restating the model results are also given below. One of these is the mean restricted time to the patient going home.
Note that the transition model needs to take absolute time (days from randomization) into account, as well as previous state. That is because the mix of outcomes changes over time in a way that simple transition odds ratios can’t capture.
To be added to this report is an example of using the fitted model to simulate longitudinal ordinal data, and using the simulation to estimate probabilities of more complex quantities than can easily be obtained by the analytic solutions shown here. For example, one can simulate data to compute the probability that patients on a given treatment with given baseline covariates will be alive on day 28 if they are on a ventilator on day 14.
require(rmsb)
require(data.table)
require(plotly)
knitrSet(lang='markdown', w=7, h=7, dev='png', fig.path='png/a-')
options(prType='html')
# Get around a bug in tooltip construction with ggplotly
# See https://stackoverflow.com/questions/66316337
<- function(obj) {
ggp <- ggplotly(obj, tooltip='label')
g <- g$x$data
d for(i in 1 : length(d)) {
<- d[[i]]$text
w if(length(w)) d[[i]]$text <- gsub('txt: ', '', w)
}$x$data <- d
g
g
}
<- readRDS('source/orchid.rds')
d := day - shift(day), by=id]
d[, gap is.na(gap), gap := 1]
d[label(d$gap) <- 'Days since last measurement'
table(day, gap)] d[,
## gap
## day 1 3 7 14
## 1 479 0 0 0
## 2 479 0 0 0
## 3 478 0 0 0
## 4 475 0 0 0
## 7 0 474 0 0
## 14 0 0 461 0
## 28 0 0 0 447
# Also fetch ORCHID dataset for all days, carrying death forward
<- readRDS('source/orchidAllDays.rds')
dall <-
ordLevels c('death',
'hospitalized on invasive mechanical ventilation/ECMO',
'hospitalized on non-invasive ventilation or high flow nasal cannula',
'hospitalized on supplemental oxygen',
'hospitalized not on supplemental oxygen',
'not hospitalized with limitation of activity',
'not hospitalized without limitation in activity')
Descriptive Statistics
html(describe(d))
18 Variables 3293 Observations
id: Subject ID
n | missing | distinct |
---|---|---|
3293 | 0 | 479 |
lowest : | ORCH-0001 | ORCH-0002 | ORCH-0003 | ORCH-0004 | ORCH-0005 |
highest: | ORCH-0475 | ORCH-0476 | ORCH-0477 | ORCH-0478 | ORCH-0479 |
age
n | missing | distinct | Info | Mean | Gmd | .05 | .10 | .25 | .50 | .75 | .90 | .95 |
---|---|---|---|---|---|---|---|---|---|---|---|---|
3293 | 0 | 70 | 1 | 56.51 | 18.67 | 29 | 35 | 44 | 57 | 68 | 78 | 85 |
icu
n | missing | distinct | Info | Sum | Mean | Gmd |
---|---|---|---|---|---|---|
3293 | 0 | 2 | 0.456 | 616 | 0.1871 | 0.3042 |
bmi: Body Mass Index
n | missing | distinct | Info | Mean | Gmd | .05 | .10 | .25 | .50 | .75 | .90 | .95 |
---|---|---|---|---|---|---|---|---|---|---|---|---|
3060 | 233 | 396 | 1 | 32.8 | 10.01 | 20.80 | 22.68 | 26.84 | 31.14 | 36.93 | 44.46 | 50.22 |
lowest : | 7.098765 | 8.650519 | 11.428571 | 17.689789 | 17.968750 |
highest: | 67.520776 | 67.827191 | 75.054815 | 78.694069 | 96.559285 |
y0: Baseline status
n | missing | distinct | Info | Mean | Gmd |
---|---|---|---|---|---|
3293 | 0 | 4 | 0.85 | 4.114 | 0.8686 |
Value 2 3 4 5 Frequency 214 364 1549 1166 Proportion 0.065 0.111 0.470 0.354
deathday: Days from randomization to death
n | missing | distinct | Info | Mean | Gmd | .05 | .10 | .25 | .50 | .75 | .90 | .95 |
---|---|---|---|---|---|---|---|---|---|---|---|---|
318 | 2975 | 25 | 0.996 | 15.38 | 9.578 | 5 | 6 | 8 | 15 | 20 | 28 | 30 |
pfratio: PaO2 / FiO2
n | missing | distinct | Info | Mean | Gmd | .05 | .10 | .25 | .50 | .75 | .90 | .95 |
---|---|---|---|---|---|---|---|---|---|---|---|---|
403 | 2890 | 53 | 0.999 | 143.8 | 94 | 45.00 | 53.33 | 80.00 | 127.50 | 220.00 | 267.86 | 295.24 |
lowest : | 27.36842 | 30.00000 | 44.00000 | 45.00000 | 50.00000 |
highest: | 290.47619 | 295.23810 | 322.50000 | 343.33333 | 438.09524 |
race
n | missing | distinct |
---|---|---|
3293 | 0 | 3 |
Value black white other Frequency 789 1451 1053 Proportion 0.240 0.441 0.320
sfratio
n | missing | distinct | Info | Mean | Gmd | .05 | .10 | .25 | .50 | .75 | .90 | .95 |
---|---|---|---|---|---|---|---|---|---|---|---|---|
3125 | 168 | 161 | 0.999 | 348.6 | 118.6 | 124.3 | 178.0 | 283.3 | 383.3 | 438.1 | 452.4 | 461.9 |
lowest : | 0.9666667 | 30.0000000 | 69.0000000 | 78.0000000 | 79.0000000 |
highest: | 457.1428571 | 461.9047619 | 466.6666667 | 471.4285714 | 476.1904762 |
sofa_gcs
n | missing | distinct | Info | Mean | Gmd | .05 | .10 | .25 | .50 | .75 | .90 | .95 |
---|---|---|---|---|---|---|---|---|---|---|---|---|
3293 | 0 | 17 | 0.963 | 3.088 | 2.658 | 0 | 1 | 1 | 2 | 4 | 6 | 9 |
Value 0 1 2 3 4 5 6 7 8 9 10 11 Frequency 189 741 891 503 390 182 84 67 33 61 25 14 Proportion 0.057 0.225 0.271 0.153 0.118 0.055 0.026 0.020 0.010 0.019 0.008 0.004 Value 12 13 14 16 17 Frequency 54 33 12 7 7 Proportion 0.016 0.010 0.004 0.002 0.002
sofa_nogcs
n | missing | distinct | Info | Mean | Gmd | .05 | .10 | .25 | .50 | .75 | .90 | .95 |
---|---|---|---|---|---|---|---|---|---|---|---|---|
3293 | 0 | 13 | 0.959 | 2.773 | 2.188 | 0 | 1 | 1 | 2 | 4 | 6 | 7 |
Value 0 1 2 3 4 5 6 7 8 9 11 13 Frequency 189 783 918 539 359 165 82 110 47 75 12 7 Proportion 0.057 0.238 0.279 0.164 0.109 0.050 0.025 0.033 0.014 0.023 0.004 0.002 Value 14 Frequency 7 Proportion 0.002
time_to_recovery
n | missing | distinct | Info | Mean | Gmd | .05 | .10 | .25 | .50 | .75 | .90 | .95 |
---|---|---|---|---|---|---|---|---|---|---|---|---|
3293 | 0 | 29 | 0.98 | 9.132 | 10.45 | 0 | 0 | 0 | 5 | 15 | 28 | 28 |
sex
n | missing | distinct |
---|---|---|
3293 | 0 | 2 |
Value female male Frequency 1448 1845 Proportion 0.44 0.56
trt: Treatment
n | missing | distinct |
---|---|---|
3293 | 0 | 2 |
Value placebo hydroxychloroquine Frequency 1631 1662 Proportion 0.495 0.505
day: Days since randomization
n | missing | distinct | Info | Mean | Gmd |
---|---|---|---|---|---|
3293 | 0 | 7 | 0.98 | 8.217 | 8.698 |
Value 1 2 3 4 7 14 28 Frequency 479 479 478 475 474 461 447 Proportion 0.145 0.145 0.145 0.144 0.144 0.140 0.136
y: Ordinal Outcome
n | missing | distinct | Info | Mean | Gmd |
---|---|---|---|---|---|
3293 | 0 | 7 | 0.963 | 4.712 | 1.804 |
Value 1 2 3 4 5 6 7 Frequency 50 357 297 817 526 757 489 Proportion 0.015 0.108 0.090 0.248 0.160 0.230 0.148
yprev: Previous ordinal outcome
n | missing | distinct | Info | Mean | Gmd |
---|---|---|---|---|---|
3293 | 0 | 6 | 0.952 | 4.486 | 1.572 |
Value 2 3 4 5 6 7 Frequency 366 352 1022 683 610 260 Proportion 0.111 0.107 0.310 0.207 0.185 0.079
gap: Days since last measurement
n | missing | distinct | Info | Mean | Gmd |
---|---|---|---|---|---|
3293 | 0 | 4 | 0.796 | 3.892 | 4.216 |
Value 1 3 7 14 Frequency 1911 474 461 447 Proportion 0.580 0.144 0.140 0.136
Event Chart for Random Sample of Patients
A status timeline is shown for each of 65 randomly chosen patients. The patients are sorted so that the worst outcomes at the last follow-up day are at the top. One day is subtract from the time variable to make follow-up start at day 0.
<- sample(unique(d$id), 65, FALSE)
ssamp <- subset(d, id %in% ssamp)
dr $id <- as.integer(as.factor(dr$id))
dr$status <- factor(dr$y, levels=7:1)
dr$day <- dr$day - 1
drmultEventChart(status ~ day + id, data=dr,
absorb='1', sortbylast = TRUE) +
theme_classic() +
theme(legend.position='bottom')
Summary of Successive State Transitions
<- propsTrans(y ~ day + id, data=dall, maxsize=6, labels=ordLevels) +
gg theme(legend.position='bottom',
axis.text.x=element_text(angle=90, hjust=1))
ggp(gg)
# For a nice event chart for this type of data see https://livefreeordichotomize.com/2020/05/21/survival-model-detective-1
Category Proportions Over Time
The following plot shows how the mix of outcomes changes over time stratified also by treatment.
<- propsPO(y ~ trt + day, nrow=1, data=dall) +
gg theme(legend.position='bottom',
axis.text.x=element_text(angle=90, hjust=1))
ggp(gg)
Single-Day Analysis
To provide a result similar to the ORCHID’s pre-specified primary analysis, we use a proportional odds model for day 14 status in isolation.
<- subset(dall, day == 14)
d14 $y0 <- as.factor(d14$y0)
d14<- lrm(y ~ trt + y0 + rcs(age, 3) + race + sofa_nogcs + sex, data=d14)
f f
Logistic Regression Model
lrm(formula = y ~ trt + y0 + rcs(age, 3) + race + sofa_nogcs + sex, data = d14)
Frequencies of Responses
1 2 3 4 5 6 7 32 42 12 40 37 169 147
Model Likelihood Ratio Test |
Discrimination Indexes |
Rank Discrim. Indexes |
|
---|---|---|---|
Obs 479 | LR χ2 176.28 | R2 0.320 | C 0.731 |
max |∂log L/∂β| 2×10-12 | d.f. 10 | g 1.360 | Dxy 0.461 |
Pr(>χ2) <0.0001 | gr 3.895 | γ 0.462 | |
gp 0.180 | τa 0.349 | ||
Brier 0.106 |
β | S.E. | Wald Z | Pr(>|Z|) | |
---|---|---|---|---|
y≥2 | 4.2312 | 0.8255 | 5.13 | <0.0001 |
y≥3 | 3.0594 | 0.8146 | 3.76 | 0.0002 |
y≥4 | 2.8138 | 0.8141 | 3.46 | 0.0005 |
y≥5 | 2.1481 | 0.8141 | 2.64 | 0.0083 |
y≥6 | 1.6364 | 0.8134 | 2.01 | 0.0442 |
y≥7 | -0.2346 | 0.8054 | -0.29 | 0.7708 |
trt=hydroxychloroquine | 0.0437 | 0.1701 | 0.26 | 0.7973 |
y0=3 | -0.1120 | 0.4477 | -0.25 | 0.8025 |
y0=4 | 0.9315 | 0.4550 | 2.05 | 0.0406 |
y0=5 | 1.5653 | 0.4925 | 3.18 | 0.0015 |
age | -0.0098 | 0.0127 | -0.77 | 0.4395 |
age’ | -0.0222 | 0.0153 | -1.45 | 0.1475 |
race=white | -0.4846 | 0.2244 | -2.16 | 0.0308 |
race=other | 0.0313 | 0.2452 | 0.13 | 0.8985 |
sofa_nogcs | -0.2699 | 0.0588 | -4.59 | <0.0001 |
sex=male | -0.1971 | 0.1820 | -1.08 | 0.2788 |
Serial Conditional Markov Ordinal Model
Here we use a first order Markov process where the PO model on a given day is conditional on the ordinal outcome on the previous day as well as on the other variables. The 1-day lagged outcome status is variable yprev
. Note that the gaps (1 1 1 3 7 14) between measurement times (1 2 3 4 7 14 28) has a linear correlation with the measurement times of 0.998, so including day
as an effect modifier of previous status is similar to allowing the influence of the previous state to be a function of the gap duration.
First do a consolidated analysis in which all observed gap-previous state combinations are modeled as indicator variables. The reference cell here is gap=1 previous state 2. Then look at effects of previous state stratified by distinct levels of gap
. Interpretation is clouded by all effects being relative to previous state 2.
$yprev <- factor(d$yprev)
d<- datadist(d); options(datadist='dd')
dd <- sort(unique(d$gap))
ugaps <- with(d, interaction(gap, yprev, drop=TRUE))
z <- lrm(y ~ z + day + rcs(age, 3) + race + sofa_nogcs + sex, data=d)
f <- coef(f)[grep('z=', names(coef(f)))]
co names(co) <- sub('z=', '', names(co))
<- names(co)
n <- strsplit(n, '\\.')
x <- sapply(x, function(u) as.numeric(u[1])) # gaps
g <- sapply(x, function(u) as.numeric(u[2])) # yprev
yp <- data.frame(gap=g, yp=yp, effect=unname(co))
ef # Reference cell in model was gap=1 yprev=2 so add a zero for this
<- rbind(data.frame(gap=1, yp=2, effect=0), ef)
ef print(ef, digits=3)
gap yp effect
1 1 2 0.000
2 3 2 -0.796
3 7 2 -1.626
4 14 2 -5.056
5 1 3 2.365
6 3 3 2.057
7 7 3 1.790
8 14 3 1.104
9 1 4 4.959
10 3 4 6.356
11 7 4 5.427
12 14 4 4.052
13 1 5 7.388
14 3 5 8.255
15 7 5 7.345
16 14 5 5.390
17 1 6 9.311
18 3 6 9.715
19 7 6 8.069
20 14 6 5.656
21 7 7 9.385
22 14 7 7.396
# Fit gap trend separately by yprev
ggplot(ef, aes(x=gap, y=effect, col=factor(yp))) + geom_line() + geom_point()
<- c('yprev=3', 'yprev=4', 'yprev=5', 'yprev=6', 'yprev=7')
n <- matrix(NA, nrow=length(ugaps), ncol=length(n), dimnames=list(ugaps, n))
k <- 0
i for(gp in ugaps) {
<- i + 1
i <- subset(d, gap == gp)
s $yprev <- s$yprev[drop=TRUE]
s<- if(gp == 1) y ~ day + yprev + rcs(age, 3) + race + sofa_nogcs + sex else
form ~ yprev + rcs(age, 3) + race + sofa_nogcs + sex
y <- y ~ yprev
form if(gp == 1) form <- y ~ day + yprev
<- lrm(form, data=s)
f <- intersect(n, names(coef(f)))
m <- coef(f)[m]
k[i, m] }
round(k, 2)
yprev=3 yprev=4 yprev=5 yprev=6 yprev=7
1 4.28 7.99 11.34 23.54 NA
3 2.43 4.80 5.92 6.78 NA
7 1.71 3.19 4.71 5.45 6.72
14 1.68 3.32 4.47 4.91 6.64
<- zp <- NULL
z <- seq(1, 14, by=.1)
xx <- data.frame(ugaps=xx)
w for(j in 1 : 5) {
<- k[, j]
y <- if(j == 5) 'power fixed' else c('linear spline', 'exponential', 'power', 'power fixed')
mods for(model in mods) {
<- switch(model,
form 'linear spline' = y ~ lsp(ugaps, 3),
'exponential' = log(y) ~ ugaps,
'power' = log(y) ~ log(ugaps),
'power fixed' = NA)
if(model != 'power fixed') {
<- ols(form)
f if(model == 'power') print(coef(f))
<- predict(f, w)
p if(model != 'linear spline') p <- exp(p)
else {
} <- exp(mean(log(y / (ugaps ^ -0.32)), na.rm=TRUE))
const <- const * (xx ^ -0.32)
p
}<- paste('Previous state:', j + 2)
prev <- rbind(z, data.frame(model=model, gap=xx, y=p, yprev=prev))
z <- rbind(zp, data.frame(model=model, gap=ugaps, y=y, yprev=prev))
zp
} }
Intercept ugaps
1.3760590 -0.3708071
Intercept ugaps
2.0091496 -0.3570896
Intercept ugaps
2.320715 -0.356863
Intercept ugaps
2.9231801 -0.5866748
ggplot(z, aes(x=gap, y=y, color=model)) + geom_line() +
geom_point(data=zp, aes(x=gap, y=y)) +
facet_wrap(~ yprev)
By combining the patterns of both approaches, time gap is effectively modeled as a linear spline with knot at 3, which allows for a very different effect at the most command gap of 1d. But the use of two terms for the gap effect will result in a singularity when interacting gap
with previous state, because some previous states do not occur at some gaps. The power function specifying a gap effect of a constant times \(d^{-0.32}\) where \(d\) is the time gap fits very well except for previous state 6, which is the only state in the above graph for which you can see a difference between the fitted power function (which estimated an exponent of -0.58) and this function. By using a single parameter (the exponent) to capture the gap
effect for all possible gaps one can reduce the number of parameters to estimate by omitting previous state main effects from the model (see below). But again the fact that we are subtracting effects for being in previous state 2 makes this confusing. This will be explored in a better way in the VIOLET dataset where artifical gaps can be created in a way that is independent of absolute time, since in VIOLET all consecutive days had outcome assessments.
In formulating our model we need the following considerations.
- transitions depend on a flexible function of time
- the effect of the previous state may change with time or gap time
- the treatment effect may change with time
- there may be a weird effect where the effect of treatment may differ according to the previous outcome state
Due to a singularity (caused by collinearity), a model could not be fitted that allowed previous state to interact with both day
and gap
. Also, a singularity resulted when the linear spline is used for gap
. There are no records with yprev=7
for gap
< 7. So force gap
to be linear for now. Let’s fit two models to see which one more strongly interacts, then use that one by itself.
<- lrm(y ~ gap * yprev + pol(day,2) * (trt + rcs(age, 3) +
f1 + sofa_nogcs + sex) +
race * trt,
yprev data=d)
<- lrm(y ~ gap + day * (yprev + trt + rcs(age, 3) +
f2 + sofa_nogcs + sex) +
race * trt,
yprev data=d)
AIC(f1); AIC(f2) # first model way better
[1] 7004.827 [1] 7060.269
<- f1
f anova(f)
Wald Statistics for y
|
|||
χ2 | d.f. | P | |
---|---|---|---|
gap (Factor+Higher Order Factors) | 110.32 | 6 | <0.0001 |
All Interactions | 108.29 | 5 | <0.0001 |
yprev (Factor+Higher Order Factors) | 1878.87 | 15 | <0.0001 |
All Interactions | 114.56 | 10 | <0.0001 |
day (Factor+Higher Order Factors) | 129.38 | 16 | <0.0001 |
All Interactions | 45.62 | 14 | <0.0001 |
Nonlinear (Factor+Higher Order Factors) | 79.28 | 8 | <0.0001 |
trt (Factor+Higher Order Factors) | 7.15 | 8 | 0.5204 |
All Interactions | 7.11 | 7 | 0.4177 |
age (Factor+Higher Order Factors) | 45.95 | 6 | <0.0001 |
All Interactions | 18.47 | 4 | 0.0010 |
Nonlinear (Factor+Higher Order Factors) | 3.94 | 3 | 0.2683 |
race (Factor+Higher Order Factors) | 12.28 | 6 | 0.0560 |
All Interactions | 8.56 | 4 | 0.0732 |
sofa_nogcs (Factor+Higher Order Factors) | 29.11 | 3 | <0.0001 |
All Interactions | 9.20 | 2 | 0.0100 |
sex (Factor+Higher Order Factors) | 1.19 | 3 | 0.7560 |
All Interactions | 0.41 | 2 | 0.8151 |
gap × yprev (Factor+Higher Order Factors) | 108.29 | 5 | <0.0001 |
day × trt (Factor+Higher Order Factors) | 2.45 | 2 | 0.2932 |
Nonlinear | 1.24 | 1 | 0.2663 |
Nonlinear Interaction : f(A,B) vs. AB | 1.24 | 1 | 0.2663 |
day × age (Factor+Higher Order Factors) | 18.47 | 4 | 0.0010 |
Nonlinear | 10.50 | 3 | 0.0148 |
Nonlinear Interaction : f(A,B) vs. AB | 10.50 | 3 | 0.0148 |
f(A,B) vs. Af(B) + Bg(A) | 2.82 | 1 | 0.0933 |
Nonlinear Interaction in day vs. Af(B) | 9.85 | 2 | 0.0073 |
Nonlinear Interaction in age vs. Bg(A) | 3.43 | 2 | 0.1799 |
day × race (Factor+Higher Order Factors) | 8.56 | 4 | 0.0732 |
Nonlinear | 7.10 | 2 | 0.0287 |
Nonlinear Interaction : f(A,B) vs. AB | 7.10 | 2 | 0.0287 |
day × sofa_nogcs (Factor+Higher Order Factors) | 9.20 | 2 | 0.0100 |
Nonlinear | 8.86 | 1 | 0.0029 |
Nonlinear Interaction : f(A,B) vs. AB | 8.86 | 1 | 0.0029 |
day × sex (Factor+Higher Order Factors) | 0.41 | 2 | 0.8151 |
Nonlinear | 0.00 | 1 | 0.9892 |
Nonlinear Interaction : f(A,B) vs. AB | 0.00 | 1 | 0.9892 |
yprev × trt (Factor+Higher Order Factors) | 6.07 | 5 | 0.2990 |
TOTAL NONLINEAR | 80.29 | 10 | <0.0001 |
TOTAL INTERACTION | 153.84 | 24 | <0.0001 |
TOTAL NONLINEAR + INTERACTION | 234.98 | 26 | <0.0001 |
TOTAL | 2512.52 | 39 | <0.0001 |
ggplot(Predict(f, gap, yprev))
There was little evidence for a treatment effect at any time, and little evidence that the effect of treatment on state transitions varied with time. There is also little evidence that the treatment effect differed for different previous states.
The graph above shows that being at home without disability is almost an absorbing state, and that the effect of previous state varies with time gap.
For now let’s simplify the model by having previous state only interact with gap (despite some evidence for the age effect varying over time), and removing the interactions with treatment. Now allow the effect of absolute time to be quadratic1. Compare the variances of parameter estimates with cluster sandwich robust estimates.
<- lrm(y ~ gap * yprev + pol(day, 2) + trt + rcs(age, 3) + race + sofa_nogcs + sex,
f data=d, x=TRUE, y=TRUE)
f
Logistic Regression Model
lrm(formula = y ~ gap * yprev + pol(day, 2) + trt + rcs(age, 3) + race + sofa_nogcs + sex, data = d, x = TRUE, y = TRUE)
Frequencies of Responses
1 2 3 4 5 6 7 50 357 297 817 526 757 489
Model Likelihood Ratio Test |
Discrimination Indexes |
Rank Discrim. Indexes |
|
---|---|---|---|
Obs 3293 | LR χ2 4766.76 | R2 0.787 | C 0.908 |
max |∂log L/∂β| 2×10-9 | d.f. 20 | g 4.186 | Dxy 0.816 |
Pr(>χ2) <0.0001 | gr 65.746 | γ 0.821 | |
gp 0.306 | τa 0.667 | ||
Brier 0.058 |
β | S.E. | Wald Z | Pr(>|Z|) | |
---|---|---|---|---|
y≥2 | 2.4712 | 0.3281 | 7.53 | <0.0001 |
y≥3 | -1.0620 | 0.3185 | -3.33 | 0.0009 |
y≥4 | -2.9688 | 0.3342 | -8.88 | <0.0001 |
y≥5 | -6.0554 | 0.3481 | -17.40 | <0.0001 |
y≥6 | -7.8957 | 0.3556 | -22.20 | <0.0001 |
y≥7 | -10.8037 | 0.3689 | -29.28 | <0.0001 |
gap | -0.2184 | 0.0839 | -2.60 | 0.0093 |
yprev=3 | 2.2491 | 0.2230 | 10.09 | <0.0001 |
yprev=4 | 4.8417 | 0.2303 | 21.02 | <0.0001 |
yprev=5 | 7.2783 | 0.2560 | 28.43 | <0.0001 |
yprev=6 | 9.3438 | 0.2776 | 33.66 | <0.0001 |
yprev=7 | 9.2858 | 0.5779 | 16.07 | <0.0001 |
day | 0.2792 | 0.0392 | 7.12 | <0.0001 |
day2 | -0.0052 | 0.0007 | -7.01 | <0.0001 |
trt=hydroxychloroquine | 0.0134 | 0.0683 | 0.20 | 0.8444 |
age | -0.0086 | 0.0049 | -1.75 | 0.0800 |
age’ | -0.0034 | 0.0058 | -0.60 | 0.5515 |
race=white | -0.0641 | 0.0877 | -0.73 | 0.4650 |
race=other | 0.0784 | 0.0951 | 0.82 | 0.4098 |
sofa_nogcs | -0.0886 | 0.0205 | -4.31 | <0.0001 |
sex=male | -0.0612 | 0.0719 | -0.85 | 0.3948 |
gap × yprev=3 | 0.2064 | 0.0613 | 3.37 | 0.0008 |
gap × yprev=4 | 0.3304 | 0.0385 | 8.59 | <0.0001 |
gap × yprev=5 | 0.2404 | 0.0392 | 6.13 | <0.0001 |
gap × yprev=6 | 0.0953 | 0.0332 | 2.87 | 0.0041 |
gap × yprev=7 | 0.2302 | 0.0569 | 4.04 | <0.0001 |
anova(f)
Wald Statistics for y
|
|||
χ2 | d.f. | P | |
---|---|---|---|
gap (Factor+Higher Order Factors) | 106.77 | 6 | <0.0001 |
All Interactions | 104.49 | 5 | <0.0001 |
yprev (Factor+Higher Order Factors) | 1891.62 | 10 | <0.0001 |
All Interactions | 104.49 | 5 | <0.0001 |
day | 84.64 | 2 | <0.0001 |
Nonlinear | 49.14 | 1 | <0.0001 |
trt | 0.04 | 1 | 0.8444 |
age | 26.83 | 2 | <0.0001 |
Nonlinear | 0.35 | 1 | 0.5515 |
race | 3.14 | 2 | 0.2078 |
sofa_nogcs | 18.61 | 1 | <0.0001 |
sex | 0.72 | 1 | 0.3948 |
gap × yprev (Factor+Higher Order Factors) | 104.49 | 5 | <0.0001 |
TOTAL NONLINEAR | 49.55 | 2 | <0.0001 |
TOTAL NONLINEAR + INTERACTION | 186.88 | 7 | <0.0001 |
TOTAL | 2507.19 | 20 | <0.0001 |
anova(f, age, race, sex) # combined effects of 3 variables
Wald Statistics for y
|
|||
χ2 | d.f. | P | |
---|---|---|---|
age | 26.83 | 2 | <0.0001 |
Nonlinear | 0.35 | 1 | 0.5515 |
race | 3.14 | 2 | 0.2078 |
sex | 0.72 | 1 | 0.3948 |
TOTAL | 33.50 | 5 | <0.0001 |
<- robcov(f, d$id)
g htmlVerbatim(round(diag(vcov(f)) / diag(vcov(g)), 2))
y>=2 y>=3 y>=4 1.40 1.31 1.11 y>=5 y>=6 y>=7 0.97 0.96 0.91 gap yprev=3 yprev=4 1.20 0.85 0.79 yprev=5 yprev=6 yprev=7 0.75 0.79 0.84 day day^2 trt=hydroxychloroquine 1.52 0.72 1.22 age age' race=white 1.53 1.26 1.24 race=other sofa_nogcs sex=male 1.22 1.16 1.24 gap * yprev=3 gap * yprev=4 gap * yprev=5 0.21 0.52 0.52 gap * yprev=6 gap * yprev=7 0.57 0.72
The two variance estimates are similar, giving an indirect indication that the assumption of conditional independence (conditioning on previous state) is reasonable. The variance ratio is 1.2 for treatment (ordinary variance estimate is larger than the intra-cluster correlation-robust estimate).
Proportional Odds Assumption
The treatment is too weak to detect any non-proportional odds. Let’s check the PO assumption on the two strongest variables: gap
and yprev
. We omit important interactions between them and make day
linear to be able to limit the number of regression coefficient estimates to examine as a function of y-cutoff. Expect results for the first cutoff, which corresponds to predicting deaths, to be unstable due to there being only 50 deaths.
<- c('yprev=3', 'yprev=4', 'yprev=5', 'yprev=6', 'yprev=7', 'day', 'gap')
n <- se <- matrix(NA, nrow=6, ncol=length(n))
z dimnames(z) <- dimnames(se) <- list(paste0('y>=', 2:7), n)
for(ycut in 2 : 7) {
<- lrm(y >= ycut ~ gap + yprev + day + trt + rcs(age, 3) + race +
g + sex,
sofa_nogcs data=d, tol=1e-14, maxit=30)
- 1, n] <- coef(g)[n]
z [ycut - 1, n] <- sqrt(diag(vcov(g)[n, n]))
se[ycut
}round(z, 2)
yprev=3 yprev=4 yprev=5 yprev=6 yprev=7 day gap
y>=2 0.41 1.70 2.76 2.71 10.61 -0.30 0.35
y>=3 3.29 5.09 6.43 6.20 13.45 -0.05 0.11
y>=4 1.20 4.82 6.59 6.72 12.62 0.24 -0.32
y>=5 1.94 3.35 6.70 8.10 8.02 0.41 -0.53
y>=6 2.58 3.82 5.01 8.42 7.42 0.64 -0.93
y>=7 2.32 3.06 3.75 4.50 6.08 0.28 -0.35
round(se, 2)
yprev=3 yprev=4 yprev=5 yprev=6 yprev=7 day gap
y>=2 0.47 0.45 0.69 0.64 27.72 0.24 0.47
y>=3 0.24 0.28 0.54 0.54 30.24 0.10 0.20
y>=4 0.26 0.27 0.41 0.55 20.70 0.08 0.16
y>=5 0.42 0.39 0.42 0.54 1.07 0.07 0.13
y>=6 0.55 0.51 0.52 0.56 0.64 0.07 0.14
y>=7 1.08 1.02 1.02 1.02 1.03 0.11 0.20
The effect of day
and gap
are roughly linear in the y-cutoff.
yprev
may be predicting death differently from how it predicts the nonfatal outcomes.
The VGAM
package vglm
function would not converge for a partial PO model that allowed for non-PO in yprev
.
Computing Unconditional Estimates
One can uncondition the previous model to obtain unconditional treatment effects, although these will be covariate- and time-dependent. We specify covariate settings (here we use representative values for each covariate, just for convenience, using a method described below) and vary the treatment assignment. Note the following.
- The initial state (baseline ordinal status) is always conditioned on, so is not marginalized out. Calculations can be done separately for each initial state.
- Given this initial state, use the fitted model to get the conditional distribution of Y at (post-randomization) day 1 given the initial state, covariate settings, and treatment.
- Likewise for later days.
- Death is an absorbing state so once death occurs all future status values are also set to death.
- Were time to be continuous one can find the first time for which the probability of being in state y or worse exceeds 0.5, which would define the median time until reaching that state. For the ORCHID study, gaps between measurements are too long for this unless time-interpolation is used. It may be better to focus on estimating the probability of being in state y or worse as a function of days since randomization, treatment, and covariates (covariates include the initial state).
- This approach does not factor in uncertainties in parameter estimates, so provides only a point estimate. The simple Bayesian solution to this is provided later.
Start with computation of a set of representative covariate values that will be used when getting estimates related to transformed parameters. This is done by using the global transition model fitted above, evaluating it at day 14 for a previous state of y=4, and finding covariate settings that give predicted values that are close to the unadjusted estimates for placebo. Of the qualifying settings, one that is subjectively closer to the center of the data distribution is chosen. For simplicity we drop weak categorical predictors sex and race. To better fit we allow time to have a nonlinear effect. First let’s see the distribution of day
by yprev
.
ggplot(d, aes(x=day)) + geom_histogram() + facet_wrap(~ yprev)
# For yprev=7 there are only 2 distinct assessment days (14, 28)
# So need interaction to be restricted to be linear
<- lrm(y ~ gap * yprev + rcs(day, 4) + trt + rcs(age, 3) + sofa_nogcs,
f data=d)
<- subset(d, day == 1)
newdat <- transform(newdat, day=14, yprev='4', gap=1)
newdat <- predict(f, newdat) # uses first intercept by default
xb1
# Do the same when covariates are omitted
<- lrm(y ~ gap * yprev + rcs(day, 4), data=d)
fu <- predict(fu, newdat[1, ])
xb2 <- abs(xb1 - xb2)
diff <- newdat[, c('age', 'sofa_nogcs', 'trt')]
w $diff <- abs(xb1 - xb2)
wsetDT(w)
<- w[order(diff), ]
wi # Compute mean and median baseline covariates
<- all.vars(~ age + sofa_nogcs)
co setDT(newdat)
<- function(x) c(mean=mean(x), median=median(x))
h <- function(x) { s <- sapply(x, h); w <- as.list(s)
u names(w) <- outer(rownames(s), colnames(s), paste, sep='.'); w}
u(.SD), .SDcols=co] newdat[,
mean.age median.age mean.sofa_nogcs median.sofa_nogcs
1: 56.85386 57 2.814196 2
# Seek age=57, sofa 2 or 3
1:30, ] wi[
age sofa_nogcs trt diff
1: 47 7 hydroxychloroquine 0.001365465
2: 63 5 hydroxychloroquine 0.001857581
3: 76 3 hydroxychloroquine 0.002198086
4: 75 3 placebo 0.002726674
5: 88 1 hydroxychloroquine 0.005283669
6: 70 4 hydroxychloroquine 0.006873129
7: 82 2 placebo 0.007587458
8: 48 7 hydroxychloroquine 0.008507628
9: 63 5 placebo 0.010995304
10: 45 7 placebo 0.011599411
11: 75 3 hydroxychloroquine 0.011864397
12: 75 3 hydroxychloroquine 0.011864397
13: 83 2 hydroxychloroquine 0.012547716
14: 56 6 placebo 0.014215440
15: 64 5 hydroxychloroquine 0.014776646
16: 64 5 hydroxychloroquine 0.014776646
17: 77 3 hydroxychloroquine 0.016282989
18: 77 3 hydroxychloroquine 0.016282989
19: 48 7 placebo 0.017645351
20: 89 1 placebo 0.017952035
21: 89 1 placebo 0.017952035
22: 89 1 placebo 0.017952035
23: 87 1 hydroxychloroquine 0.019381651
24: 68 4 hydroxychloroquine 0.020411089
25: 71 4 hydroxychloroquine 0.020655362
26: 61 5 hydroxychloroquine 0.023464890
27: 61 5 hydroxychloroquine 0.023464890
28: 80 2 hydroxychloroquine 0.029746227
29: 67 4 hydroxychloroquine 0.033890653
30: 72 4 hydroxychloroquine 0.034516065
age sofa_nogcs trt diff
# There are no good compromises so use marginal representative values
<- list(age=57, sofa_nogcs=3) repcov
There were no representative covariate combinations that were sufficiently close to unadjusted estimates and close to the center of covariate distributions, so we chose marginally central values.
Compute \(\Pr(Y_s \geq y | X, Y_0)\) analytically for a single value of \((y, Y_0, X, s)\), using this recursive formula where \(k\) is the number of outcome states and \(t\) is time:
\[\Pr(Y_{t} = y | X) = \sum_{j=1}^{k} \Pr(Y_{t} = y | X, Y_{t-1} = j) \Pr(Y_{t-1} = j | X)\] Note that recent updates to the R Hmisc
package make the uprob
function below obsolete. The soprobMarkovOrd
function will compute state occupancy probabilities for given covariate settings. For Bayesian models Hmisc
has soprobMarkovOrdm
.
# Compute unconditional probabilities
# Output is a matrix with s rows corresponding to t=times(1), times(2), ...
# Each row has columns corresponding to the probabilities of being in each of the states
# times(1) must be 1
#
# fit is the model fit which must contain the variable named day and the
# previous state, which must be named by the value of pvarname
# data is a data frame or data table that contains covariate settings
# other than day and what's in pvarname. data has one row.
# times is the vector of days for which to evaluate state occupancy probabilities
# ylevels are the ordered levels of the outcome variable
# Set gap=TRUE if predictor gap is in the model and gap needs to be computed
# At t=times[1] gap is taken as times[1]
<- function(fit, data, times, ylevels, pvarname='yprev', gap=FALSE) {
uprob <- length(ylevels)
k <- length(times)
s <- matrix(NA, nrow=s, ncol=k)
P colnames(P) <- as.character(ylevels)
rownames(P) <- as.character(times)
# Never uncondition on initial state
$day <- times[1]
dataif(gap) data$gap <- times[1]
<- predict(fit, data, type='fitted.ind')
p 1, ] <- p
P[# Matrix of conditional probabilities of Y conditioning on previous time Y
# Columns = k conditional probabilities conditional on a single previous state
# Rows = all possible previous states
<- matrix(NA, nrow=k, ncol=k,
cp dimnames=list(paste('t-1', ylevels), paste('t', ylevels)))
<- as.list(data)
data <- as.character(ylevels[-1])
data[[pvarname]] # Above: don't request estimates after absorbing state
<- expand.grid(data)
edata for(it in 2 : s) {
$day <- times[it]
edataif(gap) edata$gap <- times[it] - times[it - 1]
# edata$gap <- c(1,1,1,1,3,3,3,7,7,7,7,7,7,7,rep(14, 14))[times[it]]
# y=1 = death = absorbing state
<- rbind(c(1., rep(0., k - 1)), predict(fit, edata, type='fitted.ind'))
cp # Compute unconditional probabilities of being in all possible states
# at current time t
<- t(cp) %*% P[it - 1, ]
P[it, ]
}
P
}
<- data.frame(yprev='4', age=repcov$age, sofa_nogcs=repcov$sofa_nogcs)
dat
<- c(1, 2, 3, 4, 7, 14, 28)
times <- levels(d$trt)
tlev
<- NULL
U for(trt in tlev) { # separately for treatments
$trt <- trt
dat<- uprob(f, dat, times, 1:7, gap=TRUE)
u cat('\n', trt, '\n\n')
print(round(u, 3))
<- rbind(U, data.frame(trt, day=times, p=u, phome=u[, 6] + u[, 7]))
U }
placebo
1 2 3 4 5 6 7
1 0.001 0.037 0.172 0.646 0.118 0.024 0.001
2 0.007 0.085 0.157 0.491 0.173 0.077 0.008
3 0.016 0.100 0.130 0.395 0.194 0.141 0.024
4 0.028 0.107 0.117 0.339 0.189 0.184 0.036
7 0.033 0.076 0.077 0.223 0.209 0.282 0.100
14 0.038 0.051 0.038 0.104 0.150 0.394 0.225
28 0.042 0.035 0.014 0.036 0.056 0.374 0.444
hydroxychloroquine
1 2 3 4 5 6 7
1 0.001 0.036 0.171 0.646 0.119 0.025 0.001
2 0.007 0.084 0.156 0.491 0.174 0.078 0.009
3 0.016 0.098 0.129 0.394 0.195 0.142 0.024
4 0.027 0.106 0.116 0.338 0.190 0.186 0.036
7 0.033 0.075 0.076 0.222 0.209 0.284 0.101
14 0.037 0.050 0.038 0.103 0.149 0.395 0.229
28 0.041 0.034 0.013 0.035 0.055 0.373 0.448
<- U Rorchid
Plot the unconditional probabilities of being at home at each time, for representative age and sofa score.
<- xlab('Day'); yl <- ylab('Probability of Being at Home')
xl $txt <- with(U, paste0(trt, '<br>Day:', day, '<br>', round(phome, 3)))
U<- ggplot(U, aes(x=day, y=phome, color=trt, label=txt)) +
gg geom_line() +
+ yl + labs(color='')
xl ggp(gg)
Repeat but estimate probabilities at days not observed in the data.
$type <- 'Actual Times'
Ufor(trt in tlev) { # separately for treatments
$trt <- trt
dat<- uprob(f, dat, 1:28, 1:7, gap=TRUE)
u cat('\n', trt, '\n\n')
print(round(u, 3))
<- paste(trt, 'day 1-28')
lab <- rbind(U, data.frame(trt, day=1:28, p=u, phome=u[, 6] + u[, 7],
U type='Days 1-28', txt=''))
}
placebo
1 2 3 4 5 6 7
1 0.001 0.037 0.172 0.646 0.118 0.024 0.001
2 0.007 0.085 0.157 0.491 0.173 0.077 0.008
3 0.016 0.100 0.130 0.395 0.194 0.141 0.024
4 0.028 0.107 0.117 0.339 0.189 0.184 0.036
5 0.046 0.126 0.119 0.317 0.179 0.184 0.030
6 0.084 0.162 0.138 0.309 0.162 0.129 0.014
7 0.170 0.216 0.168 0.275 0.111 0.056 0.004
8 0.342 0.264 0.164 0.173 0.042 0.013 0.001
9 0.609 0.231 0.091 0.058 0.008 0.002 0.000
10 0.867 0.102 0.022 0.009 0.001 0.000 0.000
11 0.980 0.017 0.002 0.001 0.000 0.000 0.000
12 0.999 0.001 0.000 0.000 0.000 0.000 0.000
13 1.000 0.000 0.000 0.000 0.000 0.000 0.000
14 1.000 0.000 0.000 0.000 0.000 0.000 0.000
15 1.000 0.000 0.000 0.000 0.000 0.000 0.000
16 1.000 0.000 0.000 0.000 0.000 0.000 0.000
17 1.000 0.000 0.000 0.000 0.000 0.000 0.000
18 1.000 0.000 0.000 0.000 0.000 0.000 0.000
19 1.000 0.000 0.000 0.000 0.000 0.000 0.000
20 1.000 0.000 0.000 0.000 0.000 0.000 0.000
21 1.000 0.000 0.000 0.000 0.000 0.000 0.000
22 1.000 0.000 0.000 0.000 0.000 0.000 0.000
23 1.000 0.000 0.000 0.000 0.000 0.000 0.000
24 1.000 0.000 0.000 0.000 0.000 0.000 0.000
25 1.000 0.000 0.000 0.000 0.000 0.000 0.000
26 1.000 0.000 0.000 0.000 0.000 0.000 0.000
27 1.000 0.000 0.000 0.000 0.000 0.000 0.000
28 1.000 0.000 0.000 0.000 0.000 0.000 0.000
hydroxychloroquine
1 2 3 4 5 6 7
1 0.001 0.036 0.171 0.646 0.119 0.025 0.001
2 0.007 0.084 0.156 0.491 0.174 0.078 0.009
3 0.016 0.098 0.129 0.394 0.195 0.142 0.024
4 0.027 0.106 0.116 0.338 0.190 0.186 0.036
5 0.045 0.124 0.118 0.316 0.180 0.186 0.030
6 0.083 0.161 0.137 0.309 0.164 0.132 0.015
7 0.167 0.214 0.168 0.277 0.113 0.057 0.005
8 0.337 0.264 0.165 0.176 0.043 0.014 0.001
9 0.604 0.233 0.093 0.060 0.008 0.002 0.000
10 0.863 0.104 0.023 0.009 0.001 0.000 0.000
11 0.979 0.018 0.002 0.001 0.000 0.000 0.000
12 0.999 0.001 0.000 0.000 0.000 0.000 0.000
13 1.000 0.000 0.000 0.000 0.000 0.000 0.000
14 1.000 0.000 0.000 0.000 0.000 0.000 0.000
15 1.000 0.000 0.000 0.000 0.000 0.000 0.000
16 1.000 0.000 0.000 0.000 0.000 0.000 0.000
17 1.000 0.000 0.000 0.000 0.000 0.000 0.000
18 1.000 0.000 0.000 0.000 0.000 0.000 0.000
19 1.000 0.000 0.000 0.000 0.000 0.000 0.000
20 1.000 0.000 0.000 0.000 0.000 0.000 0.000
21 1.000 0.000 0.000 0.000 0.000 0.000 0.000
22 1.000 0.000 0.000 0.000 0.000 0.000 0.000
23 1.000 0.000 0.000 0.000 0.000 0.000 0.000
24 1.000 0.000 0.000 0.000 0.000 0.000 0.000
25 1.000 0.000 0.000 0.000 0.000 0.000 0.000
26 1.000 0.000 0.000 0.000 0.000 0.000 0.000
27 1.000 0.000 0.000 0.000 0.000 0.000 0.000
28 1.000 0.000 0.000 0.000 0.000 0.000 0.000
# Plotly doesn't work will with both color and linetype
ggplot(U, aes(x=day, y=phome, color=trt, linetype=type)) + geom_line() +
+ yl + labs(color='', linetype='') xl
Everything has gone haywire. This is caused by the fact that both gap
and day
needed to be in the model to have the right main effects and interactions, but these two variables are collinear. We are requesting predictions above for combinations of day
and gap
that did not occur in the data, i.e., gap
=1 when day
> 4. Collinearities cause difficulties when geting predictions for out-of-sample data in which the collinearities disagree with those in the sample used in fitting the model.
So when the design is such that one needs to interact gap
with previous state and gap
is not randomly assigned or is otherwise tangled with absolute time, it is not possible to get estimates for unobserved times.
Before giving up, we’ll try a model below that does not contain gap
, and that flexibly models the day
main effect using a spline,
Markov Model Without Covariate Adjustment
Let’s repeat the exercise with no covariate adjustment to check how we are estimating the proportion of patients at home at each observed day. We relax the model to allow it to fit any time trend except that the interaction between day and previous state is restricted to be linear in day. We are making proportional odds assumptions for day, previous state, and treatment. The PO assumption will be less consequential for treatment because of its weak effect.
<- dall[y0 == 4, .(day, y, trt, id)]
dfull # Compute days since previous measurements
<- dfull[, .(phome = mean(y >= 6)), by=.(trt, day)]
s := 'Empirical']
s[, type <- s
empirical
# It would be nice to just have day in the model and not gap (which can be
# derived from day), but the collinearities between yprev and day prevent
# this, and we must interact day with yprev - see the crosstab below
table(day, yprev)] d[,
yprev
day 2 3 4 5 6 7
1 32 55 224 168 0 0
2 48 70 209 142 10 0
3 52 73 189 113 51 0
4 63 57 157 95 103 0
7 69 50 132 75 148 0
14 60 35 71 53 129 113
28 42 12 40 37 169 147
# Treat day as numeric but with all needed variables added (less one)
# to make it be modeled as categorical except for the interaction term
# Using records truncated at death
<- lrm(y ~ gap * yprev + pol(day, 2) + trt, data=d)
fplain # The following has a slightly greater LR chi-square:
<- lrm(y ~ rcs(day, 4) + yprev + day %ia% yprev + trt, data=d)
fplain2 # lrm(y ~ sqrt(day) * yprev + trt, data=d) also fits well
fplain
Logistic Regression Model
lrm(formula = y ~ gap * yprev + pol(day, 2) + trt, data = d)
Frequencies of Responses
1 2 3 4 5 6 7 50 357 297 817 526 757 489
Model Likelihood Ratio Test |
Discrimination Indexes |
Rank Discrim. Indexes |
|
---|---|---|---|
Obs 3293 | LR χ2 4705.63 | R2 0.783 | C 0.905 |
max |∂log L/∂β| 1×10-9 | d.f. 14 | g 4.121 | Dxy 0.810 |
Pr(>χ2) <0.0001 | gr 61.595 | γ 0.825 | |
gp 0.305 | τa 0.662 | ||
Brier 0.058 |
β | S.E. | Wald Z | Pr(>|Z|) | |
---|---|---|---|---|
y≥2 | 1.2998 | 0.1895 | 6.86 | <0.0001 |
y≥3 | -2.1762 | 0.1821 | -11.95 | <0.0001 |
y≥4 | -4.0680 | 0.2103 | -19.34 | <0.0001 |
y≥5 | -7.1255 | 0.2356 | -30.25 | <0.0001 |
y≥6 | -8.9419 | 0.2482 | -36.03 | <0.0001 |
y≥7 | -11.8166 | 0.2703 | -43.71 | <0.0001 |
gap | -0.2125 | 0.0837 | -2.54 | 0.0111 |
yprev=3 | 2.4481 | 0.2163 | 11.32 | <0.0001 |
yprev=4 | 5.1622 | 0.2192 | 23.56 | <0.0001 |
yprev=5 | 7.6626 | 0.2407 | 31.83 | <0.0001 |
yprev=6 | 9.7256 | 0.2657 | 36.60 | <0.0001 |
yprev=7 | 9.6987 | 0.5701 | 17.01 | <0.0001 |
day | 0.2786 | 0.0391 | 7.13 | <0.0001 |
day2 | -0.0051 | 0.0007 | -6.94 | <0.0001 |
trt=hydroxychloroquine | -0.0131 | 0.0679 | -0.19 | 0.8470 |
gap × yprev=3 | 0.2093 | 0.0605 | 3.46 | 0.0005 |
gap × yprev=4 | 0.3010 | 0.0381 | 7.91 | <0.0001 |
gap × yprev=5 | 0.2109 | 0.0388 | 5.44 | <0.0001 |
gap × yprev=6 | 0.0820 | 0.0330 | 2.48 | 0.0130 |
gap × yprev=7 | 0.2155 | 0.0566 | 3.80 | 0.0001 |
anova(fplain)
Wald Statistics for y
|
|||
χ2 | d.f. | P | |
---|---|---|---|
gap (Factor+Higher Order Factors) | 93.36 | 6 | <0.0001 |
All Interactions | 90.84 | 5 | <0.0001 |
yprev (Factor+Higher Order Factors) | 2192.50 | 10 | <0.0001 |
All Interactions | 90.84 | 5 | <0.0001 |
day | 83.98 | 2 | <0.0001 |
Nonlinear | 48.22 | 1 | <0.0001 |
trt | 0.04 | 1 | 0.8470 |
gap × yprev (Factor+Higher Order Factors) | 90.84 | 5 | <0.0001 |
TOTAL NONLINEAR + INTERACTION | 170.18 | 6 | <0.0001 |
TOTAL | 2506.66 | 14 | <0.0001 |
<- data.frame(yprev='4')
dat
<- c(1, 2, 3, 4, 7, 14, 28)
times # gaps <- c(1, diff(times))
<- levels(d$trt)
tlev
for(trt in tlev) { # separately for treatments
$trt <- trt
dat$gap <- 1
dat<- uprob(fplain2, dat, times, 1:7, gap=TRUE)
u cat('\n', trt, '\n\n')
print(round(u, 3))
<- rbind(s, data.frame(trt, day=times, p=u, phome=u[, 6] + u[, 7],
s type='Model'), fill=TRUE)
}
placebo
1 2 3 4 5 6 7
1 0.001 0.036 0.168 0.643 0.124 0.027 0.002
2 0.010 0.092 0.161 0.485 0.164 0.078 0.009
3 0.024 0.113 0.132 0.387 0.184 0.137 0.024
4 0.038 0.110 0.104 0.309 0.192 0.200 0.048
7 0.048 0.087 0.071 0.210 0.194 0.284 0.104
14 0.054 0.059 0.039 0.096 0.135 0.365 0.253
28 0.061 0.042 0.014 0.037 0.053 0.339 0.454
hydroxychloroquine
1 2 3 4 5 6 7
1 0.001 0.037 0.169 0.642 0.123 0.026 0.002
2 0.010 0.093 0.163 0.485 0.163 0.077 0.009
3 0.025 0.114 0.133 0.387 0.182 0.135 0.024
4 0.039 0.112 0.105 0.310 0.191 0.197 0.047
7 0.050 0.089 0.072 0.212 0.194 0.282 0.102
14 0.056 0.060 0.039 0.097 0.136 0.364 0.248
28 0.063 0.043 0.015 0.037 0.054 0.341 0.448
ggplot(s, aes(x=day, y=phome, color=trt, linetype=type)) + geom_line() + xl + yl +
labs(color='', linetype='')
Now add estimates for all days 1-28.
for(trt in tlev) { # separately for treatments
$trt <- trt
dat<- uprob(fplain, dat, 1:28, 1:7, gap=TRUE)
u cat('\n', trt, '\n\n')
print(round(u, 3))
<- rbind(s, data.frame(trt=paste(trt, 'days 1-28'), day=1:28, p=u,
s phome=u[, 6] + u[, 7],
type='Model', times='Days 1-28'), fill=TRUE)
}
placebo
1 2 3 4 5 6 7
1 0.001 0.033 0.155 0.643 0.136 0.030 0.002
2 0.009 0.086 0.154 0.483 0.173 0.085 0.010
3 0.022 0.108 0.131 0.386 0.185 0.142 0.026
4 0.035 0.107 0.108 0.314 0.187 0.200 0.049
5 0.046 0.094 0.088 0.254 0.181 0.255 0.082
6 0.053 0.076 0.071 0.204 0.169 0.303 0.124
7 0.058 0.056 0.055 0.160 0.152 0.341 0.177
8 0.060 0.039 0.042 0.122 0.132 0.366 0.238
9 0.062 0.025 0.029 0.091 0.111 0.377 0.305
10 0.063 0.015 0.020 0.065 0.090 0.374 0.373
11 0.064 0.008 0.012 0.045 0.071 0.361 0.439
12 0.064 0.004 0.007 0.030 0.055 0.340 0.500
13 0.064 0.002 0.004 0.019 0.043 0.315 0.552
14 0.064 0.001 0.002 0.013 0.034 0.290 0.597
15 0.064 0.000 0.001 0.009 0.027 0.266 0.633
16 0.064 0.000 0.001 0.006 0.022 0.244 0.663
17 0.064 0.000 0.000 0.005 0.019 0.224 0.688
18 0.064 0.000 0.000 0.004 0.016 0.208 0.708
19 0.064 0.000 0.000 0.003 0.015 0.194 0.724
20 0.064 0.000 0.000 0.003 0.013 0.182 0.738
21 0.064 0.000 0.000 0.003 0.012 0.172 0.749
22 0.064 0.000 0.000 0.002 0.011 0.164 0.758
23 0.064 0.000 0.000 0.002 0.011 0.158 0.765
24 0.064 0.000 0.000 0.002 0.010 0.153 0.770
25 0.064 0.000 0.000 0.002 0.010 0.150 0.774
26 0.064 0.000 0.000 0.002 0.010 0.148 0.776
27 0.064 0.000 0.000 0.002 0.010 0.147 0.777
28 0.064 0.000 0.000 0.002 0.010 0.147 0.777
hydroxychloroquine
1 2 3 4 5 6 7
1 0.001 0.033 0.157 0.643 0.135 0.030 0.002
2 0.009 0.087 0.156 0.483 0.171 0.083 0.010
3 0.023 0.110 0.132 0.386 0.183 0.140 0.025
4 0.036 0.109 0.109 0.315 0.186 0.197 0.048
5 0.047 0.096 0.089 0.256 0.181 0.251 0.080
6 0.054 0.078 0.072 0.206 0.169 0.300 0.121
7 0.059 0.058 0.057 0.162 0.153 0.338 0.173
8 0.062 0.040 0.043 0.124 0.133 0.364 0.234
9 0.064 0.026 0.030 0.093 0.112 0.376 0.300
10 0.065 0.016 0.020 0.066 0.091 0.374 0.368
11 0.065 0.009 0.013 0.046 0.072 0.361 0.433
12 0.066 0.005 0.007 0.031 0.056 0.341 0.494
13 0.066 0.002 0.004 0.020 0.044 0.317 0.547
14 0.066 0.001 0.002 0.013 0.034 0.292 0.592
15 0.066 0.000 0.001 0.009 0.027 0.268 0.629
16 0.066 0.000 0.001 0.006 0.022 0.246 0.659
17 0.066 0.000 0.000 0.005 0.019 0.226 0.684
18 0.066 0.000 0.000 0.004 0.017 0.210 0.704
19 0.066 0.000 0.000 0.003 0.015 0.195 0.720
20 0.066 0.000 0.000 0.003 0.013 0.184 0.734
21 0.066 0.000 0.000 0.003 0.012 0.174 0.745
22 0.066 0.000 0.000 0.002 0.012 0.166 0.754
23 0.066 0.000 0.000 0.002 0.011 0.159 0.761
24 0.066 0.000 0.000 0.002 0.010 0.154 0.767
25 0.066 0.000 0.000 0.002 0.010 0.151 0.771
26 0.066 0.000 0.000 0.002 0.010 0.149 0.773
27 0.066 0.000 0.000 0.002 0.010 0.148 0.774
28 0.066 0.000 0.000 0.002 0.010 0.149 0.773
ggplot(s, aes(x=day, y=phome, color=trt, linetype=type)) + geom_line() + xl + yl +
labs(color='', linetype='')
Again there is a problem when estimating days not occurring in the data. Try a model that models day
flexibly as a main effect, but that interacts it only linearly with previous state.
<- data.frame(yprev='4')
dat
<- empirical
U for(trt in tlev) { # separately for treatments
$trt <- trt
dat<- uprob(fplain2, dat, 1:28, 1:7, gap=FALSE)
u <- rbind(U, data.frame(trt, day=1:28, p=u, phome=u[, 6] + u[, 7],
U type='Model days 1-28'), fill=TRUE)
}ggplot(U, aes(x=day, y=phome, color=trt, linetype=type)) + geom_line() +
+ yl + labs(color='', linetype='') xl
We see the same problem as before.
Checking Validity of Method for Unequal Time Spacing Using VIOLET 2
We saw above that it is not valid to use a simple model fitted on sparse times to estimate state occupancy probabilities at unobserved times. Let’s now check that estimates are valid when observation timings are irregular but estimates are requested at actually occurring times. For this purpose we use the VIOLET 2 trial, which had a 4-level ordinal outcome scale measured on days 1-28 (records terminate at death). We’ll fit a non-covariate-adjusted (except for treatment) model using only days 1 2 3 4 7 14 28. Then we will request state occupancy probabilities at only those days, then at all days. But first start with fitting using all days and estimating all days.
Something strange happened with regard to ventilator use on day 28, so restrict analysis to first 27 days.
<- readRDS('dcf.rds')
dcf setDT(dcf)
<- dcf[day < 28]
dcf
# Compute empirical proportions of being at home for those starting In Hospital, not on vent
<- dcf[, .(id, day, bstatus, status, treatment, ddeath)]
dfull <- dfull[bstatus == 'In Hospital',
emp phome = mean(status == 'Home', na.rm=TRUE)),
.(=.(treatment, day)]
by:= 'Empirical']
emp[, type := unclass(day)]
emp[, day
<- dfull[day <= ddeath, ] # don't carry death forward
dv <- dv$bstatus
s levels(s) <- list('Vent/ARDS'='ARDS', 'Vent/ARDS'='On Vent',
'In Hospital/Facility'='In Hospital')
table(bstatus, s)] dv[,
s
bstatus Vent/ARDS In Hospital/Facility
ARDS 2062 0
On Vent 8828 0
In Hospital 0 22229
:= s]
dv[, bstatus setkey(dv, id, day)
:= shift(status), by=id]
dv[, pstatus == 1, pstatus := bstatus]
dv[day := pstatus[drop=TRUE]] # drop unused previous status Dead
dv[, pstatus ggplot(dv, aes(x=day)) + geom_histogram(binwidth=1) + facet_wrap(~ pstatus)
<- datadist(dv); options(datadist='dd')
dd
# d2 <- dv[bstatus == 'In Hospital/Facility', ]
# Find model with best AIC
for(k in 3 : 10) {
<- lrm(status ~ rcs(day, k) * pstatus + treatment, data=dv, maxit=40, tol=1e-14)
h if(k == 7) f27 <- h
cat(k, 'knots\n')
print(AIC(h))
}
3 knots
[1] 12877.46
4 knots
[1] 12842.47
5 knots
[1] 12832.9
6 knots
[1] 12826.01
7 knots
[1] 12819.5
8 knots
[1] 12823.38
9 knots
[1] 12822.33
10 knots
[1] 12823.41
# Use 7 knots
<- data.frame(pstatus='In Hospital/Facility')
dat
#times <- c(1, 2, 3, 4, 7, 14, 28)
<- unique(dv$treatment)
tlev <- levels(dv$status)
ylev <- 1:28
times
<- emp
R for(trt in tlev) { # separately for treatments
$treatment <- trt
dat<- uprob(f27, dat, times, ylev, pvarname='pstatus')
u cat('\n', trt, '\n\n')
print(round(u, 3))
<- data.table(treatment=trt, day=times, p=u, phome=u[, 4], type='Model')
w <- rbind(R, w, fill=TRUE)
R }
Placebo
Dead Vent/ARDS In Hospital/Facility Home
1 0.000 0.018 0.925 0.056
2 0.001 0.027 0.846 0.127
3 0.002 0.030 0.757 0.211
4 0.003 0.031 0.663 0.304
5 0.004 0.030 0.574 0.392
6 0.005 0.029 0.499 0.467
7 0.007 0.029 0.440 0.525
8 0.008 0.029 0.394 0.569
9 0.010 0.029 0.359 0.603
10 0.012 0.029 0.328 0.631
11 0.014 0.029 0.303 0.655
12 0.016 0.029 0.280 0.676
13 0.018 0.028 0.259 0.695
14 0.020 0.028 0.241 0.711
15 0.022 0.028 0.225 0.725
16 0.024 0.028 0.212 0.737
17 0.026 0.028 0.199 0.747
18 0.028 0.029 0.188 0.755
19 0.030 0.029 0.178 0.763
20 0.032 0.029 0.169 0.770
21 0.034 0.028 0.161 0.776
22 0.036 0.028 0.154 0.782
23 0.038 0.027 0.147 0.787
24 0.040 0.027 0.141 0.792
25 0.042 0.026 0.135 0.797
26 0.045 0.025 0.129 0.801
27 0.047 0.025 0.123 0.805
28 0.049 0.024 0.117 0.809
Vitamin D
Dead Vent/ARDS In Hospital/Facility Home
1 0.000 0.018 0.925 0.056
2 0.001 0.027 0.845 0.127
3 0.002 0.030 0.756 0.212
4 0.003 0.030 0.662 0.305
5 0.004 0.030 0.573 0.394
6 0.005 0.029 0.497 0.468
7 0.007 0.028 0.438 0.527
8 0.008 0.028 0.393 0.570
9 0.010 0.029 0.357 0.604
10 0.012 0.029 0.327 0.632
11 0.014 0.029 0.301 0.657
12 0.016 0.028 0.278 0.678
13 0.018 0.028 0.258 0.697
14 0.019 0.028 0.240 0.713
15 0.021 0.027 0.224 0.727
16 0.023 0.028 0.210 0.739
17 0.025 0.028 0.198 0.749
18 0.028 0.028 0.187 0.757
19 0.030 0.029 0.177 0.765
20 0.032 0.028 0.168 0.772
21 0.034 0.028 0.160 0.778
22 0.036 0.028 0.153 0.784
23 0.038 0.027 0.146 0.789
24 0.040 0.026 0.140 0.794
25 0.042 0.026 0.134 0.798
26 0.044 0.025 0.128 0.803
27 0.046 0.025 0.122 0.807
28 0.048 0.024 0.116 0.811
ggplot(R, aes(x=day, y=phome, color=treatment, linetype=type)) + geom_line() +
+ yl + labs(color='', linetype='')
xl # Fit is barely better beyond day 17 if ran lrm on data table d2
The model would probably fit better had a partial proportional odds model been used to relax the proportional odds assumption for day
.
Now fit a model as if data collected was restricted to certain days. Need to recompute lags to not let a lag be based on a data that’s no longer sampled.
<- c(1, 2, 3, 4, 7, 14, 27)
times # Compute empirical proportions of being at home for those starting In Hospital, not on vent
<- dcf[day %in% times, .(id, day, bstatus, status, treatment, ddeath)]
dfull <- dfull[bstatus == 'In Hospital',
emps phome = mean(status == 'Home', na.rm=TRUE)),
.(=.(treatment, day)]
by:= 'Empirical on selected days']
emps[, type := unclass(day)]
emps[, day
<- dfull[day <= ddeath, ] # don't carry death forward
dv <- dv[, bstatus]
s levels(s) <- list('Vent/ARDS'='ARDS', 'Vent/ARDS'='On Vent',
'In Hospital/Facility'='In Hospital')
table(bstatus, s)] dv[,
s
bstatus Vent/ARDS In Hospital/Facility
ARDS 611 0
On Vent 2408 0
In Hospital 0 5925
:= s]
dv[, bstatus setkey(dv, id, day)
:= shift(status), by=id]
dv[, pstatus == 1, pstatus := bstatus]
dv[day := pstatus[drop=TRUE]] # drop unused previous status Dead
dv[, pstatus := ifelse(day == 1, 1, day - shift(day)), by=id]
dv[, gap table(day, gap)] dv[,
gap
day 1 3 7 13
1 1352 0 0 0
2 1336 0 0 0
3 1321 0 0 0
4 1303 0 0 0
7 0 1268 0 0
14 0 0 1212 0
27 0 0 0 1159
ggplot(dv, aes(x=day)) + geom_histogram(binwidth=1) + facet_wrap(~ pstatus)
<- datadist(dv); options(datadist='dd')
dd
# d2 <- dv[bstatus == 'In Hospital/Facility', ]
# Find transformation giving the best AIC
# Can't use a spline because of too many tied times
for(k in c(0 : 3)) {
cat('Transformation', k, '\n')
<- if(k == 0) status ~ (day + gap) * pstatus + treatment
form else
if(k == 1) status ~ (sqrt(day) + sqrt(gap)) * pstatus + treatment
else
if(k == 2) status ~ (pol(day, 2) + pol(gap, 2)) * pstatus + treatment
else
~ (pol(sqrt(day), 2) + pol(sqrt(day), 2)) * pstatus + treatment
status <- lrm(form, data=dv, maxit=50, tol=1e-14)
h if(k == 2) fsel <- h # choose quadratic
print(AIC(h))
}
Transformation 0
[1] 8098.67
Transformation 1
[1] 8008.668
Transformation 2
[1] 7864.54
Transformation 3
[1] 7896.565
<- data.frame(pstatus='In Hospital/Facility')
dat
<- unique(dv$treatment)
tlev <- levels(dv$status)
ylev
<- rbind(R, emps, fill=TRUE)
R for(trt in tlev) { # separately for treatments
$treatment <- trt
dat<- uprob(fsel, dat, times, ylev, pvarname='pstatus', gap=TRUE)
u cat('\n', trt, '\n\n')
print(round(u, 3))
<- data.table(treatment=trt, day=times, p=u, phome=u[, 4], type='Model selected days')
w <- rbind(R, w, fill=TRUE)
R }
Placebo
Dead Vent/ARDS In Hospital/Facility Home
1 0.000 0.039 0.884 0.077
2 0.003 0.059 0.781 0.157
3 0.006 0.066 0.684 0.245
4 0.009 0.064 0.585 0.342
7 0.011 0.051 0.412 0.526
14 0.012 0.029 0.253 0.705
27 0.013 0.025 0.172 0.790
Vitamin D
Dead Vent/ARDS In Hospital/Facility Home
1 0.000 0.038 0.883 0.079
2 0.002 0.057 0.779 0.161
3 0.005 0.064 0.680 0.251
4 0.009 0.062 0.580 0.349
7 0.011 0.049 0.405 0.536
14 0.011 0.028 0.246 0.714
27 0.013 0.024 0.166 0.798
<- 1:28
times <- data.frame(pstatus='In Hospital/Facility')
dat for(trt in tlev) { # separately for treatments
$treatment <- trt
dat<- uprob(fsel, dat, times, ylev, pvarname='pstatus', gap=TRUE)
u cat('\n', trt, '\n\n')
print(round(u, 3))
<- data.table(treatment=trt, day=times, p=u, phome=u[, 4], type='Model selected days, predict all')
w <- rbind(R, w, fill=TRUE)
R }
Placebo
Dead Vent/ARDS In Hospital/Facility Home
1 0.000 0.039 0.884 0.077
2 0.003 0.059 0.781 0.157
3 0.006 0.066 0.684 0.245
4 0.009 0.064 0.585 0.342
5 0.013 0.069 0.774 0.145
6 0.051 0.168 0.585 0.195
7 0.258 0.141 0.402 0.199
8 0.471 0.116 0.231 0.181
9 0.668 0.093 0.104 0.134
10 0.819 0.072 0.035 0.074
11 0.911 0.052 0.009 0.028
12 0.958 0.032 0.002 0.008
13 0.981 0.017 0.000 0.002
14 0.993 0.006 0.000 0.000
15 0.998 0.002 0.000 0.000
16 1.000 0.000 0.000 0.000
17 1.000 0.000 0.000 0.000
18 1.000 0.000 0.000 0.000
19 1.000 0.000 0.000 0.000
20 1.000 0.000 0.000 0.000
21 1.000 0.000 0.000 0.000
22 1.000 0.000 0.000 0.000
23 1.000 0.000 0.000 0.000
24 1.000 0.000 0.000 0.000
25 1.000 0.000 0.000 0.000
26 1.000 0.000 0.000 0.000
27 1.000 0.000 0.000 0.000
28 1.000 0.000 0.000 0.000
Vitamin D
Dead Vent/ARDS In Hospital/Facility Home
1 0.000 0.038 0.883 0.079
2 0.002 0.057 0.779 0.161
3 0.005 0.064 0.680 0.251
4 0.009 0.062 0.580 0.349
5 0.012 0.067 0.773 0.148
6 0.050 0.169 0.581 0.199
7 0.261 0.142 0.396 0.201
8 0.476 0.116 0.226 0.182
9 0.673 0.094 0.101 0.133
10 0.822 0.073 0.033 0.072
11 0.912 0.052 0.008 0.027
12 0.957 0.033 0.002 0.007
13 0.981 0.017 0.001 0.002
14 0.993 0.007 0.000 0.000
15 0.998 0.002 0.000 0.000
16 1.000 0.000 0.000 0.000
17 1.000 0.000 0.000 0.000
18 1.000 0.000 0.000 0.000
19 1.000 0.000 0.000 0.000
20 1.000 0.000 0.000 0.000
21 1.000 0.000 0.000 0.000
22 1.000 0.000 0.000 0.000
23 1.000 0.000 0.000 0.000
24 1.000 0.000 0.000 0.000
25 1.000 0.000 0.000 0.000
26 1.000 0.000 0.000 0.000
27 1.000 0.000 0.000 0.000
28 1.000 0.000 0.000 0.000
ggplot(R, aes(x=day, y=phome, color=treatment, linetype=type)) + geom_line() +
+ yl + labs(color='', linetype='') xl
This confirms that the Markov model likely handles uneven time spacings, but predictions should be limited to days appearing in the data. As before the problem is due to our attempt to make completely out-of-sample predictions on days not in the data when (1) both day
and gap
are important but (2) day
and gap
are collinear.
Better Modeling of Gap Effect in VIOLET 2
The completeness of VIOLET daily follow-up allows us to do another natural experiment where we create a full distribution of gaps by randomly sampling days within patients. For each patient randomly select a day t between 2 and L = last day of data - 6 days. Generate a random gap time r between 1 and L - 2. Remove days t, t+1, … t+r-1.
<- dcf
w $gap <- NULL w
setkey(w, id, day)
<- w[!is.na(status), ]
w # Recode baseline status like follow-up status
levels(w$bstatus) <- list('Vent/ARDS'='ARDS', 'Vent/ARDS'='On Vent',
'In Hospital/Facility'='In Hospital')
died=any(status=='Dead')), by=id][, table(died)] w[, .(
died
FALSE TRUE
1156 195
<- data.table(id=unique(w$id), key='id')
u <- nrow(u)
n set.seed(1)
:= sample(2 : 20, n, replace=TRUE)]
u[, start := sample(1 : 15, n, replace=TRUE)]
u[, width := start + width - 1]
u[, end < 27, table(start, width)] u[end
width
start 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15
2 4 8 3 1 5 5 6 3 5 6 3 7 6 5 2
3 6 4 4 7 6 5 4 3 6 2 6 6 13 3 2
4 5 6 6 3 9 3 3 5 4 3 6 8 4 2 2
5 6 4 5 4 4 6 5 5 7 4 1 3 8 3 3
6 3 6 3 0 4 2 3 9 7 7 8 8 9 2 4
7 1 3 5 2 3 6 7 5 6 7 5 4 7 3 4
8 2 9 4 8 3 3 3 5 6 1 2 5 6 9 5
9 3 3 4 5 8 3 5 4 8 5 7 3 7 4 5
10 10 6 5 4 4 6 2 4 5 7 5 5 4 7 8
11 4 7 4 5 6 7 10 6 6 9 3 3 4 2 1
12 5 2 6 2 4 3 8 1 3 6 4 8 4 6 4
13 6 3 6 4 6 4 2 4 3 8 2 7 7 2 0
14 3 7 7 7 4 7 3 4 5 5 5 4 4 0 0
15 8 4 8 8 7 2 7 4 4 5 4 4 0 0 0
16 5 3 3 5 6 2 2 6 4 5 2 0 0 0 0
17 3 9 4 6 7 1 5 4 2 10 0 0 0 0 0
18 1 4 7 5 4 6 5 2 5 0 0 0 0 0 0
19 3 6 4 2 0 3 4 3 0 0 0 0 0 0 0
20 6 4 6 4 3 4 9 0 0 0 0 0 0 0 0
<- w[u]
z <- z[end >= 27 | (day < start | day > end),]
w # Recreate previous state and gap with new gaps
:= ifelse(day == 1, as.character(bstatus), shift(as.character(status))), by=id]
w[, pstatus := ifelse(day == 1, 1L, day - shift(day)), by=id]
w[, gap # Don't carry death forward
:= cumsum(status == 'Dead'), by=id]
w[, nd <- w[nd < 2, ]
w sum(nd > 0)] # original number of deaths w[,
[1] 195
> 1, table(gap)] w[day
gap
1 2 3 4 5 6 7 8 9 10 11 12 13
22788 76 92 84 77 89 73 85 73 78 84 61 71
14 15 16
78 45 36
with(w, ggfreqScatter(day, gap))
<- datadist(w); options(datadist='dd')
dd <- lrm(status ~ pstatus + rcs(day, 7) + treatment + rcs(age, 4) + sofa + lips,
f0 data=w, tol=1e-13)
<- lrm(status ~ rcs(gap, 4) + pstatus + rcs(day, 4) + treatment + rcs(age, 4) + sofa + lips,
f1 data=w, tol=1e-13)
<- lrm(status ~ gap * pstatus + rcs(day, 4) + treatment + rcs(age, 4) + sofa + lips,
f2 data=w, tol=1e-13)
<- lrm(status ~ lsp(gap, 2) * pstatus + rcs(day, 4) + treatment + rcs(age, 4) + sofa + lips,
f3 data=w, tol=1e-13)
<- lrm(status ~ lsp(gap, c(2,4)) * pstatus + rcs(day, 4) + treatment + rcs(age, 4) + sofa + lips,
f4 data=w, tol=1e-13)
<- lrm(status ~ rcs(gap, 4) * pstatus + rcs(day, 4) + treatment + rcs(age, 4) + sofa + lips,
f5 data=w, tol=1e-13)
c(f0=AIC(f0), f1=AIC(f1), f2=AIC(f2), f3=AIC(f3), f4=AIC(f4), f5=AIC(f5))
f0 f1 f2 f3 f4 f5
12012.40 11652.23 11625.38 11597.44 11601.45 11600.24
ggplot(Predict(f1, gap, conf.int=FALSE))
ggplot(Predict(f5, gap, pstatus, conf.int=FALSE))
anova(f5)
Wald Statistics for status
|
|||
χ2 | d.f. | P | |
---|---|---|---|
gap (Factor+Higher Order Factors) | 504.72 | 9 | <0.0001 |
All Interactions | 129.33 | 6 | <0.0001 |
Nonlinear (Factor+Higher Order Factors) | 38.42 | 6 | <0.0001 |
pstatus (Factor+Higher Order Factors) | 5172.52 | 8 | <0.0001 |
All Interactions | 129.33 | 6 | <0.0001 |
day | 112.31 | 3 | <0.0001 |
Nonlinear | 19.27 | 2 | <0.0001 |
treatment | 0.40 | 1 | 0.5253 |
age | 47.59 | 3 | <0.0001 |
Nonlinear | 2.34 | 2 | 0.3098 |
sofa | 29.49 | 1 | <0.0001 |
lips | 7.27 | 1 | 0.0070 |
gap × pstatus (Factor+Higher Order Factors) | 129.33 | 6 | <0.0001 |
Nonlinear | 15.44 | 4 | 0.0039 |
Nonlinear Interaction : f(A,B) vs. AB | 15.44 | 4 | 0.0039 |
TOTAL NONLINEAR | 64.92 | 10 | <0.0001 |
TOTAL NONLINEAR + INTERACTION | 179.57 | 12 | <0.0001 |
TOTAL | 5683.23 | 20 | <0.0001 |
Before drawing conclusions let’s handle non-PO in time.
require(VGAM)
<- vglm(ordered(status) ~ pstatus + day + treatment + rcs(age, 4) + sofa + lips,
fpol cumulative(reverse=TRUE, parallel=TRUE), data=w)
<- vglm(ordered(status) ~ pstatus + rcs(day, 3) + treatment + rcs(age, 4) + sofa + lips,
fpo3 cumulative(reverse=TRUE, parallel=TRUE), data=w)
<- vglm(ordered(status) ~ pstatus + rcs(day, 5) + treatment + rcs(age, 4) + sofa + lips,
fpo5 cumulative(reverse=TRUE, parallel=TRUE), data=w)
<- vglm(ordered(status) ~ pstatus + day + treatment + rcs(age, 4) + sofa + lips,
f0l cumulative(reverse=TRUE, parallel=FALSE ~ day), data=w)
<- vglm(ordered(status) ~ pstatus + rcs(day, 3) + treatment + rcs(age, 4) + sofa + lips,
f03 cumulative(reverse=TRUE, parallel=FALSE ~ rcs(day, 3)), data=w)
<- vglm(ordered(status) ~ pstatus + rcs(day, 5) + treatment + rcs(age, 4) + sofa + lips,
f05 cumulative(reverse=TRUE, parallel=FALSE ~ rcs(day, 5)), data=w)
<- vglm(ordered(status) ~ pstatus + rcs(gap, 4) + rcs(day, 3) + treatment + rcs(age, 4) + sofa + lips,
f1 cumulative(reverse=TRUE, parallel=FALSE ~ rcs(day, 3)), data=w)
<- vglm(ordered(status) ~ gap * pstatus + rcs(day, 3) + treatment + rcs(age, 4) + sofa + lips,
f2 cumulative(reverse=TRUE, parallel=FALSE ~ rcs(day, 3)), data=w)
<- vglm(ordered(status) ~ rcs(gap, 3) * pstatus + rcs(day, 3) + treatment + rcs(age, 4) + sofa + lips,
f3 cumulative(reverse=TRUE, parallel=FALSE ~ rcs(day, 3)), data=w)
<- vglm(ordered(status) ~ rcs(gap, 4) * pstatus + rcs(day, 3) + treatment + rcs(age, 4) + sofa + lips,
f4 cumulative(reverse=TRUE, parallel=FALSE ~ rcs(day, 3)), data=w)
c(fpol=AIC(fpol), fpo3=AIC(fpo3), fpo5=AIC(fpo5), f0l=AIC(f0l), f03=AIC(f03), f05=AIC(f05), f1=AIC(f1), f2=AIC(f2), f3=AIC(f3), f4=AIC(f4))
fpol fpo3 fpo5 f0l f03 f05 f1 f2
12044.31 12023.26 12023.96 12039.35 11988.62 11992.68 11639.42 11612.38
f3 f4
11590.84 11592.26
Comparing AICs indicates the following:
fpol
vs.fpo3
: effect ofday
is nonlinear in a PO modelfpo3
vs.fpo5
: 3 knots are adequate forday
in PO modelfpo5
vs.f05
: moderate non-PO forday
f05
vs.f0l
: non-PO effect ofday
or main effect ofday
or both are nonlinearf05
vs.f03
: 3 knots works better forday
for combined main and non-PO effectsf1
vs.f03
: main effect is needed forgap
f2
vs.f1
: previous state interacts weekly withgap
f2
,f3
,f4
: previous state interaction withgap
is better represented by 3 knots
Odds Ratios for ORCHID
Now get treatment odds ratios for state occupancy probabilities, as a function of time, as opposed to odds of transitions over consecutive time periods.
<- function(x) x / (1. - x)
odds setDT(Rorchid, key=c('trt', 'day'))
<- Rorchid[trt == 'hydroxychloroquine', phome]
b <- Rorchid[trt == 'placebo', phome]
a <- Rorchid[trt == 'placebo', day]
days <- odds(b) / odds(a)
or plot(days, or, type='b', xlab='Day', ylab='hydroxy : placebo OR', ylim=c(1, 1.05))
abline(h=exp(coef(f)['trt=hydroxychloroquine']), col=gray(0.85))
text(27, 1.007,
'Horizontal line is conditional\ntransition OR from unified PO model', adj=1)
Bayesian Markov Model for ORCHID
We parallel the regular proportional odds model with a Bayesian Markov proportional hazards model.
stanSet()
<- blrm(y ~ gap * yprev + pol(day, 2) + trt + rcs(age, 3) +
bmarkov
sofa_nogcs,data=d, refresh=100, file='bmarkov.rds')
htmlVerbatim(stanDx(bmarkov))
Iterations: 2000 on each of 4 chains, with 4000 posterior distribution samples saved
For each parameter, n_eff is a crude measure of effective sample size and Rhat is the potential scale reduction factor on split chains (at convergence, Rhat=1)
n_eff Rhat y>=2 2662 1.001 y>=3 2219 1.001 y>=4 2691 1.000 y>=5 3632 1.001 y>=6 3459 1.001 y>=7 2280 1.001 gap 2625 1.000 yprev=3 3087 1.000 yprev=4 3970 1.000 yprev=5 3637 1.001 yprev=6 2769 1.001 yprev=7 2578 1.000 day 5192 1.000 day^2 5053 1.000 trt=hydroxychloroquine 5442 1.000 age 5505 1.000 age' 5513 1.000 sofa_nogcs 4865 1.000 gap * yprev=3 4791 1.000 gap * yprev=4 4223 1.000 gap * yprev=5 5252 0.999 gap * yprev=6 5020 0.999 gap * yprev=7 5482 0.999
stanDxplot(bmarkov)
bmarkov
Bayesian Proportional Odds Ordinal Logistic Model
Dirichlet Priors With Concentration Parameter 0.308 for Intercepts
blrm(formula = y ~ gap * yprev + pol(day, 2) + trt + rcs(age, 3) + sofa_nogcs, data = d, refresh = 100, file = "bmarkov.rds")
Frequencies of Responses
1 2 3 4 5 6 7 50 357 297 817 526 757 489
Mixed Calibration/ Discrimination Indexes |
Discrimination Indexes |
Rank Discrim. Indexes |
|
---|---|---|---|
Obs 3293 | B 0.084 [0.082, 0.085] | g 4.179 [4.02, 4.356] | C 0.908 [0.907, 0.908] |
Draws 4000 | gp 0.442 [0.437, 0.447] | Dxy 0.816 [0.814, 0.817] | |
Chains 4 | EV 0.625 [0.604, 0.644] | ||
p 17 | v 13.45 [12.436, 14.582] | ||
vp 0.155 [0.151, 0.161] |
Mode β | Mean β | Median β | S.E. | Lower | Upper | Pr(β>0) | Symmetry | |
---|---|---|---|---|---|---|---|---|
y≥2 | 2.4622 | 2.4679 | 2.4675 | 0.3170 | 1.8609 | 3.0937 | 1.0000 | 1.03 |
y≥3 | -1.0702 | -1.0803 | -1.0814 | 0.3078 | -1.6511 | -0.4478 | 0.0000 | 0.96 |
y≥4 | -2.9740 | -2.9829 | -2.9839 | 0.3222 | -3.6155 | -2.3538 | 0.0000 | 1.00 |
y≥5 | -6.0546 | -6.0639 | -6.0613 | 0.3389 | -6.7592 | -5.4398 | 0.0000 | 0.98 |
y≥6 | -7.8915 | -7.8991 | -7.8943 | 0.3454 | -8.5890 | -7.2479 | 0.0000 | 0.98 |
y≥7 | -10.7955 | -10.8085 | -10.7999 | 0.3591 | -11.5214 | -10.1165 | 0.0000 | 0.97 |
gap | -0.2174 | -0.2170 | -0.2165 | 0.0872 | -0.3858 | -0.0496 | 0.0058 | 0.99 |
yprev=3 | 2.2423 | 2.2524 | 2.2523 | 0.2273 | 1.8178 | 2.6953 | 1.0000 | 1.03 |
yprev=4 | 4.8495 | 4.8570 | 4.8511 | 0.2362 | 4.4060 | 5.3202 | 1.0000 | 1.01 |
yprev=5 | 7.2669 | 7.2735 | 7.2694 | 0.2573 | 6.7586 | 7.7535 | 1.0000 | 1.01 |
yprev=6 | 9.3374 | 9.3438 | 9.3395 | 0.2811 | 8.8091 | 9.9018 | 1.0000 | 1.01 |
yprev=7 | 9.3061 | 9.3066 | 9.3182 | 0.5846 | 8.1136 | 10.4645 | 1.0000 | 0.99 |
day | 0.2781 | 0.2788 | 0.2787 | 0.0400 | 0.2036 | 0.3577 | 1.0000 | 1.01 |
day2 | -0.0052 | -0.0052 | -0.0052 | 0.0008 | -0.0068 | -0.0038 | 0.0000 | 1.00 |
trt=hydroxychloroquine | 0.0113 | 0.0126 | 0.0116 | 0.0694 | -0.1308 | 0.1394 | 0.5685 | 0.98 |
age | -0.0092 | -0.0092 | -0.0093 | 0.0050 | -0.0185 | 0.0007 | 0.0330 | 1.01 |
age’ | -0.0031 | -0.0031 | -0.0031 | 0.0058 | -0.0147 | 0.0078 | 0.2962 | 1.02 |
sofa_nogcs | -0.0901 | -0.0900 | -0.0903 | 0.0204 | -0.1294 | -0.0495 | 0.0000 | 1.02 |
gap × yprev=3 | 0.2080 | 0.2063 | 0.2079 | 0.0618 | 0.0791 | 0.3202 | 0.9998 | 0.96 |
gap × yprev=4 | 0.3269 | 0.3258 | 0.3255 | 0.0390 | 0.2493 | 0.3995 | 1.0000 | 0.98 |
gap × yprev=5 | 0.2403 | 0.2396 | 0.2397 | 0.0394 | 0.1638 | 0.3170 | 1.0000 | 1.00 |
gap × yprev=6 | 0.0943 | 0.0940 | 0.0942 | 0.0329 | 0.0327 | 0.1580 | 0.9975 | 0.99 |
gap × yprev=7 | 0.2276 | 0.2288 | 0.2284 | 0.0570 | 0.1243 | 0.3456 | 1.0000 | 1.02 |
We need to be able to compute posterior distributions on state occupancy probabilities to take into account all sources of uncertainty. The follow function computes all needed derived quantities, separately for each posterior draw.
# Compute unconditional probabilities
# Output is an array of dimension number of draws by s rows corresponding to
# t=times(1), times(2), ...
# times(s), and final dimension is the number of Y categories
# times(1) must be 1
#
# fit is the Bayesian model fit from blrm which must contain the variable named day and the
# previous state, which must be named by the value of pvarname
# data is a data frame or data table that contains covariate settings
# other than day and what's in pvarname. data has one row.
# times is the vector of days for which to evaluate state occupancy probabilities
# ylevels are the ordered levels of the outcome variable
# Set gap=TRUE to compute time gaps when predictor gap is in the model
<- function(fit, data, times, ylevels, pvarname='yprev', gap=FALSE) {
uprobB if(! length(fit$draws)) stop('fit does not have posterior draws')
<- nrow(fit$draws)
nd <- length(ylevels)
k <- length(times)
s <- array(NA, c(nd, s, k),
P dimnames=list(paste('draw', 1 : nd), as.character(times),
as.character(ylevels)))
# Never uncondition on initial state
$day <- times[1]
dataif(gap) data$gap <- times[1]
<- predict(fit, data, type='fitted.ind', posterior.summary='all')
p 1, ] <- p
P[, # cp: matrix of conditional probabilities of Y conditioning on previous time Y
# Columns = k conditional probabilities conditional on a single previous state
# Rows = all possible previous states
# This is for a single posterior draw
<- matrix(NA, nrow=k, ncol=k,
cp dimnames=list(paste('t-1', ylevels), paste('t', ylevels)))
<- as.list(data)
data <- as.character(ylevels[-1])
data[[pvarname]] # don't request estimates after absorbing state
<- expand.grid(data)
edata for(it in 2 : s) {
$day <- times[it]
edataif(gap) edata$gap <- times[it] - times[it - 1]
# y=first level = absorbing state
<- predict(fit, edata, type='fitted.ind', posterior.summary='all')
pp for(idraw in 1 : nd) {
<- rbind(c(1., rep(0., k - 1)), pp[idraw, ,])
cp # Compute unconditional probabilities of being in all possible states
# at current time t
<- t(cp) %*% P[idraw, it - 1, ]
P[idraw, it, ]
}
}
P
}
<- c(1,2,3,4,7,14,28)
times <- 1 : 7
ylev <- levels(d$trt)
tlev <- data.frame(yprev='4', age=repcov$age, sofa_nogcs=repcov$sofa_nogcs)
dat <- list()
R for(trt in tlev) { # separately for treatments
$trt <- trt
dat<- uprobB(bmarkov, dat, times, ylev, pvarname='yprev', gap=TRUE)
u <- u
R[[trt]]
}
# Compute summary measure per treatment, separately for each time and posterior draw
# P(home) = P(Y=6 or Y=7)
<- function(u) u[, 6] + u[, 7]
s # w <- apply(R$placebo, 1, s) # to test
<- lapply(R, function(x) apply(x, 1, s)) r
For each treatment and time compute the posterior mean P(home) and 0.95 highest posterior density intervals.
<- function(x) {
g <- c(mean(x), HPDint(x))
w names(w) <- c('mean', 'lower', 'upper')
w
}
<- lapply(r, function(x) as.data.frame(t(apply(x, 1, g))))
rs for(trt in tlev) rs[[trt]]$trt <- trt
<- do.call(rbind, rs)
rs # Fetch day from end of row names (follows .)
$day <- as.numeric(sub('^.*\\.(.*?)', '\\2', rownames(rs), ))
rs<- xlab('Day'); yl <- ylab('P(being at home)')
xl $txt <- with(rs, paste0(trt, '<br>Day:', day, '<br>', round(mean, 3)))
rs<- ggplot(rs, aes(x=day, y=mean, color=trt, label=txt)) + geom_line() +
gg geom_ribbon(aes(ymin = lower, ymax = upper), alpha=0.2, linetype=0) +
+ yl + labs(color='')
xl ggp(gg)
Instead of individual per-treatment estimates summarize the difference and odds ratio for state occupancy probabilities.
<- r$hydroxychloroquine - r$placebo
Dif <- as.data.frame(t(apply(Dif, 1, g)))
dif $day <- as.numeric(sub('^.*\\.(.*?)', '\\2', rownames(dif), ))
dif$txt <- with(dif, paste0('Day:', day, '<br>', round(mean, 3)))
dif<- ggplot(dif, aes(x=day, y=mean, label=txt)) + geom_line() +
gg geom_ribbon(aes(ymin=lower, ymax=upper), alpha=0.2, linetype=0) +
+ ylab('Hydroxy - Placebo Difference in P(home)')
xl ggp(gg)
<- odds(r$hydroxychloroquine) / odds(r$placebo)
or <- as.data.frame(t(apply(or, 1, g)))
or $day <- as.numeric(sub('^.*\\.(.*?)', '\\2', rownames(or), ))
or$txt <- with(or, paste0('Day:', day, '<br>', round(mean, 3)))
or<- ggplot(or, aes(x=day, y=mean, label=txt)) + geom_line() +
gg geom_ribbon(aes(ymin=lower, ymax=upper), alpha=0.2, linetype=0) +
+ ylab('Hydroxy - Placebo OR for Being at Home')
xl ggp(gg)
Evidence for Efficacy Over Time for One Covariate Combination
Now compute the Bayesian evidence measure for a positive difference in P(home). Note that in the two scatterplots relating posterior draws for differences in occupancy probabilities (P(home)), plotting one day vs. another, no points are on the upper left or lower right quadrants, so the evidence for a positive treatment effect does not vary over the days of follow-up.
plot(Dif[1, ], Dif[5, ], xlab='Posterior Draws for Day 1', ylab='Day 7', cex=.3)
abline(h=0, v=0, col=gray(0.8))
plot(Dif[3, ], Dif[6, ], xlab='Posterior Draws for Day 3', ylab='Day 14', cex=.3)
abline(h=0, v=0, col=gray(0.8))
<- apply(Dif, 1, function(x) mean(x > 0))
pp <- data.frame(day = as.integer(names(pp)), pp)
w $txt <- with(w, paste0('Day:', day, '<br>', round(pp, 3)))
w<- ggplot(w, aes(x=day, y=pp, label=txt)) + geom_line() +
gg + ylab('P(increase in P(home) with hydroxychloroquine)')
xl ggp(gg)
The evidence does vary when computing the posterior probability that treatment has a benefit exceeding an absolute risk difference of 0.02. This is primarily due to the near impossibility of being discharged on the first or second day.
<- apply(Dif, 1, function(x) mean(x > 0.02))
pp <- data.frame(day = as.integer(names(pp)), pp)
w $txt <- with(w, paste0('Day:', day, '<br>', round(pp, 3)))
w<- ggplot(w, aes(x=day, y=pp, label=txt)) + geom_line() +
gg + ylab('P(increase in P(home) with hydroxychloroquine > 0.02)')
xl ggp(gg)
Posterior Distributions of Median and Restricted Mean Time to Home
Even though the data collection times in ORCHID were not granular enough to compute median time until the patient returned home, we compute this anyway, and compute per-treatment posterior means and discrete posterior probability distributions for these medians. For a given posterior sample, the median is the first day for which P(home) >= 0.5 for a treatment. We arbitrary use 30d as the median when the probability never exceeds 0.5. Recall that all these calculations are conditional on the patient being in state 4 (hospitalized on supplemental oxygen) pre-randomization.
<- c(1,2,3,4,7,14,28)
days <- function(p) if(all(p < 0.5)) 30 else min(days[p >= 0.5])
g <- lapply(r, function(x) data.frame(med = apply(x, 2, g)))
rmed for(trt in tlev) rmed[[trt]]$trt <- trt
<- do.call(rbind, rmed)
rmed with(rmed, tapply(med, trt, mean)) # posterior mean for median
hydroxychloroquine placebo
14 14
with(rmed, table(med, trt)) # discrete distribution for median
trt
med hydroxychloroquine placebo
14 4000 4000
The sharpness of the posterior distribution of the median time to home is just a reflection of what an easy question we are asking: what is the median of time to home after essentially rounding it to the nearest 7d.
Follow-up was not long enough to compute the mean time until returning home, but we can do inference for the mean restricted time until home (restricted to 28d) assuming that home is an absorbing state (which is approximately but not exactly true). Let \(t_{i}, i=1, 2, \dots, 7\) represent the assessment times (days 1, 2, 3, 4, 7, 14, 28) for time until home \(T\). Under the \(T \leq 28\) restriction, the mean restricted time to home is \(\sum_{i=1}^{7} t_{i} \Pr(T = t_{i}) = \sum_{i=1}^{7} t_{i} [\Pr(T \leq t_{i}) - [i > 1] \Pr(T \leq t_{i-1})]\).
<- function(p) sum(days * (p - c(0., p[-length(p)])))
g <- lapply(r, function(x) data.frame(mean = apply(x, 2, g)))
rmean for(trt in tlev) rmean[[trt]]$trt <- trt
<- do.call(rbind, rmean)
rmean with(rmean, tapply(mean, trt, mean)) # posterior means for mean restricted time to home
hydroxychloroquine placebo
10.55112 10.57485
ggplot(rmean, aes(x=mean, color=trt)) + geom_density() +
labs(color='') + xlab('Restricted Mean Time to Home (days)') + ylab('')
Compute the posterior probability that the mean restricted time to home is shorter for hydroxy.
<- rmean[grep('placebo', rownames(rmean)), 'mean']
a <- rmean[grep('hydrox', rownames(rmean)), 'mean']
b plot(density(b - a), main='', xlab='Difference in Mean Restricted Time to Home', ylab='')
abline(v=0, col=gray(0.85))
mean(b < a)
[1] 0.56825
Markov Model with Random Effects
Let’s see what difference it makes to add patient-level random effects to the Markov state transition model.
<- blrm(y ~ gap * yprev + pol(day, 2) + trt + rcs(age, 3) +
bmarkovre + cluster(id),
sofa_nogcs data=d, refresh=100, file='bmarkovre.rds')
htmlVerbatim(stanDx(bmarkovre))
Iterations: 2000 on each of 4 chains, with 4000 posterior distribution samples saved
For each parameter, n_eff is a crude measure of effective sample size and Rhat is the potential scale reduction factor on split chains (at convergence, Rhat=1)
n_eff Rhat y>=2 2058 1.000 y>=3 1266 1.002 y>=4 1485 1.001 y>=5 2911 1.000 y>=6 2458 1.000 y>=7 1585 1.001 gap 1950 1.001 yprev=3 1880 1.000 yprev=4 3299 1.000 yprev=5 2750 1.001 yprev=6 1627 1.002 yprev=7 2037 1.000 day 5880 0.999 day^2 5076 1.001 trt=hydroxychloroquine 5040 0.999 age 4770 1.001 age' 5632 0.999 sofa_nogcs 5487 0.999 gap * yprev=3 4795 1.000 gap * yprev=4 5105 0.999 gap * yprev=5 4534 0.999 gap * yprev=6 5255 1.000 gap * yprev=7 5623 0.999 sigmag 1384 1.000
stanDxplot(bmarkovre)
bmarkovre
Bayesian Proportional Odds Ordinal Logistic Model
Dirichlet Priors With Concentration Parameter 0.308 for Intercepts
blrm(formula = y ~ gap * yprev + pol(day, 2) + trt + rcs(age, 3) + sofa_nogcs + cluster(id), data = d, refresh = 100, file = "bmarkovre.rds")
Frequencies of Responses
1 2 3 4 5 6 7 50 357 297 817 526 757 489
Mixed Calibration/ Discrimination Indexes |
Discrimination Indexes |
Rank Discrim. Indexes |
|
---|---|---|---|
Obs 3293 | B 0.084 [0.082, 0.085] | g 4.184 [4.003, 4.327] | C 0.908 [0.907, 0.908] |
Draws 4000 | gp 0.442 [0.437, 0.447] | Dxy 0.816 [0.814, 0.817] | |
Chains 4 | EV 0.626 [0.604, 0.644] | ||
p 17 | v 13.469 [12.35, 14.435] | ||
Cluster on id
|
vp 0.156 [0.15, 0.16] | ||
Clusters 479 | |||
σγ 0.0532 [1e-04, 0.1583] |
Mean β | Median β | S.E. | Lower | Upper | Pr(β>0) | Symmetry | |
---|---|---|---|---|---|---|---|
y≥2 | 2.4855 | 2.4782 | 0.3151 | 1.8645 | 3.0789 | 1.0000 | 1.00 |
y≥3 | -1.0601 | -1.0671 | 0.3088 | -1.6477 | -0.4493 | 0.0003 | 1.01 |
y≥4 | -2.9656 | -2.9664 | 0.3227 | -3.5882 | -2.3428 | 0.0000 | 1.03 |
y≥5 | -6.0504 | -6.0497 | 0.3357 | -6.7091 | -5.3733 | 0.0000 | 1.06 |
y≥6 | -7.8895 | -7.8913 | 0.3462 | -8.6008 | -7.2125 | 0.0000 | 1.03 |
y≥7 | -10.8014 | -10.8012 | 0.3590 | -11.4987 | -10.0604 | 0.0000 | 1.04 |
gap | -0.2173 | -0.2159 | 0.0838 | -0.3727 | -0.0480 | 0.0048 | 1.03 |
yprev=3 | 2.2465 | 2.2485 | 0.2255 | 1.8144 | 2.6793 | 1.0000 | 0.99 |
yprev=4 | 4.8489 | 4.8438 | 0.2330 | 4.3954 | 5.3076 | 1.0000 | 1.02 |
yprev=5 | 7.2664 | 7.2617 | 0.2616 | 6.7623 | 7.7755 | 1.0000 | 1.01 |
yprev=6 | 9.3376 | 9.3361 | 0.2844 | 8.7519 | 9.8522 | 1.0000 | 0.98 |
yprev=7 | 9.2789 | 9.2782 | 0.5826 | 8.0628 | 10.3473 | 1.0000 | 1.02 |
day | 0.2793 | 0.2797 | 0.0387 | 0.2083 | 0.3589 | 1.0000 | 1.04 |
day2 | -0.0052 | -0.0052 | 0.0007 | -0.0067 | -0.0038 | 0.0000 | 1.00 |
trt=hydroxychloroquine | 0.0118 | 0.0111 | 0.0684 | -0.1196 | 0.1475 | 0.5662 | 1.01 |
age | -0.0093 | -0.0092 | 0.0049 | -0.0183 | 0.0011 | 0.0310 | 1.00 |
age’ | -0.0031 | -0.0031 | 0.0057 | -0.0150 | 0.0076 | 0.2855 | 1.00 |
sofa_nogcs | -0.0913 | -0.0909 | 0.0207 | -0.1296 | -0.0490 | 0.0000 | 0.93 |
gap × yprev=3 | 0.2067 | 0.2060 | 0.0608 | 0.0911 | 0.3231 | 0.9995 | 1.01 |
gap × yprev=4 | 0.3279 | 0.3274 | 0.0386 | 0.2517 | 0.4014 | 1.0000 | 1.01 |
gap × yprev=5 | 0.2417 | 0.2421 | 0.0397 | 0.1665 | 0.3205 | 1.0000 | 0.98 |
gap × yprev=6 | 0.0954 | 0.0950 | 0.0338 | 0.0319 | 0.1614 | 0.9972 | 1.03 |
gap × yprev=7 | 0.2321 | 0.2324 | 0.0575 | 0.1205 | 0.3480 | 1.0000 | 1.03 |
The very low standard deviation of the random effects (0.053) lends credence to the idea that the state transitions are conditionally independent and that random effects are not needed. Also, the posterior mean and median regression parameter values and standard deviations of their posterior distributions changed very little.
Serial Transition Model With Non-Proportional Odds
Return to the Markov model without random effects. Let’s use a partial proportional odds model in which we add departures from the proportional odds assumption with regard to two variables: previous state and time (but not their interaction). We are using the constrained partial proportional odds model in which the constraint imposes linearity in the outcome category (but not for the effect of the previous outcome level).
<- blrm(y ~ gap * yprev + pol(day, 2) + trt + rcs(age, 3) +
bmarkovppo ~ day + yprev, cppo=function(y) y,
sofa_nogcs, data=d, refresh=100, file='bmarkovppo.rds')
htmlVerbatim(stanDx(bmarkovppo))
Iterations: 2000 on each of 4 chains, with 4000 posterior distribution samples saved
For each parameter, n_eff is a crude measure of effective sample size and Rhat is the potential scale reduction factor on split chains (at convergence, Rhat=1)
n_eff Rhat y>=2 524 1.009 y>=3 480 1.008 y>=4 441 1.009 y>=5 589 1.005 y>=6 795 1.001 y>=7 812 1.003 gap 487 1.010 yprev=3 978 1.004 yprev=4 595 1.006 yprev=5 761 1.004 yprev=6 902 1.005 yprev=7 631 1.009 day 2249 1.000 day^2 1787 1.001 trt=hydroxychloroquine 2581 1.000 age 2304 1.002 age' 2419 1.000 sofa_nogcs 1996 1.001 gap * yprev=3 1883 1.003 gap * yprev=4 2312 0.999 gap * yprev=5 2078 1.000 gap * yprev=6 1248 1.001 gap * yprev=7 1702 1.001 day x f(y) 485 1.008 yprev=3 x f(y) 886 1.002 yprev=4 x f(y) 519 1.008 yprev=5 x f(y) 1056 1.003 yprev=6 x f(y) 852 1.005 yprev=7 x f(y) 549 1.008
stanDxplot(bmarkovppo)
bmarkovppo
Bayesian Constrained Partial Proportional Odds Ordinal Logistic Model
Dirichlet Priors With Concentration Parameter 0.308 for Intercepts
blrm(formula = y ~ gap * yprev + pol(day, 2) + trt + rcs(age, 3) + sofa_nogcs, ppo = ~day + yprev, cppo = function(y) y, data = d, refresh = 100, file = "bmarkovppo.rds")
Frequencies of Responses
1 2 3 4 5 6 7 50 357 297 817 526 757 489
Mixed Calibration/ Discrimination Indexes |
Discrimination Indexes |
Rank Discrim. Indexes |
|
---|---|---|---|
Obs 3293 | B 0.081 [0.08, 0.082] | g 4.556 [4.243, 4.85] | C 0.886 [0.88, 0.894] |
Draws 4000 | gp 0.447 [0.441, 0.452] | Dxy 0.773 [0.76, 0.788] | |
Chains 4 | EV 0.645 [0.624, 0.667] | ||
p 17 | v 15.982 [13.802, 18.078] | ||
vp 0.16 [0.155, 0.165] |
Mode β | Mean β | Median β | S.E. | Lower | Upper | Pr(β>0) | Symmetry | |
---|---|---|---|---|---|---|---|---|
y≥2 | 2.8421 | 2.8598 | 2.8643 | 0.3338 | 2.2264 | 3.5288 | 1.0000 | 1.00 |
y≥3 | -1.4108 | -1.4103 | -1.4075 | 0.3250 | -2.0104 | -0.7244 | 0.0000 | 1.04 |
y≥4 | -3.5968 | -3.6227 | -3.6207 | 0.3719 | -4.3768 | -2.8930 | 0.0000 | 0.98 |
y≥5 | -7.0605 | -7.1393 | -7.1315 | 0.4372 | -8.0427 | -6.3284 | 0.0000 | 0.95 |
y≥6 | -9.2730 | -9.4218 | -9.4100 | 0.5459 | -10.5459 | -8.4240 | 0.0000 | 0.88 |
y≥7 | -12.3128 | -12.5330 | -12.4950 | 0.6901 | -13.9470 | -11.2632 | 0.0000 | 0.86 |
gap | 0.0851 | 0.0841 | 0.0851 | 0.0888 | -0.0953 | 0.2499 | 0.8238 | 1.00 |
yprev=3 | 2.8052 | 2.8386 | 2.8390 | 0.3517 | 2.1649 | 3.5282 | 1.0000 | 1.04 |
yprev=4 | 5.8843 | 5.9514 | 5.9393 | 0.3274 | 5.3706 | 6.6288 | 1.0000 | 1.08 |
yprev=5 | 9.0448 | 9.1057 | 9.0961 | 0.3553 | 8.3830 | 9.7680 | 1.0000 | 1.06 |
yprev=6 | 13.6666 | 13.6957 | 13.6902 | 0.5387 | 12.7090 | 14.8397 | 1.0000 | 1.06 |
yprev=7 | 12.0749 | 12.3731 | 12.3376 | 1.0384 | 10.3061 | 14.4209 | 1.0000 | 1.08 |
day | 0.3580 | 0.3579 | 0.3578 | 0.0420 | 0.2780 | 0.4427 | 1.0000 | 1.01 |
day2 | -0.0080 | -0.0080 | -0.0080 | 0.0007 | -0.0094 | -0.0066 | 0.0000 | 1.00 |
trt=hydroxychloroquine | 0.0171 | 0.0142 | 0.0139 | 0.0710 | -0.1138 | 0.1567 | 0.5768 | 1.00 |
age | -0.0108 | -0.0109 | -0.0109 | 0.0053 | -0.0217 | -0.0014 | 0.0192 | 0.99 |
age’ | -0.0022 | -0.0021 | -0.0020 | 0.0061 | -0.0141 | 0.0095 | 0.3665 | 1.04 |
sofa_nogcs | -0.0980 | -0.0981 | -0.0975 | 0.0203 | -0.1376 | -0.0598 | 0.0000 | 0.98 |
gap × yprev=3 | -0.0861 | -0.0817 | -0.0823 | 0.0499 | -0.1778 | 0.0124 | 0.0472 | 1.00 |
gap × yprev=4 | -0.1911 | -0.1875 | -0.1887 | 0.0377 | -0.2616 | -0.1127 | 0.0000 | 1.06 |
gap × yprev=5 | -0.2893 | -0.2861 | -0.2866 | 0.0471 | -0.3745 | -0.1887 | 0.0000 | 1.03 |
gap × yprev=6 | -0.4627 | -0.4601 | -0.4604 | 0.0441 | -0.5441 | -0.3748 | 0.0000 | 1.01 |
gap × yprev=7 | -0.3354 | -0.3330 | -0.3340 | 0.0635 | -0.4538 | -0.2100 | 0.0000 | 1.07 |
day x f(y) | 0.1202 | 0.1197 | 0.1198 | 0.0068 | 0.1074 | 0.1338 | 1.0000 | 1.02 |
yprev=3 x f(y) | -0.1045 | -0.0802 | -0.0785 | 0.2495 | -0.5887 | 0.3980 | 0.3650 | 1.03 |
yprev=4 x f(y) | 0.1093 | 0.2008 | 0.1768 | 0.1969 | -0.1366 | 0.6030 | 0.8655 | 1.50 |
yprev=5 x f(y) | -0.9969 | -0.8857 | -0.8974 | 0.2864 | -1.4536 | -0.3492 | 0.0035 | 1.12 |
yprev=6 x f(y) | -2.7648 | -2.6253 | -2.6180 | 0.3922 | -3.4177 | -1.8523 | 0.0000 | 0.94 |
yprev=7 x f(y) | -1.6179 | -1.6577 | -1.6263 | 0.6149 | -2.9022 | -0.5181 | 0.0018 | 0.87 |
# htmlVerbatim(compareBmods(bmarkovppo, bmarkov))
# Note: bmarkovppo fit with loo=TRUE was not saved because it was 171MB for some reason
# Output:
# Method: stacking
# ------
# weight
# model1 0.957
# model2 0.043
#
# Compute posterior means divided by SE
round(coef(bmarkovppo) / sqrt(diag(vcov(bmarkovppo))), 2)
y>=2 y>=3 y>=4
8.57 -4.34 -9.74
y>=5 y>=6 y>=7
-16.33 -17.26 -18.16
gap yprev=3 yprev=4
0.95 8.07 18.18
yprev=5 yprev=6 yprev=7
25.63 25.42 11.92
day day^2 trt=hydroxychloroquine
8.53 -10.99 0.20
age age' sofa_nogcs
-2.07 -0.34 -4.84
gap * yprev=3 gap * yprev=4 gap * yprev=5
-1.64 -4.98 -6.07
gap * yprev=6 gap * yprev=7 day x f(y)
-10.44 -5.24 17.72
yprev=3 x f(y) yprev=4 x f(y) yprev=5 x f(y)
-0.32 1.02 -3.09
yprev=6 x f(y) yprev=7 x f(y)
-6.69 -2.70
There were some minor sampling/convergence problems especially with respect to the parameters for departures from proportional odds.
The model comparison results listed in the code above puts a great likelihood on the partial proportional odds model fitting better than the proportional odds model. The result of compareBmods
, not saved in the code because the loo=TRUE
option makes the fit object much bigger, is here (but from a slightly different fit):
Method: stacking ------ weight model1 0.957 model2 0.043
Based on posterior mean parameter values and standard deviations of posterior distributions (standard errors), the most important departures from PO seem to be time-related and the effect of previously being in state 6.
Final Model With Non-PO With Respect to Time
Our final model will have as non-proportional odds only the effect of time, but we will now allow the time effect to be nonlinear (quadratic). As before the cppo
function assumes that the PO departure is proportional to the numeric 1-7 outcome codes. A fully unconstrained partial proportional odds model would have too many parameters.
<- blrm(y ~ gap * yprev + pol(day, 2) + trt + rcs(age, 3) +
bmarkovppof ~ pol(day, 2), cppo=function(y) y,
sofa_nogcs, data=d, refresh=100, file='bmarkovppof.rds')
htmlVerbatim(stanDx(bmarkovppof))
Iterations: 2000 on each of 4 chains, with 4000 posterior distribution samples saved
For each parameter, n_eff is a crude measure of effective sample size and Rhat is the potential scale reduction factor on split chains (at convergence, Rhat=1)
n_eff Rhat y>=2 3155 1.001 y>=3 2772 1.001 y>=4 3497 1.000 y>=5 4565 1.000 y>=6 3112 1.000 y>=7 2744 1.000 gap 3701 1.000 yprev=3 3236 1.001 yprev=4 4312 1.000 yprev=5 3439 1.001 yprev=6 2914 1.001 yprev=7 3443 1.000 day 5998 1.000 day^2 4539 1.000 trt=hydroxychloroquine 7707 0.999 age 6236 1.000 age' 6518 1.000 sofa_nogcs 5643 1.000 gap * yprev=3 5416 0.999 gap * yprev=4 7418 1.000 gap * yprev=5 6227 0.999 gap * yprev=6 4029 1.000 gap * yprev=7 4523 1.000 day x f(y) 2652 1.000 day^2 x f(y) 4295 1.001
stanDxplot(bmarkovppof)
bmarkovppof
Bayesian Constrained Partial Proportional Odds Ordinal Logistic Model
Dirichlet Priors With Concentration Parameter 0.308 for Intercepts
blrm(formula = y ~ gap * yprev + pol(day, 2) + trt + rcs(age, 3) + sofa_nogcs, ppo = ~pol(day, 2), cppo = function(y) y, data = d, refresh = 100, file = "bmarkovppof.rds")
Frequencies of Responses
1 2 3 4 5 6 7 50 357 297 817 526 757 489
Mixed Calibration/ Discrimination Indexes |
Discrimination Indexes |
Rank Discrim. Indexes |
|
---|---|---|---|
Obs 3293 | B 0.081 [0.08, 0.082] | g 3.805 [3.639, 3.941] | C 0.884 [0.877, 0.892] |
Draws 4000 | gp 0.438 [0.433, 0.443] | Dxy 0.768 [0.754, 0.784] | |
Chains 4 | EV 0.615 [0.596, 0.635] | ||
p 17 | v 11.362 [10.498, 12.293] | ||
vp 0.153 [0.149, 0.159] |
Mode β | Mean β | Median β | S.E. | Lower | Upper | Pr(β>0) | Symmetry | |
---|---|---|---|---|---|---|---|---|
y≥2 | 4.4378 | 4.4540 | 4.4456 | 0.3640 | 3.7344 | 5.1346 | 1.0000 | 0.94 |
y≥3 | -0.4172 | -0.4190 | -0.4177 | 0.3262 | -1.1102 | 0.1631 | 0.0980 | 0.95 |
y≥4 | -3.2193 | -3.2292 | -3.2277 | 0.3334 | -3.8614 | -2.5610 | 0.0000 | 0.97 |
y≥5 | -7.1431 | -7.1566 | -7.1519 | 0.3534 | -7.8163 | -6.4290 | 0.0000 | 0.96 |
y≥6 | -10.1971 | -10.2173 | -10.2167 | 0.3874 | -10.9379 | -9.4161 | 0.0000 | 0.98 |
y≥7 | -15.0652 | -15.0933 | -15.0933 | 0.4446 | -15.9659 | -14.2205 | 0.0000 | 0.99 |
gap | -0.0655 | -0.0663 | -0.0672 | 0.0887 | -0.2413 | 0.1038 | 0.2285 | 0.98 |
yprev=3 | 2.7151 | 2.7239 | 2.7242 | 0.2236 | 2.2572 | 3.1256 | 1.0000 | 1.00 |
yprev=4 | 5.8959 | 5.9090 | 5.9062 | 0.2396 | 5.4725 | 6.3953 | 1.0000 | 0.99 |
yprev=5 | 8.9304 | 8.9504 | 8.9518 | 0.2800 | 8.4281 | 9.5012 | 1.0000 | 1.05 |
yprev=6 | 11.2825 | 11.3124 | 11.3110 | 0.3179 | 10.6828 | 11.9249 | 1.0000 | 1.06 |
yprev=7 | 9.9164 | 9.9414 | 9.9370 | 0.6189 | 8.7525 | 11.1980 | 1.0000 | 0.94 |
day | 0.3923 | 0.3917 | 0.3912 | 0.0425 | 0.3121 | 0.4794 | 1.0000 | 1.07 |
day2 | -0.0053 | -0.0053 | -0.0053 | 0.0007 | -0.0067 | -0.0038 | 0.0000 | 0.97 |
trt=hydroxychloroquine | 0.0257 | 0.0264 | 0.0260 | 0.0716 | -0.1048 | 0.1767 | 0.6408 | 1.03 |
age | -0.0105 | -0.0105 | -0.0104 | 0.0052 | -0.0214 | -0.0011 | 0.0205 | 0.96 |
age’ | -0.0027 | -0.0027 | -0.0027 | 0.0059 | -0.0136 | 0.0097 | 0.3157 | 1.01 |
sofa_nogcs | -0.1082 | -0.1084 | -0.1085 | 0.0201 | -0.1475 | -0.0689 | 0.0000 | 1.06 |
gap × yprev=3 | -0.0726 | -0.0704 | -0.0707 | 0.0460 | -0.1642 | 0.0143 | 0.0640 | 1.04 |
gap × yprev=4 | -0.2130 | -0.2101 | -0.2110 | 0.0366 | -0.2816 | -0.1406 | 0.0000 | 1.08 |
gap × yprev=5 | -0.3967 | -0.3938 | -0.3936 | 0.0411 | -0.4736 | -0.3125 | 0.0000 | 1.01 |
gap × yprev=6 | -0.5680 | -0.5657 | -0.5660 | 0.0383 | -0.6388 | -0.4888 | 0.0000 | 1.08 |
gap × yprev=7 | -0.3339 | -0.3299 | -0.3308 | 0.0601 | -0.4405 | -0.2046 | 0.0000 | 1.05 |
day x f(y) | 0.3454 | 0.3457 | 0.3457 | 0.0199 | 0.3057 | 0.3837 | 1.0000 | 0.99 |
day2 x f(y) | -0.0066 | -0.0066 | -0.0066 | 0.0006 | -0.0078 | -0.0054 | 0.0000 | 1.05 |
There is strong evidence that the departure from PO with respect to day of follow-up is nonlinear.
Final Check
Let’s fit a Bayesian transition model without covariates (other than treatment and initial state) to once again check against empirical proportions of being at home.
<- blrm(y ~ gap * yprev + pol(day, 2) + trt, ~ pol(day, 2),
bmarkovppon cppo=function(y) y,
data=d, refresh=100, file='bmarkovppon.rds')
htmlVerbatim(stanDx(bmarkovppon))
Iterations: 2000 on each of 4 chains, with 4000 posterior distribution samples saved
For each parameter, n_eff is a crude measure of effective sample size and Rhat is the potential scale reduction factor on split chains (at convergence, Rhat=1)
n_eff Rhat y>=2 3039 1.000 y>=3 2804 1.001 y>=4 3198 1.001 y>=5 4132 1.000 y>=6 3108 1.000 y>=7 2452 1.000 gap 3223 1.002 yprev=3 2929 1.000 yprev=4 3561 1.000 yprev=5 3773 1.000 yprev=6 2980 0.999 yprev=7 3157 1.000 day 5230 1.000 day^2 4511 0.999 trt=hydroxychloroquine 5522 1.000 gap * yprev=3 4918 0.999 gap * yprev=4 5557 1.000 gap * yprev=5 4704 1.000 gap * yprev=6 3538 1.000 gap * yprev=7 3973 1.001 day x f(y) 2504 1.000 day^2 x f(y) 4184 0.999
bmarkovppon
Bayesian Constrained Partial Proportional Odds Ordinal Logistic Model
Dirichlet Priors With Concentration Parameter 0.308 for Intercepts
blrm(formula = y ~ gap * yprev + pol(day, 2) + trt, ppo = ~pol(day, 2), cppo = function(y) y, data = d, refresh = 100, file = "bmarkovppon.rds")
Frequencies of Responses
1 2 3 4 5 6 7 50 357 297 817 526 757 489
Mixed Calibration/ Discrimination Indexes |
Discrimination Indexes |
Rank Discrim. Indexes |
|
---|---|---|---|
Obs 3293 | B 0.083 [0.082, 0.084] | g 3.723 [3.602, 3.865] | C 0.884 [0.875, 0.895] |
Draws 4000 | gp 0.436 [0.431, 0.442] | Dxy 0.768 [0.751, 0.79] | |
Chains 4 | EV 0.611 [0.59, 0.632] | ||
p 14 | v 10.918 [10.193, 11.744] | ||
vp 0.152 [0.147, 0.157] |
Mode β | Mean β | Median β | S.E. | Lower | Upper | Pr(β>0) | Symmetry | |
---|---|---|---|---|---|---|---|---|
y≥2 | 3.0582 | 3.0613 | 3.0559 | 0.2542 | 2.5698 | 3.5447 | 1.0000 | 1.04 |
y≥3 | -1.7124 | -1.7184 | -1.7170 | 0.2084 | -2.1268 | -1.3098 | 0.0000 | 1.02 |
y≥4 | -4.4747 | -4.4817 | -4.4835 | 0.2230 | -4.9316 | -4.0574 | 0.0000 | 1.00 |
y≥5 | -8.3492 | -8.3536 | -8.3546 | 0.2506 | -8.8241 | -7.8498 | 0.0000 | 0.99 |
y≥6 | -11.3550 | -11.3588 | -11.3569 | 0.2932 | -11.9503 | -10.7758 | 0.0000 | 0.96 |
y≥7 | -16.1466 | -16.1465 | -16.1463 | 0.3684 | -16.9045 | -15.4516 | 0.0000 | 0.97 |
gap | -0.0601 | -0.0610 | -0.0611 | 0.0872 | -0.2403 | 0.1043 | 0.2380 | 0.98 |
yprev=3 | 2.9563 | 2.9623 | 2.9583 | 0.2233 | 2.5177 | 3.4066 | 1.0000 | 1.02 |
yprev=4 | 6.2331 | 6.2370 | 6.2370 | 0.2320 | 5.7860 | 6.6868 | 1.0000 | 1.00 |
yprev=5 | 9.3574 | 9.3595 | 9.3578 | 0.2661 | 8.8260 | 9.8676 | 1.0000 | 1.03 |
yprev=6 | 11.7001 | 11.6994 | 11.6996 | 0.3149 | 11.0668 | 12.2847 | 1.0000 | 0.98 |
yprev=7 | 10.3873 | 10.3768 | 10.3723 | 0.6030 | 9.1319 | 11.4835 | 1.0000 | 1.04 |
day | 0.3894 | 0.3892 | 0.3891 | 0.0416 | 0.3068 | 0.4689 | 1.0000 | 1.01 |
day2 | -0.0053 | -0.0053 | -0.0053 | 0.0008 | -0.0067 | -0.0037 | 0.0000 | 1.01 |
trt=hydroxychloroquine | 0.0029 | 0.0029 | 0.0016 | 0.0692 | -0.1220 | 0.1461 | 0.5115 | 1.03 |
gap × yprev=3 | -0.0700 | -0.0689 | -0.0696 | 0.0473 | -0.1603 | 0.0202 | 0.0735 | 1.00 |
gap × yprev=4 | -0.2336 | -0.2304 | -0.2308 | 0.0363 | -0.3013 | -0.1599 | 0.0000 | 1.00 |
gap × yprev=5 | -0.4206 | -0.4170 | -0.4168 | 0.0413 | -0.4960 | -0.3342 | 0.0000 | 1.00 |
gap × yprev=6 | -0.5733 | -0.5692 | -0.5690 | 0.0388 | -0.6436 | -0.4933 | 0.0000 | 1.00 |
gap × yprev=7 | -0.3432 | -0.3370 | -0.3363 | 0.0610 | -0.4566 | -0.2188 | 0.0000 | 0.99 |
day x f(y) | 0.3382 | 0.3374 | 0.3372 | 0.0190 | 0.3011 | 0.3736 | 1.0000 | 1.03 |
day2 x f(y) | -0.0064 | -0.0064 | -0.0064 | 0.0006 | -0.0076 | -0.0053 | 0.0000 | 0.99 |
<- data.frame(yprev='4')
dat <- list()
R for(trt in tlev) { # separately for treatments
$trt <- trt
dat<- uprobB(bmarkovppon, dat, times, ylev, pvarname='yprev', gap=TRUE)
u <- u
R[[trt]]
}
# Compute summary measure per treatment, separately for each time and posterior draw
# P(home) = P(Y=6 or Y=7)
<- function(u) u[, 6] + u[, 7]
s # w <- apply(R$placebo, 1, s) # to test
<- lapply(R, function(x) apply(x, 1, s))
r
# For each treatment and time compute the posterior mean P(home)
<- lapply(r, function(x) data.frame(phome = apply(x, 1, mean)))
rs for(trt in tlev) rs[[trt]]$trt <- trt
<- do.call(rbind, rs)
rs # Fetch day from end of row names (follows .)
$day <- as.numeric(sub('^.*\\.(.*?)', '\\2', rownames(rs), ))
rs$type <- 'Model'
rs<- rbind(rs, empirical)
rs ggplot(rs, aes(x=day, y=phome, color=trt, linetype=type)) + geom_line() + xl + yl +
labs(color='', linetype='')
The agreement isn’t perfect but it’s probably the case that the sample size is not large enough to trust the empirical proportions.
A Different Markov Ordinal Model for Unequal Gaps
Consider a different approach in which we model serial dependence by attenuating the previous state effect by multiplying it by the reciprocal of gap time raised to a positive power. This has the properties of (1) forcing Y to be the same as the previous Y as the gap time approaches zero and (2) forcing independence across outcomes within patient as the gap time goes to infinity. To induce very high correlations in Y within patient with very small gaps, effects must approach \(\pm \infty\) on the logic scale, and the reciprocal raised to a power allows this. It is important to note that the effects being modeled must account for strong non-proportional odds. It is easy to see why when considering continuous Y with Gaussian residuals. A continuous-time AR(1) process is the Ornstein-Uhlenbeck process, and this process specifies that the variance of Y changes with the time gap. When the gap is very small, the conditional distribution of Y given the previous Y has a very small variance because in that case Y is very predictable from the previous Y. The analog to changing variance is changing the spacings of intercepts in a PO model. We accomplish that with a constrained partial PO model, the constraint being, as before, that the departure from PO is linear in Y. The question is whether we also need non-PO in absolute follow-up time. Logically, we do, but collinearity with gap may get in the way. We model serial dependence as a reciprocal of the cube root of the gap. The exponent really needs to be estimated from the data, which will require a nonlinear model. Our formulation does not allow main effects of previous states separate from the time gap to be included, so we are forcing the exponential function of gap to always be multiplied by effects of previous states.
A variety of functions for gap discounting of previous states, but none of them resulted in reasonable estimates of occupancy probabilities beyond the first gap > 1d. The last attempt is below. All of this points to the need for a true continuous time model when there is a need to estimate occupancy probabilities at unobserved times.
$ypn <- NULL d
rm(trt)
<- d
w <- as.numeric(as.character(w$yprev))
ypn <- as.data.frame(w)
w <- 1.
alpha <- 0.15
lambda <- 2
gamma <- with(w, 1 * cbind(ypn == 2, ypn == 3, ypn == 4, ypn == 5, ypn == 6, ypn == 7))
X # w$yp <- X
$ypr <- X / (w$gap ^ alpha)
w$ypr <- X * exp(-lambda * w$gap)
w$ypr <- X * (gamma + exp(-lambda * w$gap))
w# model.matrix problem if yp is logical instead of numeric
<- blrm(y ~ ypr + pol(day, 2) + trt, ~ pol(day, 2) + ypr,
bmarkovppor cppo=function(y) y,
data=w, refresh=100, file='bmarkovppor.rds')
<- bmarkovppor
b htmlVerbatim(stanDx(b))
Iterations: 2000 on each of 4 chains, with 4000 posterior distribution samples saved
For each parameter, n_eff is a crude measure of effective sample size and Rhat is the potential scale reduction factor on split chains (at convergence, Rhat=1)
n_eff Rhat y>=2 1225 0.999 y>=3 1046 1.000 y>=4 960 1.000 y>=5 1272 1.000 y>=6 2215 1.000 y>=7 2524 1.001 ypr[1] 2123 1.001 ypr[2] 1581 1.000 ypr[3] 963 1.000 ypr[4] 1009 1.001 ypr[5] 2231 1.002 ypr[6] 1613 1.001 day 5208 1.000 day^2 6050 0.999 trt=hydroxychloroquine 5684 1.000 day x f(y) 1075 1.002 day^2 x f(y) 2040 1.000 ypr[1] x f(y) 1789 1.002 ypr[2] x f(y) 2679 1.000 ypr[3] x f(y) 1364 1.000 ypr[4] x f(y) 1313 1.000 ypr[5] x f(y) 2125 1.002 ypr[6] x f(y) 3964 1.000
b
Bayesian Constrained Partial Proportional Odds Ordinal Logistic Model
Dirichlet Priors With Concentration Parameter 0.308 for Intercepts
blrm(formula = y ~ ypr + pol(day, 2) + trt, ppo = ~pol(day, 2) + ypr, cppo = function(y) y, data = w, refresh = 100, file = "bmarkovppor.rds")
Frequencies of Responses
1 2 3 4 5 6 7 50 357 297 817 526 757 489
Mixed Calibration/ Discrimination Indexes |
Discrimination Indexes |
Rank Discrim. Indexes |
|
---|---|---|---|
Obs 3293 | B 0.084 [0.083, 0.085] | g 4.511 [4.2, 4.819] | C 0.884 [0.879, 0.893] |
Draws 4000 | gp 0.449 [0.443, 0.453] | Dxy 0.769 [0.759, 0.785] | |
Chains 4 | EV 0.651 [0.632, 0.671] | ||
p 9 | v 15.665 [13.51, 17.901] | ||
vp 0.162 [0.157, 0.167] |
Mode β | Mean β | Median β | S.E. | Lower | Upper | Pr(β>0) | Symmetry | |
---|---|---|---|---|---|---|---|---|
y≥2 | 9.0466 | 9.0329 | 9.0314 | 5.5668 | -1.1526 | 20.2638 | 0.9502 | 1.02 |
y≥3 | 1.9489 | 1.9434 | 1.9365 | 3.9782 | -5.9721 | 9.5441 | 0.6892 | 1.02 |
y≥4 | -3.1405 | -3.1685 | -3.1378 | 2.7501 | -8.7864 | 2.0215 | 0.1207 | 0.95 |
y≥5 | -9.5137 | -9.5646 | -9.5349 | 2.4891 | -14.7573 | -4.9412 | 0.0000 | 0.96 |
y≥6 | -14.3301 | -14.4224 | -14.4053 | 3.4199 | -20.9956 | -7.8009 | 0.0000 | 1.02 |
y≥7 | -20.1491 | -20.2917 | -20.3525 | 4.9035 | -29.9257 | -11.1951 | 0.0000 | 1.04 |
ypr[1] | 0.3317 | 0.3105 | 0.2920 | 0.8221 | -1.2516 | 1.9675 | 0.6440 | 1.03 |
ypr[2] | 1.1997 | 1.2103 | 1.1923 | 0.8127 | -0.2407 | 2.9797 | 0.9380 | 1.03 |
ypr[3] | 2.2590 | 2.2739 | 2.2575 | 0.8139 | 0.7930 | 4.0192 | 0.9970 | 1.02 |
ypr[4] | 3.4183 | 3.4317 | 3.4179 | 0.8209 | 1.9777 | 5.2350 | 1.0000 | 1.03 |
ypr[5] | 4.9959 | 4.9895 | 4.9714 | 0.8181 | 3.4314 | 6.6575 | 1.0000 | 1.03 |
ypr[6] | 4.5479 | 4.5939 | 4.5781 | 0.9122 | 2.8510 | 6.4027 | 1.0000 | 1.02 |
day | 0.3656 | 0.3669 | 0.3666 | 0.0471 | 0.2702 | 0.4574 | 1.0000 | 1.03 |
day2 | -0.0083 | -0.0083 | -0.0083 | 0.0009 | -0.0102 | -0.0065 | 0.0000 | 0.99 |
trt=hydroxychloroquine | 0.0146 | 0.0132 | 0.0140 | 0.0708 | -0.1190 | 0.1548 | 0.5748 | 0.98 |
day x f(y) | 0.3538 | 0.3560 | 0.3559 | 0.0626 | 0.2336 | 0.4760 | 1.0000 | 1.01 |
day2 x f(y) | -0.0073 | -0.0074 | -0.0074 | 0.0012 | -0.0096 | -0.0050 | 0.0000 | 0.95 |
ypr[1] x f(y) | 1.4566 | 1.4369 | 1.4428 | 0.9882 | -0.5276 | 3.2874 | 0.9285 | 0.99 |
ypr[2] x f(y) | 1.3604 | 1.3655 | 1.3598 | 0.9896 | -0.5180 | 3.3209 | 0.9132 | 1.00 |
ypr[3] x f(y) | 1.4934 | 1.5060 | 1.5046 | 0.9894 | -0.4427 | 3.3649 | 0.9382 | 0.98 |
ypr[4] x f(y) | 0.8827 | 0.9027 | 0.8946 | 0.9878 | -1.0460 | 2.7810 | 0.8170 | 0.98 |
ypr[5] x f(y) | 0.0410 | 0.0775 | 0.0692 | 1.0024 | -1.8831 | 1.9663 | 0.5275 | 1.00 |
ypr[6] x f(y) | 0.6509 | 0.6510 | 0.6322 | 1.0204 | -1.2392 | 2.6901 | 0.7375 | 0.99 |
Now get state occupancy probabilities for observed days, then for all days. We must modify the uprobB
function to handle the special nature of gap dependencies.
<- function(fit, data, times, ylevels, pvarname='yprev', gap=FALSE) {
uprobB if(! length(fit$draws)) stop('fit does not have posterior draws')
<- nrow(fit$draws)
nd <- length(ylevels)
k <- length(times)
s <- array(NA, c(nd, s, k),
P dimnames=list(paste('draw', 1 : nd), as.character(times),
as.character(ylevels)))
# Never uncondition on initial state
<- data
wdata $day <- times[1]
wdataif(gap) {
<- times[1]
gp $gap <- gp
wdata<- as.numeric(as.character(wdata$yprev))
ypn <- 1. * cbind(ypn == 2, ypn == 3, ypn == 4, ypn == 5, ypn == 6, ypn == 7)
X # wdata$yp <- X
$ypr <- X * (gamma + exp(-lambda * gp)) # X / (gp ^ alpha)
wdata
}<- predict(fit, wdata, type='fitted.ind', posterior.summary='all')
p 1, ] <- p
P[, # cp: matrix of conditional probabilities of Y conditioning on previous time Y
# Columns = k conditional probabilities conditional on a single previous state
# Rows = all possible previous states
# This is for a single posterior draw
<- matrix(NA, nrow=k, ncol=k,
cp dimnames=list(paste('t-1', ylevels), paste('t', ylevels)))
<- as.list(data)
data <- as.character(ylevels[-1])
data[[pvarname]] # don't request estimates after absorbing state
# When data contains a matrix with one row, expand.grid will treat each column of that matrix
# as a new row in the output. We don't want that. Instead, create data frame without
# expand.grid and add ypr as a matrix to it later
# edata <- expand.grid(data)
<- as.data.frame(data)
edata
for(it in 2 : s) {
$day <- times[it]
edataif(gap) {
<- times[it] - times[it - 1]
gp $gap <- gp
edata<- as.numeric(as.character(edata[[pvarname]]))
ypn <- 1. * matrix(c(ypn == 2, ypn == 3, ypn == 4, ypn == 5, ypn == 6, ypn == 7), ncol=6)
X # edata$yp <- X
$ypr <- X * (gamma + exp(-lambda * gp)) # X / (gp ^ alpha)
edata
}# y=first level = absorbing state
<- predict(fit, edata, type='fitted.ind', posterior.summary='all')
pp for(idraw in 1 : nd) {
<- rbind(c(1., rep(0., k - 1)), pp[idraw, ,])
cp # Compute unconditional probabilities of being in all possible states
# at current time t
<- t(cp) %*% P[idraw, it - 1, ]
P[idraw, it, ]
}
}
P
}
<- c(1,2,3,4,7,14,28)
times <- 1 : 7
ylev <- levels(d$trt)
tlev <- data.frame(yprev='4', age=repcov$age, sofa_nogcs=repcov$sofa_nogcs)
dat <- list()
R for(trt in tlev) { # separately for treatments
$trt <- trt
dat<- uprobB(b, dat, times, ylev, pvarname='yprev', gap=TRUE)
u <- u
R[[trt]] }
Look at state occupancy probabilities for placebo, summarized by the posterior mean probabilities.
<- round(apply(R[[1]], 2:3, mean), 2)
sop sop
1 2 3 4 5 6 7
1 0.00 0.02 0.14 0.70 0.11 0.02 0.00
2 0.00 0.06 0.16 0.53 0.18 0.07 0.00
3 0.01 0.09 0.13 0.40 0.21 0.15 0.01
4 0.02 0.11 0.10 0.29 0.19 0.26 0.02
7 0.04 0.10 0.07 0.20 0.14 0.37 0.07
14 0.06 0.09 0.03 0.11 0.05 0.31 0.36
28 0.08 0.06 0.02 0.06 0.03 0.25 0.50
Now repeat, requesting probabilities for all consecutive days.
<- 1 : 28
times <- 1 : 7
ylev <- levels(d$trt)
tlev <- data.frame(yprev='4', age=repcov$age, sofa_nogcs=repcov$sofa_nogcs)
dat <- list()
R for(trt in tlev) { # separately for treatments
$trt <- trt
dat<- uprobB(b, dat, times, ylev, pvarname='yprev', gap=TRUE)
u <- u
R[[trt]]
}round(apply(R[[1]], 2:3, mean), 2)
1 2 3 4 5 6 7
1 0.00 0.02 0.14 0.70 0.11 0.02 0.00
2 0.00 0.06 0.16 0.53 0.18 0.07 0.00
3 0.01 0.09 0.13 0.40 0.21 0.15 0.01
4 0.02 0.11 0.10 0.29 0.19 0.26 0.02
5 0.04 0.11 0.07 0.20 0.15 0.37 0.06
6 0.06 0.09 0.05 0.13 0.10 0.43 0.15
7 0.07 0.08 0.03 0.07 0.06 0.40 0.29
8 0.09 0.06 0.02 0.04 0.03 0.30 0.46
9 0.11 0.04 0.01 0.02 0.02 0.20 0.61
10 0.12 0.02 0.01 0.01 0.01 0.12 0.71
11 0.12 0.02 0.00 0.01 0.00 0.08 0.77
12 0.13 0.01 0.00 0.00 0.00 0.05 0.81
13 0.13 0.01 0.00 0.00 0.00 0.03 0.83
14 0.13 0.00 0.00 0.00 0.00 0.02 0.84
15 0.13 0.00 0.00 0.00 0.00 0.02 0.85
16 0.13 0.00 0.00 0.00 0.00 0.01 0.85
17 0.13 0.00 0.00 0.00 0.00 0.01 0.85
18 0.13 0.00 0.00 0.00 0.00 0.01 0.86
19 0.14 0.00 0.00 0.00 0.00 0.01 0.86
20 0.14 0.00 0.00 0.00 0.00 0.01 0.86
21 0.14 0.00 0.00 0.00 0.00 0.01 0.86
22 0.14 0.00 0.00 0.00 0.00 0.01 0.86
23 0.14 0.00 0.00 0.00 0.00 0.01 0.86
24 0.14 0.00 0.00 0.00 0.00 0.01 0.86
25 0.14 0.00 0.00 0.00 0.00 0.01 0.86
26 0.14 0.00 0.00 0.00 0.00 0.01 0.86
27 0.14 0.00 0.00 0.00 0.00 0.01 0.86
28 0.14 0.00 0.00 0.00 0.00 0.01 0.85
Computing Environment
R version 4.1.0 (2021-05-18) Platform: x86_64-pc-linux-gnu (64-bit) Running under: Pop!_OS 20.10 Matrix products: default BLAS: /usr/lib/x86_64-linux-gnu/blas/libblas.so.3.9.0 LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.9.0 attached base packages: [1] splines stats4 stats graphics grDevices utils datasets [8] methods base other attached packages: [1] rstan_2.19.3 StanHeaders_2.21.0-1 VGAM_1.1-3 [4] plotly_4.9.3 data.table_1.13.0 rmsb_0.0.2 [7] rms_6.2-1 SparseM_1.78 Hmisc_4.6-0 [10] ggplot2_3.3.2 Formula_1.2-3 survival_3.2-11 [13] lattice_0.20-44To cite R in publications use:
R Core Team (2021). R: A Language and Environment for Statistical Computing. R Foundation for Statistical Computing, Vienna, Austria. https://www.R-project.org/.
To cite the Hmisc package in publications use:Harrell Jr F (2021). Hmisc: Harrell Miscellaneous. R package version 4.6-0, https://hbiostat.org/R/Hmisc/.
To cite the rms package in publications use:Harrell Jr FE (2021). rms: Regression Modeling Strategies. https://hbiostat.org/R/rms/, https://github.com/harrelfe/rms.
To cite the rmsb package in publications use:Harrell F (2021). rmsb: Bayesian Regression Modeling Strategies. R package version 0.0.2, https://hbiostat.org/R/rmsb/.
To cite the VGAM package in publications use:Yee TW (2015). Vector Generalized Linear and Additive Models: With an Implementation in R. Springer, New York, USA.
To cite the data.table package in publications use:Dowle M, Srinivasan A (2020). data.table: Extension of 'data.frame'. R package version 1.13.0, https://CRAN.R-project.org/package=data.table.
To cite the ggplot2 package in publications use:Wickham H (2016). ggplot2: Elegant Graphics for Data Analysis. Springer-Verlag New York. ISBN 978-3-319-24277-4, https://ggplot2.tidyverse.org.
To cite the rstan package in publications use:Stan Development Team (2020). “RStan: the R interface to Stan.” R package version 2.19.3, http://mc-stan.org/.
To cite the survival package in publications use:Therneau T (2021). A Package for Survival Analysis in R. R package version 3.2-11, https://CRAN.R-project.org/package=survival.
A singularity occurred when trying to allow
gap
to be a linear spline.↩︎