Ordinal Markov State Transition Model

Lee and Daniels (2007) Biometrics Model

Model in General

I just looked over your paper quickly and it seems very close to what we need. Basically it tries to put the treatment effect into a marginal model but still model the transition probabilities using a markov model to get the dependency between multiple days, it is quite general which is great for biometrics but not great for me trying to write a computer program to analyze a clinical trial in COVID-19 in week. (David Schoenfeld)
- You mention that it differs from ‘Classical transition models’. What is a good paper on “classical” transition models for ordinal data, i.e. is there a paper using the constant proportional odds model on the transitions (which are also ordinal). Did you try this approach and got results that the “classical” approach wasn’t a good idea?
- I am sorry that I don’t really have time to dig into your paper, if I had good software I could try it out on simple simulated data and use it for my clinical trial on covid-19.
Q: I also don’t understand why a marginal model has less parameters than modeling the transition probabilities (DS)
A: There are not less parameters due to it being a marginal model. The marginal model itself provides a nice population level interpretation of parameters. The less parameters of the transition probs are due to the regressing on the past functional form. In principle, a similar functional form could be used for a regular transition model, but the treatment effect (e.g.), would be conditional on the previous (in time) outcomes. (Michael Daniels, co-author of original paper)
Q: Suppose we use the proportional odds model for the translation matrix. Suppose you are in state, j, say j-3, then your state on the next day is one of the 8 states, which obey a proportional odds model for treatment. (DS)
- Suppose you assume that the model is stationary with the same transition matrix every day, and you put a proportional odds model on all of the transitions, you then have one treatment parameters, whatever covariate parameters you have plus, 77=49 nuisance parameters. Only 7 because if there are 8 states death is absorbing. You only have 76 nuisance parameters for patients who start out sick and going home becomes absorbing.
- Now if 49 parameters is problematic one could change the time scale. Suppose instead of days you used hours and you assumed that in an hour you could only transition to three states, the next, the previous and your current state. Then there are only 21 nuisance parameters. The likelihood for a markov model doesn’t require knowing every transition. The daily transition matrix is M^24, although there is no need for 24 hours as you only need enough time intervals to have a positive probability of going form state 1 to 8 between any two patient visits.
- I vaguely remember that you can take limits and get some kind of continuous model for M^t where M is the matrix where you can only move up and down. I would guess that there are other models that could reduce the number of nuisance parameters, but is it really an issue if you are going to use MCMC to do the estimation, isn’t convergence less of an issue than with newton raphson? If you were worried about whether the process was stationary you could use M^f(t) where f is some increasing spline. Then you wouldn’t have a stationary model but it would be stationary on some transformed time scale. Or you could have several different periods with different nuisance parameters. My only worry is the transition from vent to death, because often they withdraw support after a certain amount of time.
- I don’t think this will give you the exact same model as proportional odds on any given day, rather it is proportional odds on the transitions. It is still plausible and explainable.
Q: I just read through the discussion on the state change model. There seems to be an underlying assumption that a patient can only move by one at any given time. However, I think they can go from ‘home and alive’ to ‘dead), or from hospitalized not requiring oxygen to mechanical ventilation in one jump. Am I missing something? (Chris Lindsell)
A: There is no jump by only one category restriction in the model. (MD)
Q: From the equations for the marginal effects and the dependence structure I’m having trouble seeing any shared parameters that would make the dependence structure have an effect on the estimation of beta. (FH)
A: Referring to equations 3 and 4 in the paper, the likelihood is based on the transition model in (4). The beta’s in (3) come in through the \(\Delta_{itk}\) parameter in (4) through the constraint to ensure (3) and (4) can be true simultaneously, given in (7). So the dependence comes in through the transition structure in (4) which propagates to estimation of the beta’s which are ‘in’ the \(\Delta\)’s. Is this more clear? And note this question made me have to go back to the original paper :-) (MD)
Q: It seems like the paper does a lot of work to avoid simply putting the proportional odds model on the transitions. It is nice that the model is consistent with proportional odds on each day, but how necessary is this. The NIAID is now moving away from the use of the 14day ordinal score because they have the same concern I do, that this is throwing away information. They are moving towards using time to recovery(censored at 28 days if you die) i.e. the subdistribution function for recovery. This is nice and simple but also throws out data, especially in an age when a person going off a ventilator two days sooner means it is available for another patient. I did understand, when I looked over the paper in about 15 minutes, how it worked, although I didn’t get how you actually did the estimation where Delta was determined by averaging over the transitions. I figured that I didn’t need to know that to use the model. (DS)
A: There is nothing that prevents a simpler dependence structure in the modeling. Everything would still push through. The key would be what might be a sufficiently parsimonious specification (e.g., for the gamma’s in eq. 9). This is a general issue with transition models for ordinal data. And note the advantage of the marginalized model is the nice interpretation of the regression parameters in (3) (MD)
Q: While preserving the nice interpretation of marginal models, have you seen any papers/presentation attempting to specify an ordinal-type dependence part of the models? (FH)
A: No on the dependence but I have not looked into this literature on ordinal models in a while. (MD)
Q: Is the log likelihood in equation (10) completely self-contained for getting all marginal + dependence parameter estimates? In other words does it have constraint (7) built into it? (FH)
A: Yes. \(\Delta_{itk}\) has information of \(\alpha\) and \(\beta\) using the constraint (7). (Keunbaik Lee, first author of paper) He sent the supplemental files for the paper as they are no longer available online. They may be found in http://hbiostat.org/papers/serialData/ordinal/lee07claCode/
Q: I understand this model quite well. I get all of the calculations (I think) except I have one hangup which is bothering me. It is \(logL^(1)(\theta)\) in eqn (10). I am just confused by it even though I imagine it is right and I know what it is supposed to be. It seems like there is a much easier way to write it. That makes me believe that I might be missing something. (JS)
A: I think I see the issue with what was confusing me. In Keunbaik’s paper, he forgot to put a parenthesis is the denominator of his definition of \(\phi_{i1k}\) under equation (10). It should be \(\log(P_{i1k}/[P_{i1k+1} – P_{i1k}])\). So subtle, but threw me off. (JS)
- Regarding simplication of the dependence structure, I’m not sure you want to get any simpler. I don’t know about this, but you might put an ordinal structure on it rather than having less restrictive model that Keunbaik proposed. But there may be something I don’t understand. Also, a first order transition is pretty simple and may not capture response dependence all that well…. One thing to note, Keubaik and others have shown that mean and dependence model parameters are orthogonal to each other in these models. This means you can get the dependence structure wrong and still be ok for mean model estimates. You would have to fix standard errors though. (JS)
A: \(logL^(1)(\theta)\) in eqn (10) is the loglikelihood contribution for time t=1. That is, it is calucated using eqn (3) for t=1. I recommend that \(\beta_{0k}\) and \(\beta\) be same over time (t) to reduce the number of parameters (In typical marginalized transition models, \((\beta_{0k},\beta)\) for t=1 and for t=2,3,… are different). Therefore, to calculate score and information matix for \(\beta_{0k}\) and \(\beta\), \(logL^(1)(\theta)\) is needed. In addition, Dr. Schildcrout is right. I have checked \(\phi_{i1k}\) in which there was a typo. I am sorry for making you confused. As Dr. Schildcrout wrote, marginal mean parameters \((\beta_{0k},\beta)\) and dependence model parameters (\(\alpha\)) are orthogonal. Therefore, there is robustness property to misspecification of dependence model (See subsection 3.5 Simulation Study). (KL)
Q: How does the model handle the absorbing state? (FH)
A: I know you asked Keunbaik, but since I’ve been thinking about that for a nominal outcome I think that I think it would work. You could set the corresponding parameter (a log odds ratio) to a large number and then don’t allow it to vary. For me, if I was doing maximum likelihood, I would fix that gamma to be say 10 and then maximize w.r.t. other parameters.
A: Thanks Jonathan. And that would be the way to do it. Large values for staying in absorbing state transition parameters and very small values for moving to another state from the absorbing state. So essentially prob=1 of not moving. (MD)
Q: Should records after the absorbing state be dropped?
A: The answer has to be “yes.” I was hoping we were going to be able to get around this, but the constraint that the marginal means must equal the marginalized conditional means lead to restrictions that will not always hold, namely (assuming State 1 is absorbing):
- \(pr(y_t=1) \geq pr(y_{t-1}=1)\)
- If, for any \(k>1, pr(y_t=1 | y_{t-1}=k)>0\), then \(pr(y_t=1) > pr(y_{t-1}=1)\)
- This is probably not what we want. We will need to see about properties when we remove all observations after an absorbing state. What were your concerns about that again?
Absorbing states (JS)

Regarding the absorbing states, the key is to understand gamma and how that works with Delta.  The gamma.mat below simulates the first state that is not exactly absorbing, but if you look at the data pr(y_{ij}=1 | y_{ij-1}=1) is very high.  In these data, it might be around 0.98.  I think there are problems with generating a deterministic absorbing state from this model.  The marginal and conditional models have to be cohesive and if you make gamma.mat too extreme, you will not be able to solve for Delta.  I played around with it a bit.  

 

gamma.mat <- rbind( c(10,  0,   0,   0, 0),
                    c(-25,  1.5,   0,   0, 0),
                    c(-25,  0,   1.5,   0, 0),
                    c(-25,  0,   0,   1.5, 0))

I think we will be able to force an absorbing state when generating the data, but I will want to see how that impacts marginal model parameters.

Handling of Pre-Randomization Ordinal State

Q: We measure the ordinal scale pre-randomization. I assume this has to be used as the first node in the transitions. How can one keep the treatment effect out of the model for this first period? In general how does one handle “kicking” off the model when the first period has no history? (FH)
A: In principle the first period can have its own model with different regression coefficients (since no history) (Mike Daniels)
A: I am not sure that the beginning transitions are is a big problem. The first transition is from day 0 to Day 1. They have received treatment, which may not be effective yet, but the first transitions 1-4 have a minor effect on the estimate compared to transitions 5-28 so. this lack of fit will not affect the estimator much. We use the proportional hazards model in survival studies despite the fact that the treatment effect probably doesn’t happen immediately. (David Schoenfeld)
- Suppose we measure days 0,3, 7,15,21 28. 8 states 1=home, 8=dead. Now people will start in 3 4 5 … We have on ordinal constant proportional odds model for treatment and covariates measured BEFORE treatment i.e. age being the most important. M is the transition Matrix.
- So suppose the patient starts in state 3 and is in state 5 at day 3. Then their contribution to the likelihood for M the transition matrix is (M^3){3,5}, if they are dead on day 5, this is multiplied by (M^5){5,8} …. Is there software, simulations,… etc, evidence that whatever we use for estimation works. There are a lot of nuisance parameters but hopefully not so many as to throw this into semiparametric inference.
- Also can we go to a continuous model. i.e. M^5 exp(5*log(M)) where we use the matrix log and the matrix exponential. I have forgotten whether this kind of thing works. The advantage of this would be we could assume that people can only move one step at time, and turn probabilities into hazards.

Overall Modeling Issues

Q: Are the ordinal state change model concepts sufficiently well developed to propose for this study – maybe look at the first seven days of hospitalization as the primary for early learning, and then longer for reporting the effect on the entire hospitalization for secondary analyses? I have heard Todd mention a couple of times that the disease course seems quite elongated, so expect it is important to consider the longer time frame in some way and secondary analysis seems appropriate. If it is, then we will need to come up with a plan for power analysis. Since the simulations can’t be run until the software is ready, maybe we just power on an PO model and a 5 or 7 day cross-sectional scale value and suggest this is the outside of sample size we will need? (Chris Lindsell)
A: For Bayesian power calculations (and for frequentist) it is probably fine to use a single ordinal measurement as I have done in simulations so far. (FH)
From Scott Berry:
- We’ve worked on some of these ideas — they work much better when progression is monotonic. You will have some ups and downs. You can create a Markovian structure for movements and have a treatment effect “move it up or move it down” in an odds type fashion. It is likely Markovian in the 14-day acute period.
- For the control arm model the probability from each state I=1,…,8 to the next state J in one day, T_ij say a Dirichlet distribution. Then add a proportional odds effect to these based on a treatment. Given a treatment X, it has a log-odds effect theta_X that tends to move people more to good states or more to bad states. Preserves the “one treatment effect parameter” (proportional odds) and incorporates the 14-day course. It will likely involve how many “good moves” are made and how many bad steps are made.
- Others have used “worst score” over the 14-days
- The differences in these probably depend on the severity at baseline as to whether an ordinal regression model adds value. If they are moderately-healthy it is likely driven by progressing to ICU and if they are quite severe, do the die?
Q: This model needs to be applied in simulations where the control data is from real data on patients with covid-19 (which I don’t have yet) We need to see that the effect of treatment in the model is at plausible and that the power of using it is superior to using vent-free days or time to recovery as is being used elsewhere. I am not wedded to the use of a model if vent-free days or recovery free days or day 14 is just as good. (DS)
A: ICU-free days have become quite popular as out of the ICU can help others. REMAP-CAP uses ICU-free days as ordinal over 14 days with death as the worst outcome (a –1 in the ordinal scale). My guess is this is very correlated to the transitions above and quite a bit easier to analyze. (SB)
Q: We come up with a Bayesian design that has two important properties: (DS)
- If we look at the data when it would make a decision to stop, it looks like we should stop, what people call face validity.
- If we followed it exactly it’s frequentist properties would be reasonable. My reason for considering a Bayesian design is that we may have to deviate from it, either to pool external data, change the maximum sample size, or change the endpoint, and I would use Bayesian posterior probabilities to justify the deviation to the DSMB.
A: This can be done, posterior probability superiority > 0.99 if you start 50-50 tends to work well. Agree on the value of Bayes here — but I wouldn’t say “deviate” … I think you should predefine these adaptive cases… have the DSMB watch the trial and not redesign it from inside. Do interims every Monday and calculate the posterior probability > control. If greater than 99% script the control and give the treatment — hopefully you have other arms, combinations to add,… the pool of treatments is increasing almost as fast as the damn virus. Face validity might just be the first interim (minimum sample size). (SB)
Q: I prefer to put in a skeptical prior and stop when probability is 90-95% I think it is more reasonable that putting in a non-informative prior and waiting until it is 99%. Have you had experience with that? I thought that your continuation of Violet when the conditional power was only 10% was chosen because of good properties on simulation rather than that it made sense of itself. You could have chosen a higher number if you had an informative, favorable prior on the proportion of patients with vitamin D deficiency. (DS)
A: I’ve always thought it less than ideal to use flat priors and higher posterior cutoffs. With skeptical priors you get automatic pullback of treatment effects in proportion to how early you stop, and this wears off as the sample size grows. From a pure Bayesian standpoint, the posterior should stand on its own and skepticism should be only in the prior.
From Jim Abert:
- My 1994 paper on Markov models for relapsing-remitting disease may be a bit too simple for this. Most of these transition models assume a first-order Markov assumption and do not account for heterogeneity (beyond covariates) simply. Unless you are explicitly interested in transition probabilities (what is the effect of treatment on transitioning from one state to another?), I would not go with this type of an approach for a clinical trial. I would think, you want something that is less model dependent and protects the type I error rates even when the model is mispecified.
- I like the approach that Ritesh suggested (#2). A proportional odds model with a random intercept to deal with the heterogeneity. Fixed effects would include time from randomization and treatment group (and perhaps their interaction). Within this model, you could include entry state (state at baseline) as a fixed effect to adjust for the varying entry states of the patients. The overall significance test for the treatment effect in these models would provide a nice and relatively simply and robust test for treatment effect. (inference on fixed effects in these types of models are pretty robust to the random effects distribution, for example).
Q: From what I gather in the last part you wrote, an ordinary random effects proportional odds model (e.g. mixor package in R or Bayesian random effects model) would do the trick, right? (FH)
A: Yes. It will be simple, easy to interpret, and relatively robust. (JA)
From JS:
- I’m working on generating data from the marginalized transition model for ordinal data. Unless I’m missing something I will get that done today. Indeed solving for Delta must be done numerically, but it is easier for a transition model than a random effects model. If I figure out this data generating process it will provide the insight into finding Delta in general.
- I’ve been trying to figure out how to make my binary data solution work for ordinal data, and I don’t think I can in this short timeframe. I’m trying a different approach now.
- It is more of a pain / takes longer to calculate the Delta in the marginalized model analogue to the random effects model than it does to calculate the Delta in the marginalized model analogue to the transition / Markov model.
- With these marginalized models you DO need to calculate Delta and that needs to be solved numerically. That said, I do think it is pragmatically doable. It’s a little longer a calculation but it’s not that bad… After I finish this data generator (assuming I’m successful) it will be clearer to you and Ben if these marginalized models are feasible for your purposes.
- I haven’t read papers on mixed effects ordinal models. I think your solution is a fine one for the time being. It would seem that truncating after death would represent a missing at random “dropout” mechanism. That said, I don’t understand the literature all that well, but as you know dropout due to death and other forms of dropout are distinct from one another. Depending on how you address death, they can estimate different quantities.
- I’m looking into marginalized transition model for ordinal data because I thought you believed a transition model captured the response dependence in the most appropriate way but you did not want to include the transition terms in the mean model because lagged response values mediate the relationship between a treatment and current response values. Is that correct?
- Anyway, I am continuing to work on this, but I’ve lost a little bit of hope that the solution is within a day for me.
- Responding to other paragraphs: Also, I think there is hope that the approximation they use in the Rana et al paper to move from marginal to conditional models is ok even if not perfect. That said, the times I have tested the approximation, I was starting with a conditional model and using the approximation for the marginal model parameters. They are going in the opposite direction. Not sure how well that will do.
PA: It looks like the R package that Ritesh found handles a more complex random effects structure. I agree that how you deal with death makes this a bit tricky. Modeling only the first followup that the individual died may induce informative dropout of the longitudinal process (the measurement process is then related to the underlying random effects). This is the so called informative dropout problem that was first discussed by Wu and Bailey (1988-Biometrics). They used a shared random parameter model to link together the survival and the longitudinal process. I discuss this model and newer literature in a recent Statistics in Medicine paper published earlier this year (Albert, 2020). For this problem, assuming that the study has scheduled follow-up visits, one could repeat the death measurement for all scheduled measurements. I think this would work for this problem.
Q: It would seem that random effects models would more naturally deal with imbalanced (unequal measurement times) longitudinal data than would transition models. (JS)
A: I think you’re right that this is even harded with transition models. It’s just that random intercepts models assume compound symmetry which is a tenuous assumption even with equally spaced measurement times, and I wonder if we can do better. (FH)
A: Random lines deal with autocorrelation / serial dependence somewhat… not as intuitively as a transition model does though. (JS)
A: I agree with Paul and Frank that a mixed effects proportional odds model seems most appropriate for this setting. In this case I don’t think death is considered informative dropout since it is on the ordinal outcome scale, and agree that it would need to be considered observed data for each scheduled visit after occuring. I don’t know if there might still be an issue with the estimation though. (RR)
BG on implicit solution of \(\Delta\) in Lee-Daniels model: I think more than 3 decimal places of accuracy would be needed to not break the Hamiltonian process in Stan. That said, there is a numerical equation solver, so if a solution theoretically exists for any admissible value of the other parameters and especially if there is a way to obtain a good starting value for the numerical equation solver at each MCMC iteration, we could give it a try. It will be slower, but I doubt the RCT will be so big that runtime speed will become a limiting factor in any way.

FH Current Global Summary

As shown in the analysis of the Violet 2 study, carrying death forward until the last planned day of followup for the whole study results in a huge violation of the proportional odds assumption with respect to time (study day). If one really wants to carry deaths forward (not recommended), the the partial proportional odds model (PPO) should be used. For that model one would have an exception to PO for the time variable (at least) but would still model treatment using only one parameter (assume PO for treatment). It is instructive to consider the limiting case where the ordinal outcome is binary and death is the only event. With small time intervals, a longitudinal binary logistic analysis has these properties:

when death is carried forward, the odds ratio is not the hazard ratio
there is no power gain from carrying death forward as opposed to terminating the longitudinal records on the day of death
when dataset observations stop at death, the odds ratio is the hazard ratio
binary logistic risk estimates as a function of time do not reflect cumulative incidence in a time-to-event model in that case
if death is carried forward and there is only one time point for censoring (i.e., if we excluded general censoring as in staggered entry), the longitudinal binary logistic model risk estimates equal Kaplan-Meier cumulative incidence estimates
this does not adequately compensate for the other problems caused by carrying death forward

Current recommendation:

Bayesian proportional odds model with random effects and truncation at death
Attempt to program a marginal structural Jonathan Schildcrout-type PO model that will be more robust to correlation patterns
Attempt to extend that to incorporate an AR(1) within-patient correlation pattern in the marginal model framework
If death carried forward is imperative, adapt the R rms package blrm PO model with random effects to implement the PPO model
In general for longitudinal studies having more than one outcome event, the event mix will change over time (e.g., in a cardiovascular study one may observe myocardial infarctions earlier and death later). So the PPO model needs to continue to be considered.
See Steele et al for more about model specification using random effects for multistate outcomes

Jonathan Schildcrout’s Summary of Multi-State Modeling Issues

I would like to review the question we are attempting to answer and the perceived challenges with analyses:

Are we looking at a treatment by (smooth function of) time interaction? If yes, I’m interested in discussing because based on a note that David wrote, they are interested in a single outcome at, say day 15. It would seem to me that we will not be able to estimate that treatment effect more accurately than he could, but we could potentially estimate it more efficiently by using the longitudinal data.
- FH: The current plan calls for analysis of the outcome scale at a single day but the scale is collected every day and it is likely the plan will be changed to use all days to get more power and to allow for time-varying treatment effect
Do you know how many people are going to be in this dataset? Will it be 15 days of followup? Will there be missing data?
- FH: Around 500 patients with follow-up to either 15 or 28d. I hope missing data is minimal but you never know.
Will there be absorbing states? Will it be one or two absorbing states.
- FH: death one one end is the only true absorbing state, but once a patient reaches home they are very unlikely to have another event, so home is almost absorbing
What else is there?

I feel like I have a pretty good sense of how the optimizer will perform, although with more states, the optimizer is going to have more problems. Some things I think I’ve learned:

Dropping all observations after entering an absorbing state leads to an analysis of longitudinal ordinal outcomes for people who are alive. It presupposes that those dead people could theoretically change states. I think that I think this is not the right approach. Both of our analysis procedures reproduced the data generating mechanism, but that generating mechanism assumes there is a positive probability of changing states.
Keeping the dead observations in the analysis lead to difficulty in understanding what the “true” parameter values are. The correlation structure is definitely very challenging in that scenario, and also I have concerns that the proportional odd assumption is still valid.
With 2) being said, I think that the plots that I showed you that include the 1 and 2 absorbing states show the “truth” under the assumptions of the data generating model parameters combined with an overriding absorbing state combined with the assumption that you are using a proportional odds model to fit the data.
I think that I think that keeping dead observations is the right thing to do. The question then becomes how to model those data.
- FH: We have a fundamental dilemma: In the special case where the only event is death, if you did a Cox model analysis vs. chopping time intervals into very fine intervals to using a longitudinal binary logistic model, the odds ratio from that model is the hazard ratio. So it gets the relative treatment effect right. On the other hand the predicted probabilities from that model do not seem to have an interpretation (I have an email to Therry Therneau asking about this).

Modeling data with an absorbing state:

I have major concerns that about getting the correlation structure right which is necessary to ensure that most likelihood based estimators will be valid.
Because of 1) my top 3 choices for analyses would be
1. Multistate models that are then marginalized in order to estimate the treatment effect over time. I’ve never done this, and it would take some time for me to learn but I think it is doable
2. A GEE type estimator
3. The marginalized transition model (which is a likelihood based procedure but can have a misspecified dependence structure and still get valid parameter estimates) but possibly with robust SEs since I am worried about the correlation structure. That being said, we would have to force the fitter to NOT allow for 0 probabilities of leaving the dead state. It would be almost right.

FH: The last option (e.g. Lee-Daniels) is appealing but before drawing a conclusion I’d like us to think deeply about whether the marginal mean part of the model is really a state transition model at all (one that accounts for an absorbing state and other things) or is it just a longitudinal proportional odds model that has a PO assumption that may be violated with our setup?

It does not seem that OMTMs are mathematically well suited for models with absorbing states (assuming we want to keep all observations that occur after the absorbing state occurs), and this is caused by the constraint that the marginal mean (probability of being in state k) must equal the marginalized conditional mean.

Thought Experiment: Longitudinal vs Cross-Sectional Analysis

Let’s say you do a cross-sectional analysis at the 28 day timepoint and compare it to a longitudinal model that includes a flexible treatment by time interaction and isthen summarized on day 28… some questions:

Should the treatment estimates be consistent for the same quantity? I think they should (again ignoring random effects issue)Did you include death in the cross sectional analysis at day 28?
I don’t think truncating at death will satisfy 1)

FH: The two should be consistent since the time trend is flexibly modeled. The all-times analysis needs to stop at death. So what about the single time analysis? Tradition has it that death before day 28 is considered the worst outcome. So that’s effectively carrying death forward. But is a single time analysis comparable to a longitudinal one? And if death were the only endpoint you were considering time to event you woudn’t need to carry death forward.

Use of a Continuous Model

Q: One of the reason I was thinking about a continuous model exp(-t log(M)) rather than M^t. is that then you could require that people go through a middle stage before going to a final stage. Some people will go from hi flow O2 to death without being ventilated in a day but that is just because we didn’t know about their ventilation or they should have been ventilated but they couldn’t get to it in time. You can model state transfer as a birth death process without people actually having to be measured going through each state. Suppose the transition matrix M just goes from state I to i+1 or i-1, M^2 will go from to i+2 and M^3 will go from I to i+3, and probably the model will be reasonable. Exp(-t Log(M)) will I think go from state I to every state with the proper probability if t is not days.. My vague memory is the ExP(M)= I + M+M^2/2 +M^3/3! ….. Log(M) similarly and that they behave similarly as the scalar log and exponential. They are available in R. If days is divided into 8ths than the matrixes are complete. Someone has to play with this. Hopefully someone whose math isn’t a vague memory. https://www.rdocumentation.org/packages/expm/versions/0.999-4/topics/expm (DS)
A: That approach has some merit but it would not prepare us for a perhaps upcoming RCT in which state transitions are not monotonic. (FH)
A: This is what is done in the msm package I mentioned in R (https://cran.r-project.org/web/packages/msm/vignettes/msm-manual.pdf). The underlying process is assumed continuous but the data is observed intermittently, and you can specify only transitions to/from adjacent states (and all states can transition to death if preferred). The Matrix Exponential on the matrix of transition intensities by time is used to estimate transition probabilities over time. Proportional hazards model can be used to estimate covariate effects on transition intensities. The drawback of the model is that it assumes constant hazards, and you can’t relax those assumptions all that much beyond piecewise constant with a model of this complexity. (Ritesh Ramchandani, Harvard)
Q: I am in midst of designing randomized clinical trials for COVID-19 . We have an 8 point ordinal scale where 1-Home and well., …. 8-dead. People will start on our trial somewhere and then move along it. Cleary Dead is always absorbing , home might be absorbing for anyone who starts with illness. One of my studies is short, 28 days and people start sick say state 3 to 7, The other is long and everyone starts at state-1, if the leave state-1 and get back to it they are finished. We would like to use a constant odds-ratio model on the transitions and we could even assume it is stationary. Here is what I have written so far as our problem. I wonder if you could help us. Do you have or know of any software for your model that could easily accommodate our problem. Have you considered this model. Is it plausible. Is it problematic. Sorry I haven’t laid this problem out with more detail. I am overwhelmed by trying to design clinical trials while knowing so little about the disease. I am desperately looking for help from someone who knows how to do these things and has or is willing to write software. Basically I am looking at Markov models where the transition probabilities from a given state to the others obey a proportional odds model or any other model that has one parameter for the treatment effect. (DS)
A: Ritesh Ramchandani (Harvard)
- Since the underlying process is a continuous-time process that is observed intermittently, you could model the data using a continuous time Markov model with forward and backward transitions between adjacent states. These slides below from Chris Jackson that addresses continuous-time Markov models for panel data https://www.mrc-bsu.cam.ac.uk/wp-content/uploads/2017/10/multistate_enar_webinar.pdf; I used his R package msm in my first thesis paper. The model is parameterized in terms of transition intensities, and you can use a proportional hazards model for the effect of covariates on transition intensities; further you can constrain the effect of covariate to be equal across the transitions. The biggest issue with the model is that it’s mostly parametric, assuming exponential times spent in each state, and that the hazards are constant over time (this can be relaxed a little bit to piecewise constant).
- I imagine that there is some methodology and software for proportional odds models for longitudinal ordinal data, but I’m not familiar with the literature. From a quick search, something like this may be what you’re looking for https://cran.r-project.org/web/packages/mixor/vignettes/mixor.pdf
- I think if it makes sense clinically, it might be good to reduce from 8 states to 5 or 6 to simplify the model a bit and avoid overparameterization.

Data

Here is data, could you write an R program to estimate the parameters using a continuous-time markov model, and then a program to generate treatment data with a cumulative proportional odds model on the transitions. I would also need I program to calculate the treatment odds ratio. Doug can you have Christine pull out age or some other covariate that I can link to these data, we could make believe it is a treatment effect and it would give us some idea how the model worked. (DS, violetData.csv)

Code

Lee and Daniels original Fortran code

Strategy and Plans

DS:
Someone will develop software to fit an ordinal transition model so that we can model every days data, not just day 14.
This model needs to be applied in simulations where the control data is from real data on patients with covid-19 (which I don’t have yet)
We need to see that the effect of treatment in the model is at plausible and that the power of using it is superior to using vent-free days or time to recovery as is being used elsewhere. I am not wedded to the use of a model if vent-free days or recovery free days or day 14 is just as good.
We come up with a Bayesian design that has two important properties:
- If we look at the data when it would make a decision to stop, it looks like we should stop, what people call face validity.
- If we followed it exactly it’s frequentist properties would be reasonable. My reason for considering a Bayesian design is that we may have to deviate from it, either to pool external data, change the maximum sample size, or change the endpoint, and I would use Bayesian posterior probabilities to justify the deviation to the DSMB.
The DSMB has always liked to have rules. Most DSMB’s do as it becomes very hard to really make a decision without guidelines. I was on a DSMB that stopped a trial when there wasn’t a stopping rule. There was a strong trend, I would even say high probability of harm(since I calculated it) it took three meetings to stop the trial while some kind of futility stopping rule would have allowed us to stop it immediately. If there are extraordinary reasons not to follow the rules DSMB’s can deviate and sometimes do as they see the totality of evidence and not just the primary endpoint, but if we send them a document that says do what you want they won’t accept it. In fact they didn’t because the last paragraph of the statistical plan said exactly that.

Bayes

I have been doing a lot of thinking about the issue of a Bayesian verses a frequentist design. I suggest we use the approach in the FDA guidelines above. Our problem is we have to do a trial that will provide evidence that will help the medical community answer an important question. I think this will be difficult if we have to give them a philosophy lesion on the differences between Bayesian and frequentist inference in order to convince them.
- Gregory Campbell at the FDA has done a lot of thinking on how to do Bayesian trials in the context of regulatory statistics. I think if we follow the FDA guidelines for Bayesian trials we can’t go too far wrong in convincing the medical community( and ourselves for that matter). As I understand it, we do simulations to show that the frequentist properties of a plan are reasonable, and we report these simulations in our paper. Our primary analysis is a Bayesian analysis of the data, which unless the trial stops very early will be independent of the prior we choose (which given our data on HCQ will be skeptical), this will be the analysis presented to the DSMB. If the situation requires it, we encourage the DSMB to deviate from the plan with the goal of making the best decision with all the data they have. The fact that we deviated from a pre-made plan will not be considered to be evidence about the treatment effect.
- The reason we calculate the frequentist probabilities of our plan is that we want to demonstrate that our plan was reasonable whatever you think about how to do inference. I realize that this doesn’t have the purity of either method of inference and has a certain level of ambiguity Mathematicians prove lemmas, Statisticians have dilemmas(Stephen Senn).
- Here is how I would frame the paradox in the two methods of thought.
- I think that the differences arise from the two different views of probability are irreconcilable. That is, the frequentist view is paradoxical because it doesn’t seem correct that if the experimenter could have done one of two experiments the analysis of the data should depend on what the experiment was that he didn’t do. The practical problem for us with it is, that we can’t continue the trial if the question isn’t answered once we reach 500 patients because we have used up all our alpha.
- On the other hand the Bayesian thinking is paradoxical in that if the experimenter is playing a game where he does 100 experiments and gives me one of them, my beliefs should ignore the fact of the game he was playing. In more practical terms, if he keeps running an experiment until he gets the results he wants, then the fact that he is playing this game is irrelevant.

Choice of Prior

DS
- Hierarchical models with non-informative priors. This is basically empirical Bayes, I used it in my paper on historically controlled trials. I don’t really consider this Bayesian statistics.
- Use of real priors that reflect prior beliefs. There are two ways to do this. One is to use your own prior after speaking to the experts and the second is to use a prior with only one parameter and graph the inference verses the prior so that each person who reads the paper can put in their own. Any paper that really describes the data, reflects the likelihood.
FH regarding DS 2nd category:
- expert elicitation (I don’t usually trust this, neither does David Spiegelhalter)
- relevant trustworthy literature
- skeptical prior making use of no previous data (especially good when there are no previous data on the drug-disease combination)
- borrowing information for relevant patient populations (as you did with borrowing adult information to inform pediatric trials)
- make the result interactive so that each reader inserts their own prior (this is being done more and more, but mainly for Bayesian re-analysis; I have some good links for this)
SB
- You may end up where the DATA that creates a success is the same with a skeptical prior and a flat prior. In the simulations this would off course come out. 99% from flat and 90% from skeptical may be the same ‘Bayes factor’.
- I don’t at all mind the skeptical prior, but I’m not sure it’s correct here… and you my paint yourself in a corner:
- In REMAP-CAP we have started with “informed” priors. For example, there are reports that Interferon and steroids combination may have a negative effect so we include a prior effect for the interaction in the models and the starting RAR. We could add priors that Keletra is positive given initial studies — though this was decided not to. What is my point? We want to reserve informed priors — for sure. For this reason we don’t want to start with skeptical priors as a default (and randomization starts biased against arms, more control). This does not at all seem like the right prior belief…. (Ignoring the easy Trump jokes here on HCQ).
- You will also be fighting science a little. A 90% probability of superiority when the trial stops will feel underwhelming. I get it — but it’s not me that consumes the final information.
- Love the interactive part Frank. I would also suggest that given your trial is narrow and is addressing a single question — you probably want the answer to that for most people to be the same answer.
DS
- I got the idea of a skeptical prior from Larry Friedman and Speiglehalters book on Bayesian Clinical Trials. I had presented it to my investigators in a talk on interim stopping where I also went over the frequentist methods. The basic idea was that a trial should be stopped for efficacy when the skeptic would be convinced of the results and for futility when the enthusiast would be convinced. Then the specification of the skeptical prior is to put the center at zero and the alternative at 1 or 1.6 SD away. When you do this the side effect is that you get frequentist properties that are not wildly different than one would get with a group sequential design. The advantage is however that it is flexible.
- I find it very hard to get a prior from people. Mostly they tell how well they think something might work, if it does work. I suppose I could use this with point prior at zero but then I can’t do the math. I don’t believe that the Bayesian paradigm works without a prior. Bayesian statistics is a way of updating belief with data and not a way of creating belief with data.
- Maybe we should not use a skeptical prior. Trump is touting the drug and stockpiling it. Maybe we should just look at 100 patients be reasonably sure it isn’t harmful and go on. “What have we got to lose”.
- The source code is there. What I really want is.
  - Someone to run the code with some different priors and or cut-offs and suggest that we use them. I would be glad to change it.
  - Someone to tell me that the naïve asymptotic I used are reasonable by doing on MCMC and getting an exact result.
  - Someone who is not a biostatistician to tell me it makes sense to them.
  - Covid data so that the endpoint and model can be better specified.
- FH: I forgot to respond to a very important point from David earlier. As David Spiegelhalter has found, it is very hard to get reliable priors by talking to experts. He evolved to only using the variability of their priors in formulating the variance, and ignoring their means. (The greatest fiascos he witnessed were in grossly overvalued chemotherapies). This is one reason I like skeptical priors as opposed to optimistic priors. This brings up the question of what type of expert should specify the prior. I think that statisticians need to have more say in this that is usually implied, but all the experts can weigh in once they see the tradeoffs (as in our graphs) between Bayesian power and getting impressed with positive results too easily.

Planned Deviations from Trial Design

DS: The primary advantage of a Bayesian monitoring plan is that whenever the trial is stopped the inference only depends on the data and not on what the original plan was. The following situations that might lead to a deviation from the plan are described
There are multiple trials of HCQ that will be conducted. We will be receiving reports of completed studies and we may be recieving interim reports of ongoing ones as well. We will incorporate this information using Bayesian methods( Schoenfeld, David A., Hui Zheng, and Dianne M. Finkelstein. “Bayesian design using adult data to augment pediatric trials.” Clinical Trials 6.4 (2009): 297-304.), which allows us to calculate posterior probabilities that use this information. The method basically weights the external data based on its relevance to the efficacy of HCQ in the population we are studying.
In addition there may be reasons to continue this trial past the 500 patients initially planned. For instance if the trial reaches the 500 patient look and the posterior probabilites indicate a reasonable chance of efficacy and the issue of HCQ is still an important one. The current design can be continued with the same stopping rules.

Data Simulation

Rajat Mukherjee, Cytel Corporation: After our discussion, we strongly feel that the longitudinal approach provides more clinical information, including the initial impact on patients and on the health systems. I am developing simulation routines for this and wanted to share my initial thoughts on the data generation process. The approach is using (current) state dependent multinomial probabilities for the next follow-up, with two absorbing states: death and not hospitalized (this may not be true as there could be some cases worsening after going home). The initial state (at day-0) is restricted to 2,3,4 and 5. If a patient dies before day-7 say, then that patients is still counted at state=1 at days 14 and 28. We assume these probabilities for the control arm and then use the OR (perhaps time specific in case of testing for interaction) to generate the treated arm. I find this approach more direct than using correlated ordinal data generation methods (e.g. SimCorMultRes::rmult.clm) or multi-state routines.

Checking Fit of PO Model

DS: The model is the log(P(Y>=y)-Log(Y<y)= Alpha_y+f(t) +B Z
- For each value of y we can plot Log(P(Y>=y))-Log(P(Y<y) against t(smoothed), and we can also plot the average of Alpha_y _+ f(t)+ B Z against t, it would even be good if this were built into a predict.lrm method for an lrm object. These two lines should be close to each other and not systematically different.
- My worry is that this plot gets way off towards the edges, I,e. near t=0 and t=28, If it is good for much of the region we could just concentrate on that region. Alternatively the primary nuisance function is a function of y and t given by alpha_y+f(t), With an 8 point scale this has dimension 8*28 clearly it need to be smoothed. Maybe there is a better smoothing then alpha_y+f(t) that fits the data better. One could do this with and without including people who had reached an absorbing state.
- I also was thinking about the problem of people without the full 28 days of follow up. If you are estimating using an MCMC don’t you simple model missing observations as additional parameters and let the MCMC take care of filling them in?
- Ritesh: I think you could do something similar except for your model the “predict” statement is a bit different. For each patient at each time you can predict the value of log(P(Y>=y)-Log(Y<y)
- From the model and their starting value. Then you can plot the same plot as for the Harrell model
Suppose it looks like this for placebo
- Day 1 .7, .1, .1
- Day 2. .6,.2,.2
- Day 3 . .4,.3,.3
- Day 4. .3. .4,.3
- This is very plausible people start very sick and get better but some don’t, So alpha_y Is there to allow variation in the proportions, i.e. If you had only one verctor Say .7,.1,.1. the alpha_y. would allow these proportions to be fit exactly. Now you add f(t) but f(t) doesn’t contain y, so what happens is that roughly speaking alpha_y is fit to the average vector of proportions, and then one is ASSUMING that you are successfully fixing up the difference between the days with a simple function of time alone that operates on the cumulative odds. This works fine with a binomial but maybe not when there are 4 categories. You need to test fit to this,
- Where you can’t test fit well is whether the treatment actually follows a proportional odds model. I think that is less important because the measured proportional odds treatment effect is a project of the true effect onto the proportional odds model. If you don’t have the model just right it only affects power .