Subject: Recurrent events
Date: January 15, 2025 at 11:51:52 PM CST

We had to review this for a trial because there was considerable disagreement among the investigators on which method to use for recurrent event analysis. I thought you might like to have this summary as a reference: pros and cons of the various analytic methods for recurrent events. It also includes some regulatory considerations which might not be of interest to you.

Javed Butler
Model | Explanation
Poisson | Constant expected waiting time between recurrent events, shared across participants and assumed to follow the Poisson parametric model. Often seen for adverse events reported as exposure-adjusted incidence rates and incidence rate ratios (sum the events and divide by total follow-up time, or divide the number of subjects with an event by the time up to a first event).
Negative binomial | Constant expected waiting time between events, but each participant has their own constant governed by an unobserved random variable (think of it as their own risk) termed a frailty. Inference is on the average waiting time and similarly gives rate ratios. Can be viewed as a relaxation of the Poisson assumptions that allows additional variability across participants.
Andersen-Gill | The gap times between recurrent events are modeled similarly to a Cox model, with covariates having a multiplicative effect on those times. Unlike the Poisson models, the times themselves are modeled rather than counts divided by time.
Lin-Wei-Yang-Ying (LWYY) | The model is the same as Andersen-Gill but relaxes an assumption the A-G model makes about the relationship between events: A-G assumes that prior events have no influence on future recurrence in the variance calculation. LWYY changes the variance calculation to a “robust sandwich estimator” to address this untestable assumption. The result is that the LWYY variance estimator is larger than the A-G variance estimator, but likely appropriately so, and LWYY has largely replaced the use of A-G models.
Wei-Lin-Weissfeld (WLW) | Instead of modeling the gap times between events, this models the time from study start to each event (time to first, time to second, time to third, ...) and then creates a weighted aggregate of them. This used to be a popular method but has largely fallen out of favor since its target of estimation can be difficult to explain.
Nelson-Aalen | Estimator of the cumulative hazard function in the presence of censoring. Can also be used to produce a mean cumulative function plot similar to a Kaplan-Meier curve, except that the y-axis is the average number of events per participant at a given time rather than the proportion of participants with an event. When the ratio of the arms' Nelson-Aalen mean cumulative functions is constant over time, LWYY will estimate that ratio well.
Ghosh and Lin | A competing-risks version of Nelson-Aalen that assumes no further events after death. This method flattens the Nelson-Aalen mean cumulative function by keeping deceased participants in the denominator rather than censoring them. There is an associated ratio model from Ghosh and Lin that can be used to summarize this, analogous to the Nelson-Aalen/LWYY relationship.
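To make the first two rows concrete, here is a minimal Python sketch of exposure-adjusted incidence rates and the rate ratio via Poisson and negative binomial regression. Everything here is illustrative: the data are simulated, and the variable names (`treat`, `events`, `followup`) and rates are made-up assumptions, not from any trial.

```python
import numpy as np
import pandas as pd
import statsmodels.api as sm

# Hypothetical per-participant summary data: event count and follow-up time
rng = np.random.default_rng(1)
n = 400
treat = rng.integers(0, 2, n)                      # 1 = active, 0 = control
frailty = rng.gamma(shape=2.0, scale=0.5, size=n)  # participant-specific risk
followup = rng.uniform(0.5, 3.0, n)                # years of follow-up
events = rng.poisson(0.8 * np.exp(-0.3 * treat) * frailty * followup)

df = pd.DataFrame({"treat": treat, "events": events, "followup": followup})

# Crude exposure-adjusted incidence rates: sum of events / sum of follow-up time
sums = df.groupby("treat")[["events", "followup"]].sum()
rates = sums["events"] / sums["followup"]
print("incidence rates (events per person-year):\n", rates)
print("crude incidence rate ratio:", rates.loc[1] / rates.loc[0])

X = sm.add_constant(df["treat"])

# Poisson model: follow-up time enters through the `exposure` argument
pois = sm.GLM(df["events"], X, family=sm.families.Poisson(),
              exposure=df["followup"]).fit()

# Negative binomial model: same mean structure plus a dispersion (frailty) parameter
nb = sm.NegativeBinomial(df["events"], X, exposure=df["followup"]).fit(disp=False)

print("Poisson rate ratio:", np.exp(pois.params["treat"]))
print("Negative binomial rate ratio:", np.exp(nb.params["treat"]))
```

Because the simulated counts are Poisson draws mixed over a gamma frailty, the negative binomial mean structure matches the data-generating process while the Poisson model understates the between-participant variability.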
Choice between the negative binomial and LWYY: The negative binomial and LWYY often agree (exactly so if there are no missing data), but LWYY tends to behave somewhat better than the negative binomial in the presence of noninformative missingness (see a submission to EMA on qualification of methods for this: https://www.ema.europa.eu/en/documents/other/qualification-opinion-treatment-effect-measures-when-using-recurrent-event-endpoints-applicants-submission_en.pdf). The LWYY model can also be shown to estimate a particular kind of estimand termed the “while alive exposure-weighted event rate ratio” (see https://www.tandfonline.com/doi/full/10.1080/19466315.2021.1994457#d1e280 in addition to the EMA qualification documents).
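For a rough sketch of the LWYY flavor of analysis in the spirit of the table above (a Cox-type model on recurrent-event gap times with a cluster-robust sandwich variance), here is one way to approximate it in Python with lifelines. The data layout, variable names, and rates are all assumptions for illustration, and this is not a validated LWYY implementation; trial analyses typically use dedicated survival software.

```python
import numpy as np
import pandas as pd
from lifelines import CoxPHFitter

# Hypothetical long-format data: one row per gap between successive events
# (or between the last event and censoring), sharing a participant id.
rng = np.random.default_rng(2)
rows = []
for pid in range(300):
    treat = pid % 2
    frailty = rng.gamma(2.0, 0.5)
    t, cens = 0.0, rng.uniform(1.0, 3.0)            # censoring time in years
    while True:
        gap = rng.exponential(1.0 / (0.8 * np.exp(-0.3 * treat) * frailty))
        if t + gap >= cens:
            rows.append({"id": pid, "treat": treat, "gap": cens - t, "event": 0})
            break
        rows.append({"id": pid, "treat": treat, "gap": gap, "event": 1})
        t += gap

df = pd.DataFrame(rows)

# Cox model on the gap times; cluster_col forces the robust sandwich variance,
# which is the key LWYY modification described in the table.
cph = CoxPHFitter()
cph.fit(df, duration_col="gap", event_col="event", cluster_col="id",
        formula="treat")
cph.print_summary()
```

With complete follow-up the treatment coefficient here and the negative binomial rate ratio from the previous sketch should land close to each other, consistent with the agreement noted above.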
Issue with informative censoring: Both the negative binomial and LWYY assume that the censoring is noninformative. This can be a problematic assumption in HF studies with a high death rate inducing censoring, since the deaths are often related to HF progression. This can be viewed as an informative missing data problem.
Issue with death as an event: In the negative binomial and LWYY models there is a choice about whether death (or CV death or HF death) should count as one of the “recurrent” events. In favor of counting it: death is potentially more information on the same disease process. Against it: the effect of the drug on mortality can differ from its effect on keeping participants out of the hospital or clinic, and inclusion could erode some of the effect. Mixing event types also complicates the interpretation.
Joint frailty models: Joint frailty models can be viewed as an attempt to handle the above two issues. They do this by writing one model for the recurrent events (often an A-G/LWYY type or a parametric model such as a Poisson) and another model for time to death (often a Cox model or a parametric model such as a Weibull), and then linking the two models with a shared participant-specific random variable (frailty). The frailty links the risks of the recurrent events to each other and to the risk of death, such that a participant who dies quickly would likely also have had HF events with shorter gap times between them had they not died, and, similarly, a participant with repeated HF events separated by short gaps is at elevated risk of death. The result is an estimator for the HF-specific process (an HR if A-G-like, or an RR if Poisson-like) and an estimator of the hazard ratio for the death process. The variance of the HF hazard ratio is generally improved because the correlated death-time information helps with the missing data, and the effect shows less attenuation because the often weaker treatment effect on death is separated from the treatment effect on HF events.
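To make the shared-frailty linkage concrete, here is a small simulation sketch (numpy only; the rates, treatment effects, and frailty distribution are made-up assumptions) in which one gamma frailty multiplies both the recurrent HF event rate and the death hazard, so participants with more frequent HF events also tend to die sooner.

```python
import numpy as np

rng = np.random.default_rng(3)

def simulate_participant(treat, admin_censor=3.0):
    """Simulate one participant under a shared gamma frailty.

    The same frailty multiplies the recurrent HF event rate and the death
    hazard, which is the linkage the joint frailty model exploits.
    (All rates and effects here are illustrative assumptions.)
    """
    frailty = rng.gamma(shape=2.0, scale=0.5)
    hf_rate = 1.0 * np.exp(-0.3 * treat) * frailty       # HF events per year
    death_rate = 0.15 * np.exp(-0.1 * treat) * frailty   # deaths per year
    death_time = rng.exponential(1.0 / death_rate)
    end = min(death_time, admin_censor)

    # Recurrent HF event times up to death or administrative censoring
    n_hf, t = 0, 0.0
    while True:
        t += rng.exponential(1.0 / hf_rate)
        if t >= end:
            break
        n_hf += 1
    return {"treat": treat, "died": death_time <= admin_censor,
            "followup": end, "n_hf_events": n_hf}

sims = [simulate_participant(treat=i % 2) for i in range(5000)]

for arm in (0, 1):
    sub = [s for s in sims if s["treat"] == arm]
    events = sum(s["n_hf_events"] for s in sub)
    years = sum(s["followup"] for s in sub)
    deaths = sum(s["died"] for s in sub)
    print(f"arm {arm}: HF rate {events / years:.2f}/yr, "
          f"death rate {deaths / years:.3f}/yr")
```

A joint frailty analysis would then estimate the treatment effects on both processes while modeling the shared frailty; as the cons below note, that fitting step is the hard part.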
Some cons of the joint frailty models are:
1. Fitting them is more complex, and for some data the specification fails to converge. This can be resolved by falling back to a parametric model that will converge.
2. The estimand for the recurrent HF event process is a hypothetical one: an underlying recurrent event process that cannot be fully observed due to the competing risk of death. In this respect it is no different from “time to first HF event,” but the lack of an explainable treatment policy estimand without added assumptions likely relegates it to secondary endpoint analyses.
In trials so far, the joint frailty estimator of recurrent HF events has performed well, demonstrating the same or stronger effects on the HF event process than LWYY models. For example, see Figure 3 for Entresto (https://pmc.ncbi.nlm.nih.gov/articles/PMC6607507/#ejhf1139-bib-0016), as well as EMPEROR-Preserved (https://www.sciencedirect.com/science/article/pii/S0735109723063829?via%3Dihub#bib26), and earlier analyses of CHARM and CORONA. LWYY, however, has not in general outperformed the time-to-first-event analysis in HF studies, as noted in the second publication cited in the prior sentence.
Between the Nelson-Aalen and Ghosh and Lin approaches, the Nelson-Aalen more closely matches the underlying HF event process, while the Ghosh and Lin may better reflect how many events would happen in practice, which is of use to payers.
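As an illustration of the difference, here is a short numpy sketch of a Nelson-Aalen-style mean cumulative function for recurrent events, alongside the simplified Ghosh-Lin-style variant described above in which deceased participants stay in the denominator. The data are simulated, the variable names are assumptions, and real analyses would handle non-death censoring more carefully than this.

```python
import numpy as np

rng = np.random.default_rng(4)

# Hypothetical per-participant data: planned follow-up, a possible earlier
# death, and recurrent HF event times (all values made up for illustration).
n = 500
fu = rng.uniform(1.0, 3.0, n)                 # planned (administrative) follow-up
death = rng.exponential(4.0, n)               # death time; may exceed follow-up
end = np.minimum(fu, death)                   # observed follow-up time
hf_events = np.concatenate(
    [np.sort(rng.uniform(0, e, rng.poisson(1.2 * e))) for e in end])

def mean_cumulative_function(event_times, at_risk_until):
    """At each event time, add 1 / (number still at risk at that time)."""
    times = np.sort(event_times)
    n_at_risk = np.array([np.sum(at_risk_until >= t) for t in times])
    return times, np.cumsum(1.0 / n_at_risk)

def value_at(t, times, mcf):
    """Step-function value of the MCF at time t."""
    idx = np.searchsorted(times, t, side="right")
    return 0.0 if idx == 0 else mcf[idx - 1]

# Nelson-Aalen-style MCF: participants leave the risk set at death or censoring.
t_na, mcf_na = mean_cumulative_function(hf_events, at_risk_until=end)

# Simplified Ghosh-Lin-style MCF: deceased participants stay in the denominator
# (no further events after death), which flattens the curve.
t_gl, mcf_gl = mean_cumulative_function(hf_events, at_risk_until=fu)

print("mean events per participant by 2 years:",
      round(value_at(2.0, t_na, mcf_na), 2), "(Nelson-Aalen-style) vs",
      round(value_at(2.0, t_gl, mcf_gl), 2), "(Ghosh-Lin-style)")
```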
FH Reply 2025-01-16
This is an excellent summary, but it omitted what is, in my estimation, the best way to handle recurrent events: longitudinal multi-state (state transition) models. The reason you see so many solutions to the recurrent events problem, with so many of them ad hoc and not derived from general statistical principles, is that the problem has been miscast as a time-to-event problem instead of the more natural approach, and a better fit to the data-generating process, of longitudinal current status modeling. This has led to the proliferation of methods summarized above.
Markov longitudinal ordinal models solve all of these problems, as discussed in https://hbiostat.org/talks/ordmarkov3.html and https://hbiostat.org/endpoint.
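As a minimal sketch of the idea, and not the exact models described in those links, a first-order Markov proportional odds model can be fit by regressing today's ordinal state on yesterday's state plus treatment. The code below uses statsmodels' OrderedModel on simulated daily data; the three states (0 = well, 1 = symptomatic, 2 = hospitalized), coefficients, and cutpoints are made-up assumptions.

```python
import numpy as np
import pandas as pd
from statsmodels.miscmodels.ordinal_model import OrderedModel

rng = np.random.default_rng(5)

# Simulate daily ordinal states 0 < 1 < 2 from a first-order Markov
# proportional odds mechanism (all parameter values are illustrative).
def simulate_patient(treat, n_days=90, b_prev=1.5, b_treat=-0.5, cuts=(1.0, 3.0)):
    states = [0]
    for _ in range(n_days):
        eta = b_prev * states[-1] + b_treat * treat
        u = rng.uniform()
        p_le0 = 1 / (1 + np.exp(-(cuts[0] - eta)))   # P(state <= 0)
        p_le1 = 1 / (1 + np.exp(-(cuts[1] - eta)))   # P(state <= 1)
        states.append(0 if u < p_le0 else (1 if u < p_le1 else 2))
    return states

rows = []
for pid in range(200):
    treat = pid % 2
    s = simulate_patient(treat)
    for day in range(1, len(s)):
        rows.append({"id": pid, "treat": treat,
                     "prev_state": s[day - 1], "state": s[day]})
df = pd.DataFrame(rows)

# First-order Markov proportional odds fit: today's state given yesterday's
# state and treatment.  (Real analyses would also include time, baseline
# covariates, an absorbing death state, and clustering by patient.)
state_cat = df["state"].astype(pd.CategoricalDtype(categories=[0, 1, 2],
                                                   ordered=True))
mod = OrderedModel(state_cat, df[["prev_state", "treat"]], distr="logit")
res = mod.fit(method="bfgs", disp=False)
print(res.summary())
```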
Longitudinal current patient status analysis asks questions such as: what is the probability that a patient is in a given clinical state on a given day? Multistate models deal only with observables, e.g., they estimate probabilities of state occupancy over time.
But estimation of mean time in a certain range of states is probably most clinically interesting.
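That calculation is simple once state occupancy probabilities are available (for instance, from forward recursion over a fitted transition model): the expected number of days spent in a set of states is the sum over days of the probability of occupying one of those states. A toy sketch, with placeholder occupancy matrices standing in for model output:

```python
import numpy as np

# Hypothetical state occupancy probabilities: rows = days 1..90, columns =
# states (0 = well, 1 = symptomatic, 2 = hospitalized), one matrix per arm.
# In practice these would come from the fitted Markov transition model.
rng = np.random.default_rng(6)
days = 90
occ_control = rng.dirichlet([8.0, 1.5, 0.5], size=days)   # placeholder values
occ_active = rng.dirichlet([9.0, 1.0, 0.3], size=days)    # placeholder values

def expected_days_in_states(occupancy, states):
    """Expected days in the given states = sum over days of the probability
    of occupying one of those states on that day."""
    return occupancy[:, list(states)].sum()

good_states = {0}   # e.g., days well (alive and out of hospital)
print("expected days well, control:",
      round(expected_days_in_states(occ_control, good_states), 1))
print("expected days well, active: ",
      round(expected_days_in_states(occ_active, good_states), 1))
```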
We have now applied this model in a number of trials. The most exotic use of it utilized daily angina frequency (penalized for the number of anti-anginal meds the patient is currently taking) and multiple severities of clinical events: https://www.sciencedirect.com/science/article/pii/S0735109724069481