Subject: Recurrent events

Date: January 15, 2025 at 11:51:52 PM CST

 

We had to review this for a trial because there was considerable disagreement among the investigators about which method to use for recurrent event analysis. I thought you might like to have this summary as a reference: pros and cons of the various analytic methods for recurrent events. It includes some regulatory considerations that may not be of interest to you.

Javed Butler

 

 

Poisson: Constant expected waiting time between recurrent events, assumed the same across participants and to follow the Poisson parametric model. Often seen for adverse events reported as exposure-adjusted incidence rates and incidence rate ratios (sum the events and divide by total exposure time, or divide the number of subjects with an event by the time up to an event). A rate-ratio code sketch for this and the negative binomial model follows the table.

Negative binomial: Constant expected waiting time between events, but each participant has their own constant, governed by an unobserved random variable (think of it as their own risk) termed a frailty. Inference is on the average waiting time and similarly gives rate ratios. Can be viewed as a relaxation of the Poisson assumptions to allow additional variability across participants.

Andersen-Gill: The gap times for recurrent events are modeled similarly to a Cox model, with covariates having a multiplicative effect on those times. Unlike the Poisson-type models, the times themselves are modeled rather than counts divided by exposure time.

Lin-Wei-Yang-Ying (LWYY): The model is the same as Andersen-Gill but relaxes an assumption the A-G model makes about the relationship between events. A-G assumes that prior events have no influence on future recurrence in the variance calculation; LWYY replaces that variance calculation with a “robust sandwich estimator” to address this untestable assumption. The result is that the LWYY variance estimator is larger than the A-G variance estimator, but likely appropriately so, and LWYY has largely replaced the use of A-G models.

Wei-Lin-Weissfeld (WLW): Instead of modeling the gap time between events, this models the time from study start to each event (time to first, time to second, time to third, …) and then creates a weighted aggregate of them. This used to be a popular method but has largely fallen out of favor because its target of estimation can be difficult to explain.

Nelson-Aalen: Estimator of the cumulative hazard function in the presence of censoring. Can also be used to produce a mean cumulative incidence plot similar to a Kaplan-Meier curve, except the y-axis is the average number of events per participant at a given time rather than the proportion of participants with an event. When the ratio of the arms' Nelson-Aalen curves is constant over time, LWYY will estimate that ratio well.

Ghosh and Lin: A competing-risk version of Nelson-Aalen that assumes no further events after death. This method flattens the Nelson-Aalen mean cumulative function by keeping deceased participants in the denominator rather than censoring them. There is an associated ratio model from Ghosh and Lin that summarizes this, similar to the Nelson-Aalen/LWYY relationship.
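
A minimal code sketch of the Poisson and negative binomial rate-ratio fits described above, assuming Python with statsmodels; the data frame, counts, and column names (n_events, followup_years, treat) are made up purely for illustration:

    import numpy as np
    import pandas as pd
    import statsmodels.api as sm

    # Hypothetical per-participant data: event count, exposure time, treatment arm
    df = pd.DataFrame({
        "n_events":       [0, 2, 1, 0, 3, 1, 0, 0, 2, 4, 1, 0],
        "followup_years": [1.9, 2.0, 1.5, 2.1, 1.8, 2.0, 0.9, 2.2, 1.7, 1.6, 2.0, 1.4],
        "treat":          [0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1],
    })
    X = sm.add_constant(df["treat"])

    # Poisson: one common rate for everyone; the exposure offset adjusts for
    # follow-up time, so exp(coefficient on treat) is the incidence rate ratio
    pois = sm.GLM(df["n_events"], X, family=sm.families.Poisson(),
                  exposure=df["followup_years"]).fit()

    # Negative binomial (NB2): adds a participant-level gamma frailty whose
    # variance (alpha) is estimated from the data; exp(coefficient on treat)
    # is again a rate ratio
    nb = sm.NegativeBinomial(df["n_events"], X,
                             exposure=df["followup_years"]).fit(disp=False)

    print("Poisson rate ratio:          ", np.exp(pois.params["treat"]))
    print("Negative binomial rate ratio:", np.exp(nb.params["treat"]))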

 

Choice between the negative binomial and LWYY: The negative binomial and LWYY often agree (exactly so if there are no missing data), but LWYY tends to behave somewhat better than the negative binomial in the presence of noninformative missingness (see the applicant's submission in the EMA qualification procedure on recurrent-event methods: https://www.ema.europa.eu/en/documents/other/qualification-opinion-treatment-effect-measures-when-using-recurrent-event-endpoints-applicants-submission_en.pdf). The LWYY model can also be shown to estimate a particular kind of estimand termed the “while alive exposure-weighted event rate ratio” (see https://www.tandfonline.com/doi/full/10.1080/19466315.2021.1994457#d1e280 beyond the EMA qualification documents on this).
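
To make the comparison concrete, here is a minimal sketch of an LWYY-type fit in Python, assuming a recent lifelines release in which CoxPHFitter.fit accepts entry_col (for the counting-process (start, stop] risk intervals) and cluster_col (for the subject-clustered sandwich variance); the data frame and column names are hypothetical, not from any trial:

    import pandas as pd
    from lifelines import CoxPHFitter

    # Hypothetical counting-process (start, stop] records: one row per at-risk
    # interval per participant; event = 1 if the interval ends in an HF event
    df = pd.DataFrame({
        "id":    [1, 1, 1, 2, 2, 3, 3, 4, 5, 5, 6],
        "start": [0.0, 0.8, 1.5, 0.0, 1.2, 0.0, 0.6, 0.0, 0.0, 0.9, 0.0],
        "stop":  [0.8, 1.5, 2.0, 1.2, 2.0, 0.6, 1.8, 2.0, 0.9, 1.7, 1.4],
        "event": [1,   1,   0,   1,   0,   1,   0,   0,   1,   0,   0],
        "treat": [1,   1,   1,   0,   0,   1,   1,   0,   0,   0,   1],
    })

    # Proportional-rates (Andersen-Gill style) fit on the calendar timescale;
    # cluster_col/robust give the LWYY sandwich variance across a participant's rows
    cph = CoxPHFitter()
    cph.fit(df, duration_col="stop", entry_col="start", event_col="event",
            cluster_col="id", robust=True)
    cph.print_summary()  # exp(coef) for treat is the recurrent event rate ratio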

 

Issue with informative censoring: Both the negative binomial and LWYY assume that censoring is noninformative. This can be a problematic assumption in HF studies, where a high death rate induces censoring and those deaths are often related to HF progression. This can be viewed as an informative missing data problem.

 

Issue with death as an event: In the negative binomial and LWYY models there is a choice about whether death (or CV death or HF death) should count as one of the “recurrent” events. In favor of counting it is that it potentially adds information on the same underlying process. Against it is that the drug's effect on mortality can differ from its effect on keeping participants out of the hospital or clinic, so inclusion could erode some of the effect. Mixing event types also complicates the interpretation.

 

Joint frailty models: Joint frailty models can be viewed as an attempt to handle the above two issues. They do this by writing one model for the recurrent events (often an A-G/LWYY type or a parametric model such as a Poisson) and another model for time to death (often a Cox model or a parametric model such as a Weibull), and then linking the two models with a shared participant-specific random variable (frailty). The frailty links the risks of the recurrent events to each other and to the risk of death: a participant who dies quickly would likely also have had HF events with shorter gap times had they not died, and a participant with repeated HF events separated by short gaps is at elevated risk of death. The result is an estimator for the HF-specific process (an HR if A-G-like, or an RR if Poisson-like) and an estimator of the hazard ratio for the death process. The variance of the HF effect estimate is generally improved because the correlated death-time information helps handle the missing data, and the HF effect shows less attenuation because the often weaker treatment effect on death is separated from the treatment effect on HF events.
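
For reference, one common specification of a joint frailty model (details vary across implementations, and this is not necessarily the exact form used in the trials cited below) links the two processes through a shared gamma frailty u_i and a link parameter gamma:

    \begin{aligned}
    r_i(t \mid u_i) &= u_i \, r_0(t) \, \exp(\beta^\top x_i) && \text{(recurrent HF events)} \\
    \lambda_i(t \mid u_i) &= u_i^{\gamma} \, \lambda_0(t) \, \exp(\alpha^\top x_i) && \text{(death)} \\
    u_i &\sim \mathrm{Gamma}(1/\theta,\ 1/\theta), \qquad E(u_i) = 1,\ \mathrm{Var}(u_i) = \theta
    \end{aligned}

Here exp(beta) is the HF event rate ratio and exp(alpha) the death hazard ratio, both conditional on the frailty; theta captures between-participant heterogeneity and gamma the strength of the link between the two processes.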

 

Some cons to the joint frailty models are:

1. Fitting them is more complex, and sometimes the specification fails to converge for a given dataset. This can be resolved by falling back to a parametric model that will converge.


2. The estimand for the recurrent HF event process is a hypothetical one: an underlying recurrent event process that cannot be fully observed due to the competing risk of death. In this respect it is no different from “time to first HF event”, but not having an explainable treatment-policy estimand without added assumptions likely relegates it to secondary endpoint analyses.


 

In trials so far, the joint frailty estimator of recurrent HF events has performed well, demonstrating the same or stronger effects on the HF event process than LWYY models. For example, see figure 3 for Entresto (https://pmc.ncbi.nlm.nih.gov/articles/PMC6607507/#ejhf1139-bib-0016) and for EMPEROR-Preserved (https://www.sciencedirect.com/science/article/pii/S0735109723063829?via%3Dihub#bib26), as well as earlier analyses of CHARM and CORONA. LWYY, however, has not in general outperformed time-to-first-event analysis in HF studies, as noted in the second publication cited above.

 

Between the Nelson-Aalen and Ghosh and Lin approaches, Nelson-Aalen more closely tracks the underlying HF event process, while Ghosh and Lin may better reflect how many events will happen in practice, which is useful for payers.
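
As a concrete illustration, here is a minimal sketch of the Nelson-Aalen-type mean cumulative function computed by hand in Python; the event and follow-up times are made up, and the function is mine, not from any package. A Ghosh-Lin-style estimate would instead keep deceased participants in the at-risk denominator after death, flattening the curve:

    import numpy as np

    def mean_cumulative_function(event_times, followup_times):
        """Nelson-Aalen-type mean cumulative function for recurrent events.

        event_times    : all observed event times, one entry per event
        followup_times : each participant's follow-up (censoring) time
        Returns (times, mcf): at each distinct event time, the estimated mean
        number of events per participant still under follow-up.
        """
        event_times = np.asarray(event_times, dtype=float)
        followup_times = np.asarray(followup_times, dtype=float)
        times = np.unique(event_times)
        # d_j = number of events at t_j; n_j = participants still followed at t_j
        d = np.array([(event_times == t).sum() for t in times])
        n_at_risk = np.array([(followup_times >= t).sum() for t in times])
        # A Ghosh-Lin style estimate would keep deceased participants in
        # n_at_risk after their death instead of removing them.
        return times, np.cumsum(d / n_at_risk)

    # Hypothetical example: 4 participants followed 2.0, 1.5, 2.0 and 1.0 years
    times, mcf = mean_cumulative_function(
        event_times=[0.3, 0.9, 1.2, 0.5, 1.8],
        followup_times=[2.0, 1.5, 2.0, 1.0],
    )
    print(np.column_stack([times, mcf]))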

 

FH Reply   2025-01-16

 

This is an excellent summary. But it omitted what, in my estimation, is the best way to handle recurrent events: longitudinal multi-state (state transition) models. The reason you see so many solutions to the recurrent events problem, many of them ad hoc and not derived from general statistical principles, is that the problem has been miscast as a time-to-event problem instead of the more natural approach, and a better fit to the data-generating process, of longitudinal current-status modeling. This has led to

 

Markov longitudinal ordinal models solve all of these problems as discussed in https://hbiostat.org/talks/ordmarkov3.html and https://hbiostat.org/endpoint.  Longitudinal current patient status analysis asks questions such as

 

Multistate models deal only with observables, e.g., estimate probabilities of

But estimation of mean time in a certain range of states is probably most clinically interesting.
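
As a rough sketch of the kind of model meant here (not the exact specification in the linked material), a first-order Markov proportional odds model regresses each day's ordinal state on treatment, time, and the previous day's state; the example below uses statsmodels' OrderedModel with made-up data and hypothetical column names:

    import numpy as np
    import pandas as pd
    from statsmodels.miscmodels.ordinal_model import OrderedModel

    # Hypothetical longitudinal records: one row per participant per follow-up day,
    # with state 0 = home/well, 1 = HF hospitalization, 2 = dead
    rng = np.random.default_rng(1)
    n, days = 60, 10
    df = pd.DataFrame({
        "id":    np.repeat(np.arange(n), days),
        "day":   np.tile(np.arange(1, days + 1), n),
        "treat": np.repeat(rng.integers(0, 2, n), days),
    })
    df["state"] = rng.choice([0, 1, 2], size=len(df), p=[0.85, 0.12, 0.03])

    # First-order Markov structure: carry forward the previous day's state
    df["prev_state"] = df.groupby("id")["state"].shift(1).fillna(0)

    # Proportional odds (cumulative logit) model for the current state
    endog = pd.Series(pd.Categorical(df["state"], categories=[0, 1, 2], ordered=True))
    mod = OrderedModel(endog, df[["treat", "day", "prev_state"]], distr="logit")
    res = mod.fit(method="bfgs", disp=False)
    print(res.summary())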

 

We have now applied this model in a number of trials. The most exotic use utilized daily angina frequency (penalized for the number of anti-anginal meds the patient is currently taking) and multiple severities of clinical events: https://www.sciencedirect.com/science/article/pii/S0735109724069481