Overview of Composite Outcome Scales & Statistical Approaches for Analyzing Them

Frank Harrell

Expert Biostatistics Advisor
Department of Biostatistics
Vanderbilt University School of Medicine


Analytical Overview

  • Many sponsors are still using minimum-power responder analysis
    • Even more tragic in rare disease than in large RCTs
  • Power gained by using a high-resolution outcome scale
  • Further gain by using multiple scales
  • Further gain by using longitudinal data
  • Need to take death + other clinical events formally into account
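
The power loss from responder analysis can be sketched with textbook normal-approximation formulas. This is an illustration only: it assumes a normally distributed outcome with unit SD, uses a two-sample z-test on means as a stand-in for a high-resolution analysis, and the function names, effect size, sample size, and cutoff are all hypothetical.

```python
from math import sqrt
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal distribution


def power_continuous(delta, n_per_arm, alpha=0.05):
    """Approximate power of a two-sample z-test on the raw scale (SD assumed 1)."""
    z_crit = STD_NORMAL.inv_cdf(1 - alpha / 2)
    return STD_NORMAL.cdf(delta * sqrt(n_per_arm / 2) - z_crit)


def power_responder(delta, n_per_arm, cutoff, alpha=0.05):
    """Approximate power after dichotomizing: 'responder' means outcome > cutoff."""
    z_crit = STD_NORMAL.inv_cdf(1 - alpha / 2)
    p0 = 1 - STD_NORMAL.cdf(cutoff)          # responder rate, control arm
    p1 = 1 - STD_NORMAL.cdf(cutoff - delta)  # responder rate, active arm
    p_bar = (p0 + p1) / 2
    numerator = abs(p1 - p0) * sqrt(n_per_arm) - z_crit * sqrt(2 * p_bar * (1 - p_bar))
    return STD_NORMAL.cdf(numerator / sqrt(p0 * (1 - p0) + p1 * (1 - p1)))


# Hypothetical design: effect of 0.5 SD, 50 pts per arm, cutoff midway between arm means
pc = power_continuous(delta=0.5, n_per_arm=50)
pr = power_responder(delta=0.5, n_per_arm=50, cutoff=0.25)
print(f"continuous scale: {pc:.2f}   responder analysis: {pr:.2f}")
```

Under these assumptions the dichotomized analysis has noticeably lower power for the same sample size, which is the point of the first bullet above.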

Analytical Options


  • WIN ratio/odds and DOOR (desirability of outcome rankings) are longitudinal extensions of the Wilcoxon two-sample test
  • Gain power by breaking ties in time-to-event outcomes
  • Provide only a within-study estimand; no clinical-scale readouts
    • Estimand: Pr(randomly chosen pt on B has better outcome than randomly chosen pt on A)
    • Left unanswered: how much better?
    • Left unanswered: what is the outcome of pts on tx B?
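
A minimal sketch of the pairwise-comparison estimand, assuming a single ordinal outcome per patient with higher values better. It omits the hierarchical tie-breaking across components that real win ratio and DOOR analyses use; the function name and the data are hypothetical.

```python
from itertools import product


def win_statistics(outcomes_a, outcomes_b):
    """Compare every pt on B with every pt on A (higher value = better outcome)."""
    wins = losses = ties = 0
    for a, b in product(outcomes_a, outcomes_b):
        if b > a:
            wins += 1
        elif b < a:
            losses += 1
        else:
            ties += 1
    n_pairs = wins + losses + ties
    half_ties = 0.5 * ties  # each tie counts as half a win and half a loss
    return {
        "win_probability": (wins + half_ties) / n_pairs,
        "win_ratio": wins / losses if losses else float("inf"),
        "win_odds": (wins + half_ties) / (losses + half_ties)
        if (losses + half_ties) else float("inf"),
    }


# Hypothetical ordinal outcomes, for illustration only (not trial data)
tx_a = [1, 2, 2, 3]
tx_b = [2, 3, 3, 4]
stats = win_statistics(tx_a, tx_b)
print(stats)
```

All three summaries are unitless orderings of B relative to A. None of them says how much better B is on a clinical scale, which is the limitation listed above.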

WIN and DOOR, continued

  • The relative ordering estimand is influenced by narrowness of study inclusion criteria
  • Similar to reporting a t-statistic without reporting the difference in means
  • Very difficult to deal with missing component data + covariate adj.

Ordinal Longitudinal Models (OLM)

  • Extension of Wilcoxon test and Cox model to allow covariate adjustment + repeated measures
  • Most flexible form uses a Markov process
  • Demonstrated to handle within-pt serial correlation almost perfectly in multiple RCTs
  • Better modeling of intra-pt correlation increases effective sample size
  • Elegantly handles missing components + absorbing states precluding pt scale assessment
    • Death and need for rescue therapy accounted for

OLM, continued

  • Huge variety of clinical readouts
    • Pr(transitioning to state y at time t given in state y' at time t - 1)
    • Pr(being at severity y or worse as a function of time, tx)
    • Mean time in any set of states
    • Treatment difference in expected time in specified states (like time to recovery or time to loss of function)
  • Generalizes Wilcoxon test, Cox model, recurrent event analysis, and longitudinal analysis
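
A sketch of how transition probabilities turn into the readouts above. It assumes a first-order Markov chain with three states and fixed, made-up transition matrices. In an actual OLM the transition probabilities would come from a fitted ordinal model that depends on previous state, time, treatment, and covariates; nothing here is taken from the case study.

```python
def occupancy(initial, transition, n_times):
    """State occupancy probabilities at times 1..n_times for a first-order Markov chain."""
    probs = list(initial)
    n_states = len(probs)
    history = []
    for _ in range(n_times):
        # Pr(state j now) = sum over i of Pr(state i before) * Pr(i -> j)
        probs = [
            sum(probs[i] * transition[i][j] for i in range(n_states))
            for j in range(n_states)
        ]
        history.append(probs)
    return history


def mean_time_in_states(history, states):
    """Expected number of time units spent in the given set of states."""
    return sum(sum(p[s] for s in states) for p in history)


# States: 0 = recovered, 1 = sick, 2 = dead (absorbing). All numbers are hypothetical.
control = [[0.80, 0.15, 0.05], [0.20, 0.65, 0.15], [0.0, 0.0, 1.0]]
active = [[0.85, 0.12, 0.03], [0.30, 0.60, 0.10], [0.0, 0.0, 1.0]]
start = [0.0, 1.0, 0.0]  # every pt starts in the sick state

hist_control = occupancy(start, control, n_times=28)
hist_active = occupancy(start, active, n_times=28)

# Treatment difference in expected days recovered over 28 days
days_gained = mean_time_in_states(hist_active, [0]) - mean_time_in_states(hist_control, [0])
print(f"extra days recovered on active tx: {days_gained:.1f}")
```

Death is simply one more state with its own occupancy probability, so it enters every readout without any special handling.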

OLM, continued

  • OLM works for tx that improves pt condition as well as for tx for slowing progression
  • Detailed case study with complete R code at hbiostat.org/rmsc/markov
  • OB project underway to reanalyze an ALS trial using OLM

Time Savings (TS)

  • Dickson, Wessels, Dowsett, Mallinckrodt, Sparks, Chatterjee, Hendrix J Prev Alzheimer's 2023
  • Analyzes a single composite outcome measure or statistical summaries of separate measures
  • For degenerative disease, not for treatments that improve pts over their baseline state
  • Assumes follow-up time is sufficiently long that almost all control group pts fare poorly

TS, continued

  • TS is essentially t_a - t_c where
    • t_a is end of study for active tx pts
    • t_c is the time by which control pts do as poorly as active pts at t_a
  • Based on linear interpolation on estimated means
  • TS cannot account for death or need for rescue therapy
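
A sketch of the interpolation idea as described on this slide, not the published algorithm. Write t_a for the end of study and t_c for the time at which the control mean reaches the active arm's mean at t_a; both symbols, the function name, and the numbers are assumptions. Higher scores are taken to mean worse status.

```python
def time_saved(times, control_means, active_mean_at_end):
    """Time saved = t_a - t_c, with t_c found by linear interpolation of control means.

    Assumes higher score = worse and that control means worsen over time.
    Returns None if the control arm never reaches the active arm's final mean.
    """
    t_a = times[-1]
    target = active_mean_at_end
    segments = zip(times, times[1:], control_means, control_means[1:])
    for t0, t1, m0, m1 in segments:
        if m1 > m0 and m0 <= target <= m1:
            # Linear interpolation within the segment [t0, t1]
            t_c = t0 + (target - m0) * (t1 - t0) / (m1 - m0)
            return t_a - t_c
    return None


# Hypothetical estimated mean decline from baseline (months 0, 6, 12, 18)
months = [0, 6, 12, 18]
control = [0.0, 1.0, 2.0, 3.0]
ts = time_saved(months, control, active_mean_at_end=2.25)
print(ts)
```

Only arm-level means enter the calculation, so a patient who dies or needs rescue therapy has no natural place in it.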

Comparisons of TS and OLM

  • OLM assumes consensus in severity ordering of outcome states for a single assessment time
    • Uses only the worst condition suffered by the patient on a given day
  • Expect OLM to have greater power than TS due to
    • Fuller use of the raw data over the entire time course
    • Accounting for how close to failure active-arm pts were before the last follow-up

TS vs. OLM, continued

  • OLM allows for absorbing states/terminating events that preclude patient scale assessment
  • OLM explicitly accounts for deaths in an interpretable fashion
    • Contrast that with counterfactuals and competing risk analysis
  • TS cannot easily borrow information
  • OLM has been implemented in both frequentist and Bayesian models
  • OLM can formally assess how tx affects different outcome components

Composite Outcome Scales

  • Choice of scales is very important
  • Gold standard is pt utility for current status
  • OLM approximates the gold standard
  • Several ways to combine multiple scales
  • TS approach using global statistical summary
    • is not clinically interpretable at a given time
    • cannot handle deaths
    • difficult to handle missing component data

More Information

See hbiostat.org/doc/comp
