General COVID-19 Therapeutics Trial Design

This document covers general design and analysis issues and recommendations that pertain to COVID-19 therapeutic randomized clinical trials. Experimental designs are specific to therapies and target patients and so are covered only briefly, with the exception of sequential designs, which are quite generally applicable. Some of the issues discussed here are described in more detail at where several sequential clinical trial simulation studies may also be found.

For reasons detailed below, we propose that the default design for COVID-19 therapeutic studies be a Bayesian sequential design using high-information, high-power ordinal outcomes, as overviewed in this video.

The material on selection and construction of outcome variables applies equally to Bayesian and traditional frequentist statistical analysis. The following Design section includes some general material and lists several advantages of continuous learning Bayesian sequential designs. Before discussing design choices, we summarize the most common statistical pitfalls in RCTs and contrast frequentist and Bayesian methods. This is important because we recommend that a Bayesian approach be used in the current fast-moving environment to maintain flexibility and accelerate learning.

Most Common Pitfalls in Traditional RCT Designs

The most common outcome of a randomized clinical trial is a p-value greater than some arbitrary cutoff. In this case, researchers who are aware that absence of evidence is not evidence of absence will conclude that the study is inconclusive (especially if the confidence interval for the treatment difference is wide). More commonly, the researchers or a journal editor will conclude that the treatment is ineffective. This should raise at least five questions.

  • What exactly is the evidence that the active treatment results in patient outcomes similar to those under control?
  • What was the effect size assumed in the power calculation, and was it greater than a clinically relevant effect? In other words, was the sample size optimistically small?
  • Did the outcome have the most statistical power among all clinically relevant outcomes?
  • What would have happened had the study been extended? Would it have gone from an equivocal result to a definitive result?
  • If the conclusion is “lack of efficacy”, could we have reached that conclusion with a smaller number of patients randomized by using a sequential design?
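The second question can be made concrete with a small calculation. The sketch below is an illustration only, assuming a continuous outcome analyzed with a two-sided two-sample z-test and a common standard deviation; the function name n_per_group and the default alpha and power values are hypothetical choices, not taken from any particular trial. It shows how strongly the required sample size depends on the assumed effect size: halving the effect roughly quadruples the number of patients needed per arm.

```python
from math import ceil
from statistics import NormalDist


def n_per_group(delta, sigma, alpha=0.05, power=0.90):
    """Approximate per-arm sample size for a two-sided, two-sample z-test.

    delta: assumed difference in means (the effect size "not to miss")
    sigma: assumed common standard deviation of the outcome
    Uses the normal-approximation formula
        n = 2 * ((z_{1-alpha/2} + z_{power}) * sigma / delta) ** 2
    """
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2)
    z_power = z.inv_cdf(power)
    return ceil(2 * ((z_alpha + z_power) * sigma / delta) ** 2)


if __name__ == "__main__":
    # Compare an optimistic effect with progressively more modest ones.
    for delta in (1.0, 0.5, 0.25):
        print(f"delta = {delta:4.2f} SD -> n per arm = {n_per_group(delta, 1.0)}")
```

A study budgeted for the largest effect in this loop would have far less than the nominal power if the true effect were one of the smaller ones, which is the sense in which a sample size can be optimistically small.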

Powering a study to detect a miracle when all that happened is a clinically important effect is unfortunately all too common. So is the use of inefficient outcome variables. Fixed sample size designs, though easy to understand and budget, are a key cause of wasted resources and all too frequently result in uninformative studies. The classical sample size calculation assumes a model, makes assumptions about patient-to-patient variability or event incidence (both of these assumptions are common to frequentist and Bayesian approaches), and then assumes an effect size “not to miss”. The effect size is usually overoptimistic in order to make the budget palatable. A continuously sequential Bayesian design allows one to run the study until

  • there is strong evidence for efficacy
  • there is moderately strong evidence for harm
  • there is moderately strong evidence for similarity
  • the probability of futility is high, e.g., the Bayesian predictive probability of success is low given the current data even if the study were to progress to the maximum affordable sample size
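The first three stopping conditions above can be sketched in code. This is a deliberately simplified illustration rather than the design recommended in this document: it assumes a binary bad outcome (such as death) instead of an ordinal outcome, independent Beta(1, 1) priors on each arm's risk, and hypothetical posterior probability cutoffs (0.95 for efficacy, 0.80 for harm and for similarity) and a hypothetical similarity margin of 0.02. The function names posterior_probs and decision are likewise made up for the example. The futility rule based on predictive probability is omitted to keep the sketch short.

```python
import random


def posterior_probs(events_t, n_t, events_c, n_c,
                    margin=0.02, draws=20000, seed=1):
    """Monte Carlo posterior probabilities of efficacy, harm, and similarity.

    Each arm's risk of the bad outcome gets an independent Beta(1, 1) prior,
    so the posterior is Beta(1 + events, 1 + non-events).
    """
    rng = random.Random(seed)  # fixed seed so repeated looks are reproducible
    efficacy = harm = similar = 0
    for _ in range(draws):
        p_t = rng.betavariate(1 + events_t, 1 + n_t - events_t)
        p_c = rng.betavariate(1 + events_c, 1 + n_c - events_c)
        diff = p_t - p_c
        efficacy += diff < 0           # treatment lowers risk
        harm += diff > 0               # treatment raises risk
        similar += abs(diff) < margin  # risks differ by less than the margin
    return efficacy / draws, harm / draws, similar / draws


def decision(events_t, n_t, events_c, n_c,
             eff_cut=0.95, harm_cut=0.80, sim_cut=0.80):
    """Apply the (hypothetical) stopping cutoffs at one look at the data."""
    p_eff, p_harm, p_sim = posterior_probs(events_t, n_t, events_c, n_c)
    if p_eff >= eff_cut:
        return "stop: efficacy"
    if p_harm >= harm_cut:
        return "stop: harm"
    if p_sim >= sim_cut:
        return "stop: similarity"
    return "continue"


if __name__ == "__main__":
    # One look at accumulating data: 10/200 events on treatment, 40/200 on control.
    print(decision(10, 200, 40, 200))
```

Because the posterior probabilities are simply recomputed from the data accumulated so far, the same function can be called after every patient or batch of patients. The higher cutoff for efficacy than for harm mirrors the distinction above between strong evidence and moderately strong evidence.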

The idea of experimenting until one has an answer, though routinely practiced by physicists, is underutilized in medicine.

A second major statistical pitfall is inflexibility. In the frequentist framework the design cannot be modified in reaction to changing disease or medical practice patterns after randomization begins, because one would then not know how to compute a p-value, which requires repeated identical sampling.

Contrasting Bayesian and Frequentist Statistical Methods

The Frequentist Paradigm