---
theme: gaia
_class: lead
size: 16:9
style: |
  .small-text {
    font-size: 0.75rem;
  }
  /* Use this if a logo is wanted, and comment out the next section
      width and height match the natural aspect ratio ~ 11:4
  section::before {
    content: '';
    position: absolute;
    bottom: 0px;
    left: 0px;
    width: 225px; 
    height: 82px;
    background-image: url('https://hbiostat.org/img/vumc-logo.png');
    background-size: contain;
    background-repeat: no-repeat;
    background-position: center;
    z-index: 999;
  }  */
  
  section::before {
    content: 'Department of Biostatistics\AVanderbilt University School of Medicine';
    white-space: pre;
    position: absolute;
    bottom: 0px;
    left: 0px;
    font-size: 0.90rem;
    /* font-weight: bold; */
    color: blue;
    z-index: 999;
  }

  section.lead::before,
  section.nologo::before {
    display: none;
  }
  section.fullimage::before {
    display: none;   /* suppresses logo */
  }
  section.fullimage footer {
    display: none;   /* suppresses footer */
  }
paginate: true
backgroundColor: #fff
_paginate: false
marp: true
---

<!-- Usage: marp --html rct-questions.md

       To suppress the logo on one slide: put the following inside a standard
       HTML comment on its own line separated by blank lines:  _class: nologo
       
       To include a graphic on a slide by itself and allow it to take up the
       full space, put the following on a line by itself inside a comment and
       sep. by blank lines:  _class: fullimage
       Then include the image using e.g. ![bg fit](fig.svg) surrounded by blank lines.
       
-->


# Questions We Forget To Ask When Designing an RCT

Frank Harrell

<p class="small-text">Department of Biostatistics<br>
Vanderbilt University School of Medicine<br><br>
DIDACT Symposium
2026-04-16</p>

---

## Are You Sure Hypothesis Testing Is the Best Framework?

* Aren't questions more useful than hypotheses?
* Isn't estimating the **amount** of effectiveness the most relevant goal?
* What about basing $N$ and the statistical design on precision?
   + Stay tuned for Emily's presentation
* Or using a Bayesian design to compute P(benefit $> \epsilon$)?
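
---

## Sketch: Computing P(benefit > $\epsilon$)

A minimal sketch of the Bayesian quantity above, assuming a normal posterior for the treatment effect; the posterior mean, SD, and $\epsilon$ are made-up numbers for illustration, not from any real trial.

```python
# Sketch only: assumed normal posterior for the treatment effect;
# post_mean, post_sd, and eps are illustrative numbers, not real data
from scipy.stats import norm

post_mean, post_sd = 0.4, 0.2  # assumed posterior for the effect
eps = 0.1                      # threshold for a non-trivial effect

# Posterior probability that the effect exceeds eps
p_benefit = 1 - norm.cdf(eps, loc=post_mean, scale=post_sd)
print(round(p_benefit, 3))     # → 0.933
```

With a full Bayesian model the posterior would come from MCMC draws rather than a closed-form normal, but the decision quantity is the same one-liner.
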

---

## Do You Need to Demonstrate a Benefit ≥ MCID?

* Observed benefit will have to be non-trivially > MCID to declare success
* Will you consider instead demonstration of a benefit > $\epsilon$?
   + $\epsilon$ = threshold for trivial treatment effect or minimum observable treatment effect, e.g., $\frac{\text{MCID}}{2}$

---

## Are You Aware That Most Fixed N Designs End Equivocally?

* p > 0.05
  + Failed to generate sufficient evidence at the current N to refute the supposition that the treatment is ignorable, at the completely arbitrary $\alpha=0.05$ level
  + Wide confidence interval $\rightarrow$ we know no more than before the study
  + We mainly know the money was spent
  
---

## Equivocal Results, _continued_

* Equivocal results are the $2^\text{nd}$ most common RCT result
* What if randomizing 40 more patients would have resulted in definitive evidence?
* **Avoid** getting to planned study end without reaching a conclusion

---

## Do You Really Need a Fixed Sample Size?

* Will a sequential design work instead?
   + Does the disease/treatment lend itself to sequential trials?
   + Kelley Kidwell will be taking this a major step forward with SMART designs
* Frequentist group sequential design
   + Limited number of looks, fixed maximum $N$
* Bayesian sequential design
   + Unlimited looks, no fixed maximum $N$

---

## Do You Want to Possibly Stop Early for Futility?

* Fixed N designs ending with p > 0.05 at max $N$ typically could have stopped around $\frac{N}{3}$ with the same result
* More general to think of stopping early for inefficacy
* Inefficacy = effect $< \epsilon$, $\epsilon=$ trivial effect threshold
* Stopping for harm, zero benefit, or less than trivial benefit
* Much earlier stopping than using effect $< 0$
* See [this](https://hbiostat.org/bayes/design)
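
---

## Sketch: A Bayesian Inefficacy Stopping Check

The inefficacy rule above can be sketched in a few lines, assuming a normal interim posterior; the posterior numbers and the 0.9 decision threshold are illustrative assumptions, not a recommended rule.

```python
# Illustrative Bayesian inefficacy check: stop when the posterior
# probability that the effect is below the trivial threshold eps is
# high; the interim posterior numbers here are assumptions
from scipy.stats import norm

eps = 0.1                         # trivial-effect threshold
post_mean, post_sd = -0.1, 0.1    # assumed interim posterior

# P(effect < eps): covers harm, zero benefit, and trivial benefit
p_inefficacy = norm.cdf(eps, loc=post_mean, scale=post_sd)
stop_for_inefficacy = p_inefficacy > 0.9   # example decision rule
```

Because inefficacy is effect $< \epsilon$ rather than effect $< 0$, this probability crosses the threshold much earlier in a trial headed nowhere.
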

---

## Quiz

* How far along in an RCT can you have a $N(0,1)$ $z$ statistic $= 1$ and still have a good chance of ultimate success?
* Answer: $\frac{4}{10}$
* How far along in an RCT can you have treatment outcomes in the wrong direction ($z < 0$) and still have a good chance of ultimate success?
* Answer: $< \frac{1}{10}$
* See [Spiegelhalter 1993](https://hbiostat.org/bayes/bet/design#sequential-monitoring-and-futility-analysis)
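
---

## Sketch: Quiz Intuition via Conditional Power

A simplified way to get the flavor of the quiz: model the $z$ statistic as Brownian motion on the score scale and fix the drift at its interim estimate (a "current trend" conditional power calculation; Spiegelhalter's predictive approach averages over a prior instead, so these numbers are rougher than his).

```python
# Sketch: 'current trend' conditional power for final z > 1.96, given
# interim z at information fraction t; a simplification of the
# predictive calculations in Spiegelhalter 1993, not the exact method
import numpy as np
from scipy.stats import norm

def conditional_success(z_interim, t, zcrit=1.96):
    # B(t) = z_interim * sqrt(t) is the score-scale Brownian motion;
    # fix the drift at its interim estimate theta_hat = z_interim/sqrt(t)
    b_t = z_interim * np.sqrt(t)
    theta_hat = z_interim / np.sqrt(t)
    mean_b1 = b_t + theta_hat * (1 - t)   # E[B(1) | B(t), drift fixed]
    sd_b1 = np.sqrt(1 - t)
    return 1 - norm.cdf(zcrit, loc=mean_b1, scale=sd_b1)

# z = 1 at 40% information: success is still quite plausible
print(round(conditional_success(1, 0.4), 2))   # → 0.31
```

The same $z=1$ observed later in the trial leaves much less room for recovery, which is the point of the quiz.
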

---

## How Many Follow-Ups Can You Afford?

* What is the maximum number of follow-ups you can afford and patients will tolerate?
* Are you aware that longitudinal data make each patient contribute more than one patient's worth of information?
   + More dense longitudinal data → higher power
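
---

## Sketch: Information From Repeated Measurements

A back-of-envelope way to quantify the point above, assuming equally correlated within-patient measurements (compound symmetry with correlation $\rho$); the formula is a standard effective-sample-size approximation, and the numbers below are illustrative.

```python
# Effective number of independent observations one patient contributes
# with m equally correlated measurements (compound-symmetry rho);
# an assumed working model, for intuition rather than design
def effective_obs(m, rho):
    return m / (1 + (m - 1) * rho)

print(effective_obs(4, 0.5))   # 4 follow-ups, rho = 0.5 → 1.6
```

So even with strong within-patient correlation, denser follow-up buys real information, though with diminishing returns as $m$ grows.
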

---

## Primary Endpoint Considerations

* If there is only one clear primary endpoint, do you have a solid MCID for it?
* If you don't have a solid single MCID it's best to have an uncertainty distribution for MCID
* $\rightarrow$ Bayesian power / _assurance_
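
---

## Sketch: Assurance by Averaging Power

A hypothetical sketch of assurance: average the frequentist power of a two-sample $z$ test over an uncertainty distribution for the true effect, instead of assuming one fixed MCID. The sample size, test, and effect distribution below are all illustrative assumptions.

```python
# Assurance (Bayesian expected power): power averaged over an
# uncertainty distribution on the true standardized effect;
# all numbers here are illustrative, not from any real design
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(1)
n_per_arm = 100
zcrit = norm.ppf(0.975)          # two-sided alpha = 0.05
se = np.sqrt(2 / n_per_arm)      # SE of the difference, sigma = 1

# Uncertainty distribution for the true effect (MCID not pinned down)
effects = rng.normal(loc=0.3, scale=0.1, size=100_000)

power = 1 - norm.cdf(zcrit - effects / se)   # power at each drawn effect
assurance = power.mean()
```

Assurance is typically lower than the power computed at a single optimistic effect size, which is exactly the honesty it buys.
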

---

## Are You Aware That Binary Outcomes Have Minimum Information?

* RCTs with binary Y are larger and **still have lower power** than RCTs with continuous Y
  + See [van Zwet, Harrell, Senn 2026 Stat in Med](https://onlinelibrary.wiley.com/doi/10.1002/sim.70402)
* Better information, power, and interpretability come from [breaking ties in Y](https://fharrell.com/post/ordinal-info)
  + Time to first event violates PH and hides mixtures of event types
  + Quit ignoring deaths that occur after a first nonfatal event
  
---

## Multiple Important Outcomes: Key Questions

* What is your MCID for each outcome, for power calculations and interpretation?
* If different outcomes move in different directions, how do you know which treatment results in patients faring better?
* Can you translate "which treatment improves patient outcomes" to a solid analysis plan?

---

## Multiple Important Outcomes: Analysis Approaches

* Rank order severity of outcomes as of a given day — ask which treatment yields more days in better outcome states for patients
* If you can't rank outcomes, would you consider a general effectiveness assessment?
   + E.g. Bayesian P(treatment benefit on ≥ 2 outcomes out of 5) > 0.95
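
---

## Sketch: P(benefit on ≥ 2 of 5 Outcomes)

A toy version of the general effectiveness assessment above: given posterior draws of the treatment effects on 5 outcomes (simulated here from assumed normal posteriors, not a real analysis), the joint probability is a simple count over draws.

```python
# Toy sketch: P(benefit on >= 2 of 5 outcomes), benefit = effect > 0;
# posterior draws are simulated from assumed normal posteriors, so the
# means/SDs below are illustrative, not estimates from real data
import numpy as np

rng = np.random.default_rng(0)
n_draws = 20_000

# Assumed posterior means and SDs for the 5 outcome effects
means = np.array([0.3, 0.1, -0.05, 0.2, 0.0])
sds   = np.array([0.15, 0.1, 0.1, 0.2, 0.1])

draws = rng.normal(means, sds, size=(n_draws, 5))
p_ge2 = np.mean((draws > 0).sum(axis=1) >= 2)
```

In a real trial the draws would come from the joint posterior of one multivariate model, so between-outcome correlation is handled automatically.
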
   
---

## Are You Aware of Alternatives to Multiplicity Adjustments?

* Prioritization of hypotheses, pre-specification of reporting order: [Cook and Farewell 1996 JRSSA](https://www.jstor.org/stable/2983471)
* Raise the bar for assertions
   + Bayesian P(benefit on ≥ 2 outcomes)
   + Evidence for > 20% benefit on at least one outcome
   
---

## Summary

* Many RCTs are designed on a wing and a prayer and don't consider many uncertainties
* Resource waste is not envisioned at the start but is lamented at the end
* Avoid the frequentist multiplicity mess
* Trialists are averse to change; statisticians need to show leadership
* If you are content with the status quo, don't ask too many questions


