Vanderbilt University Department of Biostatistics Seminar 2020-11-18
Power/Sample Size
What Are the Most Common Outcomes of Clinical Trials?
Insufficient patient recruitment
Equivocal result
At planned sample size result not “significant”
Uncertainty intervals too wide to learn anything about efficacy
The money was spent
What Are the Most Common Causes of Equivocal Results?
Over-optimistic power calculation
using an effect size larger than the clinically relevant effect
if the true effect is “only” clinically relevant, the study is likely to miss it
Fixed sample size
Losing power by acting on the belief that α must be controlled when taking multiple data looks
Insensitive outcome measure
The Problem With Sample Size
Frequentist (traditional statistics): need a final sample size to know when α is “spent”
Fixed budgeting also requires a maximum sample size
Sample size calculations are voodoo
arbitrary α, β, effect to detect Δ
requires accurate σ or event incidence
Physics’ approach: experiment until you have the answer
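To make this concrete, the textbook normal-approximation formula for comparing two means (a sketch, assuming equal variances and 1:1 allocation) shows how every ingredient must be supplied in advance, and halving Δ quadruples n:

```latex
% Per-group sample size for a two-sided test comparing two means
% (equal variances, 1:1 allocation):
\[
n \;=\; \frac{2\,\bigl(z_{1-\alpha/2} + z_{1-\beta}\bigr)^{2}\,\sigma^{2}}{\Delta^{2}}
\]
% Example: alpha = 0.05, beta = 0.10, Delta = 0.5*sigma gives
% n = 2*(1.960 + 1.282)^2 / 0.5^2 ≈ 84.1, i.e. 85 patients per group.
```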
Is a Sample Size Calculation Needed?
Yes for budget, if fixed
No sample size needed at all if using a Bayesian sequential design
With Bayes, study extension is trivial and requires no α penalty
logical to recruit more patients if result is promising but not definitive
analysis of cumulative data after new data added merely supersedes analysis at planned study end
Is α a Good Thing to Control?
α = type I assertion probability
It is not the probability of making an error in acting as if a treatment works
It is the probability of making an assertion of efficacy (rejecting H0) when any assertion of efficacy would by definition be wrong (i.e., under H0)
α ⇑ as # of data looks ⇑
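A minimal Monte Carlo sketch of that inflation, under illustrative assumptions of my own (a one-sample two-sided z-test with known SD, applied at equally spaced looks, data generated under H0):

```python
# Type I assertion probability when a test at nominal alpha = 0.05
# is repeated at multiple equally spaced looks, with no true effect.
import numpy as np

rng = np.random.default_rng(1)
n_max, n_sims, zcrit = 500, 10_000, 1.96        # zcrit: two-sided 0.05 critical value

for n_looks in (1, 2, 5, 10, 25):
    looks = np.linspace(n_max // n_looks, n_max, n_looks, dtype=int)
    y = rng.normal(size=(n_sims, n_max))                        # data generated under H0
    z = np.cumsum(y, axis=1)[:, looks - 1] / np.sqrt(looks)     # z-statistic at each look
    p_assert = np.mean(np.any(np.abs(z) > zcrit, axis=1))
    print(f"{n_looks:3d} looks: P(assert efficacy at least once) = {p_assert:.3f}")
```

With a single look the printed probability is near the nominal 0.05; it climbs steadily as the number of looks grows, which is the inflation referred to above.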
Is α a Good Thing to Control?
continued
α includes the probability of things that might have happened
even if we have very strong evidence of efficacy at a given time, our design had the possibility of showing an efficacy signal at other times had efficacy=0
Bayesian methods deal with what did happen and not what might have happened
Is α a Good Thing to Control?
continued
Controlling α leads to conservatism when there are multiple data looks
Bayesian sequential designs: expected sample size at time of stopping study for efficacy/harm/futility ⇓ as # looks ⇑
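A minimal sketch of the expected-sample-size point, under simplifying assumptions of my own: one-sample normal data with known SD = 1, a true mean of 0.3, a flat prior, and stopping only for efficacy when P(mean > 0 | data) ≥ 0.95 (harm and futility stops omitted):

```python
# Expected sample size at stopping for a simple Bayesian sequential design,
# as a function of the number of equally spaced looks (maximum N fixed at 400).
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(2)
n_max, n_sims, true_mean, threshold = 400, 5_000, 0.3, 0.95

for n_looks in (1, 2, 5, 10, 40):
    looks = np.linspace(n_max // n_looks, n_max, n_looks, dtype=int)
    y = rng.normal(true_mean, 1.0, size=(n_sims, n_max))
    ybar = np.cumsum(y, axis=1)[:, looks - 1] / looks
    post_prob = norm.cdf(ybar * np.sqrt(looks))        # flat prior: P(mean > 0 | data so far)
    stopped = post_prob >= threshold
    n_stop = np.where(stopped.any(axis=1), looks[np.argmax(stopped, axis=1)], n_max)
    print(f"{n_looks:3d} looks: mean N at stopping = {n_stop.mean():6.1f}")
```

Earlier looks give earlier chances to stop, so the average sample size at stopping falls as the number of looks grows, while the evidence threshold applied at the moment of stopping stays the same.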
What Should We Control?
The probability of being wrong in acting as if a treatment works
This is one minus the Bayesian posterior probability of efficacy (probability of inefficacy or harm)
Controlled by the prior distribution (+ data, statistical model, outcome measure, sample size)
Example: P(HR < 1 | current data, prior) = 0.96 ⇒ P(HR ≥ 1) = 0.04
(inefficacy or harm)
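A minimal sketch of the kind of calculation behind such a statement, using a normal approximation to the log hazard ratio likelihood and a hypothetical skeptical prior (all numbers are illustrative):

```python
# Normal-normal conjugate update for the log hazard ratio, then
# P(HR < 1 | data, prior) as the posterior probability of efficacy.
import numpy as np
from scipy.stats import norm

loghr_hat, se = np.log(0.80), 0.13       # illustrative estimate of log HR and its SE
prior_mean, prior_sd = 0.0, 0.5          # hypothetical skeptical prior centered on no effect

post_var = 1.0 / (1.0 / prior_sd**2 + 1.0 / se**2)
post_mean = post_var * (prior_mean / prior_sd**2 + loghr_hat / se**2)

p_efficacy = norm.cdf(0.0, loc=post_mean, scale=np.sqrt(post_var))   # P(log HR < 0)
print(f"P(HR < 1 | data, prior) = {p_efficacy:.3f}")
print(f"P(HR >= 1 | data, prior) = {1 - p_efficacy:.3f}  (inefficacy or harm)")
```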
The Shocking Truth of Bayes
Control the reliability of evidence at the decision point
Being directional (no penalty for the possibility of making a claim for an increase in mortality)
Allowing for infinitely many data looks
Borrowing information when there is treatment effect modification (interaction)
Outcome Variable
Why Do Pivotal Cardiovascular Trials Need 6000-10000 Pts?
Because they use a low-information outcome: time until a binary event
They do not distinguish a small MI from death, and completely ignore a death occurring after a nonfatal MI
Need 462 events to estimate a hazard ratio to within a multiplicative margin of error of 1.20 (0.95 CI)
Need 384 patients to estimate a difference in means to within 0.2 SD (n = 96 for a crossover design)
Event incidence ⇓ (censoring ⇑) ⇒ power for time to event → power of binary outcome
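Both counts follow from standard normal-approximation margin-of-error arguments (a sketch; 1:1 allocation assumed):

```latex
% Hazard ratio: with 1:1 allocation, Var(log HR-hat) ≈ 4/D for D total events,
% so the 0.95 CI is HR-hat × exp(±1.96·2/√D).  A multiplicative margin of 1.20 needs
\[
\exp\!\left(\frac{1.96 \times 2}{\sqrt{D}}\right) \le 1.20
\;\Longrightarrow\;
D \ge \left(\frac{3.92}{\log 1.20}\right)^{2} \approx 462
\]
% Difference in means: with N/2 patients per arm, SE(difference) = σ√(4/N), so
\[
1.96\,\sigma\sqrt{\frac{4}{N}} \le 0.2\,\sigma
\;\Longrightarrow\;
N \ge \left(\frac{2 \times 1.96}{0.2}\right)^{2} \approx 384
\]
```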
Aside: Peri-Procedural MIs
Not helpful to do separate efficacy analyses of procedural and spontaneous MIs
Better outcome: continuous measure of myocardial damage due to MI regardless of cause, with death worse than any level
Can be accommodated by longitudinal ordinal outcome
General Outcome Attributes
Timing and severity of outcomes
Handle
terminal events (death)
non-terminal events (MI, stroke)
recurrent events (hospitalization)
Break the ties; the more levels of Y the better
fharrell.com/post/ordinal-info
Maximum power when there is only one patient at each level (continuous Y)
What is a Fundamental Outcome Assessment?
In a given week or day what is the severity of the worst thing that happened to the patient?
Expert clinician consensus of outcome ranks
Spacing of outcome categories irrelevant
Avoids defining additive weights for multiple events in the same week
Events can be graded, and common co-occurring events can be coded as the worse event
Fundamental Outcome,
continued
Can translate an ordinal longitudinal model to obtain a variety of estimates
time until a condition
expected time in state
Bayesian partial proportional odds model can compute the probability that the treatment affects mortality differently than it affects nonfatal outcomes
Model also elegantly handles partial information: at each day/week the ordinal Y can be left, right, or interval censored when a range of the scale was not measured
Examples of Longitudinal Ordinal Outcomes
0=alive 1=dead
censored at 3w: 000
death at 2w: 01
longitudinal binary logistic model OR ≅ HR
0=at home 1=hospitalized 2=MI 3=dead
hospitalized at 3w, rehosp at 7w, MI at 8w & stays in hosp, f/u ends at 10w: 0010001211
Examples,
continued
0-6 QOL excellent–poor, 7=MI 8=stroke 9=dead
QOL varies, not assessed in 3w but pt event free, stroke at 8w, death 9w: 12[0-6]334589
MI status unknown at 7w: 12[0-6]334[5-7]89
Better: {5,7} instead of [5-7] but no software
Can make the first 200 levels a continuous response variable, with the remaining levels representing clinical event overrides
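One convenient layout for such records is one row per patient per week; a minimal sketch using the 0–3 coding above (patients 2 and 3 are hypothetical additions for illustration):

```python
# Long-format layout of longitudinal ordinal outcomes:
# 0 = at home, 1 = hospitalized, 2 = MI, 3 = dead (coding from the example above)
import pandas as pd

records = {
    1: "0010001211",   # example above: hosp wk 3, re-hosp wk 7, MI wk 8, in hospital wks 9-10
    2: "000000",       # hypothetical: event-free, follow-up ends at 6 weeks
    3: "00123",        # hypothetical: hospitalized wk 3, MI wk 4, dead wk 5
}

rows = [
    {"id": pid, "week": week, "y": int(state)}
    for pid, seq in records.items()
    for week, state in enumerate(seq, start=1)
]
long = pd.DataFrame(rows)   # one row per patient-week, ready for a longitudinal ordinal model
print(long)
```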
Statistical Model
Proportional odds ordinal logistic model with covariate adjustment
Patient random effects (intercepts) handle intra-patient correlation
Extension of binary logistic model
Generalization of Wilcoxon-Mann-Whitney Two-Sample Test
No assumption about Y distribution for a given patient type
Does not use the numeric Y codes
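In symbols (notation mine), the model just described can be written as:

```latex
% For patient i at time t, outcome cutoff y, covariates X_i (including treatment),
% and random intercept U_i ~ N(0, σ_u²):
\[
\Pr(Y_{it} \ge y \mid X_i, U_i) \;=\;
\operatorname{expit}\bigl(\alpha_y + X_i \beta + U_i\bigr),
\qquad \operatorname{expit}(z) = \frac{1}{1 + e^{-z}}
\]
% Proportional odds: β is the same for every cutoff y; the partial proportional
% odds model lets selected covariates (e.g., treatment) have cutoff-specific effects.
```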
Interpretation
B:A odds ratio
P(B > A)
c-index; concordance probability ≅ OR^0.66 / (1 + OR^0.66) (worked example below)
fharrell.com/post/po – does not assume proportional odds!
Probability that Y=y or worse as a function of time and treatment
does assume PO but the partial PO model relaxes this
Bayesian partial PO model: compute posterior P(treatment affects death differently)
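A worked instance of the concordance approximation above (numbers chosen for illustration):

```latex
% A B:A odds ratio of 0.75 corresponds to a concordance probability of roughly
\[
\frac{0.75^{0.66}}{1 + 0.75^{0.66}} \;=\; \frac{0.827}{1.827} \;\approx\; 0.45
\]
```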
Covariate Adjustment
Covariate Adjustment
Has nothing to do with baseline balance across treatments
Has to do with outcome heterogeneity within a treatment group
Adjustment for strong baseline prognostic factors increases Bayesian and frequentist power for free
Does this by getting the outcome model more correct
Example: older patients die sooner; pts with poor baseline 6m walk test have poor post-treatment 6m walk test
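A minimal simulation sketch of that power gain, using a binary outcome, one strong standardized prognostic covariate, and illustrative coefficients of my choosing:

```python
# Power of the treatment comparison with and without adjustment for a strong
# baseline prognostic factor (binary outcome, logistic regression).
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(3)
n, n_sims = 400, 500
beta_treat, beta_age = 0.5, 1.5                 # illustrative true effects on the logit scale

wins_unadj = wins_adj = 0
for _ in range(n_sims):
    treat = rng.integers(0, 2, n).astype(float)
    age_z = rng.normal(size=n)                  # strong standardized prognostic factor
    p = 1.0 / (1.0 + np.exp(-(-0.5 + beta_treat * treat + beta_age * age_z)))
    y = rng.binomial(1, p)

    p_unadj = sm.Logit(y, sm.add_constant(treat)).fit(disp=0).pvalues[1]
    p_adj = sm.Logit(y, sm.add_constant(np.column_stack([treat, age_z]))).fit(disp=0).pvalues[1]
    wins_unadj += p_unadj < 0.05
    wins_adj += p_adj < 0.05

print(f"power, unadjusted:   {wins_unadj / n_sims:.2f}")
print(f"power, age-adjusted: {wins_adj / n_sims:.2f}")
```

The adjusted analysis should reject H0 more often even though randomization balances the covariate on average; the gain comes from modeling outcome heterogeneity, not from repairing imbalance.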
Covariate Adjustment,
continued
Provides estimates of efficacy for individual patients by addressing the fundamental clinical question:
If I compared two patients who have the same baseline variables but were given different treatments, by how much better should I expect the outcome to be with treatment B instead of treatment A?
Covariate Adjustment,
continued
Provides a correct basis for analysis of heterogeneity of treatment effect
subgroup analysis is very misleading and does not inherit proper covariate adjustment
subgroup analysis does not properly handle continuous variables
Don’t Compute Change from Baseline!
Change from baseline (post - pre) assumes
post is linearly related to pre
slope of post on pre is 1.0
ANCOVA assumes neither, so it is more efficient and extends to ordinal outcomes
post - pre is inconsistent with a parallel-group design
post - pre is manipulated by patient inclusion criteria and regression to the mean (RTTM)
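A minimal simulation sketch contrasting the two analyses, with the data-generating slope of post on pre deliberately set to 0.5 (all numbers illustrative):

```python
# Change-from-baseline analysis versus ANCOVA for a parallel-group trial.
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(4)
n, true_effect, slope = 200, 2.0, 0.5            # slope of post on pre deliberately != 1

pre = rng.normal(50, 10, n)
treat = rng.integers(0, 2, n)
post = 25 + slope * pre + true_effect * treat + rng.normal(0, 8, n)
df = pd.DataFrame({"pre": pre, "post": post, "treat": treat})

change = smf.ols("I(post - pre) ~ treat", data=df).fit()   # change-score analysis
ancova = smf.ols("post ~ treat + pre", data=df).fit()      # ANCOVA: adjust for baseline

print(f"change-score estimate: {change.params['treat']:.2f}  SE: {change.bse['treat']:.2f}")
print(f"ANCOVA estimate:       {ancova.params['treat']:.2f}  SE: {ancova.bse['treat']:.2f}")
```

Under randomization both estimates are unbiased, but the change-score analysis pays a precision penalty whenever the slope of post on pre is not 1; ANCOVA also extends directly to the ordinal models above.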
Take Home Messages
Take Home Messages
Don’t take sample sizes seriously; consider sequential designs with unlimited data looks and study extension
α is not a relevant quantity to “control” or “spend”
(unrelated to decision error)
Choose high-resolution high-information Y
Longitudinal ordinal Y is a general and flexible way to capture severity and timing of outcomes
Always adjust for strong baseline prognostic factors; don’t stratify by treatment in Table 1