Vanderbilt Translational Research Forum

Edge for Scholars 2021-11-04

- I.e., can you act like a physicist?
- Fully sequential trials are almost never used, and may be more useful than adaptive trials
- What keeps a researcher from collecting data until there is sufficient evidence in one direction, or until she runs out of time, money, or patience?

- Traditional frequentist statistics
- Fixed budgets
- NIH project budgets vs. portfolio budgets

- Wouldn’t it be more logical to
  - direct more resources to promising studies?
  - direct resources away from unpromising studies (based on futility analysis)?

- How surprising is our data if the treatment has no effect?
- The probability of all possible levels of efficacy?

- If you like the first, you like the frequentist status quo
  - p = P(data **more** impressive than ours if treatment is ignorable)
- If you like the second, you have Bayesian tendencies & like posterior probabilities
  - P(unknown effect > c | data, prior information)
  - E.g. P(BP reduction > 0 mmHg), P(BP reduction > 5 mmHg), P(similarity) = P(reduction between -3 and 3 mmHg)
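A minimal sketch of the three posterior probabilities listed above, assuming the posterior for the BP reduction is approximately normal. The posterior mean (4 mmHg) and SD (2.5 mmHg) are invented for illustration and do not come from any study.

```python
from statistics import NormalDist

# Hypothetical normal posterior for the BP reduction (mmHg); illustrative values only
posterior = NormalDist(mu=4.0, sigma=2.5)

def prob_greater(c):
    """P(BP reduction > c | data, prior) under the assumed posterior."""
    return 1.0 - posterior.cdf(c)

def prob_between(lo, hi):
    """P(lo < BP reduction < hi | data, prior) under the assumed posterior."""
    return posterior.cdf(hi) - posterior.cdf(lo)

p_any = prob_greater(0.0)             # P(BP reduction > 0 mmHg)
p_large = prob_greater(5.0)           # P(BP reduction > 5 mmHg)
p_similar = prob_between(-3.0, 3.0)   # P(similarity)

print(f"P(>0) = {p_any:.3f}, P(>5) = {p_large:.3f}, P(similarity) = {p_similar:.3f}")
```

All three statements are read off the same posterior, so any number of them can be reported without a multiplicity adjustment.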

- Insufficient patient recruitment
- Equivocal result
- At planned sample size result not “significant”
- Uncertainty intervals too wide to learn anything about efficacy
- The money was spent

- Over-optimistic power calculation
- using an effect size > clinically relevant
- if effect is “only” clinically relevant, likely to miss it
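The cost of an over-optimistic effect size can be sketched with the usual normal-approximation power formula for comparing two means. The SD, the "clinically relevant" difference, and the optimistic planning value below are all hypothetical numbers chosen for illustration.

```python
from statistics import NormalDist

_norm = NormalDist()

def n_per_group(delta, sigma, alpha=0.05, power=0.9):
    """Per-group n for a two-sided two-sample z-test (normal approximation)."""
    z_a = _norm.inv_cdf(1 - alpha / 2)
    z_b = _norm.inv_cdf(power)
    return 2 * ((z_a + z_b) * sigma / delta) ** 2

def power_at(delta, sigma, n, alpha=0.05):
    """Approximate power at true difference delta with n per group,
    ignoring the negligible chance of rejecting in the wrong direction."""
    z_a = _norm.inv_cdf(1 - alpha / 2)
    se = sigma * (2 / n) ** 0.5
    return _norm.cdf(delta / se - z_a)

sigma = 10.0           # hypothetical outcome SD
relevant = 4.0         # hypothetical clinically relevant difference
planned = 6.0          # hypothetical optimistic effect used in the power calculation

n = n_per_group(planned, sigma)
power_planned = power_at(planned, sigma, n)
power_relevant = power_at(relevant, sigma, n)
print(f"n per group = {n:.0f}, power at planned effect = {power_planned:.2f}, "
      f"power at relevant effect = {power_relevant:.2f}")
```

With these made-up inputs the study is sized for 0.9 power at the optimistic effect but has only about 0.58 power if the true effect is the clinically relevant one.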

- Fixed sample size
- Losing power in the belief that α is something that should be controlled when taking multiple data looks
- Insensitive outcome measure

- Frequentist (traditional statistics): need a final sample size to know when α is “spent”
- Fixed budgeting also requires a maximum sample size
- Sample size calculations are **voodoo**
  - arbitrary α, β, effect to detect Δ
  - requires accurate σ or event incidence

- Physics approach: experiment until you have the answer

- Yes for budget, if fixed
- No sample size needed at all if using a Bayesian sequential design
- With Bayes, study extension is trivial and requires no α penalty
  - logical to recruit more patients if result is promising but not definitive, e.g. 0.7 < P(benefit) < 0.95
  - analysis of cumulative data after new data added merely supersedes analysis of previous dataset
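A toy sketch of a Bayesian sequential design with no planned sample size: data arrive in batches, the posterior is recomputed on the cumulative data, and the study stops only when the evidence is decisive or the budget runs out. The normal outcome with known SD, the skeptical normal prior, the batch size, and both stopping thresholds are assumptions made for this illustration.

```python
import random
from statistics import NormalDist

def prob_benefit(values, sigma, prior_sd):
    """P(effect > 0 | data, prior) for a normal mean with known sigma and a
    skeptical N(0, prior_sd^2) prior (both are assumptions of this sketch)."""
    n = len(values)
    precision = 1.0 / prior_sd ** 2 + n / sigma ** 2
    post_mean = (sum(values) / sigma ** 2) / precision
    post_sd = precision ** -0.5
    return 1.0 - NormalDist(post_mean, post_sd).cdf(0.0)

def run_sequential(true_effect, sigma, prior_sd=1.0, batch=20, max_n=2000,
                   upper=0.95, lower=0.05, seed=1):
    """Add batches until P(benefit) is decisive or the budget (max_n) is spent.
    Each look simply re-analyzes the cumulative data; nothing is 'spent'."""
    rng = random.Random(seed)
    data = []
    p = 0.5
    while len(data) < max_n:
        data.extend(rng.gauss(true_effect, sigma) for _ in range(batch))
        p = prob_benefit(data, sigma, prior_sd)
        if p >= upper:
            return "efficacy", len(data), p
        if p <= lower:
            return "inefficacy or harm", len(data), p
    return "budget exhausted", len(data), p

status, n_used, p_final = run_sequential(true_effect=0.3, sigma=1.0)
print(status, n_used, round(p_final, 3))
```

Extending the study in this sketch amounts to raising `max_n` and continuing the loop; the posterior from the larger dataset replaces the earlier one.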

- α = type I assertion probability = P(p < 0.05 | H\(_0\)) typically
- It is **not** the probability of making an error in acting as if a treatment works
- It is the probability of making an assertion of efficacy (rejecting H\(_0\)) when **any assertion of efficacy** would by definition be wrong (i.e., under H\(_0\))
- α ⇑ as # data looks ⇑
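The growth of α with the number of data looks can be checked by simulation. This is a simplified sketch, not a model of any particular trial: it assumes a one-sample z-test on N(0, 1) data with known SD, equally spaced looks, and an unadjusted 0.05 threshold at every look.

```python
import random
from statistics import NormalDist

def assertion_probability(n_looks, per_look=20, n_sims=5000, alpha=0.05, seed=42):
    """Estimate P(at least one look gives two-sided p < alpha | no effect)."""
    rng = random.Random(seed)
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    hits = 0
    for _ in range(n_sims):
        total, n = 0.0, 0
        for _ in range(n_looks):
            # Sum of per_look independent N(0, 1) observations is N(0, per_look)
            total += rng.gauss(0.0, per_look ** 0.5)
            n += per_look
            z = total / n ** 0.5   # z-statistic on the cumulative data
            if abs(z) > z_crit:
                hits += 1          # an assertion of efficacy/harm was made
                break
    return hits / n_sims

one_look = assertion_probability(1)
ten_looks = assertion_probability(10)
print(f"1 look: {one_look:.3f}   10 looks: {ten_looks:.3f}")
```

With a single look the estimate sits near 0.05; with ten unadjusted looks it is several times larger, which is the inflation that frequentist sequential methods pay for with stricter thresholds.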

- α includes the probability of things that *might have happened*
  - even if we have very strong evidence of efficacy at a given time, our design had the *possibility* of showing an efficacy signal at other times had efficacy=0
- Bayesian methods deal with *what did happen* and not *what might have happened*
- Analogy: using α is like judging a gambler by the proportion of games in which she places a bet. Instead, we judge by the proportion of times she won when she placed a bet

- Another analogy: using α is like a trial judge who brags about the low fraction of innocent defendants he convicts; Bayesian probs. are P(current defendant is guilty)
- Controlling α leads to conservatism when there are multiple data looks
- Bayesian sequential designs: expected sample size at time of stopping study for efficacy/harm/futility ⇓ as # looks ⇑

- The probability of being wrong in acting as if a treatment works
- This is one minus the Bayesian posterior probability of efficacy (probability of inefficacy or harm)
- Controlled by the prior distribution (+ data, statistical model, outcome measure, sample size)
- Example: P(HR < 1 | current data, prior) = 0.96 ⇒ P(HR ≥ 1) = 0.04 (inefficacy or harm)

- It controls the reliability of evidence **at the decision point**
- **Not** the pre-study tendency for data extremes under an unknowable assumption
- Simulation examples: bit.ly/bayesOp

- Not having an α penalty
- Being directional (no penalty for possibility of making a claim for an **increase** in mortality)
- Allowing for infinitely many data looks
- Borrowing information when there is treatment effect modification (interaction)

- Because they use low information outcome: time to binary event
- Do not distinguish a small MI from death and completely ignore death after a nonfatal MI
- Need 462 **events** to estimate a hazard ratio to within a factor of 1.20 (from 0.95 CI)
- Need 384 **patients** to estimate a difference in means to within 0.2 SD (n = 96 for Xover design)
- Event incidence ⇓ (censoring ⇑) ⇒ power for time to event = power of binary outcome
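The 462, 384, and 96 figures can be reproduced from standard large-sample approximations. The formulas below are my reconstruction rather than something stated on the slide: var(log HR) ≈ 4/events with 1:1 allocation, two equal parallel groups for the difference in means, and a within-patient correlation of 0.5 for the crossover design (an assumption chosen because it reproduces n = 96).

```python
from math import log
from statistics import NormalDist

Z = NormalDist().inv_cdf(0.975)   # multiplier for a 0.95 confidence interval

def events_for_hr(fold):
    """Total events so the 0.95 CI for the HR runs from estimate/fold to
    estimate*fold, assuming 1:1 allocation and var(log HR) ~ 4/events."""
    return 4 * (Z / log(fold)) ** 2

def patients_for_mean_diff(margin_sd):
    """Total patients (two equal parallel groups) so the 0.95 CI half-width
    for a difference in means is margin_sd SDs: SE = 2*sigma/sqrt(n)."""
    return (2 * Z / margin_sd) ** 2

def patients_for_crossover(margin_sd, rho=0.5):
    """Patients in a two-period crossover; within-patient difference has
    variance 2*sigma^2*(1 - rho). rho = 0.5 is an assumed correlation."""
    return 2 * (1 - rho) * (Z / margin_sd) ** 2

n_events = events_for_hr(1.20)
n_parallel = patients_for_mean_diff(0.2)
n_xover = patients_for_crossover(0.2)
print(round(n_events), round(n_parallel), round(n_xover))
```

The comparison is between counting events and counting patients: the continuous outcome reaches its precision target with fewer patients than the time-to-event outcome needs events.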

- Timing and severity of outcomes
- Handle
  - terminal events (death)
  - non-terminal events (MI, stroke)
  - recurrent events (hospitalization)

- Break the ties; the more levels of Y the better (fharrell.com/post/ordinal-info)
- Maximum power when there is only one patient at each level (continuous Y)

- In a given week or day what is the severity of the worst thing that happened to the patient?
- Expert clinician consensus of outcome ranks
- Spacing of outcome categories irrelevant
- Avoids defining additive weights for multiple events in the same week
- Events can be graded & can code common co-occurring events as worse event

- Can translate an ordinal longitudinal model to obtain a variety of estimates
- time until a condition
- expected time in state
- probability of something bad or worse happening to the pt over time, by treatment

- Bayesian partial proportional odds model can compute the probability that the treatment affects mortality differently than it affects nonfatal outcomes

- Ordinal longitudinal model also elegantly handles partial information: at each day/week the ordinal Y can be left, right, or interval censored when a range of the scale was not measured

- 0=alive 1=dead
  - censored at 3w: **000**
  - death at 2w: **01**
  - longitudinal binary logistic model OR ≅ HR
- 0=at home 1=hospitalized 2=MI 3=dead
  - hospitalized at 3w, rehosp at 7w, MI at 8w & stays in hosp, f/u ends at 10w: **0010001211**

- 0-6 QOL excellent–poor, 7=MI 8=stroke 9=dead
  - QOL varies, not assessed in 3w but pt event free, stroke at 8w, death 9w: **12[0-6]334589**
  - MI status unknown at 7w: **12[0-6]334[5,7]89**
- Can make first 200 levels be a continuous response variable and the remaining values represent clinical event overrides

- Proportional odds ordinal logistic model with covariate adjustment
- Handles intra-patient correlation with a Markov process or other longitudinal models
- Extension of binary logistic model
- Generalization of Wilcoxon-Mann-Whitney Two-Sample Test
- No assumption about Y distribution for a given patient type
- Does not use the numeric Y codes

- B:A odds ratio
- P(B > A)
  - c-index; concordance probability ≅ \(\mathrm{OR}^{0.66} / (1 + \mathrm{OR}^{0.66})\)
  - see fharrell.com/post/po; does **not** assume proportional odds!
- Probability that Y=y or worse as a function of time and treatment
  - does assume PO but the partial PO model relaxes this
- Bayesian partial PO model: compute posterior P(treatment affects death differently)
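The approximate conversion from the B:A odds ratio to the concordance probability P(B > A) quoted above is a one-liner. The function name is hypothetical, and the result is the stated approximation, not an exact identity.

```python
def concordance_from_or(odds_ratio, exponent=0.66):
    """Approximate P(B > A) from a B:A odds ratio via OR^0.66 / (1 + OR^0.66)."""
    t = odds_ratio ** exponent
    return t / (1.0 + t)

for odds_ratio in (0.5, 1.0, 1.5, 2.0):
    print(f"OR = {odds_ratio}: P(B > A) ~ {concordance_from_or(odds_ratio):.3f}")
```

An odds ratio of 1 maps to 0.5 (no tendency for either treatment to give better outcomes), and OR and 1/OR map to probabilities that sum to 1.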

- VIOLET (Petal Network, NEJM 381:2529, 2019)
- Early high-dose vitamin D\(_3\) for 1360 D\(_3\)-deficient critically ill adults
- Primary endpoint: mortality (slight evidence for increase with D\(_3\))
- Ordinal endpoint collected each day for 28 consecutive days

- Simulation of VIOLET-like studies

| Method | Power |
|---|---|
| Time-to-recovery analysis with Cox model | 0.79 |
| Wilcoxon test of vent/ARDS-free days with death=-1 | 0.32 |
| Longitudinal ordinal model | 0.94 |

Details at hbiostat.org/R/Hmisc/markov/sim.html

- Increase power by breaking ties
- Get close to the raw data
- Relevant to patients
- Respect timing and severity of outcomes
- Can do automatic risk/benefit trade-offs by including safety events in an ordinal outcome scale

- Has **almost nothing** to do with baseline balance across treatments
- Has to do with outcome heterogeneity **within a treatment group**
- Adjustment for strong baseline prognostic factors increases Bayesian and frequentist power for **free**
- Does this by getting the outcome model more correct
  - Example: older patients die sooner; pts with poor baseline 6m walk test have poor post-treatment 6m walk test

- Provides estimates of efficacy for *individual patients* by addressing the fundamental clinical question:
  - If I compared two patients who have the same baseline variables but were given different treatments, by how much better should I expect the outcome to be with treatment B instead of treatment A?

- Provides a correct basis for analysis of heterogeneity of treatment effect
- subgroup analysis is very misleading and does not inherit proper covariate adjustment
- subgroup analysis does not properly handle continuous variables

- Continuous Y with X explaining \(\frac{1}{2}\) of the variation in Y
- sample size cut \(\times \frac{1}{2}\) in comparison to unadjusted analysis
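A small simulation sketch of why the sample size is halved: the sample size needed for a given precision or power is proportional to the outcome variance, and adjusting for X replaces the total variance of Y with the residual variance. The data-generating model (Y = X + noise with equal variances, so X explains half the variation) is an assumption chosen to match the bullet above.

```python
import random
from statistics import fmean

def residual_variance_ratio(n=20000, seed=7):
    """Simulate Y = X + e with var(X) = var(e) = 1 (so R^2 = 1/2) and return
    residual SS / total SS after a least-squares fit of Y on X, i.e. 1 - R^2."""
    rng = random.Random(seed)
    x = [rng.gauss(0.0, 1.0) for _ in range(n)]
    y = [xi + rng.gauss(0.0, 1.0) for xi in x]
    mx, my = fmean(x), fmean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx                      # ordinary least-squares slope
    total_ss = sum((yi - my) ** 2 for yi in y)
    resid_ss = sum((yi - my - slope * (xi - mx)) ** 2 for xi, yi in zip(x, y))
    return resid_ss / total_ss

ratio = residual_variance_ratio()
print(f"residual variance / total variance = {ratio:.3f}")
```

The ratio comes out close to 0.5, so an adjusted analysis needs roughly half the patients of an unadjusted one to reach the same precision.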

- Change from baseline (post - pre) assumes
- post is linearly related to pre
- slope of post on pre is 1.0

- ANCOVA assumes neither so is more efficient and extends to ordinal outcomes
- post - pre is inconsistent with a parallel-group design
- post - pre is manipulated by pt inclusion criteria and regression to the mean (RTTM)

- Don’t take sample sizes seriously; consider sequential designs with unlimited data looks and study extension
- α is not a relevant quantity to “control” or “spend” (unrelated to decision error)
- Choose high-resolution high-information Y
- Longitudinal ordinal Y is a general and flexible way to capture severity and timing of outcomes
- Always adjust for strong baseline prognostic factors; don’t stratify by treatment in Table 1

`datamethods.org`

`fharrell.com`

`hbiostat.org`

`hbiostat.org/proj/covid19`

`hbiostat.org/bbr/md/alpha.html`