11 Summary

Attribute	Frequentist	Bayesian
Nature of probabilities	long-run relative frequencies	degree of belief
Probabilities calculated	P(data | no effect)	P(effect > c | data)
Timing of arguments	After the study, influenced by data	Before the study
Type of arguments	Multiplicity re: multiple endpoints, treatments, times; clinical significance; α-spending function; complex designs; how to accurately compute p-value; how to use outside information	Prior distribution
Everyday challenges	Conceptual	Computational
Type I error	Can be controlled but arbitrary if multiple tests. Never zero regardless of n; does not prevent detection of clinically trivial effects; NOT the probability of regulator’s regret	Not relevant; can prevent declaring evidence for trivial effects by directly computing probability of non-trivial effect
Efficacy probability	Not available	Posterior probability (PP); if a drug is approved with PP = 0.96, the probability of error is 0.04 (regulator’s regret)
Clinical relevance	Tests must be augmented by confidence limits	Built-in because of direct estimation of P(effect)
Sample size	Guessed; hard to adjust once study starts	Savings due to unlimited looks with no penalty; can stop early for harm, futility, or efficacy; can extend any study; sample size estimate can incorporate uncertainty
Effect estimates if stop early	Overstated	Perfectly calibrated by prior
Skepticism	Effect of multiplicity adjustment is constant	Wears off as n ↑
Design	Does not extend to complex designs such as response-adaptive randomization and incorporating prior information	Extends to complex designs and has formal mechanism for incorporating relevant prior information
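
The contrast in the "Probabilities calculated" and "Efficacy probability" rows can be made concrete with a small calculation. The following is a minimal sketch, not the author's method: it assumes a normal prior on the effect and a normally distributed effect estimate with known standard error, so the posterior is available in closed form. The function names and the numbers in the usage lines are hypothetical and chosen only for illustration.

```python
from statistics import NormalDist


def posterior_prob_effect_exceeds(estimate, se, c=0.0, prior_mean=0.0, prior_sd=1.0):
    """Bayesian quantity P(effect > c | data) under a conjugate normal model.

    Assumptions (illustrative): prior effect ~ N(prior_mean, prior_sd^2) and
    estimate | effect ~ N(effect, se^2).
    """
    prior_prec = 1.0 / prior_sd**2
    data_prec = 1.0 / se**2
    post_var = 1.0 / (prior_prec + data_prec)
    post_mean = post_var * (prior_prec * prior_mean + data_prec * estimate)
    return 1.0 - NormalDist(post_mean, post_var**0.5).cdf(c)


def one_sided_p_value(estimate, se):
    """Frequentist quantity P(estimate at least this large | no effect)."""
    return 1.0 - NormalDist().cdf(estimate / se)


# Hypothetical trial result: estimated effect 2.0 with standard error 1.0.
print("one-sided p-value:    ", one_sided_p_value(2.0, 1.0))
print("P(effect > 0 | data): ", posterior_prob_effect_exceeds(2.0, 1.0))
print("P(effect > 1 | data): ", posterior_prob_effect_exceeds(2.0, 1.0, c=1.0))
```

Raising `c` from 0 to a clinically meaningful threshold is what lets the posterior probability guard against declaring evidence for a trivial effect; the p-value has no comparable argument.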

Bayes provides direct measures of evidence on clinical scale, not randomness scale
PPs have meaning regardless of context, including aggressive sequential testing
Works well in standard fixed-size RCTs but also in highly flexible designs; encourages learning
Reliable results with no notion of type I error
PPs perfectly calibrated independent of stopping rule
Same for effect point estimates
Allows data looks without advance planning, and can react to new outside knowledge
Can stop studies earlier for futility or harm, sometimes for efficacy
Provides simultaneous probability statements about multiple outcomes
Only approach for formally incorporating historical data
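
The calibration claims above (PPs and effect point estimates remain calibrated whatever the stopping rule) can be checked by simulation. This is a minimal sketch under stated assumptions rather than a definitive implementation: true effects are drawn from the same N(0, 1) prior used in the analysis, observations have unit variance, a look is taken after every observation, and the trial stops for efficacy when P(effect > 0 | data) reaches 0.95. All names, thresholds, and sample sizes are hypothetical.

```python
import random
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal, used for posterior tail areas


def simulate_trial(rng, prior_sd=1.0, max_n=200, threshold=0.95):
    """Run one sequential trial with a look after every observation.

    Returns (true_effect, pp, post_mean, stopped_early), where pp is
    P(effect > 0 | data) at the last look taken.
    """
    true_effect = rng.gauss(0.0, prior_sd)  # effect drawn from the analysis prior
    total = 0.0
    for n in range(1, max_n + 1):
        total += rng.gauss(true_effect, 1.0)
        post_var = 1.0 / (1.0 / prior_sd**2 + n)  # conjugate normal update
        post_mean = post_var * total
        pp = STD_NORMAL.cdf(post_mean / post_var**0.5)
        if pp >= threshold:
            return true_effect, pp, post_mean, True
    return true_effect, pp, post_mean, False


def calibration_summary(n_trials=2000, seed=1):
    """Compare posterior claims with the truth among trials stopped early."""
    rng = random.Random(seed)
    stopped = [t for t in (simulate_trial(rng) for _ in range(n_trials)) if t[3]]
    k = len(stopped)
    return {
        "n_stopped_early": k,
        "mean_pp_at_stop": sum(t[1] for t in stopped) / k,
        "fraction_truly_positive": sum(t[0] > 0 for t in stopped) / k,
        "mean_posterior_mean": sum(t[2] for t in stopped) / k,
        "mean_true_effect": sum(t[0] for t in stopped) / k,
    }


for name, value in calibration_summary().items():
    print(f"{name}: {value}")
```

If the claims hold, `mean_pp_at_stop` should be close to `fraction_truly_positive`, and `mean_posterior_mean` close to `mean_true_effect`, up to simulation error, even though every one of these trials stopped early. The agreement relies on the simulation prior matching the analysis prior, which is the sense in which the table describes estimates as calibrated by the prior.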
