11  Summary

| Attribute | Frequentist | Bayesian |
|---|---|---|
| Nature of probabilities | long-run relative frequencies | degree of belief |
| Probabilities calculated | P(data \| no effect) | P(effect > c \| data) |
| Timing of arguments | After the study, influenced by data | Before the study |
| Type of arguments | Multiplicity re: multiple endpoints, treatments, times; clinical significance; α-spending function; complex designs; how to accurately compute p-value; how to use outside information | Prior distribution |
| Everyday challenges | Conceptual | Computational |
| Type I error | Can be controlled but arbitrary if multiple tests. Never zero regardless of n; does not prevent detection of clinically trivial effects; NOT the probability of regulator’s regret | Not relevant; can prevent declaring evidence for trivial effects by directly computing probability of non-trivial effect |
| Efficacy probability | Not available | Posterior probability; If approve drug with PP=0.96, probability of error=0.04 (regulator’s regret) |
| Clinical relevance | Tests must be augmented by confidence limits | Built-in because of direct estimation of P(effect) |
| Sample size | Guessed; hard to adjust once study starts | Savings due to unlimited looks with no penalty; can stop early for harm, futility, or efficacy; can extend any study; sample size estimate can incorporate uncertainty |
| Effect estimates if stop early | Overstated | Perfectly calibrated by prior |
| Skepticism | Effect of multiplicity adjustment is constant | Wears off as n ↑ |
| Design | Does not extend to complex designs such as response-adaptive randomization and incorporating prior information | Extends to complex designs and has formal mechanism for incorporating relevant prior information |
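
To make the "Probabilities calculated", "Efficacy probability", and "Skepticism" rows concrete, the following is a minimal sketch that contrasts the two quantities for a single treatment-effect estimate. Everything in it is an illustrative assumption rather than part of the comparison above: the effect estimate is taken to be approximately normal with a known standard error, the prior is a normal distribution centered at zero (a skeptical prior), and the numbers and function names (`two_sided_p_value`, `normal_posterior`, `prob_effect_exceeds`, `data_weight`) are hypothetical.

```python
import math


def normal_cdf(x, mean=0.0, sd=1.0):
    """Cumulative distribution function of a normal distribution."""
    return 0.5 * (1.0 + math.erf((x - mean) / (sd * math.sqrt(2.0))))


def two_sided_p_value(estimate, se):
    """Frequentist quantity: P(data at least this extreme | no effect)."""
    z = abs(estimate) / se
    return 2.0 * (1.0 - normal_cdf(z))


def normal_posterior(estimate, se, prior_mean, prior_sd):
    """Posterior mean and SD under an assumed normal prior and normal likelihood."""
    w_prior = 1.0 / prior_sd ** 2   # precision of the prior
    w_data = 1.0 / se ** 2          # precision of the data
    post_var = 1.0 / (w_prior + w_data)
    post_mean = post_var * (w_prior * prior_mean + w_data * estimate)
    return post_mean, math.sqrt(post_var)


def prob_effect_exceeds(c, post_mean, post_sd):
    """Bayesian quantity: P(effect > c | data)."""
    return 1.0 - normal_cdf(c, post_mean, post_sd)


def data_weight(n, sigma, prior_sd):
    """Share of the posterior mean that comes from the data with n observations.

    Approaches 1 as n grows, which is one way to see the prior's
    skepticism wearing off.
    """
    w_data = n / sigma ** 2
    w_prior = 1.0 / prior_sd ** 2
    return w_data / (w_data + w_prior)


# Hypothetical inputs: observed effect 0.5 with standard error 0.2,
# skeptical prior centered at 0 with SD 0.5.
estimate, se = 0.5, 0.2
post_mean, post_sd = normal_posterior(estimate, se, prior_mean=0.0, prior_sd=0.5)

print("p-value:", round(two_sided_p_value(estimate, se), 4))
print("P(effect > 0 | data):", round(prob_effect_exceeds(0.0, post_mean, post_sd), 4))
# c = 0.2 stands in for a hypothetical threshold of clinical relevance.
print("P(effect > 0.2 | data):", round(prob_effect_exceeds(0.2, post_mean, post_sd), 4))

for n in (10, 100, 1000):
    print("n =", n, "data weight:", round(data_weight(n, sigma=1.0, prior_sd=0.5), 4))
```

Under these assumptions, one minus the posterior probability of efficacy is the probability that a decision to approve would be in error, which the p-value does not provide. Raising the threshold `c` above zero shows how evidence for a non-trivial effect can be assessed directly.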