| | Frequentist | Bayesian |
|---|---|---|
| Nature of probabilities | Long-run relative frequencies | Degree of belief |
| Probabilities calculated | P(data \| no effect) | P(effect > c \| data) |
| Timing of arguments | After the study, influenced by data | Before the study |
| Type of arguments | Multiplicity re: multiple endpoints, treatments, times; clinical significance; α-spending function; complex designs; how to accurately compute p-value; how to use outside information | Prior distribution |
| Everyday challenges | Conceptual | Computational |
| Type I error | Can be controlled, but arbitrary if multiple tests. Never zero regardless of n; does not prevent detection of clinically trivial effects; **NOT** the probability of regulator’s regret | Not relevant; can prevent declaring evidence for trivial effects by directly computing probability of non-trivial effect |
| Efficacy probability | Not available | Posterior probability; if a drug is approved with PP = 0.96, the probability of error is 0.04 (regulator’s regret) |
| Clinical relevance | Tests must be augmented by confidence limits | Built in because of direct estimation of P(effect) |
| Sample size | Guessed; hard to adjust once study starts | Savings due to unlimited looks with no penalty; can stop early for harm, futility, or efficacy; can extend any study; sample size estimate can incorporate uncertainty |
| Effect estimates if stopped early | Overstated | Perfectly calibrated by prior |
| Skepticism | Effect of multiplicity adjustment is constant | Wears off as n ↑ |
| Design | Does not extend to complex designs such as response-adaptive randomization, or to incorporating prior information | Extends to complex designs and has formal mechanism for incorporating relevant prior information |
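
The "Probabilities calculated", "Efficacy probability", and "Skepticism" rows can be made concrete with a small numerical sketch. It is only an illustration under assumptions that do not come from the table: a normal model with known standard deviation, a normal skeptical prior centered at zero, and hypothetical values for the observed difference, the threshold `c`, and the sample sizes. The function names are likewise invented for this example.

```python
from math import sqrt
from statistics import NormalDist


def one_sided_p_value(mean_diff, sigma, n):
    """Frequentist quantity: P(sample mean >= observed | true effect = 0).

    Assumes a normal model with known standard deviation `sigma`.
    """
    se = sigma / sqrt(n)
    return 1.0 - NormalDist(0.0, se).cdf(mean_diff)


def posterior(mean_diff, sigma, n, prior_mean=0.0, prior_sd=1.0):
    """Bayesian quantity: normal posterior for the effect.

    Uses the conjugate normal-normal update: precisions add, and the
    posterior mean is the precision-weighted average of prior and data.
    """
    prior_prec = 1.0 / prior_sd ** 2
    data_prec = n / sigma ** 2
    post_var = 1.0 / (prior_prec + data_prec)
    post_mean = post_var * (prior_prec * prior_mean + data_prec * mean_diff)
    return NormalDist(post_mean, sqrt(post_var))


def prob_effect_exceeds(post, c):
    """P(effect > c | data) under the posterior distribution `post`."""
    return 1.0 - post.cdf(c)


# All values below are hypothetical and chosen only for illustration.
sigma = 2.0      # assumed known outcome standard deviation
observed = 0.5   # observed mean treatment difference
c = 0.2          # smallest effect regarded as clinically non-trivial

for n in (20, 200, 2000):
    p = one_sided_p_value(observed, sigma, n)
    post = posterior(observed, sigma, n, prior_mean=0.0, prior_sd=0.5)
    pp = prob_effect_exceeds(post, c)
    print(
        f"n={n:5d}  p-value={p:.4f}  posterior mean={post.mean:.3f}  "
        f"P(effect > {c} | data)={pp:.3f}  P(error if acting)={1 - pp:.3f}"
    )
```

With the observed difference held fixed, the posterior mean starts close to the skeptical prior mean of zero and moves toward the observed value as n grows, which is the sense in which the prior's skepticism wears off. The last column is one minus the posterior probability, the quantity the table calls the regulator's regret.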