References
Adcock, C. J. (1997). Sample size determination: A review. The
Statistician, 46, 261–283.
Albert, J. (2007). Bayesian Computation with
R. Springer.
Altman, D. G., & Bland, J. M. (1995). Absence of evidence is not
evidence of absence. BMJ, 311, 485.
Bendtsen, M. (2018). A Gentle Introduction to the
Comparison Between Null Hypothesis Testing and
Bayesian Analysis: Reanalysis of Two
Randomized Controlled Trials. Journal of Medical Internet
Research, 20(10), e10873. https://doi.org/10.2196/10873
Using priors forces us to be more specific and
explicit about what we mean when we say that something is unknown... the
Bayesian approach does not attempt to identify a fixed value for the
parameters and dichotomize the world into significant and
nonsignificant, but rather relies on the researcher to do the scientific
inference and not to delegate this obligation to the statistical
model... the NHST approach is rooted in the idea of being able to redo
the experiment many times (so as to get a sampling distribution). Even
if we can rely on theoretical results to get this sampling distribution
without actually going back in time and redoing the experiment, the
underlying idea can be somewhat problematic. What do we mean by redoing
an experiment? Can we redo a randomized controlled trial while keeping
all things equal and recruiting a new sample from the study
population?... Once we remove ourselves from the dichotomization of
evidence, other things start to take precedence: critically assessing
the models chosen, evaluating the quality of the data, interpreting the
real-world impact of the results, etc.
Berry, D. A. (1987). Interim analysis in clinical trials:
The role of the likelihood principle. Am
Statistician, 41, 117–122. https://doi.org/10.1080/00031305.1987.10475458
Berry, D. A. (2006). Bayesian clinical trials. Nat Rev Drug Discov,
5, 27–36.
excellent review of Bayesian
approaches in clinical trials; "The greatest virtue of the traditional
approach may be its extreme rigour and narrowness of focus to the
experiment at hand, but a side effect of this virtue is inflexibility,
which in turn limits innovation in the design and analysis of clinical
trials. ... The set of “other possible results” depends on
the experimental design. ... Everything that is known is taken as given
and all probabilities are calculated conditionally on known values. ...
in contrast to the frequentist approach, only the probabilities of the
observed results matter. ... The continuous learning that is possible in
the Bayesian approach enables investigators to modify trials in
midcourse. ... it is possible to learn from small samples, depending on
the results, ... it is possible to adapt to what is learned to enable
better treatment of patients. ... subjectivity in prior distributions is
explicit and open to examination (and critique) by all. ... The Bayesian
approach has several advantages in drug development. One is the process
of updating knowledge gradually rather than restricting revisions in
study design to large, discrete steps measured in trials or
phases."
Blume, J. D. (2002). Likelihood methods for measuring statistical
evidence. Stat Med, 21(17), 2563–2599.
Blume, J. D. (2008). How often likelihood ratios are
misleading in sequential trials. Comm Stat Th Meth,
37(8), 1193–1206.
Braun, T. M. (n.d.). Motivating sample sizes in adaptive Phase
I trials via Bayesian posterior credible intervals.
Biometrics. https://doi.org/10.1111/biom.12872
Briggs, W. M. (2017). The Substitute for
p-Values. JASA, 112(519), 897–898. https://doi.org/10.1080/01621459.2017.1311264
Cohen, J. (1994). The earth is round (p < .05). Am Psychologist,
49(12), 997–1003. https://doi.org/10.1037/0003-066x.49.12.997
Cook, R. J., & Farewell, V. T. (1996). Multiplicity considerations
in the design and analysis of clinical trials. J Roy Stat Soc
A, 159, 93–110.
argues that if
results are intended to be interpreted marginally, there may be no need
for controlling experimentwise error rate. FH phrasing: Cook and
Farewell point out that when a strong priority order is pre-specified
for separate clinical questions, and that same order is also the
reporting order (no cherry picking), there is no need for multiplicity
adjustment. This is in contrast with a study whose aim is to find an
endpoint or a patient subgroup that benefits from treatment, a
situation requiring conservative multiplicity adjustment.
Dallow, N., Best, N., & Montague, T. H. (2018). Better decision
making in drug development through adoption of formal prior elicitation.
Pharm Stat, 0(0). https://doi.org/10.1002/pst.1854
Dawid, A. P. (2000). Comment on “The philosophy of
statistics” by D. V.
Lindley. The Statistician, 49, 325–326.
Deming, W. E. (1975). On Probability as a
Basis for Action. Am Statistician,
29(4), 146–152. https://doi.org/10.1080/00031305.1975.10477402
Edwards, W., Lindman, H., & Savage, L. J. (1963). Bayesian
statistical inference for psychological research. Psych Rev,
70(3), 193–242. http://psycnet.apa.org/doi/10.1037/h0044139
Emerson, S. S. (1995). Stopping a clinical trial very early based on
unplanned interim analysis: A group sequential approach.
Biometrics, 51, 1152–1162.
Feinstein, A. R. (1977). Clinical Biostatistics.
C. V. Mosby.
Gelman, A. (2013). P Values and Statistical
Practice. Epi, 24(1), 69–72. https://doi.org/10.1097/ede.0b013e31827886f7
Gelman, A. (2015). Bayesian and Frequentist Regression
Methods. Stat Med, 34(7), 1259–1260. https://doi.org/10.1002/sim.6427
Gelman, A., & Hennig, C. (2017). Beyond subjective and objective
in statistics. http://www.stat.columbia.edu/~gelman/research/published/objectivityr5.pdf
Goodman, S. N. (1999). Toward Evidence-Based Medical
Statistics. 1: The P Value Fallacy. Ann Int
Med, 130(12), 995+. https://doi.org/10.7326/0003-4819-130-12-199906150-00008
Nice language for what happens when scientists use
NHST to justify strong statements in their conclusions and
interpretation; p-value fallacy
Greenland, S., Senn, S. J., Rothman, K. J., Carlin, J. B., Poole, C.,
Goodman, S. N., & Altman, D. G. (2016). Statistical tests,
P values, confidence intervals, and power: A guide to
misinterpretations. Eur J Epi, 31(4), 337–350. https://doi.org/10.1007/s10654-016-0149-3
Best article on misinterpretation of p-values. Pithy
summaries.
Greenwald, A. G., Gonzalez, R., Harris, R., & Guthrie, D. (1996).
Effect sizes and p values: What should be reported and what
should be replicated? Psychophysiology, 33(2),
175–183. https://doi.org/10.1111/j.1469-8986.1996.tb02121.x
Grouin, J.-M., Coste, M., Bunouf, P., & Lecoutre, B. (2007).
Bayesian sample size determination in non-sequential clinical trials:
Statistical aspects and some regulatory considerations.
Stat Med, 26, 4914–4924.
Ionan, A. C., Clark, J., Travis, J., Amatya, A., Scott, J., Smith, J.
P., Chattopadhyay, S., Salerno, M. J., & Rothmann, M. (2022).
Bayesian Methods in Human Drug and
Biological Products Development in CDER and
CBER. Ther Innov Regul Sci. https://doi.org/10.1007/s43441-022-00483-0
Examples of use of Bayes at FDA CDER and CBER
Joseph, L., & Bélisle, P. (1997). Bayesian sample size determination
for normal means and differences between normal means. The
Statistician, 46, 209–226.
Kopp‐Schneider, A., Calderazzo, S., & Wiesenfarth, M. (2019). Power
gains by using external information in clinical trials are typically not
possible when requiring strict type I error control.
Biometrical Journal, 0(0). https://doi.org/10.1002/bimj.201800395
Kruschke, J. K. (2013). Bayesian estimation supersedes the t test. J
Exp Psych, 142(2), 573–603. https://doi.org/10.1037/a0029146
Kruschke, J. K. (2015). Doing Bayesian Data Analysis:
A Tutorial with R, JAGS, and
Stan (Second Edition). Academic Press. http://www.sciencedirect.com/science/book/9780124058880
Kruschke, J. K., & Liddell, T. M. (2017). Bayesian data analysis
for newcomers. 1–23. https://doi.org/10.3758/s13423-017-1272-1
Excellent for teaching Bayesian methods and explaining
the advantages
Kunzmann, K., Grayling, M. J., Lee, K. M., Robertson, D. S., Rufibach,
K., & Wason, J. M. S. (2021). A Review of
Bayesian Perspectives on Sample Size
Derivation for Confirmatory Trials. The American
Statistician, 0(0), 1–9. https://doi.org/10.1080/00031305.2021.1901782
Laptook, A. R., Shankaran, S., Tyson, J. E., Munoz, B., Bell, E. F.,
Goldberg, R. N., Parikh, N. A., Ambalavanan, N., Pedroza, C., Pappas,
A., Das, A., Chaudhary, A. S., Ehrenkranz, R. A., Hensman, A. M., Van
Meurs, K. P., Chalak, L. F., Hamrick, S. E. G., Sokol, G. M., Walsh, M.
C., … Higgins, R. D. (2017). Effect of Therapeutic Hypothermia
Initiated After 6 Hours of Age on
Death or Disability Among Newborns With
Hypoxic-Ischemic Encephalopathy. JAMA, 318(16),
1550+. https://doi.org/10.1001/jama.2017.14972
Lindley, D. V. (1993). The Analysis of Experimental
Data: The Appreciation of Tea and
Wine. Teaching Statistics, 15(1), 22–25.
https://doi.org/10.1111/j.1467-9639.1993.tb00252.x
Mark, D. B., Lee, K. L., & Harrell, F. E. (2016). Understanding the
Role of P Values and Hypothesis
Tests in Clinical Research. JAMA Card,
1(9), 1048–1054. https://doi.org/10.1001/jamacardio.2016.3312
Maxwell, N. (2004). Data Matters: Conceptual
Statistics for a Random World. Key
College Pub. https://books.google.com/books?id=KH5GAAAAYAAJ
McElreath, R. (2016). Statistical rethinking: A
Bayesian course with examples in R and
Stan. http://www.worldcat.org/isbn/9781482253443
Natanegara, F., Neuenschwander, B., Seaman, J. W., Kinnersley, N.,
Heilmann, C. R., Ohlssen, D., & Rochester, G. (2014). The current
state of Bayesian methods in medical product development:
Survey results and recommendations from the DIA Bayesian
Scientific Working Group. Pharm Stat, 13(1),
3–12. https://doi.org/10.1002/pst.1595
Nuzzo, R. (2014). Scientific method: Statistical errors.
Nature News, 506(7487), 150. https://doi.org/10.1038/506150a
Oakes, M. (1986). Statistical Inference: A
Commentary for the Social and Behavioral
Sciences. Wiley.
"It is
incomparably more useful to have a plausible range for the value of a
parameter than to know, with whatever degree of certitude, what single
value is untenable."
Pezeshk, H., & Gittins, J. (2002). A fully Bayesian
approach to calculating sample sizes for clinical trials with binary
responses. Drug Info J, 36, 143–150.
Rigat, F. (2023). A conservative approach to leveraging external
evidence for effective clinical trial design. Pharmaceutical
Statistics, pst.2339. https://doi.org/10.1002/pst.2339
Includes some sample size considerations to ensure
that the prior is not too impactful
Rozeboom, W. (1960). The Fallacy of the
Null-Hypothesis Significance Test. Psychological
Bulletin, 57, 416.
Ruberg, S. J., Beckers, F., Hemmings, R., Honig, P., Irony, T., LaVange,
L., Lieberman, G., Mayne, J., & Moscicki, R. (2023). Application of
Bayesian approaches in drug development: Starting a
virtuous cycle. Nat Rev Drug Discov, 1–16. https://doi.org/10.1038/s41573-023-00638-0
Senn, S. (2013). Being Efficient About Efficacy Estimation.
Statistics in Biopharmaceutical Research, 5(3),
204–210. https://doi.org/10.1080/19466315.2012.754726
"Every time the statistician working in the
pharmaceutical industry does a sample size determination for a trial
using a responder analysis, he or she should do the same calculation
using the original measure. If the dichotomy is preferred, an
explanation as to why the extra millions are going to be spent should be
provided."
Simon, R., & Freedman, L. S. (1997). Bayesian design and analysis of
two × two factorial clinical trials. Biometrics, 53,
456–464.
Spiegelhalter, D. J. (1986). Probabilistic prediction in patient
management and clinical trials. Stat Med, 5, 421–433.
https://doi.org/10.1002/sim.4780050506
z-test for calibration inaccuracy (implemented in Stata and in the R
Hmisc package’s val.prob function)
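A minimal sketch of applying that calibration test via Hmisc::val.prob on simulated data; the element names extracted at the end are what the function labels Spiegelhalter's statistic in recent Hmisc versions and are an assumption here.

```r
library(Hmisc)
set.seed(1)
n <- 500
p <- runif(n)                    # hypothetical predicted probabilities
y <- rbinom(n, 1, p)             # outcomes simulated so the predictions are well calibrated
s <- val.prob(p, y, pl = FALSE)  # pl = FALSE suppresses the calibration plot
s[c("S:z", "S:p")]               # Spiegelhalter z statistic and its p-value (names as assumed)
```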
Spiegelhalter, D. J., Abrams, K. R., & Myles, J. P. (2004).
Bayesian Approaches to Clinical Trials and
Health-Care Evaluation. Wiley.
Spiegelhalter, David J., & Freedman, L. S. (1986). A predictive
approach to selecting the size of a clinical trial, based on subjective
clinical opinion. Stat Med, 5, 1–13. https://doi.org/10.1002/sim.4780050103
Spiegelhalter, D. J., Freedman, L. S., & Parmar, M. K. B. (1993).
Applying Bayesian ideas in drug development and clinical
trials. Stat Med, 12, 1501–1511. https://doi.org/10.1002/sim.4780121516
Vickers, A. J. (2008). Decision analysis for the evaluation of
diagnostic tests, prediction models, and molecular markers. Am
Statistician, 62(4), 314–320.
limitations of accuracy metrics; incorporating clinical
consequences; nice example of calculation of expected outcome; drawbacks
of conventional decision analysis, especially because of the difficulty
of eliciting the expected harm of a missed diagnosis; use of a threshold
on the probability of disease for taking some action; decision curve; has
other good references to decision analysis
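The threshold/decision-curve idea can be illustrated in a few lines of R; this is a generic net-benefit calculation on simulated data using the standard formula TP/N − FP/N × pt/(1 − pt), not code from Vickers (2008).

```r
set.seed(1)
n <- 1000
p <- plogis(rnorm(n))                 # hypothetical predicted probabilities of disease
y <- rbinom(n, 1, p)                  # simulated true disease status
net_benefit <- function(p, y, pt) {   # net benefit of acting when predicted risk >= pt
  act <- p >= pt
  sum(act & y == 1) / length(y) - sum(act & y == 0) / length(y) * pt / (1 - pt)
}
sapply(c(0.1, 0.2, 0.3), function(pt)
  c(threshold = pt,
    model     = net_benefit(p, y, pt),           # act according to model predictions
    treat_all = net_benefit(rep(1, n), y, pt)))  # act on everyone regardless of risk
```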
Wagenmakers, E.-J., Marsman, M., Jamil, T., Ly, A., Verhagen, J., Love,
J., Selker, R., Gronau, Q. F., Šmíra, M., Epskamp, S., Matzke, D.,
Rouder, J. N., & Morey, R. D. (2017). Bayesian inference for
psychology. Part I: Theoretical advantages and
practical ramifications. 1–23. https://doi.org/10.3758/s13423-017-1343-3
Wang, H., Chow, S.-C., & Chen, M. (2005). A Bayesian
Approach on Sample Size Calculation for
Comparing Means. J Biopharm Stat, 15(5),
799–807. https://doi.org/10.1081/bip-200067789
analytic form for the posterior in the normal t-test case
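As a reminder of what such a closed form looks like, here is the standard known-variance conjugate update for a normal mean, written in R; this is a simplification, and the unknown-variance form derived in the paper differs.

```r
## Known-variance conjugate update for a normal mean: prior N(mu0, tau0^2), observed
## mean xbar from n observations with sd sigma; the posterior is again normal.
posterior_normal <- function(xbar, n, sigma, mu0, tau0) {
  prec      <- n / sigma^2 + 1 / tau0^2                   # posterior precision
  post_mean <- (n / sigma^2 * xbar + mu0 / tau0^2) / prec
  c(mean = post_mean, sd = sqrt(1 / prec))
}
posterior_normal(xbar = 1.2, n = 40, sigma = 2, mu0 = 0, tau0 = 1)
```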
Weber, K., Hemmings, R., & Koch, A. (2018). How to use prior
knowledge and still give new data a chance? Pharmaceutical
Statistics, 17(4), 329–341. https://doi.org/10.1002/pst.1862
Whitehead, J., Cleary, F., & Turner, A. (2015). Bayesian sample
sizes for exploratory clinical trials comparing multiple experimental
treatments with a control. Stat Med, 34(12),
2048–2061. https://doi.org/10.1002/sim.6469
Wiesenfarth, M., & Calderazzo, S. (2019). Quantification of
Prior Impact in Terms of Effective
Current Sample Size. Biometrics, 0. https://doi.org/10.1111/biom.13124