Adcock, C. J. (1997). Sample size determination: A review. The Statistician, 46, 261–283.
Albert, J. (2007). Bayesian Computation with R. Springer.
Altman, D. G., & Bland, J. M. (1995). Absence of evidence is not evidence of absence. BMJ, 311, 485.
Bendtsen, M. (2018). A Gentle Introduction to the Comparison Between Null Hypothesis Testing and Bayesian Analysis: Reanalysis of Two Randomized Controlled Trials. Journal of Medical Internet Research, 20(10), e10873.
Using priors forces us to be more specific and explicit about what we mean when we say that something is unknown... the Bayesian approach does not attempt to identify a fixed value for the parameters and dichotomize the world into significant and nonsignificant, but rather relies on the researcher to do the scientific inference and not to delegate this obligation to the statistical model... the NHST approach is rooted in the idea of being able to redo the experiment many times (so as to get a sampling distribution).  Even if we can rely on theoretical results to get this sampling distribution without actually going back in time and redoing the experiment, the underlying idea can be somewhat problematic.  What do we mean by redoing an experiment? Can we redo a randomized controlled trial while keeping all things equal and recruiting a new sample from the study population?... Once we remove ourselves from the dichotomization of evidence, other things start to take precedence: critically assessing the models chosen, evaluating the quality of the data, interpreting the real-world impact of the results, etc.
Berry, D. A. (1987). Interim analysis in clinical trials: The role of the likelihood principle. Am Statistician, 41, 117–122.
Berry, D. A. (2006). Bayesian clinical trials. Nat Rev Drug Discov, 5, 27–36.
Excellent review of Bayesian approaches in clinical trials: "The greatest virtue of the traditional approach may be its extreme rigour and narrowness of focus to the experiment at hand, but a side effect of this virtue is inflexibility, which in turn limits innovation in the design and analysis of clinical trials. ... The set of “other possible results” depends on the experimental design. ... Everything that is known is taken as given and all probabilities are calculated conditionally on known values. ... in contrast to the frequentist approach, only the probabilities of the observed results matter. ... The continuous learning that is possible in the Bayesian approach enables investigators to modify trials in midcourse. ... it is possible to learn from small samples, depending on the results, ... it is possible to adapt to what is learned to enable better treatment of patients. ... subjectivity in prior distributions is explicit and open to examination (and critique) by all. ... The Bayesian approach has several advantages in drug development. One is the process of updating knowledge gradually rather than restricting revisions in study design to large, discrete steps measured in trials or phases."
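A minimal sketch of the point that only the observed results matter: with a conjugate beta-binomial model (assumed here; Berry's review covers far more general designs), updating after each interim look gives exactly the same posterior as a single analysis of the pooled data, so the number of looks does not by itself change the inference. The Beta(1, 1) prior and the batch counts are hypothetical.

```python
from fractions import Fraction


def update_beta(a, b, successes, failures):
    """Conjugate beta-binomial update: Beta(a, b) prior -> Beta(a + s, b + f) posterior."""
    return a + successes, b + failures


# Hypothetical interim batches of (responders, non-responders); not data from any trial.
batches = [(3, 7), (5, 5), (6, 4)]

# Sequential analysis: update the posterior after each interim look.
a, b = 1, 1  # assumed uniform Beta(1, 1) prior on the response rate
for s, f in batches:
    a, b = update_beta(a, b, s, f)
    print(f"posterior mean after this look: {Fraction(a, a + b)}")

# One-shot analysis of all the data pooled together.
total_s = sum(s for s, _ in batches)
total_f = sum(f for _, f in batches)
a_all, b_all = update_beta(1, 1, total_s, total_f)

# Same posterior either way: interim looks leave the final inference unchanged.
print((a, b) == (a_all, b_all))
```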
Blume, J. D. (2002). Likelihood methods for measuring statistical evidence. Stat Med, 21(17), 2563–2599.
Blume, J. D. (2008). How often likelihood ratios are misleading in sequential trials. Comm Stat Th Meth, 37(8), 1193–1206.
Braun, T. M. (n.d.). Motivating sample sizes in adaptive Phase I trials via Bayesian posterior credible intervals. Biom, n/a.
Briggs, W. M. (2017). The Substitute for p-Values. JASA, 112(519), 897–898.
Cohen, J. (1994). The earth is round (p < .05). Am Psychologist, 49(12), 997–1003.
Cook, R. J., & Farewell, V. T. (1996). Multiplicity considerations in the design and analysis of clinical trials. J Roy Stat Soc A, 159, 93–110.
Argues that if results are intended to be interpreted marginally, there may be no need to control the experimentwise error rate. FH phrasing: Cook and Farewell point out that when a strong priority order is pre-specified for separate clinical questions, and that same order is also the reporting order (no cherry-picking), there is no need for multiplicity adjustment. This is in contrast with a study whose aim is to find an endpoint or a patient subgroup that benefits from treatment, a situation requiring conservative multiplicity adjustment.
Dallow, N., Best, N., & Montague, T. H. (2018). Better decision making in drug development through adoption of formal prior elicitation. Pharm Stat.
Dawid, A. P. (2000). Comment on “the philosophy of statistics” by D. V. Lindley. The Statistician, 49, 325–326.
Deming, W. E. (1975). On Probability as a Basis for Action. Am Statistician, 29(4), 146–152.
Edwards, W., Lindman, H., & Savage, L. J. (1963). Bayesian statistical inference for psychological research. Psych Rev, 70(3), 193–242.
Emerson, S. S. (1995). Stopping a clinical trial very early based on unplanned interim analysis: A group sequential approach. Biometrics, 51, 1152–1162.
Feinstein, A. R. (1977). Clinical Biostatistics. C. V. Mosby.
Gelman, A. (2013). P Values and Statistical Practice. Epi, 24(1), 69–72.
Gelman, A. (2015). Bayesian and Frequentist Regression Methods. Stat Med, 34(7), 1259–1260.
Gelman, A., & Hennig, C. (2017). Beyond subjective and objective in statistics. ~gelman/research/published/objectivityr5.pdf
Goodman, S. N. (1999). Toward Evidence-Based Medical Statistics. 1: The P Value Fallacy. Ann Int Med, 130(12), 995+.
Nice language for what happens when scientists use NHST to justify strong statements in their conclusions and interpretation (the p-value fallacy).
Greenland, S., Senn, S. J., Rothman, K. J., Carlin, J. B., Poole, C., Goodman, S. N., & Altman, D. G. (2016). Statistical tests, P values, confidence intervals, and power: A guide to misinterpretations. Eur J Epi, 31(4), 337–350.
Best article on misinterpretation of p-values. Pithy summaries.
Greenwald, A. G., Gonzalez, R., Harris, R., & Guthrie, D. (1996). Effect sizes and p values: What should be reported and what should be replicated? Psychophysiology, 33(2), 175–183.
Grouin, J.-M., Coste, M., Bunouf, P., & Lecoutre, B. (2007). Bayesian sample size determination in non-sequential clinical trials: Statistical aspects and some regulatory considerations. Stat Med, 26, 4914–4924.
Ionan, A. C., Clark, J., Travis, J., Amatya, A., Scott, J., Smith, J. P., Chattopadhyay, S., Salerno, M. J., & Rothmann, M. (2022). Bayesian Methods in Human Drug and Biological Products Development in CDER and CBER. Ther Innov Regul Sci.
Examples of the use of Bayesian methods at FDA CDER and CBER.
Joseph, L., & Bélisle, P. (1997). Bayesian sample size determination for normal means and differences between normal means. The Statistician, 46, 209–226.
Kopp‐Schneider, A., Calderazzo, S., & Wiesenfarth, M. (2019). Power gains by using external information in clinical trials are typically not possible when requiring strict type I error control. Biometrical Journal.
Kruschke, J. K. (2013). Bayesian estimation supersedes the t test. J Exp Psych: Gen, 142(2), 573–603.
Kruschke, J. K. (2015). Doing Bayesian Data Analysis: A Tutorial with R, JAGS, and Stan (Second Edition). Academic Press.
Kruschke, J. K., & Liddell, T. M. (2017). Bayesian data analysis for newcomers. 1–23.
Excellent for teaching Bayesian methods and explaining their advantages.
Kunzmann, K., Grayling, M. J., Lee, K. M., Robertson, D. S., Rufibach, K., & Wason, J. M. S. (2021). A Review of Bayesian Perspectives on Sample Size Derivation for Confirmatory Trials. The American Statistician, 1–9.
Laptook, A. R., Shankaran, S., Tyson, J. E., Munoz, B., Bell, E. F., Goldberg, R. N., Parikh, N. A., Ambalavanan, N., Pedroza, C., Pappas, A., Das, A., Chaudhary, A. S., Ehrenkranz, R. A., Hensman, A. M., Van Meurs, K. P., Chalak, L. F., Hamrick, S. E. G., Sokol, G. M., Walsh, M. C., … Higgins, R. D. (2017). Effect of Therapeutic Hypothermia Initiated After 6 Hours of Age on Death or Disability Among Newborns With Hypoxic-Ischemic Encephalopathy. JAMA, 318(16), 1550+.
Lindley, D. V. (1993). The Analysis of Experimental Data: The Appreciation of Tea and Wine. Teaching Statistics, 15(1), 22–25.
Mark, D. B., Lee, K. L., & Harrell, F. E. (2016). Understanding the Role of P Values and Hypothesis Tests in Clinical Research. JAMA Card, 1(9), 1048–1054.
Maxwell, N. (2004). Data Matters: Conceptual Statistics for a Random World. Key College Pub.
McElreath, R. (2016). Statistical Rethinking: A Bayesian Course with Examples in R and Stan. CRC Press.
Natanegara, F., Neuenschwander, B., Seaman, J. W., Kinnersley, N., Heilmann, C. R., Ohlssen, D., & Rochester, G. (2014). The current state of Bayesian methods in medical product development: Survey results and recommendations from the DIA Bayesian Scientific Working Group. Pharm Stat, 13(1), 3–12.
Nuzzo, R. (2014). Scientific method: Statistical errors. Nature News, 506(7487), 150.
Oakes, M. (1986). Statistical Inference: A Commentary for the Social and Behavioral Sciences. Wiley.
"It is incomparably more useful to have a plausible range for the value of a parameter than to know, with whatever degree of certitude, what single value is untenable."
Pezeshk, H., & Gittins, J. (2002). A fully Bayesian approach to calculating sample sizes for clinical trials with binary responses. Drug Info J, 36, 143–150.
Rigat, F. (2023). A conservative approach to leveraging external evidence for effective clinical trial design. Pharmaceutical Statistics, pst.2339.
Includes some sample size considerations to ensure that the prior is not overly influential.
Rozeboom, W. (1960). The Fallacy of the Null-Hypothesis Significance Test. Psychological Bulletin, 57, 416.
Ruberg, S. J., Beckers, F., Hemmings, R., Honig, P., Irony, T., LaVange, L., Lieberman, G., Mayne, J., & Moscicki, R. (2023). Application of Bayesian approaches in drug development: Starting a virtuous cycle. Nat Rev Drug Discov, 1–16.
Senn, S. (2013). Being Efficient About Efficacy Estimation. Statistics in Biopharmaceutical Research, 5(3), 204–210.
"Every time the statistician working in the pharmaceutical industry does a sample size determination for a trial using a responder analysis, he or she should do the same calculation using the original measure.  If the dichotomy is preferred, an explanation as to why the extra millions are going to be spent should be provided."
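The comparison Senn asks for can be sketched as follows, assuming a normally distributed outcome, a "responder" defined as a value above the control-group median, and the usual normal-approximation sample size formulas for two-sided tests. The effect size, α = 0.05, power 0.9, and the function names are hypothetical choices for illustration.

```python
import math
from statistics import NormalDist

Z = NormalDist()  # standard normal


def n_continuous(delta, sd, alpha=0.05, power=0.9):
    """Per-group n for comparing two means (normal approximation, two-sided)."""
    z = Z.inv_cdf(1 - alpha / 2) + Z.inv_cdf(power)
    return math.ceil(2 * (z * sd / delta) ** 2)


def n_binary(p1, p2, alpha=0.05, power=0.9):
    """Per-group n for comparing two proportions (unpooled normal approximation)."""
    z = Z.inv_cdf(1 - alpha / 2) + Z.inv_cdf(power)
    var = p1 * (1 - p1) + p2 * (1 - p2)
    return math.ceil(z ** 2 * var / (p1 - p2) ** 2)


# Hypothetical setting: outcome ~ Normal(0, 1) on control, Normal(0.5, 1) on treatment.
delta, sd = 0.5, 1.0
cut = 0.0  # assumed responder cut point: the control-group median
p_control = 1 - Z.cdf(cut)                    # proportion of responders on control
p_treat = 1 - NormalDist(delta, sd).cdf(cut)  # proportion of responders on treatment

n_cont = n_continuous(delta, sd)
n_bin = n_binary(p_control, p_treat)
print(n_cont, n_bin, round(n_bin / n_cont, 2))
```

Under these assumptions the responder analysis needs roughly 1.5 times as many patients per group, in line with the approximately π/2 inflation commonly cited for splitting a normal outcome at its median when the effect is small.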
Simon, R., & Freedman, L. S. (1997). Bayesian design and analysis of two × two factorial clinical trials. Biometrics, 53, 456–464.
Spiegelhalter, D. J. (1986). Probabilistic prediction in patient management and clinical trials. Stat Med, 5, 421–433.
z-test for calibration inaccuracy (implemented in Stata and in the R Hmisc package’s val.prob function).
Spiegelhalter, D. J., Abrams, K. R., & Myles, J. P. (2004). Bayesian Approaches to Clinical Trials and Health-Care Evaluation. Wiley.
Spiegelhalter, D. J., & Freedman, L. S. (1986). A predictive approach to selecting the size of a clinical trial, based on subjective clinical opinion. Stat Med, 5, 1–13.
Spiegelhalter, D. J., Freedman, L. S., & Parmar, M. K. B. (1993). Applying Bayesian ideas in drug development and clinical trials. Stat Med, 12, 1501–1511.
Vickers, A. J. (2008). Decision analysis for the evaluation of diagnostic tests, prediction models, and molecular markers. Am Statistician, 62(4), 314–320.
Limitations of accuracy metrics; incorporating clinical consequences; nice example of calculating an expected outcome; drawbacks of conventional decision analysis, especially the difficulty of eliciting the expected harm of a missed diagnosis; use of a threshold on the probability of disease for taking some action; decision curve; has other good references to decision analysis.
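Net benefit at a risk threshold pt, the quantity plotted on a decision curve, can be sketched as true positives per patient minus false positives per patient weighted by the threshold odds pt/(1 − pt). The rule "treat when pᵢ ≥ pt", the example data, and the function names are assumptions for illustration.

```python
def net_benefit(p, y, pt):
    """Net benefit of treating patients whose predicted risk p_i >= threshold pt."""
    n = len(y)
    tp = sum(1 for pi, yi in zip(p, y) if pi >= pt and yi == 1)
    fp = sum(1 for pi, yi in zip(p, y) if pi >= pt and yi == 0)
    # False positives are weighted by the odds at the threshold, pt / (1 - pt).
    return tp / n - fp / n * pt / (1 - pt)


def net_benefit_treat_all(y, pt):
    """Net benefit of treating everyone, regardless of predicted risk."""
    prev = sum(y) / len(y)
    return prev - (1 - prev) * pt / (1 - pt)


# Hypothetical predicted risks and disease status, for illustration only.
p = [0.05, 0.1, 0.2, 0.3, 0.4, 0.6, 0.7, 0.9]
y = [0, 0, 0, 1, 0, 1, 1, 1]
for pt in (0.1, 0.2, 0.3, 0.5):
    print(pt, round(net_benefit(p, y, pt), 3), round(net_benefit_treat_all(y, pt), 3))
```

Plotting net_benefit against pt, together with the treat-all curve and the treat-none line (net benefit 0), gives the decision curve.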
Wagenmakers, E.-J., Marsman, M., Jamil, T., Ly, A., Verhagen, J., Love, J., Selker, R., Gronau, Q. F., Šmíra, M., Epskamp, S., Matzke, D., Rouder, J. N., & Morey, R. D. (2017). Bayesian inference for psychology. Part I: Theoretical advantages and practical ramifications. 1–23.
Wang, H., Chow, S.-C., & Chen, M. (2005). A Bayesian Approach on Sample Size Calculation for Comparing Means. J Biopharm Stat, 15(5), 799–807.
Analytic form of the posterior for the normal (t-test) case.
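The simplest analytic case can be sketched as a conjugate normal update for a mean difference δ with the sampling variance treated as known. This is a simplification assumed here, since the t-test setting in Wang et al. involves an unknown variance. The prior, observed difference, σ, n, and the function name are hypothetical.

```python
import math
from statistics import NormalDist


def normal_posterior(m0, v0, d, se2):
    """Posterior N(m1, v1) for a mean difference delta, given a N(m0, v0) prior
    and an observed difference d with known sampling variance se2."""
    prec = 1 / v0 + 1 / se2          # posterior precision = sum of precisions
    v1 = 1 / prec
    m1 = v1 * (m0 / v0 + d / se2)    # precision-weighted average of prior mean and data
    return m1, v1


# Hypothetical inputs: skeptical prior centred at 0, sigma = 1, n = 50 per group.
sigma, n = 1.0, 50
se2 = 2 * sigma ** 2 / n  # variance of the difference in two sample means
m1, v1 = normal_posterior(m0=0.0, v0=0.25, d=0.4, se2=se2)
p_benefit = 1 - NormalDist(m1, math.sqrt(v1)).cdf(0.0)  # P(delta > 0 | data)
print(f"posterior mean {m1:.3f}, sd {math.sqrt(v1):.3f}, P(delta > 0) = {p_benefit:.3f}")
```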
Weber, K., Hemmings, R., & Koch, A. (2018). How to use prior knowledge and still give new data a chance? Pharmaceutical Statistics, 17(4), 329–341.
Whitehead, J., Cleary, F., & Turner, A. (2015). Bayesian sample sizes for exploratory clinical trials comparing multiple experimental treatments with a control. Stat Med, 34(12), 2048–2061.
Wiesenfarth, M., & Calderazzo, S. (2019). Quantification of Prior Impact in Terms of Effective Current Sample Size. Biometrics.