  • Coherent Tests for Interval Null Hypotheses

    Type Journal Article
    Author Spencer Hansen
    Author Ken Rice
    URL https://doi.org/10.1080/00031305.2022.2050299
    Volume 0
    Issue 0
    Pages 1-9
    Publication The American Statistician
    ISSN 0003-1305
    Date 2022-03-08
    Extra Publisher: Taylor & Francis
    DOI 10.1080/00031305.2022.2050299
    Accessed 4/9/2022, 11:06:10 AM
    Library Catalog Taylor and Francis+NEJM
    Abstract In a celebrated 1996 article, Schervish showed that, for testing interval null hypotheses, tests typically viewed as optimal can be logically incoherent. Specifically, one may fail to reject a specific interval null, but nevertheless—testing at the same level with the same data—reject a larger null, in which the original one is nested. This result has been used to argue against the widespread practice of viewing p-values as measures of evidence. In the current work we approach tests of interval nulls using simple Bayesian decision theory, and establish straightforward conditions that ensure coherence in Schervish’s sense. From these, we go on to establish novel frequentist criteria—different to Type I error rate—that, when controlled at fixed levels, give tests that are coherent in Schervish’s sense. The results suggest that exploring frequentist properties beyond the familiar Neyman–Pearson framework may ameliorate some of statistical testing’s well-known problems.
    Date Added 4/9/2022, 11:06:10 AM
    Modified 4/9/2022, 11:07:21 AM

    Tags:

    • bayes
    • coherent
    • hypothesis-testing
    • inference
    • interval-null

    Notes:

    • New way of looking at Schervish's interval-null hypothesis-testing incoherence example; a numerical sketch follows below
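
    • A minimal numerical sketch of the incoherence (Python with NumPy/SciPy; the unbiased-test
      form is standard, but the observed value x = 2.18 and the interval [-0.82, 0.52] are
      recalled from memory of Schervish's 1996 example, so treat the exact numbers as
      approximate). The test of an interval null about a normal mean rejects when X falls
      outside cutoffs (c1, c2) chosen to give size alpha at both endpoints of the null; the
      p-value is the smallest alpha at which the observation is rejected:

      import numpy as np
      from scipy.stats import norm
      from scipy.optimize import fsolve, brentq

      def cutoffs(lo, hi, alpha):
          # Cutoffs for H0: theta in [lo, hi] with X ~ N(theta, 1): reject when
          # x < c1 or x > c2, with size alpha at both endpoints of the null.
          start = [lo - norm.isf(alpha / 2), hi + norm.isf(alpha / 2)]
          def eqs(c):
              c1, c2 = c
              return [norm.cdf(c1 - lo) + norm.sf(c2 - lo) - alpha,
                      norm.cdf(c1 - hi) + norm.sf(c2 - hi) - alpha]
          return fsolve(eqs, start)

      def p_value(x, lo, hi):
          # Smallest alpha whose rejection region contains x.
          def on_boundary(alpha):
              c1, c2 = cutoffs(lo, hi, alpha)
              return max(x - c2, c1 - x)  # > 0 exactly when x is rejected
          return brentq(on_boundary, 1e-8, 0.5)

      x = 2.18                        # observed value in Schervish's example
      print(2 * norm.sf(x - 0.5))     # ~0.093: point null theta = 0.5 not rejected
      print(p_value(x, -0.82, 0.52))  # ~0.0498: the LARGER nesting null is rejected

      At level 0.05 the point null theta = 0.5 survives while the wider interval null
      containing it is rejected, which is exactly the incoherence the paper addresses.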

  • Testing Precise Hypotheses

    Type Journal Article
    Author James O. Berger
    Author Mohan Delampady
    URL https://projecteuclid.org/journals/statistical-science/volume-2/issue-3/Testing-Precise-Hypotheses/10.1214/ss/1177013238.full
    Volume 2
    Issue 3
    Pages 317-335
    Publication Statistical Science
    ISSN 0883-4237, 2168-8745
    Date 1987-08
    Extra Publisher: Institute of Mathematical Statistics
    DOI 10.1214/ss/1177013238
    Accessed 4/6/2022, 7:37:31 AM
    Library Catalog Project Euclid
    Abstract Testing of precise (point or small interval) hypotheses is reviewed, with special emphasis placed on exploring the dramatic conflict between conditional measures (Bayes factors and posterior probabilities) and the classical P-value (or observed significance level). This conflict is highlighted by finding lower bounds on the conditional measures over wide classes of priors, in normal and binomial situations; the lower bounds are much larger than the P-value, and this leads to the recommendation of several alternatives to P-values. Results are also given concerning the validity of approximating an interval null by a point null. The overall discussion features critical examination of issues such as the possibility of objective testing and of testing from confidence sets.
    Date Added 4/6/2022, 7:37:31 AM
    Modified 4/6/2022, 7:38:03 AM

    Tags:

    • bayes
    • p-value

    Notes:

    • Quote from Section 4.6:

      Some statisticians argue that the implied logic concerning a small P-value is compelling: “Either H0 is true and a rare event has occurred, or H0 is false.” One could again argue against this reasoning as addressing the wrong question, but there is a more obvious major flaw: the “rare event” whose probability is being calculated under H0 is not the event of observing the actual data x0, but the event E = {possible data x: |T(x)| >= |T(x0)|}. The inclusion of all data “more extreme” than the actual x0 is a curious step, and one for which we have seen no remotely convincing justification. … the “logic of surprise” cannot differentiate between x0 and E …
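
    • The “lower bounds on the conditional measures” take a simple closed form in the normal
      point-null case: over all priors on the alternative, the marginal likelihood under H1 is
      at most the likelihood at the MLE, so the Bayes factor in favor of H0 satisfies
      B >= exp(-t^2/2) for observed z-statistic t, and with prior Pr(H0) = 1/2 the posterior
      probability is at least B/(1 + B). A quick check of the familiar numbers (Python/SciPy):

      import numpy as np
      from scipy.stats import norm

      def bayes_bounds(p):
          # Two-sided P-value p -> z-statistic t, then the all-priors lower bound
          # B = f(x | theta0) / sup_theta f(x | theta) = exp(-t^2 / 2), and the
          # matching bound on Pr(H0 | x) when the prior puts 1/2 on H0.
          t = norm.isf(p / 2)
          b = np.exp(-t**2 / 2)
          return b, b / (1 + b)

      for p in (0.05, 0.01):
          b, post = bayes_bounds(p)
          print(f"p = {p}: Bayes factor >= {b:.3f}, Pr(H0 | x) >= {post:.3f}")

      A P-value of 0.05 thus coexists with Pr(H0 | x) >= 0.128, the conflict the abstract
      highlights.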

  • Why optional stopping can be a problem for Bayesians

    Type Journal Article
    Author Rianne de Heide
    Author Peter D. Grünwald
    URL https://doi.org/10.3758/s13423-020-01803-x
    Volume 28
    Issue 3
    Pages 795-812
    Publication Psychonomic Bulletin & Review
    ISSN 1531-5320
    Date 2021-06-01
    Journal Abbr Psychon Bull Rev
    DOI 10.3758/s13423-020-01803-x
    Accessed 3/24/2022, 11:11:58 AM
    Library Catalog Springer Link
    Language en
    Abstract Recently, optional stopping has been a subject of debate in the Bayesian psychology community. Rouder (Psychonomic Bulletin & Review 21(2), 301–308, 2014) argues that optional stopping is no problem for Bayesians, and even recommends the use of optional stopping in practice, as do Wagenmakers, Wetzels, Borsboom, van der Maas, and Kievit (Perspectives on Psychological Science 7, 627–633, 2012). This article addresses the question of whether optional stopping is problematic for Bayesian methods, and specifies under which circumstances and in which sense it is and is not. By slightly varying and extending Rouder’s (Psychonomic Bulletin & Review 21(2), 301–308, 2014) experiments, we illustrate that, as soon as the parameters of interest are equipped with default or pragmatic priors—which means, in most practical applications of Bayes factor hypothesis testing—resilience to optional stopping can break down. We distinguish between three types of default priors, each having their own specific issues with optional stopping, ranging from no-problem-at-all (type 0 priors) to quite severe (type II priors).
    Date Added 3/24/2022, 11:12:03 AM
    Modified 3/24/2022, 11:12:56 AM

    Tags:

    • sequential-monitoring
    • sequential
    • bayes
    • prior
    • bayes-factor
    • stopping

    Notes:

    • Interesting taxonomy of priors
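
    • A minimal sketch of the optional-stopping setup the paper probes (my construction, with a
      simple proper N(0,1) prior on the mean standing in for the default priors discussed
      there): under H0, sample one observation at a time, monitor the Bayes factor, and stop as
      soon as it signals strong evidence for H1:

      import numpy as np

      rng = np.random.default_rng(0)

      def bf10(n, xbar, s0=1.0):
          # BF for H1: mu ~ N(0, s0^2) vs H0: mu = 0, with X_i ~ N(mu, 1); this is
          # the ratio of the two marginal densities of the sufficient statistic xbar.
          return np.sqrt(1 / (1 + n * s0**2)) * np.exp(
              n**2 * xbar**2 * s0**2 / (2 * (1 + n * s0**2)))

      def stops_early(n_max=1000, threshold=10.0):
          # One run under H0 (true mu = 0) with optional stopping.
          total = 0.0
          for n in range(1, n_max + 1):
              total += rng.normal()
              if bf10(n, total / n) >= threshold:
                  return True
          return False

      runs = 2000
      hits = sum(stops_early() for _ in range(runs))
      print(f"Pr(BF10 ever >= 10 under H0) ~ {hits / runs:.3f}")

      With a fixed proper prior the Bayes factor is a nonnegative martingale under H0, so the
      chance of ever crossing 10 stays below 1/10 whatever the stopping rule; the paper's point
      is that this calibration can break down for default or pragmatic priors.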

  • Effective sample size for tests of censored survival data

    Type Journal Article
    Author Jacqueline K. Benedetti
    Author Ping-Yu Liu
    Author Harland N. Sather
    Author Jack Seinfeld
    Author Michael A. Epton
    URL https://doi.org/10.1093/biomet/69.2.343
    Volume 69
    Issue 2
    Pages 343-349
    Publication Biometrika
    ISSN 0006-3444
    Date 1982-08-01
    Journal Abbr Biometrika
    DOI 10.1093/biomet/69.2.343
    Accessed 3/17/2022, 11:19:46 AM
    Library Catalog Silverchair
    Abstract When survival experience of two groups is compared in the presence of arbitrary right censoring, the effective sample size for determining the power of the test used is usually taken to be the number of uncensored observations. This convention is examined through a Monte Carlo study. Empirical powers of the generalized Savage test and generalized Wilcoxon test with uncensored data are compared to those with censored data containing approximately the same number of uncensored observations. Large sample relative efficiencies are calculated for a Lehmann family of alternatives. It is shown that, depending on the underlying distribution and censoring mechanism, censored observations can add appreciably to the power of either test.
    Date Added 3/17/2022, 11:19:46 AM
    Modified 3/17/2022, 11:20:03 AM

    Tags:

    • survival
    • effective-sample-size
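
    Notes:

    • The events-driven convention this paper examines is also what the standard
      Schoenfeld-type approximation encodes (a textbook formula, not taken from this paper):
      for a 1:1 randomized log-rank comparison, power is governed by the number of events,
      d = 4 (z_{1-alpha/2} + z_{1-beta})^2 / (log HR)^2. A quick sketch (Python/SciPy):

      import numpy as np
      from scipy.stats import norm

      def required_events(hr, alpha=0.05, power=0.80):
          # Number of EVENTS (not patients) for a 1:1 log-rank comparison.
          z = norm.isf(alpha / 2) + norm.isf(1 - power)
          return 4 * z**2 / np.log(hr)**2

      print(round(required_events(0.75)))  # ~380 events for HR = 0.75

      The paper's simulations qualify exactly this convention: depending on the underlying
      distribution and censoring mechanism, censored observations can add power too.
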
  • Integrating mortality and morbidity outcomes: using quality-adjusted life years in critical care trials

    Type Journal Article
    Author Niall D. Ferguson
    Author Damon C. Scales
    Author Ruxandra Pinto
    Author M. Elizabeth Wilcox
    Author Deborah J. Cook
    Author Gordon H. Guyatt
    Author Holger J. Schünemann
    Author John C. Marshall
    Author Margaret S. Herridge
    Author Maureen O. Meade
    Author Canadian Critical Care Trials Group
    Volume 187
    Issue 3
    Pages 256-261
    Publication American Journal of Respiratory and Critical Care Medicine
    ISSN 1535-4970
    Date 2013-02-01
    Extra PMID: 23204250
    Journal Abbr Am J Respir Crit Care Med
    DOI 10.1164/rccm.201206-1057OC
    Library Catalog PubMed
    Language eng
    Abstract RATIONALE: Outcome measures that integrate mortality and morbidity, like quality-adjusted life years (QALYs), have been proposed for critical care clinical trials. OBJECTIVES: We sought to describe the distribution of QALYs in critically ill patients and estimate sample size requirements for a hypothetical trial using QALYs as the primary outcome. METHODS: We used data from a prospective cohort study of survivors of acute respiratory distress syndrome to generate utility values and calculate QALYs at 6 and 12 months. Using multiple simulations, we estimated the required sample sizes for multiple outcome scenarios in a hypothetical trial, including a base-case wherein the intervention improved both mortality and QALYs among survivors. MEASUREMENTS AND MAIN RESULTS: From 195 enrolled patients, follow-up was sufficient to generate QALY outcomes for 168 (86.2%) at 6 months and 159 (81.5%) at 1 year. For a hypothetical intervention that reduced mortality from 48 to 44% and improved QALYs by 0.025 in survivors at 6 months, the required per-group sample size was 571 (80% power; two-sided α = 0.05), compared with 2,436 patients needed for a comparison focusing on mortality alone. When only mortality or QALY in survivors (but not both) showed improvement by these amounts, 3,426 and 1,827 patients per group were needed, respectively. When mortality and morbidity effects moved in opposite directions, simulation results became impossible to interpret. CONCLUSIONS: QALYs may be a feasible outcome in critical care trials yielding a patient-centered result and major gains in statistical power under certain conditions, but this approach is susceptible to several threats, including loss to follow-up.
    Short Title Integrating mortality and morbidity outcomes
    Date Added 3/14/2022, 12:45:35 PM
    Modified 3/14/2022, 12:46:57 PM

    Tags:

    • rct
    • qol
    • multiple-endpoints
    • critical-illness
    • utility
    • qaly

    Attachments

    • PubMed entry
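
    Notes:

    • The mortality-alone figure quoted in the abstract (2,436 per group for 48% vs 44%) is
      reproduced, up to rounding and the exact formula variant used, by the standard two-sample
      comparison of proportions; a quick check (Python/SciPy):

      from scipy.stats import norm

      def n_per_group(p1, p2, alpha=0.05, power=0.80):
          # Per-group n for a two-sided two-proportion z-test with 1:1 allocation.
          z = norm.isf(alpha / 2) + norm.isf(1 - power)
          return z**2 * (p1 * (1 - p1) + p2 * (1 - p2)) / (p1 - p2)**2

      print(round(n_per_group(0.48, 0.44)))  # ~2433, vs the ~2,436 quoted above
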
  • Causal Directed Acyclic Graphs

    Type Journal Article
    Author Ari M. Lipsky
    Author Sander Greenland
    URL https://doi.org/10.1001/jama.2022.1816
    Publication JAMA
    ISSN 0098-7484
    Date 2022-02-28
    Journal Abbr JAMA
    DOI 10.1001/jama.2022.1816
    Accessed 3/1/2022, 7:32:10 AM
    Library Catalog Silverchair
    Abstract The design and interpretation of clinical studies requires consideration of variables beyond the exposure or treatment of interest and patient outcomes, including decisions about which variables to capture and, of those, which to control for in statistical analyses to minimize bias in estimating treatment effects. Causal directed acyclic graphs (DAGs) are a useful tool for communicating researchers’ understanding of the potential interplay among variables and are commonly used for mediation analysis. Assumptions are presented visually in a causal DAG and, based on this visual representation, researchers can deduce which variables require control to minimize bias and which variables could introduce bias if controlled in the analysis.
    Date Added 3/1/2022, 8:55:47 AM
    Modified 3/1/2022, 8:57:34 AM

    Tags:

    • teaching-mds
    • causal-effects
    • causality
    • directed-graph
    • causal-analysis
    • dag
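
    Notes:

    • The abstract's warning that some variables "could introduce bias if controlled" is the
      collider problem; a minimal simulation (my toy example, Python/NumPy) of the DAG
      X -> C <- Y, in which X and Y are independent but become associated once the collider C
      is adjusted for:

      import numpy as np

      rng = np.random.default_rng(0)
      n = 100_000

      x = rng.normal(size=n)
      y = rng.normal(size=n)
      c = x + y + rng.normal(size=n)  # collider: common effect of X and Y

      def resid(v, given):
          # Residual of v after linear adjustment for the conditioning variable.
          slope, intercept = np.polyfit(given, v, 1)
          return v - (slope * given + intercept)

      print(np.corrcoef(x, y)[0, 1])                      # ~0: no marginal association
      print(np.corrcoef(resid(x, c), resid(y, c))[0, 1])  # ~-0.5: collider bias
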
  • Sample size formula for a win ratio endpoint

    Type Journal Article
    Author Ron Xiaolong Yu
    Author Jitendra Ganju
    URL https://onlinelibrary.wiley.com/doi/abs/10.1002/sim.9297
    Volume n/a
    Issue n/a
    Publication Statistics in Medicine
    ISSN 1097-0258
    Date 2022
    DOI 10.1002/sim.9297
    Accessed 1/28/2022, 8:48:13 AM
    Library Catalog Wiley Online Library
    Language en
    Abstract The win ratio composite endpoint, which organizes the components of the composite hierarchically, is becoming popular in late-stage clinical trials. The method involves comparing data in a pair-wise manner starting with the endpoint highest in priority (eg, cardiovascular death). If the comparison is a tie, the endpoint next highest in priority (eg, hospitalizations for heart failure) is compared, and so on. Its sample size is usually calculated through complex simulations because there does not exist in the literature a simple sample size formula. This article provides a formula that depends on the probability that a randomly selected patient from one group does better than a randomly selected patient from another group, and on the probability of a tie. We compare the published 95% confidence intervals, which require patient-level data, with that calculated from the formula, requiring only summary-level data, for 17 composite or single win ratio endpoints. The two sets of results are similar. Simulations show the sample size formula performs well. The formula provides important insights. It shows when adding an endpoint to the hierarchy can increase power even if the added endpoint has low power by itself. It provides relevant information to modify an on-going blinded trial if necessary. The formula allows a non-specialist to quickly determine the size of the trial with a win ratio endpoint whose use is expected to increase over time.
    Date Added 3/1/2022, 8:55:59 AM
    Modified 3/1/2022, 8:56:50 AM

    Tags:

    • sample-size
    • multiple-endpoints
    • win-ratio
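
    Notes:

    • A minimal sketch (my construction; the paper's sample size formula itself is not
      reproduced here) of the hierarchical pairwise comparison the abstract describes, which
      also produces the two summary inputs the formula needs: the probability that a random
      treatment patient does better than a random control, and the probability of a tie:

      import numpy as np

      rng = np.random.default_rng(0)

      def win_ratio(trt, ctl):
          # Rows are patients; columns are endpoints in priority order, larger is
          # better. A tie on one endpoint falls through to the next in the hierarchy.
          wins = losses = ties = 0
          for a in trt:
              for b in ctl:
                  for k in range(a.size):
                      if a[k] > b[k]:
                          wins += 1
                          break
                      if a[k] < b[k]:
                          losses += 1
                          break
                  else:
                      ties += 1
          pairs = wins + losses + ties
          return wins / losses, wins / pairs, ties / pairs

      # Toy data: priority 1 = survival (binary, 1 is better), priority 2 = a
      # continuous score, so survival ties are broken by the score.
      trt = np.column_stack([rng.binomial(1, 0.60, 100), rng.normal(0.2, 1, 100)])
      ctl = np.column_stack([rng.binomial(1, 0.50, 100), rng.normal(0.0, 1, 100)])
      wr, p_win, p_tie = win_ratio(trt, ctl)
      print(f"win ratio {wr:.2f}, P(win) {p_win:.2f}, P(tie) {p_tie:.2f}")
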
  • Improving adaptive seamless designs through Bayesian optimization

    Type Journal Article
    Author Jakob Richter
    Author Tim Friede
    Author Jörg Rahnenführer
    URL https://onlinelibrary.wiley.com/doi/abs/10.1002/bimj.202000389
    Volume n/a
    Issue n/a
    Publication Biometrical Journal
    ISSN 1521-4036
    Date 2021
    DOI 10.1002/bimj.202000389
    Accessed 2/28/2022, 11:57:27 AM
    Library Catalog Wiley Online Library
    Language en
    Abstract We propose to use Bayesian optimization (BO) to improve the efficiency of the design selection process in clinical trials. BO is a method to optimize expensive black-box functions, by using a regression as a surrogate to guide the search. In clinical trials, planning test procedures and sample sizes is a crucial task. A common goal is to maximize the test power, given a set of treatments, corresponding effect sizes, and a total number of samples. From a wide range of possible designs, we aim to select the best one in a short time to allow quick decisions. The standard approach to simulate the power for each single design can become too time consuming. When the number of possible designs becomes very large, either large computational resources are required or an exhaustive exploration of all possible designs takes too long. Here, we propose to use BO to quickly find a clinical trial design with high power from a large number of candidate designs. We demonstrate the effectiveness of our approach by optimizing the power of adaptive seamless designs for different sets of treatment effect sizes. Comparing BO with an exhaustive evaluation of all candidate designs shows that BO finds competitive designs in a fraction of the time.
    Date Added 2/28/2022, 11:57:27 AM
    Modified 2/28/2022, 11:58:36 AM

    Tags:

    • bayes
    • adaptive-design
    • design
    • optimal-design
    • adaptive-clinical-trials
    • optimality
    • seamless-designs
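
    Notes:

    • A minimal sketch of the idea (my toy stand-in, not the paper's adaptive seamless
      setting): treat Monte Carlo power as the expensive black box and let scikit-optimize's
      gp_minimize (assumed available) search the design space, here just the allocation
      fraction of a two-arm trial with unequal variances, for which the known optimum (Neyman
      allocation) is SD1 / (SD1 + SD2) = 1/3:

      import numpy as np
      from scipy.stats import norm
      from skopt import gp_minimize

      rng = np.random.default_rng(0)
      N, DELTA, SD1, SD2 = 200, 0.5, 1.0, 2.0  # total n, true effect, group SDs

      def neg_power(params, n_sim=400):
          # Black box: simulated power of a two-sample z-test when a fraction f of
          # the N patients goes to group 1; the optimizer minimizes minus the power.
          f = params[0]
          n1 = max(2, int(round(f * N)))
          n2 = N - n1
          se = np.sqrt(SD1**2 / n1 + SD2**2 / n2)
          z = (rng.normal(DELTA, SD1 / np.sqrt(n1), n_sim)
               - rng.normal(0.0, SD2 / np.sqrt(n2), n_sim)) / se
          return -np.mean(np.abs(z) > norm.isf(0.025))

      res = gp_minimize(neg_power, [(0.1, 0.9)], n_calls=30, random_state=0)
      print(f"best allocation fraction {res.x[0]:.2f}, power {-res.fun:.2f}")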