Type | Journal Article |
---|---|
Author | Spencer Hansen |
Author | Ken Rice |
URL | https://doi.org/10.1080/00031305.2022.2050299 |
Volume | 0 |
Issue | 0 |
Pages | 1-9 |
Publication | The American Statistician |
ISSN | 0003-1305 |
Date | 2022-03-08 |
Extra | Publisher: Taylor & Francis |
DOI | 10.1080/00031305.2022.2050299 |
Accessed | 4/9/2022, 11:06:10 AM |
Library Catalog | Taylor and Francis+NEJM |
Abstract | In a celebrated 1996 article, Schervish showed that, for testing interval null hypotheses, tests typically viewed as optimal can be logically incoherent. Specifically, one may fail to reject a specific interval null, but nevertheless—testing at the same level with the same data—reject a larger null, in which the original one is nested. This result has been used to argue against the widespread practice of viewing p-values as measures of evidence. In the current work we approach tests of interval nulls using simple Bayesian decision theory, and establish straightforward conditions that ensure coherence in Schervish’s sense. From these, we go on to establish novel frequentist criteria—different to Type I error rate—that, when controlled at fixed levels, give tests that are coherent in Schervish’s sense. The results suggest that exploring frequentist properties beyond the familiar Neyman–Pearson framework may ameliorate some of statistical testing’s well-known problems. |
Date Added | 4/9/2022, 11:06:10 AM |
Modified | 4/9/2022, 11:07:21 AM |
A new way of looking at Schervish's interval-null hypothesis testing incoherence example.
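The coherence mechanism is easy to sketch. A minimal illustration of my own (not the paper's construction): a conjugate normal model with known variance, and the rule that rejects an interval null when its posterior probability falls below a threshold. Nested intervals have ordered posterior probabilities, so rejecting a larger null forces rejection of any null nested inside it, and Schervish-style incoherence cannot occur.

```python
import numpy as np
from scipy import stats

def posterior_prob(x_bar, n, a, b, sigma=1.0, prior_mean=0.0, prior_sd=10.0):
    """Posterior P(a <= theta <= b | data) under a conjugate normal model."""
    prec = n / sigma**2 + 1 / prior_sd**2
    post_mean = (n * x_bar / sigma**2 + prior_mean / prior_sd**2) / prec
    post_sd = prec ** -0.5
    return stats.norm.cdf(b, post_mean, post_sd) - stats.norm.cdf(a, post_mean, post_sd)

def reject(x_bar, n, a, b, threshold=0.05):
    # If [a, b] is nested in [a2, b2] then P([a,b]|x) <= P([a2,b2]|x),
    # so rejecting the larger null implies rejecting the smaller one.
    return posterior_prob(x_bar, n, a, b) < threshold

x_bar, n = 1.2, 25
print(reject(x_bar, n, -0.5, 0.5))  # True: the narrow null is rejected
print(reject(x_bar, n, -1.0, 1.0))  # False: the wider null survives
```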
Type | Journal Article |
---|---|
Author | James O. Berger |
Author | Mohan Delampady |
URL | https://projecteuclid.org/journals/statistical-science/volume-2/issue-3/Testing-Precise-Hypotheses/10.1214/ss/1177013238.full |
Volume | 2 |
Issue | 3 |
Pages | 317-335 |
Publication | Statistical Science |
ISSN | 0883-4237, 2168-8745 |
Date | 1987-08 |
Extra | Publisher: Institute of Mathematical Statistics |
DOI | 10.1214/ss/1177013238 |
Accessed | 4/6/2022, 7:37:31 AM |
Library Catalog | Project Euclid |
Abstract | Testing of precise (point or small interval) hypotheses is reviewed, with special emphasis placed on exploring the dramatic conflict between conditional measures (Bayes factors and posterior probabilities) and the classical P-value (or observed significance level). This conflict is highlighted by finding lower bounds on the conditional measures over wide classes of priors, in normal and binomial situations; these lower bounds are much larger than the P-value, leading to the recommendation of several alternatives to P-values. Results are also given concerning the validity of approximating an interval null by a point null. The overall discussion features critical examination of issues such as the possibility of objective testing and the possibility of testing from confidence sets. |
Date Added | 4/6/2022, 7:37:31 AM |
Modified | 4/6/2022, 7:38:03 AM |
Quote from Section 4.6:
Some statisticians argue that the implied logic concerning a small P-value is compelling: “Either H0 is true and a rare event has occurred, or H0 is false.” One could again argue against this reasoning as addressing the wrong question, but there is a more obvious major flaw: the “rare event” whose probability is being calculated under H0 is not the event of observing the actual data x0, but the event E = {possible data x: |T(x)| ≥ |T(x0)|}. The inclusion of all data “more extreme” than the actual x0 is a curious step, and one for which we have seen no remotely convincing justification. … the “logic of surprise” cannot differentiate between x0 and E …
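The flavor of the paper's lower-bound results is easy to reproduce. A minimal sketch using the classic bound over the class of all alternatives for a normal point null, BF ≥ exp(−z²/2), one of the bounds the paper surveys among many prior classes (the z thresholds below are my choices):

```python
import numpy as np
from scipy import stats

def p_value(z):
    """Two-sided P-value for a standard normal test statistic."""
    return 2 * stats.norm.sf(abs(z))

def min_bayes_factor(z):
    """Lower bound on BF(H0 vs any alternative): the likelihood at theta0
    divided by the likelihood at the MLE, i.e., exp(-z^2/2)."""
    return np.exp(-z**2 / 2)

def min_posterior_prob(z, prior_odds=1.0):
    """Corresponding lower bound on P(H0 | data)."""
    bf = prior_odds * min_bayes_factor(z)
    return bf / (1 + bf)

for z in [1.96, 2.58, 3.29]:
    print(f"z={z}: p={p_value(z):.4f}, "
          f"min BF={min_bayes_factor(z):.4f}, "
          f"min P(H0|x)={min_posterior_prob(z):.4f}")
```

At z = 1.96 (p = 0.05), the posterior probability of H0 cannot drop below about 0.13 under equal prior odds, more than double the P-value; over narrower (e.g., symmetric) prior classes the bounds are larger still.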
Type | Journal Article |
---|---|
Author | Rianne de Heide |
Author | Peter D. Grünwald |
URL | https://doi.org/10.3758/s13423-020-01803-x |
Volume | 28 |
Issue | 3 |
Pages | 795-812 |
Publication | Psychonomic Bulletin & Review |
ISSN | 1531-5320 |
Date | 2021-06-01 |
Journal Abbr | Psychon Bull Rev |
DOI | 10.3758/s13423-020-01803-x |
Accessed | 3/24/2022, 11:11:58 AM |
Library Catalog | Springer Link |
Language | en |
Abstract | Recently, optional stopping has been a subject of debate in the Bayesian psychology community. Rouder (Psychonomic Bulletin & Review 21(2), 301–308, 2014) argues that optional stopping is no problem for Bayesians, and even recommends the use of optional stopping in practice, as do Wagenmakers, Wetzels, Borsboom, van der Maas, and Kievit (Perspectives on Psychological Science 7, 627–633, 2012). This article addresses the question of whether optional stopping is problematic for Bayesian methods, and specifies under which circumstances and in which sense it is and is not. By slightly varying and extending Rouder’s (Psychonomic Bulletin & Review 21(2), 301–308, 2014) experiments, we illustrate that, as soon as the parameters of interest are equipped with default or pragmatic priors—which means, in most practical applications of Bayes factor hypothesis testing—resilience to optional stopping can break down. We distinguish between three types of default priors, each having its own specific issues with optional stopping, ranging from no-problem-at-all (type 0 priors) to quite severe (type II priors). |
Date Added | 3/24/2022, 11:12:03 AM |
Modified | 3/24/2022, 11:12:56 AM |
Interesting taxonomy of priors
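The benign baseline case is easy to simulate. A minimal sketch of my own setup (not the authors'): normal data with known variance, a fixed proper N(0, τ²) prior on the effect under H1, and an optional-stopping rule that checks the Bayes factor after every observation while sampling under H0.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)

def ever_crosses(max_n=1000, threshold=3.0, tau=1.0):
    """Sample under H0: mu = 0 and monitor BF10 after every observation.
    H1 puts a fixed proper prior mu ~ N(0, tau^2); sigma = 1 is known, so
    the Bayes factor is the ratio of the two marginal densities of x_bar."""
    x = rng.normal(0.0, 1.0, max_n)
    n = np.arange(1, max_n + 1)
    x_bar = np.cumsum(x) / n
    bf10 = (stats.norm.pdf(x_bar, 0, np.sqrt(1 / n + tau**2))
            / stats.norm.pdf(x_bar, 0, np.sqrt(1 / n)))
    return bool(np.any(bf10 > threshold))

hits = sum(ever_crosses() for _ in range(500))
print(f"P(BF10 ever exceeds 3 under H0) ≈ {hits / 500:.2f}")
```

With a fixed proper prior the Bayes factor is a nonnegative martingale under H0 with mean 1, so the crossing probability stays below 1/3 no matter how long one monitors; the paper's point is that this guarantee can erode once default, data-dependent priors enter.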
Type | Journal Article |
---|---|
Author | Jacqueline K. Benedetti |
Author | Ping-Yu Liu |
Author | Harland N. Sather |
Author | Jack Seinfeld |
Author | Michael A. Epton |
URL | https://doi.org/10.1093/biomet/69.2.343 |
Volume | 69 |
Issue | 2 |
Pages | 343-349 |
Publication | Biometrika |
ISSN | 0006-3444 |
Date | 1982-08-01 |
Journal Abbr | Biometrika |
DOI | 10.1093/biomet/69.2.343 |
Accessed | 3/17/2022, 11:19:46 AM |
Library Catalog | Silverchair |
Abstract | When survival experience of two groups is compared in the presence of arbitrary right censoring, the effective sample size for determining the power of the test used is usually taken to be the number of uncensored observations. This convention is examined through a Monte Carlo study. Empirical powers of the generalized Savage test and generalized Wilcoxon test with uncensored data are compared to those with censored data containing approximately the same number of uncensored observations. Large sample relative efficiencies are calculated for a Lehmann family of alternatives. It is shown that, depending on the underlying distribution and censoring mechanism, censored observations can add appreciably to the power of either test. |
Date Added | 3/17/2022, 11:19:46 AM |
Modified | 3/17/2022, 11:20:03 AM |
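The study design is straightforward to reproduce in miniature. A Monte Carlo sketch under my own parameterization (exponential survival, independent exponential censoring, a hand-rolled two-sample log-rank statistic); comparing runs matched on the expected number of events would mimic the paper's question about whether uncensored observations alone determine power.

```python
import numpy as np

rng = np.random.default_rng(0)

def logrank_stat(time, event, group):
    """Two-sample log-rank statistic (chi-squared, 1 df)."""
    o_minus_e, var = 0.0, 0.0
    for t in np.unique(time[event == 1]):
        at_risk = time >= t
        n = at_risk.sum()
        n1 = (at_risk & (group == 1)).sum()
        d = ((time == t) & (event == 1)).sum()
        d1 = ((time == t) & (event == 1) & (group == 1)).sum()
        o_minus_e += d1 - d * n1 / n
        if n > 1:
            var += d * (n1 / n) * (1 - n1 / n) * (n - d) / (n - 1)
    return o_minus_e**2 / var if var > 0 else 0.0

def empirical_power(n_per_group, hazard_ratio=2.0, censor_rate=0.5, n_sim=1000):
    """Monte Carlo power of the log-rank test with exponential survival
    and independent exponential censoring."""
    crit = 3.841  # chi-squared(1) critical value at alpha = 0.05
    hits = 0
    for _ in range(n_sim):
        t = np.concatenate([rng.exponential(1.0, n_per_group),
                            rng.exponential(1.0 / hazard_ratio, n_per_group)])
        c = rng.exponential(1.0 / censor_rate, 2 * n_per_group)
        time, event = np.minimum(t, c), (t <= c).astype(int)
        group = np.repeat([0, 1], n_per_group)
        hits += logrank_stat(time, event, group) > crit
    return hits / n_sim

print(empirical_power(50))
```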
Type | Journal Article |
---|---|
Author | Niall D. Ferguson |
Author | Damon C. Scales |
Author | Ruxandra Pinto |
Author | M. Elizabeth Wilcox |
Author | Deborah J. Cook |
Author | Gordon H. Guyatt |
Author | Holger J. Schünemann |
Author | John C. Marshall |
Author | Margaret S. Herridge |
Author | Maureen O. Meade |
Author | Canadian Critical Care Trials Group |
Volume | 187 |
Issue | 3 |
Pages | 256-261 |
Publication | American Journal of Respiratory and Critical Care Medicine |
ISSN | 1535-4970 |
Date | 2013-02-01 |
Extra | PMID: 23204250 |
Journal Abbr | Am J Respir Crit Care Med |
DOI | 10.1164/rccm.201206-1057OC |
Library Catalog | PubMed |
Language | eng |
Abstract | RATIONALE: Outcome measures that integrate mortality and morbidity, like quality-adjusted life years (QALYs), have been proposed for critical care clinical trials. OBJECTIVES: We sought to describe the distribution of QALYs in critically ill patients and estimate sample size requirements for a hypothetical trial using QALYs as the primary outcome. METHODS: We used data from a prospective cohort study of survivors of acute respiratory distress syndrome to generate utility values and calculate QALYs at 6 and 12 months. Using multiple simulations, we estimated the required sample sizes for multiple outcome scenarios in a hypothetical trial, including a base-case wherein the intervention improved both mortality and QALYs among survivors. MEASUREMENTS AND MAIN RESULTS: From 195 enrolled patients, follow-up was sufficient to generate QALY outcomes for 168 (86.2%) at 6 months and 159 (81.5%) at 1 year. For a hypothetical intervention that reduced mortality from 48 to 44% and improved QALYs by 0.025 in survivors at 6 months, the required per-group sample size was 571 (80% power; two-sided α = 0.05), compared with 2,436 patients needed for a comparison focusing on mortality alone. When only mortality or QALY in survivors (but not both) showed improvement by these amounts, 3,426 and 1,827 patients per group were needed, respectively. When mortality and morbidity effects moved in opposite directions, simulation results became impossible to interpret. CONCLUSIONS: QALYs may be a feasible outcome in critical care trials yielding a patient-centered result and major gains in statistical power under certain conditions, but this approach is susceptible to several threats, including loss to follow-up. |
Short Title | Integrating mortality and morbidity outcomes |
Date Added | 3/14/2022, 12:45:35 PM |
Modified | 3/14/2022, 12:46:57 PM |
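The base-case calculation is easy to mimic by simulation: deaths score zero QALYs, survivors draw from a continuous distribution, and power comes from repeated two-sample tests. A rough sketch, where the survivor QALY mean 0.20 and SD 0.12 are illustrative guesses of mine (the paper drew these from its ARDS cohort):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)

def simulated_power(n_per_group, p_death_ctl=0.48, p_death_trt=0.44,
                    qaly_mean=0.20, qaly_gain=0.025, qaly_sd=0.12,
                    n_sim=2000, alpha=0.05):
    """Power of a two-sample t test on 6-month QALYs; deaths score zero.
    qaly_mean and qaly_sd for survivors are illustrative guesses, not
    values taken from the paper."""
    hits = 0
    for _ in range(n_sim):
        alive_c = rng.random(n_per_group) > p_death_ctl
        alive_t = rng.random(n_per_group) > p_death_trt
        q_c = np.where(alive_c, rng.normal(qaly_mean, qaly_sd, n_per_group), 0.0)
        q_t = np.where(alive_t,
                       rng.normal(qaly_mean + qaly_gain, qaly_sd, n_per_group),
                       0.0)
        # QALYs accrued over the interval cannot be negative
        q_c, q_t = np.clip(q_c, 0, None), np.clip(q_t, 0, None)
        hits += stats.ttest_ind(q_t, q_c).pvalue < alpha
    return hits / n_sim

print(simulated_power(571))  # the paper's base-case per-group n
```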
Type | Journal Article |
---|---|
Author | Ari M. Lipsky |
Author | Sander Greenland |
URL | https://doi.org/10.1001/jama.2022.1816 |
Publication | JAMA |
ISSN | 0098-7484 |
Date | 2022-02-28 |
Journal Abbr | JAMA |
DOI | 10.1001/jama.2022.1816 |
Accessed | 3/1/2022, 7:32:10 AM |
Library Catalog | Silverchair |
Abstract | The design and interpretation of clinical studies requires consideration of variables beyond the exposure or treatment of interest and patient outcomes, including decisions about which variables to capture and, of those, which to control for in statistical analyses to minimize bias in estimating treatment effects. Causal directed acyclic graphs (DAGs) are a useful tool for communicating researchers’ understanding of the potential interplay among variables and are commonly used for mediation analysis. Assumptions are presented visually in a causal DAG and, based on this visual representation, researchers can deduce which variables require control to minimize bias and which variables could introduce bias if controlled in the analysis. |
Date Added | 3/1/2022, 8:55:47 AM |
Modified | 3/1/2022, 8:57:34 AM |
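The deduce-what-to-control logic is easy to demonstrate numerically. A small simulation of my own (toy linear models, not from the article) contrasting the two canonical cases a DAG distinguishes: adjusting for a common cause (confounder) removes bias, while adjusting for a common effect (collider) creates it.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100_000

def ols(y, X):
    """Least-squares coefficients with an intercept prepended."""
    X = np.column_stack([np.ones(len(y))] + list(X))
    return np.linalg.lstsq(X, y, rcond=None)[0]

# Confounder: L -> A, L -> Y; the true effect of A on Y is 1.0
L = rng.normal(size=n)
A = L + rng.normal(size=n)
Y = 1.0 * A + 2.0 * L + rng.normal(size=n)
print(ols(Y, [A])[1])     # ~2.0: biased when the confounder is ignored
print(ols(Y, [A, L])[1])  # ~1.0: adjusting for L recovers the effect

# Collider: A -> C <- Y; the true effect of A on Y is 0
A2 = rng.normal(size=n)
Y2 = rng.normal(size=n)
C = A2 + Y2 + rng.normal(size=n)
print(ols(Y2, [A2])[1])     # ~0: correct without adjustment
print(ols(Y2, [A2, C])[1])  # nonzero: conditioning on the collider biases
```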
Type | Journal Article |
---|---|
Author | Ron Xiaolong Yu |
Author | Jitendra Ganju |
URL | https://onlinelibrary.wiley.com/doi/abs/10.1002/sim.9297 |
Volume | n/a |
Issue | n/a |
Publication | Statistics in Medicine |
ISSN | 1097-0258 |
Date | 2022 |
DOI | 10.1002/sim.9297 |
Accessed | 1/28/2022, 8:48:13 AM |
Library Catalog | Wiley Online Library |
Language | en |
Abstract | The win ratio composite endpoint, which organizes the components of the composite hierarchically, is becoming popular in late-stage clinical trials. The method involves comparing data in a pair-wise manner starting with the endpoint highest in priority (eg, cardiovascular death). If the comparison is a tie, the endpoint next highest in priority (eg, hospitalizations for heart failure) is compared, and so on. Its sample size is usually calculated through complex simulations because there does not exist in the literature a simple sample size formula. This article provides a formula that depends on the probability that a randomly selected patient from one group does better than a randomly selected patient from another group, and on the probability of a tie. We compare the published 95% confidence intervals, which require patient-level data, with that calculated from the formula, requiring only summary-level data, for 17 composite or single win ratio endpoints. The two sets of results are similar. Simulations show the sample size formula performs well. The formula provides important insights. It shows when adding an endpoint to the hierarchy can increase power even if the added endpoint has low power by itself. It provides relevant information to modify an on-going blinded trial if necessary. The formula allows a non-specialist to quickly determine the size of the trial with a win ratio endpoint whose use is expected to increase over time. |
Date Added | 3/1/2022, 8:55:59 AM |
Modified | 3/1/2022, 8:56:50 AM |
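The hierarchical comparison itself is simple to code. A sketch of the pairwise win/loss tally behind the win ratio, on toy, fully observed, made-up endpoints (real analyses use censored time-to-event comparisons, where some pairs are unresolvable):

```python
def win_ratio(group_t, group_c, comparators):
    """Pairwise hierarchical comparison. `comparators` is an ordered list of
    functions f(patient_t, patient_c) returning +1 (treatment wins),
    -1 (control wins), or 0 (tie -> fall through to the next endpoint)."""
    wins = losses = 0
    for pt in group_t:
        for pc in group_c:
            for cmp_fn in comparators:
                r = cmp_fn(pt, pc)
                if r != 0:
                    wins += r > 0
                    losses += r < 0
                    break
    return wins / losses, wins, losses

# Illustrative data: (survived follow-up, walk distance); made-up values.
treat = [(1, 350), (1, 300), (0, 200), (1, 410)]
ctrl  = [(1, 320), (0, 250), (0, 180), (1, 300)]

def survival(pt, pc):  # highest priority: vital status
    return (pt[0] > pc[0]) - (pt[0] < pc[0])

def walk(pt, pc):      # next priority: larger distance wins
    return (pt[1] > pc[1]) - (pt[1] < pc[1])

wr, w, l = win_ratio(treat, ctrl, [survival, walk])
print(f"wins={w}, losses={l}, win ratio={wr:.2f}")
```

Tallies like these yield the probability that a random treatment patient beats a random control patient, and the probability of a tie; these are exactly the quantities the paper's sample-size formula is built on.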
Type | Journal Article |
---|---|
Author | Jakob Richter |
Author | Tim Friede |
Author | Jörg Rahnenführer |
URL | https://onlinelibrary.wiley.com/doi/abs/10.1002/bimj.202000389 |
Volume | n/a |
Issue | n/a |
Publication | Biometrical Journal |
ISSN | 1521-4036 |
Date | 2021 |
DOI | 10.1002/bimj.202000389 |
Accessed | 2/28/2022, 11:57:27 AM |
Library Catalog | Wiley Online Library |
Language | en |
Abstract | We propose to use Bayesian optimization (BO) to improve the efficiency of the design selection process in clinical trials. BO is a method to optimize expensive black-box functions, by using a regression as a surrogate to guide the search. In clinical trials, planning test procedures and sample sizes is a crucial task. A common goal is to maximize the test power, given a set of treatments, corresponding effect sizes, and a total number of samples. From a wide range of possible designs, we aim to select the best one in a short time to allow quick decisions. The standard approach to simulate the power for each single design can become too time consuming. When the number of possible designs becomes very large, either large computational resources are required or an exhaustive exploration of all possible designs takes too long. Here, we propose to use BO to quickly find a clinical trial design with high power from a large number of candidate designs. We demonstrate the effectiveness of our approach by optimizing the power of adaptive seamless designs for different sets of treatment effect sizes. Comparing BO with an exhaustive evaluation of all candidate designs shows that BO finds competitive designs in a fraction of the time. |
Date Added | 2/28/2022, 11:57:27 AM |
Modified | 2/28/2022, 11:58:36 AM |
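The optimize-the-design-by-surrogate idea is easy to try on a toy problem. A sketch using scikit-optimize's gp_minimize as one off-the-shelf BO implementation (not necessarily the authors' tooling); the design variable is the allocation fraction between two arms with unequal outcome SDs, where the power-maximizing allocation is known to be roughly SD-proportional (about 2/3 here).

```python
import numpy as np
from skopt import gp_minimize

rng = np.random.default_rng(1)
N, DELTA, SD0, SD1 = 200, 0.5, 1.0, 2.0  # total n, effect size, arm SDs

def neg_power(params, n_sim=500):
    """Simulated power of a two-sample z test, negated for minimization."""
    frac = params[0]
    n1 = max(2, min(N - 2, int(frac * N)))
    n0 = N - n1
    rejections = 0
    for _ in range(n_sim):
        x0 = rng.normal(0.0, SD0, n0)
        x1 = rng.normal(DELTA, SD1, n1)
        se = np.sqrt(x0.var(ddof=1) / n0 + x1.var(ddof=1) / n1)
        rejections += abs(x1.mean() - x0.mean()) / se > 1.96
    return -rejections / n_sim

res = gp_minimize(neg_power, [(0.1, 0.9)], n_calls=30, random_state=0)
print(f"best allocation fraction ≈ {res.x[0]:.2f}, power ≈ {-res.fun:.2f}")
```

Each objective call is a 500-replicate power simulation, so 30 surrogate-guided calls stand in for the expensive black-box evaluation the paper targets, versus exhaustively simulating every candidate design.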