Item Type | Journal Article |
---|---|
Author | Alan Herschtal |
URL | https://doi.org/10.1186/s12874-023-01878-9 |
Volume | 23 |
Issue | 1 |
Pages | 60 |
Publication | BMC Medical Research Methodology |
ISSN | 1471-2288 |
Date | 2023-03-13 |
Journal Abbr | BMC Medical Research Methodology |
DOI | 10.1186/s12874-023-01878-9 |
Accessed | 9/11/2023, 12:08:26 PM |
Library Catalog | BioMed Central |
Abstract | Baseline imbalance in covariates associated with the primary outcome in clinical trials leads to bias in the reporting of results. Standard practice is to mitigate that bias by stratifying by those covariates in the randomization. Additionally, for continuously valued outcome variables, precision of estimates can be (and should be) improved by controlling for those covariates in analysis. Continuously valued covariates are commonly thresholded for the purpose of performing stratified randomization, with participants being allocated to arms such that balance between arms is achieved within each stratum. Often the thresholding consists of a simple dichotomization. For simplicity, it is also common practice to dichotomize the covariate when controlling for it at the analysis stage. This latter dichotomization is unnecessary, and has been shown in the literature to result in a loss of precision when compared with controlling for the covariate in its raw, continuous form. Analytic approaches to quantifying the magnitude of the loss of precision are generally confined to the most convenient case of a normally distributed covariate. This work generalises earlier findings, examining the effect on treatment effect estimation of dichotomizing skew-normal covariates, which are characteristic of a far wider range of real-world scenarios than their normal equivalents. |
Date Added | 9/11/2023, 12:08:26 PM |
Modified | 9/11/2023, 12:09:00 PM |
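Note: a minimal Python sketch of the phenomenon the paper quantifies (all parameter values here are illustrative, not taken from the paper): dichotomizing a skew-normal covariate at the analysis stage discards information and widens the sampling distribution of the treatment effect estimate relative to adjusting for the raw covariate.

```python
# Sketch: precision loss from dichotomizing a skew-normal covariate at analysis.
# Illustrative settings only; the paper derives this analytically.
import numpy as np
from scipy.stats import skewnorm

rng = np.random.default_rng(0)

def treatment_estimate(n=200, adjust="continuous"):
    x = skewnorm.rvs(a=5, size=n, random_state=rng)   # skew-normal baseline covariate
    arm = rng.integers(0, 2, n)                       # 1:1 randomization
    y = 1.0 * x + 0.5 * arm + rng.normal(size=n)      # covariate strongly prognostic
    if adjust == "dichotomized":
        x = (x > np.median(x)).astype(float)          # median split at analysis stage
    X = np.column_stack([np.ones(n), arm, x])         # ANCOVA design matrix
    return np.linalg.lstsq(X, y, rcond=None)[0][1]    # coefficient on arm

for adjust in ("continuous", "dichotomized"):
    ests = [treatment_estimate(adjust=adjust) for _ in range(2000)]
    print(f"{adjust:12s} empirical SE of treatment estimate: {np.std(ests):.4f}")
```

The dichotomized analysis leaves more covariate-driven variance in the residuals, so its empirical standard error comes out larger.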
Item Type | Journal Article |
---|---|
Author | Jeanne A. Teresi |
Author | Xiaoying Yu |
Author | Anita L. Stewart |
Author | Ron D. Hays |
URL | https://journals.lww.com/lww-medicalcare/abstract/2022/01000/guidelines_for_designing_and_evaluating.14.aspx |
Volume | 60 |
Issue | 1 |
Pages | 95 |
Publication | Medical Care |
ISSN | 0025-7079 |
Date | January 2022 |
DOI | 10.1097/MLR.0000000000001664 |
Accessed | 8/26/2023, 3:53:21 PM |
Library Catalog | journals.lww.com |
Language | en-US |
Abstract | Background: Pilot studies test the feasibility of methods and procedures to be used in larger-scale studies. Although numerous articles describe guidelines for the conduct of pilot studies, few have included specific feasibility indicators or strategies for evaluating multiple aspects of feasibility. In addition, using pilot studies to estimate effect sizes to plan sample sizes for subsequent randomized controlled trials has been challenged; however, there has been little consensus on alternative strategies. Methods: In Section 1, specific indicators (recruitment, retention, intervention fidelity, acceptability, adherence, and engagement) are presented for feasibility assessment of data collection methods and intervention implementation. Section 1 also highlights the importance of examining feasibility when adapting an intervention tested in mainstream populations to a new, more diverse group. In Section 2, statistical and design issues are presented, including sample sizes for pilot studies, estimates of minimally important differences, design effects, confidence intervals (CI) and nonparametric statistics. An in-depth treatment of the limits of effect size estimation as well as process variables is presented. Tables showing CI around parameters are provided. With small samples, effect size, completion, and adherence rate estimates will have large CIs. Conclusion: This commentary offers examples of indicators for evaluating feasibility, and of the limits of effect size estimation in pilot studies. As demonstrated, most pilot studies should not be used to estimate effect sizes, provide power calculations for statistical tests or perform exploratory analyses of efficacy. It is hoped that these guidelines will be useful to those planning pilot/feasibility studies before a larger-scale study.
Date Added | 8/26/2023, 3:53:21 PM |
Modified | 8/26/2023, 3:53:54 PM |
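Note: the paper's warning about wide intervals in small samples is easy to make concrete. A sketch with made-up pilot numbers (adherence 24/30; Cohen's d = 0.5 with 20 per arm, using the common large-sample approximation to the SE of d):

```python
# Width of 95% CIs for typical pilot-study quantities (illustrative numbers).
import numpy as np
from statsmodels.stats.proportion import proportion_confint

lo, hi = proportion_confint(24, 30, alpha=0.05, method="wilson")
print(f"adherence 24/30: 95% CI {lo:.2f} to {hi:.2f}")   # a very wide range

# Approximate CI for a standardized effect size d with n = 20 per arm
d, n1, n2 = 0.5, 20, 20
se = np.sqrt((n1 + n2) / (n1 * n2) + d**2 / (2 * (n1 + n2)))  # common approximation
print(f"d = {d}: 95% CI {d - 1.96*se:.2f} to {d + 1.96*se:.2f}")  # spans ~0 to ~1.1
```

An interval for d running from roughly -0.1 to 1.1 illustrates why the authors argue pilot effect sizes should not drive power calculations.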
Item Type | Preprint |
---|---|
Author | Florence Bockting |
Author | Stefan T. Radev |
Author | Paul-Christian Bürkner |
URL | http://arxiv.org/abs/2308.11672 |
Date | 2023-08-22 |
Extra | arXiv:2308.11672 [stat] |
DOI | 10.48550/arXiv.2308.11672 |
Accessed | 8/25/2023, 5:57:22 PM |
Library Catalog | arXiv.org |
Abstract | A central characteristic of Bayesian statistics is the ability to consistently incorporate prior knowledge into various modeling processes. In this paper, we focus on translating domain expert knowledge into corresponding prior distributions over model parameters, a process known as prior elicitation. Expert knowledge can manifest itself in diverse formats, including information about raw data, summary statistics, or model parameters. A major challenge for existing elicitation methods is how to effectively utilize all of these different formats in order to formulate prior distributions that align with the expert's expectations, regardless of the model structure. To address these challenges, we develop a simulation-based elicitation method that can learn the hyperparameters of potentially any parametric prior distribution from a wide spectrum of expert knowledge using stochastic gradient descent. We validate the effectiveness and robustness of our elicitation method in four representative case studies covering linear models, generalized linear models, and hierarchical models. Our results support the claim that our method is largely independent of the underlying model structure and adaptable to various elicitation techniques, including quantile-based, moment-based, and histogram-based methods. |
Repository | arXiv |
Archive ID | arXiv:2308.11672 |
Date Added | 8/25/2023, 5:57:22 PM |
Modified | 8/25/2023, 5:57:51 PM |
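Note: a toy sketch of the general idea (not the authors' implementation, which handles arbitrary parametric priors and expert knowledge mediated through a model): learn the hyperparameters of a normal prior by stochastic gradient descent so that its quantiles match hypothetical expert-elicited quantiles, via the reparameterization theta = mu + sigma * z.

```python
# Quantile-based prior elicitation by SGD on reparameterized samples (toy version).
import numpy as np

rng = np.random.default_rng(0)

expert = {0.25: 1.0, 0.50: 2.0, 0.75: 3.0}   # hypothetical expert quartiles
probs = np.array(sorted(expert))
targets = np.array([expert[p] for p in probs])

mu, log_sigma, lr = 0.0, 0.0, 0.05
for _ in range(3000):
    z = rng.normal(size=256)                        # fresh noise: stochastic gradients
    sigma = np.exp(log_sigma)
    theta = mu + sigma * z                          # reparameterized prior samples
    q = np.quantile(theta, probs)                   # model-implied quantiles
    zq = np.quantile(z, probs)                      # since q_p = mu + sigma * z_(p)
    grad_q = 2 * (q - targets)                      # d(squared loss)/dq
    mu -= lr * np.mean(grad_q)                      # dq/dmu = 1
    log_sigma -= lr * np.mean(grad_q * zq) * sigma  # dq/d(log sigma) = z_(p) * sigma
print(f"learned prior: Normal({mu:.2f}, {np.exp(log_sigma):.2f})")
# quartiles 1/2/3 imply mu = 2 and sigma = 1/0.674 ~ 1.48
```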
Item Type | Journal Article |
---|---|
Author | Charlotte Dugourd |
Author | Amna Abichou-Klich |
Author | René Ecochard |
Author | Fabien Subtil |
URL | https://onlinelibrary.wiley.com/doi/abs/10.1002/sim.9876 |
Volume | n/a |
Issue | n/a |
Publication | Statistics in Medicine |
Extra | Citation Key: https://doi.org/10.1002/sim.9876 tex.eprint: https://onlinelibrary.wiley.com/doi/pdf/10.1002/sim.9876 |
DOI | 10.1002/sim.9876 |
Abstract | Classifying patient biomarker trajectories into groups has become frequent in clinical research. Mixed effects classification models can be used to model the heterogeneity of longitudinal data. The estimated parameters of typical trajectories and the partition can be provided by the classification version of the expectation maximization algorithm, named CEM. However, the variance of the parameter estimates obtained underestimates the true variance because classification uncertainties are not taken into account. This article takes these uncertainties into account by using the stochastic EM algorithm (SEM), a stochastic version of the CEM algorithm, after convergence of the CEM algorithm. The simulations showed correct coverage probabilities of the 95% confidence intervals (close to 95% except for scenarios with high bias in typical trajectories). The method was applied to a trial, called low-cyclo, that compared the effects of low vs standard cyclosporine A doses on creatinine levels after cardiac transplantation. It identified groups of patients for whom low-dose cyclosporine may be relevant, but with high uncertainty on the dose-effect estimate. |
Date Added | 8/17/2023, 2:19:58 PM |
Modified | 8/17/2023, 2:21:28 PM |
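Note: a stripped-down sketch of the idea with a two-component Gaussian mixture instead of the paper's mixed effects trajectory model (all settings illustrative): run the hard-classification CEM to convergence, then switch to SEM, drawing class labels from their posterior so that the spread of the parameter draws reflects classification uncertainty.

```python
# CEM to convergence, then SEM draws to propagate classification uncertainty.
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(1)
x = np.concatenate([rng.normal(0, 1, 150), rng.normal(3, 1, 50)])

def m_step(x, z):
    mu = np.array([x[z == 0].mean(), x[z == 1].mean()])
    sd = np.sqrt(np.mean((x - mu[z]) ** 2))        # pooled within-class sd
    pi = (z == 1).mean()
    return mu, sd, pi

def posterior(x, mu, sd, pi):
    d0 = (1 - pi) * norm.pdf(x, mu[0], sd)
    d1 = pi * norm.pdf(x, mu[1], sd)
    return d1 / (d0 + d1)                          # P(class 1 | x)

z = (x > x.mean()).astype(int)                     # crude start
for _ in range(100):                               # CEM: hard classify, then update
    mu, sd, pi = m_step(x, z)
    z = (posterior(x, mu, sd, pi) > 0.5).astype(int)

draws = []
for _ in range(1000):                              # SEM after CEM convergence:
    z = rng.binomial(1, posterior(x, mu, sd, pi))  # draw labels instead of argmax
    mu, sd, pi = m_step(x, z)
    draws.append(mu.copy())
draws = np.array(draws)
print("CEM class means:", np.round(mu, 2))
print("SEM sd of means (classification uncertainty):", np.round(draws.std(axis=0), 3))
```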
Item Type | Journal Article |
---|---|
Author | Min Woo Sun |
Author | Robert Tibshirani |
URL | https://onlinelibrary.wiley.com/doi/abs/10.1002/sim.9873 |
Volume | n/a |
Issue | n/a |
Publication | Statistics in Medicine |
Extra | Citation Key: https://doi.org/10.1002/sim.9873 tex.eprint: https://onlinelibrary.wiley.com/doi/pdf/10.1002/sim.9873 |
DOI | 10.1002/sim.9873 |
Abstract | Cross-validation (CV) is one of the most widely used techniques in statistical learning for estimating the test error of a model, but its behavior is not yet fully understood. It has been shown that standard confidence intervals for test error using estimates from CV may have coverage below nominal levels. This phenomenon occurs because each sample is used in both the training and testing procedures during CV and as a result, the CV estimates of the errors become correlated. Without accounting for this correlation, the estimate of the variance is smaller than it should be. One way to mitigate this issue is by estimating the mean squared error of the prediction error instead using nested CV. This approach has been shown to achieve superior coverage compared to intervals derived from standard CV. In this work, we generalize the nested CV idea to the Cox proportional hazards model and explore various choices of test error for this setting. |
Date Added | 8/17/2023, 2:17:56 PM |
Modified | 8/17/2023, 2:20:52 PM |
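Note: a plain (non-nested) K-fold sketch of two candidate test errors for a Cox model, using the lifelines package and its bundled Rossi recidivism data; the fold-wise standard error printed here is exactly the naive quantity that, per this paper and Bates et al. below, tends to be too small because fold estimates are correlated.

```python
# K-fold CV of held-out concordance and partial log-likelihood for a Cox model.
import numpy as np
from lifelines import CoxPHFitter
from lifelines.datasets import load_rossi
from sklearn.model_selection import KFold

df = load_rossi()
cidx, loglik = [], []
for tr, te in KFold(5, shuffle=True, random_state=0).split(df):
    cph = CoxPHFitter().fit(df.iloc[tr], duration_col="week", event_col="arrest")
    cidx.append(cph.score(df.iloc[te], scoring_method="concordance_index"))
    loglik.append(cph.score(df.iloc[te], scoring_method="log_likelihood"))

for name, v in [("concordance", cidx), ("partial log-lik", loglik)]:
    v = np.asarray(v)
    # naive SE treats the folds as independent -- the quantity the paper corrects
    print(f"{name}: {v.mean():.3f} +/- {v.std(ddof=1) / np.sqrt(len(v)):.3f} (naive SE)")
```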
Item Type | Journal Article |
---|---|
Author | Matias D. Cattaneo |
Author | Luke Keele |
Author | Rocío Titiunik |
URL | https://onlinelibrary.wiley.com/doi/abs/10.1002/sim.9861 |
Rights | © 2023 John Wiley & Sons Ltd. |
Volume | n/a |
Issue | n/a |
Publication | Statistics in Medicine |
ISSN | 1097-0258 |
Extra | _eprint: https://onlinelibrary.wiley.com/doi/pdf/10.1002/sim.9861 |
DOI | 10.1002/sim.9861 |
Accessed | 8/2/2023, 5:33:06 PM |
Library Catalog | Wiley Online Library |
Language | en |
Abstract | We present a practical guide for the analysis of regression discontinuity (RD) designs in biomedical contexts. We begin by introducing key concepts, assumptions, and estimands within both the continuity-based framework and the local randomization framework. We then discuss modern estimation and inference methods within both frameworks, including approaches for bandwidth or local neighborhood selection, optimal treatment effect point estimation, and robust bias-corrected inference methods for uncertainty quantification. We also overview empirical falsification tests that can be used to support key assumptions. Our discussion focuses on two particular features that are relevant in biomedical research: (i) fuzzy RD designs, which often arise when therapeutic treatments are based on clinical guidelines, but patients with scores near the cutoff are treated contrary to the assignment rule; and (ii) RD designs with discrete scores, which are ubiquitous in biomedical applications. We illustrate our discussion with three empirical applications: the effect of CD4 guidelines for anti-retroviral therapy on retention of HIV patients in South Africa, the effect of genetic guidelines for chemotherapy on breast cancer recurrence in the United States, and the effects of age-based patient cost-sharing on healthcare utilization in Taiwan. Complete replication materials employing publicly available data and statistical software in Python, R and Stata are provided, offering researchers all necessary tools to conduct an RD analysis. |
Date Added | 8/2/2023, 5:33:06 PM |
Modified | 8/2/2023, 5:33:35 PM |
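Note: a bare-bones continuity-based sketch on simulated data (a real analysis would use the data-driven bandwidth selection and robust bias-corrected inference the guide describes): fit triangular-kernel local linear regressions on each side of the cutoff and take the difference in intercepts.

```python
# Sharp RD estimate via local linear regression with a triangular kernel.
import numpy as np

rng = np.random.default_rng(3)
n, cutoff, h, tau = 2000, 0.0, 0.5, 0.8           # bandwidth h, true effect tau
x = rng.uniform(-1, 1, n)                         # running variable (score)
d = (x >= cutoff).astype(float)                   # sharp assignment rule
y = 1.0 + 0.6 * x + tau * d + rng.normal(0, 0.5, n)

def intercept_at_cutoff(x, y, h):
    # weighted linear fit near 0; returns fitted value at the cutoff
    w = np.clip(1 - np.abs(x) / h, 0, None)       # triangular kernel weights
    keep = w > 0
    X = np.column_stack([np.ones(keep.sum()), x[keep]])
    W = np.sqrt(w[keep])
    b, *_ = np.linalg.lstsq(X * W[:, None], y[keep] * W, rcond=None)
    return b[0]

tau_hat = (intercept_at_cutoff(x[d == 1] - cutoff, y[d == 1], h)
           - intercept_at_cutoff(x[d == 0] - cutoff, y[d == 0], h))
print(f"RD estimate: {tau_hat:.3f} (truth {tau})")
```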
Item Type | Journal Article |
---|---|
Author | Sebastian Häckl |
Author | Armin Koch |
Author | Florian Lasch |
URL | https://onlinelibrary.wiley.com/doi/abs/10.1002/pst.2328 |
Rights | © 2023 The Authors. Pharmaceutical Statistics published by John Wiley & Sons Ltd. |
Volume | n/a |
Issue | n/a |
Publication | Pharmaceutical Statistics |
ISSN | 1539-1612 |
Extra | _eprint: https://onlinelibrary.wiley.com/doi/pdf/10.1002/pst.2328 |
DOI | 10.1002/pst.2328 |
Accessed | 7/31/2023, 2:19:19 PM |
Library Catalog | Wiley Online Library |
Language | en |
Abstract | Pre-specification of the primary analysis model is a prerequisite to control the family-wise type-I-error rate (T1E) at the intended level in confirmatory clinical trials. However, mixed models for repeated measures (MMRM) have been shown to be poorly specified in study protocols. The magnitude of a resulting T1E rate inflation is still unknown. This investigation aims to quantify the magnitude of the T1E rate inflation depending on the type and number of unspecified model items as well as different trial characteristics. We simulated a randomized, double-blind, parallel group, phase III clinical trial under the assumption that there is no treatment effect at any time point. The simulated data was analysed using different clusters, each including several MMRMs that are compatible with the imprecise pre-specification of the MMRM. T1E rates for each cluster were estimated. A significant T1E rate inflation could be shown for ambiguous model specifications with a maximum T1E rate of 7.6% [7.1%; 8.1%]. The results show that the magnitude of the T1E rate inflation depends on the type and number of unspecified model items as well as the sample size and allocation ratio. The imprecise specification of nuisance parameters may not lead to a significant T1E rate inflation. However, the results of this simulation study, if anything, underestimate the true T1E rate inflation. In conclusion, imprecise MMRM specifications may lead to a substantial inflation of the T1E rate and can damage the ability to generate confirmatory evidence in pivotal clinical trials. |
Date Added | 7/31/2023, 2:19:19 PM |
Modified | 7/31/2023, 2:20:15 PM |
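Note: the mechanism generalizes beyond MMRMs. A simplified ANCOVA analogy (not the authors' phase III repeated-measures simulation): under the null, fitting every model "compatible" with an ambiguous pre-specification and reporting the most favorable one inflates the type-I-error rate above the nominal 5%.

```python
# Type-I-error inflation from post hoc choice among "compatible" analysis models.
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(0)

def one_null_trial(n=100):
    df = pd.DataFrame({"arm": rng.integers(0, 2, n),
                       "x1": rng.normal(size=n), "x2": rng.normal(size=n)})
    df["y"] = 0.5 * df.x1 + 0.5 * df.x2 + rng.normal(size=n)  # no arm effect
    specs = ["y ~ arm", "y ~ arm + x1", "y ~ arm + x2", "y ~ arm + x1 + x2"]
    pvals = [smf.ols(f, df).fit().pvalues["arm"] for f in specs]
    return min(pvals)              # ambiguity resolved in favor of significance

rej = np.mean([one_null_trial() < 0.05 for _ in range(1000)])
print(f"T1E with post hoc model choice: {rej:.3f}")  # exceeds the nominal 0.05
```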
Item Type | Journal Article |
---|---|
Author | Richard D. Riley |
Author | Gary S. Collins |
URL | https://onlinelibrary.wiley.com/doi/abs/10.1002/bimj.202200302 |
Rights | © 2023 The Authors. Biometrical Journal published by Wiley-VCH GmbH. |
Volume | n/a |
Issue | n/a |
Pages | 2200302 |
Publication | Biometrical Journal |
ISSN | 1521-4036 |
Extra | _eprint: https://onlinelibrary.wiley.com/doi/pdf/10.1002/bimj.202200302 |
DOI | 10.1002/bimj.202200302 |
Accessed | 7/20/2023, 2:44:28 PM |
Library Catalog | Wiley Online Library |
Language | en |
Abstract | Clinical prediction models estimate an individual's risk of a particular health outcome. A developed model is a consequence of the development dataset and model-building strategy, including the sample size, number of predictors, and analysis method (e.g., regression or machine learning). We raise the concern that many models are developed using small datasets that lead to instability in the model and its predictions (estimated risks). We define four levels of model stability in estimated risks moving from the overall mean to the individual level. Through simulation and case studies of statistical and machine learning approaches, we show instability in a model's estimated risks is often considerable, and ultimately manifests itself as miscalibration of predictions in new data. Therefore, we recommend researchers always examine instability at the model development stage and propose instability plots and measures to do so. This entails repeating the model-building steps (those used to develop the original prediction model) in each of multiple (e.g., 1000) bootstrap samples, to produce multiple bootstrap models, and deriving (i) a prediction instability plot of bootstrap model versus original model predictions; (ii) the mean absolute prediction error (mean absolute difference between individuals’ original and bootstrap model predictions), and (iii) calibration, classification, and decision curve instability plots of bootstrap models applied in the original sample. A case study illustrates how these instability assessments help reassure (or not) whether model predictions are likely to be reliable (or not), while informing a model's critical appraisal (risk of bias rating), fairness, and further validation requirements. |
Date Added | 7/20/2023, 2:44:28 PM |
Modified | 7/20/2023, 2:45:05 PM |
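Note: a minimal sketch of the paper's bootstrap instability check on synthetic data (the paper recommends on the order of 1000 resamples, plus calibration, classification, and decision curve instability plots): repeat the model-building steps in bootstrap samples and summarize how much individuals' predictions move.

```python
# Prediction instability: bootstrap refits vs the original model's predictions.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
X, y = make_classification(n_samples=150, n_features=10, random_state=0)
orig = LogisticRegression(max_iter=1000).fit(X, y)
p_orig = orig.predict_proba(X)[:, 1]

mape = []
for _ in range(200):                        # paper suggests e.g. 1000 resamples
    idx = rng.integers(0, len(y), len(y))   # bootstrap resample
    boot = LogisticRegression(max_iter=1000).fit(X[idx], y[idx])
    mape.append(np.mean(np.abs(boot.predict_proba(X)[:, 1] - p_orig)))
print(f"mean absolute prediction error: {np.mean(mape):.3f}")
# plotting bootstrap vs original predictions gives the paper's instability plot
```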
Item Type | Journal Article |
---|---|
Author | Cécile Proust-Lima |
Author | Tiphaine Saulnier |
Author | Viviane Philipps |
Author | Anne Pavy-Le Traon |
Author | Patrice Péran |
Author | Olivier Rascol |
Author | Wassilios G. Meissner |
Author | Alexandra Foubert-Samier |
URL | https://onlinelibrary.wiley.com/doi/abs/10.1002/sim.9844 |
Rights | © 2023 The Authors. Statistics in Medicine published by John Wiley & Sons Ltd. |
Volume | n/a |
Issue | n/a |
Publication | Statistics in Medicine |
ISSN | 1097-0258 |
Extra | _eprint: https://onlinelibrary.wiley.com/doi/pdf/10.1002/sim.9844 |
DOI | 10.1002/sim.9844 |
Accessed | 7/20/2023, 2:43:27 PM |
Library Catalog | Wiley Online Library |
Language | en |
Abstract | Neurodegenerative diseases are characterized by numerous markers of progression and clinical endpoints. For instance, multiple system atrophy (MSA), a rare neurodegenerative synucleinopathy, is characterized by various combinations of progressive autonomic failure and motor dysfunction, and a very poor prognosis. Describing the progression of such complex and multi-dimensional diseases is particularly difficult. One has to simultaneously account for the assessment of multivariate markers over time, the occurrence of clinical endpoints, and a highly suspected heterogeneity between patients. Yet, such description is crucial for understanding the natural history of the disease, staging patients diagnosed with the disease, unravelling subphenotypes, and predicting the prognosis. Through the example of MSA progression, we show how a latent class approach modeling multiple repeated markers and clinical endpoints can help describe complex disease progression and identify subphenotypes for exploring new pathological hypotheses. The proposed joint latent class model includes class-specific multivariate mixed models to handle multivariate repeated biomarkers possibly summarized into latent dimensions and class-and-cause-specific proportional hazard models to handle time-to-event data. The maximum likelihood estimation procedure, validated through simulations, is available in the lcmm R package. In the French MSA cohort, comprising data from 598 patients followed for up to 13 years, five subphenotypes of MSA were identified that differ by the sequence and shape of biomarker degradation, and the associated risk of death. In posterior analyses, the five subphenotypes were used to explore the association between clinical progression and external imaging and fluid biomarkers, while properly accounting for the uncertainty in subphenotype membership. |
Date Added | 7/20/2023, 2:43:27 PM |
Modified | 7/20/2023, 2:44:07 PM |
Item Type | Journal Article |
---|---|
Author | Ellicott C. Matthay |
Author | M. Maria Glymour |
URL | https://journals.lww.com/epidem/Fulltext/2020/05000/A_Graphical_Catalog_of_Threats_to_Validity_.11.aspx |
Volume | 31 |
Issue | 3 |
Pages | 376 |
Publication | Epidemiology |
ISSN | 1044-3983 |
Date | May 2020 |
DOI | 10.1097/EDE.0000000000001161 |
Accessed | 7/18/2023, 6:08:06 AM |
Library Catalog | journals.lww.com |
Language | en-US |
Abstract | Directed acyclic graphs (DAGs), a prominent tool for expressing assumptions in epidemiologic research, are most useful when the hypothetical data generating structure is correctly encoded. Understanding a study’s data generating structure and translating that data structure into a DAG can be challenging, but these skills are often glossed over in training. Campbell and Stanley’s framework for causal inference has been extraordinarily influential in social science training programs but has received less attention in epidemiology. Their work, along with subsequent revisions and enhancements based on practical experience conducting empirical studies, presents a catalog of 37 threats to validity describing reasons empirical studies may fail to deliver causal effects. We interpret most of these threats to study validity as suggestions for common causal structures. Threats are organized into issues of statistical conclusion validity, internal validity, construct validity, or external validity. To assist epidemiologists in drawing the correct DAG for their application, we map the correspondence between threats to validity and epidemiologic concepts that can be represented with DAGs. Representing these threats as DAGs makes them amenable to formal analysis with d-separation rules and breaks down cross-disciplinary language barriers in communicating methodologic issues. |
Short Title | A Graphical Catalog of Threats to Validity |
Date Added | 7/18/2023, 6:10:59 AM |
Modified | 7/18/2023, 6:10:59 AM |
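Note: many of the catalogued threats correspond to simple causal structures whose consequences can be checked by simulation. A small sketch of one such threat, selection represented as a collider X → C ← Y (hypothetical variables): conditioning on the collider manufactures an X–Y association where none exists.

```python
# Collider/selection bias: conditioning on a common effect induces association.
import numpy as np

rng = np.random.default_rng(0)
n = 100_000
x = rng.normal(size=n)
y = rng.normal(size=n)                 # x and y truly independent
c = x + y + rng.normal(size=n)         # common effect (collider)

print("marginal corr(x, y):", round(np.corrcoef(x, y)[0, 1], 3))            # ~0
sel = c > 0                            # "selected" subsample, as in a selection threat
print("corr(x, y | c > 0) :", round(np.corrcoef(x[sel], y[sel])[0, 1], 3))  # negative
```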
Item Type | Journal Article |
---|---|
Author | Jenny Devenport |
Author | Alexander Schacht |
Author | The Launch & Lifecycle Special Interest Group within PSI |
URL | https://onlinelibrary.wiley.com/doi/abs/10.1002/pst.2325 |
Rights | © 2023 John Wiley & Sons Ltd. |
Volume | n/a |
Issue | n/a |
Publication | Pharmaceutical Statistics |
ISSN | 1539-1612 |
Extra | _eprint: https://onlinelibrary.wiley.com/doi/pdf/10.1002/pst.2325 |
DOI | 10.1002/pst.2325 |
Accessed | 7/12/2023, 12:48:50 PM |
Library Catalog | Wiley Online Library |
Language | en |
Abstract | The role and value of statistical contributions in drug development up to the point of health authority approval are well understood. But health authority approval is only a true ‘win’ if the evidence enables access and adoption into clinical practice. In today's complex and evolving healthcare environment, there is additional strategic evidence generation, communication, and decision support that can benefit from statistical contributions. In this article, we describe the history of medical affairs in the context of drug development, the factors driving post-approval evidence generation needs, and the opportunities for statisticians to optimize evidence generation for stakeholders beyond health authorities in order to ensure that new medicines reach appropriate patients. |
Short Title | Leading beyond regulatory approval |
Date Added | 7/12/2023, 12:48:50 PM |
Modified | 7/12/2023, 12:49:09 PM |
Item Type | Journal Article |
---|---|
Author | Farhad Hatami |
Author | Alex Ocampo |
Author | Gordon Graham |
Author | Thomas E Nichols |
Author | Habib Ganjgahi |
URL | https://doi.org/10.1093/biostatistics/kxad012 |
Pages | kxad012 |
Publication | Biostatistics |
ISSN | 1465-4644 |
Date | 2023-07-11 |
Journal Abbr | Biostatistics |
DOI | 10.1093/biostatistics/kxad012 |
Accessed | 7/12/2023, 12:46:39 PM |
Library Catalog | Silverchair |
Abstract | Existing methods for fitting continuous time Markov models (CTMM) in the presence of covariates suffer from scalability issues due to high computational cost of matrix exponentials calculated for each observation. In this article, we propose an optimization technique for CTMM which uses a stochastic gradient descent algorithm combined with differentiation of the matrix exponential using a Padé approximation. This approach makes fitting large scale data feasible. We present two methods for computing standard errors, one novel approach using the Padé expansion and the other using power series expansion of the matrix exponential. Through simulations, we find improved performance relative to existing CTMM methods, and we demonstrate the method on the large-scale multiple sclerosis NO.MS data set. |
Date Added | 7/12/2023, 12:46:39 PM |
Modified | 7/12/2023, 12:47:13 PM |
Develops a Padé-based approximation to the derivative of the matrix exponential.
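Note: SciPy exposes a scaling-and-squaring/Padé-based Fréchet derivative of the matrix exponential (`scipy.linalg.expm_frechet`), which is enough to sketch the core computation: the gradient of the CTMM transition matrix P(t) = exp(Qt) with respect to a rate parameter. The 3-state generator below is a made-up toy, not the NO.MS model.

```python
# Gradient of transition probabilities wrt a rate parameter via the Frechet
# derivative of the matrix exponential (toy 3-state CTMM generator).
import numpy as np
from scipy.linalg import expm_frechet

def Q(theta):
    # rate of the 0 -> 1 transition is theta; state 2 is absorbing
    return np.array([[-(theta + 0.1), theta, 0.1],
                     [0.2, -0.5, 0.3],
                     [0.0, 0.0, 0.0]])

theta, t = 0.4, 2.0
dQ = np.zeros((3, 3))
dQ[0, 0], dQ[0, 1] = -1.0, 1.0       # dQ/dtheta: direction of differentiation

# chain rule: d/dtheta expm(t * Q(theta)) = L(t*Q, t * dQ/dtheta),
# where L is the Frechet derivative (linear in its direction argument)
P, dP = expm_frechet(Q(theta) * t, dQ * t)
print("P(t), row 0      :", np.round(P[0], 4))   # transition probs from state 0
print("dP/dtheta, row 0 :", np.round(dP[0], 4))  # their gradients wrt theta
```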
Item Type | Journal Article |
---|---|
Author | Yanxun Xu |
Author | Peter Müller |
Author | Abdus S. Wahed |
Author | Peter F. Thall |
URL | https://doi.org/10.1080/01621459.2015.1086353 |
Volume | 111 |
Issue | 515 |
Pages | 921-950 |
Publication | Journal of the American Statistical Association |
ISSN | 0162-1459 |
Date | 2016-07-02 |
Extra | Publisher: Taylor & Francis _eprint: https://doi.org/10.1080/01621459.2015.1086353 PMID: 28018015 |
DOI | 10.1080/01621459.2015.1086353 |
Accessed | 7/11/2023, 9:52:33 AM |
Library Catalog | Taylor and Francis+NEJM |
Abstract | We analyze a dataset arising from a clinical trial involving multi-stage chemotherapy regimes for acute leukemia. The trial design was a 2 × 2 factorial for frontline therapies only. Motivated by the idea that subsequent salvage treatments affect survival time, we model therapy as a dynamic treatment regime (DTR), that is, an alternating sequence of adaptive treatments or other actions and transition times between disease states. These sequences may vary substantially between patients, depending on how the regime plays out. To evaluate the regimes, mean overall survival time is expressed as a weighted average of the means of all possible sums of successive transition times. We assume a Bayesian nonparametric survival regression model for each transition time, with a dependent Dirichlet process prior and Gaussian process base measure (DDP-GP). Posterior simulation is implemented by Markov chain Monte Carlo (MCMC) sampling. We provide general guidelines for constructing a prior using empirical Bayes methods. The proposed approach is compared with inverse probability of treatment weighting, including a doubly robust augmented version of this approach, for both single-stage and multi-stage regimes with treatment assignment depending on baseline covariates. The simulations show that the proposed nonparametric Bayesian approach can substantially improve inference compared to existing methods. An R program for implementing the DDP-GP-based Bayesian nonparametric analysis is freely available at www.ams.jhu.edu/yxu70. Supplementary materials for this article are available online. |
Date Added | 7/11/2023, 9:52:33 AM |
Modified | 7/11/2023, 9:53:23 AM |
Item Type | Journal Article |
---|---|
Author | Stijn Hawinkel |
Author | Willem Waegeman |
Author | Steven Maere |
URL | https://doi.org/10.1080/00031305.2023.2216252 |
Volume | 0 |
Issue | 0 |
Pages | 1-11 |
Publication | The American Statistician |
ISSN | 0003-1305 |
Date | 2023-05-25 |
Extra | Publisher: Taylor & Francis _eprint: https://doi.org/10.1080/00031305.2023.2216252 |
DOI | 10.1080/00031305.2023.2216252 |
Accessed | 7/6/2023, 2:04:35 PM |
Library Catalog | Taylor and Francis+NEJM |
Abstract | Out-of-sample prediction is the acid test of predictive models, yet an independent test dataset is often not available for assessment of the prediction error. For this reason, out-of-sample performance is commonly estimated using data splitting algorithms such as cross-validation or the bootstrap. For quantitative outcomes, the ratio of variance explained to total variance can be summarized by the coefficient of determination or in-sample R², which is easy to interpret and to compare across different outcome variables. As opposed to in-sample R², out-of-sample R² has not been well defined and the variability on out-of-sample R̂² has been largely ignored. Usually only its point estimate is reported, hampering formal comparison of predictability of different outcome variables. Here we explicitly define out-of-sample R² as a comparison of two predictive models, provide an unbiased estimator and exploit recent theoretical advances on uncertainty of data splitting estimates to provide a standard error for R̂². The performance of the estimators for R² and its standard error is investigated in a simulation study. We demonstrate our new method by constructing confidence intervals and comparing models for prediction of quantitative Brassica napus and Zea mays phenotypes based on gene expression data. Our method is available in the R-package oosse. |
Short Title | Out-of-Sample R² |
Date Added | 7/6/2023, 2:04:35 PM |
Modified | 7/6/2023, 2:05:00 PM |
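Note: the authors' estimator and standard error ship in their R package oosse; a rough Python analogue of the definition itself (out-of-sample R² as one minus the ratio of the model's cross-validated MSE to that of the train-mean benchmark):

```python
# Out-of-sample R^2 as a comparison of two predictive models: the fitted model
# vs the benchmark that always predicts the training mean.
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import KFold

rng = np.random.default_rng(0)
n, p = 200, 10
X = rng.normal(size=(n, p))
y = X[:, 0] + rng.normal(size=n)

mse_model, mse_mean = [], []
for tr, te in KFold(10, shuffle=True, random_state=0).split(X):
    fit = LinearRegression().fit(X[tr], y[tr])
    mse_model.append(np.mean((y[te] - fit.predict(X[te])) ** 2))
    mse_mean.append(np.mean((y[te] - y[tr].mean()) ** 2))   # benchmark model

r2_oos = 1 - np.mean(mse_model) / np.mean(mse_mean)
print(f"out-of-sample R^2: {r2_oos:.3f}")
```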
Item Type | Journal Article |
---|---|
Author | Sandra Ofori |
Author | Teresa Cafaro |
Author | P. J. Devereaux |
Author | Maura Marcucci |
Author | Lawrence Mbuagbaw |
Author | Lehana Thabane |
Author | Gordon Guyatt |
URL | https://www.jclinepi.com/article/S0895-4356(23)00169-5/fulltext?rss=yes |
Volume | 0 |
Issue | 0 |
Publication | Journal of Clinical Epidemiology |
ISSN | 0895-4356, 1878-5921 |
Date | 2023-07-06 |
Extra | Publisher: Elsevier |
Journal Abbr | Journal of Clinical Epidemiology |
DOI | 10.1016/j.jclinepi.2023.06.022 |
Accessed | 7/6/2023, 2:03:24 PM |
Library Catalog | www.jclinepi.com |
Language | English |
Date Added | 7/6/2023, 2:03:24 PM |
Modified | 7/6/2023, 2:03:41 PM |
Item Type | Journal Article |
---|---|
Author | Elinor Curnow |
Author | James R. Carpenter |
Author | Jon E. Heron |
Author | Rosie P. Cornish |
Author | Stefan Rach |
Author | Vanessa Didelez |
Author | Malte Langeheine |
Author | Kate Tilling |
URL | https://www.jclinepi.com/article/S0895-4356(23)00158-0/fulltext?rss=yes |
Volume | 0 |
Issue | 0 |
Publication | Journal of Clinical Epidemiology |
ISSN | 0895-4356, 1878-5921 |
Date | 2023-06-19 |
Extra | Publisher: Elsevier |
Journal Abbr | Journal of Clinical Epidemiology |
DOI | 10.1016/j.jclinepi.2023.06.011 |
Accessed | 6/19/2023, 1:38:19 PM |
Library Catalog | www.jclinepi.com |
Language | English |
Short Title | Multiple imputation of missing data under missing at random |
Date Added | 6/19/2023, 1:38:19 PM |
Modified | 6/19/2023, 1:39:09 PM |
If relationships between variables are nonlinear and the imputation model assumes they are linear, multiple imputation may not work well.
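Note: a quick sketch of the point above (hand-rolled stochastic regression imputation on synthetic data, not the paper's analysis): when the outcome depends on x quadratically but x is imputed from y with a linear model, the pooled x² coefficient is attenuated.

```python
# Misspecified (linear) imputation model under a truly quadratic relationship.
import numpy as np

rng = np.random.default_rng(0)
n = 2000
x = rng.normal(size=n)
y = x**2 + rng.normal(scale=0.5, size=n)            # outcome quadratic in x
miss = rng.random(n) < 0.4                          # 40% of x missing (MCAR)
obs = ~miss

def quad_coef(x, y):
    X = np.column_stack([np.ones_like(x), x, x**2])
    return np.linalg.lstsq(X, y, rcond=None)[0][2]  # coefficient on x^2

# stochastic regression imputation with a *linear* model x ~ y (misspecified)
beta = np.polyfit(y[obs], x[obs], 1)
resid_sd = (x[obs] - np.polyval(beta, y[obs])).std()
ests = []
for _ in range(20):                                 # 20 imputed datasets
    xi = x.copy()
    xi[miss] = np.polyval(beta, y[miss]) + rng.normal(scale=resid_sd, size=miss.sum())
    ests.append(quad_coef(xi, y))
print("full-data x^2 coefficient  :", round(quad_coef(x, y), 3))
print("MI (linear) x^2 coefficient:", round(np.mean(ests), 3))  # attenuated
```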
Item Type | Journal Article |
---|---|
Author | Pierre-Emmanuel Poulet |
Author | Stanley Durrleman |
URL | https://onlinelibrary.wiley.com/doi/abs/10.1002/sim.9770 |
Rights | © 2023 John Wiley & Sons Ltd. |
Volume | n/a |
Issue | n/a |
Publication | Statistics in Medicine |
ISSN | 1097-0258 |
Extra | _eprint: https://onlinelibrary.wiley.com/doi/pdf/10.1002/sim.9770 |
DOI | 10.1002/sim.9770 |
Accessed | 5/30/2023, 7:02:49 AM |
Library Catalog | Wiley Online Library |
Language | en |
Abstract | Disease modeling is an essential tool to describe disease progression and its heterogeneity across patients. Usual approaches use continuous data such as biomarkers to assess progression. Nevertheless, categorical or ordinal data such as item responses in questionnaires also provide insightful information about disease progression. In this work, we propose a disease progression model for ordinal and categorical data. We built it on the principles of disease course mapping, a technique that uniquely describes the variability in both the dynamics of progression and disease heterogeneity from multivariate longitudinal data. This extension can also be seen as an attempt to bridge the gap between longitudinal multivariate models and the field of item response theory. Application to the Parkinson's progression markers initiative cohort illustrates the benefits of our approach: a fine-grained description of disease progression at the item level, as compared to the aggregated total score, together with improved predictions of the patient's future visits. The analysis of the heterogeneity across individual trajectories highlights known disease trends such as tremor dominant or postural instability and gait difficulties subtypes of Parkinson's disease. |
Date Added | 5/30/2023, 7:02:49 AM |
Modified | 5/30/2023, 7:03:47 AM |
Item Type | Journal Article |
---|---|
Author | Marco Riani |
Author | Anthony C. Atkinson |
Author | Aldo Corbellini |
URL | https://doi.org/10.1080/10618600.2023.2205447 |
Pages | 1-16 |
Publication | Journal of Computational and Graphical Statistics |
ISSN | 1061-8600 |
Date | 2023-04-20 |
Extra | Publisher: Taylor & Francis |
Journal Abbr | Journal of Computational and Graphical Statistics |
DOI | 10.1080/10618600.2023.2205447 |
Date Added | 5/30/2023, 7:00:38 AM |
Modified | 5/30/2023, 7:01:17 AM |
Item Type | Journal Article |
---|---|
Author | Hyunwoo Kim |
Author | Hamad Shahbal |
Author | Sameer Parpia |
Author | Tauben Averbuch |
Author | Harriette G. C. Van Spall |
Author | Lehana Thabane |
Author | Jinhui Ma |
URL | https://www.jclinepi.com/article/S0895-4356(23)00128-2/fulltext?rss=yes |
Volume | 0 |
Issue | 0 |
Publication | Journal of Clinical Epidemiology |
ISSN | 0895-4356, 1878-5921 |
Date | 2023-05-26 |
Extra | Publisher: Elsevier PMID: 37245700 |
Journal Abbr | Journal of Clinical Epidemiology |
DOI | 10.1016/j.jclinepi.2023.05.015 |
Accessed | 5/30/2023, 6:48:52 AM |
Library Catalog | www.jclinepi.com |
Language | English |
Short Title | Trials using composite outcomes neglect the presence of competing risks |
Date Added | 5/30/2023, 6:48:52 AM |
Modified | 5/30/2023, 6:50:17 AM |
Item Type | Journal Article |
---|---|
Author | Alexander Pate |
Author | Matthew Sperrin |
Author | Richard D. Riley |
Author | Jamie C. Sergeant |
Author | Tjeerd Van Staa |
Author | Niels Peek |
Author | Mamas A. Mamas |
Author | Gregory Y. H. Lip |
Author | Martin O'Flaherty |
Author | Iain Buchan |
Author | Glen P. Martin |
URL | https://onlinelibrary.wiley.com/doi/abs/10.1002/sim.9771 |
Volume | n/a |
Issue | n/a |
Publication | Statistics in Medicine |
ISSN | 1097-0258 |
Extra | _eprint: https://onlinelibrary.wiley.com/doi/pdf/10.1002/sim.9771 |
DOI | 10.1002/sim.9771 |
Accessed | 5/24/2023, 6:55:14 AM |
Library Catalog | Wiley Online Library |
Language | en |
Abstract | Introduction: This study considers the prediction of the time until two survival outcomes have both occurred. We compared a variety of analytical methods motivated by a typical clinical problem of multimorbidity prognosis. Methods: We considered five methods: product (multiply marginal risks), dual-outcome (directly model the time until both events occur), multistate models (msm), and a range of copula and frailty models. We assessed calibration and discrimination under a variety of simulated data scenarios, varying outcome prevalence and the amount of residual correlation. The simulation focused on model misspecification and statistical power. Using data from the Clinical Practice Research Datalink, we compared model performance when predicting the risk of cardiovascular disease and type 2 diabetes both occurring. Results: Discrimination was similar for all methods. The product method was poorly calibrated in the presence of residual correlation. The msm and dual-outcome models were the most robust to model misspecification but suffered a drop in performance at small sample sizes due to overfitting, which the copula and frailty models were less susceptible to. The copula and frailty models' performance was highly dependent on the underlying data structure. In the clinical example, the product method was poorly calibrated when adjusting for 8 major cardiovascular risk factors. Discussion: We recommend the dual-outcome method for predicting the risk of two survival outcomes both occurring. It was the most robust to model misspecification, although it was also the most prone to overfitting. The clinical example motivates the use of the methods considered in this study. |
Short Title | Developing prediction models to estimate the risk of two survival outcomes both occurring |
Date Added | 5/24/2023, 6:55:14 AM |
Modified | 5/24/2023, 6:56:15 AM |
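Note: the product method's calibration failure under residual correlation is easy to demonstrate. A sketch with a Gaussian copula over exponential margins (all parameter values illustrative):

```python
# P(both events by time t): true joint probability vs the "product method".
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(7)
rho, lam1, lam2, t = 0.6, 0.10, 0.15, 5.0     # residual correlation, hazards, horizon
n = 200_000

z = rng.multivariate_normal([0, 0], [[1, rho], [rho, 1]], size=n)
u = norm.cdf(z)                               # Gaussian copula
t1 = -np.log(1 - u[:, 0]) / lam1              # exponential event times
t2 = -np.log(1 - u[:, 1]) / lam2

true_joint = np.mean((t1 <= t) & (t2 <= t))
product = np.mean(t1 <= t) * np.mean(t2 <= t) # multiply marginal risks
print(f"true joint risk: {true_joint:.4f}, product method: {product:.4f}")
# with rho > 0 the product method systematically underestimates the joint risk
```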
Item Type | Journal Article |
---|---|
Author | Anna Lohmann |
Author | Rolf H. H. Groenwold |
Author | Maarten van Smeden |
URL | https://onlinelibrary.wiley.com/doi/abs/10.1002/bimj.202200108 |
Volume | n/a |
Issue | n/a |
Pages | 2200108 |
Publication | Biometrical Journal |
ISSN | 1521-4036 |
Date | 2023 |
Extra | _eprint: https://onlinelibrary.wiley.com/doi/pdf/10.1002/bimj.202200108 |
DOI | 10.1002/bimj.202200108 |
Accessed | 5/19/2023, 5:39:06 PM |
Library Catalog | Wiley Online Library |
Language | en |
Abstract | Logistic regression is one of the most commonly used approaches to develop clinical risk prediction models. Developers of such models often rely on approaches that aim to minimize the risk of overfitting and improve predictive performance of the logistic model, such as through likelihood penalization and variance decomposition techniques. We present an extensive simulation study that compares the out-of-sample predictive performance of risk prediction models derived using the elastic net, with Lasso and ridge as special cases, and variance decomposition techniques, namely, incomplete principal component regression and incomplete partial least squares regression. We varied the expected events per variable, event fraction, number of candidate predictors, presence of noise predictors, and the presence of sparse predictors in a full-factorial design. Predictive performance was compared on measures of discrimination, calibration, and prediction error. Simulation metamodels were derived to explain the performance differences within model derivation approaches. Our results indicate that, on average, prediction models developed using penalization and variance decomposition approaches outperform models developed using ordinary maximum likelihood estimation, with penalization approaches being consistently superior over the variance decomposition approaches. Differences in performance were most pronounced on the calibration of the model. Performance differences regarding prediction error and concordance statistic outcomes were often small between approaches. The use of likelihood penalization and variance decomposition techniques was illustrated in the context of peripheral arterial disease. |
Short Title | Comparison of likelihood penalization and variance decomposition approaches for clinical prediction models |
Date Added | 5/19/2023, 5:39:06 PM |
Modified | 5/19/2023, 5:40:48 PM |
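Note: a minimal sketch of the study's core contrast on synthetic data (the paper's full-factorial design varies events per variable, event fraction, and more, and also benchmarks variance decomposition methods): ordinary maximum likelihood versus an elastic net penalized logistic model, compared on out-of-sample Brier score.

```python
# MLE vs elastic net logistic regression, compared out of sample.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import brier_score_loss
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=300, n_features=50, n_informative=5,
                           weights=[0.8], random_state=0)
Xtr, Xte, ytr, yte = train_test_split(X, y, test_size=0.5, random_state=0)

# near-unpenalized fit (huge C) as a stand-in for ordinary maximum likelihood
mle = LogisticRegression(C=1e6, max_iter=5000).fit(Xtr, ytr)
# elastic net: saga solver, l1_ratio mixes lasso (1) and ridge (0) penalties
enet = LogisticRegression(penalty="elasticnet", solver="saga", l1_ratio=0.5,
                          C=0.1, max_iter=5000).fit(Xtr, ytr)

for name, m in [("MLE", mle), ("elastic net", enet)]:
    brier = brier_score_loss(yte, m.predict_proba(Xte)[:, 1])
    print(f"{name:12s} out-of-sample Brier score: {brier:.4f}")
```

In practice the penalty strength C and l1_ratio would be tuned by cross-validation rather than fixed as here.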
Item Type | Journal Article |
---|---|
Author | Stephen Bates |
Author | Trevor Hastie |
Author | Robert Tibshirani |
URL | https://doi.org/10.1080/01621459.2023.2197686 |
Volume | 0 |
Issue | 0 |
Pages | 1-12 |
Publication | Journal of the American Statistical Association |
ISSN | 0162-1459 |
Date | 2023-04-03 |
Extra | Publisher: Taylor & Francis _eprint: https://doi.org/10.1080/01621459.2023.2197686 |
DOI | 10.1080/01621459.2023.2197686 |
Accessed | 5/16/2023, 4:30:05 PM |
Library Catalog | Taylor and Francis+NEJM |
Abstract | Cross-validation is a widely used technique to estimate prediction error, but its behavior is complex and not fully understood. Ideally, one would like to think that cross-validation estimates the prediction error for the model at hand, fit to the training data. We prove that this is not the case for the linear model fit by ordinary least squares; rather it estimates the average prediction error of models fit on other unseen training sets drawn from the same population. We further show that this phenomenon occurs for most popular estimates of prediction error, including data splitting, bootstrapping, and Mallows' Cp. Next, the standard confidence intervals for prediction error derived from cross-validation may have coverage far below the desired level. Because each data point is used for both training and testing, there are correlations among the measured accuracies for each fold, and so the usual estimate of variance is too small. We introduce a nested cross-validation scheme to estimate this variance more accurately, and show empirically that this modification leads to intervals with approximately correct coverage in many examples where traditional cross-validation intervals fail. Lastly, our analysis also shows that when producing confidence intervals for prediction accuracy with simple data splitting, one should not refit the model on the combined data, since this invalidates the confidence intervals. Supplementary materials for this article are available online. |
Short Title | Cross-Validation |
Date Added | 5/16/2023, 4:30:05 PM |
Modified | 5/16/2023, 4:30:24 PM |
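Note: the variance understatement is straightforward to reproduce (OLS on simulated data, illustrative sizes): the naive variance that treats fold errors as independent comes out noticeably smaller than the true sampling variance of the CV point estimate across replicate datasets.

```python
# Naive CV variance (folds treated as independent) vs the true variance of the
# CV point estimate across replicate datasets.
import numpy as np

rng = np.random.default_rng(0)
n, p, K, reps = 100, 5, 10, 1000
beta = np.ones(p)

def cv_fold_errors(X, y):
    idx = rng.permutation(n).reshape(K, -1)          # K equal folds
    errs = []
    for k in range(K):
        te = idx[k]
        tr = np.setdiff1d(np.arange(n), te)
        bhat = np.linalg.lstsq(X[tr], y[tr], rcond=None)[0]
        errs.append(np.mean((y[te] - X[te] @ bhat) ** 2))
    return np.array(errs)

cv_means, naive_vars = [], []
for _ in range(reps):
    X = rng.normal(size=(n, p))
    y = X @ beta + rng.normal(size=n)
    e = cv_fold_errors(X, y)
    cv_means.append(e.mean())
    naive_vars.append(e.var(ddof=1) / K)             # pretends folds are independent

print("true var of CV estimate:", round(np.var(cv_means), 5))
print("mean naive variance    :", round(np.mean(naive_vars), 5))  # too small
```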