  • A Cornucopia of Maximum Likelihood Algorithms

    Item Type Journal Article
    Author Kenneth Lange
    Author Xun-Jian Li
    Author Hua Zhou
    Date 2025-08-04
    Language en
    Library Catalog DOI.org (Crossref)
    URL https://www.tandfonline.com/doi/full/10.1080/00031305.2025.2526535
    Accessed 8/10/2025, 4:22:59 PM
    Pages 1-11
    Publication The American Statistician
    DOI 10.1080/00031305.2025.2526535
    Journal Abbr The American Statistician
    ISSN 0003-1305, 1537-2731
    Date Added 8/10/2025, 4:22:59 PM
    Modified 8/10/2025, 4:23:46 PM

    Tags:

    • mle
    • computing
    • maximum-likelihood-estimation
    • optimization
  • Bayesian Expectile Joint Model With Varying Coefficient for Longitudinal and Semi‐Competing Risks Data

    Item Type Journal Article
    Author Feng Gu
    Author Jiaqing Chen
    Author Jinjing Wang
    Author Yibo Long
    Author Xiaofan Wang
    Author Yangxin Huang
    Abstract In the realm of clinical medical research, semi‐competing risks data are usually observed in practice, yet there are few studies on the joint models of longitudinal and semi‐competing risks data. In this paper, a joint model for longitudinal and semi‐competing risks data is proposed. Based on the expectile regression, a linear mixed‐effects longitudinal sub‐model is formulated, and a Cox proportional hazards survival sub‐model is considered under the framework of semi‐competing risks. The two sub‐models are linked by a shared longitudinal trajectory function. To accommodate the time‐varying relationship between the longitudinal response variable and covariates, as well as to introduce flexibility to the structural linkage between longitudinal and survival processes, we incorporate the time‐varying coefficients into the joint model in the form of nonparametric functions. The simultaneous Bayesian inference method is utilized to estimate the model parameters, which not only overcomes the convergence problem, but also improves the accuracy of the parameter estimation while effectively reducing the computational burden. The simulation studies are conducted to assess the performance of the proposed joint model and methodology. Finally, we analyze a dataset from the Multicenter AIDS Cohort Study to illustrate the real application of the proposed model and method. In both simulation studies and empirical analyses, joint modeling methods demonstrate performance that meets expected effects.
    Date 08/2025
    Language en
    Library Catalog DOI.org (Crossref)
    URL https://onlinelibrary.wiley.com/doi/10.1002/sim.70219
    Accessed 8/10/2025, 4:35:09 PM
    Volume 44
    Pages e70219
    Publication Statistics in Medicine
    DOI 10.1002/sim.70219
    Issue 18-19
    Journal Abbr Statistics in Medicine
    ISSN 0277-6715, 1097-0258
    Date Added 8/10/2025, 4:35:09 PM
    Modified 8/10/2025, 4:36:21 PM

    Tags:

    • rct
    • multiple-endpoints
    • competing-risk
    • bayes
    • cox-model
    • shared-parameter
    • expectile
  • Combining Missing Data Imputation and Internal Validation in Clinical Risk Prediction Models

    Item Type Journal Article
    Author Junhui Mi
    Author Rahul D. Tendulkar
    Author Sarah M. C. Sittenfeld
    Author Sujata Patil
    Author Emily C. Zabor
    Abstract Methods to handle missing data have been extensively explored in the context of estimation and descriptive studies, with multiple imputation being the most widely used method in clinical research. However, in the context of clinical risk prediction models, where the goal is often to achieve high prediction accuracy and to make predictions for future patients, there are different considerations regarding the handling of missing covariate data. As a result, deterministic imputation is better suited to the setting of clinical risk prediction models, since the outcome is not included in the imputation model and the imputation method can be easily applied to future patients. In this paper, we provide a tutorial demonstrating how to conduct bootstrapping followed by deterministic imputation of missing covariate data to construct and internally validate the performance of a clinical risk prediction model in the presence of missing data. Simulation study results are provided to help guide when imputation may be appropriate in real‐world applications.
    Date 08/2025
    Language en
    Library Catalog DOI.org (Crossref)
    URL https://onlinelibrary.wiley.com/doi/10.1002/sim.70203
    Accessed 8/10/2025, 4:31:30 PM
    Volume 44
    Pages e70203
    Publication Statistics in Medicine
    DOI 10.1002/sim.70203
    Issue 18-19
    Journal Abbr Statistics in Medicine
    ISSN 0277-6715, 1097-0258
    Date Added 8/10/2025, 4:31:30 PM
    Modified 8/10/2025, 4:32:40 PM

    Tags:

    • bootstrap
    • validation
    • multiple-imputation
    • missing
  • Combining multiple imputation with internal model validation in clinical prediction modeling: a systematic methodological review

    Item Type Journal Article
    Author Sinclair Awounvo
    Author Meinhard Kieser
    Author Manuel Feißt
    Date 08/2025
    Language en
    Short Title Combining multiple imputation with internal model validation in clinical prediction modeling
    Library Catalog DOI.org (Crossref)
    URL https://linkinghub.elsevier.com/retrieve/pii/S0895435625002495
    Accessed 8/10/2025, 4:51:27 PM
    Pages 111916
    Publication Journal of Clinical Epidemiology
    DOI 10.1016/j.jclinepi.2025.111916
    Journal Abbr Journal of Clinical Epidemiology
    ISSN 08954356
    Date Added 8/10/2025, 4:51:27 PM
    Modified 8/10/2025, 4:52:00 PM

    Tags:

    • validation
    • multiple-imputation
    • review
    • missing
  • Complementary strengths of the Neyman-Rubin and graphical causal frameworks

    Item Type Preprint
    Author Tetiana Gorbach
    Author Xavier de Luna
    Author Juha Karvanen
    Author Ingeborg Waernbaum
    Abstract This article contributes to the discussion on the relationship between the Neyman-Rubin and the graphical frameworks for causal inference. We present specific examples of data-generating mechanisms - such as those involving undirected or deterministic relationships and cycles - where analyses using a directed acyclic graph are challenging, but where the tools from the Neyman-Rubin causal framework are readily applicable. We also provide examples of data-generating mechanisms with M-bias, trapdoor variables, and complex front-door structures, where the application of the Neyman-Rubin approach is complicated, but the graphical approach is directly usable. The examples offer insights into commonly used causal inference frameworks and aim to improve comprehension of the languages for causal reasoning among a broad audience.
    Date 2025
    Library Catalog DOI.org (Datacite)
    URL https://arxiv.org/abs/2512.09130
    Accessed 12/11/2025, 8:27:21 AM
    Rights Creative Commons Attribution 4.0 International
    Extra Version Number: 1
    DOI 10.48550/ARXIV.2512.09130
    Repository arXiv
    Date Added 12/11/2025, 8:27:21 AM
    Modified 12/11/2025, 8:28:25 AM

    Tags:

    • causal
    • Neyman-Rubin

    Notes:

    • Under consideration at The American Statistician; not yet accepted

  • Dealing with continuous variables and modelling non-linear associations in healthcare data: practical guide

    Item Type Journal Article
    Author Pedro Lopez-Ayala
    Author Richard D Riley
    Author Gary S Collins
    Author Tobias Zimmermann
    Date 2025-07-16
    Language en
    Short Title Dealing with continuous variables and modelling non-linear associations in healthcare data
    Library Catalog Crossref
    URL https://www.bmj.com/lookup/doi/10.1136/bmj-2024-082440
    Accessed 7/16/2025, 1:29:28 PM
    Rights http://www.bmj.com/company/legal-information/terms-conditions/legal-information/tdm-licencepolicy
    Extra Publisher: BMJ
    Volume 390
    Pages e082440
    Publication BMJ
    DOI 10.1136/bmj-2024-082440
    ISSN 1756-1833
    Date Added 7/16/2025, 1:29:28 PM
    Modified 7/16/2025, 1:30:25 PM

    Tags:

    • teaching-mds
    • regression
    • categorization-of-continuous-variables
    • categorization
    • spline
    • fractional-polynomial
  • Developing clinical prediction models: a step-by-step guide

    Item Type Journal Article
    Author Orestis Efthimiou
    Author Michael Seo
    Author Konstantina Chalkou
    Author Thomas Debray
    Author Matthias Egger
    Author Georgia Salanti
    Date 2024-09-03
    Language en
    Short Title Developing clinical prediction models
    Library Catalog DOI.org (Crossref)
    URL https://www.bmj.com/lookup/doi/10.1136/bmj-2023-078276
    Accessed 7/29/2025, 5:12:14 PM
    Pages e078276
    Publication BMJ
    DOI 10.1136/bmj-2023-078276
    Journal Abbr BMJ
    ISSN 1756-1833
    Date Added 7/29/2025, 5:12:14 PM
    Modified 7/29/2025, 5:13:27 PM

    Tags:

    • teaching-mds
    • variable-selection
    • bootstrap
    • validation
    • design
    • strategy
    • rms
  • Explainable AI in healthcare: to explain, to predict, or to describe?

    Item Type Journal Article
    Author Alex Carriero
    Author Anne De Hond
    Author Bram Cappers
    Author Fernando Paulovich
    Author Sanne Abeln
    Author Karel Gm Moons
    Author Maarten Van Smeden
    Date 2025-12-05
    Language en
    Short Title Explainable AI in healthcare
    Library Catalog DOI.org (Crossref)
    URL https://diagnprognres.biomedcentral.com/articles/10.1186/s41512-025-00213-8
    Accessed 12/5/2025, 7:59:48 AM
    Volume 9
    Pages 29
    Publication Diagnostic and Prognostic Research
    DOI 10.1186/s41512-025-00213-8
    Issue 1
    Journal Abbr Diagn Progn Res
    ISSN 2397-7523
    Date Added 12/5/2025, 7:59:48 AM
    Modified 12/5/2025, 8:00:45 AM

    Tags:

    • causal-inference
    • variable-importance
    • causal
    • explainable-ai

    Notes:

    • Figure 1 is a nice summary of colliders, confounders, etc.

  • Hazards Constitute Key Quantities for Analyzing, Interpreting and Understanding Time‐to‐Event Data

    Item Type Journal Article
    Author Jan Beyersmann
    Author Claudia Schmoor
    Author Martin Schumacher
    Abstract Censoring makes time‐to‐event data special and requires customized statistical techniques. Survival and event history analysis therefore builds on hazards as the identifiable quantities in the presence of rather general censoring schemes. The reason is that hazards are conditional quantities, given previous survival, which enables estimation based on the current risk set—those still alive and under observation. But it is precisely their conditional nature that has made hazards subject of critique from a causal perspective: A beneficial treatment will help patients survive longer than had they remained untreated. Hence, in a randomized trial, randomization is broken in later risk sets, which, however, are the basis for statistical inference. We survey this dilemma—after all, mapping analyses of hazards onto probabilities in randomized trials is viewed as still having a causal interpretation—and argue that a causal interpretation is possible taking a functional point of view. We illustrate matters with examples from benefit–risk assessment: Prolonged survival may lead to more adverse events, but this need not imply a worse safety profile of the novel treatment. These examples illustrate that the situation at hand is conveniently parameterized using hazards, that the need to use survival techniques is not always fully appreciated and that censoring not necessarily leads to the question of “what, if no censoring?” The discussion should concentrate on how to correctly interpret causal hazard contrasts and analyses of hazards should routinely be translated onto probabilities.
    Date 06/2025
    Language en
    Library Catalog DOI.org (Crossref)
    URL https://onlinelibrary.wiley.com/doi/10.1002/bimj.70057
    Accessed 6/7/2025, 12:55:03 AM
    Volume 67
    Pages e70057
    Publication Biometrical Journal
    DOI 10.1002/bimj.70057
    Issue 3
    Journal Abbr Biometrical J
    ISSN 0323-3847, 1521-4036
    Date Added 6/7/2025, 12:55:03 AM
    Modified 6/7/2025, 12:55:57 AM

    Tags:

    • survival
    • causal-risk-difference
    • hazard-function
    • causal-effects
    • causality
    • itt
    • causal-analysis
  • Improving the Modeling of Binary Response Regression Based on New Proposals for Statistical Diagnostics With Applications to Medical Data

    Item Type Journal Article
    Author Manuel Galea
    Author Mónica Catalán
    Author Alejandra Tapia
    Author Viviana Giampaoli
    Author Víctor Leiva
    Abstract Binary regression models utilizing logit or probit link functions have been extensively employed for examining the relationship between binary responses and covariates, particularly in medicine. Nonetheless, an erroneous specification of the link function may result in poor model fitting and compromise the statistical significance of covariate effects. In this study, we introduce a diagnostic method associated with a novel family of link functions enabling the assessment of sensitivity for symmetric links in relation to their asymmetric counterparts. This new family offers a comprehensive model encompassing nested symmetric cases. Our method proves beneficial in modeling medical data, especially when evaluating the sensitivity of the commonly used logit link function, prized for its interpretability via odds ratio. Moreover, our method advocates a general link based on the logit function when a standard link is unsatisfactory. We employ likelihood‐based methods to estimate parameters of the general model and conduct local influence analysis under the case‐weight perturbation scheme. Regarding local influence, we emphasize the relevance of employing appropriate perturbations to avoid misleading outcomes. Additionally, we introduce a diagnostic method for local influence, assessing the sensitivity of odds ratio under two perturbation schemes. Monte Carlo simulations are conducted to evaluate both the diagnostic method performance and parameter estimation of the general model, supplemented by illustrations using medical data related to menstruation and respiratory problems. The results confirm the efficacy of our proposal, highlighting the critical role of statistical diagnostics in modeling.
    Date 06/2025
    Language en
    Library Catalog DOI.org (Crossref)
    URL https://onlinelibrary.wiley.com/doi/10.1002/sim.70073
    Accessed 6/7/2025, 1:00:38 AM
    Volume 44
    Pages e70073
    Publication Statistics in Medicine
    DOI 10.1002/sim.70073
    Issue 13-14
    Journal Abbr Statistics in Medicine
    ISSN 0277-6715, 1097-0258
    Date Added 6/7/2025, 1:00:38 AM
    Modified 6/7/2025, 1:01:04 AM

    Tags:

    • binary-data
    • logistic
    • probit
    • link-function
  • LASSO-Based Survival Prediction Modeling with Multiply Imputed Data: A Case Study in Tuberculosis Mortality Prediction

    Item Type Journal Article
    Author Md. Belal Hossain
    Author Mohsen Sadatsafavi
    Author James C. Johnston
    Author Hubert Wong
    Author Victoria J. Cook
    Author Mohammad Ehsanul Karim
    Date 2025-08-04
    Language en
    Short Title LASSO-Based Survival Prediction Modeling with Multiply Imputed Data
    Library Catalog DOI.org (Crossref)
    URL https://www.tandfonline.com/doi/full/10.1080/00031305.2025.2526545
    Accessed 8/10/2025, 4:29:44 PM
    Pages 1-12
    Publication The American Statistician
    DOI 10.1080/00031305.2025.2526545
    Journal Abbr The American Statistician
    ISSN 0003-1305, 1537-2731
    Date Added 8/10/2025, 4:29:44 PM
    Modified 8/10/2025, 4:30:16 PM

    Tags:

    • cox-model
    • multiple-imputation
    • lasso
    • missing
    • stacking
  • Motivating Sample Sizes in Adaptive Phase I Trials Via Bayesian Posterior Credible Intervals

    Item Type Journal Article
    Author Thomas M. Braun
    Abstract In contrast with typical Phase III clinical trials, there is little existing methodology for determining the appropriate numbers of patients to enroll in adaptive Phase I trials. And, as stated by Dennis Lindley in a more general context, “[t]he simple practical question of ‘What size of sample should I take’ is often posed to a statistician, and it is a question that is embarrassingly difficult to answer.” Historically, simulation has been the primary option for determining sample sizes for adaptive Phase I trials, and although useful, can be problematic and time-consuming when a sample size is needed relatively quickly. We propose a computationally fast and simple approach that uses Beta distributions to approximate the posterior distributions of DLT rates of each dose and determines an appropriate sample size through posterior coverage rates. We provide sample sizes produced by our methods for a vast number of realistic Phase I trial settings and demonstrate that our sample sizes are generally larger than those produced by a competing approach that is based upon the nonparametric optimal design.
    Date 2018-09-01
    Language en
    Library Catalog DOI.org (Crossref)
    URL https://academic.oup.com/biometrics/article/74/3/1065-1071/7525822
    Accessed 12/11/2025, 4:35:46 PM
    Rights https://academic.oup.com/journals/pages/open_access/funder_policies/chorus/standard_publication_model
    Volume 74
    Pages 1065-1071
    Publication Biometrics
    DOI 10.1111/biom.12872
    Issue 3
    ISSN 0006-341X, 1541-0420
    Date Added 12/11/2025, 4:35:46 PM
    Modified 12/11/2025, 4:37:26 PM

    Tags:

    • adaptive-design
    • bayes
    • drug-development
    • sample-size
  • Nonparametric Assessment of Variable Selection and Ranking Algorithms

    Item Type Journal Article
    Author Zhou Tang
    Author Ted Westling
    Date 2025-10-13
    Language en
    Library Catalog DOI.org (Crossref)
    URL https://www.tandfonline.com/doi/full/10.1080/10618600.2025.2547064
    Accessed 10/15/2025, 11:24:15 AM
    Pages 1-12
    Publication Journal of Computational and Graphical Statistics
    DOI 10.1080/10618600.2025.2547064
    Journal Abbr Journal of Computational and Graphical Statistics
    ISSN 1061-8600, 1537-2715
    Date Added 10/15/2025, 11:24:15 AM
    Modified 10/15/2025, 11:25:24 AM

    Tags:

    • variable-importance
    • ranking-selection
  • Novel Clinical Trial Design With Stratum‐Specific Endpoints and Global Test Methods for Rare Diseases With Heterogeneous Clinical Manifestations

    Item Type Journal Article
    Author Emily Shives
    Author Yared Gurmu
    Author Wonyul Lee
    Author Emily Morris
    Author Yan Wang
    Abstract Many rare disease clinical trials are underpowered to detect a moderate treatment effect of an investigational product due to the limited number of participants available for the trials. In addition, given the complex, multisystemic nature of many rare diseases, it is challenging to confidently prespecify a single primary efficacy endpoint that is applicable to all trial participants with a heterogeneous clinical manifestation of their disease. Traditional trial designs and analysis methods often used in more common diseases to analyze the same endpoint(s) for all patients may be inefficient or impractical for a rare disease with heterogeneous clinical manifestations. To address these issues, we propose a novel trial design and analytic approach that allows for an evaluation of stratum‐specific efficacy endpoints in a broader population of participants. We develop several nonparametric global test methods that can accommodate the novel design and provide global evaluation of treatment effects. Using a case example in patients with Fabry disease, our simulation studies illustrate that the novel design evaluated using the global test methods may be more sensitive to detect a treatment effect compared to the traditional design that uses the same endpoint(s) for all patients.
    Date 08/2025
    Language en
    Library Catalog DOI.org (Crossref)
    URL https://onlinelibrary.wiley.com/doi/10.1002/sim.70206
    Accessed 8/10/2025, 4:37:45 PM
    Volume 44
    Pages e70206
    Publication Statistics in Medicine
    DOI 10.1002/sim.70206
    Issue 18-19
    Journal Abbr Statistics in Medicine
    ISSN 0277-6715, 1097-0258
    Date Added 8/10/2025, 4:37:45 PM
    Modified 8/10/2025, 4:38:13 PM

    Tags:

    • multiple-endpoints
    • global-test
    • rare-disease
  • On the Uses and Abuses of Regression Models: A Call for Reform of Statistical Practice and Teaching

    Item Type Journal Article
    Author John B. Carlin
    Author Margarita Moreno‐Betancur
    Abstract Regression methods dominate the practice of biostatistical analysis, but biostatistical training emphasizes the details of regression models and methods ahead of the purposes for which such modeling might be useful. More broadly, statistics is widely understood to provide a body of techniques for “modeling data,” underpinned by what we describe as the “true model myth”: that the task of the statistician/data analyst is to build a model that closely approximates the true data generating process. By way of our own historical examples and a brief review of mainstream clinical research journals, we describe how this perspective has led to a range of problems in the application of regression methods, including misguided “adjustment” for covariates, misinterpretation of regression coefficients and the widespread fitting of regression models without a clear purpose. We then outline a new approach to the teaching and application of biostatistical methods, which situates them within a framework that first requires clear definition of the substantive research question at hand, within one of three categories: descriptive, predictive, or causal. Within this approach, the development and application of (multivariable) regression models, as well as other advanced biostatistical methods, should proceed differently according to the type of question. Regression methods will no doubt remain central to statistical practice as they provide a powerful tool for representing variation in a response or outcome variable as a function of “input” variables, but their conceptualization and usage should follow from the purpose at hand.
    Date 06/2025
    Language en
    Short Title On the Uses and Abuses of Regression Models
    Library Catalog DOI.org (Crossref)
    URL https://onlinelibrary.wiley.com/doi/10.1002/sim.10244
    Accessed 6/26/2025, 9:23:06 AM
    Volume 44
    Pages e10244
    Publication Statistics in Medicine
    DOI 10.1002/sim.10244
    Issue 13-14
    Journal Abbr Statistics in Medicine
    ISSN 0277-6715, 1097-0258
    Date Added 6/26/2025, 9:23:06 AM
    Modified 6/26/2025, 9:24:11 AM

    Tags:

    • regression
    • causal-inference
    • practice-guidelines
    • model
  • Resampling Methods with Multiply Imputed Data

    Item Type Journal Article
    Author Michael W Robbins
    Author Lane Burgette
    Abstract Resampling techniques have become increasingly popular for estimation of uncertainty. However, data are often fraught with missing values that are commonly imputed to facilitate analysis. This article addresses the issue of using resampling methods such as a jackknife or bootstrap in conjunction with imputations that have been sampled stochastically, in the vein of multiple imputation. We derive the theory needed to illustrate two key points regarding the use of resampling methods in lieu of traditional combining rules. First, imputations should be independently generated multiple times within each replicate group of a jackknife or bootstrap. Second, the number of multiply imputed datasets per replicate group must dramatically exceed the number of replicate groups for a jackknife; however, this is not the case in a bootstrap approach. We also discuss bias-adjusted analogues of the jackknife and bootstrap that are argued to require fewer imputed datasets. A simulation study is provided to support these theoretical conclusions.
    Date 2025-07-30
    Language en
    Library Catalog DOI.org (Crossref)
    URL https://academic.oup.com/biomet/advance-article/doi/10.1093/biomet/asaf059/8219454
    Accessed 7/31/2025, 12:14:17 PM
    Rights https://creativecommons.org/licenses/by-nc-nd/4.0/
    Pages asaf059
    Publication Biometrika
    DOI 10.1093/biomet/asaf059
    ISSN 0006-3444, 1464-3510
    Date Added 7/31/2025, 12:14:17 PM
    Modified 7/31/2025, 12:14:46 PM

    Tags:

    • resampling
    • bootstrap
    • imputation
    • missing
    • jackknife
  • Second-generation p-values: Improved rigor, reproducibility, & transparency in statistical analyses

    Item Type Journal Article
    Author Jeffrey D. Blume
    Author Lucy D’Agostino McGowan
    Author William D. Dupont
    Author Robert A. Greevy
    Editor Neil R. Smalheiser
    Date 2018-03-22
    Language en
    Short Title Second-generation p-values
    Library Catalog DOI.org (Crossref)
    URL https://dx.plos.org/10.1371/journal.pone.0188299
    Accessed 9/4/2025, 8:51:13 AM
    Volume 13
    Pages e0188299
    Publication PLOS ONE
    DOI 10.1371/journal.pone.0188299
    Issue 3
    Journal Abbr PLoS ONE
    ISSN 1932-6203
    Date Added 9/4/2025, 8:51:13 AM
    Modified 9/4/2025, 8:51:41 AM

    Tags:

    • multiplicity
    • confidence-intervals
    • inference
  • The Impact of Violation of the Proportional Hazards Assumption on the Calibration of the Cox Proportional Hazards Model

    Item Type Journal Article
    Author Peter C. Austin
    Author Daniele Giardiello
    Abstract The Cox proportional hazards regression model is frequently used to develop clinical prediction models for time‐to‐event outcomes, allowing clinicians to estimate an individual's risk of experiencing the outcome within specified time horizons (e.g., estimate an individual's 10‐year risk of death). The Cox regression model models the association between covariates and the hazard of the outcome. A key assumption of the Cox model is the proportional hazards assumption: the ratio of the hazard function for any two individuals is constant over time, and the ratio is a function of only their covariates and the regression coefficients. Calibration is an important aspect of the validation of clinical prediction models. Calibration refers to the concordance between predicted and observed risk. The impact of the violation of the proportional hazards assumption on the calibration of clinical prediction models developed using the Cox model has not been examined. We conducted a set of Monte Carlo simulations to assess the impact of the magnitude of the violation of the proportional hazards assumption on the calibration of the Cox model. We compared the calibration of predictions obtained using a Cox regression model that ignored the violation of the proportional hazards assumption with those obtained using accelerated failure time (AFT) models, Royston and Parmar's spline‐based parametric survival models, and generalized linear models using pseudo‐observations. We found that violation of the proportional hazards assumption had negligible impact on the calibration of predictions obtained using a Cox model.
    Date 06/2025
    Language en
    Library Catalog DOI.org (Crossref)
    URL https://onlinelibrary.wiley.com/doi/10.1002/sim.70161
    Accessed 6/13/2025, 1:50:07 AM
    Volume 44
    Pages e70161
    Publication Statistics in Medicine
    DOI 10.1002/sim.70161
    Issue 13-14
    Journal Abbr Statistics in Medicine
    ISSN 0277-6715, 1097-0258
    Date Added 6/13/2025, 1:50:07 AM
    Modified 6/13/2025, 1:50:48 AM

    Tags:

    • calibration
    • non-ph
    • non-proportional-hazards
    • flexible-survival-model
    • flexible-parametric-distribution
    • aft-model
  • Using a Supervised Principal Components Analysis for Variable Selection in High‐Dimensional Datasets Reduces False Discovery Rates

    Item Type Journal Article
    Author Insha Ullah
    Author Kerrie Mengersen
    Author Anthony N. Pettitt
    Author Benoit Liquet
    Abstract High‐dimensional datasets, where the number of variables is much larger than the number of samples, are ubiquitous and often render standard classification techniques unreliable due to overfitting. An important research problem is feature selection, which ranks candidate variables based on their relevance to the outcome variable and retains those that satisfy a chosen criterion. This article proposes a computationally efficient variable selection method based on principal component analysis tailored to a binary classification problem or case‐control study. This method is accessible and is suitable for the analysis of high‐dimensional datasets. We demonstrate the superior performance of our method through extensive simulations. A semi‐real gene expression dataset, a challenging childhood acute lymphoblastic leukemia gene expression study, and a GWAS that attempts to identify single‐nucleotide polymorphisms (SNPs) associated with rice grain length further demonstrate the usefulness of our method in genomic applications. We expect our method to accurately identify important features and reduce the False Discovery Rate (fdr) by accounting for the correlation between variables and by de‐noising data in the training phase, which also makes it robust to mild outliers in the training data. Our method is almost as fast as univariate filters, so it allows valid statistical inference. The ability to make such inferences sets this method apart from most current multivariate statistical tools designed for today's high‐dimensional data.
    Date 06/2025
    Language en
    Library Catalog DOI.org (Crossref)
    URL https://onlinelibrary.wiley.com/doi/10.1002/sim.70110
    Accessed 6/7/2025, 1:30:02 AM
    Volume 44
    Pages e70110
    Publication Statistics in Medicine
    DOI 10.1002/sim.70110
    Issue 13-14
    Journal Abbr Statistics in Medicine
    ISSN 0277-6715, 1097-0258
    Date Added 6/7/2025, 1:30:02 AM
    Modified 6/7/2025, 1:30:02 AM

    Notes:

    • Seems to be reinventing sliced inverse regression without attribution