Randomized Clinical Trials Do Not Mimic Clinical Practice, Thank Goodness
Randomized clinical trials are successful because they do not mimic clinical practice. They remain highly clinically relevant despite this.
Biostatistical Modeling Plan
This is an example statistical analysis plan for project proposals in which the goal is to develop a biostatistical model for prediction and to validate it externally or with strong internal validation.
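As a sketch of what "strong internal validation" can mean in practice, here is a minimal bootstrap validation using the rms package; the data, model, and number of bootstrap repetitions are illustrative assumptions, not part of the plan itself.

```r
# Minimal sketch of strong internal validation via the bootstrap (rms package).
# The data, model, and B = 300 repetitions are illustrative assumptions.
library(rms)
set.seed(1)
n  <- 400
x1 <- rnorm(n); x2 <- rnorm(n)
y  <- rbinom(n, 1, plogis(x1 - 0.5 * x2))
f  <- lrm(y ~ x1 + x2, x = TRUE, y = TRUE)  # keep design matrix for resampling
validate(f, B = 300)  # optimism-corrected indexes (Dxy, calibration slope, etc.)
```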
How to Do Bad Biomarker Research
This article covers some of the bad statistical practices that have crept into biomarker research, including setting the bar too low for demonstrating that biomarker information is new, believing that winning biomarkers are really “winners”, and improper use of continuous variables. Step-by-step guidance is given for ensuring that a biomarker analysis is not reproducible and does not provide clinically useful information.
R Workflow
An overview of R Workflow, which covers how to use R effectively all the way from importing data through analysis, and how to use Quarto for reproducible reporting.
Decision curve analysis for quantifying the additional benefit of a new marker
This article examines the benefits of decision curve analysis for assessing model performance when adding a new marker to an existing model. Decision curve analysis provides a clinically interpretable metric based on the number of events identified and interventions avoided.
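For orientation, net benefit, the quantity plotted in a decision curve, can be computed directly. This is a minimal sketch with simulated risks and outcomes, using the standard Vickers-Elkin definition; all numbers are stand-ins.

```r
# Minimal sketch of net benefit at a risk threshold pt (Vickers-Elkin form).
# Predicted risks p and outcomes y are simulated stand-ins.
net_benefit <- function(p, y, pt) {
  tp <- mean(p >= pt & y == 1)   # true positives per patient
  fp <- mean(p >= pt & y == 0)   # false positives per patient
  tp - fp * pt / (1 - pt)
}
set.seed(2)
y <- rbinom(500, 1, 0.3)
p <- plogis(qlogis(0.3) + 1.2 * (y - 0.3) + rnorm(500, 0, 0.5))  # toy risks
sapply(c(0.1, 0.2, 0.3), function(pt) net_benefit(p, y, pt))
```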
Equivalence of Wilcoxon Statistic and Proportional Odds Model
In this article I provide much more extensive simulations showing the near-perfect agreement between the odds ratio (OR) from a proportional odds (PO) model and the Wilcoxon two-sample test statistic. The agreement is studied by degree of violation of the PO assumption and by sample size. A refinement of the conversion formula between the OR and the Wilcoxon statistic scaled to 0-1 (the concordance probability) is provided.
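For reference, the conversion has approximately the form below, where $c$ is the Wilcoxon statistic scaled to the 0-1 concordance scale. The exponent shown is my approximation (values near 0.65-0.66 appear across these articles); see the article itself for the refined value.

$$c \approx \frac{\mathrm{OR}^{0.65}}{1 + \mathrm{OR}^{0.65}}$$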
Longitudinal Data: Think Serial Correlation First, Random Effects Second
Most analysts automatically turn to random effects models when analyzing longitudinal data. This may not always be the most natural or best-fitting approach.
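A minimal sketch of the serial-correlation-first approach, using generalized least squares with a continuous-time AR(1) structure from the nlme package; the data, effect sizes, and measurement times are simulated assumptions.

```r
# Minimal sketch: model within-subject serial correlation directly with
# generalized least squares and a continuous-time AR(1) structure (nlme).
# Data, effect sizes, and measurement times are illustrative assumptions.
library(nlme)
set.seed(3)
n     <- 100
times <- c(0, 1, 3, 6, 12)
d     <- expand.grid(id = 1:n, time = times)
d$trt <- rep(rbinom(n, 1, 0.5), times = length(times))
d$y   <- 1 + 0.3 * d$trt + 0.1 * d$time +
         rep(rnorm(n, 0, 0.7), times = length(times)) +  # within-subject component
         rnorm(nrow(d), 0, 0.7)
fit <- gls(y ~ trt + time,
           correlation = corCAR1(form = ~ time | id), data = d)
summary(fit)$tTable
```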
Assessing the Proportional Odds Assumption and Its Impact
This article demonstrates how the proportional odds (PO) assumption and its impact can be assessed. General robustness to non-PO, whether on a main variable of interest or on an adjustment covariate, is exemplified. Advantages of a continuous Bayesian blend of PO and non-PO are also discussed.
Commentary on Improving Precision and Power in Randomized Trials for COVID-19 Treatments Using Covariate Adjustment, for Binary, Ordinal, and Time-to-Event Outcomes
This is a commentary on the paper by Benkeser, Díaz, Luedtke, Segal, Scharfstein, and Rosenblum.
Incorrect Covariate Adjustment May Be More Correct than Adjusted Marginal Estimates
This article provides a demonstration that the perceived non-robustness of nonlinear models for covariate adjustment in randomized trials may be less of an issue than the non-transportability of marginal so-called robust estimators.
Avoiding One-Number Summaries of Treatment Effects for RCTs with Binary Outcomes
This article presents an argument that for RCTs with a binary outcome the primary result should be a distribution and not any single-number summary. The GUSTO-I study is used to exemplify risk difference distributions.
If You Like the Wilcoxon Test You Must Like the Proportional Odds Model
Since the Wilcoxon test is a special case of the proportional odds (PO) model, if one likes the Wilcoxon test, one must like the PO model. This is made more convincing by showing examples of how one may accurately compute the Wilcoxon statistic from the PO model’s odds ratio.
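A minimal sketch of that computation, assuming the rms package and an exponent near 0.65 in the conversion (the exact refined value is in the companion article); the data are simulated.

```r
# Minimal sketch: recover the Wilcoxon concordance from a PO model odds ratio.
# Data are simulated; the 0.65 exponent is an approximation (see the article).
library(rms)
set.seed(4)
y <- c(rnorm(50), rnorm(50) + 0.5)
g <- factor(rep(c("A", "B"), each = 50))
wilcox.test(y ~ g)                   # classical Wilcoxon-Mann-Whitney test
f  <- orm(y ~ g)                     # proportional odds model for continuous y
or <- exp(coef(f)["g=B"])            # estimated odds ratio for group B vs. A
unname(or^0.65 / (1 + or^0.65))      # approximate scaled Wilcoxon (concordance)
```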
Violation of Proportional Odds is Not Fatal
Many researchers worry about violations of the proportional odds assumption when comparing treatments in a randomized study. Besides the fact that this worry frequently makes them turn to a much worse approach, the harm done by violations of the proportional odds assumption usually does not prevent the proportional odds model from providing a reasonable treatment effect assessment.
RCT Analyses With Covariate Adjustment
This article summarizes arguments for the claim that the primary analysis of treatment effect in an RCT should be with adjustment for baseline covariates. It reiterates some findings and statements from classic papers, with illustration on the GUSTO-I trial.
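A minimal simulated illustration of the non-collapsibility at play, assuming one strong baseline covariate: both analyses use perfectly randomized treatment, yet the unadjusted odds ratio is attenuated toward 1. All effect sizes are illustrative.

```r
# Minimal sketch: adjusted vs. unadjusted odds ratios in a simulated RCT.
# Covariate strength and effect sizes are illustrative assumptions.
set.seed(5)
n   <- 4000
sev <- rnorm(n)                         # strong baseline risk factor
trt <- rbinom(n, 1, 0.5)                # randomized treatment
y   <- rbinom(n, 1, plogis(-1 + 1.5 * sev - 0.5 * trt))
exp(coef(glm(y ~ trt,       family = binomial))["trt"])  # unadjusted OR (attenuated)
exp(coef(glm(y ~ trt + sev, family = binomial))["trt"])  # adjusted OR, near exp(-0.5)
```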
Bayesian Methods to Address Clinical Development Challenges for COVID-19 Drugs and Biologics
The COVID-19 pandemic has elevated the challenge of designing and executing clinical trials with vaccines and drug/device combinations within a substantially shortened time frame. Challenges in designing COVID-19 trials include the lack of prior data for candidate interventions and vaccines due to the novelty of the disease, an evolving standard of care, and a sense of urgency to speed up development programmes. We propose sequential and adaptive Bayesian trial designs to help address the challenges inherent in COVID-19 trials. In the Bayesian framework, several methodologies can be implemented to address the complexity of the primary endpoint choice. Different options could be used for the primary analysis of the WHO Severity Scale, frequently used in COVID-19 trials. We propose a longitudinal proportional odds mixed-effects model with the WHO Severity Scale as an ordinal outcome. This enables efficient utilization of all clinical information to optimize sample sizes and maximize the rate of acquiring evidence about treatment effects and harms.
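As one frequentist stand-in for the proposed model, here is a minimal sketch using a cumulative-logit mixed model from the ordinal package; the article proposes a Bayesian fit, for which rmsb or brms could be used instead. The 4-level outcome, effect sizes, and variable names are simulated assumptions.

```r
# Minimal sketch of a longitudinal proportional odds mixed-effects model
# (frequentist stand-in via ordinal::clmm; the article proposes a Bayesian fit).
# The 4-level outcome mimics a severity scale; all values are simulated.
library(ordinal)
set.seed(6)
n    <- 200
days <- 0:3
d     <- expand.grid(id = 1:n, day = days)
d$trt <- rep(rbinom(n, 1, 0.5), times = length(days))
lp    <- -0.4 * d$trt * d$day + rep(rnorm(n), times = length(days))
d$sev <- ordered(cut(lp + rlogis(nrow(d)), c(-Inf, -1, 0, 1, Inf)))
fit <- clmm(sev ~ trt * day + (1 | id), data = d)
summary(fit)   # exp(coef) gives odds ratios on the ordinal scale
```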
Implications of Interactions in Treatment Comparisons
This article explains how the generalizability of randomized trial findings depends primarily on whether and how patient characteristics modify (interact with) the treatment effect. For an observational study this will be related to overlap in the propensity to receive treatment.
The Burden of Demonstrating HTE
Reasons are given for why heterogeneity of treatment effect must be demonstrated, not assumed. An example is presented that shows that HTE must exceed a certain level before personalizing treatment results in better decisions than using the average treatment effect for everyone.
Assessing Heterogeneity of Treatment Effect, Estimating Patient-Specific Efficacy, and Studying Variation in Odds Ratios, Risk Ratios, and Risk Differences
This article presents an example of formally testing for heterogeneity of treatment effect in the GUSTO-I trial, shows how to use penalized estimation to obtain patient-specific efficacy estimates, and studies variation across patients in three measures of treatment effect.
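A minimal sketch of the penalization idea, using rms::lrm to shrink only treatment interaction terms so that patient-specific effects are pulled toward the average effect; the data, covariates, and penalty strength are illustrative assumptions.

```r
# Minimal sketch: penalize treatment interactions to estimate patient-specific
# efficacy on the log-odds scale. All data and the penalty of 8 are illustrative.
library(rms)
set.seed(7)
n   <- 1000
age <- rnorm(n, 60, 10)
sex <- factor(sample(c("female", "male"), n, replace = TRUE))
trt <- sample(0:1, n, replace = TRUE)
y   <- rbinom(n, 1, plogis(-1 + 0.03 * (age - 60) - 0.6 * trt))
f <- lrm(y ~ trt * (rcs(age, 4) + sex),
         penalty = list(simple = 0, interaction = 8))
delta <- predict(f, data.frame(age, sex, trt = 1)) -
         predict(f, data.frame(age, sex, trt = 0))   # per-patient log odds ratios
summary(delta)
```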
Statistically Efficient Ways to Quantify Added Predictive Value of New Measurements
Researchers have used contorted, inefficient, and arbitrary analyses to demonstrate added value in biomarkers, genes, and new lab measurements. Traditional statistical measures have always been up to the task, and they are more powerful and more flexible. It’s time to revisit them, and to add a few slight twists to make them more helpful.
In Machine Learning Predictions for Health Care the Confusion Matrix is a Matrix of Confusion
The performance metrics chosen for prediction tools, and for machine learning in particular, have significant implications for health care. A penetrating understanding of the AUROC will lead to better methods and greater ML value, and will ultimately benefit patients.
Viewpoints on Heterogeneity of Treatment Effect and Precision Medicine
This article provides my reflections after the PCORI/PACE Evidence and the Individual Patient meeting on 2018-05-31. The discussion includes a high-level view of heterogeneity of treatment effect in optimizing treatment for individual patients.
Musings on Multiple Endpoints in RCTs
This article discusses issues related to alpha spending, effect sizes used in power calculations, multiple endpoints in RCTs, and endpoint labeling. Changes in endpoint priority are addressed. Included in the discussion is how Bayesian probabilities more naturally allow one to answer multiple questions without all-too-arbitrary designations of endpoints as “primary” and “secondary”. And we should not quit trying to learn.
Improving Research Through Safer Learning from Data
What are the major elements of learning from data that should inform the research process? How can we prevent having false confidence from statistical analysis? Does a Bayesian approach result in more honest answers to research questions? Is learning inherently subjective anyway, so we need to stop criticizing Bayesians’ subjectivity? How important and possible is pre-specification? When should replication be required? These and other questions are discussed.
Is Medicine Mesmerized by Machine Learning?
Deep learning and other forms of machine learning are getting a lot of press in medicine. The reality doesn’t match the hype, and interpretable statistical models still have a lot to offer.
Information Gain From Using Ordinal Instead of Binary Outcomes
This article gives examples of information gained by using ordinal over binary response variables. This is done by showing that for the same sample size and power, smaller effects can be detected.
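Hmisc’s popower function makes the comparison concrete. In this sketch the marginal cell probabilities and odds ratio are assumed values, and the binary version collapses the ordinal scale to its top category versus the rest.

```r
# Minimal sketch: power of a PO (Wilcoxon-type) comparison for an ordinal
# outcome vs. the same data collapsed to binary. All inputs are assumptions.
library(Hmisc)
p <- c(0.4, 0.3, 0.2, 0.1)                # marginal cell probabilities, 4 levels
popower(p, odds.ratio = 1.75, n = 200)    # ordinal analysis
popower(c(0.9, 0.1), odds.ratio = 1.75, n = 200)  # collapsed to binary: less power
```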
How Can Machine Learning be Reliable When the Sample is Adequate for Only One Feature?
It is easy to compute the sample size N1 needed to reliably estimate how one predictor relates to an outcome. It is next to impossible for a machine learning algorithm entertaining hundreds of features to yield reliable answers when the sample size is less than N1.
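For a sense of scale, here is one common way to compute N1: the sample size needed to estimate a single correlation coefficient to within ±0.1 with 0.95 confidence, via the Fisher z transformation. The margin and the near-zero true correlation are assumptions.

```r
# Minimal sketch: N1 for estimating one correlation to within +/- 0.1
# with 0.95 confidence, using the Fisher z transformation (rho near 0 assumed).
margin <- 0.1
n1 <- ceiling((qnorm(0.975) / atanh(margin))^2 + 3)
n1   # roughly 385 subjects, for just ONE feature
```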
Statistical Criticism is Easy; I Need to Remember That Real People are Involved
Criticism of medical journal articles is easy. I need to keep in mind that much good research is done even if there are some flaws in the design, analysis, or interpretation. I also need to remember that real people are involved.
Continuous Learning from Data: No Multiplicities from Computing and Using Bayesian Posterior Probabilities as Often as Desired
This article describes the drastically different way that sequential looks at data operate in a Bayesian setting compared to a classical frequentist setting.
Bayesian vs. Frequentist Statements About Treatment Efficacy
This article contrasts language used when reporting a classical frequentist treatment comparison vs. a Bayesian one, and describes why Bayesian statements convey more actionable information.
Integrating Audio, Video, and Discussion Boards with Course Notes
In this article I seek recommendations for integrating various media for teaching long courses.
EHRs and RCTs: Outcome Prediction vs. Optimal Treatment Selection
Observational data from electronic health records may contain biases that large sample sizes do not overcome. Moderate confounding by indication may render an infinitely large observational study less useful than a small randomized trial for estimating relative treatment effectiveness.
Statistical Errors in the Medical Literature
This article catalogs several types of statistical problems that occur frequently in medical journal articles.
Subjective Ranking of Quality of Research by Subject Matter Area
This is a subjective ranking of topical areas by the typical quality of research published in the area. Keep in mind that top-quality research can occur in any area when the research team is multi-disciplinary, team members are at the top of their game, and peer review is functional.
Damage Caused by Classification Accuracy and Other Discontinuous Improper Accuracy Scoring Rules
Estimating tendencies is usually a more appropriate goal than classification, and classification leads to the use of discontinuous accuracy scores, which give rise to misleading results.
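A small simulation makes the point: classification accuracy at a 0.5 cutoff barely distinguishes an informative model from one that always predicts the base rate, while the Brier score (a proper, continuous accuracy score) separates them clearly. All numbers are simulated.

```r
# Minimal sketch: classification accuracy vs. the Brier score (a proper rule).
# Simulated data with roughly 10% outcome prevalence.
set.seed(8)
n <- 10000
x <- rnorm(n)
p <- plogis(-2.5 + x)            # true risks
y <- rbinom(n, 1, p)
p_good <- p                      # an informative model (the truth)
p_null <- rep(mean(y), n)        # always predict the base rate
acc   <- function(pr) mean((pr > 0.5) == y)   # accuracy at the usual 0.5 cutoff
brier <- function(pr) mean((pr - y)^2)        # proper score; lower is better
c(acc(p_good),   acc(p_null))    # nearly identical: accuracy cannot tell them apart
c(brier(p_good), brier(p_null))  # the informative model clearly wins
```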
My Journey from Frequentist to Bayesian Statistics
This is the story of what influenced me to become a Bayesian statistician after being trained as a classical frequentist statistician, and practicing only that mode of statistics for many years.
A Litany of Problems With p-values
p-values are very often misinterpreted. p-values and null hypothesis significance testing have hurt science. This article attempts to catalog all the ways in which these problems occur.
Clinicians’ Misunderstanding of Probabilities Makes Them Like Backwards Probabilities Such As Sensitivity, Specificity, and Type I Error
The error of the transposed conditional is rampant in research. Conditioning on what is unknowable to predict what is already known leads to a host of complexities and interpretation problems.
Split-Sample Model Validation
The many disadvantages of split-sample validation, including subtle ones, are discussed.
Classification vs. Prediction
Classification involves a forced-choice premature decision, and is often misused in machine learning applications. Probability modeling involves the quantification of tendencies and usually addresses the real project goals.
Null Hypothesis Significance Testing Never Worked
This article explains why, for decision making, the original idea of null hypothesis testing never delivered on its goal.
p-values and Type I Errors are Not the Probabilities We Need
p-values are not what decision makers need, nor are they what most decision makers think they are getting.