Statistical Thinking

Resources for Ordinal Regression Models

2022
2023
endpoints
ordinal
regression

This article provides resources to assist researchers in understanding and using ordinal regression models, and provides arguments for their wider use.

May 1, 2023
Frank Harrell
20 min

Seven Common Errors in Decision Curve Analysis

decision-making
diagnosis
medicine
2023

I describe seven common errors in decision curve analysis. Avoidance of such errors will make decision curve analysis more reliable and useful.

Mar 18, 2023
Andrew Vickers
7 min

Randomized Clinical Trials Do Not Mimic Clinical Practice, Thank Goodness

generalizability
design
medicine
RCT
drug-evaluation
personalized-medicine
evidence
2017
2023

Randomized clinical trials are successful because they do not mimic clinical practice. They remain highly clinically relevant despite this.

Feb 14, 2023
Frank Harrell
20 min

Biostatistical Modeling Plan

2023
accuracy-score
endpoints
ordinal
collaboration
data-reduction
design
medicine
prediction
regression
validation
bootstrap

This is an example statistical plan for project proposals where the goal is to develop a biostatistical model for prediction, and to do external or strong internal validation of the model.

Jan 26, 2023
Frank Harrell
12 min

How to Do Bad Biomarker Research

2022
big-data
bioinformatics
biomarker
bootstrap
data-science
decision-making
dichotomization
forward-probability
generalizability
medical-literature
multiplicity
personalized-medicine
prediction
principles
reporting
reproducible
responder-analysis
sample-size
sensitivity

This article covers some of the bad statistical practices that have crept into biomarker research, including setting the bar too low for demonstrating that biomarker information is new, believing that winning biomarkers are really “winners”, and improper use of continuous variables. Step-by-step guidance is given for ensuring that a biomarker analysis is not reproducible and does not provide clinically useful information.

Oct 6, 2022
Frank Harrell
15 min

R Workflow

2022
data-science
graphics
r
reproducible

An overview of R Workflow, which covers how to use R effectively all the way from importing data to analysis, and making use of Quarto for reproducible reporting.

Jun 25, 2022
Frank Harrell
15 min

Decision curve analysis for quantifying the additional benefit of a new marker

2022
biomarker
accuracy-score
decision-making
diagnosis
medicine

This article examines the benefits of decision curve analysis for assessing model performance when adding a new marker to an existing model. Decision curve analysis provides a clinically interpretable metric based on the number of events identified and interventions avoided.

Apr 11, 2022
Emily Vertosick and Andrew Vickers
8 min

Equivalence of Wilcoxon Statistic and Proportional Odds Model

2022
endpoints
ordinal
drug-evaluation
hypothesis-testing
RCT
regression

In this article I provide much more extensive simulations showing the near-perfect agreement between the odds ratio (OR) from a proportional odds (PO) model and the Wilcoxon two-sample test statistic. The agreement is studied by degree of violation of the PO assumption and by the sample size. A refinement in the conversion formula between the OR and the Wilcoxon statistic scaled to 0-1 (concordance probability) is provided.

Apr 6, 2022
Frank Harrell
24 min
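
As a reader aid for the "concordance probability" mentioned in the description above: the Wilcoxon-Mann-Whitney statistic, divided by the number of treated-control pairs, is the fraction of pairs in which the treated observation is the larger one, with ties counted as one half. The Python sketch below is not taken from the article. The function name and the small data set are made up for illustration, and it does not attempt the OR conversion formula the article refines.

```python
def concordance_probability(treated, control):
    """Wilcoxon-Mann-Whitney statistic scaled to 0-1.

    Returns the fraction of (treated, control) pairs in which the
    treated value exceeds the control value; ties count as 1/2.
    """
    wins = 0.0
    for t in treated:
        for c in control:
            if t > c:
                wins += 1.0
            elif t == c:
                wins += 0.5
    return wins / (len(treated) * len(control))


# Hypothetical ordinal outcomes (higher = better), for illustration only.
treated = [3, 4, 2, 5, 4]
control = [1, 3, 2, 2, 4]

c = concordance_probability(treated, control)
print(f"concordance probability = {c:.2f}")
```

A value of 0.5 corresponds to no tendency for either group to have higher values; values near 0 or 1 indicate strong separation.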

Longitudinal Data: Think Serial Correlation First, Random Effects Second

drug-evaluation
endpoints
measurement
RCT
regression
2022

Most analysts automatically turn towards random effects models when analyzing longitudinal data. This may not always be the most natural, or best fitting approach.

Mar 15, 2022
Frank Harrell
8 min

Assessing the Proportional Odds Assumption and Its Impact

2022
accuracy-score
dichotomization
endpoints
ordinal

This article demonstrates how the proportional odds (PO) assumption and its impact can be assessed. General robustness to non-PO on either a main variable of interest or an adjustment covariate is exemplified. Advantages of a continuous Bayesian blend of PO and non-PO are also discussed.

Mar 9, 2022
Frank Harrell
27 min

A Comparison of Decision Curve Analysis with Traditional Decision Analysis

decision-making
diagnosis
medicine
2021

We compare decision curve analysis and traditional decision analysis to illustrate their similarities and differences.

Dec 27, 2021
Andrew Vickers
7 min

Commentary on Improving Precision and Power in Randomized Trials for COVID-19 Treatments Using Covariate Adjustment, for Binary, Ordinal, and Time-to-Event Outcomes

bayes
covid-19
design
generalizability
inference
metrics
ordinal
personalized-medicine
RCT
regression
reporting
2021

This is a commentary on the paper by Benkeser, Díaz, Luedtke, Segal, Scharfstein, and Rosenblum.

Jul 17, 2021
Frank Harrell, Stephen Senn
25 min

Incorrect Covariate Adjustment May Be More Correct than Adjusted Marginal Estimates

2021
generalizability
RCT
regression

This article provides a demonstration that the perceived non-robustness of nonlinear models for covariate adjustment in randomized trials may be less of an issue than the non-transportability of marginal so-called robust estimators.

Jun 29, 2021
Frank Harrell
17 min

Avoiding One-Number Summaries of Treatment Effects for RCTs with Binary Outcomes

2021
generalizability
RCT
regression

This article presents an argument that for RCTs with a binary outcome the primary result should be a distribution and not any single number summary. The GUSTO-I study is used to exemplify risk difference distributions.

Jun 28, 2021
Frank Harrell
10 min

If You Like the Wilcoxon Test You Must Like the Proportional Odds Model

ordinal
hypothesis-testing
2021
accuracy-score
RCT
regression
metrics

Since the Wilcoxon test is a special case of the proportional odds (PO) model, if one likes the Wilcoxon test, one must like the PO model. This is made more convincing by showing examples of how one may accurately compute the Wilcoxon statistic from the PO model’s odds ratio.

Mar 10, 2021
Frank Harrell
6 min

Implementation of the PATH Statement

The recent PATH (Predictive Approaches to Treatment effect Heterogeneity) Statement outlines principles, criteria, and key considerations for applying predictive approaches to clinical trials to provide patient-centered evidence in support of decision making. Here challenges in implementing the PATH Statement are addressed with the GUSTO-I trial as a case study.

Nov 24, 2020
Ewout Steyerberg
19 min

Violation of Proportional Odds is Not Fatal

2020
ordinal
accuracy-score
RCT
regression
hypothesis-testing
metrics

Many researchers worry about violations of the proportional hazards assumption when comparing treatments in a randomized study. Besides the fact that this frequently makes them turn to a much worse approach, the harm done by violations of the proportional odds assumption usually does not prevent the proportional odds model from providing a reasonable treatment effect assessment.

Sep 20, 2020
Frank Harrell
12 min

Unadjusted Odds Ratios are Conditional

2020
generalizability
RCT
regression

This article discusses issues with unadjusted effect ratios such as odds ratios and hazard ratios, showing a simple example of non-generalizability of unadjusted odds ratios.

Sep 13, 2020
Frank Harrell
9 min

RCT Analyses With Covariate Adjustment

2020
drug-evaluation
generalizability
medicine
personalized-medicine
prediction
RCT
regression

This article summarizes arguments for the claim that the primary analysis of treatment effect in an RCT should be with adjustment for baseline covariates. It reiterates some findings and statements from classic papers, with illustration on the GUSTO-I trial.

Jul 19, 2020
Ewout Steyerberg
@ESteyerberg
13 min

Bayesian Methods to Address Clinical Development Challenges for COVID-19 Drugs and Biologics

bayes
RCT
design
drug-evaluation
medicine
responder-analysis
covid-19

The COVID-19 pandemic has elevated the challenge for designing and executing clinical trials with vaccines and drug/device combinations within a substantially shortened time frame. Numerous challenges in designing COVID-19 trials include lack of prior data for candidate interventions / vaccines due to the novelty of the disease, evolving standard of care and sense of urgency to speed up development programmes. We propose sequential and adaptive Bayesian trial designs to help address the challenges inherent in COVID-19 trials. In the Bayesian framework, several methodologies can be implemented to address the complexity of the primary endpoint choice. Different options could be used for the primary analysis of the WHO Severity Scale, frequently used in COVID-19 trials. We propose the longitudinal proportional odds mixed effects model using the WHO Severity Scale ordinal scale. This enables efficient utilization of all clinical information to optimize sample sizes and maximize the rate of acquiring evidence about treatment effects and harms.

May 29, 2020
Natalia Muhlemann MD, Rajat Mukherjee PhD, Frank Harrell PhD
7 min

Implications of Interactions in Treatment Comparisons

RCT
drug-evaluation
generalizability
medicine
observational
personalized-medicine
prediction
subgroup
2020

This article explains how the generalizability of randomized trial findings depends primarily on whether and how patient characteristics modify (interact with) the treatment effect. For an observational study this will be related to overlap in the propensity to receive treatment.

Mar 3, 2020
Frank Harrell
25 min

The Burden of Demonstrating HTE

RCT
generalizability
medicine
metrics
personalized-medicine
subgroup
2019

Reasons are given for why heterogeneity of treatment effect must be demonstrated, not assumed. An example is presented that shows that HTE must exceed a certain level before personalizing treatment results in better decisions than using the average treatment effect for everyone.

Apr 8, 2019
Frank Harrell
6 min

Assessing Heterogeneity of Treatment Effect, Estimating Patient-Specific Efficacy, and Studying Variation in Odds Ratios, Risk Ratios, and Risk Differences

RCT
generalizability
medicine
metrics
personalized-medicine
prediction
subgroup
accuracy-score
2019

This article shows an example formally testing for heterogeneity of treatment effect in the GUSTO-I trial, shows how to use penalized estimation to obtain patient-specific efficacy, and studies variation across patients in three measures of treatment effect.

Mar 25, 2019
Frank Harrell
12 min

Statistically Efficient Ways to Quantify Added Predictive Value of New Measurements

prediction
sample-size
validation
accuracy-score
biomarker
diagnosis
medicine
reporting
2018

Researchers have used contorted, inefficient, and arbitrary analyses to demonstrate added value in biomarkers, genes, and new lab measurements. Traditional statistical measures have always been up to the task, and are more powerful and more flexible. It’s time to revisit them, and to add a few slight twists to make them more helpful.

Oct 17, 2018
Frank Harrell
18 min

In Machine Learning Predictions for Health Care the Confusion Matrix is a Matrix of Confusion

data-science
machine-learning
prediction
2018

The performance metrics chosen for prediction tools, and for Machine Learning in particular, have significant implications for health care. A penetrating understanding of the AUROC will lead to better methods, greater ML value, and ultimately benefit to patients.

Aug 28, 2018
Drew Griffin Levy
@DrewLevy
24 min

Data Methods Discussion Site

collaboration
teaching
2018

This article lays out the rationale and overall design of a new discussion site about quantitative methods.

Jun 19, 2018
Frank Harrell
8 min

Viewpoints on Heterogeneity of Treatment Effect and Precision Medicine

RCT
biomarker
decision-making
drug-evaluation
generalizability
medicine
metrics
personalized-medicine
prediction
subgroup
2018

This article provides my reflections after the PCORI/PACE Evidence and the Individual Patient meeting on 2018-05-31. The discussion includes a high-level view of heterogeneity of treatment effect in optimizing treatment for individual patients.

Jun 4, 2018
Frank Harrell
16 min

Navigating Statistical Modeling and Machine Learning

data-science
machine-learning
prediction
2018

This article elaborates on Frank Harrell’s post providing guidance in choosing between machine learning and statistical modeling for a prediction project.

May 14, 2018
Drew Griffin Levy
@DrewLevy
11 min

Road Map for Choosing Between Statistical Modeling and Machine Learning

data-science
machine-learning
prediction
2018

This article provides general guidance to help researchers choose between machine learning and statistical modeling for a prediction project.

Apr 30, 2018
Frank Harrell
10 min

Musings on Multiple Endpoints in RCTs

RCT
bayes
design
drug-evaluation
evidence
hypothesis-testing
medicine
multiplicity
p-value
posterior
endpoints
2018

This article discusses issues related to alpha spending, effect sizes used in power calculations, multiple endpoints in RCTs, and endpoint labeling. Changes in endpoint priority are addressed. Included in the discussion is how Bayesian probabilities more naturally allow one to answer multiple questions without all-too-arbitrary designations of endpoints as “primary” and “secondary”. And we should not quit trying to learn.

Mar 26, 2018
Frank Harrell
13 min

Improving Research Through Safer Learning from Data

design
evidence
generalizability
inference
judgment
measurement
prior
bayes
2018

What are the major elements of learning from data that should inform the research process? How can we prevent having false confidence from statistical analysis? Does a Bayesian approach result in more honest answers to research questions? Is learning inherently subjective anyway, so we need to stop criticizing Bayesians’ subjectivity? How important and possible is pre-specification? When should replication be required? These and other questions are discussed.

Mar 8, 2018
Frank Harrell
15 min

Is Medicine Mesmerized by Machine Learning?

machine-learning
accuracy-score
classification
data-science
decision-making
medicine
prediction
validation
2018

Deep learning and other forms of machine learning are getting a lot of press in medicine. The reality doesn’t match the hype, and interpretable statistical models still have a lot to offer.

Feb 1, 2018
Frank Harrell
11 min

Information Gain From Using Ordinal Instead of Binary Outcomes

RCT
design
ordinal
dichotomization
inference
precision
responder-analysis
sample-size
2018

This article gives examples of information gained by using ordinal over binary response variables. This is done by showing that for the same sample size and power, smaller effects can be detected.

Jan 28, 2018
Frank Harrell
8 min

Why I Don’t Like Percents

metrics
2018

I prefer fractions and ratios over percents. Here are the reasons.

Jan 19, 2018
Frank Harrell
4 min

How Can Machine Learning be Reliable When the Sample is Adequate for Only One Feature?

prediction
machine-learning
sample-size
validation
precision
accuracy-score
2018

It is easy to compute the sample size N1 needed to reliably estimate how one predictor relates to an outcome. It is next to impossible for a machine learning algorithm entertaining hundreds of features to yield reliable answers when the sample size < N1.

Jan 11, 2018
Frank Harrell
11 min

New Year Goals

2018
2019

Methodologic goals and wishes for research and clinical practice for 2018

Dec 29, 2017
Frank Harrell
7 min

Scoring Multiple Variables, Too Many Variables and Too Few Observations: Data Reduction

variability
data-reduction
2017

This article addresses data reduction, also called unsupervised learning.

Nov 21, 2017
Frank Harrell
6 min

Statistical Criticism is Easy; I Need to Remember That Real People are Involved

RCT
2017

Criticism of medical journal articles is easy. I need to keep in mind that much good research is done even if there are some flaws in the design, analysis, or interpretation. I also need to remember that real people are involved.

Nov 5, 2017
Frank Harrell
6 min

Continuous Learning from Data: No Multiplicities from Computing and Using Bayesian Posterior Probabilities as Often as Desired

bayes
sequential
RCT
2017

This article describes the drastically different way that sequential data looks operate in a Bayesian setting compared to a classical frequentist setting.

Oct 9, 2017
Frank Harrell
13 min

Bayesian vs. Frequentist Statements About Treatment Efficacy

reporting
inference
p-value
RCT
bayes
drug-evaluation
evidence
hypothesis-testing
2017

This article contrasts language used when reporting a classical frequentist treatment comparison vs. a Bayesian one, and describes why Bayesian statements convey more actionable information.

Oct 4, 2017
Frank Harrell
6 min

Integrating Audio, Video, and Discussion Boards with Course Notes

collaboration
teaching
r
reproducible
2017

In this article I seek recommendations for integrating various media for teaching long courses.

Aug 1, 2017
Frank Harrell
15 min

EHRs and RCTs: Outcome Prediction vs. Optimal Treatment Selection

prediction
generalizability
drug-evaluation
evidence
subgroup
EHR
design
medicine
inference
big-data
RCT
personalized-medicine
2017

Observational data from electronic health records may contain biases that large sample sizes do not overcome. Moderate confounding by indication may render an infinitely large observational study less useful than a small randomized trial for estimating relative treatment effectiveness.

Jun 1, 2017
Frank Harrell, Laura Lazzeroni
16 min

Statistical Errors in the Medical Literature

prediction
logic
p-value
validation
bayes
evidence
subgroup
dichotomization
medicine
inference
change-scores
RCT
personalized-medicine
responder-analysis
hypothesis-testing
medical-literature
2017

This article catalogs several types of statistical problems that occur frequently in medical journal articles.

Apr 8, 2017
Frank Harrell
32 min

Subjective Ranking of Quality of Research by Subject Matter Area

2017

This is a subjective ranking of topical areas by the typical quality of research published in the area. Keep in mind that top-quality research can occur in any area when the research team is multi-disciplinary, team members are at the top of their game, and peer review is functional.

Mar 16, 2017
Frank Harrell
4 min

Damage Caused by Classification Accuracy and Other Discontinuous Improper Accuracy Scoring Rules

prediction
machine-learning
accuracy-score
dichotomization
probability
bioinformatics
validation
classification
data-science
2017

Estimating tendencies is usually a more appropriate goal than classification, and classification leads to the use of discontinuous accuracy scores which give rise to misleading results.

Mar 1, 2017
Frank Harrell
5 min
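
To make the "discontinuous accuracy score" point in the description above concrete, here is a minimal Python sketch. It is not from the article, and the helper names, the 0.5 threshold, and the toy forecasts are assumptions chosen for illustration. It compares classification accuracy with the Brier score (mean squared error of predicted probabilities), a standard proper scoring rule.

```python
def accuracy(probs, outcomes, threshold=0.5):
    """Proportion classified correctly after thresholding the probabilities."""
    hits = sum((p >= threshold) == bool(y) for p, y in zip(probs, outcomes))
    return hits / len(outcomes)


def brier_score(probs, outcomes):
    """Mean squared difference between predicted probability and outcome (lower is better)."""
    return sum((p - y) ** 2 for p, y in zip(probs, outcomes)) / len(outcomes)


# Hypothetical outcomes and two sets of predicted probabilities.
outcomes = [1, 1, 0, 0]
timid = [0.51, 0.51, 0.49, 0.49]      # barely on the right side of 0.5
confident = [0.99, 0.99, 0.01, 0.01]  # strongly and correctly separated

for name, probs in [("timid", timid), ("confident", confident)]:
    print(name, accuracy(probs, outcomes), round(brier_score(probs, outcomes), 4))
```

Both forecasts get the same accuracy because thresholding discards how far each probability is from 0.5, while the Brier score distinguishes them. Nudging a prediction from 0.49 to 0.51 flips a classification entirely, which is the discontinuity the description refers to.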

My Journey from Frequentist to Bayesian Statistics

inference
p-value
likelihood
RCT
bayes
multiplicity
posterior
drug-evaluation
principles
evidence
hypothesis-testing
2017

This is the story of what influenced me to become a Bayesian statistician after being trained as a classical frequentist statistician, and practicing only that mode of statistics for many years.

Feb 19, 2017
Frank Harrell
24 min

Interactive Statistical Graphics: Showing More By Showing Less

survival-analysis
graphics
r
2017

With interactive graphics one can start by showing the most important data features, then drill down to see details.

Feb 5, 2017
Frank Harrell
3 min

A Litany of Problems With p-values

decision-making
bayes
multiplicity
p-value
hypothesis-testing
2017

p-values are very often misinterpreted. p-values and null hypothesis significance testing have hurt science. This article attempts to catalog all the ways in which these happen.

Feb 5, 2017
Frank Harrell
36 min

Clinicians’ Misunderstanding of Probabilities Makes Them Like Backwards Probabilities Such As Sensitivity, Specificity, and Type I Error

specificity
probability
backward-probability
forward-probability
p-value
bayes
conditioning
diagnosis
decision-making
dichotomization
medicine
bioinformatics
biomarker
sensitivity
posterior
accuracy-score
classification
2017

The error of the transposed conditional is rampant in research. Conditioning on what is unknowable to predict what is already known leads to a host of complexities and interpretation problems.

Jan 25, 2017
Frank Harrell
15 min

Split-Sample Model Validation

prediction
bootstrap
validation
2017

The many disadvantages of split-sample validation, including subtle ones, are discussed.

Jan 23, 2017
Frank Harrell
4 min

Fundamental Principles of Statistics

design
measurement
principles
2017

This brief note catalogs what I feel are some of the most important principles to guide statistical practice.

Jan 18, 2017
Frank Harrell
3 min

Ideas for Future Articles

2017

Suggestions for future articles, by readers

Jan 16, 2017
Frank Harrell
2 min

Classification vs. Prediction

prediction
decision-making
machine-learning
accuracy-score
classification
data-science
2017

Classification involves a forced-choice premature decision, and is often misused in machine learning applications. Probability modeling involves the quantification of tendencies and usually addresses the real project goals.

Jan 15, 2017
Frank Harrell
7 min

Null Hypothesis Significance Testing Never Worked

logic
inference
bayes
p-value
hypothesis-testing
inductive-reasoning
2017

This article explains why for decision making the original idea of null hypothesis testing never delivered on its goal.

Jan 14, 2017
Frank Harrell
3 min

p-values and Type I Errors are Not the Probabilities We Need

judgment
inference
likelihood
bayes
multiplicity
p-value
prior
hypothesis-testing
2017

p-values are not what decision makers need, nor are they what most decision makers think they are getting.

Jan 14, 2017
Frank Harrell
12 min

Introduction

2017
principles

Introducing the Statistical Thinking Blog

Jan 13, 2017
Frank Harrell
4 min

    Reuse

    https://creativecommons.org/licenses/by/4.0/
    Source Code
    ---
    title: ""
    listing:
      - id: post
        contents: "*/index.qmd"
        type: default
        fields: [date, title, description, categories, author, reading-time]  
        sort: "date desc"
        categories: cloud
        sort-ui: true
        filter-ui: true
        page-size: 10
    page-layout: full
    title-block-banner: false
    ---
    
    
    
    ::: {#post}
    :::
    Blog made with Quarto, by Frank Harrell. License: CC BY-SA 2.0.