Valerii Fedorov
Frank Mannino
Rongmei Zhang
Volume 8
Pages 50-61
Pharm Stat
2009
optimal cutpoint depends on unknown parameters; should only entertain dichotomization when "estimating a value of the cumulative distribution and when the assumed model is very different from the true model"; nice graphics
Heiko Becher
Volume 11
Pages 1747-1758
Stat Med
1992
Petra Buettner
Claus Garbe
Irene Guggenmoos-Holzmann
Volume 50
Pages 1201-1210
J Clin Epi
1997
choice of cut point depends on marginal distribution of predictor
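A minimal sketch of one facet of this note follows. It assumes quantile-based cutpoints (a median split) and two hypothetical, normally distributed study populations that differ only in location. It does not use Buettner et al.'s data or method. Because the cutpoint is a function of the predictor's marginal distribution, the same absolute value can land in different groups in different studies.

```python
import random
import statistics


def median_cutpoint(values):
    """Data-derived cutpoint: the sample median (a 'median split')."""
    return statistics.median(values)


def simulate_cohort(mean, sd, n, seed):
    # Hypothetical predictor values for one study population (assumed normal).
    rng = random.Random(seed)
    return [rng.gauss(mean, sd) for _ in range(n)]


# Two hypothetical studies measuring the same predictor; their marginal
# distributions differ only in location.
cohort_a = simulate_cohort(mean=2.0, sd=1.0, n=2000, seed=1)
cohort_b = simulate_cohort(mean=3.0, sd=1.0, n=2000, seed=2)

cut_a = median_cutpoint(cohort_a)
cut_b = median_cutpoint(cohort_b)

# Classify one individual value against each study's own cutpoint.
x = 2.5
print(f"cutpoint A = {cut_a:.2f}, cutpoint B = {cut_b:.2f}")
print("x classified high in A:", x > cut_a)
print("x classified high in B:", x > cut_b)
```

With these settings the two medians sit near 2 and 3, so a value of 2.5 counts as "high" in one study and "low" in the other. Results reported as "above versus below the median" are therefore not directly comparable across populations.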
D. R. Ragland
Volume 3
Pages 434-440
Epi
1992
10.1097/00001648-199209000-00009
Gary S. Collins
Emmanuel O. Ogundimu
Jonathan A. Cook
Yannick Le Manach
Douglas G. Altman
Volume 35
Issue 23
Pages 4124-4135
Stat Med
2016-10
10.1002/sim.6986
used rms package hazard regression method (hare) for survival model calibration
Caroline Bennette
Andrew Vickers
Volume 12
Issue 1
Pages 21+
BMC Med Res Methodol
2012-02
Quantiles are a staple of epidemiologic research: in contemporary epidemiologic practice, continuous variables are typically categorized into tertiles, quartiles and quintiles as a means to illustrate the relationship between a continuous exposure and a binary outcome. In this paper we argue that this approach is highly problematic and present several potential alternatives. We also discuss the perceived drawbacks of these newer statistical methods and the possible reasons for their slow adoption by epidemiologists. The use of quantiles is often inadequate for epidemiologic research with continuous variables.
terrific graphical examples; nice display of outcome heterogeneity within quantile groups of PSA
O. Naggara
J. Raymond
F. Guilbert
D. Roy
A. Weill
D. G. Altman
Volume 32
Issue 3
Pages 437-440
Am J Neuroradiol
2011
In medical research analyses, continuous variables are often converted into categoric variables by grouping values into ≥2 categories. The simplicity achieved by creating ≥2 artificial groups has a cost: Grouping may create rather than avoid problems. In particular, dichotomization leads to a considerable loss of power and incomplete correction for confounding factors. The use of data-derived "optimal" cut-points can lead to serious bias and should at least be tested on independent observations to assess their validity. Both problems are illustrated by the way the results of a registry on unruptured intracranial aneurysms are commonly used. Extreme caution should restrict the application of such results to clinical decision-making. Categorization of continuous data, especially dichotomization, is unnecessary for statistical analysis. Continuous explanatory variables should be left alone in statistical models.
Norbert Holländer
Willi Sauerbrei
Martin Schumacher
Volume 23
Pages 1701-1713
Stat Med
2004
10.1002/sim.1611
true type I error can be much greater than nominal level; one example where nominal is 0.05 and true is 0.5; minimum P-value method; CART; recursive partitioning; bootstrap method for correcting confidence interval; based on heuristic shrinkage coefficient; "It should be noted, however, that the optimal cutpoint approach has disadvantages. One of these is that in almost every study where this method is applied, another cutpoint will emerge. This makes comparisons across studies extremely difficult or even impossible. Altman et al. point out this problem for studies of the prognostic relevance of the S-phase fraction in breast cancer published in the literature. They identified 19 different cutpoints used in the literature; some of them were solely used because they emerged as the 'optimal' cutpoint in a specific data set. In a meta-analysis on the relationship between cathepsin-D content and disease-free survival in node-negative breast cancer patients, 12 studies were included with 12 different cutpoints ... Interestingly, neither cathepsin-D nor the S-phase fraction are recommended to be used as prognostic markers in breast cancer in the recent update of the American Society of Clinical Oncology."; dichotomization; categorizing continuous variables; refs alt94dan, sch94out, alt98sub
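A minimal Monte Carlo sketch of why the minimum P-value method inflates the type I error follows. The settings are assumptions chosen for illustration, not taken from Holländer et al.:

- a standard normal predictor;
- a binary outcome with event probability 0.3, independent of the predictor;
- n = 200;
- candidate cutpoints at the 10th, 15th, ..., 90th sample percentiles;
- a two-proportion z test, which is equivalent to Pearson's chi-square test without continuity correction.

Declaring significance whenever the smallest of these P-values is below 0.05 rejects far more often than a single pre-specified median split does. The exact inflation depends on the range and number of cutpoints searched, so these settings will not reproduce the note's 0.5 figure.

```python
import math
import random


def two_group_p_value(y, high):
    """Two-sided P-value for a difference in event proportions between the
    'high' and 'low' groups (Pearson chi-square on the 2x2 table, no
    continuity correction, written as a pooled two-proportion z test)."""
    n1 = sum(high)
    n0 = len(y) - n1
    if n1 == 0 or n0 == 0:
        return 1.0
    e1 = sum(yi for yi, h in zip(y, high) if h)
    e0 = sum(y) - e1
    p_pool = (e1 + e0) / len(y)
    var = p_pool * (1 - p_pool) * (1 / n1 + 1 / n0)
    if var == 0:
        return 1.0
    z = (e1 / n1 - e0 / n0) / math.sqrt(var)
    return math.erfc(abs(z) / math.sqrt(2))  # equals 2 * (1 - Phi(|z|))


def type_i_error_rates(n_reps=500, n=200, alpha=0.05, seed=2024):
    """Rejection rates under the null for a pre-specified median split versus
    the minimum P-value over cutpoints at the 10th-90th sample percentiles."""
    rng = random.Random(seed)
    levels = [q / 100 for q in range(10, 91, 5)]  # 0.10, 0.15, ..., 0.90
    fixed_hits = min_p_hits = 0
    for _ in range(n_reps):
        x = [rng.gauss(0, 1) for _ in range(n)]
        y = [1 if rng.random() < 0.3 else 0 for _ in range(n)]  # unrelated to x
        xs = sorted(x)
        p_values = [two_group_p_value(y, [xi > xs[int(q * (n - 1))] for xi in x])
                    for q in levels]
        p_median = two_group_p_value(y, [xi > xs[int(0.5 * (n - 1))] for xi in x])
        fixed_hits += p_median < alpha
        min_p_hits += min(p_values) < alpha
    return fixed_hits / n_reps, min_p_hits / n_reps


fixed_rate, min_p_rate = type_i_error_rates()
print(f"median split: {fixed_rate:.3f}   minimum P-value: {min_p_rate:.3f}")
```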
Barry K. Moser
Laura P. Coombs
Volume 23
Pages 1843-1860
Stat Med
2004
large loss of efficiency and power; embeds in a logistic distribution, similar to proportional odds model; categorization; dichotomization of a continuous response in order to obtain odds ratios often results in an inflation of the needed sample size by a factor greater than 1.5
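A worked sketch of where a factor above 1.5 can come from follows. It uses a standard result that assumes a normally distributed variable Z and a small effect; it is not Moser and Coombs' logistic derivation. Replacing Z by the indicator 1{Z > c} has asymptotic relative efficiency φ(c)² / (p(1 − p)), where p = 1 − Φ(c). At the median (c = 0) this equals 2/π ≈ 0.637, so about π/2 ≈ 1.57 times as many observations are needed. Cutting away from the median costs more.

```python
import math


def normal_pdf(z):
    return math.exp(-0.5 * z * z) / math.sqrt(2 * math.pi)


def normal_cdf(z):
    return 0.5 * math.erfc(-z / math.sqrt(2))


def dichotomization_efficiency(c):
    """Asymptotic relative efficiency of analysing 1{Z > c} instead of Z,
    assuming Z ~ N(0, 1) and a small effect: phi(c)^2 / (p * (1 - p))."""
    p = 1 - normal_cdf(c)
    return normal_pdf(c) ** 2 / (p * (1 - p))


for c in (0.0, 0.5, 1.0, 1.5):
    eff = dichotomization_efficiency(c)
    print(f"cut at z = {c:.1f}: efficiency = {eff:.3f}, "
          f"sample size multiplied by {1 / eff:.2f}")
```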
Douglas G. Altman
Volume 78
Pages 556-557
Brit J Cancer
1998
Patrick Royston
Douglas G. Altman
Willi Sauerbrei
Volume 25
Pages 127-141
Stat Med
2006
10.1002/sim.2331
destruction of statistical inference when cutpoints are chosen using the response variable; effect estimates vary when cutpoints are changed; difficult to interpret effects when dichotomize; nice plot showing effect of categorization; PBC data
Howard Wainer
Volume 19
Issue 1
Pages 49-56
Chance
2006
can find bins that yield either positive or negative association; especially pertinent when effects are small; "With four parameters, I can fit an elephant; with five, I can make it wiggle its trunk." - John von Neumann
S. E. Maxwell
H. D. Delaney
Volume 113
Pages 181-190
Psych Bull
1993
10.1037//0033-2909.113.1.181
G. Schulgen
B. Lausen
J. Olsen
M. Schumacher
Volume 120
Pages 172-184
Am J Epi
1994
Samy Suissa
Lucie Blais
Volume 14
Pages 247-255
Stat Med
1995
10.1002/sim.4780140303
S. G. Hilsenbeck
G. M. Clark
Volume 15
Pages 103-112
Stat Med
1996
B. Lausen
M. Schumacher
Volume 21
Issue 3
Pages 307-326
Comp Stat Data Analysis
1996
10.1016/0167-9473(95)00016-X
D. G. Altman
Volume 64
Pages 975
Brit J Cancer
1991
D. G. Altman
B. Lausen
W. Sauerbrei
M. Schumacher
Volume 86
Pages 829-835
J Nat Cancer Inst
1994
David Faraggi
Richard Simon
Volume 15
Pages 2203-2213
Stat Med
1996
bias in point estimate of effect from selecting cutpoints based on P-value; loss of information from dichotomizing continuous predictors
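A minimal sketch of the point-estimate bias follows, under assumed null settings that do not come from Faraggi and Simon: no association between a normal predictor and a binary outcome (event probability 0.3, n = 200). The true risk difference is 0 at every cutpoint.

The sketch picks the cutpoint with the smallest P-value (largest |z|) among the 10th to 90th sample percentiles and compares its |risk difference| with that of a pre-specified median split. The median is one of the candidates and gives the most precise balanced split, so the selected estimate can never be smaller in magnitude. On average it is noticeably larger.

```python
import math
import random


def risk_difference_and_z(y, high):
    """Difference in event proportions (high minus low) and its pooled z statistic."""
    n1 = sum(high)
    n0 = len(y) - n1
    e1 = sum(yi for yi, h in zip(y, high) if h)
    e0 = sum(y) - e1
    diff = e1 / n1 - e0 / n0
    p_pool = (e1 + e0) / len(y)
    se = math.sqrt(p_pool * (1 - p_pool) * (1 / n1 + 1 / n0))
    return diff, (diff / se if se > 0 else 0.0)


def mean_abs_effect(n_reps=300, n=200, seed=7):
    """Average |risk difference| for a median split versus the 'optimal' cutpoint."""
    rng = random.Random(seed)
    levels = [q / 100 for q in range(10, 91, 5)]  # 0.10, 0.15, ..., 0.90
    total_fixed = total_opt = 0.0
    for _ in range(n_reps):
        # Null model: the outcome is unrelated to the predictor.
        x = [rng.gauss(0, 1) for _ in range(n)]
        y = [1 if rng.random() < 0.3 else 0 for _ in range(n)]
        xs = sorted(x)
        results = [risk_difference_and_z(y, [xi > xs[int(q * (n - 1))] for xi in x])
                   for q in levels]
        diff_fixed, _ = risk_difference_and_z(
            y, [xi > xs[int(0.5 * (n - 1))] for xi in x])
        # "Optimal" cutpoint: largest |z|, i.e. smallest P-value.
        diff_opt, _ = max(results, key=lambda r: abs(r[1]))
        total_fixed += abs(diff_fixed)
        total_opt += abs(diff_opt)
    return total_fixed / n_reps, total_opt / n_reps


fixed_mean, opt_mean = mean_abs_effect()
print(f"mean |RD|, median split: {fixed_mean:.3f}   optimal cutpoint: {opt_mean:.3f}")
```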