# References

Allison, P. D. (2001).

*Missing Data*. Sage.
Altman, D. G. (1991). Categorising continuous covariates (letter to the
editor).

*Brit J Cancer*,*64*, 975.
Altman, Douglas G. (1998). Suboptimal analysis using
“optimal” cutpoints.

*Brit J Cancer*,*78*, 556–557.
Altman, D. G., & Andersen, P. K. (1989). Bootstrap investigation of
the stability of a Cox regression model.

*Stat Med*,*8*, 771–783.
Altman, D. G., Lausen, B., Sauerbrei, W., & Schumacher, M. (1994).
Dangers of using “optimal” cutpoints in the evaluation of
prognostic factors.

*J Nat Cancer Inst*,*86*, 829–835.
Andrews, D. F., & Herzberg, A. M. (1985).

*Data*. Springer-Verlag.
Arjas, E. (1988). A graphical method for assessing goodness of fit in
Cox’s proportional hazards model.

*J Am Stat Assoc*,*83*, 204–212.
Armstrong, B. G., & Sloan, M. (1989). Ordinal regression models for
epidemiologic data.

*Am J Epi*,*129*, 191–204.
Atkinson, A. C. (1980). A note on the generalized information criterion
for choice of a model.

*Biometrika*,*67*, 413–418.
Austin, P. C. (2008). Bootstrap model selection had similar performance
for selecting authentic and noise variables compared to backward
variable elimination: A simulation study.

*J Clin Epi*,*61*, 1009–1017. "in general, a bootstrap
model selection method had comparable performance to conventional
backward variable elimination for identifying the true regression model.
In most settings, both methods performed poorly at correctly identifying
the correct regression model."

Austin, P. C., & Steyerberg, E. W. (2019). The Integrated
Calibration Index (ICI) and related metrics for
quantifying the calibration of logistic regression models.

*Statistics in Medicine*,*38*(21), 4051–4065. https://doi.org/10.1002/sim.8281
Austin, P. C., Tu, J. V., Daly, P. A., & Alter, D. A. (2005).
Tutorial in Biostatistics: The use of quantile
regression in health care research: A case study examining gender
differences in the timeliness of thrombolytic therapy.

*Stat Med*,*24*, 791–816.
Austin, P. C., Tu, J. V., & Lee, D. S. (2010). Logistic regression
had superior performance compared with regression trees for predicting
in-hospital mortality in patients hospitalized with heart failure.

*J Clin Epi*,*63*, 1145–1155. ROC areas
for logistic models varied from 0.747 to 0.775 whereas they varied from
0.620 to 0.651 for recursive partitioning; repeated data simulation showed
large variation in tree structure

Barnes, S. A., Lindborg, S. R., & Seaman, J. W. (2006). Multiple
imputation techniques in small sample clinical trials.

*Stat Med*,*25*, 233–245. bad performance of
LOCF including high bias and poor confidence interval
coverage; simulation setup; longitudinal data; serial
data; RCT; dropout; assumed missing at random (MAR); approximate Bayesian
bootstrap; Bayesian least squares; missing data; nice background
summary; new completion score method based on fitting a Poisson model for
the number of completed clinic visits and using donors and approximate
Bayesian bootstrap

Barzi, F., & Woodward, M. (2004). Imputations of missing values in
practice: Results from imputations of serum cholesterol in
28 cohort studies.

*Am J Epi*,*160*, 34–45. excellent review article for multiple imputation; list
of variables to include in imputation model; "Imputation models should
ideally include all covariates that are related to the missing data
mechanism, have distributions that differ between the respondents and
nonrespondents, are associated with cholesterol, and will be included in
the analyses of the final complete data sets"; detailed comparison of
results (cholesterol effect and confidence limits) for various
imputation methods

Belcher, H. (1992). The concept of residual confounding in regression
models and some applications.

*Stat Med*,*11*, 1747–1758.
Belsley, David A. (1991).

*Conditioning Diagnostics: Collinearity and Weak Data in Regression*. Wiley.
Belsley, D. A., Kuh, E., & Welsch, R. E. (1980).

*Regression Diagnostics: Identifying Influential Data and Sources of Collinearity*. Wiley.
Benedetti, J. K., Liu, P.-Y., Sather, H. N., Seinfeld, J., & Epton,
M. A. (1982). Effective sample size for tests of censored survival data.

*Biometrika*,*69*, 343–349.
Bennette, C., & Vickers, A. (2012). Against quantiles:
Categorization of continuous variables in epidemiologic research, and
its discontents.

*BMC Med Res Methodol*,*12*(1), 21+. https://doi.org/10.1186/1471-2288-12-21
terrific graphical examples; nice display of outcome
heterogeneity within quantile groups of PSA

Berhane, K., Hauptmann, M., & Langholz, B. (2008). Using tensor
product splines in modeling exposure–time–response relationships:
Application to the Colorado Plateau Uranium
Miners cohort.

*Stat Med*,*27*, 5484–5496. discusses taking product of all univariate spline
basis functions

Bernal, J. L., Cummins, S., & Gasparrini, A. (2017). Interrupted
time series regression for the evaluation of public health
interventions: A tutorial.

*International Journal of Epidemiology*,*46*(1), 348–355. https://doi.org/10.1093/ije/dyw098
Berridge, D. M., & Whitehead, J. (1991). Analysis of failure time
data with ordinal categories of response.

*Stat Med*,*10*, 1703–1710. https://doi.org/10.1002/sim.4780101108
Blettner, M., & Sauerbrei, W. (1993). Influence of model-building
strategies on the results of a case-control study.

*Stat Med*,*12*, 1325–1338.
Bondarenko, I., & Raghunathan, T. (2016). Graphical and numerical
diagnostic tools to assess suitability of multiple imputations and
imputation models.

*Stat Med*,*35*(17), 3007–3020. https://doi.org/10.1002/sim.6926
Booth, J. G., & Sarkar, S. (1998). Monte Carlo
approximation of bootstrap variances.

*Am Statistician*,*52*, 354–357. number of resamples required
to estimate variances, quantiles; 800 resamples may be required to
guarantee with 0.95 confidence that the relative error of a variance
estimate is 0.1; Efron’s original suggestions for as low as 25 resamples
were based on comparing stability of bootstrap estimates to sampling
error, but small relative effects can significantly change
P-values; number of bootstrap resamples

Bordley, R. (2007). Statistical decisionmaking without math.

*Chance*,*20*(3), 39–44.
Breiman, Leo. (1992). The little bootstrap and other methods for
dimensionality selection in regression: X-fixed prediction error.

*J Am Stat Assoc*,*87*, 738–754.
Breiman, L., & Friedman, J. H. (1985). Estimating optimal
transformations for multiple regression and correlation (with
discussion).

*J Am Stat Assoc*,*80*, 580–619.
Breiman, Leo, Friedman, J. H., Olshen, R. A., & Stone, C. J. (1984).

*Classification and Regression Trees*. Wadsworth and Brooks/Cole.
Breslow, N. E., Edler, L., & Berger, J. (1984). A two-sample
censored-data rank test for acceleration.

*Biometrics*,*40*, 1049–1062.
Briggs, W. M., & Zaretzki, R. (2008). The skill plot: A
graphical technique for evaluating continuous diagnostic tests (with
discussion).

*Biometrics*,*64*, 250–261. "statistics such as the AUC are not especially
relevant to someone who must make a decision about a particular x_c. ...
ROC curves lack or obscure several quantities that are necessary for
evaluating the operational effectiveness of diagnostic tests. ... ROC
curves were first used to check how radio *receivers* (like radar receivers) operated
over a range of frequencies. ... This is not how most ROC curves are
used now, particularly in medicine. The receiver of a diagnostic
measurement ... wants to make a decision based on some x_c, and is not
especially interested in how well he would have done had he used some
different cutoff."; in the discussion David Hand states "when
integrating to yield the overall AUC measure, it is necessary to decide
what weight to give each value in the integration. The AUC implicitly
does this using a weighting derived empirically from the data. This is
nonsensical. The relative importance of misclassifying a case as a
noncase, compared to the reverse, cannot come from the data itself. It
must come externally, from considerations of the severity one attaches
to the different kinds of misclassifications."; see Lin, Kvam, Lu Stat
in Med 28:798-813;2009

Brownstone, D. (1988). Regression strategies.

*Proceedings of the 20th Symposium on the Interface Between Computer Science and Statistics*, 74–79.
Buettner, P., Garbe, C., & Guggenmoos-Holzmann, I. (1997). Problems
in defining cutoff points of continuous prognostic factors:
Example of tumor thickness in primary cutaneous melanoma.

*J Clin Epi*,*50*, 1201–1210. choice of cut point depends on marginal distribution
of predictor

Buuren, Stef. (2012).

*Flexible imputation of missing data*. Chapman & Hall/CRC. https://doi.org/10.1201/b11826
Byar, D. P., & Green, S. B. (1980). The choice of treatment for
cancer patients based on covariate information: Application
to prostate cancer.

*Bulletin Cancer, Paris*,*67*, 477–488.
Califf, R. M., Harrell, F. E., Lee, K. L., Rankin, J. S., & Others.
(1989). The evolution of medical and surgical therapy for coronary
artery disease.

*JAMA*,*261*, 2077–2086.
Carpenter, B., Gelman, A., Hoffman, M., Lee, D., Goodrich, B.,
Betancourt, M., Brubaker, M., Guo, J., Li, P., & Riddell, A. (2017).
Stan: A Probabilistic Programming Language.

*J Stat Software*,*76*(1), 1–32. https://doi.org/10.18637/jss.v076.i01
Carpenter, J. R., & Smuk, M. (2021). Missing data: A
statistical framework for practice.

*Biometrical Journal*,*63*(5), 915–947. https://doi.org/10.1002/bimj.202000196
Carpenter, J., & Bithell, J. (2000). Bootstrap confidence intervals:
When, which, what? A practical guide for medical
statisticians.

*Stat Med*,*19*, 1141–1164. unconditional nonparametric bootstrap becomes more
equivalent to conditional bootstrap based on regression residuals when
full models are fitted

Centers for Disease Control and Prevention CDC. National Center for
Health Statistics NCHS. (2010).

*National Health and Nutrition Examination Survey*. http://www.cdc.gov/nchs/nhanes/nhanes2009-2010/nhanes09_10.htm
Chambers, J. M., & Hastie, T. J. (Eds.). (1992).

*Statistical Models in S*. Wadsworth and Brooks/Cole.
Chan, K. W., & Meng, X.-L. (2022). Multiple improvements of multiple
imputation likelihood ratio tests.

*Statistica Sinica*,*32*, 1489–1514. https://doi.org/10.5705/ss.202019.0314
Chatfield, C. (1991). Avoiding statistical pitfalls (with discussion).

*Stat Sci*,*6*, 240–268.
Chatfield, C. (1995). Model uncertainty, data mining and statistical
inference (with discussion).

*J Roy Stat Soc A*,*158*, 419–466. bias by selecting model because it fits
the data well; bias in standard errors; P. 420: ... need for a better
balance in the literature and in statistical teaching between
techniques and problem solving strategies. P. 421: It is “well
known” to be “logically unsound and practically
misleading” (Zhang, 1992) to make inferences as if a model is
known to be true when it has, in fact, been selected from the same data
to be used for estimation purposes. However, although statisticians may
admit this privately (Breiman (1992) calls it a “quiet
scandal”), they (we) continue to ignore the difficulties because
it is not clear what else could or should be done. P. 421: Estimation
errors for regression coefficients are usually smaller than errors from
failing to take into account model specification. P. 422: Statisticians
must stop pretending that model uncertainty does not exist and begin to
find ways of coping with it. P. 426: It is indeed strange that we often
admit model uncertainty by searching for a best model but then ignore
this uncertainty by making inferences and predictions as if certain that
the best fitting model is actually true. P. 427: The analyst needs to
assess the model selection process and not just the best fitting model.
P. 432: The use of subset selection methods is well known to introduce
alarming biases. P. 433: ... the AIC can be highly biased in data-driven
model selection situations. P. 434: Prediction intervals will generally
be too narrow. In the discussion, Jamal R. M. Ameen states that a model
should be (a) satisfactory in performance relative to the stated
objective, (b) logically sound, (c) representative, (d) questionable and
subject to on-line interrogation, (e) able to accommodate external or
expert information and (f) able to convey information.

Chatterjee, S., & Hadi, A. S. (2012).

*Regression Analysis by Example*(Fifth). Wiley.
Chavent, M., Kuentz-Simonet, V., Liquet, B., & Saracco, J. (2012).
ClustOfVar: An R package for the clustering of
variables.

*J Stat Software*,*50*(13), 1–16.
Ciampi, A., Thiffault, J., Nakache, J. P., & Asselain, B. (1986).
Stratification by stepwise regression, correspondence analysis and
recursive partition.

*Comp Stat Data Analysis*,*1986*, 185–204.
Cleveland, W. S. (1979). Robust locally weighted regression and
smoothing scatterplots.

*J Am Stat Assoc*,*74*, 829–836.
Collett, D. (2002).

*Modelling Binary Data*(Second). Chapman and Hall.
Collins, G. S., Ogundimu, E. O., & Altman, D. G. (2016). Sample size
considerations for the external validation of a multivariable prognostic
model: A resampling study.

*Stat Med*,*35*(2), 214–226. https://doi.org/10.1002/sim.6787
Collins, G. S., Ogundimu, E. O., Cook, J. A., Manach, Y. L., &
Altman, D. G. (2016). Quantifying the impact of different approaches for
handling continuous predictors on the performance of a prognostic model.

*Stat Med*,*35*(23), 4124–4135. https://doi.org/10.1002/sim.6986
used rms package hazard regression method (hare) for
survival model calibration

Cook, E. F., & Goldman, L. (1988). Asymmetric stratification:
An outline for an efficient method for controlling
confounding in cohort studies.

*Am J Epi*,*127*, 626–639.
Cook, N. R. (2007). Use and misuse of the receiver operating
characteristic curve in risk prediction.

*Circ*,*115*, 928–935. example of large change in predicted risk
in cardiovascular disease with tiny change in ROC area; possible limits
to c index when calibration is perfect; importance of calibration
accuracy and changes in predicted risk when new variables are
added

Copas, J. B. (1983). Regression, prediction and shrinkage (with
discussion).

*J Roy Stat Soc B*,*45*, 311–354.
Copas, J. B. (1987). Cross-validation shrinkage of regression
predictors.

*J Roy Stat Soc B*,*49*, 175–183.
Cox, C., Chu, H., Schneider, M. F., & Muñoz, A. (2007). Parametric
survival analysis and taxonomy of hazard functions for the generalized
gamma distribution.

*Stat Med*,*26*, 4352–4374. nice tutorial; GG includes bathtub-shape hazard
function; failed to reference Herndon

Cox, D. R. (1972). Regression models and life-tables (with discussion).

*J Roy Stat Soc B*,*34*, 187–220.
Crawford, S. L., Tennstedt, S. L., & McKinlay, J. B. (1995). A
comparison of analytic methods for non-random missingness of outcome
data.

*J Clin Epi*,*48*, 209–219.
Crichton, N. J., & Hinde, J. P. (1989). Correspondence analysis as a
screening method for indicants for clinical diagnosis.

*Stat Med*,*8*, 1351–1362.
D’Agostino, R. B., Belanger, A. J., Markson, E. W., Kelly-Hayes, M.,
& Wolf, P. A. (1995). Development of health risk appraisal functions
in the presence of multiple indicators: The Framingham
Study nursing home institutionalization model.

*Stat Med*,*14*, 1757–1770.
Davis, C. E., Hyde, J. E., Bangdiwala, S. I., & Nelson, J. J.
(1986). An example of dependencies among variables in a conditional
logistic regression. In S. H. Moolgavkar & R. L. Prentice (Eds.),

*Modern Statistical Methods in Chronic Disease Epidemiology*(pp. 140–147). Wiley.
Davis, C. S. (2002).

*Statistical Methods for the Analysis of Repeated Measurements*. Springer.
Derksen, S., & Keselman, H. J. (1992). Backward, forward and
stepwise automated subset selection algorithms: Frequency
of obtaining authentic and noise variables.

*British J Math Stat Psych*,*45*, 265–282.
Devlin, T. F., & Weeks, B. J. (1986). Spline functions for logistic
regression modeling.

*Proceedings of the Eleventh Annual SAS Users Group International Conference*, 646–651.
Diggle, P. J., Heagerty, P., Liang, K.-Y., & Zeger, S. L. (2002).

*Analysis of Longitudinal Data*(Second). Oxford University Press.
Donders, A. R. T., van der Heijden, G. J. M. G., Stijnen, T., & Moons, K. G.
M. (2006). Review: A gentle introduction to imputation of
missing values.

*J Clin Epi*,*59*, 1087–1091. simple demonstration of failure of the add new
category method (indicator variable)

Donohue, M. C., Langford, O., Insel, P. S., van Dyck, C. H., Petersen,
R. C., Craft, S., Sethuraman, G., Raman, R., Aisen, P. S., & for the
Alzheimer’s Disease Neuroimaging Initiative. (n.d.). Natural cubic splines for the
analysis of Alzheimer’s clinical trials.

*Pharmaceutical Statistics*,*n/a*(n/a). https://doi.org/10.1002/pst.2285
Duan, N. (1983). Smearing estimate: A nonparametric
retransformation method.

*J Am Stat Assoc*,*78*, 605–610.
Duc, A. N., & Wolbers, M. (n.d.). Smooth semi-nonparametric
(SNP) estimation of the cumulative incidence function.

*Stat Med*, n/a. https://doi.org/10.1002/sim.7331
Durrleman, S., & Simon, R. (1989). Flexible regression models with
cubic splines.

*Stat Med*,*8*, 551–561.
Efron, B. (1983). Estimating the error rate of a prediction rule:
Improvement on cross-validation.

*J Am Stat Assoc*,*78*, 316–331. suggested need at least 200
models to get an average that is adequate, i.e., 20 repeats of 10-fold
cv

Efron, Bradley, & Narasimhan, B. (2020). The Automatic
Construction of Bootstrap Confidence Intervals.

*Journal of Computational and Graphical Statistics*,*0*(0), 1–12. https://doi.org/10.1080/10618600.2020.1714633
Efron, Bradley, & Tibshirani, R. (1993).

*An Introduction to the Bootstrap*. Chapman and Hall.
Efron, Bradley, & Tibshirani, R. (1997). Improvements on
cross-validation: The .632+ bootstrap method.

*J Am Stat Assoc*,*92*, 548–560.
Erler, N. S., Rizopoulos, D., Rosmalen, J., Jaddoe, V. W. V., Franco, O.
H., & Lesaffre, E. M. E. H. (2016). Dealing with missing covariates
in epidemiologic studies: A comparison between multiple imputation and a
full Bayesian approach.

*Stat Med*,*35*(17), 2955–2974. https://doi.org/10.1002/sim.6944
Fan, J., & Levine, R. A. (2007). To amnio or not to amnio:
That is the decision for Bayes.

*Chance*,*20*(3), 26–32.
Faraggi, D., & Simon, R. (1996). A simulation study of
cross-validation for selecting an optimal cutpoint in univariate
survival analysis.

*Stat Med*,*15*, 2203–2213. bias in point estimate of effect from selecting
cutpoints based on P-value; loss of information from dichotomizing
continuous predictors

Faraway, J. J. (1992). The cost of data analysis.

*J Comp Graph Stat*,*1*, 213–229.
Fedorov, V., Mannino, F., & Zhang, R. (2009). Consequences of
dichotomization.

*Pharm Stat*,*8*, 50–61. https://doi.org/10.1002/pst.331
optimal cutpoint depends on unknown parameters; should
only entertain dichotomization when "estimating a value of the
cumulative distribution and when the assumed model is very different
from the true model"; nice graphics

Fienberg, S. E. (2007).

*The Analysis of Cross-Classified Categorical Data*(Second). Springer.
Filzmoser, P., Fritz, H., & Kalcher, K. (2012).

*pcaPP: Robust PCA by Projection Pursuit*. http://CRAN.R-project.org/package=pcaPP
Freedman, D., Navidi, W., & Peters, S. (1988).

*On the Impact of Variable Selection in Fitting Regression Equations*(pp. 1–16). Springer-Verlag.
Friedman, J. H. (1984).

*A variable span smoother*(Technical Report No. 5). Laboratory for Computational Statistics, Department of Statistics, Stanford University.
Gail, M. H., & Pfeiffer, R. M. (2005). On criteria for evaluating
models of absolute risk.

*Biostatistics*,*6*(2), 227–239.
Gardiner, J. C., Luo, Z., & Roman, L. A. (2009). Fixed effects,
random effects and GEE: What are the
differences?

*Stat Med*,*28*, 221–239. nice comparison of models; econometrics; different use
of the term "fixed effects model"

Giannoni, A., Baruah, R., Leong, T., Rehman, M. B., Pastormerlo, L. E.,
Harrell, F. E., Coats, A. J., & Francis, D. P. (2014). Do optimal
prognostic thresholds in continuous physiological variables really
exist? Analysis of origin of apparent thresholds, with
systematic review for peak oxygen consumption, ejection fraction and
BNP.

*PLoS ONE*,*9*(1). https://doi.org/10.1371/journal.pone.0081699
Giudice, J. H., Fieberg, J. R., & Lenarz, M. S. (2011). Spending
degrees of freedom in a poor economy: A case study of
building a sightability model for moose in northeastern Minnesota.

*J Wildlife Manage*. https://doi.org/10.1002/jwmg.213
Gneiting, T., & Raftery, A. E. (2007). Strictly proper scoring
rules, prediction, and estimation.

*J Am Stat Assoc*,*102*, 359–378. wonderful review article
except missing references from Scandinavian and German medical decision
making literature

Goldstein, H. (1989). Restricted unbiased iterative generalized
least-squares estimation.

*Biometrika*,*76*(3), 622–623. derivation of REML

Govindarajulu, U. S., Spiegelman, D., Thurston, S. W., Ganguli, B.,
& Eisen, E. A. (2007). Comparing smoothing techniques in
Cox models for exposure-response relationships.

*Stat Med*,*26*, 3735–3752. authors wrote a
SAS macro for restricted cubic splines even though such a macro has
existed since 1984; would have gotten more useful results had simulation
been used so would know the true regression shape; measure of agreement
of two estimated curves by computing the area between them, standardized
by average of areas under the two; penalized spline and rcs were closer
to each other than to fractional polynomials

Grambsch, P. M., & O’Brien, P. C. (1991). The effects of
transformations and preliminary tests for non-linearity in regression.

*Stat Med*,*10*, 697–709.
Grambsch, P., & Therneau, T. (1994). Proportional hazards tests and
diagnostics based on weighted residuals.

*Biometrika*,*81*, 515–526.
Gray, R. J. (1992). Flexible methods for analyzing survival data using
splines, with applications to breast cancer prognosis.

*J Am Stat Assoc*,*87*, 942–951.
Gray, R. J. (1994). Spline-based tests in survival analysis.

*Biometrics*,*50*, 640–652.
Greenacre, M. J. (1988). Correspondence analysis of multivariate
categorical data by weighted least-squares.

*Biometrika*,*75*, 457–467.
Greenland, S. (2000). When should epidemiologic regressions use random
coefficients?

*Biometrics*,*56*, 915–921. https://doi.org/10.1111/j.0006-341X.2000.00915.x
use of statistics in epidemiology is largely
primitive; stepwise variable selection on confounders leaves important
confounders uncontrolled; composition matrix; example with far too many
significant predictors with many regression coefficients absurdly
inflated when overfit; lack of evidence for dietary effects mediated
through constituents; shrinkage instead of variable selection; larger
effect on confidence interval width than on point estimates with
variable selection; uncertainty about variance of random effects is just
uncertainty about prior opinion; estimation of variance is
pointless; instead the analysis should be repeated using different
values; "if one feels compelled to estimate $\tau^2$, I would recommend
giving it a proper prior concentrated around contextually reasonable
values"; claim about ordinary MLE being unbiased is misleading because it
assumes the model is correct and is the only model entertained; shrinkage
towards compositional model; "models need to be complex to capture
uncertainty about the relations...an honest uncertainty assessment
requires parameters for all effects that we know may be present. This
advice is implicit in an antiparsimony principle often attributed to L.
J. Savage ‘All models should be as big as an elephant (see Draper,
1995)’". See also gus06per.

Guo, J., James, G., Levina, E., Michailidis, G., & Zhu, J. (2011).
Principal component analysis with sparse fused loadings.

*J Comp Graph Stat*,*19*(4), 930–946. incorporates blocking structure in the
variables; selects different variables for different
components; encourages loadings of highly correlated variables to have
same magnitude, which aids in interpretation

Gurka, M. J., Edwards, L. J., & Muller, K. E. (2011). Avoiding bias
in mixed model inference for fixed effects.

*Stat Med*,*30*(22), 2696–2707. https://doi.org/10.1002/sim.4293
Hand, D., & Crowder, M. (1996).

*Practical Longitudinal Data Analysis*. Chapman & Hall.
Harel, O., & Zhou, X.-H. (2007). Multiple imputation:
Review of theory, implementation and software.

*Stat Med*,*26*, 3057–3077. failed to review
aregImpute; excellent overview; ugly S code; nice description of different
statistical tests including combining likelihood ratio tests (which
appears to be complex, requiring an out-of-sample log likelihood
computation); congeniality of imputation and analysis models; Bayesian
approximation or approximate Bayesian bootstrap overview; "Although
missing at random (MAR) is a non-testable assumption, it has been
pointed out in the literature that we can get very close to MAR if we
include enough variables in the imputation models ... it would be
preferred if the missing data modelling was done by the data
constructors and not by the users... MI yields valid inferences not only
in congenial settings, but also in certain uncongenial ones as
well—where the imputer’s model (1) is more general (i.e. makes fewer
assumptions) than the complete-data estimation method, or when the
imputer’s model makes additional assumptions that are
well-founded."

Harrell, F. E. (1986). The LOGIST Procedure. In

*SUGI Supplemental Library Users Guide*(Version 5, pp. 269–293). SAS Institute, Inc.
Harrell, Frank E. (2015).

*Regression Modeling Strategies, with Applications to Linear Models, Logistic and Ordinal Regression, and Survival Analysis*(Second edition). Springer. https://doi.org/10.1007/978-3-319-19425-7
Harrell, F. E., Lee, K. L., Califf, R. M., Pryor, D. B., & Rosati,
R. A. (1984). Regression modeling strategies for improved prognostic
prediction.

*Stat Med*,*3*, 143–152.
Harrell, Frank E., Lee, K. L., & Mark, D. B. (1996). Multivariable
prognostic models: Issues in developing models, evaluating
assumptions and adequacy, and measuring and reducing errors.

*Stat Med*,*15*, 361–387.
Harrell, F. E., Lee, K. L., Matchar, D. B., & Reichert, T. A.
(1985). Regression models for prognostic prediction:
Advantages, problems, and suggested solutions.

*Ca Trt Rep*,*69*, 1071–1077.
Harrell, F. E., Lee, K. L., & Pollock, B. G. (1988). Regression
models in clinical studies: Determining relationships
between predictors and response.

*J Nat Cancer Inst*,*80*, 1198–1202.
Harrell, Frank E., Margolis, P. A., Gove, S., Mason, K. E., Mulholland,
E. K., Lehmann, D., Muhe, L., Gatchalian, S., & Eichenwald, H. F.
(1998). Development of a clinical prediction model for an ordinal
outcome: The World Health Organization ARI Multicentre
Study of clinical signs and etiologic agents of pneumonia,
sepsis, and meningitis in young infants.

*Stat Med*,*17*, 909–944. http://onlinelibrary.wiley.com/doi/10.1002/(SICI)1097-0258(19980430)17:8%3C909::AID-SIM753%3E3.0.CO;2-O/abstract
Hastie, T. J., & Tibshirani, R. J. (1990).

*Generalized Additive Models*. Chapman & Hall/CRC.
Hastie, Trevor, Tibshirani, R., & Friedman, J. H. (2008).

*The Elements of Statistical Learning*(Second). Springer.
He, Y., & Zaslavsky, A. M. (2012). Diagnosing imputation models by
applying target analyses to posterior replicates of completed data.

*Stat Med*,*31*(1), 1–18. https://doi.org/10.1002/sim.4413
Herndon, J. E., & Harrell, F. E. (1990). The restricted cubic spline
hazard model.

*Comm Stat Th Meth*,*19*, 639–663.
Herndon, J. E., & Harrell, F. E. (1995). The restricted cubic spline
as baseline hazard in the proportional hazards model with step function
time-dependent covariables.

*Stat Med*,*14*, 2119–2129.
Hilsenbeck, S. G., & Clark, G. M. (1996). Practical p-value
adjustment for optimally selected cutpoints.

*Stat Med*,*15*, 103–112.
Hoeffding, W. (1948). A non-parametric test of independence.

*Ann Math Stat*,*19*, 546–557.
Holländer, N., Sauerbrei, W., & Schumacher, M. (2004). Confidence
intervals for the effect of a prognostic factor after selection of an
“optimal” cutpoint.

*Stat Med*,*23*, 1701–1713. https://doi.org/10.1002/sim.1611
true type I error can be much greater than nominal
level; one example where nominal is 0.05 and true is 0.5; minimum P-value
method; CART; recursive partitioning; bootstrap method for correcting
confidence interval; based on heuristic shrinkage coefficient; "It should
be noted, however, that the optimal cutpoint approach has disadvantages.
One of these is that in almost every study where this method is applied,
another cutpoint will emerge. This makes comparisons across studies
extremely difficult or even impossible. Altman et al. point out this
problem for studies of the prognostic relevance of the S-phase fraction
in breast cancer published in the literature. They identified 19
different cutpoints used in the literature; some of them were solely
used because they emerged as the “optimal” cutpoint in a
specific data set. In a meta-analysis on the relationship between
cathepsin-D content and disease-free survival in node-negative breast
cancer patients, 12 studies were included with 12 different cutpoints
... Interestingly, neither cathepsin-D nor the S-phase fraction are
recommended to be used as prognostic markers in breast cancer in the
recent update of the American Society of Clinical Oncology.";
dichotomization; categorizing continuous variables; refs alt94dan,
sch94out, alt98sub

Horton, N. J., & Kleinman, K. P. (2007). Much ado about nothing:
A comparison of missing data methods and software to fit
incomplete data regression models.

*Am Statistician*,*61*(1), 79–90.
Hurvich, C. M., & Tsai, C. L. (1990). The impact of model selection
on inference in linear regression.

*Am Statistician*,*44*, 214–217.
Iezzoni, L. I. (1994). Dimensions of Risk. In L. I. Iezzoni
(Ed.),

*Risk Adjustment for Measuring Health Outcomes*(pp. 29–118). Foundation of the American College of Healthcare Executives. dimensions of risk factors to include in models

Janssen, K. J., Donders, A. R., Harrell, F. E., Vergouwe, Y., Chen, Q.,
Grobbee, D. E., & Moons, K. G. (2010). Missing covariate data in
medical research: To impute is better than to ignore.

*J Clin Epi*,*63*, 721–727.
Jolliffe, I. T. (2010).

*Principal Component Analysis*(Second). Springer-Verlag.
Jones, M. P. (1996). Indicator and stratification methods for missing
explanatory variables in multiple linear regression.

*J Am Stat Assoc*,*91*, 222–230.
Kalbfleisch, J. D., & Prentice, R. L. (1973). Marginal likelihood
based on Cox’s regression and life model.

*Biometrika*,*60*, 267–278.
Karrison, T. G. (1997). Use of Irwin’s restricted mean as
an index for comparing survival in different treatment
groups—Interpretation and power considerations.

*Controlled Clin Trials*,*18*, 151–167. nice power comparisons with Wilcoxon; power with and
without covariable adjustment

Karvanen, J., & Harrell, F. E. (2009). Visualizing covariates in
proportional hazards model.

*Stat Med*,*28*, 1957–1966.
Kass, R. E., & Raftery, A. E. (1995). Bayes factors.

*J Am Stat Assoc*,*90*, 773–795.
Kay, R. (1986). Treatment effects in competing-risks analysis of
prostate cancer data.

*Biometrics*,*42*, 203–211.
Kenward, M. G., White, I. R., & Carpenter, J. R. (2010). Should
baseline be a covariate or dependent variable in analyses of change from
baseline in clinical trials? (Letter to the editor).

*Stat Med*,*29*, 1455–1456. sharp rebuke of
liu09sho

Keselman, H. J., Algina, J., Kowalchuk, R. K., & Wolfinger, R. D.
(1998). A comparison of two approaches for selecting covariance
structures in the analysis of repeated measurements.

*Comm Stat - Sim Comp*,*27*, 591–604. use of AIC and
BIC for selecting the covariance structure in repeated
measurements; serial data; longitudinal data; when choosing from 11
covariance patterns, AIC selected the correct structure 0.47 of the
time; BIC was correct in 0.35

Kim, S., Sugar, C. A., & Belin, T. R. (2015). Evaluating model-based
imputation methods for missing covariates in regression models with
interactions.

*Stat Med*,*34*(11), 1876–1888. https://doi.org/10.1002/sim.6435
Knaus, W. A., Harrell, F. E., Lynn, J., Goldman, L., Phillips, R. S.,
Connors, A. F., Dawson, N. V., Fulkerson, W. J., Califf, R. M.,
Desbiens, N., Layde, P., Oye, R. K., Bellamy, P. E., Hakim, R. B., &
Wagner, D. P. (1995). The SUPPORT prognostic model:
Objective estimates of survival for seriously ill
hospitalized adults.

*Ann Int Med*,*122*, 191–203. https://doi.org/10.7326/0003-4819-122-3-199502010-00007
Knol, M. J., Janssen, K. J. M., Donders, R. T., Egberts, A. C. G.,
Heerding, E. R., Grobbee, D. E., Moons, K. G. M., & Geerlings, M. I.
(2010). Unpredictable bias when using the missing indicator method or
complete case analysis for missing confounder values: An empirical
example.

*J Clin Epi*,*63*, 728–736.
Koenker, R. (2005).

*Quantile Regression*. Cambridge University Press.
Koenker, R. (2009).

*Quantreg: Quantile Regression*. http://CRAN.R-project.org/package=quantreg
Koenker, R., & Bassett, G. (1978). Regression quantiles.

*Econometrica*,*46*, 33–50.
Kooperberg, C., Stone, C. J., & Truong, Y. K. (1995). Hazard
regression.

*J Am Stat Assoc*,*90*, 78–94.
Kuhfeld, W. F. (2009). The PRINQUAL Procedure. In

*SAS/STAT 9.2 User’s Guide*(Second). SAS Publishing. http://support.sas.com/documentation/onlinedoc/stat
Lachin, J. M., & Foulkes, M. A. (1986). Evaluation of sample size
and power for analyses of survival with allowance for nonuniform patient
entry, losses to follow-up, noncompliance, and stratification.

*Biometrics*,*42*, 507–519.
Landwehr, J. M., Pregibon, D., & Shoemaker, A. C. (1984). Graphical
methods for assessing logistic regression models (with discussion).

*J Am Stat Assoc*,*79*, 61–83.
Larson, M. G., & Dinse, G. E. (1985). A mixture model for the
regression analysis of competing risks data.

*Appl Stat*,*34*, 201–211.
Lausen, B., & Schumacher, M. (1996). Evaluating the effect of
optimized cutoff values in the assessment of prognostic factors.

*Comp Stat Data Analysis*,*21*(3), 307–326. https://doi.org/10.1016/0167-9473(95)00016-X
Lawless, J. F., & Singhal, K. (1978). Efficient screening of
nonnormal regression models.

*Biometrics*,*34*, 318–327.
le Cessie, S., & van Houwelingen, J. C. (1992). Ridge estimators in
logistic regression.

*Appl Stat*,*41*, 191–201.
Leclerc, A., Luce, D., Lert, F., Chastang, J. F., & Logeay, P.
(1988). Correspondence analysis and logistic modelling:
Complementary use in the analysis of a health survey among
nurses.

*Stat Med*,*7*, 983–995.
Lee, K. J., & Carlin, J. B. (2012). Recovery of information from
multiple imputation: A simulation study.

*Emerg Themes Epi*,*9*(1), 3+. https://doi.org/10.1186/1742-7622-9-3
Not sure that the authors satisfactorily dealt with
nonlinear predictor effects; in the absence of strong auxiliary
information, there is little to gain from multiple imputation with
missing data in the exposure-of-interest. In fact, the authors went
further to say that multiple imputation can introduce bias not present
in a complete case analysis if a poorly fitting imputation model is used
[from Yong Hao Pua]

Lee, S., Huang, J. Z., & Hu, J. (2010). Sparse logistic principal
components analysis for binary data.

*Ann Appl Stat*,*4*(3), 1579–1601.
Leng, C., & Wang, H. (2009). On general adaptive sparse principal
component analysis.

*J Comp Graph Stat*,*18*(1), 201–215.
Li, C., & Shepherd, B. E. (2012). A new residual for ordinal
outcomes.

*Biometrika*,*99*(2), 473–480. https://doi.org/10.1093/biomet/asr073
Liang, K.-Y., & Zeger, S. L. (2000). Longitudinal data analysis of
continuous and discrete responses for pre-post designs.

*Sankhyā*,*62*, 134–148.
makes an error in assuming the baseline variable will have the same univariate
distribution as the response except for a shift; baseline may have, for
example, a truncated distribution based on a trial’s inclusion
criteria; if correlation between baseline and response is zero, ANCOVA
will be twice as efficient as simple analysis of change scores; if
correlation is one they may be equally efficient

Lindsey, J. K. (1997).

*Models for Repeated Measurements*. Clarendon Press.
Lipsitz, S., Parzen, M., & Zhao, L. P. (2002). A
Degrees-Of-Freedom approximation in Multiple
imputation.

*J Stat Comp Sim*,*72*(4), 309–318. https://doi.org/10.1080/00949650212848
Little, R. J. A., & Rubin, D. B. (2002).

*Statistical Analysis with Missing Data*(Second). Wiley.
Liu, G. F., Lu, K., Mogg, R., Mallick, M., & Mehrotra, D. V. (2009).
Should baseline be a covariate or dependent variable in analyses of
change from baseline in clinical trials?

*Stat Med*,*28*, 2509–2530.
seems to miss several important points,
such as the fact that the baseline variable is often part of the
inclusion/exclusion criteria and so has a truncated distribution that is
different from that of the follow-up measurements; sharp rebuke in
Kenward et al. (2010)

Lockhart, R., Taylor, J., Tibshirani, R. J., & Tibshirani, R.
(2013).

*A significance test for the lasso*. arXiv. http://arxiv.org/abs/1301.7161
Luo, X., Stefanski, L. A., & Boos, D. D. (2006). Tuning variable
selection procedures by adding noise.

*Technometrics*,*48*, 165–175.
adding a known amount of
noise to the response and studying σ² to tune the stopping rule to avoid
overfitting or underfitting; simulation setup

Madley-Dowd, P., Hughes, R., Tilling, K., & Heron, J. (2019). The
proportion of missing data should not be used to guide decisions on
multiple imputation.

*Journal of Clinical Epidemiology*,*110*, 63–73. https://doi.org/10.1016/j.jclinepi.2019.02.016
Mantel, N. (1970). Why stepdown procedures in variable selection.

*Technometrics*,*12*, 621–625.
Manuguerra, M., & Heller, G. Z. (2010). Ordinal Regression
Models for Continuous Scales.

*Int J Biostat*,*6*(1). https://doi.org/10.2202/1557-4679.1230
mislabeled a flexible parametric model as
semi-parametric; does not cover semi-parametric approach with lots of
intercepts

Mark, D. B., Hlatky, M. A., Harrell, F. E., Lee, K. L., Califf, R. M.,
& Pryor, D. B. (1987). Exercise treadmill score for predicting
prognosis in coronary artery disease.

*Ann Int Med*,*106*, 793–800.
Maxwell, S. E., & Delaney, H. D. (1993). Bivariate median splits and
spurious statistical significance.

*Psych Bull*,*113*, 181–190. https://doi.org/10.1037//0033-2909.113.1.181
McCabe, G. P. (1984). Principal variables.

*Technometrics*,*26*, 137–144.
Michailidis, G., & de Leeuw, J. (1998). The Gifi system
of descriptive multivariate analysis.

*Stat Sci*,*13*, 307–336.
Moons, K. G. M., Donders, R. A. R. T., Stijnen, T., & Harrell, F. E.
(2006). Using the outcome for imputation of missing predictor values was
preferred.

*J Clin Epi*,*59*, 1092–1101. https://doi.org/10.1016/j.jclinepi.2006.01.009
use of outcome variable; excellent graphical summaries
of simulations

Moser, B. K., & Coombs, L. P. (2004). Odds ratios for a continuous
outcome variable without dichotomizing.

*Stat Med*,*23*, 1843–1860.
large loss of efficiency and
power; embeds in a logistic distribution, similar to proportional odds
model; categorization; dichotomization of a continuous response in order
to obtain odds ratios often results in an inflation of the needed sample
size by a factor greater than 1.5

Muenz, L. R. (1983). Comparing survival distributions: A
review for nonstatisticians. II.

*Ca Invest*,*1*, 537–545.
Muggeo, V. M. R., & Tagliavia, M. (2010). A flexible approach to the
crossing hazards problem.

*Stat Med*,*29*, 1947–1957.
failed to reference per06red or per07app

Myers, R. H. (1990).

*Classical and Modern Regression with Applications*. PWS-Kent.
Nagelkerke, N. J. D. (1991). A note on a general definition of the
coefficient of determination.

*Biometrika*,*78*, 691–692.
Nick, T. G., & Hardin, J. M. (1999). Regression modeling strategies:
An illustrative case study from medical rehabilitation
outcomes research.

*Am J Occ Ther*,*53*, 459–470.
Nott, D. J., & Leng, C. (2010). Bayesian projection approaches to
variable selection in generalized linear models.

*Computational Statistics & Data Analysis*,*54*(12), 3227–3241. https://doi.org/10.1016/j.csda.2010.01.036
Paul, D., Bair, E., Hastie, T., & Tibshirani, R. (2008).
“Preconditioning” for feature selection and
regression in high-dimensional problems.

*Ann Stat*,*36*(4), 1595–1619. https://doi.org/10.1214/009053607000000578
develop consistent Y using a latent variable
structure, using for example supervised principal components. Then run
stepwise regression or lasso predicting Y (lasso worked better). Can run
into problems when a predictor has importance in an adjusted sense but
has no marginal correlation with Y; model approximation; model
simplification

Peduzzi, P., Concato, J., Feinstein, A. R., & Holford, T. R. (1995).
Importance of events per independent variable in proportional hazards
regression analysis. II. Accuracy and
precision of regression estimates.

*J Clin Epi*,*48*, 1503–1510.
Peduzzi, P., Concato, J., Kemper, E., Holford, T. R., & Feinstein,
A. R. (1996). A simulation study of the number of events per variable in
logistic regression analysis.

*J Clin Epi*,*49*, 1373–1379.
Peek, N., Arts, D. G. T., Bosman, R. J., van der Voort, P. H. J., &
de Keizer, N. F. (2007). External validation of prognostic models for
critically ill patients required substantial sample sizes.

*J Clin Epi*,*60*, 491–501.
large sample sizes
needed to obtain reliable external validations; inadequate power of DeLong,
DeLong, and Clarke-Pearson test for differences in correlated ROC areas
(p. 498); problem with tests of calibration accuracy having too much
power for large sample sizes

Pencina, M. J., D’Agostino, R. B., & Demler, O. V. (2012). Novel
metrics for evaluating improvement in discrimination: Net
reclassification and integrated discrimination improvement for normal
variables and nested models.

*Stat Med*,*31*(2), 101–113. https://doi.org/10.1002/sim.4348
Pencina, M. J., D’Agostino, R. B., & Steyerberg, E. W. (2011).
Extensions of net reclassification improvement calculations to measure
usefulness of new biomarkers.

*Stat Med*,*30*, 11–21. https://doi.org/10.1002/sim.4085
lack of need for NRI to be
category-based; arbitrariness of categories; "category-less or continuous
NRI is the most objective and versatile measure of improvement in risk
prediction"; authors misunderstood the inadequacy of three categories if
categories are used; comparison of NRI to change in C index; example of
continuous plot of risk for old model vs. risk for new model

Pencina, M. J., D’Agostino Sr, R. B., D’Agostino Jr, R. B., & Vasan,
R. S. (2008). Evaluating the added predictive ability of a new marker:
From area under the ROC curve to
reclassification and beyond.

*Stat Med*,*27*, 157–172.
small differences in ROC area can still
be very meaningful; example of insignificant test for difference in ROC
areas with very significant results from new method; Yates’
discrimination slope; reclassification table; limiting version of this
based on whether and amount by which probabilities rise for events and
lower for non-events when comparing new model to old; comparing two
models; see letters to the editor by Van Calster and Van Huffel, Stat in
Med 29:318-319, 2010 and by Cook and Paynter, Stat in Med 31:93-97,
2012

Penning de Vries, B. B. L., van Smeden, M., & Groenwold, R. H. H. (2018).
Propensity Score Estimation Using Classification and
Regression Trees in the Presence of
Missing Covariate Data.

*Epidemiologic Methods*,*7*(1). https://doi.org/10.1515/em-2017-0020
Pepe, M. S. (1991). Inference for events with dependent risks in
multiple endpoint studies.

*J Am Stat Assoc*,*86*, 770–778.
Pepe, M. S., Longton, G., & Thornquist, M. (1991). A qualifier
Q for the survival function to describe the prevalence of a
transient condition.

*Stat Med*,*10*, 413–421.
Pepe, M. S., & Mori, M. (1993). Kaplan–Meier,
marginal or conditional probability curves in summarizing competing
risks failure time data?

*Stat Med*,*12*, 737–751.
Perperoglou, A., le Cessie, S., & van Houwelingen, H. C. (2006).
Reduced-rank hazard regression for modelling non-proportional hazards.

*Stat Med*,*25*, 2831–2845.
natural structural way to allow for varying degrees of freedom in modeling
non-PH

Peters, S. A., Bots, M. L., den Ruijter, H. M., Palmer, M. K., Grobbee,
D. E., Crouse, J. R., O’Leary, D. H., Evans, G. W., Raichlen, J. S.,
Moons, K. G., Koffijberg, H., & METEOR study group. (2012). Multiple
imputation of missing repeated outcome measurements did not add to
linear mixed-effects models.

*J Clin Epi*,*65*(6), 686–695. https://doi.org/10.1016/j.jclinepi.2011.11.012
Peterson, B., & George, S. L. (1993). Sample size requirements and
length of study for testing interaction in a 1 × k factorial design when
time-to-failure is the outcome.

*Controlled Clin Trials*,*14*, 511–522.
Peterson, B., & Harrell, F. E. (1990). Partial proportional odds
models for ordinal response variables.

*Appl Stat*,*39*, 205–217.
Pike, M. C. (1966). A method of analysis of a certain class of experiments
in carcinogenesis.

*Biometrics*,*22*, 142–161.
Pinheiro, J. C., & Bates, D. M. (2000).

*Mixed-Effects Models in S and S-PLUS*. Springer.
Potthoff, R. F., & Roy, S. N. (1964). A generalized multivariate
analysis of variance model useful especially for growth curve problems.

*Biometrika*,*51*, 313–326.
included an AR1 example

Pryor, D. B., Harrell, F. E., Lee, K. L., Califf, R. M., &
Rosati, R. A. (1983). Estimating the likelihood of significant coronary
artery disease.

*Am J Med*,*75*, 771–780.
Pryor, D. B., Harrell, F. E., Rankin, J. S., Lee, K. L., Muhlbaier, L.
H., Oldham, H. N., Hlatky, M. A., Mark, D. B., Reves, J. G., &
Califf, R. M. (1987). The changing survival benefits of coronary
revascularization over time.

*Circ (Supplement V)*,*76*, 13–21.
Putter, H., Sasako, M., Hartgrink, H. H., van de Velde, C. J. H., &
van Houwelingen, J. C. (2005). Long-term survival with non-proportional
hazards: Results from the Dutch Gastric Cancer Trial.

*Stat Med*,*24*, 2807–2821.
Radchenko, P., & James, G. M. (2008). Variable inclusion and
shrinkage algorithms.

*J Am Stat Assoc*,*103*(483), 1304–1315.
solves problem caused by lasso using
the same penalty parameter for variable selection and shrinkage, which
causes lasso to have to keep too many variables in the model to avoid
overshrinking the remaining predictors; does not handle scaling issue
well

Ragland, D. R. (1992). Dichotomizing continuous outcome variables:
Dependence of the magnitude of association and statistical
power on the cutpoint.

*Epi*,*3*, 434–440. https://doi.org/10.1097/00001648-199209000-00009
Reilly, B. M., & Evans, A. T. (2006). Translating clinical research
into clinical practice: Impact of using prediction rules to
make decisions.

*Ann Int Med*,*144*, 201–209.
impact analysis; example of decision aid being ignored
or overruled making MD decisions worse; assumed utilities are constant
across subjects by concluding that directives have more impact than
predictions; Goldman-Cook clinical prediction rule in AMI

Reiter, J. P. (2007). Small-sample degrees of freedom for
multi-component significance tests with multiple imputation for missing
data.

*Biometrika*,*94*(2), 502–508. https://doi.org/10.1093/biomet/asm028
Riley, R. D., Snell, K. I. E., Ensor, J., Burke, D. L., Harrell, F. E.,
Moons, K. G. M., & Collins, G. S. (2019). Minimum sample size for
developing a multivariable prediction model: Part I –
Continuous outcomes.

*Statistics in Medicine*,*38*(7), 1262–1275. https://doi.org/10.1002/sim.7993
Riley, R. D., Snell, K. I., Ensor, J., Burke, D. L., Harrell, F. E.,
Moons, K. G., & Collins, G. S. (2019). Minimum sample size for
developing a multivariable prediction model: PART II -
binary and time-to-event outcomes.

*Statistics in Medicine*,*38*(7), 1276–1296. https://doi.org/10.1002/sim.7992
Roecker, E. B. (1991). Prediction error and its estimation for
subset-selected models.

*Technometrics*,*33*, 459–468.
Royston, P., Altman, D. G., & Sauerbrei, W. (2006). Dichotomizing
continuous predictors in multiple regression: A bad idea.

*Stat Med*,*25*, 127–141. https://doi.org/10.1002/sim.2331
destruction of statistical inference when cutpoints
are chosen using the response variable; varying effect estimates when
cutpoints change; difficult to interpret effects when dichotomizing; nice
plot showing effect of categorization; PBC data

Rubin, D., & Schenker, N. (1991). Multiple imputation in health-care
data bases: An overview and some applications.

*Stat Med*,*10*, 585–598.
Sarle, W. (1990). The VARCLUS Procedure. In

*SAS/STAT User’s Guide*(Fourth, Vol. 2, pp. 1641–1659). SAS Institute, Inc. http://support.sas.com/documentation/onlinedoc/stat
Sauerbrei, W., & Schumacher, M. (1992). A bootstrap resampling
procedure for model building: Application to the
Cox regression model.

*Stat Med*,*11*, 2093–2109.
Schafer, J. L., & Graham, J. W. (2002). Missing data:
Our view of the state of the art.

*Psych Meth*,*7*, 147–177.
excellent review and overview
of missing data and imputation; problems with MICE; less technical
description of 3 types of missing data

Schemper, M., & Heinze, G. (1997). Probability imputation revisited
for prognostic factor studies.

*Stat Med*,*16*, 73–80.
imputation of missing covariables using
logistic model; comparison with multiple imputation; analysis of prostate
cancer dataset

Schoenfeld, D. (1982). Partial residuals for the proportional hazards
regression model.

*Biometrika*,*69*, 239–241.
Schulgen, G., Lausen, B., Olsen, J., & Schumacher, M. (1994).
Outcome-oriented cutpoints in quantitative exposure.

*Am J Epi*,*140*, 172–184.
Selvin, E., Steffes, M. W., Zhu, H., Matsushita, K., Wagenknecht, L.,
Pankow, J., Coresh, J., & Brancati, F. L. (2010). Glycated
hemoglobin, diabetes, and cardiovascular risk in nondiabetic adults.

*NEJM*,*362*(9), 800–811. https://doi.org/10.1056/NEJMoa0908359
Senn, S. (2006). Change from baseline and analysis of covariance
revisited.

*Stat Med*,*25*, 4334–4344.
shows that, in a 2-arm study, it is not true
that ANCOVA requires the population means at baseline to be
identical; refutes some claims of Liang and Zeger (2000); problems with
counterfactuals; temporal additivity ("amounts to supposing that despite
the fact that groups are different at baseline they would show the same
evolution over time"); causal additivity; it is difficult to design trials
for which simple analysis of change scores is unbiased, ANCOVA is
biased, and a causal interpretation can be given; temporally and
logically, a "baseline cannot be a *response* to treatment", so baseline and
response cannot be modeled in an integrated framework as Laird and
Ware’s model has been used; "one should focus clearly on
“outcomes” as being the only values that can be influenced
by treatment and examine critically any schemes that assume that these
are linked in some rigid and deterministic view to
“baseline” values. An alternative tradition sees a baseline
as being merely one of a number of measurements capable of improving
predictions of outcomes and models it in this way."; "You cannot
establish necessary conditions for an estimator to be valid by
nominating a model and seeing what the model implies unless the model is
universally agreed to be impeccable. On the contrary it is appropriate
to start with the estimator and see what assumptions are implied by
valid conclusions."; this is in distinction to Liang and Zeger (2000)

Shao, J. (1993). Linear model selection by cross-validation.

*J Am Stat Assoc*,*88*, 486–494.
Simon, N., Friedman, J., Hastie, T., & Tibshirani, R. (2013). A
sparse-group lasso.

*J Comp Graph Stat*,*22*(2), 231–245. https://doi.org/10.1080/10618600.2012.681250
sparse effects both on a group and within group
levels; can also be considered special case of group lasso allowing
overlap between groups

Simpson, S. L., Edwards, L. J., Muller, K. E., Sen, P. K., & Styner,
M. A. (2010). A linear exponent AR(1) family of correlation
structures.

*Stat Med*,*29*, 1825–1838.
Smith, L. R., Harrell, F. E., & Muhlbaier, L. H. (1992). Problems
and potentials in modeling survival. In M. L. Grady & H. A. Schwartz
(Eds.),

*Medical Effectiveness Research Data Methods (Summary Report), AHCPR Pub. No. 92-0056*(pp. 151–159). US Dept. of Health and Human Services, Agency for Health Care Policy and Research. https://hbiostat.org/bib/papers/smi92pro.pdf
Spanos, A., Harrell, F. E., & Durack, D. T. (1989). Differential
diagnosis of acute meningitis: An analysis of the
predictive value of initial observations.

*JAMA*,*262*, 2700–2707. https://doi.org/10.1001/jama.262.19.2700
Spence, I., & Garrison, R. F. (1993). A remarkable scatterplot.

*Am Statistician*,*47*, 12–19.
Spiegelhalter, D. J. (1986). Probabilistic prediction in patient
management and clinical trials.

*Stat Med*,*5*, 421–433. https://doi.org/10.1002/sim.4780050506
z-test for calibration inaccuracy (implemented in
Stata, and R Hmisc package’s val.prob function)

Stan Development Team. (2020).

*Stan: A C++ Library for Probability and Sampling*. https://cran.r-project.org/package=rstan
Steyerberg, E. W. (2009).

*Clinical Prediction Models*. Springer.
Steyerberg, E. W. (2018). Validation in prediction research: The waste
by data-splitting.

*Journal of Clinical Epidemiology*,*0*(0). https://doi.org/10.1016/j.jclinepi.2018.07.010
Steyerberg, E. W., Eijkemans, M. J. C., Harrell, F. E., & Habbema,
J. D. F. (2000). Prognostic modelling with logistic regression analysis:
A comparison of selection and estimation methods in small
data sets.

*Stat Med*,*19*, 1059–1079.
Steyerberg, E. W., Eijkemans, M. J. C., Harrell, F. E., & Habbema,
J. D. F. (2001). Prognostic modeling with logistic regression analysis:
In search of a sensible strategy in small data sets.

*Med Decis Mak*,*21*, 45–56.
Stone, C. J. (1986). Comment: Generalized additive models.

*Stat Sci*,*1*, 312–314.
Stone, C. J., & Koo, C. Y. (1985). Additive splines in statistics.

*Proceedings of the Statistical Computing Section ASA*, 45–48.
Strauss, D., & Shavelle, R. (1998). An extended
Kaplan–Meier estimator and its applications.

*Stat Med*,*17*, 971–982.
estimation of
transition probabilities of an individual in state i at time x being in
state j at a subsequent time t; dead state and multiple live
states; prognostic chart; generalized uninformative censoring; multistate
Kaplan-Meier estimator

Suissa, S., & Blais, L. (1995). Binary regression with continuous
outcomes.

*Stat Med*,*14*, 247–255. https://doi.org/10.1002/sim.4780140303
Sullivan, T. R., Salter, A. B., Ryan, P., & Lee, K. J. (2015). Bias
and Precision of the “Multiple
Imputation, Then Deletion”
Method for Dealing With Missing Outcome Data.

*American Journal of Epidemiology*,*182*(6), 528–534. https://doi.org/10.1093/aje/kwv100
Disagrees with von Hippel approach of "impute then
delete" for Y

Sun, G.-W., Shook, T. L., & Kay, G. L. (1996). Inappropriate use of
bivariable analysis to screen risk factors for use in multivariable
analysis.

*J Clin Epi*,*49*, 907–916.
Therneau, T. M., Grambsch, P. M., & Fleming, T. R. (1990).
Martingale-based residuals for survival models.

*Biometrika*,*77*, 216–218.
Tibshirani, R. (1988). Estimating transformations for regression via
additivity and variance stabilization.

*J Am Stat Assoc*,*83*, 394–405.
Tibshirani, R. (1996). Regression shrinkage and selection via the lasso.

*J Roy Stat Soc B*,*58*, 267–288.
Tjur, T. (2009). Coefficients of determination in logistic regression
models—A new proposal: The coefficient of
discrimination.

*Am Statistician*,*63*(4), 366–372.
Twisk, J., de Boer, M., de Vente, W., & Heymans, M. (2013). Multiple
imputation of missing values was not necessary before performing a
longitudinal mixed-model analysis.

*J Clin Epi*,*66*(9), 1022–1028. https://doi.org/10.1016/j.jclinepi.2013.03.017
Vach, W., & Blettner, M. (1998). Missing Data in
Epidemiologic Studies. In

*Ency of Biostatistics*(pp. 2641–2654). Wiley.
van Buuren, S., Brand, J. P. L., Groothuis-Oudshoorn, C. G. M., &
Rubin, D. B. (2006). Fully conditional specification in multivariate
imputation.

*J Stat Computation Sim*,*76*(12), 1049–1064.
justification for chained equations
alternative to full multivariate modeling

Van Calster, B., Nieboer, D., Vergouwe, Y., De Cock, B., Pencina, M. J.,
& Steyerberg, E. W. (2016). A calibration hierarchy for risk models
was defined: From utopia to empirical data.

*J Clin Epi*,*74*, 167–176. https://doi.org/10.1016/j.jclinepi.2015.12.005
van der Heijden, G. J. M. G., Donders, A. R. T., Stijnen, T., & Moons, K. G.
M. (2006). Imputation of missing values is superior to complete case
analysis and the missing-indicator method in multivariable diagnostic
research: A clinical example.

*J Clin Epi*,*59*, 1102–1109. https://doi.org/10.1016/j.jclinepi.2006.01.015
Invalidity of adding a new category or an indicator
variable for missing values even with MCAR

van der Ploeg, T., Austin, P. C., & Steyerberg, E. W. (2014). Modern
modelling techniques are data hungry: A simulation study for predicting
dichotomous endpoints.

*BMC Medical Research Methodology*,*14*(1), 137+. https://doi.org/10.1186/1471-2288-14-137
Would be better to use proper accuracy scores in the
assessment. Too much emphasis on optimism as opposed to final
discrimination measure. But much good practical information. Recursive
partitioning fared poorly.

van Houwelingen, J. C., & le Cessie, S. (1990). Predictive value of
statistical models.

*Stat Med*,*9*, 1303–1325.
Venables, W. N., & Ripley, B. D. (2003).

*Modern Applied Statistics with S*(Fourth). Springer-Verlag.
Verbeke, G., & Molenberghs, G. (2000).

*Linear Mixed Models for Longitudinal Data*. Springer.
Verweij, P. J., & van Houwelingen, H. C. (1994). Penalized
likelihood in Cox regression.

*Stat Med*,*13*, 2427–2436.
Vickers, A. J. (2008). Decision analysis for the evaluation of
diagnostic tests, prediction models, and molecular markers.

*Am Statistician*,*62*(4), 314–320.
limitations of accuracy metrics; incorporating clinical
consequences; nice example of calculation of expected outcome; drawbacks
of conventional decision analysis, especially because of the difficulty
of eliciting the expected harm of a missed diagnosis; use of a threshold
on the probability of disease for taking some action; decision curve; has
other good references to decision analysis

Vink, G., Frank, L. E., Pannekoek, J., & van Buuren, S. (2014).
Predictive mean matching imputation of semicontinuous variables.

*Statistica Neerlandica*,*68*(1), 61–90. https://doi.org/10.1111/stan.12023
Vittinghoff, E., & McCulloch, C. E. (2006). Relaxing the rule of ten
events per variable in logistic and Cox regression.

*Am J Epi*,*165*, 710–718.
the authors may
not have been quite stringent enough in their assessment of adequacy of
predictions; letter to the editor submitted

von Hippel, P. T. (2007). Regression with missing Ys:
An improved strategy for analyzing multiple imputed data.

*Soc Meth*,*37*(1), 83–117.
von Hippel, P. T. (2016).

*The number of imputations should increase quadratically with the fraction of missing information*. http://arxiv.org/abs/1608.05406
Wainer, H. (2006). Finding what is not there through the unfortunate
binning of results: The Mendel effect.

*Chance*,*19*(1), 49–56.
can find bins that yield
either positive or negative association; especially pertinent when
effects are small; "With four parameters, I can fit an elephant; with
five, I can make it wiggle its trunk." - John von Neumann

Walker, S. H., & Duncan, D. B. (1967). Estimation of the probability
of an event as a function of several independent variables.

*Biometrika*,*54*, 167–178.
Wang, H., & Leng, C. (2007). Unified LASSO estimation
by least squares approximation.

*J Am Stat Assoc*,*102*, 1039–1048. https://doi.org/10.1198/016214507000000509
Wang, S., Nan, B., Zhou, N., & Zhu, J. (2009). Hierarchically
penalized Cox regression with grouped variables.

*Biometrika*,*96*(2), 307–322.
Wax, Y. (1992). Collinearity diagnosis for a relative risk regression
analysis: An application to assessment of diet-cancer
relationship in epidemiological studies.

*Stat Med*,*11*, 1273–1287.
Wenger, T. L., Harrell, F. E., Brown, K. K., Lederman, S., &
Strauss, H. C. (1984). Ventricular fibrillation following canine
coronary reperfusion: Different outcomes with pentobarbital
and α-chloralose.

*Can J Phys Pharm*,*62*, 224–228.
White, I. R., & Carlin, J. B. (2010). Bias and efficiency of
multiple imputation compared with complete-case analysis for missing
covariate values.

*Stat Med*,*29*, 2920–2931.
White, I. R., & Royston, P. (2009). Imputing missing covariate
values for the Cox model.

*Stat Med*,*28*, 1982–1998.
approach to using event time and
censoring indicator as predictors in the imputation model for missing
baseline covariates; recommended an approximation using the event
indicator and the cumulative hazard transformation of time, without
their interaction

White, I. R., Royston, P., & Wood, A. M. (2011). Multiple imputation
using chained equations: Issues and guidance for practice.

*Stat Med*,*30*(4), 377–399.
practical guidance for the use of multiple imputation
using chained equations; MICE; imputation models for different types of
target variables; PMM choosing at random from among a few closest
matches; choosing number of multiple imputations by a reproducibility
argument, suggesting 100f imputations when f is the fraction of cases
that are incomplete

Whitehead, J. (1993). Sample size calculations for ordered categorical
data.

*Stat Med*,*12*, 2257–2271.
Wiegand, R. E. (2010). Performance of using multiple stepwise algorithms
for variable selection.

*Stat Med*,*29*, 1647–1659.
fruitless to try different stepwise methods and look
for agreement; the methods will agree on the wrong model

Witten, D. M., & Tibshirani, R. (2008). Testing significance of
features by lassoed principal components.

*Ann Appl Stat*,*2*(3), 986–1012.
reduction in false
discovery rates over using a vector of t-statistics; borrowing strength
across genes; "one would not expect a single gene to be associated with
the outcome, since, in practice, many genes work together to effect a
particular phenotype. LPC effectively down-weights individual genes that
are associated with the outcome but that do not share an expression
pattern with a larger group of genes, and instead favors large groups of
genes that appear to be differentially-expressed."; regress principal
components on outcome; sparse principal components

Wood, S. N. (2006).

*Generalized Additive Models: An Introduction with R*. Chapman & Hall/CRC.
Wu, C. F. J. (1986). Jackknife, bootstrap and other resampling methods
in regression analysis.

*Ann Stat*,*14*(4), 1261–1350.
Xiong, S. (2010). Some notes on the nonnegative garrote.

*Technometrics*,*52*(3), 349–361.
"... to select tuning parameters, it may be
unnecessary to optimize a model selection criterion repeatedly"; natural
selection of penalty function

Ye, J. (1998). On measuring and correcting the effects of data mining
and model selection.

*J Am Stat Assoc*,*93*, 120–131.
Young, F. W., Takane, Y., & de Leeuw, J. (1978). The principal
components of mixed measurement level multivariate data: An
alternating least squares method with optimal scaling features.

*Psychometrika*,*43*, 279–281.
Yucel, R. M., & Zaslavsky, A. M. (2008). Using calibration to
improve rounding in imputation.

*Am Statistician*,*62*(2), 125–129.
using rounding to impute
binary variables using techniques for continuous data; uses the method to
solve for the cutpoint for a continuous estimate to be converted into a
binary value; method should be useful in more general situations; idea is
to duplicate the entire dataset and in the second half of the new
datasets to set all non-missing values of the target variable to
missing; multiply impute these now-missing values and compare them to the
actual values

Zhang, H. H., & Lu, W. (2007). Adaptive lasso for Cox’s
proportional hazards model.

*Biometrika*,*94*, 691–703.
penalty function has ratios against
original MLE; scale-free lasso

Zhang, M., Yu, Y., Wang, S., Salvatore, M., Fritsche, L. G., He, Z.,
& Mukherjee, B. (2020). Interaction analysis under misspecification
of main effects: Some common mistakes and simple solutions.

*Statistics in Medicine*,*n/a*(n/a). https://doi.org/10.1002/sim.8505
Zheng, X., & Loh, W.-L. (1995). Consistent variable selection in
linear models.

*J Am Stat Assoc*,*90*, 151–156.
Zou, H., Hastie, T., & Tibshirani, R. (2006). Sparse principal
component analysis.

*J Comp Graph Stat*,*15*, 265–286.
principal components analysis that
shrinks some loadings to zero

Zou, H., & Hastie, T. (2005). Regularization and variable selection
via the elastic net.

*J Roy Stat Soc B*,*67*(2), 301–320.