References

Allison, P. D. (2001). Missing Data. Sage.

Altman, D. G. (1991). Categorising continuous covariates (letter to the editor). Brit J Cancer, 64, 975.

Altman, D. G. (1998). Suboptimal analysis using “optimal” cutpoints. Brit J Cancer, 78, 556–557.

Altman, D. G., & Andersen, P. K. (1989). Bootstrap investigation of the stability of a Cox regression model. Stat Med, 8, 771–783.

Altman, D. G., Lausen, B., Sauerbrei, W., & Schumacher, M. (1994). Dangers of using “optimal” cutpoints in the evaluation of prognostic factors. J Nat Cancer Inst, 86, 829–835.

Andrews, D. F., & Herzberg, A. M. (1985). Data. Springer-Verlag.

Arjas, E. (1988). A graphical method for assessing goodness of fit in Cox’s proportional hazards model. J Am Stat Assoc, 83, 204–212.

Armstrong, B. G., & Sloan, M. (1989). Ordinal regression models for epidemiologic data. Am J Epi, 129, 191–204.

Atkinson, A. C. (1980). A note on the generalized information criterion for choice of a model. Biometrika, 67, 413–418.

Austin, P. C. (2008). Bootstrap model selection had similar performance for selecting authentic and noise variables compared to backward variable elimination: A simulation study. J Clin Epi, 61, 1009–1017.

"in general, a bootstrap model selection method had comparable performance to conventional backward variable elimination for identifying the true regression model. In most settings, both methods performed poorly at correctly identifying the correct regression model."

Austin, P. C., & Steyerberg, E. W. (2019). The Integrated Calibration Index (ICI) and related metrics for quantifying the calibration of logistic regression models. Statistics in Medicine, 38(21), 4051–4065. https://doi.org/10.1002/sim.8281

Austin, P. C., Tu, J. V., Daly, P. A., & Alter, D. A. (2005). Tutorial in Biostatistics: The use of quantile regression in health care research: A case study examining gender differences in the timeliness of thrombolytic therapy. Stat Med, 24, 791–816.

Austin, P. C., Tu, J. V., & Lee, D. S. (2010). Logistic regression had superior performance compared with regression trees for predicting in-hospital mortality in patients hospitalized with heart failure. J Clin Epi, 63, 1145–1155.

ROC areas for logistic models varied from 0.747 to 0.775 whereas they varied from 0.620-0.651 for recursive partitioning;repeated data simulation showed large variation in tree structure

Awounvo, S., Kieser, M., & Feißt, M. (2025). Combining multiple imputation with internal model validation in clinical prediction modeling: A systematic methodological review. Journal of Clinical Epidemiology, 111916. https://doi.org/10.1016/j.jclinepi.2025.111916

Barnes, S. A., Lindborg, S. R., & Seaman, J. W. (2006). Multiple imputation techniques in small sample clinical trials. Stat Med, 25, 233–245.

bad performance of LOCF including high bias and poor confidence interval coverage;simulation setup;longitudinal data;serial data;RCT;dropout;assumed missing at random (MAR);approximate Bayesian bootstrap;Bayesian least squares;missing data;nice background summary;new completion score method based on fitting a Poisson model for the number of completed clinic visits and using donors and approximate Bayesian bootstrap

Barzi, F., & Woodward, M. (2004). Imputations of missing values in practice: Results from imputations of serum cholesterol in 28 cohort studies. Am J Epi, 160, 34–45.

excellent review article for multiple imputation;list of variables to include in imputation model;"Imputation models should ideally include all covariates that are related to the missing data mechanism, have distributions that differ between the respondents and nonrespondents, are associated with cholesterol, and will be included in the analyses of the final complete data sets";detailed comparison of results (cholesterol effect and confidence limits) for various imputation methods

Belcher, H. (1992). The concept of residual confounding in regression models and some applications. Stat Med, 11, 1747–1758.

Belsley, D. A. (1991). Conditioning Diagnostics: Collinearity and Weak Data in Regression. Wiley.

Belsley, D. A., Kuh, E., & Welsch, R. E. (1980). Regression Diagnostics: Identifying Influential Data and Sources of Collinearity. Wiley.

Benedetti, J. K., Liu, P.-Y., Sather, H. N., Seinfeld, J., & Epton, M. A. (1982). Effective sample size for tests of censored survival data. Biometrika, 69(2), 343–349. https://doi.org/10.1093/biomet/69.2.343

Bennette, C., & Vickers, A. (2012). Against quantiles: Categorization of continuous variables in epidemiologic research, and its discontents. BMC Med Res Methodol, 12(1), 21+. https://doi.org/10.1186/1471-2288-12-21

terrific graphical examples; nice display of outcome heterogeneity within quantile groups of PSA

Berhane, K., Hauptmann, M., & Langholz, B. (2008). Using tensor product splines in modeling exposure–time–response relationships: Application to the Colorado Plateau Uranium Miners cohort. Stat Med, 27, 5484–5496.

discusses taking product of all univariate spline basis functions

Bernal, J. L., Cummins, S., & Gasparrini, A. (2017). Interrupted time series regression for the evaluation of public health interventions: A tutorial. International Journal of Epidemiology, 46(1), 348–355. https://doi.org/10.1093/ije/dyw098

Berridge, D. M., & Whitehead, J. (1991). Analysis of failure time data with ordinal categories of response. Stat Med, 10, 1703–1710. https://doi.org/10.1002/sim.4780101108

Blettner, M., & Sauerbrei, W. (1993). Influence of model-building strategies on the results of a case-control study. Stat Med, 12, 1325–1338.

Bondarenko, I., & Raghunathan, T. (2016). Graphical and numerical diagnostic tools to assess suitability of multiple imputations and imputation models. Stat Med, 35(17), 3007–3020. https://doi.org/10.1002/sim.6926

Booth, J. G., & Sarkar, S. (1998). Monte Carlo approximation of bootstrap variances. Am Statistician, 52, 354–357.

number of resamples required to estimate variances, quantiles; 800 resamples may be required to guarantee with 0.95 confidence that the relative error of a variance estimate is 0.1;Efron’s original suggestions for as low as 25 resamples were based on comparing stability of bootstrap estimates to sampling error, but small relative effects can significantly change P-values;number of bootstrap resamples

Bordley, R. (2007). Statistical decisionmaking without math. Chance, 20(3), 39–44.

Breiman, L. (1992). The little bootstrap and other methods for dimensionality selection in regression: X-fixed prediction error. J Am Stat Assoc, 87, 738–754.

Breiman, L., & Friedman, J. H. (1985). Estimating optimal transformations for multiple regression and correlation (with discussion). J Am Stat Assoc, 80, 580–619.

Breiman, L., Friedman, J. H., Olshen, R. A., & Stone, C. J. (1984). Classification and Regression Trees. Wadsworth and Brooks/Cole.

Breslow, N. E., Edler, L., & Berger, J. (1984). A two-sample censored-data rank test for acceleration. Biometrics, 40, 1049–1062.

Briggs, W. M., & Zaretzki, R. (2008). The skill plot: A graphical technique for evaluating continuous diagnostic tests (with discussion). Biometrics, 64, 250–261.

"statistics such as the AUC are not especially relevant to someone who must make a decision about a particular x_c. ... ROC curves lack or obscure several quantities that are necessary for evaluating the operational effectiveness of diagnostic tests. ... ROC curves were first used to check how radio receivers (like radar receivers) operated over a range of frequencies. ... This is not how most ROC curves are used now, particularly in medicine. The receiver of a diagnostic measurement ... wants to make a decision based on some x_c, and is not especially interested in how well he would have done had he used some different cutoff."; in the discussion David Hand states "when integrating to yield the overall AUC measure, it is necessary to decide what weight to give each value in the integration. The AUC implicitly does this using a weighting derived empirically from the data. This is nonsensical. The relative importance of misclassifying a case as a noncase, compared to the reverse, cannot come from the data itself. It must come externally, from considerations of the severity one attaches to the different kinds of misclassifications."; see Lin, Kvam, Lu Stat in Med 28:798-813;2009

Brownstone, D. (1988). Regression strategies. Proceedings of the 20th Symposium on the Interface Between Computer Science and Statistics, 74–79.

Buettner, P., Garbe, C., & Guggenmoos-Holzmann, I. (1997). Problems in defining cutoff points of continuous prognostic factors: Example of tumor thickness in primary cutaneous melanoma. J Clin Epi, 50, 1201–1210.

choice of cut point depends on marginal distribution of predictor

Byar, D. P., & Green, S. B. (1980). The choice of treatment for cancer patients based on covariate information: Application to prostate cancer. Bulletin Cancer, Paris, 67, 477–488.

Califf, R. M., Harrell, F. E., Lee, K. L., Rankin, J. S., & Others. (1989). The evolution of medical and surgical therapy for coronary artery disease. JAMA, 261, 2077–2086.

Carpenter, B., Gelman, A., Hoffman, M., Lee, D., Goodrich, B., Betancourt, M., Brubaker, M., Guo, J., Li, P., & Riddell, A. (2017). Stan: A Probabilistic Programming Language. J Stat Software, 76(1), 1–32. https://doi.org/10.18637/jss.v076.i01

Carpenter, J. R., & Smuk, M. (2021). Missing data: A statistical framework for practice. Biometrical Journal, 63(5), 915–947. https://doi.org/10.1002/bimj.202000196

Carpenter, J., & Bithell, J. (2000). Bootstrap confidence intervals: When, which, what? A practical guide for medical statisticians. Stat Med, 19, 1141–1164.

unconditional nonparametric bootstrap becomes more equivalent to conditional bootstrap based on regression residuals when full models are fitted

Centers for Disease Control and Prevention CDC. National Center for Health Statistics NCHS. (2010). National Health and Nutrition Examination Survey. http://www.cdc.gov/nchs/nhanes/nhanes2009-2010/nhanes09_10.htm

Chambers, J. M., & Hastie, T. J. (Eds.). (1992). Statistical Models in S. Wadsworth and Brooks/Cole.

Chan, K. W., & Meng, X.-L. (2022). Multiple improvements of multiple imputation likelihood ratio tests. Statistica Sinica, 32, 1489–1514. https://doi.org/10.5705/ss.202019.0314

Chatfield, C. (1991). Avoiding statistical pitfalls (with discussion). Stat Sci, 6, 240–268.

Chatfield, C. (1995). Model uncertainty, data mining and statistical inference (with discussion). J Roy Stat Soc A, 158, 419–466.

bias by selecting model because it fits the data well; bias in standard errors;P. 420: ... need for a better balance in the literature and in statistical teaching between techniques and problem solving strategies. P. 421: It is “well known” to be “logically unsound and practically misleading” (Zhang, 1992) to make inferences as if a model is known to be true when it has, in fact, been selected from the same data to be used for estimation purposes. However, although statisticians may admit this privately (Breiman (1992) calls it a “quiet scandal”), they (we) continue to ignore the difficulties because it is not clear what else could or should be done. P. 421: Estimation errors for regression coefficients are usually smaller than errors from failing to take into account model specification. P. 422: Statisticians must stop pretending that model uncertainty does not exist and begin to find ways of coping with it. P. 426: It is indeed strange that we often admit model uncertainty by searching for a best model but then ignore this uncertainty by making inferences and predictions as if certain that the best fitting model is actually true. P. 427: The analyst needs to assess the model selection process and not just the best fitting model. P. 432: The use of subset selection methods is well known to introduce alarming biases. P. 433: ... the AIC can be highly biased in data-driven model selection situations. P. 434: Prediction intervals will generally be too narrow. In the discussion, Jamal R. M. Ameen states that a model should be (a) satisfactory in performance relative to the stated objective, (b) logically sound, (c) representative, (d) questionable and subject to on-line interrogation, (e) able to accommodate external or expert information and (f) able to convey information.

Chatterjee, S., & Hadi, A. S. (2012). Regression Analysis by Example (Fifth). Wiley.

Chavent, M., Kuentz-Simonet, V., Liquet, B., & Saracco, J. (2012). ClustOfVar: An R package for the clustering of variables. J Stat Software, 50(13), 1–16.

Ciampi, A., Thiffault, J., Nakache, J. P., & Asselain, B. (1986). Stratification by stepwise regression, correspondence analysis and recursive partition. Comp Stat Data Analysis, 1986, 185–204.

Cleveland, W. S. (1979). Robust locally weighted regression and smoothing scatterplots. J Am Stat Assoc, 74, 829–836.

Collett, D. (2002). Modelling Binary Data (Second). Chapman and Hall.

Collins, G. S., Ogundimu, E. O., & Altman, D. G. (2016). Sample size considerations for the external validation of a multivariable prognostic model: A resampling study. Stat Med, 35(2), 214–226. https://doi.org/10.1002/sim.6787

Collins, G. S., Ogundimu, E. O., Cook, J. A., Manach, Y. L., & Altman, D. G. (2016). Quantifying the impact of different approaches for handling continuous predictors on the performance of a prognostic model. Stat Med, 35(23), 4124–4135. https://doi.org/10.1002/sim.6986

used rms package hazard regression method (hare) for survival model calibration

Cook, E. F., & Goldman, L. (1988). Asymmetric stratification: An outline for an efficient method for controlling confounding in cohort studies. Am J Epi, 127, 626–639.

Cook, N. R. (2007). Use and misuse of the receiver operating characteristic curve in risk prediction. Circ, 115, 928–935.

example of large change in predicted risk in cardiovascular disease with tiny change in ROC area;possible limits to c index when calibration is perfect;importance of calibration accuracy and changes in predicted risk when new variables are added

Copas, J. B. (1983). Regression, prediction and shrinkage (with discussion). J Roy Stat Soc B, 45, 311–354.

Copas, J. B. (1987). Cross-validation shrinkage of regression predictors. J Roy Stat Soc B, 49, 175–183.

Cox, C., Chu, H., Schneider, M. F., & Muñoz, A. (2007). Parametric survival analysis and taxonomy of hazard functions for the generalized gamma distribution. Stat Med, 26, 4352–4374.

nice tutorial;GG includes bathtub-shape hazard function; failed to reference Herndon

Cox, D. R. (1972). Regression models and life-tables (with discussion). J Roy Stat Soc B, 34, 187–220.

Crawford, S. L., Tennstedt, S. L., & McKinlay, J. B. (1995). A comparison of analytic methods for non-random missingness of outcome data. J Clin Epi, 48, 209–219.

Crichton, N. J., & Hinde, J. P. (1989). Correspondence analysis as a screening method for indicants for clinical diagnosis. Stat Med, 8, 1351–1362.

D’Agostino, R. B., Belanger, A. J., Markson, E. W., Kelly-Hayes, M., & Wolf, P. A. (1995). Development of health risk appraisal functions in the presence of multiple indicators: The Framingham Study nursing home institutionalization model. Stat Med, 14, 1757–1770.

Davis, C. S. (2002). Statistical Methods for the Analysis of Repeated Measurements. Springer.

Derksen, S., & Keselman, H. J. (1992). Backward, forward and stepwise automated subset selection algorithms: Frequency of obtaining authentic and noise variables. British J Math Stat Psych, 45, 265–282.

Devlin, T. F., & Weeks, B. J. (1986). Spline functions for logistic regression modeling. Proceedings of the Eleventh Annual SAS Users Group International Conference, 646–651.

Diggle, P. J., Heagerty, P., Liang, K.-Y., & Zeger, S. L. (2002). Analysis of Longitudinal Data (second). Oxford University Press.

Donders, A. R. T., van der Heijden, G. J. M. G., Stijnen, T., & Moons, K. G. M. (2006). Review: A gentle introduction to imputation of missing values. J Clin Epi, 59, 1087–1091.

simple demonstration of failure of the add new category method (indicator variable)

Donohue, M. C., Langford, O., Insel, P. S., van Dyck, C. H., Petersen, R. C., Craft, S., Sethuraman, G., Raman, R., Aisen, P. S., & for the Alzheimer’s Disease Neuroimaging Initiative. (n.d.). Natural cubic splines for the analysis of Alzheimer’s clinical trials. Pharmaceutical Statistics. https://doi.org/10.1002/pst.2285

Duan, N. (1983). Smearing estimate: A nonparametric retransformation method. J Am Stat Assoc, 78, 605–610.

Duc, A. N., & Wolbers, M. (2017). Smooth semi-nonparametric (SNP) estimation of the cumulative incidence function. Stat Med, 36(18). https://doi.org/10.1002/sim.7331

Durrleman, S., & Simon, R. (1989). Flexible regression models with cubic splines. Stat Med, 8, 551–561.

Efron, B. (1983). Estimating the error rate of a prediction rule: Improvement on cross-validation. J Am Stat Assoc, 78, 316–331.

suggested need at least 200 models to get an average that is adequate, i.e., 20 repeats of 10-fold cv

Efron, B., & Narasimhan, B. (2020). The Automatic Construction of Bootstrap Confidence Intervals. Journal of Computational and Graphical Statistics, 0(0), 1–12. https://doi.org/10.1080/10618600.2020.1714633

Efron, B., & Tibshirani, R. (1993). An Introduction to the Bootstrap. Chapman and Hall.

Efron, B., & Tibshirani, R. (1997). Improvements on cross-validation: The .632+ bootstrap method. J Am Stat Assoc, 92, 548–560.

Efthimiou, O., Seo, M., Chalkou, K., Debray, T., Egger, M., & Salanti, G. (2024). Developing clinical prediction models: A step-by-step guide. BMJ, e078276. https://doi.org/10.1136/bmj-2023-078276

Erler, N. S., Rizopoulos, D., Rosmalen, J., Jaddoe, V. W. V., Franco, O. H., & Lesaffre, E. M. E. H. (2016). Dealing with missing covariates in epidemiologic studies: A comparison between multiple imputation and a full Bayesian approach. Stat Med, 35(17), 2955–2974. https://doi.org/10.1002/sim.6944

Fan, J., & Levine, R. A. (2007). To amnio or not to amnio: That is the decision for Bayes. Chance, 20(3), 26–32.

Faraggi, D., & Simon, R. (1996). A simulation study of cross-validation for selecting an optimal cutpoint in univariate survival analysis. Stat Med, 15, 2203–2213.

bias in point estimate of effect from selecting cutpoints based on P-value; loss of information from dichotomizing continuous predictors

Faraway, J. J. (1992). The cost of data analysis. J Comp Graph Stat, 1, 213–229.

Fedorov, V., Mannino, F., & Zhang, R. (2009). Consequences of dichotomization. Pharm Stat, 8, 50–61. https://doi.org/10.1002/pst.331

optimal cutpoint depends on unknown parameters;should only entertain dichotomization when "estimating a value of the cumulative distribution and when the assumed model is very different from the true model";nice graphics

Fienberg, S. E. (2007). The Analysis of Cross-Classified Categorical Data (Second). Springer.

Filzmoser, P., Fritz, H., & Kalcher, K. (2012). pcaPP: Robust PCA by Projection Pursuit. http://CRAN.R-project.org/package=pcaPP

Freedman, D., Navidi, W., & Peters, S. (1988). On the Impact of Variable Selection in Fitting Regression Equations (pp. 1–16). Springer-Verlag.

Friedman, J. H. (1984). A variable span smoother (Technical Report 5). Laboratory for Computational Statistics, Department of Statistics, Stanford University.

Gail, M. H., & Pfeiffer, R. M. (2005). On criteria for evaluating models of absolute risk. Biostatistics, 6(2), 227–239.

Gardiner, J. C., Luo, Z., & Roman, L. A. (2009). Fixed effects, random effects and GEE: What are the differences? Stat Med, 28, 221–239.

nice comparison of models; econometrics; different use of the term "fixed effects model"

Giannoni, A., Baruah, R., Leong, T., Rehman, M. B., Pastormerlo, L. E., Harrell, F. E., Coats, A. J., & Francis, D. P. (2014). Do optimal prognostic thresholds in continuous physiological variables really exist? Analysis of origin of apparent thresholds, with systematic review for peak oxygen consumption, ejection fraction and BNP. PLoS ONE, 9(1). https://doi.org/10.1371/journal.pone.0081699

Giudice, J. H., Fieberg, J. R., & Lenarz, M. S. (2011). Spending degrees of freedom in a poor economy: A case study of building a sightability model for moose in northeastern Minnesota. J Wildlife Manage. https://doi.org/10.1002/jwmg.213

Gneiting, T., & Raftery, A. E. (2007). Strictly proper scoring rules, prediction, and estimation. J Am Stat Assoc, 102, 359–378.

wonderful review article except missing references from Scandinavian and German medical decision making literature

Goldstein, H. (1989). Restricted unbiased iterative generalized least-squares estimation. Biometrika, 76(3), 622–623.

derivation of REML

Govindarajulu, U. S., Spiegelman, D., Thurston, S. W., Ganguli, B., & Eisen, E. A. (2007). Comparing smoothing techniques in Cox models for exposure-response relationships. Stat Med, 26, 3735–3752.

authors wrote a SAS macro for restricted cubic splines even though such a macro has existed since 1984; would have gotten more useful results had simulation been used so would know the true regression shape;measure of agreement of two estimated curves by computing the area between them, standardized by average of areas under the two;penalized spline and rcs were closer to each other than to fractional polynomials

Grambsch, P. M., & O’Brien, P. C. (1991). The effects of transformations and preliminary tests for non-linearity in regression. Stat Med, 10, 697–709.

Grambsch, P., & Therneau, T. (1994). Proportional hazards tests and diagnostics based on weighted residuals. Biometrika, 81, 515–526.

Gray, R. J. (1992). Flexible methods for analyzing survival data using splines, with applications to breast cancer prognosis. J Am Stat Assoc, 87, 942–951.

Gray, R. J. (1994). Spline-based tests in survival analysis. Biometrics, 50, 640–652.

Greenacre, M. J. (1988). Correspondence analysis of multivariate categorical data by weighted least-squares. Biometrika, 75, 457–467.

Greenland, S. (2000). When should epidemiologic regressions use random coefficients? Biometrics, 56, 915–921. https://doi.org/10.1111/j.0006-341X.2000.00915.x

use of statistics in epidemiology is largely primitive;stepwise variable selection on confounders leaves important confounders uncontrolled;composition matrix;example with far too many significant predictors with many regression coefficients absurdly inflated when overfit;lack of evidence for dietary effects mediated through constituents;shrinkage instead of variable selection;larger effect on confidence interval width than on point estimates with variable selection;uncertainty about variance of random effects is just uncertainty about prior opinion;estimation of variance is pointless;instead the analysis should be repeated using different values;"if one feels compelled to estimate $\tau^{2}$, I would recommend giving it a proper prior concentrated around contextually reasonable values";claim about ordinary MLE being unbiased is misleading because it assumes the model is correct and is the only model entertained;shrinkage towards compositional model;"models need to be complex to capture uncertainty about the relations...an honest uncertainty assessment requires parameters for all effects that we know may be present. This advice is implicit in an antiparsimony principle often attributed to L. J. Savage ’All models should be as big as an elephant (see Draper, 1995)’". See also gus06per.

Guo, J., James, G., Levina, E., Michailidis, G., & Zhu, J. (2011). Principal component analysis with sparse fused loadings. J Comp Graph Stat, 19(4), 930–946.

incorporates blocking structure in the variables;selects different variables for different components;encourages loadings of highly correlated variables to have same magnitude, which aids in interpretation

Gurka, M. J., Edwards, L. J., & Muller, K. E. (2011). Avoiding bias in mixed model inference for fixed effects. Stat Med, 30(22), 2696–2707. https://doi.org/10.1002/sim.4293

Hand, D., & Crowder, M. (1996). Practical Longitudinal Data Analysis. Chapman & Hall.

Harel, O., & Zhou, X.-H. (2007). Multiple imputation: Review of theory, implementation and software. Stat Med, 26, 3057–3077.

failed to review aregImpute;excellent overview;ugly S code;nice description of different statistical tests including combining likelihood ratio tests (which appears to be complex, requiring an out-of-sample log likelihood computation);congeniality of imputation and analysis models;Bayesian approximation or approximate Bayesian bootstrap overview;"Although missing at random (MAR) is a non-testable assumption, it has been pointed out in the literature that we can get very close to MAR if we include enough variables in the imputation models ... it would be preferred if the missing data modelling was done by the data constructors and not by the users... MI yields valid inferences not only in congenial settings, but also in certain uncongenial ones as well—where the imputer’s model (1) is more general (i.e. makes fewer assumptions) than the complete-data estimation method, or when the imputer’s model makes additional assumptions that are well-founded."

Harrell, F. E. (1986). The LOGIST Procedure. In SUGI Supplemental Library Users Guide (Version 5, pp. 269–293). SAS Institute, Inc.

Harrell, F. E. (2015). Regression Modeling Strategies, with Applications to Linear Models, Logistic and Ordinal Regression, and Survival Analysis (Second edition). Springer. https://doi.org/10.1007/978-3-319-19425-7

Harrell, F. E., Lee, K. L., Califf, R. M., Pryor, D. B., & Rosati, R. A. (1984). Regression modeling strategies for improved prognostic prediction. Stat Med, 3, 143–152.

Harrell, F. E., Lee, K. L., & Mark, D. B. (1996). Multivariable prognostic models: Issues in developing models, evaluating assumptions and adequacy, and measuring and reducing errors. Stat Med, 15, 361–387.

Harrell, F. E., Lee, K. L., Matchar, D. B., & Reichert, T. A. (1985). Regression models for prognostic prediction: Advantages, problems, and suggested solutions. Ca Trt Rep, 69, 1071–1077.

Harrell, F. E., Lee, K. L., & Pollock, B. G. (1988). Regression models in clinical studies: Determining relationships between predictors and response. J Nat Cancer Inst, 80, 1198–1202.

Harrell, F. E., Margolis, P. A., Gove, S., Mason, K. E., Mulholland, E. K., Lehmann, D., Muhe, L., Gatchalian, S., & Eichenwald, H. F. (1998). Development of a clinical prediction model for an ordinal outcome: The World Health Organization ARI Multicentre Study of clinical signs and etiologic agents of pneumonia, sepsis, and meningitis in young infants. Stat Med, 17, 909–944. http://onlinelibrary.wiley.com/doi/10.1002/(SICI)1097-0258(19980430)17:8%3C909::AID-SIM753%3E3.0.CO;2-O/abstract

Hastie, T., & Tibshirani, R. (1990). Generalized Additive Models. Chapman and Hall.

Hastie, T., Tibshirani, R., & Friedman, J. H. (2008). The Elements of Statistical Learning (second ed.). Springer.

Hazewinkel, A., Tilling, K., Wade, K. H., & Palmer, T. (2023). Trial arm outcome variance difference after dropout as an indicator of missing‐not‐at‐random bias in randomized controlled trials. Biometrical J, 2200116. https://doi.org/10.1002/bimj.202200116

Indirect assessment of MNAR by comparing variances of continuous outcome variable. Not sure if authors discussed transformation assumptions about Y.

He, Y., & Zaslavsky, A. M. (2012). Diagnosing imputation models by applying target analyses to posterior replicates of completed data. Stat Med, 31(1), 1–18. https://doi.org/10.1002/sim.4413

Herndon, J. E., & Harrell, F. E. (1990). The restricted cubic spline hazard model. Comm Stat Th Meth, 19, 639–663.

Herndon, J. E., & Harrell, F. E. (1995). The restricted cubic spline as baseline hazard in the proportional hazards model with step function time-dependent covariables. Stat Med, 14, 2119–2129.

Hilsenbeck, S. G., & Clark, G. M. (1996). Practical p-value adjustment for optimally selected cutpoints. Stat Med, 15, 103–112.

Hoeffding, W. (1948). A non-parametric test of independence. Ann Math Stat, 19, 546–557.

Holländer, N., Sauerbrei, W., & Schumacher, M. (2004). Confidence intervals for the effect of a prognostic factor after selection of an “optimal” cutpoint. Stat Med, 23, 1701–1713. https://doi.org/10.1002/sim.1611

true type I error can be much greater than nominal level;one example where nominal is 0.05 and true is 0.5;minimum P-value method;CART;recursive partitioning;bootstrap method for correcting confidence interval;based on heuristic shrinkage coefficient;"It should be noted, however, that the optimal cutpoint approach has disadvantages. One of these is that in almost every study where this method is applied, another cutpoint will emerge. This makes comparisons across studies extremely difficult or even impossible. Altman et al. point out this problem for studies of the prognostic relevance of the S-phase fraction in breast cancer published in the literature. They identified 19 different cutpoints used in the literature; some of them were solely used because they emerged as the “optimal” cutpoint in a specific data set. In a meta-analysis on the relationship between cathepsin-D content and disease-free survival in node-negative breast cancer patients, 12 studies were included with 12 different cutpoints ... Interestingly, neither cathepsin-D nor the S-phase fraction are recommended to be used as prognostic markers in breast cancer in the recent update of the American Society of Clinical Oncology."; dichotomization; categorizing continuous variables; refs alt94dan, sch94out, alt98sub

Horton, N. J., & Kleinman, K. P. (2007). Much ado about nothing: A comparison of missing data methods and software to fit incomplete data regression models. Am Statistician, 61(1), 79–90.

Hossain, Md. B., Sadatsafavi, M., Johnston, J. C., Wong, H., Cook, V. J., & Karim, M. E. (2025). LASSO-Based Survival Prediction Modeling with Multiply Imputed Data: A Case Study in Tuberculosis Mortality Prediction. The American Statistician, 1–12. https://doi.org/10.1080/00031305.2025.2526545

Hurvich, C. M., & Tsai, C. L. (1990). The impact of model selection on inference in linear regression. Am Statistician, 44, 214–217.

Iezzoni, L. I. (1994). Dimensions of Risk. In L. I. Iezzoni (Ed.), Risk Adjustment for Measuring Health Outcomes (pp. 29–118). Foundation of the American College of Healthcare Executives.

dimensions of risk factors to include in models

Janssen, K. J., Donders, A. R., Harrell, F. E., Vergouwe, Y., Chen, Q., Grobbee, D. E., & Moons, K. G. (2010). Missing covariate data in medical research: To impute is better than to ignore. J Clin Epi, 63, 721–727.

Jolliffe, I. T. (2010). Principal Component Analysis (Second). Springer-Verlag.

Jones, M. P. (1996). Indicator and stratification methods for missing explanatory variables in multiple linear regression. J Am Stat Assoc, 91, 222–230.

Kalbfleisch, J. D., & Prentice, R. L. (1973). Marginal likelihood based on Cox’s regression and life model. Biometrika, 60, 267–278.

Kapoor, S., & Narayanan, A. (2023). Leakage and the reproducibility crisis in machine-learning-based science. Patterns, 4(9), 100804. https://doi.org/10.1016/j.patter.2023.100804

Karrison, T. G. (1997). Use of Irwin’s restricted mean as an index for comparing survival in different treatment groups—Interpretation and power considerations. Controlled Clin Trials, 18, 151–167.

nice power comparisons with Wilcoxon;power with and without covariable adjustment

Karvanen, J., & Harrell, F. E. (2009). Visualizing covariates in proportional hazards model. Stat Med, 28, 1957–1966.

Kass, R. E., & Raftery, A. E. (1995). Bayes factors. J Am Stat Assoc, 90, 773–795.

Kay, R. (1986). Treatment effects in competing-risks analysis of prostate cancer data. Biometrics, 42, 203–211.

Kenward, M. G., White, I. R., & Carpenter, J. R. (2010). Should baseline be a covariate or dependent variable in analyses of change from baseline in clinical trials? (Letter to the editor). Stat Med, 29, 1455–1456.

sharp rebuke of liu09sho

Keselman, H. J., Algina, J., Kowalchuk, R. K., & Wolfinger, R. D. (1998). A comparison of two approaches for selecting covariance structures in the analysis of repeated measurements. Comm Stat - Sim Comp, 27, 591–604.

use of AIC and BIC for selecting the covariance structure in repeated measurements;serial data;longitudinal data;when choosing from 11 covariance patterns, AIC selected the correct structure 0.47 of the time; BIC was correct in 0.35

Kim, S., Sugar, C. A., & Belin, T. R. (2015). Evaluating model-based imputation methods for missing covariates in regression models with interactions. Stat Med, 34(11), 1876–1888. https://doi.org/10.1002/sim.6435

Knaus, W. A., Harrell, F. E., Lynn, J., Goldman, L., Phillips, R. S., Connors, A. F., Dawson, N. V., Fulkerson, W. J., Califf, R. M., Desbiens, N., Layde, P., Oye, R. K., Bellamy, P. E., Hakim, R. B., & Wagner, D. P. (1995). The SUPPORT prognostic model: Objective estimates of survival for seriously ill hospitalized adults. Ann Int Med, 122, 191–203. https://doi.org/10.7326/0003-4819-122-3-199502010-00007

Knol, M. J., Janssen, K. J. M., Donders, R. T., Egberts, A. C. G., Heerding, E. R., Grobbee, D. E., Moons, K. G. M., & Geerlings, M. I. (2010). Unpredictable bias when using the missing indicator method or complete case analysis for missing confounder values: An empirical example. J Clin Epi, 63, 728–736.

Koenker, R. (2005). Quantile Regression. Cambridge University Press.

Koenker, R. (2009). Quantreg: Quantile Regression. http://CRAN.R-project.org/package=quantreg

Koenker, R., & Bassett, G. (1978). Regression quantiles. Econometrica, 46, 33–50.

Kooperberg, C., & Clarkson, D. B. (1997). Hazard regression with interval-censored data. Biometrics, 53, 1485–1494.

Kooperberg, C., Stone, C. J., & Truong, Y. K. (1995). Hazard regression. J Am Stat Assoc, 90, 78–94.

Kuhfeld, W. F. (2009). The PRINQUAL Procedure. In SAS/STAT 9.2 User’s Guide (Second). SAS Publishing. http://support.sas.com/documentation/onlinedoc/stat

Lachin, J. M., & Foulkes, M. A. (1986). Evaluation of sample size and power for analyses of survival with allowance for nonuniform patient entry, losses to follow-up, noncompliance, and stratification. Biometrics, 42, 507–519.

Landwehr, J. M., Pregibon, D., & Shoemaker, A. C. (1984). Graphical methods for assessing logistic regression models (with discussion). J Am Stat Assoc, 79, 61–83.

Larson, M. G., & Dinse, G. E. (1985). A mixture model for the regression analysis of competing risks data. Appl Stat, 34, 201–211.

Lausen, B., & Schumacher, M. (1996). Evaluating the effect of optimized cutoff values in the assessment of prognostic factors. Comp Stat Data Analysis, 21(3), 307–326. https://doi.org/10.1016/0167-9473(95)00016-X

Lawless, J. F., & Singhal, K. (1978). Efficient screening of nonnormal regression models. Biometrics, 34, 318–327.

le Cessie, S., & van Houwelingen, J. C. (1992). Ridge estimators in logistic regression. Appl Stat, 41, 191–201.

Leclerc, A., Luce, D., Lert, F., Chastang, J. F., & Logeay, P. (1988). Correspondence analysis and logistic modelling: Complementary use in the analysis of a health survey among nurses. Stat Med, 7, 983–995.

Lee, K. J., & Carlin, J. B. (2012). Recovery of information from multiple imputation: A simulation study. Emerg Themes Epi, 9(1), 3+. https://doi.org/10.1186/1742-7622-9-3

Not sure that the authors satisfactorily dealt with nonlinear predictor effects. In the absence of strong auxiliary information, there is little to gain from multiple imputation with missing data in the exposure-of-interest. In fact, the authors went further to say that multiple imputation can introduce bias not present in a complete case analysis if a poorly fitting imputation model is used [from Yong Hao Pua]

Lee, S., Huang, J. Z., & Hu, J. (2010). Sparse logistic principal components analysis for binary data. Ann Appl Stat, 4(3), 1579–1601.

Leng, C., & Wang, H. (2009). On general adaptive sparse principal component analysis. J Comp Graph Stat, 18(1), 201–215.

Li, C., & Shepherd, B. E. (2012). A new residual for ordinal outcomes. Biometrika, 99(2), 473–480. https://doi.org/10.1093/biomet/asr073

Liang, K.-Y., & Zeger, S. L. (2000). Longitudinal data analysis of continuous and discrete responses for pre-post designs. Sankhyā, 62, 134–148.

makes an error in assuming the baseline variable will have the same univariate distribution as the response except for a shift;baseline may have for example a truncated distribution based on a trial’s inclusion criteria;if correlation between baseline and response is zero, ANCOVA will be twice as efficient as simple analysis of change scores;if correlation is one they may be equally efficient

Lindsey, J. K. (1997). Models for Repeated Measurements. Clarendon Press.

Lipsitz, S., Parzen, M., & Zhao, L. P. (2002). A Degrees-Of-Freedom approximation in Multiple imputation. J Stat Comp Sim, 72(4), 309–318. https://doi.org/10.1080/00949650212848

Little, R. J. A., & Rubin, D. B. (2002). Statistical Analysis with Missing Data (second). Wiley.

Liu, G. F., Lu, K., Mogg, R., Mallick, M., & Mehrotra, D. V. (2009). Should baseline be a covariate or dependent variable in analyses of change from baseline in clinical trials? Stat Med, 28, 2509–2530.

seems to miss several important points, such as the fact that the baseline variable is often part of the inclusion/exclusion criteria and so has a truncated distribution that is different from that of the follow-up measurements;sharp rebuke in ken10sho

Lockhart, R., Taylor, J., Tibshirani, R. J., & Tibshirani, R. (2013). A significance test for the lasso. arXiv. http://arxiv.org/abs/1301.7161

Luchman, J. N. (2014). Relative Importance Analysis With Multicategory Dependent Variables: An Extension and Review of Best Practices. Organizational Research Methods, 17(4), 452–471. https://doi.org/10.1177/1094428114544509

Measures based on pseudo R^2 and considering all possible subsets of covariates. Good background section with review of pseudo R^2 measures.

Luo, X., Stefanski, L. A., & Boos, D. D. (2006). Tuning variable selection procedures by adding noise. Technometrics, 48, 165–175.

adding a known amount of noise to the response and studying σ² to tune the stopping rule to avoid overfitting or underfitting;simulation setup

Madley-Dowd, P., Hughes, R., Tilling, K., & Heron, J. (2019). The proportion of missing data should not be used to guide decisions on multiple imputation. Journal of Clinical Epidemiology, 110, 63–73. https://doi.org/10.1016/j.jclinepi.2019.02.016

Mamouris, P., Nassiri, V., Verbeke, G., Janssens, A., Vaes, B., & Molenberghs, G. (2023). A longitudinal transition imputation model for categorical data applied to a large registry dataset. Statistics in Medicine, 42(29), 5405–5418. https://doi.org/10.1002/sim.9919

Mantel, N. (1970). Why stepdown procedures in variable selection. Technometrics, 12, 621–625.

Manuguerra, M., & Heller, G. Z. (2010). Ordinal Regression Models for Continuous Scales. Int J Biostat, 6(1). https://doi.org/10.2202/1557-4679.1230

mislabeled a flexible parametric model as semi-parametric; does not cover semi-parametric approach with lots of intercepts

Mark, D. B., Hlatky, M. A., Harrell, F. E., Lee, K. L., Califf, R. M., & Pryor, D. B. (1987). Exercise treadmill score for predicting prognosis in coronary artery disease. Ann Int Med, 106, 793–800.

Maxwell, S. E., & Delaney, H. D. (1993). Bivariate median splits and spurious statistical significance. Psych Bull, 113, 181–190. https://doi.org/10.1037//0033-2909.113.1.181

McCabe, G. P. (1984). Principal variables. Technometrics, 26, 137–144.

Mi, J., Tendulkar, R. D., Sittenfeld, S. M. C., Patil, S., & Zabor, E. C. (2025). Combining Missing Data Imputation and Internal Validation in Clinical Risk Prediction Models. Statistics in Medicine, 44(18–19), e70203. https://doi.org/10.1002/sim.70203

Michailidis, G., & de Leeuw, J. (1998). The Gifi system of descriptive multivariate analysis. Stat Sci, 13, 307–336.

Moons, K. G. M., Donders, R. A. R. T., Stijnen, T., & Harrell, F. E. (2006). Using the outcome for imputation of missing predictor values was preferred. J Clin Epi, 59, 1092–1101. https://doi.org/10.1016/j.jclinepi.2006.01.009

use of outcome variable; excellent graphical summaries of simulations

Morris, T. P., White, I. R., Carpenter, J. R., Stanworth, S. J., & Royston, P. (2015). Combining fractional polynomial model building with multiple imputation. Stat Med, 34(25), 3298–3317. https://doi.org/10.1002/sim.6553

Moser, B. K., & Coombs, L. P. (2004). Odds ratios for a continuous outcome variable without dichotomizing. Stat Med, 23, 1843–1860.

large loss of efficiency and power;embeds in a logistic distribution, similar to proportional odds model;categorization;dichotomization of a continuous response in order to obtain odds ratios often results in an inflation of the needed sample size by a factor greater than 1.5

Muenz, L. R. (1983). Comparing survival distributions: A review for nonstatisticians. II. Ca Invest, 1, 537–545.

Muggeo, V. M. R., & Tagliavia, M. (2010). A flexible approach to the crossing hazards problem. Stat Med, 29, 1947–1957.

failed to reference per06red or per07app

Myers, R. H. (1990). Classical and Modern Regression with Applications. PWS-Kent.

Nagelkerke, N. J. D. (1991). A note on a general definition of the coefficient of determination. Biometrika, 78, 691–692.

Nick, T. G., & Hardin, J. M. (1999). Regression modeling strategies: An illustrative case study from medical rehabilitation outcomes research. Am J Occ Ther, 53, 459–470.

Noma, H., Shinozaki, T., Iba, K., Teramukai, S., & Furukawa, T. A. (2021). Confidence intervals of prediction accuracy measures for multivariable prediction models based on the bootstrap-based optimism correction methods. Statistics in Medicine. https://doi.org/10.1002/sim.9148

Nott, D. J., & Leng, C. (2010). Bayesian projection approaches to variable selection in generalized linear models. Computational Statistics & Data Analysis, 54(12), 3227–3241. https://doi.org/10.1016/j.csda.2010.01.036

Paul, D., Bair, E., Hastie, T., & Tibshirani, R. (2008). “Preconditioning” for feature selection and regression in high-dimensional problems. Ann Stat, 36(4), 1595–1619. https://doi.org/10.1214/009053607000000578

develop consistent Y using a latent variable structure, using for example supervised principal components. Then run stepwise regression or lasso predicting Y (lasso worked better). Can run into problems when a predictor has importance in an adjusted sense but has no marginal correlation with Y;model approximation;model simplification

Peduzzi, P., Concato, J., Feinstein, A. R., & Holford, T. R. (1995). Importance of events per independent variable in proportional hazards regression analysis. II. Accuracy and precision of regression estimates. J Clin Epi, 48, 1503–1510.

Peduzzi, P., Concato, J., Kemper, E., Holford, T. R., & Feinstein, A. R. (1996). A simulation study of the number of events per variable in logistic regression analysis. J Clin Epi, 49, 1373–1379.

Peek, N., Arts, D. G. T., Bosman, R. J., van der Voort, P. H. J., & de Keizer, N. F. (2007). External validation of prognostic models for critically ill patients required substantial sample sizes. J Clin Epi, 60, 491–501.

large sample sizes needed to obtain reliable external validations;inadequate power of DeLong, DeLong, and Clarke-Pearson test for differences in correlated ROC areas (p. 498);problem with tests of calibration accuracy having too much power for large sample sizes

Pencina, M. J., D’Agostino, R. B., & Demler, O. V. (2012). Novel metrics for evaluating improvement in discrimination: Net reclassification and integrated discrimination improvement for normal variables and nested models. Stat Med, 31(2), 101–113. https://doi.org/10.1002/sim.4348

Pencina, M. J., D’Agostino, R. B., & Steyerberg, E. W. (2011). Extensions of net reclassification improvement calculations to measure usefulness of new biomarkers. Stat Med, 30, 11–21. https://doi.org/10.1002/sim.4085

lack of need for NRI to be category-based;arbitrariness of categories;"category-less or continuous NRI is the most objective and versatile measure of improvement in risk prediction";authors misunderstood the inadequacy of three categories if categories are used;comparison of NRI to change in C index;example of continuous plot of risk for old model vs. risk for new model

Pencina, M. J., D’Agostino Sr, R. B., D’Agostino Jr, R. B., & Vasan, R. S. (2008). Evaluating the added predictive ability of a new marker: From area under the ROC curve to reclassification and beyond. Stat Med, 27, 157–172.

small differences in ROC area can still be very meaningful;example of insignificant test for difference in ROC areas with very significant results from new method;Yates’ discrimination slope;reclassification table;limiting version of this based on whether and amount by which probabilities rise for events and lower for non-events when comparing new model to old;comparing two models;see letter to the editor by Van Calster and Van Huffel, Stat in Med 29:318-319, 2010 and by Cook and Paynter, Stat in Med 31:93-97, 2012

Penning de Vries, B. B. L., van Smeden, M., & Groenwold, R. H. H. (2018). Propensity Score Estimation Using Classification and Regression Trees in the Presence of Missing Covariate Data. Epidemiologic Methods, 7(1). https://doi.org/10.1515/em-2017-0020

Pepe, M. S. (1991). Inference for events with dependent risks in multiple endpoint studies. J Am Stat Assoc, 86, 770–778.

Pepe, M. S., Longton, G., & Thornquist, M. (1991). A qualifier Q for the survival function to describe the prevalence of a transient condition. Stat Med, 10, 413–421.

Pepe, M. S., & Mori, M. (1993). Kaplan–Meier, marginal or conditional probability curves in summarizing competing risks failure time data? Stat Med, 12, 737–751.

Perperoglou, A., le Cessie, S., & van Houwelingen, H. C. (2006). Reduced-rank hazard regression for modelling non-proportional hazards. Stat Med, 25, 2831–2845.

natural structural way to allow for varying degrees of freedom in modeling non-PH

Peters, S. A., Bots, M. L., den Ruijter, H. M., Palmer, M. K., Grobbee, D. E., Crouse, J. R., O’Leary, D. H., Evans, G. W., Raichlen, J. S., Moons, K. G., Koffijberg, H., & METEOR study group. (2012). Multiple imputation of missing repeated outcome measurements did not add to linear mixed-effects models. J Clin Epi, 65(6), 686–695. https://doi.org/10.1016/j.jclinepi.2011.11.012

Peterson, B., & George, S. L. (1993). Sample size requirements and length of study for testing interaction in a 1 × k factorial design when time-to-failure is the outcome. Controlled Clin Trials, 14, 511–522.

Peterson, B., & Harrell, F. E. (1990). Partial proportional odds models for ordinal response variables. Appl Stat, 39, 205–217.

Pike, M. C. (1966). A method of analysis of a certain class of experiments in carcinogenesis. Biometrics, 22, 142–161.

Pinheiro, J. C., & Bates, D. M. (2000). Mixed-Effects Models in S and S-PLUS. Springer.

Potthoff, R. F., & Roy, S. N. (1964). A generalized multivariate analysis of variance model useful especially for growth curve problems. Biometrika, 51, 313–326.

included an AR1 example

Pryor, D. B., Harrell, F. E., Lee, K. L., Califf, R. M., & Rosati, R. A. (1983). Estimating the likelihood of significant coronary artery disease. Am J Med, 75, 771–780.

Pryor, D. B., Harrell, F. E., Rankin, J. S., Lee, K. L., Muhlbaier, L. H., Oldham, H. N., Hlatky, M. A., Mark, D. B., Reves, J. G., & Califf, R. M. (1987). The changing survival benefits of coronary revascularization over time. Circ (Supplement V), 76, 13–21.

Putter, H., Sasako, M., Hartgrink, H. H., van de Velde, C. J. H., & van Houwelingen, J. C. (2005). Long-term survival with non-proportional hazards: Results from the Dutch Gastric Cancer Trial. Stat Med, 24, 2807–2821.

Radchenko, P., & James, G. M. (2008). Variable inclusion and shrinkage algorithms. J Am Stat Assoc, 103(483), 1304–1315.

solves problem caused by lasso using the same penalty parameter for variable selection and shrinkage which causes lasso to have to keep too many variables in the model to avoid overshrinking the remaining predictors;does not handle scaling issue well

Ragland, D. R. (1992). Dichotomizing continuous outcome variables: Dependence of the magnitude of association and statistical power on the cutpoint. Epi, 3, 434–440. https://doi.org/10.1097/00001648-199209000-00009

Reilly, B. M., & Evans, A. T. (2006). Translating clinical research into clinical practice: Impact of using prediction rules to make decisions. Ann Int Med, 144, 201–209.

impact analysis;example of decision aid being ignored or overruled making MD decisions worse;assumed utilities are constant across subjects by concluding that directives have more impact than predictions;Goldman-Cook clinical prediction rule in AMI

Reiter, J. P. (2007). Small-sample degrees of freedom for multi-component significance tests with multiple imputation for missing data. Biometrika, 94(2), 502–508. https://doi.org/10.1093/biomet/asm028

Riley, R. D., Snell, K. I. E., Archer, L., Ensor, J., Debray, T. P. A., Van Calster, B., Van Smeden, M., & Collins, G. S. (2024). Evaluation of clinical prediction models (part 3): Calculating the sample size required for an external validation study. BMJ, e074821. https://doi.org/10.1136/bmj-2023-074821

Riley, R. D., Snell, K. I. E., Ensor, J., Burke, D. L., Harrell, F. E., Moons, K. G. M., & Collins, G. S. (2019). Minimum sample size for developing a multivariable prediction model: Part I – Continuous outcomes. Statistics in Medicine, 38(7), 1262–1275. https://doi.org/10.1002/sim.7993

Riley, R. D., Snell, K. I., Ensor, J., Burke, D. L., Harrell Jr, F. E., Moons, K. G., & Collins, G. S. (2019). Minimum sample size for developing a multivariable prediction model: PART II ‐ binary and time‐to‐event outcomes. Statistics in Medicine, 38(7), 1276–1296. https://doi.org/10.1002/sim.7992

Roecker, E. B. (1991). Prediction error and its estimation for subset-selected models. Technometrics, 33, 459–468.

Rohde, M. D., French, B., Stewart, T. G., & Harrell, F. E. (2024). Bayesian transition models for ordinal longitudinal outcomes. Statistics in Medicine, 43(18), 3539–3561. https://doi.org/10.1002/sim.10133

Royston, P., Altman, D. G., & Sauerbrei, W. (2006). Dichotomizing continuous predictors in multiple regression: A bad idea. Stat Med, 25, 127–141. https://doi.org/10.1002/sim.2331

destruction of statistical inference when cutpoints are chosen using the response variable;varying effect estimates when cutpoints change;difficult to interpret effects after dichotomizing;nice plot showing effect of categorization;PBC data

Rubin, D., & Schenker, N. (1991). Multiple imputation in health-care data bases: An overview and some applications. Stat Med, 10, 585–598.

Sarle, W. (1990). The VARCLUS Procedure. In SAS/STAT User’s Guide (fourth, Vol. 2, pp. 1641–1659). SAS Institute, Inc. http://support.sas.com/documentation/onlinedoc/stat

Sauerbrei, W., & Schumacher, M. (1992). A bootstrap resampling procedure for model building: Application to the Cox regression model. Stat Med, 11, 2093–2109.

Schafer, J. L., & Graham, J. W. (2002). Missing data: Our view of the state of the art. Psych Meth, 7, 147–177.

excellent review and overview of missing data and imputation;problems with MICE;less technical description of 3 types of missing data

Schemper, M., & Heinze, G. (1997). Probability imputation revisited for prognostic factor studies. Stat Med, 16, 73–80.

imputation of missing covariables using logistic model;comparison with multiple imputation;analysis of prostate cancer dataset

Schoenfeld, D. (1982). Partial residuals for the proportional hazards regression model. Biometrika, 69, 239–241.

Schulgen, G., Lausen, B., Olsen, J., & Schumacher, M. (1994). Outcome-oriented cutpoints in quantitative exposure. Am J Epi, 140, 172–184.

Selvin, E., Steffes, M. W., Zhu, H., Matsushita, K., Wagenknecht, L., Pankow, J., Coresh, J., & Brancati, F. L. (2010). Glycated hemoglobin, diabetes, and cardiovascular risk in nondiabetic adults. NEJM, 362(9), 800–811. https://doi.org/10.1056/NEJMoa0908359

Senn, S. (2006). Change from baseline and analysis of covariance revisited. Stat Med, 25, 4334–4344.

shows that, contrary to some claims, ANCOVA in a 2-arm study does not require the population means at baseline to be identical;refutes some claims of lia00lon;problems with counterfactuals;temporal additivity ("amounts to supposing that despite the fact that groups are different at baseline they would show the same evolution over time");causal additivity;is difficult to design trials for which simple analysis of change scores is unbiased, ANCOVA is biased, and a causal interpretation can be given;temporally and logically, a "baseline cannot be a response to treatment", so baseline and response cannot be modeled in an integrated framework as Laird and Ware’s model has been used;"one should focus clearly on “outcomes” as being the only values that can be influenced by treatment and examine critically any schemes that assume that these are linked in some rigid and deterministic view to “baseline” values. An alternative tradition sees a baseline as being merely one of a number of measurements capable of improving predictions of outcomes and models it in this way.";"You cannot establish necessary conditions for an estimator to be valid by nominating a model and seeing what the model implies unless the model is universally agreed to be impeccable. On the contrary it is appropriate to start with the estimator and see what assumptions are implied by valid conclusions.";this is in distinction to lia00lon

Shao, J. (1993). Linear model selection by cross-validation. J Am Stat Assoc, 88, 486–494.

Shepherd, B. E., Li, C., & Liu, Q. (2016). Probability‐scale residuals for continuous, discrete, and censored data. Can J Statistics, 44(4), 463–479. https://doi.org/10.1002/cjs.11302

Simon, N., Friedman, J., Hastie, T., & Tibshirani, R. (2013). A sparse-group lasso. J Comp Graph Stat, 22(2), 231–245. https://doi.org/10.1080/10618600.2012.681250

sparse effects both on a group and within group levels;can also be considered special case of group lasso allowing overlap between groups

Simon, R., & Freedman, L. S. (1997). Bayesian design and analysis of two × two factorial clinical trials. Biometrics, 53, 456–464.

Simpson, S. L., Edwards, L. J., Muller, K. E., Sen, P. K., & Styner, M. A. (2010). A linear exponent AR(1) family of correlation structures. Stat Med, 29, 1825–1838.

Smith, L. R., Harrell, F. E., & Muhlbaier, L. H. (1992). Problems and potentials in modeling survival. In M. L. Grady & H. A. Schwartz (Eds.), Medical Effectiveness Research Data Methods (Summary Report), AHCPR Pub. No. 92-0056 (pp. 151–159). US Dept. of Health and Human Services, Agency for Health Care Policy and Research. https://hbiostat.org/bib/papers/smi92pro.pdf

Spanos, A., Harrell, F. E., & Durack, D. T. (1989). Differential diagnosis of acute meningitis: An analysis of the predictive value of initial observations. JAMA, 262, 2700–2707. https://doi.org/10.1001/jama.262.19.2700

Spence, I., & Garrison, R. F. (1993). A remarkable scatterplot. Am Statistician, 47, 12–19.

Spiegelhalter, D. J. (1986). Probabilistic prediction in patient management and clinical trials. Stat Med, 5, 421–433. https://doi.org/10.1002/sim.4780050506

z-test for calibration inaccuracy (implemented in Stata, and R Hmisc package’s val.prob function)

Stan Development Team. (2020). Stan: A C++ Library for Probability and Sampling. https://cran.r-project.org/package=rstan

Steyerberg, E. W. (2018). Validation in prediction research: The waste by data-splitting. Journal of Clinical Epidemiology. https://doi.org/10.1016/j.jclinepi.2018.07.010

Steyerberg, E. W. (2019). Clinical Prediction Models (2nd ed.). Springer.

Steyerberg, E. W., Eijkemans, M. J. C., Harrell, F. E., & Habbema, J. D. F. (2000). Prognostic modelling with logistic regression analysis: A comparison of selection and estimation methods in small data sets. Stat Med, 19, 1059–1079.

Steyerberg, E. W., Eijkemans, M. J. C., Harrell, F. E., & Habbema, J. D. F. (2001). Prognostic modeling with logistic regression analysis: In search of a sensible strategy in small data sets. Med Decis Mak, 21, 45–56.

Stone, C. J. (1986). Comment: Generalized additive models. Stat Sci, 1, 312–314.

Stone, C. J., & Koo, C. Y. (1985). Additive splines in statistics. Proceedings of the Statistical Computing Section ASA, 45–48.

Strauss, D., & Shavelle, R. (1998). An extended Kaplan–Meier estimator and its applications. Stat Med, 17, 971–982.

estimation of transition probabilities of an individual in state i at time x being in state j at a subsequent time t;dead state and multiple live states;prognostic chart;generalized uninformative censoring;multistate Kaplan-Meier estimator

Suissa, S., & Blais, L. (1995). Binary regression with continuous outcomes. Stat Med, 14, 247–255. https://doi.org/10.1002/sim.4780140303

Sullivan, T. R., Salter, A. B., Ryan, P., & Lee, K. J. (2015). Bias and Precision of the “Multiple Imputation, Then Deletion” Method for Dealing With Missing Outcome Data. American Journal of Epidemiology, 182(6), 528–534. https://doi.org/10.1093/aje/kwv100

Disagrees with von Hippel approach of "impute then delete" for Y

Sun, G.-W., Shook, T. L., & Kay, G. L. (1996). Inappropriate use of bivariable analysis to screen risk factors for use in multivariable analysis. J Clin Epi, 49, 907–916.

Therneau, T. M., Grambsch, P. M., & Fleming, T. R. (1990). Martingale-based residuals for survival models. Biometrika, 77, 216–218.

Tibshirani, R. (1988). Estimating transformations for regression via additivity and variance stabilization. J Am Stat Assoc, 83, 394–405.

Tibshirani, R. (1996). Regression shrinkage and selection via the lasso. J Roy Stat Soc B, 58, 267–288.

Tjur, T. (2009). Coefficients of determination in logistic regression models—A new proposal: The coefficient of discrimination. Am Statistician, 63(4), 366–372.

Twisk, J., de Boer, M., de Vente, W., & Heymans, M. (2013). Multiple imputation of missing values was not necessary before performing a longitudinal mixed-model analysis. J Clin Epi, 66(9), 1022–1028. https://doi.org/10.1016/j.jclinepi.2013.03.017

Vach, W., & Blettner, M. (1998). Missing Data in Epidemiologic Studies. In Ency of Biostatistics (pp. 2641–2654). Wiley.

van Buuren, S. (2012). Flexible imputation of missing data. Chapman & Hall/CRC. https://doi.org/10.1201/b11826

van Buuren, S., Brand, J. P. L., Groothuis-Oudshoorn, C. G. M., & Rubin, D. B. (2006). Fully conditional specification in multivariate imputation. J Stat Computation Sim, 76(12), 1049–1064.

justification for chained equations alternative to full multivariate modeling

Van Calster, B., Nieboer, D., Vergouwe, Y., De Cock, B., Pencina, M. J., & Steyerberg, E. W. (2016). A calibration hierarchy for risk models was defined: From utopia to empirical data. J Clin Epi, 74, 167–176. https://doi.org/10.1016/j.jclinepi.2015.12.005

van der Heijden, G. J. M. G., Donders, A. R. T., Stijnen, T., & Moons, K. G. M. (2006). Imputation of missing values is superior to complete case analysis and the missing-indicator method in multivariable diagnostic research: A clinical example. J Clin Epi, 59, 1102–1109. https://doi.org/10.1016/j.jclinepi.2006.01.015

Invalidity of adding a new category or an indicator variable for missing values even with MCAR

van der Ploeg, T., Austin, P. C., & Steyerberg, E. W. (2014). Modern modelling techniques are data hungry: A simulation study for predicting dichotomous endpoints. BMC Medical Research Methodology, 14(1), 137+. https://doi.org/10.1186/1471-2288-14-137

Would be better to use proper accuracy scores in the assessment. Too much emphasis on optimism as opposed to final discrimination measure. But much good practical information. Recursive partitioning fared poorly.

van Houwelingen, J. C., & le Cessie, S. (1990). Predictive value of statistical models. Stat Med, 9, 1303–1325.

Venables, W. N., & Ripley, B. D. (2003). Modern Applied Statistics with S (Fourth). Springer-Verlag.

Verbeke, G., & Molenberghs, G. (2000). Linear Mixed Models for Longitudinal Data. Springer.

Verweij, P. J., & van Houwelingen, H. C. (1994). Penalized likelihood in Cox regression. Stat Med, 13, 2427–2436.

Vickers, A. J. (2008). Decision analysis for the evaluation of diagnostic tests, prediction models, and molecular markers. Am Statistician, 62(4), 314–320.

limitations of accuracy metrics;incorporating clinical consequences;nice example of calculation of expected outcome;drawbacks of conventional decision analysis, especially because of the difficulty of eliciting the expected harm of a missed diagnosis;use of a threshold on the probability of disease for taking some action;decision curve;has other good references to decision analysis

Vink, G., Frank, L. E., Pannekoek, J., & van Buuren, S. (2014). Predictive mean matching imputation of semicontinuous variables. Statistica Neerlandica, 68(1), 61–90. https://doi.org/10.1111/stan.12023

Vittinghoff, E., & McCulloch, C. E. (2006). Relaxing the rule of ten events per variable in logistic and Cox regression. Am J Epi, 165, 710–718.

the authors may not have been quite stringent enough in their assessment of adequacy of predictions;letter to the editor submitted

von Hippel, P. T. (2007). Regression with missing Ys: An improved strategy for analyzing multiple imputed data. Soc Meth, 37(1), 83–117.

von Hippel, P. T. (2016). The number of imputations should increase quadratically with the fraction of missing information. http://arxiv.org/abs/1608.05406

Wainer, H. (2006). Finding what is not there through the unfortunate binning of results: The Mendel effect. Chance, 19(1), 49–56.

can find bins that yield either positive or negative association;especially pertinent when effects are small;"With four parameters, I can fit an elephant; with five, I can make it wiggle its trunk." - John von Neumann

Walker, S. H., & Duncan, D. B. (1967). Estimation of the probability of an event as a function of several independent variables. Biometrika, 54, 167–178.

Wang, H., & Leng, C. (2007). Unified LASSO estimation by least squares approximation. J Am Stat Assoc, 102, 1039–1048. https://doi.org/10.1198/016214507000000509

Wang, S., Nan, B., Zhou, N., & Zhu, J. (2009). Hierarchically penalized Cox regression with grouped variables. Biometrika, 96(2), 307–322.

Wax, Y. (1992). Collinearity diagnosis for a relative risk regression analysis: An application to assessment of diet-cancer relationship in epidemiological studies. Stat Med, 11, 1273–1287.

Wenger, T. L., Harrell, F. E., Brown, K. K., Lederman, S., & Strauss, H. C. (1984). Ventricular fibrillation following canine coronary reperfusion: Different outcomes with pentobarbital and α-chloralose. Can J Phys Pharm, 62, 224–228.

White, I. R., & Carlin, J. B. (2010). Bias and efficiency of multiple imputation compared with complete-case analysis for missing covariate values. Stat Med, 29, 2920–2931.

White, I. R., & Royston, P. (2009). Imputing missing covariate values for the Cox model. Stat Med, 28, 1982–1998.

approach to using event time and censoring indicator as predictors in the imputation model for missing baseline covariates;recommended an approximation using the event indicator and the cumulative hazard transformation of time, without their interaction

White, I. R., Royston, P., & Wood, A. M. (2011). Multiple imputation using chained equations: Issues and guidance for practice. Stat Med, 30(4), 377–399.

practical guidance for the use of multiple imputation using chained equations;MICE;imputation models for different types of target variables;PMM choosing at random from among a few closest matches;choosing number of multiple imputations by a reproducibility argument, suggesting 100f imputations when f is the fraction of cases that are incomplete

Whitehead, J. (1993). Sample size calculations for ordered categorical data. Stat Med, 12, 2257–2271.

Wiegand, R. E. (2010). Performance of using multiple stepwise algorithms for variable selection. Stat Med, 29, 1647–1659.

fruitless to try different stepwise methods and look for agreement; the methods will agree on the wrong model

Wijesuriya, R., Moreno‐Betancur, M., Carlin, J. B., White, I. R., Quartagno, M., & Lee, K. J. (2025). Multiple Imputation for Longitudinal Data: A Tutorial. Statistics in Medicine, 44(3–4), e10274. https://doi.org/10.1002/sim.10274

Witten, D. M., & Tibshirani, R. (2008). Testing significance of features by lassoed principal components. Ann Appl Stat, 2(3), 986–1012.

reduction in false discovery rates over using a vector of t-statistics; borrowing strength across genes; "one would not expect a single gene to be associated with the outcome, since, in practice, many genes work together to effect a particular phenotype. LPC effectively down-weights individual genes that are associated with the outcome but that do not share an expression pattern with a larger group of genes, and instead favors large groups of genes that appear to be differentially-expressed."; regress principal components on outcome; sparse principal components

Wood, A. M., White, I. R., & Royston, P. (2008). How should variable selection be performed with multiply imputed data? Statistics in Medicine, 27(17), 3227–3246. https://doi.org/10.1002/sim.3177

Wood, S. N. (2006). Generalized Additive Models: An Introduction with R. Chapman & Hall/CRC.

Wu, C. F. J. (1986). Jackknife, bootstrap and other resampling methods in regression analysis. Ann Stat, 14(4), 1261–1350.

Xiong, S. (2010). Some notes on the nonnegative garrote. Technometrics, 52(3), 349–361.

"... to select tuning parameters, it may be unnecessary to optimize a model selection criterion repeatedly"; natural selection of penalty function

Ye, J. (1998). On measuring and correcting the effects of data mining and model selection. J Am Stat Assoc, 93, 120–131.

Young, F. W., Takane, Y., & de Leeuw, J. (1978). The principal components of mixed measurement level multivariate data: An alternating least squares method with optimal scaling features. Psychometrika, 43, 279–281.

Yucel, R. M., & Zaslavsky, A. M. (2008). Using calibration to improve rounding in imputation. Am Statistician, 62(2), 125–129.

using rounding to impute binary variables using techniques for continuous data; uses the method to solve for the cutpoint at which a continuous estimate is converted into a binary value; method should be useful in more general situations; idea is to duplicate the entire dataset and, in the second half of the new dataset, set all non-missing values of the target variable to missing; multiply impute these now-missing values and compare them to the actual values

Zhang, H. H., & Lu, W. (2007). Adaptive lasso for Cox’s proportional hazards model. Biometrika, 94, 691–703.

penalty function has ratios against original MLE; scale-free lasso

Zhang, M., Yu, Y., Wang, S., Salvatore, M., Fritsche, L. G., He, Z., & Mukherjee, B. (2020). Interaction analysis under misspecification of main effects: Some common mistakes and simple solutions. Statistics in Medicine. https://doi.org/10.1002/sim.8505

Zheng, X., & Loh, W.-L. (1995). Consistent variable selection in linear models. J Am Stat Assoc, 90, 151–156.

Zhou, H., Hastie, T., & Tibshirani, R. (2006). Sparse principal component analysis. J Comp Graph Stat, 15, 265–286.

principal components analysis that shrinks some loadings to zero

Zhou, X., & Reiter, J. P. (2010). A Note on Bayesian Inference After Multiple Imputation. The American Statistician, 64(2), 159–163. https://doi.org/10.1198/tast.2010.09109

Zou, H., & Hastie, T. (2005). Regularization and variable selection via the elastic net. J Roy Stat Soc B, 67(2), 301–320.