Bordley, R. (2007). Statistical decisionmaking without math. Chance, 20(3), 39–44.
Briggs, W. M., & Zaretzki, R. (2008). The skill plot: A graphical technique for evaluating continuous diagnostic tests (with discussion). Biometrics, 64, 250–261.
"statistics such as the AUC are not especially relevant to someone who must make a decision about a particular x_c. ... ROC curves lack or obscure several quantities that are necessary for evaluating the operational effectiveness of diagnostic tests. ... ROC curves were first used to check how radio receivers (like radar receivers) operated over a range of frequencies. ... This is not how most ROC curves are used now, particularly in medicine. The receiver of a diagnostic measurement ... wants to make a decision based on some x_c, and is not especially interested in how well he would have done had he used some different cutoff."; in the discussion David Hand states "when integrating to yield the overall AUC measure, it is necessary to decide what weight to give each value in the integration. The AUC implicitly does this using a weighting derived empirically from the data. This is nonsensical. The relative importance of misclassifying a case as a noncase, compared to the reverse, cannot come from the data itself. It must come externally, from considerations of the severity one attaches to the different kinds of misclassifications."; see Lin, Kvam, & Lu (2009), Stat Med, 28, 798–813
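The point above, that the AUC is a cutoff-free rank statistic, can be illustrated with a minimal simulation (all score distributions here are hypothetical): the AUC is simply the probability that a randomly chosen case outscores a randomly chosen noncase, so no particular cutoff x_c and no externally supplied misclassification costs enter the calculation.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical diagnostic scores for cases and noncases
cases = rng.normal(1.0, 1.0, 2000)
noncases = rng.normal(0.0, 1.0, 2000)

# AUC = probability that a random case scores higher than a random
# noncase (the concordance probability). Note that no cutoff and no
# misclassification costs appear anywhere in this computation.
auc = np.mean(cases[:, None] > noncases[None, :])
```

For these distributions the true value is Phi(1/sqrt(2)), about 0.76; the simulation estimate is close to that, and the same number results no matter which cutoff a decision maker would actually face.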
Fan, J., & Levine, R. A. (2007). To amnio or not to amnio: That is the decision for Bayes. Chance, 20(3), 26–32.
Fedorov, V., Mannino, F., & Zhang, R. (2009). Consequences of dichotomization. Pharm Stat, 8, 50–61. https://doi.org/10.1002/pst.331
optimal cutpoint depends on unknown parameters;should only entertain dichotomization when "estimating a value of the cumulative distribution and when the assumed model is very different from the true model";nice graphics
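A minimal simulation (sample size and effect size hypothetical) of the information loss Fedorov et al. quantify: splitting a continuous normal predictor at its median discards roughly a third of the explainable variation.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 5000
x = rng.normal(size=n)
y = x + rng.normal(size=n)          # true linear relationship, sigma = 1

# R^2 using the continuous predictor versus a median split of it
r2_cont = np.corrcoef(x, y)[0, 1] ** 2
x_dich = (x > np.median(x)).astype(float)
r2_dich = np.corrcoef(x_dich, y)[0, 1] ** 2
```

Here r2_cont is near the true value of 0.5 while r2_dich is near 0.32, the theoretical ratio being 2/pi, about 0.64, for a median split of a normal predictor.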
Gail, M. H., & Pfeiffer, R. M. (2005). On criteria for evaluating models of absolute risk. Biostatistics, 6(2), 227–239.
Gneiting, T., & Raftery, A. E. (2007). Strictly proper scoring rules, prediction, and estimation. J Am Stat Assoc, 102, 359–378.
wonderful review article, except that it is missing references from the Scandinavian and German medical decision making literature
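A sketch (with a hypothetical true event probability) of the strict propriety that Gneiting & Raftery formalize: under the Brier score, a forecaster's expected penalty is minimized only by reporting the true probability, so honest reporting is the optimal strategy.

```python
import numpy as np

rng = np.random.default_rng(2)
p_true = 0.3                          # hypothetical true event probability
y = rng.binomial(1, p_true, 200000)   # simulated binary outcomes

# Mean Brier score over a grid of candidate forecast probabilities;
# strict propriety means the minimum is attained only at p_true.
forecasts = np.linspace(0.05, 0.95, 19)
brier = [np.mean((f - y) ** 2) for f in forecasts]
best = forecasts[int(np.argmin(brier))]
```

The grid point minimizing the mean Brier score is the one at (or, in small samples, nearest to) the true probability 0.3; an improper score such as absolute error would instead reward reporting 0 or 1.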
Iezzoni, L. I. (1994). Dimensions of Risk. In L. I. Iezzoni (Ed.), Risk Adjustment for Measuring Health Outcomes (pp. 29–118). Foundation of the American College of Healthcare Executives.
dimensions of risk factors to include in models
Luo, X., Stefanski, L. A., & Boos, D. D. (2006). Tuning variable selection procedures by adding noise. Technometrics, 48, 165–175.
adding a known amount of noise to the response and studying σ² to tune the stopping rule to avoid overfitting or underfitting;simulation setup
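The idea can be sketched as follows (a toy version, not Luo et al.'s exact procedure; the sample sizes and the marginal-correlation selection rule are made up for illustration): add noise of known variance tau2 to the response and re-run the selection. An honest fit's residual-variance estimate should rise by about tau2; a shortfall in the rise signals that the selection procedure is absorbing noise, i.e. overfitting.

```python
import numpy as np

rng = np.random.default_rng(3)
n, p, k, tau2 = 100, 50, 10, 1.0   # hypothetical sizes and added-noise variance

def sigma2_after_selection(y, X, k):
    """Select the k predictors most correlated with y, fit OLS,
    and return the naive residual variance estimate RSS/(n-k-1)."""
    r = np.abs(X.T @ (y - y.mean())) / np.linalg.norm(X, axis=0)
    keep = np.argsort(r)[-k:]
    Xk = np.column_stack([np.ones(len(y)), X[:, keep]])
    beta, *_ = np.linalg.lstsq(Xk, y, rcond=None)
    rss = np.sum((y - Xk @ beta) ** 2)
    return rss / (len(y) - k - 1)

gaps = []
for _ in range(30):
    X = rng.normal(size=(n, p))
    y = rng.normal(size=n)           # pure noise: true sigma^2 = 1
    s2 = sigma2_after_selection(y, X, k)
    s2_plus = sigma2_after_selection(y + rng.normal(0, tau2 ** 0.5, n), X, k)
    gaps.append(s2_plus - s2)

# An honest estimate would rise by about tau2 after the noise is added;
# because selection absorbs part of the noise, the observed rise is smaller.
mean_rise = float(np.mean(gaps))
```

Here mean_rise comes out well below tau2 = 1, flagging the overfitting induced by selecting 10 of 50 pure-noise predictors.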
Reilly, B. M., & Evans, A. T. (2006). Translating clinical research into clinical practice: Impact of using prediction rules to make decisions. Ann Int Med, 144, 201–209.
impact analysis;example of a decision aid being ignored or overruled, making MD decisions worse;implicitly assumed utilities are constant across subjects in concluding that directives have more impact than predictions;Goldman-Cook clinical prediction rule in AMI
Vickers, A. J. (2008). Decision analysis for the evaluation of diagnostic tests, prediction models, and molecular markers. Am Statistician, 62(4), 314–320.
limitations of accuracy metrics;incorporating clinical consequences;nice example of calculation of expected outcome;drawbacks of conventional decision analysis, especially because of the difficulty of eliciting the expected harm of a missed diagnosis;use of a threshold on the probability of disease for taking some action;decision curve;has other good references to decision analysis
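The net benefit underlying the decision curve is simple to compute: at a risk threshold pt, net benefit = TP/n − (FP/n)·pt/(1−pt). A minimal sketch with simulated (hypothetical) risks, comparing a calibrated model against the treat-all and treat-none defaults:

```python
import numpy as np

def net_benefit(p_hat, y, pt):
    """Net benefit of acting when predicted risk p_hat exceeds threshold pt:
    TP/n - FP/n * pt/(1 - pt)."""
    n = len(y)
    act = p_hat >= pt
    tp = np.sum(act & (y == 1))
    fp = np.sum(act & (y == 0))
    return tp / n - fp / n * pt / (1 - pt)

rng = np.random.default_rng(4)
n = 10000
p_true = rng.beta(2, 5, n)               # hypothetical true risks, mean 2/7
y = rng.binomial(1, p_true)

pt = 0.2                                  # hypothetical action threshold
nb_model = net_benefit(p_true, y, pt)     # perfectly calibrated model
nb_all = net_benefit(np.ones(n), y, pt)   # treat everyone
nb_none = 0.0                             # treat no one
```

Sweeping pt over the plausible range of thresholds and plotting the three net benefits gives the decision curve; here the calibrated model beats both defaults at pt = 0.2, which is the pattern the decision curve is designed to display.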
Ye, J. (1998). On measuring and correcting the effects of data mining and model selection. J Am Stat Assoc, 93, 120–131.