International Society for Clinical Biostatistics 41

2020-08-26

- Classifier: a method providing only categorical predictions
- Classification is a premature decision; a forced choice
- Inconsistent with optimal decision making unless true patient-specific utilities known by analyst
- Best for deterministic outcomes occurring frequently
- Use when probabilities of class membership are all near 0 or 1

- Predictions are separate from decisions & can be used by any decision maker
- When outcome incidence is not near 0 or 1, deal with *tendencies* (probabilities)
- ML too often uses classification and discards observations to achieve class balance (!)
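The point that classification is a premature, forced choice can be made concrete: under expected-utility decision making, the optimal action given a predicted probability depends on the decision maker's own cost of a false positive versus a false negative. A minimal sketch (the costs and function name are illustrative, not from the talk):

```python
def treat(p, cost_fp, cost_fn):
    """Treat when the expected loss of not treating exceeds that of treating:
    p * cost_fn > (1 - p) * cost_fp, i.e. p > cost_fp / (cost_fp + cost_fn)."""
    return p > cost_fp / (cost_fp + cost_fn)

# Same predicted risk, different decision makers, opposite decisions:
treat(0.3, cost_fp=1, cost_fn=9)   # True  (decision threshold 0.1)
treat(0.3, cost_fp=9, cost_fn=1)   # False (decision threshold 0.9)
```

This is why a probability can serve any decision maker, while a classification hard-codes one analyst's implicit utilities.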

- Modern modelling techniques are data hungry (van der Ploeg, Austin, Steyerberg)
  - SM: n = 20p
  - ML: n = 200p
  - Single recursive partitioning tree: n > 200p

- Estimate a **single** correlation coefficient: n=400 for MOE \(\pm 0.1\)
- Estimate **only the intercept** in a logistic model: n=96 for MOE \(\pm 0.1\)
- Estimate \(\sigma\) in a linear model: n=70 for MMOE 1.2
- Estimate misclassification probability: n=96 for MOE \(\pm 0.1\)
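The n=96 figure for estimating a probability to within \(\pm 0.1\) follows from the worst-case binomial variance \(p(1-p) \le 1/4\). A quick check (the formula is the standard normal-approximation CI half-width; the function name is mine):

```python
def n_for_proportion_moe(moe, p=0.5, z=1.96):
    """n such that a 0.95 CI for a proportion has half-width `moe`,
    evaluated at the worst case p = 0.5 where p*(1-p) is largest."""
    return z**2 * p * (1 - p) / moe**2

n_for_proportion_moe(0.1)  # 96.04, i.e. n = 96
```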

- Select the right variables from a large number: n=\(\infty\)
- Estimate misclassification probability with feature selection or large p: n \(>>\) 96
- If sample size is not large in comparison with p, it may be insufficient for
- choosing the optimum penalty
- estimating model performance
- estimating variable importance measures

If n is too small to do something simple, it is too small to do something complex

- Probability model for data
- Default assumption of additivity of predictor effects
- Interactions usually must be pre-specified
- Model may be very high dimensional if penalization used
- Very easy to allow for non-linearity
- Suffers from assumptions
- semiparametric models a great help
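"Very easy to allow for non-linearity" typically means regression splines. A pure-Python sketch of restricted-cubic-spline basis terms in Harrell's parameterization (scalar x; the knot values in the usage below are illustrative):

```python
def rcs_terms(x, knots):
    """Restricted cubic spline basis for one predictor value x:
    a linear term plus len(knots)-2 nonlinear terms, constructed so
    the curve is cubic between knots and linear in both tails."""
    p3 = lambda u: max(u, 0.0) ** 3          # truncated cubic (u)_+^3
    t, denom = knots, knots[-1] - knots[-2]
    terms = [x]
    for j in range(len(t) - 2):
        terms.append(p3(x - t[j])
                     - p3(x - t[-2]) * (t[-1] - t[j]) / denom
                     + p3(x - t[-1]) * (t[-2] - t[j]) / denom)
    return terms
```

Each term enters the regression as an ordinary column, so the model stays linear in its coefficients; beyond the last knot the cubic and quadratic pieces cancel exactly, which is what keeps the tails linear.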

- Regression models are **not** ML (though they do fall under *statistical learning*)
- Sound of machine learning posing as logistic regression (courtesy of Maarten van Smeden)

- No probability model for data
- Empirical without favoring additivity
- Algorithmic
- Can deal with high-order interactions
- Allows for non-linearity
- Suffers from lack of assumptions
- Examples: neural net (deep learning), recursive partitioning, random forest, SVM
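For intuition on the "algorithmic" point: the core move of recursive partitioning is a greedy search for the split minimizing impurity, repeated within each resulting node. A toy single-split version for binary labels with the Gini criterion (helper names are mine):

```python
def gini(ys):
    """Gini impurity of a list of 0/1 labels."""
    if not ys:
        return 0.0
    p = sum(ys) / len(ys)
    return 2 * p * (1 - p)

def best_split(xs, ys):
    """Try each observed value as a cut point and return the
    (cut, impurity) pair with the lowest weighted Gini impurity."""
    best_cut, best_imp = None, gini(ys)
    for cut in sorted(set(xs)):
        left = [y for x, y in zip(xs, ys) if x <= cut]
        right = [y for x, y in zip(xs, ys) if x > cut]
        imp = (len(left) * gini(left) + len(right) * gini(right)) / len(ys)
        if imp < best_imp:
            best_cut, best_imp = cut, imp
    return best_cut, best_imp

best_split([1, 2, 3, 4], [0, 0, 1, 1])  # (2, 0.0): perfect split at x <= 2
```

Because later splits condition on earlier ones, stacking such splits captures high-order interactions without pre-specifying them, and no probability model for the data is assumed.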

- Very high S:N settings (visual and sound pattern recognition) and infinite S:N settings (games, e.g. Go and chess)
  - makes it safe to effectively estimate a large number of parameters

- Also when unlimited training with exact replications is possible (games)
- Very large n
- Outcome is almost deterministic (two identical subjects will have the same outcomes)

- Lower S:N e.g. diagnosing ovarian cancer from clinical signs, symptoms, biomarkers
- Outcome is stochastic
- Predominantly additive effects
- Lower n

- Clinical researchers are getting less impressed with ML in typical clinical prediction problems
- Multiple comparative studies are showing that gains from ML in low S:N settings are modest

John Zech `medium.com/@jrzech`