EPICLIN2021 / JSCLCC28

2021-06-09

- Classifier: a method providing only categorical predictions
- Classification is a premature decision; a forced choice
- Inconsistent with optimal decision making unless true patient-specific utilities known by analyst
- Best for deterministic outcomes occurring frequently
- Use when probabilities of class membership are all near 0 or 1

- Predictions are separate from decisions & can be used by any decision maker
- When outcome incidence is near 0 or 1, deal with *tendencies* (probabilities)
- ML too often uses classification and discards observations to achieve class balance (!)

- *Modern modelling techniques are data hungry* (van der Ploeg, Austin, Steyerberg), where p = number of candidate predictors:
- SM: n = 20p
- ML: n = 200p
- Single recursive partitioning tree: n > 200p

- Estimate a **single** correlation coefficient: n = 400 for MOE \(\pm 0.1\)
- Estimate **only the intercept** in a logistic model: n = 96 for MOE \(\pm 0.1\)
- Estimate \(\sigma\) in a linear model: n = 70 for MMOE 1.2
- Estimate misclassification probability: n = 96 for MOE \(\pm 0.1\)
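Two of these figures can be reproduced with standard large-sample formulas. A minimal Python sketch (function names are mine, not from the talk), assuming the worst case p = 0.5 for the proportion and r near 0 for the correlation:

```python
import math

def moe_proportion(n, p=0.5, z=1.96):
    """Half-width of a 0.95 normal-approximation CI for a proportion
    (worst case p = 0.5)."""
    return z * math.sqrt(p * (1 - p) / n)

def moe_correlation(n, z=1.96):
    """Approximate half-width of a 0.95 CI for a correlation near r = 0,
    via Fisher's z transform (SE = 1/sqrt(n - 3))."""
    return math.tanh(z / math.sqrt(n - 3))

print(f"{moe_proportion(96):.3f}")    # proportion at n = 96
print(f"{moe_correlation(400):.3f}")  # correlation at n = 400
```

Both come out at roughly \(\pm 0.1\), matching the n = 96 and n = 400 figures above.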

- Select the right variables from a large number: n=\(\infty\)
- Estimate misclassification probability with feature selection or large p: n \(\gg\) 96
- If sample size is not large in comparison with p, it may be insufficient for
- choosing the optimum penalty
- estimating model performance
- estimating variable importance measures

If n is too small to do something simple, it is too small to do something complex

- Probability model for data
- Default assumption of additivity of predictor effects
- Interactions usually must be pre-specified
- Model may be very high dimensional if penalization used
- Very easy to allow for non-linearity
- Suffers from assumptions
- semiparametric models a great help

- Regression models are **not** ML (though they do fall under *statistical learning*)
- The sound of machine learning posing as logistic regression (courtesy of Maarten van Smeden):

> When we raise money it’s AI,
> when we hire it’s machine learning, and
> when we do the work it’s logistic regression.
>
> —Juan Miguel Lavista `@BDataScientist`

- No probability model for data
- Empirical without favoring additivity
- Algorithmic
- Can deal with high-order interactions
- Allows for non-linearity
- Suffers from lack of assumptions
- Examples: neural net (deep learning), recursive partitioning, random forest, SVM

- Very high S:N settings (visual and sound pattern recognition) and effectively infinite S:N settings (games, e.g. Go and chess)
- make it safe to estimate a large number of parameters

- Also when unlimited training with exact replications is possible (games)
- Very large n
- Outcome is almost deterministic (two identical subjects will have the same outcomes)

- Lower S:N e.g. diagnosing ovarian cancer from clinical signs, symptoms, biomarkers
- Outcome is stochastic
- Predominantly additive effects
- Lower n

- Clinical researchers are getting less impressed with ML in typical clinical prediction problems
- Multiple comparative studies are showing that the gains from ML in low-S:N settings are modest

John Zech `medium.com/@jrzech`