Let \(a\) = deviance of the full model and \(b\) = deviance of the intercept-only model. The likelihood ratio (LR) \(\chi^{2}\) statistic is \(LR = b - a\).

Let \(k\) = total number of parameters in the model

Let \(p\) = number of non-intercept parameters (regression coefficients) in the model

Then \(\text{AIC} = a + 2k\)

McFadden’s adjusted \(R^2\) is \(1 - \frac{a + 2k}{b}\)

I can’t find justification for the factor 2.
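As a quick numerical check of the McFadden formula, here is a minimal sketch in Python. The function names and the deviances are hypothetical. It computes the adjusted \(R^2\) directly from \(a\), \(b\), and \(k\). It also computes it in the usual log-likelihood form \(1 - (\ell_{full} - k)/\ell_{null}\), using deviance \(= -2 \times\) log-likelihood, to show where the factor 2 comes from when the formula is written on the deviance scale.

```python
def mcfadden_adjusted_r2(a: float, b: float, k: int) -> float:
    """McFadden's adjusted R^2 from deviances (hypothetical helper).

    a: deviance of the full model
    b: deviance of the intercept-only model
    k: total number of parameters in the full model
    """
    aic = a + 2 * k  # AIC = a + 2k
    return 1 - aic / b


def mcfadden_adjusted_r2_loglik(ll_full: float, ll_null: float, k: int) -> float:
    """Same quantity written with log-likelihoods (deviance = -2 * log-likelihood)."""
    return 1 - (ll_full - k) / ll_null


# Hypothetical deviances, for illustration only
a, b, k = 120.0, 200.0, 4
r2_dev = mcfadden_adjusted_r2(a, b, k)
r2_ll = mcfadden_adjusted_r2_loglik(-a / 2, -b / 2, k)
print(r2_dev, r2_ll)  # both approximately 0.36
```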

Maddala-Cox-Snell (MCS) \(R^2\): \(1 - \exp(-LR/n)\), where \(n\) is the sample size
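One way to see why MCS is a natural generalization: for a Gaussian linear model with \(\sigma^2\) estimated by maximum likelihood, \(LR = n \log(\text{SST}/\text{SSE})\), so \(1 - \exp(-LR/n) = 1 - \text{SSE}/\text{SST}\), the ordinary \(R^2\). The sketch below checks this on made-up data. The helper names are hypothetical.

```python
import math


def mcs_r2(lr: float, n: int) -> float:
    """Maddala-Cox-Snell R^2 = 1 - exp(-LR/n)."""
    return 1 - math.exp(-lr / n)


def simple_ols_sums(x, y):
    """Residual (SSE) and total (SST) sums of squares for y ~ x by least squares."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = my - slope * mx
    sse = sum((yi - intercept - slope * xi) ** 2 for xi, yi in zip(x, y))
    sst = sum((yi - my) ** 2 for yi in y)
    return sse, sst


# Made-up data, for illustration only
x = [1, 2, 3, 4, 5, 6]
y = [1.2, 1.9, 3.4, 3.8, 5.1, 6.3]
n = len(y)
sse, sst = simple_ols_sums(x, y)

# Gaussian LR chi-square with sigma^2 profiled out at its ML estimate SSE/n
lr = n * math.log(sst / sse)
print(mcs_r2(lr, n), 1 - sse / sst)  # the two agree up to rounding
```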

For some models, the MCS \(R^2\) cannot attain a value of 1.0 even with a perfect model. The Nagelkerke \(R^2\) (\(R^{2}_{N}\)) divides the MCS \(R^2\) by its maximum attainable value, \(1 - \exp(-b/n)\). For a binary logistic example, suppose there is one balanced binary predictor \(x\) and \(y = x\). The MCS \(R^2\) is 0.75 and \(R^{2}_{N} = 1.0\) for predicting \(y\) from itself. But there is controversy over whether \(R^{2}_{N}\) is properly recalibrated over its whole range, and its use of \(n\) is not appropriate for censored data, so we don’t use \(R^{2}_{N}\) below.
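The \(y = x\) example can be reproduced directly. The intercept-only deviance of a balanced binary \(Y\) is \(b = 2n\log 2\). The limiting deviance of a perfect fit is \(a = 0\), so \(LR = b\) and the MCS \(R^2\) is \(1 - \exp(-2\log 2) = 0.75\). The sketch below uses hypothetical helper names and \(n = 100\).

```python
import math


def mcs_r2(lr: float, n: int) -> float:
    """Maddala-Cox-Snell R^2."""
    return 1 - math.exp(-lr / n)


def nagelkerke_r2(lr: float, b: float, n: int) -> float:
    """MCS R^2 divided by its maximum attainable value 1 - exp(-b/n)."""
    return mcs_r2(lr, n) / (1 - math.exp(-b / n))


def binary_null_deviance(y) -> float:
    """-2 * log-likelihood of the intercept-only model for a 0/1 response."""
    n, n1 = len(y), sum(y)
    n0 = n - n1
    ll = 0.0
    if n1:
        ll += n1 * math.log(n1 / n)
    if n0:
        ll += n0 * math.log(n0 / n)
    return -2 * ll


y = [0, 1] * 50          # balanced binary Y (x is identical to y)
n = len(y)
b = binary_null_deviance(y)  # equals 2 n log 2
a = 0.0                  # limiting deviance of the perfect fit y = x
lr = b - a
print(round(mcs_r2(lr, n), 6), round(nagelkerke_r2(lr, b, n), 6))  # 0.75 1.0
```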

The idea of adjustment is to avoid rewarding \(R^2\) for overfitting. The most commonly used adjusted \(R^2\) for linear models is \(1 - (1 - R^{2})\frac{n-1}{n-p-1}\). It is obtained by replacing the maximum likelihood estimate of the residual variance, \(\frac{1}{n}\sum_{i=1}^{n} r^{2}_{i}\), with the unbiased estimate \(\frac{\sum_{i=1}^{n} r^{2}_{i}}{n-p-1}\), where \(r_i\) is the \(i\)th residual. The variance of \(Y\) is likewise estimated with divisor \(n-1\).
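A small check that the two descriptions of the linear-model adjusted \(R^2\) agree. One uses the closed-form multiplier \(\frac{n-1}{n-p-1}\). The other divides the residual and total sums of squares by their unbiased divisors. The sums of squares here are hypothetical numbers.

```python
def adjusted_r2(r2: float, n: int, p: int) -> float:
    """1 - (1 - R^2) (n - 1) / (n - p - 1)."""
    return 1 - (1 - r2) * (n - 1) / (n - p - 1)


# Hypothetical residual and total sums of squares
sse, sst, n, p = 30.0, 100.0, 25, 3
r2 = 1 - sse / sst

via_formula = adjusted_r2(r2, n, p)
# Unbiased residual variance over unbiased variance of Y
via_unbiased = 1 - (sse / (n - p - 1)) / (sst / (n - 1))
print(via_formula, via_unbiased)
```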

Because \(1 - R^2 = \exp(-LR/n)\) for the MCS \(R^2\), carrying this adjustment to MCS gives \(1 - \exp(-LR/n)\frac{n-1}{n-p-1} = 1 - \exp(-LR/n + \log\frac{n-1}{n-p-1}) = 1 - \exp(-(LR - n \log\frac{n-1}{n-p-1}) / n)\).

\(n \log\frac{n-1}{n-p-1} = n \log(1 + \frac{p}{n-p-1}) \approx \frac{np}{n-p-1} \approx p\).
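To see how good these approximations are, the sketch below tabulates the exact penalty \(n\log\frac{n-1}{n-p-1}\) next to \(\frac{np}{n-p-1}\) and \(p\) for a few hypothetical \((n, p)\) pairs. It also confirms that the multiplicative and penalized-\(LR\) forms of the adjusted MCS \(R^2\) above are the same quantity. All function names are hypothetical.

```python
import math


def exact_penalty(n: int, p: int) -> float:
    """n * log((n - 1) / (n - p - 1)): LR penalty implied by the linear-model adjustment."""
    return n * math.log((n - 1) / (n - p - 1))


def adjusted_mcs_multiplicative(lr: float, n: int, p: int) -> float:
    """1 - exp(-LR/n) * (n - 1) / (n - p - 1)."""
    return 1 - math.exp(-lr / n) * (n - 1) / (n - p - 1)


def adjusted_mcs_penalized(lr: float, n: int, p: int) -> float:
    """1 - exp(-(LR - n log((n - 1)/(n - p - 1))) / n)."""
    return 1 - math.exp(-(lr - exact_penalty(n, p)) / n)


for n in (50, 200, 1000):
    for p in (1, 5, 10):
        # exact penalty, first-order approximation n p / (n - p - 1), and p itself
        print(n, p, round(exact_penalty(n, p), 3), round(n * p / (n - p - 1), 3), p)
```

The exact penalty is always a little below \(\frac{np}{n-p-1}\) (since \(\log(1+x) < x\)) and approaches \(p\) as \(n\) grows relative to \(p\).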

So applying the linear-model adjusted \(R^2\) to the MCS \(R^2\) gives approximately \(1 - \exp(-(LR - p) / n)\). This is sensible because under the global null hypothesis of no association between any of the \(X\)s and \(Y\), \(LR\) has an approximate \(\chi^2\) distribution with \(p\) degrees of freedom, so its expected value is \(p\). Thus \(LR - p\) is a chance correction for \(LR\).
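A minimal sketch of the chance-corrected MCS \(R^2\) (the function name is hypothetical). When \(LR\) equals its null expectation \(p\), the adjusted value is exactly 0. When \(LR < p\), it goes negative, just as the linear-model adjusted \(R^2\) can.

```python
import math


def mcs_r2_adj(lr: float, n: int, p: int) -> float:
    """MCS R^2 with the LR - p chance correction: 1 - exp(-(LR - p)/n)."""
    return 1 - math.exp(-(lr - p) / n)


n, p = 200, 5
print(mcs_r2_adj(p, n, p))     # LR at its null expectation p -> 0.0
print(mcs_r2_adj(3.0, n, p))   # LR below p -> negative
print(mcs_r2_adj(40.0, n, p))  # smaller than the unadjusted 1 - exp(-40/200)
```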

Besides \(R^{2}_{N}\), the R `rms` package implements four types of MCS \(R^2\), computed by the `Hmisc` package `R2Measures` function. Either of the two \(p\) adjustments (the exact \(n \log\frac{n-1}{n-p-1}\) or its approximation \(p\)) can be used, with the \(LR - p\) method being the default. This is a slight modification of the Mittlbock & Schemper approach (see references). The first two measures use the raw sample size \(n\) and the second two use the effective sample size \(m\). The effective sample size \(m\) is taken to be the following:

- For right-censored time-to-event data (survival analysis), \(m\) is the number of uncensored observations (number of events). This is exactly correct when the survival distribution is exponential or when the context is the Cox-logrank two-sample test for comparing survival distributions. For front-loaded hazard functions, where instantaneous event rates are very high at the beginning of follow-up, censored observations also carry information, so \(m\) should be between the number of events and \(n\). There is currently no guidance on exactly how to estimate \(m\) in this case. See the Benedetti reference.
- For a binary, ordinal, semi-continuous, or continuous uncensored response variable \(Y\), the effective sample size is taken as the sample size \(m < n\) of a continuous variable that makes the approximate variance of the log odds ratio in a proportional odds model equal to the variance from the original \(Y\) of size \(n\). This also makes the power of the Wilcoxon test for the smaller continuous \(Y\) equivalent to that for the larger \(Y\) with ties. This approach is due to Whitehead and is a good approximation for the binary \(Y\) case. Let \(y_{1}, y_{2}, \ldots, y_{k}\) be the distinct values of \(Y\) and \(p_{1}, p_{2}, \ldots, p_{k}\) the proportions of \(Y\) values equal to each of them. The effective sample size is \(m = n(1 - \sum_{i=1}^{k} p_{i}^{3})\). The multiplier of \(n\) is what the R `Hmisc` `describe` function computes as its `Info` information measure, and this \(m\) is used in the `Hmisc` `popower` function.
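Both effective sample size rules above can be written in a few lines. This is a sketch with hypothetical function names, not the `Hmisc` code. Note that \(p_i\) and \(k\) here are the category proportions and the number of distinct values from the bullet above, not the regression quantities defined earlier. A balanced binary \(Y\) gives \(m = 0.75n\). A continuous \(Y\) with no ties gives \(m = n(1 - 1/n^2)\), essentially \(n\).

```python
from collections import Counter


def effective_n_uncensored(y) -> float:
    """Whitehead-style m = n * (1 - sum_i p_i^3), p_i = proportion at distinct value i."""
    n = len(y)
    counts = Counter(y)
    return n * (1 - sum((c / n) ** 3 for c in counts.values()))


def effective_n_right_censored(event) -> int:
    """m = number of uncensored observations; event is a 0/1 indicator per subject."""
    return sum(event)


print(effective_n_uncensored([0, 1] * 50))          # balanced binary: 75.0
print(effective_n_uncensored(list(range(100))))     # no ties: approximately 99.99
print(effective_n_right_censored([1, 0, 1, 1, 0]))  # 3 events
```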

There are four MCS-based \(R^2\) measures, produced in this order by the `R2Measures` function:

- \(R^{2}_{n}\): original MCS \(R^2\)
- \(R^{2}_{p,n}\): adjusted for estimating \(p\) regression coefficients (non-intercepts)
- \(R^{2}_{m}\): use effective instead of actual sample size, don’t penalize for overfitting
- \(R^{2}_{p,m}\): effective sample size adjusted for \(p\) estimated regression coefficients
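The four measures can be sketched as follows, using the default \(LR - p\) adjustment. This is a hypothetical illustration of the formulas, not the `R2Measures` implementation or its interface. The dictionary keys mirror the bullet names. The example assumes a balanced binary \(Y\) with \(n = 100\), so \(m = 75\).

```python
import math


def r2_measures(lr: float, n: int, p: int, m: float) -> dict:
    """Four MCS-based R^2 measures (hypothetical sketch, LR - p adjustment).

    lr: likelihood ratio chi-square (b - a)
    n:  actual sample size
    p:  number of non-intercept parameters
    m:  effective sample size
    """
    return {
        "R2_n": 1 - math.exp(-lr / n),         # original MCS R^2
        "R2_pn": 1 - math.exp(-(lr - p) / n),  # adjusted for p coefficients
        "R2_m": 1 - math.exp(-lr / m),         # effective sample size, no penalty
        "R2_pm": 1 - math.exp(-(lr - p) / m),  # effective sample size and p adjustment
    }


# Hypothetical binary-Y fit: n = 100 balanced, so m = 0.75 * n = 75
for name, value in r2_measures(lr=30.0, n=100, p=3, m=75.0).items():
    print(name, round(value, 4))
```

Because \(m \le n\), the \(m\)-based measures are larger than their \(n\)-based counterparts for the same \(LR\).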

Note that when comparing the performance of a binary \(Y\) model with that of an ordinal \(Y\) model it is not appropriate to use a measure based on \(m\). That is because the ordinal model is charged with a more difficult prediction task but would be penalized for a higher effective sample size.

- https://www.glmj.org/archives/articles/Smith_v39n2.pdf
- https://stats.oarc.ucla.edu/other/mult-pkg/faq/general/faq-what-are-pseudo-r-squareds
- https://stats.stackexchange.com/questions/82105/mcfaddens-pseudo-r2-interpretation
- http://eml.berkeley.edu/~mcfadden/travel.html especially https://eml.berkeley.edu/~mcfadden/travel/ch5.pdf
- http://thestatsgeek.com/2014/02/08/r-squared-in-logistic-regression
- https://statisticalhorizons.com/r2logistic
- https://support.sas.com/resources/papers/proceedings/proceedings/sugi25/25/st/25p256.pdf
- Mittlbock and Schemper, who mention van Houwelingen’s idea of correcting the log-likelihood by \((p+1)/2\) in the numerator and \(1/2\) in the denominator of McFadden’s \(R^2\). This is not consistent with AIC but is consistent with a chance correction for the \(\chi^2\) statistic.
- R `Hmisc` package `R2Measures` function
- Benedetti
- Whitehead
- https://stats.stackexchange.com/questions/48703