Let \(a\) = deviance of the full model and \(b\) = deviance of the intercept-only model, so that LR \(\chi^{2} = b - a\).
Let \(k\) = total number of parameters in the model.
Let \(p\) = number of non-intercept parameters in the model.
Then \(\mathrm{AIC} = a + 2k\).
McFadden’s adjusted \(R^2\) is \(1 - \frac{a + 2k}{b}\).
I can’t find a justification for the factor of 2.
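To make the definitions concrete, here is a minimal R sketch computing these quantities from a fitted binary logistic model; the simulated data and variable names are hypothetical, not from the original example.

```r
# Minimal sketch: McFadden's R^2 and its adjusted version from deviances.
# Simulated data; all variable names here are hypothetical.
set.seed(1)
n  <- 200
x1 <- rnorm(n)
x2 <- rnorm(n)
y  <- rbinom(n, 1, plogis(x1 - 0.5 * x2))

fit  <- glm(y ~ x1 + x2, family = binomial)
null <- glm(y ~ 1,       family = binomial)

a <- deviance(fit)       # deviance of the full model
b <- deviance(null)      # deviance of the intercept-only model
k <- length(coef(fit))   # total number of parameters (intercept, x1, x2)

lr <- b - a                              # LR chi-square
r2.mcfadden     <- 1 - a / b
r2.mcfadden.adj <- 1 - (a + 2 * k) / b   # note: a + 2k equals AIC(fit)
c(LR = lr, R2 = r2.mcfadden, R2adj = r2.mcfadden.adj)
```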
Maddala-Cox-Snell (MCS) \(R^2\): \(1 - \exp(-LR/n)\), where \(n\) is the sample size.
For some models, the MCS \(R^2\) cannot attain a value of 1.0 even with a perfect model. The Nagelkerke \(R^2\) (\(R^{2}_{N}\)) divides the MCS \(R^2\) by its maximum attainable value, which is \(1 - \exp(-b/n)\). For a binary logistic example, suppose there is one balanced binary predictor \(x\) and \(y = x\). The MCS \(R^2\) is 0.75 and \(R^{2}_{N} = 1.0\) for predicting \(y\) from itself. But there is controversy over whether \(R^{2}_{N}\) is properly recalibrated over its whole range, and its use of \(n\) doesn’t apply to censored data, so we don’t use \(R^{2}_{N}\) below.
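A quick numeric check of this example, as a sketch; the choice \(n = 100\) is arbitrary, and the deviances follow from the formulas above rather than from a model fit, since a perfect fit would be separable.

```r
# Numeric check of the y = x example with a balanced binary predictor.
# n is arbitrary; deviances are computed from first principles because a
# perfect fit would cause separation warnings in glm().
n  <- 100                    # 50 observations with x = 0, 50 with x = 1
b  <- 2 * n * log(2)         # intercept-only deviance when Pr(y = 1) = 1/2
a  <- 0                      # full-model deviance: y is predicted perfectly
lr <- b - a
r2.mcs <- 1 - exp(-lr / n)   # 1 - exp(-2 log 2) = 1 - 1/4 = 0.75
r2.max <- 1 - exp(-b / n)    # maximum attainable MCS R^2, also 0.75
r2.nag <- r2.mcs / r2.max    # Nagelkerke R^2 = 1.0
c(MCS = r2.mcs, max = r2.max, Nagelkerke = r2.nag)
```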
The idea of adjustment is to not reward \(R^2\) for overfitting. The most commonly used adjusted \(R^2\) with linear models is \(1 - (1 - R^{2})\frac{n-1}{n-p-1}\), which is obtained by replacing the biased estimate of the residual variance with the unbiased estimate \(\frac{\sum_{i=1}^{n} r^{2}_{i}}{n-p-1}\), where \(r_i\) is the \(i\)th residual (and similarly using \(n-1\) for the variance of \(Y\)).
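As a sketch with simulated data and hypothetical variable names, the identity between the two forms of the linear-model adjusted \(R^2\) can be verified numerically:

```r
# Sketch verifying the adjusted R^2 identity on a small linear model.
# Simulated data; variable names hypothetical.
set.seed(2)
n <- 50; p <- 3
X <- matrix(rnorm(n * p), n, p)
y <- X %*% c(1, 0.5, 0) + rnorm(n)
fit <- lm(y ~ X)

r2   <- summary(fit)$r.squared
sse  <- sum(residuals(fit)^2)
sst  <- sum((y - mean(y))^2)
adj1 <- 1 - (1 - r2) * (n - 1) / (n - p - 1)        # usual formula
adj2 <- 1 - (sse / (n - p - 1)) / (sst / (n - 1))   # unbiased variance ratio
c(adj1, adj2, summary(fit)$adj.r.squared)           # all three agree
```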
Carrying this to MCS gives \(1 - \exp(-LR/n)\frac{n-1}{n-p-1} = 1 - \exp(-LR/n + \log\frac{n-1}{n-p-1}) = 1 - \exp(-(LR - n \log\frac{n-1}{n-p-1}) / n)\).
\(n \log\frac{n-1}{n-p-1} = n \log(1 + \frac{p}{n-p-1}) \approx \frac{np}{n-p-1} \approx p\).
So applying linear model adjusted \(R^2\) to MCS \(R^2\) is approximately \(1 - \exp(-(LR - p) / n)\). This is sensible because under the global null hypothesis of no associations between any X’s and Y the expected value of \(LR\) is \(p\). Thus \(LR - p\) is a chance correction for \(LR\).
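A small numeric illustration of how close the approximation is; the values of \(LR\), \(p\), and \(n\) below are made up for illustration:

```r
# Sketch comparing the exact carried-over adjustment with the LR - p
# approximation for MCS R^2; all input values are illustrative.
n <- 300; p <- 5; lr <- 60
exact  <- 1 - exp(-lr / n) * (n - 1) / (n - p - 1)
approx <- 1 - exp(-(lr - p) / n)
c(exact = exact, approx = approx)   # nearly identical for moderate n
```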
Besides \(R^{2}_{N}\), the R rms package implements 4 types of MCS \(R^2\) computed by the Hmisc package R2Measures function. Either of the two \(p\) adjustments can be used, with the \(LR - p\) method being the default; this is a slight modification of the Mittlböck & Schemper approach (see references). The first two measures use the raw sample size \(n\) and the second two use the effective sample size \(m\). The effective sample size \(m\) is taken to be \(n\) multiplied by the Info information measure computed by the R Hmisc describe function, and this \(m\) is used in the Hmisc popower function. There are four MCS-based \(R^2\) measures, produced in order by the R2Measures function.
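A sketch of calling R2Measures directly; the argument names (lr, p, n, ess) reflect my reading of the Hmisc documentation and should be checked against ?R2Measures in your installed version, and the inputs are illustrative numbers.

```r
# Sketch: the four MCS-based R^2 measures from Hmisc::R2Measures.
# Argument names (lr, p, n, ess) are assumed from the Hmisc help page;
# verify with ?R2Measures. All input values are illustrative.
library(Hmisc)
lr <- 60    # LR chi-square for the fitted model
p  <- 5     # number of non-intercept parameters
n  <- 300   # raw sample size
m  <- 240   # effective sample size (illustrative)
R2Measures(lr, p, n, ess = m)
```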
Note that when comparing the performance of a binary \(Y\) model with that of an ordinal \(Y\) model, it is not appropriate to use a measure based on \(m\). That is because the ordinal model is charged with a more difficult prediction task yet would be penalized for having a higher effective sample size.
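To see why, here is a back-of-the-envelope sketch: with the same \(LR\), the model whose outcome carries more information gets a larger \(m\) and therefore a smaller \(m\)-based \(R^2\). The effective sample sizes below are invented for illustration.

```r
# With equal LR, a larger effective sample size m lowers 1 - exp(-LR/m);
# the values of m here are invented for illustration.
lr <- 60
m.binary  <- 180   # binary Y: less information, smaller m
m.ordinal <- 280   # ordinal Y: more information, larger m
c(binary  = 1 - exp(-lr / m.binary),
  ordinal = 1 - exp(-lr / m.ordinal))   # ordinal model looks worse
```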