# Pseudo $$R^2$$ Measures

Let a = deviance of the full model and b = deviance of the intercept-only model. LR $$\chi^{2} = b - a$$

Let k = total number of parameters in the model

Let p = number of non-intercepts in the model

Then AIC = a + 2k

McFadden’s adjusted $$R^2$$ is $$1 - \frac{a + 2k}{b}$$

I can’t find justification for the factor 2.
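As a quick arithmetic check of the definitions above, the following Python sketch computes the LR statistic, AIC, and McFadden's adjusted $$R^2$$ from two deviances. The function name and the numbers (`a = 120`, `b = 150`, `k = 4`) are hypothetical and chosen only for illustration; they do not come from a real fit.

```python
def mcfadden_adjusted_r2(a, b, k):
    """McFadden's adjusted R^2 on the deviance scale: 1 - (a + 2k) / b."""
    return 1.0 - (a + 2 * k) / b

# Hypothetical values, not taken from any real fit
a = 120.0   # deviance of the full model
b = 150.0   # deviance of the intercept-only model
k = 4       # total number of parameters, including intercepts

lr = b - a                                  # likelihood ratio chi-square
aic = a + 2 * k                             # AIC as defined in the text
r2_mcf_adj = mcfadden_adjusted_r2(a, b, k)  # equals 1 - AIC / b

print(lr, aic, round(r2_mcf_adj, 4))
```

Written this way, the adjusted measure is simply one minus the ratio of the full model's AIC to the intercept-only deviance.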

Maddala-Cox-Snell (MCS) $$R^2$$: $$1 - \exp(-LR/n)$$

For some models, the MCS $$R^2$$ cannot attain a value of 1.0 even with a perfect model. The Nagelkerke $$R^2$$ ($$R^{2}_{N}$$) divides the MCS $$R^2$$ by its maximum attainable value, which is $$1 - \exp(-b/n)$$. For a binary logistic example, suppose there is one binary predictor x that is balanced, and y=x. Then $$b = 2n \log 2$$, so for predicting y from itself the MCS $$R^2$$ is $$1 - \exp(-2 \log 2) = 0.75$$ while $$R^{2}_{N}=1.0$$. But there is controversy over whether $$R^{2}_{N}$$ is properly recalibrated over its whole range, and its use of $$n$$ doesn’t apply to censored data, so we don’t use $$R^{2}_{N}$$ below.
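The balanced binary example can be reproduced numerically. The sketch below is a minimal illustration under two assumptions: the perfect model's deviance is taken as 0 (its limiting value), and the sample size `n = 100` is arbitrary, since it cancels out. The function names are made up for this example.

```python
import math

def mcs_r2(lr, n):
    """Maddala-Cox-Snell R^2 from the LR chi-square and the sample size."""
    return 1.0 - math.exp(-lr / n)

def nagelkerke_r2(lr, null_deviance, n):
    """MCS R^2 divided by its maximum attainable value 1 - exp(-b/n)."""
    return mcs_r2(lr, n) / (1.0 - math.exp(-null_deviance / n))

n = 100                         # hypothetical sample size; cancels out below
b = -2.0 * n * math.log(0.5)    # intercept-only deviance when P(y = 1) = 0.5
a = 0.0                         # assumed deviance of the perfect model (limit)
lr = b - a

r2_mcs = mcs_r2(lr, n)
r2_nag = nagelkerke_r2(lr, b, n)
print(round(r2_mcs, 4), round(r2_nag, 4))
```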

# Adjusted $$R^2$$ Measures

The idea of adjustment is to not reward $$R^2$$ for overfitting. The most commonly used adjusted $$R^2$$ with linear models is $$1 - (1 - R^{2})\frac{n-1}{n-p-1}$$, which is obtained by replacing the biased estimate of the residual variance with the unbiased estimate $$\frac{\sum_{i=1}^{n} r^{2}_{i}}{n-p-1}$$, where $$r_{i}$$ is the $$i$$th residual.

Carrying this to MCS gives $$1 - \exp(-LR/n)\frac{n-1}{n-p-1} = 1 - \exp(-LR/n + \log\frac{n-1}{n-p-1}) = 1 - \exp(-(LR - n \log\frac{n-1}{n-p-1}) / n)$$.

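$$n \log\frac{n-1}{n-p-1} = n \log(1 + \frac{p}{n-p-1}) \approx \frac{np}{n-p-1} \approx p$$, where the first approximation uses $$\log(1 + x) \approx x$$ for small $$x$$ and the second holds when $$n$$ is large relative to $$p$$.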

So applying the linear model $$R^2$$ adjustment to the MCS $$R^2$$ gives approximately $$1 - \exp(-(LR - p) / n)$$. This is sensible because under the global null hypothesis of no associations between any X’s and Y the expected value of $$LR$$ is approximately $$p$$. Thus $$LR - p$$ is a chance correction for $$LR$$.
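To see how close the two forms are in practice, here is a small numerical sketch. The values `lr = 40`, `n = 200`, `p = 5` and the function names are hypothetical; the code only checks the algebra of the derivation above and the quality of the approximation for one setting.

```python
import math

def mcs_adjusted_exact(lr, n, p):
    """Linear-model style adjustment: 1 - exp(-LR/n) * (n-1)/(n-p-1)."""
    return 1.0 - math.exp(-lr / n) * (n - 1) / (n - p - 1)

def mcs_adjusted_approx(lr, n, p):
    """Chance-corrected form: 1 - exp(-(LR - p)/n)."""
    return 1.0 - math.exp(-(lr - p) / n)

lr, n, p = 40.0, 200, 5                              # hypothetical values
penalty = n * math.log((n - 1) / (n - p - 1))        # exact penalty on LR
exact = mcs_adjusted_exact(lr, n, p)
via_penalty = 1.0 - math.exp(-(lr - penalty) / n)    # same quantity, rewritten
approx = mcs_adjusted_approx(lr, n, p)

print(round(penalty, 3), round(exact, 4), round(approx, 4))
```

With these inputs the exact penalty is slightly larger than `p`, and the two adjusted values differ only in the fourth decimal place. The gap grows as `p` becomes large relative to `n`.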

# Adjusted Modified MCS $$R^2$$

Besides $$R^{2}_{N}$$, the R rms package implements four types of MCS $$R^2$$, computed by the Hmisc package R2Measures function. Either of the two $$p$$ adjustments can be used, with the $$LR - p$$ method being the default. This is a slight modification of the Mittlbock & Schemper approach (see references). The first two measures use the raw sample size $$n$$ and the last two use the effective sample size $$m$$. The effective sample size $$m$$ is taken to be the following:

• For right-censored time-to-event data (survival analysis) $$m$$ is the number of uncensored observations (number of events). This is exactly correct when the survival distribution is exponential or the context is the Cox-logrank two-sample test for comparing survival distributions. For front-loaded hazard functions, where instantaneous event rates are very high at the beginning of follow-up, censored observations convey more information and $$m$$ should be between the number of events and $$n$$. There is currently no guidance for exactly how to estimate $$m$$ in this case. See the Benedetti reference.
• For a binary, ordinal, semi-continuous, or continuous uncensored response variable $$Y$$, the effective sample size is taken as the sample size $$m < n$$ for a continuous variable that makes the approximate variance of the log odds ratio in a proportional odds model equal to the variance from the original $$Y$$ of size $$n$$. This also makes the power of the Wilcoxon test for the smaller continuous $$Y$$ and the larger $$Y$$ with ties equivalent. This approach is due to Whitehead and is a good approximation for the binary $$Y$$ case. Let $$y_{1}, y_{2}, \ldots, y_{k}$$ be the distinct values of $$Y$$ (here $$k$$ is the number of distinct values, not the parameter count defined earlier) and $$p_{1}, p_{2}, \ldots, p_{k}$$ be the proportions of $$Y$$ values occurring at these distinct values. The effective sample size is $$m = n(1 - \sum_{i=1}^{k} p_{i}^{3})$$. The multiplier for $$n$$ is what is computed as the Info information measure by the R Hmisc describe function, and this $$m$$ is used in the Hmisc popower function.
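The two definitions of $$m$$ above can be sketched in a few lines of Python. This is an illustration of the formulas as stated in the text, not the Hmisc code; the function names and the toy data are hypothetical.

```python
from collections import Counter

def effective_n_uncensored(y):
    """m = n * (1 - sum of cubed proportions of the distinct Y values)."""
    n = len(y)
    props = [count / n for count in Counter(y).values()]
    return n * (1.0 - sum(q ** 3 for q in props))

def effective_n_censored(event):
    """Number of events; `event` holds 1 for an event and 0 if censored."""
    return sum(1 for e in event if e)

# Toy data, for illustration only
m_binary = effective_n_uncensored([0] * 50 + [1] * 50)   # balanced binary Y
m_distinct = effective_n_uncensored(list(range(100)))    # 100 distinct values
m_events = effective_n_censored([1, 0, 1, 1, 0])         # 3 events, 2 censored

print(m_binary, round(m_distinct, 2), m_events)
```

For a balanced binary $$Y$$ the multiplier is $$1 - 2(0.5)^{3} = 0.75$$, while for a variable with no ties it is very close to 1, so $$m$$ is nearly $$n$$.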

There are four MCS-based $$R^2$$ measures, produced in order by the R2Measures function.

• $$R^{2}_{n}$$: original MCS $$R^2$$
• $$R^{2}_{p,n}$$: adjusted for estimating $$p$$ regression coefficients (non-intercepts)
• $$R^{2}_{m}$$: use effective instead of actual sample size, don’t penalize for overfitting
• $$R^{2}_{p,m}$$: effective sample size adjusted for $$p$$ estimated regression coefficients
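The four measures listed above can be written out as follows. This is a minimal sketch that assumes the $$LR - p$$ form of the adjustment and simply substitutes $$m$$ for $$n$$ in the last two measures; it is not the Hmisc implementation, and the function name, dictionary keys, and input values are hypothetical.

```python
import math

def mcs_r2_measures(lr, p, n, m):
    """Four MCS-type R^2 values from LR, p non-intercepts, n, and effective m."""
    return {
        "R2_n": 1.0 - math.exp(-lr / n),           # original MCS R^2
        "R2_p_n": 1.0 - math.exp(-(lr - p) / n),   # adjusted for p, using n
        "R2_m": 1.0 - math.exp(-lr / m),           # effective sample size m
        "R2_p_m": 1.0 - math.exp(-(lr - p) / m),   # adjusted for p, using m
    }

# Hypothetical inputs: LR = 40 on p = 5 d.f., n = 200, effective m = 150
r2 = mcs_r2_measures(lr=40.0, p=5, n=200, m=150.0)
for name, value in r2.items():
    print(name, round(value, 4))
```

Because $$m < n$$, the $$m$$-based measures are larger than their $$n$$-based counterparts for the same $$LR$$, and subtracting $$p$$ always lowers a measure relative to its unadjusted version.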

Note that when comparing the performance of a binary $$Y$$ model with that of an ordinal $$Y$$ model it is not appropriate to use a measure based on $$m$$. That is because the ordinal model is charged with a more difficult prediction task but would be penalized for a higher effective sample size.