If You Like the Wilcoxon Test You Must Like the Proportional Odds Model

ordinal
hypothesis-testing
2021
accuracy-score
RCT
regression
metrics
Since the Wilcoxon test is a special case of the proportional odds (PO) model, if one likes the Wilcoxon test, one must like the PO model. This is made more convincing by showing examples of how one may accurately compute the Wilcoxon statistic from the PO model’s odds ratio.
Author

Vanderbilt University
School of Medicine
Department of Biostatistics

Published

March 10, 2021

Clearly, the dependence of the proportional odds model on the assumption of proportionality can be over-stressed. Suppose that two different statisticians would cut the same three-point scale at different cut points. It is hard to see how anybody who could accept either dichotomy could object to the compromise answer produced by the proportional odds model. — Stephen Senn

Background

The Wilcoxon-Mann-Whitney two-sample rank-sum test is a special case of the proportional odds (PO) ordinal logistic regression model. The numerator of the PO model score \(\chi^2\) test for comparing two groups without covariate adjustment is exactly the Wilcoxon statistic. The equivalence of the PO model and the Wilcoxon test in this simple two-group setting is perhaps demonstrated more compellingly by showing how the Wilcoxon test statistic may be accurately approximated by a simple function of the odds ratio (OR) estimate from the PO model, even when PO is strongly violated.

In Violation of Proportional Odds is Not Fatal I used simulation to derive an accurate approximation to the Wilcoxon statistic from the PO model group 2 : group 1 odds ratio estimate (OR). When the Wilcoxon statistic is re-scaled to have a 0-1 range, i.e., to a concordance probability \(c\), the approximation is \[c = \frac{\mathrm{OR}^{0.66}}{1 + \mathrm{OR}^{0.66}}\]

Over a wide variety of simulated datasets, this approximation has a mean absolute error of 0.0021.

  • 1 Note that \(0.66 = \frac{1}{1.52}\) from the blog article.

  • For \(n\) overall observations, \(n_1\) in group 1 and \(n_2\) in group 2, let \(R\) denote the vector of ranks of all the observations ignoring group membership. In case of ties, midranks are used. The Wilcoxon rank-sum statistic \(W\) is based on the sum of ranks in group 2. Let \(X_{i}\) be [group = 2], the 1/0 indicator of being in group 2 for the \(i^{\mathrm{th}}\) observation.

    \[W = \sum_{i=1}^{n} X_{i} R_{i} - \frac{n_{2} (n_{2} + 1)}{2}\]

    Letting \(\bar{R}\) denote the mean of the group 2 ranks (\(\frac{1}{n_{2}} \sum_{i=1}^{n} X_{i}R_{i})\), the Wilcoxon statistic is proportional to

    \[c = \frac{\bar{R} - \frac{n_{2} + 1}{2}}{n_{1}}\] where \(c\) is the c-index or concordance probability. It is the proportion of all possible pairs of observations, one from group 2 and one from group 1, such that the the observation from group 2 is the larger of the two. Since midranks are used for ties, \(c\) counts a tied pair as \(\frac{1}{2}\) concordant. So letting \(Y_{1}\) and \(Y_{2}\) represent, respectively, random observations from groups 1 and 2, \(c\) estimates \(\Pr(Y_{2} > Y_{1}) + \frac{1}{2} \Pr(Y_{2} = Y_{1})\).

    There is an identity with Somers’ \(D_{yx}\) rank correlation, which is the probability of concordance minus the probability of discordance. \(D_{yx}\) can be also be written as \(D_{yx} = 2 \times (c - \frac{1}{2})\). Note that the R Hmisc package function rcorr.cens used below computes \(D_{xy}\), but \(D_{yx}(X, Y)\) is the same as \(D_{xy}(Y, X)\). \(D_{yx}\) means “discard ties on \(X\), let ties on \(Y\) count against us.” In a two-group comparison we are discarding ties on \(X\), i.e., are not comparing observations from group 1 with other observations in group 1.

    Discrete Ordinal Y Example

    Let’s go through the calculations and check the Wilcoxon PO OR-based approximation using data in which there is a severe violation of the PO assumption. We have three levels of \(Y\) (0, 1, 2), with the group 2 : group 1 OR for \(Y=2\) being 2.5 but the OR for \(Y \geq 1\) being 0.795. The compromise OR from assuming PO is 1.114.

    Code
    w <- expand.grid(group=1:2, y=0:2)
    n <- c(100, 110, 50, 10, 30, 60)
    u <- w[rep(1:6, n),]
    with(u, table(group, y))
         y
    group   0   1   2
        1 100  50  30
        2 110  10  60
    Code
    or2 <- exp(coef(lrm(y == 2 ~ group, data=u))['group'])
    or1 <- exp(coef(lrm(y >= 1 ~ group, data=u))['group'])
    or12 <- exp(coef(lrm(y ~ group, data=u))['group'])
    ors <- c(or2, or1, or12)
    names(ors) <- c('y=2', 'y>=1', 'y')
    ors
          y=2      y>=1         y 
    2.5000000 0.7954545 1.1135786 

    Now compute the Wilcoxon statistic

    Code
    wilcox.test(y ~ group, u, correct=FALSE)
    
        Wilcoxon rank sum test
    
    data:  y by group
    W = 15750, p-value = 0.6061
    alternative hypothesis: true location shift is not equal to 0
    Code
    sumr1 <- with(u, sum(rank(y)[group == 1]))
    sumr2 <- with(u, sum(rank(y)[group == 2]))
    n1 <- sum(u$group == 1)
    n2 <- sum(u$group == 2)
    # wilcox.test uses sum of ranks in group 1
    W <- sumr1 - n1 * (n1 + 1) / 2   # equals wilcox.test
    W
    [1] 15750
    Code
    # Going forward use sum of ranks in group 2
    W <- sumr2 - n2 * (n2 + 1) / 2

    Compute \(c\) three different ways.

    Code
    W / (n1 * n2)
    [1] 0.5138889
    Code
    with(u, (mean(rank(y)[group == 2]) - (n2 + 1) / 2) / n1)
    [1] 0.5138889
    Code
    b <- with(u, rcorr.cens(y, group))
    b
           C Index            Dxy           S.D.              n        missing 
      5.138889e-01   2.777778e-02   5.470021e-02   3.600000e+02   0.000000e+00 
        uncensored Relevant Pairs     Concordant      Uncertain 
      3.600000e+02   6.480000e+04   3.330000e+04   0.000000e+00 
    Code
    conc <- b['C Index']
    conc
      C Index 
    0.5138889 

    Now compare the concordance probability with the approximation from the PO-estimated OR:

    Code
    po <- ors['y']
    capprox <- po ^ 0.66 / (1 + po ^ 0.66)
    capprox
           y 
    0.517743 

    The approximation is off by 0.004.

    Check against the original regression equation fitted here:

    Code
    plogis((log(po) - 0.0003) / 1.5179)
            y 
    0.5176616 

    Continuous Y Example

    Now consider a two-sample problem with continuous Y. We could induce mild non-PO by sampling from two normal distributions with equal variance and a nonzero difference in means2. But let’s induce major non-PO by also allowing the variance in the two groups to be unequal. Draw a random sample of size 100 from a normal distribution with mean 0 and variance 1 and a second sample from a normal distribution with mean 1 and variance 0.2.

  • 2 Note that PO would hold if one simulated from a logistic distribution with a shift in location only.

  • Code
    set.seed(1)
    n1 <- n2 <- 100
    y1 <- rnorm(n1, 0, 1)
    y2 <- rnorm(n1, 1, sqrt(0.2))
    group <- c(rep(1, n1), rep(2, n2))
    y <- c(y1, y2)
    Ecdf(~ y, group=group, fun=qlogis, ylab='logit ECDF')

    Serious non-parallelism of the logit of the two empirical cumulative distributions means serious non-proportional odds. Let’s get the \(c\) index (concordance probability) and its approximation from the PO model, as before. We use the rms package orm function which is designed to efficiently analyze continuous Y. Here the model has 199 intercepts since there are no ties in the data. Below I have back-solved for the Wilcoxon test \(\chi^2\) statistic.

    Code
    conc <- (mean(rank(y)[group == 2]) - (n2 + 1) / 2) / n1
    w <- wilcox.test(y ~ group)
    wchisq <- qchisq(1 - w$p.value, 1)
    wchisq

    [1] 59.67072

    Code
    f <- orm(y ~ group)
    f

    Logistic (Proportional Odds) Ordinal Regression Model

     orm(formula = y ~ group)
     
    Model Likelihood
    Ratio Test
    Discrimination
    Indexes
    Rank Discrim.
    Indexes
    Obs 200 LR χ2 69.08 R2 0.292 ρ 0.548
    Distinct Y 200 d.f. 1 R21,200 0.289
    Y0.5 0.7002136 Pr(>χ2) <0.0001 R21,200 0.289
    max |∂log L/∂β| 6×10-7 Score χ2 66.46 |Pr(Y ≥ median)-½| 0.257
    Pr(>χ2) <0.0001
    β S.E. Wald Z Pr(>|Z|)
    group  2.2796  0.2922 7.80 <0.0001
    Code
    or <- exp(coef(f)['group'])
    capprox <- or ^ 0.66 / (1 + or ^ 0.66)
    z <- c(conc, capprox, or)
    names(z) <- c('c-index', 'c-approx', 'OR')
    z
      c-index  c-approx        OR 
    0.8162000 0.8182461 9.7723775 

    The approximation is off by 0.002.

    Further Reading

    Computing Environment

     R version 4.2.1 (2022-06-23)
     Platform: x86_64-pc-linux-gnu (64-bit)
     Running under: Pop!_OS 22.04 LTS
     
     Matrix products: default
     BLAS:   /usr/lib/x86_64-linux-gnu/blas/libblas.so.3.10.0
     LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.10.0
     
     attached base packages:
     [1] stats     graphics  grDevices utils     datasets  methods   base     
     
     other attached packages:
     [1] rms_6.4-0       SparseM_1.81    Hmisc_4.7-2     ggplot2_3.3.5  
     [5] Formula_1.2-4   survival_3.4-0  lattice_0.20-45
     
    To cite R in publications use:

    R Core Team (2022). R: A Language and Environment for Statistical Computing. R Foundation for Statistical Computing, Vienna, Austria. https://www.R-project.org/.

    To cite the Hmisc package in publications use:

    Harrell Jr F (2022). Hmisc: Harrell Miscellaneous. R package version 4.7-2, https://hbiostat.org/R/Hmisc/.

    To cite the rms package in publications use:

    Harrell Jr FE (2022). rms: Regression Modeling Strategies. https://hbiostat.org/R/rms/, https://github.com/harrelfe/rms.

    To cite the ggplot2 package in publications use:

    Wickham H (2016). ggplot2: Elegant Graphics for Data Analysis. Springer-Verlag New York. ISBN 978-3-319-24277-4, https://ggplot2.tidyverse.org.

    To cite the survival package in publications use:

    Therneau T (2022). A Package for Survival Analysis in R. R package version 3.4-0, https://CRAN.R-project.org/package=survival.