6 Comparing Two Proportions

6.1 Overview

Compare dichotomous independent variable with a dichotomous outcome
- Independent variables: Exposed/Not, Treatment/Control, Knockout/Wild Type, etc.
- Outcome (dependent) variables: Diseased/Not or any Yes/No outcome
Continuous outcomes often dichotomized for analysis (bad idea)
- Consider \(t\)-tests (Chapter 5) or Non-parametric methods (Chapter 7)

6.2 Normal-Approximation Test

Two independent samples

	Sample 1	Sample 2
Sample size	\(n_1\)	\(n_2\)
Population probability of event	\(p_1\)	\(p_2\)
Sample probability of event	\(\hat{p}_1\)	\(\hat{p}_2\)

Null Hypothesis, \(H_{0}: p_{1}=p_{2}=p\)
Estimating the variance
- Variance of \(\hat{p}_{i} = p_{i}(1-p_{i})/n_{i}\) for \(i = 1, 2\)
- Variance of \(\left(\hat{p}_1 - \hat{p}_2\right)\) is the sum of the variances, which under \(H_{0}\) is \[ p(1-p)[\frac{1}{n_{1}}+\frac{1}{n_{2}}] \]
- We estimate this variance by plugging \(\hat{p}\) into \(p\), where \[ \hat{p} = \frac{n_{1}\hat{p}_{1}+n_{2}\hat{p}_{2}}{n_{1}+n_{2}} \] is the pooled estimate of the probability under \(H_{0}: p_1 = p_2 = p\)
Test statistic which has approximately a normal distribution under \(H_{0}\) if \(n_{i}\hat{p}_{i}\) are each large enough: \[ z = \frac{\hat{p}_{1} - \hat{p}_{2}}{\sqrt{\hat{p}(1-\hat{p})[\frac{1}{n_{1}}+\frac{1}{n_{2}}]}} \]
To test \(H_{0}\) we see how likely it is to obtain a \(z\) value as far or farther out in the tails of the normal distribution than \(z\) is
We don’t recommend using the continuity correction
Example:
Test whether the population of women whose age at first birth \(\leq 29\) has the same probability of breast cancer as women whose age at first birth was \(\geq 30\). This dichotomization is highly arbitrary and we should really be testing for an association between age and cancer incidence, treating age as a continuous variable.
Case-control study (independent and dependent variables interchanged); \(p_{1}=\) probability of age at first birth \(\geq 30\), etc.

	with Cancer	without Cancer
Total # of subjects	\(3220~~(n_1)\)	\(10245~~(n_2)\)
# age \(\geq 30\)	\(683\)	\(1498\)
Sample probabilities	\(0.212~~(\hat{p}_1\))	\(0.146~~(\hat{p}_2\))
Pooled probability	\(\frac{683 + 1498}{3220 + 10245} = 0.162\)

Estimate the variance
- \(\mathrm{variance}(\hat{p}_{1}-\hat{p}_{2}) = \hat{p}(1-\hat{p})\times \left[\frac{1}{n_{1}}+\frac{1}{n_{2}}\right] = 5.54 \times 10^{-5}\)
- \(SE = \sqrt{\mathrm{variance}} = 0.00744\)
Test statistic
- \(z = \frac{0.212-0.146}{0.00744} = 8.85\)
2-tailed \(P\)-value is \(< 10^{-4}\)

Code

n1 <- 3220;     n2 <- 10245
p1 <- 683 / n1; p2 <- 1498 / n2
pp <- (n1 * p1 + n2 * p2) / (n1 + n2)
se <- sqrt(pp * (1 - pp) * (1 / n1 + 1 / n2))
z  <- (p1 - p2) / se
pval <- 2 * (1 - pnorm(abs(z)))
round(c(p1=p1, p2=p2, pooled=pp, se=se, z=z, pval=pval), 4)

    p1     p2 pooled     se      z   pval 
0.2121 0.1462 0.1620 0.0074 8.8527 0.0000

We do not use a \(t\)-distribution because there is no \(\sigma\) to estimate (and hence no “denominator d.f.” to subtract)

6.3 \(\chi^2\) Test

If \(z\) has a normal distribution, \(z^2\) has a \(\chi^2\) distribution with 1 d.f. (are testing a single difference against zero)
The data we just tested can be shown as a \(2\times 2\) contingency table

	Cancer +	Cancer -
Age \(\leq 29\)	2537	8747
Age \(\geq 30\)	683	1498
	3220	10245

In general, the \(\chi^2\) test statistic is given by \[\sum_{ij} \frac{(\mathrm{Obs}_{ij}-\mathrm{Exp}_{ij})^2}{\mathrm{Exp}_{ij}}\]
\(\mathrm{Obs}_{ij}\) is the observed cell frequency for row \(i\) column \(j\)
\(\mathrm{Exp}_{ij}\) is the expected cell frequency for row \(i\) column \(j\)
- Expected cell frequencies calculating assuming \(H_0\) is true
- \(\mathrm{Exp}_{ij} = \frac{\mathrm{row }~i~\mathrm{ total} \times \mathrm{column}~j~\mathrm{total}}{\mathrm{grand total}}\)
- e.g. \(\mathrm{Exp}_{11} = \frac{11284 \times 3220}{13465} = 2698.4\)
For \(2 \times 2\) tables, if the observed cell frequencies are labeled

\(a\)	\(b\)
\(c\)	\(d\)

the \(\chi^{2}\) test statistic simplifies to \[ \frac{N[ad - bc]^{2}}{(a+c)(b+d)(a+b)(c+d)}, \] where \(N=a+b+c+d\). Here we get \(\chi^{2}_{1} = 78.37 = z^{2}\) from above.

Code

x <- matrix(c(2537, 8747, 683, 1498), nrow=2, byrow=TRUE)
x

     [,1] [,2]
[1,] 2537 8747
[2,]  683 1498

Code

chisq.test(x, correct=FALSE)


    Pearson's Chi-squared test

data:  x
X-squared = 78.37, df = 1, p-value < 2.2e-16

Code

# Also compute more accurate P-value based on 1M Monte-Carlo simulations
chisq.test(x, correct=FALSE, simulate.p.value=TRUE, B=1e6)


    Pearson's Chi-squared test with simulated p-value (based on 1e+06
    replicates)

data:  x
X-squared = 78.37, df = NA, p-value = 1e-06

Don’t need Yates’ continuity correction
Note that even though we are doing a 2-tailed test we use only the right tail of the \(\chi^{2}_{1}\) distribution; that’s because we have squared the difference when computing the statistic, so the sign is lost.
This is the ordinary Pearson \(\chi^2\) test

6.4 Fisher’s Exact Test

Is a misnomer in the sense that it computes probabilities exactly, with no normal approximation, but only after changing what is being tested to condition on the number of events and non-events
Because frequencies are discrete and because of the conditioning, the test is conservative (\(P\)-values too large)
Is exact only in the sense that actual type I error probability will not exceed the nominal level
The ordinary Pearson \(\chi^2\) works fine (even when an expected cell frequency is as low as 1.0, contrary to popular belief)
We don’t use Yates’ continuity correction because it was developed to make the normal approximation test yield \(P\)-values that are more similar to Fisher’s test, i.e., to be more conservative
The attempt to obtain exact unconditional \(P\)-values for the simple \(2\times 2\) contingency table has stumped frequentist statisticians for many decades Choi et al. (2015)
By contrast, Bayesian posterior probabilities for the true unconditional quantity of interest are exact
- Frequentist confidence limits and \(P\)-values are approximate because they use the sample space, and the sample space is discrete when the response variable is categorical
- Bayes does not consider the sample space, only the parameter space, which is almost always continuous
See stats.stackexchange.com/questions/14226 for discussion

6.5 Sample Size and Power for Comparing Two Independent Samples

Power \(\uparrow\) as
- \(n_{1}, n_{2} \uparrow\)
- \(\frac{n_{2}}{n_{1}} \rightarrow 1.0\) (usually)
- \(\Delta = |p_{1}-p_{2}| \uparrow\)
- \(\alpha \uparrow\)
There are approximate formulas such as the recommended methods in Altman based on transforming \(\hat{p}\) to make it have a variance that is almost independent of \(p\)
Example: Using current therapy, 0.5 of the population is free of infection at 24 hours. Adding a new drug to the standard of care is expected to increase the percentage infection-free to 0.7. If we randomly sample 100 subjects to receive standard care and 100 subjects to receive the new therapy, what is the probability that we will be able to detect a certain difference between the two therapies at the end of the study?

\[ p_{1}=.5, p_{2}=.7, n_{1}=n_{2}=100 \] results in a power of 0.83 when \(\alpha=0.05\)

Code

require(Hmisc)
bpower(.5, .7, n1=100, n2=100)

    Power 
0.8281098

When computing sample size to achieve a given power, the sample size \(\downarrow\) when
- power \(\downarrow\)
- \(\frac{n_{2}}{n_{1}} \rightarrow 1.0\)
- \(\Delta \uparrow\)
- \(\alpha \uparrow\)
Required sample size is a function of both \(p_{1}\) and \(p_{2}\)
Example:

How many subjects are needed to detect a 0.8 fold decrease in the probability of colorectal cancer if the baseline probability of cancer is \(0.0015\)? Use a power of 0.8 and a type-I error probability of 0.05.

\[\begin{array}{c} p_{1}=0.0015, p_{2}=0.8\times p_{1}=0.0012, \alpha=0.05, \beta=0.2 \\ n_{1}=n_{2}=235,147 \end{array}\]

(Rosner estimated 234,881)

Code

bsamsize(.0015, 0.8 * .0015, alpha=0.05, power=0.8)

      n1       n2 
235147.3 235147.3

Formulas for power and sample size may be seen as R code found at
github.com/harrelfe/Hmisc/blob/master/R/bpower.s.

6.6 Confidence Interval

An approximate \(1-\alpha\) 2-sided CL is given by \[\hat{p}_{1}-\hat{p}_{2} \pm z_{1-\alpha/2} \times\sqrt{\frac{\hat{p}_{1}(1-\hat{p}_{1})}{n_{1}}+\frac{\hat{p}_{2}(1-\hat{p}_{2})}{n_{2}}}\] where \(z_{1-\alpha/2}\) is the critical value from the normal distribution (1.96 when \(\alpha=0.05\)).

The CL for the number of patients needed to be treated to save one event may simply be obtained by taking the reciprocal of the two confidence limits.¹

¹ If a negative risk reduction is included in the confidence interval, set the NNT to \(\infty\) for that limit instead of quoting a negative NNT. There is more to this; see bit.ly/datamethods-nnt.

6.7 Sample Size for a Given Precision

Goal: Plan a study so that the margin of error is sufficiently small
The margin of error (\(\delta\)) is defined to be half of the confidence interval width. For two proportions, \[ \delta = z_{1-\alpha/2} \times\sqrt{\frac{\hat{p}_{1}(1-\hat{p}_{1})}{n_{1}}+\frac{\hat{p}_{2}(1-\hat{p}_{2})}{n_{2}}} \]
Basing the sample size calculations on the margin of error can lead to a study that gives scientifically relevant results even if the results are not statistically significant.
Example: Suppose that the infection risk in a population is \(0.5\) and a reduction to \(0.4\) is believed to be a large enough reduction that it would lead to a change in procedures. A study of a new treatment is planned so that enough subjects will be enrolled for the margin of error is \(0.05\). Consider these two possible outcomes:

The new treatment is observed to decrease infections by \(0.06\) (\(0.95\) CI: \([0.11, 0.01]\)). The confidence interval does not contain \(0\), so we have indirect evidence² that the new treatment is effective at reducing infections. \(0.1\) is also within the confidence interval limits.
The new treatment is observed to decrease infections by only \(0.04\) (\(0.95\) CI: \([0.09, -0.01]\)). The confidence interval now contains \(0\), so we do not have enough evidence to reject the supposition that there is no effect of the treatment on reducing infections if we are bound to an arbitrary \(\alpha=0.05\). However, the confidence interval also does not contain \(0.10\), so we are able to indirectly rule out a scientifically relevant decrease in infections.

² To obtain direct evidence requires Bayesian posterior probabilities.

For fixed \(n_{1} = n_2 = n\), confidence intervals for proportions have the maximum width, when \(p_{1} = p_{2} = 0.5\). This can be shown by:
- Recall that the variance formula for the difference in two proportions when calculating a confidence interval is \[ \frac{p_{1}(1-p_{1})}{n_{1}}+\frac{p_{2}(1-p_{2})}{n_{2}} \]
- When \(p_{1} = p_{2} = p\) and \(n_{1} = n_2 = n\), the variance formula simplifies to \[ \frac{p(1 - p)}{n}+\frac{p (1- p)}{n} = 2\frac{p(1 - p)}{n} \]
- Then, for any fixed value of \(n\) (\(e.g. n = 1\) or \(10\)), \(2\frac{p(1 - p)}{n}\) is largest when \(p = 0.5\). With \(p= 0.5\), the variance formula further simplifies to \[ 2\frac{.25}{n} = \frac{1}{2n} \]
Using \(\alpha = 0.05\) (\(z_{1-\alpha/2} = 1.96\)), the worst-case margin of error will be \[ \delta = 1.96 \sqrt{\frac{1}{2n}} \]
By solving for \(n\), we can rearrange this formula to be \[ n = \frac{1.92}{\delta^2} \]
This formula then gives the number of subjects needed in each group \(n\) to obtain a given margin of error \(\delta\). For a margin of error of \(0.05\) (\(\delta = 0.05\)), \(n = \frac{1.92}{0.05^2} = 768\) subjects in each group.

Code

diff <- .05
qnorm(.975)^2 / 2 / (diff ^ 2)

[1] 768.2918

6.8 Relative Effect Measures

We have been dealing with risk differences which are measures of absolute effect
Measures of relative effect include risk ratios and odds ratios
Risk ratios are easier to interpret but only are useful over a limited range of prognosis (i.e., a risk factor that doubles your risk of lung cancer cannot apply to a subject having a risk above 0.5 without the risk factor)
Odds ratios can apply to any subject
In large clinical trials treatment effects on lowering probability of an event are often constant on the odds ratio scale
OR = Odds ratio = \(\frac{\frac{p_{1}}{1-p_{1}}}{\frac{p_{2}}{1-p_{2}}}\)
Testing \(H_{0}\): OR=1 is equivalent to testing \(H_{0}:p_{1}=p_{2}\)
There are formulas for computing confidence intervals for odds ratios
Odds ratios are most variable when one or both of the probabilities are near 0 or 1
We compute CLs for ORs by anti-logging CLs for the log OR
In the case where \(p_{1}=p_{2}=0.05\) and \(n_{1}=n_{2}=n\), the standard error of the log odds ratio is approximately \(\sqrt{\frac{42.1}{n}}\)
The common sample size \(n\) needed to estimate the true OR to within a factor of 1.5 is 984 with \(p\)s in this range
To show the multiplicative margins of error³ for a range of sample sizes and values of \(p\). For each scenario, the margin of error assumes that both unknown probability estimates equal \(p\).

³ Value by which to multiply the observed odds ratio to obtain the upper 0.95 confidence limit or to divide the observed odds ratio to obtain the lower 0.95 limit. The confidence interval for a ratio should always be constructed on the log scale (unless using the excellent profile likelihood interval which is transformation-invariant). We compute the margin of error on the log ratio scale just as we do for means. The margin of error can be defined as \(\frac{1}{2}\) of the width of the confidence interval for the log ratio. If you anti log that margin of error you get the multiplicative factor, i.e., the multiplicative margin of error. Suppose that the MMOE is 1.5. This means that you can get the 0.95 confidence interval for the ratio by taking the ratio’s point estimate and dividing it by 1.5 and then multiplying it by 1.5. You can loosely say that our point estimate can easily be off by a factor of 1.5.

Code

require(ggplot2)
d <- expand.grid(n=c(seq(10, 1000, by=10), seq(1100, 50000, by=100)),
                 p=c(.02, .05, .075, .1, .15, .2, .25, .3, .4, .5))
d$selor <- with(d, sqrt(2 / (p * (1 - p) * n)))
d$mmoe  <- with(d, exp(qnorm(0.975) * selor))
mb <- c(1, 1.25, 1.5, 2, 2.5, 3, 4, 5, 10, 20, 30, 40, 50, 100, 400)
ggplot(aes(x=n, y=mmoe, color=factor(p)), data=d) +
  geom_line() +
  scale_x_log10(breaks=c(10,20,30,50,100,200,500,1000,2000,5000,10000,
                  20000,50000)) +
  scale_y_log10(breaks=mb, labels=as.character(mb)) +
  xlab(expression(n)) + ylab('Multiplicative Margin of Error for OR') +
  guides(color=guide_legend(title=expression(p)))

Figure 6.1: Multiplicative margin of error related to 0.95 confidence limits of an odds ratio, for varying \(n\) and \(p\) (different curves), assuming the unknown true probability in each group is no lower than \(p\)

6.9 Comprehensive example

6.9.1 Study Description

Consider patients who will undergo coronary artery bypass graft surgery (CABG)
Mortality risk associated with open heart surgery
Study question: Do emergency cases have a surgical mortality that is different from that of non-emergency cases?
Population probabilities
- \(p_1\): Probability of death in patients with emergency priority
- \(p_2\): Probability of death in patients with non-emergency priority
Statistical hypotheses
- \(H_0: p_1 = p_2\) (or \(\mathrm{OR} = 1\))
- \(H_1: p_1 \neq p_2\) (or \(\mathrm{OR} \neq 1\))

6.9.2 Power and Sample Size

Prior research shows that just over \(0.1\) of surgeries end in death
Researchers want to be able to detect a 3 fold increase in risk
For every 1 emergency priority, expect to see 10 non-emergency
\(p_1 = 0.3\), \(p_2 = 0.1\), \(\alpha = 0.05\), and power} = 0.90
Calculate sample sizes using the PS software for these values and other combinations of \(p_1\) and \(p_2\)

\((p_1, p_2)\)	(0.3, 0.1)	(0.4, 0.2)	(0.03, 0.01)	(0.7, 0.9)
\(n_1\)	40	56	589	40
\(n_2\)	400	560	5890	400

Check PS calculations against the R Hmisc package’s bsamsize function.

Code

round(bsamsize(.3, .1, fraction=1/11, power=.9))

 n1  n2 
 40 399

Code

round(bsamsize(.4, .2, fraction=1/11, power=.9))

 n1  n2 
 56 561

Code

round(bsamsize(.7, .9, fraction=1/11, power=.9))

 n1  n2 
 40 399

6.9.3 Collected Data

In-hospital mortality figures for emergency surgery and other surgery

Surgical Priority	Dead	Alive
Emergency	6	19
Other	11	100

\(\hat{p}_1 = \frac{6}{25} = 0.24\)
\(\hat{p}_2 = \frac{11}{111} = 0.10\)

6.9.4 Statistical Test

Code

n1 <- 25;     n2 <- 111
p1 <- 6 / n1; p2 <- 11 / n2
or <- p1 / (1 - p1) / (p2 / (1 - p2))
or

[1] 2.870813

Code

# Standard error of log odds ratio:
selor <- sqrt(1 / (n1 * p1 * (1 - p1)) + 1 / (n2 * p2 * (1 - p2)))
# Get 0.95 confidence limits
cls <- exp(log(or) + c(-1, 1) * qnorm(0.975) * selor)
cls

[1] 0.946971 8.703085

Code

tcls <- paste0(round(or, 2), ' (0.95 CI: [', round(cls[1], 2),
               ', ', round(cls[2], 2), '])')
# Multiplying a constant by the vector -1, 1 does +/-
x <- matrix(c(6, 19, 11, 100), nrow=2, byrow=TRUE)
x

     [,1] [,2]
[1,]    6   19
[2,]   11  100

Code

chisq.test(x, correct=FALSE)


    Pearson's Chi-squared test

data:  x
X-squared = 3.7037, df = 1, p-value = 0.05429

Interpretation
- Compare odds of death in the emergency group \(\left(\frac{\hat{p}_1}{1-\hat{p}_1}\right)\) to odds of death in non-emergency group \(\left(\frac{\hat{p}_2}{1-\hat{p}_2}\right)\)
- Emergency cases have 2.87 (0.95 CI: [0.95, 8.7]) fold increased odds of death during surgery compared to non-emergency cases.

Fisher’s Exact Test

Observed marginal totals from emergency surgery dataset


Emergency	\(a\)	\(b\)
Other	\(c\)	\(d\)

With fixed marginal totals, there are 18 possible tables (\(a = 0, 1, \ldots 17\))
Can calculated probability of each of these tables
- \(p\)-value: Probability of observing data as extreme or more extreme than we collected in this experiment
Exact test: \(p\)-value can be calculated “exactly” (not using the \(\chi^2\) distribution to approximate the \(p\)-value)

Code

fisher.test(x)


    Fisher's Exact Test for Count Data

data:  x
p-value = 0.08706
alternative hypothesis: true odds ratio is not equal to 1
95 percent confidence interval:
 0.7674155 9.6831351
sample estimates:
odds ratio 
  2.843047

Note that the odds ratio from Fisher’s test is a conditional maximum likelihood estimate, which differs from the unconditional maximum likelihood estimate we obtained earlier. * Fisher’s test more conservative than Pearson’s \(\chi^2\) test (larger \(P\)-value)

6.10 Logistic Regression for Comparing Proportions

When comparing \(\geq 2\) groups on the probability that a categorical outcome variable will have a certain value observed (e.g., \(P(Y=1)\)), one can use the Pearson \(\chi^2\) test for a contingency table (or the less powerful Fisher’s “exact” test)
Such analyses can also be done with a variety of regression models
It is necessary to use a regression model when one desires to analyze more than the grouping variable, e.g.
- analyze effects of two grouping variables
- adjust for covariates
For full generality, the regression model needs to have no restrictions on the regression coefficients
- Probabilities are restricted to be in the interval \([0,1]\) so an additive risk model cannot fit over a broad risk range
- Odds (\(\frac{p}{1-p}\)) are restricted to be in \([0, \infty]\)
- Log odds have no restrictions since they can be in \([-\infty, \infty]\)
So log odds is a good basis for regression analysis of categorical \(Y\)
- default assumption of additivity of effects needs no restrictions
- will still translate to probabilities in \([0,1]\)
The binary logistic regression model is a model to estimate the probability of an event as a flexible function of covariates
Let the outcome variable \(Y\) have the values \(Y=0\) (non-event) or \(Y=1\) (event) \[\mathrm{Prob}(Y = 1 | X) = \frac{1}{1 + \exp(-(\beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_3 + \ldots))}\]
The sum inside the inner () is called the linear predictor (LP)
The binary logistic model relates LP (with no restrictions) to the event probability as so:

Code

lp <- seq(-5, 5, length=150)
ggplot(mapping=aes(lp, plogis(lp))) + geom_line() +
  xlab('Linear Predictor') + ylab('P(Y=1)')

Notes about LP:
- for a given problem the range may be much narrower than \([-4,4]\)
- when any predictor \(X_j\) with a non-zero \(\beta_j\) is categorical, LP cannot take on all possible values within its range, so the above plot will have gaps
As a special case the model can estimate and compare two probabilities as done above, through an odds ratio
Logistic regression seems like an overkill here, but it sets the stage for more complex frequentist analysis as well as Bayesian analysis
For our case the model is as follows
Consider groups A and B, A = reference group
\([x]\) denotes 1 if \(x\) is true, 0 if \(x\) is false
Define the expit function as the inverse of the logit function, or \(\mathrm{expit}(x) = \frac{1}{1 + \exp(-x)}\) \[\begin{array}{ccc} P(Y = 1 | \mathrm{group}) &=& \frac{1}{1 + \exp(-(\beta_0 + \beta_1 [\mathrm{group } B]))} \\ &=& \mathrm{expit}(\beta_0 + \beta_1 [\mathrm{group } B])\\ P(Y = 1 | \mathrm{group A}) &=& \mathrm{expit}(\beta_0)\\ P(Y = 1 | \mathrm{group B}) &=& \mathrm{expit}(\beta_0 + \beta_1) \end{array}\]

\(\beta_0 =\) log odds of probability of event in group A \(= \log(\frac{p_1}{1 - p_1}) = \mathrm{logit}(p_1)\)
\(\beta_1 =\) increase in log odds in going from group A to group B =
\(\log(\frac{p_2}{1 - p_2}) - \log(\frac{p_1}{1 - p_1}) = \mathrm{logit}(p_2) - \mathrm{logit}(p_1)\)
\(\exp(\beta_1) =\) group B : group A odds ratio \(= \frac{\frac{p_2}{1 - p_2}}{\frac{p_1}{1 - p_1}}\)
expit\((\beta_0) = p_1\)
expit\((\beta_0 + \beta_1) = p_2\)

Once the \(\beta\)s are estimated, one quickly gets the B:A odds ratio and \(\hat{p}_1\) and \(\hat{p}_2\)
This model is saturated
- has the maximum number of parameters needed to fully describe the situation (here, 2 since 2 groups)
- saturated models must fit the data if the distributional and independence assumptions are met
- logistic model has no distributional assumption
Logistic regression is more general and flexible than the specialized tests for proportions
- allows testing association on continuous characteristics
- easily extends to more than two groups
- allows adjustment for covariates
Examples:
- assess effects of subjects’ sex and country (Canada vs. US) on \(P(Y=1)\) denoted by \(p\)
  logit \(p\) = constant + logit male effect + logit Canada effect
- same but allow for interaction
  logit \(p\) = constant + logit male effect + logit Canada effect + special effect of being male if Canadian
- latter model is saturated with 3 d.f. so fits as well as a model with 4 proportions
  - unlike the overall Pearson \(\chi^2\) test, allows testing interaction and
  - separate effects of sex and country (e.g., 2 d.f. chunk test for whether there is a sex difference for either country, allowing for the sex effect to differ by country)

6.10.1 Test Statistics

For frequentist logistic models there are 3 types of \(\chi^2\) test statistics for testing the same hypothesis:

likelihood ratio (LR) test (usually the most accurate) and is scale invariant⁴
- Can obtain the likelihood ratio \(\chi^2\) statistic from either the logistic model or from a logarithmic equation in the two proportions and sample sizes
Score test (identical to Pearson test for overall model if model is saturated)
Wald test (square of \(\frac{\hat{\beta}}{\mathrm{s.e.}}\) in the one parameter case; misbehaves for extremely large effects)
Wald test is the easiest to compute but \(P\)-values and confidence intervals from it are not as accurate. The score test is the way to exactly reproduce the Pearson \(\chi^2\) statistic from the logistic model.
As with \(t\)-test vs. a linear model, the special case tests are not needed once you use the logistic model framework
Three usages of any of these test statistics:
- individual test, e.g. sex effect in sex-country model without interaction
- chunk test, e.g. sex + sex \(\times\) country interaction 2 d.f. test
  tests overall sex effect
- global test of no association, e.g. 3 d.f. test for whether sex or country is associated with \(Y\)

⁴ The LR test statistic is the same whether testing for an absolute risk difference of 0.0, or for an odds or risk ratio of 1.0.

6.10.2 Frequentist Analysis Example

Consider again our emergency surgery example
String the observations out to get one row = one patient, binary \(Y\)

Code

require(rms)
options(prType='html')
priority <- factor(c(rep('emergency', 25), rep('other', 111)), c('other', 'emergency'))
death <- c(rep(0, 19), rep(1, 6), rep(0, 100), rep(1, 11))
table(priority, death)

           death
priority      0   1
  other     100  11
  emergency  19   6

Code

d  <- data.frame(priority, death)
dd <- datadist(d); options(datadist='dd')
# rms package needs covariate summaries computed by datadist
f <- lrm(death ~ priority)
f

Logistic Regression Model

lrm(formula = death ~ priority)

	Model Likelihood Ratio Test	Discrimination Indexes	Rank Discrim. Indexes
Obs 136	LR χ² 3.20	R² 0.044	C 0.597
0 119	d.f. 1	R²_1,136 0.016	D_xy 0.193
1 17	Pr(>χ²) 0.0737	R²_1,44.6 0.048	γ 0.483
max \|∂log L/∂β\| 4×10^-9		Brier 0.106	τ_a 0.043

	β	S.E.	Wald Z	Pr(>\|Z\|)
Intercept	-2.2073	0.3177	-6.95	<0.0001
priority=emergency	1.0546	0.5659	1.86	0.0624

Compare the LR \(\chi^2\) of 3.2 with the earlier Pearson \(\chi^2\) of 3.70
The likelihood ratio (LR) \(\chi^2\) test statistic and its \(P\)-value are usually a little more accurate than the other association tests
- but \(\chi^2\) distribution is still only an approximation to the true sampling distribution
See that we can recover the simple proportions from the fitted logistic model:
\(\hat{p}_1 = \frac{11}{111} = 0.099 = \mathrm{expit}(-2.2073)\)
\(\hat{p}_2 = \frac{6}{25} = 0.24 = \mathrm{expit}(-2.2073 + 1.0546)\)

Code

summary(f)

	Low	High	Effect	S.E.	Lower 0.95	Upper 0.95
Effects Response: `death`
priority --- emergency:other	1	2	1.055	0.5659	-0.05449	2.164
Odds Ratio	1	2	2.871		0.94700	8.703

The point estimate and CLs for the odds ratio is the same as what we obtained earlier.

Add a random binary variable to the logistic model—one that is correlated with the surgical priority—to see the effect on the estimate of the priority effect

Code

set.seed(10)
randomUniform <- runif(length(priority))
random <- ifelse(priority == 'emergency', randomUniform < 1/3,
                                          randomUniform > 1/3) * 1
d$random <- random
table(priority, random)

           random
priority     0  1
  other     39 72
  emergency 17  8

Code

lrm(death ~ priority + random)

Logistic Regression Model

lrm(formula = death ~ priority + random)

	Model Likelihood Ratio Test	Discrimination Indexes	Rank Discrim. Indexes
Obs 136	LR χ² 6.03	R² 0.082	C 0.659
0 119	d.f. 2	R²_2,136 0.029	D_xy 0.319
1 17	Pr(>χ²) 0.0489	R²_2,44.6 0.086	γ 0.464
max \|∂log L/∂β\| 1×10^-8		Brier 0.103	τ_a 0.070

	β	S.E.	Wald Z	Pr(>\|Z\|)
Intercept	-1.6851	0.4157	-4.05	<0.0001
priority=emergency	0.7811	0.5909	1.32	0.1862
random	-0.9319	0.5621	-1.66	0.0973

The effect of emergency status is diminished, and the random grouping variable, created to have no relation to death in the population, has a large apparent effect.

6.10.3 Bayesian Logistic Regression Analysis

Has several advantages
- All calculations are exact (to within simulation error) without changing the model
- Can incorporate external information
- Intuitive measures of evidence
- Automatically handles zero-frequency cells (priors shrink probabilities a bit away from 0.0)
Simple to use beta priors for each of the two probabilities
But we’d need to incorporate a complex dependency in the two priors because we know more about how the two probabilities relate to each other than we know about each absolute risk
Simpler to have a wide prior on \(p_1\) and to have a non-flat prior on the log odds ratio
- could back-solve and show dependency between knowledge of \(p_1\) and \(p_2\)
Use the same data model as above
As before we use the R brms package which makes standard modeling easy
Need two priors: for intercept \(\beta_0\) and for log odds ratio \(\beta_1\)
\(\beta_0\): use a normal distribution that makes \(p_1 = 0.05\) the most likely value (put mean at logit(0.05) = -2.944) and allows only a 0.1 chance that \(p_1 > 0.2\); solve for SD \(\sigma\) that accomplishes that

Code

# Given mu and value, solve for SD so that the tail area of the normal distribution beyond value is prob
normsolve <- function(mu, value, prob) (value - mu) / qnorm(1 - prob)
normsolve(qlogis(0.05), qlogis(0.2), 0.1)  # qlogis is R logit()

[1] 1.215827

We round \(\sigma\) to 1.216
For \(\beta_1\) put a prior that has equal chance for OR \(< 1\) as for OR \(> 1\), i.e., mean for log OR of zero
Put a chance of only 0.1 that OR \(> 3\)

Code

normsolve(0, log(3), 0.1)

[1] 0.8572517

Round to 0.857 Compute the correlation between prior evidence for \(p_1\) and \(p_2\) by drawing 100,000 samples from the prior distributions. Also verify that prior probability \(p_2 > p_1\) is \(\frac{1}{2} =\) prob. OR \(> 1\).

Code

b0 <- rnorm(100000, qlogis(0.05), 1.216)
b1 <- rnorm(100000, 0, 0.857)
p1 <- plogis(b0)
p2 <- plogis(b0 + b1)
cor(b0, b1, method='spearman')

[1] 0.003724938

Code

cor(p1, p2, method='spearman')

[1] 0.8066422

Code

# Define functions for posterior probability operator and posterior mode
P <- mean   # proportion of posterior draws for which a condition holds
pmode <- function(x) {
  z <- density(x)
  z$x[which.max(z$y)]
  }

P(p2 > p1)

[1] 0.50106

Code

P(b1 > 0)

[1] 0.50106

Code

P(exp(b1) > 1)

[1] 0.50106

To show that prior knowledge about \(p_1\) and \(p_2\) is uncorrelated when we don’t know anything about the odds ratio, repeat the above calculation use a SD of 1000 for the log odds ratio:

Code

b1 <- rnorm(100000, 0, 1000)
p2 <- plogis(b0 + b1)
cor(p1, p2, method='spearman')

[1] 0.001448724

Now do Bayesian logistic regression analysis.

Code

require(brms)
# Tell brms/Stan to use all available CPU cores less 1 and to use
# cmdstan instead of rstan
options(mc.cores=parallel::detectCores() - 1, brms.backend='cmdstanr')
cmdstanr::set_cmdstan_path(cmdstan.loc)   # defined in ~/.Rprofile

Code

p <- c(prior(normal(-2.944, 1.216), class='Intercept'),
       prior(normal(0, 0.857),      class='b'))
f <- brm(death ~ priority, data=d, prior=p, family='bernoulli', seed=123)

Running MCMC with 4 chains, at most 11 in parallel...

Chain 1 Iteration:    1 / 2000 [  0%]  (Warmup) 
Chain 1 Iteration:  100 / 2000 [  5%]  (Warmup) 
Chain 1 Iteration:  200 / 2000 [ 10%]  (Warmup) 
Chain 1 Iteration:  300 / 2000 [ 15%]  (Warmup) 
Chain 1 Iteration:  400 / 2000 [ 20%]  (Warmup) 
Chain 1 Iteration:  500 / 2000 [ 25%]  (Warmup) 
Chain 1 Iteration:  600 / 2000 [ 30%]  (Warmup) 
Chain 1 Iteration:  700 / 2000 [ 35%]  (Warmup) 
Chain 1 Iteration:  800 / 2000 [ 40%]  (Warmup) 
Chain 1 Iteration:  900 / 2000 [ 45%]  (Warmup) 
Chain 1 Iteration: 1000 / 2000 [ 50%]  (Warmup) 
Chain 1 Iteration: 1001 / 2000 [ 50%]  (Sampling) 
Chain 1 Iteration: 1100 / 2000 [ 55%]  (Sampling) 
Chain 1 Iteration: 1200 / 2000 [ 60%]  (Sampling) 
Chain 1 Iteration: 1300 / 2000 [ 65%]  (Sampling) 
Chain 1 Iteration: 1400 / 2000 [ 70%]  (Sampling) 
Chain 1 Iteration: 1500 / 2000 [ 75%]  (Sampling) 
Chain 1 Iteration: 1600 / 2000 [ 80%]  (Sampling) 
Chain 1 Iteration: 1700 / 2000 [ 85%]  (Sampling) 
Chain 1 Iteration: 1800 / 2000 [ 90%]  (Sampling) 
Chain 1 Iteration: 1900 / 2000 [ 95%]  (Sampling) 
Chain 1 Iteration: 2000 / 2000 [100%]  (Sampling) 
Chain 2 Iteration:    1 / 2000 [  0%]  (Warmup) 
Chain 2 Iteration:  100 / 2000 [  5%]  (Warmup) 
Chain 2 Iteration:  200 / 2000 [ 10%]  (Warmup) 
Chain 2 Iteration:  300 / 2000 [ 15%]  (Warmup) 
Chain 2 Iteration:  400 / 2000 [ 20%]  (Warmup) 
Chain 2 Iteration:  500 / 2000 [ 25%]  (Warmup) 
Chain 2 Iteration:  600 / 2000 [ 30%]  (Warmup) 
Chain 2 Iteration:  700 / 2000 [ 35%]  (Warmup) 
Chain 2 Iteration:  800 / 2000 [ 40%]  (Warmup) 
Chain 2 Iteration:  900 / 2000 [ 45%]  (Warmup) 
Chain 2 Iteration: 1000 / 2000 [ 50%]  (Warmup) 
Chain 2 Iteration: 1001 / 2000 [ 50%]  (Sampling) 
Chain 2 Iteration: 1100 / 2000 [ 55%]  (Sampling) 
Chain 2 Iteration: 1200 / 2000 [ 60%]  (Sampling) 
Chain 2 Iteration: 1300 / 2000 [ 65%]  (Sampling) 
Chain 2 Iteration: 1400 / 2000 [ 70%]  (Sampling) 
Chain 2 Iteration: 1500 / 2000 [ 75%]  (Sampling) 
Chain 2 Iteration: 1600 / 2000 [ 80%]  (Sampling) 
Chain 2 Iteration: 1700 / 2000 [ 85%]  (Sampling) 
Chain 2 Iteration: 1800 / 2000 [ 90%]  (Sampling) 
Chain 2 Iteration: 1900 / 2000 [ 95%]  (Sampling) 
Chain 2 Iteration: 2000 / 2000 [100%]  (Sampling) 
Chain 3 Iteration:    1 / 2000 [  0%]  (Warmup) 
Chain 3 Iteration:  100 / 2000 [  5%]  (Warmup) 
Chain 3 Iteration:  200 / 2000 [ 10%]  (Warmup) 
Chain 3 Iteration:  300 / 2000 [ 15%]  (Warmup) 
Chain 3 Iteration:  400 / 2000 [ 20%]  (Warmup) 
Chain 3 Iteration:  500 / 2000 [ 25%]  (Warmup) 
Chain 3 Iteration:  600 / 2000 [ 30%]  (Warmup) 
Chain 3 Iteration:  700 / 2000 [ 35%]  (Warmup) 
Chain 3 Iteration:  800 / 2000 [ 40%]  (Warmup) 
Chain 3 Iteration:  900 / 2000 [ 45%]  (Warmup) 
Chain 3 Iteration: 1000 / 2000 [ 50%]  (Warmup) 
Chain 3 Iteration: 1001 / 2000 [ 50%]  (Sampling) 
Chain 3 Iteration: 1100 / 2000 [ 55%]  (Sampling) 
Chain 3 Iteration: 1200 / 2000 [ 60%]  (Sampling) 
Chain 3 Iteration: 1300 / 2000 [ 65%]  (Sampling) 
Chain 3 Iteration: 1400 / 2000 [ 70%]  (Sampling) 
Chain 3 Iteration: 1500 / 2000 [ 75%]  (Sampling) 
Chain 3 Iteration: 1600 / 2000 [ 80%]  (Sampling) 
Chain 3 Iteration: 1700 / 2000 [ 85%]  (Sampling) 
Chain 3 Iteration: 1800 / 2000 [ 90%]  (Sampling) 
Chain 3 Iteration: 1900 / 2000 [ 95%]  (Sampling) 
Chain 3 Iteration: 2000 / 2000 [100%]  (Sampling) 
Chain 4 Iteration:    1 / 2000 [  0%]  (Warmup) 
Chain 4 Iteration:  100 / 2000 [  5%]  (Warmup) 
Chain 4 Iteration:  200 / 2000 [ 10%]  (Warmup) 
Chain 4 Iteration:  300 / 2000 [ 15%]  (Warmup) 
Chain 4 Iteration:  400 / 2000 [ 20%]  (Warmup) 
Chain 4 Iteration:  500 / 2000 [ 25%]  (Warmup) 
Chain 4 Iteration:  600 / 2000 [ 30%]  (Warmup) 
Chain 4 Iteration:  700 / 2000 [ 35%]  (Warmup) 
Chain 4 Iteration:  800 / 2000 [ 40%]  (Warmup) 
Chain 4 Iteration:  900 / 2000 [ 45%]  (Warmup) 
Chain 4 Iteration: 1000 / 2000 [ 50%]  (Warmup) 
Chain 4 Iteration: 1001 / 2000 [ 50%]  (Sampling) 
Chain 4 Iteration: 1100 / 2000 [ 55%]  (Sampling) 
Chain 4 Iteration: 1200 / 2000 [ 60%]  (Sampling) 
Chain 4 Iteration: 1300 / 2000 [ 65%]  (Sampling) 
Chain 4 Iteration: 1400 / 2000 [ 70%]  (Sampling) 
Chain 4 Iteration: 1500 / 2000 [ 75%]  (Sampling) 
Chain 4 Iteration: 1600 / 2000 [ 80%]  (Sampling) 
Chain 4 Iteration: 1700 / 2000 [ 85%]  (Sampling) 
Chain 4 Iteration: 1800 / 2000 [ 90%]  (Sampling) 
Chain 4 Iteration: 1900 / 2000 [ 95%]  (Sampling) 
Chain 4 Iteration: 2000 / 2000 [100%]  (Sampling) 
Chain 1 finished in 0.0 seconds.
Chain 2 finished in 0.0 seconds.
Chain 3 finished in 0.0 seconds.
Chain 4 finished in 0.0 seconds.

All 4 chains finished successfully.
Mean chain execution time: 0.0 seconds.
Total execution time: 0.2 seconds.

Code

 Family: bernoulli 
  Links: mu = logit 
Formula: death ~ priority 
   Data: d (Number of observations: 136) 
  Draws: 4 chains, each with iter = 2000; warmup = 1000; thin = 1;
         total post-warmup draws = 4000

Regression Coefficients:
                  Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
Intercept            -2.19      0.30    -2.81    -1.64 1.00     2717     2275
priorityemergency     0.73      0.50    -0.26     1.71 1.00     2599     2562

Draws were sampled using sample(hmc). For each parameter, Bulk_ESS
and Tail_ESS are effective sample size measures, and Rhat is the potential
scale reduction factor on split chains (at convergence, Rhat = 1).

Code

plot(f)

Code

# Bring out posterior draws
w <- as.data.frame(f)
b0 <- w[, 'b_Intercept']
b1 <- w[, 'b_priorityemergency']
r  <- rbind(c(mean(b0), median(b0), pmode(b0)),
            c(mean(b1), median(b1), pmode(b1)),
            c(mean(exp(b1)), median(exp(b1)), pmode(exp(b1))))
colnames(r) <- c('Posterior Mean', 'Posterior Median', 'Posterior Mode')
rownames(r) <- c('b0', 'b1', 'OR')
round(r, 3)

   Posterior Mean Posterior Median Posterior Mode
b0         -2.194           -2.189         -2.218
b1          0.731            0.731          0.712
OR          2.346            2.078          1.708

Because the prior on the OR is conservative, the Bayesian posterior mode for the OR is smaller than the frequentist maximum likelihood estimate of 2.87.

Below notice how easy it is to do Bayesian inference on derived quantities p1 and p2 which are functions of b0 and b1.

Code

# 0.95 credible interval for log odds ratio and odds ratio
quantile(b1, c(0.025, 0.975))

      2.5%      97.5% 
-0.2611761  1.7088300

Code

quantile(exp(b1), c(.025, 0.975))

     2.5%     97.5% 
0.7701453 5.5224965

Code

exp(quantile(b1,  c(0.025, 0.975)))

     2.5%     97.5% 
0.7701452 5.5224964

Code

# Posterior density of emergency:other odds ratio
ggplot(mapping=aes(x=exp(b1))) + geom_density() +
  xlab('OR') + ylab('') +
  geom_vline(xintercept=c(1, pmode(exp(b1))), alpha=I(0.25))
# Probability that OR > 1
P(exp(b1) > 1)

[1] 0.93175

Code

# Probability it is > 1.5
P(exp(b1) > 1.5)

[1] 0.74775

Code

# Probability that risk with emergency surgery exceeds that of
# non-emergency (same as P(OR > 1))
# plogis in R is 1/(1 + exp(-x))
P(plogis(b0 + b1) > plogis(b0))

[1] 0.93175

Code

# Prob. that risk with emergency surgery elevated by more than 0.03
P(plogis(b0 + b1) > plogis(b0) + 0.03)

[1] 0.82125

Figure 6.3: Posterior distribution of odds ratio

Even though the priors for the intercept and log odds ratio are independent, the connection of these two parameters in the data likelihood makes the posteriors dependent as shown with Spearman correlations of the posterior draws below. Also get the correlation between evidence for the two probabilities. These have correlated priors even though they are unconnected in the likelihood function. Posteriors for \(p_1\) and \(p_2\) are less correlated than their priors.

Code

cor(b0, b1, method='spearman')

[1] -0.465965

Code

cor(plogis(b0), plogis(b0 + b1), method='spearman')

[1] 0.1355492

To demonstrate the effect of a skeptical prior:

Add random grouping to model as we did with the frequentist analysis
Make use of prior information that this variable is unlikely to be important
Put a prior on the log OR for this variable centered at zero with chance that the OR \(> 1.25\) of only 0.05

Code

normsolve(0, log(1.25), 0.05)

[1] 0.1356616

Code

p <- c(prior(normal(-2.944, 1.216), class='Intercept'),
       prior(normal(0, 0.857),      class='b', coef='priorityemergency'),
       prior(normal(0, 0.136),      class='b', coef='random'))

f2 <- brm(death ~ priority + random, data=d, prior=p, family='bernoulli',
          seed=121)

Running MCMC with 4 chains, at most 11 in parallel...

Chain 1 Iteration:    1 / 2000 [  0%]  (Warmup) 
Chain 1 Iteration:  100 / 2000 [  5%]  (Warmup) 
Chain 1 Iteration:  200 / 2000 [ 10%]  (Warmup) 
Chain 1 Iteration:  300 / 2000 [ 15%]  (Warmup) 
Chain 1 Iteration:  400 / 2000 [ 20%]  (Warmup) 
Chain 1 Iteration:  500 / 2000 [ 25%]  (Warmup) 
Chain 1 Iteration:  600 / 2000 [ 30%]  (Warmup) 
Chain 1 Iteration:  700 / 2000 [ 35%]  (Warmup) 
Chain 1 Iteration:  800 / 2000 [ 40%]  (Warmup) 
Chain 1 Iteration:  900 / 2000 [ 45%]  (Warmup) 
Chain 1 Iteration: 1000 / 2000 [ 50%]  (Warmup) 
Chain 1 Iteration: 1001 / 2000 [ 50%]  (Sampling) 
Chain 1 Iteration: 1100 / 2000 [ 55%]  (Sampling) 
Chain 1 Iteration: 1200 / 2000 [ 60%]  (Sampling) 
Chain 1 Iteration: 1300 / 2000 [ 65%]  (Sampling) 
Chain 1 Iteration: 1400 / 2000 [ 70%]  (Sampling) 
Chain 1 Iteration: 1500 / 2000 [ 75%]  (Sampling) 
Chain 1 Iteration: 1600 / 2000 [ 80%]  (Sampling) 
Chain 1 Iteration: 1700 / 2000 [ 85%]  (Sampling) 
Chain 1 Iteration: 1800 / 2000 [ 90%]  (Sampling) 
Chain 1 Iteration: 1900 / 2000 [ 95%]  (Sampling) 
Chain 1 Iteration: 2000 / 2000 [100%]  (Sampling) 
Chain 2 Iteration:    1 / 2000 [  0%]  (Warmup) 
Chain 2 Iteration:  100 / 2000 [  5%]  (Warmup) 
Chain 2 Iteration:  200 / 2000 [ 10%]  (Warmup) 
Chain 2 Iteration:  300 / 2000 [ 15%]  (Warmup) 
Chain 2 Iteration:  400 / 2000 [ 20%]  (Warmup) 
Chain 2 Iteration:  500 / 2000 [ 25%]  (Warmup) 
Chain 2 Iteration:  600 / 2000 [ 30%]  (Warmup) 
Chain 2 Iteration:  700 / 2000 [ 35%]  (Warmup) 
Chain 2 Iteration:  800 / 2000 [ 40%]  (Warmup) 
Chain 2 Iteration:  900 / 2000 [ 45%]  (Warmup) 
Chain 2 Iteration: 1000 / 2000 [ 50%]  (Warmup) 
Chain 2 Iteration: 1001 / 2000 [ 50%]  (Sampling) 
Chain 2 Iteration: 1100 / 2000 [ 55%]  (Sampling) 
Chain 2 Iteration: 1200 / 2000 [ 60%]  (Sampling) 
Chain 2 Iteration: 1300 / 2000 [ 65%]  (Sampling) 
Chain 2 Iteration: 1400 / 2000 [ 70%]  (Sampling) 
Chain 2 Iteration: 1500 / 2000 [ 75%]  (Sampling) 
Chain 2 Iteration: 1600 / 2000 [ 80%]  (Sampling) 
Chain 2 Iteration: 1700 / 2000 [ 85%]  (Sampling) 
Chain 2 Iteration: 1800 / 2000 [ 90%]  (Sampling) 
Chain 2 Iteration: 1900 / 2000 [ 95%]  (Sampling) 
Chain 2 Iteration: 2000 / 2000 [100%]  (Sampling) 
Chain 3 Iteration:    1 / 2000 [  0%]  (Warmup) 
Chain 3 Iteration:  100 / 2000 [  5%]  (Warmup) 
Chain 3 Iteration:  200 / 2000 [ 10%]  (Warmup) 
Chain 3 Iteration:  300 / 2000 [ 15%]  (Warmup) 
Chain 3 Iteration:  400 / 2000 [ 20%]  (Warmup) 
Chain 3 Iteration:  500 / 2000 [ 25%]  (Warmup) 
Chain 3 Iteration:  600 / 2000 [ 30%]  (Warmup) 
Chain 3 Iteration:  700 / 2000 [ 35%]  (Warmup) 
Chain 3 Iteration:  800 / 2000 [ 40%]  (Warmup) 
Chain 3 Iteration:  900 / 2000 [ 45%]  (Warmup) 
Chain 3 Iteration: 1000 / 2000 [ 50%]  (Warmup) 
Chain 3 Iteration: 1001 / 2000 [ 50%]  (Sampling) 
Chain 3 Iteration: 1100 / 2000 [ 55%]  (Sampling) 
Chain 3 Iteration: 1200 / 2000 [ 60%]  (Sampling) 
Chain 3 Iteration: 1300 / 2000 [ 65%]  (Sampling) 
Chain 3 Iteration: 1400 / 2000 [ 70%]  (Sampling) 
Chain 3 Iteration: 1500 / 2000 [ 75%]  (Sampling) 
Chain 3 Iteration: 1600 / 2000 [ 80%]  (Sampling) 
Chain 3 Iteration: 1700 / 2000 [ 85%]  (Sampling) 
Chain 3 Iteration: 1800 / 2000 [ 90%]  (Sampling) 
Chain 3 Iteration: 1900 / 2000 [ 95%]  (Sampling) 
Chain 3 Iteration: 2000 / 2000 [100%]  (Sampling) 
Chain 4 Iteration:    1 / 2000 [  0%]  (Warmup) 
Chain 4 Iteration:  100 / 2000 [  5%]  (Warmup) 
Chain 4 Iteration:  200 / 2000 [ 10%]  (Warmup) 
Chain 4 Iteration:  300 / 2000 [ 15%]  (Warmup) 
Chain 4 Iteration:  400 / 2000 [ 20%]  (Warmup) 
Chain 4 Iteration:  500 / 2000 [ 25%]  (Warmup) 
Chain 4 Iteration:  600 / 2000 [ 30%]  (Warmup) 
Chain 4 Iteration:  700 / 2000 [ 35%]  (Warmup) 
Chain 4 Iteration:  800 / 2000 [ 40%]  (Warmup) 
Chain 4 Iteration:  900 / 2000 [ 45%]  (Warmup) 
Chain 4 Iteration: 1000 / 2000 [ 50%]  (Warmup) 
Chain 4 Iteration: 1001 / 2000 [ 50%]  (Sampling) 
Chain 4 Iteration: 1100 / 2000 [ 55%]  (Sampling) 
Chain 4 Iteration: 1200 / 2000 [ 60%]  (Sampling) 
Chain 4 Iteration: 1300 / 2000 [ 65%]  (Sampling) 
Chain 4 Iteration: 1400 / 2000 [ 70%]  (Sampling) 
Chain 4 Iteration: 1500 / 2000 [ 75%]  (Sampling) 
Chain 4 Iteration: 1600 / 2000 [ 80%]  (Sampling) 
Chain 4 Iteration: 1700 / 2000 [ 85%]  (Sampling) 
Chain 4 Iteration: 1800 / 2000 [ 90%]  (Sampling) 
Chain 4 Iteration: 1900 / 2000 [ 95%]  (Sampling) 
Chain 4 Iteration: 2000 / 2000 [100%]  (Sampling) 
Chain 1 finished in 0.0 seconds.
Chain 2 finished in 0.0 seconds.
Chain 3 finished in 0.0 seconds.
Chain 4 finished in 0.0 seconds.

All 4 chains finished successfully.
Mean chain execution time: 0.0 seconds.
Total execution time: 0.2 seconds.

Code

f2

 Family: bernoulli 
  Links: mu = logit 
Formula: death ~ priority + random 
   Data: d (Number of observations: 136) 
  Draws: 4 chains, each with iter = 2000; warmup = 1000; thin = 1;
         total post-warmup draws = 4000

Regression Coefficients:
                  Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
Intercept            -2.17      0.30    -2.79    -1.61 1.00     3085     2500
priorityemergency     0.71      0.50    -0.30     1.69 1.00     3789     3039
random               -0.06      0.13    -0.31     0.20 1.00     3825     3243

Draws were sampled using sample(hmc). For each parameter, Bulk_ESS
and Tail_ESS are effective sample size measures, and Rhat is the potential
scale reduction factor on split chains (at convergence, Rhat = 1).

Effect of random is greatly discounted
Posterior mean priority effect and its credible interval is virtually the same as the model that excluded random

James Rae and Nils Reimer have written a nice tutorial on using the R brms package for binary logistic regression available at bit.ly/brms-lrm