12  Logistic Model Case Study: Survival of Titanic Passengers

Data source: The Titanic Passenger List edited by Michael A. Findlay, originally published in Eaton & Haas (1994) Titanic: Triumph and Tragedy, Patrick Stephens Ltd, and expanded with the help of the Internet community. The original html files were obtained from Philip Hind (1999). The dataset was compiled and interpreted by Thomas Cason. It is available in R and spreadsheet formats from hbiostat.org/data under the name titanic3.

12.1 Descriptive Statistics

Code
require(rms)
options(prType='html')   # for print, summary, anova
getHdata(titanic3)        # get dataset from web site
# List of names of variables to analyze
v <- c('pclass','survived','age','sex','sibsp','parch')
t3 <- titanic3[, v]
units(t3$age) <- 'years'
html(describe(t3))
t3 Descriptives
t3

6 Variables   1309 Observations

pclass
image
nmissingdistinct
130903
 Value        1st   2nd   3rd
 Frequency    323   277   709
 Proportion 0.247 0.212 0.542
 

survived: Survived
nmissingdistinctInfoSumMeanGmd
1309020.7085000.3820.4725

age: Age years
image
nmissingdistinctInfoMeanGmd.05.10.25.50.75.90.95
1046263980.99929.8816.06 5142128395057
lowest : 0.1667 0.3333 0.4167 0.6667 0.7500 , highest: 70.5000 71.0000 74.0000 76.0000 80.0000
sex
nmissingdistinct
130902
 Value      female   male
 Frequency     466    843
 Proportion  0.356  0.644
 

sibsp: Number of Siblings/Spouses Aboard
image
nmissingdistinctInfoMeanGmd
1309070.670.49890.777
lowest : 0 1 2 3 4 , highest: 2 3 4 5 8
 Value          0     1     2     3     4     5     8
 Frequency    891   319    42    20    22     6     9
 Proportion 0.681 0.244 0.032 0.015 0.017 0.005 0.007
 

parch: Number of Parents/Children Aboard
image
nmissingdistinctInfoMeanGmd
1309080.5490.3850.6375
lowest : 0 1 2 3 4 , highest: 3 4 5 6 9
 Value          0     1     2     3     4     5     6     9
 Frequency   1002   170   113     8     6     6     2     2
 Proportion 0.765 0.130 0.086 0.006 0.005 0.005 0.002 0.002
 

Code
spar(ps=6,rt=3)
dd <- datadist(t3)
# describe distributions of variables to rms
options(datadist='dd')
s <- summary(survived ~ age + sex + pclass +
             cut2(sibsp,0:3) + cut2(parch,0:3), data=t3)
plot(s, main='', subtitles=FALSE)   

Figure 12.1: Univariable summaries of Titanic survival

Show 4-way relationships after collapsing levels. Suppress estimates based on \(<25\) passengers.

A
Code
tn <- transform(t3,
  agec = ifelse(age < 21, 'child', 'adult'),
  sibsp= ifelse(sibsp == 0, 'no sib/sp', 'sib/sp'),
  parch= ifelse(parch == 0, 'no par/child', 'par/child'))
g <- function(y) if(length(y) < 25) NA else mean(y)
s <- with(tn, summarize(survived, 
           llist(agec, sex, pclass, sibsp, parch), g))
# llist, summarize in Hmisc package
ggplot(subset(s, agec != 'NA'),
  aes(x=survived, y=pclass, shape=sex)) +
  geom_point() + facet_grid(agec ~ sibsp * parch) +
  xlab('Proportion Surviving') + ylab('Passenger Class') +
  scale_x_continuous(breaks=c(0, .5, 1))

Figure 12.2: Multi-way summary of Titanic survival

12.3 Binary Logistic Model with Casewise Deletion of Missing Values

First fit a model that is saturated with respect to age, sex, pclass. Insufficient variation in sibsp, parch to fit complex interactions or nonlinearities.

Code
f1 <- lrm(survived ~ sex*pclass*rcs(age,5) +
          rcs(age,5)*(sibsp + parch), data=t3)
print(f1, coefs=FALSE)
Logistic Regression Model
 lrm(formula = survived ~ sex * pclass * rcs(age, 5) + rcs(age, 
     5) * (sibsp + parch), data = t3)
 

Frequencies of Missing Values Due to Each Variable
 survived      sex   pclass      age    sibsp    parch 
        0        0        0      263        0        0 
 
Model Likelihood
Ratio Test
Discrimination
Indexes
Rank Discrim.
Indexes
Obs 1046 LR χ2 573.41 R2 0.569 C 0.883
0 619 d.f. 39 R239,1046 0.400 Dxy 0.766
1 427 Pr(>χ2) <0.0001 R239,758.1 0.506 γ 0.767
max |∂log L/∂β| 0.004 Brier 0.127 τa 0.370
Code
anova(f1)
Wald Statistics for survived
χ2 d.f. P
sex (Factor+Higher Order Factors) 187.15 15 <0.0001
All Interactions 59.74 14 <0.0001
pclass (Factor+Higher Order Factors) 100.10 20 <0.0001
All Interactions 46.51 18 0.0003
age (Factor+Higher Order Factors) 56.20 32 0.0052
All Interactions 34.57 28 0.1826
Nonlinear (Factor+Higher Order Factors) 28.66 24 0.2331
sibsp (Factor+Higher Order Factors) 19.67 5 0.0014
All Interactions 12.13 4 0.0164
parch (Factor+Higher Order Factors) 3.51 5 0.6217
All Interactions 3.51 4 0.4761
sex × pclass (Factor+Higher Order Factors) 42.43 10 <0.0001
sex × age (Factor+Higher Order Factors) 15.89 12 0.1962
Nonlinear (Factor+Higher Order Factors) 14.47 9 0.1066
Nonlinear Interaction : f(A,B) vs. AB 4.17 3 0.2441
pclass × age (Factor+Higher Order Factors) 13.47 16 0.6385
Nonlinear (Factor+Higher Order Factors) 12.92 12 0.3749
Nonlinear Interaction : f(A,B) vs. AB 6.88 6 0.3324
age × sibsp (Factor+Higher Order Factors) 12.13 4 0.0164
Nonlinear 1.76 3 0.6235
Nonlinear Interaction : f(A,B) vs. AB 1.76 3 0.6235
age × parch (Factor+Higher Order Factors) 3.51 4 0.4761
Nonlinear 1.80 3 0.6147
Nonlinear Interaction : f(A,B) vs. AB 1.80 3 0.6147
sex × pclass × age (Factor+Higher Order Factors) 8.34 8 0.4006
Nonlinear 7.74 6 0.2581
TOTAL NONLINEAR 28.66 24 0.2331
TOTAL INTERACTION 75.61 30 <0.0001
TOTAL NONLINEAR + INTERACTION 79.49 33 <0.0001
TOTAL 241.93 39 <0.0001

3-way interactions, parch clearly insignificant, so drop

Code
f <- lrm(survived ~ (sex + pclass + rcs(age,5))^2 + 
         rcs(age,5)*sibsp, data=t3)
f

Logistic Regression Model

 lrm(formula = survived ~ (sex + pclass + rcs(age, 5))^2 + rcs(age, 
     5) * sibsp, data = t3)
 


Frequencies of Missing Values Due to Each Variable

 survived      sex   pclass      age    sibsp 
        0        0        0      263        0 
 
Model Likelihood
Ratio Test
Discrimination
Indexes
Rank Discrim.
Indexes
Obs 1046 LR χ2 553.87 R2 0.555 C 0.878
0 619 d.f. 26 R226,1046 0.396 Dxy 0.756
1 427 Pr(>χ2) <0.0001 R226,758.1 0.502 γ 0.758
max |∂log L/∂β| 6×10-6 Brier 0.130 τa 0.366
β S.E. Wald Z Pr(>|Z|)
Intercept   3.3075  1.8427 1.79 0.0727
sex=male  -1.1478  1.0878 -1.06 0.2914
pclass=2nd   6.7309  3.9617 1.70 0.0893
pclass=3rd  -1.6437  1.8299 -0.90 0.3691
age   0.0886  0.1346 0.66 0.5102
age’  -0.7410  0.6513 -1.14 0.2552
age’’   4.9264  4.0047 1.23 0.2186
age’’’  -6.6129  5.4100 -1.22 0.2216
sibsp  -1.0446  0.3441 -3.04 0.0024
sex=male × pclass=2nd  -0.7682  0.7083 -1.08 0.2781
sex=male × pclass=3rd   2.1520  0.6214 3.46 0.0005
sex=male × age  -0.2191  0.0722 -3.04 0.0024
sex=male × age’   1.0842  0.3886 2.79 0.0053
sex=male × age’’  -6.5578  2.6511 -2.47 0.0134
sex=male × age’’’   8.3716  3.8532 2.17 0.0298
pclass=2nd × age  -0.5446  0.2653 -2.05 0.0401
pclass=3rd × age  -0.1634  0.1308 -1.25 0.2118
pclass=2nd × age’   1.9156  1.0189 1.88 0.0601
pclass=3rd × age’   0.8205  0.6091 1.35 0.1780
pclass=2nd × age’’  -8.9545  5.5027 -1.63 0.1037
pclass=3rd × age’’  -5.4276  3.6475 -1.49 0.1367
pclass=2nd × age’’’   9.3926  6.9559 1.35 0.1769
pclass=3rd × age’’’   7.5403  4.8519 1.55 0.1202
age × sibsp   0.0357  0.0340 1.05 0.2933
age’ × sibsp  -0.0467  0.2213 -0.21 0.8330
age’’ × sibsp   0.5574  1.6680 0.33 0.7382
age’’’ × sibsp  -1.1937  2.5711 -0.46 0.6425

Note that the adjusted Maddala-Cox-Snell \(R^2\) using an effective sample size of 758.1 is only 0.004 smaller than the full model.

Code
anova(f)
Wald Statistics for survived
χ2 d.f. P
sex (Factor+Higher Order Factors) 199.42 7 <0.0001
All Interactions 56.14 6 <0.0001
pclass (Factor+Higher Order Factors) 108.73 12 <0.0001
All Interactions 42.83 10 <0.0001
age (Factor+Higher Order Factors) 47.04 20 0.0006
All Interactions 24.51 16 0.0789
Nonlinear (Factor+Higher Order Factors) 22.72 15 0.0902
sibsp (Factor+Higher Order Factors) 19.95 5 0.0013
All Interactions 10.99 4 0.0267
sex × pclass (Factor+Higher Order Factors) 35.40 2 <0.0001
sex × age (Factor+Higher Order Factors) 10.08 4 0.0391
Nonlinear 8.17 3 0.0426
Nonlinear Interaction : f(A,B) vs. AB 8.17 3 0.0426
pclass × age (Factor+Higher Order Factors) 6.86 8 0.5516
Nonlinear 6.11 6 0.4113
Nonlinear Interaction : f(A,B) vs. AB 6.11 6 0.4113
age × sibsp (Factor+Higher Order Factors) 10.99 4 0.0267
Nonlinear 1.81 3 0.6134
Nonlinear Interaction : f(A,B) vs. AB 1.81 3 0.6134
TOTAL NONLINEAR 22.72 15 0.0902
TOTAL INTERACTION 67.58 18 <0.0001
TOTAL NONLINEAR + INTERACTION 70.68 21 <0.0001
TOTAL 253.18 26 <0.0001

Show the many effects of predictors.

B
Code
p <- Predict(f, age, sex, pclass, sibsp=0, fun=plogis)
ggplot(p)       

Figure 12.5: Effects of predictors on probability of survival of Titanic passengers, estimated for zero siblings or spouses

Code
ggplot(Predict(f, sibsp, age=c(10,15,20,50), conf.int=FALSE))
#

Figure 12.6: Effect of number of siblings and spouses on the log odds of surviving, for third class males

Note that children having many siblings apparently had lower survival. Married adults had slightly higher survival than unmarried ones. Validate the model using the bootstrap to check overfitting. Ignoring two very insignificant pooled tests.

C
Code
f <- update(f, x=TRUE, y=TRUE)
# x=TRUE, y=TRUE adds raw data to fit object so can bootstrap
set.seed(131)                  # so can replicate re-samples
html(validate(f, B=200), digits=2)
Index Original
Sample
Training
Sample
Test
Sample
Optimism Corrected
Index
\(n\)
\(D_{xy}\) 0.76 0.77 0.74 0.03 0.72 200
\(R^{2}\) 0.55 0.58 0.53 0.05 0.5 200
Intercept 0 0 -0.08 0.08 -0.08 200
Slope 1 1 0.86 0.14 0.86 200
\(E_{\max}\) 0 0 0.05 0.05 0.05 200
\(D\) 0.53 0.56 0.49 0.06 0.46 200
\(U\) 0 0 0.01 -0.01 0.01 200
\(Q\) 0.53 0.56 0.49 0.07 0.46 200
\(B\) 0.13 0.13 0.13 -0.01 0.14 200
\(g\) 2.43 2.78 2.4 0.38 2.04 200
\(g_{p}\) 0.37 0.37 0.35 0.02 0.35 200
Code
spar(ps=10, bot=1)
cal <- calibrate(f, B=200)      
plot(cal, subtitles=FALSE)

n=1046   Mean absolute error=0.011   Mean squared error=0.00016
0.9 Quantile of absolute error=0.018

Figure 12.7: Bootstrap overfitting-corrected loess nonparametric calibration curve for casewise deletion model

But moderate problem with missing data

12.4 Examining Missing Data Patterns

Code
#|label: fig-titanic-napatterns
spar(mfrow=c(2,2), top=1, ps=11)
na.patterns <- naclus(titanic3)
require(rpart)      # Recursive partitioning package
who.na <- rpart(is.na(age) ~ sex + pclass + survived +
                sibsp + parch, data=titanic3, minbucket=15)
naplot(na.patterns, 'na per var')
plot(who.na, margin=.1); text(who.na) 
plot(na.patterns)

Patterns of missing data. Upper left panel shows the fraction of observations missing on each predictor. Lower panel depicts a hierarchical cluster analysis of missingness combinations. The similarity measure shown on the \(Y\)-axis is the fraction of observations for which both variables are missing. Right panel shows the result of recursive partitioning for predicting is.na(age). The rpart function found only strong patterns according to passenger class.

Code
spar(ps=7, rt=3)
plot(summary(is.na(age) ~ sex + pclass + survived +
             sibsp + parch, data=t3))  

Figure 12.8: Univariable descriptions of proportion of passengers with missing age

Code
m <- lrm(is.na(age) ~ sex * pclass + survived + sibsp + parch,
         data=t3)
m

Logistic Regression Model

 lrm(formula = is.na(age) ~ sex * pclass + survived + sibsp + 
     parch, data = t3)
 
Model Likelihood
Ratio Test
Discrimination
Indexes
Rank Discrim.
Indexes
Obs 1309 LR χ2 114.99 R2 0.133 C 0.703
FALSE 1046 d.f. 8 R28,1309 0.078 Dxy 0.406
TRUE 263 Pr(>χ2) <0.0001 R28,630.5 0.156 γ 0.451
max |∂log L/∂β| 5×10-6 Brier 0.148 τa 0.131
β S.E. Wald Z Pr(>|Z|)
Intercept  -2.2030  0.3641 -6.05 <0.0001
sex=male   0.6440  0.3953 1.63 0.1033
pclass=2nd  -1.0079  0.6658 -1.51 0.1300
pclass=3rd   1.6124  0.3596 4.48 <0.0001
survived  -0.1806  0.1828 -0.99 0.3232
sibsp   0.0435  0.0737 0.59 0.5548
parch  -0.3526  0.1253 -2.81 0.0049
sex=male × pclass=2nd   0.1347  0.7545 0.18 0.8583
sex=male × pclass=3rd  -0.8563  0.4214 -2.03 0.0422
Code
anova(m)
Wald Statistics for is.na(age)
χ2 d.f. P
sex (Factor+Higher Order Factors) 5.61 3 0.1324
All Interactions 5.58 2 0.0614
pclass (Factor+Higher Order Factors) 68.43 4 <0.0001
All Interactions 5.58 2 0.0614
survived 0.98 1 0.3232
sibsp 0.35 1 0.5548
parch 7.92 1 0.0049
sex × pclass (Factor+Higher Order Factors) 5.58 2 0.0614
TOTAL 82.90 8 <0.0001

pclass and parch are the important predictors of missing age.

12.5 Single Conditional Mean Imputation

First try: conditional mean imputation
Default spline transformation for age caused distribution of imputed values to be much different from non-imputed ones; constrain to linear

D
Code
xtrans <- transcan(~ I(age) + sex + pclass + sibsp + parch, 
                   imputed=TRUE, pl=FALSE, pr=FALSE, data=t3)
summary(xtrans)
transcan(x = ~I(age) + sex + pclass + sibsp + parch, imputed = TRUE, 
    pr = FALSE, pl = FALSE, data = t3)

Iterations: 5 

R-squared achieved in predicting each variable:

   age    sex pclass  sibsp  parch 
 0.264  0.076  0.242  0.249  0.291 

Adjusted R-squared:

   age    sex pclass  sibsp  parch 
 0.260  0.073  0.239  0.245  0.288 

Coefficients of canonical variates for predicting each (row) variable

       age   sex   pclass sibsp parch
age           0.92  6.05  -2.02 -2.65
sex     0.03       -0.56  -0.01 -0.75
pclass  0.08 -0.26         0.03  0.28
sibsp  -0.02  0.00  0.03         0.86
parch  -0.03 -0.30  0.23   0.75      

Summary of imputed values

age 
       n  missing distinct     Info     Mean      Gmd      .05      .10 
     263        0       24     0.91    28.53    6.925    17.34    21.77 
     .25      .50      .75      .90      .95 
   26.17    28.10    28.10    42.77    42.77 

lowest :  9.82894 11.75710 13.22440 15.15250 17.28300
highest: 33.24650 34.73840 38.63790 40.83950 42.76770

Starting estimates for imputed values:

   age    sex pclass  sibsp  parch 
    28      2      3      0      0 
Code
# Look at mean imputed values by sex,pclass and observed means
# age.i is age, filled in with conditional mean estimates
age.i <- with(t3, impute(xtrans, age, data=t3))
i <- is.imputed(age.i)
with(t3, tapply(age.i[i], list(sex[i],pclass[i]), mean))
            1st      2nd      3rd
female 39.08396 31.31831 23.10548
male   42.76765 33.24650 26.87451
Code
with(t3, tapply(age, list(sex,pclass), mean, na.rm=TRUE))
            1st      2nd      3rd
female 37.03759 27.49919 22.18531
male   41.02925 30.81540 25.96227
Code
dd   <- datadist(dd, age.i)
f.si <- lrm(survived ~ (sex + pclass + rcs(age.i,5))^2 + 
            rcs(age.i,5)*sibsp, data=t3)
print(f.si, coefs=FALSE)
Logistic Regression Model
 lrm(formula = survived ~ (sex + pclass + rcs(age.i, 5))^2 + rcs(age.i, 
     5) * sibsp, data = t3)
 
Model Likelihood
Ratio Test
Discrimination
Indexes
Rank Discrim.
Indexes
Obs 1309 LR χ2 640.85 R2 0.526 C 0.861
0 809 d.f. 26 R226,1309 0.375 Dxy 0.722
1 500 Pr(>χ2) <0.0001 R226,927 0.485 γ 0.726
max |∂log L/∂β| 0.0004 Brier 0.133 τa 0.341
Code
spar(ps=12)
p1 <- Predict(f,    age,   pclass, sex, sibsp=0, fun=plogis)
p2 <- Predict(f.si, age.i, pclass, sex, sibsp=0, fun=plogis)
p  <- rbind('Casewise Deletion'=p1, 'Single Imputation'=p2,
            rename=c(age.i='age'))   # creates .set. variable
ggplot(p, groups='sex', ylab='Probability of Surviving')

Figure 12.9: Predicted probability of survival for males from fit using casewise deletion (bottom) and single conditional mean imputation (top). sibsp is set to zero for these predicted values.

Code
anova(f.si)
Wald Statistics for survived
χ2 d.f. P
sex (Factor+Higher Order Factors) 245.39 7 <0.0001
All Interactions 52.85 6 <0.0001
pclass (Factor+Higher Order Factors) 112.07 12 <0.0001
All Interactions 36.79 10 <0.0001
age.i (Factor+Higher Order Factors) 49.32 20 0.0003
All Interactions 25.62 16 0.0595
Nonlinear (Factor+Higher Order Factors) 19.71 15 0.1835
sibsp (Factor+Higher Order Factors) 22.02 5 0.0005
All Interactions 12.28 4 0.0154
sex × pclass (Factor+Higher Order Factors) 30.29 2 <0.0001
sex × age.i (Factor+Higher Order Factors) 8.91 4 0.0633
Nonlinear 5.62 3 0.1319
Nonlinear Interaction : f(A,B) vs. AB 5.62 3 0.1319
pclass × age.i (Factor+Higher Order Factors) 6.05 8 0.6421
Nonlinear 5.44 6 0.4888
Nonlinear Interaction : f(A,B) vs. AB 5.44 6 0.4888
age.i × sibsp (Factor+Higher Order Factors) 12.28 4 0.0154
Nonlinear 2.05 3 0.5614
Nonlinear Interaction : f(A,B) vs. AB 2.05 3 0.5614
TOTAL NONLINEAR 19.71 15 0.1835
TOTAL INTERACTION 67.00 18 <0.0001
TOTAL NONLINEAR + INTERACTION 69.53 21 <0.0001
TOTAL 305.74 26 <0.0001

12.6 Multiple Imputation

The following uses aregImpute with predictive mean matching. By default, aregImpute does not transform age when it is being predicted from the other variables. Four knots are used to transform age when used to impute other variables (not needed here as no other missings were present). Since the fraction of observations with missing age is \(\frac{263}{1309} = 0.2\) we use 20 imputations.

Code
set.seed(17)         # so can reproduce random aspects
mi <- aregImpute(~ age + sex + pclass + 
                 sibsp + parch + survived,
                 data=t3, n.impute=20, nk=4, pr=FALSE)
mi

Multiple Imputation using Bootstrap and PMM

aregImpute(formula = ~age + sex + pclass + sibsp + parch + survived, 
    data = t3, n.impute = 20, nk = 4, pr = FALSE)

n: 1309     p: 6    Imputations: 20     nk: 4 

Number of NAs:
     age      sex   pclass    sibsp    parch survived 
     263        0        0        0        0        0 

         type d.f.
age         s    1
sex         c    1
pclass      c    2
sibsp       s    3
parch       s    3
survived    l    1

Transformation of Target Variables Forced to be Linear

R-squares for Predicting Non-Missing Values for Each Variable
Using Last Imputations of Predictors
  age 
0.373 
Code
# Print the first 10 imputations for the first 10 passengers
#  having missing age
mi$imputed$age[1:10, 1:10]
    [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10]
16    41 47.0   24   44 60.0   47 28.0   29   49    17
38    53 44.0   76   59 35.0   39 16.0   54   19    29
41    45 46.0   28   40 50.0   61 19.0   63   18    61
47    31 28.5   33   35 61.0   55 45.5   38   41    47
60    35 40.0   49   41 27.0   36 51.0    2   33    27
70    30 30.0   16   53 56.0   70 17.0   38   45    51
71    55 36.0   36   42 42.0   33 65.0   46   39    57
75    24 36.0   47   49 45.5   47 47.0   38   55    56
81    60 45.0   46   28 55.0   42 45.0   61   33    45
107   46 29.0   40   58 71.0   58 47.0   63   61    56

Show the distribution of imputed (black) and actual ages (gray).

E
Code
plot(mi)
Ecdf(t3$age, add=TRUE, col='gray', lwd=2,
     subtitles=FALSE)

Figure 12.10: Distributions of imputed and actual ages for the Titanic dataset. Imputed values are in black and actual ages in gray.

Fit logistic models for 20 completed datasets and print the ratio of imputation-corrected variances to average ordinary variances

F
Code
f.mi <- fit.mult.impute(
  survived ~ (sex + pclass + rcs(age,5))^2 +
  rcs(age,5)*sibsp,
  lrm, mi, data=t3, pr=FALSE)
anova(f.mi)
Wald Statistics for survived
χ2 d.f. P
sex (Factor+Higher Order Factors) 237.81 7 <0.0001
All Interactions 53.44 6 <0.0001
pclass (Factor+Higher Order Factors) 113.77 12 <0.0001
All Interactions 38.60 10 <0.0001
age (Factor+Higher Order Factors) 49.97 20 0.0002
All Interactions 26.00 16 0.0540
Nonlinear (Factor+Higher Order Factors) 23.03 15 0.0835
sibsp (Factor+Higher Order Factors) 25.08 5 0.0001
All Interactions 13.42 4 0.0094
sex × pclass (Factor+Higher Order Factors) 32.70 2 <0.0001
sex × age (Factor+Higher Order Factors) 10.54 4 0.0322
Nonlinear 8.40 3 0.0384
Nonlinear Interaction : f(A,B) vs. AB 8.40 3 0.0384
pclass × age (Factor+Higher Order Factors) 5.53 8 0.6996
Nonlinear 4.67 6 0.5870
Nonlinear Interaction : f(A,B) vs. AB 4.67 6 0.5870
age × sibsp (Factor+Higher Order Factors) 13.42 4 0.0094
Nonlinear 2.11 3 0.5492
Nonlinear Interaction : f(A,B) vs. AB 2.11 3 0.5492
TOTAL NONLINEAR 23.03 15 0.0835
TOTAL INTERACTION 66.42 18 <0.0001
TOTAL NONLINEAR + INTERACTION 69.10 21 <0.0001
TOTAL 294.26 26 <0.0001

The Wald \(\chi^2\) for age is reduced by accounting for imputation but is increased by using patterns of association with survival status to impute missing age. Show estimated effects of age by classes.

G
Code
spar(ps=12)
p1 <- Predict(f.si,  age.i, pclass, sex, sibsp=0, fun=plogis)
p2 <- Predict(f.mi,  age,   pclass, sex, sibsp=0, fun=plogis)
p  <- rbind('Single Imputation'=p1, 'Multiple Imputation'=p2,
            rename=c(age.i='age'))
ggplot(p, groups='sex', ylab='Probability of Surviving')

Figure 12.11: Predicted probability of survival for males from fit using single conditional mean imputation again (top) and multiple random draw imputation (bottom). Both sets of predictions are for sibsp=0.

12.7 Summarizing the Fitted Model

Show odds ratios for changes in predictor values

H
Code
spar(bot=1, top=0.5)
# Get predicted values for certain types of passengers
s <- summary(f.mi, age=c(1,30), sibsp=0:1)  
# override default ranges for 3 variables
plot(s, log=TRUE, main='')             

Figure 12.12: Odds ratios for some predictor settings

Code
phat <- predict(f.mi,
                combos <-
         expand.grid(age=c(2,21,50),sex=levels(t3$sex),
                     pclass=levels(t3$pclass),
                     sibsp=0), type='fitted')
# Can also use Predict(f.mi, age=c(2,21,50), sex, pclass,
#                      sibsp=0, fun=plogis)$yhat
options(digits=1)
data.frame(combos, phat)
   age    sex pclass sibsp phat
1    2 female    1st     0 0.97
2   21 female    1st     0 0.98
3   50 female    1st     0 0.97
4    2   male    1st     0 0.89
5   21   male    1st     0 0.47
6   50   male    1st     0 0.26
7    2 female    2nd     0 1.00
8   21 female    2nd     0 0.90
9   50 female    2nd     0 0.81
10   2   male    2nd     0 1.00
11  21   male    2nd     0 0.08
12  50   male    2nd     0 0.03
13   2 female    3rd     0 0.85
14  21 female    3rd     0 0.57
15  50 female    3rd     0 0.35
16   2   male    3rd     0 0.91
17  21   male    3rd     0 0.13
18  50   male    3rd     0 0.05
Code
options(digits=5)

We can also get predicted values by creating an S function that will evaluate the model on demand.

I
Code
pred.logit <- Function(f.mi)
# Note: if don't define sibsp to pred.logit, defaults to 0
pred.logit
function (sex = "male", pclass = "3rd", age = 28, sibsp = 0) 
{
    3.373079 - 1.0484795 * (sex == "male") + 5.8078168 * (pclass == 
        "2nd") - 1.4370771 * (pclass == "3rd") + 0.078347318 * 
        age - 0.00027150053 * pmax(age - 6, 0)^3 + 0.0017093284 * 
        pmax(age - 21, 0)^3 - 0.0023751505 * pmax(age - 28, 0)^3 + 
        0.0010126373 * pmax(age - 36, 0)^3 - 7.5314668e-05 * 
        pmax(age - 56, 0)^3 - 1.1799235 * sibsp + (sex == "male") * 
        (-0.47754081 * (pclass == "2nd") + 2.0665924 * (pclass == 
            "3rd")) + (sex == "male") * (-0.21884197 * age + 
        0.00042463444 * pmax(age - 6, 0)^3 - 0.0023860246 * pmax(age - 
        21, 0)^3 + 0.0030996682 * pmax(age - 28, 0)^3 - 0.0012255784 * 
        pmax(age - 36, 0)^3 + 8.7300463e-05 * pmax(age - 56, 
        0)^3) + (pclass == "2nd") * (-0.47647131 * age + 0.00068483 * 
        pmax(age - 6, 0)^3 - 0.0029990417 * pmax(age - 21, 0)^3 + 
        0.0031221255 * pmax(age - 28, 0)^3 - 0.00083472782 * 
        pmax(age - 36, 0)^3 + 2.6813959e-05 * pmax(age - 56, 
        0)^3) + (pclass == "3rd") * (-0.16335774 * age + 0.00030986546 * 
        pmax(age - 6, 0)^3 - 0.0018174716 * pmax(age - 21, 0)^3 + 
        0.002491657 * pmax(age - 28, 0)^3 - 0.0010824082 * pmax(age - 
        36, 0)^3 + 9.8357307e-05 * pmax(age - 56, 0)^3) + sibsp * 
        (0.042368037 * age - 2.0590588e-05 * pmax(age - 6, 0)^3 + 
            0.00017884536 * pmax(age - 21, 0)^3 - 0.00039201911 * 
            pmax(age - 28, 0)^3 + 0.00028732385 * pmax(age - 
            36, 0)^3 - 5.3559508e-05 * pmax(age - 56, 0)^3)
}
<environment: 0x55dc70f19a50>
Code
# Run the newly created function
plogis(pred.logit(age=c(2,21,50), sex='male', pclass='3rd'))
[1] 0.912648 0.134219 0.050343

A nomogram could be used to obtain predicted values manually, but this is not feasible when so many interaction terms are present.

J

12.8 Bayesian Analysis

  • Repeat the multiple imputation-based approach but using a Bayesian binary logistic model
  • Using default blrm function normal priors on regression coefficients with zero mean and large SD making the priors almost flat
  • blrm uses the rstan package that provides the full power of Stan to R
  • Could use smaller SDs to get penalized estimates
  • Using 4 independent Markov chain Hamiltonion posterior sampling procedures each with 1000 burn-in iterations that are discarded, and 1000 “real” iterations for a total of 4000 posterior sample draws
  • Use the first 10 multiple imputations already developed above (object mi), running the Bayesian procedure separately for 10 completed datasets
  • Merely have to stack the posterior draws into one giant sample to account for imputation and get correct posterior distribution
K
Code
# Use all available CPU cores.  Each chain will be run on its
# own core.
# fitIf function only re-runs the model if bt.rds file doesn't exist
require(rmsb)
options(mc.cores=parallel::detectCores())
getRs('fitIf.r', put='source')
# 10 Bayesian analysis took 5m on 4 cores
set.seed(1)
fitIf(bt <- stackMI(survived ~ (sex + pclass + rcs(age, 5)) ^ 2 +
                    rcs(age, 5) * sibsp,
                    blrm, mi, data=t3, n.impute=10, refresh=25) )
bt

Bayesian Logistic Model

Dirichlet Priors With Concentration Parameter 0.541 for Intercepts

 stackMI(formula = survived ~ (sex + pclass + rcs(age, 5))^2 + 
     rcs(age, 5) * sibsp, fitter = blrm, xtrans = mi, data = t3, 
     n.impute = 10, refresh = 25)
 
Mixed Calibration/
Discrimination Indexes
Discrimination
Indexes
Rank Discrim.
Indexes
Obs 1309 B 0.134 [0.131, 0.136] g 2.539 [2.229, 2.858] C 0.867 [0.861, 0.872]
0 809 gp 0.358 [0.341, 0.373] Dxy 0.734 [0.723, 0.745]
1 500 EV 0.463 [0.422, 0.503]
Draws 40000 v 5.885 [4.157, 8.181]
Chains 4 vp 0.109 [0.1, 0.118]
Imputations 10
p 26
Mean β Median β S.E. Lower Upper Pr(β>0) Symmetry
Intercept   4.2256   4.0181  2.1182   0.3097   8.4753  0.9921  1.33
sex=male  -1.1505  -1.1362  1.0674   -3.2620   0.9230  0.1374  0.98
pclass=2nd   6.7436   6.3237  4.1646   -0.9114  15.3055  0.9694  1.34
pclass=3rd  -2.1347  -1.9380  2.1004   -6.3520   1.7840  0.1407  0.75
age   0.0382   0.0491  0.1471   -0.2592   0.3171  0.6336  0.81
age’  -0.5684  -0.5853  0.7115   -1.9658   0.8117  0.2067  1.07
age’’   3.8317   3.8511  4.0698   -3.8496  12.1277  0.8287  0.97
age’’’  -5.4189  -5.4252  5.6309  -16.6913   5.3325  0.1654  1.00
sibsp  -1.2754  -1.2591  0.3485   -1.9512  -0.5918  0.0000  0.87
sex=male × pclass=2nd  -0.5085  -0.5209  0.7331   -1.9222   0.9603  0.2372  1.06
sex=male × pclass=3rd   2.2239   2.1987  0.6346   1.0235   3.4871  1.0000  1.13
sex=male × age  -0.2224  -0.2216  0.0679   -0.3579  -0.0907  0.0006  0.98
sex=male × age’   1.0566   1.0550  0.3919   0.2954   1.8350  0.9970  1.02
sex=male × age’’  -5.7767  -5.7699  2.5942  -10.7636  -0.5762  0.0124  0.98
sex=male × age’’’   7.2647   7.2379  3.9670   -0.5857  14.9241  0.9675  1.02
pclass=2nd × age  -0.5420  -0.5185  0.2710   -1.1172  -0.0554  0.0118  0.78
pclass=3rd × age  -0.1310  -0.1414  0.1447   -0.4068   0.1631  0.1712  1.24
pclass=2nd × age’   1.9521   1.9002  1.0843   -0.0598   4.1981  0.9722  1.14
pclass=3rd × age’   0.6805   0.7012  0.6815   -0.6765   1.9988  0.8438  0.91
pclass=2nd × age’’  -8.5030  -8.3619  5.6139  -19.6687   2.2496  0.0607  0.94
pclass=3rd × age’’  -4.1617  -4.2095  3.8193  -11.4759   3.4303  0.1368  1.06
pclass=2nd × age’’’   8.7236   8.6409  7.3677   -5.4246  23.3336  0.8841  1.03
pclass=3rd × age’’’   5.7495   5.7856  5.2019   -4.6609  15.7567  0.8664  0.98
age × sibsp   0.0438   0.0433  0.0323   -0.0176   0.1085  0.9142  1.03
age’ × sibsp  -0.0206  -0.0254  0.2259   -0.4571   0.4276  0.4555  1.05
age’’ × sibsp   0.1931   0.2060  1.6409   -3.1399   3.2789  0.5508  0.97
age’’’ × sibsp  -0.6045  -0.6143  2.6561   -5.7432   4.6123  0.4089  1.01
  • Note that fit indexes have HPD uncertainty intervals
  • Everthing above accounts for imputation
  • Look at diagnostics
L
Code
stanDx(bt)
Diagnostics for each of 10 imputations

Iterations: 2000 on each of 4 chains, with 4000 posterior distribution samples saved

For each parameter, n_eff is a crude measure of effective sample size
and Rhat is the potential scale reduction factor on split chains
(at convergence, Rhat=1)
                                     n_eff  Rhat
Imputation 1: Intercept               1893 1.002
Imputation 1: sex=male                2701 1.003
Imputation 1: pclass=2nd              1134 1.003
Imputation 1: pclass=3rd              2980 1.001
Imputation 1: age                     1222 1.003
Imputation 1: age'                     820 1.004
Imputation 1: age''                   1155 1.003
Imputation 1: age'''                  1849 1.002
Imputation 1: sibsp                   2809 1.000
Imputation 1: sex=male * pclass=2nd   2625 1.000
Imputation 1: sex=male * pclass=3rd   2500 1.001
Imputation 1: sex=male * age          3696 1.001
Imputation 1: sex=male * age'         2899 1.002
Imputation 1: sex=male * age''        3932 1.000
Imputation 1: sex=male * age'''       3819 1.000
Imputation 1: pclass=2nd * age         940 1.004
Imputation 1: pclass=3rd * age        1659 1.002
Imputation 1: pclass=2nd * age'        858 1.004
Imputation 1: pclass=3rd * age'       2553 1.000
Imputation 1: pclass=2nd * age''       918 1.004
Imputation 1: pclass=3rd * age''      3194 1.000
Imputation 1: pclass=2nd * age'''     1674 1.001
Imputation 1: pclass=3rd * age'''     4601 1.000
Imputation 1: age * sibsp             4958 0.999
Imputation 1: age' * sibsp            5299 1.000
Imputation 1: age'' * sibsp           4119 1.001
Imputation 1: age''' * sibsp          5121 1.000
Imputation 2: Intercept               2273 1.000
Imputation 2: sex=male                2970 1.000
Imputation 2: pclass=2nd              1400 1.000
Imputation 2: pclass=3rd              2848 1.000
Imputation 2: age                     1284 1.003
Imputation 2: age'                    1342 1.001
Imputation 2: age''                   1469 1.002
Imputation 2: age'''                  2963 1.000
Imputation 2: sibsp                   2809 1.000
Imputation 2: sex=male * pclass=2nd   3257 1.001
Imputation 2: sex=male * pclass=3rd   2656 1.001
Imputation 2: sex=male * age          3892 1.000
Imputation 2: sex=male * age'         3448 1.000
Imputation 2: sex=male * age''        3958 1.001
Imputation 2: sex=male * age'''       4433 1.000
Imputation 2: pclass=2nd * age        1164 1.002
Imputation 2: pclass=3rd * age        2334 1.001
Imputation 2: pclass=2nd * age'       1041 1.001
Imputation 2: pclass=3rd * age'       3052 1.001
Imputation 2: pclass=2nd * age''      1106 1.002
Imputation 2: pclass=3rd * age''      3354 1.000
Imputation 2: pclass=2nd * age'''     2349 1.000
Imputation 2: pclass=3rd * age'''     4145 1.000
Imputation 2: age * sibsp             4272 1.000
Imputation 2: age' * sibsp            4422 1.000
Imputation 2: age'' * sibsp           3680 1.000
Imputation 2: age''' * sibsp          3928 1.000
Imputation 3: Intercept               2562 1.001
Imputation 3: sex=male                3250 1.001
Imputation 3: pclass=2nd              1784 1.000
Imputation 3: pclass=3rd              2841 1.001
Imputation 3: age                     1671 1.001
Imputation 3: age'                    1366 1.002
Imputation 3: age''                   1445 1.001
Imputation 3: age'''                  2562 1.001
Imputation 3: sibsp                   2473 1.001
Imputation 3: sex=male * pclass=2nd   3265 1.000
Imputation 3: sex=male * pclass=3rd   2954 1.001
Imputation 3: sex=male * age          3858 1.000
Imputation 3: sex=male * age'         3747 1.001
Imputation 3: sex=male * age''        4405 1.000
Imputation 3: sex=male * age'''       3866 1.000
Imputation 3: pclass=2nd * age        1248 1.002
Imputation 3: pclass=3rd * age        2209 1.001
Imputation 3: pclass=2nd * age'       1257 1.002
Imputation 3: pclass=3rd * age'       3061 1.001
Imputation 3: pclass=2nd * age''      1293 1.002
Imputation 3: pclass=3rd * age''      4078 1.001
Imputation 3: pclass=2nd * age'''     2271 1.000
Imputation 3: pclass=3rd * age'''     4108 1.000
Imputation 3: age * sibsp             4388 1.000
Imputation 3: age' * sibsp            4964 1.000
Imputation 3: age'' * sibsp           4204 1.001
Imputation 3: age''' * sibsp          4343 1.000
Imputation 4: Intercept               2318 1.004
Imputation 4: sex=male                3301 1.001
Imputation 4: pclass=2nd              1580 1.003
Imputation 4: pclass=3rd              3025 1.001
Imputation 4: age                     1148 1.008
Imputation 4: age'                    1367 1.004
Imputation 4: age''                   1296 1.007
Imputation 4: age'''                  2533 1.003
Imputation 4: sibsp                   2843 1.001
Imputation 4: sex=male * pclass=2nd   2889 1.001
Imputation 4: sex=male * pclass=3rd   2954 1.001
Imputation 4: sex=male * age          4235 1.001
Imputation 4: sex=male * age'         4190 1.000
Imputation 4: sex=male * age''        4021 1.000
Imputation 4: sex=male * age'''       3700 1.001
Imputation 4: pclass=2nd * age        1084 1.008
Imputation 4: pclass=3rd * age        2411 1.002
Imputation 4: pclass=2nd * age'       1063 1.007
Imputation 4: pclass=3rd * age'       3393 1.001
Imputation 4: pclass=2nd * age''      1290 1.005
Imputation 4: pclass=3rd * age''      3973 1.000
Imputation 4: pclass=2nd * age'''     1797 1.005
Imputation 4: pclass=3rd * age'''     4022 1.000
Imputation 4: age * sibsp             3798 1.001
Imputation 4: age' * sibsp            4621 0.999
Imputation 4: age'' * sibsp           3509 1.001
Imputation 4: age''' * sibsp          3924 1.000
Imputation 5: Intercept               2186 1.000
Imputation 5: sex=male                2416 1.002
Imputation 5: pclass=2nd              1978 1.001
Imputation 5: pclass=3rd              2445 1.000
Imputation 5: age                     1708 1.001
Imputation 5: age'                    1418 1.001
Imputation 5: age''                   1716 1.001
Imputation 5: age'''                  2791 1.001
Imputation 5: sibsp                   3078 1.000
Imputation 5: sex=male * pclass=2nd   2698 1.001
Imputation 5: sex=male * pclass=3rd   2186 1.000
Imputation 5: sex=male * age          3246 0.999
Imputation 5: sex=male * age'         3216 1.001
Imputation 5: sex=male * age''        3887 1.000
Imputation 5: sex=male * age'''       3875 1.000
Imputation 5: pclass=2nd * age        1389 1.003
Imputation 5: pclass=3rd * age        2497 1.000
Imputation 5: pclass=2nd * age'       1270 1.002
Imputation 5: pclass=3rd * age'       2842 1.000
Imputation 5: pclass=2nd * age''      1336 1.002
Imputation 5: pclass=3rd * age''      3537 1.000
Imputation 5: pclass=2nd * age'''     2111 1.001
Imputation 5: pclass=3rd * age'''     3827 1.000
Imputation 5: age * sibsp             4005 1.000
Imputation 5: age' * sibsp            3875 1.000
Imputation 5: age'' * sibsp           3458 1.000
Imputation 5: age''' * sibsp          3252 1.000
Imputation 6: Intercept               1918 1.000
Imputation 6: sex=male                2662 1.000
Imputation 6: pclass=2nd              1700 1.002
Imputation 6: pclass=3rd              2640 1.001
Imputation 6: age                     1588 1.001
Imputation 6: age'                    1333 1.004
Imputation 6: age''                   1388 1.000
Imputation 6: age'''                  2471 1.002
Imputation 6: sibsp                   2706 1.003
Imputation 6: sex=male * pclass=2nd   3197 1.001
Imputation 6: sex=male * pclass=3rd   2396 1.000
Imputation 6: sex=male * age          3576 1.000
Imputation 6: sex=male * age'         3432 1.000
Imputation 6: sex=male * age''        3663 1.000
Imputation 6: sex=male * age'''       3859 1.000
Imputation 6: pclass=2nd * age        1108 1.002
Imputation 6: pclass=3rd * age        2216 1.001
Imputation 6: pclass=2nd * age'       1191 1.003
Imputation 6: pclass=3rd * age'       3002 1.000
Imputation 6: pclass=2nd * age''      1198 1.003
Imputation 6: pclass=3rd * age''      3764 1.001
Imputation 6: pclass=2nd * age'''     2150 1.000
Imputation 6: pclass=3rd * age'''     4091 1.000
Imputation 6: age * sibsp             4157 1.000
Imputation 6: age' * sibsp            4443 1.001
Imputation 6: age'' * sibsp           4338 1.000
Imputation 6: age''' * sibsp          4307 1.000
Imputation 7: Intercept               2689 1.001
Imputation 7: sex=male                3054 1.000
Imputation 7: pclass=2nd              1571 1.001
Imputation 7: pclass=3rd              3453 1.001
Imputation 7: age                     1270 1.002
Imputation 7: age'                    1283 1.001
Imputation 7: age''                   1440 1.002
Imputation 7: age'''                  2645 1.000
Imputation 7: sibsp                   2437 1.000
Imputation 7: sex=male * pclass=2nd   2924 1.000
Imputation 7: sex=male * pclass=3rd   3257 1.000
Imputation 7: sex=male * age          3571 0.999
Imputation 7: sex=male * age'         3862 1.000
Imputation 7: sex=male * age''        4562 1.001
Imputation 7: sex=male * age'''       3713 1.000
Imputation 7: pclass=2nd * age        1122 1.000
Imputation 7: pclass=3rd * age        2353 1.000
Imputation 7: pclass=2nd * age'       1220 1.002
Imputation 7: pclass=3rd * age'       3362 1.000
Imputation 7: pclass=2nd * age''      1207 1.001
Imputation 7: pclass=3rd * age''      3626 1.000
Imputation 7: pclass=2nd * age'''     2059 1.000
Imputation 7: pclass=3rd * age'''     4141 0.999
Imputation 7: age * sibsp             4111 1.000
Imputation 7: age' * sibsp            4457 1.000
Imputation 7: age'' * sibsp           3951 1.000
Imputation 7: age''' * sibsp          4693 1.000
Imputation 8: Intercept               2613 1.000
Imputation 8: sex=male                3091 1.000
Imputation 8: pclass=2nd              1854 1.001
Imputation 8: pclass=3rd              3099 1.000
Imputation 8: age                     1711 1.001
Imputation 8: age'                    1536 1.000
Imputation 8: age''                   1653 1.003
Imputation 8: age'''                  3006 1.000
Imputation 8: sibsp                   2992 1.000
Imputation 8: sex=male * pclass=2nd   3087 1.000
Imputation 8: sex=male * pclass=3rd   2955 1.000
Imputation 8: sex=male * age          3602 1.000
Imputation 8: sex=male * age'         3565 1.000
Imputation 8: sex=male * age''        4107 1.000
Imputation 8: sex=male * age'''       3899 1.000
Imputation 8: pclass=2nd * age        1372 1.003
Imputation 8: pclass=3rd * age        2605 1.000
Imputation 8: pclass=2nd * age'       1225 1.001
Imputation 8: pclass=3rd * age'       3228 1.001
Imputation 8: pclass=2nd * age''      1498 1.002
Imputation 8: pclass=3rd * age''      3676 1.000
Imputation 8: pclass=2nd * age'''     2439 1.000
Imputation 8: pclass=3rd * age'''     4109 1.000
Imputation 8: age * sibsp             4696 0.999
Imputation 8: age' * sibsp            4108 0.999
Imputation 8: age'' * sibsp           3559 1.001
Imputation 8: age''' * sibsp          4186 0.999
Imputation 9: Intercept               2003 1.001
Imputation 9: sex=male                2746 1.000
Imputation 9: pclass=2nd              1608 1.000
Imputation 9: pclass=3rd              2721 1.000
Imputation 9: age                     1120 1.000
Imputation 9: age'                    1117 1.001
Imputation 9: age''                   1208 1.000
Imputation 9: age'''                  2558 1.000
Imputation 9: sibsp                   2790 1.001
Imputation 9: sex=male * pclass=2nd   2926 1.000
Imputation 9: sex=male * pclass=3rd   2527 1.000
Imputation 9: sex=male * age          3637 0.999
Imputation 9: sex=male * age'         3443 1.000
Imputation 9: sex=male * age''        4079 1.001
Imputation 9: sex=male * age'''       3738 0.999
Imputation 9: pclass=2nd * age        1055 1.001
Imputation 9: pclass=3rd * age        2429 1.000
Imputation 9: pclass=2nd * age'       1033 1.000
Imputation 9: pclass=3rd * age'       2787 1.000
Imputation 9: pclass=2nd * age''       999 1.001
Imputation 9: pclass=3rd * age''      3656 1.001
Imputation 9: pclass=2nd * age'''     1945 1.000
Imputation 9: pclass=3rd * age'''     4218 1.000
Imputation 9: age * sibsp             3939 1.000
Imputation 9: age' * sibsp            3803 1.000
Imputation 9: age'' * sibsp           4033 1.000
Imputation 9: age''' * sibsp          4072 1.002
Imputation 10: Intercept              3144 0.999
Imputation 10: sex=male               3086 1.000
Imputation 10: pclass=2nd             3889 1.000
Imputation 10: pclass=3rd             2786 1.000
Imputation 10: age                    3318 0.999
Imputation 10: age'                   3045 1.001
Imputation 10: age''                  4284 1.000
Imputation 10: age'''                 3578 1.001
Imputation 10: sibsp                  3231 1.001
Imputation 10: sex=male * pclass=2nd  3256 1.000
Imputation 10: sex=male * pclass=3rd  2510 1.000
Imputation 10: sex=male * age         3326 0.999
Imputation 10: sex=male * age'        2981 1.000
Imputation 10: sex=male * age''       4029 0.999
Imputation 10: sex=male * age'''      4459 1.000
Imputation 10: pclass=2nd * age       2750 1.000
Imputation 10: pclass=3rd * age       3317 1.000
Imputation 10: pclass=2nd * age'      3061 1.001
Imputation 10: pclass=3rd * age'      3229 1.001
Imputation 10: pclass=2nd * age''     3416 0.999
Imputation 10: pclass=3rd * age''     4054 1.001
Imputation 10: pclass=2nd * age'''    4637 0.999
Imputation 10: pclass=3rd * age'''    4616 1.000
Imputation 10: age * sibsp            4719 1.000
Imputation 10: age' * sibsp           4944 1.000
Imputation 10: age'' * sibsp          4028 1.000
Imputation 10: age''' * sibsp         4210 1.000
Code
# Look at convergence of only 2 parameters
stanDxplot(bt, c('sex=male', 'pclass=3rd', 'age'), rev=TRUE)

  • Difficult to see but there are 40 traces (10 imputations \(\times\) 4 chains)
  • Diagnostics look good; posterior samples can be trusted
  • Plot posterior densities for select parameters
  • Also shows the 10 densities before stacking
M
Code
plot(bt, c('sex=male', 'pclass=3rd', 'age'), nrow=2)

  • Plot partial effect plots with 0.95 highest posterior density intervals
N
Code
p <- Predict(bt, age, sex, pclass, sibsp=0, fun=plogis, funint=FALSE)
ggplot(p)

  • Compute approximate measure of explained outcome variation for predictors
O
Code
plot(anova(bt))

  • Contrast second class males and females, both at 5 years and 30 years of age, all other things being equal
  • Compute 0.95 HPD interval for the contrast and a joint uncertainty region
  • Compute P(both contrasts < 0), both < -2, and P(either one < 0)
P
Code
k <- contrast(bt, list(sex='male',   age=c(5, 30), pclass='2nd'),
                  list(sex='female', age=c(5, 30), pclass='2nd'),
              cnames = c('age 5 M-F', 'age 30 M-F'))
k
            age Contrast    S.E.   Lower   Upper Pr(Contrast>0)
1age 5 M-F    5  -2.7709 0.78074 -4.2774 -1.2136          3e-04
2age 30 M-F  30  -4.1487 0.51361 -5.1578 -3.1551          0e+00

Intervals are 0.95 highest posterior density intervals
Contrast is the posterior mean 
Code
plot(k)

Code
plot(k, bivar=TRUE)                        # assumes an ellipse
plot(k, bivar=TRUE, bivarmethod='kernel')  # doesn't
P <- PostF(k, pr=TRUE)
Contrast names: age 5 M-F, age 30 M-F 
Code
P(`age 5 M-F` <  0 & `age 30 M-F` <  0)    # note backticks
[1] 0.99972
Code
P(`age 5 M-F` < -2 & `age 30 M-F` < -2)
[1] 0.83813
Code
P(`age 5 M-F` <  0 | `age 30 M-F` <  0)
[1] 1

R software used
Package Purpose Functions
Hmisc Miscellaneous functions summary,plsmo,naclus,llist,latex, summarize,Dotplot,describe
Hmisc Imputation transcan,impute,fit.mult.impute,aregImpute,stackMI
rms Modeling datadist,lrm,blrm,rcs
Model presentation plot,summary,nomogram,Function,anova
Estimation Predict,summary,contrast
Model validation validate,calibrate
Misc. Bayesian stanDx,stanDxplot,plot
rpart1 Recursive partitioning rpart
  • 1 Written by Atkinson and Therneau