12 Logistic Model Case Study: Survival of Titanic Passengers

Data source: The Titanic Passenger List edited by Michael A. Findlay, originally published in Eaton & Haas (1994) Titanic: Triumph and Tragedy, Patrick Stephens Ltd, and expanded with the help of the Internet community. The original html files were obtained from Philip Hind (1999). The dataset was compiled and interpreted by Thomas Cason. It is available in R and spreadsheet formats from hbiostat.org/data under the name titanic3.

12.1 Descriptive Statistics

Code

require(rms)
options(prType='html')   # for print, summary, anova
getHdata(titanic3)        # get dataset from web site
# List of names of variables to analyze
v <- c('pclass','survived','age','sex','sibsp','parch')
t3 <- titanic3[, v]
units(t3$age) <- 'years'
describe(t3)

t3 Descriptives

t3

6 Variables 1309 Observations

pclass

n	missing	distinct
1309	0	3

 Value        1st   2nd   3rd
 Frequency    323   277   709
 Proportion 0.247 0.212 0.542

survived: Survived

n	missing	distinct	Info	Sum	Mean	Gmd
1309	0	2	0.708	500	0.382	0.4725

age: Age years

n	missing	distinct	Info	Mean	Gmd	.05	.10	.25	.50	.75	.90	.95
1046	263	98	0.999	29.88	16.06	5	14	21	28	39	50	57

lowest : 0.1667 0.3333 0.4167 0.6667 0.75 , highest: 70.5 71 74 76 80

sex

n	missing	distinct
1309	0	2

 Value      female   male
 Frequency     466    843
 Proportion  0.356  0.644

sibsp: Number of Siblings/Spouses Aboard

n	missing	distinct	Info	Mean	Gmd
1309	0	7	0.67	0.4989	0.777

 Value          0     1     2     3     4     5     8
 Frequency    891   319    42    20    22     6     9
 Proportion 0.681 0.244 0.032 0.015 0.017 0.005 0.007

For the frequency table, variable is rounded to the nearest 0

parch: Number of Parents/Children Aboard

n	missing	distinct	Info	Mean	Gmd
1309	0	8	0.549	0.385	0.6375

 Value          0     1     2     3     4     5     6     9
 Frequency   1002   170   113     8     6     6     2     2
 Proportion 0.765 0.130 0.086 0.006 0.005 0.005 0.002 0.002

For the frequency table, variable is rounded to the nearest 0

Code

spar(ps=6,rt=3)
dd <- datadist(t3)
# describe distributions of variables to rms
options(datadist='dd')
s <- summary(survived ~ age + sex + pclass +
             cut2(sibsp,0:3) + cut2(parch,0:3), data=t3)
plot(s, main='', subtitles=FALSE)

Figure 12.1: Univariable summaries of Titanic survival

Show 4-way relationships after collapsing levels. Suppress estimates based on \(<25\) passengers.

Code

require(ggplot2)
tn <- transform(t3,
  agec = ifelse(age < 21, 'child', 'adult'),
  sibsp= ifelse(sibsp == 0, 'no sib/sp', 'sib/sp'),
  parch= ifelse(parch == 0, 'no par/child', 'par/child'))
g <- function(y) if(length(y) < 25) NA else mean(y)
s <- with(tn, summarize(survived,
           llist(agec, sex, pclass, sibsp, parch), g))
# llist, summarize in Hmisc package
ggplot(subset(s, agec != 'NA'),
  aes(x=survived, y=pclass, shape=sex)) +
  geom_point() + facet_grid(agec ~ sibsp * parch) +
  xlab('Proportion Surviving') + ylab('Passenger Class') +
  scale_x_continuous(breaks=c(0, .5, 1))

Figure 12.2: Multi-way summary of Titanic survival
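
The suppression rule used for Figure 12.2 can be illustrated on made-up data. This is a minimal base R sketch; the vectors `y` and `grp` are hypothetical and are not part of `titanic3`.

```r
# Same summary function as above: return NA for cells with fewer than 25 passengers
g <- function(y) if(length(y) < 25) NA else mean(y)

# Hypothetical data: one cell of 40 passengers and one cell of only 5
y   <- c(rep(1, 30), rep(0, 10), rep(1, 5))
grp <- c(rep('large', 40), rep('small', 5))

res <- tapply(y, grp, g)   # proportion surviving per cell, NA when the cell is too small
res
```

The large cell reports its proportion (30 of 40), while the 5-passenger cell is suppressed instead of showing an unstable estimate.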

12.2 Exploring Trends with Nonparametric Regression

Code

b  <- scale_size_discrete(range=c(.1, .85))
yl <- ylab(NULL)
p1 <- ggplot(t3, aes(x=age, y=survived)) +
      histSpikeg(survived ~ age, lowess=TRUE, data=t3) +
      ylim(0,1) + yl
p2 <- ggplot(t3, aes(x=age, y=survived, color=sex)) +
      histSpikeg(survived ~ age + sex, lowess=TRUE,
                 data=t3) + ylim(0,1) + yl
p3 <- ggplot(t3, aes(x=age, y=survived, size=pclass)) +
      histSpikeg(survived ~ age + pclass, lowess=TRUE,
                 data=t3) + b + ylim(0,1) + yl
p4 <- ggplot(t3, aes(x=age, y=survived, color=sex,
       size=pclass)) +
      histSpikeg(survived ~ age + sex + pclass,
                 lowess=TRUE, data=t3) +
      b + ylim(0,1) + yl
gridExtra::grid.arrange(p1, p2, p3, p4, ncol=2)   # combine 4

Figure 12.3: Nonparametric regression (`lowess`) estimates of the relationship between age and the probability of surviving the Titanic, with tick marks depicting the age distribution. The top left panel shows unstratified estimates of the probability of survival. Other panels show nonparametric estimates by various stratifications.
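
The kind of smooth shown in Figure 12.3 can be approximated with base R alone. The sketch below runs `lowess` on a simulated binary outcome; the data, the assumed age effect, and the object name `sm` are illustrative assumptions and do not reproduce what `histSpikeg` does internally.

```r
set.seed(3)
n   <- 400
age <- runif(n, 0, 80)                        # simulated ages
y   <- rbinom(n, 1, plogis(1 - 0.03 * age))   # simulated 0/1 outcome

# iter = 0 turns off lowess's outlier down-weighting, a common choice
# for 0/1 responses, where no single observation is an outlier
sm <- lowess(age, y, iter = 0)
head(data.frame(age = sm$x, smoothed = sm$y))
```

`plot(sm, type='l')` would draw the estimated proportion against age.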

Code

top <- theme(legend.position='top')
p1 <- ggplot(t3, aes(x=age, y=survived, color=cut2(sibsp,
       0:2))) + stat_plsmo() + b + ylim(0,1) + yl + top +
      scale_color_discrete(name='siblings/spouses')
p2 <- ggplot(t3, aes(x=age, y=survived, color=cut2(parch,
       0:2))) + stat_plsmo() + b + ylim(0,1) + yl + top +
      scale_color_discrete(name='parents/children')
gridExtra::grid.arrange(p1, p2, ncol=2)

Figure 12.4: Relationship between age and survival stratified by the number of siblings or spouses on board (left panel) or by the number of parents or children of the passenger on board (right panel).

12.3 Binary Logistic Model with Casewise Deletion of Missing Values

First fit a model that is saturated with respect to age, sex, and pclass.
There is insufficient variation in sibsp and parch to fit complex interactions or nonlinearities for them.
With age appearing in so many terms, giving too many parameters to age creates instabilities and makes many bootstrap repetitions fail to converge or yield singular covariance matrices.
Use AIC to determine the global number of knots for age that is “best for the money” in terms of being the most likely to cross-validate well.

Code

for(k in 3 : 5) {
  f <- lrm(survived ~ sex*pclass*rcs(age, k) +
           rcs(age, k)*(sibsp + parch), data=t3)
  cat('k=', k, '  AIC=', AIC(f), '\n')
}

k= 3   AIC= 922.9147 
k= 4   AIC= 916.6481 
k= 5   AIC= 921.2103
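
The knot-selection loop can be mimicked without rms. This is a rough sketch on simulated data, using `glm` with natural splines (`ns` from the `splines` package) as a stand-in for `lrm` with `rcs`. The data and the assumed true curve are invented only for illustration, so the winning knot count here says nothing about the Titanic data.

```r
library(splines)                 # ships with R; ns() gives a natural cubic spline basis
set.seed(4)
n   <- 600
age <- runif(n, 1, 80)
# Assumed nonlinear truth: survival drops quickly through age 20, then slowly
y   <- rbinom(n, 1, plogis(1.5 - 0.08 * pmin(age, 20) - 0.01 * age))

aic <- sapply(3:5, function(k) {
  # ns(df = k - 1) uses k - 2 interior knots plus 2 boundary knots: k knots in all
  AIC(glm(y ~ ns(age, df = k - 1), family = binomial))
})
names(aic) <- paste0('k=', 3:5)
aic
names(which.min(aic))            # knot count with the lowest AIC in this simulation
```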

Four knots give the best (lowest) AIC, so we use 4 knots going forward.
Refit that model with x=TRUE, y=TRUE so that likelihood ratio (LR) tests can be done.
But start with Wald tests.

Code

f1 <- lrm(survived ~ sex*pclass*rcs(age,4) +
          rcs(age,4)*(sibsp + parch), data=t3, x=TRUE, y=TRUE)
print(f1, r2=1:4)   # print all 4 R^2 measures that use only the global LR chi-square

Logistic Regression Model

lrm(formula = survived ~ sex * pclass * rcs(age, 4) + rcs(age, 
    4) * (sibsp + parch), data = t3, x = TRUE, y = TRUE)

Frequencies of Missing Values Due to Each Variable

survived      sex   pclass      age    sibsp    parch 
       0        0        0      263        0        0

	Model Likelihood Ratio Test	Discrimination Indexes	Rank Discrim. Indexes
Obs 1046	LR χ² 561.97	R²₁₀₄₆ 0.416	C 0.876
0 619	d.f. 31	R²_31,1046 0.398	D_xy 0.752
1 427	Pr(>χ²) <0.0001	R²_758.1 0.524	γ 0.753
max \|∂log L/∂β\| 0.0007		R²_31,758.1 0.504	τ_a 0.363
		Brier 0.129

	β	S.E.	Wald Z	Pr(>\|Z\|)
Intercept	-2.2942	3.4139	-0.67	0.5016
sex=male	6.3349	4.2247	1.50	0.1337
pclass=2nd	14.3540	8.4673	1.70	0.0900
pclass=3rd	3.5271	3.2329	1.09	0.2753
age	0.3671	0.2187	1.68	0.0932
age'	-0.8270	0.5684	-1.45	0.1457
age''	2.9159	2.3083	1.26	0.2065
sibsp	-0.8241	0.3173	-2.60	0.0094
parch	0.2397	0.7406	0.32	0.7462
sex=male × pclass=2nd	-13.7215	9.0533	-1.52	0.1296
sex=male × pclass=3rd	-6.3991	4.3000	-1.49	0.1367
sex=male × age	-0.5937	0.2582	-2.30	0.0215
sex=male × age'	1.2395	0.6406	1.93	0.0530
sex=male × age''	-4.3891	2.5546	-1.72	0.0858
pclass=2nd × age	-0.9460	0.4793	-1.97	0.0484
pclass=3rd × age	-0.4106	0.2097	-1.96	0.0502
pclass=2nd × age'	2.2111	1.0827	2.04	0.0411
pclass=3rd × age'	0.7450	0.5632	1.32	0.1859
pclass=2nd × age''	-8.5916	4.1621	-2.06	0.0390
pclass=3rd × age''	-2.0708	2.3726	-0.87	0.3828
age × sibsp	0.0035	0.0277	0.13	0.9005
age' × sibsp	0.1309	0.1076	1.22	0.2237
age'' × sibsp	-0.7549	0.5438	-1.39	0.1651
age × parch	0.0145	0.0468	0.31	0.7558
age' × parch	-0.1092	0.1262	-0.87	0.3869
age'' × parch	0.5123	0.5365	0.95	0.3396
sex=male × pclass=2nd × age	0.7993	0.5140	1.56	0.1199
sex=male × pclass=3rd × age	0.4755	0.2641	1.80	0.0718
sex=male × pclass=2nd × age'	-1.9165	1.1705	-1.64	0.1016
sex=male × pclass=3rd × age'	-0.7422	0.6754	-1.10	0.2719
sex=male × pclass=2nd × age''	7.6430	4.5357	1.69	0.0920
sex=male × pclass=3rd × age''	1.1688	2.8864	0.40	0.6855
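
The four \(R^2\) values printed above can be reproduced from the model LR \(\chi^2\), its d.f., and the sample size. The sketch below assumes the forms \(1 - \exp(-\mathrm{LR}/n)\) and \(1 - \exp(-(\mathrm{LR} - p)/n)\), and an effective sample size of \(3 n_0 n_1 / n\) for a binary outcome. These assumptions agree with the printed output but are not copied from the rms source code; the helper names `ess` and `r2` are hypothetical.

```r
lr <- 561.97; p <- 31      # model LR chi-square and d.f. from the output above
n0 <- 619;   n1 <- 427     # non-survivors and survivors among the complete cases
n  <- n0 + n1

# Assumed effective sample size for a binary outcome
ess <- function(n0, n1) 3 * n0 * n1 / (n0 + n1)

# Unadjusted and d.f.-adjusted R^2 for a given (effective) sample size
r2 <- function(lr, p, n)
  c(unadjusted = 1 - exp(-lr / n), adjusted = 1 - exp(-(lr - p) / n))

round(ess(n0, n1), 1)              # compare with the subscript 758.1
round(r2(lr, p, n), 3)             # compare with the printed 0.416 and 0.398
round(r2(lr, p, ess(n0, n1)), 3)   # compare with the printed 0.524 and 0.504
```

Under the same assumption, 809 non-survivors and 500 survivors give an effective sample size of about 927, the value that appears for fits using all 1309 passengers.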

Code

anova(f1)

Wald Statistics for `survived`
	χ²	d.f.	P
sex (Factor+Higher Order Factors)	187.59	12	<0.0001
All Interactions	60.55	11	<0.0001
pclass (Factor+Higher Order Factors)	100.33	16	<0.0001
All Interactions	47.44	14	<0.0001
age (Factor+Higher Order Factors)	61.35	24	<0.0001
All Interactions	37.51	21	0.0147
Nonlinear (Factor+Higher Order Factors)	28.15	16	0.0303
sibsp (Factor+Higher Order Factors)	20.38	4	0.0004
All Interactions	11.84	3	0.0080
parch (Factor+Higher Order Factors)	3.79	4	0.4349
All Interactions	3.79	3	0.2848
sex × pclass (Factor+Higher Order Factors)	43.72	8	<0.0001
sex × age (Factor+Higher Order Factors)	14.39	9	0.1093
Nonlinear (Factor+Higher Order Factors)	12.54	6	0.0510
Nonlinear Interaction : f(A,B) vs. AB	4.95	2	0.0843
pclass × age (Factor+Higher Order Factors)	18.59	12	0.0989
Nonlinear (Factor+Higher Order Factors)	15.56	8	0.0492
Nonlinear Interaction : f(A,B) vs. AB	9.22	4	0.0559
age × sibsp (Factor+Higher Order Factors)	11.84	3	0.0080
Nonlinear	2.22	2	0.3302
Nonlinear Interaction : f(A,B) vs. AB	2.22	2	0.3302
age × parch (Factor+Higher Order Factors)	3.79	3	0.2848
Nonlinear	1.02	2	0.5994
Nonlinear Interaction : f(A,B) vs. AB	1.02	2	0.5994
sex × pclass × age (Factor+Higher Order Factors)	11.24	6	0.0813
Nonlinear	10.12	4	0.0385
TOTAL NONLINEAR	28.15	16	0.0303
TOTAL INTERACTION	77.40	23	<0.0001
TOTAL NONLINEAR + INTERACTION	80.04	25	<0.0001
TOTAL	243.00	31	<0.0001

Compute the slightly more time-consuming LR tests

Code

af1 <- anova(f1, test='LR')
print(af1, which='subscripts')

Likelihood Ratio Statistics for `survived`
	χ²	d.f.	P	Tested
sex (Factor+Higher Order Factors)	339.48	12	<0.0001	1,9-13,26-31
All Interactions	76.17	11	<0.0001	9-13,26-31
pclass (Factor+Higher Order Factors)	154.71	16	<0.0001	2-3,9-10,14-19,26-31
All Interactions	64.95	14	<0.0001	9-10,14-19,26-31
age (Factor+Higher Order Factors)	109.11	24	<0.0001	4-6,11-31
All Interactions	53.85	21	0.0001	11-31
Nonlinear (Factor+Higher Order Factors)	37.75	16	0.0016	5-6,12-13,16-19,21-22,24-25,28-31
sibsp (Factor+Higher Order Factors)	26.75	4	<0.0001	7,20-22
All Interactions	12.10	3	0.0070	20-22
parch (Factor+Higher Order Factors)	3.96	4	0.4109	8,23-25
All Interactions	3.95	3	0.2666	23-25
sex × pclass (Factor+Higher Order Factors)	54.58	8	<0.0001	9-10,26-31
sex × age (Factor+Higher Order Factors)	19.68	9	0.0200	11-13,26-31
Nonlinear (Factor+Higher Order Factors)	16.43	6	0.0116	12-13,28-31
Nonlinear Interaction : f(A,B) vs. AB	7.76	2	0.0206	12-13
pclass × age (Factor+Higher Order Factors)	27.45	12	0.0066	14-19,26-31
Nonlinear (Factor+Higher Order Factors)	22.59	8	0.0039	16-19,28-31
Nonlinear Interaction : f(A,B) vs. AB	12.97	4	0.0114	16-19
age × sibsp (Factor+Higher Order Factors)	12.10	3	0.0070	20-22
Nonlinear	2.26	2	0.3224	21-22
Nonlinear Interaction : f(A,B) vs. AB	2.26	2	0.3224	21-22
age × parch (Factor+Higher Order Factors)	3.95	3	0.2666	23-25
Nonlinear	1.03	2	0.5990	24-25
Nonlinear Interaction : f(A,B) vs. AB	1.03	2	0.5990	24-25
sex × pclass × age (Factor+Higher Order Factors)	14.94	6	0.0207	26-31
Nonlinear	14.00	4	0.0073	28-31
TOTAL NONLINEAR	37.75	16	0.0016	5-6,12-13,16-19,21-22,24-25,28-31
TOTAL INTERACTION	107.47	23	<0.0001	9-31
TOTAL NONLINEAR + INTERACTION	117.47	25	<0.0001	5-6,9-31
TOTAL	561.97	31	<0.0001	1-31

In the RMS text, 5 knots were used for age and only Wald tests were performed.
There, a large \(p\)-value for the 3rd order interaction was used to justify exclusion of these highest-order interactions from the model (and one other term).
The more accurate LR test gives more evidence for the 3rd order interaction (\(P = 0.0207\), versus \(0.0813\) for the Wald test).
Keep this model.
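
The two \(p\)-values for the 3rd order interaction can be recomputed from the \(\chi^2\) statistics in the two anova tables. A small sketch using base R's `pchisq`; the statistics are copied from the printed tables and so carry their rounding.

```r
# sex x pclass x age interaction, 6 d.f., from the Wald and LR anova tables above
stat <- c(Wald = 11.24, LR = 14.94)
pval <- pchisq(stat, df = 6, lower.tail = FALSE)   # upper-tail chi-square probability
round(pval, 4)   # compare with the printed 0.0813 (Wald) and 0.0207 (LR)
```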

Show the many effects of predictors.

Code

p <- Predict(f1, age, sex, pclass, sibsp=0, parch=0, fun=plogis)
ggplot(p)

Figure 12.5: Effects of predictors on probability of survival of Titanic passengers, estimated for zero siblings/spouses and zero parents/children
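
The `fun=plogis` argument in the `Predict` call converts the linear predictor from the log odds scale to a probability. A minimal base R sketch of that transformation, with arbitrary log odds values chosen only for illustration:

```r
logit <- c(-2, 0, 2)                    # arbitrary log odds
p     <- plogis(logit)                  # same as 1 / (1 + exp(-logit))
all.equal(p, 1 / (1 + exp(-logit)))     # confirms the closed form
all.equal(qlogis(p), logit)             # qlogis is the inverse, log(p / (1 - p))
```

A log odds of 0 corresponds to a probability of 0.5, and equal steps on the log odds scale give progressively smaller probability changes near 0 and 1.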

Code

ggplot(Predict(f1, sibsp, age=c(10,15,20,50), conf.int=FALSE))

Figure 12.6: Effect of number of siblings and spouses on the log odds of surviving, for third class males

Note that children having many siblings apparently had lower survival. Married adults had slightly higher survival than unmarried ones.

But the moderate problem with missing data must be dealt with.

12.4 Examining Missing Data Patterns

Code

spar(mfrow=c(2,2), top=1, ps=11)
na.patterns <- naclus(titanic3)
require(rpart)      # Recursive partitioning package
who.na <- rpart(is.na(age) ~ sex + pclass + survived +
                sibsp + parch, data=titanic3, minbucket=15)
naplot(na.patterns, 'na per var')
plot(who.na, margin=.1); text(who.na)
plot(na.patterns)

Figure 12.7: Patterns of missing data. Upper left panel shows the fraction of observations missing on each predictor. Lower panel depicts a hierarchical cluster analysis of missingness combinations. The similarity measure shown on the \(Y\)-axis is the fraction of observations for which both variables are missing. Right panel shows the result of recursive partitioning for predicting `is.na(age)`. The `rpart` function found only strong patterns according to passenger class.

Code

spar(ps=7, rt=3)
plot(summary(is.na(age) ~ sex + pclass + survived +
             sibsp + parch, data=t3))

Figure 12.8: Univariable descriptions of proportion of passengers with missing age

But models almost always provide better descriptive statistics

Code

m <- lrm(is.na(age) ~ sex * pclass + survived + sibsp + parch,
         data=t3)
m

Logistic Regression Model

lrm(formula = is.na(age) ~ sex * pclass + survived + sibsp + 
    parch, data = t3)

	Model Likelihood Ratio Test	Discrimination Indexes	Rank Discrim. Indexes
Obs 1309	LR χ² 114.99	R² 0.133	C 0.703
FALSE 1046	d.f. 8	R²_8,1309 0.078	D_xy 0.406
TRUE 263	Pr(>χ²) <0.0001	R²_8,630.5 0.156	γ 0.451
max \|∂log L/∂β\| 5×10^-6		Brier 0.148	τ_a 0.131

	β	S.E.	Wald Z	Pr(>\|Z\|)
Intercept	-2.2030	0.3641	-6.05	<0.0001
sex=male	0.6440	0.3953	1.63	0.1033
pclass=2nd	-1.0079	0.6658	-1.51	0.1300
pclass=3rd	1.6124	0.3596	4.48	<0.0001
survived	-0.1806	0.1828	-0.99	0.3232
sibsp	0.0435	0.0737	0.59	0.5548
parch	-0.3526	0.1253	-2.81	0.0049
sex=male × pclass=2nd	0.1347	0.7545	0.18	0.8583
sex=male × pclass=3rd	-0.8563	0.4214	-2.03	0.0422

Code

anova(m)

Wald Statistics for `is.na(age)`
	χ²	d.f.	P
sex (Factor+Higher Order Factors)	5.61	3	0.1324
All Interactions	5.58	2	0.0614
pclass (Factor+Higher Order Factors)	68.43	4	<0.0001
All Interactions	5.58	2	0.0614
survived	0.98	1	0.3232
sibsp	0.35	1	0.5548
parch	7.92	1	0.0049
sex × pclass (Factor+Higher Order Factors)	5.58	2	0.0614
TOTAL	82.90	8	<0.0001

pclass and parch are the important predictors of missing age.
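
Exponentiating coefficients from the model for `is.na(age)` puts them on the odds ratio scale. This sketch simply reuses two coefficients printed above. Because the model contains a sex × pclass interaction, the `pclass=3rd` coefficient applies to the reference sex (female).

```r
# Coefficients copied from the printed fit of is.na(age)
b  <- c('pclass=3rd' = 1.6124, parch = -0.3526)
or <- exp(b)      # odds ratios for age being missing
round(or, 2)
# pclass=3rd: roughly 5-fold higher odds of missing age, 3rd vs. 1st class females
# parch:      odds of missing age fall by roughly 30% per additional parent/child aboard
```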

12.5 Single Conditional Mean Imputation

Single imputation is not the preferred approach here.

Single Imputation and Analysis Result

First try: conditional mean imputation
Default spline transformation for age caused distribution of imputed values to be much different from non-imputed ones; constrain to linear. Also force discrete numeric variables to be linear because knots are hard to determine for them.

Code

xtrans <- transcan(~ I(age) + sex + pclass + I(sibsp) + I(parch),
                   imputed=TRUE, pl=FALSE, pr=FALSE, data=t3)
summary(xtrans)

transcan(x = ~I(age) + sex + pclass + I(sibsp) + I(parch), imputed = TRUE, 
    pr = FALSE, pl = FALSE, data = t3)

Iterations: 4 

R-squared achieved in predicting each variable:

   age    sex pclass  sibsp  parch 
 0.236  0.075  0.232  0.200  0.173 

Adjusted R-squared:

   age    sex pclass  sibsp  parch 
 0.233  0.072  0.229  0.197  0.170 

Coefficients of canonical variates for predicting each (row) variable

       age   sex   pclass sibsp parch
age           1.33  5.98  -3.16 -0.85
sex     0.04       -0.67  -0.04 -0.80
pclass  0.08 -0.32         0.14  0.02
sibsp  -0.02 -0.01  0.08         0.39
parch   0.00 -0.15  0.01   0.28      

Summary of imputed values


Starting estimates for imputed values:

   age    sex pclass  sibsp  parch 
    28      2      3      0      0

Code

# Look at mean imputed values by sex,pclass and observed means
# age.i is age, filled in with conditional mean estimates
age.i <- with(t3, impute(xtrans, age, data=t3))
i <- is.imputed(age.i)
with(t3, tapply(age.i[i], list(sex[i],pclass[i]), mean))

            1st      2nd      3rd
female 37.64677 29.78567 21.67031
male   42.21854 32.55474 26.19231

Code

with(t3, tapply(age, list(sex,pclass), mean, na.rm=TRUE))

            1st      2nd      3rd
female 37.03759 27.49919 22.18531
male   41.02925 30.81540 25.96227

Code

dd   <- datadist(dd, age.i)
f.si <- lrm(survived ~ sex * pclass * rcs(age.i, 4) +
            rcs(age.i, 4) * (sibsp + parch), data=t3, x=TRUE, y=TRUE)
print(f.si, coefs=FALSE)

Logistic Regression Model

lrm(formula = survived ~ sex * pclass * rcs(age.i, 4) + rcs(age.i, 
    4) * (sibsp + parch), data = t3, x = TRUE, y = TRUE)

	Model Likelihood Ratio Test	Discrimination Indexes	Rank Discrim. Indexes
Obs 1309	LR χ² 649.29	R² 0.532	C 0.864
0 809	d.f. 31	R²_31,1309 0.376	D_xy 0.728
1 500	Pr(>χ²) <0.0001	R²_31,927 0.487	γ 0.732
max \|∂log L/∂β\| 0.0006		Brier 0.132	τ_a 0.344

Code

spar(ps=12)
p1 <- Predict(f1,   age,   pclass, sex, sibsp=0, fun=plogis)
p2 <- Predict(f.si, age.i, pclass, sex, sibsp=0, fun=plogis)
p  <- rbind('Casewise Deletion'=p1, 'Single Imputation'=p2,
            rename=c(age.i='age'))   # creates .set. variable
ggplot(p, groups='sex', ylab='Probability of Surviving')
anova(f.si, test='LR')

Likelihood Ratio Statistics for `survived`
	χ²	d.f.	P
sex (Factor+Higher Order Factors)	399.94	12	<0.0001
All Interactions	74.26	11	<0.0001
pclass (Factor+Higher Order Factors)	163.16	16	<0.0001
All Interactions	61.31	14	<0.0001
age.i (Factor+Higher Order Factors)	109.88	24	<0.0001
All Interactions	55.34	21	<0.0001
Nonlinear (Factor+Higher Order Factors)	40.70	16	0.0006
sibsp (Factor+Higher Order Factors)	28.84	4	<0.0001
All Interactions	12.81	3	0.0051
parch (Factor+Higher Order Factors)	1.55	4	0.8177
All Interactions	0.26	3	0.9681
sex × pclass (Factor+Higher Order Factors)	50.28	8	<0.0001
sex × age.i (Factor+Higher Order Factors)	19.61	9	0.0205
Nonlinear (Factor+Higher Order Factors)	15.35	6	0.0177
Nonlinear Interaction : f(A,B) vs. AB	8.33	2	0.0156
pclass × age.i (Factor+Higher Order Factors)	23.86	12	0.0213
Nonlinear (Factor+Higher Order Factors)	19.67	8	0.0117
Nonlinear Interaction : f(A,B) vs. AB	11.63	4	0.0203
age.i × sibsp (Factor+Higher Order Factors)	12.81	3	0.0051
Nonlinear	1.50	2	0.4718
Nonlinear Interaction : f(A,B) vs. AB	1.50	2	0.4718
age.i × parch (Factor+Higher Order Factors)	0.26	3	0.9681
Nonlinear	0.02	2	0.9876
Nonlinear Interaction : f(A,B) vs. AB	0.02	2	0.9876
sex × pclass × age.i (Factor+Higher Order Factors)	11.88	6	0.0647
Nonlinear	10.57	4	0.0318
TOTAL NONLINEAR	40.70	16	0.0006
TOTAL INTERACTION	108.27	23	<0.0001
TOTAL NONLINEAR + INTERACTION	117.26	25	<0.0001
TOTAL	649.29	31	<0.0001

Figure 12.9: Predicted probability of survival for males from fit using casewise deletion (bottom) and single conditional mean imputation (top). `sibsp` is set to zero for these predicted values.

Figure 12.10: Predicted probability of survival for males from fit using casewise deletion (bottom) and single conditional mean imputation (top). `sibsp` is set to zero for these predicted values.

12.6 Multiple Imputation

The following uses aregImpute with predictive mean matching. By default, aregImpute does not transform age when it is being predicted from the other variables. Four knots are used to transform age when it is used to impute other variables (not needed here, as no other variable has missing values). Since the fraction of observations with missing age is \(\frac{263}{1309} \approx 0.2\), we use 20 imputations.

Force sibsp and parch to be linear for imputation, because their highly discrete distributions make it difficult to choose knots for splines.
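
The idea behind bootstrap-based predictive mean matching can be sketched in base R. This is a simplified illustration on simulated data, using a linear model and closest-match donors. `aregImpute` itself uses flexible additive models and has its own donor-selection options, none of which are reproduced here. The names `x`, `pmm_once`, and `imp` are hypothetical.

```r
set.seed(1)
n   <- 200
x   <- rnorm(n)                          # a fully observed predictor
age <- 30 + 8 * x + rnorm(n, sd = 5)     # simulated ages
age[sample(n, 40)] <- NA                 # make 40 ages missing
obs <- !is.na(age)

pmm_once <- function() {
  # Refit the imputation model on a bootstrap sample of complete cases so that
  # uncertainty in the imputation model carries over to the imputed values
  idx  <- sample(which(obs), replace = TRUE)
  fit  <- lm(age ~ x, data = data.frame(age = age[idx], x = x[idx]))
  pred <- predict(fit, newdata = data.frame(x = x))
  donors <- which(obs)
  # Each missing case borrows the observed age of the donor whose
  # predicted age is closest to its own predicted age
  sapply(which(!obs), function(i)
    age[donors[which.min(abs(pred[donors] - pred[i]))]])
}

imp <- replicate(5, pmm_once())          # 40 missing cases x 5 imputations
dim(imp)
```

Every imputed value is an age actually observed on some other case, which is why imputed ages from predictive mean matching stay within the range of the real data.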

Code

set.seed(17)         # so can reproduce random aspects
mi <- aregImpute(~ age + sex + pclass +
                 I(sibsp) + I(parch) + survived,
                 data=t3, n.impute=20, nk=4, pr=FALSE)
mi


Multiple Imputation using Bootstrap and PMM

aregImpute(formula = ~age + sex + pclass + I(sibsp) + I(parch) + 
    survived, data = t3, n.impute = 20, nk = 4, pr = FALSE)

n: 1309     p: 6    Imputations: 20     nk: 4 

Number of NAs:
     age      sex   pclass    sibsp    parch survived 
     263        0        0        0        0        0 

         type d.f.
age         s    1
sex         c    1
pclass      c    2
sibsp       l    1
parch       l    1
survived    l    1

Transformation of Target Variables Forced to be Linear

R-squares for Predicting Non-Missing Values for Each Variable
Using Last Imputations of Predictors
  age 
0.294

Code

# Print the first 10 imputations for the first 10 passengers
#  having missing age
mi$imputed$age[1:10, 1:10]

    [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10]
16    29 71.0   62   41   24   71 48.0   30   28    33
38    42 58.0   58   64   62   28 51.0   36   29    29
41    42 32.5   55   24   58   60 54.0   47   23    54
47    31 28.5   48   37   60   50 28.5   38   42    47
60    28 42.0   38   31   58   21 45.0    2   61    42
70    38 58.0   30   17   43   39 64.0   52   33    30
71    37 46.0   30   47   30   36 47.0   65   30    40
75    62 46.0   47   70   65   54 21.0   47   46    56
81    24 25.0   17   28   36   29 42.0   56   48    41
107   42 23.0   60   41   46   58 21.0   61   33    62

Show the distribution of imputed (black) and actual ages (gray).

Code

plot(mi)
Ecdf(t3$age, add=TRUE, col='gray', lwd=2,
     subtitles=FALSE)

Figure 12.11: Distributions of imputed and actual ages for the Titanic dataset. Imputed values are in black and actual ages in gray.

Fit logistic models for the 20 completed datasets and print the ratio of imputation-corrected variances to average ordinary variances.
Use the method of Chan & Meng to get LR tests.
This method takes the final \(\hat{\beta}\) from a single model fit on the 20 stacked completed datasets.
But standard errors come from the usual Rubin’s rule applied to the 20 fits.
rms::processMI computes the LR statistics from special information saved by fit.mult.impute, triggered by lrt=TRUE.
The Hmisc package runifChanged function is used to save the result, so that the roughly one-minute computation is not run again until an input changes.
The rms LRupdate function is run to fix likelihood ratio-related statistics (LR test, its \(p\)-value, various \(R^2\) measures) using the overall Chan & Meng model LR \(\chi^2\) computed by processMI.
Two of the \(R^2\) measures printed use an effective sample size of 927 for the unbalanced binary survived variable.
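
Rubin's rule for combining the variance of one coefficient across completed datasets can be sketched in a few lines. The estimates and standard errors below are made-up numbers for \(m = 5\) imputations, not output from `fit.mult.impute`. The point estimate here is the simple average, whereas the Chan & Meng approach described above takes it from the stacked fit.

```r
est <- c(-0.80, -0.74, -0.79, -0.71, -0.77)   # hypothetical per-imputation estimates
se  <- c(0.31, 0.32, 0.30, 0.33, 0.31)        # hypothetical per-imputation standard errors
m   <- length(est)

pooled  <- mean(est)                          # pooled point estimate
within  <- mean(se^2)                         # average ordinary (within-imputation) variance
between <- var(est)                           # between-imputation variance
total   <- within + (1 + 1 / m) * between     # imputation-corrected variance

c(estimate = pooled, se = sqrt(total), variance.ratio = total / within)
```

The last element is the kind of ratio of imputation-corrected variance to average ordinary variance mentioned above; values near 1 indicate that imputation adds little uncertainty for that coefficient.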

Code

runmi <- function()
  fit.mult.impute(survived ~ sex * pclass * rcs(age, 4) + rcs(age, 4) * (sibsp + parch),
                  lrm, mi, data=t3, pr=FALSE, lrt=TRUE)  # lrt implies x=TRUE y=TRUE + more
seed <- 17
f.mi <- runifChanged(runmi, seed, mi, t3)
afmi <- processMI(f.mi, 'anova')
# Print imputation penalty indexes
prmiInfo(afmi)

Imputation penalties
Test	Missing Information Fraction	Denominator d.f.	χ² Discount
sex (Factor+Higher Order Factors)	0.131	13387.9	0.869
All Interactions	0.180	6455.1	0.820
pclass (Factor+Higher Order Factors)	0.106	27217.2	0.894
All Interactions	0.154	11285.5	0.846
age (Factor+Higher Order Factors)	0.179	14281.1	0.821
All Interactions	0.175	12960.7	0.825
Nonlinear (Factor+Higher Order Factors)	0.160	11937.3	0.840
sibsp (Factor+Higher Order Factors)	0.209	1744.4	0.791
All Interactions	0.215	1235.9	0.785
parch (Factor+Higher Order Factors)	0.179	2362.9	0.821
All Interactions	0.219	1183.5	0.781
sex × pclass (Factor+Higher Order Factors)	0.153	6502.3	0.847
sex × age (Factor+Higher Order Factors)	0.210	3875.9	0.790
Nonlinear (Factor+Higher Order Factors)	0.223	2293.9	0.777
Nonlinear Interaction : f(A,B) vs. AB	0.000	Inf	1.000
pclass × age (Factor+Higher Order Factors)	0.169	7940.7	0.831
Nonlinear (Factor+Higher Order Factors)	0.186	4413.0	0.814
Nonlinear Interaction : f(A,B) vs. AB	0.181	2330.0	0.819
age × sibsp (Factor+Higher Order Factors)	0.215	1235.9	0.785
Nonlinear	0.147	1765.7	0.853
Nonlinear Interaction : f(A,B) vs. AB	0.147	1765.7	0.853
age × parch (Factor+Higher Order Factors)	0.219	1183.5	0.781
Nonlinear	0.213	837.2	0.787
Nonlinear Interaction : f(A,B) vs. AB	0.213	837.2	0.787
sex × pclass × age (Factor+Higher Order Factors)	0.215	2476.2	0.785
Nonlinear	0.260	1123.0	0.740
TOTAL NONLINEAR	0.160	11937.3	0.840
TOTAL INTERACTION	0.167	15608.7	0.833
TOTAL NONLINEAR + INTERACTION	0.165	17345.0	0.835
TOTAL	0.144	28342.6	0.856

None of the denominator d.f. is small enough for us to worry about the \(\chi^2\) approximation
Take the ratio of selected LR statistics after multiple imputation to that from casewise deletion

Code

afmi

Likelihood Ratio Statistics for `survived`
	χ²	d.f.	P
sex (Factor+Higher Order Factors)	345.17	12	<0.0001
All Interactions	59.41	11	<0.0001
pclass (Factor+Higher Order Factors)	161.47	16	<0.0001
All Interactions	50.55	14	<0.0001
age (Factor+Higher Order Factors)	101.66	24	<0.0001
All Interactions	43.61	21	0.0026
Nonlinear (Factor+Higher Order Factors)	39.97	16	0.0008
sibsp (Factor+Higher Order Factors)	24.23	4	<0.0001
All Interactions	8.94	3	0.0300
parch (Factor+Higher Order Factors)	3.19	4	0.5272
All Interactions	1.72	3	0.6329
sex × pclass (Factor+Higher Order Factors)	42.26	8	<0.0001
sex × age (Factor+Higher Order Factors)	14.42	9	0.1081
Nonlinear (Factor+Higher Order Factors)	11.47	6	0.0748
Nonlinear Interaction : f(A,B) vs. AB	7.94	2	0.0189
pclass × age (Factor+Higher Order Factors)	19.68	12	0.0734
Nonlinear (Factor+Higher Order Factors)	14.76	8	0.0639
Nonlinear Interaction : f(A,B) vs. AB	8.93	4	0.0629
age × sibsp (Factor+Higher Order Factors)	8.94	3	0.0300
Nonlinear	1.26	2	0.5313
Nonlinear Interaction : f(A,B) vs. AB	1.26	2	0.5313
age × parch (Factor+Higher Order Factors)	1.72	3	0.6329
Nonlinear	1.73	2	0.4214
Nonlinear Interaction : f(A,B) vs. AB	1.73	2	0.4214
sex × pclass × age (Factor+Higher Order Factors)	9.11	6	0.1676
Nonlinear	7.66	4	0.1050
TOTAL NONLINEAR	39.97	16	0.0008
TOTAL INTERACTION	87.90	23	<0.0001
TOTAL NONLINEAR + INTERACTION	100.00	25	<0.0001
TOTAL	567.58	31	<0.0001

Code

f.mi <- LRupdate(f.mi, afmi)
print(f.mi, r2=1:4)   # print all 4 imputation-adjusted R^2

Logistic Regression Model

fit.mult.impute(formula = survived ~ sex * pclass * rcs(age, 
    4) + rcs(age, 4) * (sibsp + parch), fitter = lrm, xtrans = mi, 
    data = t3, lrt = TRUE, pr = FALSE)

	Model Likelihood Ratio Test	Discrimination Indexes	Rank Discrim. Indexes
Obs 1309	LR χ² 567.58	R²₁₃₀₉ 0.352	C 0.868
0 809	d.f. 31	R²_31,1309 0.336	D_xy 0.736
1 500	Pr(>χ²) <0.0001	R²₉₂₇ 0.458	γ 0.737
max \|∂log L/∂β\| 3×10^-8		R²_31,927 0.439	τ_a 0.347
		Brier 0.130

	β	S.E.	Wald Z	Pr(>\|Z\|)
Intercept	-0.3199	3.2655	-0.10	0.9220
sex=male	5.8145	4.1248	1.41	0.1586
pclass=2nd	11.5383	8.2719	1.39	0.1631
pclass=3rd	2.3785	3.1614	0.75	0.4518
age	0.2701	0.2149	1.26	0.2087
age'	-0.6430	0.5367	-1.20	0.2309
age''	2.0278	2.2600	0.90	0.3696
sibsp	-0.7625	0.3165	-2.41	0.0160
parch	-0.4562	0.5576	-0.82	0.4133
sex=male × pclass=2nd	-11.5679	8.8617	-1.31	0.1918
sex=male × pclass=3rd	-6.0402	4.1905	-1.44	0.1495
sex=male × age	-0.5758	0.2578	-2.23	0.0255
sex=male × age'	1.2105	0.6099	1.98	0.0472
sex=male × age''	-3.8105	2.5114	-1.52	0.1292
pclass=2nd × age	-0.8021	0.4775	-1.68	0.0930
pclass=3rd × age	-0.3556	0.2096	-1.70	0.0898
pclass=2nd × age'	1.9084	1.0268	1.86	0.0631
pclass=3rd × age'	0.6770	0.5353	1.26	0.2059
pclass=2nd × age''	-6.6070	4.0713	-1.62	0.1046
pclass=3rd × age''	-1.8293	2.3224	-0.79	0.4309
age × sibsp	0.0070	0.0275	0.26	0.7981
age' × sibsp	0.0987	0.0986	1.00	0.3169
age'' × sibsp	-0.4979	0.5199	-0.96	0.3382
age × parch	0.0362	0.0396	0.91	0.3607
age' × parch	-0.1208	0.1115	-1.08	0.2783
age'' × parch	0.4435	0.5094	0.87	0.3839
sex=male × pclass=2nd × age	0.6870	0.5140	1.34	0.1813
sex=male × pclass=3rd × age	0.4564	0.2625	1.74	0.0821
sex=male × pclass=2nd × age'	-1.6435	1.1151	-1.47	0.1405
sex=male × pclass=3rd × age'	-0.7801	0.6367	-1.23	0.2205
sex=male × pclass=2nd × age''	5.7658	4.4553	1.29	0.1956
sex=male × pclass=3rd × age''	1.7728	2.7888	0.64	0.5250

Code

round(afmi[c(1,3,5,30), 'Chi-Square'] / af1[c(1,3,5,30), 'Chi-Square'], 3)

   sex  (Factor+Higher Order Factors) pclass  (Factor+Higher Order Factors) 
                                1.017                                 1.044 
   age  (Factor+Higher Order Factors)                                 TOTAL 
                                0.932                                 1.010

Using all available data increased the predictive information for sex and pclass (ratios 1.017 and 1.044) but, strangely, reduced it for age (ratio 0.932).

For each completed dataset run bootstrap validation of model performance indexes and the nonparametric calibration curve. Because the 20 analyses of completed datasets help to average out some of the noise in bootstrap estimates we can use fewer bootstrap repetitions (100) than usual (300 or so).

Code

val <- function(fit)
  list(validate  = validate (fit, B=100),
       calibrate = calibrate(fit, B=100) )

runmi <- function()
  fit.mult.impute(       # 1m
    survived ~ sex * pclass * rcs(age,4) +
    rcs(age,4) * (sibsp + parch),
    lrm, mi, data=t3, pr=FALSE,
    fun=val, fitargs=list(x=TRUE, y=TRUE))
seed <- 19
f <- runifChanged(runmi, seed, mi, t3, val)

Display the 20 bootstrap internal validations averaged over the multiple imputations.
Show the 20 individual calibration curves then the first 3 in more detail followed by the overall calibration estimate

Code

val <- processMI(f, 'validate')
print(val, digits=3)

Index	Original Sample	Training Sample	Test Sample	Optimism	Corrected Index	Successful Resamples
D_xy	0.739	0.754	0.728	0.026	0.713	1496
R²	0.543	0.561	0.503	0.058	0.486	1496
Intercept	0	0	-0.099	0.099	-0.099	1496
Slope	1	1	0.846	0.154	0.846	1496
E_max	0	0	0.055	0.055	0.055	1496
D	0.509	0.531	0.462	0.069	0.44	1496
U	-0.002	-0.002	0.014	-0.015	0.014	1496
Q	0.511	0.532	0.448	0.085	0.426	1496
B	0.129	0.126	0.133	-0.007	0.136	1496
g	2.392	3.135	2.604	0.531	1.861	1496
g_p	0.352	0.357	0.334	0.023	0.329	1496
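
The table follows the pattern optimism = training − test and corrected = original − optimism; for example, \(0.739 - 0.026 = 0.713\) for \(D_{xy}\). Below is a bare-bones sketch of that bootstrap optimism correction for the Brier score, using `glm` on simulated data. It is an illustration under those assumptions, not the implementation in `validate`, and the names `brier`, `opt`, and `corrected` are hypothetical.

```r
set.seed(2)
n <- 300
d <- data.frame(x1 = rnorm(n), x2 = rnorm(n))
d$y <- rbinom(n, 1, plogis(-0.5 + d$x1))     # x2 is pure noise

# Brier score of a fitted model evaluated on a given dataset
brier <- function(fit, data)
  mean((predict(fit, data, type = 'response') - data$y)^2)

full     <- glm(y ~ x1 + x2, family = binomial, data = d)
apparent <- brier(full, d)                   # "Original Sample" index

opt <- replicate(100, {
  b  <- d[sample(n, replace = TRUE), ]       # bootstrap (training) sample
  fb <- glm(y ~ x1 + x2, family = binomial, data = b)
  brier(fb, b) - brier(fb, d)                # training minus test (original data)
})

corrected <- apparent - mean(opt)            # optimism-corrected Brier score
c(apparent = apparent, optimism = mean(opt), corrected = corrected)
```

For an error measure such as the Brier score the optimism is usually negative, so the corrected value is typically a little worse (larger) than the apparent one, as in the `B` row of the table.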

Code

spar(mfrow=c(2,2), top=1, bot=2)
cal <- processMI(f, 'calibrate', nind=3)


n=1309   Mean absolute error=0.008   Mean squared error=0.00012
0.9 Quantile of absolute error=0.018


n=1309   Mean absolute error=0.008   Mean squared error=1e-04
0.9 Quantile of absolute error=0.016


n=1309   Mean absolute error=0.009   Mean squared error=0.00017
0.9 Quantile of absolute error=0.023


n=1309   Mean absolute error=0.009   Mean squared error=0.00017
0.9 Quantile of absolute error=0.022

Code

# plot(cal) for full-size final calibration curve

Figure 12.12: Estimated calibration curves for the Titanic risk model, accounting for multiple imputation

Figure 12.13: Estimated calibration curves for the Titanic risk model, accounting for multiple imputation

Return to the stacked fit and compare it to the fit from single imputation

Code

p1 <- Predict(f.si,  age.i, pclass, sex, sibsp=0, fun=plogis)
p2 <- Predict(f.mi,  age,   pclass, sex, sibsp=0, fun=plogis)
p  <- rbind('Single Imputation'=p1, 'Multiple Imputation'=p2,
            rename=c(age.i='age'))
ggplot(p, groups='sex', ylab='Probability of Surviving')

Figure 12.14: Predicted probability of survival for males from fit using single conditional mean imputation again (top) and multiple random draw imputation (bottom). Both sets of predictions are for `sibsp`=0.

12.7 Summarizing the Fitted Model

Show odds ratios for changes in predictor values

Code

spar(bot=1, top=0.5, ps=8)
# Get predicted values for certain types of passengers
s <- summary(f.mi, age=c(1,30), sibsp=0:1)
# override default ranges for 3 variables
plot(s, log=TRUE, main='')

Figure 12.15: Odds ratios for some predictor settings

Code

phat <- predict(f.mi,
                combos <-
         expand.grid(age=c(2,21,50),sex=levels(t3$sex),
                     pclass=levels(t3$pclass),
                     sibsp=0, parch=0), type='fitted')
# Can also use Predict(f.mi, age=c(2,21,50), sex, pclass,
#                      sibsp=0, fun=plogis)$yhat
options(digits=1)
data.frame(combos, phat)

   age    sex pclass sibsp parch phat
1    2 female    1st     0     0 0.55
2   21 female    1st     0     0 0.99
3   50 female    1st     0     0 0.96
4    2   male    1st     0     0 0.99
5   21   male    1st     0     0 0.49
6   50   male    1st     0     0 0.28
7    2 female    2nd     0     0 1.00
8   21 female    2nd     0     0 0.88
9   50 female    2nd     0     0 0.80
10   2   male    2nd     0     0 0.99
11  21   male    2nd     0     0 0.11
12  50   male    2nd     0     0 0.07
13   2 female    3rd     0     0 0.87
14  21 female    3rd     0     0 0.58
15  50 female    3rd     0     0 0.45
16   2   male    3rd     0     0 0.81
17  21   male    3rd     0     0 0.15
18  50   male    3rd     0     0 0.05

Code

options(digits=5)

We can also get predicted values by creating an R function that will evaluate the model on demand, but that only works if there are no 3rd-order interactions.

Code

pred.logit <- Function(f.mi)
# Note: if sibsp is not passed to pred.logit, it defaults to 0
plogis(pred.logit(age=c(2,21,50), sex='male', pclass='3rd'))
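The name `pred.logit` and the `plogis` wrapper suggest that the generated function returns values on the log odds scale. A minimal base R sketch of the conversion between log odds and probabilities, using made-up values (`lo` is hypothetical, not output from `pred.logit`):

```r
# Hypothetical log odds values, for illustration only
lo <- c(-2, 0, 1.5)

# plogis is the logistic distribution function, i.e. the inverse logit
p  <- plogis(lo)

# The same computation written out explicitly
p2 <- 1 / (1 + exp(-lo))

# qlogis (the logit) maps probabilities back to log odds
back <- qlogis(p)

rbind(lo, p, p2, back)
```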

A nomogram could be used to obtain predicted values manually, but this is not feasible when so many interaction terms are present.

12.8 Bayesian Analysis

Repeat the multiple imputation-based approach but using a Bayesian binary logistic model
Use the default blrm normal priors on the regression coefficients, with zero mean and a large SD, making the priors almost flat
blrm uses the cmdstanr and rstan packages, which provide the full power of Stan to R
Here we use cmdstan with cmdstanr
rmsb has its own caching mechanism that efficiently stores the model fit object (and all its posterior draws) and reads it back from disk instead of running it again, until one of the inputs changes
See this for more about the rmsb package
Could use smaller prior SDs to get penalized estimates
Use 4 independent Markov chain Hamiltonian posterior sampling procedures, each with 1000 burn-in iterations that are discarded and 1000 “real” iterations, for a total of 4000 posterior sample draws
Use the first 10 multiple imputations already developed above (object mi), running the Bayesian procedure separately for each of the 10 completed datasets
Merely have to stack the posterior draws into one giant sample to account for imputation and get the correct posterior distribution
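The stacking idea in the last item can be sketched in base R. This is not the `stackMI` implementation. It simulates posterior draws of a single coefficient for each of several completed datasets and concatenates them, so that summaries of the stacked sample reflect both within-imputation posterior uncertainty and between-imputation variation. The names `n_imp`, `n_draws`, `centers`, and `draws_by_imp`, and the normal distributions used to generate the draws, are assumptions for illustration.

```r
set.seed(2)
n_imp   <- 10     # number of completed datasets
n_draws <- 4000   # posterior draws per completed dataset

# Pretend each completed dataset yields a slightly different posterior
centers <- rnorm(n_imp, mean = -0.9, sd = 0.1)
draws_by_imp <- lapply(centers,
                       function(m) rnorm(n_draws, mean = m, sd = 0.3))

# Stack: one long vector of n_imp * n_draws draws
stacked <- unlist(draws_by_imp)

length(stacked)                  # 40000
mean(stacked)                    # posterior mean accounting for imputation
sd(stacked)                      # reflects within- and between-imputation spread
mean(sapply(draws_by_imp, sd))   # average within-imputation SD, for comparison
```

In this toy setup the stacked standard deviation exceeds the average within-imputation standard deviation, which is the extra uncertainty contributed by the missing data.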

Code

# Use all available CPU cores less 1.  Each chain will be run on its
# own core.
require(rmsb)
options(mc.cores=parallel::detectCores() - 1, rmsb.backend='cmdstan')
cmdstanr::set_cmdstan_path(cmdstan.loc)
# cmdstan.loc is defined in ~/.Rprofile

# 10 Bayesian analyses took 3m on 11 cores
set.seed(21)
bt <- stackMI(survived ~ sex * pclass * rcs(age, 4) +
          rcs(age, 4) * (sibsp + parch),
          blrm, mi, data=t3, n.impute=10, refresh=25,
          file='bt.rds')

Initial log joint probability = -1664.32 
    Iter      log prob        ||dx||      ||grad||       alpha      alpha0  # evals  Notes  
      99      -524.364     0.0830697       0.33118     0.02033           1      122    
    Iter      log prob        ||dx||      ||grad||       alpha      alpha0  # evals  Notes  
     199      -524.269      0.062577    0.00585828           1           1      266    
    Iter      log prob        ||dx||      ||grad||       alpha      alpha0  # evals  Notes  
     216      -524.269     0.0111121    0.00684597      0.1595      0.3073      291    
Optimization terminated normally:  
  Convergence detected: relative gradient magnitude is below tolerance 
Finished in  0.1 seconds.
Initial log joint probability = -893.849 
    Iter      log prob        ||dx||      ||grad||       alpha      alpha0  # evals  Notes  
      99      -541.402      0.332412      0.271638      0.3574     0.03574      142    
    Iter      log prob        ||dx||      ||grad||       alpha      alpha0  # evals  Notes  
     160      -541.384   0.000383987    0.00226102    0.001592           1      221    
Optimization terminated normally:  
  Convergence detected: relative gradient magnitude is below tolerance 
Finished in  0.1 seconds.
Initial log joint probability = -1227.07 
    Iter      log prob        ||dx||      ||grad||       alpha      alpha0  # evals  Notes  
      99      -530.816      0.073598     0.0982794    0.006771      0.4779      116    
    Iter      log prob        ||dx||      ||grad||       alpha      alpha0  # evals  Notes  
     199      -530.713     0.0234462     0.0277328    0.004479           1      269    
    Iter      log prob        ||dx||      ||grad||       alpha      alpha0  # evals  Notes  
     293      -530.708    0.00613878    0.00237606      0.3228           1      510    
Optimization terminated normally:  
  Convergence detected: relative gradient magnitude is below tolerance 
Finished in  0.1 seconds.
Initial log joint probability = -960.056 
    Iter      log prob        ||dx||      ||grad||       alpha      alpha0  # evals  Notes  
      99       -525.53      0.193644      0.185076           1           1      128    
    Iter      log prob        ||dx||      ||grad||       alpha      alpha0  # evals  Notes  
     189      -525.473    0.00858699    0.00719081           1           1      253    
Optimization terminated normally:  
  Convergence detected: relative gradient magnitude is below tolerance 
Finished in  0.1 seconds.
Initial log joint probability = -1127.73 
    Iter      log prob        ||dx||      ||grad||       alpha      alpha0  # evals  Notes  
      99       -538.26     0.0354954     0.0374929           1           1      127    
    Iter      log prob        ||dx||      ||grad||       alpha      alpha0  # evals  Notes  
     199      -538.222    0.00390884    0.00472831      0.2233    0.002233      342    
    Iter      log prob        ||dx||      ||grad||       alpha      alpha0  # evals  Notes  
     299      -538.222    0.00334469    0.00244673      0.2253     0.02253      674    
    Iter      log prob        ||dx||      ||grad||       alpha      alpha0  # evals  Notes  
     302      -538.222    0.00131593    0.00240072    0.001258           1      690    
Optimization terminated normally:  
  Convergence detected: relative gradient magnitude is below tolerance 
Finished in  0.2 seconds.
Initial log joint probability = -927.017 
    Iter      log prob        ||dx||      ||grad||       alpha      alpha0  # evals  Notes  
      99      -535.691       0.10964     0.0899216           1           1      137    
    Iter      log prob        ||dx||      ||grad||       alpha      alpha0  # evals  Notes  
     182      -535.649    0.00460858    0.00297738           1           1      247    
Optimization terminated normally:  
  Convergence detected: relative gradient magnitude is below tolerance 
Finished in  0.1 seconds.
Initial log joint probability = -1685.06 
    Iter      log prob        ||dx||      ||grad||       alpha      alpha0  # evals  Notes  
      99      -539.904     0.0709392     0.0313824           1           1      147    
    Iter      log prob        ||dx||      ||grad||       alpha      alpha0  # evals  Notes  
     160      -539.898     0.0240232    0.00307569           1           1      218    
Optimization terminated normally:  
  Convergence detected: relative gradient magnitude is below tolerance 
Finished in  0.1 seconds.
Initial log joint probability = -868.114 
    Iter      log prob        ||dx||      ||grad||       alpha      alpha0  # evals  Notes  
      99      -540.737      0.122018      0.171922           1           1      125    
    Iter      log prob        ||dx||      ||grad||       alpha      alpha0  # evals  Notes  
     171      -540.703     0.0237131    0.00396745           1           1      226    
Optimization terminated normally:  
  Convergence detected: relative gradient magnitude is below tolerance 
Finished in  0.1 seconds.
Initial log joint probability = -891.043 
    Iter      log prob        ||dx||      ||grad||       alpha      alpha0  # evals  Notes  
      99        -534.6     0.0530136      0.314309       0.514       0.514      135    
    Iter      log prob        ||dx||      ||grad||       alpha      alpha0  # evals  Notes  
     199      -534.568     0.0376798     0.0134564      0.4095           1      262    
    Iter      log prob        ||dx||      ||grad||       alpha      alpha0  # evals  Notes  
     241      -534.566      0.011841    0.00188458           1           1      322    
Optimization terminated normally:  
  Convergence detected: relative gradient magnitude is below tolerance 
Finished in  0.1 seconds.
Initial log joint probability = -1064.83 
    Iter      log prob        ||dx||      ||grad||       alpha      alpha0  # evals  Notes  
      99       -531.31      0.097006     0.0749049       1.242     0.01242      139    
    Iter      log prob        ||dx||      ||grad||       alpha      alpha0  # evals  Notes  
     131      -531.307     0.0221744     0.0045838           1           1      175    
Optimization terminated normally:  
  Convergence detected: relative gradient magnitude is below tolerance 
Finished in  0.1 seconds.

Code

bt

Bayesian Logistic Model

Dirichlet Priors With Concentration Parameter 0.541 for Intercepts

stackMI(formula = survived ~ sex * pclass * rcs(age, 4) + rcs(age, 
    4) * (sibsp + parch), fitter = blrm, xtrans = mi, data = t3, 
    n.impute = 10, refresh = 25, file = "bt.rds")

	Mixed Calibration/ Discrimination Indexes	Discrimination Indexes	Rank Discrim. Indexes
Obs 1309	B 0.132 [0.129, 0.134]	g 2.765 [2.35, 3.296]	C 0.866 [0.862, 0.871]
0 809		g_p 0.36 [0.344, 0.374]	D_xy 0.732 [0.724, 0.742]
1 500		EV 0.468 [0.429, 0.506]
Draws 40000		v 8.058 [4.375, 12.832]
Chains 4		vp 0.111 [0.101, 0.119]
Time 12.8s
Imputations 10
p 31

	Mean β	Median β	S.E.	Lower	Upper	Pr(β>0)	Symmetry
Intercept	-3.0205	-2.0130	5.1825	-14.0433	5.8008	0.3064	0.59
sex=male	9.8789	9.0430	5.8509	-0.3005	22.0918	0.9842	1.50
pclass=2nd	21.6658	20.1308	10.4705	4.1811	42.7053	0.9994	1.51
pclass=3rd	5.4510	4.4494	5.0892	-2.6579	16.6750	0.9065	1.72
age	0.4838	0.4247	0.3420	-0.1046	1.1979	0.9665	1.66
age'	-1.1128	-1.0049	0.8058	-2.7948	0.2901	0.0502	0.67
age''	4.2438	3.8813	3.2417	-1.4595	11.1184	0.9319	1.39
sibsp	-0.9465	-0.9319	0.3194	-1.5858	-0.3346	0.0004	0.88
parch	-0.5111	-0.5872	0.6999	-1.7927	1.1483	0.1661	1.55
sex=male × pclass=2nd	-21.9110	-20.5941	11.1516	-44.5994	-2.3838	0.0054	0.72
sex=male × pclass=3rd	-9.8716	-9.0435	5.9164	-22.0314	0.5144	0.0174	0.67
sex=male × age	-0.8727	-0.8214	0.3728	-1.6361	-0.2138	0.0007	0.66
sex=male × age'	1.8265	1.7277	0.8558	0.2665	3.5586	0.9968	1.42
sex=male × age''	-6.8808	-6.5281	3.4124	-13.7591	-0.5267	0.0078	0.74
pclass=2nd × age	-1.4241	-1.3433	0.6059	-2.6535	-0.3917	0.0001	0.69
pclass=3rd × age	-0.5911	-0.5324	0.3369	-1.3086	-0.0285	0.0079	0.60
pclass=2nd × age'	3.1090	2.9643	1.2821	0.8631	5.7181	0.9996	1.36
pclass=3rd × age'	1.1930	1.0893	0.7990	-0.2282	2.8408	0.9656	1.49
pclass=2nd × age''	-12.2621	-11.7890	4.9905	-22.4140	-3.3101	0.0009	0.77
pclass=3rd × age''	-4.1739	-3.8271	3.2567	-10.7897	1.8132	0.0761	0.73
age × sibsp	0.0170	0.0167	0.0271	-0.0365	0.0703	0.7357	1.04
age' × sibsp	0.0697	0.0690	0.0966	-0.1252	0.2547	0.7658	1.02
age'' × sibsp	-0.4744	-0.4715	0.5179	-1.4627	0.5698	0.1781	0.98
age × parch	0.0416	0.0469	0.0477	-0.0643	0.1305	0.8434	0.68
age' × parch	-0.1317	-0.1401	0.1266	-0.3712	0.1336	0.1432	1.27
age'' × parch	0.5641	0.5902	0.5638	-0.5822	1.6564	0.8459	0.86
sex=male × pclass=2nd × age	1.3160	1.2464	0.6466	0.1471	2.6082	0.9950	1.36
sex=male × pclass=3rd × age	0.7342	0.6827	0.3767	0.0792	1.5185	0.9941	1.49
sex=male × pclass=2nd × age'	-2.8682	-2.7534	1.3757	-5.6419	-0.3494	0.0064	0.78
sex=male × pclass=3rd × age'	-1.3643	-1.2727	0.8708	-3.1439	0.2163	0.0331	0.72
sex=male × pclass=2nd × age''	11.3081	10.9469	5.3885	1.1016	21.9479	0.9922	1.23
sex=male × pclass=3rd × age''	4.0841	3.8061	3.5694	-2.7989	11.1282	0.8869	1.27

Note that fit indexes have HPD uncertainty intervals
Everything above accounts for imputation
Look at diagnostics
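For readers unfamiliar with the highest posterior density (HPD) intervals noted above, the following base R sketch computes one from a vector of posterior draws as the shortest interval containing the requested share of the draws. It is not the routine used by `rmsb`. The function name `hpd_interval` and the simulated right-skewed draws are assumptions for illustration.

```r
# Hypothetical helper: shortest interval containing a fraction `prob` of draws
hpd_interval <- function(x, prob = 0.95) {
  x <- sort(x)
  n <- length(x)
  k <- ceiling(prob * n)            # number of draws inside the interval
  starts <- seq_len(n - k + 1)      # candidate positions of the lower endpoint
  widths <- x[starts + k - 1] - x[starts]
  i <- which.min(widths)            # shortest candidate window
  c(lower = x[i], upper = x[i + k - 1])
}

set.seed(3)
draws <- rgamma(4000, shape = 2, rate = 1)   # skewed toy posterior

hpd <- hpd_interval(draws)
eti <- quantile(draws, c(0.025, 0.975))      # equal-tailed interval, for comparison
hpd
eti
```

For a skewed posterior such as this toy example, the HPD interval is shorter than the equal-tailed interval and sits closer to the mode.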

Separate Diagnostics for Each of 10 Imputed Datasets

Code

stanDx(bt)

Diagnostics for each of 10 imputations

Iterations: 2000 on each of 4 chains, with 4000 posterior distribution samples saved

For each parameter, n_eff is a crude measure of effective sample size
and Rhat is the potential scale reduction factor on split chains
(at convergence, Rhat=1)


Imputation 1 

Checking sampler transitions treedepth.
Treedepth satisfactory for all transitions.

Checking sampler transitions for divergences.
No divergent transitions found.

Checking E-BFMI - sampler transitions HMC potential energy.
E-BFMI satisfactory.

Effective sample size satisfactory.

Split R-hat values satisfactory all parameters.

Processing complete, no problems detected.

EBFMI: 0.947 1.026 0.901 0.946 

   Parameter  Rhat ESS bulk ESS tail
1   alpha[1] 1.002     1131     1489
2    beta[1] 1.004      978     1109
3    beta[2] 1.004      783      856
4    beta[3] 1.003     1729     1759
5    beta[4] 1.002     1073     1266
6    beta[5] 1.004      910     1264
7    beta[6] 1.004     1202     1714
8    beta[7] 1.001     2145     2321
9    beta[8] 1.001     2973     2958
10   beta[9] 1.006      657      723
11  beta[10] 1.002     1629     1762
12  beta[11] 1.004      973     1193
13  beta[12] 1.006      815      945
14  beta[13] 1.001     1518     1878
15  beta[14] 1.005      760      857
16  beta[15] 1.001     1528     2211
17  beta[16] 1.006      794      806
18  beta[17] 1.000     2056     2092
19  beta[18] 1.006      922     1096
20  beta[19] 1.001     2147     2645
21  beta[20] 1.000     4300     2970
22  beta[21] 1.000     3649     3064
23  beta[22] 1.001     3877     2949
24  beta[23] 1.000     3413     2238
25  beta[24] 1.000     4272     2972
26  beta[25] 1.000     5249     3108
27  beta[26] 1.005      697      747
28  beta[27] 1.001     1090     1597
29  beta[28] 1.006      743      758
30  beta[29] 1.001     1771     1858
31  beta[30] 1.004      773      951
32  beta[31] 1.002     2308     2427

Imputation 2 

Checking sampler transitions treedepth.
Treedepth satisfactory for all transitions.

Checking sampler transitions for divergences.
No divergent transitions found.

Checking E-BFMI - sampler transitions HMC potential energy.
E-BFMI satisfactory.

Effective sample size satisfactory.

Split R-hat values satisfactory all parameters.

Processing complete, no problems detected.

EBFMI: 0.983 1.018 1.005 0.933 

   Parameter  Rhat ESS bulk ESS tail
1   alpha[1] 1.004     1173     1210
2    beta[1] 1.002     1099      907
3    beta[2] 1.006      720      766
4    beta[3] 1.001     1901     1837
5    beta[4] 1.003      946      978
6    beta[5] 1.005      831      816
7    beta[6] 1.001     1111     1197
8    beta[7] 1.001     1924     2501
9    beta[8] 1.002     2877     2625
10   beta[9] 1.006      692      642
11  beta[10] 1.001     1840     2010
12  beta[11] 1.010      808      859
13  beta[12] 1.002      810      848
14  beta[13] 1.002     1124     1312
15  beta[14] 1.005      698      665
16  beta[15] 1.003     1760     2642
17  beta[16] 1.006      680      644
18  beta[17] 1.001     2512     2413
19  beta[18] 1.004      911      962
20  beta[19] 1.000     2455     2667
21  beta[20] 1.002     4090     2796
22  beta[21] 1.003     4269     2948
23  beta[22] 1.001     4215     2998
24  beta[23] 1.000     3231     2885
25  beta[24] 1.000     4463     3132
26  beta[25] 1.001     3589     2774
27  beta[26] 1.005      602      603
28  beta[27] 1.005     1326     1267
29  beta[28] 1.006      648      675
30  beta[29] 1.001     2156     2437
31  beta[30] 1.004      830      837
32  beta[31] 0.999     2387     2316

Imputation 3 

Checking sampler transitions treedepth.
Treedepth satisfactory for all transitions.

Checking sampler transitions for divergences.
No divergent transitions found.

Checking E-BFMI - sampler transitions HMC potential energy.
E-BFMI satisfactory.

Effective sample size satisfactory.

Split R-hat values satisfactory all parameters.

Processing complete, no problems detected.

EBFMI: 1.014 0.813 0.992 0.868 

   Parameter  Rhat ESS bulk ESS tail
1   alpha[1] 1.004     1134     1433
2    beta[1] 1.004      896      921
3    beta[2] 1.005      779     1208
4    beta[3] 1.001     2044     2494
5    beta[4] 1.004      982     1291
6    beta[5] 1.004      941     1601
7    beta[6] 1.002     1416     1736
8    beta[7] 1.003     1966     2363
9    beta[8] 1.001     1976     2175
10   beta[9] 1.005      746     1007
11  beta[10] 1.003     1986     2386
12  beta[11] 1.004      920     1274
13  beta[12] 1.005      973     1077
14  beta[13] 1.002     1686     2574
15  beta[14] 1.005      807     1101
16  beta[15] 1.003     1442     1763
17  beta[16] 1.005      786      951
18  beta[17] 1.003     1621     2093
19  beta[18] 1.002      954     1365
20  beta[19] 1.002     1918     2587
21  beta[20] 1.000     3543     2893
22  beta[21] 1.001     3744     3022
23  beta[22] 1.001     3356     2790
24  beta[23] 1.002     2217     2582
25  beta[24] 1.003     2132     2555
26  beta[25] 1.001     2760     3121
27  beta[26] 1.007      701      808
28  beta[27] 1.005     1142     1336
29  beta[28] 1.005      733      831
30  beta[29] 1.003     1535     2385
31  beta[30] 1.005      738      980
32  beta[31] 1.000     2403     2023

Imputation 4 

Checking sampler transitions treedepth.
Treedepth satisfactory for all transitions.

Checking sampler transitions for divergences.
No divergent transitions found.

Checking E-BFMI - sampler transitions HMC potential energy.
E-BFMI satisfactory.

Effective sample size satisfactory.

Split R-hat values satisfactory all parameters.

Processing complete, no problems detected.

EBFMI: 0.876 0.977 1.025 0.956 

   Parameter  Rhat ESS bulk ESS tail
1   alpha[1] 1.002     1078     1314
2    beta[1] 1.002     1023     1122
3    beta[2] 1.003      660      888
4    beta[3] 1.001     2031     2086
5    beta[4] 1.002      948     1261
6    beta[5] 1.003      713      980
7    beta[6] 1.000      984     1325
8    beta[7] 1.002     2385     2907
9    beta[8] 1.001     2379     2669
10   beta[9] 1.003      610      761
11  beta[10] 1.001     1883     1635
12  beta[11] 1.002      790     1007
13  beta[12] 1.002      829      987
14  beta[13] 1.001     1049     1356
15  beta[14] 1.002      638      785
16  beta[15] 1.002     1293     2028
17  beta[16] 1.002      679      813
18  beta[17] 1.002     1788     1874
19  beta[18] 1.002      752     1109
20  beta[19] 1.002     1710     1733
21  beta[20] 1.000     3103     3075
22  beta[21] 1.002     2452     3059
23  beta[22] 1.001     3254     3035
24  beta[23] 1.001     2179     2060
25  beta[24] 1.000     2724     2593
26  beta[25] 1.000     3486     2954
27  beta[26] 1.003      626      819
28  beta[27] 1.003     1169     1638
29  beta[28] 1.003      601      837
30  beta[29] 1.000     1658     2452
31  beta[30] 1.003      772      897
32  beta[31] 1.000     2542     2325

Imputation 5 

Checking sampler transitions treedepth.
Treedepth satisfactory for all transitions.

Checking sampler transitions for divergences.
No divergent transitions found.

Checking E-BFMI - sampler transitions HMC potential energy.
E-BFMI satisfactory.

Effective sample size satisfactory.

Split R-hat values satisfactory all parameters.

Processing complete, no problems detected.

EBFMI: 0.996 0.984 0.946 0.887 

   Parameter  Rhat ESS bulk ESS tail
1   alpha[1] 1.001     1117     1226
2    beta[1] 1.005      927     1139
3    beta[2] 1.004      812      859
4    beta[3] 1.000     1809     1695
5    beta[4] 1.005     1036     1155
6    beta[5] 1.003      998     1081
7    beta[6] 1.002     1274     1502
8    beta[7] 1.002     2018     2240
9    beta[8] 1.000     2950     2806
10   beta[9] 1.006      782      789
11  beta[10] 1.000     1644     1903
12  beta[11] 1.002     1007      912
13  beta[12] 1.006      869      832
14  beta[13] 1.001     1332     1497
15  beta[14] 1.004      866      822
16  beta[15] 1.001     1654     1457
17  beta[16] 1.005      817      826
18  beta[17] 1.001     2144     2149
19  beta[18] 1.002     1016     1020
20  beta[19] 1.000     2228     2173
21  beta[20] 1.001     3274     3098
22  beta[21] 1.000     3477     3132
23  beta[22] 1.002     3272     2724
24  beta[23] 1.001     2443     2525
25  beta[24] 1.002     3550     3009
26  beta[25] 1.001     3017     2929
27  beta[26] 1.006      799      700
28  beta[27] 1.004     1356     1434
29  beta[28] 1.005      784      677
30  beta[29] 1.001     1581     2205
31  beta[30] 1.004      900      827
32  beta[31] 1.000     2076     1822

Imputation 6 

Checking sampler transitions treedepth.
Treedepth satisfactory for all transitions.

Checking sampler transitions for divergences.
No divergent transitions found.

Checking E-BFMI - sampler transitions HMC potential energy.
E-BFMI satisfactory.

Effective sample size satisfactory.

Split R-hat values satisfactory all parameters.

Processing complete, no problems detected.

EBFMI: 1.022 1.019 0.974 0.97 

   Parameter  Rhat ESS bulk ESS tail
1   alpha[1] 1.004      827     1209
2    beta[1] 1.002      749      970
3    beta[2] 1.007      577      943
4    beta[3] 1.004     1664     1340
5    beta[4] 1.002      801     1110
6    beta[5] 1.006      689     1038
7    beta[6] 1.005      972     1662
8    beta[7] 1.001     1666     2587
9    beta[8] 1.001     3869     3321
10   beta[9] 1.006      562      718
11  beta[10] 1.004     1366     1421
12  beta[11] 1.003      656     1071
13  beta[12] 1.006      752     1177
14  beta[13] 1.005     1134     1529
15  beta[14] 1.006      572      904
16  beta[15] 1.001     1050     1540
17  beta[16] 1.005      554      878
18  beta[17] 1.001     2415     2804
19  beta[18] 1.002      702     1131
20  beta[19] 1.004     1913     1663
21  beta[20] 1.000     3816     3178
22  beta[21] 1.000     3332     2949
23  beta[22] 1.001     2687     2952
24  beta[23] 1.001     3502     3063
25  beta[24] 1.000     4287     2791
26  beta[25] 1.001     3576     2879
27  beta[26] 1.004      540      745
28  beta[27] 1.002      966     1340
29  beta[28] 1.004      551      627
30  beta[29] 1.003     1970     2297
31  beta[30] 1.003      635     1017
32  beta[31] 1.005     1812     1510

Imputation 7 

Checking sampler transitions treedepth.
Treedepth satisfactory for all transitions.

Checking sampler transitions for divergences.
No divergent transitions found.

Checking E-BFMI - sampler transitions HMC potential energy.
E-BFMI satisfactory.

Effective sample size satisfactory.

Split R-hat values satisfactory all parameters.

Processing complete, no problems detected.

EBFMI: 1 1.1 0.929 0.946 

   Parameter  Rhat ESS bulk ESS tail
1   alpha[1] 1.002     1246     1787
2    beta[1] 1.001     1017     1473
3    beta[2] 1.002      869     1180
4    beta[3] 1.002     1854     1818
5    beta[4] 1.002     1068     1526
6    beta[5] 1.001     1020     1314
7    beta[6] 1.002     1442     2032
8    beta[7] 1.000     1800     2231
9    beta[8] 1.001     4200     2774
10   beta[9] 1.003      795      941
11  beta[10] 1.001     1857     1879
12  beta[11] 1.003      937     1341
13  beta[12] 1.003     1026     1473
14  beta[13] 1.000     1404     2048
15  beta[14] 1.002      834      951
16  beta[15] 1.001     1572     1831
17  beta[16] 1.003      835     1066
18  beta[17] 1.003     1936     2345
19  beta[18] 1.002     1124     1415
20  beta[19] 1.000     1874     1994
21  beta[20] 1.001     3833     3389
22  beta[21] 1.001     5411     3219
23  beta[22] 1.000     4538     3060
24  beta[23] 1.001     3458     2855
25  beta[24] 1.000     5278     2930
26  beta[25] 1.001     4645     2871
27  beta[26] 1.002      714      955
28  beta[27] 1.002     1361     1828
29  beta[28] 1.002      716      915
30  beta[29] 1.001     1862     2546
31  beta[30] 1.001      899     1228
32  beta[31] 1.001     2021     1911

Imputation 8 

Checking sampler transitions treedepth.
Treedepth satisfactory for all transitions.

Checking sampler transitions for divergences.
No divergent transitions found.

Checking E-BFMI - sampler transitions HMC potential energy.
E-BFMI satisfactory.

Effective sample size satisfactory.

Split R-hat values satisfactory all parameters.

Processing complete, no problems detected.

EBFMI: 1.031 0.972 0.897 0.987 

   Parameter  Rhat ESS bulk ESS tail
1   alpha[1] 1.003      963     1613
2    beta[1] 1.002      893     1070
3    beta[2] 1.008      621      844
4    beta[3] 1.003     2277     1886
5    beta[4] 1.004      871     1253
6    beta[5] 1.006      629      845
7    beta[6] 1.005     1211     1704
8    beta[7] 1.002     1841     1793
9    beta[8] 1.001     3794     3265
10   beta[9] 1.009      602      665
11  beta[10] 1.002     2111     1884
12  beta[11] 1.004      732     1216
13  beta[12] 1.008      880     1163
14  beta[13] 1.004     1310     1524
15  beta[14] 1.010      577      759
16  beta[15] 1.002     1454     2255
17  beta[16] 1.007      580      824
18  beta[17] 1.002     3563     2700
19  beta[18] 1.006      700     1039
20  beta[19] 1.001     2172     1788
21  beta[20] 1.001     4867     3302
22  beta[21] 1.003     4094     2683
23  beta[22] 1.001     4457     3060
24  beta[23] 1.003     4404     2915
25  beta[24] 1.000     5341     3139
26  beta[25] 1.000     4889     2975
27  beta[26] 1.008      586      694
28  beta[27] 1.001     1116     1725
29  beta[28] 1.008      610      637
30  beta[29] 1.002     1704     1817
31  beta[30] 1.005      689      791
32  beta[31] 1.002     2548     2608

Imputation 9 

Checking sampler transitions treedepth.
Treedepth satisfactory for all transitions.

Checking sampler transitions for divergences.
No divergent transitions found.

Checking E-BFMI - sampler transitions HMC potential energy.
E-BFMI satisfactory.

Effective sample size satisfactory.

Split R-hat values satisfactory all parameters.

Processing complete, no problems detected.

EBFMI: 0.924 0.949 1.048 0.873 

   Parameter  Rhat ESS bulk ESS tail
1   alpha[1] 1.007     1245     1539
2    beta[1] 1.003     1131     1645
3    beta[2] 1.006      963     1395
4    beta[3] 1.001     1992     2069
5    beta[4] 1.002     1078     1827
6    beta[5] 1.004     1221     1517
7    beta[6] 1.002     1398     2061
8    beta[7] 1.003     2191     2480
9    beta[8] 1.000     3201     2724
10   beta[9] 1.007      776      934
11  beta[10] 1.003     1764     1875
12  beta[11] 1.010      947     1720
13  beta[12] 1.004      936     1313
14  beta[13] 1.002     1475     2202
15  beta[14] 1.006      825     1239
16  beta[15] 1.006     1349     1559
17  beta[16] 1.005      870     1398
18  beta[17] 1.001     1430     1690
19  beta[18] 1.005     1090     1771
20  beta[19] 1.002     1828     1945
21  beta[20] 1.000     4021     2979
22  beta[21] 1.000     5302     3268
23  beta[22] 1.001     4189     2611
24  beta[23] 1.001     3201     2506
25  beta[24] 1.002     4451     2940
26  beta[25] 1.001     4050     2875
27  beta[26] 1.007      816     1074
28  beta[27] 1.002     1172     1954
29  beta[28] 1.007      762     1095
30  beta[29] 1.002     2135     2499
31  beta[30] 1.006      926     1383
32  beta[31] 1.002     1962     2420

Imputation 10 

Checking sampler transitions treedepth.
Treedepth satisfactory for all transitions.

Checking sampler transitions for divergences.
No divergent transitions found.

Checking E-BFMI - sampler transitions HMC potential energy.
E-BFMI satisfactory.

Effective sample size satisfactory.

Split R-hat values satisfactory all parameters.

Processing complete, no problems detected.

EBFMI: 0.918 0.993 1.017 0.986 

   Parameter  Rhat ESS bulk ESS tail
1   alpha[1] 1.002     1153     1368
2    beta[1] 1.003      919     1068
3    beta[2] 1.004      829      788
4    beta[3] 1.001     1982     1986
5    beta[4] 1.004      837     1018
6    beta[5] 1.004      978      999
7    beta[6] 1.002     1360     1799
8    beta[7] 1.001     2389     2635
9    beta[8] 1.001     3843     2645
10   beta[9] 1.006      701      644
11  beta[10] 1.001     2059     1832
12  beta[11] 1.005      736      890
13  beta[12] 1.004      842      799
14  beta[13] 1.002     1266     1590
15  beta[14] 1.004      812      707
16  beta[15] 1.002     1633     2276
17  beta[16] 1.005      771      748
18  beta[17] 1.001     2799     2706
19  beta[18] 1.002     1023     1080
20  beta[19] 1.001     2235     1898
21  beta[20] 1.004     5972     2887
22  beta[21] 1.002     5879     3130
23  beta[22] 1.001     5231     2983
24  beta[23] 1.002     4081     2890
25  beta[24] 1.001     4328     2651
26  beta[25] 1.000     4838     2890
27  beta[26] 1.006      626      583
28  beta[27] 1.004      967     1549
29  beta[28] 1.006      573      613
30  beta[29] 1.002     2032     2626
31  beta[30] 1.005      707      991
32  beta[31] 1.001     2171     1954
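The diagnostics header above describes Rhat as the potential scale reduction factor on split chains. A rough base R sketch of the classical split-chain computation follows. It omits the rank normalization that recent Stan tooling may apply, so values need not match `stanDx` exactly. The function name `split_rhat` and the simulated chains are assumptions for illustration.

```r
# Hypothetical helper: classical split R-hat for a matrix of draws.
# `chains` has one column per chain and one row per saved iteration.
split_rhat <- function(chains) {
  n_iter <- nrow(chains)
  half   <- floor(n_iter / 2)
  # Split every chain into its first and second halves
  halves <- cbind(chains[1:half, , drop = FALSE],
                  chains[(n_iter - half + 1):n_iter, , drop = FALSE])
  W <- mean(apply(halves, 2, var))      # within-chain variance
  B <- half * var(colMeans(halves))     # between-chain variance
  var_plus <- (half - 1) / half * W + B / half
  sqrt(var_plus / W)
}

set.seed(4)
good <- matrix(rnorm(4000), nrow = 1000, ncol = 4)   # well-mixed toy chains
bad  <- good + rep(c(0, 0, 2, 2), each = 1000)       # two chains shifted by 2

split_rhat(good)   # close to 1
split_rhat(bad)    # well above 1
```

Values near 1, as in the tables above, indicate that the chains agree with one another. Chains exploring different regions inflate the between-chain variance and push the ratio upward.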

Code

# Look at convergence of only 3 parameters
stanDxplot(bt, c('sex=male', 'pclass=3rd', 'age'), rev=TRUE)

Difficult to see but there are 40 traces (10 imputations × 4 chains)
Diagnostics look good; posterior samples can be trusted
Plot posterior densities for select parameters
Also shows the 10 densities before stacking

Code

plot(bt, c('sex=male', 'pclass=3rd', 'age'), nrow=2)

Plot partial effect plots with 0.95 highest posterior density intervals

Code

p <- Predict(bt, age, sex, pclass, sibsp=0, fun=plogis, funint=FALSE)
ggplot(p)

Compute approximate measure of explained outcome variation for predictors

Code

plot(anova(bt))

Contrast second class males and females, both at 5 years and 30 years of age, all other things being equal
Compute 0.95 HPD interval for the contrast and a joint uncertainty region
Compute P(both contrasts < 0), P(both contrasts < -2), and P(either one < 0)

Code

k <- contrast(bt, list(sex='male',   age=c(5, 30), pclass='2nd'),
                  list(sex='female', age=c(5, 30), pclass='2nd'),
              cnames = c('age 5 M-F', 'age 30 M-F'))
k

            age Contrast    S.E.    Lower   Upper Pr(Contrast>0)
1age 5 M-F    5  -9.8164 6.77214 -23.4452  1.3229          0.027
2age 30 M-F  30  -4.8962 0.62181  -6.1261 -3.7082          0.000

Intervals are 0.95 highest posterior density intervals
Contrast is the posterior mean

Code

plot(k)

Code

plot(k, bivar=TRUE)                        # assumes an ellipse
plot(k, bivar=TRUE, bivarmethod='kernel')  # doesn't
P <- PostF(k, pr=TRUE)

Contrast names: age 5 M-F, age 30 M-F

Code

P(`age 5 M-F` <  0 & `age 30 M-F` <  0)    # note backticks

[1] 0.97305

Code

P(`age 5 M-F` < -2 & `age 30 M-F` < -2)

[1] 0.90935

Code

P(`age 5 M-F` <  0 | `age 30 M-F` <  0)

[1] 1
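The probabilities above are proportions of posterior draws that satisfy each condition. A base R sketch of the same idea, not using `PostF`, with simulated draws for two correlated contrasts (the names `c5` and `c30` and the distributions that generate them are hypothetical):

```r
set.seed(5)
n <- 40000
# Toy posterior draws for two contrasts; a shared component makes them correlated
shared <- rnorm(n)
c5  <- -10 + 6   * shared + rnorm(n, sd = 3)
c30 <-  -5 + 0.4 * shared + rnorm(n, sd = 0.5)

p_both   <- mean(c5 <  0 & c30 <  0)   # P(both contrasts < 0)
p_both2  <- mean(c5 < -2 & c30 < -2)   # P(both contrasts < -2)
p_either <- mean(c5 <  0 | c30 <  0)   # P(at least one contrast < 0)
c(both = p_both, both_lt_m2 = p_both2, either = p_either)
```

Because the logical conditions are evaluated draw by draw, the joint probabilities automatically respect the posterior correlation between the two contrasts.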

Show posterior distribution of predicted survival probability for a 21 year old male in third class with sibsp=0
Predict summarizes with a posterior mean (set posterior.summary='median' to use posterior median)
Frequentist multiple imputation estimate was 0.1342

Code

pmean <- Predict(bt, age=21, sex='male', pclass='3rd', sibsp=0, parch=0,
                 fun=plogis, funint=FALSE)
pmean

  age  sex pclass sibsp parch    yhat   lower   upper
1  21 male    3rd     0     0 0.14641 0.09952 0.19909

Response variable (y):  

Limits are 0.95 confidence limits

Code

p <- predict(bt,
             data.frame(age=21, sex='male', pclass='3rd', sibsp=0, parch=0),
             posterior.summary='all', fun=plogis, funint=FALSE)
plot(density(p), main='',
     xlab='Pr(survival) For One Covariate Combination')
abline(v=with(pmean, c(yhat, lower, upper)), col=alpha('blue', 0.5))

Compute Pr(survival probability > 0.2) for this man

Code

mean(p > 0.2)

[1] 0.02465

`R` software used
Package	Purpose	Functions
`Hmisc`	Miscellaneous functions	`summary,plsmo,naclus,llist,latex, summarize,Dotplot,describe`
`Hmisc`	Imputation	`transcan,impute,fit.mult.impute,aregImpute,stackMI`
`rms`	Modeling	`datadist,lrm,rcs`
	Accounting for imputation	`processMI, LRupdate`
	Model presentation	`plot,summary,nomogram,Function,anova`
	Estimation	`Predict,summary,contrast`
	Model validation	`validate,calibrate`
`rmsb`	Misc. Bayesian	`blrm`, `stanDx`,`stanDxplot`,`plot`
`rpart`¹	Recursive partitioning	`rpart`

¹ Written by Atkinson and Therneau