1 Introduction

1.1 Markdown

This is an R Markdown html document using the template that is here. Markdown is a simple formatting syntax for authoring HTML, PDF, and MS Word documents. For more details on using R Markdown see http://rmarkdown.rstudio.com.

require(rms)
knitrSet(lang='markdown')

The following (r hidingTOC(buttonLabel=“Outline”)) uses the Hmisc hidingTOC function to define HTML styles related to a floating table of contents that can be minimized or be collapsed to major outline levels. For more details see this.


2 Data

2.1 Setup

getHdata(titanic3)  # Get the dataset from the VU DataSets page
mu <- markupSpecs$html   # markupSpecs is in Hmisc
subtext <- mu$subtext
code    <- mu$code

2.2 Data Dictionary

html(contents(titanic3), maxlevels=10, levelType='table')

Data frame:titanic3

1309 observations and 14 variables, maximum # NAs:1188  
NameLabelsUnitsLevelsStorageNAs
pclass 3integer 0
survivedSurvivedinteger 0
nameNamecharacter 0
sex 2integer 0
ageAgeYeardouble 263
sibspNumber of Siblings/Spouses Aboardinteger 0
parchNumber of Parents/Children Aboardinteger 0
ticketTicket Numbercharacter 0
farePassenger FareBritish Pound (\243)double 1
cabin187integer 0
embarked 3integer 2
boat 28integer 0
bodyBody Identification Numberinteger1188
home.destHome/Destinationcharacter 0

VariableLevels
pclass1st
2nd
3rd
sexfemale
male
cabin
A10
A11
A14
A16
A18
A19
A20
A21
A23
...
embarkedCherbourg
Queenstown
Southampton
boat
1
10
11
12
13
13 15
13 15 B
14
15
...

2.3 Descriptive Statistics
for the titanic3 dataset

# Set graphics type so that Hmisc and rms packages use plotly
# Chunk header height=150 is in pixels
# For certain print methods set to use html
options(grType='plotly', prType='html')
s <- summaryM(age + pclass ~ sex, data=titanic3)
html(s)
Descriptive Statistics (N=1309).
N
female
N=466
male
N=843
Age
Year
1046 19 27 38 21 28 39
pclass : 1st 1309 0.31 (144) 0.21 (179)
  2nd 0.23 (106) 0.20 (171)
  3rd 0.46 (216) 0.58 (493)
a b c represent the lower quartile a, the median b, and the upper quartile c for continuous variables.   N is the number of non-missing values. Numbers after proportions are frequencies.
plot(s)
$Categorical

$Continuous
d <- describe(titanic3)
plot(d)
$Categorical

$Continuous

The following doesn’t work because it overlays two different legends

# Try combining two plots into one
p <- plot(d)
plotly::subplot(p[[1]], p[[2]],
                nrows=2, heights=c(.3, .7), which_layout=1)

3 Logistic Regression Model

dd <- datadist(titanic3); options(datadist='dd')
f <- lrm(survived ~ rcs(sqrt(age),5) * sex, data=titanic3)
print(f)
Logistic Regression Model
 lrm(formula = survived ~ rcs(sqrt(age), 5) * sex, data = titanic3)
 
Frequencies of Missing Values Due to Each Variable
 survived      age      sex 
        0      263        0 
 
Model Likelihood
Ratio Test
Discrimination
Indexes
Rank Discrim.
Indexes
Obs 1046 LR χ2 352.09 R2 0.386 C 0.805
0 619 d.f. 9 g 1.379 Dxy 0.611
1 427 Pr(>χ2) <0.0001 gr 3.971 γ 0.615
max |∂log L/∂β| 3×10-12 gp 0.291 τa 0.295
Brier 0.164
β S.E. Wald Z Pr(>|Z|)
Intercept   0.6477   0.6867 0.94 0.3456
age   0.0303   0.2384 0.13 0.8989
age'   0.3147   0.9461 0.33 0.7394
age''   -2.4377  14.1220 -0.17 0.8630
age'''   5.2441  28.5120 0.18 0.8541
sex=male   1.3458   0.9518 1.41 0.1574
age × sex=male   -1.0364   0.3313 -3.13 0.0018
age' × sex=male   1.8695   1.2843 1.46 0.1455
age'' × sex=male  -15.1439  18.3677 -0.82 0.4097
age''' × sex=male   17.8230  36.2175 0.49 0.6226
latex(f)
\[{\rm Prob}\{{\rm survived}=1\} = \frac{1}{1+\exp(-X\beta)}, {\rm \ \ where} \\ \] \begin{eqnarray*} X\hat{\beta}= & & \\ & & 0.6476552 \\ & & + 0.03027852 {\rm age}+0.01114664 ({\rm age}-2.236068)_{+}^{3}-0.08633279({\rm age}-4.582576)_{+}^{3} \\ & & +0.1857246 ({\rm age}-5.291503)_{+}^{3}-0.1516536 ({\rm age}-6.082763)_{+}^{3} \\ & & +0.04111512 ({\rm age}-7.549834)_{+}^{3} \\ & & +1.345826[{\rm male}] \\ & & +[{\rm male}][-1.036388 {\rm age}+0.06620937 ({\rm age}-2.236068)_{+}^{3} \\ & & -0.5363316 ({\rm age}-4.582576)_{+}^{3}+0.6312123 ({\rm age}-5.291503)_{+}^{3} \\ & & -0.1266968 ({\rm age}-6.082763)_{+}^{3}-0.03439327({\rm age}-7.549834)_{+}^{3} ] \\ \end{eqnarray*} and \([c]=1\) if subject is in group \(c\), 0 otherwise; \((x)_{+}=x\) if \(x > 0\), 0 otherwise
\({\rm age}\) is pre--transformed as \({\rm \sqrt{age}}\).
a <- anova(f)
print(a)
Wald Statistics for survived
χ2 d.f. P
age (Factor+Higher Order Factors) 35.57 8 <0.0001
All Interactions 24.48 4 <0.0001
Nonlinear (Factor+Higher Order Factors) 14.25 6 0.0270
sex (Factor+Higher Order Factors) 266.08 5 <0.0001
All Interactions 24.48 4 <0.0001
age × sex (Factor+Higher Order Factors) 24.48 4 <0.0001
Nonlinear 3.80 3 0.2842
Nonlinear Interaction : f(A,B) vs. AB 3.80 3 0.2842
TOTAL NONLINEAR 14.25 6 0.0270
TOTAL NONLINEAR + INTERACTION 33.25 7 <0.0001
TOTAL 280.99 9 <0.0001
plot(a)
s <- summary(f, age=c(2, 21))
plot(s, log=TRUE)
print(s, dec=2)
Effects   Response: survived
Low High Δ Effect S.E. Lower 0.95 Upper 0.95
age 2 21 19 -2.19 0.43 -3.03 -1.34
Odds Ratio 2 21 19 0.11 0.05 0.26
sex --- female:male 2 1 2.44 0.29 1.86 3.02
Odds Ratio 2 1 11.48 6.45 20.43
ggplot(Predict(f, age, sex), height=500, width=650)  # uses ggplotly()
plotp(Predict(f, age, sex))                          # uses plotly directly
plot(nomogram(f, fun=plogis, funlabel='Prob(survive)'))

4 Survival Plots for pbc Dataset

Hover over the curves to see particular probability estimates and numbers at risk. Click on legend components to show/hide components.

getHdata(pbc)
pbc <- upData(pbc, 
              fu.yrs = fu.days / 365.25,
              units = c(fu.yrs = 'year'))
Input object size:   80952 bytes;    19 variables    418 observations
Added variable      fu.yrs
New object size:    85136 bytes;    20 variables    418 observations
f <- npsurv(Surv(fu.yrs, status) ~ spiders, data=pbc)
survplotp(f, time.inc=1, times=c(5, 10), fun=function(y) 1 - y)

5 Computing Environment

 R version 3.5.1 (2018-07-02)
 Platform: x86_64-pc-linux-gnu (64-bit)
 Running under: Ubuntu 18.10
 
 Matrix products: default
 BLAS: /usr/lib/x86_64-linux-gnu/blas/libblas.so.3.8.0
 LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.8.0
 
 attached base packages:
 [1] stats     graphics  grDevices utils     datasets  methods   base     
 
 other attached packages:
 [1] bindrcpp_0.2.2  rms_5.1-3       SparseM_1.77    Hmisc_4.2-0    
 [5] ggplot2_3.1.0   Formula_1.2-3   survival_2.43-3 lattice_0.20-38
 
To cite R in publication use:

R Core Team (2018). R: A Language and Environment for Statistical Computing. R Foundation for Statistical Computing, Vienna, Austria. https://www.R-project.org/.