1 Introduction

1.1 Markdown

This is an R Markdown html document using the template that is here. Markdown is a simple formatting syntax for authoring HTML, PDF, and MS Word documents. For more details on using R Markdown see http://rmarkdown.rstudio.com.

require(rms)

knitrSet(lang='markdown')

The following (r hidingTOC(buttonLabel=“Outline”)) uses the Hmisc hidingTOC function to define HTML styles related to a floating table of contents that can be minimized or be collapsed to major outline levels. For more details see this.

2 Data

2.1 Setup

getHdata(titanic3)  # Get the dataset from the VU DataSets page
mu <- markupSpecs$html   # markupSpecs is in Hmisc
subtext <- mu$subtext
code    <- mu$code

2.2 Data Dictionary

html(contents(titanic3), maxlevels=10, levelType='table')

Data frame:titanic3

1309 observations and 14 variables, maximum # NAs:1188

Name	Labels	Units	Levels	Storage	NAs
pclass			3	integer	0
survived	Survived			integer	0
name	Name			character	0
sex			2	integer	0
age	Age	Year		double	263
sibsp	Number of Siblings/Spouses Aboard			integer	0
parch	Number of Parents/Children Aboard			integer	0
ticket	Ticket Number			character	0
fare	Passenger Fare	British Pound (\243)		double	1
cabin			187	integer	0
embarked			3	integer	2
boat			28	integer	0
body	Body Identification Number			integer	1188
home.dest	Home/Destination			character	0

Variable	Levels
pclass	1st
	2nd
	3rd
sex	female
	male
cabin
	A10
	A11
	A14
	A16
	A18
	A19
	A20
	A21
	A23
	...
embarked	Cherbourg
	Queenstown
	Southampton
boat
	1
	10
	11
	12
	13
	13 15
	13 15 B
	14
	15
	...

2.3 Descriptive Statistics
for the `titanic3` dataset

# Set graphics type so that Hmisc and rms packages use plotly
# Chunk header height=150 is in pixels
# For certain print methods set to use html
options(grType='plotly', prType='html')
s <- summaryM(age + pclass ~ sex, data=titanic3)
html(s)

	N	female N=466	male N=843
Descriptive Statistics (N=1309).
Age Year	1046	19 27 38	21 28 39
pclass : 1st	1309	0.31 (144)	0.21 (179)
2nd		0.23 (106)	0.20 (171)
3rd		0.46 (216)	0.58 (493)
a b c represent the lower quartile a, the median b, and the upper quartile c for continuous variables. N is the number of non-missing values. Numbers after proportions are frequencies.

plot(s)

$Categorical

$Continuous

d <- describe(titanic3)
plot(d)

$Categorical


$Continuous

The following doesn’t work because it overlays two different legends

# Try combining two plots into one
p <- plot(d)
plotly::subplot(p[[1]], p[[2]],
                nrows=2, heights=c(.3, .7), which_layout=1)

3 Logistic Regression Model

dd <- datadist(titanic3); options(datadist='dd')
f <- lrm(survived ~ rcs(sqrt(age),5) * sex, data=titanic3)
print(f)

Logistic Regression Model

 lrm(formula = survived ~ rcs(sqrt(age), 5) * sex, data = titanic3)

Frequencies of Missing Values Due to Each Variable

 survived      age      sex 
        0      263        0

	Model Likelihood Ratio Test	Discrimination Indexes	Rank Discrim. Indexes
Obs 1046	LR χ² 352.09	R² 0.386	C 0.805
0 619	d.f. 9	g 1.379	D_xy 0.611
1 427	Pr(>χ²) <0.0001	g_r 3.971	γ 0.615
max \|∂log L/∂β\| 3×10^-12		g_p 0.291	τ_a 0.295
		Brier 0.164

	β	S.E.	Wald Z	Pr(>\|Z\|)
Intercept	0.6477	0.6867	0.94	0.3456
age	0.0303	0.2384	0.13	0.8989
age'	0.3147	0.9461	0.33	0.7394
age''	-2.4377	14.1220	-0.17	0.8630
age'''	5.2441	28.5120	0.18	0.8541
sex=male	1.3458	0.9518	1.41	0.1574
age × sex=male	-1.0364	0.3313	-3.13	0.0018
age' × sex=male	1.8695	1.2843	1.46	0.1455
age'' × sex=male	-15.1439	18.3677	-0.82	0.4097
age''' × sex=male	17.8230	36.2175	0.49	0.6226

latex(f)

\[{\rm Prob}\{{\rm survived}=1\} = \frac{1}{1+\exp(-X\beta)}, {\rm \ \ where} \\ \] \begin{eqnarray*} X\hat{\beta}= & & \\ & & 0.6476552 \\ & & + 0.03027852 {\rm age}+0.01114664 ({\rm age}-2.236068)_{+}^{3}-0.08633279({\rm age}-4.582576)_{+}^{3} \\ & & +0.1857246 ({\rm age}-5.291503)_{+}^{3}-0.1516536 ({\rm age}-6.082763)_{+}^{3} \\ & & +0.04111512 ({\rm age}-7.549834)_{+}^{3} \\ & & +1.345826[{\rm male}] \\ & & +[{\rm male}][-1.036388 {\rm age}+0.06620937 ({\rm age}-2.236068)_{+}^{3} \\ & & -0.5363316 ({\rm age}-4.582576)_{+}^{3}+0.6312123 ({\rm age}-5.291503)_{+}^{3} \\ & & -0.1266968 ({\rm age}-6.082763)_{+}^{3}-0.03439327({\rm age}-7.549834)_{+}^{3} ] \\ \end{eqnarray*} and \([c]=1\) if subject is in group \(c\), 0 otherwise; \((x)_{+}=x\) if \(x > 0\), 0 otherwise
\({\rm age}\) is pre--transformed as \({\rm \sqrt{age}}\).

a <- anova(f)
print(a)

	χ²	d.f.	P
Wald Statistics for `survived`
age (Factor+Higher Order Factors)	35.57	8	<0.0001
All Interactions	24.48	4	<0.0001
Nonlinear (Factor+Higher Order Factors)	14.25	6	0.0270
sex (Factor+Higher Order Factors)	266.08	5	<0.0001
All Interactions	24.48	4	<0.0001
age × sex (Factor+Higher Order Factors)	24.48	4	<0.0001
Nonlinear	3.80	3	0.2842
Nonlinear Interaction : f(A,B) vs. AB	3.80	3	0.2842
TOTAL NONLINEAR	14.25	6	0.0270
TOTAL NONLINEAR + INTERACTION	33.25	7	<0.0001
TOTAL	280.99	9	<0.0001

plot(a)

s <- summary(f, age=c(2, 21))
plot(s, log=TRUE)

print(s, dec=2)

	Low	High	Δ	Effect	S.E.	Lower 0.95	Upper 0.95
Effects Response: `survived`
age	2	21	19	-2.19	0.43	-3.03	-1.34
Odds Ratio	2	21	19	0.11		0.05	0.26
sex --- female:male	2	1		2.44	0.29	1.86	3.02
Odds Ratio	2	1		11.48		6.45	20.43

ggplot(Predict(f, age, sex), height=500, width=650)  # uses ggplotly()

plotp(Predict(f, age, sex))                          # uses plotly directly

plot(nomogram(f, fun=plogis, funlabel='Prob(survive)'))

4 Survival Plots for `pbc` Dataset

Hover over the curves to see particular probability estimates and numbers at risk. Click on legend components to show/hide components.

getHdata(pbc)
pbc <- upData(pbc, 
              fu.yrs = fu.days / 365.25,
              units = c(fu.yrs = 'year'))

Input object size:   80952 bytes;    19 variables    418 observations
Added variable      fu.yrs
New object size:    85136 bytes;    20 variables    418 observations

f <- npsurv(Surv(fu.yrs, status) ~ spiders, data=pbc)
survplotp(f, time.inc=1, times=c(5, 10), fun=function(y) 1 - y)

5 Computing Environment

 R version 3.5.1 (2018-07-02)
 Platform: x86_64-pc-linux-gnu (64-bit)
 Running under: Ubuntu 18.10
 
 Matrix products: default
 BLAS: /usr/lib/x86_64-linux-gnu/blas/libblas.so.3.8.0
 LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.8.0
 
 attached base packages:
 [1] stats     graphics  grDevices utils     datasets  methods   base     
 
 other attached packages:
 [1] bindrcpp_0.2.2  rms_5.1-3       SparseM_1.77    Hmisc_4.2-0    
 [5] ggplot2_3.1.0   Formula_1.2-3   survival_2.43-3 lattice_0.20-38

To cite R in publication use:

R Core Team (2018). R: A Language and Environment for Statistical Computing. R Foundation for Statistical Computing, Vienna, Austria. https://www.R-project.org/.

Examples for rms Package

FE Harrell

2018-12-19

1 Introduction

1.1 Markdown

2 Data

2.1 Setup

2.2 Data Dictionary

Data frame:titanic3

2.3 Descriptive Statistics
for the `titanic3` dataset

3 Logistic Regression Model

4 Survival Plots for `pbc` Dataset

5 Computing Environment

Examples for rms Package

FE Harrell

2018-12-19

1 Introduction

1.1 Markdown

2 Data

2.1 Setup

2.2 Data Dictionary

Data frame:titanic3

2.3 Descriptive Statisticsfor the titanic3 dataset

3 Logistic Regression Model

4 Survival Plots for pbc Dataset

5 Computing Environment

2.3 Descriptive Statistics
for the `titanic3` dataset

4 Survival Plots for `pbc` Dataset