Figure 21.1: Raw and spline-smoothed scaled Schoenfeld residuals for dose of estrogen, nonlinearly coded from the Cox model fit, with \(\pm\) 2 standard errors.
None of the effects significantly change over time
Global test of PH \(P=0.52\)
21.3 Testing Interactions
Will ignore non-PH for dose even though it makes sense
More accurate predictions could be obtained using stratification or time dep. cov.
Test all interactions with dose Reduce to 1 d.f. as before
Code
z.dose <- z[,"rx"] # same as saying z[,1] - get first columnz.other <- z[,-1] # all but the first column of zf.ia <-cph(S ~ z.dose * z.other, x=TRUE, y=TRUE)anova(f.ia, test='LR')
Likelihood Ratio Statistics for S
χ2
d.f.
P
z.dose (Factor+Higher Order Factors)
20.65
11
0.0371
All Interactions
11.89
10
0.2926
z.other (Factor+Higher Order Factors)
121.19
20
<0.0001
All Interactions
11.89
10
0.2926
z.dose × z.other (Factor+Higher Order Factors)
11.89
10
0.2926
TOTAL
130.26
21
<0.0001
21.4 Describing Predictor Effects
G
Plot relationship between each predictor and \(\log \lambda\)
Figure 21.2: Shape of each predictor on log hazard of death. \(Y\)-axis shows \(X\hat{\beta}\), but the predictors not plotted are set to reference values. Note the highly non-monotonic relationship with ap, and the increased slope after age 70 which has been found in outcome models for various diseases.
21.5 Validating the Model
H
Validate for \(D_{xy}\) and slope shrinkage
Code
set.seed(1) # so can reproduce resultsv <-validate(f, B=300)v
Index
Original
Sample
Training
Sample
Test
Sample
Optimism
Corrected
Index
Lower
Upper
Successful
Resamples
Dxy
0.3208
0.3494
0.2953
0.0541
0.2667
0.2365
0.3562
300
R2
0.2101
0.2481
0.1756
0.0724
0.1377
0.1203
0.2434
300
Slope
1
1
0.7863
0.2137
0.7863
0.6662
0.9321
300
D
0.0292
0.0354
0.0239
0.0116
0.0176
0.0144
0.0359
300
U
-5e-04
-5e-04
0.0024
-0.0029
0.0024
0.0023
0.0024
300
Q
0.0297
0.0359
0.0215
0.0144
0.0153
0.012
0.0336
300
g
0.7174
0.7999
0.629
0.1708
0.5466
0.4998
0.7959
300
Shrinkage surprisingly close to heuristic estimate of 0.79
Now validate 5-year survival probability estimates
Code
cal <-calibrate(f, B=300, u=5*12, maxdim=3)
Using Cox survival estimates at 60 Months
Code
plot(cal)
Figure 21.3: Bootstrap estimate of calibration accuracy for 5-year estimates from the final Cox model, using adaptive linear spline hazard regression. Line nearer the ideal line corresponds to apparent predictive accuracy. The blue curve corresponds to bootstrap-corrected estimates.
```{r include=FALSE}require(Hmisc)options(qproject='rms', prType='html')require(qreport)getRs('qbookfun.r')hookaddcap()knitr::set_alias(w ='fig.width', h ='fig.height', cap ='fig.cap', scap ='fig.scap')```# Case Study in Cox Regression {#sec-coxcase}Note that all the analyses presented here may be done in a more general context - see @sec-ordsurv## Choosing the Number of Parameters and Fitting the Model`r mrg(sound("cox-case-1"))`* Clinical trial of estrogen for prostate cancer* Response is time to death, all causes* Base analysis on Cox proportional hazards model [@cox72]* $S(t | X)$ = probability of surviving at least to time $t$ given set of predictor values $X$* $S(t | X) = S_{0}(t)^{\exp(X\beta)}$* Censor time to death at time of last follow-up for patients still alive at end of study (treat survival time for pt.\ censored at 24m as 24m+)* Use simple, partial approaches to data reduction* Use `transcan` for single imputation* Again combine last 2 categories for `ekg,pf`* See if we can use a full additive model (4 knots for continuous $X$)`r ipacue()`| Predictor | Name | d.f. | Original Levels ||---|---|---|---|Dose of estrogen | `rx` | 3 | placebo, 0.2, 1.0, 5.0 mg estrogen |Age in years | `age` | 3 | |Weight index: wt(kg)-ht(cm)+200 | `wt` | 3 | |Performance rating | `pf` | 2 | normal, in bed <50% of time, in bed >50%, in bed always |History of cardiovascular disease | `hx` | 1 |present/absent |Systolic blood pressure/10 | `sbp` | 3 | |Diastolic blood pressure/10 | `dbp` | 3 | |Electrocardiogram code | `ekg` | 5 | normal, benign, rhythm disturb., block, strain, old myocardial infarction, new MI |Serum hemoglobin (g/100ml) | `hg` | 3 | |Tumor size (cm$^2$) | `sz` | 3 | |Stage/histologic grade combination | `sg` | 3 | |Serum prostatic acid phosphatase | `ap` | 3 | |Bone metastasis | `bm` | 1 | present/absent |* Total of 36 candidate d.f.* Impute missings and estimate shrinkage `r ipacue()````{r}require(rms)options(prType='html') # for print, summary, anova, validategetHdata(prostate)levels(prostate$ekg)[levels(prostate$ekg) %in%c('old MI','recent MI')] <-'MI'# combines last 2 levels and uses a new name, MIprostate$pf.coded <-as.integer(prostate$pf)# save original pf, re-code to 1-4levels(prostate$pf) <-c(levels(prostate$pf)[1:3],levels(prostate$pf)[3])# combine last 2 levelsw <-transcan(~ sz + sg + ap + sbp + dbp + age + wt + hg + ekg + pf + bm + hx,imputed=TRUE, data=prostate, pl=FALSE, pr=FALSE)attach(prostate)sz <-impute(w, sz, data=prostate)sg <-impute(w, sg, data=prostate)age <-impute(w, age,data=prostate)wt <-impute(w, wt, data=prostate)ekg <-impute(w, ekg,data=prostate)dd <-datadist(prostate)options(datadist='dd')units(dtime) <-'Month'S <-Surv(dtime, status!='alive')f <-cph(S ~ rx +rcs(age,4) +rcs(wt,4) + pf + hx +rcs(sbp,4) +rcs(dbp,4) + ekg +rcs(hg,4) +rcs(sg,4) +rcs(sz,4) +rcs(log(ap),4) + bm)print(f, coefs=FALSE)```* Global LR $\chi^2$ is 135 and very significant $\rightarrow$ modeling warranted `r ipacue()`* AIC on $\chi^2$ scale = $136.2 - 2 \times 36 = 64.2$* Rough shrinkage: 0.74 ($\frac{136.2 - 36}{136.2}$)* Informal data reduction (increase for `ap`) `r ipacue()`| Variables | Reductions | d.f. Saved ||-----------|------------|------------||`wt`| Assume variable not important enough for 4 knots; use 3 | 1 ||`pf`| Assume linearity | 1 ||`hx,ekg`| Make new 0,1,2 variable and assume linearity: 2=`hx` and `ekg` not normal or benign, 1=either, 0=none | 5 ||`sbp,dbp`| Combine into mean arterial bp and use 3 knots: map=$\frac{2}{3}$ `dbp` $+ \frac{1}{3}$ `sbp`| 4 ||`sg`| Use 3 knots | 1 ||`sz`| Use 3 knots | 1 ||`ap`| Look at shape of effect of `ap` in detail, and take log before expanding as spline to achieve stability: add 1 knot | -1 |: Data reduction strategy```{r}heart <- hx + ekg %nin%c('normal','benign')label(heart) <-'Heart Disease Code'map <- (2*dbp + sbp)/3label(map) <-'Mean Arterial Pressure/10'dd <-datadist(dd, heart, map)f <-cph(S ~ rx +rcs(age,4) +rcs(wt,3) + pf.coded + heart +rcs(map,3) +rcs(hg,4) +rcs(sg,3) +rcs(sz,3) +rcs(log(ap),5) + bm,x=TRUE, y=TRUE, surv=TRUE, time.inc=5*12)print(f, coefs=FALSE)# x, y for anova LR, predict, validate, calibrate;# surv, time.inc for calibrateanova(f, test='LR')```* Savings of 12 d.f. `r ipacue()`* AIC=70, shrinkage 0.80## Checking Proportional Hazards {#sec-coxcase-check-ph}`r mrg(sound("cox-case-2"))`* This is our tentative model* Examine distributional assumptions using scaled Schoenfeld residuals* Complication arising from predictors using multiple d.f.* Transform to 1 d.f. empirically using $X\hat{\beta}$* `cox.zph` does this automatically* Following analysis approx. since internal coefficients estimated```{r }z <-predict(f, type='terms')# required x=T above to store design matrixf.short <-cph(S ~ z, x=TRUE, y=TRUE)# store raw x, y so can get residuals```* Fit `f.short` has same LR $\chi^2$ of 118 as the fit `f`, `r ipacue()` but with falsely low d.f.* All $\beta=1$```{r rx-ph,cap='Raw and spline-smoothed scaled Schoenfeld residuals for dose of estrogen, nonlinearly coded from the Cox model fit, with $\\pm$ 2 standard errors.',scap='Schoenfeld residuals for dose of estrogen in Cox model'}#| label: fig-coxcase-rx-phrequire(survival) # or use survival::cox.zph(...)phtest <-cox.zph(f, transform='identity')phtestplot(phtest[1]) # plot only the first variable```* None of the effects significantly change over time* Global test of PH $P=0.52$## Testing Interactions`r mrg(sound("cox-case-3"))`* Will ignore non-PH for dose even though it makes sense* More accurate predictions could be obtained using stratification or time dep. cov.* Test all interactions with dose <br> Reduce to 1 d.f. as before```{r}z.dose <- z[,"rx"] # same as saying z[,1] - get first columnz.other <- z[,-1] # all but the first column of zf.ia <-cph(S ~ z.dose * z.other, x=TRUE, y=TRUE)anova(f.ia, test='LR')```## Describing Predictor Effects`r ipacue()`* Plot relationship between each predictor and $\log \lambda$```{r cox-shapes,h=6,w=6.75,cap='Shape of each predictor on log hazard of death. $Y$-axis shows $X\\hat{\\beta}$, but the predictors not plotted are set to reference values. Note the highly non-monotonic relationship with `ap`, and the increased slope after age 70 which has been found in outcome models for various diseases.',scap='Shapes of predictors for log hazard in prostate cancer'}#| label: fig-coxcase-shapesggplot(Predict(f), sepdiscrete='vertical', nlevels=4, vnames='names')```## Validating the Model`r ipacue()` * Validate for $D_{xy}$ and slope shrinkage```{r}set.seed(1) # so can reproduce resultsv <- validate(f, B=300)v```* Shrinkage surprisingly close to heuristic estimate of 0.79* Now validate 5-year survival probability estimates```{r cal-cox,cap='Bootstrap estimate of calibration accuracy for 5-year estimates from the final Cox model, using adaptive linear spline hazard regression. Line nearer the ideal line corresponds to apparent predictive accuracy. The blue curve corresponds to bootstrap-corrected estimates.',scap='Bootstrap estimates of calibration accuracy in prostate cancer model'}#| label: fig-coxcase-calcal <-calibrate(f, B=300, u=5*12, maxdim=3)plot(cal)```## Presenting the Model`r mrg(sound("cox-case-4"))`* Display hazard ratios, overriding default for `ap````{r summary-cox,cap='Hazard ratios and multi-level confidence bars for effects of predictors in model, using default ranges except for `ap`',scap='Hazard ratios for prostate survival model'}#| label: fig-coxcase-summaryspar(top=1)plot(summary(f, ap=c(1,20)), log=TRUE, main='')```* Draw nomogram, with predictions stated 4 ways```{r cox-nomogram,w=6,h=7,cap='Nomogram for predicting death in prostate cancer trial'}#| label: fig-coxcase-nomogramspar(ps=8)surv <- Survival(f)surv3 <- function(x) surv(3*12,lp=x)surv5 <- function(x) surv(5*12,lp=x)quan <- Quantile(f)med <- function(x) quan(lp=x)/12ss <- c(.05,.1,.2,.3,.4,.5,.6,.7,.8,.9,.95)nom <- nomogram(f, ap=c(.1,.5,1,2,3,4,5,10,20,30,40), fun=list(surv3, surv5, med), funlabel=c('3-year Survival','5-year Survival', 'Median Survival Time (years)'), fun.at=list(ss, ss, c(.5,1:6)))plot(nom, xfrac=.65, lmgp=.35)``````{r echo=FALSE}saveCap('21')```