18  Parametric Survival Models

18.1 Homogeneous Models (No Predictors)

Why use a parametric model?

  1. easily compute selected quantiles of the survival distribution
  2. estimate (usually by extrapolation) the expected failure time
  3. derive a concise equation and smooth function for estimating \(S(t)\), \(\Lambda(t)\), and \(\lambda(t)\)
  4. estimate \(S(t)\) more precisely than \(S_{\rm KM}(t)\) or \(S_{\Lambda}(t)\) (Altschuler-Nelson-Fleming-Harrington estimator) if the parametric form is correctly specified.

Note: Fitting more than two smooth survival curves and choosing the one that best reproduces the KM estimator will result in a true precision no better than KM due to model uncertainty.

18.1.1 Specific Models

  • Seen exponential and Weibull already
  • Many others obtained by assuming \(\log(T)\) has a certain dist.
  • Log-normal: \(S(t) = 1 - \Phi(\frac{\log(t)-\mu}{\sigma})\)
  • Log-logistic: \(S(t) = [1 + \exp(-\frac{\log(t)-\mu}{\sigma})]^{-1}\)
  • Log-extreme value: \(S(t) = \exp[-\exp(\frac{\log(t)-\mu}{\sigma})]\)
    another way of expressing Weibull

18.1.2 Estimation

  • Log-likelihood for exponential distribution

\[ \log L = \sum_{i:Y_{i} {\rm\ uncensored}}^{n} \log \lambda - \sum_{i=1}^{n} \lambda Y_{i} \]

\[\begin{array}{ccc} \hat{\lambda} &=& n_{u}/w \\ {\rm var}(\hat{\lambda}) &=& n_{u}/w^{2} \\ {\rm var}(\log \hat{\lambda}) &=& 1/n_{u} \\ \hat{\mu} &=& w/n_{u} \\ \hat{S}(t) &=& \exp(-\hat{\lambda}t) \end{array}\]

Consider these failure time data:

\[\begin{array}{c} 1 \ \ 3\ \ 3\ \ 6^{+}\ \ 8^{+}\ \ 9\ \ 10^{+} . \nonumber \end{array}\]
Code
require(rms)
S <- Surv(c(1, 3, 3, 6, 8, 9, 10), c(1,1,1,0,0,1,0))
fe <- psm(S ~ 1, dist='exponential')
f2 <- psm(S ~ 1, dist='weibull')
\[\begin{array}{ccc} n_{u} &=& 4 \\ w &=& 40 \\ \hat{\mu} &=& 10 \pm 5 \\ T_{0.5} &=& 10 \log(2) \end{array}\]
  • Weibull fit
\[\begin{array}{ccc} \hat{\alpha} &=& 0.0728 \nonumber \\ \hat{\gamma} &=& 1.164 \nonumber \\ \hat{S}(t) &=& \exp(-0.0728t^{1.164}) \\ \hat{S}^{-1}(0.5) &=& [(\log 2)/\hat{\alpha}]^{1/\hat{\gamma}} = 6.935 \\ {\rm (estimated\ median)} \nonumber \end{array}\]

18.1.3 Assessment of Model Fit

  • Example: Weibull

\[ \log[-\log S(t)] = \log \Lambda(t) = \log \alpha +\gamma (\log t) \]

  • Plot \(\log \hat{\Lambda}(t)\) versus \(\log t\)
  • For assumed dist. \(S(t)\) plot \(S^{-1}[S_{\Lambda}(t)]\) or \(S^{-1}[S_{\rm KM}(t)]\) against \(t\), check for linearity
  • Log-distributions: plot vs. \(\log t\)
  • Check log-normal: plot \(\Phi^{-1}[S_{\Lambda}(t)]\) vs. \(\log t\)
  • Check log-logistic: plot \({\rm logit}[S_{\Lambda}(t)]\) vs. \(\log t\)
  • Alternative: plot fitted \(\hat{S}(t)\) and \(S_{\Lambda}(t)\) vs. \(t\) on the same graph

18.2 Parametric Proportional Hazards Models

18.2.1 Model

\[ \lambda(t|X) = \lambda(t) \exp(X\beta) \]

\[\begin{array}{ccc} \Lambda(t|X) &=& \Lambda(t) \exp(X\beta) \nonumber \\ S(t|X) &=& \exp[-\Lambda(t)\exp(X\beta)] = \exp[-\Lambda(t)]^{\exp(X\beta)} \end{array}\]

\[ S(t|X) = S(t)^{\exp(X\beta)} , \]

18.2.2 Model Assumptions and Interpretation of Parameters

\[\begin{array}{ccc} \log \lambda(t|X) &=& \log \lambda(t)+X\beta \nonumber \\ \log \Lambda(t|X) &=& \log \Lambda(t)+X\beta \end{array}\]

Assumptions:

  • Underlying functions (\(\lambda\), \(\Lambda\), \(S\))
  • Linear effect of predictors on \(\log\lambda\), \(\log\Lambda\)
  • No interaction between \(X\) and \(t\) \(\rightarrow\) impact same over time
\[\begin{array}{ccc} \beta_{j} &=& \log \lambda(t|X_{1}, X_{2}, \ldots, X_{j}+1, X_{j+1}, \ldots, X_{k}) \\ &-& \log \lambda(t|X_{1}, \ldots, X_{j}, \ldots, X_{k}) , \end{array}\]
  • Effect of increasing \(X_{j}\) by \(d\) is to increase \(\lambda\) by factor of \(\exp(\beta_{j}d)\)
  • One binary predictor:
\[\begin{array}{ccc} \lambda(t|X_{1}=0) &=& \lambda(t) \nonumber \\ \lambda(t|X_{1}=1) &=& \lambda(t)\exp(\beta_{1}) \end{array}\]

Here \(\exp(\beta_{1})\) is the \(X_{1}=1:X_{1}=0\) hazard ratio.

  • One continuous predictor:

\[ \lambda(t|X_{1}) = \lambda(t)\exp(\beta_{1}X) \]

18.2.3 Hazard Ratio, Risk Ratio, and Risk Difference

\(S_{T}=S_{C}^{0.5}\)

Mortality differences and ratios when hazard ratio is 0.5.
Code
spar(bty='l')
plot(0, 0, type="n", xlab="Survival for Control Subject",
     ylab="Improvement in Survival",
     xlim=c(0,1), ylim=c(0,.7))
i <- 0
hr <- seq(.1, .9, by=.1)
for(h in hr) {
  i <- i + 1
  p <- seq(.0001, .9999, length=200)
  p2 <- p^h
  d <- p2 - p
  lines(p, d, lty=i)
  maxd <- max(d)
  smax <- p[d==maxd]
  text(smax,maxd+.02, format(h), cex=.6)
}
Figure 18.1: Absolute clinical benefit as a function of survival in a control subject and the relative benefit (hazard ratio). The hazard ratios are given for each curve.

18.2.4 Specific Models

  • Exponential:
\[\begin{array}{ccc} \lambda(t|X) &=& \lambda \exp(X\beta) \nonumber \\ S(t|X) &=& \exp[-\lambda t \exp(X\beta)] = \exp(-\lambda t)^{\exp(X\beta)} \end{array}\] \[\begin{array}{ccc} E\{T|X\} &=& 1/[\lambda \exp(X\beta)] \nonumber \\ T_{0.5}|X &=& (\log 2)/[\lambda \exp(X\beta)] \end{array}\]
  • Weibull:
\[\begin{array}{ccc} \lambda(t|X) &=& \alpha\gamma t^{\gamma-1} \exp(X\beta) \nonumber \\ \Lambda(t|X) &=& \alpha t^{\gamma} \exp(X\beta) \nonumber \nonumber \\ S(t|X) &=& \exp[-\alpha t^{\gamma} \exp(X\beta)] \\ &=& [\exp(-\alpha t^{\gamma})]^{\exp(X\beta)} \nonumber \end{array}\]

\[ T_{0.5}|X=\{\log 2 / [\alpha \exp(X\beta)]\}^{1/\gamma} . \]

For numerical reasons, re-write:

\[\begin{array}{ccc} S(t|X) &=& \exp(-\Lambda(t|X)) , {\rm\ \ \ where} \nonumber \\ \Lambda(t|X) &=& \exp(\gamma \log t + X\beta) \end{array} \tag{18.1}\]

See also spline hazard models Kooperberg et al. (1995) and the generalized gamma distribution (Cox et al., 2007).

18.2.5 Assessment of Model Fit

Figure 18.2: PH Model with one binary predictor. \(Y\)-axis is \(\log \lambda(t)\) or \(\log \Lambda(t)\). For \(\log \Lambda(t)\), the curves must be non-decreasing. For \(\log \lambda(t)\), they may be any shape.

If \(\lambda(t)\) is Weibull, the two curves will be linear if \(\log t\) is plotted instead of \(t\) on the \(x\)-axis.

Figure 18.3: PH model with one continuous predictor. \(Y\)-axis is \(\log \lambda(t)\) or \(\log \Lambda(t)\). For \(\log \Lambda(t)\), drawn for \(t_{2}>t_{1}\). The slope of each line is \(\beta_{1}\).
Figure 18.4: PH model with one continuous predictor. \(Y\)-axis is \(\log \lambda(t)\) or \(\log \Lambda(t)\). For \(\log \lambda\), the functions need not be monotonic.
Figure 18.5: Regression assumptions, linear additive PH or AFT model with two predictors. For PH, \(Y\)-axis is \(\log \lambda(t)\) or \(\log \Lambda(t)\) for a fixed \(t\). For AFT, \(Y\)-axis is \(\log(T)\).
  • Weibull: Stratify on \(X\), plot \(\log \Lambda_{\rm KM}(t|X {\rm\ stratum})\) vs. \(\log t\).
  • Assesses PH in addition to shape assumptions–all curves should be parallel as well as straight.

18.3 Accelerated Failure Time Models

18.3.1 Model

  • Specifies that predictors act multiplicatively on failure time
  • Alters rate subject proceeds along time axis

\[S(t|X) = \psi(\frac{\log(t)-X\beta}{\sigma}) \tag{18.2}\]

\[\begin{array}{ccc} \frac{\log(T)-X\beta}{\sigma} &\sim& \psi \\ \log(T) = X\beta + \sigma\epsilon \\ \epsilon \sim \psi \end{array}\]
  • Weibull (and exponential) members of PH and AFT

18.3.2 Model Assumptions and Interpretation of Parameters

\[ \psi^{-1}(S(t|X)) = \frac{\log(t)-X\beta}{\sigma} \tag{18.3}\]

Letting \(\epsilon \sim \psi\)

\[ \log(T) = X\beta + \sigma\epsilon \]

Check that residuals \(\log(T)-X\hat{\beta} \sim \psi\) (within scale factor). The assumptions of the AFT model are thus

  1. The true form of \(\psi\) (the distributional family) is correctly specified.
  2. In the absence of nonlinear and interaction terms, each \(X_j\) affects \(\log(T)\) or \(\psi^{-1}(S(t|X))\) linearly.
  3. Implicit in these assumptions is that \(\sigma\) is a constant independent of \(X\).

1-unit change in \(X_{j} = \beta_{j}\) change in \(\log T\), or increase \(T\) by factor of \(\exp(\beta_{j})\).
Median survival time:

\[ T_{0.5}|X = \exp(X\beta + \sigma \psi^{-1}(0.5)) \]

18.3.3 Specific Models

  • Extreme value: \(\psi(u)=\exp(-\exp(u))\)
  • Logistic: \(\psi(u)=[1+\exp(u)]^{-1}\)
  • Normal: \(\psi(u)=1 - \Phi(u)\)
  • Log-normal:

\[ S(t|X) = 1 - \Phi(\frac{\log(t)-X\beta}{\sigma}), \]

  • Log-logistic:

\[ S(t|X) = [1 + \exp(\frac{\log(t) - X\beta}{\sigma})]^{-1}. \]

18.3.4 Estimation

Works better if \(\sigma\) parameterized as \(\exp(\delta)\).

\[\begin{array}{ccc} \hat{S}(t|X) &=& \psi(\frac{\log(t)-X\hat{\beta}}{\hat{\sigma}}) \nonumber \\ \hat{T}_{0.5}|X &=& \exp[X\hat{\beta} + \hat{\sigma} \psi^{-1}(0.5)]. \end{array}\]

Normal and logistic: \(\hat{T}_{0.5}|X = \exp(X\hat{\beta})\)

\[ \psi(\frac{\log(t)-X\hat{\beta}}{\hat{\sigma}}\pm z_{1-\alpha/2}\times s) \]

18.3.5 Residuals

For an AFT model, standardized residuals are simply

\[ r = (\log(T)-X\hat{\beta})/\sigma \tag{18.4}\]

When \(T\) is right-censored, \(r\) is right-censored.

18.3.6 Assessment of Model Fit

Figure 18.6: AFT model with one predictor. \(Y\)-axis is \(\psi^{-1}(S(t|X)) = \frac{\log(t)-X\beta}{\sigma}\). Drawn for \(d>c\). The slope of the lines is \(\sigma^{-1}\).
Figure 18.7: AFT model with one continuous predictor. \(Y\)-axis is \(\psi^{-1}(S(t|X)) = \frac{\log(t)-X\beta}{\sigma}\). Drawn for \(t_{2}>t_{1}\). The slope of each line is \(\beta_{1}/\sigma\) and the difference between the lines is \(\frac{1}{\sigma}\log(t_{2}/t_{1})\).
Group 1 143 164 188 188 190 192 206 209 213 216
220 227 230 234 246 265 304 216\(^{+}\) 244\(^{+}\)
Group 2 142 156 163 198 205 232 232 233 233 233
233 239 240 261 280 280 296 296 323 204\(^{+}\)
344\(^{+}\)
Code
spar(mfrow=c(2,2), top=1, bot=2, mgp=c(2.75, .365, 0))
getHdata(kprats)
kprats$group <- factor(kprats$group, 0:1, c('Group 1', 'Group 2'))
dd <- datadist(kprats); options(datadist="dd")
S <- with(kprats, Surv(t, death))
f <- npsurv(S ~ group, type="fleming", data=kprats)
survplot(f, n.risk=TRUE, conf='none',   
         label.curves=list(keys='lines'), levels.only=TRUE)
title(sub="Nonparametric estimates", adj=0, cex=.7)
# Check fits of Weibull, log-logistic, log-normal
xl <- c(4.8, 5.9)
survplot(f, loglog=TRUE, logt=TRUE, conf="none", xlim=xl,
         label.curves=list(keys='lines'), levels.only=TRUE)
title(sub="Weibull (extreme value)", adj=0, cex=.7)
survplot(f, fun=function(y)log(y/(1-y)), ylab="logit S(t)",
         logt=TRUE, conf="none", xlim=xl,
         label.curves=list(keys='lines'), levels.only=TRUE)
title(sub="Log-logistic", adj=0, cex=.7)
survplot(f, fun=qnorm, ylab="Inverse Normal S(t)",
         logt=TRUE, conf="none",
         xlim=xl,cex.label=.7,
         label.curves=list(keys='lines'), levels.only=TRUE)
title(sub="Log-normal", adj=0, cex=.7)
Figure 18.8: Altschuler-Nelson-Fleming-Harrington nonparametric survival estimates for rats treated with DMBA (Pike, 1966), along with various transformations of the estimates for checking distributional assumptions of 3 parametric survival models.

Fit Weibull (in aft form), log-logistic, and log-normal models.

Code
fw <- psm(S ~ group, data=kprats, dist='weibull')
fl <- psm(S ~ group, data=kprats, dist='loglogistic',
          y=TRUE)
fn <- psm(S ~ group, data=kprats, dist='lognormal')
bld <- function(x) knitr::asis_output(paste0('**', x, '** :\n\n'))
bld('Weibull default form')

Weibull default form :

Code
latex(fw)
\[\Pr(T\geq t) = \exp[-\exp( \frac{\log(t)-X\beta}{0.1832976} )]~\mathrm{where}\] \[\begin{array} \lefteqn{X\hat{\beta}=}\\ & & 5.450859 \\ & & +0.131983[\mathrm{Group\ 2}] \\ \end{array}\]

\[[c]=1~\mathrm{if~subject~is~in~group}~c,~0~\mathrm{otherwise}\]

Code
bld('Weibull PH form')

Weibull PH form :

Code
latex(pphsm(fw))
\[\Pr(T\geq t) = \exp(-t^{ 5.455608 } \exp(X\hat{\beta}))~\mathrm{where}~~\] \[\begin{array} \lefteqn{X\hat{\beta}=}\\ & & -29.73775 \\ & & -0.7200475[\mathrm{Group\ 2}] \\ \end{array}\]

\[[c]=1~\mathrm{if~subject~is~in~group}~c,~0~\mathrm{otherwise}\]

Code
bld('Log-logistic')

Log-logistic :

Code
latex(fl)
\[\Pr(T\geq t) = [1+\exp( \frac{\log(t)-X\beta}{0.1159753} )]^{-1}~\mathrm{where}\] \[\begin{array} \lefteqn{X\hat{\beta}=}\\ & & 5.375675 \\ & & +0.1051005[\mathrm{Group\ 2}] \\ \end{array}\]

\[[c]=1~\mathrm{if~subject~is~in~group}~c,~0~\mathrm{otherwise}\]

Code
bld('Log-normal')

Log-normal :

Code
latex(fn)
\[\Pr(T\geq t) = 1-\Phi( \frac{\log(t)-X\beta}{0.2100184} )~\mathrm{where}\] \[\begin{array} \lefteqn{X\hat{\beta}=}\\ & & 5.375328 \\ & & +0.0930606[\mathrm{Group\ 2}] \\ \end{array}\]

\[[c]=1~\mathrm{if~subject~is~in~group}~c,~0~\mathrm{otherwise}\]

Group effects from three survival models
  • More general approach to verifying distributional assumptions:
  • Plot nonparametric estimate of survival distribution of \(r\)
  • Superimpose theoretical standardized distribution
  • Can get distribution of residuals separately by strata — should all have same standardized distribution (e.g., same \(\sigma\))
Code
survplot(f, conf.int=FALSE,   
         levels.only=TRUE, label.curves=list(keys='lines'))
survplot(fl, add=TRUE, label.curves=FALSE, conf.int=FALSE)
Figure 18.9: Agreement between fitted log-logistic model and nonparametric survival estimates for rat vaginal cancer data
Code
r <- resid(fl, 'cens')
survplot(npsurv(r ~ group, data=kprats),
         conf='none', xlab='Residual',
         label.curves=list(keys='lines'), levels.only=TRUE)
survplot(npsurv(r ~ 1), conf='none', add=TRUE, col='red')
lines(r, lwd=1, col='blue')   
Figure 18.10: Kaplan-Meier estimates of distribution of standardized, censored residuals from the log-logistic model, along with the assumed standard log-logistic distribution (blue). Red step function is the estimated distribution of all residuals; black step functions are the estimated distributions of residuals stratified by group, as indicated.

Derive R code for median, mean, hazard, survival functions

Code
med   <- Quantile(fl)
med
function (q = 0.5, lp = NULL, parms = -2.15437773933124) 
{
    names(parms) <- NULL
    f <- function(lp, q, parms) lp + exp(parms) * logb(q/(1 - 
        q))
    names(q) <- format(q)
    drop(exp(outer(lp, q, FUN = f, parms = parms)))
}
<environment: namespace:rms>
Code
meant <- Mean(fl)
meant
function (lp = NULL, parms = -2.15437773933124) 
{
    names(parms) <- NULL
    if (exp(parms) > 1) 
        rep(Inf, length(lp))
    else exp(lp) * pi * exp(parms)/sin(pi * exp(parms))
}
<environment: namespace:rms>
Code
haz   <- Hazard(fl)
haz
function (times = NA, lp = NULL, parms = -2.15437773933124) 
{
    t.trans <- logb(times)
    t.deriv <- 1/times
    scale <- exp(parms)
    names(t.trans) <- format(times)
    t.deriv/scale/(1 + exp(-(t.trans - lp)/scale))
}
<environment: namespace:rms>
Code
surv  <- Survival(fl)
surv
function (times = NULL, lp = NULL, parms = -2.15437773933124) 
{
    1/(1 + exp((logb(times) - lp)/exp(parms)))
}
<environment: namespace:rms>

Show fitted hazard function from log-logistic, and add median survival time to graph

Code
spar(ps=9,top=1,bot=1,left=1,mgp=c(2.75,.365,0))
# Plot estimated hazard functions and add median
# survival times to graph
survplot(fl, group, what="hazard")   
# Compute median survival time
m <- med(lp=predict(fl,
           data.frame(group=levels(kprats$group))))
m
       1        2 
216.0857 240.0328 
Code
med(lp=range(fl$linear.predictors))
[1] 216.0857 240.0328
Code
m <- format(m, digits=3)
text(68, .02, paste("Group 1 median: ", m[1],"\n",
                    "Group 2 median: ", m[2], sep=""))
# Compute survival probability at 210 days
xbeta <- predict(fl,
                 data.frame(group=c("Group 1","Group 2")))
surv(210, xbeta)
        1         2 
0.5612718 0.7599776 
Figure 18.11: Estimated hazard functions for log-logistic fit to rat vaginal cancer data, along with median survival times

18.3.7 Validating the Fitted Model

  • Check distributional shape
  • Group predicted \(t\)-year survival and plot Kaplan-Meier estimate at \(t\) vs. mean predicted \(\hat{S}\)
  • Cox-Snell residuals — check against \(U[0,1]\)
  • loess smooth of \(F(T | X) - 0.5 F(C | X)\) against \(X \hat{\beta}\) or \(\frac{2 F(T | X)}{F(C | X)}\) vs. \(X \hat{\beta}\) if \(C\) is known

See the val.surv function in the rms package.

18.4 R Functions