List of Figures

Figure Short Caption
Figure A linear spline function with knots at a = 1, b = 3, c = 5. A linear spline function with knots at \(a = 1, b = 3, c = 5\).
Figure A regular cubic spline function with three levels of continuity that prevent the human eye from detecting the knots. Also shown are the function’s first three derivatives. Knots are located at x=0.25, 0.5, 0.75. For x beyond the outer knots, the function is not restricted to be linear. Linearity would imply an acceleration of zero. Vertical lines are drawn at the knots. Cubic spline function and its derivatives
Figure Restricted cubic spline component variables for k = 5 and knots at X = .05, .275, .5, .725, and .95. Nonlinear basis functions are scaled by \(\tau\). The left panel is a y–magnification of the right panel. Fitted functions such as those in @fig-gen-rcsex will be linear combinations of these basis functions as long as knots are at the same locations used here. Restricted cubic spline component variables for 5 knots
Figure Some typical restricted cubic spline functions for k = 3, 4, 5, 6. The y–axis is \(X\beta\). Arrows indicate knots. These curves were derived by randomly choosing values of \(\beta\) subject to standard deviations of fitted functions being normalized. Some typical restricted cubic spline functions
Figure Regression assumptions for one binary and one continuous predictor Regression assumptions for one binary and one continuous predictor
Figure Logistic regression estimate of probability of a hemorrhagic stroke for patients in the GUSTO-I trial given t-PA, using a tensor spline of two restricted cubic splines and penalization (shrinkage). Dark (cold color) regions are low risk, and bright (hot) regions are higher risk. Probability of hemorrhagic stroke vs. blood pressures
Figure Fitting errors to withstand or to avoid Fitting errors to withstand or to avoid
Figure Sorted means from 20 samples of size 50 from a uniform [0,1] distribution. The reference line at 0.5 depicts the true population value of all of the means. Means from 20 \(U(0,1)\) samples
Figure Transformations fitted using transcan. Tick marks indicate the two imputed values for blood pressure. transcan transformations for two physiologic variables
Figure Relationship between heart rate and blood pressure: untransformed data (left) and transformed data (right). Data courtesy of the SUPPORT study [@kna95sup]. HR vs. BP before and after transcan transformations
Figure Relative LR \(\chi^2\) explained. Interaction effects are added to main effects. Relative LR \(\chi^2\) explained. Interaction effects are added to main effects.
Figure Relative explained variation due to each predictor. Interaction effects are added to main effects. Intervals are 0.95 bootstrap percentile confidence intervals. Relative explained variation due to each predictor. Interaction effects are added to main effects. Intervals are 0.95 bootstrap percentile confidence intervals.
Figure Empirical and population cumulative distribution function Empirical and population cumulative distribution function
Figure Estimating properties of sample median using the bootstrap Estimating properties of sample median using the bootstrap
Figure Bootstrap percentile 0.95 confidence limits for ranks of predictors in an OLS model. Ranking is on the basis of partial \(\chi^2\) minus d.f. Point estimates are original ranks. Bootstrap confidence limits for ranks of predictors
Figure Time profiles for individual subjects, stratified by study site and dose Time profiles for individual subjects, stratified by study site and dose
Figure Quartiles of TWSTRS stratified by dose Quartiles of TWSTRS stratified by dose
Figure Mean responses and nonparametric bootstrap 0.95 confidence limits for population means, stratified by dose Mean responses and nonparametric bootstrap 0.95 confidence limits for population means, stratified by dose
Figure Variogram, with assumed correlation pattern superimposed Variogram, with assumed correlation pattern superimposed
Figure Three residual plots to check for absence of trends in central tendency and in variability. Upper right panel shows the baseline score on the \(x\)-axis. Bottom left panel shows the mean \(\pm 2\times\) SD. Bottom right panel is the QQ plot for checking normality of residuals from the GLS fit. Three residual plots to check for absence of trends in central tendency and in variability. Upper right panel shows the baseline score on the \(x\)-axis. Bottom left panel shows the mean \(\pm 2\times\) SD. Bottom right panel is the QQ plot for checking normality of residuals from the GLS fit.
Figure Results of anova.rms from generalized least squares fit with continuous time AR1 correlation structure Results of anova.rms from generalized least squares fit with continuous time AR1 correlation structure
Figure Estimated effects of time, baseline TWSTRS, age, and sex Estimated effects of time, baseline TWSTRS, age, and sex
Figure Contrasts and 0.95 confidence limits from GLS fit Contrasts and 0.95 confidence limits from GLS fit
Figure Nomogram from GLS fit. Second axis is the baseline score. Nomogram from GLS fit. Second axis is the baseline score.
Figure Matrix of Spearman \(\rho\) rank correlation coefficients between predictors. Horizontal gray scale lines correspond to \(\rho=0\). The tallest bar corresponds to \(|\rho|=0.785\). Spearman \(\rho\) rank correlations of predictors
Figure Hierarchical clustering using Hoeffding’s \(D\) as a similarity measure. Dummy variables were used for the categorical variable ekg. Some of the dummy variables cluster together since they are by definition negatively correlated. Hierarchical clustering
Figure Simultaneous transformation and single imputation of all candidate predictors using transcan. Imputed values are shown as red plus signs. Transformed values are arbitrarily scaled to [0,1]. Simultaneous transformation and imputation using transcan
Figure Variance of the system of raw predictors (black) explained by individual principal components (lines) along with cumulative proportion of variance explained (text), and variance explained by components computed on transcan-transformed variables (red) Variance of the system explained by principal components.
Figure AIC of Cox models fitted with progressively more principal components. The solid blue line depicts the AIC of the model with all original covariates. The dotted blue line is positioned at the AIC of the full spline model. AIC vs. number of principal components
Figure Variance explained by individual sparse principal components (lines) along with cumulative proportion of variance explained (text) Sparse principal components
Figure Performance of sparse principal components in Cox models Performance of sparse principal components
Figure Simultaneous transformation of all variables using ACE. Transformation of variables using ACE
Figure Log-likelihood function for binomial distribution with 2 sample sizes Log-likelihood function for binomial distribution with 2 sample sizes
Figure Tests arising from maximum likelihood estimation Tests arising from maximum likelihood estimation
Figure Bootstrap confidence interval choices, from @car00boo Bootstrap confidence interval choices, from Carpenter & Bithell (2000)
Figure Bootstrap confidence intervals Bootstrap confidence intervals
Figure Logistic function Logistic function
Figure Absolute benefit as a function of risk of the event in a control subject and the relative effect (odds ratio) of the risk factor. The odds ratios are given for each curve. Absolute benefit as a function of risk in a control subject and the relative effect
Figure Data, subgroup proportions (triangles), and fitted logistic model, with 0.95 pointwise confidence bands Data, subgroup proportions, and fitted logistic model
Figure Simulated expected and 0.9 quantile of the maximum error in estimating probabilities for \(x \in [-1.5, 1.5]\) with a single normally distributed \(X\) with mean zero Average and 0.9 quantile of maximum error with continuous predictor
Figure Logistic regression assumptions for one binary and one continuous predictor Logistic regression assumptions for one binary and one continuous predictor
Figure Logit proportions of significant coronary artery disease by sex and deciles of age for n=3504 patients, with spline fits (smooth curves). Spline fits are for k=4 knots at age=36, 48, 56, and 68 years, and interaction between age and sex is allowed. Shaded bands are pointwise 0.95 confidence limits for predicted log odds. Smooth nonparametric estimates are shown as dashed lines. Data courtesy of the Duke Cardiovascular Disease Databank. Logit proportions of significant CAD by sex and age
Figure Estimated relationship between duration of symptoms and the log odds of severe coronary artery disease for k=5. Knots are marked with arrows. Solid line is spline fit; dashed line is a nonparametric loess estimate. Triangles are logits of proportions after binning duration. Duration of symptoms and severe CAD
Figure Fitted linear logistic model in \(\log_{10}(\text{duration}+1)\), with subgroup estimates using groups of 150 patients. Fitted equation is \(\Pr(\text{tvdlm}) = \text{expit}(-.9809+.7122 \log_{10}(\text{months}+1))\). Duration of symptoms and \(\log_{10}(\text{months}+1)\)
Figure Log odds of significant coronary artery disease modeling age with two dummy variables Log odds of significant coronary artery disease modeling age with two dummy variables
Figure Local regression fit for the logit of the probability of significant coronary disease vs. age and cholesterol for males, based on the loess function. Local regression fit for log odds of significant coronary disease vs. age and cholesterol
Figure Linear spline surface for males, with knots for age at 46, 52, 59 and knots for cholesterol at 196, 224, and 259 (quartiles). Linear spline surface for logit(significant disease) for males
Figure Restricted cubic spline surface in two variables, each with k=4 knots Restricted cubic spline surface in two variables, each with \(k=4\) knots
Figure Restricted cubic spline fit with age \(\times\) spline(cholesterol) and cholesterol \(\times\) spline(age) Restricted cubic spline fit with age \(\times\) spline(cholesterol) and cholesterol \(\times\) spline(age)
Figure Spline fit with nonlinear effects of cholesterol and age and a simple product interaction Spline fit with nonlinear effects of cholesterol and age and a simple product interaction
Figure Predictions from linear interaction model with mean age in tertiles indicated. Predictions from linear interaction model with mean age in tertiles indicated.
Figure Partial residuals for duration and \(\log_{10}(\text{duration}+1)\). Data density shown at top of each plot. Partial residuals for binary logistic model
Figure Odds ratios and confidence bars, using quartiles of age and cholesterol for assessing their effects on the odds of coronary disease. Effects of predictors on odds of coronary disease
Figure Linear spline fit for probability of bacterial vs. viral meningitis as a function of age at onset [@spa89]. Points are simple proportions by age quantile groups. Linear spline fit for probability of bacterial vs. viral meningitis as a function of age at onset (Spanos et al., 1989). Points are simple proportions by age quantile groups.
Figure (A) Relationship between myocardium at risk and ventricular fibrillation, based on the individual best fit equations for animals anesthetized with pentobarbital and \(\alpha\)-chloralose. The amount of myocardium at risk at which 0.5 of the animals are expected to fibrillate (\(\text{MAR}_{50}\)) is shown for each anesthetic group. (B) Relationship between myocardium at risk and ventricular fibrillation, based on equations derived from the single slope estimate. Note that the \(\text{MAR}_{50}\) describes the overall relationship between myocardium at risk and outcome when either the individual best fit slope or the single slope estimate is used. The shift of the curve to the right during \(\alpha\)-chloralose anesthesia is well described by the shift in \(\text{MAR}_{50}\). Test for interaction had P=0.10 [@wen84ven]. Reprinted by permission, NRC Research Press. Fitted logistic models in two variables, with and without interaction
Figure A nomogram for estimating the likelihood of significant coronary artery disease (CAD) in women. ECG = electrocardiographic; MI = myocardial infarction [@pry83]. Reprinted from American Journal of Medicine, Vol 75, Pryor DB et al., “Estimating the likelihood of significant coronary artery disease”, p. 778, Copyright 1983, with permission from Excerpta Medica, Inc. Nomogram for predicting \(\Pr(\text{CAD})\)
Figure Nomogram for estimating probability of bacterial (ABM) vs. viral (AVM) meningitis. Step 1, place ruler on reading lines for patient’s age and month of presentation and mark intersection with line A; step 2, place ruler on values for glucose ratio and total polymorphonuclear leukocyte (PMN) count in cerebro-spinal fluid and mark intersection with line B; step 3, use ruler to join marks on lines A and B, then read off the probability of ABM vs. AVM [@spa89]. Nomogram for predicting \(\Pr(\text{bacterial meningitis})\)
Figure Ranking of apparent importance of predictors of cause of death using LR statistics Ranking of apparent importance of predictors of cause of death using LR statistics
Figure Partial effects (log odds scale) in full model for cause of death, along with vertical line segments showing the raw data distribution of predictors Partial effects in cause of death model
Figure Interquartile-range odds ratios for continuous predictors and simple odds ratios for categorical predictors. Numbers at left are upper quartile : lower quartile or current group : reference group. The bars represent 0.9, 0.95, 0.99 confidence limits. The intervals are drawn on the log odds ratio scale and labeled on the odds ratio scale. Ranges are on the original scale. Interquartile-range odds ratios and confidence limits
Figure Nomogram calculating \(X\hat{\beta}\) and \(\hat{P}\) for cvd as the cause of death, using the step-down model. For each predictor, read the points assigned on the 0–100 scale and add these points. Read the result on the Total Points scale and then read the corresponding predictions below it. Nomogram for obtaining \(X\hat{\beta}\) and \(\hat{P}\) from step-down model
Figure Bootstrap overfitting-corrected calibration curve estimate for the backwards step-down cause of death logistic model, along with a rug plot showing the distribution of predicted risks. The smooth nonparametric calibration estimator (loess) is used. Bootstrap nonparametric calibration curve for reduced cause of death model
Figure Fraction of explainable variation (full model LR \(\chi^2\)) in cvd that was explained by approximate models, along with approximation accuracy (x-axis) Model approximation vs. LR \(\chi^2\) preserved
Figure Nomogram for predicting the probability of cvd based on the approximate model Approximate nomogram for predicting cause of death
Figure Univariable summaries of Titanic survival Univariable summaries of Titanic survival
Figure Multi-way summary of Titanic survival Multi-way summary of Titanic survival
Figure Nonparametric regression (loess) estimates of the relationship between age and the probability of surviving the Titanic, with tick marks depicting the age distribution. The top left panel shows unstratified estimates of the probability of survival. Other panels show nonparametric estimates by various stratifications. Nonparametric regression for age, sex, class, and passenger survival
Figure Relationship between age and survival stratified by the number of siblings or spouses on board (left panel) or by the number of parents or children of the passenger on board (right panel). Relationship between age and survival stratified by family size variables
Figure Effects of predictors on probability of survival of Titanic passengers, estimated for zero siblings/spouses and zero parents/children Effects of predictors on probability of surviving the Titanic
Figure Effect of number of siblings and spouses on the log odds of surviving, for third class males Effect of number of siblings/spouses on survival
Figure Patterns of missing data. Upper left panel shows the fraction of observations missing on each predictor. Lower panel depicts a hierarchical cluster analysis of missingness combinations. The similarity measure shown on the Y-axis is the fraction of observations for which both variables are missing. Right panel shows the result of recursive partitioning for predicting is.na(age). The rpart function found only strong patterns according to passenger class. Patterns of missing Titanic data
Figure Univariable descriptions of proportion of passengers with missing age Univariable descriptions of proportion of passengers with missing age
Figure Predicted log odds of survival in Titanic using casewise deletion
Figure Distributions of imputed and actual ages for the Titanic dataset. Imputed values are in black and actual ages in gray. Distribution of imputed and actual ages
Figure Estimated calibration curves for the Titanic risk model, accounting for multiple imputation
Figure Predicted probability of survival for males from fit using single conditional mean imputation again (top) and multiple random draw imputation (bottom). Both sets of predictions are for sibsp=0. Predicted Titanic survival using multiple imputation
Figure Odds ratios for some predictor settings Odds ratios for some predictor settings
Figure Checking PO assumption separately for a series of predictors. The circle, triangle, and plus sign correspond to \(Y \geq 1, 2, 3\), respectively. PO is checked by examining the vertical constancy of distances between any two of these three symbols. Response variable is the severe functional disability scale sfdm2 from the 1000-patient SUPPORT dataset, with the last two categories combined because of low frequency of coma/intubation. Simple method for checking PO assumption using stratification
Figure Checking the impact of the PO assumption by comparing predicted probabilities of all outcome categories from a PO model with a multinomial logistic model that assumes PO for no variables Checking impact of the PO assumption
Figure Transformed empirical cumulative distribution functions stratified by body frame in the diabetes dataset. Left panel: checking all assumptions of the parametric ANOVA. Right panel: checking all assumptions of the PO model (here, Kruskal–Wallis test). Checking assumptions of PO and parametric model
Figure Examination of normality and constant variance assumption, and assumptions for various ordinal models Examining normality and ordinal model assumptions
Figure Assumptions of the linear model (left panel) and semiparametric ordinal probit or logit (proportional odds) models (right panel). Ordinal models do not assume any shape for the distribution of Y for a given X; they only assume parallelism. Assumptions of linear vs. semiparametric models
Figure Three estimated quantiles and estimated mean using 6 methods, compared against caliper-matched sample quantiles/means (circles). Numbers are mean absolute differences between predicted and sample quantities using overlapping intervals of age and caliper matching. QR: quantile regression. Six methods for estimating quantiles or means.
Figure Observed (dashed lines, open circles) and predicted (solid lines, closed circles) exceedance probability distributions from a model using 6-tiles of OLS-predicted \(\text{HbA}_{1c}\). Key shows quantile group intervals of predicted mean \(\text{HbA}_{1c}\). Observed and predicted distributions
Figure Estimated intercepts from probit model Estimated intercepts from probit model
Figure Variable clustering for all potential predictors Variable clustering for all potential predictors
Figure Estimated median height as a smooth function of age, allowing age to interact with sex, from a proportional odds model Median height vs. age
Figure Estimated median upper leg length as a smooth function of age, allowing age to interact with sex, from a proportional odds model Median leg length vs. age
Figure Generalized squared rank correlations Generalized squared rank correlations
Figure Estimated mean and 0.5 and 0.9 quantiles from the log-log ordinal model using casewise deletion, along with predictions of 0.5 and 0.9 quantiles from quantile regression (QR). Age is varied and other predictors are held constant to medians/modes. Estimated mean and quantiles from casewise deletion model.
Figure ANOVA for reduced model after multiple imputation ANOVA for reduced model after multiple imputation
Figure Partial effects (log hazard or log-log cumulative probability scale) of all predictors in reduced model, after multiple imputation Partial effects after multiple imputation
Figure Partial effects (mean scale) of all predictors in reduced model, after multiple imputation Partial effects (means) after multiple imputation
Figure Partial effect for age from multiple imputation and casewise deletion (center lines with the green line depicting all non-multiple-imputation methods) with symmetric Wald 0.95 confidence bands using casewise deletion, basic bootstrap confidence bands using casewise deletion, percentile bootstrap confidence bands using casewise deletion, and symmetric Wald confidence bands accounting for multiple imputation. Partial effect for age with bootstrap and Wald confidence bands
Figure Predicted mean \(\text{HbA}_{1c}\) vs. predicted median and 0.9 quantile along with their marginal distributions Predicted mean, median, and 0.9 quantile of \(\text{HbA}_{1c}\)
Figure Nomogram for predicting median, mean, and 0.9 quantile of glycohemoglobin, along with the estimated probability that \(\text{HbA}_{1c} \ge 6.5\), 7, or 7.5, all from the log-log ordinal model Nomogram of log-log ordinal model for \(\text{HbA}_{1c}\)
Figure avas transformations: overall estimates, pointwise 0.95 confidence bands, and 20 bootstrap estimates (red lines). Transformations estimated by avas
Figure Checking estimated against optimal transformation Checking estimated against optimal transformation
Figure Predicted median (left panel) and mean (right panel) y as a function of x2 and x3. True population curves are pointed. Predicted y as a function of x2 and x3
Figure Survival function Survival function
Figure Cumulative hazard function Cumulative hazard function
Figure Hazard function Hazard function
Figure Some censored data. Circles denote events. Some censored data. Circles denote events.
Figure Some Weibull hazard functions with \(\alpha=1\) and various values of \(\gamma\) Some Weibull hazard functions with \(\alpha=1\) and various values of \(\gamma\)
Figure Kaplan–Meier product-limit estimator with 0.95 confidence bands. The Altschuler–Nelson–Aalen–Fleming–Harrington estimator is depicted with the dashed lines. Kaplan–Meier and Nelson–Aalen estimates
Figure Absolute clinical benefit as a function of survival in a control subject and the relative benefit (hazard ratio). The hazard ratios are given for each curve. Absolute clinical benefit as a function of survival in a control subject and the relative benefit
Figure PH model with one binary predictor. Y-axis is \(\log \lambda(t)\) or \(\log \Lambda(t)\). For \(\log \Lambda(t)\), the curves must be non-decreasing. For \(\log \lambda(t)\), they may be any shape. PH model with one binary predictor
Figure PH model with one continuous predictor. Y-axis is \(\log \lambda(t)\) or \(\log \Lambda(t)\). For \(\log \Lambda(t)\), drawn for \(t_{2}>t_{1}\). The slope of each line is \(\beta_{1}\). PH model with one continuous predictor
Figure PH model with one continuous predictor. Y-axis is \(\log \lambda(t)\) or \(\log \Lambda(t)\). For \(\log \lambda\), the functions need not be monotonic. PH model with one continuous predictor
Figure Regression assumptions, linear additive PH or AFT model with two predictors. For PH, Y-axis is \(\log \lambda(t)\) or \(\log \Lambda(t)\) for a fixed \(t\). For AFT, Y-axis is \(\log(T)\). Regression assumptions, linear additive PH or AFT model with two predictors
Figure AFT model with one predictor. Y-axis is \(\psi^{-1}(S(t|X)) = \frac{\log(t)-X\beta}{\sigma}\). Drawn for \(d>c\). The slope of the lines is \(\sigma^{-1}\). AFT model with one predictor
Figure AFT model with one continuous predictor. Y-axis is \(\psi^{-1}(S(t|X)) = \frac{\log(t)-X\beta}{\sigma}\). Drawn for \(t_{2}>t_{1}\). The slope of each line is \(\beta_{1}/\sigma\) and the difference between the lines is \(\frac{1}{\sigma}\log(t_{2}/t_{1})\). AFT model with one continuous predictor
Figure Altschuler–Nelson–Fleming–Harrington nonparametric survival estimates for rats treated with DMBA [@pik66], along with various transformations of the estimates for checking distributional assumptions of 3 parametric survival models. Examples of checking parametric survival model assumptions
Figure Agreement between fitted log-logistic model and nonparametric survival estimates for rat vaginal cancer data Fitted log-logistic model
Figure Kaplan-Meier estimates of distribution of standardized, censored residuals from the log-logistic model, along with the assumed standard log-logistic distribution (blue). Red step function is the estimated distribution of all residuals; black step functions are the estimated distributions of residuals stratified by group, as indicated. Checking AFT distributional assumption using residuals
Figure Estimated hazard functions for log-logistic fit to rat vaginal cancer data, along with median survival times Estimated log-logistic hazard functions
Figure Cluster analysis showing which predictors tend to be missing on the same patients Cluster analysis of missingness in SUPPORT
Figure Hierarchical clustering of potential predictors using Hoeffding D as a similarity measure. Categorical predictors are automatically expanded into dummy variables. Clustering of predictors in SUPPORT using Hoeffding \(D\)
Figure \(\Phi^{-1}(S_{\text{KM}}(t))\) stratified by dzgroup. Linearity and semi-parallelism indicate a reasonable fit to the log-normal accelerated failure time model with respect to one predictor. \(\Phi^{-1}(S_{\text{KM}}(t))\) stratified by dzgroup
Figure Kaplan-Meier estimates of distributions of normalized, right-censored residuals from the fitted log-normal survival model. Residuals are stratified by important variables in the model (by quartiles of continuous variables), plus a random variable to depict the natural variability (in the lower right plot). Theoretical standard Gaussian distributions of residuals are shown with a thick solid line. The upper left plot is with respect to disease group. Distributions of residuals from log-normal model
Figure Generalized Spearman \(\rho^2\) rank correlation between predictors and truncated survival time Generalized Spearman \(\rho^2\) rank correlation between predictors and truncated survival time
Figure Somers’ \(D_{xy}\) rank correlation between predictors and original survival time. For dzgroup or race, the correlation coefficient is the maximum correlation from using a dummy variable to represent the most frequent or one to represent the second most frequent category. Somers’ \(D_{xy}\) rank correlation between predictors and original survival time
Figure Partial likelihood ratio \(\chi^{2}\) statistics for association of each predictor with response from saturated main effects model, penalized for d.f. Partial \(\chi^{2}\) statistics from saturated main effects model
Figure Effect of each predictor on log survival time. Predicted values have been centered so that predictions at predictor reference values are zero. Pointwise 0.95 confidence bands are also shown. As all Y-axes have the same scale, it is easy to see which predictors are strongest. Effect of predictors on log survival time in SUPPORT
Figure Contribution of variables in predicting survival time in log-normal model Contribution of variables in predicting survival time in log-normal model
Figure Estimated survival time ratios for default settings of predictors. For example, when age changes from its lower quartile to the upper quartile (47.9y to 74.5y), median survival time decreases by more than half. Different shaded areas of bars indicate different confidence levels (0.9, 0.95, 0.99). Survival time ratios from fitted log-normal model
Figure Bootstrap validation of calibration curve. Dots represent apparent calibration accuracy; \(\times\) are bootstrap estimates corrected for overfitting, based on binning predicted survival probabilities and computing Kaplan-Meier estimates. Black curve is the estimated observed relationship using hare and the blue curve is the overfitting-corrected hare estimate. The gray-scale line depicts the ideal relationship. Bootstrap validation of calibration curve for log-normal model
Figure Nomogram for predicting median and mean survival time, based on approximation of full model Nomogram for simplified log-normal model
Figure Altschuler–Nelson–Fleming–Harrington nonparametric survival estimates and Cox-Breslow estimates for rat data [@pik66] Nonparametric and Cox–Breslow survival estimates
Figure Unadjusted (Kaplan–Meier) and adjusted (Cox–Kalbfleisch–Prentice) estimates of survival. Left, Kaplan–Meier estimates for patients treated medically and surgically at Duke University Medical Center from November 1969 through December 1984. These survival curves are not adjusted for baseline prognostic factors. Right, survival curves for patients treated medically or surgically after adjusting for all known important baseline prognostic characteristics [@cal89]. Unadjusted (Kaplan–Meier) and adjusted survival estimates
Figure Kaplan–Meier log \(\Lambda\) estimates by sex and deciles of age, with 0.95 confidence limits. Kaplan–Meier log \(\Lambda\) estimates by sex and deciles of age
Figure Cox PH model stratified on sex, using spline function for age, no interaction. 0.95 confidence limits also shown. Cox PH model stratified on sex, using spline function for age
Figure Cox PH model stratified on sex, with interaction between age spline and sex. 0.95 confidence limits are also shown. Cox PH model stratified on sex, with interaction between age spline and sex
Figure Restricted cubic spline estimate of relationship between LVEF and relative log hazard from a sample of 979 patients and 198 cardiovascular deaths. Data from the Duke Cardiovascular Disease Databank. Spline estimate of relationship between LVEF and relative log hazard
Figure Three smoothed estimates relating martingale residuals [@the90] to LVEF. Smoothed martingale residuals vs. LVEF
Figure Estimate of \(\Lambda_{2}/\Lambda_{1}\) based on \(-\log\) of Altschuler–Nelson–Fleming–Harrington nonparametric survival estimates. \(\Lambda\) ratio plot
Figure Stratified hazard ratios for pain/ischemia index over time. Data from the Duke Cardiovascular Disease Databank. Stratified hazard ratios for pain/ischemia index over time
Figure Smoothed weighted [@gra94pro] @sch82 residuals for the same data in @fig-cox-pi-hazard-ratio. Test for PH based on the correlation (\(\rho\)) between the individual weighted Schoenfeld residuals and the rank of failure time yielded \(\rho=-0.23\), \(z=-6.73\), \(P=2\times 10^{-11}\). Smoothed Schoenfeld residuals
Figure Calibration of random predictions using Efron’s bootstrap with B=200 resamples. Dataset has n=200, 100 uncensored observations, 20 random predictors, model \(\chi^{2}_{20} = 19\). The smooth black line is the apparent calibration estimated by adaptive linear spline hazard regression [@koo95haz], and the blue line is the bootstrap bias– (overfitting–) corrected calibration curve estimated also by hazard regression. The gray scale line is the line of identity representing perfect calibration. Black dots represent apparent calibration accuracy obtained by stratifying into intervals of predicted 0.5y survival containing 40 events per interval and plotting the mean predicted value within the interval against the stratum’s Kaplan-Meier estimate. The blue \(\times\) represent bootstrap bias-corrected Kaplan-Meier estimates. Bootstrap calibration of random survival predictions
Figure A display of an interaction between treatment and extent of disease, and between treatment and calendar year of start of treatment. Comparison of medical and surgical average hazard ratios for patients treated in 1970, 1977, and 1984 according to coronary artery disease severity. Circles represent point estimates; bars represent 0.95 confidence limits for hazard ratios. Hazard ratios <1 indicate that surgery is more effective [@cal89]. Display of interactions among treatment, extent of disease, and year
Figure Cox–Kalbfleisch–Prentice survival estimates stratifying on treatment and adjusting for several predictors, showing a secular trend in the efficacy of coronary artery bypass surgery. Estimates are for patients with left main disease and normal (LVEF=0.6) or impaired (LVEF=0.4) ventricular function [@pry87]. Cox–Kalbfleisch–Prentice survival estimates stratifying on treatment and adjusting for several predictors
Figure Cox model predictions with respect to a continuous variable. The x-axis shows the range of the treadmill score seen in clinical practice, and the y-axis shows the corresponding 5-year survival probability predicted by the Cox regression model for the 2842 study patients [@mar87]. Cox model predictions with respect to a continuous variable
Figure Survival estimates for model stratified on sex, with interaction. Survival estimates for model stratified on sex, with interaction.
Figure Nomogram from a fitted stratified Cox model that allowed for interaction between age and sex, and nonlinearity in age. The axis for median survival time is truncated on the left where the median is beyond the last follow-up time. Nomogram for stratified Cox model
Figure Raw and spline-smoothed scaled Schoenfeld residuals for dose of estrogen, nonlinearly coded from the Cox model fit, with \(\pm 2\) standard errors. Schoenfeld residuals for dose of estrogen in Cox model
Figure Shape of each predictor on log hazard of death. The y-axis shows \(X\hat{\beta}\), with predictors not plotted set to reference values. Note the highly non-monotonic relationship with ap, and the increased slope after age 70, which has been found in outcome models for various diseases. Shapes of predictors for log hazard in prostate cancer
Figure Bootstrap estimate of calibration accuracy for 5-year estimates from the final Cox model, using adaptive linear spline hazard regression. The line nearer the ideal line corresponds to apparent predictive accuracy. The blue curve corresponds to bootstrap-corrected estimates. Bootstrap estimates of calibration accuracy in prostate cancer model
Figure Hazard ratios and multi-level confidence bars for effects of predictors in model, using default ranges except for ap Hazard ratios for prostate survival model
Figure Nomogram for predicting death in prostate cancer trial Nomogram for predicting death in prostate cancer trial
Figure Transition proportions from data simulated from VIOLET Transition proportions from data simulated from VIOLET
Figure State occupancy proportions from simulated VIOLET data with death carried forward State occupancy proportions from simulated VIOLET data with death carried forward
Figure Estimated time trends in relative log odds of transitions. Variables not shown are set to median/mode and tx=0. Estimated time trends in relative log transition odds
Figure Variogram-like graph for checking intra-patient correlation structure. The x-axis shows the number of days between two measurements. Variogram-like graph
Figure State occupancy probabilities for each treatment State occupancy probabilities for each treatment
Figure Relationship between bootstrap log ORs and differences in mean days unwell Relationship between bootstrap log ORs and differences in mean days unwell
Figure Missing data patterns in d Plot of the degree of symmetry of the distribution of a variable (value of 1.0 is most symmetric) vs. the number of distinct values of the variable. Hover over a point to see the variable name and detailed characteristics.
Figure Spearman rank correlation matrix. Positive correlations are blue and negative are red. Spearman rank correlation matrix. Positive correlations are blue and negative are red.