Appendix

1. Fitting the full (unpenalized) model with the 28 diagnostic predictors. For an explanation of the predictor names, see table 1 (same order of predictors) and point 11 below (Newlabels function).

    library(Design)  # Harrell's Design library (lrm, validate, calibrate, pentrace, ols, fastbw, Newlabels, nomogram); superseded by rms
    full <- lrm(emboly ~ age + sex + qi + dimmobil + dsym + famhisdvt + malignancy +
                  surg3mths + collapse + dyspnoea + smoking + coughing + prevdvt + prevpe +
                  legpain + palpitations + + legparesis + signsdvt + pleurub + wheezing +
                  crepitations + oedema + breathfreq + artpo2 + artpco2 + abnxray + abnecho,
                x = TRUE, y = TRUE)
    summary(full)  # show model output

2. Estimation of the shrinkage factor for the regression coefficients and of the ROC area of the final model (validate function). This uses bootstrapping with 200 samples and replays the selection of predictors within each bootstrap sample (backward elimination, retaining predictors with P < 0.1). validate reports the shrinkage factor as the bias-corrected calibration slope ("Slope"). It reports discrimination as Somers' Dxy, from which the ROC area follows as (Dxy + 1)/2. The apparent and bias-corrected calibration curves (before and after bootstrapping with 200 samples) are estimated in the same way, again replaying the selection of predictors within each bootstrap sample (calibrate function).

    validate(full, B = 200, bw = TRUE, rule = "p", sls = 0.1)
    calibrate(full, B = 200, bw = TRUE, rule = "p", sls = 0.1)

3. Checking whether the full unpenalized model was indeed overfitted, again using the validate and calibrate functions, but this time without replaying the variable reduction strategy.

    validate(full, B = 200)
    calibrate(full, B = 200)

4. Search for the optimal penalty factor by trial and error (pentrace function): the modified AIC is estimated for each penalty factor in a range from 0 to 60. The penalty factor that maximizes the modified AIC is selected as the optimal one.

    pentrace(full, list(simple = 0:60))

5. Penalizing the full model, i.e. re-estimating the regression coefficients and ROC area of the full model with the selected penalty factor of 9 (update function). As we did not include interaction terms, only the penalty for the simple (linear) terms is specified.

    full.pen <- update(full, penalty = list(simple = 9), x = TRUE, y = TRUE)
    summary(full.pen)

6. Syntax for figure 3: predicted probabilities of 20 randomly drawn patients under the full unpenalized and the full penalized model.
    pr1 <- plogis(predict(full))      # predicted probabilities, full unpenalized model
    pr2 <- plogis(predict(full.pen))  # predicted probabilities, full penalized model
    n <- 20
    s <- sample(length(pr1), n, replace = FALSE)  # draw a random sample of 20 individuals
    plot(0, 0, xlim = c(0, 1), ylim = c(.01, .99), type = 'n', axes = FALSE, xlab = '', ylab = '')
    x1 <- .20  # x position of the unpenalized predictions
    x2 <- .60  # x position of the penalized predictions
    points(rep(x1, n), pr1[s])
    points(rep(x2, n), pr2[s])
    for (i in 1:n) lines(c(x1, x2), c(pr1[s][i], pr2[s][i]), lty = i, lwd = 1)  # join each patient's two predictions
    axis(2)
    text(c(x1, x2), rep(.00, 2), c('Full\nunpenalized\nmodel', 'Full\npenalized\nmodel'))

7. Checking whether the full penalized model is still overfitted, and estimating its calibration curve.

    validate(full.pen, B = 200)
    calibrate(full.pen, B = 200)

8. Plot of the relative contribution of each predictor in the full penalized model.

    plot(anova(full.pen))

9. Approximating (simplifying or reducing) the full penalized model. This is done in several steps.
a. Save, for each patient, the linear predictor (plogit) of the full penalized model. This is the sum, on the log-odds scale, of the penalized regression coefficients multiplied by the patient's values of the corresponding predictors.

    plogit <- predict(full.pen)

b. Ordinary least squares regression (ols function) of the linear predictor of the full penalized model on all diagnostic predictors. The R-square of this model is 1, because the linear predictor is by construction an exact linear combination of these predictors.

    f <- ols(plogit ~ age + sex + qi + dimmobil + dsym + famhisdvt + malignancy +
               surg3mths + collapse + dyspnoea + smoking + coughing + prevdvt + prevpe +
               legpain + palpitations + + legparesis + signsdvt + pleurub + wheezing +
               crepitations + oedema + breathfreq + artpo2 + artpco2 + abnxray + abnecho,
             sigma = 1)  # sigma = 1 keeps the residual SD from being estimated as 0 when R-square = 1

c. Automatic backward elimination of all predictors, using an extreme AIC cut-off (aics = 1e10) so that every predictor is eventually removed. The model R-square is recorded after each predictor is deleted, one at a time, starting with the least contributing.
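Select the subset of predictors that still reproduces the linear predictor of the full penalized model with an R-square of 0.95.

    fastbw(f, aics = 1e10)

d. Fitting the approximated (reduced) model to obtain its penalized regression coefficients.

    full.aprox <- ols(plogit ~ age + qi + dimmobil + dsym + malignancy + surg3mths + collapse +
                        coughing + legpain + wheezing + crepitations + breathfreq + abnxray + abnecho,
                      x = TRUE)
    summary(full.aprox)

10. Estimation of the correct, i.e. penalized, standard errors, 95% CIs and p-values (Wald tests) of the penalized regression coefficients of the reduced model. The standard errors printed by ols for the approximation reflect only the approximation error, not the sampling uncertainty of the original penalized estimates. The reduced coefficients are a linear transformation w of the full penalized coefficients. Their variance matrix is therefore v = w V w', where V is the variance matrix of the full penalized model. The stored design matrices exclude the intercept, so an intercept column is added to both.

    V <- Varcov(full.pen, regcoef.only = TRUE)  # variance matrix, full penalized model
    X <- cbind(Intercept = 1, full.pen$x)       # design matrix, full model
    x <- cbind(Intercept = 1, full.aprox$x)     # design matrix, reduced model
    w <- solve(t(x) %*% x, t(x)) %*% X          # maps full coefficients to reduced coefficients
    v <- w %*% V %*% t(w)                       # variance matrix, penalized reduced coefficients
    cf <- coef(full.aprox)                      # penalized reduced coefficients
    diag(v)                                     # variances of the penalized coefficients
    secoef <- sqrt(diag(v))                     # standard errors of the penalized coefficients
    cf - 1.96 * secoef                          # lower 95% limits
    cf + 1.96 * secoef                          # upper 95% limits
    full.aprox$var <- v                         # store the corrected variance matrix
    anova(full.aprox, test = 'Chisq', ss = FALSE)  # Wald test per penalized predictor

11. Syntax for drawing figure 3 (nomogram) of the reduced (approximated) penalized model. The Newlabels function attaches descriptive labels to the original variable names.

    g <- Newlabels(full.aprox, c(age = 'age (years)', qi = 'body mass index (kg/m2)',
                                 dimmobil = 'days of immobilisation', dsym = 'days of symptoms',
                                 malignancy = 'malignancy', surg3mths = 'surgery within 3 months',
                                 collapse = 'collapse', coughing = 'coughing', legpain = 'pain in leg',
                                 wheezing = 'wheezing', crepitations = 'crepitations',
                                 breathfreq = 'breaths per minute', abnxray = 'abnormal chest X-ray',
                                 abnecho = 'abnormal leg ultrasound'))
    nomogram(g, age = 18:95, qi = 16:45, dimmobil = 0:50, dsym = 0:90, breathfreq = 11:56,
             lp = FALSE, fun = plogis,  # plogis converts the approximated log-odds into a probability
             funlabel = 'Probability of emboly',
             fun.at = c(.02, .05, seq(.1, .9, by = .1), .95, .98),
             maxscale = 50, xfrac = .4, lmgp = .25, ia.space = .8, cex.var = .7, cex.axis = .5)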
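The bootstrap shrinkage factor that validate reports as the bias-corrected "Slope" (steps 2, 3 and 7) can be illustrated without the Design library. The sketch below rests on several assumptions:

- The data are simulated, with hypothetical predictors V1 to V10.
- An ordinary glm fit stands in for lrm.
- As in step 3, no variable selection is replayed.

It illustrates the optimism-correction idea only and is not the validate implementation itself.

```r
# Sketch of bootstrap optimism correction of the calibration slope.
# Hypothetical simulated data: 10 predictors V1..V10, only V1 and V2 matter.
# glm() stands in for lrm(); no variable selection is replayed.
set.seed(7)
n <- 150
p <- 10
d <- as.data.frame(matrix(rnorm(n * p), nrow = n, ncol = p))  # columns V1..V10
d$y <- rbinom(n, 1, plogis(0.8 * d$V1 - 0.5 * d$V2))

# Calibration slope: logistic regression of the outcome on a linear predictor
cal_slope <- function(lp, yy) unname(coef(glm(yy ~ lp, family = binomial))[2])

B <- 200
optimism <- numeric(B)
for (b in seq_len(B)) {
  idx <- sample(n, n, replace = TRUE)                     # bootstrap sample
  fb <- glm(y ~ ., family = binomial, data = d[idx, ])    # refit the full model
  slope_app  <- cal_slope(predict(fb), d$y[idx])          # apparent slope (1 for an ML fit)
  slope_test <- cal_slope(predict(fb, newdata = d), d$y)  # slope in the original sample
  optimism[b] <- slope_app - slope_test
}
shrinkage <- 1 - mean(optimism)  # optimism-corrected slope, i.e. the shrinkage factor
print(round(shrinkage, 3))
```

Eight of the ten simulated predictors are pure noise. With that many, the corrected slope will usually fall below 1, which indicates that the unpenalized predictions are too extreme.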
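The algebra of steps 9 and 10 can be checked in base R. This is a minimal sketch under these assumptions:

- The data are simulated, with hypothetical predictors a, b, cc and e.
- An unpenalized glm fit replaces the penalized lrm fit.
- lm replaces ols.

Unlike the Design fits, lm's stored design matrix already contains the intercept column, so no cbind(1, ...) is needed here.

```r
# Sketch of approximating a logistic model's linear predictor by a reduced
# least-squares model and propagating the full variance matrix (steps 9-10).
# Hypothetical simulated data; glm() replaces lrm(), lm() replaces ols().
set.seed(2024)
n <- 400
d <- data.frame(a = rnorm(n), b = rnorm(n),
                cc = rbinom(n, 1, 0.4), e = rnorm(n))
d$y <- rbinom(n, 1, plogis(-0.5 + 1.2 * d$a + 0.6 * d$b + 0.8 * d$cc + 0.05 * d$e))

full <- glm(y ~ a + b + cc + e, family = binomial, data = d)
V <- vcov(full)          # variance matrix of (Intercept, a, b, cc, e)
X <- model.matrix(full)  # full design matrix, intercept column included
plogit <- predict(full)  # linear predictor, equal to X %*% coef(full)

# R-square of a least-squares approximation to the linear predictor
r2 <- function(fit) 1 - sum(residuals(fit)^2) / sum((plogit - mean(plogit))^2)

# Step 9b: regressing plogit on all predictors reproduces it exactly
r2_full <- r2(lm(plogit ~ a + b + cc + e, data = d))

# Step 9d: reduced model without the weakest predictor e
red <- lm(plogit ~ a + b + cc, data = d, x = TRUE)
r2_reduced <- r2(red)

# Step 10: reduced coefficients = w %*% full coefficients, so Var = w V w'
x <- red$x                                # lm's x already contains the intercept
w <- solve(t(x) %*% x, t(x)) %*% X        # 4 x 5 mapping matrix
v <- w %*% V %*% t(w)                     # variance matrix of reduced coefficients
se <- sqrt(diag(v))
est <- coef(red)
ci <- cbind(lower = est - 1.96 * se, upper = est + 1.96 * se)
print(round(cbind(estimate = est, se = se, ci), 3))
```

plogit equals X times the full coefficient vector. The reduced least-squares coefficients are therefore exactly w times the full ones, and that identity is what justifies v = w V w'.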