16 Caching

The workhorse behind R Markdown and Quarto (besides Pandoc) is knitr, which processes the code chunks and interleaves the code with its tabular and graphical output. knitr has a built-in caching mechanism so that a chunk is not needlessly re-executed when its code inputs have not changed. This easy-to-use process has two disadvantages: the dependencies are not transparent, and the stored cache files may be quite large. I like to take control of caching and to be able to read the stored results with other scripts. To that end, the Hmisc package's runifChanged function was written.

Here is an example of its use. First, write a function with no arguments. This is the (usually slow) function that will be run only if one of a group of listed objects has changed since the last time it was run. When it does run, the function produces an object that is stored in binary form in a user-specified file (by default, the name of the current R code chunk with .rds appended).

require(rms)
require(data.table)
g <- function() {
  # Fit a logistic regression model and bootstrap it 500 times, saving
  # the matrix of bootstrapped coefficients
  f <- lrm(y ~ x1 + x2, x=TRUE, y=TRUE, data=dat)
  bootcov(f, B=500)
}
set.seed(3)
n   <- 2000
dat <- data.table(x1=runif(n), x2=runif(n),
                  y=sample(0:1, n, replace=TRUE))
# runifChanged writes runifch.rds if needed (chunk name + .rds)
# g() is re-run only if dat or the source code of lrm or bootcov has changed
b <- runifChanged(g, dat, lrm, bootcov)
dim(b$boot.Coef)

[1] 500   3

head(b$boot.Coef)

       Intercept          x1          x2
[1,]  0.02506366 -0.26912787  0.22930212
[2,] -0.02513734 -0.06308701  0.23415528
[3,]  0.15264191 -0.51540301  0.27155256
[4,]  0.18871210 -0.16127618 -0.17868324
[5,]  0.06781028  0.03666227  0.04380128
[6,] -0.01370652 -0.40025695  0.34345943
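To make the idea concrete, here is a minimal base-R sketch of hash-based conditional execution. It is not Hmisc's implementation: the helper names hash_objects and run_if_changed_sketch are hypothetical, and the "hash" is simply an MD5 sum of the serialized inputs computed with tools::md5sum. The result and the hash of the inputs that produced it are stored together in one .rds file, so another script can read the cached value back with readRDS.

```r
# Minimal sketch of hash-based conditional execution using only base R.
# Hypothetical helpers: hash_objects(), run_if_changed_sketch().
# This is NOT how Hmisc::runifChanged is implemented.

# Hash any set of R objects: serialize them to a temp file and take its MD5 sum
hash_objects <- function(...) {
  tmp <- tempfile()
  on.exit(unlink(tmp))
  writeBin(serialize(list(...), connection = NULL), tmp)
  unname(tools::md5sum(tmp))
}

run_if_changed_sketch <- function(fun, ..., file) {
  h <- hash_objects(...)
  if (file.exists(file)) {
    cached <- readRDS(file)
    # Inputs unchanged since the last run: reuse the stored result
    if (identical(cached$hash, h)) return(cached$result)
  }
  result <- fun()   # no cache yet, or inputs changed: run the slow function
  saveRDS(list(result = result, hash = h), file)
  result
}

# Usage: slow() counts how many times it actually runs
counter <- 0
slow <- function() { counter <<- counter + 1; sum(x) }
x <- 1:10
f <- tempfile(fileext = ".rds")
run_if_changed_sketch(slow, x, file = f)   # no cache: runs slow()
run_if_changed_sketch(slow, x, file = f)   # x unchanged: reuses the cache
x <- c(x, 11L)
run_if_changed_sketch(slow, x, file = f)   # x changed: runs slow() again
readRDS(f)$result   # another script can read the cached value directly
```

Keying the cache on a hash of the inputs, rather than on file timestamps, means that recreating identical data does not trigger a rerun.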