15  Caching

The workhorse behind R Markdown and Quarto (besides Pandoc) is knitr, which processes the code chunks and properly mingles code with tabular and graphical output. knitr has a built-in caching mechanism that avoids needlessly re-executing code when the code inputs have not changed. This easy-to-use process has two disadvantages: the dependencies are not transparent, and the stored cache files may be quite large. I like to take control of caching. To that end, the runifChanged function was written. Here is an example of its use. First, compose a function with no arguments. This is the (usually slow) function that is run only if any of a group of listed objects has changed since the last time it was run. When it does need to run, the function produces an object that is stored in binary form in a user-specified file (the default file name is the name of the current R code chunk with .rds appended); otherwise the previously stored object is retrieved from that file.

library(Hmisc)       # getRs
library(rms)         # lrm, bootcov
library(data.table)  # data.table
# Read the source code for the hashCheck and runifChanged functions from
# https://github.com/harrelfe/rscripts/blob/master/hashCheck.r
getRs('hashCheck.r', put='source')
g <- function() {
  # Fit a logistic regression model and bootstrap it 500 times, saving
  # the matrix of bootstrapped coefficients
  f <- lrm(y ~ x1 + x2, x=TRUE, y=TRUE, data=dat)
  bootcov(f, B=500)
}
n   <- 2000
dat <- data.table(x1=runif(n), x2=runif(n),
                  y=sample(0:1, n, replace=TRUE))
# runifChanged will write runifch.rds if needed (chunk name.rds)
# Will run if dat or source code for lrm or bootcov change
b <- runifChanged(g, dat, lrm, bootcov)
dim(b$boot.Coef)
[1] 500   3
head(b$boot.Coef)
       Intercept          x1          x2
[1,]  0.02007292 -0.30079958  0.32416398
[2,]  0.06150624 -0.35741054  0.25522669
[3,]  0.25225861 -0.40094541  0.09290729
[4,]  0.13766665 -0.48661991  0.19684403
[5,] -0.22018456  0.02132711  0.33973578
[6,]  0.18217417 -0.36140896 -0.04873320
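
To make the dependency tracking concrete, here is a minimal base-R sketch of the general idea: hash the tracked objects, compare the hashes with those stored alongside the cached result, and rerun the slow function only when they differ. This is not the code in hashCheck.r. The names hash_obj and run_if_changed are hypothetical stand-ins, and the choices made here (MD5 of the serialized object, functions hashed by their deparsed source, the slow function itself included among the tracked objects) are assumptions made for illustration.

```r
# Hypothetical helper: MD5 hash of an arbitrary R object using base R + tools
hash_obj <- function(x) {
  tmp <- tempfile()
  on.exit(unlink(tmp))
  # Assumption of this sketch: hash a function by its deparsed source code
  if (is.function(x)) x <- deparse(x)
  writeBin(serialize(x, NULL), tmp)
  unname(tools::md5sum(tmp))
}

# Hypothetical stand-in for runifChanged: run fun() only if the hashes of
# fun and the tracked objects differ from those saved with the cached result
run_if_changed <- function(fun, ..., file) {
  hashes <- vapply(list(fun, ...), hash_obj, character(1))
  if (file.exists(file)) {
    cached <- readRDS(file)
    if (identical(cached$hashes, hashes)) return(cached$result)
  }
  result <- fun()
  saveRDS(list(hashes = hashes, result = result), file)
  result
}

# Usage: count how many times the "slow" function actually runs
calls <- 0
slow  <- function() { calls <<- calls + 1; mean(x) }
x     <- c(1, 2, 3)
cache <- tempfile(fileext = '.rds')
r1 <- run_if_changed(slow, x, file = cache)  # no cache file yet: slow() runs
r2 <- run_if_changed(slow, x, file = cache)  # nothing changed: read from cache
x  <- c(1, 2, 3, 10)
r3 <- run_if_changed(slow, x, file = cache)  # x changed: slow() runs again
```

After these three calls, calls equals 2: the second call was served from the cache file. Because the dependencies are listed explicitly in the call, it is clear what triggers a rerun, and the cache file holds only the returned object plus a few short hash strings.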