```
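require(Hmisc)        # Hmisc provides getRs
getRs('reptools.r')   # utility functions from the rscripts GitHub repository
getRs('movStats.r')   # defines the movStats function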
```

# 1 Introduction

This book describes a workflow that I’ve found to be efficient in making reproducible research reports using R with `Rmarkdown` and now `Quarto` in data analysis projects. I start with a fairly complete case study of survival patterns of passengers on the *Titanic* that exemplifies many of the methods presented in the book. This is followed by chapters covering importing data, creating annotated analysis files, examining extent and patterns of missing data, and running descriptive statistics on them with goals of understanding the data and their quality and completeness. Functions in the `Hmisc` package are used to annotate data frames and data tables with labels and units of measurement, show metadata/data dictionaries, and produce tabular and graphical statistical summaries. Efficient and clear methods of recoding variables are given. Several examples of processing and manipulating data using the `data.table` package are given, including some non-trivial longitudinal data computations. General principles of data analysis are briefly surveyed, and some flexible bivariate and 3-variable analysis methods are presented with emphasis on staying close to the data while avoiding highly problematic categorization of continuous independent variables. Examples of diagramming the flow of exclusion of observations from analysis, caching results, parallel processing, and simulation are presented. In the process several useful report writing methods are exemplified, including program-controlled creation of multiple report tabs.

## 1.1 R Code Repositories Used in This Book

This book makes heavy use of the following R packages and GitHub repositories:

- `Hmisc` package, which contains functions for importing data, data annotation, summary statistics, statistical graphics, advanced table making, etc.
- `data.table` package for data storage, retrieval, manipulation, munging, aggregation, merging, and reshaping
- `haven` package for importing datasets from statistical packages
- `ggplot2` package for static graphics
- `plotly` package for interactive graphics
- `consort` package for consort diagrams showing observation filtering
- `rms` package for statistical modeling, validation, and presentation
- `knitr` package for running reproducible reports, and also providing `kable` and `kables` functions for simple html table printing
- `rscripts` GitHub repository with utility functions that are all loaded when `reptools.r` is loaded
    - `addCap`, `printCap` for adding captions to a list of figures and for printing the list
    - `addggLayers` for adding extended box plots and spike histograms to `ggplot2` plots, especially when run on the output of `meltData`
    - `dataChk` for data checking
    - `dataOverview` for a dataset overview
    - `hashCheck` for checking whether parent objects have changed so that a slow analysis has to be re-run (i.e., taking control of caching)
    - `htmlList` to easily print vectors in a named list using `kable`
    - `htmlView`, `htmlViewx` for viewing data dictionaries/metadata in browser windows
    - `kabl` to make it easy to use `kable` and `kables` for making html tables
    - `maketabs` to automatically make multiple tabs in `Quarto` reports, each tab holding the output of one or more R commands
    - `makecolmarg` to print an object in the right margin in `Quarto` reports
    - `makecnote` to print an object in a collapsible `Quarto` note
    - `makecallout`, a generic Quarto callout maker called by `makecolmarg`, `makecnote`
    - `makecodechunk`
    - `makemermaid` to make Quarto `mermaid` diagrams with insertion of variable values
    - `meltData` to melt a data table according to a formula, with optional substitution of variable labels for variable names
    - `rsHelp` for viewing help files for functions in `rscripts`
    - `scplot` for putting graphs in separate chunks with captions in TOC
    - `seqFreq` for creating a factor variable with categories in descending order of sequential frequencies of conditions (as used in computing study exclusion counts)
    - `vClus` for variable clustering
    - `runifChanged`, which uses `hashCheck` to automatically re-run an analysis if needed, otherwise to retrieve previous results efficiently
- `qreport` GitHub repository, which has functions dedicated to randomized clinical trial reporting
    - `aePlot` for making an interactive `plotly` dot chart of adverse event proportions by treatment
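The caching idea behind `hashCheck` and `runifChanged` in the list above can be illustrated with a minimal base R sketch. This is not the `rscripts` implementation: the helper name `run_if_changed`, its arguments, and the choice to fingerprint only the input objects with an MD5 checksum are assumptions made for illustration.

```r
# Hypothetical helper (not part of rscripts): re-run `fun` only when the
# objects passed to it differ from those used in the previous run.
run_if_changed <- function(fun, ..., cache_file) {
  # Fingerprint the inputs: serialize them to a temporary file and take
  # its MD5 checksum (base R only, no extra packages assumed)
  tmp <- tempfile()
  on.exit(unlink(tmp))
  writeBin(serialize(list(...), NULL), tmp)
  current <- unname(tools::md5sum(tmp))

  if (file.exists(cache_file)) {
    cached <- readRDS(cache_file)
    if (identical(cached$hash, current)) return(cached$result)  # inputs unchanged
  }
  result <- fun(...)                       # inputs new or changed: recompute
  saveRDS(list(hash = current, result = result), cache_file)
  result
}

# Usage: count how often the "slow" function actually runs
n_calls <- 0
slow_mean <- function(x) { n_calls <<- n_calls + 1; mean(x) }
cache <- tempfile(fileext = ".rds")
r1 <- run_if_changed(slow_mean, 1:10, cache_file = cache)  # computed
r2 <- run_if_changed(slow_mean, 1:10, cache_file = cache)  # read from cache
r3 <- run_if_changed(slow_mean, 1:20, cache_file = cache)  # inputs changed, recomputed
```

Because this sketch fingerprints only the inputs, it would not notice a change to the function being run.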

Another file in `rscripts` is `movStats.r`, which defines the `movStats` function for computing summary statistics by moving overlapping windows of a continuous variable, or simply stratified by a categorical variable. The functions in the `rscripts` GitHub repository are accessed with the `Hmisc` function `getRs`, as in the code at the top of this chapter, which loads `reptools.r` and `movStats.r`.
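To make the notion of moving overlapping windows concrete, here is a minimal base R sketch. It is not `movStats` itself: the helper name `moving_mean`, its arguments, and the definition of a window as a fixed half-width around each target value are assumptions for illustration, and `movStats` may define its windows differently.

```r
# Hypothetical helper (not movStats): mean of y among observations whose
# x falls within `halfwidth` of each target value in `centers`.
moving_mean <- function(x, y, centers, halfwidth) {
  rows <- lapply(centers, function(ctr) {
    in_window <- abs(x - ctr) <= halfwidth
    data.frame(center = ctr,
               n      = sum(in_window),   # observations in this window
               mean   = if (any(in_window)) mean(y[in_window]) else NA_real_)
  })
  do.call(rbind, rows)
}

# Simulated data: windows of width 20 spaced 10 apart, so neighbors overlap
set.seed(1)
age <- runif(200, 20, 80)
sbp <- 100 + 0.5 * age + rnorm(200, sd = 10)
moving_mean(age, sbp, centers = seq(30, 70, by = 10), halfwidth = 10)
```

Each observation contributes to more than one window, which smooths the estimates without cutting the continuous variable into disjoint categories.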

All the available help files for functions in `rscripts` are at hbiostat.org/R/rscripts. To view a help file for one of the functions in the `RStudio` `Viewer` pane, use for example `rsHelp(movStats)` or `rsHelp(reptools)`.