R Workflow

R Workflow for Reproducible Biomedical Research Using Quarto

Author
Affiliation

Department of Biostatistics
School of Medicine
Vanderbilt University

Published

June 26, 2022

flowchart TD
R[R Workflow] --> Rformat[Report formatting]
Rformat --> Quarto[Quarto setup<br><br>Using metadata in<br>report output<br><br>Table and graph formatting]
R --> DI[Data import] --> Annot[Annotate data<br><br>View data dictionary<br>to assist coding]
R --> Do[Data overview] --> F[Observation filtration<br>Missing data patterns<br>Data about data]
R --> P[Data processing] --> DP[Recode<br>Transform<br>Reshape<br>Merge<br>Aggregate<br>Manipulate]
R --> Des[Descriptive statistics<br>Univariate or simple<br>stratification]
R --> An[Analysis<br>Stay close to data] --> DA[Descriptive<br><br>Avoid tables by using<br>nonparametric smoothers] & FA[Formal]
R --> CP[Caching<br>Parallel computing<br>Simulation]

Preface

This work is intended to foster best practices in reproducible data documentation and manipulation, statistical analysis, graphics, and reporting. The context is my own field of biomedical research, but the methods presented are widely applicable. The work also showcases Quarto, which is a new standard for making beautiful and reproducible reports with R and other languages. This book also captures what I’ve learned from using R (and its precursor S) heavily in biomedical research and clinical trials since 1991. See my Statistical Thinking blog at fharrell.com and the resources at hbiostat.org for more. The Statistical Thinking article R Workflow provides an overview of this book and includes further motivation from the standpoint of doing good scientific research. Apart from Consort diagrams, the methods in R Workflow will be helpful to anyone who analyzes data, whether they work in business, marketing, manufacturing, journalism, finance, science, observational research, experimental research, or virtually any other field that relies on understanding data.

The term “workflow” connotes a rigid step-by-step process of data processing and reporting. In day-to-day use of R, however, myriad needs arise, and much creativity is needed to get the most insight from data while writing reliable code that generates reproducible results. R Workflow will equip R users and analysts with a variety of powerful and flexible tools that help them attack a wide range of problems and produce elegant reports while reducing the amount of coding required.