Preface
This work is intended to foster best practices in reproducible data documentation and manipulation, statistical analysis, graphics, and reporting. It will enable the reader to efficiently produce attractive, readable, and reproducible research reports while keeping code concise and clear. Readers are also guided in choosing statistically efficient descriptive analyses that are consonant with the type of data being analyzed. The Statistical Thinking article R Workflow provides an overview of this book and includes some more motivation from the standpoint of doing good scientific research.
Anyone who claims to be able to do good data science without coding is misleading you. Coding is one of the most valuable skills for data preparation and analysis, and it leads to personal efficiency, reproducibility, and maintainability. Learning how to write concise, elegant, debuggable code that generalizes to handle more complex tasks is not an insurmountable goal for anyone dealing with data, and R Workflow is intended to assist you in this regard.
The methods in R Workflow will be helpful to anyone who analyzes data, whether they work in business, marketing, manufacturing, journalism, finance, science, observational research, experimental research, or virtually any other field that needs to understand data. The book is best suited for those having at least rudimentary experience in running R commands, but Chapter 3 points readers to excellent resources for learning R from scratch. R can also be learned by starting with standard analysis templates such as those in this GitHub repository.
The work also showcases RStudio’s Quarto, which is a new standard for making beautiful and reproducible reports with R and other languages. This book also captures what I’ve learned in using R (and its precursor S) heavily in biomedical research and clinical trials since 1991. See my Statistical Thinking blog at fharrell.com and resources at hbiostat.org for more.
The term “workflow” may connote a rigid step-by-step process of data processing and reporting. In day-to-day use of R, however, myriad needs arise, and much creativity is needed to get the most insight from data while writing reliable code that generates reproducible results. R Workflow will equip R users and analysts with a variety of powerful and flexible tools that will assist them in attacking a wide range of problems and producing elegant reports while reducing the amount of coding required.
The general statistical analysis/inference companion to this book is Biostatistics for Biomedical Research, which is a reproducible book with numerous examples of R code. For an in-depth text and course notes on reproducible regression modeling with R, including extensive case studies, see RMS.
The author wishes to thank the R Core team and R package developers, along with RStudio, for the free software they have developed that has revolutionized statistical computing, reporting, and reproducible research. Thanks to Titus von der Malsburg for careful reading of the text and for reporting numerous typographical and grammatical errors and a few programming errors. Thanks to Norm Matloff, University of California, Davis, who provided big ideas to improve the preface and motivation for the book.
| Date | Sections | Changes | Contributor |
|---|---|---|---|
| 2022-07-17 | 6, 8 | Moved overall missing data summary to missChk | |
| 2022-07-11 | 14 | New introductory text and references copied from BBR Chapter 4 | |
| 2022-07-10 | 2 | New chapter with a case study of methods used in the book | Norm Matloff |
| 2022-07-08 | 11.1 | New section on using data.table with summarization functions that return two-dimensional results | |
| 2022-07-07 | 15, 15.7 | New chart about 1st, 2nd, 3rd order analysis; new section with example of 3rd order | |
| 2022-07-06 | 13, 10 | Re-wrote intro to chapter, added LOCF example, added data table examples using %like% | |
| 2022-07-05 | 10.2, 10 | Renamed section and added more about removing columns; added link to data.table vignettes | |
| 2022-07-04 | 10.4 | New section on operations on multiple data tables | |
| 2022-07-03 | 10 | New diagram to explain data tables | |
| 2022-06-30 | Preface | Better wording | Norm Matloff |
| 2022-06-28 | | Added flow diagrams at the start of chapters | Norm Matloff |
| 2022-06-27 | 4.5 | New section on making HTML tables | |
| 2022-06-27 | 3.6, 3.2, 3.3, 4 | Added more basic R functions, arrays, NAs, how to make knitr use plain text printing of objects such as data frames/tables | |
| 2022-06-26 | Preface | Clarified goals and audience | Norm Matloff |
| 2022-06-26 | | Fixed various typographical errors | Titus von der Malsburg |
| 2022-06-15 | | Published | |