flowchart LR
Fig[Figures] --> Lay[Layout<br>Size]
Tab[HTML Tables]
Place[Placement] --> places[Margin<br>Tabs<br>Expand/Hide]
rept[reptools] --> rhf[Report Writing<br>Helper Functions]
OF[Overall Format] --> ofs[HTML]
OF --> ltx[LaTeX] --> pdf[pdf]
OF --> mfd[Multi-Format<br>Reports]
MD[Metadata<br>Report<br>Annotations] --> mds[Variable Labels<br>and Units]
4 Report Formatting
A state-of-the-art way to make reproducible reports is to use a statistical computing language such as R and its knitr
package in conjunction with either RMarkdown
or Quarto
, with the latter likely to replace the former. Both of the report-making systems allow one to produce reports in a variety of formats including html, pdf, and Word. Html is recommended because pages can be automatically resized to allow optimum viewing on devices of most sizes, and because html allows for interactive graphics and other interactive components. Pdf is produced by converting RMarkdown
or Quarto
-produced markdown elements to \(\LaTeX\).
This document can serve as a template for using R with Quarto
; one can see the raw script by clicking on Code
at the top right of the report. When one has only one output format target, things are fairly straightforward, except in some situations where mixed formats are rendered in the same code chunk. Click below for details.
To make use of specialized functions that produce html or \(\LaTeX\) markup, one often has to put results='asis'
in the code chunk header to keep the system from disturbing the generated html or \(\LaTeX\) markup so that it will be typeset correctly in the final document. This process works smoothly but creates one complication: if you print an object that produces plain text in the same code chunk, the system will try to typeset it in html or \(\LaTeX\). To prevent this from happening you either need to split the chunk into multiple chunks (some with results='asis'
and some not) or you need to make it clear that parts of the output are to be typeset verbatim. To do that a simple function pr
can sense if results='asis'
is in effect for the current chunk. If so, the object is surrounded by the markdown
verbatim indicator—three consecutive back ticks. If not the object is left alone. pr
is defined in the markupSpecs$markdown$pr
object, so you can bring it to your session by copying into a local function pr
as shown below, which has a chunk option results='asis'
to show that verbatim output appears anyway. If the argument obj
to pr
is a data frame or data table, variables will be rounded to the value given in the argument dec
(default dec=3
) before printing. If you specify inline=x
the object x
is printed with cat()
instead of print()
. inline
is more for printing character strings.
An example of something that may not render correctly due to results='asis'
being in the chunk header (needed for html(...)
):
options(prType='html')
f <- ols(y ~ rcs(x1, 5))
f    # prints model summary in html format
m <- matrix((1:10)/3, ncol=2)
m    # use pr(obj=m) to fix
Here are examples of pr
usage.
require(Hmisc)
pr <- markupSpecs$markdown$pr
x <- (1:5)/7
pr('x:', x)
x:
[1] 0.1428571 0.2857143 0.4285714 0.5714286 0.7142857
pr(obj=x)
[1] 0.1428571 0.2857143 0.4285714 0.5714286 0.7142857
pr(inline=paste(round(x,3), collapse=', '))
0.143, 0.286, 0.429, 0.571, 0.714
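To make the idea behind pr concrete, here is a minimal sketch of a function that wraps printed output in a markdown verbatim fence when results='asis' is in effect. It is not the Hmisc implementation: the name pr_sketch and its asis argument are hypothetical, and the sketch only assumes that knitr::opts_current$get("results") reports the current chunk's results option when knitr is installed.

```r
# Hypothetical stand-in for pr; not the Hmisc implementation
pr_sketch <- function(obj, dec = 3, asis = NULL) {
  if (is.null(asis)) {
    # Ask knitr for the current chunk's results option when knitr is available
    res  <- if (requireNamespace("knitr", quietly = TRUE))
      knitr::opts_current$get("results") else NULL
    asis <- identical(res, "asis")
  }
  if (is.data.frame(obj)) {
    # Round numeric columns before printing, as described for dec
    num <- vapply(obj, is.numeric, logical(1))
    obj[num] <- lapply(obj[num], round, digits = dec)
  }
  out   <- capture.output(print(obj))
  fence <- strrep("`", 3)            # three consecutive back ticks
  if (asis) out <- c(fence, out, fence)
  cat(out, sep = "\n")
  invisible(out)
}

# Force the 'asis' behavior so the fence is visible outside of knitr
res <- pr_sketch(data.frame(a = c(1.23456, 2), b = c("u", "v")), asis = TRUE)
```

Passing asis explicitly is only a convenience for trying the sketch at the console; inside a report the option would be sensed from the chunk.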
Instead of working to keep certain outputs verbatim you can use knitr::kable()
to convert verbatim output to markdown. Also see the yaml
df-print
html option, for which you may want to set df-print: kable
.
knitr/Quarto
will by default print data frames and other simple tables using html. To make knitr
use plain text printing, put this code at the top of the report to redefine the default knitr
printing function.
knit_print <- knitr::normal_print
4.1 Quarto
Syntax for Figures
One can specify sizes, layouts, captions, and more using Quarto
markup. Captions are ignored unless a figure is given a label. Figure labels must begin with fig-
. The figure can be cross-referenced elsewhere in the document using for example See \@fig-scatterplot
. Figure
will be placed in front of the figure number automatically. Here is example syntax.
```{r}
#| label: fig-myplot
#| fig-cap: "An example caption (use one long line for caption)"
#| fig-height: 3
#| fig-width: 4
plot(1:7, abs(-3 : 3))
```
If the code produces multiple plots you can combine them into one with a single overall caption and include subcaptions for the individual panels:
```{r}
#| label: fig-myplot
#| fig-cap: "Overall caption …"
#| fig-height: 3
#| fig-width: 4
#| layout-ncol: 2
#| fig-subcap:
#|   - "Subcaption for panel (a)"
#|   - "Subcaption for panel (b)"
plot(1:7, abs(-3 : 3))
hist(x)
```
To include an existing image while making use of Quarto
for sizing and captioning etc. use this example.
```{r out.width="600px"}
#| label: fig-mylabel
#| fig-cap: "…"
knitr::include_graphics('my.png')
```
If you don’t need to caption or cross-reference the figure use e.g.
Other examples are in the next section.
The reptools
repository has helper functions for building a table of figures. To use those, put addCap()
or addCap(scap="short caption for figure")
as the first line of code in the chunk. The full caption is taken as the fig-cap:
markup. If you don’t specify scap
to addCap
the short caption will be taken as the fig-scap:
markup, or if that is missing, the full caption. At the end of the report you can print the table of figures using the following syntax (but surround the last line with back ticks).
# Figures
r printCap()
For chunks having #| label: fig-
you can automatically have knitr
call addCap
at the start of a chunk, extracting the needed information, if you run the reptools
function hookaddcap()
in a chunk before the first chunk that produces a graph. This procedure is used throughout this book. addCap
makes use of fig-scap:
for short captions.
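The bookkeeping behind a table of figures can be sketched in a few lines of base R. This is only an illustration of the idea, not the reptools code: add_cap_sketch and print_cap_sketch are hypothetical names, and the sketch assumes a label and caption are passed in directly rather than extracted from chunk options.

```r
# Hypothetical caption registry; not the reptools implementation
.caps <- new.env()
.caps$tab <- data.frame(label = character(), caption = character(),
                        stringsAsFactors = FALSE)

add_cap_sketch <- function(label, cap, scap = NULL) {
  short <- if (is.null(scap)) cap else scap   # fall back to the full caption
  .caps$tab <- rbind(.caps$tab,
                     data.frame(label = label, caption = short,
                                stringsAsFactors = FALSE))
  invisible(NULL)
}

print_cap_sketch <- function() {
  tab <- .caps$tab
  # Emit a markdown pipe table with cross-references in the first column
  lines <- c("| Figure | Short Caption |",
             "|:---|:---|",
             sprintf("| @%s | %s |", tab$label, tab$caption))
  cat(lines, sep = "\n")
  invisible(lines)
}

add_cap_sketch("fig-scatter", "A long caption for the scatterplot",
               scap = "Scatterplot")
add_cap_sketch("fig-hist", "Histogram of x")   # no short caption given
toc <- print_cap_sketch()
```

The second call shows the fallback described above: without a short caption, the full caption is used in the table.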
4.2 Quarto
Built-in Syntax for Enhancing R Output
Helper functions described below allow one to enhance graphical and tabular R output by taking advantage of Quarto
formatting features. These functions allow one to produce different formats within one code chunk, e.g., a plot in the margin and a table in a collapsible note appearing after the code chunk. But if you need only one output format within a chunk you can make use of built-in syntax as described here. The yaml
-like syntax also allows you to specify heights and widths for figures, plus multi-figure layouts.
Here is some example code with all the markup shown.
```{r}
#| column: margin
#| fig-height: 1
#| fig-width: 3
par(mar=c(2, 2, 0, 0), mgp=c(2, .5, 0))
set.seed(1)
x <- rnorm(1000)
hist(x, nclass=40, main='')
x[1:3] # ordinary output stays put
knitr::kable(x[1:3]) # html output put in margin
hist(x, main='')
```
The results follow.
par(mar=c(2, 2, 0, 0), mgp=c(2, .5, 0))
set.seed(1)
x <- rnorm(1000)
hist(x, nclass=40, main='')
x[1:3]    # ordinary output stays put
[1] -0.6264538 0.1836433 -0.8356286
knitr::kable(x[1:3])   # html output put in margin
x |
---|
-0.6264538 |
0.1836433 |
-0.8356286 |
hist(x, main='')
Here are a few markups for figure layout inside R chunks.
Wide page (takes over the margins) and put multiple plots in 1 row:
#| column: screen-inset
#| layout-nrow: 1
When plotting 3 figures put the first 2 in one row and the third in the second row and make it wide.
#| layout: [[1,1], [1]]
Make the top left panel be wider than the top right one.
#| layout: [[70,30], [100]]
Top left and top right panels have equal widths but devote 0.1 of the total width to an empty region between the two top panels.
#| layout: [[45, -10, 45], [100]]
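As a sketch of how these options fit into a complete chunk, the following body places two plots in the top row with a gap between them and a third plot across the bottom. The label, caption, and simulated data are made up for illustration; in a Quarto document the lines go inside an {r} chunk, where the `#|` lines are read as chunk options (to plain R they are just comments).

```r
#| label: fig-three-panels
#| fig-cap: "Two panels on top separated by a gap, one wide panel below"
#| layout: [[45, -10, 45], [100]]
set.seed(2)
y <- rnorm(200)
hist(y, main = '')          # top left panel
plot(ecdf(y), main = '')    # top right panel
plot(y, type = 'l')         # bottom panel spanning the full width
```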
See here for details about figure specifications inside code chunks.
You can put some .aside
information to the right of R output.
4.3 Quarto
Report Writing Helper Functions
Helper functions are defined when you run the Hmisc
function getRs
to retrieve them from GitHub
, i.e., getRs('reptools.r')
. You can get help on these functions by running rsHelp(functionname)
. Several of the functions construct Quarto
callouts which are fenced-off sections of markup that trigger special formatting, especially when producing html. The special formatting includes collapsible sections and marginal notes. Here is a summary of some of the reptools
helper functions.
Function | Purpose |
---|---|
dataChk |
run a series of logical expressions for checking data consistency, put results in separate tabs using maketabs , and optionally create two summary tabs |
dataOverview |
runs a data overview report |
missChk |
creates a series of analyses of the extent and patterns of missing values in a data table or data frame, and puts graphical summaries in tabs |
hookaddcap |
makes knitr automatically extract figure labels, captions, short captions for use in list of figures |
htmlList |
print a named list using the names as headers |
kabl |
front-end to knitr::kable and kables . If you run kabl on more than one object it will automatically call kables . |
makecallout |
generic Quarto callout maker used by makecnote , makecolmarg |
makecnote |
print objects or run code and place output in an initially collapsed callout note |
makecolmarg |
print objects or run code and place output in a marginal note |
maketabs |
print objects or run code placing output in separate tabs |
makemermaid |
makes a mermaid diagram with R variable values included in the diagram |
makegraphviz |
similar to makemermaid but using graphviz |
varType |
classify variables in a data table/frame or a vector as continuous, discrete, or non-numeric non-discrete |
conVars |
use varType to extract list of continuous variables |
disVars |
use varType to extract list of discrete variables |
vClus |
run Hmisc::varclus on a dataset after reducing it |
The input to maketabs
, as will be demonstrated later, may be a named list
, or more commonly, a series of formulas whose right-hand sides are executed and the result of each formula is placed in a separate tab. The left side of the formula becomes the tab label. For makecolmarg
there should be no left side of the formula as marginal notes are not labeled. For the named list
option the list
names become the tab names. Examples of both approaches appear later in this report. In formulas, a left side label must be enclosed in back ticks and not quotes if it is a multi-word string. A wide
argument is used to expand the width of the output outside the usual margins. An initblank
argument creates a first tab that is empty. This allows one to show nothing until one of the other tabs is clicked. Alternately you can specify as the first formula ` ` ~ ` `.
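The formula convention can be illustrated with a small base R sketch that pulls the label from the left side and splits the right side on `+` into separate expressions to evaluate. The helper tab_parts is hypothetical and is not how reptools is implemented; it only shows why a multi-word label needs back ticks (it must parse as a single name).

```r
# Hypothetical helper illustrating the formula convention; not reptools code
tab_parts <- function(f) {
  stopifnot(inherits(f, "formula"))
  two_sided <- length(f) == 3
  label <- if (two_sided) as.character(f[[2]]) else ""
  rhs   <- if (two_sided) f[[3]] else f[[2]]
  # Walk the right side, splitting on binary `+` into separate expressions
  split_plus <- function(e) {
    if (is.call(e) && identical(e[[1]], as.name("+")) && length(e) == 3)
      c(split_plus(e[[2]]), split_plus(e[[3]]))
    else list(e)
  }
  list(label = label, terms = split_plus(rhs))
}

x <- c(3, 1, 2)
p <- tab_parts(`Sorted and summed` ~ sort(x) + sum(x))
p$label                                        # label from the left side
results <- lapply(p$terms, eval, envir = environment())
```

Each element of results holds the value of one right-hand-side term, which a tab or callout maker could then print in turn.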
The two approaches to using maketabs
also apply to makecnote
and makecolmarg
. Examples of the “print an object and place it inside a callout” are given later in the report for makecnote
and makecolmarg
. Here is an example of the more general formula method that can render any object, including html widgets as produced by plotly
graphics. An interactive plotly
graphic appears at the bottom of the plots in the right margin. You can single click on elements in the legend to turn them off and on, and double click within the legend to restore to default values.
require(Hmisc)
options(plotlyauto=TRUE) # makes Hmisc use plotly's auto size option
# rather than computing height, width
getRs('reptools.r')
set.seed(1)
x <- round(rnorm(100, 100, 15))
makecolmarg(~ table(x) + raw + hist(x) + plot(ecdf(x)) + histboxp(x=x))
x
67 70 73 77 78 79 81 82 83 84 86 87 88 89 90 91 92 93 94 95
1 1 1 1 1 1 2 1 1 1 1 1 1 3 1 6 1 3 3 2
96 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 116 117
1 6 5 3 2 1 2 2 4 5 2 2 6 2 3 3 1 2 1 3
118 120 121 122 123 124 130 133 136
2 1 1 1 1 2 1 1 1
# or try makecnote(`makecnote example` ~ kabl(table(x)) + hist(x) + ...
# Avoid raw by using kabl(table(x)) instead of table(x)
Adding + raw
to a formula in makecnote
, makecolmarg
, or maketabs
forces printed results to be treated as raw verbatim R output.
makecallout
is a general Quarto
callout maker that implements different combinations of the following: list
or formula, print
or run code, defer executing and only produce the code to execute vs. running the code now, and close the callout or leave it open for more calls.
reptools
also has helper functions for interactively accessing information to help in report and analysis building:
Function | Purpose |
---|---|
htmlView |
view html-converted objects in RStudio View pane |
htmlViewx |
view html-converted objects in external browser |
4.4 Multi-Output Format Reports
To allow one report to be used to render multiple output formats, especially html and pdf, it is helpful to be able to sense which output format is currently in play, and to use different functions or options to render output explicitly for the current format. Here is how to create variables that can be referenced simply in code throughout the report, and to invoke the plotly
graphics package if output is in html to allow interactivity. A small function ggp
is defined so that if you run any ggplot2
output through it, the result will be automatically converted to plotly
using the ggplotly
function, otherwise it is left at standard static ggplot2
output if html is not the output target.
outfmt <- if(knitr::is_html_output ()) 'html'  else 'pdf'
markup <- if(knitr::is_latex_output()) 'latex' else 'html'
ishtml <- outfmt == 'html'
if(ishtml) require(plotly)
ggp <- if(ishtml) ggplotlyr else function(ggobject, ...) ggobject
# See below for more about ggplotlyr (a front end for ggplotly that can
# correct a formatting issue with hover text)
Quarto
has an excellent facility for conditionally including document sections depending on the currently chosen output format.
The Hmisc
, rms
, and rmsb
packages have a good deal of support for creating \(\LaTeX\) output in addition to html. They require some special \(\LaTeX\) packages to be accessed. In addition, if using any of Quarto
’s nice features for making marginal notes, there is another \(\LaTeX\) package to attach. Below you’ll find what needs to be added to the yaml
prologue at the top of your script if using Quarto
. You have to modify pdf-engine
to suit your needs. I use luatex
because it handles special unicode characters. In the future (approximately July 2022) a bug in Pandoc
will be fixed and you can put links-as-notes: true
in the yaml
header instead of redefining href
and linking in hyperref
.
format:
  html:
    self-contained: true
    . . .
  pdf:
    pdf-engine: lualatex
    toc: false
    number-sections: true
    number-depth: 2
    top-level-division: section
    reference-location: document
    listings: false
    header-includes:
      \usepackage{marginnote, here, relsize, needspace, setspace, hyperref}
      \renewcommand{\href}[2]{#2\footnote{\url{#1}}}
The href
redefinition above turns URLs into footnotes if running \(\LaTeX\).
There is one output element provided by Quarto
that will not render correctly to \(\LaTeX\): a marginal note using the markup .column-margin
. To automatically use an alternate in-body format, define a variable holding the markup appropriate to the current typesetting format.
mNote <- if(ishtml) '.column-margin' else
  '.callout-note appearance="minimal"'
Then use r mNote enclosed in back ticks in place of the .column-margin
callout for generality.
Even when producing only html, one may wish to save individual graphics for manuscript writing. For non-interactive graphics you can right click on the image and download the .png
file. For interactive plots, plotly
shows a “take a snapshot” icon when you hover over the image. Clicking this icon will produce a static .png
snapshot of the graph. Some graphs are not appropriate for static documents, and the variables created in the code above can be checked so that, for example, an alternative graph can be produced when making a .pdf
file. But in other cases one just produces an additional static plot that is not shown in the html report. See the margin note near @fig-survplotp for an example.
Hmisc
Formatting for Variable Labels in Tables
As done with various Hmisc
and rms
package functions, one can capitalize on Hmisc
’s special formatting of variable labels and units when constructing tables in \(\LaTeX\) or html. The basic constructs are shown in the code below.
# Retrieve a set of markup functions depending on typesetting format
# See above for definition of ishtml
specs <- markupSpecs[[if(ishtml) 'html' else 'latex']]
# Hmisc markupSpecs functions create plain text, html, latex,
# markdown, or plotmath code
varlabel <- specs$varlabel  # retrieve an individual function
# Format text describing variable named x
# hfill=TRUE typesets units to be right-justified in label
# Use the following character string as a row label
# Default specifies the string to use if there is no label
# (usually taken as the variable name)
varlabel(label(x, default='x'), units(x), hfill=TRUE)
Mermaid
Note: As of 2022-12-11 quarto
has withdrawn support for tooltips. I hope that is added back someday.
As exemplified in @sec-doverview, Mermaid
provides an easy way to make many types of diagrams. Diagrams are more valuable when they are dynamic. Mermaid
provides an easy way to include pop-up tooltips in diagram nodes, to provide deeper information about the node. When the tooltips contain tables whose columns need to line up, you need to put the following in your document so that tooltips will use a fixed-width font and preserve white space. The best way to include this is to put it in a .css
file that is referenced in the report’s yaml
, or to surround the four lines with <style>
… </style>
.
.mermaidTooltip {
  font-family: courier;
  white-space: pre;
}
4.5 HTML Tables
Nicely formatted tables can be created in multiple ways:
- using customized code that directly writes html markup
- using customized code that directly writes \(\LaTeX\) markup
- using customized code that writes markdown markup (e.g., “pipe” tables)
- hand coding markdown (usually pipe tables)
The latter two provide less flexibility but have the advantage of being automatically converted to html or \(\LaTeX\) depending on your destination format.
Here is an example of a hand coded markdown pipe table. Note (1) the second line of the markup indicates that the first column is to be left-justified and the second column right-justified, and (2) you can include computed values from R expressions.
| This Column | That Column |
|:-----|-----:|
| cat | dog |
| `r pi` | `r 2+3` |
: Table caption
The result is
This Column | That Column |
---|---|
cat | dog |
3.1415927 | 5 |
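A pipe table like the one above can also be written by code rather than by hand. The following is a minimal base R sketch, assuming a small data frame and a per-column alignment code; the function name pipe_table and its align argument are hypothetical and are not part of knitr or reptools.

```r
# Hypothetical helper: write a data frame as markdown pipe-table lines
pipe_table <- function(d, align = rep("l", ncol(d))) {
  mkrow <- function(v) paste0("| ", paste(v, collapse = " | "), " |")
  # Second line of the markup sets left or right justification per column
  rule <- ifelse(align == "r", "---:", ":---")
  sep  <- paste0("|", paste(rule, collapse = "|"), "|")
  body <- vapply(seq_len(nrow(d)), function(i)
    mkrow(vapply(d, function(col) as.character(col[i]), character(1))),
    character(1))
  c(mkrow(names(d)), sep, body)
}

d <- data.frame(animal = c("cat", "dog"), legs = c(4, 4))
lines <- pipe_table(d, align = c("l", "r"))
cat(lines, sep = "\n")
```

In a chunk with results='asis' the printed lines would be picked up as markdown and converted to html or \(\LaTeX\) along with the rest of the document.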
There is an automatic feature of html that makes it especially attractive as a destination format: If a cell contains a long string of characters, those strings will be line-wrapped appropriately, with the line length depending on the width of the display device.
The knitr
package kable
function provides an easy way to produce html tables from data tables/frames and matrices, and knitr::kables
allows one to put several tables together. The reptools
repository kabl
function combines the features of kable
and kables
. The kableExtra
package allows you to greatly extend what kable
can do.
There are many R packages and functions for making advanced html tables. See for example the Table 1
tab in Chapter 9. This table was produced by the Hmisc
package summaryM
function, which used the htmlTable
function in the htmlTable
package. Other packages to consider are tangram
and packages discussed here.
4.6 CSS
When producing reports in html, you can create custom html styles that quarto
will use. These styles are defined using HTML5’s CSS (cascading style sheets). An example .css
file is at hbiostat.org/rflow/h.css, and your report may gain access to such a .css
file by including a line like css: h.css
in the top-level quarto
yaml
header under the html:
section.
Two of the styles defined by h.css
are smaller
and smaller2
. smaller
will shrink the font size of a block of text (even one containing code and R output, but it does not apply to tables) to 80% of its original size. smaller2
will make it 64% of the original size. To invoke these styles we use quarto
“divs
” as follows:
::: {.smaller2}
This is text that will appear smaller ...
:::
Here is an example using smaller2
.
This is text that will appear smaller. More of the same. More of the same. More of the same. More of the same. More of the same. More of the same. More of the same. More of the same. More of the same. More of the same. More of the same. More of the same. More of the same.
X | Y |
---|---|
2.3 | 4.5 |
2.2 | 3.3 |
<- pi
x x
[1] 3.141593
Another style in h.css
is quoteit
which is useful for including quotations. The text is italicized, dark blue, 80% of regular size, and has 10% left and right margins. Here is an example.
::: {.quoteit}
Some eloquent quote appears here. The author of the quote is assumed to know what they are talking about, and seem to be able to express themselves.
:::
Some eloquent quote appears here. The author of the quote is assumed to know what they are talking about, and seem to be able to express themselves.
4.7 Diagrams
Quarto
builds in two diagramming languages: mermaid
and graphviz
. Section 8.1 has detailed examples using mermaid
, which uses a simpler format than graphviz
. graphviz
allows for more complex diagrams exemplified here and also provides more control. graphviz
nodes can include HTML tables, and you can even have arrows drawn between table cells or between a table cell and other non-table nodes. Here is an example, taken from this excellent post. Connections between diagram elements are made possible by assigning port identifiers to elements.
The diagram uses dot
, which is a primary module of graphviz
, for directed graphs.
```{dot}
digraph {
graph [pad="0.5", nodesep="0.5", ranksep="2"]
// splines=ortho for square connections
node [shape=plain]
rankdir=LR;
Foo [label=<
<table border="0" cellborder="0" cellspacing="0">
<tr><td><b><i>InputFoo</i></b></td><td><font color="blue">two</font> </td> </tr><HR/>
<tr> <td port="1">one</td><td> two </td></tr>
<tr> <td port="2">two</td><td> two </td></tr>
<tr> <td port="3">three</td><td> two </td></tr>
<tr> <td port="4">four</td><td> two </td></tr>
<tr> <td port="5">five</td><td port="a"> two </td></tr>
<tr> <td port="6">six</td><td port="b"> two </td></tr>
</table>>];
Bar [label=<This and that<br/><font face="courier" color="darkblue">and that and <b>that</b></font>>];
Foo:3:w -> Foo:2:w;
// node name:port:direction (n,ne,e,se,s,sw,w,nw,c,_)
// c=center within node, _=use appropriate node side
// See graphviz.org/docs/attr-types/portPos
Foo:3:w -> Foo:6:w;
Foo:6:w -> Foo:1:w;
Foo:1:w -> Foo:a:e;
Foo:b:e -> Bar;
}
```
The reptools
makegraphviz
function allows variable insertions into graphviz
diagrams, and if a variable to be inserted is a data frame it will be converted to a simple HTML table that graphviz
can handle. Here is an example. {{u}}
is the syntax for inserting the value of variable u
.
x <- data.frame(x1=round(runif(3), 3), x2=.q(a,b,c))
pr(obj=x)
x1 x2
1 0.268 a
2 0.219 b
3 0.517 c
z <- 'digraph {node [shape=plain];
Foo [shape=oval label=<Information about <font color="blue">{{g}}</font>>];
Bar [label=<{{u}}>]; // add shape=box to box the table
Foo -> Bar}'
makegraphviz(z, g='states', u=x, file='gvtest.dot')
The diagram is then rendered with a dot
chunk containing a special file: gvtest.dot
markup.
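The {{u}} insertion mechanism can be pictured with a short base R sketch that substitutes named values into a template and turns a data frame into a simple HTML table first. This is an illustration under stated assumptions, not the makegraphviz source: fill_template and df_to_html are hypothetical names, and the table markup is deliberately minimal.

```r
# Hypothetical helpers; not the reptools makegraphviz implementation
df_to_html <- function(d) {
  cell_row <- function(v)
    paste0("<tr>", paste0("<td>", v, "</td>", collapse = ""), "</tr>")
  rows <- vapply(seq_len(nrow(d)), function(i)
    cell_row(vapply(d, function(col) as.character(col[i]), character(1))),
    character(1))
  paste0('<table border="0">', cell_row(names(d)),
         paste(rows, collapse = ""), "</table>")
}

fill_template <- function(template, ...) {
  vals <- list(...)
  for (nm in names(vals)) {
    v <- vals[[nm]]
    if (is.data.frame(v)) v <- df_to_html(v)   # data frames become HTML tables
    template <- gsub(paste0("{{", nm, "}}"), as.character(v),
                     template, fixed = TRUE)    # literal match, no regex
  }
  template
}

z   <- 'digraph {Foo [label=<About {{g}}>]; Bar [label=<{{u}}>]; Foo -> Bar}'
d   <- data.frame(x1 = 0.5, x2 = "a")
out <- fill_template(z, g = 'states', u = d)
cat(out, "\n")
```

Writing out to a file with writeLines would give a .dot file of the kind referenced by the file: markup.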
See Section 8.1 for a more advanced graphviz
example that is along these lines. See this for some excellent graphviz
flowchart examples.