13  Graphics

I make heavy use of ggplot2, plotly, and R base graphics. plotly is used for interactive graphics, and the R plotly package provides an amazing function, ggplotly, that converts a static ggplot2 graphics object to an interactive plotly one. If the user goes to the trouble of adding labels for graphics entities (usually points, lines, curves, rectangles, and circles), those labels can become hover text in plotly without disturbing anything in static graphics. As shown here, you can detect whether an html or pdf report is being produced, and for html all ggplot2 objects can be automatically transformed to plotly.

With ggplotly extra text appears in front of labels in the hover text. Hmisc::ggplotlyr, which calls ggplotly on a ggplot2 object, removes this extra text, as shown in the prototypical example below.
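The html detection and automatic conversion described above can be sketched as follows. This is only an illustration under stated assumptions: render_plot is a hypothetical helper name that is not part of Hmisc or reptools, and the data are simulated.

```r
library(ggplot2)

# TRUE only while knitr is rendering to an html format; FALSE otherwise
ishtml <- knitr::is_html_output()

# Hypothetical helper (not from Hmisc or reptools): convert to plotly
# for html reports, otherwise return the static ggplot2 object
render_plot <- function(g) {
  if (ishtml && requireNamespace('plotly', quietly = TRUE))
    plotly::ggplotly(g)
  else g
}

set.seed(1)
d <- data.frame(x = rnorm(50), y = rnorm(50))   # simulated data
g <- ggplot(d, aes(x, y)) + geom_point()
p <- render_plot(g)   # plotly widget in html output, ggplot2 object otherwise
```

Keeping the decision in one helper means the plotting code itself stays identical for html and pdf reports.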

Many types of graphs can be created with base graphics, e.g., hist(age, nclass=50) or Ecdf(age), but using ggplot2 for even simple graphics makes it easy to handle multiple groups on one graph or to create multiple panels for strata using faceting. ggplot2 has excellent default font sizes and axis labeling that work for most sizes of plots.
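As an illustration of the point about groups and facets, the following sketch draws a base graphics histogram alongside two ggplot2 displays. The data frame d and its variables age and sex are simulated purely for the example.

```r
library(ggplot2)

set.seed(2)
d <- data.frame(age = rnorm(300, mean = 60, sd = 10),
                sex = sample(c('female', 'male'), 300, replace = TRUE))

# Base graphics: quick, but shows all subjects pooled
hist(d$age, nclass = 50)

# ggplot2: one empirical CDF per group on a single graph
g1 <- ggplot(d, aes(x = age, color = sex)) + stat_ecdf()

# ggplot2: one histogram panel per stratum via faceting
g2 <- ggplot(d, aes(x = age)) + geom_histogram(bins = 50) +
  facet_wrap(~ sex)
```

Going from the pooled display to grouped or faceted ones takes one extra aesthetic or one facet_wrap term, which is the convenience referred to above.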

Here is a prototypical ggplot2 example illustrating many of the features I most often use. Ignore the ggplot2 label attribute if not using plotly. Options are given to the Hmisc label function so that it will retrieve the variable label and units (if present) and format them for axis labels or tables. The formatting takes into account whether html output is being created and plotly is being used.

require(Hmisc)
require(ggplot2)     # attach explicitly in case it is not already attached
require(data.table)
getRs('reptools.r')
ishtml <- knitr::is_html_output()
hookaddcap()   # make knitr call a function at the end of each chunk
               # to try to automatically add to the list of figures
# Create a vector of formatted labels for all variables in data
# For variables without labels or units use the variable name
# as the label.  If html and plotly are not in effect use R's
# regular plotmath notation to typeset labels/units

getHdata(stressEcho)
d <- stressEcho
setDT(d)
nam   <- names(d)
nv    <- length(nam)
vlabs <- structure(character(nv), names=nam)
for(n in nam)
  vlabs[n] <- label(d[[n]], plot=TRUE, html=ishtml, default=n)

# Define substitutes for xlab and ylab that look up our
# constructed labels.
# Could instead directly use xlab(vlabs['age'])
labx <- function(v) xlab(vlabs[[as.character(substitute(v))]])
laby <- function(v) ylab(vlabs[[as.character(substitute(v))]])
g <-
  ggplot(d, aes(x=age, y=bhr, color=gender, label=paste0('dose:', dose))) +
         geom_point() + geom_smooth() +
         guides(color=guide_legend(title='')) +
         theme(legend.position='bottom') +  # not respected by ggplotly
         labs(caption='Scatterplot of age by basal heart rate stratified by sex') +
         labx(age) + laby(bhr)
# or just xlab('Age in years') + ylab('Basal heart rate')
# To put the caption in a different font or size use e.g.
#   theme(plot.caption=element_text(family='mono', size=7))
# Likewise for the legend
#   theme(legend.text=element_text(family='mono', size=9))

ggplotlyr(g, remove='.*): ')  # removes paste0("dose:", dose): 

# dose is in hover text for each point
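
The label lookup at the heart of the example can be tried on its own. In this sketch the vector age, its label, and its units are made up for illustration; the label calls mirror the arguments used above.

```r
library(Hmisc)

age <- c(54, 61, 47)           # made-up values
label(age) <- 'Age'            # variable label
units(age) <- 'years'          # units of measurement

label(age)                     # the plain label

# Label plus units, formatted for a static graph (html=FALSE)
# or for html/plotly output (html=TRUE)
lab_static <- label(age, plot=TRUE, html=FALSE, default='age')
lab_html   <- label(age, plot=TRUE, html=TRUE,  default='age')

# A variable with no label falls back to the supplied default
w     <- 1:3
lab_w <- label(w, default='w')
```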

13.1 Formatting Columns in Legends

If the text for the legend contains columns that you want to have lined up, build the columns so that they are of equal length and use a monospaced font, e.g.

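pad <- function(x, n)  # pad x with blanks (or truncate) to n characters
  substring(paste(x, '                       '), 1, n)
# a and b stand for the two columns of legend text;
# 10 is an example width, at least as long as the longest value of a
d$z   <- paste(pad(a, 10), b)
ggplot(d, aes(x, y, color=z)) + geom_line() +
  theme(legend.text = element_text(family='mono'))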
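
To see the padding idea in isolation, here is a base R sketch with made-up legend entries. It uses formatC as an alternative to the substring approach and takes the column width from the longest entry; arm and size are hypothetical names.

```r
# Left-justify x and pad with blanks to at least n characters
pad <- function(x, n) formatC(as.character(x), width = n, flag = '-')

arm  <- c('A', 'Placebo', 'Drug B')     # first legend column (made up)
size <- c('n=10', 'n=125', 'n=7')       # second legend column (made up)

w <- max(nchar(arm))                    # width of the widest entry
z <- paste(pad(arm, w), size)
cat(z, sep = '\n')   # in a monospaced font the second column lines up
```

Computing the width from the data avoids truncating long entries, which a fixed width could do.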

13.2 Plot Annotation

  • See this by Mine Çetinkaya-Rundel. Note that when an annotation applied to a facet or a whole plot does not use a given aesthetic (such as color used to distinguish curves within one facet), that aesthetic should not appear in ggplot() but only in the geoms that need it.
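
A small sketch of the aesthetic placement issue, using simulated data: the names d, ann, grp, panel, and txt are all made up for the illustration. The annotation data frame has no grp column, so color=grp must be given to geom_line() rather than to ggplot().

```r
library(ggplot2)

set.seed(3)
d <- data.frame(x     = rep(1:10, 4),
                grp   = rep(c('a', 'b'), each = 10, times = 2),
                panel = rep(c('P1', 'P2'), each = 20))
d$y <- d$x + rnorm(40)

# Annotation for one facet only; note there is no grp variable here
ann <- data.frame(panel = 'P1', x = 3, y = 10, txt = 'Note for P1 only')

# Works: color is mapped only in the geom that uses it
g <- ggplot(d, aes(x, y)) +
  geom_line(aes(color = grp)) +
  facet_wrap(~ panel) +
  geom_text(data = ann, aes(label = txt))

# Fails when built: geom_text() inherits color=grp but ann has no grp
bad <- ggplot(d, aes(x, y, color = grp)) +
  geom_line() +
  facet_wrap(~ panel) +
  geom_text(data = ann, aes(label = txt))
failed <- inherits(try(ggplot_build(bad), silent = TRUE), 'try-error')
```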

For large datasets the Hmisc package has a function ggfreqScatter that makes it easy to see overlapping points by color coding the frequency of points in each small bin. That way scatterplots scale to very large datasets. Here is an example:

html=TRUE is needed when the result will be converted to plotly, because otherwise axis labels are formatted using R’s plotmath, which plotly does not handle well.
set.seed(1)
x <- round(rnorm(2000), 1)
y <- 2 * (x > 1.5) + round(rnorm(2000), 1)
z <- sample(c('a', 'b'), 2000, replace=TRUE)
label(x) <- 'X Variable'   # could use xlab() &
label(y) <- 'Y Variable'   # ylab() in ggfreqScatter()
g <- ggfreqScatter(x, y, by=z, html=ishtml)
# If variables were inside a data table use
# g <- d[, ggfreqScatter(x, y, by=z, html=ishtml)]
g

Now convert the graphic to plotly if html is in effect; otherwise stay with the ggplot2 output.

if(ishtml) ggplotlyr(g) else g

When you hover the mouse over a point, its frequency pops up.
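
The idea behind frequency color coding can be sketched without Hmisc. The following is not how ggfreqScatter is implemented (among other things it groups frequencies into categories); it only shows, under that simplification, how collapsing coincident points to one row per unique (x, y) pair keeps the plot size bounded.

```r
library(ggplot2)

set.seed(1)
x <- round(rnorm(2000), 1)
y <- 2 * (x > 1.5) + round(rnorm(2000), 1)

# One row per distinct (x, y) pair, with its frequency
f <- as.data.frame(table(x = x, y = y), stringsAsFactors = FALSE)
f <- f[f$Freq > 0, ]
f$x <- as.numeric(f$x)
f$y <- as.numeric(f$y)

# Far fewer rows than points, yet no information about overlap is lost
g <- ggplot(f, aes(x, y, color = Freq)) + geom_point()
```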

Many functions in the Hmisc and rms packages produce plotly graphics directly. The plotly-based functions in these two packages try to compute optimal figure heights and widths, but it is usually better to let plotly auto-size the plots. Setting options(plotlyauto=TRUE) overrides these dimensions and forces plotly to auto-size. Putting this command in the .Rprofile file in your home directory makes this easy.
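
For concreteness, the setting looks like this, whether placed in .Rprofile or at the top of a report. The second line merely confirms the option in the current session.

```r
# In ~/.Rprofile, or at the top of a report
options(plotlyauto = TRUE)

# Confirm the setting in the current session
getOption('plotlyauto')
```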

One of the most distinctive pure plotly functions in Hmisc is dotchartpl.