Hmisc examples using plotly interactive graphics in a distill html report
This report was produced using the distill R package, which is intended for producing academic articles in html format. You can see that distill does not always interact well with user-produced html code.
This is a set of reproducible examples for the R (R Development Team 2020) Hmisc package (Harrell 2020), put together in an rmarkdown html document using RStudio and knitr. When viewing the resulting html file you can see all the code, and there exist options to download the entire rmarkdown script, which is especially helpful for seeing how knitr chunks are specified. Graphics that have a plotly method for them are rendered using plotly instead of using defaults such as base graphics, lattice, or ggplot2. That way the plots are somewhat interactive, e.g., allow for drill-down for more information without the viewer needing to install R.
Rmarkdown themes such as bookdown and rmdformats::readthedown (the latter is used here) allow one to number figures and the symbolically reference them. The knitrSet capfile='captions.md' argument below capitalizes on this to make it easy for the user to insert a table of figures anywhere in the report. Here we place it at the end. A caption listed in the table of figures is the short caption (scap= or fig.scap in the chunk header) if there is one, otherwise the long caption is used. If neither caption is used, that figure will not appear in the table of figures.
The full script for this report may be found here.
The getHdata function is used to fetch a dataset from the Vanderbilt DataSets web site hbiostat.org/data. The upData function is used to
units attributed used by Hmisc and rms functions for table making and graphicscontents is used to print a data dictionary, run through an html method for nicer output. Information about the data source may be found here. Click on the number of levels in the contents table to jump to the value labels for the variable.
getHdata(pbc)
# Have upData move units from labels to separate attribute
pbc <- upData(pbc,
fu.yrs = fu.days / 365.25,
labels = c(fu.yrs = 'Follow-up Time',
status = 'Death or Liver Transplantation'),
units = c(fu.yrs = 'year'),
drop = 'fu.days',
moveUnits=TRUE, html=TRUE)
Input object size: 80952 bytes; 19 variables 418 observations
Label for bili changed from Serum Bilirubin (mg/dl) to Serum Bilirubin
units set to mg/dl
Label for albumin changed from Albumin (gm/dl) to Albumin
units set to gm/dl
Label for protime changed from Prothrombin Time (sec.) to Prothrombin Time
units set to sec.
Label for alk.phos changed from Alkaline Phosphatase (U/liter) to Alkaline Phosphatase
units set to U/liter
Label for sgot changed from SGOT (U/ml) to SGOT
units set to U/ml
Label for chol changed from Cholesterol (mg/dl) to Cholesterol
units set to mg/dl
Label for trig changed from Triglycerides (mg/dl) to Triglycerides
units set to mg/dl
Label for platelet changed from Platelets (per cm^3/1000) to Platelets
units set to per cm^3/1000
Label for copper changed from Urine Copper (ug/day) to Urine Copper
units set to ug/day
Added variable fu.yrs
Dropped variable fu.days
New object size: 84744 bytes; 19 variables 418 observations
# The following can also be done by running this command
# to put the results in a new browser tab:
# getHdata(pbc, 'contents')
html(contents(pbc), maxlevels=10, levelType='table')
| Name | Labels | Units | Levels | Storage | NAs |
|---|---|---|---|---|---|
| bili | Serum Bilirubin | mg/dl | double | 0 | |
| albumin | Albumin | gm/dl | double | 0 | |
| stage | Histologic Stage, Ludwig Criteria | integer | 6 | ||
| protime | Prothrombin Time | sec. | double | 2 | |
| sex | 2 | integer | 0 | ||
| age | Age | double | 0 | ||
| spiders | 2 | integer | 106 | ||
| hepatom | 2 | integer | 106 | ||
| ascites | 2 | integer | 106 | ||
| alk.phos | Alkaline Phosphatase | U/liter | double | 106 | |
| sgot | SGOT | U/ml | double | 106 | |
| chol | Cholesterol | mg/dl | integer | 134 | |
| trig | Triglycerides | mg/dl | integer | 136 | |
| platelet | Platelets | per cm^3/1000 | integer | 110 | |
| drug | 3 | integer | 0 | ||
| status | Death or Liver Transplantation | integer | 0 | ||
| edema | 3 | integer | 0 | ||
| copper | Urine Copper | ug/day | integer | 108 | |
| fu.yrs | Follow-up Time | year | double | 0 |
| Variable | Levels |
|---|---|
| sex | male |
| female | |
| spiders, hepatom | absent |
| ascites | present |
| drug | D-penicillamine |
| placebo | |
| not randomized | |
| edema | no edema |
| edema, no diuretic therapy | |
| edema despite diuretic therapy |
The html method is used for the describe function, and the output is put in a scrollable box. Other than for the overall title and variable names and labels, the output size used here is 80 (0.8 × the usual font size1). But the graphical display of the descriptives statistics that follows this is probably better.
# did have results='asis' above
d <- describe(pbc)
html(d, size=80, scroll=TRUE)
| n | missing | distinct | Info | Mean | Gmd | .05 | .10 | .25 | .50 | .75 | .90 | .95 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 418 | 0 | 98 | 0.998 | 3.221 | 3.742 | 0.50 | 0.60 | 0.80 | 1.40 | 3.40 | 8.03 | 14.00 |
| n | missing | distinct | Info | Mean | Gmd | .05 | .10 | .25 | .50 | .75 | .90 | .95 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 418 | 0 | 154 | 1 | 3.497 | 0.473 | 2.750 | 2.967 | 3.243 | 3.530 | 3.770 | 4.010 | 4.141 |
| n | missing | distinct | Info | Mean | Gmd |
|---|---|---|---|---|---|
| 412 | 6 | 4 | 0.893 | 3.024 | 0.9519 |
Value 1 2 3 4 Frequency 21 92 155 144 Proportion 0.051 0.223 0.376 0.350
| n | missing | distinct | Info | Mean | Gmd | .05 | .10 | .25 | .50 | .75 | .90 | .95 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 416 | 2 | 48 | 0.998 | 10.73 | 1.029 | 9.60 | 9.80 | 10.00 | 10.60 | 11.10 | 12.00 | 12.45 |
| n | missing | distinct |
|---|---|---|
| 418 | 0 | 2 |
Value male female Frequency 44 374 Proportion 0.105 0.895
| n | missing | distinct | Info | Mean | Gmd | .05 | .10 | .25 | .50 | .75 | .90 | .95 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 418 | 0 | 345 | 1 | 50.74 | 11.96 | 33.84 | 36.37 | 42.83 | 51.00 | 58.24 | 64.30 | 67.92 |
| n | missing | distinct |
|---|---|---|
| 312 | 106 | 2 |
Value absent present Frequency 222 90 Proportion 0.712 0.288
| n | missing | distinct |
|---|---|---|
| 312 | 106 | 2 |
Value absent present Frequency 152 160 Proportion 0.487 0.513
| n | missing | distinct |
|---|---|---|
| 312 | 106 | 2 |
Value absent present Frequency 288 24 Proportion 0.923 0.077
| n | missing | distinct | Info | Mean | Gmd | .05 | .10 | .25 | .50 | .75 | .90 | .95 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 312 | 106 | 295 | 1 | 1983 | 1760 | 599.6 | 663.0 | 871.5 | 1259.0 | 1980.0 | 3826.4 | 6669.9 |
| n | missing | distinct | Info | Mean | Gmd | .05 | .10 | .25 | .50 | .75 | .90 | .95 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 312 | 106 | 179 | 1 | 122.6 | 60.45 | 54.25 | 60.45 | 80.60 | 114.70 | 151.90 | 196.47 | 219.25 |
| n | missing | distinct | Info | Mean | Gmd | .05 | .10 | .25 | .50 | .75 | .90 | .95 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 284 | 134 | 201 | 1 | 369.5 | 194.5 | 188.4 | 213.6 | 249.5 | 309.5 | 400.0 | 560.8 | 674.0 |
| n | missing | distinct | Info | Mean | Gmd | .05 | .10 | .25 | .50 | .75 | .90 | .95 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 282 | 136 | 146 | 1 | 124.7 | 64.07 | 56.00 | 63.10 | 84.25 | 108.00 | 151.00 | 195.00 | 230.95 |
| n | missing | distinct | Info | Mean | Gmd | .05 | .10 | .25 | .50 | .75 | .90 | .95 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 308 | 110 | 210 | 1 | 261.9 | 107.8 | 117.7 | 139.7 | 199.8 | 257.0 | 322.5 | 386.5 | 430.6 |
| n | missing | distinct |
|---|---|---|
| 418 | 0 | 3 |
Value D-penicillamine placebo not randomized Frequency 154 158 106 Proportion 0.368 0.378 0.254
| n | missing | distinct | Info | Sum | Mean | Gmd |
|---|---|---|---|---|---|---|
| 418 | 0 | 2 | 0.71 | 161 | 0.3852 | 0.4748 |
| n | missing | distinct |
|---|---|---|
| 418 | 0 | 3 |
Value no edema edema, no diuretic therapy
Frequency 354 44
Proportion 0.847 0.105
Value edema despite diuretic therapy
Frequency 20
Proportion 0.048
| n | missing | distinct | Info | Mean | Gmd | .05 | .10 | .25 | .50 | .75 | .90 | .95 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 310 | 108 | 158 | 1 | 97.65 | 83.16 | 17.45 | 24.00 | 41.25 | 73.00 | 123.00 | 208.10 | 249.20 |
| n | missing | distinct | Info | Mean | Gmd | .05 | .10 | .25 | .50 | .75 | .90 | .95 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 418 | 0 | 399 | 1 | 5.251 | 3.429 | 0.671 | 1.661 | 2.992 | 4.736 | 7.155 | 9.649 | 11.063 |
| lowest : | 0.1122519 | 0.1177276 | 0.1396304 | 0.1943874 | 0.2108145 |
| highest: | 12.3203285 | 12.3449692 | 12.3832991 | 12.4736482 | 13.1279945 |
# prList is in Hmisc; useful for plotting or printing a list of objects
# Can just use plot(d) if don't care about the mess
# If using html output these 2 images would not be rendered no matter what
p <- plot(d)
# The option htmlfig=2 causes markupSpecs$html$cap() to be used to
# HTML-typeset as a figure caption and to put the sub-sub section
# marker ### in front of the caption. htmlfig is the only reason
# results='asis' was needed in the chunk header
# We define a long caption for one of the plots, which does not appear
# in the table of contents
# prList works for html notebooks but not html documents
# prList(p, lcap=c('', 'These are spike histograms'), htmlfig=2)
You can also re-form multiple plotly graphs into a single HTML object. If you want to have total control over long and short figure captions, use the Hmisc putHfig function to render the result, with a caption and a short caption for the table of contents. That would have fixed a problem with the chunk below: when plotly graphics are not rendered in the usual way, the figure is not numbered and no caption appears.
htmltools::tagList(p) # lapply(p, plotly::as.widget)
You can also create figure captions outside of R code by using the smg, fcap HTML tags defined in markupSpecs. The long caption not appearing in the table of contents will be in a separate line without ###.
Produce stratified quantiles, means/SD, and proportions by treatment group. Plot the results before rendering as an advanced html table:
s <- summaryM(bili + albumin + stage + protime + sex + age + spiders +
alk.phos + sgot + chol ~ drug, data=pbc,
overall=FALSE, test=TRUE)
plot(s, which='categorical')
Figure 1: Proportions and χ2 tests for categorical variables
To construct the caption outside of the code chunk use e.g. ### r cap(‘Proportions and,’ mu$chisq(), ‘tests for categorical variables’) where a backtick is placed before r and after the last ).
plot(s, which='continuous', vars=1 : 4)
Figure 2: Extended box plots for the first 4 continuous variables
plot(s, which='continuous', vars=5 : 7)
Figure 3: Extended box plots for the remaining continuous variables
html(s, caption='Baseline characteristics by randomized treatment',
exclude1=TRUE, npct='both', digits=3, middle.bold=TRUE,
prmsd=TRUE, brmsd=TRUE, msdsize=mu$smaller2)
| Baseline characteristics by randomized treatment. | |||||
| N |
D-penicillamine N=154 |
placebo N=158 |
not randomized N=106 |
Test Statistic |
|
|---|---|---|---|---|---|
Serum Bilirubin mg/dl |
418 | 0.725 1.300 3.600 3.649 ± 5.282 |
0.800 1.400 3.200 2.873 ± 3.629 |
0.725 1.400 3.075 3.117 ± 4.043 |
F2 415=0.03, P=0.9721 |
Albumin gm/dl |
418 | 3.342 3.545 3.777 3.524 ± 0.396 |
3.212 3.565 3.830 3.516 ± 0.443 |
3.125 3.470 3.720 3.431 ± 0.435 |
F2 415=2.13, P=0.121 |
| Histologic Stage, Ludwig Criteria : 1 | 412 | 0.03 4⁄154 | 0.08 12⁄158 | 0.05 5⁄100 | χ26=5.33, P=0.5022 |
| 2 | 0.21 32⁄154 | 0.22 35⁄158 | 0.25 25⁄100 | ||
| 3 | 0.42 64⁄154 | 0.35 56⁄158 | 0.35 35⁄100 | ||
| 4 | 0.35 54⁄154 | 0.35 55⁄158 | 0.35 35⁄100 | ||
Prothrombin Time sec. |
416 | 10.000 10.600 11.400 10.800 ± 1.138 |
10.025 10.600 11.000 10.653 ± 0.851 |
10.100 10.600 11.000 10.750 ± 1.078 |
F2 413=0.23, P=0.7951 |
| sex : female | 418 | 0.90 139⁄154 | 0.87 137⁄158 | 0.92 98⁄106 | χ22=2.38, P=0.3042 |
| Age | 418 | 41.43 48.11 55.80 48.58 ± 9.96 |
42.98 51.93 58.90 51.42 ± 11.01 |
46.00 53.00 61.00 52.87 ± 9.78 |
F2 415=6.1, P=0.0021 |
| spiders | 312 | 0.29 45⁄154 | 0.28 45⁄158 | χ21=0.02, P=0.8852 | |
Alkaline Phosphatase U/liter |
312 | 922 1283 1950 1943 ± 2102 |
841 1214 2028 2021 ± 2183 |
F1 310=0.06, P=0.8123 | |
SGOT U/ml |
312 | 83.8 117.4 151.9 125.0 ± 58.9 |
76.7 111.6 151.5 120.2 ± 54.5 |
F1 310=0.55, P=0.463 | |
Cholesterol mg/dl |
284 | 254 304 377 374 ± 252 |
248 316 417 365 ± 210 |
F1 282=0.37, P=0.5453 | |
|
a b c represent the lower quartile a, the median b, and the upper quartile c for continuous variables. x ± s represents X ± 1 SD. N is the number of non-missing values. Tests used: 1Kruskal-Wallis test; 2Pearson test; 3Wilcoxon test . | |||||
with(pbc, histboxp(x=sgot, group=drug, sd=TRUE))
Figure 4: Spike histograms, means, quantiles, Gini’s mean difference, and SD stratified by treatment
The following is a better way to display proportions, for categorical variables. If computing marginal statistics by running the dataset through the Hmisc addMarginal function, the plot method with options(grType='plotly') is especially useful.
pbcm <- addMarginal(pbc, drug)
s <- summaryP(stage + sex + spiders ~ drug, data=pbc)
# putHcap('Proportions stratified by treatment')
plot(s, groups='drug')
Figure 5: Proportions (large symbols) and proportions stratified by treatment (small symbols)
s <- summaryP(stage + sex + spiders ~ drug, data=pbcm)
plot(s, marginVal='All', marginLabel='All Treatment Groups')
Figure 5: Proportions (large symbols) and proportions stratified by treatment (small symbols)
getHdata(support2)
with(support2, histboxp(x=meanbp, group=dzgroup, sd=TRUE, bins=200))
Figure 6: Spike histograms, means, quantiles, Gini’s mean difference, and SD for MAP stratified by diagnosis
As explained here, one can place captions under figures using ordinary knitr capabilities, and one can change the size of captions. The following example defines a CSS style to make captions small (here 0.6em), and produces a plot with a caption. Unlike using putHfig captions given in knitr chunks do not also appear in the table of contents.
Figure 7: This is a simple figure caption
R version 4.0.3 (2020-10-10) Platform: x86_64-pc-linux-gnu (64-bit) Running under: Pop!_OS 20.10 Matrix products: default BLAS: /usr/lib/x86_64-linux-gnu/blas/libblas.so.3.9.0 LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.9.0 attached base packages: [1] stats graphics grDevices utils datasets methods base other attached packages: [1] Hmisc_4.4-2 ggplot2_3.3.2 Formula_1.2-4 survival_3.2-7 [5] lattice_0.20-41To cite R in publication use:
R Core Team (2020). R: A Language and Environment for Statistical Computing. R Foundation for Statistical Computing, Vienna, Austria. https://www.R-project.org/.
.csl Reference Style Files# Note: mu was defined in an earlier code chunk
# Only need to install .csl file once.
mu$installcsl(rec=TRUE) # get list of recommended styles
mu$installcsl() # web search of styles meeting your criteria
# Install a .csl file to your project directory:
mu$installcsl('american-medical-association')
| Figure | Description |
|---|---|
| ?? | Two plotly graphics combined into one |
| 1 | Proportions and χ2 tests for categorical variables |
| 2 | Extended box plots for the first 4 continuous variables |
| 3 | Extended box plots for the remaining continuous variables |
| 4 | Stratified spike histograms and quantiles |
| 5 | Proportions with and without stratification by treatment |
| 6 | Stratified spike histograms and quantiles for MAP |
| 7 | This is a simple figure caption |
Note: the hidden R command that rendered the table of figures (including short captions) was markupSpecs$markdown$tof().