R Hmisc Package

Published

October 20, 2024

News

Hmisc version 5.2-0 will appear on CRAN around 2024-10-24. The most significant change is the addition of a highly efficient function for computing the pseudomedian, also known as the Hodges-Lehmann one-sample estimator. It is robust and efficient and is defined as the median of the averages of all possible pairs of values (including pairing an observation with itself). The pseudomedian was also added to the output of the describe function, where it appears under the label pMedian.
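To make the definition concrete, here is a minimal sketch in Python of the pseudomedian as the median of all pairwise averages, including each value paired with itself. The function name `pseudomedian` is hypothetical, and this brute-force version needs memory and time proportional to the square of the sample size; it illustrates the definition only and is not the efficient algorithm used in Hmisc.

```python
from itertools import combinations_with_replacement
from statistics import median


def pseudomedian(values):
    """Median of all pairwise averages (x_i + x_j) / 2 with i <= j.

    Hypothetical helper for illustration; brute force over all pairs.
    """
    xs = [float(v) for v in values]
    if not xs:
        raise ValueError("need at least one value")
    # combinations_with_replacement yields every pair with i <= j,
    # so each observation is also paired with itself.
    pair_averages = [(a + b) / 2 for a, b in combinations_with_replacement(xs, 2)]
    return median(pair_averages)


if __name__ == "__main__":
    print(pseudomedian([1, 2, 3]))       # 2.0
    print(pseudomedian([1, 2, 3, 100]))  # 2.75, while the mean is 26.5
```

The second call shows the robustness mentioned above: a single extreme value moves the mean to 26.5 but leaves the pseudomedian at 2.75.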

Hmisc version 5.1-1 appeared on CRAN on 2023-05-08 and represents a milestone in Hmisc history. Here are the most significant additions and enhancements. Of these, the describe and fit.mult.impute functions have the most extensive enhancements. describe’s print method can produce a new table format in which continuous and categorical variables are printed separately. Interactive sparklines show category details; e.g., for spike histograms, hovering over a spike shows the bin interval, the frequency count, and the particular values in the bin if they are few in number. For examples see this and this.

| Function | Purpose |
|---|---|
| fit.mult.impute | Added robust cluster sandwich covariance estimation; added method= and a stacking method to facilitate likelihood ratio tests with rms::processMI |
| testCharDateTime | New function to test whether the majority of values in a character vector are legal dates, times, or date-times |
| describe, mChoice | Improved output for multiple choice variables |
| vlab, hlab, hlabs | Fixed bug, improved logic, and also look in the global environment for labels |
| spikecomp | New options that facilitate sparklines |
| describe | print method can use the gt package and include interactive sparklines |

Hmisc version 5.1-0 was completed on 2023-04-10 and adds some major features beyond those in 5.0+. The most significant additions are:

  • fit.mult.impute: implemented robust sandwich covariance estimates during multiple imputation
  • better handling of multiple choice mChoice variables in describe and summary
  • vlab, hlab, hlabs: fixed bug and improved logic, adding a search in the global environment
  • describe: added completely new print methods for separately printing categorical vs. continuous variables and for showing frequency distributions (spike histograms for continuous variables) using interactive sparklines, making use of the gt and sparkline packages. For examples see this, where you will also see another new feature: for character variables with too many levels to tabulate, the lowest and highest alphabetic levels are listed, and the min/max/mean character width and mode category are reported.

Hmisc version 5.0-1 was completed 2023-03-05. The source version for a minor update to version 5.0-2 is available below, and binary versions are available for platforms other than the older Mac x86 hardware. Because of the number of new functions, version 5 represents the biggest update in the history of the package, which began in 1991. Another significant change is that Hmisc no longer loads other packages at startup.

The new functions are summarized below.

| Function | Purpose |
|---|---|
| rendHTML | Render HTML text whether running interactively or when rendering a report |
| princmp | Help in interpreting principal components and sparse principal components |
| getabd | Fetch datasets from The Analysis of Biological Data |
| runParallel | Make the parallel package easy to use |
| hashCheck | Run digest::digest on a series of arguments to create a hash, fetch an existing result file (an .rds file) that contains the hash of the input objects from the last time the analysis was run, and return the results stored in the file if the hashes match, or NULL otherwise |
| runifChanged | Re-run code if an input changed, as judged by hashCheck |
| hlab | Retrieve plotting-formatted variable label from a current dataset or from the object created by extractlabs, which takes priority |
| hlabs | Call ggplot2 labs() after running variable names through hlab() |
| vlab | Like hlab but returns text string form of label/units |
| extractlabs | For one or more data frames/tables, saves a data table of all variables that had a non-blank label or units attribute |
| nCoincident | Count the number of coincident x,y pairs that are likely to be hidden from view in a scatterplot |
| meltData | Take a formula and melt a data frame/table so that all right-hand-side formula variables are played against the left-hand side variable |
| ebpcomp | Compute coordinates of components of an extended box plot. Useful for adding layers to ggplot2 graphs. |
| spikecomp | Compute coordinates of components of a spike histogram |
| movStats | General function for estimating the relationship between a continuous variable and a response, possibly stratified by another variable, using overlapping moving windows |
| combine.levels | Added plevels argument and implemented new capabilities for ordered factors, for which only consecutive levels are allowed to be combined; also added m argument for all situations |
| completer | Function by Yong-Hao Pua, Singapore General Hospital, that facilitates drawing of multiple imputations to get one or more completed datasets |
| ecdfSteps | Compute coordinates of empirical CDF with possible domain extension |
| fImport | Front-end for rio package for general file import |
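The hash-and-cache idea behind hashCheck and runifChanged can be sketched generically. The Python example below is an illustration under stated assumptions, not the Hmisc code: the names `hash_inputs` and `run_if_changed` are hypothetical, SHA-256 over pickled inputs stands in for digest::digest, and a pickle file stands in for the .rds result file.

```python
import hashlib
import pickle
from pathlib import Path


def hash_inputs(*objects):
    """Combine the pickled bytes of each input object into one hex digest."""
    h = hashlib.sha256()
    for obj in objects:
        h.update(pickle.dumps(obj))
    return h.hexdigest()


def run_if_changed(compute, *inputs, cache_file):
    """Return the stored result if the input hash matches, else recompute.

    Hypothetical helper: the cache file holds the hash of the inputs from
    the last run together with the result of that run.
    """
    path = Path(cache_file)
    current = hash_inputs(*inputs)
    if path.exists():
        with path.open("rb") as f:
            stored = pickle.load(f)
        if stored["hash"] == current:
            return stored["result"]  # inputs unchanged: reuse the old result
    result = compute(*inputs)        # inputs changed or no cache yet
    with path.open("wb") as f:
        pickle.dump({"hash": current, "result": result}, f)
    return result


if __name__ == "__main__":
    import os
    import tempfile

    with tempfile.TemporaryDirectory() as d:
        cache = os.path.join(d, "result.pkl")
        print(run_if_changed(sum, [1, 2, 3], cache_file=cache))  # computed
        print(run_if_changed(sum, [1, 2, 3], cache_file=cache))  # read from cache
```

One design caveat of this sketch: pickled bytes are only a stable fingerprint within a compatible Python setup, so a production version would need a more carefully defined serialization of the inputs.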

Package Usage and Examples

Package Repositories and Updates

Bug Reports

Please go to GitHub issues

Mac Issues

If you get the following linker messages

ld: warning: directory not found for option '-L/usr/local/gfortran/lib/gcc/x86_64-apple-darwin18/8.2.0'
ld: warning: directory not found for option '-L/usr/local/gfortran/lib'
ld: library not found for -lgfortran

edit the file that sets FLIBS under /Library/Frameworks/R.framework/Resources/etc, replacing the default (the commented-out line below) with the location of your gcc directory.

 # FLIBS =  -L/usr/local/gfortran/lib/gcc/x86_64-apple-darwin18/8.2.0 -L/usr/local/gfortran/lib -lgfortran -lquadmath -lm

FLIBS =  -L/usr/local/lib/gcc/11/gcc/x86_64-apple-darwin20/11.1.0 -L/usr/local/lib/gcc/11 -lgfortran -lquadmath -lm

Thanks to John Graves, Vanderbilt University.


Page created 2004-02-15