7  Data Checking

flowchart LR
Ran[Range Checks]
Con["Cross-Variable Consistency"]
Rep[Checking and Reporting<br>With Minimal Coding] --> Li[Listings] & S[Summaries]
Ran --> Rep
Con --> Rep

Besides the descriptive statistics exemplified below, it is important to flag suspicious values in an automated way. Checking multiple columns can require running a large number of R expressions to classify observations as suspicious, so let’s automate the process somewhat by specifying a vector of expressions. We then have R “compute on the language”: it parses the expressions both to find the observations to flag and to label the printed output. This is done by the dataChk function in the qreport package.
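As a rough illustration of what “computing on the language” involves, the sketch below applies a vector of expressions to a small made-up data frame using only base R. The helper name simpleChk and the toy data are hypothetical, and this is not how dataChk is actually implemented. It only shows that all.vars can recover the variables a check mentions, that deparse can turn the check into a printable label, and that eval can apply the check to the data.

```r
# Sketch only: simpleChk is a hypothetical helper, not the qreport code
simpleChk <- function(d, checks) {
  out <- lapply(seq_along(checks), function(i) {
    chk  <- checks[[i]]            # one unevaluated call
    vars <- all.vars(chk)          # variables named in the check
    flag <- eval(chk, d)           # evaluate with the columns of d in scope
    flag <- flag & !is.na(flag)    # a missing result is not flagged
    list(check = paste(deparse(chk), collapse = ' '),
         rows  = d[flag, c('id', vars), drop = FALSE])
  })
  names(out) <- vapply(out, function(x) x$check, character(1))
  out
}

# Made-up data, unrelated to stressEcho
d <- data.frame(id  = 1:5,
                age = c(25, 40, 95, NA, 31),
                sbp = c(120, 260, 130, 110, 255))

checks <- expression(age < 30 | age > 90,
                     sbp > 250)

res <- simpleChk(d, checks)
for (r in res) {
  cat(r$check, '  n=', nrow(r$rows), '\n', sep = '')
  print(r$rows)
}
```

Because each check is stored unevaluated, the same object serves both as the filter and as the label for its listing, so adding a check means adding one line to the expression vector.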

The following code produces separate output for each data check, each in its own Quarto tab. The dataset does not have a subject ID variable, so let’s create one, and also add a site variable to print. The arguments to dataChk are chosen so that no tab is produced for a condition that never occurred in the data (omit0=TRUE), and so that an additional tab shows all data flags, sorted by id and site (byid=TRUE).

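require(Hmisc)
require(data.table)
require(qreport)
getHdata(stressEcho)     # download the stressEcho dataset
w <- stressEcho
setDT(w)                 # convert to a data.table by reference
w[, id := 1 : .N]        # create a subject ID
set.seed(1)              # make the random site assignment reproducible
w[, site := sample(LETTERS[1:6], .N, replace=TRUE)]
# One unevaluated expression per data check
checks <- expression(
  age < 30 | age > 90,
  gender == 'female' & maxhr > 170,
  baseEF %between% c(72, 77),
  baseEF > 77,
  baseEF > 99,
  sbp > 250 & maxhr < 160)
# omit0: skip checks that flag nothing; byid: add a listing of all flags by id
dataChk(w, checks, id=c('id', 'site'),
        omit0=TRUE, byid=TRUE, html=TRUE)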
Check: age < 30 | age > 90
       id   site        age 
 1:    14      B         29 
 2:    23      F         91 
 3:    30      D         26 
 4:    60      A         91 
 5:    64      D         92 
 6:   116      A         28 
 7:   235      F         91 
 8:   259      D         29 
 9:   313      C         93 
 
Check: gender == "female" & maxhr > 170
       id   site gender      maxhr 
 1:    11      A female        171 
 2:    89      B female        182 
 3:   412      E female        200 
 
Check: baseEF [72, 77]
       id   site     baseEF 
 1:    56      B         72 
 2:   200      A         75 
 3:   272      A         72 
 4:   366      E         74 
 5:   406      B         77 
 6:   433      A         74 
 7:   495      D         72 
 8:   496      E         74 
 
Check: baseEF > 77
       id   site     baseEF 
 1:   299      D         79 
 2:   434      E         83 
 
Check: sbp > 250 & maxhr < 160
       id   site        sbp      maxhr 
 1:    51      B        309        146 
 2:   146      E        283        135 
 3:   353      D        274        117 
 
 Key: <id, site>
        id   site                            Check     Values 
                                       
  1:    11      A gender == "female" & maxhr > 170 female 171 
  2:    14      B              age < 30 | age > 90         29 
  3:    23      F              age < 30 | age > 90         91 
  4:    30      D              age < 30 | age > 90         26 
  5:    51      B          sbp > 250 & maxhr < 160    309 146 
  6:    56      B                  baseEF [72, 77]         72 
  7:    60      A              age < 30 | age > 90         91 
  8:    64      D              age < 30 | age > 90         92 
  9:    89      B gender == "female" & maxhr > 170 female 182 
 10:   116      A              age < 30 | age > 90         28 
 11:   146      E          sbp > 250 & maxhr < 160    283 135 
 12:   200      A                  baseEF [72, 77]         75 
 13:   235      F              age < 30 | age > 90         91 
 14:   259      D              age < 30 | age > 90         29 
 15:   272      A                  baseEF [72, 77]         72 
 16:   299      D                      baseEF > 77         79 
 17:   313      C              age < 30 | age > 90         93 
 18:   353      D          sbp > 250 & maxhr < 160    274 117 
 19:   366      E                  baseEF [72, 77]         74 
 20:   406      B                  baseEF [72, 77]         77 
 21:   412      E gender == "female" & maxhr > 170 female 200 
 22:   433      A                  baseEF [72, 77]         74 
 23:   434      E                      baseEF > 77         83 
 24:   495      D                  baseEF [72, 77]         72 
 25:   496      E                  baseEF [72, 77]         74 
        id   site                            Check     Values 
 
                              Check n 
 1              age < 30 | age > 90 9 
 2 gender == "female" & maxhr > 170 3 
 3                  baseEF [72, 77] 8 
 4                      baseEF > 77 2 
 5                      baseEF > 99 0 
 6          sbp > 250 & maxhr < 160 3