Load(ssafety)
ssafety <- upData(ssafety, rdate=as.Date(rdate),
                  smoking=factor(smoking, 0:1, c('No','Yes')),
                  labels=c(smoking='Smoking', bmi='BMI',
                           pack.yrs='Pack Years', age='Age',
                           height='Height', weight='Weight'),
                  units=c(age='years', height='cm', weight='Kg'),
                  print=FALSE)
mtime <- function(f) format(file.info(f)$mtime)
datadate <- mtime('ssafety.rda')
primarydatadate <- mtime('ssafety.rda')
## List of lab variables that are missing too much to be used
omit <- Cs(amylase,aty.lymph,glucose.fasting,neutrophil.bands)
## Make a list that separates variables into major categories
vars <- list(baseline=Cs(age, sex, race, height, weight, bmi,
                         smoking, pack.yrs),
             ae  =Cs(headache, ab.pain, nausea, dyspepsia, diarrhea,
                     upper.resp.infect, coad),
             ekg =setdiff(names(ssafety)[c(49:53,55:56)],
                          'atrial.rate'),
             chem=setdiff(names(ssafety)[16:48],
                          c(omit, Cs(lymphocytes.abs, atrial.rate,
                                     monocytes.abs, neutrophils.seg,
                                     eosinophils.abs, basophils.abs))))
week <- ssafety$week
weeks <- sort(unique(week))
base <- subset(ssafety, week==0)
denom <- c(c(enrolled=500, randomized=nrow(base)), table(base$trx))
sethreportOption(tx.var='trx', denom=denom)
## Initialize app.tex
Philosophy
The reporting tools used here are based on a number of lessons learned from the intersection of the fields of statistical graphics, graphic design, and cognitive psychology, especially from the work of Bill Cleveland, Ralph McGill, John Tukey, Edward Tufte, and Jacques Bertin.
- Whenever largely numerical information is displayed, graphs convey the information most often needed much better than tables.
- Tables usually show more precision than is warranted by the sample information while hiding important features.
- Graphics are much better than tables for seeing patterns and anomalies.
- The best graphics are ones that make use of features that humans are most accurate in perceiving, namely position along a common scale.
- Information across multiple data categories is usually easier to judge when the categories are sorted by the numeric quantity underlying the information.
- The most robust and informative descriptive statistics for continuous variables are quantiles and whole distribution summaries.
- For group comparisons, confidence intervals for individual means, medians, or proportions are not very useful, and whether or not two confidence intervals overlap is not the correct statistical approach for judging the significance of the difference between the two. The half-width of the confidence interval for the difference, when centered at the midpoint of the two estimates, provides a succinct precision display, and this half-interval touches the two estimates if and only if there is no significant difference between the two.
- Each graphic needs a marker that provides the reader with a sense of exactly what fraction of the sample is being analyzed in that graphic.
- Tables are best used as backups to graphics.
- Tables should emphasize estimates that are not functions of the sample size. For categorical variables, proportions have interpretations independent of sample size so they are the featured estimates, and numerators and denominators are subordinate to the proportions. For continuous variables, minimum and maximum, while useful for data quality checking, are not population parameters, and they expand as n↑, so they are not proper summary statistics.
- With the availability of graphics that offer hover text, it is more effective to produce tabular information on demand. The software used here will pop up tabular information related to the point or group currently pointed to by the mouse. This makes it less necessary to produce separate tables.
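The half-width display for group comparisons described in the list above can be illustrated with a minimal R sketch. It assumes two independent estimates with known standard errors and a normal approximation. The function name `half_width_display` and the numbers are hypothetical and are not part of the reporting software.

```r
# Sketch only: assumes independent, approximately normal estimates.
half_width_display <- function(est1, se1, est2, se2, conf = 0.95) {
  z   <- qnorm(1 - (1 - conf) / 2)
  hw  <- z * sqrt(se1^2 + se2^2)      # half-width of the CI for est2 - est1
  mid <- (est1 + est2) / 2            # midpoint of the two estimates
  list(lower = mid - hw / 2,          # interval of total length hw,
       upper = mid + hw / 2,          #   centered at the midpoint
       significant = abs(est2 - est1) > hw)
}

# Hypothetical proportions and standard errors
big   <- half_width_display(0.30, 0.04, 0.45, 0.05)
small <- half_width_display(0.30, 0.04, 0.35, 0.05)
```

The two estimates sit at the midpoint plus or minus half their difference, so they fall outside the displayed interval exactly when the absolute difference exceeds the half-width, that is, when the confidence interval for the difference excludes zero.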
Notation
Dot Charts
Dot charts are used to present stratified proportions. Details, including all numerators and denominators of proportions, can be revealed by hovering the mouse over a point.
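As a rough illustration of the quantities behind such a dot chart, the following base R sketch computes stratified proportions along with their numerators and denominators. The toy data frame is hypothetical; only the column names `trx` and `smoking` are taken from this report, and this is not how the reporting software computes them internally.

```r
# Hypothetical toy data; column names mirror the report's variables.
d <- data.frame(trx     = c('A', 'A', 'A', 'B', 'B', 'B', 'B'),
                smoking = c('Yes', 'No', 'No', 'Yes', 'Yes', 'No', 'No'))

is_yes <- d$smoking == 'Yes'
num <- tapply(is_yes, d$trx, sum)      # numerators per stratum
den <- tapply(is_yes, d$trx, length)   # denominators per stratum

props <- data.frame(trx         = names(num),
                    numerator   = as.vector(num),
                    denominator = as.vector(den),
                    proportion  = as.vector(num / den))
props
```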
Survival Curves
Graphs containing pairs of Kaplan-Meier survival curves show a shaded region centered at the midpoint of the two survival estimates and having a height equal to the half-width of the approximate 0.95 pointwise confidence interval for the difference of the two survival probabilities. Time points at which the two survival estimates do not touch the shaded region denote approximately significantly different survival estimates, without any multiplicity correction. Hover the mouse to see numbers of subjects at risk at a specific follow-up time, and more information.
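The shaded region can be sketched at a single time point in base R. This is a minimal illustration that assumes Greenwood standard errors and a normal approximation for the difference; the helper names `km_at` and `km_band` and the follow-up data are hypothetical, and the reporting software may compute the interval differently.

```r
# Kaplan-Meier estimate and Greenwood standard error at time t0
km_at <- function(time, event, t0) {
  surv <- 1; gw <- 0
  for (u in sort(unique(time[event == 1 & time <= t0]))) {
    n_risk <- sum(time >= u)                # subjects still at risk at u
    d      <- sum(time == u & event == 1)   # events at u
    surv   <- surv * (1 - d / n_risk)
    if (n_risk > d) gw <- gw + d / (n_risk * (n_risk - d))
  }
  c(surv = surv, se = surv * sqrt(gw))
}

# Shaded band at t0: centered at the midpoint of the two estimates,
# with height equal to the half-width of the CI for their difference
km_band <- function(timeA, eventA, timeB, eventB, t0, conf = 0.95) {
  a   <- km_at(timeA, eventA, t0)
  b   <- km_at(timeB, eventB, t0)
  hw  <- qnorm(1 - (1 - conf) / 2) * sqrt(a[['se']]^2 + b[['se']]^2)
  mid <- (a[['surv']] + b[['surv']]) / 2
  c(survA = a[['surv']], survB = b[['surv']],
    lower = mid - hw / 2, upper = mid + hw / 2)
}

# Hypothetical follow-up times (event = 1, censored = 0)
band <- km_band(timeA = c(2, 4, 5, 7, 9),   eventA = c(1, 1, 0, 1, 0),
                timeB = c(3, 6, 8, 10, 12), eventB = c(0, 1, 0, 0, 1),
                t0 = 6)
band
```

In this toy example both survival estimates lie inside the band, which corresponds to no approximately significant difference at that time point.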
Introduction
This is a sample of the part of a closed-meeting Data Monitoring Committee report that contains software-generated results. Components related to efficacy, study design, data monitoring plan, summary of previous closed report, interpretation, protocol changes, screening, eligibility, and waiting time until treatment commencement are not included in this example. This report used a random sample of safety data from a randomized clinical trial. Randomization date, dropouts, and compliance variables were simulated, the latter two not being made consistent with the presence or absence of actual data in the random sample. The analysis file used here was last updated on 2013-10-27 10:50:46. Source analysis files were last updated on primarydatadate.
Accrual
accrualReport(randomize(rdate) ~ site(site), data=base,
              dateRange=c('1990-01-01','1994-12-31'),
              targetDate='1994-12-31', targetN=300,
              closeDate=max(base$rdate))
Study Numbers
| Number | Category |
|---|---|
| 20 | Sites |
| 250 | Participants randomized |
| 12.5 | Participants per site |
| 20 | Sites randomizing |
| 12.5 | Subjects randomized per randomizing site |
| 59.4 | Months from first subject randomized (1990-01-03) to 1994-12-15 |
| 1101.7 | Site-months for sites randomizing |
| 55.1 | Average months since a site first randomized |
| 0.23 | Participants randomized per site per month |
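The derived rows of the Study Numbers table follow from simple ratios of the other rows, as this short R sketch shows. The input values are copied from the table. The months conversion (365.25 / 12 days per month) is an assumption that happens to reproduce the rounded figure; it is not confirmed to be what `accrualReport` uses.

```r
# Values copied from the Study Numbers table
n_sites      <- 20
n_randomized <- 250
site_months  <- 1101.7

per_site       <- n_randomized / n_sites      # participants per site
avg_months     <- site_months / n_sites       # average months since a site first randomized
per_site_month <- n_randomized / site_months  # participants per site per month

# Calendar span in months; the divisor 365.25 / 12 days per month is an assumption
span_days   <- as.numeric(as.Date('1994-12-15') - as.Date('1990-01-03'))
span_months <- span_days / (365.25 / 12)

round(c(per_site, avg_months, per_site_month, span_months), 2)
```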
∟ Participants randomized over time
The blue line depicts the cumulative frequency. The thick grayscale line represents targets.
| Category | N | Used |
|---|---|---|
| Enrolled | 500 | 250 |
| Randomized | 250 | 250 |
∟ Number of sites × number of participants randomized
Number of sites having the given number of participants randomized
| Category | N | Used |
|---|---|---|
| Enrolled | 500 | 250 |
| Randomized | 250 | 250 |
∟ Participants randomized by site
Baseline Variables
# Simulate regions
set.seed(1)
base$region <- sample(c('north', 'south'), nrow(base), replace=TRUE)
dReport(sex + race + smoking ~ region + trx, groups='trx', data=addMarginal(base, region))
∟ Proportions for sex, race, and smoking stratified by region and treatment
Proportions for sex, race, and smoking stratified by region and treatment. N=250
| Category | N | Used |
|---|---|---|
| Enrolled | 500 | 250 |
| Randomized | 250 | 250 |
| A | 81 | 81 |
| B | 169 | 169 |

| Variable | A | B |
|---|---|---|
| Sex | 81 | 169 |
| Race | 81 | 169 |
| Smoking | 81 | 169 |
## Show spike histogram and quantiles for raw data
dReport(age + height + weight + bmi + pack.yrs ~ trx, data=base,
        popts=list(ncols=2))
∟ Histograms for age, height, weight, BMI, and pack years stratified by treatment
Histograms for age, height, weight, BMI, and pack years stratified by treatment. N=250
| Category | N | Used |
|---|---|---|
| Enrolled | 500 | 250 |
| Randomized | 250 | 250 |
| A | 81 | 81 |
| B | 169 | 169 |

| Variable | A | B |
|---|---|---|
| Age | 81 | 169 |
| Height | 81 | 169 |
| Weight | 81 | 169 |
| BMI | 81 | 169 |
| Pack Years | 81 | 169 |
Longitudinal Adverse Events
dReport(headache + ab.pain + nausea + dyspepsia + diarrhea +
        upper.resp.infect + coad ~ week + trx + id(id),
        groups='trx', data=ssafety, what='byx',
        popts=list(ncols=2, height=700, width=1100))
∟ Means and 0.95 bootstrap percentile confidence limits for 7 variables vs. week stratified by treatment
Means and 0.95 bootstrap percentile confidence limits for 7 variables vs. week stratified by treatment. N=250
| Category | N | Used |
|---|---|---|
| Enrolled | 500 | 250 |
| Randomized | 250 | 250 |
| A | 81 | 81 |
| B | 169 | 169 |

| Variable | A | B |
|---|---|---|
| headache | 81 | 169 |
| abdominal pain | 81 | 169 |
| nausea | 81 | 169 |
| dyspepsia | 81 | 169 |
| diarrhea | 81 | 169 |
| upper resp tract infection | 81 | 169 |
| chronic obstructive airways disease | 81 | 169 |
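For readers unfamiliar with bootstrap percentile confidence limits, the following base R sketch shows the idea for a single mean. It is a simplified illustration: the function name `boot_mean_ci` and the simulated indicator are hypothetical, and the sketch resamples observations independently rather than reproducing whatever resampling scheme the reporting software applies to the longitudinal data.

```r
# Sketch only: resamples observations independently and ignores any
# within-subject correlation across weeks.
boot_mean_ci <- function(x, B = 1000, conf = 0.95) {
  x     <- x[!is.na(x)]
  means <- replicate(B, mean(sample(x, replace = TRUE)))
  alpha <- (1 - conf) / 2
  lims  <- quantile(means, c(alpha, 1 - alpha), names = FALSE)
  c(mean = mean(x), lower = lims[1], upper = lims[2])
}

set.seed(1)
headache <- rbinom(81, 1, 0.2)   # hypothetical 0/1 adverse event indicator
ci <- boot_mean_ci(headache)
ci
```

The limits are the empirical 0.025 and 0.975 quantiles of the resampled means, so no distributional form is assumed for the mean itself.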