14 Transformations, Measuring Change, and Regression to the Mean
14.1 Transformations
 Need to transform continuous variables properly for parametric statistical methods to work well
 Normality assumption often does not hold, and the central limit theorem is of little help for non-huge datasets
 Skewed distribution
 Different standard deviation in different groups
 Transforming data can be a simple method to overcome these problems
 Nonparametric methods (Wilcoxon signed rank, Wilcoxon rank sum) another good option
 Transformations are also needed when you combine variables
 subtraction (as with change scores)
 addition (e.g., don't add ratios that are not proportions)
14.2 Logarithmic Transformation
 Replace individual value by its logarithm
 \(u = \log(x)\)
 In statistics, the logarithm always means the natural logarithm (base \(e\)), written \(\ln(x)\)
 Algebra reminders
 \(\log(ab) = \log(a) + \log(b)\)
 \(\log\left(\frac{a}{b}\right) = \log(a) - \log(b)\)
 Inverse of the log function is \(\exp(u) = x\), where \(\exp(u) = e^{u}\) and \(e\) is a constant (\(e = 2.718282 ...\))
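These identities are easy to check numerically; a quick base-R sanity check (the particular values are arbitrary):

```r
a <- 3.7; b <- 12.4   # arbitrary positive values
# Product rule: log(ab) = log(a) + log(b)
stopifnot(all.equal(log(a * b), log(a) + log(b)))
# Quotient rule: log(a/b) = log(a) - log(b)
stopifnot(all.equal(log(a / b), log(a) - log(b)))
# exp() inverts log(): exp(log(x)) = x
x <- 8.9
stopifnot(all.equal(exp(log(x)), x))
exp(1)   # the constant e = 2.718282...
```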
14.2.1 Example Dataset
 From Essential Medical Statistics, 13.2 (pre data only)
 Response: Urinary \(\beta\)-thromboglobulin (\(\beta\)-TG) excretion in 24 subjects
 24 total subjects: 12 diabetic, 12 normal
require(ggplot2)
require(data.table)
d <- rbind(
  data.frame(status='normal',
             btg=c(4.1, 6.3, 7.8, 8.5, 8.9, 10.4, 11.5, 12.0, 13.8,
                   17.6, 24.3, 37.2)),
  data.frame(status='diabetic',
             btg=c(11.5, 12.1, 16.1, 17.8, 24.0, 28.8, 33.9, 40.7,
                   51.3, 56.2, 61.7, 69.2)))
d    <- data.table(d)
meds <- d[, .(btg = median(btg)), by = status]
p1 <- ggplot(d, aes(x=status, y=btg)) +
  geom_dotplot(binaxis='y', stackdir='center', position='dodge') +
  geom_errorbar(aes(ymin=..y.., ymax=..y..), width=.25, size=1.3, data=meds) +
  xlab('') + ylab(expression(paste(beta*TG, ' (ng/day/100 ml creatinine)'))) +
  coord_flip()
p2 <- ggplot(d, aes(x=status, y=btg)) +
  scale_y_log10(breaks=c(4,5,10,15,20,30,40,60,80)) +
  geom_dotplot(binaxis='y', stackdir='center', position='dodge') +
  xlab('') + ylab(expression(paste(beta*TG, ' (ng/day/100 ml creatinine)'))) +
  coord_flip()
arrGrob(p1, p2, ncol=2)
 Original scale
 Normal: \(\overline{x}_1 = 13.53\), \(s_1 = 9.194\), \(n_1 = 12\)
 Diabetic: \(\overline{x}_2 = 35.28\), \(s_2 = 20.27\), \(n_2 = 12\)
 Logarithm scale
 Normal: \(\overline{x}_1^* = 2.433\), \(s_1^* = 0.595\), \(n_1 = 12\)
 Diabetic: \(\overline{x}_2^* = 3.391\), \(s_2^* = 0.637\), \(n_2 = 12\)
 \(t\)-test on log-transformed data
 \(s_{pool} = \sqrt{\frac{11 \times 0.595^2 + 11 \times 0.637^2}{22}} = 0.616\)
 \(t = \frac{2.433 - 3.391}{0.616\sqrt{1/12 + 1/12}} = -3.81\), \(\textrm{df} = 22\), \(p = 0.001\)
 Confidence Intervals (0.95 CI)
Note that \(t_{.975,22} = 2.074\)
For Normal subjects, a CI for the mean log \(\beta\)-TG is \[\begin{array}{ccc} 0.95 \textrm{ CI } & = & 2.433 - 2.074 \times \frac{0.595}{\sqrt{12}} \textrm{ to } 2.433 + 2.074 \times \frac{0.595}{\sqrt{12}} \\ & = & 2.08 \textrm{ to } 2.79 \end{array}\]
Can transform back to original scale by using the antilog function \(\exp(u)\) to estimate medians \[\begin{array}{ccc} \textrm{Geometric mean} & = & e^{2.433} = 11.39 \\ 0.95 \textrm{ CI } & = & e^{2.08} \textrm{ to } e^{2.79} \\ & = & 7.98 \textrm{ to } 16.27 \\ \end{array}\]
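These hand calculations are easy to reproduce in R; a sketch using the Normal-subject \(\beta\)-TG values from the dataset above (mirroring the text's use of \(t_{.975,22}\) and \(s_1\)):

```r
btg.normal <- c(4.1, 6.3, 7.8, 8.5, 8.9, 10.4, 11.5, 12.0, 13.8,
                17.6, 24.3, 37.2)
u    <- log(btg.normal)
xbar <- mean(u); s <- sd(u); n <- length(u)
round(c(mean.log = xbar, sd.log = s), 3)   # matches 2.433 and 0.595 above
ci <- xbar + c(-1, 1) * qt(.975, 22) * s / sqrt(n)  # df = 22 as in the text
exp(xbar)   # geometric mean, about 11.4 ng/day/100 ml
exp(ci)     # about 8.0 to 16.3 on the original scale
```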
t.test(btg ~ status, data=d)

	Welch Two Sample t-test

data:  btg by status
t = 3.3838, df = 15.343, p-value = 0.003982
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
  8.07309 35.41024
sample estimates:
mean in group diabetic   mean in group normal
              35.27500               13.53333
t.test(log(btg) ~ status, data=d)

	Welch Two Sample t-test

data:  log(btg) by status
t = 3.8041, df = 21.9, p-value = 0.0009776
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
 0.4352589 1.4792986
sample estimates:
mean in group diabetic   mean in group normal
              3.390628               2.433349
 Could also use a nonparametric test (Wilcoxon rank sum)
wilcox.test(btg ~ status, data=d)

	Wilcoxon rank sum test with continuity correction

data:  btg by status
W = 125.5, p-value = 0.002209
alternative hypothesis: true location shift is not equal to 0

wilcox.test(log(btg) ~ status, data=d)

	Wilcoxon rank sum test with continuity correction

data:  log(btg) by status
W = 125.5, p-value = 0.002209
alternative hypothesis: true location shift is not equal to 0
 Note that the nonparametric test gives identical results for the log-transformed outcomes, because the test depends only on the ranks of the data
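The invariance holds because the Wilcoxon statistic depends only on ranks, which any strictly increasing transformation preserves; a minimal check with made-up data:

```r
x <- c(2.3, 5.1, 7.4)          # hypothetical group 1 values
y <- c(1.2, 4.4, 9.9, 12.5)    # hypothetical group 2 values
w1 <- wilcox.test(x, y)$statistic
w2 <- wilcox.test(log(x), log(y))$statistic  # log preserves all ranks
stopifnot(w1 == w2)   # identical rank-sum statistic
```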
14.2.2 Limitations of log transformations
 Can only be used on positive numbers
 Sometimes use \(u = \log(x+1)\)
 The result is then sensitive to the arbitrary choice of origin
 Not always useful or the best transformation
 Sometimes use a dimensionality argument, e.g., take cube root of volume measurements or per unit of volume counts like blood counts
 Cube and square roots are fine with zeros
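A quick base-R check of the zero-handling point:

```r
log(0)      # -Inf: the log transformation breaks down at zero
0 ^ (1/3)   # 0: the cube root handles zeros gracefully
sqrt(0)     # 0: so does the square root
```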
14.3 Analysis of Paired Observations
 Frequently one makes multiple observations on same experimental unit
 Can’t analyze as if independent
 When two observations made on each unit (e.g., pre–post), it is common to summarize each pair using a measure of effect \(\rightarrow\) analyze effects as if (unpaired) raw data
 Most common: simple difference, ratio, percent change
 Can’t take effect measure for granted
 Subjects having large initial values may have largest differences
 Subjects having very small initial values may have largest post/pre ratios
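A tiny hypothetical pre/post example (all numbers made up) shows how the choice of effect measure changes which subject appears to respond most:

```r
pre   <- c(2, 10, 50)    # hypothetical baseline values
post  <- c(5, 15, 60)    # hypothetical follow-up values
diff  <- post - pre      # 3, 5, 10: largest for the largest baseline
ratio <- post / pre      # 2.5, 1.5, 1.2: largest for the smallest baseline
which.max(diff)          # subject 3
which.max(ratio)         # subject 1
```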
14.4 What’s Wrong with Change in General?
14.4.1 Change from Baseline in Randomized Studies
Many authors and pharmaceutical clinical trialists make the mistake of analyzing change from baseline instead of making the raw follow-up measurements the primary outcomes, covariate-adjusted for baseline. The purpose of a parallel-group randomized clinical trial is to compare the parallel groups, not to compare a patient with herself at baseline. The central question is: for two patients with the same pre measurement value of x, one given treatment A and the other treatment B, will the patients tend to have different post-treatment values? This is exactly what analysis of covariance assesses. Within-patient change is affected strongly by regression to the mean and measurement error. When the baseline value is one of the patient inclusion/exclusion criteria, the only meaningful change score requires one to have a second baseline measurement post patient qualification to cancel out much of the regression to the mean effect. It is the second baseline that would be subtracted from the follow-up measurement.
The savvy researcher knows that analysis of covariance is required to "rescue" a change score analysis. This effectively cancels out the change score and gives the right answer even if the slope of post on pre is not 1.0. But this works only in the linear model case, and it can be confusing to have the pre variable on both the left and right hand sides of the statistical model. And if \(Y\) is ordinal but not interval-scaled, the difference of two ordinal variables is no longer even ordinal. Think of how meaningless a difference from baseline in ordinal pain categories is. A major problem in the use of change score summaries, even when a correct analysis of covariance has been done, is that many papers and drug product labels still quote change scores out of context. Patient-reported outcome scales with floor or ceiling effects are particularly problematic. Analysis of change loses the opportunity to do a robust, powerful analysis using a covariate-adjusted ordinal response model such as the proportional odds or proportional hazards model. Such ordinal response models do not require one to be correct in how to transform \(Y\).
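The "cancellation" works because subtracting the baseline only shifts the baseline slope: if \(y - y_0 = \beta_0 + \beta_1 y_0 + \beta_2 t\), then \(y = \beta_0 + (\beta_1 + 1) y_0 + \beta_2 t\), leaving the treatment effect \(\beta_2\) unchanged. A simulated sketch (variable names and effect sizes are hypothetical):

```r
set.seed(1)
n  <- 200
y0 <- rnorm(n)                          # baseline measurement
tx <- rep(0:1, n / 2)                   # randomized treatment indicator
y  <- 0.6 * y0 + 0.5 * tx + rnorm(n)    # follow-up measurement
f1 <- lm(y ~ y0 + tx)                   # ANCOVA on the raw follow-up
f2 <- lm(I(y - y0) ~ y0 + tx)           # ANCOVA on the change score
# Identical treatment effect estimate; baseline slopes differ by exactly 1
stopifnot(all.equal(coef(f1)[['tx']], coef(f2)[['tx']]))
coef(f1)[['y0']] - coef(f2)[['y0']]     # 1
```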
Not only is it very problematic to analyze change scores in parallel-group designs; it is problematic even to compute them. To compute change scores requires many assumptions to hold, e.g.:
1. the variable is not used as an inclusion/exclusion criterion for the study, otherwise regression to the mean will be strong
2. if the variable is used to select patients for the study, a second post-enrollment baseline is measured and this baseline is the one used for all subsequent analysis
3. the post value must be linearly related to the pre value
4. the variable must be perfectly transformed so that subtraction “works” and the result is not baseline-dependent
5. the variable must not have floor and ceiling effects
6. the variable must have a smooth distribution
7. the slope of the pre value vs. the follow-up measurement must be close to 1.0 when both variables are properly transformed (using the same transformation on both)
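The consequences of violating the first assumption are easy to simulate: with no treatment effect at all, enrolling only subjects with high baseline values produces an apparent improvement purely from regression to the mean (all quantities hypothetical):

```r
set.seed(2)
n     <- 50000
truth <- rnorm(n)            # each subject's true stable level
pre   <- truth + rnorm(n)    # baseline = truth + measurement error
post  <- truth + rnorm(n)    # follow-up; no treatment effect anywhere
sel   <- pre > 1             # enroll only subjects with high baselines
mean(post[sel] - pre[sel])   # clearly negative: spurious "improvement"
mean(post - pre)             # near zero in the unselected population
```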
References may be found here.
Regarding 3. above, if pre is not linearly related to post, there is no transformation that can make a change score work. Two unpublished examples illustrate this problem. In a large depression study using longitudinal measurements of the Hamilton-D depression scale, there was a strong nonlinear relationship between baseline Hamilton-D and the final measurement. The form of the relationship was a flattening of the effect for larger Hamilton-D, indicating there are patients with severe depression at baseline who can achieve much larger reductions in depression than an average change score would represent (and likewise, patients with mild to moderate depression at baseline would have a reduction in depression symptoms that is far less than the average change indicates). Doing an ordinal ANCOVA adjusting for a smooth nonlinear effect of baseline using a spline function would have been a much better analysis than the change from baseline analysis done by the study leaders. In the second example, a similar result was found for a quality of life measure, KCCQ. Again, a flattening relationship for large KCCQ indicated that subtracting from baseline provided a nearly meaningless average change score.
Regarding 7. above, often the baseline is not as relevant as thought and the slope will be less than 1. When the treatment can cure every patient, the slope will be zero. When the baseline variable is irrelevant, ANCOVA will estimate a slope of approximately zero and will effectively ignore the baseline. Change from baseline will then make the change score noisier than just analyzing the final raw measurement. In general, when the relationship between pre and post is linear, the correlation between the two must exceed 0.5 for the change score to have lower variance than the raw measurement.
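The 0.5 threshold follows from \(\textrm{Var}(Y - Y_0) = \sigma^2 + \sigma^2 - 2\rho\sigma^2 = 2\sigma^2(1 - \rho)\) when pre and post share variance \(\sigma^2\) and correlation \(\rho\); this is below \(\sigma^2\) exactly when \(\rho > 1/2\). A quick simulation under these assumptions:

```r
set.seed(3)
n   <- 100000
sim <- function(rho) {
  pre  <- rnorm(n)
  post <- rho * pre + sqrt(1 - rho^2) * rnorm(n)   # Corr(pre, post) = rho
  c(raw = var(post), change = var(post - pre))
}
sim(0.7)   # change score less noisy:  var(change) near 0.6 < 1
sim(0.3)   # change score noisier:     var(change) near 1.4 > 1
```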
Bland & Altman (2011) have an excellent article about how misleading change from baseline is for clinical trials. This blog article has several examples of problematic change score analyses in the clinical trials literature.
The following R
code exemplifies how to do a powerful, robust
analysis without computing change from baseline. This analysis
addresses the fundamental treatment question posed at the top of this
section. This example uses a semiparametric ANCOVA that utilizes only
the ranks of \(Y\) and that provides the same test statistic no matter
how \(Y\) is transformed. It uses the proportional odds model, with no
binning of \(Y\), using the rms
package orm
function. This is a generalization of
the Wilcoxon test—it would be almost identical to the Wilcoxon test
had the baseline effect been exactly flat. Note that the proportional
odds assumption is more likely to be satisfied than the normal
residual, equal variance assumption of the ordinary linear model.
require(rms)
# Fit a smooth flexible relationship with baseline value y0
# (restricted cubic spline function with 4 default knots)
f <- orm(y ~ rcs(y0, 4) + treatment)
anova(f)
# Estimate mean y as a function of y0 and treatment
M <- Mean(f)   # creates R function to compute the mean
plot(Predict(f, treatment, fun=M))   # y0 set to median since not varied
# To allow treatment to interact with baseline value in a general way:
f <- orm(y ~ rcs(y0, 4) * treatment)
plot(Predict(f, y0, treatment))   # plot y0 on x-axis, 2 curves for 2 treatments
# The above plotted the linear predictor (log odds); can also plot mean
Kristoffer Magnusson has an excellent paper, Change over time is not “treatment response”.
Senn (2008) Chapter 7 has a very useful graphical
summary of the largesample variances (inefficiency) of change scores
vs. ANCOVA vs. ignoring baseline completely, as a function of the
(assumed linear) correlation between baseline and outcome.
14.4.2 Special Problems Caused By Non-Monotonic Relationship With Ultimate Outcome
Besides the strong assumptions made about how variables are transformed before a difference is calculated, there are many types of variables for which change can never be interpreted without reference to the starting point, because the relationship between the variable in question and an ultimate outcome is not even monotonic. A good example is the improper definitions of acute kidney injury (AKI) that have been accepted without question. Many of the definitions are based on change in serum creatinine (SCr). Problems with the definitions include:
 The nonmonotonic relationship between SCr and mortality demonstrates that it is not healthy to have very low SCr. This implies that increases in SCr for very low starting SCr may not be harmful as assumed in definitions of AKI.
 Given an earlier SCr measurement and a current SCr, the earlier measurement is not very important in predicting mortality once one adjusts for the last measurement. Hence a change in SCr is not very predictive, and the current SCr is all-important. As an example consider the estimated relationship between baseline SCr and mortality in critically ill ICU patients.
require(rms)
load('~/Analyses/SUPPORT/combined.sav')
combined <- subset(combined,
                   select=c(id, death, d.time, hospdead, dzgroup, age, raceh, sex))
load('~/Analyses/SUPPORT/combphys.sav')
combphys <- subset(combphys, !is.na(crea1+crea3),
                   select=c(id,crea1,crea3,crea7,crea14,crea25,alb3,
                            meanbp3,pafi3,wblc3))
w <- merge(combined, combphys, by='id')
u <- 'mg/dl'
w <- upData(w, labels=c(crea1='Serum Creatinine, Day 1',
                        crea3='Serum Creatinine Day 3',
                        crea14='Serum Creatinine Day 14'),
            units=c(crea1=u, crea3=u, crea7=u, crea14=u, crea25=u),
            print=FALSE)
w  <- subset(w, crea1 < 2)
dd <- datadist(w); options(datadist='dd')
h <- lrm(hospdead ~ rcs(crea1, 5) + rcs(crea3, 5), data=w)
anova(h)
Wald Statistics for hospdead

|                 | χ²     | d.f. | P       |
|-----------------|--------|------|---------|
| crea1           | 19.52  | 4    | 0.0006  |
| Nonlinear       | 15.60  | 3    | 0.0014  |
| crea3           | 108.11 | 4    | <0.0001 |
| Nonlinear       | 49.98  | 3    | <0.0001 |
| TOTAL NONLINEAR | 132.06 | 6    | <0.0001 |
| TOTAL           | 217.11 | 8    | <0.0001 |
h <- lrm(hospdead ~ sex * rcs(crea3, 5), data=w)
p <- Predict(h, crea3, sex, fun=plogis)
ggplot(p, ylab='Risk of Hospital Death')