For more information see RMS Chapter 7

15.1 Introduction

A

Serial data, also called longitudinal data, repeated measures, or panel data, present a special challenge in that the response variable is multivariate, and that the responses measured at different times (or other conditions such as doses) but on the same subject are correlated with each other. One expects correlations between two measurements measured closer together will exceed correlations for two distant measurements on the same subject. Analyzing the repeated measurements just as though they were independent measurements falsely inflates the sample size and results in a failure to preserve type I error and confidence interval coverage. For example, having three measurements on 10 subjects results in 30 measurements, but depending on the correlations among the three measurements within subject, the effective sample size might be 16, for example. In other words, the statistical information for collecting 3 measurements on each of 10 subjects might provide the same statistical information and power as having one measurement on each of 16 subjects. The statistical information would however be less than that from 30 subjects measured once, if the intra-subject correlation exceeds zero.

The most common problems in analyzing serial data are

B
  1. treating repeated measurements per subject as if they were from separate subjects
  2. using two-way ANOVA as if different measurement times corresponded to different groups of subjects
  3. using repeated measures ANOVA which assumes that the correlation between any two points within subject is the same regardless of how far apart the measures were timed
  4. analyzing more than 3 time points as if time is a categorical rather than a continuous variable
    • multiplicity problem
    • analyses at different times may be inconsistent since times are not connected
    • loss of power by not jointly analyzing the serial data

15.2 Analysis Options

There are several overall approaches to the analysis of serial measurements. Some of the older approaches such as multiple \(t\)-tests and repeated measures ANOVA are now considered obsolete because of the availability of better methods that are more flexible and robust1. Separate \(t\)-tests at each time point do not make good use of available information, use an inefficient estimate of \(\sigma^2\), do not interpolate time points, and have multiple comparison problems. Since a multiplicity adjustment for multiple correlated \(t\)-tests is not model-based, the resulting confidence intervals and \(P\)-values are conservative. To preserve type I error, one always must sacrifice type II error, but in this case the sacrifice is too severe. In addition, investigators are frequently confused by the \(t\)-test being “significant” at one time point and not at another, and make the unwarranted claim that the treatment is effective only at the first time. Besides not recognizing the absence of evidence is not evidence for absence problem, the investigator can be mislead by increased variability in response at the second time point driving the \(t\) ratio towards zero. This variability may not even be real but may reflect instability in estimating \(\sigma\) separately at each time point.

C

1 Repeated measures ANOVA makes stringent assumptions and requires complex adjustments for within-subject correlation.

Especially when there are more than three unique measurement times, it is advisable to model time as a continuous variable. When estimation of the time-response profile is of central importance, that may be all that’s needed. When comparing time-response profiles (e.g., comparing two treatments) one needs to carefully consider the characteristics of the profiles that should be tested, i.e., where the type I and II errors should be directed, for example

D
  • difference in slope
  • difference in area under the curve
  • difference in mean response at the last planned measurement time
  • difference in mean curves at any time, i.e., whether the curves have different heights or shapes anywhere

The first 3 are 1 d.f. tests, and the last is a 2 d.f. test if linearity is assumed, and \(> 2\) d.f. if fitting a polynomial or spline function.

15.2.1 Joint Multivariate Models

E

This is the formal fully-specified statistical model approach whereby a likelihood function is formed and maximized. This handles highly imbalanced data, e.g., one subject having one measurement and another having 100. It is also the most robust approach to non-random subject dropouts. These two advantages come from the fact that full likelihood models “know” exactly how observations from the same subject are connected to each other.

Examples of full likelihood-based models include generalized least squares (GLS), mixed effects models, and Bayesian hierarchical models. GLS only handles continuous \(Y\) and assumes multivariate normality. It does not allow different subjects to have different slopes. But it is easy to specify and interpret, to have its assumptions checked, and it runs faster. Mixed effects models can handle multiple hierarchical levels (e.g., state/hospital/patient) and random slopes whereby subjects can have different trajectories. Mixed effects models can be generalized to binary and other endpoints but lose their full likelihood status somewhat when these extensions are used, unless a Bayesian approach is used.

  • Mixed effects model have random effects (random intercepts and possibly also random slopes or random shapes) for subjects
  • These allow estimation of the trajectory for an individual subject
  • When the interest is instead on group-level effects (e.g., average difference between treatments, over subjects), GLS squares models may be more appropriate
  • The generalization of GLS to non-normally distributed \(Y\) is marginalized models (marginalized over subject-level effects)

GLS, formerly called growth curve models, is the oldest approach and has excellent performance when \(Y\) is conditionally normally distributed.

15.2.2 GEE

Generalized estimating equations is usually based on a working independence model whereby ordinary univariate regressions are fitted on a combined dataset as if all observations are uncorrelated, and then an after-the-fit correction for intra-cluster correlation is done using the cluster sandwich covariance estimator or the cluster bootstrap. GEE is very non-robust to non-random subject dropout; it assumes missing response values are missing completely at random. It may also require large sample sizes for \(P\)-values and confidence intervals to be accurate. An advantage of GEE is that it extends easily to every type of response variable, including binary, ordinal, polytomous, and time-to-event responses.

F

15.2.3 Summary Measures

A simple and frequently effective approach is to summarize the serial measures from each subject using one or two measures, then to analyze these measures using traditional statistical methods that capitalize on the summary measures being independent for different subjects. This has been called

GH
  1. Two-stage derived variable analysis (Diggle et al., 2002)
  2. Response feature analysis (Dupont, 2008)
  3. Longitudinal analysis through summary measures (Matthews et al., 1990)

An excellent overview may be found in Matthews et al. (1990), Dupont (2008) (Chapter 11), and Senn et al. (2000).

Frequently chosen summary measures include the area under the time-response curve, slope, intercept, and consideration of multiple features simultaneously, e.g., intercept, coefficient of time, coefficient of time squared when fitting each subject’s data with a quadratic equation. This allows detailed analyses of curve shapes.

15.3 Case Study

15.3.1 Data and Summary Measures

Consider the isoproterenol dose-response analysis (Dupont, 2008) of the original data from Lang2. Twenty two normotensive men were studied, 9 of them black and 13 white. Blood flow was measured before the drug was given, and at escalating doses of isoproterenol. Most subjects had 7 measurements, and these are not independent. Note: Categorization of race is arbitrary and represents a gross oversimplification of genetic differences in physiology.

I

2 CC Lang etal NEJM 333:155-60, 1995

Code
require(Hmisc)
require(data.table)   # elegant handling of aggregation
require(ggplot2)
d <- csv.get('https://hbiostat.org/data/repo/11.2.Long.Isoproterenol.csv')
d <- upData(d, keep=c('id', 'dose', 'race', 'fbf'),
            race  =factor(race, 1:2, c('white', 'black')),
            labels=c(fbf='Forearm Blood Flow'),
            units=c(fbf='ml/min/dl'))
Input object size:   13168 bytes;    8 variables     154 observations
Modified variable   race
Kept variables  id,dose,race,fbf
New object size:    6496 bytes; 4 variables 154 observations
Code
setDT(d, key=c('id', 'race'))

J
Code
# Fit subject-by-subject spline fits and either return the coefficients,
# the estimated area under the curve from [0,400], or evaluate each
# subject's fitted curve over a regular grid of 150 doses
# Area under curve is divided by 400 to get a mean function
require(rms)
options(prType='html')
g <- function(x, y, what=c('curve', 'coef', 'area')) {
  what <- match.arg(what)   # 'curve' is default
  knots <- c(20, 60, 150)
  f <- ols(y ~ rcs(x, knots))
  xs <- seq(0, 400, length=150)
  switch(what,
         coef = {k <- coef(f)
                 list(b0 = k[1], b1=k[2], b2=k[3])},
         curve= {x <- seq(0, 400, length=150)
                 list(dose=xs, fbf=predict(f, data.frame(x=xs)))},
         area = {antiDeriv = rcsplineFunction(knots, coef(f),
                   type='integral')
                 list(dose  = 400, fbf=y[x == 400],
                      area  = antiDeriv(400) / 400,
                      tarea = areat(x, y) / 400)} )
}
# Function to use trapezoidal rule to compute area under the curve
areat <- function(x, y) {
  i <- ! is.na(x + y)
  x <- x[i]; y <- y[i]
  i <- order(x)
  x <- x[i]; y <- y[i]
  if(! any(x == 400)) NA else
  sum(diff(x) * (y[-1] + y[-length(y)]))/2
}

w <- d[, j=g(dose, fbf), by = list(id, race)]   # uses data.table package
a <- d[, j=g(dose, fbf, what='area'), by = list(id, race)]

ggplot(d, aes(x=dose, y=fbf, color=factor(id))) +
       geom_line() + geom_line(data=w, alpha=0.25) +
       geom_text(aes(label = round(area,1)), data=a, size=2.5,
                 position=position_dodge(width=50)) +
       xlab('Dose') + ylab(label(d$fbf, units=TRUE, plot=TRUE)) +
       facet_grid(~ race) +
       guides(color=FALSE)
Figure 15.1: Spaghetti plots for isoproterenol data showing raw data stratified by race. Next to each curve is the area under the curve divided by 400 to estimate the mean response function. The area is computed analytically from a restricted cubic spline function fitted separately to each subject’s dose-response curve. Shadowing the raw data are faint lines depicting spline fits for each subject

K
Code
ggplot(a, aes(x=tarea, y=area, color=race)) + geom_point() +
  geom_abline(col=gray(.8)) +
  xlab('Area by Trapezoidal Rule / 400') +
  ylab('Area by Spline Fit / 400')
Figure 15.2: AUC by curve fitting and by trapezoidal rule

When a subject’s dose (or time) span includes the minimum and maximum values over all subjects, one can use the trapezoidal rule to estimate the area under the response curve empirically. When interior points are missing, linear interpolation is used. The spline fits use nonlinear interpolation, which is slightly better, as is the spline function’s assumption of continuity in the slope. Figure 15.2 compares the area under the curve (divided by 400 to estimate the mean response) estimated using the the two methods. Agreement is excellent. In this example, the principle advantage of the spline approach is that slope and shape parameters are estimated in the process, and these parameters may be tested separately for association with group (here, race). For example, one may test whether slopes differ across groups, and whether the means, curvatures, or inflection points differ. One could also compare AUC from a sub-interval of \(X\).

L
Code
ggplot(a, aes(x=race, y=area)) +
  geom_boxplot(alpha=.5, width=.25) + geom_point() + coord_flip() +
  ylab(expression(paste('Mean Forearm Blood Flow,  ', scriptstyle(ml/min/dl))))
Figure 15.3: Mean blood flow computed from the areas under the spline curves, stratified by race, along with box plots

15.3.2 Nonparametric Test of Race Differences in AUC


A minimal-assumption approach to testing for differences in isoproterenol dose-response between races is to apply the Wilcoxon test to the normalized AUCs (mean response functions).

M
Code
wilcox.test(area ~ race, data=a, conf.int=TRUE)

    Wilcoxon rank sum exact test

data:  area by race
W = 112, p-value = 7.639e-05
alternative hypothesis: true location shift is not equal to 0
95 percent confidence interval:
  5.670113 16.348121
sample estimates:
difference in location 
              11.25836 

There is strong evidence that the mean response is greater for whites.

15.3.3 Nonparametric Test of General Curve Differences

O’Brien (1988) proposed a method for using logistic regression to turn the Hotelling \(T^2\) test on its side. The Hotelling test is the multivariate analog of the two-sample \(t\)-test, and can be used to test simultaneously such things as whether a treatment modifies either systolic or diastolic blood pressure. O’Brien’s idea was to test whether systolic or diastolic blood pressure (or both) can predict which treatment was given. Here we use the idea to test for race differences in the shape of the dose-response curves. We do this by predicting race from a 3-predictor model—one containing the intercept, the next the coefficient of the linear dose effect and the third the coefficient of the nonlinear restricted cubic spline term (differences in cubes). These coefficients were estimated using ordinary least squares in separately predicting each subject’s relationship between dose and forearm blood flow.

N
Code
h <- d[, j=g(dose, fbf, what='coef'), by = list(id, race)]
h
Key: <id, race>
            id   race         b0         b1          b2
    <labelled> <fctr>      <num>      <num>       <num>
 1:          1  white -0.1264763 0.32310327 -0.34253352
 2:          2  white  2.1147707 0.22798336 -0.21959542
 3:          3  white  2.3378721 0.06927717 -0.03625384
 4:          4  white  1.4822502 0.19837740 -0.20727584
 5:          5  white  2.5366751 0.15399740 -0.14756999
 6:          6  white  3.2117187 0.19910942 -0.18650561
 7:          7  white  1.4264366 0.05261565 -0.03871545
 8:          8  white  3.0999999 0.21833887 -0.19813457
 9:          9  white  5.1507764 0.45026617 -0.48013920
10:         10  white  4.4778127 0.23853904 -0.23289815
11:         11  white  1.9052885 0.13548226 -0.13917910
12:         12  white  2.1828176 0.07558431 -0.05524955
13:         13  white  2.9318982 0.12776900 -0.10679867
14:         14  black  2.3336099 0.02679742 -0.02856275
15:         15  black  1.8356227 0.07652884 -0.07972036
16:         16  black  2.5342537 0.02290717 -0.02585081
17:         17  black  2.0254606 0.06002835 -0.06261969
18:         18  black  3.3279080 0.07620477 -0.08062536
19:         19  black  1.9308650 0.03844018 -0.04060065
20:         20  black  1.7263259 0.12358392 -0.13595538
21:         21  black  1.3215502 0.03528716 -0.03480467
22:         22  black  2.0828281 0.03143768  0.02251155
            id   race         b0         b1          b2
Code
f <- lrm(race ~ b0 + b1 + b2, data=h, x=TRUE, y=TRUE)
f

Logistic Regression Model

lrm(formula = race ~ b0 + b1 + b2, data = h, x = TRUE, y = TRUE)
Model Likelihood
Ratio Test
Discrimination
Indexes
Rank Discrim.
Indexes
Obs 22 LR χ2 19.73 R2 0.798 C 0.957
white 13 d.f. 3 R23,22 0.533 Dxy 0.915
black 9 Pr(>χ2) 0.0002 R23,16 0.650 γ 0.915
max |∂log L/∂β| 0.0003 Brier 0.080 τa 0.463
β S.E. Wald Z Pr(>|Z|)
Intercept   3.6618   4.1003 0.89 0.3718
b0   1.3875   2.5266 0.55 0.5829
b1  -165.3481  92.7864 -1.78 0.0747
b2  -101.1667  63.1549 -1.60 0.1092

The likelihood ratio \(\chi^{2}_{3}=19.73\) has \(P=0.0002\) indicating strong evidence that the races have different averages, slopes, or shapes of the dose-response curves. The \(c\)-index of 0.957 indicates nearly perfect ability to separate the races on the basis of three curve characteristics (although the sample size is small). We can use the bootstrap to get an overfitting-corrected index.

O
Code
set.seed(2)
v <- validate(f, B=1000)
v
Index Original
Sample
Training
Sample
Test
Sample
Optimism Corrected
Index
Successful
Resamples
Dxy 0.9145 0.8856 0.8562 0.0294 0.8851 1000
R2 0.7985 0.9341 0.6723 0.2618 0.5366 1000
Intercept 0 0 0.1183 -0.1183 0.1183 1000
Slope 1 1 0.3043 0.6957 0.3043 1000
Emax 0 0 0.2689 0.2689 0.2689 1000
D 0.8513 1.1279 0.6581 0.4699 0.3814 1000
U -0.0909 -0.0909 Inf -Inf Inf 1000
Q 0.9422 1.2189 -Inf Inf -Inf 1000
B 0.0803 0.0226 0.0765 -0.054 0.1342 1000
g 6.7833 124.7126 4.7468 119.9659 -113.1826 1000
gp 0.4712 0.4704 0.4288 0.0416 0.4296 1000

The overfitting-corrected \(c\)-index is \(c = \frac{D_{xy} + 1}{2}\) = 0.94.

15.3.4 Model-Based Analysis: Generalized Least Squares

Generalized least squares (GLS) is the first generalization of ordinary least squares (multiple linear regression). It is described in detail in Regression Modeling Strategies Chapter 7 where a comprehensive case study is presented. The assumptions of GLS are

PQ
  • All the usual assumptions about the right-hand-side of the model related to transformations of \(X\) and interactions
  • Residuals have a normal distribution
  • Residuals have constant variance vs. \(\hat{Y}\) or any \(X\) (but the G in GLS also refers to allowing variances to change across \(X\))
  • The multivariate responses have a multivariate normal distribution conditional on \(X\)
  • The correlation structure of the conditional multivariate distribution is correctly specified

With fully specified serial data models such as GLS, the fixed effects of time or dose are modeled just as any other predictor, with the only difference being that it is the norm to interact the time or dose effect with treatment or whatever \(X\) effect is of interest. This allows testing hypotheses such as

R
  • Does the treatment effect change over time? (time \(\times\) treatment interaction)
  • Is there a time at which there is a treatment effect? (time \(\times\) treatment interaction + treatment main effect combined into a chunk test)
  • Does the treatment have an effect at time \(t\)? (difference of treatments fixing time at \(t\), not assuming difference is constant across different \(t\))

In the majority of longitudinal clinical trials, the last hypothesis is the most important, taking \(t\) as the end of treatment point. This is because one is often interested in where patients ended up, not just whether the treatment provided temporary relief.

S

Now consider the isoproterenol dataset and fit a GLS model allowing for the same nonlinear spline effect of dose as was used above, and allowing the shapes of curves to be arbitrarily different by race. We impose a continuous time AR1 correlation structure on within-subject responses. This is the most commonly used correlation structure; it assumes that the correlation between two points is exponentially declining as the difference between two times or doses increases. We fit the GLS model and examine the equal variance assumption.

T
Code
require(nlme)
dd <- datadist(d); options(datadist='dd')
a <- Gls(fbf ~ race * rcs(dose, c(20,60,150)), data=d,
         correlation=corCAR1(form = ~ dose | id))
plot(fitted(a), resid(a))
Figure 15.4: Residual plot for generalized least squares fit on untransformed fbf

The variance of the residuals is clearly increasing with increasing dose. Try log transforming both fbf and dose. The log transformation requires a very arbitrary adjustment to dose to handle zeros.

Code
a <- Gls(log(fbf) ~ race * rcs(log(dose + 1), log(c(20,60,150)+1)), data=d,
         correlation=corCAR1(form = ~ dose | id))
anova(a)
Wald Statistics for log(fbf)
χ2 d.f. P
race (Factor+Higher Order Factors) 99.54 3 <0.0001
All Interactions 19.30 2 <0.0001
dose (Factor+Higher Order Factors) 312.10 4 <0.0001
All Interactions 19.30 2 <0.0001
Nonlinear (Factor+Higher Order Factors) 2.01 2 0.3667
race × dose (Factor+Higher Order Factors) 19.30 2 <0.0001
Nonlinear 0.07 1 0.7969
Nonlinear Interaction : f(A,B) vs. AB 0.07 1 0.7969
TOTAL NONLINEAR 2.01 2 0.3667
TOTAL NONLINEAR + INTERACTION 21.16 3 <0.0001
TOTAL 391.48 5 <0.0001

There is little evidence for a nonlinear dose effect on the log scale, implying that the underlying model is exponential on the original \(X\) and \(Y\) scales. This is consistent with Dupont (2008). Re-fit the model as linear in the logs. Before taking this as the final model, also fit the same model but using a correlation pattern based on time rather than dose. Assume equal time spacing during dose escalation.

U
Code
a <- Gls(log(fbf) ~ race * log(dose + 1), data=d,
         correlation=corCAR1(form = ~ dose | id))
d$time <- match(d$dose, c(0, 10, 20, 60, 150, 300, 400)) - 1
b <- Gls(log(fbf) ~ race * log(dose + 1), data=d,
         correlation=corCAR1(form = ~ time | id))
AIC(a);AIC(b)
[1] 231.3731
[1] 161.3765
Code
b

Generalized Least Squares Fit by REML

Gls(model = log(fbf) ~ race * log(dose + 1), data = d, correlation = corCAR1(form = ~time | 
    id))
Obs 150 Log-restricted-likelihood -74.69
Clusters 22 Model d.f. 3
g 0.755 σ 0.5023
d.f. 146
β S.E. t Pr(>|t|)
Intercept   0.9851  0.1376 7.16 <0.0001
race=black  -0.2182  0.2151 -1.01 0.3120
dose   0.3251  0.0286 11.38 <0.0001
race=black × dose  -0.1421  0.0446 -3.19 0.0018
Correlation Structure: Continuous AR(1)
 Formula: ~time | id 
 Parameter estimate(s):
      Phi 
0.6886846 
Code
anova(b)
Wald Statistics for log(fbf)
χ2 d.f. P
race (Factor+Higher Order Factors) 32.45 2 <0.0001
All Interactions 10.16 1 0.0014
dose (Factor+Higher Order Factors) 158.11 2 <0.0001
All Interactions 10.16 1 0.0014
race × dose (Factor+Higher Order Factors) 10.16 1 0.0014
TOTAL 180.17 3 <0.0001

Lower AIC is better, so it is clear that time-based correlation structure is far superior to dose-based. We will used the second model for the remainder of the analysis. But first we check some of the model assumptions.

V
Code
w <- data.frame(residual=resid(b), fitted=fitted(b))
p1 <- ggplot(w, aes(x=fitted, y=residual)) + geom_point()
p2 <- ggplot(w, aes(sample=residual)) + stat_qq()
gridExtra::grid.arrange(p1, p2, ncol=2)
Figure 15.5: Checking assumptions of the GLS model that is linear after logging dose and blood flow. The graph on the right is a QQ-plot to check normality of the residuals from the model, where linearity implies normality.

The test for dose \(\times\) race interaction in the above ANOVA summary of Wald statistics shows strong evidence for difference in curve characteristics across races. This test agrees in magnitude with the less parametric approach using the logistic model above. But the logistic model also tests for an overall shift in distributions due to race, and the more efficient test for the combined race main effect and dose interaction effect from GLS is more significant with a Wald \(\chi^{2}_{2} = 32.45\)3. The estimate of the correlation between two log blood flows measured on the same subject one time unit apart is 0.69.

W

3 The \(\chi^2\) for race with the dose-based correlation structure was a whopping 100 indicating that lack of fit of the correlation structure can have a significant effect on the rest of the GLS model.

The equal variance and normality assumptions appear to be well met as judged by Figure 15.5.

Now estimate the dose-response curves by race, with pointwise confidence intervals and simultaneous intervals that allow one to make statements about the entire curves4. Anti-log the predicted values to get predictions on the original blood flow scale. Anti-logging predictions from a model that assumes a normal distribution on the logged values results in estimates of the median response.

4 Since the model is linear in log dose there are two parameters involving dose—the dose main effect and the race \(\times\) dose interaction effect. The simultaneous inference adjustment only needs to reflect simultaneity in two dimensions.

X
Code
dos <- seq(0, 400, length=150)
p <- Predict(b, dose=dos, race, fun=exp)
s <- Predict(b, dose=dos, race, fun=exp, conf.type='simultaneous')
ps <- rbind(Pointwise=p, Simultaneous=s)
ggplot(ps, ylab=expression(paste('Median Forearm Blood Flow,  ',
                          scriptstyle(ml/min/dl))))
Figure 15.6: Pointwise and simultaneous confidence bands for median dose-response curves by race

Finally we estimate the white:black fold change (ratio of medians) as a function of dose with simultaneous confidence bands.

Y
Code
k <- contrast(b, list(dose=dos, race='white'),
                 list(dose=dos, race='black'), conf.type='simultaneous')
k <- as.data.frame(k[c('dose', 'Contrast', 'Lower', 'Upper')])
ggplot(k, aes(x=dose, y=exp(Contrast))) + geom_line() +
  geom_ribbon(aes(ymin=exp(Lower), ymax=exp(Upper)), alpha=0.2, linetype=0,
              show_guide=FALSE) +
  geom_hline(yintercept=1, col='red', size=.2) +
  ylab('White:Black Ratio of Median FBF')
Figure 15.7: White:black fold change for median response as a function of dose, with simultaneous confidence band

By comparing the simultaneous confidence binds to the red horizontal line, one can draw the inference that the dose-response for blacks is everywhere different than that for whites when the dose exceeds zero.

15.3.5 Longitudinal Bayesian Mixed Effects Semiparametric Model

  • Proportional odds model with random effects
  • Robust with regard to distribution
  • Compound symmetry may not be realistic
  • Model estimates logits of exceedance probabilities
  • Convert these to means on the original scale
Code
require(rmsb)
# Use cmdstan instead of rstan to get more current methods and
# better performance
options(mc.cores = parallel::detectCores() - 1, rmsb.backend='cmdstan')
cmdstanr::set_cmdstan_path('/usr/local/bin/cmdstan-2.34.1')

b <- blrm(fbf ~ race * rcs(dose, 5) + cluster(id), data=d)
Running MCMC with 4 chains, at most 11 in parallel...

Chain 1 finished in 3.4 seconds.
Chain 2 finished in 3.4 seconds.
Chain 3 finished in 3.4 seconds.
Chain 4 finished in 3.4 seconds.

All 4 chains finished successfully.
Mean chain execution time: 3.4 seconds.
Total execution time: 3.5 seconds.
Code
b

Bayesian Proportional Odds Ordinal Logistic Model

Dirichlet Priors With Concentration Parameter 0.023 for Intercepts

blrm(formula = fbf ~ race * rcs(dose, 5) + cluster(id), data = d)
Frequencies of Missing Values Due to Each Variable
        fbf        race        dose cluster(id) 
          4           0           0           0 
Mixed Calibration/
Discrimination Indexes
Discrimination
Indexes
Rank Discrim.
Indexes
Obs 150 LOO log L -769.93±17.16 g 6.583 [5.499, 7.898] C 0.864 [0.856, 0.869]
Draws 4000 LOO IC 1539.87±34.33 gp 0.471 [0.456, 0.485] Dxy 0.729 [0.712, 0.738]
Chains 4 Effective p 256.35±8.92 EV 0.751 [0.695, 0.816]
Time 4.2s B 0.116 [0.103, 0.134] v 33.444 [22.864, 47.967]
p 9 vp 0.187 [0.172, 0.202]
Cluster on id
Clusters 22
σγ 2.6019 [1.7642, 3.5987]
Mean β Median β S.E. Lower Upper Pr(β>0) Symmetry
race=black   -0.9182   -0.8907  1.3934   -3.6121   1.9406  0.2495  0.98
dose   0.3319   0.3311  0.0477   0.2390   0.4266  1.0000  1.05
dose'  -13.1109  -13.1255  4.3660  -21.6979  -4.8469  0.0013  0.97
dose''   15.7752   15.7636  5.7159   4.5251  26.6165  0.9962  1.02
dose'''   -2.3498   -2.3612  1.5030   -5.3500   0.5005  0.0558  1.01
race=black × dose   -0.1590   -0.1585  0.0607   -0.2759  -0.0368  0.0045  1.01
race=black × dose'   5.1099   5.0442  6.4426   -8.1835  17.8823  0.7908  0.97
race=black × dose''   -5.7660   -5.6494  8.4819  -22.4345  11.8712  0.2418  1.04
race=black × dose'''   0.2918   0.2808  2.2823   -4.4436   4.7783  0.5470  0.97
Figure 15.8: MCMC diagnostics
Code
stanDxplot(b)
Figure 15.9: MCMC diagnostics
Code
doses <- seq(0, 400, by=5)
ggplot(Predict(b, dose=doses, race))
Figure 15.10: Posterior mean logit of exceedance probability
Code
M <- Mean(b)
ggplot(Predict(b, dose=doses, race, fun=M))
k <- contrast(b, list(race='white', dose=seq(0, 400, by=100)),
                               list(race='black', dose=seq(0, 400, by=100)))
k
  dose Contrast     S.E.     Lower     Upper Pr(Contrast>0)
1    0 0.918220 1.393394 -1.940558  3.612059         0.7505
2  100 7.401274 1.657551  4.273938 10.680484         1.0000
3  200 5.665825 1.508486  2.814756  8.579904         0.9998
4  300 7.082135 1.443692  4.410357  9.987672         1.0000
5  400 9.127073 1.621517  6.031940 12.269018         1.0000

Intervals are 0.95 highest posterior density intervals
Contrast is the posterior mean 
Figure 15.11: Posterior mean of true mean forearm blood flow