gudif.pdf from
http://www.r-bloggers.com/new-r-softwaremethodology-for-handling-missing-dat

How many imputations:  https://statisticalhorizons.com/how-many-imputations

------------------ Multiple d.f. tests - Mark Seeto <markseeto@gmail.com>
I don't agree with your interpretation of Reiter's approach. I think the denominator d.f. will be different for different tests.

The way I read it, when testing the null hypothesis beta1 = beta2 = 0, the Q on p. 503 will be [beta1, beta2]', but when testing the null hypothesis beta3 = beta4 = beta5 = 0, it will be Q = [beta3, beta4, beta5]'. The different Q results in different B_m and \bar{U}_m, which results in different r_m, a, z, and d.f. My interpretation of the method is illustrated in the code I included a couple of emails ago.

Thanks for your reply Frank. Just to make sure I understand what you're saying, suppose x1 is continuous and x2 is categorical with levels a, b, c, d. Consider the multiple regression model

y = beta0 + beta1*x1 + beta2*x1^2 + beta3*(x2=b) + beta4*(x2=c) + beta5*(x2=d) + error

and tests of two multiple-d.f. null hypotheses:
(1)  beta1 = beta2 = 0
(2)  beta3 = beta4 = beta5 = 0.

Are you saying that Reiter's approach will give the same denominator d.f. for both of those tests?

Thanks,
Mark

I've noticed that Rubin's d.f. for single-d.f. tests can be much higher than the d.f. that would be used if there was no missing data. That seems strange to me.

My interpretation of Reiter's paper was different from yours. To me it looks like it gives different d.f. for different tests because the d.f. depends on the choice of the estimand Q. I've included an example below.

Mark

##################################################################################
## Generate data and imputed data sets

library(rms)

set.seed(1)

n <- 50  # sample size
m <- 5   # number of imputed data sets

d <- data.frame(x1 = rnorm(n),
                x2 = sample(c("a", "b", "c", "d"), n, replace=TRUE))

d$y <- 1 + 0.5*d$x1 + 0*(d$x2 == "b") + 0.2*(d$x2 == "c") + 0.6*(d$x2 == "d") +
  rnorm(n, 0, 1.5)

missing.ind <- list(x1 = 1:10, x2 = 21:30)  # indices of missing values

d[missing.ind$x1, "x1"] <- NA
d[missing.ind$x2, "x2"] <- NA

aregImpute.output <- aregImpute(~ y + x1 + x2, data = d, n.impute = m)

d.imp <- vector("list", m)  # list of imputed data sets

for (j in 1:m) {
  d.imp[[j]] <- d
  d.imp[[j]][missing.ind$x1, "x1"] <- aregImpute.output$imputed$x1[, j]
  d.imp[[j]][missing.ind$x2, "x2"] <-
    aregImpute.output$cat.levels$x2[aregImpute.output$imputed$x2[, j]]
}

##################################################################################

# For the model y = beta0 + beta1*x1 + beta2*x1^2 +
#                   beta3*(x2=b) + beta4*(x2=c) + beta5*(x2=d) + error,
#
# calculate Reiter's d.f. for tests of the following hypotheses:
# (1) beta1 = beta2 = 0
# (2) beta3 = beta4 = beta5 = 0.

# Calculations follow p. 503 of the article.
# Where applicable, variable names are consistent with Reiter's notation.

ols.imp <- vector("list", m)  # Models fitted on imputed data sets

for (j in 1:m) {
  ols.imp[[j]] <- ols(y ~ pol(x1, 2) + x2, data = d.imp[[j]])
}

tests <- c("x1", "x2")
# "x1" refers to the test of beta1 = beta2 = 0
# "x2" refers to the test of beta3 = beta4 = beta5 = 0

test.terms <- list(x1 = c("x1", "x1^2"),
                   x2 = c("x2=b", "x2=c", "x2=d"))

k <- sapply(test.terms, length)

nu.com <- n - ols.imp[[1]]$stats["d.f."] - 1
nu.com.s <- nu.com*(nu.com + 1)/(nu.com + 3)  # nu.com-star

B.full <- cov(t(sapply(ols.imp, coef)))
Ubar.full <- Reduce("+", lapply(ols.imp, vcov))/m

# Taking sub-matrices of B.full and Ubar.full is the same as calculating
# B and Ubar using just the terms of interest.

B <- vector("list", length(tests))
Ubar <- vector("list", length(tests))
r <- numeric(length(tests))
names(r) <- names(Ubar) <- names(B) <- tests

for (x in tests) {
  B[[x]] <- B.full[test.terms[[x]], test.terms[[x]]]
  Ubar[[x]] <- Ubar.full[test.terms[[x]], test.terms[[x]]]
  r[x] <- (1 + 1/m)*sum(diag(solve(Ubar[[x]], B[[x]])))/k[x]
}

t. <- k*(m - 1)

a <- r*t./(t. - 2)

z <- 1/(nu.com.s - 4*(1 + a)) +
  ( a^2 * (nu.com.s - 2*(1 + a)) / ((1 + a)^2 * (nu.com.s - 4*(1 + a))) +
    8*a^2 * (nu.com.s - 2*(1 + a)) / ((1 + a)*(nu.com.s - 4*(1 + a))^2) +
    4*a^2 / ((1 + a)*(nu.com.s - 4*(1 + a))) +
    4*a^2 / ((nu.com.s - 4*(1 + a))*(nu.com.s - 2*(1 + a))) +
    16*a^2 * (nu.com.s - 2*(1 + a)) / (nu.com.s - 4*(1 + a))^3 +
    8*a^2 / (nu.com.s - 4*(1 + a))^2
  ) / (t. - 4)

nu.f <- 4 + 1/z

nu.f  # Reiter's denominator d.f.

##       x1       x2
## 25.18177 22.06186

##################################################################################

Bayes and likelihood formulation that helps to explain why there is information in missing X with non-missing Y but not information with completely missing at random Y:
https://stats.stackexchange.com/questions/563129