pool_over_imputations.Rd
Function to pool parameter estimates as well as their squared standard errors and associated degrees of freedom over imputation sets when using multiple imputation via chained equations (MICE). Estimates are pooled according to Rubin's rules. Input notation assumes I imputations and P parameters.
pool_over_imputations(estimates, standard_errors, degrees_of_freedom)
An I x P matrix of parameter estimates.
An I x P matrix of standard errors.
An I x P matrix of degrees of freedom.
A matrix with the pooled estimates, standard errors, and associated degrees of freedom for the set of parameters.
Pools over multiple imputations using Rubin's rules. Final parameter estimates are the average over imputations, while standard errors are the combination of within variance (average of the estimates' variance, the square of the standard error, across imputations) and between variance (the variance of the point estimates across imputations). Finally, a correction is applied to the parameters' degrees of freedom.
Rubin, D. B. (1987). Multiple imputation for nonresponse in surveys. John Wiley & Sons.
van Buuren, S., & Groothuis-Oudshoorn, K. (2011). mice: Multivariate imputation by chained equations in R. Journal of Statistical Software, 45 (3), 1-67. https://doi.org/10.18637/jss.v045.i03.
if (FALSE) {
# Simulation example
set.seed( 222 ) # For reproducibility
# Simulate 200 observations from
# 3 correlated variables
n <- 200
Sigma <- rbind(
c( 1.0, 0.3, 0.2 ),
c( 0.3, 1.0, 0.7 ),
c( 0.2, 0.7, 1.0 )
)
Y <- MASS::mvrnorm(
n, c( 0, 0, 0 ), Sigma
)
colnames(Y) <- c( 'Y1', 'Y2', 'Y3' )
Y <- data.frame( Y )
# Missing values for Y2 depend on both Y1 and Y2
Y$R2 <- rbinom(
n, 1, logistic(
logit(.3) + log(2)*Y$Y1 + log(4)*Y$Y2
)
)
Y$Y2.M <- Y$Y2
Y$Y2.M[ Y$R2 == 1 ] <- NA
# Predict Y1 from Y2 (All cases)
lm_Y1_from_Y2 <- lm(
Y1 ~ Y2, data = Y
)
print( round(
summary( lm_Y1_from_Y2 )$coefficients[2, c(1:2, 4)], 3
) )
# Predict Y1 from Y2 (Complete cases)
lm_Y1_from_Y2.M <- lm(
Y1 ~ Y2.M, data = Y
)
print( round(
summary( lm_Y1_from_Y2.M )$coefficients[2, c(1:2, 4)], 3
) )
# Impute missing values of Y2 from Y3 via a simplistic
# version of predictive mean matching (Note better
# methods exist, approach is for example purposes only)
lm_Y2.M_from_Y3 <- lm(
Y2.M ~ Y3, data = Y[ Y$R2 == 0, ]
)
Y2.M_pred <- predict(
lm_Y2.M_from_Y3, newdata = Y[ Y$R2 == 1, ]
)
# Impute missing values 10 times, sampling randomly
# from 5 closest observed values
i <- sapply(
1:10, function(m) {
sapply(
Y2.M_pred, function( yhat ) {
smallest_diff <-
order( abs( Y$Y2.M[ Y$R2 == 0 ] - yhat ) )
return(
Y$Y2.M[ Y$R2 == 0 ][smallest_diff][
sample( 1:5, size = 1 )
]
)
}
)
}
)
# Predict Y1 from Y2 using imputed data sets
# and save estimates, standard errors, and
# degrees of freedom
cf <- sapply(
1:10, function(m) {
Y$Y2.I <- Y$Y2.M
Y$Y2.I[ Y$R2 == 1 ] <- i[, m]
lm_Y1_from_Y2.I <- lm(
Y1 ~ Y2.I, data = Y
)
sm <- summary( lm_Y1_from_Y2.I )
return(
cbind( sm$coefficients[, 1:2], sm$df[2] )
)
}
)
cf <- array(
cf, dim = c( 2, 3, 10 )
)
pooled <- pool_over_imputations(
t( cf[, 1, ] ), # Coefficients
t( cf[, 2, ] ), # Squared standard errors
t( cf[, 3, ] ) # Degrees of freedom
)
print( round(
c( pooled[2, 1:2],
pt(
abs( pooled[2, 1]/pooled[2, 2] ), pooled[2, 3], lower.tail = F
)*2
), 3
) )
}