Error: Column estimate not found in .data for pool() from mgcv::gam #218

Closed
mikeguggis opened this issue Feb 6, 2020 · 2 comments

@mikeguggis

mikeguggis commented Feb 6, 2020

Hello,

I am getting the error "Error: Column estimate not found in .data" when trying to run pool() on a mira object generated from with() on a gam object (from mgcv::gam()). The appropriate tidy and glance functions exist.

Reproducible example

# Simulate data
library(MASS)
set.seed(1775)
N = 100
Xcomplete = mvrnorm(N, c(1,2), matrix(c(1,.5,.5,1),2,2))
W = cbind(1,Xcomplete[,1],Xcomplete[,1]^2,Xcomplete[,2])
eps = rnorm(N)
beta = c(1,1,2,3)

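# Coerce to a plain vector: a one-column matrix would lose the name "y" in cbind() below
y = as.numeric(W %*% beta + eps)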

missingid1 = as.logical(rbinom(N,1,.1))
missingid2 = as.logical(rbinom(N,1,.1))
X = Xcomplete
X[missingid1,1] = NA
X[missingid2,2] = NA

datcomplete = as.data.frame(cbind(y=y,x1 = Xcomplete[,1], x2 = Xcomplete[,2]))
dat = as.data.frame(cbind(y=y,x1 = X[,1], x2 = X[,2]))

# Estimate fully observed GAM to get knots
library(mgcv)
out1 = gam(y ~ s(x1, bs = "bs") + x2, data = datcomplete)
knots = list(x1 = out1$smooth[[1]]$knots) #get knots

# MICE the data
library(mice)
mice1 = mice(dat)

# Estimate GAMs on MICEed data with user defined knots
micegam = with(data = mice1, gam(y ~ s(x1, bs = "bs") + x2, knots = knots))
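poolgam = pool(micegam) # fails with: Error: Column estimate not found in .data

# The tidy and glance methods for gam objects exist in broom
?broom::glance.gam
?broom::tidy.gam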

Thank you for your time,

Mike

@stefvanbuuren
Member

OK, thanks a lot.

I wasn't aware that broom::tidy.gam() doesn't produce the estimates and standard errors by default. mice 3.7.1 adds the parametric = TRUE parameter to the call to tidy.gam(), so now your example should run.

I do not know exactly what parametric = TRUE does, but I think it comes down to a simplification of the model. Thus, it might have an effect on interpretation. As of now, how non-parametric smooths should be pooled is still an open issue.

Hope this helps, nevertheless.
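
For readers wondering what the parametric switch changes, here is a minimal sketch using only mgcv. It rests on the assumption that broom::tidy.gam() builds its output from the two tables in summary(): the parametric coefficient table (with estimates and standard errors) when parametric = TRUE, and the smooth-term table (effective degrees of freedom and a test statistic, with no estimate column) by default. The simulated data and the object names are illustrative, not taken from the example above.

```r
library(mgcv)

# Toy data: one smooth effect (x1) and one linear effect (x2)
set.seed(1)
n  <- 200
x1 <- runif(n)
x2 <- rnorm(n)
y  <- sin(2 * pi * x1) + 0.5 * x2 + rnorm(n, sd = 0.3)

fit <- gam(y ~ s(x1) + x2)

# Parametric terms (intercept and x2): estimates and standard errors
param_tab <- summary(fit)$p.table

# Smooth terms (s(x1)): edf and test statistic, no single estimate
smooth_tab <- summary(fit)$s.table

print(colnames(param_tab))
print(colnames(smooth_tab))

# Optional comparison, run only if broom is installed
if (requireNamespace("broom", quietly = TRUE)) {
  print(names(broom::tidy(fit)))                     # smooth terms
  print(names(broom::tidy(fit, parametric = TRUE)))  # parametric terms
}
```

If that assumption holds, pool() with parametric = TRUE would combine only the intercept and x2 rows, and the smooth s(x1) would be left out of the pooled table rather than simplified.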

@mikeguggis
Author

mikeguggis commented Feb 7, 2020

Thanks Stef. I believe pooling the non-parametric smooth terms and parametric terms is a little bit of work but fairly straightforward. Assuming you are using cubic splines, the user must first manually define the knots (so each imputed data set uses the same knots), then pool mgcv::gam()$coefficients to get the mean and the between variance. The covariance matrices mgcv::gam()$Ve, mgcv::gam()$Vp, and mgcv::gam()$Vc (if they exist) provide the within variance.

However, if the user-defined knot locations are estimated prior to the model estimation with the MI datasets, then I am not sure what the appropriate covariance matrix is (unconditional on estimated knot locations).
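
The pooling recipe described above can be sketched in base R as follows. This is a minimal illustration of Rubin's rules applied to full coefficient vectors, not code from mice or mgcv: pool_rubin() is a hypothetical helper, the toy inputs are made up, and the sketch omits the degrees-of-freedom calculation. It also treats the knots as fixed, so it does not address the question of uncertainty in estimated knot locations.

```r
# Hypothetical helper: pool coefficient vectors and covariance matrices
# from m imputed-data fits using Rubin's rules.
pool_rubin <- function(coef_list, vcov_list) {
  m <- length(coef_list)
  stopifnot(m >= 2, length(vcov_list) == m)

  Q     <- do.call(rbind, coef_list)    # m x p matrix, one row per imputation
  qbar  <- colMeans(Q)                  # pooled estimate
  ubar  <- Reduce(`+`, vcov_list) / m   # within-imputation covariance
  b     <- cov(Q)                       # between-imputation covariance
  total <- ubar + (1 + 1 / m) * b       # total covariance

  list(estimate = qbar, within = ubar, between = b, total = total,
       std.error = sqrt(diag(total)))
}

# Toy input: two "imputations" of a two-coefficient model
coef_list <- list(c(a = 1, b = 2), c(a = 3, b = 4))
vcov_list <- list(diag(2), 3 * diag(2))

res <- pool_rubin(coef_list, vcov_list)
print(res$estimate)
print(res$total)
```

With the mira object from the example, the inputs could be built as lapply(micegam$analyses, coef) and lapply(micegam$analyses, function(f) f$Vp), assuming every fit uses the same knots so that the coefficients line up across imputations. Which of Ve, Vp, or Vc is the right within-imputation covariance is a modelling choice that this sketch leaves open.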
