
Conditional PMM routine that excludes (a vector of) observed values from the donor pool #392

Closed. Proposes merging 7 commits.


gerkovink (Member)

This might be worth including, since it is requested quite often.

What

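mice.impute.pmm.exclude removes one observed value, or a vector of observed values, of the target variable from the donor pool used in matching. These values therefore never appear as imputations, but the cases that hold them still take part in the imputation procedure.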

Why

Users sometimes want to keep certain observed values from ending up in the imputations without dropping those cases from the imputation procedure altogether. With mice.impute.pmm.exclude, these observed values can still serve as predictor values.
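The matching idea can be sketched in a few lines of base R. The function pmm_exclude_sketch below is hypothetical and deliberately simplified: it uses the least-squares fit directly, without the Bayesian draw of the regression weights that mice.impute.pmm performs. The regression is fitted on all observed cases, including those whose values are excluded, while only the remaining observed cases are eligible as donors.

```r
# Hypothetical, simplified PMM with an exclusion list (not the mice source).
pmm_exclude_sketch <- function(y, ry, x, exclude = NULL, donors = 5L) {
  x <- as.matrix(x)
  # Fit the imputation model on ALL observed cases, excluded values included
  beta <- lm.fit(cbind(1, x[ry, , drop = FALSE]), y[ry])$coefficients
  yhat <- drop(cbind(1, x) %*% beta)
  # Donor pool: observed cases whose value is not in `exclude`
  pool <- which(ry & !(y %in% exclude))
  if (length(pool) == 0L) stop("no donors left after applying `exclude`")
  k <- min(donors, length(pool))
  # For each missing case, draw one of the k donors closest on yhat
  vapply(which(!ry), function(i) {
    nearest <- pool[order(abs(yhat[pool] - yhat[i]))[seq_len(k)]]
    as.numeric(y[nearest[sample.int(k, 1L)]])
  }, numeric(1))
}

# Usage on simulated data
set.seed(1)
n <- 100
x <- matrix(rnorm(n * 2), ncol = 2)
y <- round(drop(10 + x %*% c(2, -1)) + rnorm(n), 1)
ry <- runif(n) > 0.3
y[!ry] <- NA
excl <- head(unique(y[ry]), 2)  # two observed values barred as donors

imp <- pmm_exclude_sketch(y, ry, x, exclude = excl)
any(imp %in% excl)  # FALSE: excluded values never become imputations
```

Because the exclusion acts only on the donor pool, the cases carrying the barred values still shape the fitted regression and, in a full mice run, remain available as predictors for other variables.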

Some tests

# to install this branch
# devtools::install_github(repo = "gerkovink/mice@pmm999")
library(mice)

# TEST 1
# impute without exclude
imp <- mice(nhanes, 
            seed = 123, 
            printFlag = FALSE)
A <- imp$imp$chl

# impute with exclude
meth  <- make.method(nhanes)
meth["chl"] <- "pmm.exclude"
imp <- mice(nhanes, meth = meth, exclude = c(218, 187), 
            seed = 123, 
            printFlag = FALSE)
B <- imp$imp$chl

any(A == 187 | A == 218) # May be TRUE
#> [1] TRUE 
any(B == 187 | B == 218) # Must be FALSE
#> [1] FALSE 

# TEST 2 - copied from mice.impute.pmm
set.seed(53177)
xname <- c("age", "hgt", "wgt")
r <- stats::complete.cases(boys[, xname])
x <- boys[r, xname]
y <- boys[r, "tv"]
ry <- !is.na(y)

# Impute missing tv data with original pmm
set.seed(123); yimp.pmm <- mice.impute.pmm(y, ry, x)
set.seed(123); yimp <- mice.impute.pmm.exclude(y, ry, x)
identical(yimp, yimp.pmm) # should be TRUE
#> [1] TRUE

set.seed(123); yimp.pmm <- mice.impute.pmm(y, ry, x)
set.seed(123); yimp <- mice.impute.pmm.exclude(y, ry, x, exclude = c(20, 25))
identical(yimp, yimp.pmm) # should be FALSE
#> [1] FALSE
c(20, 25) %in% yimp # should be FALSE twice
#> [1] FALSE FALSE

Created on 2021-05-10 by the reprex package (v1.0.0)

R CMD check

── R CMD check results ────────────────────────────────── mice 3.13.7 ────
Duration: 3m 4.3s

0 errors ✓ | 0 warnings ✓ | 0 notes ✓

R CMD check succeeded

stefvanbuuren (Member)

This is a useful addition. Two suggestions:

  • Preferably, implement this in the standard mice.impute.pmm(..., exclude = c(...)) function to avoid code duplication;
  • In the likely case that you want different exclusions for different variables, use the blots parameter to pass down different exclude vectors.
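The second suggestion could look like the sketch below. It assumes a mice version whose mice.impute.pmm() accepts an exclude argument (as proposed in the first bullet, rather than the separate pmm.exclude method of this PR), and it relies on the blots argument of mice(), a named list whose entries are passed down to the imputation method of the block with the same name. The exclusion values are illustrative only.

```r
library(mice)

# Assumption: mice.impute.pmm() in the installed mice accepts `exclude`.
meth <- make.method(nhanes)  # "pmm" for bmi, hyp and chl; "" for age
excl <- list(bmi = 22.7, chl = c(187, 218))

# blots: one list of extra arguments per block (here, per variable)
blots <- list(
  bmi = list(exclude = excl$bmi),
  chl = list(exclude = excl$chl)
)
imp <- mice(nhanes, method = meth, blots = blots,
            seed = 123, printFlag = FALSE)

# Check whether any barred value turned up among the imputations
any(unlist(imp$imp$bmi) %in% excl$bmi)
any(unlist(imp$imp$chl) %in% excl$chl)
```

Variables without an entry in blots (here hyp) should be imputed with ordinary pmm.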

gerkovink closed this on Nov 10, 2022
gerkovink reopened this on Nov 10, 2022
gerkovink (Member, Author)

Moving this over to another branch. Closed by #519.

gerkovink closed this on Nov 10, 2022