Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Reasoning behind proportion adjustment in check.patterns of ampute #449

Closed
lederb opened this issue Nov 30, 2021 · 12 comments
Closed

Reasoning behind proportion adjustment in check.patterns of ampute #449

lederb opened this issue Nov 30, 2021 · 12 comments
Assignees

Comments

@lederb
Copy link

lederb commented Nov 30, 2021

Hi and sorry for bothering you,
But I'm currently going over the ampute R code as part of a university project and at the moment I'm a bit confused when it comes to the adjustment of the proportion if there are patterns containing only 1's (e.g. no expected amputation for this pattern, if I understood it correctly).

mice/R/ampute.R

Lines 529 to 564 in d60fe55

check.patterns <- function(patterns, freq, prop) {
prop.one <- 0
row.one <- c()
for (h in seq_len(nrow(patterns))) {
if (any(!patterns[h, ] %in% c(0, 1))) {
stop(paste("Argument patterns can only contain 0 and 1, pattern", h, "contains another element"), call. = FALSE)
}
if (all(patterns[h, ] %in% 1)) {
prop.one <- prop.one + freq[h]
row.one <- c(row.one, h)
}
}
if (prop.one != 0) {
warning(paste("Proportion of missingness has changed from", prop, "to", prop.one, "because of pattern(s) with merely ones"), call. = FALSE)
prop <- prop.one
freq <- freq[-row.one]
freq <- recalculate.freq(freq)
patterns <- patterns[-row.one, ]
warning("Frequency vector and patterns matrix have changed because of pattern(s) with merely ones", call. = FALSE)
}
prop.zero <- 0
row.zero <- c()
for (h in seq_len(nrow(patterns))) {
if (all(patterns[h, ] %in% 0)) {
prop.zero <- prop.zero + freq[h]
row.zero <- c(row.zero, h)
}
}
objects <- list(
patterns = patterns,
prop = prop,
freq = freq,
row.zero = row.zero
)
objects
}

According to the current implementation if I have e.g.

`

myPatterns
V1 V2 V3
1 1 1 1
2 1 0 1
3 1 1 0

result <- ampute(data=testdata, prop=0.5, patterns=myPatterns, freq=c(0.2, 0.4, 0.4))
Warning messages:
1: Proportion of missingness has changed from 0.5 to 0.2 because of pattern(s) with merely ones
2: Frequency vector and patterns matrix have changed because of pattern(s) with merely ones`

Thus, I'm getting a result where the proportion of amputed rows is close to 0.2.

`

md.pattern(result$amp)
V1 V3 V2
8013 1 1 1 0
994 1 1 0 1
993 1 0 1 1
0 993 994 1987
`

But currently I'm simply not understanding, why the proportion is reduced to 0.2 and not kept at 0.5 which would be like ignoring the patterns of ones and only applying the second and third pattern using frequencies of [0.4/0.8, 0.4/0.8].

I'd really appreciate any hint on why the approach was chosen to set the missingness proportion to the sum of frequencies of the patterns which have only 1's in the pattern?

Sorry for bothering :-)

@prockenschaub
Copy link
Contributor

prockenschaub commented Dec 1, 2021

@lederb I tend to agree with you.

@stefvanbuuren @RianneSchouten could this simply be a missing prop.one = 1 - prop.one right before the block

mice/R/ampute.R

Lines 541 to 547 in d60fe55

if (prop.one != 0) {
warning(paste("Proportion of missingness has changed from", prop, "to", prop.one, "because of pattern(s) with merely ones"), call. = FALSE)
prop <- prop.one
freq <- freq[-row.one]
freq <- recalculate.freq(freq)
patterns <- patterns[-row.one, ]
warning("Frequency vector and patterns matrix have changed because of pattern(s) with merely ones", call. = FALSE)

Edit: the above doesn't lead to the likely intended behaviour

@lederb
Copy link
Author

lederb commented Dec 1, 2021

@prockenschaub
yeah that was also my first intution but then I thought it should probably also be weighted by the initially set proportion like:

prop = (1 - prop.one) * prop

at least for me it would make sense if I e.g. have 3 partitions, the first one of 20% in my example without amputations, then two partitions for 40% and 40% each with a proportion of 0.5 (my example) amputed samples?

additionally I also was suprised that so far I couldn't find any adjustment of the weight matrix based on which patterns get removed. only the assertion of

mice/R/ampute.R

Lines 315 to 318 in d60fe55

weights <- as.data.frame(weights)
if (!nrow(weights) %in% c(nrow(patterns), nrow(patterns.new))) {
stop("The objects patterns and weights are not matching", call. = FALSE)
}

because if I remove some allOnes-pattern rows from the pattern matrix shouldn't those be also removed from the weights matrix?

but maybe I simply didn't see the consideration of this yet

@prockenschaub
Copy link
Contributor

@lederb agreed. I realised that only after submitting the comment, have edited above.

To take your example above, the final data should have a proportion of rows with missing data of (1 - prop.one) * prop = (1 - 0.2) * 0.5 = 0.4 which can be achieved by removing ones-only patterns and setting the prop = 0.4 for all other patterns.

Regarding the weights, I am not entirely sure. They appear to be handled in the sum.score function later but I haven't fully reconstructed what that function does.

@lederb
Copy link
Author

lederb commented Dec 1, 2021

@prockenschaub
regarding the weights I also only had a quick look if they were adjusted, which I couldn't see anywhere, but not yet how they are used later on, but since there is no returned information from the check.patterns function on which 1-rows were removed (only the row.zero index vector which indicates rows of zeros, but those are not dropped from the patterns I think) an adjustment of the weights outside of the check.patterns function seems unlikely

@prockenschaub
Copy link
Contributor

You are right, I think. As far as I can tell, the following code randomly assigns a latent missing pattern for each row based on the cleaned patterns and frequencies (i.e., those left when excluding the rows with all 1's).

mice/R/ampute.R

Lines 389 to 392 in d60fe55

P <- sample.int(
n = nrow(patterns.new), size = nrow(data),
replace = TRUE, prob = freq
) + 1

For your example above, patterns was

# X1 X2 X3
#  1  1  1
#  0  1  1
#  1  1  0

and the cleaned patterns.new equals

# X1 X2 X3
#  0  1  1
#  1  1  0

P in the above code snippet is therefore an assignment for each row in the data to one of the two rows of patterns.new. sum.scores --- which calculates the weights --- is later passed patterns, however, which still contains the original 3 rows and an unchanged weights, which also still contains 3 rows.

mice/R/ampute.R

Lines 402 to 408 in d60fe55

scores <- sum.scores(
P = P,
data = data,
std = std,
weights = weights,
patterns = patterns
)

In this function there is then a mismatch between P (which is based on 2 patterns) and patterns/weights, which still include 3 patterns. As a result, there is a mismatch between weights and patterns.new. The weight of the all-ones pattern weight[1,] is used for patterns.new[1,] (=patterns[2,]). weigth[2,] is used for patterns.new[2,] (=patterns[3,]). weight[3,] is not used at all.

The intended behaviour should be to use weights[2:3, ].

@gerkovink
Copy link
Member

@RianneSchouten can you drop in?

@gerkovink
Copy link
Member

Isn't it that the row with ones is redundant? Then it should be excluded from all calculations.

@lederb
Copy link
Author

lederb commented Dec 2, 2021

@gerkovink yes I'd think so too but also that first of all the adjustment of `prop <- prop.one' is for me at least a bit weird

if I'd call it using e.g. prop = 0.9 but my patterns contain for whatever reason a 1's row with corresponding frequency of e.g. 0.01, I at least would expect some proportion of amputed rows closer to 0.9 than 0.01

and second, if the patterns are sliced by removing rows of 1s the weights should probably be too in the same manner.
or maybe I'm misunderstanding the algorithm somehow then this issue is of course irrelevant

nevertheless, since this is rather an edge case I'll try to port the algorithm for now as is and change it later on if there is some consensus that the current behavior isn't intended

just wanted to point you at it, if it actually is unintended behaviour and therefore could require changes

@RianneSchouten
Copy link
Contributor

RianneSchouten commented Dec 2, 2021 via email

@RianneSchouten
Copy link
Contributor

RianneSchouten commented Dec 2, 2021

@lederb @prockenschaub thank you for your interest in ampute() and thank you for making us aware of issues like these.

Honestly, I do not remember anymore why we decided to set prop.one <- prop.one + freq[h] but I agree with you that prop.one <- (1 - prop.one) * prop would be better.

Regarding the weights matrix; I appreciate you finding this bug. Indeed, the weights matrix should be adapted in order for the patterns to correspond with the right weights.

I will make the necessary changes tomorrow and make a pull request.

Thank you both for reaching out, and please do not hesitate to let us know if you encounter other issues.

@gerkovink
Copy link
Member

Thanks @RianneSchouten

RianneSchouten added a commit to RianneSchouten/mice that referenced this issue Dec 3, 2021
stefvanbuuren added a commit that referenced this issue Mar 31, 2022
adapt prop, patterns and weights matrices for pattern with only 1's, and adding warnings when patterns cannot be generated (#449, #317)
@stefvanbuuren
Copy link
Member

Closing because it's now merged.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

5 participants