Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

adapt prop, patterns and weights matrices for pattern with only 1's, and adding warnings when patterns cannot be generated (#449, #317) #451

Merged
merged 5 commits into from
Mar 31, 2022

Conversation

RianneSchouten
Copy link
Contributor

Dear Stef, dear Gerko,

I have made adjustments in ampute() regarding the calculation of prop, patterns and weights in the situation that a user specifies a pattern with only 1s (issue #449). I have furthermore added unittests and evaluated that all other tests work.

Can you accept my pull request?

Thank you,
Rianne

@lederb
Copy link

lederb commented Dec 3, 2021

sorry for chipping in again, just wanted to mention an issue imho that is slightly related to the check.patterns

mice/R/ampute.R

Lines 258 to 270 in 2cd32a7

if (!bycases) {
prop <- recalculate.prop(
prop = prop,
freq = freq,
patterns = patterns,
n = ncol(data)
)
}
check.pat <- check.patterns(
patterns = patterns,
freq = freq,
prop = prop
)

is it possible that the if (!bycases) {} block should be processed after patterns are cleaned of rows with only ones?
because if I understand recalculate.prop correctly

mice/R/ampute.R

Lines 505 to 521 in 2cd32a7

recalculate.prop <- function(prop, n, patterns, freq) {
miss <- prop * n^2 # Desired #missing cells
# Calculate #cases according prop and #zeros in patterns
cases <- vapply(
seq_len(nrow(patterns)),
function(i) (miss * freq[i]) / length(patterns[i, ][patterns[i, ] == 0]),
numeric(1)
)
if (sum(cases) > n) {
stop("Proportion of missing cells is too large in combination with the desired number of missing variables",
call. = FALSE
)
} else {
prop <- sum(cases) / n
}
prop
}

cases <- vapply( seq_len(nrow(patterns)), function(i) (miss * freq[i]) / length(patterns[i, ][patterns[i, ] == 0]), numeric(1) )

leads to division by 0 for rows of ones thus Inf in the cases vector and the evaluation of if (sum(cases) > n)

is always true?

but maybe I'm wrong.

sorry for bothering again

@RianneSchouten RianneSchouten changed the title adapt prop, patterns and weights matrices for pattern with only 1's (#449) adapt prop, patterns and weights matrices for pattern with only 1's (#449, #317) Dec 3, 2021
@RianneSchouten RianneSchouten changed the title adapt prop, patterns and weights matrices for pattern with only 1's (#449, #317) adapt prop, patterns and weights matrices for pattern with only 1's, and adding warnings when patterns cannot be generated (#449, #317) Dec 3, 2021
@RianneSchouten
Copy link
Contributor Author

@lederb I think you are right, I moved the if (!bycases) {} block until after the patterns.new, freq and prop have been recalculated in commit 6b2be70

@lederb
Copy link

lederb commented Dec 22, 2021

@RianneSchouten sorry it's me again 😄

I now pretty much finished porting the ampute functionality to a different ML system and one additional question came up for me:

It would be why the proportion adjustment for cellwise amputation (bycases = FALSE) in recalculate.prop is using n^2 (n = ncol(data) when being called)?

mice/R/ampute.R

Lines 514 to 530 in 6b2be70

recalculate.prop <- function(prop, n, patterns, freq) {
miss <- prop * n^2 # Desired #missing cells
# Calculate #cases according prop and #zeros in patterns
cases <- vapply(
seq_len(nrow(patterns)),
function(i) (miss * freq[i]) / length(patterns[i, ][patterns[i, ] == 0]),
numeric(1)
)
if (sum(cases) > n) {
stop("Proportion of missing cells is too large in combination with the desired number of missing variables",
call. = FALSE
)
} else {
prop <- sum(cases) / n
}
prop
}

shouldnt it be miss = prop * nrow(data) * ncol(data) and if (sum(cases) > nrow(data)) and also prop = sum(cases) / nrow(data)

cheers Bernhard

@RianneSchouten
Copy link
Contributor Author

@lederb

Again, you are right.
Thank you very much for noticing these mistakes. I appreciate it.

Can I ask, to which ML system are you porting the function?

Bests,
Rianne

@lederb
Copy link

lederb commented Dec 23, 2021

@RianneSchouten

sure. for a university project I'm porting it to https://github.com/apache/systemds which is ran under the open source Apache License 2.0 (http://www.apache.org/licenses/LICENSE-2.0) and has its own R-like script language.

I hope that is okay in terms of legality?

@RianneSchouten
Copy link
Contributor Author

@stefvanbuuren @gerkovink is it possible to accept this PR?

@stefvanbuuren stefvanbuuren merged commit 25a6517 into amices:master Mar 31, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants