
tidy.mipo and tidy.glance #240

Merged · 5 commits · Jun 4, 2020

Conversation

@vincentarelbundock (Contributor)

This PR does:

  1. Add tests to pool.r.squared to make sure we get the same results before and after I modify it.
  2. Add a glanced object to mipo. This allows us to retrieve information about the original models stored in mira.
  3. Add support for mipo objects in pool.r.squared. This is step 3 because we need access to the new glanced object.
  4. Add new glance.mipo and tidy.mipo methods.
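
For context, a minimal usage sketch of the workflow these methods enable (column names are illustrative, not verbatim output):

library(mice)

imp <- mice(nhanes, m = 5, seed = 1, print = FALSE)  # impute (mids)
fit <- with(imp, lm(chl ~ bmi + age))                # fit per imputation (mira)
pooled <- pool(fit)                                  # pool estimates (mipo)

tidy(pooled)    # one row per term: estimate, std.error, p.value, ...
glance(pooled)  # one-row model summary, drawing on the new glanced component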

I have two technical questions for @stefvanbuuren:

  1. In pool.r.squared, did you intend to hardcode the value "3", or should it be df.residuals? https://github.com/vincentarelbundock/mice/blob/tidymipo/R/pool.r.squared.R#L89
  2. For mipo objects, I don't have a good way to detect if the original model was lm. Do you think that checking if there is an r.squared column in glanced is sufficient, or should I add a new column of information to glance, such as class(object)[1]? Or maybe the call should be saved as an attribute of tidy.mipo(object)? Here's what I do now: https://github.com/vincentarelbundock/mice/blob/tidymipo/R/pool.r.squared.R#L73
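
In rough form, the current check amounts to something like this (a simplification, not the exact code at the linked line):

# Simplified: treat the presence of an r.squared column in the glanced
# component as evidence that the original fits were lm-like.
if (!"r.squared" %in% names(object$glanced)) {
  stop("r.squared can only be pooled for models that report it (e.g. lm)")
}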

@stefvanbuuren (Member)

  1. The standard error after the Fisher transform is 1/sqrt(N - 3), and that's what I've used (sketched below). The df.residual parameter depends on model complexity. I am not aware of any work that compares the two possibilities. For the moment, I would stick to the "official" Fisher formulation, and thus use the "3".

  2. Although some may disagree with me, pooling R^2 is a general procedure that is not tied to the linear model, so I think there is no need to know the model. For now, I would just pool any r.squared and r.squared.adjusted values, irrespective of where they come from. That said, it would be interesting to study this a little more deeply.
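
To make the role of the "3" concrete, here is a simplified sketch of Fisher-z pooling for R^2 (an illustration of the idea, not mice's pool.r.squared() code):

# r2: per-imputation R^2 values; n: sample size
pool_r2 <- function(r2, n) {
  m <- length(r2)
  z <- atanh(sqrt(r2))            # Fisher transform of r = sqrt(R^2)
  ubar <- 1 / (n - 3)             # within variance: the "official" 1/(N - 3)
  b <- var(z)                     # between-imputation variance
  t <- ubar + (1 + 1 / m) * b     # Rubin's total variance
  c(estimate = tanh(mean(z))^2, total.variance = t)
}

pool_r2(c(0.41, 0.38, 0.44), n = 100)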

@stefvanbuuren (Member)

For those who are interested in the problem of pooling R^2, see also Van Ginkel, J.R. (2020). Standardized regression coefficients and newly proposed estimators for R^2 in multiply imputed data. Psychometrika. https://doi.org/10.1007/s11336-020-09696-4

@vincentarelbundock (Contributor, Author)

Thanks for the reference. Looks interesting.

I believe that commit 59f1ad4 wraps things up for this PR.

Of course, I'm happy to change stuff if you find anything you dislike during code review.

@stefvanbuuren (Member)

Vincent, thanks a lot. Almost there. I found three things:

  1. DESCRIPTION needs a Depends: generics.

  2. An import is needed:

  pool: no visible global function definition for ‘residuals’
  summary.mira: no visible global function definition for ‘residuals’
  Undefined global functions or variables:
    residuals
  Consider adding
    importFrom("stats", "residuals")
  to your NAMESPACE file.

  3. The weird test errors still occur with test-pool.r.squared.R and test-tidiers.R. Start a fresh session, then library(mice) and library(testthat), and run the test code. This produces errors, but devtools::check() does not find them. Very curious.

Could you take a look at these?
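
For item 2, the fix is a one-line import declaration; a sketch, assuming NAMESPACE is generated by roxygen2 (if it is edited by hand, add the importFrom line directly):

# Above the definitions of pool() and summary.mira(), add the tag:
#' @importFrom stats residuals
# then run devtools::document() to regenerate NAMESPACE with
#   importFrom("stats", "residuals")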

@vincentarelbundock (Contributor, Author)

Forgetting to run devtools::check() was a rookie mistake!

I fixed problems 1 and 2 and force-pushed to keep the history a bit cleaner.

W.r.t. problem 3, the issue seems to be that the imputed data produced by mice::mice() in an interactive session differ from the imputed data produced under devtools::test().

Here's the smallest example I can produce. First, run this interactively from a fresh session to look at the seeds:

imp <- mice::mice(mice::nhanes, maxit = 2, m = 2, seed = 1, print = FALSE)
imp$lastSeedValue[1:5]
#> [1]      10403        141 -705880619 1904110533 1951823516

Then, save this in a file called mice/tests/testthat/test-seed.R:

context("seed issue")

imp <- mice::mice(mice::nhanes, maxit = 2, m = 2, seed = 1, print = FALSE)

test_that('interactive and devtools::test seeds differ',{
    interactive <- c(10403L, 141L, -705880619L, 1904110533L, 1951823516L)
    expect_equal(imp$lastSeedValue[1:5], interactive)
})

Finally, open a fresh session of R, and run devtools::test(). On my machine, the result is:

test-seed.R:7: failure: interactive and devtools::test seeds differ
imp$lastSeedValue[1:5] not equal to `interactive`.
2/5 mismatches (average diff: 5003)
[1] 403 - 10403 == -10000
[2] 135 -   141 ==     -6

This shows that the first 2 elements of imp$lastSeedValue differ depending on whether we run interactively in a fresh session or under devtools::test().

Note that this problem persists if I check out and install 9184f07, which is your last commit before I made any changes.

I tried to find the problem, but I don't understand the internals of mice well enough to figure it out. For instance, in the mice() function, we assign .Random.seed to lastSeedValue, but .Random.seed is not assigned anywhere in that function.

https://github.com/stefvanbuuren/mice/blob/master/R/mice.R#L397
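
The linked line amounts to the following (paraphrased; variable name illustrative). Note that .Random.seed is R's global RNG state, read from the global environment:

# mice() stores the session's global RNG state in the returned mids object:
midsobj$lastSeedValue <- .Random.seed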

  • Why do only the first two elements of lastSeedValue differ?
  • Could there be seed "bleeding" between calls when several runs of mice are conducted in a test suite?
  • Why does the parlmice test fail when I run devtools::test() twice in a single session?

@stefvanbuuren (Member)

  1. I was able to reproduce your findings. imp$lastSeedValue is used in mice.mids() to continue iteration.

  2. The following script suggests that devtools::test() and interactive R use different random generators:

context("seed issue")

set.seed(1L)
test_that('interactive and devtools::test seeds differ (2)',{
  interactive <- c(10403L, 624L, -169270483L, -442010614L, -603558397L)
  expect_equal(.Random.seed[1:5], interactive)
})

The test fails. See also https://stackoverflow.com/questions/56104176/devtoolstest-rngversion-and-sample-yields-different-results. I searched the devtools issues for "set.seed" and "RNG", but didn't find it reported. Do you think it's worth opening an issue?

  3. I do not (yet) know where the parlmice problem stems from.

  4. Your code looks fine; I will merge.

@stefvanbuuren merged commit 2618de9 into amices:master on Jun 4, 2020
@stefvanbuuren (Member)

I have now found the likely culprit. In test-pool.R I use

# set the random generator to V3.5.0 to ensure that this test 
# passes in V3.6.0 and later
# see mail Kurt Hornik, dated 06mar19
# FIXME: consider using the new generator once V3.6.0 is out, 
# at the expense of breaking reproducibility of the examples in 
# https://stefvanbuuren.name/fimd/
suppressWarnings(RNGversion("3.5.0"))

I expect updating this will solve it; it is not a devtools problem.
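
A quick way to see why this setting bleeds across tests (illustrative):

# RNGversion("3.5.0") switches sample.kind to the pre-3.6.0 "Rounding"
# behaviour, and the change persists for the rest of the session, so it
# leaks into every test that runs afterwards.
RNGkind()                                # R >= 3.6.0 default ends in "Rejection"
suppressWarnings(RNGversion("3.5.0"))
RNGkind()                                # sample.kind is now "Rounding"
set.seed(1); sample(10, 3)               # differs from the >= 3.6.0 default
RNGversion(as.character(getRversion()))  # restore the current defaults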

@vincentarelbundock (Contributor, Author)

RNGs still feel a bit like black magic to me, so I'm glad you were able to track down the problem.

Thanks for being so efficient and responsive. This PR was a breeze. And thanks for your work on mice more generally!
