Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

How is use_first_valid_of different from dplyr's coalesce? #74

Closed
rgknight opened this issue Oct 21, 2016 · 4 comments
Closed

How is use_first_valid_of different from dplyr's coalesce? #74

rgknight opened this issue Oct 21, 2016 · 4 comments

Comments

@rgknight
Copy link
Collaborator

Would be helpful to have an explanation in the help file, if it is indeed different.

https://github.com/hadley/dplyr/blob/master/R/coalesce.R

@sfirke sfirke changed the title How is use_first_vaild_of different from dplyr coalesce? How is use_first_valid_of different from dplyr's coalesce? Oct 26, 2016
@sfirke
Copy link
Owner

sfirke commented Oct 26, 2016

Great catch. It's not that different :-( I didn't know about dplyr::coalesce.

I compared them and think they are close enough that use_first_valid_of should be deprecated and phased out, with a message to use dplyr::coalesce instead. There are two features in use_first_valid_of that are missing in coalesce:

  1. If there are any type mismatches in the input vectors, the output is coerced to a character vector (with a message). In coalesce, the operation fails with a warning. Stylistic difference, and okay with me.
  2. Factors are coerced to character in use_first_valid_of, while coalesce fails:
> x <- factor(c("a", NA))
> y <- factor(c("b", "c"))
> use_first_valid_of(x, y)
[1] "a" "c"
Warning message:
In use_first_valid_of(x, y) :
  Input vectors are of class 'factor' and will be coerced to class 'character' before combining
> coalesce(x, y)
[1] a    <NA>
Levels: a
Warning message:
In `[<-.factor`(`*tmp*`, i, value = 2L) :
  invalid factor level, NA generated

Course of action:

  • Deprecate use_first_valid_of with message referring to coalesce
  • Post dplyr issue identifying issue 2 above; acceptable if coalesce can't handle factors but in that case it should fail with an error message, not return the current output, IMO.

@rgknight
Copy link
Collaborator Author

rgknight commented Nov 9, 2016

I haven't looked too closely at convert_to_NA, but it may also be a candidate for deprecation with a pointer to dplyr's na_if?

@rgknight
Copy link
Collaborator Author

I submitted an issue on dplyr for this tidyverse/dplyr#2254

This was referenced Nov 28, 2016
sfirke added a commit that referenced this issue Dec 3, 2016
Deprecate `convert_to_NA` and `use_first_valid_of` per #74
@rgknight rgknight closed this as completed Dec 5, 2016
@sfirke
Copy link
Owner

sfirke commented Apr 12, 2017

Updating the vignette to remove the deprecated convert_to_NA and I kind of want to put it back after reading this discussion: tidyverse/dplyr#1972...

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants