Provide sort argument to tabyl() #406

Closed
richierocks opened this issue Oct 21, 2020 · 8 comments

Comments

@richierocks

In the same way that dplyr::count() has a sort argument, which sorts the results in descending order of count, it would be helpful if tabyl() had an equivalent sort argument.

In the 1-dimensional case, the behavior is fairly straightforward.

library(janitor)
mtcars %>% 
  tabyl(cyl, sort = TRUE)

would be equivalent to

library(janitor)
library(dplyr)
mtcars %>% 
  tabyl(cyl) %>%
  arrange(desc(n))
#>  cyl  n percent
#>    8 14 0.43750
#>    4 11 0.34375
#>    6  7 0.21875

For two dimensions, it's a little trickier. I think the best behavior is to only sort on the first dimension. That is,

mtcars %>% 
  tabyl(cyl, gear, sort = TRUE)

would be equivalent to

core <- mtcars %>% 
  tabyl(cyl, gear) %>% 
  attr(., "core")
ord <- order(rowSums(core), decreasing = TRUE)
core[ord, ]
#>   cyl  3 4 5
#> 3   8 12 0 2
#> 1   4  1 8 2
#> 2   6  2 4 1

You might want to allow the user to choose which dimension to sort on. For example, setting sort_dim = 1 would give the previous behavior, and sort_dim = 2 would sort the column order by decreasing column sums, and sort_dim = NA would be the default of no sorting. I'm not sure if that is too complicated an interface though.
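
As a rough sketch of the proposed interface (not janitor code), the sort_dim semantics could look like this for any count table shaped like a two-way tabyl. The helper name sort_crosstab() is hypothetical, and the table is built with base R so the example runs without janitor.

```r
# Hypothetical helper, not part of janitor: reorder a two-way count table.
# `tab` is a data frame whose first column holds the row labels and whose
# remaining columns hold counts (the same shape a two-way tabyl has).
sort_crosstab <- function(tab, sort_dim = NA) {
  if (is.na(sort_dim)) {
    return(tab)  # default: leave the order untouched
  }
  counts <- tab[, -1, drop = FALSE]
  if (sort_dim == 1) {
    # Reorder rows by decreasing row sum.
    ord <- order(rowSums(counts), decreasing = TRUE)
    tab[ord, , drop = FALSE]
  } else if (sort_dim == 2) {
    # Reorder count columns by decreasing column sum; labels stay first.
    ord <- order(colSums(counts), decreasing = TRUE)
    tab[, c(1, ord + 1), drop = FALSE]
  } else {
    stop("sort_dim must be NA, 1, or 2")
  }
}

# Build a cyl-by-gear count table from mtcars using base R only.
counts <- as.data.frame.matrix(table(mtcars$cyl, mtcars$gear))
tab <- data.frame(cyl = rownames(counts), counts,
                  check.names = FALSE, stringsAsFactors = FALSE)
rownames(tab) <- NULL

by_rows <- sort_crosstab(tab, sort_dim = 1)
by_cols <- sort_crosstab(tab, sort_dim = 2)
```

With the mtcars counts, by_rows lists cyl in the order 8, 4, 6, matching the output above, while by_cols keeps the gear columns as 3, 4, 5 because their sums (15, 12, 5) are already decreasing.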

@sfirke
Owner

sfirke commented Oct 22, 2020

It's funny, back when one-way tabyl was a separate function it had a sort argument exactly like that. Then when it got merged with two-way tabyl, I didn't see a clear way to implement sort and so I removed it. Some past discussion is at #351. I appreciate your example of what sort could mean in the context of two-way tabyls and would be curious to hear from other users if such two-way sort is something they do. If so, the advantage to sort is that it would also sort the underlying core so that later adorn_ functions would work.

I could be convinced; I do like sort for one-way tabyls. But then it clutters the interface for two-way, and it's not so bad to type %>% arrange(desc(n)).

@jzadra
Contributor

jzadra commented Oct 22, 2020

My two cents: I find that it's much simpler to just use %>% arrange(desc(n)) when I want to reorder something. I feel like a sort argument for a two-way table, one that either sorts only the first dimension or requires the user to specify a dimension, is less intuitive and adds unnecessary complexity.

@richierocks
Author

If it makes any difference, here's the context on why I want this.

I'm trying to teach exploration of categorical variables to some fairly new R users, and I want to end up with something like

library(dplyr)
mtcars %>% 
  count(carb, sort = TRUE) %>% 
  mutate(percent = 100 * n / sum(n))

So it's just the 1-way case, but it means I have to explain a lot of subtleties like "where did that n column come from?", and "why is percent calculated like that?".

I was hoping to use janitor to avoid those sorts of code discussions and just focus on the dataset. Unfortunately, the equivalent janitor code is still two lines and requires a little bit of thinking about for people who aren't that confident with data manipulation.

library(dplyr)
library(janitor)
mtcars %>% 
  tabyl(carb) %>% 
  arrange(desc(n))

Since I'm optimizing for code that requires minimal explaining, it seems like

library(janitor)
mtcars %>% 
  tabyl(carb, sort = TRUE)

or possibly

library(janitor)
mtcars %>% 
  tabyl(carb, sort_dim = 1)

would work really nicely.
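
For comparison, a small teaching helper along these lines could hide the arrange step entirely. This is only a sketch in base R: count_sorted() and its value column name are made up for illustration and are not part of janitor or dplyr.

```r
# Hypothetical teaching helper: one-way counts with proportions,
# sorted by decreasing count. Uses base R only.
count_sorted <- function(dat, column) {
  counts <- table(dat[[column]])
  out <- data.frame(
    value = names(counts),
    n = as.integer(counts),
    stringsAsFactors = FALSE
  )
  # Proportion of all rows, on the same 0-1 scale as tabyl()'s percent.
  out$percent <- out$n / sum(out$n)
  out <- out[order(out$n, decreasing = TRUE), ]
  rownames(out) <- NULL
  out
}

res <- count_sorted(mtcars, "cyl")
print(res)
```

For cyl this gives the same ordering and values as the sorted tabyl(cyl) output earlier in the thread (8, 4, 6 with counts 14, 11, 7).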

@pstils

pstils commented Dec 8, 2021

Please add this back in, or fix sorting. dplyr::arrange(desc()) works to an extent, but it does break. Here's what I've found:
tabyl() %>% arrange(desc()) # that's fine

tabyl() %>% arrange(desc()) %>% adorn_totals() # also fine

tabyl() %>% arrange(desc()) %>% adorn_totals() %>% adorn_percentages() # still fine

tabyl() %>% arrange(desc()) %>% adorn_totals() %>% adorn_percentages() %>% adorn_pct_formatting() # table's looking nice....

tabyl() %>% arrange(desc()) %>% adorn_totals() %>% adorn_percentages() %>% adorn_pct_formatting() %>% adorn_ns() # order is now completely screwed up.

The only workaround I've found is to use adorn_ns(position = "front"), which kind of allows me to use arrange(desc()) afterwards, but then the "Total" row is no longer at the bottom.

Apologies if I've failed to read something I should have read before posting.

@sfirke
Owner

sfirke commented Dec 9, 2021

@pstils is your example a two-way tabyl, like mtcars %>% tabyl(am, cyl)? If so, I think this should be fixed via this other issue: #407. It's a very detailed look, but basically, if the problem is that the original non-sorted Ns are adorning onto the now-sorted tabyl, I think I can fix it there without adding a sort argument to tabyl().
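
To illustrate the suspected failure mode in isolation (a toy, not janitor's internals): if counts are stored separately from the displayed table and reattached by position, sorting the displayed table misaligns them, whereas matching on the row label does not. The object names below are hypothetical.

```r
# Stored counts in their original order (stands in for the untouched Ns).
core <- data.frame(am = c("0", "1"), n = c(3, 8), stringsAsFactors = FALSE)

# The displayed table after the user sorts it by decreasing count.
displayed <- core[order(core$n, decreasing = TRUE), ]
rownames(displayed) <- NULL

# Reattaching by position pairs each sorted row with the wrong count.
by_position <- core$n

# Reattaching by row label keeps the counts aligned with the sorted rows.
by_label <- core$n[match(displayed$am, core$am)]

print(data.frame(am = displayed$am, shown = displayed$n,
                 by_position = by_position, by_label = by_label))
```

Here the by_position column disagrees with the shown counts, while by_label agrees with them.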

@sfirke
Owner

sfirke commented Dec 9, 2021

@pstils I just pushed an update to the main branch that I think will address what you're talking about. See if your example above now works after you install the dev version from GitHub?

Example:

mtcars %>% tabyl(am, cyl) %>% arrange(desc(`4`)) %>% adorn_totals() %>% adorn_percentages() %>% adorn_pct_formatting() %>% adorn_ns()

    am          4         6          8
     1 61.5%  (8) 23.1% (3) 15.4%  (2)
     0 15.8%  (3) 21.1% (4) 63.2% (12)
 Total 34.4% (11) 21.9% (7) 43.8% (14)

@sfirke
Owner

sfirke commented Dec 9, 2021

@richierocks sorry for the slow response and for this getting away from what you brought up. I don't plan to re-implement a sort argument, so I will close this issue. I'm afraid arrange() is the best bet for now, though I agree it's a little trickier for beginners.

sfirke closed this as completed Dec 9, 2021

@pstils

pstils commented Dec 9, 2021

@sfirke Thanks Sam, despite my non-reprex that was exactly the situation. I've got the dev version and it now works exactly as intended with arrange(desc()) after the tabyl(). Thank you again.
