
Using tabyl with index-numbers #317

Closed
gekess opened this issue Nov 25, 2019 · 11 comments

gekess commented Nov 25, 2019

Feature requests

I would like to use tabyl for recurring procedures with a map() function, looping over several variables in a data frame without naming them, but rather having the loop access the next variable in line. It is possible with table(), but I don't like table()'s properties and would rather do it with tabyl().

Here is an example with table():

map(mtcars[1:6], function(x) table(x,mtcars$gear))

With tabyl I would have to specify each tabulation, correct?

tabyl(mtcars, mpg, gear)
tabyl(mtcars, cyl, gear)
# etc.
tabyl(mtcars, wt, gear)

That seems tedious to me, and I would prefer a looping solution.

jzadra (Contributor) commented Nov 25, 2019

suppressPackageStartupMessages(require(tidyverse))
suppressPackageStartupMessages(require(janitor))


mtcars %>% 
  pivot_longer(-gear) %>% 
  group_split(name) %>% 
  map(function(x) {
    varname <- x %>% distinct(name) %>% pull(name) 
    
    x %>% 
      rename(!!varname := value) %>% 
      tabyl(!!sym(varname), gear)
  })
#> [[1]]
#>  am  3 4 5
#>   0 15 4 0
#>   1  0 8 5
#> 
#> [[2]]
#>  carb 3 4 5
#>     1 3 4 0
#>     2 4 4 2
#>     3 3 0 0
#>     4 5 4 1
#>     6 0 0 1
#>     8 0 0 1
#> 
#> [[3]]
#>  cyl  3 4 5
#>    4  1 8 2
#>    6  2 4 1
#>    8 12 0 2
#> 
#> [[4]]
#>   disp 3 4 5
#>   71.1 0 1 0
#>   75.7 0 1 0
#>   78.7 0 1 0
#>   79.0 0 1 0
#>   95.1 0 0 1
#>  108.0 0 1 0
#>  120.1 1 0 0
#>  120.3 0 0 1
#>  121.0 0 1 0
#>  140.8 0 1 0
#>  145.0 0 0 1
#>  146.7 0 1 0
#>  160.0 0 2 0
#>  167.6 0 2 0
#>  225.0 1 0 0
#>  258.0 1 0 0
#>  275.8 3 0 0
#>  301.0 0 0 1
#>  304.0 1 0 0
#>  318.0 1 0 0
#>  350.0 1 0 0
#>  351.0 0 0 1
#>  360.0 2 0 0
#>  400.0 1 0 0
#>  440.0 1 0 0
#>  460.0 1 0 0
#>  472.0 1 0 0
#> 
#> [[5]]
#>  drat 3 4 5
#>  2.76 2 0 0
#>  2.93 1 0 0
#>  3.00 1 0 0
#>  3.07 3 0 0
#>  3.08 2 0 0
#>  3.15 2 0 0
#>  3.21 1 0 0
#>  3.23 1 0 0
#>  3.54 0 0 1
#>  3.62 0 0 1
#>  3.69 0 1 0
#>  3.70 1 0 0
#>  3.73 1 0 0
#>  3.77 0 0 1
#>  3.85 0 1 0
#>  3.90 0 2 0
#>  3.92 0 3 0
#>  4.08 0 2 0
#>  4.11 0 1 0
#>  4.22 0 1 1
#>  4.43 0 0 1
#>  4.93 0 1 0
#> 
#> [[6]]
#>   hp 3 4 5
#>   52 0 1 0
#>   62 0 1 0
#>   65 0 1 0
#>   66 0 2 0
#>   91 0 0 1
#>   93 0 1 0
#>   95 0 1 0
#>   97 1 0 0
#>  105 1 0 0
#>  109 0 1 0
#>  110 1 2 0
#>  113 0 0 1
#>  123 0 2 0
#>  150 2 0 0
#>  175 2 0 1
#>  180 3 0 0
#>  205 1 0 0
#>  215 1 0 0
#>  230 1 0 0
#>  245 2 0 0
#>  264 0 0 1
#>  335 0 0 1
#> 
#> [[7]]
#>   mpg 3 4 5
#>  10.4 2 0 0
#>  13.3 1 0 0
#>  14.3 1 0 0
#>  14.7 1 0 0
#>  15.0 0 0 1
#>  15.2 2 0 0
#>  15.5 1 0 0
#>  15.8 0 0 1
#>  16.4 1 0 0
#>  17.3 1 0 0
#>  17.8 0 1 0
#>  18.1 1 0 0
#>  18.7 1 0 0
#>  19.2 1 1 0
#>  19.7 0 0 1
#>  21.0 0 2 0
#>  21.4 1 1 0
#>  21.5 1 0 0
#>  22.8 0 2 0
#>  24.4 0 1 0
#>  26.0 0 0 1
#>  27.3 0 1 0
#>  30.4 0 1 1
#>  32.4 0 1 0
#>  33.9 0 1 0
#> 
#> [[8]]
#>   qsec 3 4 5
#>  14.50 0 0 1
#>  14.60 0 0 1
#>  15.41 1 0 0
#>  15.50 0 0 1
#>  15.84 1 0 0
#>  16.46 0 1 0
#>  16.70 0 0 1
#>  16.87 1 0 0
#>  16.90 0 0 1
#>  17.02 1 1 0
#>  17.05 1 0 0
#>  17.30 1 0 0
#>  17.40 1 0 0
#>  17.42 1 0 0
#>  17.60 1 0 0
#>  17.82 1 0 0
#>  17.98 1 0 0
#>  18.00 1 0 0
#>  18.30 0 1 0
#>  18.52 0 1 0
#>  18.60 0 1 0
#>  18.61 0 1 0
#>  18.90 0 2 0
#>  19.44 1 0 0
#>  19.47 0 1 0
#>  19.90 0 1 0
#>  20.00 0 1 0
#>  20.01 1 0 0
#>  20.22 1 0 0
#>  22.90 0 1 0
#> 
#> [[9]]
#>  vs  3  4 5
#>   0 12  2 4
#>   1  3 10 1
#> 
#> [[10]]
#>     wt 3 4 5
#>  1.513 0 0 1
#>  1.615 0 1 0
#>  1.835 0 1 0
#>  1.935 0 1 0
#>  2.140 0 0 1
#>  2.200 0 1 0
#>  2.320 0 1 0
#>  2.465 1 0 0
#>  2.620 0 1 0
#>  2.770 0 0 1
#>  2.780 0 1 0
#>  2.875 0 1 0
#>  3.150 0 1 0
#>  3.170 0 0 1
#>  3.190 0 1 0
#>  3.215 1 0 0
#>  3.435 1 0 0
#>  3.440 1 2 0
#>  3.460 1 0 0
#>  3.520 1 0 0
#>  3.570 1 0 1
#>  3.730 1 0 0
#>  3.780 1 0 0
#>  3.840 1 0 0
#>  3.845 1 0 0
#>  4.070 1 0 0
#>  5.250 1 0 0
#>  5.345 1 0 0
#>  5.424 1 0 0

Created on 2019-11-25 by the reprex package (v0.3.0)

gekess (Author) commented Nov 27, 2019

Thanks jzadra,

I would like to understand your code better to improve my own skills: from what I gathered, you transform the data from wide to long format and then split the data frame into list elements, each containing "gear" plus the variable to be tabulated and its corresponding values.
Doesn't this approach leave behind the tidyverse idea of cases in rows and variables in columns? In base R it seemed practical to me to use indices for rows and columns. Then I read that Hadley W. doesn't approve of them, but in that post he didn't explain why, and dplyr seems to follow a different logic. Could you point me to some literature or posts where this issue is discussed and explained?

And as an addendum: would you be so kind as to add a short comment after each line explaining your rationale for it? What is the goal of your transformation?

jzadra (Contributor) commented Nov 27, 2019

Certainly!

Yes, going to long does go against the tidy data standard, but only for a moment, because in the next step we split the long format into separate tables that are each tidy again. Going wide to long is useful on many occasions when you want to functionalize something (e.g. using map) or vectorize some operation over multiple columns. Often you then return to wide afterwards.
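
A minimal sketch of that wide → long → wide round trip in isolation (illustrative only; the scaling step is just a placeholder for whatever operation you want applied to every column at once):

suppressPackageStartupMessages(require(tidyverse))

mtcars %>% 
  tibble::rownames_to_column("car") %>% 
  pivot_longer(-c(car, gear)) %>%                        # wide -> long: one row per car/variable pair
  group_by(name) %>% 
  mutate(value = as.numeric(scale(value))) %>%           # same operation applied to every variable at once
  ungroup() %>% 
  pivot_wider(names_from = name, values_from = value)    # back to the original wide shape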

The reason for not using indices, in my opinion, is that they are incredibly prone to breakage. If you change something in an earlier processing step that adds a column or a row, all of a sudden your indices aren't referring to what they used to refer to. When you refer to things by name, on the other hand, it doesn't matter where they move to.
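
To make that concrete (an illustrative sketch; df and df2 are made-up objects):

df  <- data.frame(mpg = mtcars$mpg, gear = mtcars$gear)
df2 <- cbind(id = seq_len(nrow(df)), df)   # an earlier step now adds an id column

df[[1]]       # mpg
df2[[1]]      # now the id column -- same index, different variable
df2[["mpg"]]  # still mpg, no matter where the column sits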

My general goal below is to avoid typing the names of all the columns we want to compare with gear. To do that I make a list of separate tables, each with a column for gear and a column for the other variable's values that has the same name in every list table: value (the default from pivot_longer; you could name it whatever you want). We also have a name column that we use to split into separate tables, and from which we can grab the name of the variable so we know what we're looking at later.


1. Reshape to long format, so that the gear column repeats alongside a name key column and a value column:

mtcars %>% 
  pivot_longer(-gear) %>% 

2. We want to end up with a separate table for each non-gear variable. group_split() generates a list of tibbles based on the name value:

  group_split(name) %>%

3. Now we write a function that does the same thing to each tibble in the list. Inside the function environment, x becomes the table object, the same as we'd get if we pulled one of the tables out of the list and assigned it to x:

  map(function(x) { 

4. Because we want to include the name of the variable we are crossing with gear, we extract it here from the column that holds it, to be used later. Since every row has the same name, distinct() gets us down to one row, and since we just want a character object, pull() grabs it out of the tibble. (Base R's split() used to name the list elements, which would have made this unnecessary. Hadley refuses to do this for group_split() and it isn't clear why, but they are adamant about it, so we are left with this small extra step.)

    varname <- x %>% distinct(name) %>% pull(name) 

5. Again, x is the tibble, but it has gear, name, and value columns. If we don't do the rename, the tabyls will all just show value by gear, so we won't know which variable is being crossed with gear. The rename is done using rlang, which is definitely more advanced R (you can find it in Hadley's Advanced R book). The !! says to look for the varname object outside of the x object, and := takes the place of = when we do an assignment whose left-hand side has to be evaluated first (i.e. we don't want the new name to literally be !!varname, we want it to be mpg, etc.). So the code needs to evaluate as rename(mpg = value), and we have to get at the value of varname first.

    x %>% 
      rename(!!varname := value) %>% 

6. Now, finally, the tabyl. If we hadn't renamed value, this part would simply be tabyl(value, gear). But since the other variable has a different name each time through the map loop, we need to use that variable's name in the code. Again, !! says to look for the varname object outside of x, and sym() converts the character object varname into a symbol. Just as we don't write tabyl("mpg", gear), where "mpg" would be taken as a literal string rather than an object, passing the raw varname would be the same as passing "mpg", when we actually want the bare, unquoted symbol mpg.

      tabyl(!!sym(varname), gear)
  })
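
If the tidy evaluation pieces (!!, sym(), :=) are new to you, the same ideas in isolation may be easier to follow. A small standalone sketch (varname and new_col are just illustrative objects, not part of the answer above):

suppressPackageStartupMessages(require(tidyverse))
suppressPackageStartupMessages(require(janitor))

varname <- "cyl"                       # a column name stored as a character string

# sym() turns the string into the bare symbol cyl and !! splices it in,
# so this evaluates as tabyl(mtcars, cyl, gear):
mtcars %>% 
  tabyl(!!sym(varname), gear)

# := lets the name on the left-hand side come from a variable,
# so this evaluates as mutate(cyl_label = as.character(cyl)):
new_col <- paste0(varname, "_label")
mtcars %>% 
  mutate(!!new_col := as.character(!!sym(varname))) %>% 
  select(cyl, !!sym(new_col)) %>% 
  head()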


Hope that helps!

sfirke (Owner) commented Nov 30, 2019

Thank you @jzadra for providing this thoughtful, detailed response! 🙌

sfirke closed this as completed Nov 30, 2019
gekess (Author) commented Nov 30, 2019

@jzadra : wonderful!!! Appreciate this a lot!


RachelK1994 commented Jun 24, 2021

Hi there!

I found this extremely helpful. I essentially replicated the code above for my own dataset (switching the order of variables in the table slightly) and it worked once, but since I tried to clean it up slightly I've repeatedly generated this error:

Error in :=(!!varname, value) :
Check that is.data.table(DT) == TRUE. Otherwise, := and :=(...) are defined for use in j, once only and in particular ways. See help(":=").

Any help would be hugely appreciated (@jzadra I wonder whether you have any quick thoughts?)!

Best,
Rachel

PS New to this forum so apologies in advance if I've missed some of the conventions.

library(readxl)
TPRdata <- read_excel("TPR-061121.xlsx")
View(TPRdata)
install.packages("janitor")
install.packages("reshape")
library(ggplot2)
library(lubridate)
library(tidyr)
library(tidyverse)
library(plyr)
library(dplyr)
library(glue)
library(purrr)
library(data.table)
library(reshape)
library(janitor)
Iffunction2 <- function(x) ifelse(x == "NA", "NA",ifelse(x == "-1", "Suppressed", ifelse(is.null(x), "Blank", ifelse(x == "Not Provided", "Not Provided", "Value"))))
TPRdata2 <- mutate_all(TPRdata, Iffunction2)
TPRdata2$state2 <- TPRdata$state
View(TPRdata2)
TPRdata2 %>% 
  pivot_longer(-state2) %>% 
  group_split(name) %>% 
  map(function(x) {
    varname <- x %>% distinct(name) %>% pull(name) 
    
    x %>% 
      rename(!!varname := value) %>% 
      tabyl(state2, !!sym(varname)) %>%
      adorn_percentages("row") %>%
      adorn_pct_formatting(digits = 2) %>%
      adorn_ns()
  })

sfirke (Owner) commented Jun 24, 2021

I can't say for sure, but it looks like := is used in the data.table package, and since you load that package after the tidyverse packages, it's calling the data.table version of that operator. Try not loading data.table and see if that fixes your problem.
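
A related thing worth trying (a hedged sketch, not a confirmed diagnosis): some of the other attached packages, such as plyr and reshape, also define a rename(), so masking could be involved. Calling the verbs with explicit namespaces removes any dependence on attach order. This version uses mtcars so it runs as-is with the tidyverse attached for %>%; swap in TPRdata2/state2 for the real data:

mtcars %>% 
  tidyr::pivot_longer(-gear) %>% 
  dplyr::group_split(name) %>% 
  purrr::map(function(x) {
    varname <- x %>% dplyr::distinct(name) %>% dplyr::pull(name) 
    
    x %>% 
      dplyr::rename(!!varname := value) %>%          # dplyr's rename() handles := via tidy evaluation
      janitor::tabyl(!!rlang::sym(varname), gear) %>% 
      janitor::adorn_percentages("row") %>% 
      janitor::adorn_pct_formatting(digits = 2) %>% 
      janitor::adorn_ns()
  })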

@RachelK1994

Thank you very much for getting back to me @sfirke! I've tried this, but unfortunately I then get the message "Error in :=(!!varname, value) : could not find function ":="". Is it just the dplyr package I need to load, or is there perhaps another package I'm missing (or perhaps I've loaded another redundant package that is causing the issue)?

Best, Rachel

sfirke (Owner) commented Jun 25, 2021 via email

jzadra (Contributor) commented Jun 25, 2021 via email

@mariahelmrich

I have spent days trying to figure this out! Your code worked for me, thank you!!
