Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feature suggestion: parameter to specify decimal separator on adorn_pct_formatting #451

Closed
mgacc0 opened this issue Jul 30, 2021 · 16 comments · Fixed by #510
Closed

feature suggestion: parameter to specify decimal separator on adorn_pct_formatting #451

mgacc0 opened this issue Jul 30, 2021 · 16 comments · Fixed by #510
Milestone

Comments

@mgacc0
Copy link

mgacc0 commented Jul 30, 2021

Would you consider of interest adding a parameter "decimal.mark" (or similar) to adorn_pct_formatting?

@sfirke
Copy link
Owner

sfirke commented Aug 2, 2021

Yes, that sounds like a good idea. This may be a blind spot for me as an American - is this for countries where what I know as 99.1% would be written 99,1% ?

@sfirke sfirke added the seeking comments Users and any interested parties should please weigh in - this is in a discussion phase! label Aug 2, 2021
@mgacc0
Copy link
Author

mgacc0 commented Aug 2, 2021

I'm from Europe.
Internally (on scripts and storage) I often use decimal point. But for reports and output sometimes it's a requirement to use decimal comma.
There are a lot of countries that use decimal comma: https://en.wikipedia.org/wiki/File:DecimalSeparator.svg

@sfirke
Copy link
Owner

sfirke commented Aug 7, 2021

Whoa that is a lot indeed! Yes, let's add this. I am not sure when I'll be able to implement it. I'd accept a pull request if someone wants to add it, or if none comes, then I'll hopefully add it down the road.

I'm trying to think through the implementation...

  • Are there other functions I'd need to modify? I don't think so...
  • Are there other packages that already do this? I could study them, borrow from them, or maybe call them directly

@mgacc0
Copy link
Author

mgacc0 commented Aug 8, 2021

adorn_pct_formatting converts numeric to string.
So it should be the only one where it's needed.

As a reference:
number from scales package https://github.com/r-lib/scales/blob/master/R/label-number.r
internally calls the base::format function.

@jzadra
Copy link
Contributor

jzadra commented Aug 9, 2021

Just a thought here that I haven't looked into due to lack of time, but isn't this a locale setting for R? Perhaps the function could look to that for info, rather than requiring user input?

@mgacc0
Copy link
Author

mgacc0 commented Aug 11, 2021

Use getOption("OutDec") as the default value for decimal.mark parameter:

adorn_pct_formatting <- function(dat, digits = 1, decimal.mark = getOption("OutDec"), rounding = "half to even", affix_sign = TRUE, ...)

@sfirke
Copy link
Owner

sfirke commented Aug 11, 2021

Ooh nice. Okay I can take a shot at this. Then when I get it on a branch, I'll post here and would love for someone in a comma-using locale to try it out.

@sfirke
Copy link
Owner

sfirke commented Jan 5, 2023

@mgacc0 I have a draft of this as #510. Can you take a look please? Hoping to finalize it in the next week or so to have it ready to go CRAN.

I ended up not adding a new argument. Instead it pulls from a user's locale, the value of getOption("OutDec"). There's a cost to adding a new argument, in that it breaks code that relied on column names being the 5th variable, like this test: https://github.com/sfirke/janitor/blob/main/tests/testthat/test-adorn-pct-formatting.R#L164. I was encouraging people to use the ,,, paradigm to get to the tidyselect cols argument and this would break that pattern.

My thinking was that using the user's locale would do the job virtually all the time, and in rare cases where it doesn't, they can set options(OutDec= ",") before calling the function.

@sfirke
Copy link
Owner

sfirke commented Jan 5, 2023

Also @YonghuiDong I see you gave 👍 above, please share if you have any thoughts on the PR linked above

@sfirke
Copy link
Owner

sfirke commented Jan 10, 2023

I haven't heard from any potential user of this feature. I'm going to merge to main branch so it's easier for people to install the dev version, then put out a call on Mastodon and see if I can find anyone to give feedback.

@jlacko
Copy link

jlacko commented Jan 11, 2023

I am sorry to report that the feature does not seem to be working - when using the Czech locale (we are one of the European troublemakers) I still get the dot as decimal separator; see the output below.

Also, on a personal note: it is very common practice for R users from non-English speaking countries to have set English locale as default. There are many reasons for that - such as that you don't really want to google for error messages in some marginal language such as mine, when there are troves of English resources just lying around. Also it is not the bestest of practice to have commas as decimal marks in your actual code; you want dots to make it universal.

As a consequence going by the decimal separator from locale can be unreliable. Many of the target users will have their development environment in English, but will be producing locally flavored output.

> sessionInfo()
R version 4.2.0 (2022-04-22 ucrt)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 10 x64 (build 19045)

Matrix products: default

locale:
[1] Czech_Czechia.1250
system code page: 65001

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] janitor_2.1.0.9000 dplyr_1.0.10      

loaded via a namespace (and not attached):
 [1] rstudioapi_0.14  magrittr_2.0.3   tidyselect_1.2.0 timechange_0.1.1 R6_2.5.1         rlang_1.0.6     
 [7] fansi_1.0.3      stringr_1.5.0    tools_4.2.0      utf8_1.2.2       cli_3.6.0        DBI_1.1.3       
[13] withr_2.5.0      ellipsis_0.3.2   assertthat_0.2.1 tibble_3.1.8     lifecycle_1.0.3  purrr_1.0.0     
[19] tidyr_1.2.1      vctrs_0.5.1      glue_1.6.2       snakecase_0.11.0 stringi_1.7.8    compiler_4.2.0  
[25] pillar_1.8.1     generics_0.1.3   lubridate_1.9.0  pkgconfig_2.0.3 


> mtcars %>%
+   tabyl(am, cyl) %>%
+   adorn_percentages("col") %>%
+   adorn_pct_formatting()
 am     4     6     8
  0 27.3% 57.1% 85.7%
  1 72.7% 42.9% 14.3%

@sfirke sfirke reopened this Jan 11, 2023
@sfirke
Copy link
Owner

sfirke commented Jan 11, 2023

Thank you @jlacko for the user perspective! To check something: would it be a viable workflow for a user who wanted commas in their output tabyl but not the rest of the time to set options(OutDec= ",") before printing their tabyls?

Based on your feedback I'm thinking there should be an explicit argument in the adorn_pct_formatting function that allows for decimal mark control. Do you/others think that the default value should be . or the locale-specific getOption("OutDec") ?

I don't understand why the locale marker is not getting picked up in your case, but maybe it doesn't matter if that workflow is undesirable anyway.

@jlacko
Copy link

jlacko commented Jan 11, 2023

Having thought about the issue a bit I believe the approach using OutDec option makes great sense.

For two reasons:

  • the use case for comma as decimal mark is not on actual code, but on generated output - documents in R markdown & now Quarto. In generating markdown & quarto documents is common practice to set options for stuff like {knitr} in the special setup chunk. So there is precedent.
  • the option(OutDec=",") guides not only janitor::adorn_pct_formatting() behavior specifically, but the decimal separator generally; such as when called via base::print(). So setting it once at document level should lead to a consistent behavior across all printing functions.

I confess I am liking the approach the more I think about it...

@sfirke
Copy link
Owner

sfirke commented Jan 12, 2023

Alright, so I'd encourage use of option(OutDec) regardless of locale it sounds like. And I guess in a case like yours where it isn't getting picked up via the locale setting, that's fine (nothing to be done about it) and you could just control it via that parameter?

In that case I could leave the current code as-is and I would just need to add to the function documentation that users can directly control the printing with options(OutDec= ",") at the .Rprofile, file, or chunk level. How does that sound?

@jlacko
Copy link

jlacko commented Jan 12, 2023

Yes, that is what I would do. And in encouraging the use of option(OutDec) in package documentation please mention that this setting also guides the behavior of base print, so both are aligned using one parameter - which is kind of neat, if you think of it :)

@sfirke
Copy link
Owner

sfirke commented Jan 30, 2023

I'll implement the documentation in a separate issue, to reflect that the code-writing part of this is done: #520

@sfirke sfirke closed this as completed Jan 30, 2023
@sfirke sfirke removed the seeking comments Users and any interested parties should please weigh in - this is in a discussion phase! label Jan 30, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

4 participants