Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Changes to make_clean_names() breaks packages on CRAN #513

Closed
sfirke opened this issue Jan 16, 2023 · 13 comments
Closed

Changes to make_clean_names() breaks packages on CRAN #513

sfirke opened this issue Jan 16, 2023 · 13 comments
Assignees
Milestone

Comments

@sfirke
Copy link
Owner

sfirke commented Jan 16, 2023

Re: this matter #495 (comment)

The breaking change to how duplicate names are assigned suffixes _1 _2 etc. breaks tests in at least the BFS and nflfastR packages.
To resolve it, I see at least these options:

  1. Refactor that feature so that it adds an allow_dupes argument and preserves the existing behavior of make_clean_names. Submit this month to CRAN.
  2. Same as Option 1 but do it after the 2.2 release, pulling this change out for now.
  3. Pull the change out for this 2.2 release, then merge back into main as-is with the breaking change. Notify the affected package maintainers, help them fix if necessary, give them time to resubmit to CRAN, and then submit janitor to CRAN again.

I lean toward 1 (if we have bandwidth now) or 2 (if not). Seeing a couple of CRAN packages that depended on the old behavior makes me think of all the users whose scripts will be similarly affected. I don't think our change in the make_clean_names output is an improvement, it's neutral IMO - and while it improves the janitor codebase, I think avoiding major disruption outweighs that.

@billdenney
Copy link
Collaborator

I lean toward option 3. I prefer the new behavior, but having breaking behavior will be problematic with short notice.

@sfirke
Copy link
Owner Author

sfirke commented Jan 19, 2023

Sounds like either way the immediate move is to remove the breaking change for 2.2 to buy some time. Time for me to try out git revert.

I wonder also if the next release would be the a good time to implement #240 and split the package up, especially if janitor became tidytab and custodian (??) or something and janitor got left behind in maintenance mode...

@sfirke sfirke self-assigned this Jan 19, 2023
@billdenney
Copy link
Collaborator

I like the idea of implementing #240 (as I have for a while). I'd lean toward janitor keeping the name and tidytab being split off. (I like the name janitor, and it is well-known in the R community, too!) The breaking change of make_clean_names() giving slightly different numbering doesn't seem like it's worth a whole-scale switch in the package, to me.

@billdenney
Copy link
Collaborator

I've been thinking a bit more about it. How about we do the following for the release:

  • Make a change for make_clean_names(janitor_version = "2.1.0") which will use the old behavior with a deprecation warning.
  • If the janitor_version option is NULL or missing, it uses the new method.
  • Suggest PRs for the new behavior for the dependent packages.
  • Switch the default to janitor_version = NULL in the next version of janitor (after the other packages have released on CRAN-- and probably in at least 6 months)
  • Remove the janitor_version = "2.1.0" code some time in the relatively distant future (at least 2024)

@sfirke
Copy link
Owner Author

sfirke commented Jan 20, 2023

Under that paradigm, would the intended behavior for current janitor users be: see the deprecation warning upon updating to the new version, start using dat %>% clean_names(janitor_version = NULL) in all new projects to future-proof the code?

Then when the next version (2.3 or 3.0) comes out, they can go back to dat %>% clean_names() ? I think most users won't permanently adopt the practice of always specifying janitor_version.

And then I imagine we'd leave that janitor_version in for years, as you imply, to aid in reproducibility of old scripts.

@billdenney
Copy link
Collaborator

Yeah, I think that current users would get a warning that there will be a change in the next version. The goal is that we enable a smooth transition path for users and packages.

And, yeah, we would leave janitor_version in for years.

@sfirke
Copy link
Owner Author

sfirke commented Jan 24, 2023

I keep thinking about this and feel increasingly confident that we should revert back to the old behavior. Here's my analysis of the tradeoffs of the new approach:

Pros

  • More elegant codebase of make_clean_names itself
  • Slightly more elegant UI

Neutral

  • The actual numbering of duplicates

Cons

  • Additional work for us now and in the future. Implementing the interim solution, reaching out to other packages to help them with the change, implementing the next stage of the solution and submitting to CRAN again...
  • An unknown amount of work for janitor's users, who will have to pay attention to the warnings they get, start specifying janitor_version, and possibly deal with breakage of old code.
  • Overall codebase gets more complicated as we have to support an addition argument and its implementation.

I don't think I'm up for committing to the additional maintainer work, now and in the future, but even if I was, I worry that this will be a net-negative for the janitor user base who will mostly experience additional friction.

I propose reverting to Jason's earlier version of #497 where the while loop can be avoided. We could avoid it with allow_dupes as he first drew it up. Or if we can document it well, we could instead allow unique_sep = NULL to trigger avoiding the while-loop. Then we would be exposing an existing underlying argument rather than adding a new one while resolving that slight conflict that Jason raised. As long as users who want to preserve duplicated names can find that in the docs.

@billdenney
Copy link
Collaborator

That makes sense to me.

I've been working through a project of mine from 10 years ago this morning. I'm really appreciating stable interfaces, and I've been struggling to figure out why I did some of the things that I did then. More friction should be for a strong reasons, and I think that we can accomplish this in a way that preserves backward compatibility as you outlined.

@sfirke
Copy link
Owner Author

sfirke commented Jan 25, 2023

Thanks, Bill. I will take a crack at implementing as described above, starting with Jason's initial work, and I'll ping you when it's ready for review.

@sfirke
Copy link
Owner Author

sfirke commented Jan 29, 2023

We could avoid it with allow_dupes as he first drew it up. Or if we can document it well, we could instead allow unique_sep = NULL to trigger avoiding the while-loop.

I started trying to do the latter and I don't think it's viable. because the default call to make_clean_names uses an implied unique_sep = NULL already when it calls to_any_case. So we'd be getting into territory of, act differently when the janitor user explicitly says unique_sep = NULL rather than when it's not specified at all and defaults to that value.

Too nuanced and hazy, I just went with allow_dupes as originally discussed in that issue and PR. Not perfect but felt like the most practical way forward.

@sfirke
Copy link
Owner Author

sfirke commented Jan 29, 2023

This is fixed in #519, @billdenney I'll tag you for review - let me know whether you're able to review this coming week.

@billdenney
Copy link
Collaborator

I'm on it.

@sfirke
Copy link
Owner Author

sfirke commented Jan 30, 2023

Fixed by #519

@sfirke sfirke closed this as completed Jan 30, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants