clean_names function simplification #102
Comments
Probably should have written a pull request, but this is what I have simplified the function down to:
The other approach you could take is to have the function modify the data frame in place, but I have heard that is bad practice. I'm wondering if something like this should be submitted as a pull request to tidyr, as it seems like a task any analyst would run into. Let me know what you think of this.
Hi and thanks for sharing these thoughts. Besides looking at the underlying clean_names.R file, please review the tests for clean_names for the full list of cases the function addresses. There are a lot of things I want to improve about janitor but refactoring If you are looking for a way to contribute to this function, you could take a shot at #96? |
Thanks for the feedback. I only suggest stringr because it would be faster than base gsub. Is there a particular reason why you chose base over a different package? I understand dependencies come at a cost, but I think it would help with other issues, such as verifying the encoding of bad characters in column names. With that, I'd definitely be willing to take a look at #96 to see if I can assist with this issue.
Can you open a separate issue for "/" failing? That's a bug that should be fixed. Speed isn't much of a concern here because you shouldn't have so many column names that it makes a difference. If you think I could see |
@rgknight Agreed, and sorry for such a late response on this. Maybe use something like a try-catch/switch statement that first detects any underscores and converts them to spaces, and then run the
I was looking over the documentation for your clean names function, and I think it can probably be simplified to something like this:
with stringr package:
It would probably be easiest to get the column names from a data frame, store them in a vector, do the transformations, and then re-assign the result.
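A minimal sketch of that extract, transform, re-assign workflow in base R. The data frame `df`, its column names, and the specific cleaning steps are made up for illustration; this is not janitor's actual clean_names logic.

```r
# Hypothetical data frame with messy column names (illustration only).
df <- data.frame(a = 1:2, b = 3:4, c = 5:6)
names(df) <- c("First Name", "Total ($)", "% Complete")

# 1. Pull the column names out into a character vector.
nms <- names(df)

# 2. Transform the vector. These steps are an assumed example pipeline.
nms <- tolower(nms)                               # lower-case everything
nms <- gsub("[^[:alnum:][:space:]]", "", nms)     # drop anything not alphanumeric or whitespace
nms <- trimws(nms)                                # strip leading/trailing whitespace
nms <- gsub("[[:space:]]+", "_", nms)             # collapse whitespace runs to one underscore

# 3. Re-assign the cleaned vector to the data frame.
names(df) <- nms
```

Because only `names(df)` is re-assigned, the data itself is left untouched, which keeps the transformation easy to test on a plain character vector.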
This may look a little complicated, but essentially it strings together POSIX character classes with an OR operator to capture pretty much any special character and replace it with nothing. You could also define your own class with brackets. Hope you find this useful.
http://www.regular-expressions.info/posixbrackets.html
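The original snippet did not survive on this page, so the following is only a rough sketch of the idea described above, using base gsub. The input vector `x` and the particular classes chosen are assumptions for illustration.

```r
# Hypothetical column names containing special characters.
x <- c("Total ($)", "pct.done!", "a/b")

# POSIX character classes joined with the alternation (OR) operator:
# punctuation, whitespace, or control characters.
posix_pattern <- "[[:punct:]]|[[:space:]]|[[:cntrl:]]"
posix_cleaned <- gsub(posix_pattern, "", x)

# A hand-written bracket expression listing exactly the characters to drop.
# Inside brackets, characters such as $ . ( ) are treated literally.
custom_pattern <- "[/$().! ]"
custom_cleaned <- gsub(custom_pattern, "", x)
```

The alternation could equally be written as a single bracket expression, `[[:punct:][:space:][:cntrl:]]`; the hand-written class is narrower but makes it explicit which characters are removed.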