I am using get_dupes() to detect duplicates in my dataset. The relevant variables are all numeric.
For the sex, age, agelength, LocIDorg, PopulationUniverse, and value variables, the rule is exact matching: two records count as duplicates only if these values are identical.
For the year.07 variable, the rule is slightly relaxed: if the difference between two records is smaller than 0.7, the two records are considered duplicates. Is it possible to apply this relaxed rule for duplicate matching in get_dupes()? Please let me know.
Hi! I think this is outside the scope of that function. I'd call this "fuzzy" duplicate identification, and in my experience it gets complex quickly, with each case requiring its own solution. You might try:
- Expanding your data to contain values within the full range of that variable, then looking for duplicates with that column included. That might be done with expand() or some kind of join. Also take a look at the fuzzyjoin package; because you're working with a numeric variable, this might not be so bad.
- Binning the variable somehow, then checking whether there are duplicate bins.
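As a rough illustration of the join idea, here is a minimal base R sketch. It is not something get_dupes() does; it assumes a toy data frame with made-up values, a hypothetical helper column `row_id`, and only a subset of the columns named in the question. It self-joins on the exact-match columns, then keeps pairs whose year.07 values differ by less than 0.7.

```r
# Toy data: column names follow the question, values are invented for illustration.
df <- data.frame(
  sex     = c(1, 1, 1, 2),
  age     = c(30, 30, 30, 30),
  year.07 = c(2000.0, 2000.5, 2003.0, 2000.2),
  value   = c(10, 10, 10, 10)
)
df$row_id <- seq_len(nrow(df))  # hypothetical helper column to identify rows

exact_cols <- c("sex", "age", "value")  # columns that must match exactly
tol <- 0.7                              # tolerance on year.07, from the question

# Self-join on the exact columns: each row is paired with every row sharing its key.
pairs <- merge(df, df, by = exact_cols, suffixes = c(".a", ".b"))

# Keep each unordered pair once, and only if year.07 differs by less than tol.
pairs <- pairs[pairs$row_id.a < pairs$row_id.b &
                 abs(pairs$year.07.a - pairs$year.07.b) < tol, ]

# Rows that take part in at least one fuzzy-duplicate pair.
fuzzy_dupes <- df[df$row_id %in% c(pairs$row_id.a, pairs$row_id.b), ]
print(fuzzy_dupes)
```

Note that this relation is not transitive: A can be within 0.7 of B, and B within 0.7 of C, without A being within 0.7 of C, so deciding what counts as one "group" of duplicates still needs a case-specific choice. The self-join can also grow large when many rows share the same exact-match key.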
It could be a good candidate for a StackOverflow question, if you can share reproducible data that folks can work with. Good luck!
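The binning idea can be sketched the same way. This is only an illustration under assumptions: the data are made up, the bin width is set equal to the 0.7 tolerance, and `year_bin` is a hypothetical column name. It also shows the main weakness of binning, which is that two values closer than the tolerance can still land in different bins.

```r
# Toy data: values are invented for illustration.
df <- data.frame(
  sex     = c(1, 1, 1),
  year.07 = c(2000.1, 2000.5, 2000.8)
)

width <- 0.7                              # bin width, assumed equal to the tolerance
df$year_bin <- floor(df$year.07 / width)  # hypothetical bin column

# Flag rows whose (sex, year_bin) combination occurs more than once.
key_cols <- c("sex", "year_bin")
is_dupe <- duplicated(df[key_cols]) | duplicated(df[key_cols], fromLast = TRUE)
binned_dupes <- df[is_dupe, ]
print(binned_dupes)

# Rows 2 and 3 differ by less than 0.7 but fall in different bins,
# so binning alone does not flag row 3.
print(abs(df$year.07[3] - df$year.07[2]) < width)
```

Once a bin column exists, it could be passed to get_dupes() in place of year.07, with the caveat above about pairs that straddle a bin boundary.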
# duplicates <- janitor::get_dupes(df, sex, age, agelength, LocIDorg, PopulationUniverse, year.07, value)