-
-
Notifications
You must be signed in to change notification settings - Fork 667
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Implement dehumanize method #956
Conversation
Branch needs to be updated
Codecov Report
@@ Coverage Diff @@
## master #956 +/- ##
=======================================
Coverage 99.74% 99.75%
=======================================
Files 10 10
Lines 1944 2014 +70
Branches 312 323 +11
=======================================
+ Hits 1939 2009 +70
Misses 4 4
Partials 1 1
Continue to review full report at Codecov.
|
Hey @anishnya from a quick skim over (more detailed review will follow) I really like this PR and think merging it is quite possible. If it only works for 37/54 locales at the moment we can encode a locale check and work on improving the method. With dehumanize and all the PR attempts before I think we've let perfect be the enemy of good sometime. |
* added parameter weekday and thus ability to choose start day when using span in combination with frame == "week" * changed default value for new parameter weekday. * added test for new parameter weekday when using span and frame == "week" * added comma after new parameter weekday. * Update arrow/arrow.py Co-authored-by: Chris <30196510+systemcatch@users.noreply.github.com> * changed parameter name from weekday to week_start. * extended doc string to highlight new parameter week_start. * raise ValueError if 1 <= week_start <= 7 is not met * renamed weekday to week_start in comment. * changed if-else block to one liner. Updated comments. * Update arrow/arrow.py add blank line at the end of the docstring Co-authored-by: Chris <30196510+systemcatch@users.noreply.github.com> * Remove trailing whitespace * added new line at end of doc string. * Fix doc issue * extended test_span_week by further test cases. Co-authored-by: Jad Chaar <jadchaar@users.noreply.github.com> Co-authored-by: Chris <30196510+systemcatch@users.noreply.github.com> Co-authored-by: Jad Chaar <jad.chaar@gmail.com>
* Progress toward locale validation * standardize and validate locales * Add tests * Clean up tests * Use new locale name in error * Remove useless comments * Address comments * English * Remove default locale from test_parser
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hey @anishnya I've done a first pass with some comments. I think you should create a list of supported locales in constants.py
and perform a check with the provided locale, raising an error if it's not supported.
arrow/arrow.py
Outdated
"""Returns an arrow object that represents the | ||
date/time represented by a humanzied arrow time string. | ||
""" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We need this docstring to be like the humanize
one with params and usage examples.
arrow/arrow.py
Outdated
# Sign logic | ||
sign_val = 0 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not needed (it is set below or throws an error).
arrow/arrow.py
Outdated
locale_obj = locales.get_locale(locale) | ||
|
||
# Create a regex pattern object for numbers | ||
num_pattern = re.compile(r"[0-9]+") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You might be able to use \d
here instead.
arrow/arrow.py
Outdated
# If no number matches set change value to be one | ||
if not num_match: | ||
change_value = 1 | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No need for a space here.
tests/test_arrow.py
Outdated
# Errors with humanize cause this test case to fail (issue reported) | ||
""" def test_mixed_granularity_month(self, locale_list_no_weeks): | ||
|
||
for lang in locale_list_no_weeks: | ||
|
||
arw = arrow.Arrow(2000, 1, 10, 5, 55, 0) | ||
past = arw.shift(months=-3, days=-23, seconds=-1) | ||
future = arw.shift(months=3, days=23, seconds=1) | ||
|
||
past_string = past.humanize(arw, locale=lang, granularity=["month","day","second"]) | ||
future_string = future.humanize( | ||
arw, locale=lang, granularity=["month","day","second"] | ||
) | ||
|
||
print(future_string) | ||
print(past_string) | ||
|
||
assert arw.dehumanize(past_string, locale=lang) == past | ||
assert arw.dehumanize(future_string, locale=lang) == future """ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This can be removed now the issue is filed.
arrow/arrow.py
Outdated
|
||
return current_time.shift( | ||
seconds=sign_val * time_object_info["seconds"], | ||
minutes=sign_val * time_object_info["minutes"], | ||
hours=sign_val * time_object_info["hours"], | ||
days=sign_val * time_object_info["days"], | ||
weeks=sign_val * time_object_info["weeks"], | ||
months=sign_val * time_object_info["months"], | ||
years=sign_val * time_object_info["years"], | ||
) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
return current_time.shift( | |
seconds=sign_val * time_object_info["seconds"], | |
minutes=sign_val * time_object_info["minutes"], | |
hours=sign_val * time_object_info["hours"], | |
days=sign_val * time_object_info["days"], | |
weeks=sign_val * time_object_info["weeks"], | |
months=sign_val * time_object_info["months"], | |
years=sign_val * time_object_info["years"], | |
) | |
time_changes = {k: sign_val*v for k, v in time_object_info.items()} | |
return current_time.shift(**time_changes) |
Or something along those lines, saves having to do all the multiplications in the return.
arrow/arrow.py
Outdated
else: | ||
raise ValueError( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We need lots of tests where garbage/bad info is passed to dehumanize
.
arrow/arrow.py
Outdated
time_object_info = { | ||
"seconds": 0, | ||
"minutes": 0, | ||
"hours": 0, | ||
"days": 0, | ||
"weeks": 0, | ||
"months": 0, | ||
"years": 0, | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
time_object_info = { | |
"seconds": 0, | |
"minutes": 0, | |
"hours": 0, | |
"days": 0, | |
"weeks": 0, | |
"months": 0, | |
"years": 0, | |
} | |
time_object_info = dict.fromkeys(["seconds", "minutes", "hours", "days", "weeks", "months", "years"], 0) |
Purely personal preference, feel free to ignore.
Will get on these issues asap @systemcatch. How much input validation is needed? Just input that is completely bad, like for example a random string? Or more nuanced input validation like rejecting "2 seconds 10 seconds ago"? As mentioned in the original PR post, the latter is proving to be quite difficult to generalize across locales. The current solution will reject any string that does not contain some kind of time reference like "ago" or "in". |
Let's start with simple validation and work from there. |
…put validation test cases. Implemented fixes as per discussion.
…put validation test cases. Implemented fixes as per discussion.
I've implemented all the suggestions and added additional input validation test cases. Although the input validation isn't perfect, it does reject complete nonsense. Apologies in advance for the weird merge issues in this PR. I'm not too sure why that ended up happening. |
Not to worry we'll squash all the commits to make it look nice and clean. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
arrow/arrow.py
Outdated
normalized_locale_name = locale.lower().replace("_", "-") | ||
|
||
if normalized_locale_name not in DEHUMANIZE_LOCALES: | ||
raise ValueError(f"Dehumanize does not currently support locale {locale}.") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
raise ValueError(f"Dehumanize does not currently support locale {locale}.") | |
raise ValueError(f"Dehumanize does not currently support the {locale} locale, please consider making a contribution to add support for this locale.") |
arrow/arrow.py
Outdated
if normalized_locale_name not in DEHUMANIZE_LOCALES: | ||
raise ValueError(f"Dehumanize does not currently support locale {locale}.") | ||
|
||
# Create current time object |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We don't really need this comment.
"""Invalid input String. String does not contain any relative time information. | ||
String should either represent a time in the future or a time in the past. | ||
Ex: "in 5 seconds" or "5 seconds ago". """ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good error message 👍
arrow/arrow.py
Outdated
unit_visited[unit] = True | ||
continue | ||
|
||
# Add change value to the correct unit (incorporates the plualirty that exists within timeframe i.e second v.s seconds) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
plurality
arrow/arrow.py
Outdated
if not any([True for k, v in unit_visited.items() if v]): | ||
raise ValueError( | ||
"""Input string not valid. Note: Some locales do not support the week granulairty in Arrow. | ||
If you are attempting to use the week granulairty on an unsupported locale, this could be the cause of this error.""" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
granularity
docs/index.rst
Outdated
Dehumanize | ||
~~~~~~~~ | ||
|
||
Dehumanize a string representing a past time: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How about "Take a human readable string and use it to shift into a past time."
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
One last tweak and we should be good to merge @anishnya
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nicely done @anishnya, thanks for your effort on this.
Pull Request Checklist
Thank you for taking the time to improve Arrow! Before submitting your pull request, please check all appropriate boxes:
tox
ormake test
to find out!).tox -e lint
ormake lint
to find out!).master
branch.If you have any questions about your code changes or any of the points above, please submit your questions along with the pull request and we will try our best to help!
Description of Changes
This PR isn't expected to be merged, however, we would like to start a dialogue on how to proceed with some of the challenges ahead in implementing a reverse_humanize/dehumanize method as proposed in #360. We would first like to thank #772 and #895 for giving us a good base to work off of.
This attempt so far works with 37 of the 54 current locales in Arrow, a big step forward from previous attempts. There are a couple issues we would like to address and discuss.
1) Input Validation
Currently, this solution has little input validation. For example, a user can "in 3 months and 3 lightseconds" and dehumanize will return an object that is 3 months ahead. After some experimenting, creating a generalized method for input validation across locales seems unfeasible.
For example, one method we attempted was to simply delete any match from the input string and then at the very end of search, raise an error if the string was not empty (after removing the locale's and word and removing any whitespace). This creates some issues in languages such as Azerbaijani, the plural for a unit is the same as a single value for a unit. For example, the phrase for "a second" in Azerbaijani is "saniyə" whereas the phrase for "5 seconds" is "5 saniyə". This type of situation has only been encountered in 3 languages so far. There is an easy fix here, but we worry there may be another locale introduced that breaks those rules entirely.
Some dialogue is needed here on how much input validation is or isn't needed. Potentially we can label dehumanize as fuzzy_dehumanize in the interim and revisit the question of input validation later.
2) Remaining Locales and the Need for Standardization Across Locales
In order to support the remaining locales not supported by this feature, there needs to be some element of standardization across locales. This pertains to what we will be referring to as special time frame values (a special word conveying two days ago).
Some locales incorporate these special time frame values very differently. Some like the Slavic languages used the regular timeframe object and have the key as some list.
Bulgarian:
"minutes": ["{0} минута", "{0} минути", "{0} минути"]
Some have their timeframe key as a dictionary.
Arabic:
"seconds": {"double": "ثانيتين", "ten": "{0} ثوان", "higher": "{0} ثانية"},
Some have some nested combination of both.
Czech:
"seconds": {"past": "{0} sekundami", "future": ["{0} sekundy", "{0} sekund"]}
Some introduce their own separate objects.
Korean:
special_dayframes = { -3: "그끄제", -2: "그제", -1: "어제", 1: "내일", 2: "모레", 3: "글피", 4: "그글피", }
A temporary solution here would be to create a special time frame dictionary for each of these locales, allowing dehumanize to function on these languages and then later down the line figuring out a way to standardize across locales.
A more permanent solution maybe to standardized like the following.
Temporary Solution:
special_timeframes = {"string for two months ago": ["months", 2, True], "{0} special plural unit": ["seconds", 1, False]}
The first element in the list represents the unit that string modifies, the second the value modified and the third is a bool if we need to obtain the value from the string or not.
Permanent Solution:
"seconds": {1: "special value one", 5: "special value two", "general": "general value one {0}", "special case one": "special case string {0}"}
We could check if a key is an int and know the value associated with that phrase for the particular unit. If the key is some string, we know there the value that can be associated with that phrase is arbitrary and we need to obtain the value.
I believe these issues aren't too complex to solve, but I think we need some discussion to choose from multiple potential soltuons.