Support relational operators for decimal type #1173
Conversation
Signed-off-by: Niranjan Artal <nartal@nvidia.com>
Signed-off-by: Robert (Bobby) Evans <bobby@apache.org>
Updated decimal generation …to decimal_lessthan
Signed-off-by: Niranjan Artal <nartal@nvidia.com>
build
```python
if precision is None:
    precision = 18
    scale = 0
special_cases = [Decimal('-1'), Decimal('0'), Decimal('1'), Decimal(DECIMAL_MIN), Decimal(DECIMAL_MAX)]
```
Even -1 and 1 do not work all of the time. A precision of 1 and a scale of -3 would be an example where this would not work. It can only hold multiples of 100 up to 900.
Sorry, I don't understand the above statement. A precision of 1 and a scale of -3 gives 1000, and I am not getting why it can only hold multiples of 100. Could you please explain? I tried setting the precision and scale as mentioned above and the test passed, so I must be missing something.
```
>>> df = spark.createDataFrame(data=[(Decimal('-1')), (Decimal('0'))], schema=DecimalType(7, -3))
>>> df.show()
+-----+
|value|
+-----+
| 0E+3|
| 0E+3|
+-----+
```
The details of -2 vs -3 scale don't matter; what matters is that -1 becomes 0 for any scale that is negative. For a Decimal where the scale is as large as the precision this actually becomes an error:
```
>>> df = spark.createDataFrame(data=[(Decimal('-1')), (Decimal('0'))], schema=DecimalType(7, 7))
>>> df.show()
20/11/25 12:52:57 ERROR Executor: Exception in task 0.0 in stage 32.0 (TID 125)
java.lang.ArithmeticException: Decimal precision 8 exceeds max precision 7
	at org.apache.spark.sql.types.Decimal.set(Decimal.scala:122)
	at org.apache.spark.sql.types.Decimal$.apply(Decimal.scala:571)
```
We cannot blindly add in -1 as a special case.
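The behavior in both examples can be sketched with Python's `decimal` module alone, without a Spark session. This is a standalone approximation of the overflow check, not Spark's actual code; `check_fits` is a hypothetical helper:

```python
from decimal import Decimal, ROUND_HALF_UP

def check_fits(value, precision, scale):
    """Round `value` to `scale` and report whether the result fits in
    `precision` digits. A sketch of Spark's DecimalType overflow check,
    not Spark's actual implementation."""
    quantum = Decimal(1).scaleb(-scale)  # smallest step, e.g. scale=-3 -> 1E+3
    rounded = value.quantize(quantum, rounding=ROUND_HALF_UP)
    ndigits = len(rounded.as_tuple().digits)  # digits in the unscaled value
    return rounded, ndigits <= precision

# With a negative scale, -1 silently rounds to 0:
print(check_fits(Decimal('-1'), 7, -3))   # (Decimal('-0E+3'), True)
# With scale == precision, -1 needs 8 unscaled digits and overflows decimal(7,7):
print(check_fits(Decimal('-1'), 7, 7))    # (Decimal('-1.0000000'), False)
```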
Thanks for the examples. I have removed `-1` and `1` from the special cases, and added another generator where scale and precision are the same.
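One way to pick special cases that are valid for every (precision, scale) combination is to derive them from the type itself: zero plus the extreme representable values. A minimal sketch with a hypothetical `safe_special_cases` helper, not the plugin's actual `DecimalGen`:

```python
from decimal import Decimal

def safe_special_cases(precision, scale):
    """Return special values representable by any DecimalType(precision, scale):
    zero plus the largest and smallest representable values.
    Hypothetical helper for illustration only."""
    quantum = Decimal(1).scaleb(-scale)  # smallest representable step
    unscaled_max = 10 ** precision - 1   # e.g. 9999999 for precision 7
    return [Decimal(0),
            Decimal(unscaled_max) * quantum,
            Decimal(-unscaled_max) * quantum]

# For decimal(7, -3) the extremes are +/-9999999000:
print(safe_special_cases(7, -3))
```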
Signed-off-by: Niranjan Artal <nartal@nvidia.com>
```python
def __init__(self, precision=None, scale=None, nullable=True, special_cases=[]):
    if precision is None:
        # Maximum number of decimal digits a Long can represent is 18
        precision = 18
```
nit: This is going to cause problems for add/etc.
```
>>> df = spark.createDataFrame(data=[(Decimal('1')), (Decimal('1'))], schema=DecimalType(18, 0))
>>> df.selectExpr('value + value').printSchema()
root
 |-- (value + value): decimal(19,0) (nullable = true)
```
We will have to fall back to the CPU in all of those cases. I am OK with this for now, but I think we need to work on being sure that we select the proper precision and scale for a given test.
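The widening shown in the schema above follows the standard Hive/Spark result-type rule for decimal addition. A sketch of that rule as I understand it (not copied from Spark's source; Spark additionally caps the result at precision 38):

```python
def decimal_add_result(p1, s1, p2, s2):
    """Result type of decimal addition before any overflow adjustment:
    scale = max(s1, s2); precision = max(p1 - s1, p2 - s2) + scale + 1."""
    scale = max(s1, s2)
    precision = max(p1 - s1, p2 - s2) + scale + 1
    return precision, scale

# decimal(18,0) + decimal(18,0) widens to decimal(19,0),
# one digit past what a Long-backed decimal can hold:
print(decimal_add_result(18, 0, 18, 0))  # (19, 0)
```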
Agreed. Selecting the correct precision and scale for each test would be good to do while adding the other tests.
Signed-off-by: Niranjan Artal <nartal@nvidia.com>
build
build
build
@revans2 I think I have addressed the remaining review comments. Please take another look.
Signed-off-by: Niranjan Artal <nartal@nvidia.com> Co-authored-by: Robert (Bobby) Evans <bobby@apache.org>
[auto-merge] bot-auto-merge-branch-23.06 to branch-23.08 [skip ci] [bot]
Support relational operators for decimal type. Added integration tests to verify corner cases.