
Support relational operators for decimal type #1173

Merged: 15 commits, merged Dec 1, 2020

Conversation

nartal1 (Collaborator) commented Nov 20, 2020

Support relational operators for decimal type. Added integration tests to verify corner cases.

Signed-off-by: Niranjan Artal <nartal@nvidia.com>
integration_tests/src/main/python/data_gen.py — review comments (outdated, resolved)
@nartal1 nartal1 self-assigned this Nov 20, 2020
@nartal1 nartal1 added this to the Nov 23 - Dec 4 milestone Nov 20, 2020
revans2 and others added 2 commits November 20, 2020 13:29
Signed-off-by: Robert (Bobby) Evans <bobby@apache.org>
Updated decimal generation
@sameerz sameerz added the feature request New feature or request label Nov 23, 2020
@nartal1 nartal1 changed the title [WIP] Support lessThan operator for decimal type and add tests Support relational operators for decimal type and add tests Nov 24, 2020
@nartal1 nartal1 changed the title Support relational operators for decimal type and add tests Support relational operators for decimal type Nov 24, 2020
…mal_lessthan

Signed-off-by: Niranjan Artal <nartal@nvidia.com>
nartal1 (Collaborator, Author) commented Nov 24, 2020

build

integration_tests/src/main/python/cmp_test.py — review comment (outdated, resolved)
integration_tests/src/main/python/data_gen.py — review comments (resolved)
if precision is None:
    precision = 18
    scale = 0
special_cases = [Decimal('-1'), Decimal('0'), Decimal('1'), Decimal(DECIMAL_MIN), Decimal(DECIMAL_MAX)]
Collaborator:
Even -1 and 1 do not work all of the time. a precision of 1 and scale of -3 would be an example where this would not work. It can only hold multiples of 100 up to 900.

nartal1 (Collaborator, Author):
Sorry, I don't understand the above statement. A precision of 1 and scale of -3 is 1000, and I am not getting why it can only hold multiples of 100. Could you please explain? I tried setting the precision and scale as mentioned above and it passed, so I am missing something.

Collaborator:
>>> df = spark.createDataFrame(data=[(Decimal('-1')), (Decimal('0'))], schema=DecimalType(7, -3))
>>> df.show()
+-----+
|value|
+-----+
| 0E+3|
| 0E+3|
+-----+

The details of -2 vs -3 scale don't matter; what matters is that -1 becomes 0 for any scale that is negative. For a Decimal where the scale is as large as the precision, this actually becomes an error:

>>> df = spark.createDataFrame(data=[(Decimal('-1')), (Decimal('0'))], schema=DecimalType(7, 7))
>>> df.show()
20/11/25 12:52:57 ERROR Executor: Exception in task 0.0 in stage 32.0 (TID 125)
java.lang.ArithmeticException: Decimal precision 8 exceeds max precision 7
	at org.apache.spark.sql.types.Decimal.set(Decimal.scala:122)
	at org.apache.spark.sql.types.Decimal$.apply(Decimal.scala:571)

We cannot blindly add in -1 as a special case.
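The constraint the examples above demonstrate can be stated as a small check. The helper below is not part of this PR; it is a hypothetical sketch using Python's `decimal` module: a value fits `DecimalType(p, s)` only if `value * 10**s` is an integer with at most `p` digits.

```python
from decimal import Decimal

def fits(value, precision, scale):
    # DecimalType(p, s) stores an integer of at most p digits, scaled by
    # 10**-s, so the value must be an integer once multiplied by 10**s.
    scaled = Decimal(value).scaleb(scale)
    if scaled != scaled.to_integral_value():
        return False          # would need rounding (e.g. -1 at scale -3)
    return abs(int(scaled)) < 10 ** precision  # else precision overflows

# Decimal('-1') fails for both shapes shown above: at (7, -3) it is not a
# multiple of 1000, and at (7, 7) it needs 8 digits once scaled.
```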

nartal1 (Collaborator, Author) Nov 25, 2020:
Thanks for the examples. I have removed -1 and 1 from the special cases, and added another generator where scale and precision are the same.

def __init__(self, precision=None, scale=None, nullable=True, special_cases=[]):
    if precision is None:
        # Maximum number of decimal digits a Long can represent is 18
        precision = 18
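The 18-digit default in the comment above can be sanity-checked: a signed 64-bit long tops out at 9223372036854775807, which has 19 digits, so 18 is the largest precision whose entire range fits in a long. A quick check, not part of the PR:

```python
# Largest signed 64-bit value: 9223372036854775807 (19 digits).
LONG_MAX = 2 ** 63 - 1

# Every 18-digit decimal fits in a long, but not every 19-digit one does.
assert 10 ** 18 - 1 <= LONG_MAX
assert 10 ** 19 - 1 > LONG_MAX
```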
Collaborator:
nit: This is going to cause problems for add/etc.

>>> df = spark.createDataFrame(data=[(Decimal('1')), (Decimal('1'))], schema=DecimalType(18, 0))
>>> df.selectExpr('value + value').printSchema()
root
 |-- (value + value): decimal(19,0) (nullable = true)

We will have to fall back to the CPU in all of those cases. I am OK with this for now, but I think we need to work on being sure that we select the proper set of precision and scale for a given test.
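The widening shown above follows Spark's decimal type-promotion rule for addition: the result scale is the larger of the input scales, and one extra integral digit is added for a possible carry. A pure-Python sketch of that rule, written here for illustration rather than taken from Spark's source:

```python
SPARK_MAX_PRECISION = 38  # Spark caps decimal precision at 38 digits

def add_result_type(p1, s1, p2, s2):
    # For decimal(p1, s1) + decimal(p2, s2) Spark derives:
    #   scale     = max(s1, s2)
    #   precision = max(p1 - s1, p2 - s2) + scale + 1   (carry digit)
    scale = max(s1, s2)
    precision = max(p1 - s1, p2 - s2) + scale + 1
    return min(precision, SPARK_MAX_PRECISION), scale

# decimal(18, 0) + decimal(18, 0) -> decimal(19, 0), one digit past the
# 18-digit (64-bit backed) limit, hence the CPU fallback mentioned above.
```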

nartal1 (Collaborator, Author) Nov 25, 2020:

Agreed. Selecting the correct precision and scale for each test would be good to do while adding other tests.

Signed-off-by: Niranjan Artal <nartal@nvidia.com>
sperlingxx (Collaborator) commented:
build

nartal1 (Collaborator, Author) commented Nov 30, 2020:

build

nartal1 (Collaborator, Author) commented Nov 30, 2020:

build

nartal1 (Collaborator, Author) commented Nov 30, 2020:

@revans2 I think I have addressed remaining review comments. Please take another look.

@revans2 revans2 merged commit e577705 into NVIDIA:branch-0.3 Dec 1, 2020
nartal1 added a commit to nartal1/spark-rapids that referenced this pull request Jun 9, 2021
Signed-off-by: Niranjan Artal <nartal@nvidia.com>

Co-authored-by: Robert (Bobby) Evans <bobby@apache.org>
tgravescs pushed a commit to tgravescs/spark-rapids that referenced this pull request Nov 30, 2023
[auto-merge] bot-auto-merge-branch-23.06 to branch-23.08 [skip ci] [bot]
Labels
feature request New feature or request
4 participants