
fix flaky array_item test failures #11054

Merged · 3 commits · Jun 14, 2024

Conversation

@binmahone (Collaborator) commented Jun 13, 2024

This PR closes #8652

Currently, the cache key for ArrayGen (as well as some other Gens) is incomplete, which can lead to unexpected input data being reused.
The test case described in the issue verifies that GetArrayElement(array, index) throws an exception when index is an invalid value (such as a negative index). However, if this test case is preceded by another test case that populates the cache with an all-null array, GetArrayElement returns null regardless of the index value, and thus never throws.

The solution is to refine the cache keys for all of the XXGen classes.

Signed-off-by: Hongbin Ma (Mahone) <mahongbin@apache.org>
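The cache-key collision described above can be sketched as follows. This is a minimal illustration only; the class and function names are hypothetical stand-ins, not the actual spark-rapids data_gen.py implementation:

```python
# Illustrative sketch of the bug: an incomplete cache key lets two
# different generators share one cache entry. Names are hypothetical.
_cache = {}

class ArrayGen:
    def __init__(self, child, all_null=False):
        self.child = child
        self.all_null = all_null

    # Incomplete key: omits all_null, so an all-null generator and a
    # normal one map to the same cache entry.
    def incomplete_cache_repr(self):
        return 'ArrayGen(%s)' % self.child

    # Refined key: includes every parameter that affects generated data.
    def _cache_repr(self):
        return 'ArrayGen(%s,all_null=%s)' % (self.child, self.all_null)

def gen_cached(gen, key_fn):
    # Return cached data for this generator, generating it on first use.
    key = key_fn(gen)
    if key not in _cache:
        _cache[key] = 'rows for ' + key  # stand-in for generated rows
    return _cache[key]

all_nulls = ArrayGen('int', all_null=True)
normal = ArrayGen('int')

# With the incomplete key, the normal generator silently reuses the
# all-null data, so the exception-path test never sees a valid array:
reused = gen_cached(all_nulls, ArrayGen.incomplete_cache_repr)
collided = gen_cached(normal, ArrayGen.incomplete_cache_repr)
print(reused is collided)  # True: wrong reuse

_cache.clear()

# With the refined key, each generator gets its own entry:
a = gen_cached(all_nulls, ArrayGen._cache_repr)
b = gen_cached(normal, ArrayGen._cache_repr)
print(a is b)  # False
```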
@binmahone (Collaborator, Author) commented:

build

Signed-off-by: Hongbin Ma (Mahone) <mahongbin@apache.org>
@binmahone (Collaborator, Author) commented:

build

@@ -224,7 +224,11 @@ def test_all_null_int96(spark_tmp_path):
 class AllNullTimestampGen(TimestampGen):
     def start(self, rand):
         self._start(rand, lambda : None)
+
+    def _cache_repr(self):

     data_path = spark_tmp_path + '/PARQUET_DATA'
@thirtiseven (Collaborator) commented Jun 13, 2024:
So we need to add a _cache_repr, even if it will only be used once, to prevent a key conflict in the cache. Didn't realize that, nice catch.
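The point above can be sketched as follows. This is a hedged illustration only: the real data_gen.py classes differ, but it shows why even a single-use subclass needs its own _cache_repr override.

```python
# Hypothetical sketch: a subclass that generates different data must
# also override _cache_repr, or it inherits the base class's key.
class TimestampGen:
    def _cache_repr(self):
        return 'TimestampGen'

class AllNullTimestampGen(TimestampGen):
    # Without this override, the subclass would share the base key and
    # its all-null data could be served from a TimestampGen cache entry.
    def _cache_repr(self):
        return super()._cache_repr() + '(all_nulls)'

print(AllNullTimestampGen()._cache_repr())  # TimestampGen(all_nulls)
```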

thirtiseven previously approved these changes Jun 13, 2024

@thirtiseven (Collaborator) left a comment:

The array integration test issue was quite confusing to me, even though it came from some code I wrote. Thanks for investigating and fixing it!

integration_tests/src/main/python/data_gen.py: two outdated review threads, resolved
Signed-off-by: Hongbin Ma (Mahone) <mahongbin@apache.org>
@binmahone (Collaborator, Author) commented:

build

@binmahone binmahone requested a review from jlowe June 13, 2024 13:51
@jlowe (Member) left a comment:

Excellent find, @binmahone! 🚀

@sameerz sameerz added the test Only impacts tests label Jun 13, 2024
@binmahone binmahone merged commit 599ae17 into NVIDIA:branch-24.08 Jun 14, 2024
44 checks passed
SurajAralihalli pushed a commit to SurajAralihalli/spark-rapids that referenced this pull request Jul 12, 2024
* fix flaky array_item test failures

Signed-off-by: Hongbin Ma (Mahone) <mahongbin@apache.org>

* fix indent

Signed-off-by: Hongbin Ma (Mahone) <mahongbin@apache.org>

* fix whitespace

Signed-off-by: Hongbin Ma (Mahone) <mahongbin@apache.org>

---------

Signed-off-by: Hongbin Ma (Mahone) <mahongbin@apache.org>
Labels
test Only impacts tests
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[BUG] array_item test failures on Spark 3.3.x
5 participants