Enable approx percentile tests #3770

andygrove · 2021-10-08T01:05:56Z

Depends on rapidsai/cudf#9403 and rapidsai/cudf#9537.

Closes #3703 and #3706.

Signed-off-by: Andy Grove <andygrove@nvidia.com>

revans2

Just a few minor nits.

integration_tests/src/main/python/hash_aggregate_test.py

andygrove · 2021-10-22T16:18:59Z

build

mythrocks

LGTM

andygrove · 2021-10-22T19:32:02Z

I am working with @mythrocks to track down the cause for the test failures and it looks like it may be due to a regression in cuDF.

mythrocks · 2021-10-27T00:01:13Z

> I am working with @mythrocks to track down the cause for the test failures

It took a while to find. I should have a PR up for this fix shortly.

`segmented_gather()` currently assumes that null LIST rows also have a size of `0` (as defined by the difference of adjacent offsets). This might not hold, for example, for LIST columns that are members of STRUCT columns, where the parent's null mask is superimposed on its children. A non-empty list row can then be marked null without being compacted, which leads to errors when fetching elements of a list row, as seen in NVIDIA/spark-rapids/pull/3770. This commit adds handling of uncompacted LIST rows to `segmented_gather()`.
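To make the failure mode concrete, here is a minimal pure-Python sketch of an offsets-plus-validity list layout. It is not cuDF code and does not reproduce the real `segmented_gather()` signature; the helper names (`list_sizes`, `superimpose_parent_nulls`, `gather_element`) and the sample data are hypothetical, chosen only to show how a null row can keep a non-zero size once a STRUCT parent's null mask is pushed down, and why a gather should consult validity rather than row size.

```python
from typing import List, Optional


def list_sizes(offsets: List[int]) -> List[int]:
    """Row sizes of a list column: the difference of adjacent offsets."""
    return [offsets[i + 1] - offsets[i] for i in range(len(offsets) - 1)]


def superimpose_parent_nulls(parent_valid: List[bool],
                             child_valid: List[bool]) -> List[bool]:
    """A child row is valid only if both it and its struct parent are valid.

    Note that the offsets are left untouched, so a row nulled this way
    keeps its elements (it is "uncompacted").
    """
    return [p and c for p, c in zip(parent_valid, child_valid)]


def gather_element(offsets: List[int], valid: List[bool],
                   child: List[float], index: int) -> List[Optional[float]]:
    """Fetch element `index` of every list row (hypothetical helper).

    Null rows yield None regardless of their size; out-of-range
    indices also yield None.
    """
    out: List[Optional[float]] = []
    for row, is_valid in enumerate(valid):
        start, end = offsets[row], offsets[row + 1]
        if not is_valid:
            out.append(None)  # check validity, do not rely on size == 0
        elif 0 <= index < end - start:
            out.append(child[start + index])
        else:
            out.append(None)
    return out


# Three list rows: [1.0, 2.0], [3.0, 4.0, 5.0], [6.0]
offsets = [0, 2, 5, 6]
child = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
list_valid = [True, True, True]

# The enclosing struct row 1 is null; its mask is superimposed on the list.
struct_valid = [True, False, True]
valid = superimpose_parent_nulls(struct_valid, list_valid)

sizes = list_sizes(offsets)
result = gather_element(offsets, valid, child, 0)
```

In this sketch row 1 is null but still spans three child elements, which is exactly the case that code assuming "null implies size `0`" would mishandle.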

mythrocks · 2021-10-27T20:15:34Z

> I should have a PR up for this fix shortly.

Sorry for the delay. The fix is in rapidsai/cudf#9537. I've verified that `test_hash_groupby_approx_percentile_double_scalar` passes with this fix:

$ ./run_pyspark_from_build.sh -k test_hash_groupby_approx_percentile_double_scalar
...
============================= test session starts ==============================
platform linux -- Python 3.8.12, pytest-6.2.5, py-1.10.0, pluggy-1.0.0 -- /home/mithunr/anaconda3/envs/cudf-dev-11.2-2/bin/python
cachedir: .pytest_cache
benchmark: 3.4.1 (defaults: timer=time.perf_counter disable_gc=False min_rounds=5 min_time=0.000005 max_time=1.0 calibration_precision=10 warmup=False warmup_iterations=100000)
hypothesis profile 'default' -> database=DirectoryBasedExampleDatabase('/home/mithunr/workspace/dev/spark-plugin/01/integration_tests/target/run_dir/.hypothesis/examples')
rootdir: /home/mithunr/workspace/dev/spark-plugin/01/integration_tests, configfile: pytest.ini
plugins: xdist-2.3.0, benchmark-3.4.1, hypothesis-6.21.5, forked-1.3.0
collecting ... collected 11283 items / 11281 deselected / 2 selected

../../src/main/python/hash_aggregate_test.py::test_hash_groupby_approx_percentile_double_scalar[false] PASSED [ 50%]
../../src/main/python/hash_aggregate_test.py::test_hash_groupby_approx_percentile_double_scalar[true] PASSED [100%]
...
=============== 2 passed, 11281 deselected, 20 warnings in 8.10s ===============

`segmented_gather()` currently assumes that null LIST rows also have a `0` size (as defined by the difference of adjacent offsets.) This might not hold, for example, for LIST columns that are members of STRUCT columns whose parent null masks are superimposed on its children. This would cause a non-empty list row to be marked null, without compaction. This leads to errors in fetching elements of a list row as seen in NVIDIA/spark-rapids/pull/3770. This commit adds the handling of uncompacted LIST rows in `segmented_gather()`.

Authors:
- MithunR (https://github.com/mythrocks)

Approvers:
- Conor Hoekstra (https://github.com/codereport)
- Nghia Truong (https://github.com/ttnghia)
- David Wendt (https://github.com/davidwendt)

URL: #9537

andygrove · 2021-11-05T23:11:49Z

build

andygrove added 7 commits October 4, 2021 14:39

Enable some approx percentile tests

7a0ff7b

Signed-off-by: Andy Grove <andygrove@nvidia.com>

Enable some approx percentile tests

5bee18c

Signed-off-by: Andy Grove <andygrove@nvidia.com>

enable tests and add more tests

7f5674f

improve null test

e5fb1d0

merge latest

3f304de

add tests for byte input

3a2c496

remove temp debug print

8f906a8

andygrove added the bug Something isn't working label Oct 8, 2021

andygrove added this to the Oct 4 - Oct 15 milestone Oct 8, 2021

andygrove self-assigned this Oct 8, 2021

andygrove marked this pull request as draft October 8, 2021 01:06

andygrove added 3 commits October 7, 2021 19:07

Remove comment

40c4644

Signed-off-by: Andy Grove <andygrove@nvidia.com>

update documentation

163df72

run approx percentile tests with and without AQE

d2f5acb

Signed-off-by: Andy Grove <andygrove@nvidia.com>

sameerz modified the milestones: Oct 4 - Oct 15, Oct 18 - Oct 29 Oct 15, 2021

andygrove added 4 commits October 15, 2021 16:56

Add test for split CPU/GPU approx_percentile and implement fix

ea908ed

scalastyle

45683a8

merge from branch-21.12

47bf382

Revert fix for issue 3770

f5c633a

andygrove linked an issue Oct 19, 2021 that may be closed by this pull request

[BUG] approx_percentile returns array of zero percentiles instead of null in some cases #3706

Closed

andygrove changed the title ~~WIP: Enable approx percentile tests~~ Enable approx percentile tests Oct 19, 2021

andygrove marked this pull request as ready for review October 19, 2021 14:39

revans2 previously approved these changes Oct 22, 2021

integration_tests/src/main/python/hash_aggregate_test.py (resolved)

integration_tests/src/main/python/hash_aggregate_test.py (outdated, resolved)

andygrove added 2 commits October 22, 2021 10:07

address PR feedback

13039fe

merge from branch-21.12

1e92ce5

andygrove dismissed revans2’s stale review via 1e92ce5 October 22, 2021 16:14

revans2 approved these changes Oct 22, 2021

mythrocks approved these changes Oct 22, 2021

andygrove changed the title ~~Enable approx percentile tests~~ WIP: Enable approx percentile tests Oct 27, 2021

andygrove marked this pull request as draft October 27, 2021 16:02

mythrocks mentioned this pull request Oct 27, 2021

Fix segmented_gather() for null LIST rows rapidsai/cudf#9537

Merged

Merge remote-tracking branch 'nvidia/branch-21.12' into enable-approx…

2f6ec80

…-percentile-tests

Salonijain27 modified the milestones: Oct 18 - Oct 29, Nov 1 - Nov 12 Oct 31, 2021

Merge remote-tracking branch 'nvidia/branch-21.12' into enable-approx…

5185bb8

…-percentile-tests

andygrove marked this pull request as ready for review November 5, 2021 23:11

andygrove changed the title ~~WIP: Enable approx percentile tests~~ Enable approx percentile tests Nov 5, 2021

andygrove merged commit e608ee7 into NVIDIA:branch-21.12 Nov 8, 2021

andygrove deleted the enable-approx-percentile-tests branch November 8, 2021 14:51

pxLi mentioned this pull request Nov 9, 2021

[BUG] test_hash_groupby_approx_percentile_long_repeated_keys failed intermittently #4060

Closed
