
Enable tests in udf_cudf_test.py #777

Merged
merged 4 commits into from
Sep 18, 2020

Conversation

firestarman
Collaborator

@firestarman firestarman commented Sep 16, 2020

This PR is to enable all the tests in file udf_cudf_test.py.

  1. Update the configs for the cuDF tests to enable the pool and turn down the
     pool size, to avoid OOM as much as possible.
  2. Increase the size of the test data frame to avoid IPC errors when running
     with the GPU columnar pipeline. Some tasks receive empty data when the data
     frame is too small, which triggers the IPC error.
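The first change can be sketched as a conf dict merged into the tests' Spark settings. Note that the conf keys and values below are illustrative assumptions, not necessarily the exact names the plugin uses:

```python
# Hypothetical sketch of the cuDF UDF test configuration described above.
# The conf keys and values are assumptions for illustration only.
_cudf_udf_conf = {
    # Enable a pooled GPU allocator for the Python workers...
    'spark.rapids.python.memory.gpu.pooling.enabled': 'true',
    # ...but keep the pool small, since the JVM and the Python processes
    # share one GPU and a large pool risks OOM.
    'spark.rapids.python.memory.gpu.allocFraction': '0.1',
}

def with_cudf_udf_conf(base_conf):
    """Return a copy of base_conf with the cuDF UDF settings applied."""
    merged = dict(base_conf)
    merged.update(_cudf_udf_conf)
    return merged
```

A test would then pass the merged conf to its GPU session helper, keeping the shared settings in one place.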

@firestarman firestarman linked an issue Sep 16, 2020 that may be closed by this pull request
1) Update the configs for the cuDF tests to enable the pool and turn down the
   pool size to avoid OOM as much as possible.
2) Increase the size of the test data frame to avoid IPC errors. Some tasks
   receive empty data when the data frame is too small, which triggers the IPC
   error.

Signed-off-by: Firestarman <firestarmanllc@gmail.com>
@firestarman
Collaborator Author

Force pushing to add the signoff

@revans2
Collaborator

revans2 commented Sep 16, 2020

Increase the size of the test data frame to avoid IPC errors. Some tasks
receive empty data when the data frame is too small, which triggers the IPC
error.

Does the IPC error happen with the CPU only version of Spark? If we have a reproducible use case then we should file something against Spark itself. If it only happens with our replacement code then we need to do some more digging to understand why this results in an error.

@revans2
Collaborator

revans2 commented Sep 16, 2020

build

@firestarman
Collaborator Author

firestarman commented Sep 16, 2020

Does the IPC error happen with the CPU only version of Spark? If we have a reproducible use case then we should file something against Spark itself. If it only happens with our replacement code then we need to do some more digging to understand why this results in an error.

This IPC error only happens in our columnar code path, per my current findings. Yes, I will try to dig more.

@revans2
Collaborator

revans2 commented Sep 16, 2020

This IPC error only happens in our columnar code path, per my current findings.

Then either an empty batch is somehow sneaking in, or we are launching things too early, before we know whether we will get any data. Thanks for looking into this.

Signed-off-by: Firestarman <firestarmanllc@gmail.com>
@firestarman
Collaborator Author

build

Since the shim layer for Spark 3.1.0 is not ready.

Signed-off-by: Firestarman <firestarmanllc@gmail.com>
@firestarman
Collaborator Author

build

@abellina
Collaborator

@firestarman do you know why it doesn't work in 3.1.0 yet? Just curious.

@firestarman
Collaborator Author

firestarman commented Sep 17, 2020

do you know why it doesn't work in 3.1.0 yet, just curious.

WindowExecBase becomes a trait in Spark 3.1.0, so we need to add a shim layer for our GPU version, but that has not been done yet.

@firestarman
Collaborator Author

build

Collaborator

@pxLi pxLi left a comment

LGTM if the integration tests pass locally.

@jlowe jlowe added build Related to CI / CD or cleanly building test Only impacts tests labels Sep 17, 2020
@firestarman firestarman merged commit f76ed9c into NVIDIA:branch-0.3 Sep 18, 2020
@firestarman firestarman deleted the enable-cudf-udf-test branch September 18, 2020 01:15
[(1, 1.0), (1, 2.0), (2, 3.0), (2, 5.0), (2, 10.0)],
("id", "v")
)
elements = list(map(lambda i: (i, i/1.0), range(1, 5000)))
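The hunk above replaces a five-row literal with a generated list. Its intent can be sketched as follows; the row count of 5000 comes from the diff, while the helper name is made up for illustration:

```python
# Build the enlarged (id, v) data set from the diff above. With ~5000 rows,
# every task should receive at least one non-empty batch, which is what
# works around the IPC error described in this PR.
def make_elements(n=5000):
    return [(i, i / 1.0) for i in range(1, n)]

elements = make_elements()
# In the test this feeds spark.createDataFrame(elements, ("id", "v")).
```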
Collaborator

@firestarman What is the follow on issue to fix the underlying problem? This was a workaround to not expose the bug, but I would much rather see us xfail in that situation pointing at the bug for us to track and fix it.
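The xfail approach suggested here could look like the sketch below. The issue link follows the #750 reference in this thread; the mark and test names are made up:

```python
import pytest

# Instead of enlarging the data to hide the bug, mark the affected tests as
# expected failures pointing at the tracking issue, so the bug stays visible.
small_frame_ipc_bug = pytest.mark.xfail(
    reason='IPC error when the data frame is small, tracked by '
           'https://github.com/NVIDIA/spark-rapids/issues/750')

@small_frame_ipc_bug
def test_with_small_frame():
    ...
```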

Collaborator Author

This is tracked by #750

from spark_session import with_cpu_session, with_gpu_session
from marks import allow_non_gpu, cudf_udf

pytestmark = pytest.mark.skipif(LooseVersion(spark_version()) >= LooseVersion('3.1.0'),
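The truncated pytestmark line above, completed as a sketch. Here `spark_version()` stands in for the test utility of the same name imported by the real file, and the reason string is illustrative:

```python
from distutils.version import LooseVersion

import pytest

def spark_version():
    # Stand-in for the spark_session utility that returns the running
    # Spark version string.
    return '3.0.1'

# Skip the whole module on Spark 3.1.0+ until the GPU window shim exists.
pytestmark = pytest.mark.skipif(
    LooseVersion(spark_version()) >= LooseVersion('3.1.0'),
    reason='cuDF UDF tests need a shim for Spark 3.1.0+')
```

LooseVersion compares dotted version strings component by component, so '3.1.0' sorts after '3.0.1' as intended.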
Collaborator

Here too if we are skipping a test, or a test is failing we need to have an issue that this is pointing to so we don't lose track of this.

Collaborator Author

Here it is #844

@tgravescs
Collaborator

When I just run the full integration tests from mvn (or spark-submit), these tests fail for me. I'm going to revert this change. These tests need to run automatically with no special setup from the user, and should run both from mvn test (which runs all integration tests) and via the spark-submit command specifying runtest.py.

Error I got is:
E /home/tgraves/miniconda3/bin/python: Error while finding module specification for 'rapids.daemon' (ModuleNotFoundError: No module named 'rapids')
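The error means the Python worker cannot import the `rapids` package that provides `rapids.daemon`. A quick, environment-agnostic way to check for that situation (the helper name is made up):

```python
import importlib.util

def daemon_module_available(name='rapids.daemon'):
    """Return True if the given daemon module can be found on sys.path."""
    try:
        return importlib.util.find_spec(name) is not None
    except ModuleNotFoundError:
        # find_spec raises this when the parent package (e.g. 'rapids')
        # itself is missing, which matches the error above.
        return False

# If this returns False, the fix is to make the package importable, e.g. by
# adding the plugin's python directory to PYTHONPATH (path illustrative):
#   export PYTHONPATH=/path/to/rapids-4-spark/python:$PYTHONPATH
```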

tgravescs added a commit that referenced this pull request Sep 18, 2020
@tgravescs
Collaborator

I would also like to understand how premerge passed with this. Is the premerge job doing extra setup for the tests to run and installing these somehow?

tgravescs added a commit that referenced this pull request Sep 18, 2020
This reverts commit f76ed9c.

Signed-off-by: Thomas Graves <tgraves@nvidia.com>
tgravescs added a commit that referenced this pull request Sep 21, 2020
This reverts commit f76ed9c.

Signed-off-by: Thomas Graves <tgraves@nvidia.com>
firestarman added a commit to firestarman/spark-rapids that referenced this pull request Sep 22, 2020
* Enable tests in udf_cudf_test.py

1) Update the configs for CUDF tests to enable pool and turn down the
   pool size to avoid OOM as much as possible.
2) Increase the size of the test data frame to avoid IPC errors. Some tasks
   will get empty data when the data frame is small enough, then the IPC
   error happens.

Signed-off-by: Firestarman <firestarmanllc@gmail.com>

* Skip cudf test for premerge

Signed-off-by: Firestarman <firestarmanllc@gmail.com>

* Skip cudf tests for Spark 3.1.0+ temporarily

Since shim layer for Spark 3.1.0 is not ready.

Signed-off-by: Firestarman <firestarmanllc@gmail.com>

* Use LooseVersion to compare the version instead

Signed-off-by: Firestarman <firestarmanllc@gmail.com>
sperlingxx pushed a commit to sperlingxx/spark-rapids that referenced this pull request Nov 20, 2020
This reverts commit f76ed9c.

Signed-off-by: Thomas Graves <tgraves@nvidia.com>
nartal1 pushed a commit to nartal1/spark-rapids that referenced this pull request Jun 9, 2021
* Enable tests in udf_cudf_test.py

1) Update the configs for CUDF tests to enable pool and turn down the
   pool size to avoid OOM as much as possible.
2) Increase the size of the test data frame to avoid IPC errors. Some tasks
   will get empty data when the data frame is small enough, then the IPC
   error happens.

Signed-off-by: Firestarman <firestarmanllc@gmail.com>

* Skip cudf test for premerge

Signed-off-by: Firestarman <firestarmanllc@gmail.com>

* Skip cudf tests for Spark 3.1.0+ temporarily

Since shim layer for Spark 3.1.0 is not ready.

Signed-off-by: Firestarman <firestarmanllc@gmail.com>

* Use LooseVersion to compare the version instead
nartal1 pushed a commit to nartal1/spark-rapids that referenced this pull request Jun 9, 2021
This reverts commit f76ed9c.

Signed-off-by: Thomas Graves <tgraves@nvidia.com>
tgravescs pushed a commit to tgravescs/spark-rapids that referenced this pull request Nov 30, 2023
…IDIA#777)

Signed-off-by: spark-rapids automation <70000568+nvauto@users.noreply.github.com>

Signed-off-by: spark-rapids automation <70000568+nvauto@users.noreply.github.com>
Successfully merging this pull request may close these issues.

[BUG] cudf_udf_test.py is flakey
7 participants