Enable tests in udf_cudf_test.py #777
Conversation
1) Update the configs for the cuDF tests to enable pooling and turn the pool size down to avoid OOM as much as possible. 2) Increase the size of the test data frame to avoid IPC errors: some tasks will get empty data when the data frame is small enough, and then the IPC error happens. Signed-off-by: Firestarman <firestarmanllc@gmail.com>
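The config change described above can be sketched as follows. The exact spark-rapids keys and values here are assumptions for illustration, not copied from the PR diff; check the plugin's configs documentation before relying on them.

```python
# Hedged sketch: a conf dict of the kind passed to the cuDF UDF tests.
# Both keys below are assumed spark-rapids config names.
cudf_udf_conf = {
    # enable pooled GPU memory on the Python worker side (assumed key)
    'spark.rapids.python.memory.gpu.pooling.enabled': 'true',
    # keep the pool small to reduce the chance of OOM (assumed key/value)
    'spark.rapids.python.memory.gpu.allocFraction': '0.1',
}

# Spark confs are plain string key/value pairs, so everything stays a str.
print(all(isinstance(k, str) and isinstance(v, str)
          for k, v in cudf_udf_conf.items()))  # True
```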
Force-pushed from 6602930 to 5bf2ce1 to add the signoff
Does the IPC error happen with the CPU-only version of Spark? If we have a reproducible use case then we should file something against Spark itself. If it only happens with our replacement code then we need to do some more digging to understand why this results in an error.
build
Per my current findings, this IPC error only happens with our columnar code. Yes, I will try to dig more.
Then it is either an empty batch somehow sneaking in, or we launch things too early, before we know whether we will get any data or not. Thanks for looking into this.
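The "empty batch" theory can be illustrated with plain Python. This is a hypothetical model of even range partitioning, not the plugin's actual code: with only 5 rows split across 8 tasks, several tasks receive nothing, while an enlarged ~5000-row frame leaves no task empty.

```python
# Hypothetical illustration: splitting a tiny dataset evenly across many
# parallel tasks leaves most tasks with an empty slice. Handing an empty
# batch to the Python worker is one plausible source of the IPC error.
def split_into_tasks(rows, num_tasks):
    """Split `rows` into `num_tasks` contiguous chunks of near-equal size."""
    size, rem = divmod(len(rows), num_tasks)
    chunks, start = [], 0
    for i in range(num_tasks):
        end = start + size + (1 if i < rem else 0)
        chunks.append(rows[start:end])
        start = end
    return chunks

small = [(1, 1.0), (1, 2.0), (2, 3.0), (2, 5.0), (2, 10.0)]   # 5 rows
large = [(i, i / 1.0) for i in range(1, 5000)]                # 4999 rows

print(sum(1 for c in split_into_tasks(small, 8) if not c))  # 3 empty tasks
print(sum(1 for c in split_into_tasks(large, 8) if not c))  # 0 empty tasks
```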
Signed-off-by: Firestarman <firestarmanllc@gmail.com>
build
Since the shim layer for Spark 3.1.0 is not ready. Signed-off-by: Firestarman <firestarmanllc@gmail.com>
build
@firestarman do you know why it doesn't work in 3.1.0 yet? Just curious.
The WindowExecBase becomes a
build
LGTM if the IT passes locally
        [(1, 1.0), (1, 2.0), (2, 3.0), (2, 5.0), (2, 10.0)],
        ("id", "v")
    )
    elements = list(map(lambda i: (i, i/1.0), range(1, 5000)))
@firestarman What is the follow-on issue to fix the underlying problem? This was a workaround to not expose the bug, but I would much rather see us xfail in that situation, pointing at the bug so we can track and fix it.
This is tracked by #750
from spark_session import with_cpu_session, with_gpu_session
from marks import allow_non_gpu, cudf_udf

pytestmark = pytest.mark.skipif(LooseVersion(spark_version()) >= LooseVersion('3.1.0'), |
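The skipif gate above compares dotted version strings with LooseVersion (imported from distutils in the test file). For purely numeric versions, that comparison behaves like comparing integer tuples, which this self-contained sketch uses instead; `should_skip` is a hypothetical helper, not part of the test file.

```python
def version_tuple(v):
    """Parse '3.1.0' -> (3, 1, 0); assumes purely numeric components."""
    return tuple(int(part) for part in v.split('.'))

def should_skip(spark_version, threshold='3.1.0'):
    # Mirrors: LooseVersion(spark_version()) >= LooseVersion('3.1.0')
    return version_tuple(spark_version) >= version_tuple(threshold)

print(should_skip('3.0.1'))  # False: cuDF tests still run on Spark 3.0.x
print(should_skip('3.1.0'))  # True: skipped until the 3.1.0 shim is ready
print(should_skip('3.1.1'))  # True
```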
Here too: if we are skipping a test, or a test is failing, we need an issue that this points to so we don't lose track of it.
Here it is: #844
When I just run the full integration tests from mvn (or spark-submit), these tests fail for me. I'm going to revert this change. These tests need to run automatically with no special setup from the user, and should run both when running mvn test for all integration tests and via the spark-submit command specifying runtest.py. The error I got is:
This reverts commit f76ed9c.
I would also like to understand how premerge passed with this. Is premerge doing extra setup for the tests to run, installing these somehow?
This reverts commit f76ed9c. Signed-off-by: Thomas Graves <tgraves@nvidia.com>
* Enable tests in udf_cudf_test.py 1) Update the configs for CUDF tests to enable pool and turn down the pool size to avoid OOM as much as possible. 2) Increase the size of the test data frame to avoid IPC errors. Some tasks will get empty data when the data frame is small enough, then the IPC error happens. Signed-off-by: Firestarman <firestarmanllc@gmail.com> * Skip cudf test for premerge Signed-off-by: Firestarman <firestarmanllc@gmail.com> * Skip cudf tests for Spark 3.1.0+ temporarily Since shim layer for Spark 3.1.0 is not ready. Signed-off-by: Firestarman <firestarmanllc@gmail.com> * Use LooseVersion to compare the version instead Signed-off-by: Firestarman <firestarmanllc@gmail.com>
This PR is to enable all the tests in file udf_cudf_test.py. It updates the configs for the cuDF tests to enable pooling and turn the pool size down to avoid OOM as much as possible, and it increases the size of the test data frame to avoid IPC errors: some tasks will get empty data when the data frame is small enough, and then the IPC error happens.