Move pandas_udf functions into the tests functions #926

Merged
merged 3 commits into from
Oct 13, 2020

Conversation

tgravescs
Collaborator

Move pandas_udf functions into the tests functions so they don't try to compile when skipped.

Fixes #922

Note: I can now run the tests without this failing, but I haven't run the cudf UDF test itself to confirm it passes, as I don't have an environment set up for it.
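To illustrate why moving the definitions helps, here is a minimal sketch of when a decorator runs at module scope versus inside a test function. `fake_pandas_udf` is a hypothetical stand-in that only records when it is called; it is not PySpark's `pandas_udf` and does not reproduce the actual failure from #922.

```python
import_time_log = []

def fake_pandas_udf(return_type):
    """Hypothetical stand-in for pandas_udf: records when decoration happens."""
    def wrap(func):
        import_time_log.append(func.__name__)
        return func
    return wrap

# Module scope: the decorator runs as soon as the module is imported,
# even if every test that uses the function is skipped later.
@fake_pandas_udf("int")
def module_level_sum(values):
    return sum(values)

def test_sum():
    # Function scope: the decorator runs only when the test body executes,
    # so a skipped test never triggers it.
    @fake_pandas_udf("int")
    def local_sum(values):
        return sum(values)
    return local_sum([1, 2, 3])

decorated_before_test = list(import_time_log)
result = test_sum()
decorated_after_test = list(import_time_log)
```

Before `test_sum()` is called, only the module-level function has been decorated; the nested one is decorated on demand.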

@tgravescs
Collaborator Author

@firestarman @shotai any chance you have an environment set up so you can check that the test still works when the config is specified?

@tgravescs tgravescs added the build Related to CI / CD or cleanly building label Oct 9, 2020
@tgravescs
Collaborator Author

build

Signed-off-by: Thomas Graves <tgraves@nvidia.com>
@firestarman
Collaborator

firestarman commented Oct 10, 2020

Thanks a lot for finding this.

One small issue: the two UDFs _sum_cpu_func and _sum_gpu_func should also be defined in test_window, since that test uses them as well.

Something like: firestarman@900a91d

 @cudf_udf
 def test_window(enable_cudf_udf):
+    @pandas_udf("int")
+    def _sum_cpu_func(v: pd.Series) -> int:
+        return v.sum()
+
+    @pandas_udf("integer")
+    def _sum_gpu_func(v: pd.Series) -> int:
+        import cudf
+        gpu_series = cudf.Series(v)
+        return gpu_series.sum()
+
     def cpu_run(spark):
         df = _create_df(spark)
         w = Window.partitionBy('id').rowsBetween(Window.unboundedPreceding, Window.unboundedFollowing)
         return df.withColumn('sum_v', _sum_cpu_func('v').over(w)).collect()
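For readers unfamiliar with the window used in the snippet, the following is a rough pure-Python sketch of what a sum over `partitionBy('id')` with an unbounded frame computes: every row receives the total of `v` for its `id`. The sample rows and the `window_sum` helper are assumptions for illustration; this is not Spark code and does not involve cudf.

```python
from collections import defaultdict

# Hypothetical sample data using the column names from the test ('id', 'v').
rows = [{"id": 1, "v": 1}, {"id": 1, "v": 2}, {"id": 2, "v": 5}]

def window_sum(rows):
    """Attach the per-'id' total of 'v' to every row as 'sum_v'."""
    totals = defaultdict(int)
    for row in rows:
        totals[row["id"]] += row["v"]
    # An unbounded preceding/following frame covers the whole partition,
    # so each row in a partition gets the same total.
    return [dict(row, sum_v=totals[row["id"]]) for row in rows]

result = window_sum(rows)
```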

@firestarman
Collaborator

With the change above, the cudf UDF tests pass locally:
========== 11 passed, 3418 deselected, 1 warning in 131.68s (0:02:11) ==========

@sameerz sameerz added this to the Oct 12 - Oct 23 milestone Oct 10, 2020
Signed-off-by: Thomas Graves <tgraves@nvidia.com>
@tgravescs
Collaborator Author

@firestarman thanks for the review, I added them to test_window. I realize this duplicates the functions, so if you can come up with a way to avoid the duplication, that would be better. I tried putting them in a class, but that still compiled them.
I think this PR will work for now though.
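One possible way to avoid the duplication, sketched under assumptions: a class body executes when the class is defined (typically at import), which is likely why the class attempt did not help, whereas a factory function defers decoration until a test calls it. `fake_pandas_udf`, `UdfHolder`, and `make_sum_udfs` are hypothetical names; the stand-in decorator only records calls and is not PySpark's `pandas_udf`.

```python
calls = []

def fake_pandas_udf(return_type):
    """Hypothetical stand-in for pandas_udf: records when decoration happens."""
    def wrap(func):
        calls.append(func.__name__)
        return func
    return wrap

class UdfHolder:
    # The class body runs when the class statement executes,
    # so this decorator fires immediately, just like at module scope.
    @staticmethod
    @fake_pandas_udf("int")
    def in_class_sum(values):
        return sum(values)

calls_after_class = list(calls)

def make_sum_udfs():
    """Hypothetical factory: builds both UDFs only when a test asks for them."""
    @fake_pandas_udf("int")
    def _sum_cpu_func(values):
        return sum(values)

    @fake_pandas_udf("integer")
    def _sum_gpu_func(values):
        # Stand-in body; the real function would use cudf.Series(v).sum().
        return sum(values)

    return _sum_cpu_func, _sum_gpu_func

calls_before_factory = list(calls)
cpu_func, gpu_func = make_sum_udfs()
calls_after_factory = list(calls)
```

Each test would call `make_sum_udfs()` at the top of its body, so the definitions live in one place but are still only created when the test actually runs.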

@tgravescs
Collaborator Author

build

@tgravescs tgravescs merged commit f3e4062 into NVIDIA:branch-0.3 Oct 13, 2020
@tgravescs tgravescs deleted the cudfudftests branch October 13, 2020 13:13
sperlingxx pushed a commit to sperlingxx/spark-rapids that referenced this pull request Nov 20, 2020
* Move pandas_udf functions into the tests functions so they don't try to
compile when skipped

* put back enable hive

Signed-off-by: Thomas Graves <tgraves@nvidia.com>

* Add missing functions to test_window

Signed-off-by: Thomas Graves <tgraves@nvidia.com>
nartal1 pushed a commit to nartal1/spark-rapids that referenced this pull request Jun 9, 2021
tgravescs pushed a commit to tgravescs/spark-rapids that referenced this pull request Nov 30, 2023
…IDIA#926)

Signed-off-by: spark-rapids automation <70000568+nvauto@users.noreply.github.com>
Labels
build Related to CI / CD or cleanly building
Development

Successfully merging this pull request may close these issues.

[BUG] spark 3.1.0 udf_cudf_test fail without specifying --cudf-udf option
4 participants