
Implement way for python integration tests to validate Exec is in GPU plan #1668

Merged
merged 22 commits into from
Feb 5, 2021

Conversation

tgravescs
Collaborator

I've been wanting this for a while. It allows us to verify that certain GPU execs are actually in the plan for the tests we run. Spark has so many optimizations that sometimes you think you're testing one thing but it gets changed to something else, so this lets us make sure the GPU execs are really in the plan.
It is intended for testing purposes only and is pretty basic. We could easily make it smarter later if needed, for example to verify an exec appears in the plan multiple times.

You simply tag the test with:
@validate_execs_in_gpu_plan('HostColumnarToGpu')
It also allows multiple execs:
@validate_execs_in_gpu_plan('HostColumnarToGpu', "fooExec")
If an exec is not found, it throws an exception and fails the test. The failure looks like:

e = IllegalArgumentException('Plan GpuColumnarToRow false\n+- GpuSort [col1#2 ASC NULLS FIRST], true, RequireSingleBatch, ...:79)\n\tat py4j.GatewayConnection.run(GatewayConnection.java:238)\n\tat java.lang.Thread.run(Thread.java:748)\n', None)

>   ???
E   pyspark.sql.utils.IllegalArgumentException: Plan GpuColumnarToRow false
E   +- GpuSort [col1#2 ASC NULLS FIRST], true, RequireSingleBatch, 0
E      +- GpuCoalesceBatches RequireSingleBatch
E         +- GpuShuffleCoalesce 2147483647
E            +- GpuColumnarExchange gpurangepartitioning(col1#2 ASC NULLS FIRST, 200, StructField(col1,IntegerType,true)), true, [id=#32]
E               +- HostColumnarToGpu TargetSize(2147483647)
E                  +- BatchScan[col1#2] class com.nvidia.spark.rapids.tests.datasourcev2.parquet.ArrowColumnarDataSourceV2$MyScanBuilder
E    does not contain the following execs: Set(fooExec)
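The mechanics of a marker like this can be sketched in plain Python. This is a hypothetical illustration, not the plugin's actual implementation (which inspects the real Spark plan tree rather than matching on the plan's string form); the helper names `validate_execs_in_gpu_plan` mirroring the marker and `check_plan` are assumptions for the sketch:

```python
# Hypothetical sketch of a test marker that records required execs and a
# check that verifies they appear in a captured plan. Illustration only;
# the real plugin walks the Spark plan tree instead of matching text.
def validate_execs_in_gpu_plan(*execs):
    """Record which execs must appear in the plan captured for this test."""
    def wrapper(test_fn):
        test_fn.required_execs = set(execs)
        return test_fn
    return wrapper

def check_plan(plan_string, required_execs):
    # In this sketch an exec "appears" if its operator name shows up
    # anywhere in the plan text.
    missing = sorted(e for e in required_execs if e not in plan_string)
    if missing:
        raise ValueError(
            "Plan %s does not contain the following execs: %s"
            % (plan_string, missing))

@validate_execs_in_gpu_plan('HostColumnarToGpu', 'fooExec')
def test_example():
    pass

plan = "GpuColumnarToRow\n+- HostColumnarToGpu TargetSize(2147483647)"
try:
    check_plan(plan, test_example.required_execs)
except ValueError as err:
    print(err)  # fooExec is not in the plan text, so the check fails
```

In the real integration tests the plan would be captured from the executed query rather than passed in as a string.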

@tgravescs tgravescs added the test Only impacts tests label Feb 4, 2021
@tgravescs tgravescs added this to the Feb 1 - Feb 12 milestone Feb 4, 2021
@tgravescs tgravescs self-assigned this Feb 4, 2021
@tgravescs
Collaborator Author

build

@tgravescs
Collaborator Author

build

jlowe
jlowe previously approved these changes Feb 4, 2021
@tgravescs
Collaborator Author

looking at failures

@tgravescs
Collaborator Author

build

Comment on lines 445 to 448
if (execsNotFound.nonEmpty) {
  throw new IllegalArgumentException(
    s"Plan ${plan.toString()} does not contain the following execs: $execsNotFound")
}
Collaborator


nit: we can use the `require` syntactic sugar to produce this IllegalArgumentException

Suggested change
if (execsNotFound.nonEmpty) {
  throw new IllegalArgumentException(
    s"Plan ${plan.toString()} does not contain the following execs: $execsNotFound")
}
require(execsNotFound.isEmpty,
  s"Plan ${plan.toString()} does not contain the following execs: $execsNotFound")

(Note: `require` throws when its condition is false, so the condition must be `isEmpty`, the inverse of the original `if` guard.)
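The contract of Scala's `Predef.require` can be sketched in Python for readers less familiar with it. The `require` helper and the `execs_not_found` value below are hypothetical, for illustration only:

```python
def require(condition, message):
    # Mirrors Scala's Predef.require: raise only when the condition is False.
    # (Scala raises IllegalArgumentException; ValueError is the closest fit.)
    if not condition:
        raise ValueError(message)

execs_not_found = {'fooExec'}
try:
    # The condition asserts the *good* state: no missing execs.
    require(not execs_not_found,
            "Plan does not contain the following execs: %s"
            % sorted(execs_not_found))
except ValueError as err:
    print(err)  # the message fires because fooExec is missing
```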

Collaborator Author


updated

@tgravescs
Collaborator Author

build failing due to changes in 3.0.1 for ShuffleLike, working on changes

@tgravescs
Collaborator Author

build

Collaborator

@gerashegalov left a comment


LGTM, 🚀

@tgravescs tgravescs merged commit 8985d85 into NVIDIA:branch-0.4 Feb 5, 2021
@tgravescs tgravescs deleted the validateExecContained branch February 5, 2021 00:50
nartal1 pushed a commit to nartal1/spark-rapids that referenced this pull request Jun 9, 2021
… plan (NVIDIA#1668)

* Add Data source v2 test classes

Signed-off-by: Thomas Graves <tgraves@nvidia.com>

* update v2 source testing

* fix batch num rows and logging

* update the numberin batch

* Fix issue with reading booleans from ArrowColumnVectors and add more
tests

* move test file so pytest regex pick it up

* add comments

* Validate that the plan actually contains the exec

Signed-off-by: Thomas Graves <tgraves@apache.org>

* fix getting class name

Signed-off-by: Thomas Graves <tgraves@nvidia.com>

* change to use findOperators

* fix calls to findOperators

* turn on for all tests

* add a mark for validating execs in gpu plan

* fix name

* cleanup

* fix line length

* update comment

Signed-off-by: Thomas Graves <tgraves@nvidia.com>

* Update copyrights to have 2021

* Move findOperators into shim layer due to changes in ShuffleExchangeExec
between spark versions

Signed-off-by: Thomas Graves <tgraves@nvidia.com>

* add comments to Shim and change to use require

Signed-off-by: Thomas Graves <tgraves@nvidia.com>
Labels
test Only impacts tests
4 participants