Add Spark 3.0 EMR Shim layer #827
Merged
Signed-off-by: Thomas Graves <tgraves@nvidia.com>
tgravescs
requested review from
GaryShen2008,
jlowe,
NvTimLiu and
revans2
as code owners
September 22, 2020 18:27
build |
jlowe
reviewed
Sep 22, 2020
Co-authored-by: Jason Lowe <jlowe@nvidia.com>
build |
jlowe
previously approved these changes
Sep 22, 2020
revans2
reviewed
Sep 22, 2020
Looks good, just a few questions.
.../spark300emr/src/main/scala/com/nvidia/spark/rapids/shims/spark300emr/Spark300EMRShims.scala
revans2
previously approved these changes
Sep 22, 2020
Signed-off-by: Thomas Graves <tgraves@nvidia.com>
build |
jlowe
approved these changes
Sep 22, 2020
revans2
approved these changes
Sep 22, 2020
sperlingxx
pushed a commit
to sperlingxx/spark-rapids
that referenced
this pull request
Nov 20, 2020
* Initial files for EMR shim
Signed-off-by: Thomas Graves <tgraves@nvidia.com>
* pom file missed add
Signed-off-by: Thomas Graves <tgraves@nvidia.com>
* Reflection for FileScanRDD
* Make tests run on EMR
Signed-off-by: Thomas Graves <tgraves@nvidia.com>
* doc and include fix
Signed-off-by: Thomas Graves <tgraves@nvidia.com>
* Fix formatting
Signed-off-by: Thomas Graves <tgraves@nvidia.com>
* Update integration_tests/README.md
Co-authored-by: Jason Lowe <jlowe@nvidia.com>
* Update integration_tests/README.md
Co-authored-by: Jason Lowe <jlowe@nvidia.com>
* cache the FileScanRDD constructor
Signed-off-by: Thomas Graves <tgraves@nvidia.com>
Co-authored-by: Jason Lowe <jlowe@nvidia.com>
nartal1
pushed a commit
to nartal1/spark-rapids
that referenced
this pull request
Jun 9, 2021
* Initial files for EMR shim
Signed-off-by: Thomas Graves <tgraves@nvidia.com>
* pom file missed add
Signed-off-by: Thomas Graves <tgraves@nvidia.com>
* Reflection for FileScanRDD
* Make tests run on EMR
Signed-off-by: Thomas Graves <tgraves@nvidia.com>
* doc and include fix
Signed-off-by: Thomas Graves <tgraves@nvidia.com>
* Fix formatting
Signed-off-by: Thomas Graves <tgraves@nvidia.com>
* Update integration_tests/README.md
Co-authored-by: Jason Lowe <jlowe@nvidia.com>
* Update integration_tests/README.md
Co-authored-by: Jason Lowe <jlowe@nvidia.com>
* cache the FileScanRDD constructor
Signed-off-by: Thomas Graves <tgraves@nvidia.com>
Co-authored-by: Jason Lowe <jlowe@nvidia.com>
closes #818
This adds a shim layer to support Amazon EMR.
The main issues are that EMR uses a different version number and that its FileScanRDD takes a different number of parameters.
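To illustrate the version-number point, here is a minimal sketch of picking a shim by matching the Spark version string. Every name in it (`ShimProvider`, `ApacheSpark300Provider`, `EmrSpark300Provider`, `ShimSelector`) is hypothetical and is not the plugin's actual API. The `3.0.0-amzn-<n>` version format is also an assumption about how EMR labels its Spark build.

```scala
// Hypothetical sketch: choose a shim by matching the reported Spark version.
// None of these names come from the plugin itself.
trait ShimProvider {
  def name: String
  def matchesVersion(sparkVersion: String): Boolean
}

object ApacheSpark300Provider extends ShimProvider {
  val name = "spark300"
  // Upstream Apache Spark reports a plain version string.
  def matchesVersion(sparkVersion: String): Boolean = sparkVersion == "3.0.0"
}

object EmrSpark300Provider extends ShimProvider {
  val name = "spark300emr"
  // Assumed vendor format: "<spark version>-amzn-<build number>".
  private val EmrPattern = """3\.0\.0-amzn-\d+""".r
  def matchesVersion(sparkVersion: String): Boolean =
    EmrPattern.pattern.matcher(sparkVersion).matches()
}

object ShimSelector {
  private val providers: Seq[ShimProvider] =
    Seq(ApacheSpark300Provider, EmrSpark300Provider)

  // Returns the first provider that claims the version, if any.
  def select(sparkVersion: String): Option[ShimProvider] =
    providers.find(_.matchesVersion(sparkVersion))
}
```

Matching the full string, rather than a `3.0.0` prefix, keeps the upstream shim from being chosen for a vendor build whose classes differ.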
Other than that, this adds some test support and fixes a few things that assumed DataSource v1 was enabled. One join test has a follow-up jira because EMR appears to have a new exec for bloom filters.
I ran all the integration tests successfully on an EMR cluster by specifying --runtime_env emr with the spark-submit command. It does not need to be built on an EMR cluster because we are using reflection to determine the FileScanRDD parameter count.
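The reflection approach can be sketched roughly as follows. This is not the code from this PR: `UpstreamScan` and `VendorScan` are hypothetical stand-ins for a class whose constructor arity differs between distributions, and passing `None` for the extra parameter is an assumption made only for illustration. The sketch looks up the single public constructor once, caches it, and builds the argument list from the parameter count.

```scala
import java.lang.reflect.Constructor
import scala.collection.concurrent.TrieMap

// Hypothetical stand-ins; the real FileScanRDD has different parameters.
class UpstreamScan(val session: String, val files: Seq[String])
class VendorScan(val session: String, val files: Seq[String], val extra: Option[String])

object ReflectiveScanFactory {
  // Cache the constructor so the reflective lookup happens once per class.
  private val ctorCache = TrieMap.empty[Class[_], Constructor[_]]

  private def cachedCtor(cls: Class[_]): Constructor[_] =
    ctorCache.getOrElseUpdate(cls, {
      val ctors = cls.getConstructors
      require(ctors.length == 1,
        s"expected exactly one public constructor on ${cls.getName}")
      ctors.head
    })

  // Build the instance with whichever argument list matches the arity found.
  def create(cls: Class[_], session: String, files: Seq[String]): AnyRef = {
    val ctor = cachedCtor(cls)
    val args: Seq[AnyRef] = ctor.getParameterCount match {
      case 2 => Seq(session, files)
      case 3 => Seq(session, files, None) // assumed value for the extra parameter
      case n => throw new IllegalStateException(s"unsupported constructor arity: $n")
    }
    ctor.newInstance(args: _*).asInstanceOf[AnyRef]
  }
}
```

Because the arity is resolved at run time, the same compiled artifact can work against either class shape, which is why a build on the target cluster is not required.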
I did not run the unit tests against it; there is another jira to set up build and test on EMR, and we can do it as part of that.