Add Spark 3.0 EMR Shim layer #827

Merged
merged 10 commits into from
Sep 22, 2020


tgravescs
Collaborator

@tgravescs tgravescs commented Sep 22, 2020

closes #818

This adds a shim layer to support Amazon EMR.
The main issues are that EMR uses a different Spark version number and that its FileScanRDD constructor takes a different number of parameters.
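To illustrate the version-number side of this, here is a minimal Java sketch of how a shim could be selected by exact Spark version string. It is not the code from this PR: `ShimProvider`, `ExactVersionProvider`, and `ShimSelector` are hypothetical names, and the version strings in the usage note are placeholders rather than values confirmed by this change.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Optional;

// Hypothetical interface: each shim declares which Spark versions it supports.
interface ShimProvider {
    boolean matchesVersion(String sparkVersion);
    String name();
}

// Hypothetical provider that matches a fixed set of exact version strings.
final class ExactVersionProvider implements ShimProvider {
    private final String name;
    private final List<String> versions;

    ExactVersionProvider(String name, String... versions) {
        this.name = name;
        this.versions = Arrays.asList(versions);
    }

    @Override
    public boolean matchesVersion(String sparkVersion) {
        return versions.contains(sparkVersion);
    }

    @Override
    public String name() {
        return name;
    }
}

// Picks the first provider that claims the running Spark version.
final class ShimSelector {
    static Optional<ShimProvider> select(List<ShimProvider> providers, String sparkVersion) {
        return providers.stream()
                .filter(p -> p.matchesVersion(sparkVersion))
                .findFirst();
    }
}
```

With one provider registered for a placeholder upstream version such as "3.0.0" and another for a placeholder vendor version such as "3.0.0-amzn-0", `ShimSelector.select` returns the vendor provider only when the running version string matches the vendor string exactly, which is why a distribution with its own version number needs its own shim.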

Other than that, this adds some test support and fixes a few things that assumed DataSource v1 was enabled. One join test has a follow-up jira because EMR appears to have a new exec for bloom filters.

I ran all the integration tests successfully on an EMR cluster by specifying --runtime_env emr with the spark-submit command. The shim does not need to be built on an EMR cluster because it uses reflection to determine the FileScanRDD parameter count.
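The reflection approach can be sketched as follows. This is a simplified Java illustration under stated assumptions, not the code in this PR: `UpstreamScan` and `VendorScan` are hypothetical stand-ins for a class whose constructor arity differs between distributions, `ScanFactory` is a made-up helper, and the argument types and the choice to pass null for the extra parameter are illustrative only.

```java
import java.lang.reflect.Constructor;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical stand-in for a scan class with a three-argument constructor.
class UpstreamScan {
    private final String description;

    public UpstreamScan(String session, String readFunction, String partitions) {
        this.description = session + "/" + readFunction + "/" + partitions;
    }

    @Override
    public String toString() {
        return description;
    }
}

// Hypothetical stand-in for a vendor build whose constructor takes one extra argument.
class VendorScan {
    private final String description;

    public VendorScan(String session, String readFunction, String partitions, Object extra) {
        this.description = session + "/" + readFunction + "/" + partitions + "/" + extra;
    }

    @Override
    public String toString() {
        return description;
    }
}

final class ScanFactory {
    // Cache the constructor lookup so reflection discovery happens once per class.
    private static final Map<Class<?>, Constructor<?>> CACHE = new ConcurrentHashMap<>();

    static Object create(Class<?> cls, String session, String readFunction, String partitions) {
        // Assumes the class exposes exactly one public constructor.
        Constructor<?> ctor = CACHE.computeIfAbsent(cls, c -> c.getConstructors()[0]);
        try {
            // Choose the argument list at runtime from the parameter count.
            switch (ctor.getParameterCount()) {
                case 3:
                    return ctor.newInstance(session, readFunction, partitions);
                case 4:
                    // Illustrative assumption: the extra parameter accepts null.
                    return ctor.newInstance(session, readFunction, partitions, (Object) null);
                default:
                    throw new IllegalStateException(
                            "Unexpected constructor arity: " + ctor.getParameterCount());
            }
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("Could not construct " + cls.getName(), e);
        }
    }
}
```

Because the constructor is resolved at runtime, code like this compiles against either variant of the class, and calling `ScanFactory.create(UpstreamScan.class, "s", "r", "p")` or the same call with `VendorScan.class` works without a distribution-specific build. The cost is that a signature mismatch shows up as a runtime error instead of a compile error.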

I did not run the unit tests against it, as there is another jira to set up build and test on EMR, and we can do it as part of that.

Signed-off-by: Thomas Graves <tgraves@nvidia.com>
@tgravescs
Collaborator Author

build

@tgravescs tgravescs self-assigned this Sep 22, 2020
@tgravescs tgravescs added the feature request New feature or request label Sep 22, 2020
@tgravescs tgravescs added this to the Sep 14 - Sep 25 milestone Sep 22, 2020
integration_tests/README.md (2 outdated review threads, resolved)
tgravescs and others added 2 commits September 22, 2020 13:46
Co-authored-by: Jason Lowe <jlowe@nvidia.com>
Co-authored-by: Jason Lowe <jlowe@nvidia.com>
@tgravescs
Collaborator Author

build

jlowe
jlowe previously approved these changes Sep 22, 2020
Collaborator

@revans2 revans2 left a comment

Looks good, just a few questions.

revans2
revans2 previously approved these changes Sep 22, 2020
Signed-off-by: Thomas Graves <tgraves@nvidia.com>
@tgravescs tgravescs dismissed stale reviews from revans2 and jlowe via 920bf18 September 22, 2020 21:08
@tgravescs
Collaborator Author

build

@jlowe jlowe merged commit ee8a778 into NVIDIA:branch-0.3 Sep 22, 2020
sperlingxx pushed a commit to sperlingxx/spark-rapids that referenced this pull request Nov 20, 2020
* Initial files for EMR shim

Signed-off-by: Thomas Graves <tgraves@nvidia.com>

* pom file missed add

Signed-off-by: Thomas Graves <tgraves@nvidia.com>

* Reflection for FileScanRDD

* Make tests run on EMR

Signed-off-by: Thomas Graves <tgraves@nvidia.com>

* doc and include fix

Signed-off-by: Thomas Graves <tgraves@nvidia.com>

* Fix formatting

Signed-off-by: Thomas Graves <tgraves@nvidia.com>

* Update integration_tests/README.md

Co-authored-by: Jason Lowe <jlowe@nvidia.com>

* Update integration_tests/README.md

Co-authored-by: Jason Lowe <jlowe@nvidia.com>

* cache the FileScanRDD constructor

Signed-off-by: Thomas Graves <tgraves@nvidia.com>

Co-authored-by: Jason Lowe <jlowe@nvidia.com>
nartal1 pushed a commit to nartal1/spark-rapids that referenced this pull request Jun 9, 2021
Labels
feature request New feature or request
Development

Successfully merging this pull request may close these issues.

[FEA] Create shim layer for AWS EMR
3 participants