[FEA] create AWS EMR 3.0.1 shim #905

Closed
tgravescs opened this issue Oct 2, 2020 · 9 comments · Fixed by #995

Comments

@tgravescs
Collaborator

Is your feature request related to a problem? Please describe.
Add a shim layer for AWS EMR, which is based on Spark 3.0.1.

@tgravescs tgravescs added the "feature request" (New feature or request) and "P0" (Must have for release) labels Oct 2, 2020
@tgravescs tgravescs self-assigned this Oct 2, 2020
@aluroid

aluroid commented Oct 10, 2020

It's great to know spark-rapids is going to support EMR. Is there an ETA for 0.3.0?
Also, the latest EMR release, 6.1.0, ships with Spark 3.0.0, so is this feature targeting a future release of EMR?

@sameerz
Collaborator

sameerz commented Oct 10, 2020

We are working to see if we can get the 0.2.0 plugin on EMR 6.2.0, which should be available in November / December.
The target for the 0.3.0 plugin is end of the year.

@Data-drone

Data-drone commented Oct 13, 2020

Cloudera's Spark 3 release also runs on 3.0.1 (https://docs.cloudera.com/cdp-private-cloud-base/7.1.3/cds-3/topics/spark-spark-3-requirements.html), so I currently run into:

20/10/14 01:13:34 ERROR repl.Main: Failed to initialize Spark session.
java.lang.IllegalArgumentException: Could not find Spark Shim Loader for 3.0.1.3.0.7110.0-81
	at com.nvidia.spark.rapids.ShimLoader$.detectShimProvider(ShimLoader.scala:45)
	at com.nvidia.spark.rapids.ShimLoader$.findShimProvider(ShimLoader.scala:52)
	at com.nvidia.spark.rapids.ShimLoader$.getSparkShims(ShimLoader.scala:64)
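
The version string in the error above appears to be a base Spark version (3.0.1) followed by a vendor-specific suffix. As a minimal sketch of how such a string could be reduced to its base version, the Python snippet below extracts the leading major.minor.patch triple. The helper name `base_spark_version` is hypothetical, and this is not how `ShimLoader` parses versions; it only illustrates why a match against the full string would not find a 3.0.1 shim.

```python
import re

# Hypothetical helper, not part of spark-rapids: extract the leading
# major.minor.patch triple from a vendor-extended Spark version string.
_BASE_VERSION = re.compile(r"^(\d+)\.(\d+)\.(\d+)")


def base_spark_version(version: str) -> str:
    """Return the leading 'major.minor.patch' portion of a version string."""
    match = _BASE_VERSION.match(version)
    if match is None:
        raise ValueError(f"unrecognized Spark version string: {version!r}")
    return ".".join(match.groups())


# The vendor string from the error message reduces to the base version.
print(base_spark_version("3.0.1.3.0.7110.0-81"))  # prints 3.0.1
```

A matching base version does not guarantee compatibility, since a vendor build may still carry internal Spark changes.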

@tgravescs
Collaborator Author

Thanks for reporting this. We actually have a separate issue, #409, to make the version parsing better.

We did add a config that lets the user override the shim if they know it will work. It's marked as an internal config because the user needs to know what they are doing: if the shim they specify doesn't work, things will likely fail. Whether it works depends on whether the vendor made internal Spark changes.
For example, you could try: spark.rapids.shims-provider-override=com.nvidia.spark.rapids.shims.spark301.SparkShimServiceProvider
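
One way to pass the override is as a `--conf` argument when launching `spark-shell`. The Python sketch below only assembles such a command line; the config key and provider class are taken from the comment above, while the helper `spark_shell_command` is hypothetical. The other settings needed to load the plugin (jars, plugin class) are not shown here.

```python
import shlex
from typing import Mapping

# Config key and provider class as quoted in the comment above.
OVERRIDE_KEY = "spark.rapids.shims-provider-override"
SPARK301_PROVIDER = "com.nvidia.spark.rapids.shims.spark301.SparkShimServiceProvider"


def spark_shell_command(extra_conf: Mapping[str, str]) -> str:
    """Build a spark-shell command line with one --conf flag per setting.

    Hypothetical helper for illustration only; it does not launch Spark.
    """
    args = ["spark-shell"]
    for key, value in extra_conf.items():
        args += ["--conf", f"{key}={value}"]
    # Quote each argument so the result can be pasted into a shell.
    return " ".join(shlex.quote(arg) for arg in args)


print(spark_shell_command({OVERRIDE_KEY: SPARK301_PROVIDER}))
```

Because the override bypasses version detection, it is only appropriate when the vendor build is known to behave like the Spark version the chosen shim targets.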

@Data-drone

Cheers, that got it to boot. But as you said, there are no guarantees that something won't break somewhere. How can I validate it?

@sameerz
Collaborator

sameerz commented Oct 13, 2020

You can run the integration tests in your environment (instructions here: https://github.com/NVIDIA/spark-rapids/tree/branch-0.3/integration_tests ). We are going to work with Cloudera to ensure we are compatible in their environment.

@Data-drone

> You can run the integration tests in your environment (instructions here: https://github.com/NVIDIA/spark-rapids/tree/branch-0.3/integration_tests ). We are going to work with Cloudera to ensure we are compatible in their environment.

Okay, I'm with Cloudera if you need any help finding people.

@sameerz
Collaborator

sameerz commented Oct 14, 2020

I confirmed with our Cloudera contact that CDS 3.0 is basically running Spark 3.0.1, so our plugin will work fine. The workaround mentioned above will work for now. Longer term we will build a shim layer for Cloudera so the workaround is not required.

@tgravescs
Collaborator Author

Filed #951 to track that.

@tgravescs tgravescs added this to the Oct 12 - Oct 23 milestone Oct 21, 2020
@sameerz sameerz changed the title from "[FEA] create EMR 3.0.1 shim" to "[FEA] create AWS EMR 3.0.1 shim" Dec 14, 2020
tgravescs pushed a commit to tgravescs/spark-rapids that referenced this issue Nov 30, 2023
…IDIA#905)

Signed-off-by: spark-rapids automation <70000568+nvauto@users.noreply.github.com>