
Addressing the Named Parameter change in Spark 4.0.0 [databricks] #10992

Merged: 1 commit merged into NVIDIA:branch-24.08 from razajafri:SP-9259-shouldbroadcast on Jun 8, 2024

Conversation

@razajafri (Collaborator)

This PR addresses the Spark 4.0.0 change that renamed the named parameter shouldBroadcast to isDynamicPruning.

We take the approach of renaming the local variable on our side and passing the argument positionally (without naming it), so the same call site compiles against both the old and new parameter names.
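The positional-argument approach can be sketched with stand-in case classes (a minimal illustration, not the plugin's actual code; the real Spark class involved, per the suite diff in this PR, is InSubqueryExec, and all other names here are hypothetical):

```scala
// Illustrative sketch only: two stand-in case classes mimic the pre-4.0 and
// 4.0 constructor signatures. Only shouldBroadcast/isDynamicPruning are the
// real parameter names; everything else is hypothetical.
case class PreSpark4Exec(values: Seq[Int], shouldBroadcast: Boolean)
case class Spark4Exec(values: Seq[Int], isDynamicPruning: Boolean)

// A named argument compiles against exactly one of the two signatures:
//   PreSpark4Exec(Seq(1, 2), shouldBroadcast = true)  // 4.0 rejects the name
// Passing the argument positionally compiles against either signature, which
// is what lets a single source line build across both Spark versions:
val pre = PreSpark4Exec(Seq(1, 2), true)
val s4  = Spark4Exec(Seq(1, 2), true)
```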

contributes to #9259

Signed-off-by: Raza Jafri <rjafri@nvidia.com>
@razajafri added the Spark 4.0+ (Spark 4.0+ issues) label on Jun 6, 2024
@razajafri (Collaborator, Author)

build

@razajafri (Collaborator, Author)

build

@razajafri (Collaborator, Author)

341db instance failed to start: the CI cluster terminated with AWS_INSUFFICIENT_INSTANCE_CAPACITY_FAILURE (no g5.4xlarge capacity available in us-west-2c).

```
[2024-06-07T17:28:50.316Z] cluster response is {"cluster_id":"0607-172444-i09uaaxw","creator_user_name":"timl@nvidia.com","driver_healthy":false,"cluster_name":"CI-jenkins-rapids-databricks_premerge-github-799-3.4.1-1717781083","spark_version":"13.3.x-gpu-ml-scala2.12","aws_attributes":{"first_on_demand":1,"availability":"SPOT_WITH_FALLBACK","zone_id":"us-west-2c","spot_bid_price_percent":100,"ebs_volume_count":0},"node_type_id":"g5.4xlarge","driver_node_type_id":"g5.4xlarge","ssh_public_keys":["****"],"autotermination_minutes":390,"enable_elastic_disk":false,"disk_spec":{"disk_count":0},"cluster_source":"API","enable_local_disk_encryption":false,"instance_source":{"node_type_id":"g5.4xlarge"},"driver_instance_source":{"node_type_id":"g5.4xlarge"},"effective_spark_version":"13.3.x-gpu-ml-scala2.12","state":"TERMINATED","state_message":"Please reduce the number of instances in your request, or wait for additional capacity to become available. You can also try launching an instance by selecting different instance types (which you c...","start_time":1717781084273,"terminated_time":1717781302423,"last_state_loss_time":0,"last_activity_time":0,"last_restarted_time":1717781084273,"num_workers":0,"default_tags":{"Vendor":"Databricks","Creator":"timl@nvidia.com","ClusterName":"CI-jenkins-rapids-databricks_premerge-github-799-3.4.1-1717781083","ClusterId":"0607-172444-i09uaaxw"},"termination_reason":{"code":"AWS_INSUFFICIENT_INSTANCE_CAPACITY_FAILURE","type":"CLIENT_ERROR","parameters":{"aws_api_error_code":"InsufficientInstanceCapacity","aws_error_message":"We currently do not have sufficient g5.4xlarge capacity in the Availability Zone you requested (us-west-2c). Our system will be working on provisioning additional capacity. You can currently get g5.4xlarge capacity by not specifying an Availability Zone in your request or choosing us-west-2a, us-west-2b."}},"init_scripts_safe_mode":false,"spec":{"cluster_name":"CI-jenkins-rapids-databricks_premerge-github-799-3.4.1-1717781083","spark_version":"13.3.x-gpu-ml-scala2.12","aws_attributes":{"first_on_demand":1,"availability":"SPOT_WITH_FALLBACK","zone_id":"us-west-2c","spot_bid_price_percent":100,"ebs_volume_count":0},"node_type_id":"g5.4xlarge","driver_node_type_id":"g5.4xlarge","ssh_public_keys":["****"],"autotermination_minutes":390,"enable_elastic_disk":false,"enable_local_disk_encryption":false,"num_workers":0}}

[2024-06-07T17:28:50.316Z] 0607-172444-i09uaaxw state:TERMINATED
```

@razajafri (Collaborator, Author)

build

@gerashegalov (Collaborator) left a comment


LGTM but could be a way smaller change

```diff
@@ -65,24 +65,27 @@ class GpuInSubqueryExecSuite extends SparkQueryCompareTestSuite {

   private def buildCpuInSubqueryPlan(
       spark: SparkSession,
-      shouldBroadcast: Boolean): SparkPlan = {
+      shouldBroadcastOrDpp: Boolean): SparkPlan = {
```

My preference is to minimize the change. I would leave the name as is; then the whole patch could be a one-line change.
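For contrast, a hedged sketch of the reviewer's smaller alternative: keep the local name shouldBroadcast unchanged and drop only the argument label at the call site, so the call site is the single changed line (stand-in class and helper name below are illustrative, not the plugin's actual code):

```scala
// Stand-in mimicking Spark 4.0's signature (illustrative; the real class is
// InSubqueryExec with its isDynamicPruning parameter).
case class Spark4ExecLike(values: Seq[Int], isDynamicPruning: Boolean)

// Hypothetical helper: the local parameter keeps its old name.
def buildCpuInSubqueryPlanSketch(shouldBroadcast: Boolean): Spark4ExecLike =
  // was: Spark4ExecLike(Seq(1), shouldBroadcast = shouldBroadcast)
  Spark4ExecLike(Seq(1), shouldBroadcast) // positional: the one-line change
```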

@razajafri (Collaborator, Author)

build

@razajafri merged commit 9030b13 into NVIDIA:branch-24.08 on Jun 8, 2024
45 checks passed
@razajafri deleted the SP-9259-shouldbroadcast branch on June 8, 2024 05:44
SurajAralihalli pushed a commit to SurajAralihalli/spark-rapids that referenced this pull request Jul 12, 2024