
[FEA] [Databricks 12.2] Support GpuMergeIntoCommand notMatchedBySourceClauses on GPU #8415

Open
andygrove opened this issue May 26, 2023 · 1 comment
Labels: feature request

Comments


andygrove commented May 26, 2023

Is your feature request related to a problem? Please describe.
PR #8282 added basic support for Databricks 12.2 but did not add support for notMatchedBySourceClauses in GpuMergeIntoCommand; merges that use these clauses currently fall back to the CPU.

Describe the solution you'd like
We should support notMatchedBySourceClauses on GPU.

Describe alternatives you've considered

Additional context
See the discussion at #8282 (comment) for more details.
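
For reference, a merge that exercises notMatchedBySourceClauses looks roughly like the sketch below (the target and source table names and their columns are illustrative, and spark is assumed to be an active SparkSession):

  // Illustrative only: a Delta MERGE whose WHEN NOT MATCHED BY SOURCE clause
  // ends up in notMatchedBySourceClauses and currently forces the whole
  // command back onto the CPU.
  spark.sql("""
    MERGE INTO target t
    USING source s
    ON t.id = s.id
    WHEN MATCHED THEN UPDATE SET t.value = s.value
    WHEN NOT MATCHED THEN INSERT (id, value) VALUES (s.id, s.value)
    WHEN NOT MATCHED BY SOURCE THEN DELETE
  """)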


andygrove commented Jun 14, 2023

Please ignore. This was unrelated to this issue and is now resolved.

One challenge in implementing this is that we replace JoinedRowProcessor with a RapidsProcessDeltaMergeJoin logical operator that involves a non-deterministic expression (monotonically_increasing_id). Spark's CheckAnalysis rule only allows non-deterministic expressions in a hard-coded set of operators, so analysis fails with the following error:

23/06/14 16:55:53 ERROR GpuMergeIntoCommand: Fatal error in MERGE with materialized source in attempt 1.
org.apache.spark.sql.AnalysisException: nondeterministic expressions are only allowed in Project, Filter, Aggregate or Window, found:
a,b,c,_target_row_id_,_row_dropped_,_incr_row_count_,_change_type,(_source_row_present_ IS NULL),(_target_row_present_ IS NULL),true,a,b,c,_target_row_id_,true,(UDF() AND UDF()),CAST(NULL AS STRING),a,b,c,_target_row_id_,false,true,'delete',a,b,c,_target_row_id_,false,UDF(),CAST(NULL AS STRING),a,b,c,_target_row_id_,true,true,CAST(NULL AS STRING)
in operator RapidsProcessDeltaMergeJoin [a#304727, b#304728, c#304729, _target_row_id_#304730L, _row_dropped_#304731, _incr_row_count_#304732, _change_type#304733], isnull(_source_row_present_#304699), isnull(_target_row_present_#304702), [true], [[[a#304425, b#304426, c#304427, _target_row_id_#304707L, true, (UDF() AND UDF()), null], [a#304425, b#304426, c#304427, _target_row_id_#304707L, false, true, delete]]], [a#304425, b#304426, c#304427, _target_row_id_#304707L, false, UDF(), null], [a#304425, b#304426, c#304427, _target_row_id_#304707L, true, true, null].; line 1 pos 0;

Here is the Spark code from CheckAnalysis:

          case o if o.expressions.exists(!_.deterministic) &&
            !o.isInstanceOf[Project] && !o.isInstanceOf[Filter] &&
            !o.isInstanceOf[Aggregate] && !o.isInstanceOf[Window] &&
            // Lateral join is checked in checkSubqueryExpression.
            !o.isInstanceOf[LateralJoin] =>
            // The rule above is used to check Aggregate operator.
            o.failAnalysis(
              errorClass = "_LEGACY_ERROR_TEMP_2439",
              messageParameters = Map(
                "sqlExprs" -> o.expressions.map(_.sql).mkString(","),
                "operator" -> operator.simpleString(SQLConf.get.maxToStringFields)))

For now, we avoid replacing the operator if notMatchedBySourceClauses is non-empty.
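
The guard is roughly of the following shape (a sketch only; the method name and the Seq[Any] parameter type are placeholders to keep it self-contained, not the plugin's exact code):

  // Sketch only: decide whether the merge join can be replaced on the GPU.
  // The real clauses are Delta's NOT MATCHED BY SOURCE clause objects.
  def canReplaceMergeJoinOnGpu(notMatchedBySourceClauses: Seq[Any]): Boolean =
    notMatchedBySourceClauses.isEmpty

  // Only swap JoinedRowProcessor for RapidsProcessDeltaMergeJoin when this
  // returns true; otherwise the whole MERGE falls back to the CPU.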
