[FEA] Support AST in a left join(SMJ) when casting a string to date #11213

Open
viadea opened this issue Jul 17, 2024 · 0 comments
Labels
feature request New feature or request

viadea commented Jul 17, 2024

I wish we could support AST for a left join (SMJ) whose join condition casts a string to a date.

Env:
24.08 snapshot jar

Mini repro:

import org.apache.spark.sql.Row
import org.apache.spark.sql.types._
import spark.sqlContext.implicits._
import org.apache.spark.sql.functions._

val data = Seq(
    Row(Row("Adam ","","Green"),"1","M",1000,"2024-01-01"),
    Row(Row("Bob ","Middle","Green"),"2","M",2000,"2024-12-12"),
    Row(Row("Cathy ","","Green"),"3","F",3000,"2022-03-04")
)

val schema = (new StructType()
  .add("name",new StructType()
    .add("firstname",StringType)
    .add("middlename",StringType)
    .add("lastname",StringType)) 
  .add("id",StringType)
  .add("gender",StringType)
  .add("salary",IntegerType)
  .add("birthday_str",StringType))

val df = spark.createDataFrame(spark.sparkContext.parallelize(data),schema).withColumn("birthday_dt",current_date().as("dt"))
df.printSchema

df.write.format("parquet").mode("overwrite").save("/tmp/testparquet")
spark.read.parquet("/tmp/testparquet").createOrReplaceTempView("df")

spark.conf.set("spark.rapids.sql.hasExtendedYearValues","false")
spark.conf.set("spark.sql.autoBroadcastJoinThreshold","1")

val query="""
select count(*) as cnt 
from df a left join df b 
on a.birthday_str between b.birthday_dt and b.birthday_dt+1
and a.name=b.name

"""

spark.sql(query).collect
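
One way to surface the plugin's reasons for leaving part of a plan on the CPU is the `spark.rapids.sql.explain` setting. This is a minimal sketch that assumes the same spark-shell session as the repro above, with the RAPIDS plugin loaded; it is not necessarily how the messages in this issue were captured.

```scala
// Assumption: `spark` and `query` come from the repro session above.
// NOT_ON_GPU asks the plugin to log only the operators and expressions
// it could not place on the GPU, together with the reason.
spark.conf.set("spark.rapids.sql.explain", "NOT_ON_GPU")

// Planning the query triggers the plugin's plan rewrite; the
// not-supported messages are written to the driver log.
spark.sql(query).explain()
```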

Not-supported messages:

      !Exec <SortMergeJoinExec> cannot run on GPU because not all expressions can be replaced
        @Expression <AttributeReference> name#81 could run on GPU
        @Expression <AttributeReference> name#277 could run on GPU
        @Expression <And> ((cast(birthday_str#85 as date) >= birthday_dt#282) AND (cast(birthday_str#85 as date) <= date_add(birthday_dt#282, 1))) could run on GPU
          @Expression <GreaterThanOrEqual> (cast(birthday_str#85 as date) >= birthday_dt#282) could run on GPU
            !Expression <Cast> cast(birthday_str#85 as date) cannot run on GPU because AST is required and this expression does not support AST
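
Until the cast is supported in AST, one possible workaround is to move the string-to-date cast out of the join condition and into a projection, so the condition only compares date columns. The sketch below assumes the spark-shell session and the `/tmp/testparquet` data from the repro; the view name `df_cast` and the column name `birthday_d` are hypothetical. Whether the join then runs on the GPU is untested here, since the remaining expressions in the condition (for example `date_add`) would also need AST support.

```scala
import org.apache.spark.sql.functions.col

// Assumption: `spark` is the spark-shell session from the repro.
// Cast once in a projection below the join instead of inside the join condition.
val dfCast = spark.read.parquet("/tmp/testparquet").withColumn("birthday_d", col("birthday_str").cast("date"))
dfCast.createOrReplaceTempView("df_cast") // hypothetical view name

// Same join as the repro, but the left side uses the pre-cast date column.
val rewritten = """
select count(*) as cnt
from df_cast a left join df b
on a.birthday_d between b.birthday_dt and b.birthday_dt+1
and a.name=b.name
"""

// Inspect the plan to see whether SortMergeJoinExec is still left on the CPU.
spark.sql(rewritten).explain()
```

The rewritten query should return the same count as the original, because the cast is applied to the same column and only its position in the plan changes.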