[FEA] Support str_to_map function #3542

Closed
nvliyuan opened this issue Sep 18, 2021 · 0 comments · Fixed by #4636
Labels: cudf_dependency (an issue or PR with this label depends on a new feature in cudf); feature request (new feature or request)

Comments

@nvliyuan
Collaborator

Is your feature request related to a problem? Please describe.
I would like the plugin to support the str_to_map function so that queries using it can run on the GPU.

Additional context
Example code:

import org.apache.spark.sql.Row
import org.apache.spark.sql.types._

val data = Seq(
    Row(Row("Adam ","","Green"),1,"M",10,"Math:13,Gym:24,English:45",Map("hair"->"black","eye"->"black"),"{\"hair\":\"black\",\"eye\":\"black\"}"),
    Row(Row("Bob ","Middle","Green"),2,"M",20,"Math:55,Gym:24,English:37",Map("hair"->"yellow","eye"->"yellow"),"{\"hair\":\"green\",\"eye\":\"green\"}"),
    Row(Row("Cathy ","","Green"),3,"F",30,"Math:83,Gym:15,English:63",Map("hair"->"blue","eye"->"blue"),"{\"hair\":\"blue\",\"eye\":\"blue\"}")
)

val schema = (new StructType()
  .add("name",new StructType()
    .add("firstname",StringType)
    .add("middlename",StringType)
    .add("lastname",StringType)) 
  .add("low",IntegerType)
  .add("gender",StringType)
  .add("high",IntegerType)
  .add("score",StringType)
  .add("feature",MapType(StringType,StringType))
  .add("feature_json",StringType))

val df = spark.createDataFrame(spark.sparkContext.parallelize(data),schema)
df.write.format("parquet").mode("overwrite").save("/tmp/yl/tmpdatas")

val df2 = spark.read.parquet("/tmp/yl/tmpdatas")
df2.createOrReplaceTempView("df")

spark.sql("SELECT str_to_map(score, ',', ':') from df").show()
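For reference, the result this query is expected to produce can be modelled in plain Scala without Spark. The sketch below is only an illustration of the splitting behaviour: StrToMapSketch and strToMapSketch are hypothetical names, not part of Spark or the plugin. It assumes literal delimiters (Spark treats them as regular expressions), represents a missing value as None where Spark would return null, and lets the last duplicate key win, which may differ from Spark's configured duplicate-key policy.

```scala
import java.util.regex.Pattern

// Hypothetical helper, not part of Spark or the plugin: a plain-Scala model of
// what str_to_map(text, pairDelim, keyValueDelim) is expected to return.
object StrToMapSketch {
  def strToMapSketch(
      text: String,
      pairDelim: String = ",",
      keyValueDelim: String = ":"): Map[String, Option[String]] = {
    // Split the input into "key:value" entries; limit -1 keeps trailing empty entries.
    text.split(Pattern.quote(pairDelim), -1).map { entry =>
      // Split each entry at the first key/value delimiter only (limit 2),
      // so any later delimiter stays inside the value.
      val kv = entry.split(Pattern.quote(keyValueDelim), 2)
      // Assumption: an entry without a delimiter maps to None (null in Spark).
      kv(0) -> (if (kv.length > 1) Some(kv(1)) else None)
    }.toMap // Assumption: on duplicate keys the last entry wins.
  }

  def main(args: Array[String]): Unit = {
    // First "score" value from the example data above.
    val result = strToMapSketch("Math:13,Gym:24,English:45")
    assert(result == Map(
      "Math" -> Some("13"),
      "Gym" -> Some("24"),
      "English" -> Some("45")))
    println(result)
  }
}
```

A GPU implementation would need to return the same key/value pairs for each score string in the example data.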

@nvliyuan added the "? - Needs Triage" and "feature request" labels on Sep 18, 2021
@Salonijain27 added the "cudf_dependency" label and removed the "? - Needs Triage" label on Sep 28, 2021
@ttnghia self-assigned this on Dec 8, 2021
@sameerz linked a pull request on Jan 31, 2022 that will close this issue
3 participants