[BUG] doing a window operation with an orderby for a single constant crashes #880

Closed
revans2 opened this issue Sep 29, 2020 · 1 comment · Fixed by #889
Labels
bug (Something isn't working), P0 (Must have for release)

Comments

@revans2
Collaborator

revans2 commented Sep 29, 2020

Describe the bug
If you run a window operation whose orderBy is only a constant string, we crash.

Steps/Code to reproduce bug

from pyspark.sql import SparkSession
from pyspark.sql.functions import lit, row_number
from pyspark.sql.window import Window

spark = SparkSession.builder \
        .appName('test').getOrCreate()

# No backslash continuations are needed inside the brackets
simpleData = [("James", "Sales", 3000),
    ("Michael", "Sales", 4600),
    ("Robert", "Sales", 4100),
    ("Maria", "Finance", 3000),
    ("James", "Sales", 3000),
    ("Scott", "Finance", 3300),
    ("Jen", "Finance", 3900),
    ("Jeff", "Marketing", 3000),
    ("Kumar", "Marketing", 2000),
    ("BAD", "Marketing", None),  # null salary
    ("Saif", "Sales", 4100)
  ]

columns = ["employee_name", "department", "salary"]
df = spark.createDataFrame(data=simpleData, schema=columns)

# Ordering by a single constant is what triggers the crash
windowSpec = Window.partitionBy("department").orderBy(lit(''))

df.withColumn("row_num", row_number().over(windowSpec)).show(truncate=False)
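Because ordering by `lit('')` makes every row in a department a peer, which employee gets which `row_num` is undefined, but the set of numbers per department is not. A minimal pure-Python sketch, assuming the `simpleData` rows above, of that order-independent invariant (`expected_row_numbers` is a hypothetical helper, not part of Spark or spark-rapids):

```python
from collections import Counter

# The rows from the repro above: (employee_name, department, salary).
simple_data = [
    ("James", "Sales", 3000),
    ("Michael", "Sales", 4600),
    ("Robert", "Sales", 4100),
    ("Maria", "Finance", 3000),
    ("James", "Sales", 3000),
    ("Scott", "Finance", 3300),
    ("Jen", "Finance", 3900),
    ("Jeff", "Marketing", 3000),
    ("Kumar", "Marketing", 2000),
    ("BAD", "Marketing", None),
    ("Saif", "Sales", 4100),
]


def expected_row_numbers(rows):
    """Return {department: set of row_num values} that any valid run yields.

    row_number() numbers each partition 1..n, so this holds however the
    ties created by a constant ORDER BY happen to be broken.
    """
    sizes = Counter(dept for _, dept, _ in rows)
    return {dept: set(range(1, n + 1)) for dept, n in sizes.items()}


print(expected_row_numbers(simple_data))
```

Under that assumption, one way to compare GPU and CPU output for this query is to check these per-department sets rather than the exact (employee_name, row_num) pairs.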

Expected behavior
It should produce a result similar to what Spark does on the CPU (which, honestly, is not reproducible, because every row in a department ties on the constant sort key, but whatever).
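To see why the result is not reproducible: under `orderBy(lit(''))` every row in a department ties, so the numbering simply follows whatever order the rows arrive in. The pure-Python sketch below models `row_number()` as a stable sort within each partition followed by numbering from 1. This is a simplification, and `row_number_over` is a hypothetical helper, not Spark's implementation. Reversing the input changes the constant-key numbering, while ordering by `salary` then `employee_name` (nulls first, as Spark does for ascending sorts) does not.

```python
from collections import Counter, defaultdict

# The rows from the repro: (employee_name, department, salary).
simple_data = [
    ("James", "Sales", 3000), ("Michael", "Sales", 4600),
    ("Robert", "Sales", 4100), ("Maria", "Finance", 3000),
    ("James", "Sales", 3000), ("Scott", "Finance", 3300),
    ("Jen", "Finance", 3900), ("Jeff", "Marketing", 3000),
    ("Kumar", "Marketing", 2000), ("BAD", "Marketing", None),
    ("Saif", "Sales", 4100),
]


def row_number_over(rows, partition_key, order_key):
    """Model of row_number() OVER (PARTITION BY ... ORDER BY ...).

    Rows are grouped by partition_key, stably sorted by order_key inside
    each group (ties keep arrival order), then numbered starting at 1.
    """
    groups = defaultdict(list)
    for row in rows:
        groups[partition_key(row)].append(row)
    out = []
    for group in groups.values():
        for i, row in enumerate(sorted(group, key=order_key), start=1):
            out.append(row + (i,))
    return out


def by_dept(row):
    return row[1]


def constant(row):
    # Same effect as orderBy(lit('')): every row compares equal.
    return ""


def salary_then_name(row):
    # Nulls first, matching Spark's default for ascending order.
    salary = row[2]
    return (salary is not None, salary if salary is not None else 0, row[0])


for name, key in [("constant", constant), ("salary/name", salary_then_name)]:
    forward = Counter(row_number_over(simple_data, by_dept, key))
    backward = Counter(row_number_over(simple_data[::-1], by_dept, key))
    print(name, "reproducible:", forward == backward)
```

With the constant key, Michael is row 2 in arrival order but row 4 when the input is reversed, so the script reports `False` for `constant` and `True` for `salary/name`. In Spark the arrival order within a partition depends on how the data was shuffled, which is why the CPU result itself varies between runs.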

revans2 added the bug (Something isn't working) and ? - Needs Triage (Need team to review and classify) labels Sep 29, 2020
@revans2
Collaborator Author

revans2 commented Sep 29, 2020

FYI @willb

sameerz added the P0 (Must have for release) label and removed the ? - Needs Triage (Need team to review and classify) label Sep 29, 2020
sameerz added this to the Sep 28 - Oct 9 milestone Sep 29, 2020
tgravescs pushed a commit to tgravescs/spark-rapids that referenced this issue Nov 30, 2023
…IDIA#880)

Signed-off-by: spark-rapids automation <70000568+nvauto@users.noreply.github.com>