[DOC] what is deterministic used for in Spark, specifically around aggregate functions #4684

Closed
abellina opened this issue Feb 3, 2022 · 2 comments
Labels
documentation (Improvements or additions to documentation), wontfix (This will not be worked on)

Comments

@abellina
Collaborator

abellina commented Feb 3, 2022

#4677 introduced a change to align with Spark 3.3.0, where First/Last/Collect are now marked as deterministic. According to the referenced Spark PR, these aggregates had been marked non-deterministic by mistake. Because non-deterministic expressions are not eligible for certain optimizations, that was the motivation for the Spark PR. That said, the comments in First.scala, Last.scala, and collect.scala still describe the operations as "non-deterministic", which seems at odds with the Spark change.

This issue is for a research spike that looks at deterministic to determine what it is used for, and to propose a likely change to Spark that updates the docs around these aggregates.
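To make the apparent contradiction concrete, here is a minimal sketch in plain Python (not Spark code) of the two senses of "non-deterministic" that may be in play. The `first` and `collect_in_order` helpers are hypothetical stand-ins: a First-style aggregate can return different answers when rows arrive in a different order, yet for any one fixed input order it is a pure function of its input.

```python
def first(rows):
    """Return the first value seen, mimicking a First-style aggregate."""
    for value in rows:
        return value
    return None


def collect_in_order(parts, order):
    """Concatenate partitions in the given arrival order (a stand-in for a shuffle)."""
    return [value for index in order for value in parts[index]]


# The same logical data, split into three partitions.
partitions = [[1, 2], [3, 4], [5, 6]]

# Two runs in which the partitions arrive in a different order.
run_a = collect_in_order(partitions, [0, 1, 2])
run_b = collect_in_order(partitions, [2, 0, 1])

# The answer depends on arrival order: this is the "non-deterministic"
# behavior the source comments appear to describe.
result_a = first(run_a)
result_b = first(run_b)

# For one fixed input order, repeated evaluation always agrees: this is
# the sense in which the expression itself can be called deterministic.
repeat = {first(run_a) for _ in range(5)}

print(result_a, result_b, repeat)
```

Under this reading, the order dependence comes from the input rather than from the expression, which may be why the flag and the code comments can disagree without either being wrong.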

@abellina added the documentation and ? - Needs Triage labels Feb 3, 2022
@gerashegalov
Collaborator

The purpose of deterministic is to keep an expression from producing inconsistent results within a single query when its value is not memoized. See https://issues.apache.org/jira/browse/SPARK-8023.
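A minimal sketch of that idea in plain Python, assuming a toy expression model rather than Spark's actual Catalyst classes: `Expr`, `can_inline`, `eval_inlined`, and `eval_memoized` are all hypothetical names. An optimization that copies an expression into every place it is referenced causes it to be re-evaluated, which is only safe when every evaluation returns the same value.

```python
import itertools


class Expr:
    """Toy expression: a name, a function of one row, and a deterministic flag."""

    def __init__(self, name, fn, deterministic):
        self.name = name
        self.fn = fn
        self.deterministic = deterministic

    def eval(self, row):
        return self.fn(row)


def can_inline(expr, reference_count):
    """Inlining re-evaluates the expression once per reference, so it is
    only safe for deterministic expressions or for a single reference."""
    return expr.deterministic or reference_count <= 1


def eval_inlined(expr, row, reference_count):
    """Evaluate the expression separately at every reference (no memoization)."""
    return [expr.eval(row) for _ in range(reference_count)]


def eval_memoized(expr, row, reference_count):
    """Evaluate once and reuse the value at every reference."""
    value = expr.eval(row)
    return [value] * reference_count


counter = itertools.count()
next_id = Expr("next_id", lambda row: next(counter), deterministic=False)
double = Expr("double", lambda row: row * 2, deterministic=True)

inlined_double = eval_inlined(double, 21, 2)  # both references agree
inlined_id = eval_inlined(next_id, 21, 2)     # references disagree
memoized_id = eval_memoized(next_id, 21, 2)   # one evaluation, shared

print(inlined_double, inlined_id, memoized_id)
print(can_inline(double, 2), can_inline(next_id, 2))
```

In this sketch the flag acts as a gate: the toy optimizer consults `can_inline` before duplicating an expression, and falls back to the memoized form when duplication could expose two different values within one query.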

@sameerz removed the ? - Needs Triage label Feb 8, 2022
@sameerz
Collaborator

sameerz commented Feb 8, 2022

Our behavior is consistent with Spark, so additional documentation is not warranted.

@sameerz closed this as completed Feb 8, 2022
@sameerz added the wontfix label Apr 2, 2022