[DOC] what is deterministic used for in Spark, specifically around aggregate functions #4684
#4677 introduced a change to align with Spark 3.3.0, where First/Last/Collect are now marked as deterministic. According to the referenced Spark PR, First/Last/Collect had been marked non-deterministic by mistake. Non-deterministic expressions are not eligible for certain optimizations, which was the motivation for the Spark PR. However, the comments in First.scala, Last.scala, and collect.scala still describe these operations as "non-deterministic", which seems at odds with the change to Spark.
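To make the "not eligible for certain optimizations" point concrete, here is a minimal, self-contained sketch that does not use Spark. It assumes a simplified model in which an expression is deterministic only if all of its children are, which roughly mirrors the default behavior of the `deterministic` flag on Catalyst's `Expression`. Every name below (`Expr`, `Col`, `Lit`, `Rand`, `GreaterThan`, `FirstAgg`, `canRewrite`) is hypothetical and is not Spark's API. `FirstAgg` takes a flag so it can stand in for First before and after the Spark 3.3.0 change.

```scala
// Hypothetical, simplified model of a `deterministic` flag on expressions.
// This is NOT Spark's Expression class; it only illustrates the idea.
sealed trait Expr {
  def children: Seq[Expr]
  // Deterministic only if every child is deterministic (leaves default to true).
  def deterministic: Boolean = children.forall(_.deterministic)
}

final case class Col(name: String) extends Expr {
  def children: Seq[Expr] = Nil
}

final case class Lit(value: Int) extends Expr {
  def children: Seq[Expr] = Nil
}

// Stand-in for something like rand(): always non-deterministic.
final case class Rand() extends Expr {
  def children: Seq[Expr] = Nil
  override def deterministic: Boolean = false
}

final case class GreaterThan(left: Expr, right: Expr) extends Expr {
  def children: Seq[Expr] = Seq(left, right)
}

// Hypothetical aggregate standing in for First/Last/Collect; the flag models
// "marked non-deterministic" (old) vs. "marked deterministic" (Spark 3.3.0).
final case class FirstAgg(child: Expr, markedDeterministic: Boolean) extends Expr {
  def children: Seq[Expr] = Seq(child)
  override def deterministic: Boolean = markedDeterministic && child.deterministic
}

object DeterministicDemo {
  // Simplified optimizer guard (hypothetical): a rewrite that may move,
  // duplicate, or re-evaluate an expression is only applied when the
  // expression is deterministic.
  def canRewrite(e: Expr): Boolean = e.deterministic

  def main(args: Array[String]): Unit = {
    val plain    = GreaterThan(Col("a"), Lit(1))
    val random   = GreaterThan(Rand(), Lit(0))
    val oldFirst = GreaterThan(FirstAgg(Col("a"), markedDeterministic = false), Lit(1))
    val newFirst = GreaterThan(FirstAgg(Col("a"), markedDeterministic = true), Lit(1))
    println(canRewrite(plain))    // true
    println(canRewrite(random))   // false
    println(canRewrite(oldFirst)) // false: the flag blocks the rewrite
    println(canRewrite(newFirst)) // true: now eligible
  }
}
```

In this model, flipping a single aggregate's flag changes whether any expression containing it is eligible for the rewrite, which is why a mistaken non-deterministic marking can silently disable optimizations.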
This issue is for a research spike that looks at `deterministic` to determine what it is used for, and proposes a likely change to Spark to update the docs around these aggregates.