[FEA] figure out how to do collect_set in a window operation. #10113
Labels
feature request
New feature or request
reliability
Features to improve reliability or bugs that severely impact the reliability of the plugin
Is your feature request related to a problem? Please describe.
`collect_list` and `collect_set` are really not good from a reliability standpoint when run in the context of a window operation. Because of this I filed #10110 to disable them by default. But long term we need to come up with a much better solution to the problem.

`collect_list` has some fundamental problems that make it very hard to work with, which is why I filed #10111 for it to be handled separately.

In some ways `collect_set` is better because in the common case the cardinality of the values being collected is likely to be small. But in the worst case it is exactly the same as `collect_list` in terms of size.

So we could try to do the same things proposed in #10111, while keeping the dedupe stage separate.
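To make the size argument concrete, here is a minimal pure-Python sketch of a running window (unbounded preceding to current row) over a single partition. It is not Spark or CUDF code; the function names `window_collect_list`, `window_collect_set`, and `total_elements` are hypothetical and exist only for illustration.

```python
def window_collect_list(values):
    """Running collect_list: row i gets every value from row 0 through row i."""
    return [values[: i + 1] for i in range(len(values))]


def window_collect_set(values):
    """Running collect_set: row i gets the distinct values from row 0 through row i."""
    out, seen = [], []
    for v in values:
        if v not in seen:  # first-seen order is an arbitrary choice for this sketch
            seen.append(v)
        out.append(list(seen))
    return out


def total_elements(rows):
    """Total number of child elements across all output rows."""
    return sum(len(r) for r in rows)


low_cardinality = [1, 2, 1, 2, 1, 2, 1, 2]  # common case: few distinct values
all_distinct = list(range(8))               # worst case: every value is unique

common_list = total_elements(window_collect_list(low_cardinality))
common_set = total_elements(window_collect_set(low_cardinality))
worst_list = total_elements(window_collect_list(all_distinct))
worst_set = total_elements(window_collect_set(all_distinct))
```

With few distinct values the set output stays far smaller than the list output, but once every value is unique the two outputs are the same size, which is the worst case described above.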
We could also look at making changes to how CUDF implements `collect_set` to make the common case much more memory efficient. Right now `collect_set` is implemented in terms of `collect_list` with a sort/dedupe stage. This works, but it makes the memory usage just as bad as `collect_list` in all cases. I am not sure what we could do to make it better though. The CPU essentially allocates a set per row and recalculates everything as it goes. This is not likely to work for the GPU because we cannot do dynamic memory allocation from the GPU itself, which means we would need to allocate memory for a set per row up front. I think there are things we could do with this by doing a distinct count on the entire column in the table and using it as an estimate to decide what to do next, but we are going to need to sit down with some CUDA experts to see what we can come up with.
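The trade-off can be sketched in plain Python. This is only a model of the ideas in this issue, not the actual CUDF or Spark implementation: `collect_set_via_list` mimics the list-then-dedupe approach, `collect_set_per_row` mimics the CPU-style set per row, and `preallocation_bound` shows one possible way a column-wide distinct count could size up-front allocations. All three names, and the `min(distinct, frame size)` rule, are assumptions made for illustration.

```python
def collect_set_via_list(values):
    """List-then-dedupe: build the full running list per row, then sort and dedupe.

    Returns (result_rows, peak_intermediate_elements).
    """
    lists = [values[: i + 1] for i in range(len(values))]  # collect_list stage
    peak = sum(len(l) for l in lists)  # every list is live before the dedupe runs
    result = [sorted(set(l)) for l in lists]  # sort/dedupe stage
    return result, peak


def collect_set_per_row(values):
    """CPU-style: allocate a fresh set for each row and rebuild it from that row's frame."""
    result = []
    for i in range(len(values)):
        row_set = set()
        for v in values[: i + 1]:
            row_set.add(v)  # the set grows dynamically, which a GPU kernel cannot do
        result.append(sorted(row_set))
    return result


def preallocation_bound(values):
    """Upper bound on output elements when each row's slot is sized up front.

    Assumption: a row never needs more than min(distinct count, frame size) slots.
    """
    distinct = len(set(values))
    return sum(min(distinct, i + 1) for i in range(len(values)))


values = [3, 1, 3, 1, 3, 1]
rows_via_list, peak = collect_set_via_list(values)
rows_per_row = collect_set_per_row(values)
bound = preallocation_bound(values)
actual = sum(len(r) for r in rows_via_list)
```

For this low-cardinality input the list-based path holds 21 intermediate elements, while a distinct-count bound of 11 slots is enough for the final result. When every value is unique the bound degrades to the same size as the list, so an estimate like this could only help decide which path to take, not remove the worst case.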