Add DateType support for AST expressions #3752
Comments
I would like to try
I am a little curious about what kind of optimization we could do with the AST.
There are a few reasons. The biggest one is the context in which the expression runs, for instance when we want to do the comparison as part of a join. For example, join A and B on …

The second reason is memory bandwidth. Suppose I want to run something like a + b + c + d. If I run it the normal, non-AST way, I need to add a + b and produce a temporary result, then add that result to c and produce another temporary result, then add that one to d to get the final answer. That means 3 kernel calls, 3 columns of data written to the GPU's memory, and 6 columns of data read from the GPU's memory. With the AST, in theory, we run 1 kernel, read 4 columns of data, and write out 1. That is 5 column transfers instead of 9, so it should speed things up by close to 2x, in theory.

We have started to work on the first use case, joins, because for some types of joins there is no other way to do them even remotely efficiently. For the second use case, we have not seen projections be enough of an issue that we have really started to tackle it.
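The memory-bandwidth argument above can be checked with a small counting sketch. This is only a back-of-the-envelope model, not libcudf or plugin code: the function names `unfused_traffic` and `fused_traffic` are hypothetical, and the model assumes every column is the same size and that memory traffic dominates the runtime.

```python
def unfused_traffic(num_inputs):
    """Cost of a chain of binary adds, one kernel per add (hypothetical model)."""
    kernels = num_inputs - 1   # a + b, then + c, then + d, ...
    reads = 2 * kernels        # each kernel reads two input columns
    writes = kernels           # each kernel writes one output column
    return kernels, reads, writes


def fused_traffic(num_inputs):
    """Cost when the whole expression is evaluated in a single kernel."""
    return 1, num_inputs, 1    # one kernel, read every input once, write one result


# a + b + c + d has four input columns.
_, reads_u, writes_u = unfused_traffic(4)
_, reads_f, writes_f = fused_traffic(4)

# Ratio of total column transfers, unfused versus fused.
ratio = (reads_u + writes_u) / (reads_f + writes_f)
print(f"unfused: {reads_u} reads, {writes_u} writes; "
      f"fused: {reads_f} reads, {writes_f} write; ratio {ratio:.1f}x")
```

Under these assumptions the ratio grows with longer expressions, since the unfused cost is 3 transfers per operator while the fused cost is one read per input plus a single write.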
Thank you for your explanation.
libcudf AST supports timestamp types, and Spark's DateType is treated as a timestamp type in libcudf. We should be able to extend the existing AST expression support to include DateType inputs.
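As a rough illustration of why a date can be handled like a timestamp, the sketch below models a date as a count of days since the Unix epoch, which is how Spark represents DateType values internally and what a day-resolution timestamp stores. It uses only the Python standard library; the helper `to_epoch_days` is a hypothetical name for this example and is not part of Spark, libcudf, or the plugin.

```python
from datetime import date

EPOCH = date(1970, 1, 1)


def to_epoch_days(d):
    """Convert a calendar date to days since 1970-01-01 (negative before the epoch)."""
    return (d - EPOCH).days


left = [date(1969, 12, 31), date(1970, 1, 1), date(2021, 10, 7)]
right = [date(1970, 1, 1), date(1970, 1, 1), date(2021, 10, 6)]

# Comparing the integer day counts gives the same answer as comparing the dates,
# so an expression such as left < right only needs integer comparison per row.
by_date = [l < r for l, r in zip(left, right)]
by_days = [to_epoch_days(l) < to_epoch_days(r) for l, r in zip(left, right)]
print(by_date == by_days)
```

The point of the sketch is only that comparisons and other expressions over dates reduce to operations on a single integer per row, which is the same shape of work as for the timestamp types the AST already handles.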