-
Notifications
You must be signed in to change notification settings - Fork 891
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[FEA] Remove "All columns required to have same data type" requirement in .to_dlpack() method. #7123
Comments
This would require changing the C++ implementation, so I'd like to understand... what is the type of the data that This requires casts, so at the C++ level we would likely require the user to explicitly specify the output type, and reserve the right to error. We would still error if they try to do something impossible like convert a table with one column of ints and one column of strings (or lists of lists of strings!) into a tensor of floats. |
Hi @harrism , In the example above, I understand casting is needed, but maybe it is well-worth to double-check how An additional output type parameter would be nice. Nevertheless, I would make it optional. When not present, I would cast to the bigger datatype which is able to store the data in the dataframe. About the strings, I´d say there is no problem because, IIRC, dataframes with only numeric series are allowed so far. |
We should maintain the current C++ behavior of requiring everything to be the same type. It's the callers responsibility (i.e., the cuDF Python wrappers) to cast all the columns to the desired type beforehand. |
Hi @jrhemstad , Can I ask why we should maintain current implementation? I mean, is there any technical reason why it couldn’t be changed? |
@miguelusque I think the point thus far has been that the C++ function in libcudf shouldn't do any automatic casting as that's unexpected behavior from the perspective of C++. On the Python side it could be argued that we should handle automatic casting for dlpack similar to what's done for |
HI @kkraus14 , Sure! My only concern is the performance penalty by performing the casting in the Python side instead of the c++ side (I do not have any data to support my concern). I mean, in the worst scenario, the user will need to cast his data, so the fastest it could be done, the better. In the Python side, it can always be invoked Nevertheless, thank you for considering this feature request! |
The actual casting would be done on the C++ side, just explicitly invoked from the Python implementation on an as-needed basis. |
This issue has been marked stale due to no recent activity in the past 30d. Please close this issue if no further response or action is needed. Otherwise, please respond with a comment indicating any updates or changes to the original issue and/or confirm this issue still needs to be addressed. This issue will be marked rotten if there is no activity in the next 60d. |
I think this feature request is still relevant. |
This issue has been labeled |
I think this issue is still relevant. |
This issue has been labeled |
I think this feature request is still relevant. |
Resolves: #7123 This PR adds a common dtype casting as requested here: #7123 (comment) Authors: - GALI PREM SAGAR (https://github.com/galipremsagar) Approvers: - https://github.com/brandon-b-miller - Michael Wang (https://github.com/isVoid) URL: #9585
Is your feature request related to a problem? Please describe.
Hi!
While trying to convert a cudf dataframe to a cupy array, I have noticed a different behaviour between two cudf methods:
_df_.as_gpu_matrix()
_df_.to_dlpack()
While the first method works well when the dataframe contains different numeric types (
int64
andfloat64
), the second method returns the following error:The following code works:
The following code fails:
Hope it helps!
Miguel
Describe the solution you'd like
It would be great if
_df_.to_dlpack()
method could behave similarly to_df_.as_gpu_matrix()
method.Describe alternatives you've considered
Using
_df_.as_gpu_matrix()
method.Additional context
Using RAPIDS 0.17, the latest stable version on Ubuntu 20.04.
The text was updated successfully, but these errors were encountered: