[FEA] cudf.DataFrame.from_dict #11934

Closed
mattf opened this issue Oct 17, 2022 · 7 comments · Fixed by #12048
Labels
feature request New feature or request Python Affects Python cuDF API.

Comments

@mattf

mattf commented Oct 17, 2022

Is your feature request related to a problem? Please describe.
I'm rewriting code from pandas to cudf and trying to use import cudf as pd.

Describe the solution you'd like
cudf.DataFrame.from_dict matching https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.from_dict.html

@mattf mattf added Needs Triage Need team to review and classify feature request New feature or request labels Oct 17, 2022
@rjzamora
Member

+1 to this feature (and to its dask_cudf counterpart).

@galipremsagar galipremsagar self-assigned this Oct 18, 2022
@quasiben
Member

From a call with @shwina. We can already do some dict-like things, such as:

In [4]: cudf.DataFrame({'a': cudf.Series([1,2,3]), 'b': cudf.Series([4,5,6], index=[9,10,11])})
Out[4]:
       a     b
0      1  <NA>
1      2  <NA>
2      3  <NA>
9   <NA>     4
10  <NA>     5
11  <NA>     6

Hmm, actually the DataFrame constructor is pretty robust:

In [19]: cudf.DataFrame({'c': ['a', 'b', 'd'], 'b': cudf.Series([1, 2, 3]), 'd': pd.Series([10, 11, 12])})
Out[19]:
   c  b   d
0  a  1  10
1  b  2  11
2  d  3  12

In [23]: cudf.DataFrame({'a': cudf.Series([1,2,3]), 'b': cudf.Series([4,5,6], index=[9,10,11]),'c': pd.Series(['q', 'r', 's'])})
Out[23]:
       a     b     c
0      1  <NA>     q
1      2  <NA>     r
2      3  <NA>     s
9   <NA>     4  <NA>
10  <NA>     5  <NA>
11  <NA>     6  <NA>

@mattf can you, as a test, check whether the standard constructor works for your use case? If so, maybe this is as easy as redirecting from_dict to DataFrame(...).

@rjzamora
Member

@quasiben - I also suspect that cudf.from_dict can simply wrap cudf.DataFrame, probably raising a NotImplementedError for orient="index" and orient="tight". In the long run, it may be nice to support the same orient options as pandas. However, the most important thing is probably that the API exists.
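
The wrapper described above might look roughly like the sketch below. It is an illustration only, not the code merged into cuDF. The name from_dict_sketch and the frame_constructor parameter are hypothetical; frame_constructor defaults to the built-in dict so the example runs without a GPU, and on a machine with cuDF installed one would pass cudf.DataFrame instead.

```python
from typing import Any, Callable, Mapping


def from_dict_sketch(
    data: Mapping[Any, Any],
    orient: str = "columns",
    frame_constructor: Callable[[Mapping[Any, Any]], Any] = dict,
) -> Any:
    """Hypothetical from_dict wrapper that defers to a DataFrame constructor.

    frame_constructor is an assumption made for this sketch: it stands in
    for cudf.DataFrame so the example can run anywhere.
    """
    if orient == "columns":
        # Column-oriented dicts are already what the constructor accepts.
        return frame_constructor(data)
    if orient in ("index", "tight"):
        # Known pandas orientations that this sketch does not handle.
        raise NotImplementedError(f"orient={orient!r} is not supported yet")
    raise ValueError(f"unknown orient: {orient!r}")


# With the default stand-in constructor the input comes back as a plain dict.
frame = from_dict_sketch({"a": [1, 2, 3], "b": [4, 5, 6]})
```

Rejecting the unsupported orientations explicitly keeps the method signature in place for callers such as dd.from_dict while making the gap visible, rather than silently returning a column-oriented frame.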

More context: The dask.dataframe.DataFrame constructor does not accept a dict-like argument, and so dd.from_dict is now the recommended way to create a Dask-DataFrame from dict-formatted data. Now that dd.from_dict is backend-dispatchable, it would be nice to be able to call into an explicit cudf/dask_cudf.DataFrame method for the "cudf" backend.

@mattf
Author

mattf commented Oct 21, 2022

@quasiben the cudf.DataFrame constructor works for the specific case I have, but the most important thing is that the API matches what pandas.DataFrame.from_dict provides.

@GregoryKimball
Contributor

@mattf Would you please provide a specific case where pandas and cudf diverge in from_dict behavior?

@GregoryKimball GregoryKimball added 0 - Waiting on Author Waiting for author to respond to review Python Affects Python cuDF API. and removed Needs Triage Need team to review and classify labels Oct 21, 2022
@rjzamora
Member

Would you please provide a specific case where pandas and cudf diverge in from_dict behavior?

I think the problem is that there is no cudf.DataFrame.from_dict method at all.

@mattf
Author

mattf commented Oct 22, 2022

@GregoryKimball pandas.DataFrame.from_dict exists and cudf.DataFrame.from_dict does not

@galipremsagar galipremsagar removed the 0 - Waiting on Author Waiting for author to respond to review label Nov 1, 2022
rapids-bot bot pushed a commit that referenced this issue Nov 14, 2022
…12048)

Resolves: #11934 

- [x] Adds `DataFrame.from_dict` and `DataFrame.to_dict`
- [x] Adds `Series.to_dict`

Authors:
  - GALI PREM SAGAR (https://github.com/galipremsagar)

Approvers:
  - Vyas Ramasubramani (https://github.com/vyasr)
  - Lawrence Mitchell (https://github.com/wence-)

URL: #12048