Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[FEA] Refactor cuDF Python merging logic for index types #9978

Closed
vyasr opened this issue Jan 6, 2022 · 3 comments
Closed

[FEA] Refactor cuDF Python merging logic for index types #9978

vyasr opened this issue Jan 6, 2022 · 3 comments
Labels
feature request New feature or request Python Affects Python cuDF API.

Comments

@vyasr
Copy link
Contributor

vyasr commented Jan 6, 2022

Is your feature request related to a problem? Please describe.
pandas only supports merge operations for DataFrame and Series objects. cudf's merging code also supports merges involving Index objects. This support makes the internals of the merge code excessively convoluted and likely introduces performance overheads for user-facing merge APIs due to the additional logic required to handle index objects (and therefore, to handle input objects that do not themselves have indexes).

Describe the solution you'd like
We cannot simply disable merges for index objects because various internal code paths in cudf assume that Index objects may be merged. Therefore, we should separate logic for merging Indexes from the implementation of the public merge APIs. By identifying the exact use cases for index merging we may be able to significantly accelerate code paths relying on these merges since the implementation of such a merge is likely to be much simpler than the current merge implementation, which has to handle all the complexities associated with the pandas merge API. The change should also save us from needing to introduce complex multiple dispatch patterns as proposed in #9807 (comment).

@vyasr vyasr added feature request New feature or request Needs Triage Need team to review and classify labels Jan 6, 2022
@vyasr vyasr added this to the CuDF Python Refactoring milestone Jan 6, 2022
@github-actions
Copy link

github-actions bot commented Feb 5, 2022

This issue has been labeled inactive-30d due to no recent activity in the past 30 days. Please close this issue if no further response or action is needed. Otherwise, please respond with a comment indicating any updates or changes to the original issue and/or confirm this issue still needs to be addressed. This issue will be labeled inactive-90d if there is no activity in the next 60 days.

@github-actions
Copy link

github-actions bot commented May 6, 2022

This issue has been labeled inactive-90d due to no recent activity in the past 90 days. Please close this issue if no further response or action is needed. Otherwise, please respond with a comment indicating any updates or changes to the original issue and/or confirm this issue still needs to be addressed.

@GregoryKimball GregoryKimball added Python Affects Python cuDF API. and removed Needs Triage Need team to review and classify labels Jun 26, 2022
@vyasr
Copy link
Contributor Author

vyasr commented Jul 13, 2022

After #10689 and #11184 index merging is based only on DataFrame merging. There is nothing index-specific left to do here, although we may in the future consider extracting a simpler version of merging that doesn't involve supporting all the different pieces of the pandas merge API.

@vyasr vyasr closed this as completed Jul 13, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
feature request New feature or request Python Affects Python cuDF API.
Projects
None yet
Development

No branches or pull requests

2 participants