Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Drops duplicate edges in non-MultiGraph PLC SGGraph instances #4658

Conversation

rlratzel
Copy link
Contributor

@rlratzel rlratzel commented Sep 19, 2024

Graph input with duplicate edges intended for Graph/DiGraph instances resulted in internal PLC SGGraph instances with duplicate edges, which were effectively treated as MultiGraphs and caused incorrect results from algorithms like pagerank.

This PR sets the drop_multi_edges PLC SGGraph ctor option to have PLC remove duplicate edges on SGGraph creation.

The overhead to drop duplicate edges for non-MultiGraphs is negligible, and in the case of a large test graph (wikipedia data, 37.5M nodes, 464.5M edges) resulted in an overall speedup for pagerank going from 12.2 seconds to 10.7 seconds on my workstation, likely due to fewer edges to process a minor slowdown from 10.5s to 10.7s. edit: after several re-runs, the pagerank runtime before the change settled to 10.5, and the runtime after the change was typically 10.7.

A test was added that uses pagerank to ensure Graphs vs. MultiGraphs are handled correctly and duplicate edges are dropped as needed. The results when run without drop_multi_edges set:

>       assert actual_pr_for_G == approx(expected_pr_for_G)
E       assert {0: 0.0875795...7955580949783} == approx({0: 0....32 ± 1.8e-07})
E
E         comparison failed. Mismatched elements: 4 / 4:
E         Max absolute difference: 0.08785887916592061
E         Max relative difference: 0.5007959662968462
E         Index | Obtained            | Expected
E         0     | 0.08757955580949783 | 0.17543839772251532 ± 1.8e-07
E         1     | 0.41242048144340515 | 0.32456160227748454 ± 3.2e-07
E         2     | 0.41242048144340515 | 0.32456160227748454 ± 3.2e-07
E         3     | 0.08757955580949783 | 0.17543839772251532 ± 1.8e-07

The same test passes when run with the changes in this PR to set drop_multi_edges.

…ank to ensure Graphs w/out multi edges are created and that MultiGraphs are still created correctly.
@rlratzel rlratzel added bug Something isn't working non-breaking Non-breaking change nx-cugraph labels Sep 19, 2024
@rlratzel rlratzel self-assigned this Sep 19, 2024
@rlratzel rlratzel requested a review from a team as a code owner September 19, 2024 19:54
@@ -702,6 +710,7 @@ def _get_plc_graph(
renumber=False,
do_expensive_check=False,
vertices_array=self._node_ids,
drop_multi_edges=not self.is_multigraph(),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice. All you did was setting a flag and no performance degradation

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, it ended up being very straightforward. I did notice a slight slowdown of <2% after additional runs, but still worth doing.

@rlratzel
Copy link
Contributor Author

/merge

@rapids-bot rapids-bot bot merged commit 7e058e2 into rapidsai:branch-24.10 Sep 20, 2024
131 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working non-breaking Non-breaking change nx-cugraph python
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants