GraphStore: Data, HeteroData respect is_sorted #4922

Merged (7 commits) on Jul 7, 2022

mananshah99 (Contributor) commented Jul 6, 2022

Allows the NeighborLoader csc() conversion through GraphStore to respect is_sorted when the GraphStore is a Data or HeteroData object.
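
For readers unfamiliar with why is_sorted matters here, the following is a minimal pure-Python sketch of a COO-to-CSC conversion that skips the sort when the caller declares the edges already ordered by column. The function name `coo_to_csc` and its parameters are illustrative assumptions, not the code changed in this PR or the library's implementation.

```python
from typing import List, Tuple


def coo_to_csc(row: List[int], col: List[int], num_cols: int,
               is_sorted: bool = False) -> Tuple[List[int], List[int]]:
    """Convert COO edges (row[i] -> col[i]) to CSC as (colptr, row).

    Hypothetical helper for illustration only. If `is_sorted` is True, the
    caller promises the edges are already ordered by `col`, so the sort
    step is skipped.
    """
    if not is_sorted:
        # Stable sort of edge positions by destination column.
        perm = sorted(range(len(col)), key=col.__getitem__)
        row = [row[i] for i in perm]
        col = [col[i] for i in perm]

    # Count edges per column, then prefix-sum the counts into pointers.
    colptr = [0] * (num_cols + 1)
    for c in col:
        colptr[c + 1] += 1
    for c in range(num_cols):
        colptr[c + 1] += colptr[c]
    return colptr, row


# Unsorted edges: 0->2, 1->0, 2->1, 0->1
colptr, csc_row = coo_to_csc([0, 1, 2, 0], [2, 0, 1, 1], num_cols=3)

# The same edges, pre-sorted by column, take the fast path.
colptr_fast, csc_row_fast = coo_to_csc([1, 2, 0, 0], [0, 1, 1, 2],
                                       num_cols=3, is_sorted=True)
```

If is_sorted is set on edges that are not actually sorted, the column pointers in this sketch would no longer line up with the row array, which is why the flag has to be recorded accurately when the edge index is stored.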

codecov bot commented Jul 6, 2022

Codecov Report

Merging #4922 (0317cbc) into master (31866b5) will increase coverage by 0.00%.
The diff coverage is 95.34%.

@@           Coverage Diff           @@
##           master    #4922   +/-   ##
=======================================
  Coverage   82.71%   82.71%           
=======================================
  Files         330      330           
  Lines       17857    17870   +13     
=======================================
+ Hits        14770    14781   +11     
- Misses       3087     3089    +2     
Impacted Files                         Coverage Δ
torch_geometric/data/data.py           91.40% <88.88%> (-0.16%) ⬇️
torch_geometric/data/hetero_data.py    94.03% <88.88%> (-0.18%) ⬇️
torch_geometric/data/graph_store.py    92.99% <100.00%> (+0.13%) ⬆️

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data

rusty1s (Member) left a comment:

I might be missing information here. Why would we want to allow overriding is_sorted? IMO, one should take care to insert the COO representation correctly in the first place.

mananshah99 changed the title from "GraphStore: respect NeighborLoader is_sorted override" to "GraphStore: Data, HeteroData respect is_sorted" on Jul 6, 2022
mananshah99 (Contributor, Author) commented:

@rusty1s this is now updated to have Data and HeteroData respect is_sorted properly. PTAL

rusty1s (Member) left a comment:

Approving to unblock, but I have some questions in the comments below on handling edge_attr in (Data + HeteroData).

torch_geometric/data/data.py (outdated, resolved)
torch_geometric/data/data.py (outdated, resolved)
@@ -842,6 +842,9 @@ def _put_edge_index(self, edge_index: EdgeTensorType,
attr_val = edge_tensor_type_to_adj_type(edge_attr, edge_index)
setattr(self, attr_name, attr_val)

# Set edge attributes:
setattr(self, f'{attr_name}_edge_attr', edge_attr)

Member:

Also set size of edge_attr (if not specified). WDYT?

Contributor Author:

Not sure I understand: why would it be useful to set the size of EdgeAttr here? We set the size of the Data object if it's provided in edge_attr, and when we get_all_edge_attrs we return this size as well.

Member:

Then I think we need some clarity on what we actually want to save in EdgeAttr - I was under the impression that all its information is useful to maintain. If the EdgeAttr does not define size, we can set it based on self.num_nodes.
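
The fallback suggested in this comment could look roughly like the sketch below. `EdgeAttrSketch` and `with_default_size` are hypothetical stand-ins introduced for illustration; they are not the library's EdgeAttr class or anything added in this PR, and the square `(num_nodes, num_nodes)` default is an assumption that only makes sense for a homogeneous graph.

```python
from dataclasses import dataclass, replace
from typing import Optional, Tuple


@dataclass(frozen=True)
class EdgeAttrSketch:
    # Hypothetical stand-in, not the library's EdgeAttr class.
    layout: str
    is_sorted: bool = False
    size: Optional[Tuple[int, int]] = None


def with_default_size(attr: EdgeAttrSketch, num_nodes: int) -> EdgeAttrSketch:
    """Return `attr` with `size` filled in from `num_nodes` if it is unset."""
    if attr.size is not None:
        # An explicitly provided size always wins.
        return attr
    return replace(attr, size=(num_nodes, num_nodes))


filled = with_default_size(EdgeAttrSketch(layout="coo", is_sorted=True),
                           num_nodes=5)
kept = with_default_size(EdgeAttrSketch(layout="coo", size=(3, 4)),
                         num_nodes=5)
```

Returning a new object rather than mutating the argument keeps the caller's attribute untouched, so the stored copy and the query copy cannot drift apart silently.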

torch_geometric/data/data.py (resolved)
rusty1s (Member) commented Jul 7, 2022

@mananshah99 My main confusion is probably that EdgeAttr defines both the attributes to query in get_edge_index and information that describes the EdgeTensorType. Wouldn't it make more sense to push the size and is_sorted information to EdgeTensorType rather than EdgeAttr?

mananshah99 (Contributor, Author) commented:

@rusty1s that's a fair point. You are right that EdgeAttr currently contains a mix of query information (e.g. edge type, edge layout) and edge-specific information (e.g. is_sorted, size). I don't think it's easy to store this information with EdgeTensorType, as that should just be the returned edge index imo. Open to an alternative refactor here in the future, but for now I think this solution is reasonable.
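
To make the distinction in this exchange concrete, here is one possible reading of it as a sketch: the fields used to look an edge index up are separated from the fields that describe the stored data. `EdgeQuery` and `StoredEdges` are hypothetical names invented for this illustration; neither exists in the library, and this is not a refactor proposed or adopted in the PR.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass(frozen=True)
class EdgeQuery:
    # Hypothetical: only the fields needed to look an edge index up.
    edge_type: Optional[Tuple[str, str, str]]
    layout: str


@dataclass
class StoredEdges:
    # Hypothetical: the edge index plus the properties that describe it.
    row: List[int]
    col: List[int]
    is_sorted: bool = False
    size: Optional[Tuple[int, int]] = None


store: Dict[EdgeQuery, StoredEdges] = {}
key = EdgeQuery(edge_type=("paper", "cites", "paper"), layout="coo")
store[key] = StoredEdges(row=[0, 1], col=[1, 2], is_sorted=True, size=(3, 3))

# A lookup supplies only the query fields; is_sorted and size come back
# together with the stored edges.
found = store[EdgeQuery(edge_type=("paper", "cites", "paper"), layout="coo")]
```

The trade-off raised in the reply still applies: in this layout the returned value is no longer just the edge index, which is the reason given above for keeping is_sorted and size on EdgeAttr for now.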

mananshah99 merged commit fdb1ab0 into master on Jul 7, 2022
mananshah99 deleted the gs_respect_is_sorted branch on July 7, 2022 20:33
4 participants