Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[distGB] graphbolt graph edge's mask will be filled with 0 if these edges have no mask initial #7846

Merged
merged 15 commits into from
Jan 9, 2025

Conversation

CfromBU
Copy link
Collaborator

@CfromBU CfromBU commented Dec 12, 2024

Add a new parameter "padding" to add_edge_attribute, the edge's mask will be filled with padding rather than 0.
When use DistEdgeDataloader the padding will be set to 1 to fully sample the edge with no mask.

@CfromBU
Copy link
Collaborator Author

CfromBU commented Dec 12, 2024

@thvasilo ,this pr can fix the graphbolt's mask issue in graphstorm.

@CfromBU CfromBU requested a review from Rhett-Ying December 12, 2024 09:45
Copy link
Collaborator

@Rhett-Ying Rhett-Ying left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

add dedicated testcases for the bug

Copy link
Contributor

@thvasilo thvasilo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the code looks good now. One suggestion on the naming, padding implies we pad an existing tensor with additional elements until it reaches a specifc length, see https://pytorch.org/docs/stable/generated/torch.nn.functional.pad.html

What we are doing here is filling a tensor with a specifc value, which used to be 0 and is now 1. So I would suggest we change the name of the argument and variables to fill instead padding.

@CfromBU CfromBU requested a review from thvasilo December 30, 2024 01:55
gb_padding : int, optional
The padding value for GraphBolt partitions' new edge_attributes if the attributes in DistGraph are None.
e.g. prob/mask-based sampling.
Only when the mask of one edge is set as 1, the edge will be sampled.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

more details about why we set default value as 1 while set 0 for node counterpart

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have set the gb_padding as 1 in NodeCollator and changed the comment correspondingly.

attr_data = torch.zeros(num_edges, dtype=data_type)

# Padding is used here to fill missing edge attributes (e.g., 'prob' or 'mask') for certain edge types.
# In DGLGraph, some edges may not have attributes or their values could be None.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

not clear enough. we should make it clear that the None edges will be sampled in DGL, not sampled while in GB.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, I have enriched the comment and explained the None edges will be sampled in DGLGraph.

Copy link
Collaborator

@Rhett-Ying Rhett-Ying left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

basically LGTM. please resolve all threads after changes. and always resolve all threads before merge. Make sure the code works well in the case we hit the issue, and unit test passes in your local as CI is down for now.

Copy link
Contributor

@classicsong classicsong left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

gb_padding : int, optional
The padding value for GraphBolt partitions' new edge_attributes.
e.g. some edges of specific types have no mask, the mask will be set as gb_padding.
the edge will not be sampled if the mask is 0.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
the edge will not be sampled if the mask is 0.
An edge will not be sampled if the mask is 0.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK, I have changed the comment.

@Rhett-Ying Rhett-Ying merged commit 17017c2 into dmlc:master Jan 9, 2025
1 of 2 checks passed
@CfromBU
Copy link
Collaborator Author

CfromBU commented Jan 9, 2025 via email

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants