[distGB] graphbolt graph edge's mask will be filled with 0 if these edges have no mask initial #7846
@thvasilo, this PR fixes the GraphBolt mask issue in GraphStorm.
Please add dedicated test cases for the bug.
I think the code looks good now. One suggestion on the naming: padding implies that we pad an existing tensor with additional elements until it reaches a specific length, see https://pytorch.org/docs/stable/generated/torch.nn.functional.pad.html
What we are doing here is filling a tensor with a specific value, which used to be 0 and is now 1. So I would suggest we rename the argument and variables to `fill` instead of `padding`.
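To illustrate the naming distinction raised above, here is a minimal, dependency-free sketch that uses plain Python lists in place of tensors. The helper names `pad` and `fill` are hypothetical and only serve the illustration; they are not functions from DGL or PyTorch.

```python
def pad(values, target_len, pad_value=0):
    """Extend an existing sequence with pad_value until it has target_len items."""
    return list(values) + [pad_value] * max(0, target_len - len(values))


def fill(length, fill_value):
    """Create a new sequence of the given length where every item is fill_value."""
    return [fill_value] * length


# Padding keeps the existing data and appends extra elements.
padded = pad([1, 1, 0], target_len=5)

# Filling creates the whole sequence from a single value, which is
# what happens when an edge type has no mask at all.
filled = fill(5, fill_value=1)
```

Under this reading, the value passed to the PR's new argument behaves like `fill_value`, since no existing mask is being extended.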
gb_padding : int, optional
The padding value for GraphBolt partitions' new edge_attributes if the attributes in DistGraph are None.
e.g. prob/mask-based sampling.
Only when the mask of one edge is set as 1, the edge will be sampled.
Please add more details on why the default value is set to 1 here while it is set to 0 for the node counterpart.
I have set gb_padding to 1 in NodeCollator and updated the comment accordingly.
python/dgl/distributed/dist_graph.py
Outdated
attr_data = torch.zeros(num_edges, dtype=data_type)

# Padding is used here to fill missing edge attributes (e.g., 'prob' or 'mask') for certain edge types.
# In DGLGraph, some edges may not have attributes or their values could be None.
Not clear enough. We should make it clear that edges whose attribute is None will be sampled in DGL, but will not be sampled in GB.
OK, I have expanded the comment to explain that edges whose attribute is None will be sampled in DGLGraph.
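The behavior discussed in this thread can be sketched as follows. This is a simplified illustration rather than the DGL or GraphBolt sampler: `build_mask` and `eligible_edges` are hypothetical helpers, and the only assumption carried over from the discussion is that an edge whose mask value is 0 is never sampled.

```python
def build_mask(num_edges, mask, fill_value):
    """Return the given mask, or one filled with fill_value when the mask is None."""
    if mask is None:
        return [fill_value] * num_edges
    return list(mask)


def eligible_edges(mask):
    """Edge IDs that mask-based sampling may pick (non-zero mask value)."""
    return [eid for eid, m in enumerate(mask) if m != 0]


# An edge type with no mask, materialized with the old fill value 0:
# no edge is eligible, so the type silently drops out of sampling.
with_zero = eligible_edges(build_mask(4, None, fill_value=0))

# The same edge type with fill value 1: every edge stays eligible,
# matching the case where edges without a mask are sampled.
with_one = eligible_edges(build_mask(4, None, fill_value=1))
```

An explicit mask is passed through unchanged, so only edge types without a mask are affected by the fill value.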
Basically LGTM. Please resolve all threads after making changes, and always resolve all threads before merging. Make sure the code works in the case where we hit the issue, and that the unit tests pass locally, since CI is down for now.
LGTM
gb_padding : int, optional
The padding value for GraphBolt partitions' new edge_attributes.
e.g. some edges of specific types have no mask, the mask will be set as gb_padding.
the edge will not be sampled if the mask is 0.
- the edge will not be sampled if the mask is 0.
+ An edge will not be sampled if the mask is 0.
OK, I have changed the comment.
Graph machine learning algorithm library assignment received!
Add a new parameter `padding` to `add_edge_attribute`, so that an edge's mask is filled with `padding` rather than 0 when the edge has no initial mask.
When `DistEdgeDataLoader` is used, `padding` is set to 1 so that edges with no mask are fully sampled.
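As a rough sketch of the change described above, the snippet below assembles one flat attribute list across edge types and fills the segments of types that have no mask with `padding`. It uses plain lists instead of tensors, and `add_edge_attribute_sketch`, the edge type names, and the data layout are assumptions made for illustration; the actual `add_edge_attribute` in `python/dgl/distributed/dist_graph.py` may be structured differently.

```python
def add_edge_attribute_sketch(edges_per_type, masks_per_type, padding=0):
    """Build one flat mask over all edge types.

    Segments for edge types without a mask keep the `padding` value.
    """
    total = sum(edges_per_type.values())
    attr = [padding] * total  # previously this was always filled with 0
    offset = 0
    for etype, count in edges_per_type.items():
        mask = masks_per_type.get(etype)
        if mask is not None:
            attr[offset:offset + count] = mask
        offset += count
    return attr


# Hypothetical graph: "follows" has a mask, "likes" does not.
edges_per_type = {"follows": 3, "likes": 2}
masks_per_type = {"follows": [1, 0, 1]}

old_behavior = add_edge_attribute_sketch(edges_per_type, masks_per_type, padding=0)
new_behavior = add_edge_attribute_sketch(edges_per_type, masks_per_type, padding=1)
```

With `padding=0` the two "likes" edges end up masked out, while `padding=1` leaves them available for sampling and does not alter the explicit "follows" mask.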