I have followed the user guide and converted a PyTorch ET + Kineto trace into a Chakra ET. However, it seems that communication nodes are incorrectly marked as computation nodes. The node type is decided in `pytorch_converter.py`:
```python
if json_node.is_gpu_op():
    if "ncclDevKernel_SendRecv" in json_node.name:
        parent_node = json_node_map[json_node.parent]
        keyword = (
            json_node_map[parent_node.parent].name
            if parent_node.name == "record_param_comms"
            else parent_node.name
        )
        if "send" in keyword:
            return COMM_SEND_NODE
        if "recv" in keyword:
            return COMM_RECV_NODE
    if "ncclKernel" in json_node.name or "ncclDevKernel" in json_node.name:
        return COMM_COLL_NODE
return COMP_NODE
```
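To make the effect concrete, here is a minimal, self-contained sketch that mirrors the branch above. `Node`, `classify`, and the integer values of the constants are hypothetical stand-ins, not Chakra's actual classes; only the control flow is copied from the excerpt. It shows that the same NCCL kernel is classified as a collective when `cat` is set, but falls through to `COMP_NODE` when `cat` is `None`.

```python
from dataclasses import dataclass
from typing import Dict, Optional

# Hypothetical stand-ins for Chakra's node-type constants (values are arbitrary).
COMP_NODE, COMM_SEND_NODE, COMM_RECV_NODE, COMM_COLL_NODE = range(4)


@dataclass
class Node:
    """Hypothetical minimal node with only the fields the excerpt reads."""
    id: int
    name: str
    parent: Optional[int] = None
    cat: Optional[str] = None

    def is_gpu_op(self) -> bool:
        # Same test as is_gpu_op() in pytorch_node.py: a node counts as a
        # GPU op only if it carries a `cat` value.
        return self.cat is not None


def classify(node_map: Dict[int, Node], node: Node) -> int:
    """Mirror of the converter branch quoted above."""
    if node.is_gpu_op():
        if "ncclDevKernel_SendRecv" in node.name:
            parent = node_map[node.parent]
            keyword = (
                node_map[parent.parent].name
                if parent.name == "record_param_comms"
                else parent.name
            )
            if "send" in keyword:
                return COMM_SEND_NODE
            if "recv" in keyword:
                return COMM_RECV_NODE
        if "ncclKernel" in node.name or "ncclDevKernel" in node.name:
            return COMM_COLL_NODE
    return COMP_NODE


# Demo: the same NCCL kernel, once with and once without `cat`.
with_cat = Node(1, "ncclDevKernel_AllReduce_Sum_f32", cat="kernel")
without_cat = Node(2, "ncclDevKernel_AllReduce_Sum_f32")  # `cat` lost in linking
nodes = {n.id: n for n in (with_cat, without_cat)}
print(classify(nodes, with_cat) == COMM_COLL_NODE)  # True
print(classify(nodes, without_cat) == COMP_NODE)    # True
```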
`is_gpu_op()` is defined in `pytorch_node.py` as:

```python
def is_gpu_op(self) -> bool:
    """
    Check if the node is a GPU operator.

    Returns
        bool: True if the node is a GPU operator, False otherwise.
    """
    return self.cat is not None
```
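One way to check whether `cat` really disappears during linking is to look for NCCL kernels without a `cat` field in the linked JSON. The sketch below assumes the linked trace is a JSON object with a top-level `nodes` list whose entries are dicts carrying `name` and, for GPU kernels, `cat`; that layout is my assumption, so the keys may need adjusting to whatever the linker actually writes. `find_uncategorized_nccl` is a hypothetical helper, not part of Chakra.

```python
from typing import Any, Dict, List

# Kernel-name markers taken from the converter excerpt above.
NCCL_MARKERS = ("ncclKernel", "ncclDevKernel")


def find_uncategorized_nccl(trace: Dict[str, Any]) -> List[Dict[str, Any]]:
    """Return NCCL kernel nodes whose `cat` field is missing or null.

    Assumes `trace["nodes"]` is a list of dicts with a "name" key.
    """
    return [
        node
        for node in trace.get("nodes", [])
        if any(marker in node.get("name", "") for marker in NCCL_MARKERS)
        and node.get("cat") is None
    ]
```

Loading the linked file with `json.load` and passing the result in should return an empty list if `cat` survives linking; a non-empty list would confirm that those kernels are being classified as `COMP_NODE`.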
So a node must be a GPU node before it can be classified as a communication node (https://github.com/mlcommons/chakra/blob/main/src/converter/pytorch_converter.py#L341), and `is_gpu_op()` (https://github.com/mlcommons/chakra/blob/main/src/converter/pytorch_node.py#L149) only checks whether the `cat` attribute is set. However, it seems that the `cat` attribute is dropped when the PyTorch ET and Kineto trace are linked, so none of the nodes are recognized as GPU nodes, and consequently all of them are marked as `COMP_NODE`.

I am not sure which part is not behaving as expected, but the logic of `json_node.is_gpu_op()` seems odd to me.
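Purely as an illustration of what a more defensive check might look like (not a proposed patch, and not how Chakra currently behaves), the GPU test could fall back to the kernel name when `cat` is missing. `looks_like_gpu_op` and the marker list are my own assumptions.

```python
from typing import Optional

# Hypothetical markers for kernels that only ever run on the GPU.
GPU_KERNEL_MARKERS = ("ncclKernel", "ncclDevKernel")


def looks_like_gpu_op(name: str, cat: Optional[str]) -> bool:
    """Treat a node as a GPU op if it has a `cat`, or if its name matches
    a known GPU-only kernel even though `cat` was lost."""
    if cat is not None:
        return True
    return any(marker in name for marker in GPU_KERNEL_MARKERS)
```

The trade-off is that a name-based fallback only covers kernels whose names are known in advance; fixing the linker so `cat` is preserved would be the more general solution.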