-
Notifications
You must be signed in to change notification settings - Fork 420
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
UCT/IB/MLX5/DC: introducing dcs_hybrid policy #10138
base: master
Are you sure you want to change the base?
Conversation
src/uct/ib/mlx5/dc/dc_mlx5_ep.h
Outdated
uct_dc_mlx5_dci_pool_init_dci(iface, uct_dc_mlx5_ep_pool_index(ep), | ||
UCT_DC_MLX5_HW_DCI); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
does it mean HW DCS is the first DCI, that is most likely to be used?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
src/uct/ib/mlx5/dc/dc_mlx5_ep.h
Outdated
if (!uct_dc_mlx5_iface_is_hybrid(iface)) { | ||
return UCS_ERR_NO_RESOURCE; | ||
} | ||
|
||
if (ep->dci == UCT_DC_MLX5_HW_DCI) { | ||
return UCS_OK; | ||
} | ||
|
||
if (uct_dc_mlx5_iface_dci_has_tx_resources(iface, UCT_DC_MLX5_HW_DCI)) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can we try to reduce number of branches on the fast path?
Add hw_dci attribute to iface which will be -1 if hybrid was not selected |
@@ -331,6 +331,9 @@ struct uct_dc_mlx5_iface { | |||
uint8_t av_fl_mlid; | |||
|
|||
uint8_t num_dci_channels; | |||
|
|||
/* used in hybrid dcs policy otherwise -1 */ | |||
int8_t hybrid_hw_dci; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
are we limited to 127 DCIs? maybe make it uint8_t?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe it should be uint16_t because I don't want this value to intersect with UCT_DC_MLX5_EP_NO_DCI
which is a (uint8_t) -1
Also the maximum capacity is 512 which is > 256
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ok
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can you create some DC specific gtests for this new mode?
@@ -901,7 +903,7 @@ uct_dc_mlx5_iface_dcis_create(uct_dc_mlx5_iface_t *iface, | |||
|
|||
ucs_array_length(&iface->tx.dcis) = 0; | |||
|
|||
status = uct_dc_mlx5_iface_create_dci(iface, 0, 0); | |||
status = uct_dc_mlx5_iface_create_dci(iface, 0, 0, 1); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why just one channel and not self->tx.num_dci_channels
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It doesn't matter as this dci is only created to query bb_max
and then destroyed (lines 913-914)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ic, probably need to change the name of this func, as it is quite confusing currently
if (uct_dc_mlx5_iface_is_dci_shared(iface) || | ||
uct_dc_mlx5_is_hw_dci(iface, ep->dci)) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
seems these two check are frequently used together. separate func is worth creating
What
Introducing a new dci allocation policy - dcs_hybrid which will behave the same as dcs_quota except using a dedicated HW dci when available instead of waiting for a dci from the pool to be available
Why ?
Utilizing hw resources instead of waiting for another dci, improves latency for certain message sizes
How ?
In case
dci_alloc_or_create
returned 0 - use the dedicated HW dci with a new dci channel.