Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

UCT/IB/MLX5/DC: introducing dcs_hybrid policy #10138

Open
wants to merge 5 commits into
base: master
Choose a base branch
from

Conversation

roiedanino
Copy link
Contributor

@roiedanino roiedanino commented Sep 9, 2024

What

Introducing a new dci allocation policy - dcs_hybrid which will behave the same as dcs_quota except using a dedicated HW dci when available instead of waiting for a dci from the pool to be available

Why ?

Utilizing hw resources instead of waiting for another dci, improves latency for certain message sizes

How ?

In case dci_alloc_or_create returned 0 - use the dedicated HW dci with a new dci channel.

@roiedanino roiedanino self-assigned this Sep 9, 2024
src/uct/ib/mlx5/dc/dc_mlx5_ep.c Outdated Show resolved Hide resolved
src/uct/ib/mlx5/dc/dc_mlx5_ep.h Outdated Show resolved Hide resolved
Comment on lines 373 to 374
uct_dc_mlx5_dci_pool_init_dci(iface, uct_dc_mlx5_ep_pool_index(ep),
UCT_DC_MLX5_HW_DCI);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

does it mean HW DCS is the first DCI, that is most likely to be used?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It doesn't mean that, because in alloc_or_create we are allocating by the length of dcis array:
image

src/uct/ib/mlx5/dc/dc_mlx5_ep.h Outdated Show resolved Hide resolved
src/uct/ib/mlx5/dc/dc_mlx5_ep.h Outdated Show resolved Hide resolved
src/uct/ib/mlx5/dc/dc_mlx5_ep.h Outdated Show resolved Hide resolved
Comment on lines 711 to 719
if (!uct_dc_mlx5_iface_is_hybrid(iface)) {
return UCS_ERR_NO_RESOURCE;
}

if (ep->dci == UCT_DC_MLX5_HW_DCI) {
return UCS_OK;
}

if (uct_dc_mlx5_iface_dci_has_tx_resources(iface, UCT_DC_MLX5_HW_DCI)) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can we try to reduce number of branches on the fast path?

@roiedanino
Copy link
Contributor Author

Add hw_dci attribute to iface which will be -1 if hybrid was not selected

@@ -331,6 +331,9 @@ struct uct_dc_mlx5_iface {
uint8_t av_fl_mlid;

uint8_t num_dci_channels;

/* used in hybrid dcs policy otherwise -1 */
int8_t hybrid_hw_dci;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

are we limited to 127 DCIs? maybe make it uint8_t?

Copy link
Contributor Author

@roiedanino roiedanino Sep 19, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe it should be uint16_t because I don't want this value to intersect with UCT_DC_MLX5_EP_NO_DCI which is a (uint8_t) -1
Also the maximum capacity is 512 which is > 256

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok

Copy link
Contributor

@brminich brminich left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you create some DC specific gtests for this new mode?

@@ -901,7 +903,7 @@ uct_dc_mlx5_iface_dcis_create(uct_dc_mlx5_iface_t *iface,

ucs_array_length(&iface->tx.dcis) = 0;

status = uct_dc_mlx5_iface_create_dci(iface, 0, 0);
status = uct_dc_mlx5_iface_create_dci(iface, 0, 0, 1);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why just one channel and not self->tx.num_dci_channels?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It doesn't matter as this dci is only created to query bb_max and then destroyed (lines 913-914)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ic, probably need to change the name of this func, as it is quite confusing currently

src/uct/ib/mlx5/dc/dc_mlx5_ep.h Show resolved Hide resolved
Comment on lines +725 to +726
if (uct_dc_mlx5_iface_is_dci_shared(iface) ||
uct_dc_mlx5_is_hw_dci(iface, ep->dci)) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

seems these two check are frequently used together. separate func is worth creating

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants