TL/MLX5: various optimizations #1012

samnordmann · 2024-08-21T11:32:29Z

What

This PR contains various optimizations for TL/MLX5/a2a. In order of importance/relevance:

- support rectangular blocks
- other configurations in how we post the WQEs:
  - iterate across nodes before blocks when posting the WQEs
  - reuse dm chunks
  - send blocks by batch
- knomial fan-in for the internode sync
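For readers unfamiliar with the term, a knomial fan-in can be sketched as below. This is a generic illustration of a k-nomial tree rooted at rank 0, not the code in this branch: the function names, the `KN_MAX_RANKS` cap, and the use of plain integers in place of real node handles are all assumptions made for the example.

```c
#include <assert.h>

#define KN_MAX_RANKS 1024 /* hypothetical cap, only for this sketch */

/* Parent of `rank` in a k-nomial tree rooted at 0, or -1 for the root.
 * The parent is obtained by clearing the lowest nonzero base-`radix` digit. */
static int knomial_parent(int rank, int radix)
{
    int dist = 1;

    assert(rank >= 0 && radix >= 2);
    if (rank == 0) {
        return -1;
    }
    while ((rank / dist) % radix == 0) {
        dist *= radix;
    }
    return rank - ((rank / dist) % radix) * dist;
}

/* Writes the children of `rank` into `children` (at most `max_children`)
 * and returns how many were written. */
static int knomial_children(int rank, int size, int radix,
                            int *children, int max_children)
{
    int n = 0;
    int dist, j;

    assert(rank >= 0 && rank < size && radix >= 2 && children != 0);
    for (dist = 1; dist < size; dist *= radix) {
        /* Once a digit of `rank` is nonzero, it has no children higher up. */
        if ((rank / dist) % radix != 0) {
            break;
        }
        for (j = 1; j < radix; j++) {
            int child = rank + j * dist;
            if (child < size && n < max_children) {
                children[n++] = child;
            }
        }
    }
    return n;
}

/* Simulates the fan-in: every rank forwards the number of ranks it has
 * heard from (itself included) to its parent. Returns the root's count. */
static int knomial_fanin_simulate(int size, int radix)
{
    int arrived[KN_MAX_RANKS];
    int r;

    assert(size >= 1 && size <= KN_MAX_RANKS && radix >= 2);
    for (r = 0; r < size; r++) {
        arrived[r] = 1;
    }
    /* Children always have a higher rank than their parent, so walking
     * downwards guarantees a rank is complete before it reports. */
    for (r = size - 1; r > 0; r--) {
        arrived[knomial_parent(r, radix)] += arrived[r];
    }
    return arrived[0];
}
```

With 8 ranks and radix 2, rank 0 waits on ranks 1, 2, and 4, so the root handles 3 incoming notifications instead of the 7 it would receive in a flat fan-in. A larger radix gives a shallower tree at the cost of more notifications per parent.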

We could merge this PR as is or split it into several smaller ones. Either way, this branch points to a working version that can be used as is for performance experiments.
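To make the "nodes before blocks" posting order from the list above more concrete, here is a minimal sketch. It assumes that "nodes before blocks" means a breadth-first walk (for each block index, visit every node before moving to the next block), which is one reading of the commit title "BFS iter, visit nodes before blocks". The `wqe_slot_t` type and both function names are hypothetical, and batching and dm chunk reuse are not modeled.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical record of one posted WQE: destination node and block index. */
typedef struct {
    int node;
    int block;
} wqe_slot_t;

/* Depth-first order: post every block of node 0, then every block of
 * node 1, and so on. `out` must hold n_nodes * n_blocks entries. */
static size_t post_order_depth_first(int n_nodes, int n_blocks,
                                     wqe_slot_t *out)
{
    size_t k = 0;
    int node, block;

    assert(n_nodes >= 0 && n_blocks >= 0 && out != NULL);
    for (node = 0; node < n_nodes; node++) {
        for (block = 0; block < n_blocks; block++) {
            out[k].node  = node;
            out[k].block = block;
            k++;
        }
    }
    return k;
}

/* Breadth-first order: for each block index, visit all nodes before
 * moving on, so consecutive WQEs target different nodes. */
static size_t post_order_breadth_first(int n_nodes, int n_blocks,
                                       wqe_slot_t *out)
{
    size_t k = 0;
    int node, block;

    assert(n_nodes >= 0 && n_blocks >= 0 && out != NULL);
    for (block = 0; block < n_blocks; block++) {
        for (node = 0; node < n_nodes; node++) {
            out[k].node  = node;
            out[k].block = block;
            k++;
        }
    }
    return k;
}
```

Both functions emit the same set of (node, block) pairs; only the order differs. Under the assumption above, the breadth-first variant spreads consecutive posts across destination nodes instead of finishing one node before starting the next.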

TODO:

One important optimization not yet implemented is support for multiple NICs. So far, our algorithm uses only one NIC.

cc @lappazos @x41lakazam

samnordmann added 6 commits August 20, 2024 19:43

TL/MLX5: add WOD prof event

cd2fe2b

TL/MLX5: implement knomial fanin

fc10638

TL/MLX5: add npolls cfg for FANIN
TL/MLX5: knomial fanin
TL/MLX5: add prints and profile events
TL/MLX5: remove debug prints

TL/MLX5: configurable batch_size, serialization, and pollings

d8763fb

tiny bit more robust print blocks dimensions fully working configurable batch_size, serialization, and pollings

TL/MLX5: BFS iter, visit nodes before blocks

32cfff1

clean

TL/MLX5: support rectangular blocks

5c18b85

clean and working
TL/MLX5: add more config for block dimensions
force longer by default

TL/MLX5: clean and fix after rebase

32d0718

lintrunner cleaning

janjust requested review from MamziB and janjust September 19, 2024 15:23
