UCT/CUDA: Update cuda_copy perf estimates for Grace-Hopper #10155

SeyedMir · 2024-09-17T22:51:57Z

What

Update cuda_copy perf estimates for Grace-Hopper

Why ?

The bandwidth and latency values will be different for PCIe versus C2C links that connect CPU and GPU.

How ?

Update the cuda_copy bw config and UCX_CUDA_COPY_BW.

ivankochin · 2024-09-19T06:41:27Z

src/uct/cuda/cuda_copy/cuda_copy_iface.c

+ perf_attr->bandwidth.shared = zcopy ? iface->config.bw.h2d :
+ iface->config.bw.h2d * 0.95;
 } else if ((src_mem_type == UCS_MEMORY_TYPE_CUDA) &&
 (dst_mem_type == UCS_MEMORY_TYPE_HOST)) {
- perf_attr->bandwidth.shared = (zcopy ? 11660.0 : 9320.0) *
- UCS_MBYTE;
+ perf_attr->bandwidth.shared = zcopy ? iface->config.bw.d2h :
+ iface->config.bw.d2h * 0.95;


Why is the bcopy BW lower than the zcopy one? BTW, 11660 * 0.95 is not equal to 9320. Maybe we need to introduce two different env variables, like BCOPY_BW and ZCOPY_BW, to control these values accurately. Or, if we are OK with changing performance in the common case, maybe it is better not to distinguish bcopy/zcopy perf at all and to set one value for both cases?

SeyedMir · 2024-09-19

It's actually not zcopy vs. bcopy; it's zcopy vs. short. Unlike zcopy, put/get short operations invoke cuStreamSynchronize per operation. Therefore, we want to advertise a slightly lower bw for short operations than for zcopy operations in cuda_copy.
I'm not sure what 9320 represents. Why do you want it to be equal to 9320?

ivankochin · 2024-09-20

Thanks for the explanation. I am wondering whether we need to introduce this difference in this way, because any change to the performance estimation without proper performance testing can lead to unforeseen degradation in some cases. So if we want to tune performance for GH systems only, I would like to leave the estimates for other platforms untouched. Or, if we are OK with changing performance on all platforms in this PR, I am wondering whether this 5% difference really matters, or whether we can follow the KISS principle and set the same value for both the zcopy and short cases.

@brminich @yosefe WDYT?

ivankochin · 2024-09-19T06:46:22Z

src/uct/cuda/cuda_copy/cuda_copy_iface.c

+ ucs_offsetof(uct_cuda_copy_iface_config_t, bw.d2h)},
+ {"d2d", "device to device bandwidth",
+ ucs_offsetof(uct_cuda_copy_iface_config_t, bw.d2d)},
+ {"other", "any other src-dest memory types bandwidth",


Minor

Suggested change

{"other", "any other src-dest memory types bandwidth",

{"other", "any other memory types combinations bandwidth",

ivankochin · 2024-09-19T06:47:23Z

src/uct/cuda/cuda_copy/cuda_copy_iface.c

- ucs_offsetof(uct_cuda_copy_iface_config_t, bandwidth), UCS_CONFIG_TYPE_BW},
+ /* TODO: 1. Add separate keys for shared and dedicated bandwidth
+ 2. Remove the "other" key (use pref_loc for managed memory) */
+ {"BW", "h2d:8300MBs,d2h:11660MBs,d2d:320GBs,other:10000MBs",


Is it possible to remove 'other' and define the value for the remaining memory type combinations through a value without a label?

Suggested change

{"BW", "h2d:8300MBs,d2h:11660MBs,d2d:320GBs,other:10000MBs",

{"BW", "10000MBs,h2d:8300MBs,d2h:11660MBs,d2d:320GBs",
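As an illustration of the suggested format, the sketch below parses a comma-separated list in which labeled entries set h2d, d2h, and d2d, and an unlabeled entry sets the value for all other combinations. This is not the UCX config parser: parse_bw, parse_bw_config, and copy_bw_config_t are hypothetical names, only the MBs and GBs suffixes are handled, and treating them as binary multiples is an assumption.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define MBYTE (1024.0 * 1024.0)  /* assumed binary megabyte */
#define GBYTE (1024.0 * MBYTE)   /* assumed binary gigabyte */

/* Hypothetical config struct; all values are in bytes/s */
typedef struct {
    double h2d;
    double d2h;
    double d2d;
    double other;
} copy_bw_config_t;

/* Parse "<number>MBs" or "<number>GBs"; returns 0 on success, -1 on error */
static int parse_bw(const char *s, double *out)
{
    char *end;
    double value = strtod(s, &end);

    if (end == s) {
        return -1; /* no number found */
    }
    if (strcmp(end, "MBs") == 0) {
        *out = value * MBYTE;
    } else if (strcmp(end, "GBs") == 0) {
        *out = value * GBYTE;
    } else {
        return -1; /* unknown or missing unit */
    }
    return 0;
}

/* Parse e.g. "10000MBs,h2d:8300MBs,d2h:11660MBs,d2d:320GBs".
 * An entry without a label sets the 'other' value. */
int parse_bw_config(const char *str, copy_bw_config_t *cfg)
{
    char buf[256];
    char *tok, *next, *colon;
    const char *value;
    double *field;

    assert((str != NULL) && (cfg != NULL));
    if (strlen(str) >= sizeof(buf)) {
        return -1;
    }
    strcpy(buf, str);

    for (tok = buf; tok != NULL; tok = next) {
        next = strchr(tok, ',');
        if (next != NULL) {
            *next++ = '\0'; /* terminate this entry, advance to the next */
        }

        colon = strchr(tok, ':');
        if (colon == NULL) {
            field = &cfg->other; /* no label: default for other types */
            value = tok;
        } else {
            *colon = '\0';
            value  = colon + 1;
            if (strcmp(tok, "h2d") == 0) {
                field = &cfg->h2d;
            } else if (strcmp(tok, "d2h") == 0) {
                field = &cfg->d2h;
            } else if (strcmp(tok, "d2d") == 0) {
                field = &cfg->d2d;
            } else {
                return -1; /* unknown label */
            }
        }

        if (parse_bw(value, field) != 0) {
            return -1;
        }
    }
    return 0;
}
```

In this sketch the unlabeled entry may appear anywhere in the list, and fields that are not mentioned keep whatever value the caller initialized them to.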

SeyedMir requested review from ivankochin and brminich September 17, 2024 22:57

UCT/CUDA: Update cuda_copy perf estimates for Grace-Hopper

f6964ee

SeyedMir force-pushed the cuda-estimate-perf-conf branch from 9f8dc24 to f6964ee Compare September 18, 2024 17:24

ivankochin reviewed Sep 19, 2024


Use no-key shortcut to set 'other' value and update TODO comment

9dc6481
