
[BUG] Significant overhead on running neural network benchmark #1132

Open
aterrel opened this issue Apr 10, 2024 · 0 comments

aterrel commented Apr 10, 2024

Software versions

unknown

Jupyter notebook / Jupyter Lab version

No response

Expected behavior

Rory Mitchell has two implementations of neural network training: one in cunumeric and one in native C++/CUDA (with NCCL). The two should have similar runtimes, with cunumeric running at perhaps 80% of the native version's speed.

Observed behavior

When he trains 10 (small) neural networks with hidden layer sizes (100, 100) in legateboost on 1 GPU, he sees an order-of-magnitude difference in training times:
Cunumeric: 79.447s
Native: 4.833s

Example code or instructions

https://github.com/rapidsai/legate-boost/pull/92
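
For context, here is a minimal sketch (not the code from the linked PR) of the kind of workload being benchmarked: a forward pass through two hidden layers of size 100. It assumes cunumeric's drop-in NumPy API and falls back to NumPy when cunumeric is unavailable; all names and sizes are illustrative.

```python
try:
    import cunumeric as np  # assumption: running under Legate for GPU execution
except ImportError:
    import numpy as np      # NumPy fallback so the sketch runs anywhere

def forward(X, W1, W2, W3):
    """Forward pass through two ReLU hidden layers of size 100."""
    h1 = np.maximum(X @ W1, 0.0)
    h2 = np.maximum(h1 @ W2, 0.0)
    return h2 @ W3

X = np.ones((64, 32))            # illustrative batch: 64 samples, 32 features
W1 = np.ones((32, 100)) * 0.01   # input -> hidden layer 1 (size 100)
W2 = np.ones((100, 100)) * 0.01  # hidden layer 1 -> hidden layer 2 (size 100)
W3 = np.ones((100, 1)) * 0.01    # hidden layer 2 -> scalar output
out = forward(X, W1, W2, W3)
print(out.shape)  # (64, 1)
```

Every array operation in such a loop is dispatched as a Legate task under cunumeric, which is why per-operation launch overhead matters for small networks.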

Stack traceback or browser console output

Slack thread:

Rory Mitchell
10 hours ago
I have two implementations of neural network training, one in cunumeric and one in native C++/CUDA (with nccl). If I train 10 (small) neural networks with hidden layer size (100,100) in legateboost on 1 GPU, on a dataset, I get the following training times:
Cunumeric: 79.447s
Native: 4.833s
In summary, the latency of cunumeric seems to be too high for this kind of application. It's too slow for me to run tests on CI when each run takes this long.

Andy Terrel
7 hours ago
Hi Rory,
Can you please share the code? I would like to make a github issue that will preserve the lineage of the issue better than a slack thread.

Rory Mitchell
7 hours ago
rapidsai/legate-boost#92

Wonchan Lee
1 hour ago
@Rory Mitchell
how hard is it to extract the nn piece and make it runnable for both cupy and cunumeric? this posting is quite timely, as we just started talking about optimizing single GPU execution: https://docs.google.com/document/d/1IGwvwaSi4Dh5vqK7Hq41k9A8gPWnwPict8rLrGOyQq4/edit#heading=h.4yizf0qjzlv7

Wonchan Lee
1 hour ago
this could be a great target for the fast path proposed in the doc

Wonchan Lee
1 hour ago
I'm also curious to see the execution profile if it's readily available. I want to make sure that we're limited by the overhead of launching cunumeric tasks

Rory Mitchell
31 minutes ago
I can probably pull it out for you with some effort.
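
Wonchan's question about whether per-task launch overhead dominates can be probed with a microbenchmark of many tiny elementwise operations, where launch cost rather than compute should dominate. This is a hedged sketch, not a profile from the issue: it assumes cunumeric is installed (falling back to NumPy so it runs anywhere), and the `x.sum()` at the end is there to force completion of the deferred task stream before the clock stops.

```python
import time

try:
    import cunumeric as np  # assumption: measures Legate task-launch cost on GPU
except ImportError:
    import numpy as np      # NumPy fallback so the sketch runs anywhere

x = np.zeros((100, 100))
n = 1000
t0 = time.perf_counter()
for _ in range(n):
    x = x + 1.0  # each tiny op pays the full per-task launch cost
total = float(x.sum())  # force completion before stopping the clock
elapsed = time.perf_counter() - t0
print(f"~{elapsed / n * 1e6:.1f} us per elementwise op")
```

Comparing the reported per-op time under cunumeric against the same loop under CuPy or NumPy would show directly whether launch overhead explains the 79s-vs-5s gap.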
