
How to pretrain on GPU? #13

Open
ghost opened this issue Feb 25, 2021 · 5 comments

@ghost

ghost commented Feb 25, 2021

Hello!

I am trying to pretrain an adapter using the 4_pretrain_adapter.sh script.
I have a GeForce RTX 2080 SUPER installed (~8GB VRAM), with NVIDIA Driver Version: 440.33.01, CUDA Version: 10.2 and tensorflow-gpu 1.15.5.
I set CUDA_VISIBLE_DEVICES to 0 in the 4_pretrain_adapter.sh script since I only have a single GPU.
Pretraining has been running for 12-16 hours now and is just about completing the warmup phase (~10,000 steps).
I noticed that pretraining only uses ~115MB of VRAM, while several CPU threads drive CPU usage up to ~100%.
I started perusing the code for GPU usage options/parameters, but so far I have only found a switch for TPU usage and a comment stipulating that if a TPU is not available, the Estimator (tf.contrib.tpu.TPUEstimator) will fall back to CPU or GPU.
I then looked at the official tensorflow documentation for TPUEstimator but no luck there either.
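
For reference, the wiring I am referring to looks roughly like this (a minimal sketch of the usual BERT-style TPUEstimator setup, not this repo's exact code; the model_fn is a dummy stand-in and all values are placeholders):

```python
import tensorflow as tf  # tensorflow-gpu 1.15

def model_fn(features, labels, mode, params):
    # Dummy stand-in for the real adapter pretraining model_fn,
    # only here to show the TPUEstimator wiring.
    loss = tf.reduce_mean(tf.layers.dense(features["x"], 1))
    train_op = tf.train.AdamOptimizer(1e-4).minimize(
        loss, global_step=tf.train.get_global_step())
    return tf.contrib.tpu.TPUEstimatorSpec(mode=mode, loss=loss, train_op=train_op)

# Placeholder values, not what 4_pretrain_adapter.sh actually passes.
run_config = tf.contrib.tpu.RunConfig(
    cluster=None,  # no TPU cluster available
    model_dir="output_dir",
    save_checkpoints_steps=1000,
    tpu_config=tf.contrib.tpu.TPUConfig(iterations_per_loop=1000))

estimator = tf.contrib.tpu.TPUEstimator(
    use_tpu=False,  # with use_tpu=False the estimator should fall back to CPU/GPU
    model_fn=model_fn,
    config=run_config,
    train_batch_size=32)
```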

As I continue to look into this, I was wondering if you have any tips or advice about running the code locally on a single GPU.

@ai-nikolai
Collaborator

Hi Again.

Thanks for opening this issue. Yes, there were several issues with GPU training back then as well.

Perhaps you need to set CUDA_VISIBLE_DEVICES=1 (also, don't forget to export the value from the bash script, since it needs to be "global", i.e. visible to TensorFlow).

Back then we tried different values, and different things worked for different people. Do try 1, however.

Here is also a potentially useful Stack Overflow thread:

https://stackoverflow.com/questions/39649102/how-do-i-select-which-gpu-to-run-a-job-on
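
For example, following that thread, the device can also be selected from Python before TensorFlow initializes CUDA (a minimal sketch; the "0" below is just an example value):

```python
import os

# Must be set before TensorFlow touches CUDA, i.e. before `import tensorflow`.
# Indexing is zero-based: "0" selects the first physical GPU, "" hides all GPUs.
os.environ["CUDA_VISIBLE_DEVICES"] = "0"

import tensorflow as tf
from tensorflow.python.client import device_lib

# Should list a /device:GPU:0 entry if the GPU is actually visible to TF.
print(device_lib.list_local_devices())
```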

@ai-nikolai
Collaborator

We won't be able to try running it on GPUs ourselves for perhaps another 1 or 2 weeks; if the issue still persists then, we will try it again on our machines and keep you posted.

@ai-nikolai
Collaborator

Also, did I understand the question correctly? You did not manage to run it on the GPU locally?

Or is it rather that you want to speed up the run on the GPU locally?

If it is the latter then I can recommend the following:

  1. Optimise the batch size (find the maximal batch size that fits on your GPU)
  2. There are a couple of tips in this blog post (which also apply to a single GPU): https://towardsdatascience.com/9-tips-for-training-lightning-fast-neural-networks-in-pytorch-8e63a502f565
  3. The most prominent ones:
  • Use mixed precision
  • Use a faster data file format (DataLoaders)
  • Gradient accumulation before the batch update ("this is technically like increasing the batch size without actually doing it"); see the sketch after this list
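
To illustrate the gradient-accumulation point in TF 1.x terms, here is a rough sketch with a plain session loop (not the TPUEstimator code in this repo; the function name and accum_steps=4 are just examples):

```python
import tensorflow as tf  # tensorflow-gpu 1.15 style

def build_accumulation_ops(loss, learning_rate=1e-4, accum_steps=4):
    """Run `accum_op` for accum_steps micro-batches, then run `apply_op` once,
    which behaves like training with a batch accum_steps times larger."""
    optimizer = tf.train.AdamOptimizer(learning_rate)
    tvars = tf.trainable_variables()
    grads_and_vars = [(g, v) for g, v in zip(tf.gradients(loss, tvars), tvars)
                      if g is not None]

    # One non-trainable accumulator per trainable variable.
    accums = [tf.Variable(tf.zeros_like(v.initialized_value()), trainable=False)
              for _, v in grads_and_vars]

    # Add this micro-batch's gradients, pre-divided so the sum becomes an average.
    accum_op = tf.group(*[a.assign_add(g / accum_steps)
                          for a, (g, _) in zip(accums, grads_and_vars)])

    # Apply the averaged gradients, then zero the accumulators for the next cycle.
    apply_grads = optimizer.apply_gradients(
        [(a, v) for a, (_, v) in zip(accums, grads_and_vars)])
    with tf.control_dependencies([apply_grads]):
        apply_op = tf.group(*[a.assign(tf.zeros_like(a)) for a in accums])

    return accum_op, apply_op

# Usage with a plain training loop:
#   accum_op, apply_op = build_accumulation_ops(loss)
#   for step in range(num_steps):
#       sess.run(accum_op, feed_dict=...)   # micro-batch that fits in 8GB VRAM
#       if (step + 1) % 4 == 0:
#           sess.run(apply_op)              # one "big batch" parameter update
```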

@ai-nikolai
Collaborator

Finally,

  1. Pretraining is slow and does take a long time. I can't remember our exact times, but together with the grid-search for fine-tuning it was the slowest part.

@ghost
Author

ghost commented Feb 25, 2021

Hello again Nikolai!

You did understand my question, thank you for making sure.

I tried setting CUDA_VISIBLE_DEVICES=1 but it did not work (with this setting, pretraining does not use the GPU according to nvidia-smi).

Pretraining does use the GPU when I set CUDA_VISIBLE_DEVICES=0, but only about 115MB of video memory is used (which is pretty low for anything involving BERT, in my experience). Meanwhile, CPU usage spikes to ~100% (also not very typical outside of preprocessing).
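
For what it's worth, this is the kind of check I can run to see where ops actually get placed (standard TF 1.x calls, nothing specific to the repo):

```python
import tensorflow as tf  # tensorflow-gpu 1.15

# log_device_placement prints which device each op runs on, so it shows
# whether the matmul below actually lands on /device:GPU:0.
config = tf.ConfigProto(log_device_placement=True)
config.gpu_options.allow_growth = True  # allocate VRAM as needed, not all up front

a = tf.random.normal([1024, 1024])
b = tf.random.normal([1024, 1024])
result = tf.reduce_sum(tf.matmul(a, b))

with tf.Session(config=config) as sess:
    print(sess.run(result))
```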

Is this behavior (low GPU memory usage, high CPU usage) expected during pretraining?
