
How to pretrain on GPU? #13

Open
ghost opened this issue Feb 25, 2021 · 5 comments

@ghost

ghost commented Feb 25, 2021

Hello!

I am trying to pretrain an adapter using the 4_pretrain_adapter.sh script.
I have a GeForce RTX 2080 SUPER installed (~8GB VRAM), with NVIDIA Driver Version: 440.33.01, CUDA Version: 10.2 and tensorflow-gpu 1.15.5.
I set CUDA_VISIBLE_DEVICES to 0 in the 4_pretrain_adapter.sh script since I only have a single GPU.
Pretraining has been running for 12-16 hours now and is just about completing the warmup phase (~10,000 steps).
I noticed that pretraining only uses ~115MB of VRAM, while several CPU threads drive CPU usage up to ~100%.
I started perusing the code for GPU usage options/parameters, but so far I have only found a switch for TPU usage and a comment stipulating that if a TPU is not available, the Estimator (tf.contrib.tpu.TPUEstimator) will fall back to CPU or GPU.
I then looked at the official tensorflow documentation for TPUEstimator but no luck there either.
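
For reference, the wiring I am referring to looks roughly like this (a minimal sketch of the usual BERT-style TPUEstimator setup, not this repo's exact code; the model_fn is a dummy stand-in and all values are placeholders):

```python
import tensorflow as tf  # tensorflow-gpu 1.15

def model_fn(features, labels, mode, params):
    # Dummy stand-in for the real adapter pretraining model_fn,
    # only here to show the TPUEstimator wiring.
    loss = tf.reduce_mean(tf.layers.dense(features["x"], 1))
    train_op = tf.train.AdamOptimizer(1e-4).minimize(
        loss, global_step=tf.train.get_global_step())
    return tf.contrib.tpu.TPUEstimatorSpec(mode=mode, loss=loss, train_op=train_op)

# Placeholder values, not what 4_pretrain_adapter.sh actually passes.
run_config = tf.contrib.tpu.RunConfig(
    cluster=None,  # no TPU cluster available
    model_dir="output_dir",
    save_checkpoints_steps=1000,
    tpu_config=tf.contrib.tpu.TPUConfig(iterations_per_loop=1000))

estimator = tf.contrib.tpu.TPUEstimator(
    use_tpu=False,  # with use_tpu=False the estimator should fall back to CPU/GPU
    model_fn=model_fn,
    config=run_config,
    train_batch_size=32)
```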

As I continue to look into this, I was wondering if you have any tips or advice about running the code locally on a single GPU.

@ai-nikolai
Collaborator

Hi Again.

Thanks for opening this issue. Yes, there were several issues with GPU training back then as well.

Perhaps you need to set CUDA_VISIBLE_DEVICES=1 (also, don't forget to export the value from the bash script, since it needs to be "global", i.e. visible to TensorFlow).

Back then we tried different values, and different things worked for different people. Do try 1, however.

Here is also a potentially useful Stack Overflow thread:

https://stackoverflow.com/questions/39649102/how-do-i-select-which-gpu-to-run-a-job-on
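
For example, following that thread, the device can also be selected from Python before TensorFlow initializes CUDA (a minimal sketch; the "0" below is just an example value):

```python
import os

# Must be set before TensorFlow touches CUDA, i.e. before `import tensorflow`.
# Indexing is zero-based: "0" selects the first physical GPU, "" hides all GPUs.
os.environ["CUDA_VISIBLE_DEVICES"] = "0"

import tensorflow as tf
from tensorflow.python.client import device_lib

# Should list a /device:GPU:0 entry if the GPU is actually visible to TF.
print(device_lib.list_local_devices())
```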

@ai-nikolai
Collaborator

We won't be able to try running it on GPUs ourselves for perhaps another 1 or 2 weeks; if the issue still persists then, we will try it again on our machines and keep you posted.

@ai-nikolai
Collaborator

Also, did I understand the question correctly? You did not manage to run it on the GPU locally?

Or is it rather that you want to speed up the run on the GPU locally?

If it is the latter then I can recommend the following:

  1. Optimise the batch size (find the maximal batch size that fits on your GPU)
  2. There are a couple of tips in this blog post (which also apply to a single GPU): https://towardsdatascience.com/9-tips-for-training-lightning-fast-neural-networks-in-pytorch-8e63a502f565
  3. The most prominent ones:
  • Use mixed precision
  • Use a faster data file format (DataLoaders)
  • Gradient accumulation before the batch update ("this is technically like increasing the batch size without actually doing it"); see the sketch after this list
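
To illustrate the gradient-accumulation point in TF 1.x terms, here is a rough sketch with a plain session loop (not the TPUEstimator code in this repo; the function name and accum_steps=4 are just examples):

```python
import tensorflow as tf  # tensorflow-gpu 1.15 style

def build_accumulation_ops(loss, learning_rate=1e-4, accum_steps=4):
    """Run `accum_op` for accum_steps micro-batches, then run `apply_op` once,
    which behaves like training with a batch accum_steps times larger."""
    optimizer = tf.train.AdamOptimizer(learning_rate)
    tvars = tf.trainable_variables()
    grads_and_vars = [(g, v) for g, v in zip(tf.gradients(loss, tvars), tvars)
                      if g is not None]

    # One non-trainable accumulator per trainable variable.
    accums = [tf.Variable(tf.zeros_like(v.initialized_value()), trainable=False)
              for _, v in grads_and_vars]

    # Add this micro-batch's gradients, pre-divided so the sum becomes an average.
    accum_op = tf.group(*[a.assign_add(g / accum_steps)
                          for a, (g, _) in zip(accums, grads_and_vars)])

    # Apply the averaged gradients, then zero the accumulators for the next cycle.
    apply_grads = optimizer.apply_gradients(
        [(a, v) for a, (_, v) in zip(accums, grads_and_vars)])
    with tf.control_dependencies([apply_grads]):
        apply_op = tf.group(*[a.assign(tf.zeros_like(a)) for a in accums])

    return accum_op, apply_op

# Usage with a plain training loop:
#   accum_op, apply_op = build_accumulation_ops(loss)
#   for step in range(num_steps):
#       sess.run(accum_op, feed_dict=...)   # micro-batch that fits in 8GB VRAM
#       if (step + 1) % 4 == 0:
#           sess.run(apply_op)              # one "big batch" parameter update
```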

@ai-nikolai
Collaborator

Finally,

  1. Pretraining is slow and does take a long time. I can't remember our exact times, but together with the grid-search for fine-tuning it was the slowest part.

@ghost
Author

ghost commented Feb 25, 2021

Hello again Nikolai!

You did understand my question, thank you for making sure.

I tried setting CUDA_VISIBLE_DEVICES=1 but it did not work (with this setting, pretraining does not use the GPU according to nvidia-smi).

Pretraining does use the GPU when I set CUDA_VISIBLE_DEVICES=0, but only about 115MB of video memory is used (which is pretty low for anything involving BERT, in my experience). Meanwhile, CPU usage spikes to ~100% (also not very typical outside of preprocessing).
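
For what it's worth, this is the kind of check I can run to see where ops actually get placed (standard TF 1.x calls, nothing specific to the repo):

```python
import tensorflow as tf  # tensorflow-gpu 1.15

# log_device_placement prints which device each op runs on, so it shows
# whether the matmul below actually lands on /device:GPU:0.
config = tf.ConfigProto(log_device_placement=True)
config.gpu_options.allow_growth = True  # allocate VRAM as needed, not all up front

a = tf.random.normal([1024, 1024])
b = tf.random.normal([1024, 1024])
result = tf.reduce_sum(tf.matmul(a, b))

with tf.Session(config=config) as sess:
    print(sess.run(result))
```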

Is this behavior (low GPU memory usage, high CPU usage) expected during pretraining?
