
Reduce cuFFT memory usage by re-using the memory space between the forward and backward FFT operations #3

Open. David-McKenna wants to merge 1 commit into master.


David-McKenna (Contributor)

Hey Cees,

Not the patch I was intending to upstream next, but I've been having endless issues with OpenMP forcing itself to be serial, so here's a quick win I came across.

cuFFT lets you manage its work-area memory yourself, so instead of letting each FFT plan allocate its own block, we can allocate a single block sized for the larger of the two plans. This saves me around 2 GB of VRAM in my usual configuration. I also ran a test with the setup you've mentioned in the past (20 subbands, I believe?), and it should save ~1 GB in your case, so you could increase nforward or sample even more DMs.

Cheers,
David

This is achieved by sharing one work area between the cuFFT operations (since they are never executed in parallel) and allocating only the memory needed by the larger of the two (though in this case the two sizes should be the same).

References: https://docs.nvidia.com/cuda/cufft/index.html#function-cufftgetsizemany, https://docs.nvidia.com/cuda/cufft/index.html#function-cufftsetautoallocation, and the related functions in section 3.7 of the cuFFT documentation.
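A minimal sketch of the memory arithmetic behind this change, assuming the forward and backward plans always run one after the other (e.g. on the same stream). As I read the cuFFT documentation linked above (this is not taken from the PR's diff), the host-side sequence is: create each handle with `cufftCreate`, call `cufftSetAutoAllocation(plan, 0)` before `cufftMakePlanMany` so cuFFT does not allocate a work area, read the `workSize` each plan reports, `cudaMalloc` one buffer of the largest size, and hand it to both plans with `cufftSetWorkArea`. The Python below only models the sizing; the function names and the 2 GiB figures are hypothetical, not measurements from this PR.

```python
GIB = 1024 ** 3


def workspace_bytes(work_sizes, shared):
    """Device memory needed for cuFFT work areas (illustrative model).

    work_sizes: per-plan work-area sizes in bytes, i.e. the workSize values
        cuFFT reports per plan (hypothetical inputs in this sketch).
    shared: True if every plan reuses one buffer, which is only safe when
        the plans never execute concurrently.
    """
    if not work_sizes:
        return 0
    # Default behaviour: one auto-allocated block per plan.
    # Shared behaviour: a single block big enough for the largest plan.
    return max(work_sizes) if shared else sum(work_sizes)


def savings_bytes(work_sizes):
    """Memory saved by sharing: sum of all sizes minus the largest one."""
    return (workspace_bytes(work_sizes, shared=False)
            - workspace_bytes(work_sizes, shared=True))


# Hypothetical: forward and backward plans each needing 2 GiB of work area.
sizes = [2 * GIB, 2 * GIB]
print(workspace_bytes(sizes, shared=False) // GIB)  # 4
print(workspace_bytes(sizes, shared=True) // GIB)   # 2
print(savings_bytes(sizes) // GIB)                  # 2
```

With two equally sized plans, sharing halves the FFT work-area footprint; in general the saving equals the sum of the smaller work areas.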