Rationalize compute capability arguments in makefiles #3
Removes superfluous {x.y | y > 0} args, adds comments to CC 6+ lines, and removes trailing space on CC 3.0 line. Also uncomments the CC 5.0 line in the win64 and linux makefiles, as the current CUDA 12.6 still supports CC 5.x (Maxwell).
I couldn't quite determine who was responsible for adding the "two numbered" compute capabilities to the makefiles, but the fact that CC 3.5 seemed to "unlock" some functionality that enabled a speedup made me think I shouldn't delete all of the CC x.y lines with y != 0, as I didn't have a good way of checking whether they give speedups. So I don't really want to remove them if the only downside is a slightly larger binary. However, I do want to add CC 5.0; I was just tricked into thinking it wasn't supported by someone else commenting 5.0 out in a distribution. Is that reasonable?
The reason it builds a separate kernel for CC 3.5 instead of using the one for CC 3.0 is documented in the Makefile's comments, albeit very lightly.
The only reason this "unlocks" functionality is because of an #if statement in my_intrinsics.h. Adding more CC x.y arguments would only improve performance if code was written to take advantage of any newer features that they support. That is something I would be interested in seeing, but I have zero experience programming CUDA, so that's something I still need to research. TODO:
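As a rough illustration of how an `#if` can make a CC 3.5 build differ from a CC 3.0 build, here is a minimal sketch. It is not copied from my_intrinsics.h: the helper names `rotl32` and `uses_cc35_path`, the `HOST_DEVICE` macro, and the choice of a funnel shift as the gated feature are all assumptions made for the example. The only parts taken as given are the `__CUDA_ARCH__` macro, which nvcc defines during device compilation (350 for CC 3.5), and the `__funnelshift_l` intrinsic.

```cpp
#include <cstdint>

// Expands to CUDA qualifiers under nvcc and to nothing under a plain C++ compiler.
#ifdef __CUDACC__
#define HOST_DEVICE __host__ __device__
#else
#define HOST_DEVICE
#endif

// Hypothetical helper (an assumption, not the code in my_intrinsics.h):
// rotate a 32-bit word left by n bits.
HOST_DEVICE inline std::uint32_t rotl32(std::uint32_t x, unsigned n) {
#if defined(__CUDA_ARCH__) && (__CUDA_ARCH__ >= 350)
    // Device pass targeting CC 3.5 or newer: use the funnel-shift intrinsic.
    return __funnelshift_l(x, x, n);
#else
    // Host pass, or device pass for an older CC: portable shifts.
    n &= 31u;
    return (x << n) | (x >> ((32u - n) & 31u));
#endif
}

// Reports which branch the preprocessor selected for this compilation pass.
HOST_DEVICE inline bool uses_cc35_path() {
#if defined(__CUDA_ARCH__) && (__CUDA_ARCH__ >= 350)
    return true;
#else
    return false;
#endif
}
```

With a guard like this, a CC 6.1 build and a CC 6.0 build take the same branch, so an extra x.y target only changes the generated code if the compiler itself does something different for it or if a new `#if` threshold is added.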
I wasn't aware of this bit of code before, thank you.
It's unclear if the CUDA compiler also takes advantage of the features of the "minor" CC version behind the scenes. This software is meant to be performance optimized, so even the possibility of a performance improvement outweighs the cost of a slightly larger binary, imo. You're more than welcome to compile a smaller binary with just the CC for the cards you're using, if binary size is that important to you.
I admit, I don't really have anything to back up my statement, except a vague "vibe" (to use a neologism) I get from the comments on the CC1.1 to 5.0 args.
My motivation isn't necessarily binary size, but rather for the makefile to look "pretty". This is entirely irrational, of course. I can compare CC 6.0 with CC 6.1 on my laptop's GPU, so I'll see if it makes any difference.
NVIDIA GeForce GTX 1060 Mobile with a 24 W power limit. This is only anecdotal evidence, of course, but in this instance there was no meaningful performance difference.
The extra {x.y | x >= 6 and y > 0} lines only serve to increase compile times and executable size.
Also, current CUDA still supports CC 5.x, so that should still be included.
I would propose keeping the old CC lines in a commented-out state, with the default set being the one for the current CUDA version.
The old ones still work, as I have tested CUDA 5.0 builds (32 and 64-bit) with CC 1.1 on a Windows XP laptop with R304 drivers and a Tesla GPU.
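The selection rule above can be sketched as a small filter. This is only an illustration: the struct, the function names, and any CC list fed to it are hypothetical, and the assumption that the makefiles pass targets in the `-gencode arch=compute_XY,code=sm_XY` form is mine; only the rule itself (drop x.y where x >= 6 and y > 0) comes from this thread.

```cpp
#include <string>
#include <vector>

// Hypothetical representation of a compute capability such as 6.1.
struct ComputeCapability {
    int major_ver;
    int minor_ver;
};

// True for the lines this thread calls superfluous: CC x.y with x >= 6 and y > 0.
bool is_superfluous(const ComputeCapability& cc) {
    return cc.major_ver >= 6 && cc.minor_ver > 0;
}

// Formats one nvcc -gencode argument, e.g. CC 6.0 -> compute_60 / sm_60.
std::string gencode_flag(const ComputeCapability& cc) {
    const std::string digits =
        std::to_string(cc.major_ver) + std::to_string(cc.minor_ver);
    return "-gencode arch=compute_" + digits + ",code=sm_" + digits;
}

// Keeps every CC that is not superfluous and returns the matching flags in order.
std::vector<std::string> default_gencode_flags(
        const std::vector<ComputeCapability>& all) {
    std::vector<std::string> flags;
    for (const auto& cc : all) {
        if (!is_superfluous(cc)) {
            flags.push_back(gencode_flag(cc));
        }
    }
    return flags;
}
```

Under this rule CC 3.5 is kept, since the filter only applies from major version 6 upward, which matches keeping the separate CC 3.5 kernel discussed earlier.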