AMD: parse the architecture as supplied by gcnArchName #11244
I don't know at all whether this is the correct way to do it. @IMbackK your input would be appreciated.
Yes, this is more correct; the current code misses the arch stepping part. Thus NAK on the change to the gfx8 define.
There's also a snag in this PR regarding gfx90a: it reports 9.1 as major.minor, but its gcnArchName is gfx90a, which this PR won't parse correctly. The same goes for others like gfx90c. So the current code is not correct, but this PR has too many issues to serve as an improvement as is.
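To illustrate the gfx90a snag, here is a minimal sketch of a parser that treats the last two characters of the name as single hexadecimal digits (minor and stepping) and the characters before them as the decimal major version. `GfxVersion`, `hex_value`, and `parse_gfx_name` are hypothetical names for illustration, not code from this PR, and the sketch assumes lowercase names with no feature suffix.

```cpp
#include <optional>
#include <string>

// Hypothetical type for illustration; not part of this PR.
struct GfxVersion {
    int major;
    int minor;
    int stepping;
};

// Value of one lowercase hex digit, or -1 if the character is not one.
static int hex_value(char c) {
    if (c >= '0' && c <= '9') return c - '0';
    if (c >= 'a' && c <= 'f') return c - 'a' + 10;
    return -1;
}

// Parses a bare processor name such as "gfx90a" or "gfx1030".
// Assumption: the last two characters are hex digits (minor, stepping)
// and everything between "gfx" and them is the decimal major version.
std::optional<GfxVersion> parse_gfx_name(const std::string & name) {
    if (name.size() < 6 || name.compare(0, 3, "gfx") != 0) {
        return std::nullopt;
    }
    const std::string body = name.substr(3);

    int major = 0;
    for (char c : body.substr(0, body.size() - 2)) {
        if (c < '0' || c > '9') return std::nullopt;
        major = major * 10 + (c - '0');
    }

    const int minor    = hex_value(body[body.size() - 2]);
    const int stepping = hex_value(body[body.size() - 1]);
    if (minor < 0 || stepping < 0) return std::nullopt;

    return GfxVersion{major, minor, stepping};
}
```

With this reading, `gfx90a` yields major 9, minor 0, stepping 10, whereas a parser that only accepts decimal digits would reject the trailing `a`.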
It appears this returns the full target ID as defined in https://github.com/ROCm/clr/blob/amd-staging/rocclr/device/device.cpp around line 125. The parsing will need to be expanded to strip the xnack status and to handle the addition of generics. If it were possible to retrieve the version stepping directly, that would be preferable to parsing it out of a string. Would the xnack status be of any use here, or can that just be ignored?
xnack can be ignored since we don't use memory allocated with hipMallocManaged. Unless the user recompiles the whole ROCm stack with non-default flags, only gfx942 and gfx90a can end up in xnack+ mode.
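Since the feature settings can be ignored, one way to drop them is to cut the string at the first colon. This is a small sketch assuming the target ID has the form `processor[:feature(+|-)]...`, for example `gfx90a:sramecc+:xnack-`; `strip_target_features` is a hypothetical helper name, not something from this PR.

```cpp
#include <string_view>

// Hypothetical helper: returns the processor part of a target ID and
// discards any ":feature+" / ":feature-" settings that follow it.
// If there is no ':' the input is returned unchanged.
std::string_view strip_target_features(std::string_view target_id) {
    return target_id.substr(0, target_id.find(':'));
}
```

The returned view points into the caller's buffer, so it must not outlive the original string.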
Yeah, they certainly don't make enabling xnack easy. On Linux the kernel module also needs to be patched to prevent it from rejecting the device.
7bd1195 to 468296f
468296f to 9620bce
This will now work with all the IDs AMD has in staging and will gracefully fall back to the old way if it fails. Please let me know if I've missed anything. Would it be better to submit backend changes like this to ggml first?
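As a rough sketch of the parse-then-fall-back idea (not the code in this PR), the function below reads the characters after `gfx` as one hexadecimal number and, if that fails, builds an ID from the major/minor device properties instead. The function name, the `0x100`/`0x10` weighting of the fallback, and the upper bound are assumptions made for illustration. For simplicity, this sketch does not parse `-generic` names and lets them take the fallback path.

```cpp
#include <cstdlib>
#include <string>

// Hypothetical sketch. "gfx90a" becomes 0x90a and "gfx1030" becomes 0x1030.
// Anything that does not parse cleanly returns the fallback ID built from
// the (possibly truncated) major/minor values reported by the runtime.
int arch_id_or_fallback(const std::string & arch_name, int major, int minor) {
    const int fallback = major * 0x100 + minor * 0x10;

    if (arch_name.compare(0, 3, "gfx") != 0) {
        return fallback;
    }

    // Drop the "gfx" prefix and any ":feature" settings such as ":xnack-".
    std::string body = arch_name.substr(3);
    body = body.substr(0, body.find(':'));
    if (body.empty()) {
        return fallback;
    }

    char * end = nullptr;
    const long value = std::strtol(body.c_str(), &end, 16);

    // Reject trailing junk (e.g. "-generic"), non-positive and oversized values.
    if (*end != '\0' || value <= 0 || value > 0xFFFF) {
        return fallback;
    }
    return static_cast<int>(value);
}
```

For a device named `gfx90a:sramecc+:xnack-` that reports 9.1, the parsed ID is `0x90a`, while the fallback alone would give `0x910`.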
The value provided by minor is truncated for AMD; parse the value returned by gcnArchName instead to retrieve an accurate ID. We can also use the common value for GCN4, gfx800, to avoid missing compatible devices.
9620bce to f77ea24
The value provided by minor is truncated for AMD, so parse the value returned by gcnArchName for an accurate ID.
We can also use the common value for GCN4, gfx800, to avoid missing compatible devices.
This is a follow-up to #11209 and will change the behavior of CDNA3, CDNA, VEGA and GCN4 as they should now be recognized as expected. Of those I only have access to a GCN4 device for testing.