ROCm gfx10 and gfx11 not supported #1429
How did you determine the affected versions? It looks like they still use wavefronts of 64 threads, so there might be something going wrong with the macro we use to determine whether we are compiling for CUDA or ROCm. |
Simple. Ran the build like 6 times and added whichever ones I saw were failing in the log to the gpu targets exclude list in my package. |
PR: #261155 |
Can reproduce this outside nixos |
First observations: This is the failing command
Removing |
You mean here? Line 201 in 2f3720f
I already replace it with
# `--amdgpu-target` is deprecated
substituteInPlace cmake/hip.cmake \
  --replace "--amdgpu-target" "--offload-arch" |
Removing |
Probable fix: https://llvm.org/docs/AMDGPUUsage.html#target-features use |
I'm getting this error when prepending
In file included from /build/source/include/ginkgo/core/base/exception.hpp:41:
ginkgo-hpc-hip> In file included from /build/source/include/ginkgo/core/base/types.hpp:50:
ginkgo-hpc-hip> /nix/store/7y1f77gd62zy52b52ivapa4inkjcb9mq-clr-5.7.0/include/hip/hip_runtime.h:41:2: error: HIP is not supported on the specified GPU ARCH with wavefront size 64
ginkgo-hpc-hip> #error HIP is not supported on the specified GPU ARCH with wavefront size 64
ginkgo-hpc-hip> ^
ginkgo-hpc-hip> [ 20%] Building CXX object reference/CMakeFiles/ginkgo_reference.dir/solver/cb_gmres_kernels.cpp.o
ginkgo-hpc-hip> 1 warning and 1 error generated when compiling for gfx1010.
ginkgo-hpc-hip> CMake Error at ginkgo_hip_generated_residual_norm_kernels.hip.cpp.o.cmake:200 (message):
ginkgo-hpc-hip> Error generating file
ginkgo-hpc-hip> /build/source/build/hip/CMakeFiles/ginkgo_hip.dir/stop/./ginkgo_hip_generated_residual_norm_kernels.hip.cpp.o
ginkgo-hpc-hip>
ginkgo-hpc-hip> make[2]: *** [hip/CMakeFiles/ginkgo_hip.dir/build.make:399: hip/CMakeFiles/ginkgo_hip.dir/stop/ginkgo_hip_generated_residual_norm_kernels.hip.cpp.o] Error 1
ginkgo-hpc-hip> make[2]: *** Waiting for unfinished jobs....
ginkgo-hpc-hip> [ 21%] Building CXX object reference/CMakeFiles/ginkgo_reference.dir/solver/common_gmres_kernels.cpp.o
#if __HIP_DEVICE_COMPILE__ && !__GFX8__ && !__GFX9__ && __AMDGCN_WAVEFRONT_SIZE == 64
#error HIP is not supported on the specified GPU ARCH with wavefront size 64
#endif |
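For context on the guard quoted above: it rejects device compilation for targets outside gfx8/gfx9 when `__AMDGCN_WAVEFRONT_SIZE` is 64. The following is a minimal sketch, not Ginkgo's actual code, of how a project could key its subgroup-dependent instantiations on that same macro. The names `example_wavefront_size` and `block_size_supported` are hypothetical, and the host-side fallback of 32 is an assumption made only so the snippet compiles without a HIP toolchain.

```cpp
#include <cassert>

// Use the compiler-provided wavefront size when compiling AMD device code.
// The fallback value of 32 is an assumption for host-only builds.
#if defined(__AMDGCN_WAVEFRONT_SIZE)
constexpr int example_wavefront_size = __AMDGCN_WAVEFRONT_SIZE;
#else
constexpr int example_wavefront_size = 32;
#endif

static_assert(example_wavefront_size == 32 || example_wavefront_size == 64,
              "unexpected wavefront size");

// Hypothetical helper: true if a kernel variant that needs `block_size`
// cooperating lanes fits into a single wavefront of `wavefront_size` lanes.
inline bool block_size_supported(int block_size,
                                 int wavefront_size = example_wavefront_size)
{
    assert(wavefront_size == 32 || wavefront_size == 64);
    return block_size > 0 && block_size <= wavefront_size;
}
```

With a wavefront size of 32, `block_size_supported(64, 32)` is false, so a 64-wide kernel variant would be skipped rather than instantiated.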
Thanks for the report, that's disappointing. It should be possible to compile Ginkgo with wavefront size 32, but unfortunately we don't have any gfx10/11 GPUs available right now to test this, and I'm not comfortable claiming support for it without checking that the tests run correctly. Some dependencies (rocBLAS, rocSPARSE, rocFFT) might also not work on them? Though I haven't checked yet. |
If you can give me a patch set I can give you at least a preliminary indicator of whether it works or not. |
I just noticed we are doing this almost correctly already. As a first check, you could try
and check which tests run correctly. I only see a handful that will definitely fail. |
Looks like that fixed pretty much everything. |
Note that this is not a complete fix; it is intended only as a workaround for consumer-grade GPUs. Though I'm not sure if we actually do need Jacobi blocks larger than 32. |
ginkgo-test.log |
That's how it looks when no kernels are being run in CUDA/we compiled for the wrong architecture. We don't seem to be catching errors of that kind explicitly. I guess it's safe to say that for now, we can only support server-grade GPUs (Radeon VII and Instinct series) |
Could you also try appending the following change in
warpSize to 32 might not be necessary because AMD should use 32 already for those archs. |
@yhmtsai that likely won't make a difference, since uint32 and uint64 are inter-convertible. The use of warpSize is why this issue came up in the first place, otherwise we wouldn't be trying to instantiate the 64 blocksize Jacobi kernels. |
The voting function considers all bits of the mask type. |
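To make the mask-width point concrete, here is a host-only sketch under stated assumptions. It assumes a ballot result from a 32-lane wavefront is stored in a 64-bit integer, so the upper 32 bits are zero after widening. `full_lane_mask` and `all_lanes_set` are hypothetical helpers written for illustration; they are not HIP or Ginkgo functions.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: mask with the lowest `width` bits set,
// one bit per lane of a subgroup with `width` lanes.
constexpr std::uint64_t full_lane_mask(int width)
{
    return width >= 64 ? ~std::uint64_t{0}
                       : (std::uint64_t{1} << width) - 1;
}

// Hypothetical stand-in for an "all lanes voted true" check.
// Only the bits belonging to real lanes are compared.
inline bool all_lanes_set(std::uint64_t ballot, int width)
{
    assert(width == 32 || width == 64);
    return (ballot & full_lane_mask(width)) == full_lane_mask(width);
}
```

A 32-bit result of `0xFFFFFFFF` converts to a 64-bit value without loss, but it only passes the check when the expected width is 32. Compared against a 64-lane mask, the same value looks like half the lanes voted false, which is why the mask type and the assumed subgroup width have to agree.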
The result of |
Yes, but it only works for checking |
I think we're safe there:
but you are right, many of the failing tests use either cooperative groups or rocBLAS. |
Sorry for the late reply. |
That is encouraging. Based on the test output, I would be so bold as to put the blame for all non-MPI failures on rocBLAS: we use it for our GEMM operations, which in turn are used in all tests for the system matrix. Can you try running the rocBLAS test suite? |
I assume you're talking about |
I'll also have a gfx1102 available soon, so if this is too much effort, I can also take over ;) |
I need to rearrange a lot of the rocm derivations to use a layout more like ginkgo in the linked PR. |
I would just build rocBLAS separately outside of nix, since I would assume if it's an issue, it's less likely to be a configuration issue, and more likely AMD not testing rocBLAS extensively on all of their GPUs. |
While likely not a nix issue, I'm not going to discount its possibility. I can probably spin up a VM or something and do some A-B testing with each having a rocBLAS installation to make sure.
I've packaged most of their software, and I can tell you that's very likely. |
Looks like only gfx9 and below is supported here ATM.
gfx1010, gfx1012, gfx1030, gfx1100, gfx1101, and gfx1102 are affected.
Here's the error log. I'll link you to the build environment once I submit a PR to nixpkgs.