The purpose of the nvptx-tools is to test a gcc toolchain generating code for a single-threaded ptx interpreter. The interpreter is provided by the cuda platform. Part of this platform is ptxas.
This is no ordinary assembler, but one with different optimization levels (-O0 ... -O4), different code generators (-ori/-noori), and the task of inserting the missing instructions and annotations that handle the convergence stack (ssy, .s postfix). Consequently, ptxas can be less stable than you'd like an assembler to be.
nvptx-as.c calls ptxas with -O0. The purpose here is to do a minimal verification of the ptx's validity. The output of ptxas is thrown away (so there's no great value in spending time optimizing the generated code), and the ptx is compiled again in nvptx-run.c (there using the cuda runtime functions rather than ptxas). The default optimization setting in nvptx-run.c, though, is -O4. Presumably the intention is to achieve the fastest execution possible.
My observation is that while ptxas sigsegvs (and equivalent failures in nvptx-run) are of interest to nvidia, they are usually uninteresting from the point of view of gcc code generation.
Typically, when encountering such a sigsegv in the gcc test suite, we manually:
try different ptxas optimization levels
try different code generators
try different cuda versions
and if we find that the sigsegv goes away when changing any of those parameters, we conclude it's a cuda bug, and move on (perhaps by xfailing the testcase or some such).
I wonder if it makes sense to automate this:
let nvptx-as.c first try -O0, then -O0 -ori
let nvptx-run.c first try -O4, then -O4 -ori, then -O3, then -O3 -ori, etc. (or some such)
(assuming we can achieve the -ori equivalent using CUjit_option CU_JIT_NEW_SM3X_OPT)
This would reduce testsuite noise, save time, and reduce the number of xfails (and xpasses when running with different ptxas flags or cuda versions).
This could be implemented as the new default behaviour, or we could add a --fallback switch to nvptx-as.c and nvptx-run.c.
Eventually we could warn against using cuda versions which are known to be buggy to the point that this fallback scenario doesn't help.