The purpose of the nvptx-tools is to test a gcc toolchain generating code for a single-threaded ptx interpreter. The interpreter is provided by the cuda platform. Part of this platform is ptxas.
This is no ordinary assembler, but one with different optimization levels (-O0 ... -O4), different code generators (-ori/-noori), and the task of inserting the missing instructions and annotations that handle the convergence stack (ssy, .s postfix). Consequently, ptxas can be less stable than you'd like an assembler to be.
nvptx-as.c calls ptxas with -O0. The purpose here is to do a minimal verification of the ptx's validity. The output of ptxas is thrown away (so there's no great value in spending time optimizing the generated code), and the ptx is compiled again in nvptx-run.c (there using the cuda runtime functions rather than ptxas). The default optimization setting in nvptx-run.c, though, is -O4. Presumably the intention is to achieve the fastest execution possible.
My observation is that while ptxas sigsegvs (and equivalent failures in nvptx-run) are of interest to nvidia, they are usually uninteresting from the point of view of gcc code generation.
Typically, when encountering such a sigsegv in the gcc test suite, we manually:
try different ptxas optimization levels
try different code generators
try different cuda versions
and if we find that the sigsegv goes away when changing any of those parameters, we conclude it's a cuda bug, and move on (perhaps by xfailing the testcase or some such).
I wonder if it makes sense to automate this:
let nvptx-as.c first try -O0, then -O0 -ori
let nvptx-run.c first try -O4, then -O4 -ori, then -O3, then -O3 -ori, etc. (or some such)
(assuming we can achieve the -ori equivalent using CUjit_option CU_JIT_NEW_SM3X_OPT)
This would reduce testsuite noise, save time, and reduce the number of xfails (and xpasses when running with different ptxas flags or cuda versions).
This could be implemented as the new default behaviour, or we could add a --fallback switch to nvptx-as.c and nvptx-run.c.
Eventually we could warn against using cuda versions which are known to be buggy to the point that this fallback scenario doesn't help.