@mmason-nvidia
Contributor

When using the CUDA Toolkit release 13.1 or later for debug builds, we need to pass the -numba-debug flag to libnvvm in order to enable enhanced debug information.

Closes #679

@copy-pr-bot

copy-pr-bot bot commented Dec 19, 2025

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

@greptile-apps
Contributor

greptile-apps bot commented Dec 19, 2025

Greptile Summary

Adds support for passing the -numba-debug flag to libnvvm when compiling with debugging enabled (g option) and using CUDA Toolkit 13.1+.

Major changes:

  • Implements get_cuda_version() method that detects CUDA version by compiling a dummy IR program and analyzing the PTX output using cuda.bindings.utils APIs
  • Adds version check in CompilationUnit.__init__() that automatically adds -numba-debug flag for CUDA 13.1+ or when version cannot be determined
  • Uses caching mechanism to avoid repeated version detection calls

Issues found:

  • Error handling in get_cuda_version() at lines 283-285 re-raises exceptions, breaking the graceful fallback to None that the rest of the code expects

Confidence Score: 4/5

  • Safe to merge after fixing the exception handling bug
  • The implementation correctly addresses the requirement to pass -numba-debug for CUDA 13.1+. The version detection approach is reasonable given the lack of direct API. However, there is a critical logic error in exception handling (lines 283-285) that breaks the intended fallback behavior. Once fixed, the change is safe.
  • The error handling in get_cuda_version() needs to be fixed to properly fall back to None on exceptions

Important Files Changed

| Filename | Overview |
|---|---|
| numba_cuda/numba/cuda/cudadrv/nvvm.py | Adds CUDA version detection via PTX compilation and conditionally passes -numba-debug flag for CUDA 13.1+. Implementation is mostly sound but has an error handling issue. |

…bnvvm.

As suggested by leofang, the original approach of determining whether the -numba-debug flag should be used is unreliable. Instead, use the pattern from one of the tests: compile a test program and examine its PTX output version to deduce the CUDA Toolkit version. This result is cached in the NVVM singleton.
Contributor

@greptile-apps greptile-apps bot left a comment

Additional Comments (3)

  1. numba_cuda/numba/cuda/cudadrv/nvvm.py, line 254-255 (link)

    syntax: Incorrect type for options parameter. nvvmVerifyProgram expects POINTER(c_char_p) but receives a Python list.

  2. numba_cuda/numba/cuda/cudadrv/nvvm.py, line 257 (link)

    syntax: Same issue: incorrect type for options parameter. Must use option_ptrs instead.

  3. numba_cuda/numba/cuda/cudadrv/nvvm.py, line 274-275 (link)

    logic: Unconditional destruction of potentially uninitialized program handle. If nvvmCreateProgram fails at line 239, program remains an empty c_void_p() and destroying it could cause issues.
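
A minimal sketch of the conversion that comments 1 and 2 ask for, assuming the options start out as a Python list of str. The helper name `make_option_ptrs` and the sample option strings are illustrative only and are not taken from the PR:

```python
from ctypes import POINTER, c_char_p, cast


def make_option_ptrs(options):
    """Convert a list of str options into a C array of char* (hypothetical helper)."""
    # c_char_p values must be bytes, not str, so encode each option first.
    encoded = [opt.encode("utf-8") for opt in options]
    # (c_char_p * n) builds a C array type; instantiating it stores one
    # pointer per option and keeps the bytes objects alive.
    array = (c_char_p * len(encoded))(*encoded)
    return array, len(encoded)


# Illustrative option strings only.
option_ptrs, num_options = make_option_ptrs(["-g", "-opt=0"])
```

ctypes accepts such an array wherever a function's `argtypes` declares `POINTER(c_char_p)`, so `option_ptrs` can be passed in place of the raw Python list.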

1 file reviewed, 3 comments

Contributor

@greptile-apps greptile-apps bot left a comment

Additional Comments (3)

  1. numba_cuda/numba/cuda/cudadrv/nvvm.py, line 262 (link)

    syntax: use c_size_t() instead of c_int() to match function signature

  2. numba_cuda/numba/cuda/cudadrv/nvvm.py, line 255-256 (link)

    syntax: options must be encoded to bytes for ctypes

  3. numba_cuda/numba/cuda/cudadrv/nvvm.py, line 277-279 (link)

    logic: Calling check_error in a finally block can raise an exception during error handling, masking the original error. Wrap the call in try/except, or check the error without raising.
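
The cleanup concern in comment 3, together with the earlier note about destroying a handle that was never created, can be sketched as follows. The three callables are hypothetical stand-ins for the create, compile, and destroy steps and are not the actual libnvvm bindings:

```python
def compile_with_cleanup(create, compile_fn, destroy):
    """Run compile_fn(handle) and always attempt destroy(handle) afterwards.

    All three callables are hypothetical stand-ins for the real bindings.
    """
    # Create outside the try block: if creation fails there is no handle
    # to destroy, so the finally clause is never reached.
    handle = create()
    try:
        return compile_fn(handle)
    finally:
        # Deliberately ignore the status code. Raising here would replace
        # any exception already propagating from compile_fn.
        destroy(handle)
```

With this shape, a compile failure reaches the caller unchanged even when the destroy step also reports an error.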

1 file reviewed, 3 comments

Contributor

@greptile-apps greptile-apps bot left a comment

Additional Comments (1)

  1. numba_cuda/numba/cuda/cudadrv/nvvm.py, line 283-285 (link)

    logic: re-raising the exception defeats the graceful fallback logic. If test program compilation fails, _libnvvm_cuda_version stays None but the exception propagates, preventing the caller from using the fallback behavior. Consider removing the raise to allow silent failure and return None.

1 file reviewed, 1 comment

@gmarkall
Contributor

gmarkall commented Jan 6, 2026

/ok to test 6764681

Comment on lines +290 to +293
try:
    self.check_error(err, "Failed to destroy test program.")
except Exception:
    pass
Contributor

I think there's no point checking the error if we're going to swallow the exception the check will raise anyway.

Suggested change
-try:
-    self.check_error(err, "Failed to destroy test program.")
-except Exception:
-    pass

    err = self.nvvmGetCompiledResult(program, ptx_data)
    self.check_error(err, "Failed to get test program compiled result.")
except Exception as exception:
    print(f"Exception compiling test program: {exception}")
Contributor

This should probably be a warning rather than a print:

Suggested change
-print(f"Exception compiling test program: {exception}")
+warnings.warn(
+    f"Exception compiling test program: {exception}",
+    category=NvvmWarning
+)
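
A small self-contained sketch of the warning-based approach, assuming that `NvvmWarning` is a `Warning` subclass. The class below is a local stand-in, and `detect_version_or_none` and its `probe` argument are hypothetical names used only for illustration:

```python
import warnings


class NvvmWarning(Warning):
    """Local stand-in for the project's warning category (an assumption)."""


def detect_version_or_none(probe):
    """Call probe() and return its result, or None with a warning on failure."""
    try:
        return probe()
    except Exception as exception:
        # A warning can be filtered, logged, or turned into an error by the
        # caller, which a bare print() does not allow.
        warnings.warn(
            f"Exception compiling test program: {exception}",
            category=NvvmWarning,
        )
        return None
```

Callers that want failures to be fatal can opt in with `warnings.simplefilter("error", NvvmWarning)`, while everyone else gets the `None` fallback.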

    self.check_error(err, "Failed to get test program compiled result.")
except Exception as exception:
    print(f"Exception compiling test program: {exception}")
    raise exception
Contributor

I don't think we should re-raise the exception, just let it pass - otherwise I'd expect it to propagate all the way back to the user, which we may not want.

Suggested change
-raise exception

    self._libnvvm_cuda_version = (
        get_minimal_required_cuda_ver_from_ptx_ver(ptx_version)
    )
except Exception:
Contributor

It seems that the only likely exception we'd expect from the PTX version functions is a ValueError - anything else is a bit more surprising so we should let it manifest to expose the underlying bug instead:

Suggested change
-except Exception:
+except ValueError:
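
To illustrate why the narrower clause matters, here is a sketch using a hypothetical parser for the PTX `.version` directive. `parse_ptx_version` and `ptx_version_or_none` are illustrative stand-ins; they are not the cuda.bindings.utils helpers, whose exact behaviour is not shown in this thread:

```python
import re


def parse_ptx_version(ptx):
    """Return (major, minor) from a PTX '.version X.Y' line (hypothetical helper)."""
    match = re.search(r"^\s*\.version\s+(\d+)\.(\d+)", ptx, flags=re.MULTILINE)
    if match is None:
        raise ValueError("no .version directive found in PTX")
    return int(match.group(1)), int(match.group(2))


def ptx_version_or_none(ptx):
    """Fall back to None only for the expected failure mode."""
    try:
        return parse_ptx_version(ptx)
    except ValueError:
        # Unparseable PTX is an anticipated condition. Any other exception
        # (for example a TypeError from a wrong argument) still propagates
        # and exposes the underlying bug.
        return None
```

Passing something that is not a string raises `TypeError` from `re.search`, and that error is not swallowed by the `except ValueError` clause.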

# pass in the -numba-debug flag.
if "g" in options:
    ctk_version = self.driver.get_cuda_version()
    if ctk_version is None or ctk_version >= (13, 1):
Contributor

If we couldn't determine the CTK version, that could be because the version is 12.x, since the necessary PTX version functions weren't present in the CUDA bindings for 12.x. So I think it'd be safer not to pass the -numba-debug flag if we can't determine the version:

Suggested change
-if ctk_version is None or ctk_version >= (13, 1):
+if ctk_version is not None and ctk_version >= (13, 1):
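
The suggested condition can be checked in isolation with a small sketch. The function name `should_pass_numba_debug` is hypothetical, and the only assumptions about the inputs are that `options` supports `"g" in options` (as in the diff) and that the version is a `(major, minor)` tuple or `None`:

```python
def should_pass_numba_debug(options, ctk_version):
    """Decide whether to add -numba-debug (hypothetical helper).

    options: any container supporting `"g" in options`.
    ctk_version: a (major, minor) tuple, or None when detection failed.
    """
    if "g" not in options:
        return False
    # Unknown version: be conservative and skip the flag, since the
    # toolkit may be a 12.x release that does not accept it.
    if ctk_version is None:
        return False
    # Tuples compare element by element, so (13, 0) < (13, 1) < (14, 0).
    return ctk_version >= (13, 1)
```

The explicit `None` check also avoids comparing `None` with a tuple, which would raise `TypeError` in Python 3.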

Contributor

@gmarkall gmarkall left a comment

I made some comments on the diff, but I also saw that the CI is failing.

If you set up commit signing (so that your commits show as "Verified" rather than "Unverified") you should be able to trigger the CI yourself by commenting /ok to test as well.

Contributor

@greptile-apps greptile-apps bot left a comment

1 file reviewed, 1 comment

Comment on lines +283 to +285
except Exception as exception:
    print(f"Exception compiling test program: {exception}")
    raise exception
Contributor

logic: Catching the exception, printing it, and re-raising defeats the purpose of the graceful fallback. The code at lines 299-305 expects exceptions to be silently caught, allowing _libnvvm_cuda_version to remain None. This re-raise prevents the function from returning None on error.

Suggested change
-except Exception as exception:
-    print(f"Exception compiling test program: {exception}")
-    raise exception
+except Exception:
+    pass

Labels

4 - Waiting on author Waiting for author to respond to review

Development

Successfully merging this pull request may close these issues.

[FEA] Pass -numba-debug flag to libnvvm for debug builds using the 13.1 or later release of the CUDA Toolkit

3 participants