@mmason-nvidia
Contributor

The linker code was passing in -lto to linker invocations that did not involve LTOIR code, and not passing it in some cases where LTOIR code was being linked. When enabling debugging of a Numba CUDA kernel which calls into LTOIR code, an exception was being raised by nvjitlink.

This change corrects that behavior, only passing in -lto for cases where at least one LTOIR code object is in the link list. The lto= parameter to the Linker initialization is still used to control compilation of .cu code with LTO enabled (which will result in the self._has_ltoir flag being set).

A test case that validates this change and guards against regressions is included.

Closes #696
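
The flag logic described in this PR can be sketched in isolation. This is a minimal illustration, not the numba-cuda implementation: `LinkerSketch`, `add_ptx`, and `linker_flags` are hypothetical names invented here, while `_has_ltoir`, `add_cu`, `add_ltoir`, and the `lto=` parameter mirror the names used in the description.

```python
class LinkerSketch:
    """Toy stand-in for a linker wrapper; not the numba-cuda Linker."""

    def __init__(self, lto=False):
        self.lto = lto           # controls how .cu sources are compiled
        self._has_ltoir = False  # set once any LTOIR input joins the link
        self._inputs = []

    def add_cu(self, source):
        # Compiling .cu with LTO enabled produces LTOIR, so record that.
        kind = "ltoir" if self.lto else "ptx"
        self._inputs.append((kind, source))
        if self.lto:
            self._has_ltoir = True

    def add_ltoir(self, blob):
        # An explicit LTOIR input always sets the flag.
        self._inputs.append(("ltoir", blob))
        self._has_ltoir = True

    def add_ptx(self, ptx):
        # Hypothetical non-LTOIR input; leaves the flag untouched.
        self._inputs.append(("ptx", ptx))

    def linker_flags(self):
        # Pass -lto only when at least one LTOIR input is present.
        return ["-lto"] if self._has_ltoir else []
```

With this shape, constructing the linker with `lto=True` and adding only PTX yields no `-lto` flag, whereas adding a single LTOIR object enables it regardless of the `lto=` argument.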

@copy-pr-bot

copy-pr-bot bot commented Jan 7, 2026

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

@greptile-apps
Contributor

greptile-apps bot commented Jan 7, 2026

Greptile Overview

Greptile Summary

This PR fixes a critical bug where the -lto linker flag was incorrectly passed to nvjitlink when debugging kernels, causing ERROR_LTO_NOT_ENABLED exceptions. The fix introduces a _has_ltoir flag to track when LTOIR code is actually present and conditionally sets linker options based on this tracking.

Changes Overview

driver.py:

  • Removed premature LinkerOptions initialization from __init__ that unconditionally set link_time_optimization=lto
  • Added _has_ltoir flag initialized to False to track actual LTOIR code presence
  • Created _get_linker_options() method that conditionally sets link_time_optimization and ptx flags based on _has_ltoir
  • Updated add_cu() to set _has_ltoir=True when compiling with self.lto=True
  • Updated add_ltoir() to always set _has_ltoir=True
  • Both get_linked_ptx() and complete() now use _get_linker_options() to generate appropriate options

test_linker.py:

  • Adds new test test_debug_kernel_with_lto() to verify debugging works with LTOIR-linked code
  • Introduces debuggable_kernel that uses add_from_numba device function from LTOIR
  • Relies on NUMBA_CUDA_TEST_BIN_DIR environment variable to locate test LTOIR file

Issues Found

Critical: The test code has a NameError bug - test_device_functions_ltoir is only defined when TEST_BIN_DIR is set, but is used unconditionally at module level, causing import failures in environments without this variable set.

Incomplete Fix: As noted in previous review threads, the _has_ltoir flag is only set in add_cu() (when lto=True) and add_ltoir(), but not in add_fatbin(), add_object(), or add_library(). These methods can also contain LTOIR/NVVM code (as evidenced by the inspect_obj_content() checks in add_file_guess_ext()), which means the same bug could occur when linking with such files.

Confidence Score: 2/5

  • This PR has a critical bug in the test code that will cause a NameError on import when TEST_BIN_DIR is not set
  • While the core fix for LTOIR linker flag handling appears correct and addresses the stated issue, the test code has a critical logical error that will prevent the test module from being imported in environments where TEST_BIN_DIR is not set. Additionally, the issues raised in previous threads about missing _has_ltoir tracking in add_fatbin/add_object/add_library methods remain unaddressed, which could lead to the same bug recurring in different scenarios.
  • The test file (numba_cuda/numba/cuda/tests/cudadrv/test_linker.py) requires immediate attention to fix the NameError bug. The driver.py file should also be reviewed for completeness regarding LTOIR tracking in all add_* methods.

Important Files Changed

File Analysis

Filename | Score | Overview
numba_cuda/numba/cuda/cudadrv/driver.py | 3/5 | Fixes LTOIR flag handling in linker by introducing _has_ltoir tracking, but missing LTOIR detection for add_fatbin/add_object/add_library methods
numba_cuda/numba/cuda/tests/cudadrv/test_linker.py | 2/5 | Adds regression test for debug+LTO issue, but has NameError bug when TEST_BIN_DIR environment variable is not set

@gmarkall
Contributor

/ok to test 94e6745

add_from_numba = cuda.declare_device(
    "add_from_numba",
    "int32(int32, int32)",
    link=["testing/test_device_functions.ltoir"],

This looks wrong: hardcoding the path will not work in all cases. It should instead follow the pattern used in other test cases, e.g.:

test_device_functions_ltoir = os.path.join(
    TEST_BIN_DIR, "test_device_functions.ltoir"
)

self.arch = arch
if lto is False:
    # WAR for apparent nvjitlink issue
    lto = None

I wonder if this will be a problem with the 12.9 CUDA Python bindings - there was a version that would only accept None and not False, but I don't recall what the version was.

As far as I could tell from the CI so far, the removal of the None value did not cause an issue - perhaps it is no longer needed.

Looking at the tests that are now failing, it seems that this workaround is probably still needed.

@gmarkall
Contributor

/ok to test ec52710

@gmarkall
Contributor

Since the code changes in the non-test code looked good, I've just pushed the relevant fix for the test code to get this over the line.

@greptile-apps greptile-apps bot left a comment

3 files reviewed, 3 comments

Comment on lines +120 to +124
add_from_numba = cuda.declare_device(
    "add_from_numba",
    "int32(int32, int32)",
    link=[test_device_functions_ltoir],
)

This code is defined outside the if TEST_BIN_DIR: block (lines 114-117) but references test_device_functions_ltoir which is only defined inside that block. When TEST_BIN_DIR is not set, importing this test file will raise NameError: name 'test_device_functions_ltoir' is not defined.

This code should be indented to be inside the if TEST_BIN_DIR: block, or the declarations should be conditional:

Suggested change
- add_from_numba = cuda.declare_device(
-     "add_from_numba",
-     "int32(int32, int32)",
-     link=[test_device_functions_ltoir],
- )
+ if TEST_BIN_DIR:
+     test_device_functions_ltoir = os.path.join(
+         TEST_BIN_DIR, "test_device_functions.ltoir"
+     )
+     add_from_numba = cuda.declare_device(
+         "add_from_numba",
+         "int32(int32, int32)",
+         link=[test_device_functions_ltoir],
+     )
+     def debuggable_kernel(result):
+         i = cuda.grid(1)
+         result[i] = add_from_numba(i, i)

Comment on lines +353 to +354
def test_debug_kernel_with_lto(self):
    cuda.jit("void(int32[::1])", debug=True, opt=False)(debuggable_kernel)

This test uses debuggable_kernel which references add_from_numba, but that function is only defined when TEST_BIN_DIR is set. The test will fail with NameError when TEST_BIN_DIR is not set.

The test should either:

  1. Be skipped when TEST_BIN_DIR is not set, or
  2. Be moved inside a conditional block

Suggested change
- def test_debug_kernel_with_lto(self):
-     cuda.jit("void(int32[::1])", debug=True, opt=False)(debuggable_kernel)
+ @unittest.skipIf(not os.getenv("NUMBA_CUDA_TEST_BIN_DIR"), "TEST_BIN_DIR not set")
+ def test_debug_kernel_with_lto(self):
+     cuda.jit("void(int32[::1])", debug=True, opt=False)(debuggable_kernel)

@greptile-apps
Contributor

greptile-apps bot commented Jan 12, 2026

Additional Comments (1)

numba_cuda/numba/cuda/cudadrv/driver.py
The add_fatbin() and add_object() methods don't set _has_ltoir = True, but object files and fatbin files can contain embedded LTOIR/NVVM code (as indicated by the inspect_obj_content checks in add_file_guess_ext at lines 2354-2356 and 2385-2386).

When such files are added, the linker will incorrectly use link_time_optimization=False and ptx=False in the linker options, potentially causing the same linking errors this PR aims to fix.

Consider tracking LTOIR presence for these file types as well. This could be done by:

  1. Accepting an optional parameter to indicate LTOIR content, or
  2. Using inspect_obj_content to check for NVVM entries when these methods are called

Example approach:

def add_fatbin(self, fatbin, name="<cudapy-fatbin>", has_ltoir=False):
    obj = ObjectCode.from_fatbin(fatbin, name=name)
    self._object_codes.append(obj)
    if has_ltoir:
        self._has_ltoir = True

def add_object(self, obj, name="<cudapy-object>", has_ltoir=False):
    obj = ObjectCode.from_object(obj, name=name)
    self._object_codes.append(obj)
    if has_ltoir:
        self._has_ltoir = True
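
The second option listed above (detecting LTOIR content at add time rather than taking a parameter) might look roughly like the sketch below. Everything here is an assumption made for illustration: `LtoirTrackingSketch` is a hypothetical class, and the injected `inspect` callable is a stand-in that returns a set of entry-kind strings; it does not reproduce the real signature or return type of `inspect_obj_content`.

```python
class LtoirTrackingSketch:
    """Hypothetical tracker; the inspector is injected so the sketch stays self-contained."""

    def __init__(self, inspect):
        # Assumed contract: inspect(data) -> set of entry-kind strings, e.g. {"elf", "nvvm"}.
        self._inspect = inspect
        self._object_codes = []
        self._has_ltoir = False

    def add_object(self, data):
        self._object_codes.append(data)
        # Treat an "nvvm" entry as evidence of embedded LTOIR.
        if "nvvm" in self._inspect(data):
            self._has_ltoir = True


def fake_inspect(data):
    # Stand-in inspector for the sketch: flags inputs that carry a marker prefix.
    return {"nvvm"} if data.startswith(b"LTOIR") else {"elf"}
```

Injecting the inspector keeps the decision in one place, so the same check could be reused for fatbins or libraries if that were desired.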

@greptile-apps greptile-apps bot left a comment

2 files reviewed, 2 comments

Comment on lines +353 to +354
def test_debug_kernel_with_lto(self):
    cuda.jit("void(int32[::1])", debug=True, opt=False)(debuggable_kernel)

The test test_debug_kernel_with_lto references debuggable_kernel and add_from_numba (lines 127-129 and 120-124), which are only defined when TEST_BIN_DIR is set (line 114). When TEST_BIN_DIR is not set, this test will fail with a NameError.

This test needs a skip decorator like other tests in the codebase that depend on TEST_BIN_DIR:

Suggested change
- def test_debug_kernel_with_lto(self):
-     cuda.jit("void(int32[::1])", debug=True, opt=False)(debuggable_kernel)
+ @unittest.skipIf(not TEST_BIN_DIR, "necessary binaries not generated.")
+ def test_debug_kernel_with_lto(self):

See test_nrt.py line 182 and test_linkable_code.py line 40 for examples of the correct pattern.
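
The skip pattern under discussion can be shown with a self-contained sketch. It assumes only the standard `unittest` module and the `NUMBA_CUDA_TEST_BIN_DIR` variable named in this thread; `SkipPatternSketch`, `test_uses_binary`, and `run_sketch` are hypothetical names, and no CUDA code is involved.

```python
import os
import unittest

TEST_BIN_DIR = os.getenv("NUMBA_CUDA_TEST_BIN_DIR")

if TEST_BIN_DIR:
    # Defined only when the directory is configured, so nothing at module
    # level may reference this name unconditionally.
    test_device_functions_ltoir = os.path.join(
        TEST_BIN_DIR, "test_device_functions.ltoir"
    )


class SkipPatternSketch(unittest.TestCase):
    @unittest.skipIf(not TEST_BIN_DIR, "necessary binaries not generated.")
    def test_uses_binary(self):
        # The body runs only when the name above was defined.
        self.assertTrue(test_device_functions_ltoir.endswith(".ltoir"))


def run_sketch():
    # Run the single test case and return the unittest result object.
    suite = unittest.defaultTestLoader.loadTestsFromTestCase(SkipPatternSketch)
    return unittest.TextTestRunner(verbosity=0).run(suite)
```

Because the decorator is evaluated when the class is defined, the module imports cleanly whether or not the variable is set; the guarded name is only looked up when the test body actually runs.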

@greptile-apps
Contributor

greptile-apps bot commented Jan 12, 2026

Additional Comments (1)

numba_cuda/numba/cuda/cudadrv/driver.py
The _has_ltoir flag is not being set when adding object files (.o), fatbins (.fatbin), or libraries (.a/.lib) that may contain LTOIR/NVVM code. This could lead to the same bug this PR is fixing - the linker won't pass -lto even when LTOIR code is present in these file types.

While add_ltoir() correctly sets the flag (line 2446), and add_cu() sets it when self.lto is true (line 2436-2437), the methods add_fatbin(), add_object(), and add_library() do not set this flag.

Note that the code in add_file_guess_ext() (lines 2354-2356) uses inspect_obj_content() to detect LTOIR in objects/fatbins, but only for the ignore_nonlto logic - it doesn't set _has_ltoir.

Consider updating these methods to detect and set _has_ltoir when appropriate, similar to how add_file_guess_ext() checks for "nvvm" entries. For example:

def add_fatbin(self, fatbin, name="<cudapy-fatbin>"):
    obj = ObjectCode.from_fatbin(fatbin, name=name)
    self._object_codes.append(obj)
    # Check if fatbin contains LTOIR and set flag if needed

This would ensure consistent behavior regardless of which file format contains the LTOIR code.

@gmarkall
Contributor

/ok to test

@copy-pr-bot

copy-pr-bot bot commented Jan 12, 2026

/ok to test

@gmarkall, there was an error processing your request: E1

See the following link for more information: https://docs.gha-runners.nvidia.com/cpr/e/1/

@gmarkall
Contributor

/ok to test 78e0634

@greptile-apps greptile-apps bot left a comment

2 files reviewed, 2 comments

Comment on lines +120 to +124
add_from_numba = cuda.declare_device(
    "add_from_numba",
    "int32(int32, int32)",
    link=[test_device_functions_ltoir],
)

This code will cause a NameError when TEST_BIN_DIR is not set, because test_device_functions_ltoir is only defined inside the if TEST_BIN_DIR: block (lines 114-117). When the module loads, the cuda.declare_device() call will fail trying to access an undefined variable.

This will prevent the entire test module from importing when NUMBA_CUDA_TEST_BIN_DIR is not set in the environment.

The module-level declarations of add_from_numba and debuggable_kernel should be moved inside the if TEST_BIN_DIR: block, or the test method test_debug_kernel_with_lto should include a skip decorator similar to other tests in test_nvjitlink.py:

Suggested change
- add_from_numba = cuda.declare_device(
-     "add_from_numba",
-     "int32(int32, int32)",
-     link=[test_device_functions_ltoir],
- )
+ if TEST_BIN_DIR:
+     test_device_functions_ltoir = os.path.join(
+         TEST_BIN_DIR, "test_device_functions.ltoir"
+     )
+     add_from_numba = cuda.declare_device(
+         "add_from_numba",
+         "int32(int32, int32)",
+         link=[test_device_functions_ltoir],
+     )
+     def debuggable_kernel(result):
+         i = cuda.grid(1)
+         result[i] = add_from_numba(i, i)

Comment on lines +353 to +354
def test_debug_kernel_with_lto(self):
    cuda.jit("void(int32[::1])", debug=True, opt=False)(debuggable_kernel)

The test test_debug_kernel_with_lto will fail when TEST_BIN_DIR environment variable is not set because debuggable_kernel and add_from_numba are only defined when TEST_BIN_DIR is set (as suggested in the comment on lines 120-124).

This test should include a skip decorator similar to tests in test_nvjitlink.py:

Suggested change
- def test_debug_kernel_with_lto(self):
-     cuda.jit("void(int32[::1])", debug=True, opt=False)(debuggable_kernel)
+ @unittest.skipIf(
+     not TEST_BIN_DIR, "TEST_BIN_DIR not set"
+ )
+ def test_debug_kernel_with_lto(self):

@greptile-apps greptile-apps bot left a comment

2 files reviewed, 2 comments

Comment on lines +113 to +124
TEST_BIN_DIR = os.getenv("NUMBA_CUDA_TEST_BIN_DIR")
if TEST_BIN_DIR:
    test_device_functions_ltoir = os.path.join(
        TEST_BIN_DIR, "test_device_functions.ltoir"
    )


add_from_numba = cuda.declare_device(
    "add_from_numba",
    "int32(int32, int32)",
    link=[test_device_functions_ltoir],
)

The variable test_device_functions_ltoir is only defined when TEST_BIN_DIR is set (line 114-117), but it's used unconditionally on line 123. This will cause a NameError when the test module is imported if TEST_BIN_DIR is not set in the environment, preventing the entire test module from loading.

This should either:

  1. Move the add_from_numba and debuggable_kernel definitions inside the conditional block, or
  2. Initialize test_device_functions_ltoir to a default value (like None) and add appropriate guards

Suggested change
- TEST_BIN_DIR = os.getenv("NUMBA_CUDA_TEST_BIN_DIR")
- if TEST_BIN_DIR:
-     test_device_functions_ltoir = os.path.join(
-         TEST_BIN_DIR, "test_device_functions.ltoir"
-     )
- add_from_numba = cuda.declare_device(
-     "add_from_numba",
-     "int32(int32, int32)",
-     link=[test_device_functions_ltoir],
- )
+ TEST_BIN_DIR = os.getenv("NUMBA_CUDA_TEST_BIN_DIR")
+ if TEST_BIN_DIR:
+     test_device_functions_ltoir = os.path.join(
+         TEST_BIN_DIR, "test_device_functions.ltoir"
+     )
+     add_from_numba = cuda.declare_device(
+         "add_from_numba",
+         "int32(int32, int32)",
+         link=[test_device_functions_ltoir],
+     )
+     def debuggable_kernel(result):
+         i = cuda.grid(1)
+         result[i] = add_from_numba(i, i)

Comment on lines +353 to +354
def test_debug_kernel_with_lto(self):
    cuda.jit("void(int32[::1])", debug=True, opt=False)(debuggable_kernel)

This test uses debuggable_kernel which references test_device_functions_ltoir that only exists when TEST_BIN_DIR is set. The test should have a skip decorator to avoid failures when the required LTOIR file is not available.

Consider adding a skip decorator similar to other tests in this file:

Suggested change
- def test_debug_kernel_with_lto(self):
-     cuda.jit("void(int32[::1])", debug=True, opt=False)(debuggable_kernel)
+ @unittest.skipUnless(TEST_BIN_DIR, "TEST_BIN_DIR not set")
+ def test_debug_kernel_with_lto(self):
+     cuda.jit("void(int32[::1])", debug=True, opt=False)(debuggable_kernel)

@gmarkall gmarkall left a comment

It looks like the old WAR is not related to the failures on CUDA <= 12.2; this seems to be some other issue that needs debugging. Accordingly, I've left the change in its original form, and the only change of mine that I kept is the fix that lets the test run.

I would ignore greptile; it is commenting without enough understanding of the larger context.

@gmarkall added the label "4 - Waiting on author" (Waiting for author to respond to review) on Jan 12, 2026

Labels

4 - Waiting on author Waiting for author to respond to review

Development

Successfully merging this pull request may close these issues.

[BUG]Incorrect linker flags when linking a debuggable Numba CUDA kernel with code compiled for LTO

2 participants