-
Notifications
You must be signed in to change notification settings - Fork 54
Use pathfinder for dynamic libraries
#308
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Use pathfinder for dynamic libraries
#308
Conversation
|
Auto-sync is disabled for ready for review pull requests in this repository. Workflows must be run manually. Contributors can view more details about this message here. |
|
|
||
|
|
||
| def _get_source_variable(lib, static=False): | ||
| # remove? only used in test() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should we remove this function and test() below? cuda_paths also supports saying what the source of the loaded library is, such as if it's a wheel vs conda lib etc. Path finder treats this as an implementation detail.
We could always change it to just report the path without the source but its possible this function has proliferated a bit and is used for debugging out in the wild.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Path finder treats this as an implementation detail.
A hint could be added to the LoadedDL type if that's useful. Do you think it will be?
LoadedDL reliably includes the absolute path, even if the library was loaded outside of path finder. The path is more conclusive and usually it's very obvious if it's from a wheel, conda, or full CTK installation.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks, that's helpful context. I think the best solution then would be one that maintains the current function, but decodes what the library source is on the numba side. I can take a whack at this and see how it looks.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If it's possible to keep the test() functionality I'd prefer that - it has been quite important for debugging in the past. For a lot of misconfigurations on peoples' systems, Numba-CUDA has not been the root cause of an issue but it is often the first place where a problem manifests, so it's been invaluable for showing people what the issue is and demonstrating that it's not just that "Numba-CUDA is broken". It's OK if the format of the output changes, but whatever information we can provide here from the CUDA pathfinder that provides parity with existing functionality will be helpful.
Also note that Numba will attempt to call it when producing sysinfo, so if it is removed entirely then it'll break numba sysinfo's (running python -m numba -s or numba -s) ability to show info about CUDA: https://github.com/numba/numba/blob/0c633b896526b039171c0f800df70320e1c61a53/numba/misc/numba_sysinfo.py#L381
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
but whatever information we can provide here from the CUDA pathfinder that provides parity with existing functionality will be helpful.
Ok, tx, I'll work on adding that in.
| configuration. | ||
| """ | ||
|
|
||
| # CUDA headers |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These aren't a library but are still needed by numba, is this something pathfinder will support?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, that's the plan.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+1 for this feature in pathfinder
|
@rwgk could you help review? (I can't add you as a reviewer for some reason) |
rwgk
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't have a lot of numba-cuda specific context, but what I see in this PR looks good to me.
| configuration. | ||
| """ | ||
|
|
||
| # CUDA headers |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, that's the plan.
|
|
||
|
|
||
| def _get_source_variable(lib, static=False): | ||
| # remove? only used in test() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Path finder treats this as an implementation detail.
A hint could be added to the LoadedDL type if that's useful. Do you think it will be?
LoadedDL reliably includes the absolute path, even if the library was loaded outside of path finder. The path is more conclusive and usually it's very obvious if it's from a wheel, conda, or full CTK installation.
| from numba.cuda.cudadrv.driver import locate_driver_and_loader, load_driver | ||
| from numba.cuda.cudadrv.error import CudaSupportError | ||
| from numba.core import config | ||
| from cuda.bindings import path_finder |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@rwgk wouldn't this break once we make pathfinder a standalone module? (NVIDIA/cuda-python#723)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was thinking not, because my intention is to keep the backward compatibility code for a cuda_bindings release or two.
I just saw @kkraus14 asked about adding a deprecation warning in that code (on NVIDIA/cuda-python#723): I'd don't think there is a rush to get rid of the few lines of backward compatibility code, therefore I'd lean towards making it easy for the world, by staging: one cuda_bindings release without deprecation warning, the next one or two cuda_bindings releases with warning, then remove the backward compatibility code.
(This is very similar to how features are deprecated in CPython.)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm good with the approach of phasing in the deprecation.
|
|
|
Auto-sync is disabled for draft pull requests in this repository. Workflows must be run manually. Contributors can view more details about this message here. |
|
/ok to test |
cuda_paths in favor of using cuda.bindings.path_finder to find CUDA componentspathfinder for dynamic libraries
|
/ok to test |
|
/ok to test |
|
/ok to test |
|
/ok to test |
|
/ok to test |
Greptile OverviewGreptile SummaryThis PR refactors library path discovery to use Critical Issues Found1. Missing Base Dependency (BLOCKING)
2. Breaking Exception Handling (BLOCKING)
3. Removed Fallback Mechanism (BREAKING)
4. Inconsistent Implementation
Additional Issues
Recommended Actions
Confidence Score: 0/5
Important Files ChangedFile Analysis
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
3 files reviewed, 3 comments
| from numba.cuda.cudadrv.driver import locate_driver_and_loader, load_driver | ||
| from numba.cuda.cudadrv.error import CudaSupportError | ||
| from numba.cuda.core import config | ||
| from numba.core import config |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
[P0] Incorrect import breaks functionality. Changed from numba.cuda.core to numba.core, but CUDA_INCLUDE_PATH is defined in numba.cuda.core.config (line 510-511 of cuda_paths.py), not numba.core.config. This will cause AttributeError: module 'numba.core.config' has no attribute 'CUDA_INCLUDE_PATH' at line 205 when calling config.CUDA_INCLUDE_PATH.
| if lib in {"nvrtc", "nvvm"}: | ||
| return get_cuda_paths()[lib].info or _dllnamepattern % lib | ||
|
|
||
| dir_type = "static_cudalib_dir" if static else "cudalib_dir" | ||
| libdir = get_cuda_paths()[dir_type].info | ||
| return get_cuda_paths()[lib].info |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
[P0] Removed fallback breaks library loading. Old code: get_cuda_paths()[lib].info or _dllnamepattern % lib provided fallback to generic library name when path not found, allowing system loader to search. New code returns None if pathfinder fails (when get_cuda_paths()[lib].info is None), causing ctypes.CDLL(None) to fail in open_cudalib() at line 85.
numba_cuda/numba/cuda/cuda_paths.py
Outdated
| except pathfinder.DynamicLibNotFoundError: | ||
| # Try system search | ||
| # TODO: remove after cuda-python/1157 is resolved | ||
| nvvm = pathlib.Path(_get_nvvm_system_path()) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
[P1] pathlib.Path(None) raises TypeError when _get_nvvm_system_path() returns None (line 552-553). This occurs when get_system_ctk("nvvm") returns None, causing unhandled exception instead of raising DynamicLibNotFoundError. Check if return value is None before creating Path object.
|
/ok to test |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
5 files reviewed, 5 comments
| def _get_nvvm(): | ||
| try: | ||
| nvvm = pathfinder.load_nvidia_dynamic_lib("nvvm") | ||
| return nvvm | ||
| except pathfinder.DynamicLibNotFoundError: | ||
| # Try system search | ||
| # TODO: remove after cuda-python/1157 is resolved | ||
| path_or_none = _get_nvvm_system_path() | ||
| if path_or_none is not None: | ||
| nvvm = pathlib.Path(path_or_none) | ||
| if nvvm.exists(): | ||
| dl = pathfinder._dynamic_libs.load_nvidia_dynamic_lib.load_with_abs_path( | ||
| "nvvm", nvvm, "system-search" | ||
| ) | ||
| return dl | ||
| else: | ||
| raise pathfinder.DynamicLibNotFoundError("nvvm not found") | ||
|
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Logic error in exception handling: if path_or_none is not None but nvvm.exists() returns False (line 529), the function returns None implicitly by falling through. This will cause an AttributeError when _get_nvvm_path() (line 548) tries to access nvvm.found_via on the None value.
The exception should be raised in all failure cases:
| def _get_nvvm(): | |
| try: | |
| nvvm = pathfinder.load_nvidia_dynamic_lib("nvvm") | |
| return nvvm | |
| except pathfinder.DynamicLibNotFoundError: | |
| # Try system search | |
| # TODO: remove after cuda-python/1157 is resolved | |
| path_or_none = _get_nvvm_system_path() | |
| if path_or_none is not None: | |
| nvvm = pathlib.Path(path_or_none) | |
| if nvvm.exists(): | |
| dl = pathfinder._dynamic_libs.load_nvidia_dynamic_lib.load_with_abs_path( | |
| "nvvm", nvvm, "system-search" | |
| ) | |
| return dl | |
| else: | |
| raise pathfinder.DynamicLibNotFoundError("nvvm not found") | |
| def _get_nvvm(): | |
| try: | |
| nvvm = pathfinder.load_nvidia_dynamic_lib("nvvm") | |
| return nvvm | |
| except pathfinder.DynamicLibNotFoundError: | |
| # Try system search | |
| # TODO: remove after cuda-python/1157 is resolved | |
| path_or_none = _get_nvvm_system_path() | |
| if path_or_none is not None: | |
| nvvm = pathlib.Path(path_or_none) | |
| if nvvm.exists(): | |
| dl = pathfinder._dynamic_libs.load_nvidia_dynamic_lib.load_with_abs_path( | |
| "nvvm", nvvm, "system-search" | |
| ) | |
| return dl | |
| raise pathfinder.DynamicLibNotFoundError("nvvm not found") |
| dl = pathfinder._dynamic_libs.load_nvidia_dynamic_lib.load_with_abs_path( | ||
| "nvvm", nvvm, "system-search" | ||
| ) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Using private pathfinder API (pathfinder._dynamic_libs.load_nvidia_dynamic_lib.load_with_abs_path) is fragile and could break in future versions of cuda-pathfinder. The underscore prefix indicates this is an internal/private API not meant for external use. Consider using only the public pathfinder API or coordinating with the pathfinder maintainers to expose this functionality publicly.
| def _get_nvrtc(): | ||
| return pathfinder.load_nvidia_dynamic_lib("nvrtc") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
_get_nvrtc() lacks system path fallback that _get_nvvm() has. If pathfinder fails to find nvrtc (e.g., in some system installations), this will raise an exception with no fallback attempt. Consider adding similar system search fallback logic as in _get_nvvm() for consistency and robustness, or document why nvrtc doesn't need this fallback.
| if lib in {"nvrtc", "nvvm"}: | ||
| return get_cuda_paths()[lib].info or _dllnamepattern % lib | ||
| return get_cuda_paths()[lib].info |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Removed fallback to generic library name pattern. The old code had return get_cuda_paths()[lib].info or _dllnamepattern % lib which provided a fallback to generic names like "libnvvm.so" or "libnvrtc.so", allowing the system loader to search for the library. The new code returns None if pathfinder can't find the library, which will cause ctypes.CDLL(None) to fail in open_cudalib() (line 85).
This breaks the documented behavior in the docstring (lines 49-53): "If the search fails, return a generic filename for the library...so that we may attempt to load it using the system loader's search mechanism."
Consider restoring the fallback or updating the implementation to handle None returns gracefully.
Additional Comments (1)
|
|
/ok to test |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
5 files reviewed, 5 comments
numba_cuda/numba/cuda/cuda_paths.py
Outdated
| def _get_nvvm(): | ||
| try: | ||
| nvvm = pathfinder.load_nvidia_dynamic_lib("nvvm") | ||
| return nvvm | ||
| except pathfinder.DynamicLibNotFoundError: | ||
| # Try system search | ||
| # TODO: remove after cuda-python/1157 is resolved | ||
| path_or_none = _get_nvvm_system_path() | ||
| if path_or_none is not None: | ||
| nvvm = pathlib.Path(path_or_none) | ||
| if nvvm.exists(): | ||
| dl = pathfinder._dynamic_libs.load_nvidia_dynamic_lib.load_with_abs_path( | ||
| "nvvm", nvvm, "system-search" | ||
| ) | ||
| return dl | ||
| raise pathfinder.DynamicLibNotFoundError("nvvm not found") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Logic error: if path_or_none is not None but the file doesn't exist (line 529 returns False), the function falls through without returning anything, implicitly returning None. This will cause an AttributeError when _get_nvvm_path() tries to access .found_via attribute on the None value.
The issue occurs when:
pathfinder.load_nvidia_dynamic_lib("nvvm")raisesDynamicLibNotFoundError_get_nvvm_system_path()returns a path (not None)nvvm.exists()returns False (file doesn't exist at that path)- Code falls through lines 530-533 without executing them
- Function returns None implicitly
Fix: raise the exception at line 534 in all cases where the library is not found, or return a default value.
| def _get_nvvm(): | |
| try: | |
| nvvm = pathfinder.load_nvidia_dynamic_lib("nvvm") | |
| return nvvm | |
| except pathfinder.DynamicLibNotFoundError: | |
| # Try system search | |
| # TODO: remove after cuda-python/1157 is resolved | |
| path_or_none = _get_nvvm_system_path() | |
| if path_or_none is not None: | |
| nvvm = pathlib.Path(path_or_none) | |
| if nvvm.exists(): | |
| dl = pathfinder._dynamic_libs.load_nvidia_dynamic_lib.load_with_abs_path( | |
| "nvvm", nvvm, "system-search" | |
| ) | |
| return dl | |
| raise pathfinder.DynamicLibNotFoundError("nvvm not found") | |
| def _get_nvvm(): | |
| try: | |
| nvvm = pathfinder.load_nvidia_dynamic_lib("nvvm") | |
| return nvvm | |
| except pathfinder.DynamicLibNotFoundError: | |
| # Try system search | |
| # TODO: remove after cuda-python/1157 is resolved | |
| path_or_none = _get_nvvm_system_path() | |
| if path_or_none is not None: | |
| nvvm = pathlib.Path(path_or_none) | |
| if nvvm.exists(): | |
| dl = pathfinder._dynamic_libs.load_nvidia_dynamic_lib.load_with_abs_path( | |
| "nvvm", nvvm, "system-search" | |
| ) | |
| return dl | |
| # Always raise if library not found | |
| raise pathfinder.DynamicLibNotFoundError("nvvm not found") |
| def _get_nvrtc(): | ||
| return pathfinder.load_nvidia_dynamic_lib("nvrtc") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Inconsistency: _get_nvrtc() lacks the system path fallback that _get_nvvm() has (lines 519-534). If pathfinder.load_nvidia_dynamic_lib("nvrtc") fails, this function will raise an exception, while _get_nvvm() has a fallback to _get_nvvm_system_path().
This creates inconsistent behavior between the two library lookup functions. Either:
- Add a similar system path fallback for nvrtc, OR
- Remove the fallback from nvvm to keep them consistent
The TODO comment in _get_nvvm() (line 525) suggests the system fallback is temporary until cuda-python issue #1157 is resolved. Consider whether nvrtc needs the same workaround.
| def _get_nvrtc(): | |
| return pathfinder.load_nvidia_dynamic_lib("nvrtc") | |
| def _get_nvrtc(): | |
| try: | |
| return pathfinder.load_nvidia_dynamic_lib("nvrtc") | |
| except pathfinder.DynamicLibNotFoundError: | |
| # Try system search if pathfinder fails | |
| # Consider adding _get_nvrtc_system_path() similar to _get_nvvm_system_path() | |
| raise |
| dl = pathfinder._dynamic_libs.load_nvidia_dynamic_lib.load_with_abs_path( | ||
| "nvvm", nvvm, "system-search" | ||
| ) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Using private API: pathfinder._dynamic_libs (underscore prefix indicates private) is not part of the public pathfinder API. This is fragile and could break if pathfinder changes its internal implementation.
Consider either:
- Using only public pathfinder APIs, OR
- Filing an issue with cuda-pathfinder to expose
load_with_abs_path()as a public API if this functionality is needed
| dl = pathfinder._dynamic_libs.load_nvidia_dynamic_lib.load_with_abs_path( | |
| "nvvm", nvvm, "system-search" | |
| ) | |
| # Use public API only - consider requesting this functionality | |
| # from pathfinder maintainers if needed | |
| dl = pathfinder._dynamic_libs.load_nvidia_dynamic_lib.load_with_abs_path( | |
| "nvvm", nvvm, "system-search" | |
| ) |
numba_cuda/numba/cuda/cuda_paths.py
Outdated
| # if os.path.isfile(nvvm_path): | ||
| # return nvvm_path |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Commented-out code should be removed. This code checks if the file exists at the path before returning it. Since this check is commented out, the function returns the path even if the file doesn't exist.
This behavior is intentional according to the old implementation (same commented-out code existed before), but it's unclear why. Either:
- Remove the commented code if the check is not needed, OR
- Uncomment the check if file existence should be verified, OR
- Add a comment explaining why the path is returned without verifying existence
| # if os.path.isfile(nvvm_path): | |
| # return nvvm_path | |
| nvvm_path = os.path.join( | |
| nvvm_lib_dir, "nvvm64_40_0.dll" if IS_WIN32 else "libnvvm.so.4" | |
| ) | |
| # Return path even if file doesn't exist - system loader will search for it | |
| return nvvm_path |
Additional Comments (1)
return get_cuda_paths()[lib].info or _dllnamepattern % libNow it's just: return get_cuda_paths()[lib].infoThe old docstring explicitly states: "If the search fails, return a generic filename for the library (e.g. 'libnvvm.so' for 'nvvm') so that we may attempt to load it using the system loader's search mechanism." This fallback allowed the system loader to search standard library paths when the explicit path lookup failed. Removing it breaks this documented behavior and could cause Either restore the fallback or update the function to ensure |
|
/ok to test |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No files reviewed, no comments
|
/ok to test |
|
/ok to test |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
3 files reviewed, 3 comments
| def _get_nvrtc(): | ||
| return pathfinder.load_nvidia_dynamic_lib("nvrtc") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
_get_nvrtc() lacks system search fallback that _get_nvvm() has. If pathfinder can't find nvrtc (e.g., in non-standard system installations), this raises DynamicLibNotFoundError during get_cuda_paths() construction. The old code gracefully returned None and allowed get_cudalib() to fall back to the system loader. This breaks existing installations where nvrtc is in a standard system path but not detected by pathfinder.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks, pathfinder doesn't need a custom system fallback for nvrtc, only nvvm.
| if path_or_none is not None: | ||
| nvvm = pathlib.Path(path_or_none) | ||
| if nvvm.exists(): | ||
| dl = pathfinder._dynamic_libs.load_nvidia_dynamic_lib.load_with_abs_path( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Accessing private API pathfinder._dynamic_libs is fragile and may break with pathfinder updates. Consider using a public API or submitting a feature request to cuda-pathfinder for this functionality.
Additional Comments (1)
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
4 files reviewed, 4 comments
| def _get_nvrtc(): | ||
| return pathfinder.load_nvidia_dynamic_lib("nvrtc") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Missing exception handling for nvrtc library lookup. Unlike _get_nvvm() which catches pathfinder.DynamicLibNotFoundError and tries a system search fallback, _get_nvrtc() has no exception handling. This means if nvrtc is not found by pathfinder, the exception will propagate through _get_nvrtc_path() → get_cuda_paths() and cause failures in production code.
The old implementation handled this by returning None when the library wasn't found, then falling back to a generic library name pattern in get_cudalib(). Now, an uncaught exception will crash at import/initialization time instead of allowing the system loader to attempt finding the library.
Either add the same exception handling as _get_nvvm(), or ensure that pathfinder never raises for nvrtc (which seems unlikely since both libraries should have similar lookup requirements).
| if path_or_none is not None: | ||
| nvvm = pathlib.Path(path_or_none) | ||
| if nvvm.exists(): | ||
| dl = pathfinder._dynamic_libs.load_nvidia_dynamic_lib.load_with_abs_path( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Accessing private API pathfinder._dynamic_libs.load_nvidia_dynamic_lib.load_with_abs_path(). APIs prefixed with underscore are considered private and may change without notice in pathfinder updates, breaking this code.
This should either:
- Use a public pathfinder API if one exists for loading from absolute paths
- Be documented with a TODO/FIXME explaining why the private API is needed
- Raise an issue with cuda-python to expose this as a public API
numba_cuda/numba/cuda/cuda_paths.py
Outdated
| def _get_nvvm(): | ||
| try: | ||
| nvvm = pathfinder.load_nvidia_dynamic_lib("nvvm") | ||
| return nvvm | ||
| except pathfinder.DynamicLibNotFoundError: | ||
| # Try system search | ||
| # TODO: remove after cuda-python/1157 is resolved | ||
| path_or_none = _get_nvvm_system_path() | ||
| if path_or_none is not None: | ||
| nvvm = pathlib.Path(path_or_none) | ||
| if nvvm.exists(): | ||
| dl = pathfinder._dynamic_libs.load_nvidia_dynamic_lib.load_with_abs_path( | ||
| "nvvm", nvvm, "system-search" | ||
| ) | ||
| return dl | ||
| raise pathfinder.DynamicLibNotFoundError("nvvm not found") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Exception re-raise loses error context. When the system search fallback also fails (path_or_none is None or file doesn't exist), the code raises a new generic DynamicLibNotFoundError("nvvm not found"), discarding the original exception from pathfinder.
This makes debugging harder - users won't see why pathfinder initially failed. Consider chaining the exceptions using raise ... from original_exception or at minimum including more diagnostic info about what was tried:
raise pathfinder.DynamicLibNotFoundError(
f"nvvm not found by pathfinder or system search (tried: {path_or_none})"
) from original_exception| if lib in {"nvrtc", "nvvm"}: | ||
| return get_cuda_paths()[lib].info or _dllnamepattern % lib | ||
| return get_cuda_paths()[lib].info |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Breaking change: Removed system loader fallback for nvrtc/nvvm. The old implementation was:
return get_cuda_paths()[lib].info or _dllnamepattern % libNow it's just:
return get_cuda_paths()[lib].infoThis breaks the documented behavior in the docstring above (lines 48-53) which states: "If the search fails, return a generic filename for the library (e.g. 'libnvvm.so' for 'nvvm') so that we may attempt to load it using the system loader's search mechanism."
The problem: If get_cuda_paths()[lib].info is None (or if pathfinder raises an exception), this function now returns None instead of the fallback pattern. When open_cudalib() calls ctypes.CDLL(None) on line 85, it will fail with a TypeError.
This is especially problematic because _get_nvrtc() doesn't catch DynamicLibNotFoundError, so the exception will propagate before we even get to this fallback logic.
Either:
- Update
_get_nvrtc()and_get_nvvm()to never return None and never raise exceptions (always succeed or provide a fallback path) - Restore the
or _dllnamepattern % libfallback - Update the docstring to reflect the new behavior where failures raise exceptions
|
/ok to test |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
6 files reviewed, 6 comments
| def _get_nvrtc(): | ||
| return pathfinder.load_nvidia_dynamic_lib("nvrtc") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Inconsistent exception handling: _get_nvrtc() lacks try-except wrapper that _get_nvvm() has (lines 520-535). If nvrtc library is not found, pathfinder.load_nvidia_dynamic_lib("nvrtc") will raise pathfinder.DynamicLibNotFoundError, which will propagate through _get_nvrtc_path() → get_cuda_paths() → callers.
This is problematic because:
- Callers like
nvvm.pyline 154 only catchOSError, notpathfinder.DynamicLibNotFoundError _get_nvvm()has fallback logic to try system paths, but_get_nvrtc()does not
Add try-except wrapper with system search fallback to match _get_nvvm() pattern, or at minimum wrap the exception to ensure it's caught by existing error handlers.
| from numba.cuda.core.config import IS_WIN32 | ||
| from numba.cuda.misc.findlib import find_lib | ||
| from numba.cuda import config | ||
| from cuda import pathfinder |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Critical import error: cuda-pathfinder is imported unconditionally but is only listed in optional dependencies (cu12 and cu13 extras in pyproject.toml lines 30, 37). Users installing base numba-cuda without extras will get ModuleNotFoundError: No module named 'cuda' at import time.
This breaks basic package installation. Either:
- Add
cuda-pathfinder>=1.3.1to base dependencies in pyproject.toml line 24, OR - Make this import conditional with a fallback mechanism
| ) | ||
| from numba.cuda.cuda_paths import ( | ||
| _get_libdevice_path_decision, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Test coverage removed: The entire TestNvvmLookUp class (originally ~50 lines testing nvvm path resolution) has been deleted. This removes test coverage for:
- nvvm library lookup from Conda environment
- nvvm library lookup from CUDA_HOME
- nvvm library lookup from system paths
Since nvvm library path resolution has been completely refactored to use pathfinder, new tests should be added to verify:
- pathfinder integration works correctly
- The fallback to system search (lines 524-534 in cuda_paths.py) functions properly
- Error handling when nvvm is not found
The import of _get_nvvm_path_decision is removed because that function no longer exists, but this leaves a testing gap for the new implementation.
| # TODO: Expose private cuda-python APIs | ||
| dl = pathfinder._dynamic_libs.load_nvidia_dynamic_lib.load_with_abs_path( | ||
| "nvvm", nvvm, "system-search" | ||
| ) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Accessing private pathfinder API: pathfinder._dynamic_libs.load_nvidia_dynamic_lib.load_with_abs_path() is a private API (indicated by leading underscore). This creates a fragile dependency that could break with pathfinder updates.
While the TODO comment acknowledges this, the implementation is already in production use. Consider:
- Coordinating with cuda-python maintainers to expose this as a public API
- Adding version guards to detect if the private API changes
- Wrapping this in a try-except to gracefully handle API changes
Note: If this suggestion doesn't match your team's coding style, reply to this and let me know. I'll remember it for next time!
Additional Comments (2)
return get_cuda_paths()[lib].info or _dllnamepattern % libNow it's: return get_cuda_paths()[lib].infoProblems with the new implementation:
Either restore the fallback pattern or update all callers to handle |
rwgk
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
When testing the suggested code, I found a bug in pathfinder:
In combination with the 1506 fix, the code suggested here works on Windows even if CUDA_HOME and CUDA_PATH are not set.
|
|
||
|
|
||
| def _get_nvvm_system_path(): | ||
| nvvm_lib_dir = get_system_ctk("nvvm") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If IS_WIN32 is true, this will always be None (see line 177 in this file; current state of this PR):
def get_system_ctk(*subdirs):
"""Return path to system-wide cudatoolkit; or, None if it doesn't exist."""
# Linux?
if not IS_WIN32:
# Is cuda alias to /usr/local/cuda?
# We are intentionally not getting versioned cuda installation.
result = os.path.join("/usr/local/cuda", *subdirs)
if os.path.exists(result):
return result
return None
return NoneSo the IS_WIN32 tests in this function further down here will certainly by false.
| def _get_nvvm(): | ||
| try: | ||
| nvvm = pathfinder.load_nvidia_dynamic_lib("nvvm") | ||
| return nvvm | ||
| except pathfinder.DynamicLibNotFoundError as e: | ||
| # Try system search | ||
| # TODO: remove after cuda-python/1157 is resolved | ||
| path_or_none = _get_nvvm_system_path() | ||
| if path_or_none is not None: | ||
| nvvm = pathlib.Path(path_or_none) | ||
| if nvvm.exists(): | ||
| # TODO: Expose private cuda-python APIs | ||
| dl = pathfinder._dynamic_libs.load_nvidia_dynamic_lib.load_with_abs_path( | ||
| "nvvm", nvvm, "system-search" | ||
| ) | ||
| return dl | ||
| raise pathfinder.DynamicLibNotFoundError("nvvm not found") from e |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Here is an alternative implementation that avoids get_system_ctk(). It comes complete with a (Cursor-generated) unit test. I tested it interactively on Linux and Windows. Please copy out the meat, I believe you can plug it in here unchanged.
"""
Alternative implementation for _get_nvvm() that avoids using get_system_ctk("nvvm").
This uses nvrtc's location to infer the CUDA installation root and temporarily
sets CUDA_HOME to help pathfinder find nvvm.
"""
import os
import pathlib
from contextlib import contextmanager
# These would be imported from cuda.bindings in the real implementation
try:
from cuda import pathfinder
except ImportError:
pathfinder = None
print("Warning: cuda.bindings.pathfinder not available, using mock")
@contextmanager
def temporary_env_var(key, value):
"""Context manager to temporarily set an environment variable."""
old_value = os.environ.get(key)
os.environ[key] = value
try:
yield
finally:
if old_value is None:
del os.environ[key]
else:
os.environ[key] = old_value
def _find_cuda_home_from_lib_path(lib_path):
"""
Walk up from a library path to find a directory containing 'nvvm' subdirectory.
For example, given /usr/local/cuda/lib64/libnvrtc.so.12,
this would find /usr/local/cuda (which contains nvvm/).
Returns the path if found, None otherwise.
"""
current = pathlib.Path(lib_path).resolve()
# Walk up the directory tree
for parent in current.parents:
nvvm_subdir = parent / "nvvm"
if nvvm_subdir.is_dir():
return str(parent)
return None
def _get_nvrtc():
"""Get nvrtc library info via pathfinder."""
if pathfinder is None:
return None
return pathfinder.load_nvidia_dynamic_lib("nvrtc")
def _get_nvvm():
# Strategy:
# 1. Try pathfinder directly
# 2. If CUDA_HOME/CUDA_PATH are set, pathfinder would have found it - give up
# 3. Use nvrtc's location to infer CUDA installation root
# 4. Temporarily set CUDA_HOME and retry pathfinder
# First, try pathfinder directly
try:
return pathfinder.load_nvidia_dynamic_lib("nvvm")
except pathfinder.DynamicLibNotFoundError:
pass
# If CUDA_HOME or CUDA_PATH is set, pathfinder would have found libnvvm
# based on the environment variable(s) - nothing more we can do
if os.environ.get("CUDA_HOME") or os.environ.get("CUDA_PATH"):
return None
# Try to locate nvrtc - this library is almost certainly needed if nvvm is needed (in the context of numba-cuda)
try:
loaded_nvrtc = _get_nvrtc()
except pathfinder.DynamicLibNotFoundError:
return None
# If nvrtc was not found via system-search, we can't reliably determine
# the CUDA installation structure
if loaded_nvrtc.found_via != "system-search":
return None
# Search backward from nvrtc's location to find a directory with "nvvm" subdirectory
cuda_home = _find_cuda_home_from_lib_path(loaded_nvrtc.abs_path)
if cuda_home is None:
return None
# Temporarily set CUDA_HOME and retry pathfinder
with temporary_env_var("CUDA_HOME", cuda_home):
try:
return pathfinder.load_nvidia_dynamic_lib("nvvm")
except pathfinder.DynamicLibNotFoundError:
return None
# For testing without cuda.bindings
class MockDynamicLib:
def __init__(self, name, abs_path, found_via):
self.name = name
self.abs_path = abs_path
self.found_via = found_via
def test_find_cuda_home():
"""Test the directory walking logic."""
import tempfile
# Create a mock CUDA installation structure
with tempfile.TemporaryDirectory() as tmpdir:
cuda_root = pathlib.Path(tmpdir) / "cuda"
lib64 = cuda_root / "lib64"
nvvm = cuda_root / "nvvm"
nvvm_lib64 = nvvm / "lib64"
lib64.mkdir(parents=True)
nvvm_lib64.mkdir(parents=True)
# Create mock library files
nvrtc_path = lib64 / "libnvrtc.so.12"
nvrtc_path.touch()
nvvm_lib = nvvm_lib64 / "libnvvm.so.4"
nvvm_lib.touch()
# Test finding CUDA_HOME from nvrtc path
found_cuda_home = _find_cuda_home_from_lib_path(str(nvrtc_path))
# Compare resolved paths to handle Windows short path names (e.g., RGROSS~1)
expected = str(cuda_root.resolve())
assert found_cuda_home == expected, (
f"Expected {expected}, got {found_cuda_home}"
)
print(f"SUCCESS: Found CUDA_HOME={found_cuda_home} from nvrtc at {nvrtc_path}")
# Test that the nvvm directory exists at the found location
assert (pathlib.Path(found_cuda_home) / "nvvm").is_dir()
print(f"SUCCESS: nvvm/ subdirectory exists at {found_cuda_home}/nvvm")
if __name__ == "__main__":
print("Testing directory walking logic...")
test_find_cuda_home()
print("\nAll tests passed!")
print("\n" + "=" * 60)
print("Attempting to use actual pathfinder (if available)...")
print("=" * 60)
if pathfinder is not None:
result = _get_nvvm()
if result:
print(f"SUCCESS: Found nvvm at {result.abs_path}")
print(f" Found via: {result.found_via}")
else:
print("Could not load nvvm")
else:
print("Skipping (pathfinder not available)")
Towards #302
This PR switches to using
pathfinderto locatenvvmandnvrtcwithinnumba-cuda.