Skip to content
Open
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
7 changes: 6 additions & 1 deletion libcudacxx/include/cuda/std/__cccl/dialect.h
Original file line number Diff line number Diff line change
Expand Up @@ -23,6 +23,7 @@
#endif // no system header

#include <cuda/std/__cccl/builtin.h>
#include <cuda/std/__cccl/cuda_capabilities.h>
#include <cuda/std/__cccl/host_std_lib.h>

///////////////////////////////////////////////////////////////////////////////
Expand Down Expand Up @@ -111,7 +112,11 @@

// We need to treat host and device separately
#if _CCCL_DEVICE_COMPILATION() && !_CCCL_CUDA_COMPILER(NVHPC)
# define _CCCL_GLOBAL_CONSTANT _CCCL_DEVICE constexpr
# if _CCCL_HAS_RDC()
# define _CCCL_GLOBAL_CONSTANT _CCCL_DEVICE inline constexpr
# else // ^^^ _CCCL_HAS_RDC() ^^^ / vvv !_CCCL_HAS_RDC() vvv
# define _CCCL_GLOBAL_CONSTANT _CCCL_DEVICE constexpr
# endif // ^^^ !_CCCL_HAS_RDC() ^^^
#else // ^^^ _CCCL_DEVICE_COMPILATION() && !_CCCL_CUDA_COMPILER(NVHPC) ^^^ /
// vvv !_CCCL_DEVICE_COMPILATION() || _CCCL_CUDA_COMPILER(NVHPC) vvv
# define _CCCL_GLOBAL_CONSTANT inline constexpr
Comment on lines 114 to 122
Copy link
Contributor Author

@davebayer davebayer Feb 16, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@miscco Why do we treat nvhpc differently? The objects don't live in device memory, if we don't make them __device__ with nvhpc

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The reason is that NVHPC has a single compilation pass, so it will see that the global object is used on device and instantiate it there

Copy link
Contributor Author

@davebayer davebayer Feb 16, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@dkolsen-pgi could you tell us what's exactly happening under the hood?

What I observe is that if I set the execution space of a CPO to:

  • nothing it seems that the object is not part of the generated PTX and if I do &cpo in device code, it return nullptr
  • __device__, the object is visible in the PTX.

See https://ce.nvidia.com/z/7d77Kj. I'm not sure whether this is a problem, but it's not consistent with what other compilers produce. Because with all other compilers, we basically create the object twice - 1x on __host__ and 1x on __device__

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

NVHPC's handling of constexpr variables has bugs. I can't fully explain what you are seeing.

The basic cases work well when the constexpr variable doesn't have any __host__ or __device__ annotations:

  • A constexpr scalar variable whose address is never needed will be constant folded in both host and device code. An actual object won't be generated in either host or device because it isn't needed.
  • A CPO or other stateless function object can be called just fine in both host and device code. The object isn't created correctly in device code, but it doesn't matter because the object is never actually accessed.

The thing that doesn't work is when the constexpr variable has a meaningful value and that value is accessed via a pointer to the object. Using it in host code is fine, but using it in device code usually won't compile, with nvc++ complaining about accessing a global variable from device code. If the constexpr variable is explicitly marked __device__, it will work correctly in device code, but wrong results might happen in host code.

We haven't fixed these problems because it's not easy to do and the existing behavior works well enough for the code that nvc++ needs to compile.

I think constexpr inline, without any __device__ is generally the best thing to do for NVHPC.

Expand Down
Loading