
Conversation

@ZzEeKkAa (Contributor) commented May 1, 2025

Replace internal usage of types.Array with a CUDA-specific array type, CUDAArray, that can handle an address space. The idea is to help the NVVM compiler recognize the address space of memory load/store operations. The issue is that in complex workloads the compiler loses track of the address space and produces general-purpose instructions instead of memory-space-specific ones. One such example is a device GEMM: with this PR, the shared-memory-specific LDS instruction is generated instead of the general-purpose LD.E.
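For illustration, a minimal kernel of this shape (not the PR's GEMM) is one place where the address space matters:

import numpy as np
from numba import cuda, float32

@cuda.jit
def smem_copy(out):
    # buf lives in shared memory, i.e. NVVM addrspace(3)
    buf = cuda.shared.array(32, dtype=float32)
    i = cuda.threadIdx.x
    buf[i] = float32(i)
    cuda.syncthreads()
    # With the address space tracked, this load can be emitted as the
    # shared-memory LDS instruction instead of the generic LD.E.
    out[i] = buf[i]

out = cuda.to_device(np.zeros(32, dtype=np.float32))
smem_copy[1, 32](out)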

List of changes that happened after transitioning to CUDAArray:

  • introduce the CUDAArray type and model, which support an address space on the data pointer;
  • all arrays inside cuda.jit now use CUDAArray instead of types.Array. That breaks some APIs, such as requesting an implementation with a general-purpose array signature, which is why many tests had to be updated;
  • atomics were updated to use the NVVM intrinsics specific to the memory address space (https://docs.nvidia.com/cuda/nvvm-ir-spec/#atomic); see the sketch after this list.
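As a hedged illustration of the kind of code this affects (not from the PR's test suite), an atomic on a shared-memory array can now lower to the shared-address-space atomic intrinsic rather than the generic one:

import numpy as np
from numba import cuda, int32

@cuda.jit
def block_count(out):
    counter = cuda.shared.array(1, dtype=int32)
    i = cuda.threadIdx.x
    if i == 0:
        counter[0] = 0
    cuda.syncthreads()
    # counter is in addrspace(3); with address space tracking this can
    # use the shared-memory NVVM atomic instead of the generic one.
    cuda.atomic.add(counter, 0, 1)
    cuda.syncthreads()
    if i == 0:
        out[0] = counter[0]

out = cuda.to_device(np.zeros(1, dtype=np.int32))
block_count[1, 64](out)
print(out.copy_to_host())  # [64]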

Future improvements that this unblocks:

  • get rid of the meminfo data field for CUDA arrays. Its purpose was to act like a smart pointer when going back and forth between Python and compiled code; kernels run only compiled code, so the field is not needed;
  • implement address space validation;
  • implement static-size arrays, so that the size can be resolved at compile time.

TODO:

  • Fix the rest of the tests.
  • Add automatic type casting when passing an array with a specific address space to a function that accepts an array in the generic address space.

@ZzEeKkAa changed the title from "Add CUDAArray type and implementation" to "Add CUDAArray type and implementation with addresspace information" May 1, 2025
@gmarkall added the "2 - In Progress" label May 2, 2025
@copy-pr-bot (bot) commented Jul 22, 2025

Auto-sync is disabled for draft pull requests in this repository. Workflows must be run manually.


@ZzEeKkAa (Contributor, Author)

/ok to test

@ZzEeKkAa changed the title from "Add CUDAArray type and implementation with addresspace information" to "[WIP] Add CUDAArray type and implementation with addresspace information" Jul 25, 2025
@ZzEeKkAa requested review from atmnp and gmarkall July 25, 2025 18:06
@ZzEeKkAa added the "improvement" and "breaking" labels Jul 25, 2025
@gmarkall (Contributor)

/ok to test

@gmarkall left a comment

I haven't fully reviewed this yet, but I want to discuss my thoughts so far - some of them are marked on the diff.

I think the main design concern I have at the moment is with trying to ensure that all array types in kernels are a CUDAArray type instead of an Array type - I think this might impact launch latency and have a lot of edge cases we need to find. Is an alternative path to keep Array types coexisting with CUDAArray types in kernels, but to treat Array types as being in the generic address space? The idea here is to leave the decorator and dispatcher logic unchanged, so we don't have to put CUDAArray types in the critical path of a launch.

type_name = "readonly " + type_name
if not self.aligned:
type_name = "unaligned " + type_name
self.name = "%s(%s, %sd, %s, addrspace(%d))" % (

As the address spaces in nvvm.py are just integers, might it be worth converting them to an enum class so that it's easier to get the array type to print like

array(int64, 1, 'C', SHARED)

instead of

array(int64, 1, 'C', addrspace(3))

?

It might make interactive debugging / development a little easier without having to mentally translate the address space numbers to names - there aren't too many uses of the address spaces, so I would hope that updating the uses (if necessary) wouldn't be too burdensome.
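A minimal sketch of what that enum could look like, assuming the NVVM IR address space numbering (0 generic, 1 global, 3 shared, 4 constant, 5 local); the class and member names here are hypothetical:

from enum import IntEnum

class AddressSpace(IntEnum):
    GENERIC = 0
    GLOBAL = 1
    SHARED = 3
    CONSTANT = 4
    LOCAL = 5

# IntEnum keeps existing integer comparisons working, while printing
# can use the symbolic name:
space = AddressSpace(3)
print("array(int64, 1, 'C', %s)" % space.name)  # array(int64, 1, 'C', SHARED)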

# dispatcher type in future.


class CUDAArray(types.Array):

Might the mangling_args property need implementing as well? Two methods that differ only in the address space of an array could end up mangling to the same name, potentially creating a symbol clash.

For example:

from numba.core.itanium_mangler import mangle_type
from numba.cuda.types import CUDAArray
from numba import types

shared_array = CUDAArray(types.int64, 1, 'C', addrspace=3)
generic_array = CUDAArray(types.int64, 1, 'C', addrspace=0)

shared_mangled = mangle_type(shared_array)
generic_mangled = mangle_type(generic_array)

print(shared_mangled)
print(generic_mangled)
assert shared_mangled != generic_mangled

gives

9CUDAArrayIxLi1E1C7mutable7alignedE
9CUDAArrayIxLi1E1C7mutable7alignedE
Traceback (most recent call last):
  File "/home/gmarkall/numbadev/issues/numba-cuda-236/mangle_test.py", line 13, in <module>
    assert shared_mangled != generic_mangled
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
AssertionError
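A possible direction, sketched under the assumption that the parent class keeps the (basename, args) shape of mangling_args implied by the output above (illustrative, not the PR's code):

from numba import types

class CUDAArray(types.Array):
    def __init__(self, dtype, ndim, layout, addrspace=0):
        self.addrspace = addrspace
        super().__init__(dtype, ndim, layout)

    @property
    def mangling_args(self):
        base, args = super().mangling_args
        # Append the address space so that types differing only in
        # address space mangle to distinct symbols.
        return base, (*args, "addrspace%d" % self.addrspace)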


Similarly, what about unification (the unify() method) and conversion (can_convert_to())? If unify() is not implemented, then all CUDA arrays will end up unifying to Array types instead, even if the types being unified are all in the same address space.

Conversions will also lose address space information, or perhaps even allow invalid conversions - I think we should not allow conversion from the shared to the local address space, for example, but conversions to the generic address space should always be OK.
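Again as an illustrative sketch (assuming the constructor shown above, NVVM's generic address space numbered 0, and numba's key / unify / can_convert_to hooks on array types), address-space-aware unification and conversion might look like:

from numba import types

GENERIC = 0

class CUDAArray(types.Array):
    def __init__(self, dtype, ndim, layout, addrspace=GENERIC):
        self.addrspace = addrspace
        super().__init__(dtype, ndim, layout)

    @property
    def key(self):
        # Distinct address spaces must compare as distinct types.
        return super().key, self.addrspace

    def unify(self, typingctx, other):
        if (isinstance(other, CUDAArray)
                and other.dtype == self.dtype
                and other.ndim == self.ndim):
            layout = self.layout if self.layout == other.layout else "A"
            # When both sides share an address space it is preserved;
            # otherwise fall back to the generic one.
            addrspace = (self.addrspace
                         if self.addrspace == other.addrspace
                         else GENERIC)
            return CUDAArray(self.dtype, self.ndim, layout,
                             addrspace=addrspace)

    def can_convert_to(self, typingctx, other):
        # Allow conversion only within the same address space or into
        # the generic one; e.g. shared -> local is rejected.
        if isinstance(other, CUDAArray) and other.addrspace not in (
                self.addrspace, GENERIC):
            return None
        return super().can_convert_to(typingctx, other)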

  # the CUDA Array Interface.
  try:
-     return typeof(val, Purpose.argument)
+     tp = typeof(val, Purpose.argument)

I'm concerned this could have a non-trivial impact on kernel launch time. Can you do a microbenchmark to check how much this impacts the latency of launches with various numbers of array arguments?
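A hypothetical shape for such a microbenchmark (kernel bodies, argument counts, and iteration counts are illustrative):

import time
import numpy as np
from numba import cuda

def make_kernel(nargs):
    # Build a no-op kernel taking nargs array arguments.
    params = ", ".join("a%d" % i for i in range(nargs))
    ns = {}
    exec("def k(%s):\n    pass" % params, ns)
    return cuda.jit(ns["k"])

for nargs in (1, 2, 4, 8):
    kernel = make_kernel(nargs)
    arrays = [cuda.to_device(np.zeros(16)) for _ in range(nargs)]
    kernel[1, 1](*arrays)  # warm-up: trigger compilation
    start = time.perf_counter()
    for _ in range(1000):
        kernel[1, 1](*arrays)
    cuda.synchronize()
    elapsed = time.perf_counter() - start
    # elapsed / 1000 launches, reported in microseconds per launch
    print("%d array args: %.1f us per launch" % (nargs, elapsed * 1e3))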

@gmarkall (Contributor)

all arrays inside cuda.jit now use CUDAArray instead of types.Array. That breaks some APIs, such as requesting an implementation with a general-purpose array signature, which is why many tests had to be updated;

I think this will break user code too - if we can find a way to avoid doing that I'd strongly prefer to.

@ZzEeKkAa (Contributor, Author)

all arrays inside cuda.jit now use CUDAArray instead of types.Array. That breaks some APIs, such as requesting an implementation with a general-purpose array signature, which is why many tests had to be updated;

I think this will break user code too - if we can find a way to avoid doing that I'd strongly prefer to.

Yes, that's what I'm worried about. I'm still thinking about potential ways to avoid it.

@CLAassistant

CLA assistant check
Thank you for your submission! We really appreciate it. Like many open source projects, we ask that you sign our Contributor License Agreement before we can accept your contribution.
You have signed the CLA already but the status is still pending? Let us recheck it.

gmarkall added a commit that referenced this pull request Dec 2, 2025
This change adds the "dwarfAddressSpace" attribute to debug metadata for
CUDA shared memory pointers, enabling debuggers to correctly identify
the memory location of variables.

I chose to add the address space tracking in the lowering phase, rather
than modifying the underlying typing infrastructure (ArrayModel,
PointerModel), for the following reasons:
1) There is an ongoing effort to decouple from Numba's typing system,
but the default behavior still redirects to Numba;
2) There is a WIP
[PR#236](#236) introducing a
CUDAArray type and implementation with address space information.

When either of the above is completed, there will be a cleaner approach
to updating this patch.

So, in this change:
1) Detection is added in CUDALower for Numba ir.Call nodes to find
cuda.shared.array() calls; a flag is set so that the subsequent
storevar() records the name-to-address-space mapping, which is later
consulted when emitting debug info.
2) A mapping from NVVM address spaces to DWARF address classes is added
in order to emit "dwarfAddressSpace" on the DIDerivedType for the
pointer member "data" of the CUDA array descriptor.
3) A new test is added to make sure a shared array and a regular local
array are distinguished (a sketch of such a kernel follows below).

This fixes nvbug#5643016.
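For illustration, here is the kind of kernel such a test would need to distinguish - a sketch using the public numba-cuda API, not the actual test:

import numpy as np
from numba import cuda, float32

@cuda.jit(debug=True, opt=False)
def kernel(out):
    # shared lives in NVVM addrspace(3); its data pointer's debug
    # metadata should carry dwarfAddressSpace, unlike the local array's.
    shared = cuda.shared.array(16, dtype=float32)
    local = cuda.local.array(16, dtype=float32)
    shared[0] = 1.0
    local[0] = 2.0
    out[0] = shared[0] + local[0]

out = cuda.to_device(np.zeros(1, dtype=np.float32))
kernel[1, 1](out)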

---------

Co-authored-by: Graham Markall <[email protected]>
