Conversation

@ZzEeKkAa
Contributor

@ZzEeKkAa ZzEeKkAa commented Jul 21, 2025

Vendor the NopythonTypeInference pass and modify it to allow returning array views from device functions, so that the original array can be mutated through the returned slice:

@cuda.jit(device=True, forceinline=True)
def slice_array(a, x_id, x_size, y_id, y_size):
    return a[
        x_id * x_size : (x_id + 1) * x_size : 1,
        y_id * y_size : (y_id + 1) * y_size : 1,
    ]
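
With this in place, a kernel can mutate the original array through the returned view. A hypothetical usage sketch (the grid and tile parameters are illustrative, not from this PR):

from numba import cuda

@cuda.jit
def kernel(a):
    # Each (block, thread) pair takes a distinct 2x2 tile view of `a` and
    # writes through it; the write lands in `a` itself because the
    # returned slice is a view, not a copy.
    tile = slice_array(a, cuda.blockIdx.x, 2, cuda.threadIdx.x, 2)
    tile[0, 0] = 1.0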

Fixes: #221

How it works

Instead of maintaining two separate lists of cast values and argument values, the pass populates a whitelist of variables that may be returned. Ideally this should be upstreamed to Numba, since exactly the same problem exists there; it only occurs in nopython mode with NRT disabled.
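
A minimal sketch of the whitelist idea (illustrative only, not the vendored code):

from numba.core import ir

def collect_returnable_vars(func_ir, arg_names):
    # Start from the function arguments; anything derived from them by
    # aliasing or slicing (a view) is also considered safe to return.
    whitelist = set(arg_names)
    for block in func_ir.blocks.values():
        for inst in block.body:
            if not isinstance(inst, ir.Assign):
                continue
            value = inst.value
            if isinstance(value, ir.Var) and value.name in whitelist:
                # b = a: a simple alias of a whitelisted variable
                whitelist.add(inst.target.name)
            elif (
                isinstance(value, ir.Expr)
                and value.op == "getitem"
                and value.value.name in whitelist
            ):
                # b = a[...]: a view of a whitelisted array
                whitelist.add(inst.target.name)
    return whitelist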

Why is it safe

We are practically just making a view of an array, not creating a new array, so no memory allocations or leaks are introduced.
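
The same view semantics are easy to demonstrate with NumPy on the host:

import numpy as np

a = np.arange(6, dtype=np.float32).reshape(2, 3)
v = a[0:1, 0:2]         # basic slicing returns a view, not a copy
assert v.base is a      # the view borrows the parent's buffer
v[0, 0] = 42.0
assert a[0, 0] == 42.0  # writes through the view land in the original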

@copy-pr-bot

copy-pr-bot bot commented Jul 21, 2025

Auto-sync is disabled for draft pull requests in this repository. Workflows must be run manually.

@gmarkall gmarkall added the 2 - In Progress Currently a work in progress label Jul 21, 2025
@gmarkall
Contributor

/ok to test

Comment on lines 315 to 316
if inst.value.value.name in whitelist_vars:
    whitelist_vars.add(inst.target.name)
Contributor

Is there a danger that this misses transitively allowing variables where the blocks aren't visited in the correct order? Does propagation of allowed variables need to iterate to a fixpoint instead? I'm thinking of a case like

if cond:
    b = a[:, 1]
c = b
return c

If the block after the if is traversed first, is there a risk that returning c is disallowed?
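
One way to make the propagation order-independent is to iterate to a fixpoint. A minimal sketch (derives_from_whitelist is a hypothetical predicate for "this value aliases or takes a view of a whitelisted variable"):

from numba.core import ir

def propagate_to_fixpoint(func_ir, whitelist):
    # Re-scan all blocks until no new variables are added, so the result
    # cannot depend on the order in which blocks are visited.
    changed = True
    while changed:
        changed = False
        for block in func_ir.blocks.values():
            for inst in block.body:
                if not isinstance(inst, ir.Assign):
                    continue
                if (
                    derives_from_whitelist(inst.value, whitelist)
                    and inst.target.name not in whitelist
                ):
                    whitelist.add(inst.target.name)
                    changed = True
    return whitelist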

Contributor Author

Yeah, that's a valid point. My thinking was that blocks are properly ordered, and a variable can only be referenced below its definition. Is it possible for blocks to be unordered or nested?

Contributor

I was a bit apprehensive because I'm not certain what the ordering is, or is guaranteed to be. I'm also wondering whether phi nodes will be a problem.

Contributor Author

I've updated it to use a forest of trees to eliminate any issues with block ordering; a sketch of the idea follows.
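
A minimal sketch of what such a derivation forest could look like (illustrative only, not the vendored pass):

from numba.core import ir

def build_derivation_forest(func_ir):
    # Map every assigned variable to the variable it derives from (alias
    # or view). Each tree in the forest is rooted at a variable with no
    # recorded parent, such as a function argument.
    parent = {}
    for block in func_ir.blocks.values():
        for inst in block.body:
            if not isinstance(inst, ir.Assign):
                continue
            value = inst.value
            if isinstance(value, ir.Var):
                parent[inst.target.name] = value.name
            elif isinstance(value, ir.Expr) and value.op == "getitem":
                parent[inst.target.name] = value.value.name
    return parent

def is_arg_backed(name, parent, arg_names):
    # Walk up to the root; block visit order is irrelevant because the
    # whole forest is built before any query is answered.
    while name in parent:
        name = parent[name]
    return name in arg_names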

Contributor

@gmarkall gmarkall left a comment

I think this is a good idea in principle. I have a couple of questions on the diff (and await the addition of tests).

@gmarkall gmarkall added 4 - Waiting on author Waiting for author to respond to review and removed 2 - In Progress Currently a work in progress labels Jul 22, 2025


def array_local(shape, dtype):
    return cuda.local.array(shape, dtype=dtype)
Contributor Author

@gmarkall I know this is conceptually wrong, since we are trying to return a pointer to stack memory. However, if we mark the function forceinline it should turn into valid code; as far as I know, though, it is against LLVM's design to generate invalid code that only becomes valid because of forced inlining. Do you have any idea how this could potentially be achieved? I have one use case in nvmath that would benefit from it.

@ZzEeKkAa ZzEeKkAa changed the title [WIP] Feature: allow return array Feature: allow return array Jul 22, 2025
@ZzEeKkAa ZzEeKkAa marked this pull request as ready for review July 22, 2025 14:23
@copy-pr-bot

copy-pr-bot bot commented Jul 22, 2025

Auto-sync is disabled for ready for review pull requests in this repository. Workflows must be run manually.


@ZzEeKkAa ZzEeKkAa added 4 - Waiting on reviewer Waiting for reviewer to respond to author and removed 4 - Waiting on author Waiting for author to respond to review labels Jul 22, 2025
@ZzEeKkAa ZzEeKkAa self-assigned this Jul 22, 2025
@ZzEeKkAa
Contributor Author

/ok to test

@ZzEeKkAa ZzEeKkAa requested a review from gmarkall July 22, 2025 14:24
@ZzEeKkAa
Contributor Author

/ok to test

@ZzEeKkAa
Contributor Author

Is it me messing up the test, or is this out of scope for this PR:

Compilation is falling back to object mode WITHOUT looplifting enabled because Function "init_xoroshiro128p_states_cpu" failed type inference due to: Invalid use of type(CPUDispatcher(<function init_xoroshiro128p_state at 0x777b44828400>)) with parameters (array(Record(s0[type=uint64;offset=0],s1[type=uint64;offset=8];16;True), 1d, C), Literal[int](0), uint64)

@gmarkall
Contributor

Is it me messing up the test, or is this out of scope for this PR:

That's not you, the simulator always does that. It's a bit hard to fix and not really critical so it's never got to the top of the priority list.

@ZzEeKkAa
Contributor Author

/ok to test

@ZzEeKkAa
Contributor Author

/ok to test

@ZzEeKkAa ZzEeKkAa added 4 - Waiting on reviewer Waiting for reviewer to respond to author and removed 4 - Waiting on author Waiting for author to respond to review labels Jul 31, 2025
Contributor

@gmarkall gmarkall left a comment

Thanks for the fixes! I tried adding a few more test cases and I found that arguments aren't tracked through tuples - I've pushed these now, and the ones that use tuples to hold array arguments are the failing ones.

  • For "getitem" ops (presently handled, but they don't traverse tuples), I think it will be necessary to ensure that all of the tuple elements are an argument.
  • For "static_getitem" ops (presently not handled in the code), it should be sufficient to ensure that only the indexed item in the tuple is an argument.

@gmarkall
Contributor

gmarkall commented Aug 1, 2025

/ok to test

@gmarkall gmarkall added 4 - Waiting on author Waiting for author to respond to review and removed 4 - Waiting on reviewer Waiting for reviewer to respond to author labels Aug 1, 2025
@ZzEeKkAa
Contributor Author

ZzEeKkAa commented Aug 5, 2025

/ok to test

@ZzEeKkAa
Contributor Author

ZzEeKkAa commented Aug 5, 2025

/ok to test

@gmarkall gmarkall added 4 - Waiting on reviewer Waiting for reviewer to respond to author and removed 4 - Waiting on author Waiting for author to respond to review labels Aug 6, 2025
Contributor

@gmarkall gmarkall left a comment

<posting a review because I have a couple of pending comments and don't want to lose them if I push an update>

Comment on lines +234 to +253
@pytest.mark.xfail(reason="Returning local arrays is not yet supported")
@skip_on_cudasim("type inference is unsupported in the simulator")
def test_array_local(self):
    @cuda.jit
    def array_local_fp32(size):
        return cuda.local.array(size, dtype=np.float32)

    @cuda.jit
    def kernel(r):
        x = array_local_fp32(2)
        x[0], x[1] = 1.0, 2.0

        r[0] = x[0] + x[1]

    r = np.zeros(1, dtype=np.float32)

    kernel[1, 1](r)

    np.testing.assert_equal(r, [3.0])

Contributor

I don't think we should have this xfailing test - it contradicts the test_array_local_illegal test above. It's not clear to me how we could have a valid way to return a local array.

Suggested change
@pytest.mark.xfail(reason="Returning local arrays is not yet supported")
@skip_on_cudasim("type inference is unsupported in the simulator")
def test_array_local(self):
    @cuda.jit
    def array_local_fp32(size):
        return cuda.local.array(size, dtype=np.float32)

    @cuda.jit
    def kernel(r):
        x = array_local_fp32(2)
        x[0], x[1] = 1.0, 2.0
        r[0] = x[0] + x[1]

    r = np.zeros(1, dtype=np.float32)
    kernel[1, 1](r)
    np.testing.assert_equal(r, [3.0])

Contributor Author

In C++ it is possible with constexpr

return b

# c in the loop is a local array
# TODO: do we want to support local and shared arrays?
Contributor

I don't think we want to support local and shared arrays being returned from a device function that declares them. Local arrays seem like a case we shouldn't support, but I'm less sure about shared arrays - does CUDA C++ allow you to return a shared array that a device function created?

Contributor Author

In C++ it is possible with constexpr

@gmarkall
Contributor

/ok to test

@gmarkall gmarkall closed this Aug 18, 2025
@gmarkall gmarkall reopened this Aug 18, 2025
@NVIDIA NVIDIA deleted a comment from CLAassistant Sep 29, 2025