Feature: allow return array #335
Conversation
numba_cuda/numba/cuda/compiler.py (outdated):

```python
if inst.value.value.name in whitelist_vars:
    whitelist_vars.add(inst.target.name)
```
Is there a danger that this misses transitively allowed variables when the blocks aren't visited in the correct order? Does propagation of allowed variables need to iterate to a fixpoint instead? I'm thinking of a case like:

```python
if cond:
    b = a[:, 1]
    c = b
return c
```

If the block after the `if` is traversed first, is there a risk that returning `c` is disallowed?
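A fixpoint propagation of the kind suggested here could be sketched as follows. This is a hypothetical illustration, not the PR's actual code: blocks are modelled simply as lists of (target, source) copy assignments, and the allowed set is grown until it stops changing, so the visit order of blocks no longer matters.

```python
def propagate_returnable(blocks, arg_names):
    """blocks: mapping of block label -> list of (target, source) copies.

    Returns the set of variable names that transitively alias an argument,
    computed by iterating to a fixpoint so block order is irrelevant.
    """
    allowed = set(arg_names)
    changed = True
    while changed:  # repeat until no new variables are added
        changed = False
        for label in blocks:  # traversal order does not matter
            for target, source in blocks[label]:
                if source in allowed and target not in allowed:
                    allowed.add(target)
                    changed = True
    return allowed


# The worrying case: the block defining c <- b is listed before b <- a.
blocks = {1: [("c", "b")], 0: [("b", "a")]}
print(propagate_returnable(blocks, {"a"}))  # {'a', 'b', 'c'}
```

A single pass in the order {1, then 0} would miss `c`; the fixpoint loop picks it up on the second iteration.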
Yep, that's a valid point. My thinking was that blocks are properly ordered, and a variable can only be referenced after the point where it is defined. Is it possible for blocks to be unordered or nested?
I was a bit apprehensive because I'm not certain what the ordering is, or is guaranteed to be. I'm also wondering whether phi nodes will be a problem.
I've updated the implementation to use a forest of trees, which eliminates any issues with block ordering.
gmarkall left a comment:
I think this is a good idea in principle. I have a couple of questions on the diff (and await the addition of tests).
```python
def array_local(shape, dtype):
    return cuda.local.array(shape, dtype=dtype)
```
@gmarkall I know this is conceptually wrong, since we are trying to return a pointer to stack memory. However, if we mark it forceinline, it should turn into valid code. As far as I know, though, it is against LLVM's design to generate invalid code that only becomes valid because of forced inlining. Do you have any idea how this could potentially be achieved? I have one use case in nvmath that would benefit from it.
Is it me messing up the test, or is this out of scope for this MR?
That's not you, the simulator always does that. It's a bit hard to fix and not really critical, so it's never got to the top of the priority list.
gmarkall left a comment:
Thanks for the fixes! I tried adding a few more test cases and found that arguments aren't tracked through tuples. I've pushed these now; the ones that use tuples to hold array arguments are the failing ones.
- For "getitem" ops (presently handled, but they don't traverse tuples), I think it will be necessary to ensure that all of the tuple elements are arguments.
- For "static_getitem" ops (presently not handled in the code), it should be sufficient to ensure that only the indexed item in the tuple is an argument.
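The distinction between the two ops can be sketched as below. This is a hypothetical illustration rather than the PR's code: a tuple is modelled as the list of variable names it was built from, and `allowed` is the set of names known to alias arguments.

```python
def getitem_allowed(tuple_elems, allowed):
    # "getitem" uses a runtime index, so any element could be the one
    # returned: all elements must be allowed.
    return all(e in allowed for e in tuple_elems)


def static_getitem_allowed(tuple_elems, index, allowed):
    # "static_getitem" has a compile-time constant index, so only the
    # selected element needs to be allowed.
    return tuple_elems[index] in allowed
```

With `allowed = {"a"}`, `getitem_allowed(["a", "b"], allowed)` is False (the runtime index might select `b`), while `static_getitem_allowed(["b", "a"], 1, allowed)` is True.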
gmarkall left a comment:
<posting a review because I have a couple of pending comments and don't want to lose them if I push an update>
```python
@pytest.mark.xfail(reason="Returning local arrays is not yet supported")
@skip_on_cudasim("type inference is unsupported in the simulator")
def test_array_local(self):
    @cuda.jit
    def array_local_fp32(size):
        return cuda.local.array(size, dtype=np.float32)

    @cuda.jit
    def kernel(r):
        x = array_local_fp32(2)
        x[0], x[1] = 1.0, 2.0
        r[0] = x[0] + x[1]

    r = np.zeros(1, dtype=np.float32)
    kernel[1, 1](r)
    np.testing.assert_equal(r, [3.0])
```
I don't think we should have this xfailing test - it contradicts the test_array_local_illegal test above. It's not clear to me how we could have a valid way to return a local array.
In C++ it is possible with constexpr
```python
    return b

# c in the loop is a local array
# TODO: do we want to support local and shared arrays?
```
I don't think we want to support local and shared arrays being returned from a device function that declares them. Local arrays seem like a case we shouldn't support, but I'm less sure about shared arrays - does CUDA C++ allow you to return a shared array that a device function created?
In C++ it is possible with constexpr
Vendor the NopythonTypeInference pass and modify it to allow array mutations.

Fixes: #221

How it works

Instead of maintaining two lists of cast values and argument values, it populates a whitelist of variables that may be returned. Ideally this should be upstreamed to numba, since exactly the same problem exists there. It only occurs in nopython mode with NRT disabled.

Why it is safe

We are practically just making a view of an array, not creating a new array, so no memory allocation or leak is introduced.
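The safety argument can be illustrated with a host-side NumPy sketch (not code from the PR): returning a slice produces a view that aliases the caller's buffer, so no new allocation happens and nothing can leak or dangle while the argument is alive.

```python
import numpy as np


def take_column(a):
    # Returns a strided view of the caller's array; no new buffer is
    # allocated by this function.
    return a[:, 1]


buf = np.arange(6, dtype=np.float32).reshape(3, 2)
col = take_column(buf)
assert np.shares_memory(col, buf)  # same underlying allocation
```

This is exactly what the device function case is doing when it returns a slice of an array argument: the lifetime of the returned view is bounded by the lifetime of the argument, unlike the local-array case discussed above.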