-
-
Notifications
You must be signed in to change notification settings - Fork 5.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Make StridedReinterpretArray
's get/setindex
pointer based.
#44186
Conversation
@N5N3 this PR seems to provide a significant performance boost for reinterpreted arrays. If you're happy with this, would you mind recommending some reviewers so that it might move ahead? Ref: https://discourse.julialang.org/t/reinterpretedarray-performance-even-worse-on-1-8/80102/27 |
if its root parent isa `Array` and it is dense like. Also add missing `pointer` for `FasterContiguousSubArray`
Since `getindex`/`setindex!` might be strided-based, we use `WrapperArray{T,N,<:Array}` to make sure the general fallback is tested thoroughly.
This PR is mainly for the performance of a reinterpreted (contigious) buffer. |
Gentle bump |
One possible use case that might benefit from this PR: https://discourse.julialang.org/t/slowdown-with-reinterpret/91749 |
In fact reinterpret helps a lot when different approaches to handle e.g. coordinate arrays for discretization grids need to be reconciled. E.g. mesh generators written in C/C++ may return a 3xn matrix of coordinates, and this could be readily reinterpreted e.g. as a Vector{Point3}, and vice versa. So I think the use case is not very exotic. |
@nanosoldier |
Your package evaluation job has completed - possible new issues were detected. A full report can be found here. |
bump, this helps greatly for some of the long-standing performance issues w.r.t. |
I removed the |
Unfortunately, I can confirm this PR blocks Vectorization in some simple cases. julia> typeof(aa)
Base.ReinterpretArray{Int64, 2, Float64, Matrix{Float64}, false}
julia> @btime fill!($aa, 1);
1.590 μs (0 allocations: 0 bytes)
julia> Base.check_store(::Array) = false
julia> @btime fill!($aa, 1);
1.010 μs (0 allocations: 0 bytes) |
Ah okay. Is that because this intercepts at sort of a bad point, so rather than doing all-native load/store/bitcast for simple cases like Int64<->Float64, now those always go through our unsafe primitives, that are only supposed to be for C-compatibility, of unsafe load/store, and then lose out on the efficiency benefits of the native primitives? |
I thought LLVM might prefer |
I think there is roughly 2 cases here:
For isprimitivetype, it looks like we might be only using native operations currently, and those will end in LLVM as a load/store into registers. That seems the case where it looks like it should be already capable of emitting the optimal code, so this can only make it worse by using unsafe operations which requires the compiler to destroy useful information (e.g. vectorizability). For everything else, it looks like we always use a native load from the array too, but for other cases we know that will turn into a memcpy to a stack slot, then we do a move from that stack slot to a Ref, then we do an unsafe_load of that Ref, which we know will also turn into a memcpy to yet another stack slot, and finally we return that, which should do one final copy from that stack slot to the sret. After inlining and optimizations, most of those copies should go away (those are the same copies we would expect if this wasn't a reinterpret, but merely a copy via a |
Tried some bisect locally. |
Sure, but we know doing stuff with |
Tried to play with LLVM a bit. Looks like LV dislikes the GC preserved |
Also turn off ptr-indexing for `Array`-based storage once element size keeps identity.
We should probably now define that |
Yes, I mean |
This PR make
StridedReinterpretArray
'sget/setindex
pure pointer based if its root parent is aArray
.Thus a "Dense"
ReinterpretArray
should behave more like aArray
.Some examples with better performance:
Test has been extended thus all branches should be tested.