Make `StridedReinterpretArray`'s `get/setindex` pointer based. #44186

N5N3 · 2022-02-15T06:23:27Z

This PR makes `StridedReinterpretArray`'s `getindex`/`setindex!` purely pointer-based if its root parent is an `Array`.
Thus a "dense" `ReinterpretArray` should behave more like an `Array`.
Some examples with better performance:

julia> using LinearAlgebra, BenchmarkTools
julia> a = randn(ComplexF64, 100, 100); b = randn(100); c = a * b;
julia> aa = reinterpret(Float64, a); cc = reinterpret(Float64, c);
julia> @btime LinearAlgebra.generic_matvecmul!($cc, 'N', $aa, $b, LinearAlgebra.MulAddMul(true,false));
  2.544 μs (0 allocations: 0 bytes)  # on master 116.900 μs (0 allocations: 0 bytes)

julia> f(x, y) = @inbounds @simd ivdep for i in eachindex(x,y) # ivdep here is useless on master
       x[i] = ntoh(y[i])
       end

julia> a = reinterpret(Float64, rand(UInt8, 16000));

julia> @btime f($a, $a)
  124.388 ns (0 allocations: 0 bytes) # on master 697.260 ns (0 allocations: 0 bytes)

The tests have been extended so that all branches should be covered.
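
As a rough illustration of what "pointer based" means here, a dense reinterpreted `Array` can be read and written by loading through a retyped pointer to the parent's storage. This is only a minimal sketch under stated assumptions, not the PR's actual implementation: `ptr_getindex` and `ptr_setindex!` are hypothetical helper names, the element type is assumed to be isbits, and no bounds checking is done.

```julia
# Hypothetical helpers sketching pointer-based indexing into a dense parent Array.
# Assumes T is an isbits type and that index i is in bounds for the reinterpreted view.
function ptr_getindex(::Type{T}, parent::Array, i::Integer) where {T}
    GC.@preserve parent begin
        p = convert(Ptr{T}, pointer(parent))  # retype the parent's data pointer
        unsafe_load(p, i)                     # load the i-th T-sized element
    end
end

function ptr_setindex!(::Type{T}, parent::Array, v, i::Integer) where {T}
    GC.@preserve parent begin
        p = convert(Ptr{T}, pointer(parent))
        unsafe_store!(p, convert(T, v), i)    # store the i-th T-sized element
    end
    return parent
end

a = randn(ComplexF64, 4)
aa = reinterpret(Float64, a)   # 8-element view over the same memory as `a`
```

With these definitions, `ptr_getindex(Float64, a, i)` should agree with `aa[i]` for every valid `i`, and a store through `ptr_setindex!` is visible through both `a` and `aa`.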

jishnub · 2022-04-29T05:54:08Z

@N5N3 this PR seems to provide a significant performance boost for reinterpreted arrays. If you're happy with this, would you mind recommending some reviewers so that it might move ahead?

Ref: https://discourse.julialang.org/t/reinterpretedarray-performance-even-worse-on-1-8/80102/27

N5N3 · 2022-04-29T08:23:03Z

This PR is mainly for the performance of a reinterpreted (contiguous) buffer.
That should be a common usage, so I think a specific optimization is OK.
The main concern might be safety. I hope @vtjnash has time to give some suggestions.

jishnub · 2022-11-16T10:53:15Z

Gentle bump

ronisbr · 2022-12-16T17:07:41Z

One possible use case that might benefit from this PR:

https://discourse.julialang.org/t/slowdown-with-reinterpret/91749

j-fu · 2022-12-16T17:28:08Z

In fact reinterpret helps a lot when different approaches to handle e.g. coordinate arrays for discretization grids need to be reconciled. E.g. mesh generators written in C/C++ may return a 3xn matrix of coordinates, and this could be readily reinterpreted e.g. as a Vector{Point3}, and vice versa. So I think the use case is not very exotic.
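
A minimal sketch of that coordinate use case, assuming Julia 1.6 or later for `reinterpret(reshape, ...)`. `Point3` is a hypothetical isbits struct standing in for whatever point type a geometry package provides:

```julia
# Point3 is a hypothetical isbits struct used only for illustration.
struct Point3
    x::Float64
    y::Float64
    z::Float64
end

# 3×n coordinate matrix, one column per point (as a C/C++ mesh generator might return).
coords = Float64[1 4 7 10;
                 2 5 8 11;
                 3 6 9 12]

pts  = reinterpret(reshape, Point3, coords)   # n-element vector of Point3, no copy
back = reinterpret(reshape, Float64, pts)     # 3×n view over the same memory
```

Both `pts` and `back` alias `coords`, so a write through one view is visible through the others.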

gbaraldi · 2022-12-22T19:41:20Z

@nanosoldier runtests()

nanosoldier · 2022-12-23T10:36:12Z

Your package evaluation job has completed - possible new issues were detected. A full report can be found here.

Moelf · 2023-10-10T18:21:01Z

bump, this helps greatly for some of the long-standing performance issues w.r.t. StridedReinterpretArray

vtjnash · 2023-11-01T21:17:48Z

I removed the unsafe_convert method added here, since it does not seem correct, vis a vis 0a0bd00. Let's see how CI is doing

N5N3 · 2023-11-02T17:16:32Z

Unfortunately, I can confirm this PR blocks vectorization in some simple cases.

julia> typeof(aa)
Base.ReinterpretArray{Int64, 2, Float64, Matrix{Float64}, false}

julia> @btime fill!($aa, 1);
  1.590 μs (0 allocations: 0 bytes)

julia> Base.check_store(::Array) = false

julia> @btime fill!($aa, 1);
  1.010 μs (0 allocations: 0 bytes)

vtjnash · 2023-11-02T17:48:49Z

Ah okay. Is that because this intercepts at sort of a bad point, so rather than doing all-native load/store/bitcast for simple cases like Int64<->Float64, now those always go through our unsafe primitives, that are only supposed to be for C-compatibility, of unsafe load/store, and then lose out on the efficiency benefits of the native primitives?

N5N3 · 2023-11-02T18:00:26Z

I thought LLVM might prefer unsafe_store!(p, v, li) rather than unsafe_store!(p + sizeof(T) * (li - 1), v).
But the "regression" is not fixed.
unsafe_store!(p, v, li) is fully intrinsic-based, and it does make the LLVM IR simpler, but LLVM still refuses to generate a vectorized block.
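
For reference, the two store forms mentioned above address memory the same way, because `Ptr` arithmetic in Julia is in bytes. A small sketch on a plain `Vector{Float64}` (the buffer and values are made up for illustration):

```julia
# Two ways to store an element through a raw pointer.
buf = zeros(Float64, 8)
GC.@preserve buf begin
    p = pointer(buf)
    unsafe_store!(p, 1.5, 3)                            # index form: 3rd element
    unsafe_store!(p + sizeof(Float64) * (4 - 1), 2.5)   # byte-offset form: 4th element
end
```

After this runs, `buf[3]` holds 1.5 and `buf[4]` holds 2.5; all other entries stay zero.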

vtjnash · 2023-11-02T18:15:05Z

I think there are roughly 2 cases here:

1. isprimitivetype for both source and dest
2. only isbitstype

For isprimitivetype, it looks like we might be only using native operations currently, and those will end in LLVM as a load/store into registers. That seems the case where it looks like it should be already capable of emitting the optimal code, so this can only make it worse by using unsafe operations which requires the compiler to destroy useful information (e.g. vectorizability).

For everything else, it looks like we always use a native load from the array too, but for other cases we know that will turn into a memcpy to a stack slot, then we do a move from that stack slot to a Ref, then we do an unsafe_load of that Ref, which we know will also turn into a memcpy to yet another stack slot, and finally we return that, which should do one final copy from that stack slot to the sret. After inlining and optimizations, most of those copies should go away (those are the same copies we would expect if this wasn't a reinterpret, but merely a copy via a Ref(x)[], so this should not be too unusual of IR)
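
The Ref round trip described for the second case can be sketched roughly as follows. This is an illustration only, not the code in Base: `bits_via_ref` is a hypothetical helper, and it assumes both types are isbits and have the same size.

```julia
# Hypothetical helper: reinterpret one isbits value by copying it into a Ref
# and then loading the same bytes back with a different element type.
function bits_via_ref(::Type{T}, x::S) where {T,S}
    @assert isbitstype(T) && isbitstype(S) && sizeof(T) == sizeof(S)
    r = Ref{S}(x)                              # copy x into a mutable slot
    GC.@preserve r begin
        p = Base.unsafe_convert(Ptr{S}, r)     # pointer to that slot
        unsafe_load(convert(Ptr{T}, p))        # reload the bytes as a T
    end
end

u = bits_via_ref(UInt64, 1.0)                  # bit pattern of 1.0
z = bits_via_ref(ComplexF32, (1f0, 2f0))       # two Float32s viewed as one ComplexF32
```

For primitive types this gives the same result as the scalar `reinterpret`, which is the native path the first case refers to.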

N5N3 · 2023-11-03T15:01:22Z

Tried some bisect locally.
This PR still works fine without #51319, so this looks like a fresh "regression".

vtjnash · 2023-11-03T20:28:19Z

Sure, but we know doing stuff with unsafe_ operations is expected to perform badly. The advantage of this PR seems to be that since it was already using the bad unsafe_ operations in many cases, that sometimes there are ways to make those more direct (with unsafe_load) instead of waiting for LLVM to transform them to something roughly equivalent

N5N3 · 2023-11-07T08:37:53Z

Tried to play with LLVM a bit. It looks like LV dislikes the GC-preserved MemoryRef.
Fortunately, Memory stores its Ptr directly and would be vectorizable after reinterpretation. If vectorization does matter, then the user can transform a reinterpreted Array into a reinterpreted + viewed/reshaped Memory to get full speed.

vtjnash · 2023-11-07T16:29:31Z

We should probably now define that parent(::Array) = getfield(array.ref.mem), like all the other types. But a viewed/reshaped Memory typically is an Array. Although maybe you meant you also needed it to be a non-mutable object?

N5N3 · 2023-11-07T17:07:56Z

> Although maybe you meant you also needed it to be a non-mutable object?

Yes, I mean Base.SubArray and Base.ReshapedArray

N5N3 added the arrays and performance (Must go faster) labels Feb 15, 2022

N5N3 force-pushed the Contigious branch from 2174197 to 6341e7b Compare February 15, 2022 08:55

N5N3 added 4 commits April 29, 2022 15:30

Replace @inline @propagate_inbounds with @propagate_inbounds

e40035e

Make ReinterpretArray's indexing pure pointer based

a6d1346

if its root parent isa `Array` and it is dense like. Also add missing `pointer` for `FasterContiguousSubArray`

Make StridedReinterpretArray's getindex effect-free

6cc5d9b

Extend test

75ee2e4

Since `getindex`/`setindex!` might be strided-based, we use `WrapperArray{T,N,<:Array}` to make sure the general fallback is tested thoroughly.

N5N3 force-pushed the Contigious branch from 6341e7b to 75ee2e4 Compare April 29, 2022 08:07

N5N3 requested a review from vtjnash April 29, 2022 08:15

Merge branch 'master' into Contigious

2cb1439

StefanKarpinski added the triage (This should be discussed on a triage call) label Dec 21, 2022

N5N3 mentioned this pull request Dec 23, 2022

Don't overload Base.unsafe_load OpenMendel/PGENFiles.jl#9

Merged

N5N3 mentioned this pull request Jan 6, 2023

Breaking up ReinterpretArray methods by typcasting type to enable more optimizations #48138

Closed

jishnub mentioned this pull request Oct 10, 2023

30x slower looping over reinterpret array #51658

Open

Moelf mentioned this pull request Oct 10, 2023

in-place broadcast (e.g. .+=) significanctly slower for reinterpreted array #48801

Open

oscardssmith added the forget me not (PRs that one wants to make sure aren't forgotten) label Oct 10, 2023

Merge branch 'master' into Contigious

58ec2d9

vtjnash added the merge me (PR is reviewed. Merge when all tests are passing) label and removed the triage (This should be discussed on a triage call) and forget me not (PRs that one wants to make sure aren't forgotten) labels Nov 1, 2023

N5N3 commented Nov 2, 2023

test/testhelpers/arrayindexingtypes.jl (outdated)

fix for wrappedarray

dbf0026

giordano reviewed Nov 2, 2023

test/reinterpretarray.jl (outdated)

exclude unrelated change and fix whitespace

1078fec

N5N3 removed the merge me (PR is reviewed. Merge when all tests are passing) label Nov 2, 2023

N5N3 marked this pull request as draft November 2, 2023 17:18

Allow ReshapedArray as wrapper and Memory as storage

a629884

Also turn off ptr-indexing for `Array`-based storage once element size keeps identity.

N5N3 marked this pull request as ready for review November 7, 2023 10:54

N5N3 merged commit 1972432 into JuliaLang:master Nov 8, 2023
5 of 7 checks passed

N5N3 deleted the Contigious branch November 8, 2023 15:36

vtjnash added a commit that referenced this pull request Nov 9, 2023

Revert "Make StridedReinterpretArray's get/setindex pointer based. (#44186)"

a8eda35

This reverts commit 1972432.

vtjnash mentioned this pull request Nov 9, 2023

Revert "Make StridedReinterpretArray's get/setindex pointer based." #52101

Closed
