Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Allocating empty array is 4x slower than Java #45160

Closed
oscardssmith opened this issue May 3, 2022 · 16 comments
Closed

Allocating empty array is 4x slower than Java #45160

oscardssmith opened this issue May 3, 2022 · 16 comments
Labels
performance Must go faster

Comments

@oscardssmith
Copy link
Member

oscardssmith commented May 3, 2022

This seems unreasonably slow. It feels like we should be able to do better here.

function f()
   x=Int[]
   push!(x, 23)
end
@btime f()
 29.798 ns (2 allocations: 144 bytes)

while with Java

package org.sample;

import org.openjdk.jmh.annotations.Benchmark;
import java.util.*;
public class MyBenchmark {

    @Benchmark
    public void testMethod() {
        List x = new ArrayList<Integer>();
        x.add(12);
    }

}

(using jmh to benchmark) gives 4.5ns per iteration

Note that the push! isn't the slow part, I just needed it to make sure Java wasn't removing the whole thing since without it, the Java time was sub ns per operation.

@oscardssmith oscardssmith added the performance Must go faster label May 3, 2022
@yuyichao yuyichao changed the title Allocating small objects is slow Allocating array is slow May 3, 2022
@KristofferC
Copy link
Member

What would be fast?

It feels to me that for an issue like this to be useful at all it either needs some comparison to e.g. a different language that has the same semantics for creating an empty array or an analysis of what the time is spent on and what of that is wasteful. Otherwise, issues like this can be created for virtually any function and it is unclear when they can be closed.

@bvdmitri
Copy link
Contributor

bvdmitri commented May 3, 2022

Isn't that just the performance of malloc?

julia> @btime Int[]
  27.517 ns (1 allocation: 64 bytes)
Int64[]

julia> @btime Base.Libc.malloc(0)
  28.919 ns (0 allocations: 0 bytes)
Ptr{Nothing} @0x00006002925cc010

I assume one extra allocation is due to need to keep length.

@StefanKarpinski
Copy link
Member

Arguably we should be faster than malloc though.

@bvdmitri
Copy link
Contributor

bvdmitri commented May 3, 2022

Well, it might be possible, but does it make sense to benchmark allocation function without any context? It is possible to return instantly and shift all of the complexity from allocation to the garbage collection later on (as Java does), but it does not make an application faster.

@KristofferC
Copy link
Member

Yes, as it is right now, it just feels like a blanket "Julia's allocation/GC system could be faster"-issue. Does it help getting any work done?

@oscardssmith
Copy link
Member Author

oscardssmith commented May 3, 2022

the reason I made this is I was bench-marking JuliaMath/Primes.jl#115 and saw that creating an empty Factorization (which is just a Vector{Pair{Int,Int}} was taking roughly 25% of the time to factor small integers. For an actual comparison, see the updated first post

@oscardssmith oscardssmith changed the title Allocating array is slow Allocating empty array is 4x slower than Java May 3, 2022
@oscardssmith
Copy link
Member Author

For reference, Int[] seems much slower to allocate than other objects in Julia. (these benchmarks are on a different computer than above)

julia> @btime Ref(Int);
  4.740 ns (1 allocation: 16 bytes)

julia> @btime Base._string_n(40);
  8.378 ns (1 allocation: 64 bytes)

julia> @btime Ref((0,0,0,0));
  4.760 ns (1 allocation: 48 bytes)

julia> @btime Ref((0,0,0,0,0,0));
  4.980 ns (1 allocation: 64 bytes)

julia> @btime Int[];
  21.435 ns (1 allocation: 64 bytes)

@oscardssmith
Copy link
Member Author

I think I've figured out the problem. All the time is in the C function _new_array in array.c. The problem is that since constructing an Array does a ccall, we lose a lot of static information that leads to a ton of if statements. If our arrays were implemented in Julia, a call to Int[] in a function would be able to know that the eltype of the array was constant, and use that info to figure out all of the fields of the array at compile time. Since there is a C-call, this optimization can't happen, so every time we create an array, we have to do a bunch of logic to determine information about how the array will be stored in memory, even though that information should be available at compile time.

@gbaraldi
Copy link
Member

gbaraldi commented May 5, 2022

Couldn't we pass the info to C anyway? Since array allocation is already special cased?

@omicron321
Copy link

the reason I made this is I was bench-marking JuliaMath/Primes.jl#115 and saw that creating an empty Factorization (which is just a Vector{Pair{Int,Int}} was taking roughly 25% of the time to factor small integers. For an actual comparison, see the updated first post

One solution to prevent thousands+ of allocations is to use pool of reusable objects.
Thus making allocation costs less relevant or much less preponderant.

@oscardssmith
Copy link
Member Author

That is technically possible, but really annoying. It would be way better if this was fast by default.

@jakobnissen
Copy link
Contributor

Might this be addressed by #46015 ?

@giordano
Copy link
Contributor

Partially addressed by #51319? On 1.9.3 I get

julia> using BenchmarkTools

julia> function f()
          x=Int[]
          push!(x, 23)
       end
f (generic function with 1 method)

julia> @btime f();
  50.489 ns (2 allocations: 144 bytes)

on 909bcea instead

julia> using BenchmarkTools

julia> function f()
          x=Int[]
          push!(x, 23)
       end
f (generic function with 1 method)

julia> @btime f();
  36.263 ns (2 allocations: 128 bytes)

@oscardssmith
Copy link
Member Author

@giordano can you add @btime Ref((0,0,0,0,0,0)) for comparison?

@giordano
Copy link
Contributor

Not much different

julia> @btime Ref((0,0,0,0,0,0));
  6.961 ns (1 allocation: 64 bytes) # 1.9.3
  6.971 ns (1 allocation: 64 bytes) # 909bceae8a60969dfceba4924142e5efee459d47

@oscardssmith
Copy link
Member Author

So all in all, I think this is closable. It's still a little slow, but it is noticeably faster than before.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
performance Must go faster
Projects
None yet
Development

No branches or pull requests

8 participants