
Questions #14

Closed
PallHaraldsson opened this issue Oct 30, 2023 · 8 comments

Comments

@PallHaraldsson

Hi,

It's an intriguing approach. I see you take over similar (basically "malloc" for Julia), and you reduce allocations, but the timing is still worse. Is that maybe not the case in general, since improving it is the whole point?

I like Bumper, since there are basically no allocations, so I'm confused why you have any at all. Probably it's for the output array?

Taking over similar seems clever; should it ideally be done for zeros, ones (and fill?) too, or does that in fact happen implicitly? I'm wondering, though, whether it should happen in all cases: what you do is, I think, so-called type piracy, so it's all or nothing?

I suppose this does not work on GPUs, despite my seeing something Flux-related. Neither does Bumper.jl... but do you think it would be plausible?

@ericphanson
Owner

ericphanson commented Oct 30, 2023

It's an intriguing approach. I see you take over similar (basically "malloc" for Julia), and you reduce allocations, but the timing is still worse. Is that maybe not the case in general, since improving it is the whole point?

It is a tiny bit better, but not much, in the Flux benchmark on the README:

# Baseline: Array
infer!(b, predictions, model, data): 0.499735 seconds (8.05 k allocations: 306.796 MiB, 6.47% gc time)
# Baseline: StrideArray
stride_data = StrideArray.(data)
infer!(b, predictions, model, stride_data): 0.364180 seconds (8.05 k allocations: 306.796 MiB, 8.32% gc time)
# Using AllocArray:
alloc_data = AllocArray.(data)
infer!(b, predictions, model, alloc_data): 0.351953 seconds (13.60 k allocations: 3.221 MiB)

i.e. using StrideArrays already provides a decent speedup here, and using AllocArrays gives a microscopic extra boost and many fewer allocations. (CheckedAllocArrays are slooow, but they are just for tests/debugging/development.) (See also:

AllocArrays.jl/test/flux.jl

Lines 138 to 146 in ac1d45e

# Note: for max perf, consider
# using Functors
# model = fmap(AllocArray ∘ PtrArray, model; exclude = x -> x isa AbstractArray)
# alloc_data = AllocArray.(PtrArray.(data))
# @showtime infer!(b, predictions, model, alloc_data)
# @showtime infer!(b, predictions, model, alloc_data)
# Together, that ensures everything is an `AllocArray(PtrArray(...))`
# This seems to help with runtime although not a huge amount,
# and doesn't really help with allocations.
)

I like Bumper, since there are basically no allocations, so I'm confused why you have any at all. Probably it's for the output array?

I'm not sure; they would need to be tracked down. Mutable structs could definitely cause allocations and are not tracked. Inference itself could allocate, and this code is not totally type stable, since which allocator is dispatched to is a runtime value. There could also be array allocations that evade the similar path: e.g., if some code constructs Vector{Float64}(undef, 10) directly, there's nothing we can do here.
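To make that last point concrete, here is a minimal self-contained sketch, using a hypothetical TrackedArray wrapper (an illustration, not AllocArrays' actual implementation): allocations routed through similar can be intercepted, while direct constructor calls cannot.

```julia
# Hypothetical wrapper that counts allocations made via `similar`.
# This is an illustration only, not AllocArrays' implementation.
struct TrackedArray{T,N} <: AbstractArray{T,N}
    data::Array{T,N}
end

const SIMILAR_CALLS = Ref(0)

Base.size(a::TrackedArray) = size(a.data)
Base.getindex(a::TrackedArray, i::Int...) = a.data[i...]
Base.setindex!(a::TrackedArray, v, i::Int...) = (a.data[i...] = v)

# We own `TrackedArray`, so extending `Base.similar` for it is legal;
# this hook is where a custom allocator could be swapped in.
function Base.similar(a::TrackedArray, ::Type{T}, dims::Dims) where {T}
    SIMILAR_CALLS[] += 1
    TrackedArray(Array{T}(undef, dims))
end

a = TrackedArray(rand(3, 3))
b = similar(a)                  # goes through our method: intercepted
c = Vector{Float64}(undef, 10)  # direct constructor: invisible to us
@show SIMILAR_CALLS[]           # 1
```

Only the `similar(a)` call increments the counter; the direct `Vector{Float64}(undef, 10)` never touches any method we control.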

Taking over similar seems clever; should it ideally be done for zeros, ones (and fill?) too, or does that in fact happen implicitly? I'm wondering, though, whether it should happen in all cases: what you do is, I think, so-called type piracy, so it's all or nothing?

This package does not do any type piracy. We provide a method for similar on AllocArrays (and CheckedAllocArrays), which are our own types, so we are free to add methods for them.
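For context, the rule of thumb, sketched here with a hypothetical MyVec type (AllocArrays does the analogous thing for its own array types): extending a Base function for a type you own is ordinary method extension, while redefining it for types you don't own is piracy.

```julia
# NOT piracy: `MyVec` is our own type, so adding a `Base.similar`
# method dispatching on it only affects code that uses `MyVec`.
struct MyVec{T} <: AbstractVector{T}
    data::Vector{T}
end
Base.size(v::MyVec) = size(v.data)
Base.getindex(v::MyVec, i::Int) = v.data[i]
Base.similar(v::MyVec, ::Type{T}, dims::Dims{1}) where {T} =
    MyVec(Vector{T}(undef, dims[1]))

# Piracy would be a method where every dispatched type belongs to
# someone else, silently changing behavior for all packages, e.g.:
#     Base.similar(v::Vector, ::Type{T}, dims::Dims) where {T} = ...
```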

zeros, ones, and fill could definitely allocate. zero(::AllocArray) does not, though (when called within with_allocator with a bump allocator), since it is defined as:

zero(x::AbstractArray{T}) where {T} = fill!(similar(x, typeof(zero(T))), zero(T))

I suppose this does not work on GPUs, despite my seeing something Flux-related. Neither does Bumper.jl... but do you think it would be plausible?

The current implementation doesn't, but I think it could work. Even with Bumper, I think you could have an AllocBuffer backed by GPU memory. I'm not sure when it would or wouldn't be beneficial.
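A purely speculative sketch of that idea. Both assumptions here are unverified: that AllocBuffer accepts an arbitrary caller-provided backing vector, and that bump-style pointer arithmetic composes with device memory at all.

```julia
using Bumper, CUDA  # assumes both packages are available

# ASSUMPTION: `AllocBuffer` accepts a caller-provided backing store.
# ASSUMPTION: downstream array wrappers tolerate device pointers.
gpu_storage = CUDA.zeros(UInt8, 2^20)   # 1 MiB of device memory
buf = AllocBuffer(gpu_storage)          # hypothetical GPU-backed bump buffer

# One would then hope to bump-allocate device arrays inside a scope:
# @no_escape buf begin
#     x = @alloc(Float32, 1024)  # would live in `gpu_storage`
#     # ... use x ...
# end
```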

@PallHaraldsson
Author

PallHaraldsson commented Oct 31, 2023

FYI: there's a "crash"/abort here: https://s3.amazonaws.com/julialang-reports/nanosoldier/pkgeval/by_hash/9fe9346_vs_9b20aca/report.html

Seems like a non-issue if it's just an abort, i.e. a timeout?

Any idea why it aborted after less time than the "ok" runs?

@ericphanson
Owner

ericphanson commented Oct 31, 2023

From the linked logs:

     Testing Running tests...
Skipping Base.active_repl
Skipping Base.active_repl_backend
sum(model, data): 0.023210 seconds (23.69 k allocations: 2.159 MiB)
sum(model, alloc_data): 0.298628 seconds (100.03 k allocations: 3.663 MiB, 92.30% gc time)
bumper_run(model, alloc_data): 0.027096 seconds (140.06 k allocations: 35.420 MiB)
Instruction does not dominate all uses!
  %161 = call token (...) @llvm.julia.gc_preserve_begin({} addrspace(10)* nonnull %158), !dbg !130
  call void @llvm.julia.gc_preserve_end(token %161), !dbg !154
Instruction does not dominate all uses!
  %161 = call token (...) @llvm.julia.gc_preserve_begin({} addrspace(10)* nonnull %158), !dbg !130
  call void @llvm.julia.gc_preserve_end(token %161), !dbg !154
Instruction does not dominate all uses!
  %161 = call token (...) @llvm.julia.gc_preserve_begin({} addrspace(10)* nonnull %158), !dbg !130
  call void @llvm.julia.gc_preserve_end(token %161), !dbg !154
Instruction does not dominate all uses!
  %161 = call token (...) @llvm.julia.gc_preserve_begin({} addrspace(10)* nonnull %158), !dbg !130
  call void @llvm.julia.gc_preserve_end(token %161), !dbg !154
Instruction does not dominate all uses!
  %161 = call token (...) @llvm.julia.gc_preserve_begin({} addrspace(10)* nonnull %158), !dbg !130
  call void @llvm.julia.gc_preserve_end(token %161), !dbg !154
Instruction does not dominate all uses!
  %161 = call token (...) @llvm.julia.gc_preserve_begin({} addrspace(10)* nonnull %158), !dbg !130
  call void @llvm.julia.gc_preserve_end(token %161), !dbg !154
Instruction does not dominate all uses!
  %161 = call token (...) @llvm.julia.gc_preserve_begin({} addrspace(10)* nonnull %158), !dbg !130
  call void @llvm.julia.gc_preserve_end(token %161), !dbg !154
Instruction does not dominate all uses!
  %161 = call token (...) @llvm.julia.gc_preserve_begin({} addrspace(10)* nonnull %158), !dbg !130
  call void @llvm.julia.gc_preserve_end(token %161), !dbg !154
julia: /source/src/llvm-lower-handlers.cpp:256: virtual bool LowerExcHandlersLegacy::runOnFunction(llvm::Function&): Assertion `!verifyFunction(F, &errs())' failed.

[27] signal (6.-6): Aborted
in expression starting at /home/pkgeval/.julia/packages/AllocArrays/ojGtM/test/flux.jl:95
gsignal at /lib/x86_64-linux-gnu/libc.so.6 (unknown line)
abort at /lib/x86_64-linux-gnu/libc.so.6 (unknown line)
unknown function (ip: 0x7fb47585540e)
__assert_fail at /lib/x86_64-linux-gnu/libc.so.6 (unknown line)
runOnFunction at /source/src/llvm-lower-handlers.cpp:256
_ZN4llvm13FPPassManager13runOnFunctionERNS_8FunctionE at /opt/julia/bin/../lib/julia/libLLVM-14jl.so (unknown line)
_ZN4llvm13FPPassManager11runOnModuleERNS_6ModuleE at /opt/julia/bin/../lib/julia/libLLVM-14jl.so (unknown line)
_ZN4llvm6legacy15PassManagerImpl3runERNS_6ModuleE at /opt/julia/bin/../lib/julia/libLLVM-14jl.so (unknown line)
operator() at /source/src/jitlayers.cpp:1152
withModuleDo<(anonymous namespace)::OptimizerT::operator()(llvm::orc::ThreadSafeModule, llvm::orc::MaterializationResponsibility&)::<lambda(llvm::Module&)> > at /source/usr/include/llvm/ExecutionEngine/Orc/ThreadSafeModule.h:136 [inlined]
operator() at /source/src/jitlayers.cpp:1117 [inlined]
CallImpl<(anonymous namespace)::OptimizerT> at /source/usr/include/llvm/ADT/FunctionExtras.h:222
_ZN4llvm3orc16IRTransformLayer4emitESt10unique_ptrINS0_29MaterializationResponsibilityESt14default_deleteIS3_EENS0_16ThreadSafeModuleE at /opt/julia/bin/../lib/julia/libLLVM-14jl.so (unknown line)
emit at /source/src/jitlayers.cpp:631
_ZN4llvm3orc31BasicIRLayerMaterializationUnit11materializeESt10unique_ptrINS0_29MaterializationResponsibilityESt14default_deleteIS3_EE at /opt/julia/bin/../lib/julia/libLLVM-14jl.so (unknown line)
_ZN4llvm3orc19MaterializationTask3runEv at /opt/julia/bin/../lib/julia/libLLVM-14jl.so (unknown line)
_ZN4llvm6detail18UniqueFunctionBaseIvJSt10unique_ptrINS_3orc4TaskESt14default_deleteIS4_EEEE8CallImplIPFvS7_EEEvPvRS7_ at /opt/julia/bin/../lib/julia/libLLVM-14jl.so (unknown line)
_ZN4llvm3orc16ExecutionSession12dispatchTaskESt10unique_ptrINS0_4TaskESt14default_deleteIS3_EE at /opt/julia/bin/../lib/julia/libLLVM-14jl.so (unknown line)
_ZN4llvm3orc16ExecutionSession22dispatchOutstandingMUsEv at /opt/julia/bin/../lib/julia/libLLVM-14jl.so (unknown line)
_ZN4llvm3orc16ExecutionSession17OL_completeLookupESt10unique_ptrINS0_21InProgressLookupStateESt14default_deleteIS3_EESt10shared_ptrINS0_23AsynchronousSymbolQueryEESt8functionIFvRKNS_8DenseMapIPNS0_8JITDylibENS_8DenseSetINS0_15SymbolStringPtrENS_12DenseMapInfoISF_vEEEENSG_ISD_vEENS_6detail12DenseMapPairISD_SI_EEEEEE at /opt/julia/bin/../lib/julia/libLLVM-14jl.so (unknown line)
_ZN4llvm3orc25InProgressFullLookupState8completeESt10unique_ptrINS0_21InProgressLookupStateESt14default_deleteIS3_EE at /opt/julia/bin/../lib/julia/libLLVM-14jl.so (unknown line)
_ZN4llvm3orc16ExecutionSession19OL_applyQueryPhase1ESt10unique_ptrINS0_21InProgressLookupStateESt14default_deleteIS3_EENS_5ErrorE at /opt/julia/bin/../lib/julia/libLLVM-14jl.so (unknown line)
_ZN4llvm3orc16ExecutionSession6lookupENS0_10LookupKindERKSt6vectorISt4pairIPNS0_8JITDylibENS0_19JITDylibLookupFlagsEESaIS8_EENS0_15SymbolLookupSetENS0_11SymbolStateENS_15unique_functionIFvNS_8ExpectedINS_8DenseMapINS0_15SymbolStringPtrENS_18JITEvaluatedSymbolENS_12DenseMapInfoISI_vEENS_6detail12DenseMapPairISI_SJ_EEEEEEEEESt8functionIFvRKNSH_IS6_NS_8DenseSetISI_SL_EENSK_IS6_vEENSN_IS6_SV_EEEEEE at /opt/julia/bin/../lib/julia/libLLVM-14jl.so (unknown line)
_ZN4llvm3orc16ExecutionSession6lookupERKSt6vectorISt4pairIPNS0_8JITDylibENS0_19JITDylibLookupFlagsEESaIS7_EERKNS0_15SymbolLookupSetENS0_10LookupKindENS0_11SymbolStateESt8functionIFvRKNS_8DenseMapIS5_NS_8DenseSetINS0_15SymbolStringPtrENS_12DenseMapInfoISK_vEEEENSL_IS5_vEENS_6detail12DenseMapPairIS5_SN_EEEEEE at /opt/julia/bin/../lib/julia/libLLVM-14jl.so (unknown line)
_ZN4llvm3orc16ExecutionSession6lookupERKSt6vectorISt4pairIPNS0_8JITDylibENS0_19JITDylibLookupFlagsEESaIS7_EENS0_15SymbolStringPtrENS0_11SymbolStateE at /opt/julia/bin/../lib/julia/libLLVM-14jl.so (unknown line)
_ZN4llvm3orc16ExecutionSession6lookupENS_8ArrayRefIPNS0_8JITDylibEEENS0_15SymbolStringPtrENS0_11SymbolStateE at /opt/julia/bin/../lib/julia/libLLVM-14jl.so (unknown line)
_ZN4llvm3orc16ExecutionSession6lookupENS_8ArrayRefIPNS0_8JITDylibEEENS_9StringRefENS0_11SymbolStateE at /opt/julia/bin/../lib/julia/libLLVM-14jl.so (unknown line)
addModule at /source/src/jitlayers.cpp:1420
jl_add_to_ee at /source/src/jitlayers.cpp:1815
_jl_compile_codeinst at /source/src/jitlayers.cpp:241
jl_generate_fptr_impl at /source/src/jitlayers.cpp:460
jl_compile_method_internal at /source/src/gf.c:2348 [inlined]
jl_compile_method_internal at /source/src/gf.c:2237
_jl_invoke at /source/src/gf.c:2750 [inlined]
ijl_apply_generic at /source/src/gf.c:2940
jl_apply at /source/src/julia.h:1880 [inlined]
start_task at /source/src/task.c:1092
Allocations: 55506618 (Pool: 55465205; Big: 41413); GC: 81

Testing failed after 373.47s

I have never seen an error like that before. Maybe a Julia bug? Or an issue in StrideArrays or Bumper? Or this package?

Either way, doesn't seem like a non-issue :(

@ericphanson
Owner

From Slack, it sounds like we are hitting llvm/llvm-project#63984

@PallHaraldsson
Author

PallHaraldsson commented Nov 3, 2023

FYI: there's also an error on 1.9, at least for the latest backport PR (I didn't take a look; it's likely the same as before):

https://s3.amazonaws.com/julialang-reports/nanosoldier/pkgeval/by_hash/1d2d440_vs_9b20aca/report.html

And, maybe related (or at least good for you to know), here's the other failing package:

https://s3.amazonaws.com/julialang-reports/nanosoldier/pkgeval/by_hash/1d2d440_vs_9b20aca/NumCME.primary.log

Instruction does not dominate all uses!
%599 = call token (...) @llvm.julia.gc_preserve_begin({} addrspace(10)* nonnull %589), !dbg !539
call void @llvm.julia.gc_preserve_end(token %599), !dbg !539
julia: /source/src/llvm-alloc-opt.cpp:1188: bool {anonymous}::AllocOpt::runOnFunction(llvm::Function&, llvm::function_ref<llvm::DominatorTree&()>): Assertion `!verifyFunction(F, &errs())' failed.

@PallHaraldsson
Author

1.9.4 seems about to be released, but it seems it broke your package. Or did the package have a bug already that is only now triggered by 1.9.4? Then who's to blame?

JuliaLang/julia#52153
https://s3.amazonaws.com/julialang-reports/nanosoldier/pkgeval/by_hash/1d2d440_vs_9b20aca/AllocArrays.primary.log
JuliaLang/julia#50977

@KristofferC

1.9.4 seems about to be released, but it seems it broke your package.

It didn't break the package. See e.g. JuliaLang/julia#50453 (comment).

@ericphanson
Owner

Closing, since I don't think there's anything to do here.
