I've been using KrylovKit to compute eigenvalues/eigenvectors of symmetric matrices (both dense and sparse) with GPU acceleration, and I intermittently hit the following bug when using the GPU from a multithreaded Julia session. I am running Julia on a cluster with 64 threads, and the code below reproduces the bug. Note that the bug does NOT appear consistently, only randomly (on my machine it isn't very rare, occurring 10-30% of the time). Disabling multithreading fixes the error for me. I can see multiple (closed) issues about this here, but it seems to either not be resolved, or it could otherwise be made clearer to end users that multithreading isn't supported with GPU arrays.
using KrylovKit
using Random
using CUDA
using LinearAlgebra

function main()
    seed = 7
    Random.seed!(seed)
    N = 1000
    A = randn(N, N)
    A = A + A'
    A = cu(Symmetric(A))
    x = cu(randn(N))
    eigsolve(A, x, 5; issymmetric = true)
    eigsolve(A, x, 5; issymmetric = true)
    eigsolve(A, x, 5; issymmetric = true)
end

println(Threads.nthreads())
for i in 1:10
    main()
end
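A possible workaround (a sketch, not verified on this particular setup): since the crash originates in KrylovKit's multithreaded `basistransform!`, one can disable KrylovKit's own task-based threading via the `KrylovKit.disable_threads()` switch from its documentation, while still starting Julia with multiple threads for other work.

```julia
using KrylovKit
using Random
using CUDA
using LinearAlgebra

# KrylovKit's orthonormalization routines spawn tasks when Julia has
# multiple threads; turning that off should avoid concurrent access to
# the CUBLAS handle cache (assumption: that concurrency is the trigger).
KrylovKit.disable_threads()

Random.seed!(7)
N = 1000
A = randn(N, N)
A = cu(Symmetric(A + A'))
x = cu(randn(N))
vals, vecs, info = eigsolve(A, x, 5; issymmetric = true)
```

This keeps `Threads.nthreads() > 1` for the rest of the program and only serializes KrylovKit's internal linear-algebra kernels.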
error in running finalizer:ErrorException("task switch not allowed from inside gc finalizer")
ijl_error at /cache/build/builder-amdci4-6/julialang/julia-release-1-dot-10/src/rtutils.c:41
ijl_switch at /cache/build/builder-amdci4-6/julialang/julia-release-1-dot-10/src/task.c:636
try_yieldto at ./task.jl:921
wait at ./task.jl:995
#wait#645 at ./condition.jl:130
wait at ./condition.jl:125 [inlined]
slowlock at ./lock.jl:156
lock at ./lock.jl:147 [inlined]
lock at ./lock.jl:227
push! at /home/jtj5311/.julia/packages/CUDA/YIj5X/lib/utils/cache.jl:55 [inlined]
#1157 at /home/jtj5311/.julia/packages/CUDA/YIj5X/lib/cublas/CUBLAS.jl:92
unknown function (ip:0x7f2320a32885)
_jl_invoke at /cache/build/builder-amdci4-6/julialang/julia-release-1-dot-10/src/gf.c:2894 [inlined]
ijl_apply_generic at /cache/build/builder-amdci4-6/julialang/julia-release-1-dot-10/src/gf.c:3076
run_finalizer at /cache/build/builder-amdci4-6/julialang/julia-release-1-dot-10/src/gc.c:318
jl_gc_run_finalizers_in_list at /cache/build/builder-amdci4-6/julialang/julia-release-1-dot-10/src/gc.c:408
run_finalizers at /cache/build/builder-amdci4-6/julialang/julia-release-1-dot-10/src/gc.c:454
enable_finalizers at ./gcutils.jl:157 [inlined]
unlock at ./locks-mt.jl:68 [inlined]
multiq_deletemin at ./partr.jl:168
trypoptask at ./task.jl:977
jfptr_trypoptask_75326.1 at /home/jtj5311/julia-1.10.0/lib/julia/sys.so (unknown line)
_jl_invoke at /cache/build/builder-amdci4-6/julialang/julia-release-1-dot-10/src/gf.c:2894 [inlined]
ijl_apply_generic at /cache/build/builder-amdci4-6/julialang/julia-release-1-dot-10/src/gf.c:3076
get_next_task at /cache/build/builder-amdci4-6/julialang/julia-release-1-dot-10/src/partr.c:329 [inlined]
ijl_task_get_next at /cache/build/builder-amdci4-6/julialang/julia-release-1-dot-10/src/partr.c:382
poptask at ./task.jl:985
wait at ./task.jl:994
task_done_hook at ./task.jl:675
jfptr_task_done_hook_75249.1 at /home/jtj5311/julia-1.10.0/lib/julia/sys.so (unknown line)
_jl_invoke at /cache/build/builder-amdci4-6/julialang/julia-release-1-dot-10/src/gf.c:2894 [inlined]
ijl_apply_generic at /cache/build/builder-amdci4-6/julialang/julia-release-1-dot-10/src/gf.c:3076
jl_apply at /cache/build/builder-amdci4-6/julialang/julia-release-1-dot-10/src/julia.h:1982 [inlined]
jl_finish_task at /cache/build/builder-amdci4-6/julialang/julia-release-1-dot-10/src/task.c:320
start_task at /cache/build/builder-amdci4-6/julialang/julia-release-1-dot-10/src/task.c:1249
error in running finalizer:ErrorException("task switch not allowed from inside gc finalizer")
ijl_error at /cache/build/builder-amdci4-6/julialang/julia-release-1-dot-10/src/rtutils.c:41
ijl_switch at /cache/build/builder-amdci4-6/julialang/julia-release-1-dot-10/src/task.c:636
try_yieldto at ./task.jl:921
wait at ./task.jl:995
#wait#645 at ./condition.jl:130
wait at ./condition.jl:125 [inlined]
slowlock at ./lock.jl:156
lock at ./lock.jl:147 [inlined]
lock at ./lock.jl:227
push! at /home/jtj5311/.julia/packages/CUDA/YIj5X/lib/utils/cache.jl:55 [inlined]
#1157 at /home/jtj5311/.julia/packages/CUDA/YIj5X/lib/cublas/CUBLAS.jl:92
unknown function (ip:0x7f2320a32885)
_jl_invoke at /cache/build/builder-amdci4-6/julialang/julia-release-1-dot-10/src/gf.c:2894 [inlined]
ijl_apply_generic at /cache/build/builder-amdci4-6/julialang/julia-release-1-dot-10/src/gf.c:3076
run_finalizer at /cache/build/builder-amdci4-6/julialang/julia-release-1-dot-10/src/gc.c:318
jl_gc_run_finalizers_in_list at /cache/build/builder-amdci4-6/julialang/julia-release-1-dot-10/src/gc.c:408
run_finalizers at /cache/build/builder-amdci4-6/julialang/julia-release-1-dot-10/src/gc.c:454
enable_finalizers at ./gcutils.jl:157 [inlined]
unlock at ./locks-mt.jl:68 [inlined]
multiq_deletemin at ./partr.jl:168
trypoptask at ./task.jl:977
jfptr_trypoptask_75326.1 at /home/jtj5311/julia-1.10.0/lib/julia/sys.so (unknown line)
_jl_invoke at /cache/build/builder-amdci4-6/julialang/julia-release-1-dot-10/src/gf.c:2894 [inlined]
ijl_apply_generic at /cache/build/builder-amdci4-6/julialang/julia-release-1-dot-10/src/gf.c:3076
get_next_task at /cache/build/builder-amdci4-6/julialang/julia-release-1-dot-10/src/partr.c:329 [inlined]
ijl_task_get_next at /cache/build/builder-amdci4-6/julialang/julia-release-1-dot-10/src/partr.c:382
poptask at ./task.jl:985
wait at ./task.jl:994
task_done_hook at ./task.jl:675
jfptr_task_done_hook_75249.1 at /home/jtj5311/julia-1.10.0/lib/julia/sys.so (unknown line)
_jl_invoke at /cache/build/builder-amdci4-6/julialang/julia-release-1-dot-10/src/gf.c:2894 [inlined]
ijl_apply_generic at /cache/build/builder-amdci4-6/julialang/julia-release-1-dot-10/src/gf.c:3076
jl_apply at /cache/build/builder-amdci4-6/julialang/julia-release-1-dot-10/src/julia.h:1982 [inlined]
jl_finish_task at /cache/build/builder-amdci4-6/julialang/julia-release-1-dot-10/src/task.c:320
start_task at /cache/build/builder-amdci4-6/julialang/julia-release-1-dot-10/src/task.c:1249
ERROR: TaskFailedException
nested task error: schedule: Task not runnable
Stacktrace:
[1] error(s::String)
@ Base ./error.jl:35
[2] schedule(t::Task, arg::Any; error::Bool)
@ Base ./task.jl:851
[3] schedule
@ Base ./task.jl:849 [inlined]
[4] notify(c::Base.GenericCondition{Base.Threads.SpinLock}, arg::Any, all::Bool, error::Bool)
@ Base ./condition.jl:154
[5] notify (repeats 2 times)
@ Base ./condition.jl:148 [inlined]
[6] (::Base.var"#notifywaiters#649")(rl::ReentrantLock)
@ Base ./lock.jl:187
[7] (::Base.var"#_unlock#648")(rl::ReentrantLock)
@ Base ./lock.jl:183
[8] unlock
@ ./lock.jl:177 [inlined]
[9] lock(f::CUDA.APIUtils.var"#10#13"{…}, l::ReentrantLock)
@ Base ./lock.jl:231
[10] check_cache
@ ~/.julia/packages/CUDA/YIj5X/lib/utils/cache.jl:26 [inlined]
[11] pop!
@ ~/.julia/packages/CUDA/YIj5X/lib/utils/cache.jl:47 [inlined]
[12] (::CUDA.CUBLAS.var"#new_state#1162")(cuda::@NamedTuple{…})
@ CUDA.CUBLAS ~/.julia/packages/CUDA/YIj5X/lib/cublas/CUBLAS.jl:87
[13] #1160
@ ~/.julia/packages/CUDA/YIj5X/lib/cublas/CUBLAS.jl:106 [inlined]
[14] get!(default::CUDA.CUBLAS.var"#1160#1167"{…}, h::Dict{…}, key::CuContext)
@ Base ./dict.jl:479
[15] handle()
@ CUDA.CUBLAS ~/.julia/packages/CUDA/YIj5X/lib/cublas/CUBLAS.jl:105
[16] axpy!
@ ~/.julia/packages/CUDA/YIj5X/lib/cublas/wrappers.jl:215 [inlined]
[17] axpy!
@ ~/.julia/packages/CUDA/YIj5X/lib/cublas/linalg.jl:145 [inlined]
[18] (::KrylovKit.var"#17#19"{KrylovKit.OrthonormalBasis{…}, SubArray{…}, Vector{…}, Int64, StepRange{…}})()
@ KrylovKit ~/.julia/packages/KrylovKit/diNbc/src/orthonormal.jl:319
Stacktrace:
[1] sync_end(c::Channel{Any})
@ Base ./task.jl:448
[2] macro expansion
@ ./task.jl:480 [inlined]
[3] basistransform!(b::KrylovKit.OrthonormalBasis{CuArray{…}}, U::SubArray{Float32, 2, Matrix{…}, Tuple{…}, false})
@ KrylovKit ~/.julia/packages/KrylovKit/diNbc/src/orthonormal.jl:315
[4] eigsolve(A::Symmetric{…}, x₀::CuArray{…}, howmany::Int64, which::Symbol, alg::Lanczos{…})
@ KrylovKit ~/.julia/packages/KrylovKit/diNbc/src/eigsolve/lanczos.jl:116
[5] #eigsolve#38
@ ~/.julia/packages/KrylovKit/diNbc/src/eigsolve/eigsolve.jl:202 [inlined]
[6] eigsolve (repeats 2 times)
@ ~/.julia/packages/KrylovKit/diNbc/src/eigsolve/eigsolve.jl:180 [inlined]
[7] main()
@ Main ~/.julia/packages/KrylovKit/diNbc/src/eigsolve/julia krylovkit bug test.jl:17
[8] top-level scope
@ ~/.julia/packages/KrylovKit/diNbc/src/eigsolve/julia krylovkit bug test.jl:23
Some type information was truncated. Use `show(err)` to see complete types.