This is the same environment as #673, and I suspect the same root cause, but a different crash - perhaps this one will give more information.
I am getting an assert crash in usrsctp with the assertion
sodealloc(): so_count -58769387
which certainly looks like the socket object has been freed and re-used by another allocation. In my logs I can see that usrsctp_close was called just before the crash.
The crash is in the sctp_timeout_handler, and the stack is
Thread 1 (Thread 0x7fc75affd640 (LWP 7273)):
#0 __pthread_kill_implementation (no_tid=0, signo=6, threadid=140494201935424) at ./nptl/pthread_kill.c:44
#1 __pthread_kill_internal (signo=6, threadid=140494201935424) at ./nptl/pthread_kill.c:78
#2 __GI___pthread_kill (threadid=140494201935424, signo=signo@entry=6) at ./nptl/pthread_kill.c:89
#3 0x00007fc88d788476 in __GI_raise (sig=sig@entry=6) at ../sysdeps/posix/raise.c:26
#4 0x00007fc88d76e7f3 in __GI_abort () at ./stdlib/abort.c:79
#5 0x00007fc760e089c5 in terminate_non_graceful () at ./user_environment.h:100
#6 0x00007fc760e094d5 in sodealloc (so=0x7fc7f4015cf0) at user_socket.c:230
#7 0x00007fc760e09ef4 in sofree (so=0x7fc7f4015cf0) at user_socket.c:302
#8 0x00007fc760eba401 in sctp_timeout_handler (t=0x7fc7c0009538) at netinet/sctputil.c:2218
#9 0x00007fc760e1600c in sctp_handle_tick (elapsed_ticks=10) at netinet/sctp_callout.c:172
#10 0x00007fc760e16258 in user_sctp_timer_iterate (arg=0x0) at netinet/sctp_callout.c:214
#11 0x00007fc88d7dab43 in start_thread (arg=<optimized out>) at ./nptl/pthread_create.c:442
#12 0x00007fc88d86ca00 in clone3 () at ../sysdeps/unix/sysv/linux/x86_64/clone3.S:81
The state inside sctp_timeout_handler is
#8 0x00007fc760eba401 in sctp_timeout_handler (t=0x7fc7c0009538) at netinet/sctputil.c:2218
2218 sorele(upcall_socket);
(gdb) info locals
tv = {tv_sec = 1683580305, tv_usec = 140499344030605}
inp = 0x7fc7c0022b80
stcb = 0x7fc7c0009460
net = 0x0
tmr = 0x7fc7c0009538
op_err = 0x7fc75affcdf0
upcall_socket = 0x7fc7f4015cf0
type = 3
i = 8
secret = 0
did_output = true
released_asoc_reference = true
__func__ = "sctp_timeout_handler"
I suspect it's not very useful because the memory has probably been re-used, but in case there's anything meaningful still lingering in later fields, the socket object is
I have reproduced exactly the same issue in a test suite with GStreamer's version of usrsctplib. In my case the sctp-timer thread ticks with an SCTP_TIMER_TYPE_SHUTDOWN event, decrements a reference on the socket, and eventually calls sofree(), which triggers this same assert:
@tuexen is there anything else we can help you with to figure out this issue?
I can reproduce it very often with a test that stresses the setup and shutdown of the sockets, so it won't be a problem to dig out more information.
The userspace code in sctp_close holds SCTP_INP_WLOCK when it sets SCTP_PCB_FLAGS_SOCKET_GONE in sctp_flags; however, the userspace code that sets upcall_socket in sctp_timeout_handler checks that bit in sctp_flags without, as far as I can tell, acquiring that lock. Thus, I suspect that under load the sctp_timeout_handler thread can be suspended immediately after it checks SCTP_PCB_FLAGS_SOCKET_GONE, and a usrsctp_close call can sneak in between the check and the use of the socket.
However, I don't understand the lock hierarchy here. Would it be safe to lock SCTP_INP in sctp_timeout_handler, or could that cause a lock inversion?
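To make the suspected check-then-use race concrete, here is a minimal sketch of the pattern that would close it: the "socket gone" check and the reference acquisition happen under the same lock that the close path holds when it sets the flag. Every name here (`toy_inp`, `toy_socket`, `TOY_FLAG_SOCKET_GONE`, `toy_timer_get_socket`, `toy_close`, `toy_release`) is hypothetical and only stands in for the real usrsctp structures; this is not the library's code and not a proposed patch.

```c
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for illustration; NOT the usrsctp types or macros. */
#define TOY_FLAG_SOCKET_GONE 0x1u

struct toy_socket {
    int so_count;                /* reference count, guarded by toy_inp.lock */
};

struct toy_inp {
    pthread_mutex_t lock;        /* plays the role of the INP lock */
    unsigned int flags;
    struct toy_socket *socket;
};

/*
 * Timer side: check the flag AND take the reference in one critical
 * section, so a concurrent close cannot run between the two steps.
 * Returns NULL if the socket is already gone.
 */
static struct toy_socket *toy_timer_get_socket(struct toy_inp *inp)
{
    struct toy_socket *so = NULL;

    pthread_mutex_lock(&inp->lock);
    if (!(inp->flags & TOY_FLAG_SOCKET_GONE) && inp->socket != NULL) {
        so = inp->socket;
        so->so_count++;
    }
    pthread_mutex_unlock(&inp->lock);
    return so;
}

/*
 * Close side: mark the socket gone, detach it, and drop the owner's
 * reference under the same lock. Returns true if the caller dropped
 * the last reference and must free the socket.
 */
static bool toy_close(struct toy_inp *inp)
{
    bool last = false;

    pthread_mutex_lock(&inp->lock);
    inp->flags |= TOY_FLAG_SOCKET_GONE;
    if (inp->socket != NULL) {
        last = (--inp->socket->so_count == 0);
        inp->socket = NULL;
    }
    pthread_mutex_unlock(&inp->lock);
    return last;
}

/* Drop a reference taken by toy_timer_get_socket(); true means "free it". */
static bool toy_release(struct toy_inp *inp, struct toy_socket *so)
{
    bool last;

    pthread_mutex_lock(&inp->lock);
    last = (--so->so_count == 0);
    pthread_mutex_unlock(&inp->lock);
    return last;
}
```

In this toy model, whoever drops the count to zero frees the socket, and the timer either gets a counted reference or gets NULL. The sketch does not answer the lock-ordering question above: whether the real INP lock can safely be taken at that point in sctp_timeout_handler depends on usrsctp's own lock hierarchy.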
In case it's useful, here are the dereferences of the local pointers referenced by sctp_timeout_handler: