Timer Thread can deadlock with usrsctp_conninput #719

Open
addisonpolcyn opened this issue Nov 24, 2024 · 1 comment
addisonpolcyn commented Nov 24, 2024

Thank you for your work, Michael! I recently found a deadlock that occurs fairly often on usrsctp master. It is triggered when the timer thread delivers a notification while another thread is processing incoming traffic through usrsctp_conninput.

Timer Stack:

[Switching to thread 262 (LWP 4064001)]
#0 0x00007fec1eeb9e2b in __lll_lock_wait () from target:/lib/x86_64-linux-gnu/libpthread.so.0
(gdb) bt
#0 0x00007fec1eeb9e2b in __lll_lock_wait () from target:/lib/x86_64-linux-gnu/libpthread.so.0
#1 0x00007fec1eeb2843 in pthread_mutex_lock () from target:/lib/x86_64-linux-gnu/libpthread.so.0
#2 0x00007fec1efefcb3 in sctp_ulp_notify (notification=5, stcb=0x7fe974028ac0, error=0, data=0x7fe974003a60, so_locked=0) at netinet/sctputil.c:4282
#3 0x00007fec1eff1606 in sctp_release_pr_sctp_chunk (stcb=stcb@entry=0x7fe974028ac0, tp1=tp1@entry=0x7fe974003a60, sent=sent@entry=1 '\001', so_locked=so_locked@entry=0)
  at netinet/sctputil.c:5611
#4 0x00007fec1efdc358 in sctp_mark_all_for_resend (num_abandoned=<synthetic pointer>, num_marked=<synthetic pointer>, window_probe=0, alt=0x7fe720129c00,
  net=0x7fe720129c00, stcb=0x7fe974028ac0) at netinet/sctp_timer.c:645
#5 sctp_t3rxt_timer (inp=inp@entry=0x7fe9740338c0, stcb=stcb@entry=0x7fe974028ac0, net=net@entry=0x7fe720129c00) at netinet/sctp_timer.c:916
#6 0x00007fec1eff5adf in sctp_timeout_handler (t=t@entry=0x7fe720129d30) at netinet/sctputil.c:1905
#7 0x00007fec1ef9e10b in sctp_handle_tick (elapsed_ticks=<optimized out>) at netinet/sctp_callout.c:172
#8 0x00007fec1ef9e1db in user_sctp_timer_iterate (arg=<optimized out>) at netinet/sctp_callout.c:214
#9 0x00007fec1eeafea7 in start_thread () from target:/lib/x86_64-linux-gnu/libpthread.so.0
#10 0x00007fec1edcfacf in clone () from target:/lib/x86_64-linux-gnu/libc.so.6

Worker Thread Stack:

[Switching to thread 150 (LWP 4063886)]
#0 0x00007fec1eeb9e2b in __lll_lock_wait () from target:/lib/x86_64-linux-gnu/libpthread.so.0
(gdb) bt
#0 0x00007fec1eeb9e2b in __lll_lock_wait () from target:/lib/x86_64-linux-gnu/libpthread.so.0
#1 0x00007fec1eeb2843 in pthread_mutex_lock () from target:/lib/x86_64-linux-gnu/libpthread.so.0
#2 0x00007fec1efee7b9 in sctp_invoke_recv_callback (inp=inp@entry=0x7fe9740338c0, stcb=stcb@entry=0x7fe974028ac0, control=control@entry=0x7fe9740548c0, 
  inp_read_lock_held=inp_read_lock_held@entry=1) at netinet/sctputil.c:5360
#3 0x00007fec1efeea9d in sctp_add_to_readq (inp=0x7fe9740338c0, stcb=0x7fe974028ac0, control=0x7fe9740548c0, sb=0x7fe974030388, end=1, inp_read_lock_held=0, so_locked=0)
  at netinet/sctputil.c:5467
#4 0x00007fec1efa6af6 in sctp_process_a_data_chunk (chk_type=<optimized out>, last_chunk=1, break_flag=<synthetic pointer>, abort_flag=0x7fe98dfe3d54, 
  high_tsn=0x7fe98dfe3ef8, net=0x7fe720129c00, chk_length=<optimized out>, offset=<optimized out>, m=0x7fe98dfe4088, asoc=0x7fe974028b18, stcb=0x7fe974028ac0)
  at netinet/sctp_indata.c:2140
#5 sctp_process_data (mm=mm@entry=0x7fe98dfe4088, iphlen=iphlen@entry=0, offset=offset@entry=0x7fe98dfe3eec, length=length@entry=36, inp=0x7fe9740338c0, 
  stcb=stcb@entry=0x7fe974028ac0, net=0x7fe720129c00, high_tsn=0x7fe98dfe3ef8) at netinet/sctp_indata.c:2792
#6 0x00007fec1efb6c85 in sctp_common_input_processing (mm=mm@entry=0x7fe98dfe4088, iphlen=iphlen@entry=0, offset=<optimized out>, offset@entry=12, 
  length=length@entry=36, src=src@entry=0x7fe98dfe4090, dst=dst@entry=0x7fe98dfe40a0, sh=0x7fe9740539f0, ch=0x7fe9740539fc, compute_crc=1 '\001', ecn_bits=0 '\000', 
  vrf_id=0, port=0) at netinet/sctp_input.c:6117
#7 0x00007fec1ef94419 in usrsctp_conninput (addr=<optimized out>, buffer=0x7fe98dfe6700, length=36, ecn_bits=<optimized out>) at user_socket.c:3320

| Thread | Has | Wants |
| --- | --- | --- |
| Timer Thread | SCTP_TCB_LOCK | SCTP_INP_READ_LOCK |
| Worker Thread (conninput) | SCTP_INP_READ_LOCK | SCTP_TCB_LOCK |
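
The Has/Wants summary above is a classic lock-order inversion. As a minimal sketch, assuming two plain pthread mutexes as stand-ins for SCTP_TCB_LOCK and SCTP_INP_READ_LOCK (none of the names below exist in usrsctp), the following shows both paths failing to get their second lock. It uses `pthread_mutex_trylock` so the demo reports `EBUSY` instead of hanging:

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <pthread.h>

/* Hypothetical stand-ins for SCTP_TCB_LOCK and SCTP_INP_READ_LOCK. */
static pthread_mutex_t tcb_lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_mutex_t inp_read_lock = PTHREAD_MUTEX_INITIALIZER;

/* Barriers force the unlucky interleaving deterministically. */
static pthread_barrier_t both_hold_first;
static pthread_barrier_t both_tried_second;

struct path {
	pthread_mutex_t *first;  /* lock this path already has */
	pthread_mutex_t *second; /* lock this path wants next */
	int second_rc;           /* result of trying the second lock */
};

static void *take_locks(void *arg)
{
	struct path *p = arg;
	int rc = pthread_mutex_lock(p->first);

	assert(rc == 0);
	(void)rc;
	/* Wait until the other path also owns its first lock. */
	pthread_barrier_wait(&both_hold_first);
	/* A blocking pthread_mutex_lock() here would never return. */
	p->second_rc = pthread_mutex_trylock(p->second);
	/* Hold the first lock until the other path has made its attempt. */
	pthread_barrier_wait(&both_tried_second);
	if (p->second_rc == 0) {
		pthread_mutex_unlock(p->second);
	}
	pthread_mutex_unlock(p->first);
	return NULL;
}

/* Returns 0 on success and stores each path's trylock result. */
int demo_lock_inversion(int *timer_rc, int *worker_rc)
{
	struct path timer = { &tcb_lock, &inp_read_lock, -1 };
	struct path worker = { &inp_read_lock, &tcb_lock, -1 };
	pthread_t worker_thread;
	int rc;

	if (pthread_barrier_init(&both_hold_first, NULL, 2) != 0) {
		return -1;
	}
	if (pthread_barrier_init(&both_tried_second, NULL, 2) != 0) {
		pthread_barrier_destroy(&both_hold_first);
		return -1;
	}
	rc = pthread_create(&worker_thread, NULL, take_locks, &worker);
	if (rc == 0) {
		take_locks(&timer); /* the calling thread plays the timer path */
		pthread_join(worker_thread, NULL);
		*timer_rc = timer.second_rc;
		*worker_rc = worker.second_rc;
	}
	pthread_barrier_destroy(&both_tried_second);
	pthread_barrier_destroy(&both_hold_first);
	return rc == 0 ? 0 : -1;
}
```

Calling `demo_lock_inversion(&t, &w)` from a `main` should leave both `t` and `w` equal to `EBUSY`. With blocking locks in place of the trylock, each thread would wait on the other indefinitely, which matches the two backtraces stuck in `__lll_lock_wait`.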

To reproduce, I added the following to sctp_ulp_notify in sctputil.c:

+	if (notification == SCTP_NOTIFY_SENT_DG_FAIL) {
+		sleep(1);
+	}
	if (notification != SCTP_NOTIFY_PARTIAL_DELVIERY_INDICATION) {
		SCTP_INP_READ_LOCK(inp);
	}

I then opened a connection and sent a high volume of messages for a few seconds to make the above notification fire.


addisonpolcyn commented Nov 24, 2024

I do wonder if this issue is a duplicate: #649

I am also wondering about two things:
a). Could we first check whether the notification event is registered before acquiring SCTP_INP_READ_LOCK? (In my case we actually don't have this event registered.)
b). Does the atomic decrement after the callback need the SCTP_TCB_LOCK? Alternatively, do we need to hold the SCTP_INP_READ_LOCK while calling into sctp_invoke_recv_callback?
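
To make idea a). concrete, here is a rough sketch of checking the event flag before touching the read lock. Everything in it is hypothetical: `struct endpoint`, `send_failed_enabled`, and `notify_send_failed` are illustrative stand-ins, not usrsctp types or functions, and the counters exist only to make the behaviour observable.

```c
#include <pthread.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical endpoint; NOT the real struct sctp_inpcb. */
struct endpoint {
	pthread_mutex_t read_lock;       /* stand-in for SCTP_INP_READ_LOCK */
	atomic_bool send_failed_enabled; /* stand-in for "event is registered" */
	int read_lock_acquisitions;      /* bookkeeping for the demo only */
	int notifications_queued;        /* protected by read_lock */
};

void endpoint_init(struct endpoint *ep, bool enabled)
{
	pthread_mutex_init(&ep->read_lock, NULL);
	atomic_init(&ep->send_failed_enabled, enabled);
	ep->read_lock_acquisitions = 0;
	ep->notifications_queued = 0;
}

/* Returns true if a notification was queued. */
bool notify_send_failed(struct endpoint *ep)
{
	/* Early exit: an unregistered event never takes the read lock. */
	if (!atomic_load(&ep->send_failed_enabled)) {
		return false;
	}
	pthread_mutex_lock(&ep->read_lock);
	ep->read_lock_acquisitions++;
	ep->notifications_queued++;
	pthread_mutex_unlock(&ep->read_lock);
	return true;
}
```

This would only sidestep the deadlock for applications that have not registered the event. When the event is registered, the read lock is still taken while the caller holds the other lock, so the ordering problem itself remains.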
