Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[bug] single_thread + heartbeat_check_interval - terminate called without an active exception #5671

Open
mrVrAlex opened this issue Jan 23, 2025 · 3 comments

Comments

@mrVrAlex
Copy link

mrVrAlex commented Jan 23, 2025

Hello,
It seams I found bug and some potential problem.

For each worker process swoole trigger such warning when master process get SIGTERM signal:
[2025-01-23 00:18:49 *16.0] WARNING Server::stop_async_worker(): failed to push WORKER_STOP message, Error: Operation not permitted[1]
And after this in stderr I see
terminate called without an active exception (with coredump)

My settings on Swoole 6.0, PHP 8.4 ZTS, PROCESS mode

'bootstrap' => APPLICATION_PATH . '/bin/app',
'init_arguments' => function () {
    $map = new \Swoole\Thread\Map();
    return [$map];
},
'single_thread' => true, // Need true with SWOOLE_PROCESS + PHP ZTS
'user'  => 'www-data',
'group' => 'www-data',
'worker_num'    => 2,
'dispatch_mode' => 10
'reactor_num'   => swoole_cpu_num() + 6,
'max_request'       => 100_000, 
'max_request_grace' => 50_000,
'max_wait_time'           => 120, // WARNING Worker_reactor_try_to_exit() (ERRNO 9012): worker exit timeout, forced termination
'max_concurrency'         => 1600,
'worker_max_concurrency'  => 100,
'reload_async'            => true, // default value
'discard_timeout_request' => false, // just experiment
'enable_preemptive_scheduler' => true,
'enable_coroutine'            => true,
'max_coroutine'               => 10_000,
'backlog'           => 5000,
'open_cpu_affinity' => true,
'heartbeat_idle_time'      => 600,
'heartbeat_check_interval' => 30,
'max_conn'           => 10_000, // should be > ulimit -n: On NODES = 1kk
'buffer_output_size' => 128 * SizeInKiloBytes::MB, // for each worker process
'input_buffer_size'  => 4 * SizeInKiloBytes::MB, // default
'package_max_length' => 1 * SizeInKiloBytes::GB,
'open_tcp_keepalive' => true,
'tcp_keepidle'       => 30, 
'tcp_keepinterval'   => 1, 
'tcp_keepcount'      => 5, 
'open_tcp_nodelay'   => true,
'tcp_fastopen'       => true,
'tcp_defer_accept'   => 5, 
'tcp_user_timeout'   => 10 * 1000, 
'socket_timeout'         => 3 * 60,
'socket_connect_timeout' => 2,
'socket_read_timeout'    => 60,
'socket_write_timeout'   => 60,
'socket_buffer_size'     => 128 * SizeInKiloBytes::MB,

This is trace log (in DESC order):

terminate called without an active exception | 2025-01-23T13:13:51.426133349Z
-- | --
[2025-01-23 13:13:51 #7.0]	TRACE	Server::destroy(:1006): terminate reactor threads | 2025-01-23T13:13:51.425988474Z
[2025-01-23 13:13:51 $12.0]	TRACE	Factory::kill_event_workers(:88): wait worker#1[pid=16] | 2025-01-23T13:13:51.422994142Z
[2025-01-23 13:13:51 *14.0]	TRACE	ReactorEpoll::del(:128): remove event[reactor_id=0\|fd=16] | 2025-01-23T13:13:51.409302736Z
[2025-01-23 13:13:51 *16.1]	TRACE	ReactorEpoll::del(:128): remove event[reactor_id=0\|fd=14] | 2025-01-23T13:13:51.409264466Z
[2025-01-23 13:13:51 *14.0]	TRACE	ReactorEpoll::del(:128): remove event[reactor_id=0\|fd=14] | 2025-01-23T13:13:51.409234825Z
[2025-01-23 13:13:51 *16.1]	TRACE	ReactorEpoll::del(:128): remove event[reactor_id=0\|fd=12] | 2025-01-23T13:13:51.409182362Z
[2025-01-23 13:13:51 *14.0]	TRACE	ReactorEpoll::del(:128): remove event[reactor_id=0\|fd=12] | 2025-01-23T13:13:51.409177772Z
[2025-01-23 13:13:51 *16.1]	WARNING	Server::stop_async_worker(): failed to push WORKER_STOP message, Error: Operation not permitted[1] | 2025-01-23T13:13:51.409175418Z
[2025-01-23 13:13:51 *14.0]	TRACE	Timer::select(:204): timer msec=434913, round=10919 | 2025-01-23T13:13:51.409171443Z
[2025-01-23 13:13:51 *16.1]	TRACE	ReactorEpoll::del(:128): remove event[reactor_id=0\|fd=8] | 2025-01-23T13:13:51.409128412Z
[2025-01-23 13:13:51 *14.0]	WARNING	Server::stop_async_worker(): failed to push WORKER_STOP message, Error: Operation not permitted[1] | 2025-01-23T13:13:51.409126117Z
[2025-01-23 13:13:51 *14.0]	TRACE	ReactorEpoll::del(:128): remove event[reactor_id=0\|fd=6] | 2025-01-23T13:13:51.409119442Z
[2025-01-23 13:13:51 $12.0]	TRACE	Factory::kill_event_workers(:88): wait worker#0[pid=14] | 2025-01-23T13:13:51.409108554Z
[2025-01-23 13:13:51 $12.0]	TRACE	Factory::kill_event_workers(:84): kill worker#1[pid=16] | 2025-01-23T13:13:51.409063799Z
[2025-01-23 13:13:51 $12.0]	TRACE	Factory::kill_event_workers(:84): kill worker#0[pid=14] | 2025-01-23T13:13:51.409057317Z
  | 2025-01-23T13:13:51.409Z  CUSTOM LOG IN workerStop listener: STOP WORKER 0
  | 2025-01-23T13:13:51.409Z CUSTOM LOG IN workerStop listener: STOP WORKER 1
[2025-01-23 13:13:51 #7.0]	TRACE	Server::destroy(:988): release service | 2025-01-23T13:13:51.408926949Z
[2025-01-23 13:13:51 #7.0]	TRACE	ReactorEpoll::del(:128): remove event[reactor_id=0\|fd=13] | 2025-01-23T13:13:51.408841522Z
[2025-01-23 13:13:51 #7.0]	TRACE	ReactorEpoll::del(:128): remove event[reactor_id=0\|fd=11] | 2025-01-23T13:13:51.408756759Z
[2025-01-23 13:13:51 #7.0]	TRACE	Timer::select(:204): timer msec=435337, round=724 | 2025-01-23T13:13:51.408751278Z
[2025-01-23 13:13:51 #7.0]	TRACE	Server::signal_handler_shutdown(:966): shutdown end | 2025-01-23T13:13:51.408749069Z
[2025-01-23 13:13:51 #7.0]	TRACE	ReactorEpoll::del(:128): remove event[reactor_id=0\|fd=15] | 2025-01-23T13:13:51.408746873Z
[2025-01-23 13:13:51 #7.0]	TRACE	ReactorEpoll::del(:128): remove event[reactor_id=0\|fd=14] | 2025-01-23T13:13:51.408744561Z
[2025-01-23 13:13:51 #7.0]	TRACE	ReactorEpoll::del(:128): remove event[reactor_id=0\|fd=9] | 2025-01-23T13:13:51.408734170Z
[2025-01-23 13:13:51 #7.0]	TRACE	ReactorEpoll::del(:128): remove event[reactor_id=0\|fd=7] | 2025-01-23T13:13:51.408536966Z
[2025-01-23 13:13:51 #7.0]	TRACE	Timer::remove(:185): id=1, exec_msec=436000, round=0, exist=0 | 2025-01-23T13:13:51.408530276Z
[2025-01-23 13:13:51 #7.0]	TRACE	ReactorEpoll::del(:128): remove event[reactor_id=0\|fd=5] | 2025-01-23T13:13:51.408518906Z
[2025-01-23 13:13:51 #7.0]	TRACE	Server::signal_handler_shutdown(:947): shutdown begin | 2025-01-23T13:13:51.408496495Z
[2025-01-23 13:13:51 #7.0]	TRACE	Server::master_signal_handler(:1838): signal[15] Terminated: 15 triggered in 7 | 2025-01-23T13:13:51.408473259Z
[2025-01-23 13:13:51 #7.0]	TRACE	Timer::select(:214): id=1, exec_msec=435000, round=0, exist=0 | 2025-01-23T13:13:51.073089306Z
[2025-01-23 13:13:51 #7.0]	TRACE	Timer::select(:204): timer msec=435002, round=723 | 2025-01-23T13:13:51.073058993Z
[2025-01-23 13:13:50 #7.0]	TRACE	Timer::select(:214): id=1, exec_msec=434000, round=0, exist=0 | 2025-01-23T13:13:50.071994310Z
[2025-01-23 13:13:50 #7.0]	TRACE	Timer::select(:204): timer msec=434000, round=722 | 2025-01-23T13:13:50.071968618Z

Stacktrace for coredump:

#0  __restore_sigs (set=set@entry=0x7ffd274da3a0) at ./arch/x86_64/syscall_arch.h:40
#1  0x00007095aec08e1b in raise (sig=sig@entry=6) at src/signal/raise.c:11
#2  0x00007095aebec9ad in abort () at src/exit/abort.c:11
#3  0x00007095acf9cad6 in ?? () from /usr/lib/libstdc++.so.6
#4  0x00007095acfb1898 in __cxxabiv1::__terminate(void (*)()) () from /usr/lib/libstdc++.so.6
#5  0x00007095acf9c516 in std::terminate() () from /usr/lib/libstdc++.so.6
#6  0x00007095aca1d331 in std::__terminate () at /usr/include/c++/14.2.0/x86_64-alpine-linux-musl/bits/c++config.h:324
#7  std::thread::~thread (this=0x7095ac174680, __in_chrg=<optimized out>) at /usr/include/c++/14.2.0/bits/std_thread.h:182
#8  swoole::Server::~Server (this=0x7095ac173dc0, __in_chrg=<optimized out>) at /tmp/swoole/src/server/master.cc:734
#9  0x00007095acade813 in server_free_object (object=0x7095abdfe220) at /tmp/swoole/ext-src/swoole_server.cc:243
#10 0x00005ed8162f2d06 in zend_objects_store_del ()
#11 0x00005ed8162ad532 in execute_ex ()
#12 0x00005ed8162a2774 in zend_execute ()
#13 0x00005ed816312099 in zend_execute_script ()
#14 0x00005ed81619bca4 in php_execute_script_ex ()
#15 0x00005ed816314153 in ?? ()
#16 0x00005ed815e58317 in ?? ()
#17 0x00007095aebec496 in libc_start_main_stage2 (main=0x5ed815e57ff0, argc=3, argv=0x7ffd274ddea8) at src/env/__libc_start_main.c:95
#18 0x00005ed815e583b4 in _start ()
@mrVrAlex mrVrAlex changed the title WARNING during SIGTERM for HTTP server terminate called without an active exception Jan 23, 2025
@mrVrAlex
Copy link
Author

mrVrAlex commented Jan 23, 2025

I asked ChatGPT about it stacktrace:

This is a very typical situation when std::terminate() is called in the destructor of std::thread. This occurs when you destroy a std::thread object without having previously called join() or at least detach() on it. According to the C++ standard, if a std::thread destructor is called on a still "joinable" thread, it will inevitably lead to the invocation of std::terminate(). Put simply: If you have a class member of type std::thread, and the class "dies" (delete or goes out of scope) before the corresponding thread has been properly joined (join()) or detached (detach()), C++ will automatically call std::terminate(). для 11 секунд
This is a very typical scenario when std::terminate() is called from the std::thread destructor. It occurs if you destroy a std::thread object without having joined or at least detached it. According to the C++ standard, if the std::thread destructor is called on a still “joinable” thread, it is guaranteed to invoke std::terminate().
In simpler terms: If you have a class member of type std::thread, and that class “dies” (either gets deleted or goes out of scope) before the corresponding thread has been properly joined or detached, C++ will automatically call std::terminate().

So really why in

void Server::join_reactor_thread() {
    if (single_thread) {
        return;
    }

We just return for single_thread option instead make something like get_thread(0)->thread.join();?
Because for Server::start_reactor_threads() even with single_thread option we make start_heartbeat_thread() (if we set this value as in my case) - so create another thread.

@mrVrAlex mrVrAlex changed the title terminate called without an active exception [bug] single_thread + heartbeat_check_interval - terminate called without an active exception Jan 23, 2025
@mrVrAlex
Copy link
Author

About second issue
WARNING Server::stop_async_worker(): failed to push WORKER_STOP message, Error: Operation not permitted[1]
then if not use in settings (or use root)

'user'  => 'www-data',
'group' => 'www-data',

Then workers stopping without any warning.

@NathanFreeman
Copy link
Member

Since the www-data user cannot send signals to the root user, I need to consider whether it is reasonable to set the manager process to also run as www-data.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

3 participants