Describe the bug
When using threaded mode in filter_multiline, segmentation faults or deadlocks occur randomly, especially under high load.
I assume this is caused by the flb_log_event_encoder functions not being thread-safe.
There is also an auto-closed issue, #6728, and an open but outdated PR from @nokute78 (#6765), both describing a similar problem, which is evidently still not fixed.
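If encoder state really is shared between threads (the deadlock traces in this report come from both the `flb-in-tail` input thread and the `flb-pipeline` thread), one generic way to make it safe is to hold a mutex for the whole begin-to-commit sequence of each record. The sketch below is a minimal illustration under that assumption only: `struct shared_encoder`, `shared_encoder_commit()` and `demo_concurrent_commits()` are hypothetical names, not Fluent Bit APIs, and this is not necessarily the approach taken in #6765.

```c
#define _POSIX_C_SOURCE 200809L
#include <pthread.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for encoder state shared between threads.
 * This is NOT the real flb_log_event_encoder layout. */
struct shared_encoder {
    pthread_mutex_t lock;   /* guards buf, len and records */
    char buf[8192];
    size_t len;
    size_t records;
};

static void shared_encoder_init(struct shared_encoder *enc)
{
    pthread_mutex_init(&enc->lock, NULL);
    enc->len = 0;
    enc->records = 0;
}

/* Appends one complete record while holding the lock, so this
 * begin..commit sequence can never interleave with another thread's. */
static int shared_encoder_commit(struct shared_encoder *enc, const char *msg)
{
    size_t n = strlen(msg);
    int ret = -1;

    pthread_mutex_lock(&enc->lock);
    if (enc->len + n + 1 <= sizeof(enc->buf)) {
        memcpy(enc->buf + enc->len, msg, n);
        enc->buf[enc->len + n] = '\n';
        enc->len += n + 1;
        enc->records++;
        ret = 0;
    }
    pthread_mutex_unlock(&enc->lock);
    return ret;
}

struct worker_arg {
    struct shared_encoder *enc;
    int count;              /* records each worker commits */
};

static void *worker(void *p)
{
    struct worker_arg *arg = p;
    for (int i = 0; i < arg->count; i++) {
        shared_encoder_commit(arg->enc, "line");
    }
    return NULL;
}

/* Runs up to 8 writer threads against one encoder and returns the
 * number of records committed. */
size_t demo_concurrent_commits(int nthreads, int per_thread)
{
    struct shared_encoder enc;
    struct worker_arg arg;
    pthread_t tids[8];

    if (nthreads > 8) {
        nthreads = 8;
    }
    shared_encoder_init(&enc);
    arg.enc = &enc;
    arg.count = per_thread;

    for (int i = 0; i < nthreads; i++) {
        pthread_create(&tids[i], NULL, worker, &arg);
    }
    for (int i = 0; i < nthreads; i++) {
        pthread_join(tids[i], NULL);   /* join makes records safe to read */
    }
    pthread_mutex_destroy(&enc.lock);
    return enc.records;
}
```

For example, `demo_concurrent_commits(4, 100)` reports 400 records, because no commit can be lost or interleaved while the lock is held. A single lock is simple but serializes all encoding; giving each thread its own encoder instance avoids that contention.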
Example deadlock stacktraces:
flb_log_event_encoder_commit_record
Thread 57 (Thread 0x7fbe132dc6c0 (LWP 113) "flb-in-tail.47-"):
#0 futex_wait (private=0, expected=2, futex_word=0x7fbe4ec16708) at ../sysdeps/nptl/futex-internal.h:146
#1 __GI___lll_lock_wait (futex=futex@entry=0x7fbe4ec16708, private=0) at ./nptl/lowlevellock.c:49
#2 0x00007fbe505a90f1 in lll_mutex_lock_optimized (mutex=0x7fbe4ec16708) at ./nptl/pthread_mutex_lock.c:48
#3 ___pthread_mutex_lock (mutex=0x7fbe4ec16708) at ./nptl/pthread_mutex_lock.c:93
#4 0x00005648f0d551d1 in ?? ()
#5 0x00005648f0d625b0 in ?? ()
#6 0x00005648f0cf2417 in ?? ()
#7 0x00005648f0dd4436 in flb_log_event_encoder_dynamic_field_scope_leave ()
#8 0x00005648f0dd465d in flb_log_event_encoder_dynamic_field_flush ()
#9 0x00005648f0dd2ac6 in flb_log_event_encoder_commit_record ()
#10 0x00005648f0db459d in flb_ml_flush_stream_group ()
#11 0x00005648f0dd6627 in flb_ml_rule_process ()
#12 0x00005648f0db4f9b in ?? ()
#13 0x00005648f0db5458 in ?? ()
#14 0x00005648f0db573d in flb_ml_append_object ()
#15 0x00005648f0eb7963 in ?? ()
#16 0x00005648f0da95bb in flb_processor_run ()
#17 0x00005648f0dcc8e7 in ?? ()
#18 0x00005648f0dcca6c in flb_input_log_append_skip_processor_stages ()
#19 0x00005648f0ebe3dc in ?? ()
#20 0x00005648f0da95bb in flb_processor_run ()
#21 0x00005648f0dcc8e7 in ?? ()
#22 0x00005648f0dcca9d in flb_input_log_append_records ()
#23 0x00005648f0e0b516 in flb_tail_file_chunk ()
#24 0x00005648f0e05c57 in in_tail_collect_event ()
flb_log_event_encoder_dynamic_field_reset
Thread 153 (Thread 0x7fbe4f67f6c0 (LWP 17) "flb-pipeline"):
#0 futex_wait (private=0, expected=2, futex_word=0x7fbe4ec16708) at ../sysdeps/nptl/futex-internal.h:146
#1 __GI___lll_lock_wait (futex=futex@entry=0x7fbe4ec16708, private=0) at ./nptl/lowlevellock.c:49
#2 0x00007fbe505a90f1 in lll_mutex_lock_optimized (mutex=0x7fbe4ec16708) at ./nptl/pthread_mutex_lock.c:48
#3 ___pthread_mutex_lock (mutex=0x7fbe4ec16708) at ./nptl/pthread_mutex_lock.c:93
#4 0x00005648f0d551d1 in ?? ()
#5 0x00005648f0d625b0 in ?? ()
#6 0x00005648f0cf2417 in ?? ()
#7 0x00005648f0dd4436 in flb_log_event_encoder_dynamic_field_scope_leave ()
#8 0x00005648f0dd46aa in flb_log_event_encoder_dynamic_field_reset ()
#9 0x00005648f0dd2891 in flb_log_event_encoder_reset_record ()
#10 0x00005648f0dd2979 in flb_log_event_encoder_emit_record ()
#11 0x00005648f0db459d in flb_ml_flush_stream_group ()
#12 0x00005648f0db4cd5 in flb_ml_flush_parser_instance ()
#13 0x00005648f0db4d91 in flb_ml_flush_pending ()
#14 0x00005648f0da0446 in flb_sched_event_handler ()
#15 0x00005648f0d9c7c8 in flb_engine_start ()
#16 0x00005648f0d79268 in ?? ()
#17 0x00007fbe505a5a94 in start_thread (arg=) at ./nptl/pthread_create.c:447
#18 0x00007fbe50632c3c in clone3 () at ../sysdeps/unix/sysv/linux/x86_64/clone3.S:78
There are similar stack traces for other flb_log_event_encoder functions.
Example stack trace for a segmentation fault crash:
[2025/01/09 08:36:09] [engine] caught signal (SIGSEGV)
[2025/01/09 08:36:09] [engine] caught signal (SIGSEGV)
#0 0x55a8643027c8 in cfl_list_add_before() at lib/cfl/include/cfl/cfl_list.h:130
#1 0x55a864302832 in cfl_list_prepend() at lib/cfl/include/cfl/cfl_list.h:154
#2 0x55a8643063f2 in flb_log_event_encoder_dynamic_field_scope_enter() at src/flb_log_event_encoder_dynamic_field.c:67
#3 0x55a864306524 in flb_log_event_encoder_dynamic_field_begin_array() at src/flb_log_event_encoder_dynamic_field.c:124
#4 0x55a8642fbab2 in flb_log_event_encoder_emit_record() at src/flb_log_event_encoder.c:168
#5 0x55a8642fbd1c in flb_log_event_encoder_commit_record() at src/flb_log_event_encoder.c:267
#6 0x55a8642806a0 in flb_ml_flush_stream_group() at src/multiline/flb_ml.c:1505
#7 0x55a86427d92a in flb_ml_flush_parser_instance() at src/multiline/flb_ml.c:117
#8 0x55a86427d9e0 in flb_ml_flush_pending() at src/multiline/flb_ml.c:137
#9 0x55a86427da93 in cb_ml_flush_timer() at src/multiline/flb_ml.c:163
#10 0x55a864225b73 in flb_sched_event_handler() at src/flb_scheduler.c:624
#11 0x55a864216cf7 in flb_engine_start() at src/flb_engine.c:1044
#12 0x55a8641ae5d4 in flb_lib_worker() at src/flb_lib.c:763
#13 0x7f2ac7abaa93 in start_thread() at c:447
#14 0x7f2ac7b47c3b in clone3() at inux/x86_64/clone3.S:78
#15 0xffffffffffffffff in ???() at ???:0
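The crash happens in `cfl_list_add_before()` while `flb_log_event_encoder_dynamic_field_scope_enter()` prepends a scope to a list. Inserting into a doubly linked list takes several separate pointer writes, so two unsynchronized inserts on the same list can interfere. The sketch below replays one such interleaving deterministically in a single thread. It uses a hypothetical minimal circular list modeled loosely on Linux-style intrusive lists; it is not cfl's actual code, and `serial_prepends()` / `racy_prepends()` are illustrative names.

```c
#include <stddef.h>

/* Minimal circular doubly linked list (hypothetical, not cfl_list). */
struct node {
    struct node *prev;
    struct node *next;
};

static void list_init(struct node *head)
{
    head->prev = head;
    head->next = head;
}

/* Insert n right after head: two writes on n, two on its neighbors. */
static void list_prepend(struct node *n, struct node *head)
{
    struct node *first = head->next;
    n->next = first;
    n->prev = head;
    first->prev = n;
    head->next = n;
}

/* Counts nodes by walking forward from head. */
static size_t list_count(const struct node *head)
{
    size_t count = 0;
    for (const struct node *p = head->next; p != head; p = p->next) {
        count++;
    }
    return count;
}

/* Two prepends run one after the other: both nodes end up linked. */
size_t serial_prepends(void)
{
    struct node head, a, b;
    list_init(&head);
    list_prepend(&a, &head);
    list_prepend(&b, &head);
    return list_count(&head);
}

/* Replays one racy interleaving: both "threads" read head->next before
 * either publishes, so B's stores overwrite A's and A is lost. */
size_t racy_prepends(void)
{
    struct node head, a, b;
    list_init(&head);

    struct node *a_first = head.next;    /* thread A reads */
    struct node *b_first = head.next;    /* thread B reads the same value */

    a.next = a_first;                    /* A links itself */
    a.prev = &head;
    b.next = b_first;                    /* B links itself */
    b.prev = &head;

    a_first->prev = &a;                  /* A publishes */
    head.next = &a;
    b_first->prev = &b;                  /* B publishes over A */
    head.next = &b;

    return list_count(&head);
}
```

`serial_prepends()` returns 2 while `racy_prepends()` returns 1: A's node is silently unlinked even though A's own pointers still refer into the list. This replay only shows a lost node; in a real program, a node lost or freed this way can later be dereferenced, which is one plausible path to a SIGSEGV like the one above.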
@nokute78 (cc @edsiper) Is there a reason #6765 was not merged (and updated to the current code base)?
To Reproduce
1. Use the tail input plugin (we use globs to match multiple files).
2. Use the multiline filter with threaded mode enabled.
3. Put enough load on it and watch it crash, or inspect the deadlock with gdb (e.g. gdb -p <pid> --batch -ex "thread apply all bt" -ex "detach" -ex "quit").
Your Environment
Version used: 3.2.4 (but the issue has existed for many versions)
Maybe related:
According to the v2.0.2 announcement, the memory ring buffer size (mem_buf_limit) should be no less than 20M. As far as I understand the code, in_emitter uses memrb when the multiline filter runs in threaded mode.
However, as I already mentioned in #8473, there is a strange (and most probably wrong) assignment at line 245 of fluent-bit/plugins/in_emitter/emitter.c (commit 9652b0d).
The default value for the flush frequency is 2000, so I assume this sets the ring buffer size to only 2k. Can you please verify this, @nokute78 @edsiper @leonardo-albertovich @pwhelan?
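To put the suspected mismatch in numbers: if the flush value of 2000 really ends up as the ring buffer size, it falls far short of the 20M recommendation. The check below assumes that both quantities are byte counts (the report does not confirm the unit) and that 20M follows Fluent Bit's decimal size suffixes (20,000,000 bytes). `rb_size_meets_recommendation()` and `rb_shortfall_factor()` are hypothetical helpers, not Fluent Bit code.

```c
#include <stddef.h>

/* Assumption: "20M" means 20,000,000 bytes (decimal suffix) and the
 * ring buffer size is also expressed in bytes. */
#define RECOMMENDED_MIN_RB_BYTES ((size_t) 20 * 1000 * 1000)

/* Hypothetical check: does a ring buffer size meet the recommendation? */
int rb_size_meets_recommendation(size_t rb_bytes)
{
    return rb_bytes >= RECOMMENDED_MIN_RB_BYTES;
}

/* How many times smaller a size is than the recommendation
 * (integer division; 0 for a zero size). */
size_t rb_shortfall_factor(size_t rb_bytes)
{
    return rb_bytes == 0 ? 0 : RECOMMENDED_MIN_RB_BYTES / rb_bytes;
}
```

Under those assumptions, `rb_shortfall_factor(2000)` is 10000, i.e. the buffer would be four orders of magnitude smaller than recommended.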
@nokute78 @edsiper Our recent observations show that, in addition to segfaults and deadlocks, we are also seeing log corruption, where log entries get mixed up. This appears to be a significant issue.
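Mixed-up entries are consistent with two threads writing parts of different records into the same encoder state. Besides locking, another general remedy is thread confinement: each thread builds its records in its own buffer, so fragments from different threads cannot mix. Below is a minimal sketch using C11 `_Thread_local`; `scratch`, `build_record()` and `demo_thread_local_records()` are hypothetical names, and this does not describe how Fluent Bit actually manages its encoders.

```c
#define _POSIX_C_SOURCE 200809L
#include <pthread.h>
#include <sched.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical per-thread scratch buffer: every thread gets its own
 * copy, so partially built records from different threads cannot mix. */
static _Thread_local char scratch[64];
static _Thread_local size_t scratch_len;

static void scratch_append(char c)
{
    if (scratch_len < sizeof(scratch) - 1) {
        scratch[scratch_len++] = c;
        scratch[scratch_len] = '\0';
    }
}

struct job {
    char fill;          /* character this thread writes */
    int count;          /* characters to append (must stay below 64) */
    char result[64];    /* copy of the finished record */
};

static void *build_record(void *p)
{
    struct job *job = p;
    scratch_len = 0;
    scratch[0] = '\0';
    for (int i = 0; i < job->count; i++) {
        scratch_append(job->fill);
        sched_yield();  /* encourage interleaving with the other thread */
    }
    memcpy(job->result, scratch, scratch_len + 1);
    return NULL;
}

/* Returns 1 if s consists of exactly n copies of c. */
static int all_same(const char *s, char c, int n)
{
    if ((int) strlen(s) != n) {
        return 0;
    }
    for (int i = 0; i < n; i++) {
        if (s[i] != c) {
            return 0;
        }
    }
    return 1;
}

/* Two threads build records concurrently; returns 1 if neither record
 * contains fragments from the other thread. */
int demo_thread_local_records(int count)
{
    struct job a = { 'A', count, {0} };
    struct job b = { 'B', count, {0} };
    pthread_t ta, tb;

    pthread_create(&ta, NULL, build_record, &a);
    pthread_create(&tb, NULL, build_record, &b);
    pthread_join(ta, NULL);
    pthread_join(tb, NULL);
    return all_same(a.result, 'A', count) && all_same(b.result, 'B', count);
}
```

`demo_thread_local_records(40)` returns 1 because each result contains only its own thread's character. Confinement avoids lock contention, but finished records still have to be handed off to shared structures afterwards, and that hand-off needs its own synchronization.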