fix(duplication): deal with bulkload dup crash #2101

Open
wants to merge 6 commits into
base: master
from

Conversation

ninsmiracle
Contributor

What problem does this PR solve?

#2100

What is changed and how does it work?

Make the replica server skip mutations with the RPC_RRDB_RRDB_BULK_LOAD task code when replaying the plog.
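The idea can be sketched as a filter applied to replayed mutations before they reach the duplicator. This is a minimal illustration only, not the code in this PR: `LoggedMutation`, `kBulkLoadCode`, and `filter_for_duplication` are hypothetical names, and task codes are modeled as plain strings rather than `dsn::task_code`.

```cpp
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical stand-in for one mutation read back from the plog.
// The real pipeline uses std::tuple<uint64_t, dsn::task_code, dsn::blob>.
struct LoggedMutation {
    std::uint64_t timestamp;
    std::string task_code;  // modeled as a string for illustration
    std::string payload;
};

// The task code named in this PR, as a plain string (assumption).
const std::string kBulkLoadCode = "RPC_RRDB_RRDB_BULK_LOAD";

// Returns the mutations that should be shipped to the remote cluster,
// dropping bulk load entries so they never reach the duplicator.
std::vector<LoggedMutation> filter_for_duplication(
    const std::vector<LoggedMutation>& replayed) {
    std::vector<LoggedMutation> out;
    out.reserve(replayed.size());
    for (const auto& m : replayed) {
        if (m.task_code == kBulkLoadCode) {
            continue;  // skip: the duplicator cannot hash this request
        }
        out.push_back(m);
    }
    return out;
}
```

Filtering at replay time keeps the unsupported task code away from the code path that aborts on an unexpected task code.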

Tests
  • Unit test
  • Manual test

@ninsmiracle
Contributor Author

Without this fix, the cluster will core dump like this (if users run a bulk load on a table with duplication enabled):

(gdb) bt
#0  0x0000000000000001 in ?? ()
#1  0x00007f8c97f3698d in ?? () from /lib64/libgcc_s.so.1
#2  0x00007f8c97f36ded in _Unwind_Resume () from /lib64/libgcc_s.so.1
#3  0x00007f8c9cb5953e in ~_Function_base (this=<optimized out>, __in_chrg=<optimized out>) at /opt/rh/devtoolset-7/root/usr/include/c++/7/bits/std_function.h:276
#4  ~function (this=<optimized out>, __in_chrg=<optimized out>) at /opt/rh/devtoolset-7/root/usr/include/c++/7/bits/std_function.h:389
#5  dsn::replication::ship_mutation::ship(std::set<std::tuple<unsigned long, dsn::task_code, dsn::blob>, dsn::replication::mutation_tuple_cmp, std::allocator<std::tuple<unsigned long, dsn::task_code, dsn::blob> > >&&) (this=this@entry=0x2d110f950, in=...) at /home/work/test/pegasus/src/rdsn/src/replica/duplication/duplication_pipeline.cpp:71
#6  0x00007f8c9cb59b27 in dsn::replication::ship_mutation::run(long&&, std::set<std::tuple<unsigned long, dsn::task_code, dsn::blob>, dsn::replication::mutation_tuple_cmp, std::allocator<std::tuple<unsigned long, dsn::task_code, dsn::blob> > >&&) (this=0x2d110f950, last_decree=<optimized out>, in=...) at /home/work/test/pegasus/src/rdsn/src/replica/duplication/duplication_pipeline.cpp:88
#7  0x00007f8c9ccc1f11 in dsn::task::exec_internal (this=this@entry=0xb2df000f0) at /home/work/test/pegasus/src/rdsn/src/runtime/task/task.cpp:176
#8  0x00007f8c9ccd75c2 in dsn::task_worker::loop (this=0x2584d10) at /home/work/test/pegasus/src/rdsn/src/runtime/task/task_worker.cpp:224
#9  0x00007f8c9ccd7740 in dsn::task_worker::run_internal (this=0x2584d10) at /home/work/test/pegasus/src/rdsn/src/runtime/task/task_worker.cpp:204
#10 0x00007f8c9b953a1f in execute_native_thread_routine () from /home/work/app/pegasus/alsgsrv-monetization-master/replica/package/bin/libdsn_utils.so
#11 0x00007f8c9975edc5 in start_thread () from /lib64/libpthread.so.0
#12 0x00007f8c97c5d73d in clone () from /lib64/libc.so.6
(gdb) 

And stdout looks like this:

F2024-08-08 16:26:36.249 (1723105596249247134 67179) replica.replica20.0406000900422908: pegasus_mutation_duplicator.cpp:85:get_hash_from_request(): unexpected task code: RPC_RRDB_RRDB_BULK_LOAD

github-actions bot added the cpp label on Aug 21, 2024
@ninsmiracle
Contributor Author

@acelyc111 Should I fix .github/workflows/module_labeler_conf.yml to deal with the CI stage Module Labeler?

@acelyc111
Member

@acelyc111 Should I fix .github/workflows/module_labeler_conf.yml to deal with the CI stage Module Labeler?

I will. Just ignore it for now.

acelyc111 changed the title from "fix(duplication): deal with bulkload dup crush" to "fix(duplication): deal with bulkload dup crash" on Sep 13, 2024
Member

acelyc111 left a comment

Thanks! @ninsmiracle

2 participants