Assertion `worker->inprogress++ == 0' failed #10039

Open
pereverges opened this issue Aug 2, 2024 · 3 comments

pereverges commented Aug 2, 2024

Describe the bug

I compiled the code on my laptop, where it runs without problems. After porting it to a server, I intermittently run into the error below. It does not happen on every run, and I have not been able to determine what triggers it.

[gs07r1b29:3935050:2:3935480] ucp_worker.c:2990 Assertion `worker->inprogress++ == 0' failed
backtrace (tid:3935480) ====
0 /gpfs/apps/MN5/GPP/UCX/1.16.0/INTEL/lib/libucs.so.0.0.0(ucs_handle_error+0x3f4) [0x7f5584b05704]
1 /gpfs/apps/MN5/GPP/UCX/1.16.0/INTEL/lib/libucs.so.0.0.0(ucs_fatal_error_message+0xec) [0x7f5584b02b9c]
2 /gpfs/apps/MN5/GPP/UCX/1.16.0/INTEL/lib/libucs.so.0.0.0(ucs_fatal_error_format+0x103) [0x7f5584b02aa3]
3 /gpfs/apps/MN5/GPP/UCX/1.16.0/INTEL/lib/libucp.so.0.0.0(ucp_worker_progress+0x1a3) [0x7f5552cfb433]
4 [0x7f5515415e5b]

[gs07r1b29:3935050:1:3935478] ucp_worker.c:2995 Assertion `--worker->inprogress == 0' failed
backtrace (tid:3935478) ====
0 /gpfs/apps/MN5/GPP/UCX/1.16.0/INTEL/lib/libucs.so.0.0.0(ucs_handle_error+0x3f4) [0x7f5584b05704]
1 /gpfs/apps/MN5/GPP/UCX/1.16.0/INTEL/lib/libucs.so.0.0.0(ucs_fatal_error_message+0xec) [0x7f5584b02b9c]
2 /gpfs/apps/MN5/GPP/UCX/1.16.0/INTEL/lib/libucs.so.0.0.0(ucs_fatal_error_format+0x103) [0x7f5584b02aa3]
3 /gpfs/apps/MN5/GPP/UCX/1.16.0/INTEL/lib/libucp.so.0.0.0(ucp_worker_progress+0xd3) [0x7f5552cfb363]
4 [0x7f5515415e5b]

Steps to Reproduce

Run an application that uses stream send / stream receive through jucx and is structured similarly to UCXBenchmark.

Setup and versions

  • Linux + CPU architecture x86_64
    Using
    export UCX_TLS=ud_mlx5
    export UCX_NET_DEVICES=mlx5_2:1
@pereverges pereverges added the Bug label Aug 2, 2024
Contributor

yosefe commented Aug 5, 2024

This looks like a problem with how multi-threading support is enabled. If the application is multi-threaded, UCX has to be compiled with multi-thread support (--enable-mt), and ucp_worker_create has to be called with ucp_worker_params_t::thread_mode = UCS_THREAD_MODE_MULTI.
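
To illustrate why concurrent calls trip this kind of check, here is a minimal Java sketch. It does not use UCX or jucx: `FakeWorker` and its `inprogress` counter are hypothetical stand-ins that only mimic the shape of the `worker->inprogress++ == 0` assertion from the report. The first method forces two threads to overlap inside `progress()`, which makes the guard fire. The second serializes every call with one lock, so the guard never fires. Whether external locking is sufficient for a particular UCX build is an assumption this thread does not settle; the recommendation above is to request a thread-safe worker.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.concurrent.locks.ReentrantLock;

final class ProgressGuardDemo {

    /** Hypothetical stand-in for a worker that expects one caller at a time. Not the jucx API. */
    static final class FakeWorker {
        private final AtomicInteger inprogress = new AtomicInteger();

        /** Runs body inside the guard; throws if another call is already active. */
        void progress(Runnable body) {
            if (inprogress.getAndIncrement() != 0) {   // same shape as `inprogress++ == 0`
                inprogress.decrementAndGet();          // undo our increment before failing
                throw new IllegalStateException("progress() entered while another call is active");
            }
            try {
                body.run();
            } finally {
                inprogress.decrementAndGet();
            }
        }
    }

    private static void await(CountDownLatch latch) {
        try {
            latch.await();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new RuntimeException(e);
        }
    }

    private static void join(Thread t) {
        try {
            t.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new RuntimeException(e);
        }
    }

    /** Forces two threads to overlap inside progress(); returns true if the guard fired. */
    static boolean overlappingCallsTripGuard() {
        FakeWorker worker = new FakeWorker();
        CountDownLatch inside = new CountDownLatch(1);
        CountDownLatch release = new CountDownLatch(1);
        // The first thread enters progress() and stays there until released.
        Thread first = new Thread(() -> worker.progress(() -> {
            inside.countDown();
            await(release);
        }));
        first.start();
        await(inside);
        boolean tripped = false;
        try {
            worker.progress(() -> { });   // second caller arrives while the first is still inside
        } catch (IllegalStateException e) {
            tripped = true;
        } finally {
            release.countDown();
        }
        join(first);
        return tripped;
    }

    /** Serializes every call with one lock; returns how many times the guard fired. */
    static int failuresWithLock(int threads, int callsPerThread) {
        FakeWorker worker = new FakeWorker();
        ReentrantLock lock = new ReentrantLock();
        AtomicInteger failures = new AtomicInteger();
        Thread[] pool = new Thread[threads];
        for (int i = 0; i < threads; i++) {
            pool[i] = new Thread(() -> {
                for (int c = 0; c < callsPerThread; c++) {
                    lock.lock();
                    try {
                        worker.progress(() -> { });
                    } catch (IllegalStateException e) {
                        failures.incrementAndGet();
                    } finally {
                        lock.unlock();
                    }
                }
            });
            pool[i].start();
        }
        for (Thread t : pool) {
            join(t);
        }
        return failures.get();
    }

    public static void main(String[] args) {
        System.out.println("overlap trips guard: " + overlappingCallsTripGuard());
        System.out.println("failures with lock:  " + failuresWithLock(4, 1000));
    }
}
```

The two backtraces in the report come from different thread IDs (3935480 and 3935478), both inside ucp_worker_progress, which is consistent with the overlapping-call case sketched here.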

Author

pereverges commented Aug 5, 2024 via email

I am using jucx, the Java binding. How do I have to call it in that case?

Contributor

yosefe commented Aug 5, 2024

See https://github.com/openucx/ucx/blob/master/bindings/java/src/test/java/org/openucx/jucx/UcpWorkerTest.java#L41 - requestThreadSafety
