HQ crashes #743
Comments
Hi, thanks for the report. It looks like you have cut out the most important part of the stack trace though (you only sent the lines starting at stack frame #97) :) Could you please include the whole stack trace? Thanks!
Sorry about that, this is the whole stack trace: thread 'main' panicked at crates/tako/src/internal/server/worker.rs:126:9: HyperQueue version: v0.19.0 You can also re-run HyperQueue server (and its workers) with the
Oops, that looks like a race condition; we will take a look. If you can reproduce the error, could you please run the server with the following environment variable: It would also be great to know how you create workers (manually/autoalloc?) and what
@chkabir Were you able to reproduce the issue and/or run HQ with more logging? :)
Hi,
I was running an HQ server on the Oven node at metacentrum.cz. The Oven node is supposed to be explicitly designed to let processes run for a long time, even past their walltime. However, in the most recent runs the HQ server keeps crashing. Below I attach the relevant statements from the log file:
97: 0x557345bba500 - main
98: 0x154fcfff624a - __libc_start_call_main
at ./csu/../sysdeps/nptl/libc_start_call_main.h:58:16
99: 0x154fcfff6305 - __libc_start_main_impl
at ./csu/../csu/libc-start.c:360:3
100: 0x557345ad4049 -
101: 0x0 -
Oops, HyperQueue has crashed. This is a bug, sorry for that.
If you would be so kind, please report this issue at the HQ issue tracker: https://github.com/It4innovations/hyperqueue/issues/new?title=HQ%20crashes
Please include the above error (starting from "thread ... panicked ...") and the stack backtrace in the issue contents, along with the following information:
HyperQueue version: v0.19.0
You can also re-run HyperQueue server (and its workers) with the RUST_LOG=hq=debug,tako=debug environment variable, and attach the logs to the issue, to provide us more information.
Could you kindly look into this error?
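For anyone following the suggestion in the crash message, re-running the server with the extra logging and keeping the output in a file might look roughly like the sketch below. It assumes the `hq` binary is on `PATH` and that the server is launched with `hq server start`; the helper names and the log file name `hq-server-debug.log` are made up for illustration and are not part of HyperQueue.

```python
import os
import subprocess

# Log filter quoted from the HyperQueue crash message.
RUST_LOG_VALUE = "hq=debug,tako=debug"


def debug_env(base_env=None):
    """Return a copy of the environment with RUST_LOG set for debug logging."""
    env = dict(os.environ if base_env is None else base_env)
    env["RUST_LOG"] = RUST_LOG_VALUE
    return env


def start_server_with_debug_logs(log_path="hq-server-debug.log"):
    """Launch `hq server start` with debug logging, writing all output to log_path.

    Assumes `hq` is on PATH; the log file name is only an example.
    """
    with open(log_path, "w") as log_file:
        # The child process keeps its own handle to the file after we close ours.
        return subprocess.Popen(
            ["hq", "server", "start"],
            env=debug_env(),
            stdout=log_file,
            stderr=subprocess.STDOUT,
        )
```

Calling `start_server_with_debug_logs()` returns the process handle, and the resulting log file can then be attached to the issue. Workers are separate processes, so they would need the same variable set in their own environment.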