sdexec: add stop timeout to handle unkillable processes #6666
mergify[bot] merged 11 commits into flux-framework:master
Problem: job-exec may need to set TimeoutStopUSec=infinity to disable systemd's stop timeout that clobbers the IMP with a SIGKILL. Since this property has a non-string value, it needs to be special cased in sdexec_start_transient_unit() so it can be represented using the proper dbus type. Add it, with "infinity" support.
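As a rough illustration of the special casing described above, the sketch below converts a TimeoutStopUSec string into the unsigned 64-bit microsecond count that the D-Bus property expects. It is written in Python for brevity and is not the libsdexec code; the function name `encode_timeout_usec` is hypothetical, and treating any other input as a plain integer number of microseconds is an assumption.

```python
# systemd represents an "infinity" microsecond value as UINT64_MAX.
USEC_INFINITY = 2**64 - 1


def encode_timeout_usec(value: str) -> int:
    """Map a property string to a uint64 microsecond count (hypothetical helper)."""
    if value.strip().lower() == "infinity":
        return USEC_INFINITY
    # Assumption: anything else is a plain integer number of microseconds.
    usec = int(value)
    if not 0 <= usec < USEC_INFINITY:
        raise ValueError(f"timeout out of range: {value!r}")
    return usec
```

For example, `encode_timeout_usec("infinity")` yields the largest unsigned 64-bit value, which systemd interprets as "no stop timeout".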
Problem: there is no test that ensures sdexec can communicate TimeoutStopUSec to systemd. Add some tests.
Problem: Type is a regular systemd property, but libsdexec requires it to be set in a different way than other properties. Drop the type parameter from sdexec_start_transient_unit(). If SDEXEC_PROP_Type is not explicitly set in the command options, set Type=exec, which was the value hard coded by sdexec. Update sdexec. Update unit test.
Problem: there is no test for sdexec Type property handling. Add some tests.
d875964 to 3fa2c84
Ugh I just realized this needs to drain any node that ends up with an abandoned stuck unit.
Problem: an sdexec imp-shell unit can run into the following problem:
- flux-shell is killed/terminates
- there are unkillable children of flux-shell
- the IMP won't exit until the cgroup is empty
- the job remains in R state

This adds a configurable stop timer to sdexec that is triggered when the unit enters deactivating state. Disabled by default, the timer is configured via the following subprocess command options:

SDEXEC_STOP_TIMER_SEC
  Specify the timeout value in seconds. If non-negative, this enables the stop timer.

SDEXEC_STOP_TIMER_SIGNAL
  Specify a signal to send to the unit after the timeout. By default, SIGKILL is used.

The behavior of the stop timer is as follows:
- The timer is activated when the unit enters "deactivating" state.
- After STOP_TIMER_SEC seconds, STOP_TIMER_SIGNAL is sent to the unit.
- After another STOP_TIMER_SEC seconds, the unit is abandoned and the subprocess exec RPC is terminated with an EDEADLK error.

To solve the stated problem, the stop timer must be used with job-exec changes to run the unit with Type=notify, in conjunction with changes to the IMP to call sd_notify() STOPPING=1 when the shell exits. The job-exec changes are coming in a future commit.
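The timer behavior described above can be modeled as a small two-stage state machine. The following is a minimal Python sketch for illustration only, not the sdexec implementation: the `StopTimer` class, the `Action` enum, and the polling interface are hypothetical, and time is passed in explicitly so the logic can be exercised without a reactor.

```python
import signal
from enum import Enum, auto


class Action(Enum):
    NONE = auto()         # nothing to do yet
    SEND_SIGNAL = auto()  # first expiration: signal the unit
    ABANDON = auto()      # second expiration: give up, fail the RPC with EDEADLK


class StopTimer:
    """Hypothetical model of the two-stage stop timer."""

    def __init__(self, timeout_sec=-1.0, signum=signal.SIGKILL):
        self.timeout_sec = timeout_sec  # negative means disabled (the default)
        self.signum = signum            # SIGKILL unless overridden
        self.deadline = None
        self.expirations = 0

    def on_deactivating(self, now):
        """Arm the timer when the unit enters the deactivating state."""
        if self.timeout_sec >= 0 and self.deadline is None:
            self.deadline = now + self.timeout_sec

    def poll(self, now):
        """Report what should happen at time `now`."""
        if self.deadline is None or now < self.deadline:
            return Action.NONE
        self.expirations += 1
        if self.expirations == 1:
            # Re-arm for a second period of the same length.
            self.deadline = now + self.timeout_sec
            return Action.SEND_SIGNAL
        self.deadline = None
        return Action.ABANDON
```

With a 30 second timeout armed at time 0, polling at 30 returns `Action.SEND_SIGNAL` and polling at 60 returns `Action.ABANDON`; a timer left at the default negative timeout never fires.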
Problem: jobs remain in R state when the flux-shell exits with unkillable processes. Run imp-shell units with Type=notify and the new sdexec stop timer. Disable systemd's stop timer by setting TimeoutStopUSec=infinity. This assumes the IMP has been modified to call sd_notify(3) at appropriate transitions. The stop timer, which is enabled by default with a timeout of 30s and signal of SIGUSR1, may be configured or disabled via the TOML [exec] table. Fixes flux-framework#6656
3fa2c84 to ade2b54
OK, pushed a change that handles the stop timeout specifically rather than just letting job-exec treat it like a generic failure. Here is a test where the and The unit is still running
grondo
left a comment
This LGTM! Nice improvement.
I will get a PR to update flux-security to v0.14.0 in CI.
doc/man5/flux-config-exec.rst
Outdated
sdexec-stop-timer-sec
   (optional) Configure the length of time in seconds after a unit enters
   deactivating state when it will be sent the sdexec-stop-timer-signal.
   After the same length of time, if the unit hasn't terminated, for example
   due to unkillable processes, the unit will be abandoned and the exec RPC
   is terminated. Default 30.
Suggestion: Remind the reader here what triggers the unit entering deactivating state.
thanks, fixed that and forced a push.
I forgot that I didn't add tests for the SDEXEC_TIMER stuff. However, if you are OK with merging this as is I'm fine with that and time permitting, I can follow on with some test improvements in another PR.
Actually that's just a one liner in
Problem: the sdexec-stop-timer config options are undocumented. Add sdexec-stop-timer-signal and sdexec-stop-timer-sec options to the flux-config-exec(5) man page.
Problem: when sdexec's stop timer terminates a subprocess exec RPC, the node is not drained and a subsequent job could be co-located with the unkillable processes. Add specific handling for an EDEADLK error that drains the node and then behaves the same as if an EHOSTUNREACH error were received.
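The error handling described above amounts to draining first and then falling through to the existing EHOSTUNREACH path. Here is a hedged Python sketch of that control flow; `handle_exec_error`, the `drain` callback, the drain reason string, and the returned labels are all hypothetical stand-ins rather than job-exec's actual interfaces.

```python
import errno


def handle_exec_error(errnum, rank, drain):
    """Classify a subprocess exec RPC error (hypothetical sketch)."""
    if errnum == errno.EDEADLK:
        # The stop timer abandoned the unit: drain the node so no new job
        # is co-located with the unkillable processes.
        drain(rank, "sdexec stop timer expired, unit abandoned")
        # From here on, behave as if EHOSTUNREACH had been received.
        errnum = errno.EHOSTUNREACH
    if errnum == errno.EHOSTUNREACH:
        return "node-lost"
    return "generic-failure"
```

Only the EDEADLK case triggers the drain; a plain EHOSTUNREACH takes the same return path without draining.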
ade2b54 to 6ef0d5b
That should probably go here actually. I'll tack that on.
Problem: sdexec now launches work in transient Type=notify containers, but for that to work, a flux-security version that calls sd_notify() is required. Bump the minimum required version of flux-security.
Problem: flux-security 0.14 or greater is required now but the Debian control file requires 0.13. Bump the version.
Problem: flux-security 0.13.0 is installed by default in CI but flux-core now requires 0.14.0 or greater. Bump the version.
Yeah, it kind of seems reasonable to merge this one as is to keep things moving along.
Great, thanks! Setting MWP then.
Codecov Report
Attention: Patch coverage is
Additional details and impacted files
@@ Coverage Diff @@
## master #6666 +/- ##
==========================================
- Coverage 83.90% 83.86% -0.04%
==========================================
Files 533 533
Lines 88593 88678 +85
==========================================
+ Hits 74332 74374 +42
- Misses 14261 14304 +43
Problem: in a system instance using sdexec, jobs may remain in R state when the shell exits with unkillable processes.
This is because the IMP doesn't exit until the unit cgroup is empty, and the IMP must exit for sdexec to collect the shell's exit status.
This PR, in conjunction with flux-framework/flux-security#201, arranges for the unit to enter deactivating state once the shell exits. An sdexec timeout begins at that point. If after a configurable period, the unit has not stopped, SIGUSR1 (IMP proxy for SIGKILL) is sent to the unit. If after another period the unit still has not stopped, it is abandoned and the subprocess protocol is terminated so the job can advance and release resources.
The abandoned unit remains in deactivating state indefinitely, and can be picked up by the sdmon monitoring service proposed in #6616