The state.json should be generated prior to the creation of the cgroup. #4535
Conversation
@kolyshkin Could you help to check this?
libcontainer/process_linux.go (outdated)

```diff
@@ -561,6 +561,13 @@ func (p *initProcess) start() (retErr error) {
 	}
 }()

+	// A SIGKILL can happen at any time, and without the state.json,
+	// the 'runc delete --force' command won't be able to clear the cgroup.
+	_, err = p.container.updateState(p)
```
Thanks.
Note that we write the pid of the init process to state.json. So when we do delete -f, if the init process is already dead but the runc stage-2 process is still alive, I think we may still be unable to remove this cgroup path, because there is still one process left in the cgroup.
In my testing, I discovered that when systemd is used to manage the cgroup, it gets cleaned up once runc init has exited. However, if we don't use systemd, some remnants of the cgroup are left behind.
If we're not using systemd, we might need to add a check in runc delete to see whether runc init is still listed in cgroup.procs. This shouldn't take long, since runc init is bound to exit with an error: once runc init reaches steps like procHooks that require synchronization with the parent process, it fails because the parent has already been terminated, so reading from or writing to the pipe errors out and runc init exits.
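The check described above could be sketched as follows. This is a minimal illustration, not runc's actual code: the helper name is hypothetical, and it parses the contents of a cgroup.procs file (one PID per line) rather than opening the real cgroupfs path.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// pidInCgroupProcs reports whether pid appears in the given contents of a
// cgroup.procs file (one PID per line). Hypothetical helper sketching the
// "is runc init still in the cgroup?" check suggested in the comment above.
func pidInCgroupProcs(procs string, pid int) bool {
	for _, line := range strings.Split(procs, "\n") {
		line = strings.TrimSpace(line)
		if line == "" {
			continue
		}
		if p, err := strconv.Atoi(line); err == nil && p == pid {
			return true
		}
	}
	return false
}

func main() {
	procs := "1234\n5678\n"
	fmt.Println(pidInCgroupProcs(procs, 5678)) // runc init still present: wait before rmdir
	fmt.Println(pidInCgroupProcs(procs, 9999)) // not present: safe to remove the cgroup
}
```

In a real implementation the delete path would re-read cgroup.procs in a short retry loop, since runc init is expected to exit on its own once the pipe to the dead parent breaks.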
I pasted the CI error messages here; you can refer to them if you can't see the logs.
While I generally agree this is a bug which should be fixed, I don't like the way it is fixed. The issues are:
- a lot of code duplication;
- API bloat (we now have LoadCreatingState and DestroyCreating -- does the libcontainer user really have to care about all this?);
- maybe some bugs (e.g. creating-state.json is removed after state.json is written, not at the same time).

Can we reuse the same state.json, and consider the state "creating" if the init pid is not known?

Also, it would be nice to have a test case added (somehow).
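The reviewer's suggestion of inferring the "creating" phase from a single state.json could look roughly like this. The struct and status names below are illustrative only, not runc's actual types: the point is that a state file with no recorded init pid is evidence the container died mid-creation.

```go
package main

import "fmt"

// savedState is a hypothetical stand-in for the data persisted in
// state.json; in this sketch only the init pid matters.
type savedState struct {
	InitProcessPid int // 0 until runc init's PID has been recorded
}

// status infers the container phase from the saved state: if state.json
// exists but the init pid was never written, creation did not complete,
// so a forced delete should still clean up the cgroup.
func status(s savedState) string {
	if s.InitProcessPid == 0 {
		return "creating"
	}
	return "created"
}

func main() {
	fmt.Println(status(savedState{}))                   // creating
	fmt.Println(status(savedState{InitProcessPid: 42})) // created
}
```

This avoids both the separate creating-state.json file and the extra Load/Destroy API surface, at the cost of overloading one field's zero value with phase information.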
Thank you, I'll make some adjustments. However, I'm still not sure how to add a test case: this bug isn't easily reproducible unless we simulate a timeout by adding a sleep before runc creates the state.json file.
Fix 4534
Make sure that the state.json is in place before setting up the cgroup or writing 'THAWED' into freezer.state. This way, the 'runc delete --force' command will work as expected.