Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

The state.json should be generated prior to the creation of the cgroup. #4535

Open
wants to merge 1 commit into
base: main
Choose a base branch
from

Conversation

jianghao65536
Copy link

Fix 4534

Make sure that the state.json is in place before setting up the cgroup or writing 'THAWED' into the freezer.state. This way, the 'runc delete --force' command will work as expected.

@wxx213
Copy link

wxx213 commented Nov 19, 2024

@kolyshkin Could you help to check this?

@@ -561,6 +561,13 @@ func (p *initProcess) start() (retErr error) {
}
}()

// A SIGKILL can happen at any time, and without the state.json,
// the 'runc delete --force' command won't be able to clear the cgroup.
_, err = p.container.updateState(p)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks.
We should know that we will write the pid of init process to state.json, so when we do delete -f, once this init process has dead, but runc stage 2 process is alive, I think maybe we still can't remove this cgroup path either because there is still one process in this cgroup.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In my testing, I discovered that when systemd is used to handle cgroup, it gets cleaned up once runc init has exited. However, if we don't use systemd, some remnants of cgroup are left behind.

If we're not using systemd to manage, we might need to do a check in runc delete to see if runc init is still listed in cgroups.procs. This shouldn't take too much time since runc init is bound to exit due to an error. This error happens because when runc init gets to parts like procHooks that require synchronization with the parent process, it fails as the parent process has already been terminated, leading to errors when runc init tries to write or read values from the pipeline, and consequently, it exits.

{BF5FEB57-6023-4623-A04F-7C96FC189661} {F284436E-31E8-4967-AE89-18A124B1C24C}

@lifubang
Copy link
Member

I paste the CI error msgs here, you can refer it if you can't see the logs.

  • libcontainer/process_linux.go:564: File is not gofumpt-ed (gofumpt)
    // A SIGKILL can happen at any time, and without the state.json,
  • libcontainer/process_linux.go:564: // A SIGKILL can happen at any time, and without the state.json,
    ^^^ extra whitespace at EOL, please fix
  • Commit sha: 2026161, Author: jianghao53, Committer: jianghao53; The sign-off is missing.

To add your Signed-off-by line to every commit in this branch:

Ensure you have a local copy of your branch by checking out the pull request locally via command line.
In your local branch, run: git rebase HEAD~1 --signoff
Force push your changes to overwrite the branch: git push --force-with-lease origin generate-state-before-procHooks

@jianghao65536 jianghao65536 force-pushed the generate-state-before-procHooks branch from 2026161 to 7e6327b Compare November 24, 2024 16:41
Copy link
Contributor

@kolyshkin kolyshkin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

While I generally agree this is a bug which should be fixed, I don't like the way it is fixed. The issues are:

  • lot of code duplication;
  • API bloat (we now have LoadCreatingState and DestroyCreating -- does libcontainer user really has to care about all this?);
  • maybe some bugs (like, creating-state.json is removed after state.json is written, not at the same time).

Can we reuse the same state.json, and consider the state is "creating" if init pid is not known?

@kolyshkin
Copy link
Contributor

Also, it would be nice to have a test case added (somehow).

@jianghao65536
Copy link
Author

Thank you, I'll make some adjustments. However, I'm still not sure how to add a test case. This bug isn't easily reproducible unless we simulate a timeout by adding a sleep command before runc creates the state.json file.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
4 participants