
Add graph-based pod stop #25169

Open · mheon wants to merge 2 commits into main from graph_stop
Conversation

mheon (Member) commented Jan 30, 2025

Implement a graph-based pod stop and use it by default, ensuring that containers stop in a dependency-based order. This prevents a race where application containers were stopped after the infra container and thus lost functional networking for the last seconds before they stopped, potentially causing unexpected application errors.

As a pleasant side-effect, make removing containers within a pod parallel, which should improve performance.

Full details in commit descriptions.
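Conceptually, the new stop path is an inward (reverse-topological) walk of the container dependency graph. Below is a minimal, self-contained sketch with hypothetical types, not libpod's actual API: a container is stopped only once nothing that depends on it is still running, so the infra container, which everything else depends on, goes down last.

package main

import "fmt"

// ctr is a hypothetical graph node; dependsOn points at the
// containers this one requires (e.g. app -> infra).
type ctr struct {
	id         string
	dependsOn  []*ctr
	dependents int // containers still depending on this one
}

func stopPod(ctrs []*ctr) {
	// Count dependents so a container is only stopped after
	// everything that needs it has already stopped.
	for _, c := range ctrs {
		for _, dep := range c.dependsOn {
			dep.dependents++
		}
	}
	// Start from the leaves: containers nothing depends on.
	var ready []*ctr
	for _, c := range ctrs {
		if c.dependents == 0 {
			ready = append(ready, c)
		}
	}
	for len(ready) > 0 {
		c := ready[0]
		ready = ready[1:]
		fmt.Println("stopping", c.id) // real code would call Stop()
		for _, dep := range c.dependsOn {
			dep.dependents--
			if dep.dependents == 0 {
				ready = append(ready, dep)
			}
		}
	}
}

func main() {
	infra := &ctr{id: "infra"}
	app1 := &ctr{id: "app1", dependsOn: []*ctr{infra}}
	app2 := &ctr{id: "app2", dependsOn: []*ctr{infra}}
	stopPod([]*ctr{infra, app1, app2})
}

Running this prints app1 and app2 before infra, matching the ordering the PR guarantees.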

Does this PR introduce a user-facing change?

Containers in pods are now stopped in order based on their dependencies, with the infra container being stopped last, preventing application containers from losing networking before they are stopped due to the infra container stopping prematurely.

openshift-ci bot (Contributor) commented Jan 30, 2025

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: mheon

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openshift-ci openshift-ci bot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Jan 30, 2025
@mheon mheon added the No New Tests Allow PR to proceed without adding regression tests label Jan 30, 2025
mheon (Member, Author) commented Jan 30, 2025

Tagging No New Tests, as the existing pod stop/remove tests should exercise this. It would be nice to test the ordering aspect, but I'm not sure that's doable without being very racy.

Luap99 (Member) commented Jan 30, 2025

For tests, one thing we could do is check the FinishedAt time from inspect for the infra container and an application container after pod stop, then make sure the infra container's is later.
That would guarantee it was stopped last, and it should be easy to add to an existing pod stop test.
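A hedged sketch of that check, shelling out to podman container inspect. The container names mypod-infra and myapp are placeholders, and it assumes the {{json .State.FinishedAt}} template yields a quoted RFC 3339 timestamp:

package main

import (
	"fmt"
	"os/exec"
	"strings"
	"time"
)

// finishedAt returns a container's FinishedAt as parsed from
// `podman container inspect`.
func finishedAt(name string) (time.Time, error) {
	out, err := exec.Command("podman", "container", "inspect",
		"--format", "{{json .State.FinishedAt}}", name).Output()
	if err != nil {
		return time.Time{}, err
	}
	return time.Parse(time.RFC3339Nano,
		strings.Trim(strings.TrimSpace(string(out)), `"`))
}

func main() {
	infra, err := finishedAt("mypod-infra") // hypothetical names
	if err != nil {
		panic(err)
	}
	app, err := finishedAt("myapp")
	if err != nil {
		panic(err)
	}
	if !infra.After(app) {
		fmt.Println("FAIL: infra stopped before the app container")
		return
	}
	fmt.Println("OK: infra stopped last")
}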

Luap99 (Member) left a comment

will do a proper review tomorrow

Comment on lines 98 to 101
exists, err := c.runtime.state.HasContainer(c.ID())
if err != nil {
return err
}
if !exists {
return fmt.Errorf("container %s does not exist in database: %w", c.ID(), define.ErrNoSuchCtr)
}
Luap99 (Member) commented:

That adds extra DB query overhead; I'm not sure we need this at all.
Technically, if the pod doesn't exist you could just ignore the error, i.e. only take the locks in an `if err == nil {}` block.
The other code would then return its normal error anyway.
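A minimal sketch of that pattern, with hypothetical stand-ins for libpod's pod type and lookup (not the real API): the pod lock is taken only when the lookup succeeds, and a missing pod simply falls through to the normal error path.

package main

import (
	"errors"
	"fmt"
	"sync"
)

var errNoSuchPod = errors.New("no such pod")

type pod struct{ lock sync.Mutex }

// lookupPod stands in for the runtime's pod lookup.
func lookupPod(id string) (*pod, error) {
	if id == "" {
		return nil, errNoSuchPod
	}
	return &pod{}, nil
}

func startContainer(podID string) error {
	// Only take the pod lock when the lookup succeeded; pod locks
	// come before container locks.
	if p, err := lookupPod(podID); err == nil {
		p.lock.Lock()
		defer p.lock.Unlock()
	}
	// If the pod lookup failed, fall through: the subsequent
	// container-state checks report the usual error anyway.
	fmt.Println("starting container")
	return nil
}

func main() {
	if err := startContainer("mypod"); err != nil {
		fmt.Println(err)
	}
}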

mheon (Member, Author) replied:

Oh, I like that. Will fix.

Comment on lines 89 to 93
// Have to lock the pod the container is a part of.
// This prevents running `podman start` at the same time a
// `podman pod stop` is running, which could lead to weird races.
// Pod locks come before container locks, so do this first.
if c.config.Pod != "" {
Luap99 (Member) commented:
We should do the same thing around stop for consistency

The intention behind this is to stop races between
`pod stop|start` and `container stop|start` being run at the same
time. This could result in containers with no working network
(they join the still-running infra container's netns, which is
then torn down as the infra container is stopped, leaving the
container in an otherwise unused, nonfunctional, orphan netns).

Locking the pod (if present) in the public container start and
stop APIs should be sufficient to stop this.

Signed-off-by: Matt Heon <[email protected]>
@mheon mheon force-pushed the graph_stop branch 2 times, most recently from 9c9d10f to bee225e on January 31, 2025 13:24
mheon (Member, Author) commented Jan 31, 2025

Oops. Forgot to wait for the parallel executors to finish, creating horrible races. Fixed now.

@mheon mheon force-pushed the graph_stop branch 11 times, most recently from ed5adc4 to 3aa18df on January 31, 2025 21:27
Ephemeral COPR build failed. @containers/packit-build please check.

Cockpit tests failed for commit 3aa18df. @martinpitt, @jelly, @mvollmer please check.

@mheon mheon force-pushed the graph_stop branch 3 times, most recently from 2fb56ab to 8f21503 on January 31, 2025 23:32
First, refactor our existing graph traversal code to improve code
sharing. There still isn't much sharing between inward traversal
(stop, remove) and outward traversal (start), but stop and remove
now share most of their code, which seems like a positive.

Second, add a new graph-traversal function to stop containers.
We already had start and remove; stop uses the newly-refactored
inward-traversal code which it shares with removal.

Third, rework the shared stop/removal inward-traversal code to
add locking. This allows parallel execution of stop and removal,
which should improve the performance of `podman pod rm` and
retain the performance of `podman pod stop` at about what it is
right now.

Fourth and finally, use the new graph-based stop when possible
to solve unordered stop problems with pods - specifically, the
infra container stopping before application containers, leaving
those containers without a working network.

Fixes https://issues.redhat.com/browse/RHEL-76827

Signed-off-by: Matt Heon <[email protected]>
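As an illustration of the third step, here is a parallel variant of the earlier sequential sketch (hypothetical types, not libpod's code): independent siblings stop concurrently, a dependency is stopped only once its last dependent finishes, and the final wg.Wait() corresponds to the "wait for the parallel executors" fix mentioned above.

package main

import (
	"fmt"
	"sync"
)

type ctr struct {
	id         string
	dependsOn  []*ctr
	mu         sync.Mutex
	dependents int
}

func stopPodParallel(ctrs []*ctr) {
	for _, c := range ctrs {
		for _, dep := range c.dependsOn {
			dep.dependents++
		}
	}
	var wg sync.WaitGroup
	var stop func(c *ctr)
	stop = func(c *ctr) {
		defer wg.Done()
		fmt.Println("stopping", c.id) // real code locks and stops the ctr
		for _, dep := range c.dependsOn {
			dep.mu.Lock()
			dep.dependents--
			last := dep.dependents == 0
			dep.mu.Unlock()
			if last { // all dependents done: safe to stop dep
				wg.Add(1)
				go stop(dep)
			}
		}
	}
	// Launch from the leaves: containers nothing depends on.
	for _, c := range ctrs {
		if c.dependents == 0 {
			wg.Add(1)
			go stop(c)
		}
	}
	wg.Wait() // wait for every executor before returning
}

func main() {
	infra := &ctr{id: "infra"}
	a := &ctr{id: "app1", dependsOn: []*ctr{infra}}
	b := &ctr{id: "app2", dependsOn: []*ctr{infra}}
	stopPodParallel([]*ctr{infra, a, b})
}

app1 and app2 stop in parallel; infra stops only after both finish.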
martinpitt (Contributor) commented:
The cockpit test failure from above is essentially this:

# podman pod rm --force --time 0 --all

Error: not all containers could be removed from pod f99a8d18b9b1cf2c8a4951fcce467057f5477ab385b9eb23d38b912ad93120eb: removing pod containers
Error: error removing container 52e324f5fe703a53d4ac0fdfa66fd4914ffb5b7dfdcbc2d3b6d88eccb12b946c from pod f99a8d18b9b1cf2c8a4951fcce467057f5477ab385b9eb23d38b912ad93120eb: container 52e324f5fe703a53d4ac0fdfa66fd4914ffb5b7dfdcbc2d3b6d88eccb12b946c has dependent containers which must be removed before it: df56d19ae6e61a5dc779e1e6e1994734e2e490ed0e8769efa2fb48a29e13ce6f: container already exists

This feels related to this change? The latest push passed, but commit 3aa18df also passed on F41 and only failed once on Rawhide -- so this feels like a race condition, or possibly file system ordering, i.e. something not reliably reproducible. Does that ring a bell?

Luap99 (Member) commented Feb 3, 2025

@martinpitt Most of our tests fail as well, so yes, this patch is broken and cannot be merged as-is.

Labels: approved · No New Tests · release-note