[ci] flaky test: TestCheckpoint on almalinux-8 #4457
Comments
I have seen this many times.
It wouldn't be the first time criu has become unreliable on old kernels. Maybe @kolyshkin has some insight into what to do (just skip it?)
In general, yes, older kernels have issues when trying to freeze a cgroup, and as a result criu sometimes fails. We have a similar issue with runc itself. Both criu and runc retry (and I've changed the retry timings and attempts in runc at least twice), but sometimes that is still not enough. The issue is probably the same as #4273.
Let me correct myself: it's not older kernels, it's the cgroup v1 freezer.
Cgroup v1 freezer has always been problematic, failing to freeze a cgroup. In runc, we have implemented a few kludges to increase the chance of succeeding, but those are used when runc freezes a cgroup for its own purposes (for "runc pause" and to modify device properties for cgroup v1).

When criu is used, it fails to freeze a cgroup from time to time (see [1], [2]). Let's try adding kludges similar to ones in runc.

Alas, I have absolutely no way to test this, so please review carefully.

[1]: opencontainers/runc#4273
[2]: opencontainers/runc#4457

Signed-off-by: Kir Kolyshkin <[email protected]>
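For readers unfamiliar with the kind of kludge being discussed, here is a minimal Go sketch of a freeze-with-retry loop for the cgroup v1 freezer. It is not the actual runc or criu code: the `freezerIO` type, the `fileFreezer` and `freezeWithRetry` names, and the attempt, poll and delay parameters are all hypothetical and chosen for illustration. The only facts it relies on are that the v1 `freezer.state` file accepts `FROZEN` and `THAWED` and can report `FREEZING` while a freeze is still in progress.

```go
package main

import (
	"errors"
	"os"
	"strings"
	"time"
)

// freezerIO abstracts access to a cgroup v1 freezer.state file.
// Hypothetical type for this sketch; it lets the retry logic run
// against a fake as well as a real cgroup.
type freezerIO struct {
	write func(state string) error
	read  func() (string, error)
}

// fileFreezer returns a freezerIO backed by the freezer.state file at path.
func fileFreezer(path string) freezerIO {
	return freezerIO{
		write: func(state string) error {
			return os.WriteFile(path, []byte(state), 0o644)
		},
		read: func() (string, error) {
			b, err := os.ReadFile(path)
			return string(b), err
		},
	}
}

// freezeWithRetry requests FROZEN and polls the state up to polls times.
// If the cgroup is still not FROZEN, it thaws the cgroup so the tasks can
// make progress and then tries again, up to attempts times in total.
func freezeWithRetry(f freezerIO, attempts, polls int, delay time.Duration) error {
	for i := 0; i < attempts; i++ {
		if err := f.write("FROZEN"); err != nil {
			return err
		}
		for j := 0; j < polls; j++ {
			state, err := f.read()
			if err != nil {
				return err
			}
			if strings.TrimSpace(state) == "FROZEN" {
				return nil
			}
			time.Sleep(delay)
		}
		// Still not frozen (typically FREEZING): thaw before the next attempt.
		if err := f.write("THAWED"); err != nil {
			return err
		}
		time.Sleep(delay)
	}
	return errors.New("unable to freeze: cgroup did not reach FROZEN")
}
```

A caller might use it as `freezeWithRetry(fileFreezer(statePath), 10, 5, 10*time.Millisecond)`, where `statePath` points at the container's `freezer.state`. The thaw between attempts is the design choice that matters here: it gives tasks a chance to get past whatever is blocking the freeze before the next request, rather than polling a cgroup that stays in FREEZING until a timeout.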
=== RUN TestCheckpoint
time="2024-10-18T08:55:44Z" level=warning msg="--- Quoting "/tmp/TestCheckpoint214687474/003/criu-parent/dump.log""
time="2024-10-18T08:55:44Z" level=warning msg="118:(09.517977) freezer.state=FREEZING"
time="2024-10-18T08:55:44Z" level=warning msg="119:(09.618087) freezer.state=FREEZING"
time="2024-10-18T08:55:44Z" level=warning msg="120:(09.718192) freezer.state=FREEZING"
time="2024-10-18T08:55:44Z" level=warning msg="121:(09.818291) freezer.state=FREEZING"
time="2024-10-18T08:55:44Z" level=warning msg="122:(09.918412) freezer.state=FREEZING"
time="2024-10-18T08:55:44Z" level=warning msg="123:(10.001045) Error (criu/cr-dump.c:1779): Timeout reached. Try to interrupt: 0"
time="2024-10-18T08:55:44Z" level=warning msg="124:(10.001084) freezer.state=FREEZING"
time="2024-10-18T08:55:44Z" level=warning msg="125:(10.001125) Unfreezing tasks into 1"
time="2024-10-18T08:55:44Z" level=warning msg="126:(10.001128) \tUnseizing 45035 into 1"
time="2024-10-18T08:55:44Z" level=warning msg="127:(10.001140) Error (compel/src/lib/infect.c:418): Unable to detach from 45035: No such process"
time="2024-10-18T08:55:44Z" level=warning msg="128:(10.001144) Writing image inventory (version 1)"
time="2024-10-18T08:55:44Z" level=warning msg="129:(10.001223) Error (criu/cr-dump.c:1893): Pre-dumping FAILED."
time="2024-10-18T08:55:44Z" level=warning msg=---
checkpoint_test.go:93: criu failed: type PRE_DUMP errno 0
--- FAIL: TestCheckpoint (10.24s)