CRaC checkpoint of java app running inside a docker container (Mac OSX) #1
Comments
I've reproduced this on a native Linux host. For now, the workaround is to use the 17-crac+3 release.
While playing a bit more with this issue, I found that my host was running out of disk space. I can't reproduce the issue after cleanup.
Thank you @snazarkin, but I can confirm I do NOT run out of space.
With 17-crac+3, indeed cannot reproduce
I've taken your suggestion and tried 17-crac+3, and it indeed seems to work. Here I do the checkpoint: and then I'm able to exit the container, start a new one, and resume from the checkpoint:
With 17-crac+4, can reproduce every time
If I go back to 17-crac+4, I stumble on the same issue every time: the checkpoint is performed with (seemingly) no errors, I exit the container, I start a new one, but I cannot restore from the checkpoint:
Note that I'm not on a VM: I'm directly on Mac OSX, and I perform those checkpoint/restore operations inside the Docker container defined here.
From the log it looks like in case C there is a process with PID == 9, which prevents java from being restored. Strangely, it reproduces only with a fresh shell, and only with build 4. In the last success report with 17-crac+3, you started a shell as the first process, so java got PID 17 at checkpoint; at restore in the fresh container PID 17 was free, and the restore succeeded. A workaround is to ensure that no clash between PIDs on checkpoint and restore is possible, e.g. in this example it's enough to run
We are trying to make the container experience better now, and we'll look at how to cover this situation as well. CC @wkia The same workaround helps on restore, but the workaround should be applied either on checkpoint or on restore, not on both (the OR is exclusive). And occasionally, trying to restore several times to free PIDs for java helps as well
or
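The PID reasoning above can be checked from inside the container before attempting a restore. The following is only a minimal sketch, not the workaround command from the thread: the class name `PidClashCheck` and the method `isPidTaken` are hypothetical, and it assumes that only the PID of the main java process matters (the restore may also need other IDs, such as thread IDs, to be free). It uses the standard `ProcessHandle` API, available since Java 9.

```java
public class PidClashCheck {

    /**
     * Returns true if a process with the given PID is currently visible
     * to this JVM (inside a container, that means the container's PID namespace).
     */
    public static boolean isPidTaken(long pid) {
        return ProcessHandle.of(pid).isPresent();
    }

    public static void main(String[] args) {
        // PID this JVM currently holds; a checkpoint taken now would record it.
        long self = ProcessHandle.current().pid();
        System.out.println("This JVM runs with PID " + self);

        // Optional argument: the PID that java had at checkpoint time
        // (e.g. 9 or 17 in the reports above).
        if (args.length > 0) {
            long checkpointedPid = Long.parseLong(args[0]);
            if (isPidTaken(checkpointedPid)) {
                System.out.println("PID " + checkpointedPid
                        + " is already in use here; a restore would likely clash.");
            } else {
                System.out.println("PID " + checkpointedPid + " looks free here.");
            }
        }
    }
}
```

Run in the fresh container with the checkpointed PID as its argument, this reports whether that PID is already held by another process, which is the situation described for case C.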
Thank you so much @AntonKozlov, 🙏 I'm playing around with the workaround to avoid PID clashes as described in #1 (comment) with
Executive summary: I've read the manual, but it's not clear to me why, when checkpointing a Java app running inside a docker container, I should also checkpoint the docker container itself; I just wanted to "snapshot" my running Java app. Could you kindly clarify why a docker checkpoint is needed when performing a CRaC/criu checkpoint of a java app running inside a docker container, even if I collect the files in a persistent way?
Details
Hi,
I've been experimenting with this project following this video from Devoxx and this great tutorial.
Since I'm on Mac OSX (and not Linux), I operate inside a docker container.
My goal is to "snapshot" a running Java app, using CRaC/criu at a point in time and restore it, following the tutorials mentioned and the documentation I could find here on github.
Since I run CRaC inside a container because I'm on Mac OSX, I make sure the files are collected on a mounted volume, so I can mount them across container restarts.
I have created a trivial Java app to test this, here: https://github.com/tarilabs/demo20230223-counting-on-crac
In another shell I perform:
Up to here, everything works as expected, the app is checkpointed and dump files are created.
Now I want to restore, using the command:
I have tried 3 use cases:
Case A
If in the first shell, while the docker container is still running, I execute the restore command, it works.
Case B
If in the second shell, I capture a docker checkpoint with something ~like:
exit
docker ps -a
docker commit CONTAINER_ID demo20230223-counting-on-crac:checkpoint
then in the first shell, I restart from the checkpoint with something ~like:
it works.
Case C
In the second shell, I just exit.
In the first shell, I just exit.
No container is running and no docker-checkpoint was taken.
In the first shell I go with:
docker run -it --privileged -v $(pwd)/crac-files:/opt/crac-files --rm --name demo20230223-counting-on-crac demo20230223-counting-on-crac java -XX:CRaCRestoreFrom=/opt/crac-files
I get:
I don't get why I cannot just restart the Java app from the dumped files (which are available across container restarts, as they are on the host disk). Must some additional state of the docker container also be captured (with the docker checkpoint)?
Is this a limitation of the system I'm using (Mac OSX), and if I were on Linux, could I have turned the computer off and on again between the Java app checkpoint and restore?
Thanks!