cannot podman rm after reboot with fs full #13967
Comments
Thanks for reaching out, @martinetd! Which system are you running on? |
these tests check kernel features, so they should not be persisted since the kernel version can change.
even if we disable the tests, there are still files/directories created by podman to track its internal state. We can address some of these issues now, but it will be a nightmare to maintain in the future as we would need to handle ENOSPC for every file operation. |
I was originally running this on Alpine; the reproduction here was on latest git.
Ah, yeah ok that makes sense. I was thinking filesystem features (e.g. overlay on top of tmpfs doesn't work exactly the same as overlay on top of btrfs) but if kernel version also matters it's less obvious.
Well, that state doesn't need everything -- but it'd be great if a minimal set of commands (ps, rm, rmi ?) could work so space can be freed. |
hmm, so trying a bit further (I only tried ps earlier): after "fixing" ps by creating the test files in /run, rm also fails because it tries to update bolt_state.db before doing the rm. I guess it'll be difficult to get anything reliably working in these conditions... But once again, there is nothing other than podman files in this partition: there is nothing to clean except podman container data.

(This happens to be btrfs, so for the customer this happens at […]. I've advised temporarily growing the filesystem by adding another partition, running podman commands to clean up, then removing the partition again, but I don't feel comfortable explaining this in our user manual. At least there is no urgency at this point.)

So it doesn't have to be all commands, and perhaps not even normal commands. For example, I was able to run podman mount, at which point one can safely remove files in root/overlay//merged/../diff manually, so just reversing the order (removing files -> updating bolt_state.db) would likely work. Given there's no "podman fsck" to reap orphans, it'd actually probably be better in this order anyway?

image rm also didn't work (it tries to create a temporary images.json file in the same directory before doing the rm work); I wouldn't want to change this order though, as a half-removed image still being listed would be pretty bad.

Anyway, we don't really need everything to work -- just freeing enough space for normal commands to work next would do. Making sure podman ps -a works and having a hard podman rm mode that doesn't care about metadata would be good.

(The devil in me makes me say it'd be easier to create a dummy 1MB file somewhere to be removed in case of such emergencies... But it'd be a shame for the millions of users who don't need it, so let's not go there.) |
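For reference, a rough sketch of the manual workaround described above (the container name and the path to delete are placeholders; removing layer contents by hand is very much a last resort):

mnt=$(podman mount mycontainer)        # prints something like .../overlay/<id>/merged
rm -rf "$mnt/../diff/some/large/file"  # free space directly from the container's upper layer
podman umount mycontainer
podman rm mycontainer                  # with space freed, the metadata updates have room to proceed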
@mheon WDYT? |
I don't see a way to avoid a DB write; we need to keep track of the current state of the container. The real killer is probably that the DB maintains transactional integrity by retaining the old record until the write of the new container state has succeeded. As such, when the available disk space is 0, it will fail to update: even if the new state consumes no more space, the old state info is retained by the DB until the write succeeds. And I really don't want to turn that off, because transactional safety saves us in dozens of other places, even if it hurts us here. |
I definitely don't want to throw away db integrity here: we can do exactly the same as we're doing with the db, just after removing the other files; odds are we'll have removed something and it'll work. That's not necessarily true for other actions, e.g. start should be done as it is currently: creating in the db, then creating the container on the filesystem. But I see rm as the opposite, so popping "entries" in reverse order makes sense to me. If you don't think it does (we might have different breakage models in mind), putting it in a specific […]

(Fun fact, for e.g. […]) |
A friendly reminder that this issue had no activity for 30 days. |
I'm still interested in solving this, but I don't have a better idea than making a few specific commands ENOSPC-safe, as I'm not comfortable automating growing the podman partition for our product. (And I appreciate it's easier said than done...) |
A friendly reminder that this issue had no activity for 30 days. |
@martinetd Any movement on this? |
Right now we still tell any customer stuck on this to temporarily extend the filesystem if that happens; it's been rare enough that I haven't automated anything yet. Honestly, I fully understand that it is hard to make commands not write anything (e.g. ps) or free up data before updating the db (at least for rm; I agree we wouldn't want to do that for image rm), so if this becomes more of a problem, reserving e.g. 1MB in a file that can easily be removed would likely be my ""solution"" at this point. I stand by my earlier comment that it should be possible to have podman ps/kill/rm work in a full-fs context, but I'll be a bad greedy user here -- I honestly don't think I can spare time to work on this in the foreseeable future. |
A friendly reminder that this issue had no activity for 30 days. |
A friendly reminder that this issue had no activity for 30 days. |
@mheon If we created a dummy file of a couple of megabytes when storage is first created, and then removed it first when attempting to do a system reset, do you think that could fix this problem? |
Probably, assuming that whatever ate all the storage on the system doesn't eat the space we free first. |
Well since the most likely case is containers/storage used up the space, we should work fine in that situation. |
I would say: make a Libpod database with 100 containers, 100 pods, 200 volumes - all pretty sizable numbers - and find out what the size of it is. Then double that. That should cover typical cases, I think... |
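Sketched in shell, the reserve-file idea being discussed could look something like this (the file name, size and location are made up; podman does not create such a file today):

fallocate -l 2M /var/lib/containers/storage/.space-reserve   # pre-allocate when storage is first created
# ...later, once the filesystem is full and cleanup commands start failing:
rm /var/lib/containers/storage/.space-reserve                # release the reserve first
podman system reset                                          # now has room for its own writes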
I am seeing a similar issue (as reported in #17198), and after following this thread I got the impression that the […]

In fact, in both scenarios (database located on the same and on a separate filesystem) I do see the container gone from the […] |
hmm, I recall podman stopped on the first error, but in the log you gave it lists both json files, so it could very well have gone ahead with the bolt.db as well.. I assume you request stopped containers as well with […]?

I didn't re-read the whole thread, but I'm fairly sure I described both the libpod/bolt.db and the json files as metadata, and just suggested for podman rm to remove the actual data (e.g. layer files) before updating all of that -- as the worst that could happen if the command is interrupted is a half-deleted layer that the user wanted to delete anyway, so finishing that cleanup (e.g. running the command again, or cleanup after reboot) would finish the job.

Anyway, if ps -a doesn't work, your next best bet is to go through the containers.json and layers.json manually and compare with, respectively, running containers and images... I don't think podman has a 'gc' command yet. Exporting all images and starting over might be the easiest if you don't want to deal with that. |
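Going through those files by hand could look something like this (rootful paths, jq assumed to be available; the exact layout may differ between containers/storage versions):

jq -r '.[].id' /var/lib/containers/storage/overlay-containers/containers.json   # container IDs known to storage
jq -r '.[].id' /var/lib/containers/storage/overlay-layers/layers.json           # layer IDs known to storage
ls /var/lib/containers/storage/overlay/                                         # layer directories actually on disk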
I did do […]

From a layman's point of view I agree with you about deleting the container and layer data before updating the metadata and database - it sounds better to end up in an inconsistent state where the container and layer data is gone (which is expected after […])

Another idea: judging by the filename in the error message ([…])

UPD: my previous suggestion about pre-allocating with open+seek would obviously create a sparse file without actually allocating the space, so one would probably need to actually write the corresponding number of bytes in order to pre-allocate, which is of course somewhat more expensive. |
Interested in opening a PR? |
I think this should be the default in that case, since podman system reset should just clobber those directories. |
…3967 Keeping a temporary file of at least the same size as the target file for atomic writes helps reduce the probability of running out of space when deleting entities from corresponding metadata in a disk full scenario. This strategy is applied to writing containers.json, layers.json, images.json and mountpoints.json. Signed-off-by: Denys Knertser <[email protected]>
Seems that the solution I suggested belongs in another repo; I have created containers/storage#1480 |
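The strategy from that PR, roughly illustrated in shell (the real change is Go code in containers/storage; stat and fallocate here only stand in for the sizing logic):

cd ~/.local/share/containers/storage/overlay-containers
fallocate -l "$(stat -c%s containers.json)" .tmp-containers.json   # reserve at least the old file's size up front
# ...the new, smaller JSON is then written into the already-allocated temp file...
mv .tmp-containers.json containers.json                            # atomic rename; removing entries needs no extra blocks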
This seems to affect […] |
Hi, any updates? Or any workaround? Thanks! Details: #23045. It's blocking our work.

jiazha-mac:~ jiazha$ podman machine list
NAME VM TYPE CREATED LAST UP CPUS MEMORY DISK SIZE
podman-machine-default* applehv 3 weeks ago Currently running 5 2GiB 100GiB
jiazha-mac:~ jiazha$ podman build -t quay.io/olmqe/etcd-index:fips -f catalog.Dockerfile
Error: mkdir /var/tmp/libpod_builder3448793565: no space left on device
jiazha-mac:~ jiazha$ podman images
REPOSITORY TAG IMAGE ID CREATED SIZE
jiazha-mac:~ jiazha$ podman images -a
REPOSITORY TAG IMAGE ID CREATED SIZE
jiazha-mac:~ jiazha$ podman ps -a
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
jiazha-mac:~ jiazha$ sudo podman ps -a
Password:
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
jiazha-mac:~ jiazha$ sudo podman images
REPOSITORY TAG IMAGE ID CREATED SIZE |
the original report was about podman rm failing when the filesystem holding the graph root is full; yours looks like a different problem. What is filling the disk space? What is the size of your storage (/var/lib/containers/storage, or ~/.local/share/containers/storage for rootless)? |
The "cannot remove machine" part of the issue is a duplicate of this -- freeing up space then running rm again worked as you've illustrated in the later message where you could stop/delete the machine. The "building the image makes the fs full" is another problem, perhaps your store configuration is the old vfs without overlay that eats up a lot of space and the build has too many steps with a big image? But regardless it's unrelated to this, please open another issue regarding the not being able to build if you cannot figure it out (or a discussion, as that's really more of a usage problem than a bug in my opinion) |
Hi @martinetd, this error happened after following the suggestion (rm the old machine and start a new one) here: #23045 (comment). Anyway, I reported a new issue #23287 for it. |
There are no such folders, can I know which space the […] is using?

jiazha-mac:~ jiazha$ du --si /var/lib/containers/storage
du: /var/lib/containers/storage: No such file or directory
jiazha-mac:~ jiazha$ du --si ~/.local/share/containers/storage
du: /Users/jiazha/.local/share/containers/storage: No such file or directory
jiazha-mac:~ jiazha$ podman images
REPOSITORY TAG IMAGE ID CREATED SIZE

I guess the […]

1, I get the […]:

jiazha-mac:~ jiazha$ podman system prune -a
WARNING! This command removes:
- all stopped containers
- all networks not used by at least one container
- all images without at least one container associated with them
- all build cache
Are you sure you want to continue? [y/N] y
Error: 128 errors occurred:
* open /var/home/core/.local/share/containers/storage/overlay-images/.tmp-images.json2632288465: no space left on device
* open /var/home/core/.local/share/containers/storage/overlay-images/.tmp-images.json2123817899: no space left on device
* open /var/home/core/.local/share/containers/storage/overlay-images/.tmp-images.json1495409303: no space left on device
* open /var/home/core/.local/share/containers/storage/overlay-images/.tmp-images.json4215706309: no space left on device 2, I re-init a new jiazha-mac:~ jiazha$ podman machine stop
Machine "podman-machine-default" stopped successfully
jiazha-mac:~ jiazha$ podman machine rm
The following files will be deleted:
/Users/jiazha/.config/containers/podman/machine/applehv/podman-machine-default.json
/var/folders/5n/w9ysf4w93jnfy7k19xxct31c0000gn/T/podman/podman-machine-default.sock
/var/folders/5n/w9ysf4w93jnfy7k19xxct31c0000gn/T/podman/podman-machine-default-gvproxy.sock
/var/folders/5n/w9ysf4w93jnfy7k19xxct31c0000gn/T/podman/podman-machine-default-api.sock
/var/folders/5n/w9ysf4w93jnfy7k19xxct31c0000gn/T/podman/podman-machine-default.log
Are you sure you want to continue? [y/N] y
jiazha-mac:~ jiazha$ podman machine init
Looking up Podman Machine image at quay.io/podman/machine-os:5.0 to create VM
Extracting compressed file: podman-machine-default-arm64.raw: done
Machine init complete
To start your machine run:
podman machine start
jiazha-mac:~ jiazha$ podman machine start
Starting machine "podman-machine-default"
This machine is currently configured in rootless mode. If your containers
require root permissions (e.g. ports < 1024), or if you run into compatibility
issues with non-podman clients, you can switch using the following command:
podman machine set --rootful
API forwarding listening on: /var/run/docker.sock
Docker API clients default to this address. You do not need to set DOCKER_HOST.
Machine "podman-machine-default" started successfully
jiazha-mac:~ jiazha$ podman machine list
NAME VM TYPE CREATED LAST UP CPUS MEMORY DISK SIZE
podman-machine-default* applehv About a minute ago Currently running 5 2GiB 100GiB

3, After recreating the new podman machine […] |
Do |
Hi @rhatdan, as follows:

jiazha-mac:~ jiazha$ podman machine ssh
Connecting to vm podman-machine-default. To close connection, use `~.` or `exit`
Fedora CoreOS 40.20240701.2.0
Tracker: https://github.com/coreos/fedora-coreos-tracker
Discuss: https://discussion.fedoraproject.org/tag/coreos
core@localhost:~$
core@localhost:~$ journalctl -p err
Jul 16 17:33:39 localhost systemd[1]: /etc/systemd/system/var-folders.automount:2: Where= path is not absolute, ignoring: a0bb3a2c8b0b02ba5958b0576f0d6530e104
Jul 16 17:33:39 localhost systemd[1]: /etc/systemd/system/private.automount:2: Where= path is not absolute, ignoring: 71708eb255bc230cd7c91dd26f7667a7b938
Jul 16 17:33:39 localhost systemd[1]: /etc/systemd/system/Users.automount:2: Where= path is not absolute, ignoring: a2a0ee2c717462feb1de2f5afd59de5fd2d8
Jul 16 17:33:40 localhost systemd-tmpfiles[1599]: "/home" already exists and is not a directory.
Jul 16 17:33:40 localhost systemd-tmpfiles[1599]: "/srv" already exists and is not a directory.
Jul 16 17:33:40 localhost systemd-tmpfiles[1599]: "/root" already exists and is not a directory.
Jul 16 17:33:41 localhost.localdomain systemd[1]: /etc/systemd/system/var-folders.automount:2: Where= path is not absolute, ignoring: a0bb3a2c8b0b02ba5958b0576f0d6530e104
Jul 16 17:33:41 localhost.localdomain systemd[1]: /etc/systemd/system/private.automount:2: Where= path is not absolute, ignoring: 71708eb255bc230cd7c91dd26f7667a7b938
Jul 16 17:33:41 localhost.localdomain systemd[1]: /etc/systemd/system/Users.automount:2: Where= path is not absolute, ignoring: a2a0ee2c717462feb1de2f5afd59de5fd2d8 |
I would run df and look at the amount of space in /var/tmp and other locations. |
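On a podman machine setup like the one above, that means checking inside the VM rather than on the macOS host, for example (paths taken from the error messages earlier in the thread):

podman machine ssh "df -h /var/tmp /var/home/core/.local/share/containers/storage"
podman system df        # space used by images, containers and volumes as seen by podman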
Is this a BUG REPORT or FEATURE REQUEST? (leave only one on its own line)
/kind bug
Description
Running podman rm (or podman ps or any other command) fails on a freshly booted system (runRoot empty) when graphRoot is full.

In my particular use case, we have a filesystem dedicated to the podman graphRoot, so when that hits maximum capacity our user could no longer delete stopped images to free space.
Steps to reproduce the issue:
I've reproduced this on my laptop as follows, as root:
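A minimal setup along these lines should hit the same condition (the filesystem size, paths and the alpine image are assumptions, not the exact commands used):

truncate -s 200M /tmp/podman-root.img
mkfs.ext4 -F /tmp/podman-root.img
mkdir -p /mnt/podman-root
mount -o loop /tmp/podman-root.img /mnt/podman-root
podman --root /mnt/podman-root --runroot /run/podman-test run docker.io/library/alpine true
dd if=/dev/zero of=/mnt/podman-root/filler bs=1M || true          # fill the dedicated graphRoot completely
rm -rf /run/podman-test                                           # simulate the reboot: runRoot is empty again
podman --root /mnt/podman-root --runroot /run/podman-test ps -a   # fails with "no space left on device"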
Describe the results you received:
ENOSPC error for something that shouldn't require space
Describe the results you expected:
an actual listing of files, or being allowed to delete some.
Additional information you deem important (e.g. issue happens only occasionally):
There are various tests made -- rightly so -- on the overlay directory that are cached in /run.
I see various ways of working around this:
Output of podman version: I've reproduced on today's main:
Output of podman info --debug: shouldn't be needed, ask if you really want it.
Package info (e.g. output of rpm -q podman or apt list podman): built from sources.
Have you tested with the latest version of Podman and have you checked the Podman Troubleshooting Guide? (https://github.com/containers/podman/blob/main/troubleshooting.md)
Yes