BlockingIOError: [Errno 11] Resource temporarily unavailable #176
Perhaps if IO is overloaded you could just discard the output; that would be better than a hard exit. For example:

```python
def read_log_files(self, units):
    BUFSIZE = 8192
    for unit in units:
        if unit in self._log_file:
            new_text = b""
            while True:
                buf = os.read(self._log_file[unit], BUFSIZE)
                if not buf:
                    break
                new_text += buf
            text = self._log_hold[unit] + new_text
            if not text:
                continue
            lines = text.split(b"\n")
            if not text.endswith(b"\n"):
                self._log_hold[unit] = lines[-1]
                lines = lines[:-1]
            for line in lines:
                prefix = unit.encode("utf-8")
                content = prefix + b": " + line + b"\n"
                try:
                    os.write(1, content)
                    try:
                        os.fsync(1)
                    except Exception:
                        pass
                except BlockingIOError:
                    pass
```
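The underlying `BlockingIOError` with `errno 11` (EAGAIN) is what the kernel raises when a non-blocking descriptor cannot accept more data. A minimal standalone demonstration (not project code) reproduces it with a pipe whose write end is overloaded:

```python
import errno
import os

# Create a pipe and make its write end non-blocking, mimicking a
# stdout that is overloaded and cannot accept more data right now.
r, w = os.pipe()
os.set_blocking(w, False)

caught = None
try:
    # Fill the pipe buffer until the kernel refuses further writes.
    while True:
        os.write(w, b"x" * 65536)
except BlockingIOError as exc:
    caught = exc.errno  # usually errno.EAGAIN (11) on Linux

os.close(r)
os.close(w)
print("BlockingIOError errno:", caught)
```

This is why swallowing the exception (as in the patch above) keeps the process alive: the condition is transient and clears once the reader drains the buffer.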
Sorry, looking at this a bit further it seems that systemctl might be contributing to the high IO. I applied the above patch so that it's no longer exiting, but where there are failing service helpers it seems to be producing excessive log output such as the below. Unfortunately we do not have full control over the services inside the container, as it's used much like a VPS (with strict security policies). Should there be a back-off and then eventual failure for these service helpers? Or can we rate-limit them?
The proposed patch is correct anyway. But what I see is a number of services that cannot start in a container anyway. There are some `igno_` lists in the Python file that exclude certain enabled units from being included in the init-list of modules. Do we need to extend that list? Additionally, try to run the container as a command shell and check which services are included in the init-list.
If you see anything that looks like it comes from the original systemd, then it should be kicked out.
Thanks, here's the output:
Is snapd already included in the ignore list?
I have never containerized a snapd app. Probably kill the appbox_dbus service as well, as I am suspicious that it runs additional services. Maybe you can test a bit with adding patterns to the `igno_` lists?
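The `igno_` entries are shell-style glob patterns matched against unit names. As an illustration only (the project's actual matching code may differ), `fnmatch` shows how such a list would filter the init-list:

```python
import fnmatch

igno_always = ["network*", "dbus*", "systemd-*", "kdump*", "kmod*", "snapd*"]

def is_ignored(unit, patterns=igno_always):
    """Return True if the unit name matches any ignore pattern."""
    return any(fnmatch.fnmatch(unit, pat) for pat in patterns)

units = ["snapd.service", "dbus.service", "nginx.service", "systemd-udevd.service"]
kept = [u for u in units if not is_ignored(u)]
print(kept)  # only units not matched by any ignore pattern remain
```

Any unit matching an added pattern would then never be started (or restarted) by the init loop.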
Snapd doesn't work in Docker as it requires real systemd, so we can add `igno_always = ["network*", "dbus*", "systemd-*", "kdump*", "kmod*", "snapd*"]`. However, some really big journal logs have been created, for example:
It now seems like it's using 100% CPU reading these log files, as I can see it re-reading errors which have already been resolved. Sorry for my ignorance, I'm not a systemd expert, but what's the reason we're reading the whole log file? EDIT: Testing this on a few other containers with the issue, deleting the log files reduces the CPU usage from 100% to 0.7% even when services are failing, so this seems to be the real issue regarding the resource usage.
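One common way to avoid rescanning a growing log on every pass is to remember how far each file has already been read and only consume the appended bytes. This is a generic sketch of that idea (a hypothetical helper, not the project's code):

```python
import os

def read_new_bytes(path, offsets):
    """Read only the bytes appended since the previous call, keeping
    the per-file position in `offsets` (path -> byte offset), so the
    whole log is not rescanned on every pass of the follow loop."""
    pos = offsets.get(path, 0)
    size = os.path.getsize(path)
    if size < pos:
        pos = 0  # the file was truncated or replaced: start over
    with open(path, "rb") as f:
        f.seek(pos)
        data = f.read()
    offsets[path] = pos + len(data)
    return data
```

Each call returns only what was written since the last call, and a truncation resets the position instead of failing.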
Setting the buffer size to 10MB resolves the issue:
Are logs ever rotated? I know that on normal systemd systems, systemd-journald rotates the journal files. EDIT: Looks like it's not implemented (#41). Are there any workarounds for this other than running a custom service to rotate the log files? (We would need to add jitter, as we run many containers per node.)
Logs are not rotated, you are right. The read-log loop is about duplicating the log lines from each application service into the container log (docker log), so that messages from the applications become visible outside the container. The current implementation runs the service processes with the log files attached, so it is not really possible to just rotate them as systemd does. There has been an idea floating around to punch holes in the log files, i.e. make them sparse files. It may be possible to analyse systemd-journald and take over some of its ideas, but I have not tried so far. In reality it was good enough to fix the problems in the applications themselves that led to too many output lines.
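The hole-punching idea works because Linux `fallocate(2)` with `FALLOC_FL_PUNCH_HOLE | FALLOC_FL_KEEP_SIZE` frees the disk blocks of a byte range without changing the file size or invalidating descriptors the service processes still hold open. A hedged ctypes sketch (Linux-specific; flag values assumed from `<linux/falloc.h>`, and not all filesystems support it):

```python
import ctypes
import os

# Linux fallocate(2) mode flags (values from <linux/falloc.h>).
FALLOC_FL_KEEP_SIZE = 0x01   # keep the apparent file size unchanged
FALLOC_FL_PUNCH_HOLE = 0x02  # deallocate the given byte range

_libc = ctypes.CDLL("libc.so.6", use_errno=True)

def punch_hole(path, offset, length):
    """Free the disk blocks of [offset, offset+length) in a file
    without changing its size or breaking descriptors that writers
    still hold open. Returns False when the filesystem does not
    support hole punching."""
    fd = os.open(path, os.O_WRONLY)
    try:
        ret = _libc.fallocate(
            fd,
            FALLOC_FL_PUNCH_HOLE | FALLOC_FL_KEEP_SIZE,
            ctypes.c_long(offset),
            ctypes.c_long(length),
        )
        return ret == 0
    finally:
        os.close(fd)
```

Reads over a punched range return zero bytes, so a log follower that tracks its own offset past the hole would be unaffected.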
Thanks @gdraheim, so I could use
Well, that sounds like a reasonable workaround. I hope that one day the systemctl init-loop could throw away parts beyond a certain limit.
I am thinking of just extending the code in a fork to add a function that fallocates the log files if they're over 5MB (truncating them back to 5MB). I would add a last-run variable to check whether it has run during the current day, and if not, schedule it for a random time. It's probably too opinionated to be merged, but I can make a PR if you're interested.
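The once-per-day-with-jitter scheduling described above could look something like this (a hypothetical helper sketch, not the fork's actual code; `next_trim_time` and `DAY` are names invented here):

```python
import random
import time

DAY = 86400  # seconds per day

def next_trim_time(last_run, now=None):
    """Pick the next timestamp for a log-trim pass: at most once per
    day, at a random second within the day, so that many containers
    on the same node do not all trim their logs simultaneously."""
    now = time.time() if now is None else now
    today = int(now) - int(now) % DAY  # start of the current (UTC) day
    if last_run >= today:
        # already ran today: schedule a random slot tomorrow
        return today + DAY + random.randrange(DAY)
    return today + random.randrange(DAY)
```

The supervising loop would compare `time.time()` against the scheduled timestamp on each pass and trim (then record `last_run`) once it is reached.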
We run this on multiple webtop containers (https://github.com/linuxserver/docker-webtop).
The containers can sometimes become IO-constrained under heavy load. This causes systemctl3.py to exit with the following error:
Once it exits, its child processes are moved to PID 1 (we use s6 as the PID 1 and run systemctl as an s6 service to handle systemd services for convenience).
systemctl3.py then enters an infinite loop of trying to start the services which are already running. They exit immediately due to PID file lock contention, and then systemctl3.py tries to start them again.
This infinite loop uses 100% CPU:
Is there any way we can remove the IO timeout? Having systemctl hang is much preferable to an exit, as the contention will eventually subside and resources will become available again.
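Besides hanging, another way to tame the restart loop described above would be exponential back-off between attempts. This is a generic sketch of that pacing (an assumption about how restarts could be throttled, not the project's current behaviour):

```python
def backoff_delays(base=1.0, cap=60.0, factor=2.0):
    """Yield ever-longer sleep intervals between restart attempts, so
    a persistently failing service is retried about once a minute
    rather than in a 100%-CPU tight loop."""
    delay = base
    while True:
        yield min(delay, cap)
        delay *= factor

# Example: the pacing a supervising loop could apply between restarts,
# resetting the generator once a service stays up.
delays = backoff_delays()
first_seven = [next(delays) for _ in range(7)]
print(first_seven)  # 1, 2, 4, 8, 16, 32, then capped at 60
```

The cap bounds the worst-case restart latency once contention subsides, while the exponential growth keeps a flapping service from burning CPU.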