Mergerfs mount not responsive intermittently #1347
Comments
I need a trace from an app experiencing the issue. Your trace is of df, and it looks to exit fine; the query of mergerfs took 0.0096 seconds to run. And what is going on on the system? Have you run free? iotop? What's the loadavg? Etc.
Thanks for the reply. I think the strace shows the df command not completing for ~58 seconds, with no activity from 19:47:00 to 19:47:58. I thought df/ls on the host would be a better test than the actual apps (which are dockerised). I can give the output of those other commands next time it happens.
Basic tooling is what I suggest in the docs, but a filesystem is a complex beast with many types of functions, so testing more than one function is useful when debugging. statfs is different from stat, which is different from file creation, etc. That's why simple tools like "touch", "ls", "rm", "ln", etc. are useful for debugging: they don't do a whole lot. In any case, I didn't notice that gap because later in the log it shows interactions with mergerfs working fine, so it isn't entirely held up.
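To illustrate the point about exercising more than one filesystem function, here is a minimal Python sketch that times several distinct operations (statvfs, directory listing, create, stat, symlink, unlink) in one directory. It is not part of mergerfs or its docs; the function names `timed` and `probe` and the scratch file name are hypothetical, and the directory to test is whatever you pass on the command line.

```python
import os
import sys
import tempfile
import time


def timed(label, fn):
    """Run fn() and return (label, seconds_elapsed, error_or_None)."""
    start = time.monotonic()
    try:
        fn()
        err = None
    except OSError as exc:
        err = exc
    return label, time.monotonic() - start, err


def probe(directory):
    """Exercise several distinct filesystem operations inside `directory`."""
    # Hypothetical scratch file name; anything unlikely to collide will do.
    path = os.path.join(directory, f".probe-{os.getpid()}")
    link = path + ".lnk"

    def create():
        with open(path, "w") as fh:
            fh.write("x")

    steps = [
        ("statvfs", lambda: os.statvfs(directory)),   # the kind of call df makes
        ("listdir", lambda: os.listdir(directory)),   # roughly ls
        ("create", create),                           # roughly touch
        ("stat", lambda: os.stat(path)),
        ("symlink", lambda: os.symlink(path, link)),  # roughly ln -s
        ("unlink", lambda: (os.unlink(link), os.unlink(path))),  # roughly rm
    ]
    return [timed(label, fn) for label, fn in steps]


if __name__ == "__main__":
    target = sys.argv[1] if len(sys.argv) > 1 else tempfile.gettempdir()
    for label, seconds, err in probe(target):
        print(f"{label:8s} {seconds:8.4f}s {err or 'ok'}")
```

Running it against the mount point and, separately, against one of the underlying branch directories gives a rough per-operation comparison of where the time goes.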
The mergerfs trace shows nothing but a gap in time around then. After that point things look pretty normal: mostly stat'ing files.
Do you have any advice on how I can capture traces for this, given that it's intermittent? I was thinking about setting up a rolling trace buffer on the key processes and using ls/df as the test: when the test takes >20 seconds to complete, write out the trace buffer so that it records what happened before the issue started. Would this work and help to diagnose the issue?
Having traces will only show who is being slow. It won't necessarily diagnose anything. There is very little going on in mergerfs that can cause slowdown. It's primarily just a proxy. Mostly, if there is slowdown, it's because the underlying devices are overwhelmed or there is buffer bloat and swapping going on. This is why general system measurements can be important here: load avg, IO waits, etc. I'm not aware of any built-in sampling behavior in strace. The problem is that if you have some external loop checking for delay and the cause is buffer bloat or similar, then by the time you start tracing it could be done flushing. The easiest thing would be to disable all the caches, ensure flush on close is enabled, and compare behavior.
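One way to collect the general system measurements alongside the df-style test is sketched below. This is only an assumption about how such a loop could look, not a mergerfs tool: it times a single `os.statvfs` call on the mount and records load average plus a few `/proc/meminfo` fields (Linux only) both before and after the call, so the "before" snapshot still exists even if the flush has finished by the time the call returns. The names `read_meminfo`, `sample`, and `watch` are hypothetical, and the 20-second threshold is taken from the question above.

```python
import os
import time


def read_meminfo(fields=("MemAvailable", "Dirty", "Writeback", "SwapFree")):
    """Return selected /proc/meminfo values in kB (Linux only; empty dict elsewhere)."""
    out = {}
    try:
        with open("/proc/meminfo") as fh:
            for line in fh:
                key, _, rest = line.partition(":")
                if key in fields:
                    out[key] = int(rest.split()[0])
    except OSError:
        pass
    return out


def sample(mount):
    """Time one statvfs call on `mount` and capture system state around it."""
    before = {"loadavg": os.getloadavg(), "meminfo": read_meminfo()}
    start = time.monotonic()
    try:
        os.statvfs(mount)
        error = None
    except OSError as exc:
        error = str(exc)
    elapsed = time.monotonic() - start
    after = {"loadavg": os.getloadavg(), "meminfo": read_meminfo()}
    return {"time": time.time(), "elapsed": elapsed, "error": error,
            "before": before, "after": after}


def watch(mount, threshold=20.0, interval=5.0, rounds=None):
    """Sample repeatedly; print only samples that errored or took >= `threshold` seconds."""
    n = 0
    while rounds is None or n < rounds:
        s = sample(mount)
        if s["elapsed"] >= threshold or s["error"]:
            print(s, flush=True)
        time.sleep(interval)
        n += 1
```

Calling `watch()` with the path of the mergerfs mount leaves it running until interrupted. As noted above, this only shows that something was slow and what the system looked like at the time; it does not by itself identify the cause.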
issue description
system info
strace was run while the mount was unresponsive: I started the mergerfs trace and then an strace on 'df':