Mattermost suddenly runs out of memory (OOM) and reboots #20625
Comments
Hi @DummyThatMatters - It would be awesome if you could capture a heap profile during the memory spike. Heap profiles don't contain any user data and should be safe to share.
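For reference, a minimal sketch of one way to capture such a heap profile from a running server. It assumes the deployment exposes Go's net/http/pprof handlers; the :8067 address below is an assumption, so adjust it to wherever /debug/pprof is reachable in your setup.

```go
// capture_heap.go - minimal sketch for grabbing a heap profile from a running
// Mattermost server. Assumes the server exposes Go's net/http/pprof handlers;
// adjust host and port to your own deployment.
package main

import (
	"io"
	"log"
	"net/http"
	"os"
)

func main() {
	// The :8067 address is an assumption; use whatever endpoint serves /debug/pprof.
	resp, err := http.Get("http://localhost:8067/debug/pprof/heap")
	if err != nil {
		log.Fatalf("fetching heap profile: %v", err)
	}
	defer resp.Body.Close()

	out, err := os.Create("heap.prof")
	if err != nil {
		log.Fatalf("creating output file: %v", err)
	}
	defer out.Close()

	if _, err := io.Copy(out, resp.Body); err != nil {
		log.Fatalf("writing heap profile: %v", err)
	}
	log.Println("wrote heap.prof; inspect it with `go tool pprof heap.prof`")
}
```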
Hello! We found out that the issue is caused by calling the API method api/v4/channels/{channel_id}/posts?since={timestamp}&skipFetchThreads=false&collapsedThreads=true&collapsedThreadsExtended=false. Most likely the server fails at api4/posts.go/getPostsForChannel (line 249):
We rewrote the Mattermost server a bit and deployed the modified version in order to find out what's going wrong. I assume that function is called when a user searches for something in a channel. As far as I understand, there is no limit on the number of messages fetched, so calling it on a channel with a high volume of messages and heavy content causes a lot of trouble for the whole server. Can it be fixed somehow?
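For illustration only, a hypothetical sketch (not Mattermost's actual handler code) of the kind of cap described above, so a single "since" request against a very busy channel cannot pull an unbounded number of posts into memory. Every name and constant here is made up.

```go
// Hypothetical sketch only - not Mattermost's real handler. It shows the idea
// of bounding a "give me everything since timestamp X" query with a hard cap.
package posts

import "fmt"

type Post struct {
	ID       string
	CreateAt int64
	Message  string
}

// PostStore is a stand-in for the real store layer.
type PostStore interface {
	// GetPostsSince returns at most `limit` posts created after `since`.
	GetPostsSince(channelID string, since int64, limit int) ([]*Post, error)
}

// maxPostsPerSinceRequest is an assumed cap, not a real Mattermost constant.
const maxPostsPerSinceRequest = 1000

func GetPostsForChannelSince(store PostStore, channelID string, since int64) ([]*Post, error) {
	// Without a cap, a channel with years of heavy traffic could force the
	// server to materialize the whole result set at once - the behavior this
	// issue describes. Capping the page size keeps memory bounded; a client
	// can page through the rest with follow-up requests.
	posts, err := store.GetPostsSince(channelID, since, maxPostsPerSinceRequest)
	if err != nil {
		return nil, fmt.Errorf("get posts since %d: %w", since, err)
	}
	return posts, nil
}
```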
Can someone please check the info we have provided and mark this as a bug/issue to be fixed?
Thank you @DummyThatMatters. Yes, your profile matches what you are seeing. We are looking into it.
@agnivade, ok, thanks! Let me know if you need more info; we will try to provide what we can.
@DummyThatMatters - As an update, we have triaged it and we are tracking this internally. I'll give you an update when this is fixed. Thank you for reporting it!
As a temporary solution, you can enable Bleve indexing in the system settings. After turning it on, the server stops crashing with OOM, but search for bot messages stops working.
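For anyone looking for that setting in config.json rather than the System Console, a sketch of the relevant section. The field names are as commonly found in the Mattermost config, but verify them against your own file, and the index path is just a placeholder.

```json
{
  "BleveSettings": {
    "IndexDir": "/opt/mattermost/bleve-indexes",
    "EnableIndexing": true,
    "EnableSearching": true,
    "EnableAutocomplete": true
  }
}
```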
We are facing the same issue. The Mattermost server is killed by the OOM killer several times a day. Is there any progress on this?
Hi @mjnaderi, what Mattermost server version are you running currently?
Thanks, just wanted to confirm that it still happens with 7.7.1.
Increasing
We did this after seeing messages like
We stumbled upon this issue ourselves and found out that it is related to a cgroup memory leak, which seems to already be fixed in the kernel. If this is also the issue on your end, you can try adding the kernel command line option
Greetings!
Adding I doubled the value of
@agnivade Was this fixed, or do we keep this open for now?
Apologies, somehow I missed this. It seems like various users with different problems are commenting on this issue, and it's not clear what the real root cause is. For some users, bumping up the
The original issue reported by @DummyThatMatters was an API-related problem, and there have been a lot of changes to MM since 7.0.1. I'd like to know if it still happens on a later 9.x version.
Don't worry, we also encountered OOM issues due to :D
I get that. But I'd like to have an explanation as to how does bumping up the
Summary
Mattermost reboots unexpectedly and periodically (every 1-2 working days approx.) due to a sudden increase in memory consumption.
Steps to reproduce
Mattermost 7.0.1 Team Edition, deployed on a pod in OpenShift (tried allocating from 2.6 GB to 5.5 GB RAM with the same result). Postgres 14.2 as the DB.
Around ~4000 users, ~1200 of them are active
~26000 messages per day.
Expected behavior
Mattermost works stably without reboots.
Observed behavior (that appears unintentional)
Mattermost reboots every 1-2 working days. The cause of the reboot is OOM. Here is an example log of the memory consumption:
The same increase in load can be observed on the CPU side as well:
As you can see, there is a sudden growth of resource utilisation out of nowhere. The logs are relatively clean, and the log rate didn't show any increase in the number of operations or in user activity.
We have done a small investigation of our own, and we think it can be caused by improper functioning of the getPostsForChannel method. That assumption comes from inspecting the Mattermost Go profile.
Here is an example of the heap tree made with the pprof tool:
Please help with investigating; we can provide additional info if it is needed (as long as it can be collected via our tools and does not contain corporate data).
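For anyone trying to reproduce the spike, a hypothetical Go client that issues the same kind of request the profile points at. The server URL, token, channel ID, and timestamp are placeholders; watch the server's memory while the call is in flight.

```go
// Hypothetical reproduction client: fires the "posts since" request described
// above against one large channel. All identifiers below are placeholders.
package main

import (
	"fmt"
	"io"
	"log"
	"net/http"
)

func main() {
	const (
		serverURL = "https://mattermost.example.com" // placeholder
		channelID = "CHANNEL_ID"                     // placeholder
		token     = "PERSONAL_ACCESS_TOKEN"          // placeholder
		since     = 1640995200000                    // placeholder epoch millis
	)

	url := fmt.Sprintf(
		"%s/api/v4/channels/%s/posts?since=%d&skipFetchThreads=false&collapsedThreads=true&collapsedThreadsExtended=false",
		serverURL, channelID, since)

	req, err := http.NewRequest(http.MethodGet, url, nil)
	if err != nil {
		log.Fatal(err)
	}
	req.Header.Set("Authorization", "Bearer "+token)

	resp, err := http.DefaultClient.Do(req)
	if err != nil {
		log.Fatal(err)
	}
	defer resp.Body.Close()

	// Discard the body; the interesting part is how much memory the server
	// needs to build the response, not the payload itself.
	n, _ := io.Copy(io.Discard, resp.Body)
	log.Printf("status %s, %d bytes in response body", resp.Status, n)
}
```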