This repository has been archived by the owner on Sep 16, 2020. It is now read-only.

Big memory consumption when tasks are slow to be consumed #17

Open
kpanic opened this issue Feb 19, 2016 · 18 comments

@kpanic

kpanic commented Feb 19, 2016

First of all, thanks for toniq! It got me started on a project of mine.

I was wondering if this line

```elixir
%{ worker_state | pending_jobs: worker_state.pending_jobs ++ [ {job, caller} ] }
```

might be the cause of the big memory consumption I see when I try to fire a lot of tasks...

What do you think @joakimk ?

Thanks in advance!

@joakimk
Copy link
Owner

joakimk commented Feb 20, 2016

@kpanic, have you tried using :observer.start? You can use it to run your app and inspect which processes are using a lot of memory.

The simplest way would be something like:

```
iex -S mix phoenix.server
iex(1)> :observer.start
```

But you can also connect to a remote node and then use the observer from there.
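
For reference, a minimal sketch of the remote approach (the node names and cookie here are placeholders; this assumes the app is started with distribution enabled):

```
# start the app as a named node
elixir --name app@127.0.0.1 --cookie secret -S mix phoenix.server

# from another shell, start a debug node and attach observer
iex --name debug@127.0.0.1 --cookie secret
iex(1)> Node.connect(:"app@127.0.0.1")
true
iex(2)> :observer.start
```

Once observer is open, pick the remote node from the Nodes menu and sort the Processes tab by memory.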

@kpanic
Author

kpanic commented Feb 21, 2016

@joakimk yes, I tried, and it seems that the memory consumption is coming from Toniq.JobConcurrencyLimiter.
I might have more time to investigate later; I will let you know.

My use case is to fire up a lot of tasks while streaming from a file.

By the way, I am not using Phoenix at the moment.

@joakimk
Owner

joakimk commented Feb 27, 2016

I'll keep this issue open in case anyone (including myself) feels like trying to fix it :)

One good first step could be to create a script, similar to https://github.com/joakimk/toniq/blob/master/test/benchmark.exs, that can be used to see how Toniq handles lots of waiting jobs; a rough sketch follows.
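
As a rough starting point, here is what such a script could look like (the worker, the job count, and the sleep are made up for illustration; the enqueue call and max_concurrency option follow the README):

```elixir
# stress.exs - enqueue jobs much faster than they are consumed,
# then inspect memory via :observer.start or :erlang.memory/0.
defmodule SlowWorker do
  use Toniq.Worker, max_concurrency: 10

  def perform(n: _n) do
    # simulate a slow consumer
    :timer.sleep(100)
  end
end

for n <- 1..10_000 do
  Toniq.enqueue(SlowWorker, n: n)
end

IO.puts("total memory after enqueueing: #{:erlang.memory(:total)} bytes")

# keep the VM alive so you can attach :observer.start
:timer.sleep(:infinity)
```

Run it with `mix run stress.exs` inside an app that has Toniq configured.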

@benwilson512

Just ran into this. This line https://github.com/joakimk/toniq/blob/master/lib/toniq/job_concurrency_limiter.ex#L86 really isn't a viable option. As you say at the top of the file, "if you enqueue 10000 jobs, this gets called 10000 times". Since each `++` copies the entire list built so far, enqueuing 10k jobs means copying on the order of 50 million list cells in total (an O(n²) pattern).

Not only does this add avoidable overhead, it also produces a lot of garbage for the GC.

Please use http://erlang.org/doc/man/queue.html. The theoretical analysis can be found in https://www.cs.cmu.edu/~rwh/theses/okasaki.pdf, but the gist of it is that you maintain two lists. When you "append" to the queue, you really just prepend to the first list, which gives you nice O(1) properties. When you actually need items, you pop them in O(1) off the second list. If the second list is empty, you reverse the first list and it becomes the second list. You pay an O(n) penalty just once, and then it's good for another O(n) operations, so you get amortized O(1) for all operations.
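
A minimal sketch of the change, with dummy values so it runs standalone (the pending_jobs field name follows the existing worker_state; this is illustrative, not a tested patch):

```elixir
# Illustration with dummy values; in the limiter these come from handle_call
job = %{id: 1}
caller = self()

# Initial state: an Erlang queue instead of a plain list
worker_state = %{pending_jobs: :queue.new()}

# Enqueue: amortized O(1), no copying of the whole backlog with ++
worker_state = %{worker_state | pending_jobs: :queue.in({job, caller}, worker_state.pending_jobs)}

# Dequeue: also amortized O(1)
worker_state =
  case :queue.out(worker_state.pending_jobs) do
    {{:value, {_job, _caller}}, rest} ->
      # run the job here, then keep the shortened queue
      %{worker_state | pending_jobs: rest}

    {:empty, _queue} ->
      worker_state
  end
```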

@tlvenn

tlvenn commented Apr 14, 2016

@benwilson512 any chance you can open a PR to solve this?

@benwilson512

I'll look into it this weekend.

@tlvenn

tlvenn commented Apr 14, 2016

@benwilson512 Thanks a lot in advance and thank you for all the hard work on GraphQL 👍

@joakimk
Owner

joakimk commented Apr 17, 2016

Nice to see that people who use Toniq more actively than I do are working on this problem. I'll certainly accept a pull request that fixes it.

@tlvenn

tlvenn commented Jun 2, 2016

@benwilson512 have you been able to look into this yet?

@joakimk
Owner

joakimk commented Jun 17, 2016

Can someone confirm whether the pull request merged for 1.0.6 fixes this issue?

@kpanic
Author

kpanic commented Jun 19, 2016

@joakimk I will try it (hopefully) during the next weekend! 📗

@kpanic
Author

kpanic commented Jun 25, 2016

@joakimk I tried toniq 1.0.6 and saw no considerable difference.
If you want, I can prepare a branch and describe how to reproduce it.
I do not have much time these days to dedicate to my pet open-source project, so if this issue hits only me, you can close it.
I can still prepare the "example" branch if you wish!

@joakimk
Owner

joakimk commented Sep 25, 2016

Toniq is not meant to handle every kind of asynchronous task (after all, we do have OTP itself), but I expect this problem could be common enough that Toniq should handle it.

What we should probably do is add some kind of stress-test suite for Toniq, since that's one important aspect of a job queue.

I might get around to this myself, but I have a few other projects I want to finish first, so it could be quite a while before I do.

Please do link to a branch that shows the issue if it's easy to do, so someone can try to fix this.

@joakimk
Owner

joakimk commented Sep 25, 2016

This might be related: https://youtu.be/6yoJ8sWRiyg?list=PLE7tQUdRKcyYoiEKWny0Jj72iu564bVFD&t=1517 (Erlang's handling of big binaries).

@joakimk
Owner

joakimk commented Sep 25, 2016

If it turns out that holding many jobs in memory is impractical, maybe they can be flushed out to redis when the list gets too long. Possibly as incoming jobs, if we also add a limit on importing.

Adding jobs to that list is a bit like what I mention here: https://github.com/joakimk/toniq#load-balancing.

We have to be careful to ensure takeover works well with that though.
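
A very rough sketch of the overflow idea, assuming pending_jobs has already become a queue (everything here is hypothetical; flush_to_redis/1 is a placeholder, not an existing Toniq function):

```elixir
defmodule OverflowSketch do
  # Hypothetical cap on pending jobs held in memory
  @max_in_memory 500

  def buffer_job(worker_state, job, caller) do
    if :queue.len(worker_state.pending_jobs) < @max_in_memory do
      # Room left: buffer the job in memory as before
      %{worker_state | pending_jobs: :queue.in({job, caller}, worker_state.pending_jobs)}
    else
      # Overflow: hand the job back to redis as an incoming job so it
      # can be re-imported later, instead of growing the in-memory queue
      flush_to_redis(job)
      worker_state
    end
  end

  # Placeholder for pushing the job back to redis as an incoming job
  defp flush_to_redis(_job), do: :ok
end
```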

joakimk added a commit that referenced this issue Sep 25, 2016
Does not work, but saving it for further prototyping.

If this turns out to be a useful idea it should be reimplemented cleanly
with tests.

Related to #17
@joakimk
Owner

joakimk commented Sep 26, 2016

We have to consider takeover when there are many jobs as well. The way it's written now, it assumes that moving jobs over inside redis will be fast. That won't be the case if there are lots of jobs: jcelliott@b32df75#commitcomment-19167803

@joakimk
Owner

joakimk commented Sep 26, 2016

@kpanic you could try the import_limits branch. It will ensure there are only 500 jobs in memory at a time. It's untested (other than simple manual verification in iex), but it would be interesting to know if it fixes your issue :)

If it does, we can reimplement it properly.
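
In case it helps anyone trying it out, the branch can be pulled in as a git dependency in mix.exs (standard Mix syntax):

```elixir
defp deps do
  [
    {:toniq, github: "joakimk/toniq", branch: "import_limits"}
  ]
end
```

Then run `mix deps.get` to fetch it.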

joakimk added a commit that referenced this issue Sep 26, 2016
This should ensure there are never more jobs loaded than the limit.

Still only a prototype.

#17
@kpanic
Author

kpanic commented Sep 26, 2016

Hi @joakimk, thanks for pushing this ahead! I will hopefully try it this weekend (fingers crossed!).
Thanks again!
