Memory usage and tail calls #534
Hi @yorickhardy. First of all, thanks for the report! These both seem like real issues; let me find some time this week to take a closer look.
Regarding the first issue: suspect this is related to the amount of memory added to a thread after major GC. Need to look into this more.

Regarding the second issue:
Perform full scanning of function application list to ensure self-recursive calls are found. This prevents infinite loops in the beta expansion code when compiling simple recursive calls.
Thanks! I am sorry that I have not contributed any fixes; I am going to (eventually) try to put more time into understanding Cyclone and help where/if I can.
No worries, and bug reports are always appreciated! I would welcome fixes as well; however, these two issues in particular are in areas that would be difficult to track down... and I still need to investigate the first one :)
Consider the memory usage when using fixnums or doubles as this runs:

That said, I suspect we can do better here, especially since the interpreters can. Will need to spend time looking into this further.
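As a rough illustration (a hypothetical sketch, not the snippet discussed above), a tight tail-recursive loop of this kind, run once with a fixnum accumulator and once with a double accumulator:

```scheme
;; Hypothetical sketch, not the snippet from this thread: the same loop with
;; a fixnum accumulator and with a double (flonum) accumulator. The double
;; version generally has to allocate a boxed flonum per iteration, so its
;; memory behaviour between collections can differ noticeably.
(import (scheme base) (scheme write))

(define (count-down n acc step)
  (if (<= n 0)
      acc
      (count-down (- n 1) (+ acc step) step)))

(write (count-down 10000000 0 1))     (newline) ; fixnum accumulator
(write (count-down 10000000 0.0 1.0)) (newline) ; double accumulator
```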
Snippet of code from
Compare with code from
Questions: Why are we doing an allocation here but not above, and can we safely speed up / optimize the latter code?
I did not realize that it had switched over to bignum! Thanks. I have more or less isolated the memory usage problems that I was encountering (firstly, I was using many short-lived threads instead of a thread pool and assumed that thread-join would (eventually) garbage collect the thread: that assumption is false as far as I can tell). Here is another example with bounded(?) but large memory use:
and similarly with constantly growing memory use:
which surprised me! This simple example grows a bit slower but in the same way:
Strangely, I am not sure whether these examples are helpful, or just examples of poorly written Scheme! At this point I am not convinced whether the remaining part is a valid issue or just poor programming on my part, so please close the issue if you are satisfied.
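A hypothetical sketch of the thread-churn pattern described above (many short-lived srfi-18 threads instead of a thread pool), not one of the original examples:

```scheme
;; Hypothetical sketch, not one of the original examples: each unit of work
;; gets its own short-lived srfi-18 thread, which is joined and then dropped.
;; The surprise discussed in this issue is that memory from these terminated
;; threads can remain resident even after thread-join! returns.
(import (scheme base) (srfi 18))

(define (run-task i)
  (make-vector 1000 i)) ; a small amount of allocation per task

(let loop ((i 0))
  (when (< i 100000)
    (let ((t (make-thread (lambda () (run-task i)))))
      (thread-start! t)
      (thread-join! t))
    (loop (+ i 1))))
```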
Hello again. The memory consumption that motivated this issue is all due to the use of many threads via srfi-18, and I was very surprised that terminated threads consume memory. Nevertheless, I don't think the issue is entirely valid as reported, since the underlying problem was threads (not GC and tail calls). Apologies if I have wasted too much of your time. (I do appreciate that you have looked into the reported examples and that you are willing to investigate improvements.) Thanks!
Glad you got it working @yorickhardy! I appreciate your feedback and think there are genuine issues being raised here, though I have not dug into your latest examples yet.
@yorickhardy Do you have an example of
Sure! I hope I have not missed anything obvious ...
Hello again. I am not sure if this is the whole story, but after adding a bit of debug output it seems that the memory allocated in the terminated threads' heaps is not being freed. I am not yet able to make a more meaningful contribution, but I am trying to work towards it!
Hey @yorickhardy, appreciate the update! That's interesting... we do sweep that memory during major GC, though. I wonder, could it be that major GC is not being triggered by the example program?
Yes, that seems to be the beginning of the issue. Am I correct in saying that the collector only starts (sometimes) when allocating Scheme objects? In my debugging output, I hacked together a workaround to force the collector to run.
Correct, the collector will only trigger a major GC when allocating objects and the runtime detects a need to start the collector; for example, when a certain percentage of memory is used up.
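As a hypothetical sketch (not the workaround used above) of what that implies for an otherwise idle thread: a small throwaway allocation per iteration gives the runtime a chance to decide to start a collection.

```scheme
;; Hypothetical sketch: because a major GC is only considered at allocation
;; time, an otherwise idle loop can give the runtime a chance to start the
;; collector by performing a small throwaway allocation each iteration.
(import (scheme base) (srfi 18))

(define (idle-loop)
  (make-vector 256 0)  ; throwaway allocation; an allocation point where GC may start
  (thread-sleep! 1)
  (idle-loop))

(thread-join! (thread-start! (make-thread idle-loop))) ; keep the program running
```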
Thanks. I still need to track down how the heap allocations are eventually freed, and then I can try to force the freeing of memory to see if that shows better memory use for the example program.
This is a first attempt to improve the memory usage reported in issue justinethier#534.
This ensures that the collector has a chance to run whenever a thread exits. Attempts to partially address issue justinethier#534.
When a thread exits, the heap is merged into the main thread. Before doing so, free any unused parts of the heap to reduce memory usage. Attempts to partially address issue justinethier#534.
I have started to try to work through this, but I look at it infrequently so I am sure that I have completely missed the mark! The branch is here: https://github.com/yorickhardy/cyclone/commits/threads-gc-work/

The total size is still huge, but the resident memory has improved:
Thanks @yorickhardy, this looks promising! I'm wondering if gc_start_major_collection() could be used here instead of adding a new collector stage. Another thing I'm wondering about is whether the pages on the thread heaps are empty, or if there are a couple of live objects that are causing the memory usage to grow over time.
Use gc_start_major_collection() instead. Partial work towards addressing issue justinethier#534.
Thanks! I added some debugging output and the pages which are merged have 3145536 or 3145600 (of 3145728) bytes remaining (so 192 bytes and 128 bytes used respectively). I would guess fragmentation is becoming a problem here? The objects are (repeating for each thread created/destroyed):
This is interesting:
I would think these would be parameter objects:
Hmm. I was thinking these would eventually be freed because the thread had terminated.
Moving the code from gc_merge_all_heaps to gc_heap_merge removes special handling of the start of the list and is (hopefully) easier to read. Partial work towards addressing issue justinethier#534.
Partial work towards addressing issue justinethier#534.
Partial work towards addressing issue justinethier#534.
Partial work towards addressing issue justinethier#534.
Partial work towards addressing issue justinethier#534.
This ensures that any objects which are part of the thread context are transferred to the heap. Partial work towards addressing issue justinethier#534.
This will be used to create the thread context. Partial work towards addressing issue justinethier#534.
Also introduce a global variable to track whether merged heaps need to be swept. Partial work towards addressing issue justinethier#534.
The context ensures that parametrised objects, continuations and exception handlers can still be traced but are no longer root objects (after thread terminations) and can be GCd eventually. Partial work towards addressing issue justinethier#534.
The primordial thread may not have an opportunity to sweep heap pages which have been merged from terminated threads. So sweep any unswept pages during the cooperation phase. Partial work towards addressing issue justinethier#534.
A slightly late happy new year! I have attempted to address this issue, but I am still quite unsure about the correctness of the code (in particular, converting a heap page to a free list seems like a bad idea?). The proposed solution is ugly; I hope a better solution can be found. On my very simple test program above I see improved behaviour. Sometimes the memory usage is a bit high, but it eventually reduces again. I thought it would follow a pattern, but I don't seem to observe one (by eye). The test programs also all pass, but I am not sure that says much! Any suggestions will be greatly appreciated. Unfortunately I will be quite busy again soon, so it will probably take me a very long time to get around to looking at the issue again (sorry).
Thank you so much for your work on this @yorickhardy! Let me spend some time looking this over, maybe tonight but if not definitely tomorrow. I would like to get a PR together if it looks ready, but if not we can see what that will take. I remember from looking at your fork previously that there were good improvements.
@yorickhardy After a first pass through everything I think these changes are looking good! Usually heap pages are free lists; there is an optimization where a page is initially organized into a contiguous block of memory. This allows for faster initial allocations, but we need to revert back to a free list for sweeps and longer-term maintenance of the page. Long way of saying: I think what you are doing there is fine. I was wondering about the extra overhead of sweeping on the main thread, but we are already merging everything to that thread anyway, and if the extra overhead ever affected program performance, the application could be modified to transition the impacted work to another thread. I'm inclined to create a PR and work through integrating this into Cyclone.
Thanks! I will try to get to it this weekend. I thought that maybe the heap pages should be owned by the thread which calls thread-join!, and perhaps this thread could also take responsibility for freeing the thread data.
* gc: add a function to force the collector to run
  This requires adding a "forced" stage for the collector, which is the initial stage for a forced collection. Thereafter, the collector continues to the usual stages of collection.
* runtime: force the garbage collector to run when a thread exits
  This is a first attempt to improve the memory usage reported in issue #534.
* srfi-18: call Cyc_end_thread on thread exits
  This ensures that the collector has a chance to run whenever a thread exits. Attempts to partially address issue #534.
* gc: free unused parts of the heap before merging
  When a thread exits, the heap is merged into the main thread. Before doing so, free any unused parts of the heap to reduce memory usage. Attempts to partially address issue #534.
* srfi-18: thread-terminate! takes a thread as argument
* gc: revert adding STAGE_FORCING
  Use gc_start_major_collection() instead. Partial work towards addressing issue #534.
* gc: free empty pages in gc_heap_merge()
  Moving the code from gc_merge_all_heaps to gc_heap_merge removes special handling of the start of the list and is (hopefully) easier to read. Partial work towards addressing issue #534.
* gc: oops, forgot the "freed" count
  Partial work towards addressing issue #534.
* gc: oops, forgot the "freed" count (again)
  Partial work towards addressing issue #534.
* types: update forward declaration of gc_heap_merge()
  Partial work towards addressing issue #534.
* gc: remove accidental double counting
* runtime: small (cosmetic) simplification
* srfi-18: add a slot for thread context in the thread object
  Partial work towards addressing issue #534.
* srfi-18: do a minor gc when terminating a thread
  This ensures that any objects which are part of the thread context are transferred to the heap. Partial work towards addressing issue #534.
* types.h: make gc_alloc_pair public
  This will be used to create the thread context. Partial work towards addressing issue #534.
* gc: prepare heap objects for sweeping
  Also introduce a global variable to track whether merged heaps need to be swept. Partial work towards addressing issue #534.
* gc: create a context for terminated thread objects
  The context ensures that parametrised objects, continuations and exception handlers can still be traced but are no longer root objects (after thread terminations) and can be GCd eventually. Partial work towards addressing issue #534.
* gc: sweep and free empty heaps for the primordial thread
  The primordial thread may not have an opportunity to sweep heap pages which have been merged from terminated threads. So sweep any unswept pages during the cooperation phase. Partial work towards addressing issue #534.
* srfi-18: revert thread-terminate! changes
  These changes need to be revisited, and are not suitable for the threads garbage collection pull request.
Hello,

I am trying out (srfi 18) in cyclone and noticed that memory was being "consumed" very quickly by a simple monitoring thread; the following small example seems to exhibit this behaviour:

Similarly, the cyclone compiler can be encouraged to use excessive memory (and compile time) by:

Both examples work as expected in icyc (with regards to memory use). Am I doing something unreasonable here? I am using cyclone-0.36.0 on NetBSD 10.99.10 amd64 (current-ish).
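For reference, a minimal hypothetical sketch of this kind of monitoring-thread program (not the exact example from the report):

```scheme
;; Hypothetical sketch, not the exact example from the report: a background
;; monitoring thread created with (srfi 18) wakes up once a second and prints
;; a shared counter while the main thread keeps doing work.
(import (scheme base) (scheme write) (srfi 18))

(define counter 0)

(define monitor
  (make-thread
   (lambda ()
     (let loop ()
       (display "counter: ") (write counter) (newline)
       (thread-sleep! 1)
       (loop)))))

(thread-start! monitor)

(let loop ((i 0))
  (when (< i 30)
    (set! counter (+ counter 1))
    (thread-sleep! 1)
    (loop (+ i 1))))
```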