
Commit 986a17e

Add a "memory overview" section to the docs
Many users find resident vs heap memory confusing and struggle to properly analyze programs where understanding the difference between the two is important. Furthermore, it is very common that internal allocator-specific behaviours confuse users when they need to understand what is causing the resident size to increase or decrease. To help users understand these problems and concepts, add a new memory overview section to the docs that covers the most typical concepts and problems encountered when analyzing memory.

Signed-off-by: Pablo Galindo <[email protected]>
1 parent 26f04e2 commit 986a17e

File tree

4 files changed: +185 -0 lines changed

docs/_static/images/rss_vs_heap.png (28.5 KB)
docs/_static/images/rss_vs_heap_no_free.png (30.9 KB)

docs/index.rst

+1 line changed

@@ -5,6 +5,7 @@
   getting_started
   run
   python_allocators
+  memory
   temporary_allocations
   attach
   native_mode

docs/memory.rst

+184 lines changed (new file)

Memory overview
===============

Heap memory vs resident memory
------------------------------

When you are profiling memory usage, you will often see two different metrics:
heap memory and resident memory.

Resident memory corresponds to the memory that is currently held in your main
memory (RAM). Generally speaking, programs will store most of the memory they
are using in main memory, but there are some caveats you need to be aware of if
you want to make sense of how your programs are using memory. Resident memory
is a metric that is **not independent** of the other programs that are running
concurrently and of whatever else is happening in your machine. This means that
**two identical runs of your program can have very different resident memory
measurements**. If the OS determines that other programs or tasks have higher
priority than the one you are analyzing, it may move some of the memory used by
your program to swap space. This means that resident memory usage in your
program may decrease (and increase later) even if you don't free or allocate
any memory yourself.

This makes resident memory a tricky metric to make sense of. On the one hand,
it is a good indicator of what may be causing your machine to run out of
memory: ultimately, this is the memory that is limited by the amount of RAM
(and swap space) you have. On the other hand, the resident memory of a given
program depends on all the other programs that are running concurrently, so it
may be difficult to properly diagnose why it decreases or increases.
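
If you want to check the resident memory that the OS itself reports for your
process (the same number that tools like ``top`` show), you can read it from
the kernel's process accounting. The following is a minimal, Linux-only sketch
(it is not part of ``memray``, and ``/proc`` is not available on other
platforms) that reads the resident and swapped-out sizes of the current
process: ::

    def memory_status() -> dict:
        """Return VmRSS and VmSwap (in kB) for the current process."""
        results = {}
        with open("/proc/self/status") as status:
            for line in status:
                # Lines look like "VmRSS:      123456 kB"
                if line.startswith(("VmRSS:", "VmSwap:")):
                    name, value, _unit = line.split()
                    results[name.rstrip(":")] = int(value)
        return results


    print(memory_status())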

Memory is lazily allocated
--------------------------

What makes it even more complicated to properly relate heap memory and resident
memory is that memory is lazily allocated by the OS. For example, if you call
one of the system allocator APIs (``malloc`` for example), it may return
instantly without actually allocating memory at all. It will still give you a
pointer to a chunk of memory that you can use, but the memory will only be
allocated when you write to that pointer. ``malloc`` promises you the memory
chunk, but you only get it for real when you really need it. This means that
heap memory will increase as soon as the allocator API is called, but resident
memory will only increase once you actually write to that memory.

For instance, consider this code: ::

    import time
    import numpy

    time.sleep(1)
    big_array = numpy.empty(1_000_000)
    time.sleep(1)
    big_array[:] = 42.0
    time.sleep(1)

If you run ``memray`` against this code and generate a flamegraph, you will see
the following plot:

.. image:: _static/images/rss_vs_heap.png

As you can see in the plot, the line for the heap size increases first
(corresponding to the call to ``numpy.empty``) but the resident size does not
increase immediately. Instead, the resident size only increases after we have
populated the whole array with floating point numbers. It is only at this
moment that the OS will actually allocate the necessary memory pages to satisfy
our initial request. Notice that this happens when memory is **written** to the
array, so a memory profiler **won't be able to tell you what makes the resident
size grow**, as it doesn't have visibility into when pages are actually
assigned.
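
You can observe the same effect without ``memray`` by asking the OS for
anonymous memory directly and watching the resident size before and after
touching the pages. This is a minimal, Linux-only sketch; it uses ``mmap``
rather than ``malloc``, but the lazy page allocation is the same: ::

    import mmap

    def resident_kb() -> int:
        """Current VmRSS of this process in kB (Linux-only)."""
        with open("/proc/self/status") as status:
            for line in status:
                if line.startswith("VmRSS:"):
                    return int(line.split()[1])
        raise RuntimeError("VmRSS not found")

    size = 100 * 1024 * 1024  # request 100 MiB of anonymous memory
    buf = mmap.mmap(-1, size)
    print("after mmap: ", resident_kb(), "kB")  # barely changes: no pages are resident yet

    for offset in range(0, size, mmap.PAGESIZE):
        buf[offset] = 1  # touching one byte per page forces the OS to back it with RAM
    print("after write:", resident_kb(), "kB")  # now roughly 100 MiB higher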

Memory is not freed immediately
-------------------------------

Another thing that makes it difficult to relate heap memory and resident memory
is that memory is not freed immediately after it is no longer needed. This is
because the system allocator may not release the memory to the OS when it is no
longer needed. This means that once you call a deallocator API (``free`` for
example), the implementation may not release the memory for real until later,
so you may see the heap size decrease while the resident memory size does not
decrease yet.

For instance, consider this code: ::

    import time
    import numpy

    time.sleep(1)
    big_array = numpy.empty(1_000_000)
    time.sleep(1)
    big_array[:] = 42.0
    time.sleep(1)
    del big_array
    time.sleep(1)

If you run ``memray`` against this code and generate a flamegraph, you will see
the following plot:

.. image:: _static/images/rss_vs_heap_no_free.png

As you can see in the plot, the line for the heap size decreases after we
delete the array (corresponding to the ``del`` statement) but the resident size
does not decrease immediately. Instead, the resident size will only decrease
(not shown in this plot) after the system allocator determines that it is a
good idea to release the memory pages. Notice that this happens when pages are
released, so a memory profiler **won't be able to tell you what makes the
resident size decrease**, as it doesn't have visibility into when pages are
actually unmapped.
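
If you are using GLIBC's allocator, you can explicitly ask it to hand free
memory back to the OS through its ``malloc_trim`` API and watch the resident
size drop earlier than it otherwise would. This is only a Linux/glibc-specific
sketch for experimentation (whether anything is actually released depends on
the allocator's internal state), called through the
`ctypes module <https://docs.python.org/3/library/ctypes.html>`_: ::

    import ctypes

    libc = ctypes.CDLL("libc.so.6")
    libc.malloc_trim.argtypes = [ctypes.c_size_t]
    libc.malloc_trim.restype = ctypes.c_int

    # Ask glibc to return free heap pages to the OS; returns 1 if memory was released.
    released = libc.malloc_trim(0)
    print("memory released to the OS:", bool(released))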

Memory is shared
----------------

Another thing that makes it difficult to relate heap memory and resident memory
is that memory can be shared: the same memory pages can be used by different
processes. This happens, for instance, when you fork a process. The child
process initially shares its memory pages with the parent process, so the
resident memory that the child process requires will not increase until
copy-on-write (COW) is triggered. You can read more about COW on the
`Wikipedia page <https://en.wikipedia.org/wiki/Copy-on-write>`_.
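
The following is a minimal, Unix-only sketch of this behavior (it does not use
``memray``): the child can read the parent's array through the shared pages,
but writing to it triggers copy-on-write and grows the child's resident size.
You can combine it with the ``/proc/self/status`` snippet shown earlier to
watch the numbers change: ::

    import os
    import numpy

    # The parent allocates and fills a large array; these pages are resident here.
    big_array = numpy.full(10_000_000, 42.0)

    pid = os.fork()
    if pid == 0:
        # Child: reading only touches pages shared with the parent, so it adds
        # very little resident memory of its own.
        print("sum:", big_array.sum())
        # Writing triggers copy-on-write: the touched pages are duplicated and
        # the child's resident size grows.
        big_array[:] = 0.0
        os._exit(0)
    os.waitpid(pid, 0)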

Memory can be fragmented
------------------------

Another thing that makes it difficult to relate heap memory and resident memory
is that memory can be fragmented: the memory that is allocated by the system
allocator may be spread around the address space in different fragments, which
makes the resident memory size increase and decrease in unpredictable ways.
This happens because the system allocator may not be able to reuse memory that
has been freed before.

Memory fragmentation results in seemingly unnecessary requests to the OS for
more memory. Even when the sum of the space already available to the memory
allocator is large enough to satisfy a memory allocation request, it's possible
that no individual fragment (or set of contiguous fragments) is large enough to
satisfy that request. Memory fragmentation is caused by a combination of the
allocation strategy used by the allocator you are using, the sizes and
alignments of its internal structures, and the memory allocation behavior of
your application.

Detecting fragmentation is a very difficult task because it depends on the
system allocator that you are using. If you are using GLIBC's ``malloc``, for
example, you can use the ``malloc_stats`` API to get information about the
memory allocator. This API will give you information about the number of free
chunks and the total size of the free chunks. If you see that the number of
free chunks is large but the total size of the free chunks is small, then you
may be suffering from memory fragmentation. You can read more about this in the
`man page <https://man7.org/linux/man-pages/man3/malloc_stats.3.html>`_.

Although this API must be called from native code, you can use the
`ctypes module <https://docs.python.org/3/library/ctypes.html>`_ to call it
from Python. For example: ::

    import ctypes

    libc = ctypes.CDLL("libc.so.6")
    libc.malloc_stats.restype = None
    libc.malloc_stats()  # prints the allocator statistics to standard error

Another option is to use GLIBC's ``mallinfo`` API, which returns the
allocator's statistics in a C structure that is easier to process from
programs. As with the other API, you can use the
`ctypes module <https://docs.python.org/3/library/ctypes.html>`_ to call it
from Python: ::

    import ctypes


    class MallInfo(ctypes.Structure):
        # Fields of the ``struct mallinfo`` returned by glibc (see mallinfo(3)).
        _fields_ = [
            (name, ctypes.c_int)
            for name in (
                "arena",
                "ordblks",
                "smblks",
                "hblks",
                "hblkhd",
                "usmblks",
                "fsmblks",
                "uordblks",
                "fordblks",
                "keepcost",
            )
        ]


    libc = ctypes.CDLL("libc.so.6")
    mallinfo = libc.mallinfo
    mallinfo.argtypes = []
    mallinfo.restype = MallInfo

    info = mallinfo()
    fields = [(name, getattr(info, name)) for name, _ in info._fields_]
    print("Malloc info:")
    for name, value in fields:
        print(f"- {name}: {value}")
