Memory overview
===============

Heap memory vs resident memory
------------------------------

When you are profiling memory usage, you will often see two different metrics:
heap memory and resident memory.

Resident memory corresponds to the memory that is currently allocated in your
main memory, or RAM. Generally speaking, programs will store most of the memory
they are using in main memory, but there are some caveats you need to be aware
of if you want to make sense of how your programs are using memory. Resident
memory is a metric that is **not independent** of the other programs that are
running concurrently and of what's happening on your machine. This means that
**two identical runs of your program can have very different resident memory
measurements**. If the OS determines that other programs or tasks have higher
priority than the one you are analyzing, it may move some of the memory used by
the program to swap space. This means that resident memory usage in your
program may decrease (and increase later) even if you don't free or allocate
any memory yourself.

This makes resident memory a tricky metric to make sense of. On the one hand,
it is a good indicator of what may be causing your machine to run out of
memory: ultimately, this is the memory that is limited by the amount of RAM
(and swap space) you have. On the other hand, the resident memory associated
with a given program depends on all the other programs that are running
concurrently, so it may be difficult to properly diagnose why it decreases or
increases.
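
If you want to inspect this metric for yourself, the standard library exposes
the peak resident set size through ``resource.getrusage``. This is a minimal
sketch, assuming Linux, where ``ru_maxrss`` is reported in kilobytes (macOS
reports bytes instead): ::

    import resource

    # ru_maxrss is the peak resident set size this process has reached.
    # It only ever grows: if the OS swaps part of the process out, the
    # current resident size drops, but this high-water mark does not.
    peak = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss
    print(f"Peak resident memory: {peak} kB")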

Memory is lazily allocated
--------------------------

What makes it even more complicated to properly relate heap memory and resident
memory is that memory is lazily allocated by the OS. For example, if you call
one of the system allocator APIs (such as ``malloc``), it may return instantly
without actually allocating any memory at all. It will still give you a pointer
to a chunk of memory that you can use, but memory will only be allocated when
you write to that pointer: ``malloc`` promises you the memory chunk, but you
only get it for real when you actually need it. This means that heap memory
will increase as soon as the allocator API is called, but resident memory will
only increase once you actually write to that memory.

For instance, consider this code: ::

    import time
    import numpy

    time.sleep(1)
    big_array = numpy.empty(1_000_000)  # the heap grows by ~8 MB here
    time.sleep(1)
    big_array[:] = 42.0  # pages are written: resident memory grows here
    time.sleep(1)

If you run ``memray`` against this code and generate a flamegraph, you will see
the following plot:

.. image:: _static/images/rss_vs_heap.png

As you can see in the plot, the line for the heap size increases first
(corresponding to the call to ``numpy.empty``) but the resident size does not
increase immediately. Instead, the resident size only increases after we have
populated the whole array with floating point numbers: only at this moment does
the OS actually allocate the memory pages needed to satisfy our initial
request. Notice that this happens when memory is **written** to the array, so a
memory profiler **won't be able to tell you what makes the resident size
grow**, as it doesn't have visibility into when pages are actually assigned.
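
You can observe lazy allocation directly with an anonymous ``mmap`` mapping,
which is one of the mechanisms allocators use to request large chunks from the
OS. This is a minimal sketch, assuming Linux (it reads ``VmRSS`` from
``/proc/self/status``); the helper name ``resident_memory_kb`` is just for
illustration: ::

    import mmap

    def resident_memory_kb():
        # The kernel's view of our resident set size (Linux-specific).
        with open("/proc/self/status") as status:
            for line in status:
                if line.startswith("VmRSS:"):
                    return int(line.split()[1])

    # Reserve 100 MiB of anonymous memory: address space only, no pages yet.
    region = mmap.mmap(-1, 100 * 1024 * 1024)
    print(f"after mmap:  {resident_memory_kb()} kB")

    # Touch one byte per page: only now does the OS assign physical pages.
    for offset in range(0, len(region), 4096):
        region[offset] = 1
    print(f"after write: {resident_memory_kb()} kB")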

Memory is not freed immediately
-------------------------------

Another thing that makes it difficult to relate heap memory and resident memory
is that memory is not freed immediately after it is no longer needed: the
system allocator may not release it to the OS right away. When you call a
deallocator API (``free``, for example), the implementation may not free the
memory for real until later. This means that you may see the heap size decrease
while the resident memory size does not decrease yet.

For instance, consider this code: ::

    import time
    import numpy

    time.sleep(1)
    big_array = numpy.empty(1_000_000)
    time.sleep(1)
    big_array[:] = 42.0
    time.sleep(1)
    del big_array  # the heap shrinks here; the OS may keep the pages
    time.sleep(1)

If you run ``memray`` against this code and generate a flamegraph, you will see
the following plot:

.. image:: _static/images/rss_vs_heap_no_free.png

As you can see in the plot, the line for the heap size decreases after we
delete the array (corresponding to the ``del`` statement) but the resident size
does not decrease immediately. Instead, the resident size will only decrease
(not shown in this plot) once the system allocator determines that it is a good
idea to release the memory pages. Notice that this happens when pages are
released, so a memory profiler **won't be able to tell you what makes the
resident size decrease**, as it doesn't have visibility into when pages are
actually unmapped.
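
If you suspect that a stubborn resident size is just the allocator holding on
to freed pages, glibc provides the ``malloc_trim`` API to explicitly ask it to
release free memory back to the OS. This is a minimal sketch, assuming glibc on
Linux (``libc.so.6``): ::

    import ctypes

    libc = ctypes.CDLL("libc.so.6")

    # malloc_trim(0) asks glibc to return as much free memory to the OS as
    # possible. It returns 1 if some memory was released, 0 otherwise.
    libc.malloc_trim.argtypes = [ctypes.c_size_t]
    libc.malloc_trim.restype = ctypes.c_int

    released = libc.malloc_trim(0)
    print("released memory" if released else "nothing to release")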

Memory is shared
----------------

Another thing that makes it difficult to relate heap memory and resident memory
is that memory can be shared: the same memory pages can be used by several
processes at once. This happens, for instance, when you fork a process. The
child process initially shares its memory pages with the parent process, so the
resident memory that the child requires will not increase until copy-on-write
(COW) is triggered. You can read more about COW in the `Wikipedia page
<https://en.wikipedia.org/wiki/Copy-on-write>`_.
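
One way to watch copy-on-write happen is to track the child's *private* memory,
which grows as pages get copied. This is a minimal sketch, assuming a Unix
system with Linux's ``/proc/self/smaps_rollup`` (kernel 4.14 or later); the
helper name ``private_dirty_kb`` is just for illustration: ::

    import os

    def private_dirty_kb():
        # Dirty pages that belong to this process alone (not shared).
        with open("/proc/self/smaps_rollup") as smaps:
            for line in smaps:
                if line.startswith("Private_Dirty:"):
                    return int(line.split()[1])

    # A large buffer that the child will initially share with the parent.
    data = bytearray(100 * 1024 * 1024)

    if os.fork() == 0:  # child process
        print(f"before writing: {private_dirty_kb()} kB private")
        for offset in range(0, len(data), 4096):
            data[offset] = 1  # each touched page is copied on write
        print(f"after writing:  {private_dirty_kb()} kB private")
        os._exit(0)

    os.wait()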

Memory can be fragmented
------------------------

Another thing that makes it difficult to relate heap memory and resident memory
is that memory can be fragmented: the memory that is allocated by the system
allocator may be spread around the address space in different fragments. This
can make the resident memory size increase or decrease in ways that are hard to
predict, because the system allocator may not be able to reuse memory that has
been freed before.

Memory fragmentation results in seemingly unnecessary requests to the OS for
more memory. Even when the total space already available to the memory
allocator is large enough to satisfy an allocation request, it's possible that
no individual fragment (or set of contiguous fragments) is large enough to
satisfy it. Memory fragmentation is caused by a combination of the allocation
strategy used by the allocator, the sizes and alignments of its internal
structures, and the memory allocation behavior of your application.
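
To see why fragmentation forces extra requests, the following sketch drives
glibc's ``malloc`` and ``free`` directly through ``ctypes`` (assuming
``libc.so.6`` on Linux). It frees every other block, leaving plenty of free
space in total but no contiguous run large enough for a bigger request: ::

    import ctypes

    libc = ctypes.CDLL("libc.so.6")
    libc.malloc.argtypes = [ctypes.c_size_t]
    libc.malloc.restype = ctypes.c_void_p
    libc.free.argtypes = [ctypes.c_void_p]
    libc.free.restype = None

    # Fill the heap with many small blocks...
    blocks = [libc.malloc(4096) for _ in range(10_000)]

    # ...then free every other one. About 20 MB is now free in total, but
    # it is scattered in 4 kB holes between blocks that are still alive,
    # so the holes cannot be coalesced into larger chunks.
    for block in blocks[::2]:
        libc.free(block)

    # No hole can hold 64 kB, so the allocator must grow the heap even
    # though about 20 MB is nominally free.
    big = libc.malloc(64 * 1024)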

Detecting fragmentation is a very difficult task because it depends on the
system allocator that you are using. If you are using GLIBC's ``malloc``, for
example, you can use the ``malloc_stats`` API to get information about the
memory allocator, including the number of free chunks and their total size. If
the number of free chunks is large but their total size is small, then you may
be suffering from memory fragmentation. You can read more about this in the
`man page <https://man7.org/linux/man-pages/man3/malloc_stats.3.html>`_.

Although this API must be called from native code, you can use the
`ctypes module <https://docs.python.org/3/library/ctypes.html>`_ to call it
from Python: ::

    import ctypes

    libc = ctypes.CDLL("libc.so.6")
    # malloc_stats() takes no arguments, returns nothing, and prints the
    # allocator's statistics to standard error.
    libc.malloc_stats.restype = None
    libc.malloc_stats()

Another option is to use GLIBC's ``mallinfo`` API, which returns the same kind
of information as a C structure that is easier to inspect from programs. As
with the other API, you can use the
`ctypes module <https://docs.python.org/3/library/ctypes.html>`_ to call it
from Python: ::

    import ctypes

    class MallInfo(ctypes.Structure):
        # Field layout of glibc's "struct mallinfo" (see man mallinfo).
        _fields_ = [
            (name, ctypes.c_int)
            for name in (
                "arena",  # non-mmapped space allocated from the system
                "ordblks",  # number of free chunks
                "smblks",  # number of free fastbin blocks
                "hblks",  # number of mmapped regions
                "hblkhd",  # space allocated in mmapped regions
                "usmblks",  # unused; always 0 in modern glibc
                "fsmblks",  # space in freed fastbin blocks
                "uordblks",  # total allocated space
                "fordblks",  # total free space
                "keepcost",  # top-most, releasable space
            )
        ]


    libc = ctypes.CDLL("libc.so.6")
    mallinfo = libc.mallinfo
    mallinfo.argtypes = []
    mallinfo.restype = MallInfo

    info = mallinfo()
    fields = [(name, getattr(info, name)) for name, _ in info._fields_]
    print("Malloc info:")
    for name, value in fields:
        print(f"- {name}: {value}")
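
In this report, ``ordblks`` is the number of free chunks and ``fordblks`` is
the total free space in bytes, so a large ``ordblks`` combined with a
relatively small ``fordblks`` is the same fragmentation signal described above
for ``malloc_stats``. Note that recent versions of glibc deprecate ``mallinfo``
in favor of ``mallinfo2``, which uses wider field types, so check the man page
for your platform before relying on these fields.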