Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Dict memory leak in Python 3.12.x / msgpack 1.0.x #612

Open
amachanic opened this issue May 24, 2024 · 6 comments
Open

Dict memory leak in Python 3.12.x / msgpack 1.0.x #612

amachanic opened this issue May 24, 2024 · 6 comments

Comments

@amachanic
Copy link

amachanic commented May 24, 2024

Hello,

I think I've found a second Python 3.12 memory leak (having read the issue for the other one, and tried upgrading to msgpack 1.0.8 to fix).

This issue occurs with dicts, and key uniqueness seems to be a major consideration. (A bunch of dicts with the exact same keys don't cause the problem -- some kind of key caching issue maybe?)

Here's a repro:

import psutil
import msgpack

# Set up a list of packed dicts
p = [
    msgpack.packb(
        {f'{p}.{r}': True for r in range(1000)}
    )
    for p in range(1000)
]

# Check initial state
print(f'initial mem: {psutil.Process().memory_info().rss}')

# Loop and unpack
i = 0
for x in p:
    y = msgpack.unpackb(x)
    i += 1
    if i % 100 == 0:
        print(f'subsequent mem: {psutil.Process().memory_info().rss}')

Output, msgpack 1.0.8 on Python 3.12.3:

initial mem: 24571904
subsequent mem: 34029568
subsequent mem: 42627072
subsequent mem: 47493120
subsequent mem: 60088320
subsequent mem: 64684032
subsequent mem: 69550080
subsequent mem: 89878528
subsequent mem: 94744576
subsequent mem: 99340288
subsequent mem: 104206336

Output, msgpack 0.6.2 on Python 3.12.3 or msgpack 1.0.8 on Python 3.11.9 (slightly different numbers but same effect):

initial mem: 22761472
subsequent mem: 23089152
subsequent mem: 23089152
subsequent mem: 23089152
subsequent mem: 23089152
subsequent mem: 23089152
subsequent mem: 23089152
subsequent mem: 23089152
subsequent mem: 23089152
subsequent mem: 23089152
subsequent mem: 23089152

p.s. apologies for using psutil in the script, I didn't remember until later that it isn't a Python built-in.

@jvs
Copy link

jvs commented May 24, 2024

The memory leak occurs with both the C version and the fallback version (using Python 3.12).

In the fallback module, if you remove the call to sys.intern, then the leak does not occur. I haven't set up the build, but I assume that if you remove the call to PyUnicode_InternInPlace, then the leak will not occur in the C version.

I wonder if Python 3.12 has a regression with the table of interned strings.

@jvs
Copy link

jvs commented May 24, 2024

Sounds related: python/cpython#113993

@ThomasWaldmann
Copy link
Contributor

I also have seen the memory usage going upwards with just the psutil code in the loop.

I didn't compare the magnitude of that with and without the msgpack call though.

@jvs
Copy link

jvs commented May 25, 2024

In Python 3.12, it sounds like all intern'd strings are immortal now. The documentation for intern is out of date at the moment. That explains why memory use continues to go up as an application unpacks dicts with unique keys. All those keys are being kept alive forever.

This ticket explains why msgpack uses intern: #372

Maybe an optional flag could be added to the unpacker, so that applications with many unique keys can opt-out of interning.

@amachanic
Copy link
Author

This ticket explains why msgpack uses intern: #372

"Many json decoders exploit this and memoize/intern strings, to reduce the memory use of the resulting message"

The irony!

Optional flag sounds like a nice addition; presumably CPython will fix this particular issue at some point but some other case(s) could exist in the future where interning is not appropriate.

@methane
Copy link
Member

methane commented Sep 10, 2024

waiting python/cpython#123065

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants