Memory leak in aws s3 cp when piping .zstd files through stdout to a slow consumer #8910
Labels
bug: This issue is a bug.
closing-soon: This issue will automatically close in 4 days unless further comments are made.
p2: This is a standard priority issue.
response-requested: Waiting on additional info and feedback. Will move to "closing-soon" in 7 days.
s3
Describe the bug
As the title says, when running `aws s3 cp` to stdout on a `.zst` file, piping the output to `zstd -dc` and then to a slow consumer, the AWS CLI starts leaking memory after a while, which leads to a complete freeze of the OS within minutes.
Expected Behavior
The file should be streamed without incident.
Current Behavior
After a certain amount of time, the AWS CLI starts using too much memory until the OS freezes.
Reproduction Steps
Below is a script that streams a `.zst` file to stdout (`-`), pipes it to `zstd -dc` and then to a slow consumer: a Python script that reads from stdin and sleeps for 1 second every 1000 lines.
For this to work, you need access to an S3 bucket with a `.zst` file large enough for the bug to manifest. In my tests, it consistently happened before 1 million lines were read, but you could use a larger file to be sure. The production file I was reading when I stumbled upon this bug contained one JSON object per line, with an average line length of ~1000-2000 characters, in case this matters.
The `ulimit` is there to prevent the memory leak from freezing your machine: the process gets killed instead once the allowed memory (2G) is exceeded. It also demonstrates that the leak happens in the call to `aws s3 cp`, and not at any other step.
I have tested this with higher values such as 8G, 16G and 32G. No matter how much memory you have, once the bug triggers your machine's RAM fills up within minutes, even though up to that point memory use remained largely linear.
Possible Solution
My best guess is that it has to do with the multi-threaded nature of `aws s3 cp` and the way `zstd -dc` reads data from stdin. The bug doesn't manifest with other file formats (e.g. `gzip -dc`), or with `aws s3api get-object`.
Additional Information/Context
This was reproduced on multiple CLI versions, and on multiple operating systems (Ubuntu, Amazon Linux).
CLI version used
2.15.43
Environment details (OS name and version, etc.)
Ubuntu 22.04.4 LTS