
Memory leak in aws s3 cp when piping .zstd files through stdout to a slow consumer #8910

Closed
dandrei opened this issue Sep 10, 2024 · 4 comments
dandrei commented Sep 10, 2024

Describe the bug

As the title says, when running aws s3 cp to stdout on a .zst file, piping it to zstd -dc and then to a slow consumer, the AWS CLI eventually exhibits a memory leak that freezes the OS completely within minutes.

Expected Behavior

The file should be streamed without incident.

Current Behavior

After a certain amount of time, the AWS CLI's memory usage grows without bound until the OS freezes.

Reproduction Steps

Below is a script that streams a .zst file to stdout (-), pipes it to zstd -dc and then to a slow consumer: a Python script that reads from stdin and sleeps for 1 second every 1000 lines.

For this to work, you need access to an S3 bucket with a .zst file large enough for the bug to manifest. In my tests it consistently happened before 1 million lines were read, but you could use a larger file to be sure. The production file I was reading when I stumbled upon this bug contained one JSON object per line, with an average line length of ~1000-2000 characters, in case that matters.

The ulimit is there to prevent the memory leak from freezing your machine: the process gets killed once it exceeds the allowed memory (2 GB). It also demonstrates that the leak happens in the aws s3 cp call and not at any other step.

I have tested this with higher limits (8 GB, 16 GB, 32 GB); no matter how much memory you have, once the bug triggers your machine's RAM fills up within minutes, even though memory use had remained largely linear up to that point.

(ulimit -v 2097152; aws s3 cp s3://your-bucket/path/to/file.zst -) | zstd -dc | python -c "
import sys, time
line_count = 0
for line in sys.stdin:
    line_count += 1
    if line_count % 1000 == 0:
        print(line_count)  # progress marker
        time.sleep(1)      # simulate a slow consumer
"

Possible Solution

My best guess is that it has to do with the multi-threaded nature of aws s3 cp and the way zstd -dc reads data from stdin. The bug doesn't manifest with other file formats (e.g. gzip -dc), or with aws s3api get-object.
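For anyone who hits this before a fix lands, a rough workaround sketch (it sidesteps the slow-consumer backpressure rather than fixing anything, and it needs enough local disk for the whole object; the paths are just placeholders) is to download the file first and only then decompress it for the slow consumer:

# Spool the object to disk so aws s3 cp never blocks on a slow pipe,
# then decompress from the local copy at whatever pace the consumer needs.
aws s3 cp s3://your-bucket/path/to/file.zst /tmp/file.zst
zstd -dc /tmp/file.zst | python slow_consumer.py  # slow_consumer.py stands in for the inline Python reader above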

Additional Information/Context

This was reproduced on multiple CLI versions, and on multiple operating systems (Ubuntu, Amazon Linux).

CLI version used

2.15.43

Environment details (OS name and version, etc.)

Ubuntu 22.04.4 LTS

@dandrei dandrei added bug This issue is a bug. needs-triage This issue or PR still needs to be triaged. labels Sep 10, 2024
@tim-finnigan
Contributor

Hi, thanks for reaching out. The latest AWS CLI version is 2.17.50 per the CHANGELOG; I recommend testing on a more recent version if you haven't already. You can also try setting different S3 configurations to optimize the download: https://awscli.amazonaws.com/v2/documentation/api/latest/topic/s3-config.html
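A minimal sketch of that (the values here are illustrative only; the full list of settings is in the s3-config topic linked above) would be lowering the transfer parallelism for the default profile, which could also help narrow down whether the multi-threaded download path is involved:

# Illustrative values; adjust or remove after testing.
aws configure set default.s3.max_concurrent_requests 1
aws configure set default.s3.multipart_chunksize 8MB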

It's interesting that this only happens with .zst files and that aws s3api get-object works for the same file. What size of .zst file can you reproduce this with? Could you share your debug logs (with any sensitive info redacted) by adding --debug to your command? That could help give more insight into what's going on here.
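One way to capture those logs without disturbing the streamed object (a sketch; the log filename is arbitrary, and the log will be large for a multi-GB transfer) is to redirect stderr, since --debug writes there while the object body goes to stdout:

# Debug output goes to stderr; redirect it so it doesn't mix with the object on stdout.
aws s3 cp s3://your-bucket/path/to/file.zst - --debug 2> aws-debug.log | zstd -dc | python slow_consumer.py  # same slow reader as in the repro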

@tim-finnigan tim-finnigan self-assigned this Sep 13, 2024
@tim-finnigan tim-finnigan added s3 response-requested Waiting on additional info and feedback. Will move to "closing-soon" in 7 days. p2 This is a standard priority issue and removed needs-triage This issue or PR still needs to be triaged. labels Sep 13, 2024
github-actions bot commented Sep 23, 2024

Greetings! It looks like this issue hasn't been active in more than five days. We encourage you to check if this is still an issue in the latest release. In the absence of more information, we will be closing this issue soon. If you find that this is still a problem, please feel free to provide a comment or upvote with a reaction on the initial post to prevent automatic closure. If the issue is already closed, please feel free to open a new one.

@github-actions github-actions bot added the closing-soon This issue will automatically close in 4 days unless further comments are made. label Sep 23, 2024

dandrei commented Sep 23, 2024

I can confirm that the bug is no longer surfacing in the current version, 2.17.56.

@dandrei dandrei closed this as completed Sep 23, 2024

This issue is now closed. Comments on closed issues are hard for our team to see.
If you need more assistance, please open a new issue that references this one.
