Reduce default memory allocation to the java process #1407

Merged
merged 2 commits into from
Oct 31, 2024

amahussein
Collaborator

@amahussein amahussein commented Oct 31, 2024

Fixes #1406

This pull request changes how the spark_rapids_tools module calculates the JVM heap size and the number of threads for the java process. The most important change is the method used to calculate the JVM heap size.

The goal is to stop allocating so much memory by default that the OOM-killer is triggered:

  • Use available memory instead of total memory.
  • Cap the JVM max heap size (Xmx) at 32 GB.
  • Cap the maximum number of threads at 8.
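
The three rules above can be sketched roughly as follows. This is an illustration, not the code from this PR: the function names (`pick_jvm_heap_gb`, `pick_num_threads`, `jvm_heap_flag`) and the 80% share of available memory are assumptions made for the example. Only the 32 GB and 8-thread caps come from the description.

```python
import os

MAX_HEAP_GB = 32   # cap stated in the PR description
MAX_THREADS = 8    # cap stated in the PR description
GIB = 1024 ** 3


def pick_jvm_heap_gb(available_bytes: int) -> int:
    """Size the heap from *available* memory, not total memory.

    Assumption: the JVM gets 4/5 (80%) of available memory. That ratio
    is hypothetical and is not taken from the PR.
    """
    available_gb = available_bytes // GIB
    heap_gb = available_gb * 4 // 5
    # Keep at least 1 GB and never exceed the 32 GB cap.
    return max(1, min(heap_gb, MAX_HEAP_GB))


def pick_num_threads(cpu_count=None) -> int:
    """Use the CPU count, capped at 8 threads."""
    cpus = cpu_count if cpu_count is not None else (os.cpu_count() or 1)
    return max(1, min(cpus, MAX_THREADS))


def jvm_heap_flag(available_bytes: int) -> str:
    """Format the heap size as a standard JVM -Xmx argument."""
    return f"-Xmx{pick_jvm_heap_gb(available_bytes)}g"
```

With this sketch, a machine reporting 512 GiB of available memory and 64 cores would get `-Xmx32g` and 8 threads rather than values proportional to the machine size, which is the behavior the linked issue asks for.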

Enhancements to JVM heap size and thread calculations:

Method renaming for clarity:

@amahussein amahussein added bug Something isn't working user_tools Scope the wrapper module running CSP, QualX, and reports (python) labels Oct 31, 2024
@amahussein amahussein requested a review from parthosa October 31, 2024 17:11
@amahussein amahussein self-assigned this Oct 31, 2024
Signed-off-by: Ahmed Hussein <[email protected]>
Collaborator

@parthosa parthosa left a comment

Thanks @amahussein.

@amahussein amahussein merged commit e1c4742 into NVIDIA:dev Oct 31, 2024
14 checks passed
@amahussein amahussein deleted the rapids-tools-1406 branch October 31, 2024 21:03
Labels
bug Something isn't working user_tools Scope the wrapper module running CSP, QualX, and reports (python)
Development

Successfully merging this pull request may close these issues.

[BUG] User tools is aggressive in reserving memory on large machines
3 participants