Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improve stack memory usage at compile time #9963

Draft
wants to merge 37 commits into
base: master
Choose a base branch
from

Conversation

michal-shalev
Copy link
Contributor

@michal-shalev michal-shalev commented Jun 17, 2024

Waiting for:

What?

Improve stack memory usage by introducing an 8KB threshold for stack frames using the GCC flag -Wframe-larger-than=8192. This task involves identifying and addressing functions that exceed this limit.

Why?

Customers with limited stack sizes have recently experienced stack overflow issues when running UCX.
The goal is to ensure that the entire UCX runs within a defined stack size threshold.
Mitigates the risk of stack overflow, particularly in environments with limited stack sizes.

How?

  • Include -Wframe-larger-than=8192 in the build configuration to trigger warnings for large stack frames.
  • Review and refactor functions with large stack frames or move large variables to the heap.
  • Introduce an option for users to adjust or disable the threshold, and update documentation accordingly.

@michal-shalev michal-shalev added the WIP-DNM Work in progress / Do not review label Jun 17, 2024
@michal-shalev michal-shalev marked this pull request as draft June 17, 2024 16:13
new_log_file_name = (char *)ucs_malloc(PATH_MAX, "new_log_file_name");

if (old_log_file_name == NULL || new_log_file_name == NULL) {
ucs_free(old_log_file_name);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess no need to free when we ucs_fatal().

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

valid for all cases.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you :), please check now @tvegas1

src/ucs/debug/log.c Outdated Show resolved Hide resolved
src/ucs/debug/log.c Outdated Show resolved Hide resolved
src/ucs/debug/log.c Outdated Show resolved Hide resolved
Comment on lines 180 to 185
if (access(old_log_file_name, W_OK) != 0) {
ucs_free(old_log_file_name);
ucs_free(new_log_file_name);
ucs_fatal("unable to write to %s", old_log_file_name);
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Printing log file names after freeing them seems like not a good idea. Also there is a lot of code duplication, so I would propose something like the following:

Suggested change
if (access(old_log_file_name, W_OK) != 0) {
ucs_free(old_log_file_name);
ucs_free(new_log_file_name);
ucs_fatal("unable to write to %s", old_log_file_name);
}
if (access(old_log_file_name, W_OK) != 0) {
ucs_error("unable to write to %s", old_log_file_name);
goto err_out;
}
...
err_out:
ucs_free(old_log_file_name);
ucs_free(new_log_file_name);
if (err_msg) {
ucs_fatal("ucs_log_file_rotate() fatal error (see logs before)");
}

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you :), please check now @ivankochin

@michal-shalev michal-shalev force-pushed the convert-stack-allocation-to-heap branch from c44c2d2 to 9dd5ab8 Compare June 18, 2024 09:05
src/ucs/debug/log.c Outdated Show resolved Hide resolved
src/ucs/debug/log.c Outdated Show resolved Hide resolved

if (access(old_log_file_name, W_OK) != 0) {
ucs_fatal("unable to write to %s", old_log_file_name);
ucs_error("unable to write to %s", old_log_file_name);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

since we fatal anyways, maybe set msg = "..." then print with ucs_fatal(".. %s.. ", msg);

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

or just replace ucs_error() by ucs_fatal()

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

replace ucs_error() by ucs_fatal()

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

replaced ucs_error() by ucs_fatal()

ucs_free(new_log_file_name);

err_out:
ucs_free(old_log_file_name);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

no need to free i guess, since there would be so many leaked resources aynways..

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

removed it, please check again

err_out:
ucs_free(old_log_file_name);
ucs_free(new_log_file_name);
ucs_fatal("ucs_log_file_rotate() fatal error (see logs before)");
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

please see previous suggestion

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

please check again

@michal-shalev michal-shalev force-pushed the convert-stack-allocation-to-heap branch 3 times, most recently from 90260f5 to daae19b Compare June 18, 2024 11:20
if ((rel_path_buf == NULL) || (abs_path_buf == NULL)) {
ucs_error("Failed to allocate memory for path buffers");
status = UCS_ERR_NO_MEMORY;
goto out;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

remote out label and use return UCS_ERR_NO_MEMORY instead, rename out_free_buffers to out

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

return here.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

there could be a case where one of the buffer was allocated successfully and the other one failed, so I think we still need to free, so I changed it to "goto out", let me know what do you think

@@ -259,13 +259,21 @@ static ucs_status_t ucs_vfs_node_add(void *parent_obj, ucs_vfs_node_type_t type,
va_list ap, ucs_vfs_node_t **new_node)
{
ucs_vfs_node_t *current_node;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

coding guideline mandates current_node be after initialized variables

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

need to move current_node line 264

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

@michal-shalev michal-shalev removed the WIP-DNM Work in progress / Do not review label Jun 18, 2024
@michal-shalev michal-shalev marked this pull request as ready for review June 18, 2024 14:11
@michal-shalev michal-shalev force-pushed the convert-stack-allocation-to-heap branch 2 times, most recently from 7eac1d4 to e10a8a8 Compare June 19, 2024 23:08
@michal-shalev michal-shalev force-pushed the convert-stack-allocation-to-heap branch 4 times, most recently from ef9cbb9 to 21e1828 Compare June 30, 2024 20:47
# -Wall: All common warnings
# -Werror: Treat warnings as errors
# -Wframe-larger-than=8192: Warn if stack frame size exceeds 8192 bytes (8 KB)
BASE_CFLAGS="-g -Wall -Werror -Wframe-larger-than=8192"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

might as well remove comments since the flags are so common

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

removed "-Wall", "-Werror" and "-g"

@@ -773,7 +773,7 @@ class uct_perf_test_runner {
TEST_CASE_ALL_OSD(_perf, _case, UCT_PERF_DATA_LAYOUT_BCOPY) \
TEST_CASE_ALL_OSD(_perf, _case, UCT_PERF_DATA_LAYOUT_ZCOPY)

ucs_status_t uct_perf_test_dispatch(ucx_perf_context_t *perf)
ucs_status_t __attribute__ ((optimize("O1"))) uct_perf_test_dispatch(ucx_perf_context_t *perf)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why is attribute needed here?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Without optimization the stack frame size > 8192 bytes. I tried using "new" at "src/tools/perf/lib/ucp_tests.cc" to make the runner use the heap instead of the stack and it compiled successfully on the HPC node, but then the CI failed because of the "new" and I couldn't reproduce it.
At the moment I changed it to pragma instead of the attribute.


if (access(old_log_file_name, W_OK) != 0) {
ucs_fatal("unable to write to %s", old_log_file_name);
ucs_error("unable to write to %s", old_log_file_name);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

replace ucs_error() by ucs_fatal()

@@ -259,13 +259,21 @@ static ucs_status_t ucs_vfs_node_add(void *parent_obj, ucs_vfs_node_type_t type,
va_list ap, ucs_vfs_node_t **new_node)
{
ucs_vfs_node_t *current_node;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

need to move current_node line 264

if ((rel_path_buf == NULL) || (abs_path_buf == NULL)) {
ucs_error("Failed to allocate memory for path buffers");
status = UCS_ERR_NO_MEMORY;
goto out;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

return here.

@@ -457,6 +457,11 @@ internal::CartesianProductHolder<Generator...> Combine(const Generator&... g) {
#define GTEST_GET_FIRST_(first, ...) first
#define GTEST_GET_SECOND_(first, second, ...) second

// Modified to reduce stack frame size per test file.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is there a way to remove changes to googletest (disabling check for it)? or make it outside of googletest by including gtest-param-test.h?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I didn't find a workaround for this, so @yosefe advised me to make this change

@michal-shalev michal-shalev force-pushed the convert-stack-allocation-to-heap branch 4 times, most recently from 5c24d38 to 6aa969d Compare July 1, 2024 20:21
@michal-shalev michal-shalev changed the title convert stack allocation to heap add compile time stack memory usage check Jul 1, 2024
Copy link
Contributor

@tvegas1 tvegas1 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

maybe remove #pragma in gtest by not specifing Wframe-large-than when building .cc files

{
ucs_status_t status;

status = dispatch_osd(perf);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

maybe replace by an array of function pointers

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thank you, done

@@ -90,6 +90,8 @@ UCS_TEST_F(test_bitops, ptr_ctz) {
ASSERT_EQ(88, ucs_count_ptr_trailing_zero_bits(buffer, 160));
}

#pragma GCC diagnostic push
#pragma GCC diagnostic ignored "-Wframe-larger-than="
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

to avoid all pragma in test/gtest, is there a way to not set the -Wframe-larger-than when compiling gtests?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There are a few macros that I can change to make those tests use the heap instead of the stack, just like I did in the commit "TEST/GTEST: reduce stack frame size in INSTANTIATE_TEST_SUITE_P macro".
Those macros are "UCS_TEST_P_" and "UCS_TEST_P_", I'm trying to make the syntax work, it's challenging so I added the pragma to see if I can make the CI pass meanwhile.
The reason I want to reduce the stack allocation in the tests is that later on I will check the stack usage on run time alongside running the tests and if the stack usage of the tests are initially above the threshold then it'll be harder to add a rule for the CI to catch cases of large stack usage during run time.

@michal-shalev michal-shalev force-pushed the convert-stack-allocation-to-heap branch 2 times, most recently from 5a31e59 to ec83329 Compare July 2, 2024 09:05
@michal-shalev michal-shalev force-pushed the convert-stack-allocation-to-heap branch 3 times, most recently from 54c414c to 746ca85 Compare September 16, 2024 18:48
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
WIP-DNM Work in progress / Do not review
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants