Race condition in ofi_get_core_info due to global log_prefix variable in libfabric v1.22.0 #10563

Open
piotrchmiel opened this issue Nov 20, 2024 · 1 comment


Describe the bug
A race condition occurs in the ofi_get_core_info function in libfabric v1.22.0. The function writes the global variable log_prefix at line 317 without any synchronization, so calling fi_getinfo concurrently from multiple threads produces a data race.

To Reproduce
Steps to reproduce the behavior:

  1. Use libfabric v1.22.0 with the RXD utility provider in conjunction with verbs.
  2. Run multiple threads concurrently; for example, one thread calls fi_getinfo while another calls fi_fabric.
  3. Run the application.
  4. Observe race condition warnings or crashes flagged by thread sanitizer.

Expected behavior
Per the documentation, fi_getinfo should be thread-safe and callable simultaneously by multiple threads without serialization. Modifications to the global variable log_prefix should not cause a race condition.

Output
Thread sanitizer identifies a race condition when modifying the global variable log_prefix during simultaneous calls to fi_getinfo and other operations.
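
Beyond the sanitizer report, the shared variable also means one caller can observe another caller's prefix. The sketch below replays one possible interleaving step by step on a single thread, so the outcome is deterministic. It is only a model: the log_msg helper, the prefix strings, and the replay function are hypothetical and are not libfabric code.

```c
#include <stdio.h>
#include <stddef.h>

/* Hypothetical stand-in for the shared global described in this report. */
static const char *log_prefix = "";

/* Hypothetical logger: formats a message with whatever prefix is current. */
static void log_msg(char *out, size_t len, const char *msg)
{
    snprintf(out, len, "%s%s", log_prefix, msg);
}

/* Replays one possible interleaving of two callers (A and B) in program
 * order, so the result does not depend on real thread scheduling. */
static void replay_interleaving(char *out_a, size_t len)
{
    log_prefix = "rxd: ";            /* caller A sets its prefix        */
    log_prefix = "other: ";          /* caller B overwrites it          */
    log_msg(out_a, len, "getinfo");  /* caller A logs with B's prefix   */
    log_prefix = "";                 /* caller B resets                 */
    log_prefix = "";                 /* caller A resets                 */
}
```

With real threads the same sequence can occur without any ordering guarantees, which is what thread sanitizer flags.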

Code Reference
The problematic function is:

int ofi_get_core_info(uint32_t version, const char *node, const char *service,
                      uint64_t flags, const struct util_prov *util_prov,
                      const struct fi_info *util_hints,
                      const struct fi_info *base_attr,
                      ofi_map_info_t info_to_core, struct fi_info **core_info)
{
    struct fi_info *core_hints = NULL;
    int ret;

    ret = ofi_info_to_core(version, util_prov->prov, util_hints, base_attr,
                           info_to_core, &core_hints);
    if (ret)
        return ret;

    log_prefix = util_prov->prov->name;  // <-- Global variable modified here

    ret = fi_getinfo(version, node, service, flags | OFI_CORE_PROV_ONLY,
                     core_hints, core_info);

    log_prefix = "";  // <-- Global variable reset here

    fi_freeinfo(core_hints);
    return ret;
}

Environment:

  • OS: Ubuntu 22.04
  • Provider: RXD utility provider with verbs
  • Libfabric version: 1.22.0

Additional Context
The log_prefix variable is shared across threads, so concurrent unsynchronized writes to it are a data race, which is undefined behavior. This violates the thread-safety guarantee of fi_getinfo. A possible fix is to make log_prefix thread-local so that each thread sets and reads its own copy.
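
A minimal sketch of the thread-local idea, assuming a C11 compiler with _Thread_local support. This is a standalone model, not a patch against libfabric: the variable here only mimics the real global, and with_prefix, worker, and run_two_threads are hypothetical helpers that imitate the set/call/reset pattern in ofi_get_core_info.

```c
#include <pthread.h>
#include <string.h>

/* Stand-in for the global; _Thread_local gives each thread its own copy,
 * so a write in one thread is never visible to another. */
static _Thread_local const char *log_prefix = "";

/* Hypothetical helper mimicking the set / nested call / reset pattern. */
static const char *with_prefix(const char *name)
{
    const char *seen;

    log_prefix = name;
    seen = log_prefix;   /* stands in for the nested call reading the prefix */
    log_prefix = "";
    return seen;
}

/* Each worker repeatedly sets its own prefix and checks what it reads back. */
static void *worker(void *arg)
{
    const char *name = arg;

    for (int i = 0; i < 100000; i++) {
        if (strcmp(with_prefix(name), name) != 0)
            return (void *) 1;   /* observed another thread's prefix */
    }
    return NULL;
}

/* Returns 0 if neither thread saw the other's prefix, 1 if one did,
 * and -1 if a thread could not be created. */
static int run_two_threads(void)
{
    pthread_t a, b;
    void *ra = NULL, *rb = NULL;

    if (pthread_create(&a, NULL, worker, "rxd"))
        return -1;
    if (pthread_create(&b, NULL, worker, "verbs")) {
        pthread_join(a, NULL);
        return -1;
    }
    pthread_join(a, &ra);
    pthread_join(b, &rb);
    return (ra != NULL) || (rb != NULL);
}
```

Because each thread owns its copy, no locking is needed around the assignment and reset, and the nested call always reads the prefix set by its own caller. Building this model with -fsanitize=thread should report no race on log_prefix.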


dsciebu commented Nov 21, 2024

I prepared a patch for this issue: #10568.
