Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

memory locking on mac! #7373

Open
wants to merge 50 commits into
base: dev
Choose a base branch
from
Open

memory locking on mac! #7373

wants to merge 50 commits into from

Conversation

Cathrach
Copy link
Contributor

@Cathrach Cathrach commented Sep 22, 2022

Ticket(s) Closed

Description
This PR adds code to use patched mimalloc as a substitute for system malloc, so we can mlock our allocations.
It also includes a function to mlock statics, which is ad-hoc and requires more testing/possibly a better method. I am extremely open to suggestions.

See https://www.notion.so/whisthq/Mimalloc-and-mlock-Implementation-c2e7d80ce8dc406981928bd0e4560b89.

Implementation

Documentation & Tests Added

Testing Instructions

Pull whisthq/mimalloc, check out the serina/mlock_error branch, and build that using the building instructions. Copy the mimalloc.o file to the build/lib/etc/mimalloc/etc folder on the protocol side

On the protocol side:

  • make sure you compile with -D USE_MIMALLOC=ON
  • To verify that mlocking is working as intended, you can turn on MLOCK_LOG in the mimalloc side; you can also check that Whist's memory usage is higher than dev Whist
  • To check that this is actually improving performance:
    • turn on PLOT_UDP_RECV_GAP in debug_flags.h
    • turn on LOG_DATA_FOR_PLOTTER in log_statistics.h and change other settings as you wish
    • stress your memory
    • use plotter.py

Serina's ABABABAB test (suggested by Yancey)
with mlock:
Screen Shot 2022-09-26 at 4 11 03 PM
dev:
Screen Shot 2022-09-26 at 4 11 20 PM
with mlock:
Screen Shot 2022-09-26 at 4 11 50 PM
dev:
Screen Shot 2022-09-26 at 4 12 07 PM
with mlock:
Screen Shot 2022-09-26 at 4 12 22 PM
dev:
Screen Shot 2022-09-26 at 4 12 37 PM
with mlock:
Screen Shot 2022-09-26 at 4 12 54 PM
dev:
Screen Shot 2022-09-26 at 4 13 29 PM

PR Checklist

  • Did the PR author fully test this PR end-to-end?
  • Did one PR reviewer fully test this PR end-to-end?
  • Did one PR reviewer conduct a thorough code design review?

@github-actions github-actions bot added 📁 Repo: workflows This PR/Issue modifies /workflows code 📁 Repo: protocol This PR/Issue modifies /protocol code labels Sep 22, 2022
@wangyu-
Copy link
Contributor

wangyu- commented Sep 22, 2022

Is there a link to some data of this PR vs dev?

Seems like the only data so far is a screenshot in the protocol channel.

I suggest do rigorous test/experiment, and post more data in the PR description. (As I mentioned in #7011)

Copy link
Owner

@philippemnoel philippemnoel left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is the patch file still needed?

@Cathrach
Copy link
Contributor Author

Is there a link to some data of this PR vs dev?

Seems like the only data so far is a screenshot in the protocol channel.

I suggest do rigorous test/experiment, and post more data in the PR description. (As I mentioned in #7011)

I will do this over the weekend and post images! yeet

Copy link
Owner

@philippemnoel philippemnoel left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could we solve the TODOs?

protocol/CMakeLists.txt Show resolved Hide resolved
@Cathrach Cathrach force-pushed the serina/mimalloc branch 2 times, most recently from 65bdeb7 to 72a940d Compare September 29, 2022 21:38
@codecov
Copy link

codecov bot commented Sep 29, 2022

Codecov Report

Merging #7373 (b159538) into dev (c104c46) will increase coverage by 1.25%.
The diff coverage is n/a.

❗ Current head b159538 differs from pull request most recent head 3a9e3c9. Consider uploading reports for the commit 3a9e3c9 to get more accurate results

@@            Coverage Diff             @@
##              dev    #7373      +/-   ##
==========================================
+ Coverage   56.13%   57.38%   +1.25%     
==========================================
  Files         158      106      -52     
  Lines       32519    26947    -5572     
==========================================
- Hits        18255    15464    -2791     
+ Misses      14010    11483    -2527     
+ Partials      254        0     -254     
Flag Coverage Δ
backend-services ?
protocol 57.38% <ø> (-0.16%) ⬇️
Impacted Files Coverage Δ
protocol/client/whist_client.cpp 0.94% <ø> (ø)
protocol/whist/core/whist_memory.c 68.30% <ø> (ø)
protocol/whist/fec/wirehair/WirehairCodec.cpp 88.21% <0.00%> (-2.55%) ⬇️
backend/services/host-service/auth/auth.go
backend/services/metadata/providers.go
backend/services/host-service/host-service.go
...ling-service/scaling_algorithms/default/default.go
...d/services/scaling-service/dbclient/mandelboxes.go
...aling-service/scaling_algorithms/default/assign.go
...aling-service/scaling_algorithms/default/deploy.go
... and 47 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 7175fb1...3a9e3c9. Read the comment docs.

@github-actions
Copy link

Protocol End-to-End Streaming Test Results

Experiments Summary

Expand Summary

Experiment 1 - Bandwidth: Unbounded, Delay: None, Packet Drops: None, Queue limit: default. Download logs:

aws s3 cp s3://whist-e2e-protocol-test-logs/serina/mimalloc/2022_11_21@19-43-12/ 2022_11_21@19-43-12/ --recursive

Experiment 2 - Bandwidth: variable between 15Mbit and 30Mbit, Delay: 10 ms, Packet Drops: None, Queue Limit: None, Conditions change over time? Yes, frequency is variable between 1000 ms and 2000 ms. Download logs:

aws s3 cp s3://whist-e2e-protocol-test-logs/serina/mimalloc/2022_11_21@19-49-30/ 2022_11_21@19-49-30/ --recursive

Experiment 3 - Bandwidth: variable between 10Mbit and 20Mbit, Delay: 10 ms, Packet Drops: None, Queue Limit: None, Conditions change over time? Yes, frequency is variable between 500 ms and 2000 ms. Download logs:

aws s3 cp s3://whist-e2e-protocol-test-logs/serina/mimalloc/2022_11_21@20-07-55/ 2022_11_21@20-07-55/ --recursive

Experiment 4 - Bandwidth: 10Mbit, Delay: 10 ms, Packet Drops: None, Queue Limit: 100 packets, Conditions change over time? No.. Download logs:

aws s3 cp s3://whist-e2e-protocol-test-logs/serina/mimalloc/2022_11_21@20-12-11/ 2022_11_21@20-12-11/ --recursive

Full Results: link here

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
📁 Repo: protocol This PR/Issue modifies /protocol code 📁 Repo: workflows This PR/Issue modifies /workflows code
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Implement memory locking (to avoid memory swap to disk)
4 participants