Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Perf][Metrics] Use flurry's concurrent hashmap for 5x throughput #2305

Draft
wants to merge 1 commit into
base: main
Choose a base branch
from

Conversation

utpilla
Copy link
Contributor

@utpilla utpilla commented Nov 18, 2024

Towards #1740

Changes

  • Current implementation of Metrics SDK uses a RwLock<Hashmap> for aggregating measurements. RwLock brings in significant amount of contention even for concurrent reads
  • This PR showcases how we could use flurry crate's concurrent Hashmap instead of a RwLock<Hashmap>.

The performance gains are huge!

Stress Tests results:

Machine details:

OS: Ubuntu 22.04.4 LTS (5.15.153.1-microsoft-standard-WSL2)
Hardware: Intel(R) Xeon(R) Platinum 8370C CPU @ 2.80GHz, 16vCPUs,
RAM: 64.0 GB

  • Counter throughput went up from 9M to 45M iterations per sec
  • Histogram throughput went up from 7.5 M to 35M iterations per sec

Benchmarks

  • The benchmark results are comparable. There is no significant difference in the benchmark results.

Note for reviewers

This PR is not meant for merging as-is. It's meant to show that we can utilize a more efficient concurrent data structure to our advantage. If we indeed decide to use flurry's Hashmap, we have to address the following:

  1. There is a potential race condition during collect for Delta temporality. This needs to be fixed to avoid losing measurements.
  2. Provide meaningful Ord implementations for KeyValue. Ord implementation for the Hashmap's key type is a requirement from flurry. For this PR, I have added a very basic implementation just to unblock myself from testing the Hashmap.
  3. We need to offer this under a feature flag. We don't want to add dependency on an external crate by default.

Merge requirement checklist

  • CONTRIBUTING guidelines followed
  • Unit tests added/updated (if applicable)
  • Appropriate CHANGELOG.md files updated for non-trivial, user-facing changes
  • Changes in public API reviewed (if applicable)

@utpilla utpilla requested a review from a team as a code owner November 18, 2024 01:58
Copy link

codecov bot commented Nov 18, 2024

Codecov Report

Attention: Patch coverage is 53.84615% with 24 lines in your changes missing coverage. Please review.

Project coverage is 79.5%. Comparing base (3ac2d9f) to head (dcc1aab).

Files with missing lines Patch % Lines
opentelemetry/src/common.rs 0.0% 23 Missing ⚠️
opentelemetry-sdk/src/metrics/internal/mod.rs 96.5% 1 Missing ⚠️
Additional details and impacted files
@@           Coverage Diff           @@
##            main   #2305     +/-   ##
=======================================
- Coverage   79.6%   79.5%   -0.1%     
=======================================
  Files        123     123             
  Lines      21263   21291     +28     
=======================================
+ Hits       16938   16947      +9     
- Misses      4325    4344     +19     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@utpilla utpilla marked this pull request as draft November 18, 2024 02:03
@lalitb
Copy link
Member

lalitb commented Nov 18, 2024

Thank you for sharing the results! The performance improvements are impressive, as anticipated, given that Flurry is designed after Java's ConcurrentHashMap and provides more fine-grained concurrency compared to DashMap. This can be added under the feature flag, but we also need to see some concern raised on its stability on high-load - jonhoo/flurry#127:

flurry has been a very interesting concurrent hash table experiment and has driven a lot of innovation in that space. However, at this point, flurry suffers from performance as well as memory usage issues under load.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants