nsqd: discover topic/channel paused state on new topic discovery #1274
base: master
Conversation
@jehiah @mreiferson @ploxiln This PR is ready for review. PTAL.
Here is the design:
See more details in the code. :)
Thanks for taking a first pass at this @thekingofworld! 🙏
@jehiah @mreiferson @ploxiln This PR is ready for review. PTAL.
@mreiferson This PR is ready for review. PTAL.
/cc @mreiferson @ploxiln @jehiah Can any of you take a look at this? 🙏
@mreiferson Ping. |
/cc @mreiferson |
Continuing to improve! Thanks for your patience 🙏
My big high-level concern now is that these new paths in nsqlookupd are going to be significantly more expensive. In the /channels case, we're doing multiple passes over data in RegistrationDB to filter and look up paused state in producers. Considering #1208, we should think about better data structures for storing this data so that these newer, more complex queries stay performant.
cc @andyxning
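To make the cost concrete, here is a minimal sketch of the kind of two-pass, read-locked query the new /channels path implies. The types (Registration, regDB, the paused map) are simplified stand-ins invented for the example, not the actual RegistrationDB code.

```go
// A minimal sketch (simplified stand-ins, not the actual nsqlookupd code) of
// the two-pass query behind the concern above: under one read lock we filter
// the registrations down to a topic's channels, then look up paused state for
// each match.
package sketch

import "sync"

type Registration struct {
	Category string // "topic" or "channel"
	Key      string // topic name
	SubKey   string // channel name (empty for topic registrations)
}

type regDB struct {
	sync.RWMutex
	regs   []Registration
	paused map[Registration]bool // hypothetical per-registration paused state
}

// channelsWithState returns channel name -> paused for a single topic.
func (db *regDB) channelsWithState(topic string) map[string]bool {
	db.RLock()
	defer db.RUnlock()

	// pass 1: filter registrations down to this topic's channels
	var channels []Registration
	for _, r := range db.regs {
		if r.Category == "channel" && r.Key == topic {
			channels = append(channels, r)
		}
	}

	// pass 2: look up paused state for each matching channel
	out := make(map[string]bool, len(channels))
	for _, r := range channels {
		out[r.SubKey] = db.paused[r]
	}
	return out
}
```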
Agree.
After this, it seems that handling the "*" case for FindRegistrations is not enough; I have to think about how to solve the whole problem. @mreiferson @andyxning Any ideas? For example, we could add two additional maps to handle the "*" case for FindRegistrations, and an additional map to handle the state lookup for topic/channel, couldn't we?
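For illustration, a hypothetical sketch of what those extra maps could look like: a per-category index for the "*" case plus a paused-state map. The names and types here are made up for the example, not nsqlookupd's real API.

```go
// Hypothetical extra indexes: a category-keyed map so a wildcard ("*") lookup
// doesn't scan every registration, and a map for O(1) paused-state lookups.
package sketch

import "sync"

type Registration struct{ Category, Key, SubKey string }

type indexedDB struct {
	sync.RWMutex
	registrations map[Registration]struct{} // primary store
	byCategory    map[string][]Registration // secondary index for "*" lookups
	pausedState   map[Registration]bool     // paused state per topic/channel
}

// findByCategory answers the "*" case without scanning the primary store.
func (db *indexedDB) findByCategory(category string) []Registration {
	db.RLock()
	defer db.RUnlock()
	// return a copy so callers can't race with later mutations
	return append([]Registration(nil), db.byCategory[category]...)
}

// isPaused answers a single paused-state lookup in O(1).
func (db *indexedDB) isPaused(r Registration) bool {
	db.RLock()
	defer db.RUnlock()
	return db.pausedState[r]
}
```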
@mreiferson It still seems fast enough; I doubt that the write-lock starvation situation really exists in practice.
@mreiferson Ping. |
@mreiferson @jehiah @ploxiln Can anyone take a look at this? It has been waiting for more than 10 days...
Hi folks, let's keep going. [Doge] If there are any problems, please let me know and I will do my best to solve them.
Hi @thekingofworld, sorry for the delay; I'll try to find some time this week to take a closer look. We do know that performance issues exist at larger scale (#1208), and this PR will make some queries perform worse.
Well, I have read the related discussion in #1208.
By the way, benchmarks show that these queries are still fast. |
I think the situation that #1208 is attempting to optimize, which the existing benchmark setups here do not emulate, is when there are many channels per topic and they are constantly being added and removed (as ephemeral channels tend to be when consumers connect and disconnect). Each add and remove does a "write" lock/unlock of the RWMutex, and this blocks all the "read" locks of the lookups. The current benchmarks set up just two channels per topic, do lookups without concurrent updates, and only time the lookups part.
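A sketch of the benchmark shape described above: a writer goroutine constantly adds and removes an ephemeral channel (taking write locks) while lookups (read locks) are timed under RunParallel. The churnDB type and its methods are stand-ins invented for the example, not the PR's real benchmark code.

```go
// Benchmark sketch: time read-locked lookups while a writer goroutine
// generates constant ephemeral-channel churn, so the write locks contend
// with the reads being measured.
package sketch

import (
	"fmt"
	"sync"
	"testing"
)

// minimal stand-in for the registration store: topic -> set of channel names
type churnDB struct {
	sync.RWMutex
	channels map[string]map[string]struct{}
}

func newChurnDB() *churnDB {
	return &churnDB{channels: make(map[string]map[string]struct{})}
}

func (db *churnDB) addChannel(topic, channel string) {
	db.Lock()
	defer db.Unlock()
	if db.channels[topic] == nil {
		db.channels[topic] = make(map[string]struct{})
	}
	db.channels[topic][channel] = struct{}{}
}

func (db *churnDB) removeChannel(topic, channel string) {
	db.Lock()
	defer db.Unlock()
	delete(db.channels[topic], channel)
}

func (db *churnDB) lookupChannels(topic string) []string {
	db.RLock()
	defer db.RUnlock()
	out := make([]string, 0, len(db.channels[topic]))
	for ch := range db.channels[topic] {
		out = append(out, ch)
	}
	return out
}

func BenchmarkLookupWithChurn(b *testing.B) {
	db := newChurnDB()
	stop := make(chan struct{})
	done := make(chan struct{})

	// writer goroutine: ephemeral channels added and removed continuously,
	// as happens when consumers connect and disconnect
	go func() {
		defer close(done)
		for i := 0; ; i++ {
			select {
			case <-stop:
				return
			default:
			}
			name := fmt.Sprintf("ch-%d#ephemeral", i)
			db.addChannel("topic-a", name)
			db.removeChannel("topic-a", name)
		}
	}()

	b.ResetTimer()
	b.RunParallel(func(pb *testing.PB) {
		for pb.Next() {
			db.lookupChannels("topic-a") // read lock contends with the writer's write locks
		}
	})
	close(stop)
	<-done
}
```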
@ploxiln Yeah, the situation you described is right.
Force-pushed from f513954 to 0ceec31.
@mreiferson @ploxiln Comparison before and after optimization: performance increased by 40%. Cons:
@mreiferson @ploxiln Ping. |
Thanks for continuing to push this forward @thekingofworld. The approach here will now be much more costly on the write (registration) side, because each write will now have to walk the whole structure and produce the cached "high performance" map. Instead, we should concurrently update both the standard and new "high performance" maps during each mutation. The discussion in #1208 (comment) (from "if sync.Map everywhere is too much overhead" down) and #1208 (comment) has the most relevant context for what needs to happen here.
Agree: update the high-speed map during each mutation instead of walking the whole structure to produce the cached "high performance" map. @mreiferson Done, PTAL.
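Roughly the shape of "update both maps on each mutation", as a sketch under simplified types rather than the PR's actual diff: the primary map and the secondary index are mutated under the same write lock, so the read path never has to rebuild a cached index.

```go
// Dual-index sketch: every mutation updates the primary registration map and
// the category-keyed secondary index together, under one write lock.
package sketch

import "sync"

type Registration struct{ Category, Key, SubKey string }

type dualIndexDB struct {
	sync.RWMutex
	registrations map[Registration]struct{}
	byCategory    map[string]map[Registration]struct{} // secondary index kept in sync
}

func newDualIndexDB() *dualIndexDB {
	return &dualIndexDB{
		registrations: make(map[Registration]struct{}),
		byCategory:    make(map[string]map[Registration]struct{}),
	}
}

func (db *dualIndexDB) AddRegistration(r Registration) {
	db.Lock()
	defer db.Unlock()
	db.registrations[r] = struct{}{}
	if db.byCategory[r.Category] == nil {
		db.byCategory[r.Category] = make(map[Registration]struct{})
	}
	db.byCategory[r.Category][r] = struct{}{} // same mutation, same lock
}

func (db *dualIndexDB) RemoveRegistration(r Registration) {
	db.Lock()
	defer db.Unlock()
	delete(db.registrations, r)
	delete(db.byCategory[r.Category], r)
}
```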
Force-pushed from d134659 to 6e5beee.
@mreiferson ping. |
Fix #1199