Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Shuffle the list of URLs #2888

Merged
merged 2 commits into from
Aug 15, 2023
Merged

Conversation

sourcefrenchy
Copy link
Contributor

If a server goes down, clients will be distributed better across the remaining servers.

…be distributed better accross the remaining servers.
@CLAassistant
Copy link

CLAassistant commented Aug 14, 2023

CLA assistant check
All committers have signed the CLA.

Copy link
Contributor Author

@sourcefrenchy sourcefrenchy left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Currently, if a server is down all connected clients will switch to the next server listed, meaning all of them will reconnected to the same next server even if multiple servers are available. This proposed patch will make sure to redistribute better and avoid a surge of connections to the same server.

Copy link
Contributor

@scudette scudette left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is not really thread safe because the config file is used in all go routines and there is no lock to protect it from being modified. I'm pretty sure the race detector will be very upset by this code.

I don't actually think it's necessary since each client will go through it's list at a different rate. So in practice they will all be fairly random. Do you have numbers to show that it's an issue in practice?

@sourcefrenchy
Copy link
Contributor Author

Oh this request comes from what we observed in production. With around 60k clients, some pools of clients rebooting on their assigned time windows (patching, etc) and with 6 servers, we see the same burst every time a server reboots for maintenance. Graphs show a ~15k or more new clients getting added to the next server in the list e.g. all clients on server 2 now rebooting will adds themselves to server 3, etc. Apologies if the code is not thread safe though but I feel it would help if the next server is simply a random entry from the pool.

@scudette
Copy link
Contributor

This is an interesting observation - thanks for sharing it.

Sorry I didn't read the code closely enough before - I thought it was randomizing after each call but it is only randomizing once in the constructor. I checked with the race detector and it seems fine because url is a copy of the original slice and it is not modified from other goroutines.

This looks good! thanks.

@scudette scudette merged commit e0e9978 into Velocidex:master Aug 15, 2023
1 check passed
scudette added a commit to scudette/velociraptor that referenced this pull request Aug 28, 2023
If a server goes down, clients will be distributed better across the
remaining servers.

---------

Co-authored-by: Mike Cohen <[email protected]>
scudette added a commit that referenced this pull request Aug 28, 2023
If a server goes down, clients will be distributed better across the
remaining servers.

---------

Co-authored-by: Mike Cohen <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants