rfc: smp server cluster #1422
base: master
## Problem

Currently we can only scale servers on a given address vertically, which has 2 problems:
Suggested change:
- Currently we can only scale servers on a given address vertically, which has 2 problems:
+ Currently we can only scale servers on a given address vertically, which has 3 problems:
The second approach makes it easy to migrate parts of the state between servers in the cluster, as message queues are already grouped in folders, with the top-level folder names having 2 letters. This would allow up to 4096 servers in the cluster.

The proxy would then choose a random server from the list of servers to create a queue, and the server would have to be configured to use specific 2 letters in the base64 encoding of queue addresses. For existing queues, the server will be chosen based on the queue ID.
> ... and the server would have to be configured to use specific 2 letters

it should probably be configured to select from a range of 2-letter prefixes, not specific 2 letters. this way, when a new server is added, the queues transferred from an existing server can be split more evenly by further splitting ranges. I think Cassandra does something similar to that.
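The range-based assignment suggested above could look roughly like this. It is a minimal Python sketch, not the project's implementation: `RangeMap`, `prefix_slot`, the server names, and the use of the URL-safe base64 alphabet are all assumptions made for illustration.

```python
from bisect import bisect_right

# Assumption: IDs use the URL-safe base64 alphabet (64 characters).
B64URL = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789-_"


def prefix_slot(queue_id: str) -> int:
    """Map the first two base64 letters of an ID to a slot in 0..4095."""
    return B64URL.index(queue_id[0]) * 64 + B64URL.index(queue_id[1])


class RangeMap:
    """Hypothetical routing table: each server owns a contiguous slot range."""

    def __init__(self, first_server: str):
        self.starts = [0]              # sorted start slot of every range
        self.servers = [first_server]  # servers[i] owns the range at starts[i]

    def lookup(self, queue_id: str) -> str:
        """Return the server whose range contains the ID's 2-letter prefix."""
        i = bisect_right(self.starts, prefix_slot(queue_id)) - 1
        return self.servers[i]

    def split(self, at_slot: int, new_server: str) -> None:
        """Give new_server the slots from at_slot to the end of that range."""
        if not 0 < at_slot < 4096:
            raise ValueError("slot out of range")
        i = bisect_right(self.starts, at_slot)
        if self.starts[i - 1] == at_slot:
            raise ValueError("a range already starts at this slot")
        self.starts.insert(i, at_slot)
        self.servers.insert(i, new_server)
```

With a table like this, adding a server only moves the queues in the split-off part of one range; all other prefixes keep their current server.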
To group servers into a cluster we need to map requests to specific servers. This can be done in one of two ways:
- additional server ID in the cluster added to transmissions.
- map the first two letters in base64 encoding of queue ID to the server ID.
we should consider advantages and implementation of first approach
- additional protocol commands, used only by the load-balancing proxy, to create references from other IDs to the servers that have the actual queue.
- use the same 2 letters for all IDs.

The latter approach is simpler, but it cannot be used if some of the IDs are generated client-side and some IDs are generated server-side - we would need to generate all IDs in one place. Alternatively, the client can generate some IDs with the same 2 letters in the ID, and the server in the cluster will be chosen to match this ID.
also, how existing queues will migrate
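To make the "same 2 letters" alternative above concrete, here is a minimal Python sketch of generating an ID whose base64 encoding starts with a given 2-letter prefix, and of a server-side check that rejects other prefixes. The names `generate_id` and `accepts`, the URL-safe alphabet, and the 24-byte ID size (taken from the 192-bit figure in the RFC text) are assumptions for illustration only.

```python
import base64
import secrets

# Assumption: IDs use the URL-safe base64 alphabet (64 characters).
B64URL = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789-_"
ID_BYTES = 24  # 192-bit IDs encode to 32 base64 letters without padding


def generate_id(prefix: str) -> str:
    """Generate a random ID whose encoding starts with the 2-letter prefix."""
    hi, lo = B64URL.index(prefix[0]), B64URL.index(prefix[1])
    bits = (hi << 6) | lo  # the 12 bits fixed by the two letters
    raw = bytearray(secrets.token_bytes(ID_BYTES))
    raw[0] = bits >> 4                                # top 8 of the 12 bits
    raw[1] = ((bits & 0x0F) << 4) | (raw[1] & 0x0F)   # low 4 bits, rest random
    return base64.urlsafe_b64encode(bytes(raw)).decode("ascii")


def accepts(server_prefix: str, queue_id: str) -> bool:
    """Hypothetical server-side check: reject IDs with a different prefix."""
    return queue_id[:2] == server_prefix
```

Only the first 12 bits are overwritten; the remaining bits of the ID stay random.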
If we decide on the second approach and add client-generated IDs, we may already start rejecting IDs whose first 2 letters differ. This would effectively reduce ID entropy from 192 to 180 bits, which could be a better tradeoff than additional protocol commands and requests to find the queue, which would add to the request latency.

The advantage of the first approach is that it is more generic and does not impose any restrictions on IDs. Making additional requests within the operator's network would add only a small fraction to the latency, compared with the much larger latency to the end user. The balancing proxy could also cache the results of dereferencing requests.
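For reference, the figures quoted for the second approach can be checked with a short calculation, assuming the standard 64-character base64 alphabet, so that each letter carries 6 bits:

$$
64^2 = 4096 \text{ prefixes}, \qquad 2 \times \log_2 64 = 12 \text{ bits}, \qquad 192 - 12 = 180 \text{ bits}.
$$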
also second approach leaks some metadata possibly..? for example if sender and notification server collude.. basically they'd know they're referring to same node in cluster. not sure, seems far fetched.
overall I think approach 2 (for question "A separate question is how to map other queue IDs (sender, notifier, link) to the recipient ID", not necessarily overall) is better. Seems reasonable for cluster to know its state
- proxy -> server 2 chosen based on sender ID: SET_SREF
- proxy <- server 2: OK
- proxy -> server 3 chosen based on notifier ID: SET_NREF
- proxy <- server 3: OK
why do servers 2 and 3 need to know anything here?
as in - it's proxy that's responsible for routing / load balancing, or does this sequence imply servers aren't yet in the same cluster?.. not sure I understand
**The sequence of requests to send the message**:

- client -> proxy: SEND
- proxy -> mapped server 1 based on sender ID: GET_SREF
- proxy <- server 1: SREF
- proxy -> mapped server 2 based on recipient ID (or cluster ID) in SREF: SEND
- proxy <- server 2: OK
- client <- proxy: OK
also don't get it. aren't these servers in same cluster and behind the same proxy address? why server 1 (and not proxy) has to know sender id for queue that's on server 2?
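One way to read the send sequence quoted above is sketched below in Python. This is a toy in-memory model under stated assumptions, not the RFC's design: the `Server` and `Proxy` classes, the `server_for` mapping, and the cache are hypothetical, and only the command names SET_SREF, GET_SREF, SREF and SEND come from the RFC text. In this reading, the server selected by the sender ID stores only a reference to the recipient ID, and the proxy dereferences it (optionally caching the result) before forwarding SEND.

```python
from dataclasses import dataclass, field


@dataclass
class Server:
    name: str
    srefs: dict = field(default_factory=dict)   # sender ID -> recipient ID
    queues: dict = field(default_factory=dict)  # recipient ID -> messages

    def set_sref(self, sender_id: str, recipient_id: str) -> str:
        """SET_SREF: remember which recipient ID a sender ID refers to."""
        self.srefs[sender_id] = recipient_id
        return "OK"

    def get_sref(self, sender_id: str) -> str:
        """GET_SREF: return the stored reference (the SREF response)."""
        return self.srefs[sender_id]

    def send(self, recipient_id: str, msg: bytes) -> str:
        """SEND: append the message to the queue held by this server."""
        self.queues[recipient_id].append(msg)
        return "OK"


class Proxy:
    def __init__(self, servers: list):
        self.servers = servers
        self.sref_cache = {}  # cached dereferencing results
        self.lookups = 0      # number of GET_SREF requests actually made

    def server_for(self, any_id: str) -> Server:
        # Placeholder mapping from an ID to a server; a real proxy would use
        # whatever mapping the cluster is configured with.
        return self.servers[sum(any_id.encode()) % len(self.servers)]

    def create_queue(self, recipient_id: str, sender_id: str) -> str:
        self.server_for(recipient_id).queues[recipient_id] = []
        return self.server_for(sender_id).set_sref(sender_id, recipient_id)

    def send(self, sender_id: str, msg: bytes) -> str:
        recipient_id = self.sref_cache.get(sender_id)
        if recipient_id is None:
            self.lookups += 1
            recipient_id = self.server_for(sender_id).get_sref(sender_id)
            self.sref_cache[sender_id] = recipient_id
        return self.server_for(recipient_id).send(recipient_id, msg)
```

With the cache in place, only the first SEND for a given sender ID costs an extra request inside the cluster; later sends go straight to the server that holds the queue.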