Another solution to livelock #14
Comments
Yes, I think that works. Thanks for the writeup. |
Sorry for asking this, but how do you guarantee you have found the least
I know epaxos technically uses optimized dependencies, but I'm trying to consider a more generic dependency graph, which seems possible to generate with 5 replicas (1 replica proposes In the above graph, after receiving commits for Could you please explain why (1) would hold in such a case? (e.g. I'm probably missing something obvious here, or maybe graph like this is too generic/actually impossible? |
As an aside, I would encourage you not to implement the version of the algorithm that requires "seq". Instead, rely on "deps" alone and ack a command to the client only after the command's dependency graph is finalized (or only after it has been executed). To order commands within an SCC use any order criterion as long as it is consistent across replicas. Without "seq", the algorithm has fewer corner cases and better consistency properties. |
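A minimal sketch of the deps-only execution suggested in the comment above, assuming the whole dependency graph is already committed and known locally. The function name `execution_order`, the dict-based graph, and sorting by instance id inside an SCC are illustrative assumptions, not the EPaxos reference implementation. It uses Tarjan's algorithm, which emits strongly connected components with dependencies first.

```python
from typing import Dict, List, Set


def execution_order(deps: Dict[str, Set[str]]) -> List[str]:
    """Return a deterministic execution order for a committed deps graph.

    deps[x] is the set of instances x depends on. Tarjan's algorithm emits
    SCCs so that dependencies come before dependents; inside an SCC we sort
    by instance id, an arbitrary but replica-independent tie-break
    (an assumption of this sketch).
    """
    index: Dict[str, int] = {}
    low: Dict[str, int] = {}
    stack: List[str] = []
    on_stack: Set[str] = set()
    order: List[str] = []
    counter = 0

    def visit(v: str) -> None:
        nonlocal counter
        index[v] = low[v] = counter
        counter += 1
        stack.append(v)
        on_stack.add(v)
        for w in sorted(deps.get(v, ())):
            if w not in index:
                visit(w)
                low[v] = min(low[v], low[w])
            elif w in on_stack:
                low[v] = min(low[v], index[w])
        if low[v] == index[v]:
            # v is the root of an SCC: pop the whole component.
            scc = []
            while True:
                w = stack.pop()
                on_stack.discard(w)
                scc.append(w)
                if w == v:
                    break
            order.extend(sorted(scc))

    for v in sorted(deps):
        if v not in index:
            visit(v)
    return order


# b and c form a cycle; both depend (directly or transitively) on d.
example = execution_order({"a": {"c"}, "b": {"c", "d"}, "c": {"b"}, "d": set()})
```

Because the traversal and the tie-break depend only on instance ids, every replica that sees the same committed graph computes the same order. The sketch does not address the livelock discussed in this thread: it still needs the complete SCC before it can run.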
I'm not sure it would help with solving livelock due to infinitely growing SCC (without stopping new proposes that is), as using Edit: I actually discovered epaxos after reading a recent paper on bpaxos (there was no |
Well, that's why I called the observation about seq an aside, not a direct answer to your question. That said, I do believe that there could very well be ways to tackle this from a theoretical point of view as well. I haven't thought about this too much, but you could try to reason about which dependencies are really necessary, and which dependencies are superfluous. The critical thing in EPaxos is to guarantee that if two commands interfere with each other, one will be in the other's deps graph. They don't need to be in the same SCC, and as you point out, it's better if they are not. Thus, are there some rules that replicas can use to reason "A must be in deps(B), so I won't add B to deps(A)"? I think so. |
Assuming our discussion is only about interfering commands, I think what confused you is this step:
Let me elaborate on it: Assumes the least For an unseen command Y:
Thus |
I'm not sure how to parse this. Are you talking about direct dependencies, or transitive dependencies as well? I assume direct, since in SCC everything depends on everything, in such case here's a counter example:
The sequence of events is as follows (number before the colon is the replica number):
Here's a short reference of events over time on each replica:
Now that replica 5 is out of a network partition, it receives commits in some particular order:
Running your algorithm we may decide that |
In epaxos the execution consistency only guarantees order about interfering commands, not unrelated commands:
That's why I said:
In your example
Here what I mean is transitively depending, so that a potential minimal |
But it does depend on Case 1:
Case 2:
Then I'm not sure how it helps with livelocks. Waiting for all transitive dependencies would mean replica needs to see a complete SCC, which may grow indefinitely, hence a livelock. Anyway, thank you for your answers. I think using some kind of monotonically increasing per-replica propose time as a tie breaker (used for breaking ties only, thus clock drift would only affect latency of long chains of conflicting commands and their dependencies, so not much of an issue) is more promising for solving this problem. |
Update of my previous comment: As you had the setup with
In order to let this algorithm to work with optimized
Here we need to retrieve all of the unoptimized dependent commands. Back to your steps:
The optimized |
hi @snaury what about my latest comment? |
I don't understand what you mean by any two interfering commands having a direct depends-on relationship. Commands |
For two committed interfering commands X and Y, there is at least one replica that has both of them.
Yes.
During execution.
I think epaxos gives complete proof about the consistency of the committed graph.
It does expand the least seq command, but this loop stops when the set of replicas of CMDs stops growing:
|
I'm not sure how that helps. Here's another example, we are constructing a graph that looks like this:
There is a long chain of commands A, B, C, D, E, F conflicting pairwise. There is also a long chain of commands U, V, W, X, Y, Z where U conflicts with A and eventually Z conflicts with F. Here's an incomplete diagram of how proposes/pre-accepts and accepts are spread over time:
Here's a textual description of how these commands are processed and what happens to their
Here we assume that Notice we have already accepted these commands:
Notice also that replica 2 has never seen (and will never see) any of
Replica 2 is not aware of However on a replica that is aware of Yes, there is a replica that sees both |
Here's an even better diagram where both replicas 1 and 2 are never aware that
Only a single replica 3 observes both branches of the eventual loop. |
You are right. My algorithm works only with a set of commands in which every two of them interfere with each other. Thanks for the detailed diagram to explain 😆😆 Although it is possible to generalize this algorithm, looking for the subgraph containing the least command cannot be done within a finite number of steps. By the way, I did not use |
I think I have a better solution to livelock. Here's a sketch of an idea:
Proof sketch for conflicting nodes
The execution algorithm is then simple: for every instance The graph has no loops, so there is no livelock. What do you think? |
😲😲😲 Compare the complete commit-time deps set( Finally
If my understanding of your idea above is correct, I believe we have almost the same solution. Comparing |
I'm not sure what you mean about comparing What I show is that if |
The fact that you don't have a cycle in the complete dependency graph doesn't seem sufficient to avoid livelock. You also need the ability to start executing before the entire graph is fully committed. Consider the following sequence of PreAccepts, with non-transitive interference relations C1 ~ C2 ~ C3 ~ ... ~ Cn ~ C0
Assume that C0 commits only after this entire sequence of events. It's clear that we can construct examples like this with arbitrarily long chains of dependencies. After we commit C1 and try to execute it, we must follow the deps graph edges until we eventually get to C0, which has no deps. This is livelock, because n can be arbitrarily high. At no point can we execute C1 without having finalized the entire deps graph, because we never know that C1 has the smallest seq of any command we might discover by following the deps edges to the end. |
I don't think it can be called a livelock, because each instance reaches commit state independently and takes a finite number of steps. In my scheme as soon as This is different from original epaxos, where SCC can grow indefinitely and can never be complete, so it would never be executed. I may be mistaken, but I assumed that was the livelock problem mentioned in the paper. When committed instances have other instances in their interference set and those instances are not committed for a long time the recovery should kick in and either commit the original command or replace it with no-op. It's probably possible for several replicas to fight over who finishes recovery, but that's a separate issue for paxos in general. |
Also, that's exactly what I'm doing. Let's assume |
Moreover, if Yes, Note that in original epaxos dependencies in the final graph may end up looking like this:
This is an arbitrarily large SCC and we have to wait until it is complete. However in my proposal graph would end up looking like this:
There are no ambiguities in the graph, everything is linear (since your example is very tidy), it's obvious that |
Ah, yes, that's true. Why do you need to update deps in the Accept phase of a command? Can you give an example where that's necessary? |
Here's a diagram:
The final graph is Also note that if accept phase of |
Oh, I suppose it's because you want to ensure that you see all the commands that interfere with your command, and which might still get a lower seq than your final seq. The fact that the Accept has the final seq for your command guarantees that all other commands (that you don't see) will get higher seqs. Yeah, I think it works. If I were to describe this I wouldn't make the Accept phase add anything to the final committed deps. Instead I would say the extra deps are just an execution hint for the command leader. This way you don't have to complicate recovery. Finally, I want to point out, as I did in the paper, that this seq-based algorithm doesn't ensure strict serializability when command interference is non-transitive. For example, say that a user writes two objects, A and B (initially A = 0, B = 0). First the user commits A = 1, then a time later, the user commits B = 1. While all this is happening, someone else is trying to read both A and B. Consider the following PreAccept order:
Read A&B is ordered before Write A, but after Write B. As a consequence it returns A = 0 & B = 1, which is not consistent with the time ordering of the operations. |
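The anomaly described above can be replayed with a tiny simulation. This is only an illustrative sketch: the operation names `write_A`, `write_B`, `read_AB` and the helper `run` are made up for the example, and the execution order is the one stated in the comment (the read lands before Write A but after Write B).

```python
def run(order):
    """Apply operations in the given execution order and return the read result."""
    state = {"A": 0, "B": 0}
    result = None
    for op in order:
        if op == "write_A":
            state["A"] = 1
        elif op == "write_B":
            state["B"] = 1
        elif op == "read_AB":
            result = (state["A"], state["B"])
    return result


# Real-time order: Write A commits first, Write B some time later.
# Execution order from the comment: Write B, then the read, then Write A.
anomalous = run(["write_B", "read_AB", "write_A"])

# An order consistent with real time, for comparison.
consistent = run(["write_A", "write_B", "read_AB"])
```

Here `anomalous` is `(0, 1)`: the read observes the later write but not the earlier one, which is the "time travel" effect that strict serializability rules out.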
Yes, thank you, I'm aware. :) Our database is currently serializable, but not strict serializable (by default), due to some optimizations that cause similar "time travel" issues. But then just like the paper states it's possible to wait for dependency chain to be committed before acknowledging commit to client, that would make sure any newly interfering command acquires a higher seq than what we depended on. |
Determine the order:If Let sort It guarantees the execution consistency and does not need a |
Initially I thought replicas would persist new deps to later reply with the earliest possible result. But now I think you're right, it could just be a hint if when replica receives
Btw, I realized that ordering by Anyway, I'm not sure why people hate |
To me, |
You need to be a little more formal about your statements. Here's an example:
At time 1 three instances are proposed on each replica. All of them send pre-accept to the next replica and eventually commit the result. What you get is a perfect loop, where If you start accumulating additional dependencies during the accept phase like I propose, you must be aware that different replicas (leader and recovery) may use different quorums (we must consider the case where each replica may only be able to communicate with a subset of other replicas, you may not always see the full picture), accumulate different dependencies, and you may have different dependency graphs on different replicas. In my case that's perfectly fine, since I show how using |
Sorry, it's my bad.
Safe means:
If
I do not know why this is a problem. |
That does not magically happen; the reason the committed value will be the same on recovery is that when there is potential for disagreement (there is no fast quorum of pre-accept-eq replies) epaxos goes through a separate accept phase. Only when the accept phase acquires a quorum can you be sure that the exact value you accepted will be discovered during recovery. Conversely, if recovery is in progress and there's a risk of choosing a different value, the leader would not be able to finish accept for its value, since the quorum will be blocked by a prepare phase of a replica which started recovery. What I'm doing is I acquire additional hints during the accept phase, but I don't have a separate accept phase for those hints (only the command itself, I'm not sure how your solution can deal with different commit messages that have different |
I do not understand. In my opinion epaxos recovery guarantees that:
And yes, original epaxos does not guarantee the hint would be committed with the same value by leader and recovery. Thus you have the intersection solution. In every phase, use
No more phase, just pre-accept and accept, then commit. |
Can you show some examples how you would decide on
Now what is the right commit value? The original leader thinks How do you accumulate dependencies so they are committed consistently? How do you solve chain linked loops which grow indefinitely? How do you decide you are not missing some instances that will loop back on you later? |
To run with
The triangle circle example:
Finally no
I have been using the vector-clock way to solve ordering problem in my project.
First, at t=1, A received 2 pre-accept replies, R1 could commit on fast-path. If R1 insists to run into an accept phase, it should be correct too. According to epaxos, there is not any update to At t=5, if recovery process sees any accept-phase
The leader does not need to update
Yes, leader and recovery should have chosen the same value.
Accumulated dependencies(or
As previously I mentioned(you did not reject it thus I assume it is true If the entire SCC is committed, executing instances by the rank: Thus to execute
|
Wouldn't it be just normal
But if it's decided in pre-accept phase, then it's not even close to being the same, my accumulated deps are accumulated during accept phase, how can it be the same if yours happens in pre-accept phase?
I'll take your word for it, but I don't understand how you could ever guarantee that, especially in very complex loops. You even show a table with values for
I omitted fast path, because it's irrelevant to the issue (it's basically a cheat where you got lucky and something happened the same way on all replicas, being lucky is good for performance, but algorithm must be safe when fast path doesn't exist). Working around fast path is always possible, but makes graph messy, it's easier to consider two instances instead of 3, 4, 5, etc.
The problem I'm trying to solve is how to start executing the graph when entire SCC is not committed, and may never be committed, because it never stops growing. Think what happens when conflict rate is close to 100%, commands proposed on multiple replicas will almost always be looped to each other with more and more dependencies that are not committed yet. |
@drmingdrmer please look at the following diagram:
Here conflicts are pairwise, i.e.
Now let's say replica 5 receives some (but not all) commit messages, what would be the first instance it executes? If you answer |
Ah, I think I'm starting to understand. What you propose is to accumulate transitive conflicts of each conflicting instance you encounter during pre-accept, this way you can glimpse what could have been chosen during a commit. After conflicting instances That's an interesting idea, but I'm not sure it works in some edge cases, where dependencies don't match exactly because of different quorums.
Final graph will have the following dependencies and ranks:
Notice how there are two loops
Then we have:
We would see Unfortunately contradictions are still possible. Looking at subset/superset is a nice idea, but it cannot guarantee edges will be present on both ends of potentially intersecting commands. |
With this time flow the final rank would be:
And it is a loop thus my answer is
B or E could both be executed first, on different replica:
Since D and E do not interfere, it does not matter. The possible order is PS. I'm gonna answer your question in the second last comment right away.:DDD |
The difference is
Except the difference of the phase when they are formed, they play the same role
Yes, this is not very strict in my statement.
The
I said I assume you got the gist of this from your next comments thus I'm not going too |
Yes I had a little trouble finding a correct word for the transitive conflicts.
We allow First I concluded the interfering relations are as below(as you did not specify
This situation is a little bit complicated because it depends on what According to the epaxos specification, there are 3 constraints for fast-commit, a system
In this scenario, it could be the second or the third constraint that the system And Thus in this situation execute |
In my example only
I'm not saying it's unreasonable, it's perfectly reasonable. I'm saying the execution order is inconsistent:
Don't you think that's an issue? My example is complicated, but illustrates how it's possible to have I think for it to be safe you must show either:
I would love for the second option to work somehow (it would then be possible to use with bpaxos, where dependency service is separate from command leaders), but I don't see how it can work in a single pass like you propose. :( |
2020-05-21: The following solution is not proved correct. Refer to #19 for a proved exec-algo without livelock. I did not yet have an answer to your previous comment. Break loop: Easy to see that every loop contains at least one instance that entered the accept phase (slow-path). If Thus we remove the relation Execution: 1. Then choose the instance 2. Following the path in Consistency: This is not correct. I have a fix here: #14 (comment)
What do you think guys? |
BTW, what is bpaxos? I did not find a paper on Google. Could you share a link? |
It's kind of off topic, but here's where I initially read about it: https://arxiv.org/abs/2003.00331 |
2020-05-21: The following solution is not proved correct. Refer to #19 for a proved exec-algo without livelock. Fix: in previous comment I said
It is not correct. This should fix the consistency problem in my previous comment. Add order to tip vertex: For some vertices in We could just use instance-id to sort these instances. Execution: To execute one instance is quite simple:
ConsistencyIf two replicas( If Otherwise, execute either of |
2020-05-21: The following solution is not proved correct. Refer to #19 for a proved exec-algo without livelock.
Not like that. If we do not use the fast-commit-condition-1(send pre-accept to
This is what I understand about epaxos. But it is a little out of our topic.
I have to say you must be my hero.
This seems to be the simplest direction for solving the problem. I think it works but, as always, I am waiting for your opinions.
To discover an instance that no present instances depend on does not seem to be |
Update: 2020-03-07:
This approach works only with a set of instances in which every two of them interfere with each other. See @snaury 's comments below: #14 (comment)
The livelock problem
Epaxos specifies that in an SCC, the commands with the lowest seq should be executed first.
If there is a stream of interfering commands, the execution thread needs to walk through the entire SCC before it can execute any command.
This is the livelock.
The solution provided by epaxos is to prioritize completing old commands over
proposing new commands.
This solution might bring in some latency because we need to stop
proposing a new command to break the command chain.
I've been thinking maybe there is some other way to solve the livelock problem.
We assume that all commands mentioned below interfere with each other.
Ordering
Assume we determine the command order in an SCC by a tuple (seq, replicaID).
Then the first command to execute is the one with the least (seq, replicaID) in an SCC.
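As a small illustration of this ordering, the sketch below picks the command with the least (seq, replicaID) from a set. The `Cmd` type and the function name `first_to_execute` are hypothetical names chosen for the example.

```python
from typing import Iterable, NamedTuple


class Cmd(NamedTuple):
    seq: int
    replica_id: int
    name: str


def first_to_execute(scc: Iterable[Cmd]) -> Cmd:
    """Pick the command with the least (seq, replicaID); ties on seq fall back to replica id."""
    return min(scc, key=lambda c: (c.seq, c.replica_id))


scc = [Cmd(3, 1, "A"), Cmd(2, 2, "B"), Cmd(2, 1, "C")]
first = first_to_execute(scc)
```

In this example B and C share seq 2, so the replica id breaks the tie and C (owned by replica 1) is chosen.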
Find the first command safe to execute
Provided with a committed, un-executed, interfering command list CMDs = {Ra.A, Rb.B, Rc.C ..} (Ra.A is a command owned by replica Ra).
Assume that Ra.A has the least (seq, replicaID).
Define a function replicas(cmds) that extracts the list of command owner replicas, e.g.: replicas({Ra.A, Rb.B}) = {Ra, Rb}.
Define allDeps, the union of all dependencies of CMDs: allDeps = Ra.A.deps U Rb.B.deps U Rc.C.deps...
If CMDs contains only the least seq commands (that is, for two un-executed interfering commands Ra.X and Ra.Y: if Ra.X.seq < Ra.Y.seq, and Ra.Y is in CMDs, then Ra.X must be in CMDs):
(1) For an un-executed command X not in CMDs and owned by one of replicas(CMDs), we have: X.seq > Ra.A.seq.
Thus X should never be executed before Ra.A.
(2) For an un-executed command Y not in CMDs and NOT owned by any of replicas(CMDs), if replicas(allDeps) ⊆ replicas(CMDs), then Y depends on CMDs.
Thus Y should never be executed before Ra.A.
With (1) and (2), we have the conclusion that Ra.A is the command that should be executed first.
The abstract algorithm
Initialize an empty CMDs.
Add the next least (seq, replicaID) command and its deps into CMDs.
Repeat this step until we find a CMDs so that replicas(allDeps) ⊆ replicas(CMDs), then execute the least command in CMDs.
Thus we could execute a command even before the entire SCC is committed.
Because the number of replicas is finite and small, this process would finish quickly, even when there are a lot of interfering commands.
In this way searching for an SCC is no longer required and there won't be a livelock problem anymore.
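The steps of the abstract algorithm can be sketched roughly as follows. This is one possible reading, not a verified implementation: it assumes that "adding a command and its deps" means the command joins CMDs while its deps join allDeps, that command ids have the form "Ra.A", and that replica names compare in the same order as replica ids. The names `Cmd`, `replicas`, and `first_safe` are hypothetical. Per the update at the top of this post, the approach is only claimed to work when every two commands interfere with each other.

```python
from typing import FrozenSet, Iterable, List, NamedTuple, Optional, Set


class Cmd(NamedTuple):
    replica: str           # owner replica, e.g. "Ra"
    name: str              # command name, e.g. "A"
    seq: int
    deps: FrozenSet[str]   # ids such as "Rb.B" of commands this one depends on


def replicas(ids: Iterable[str]) -> Set[str]:
    """Extract owner replicas from command ids of the form 'Ra.A'."""
    return {i.split(".")[0] for i in ids}


def first_safe(committed: List[Cmd]) -> Optional[Cmd]:
    """Return the least command once replicas(allDeps) is a subset of replicas(CMDs).

    Returns None if the condition never holds for the commands seen so far.
    """
    cmds: List[Cmd] = []
    all_deps: Set[str] = set()
    for c in sorted(committed, key=lambda c: (c.seq, c.replica)):
        cmds.append(c)          # add the next least command to CMDs
        all_deps |= c.deps      # and accumulate its deps into allDeps
        if replicas(all_deps) <= {x.replica for x in cmds}:
            return cmds[0]      # least (seq, replica) command in CMDs
    return None


a = Cmd("Ra", "A", 1, frozenset({"Rb.B"}))
b = Cmd("Rb", "B", 2, frozenset({"Ra.A"}))
c = Cmd("Rc", "C", 3, frozenset({"Ra.A", "Rb.B"}))
chosen = first_safe([c, b, a])
```

In this example the loop stops after adding Ra.A and Rb.B, because every dependency seen so far is owned by Ra or Rb, so Ra.A is chosen without inspecting Rc.C.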