-
Notifications
You must be signed in to change notification settings - Fork 538
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Exchanges continually report as not_responsible #1011
Comments
Is this during a cluster change operation, or ongoing in steady-state running? |
Ongoing, after the cluster has been initialised and transfers have completed etc. |
Hmm, looked back at the last time I tested with this form of AAE enabled - and I got this constantly in the background as well. I've touched the code here, so let me have a look - this isn't right. I have it in Riak 2.9.1, which version have you found it in? |
Merged bet branches into a 3.1 and saw it there, so assumed it was something from my merge. But looked on 3.0 and it's there also. |
Ah - my change was to add the log! |
Hmmm, so it's possibly something that's been there for a while, but silently? |
Perhaps. I may go back to when the log was added to make sure it occurs in that version. The log got introduced in 2.9.0 |
I have a working theory, that this is just wrong: https://github.com/basho/riak_kv/blob/develop-3.0/src/riak_kv_entropy_manager.erl#L958-L964 an Exchange is The RemoteIdx must be on another node (if target_n__val has been met by the cluster logic). So when this happens, and the RemoteIdx is used as a LocalIdx, that partition cannot be a primary when he hit this point - https://github.com/basho/riak_kv/blob/develop-3.0/src/riak_kv_entropy_manager.erl#L895. I just don't understand what |
The only theory is that prune_exchanges/1 is to make sure that each exchange happens only in 1 direction. So by screwing it by switching the indexes in the other direction all exchanges occur, but once not twice? |
So I think that does make sense. With prune_exchanges/1 doing the index switcheroo whenever LocalIdx > RemoteIdx, then 50% of the exchanges result in not_responsible. So Two nodes will generate the exchange between A and B for IndexN: one as {A, B, IndexN}, the other as {B, A, IndexN}. The first node will run the exchange for that combination, the second will not. So the exchange will happen, but only once per cycle. If I remove So I think the |
Yep, you're right there I think. It seems like it is deliberately switched to prevent us from uneccessarily doing the same exchange twice, instigated from both the owner and the fallback, which makes sense. Just seems a crappy way of going about it - unless there's more to it that I'm not seeing. Changing the list comp in
I'll make this change PR anyway and let it sit for review for a bit / mull it over. |
2020-03-20 10:08:14.611 [info] <0.1670.0>@riak_kv_entropy_manager:maybe_exchange:986 Attempt to start exchange between {1096126227998177188652763624537212264741949407232,3} and 1141798154164767904846628775559596109106197299200 resulted in not_responsible
Must be the case that the exchange_queue in entropy_manager is being populated errorneously. I'll dig into this a bit more, doesn't seem like expected behaviour, and I've not noticed it in previous builds. Have you seen this befgore @martinsumner?
The text was updated successfully, but these errors were encountered: