Removing nodes creates multiple processes out of 1 process #207
I am also seeing duplicated processes in my cluster. We have implemented the :name_conflict message, but my two processes are running concurrently without either of them receiving the message.
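For reference, handling a name conflict in Horde looks roughly like this. A minimal sketch, assuming a registry named MyApp.HordeRegistry; the shape of the exit reason follows the pattern in Horde.Registry's documentation:

```elixir
defmodule MySingleton do
  use GenServer

  def start_link(opts) do
    GenServer.start_link(__MODULE__, opts, name: via_tuple())
  end

  defp via_tuple, do: {:via, Horde.Registry, {MyApp.HordeRegistry, __MODULE__}}

  @impl true
  def init(opts) do
    # Trap exits so the name-conflict exit signal arrives as a message.
    Process.flag(:trap_exit, true)
    {:ok, opts}
  end

  @impl true
  def handle_info({:EXIT, _from, {:name_conflict, {_key, _value}, _registry, _winning_pid}}, state) do
    # This process lost the conflict; stop and let the winner keep running.
    {:stop, :normal, state}
  end
end
```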
Could someone make a minimal application that exhibits this behaviour?
Sure, I'll share the same code I used back then in a few hours.
Hi @KamilZielinski, […]
@philipgiuliani Hey, I've got it on my GH already. I need to clean it up and describe the scenario to reproduce it. Sorry for the delay, I will try to drop it here soon.
Hey guys, sorry for the delay. Seems like I didn't push the correct version many months ago. Fortunately, I had the changes locally. Long live Git! (or something 😂) The repo is here: https://github.com/KamilZielinski/horderoro. Start node 1/2/3 with port 4000/4001/4002 (simply change it in […]), join them together (e.g. from node1), and verify that the nodes are connected (e.g. from node3). Now hit […]
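The exact snippets were lost from the comment above; a typical way to join and verify the nodes from IEx (node names assumed) would be:

```elixir
# From node1's IEx shell: connect to the other nodes (assumed node names)
Node.connect(:"node2@127.0.0.1")
Node.connect(:"node3@127.0.0.1")

# From node3: verify that all nodes see each other
Node.list()
# => [:"node1@127.0.0.1", :"node2@127.0.0.1"]
```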
Hi everyone, any updates on this issue? I've got very strange behaviour when nodes are removed from the cluster (testing with a huge number of processes) and I suspect it has the same cause.
The same issue is happening to me :(
Script to reproduce:

```elixir
defmodule TestServer do
  use GenServer

  def start_link(val) do
    GenServer.start_link(__MODULE__, val)
  end

  def init(val) do
    IO.puts("starting TestServer #{inspect(val)}")
    {:ok, val}
  end
end

# Start two Horde supervisors in the same VM and cluster them together.
{:ok, _} =
  Horde.DynamicSupervisor.start_link(
    name: :sup_1,
    strategy: :one_for_one,
    delta_crdt_options: [sync_interval: 20]
  )

{:ok, _} =
  Horde.DynamicSupervisor.start_link(
    name: :sup_2,
    strategy: :one_for_one,
    delta_crdt_options: [sync_interval: 20]
  )

Horde.Cluster.set_members(:sup_1, [:sup_1, :sup_2])
Process.sleep(200)

# Start two children on each supervisor.
{:ok, _} = Horde.DynamicSupervisor.start_child(:sup_1, {TestServer, "foo1"})
{:ok, _} = Horde.DynamicSupervisor.start_child(:sup_1, {TestServer, "foo2"})
{:ok, _pid} = Horde.DynamicSupervisor.start_child(:sup_2, {TestServer, "foo3"})
{:ok, _pid} = Horde.DynamicSupervisor.start_child(:sup_2, {TestServer, "foo4"})
Process.sleep(200)

stop_sup_2 = fn ->
  # remove sup_2 from the cluster and stop it
  Horde.Cluster.set_members(:sup_1, [:sup_1])
  Horde.DynamicSupervisor.stop(:sup_2)
  Process.sleep(200)
end

restart_sup_2 = fn ->
  # start sup_2 again and re-add it to the cluster
  {:ok, _} =
    Horde.DynamicSupervisor.start_link(
      name: :sup_2,
      strategy: :one_for_one,
      delta_crdt_options: [sync_interval: 20]
    )

  Horde.Cluster.set_members(:sup_1, [:sup_1, :sup_2])
  Process.sleep(200)
end

count_processes_in_crdt = fn ->
  Enum.filter(DeltaCrdt.read(:"sup_1.Crdt"), fn
    {{:process, _}, _} -> true
    _ -> false
  end)
  |> length()
end

count_processes_in_supervisor = fn ->
  length(Horde.DynamicSupervisor.which_children(:sup_1))
end

# Initially both views agree: 4 processes.
4 = count_processes_in_crdt.()
4 = count_processes_in_supervisor.()

# Each stop/restart cycle of sup_2 leaks duplicate processes.
stop_sup_2.()
restart_sup_2.()
6 = count_processes_in_supervisor.()
8 = count_processes_in_crdt.()

stop_sup_2.()
restart_sup_2.()
10 = count_processes_in_supervisor.()
12 = count_processes_in_crdt.()
```
Hi, upon inspection of the CRDT state during the test script, we noticed that after […]. So there are two options here: […]

The first solution is already suggested in PR #226; to us that felt somewhat […]. During the exploration of option 2 we realized that the :active/:passive setup […]. We decided to call this on the recipient supervisor during handover because at […]
I am seeing a similar behavior. I have some supervisors that start and register. I understood that registering multiple times with the same name would not be allowed, and that when the same process starts on multiple nodes and a merge happens, one of them would be killed. In fact I see that many processes are present multiple times. The process is started by calling […] when each application boots. What I get is that it either starts, or it is found to be a duplicate and is not started. All is good.

HordeRegistry

If I ask HordeRegistry about a specific process (that is supposed to be a singleton), it is present just once: […]. But if I go look at the list of processes it is linked to, there are TWO processes for it: […]

HordeSupervisor

If I ask HordeSupervisor for the list of all children, I see that the process is present multiple times: […]

I wonder what the point of having Horde allow multiple instances of StatusSrv (with the same call parameters) would be. I thought that the whole point of Horde was to have a supervisor that keeps cluster-unique processes alive, and that it did so by preventing, and when necessary killing, processes booting with the same name. Or am I missing something trivial?
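The inspection snippets were lost from the comment above; hypothetical reconstructions of the calls being described (registry name, supervisor name, and key are assumptions):

```elixir
# The registry reports the singleton name just once:
Horde.Registry.lookup(HordeRegistry, StatusSrv)
# => [{#PID<0.123.0>, nil}]

# ...but the supervisor lists the same child multiple times:
Horde.DynamicSupervisor.which_children(HordeSupervisor)
# => [{:undefined, #PID<0.123.0>, :worker, [StatusSrv]},
#     {:undefined, #PID<0.456.0>, :worker, [StatusSrv]}]
```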
Same issue is happening to me. Have you figured out a way to prevent it?
@bernardo-martinez I don't have time to dig into this right now, unfortunately.
@bernardo-martinez could you try applying the patch from #226? I think it helps. I am not sure why a new random child ID is being generated on process handoff; the identity of the child is still the same, it is just being started somewhere else...
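A sketch of the idea behind that suggestion (an assumption about the fix, not Horde's actual internals): derive the child id deterministically from the child spec instead of randomizing it, so a handed-off child keeps one identity in the CRDT:

```elixir
# Assumed illustration, not Horde's actual code.
child_spec = %{id: TestServer, start: {TestServer, :start_link, ["foo1"]}}

# Random id: every handoff or restart produces a new CRDT entry.
random_id = :erlang.unique_integer([:positive])

# Deterministic id: the same spec always hashes to the same id,
# so a handoff cannot create a duplicate entry.
deterministic_id = :erlang.phash2(child_spec.start)
```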
@arjan I get the problem; what I don't get is where in the code I should dive (and how it works internally) in order to fix it. But OK, when I have time I will first go through the path you suggest, and if no success, I will go the long way (studying all the code, basically). Thanks for the replies :)
@derekkraan I believe it could be a good idea to put a note on the main README that there are currently issues with process duplication, because this is a major issue. I personally decided to use a different approach because in our environment this was a complete no-go. I'm totally fine with you not having the time/bandwidth to address these issues, as every open source maintainer knows (we are all in the same boat), but for such a critical piece of infrastructure as Horde, it would definitely be nice to be clear on this topic, e.g. by saying that there are some unsolved major issues and that they won't be solved any time soon. People (even open source maintainers) have lives and jobs and other interests as well, so I don't think there is any loss of face in being open about it. Horde is still a terrific library.
@l3nz can you elaborate on the workaround you used to avoid this? We're running into the same issue and it's causing problems with our rolling deploys.
I basically have a singleton-per-cluster process, using the approach from https://github.com/derekkraan/highlander, that takes care of keeping a list of processes that must be up and distributing them to different cluster nodes. When there are topology changes, and in any case every few minutes, it checks to make sure that all processes that are supposed to be up are actually up and that all nodes are reachable.
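A minimal sketch of that workaround, with all names and intervals assumed: a cluster-wide singleton (run under Highlander, say) that re-checks the required process list on topology changes and on a timer:

```elixir
defmodule Reconciler do
  use GenServer

  @check_interval :timer.minutes(5)

  # `required` is a list of {registered_name, start_fun} pairs (assumed shape).
  def start_link(required), do: GenServer.start_link(__MODULE__, required)

  @impl true
  def init(required) do
    # Subscribe to :nodeup / :nodedown messages for topology changes.
    :net_kernel.monitor_nodes(true)
    schedule_check()
    {:ok, required}
  end

  @impl true
  def handle_info({event, _node}, required) when event in [:nodeup, :nodedown] do
    reconcile(required)
    {:noreply, required}
  end

  def handle_info(:check, required) do
    reconcile(required)
    schedule_check()
    {:noreply, required}
  end

  # Restart anything from the required list that is not currently running.
  defp reconcile(required) do
    for {name, start_fun} <- required, GenServer.whereis(name) == nil do
      start_fun.()
    end
  end

  defp schedule_check, do: Process.send_after(self(), :check, @check_interval)
end
```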
Hey guys. I was learning how the library works and encountered a weird issue which I can't really understand nor find an explanation for in the docs: why can 1 process created within a 3-node cluster scale up to 8 processes after removing 2 nodes from the cluster?
Setup
Deps (I'm using 0.8.0-rc.1 as it solved the issue with init/1):
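The dependency list itself did not survive above; presumably something like the following, given the version mentioned:

```elixir
# mix.exs (assumed; only the version is from the comment)
defp deps do
  [
    {:horde, "0.8.0-rc.1"}
  ]
end
```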
Issue
I have a DynamicSupervisor (members: :auto) which creates 1 process (transient, called Conversation in my example) per API request, with a random name. I don't worry about moving state between nodes for now. The process always lasts 60 seconds, just waiting, and then dies.
My setup: 3 nodes (Phoenix APIs) started locally on different ports (Node1/2/3 from left to right in the screenshot).
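A minimal sketch of the described setup, with all names assumed: a transient Conversation process that waits 60 seconds and then stops, started from an API request handler with a random name:

```elixir
defmodule Conversation do
  use GenServer, restart: :transient

  def start_link(opts) do
    GenServer.start_link(__MODULE__, opts, name: opts[:name])
  end

  @impl true
  def init(opts) do
    # Live for 60 seconds, then terminate normally.
    Process.send_after(self(), :stop, :timer.seconds(60))
    {:ok, opts}
  end

  @impl true
  def handle_info(:stop, state), do: {:stop, :normal, state}
end

# In the API request handler (supervisor name assumed):
name = :"conversation_#{:erlang.unique_integer([:positive])}"
Horde.DynamicSupervisor.start_child(MyApp.DistributedSupervisor, {Conversation, name: name})
```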