BW2 DR Failure Crashes Spawnpoint #22

Open
jhkolb opened this issue Apr 7, 2017 · 4 comments

Comments

@jhkolb
Contributor

jhkolb commented Apr 7, 2017

When the Bosswave DR for a spawnpoint daemon's namespace fails, the daemon crashes as well.

We need to ensure that spawnd prints some informative warnings when this occurs, but continues operation.
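
As a rough illustration of the "warn but keep running" behavior requested above, here is a minimal Python sketch. It is not spawnd's code: `call_with_retry`, its parameters, and the use of `ConnectionError` to stand in for a DR failure are all hypothetical names chosen for the example.

```python
import logging
import time

log = logging.getLogger("spawnd-sketch")  # hypothetical logger name


def call_with_retry(op, max_attempts=5, base_delay=1.0, max_delay=30.0, sleep=time.sleep):
    """Run op(); on ConnectionError, log a warning and retry with exponential backoff.

    The last failure is re-raised so the caller can decide what to do next,
    instead of the whole process dying on the first error.
    """
    delay = base_delay
    for attempt in range(1, max_attempts + 1):
        try:
            return op()
        except ConnectionError as err:
            # Informative warning rather than a crash.
            log.warning("router unreachable (attempt %d/%d): %s", attempt, max_attempts, err)
            if attempt == max_attempts:
                raise
            sleep(delay)
            delay = min(delay * 2, max_delay)  # double the wait, up to max_delay
```

The `sleep` parameter is injectable only so the backoff schedule can be checked without actually waiting.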

@jhkolb
Contributor Author

jhkolb commented Jun 3, 2017

After looking into this in more detail, there are deeper problems. If the DR fails, then reestablishing spawnd's active subscriptions would be very tricky.

@jhkolb
Contributor Author

jhkolb commented Oct 17, 2017

I've tried to make spawnd more resilient to Bosswave failures now, which also involved some changes to bw2bind.

DR failures should be handled better. We still have the issue of the (usually local) agent failing as well, but some of this should be handled by restarting through systemd and spawnd's efforts to cleanly recover old state.

Let's see how spawnd behaves in some real deployments for a while before closing this.

@immesys
Member

immesys commented Oct 17, 2017

You should also try testing netsplits. I find the best way is to use iptables to drop all packets to/from the DR IP. It's distinct from DR failure in that the software doesn't get the TCP RST, so it can behave quite differently.
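
One way to script the netsplit test described above is sketched below in Python. It only builds the `iptables` command lines and prints them by default; the helper names (`netsplit_rules`, `apply_rules`) and the address `203.0.113.10` are placeholders for illustration, not anything from spawnd or Bosswave. The rules use the standard `-I`/`-D` chain operations with `-s`/`-d` address matches and the `DROP` target.

```python
import ipaddress
import subprocess


def netsplit_rules(dr_ip, heal=False):
    """Return iptables argv lists that drop (or stop dropping) all traffic to/from dr_ip."""
    ip = str(ipaddress.IPv4Address(dr_ip))  # raises ValueError on a malformed IPv4 address
    flag = "-D" if heal else "-I"  # -I inserts the rule at the top of the chain, -D deletes it
    return [
        ["iptables", flag, "INPUT", "-s", ip, "-j", "DROP"],   # packets arriving from the DR
        ["iptables", flag, "OUTPUT", "-d", ip, "-j", "DROP"],  # packets leaving for the DR
    ]


def apply_rules(rules, dry_run=True):
    """Print each command, or run it when dry_run is False (running needs root)."""
    for argv in rules:
        if dry_run:
            print(" ".join(argv))
        else:
            subprocess.run(argv, check=True)


if __name__ == "__main__":
    apply_rules(netsplit_rules("203.0.113.10"))             # start the simulated netsplit
    apply_rules(netsplit_rules("203.0.113.10", heal=True))  # undo it afterwards
```

Because the packets are silently dropped rather than rejected, the client sees no TCP RST and only notices the split through timeouts, which is the behavior this test is meant to exercise.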

@jhkolb
Contributor Author

jhkolb commented Oct 17, 2017

Cool, thanks for the advice! Yeah, I'll add that to the queue of stuff to do.


2 participants