[R&D] Improve the recovery of NSM clients #1565
Comments
Another problem scenario:
Description:
We found that begin.Request() and eventFactory.Request() generally use very similar logic. Most likely, this can also help with modifying the retry functionality - we can use it from the
Question:
retry problems
I've considered adding retry after the begin chain element.
Possible solution
In my opinion, the problem here is that we are calling eventFactory.Executor from two different places:
I would suggest:
We have done a lot of work to improve the recovery of NSM in the past. Recently we've found a few use-cases that conceptually do not work with the existing code. I also noticed that the heal chain element could be simplified.
Problem scenarios
Scenario 1:
Actual: no connection established on forwarders re-deployed
Scenario 2:
Actual: no connection established on forwarders re-deployed
Scenario 3:
Actual: no connection established on forwarders re-deployed
Scenario 4:
Actual: no connection established on forwarders re-deployed
Solution
Rework a few chain elements and reorganize the code.
Changes
1. heal chain element:
1.1. simply starts monitoring DP/CP goroutines.
1.2. If something goes wrong with CP monitoring, a refresh request should be scheduled.
1.3. If something goes wrong with DP monitoring, a refresh request with reselection should be scheduled.
2. retry chain element:
2.1. should not be a wrapper.
2.2. should use begin for retries.
3. begin should be able to cancel the current request if a reselect request is called.