FIX: Reconcile traefik service with canary at 0 #1692
Setting the weight to 100 on both services makes 50% of the traffic go to each service. This made our canary enter an infinite loop when the traefik service got altered while a new version was being promoted.
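To make the 50/50 split concrete, here is a minimal sketch of how weighted round robin divides traffic between two backends. It assumes each backend's share is its weight divided by the sum of the weights; `trafficShare` is a hypothetical helper written for this illustration, not a Flagger or Traefik function.

```go
package main

import "fmt"

// trafficShare is a hypothetical helper: it returns the percentage of
// requests each backend would receive under weighted round robin,
// assuming the share is weight / (sum of weights).
func trafficShare(primary, canary uint) (primaryPct, canaryPct float64) {
	total := primary + canary
	if total == 0 {
		// No weight on either backend: nothing is routed.
		return 0, 0
	}
	primaryPct = float64(primary) * 100 / float64(total)
	canaryPct = float64(canary) * 100 / float64(total)
	return primaryPct, canaryPct
}

func main() {
	// Both weights at 100: each backend gets half of the traffic.
	p, c := trafficShare(100, 100)
	fmt.Printf("primary=%.0f%% canary=%.0f%%\n", p, c)

	// Primary at 100 and canary at 0: all traffic goes to the primary.
	p, c = trafficShare(100, 0)
	fmt.Printf("primary=%.0f%% canary=%.0f%%\n", p, c)
}
```

The weights are relative, so 100/100 behaves the same as 1/1 or 50/50.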
The traefik service should not be changed, as it is managed by Flagger, but getting stuck in an infinite loop is not great. The loop happened because, during promotion with StepWeightPromotion, the weights are reset when the traefik service gets reconciled. After that, GetRoutes runs this calculation for the weights, which returns 0 for the canary, and the promotion is then never able to exit this loop.
Besides this change, do you know why we are treating the weights as percentages? Should I also change the GetRoutes function to calculate the percentage from the weights, or is it coded this way because Flagger is expected to keep the weights within those constraints?
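To illustrate the two options in the question above, here is a rough sketch. It assumes the current calculation treats the primary weight as a percentage and derives the canary as its complement; that form is an assumption made for this example and is not copied from the Flagger source. Both function names are hypothetical.

```go
package main

import "fmt"

// complementWeights is a hypothetical stand-in for a calculation that
// treats the primary weight as a percentage (assumed form, not taken
// from the Flagger source).
func complementWeights(primary int) (primaryPct, canaryPct int) {
	return primary, 100 - primary
}

// normalizedWeights is a hypothetical alternative that derives the
// percentages from both raw weights, so the weights need not sum to 100.
func normalizedWeights(primary, canary int) (primaryPct, canaryPct int) {
	total := primary + canary
	if total == 0 {
		// Design choice for this sketch: with no weights set,
		// report all traffic on the primary.
		return 100, 0
	}
	primaryPct = primary * 100 / total
	return primaryPct, 100 - primaryPct
}

func main() {
	// Weights of 100 on both services, as described in this PR.
	p, c := complementWeights(100)
	fmt.Println("complement:", p, c)

	p, c = normalizedWeights(100, 100)
	fmt.Println("normalized:", p, c)
}
```

Under these assumptions, the complement form reports 0 for the canary whenever the primary weight is 100, regardless of the canary's actual weight, while the normalized form reports the 50/50 split that the weights produce. Whether normalizing is desirable depends on whether Flagger is meant to be the only writer of these weights.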