
abortScaleDownDelaySeconds is not working as expected #1841

Open
mksha opened this issue Feb 3, 2022 · 12 comments
Assignees: huikang
Labels: bug (Something isn't working), no-issue-activity, workaround (There's a workaround, might not be great, but exists)

Comments

@mksha
Contributor

mksha commented Feb 3, 2022

Summary

What happened/what you expected to happen?
When we set abortScaleDownDelaySeconds: 0, it should not delete the canary pod on abort, but it is deleting it immediately.
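
For context, a minimal sketch of the strategy section this setting lives in (resource names here are placeholders, not the exact spec from my cluster):

    apiVersion: argoproj.io/v1alpha1
    kind: Rollout
    metadata:
      name: example-rollout
    spec:
      strategy:
        canary:
          # per the docs, 0 means "do not scale the canary down at all after an abort"
          abortScaleDownDelaySeconds: 0
          canaryService: example-canary
          stableService: example-stable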

Diagnostics

What version of Argo Rollouts are you running?
1.1.1

# Paste the logs from the rollout controller

# Logs for the entire controller:
kubectl logs -n argo-rollouts deployment/argo-rollouts

# Logs for a specific rollout:
kubectl logs -n argo-rollouts deployment/argo-rollouts | grep rollout=<ROLLOUTNAME>

Message from the maintainers:

Impacted by this bug? Give it a 👍. We prioritize the issues with the most 👍.

mksha added the bug label on Feb 3, 2022
@jessesuen
Member

@mksha this option is ignored if you are doing a basic canary (not using a traffic router). Can you confirm you are using a traffic router? Which one are you using?

@mksha
Contributor Author

mksha commented Feb 4, 2022

@jessesuen I am using canary with traffic routing.
I am using Istio with a DestinationRule.
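
For completeness, the traffic routing section looks roughly like this (a sketch with placeholder names, following the subset-level routing layout from the Argo Rollouts Istio docs):

    trafficRouting:
      istio:
        virtualService:
          name: example-vsvc
          routes:
          - primary
        destinationRule:
          name: example-destrule
          canarySubsetName: canary
          stableSubsetName: stable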

@huikang
Member

huikang commented Feb 4, 2022

I can reproduce the issue and will work on a fix. Thanks, @mksha

huikang self-assigned this on Feb 4, 2022
@huikang
Member

huikang commented Feb 4, 2022

@mksha, on second thought, I think this is expected behavior if you don't use setCanaryScale in a step, e.g.,

      - setCanaryScale:
          replicas: 3

Otherwise, the code returns a nil canary scale:

    if rollout.Status.Abort {
        if abortDelay, _ := defaults.GetAbortScaleDownDelaySecondsOrDefault(rollout); abortDelay != nil {
            // If rollout is aborted do not use the set canary scale, *unless* the user explicitly
            // indicated to leave the canary scaled up (abortScaleDownDelaySeconds: 0).
            return nil
        }
    }
    currentStep, currentStepIndex := GetCurrentCanaryStep(rollout)
    if currentStep == nil {
        // setCanaryScale feature is unused
        return nil
    }
    for i := *currentStepIndex; i >= 0; i-- {
        step := rollout.Spec.Strategy.Canary.Steps[i]
        if step.SetCanaryScale == nil {
            continue
        }
        if step.SetCanaryScale.MatchTrafficWeight {
            return nil
        }
        return step.SetCanaryScale
    }
    return nil
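
In other words, a strategy along these lines (a sketch; the replica count is just for illustration) should keep the canary scaled up on abort:

    strategy:
      canary:
        abortScaleDownDelaySeconds: 0
        steps:
        - setCanaryScale:
            replicas: 3
        - pause: {}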

huikang added the workaround label on Feb 4, 2022
@jessesuen
Member

@mksha could you please provide more details including rollout spec, and kubernetes rollout events when this happens? I think we don't have enough information to go off of currently.

@mksha
Contributor Author

mksha commented Feb 6, 2022

I am using the same spec shared at #1838.

@mksha
Contributor Author

mksha commented Feb 6, 2022

@jessesuen

@mksha
Contributor Author

mksha commented Feb 7, 2022

@huikang

  1. You are correct. I tried that and it worked, so one thing we need to do is make sure the documentation is updated so we know which settings have to be used together.
  2. There is still a weird behaviour if we use the following steps (full sketch at the end of this comment):

       steps:
       - setCanaryScale:
           weight: 100
       - some-analysis-template
       - setCanaryScale:
           matchTrafficWeight: true
       - setWeight: 10
       - pause: {}

     In that case, when we abort after the pause, it creates a new pod using the same canary ReplicaSet and destroys the existing one. That technically fulfills the purpose of abortScaleDownDelaySeconds, but it causes canary requests to fail, because the old canary pod is deleted while the new one is still being created, so there is no pod left to serve the canary requests.

So we need to make sure that when we add

    - setCanaryScale:
        matchTrafficWeight: true

it uses the same pod that was created by

    - setCanaryScale:
        weight: 100
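
For reference, here is a rough sketch of the overall strategy being discussed; the service, VirtualService, DestinationRule, and analysis template names are placeholders, not the exact spec from #1838:

    strategy:
      canary:
        abortScaleDownDelaySeconds: 0
        canaryService: example-canary
        stableService: example-stable
        trafficRouting:
          istio:
            virtualService:
              name: example-vsvc
            destinationRule:
              name: example-destrule
              canarySubsetName: canary
              stableSubsetName: stable
        steps:
        - setCanaryScale:
            weight: 100
        - analysis:
            templates:
            - templateName: some-analysis-template
        - setCanaryScale:
            matchTrafficWeight: true
        - setWeight: 10
        - pause: {}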

@mksha
Contributor Author

mksha commented Feb 15, 2022

@huikang @jessesuen any thoughts?

@mksha
Contributor Author

mksha commented Mar 1, 2022

@jgwest @huikang any thoughts?

@github-actions
Contributor

This issue is stale because it has been open 60 days with no activity.

@bishalthapa-t
Contributor

bishalthapa-t commented Sep 16, 2024

@huikang @zachaller Could you please provide an update or an estimated timeline for addressing this bug?

Additionally, I have submitted a pull request to improve the documentation. You can review the changes here: PR #3835. Please take a look and let me know if it accurately addresses the scenario at hand.
