Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

If spec.replicas is not defined, should default to the current value #3689

Open
2 tasks done
tomasz-torcz-airspace-intelligence opened this issue Jul 1, 2024 · 2 comments · May be fixed by #4036
Open
2 tasks done
Labels
bug Something isn't working

Comments

@tomasz-torcz-airspace-intelligence
Copy link

tomasz-torcz-airspace-intelligence commented Jul 1, 2024

Checklist:

  • I've included steps to reproduce the bug.
  • I've included the version of argo rollouts.

Describe the bug

In #119, a defaulting of missing replicas: field to 1 was added. This behaviour could impact reliability of a service. Defaulting was added to workaround HPA's deficiency (kubernetes/kubernetes#111781)

We are using a custom autoscaler, not HPA. Our deployments tend not to have replicas: field defined. Our expectations is that during new version rollout, the number of replicas is carried-over from previous version.

What we observed, when replicas: field is not set, argo-rollouts sets it to 1. After few seconds, our autoscaler sets replicas: back to the proper value. Unfortunately during those few seconds Kubernetes starts killing the pods, and until new pods are spun up and ready, the service is degraded.

To Reproduce

Prepare a deployment without a spec.replicas: field. Observe "Defaulting .spec.replica to 1" message in argo rollout's log.

Expected behavior

The number of replicas, if not defined, should be carried over from current state. The code should do something like

	// In order to work with HPA, the rollout.Spec.Replica field cannot be nil. As a result, the controller will update
	// the rollout to have the replicas field set to the current or the default value. see https://github.com/argoproj/argo-rollouts/issues/119
	if rollout.Spec.Replicas == nil {
		if rollout.Status.Replicas != nil {
			replicas := rollout.Status.Replicas
		} else {
			replicas := pointer.Int32Ptr(defaults.DefaultReplicas)
		}
		logCtx.Info("Defaulting .spec.replicas to %d", replicas)
		r.Spec.Replicas = replicas

(Above isn't syntactically correct, sorry).

rollout.status.replicas: is NOT marked as an optional field, so should be available all the time.
The value of 0 (zero) of the number of status replicas may be wanted/correct and should be carried over, too.

Screenshots

Version
1.7.0

Logs

# Paste the logs from the rollout controller

# Logs for the entire controller:
kubectl logs -n argo-rollouts deployment/argo-rollouts

# Logs for a specific rollout:
kubectl logs -n argo-rollouts deployment/argo-rollouts | grep rollout=<ROLLOUTNAME

Message from the maintainers:

Impacted by this bug? Give it a 👍. We prioritize the issues with the most 👍.

@tomasz-torcz-airspace-intelligence tomasz-torcz-airspace-intelligence added the bug Something isn't working label Jul 1, 2024
@insylogo
Copy link

insylogo commented Dec 13, 2024

Yeah this is a serious issue that affects me as well. I use an HPA and if I don't set the replicas field to the current replica count, it wipes out all but one of my pods before recreating them. I have persistent client connections I can't afford to lose in this way.

@jamsesso
Copy link

jamsesso commented Jan 7, 2025

I came across this issue while debugging the same behavior using autoscaler/v2 HPA in the CNCF slack channel for argo rollouts as well: https://cloud-native.slack.com/archives/C01U781DW2E/p1736195993217089

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
None yet
3 participants