If spec.replicas is not defined, should default to the current value #3689

tomasz-torcz-airspace-intelligence · 2024-07-01T10:24:46Z

Checklist:

I've included steps to reproduce the bug.
I've included the version of argo rollouts.

Describe the bug

#119 added defaulting of a missing replicas: field to 1. This behaviour can impact the reliability of a service. The defaulting was added to work around a deficiency in HPA (kubernetes/kubernetes#111781).

We are using a custom autoscaler, not HPA. Our deployments tend not to have the replicas: field defined. Our expectation is that during a new version rollout, the number of replicas is carried over from the previous version.

What we observed: when the replicas: field is not set, argo-rollouts sets it to 1. After a few seconds, our autoscaler sets replicas: back to the proper value. Unfortunately, during those few seconds Kubernetes starts killing the pods, and until new pods are spun up and ready, the service is degraded.

To Reproduce

Prepare a deployment without a spec.replicas: field. Observe the "Defaulting .spec.replica to 1" message in the argo-rollouts controller log.

Expected behavior

The number of replicas, if not defined, should be carried over from the current state. The code should do something like:

	// In order to work with HPA, the rollout.Spec.Replicas field cannot be nil. As a result, the controller will update
	// the rollout to have the replicas field set to the current or the default value. see https://github.com/argoproj/argo-rollouts/issues/119
	if rollout.Spec.Replicas == nil {
		// Declared outside the branches so the chosen value is still in scope below.
		var replicas *int32
		if rollout.Status.Replicas != nil {
			// Carry over the current replica count (0 is a valid value).
			replicas = rollout.Status.Replicas
		} else {
			replicas = pointer.Int32Ptr(defaults.DefaultReplicas)
		}
		logCtx.Infof("Defaulting .spec.replicas to %d", *replicas)
		r.Spec.Replicas = replicas
	}

(The above is an untested sketch, sorry.)

rollout.status.replicas: is NOT marked as an optional field, so it should be available at all times.
A status replica count of 0 (zero) may be intended and correct, and should be carried over too.
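
To make the intended precedence concrete, here is a minimal, self-contained Go sketch of the selection logic described above. It is not the argo-rollouts implementation: `resolveReplicas`, `int32Ptr`, and `defaultReplicas` are hypothetical names introduced for illustration, and the status count is modelled as a pointer purely so that "unknown" can be told apart from a legitimate 0.

```go
package main

import "fmt"

// defaultReplicas stands in for the controller's current fallback of 1
// (hypothetical constant for this sketch).
const defaultReplicas int32 = 1

// resolveReplicas is a hypothetical helper, not part of argo-rollouts.
// Precedence: an explicit spec value wins, then the currently observed
// status value (including 0), and only then the default.
func resolveReplicas(specReplicas, statusReplicas *int32) int32 {
	switch {
	case specReplicas != nil:
		return *specReplicas
	case statusReplicas != nil:
		return *statusReplicas
	default:
		return defaultReplicas
	}
}

// int32Ptr returns a pointer to v, for building test inputs.
func int32Ptr(v int32) *int32 { return &v }

func main() {
	fmt.Println(resolveReplicas(nil, int32Ptr(7)))         // spec unset: carry over 7
	fmt.Println(resolveReplicas(nil, int32Ptr(0)))         // spec unset: carry over 0
	fmt.Println(resolveReplicas(nil, nil))                 // nothing known: default 1
	fmt.Println(resolveReplicas(int32Ptr(3), int32Ptr(7))) // explicit spec wins: 3
}
```

The second call is the case called out above: a status of 0 is treated as a real value and carried over, not replaced by the default.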

Screenshots

Version
1.7.0

Logs

# Paste the logs from the rollout controller

# Logs for the entire controller:
kubectl logs -n argo-rollouts deployment/argo-rollouts

# Logs for a specific rollout:
kubectl logs -n argo-rollouts deployment/argo-rollouts | grep rollout=<ROLLOUTNAME>

Message from the maintainers:

Impacted by this bug? Give it a 👍. We prioritize the issues with the most 👍.

insylogo · 2024-12-13T21:11:36Z

Yeah this is a serious issue that affects me as well. I use an HPA and if I don't set the replicas field to the current replica count, it wipes out all but one of my pods before recreating them. I have persistent client connections I can't afford to lose in this way.

jamsesso · 2025-01-07T19:55:39Z

I came across this issue while debugging the same behavior with an autoscaling/v2 HPA. It was also discussed in the CNCF Slack channel for Argo Rollouts: https://cloud-native.slack.com/archives/C01U781DW2E/p1736195993217089

tomasz-torcz-airspace-intelligence added the bug label Jul 1, 2024

jamsesso linked a pull request Jan 9, 2025 that will close this issue

fix(controller): Set the rollout replica field to the previous revision replica count. Fixes #3689 #4036 (Open)
