Description
Checklist:
- I've included steps to reproduce the bug.
- I've included the version of argo rollouts.
Describe the bug
When the stable ReplicaSet is not fully available (e.g. spec.replicas was just updated), the setHeaderRoute step is unexpectedly skipped and marked as completed without modifying the VirtualService.
I believe this is because the guardrail function checkReplicasAvailable (introduced in #3878) returns false while the stable ReplicaSet is not ready, which makes reconcileTrafficRouting() return early, before it reaches the SetHeaderRoute call:
argo-rollouts/rollout/trafficrouting.go
Lines 271 to 284 in 7518bde
// check if the stable RS has enough pods before recalculating the new
// weight status.
if !c.checkReplicasAvailable(c.stableRS, weightutil.MaxTrafficWeight(c.rollout)-desiredWeight) {
	return nil
}
// We need to check for revision > 1 because when we first install the rollout we run step 0 this prevents that.
// There is a bigger fix needed for the reasons on why we run step 0 on rollout install, that needs to be explored.
revision, revisionFound := annotations.GetRevisionAnnotation(c.rollout)
if currentStep != nil && (revisionFound && revision > 1) {
	if currentStep.SetHeaderRoute != nil {
		if err = reconciler.SetHeaderRoute(currentStep.SetHeaderRoute); err != nil {
			return err
		}
	}
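The suspected control flow can be modeled with a small, self-contained sketch. This is not the controller's code: `step`, `fakeReconciler`, `reconcileGuardFirst`, and `reconcileHeaderFirst` are hypothetical names invented for illustration, and the availability check is reduced to a plain integer comparison. It assumes, as the logs below suggest, that step progression happens independently of whether the traffic routing reconciler actually wrote the header route. The values 4 and 10 match the available and desired replica counts reported in the logs.

```go
package main

import "fmt"

// step is a hypothetical, simplified stand-in for a canary step.
type step struct {
	name           string
	setHeaderRoute bool
}

// fakeReconciler is a hypothetical reconciler that only records
// whether a header route was written.
type fakeReconciler struct {
	headerRouteSet bool
}

func (r *fakeReconciler) SetHeaderRoute() { r.headerRouteSet = true }

// reconcileGuardFirst mirrors the ordering in the quoted snippet:
// the availability guard returns early, so the header route is
// never applied while the stable ReplicaSet is still scaling.
func reconcileGuardFirst(r *fakeReconciler, s step, available, desired int) {
	if available < desired {
		return
	}
	if s.setHeaderRoute {
		r.SetHeaderRoute()
	}
}

// reconcileHeaderFirst is a hypothetical reordering shown only for
// contrast: the header route is applied before the guard is checked.
// Whether this is safe in the real controller is not established here.
func reconcileHeaderFirst(r *fakeReconciler, s step, available, desired int) {
	if s.setHeaderRoute {
		r.SetHeaderRoute()
	}
	if available < desired {
		return
	}
	// Weight recalculation would follow here.
}

func main() {
	s := step{name: "setHeaderRoute", setHeaderRoute: true}
	a, b := &fakeReconciler{}, &fakeReconciler{}

	reconcileGuardFirst(a, s, 4, 10)
	reconcileHeaderFirst(b, s, 4, 10)

	fmt.Println("guard first, header route set:", a.headerRouteSet)
	fmt.Println("header first, header route set:", b.headerRouteSet)
}
```

With the guard evaluated first, the header route is left unset for as long as the stable ReplicaSet is below its desired count, which is consistent with the VirtualService shown in the reproduction below.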
To Reproduce
First apply the following Rollout, then update spec.replicas to 10 and the container image to argoproj/rollouts-demo:yellow.
apiVersion: argoproj.io/v1alpha1
kind: Rollout
metadata:
  name: rollouts-demo
spec:
  replicas: 1
  strategy:
    canary:
      canaryService: rollouts-demo-canary
      stableService: rollouts-demo-stable
      trafficRouting:
        istio:
          virtualServices:
          - name: rollouts-demo-vsvc1
            routes:
            - primary
        managedRoutes:
        - name: test-route
      steps:
      - setCanaryScale:
          replicas: 1
      - setHeaderRoute:
          name: "test-route"
          match:
          - headerName: "X-Test"
            headerValue:
              exact: "Hello"
      - pause: {}
  revisionHistoryLimit: 2
  selector:
    matchLabels:
      app: rollouts-demo
  template:
    metadata:
      labels:
        app: rollouts-demo
        istio-injection: enabled
    spec:
      containers:
      - name: rollouts-demo
        image: argoproj/rollouts-demo:blue
        ports:
        - name: http
          containerPort: 8080
          protocol: TCP
        resources:
          requests:
            memory: 32Mi
            cpu: 5m
Once the Rollout is paused, check the VirtualService: the header match route has not been set.
apiVersion: networking.istio.io/v1beta1
kind: VirtualService
metadata:
  name: rollouts-demo-vsvc1
  namespace: default
spec:
  gateways:
  - rollouts-demo-gateway
  hosts:
  - rollouts-demo-vsvc1.local
  http:
  - name: primary
    route:
    - destination:
        host: rollouts-demo-stable
        port:
          number: 15372
      weight: 100
    - destination:
        host: rollouts-demo-canary
        port:
          number: 15372
      weight: 0
Expected behavior
The VirtualService should be updated with the following header match route (test-route) ahead of the primary route:
apiVersion: networking.istio.io/v1beta1
kind: VirtualService
metadata:
  name: rollouts-demo-vsvc1
spec:
  gateways:
  - rollouts-demo-gateway
  hosts:
  - rollouts-demo-vsvc1.local
  http:
  - match:
    - headers:
        X-Test:
          exact: Hello
    name: test-route
    route:
    - destination:
        host: rollouts-demo-canary
        port:
          number: 15372
      weight: 100
  - name: primary
    route:
    - destination:
        host: rollouts-demo-stable
        port:
          number: 15372
      weight: 100
    - destination:
        host: rollouts-demo-canary
        port:
          number: 15372
      weight: 0
Version
$ kubectl argo rollouts version
kubectl-argo-rollouts: v1.8.3+49fa151
BuildDate: 2025-06-04T22:19:21Z
GitCommit: 49fa1516cf71672b69e265267da4e1d16e1fe114
GitTreeState: clean
GoVersion: go1.23.9
Compiler: gc
Platform: darwin/amd64
Logs
Note that the step name being logged as (invalid) is a known issue, which was fixed in #4490.
time="2025-12-15T03:25:37Z" level=info msg="Found 1 TrafficRouting Reconcilers" namespace=default rollout=rollouts-demo
time="2025-12-15T03:25:37Z" level=info msg="Reconciling TrafficRouting with type 'Istio'" namespace=default rollout=rollouts-demo
time="2025-12-15T03:25:37Z" level=info msg="ReplicaSet 'rollouts-demo-76c569d8b' has 4 available replicas, waiting for 10" namespace=default rollout=rollouts-demo
time="2025-12-15T03:25:37Z" level=info msg="Event(v1.ObjectReference{Kind:\"Rollout\", Namespace:\"default\", Name:\"rollouts-demo\", UID:\"6e4e1920-dd0b-44cc-a135-1162bc04ac07\", APIVersion:\"argoproj.io/v1alpha1\", ResourceVersion:\"1765769137406719012\", FieldPath:\"\"}): type: 'Normal' reason: 'SwitchService' Switched selector for service 'rollouts-demo-canary' from '76c569d8b' to '5cf4fb69f8'"
time="2025-12-15T03:25:37Z" level=info msg="Rollout step 1/3 completed (setCanaryScale{replicas: 1})" event_reason=RolloutStepCompleted namespace=default rollout=rollouts-demo
time="2025-12-15T03:25:37Z" level=info msg="Event(v1.ObjectReference{Kind:\"Rollout\", Namespace:\"default\", Name:\"rollouts-demo\", UID:\"6e4e1920-dd0b-44cc-a135-1162bc04ac07\", APIVersion:\"argoproj.io/v1alpha1\", ResourceVersion:\"1765769137406719012\", FieldPath:\"\"}): type: 'Normal' reason: 'RolloutStepCompleted' Rollout step 1/3 completed (setCanaryScale{replicas: 1})"
time="2025-12-15T03:25:37Z" level=info msg="syncing service" namespace=default rollout=rollouts-demo service=rollouts-demo-canary
time="2025-12-15T03:25:37Z" level=info msg="Patched: {\"status\":{\"availableReplicas\":5,\"currentStepIndex\":1,\"readyReplicas\":5}}" generation=2 namespace=default resourceVersion=1765769137406719012 rollout=rollouts-demo
time="2025-12-15T03:25:37Z" level=info msg="persisted to informer" generation=2 namespace=default resourceVersion=1765769137515599012 rollout=rollouts-demo
time="2025-12-15T03:25:37Z" level=info msg="Reconciliation completed" generation=2 namespace=default resourceVersion=1765769137406719012 rollout=rollouts-demo time_ms=87.583723
time="2025-12-15T03:25:37Z" level=info msg="Started syncing rollout" generation=2 namespace=default resourceVersion=1765769137515599012 rollout=rollouts-demo
time="2025-12-15T03:25:37Z" level=info msg="Found 1 TrafficRouting Reconcilers" namespace=default rollout=rollouts-demo
time="2025-12-15T03:25:37Z" level=info msg="Reconciling TrafficRouting with type 'Istio'" namespace=default rollout=rollouts-demo
time="2025-12-15T03:25:37Z" level=info msg="ReplicaSet 'rollouts-demo-76c569d8b' has 4 available replicas, waiting for 10" namespace=default rollout=rollouts-demo
time="2025-12-15T03:25:37Z" level=info msg="Rollout step 2/3 completed (invalid)" event_reason=RolloutStepCompleted namespace=default rollout=rollouts-demo
time="2025-12-15T03:25:37Z" level=info msg="Event(v1.ObjectReference{Kind:\"Rollout\", Namespace:\"default\", Name:\"rollouts-demo\", UID:\"6e4e1920-dd0b-44cc-a135-1162bc04ac07\", APIVersion:\"argoproj.io/v1alpha1\", ResourceVersion:\"1765769137515599012\", FieldPath:\"\"}): type: 'Normal' reason: 'RolloutStepCompleted' Rollout step 2/3 completed (invalid)"
time="2025-12-15T03:25:37Z" level=info msg="Start processing" resource=default/rollouts-demo
time="2025-12-15T03:25:37Z" level=info msg="Processing completed" resource=default/rollouts-demo
time="2025-12-15T03:25:37Z" level=info msg="Patched: {\"status\":{\"currentStepIndex\":2}}" generation=2 namespace=default resourceVersion=1765769137515599012 rollout=rollouts-demo
time="2025-12-15T03:25:37Z" level=info msg="persisted to informer" generation=2 namespace=default resourceVersion=1765769137556735012 rollout=rollouts-demo
time="2025-12-15T03:25:37Z" level=info msg="Reconciliation completed" generation=2 namespace=default resourceVersion=1765769137515599012 rollout=rollouts-demo time_ms=36.272875000000006
time="2025-12-15T03:25:37Z" level=info msg="Started syncing rollout" generation=2 namespace=default resourceVersion=1765769137556735012 rollout=rollouts-demo
time="2025-12-15T03:25:37Z" level=info msg="Found 1 TrafficRouting Reconcilers" namespace=default rollout=rollouts-demo
time="2025-12-15T03:25:37Z" level=info msg="Reconciling TrafficRouting with type 'Istio'" namespace=default rollout=rollouts-demo
time="2025-12-15T03:25:37Z" level=info msg="ReplicaSet 'rollouts-demo-76c569d8b' has 4 available replicas, waiting for 10" namespace=default rollout=rollouts-demo
time="2025-12-15T03:25:37Z" level=info msg="Reconciling canary pause step (stepIndex: 2/3)" namespace=default rollout=rollouts-demo
time="2025-12-15T03:25:37Z" level=info msg="Not finished reconciling Canary Pause" namespace=default rollout=rollouts-demo
time="2025-12-15T03:25:37Z" level=info msg="Adding pause reason CanaryPauseStep with start time 2025-12-15T03:25:37Z" namespace=default rollout=rollouts-demo
time="2025-12-15T03:25:37Z" level=info msg="Start processing" resource=default/rollouts-demo
time="2025-12-15T03:25:37Z" level=info msg="Processing completed" resource=default/rollouts-demo
Message from the maintainers:
Impacted by this bug? Give it a 👍. We prioritize the issues with the most 👍.