setHeaderRoute is skipped unexpectedly if Stable ReplicaSet is not ready #4561

@yfuruyama

Checklist:

  • I've included steps to reproduce the bug.
  • I've included the version of argo rollouts.

Describe the bug

When the stable ReplicaSet is not ready (e.g. spec.replicas was just updated), the setHeaderRoute step is unexpectedly skipped and marked as completed without modifying the VirtualService.

I believe this is because the guardrail function checkReplicasAvailable (introduced in #3878) returns false while the stable ReplicaSet is not ready, which makes reconcileTrafficRouting() return early, before SetHeaderRoute is called:

// check if the stable RS has enough pods before recalculating the new
// weight status.
if !c.checkReplicasAvailable(c.stableRS, weightutil.MaxTrafficWeight(c.rollout)-desiredWeight) {
	return nil
}
// We need to check for revision > 1 because when we first install the rollout we run step 0 this prevents that.
// There is a bigger fix needed for the reasons on why we run step 0 on rollout install, that needs to be explored.
revision, revisionFound := annotations.GetRevisionAnnotation(c.rollout)
if currentStep != nil && (revisionFound && revision > 1) {
	if currentStep.SetHeaderRoute != nil {
		if err = reconciler.SetHeaderRoute(currentStep.SetHeaderRoute); err != nil {
			return err
		}
	}
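
To make the suspected ordering problem concrete, here is a minimal standalone sketch. It is not the controller code: fakeReconciler, step, reconcileCurrent and reconcileReordered are hypothetical names I made up for illustration, and the reordered variant is only one possible direction, not a proposed patch.

```go
package main

import "fmt"

// fakeReconciler is a hypothetical stand-in for the Istio traffic routing
// reconciler. It only records whether a header route was written.
type fakeReconciler struct {
	headerRouteSet bool
}

func (r *fakeReconciler) SetHeaderRoute(name string) error {
	r.headerRouteSet = true
	return nil
}

// step is a simplified canary step; a non-empty headerRoute means setHeaderRoute.
type step struct {
	headerRoute string
}

// reconcileCurrent mirrors the ordering in the excerpt above:
// the replica guard runs first, the header route second.
func reconcileCurrent(r *fakeReconciler, s step, stableReady bool) error {
	if !stableReady {
		return nil // early return: SetHeaderRoute is never reached
	}
	if s.headerRoute != "" {
		return r.SetHeaderRoute(s.headerRoute)
	}
	return nil
}

// reconcileReordered is one hypothetical alternative: apply the header
// route first and keep only the weight recalculation behind the guard.
func reconcileReordered(r *fakeReconciler, s step, stableReady bool) error {
	if s.headerRoute != "" {
		if err := r.SetHeaderRoute(s.headerRoute); err != nil {
			return err
		}
	}
	if !stableReady {
		return nil // weight recalculation stays gated on the stable RS
	}
	// weight recalculation would go here
	return nil
}

func main() {
	s := step{headerRoute: "test-route"}
	cur, alt := &fakeReconciler{}, &fakeReconciler{}
	_ = reconcileCurrent(cur, s, false)
	_ = reconcileReordered(alt, s, false)
	fmt.Println("current ordering set header route:", cur.headerRouteSet)
	fmt.Println("reordered variant set header route:", alt.headerRouteSet)
}
```

With stableReady set to false, the first variant returns without error and without touching the route, which matches the step being marked as completed while the VirtualService stays unchanged.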

To Reproduce

First apply the following Rollout, then update spec.replicas to 10 and the container image to argoproj/rollouts-demo:yellow.

apiVersion: argoproj.io/v1alpha1
kind: Rollout
metadata:
  name: rollouts-demo
spec:
  replicas: 1
  strategy:
    canary:
      canaryService: rollouts-demo-canary
      stableService: rollouts-demo-stable
      trafficRouting:
        istio:
          virtualServices:
          - name: rollouts-demo-vsvc1
            routes:
            - primary
        managedRoutes:
          - name: test-route
      steps:
      - setCanaryScale:
          replicas: 1
      - setHeaderRoute:
          name: "test-route"
          match:
          - headerName: "X-Test"
            headerValue:
              exact: "Hello"
      - pause: {}
  revisionHistoryLimit: 2
  selector:
    matchLabels:
      app: rollouts-demo
  template:
    metadata:
      labels:
        app: rollouts-demo
        istio-injection: enabled
    spec:
      containers:
      - name: rollouts-demo
        image: argoproj/rollouts-demo:blue
        ports:
        - name: http
          containerPort: 8080
          protocol: TCP
        resources:
          requests:
            memory: 32Mi
            cpu: 5m

Once the rollout is paused, check the VirtualService. The header match route is not set:

apiVersion: networking.istio.io/v1beta1
kind: VirtualService
metadata:
  name: rollouts-demo-vsvc1
  namespace: default
spec:
  gateways:
  - rollouts-demo-gateway
  hosts:
  - rollouts-demo-vsvc1.local
  http:
  - name: primary
    route:
    - destination:
        host: rollouts-demo-stable
        port:
          number: 15372
      weight: 100
    - destination:
        host: rollouts-demo-canary
        port:
          number: 15372
      weight: 0

Expected behavior

VirtualService should be updated with the following header match route:

apiVersion: networking.istio.io/v1beta1
kind: VirtualService
metadata:
  name: rollouts-demo-vsvc1
spec:
  gateways:
  - rollouts-demo-gateway
  hosts:
  - rollouts-demo-vsvc1.local
  http:
  - match:
    - headers:
        X-Test:
          exact: Hello
    name: test-route
    route:
    - destination:
        host: rollouts-demo-canary
        port:
          number: 15372
      weight: 100
  - name: primary
    route:
    - destination:
        host: rollouts-demo-stable
        port:
          number: 15372
      weight: 100
    - destination:
        host: rollouts-demo-canary
        port:
          number: 15372
      weight: 0

Version

$ kubectl argo rollouts version
kubectl-argo-rollouts: v1.8.3+49fa151
  BuildDate: 2025-06-04T22:19:21Z
  GitCommit: 49fa1516cf71672b69e265267da4e1d16e1fe114
  GitTreeState: clean
  GoVersion: go1.23.9
  Compiler: gc
  Platform: darwin/amd64

Logs

Note that the step name shown as (invalid) was a known issue, which was fixed in #4490.

time="2025-12-15T03:25:37Z" level=info msg="Found 1 TrafficRouting Reconcilers" namespace=default rollout=rollouts-demo
time="2025-12-15T03:25:37Z" level=info msg="Reconciling TrafficRouting with type 'Istio'" namespace=default rollout=rollouts-demo
time="2025-12-15T03:25:37Z" level=info msg="ReplicaSet 'rollouts-demo-76c569d8b' has 4 available replicas, waiting for 10" namespace=default rollout=rollouts-demo
time="2025-12-15T03:25:37Z" level=info msg="Event(v1.ObjectReference{Kind:\"Rollout\", Namespace:\"default\", Name:\"rollouts-demo\", UID:\"6e4e1920-dd0b-44cc-a135-1162bc04ac07\", APIVersion:\"argoproj.io/v1alpha1\", ResourceVersion:\"1765769137406719012\", FieldPath:\"\"}): type: 'Normal' reason: 'SwitchService' Switched selector for service 'rollouts-demo-canary' from '76c569d8b' to '5cf4fb69f8'"
time="2025-12-15T03:25:37Z" level=info msg="Rollout step 1/3 completed (setCanaryScale{replicas: 1})" event_reason=RolloutStepCompleted namespace=default rollout=rollouts-demo
time="2025-12-15T03:25:37Z" level=info msg="Event(v1.ObjectReference{Kind:\"Rollout\", Namespace:\"default\", Name:\"rollouts-demo\", UID:\"6e4e1920-dd0b-44cc-a135-1162bc04ac07\", APIVersion:\"argoproj.io/v1alpha1\", ResourceVersion:\"1765769137406719012\", FieldPath:\"\"}): type: 'Normal' reason: 'RolloutStepCompleted' Rollout step 1/3 completed (setCanaryScale{replicas: 1})"
time="2025-12-15T03:25:37Z" level=info msg="syncing service" namespace=default rollout=rollouts-demo service=rollouts-demo-canary
time="2025-12-15T03:25:37Z" level=info msg="Patched: {\"status\":{\"availableReplicas\":5,\"currentStepIndex\":1,\"readyReplicas\":5}}" generation=2 namespace=default resourceVersion=1765769137406719012 rollout=rollouts-demo
time="2025-12-15T03:25:37Z" level=info msg="persisted to informer" generation=2 namespace=default resourceVersion=1765769137515599012 rollout=rollouts-demo
time="2025-12-15T03:25:37Z" level=info msg="Reconciliation completed" generation=2 namespace=default resourceVersion=1765769137406719012 rollout=rollouts-demo time_ms=87.583723
time="2025-12-15T03:25:37Z" level=info msg="Started syncing rollout" generation=2 namespace=default resourceVersion=1765769137515599012 rollout=rollouts-demo
time="2025-12-15T03:25:37Z" level=info msg="Found 1 TrafficRouting Reconcilers" namespace=default rollout=rollouts-demo
time="2025-12-15T03:25:37Z" level=info msg="Reconciling TrafficRouting with type 'Istio'" namespace=default rollout=rollouts-demo
time="2025-12-15T03:25:37Z" level=info msg="ReplicaSet 'rollouts-demo-76c569d8b' has 4 available replicas, waiting for 10" namespace=default rollout=rollouts-demo
time="2025-12-15T03:25:37Z" level=info msg="Rollout step 2/3 completed (invalid)" event_reason=RolloutStepCompleted namespace=default rollout=rollouts-demo
time="2025-12-15T03:25:37Z" level=info msg="Event(v1.ObjectReference{Kind:\"Rollout\", Namespace:\"default\", Name:\"rollouts-demo\", UID:\"6e4e1920-dd0b-44cc-a135-1162bc04ac07\", APIVersion:\"argoproj.io/v1alpha1\", ResourceVersion:\"1765769137515599012\", FieldPath:\"\"}): type: 'Normal' reason: 'RolloutStepCompleted' Rollout step 2/3 completed (invalid)"
time="2025-12-15T03:25:37Z" level=info msg="Start processing" resource=default/rollouts-demo
time="2025-12-15T03:25:37Z" level=info msg="Processing completed" resource=default/rollouts-demo
time="2025-12-15T03:25:37Z" level=info msg="Patched: {\"status\":{\"currentStepIndex\":2}}" generation=2 namespace=default resourceVersion=1765769137515599012 rollout=rollouts-demo
time="2025-12-15T03:25:37Z" level=info msg="persisted to informer" generation=2 namespace=default resourceVersion=1765769137556735012 rollout=rollouts-demo
time="2025-12-15T03:25:37Z" level=info msg="Reconciliation completed" generation=2 namespace=default resourceVersion=1765769137515599012 rollout=rollouts-demo time_ms=36.272875000000006
time="2025-12-15T03:25:37Z" level=info msg="Started syncing rollout" generation=2 namespace=default resourceVersion=1765769137556735012 rollout=rollouts-demo
time="2025-12-15T03:25:37Z" level=info msg="Found 1 TrafficRouting Reconcilers" namespace=default rollout=rollouts-demo
time="2025-12-15T03:25:37Z" level=info msg="Reconciling TrafficRouting with type 'Istio'" namespace=default rollout=rollouts-demo
time="2025-12-15T03:25:37Z" level=info msg="ReplicaSet 'rollouts-demo-76c569d8b' has 4 available replicas, waiting for 10" namespace=default rollout=rollouts-demo
time="2025-12-15T03:25:37Z" level=info msg="Reconciling canary pause step (stepIndex: 2/3)" namespace=default rollout=rollouts-demo
time="2025-12-15T03:25:37Z" level=info msg="Not finished reconciling Canary Pause" namespace=default rollout=rollouts-demo
time="2025-12-15T03:25:37Z" level=info msg="Adding pause reason CanaryPauseStep with start time 2025-12-15T03:25:37Z" namespace=default rollout=rollouts-demo
time="2025-12-15T03:25:37Z" level=info msg="Start processing" resource=default/rollouts-demo
time="2025-12-15T03:25:37Z" level=info msg="Processing completed" resource=default/rollouts-demo
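
The repeated "has 4 available replicas, waiting for 10" line is consistent with the guard requiring the stable ReplicaSet to cover its full share of traffic. The sketch below is my assumption about the arithmetic, not the actual checkReplicasAvailable implementation: requiredReplicas, replicasAvailable and the ceiling-division formula are hypothetical, and the max weight of 100 is assumed to be the default.

```go
package main

import "fmt"

// maxTrafficWeight is assumed to be the default maximum traffic weight.
const maxTrafficWeight = 100

// requiredReplicas is a hypothetical helper: the number of replicas needed
// to serve the given weight, computed as ceil(specReplicas * weight / max).
func requiredReplicas(specReplicas, weight int32) int32 {
	return (specReplicas*weight + maxTrafficWeight - 1) / maxTrafficWeight
}

// replicasAvailable is a hypothetical version of the guard: it passes only
// when the available replicas cover the required count.
func replicasAvailable(available, specReplicas, weight int32) bool {
	return available >= requiredReplicas(specReplicas, weight)
}

func main() {
	// No setWeight step has run yet, so the desired canary weight is 0
	// and the stable ReplicaSet is expected to carry all of the traffic.
	desiredWeight := int32(0)
	stableWeight := maxTrafficWeight - desiredWeight

	// Values from the logs: 4 available replicas, spec.replicas = 10.
	fmt.Println("required:", requiredReplicas(10, stableWeight))
	fmt.Println("guard passes:", replicasAvailable(4, 10, stableWeight))
}
```

Under these assumptions the guard keeps failing until all 10 stable replicas are available, so every reconciliation in that window returns before the setHeaderRoute step can be applied.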

Message from the maintainers:

Impacted by this bug? Give it a 👍. We prioritize the issues with the most 👍.

Labels: bug
