
Prevent primary HPA collision for KEDA scaled objects when migrating from an HPA #1677

Open
jdgeisler wants to merge 1 commit into main from keda-scaled-object-hpa-migration

Conversation

jdgeisler

This PR fixes #1646, where Flagger fails to create the primary HPA for a KEDA ScaledObject when migrating from a regular HPA to a ScaledObject in a Canary.

Using the scaledobject.keda.sh/transfer-hpa-ownership: "true" annotation, a KEDA ScaledObject can take ownership of an already existing HPA. Currently, annotations are not copied over from the canary ScaledObject to the primary ScaledObject, so I am adding that functionality.
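
As a rough illustration, the change amounts to carrying the canary ScaledObject's annotations over onto the primary object's metadata. The snippet below is a minimal, hypothetical sketch of that logic; the helper name and import path are illustrative and not Flagger's exact source:

package canary

import (
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
)

// makePrimaryAnnotations copies every annotation from the canary
// ScaledObject's metadata (including
// scaledobject.keda.sh/transfer-hpa-ownership) so the primary
// ScaledObject keeps the same behaviour. The function name is
// hypothetical; Flagger's reconciler may structure this differently.
func makePrimaryAnnotations(canaryMeta metav1.ObjectMeta) map[string]string {
	annotations := make(map[string]string, len(canaryMeta.Annotations))
	for k, v := range canaryMeta.Annotations {
		annotations[k] = v
	}
	return annotations
}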

Additionally, we need to add the Advanced.HorizontalPodAutoscalerConfig.Name field to the ScaledObjectSpec so that we can specify the HPA name that the ScaledObject will take ownership of. When creating the primary ScaledObject from the canary, we use the same HPA name with -primary appended.
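
For reference, a minimal sketch of the new spec field and the primary HPA naming, assuming Flagger's vendored KEDA v1alpha1 types; the struct shape mirrors KEDA's ScaledObject API, but the function name below is illustrative only:

package v1alpha1

// AdvancedConfig mirrors KEDA's spec.advanced block; only the part
// relevant to this PR is shown.
type AdvancedConfig struct {
	HorizontalPodAutoscalerConfig *HorizontalPodAutoscalerConfig `json:"horizontalPodAutoscalerConfig,omitempty"`
}

// HorizontalPodAutoscalerConfig carries the name of the HPA that the
// ScaledObject should take ownership of (used together with the
// scaledobject.keda.sh/transfer-hpa-ownership annotation).
type HorizontalPodAutoscalerConfig struct {
	Name string `json:"name,omitempty"`
}

// primaryHPAName derives the primary HPA name by appending -primary,
// e.g. "fortio-server-deployment-2" -> "fortio-server-deployment-2-primary".
func primaryHPAName(canaryHPAName string) string {
	return canaryHPAName + "-primary"
}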

Testing

With the above changes, we can successfully use the transfer-hpa-ownership annotation and specify the HPA that the ScaledObject will take ownership of. This prevents KEDA from failing to create the new primary HPA when one already exists; instead, it adopts and modifies the existing one.

I applied the following diff to test in a Kubernetes cluster:

fortio, fortio-server-deployment-2, Canary (flagger.app) has changed:
  # Source: fortio/templates/fortio.yaml
  apiVersion: flagger.app/v1beta1
  kind: Canary
  metadata:
    name: fortio-server-deployment-2
    namespace: fortio
  spec:
    analysis:
      interval: 30s
      maxWeight: 50
      metrics:
      - interval: 30s
        name: request-success-rate
        thresholdRange:
          min: 99
      stepWeight: 10
      threshold: 10
      webhooks:
      - metadata:
          async: "on"
          c: "6"
          labels: fortio-server-deployment-2
          load: Start
          nocatchup: "on"
          qps: "12"
          save: "on"
          stdclient: "on"
          t: 600s
          uniform: "on"
          url: http://fortio-server-deployment-2-canary.fortio.svc.cluster.local:8080/
        name: generateLoad
        timeout: 60s
        type: pre-rollout
        url: http://fortio-client.fortio.svc.cluster.local:8080/fortio/rest/run?jsonPath=.metadata
    progressDeadlineSeconds: 600
    autoscalerRef:
-     apiVersion: autoscaling/v2
-     kind: HorizontalPodAutoscaler
+     apiVersion: keda.sh/v1alpha1
+     kind: ScaledObject
      name: fortio-server-deployment-2
    targetRef:
      apiVersion: apps/v1
      kind: Deployment
      name: fortio-server-deployment-2
    service:
      port: 8080
      targetPort: 8080
fortio, fortio-server-deployment-2, HorizontalPodAutoscaler (autoscaling) has been removed:
- # Source: fortio/templates/fortio.yaml
- apiVersion: autoscaling/v2
- kind: HorizontalPodAutoscaler
- metadata:
-   name: fortio-server-deployment-2
-   namespace: fortio
- spec:
-   maxReplicas: 4
-   metrics:
-   - resource:
-       name: cpu
-       target:
-         averageUtilization: 99
-         type: Utilization
-     type: Resource
-   minReplicas: 2
-   scaleTargetRef:
-     apiVersion: apps/v1
-     kind: Deployment
-     name: fortio-server-deployment-2
+
fortio, fortio-server-deployment-2, ScaledObject (keda.sh) has been added:
-
+ # Source: fortio/templates/fortio.yaml
+ apiVersion: keda.sh/v1alpha1
+ kind: ScaledObject
+ metadata:
+   name: fortio-server-deployment-2
+   namespace: fortio
+   annotations:
+     scaledobject.keda.sh/transfer-hpa-ownership: "true"
+ spec:
+   advanced:
+     horizontalPodAutoscalerConfig:
+       name: fortio-server-deployment-2
+   scaleTargetRef:
+     name: fortio-server-deployment-2
+   minReplicaCount: 2
+   maxReplicaCount: 4
+   triggers:
+   - type: prometheus
+     metadata:
+       serverAddress: http://kube-prometheus-stack-prometheus:9090/prometheus
+       threshold: '100'
+       query: sum(rate(istio_requests_total{destination_service_name="fortio-server-2"}[1m]))
+   - type: cpu
+     metricType: Utilization
+     metadata:
+       value: "99"

Observe that the primary ScaledObject successfully takes ownership of the primary HPA and no longer fails:

(⎈|arn:aws:eks:us-east-2:824163854805:cluster/rokumesh-auto-jgeisler-t7:fortio)jgeisler@PK4V290RH9 platform % k get so
NAME                                 SCALETARGETKIND      SCALETARGETNAME                      MIN   MAX   TRIGGERS     AUTHENTICATION   READY   ACTIVE    FALLBACK   PAUSED    AGE
fortio-server-deployment-2           apps/v1.Deployment   fortio-server-deployment-2           2     4     prometheus                    True    Unknown   False      True      170m
fortio-server-deployment-2-primary   apps/v1.Deployment   fortio-server-deployment-2-primary   2     4     prometheus                    True    False     False      Unknown   170m
(⎈|arn:aws:eks:us-east-2:824163854805:cluster/rokumesh-auto-jgeisler-t7:fortio)jgeisler@PK4V290RH9 platform % k get hpa
NAME                                 REFERENCE                                       TARGETS               MINPODS   MAXPODS   REPLICAS   AGE
fortio-server-deployment-2-primary   Deployment/fortio-server-deployment-2-primary   0/100 (avg), 4%/99%   2         4         2          13d

@jdgeisler jdgeisler force-pushed the keda-scaled-object-hpa-migration branch 2 times, most recently from b446861 to c8859f4 Compare July 10, 2024 15:26
@jdgeisler jdgeisler force-pushed the keda-scaled-object-hpa-migration branch from c8859f4 to 5965983 Compare August 13, 2024 14:16
@jdgeisler (Author)

Hey @stefanprodan and @aryan9600, looking for some feedback here on this PR and/or the issue it fixes #1646

@aryan9600 (Member) left a comment

thanks for raising this PR! i did a quick first pass. will review it once more in the coming days.

pkg/canary/scaled_object_reconciler.go: two review threads (outdated, resolved)
@MohammedShetaya

Thanks for the fix @jdgeisler

@codecov-commenter

Codecov Report

Attention: Patch coverage is 80.00000% with 6 lines in your changes missing coverage. Please review.

Project coverage is 34.80%. Comparing base (61d81ff) to head (c162e30).
Report is 30 commits behind head on main.

Files with missing lines Patch % Lines
pkg/canary/scaled_object_reconciler.go 80.00% 4 Missing and 2 partials ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #1677      +/-   ##
==========================================
+ Coverage   34.10%   34.80%   +0.70%     
==========================================
  Files         282      283       +1     
  Lines       20556    24646    +4090     
==========================================
+ Hits         7010     8579    +1569     
- Misses      12616    15128    +2512     
- Partials      930      939       +9     


@aryan9600 (Member) left a comment

lgtm! 🎖️
could you squash the two commits into one with an appropriate message and then rebase on top of main?

@jdgeisler jdgeisler force-pushed the keda-scaled-object-hpa-migration branch from c162e30 to 7104b35 Compare October 28, 2024 14:24
@jdgeisler jdgeisler force-pushed the keda-scaled-object-hpa-migration branch from 7104b35 to 4aae393 Compare October 28, 2024 14:27
@jdgeisler (Author)

> lgtm! 🎖️ could you squash the two commits into one with an appropriate message and then rebase on top of main?

Done, thanks!

Development

Successfully merging this pull request may close these issues.

KEDA Scaled Object Primary creation failed due to workload already managed by the hpa
4 participants