ResourceBinding claims FullyApplied when not healthy #5867

Closed
a7i opened this issue Nov 24, 2024 · 9 comments
Labels
kind/bug Categorizes issue or PR as related to a bug.

Comments

@a7i
Contributor

a7i commented Nov 24, 2024

What happened:
The ResourceBinding claims FullyApplied even when the work status is unhealthy.

What you expected to happen:
ResourceBinding to claim FullyApplied ONLY when the work status is considered healthy.

How to reproduce it (as minimally and precisely as possible):

I have a Health Interpreter that looks for a condition of type 'ChangeApplied' with status "True":

apiVersion: config.karmada.io/v1alpha1
kind: ResourceInterpreterCustomization
metadata:
  name: service-access
spec:
  target:
    apiVersion: service.access.com/v1alpha1
    kind: ServiceAccess
  customizations:
    healthInterpretation:
      luaScript: |
        function isChangeApplied(condition)
          return condition.type == 'ChangeApplied' and condition.status == 'True'
        end

        function InterpretHealth(observedObj)
          if observedObj.status ~= nil and observedObj.status.conditions ~= nil then
            for conditionIndex = 1, #observedObj.status.conditions do
              if isChangeApplied(observedObj.status.conditions[conditionIndex]) then
                return true
              end
            end
          end
          return false
        end
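
For readers less familiar with Lua, the check above can be restated as a minimal Go sketch. The `Condition` type and `interpretHealth` function are hypothetical names used only for illustration; they are not part of Karmada's interpreter API.

```go
package main

import "fmt"

// Condition is a simplified, hypothetical stand-in for one entry of
// status.conditions on the ServiceAccess resource.
type Condition struct {
	Type   string
	Status string
}

// interpretHealth mirrors the Lua InterpretHealth function: the object is
// healthy only if some condition has type ChangeApplied and status True.
// A nil or empty slice (no status or no conditions) yields false.
func interpretHealth(conditions []Condition) bool {
	for _, c := range conditions {
		if c.Type == "ChangeApplied" && c.Status == "True" {
			return true
		}
	}
	return false
}

func main() {
	// No conditions reported yet: not healthy.
	fmt.Println(interpretHealth(nil)) // false

	// ChangeApplied is True: healthy.
	done := []Condition{{Type: "ChangeApplied", Status: "True"}}
	fmt.Println(interpretHealth(done)) // true
}
```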

It appears that the ResourceBinding is marked as fully applied, even though the
resource status is not considered healthy (per my customized interpreter).
Note .status.conditions[1]:

apiVersion: work.karmada.io/v1alpha2
kind: ResourceBinding
metadata:
  annotations:
    policy.karmada.io/applied-placement: '{"clusterAffinity":{"clusterNames":["member-01"]},"clusterTolerations":[{"key":"cluster.karmada.io/not-ready","operator":"Exists","effect":"NoExecute","tolerationSeconds":300},{"key":"cluster.karmada.io/unreachable","operator":"Exists","effect":"NoExecute","tolerationSeconds":300}]}'
    propagationpolicy.karmada.io/name: karmada-test
    propagationpolicy.karmada.io/namespace: karmada-test
  creationTimestamp: "2024-11-20T05:06:11Z"
  finalizers:
  - karmada.io/binding-controller
  generation: 3
  labels:
    propagationpolicy.karmada.io/permanent-id: e82039ca-c99c-4770-99c4-32696654c113
    resourcebinding.karmada.io/permanent-id: b5667b85-1933-4b9c-a3c2-71355825f6ff
  name: karmada-test-gbtrr-serviceaccess
  namespace: karmada-test
  ownerReferences:
  - apiVersion: service.access.com/v1alpha1
    blockOwnerDeletion: true
    controller: true
    kind: ServiceAccess
    name: karmada-test-gbtrr
    uid: d93cfd13-35ab-4951-92f5-690bbaa636ce
  resourceVersion: "7179628430"
  uid: 1f68051b-32b0-4888-b635-d52dae3ab115
spec:
  clusters:
  - name: member-01
  conflictResolution: Abort
  placement:
    clusterAffinity:
      clusterNames:
      - member-01
    clusterTolerations:
    - effect: NoExecute
      key: cluster.karmada.io/not-ready
      operator: Exists
      tolerationSeconds: 300
    - effect: NoExecute
      key: cluster.karmada.io/unreachable
      operator: Exists
      tolerationSeconds: 300
  resource:
    apiVersion: service.access.com/v1alpha1
    kind: ServiceAccess
    name: karmada-test-gbtrr
    namespace: karmada-test
    resourceVersion: "7179627852"
    uid: d93cfd13-35ab-4951-92f5-690bbaa636ce
  schedulerName: default-scheduler
status:
  aggregatedStatus:
  - applied: true
    clusterName: member-01
    health: Unhealthy
    status:
      legacy:
        legacycrname: sac-legacy-staging-member-01-karmada-test-e7fd0cfb
        policyarns: []
        reasoncode: LegacyErrorInternal
        status: Running
      retriesremaining: 3
      servicerole:
        reasoncode: ServiceRoleCompleted
        rolearn: arn:aws:iam::1111111111111111:role/self-serve/staging-member-01-karmada-test-e7fd0cfb
        rolecrname: sac-role-staging-member-01-karmada-test-e7fd0cfb
        status: Succeeded
  conditions:
  - lastTransitionTime: "2024-11-20T05:06:11Z"
    message: Binding has been scheduled successfully.
    reason: Success
    status: "True"
    type: Scheduled
  - lastTransitionTime: "2024-11-20T05:06:11Z"
    message: All works have been successfully applied
    reason: FullyAppliedSuccess
    status: "True"  # <----- This is incorrect: note that there is no 'ChangeApplied' condition in aggregatedStatus[0].status.conditions
    type: FullyApplied
  lastScheduledTime: "2024-11-20T05:06:11Z"
  schedulerObservedGeneration: 3

The manifest below is the final state, after the operator has caught up. This state is correct, but as shown above, the binding claimed FullyApplied earlier, when it shouldn't have.

apiVersion: work.karmada.io/v1alpha2
kind: ResourceBinding
metadata:
  annotations:
    policy.karmada.io/applied-placement: '{"clusterAffinity":{"clusterNames":["member-01"]},"clusterTolerations":[{"key":"cluster.karmada.io/not-ready","operator":"Exists","effect":"NoExecute","tolerationSeconds":300},{"key":"cluster.karmada.io/unreachable","operator":"Exists","effect":"NoExecute","tolerationSeconds":300}]}'
    propagationpolicy.karmada.io/name: karmada-test
    propagationpolicy.karmada.io/namespace: karmada-test
  creationTimestamp: "2024-11-20T05:06:11Z"
  finalizers:
  - karmada.io/binding-controller
  generation: 3
  labels:
    propagationpolicy.karmada.io/permanent-id: e82039ca-c99c-4770-99c4-32696654c113
    resourcebinding.karmada.io/permanent-id: b5667b85-1933-4b9c-a3c2-71355825f6ff
  name: karmada-test-gbtrr-serviceaccess
  namespace: karmada-test
  ownerReferences:
  - apiVersion: service.access.com/v1alpha1
    blockOwnerDeletion: true
    controller: true
    kind: ServiceAccess
    name: karmada-test-gbtrr
    uid: d93cfd13-35ab-4951-92f5-690bbaa636ce
  resourceVersion: "7179628432"
  uid: 1f68051b-32b0-4888-b635-d52dae3ab115
spec:
  clusters:
  - name: member-01
  conflictResolution: Abort
  placement:
    clusterAffinity:
      clusterNames:
      - member-01
    clusterTolerations:
    - effect: NoExecute
      key: cluster.karmada.io/not-ready
      operator: Exists
      tolerationSeconds: 300
    - effect: NoExecute
      key: cluster.karmada.io/unreachable
      operator: Exists
      tolerationSeconds: 300
  resource:
    apiVersion: service.access.com/v1alpha1
    kind: ServiceAccess
    name: karmada-test-gbtrr
    namespace: karmada-test
    resourceVersion: "7179627852"
    uid: d93cfd13-35ab-4951-92f5-690bbaa636ce
  schedulerName: default-scheduler
status:
  aggregatedStatus:
  - applied: true
    clusterName: member-01
    health: Healthy
    status:
      conditions:
      - changeID: 4c66758b-ae35-4894-90a4-162dc266fdd7
        deployID: deploy:01JD3YD35ZS12JFFSPAQF05K2X
        lastTransitionTime: "2024-11-20T05:06:21Z"
        message: Service Access has completed
        reason: ServiceAccessCompleted
        status: "True"
        type: ChangeApplied # <----- This is the condition that the interpreter is looking for
      legacy:
        legacycrname: sac-legacy-staging-member-01-karmada-test-e7fd0cfb
        policyarns:
        - arn:aws:iam::1111111111111111:policy/self-serve/zao-member-01-karmada-test-policy-a8f0e15
        reasoncode: LegacyCompleted
        status: Succeeded
      outputs:
      - output: arn:aws:iam::1111111111111111:role/self-serve/staging-member-01-karmada-test-e7fd0cfb
        type: ResourceARN
      - output: arn:aws:iam::1111111111111111:policy/self-serve/zao-member-01-karmada-test-policy-a8f0e15
        type: PolicyARN
      retriesremaining: 5
      servicerole:
        reasoncode: ServiceRoleCompleted
        rolearn: arn:aws:iam::1111111111111111:role/self-serve/staging-member-01-karmada-test-e7fd0cfb
        rolecrname: sac-role-staging-member-01-karmada-test-e7fd0cfb
        status: Succeeded
  conditions:
  - lastTransitionTime: "2024-11-20T05:06:11Z"
    message: Binding has been scheduled successfully.
    reason: Success
    status: "True"
    type: Scheduled
  - lastTransitionTime: "2024-11-20T05:06:11Z"
    message: All works have been successfully applied
    reason: FullyAppliedSuccess
    status: "True"
    type: FullyApplied
  lastScheduledTime: "2024-11-20T05:06:11Z"
  schedulerObservedGeneration: 3

Anything else we need to know?:

Environment: Kubernetes 1.29

  • Karmada version: 1.12.alpha1
  • kubectl-karmada or karmadactl version (the result of kubectl-karmada version or karmadactl version):
  • Others:
@XiShanYongYe-Chang
Member

Hi @a7i long time no see :)

I think I understand what you're saying, and I'm wondering if we could add a condition to express the state you want, like HealthRunning (one of my random ideas), so that FullyApplied still means that the resource was distributed successfully. That way, the state we describe would be richer.

@a7i
Contributor Author

a7i commented Nov 25, 2024

> Hi @a7i long time no see :)
>
> I think I understand what you're saying, and I'm wondering if we could add a condition to express the state you want, like HealthRunning (one of my random ideas), so that FullyApplied still means that the resource was distributed successfully. That way, the state we describe would be richer.

Hi @XiShanYongYe-Chang, then I misunderstood the intent of FullyApplied. I think we can just query the status.aggregatedStatus[*].health field along with FullyApplied.
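
A minimal Go sketch of that combined check might look like the following. The struct types are simplified stand-ins that mirror the field names in the YAML above, not the real Karmada API types, and `appliedAndHealthy` is a hypothetical helper name. Treating an empty aggregatedStatus as "not healthy" is an assumption.

```go
package main

import "fmt"

// Simplified stand-ins for the ResourceBinding status fields shown in this
// issue. These are NOT the real Karmada API types.
type AggregatedStatusItem struct {
	ClusterName string
	Applied     bool
	Health      string // e.g. "Healthy" or "Unhealthy", as in the YAML above
}

type Condition struct {
	Type   string
	Status string
}

type BindingStatus struct {
	AggregatedStatus []AggregatedStatusItem
	Conditions       []Condition
}

// appliedAndHealthy reports whether the binding has FullyApplied=True and
// every aggregated status item is Healthy.
// Assumption: a binding with no aggregated status items is not healthy.
func appliedAndHealthy(s BindingStatus) bool {
	fullyApplied := false
	for _, c := range s.Conditions {
		if c.Type == "FullyApplied" && c.Status == "True" {
			fullyApplied = true
			break
		}
	}
	if !fullyApplied || len(s.AggregatedStatus) == 0 {
		return false
	}
	for _, item := range s.AggregatedStatus {
		if item.Health != "Healthy" {
			return false
		}
	}
	return true
}

func main() {
	conds := []Condition{
		{Type: "Scheduled", Status: "True"},
		{Type: "FullyApplied", Status: "True"},
	}
	// First manifest in this issue: applied, but member-01 is Unhealthy.
	early := BindingStatus{
		AggregatedStatus: []AggregatedStatusItem{
			{ClusterName: "member-01", Applied: true, Health: "Unhealthy"},
		},
		Conditions: conds,
	}
	// Final manifest: applied and member-01 is Healthy.
	final := BindingStatus{
		AggregatedStatus: []AggregatedStatusItem{
			{ClusterName: "member-01", Applied: true, Health: "Healthy"},
		},
		Conditions: conds,
	}
	fmt.Println(appliedAndHealthy(early)) // false
	fmt.Println(appliedAndHealthy(final)) // true
}
```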

@XiShanYongYe-Chang
Member

> I think we can just query the status.aggregatedStatus[*].health field along with FullyApplied

Is that convenient? I mean, there's a layer of calculation to go through when querying. Would it be better to have a condition to describe this?

@a7i
Contributor Author

a7i commented Nov 26, 2024

Should be fine for us. If you think that the community can benefit from such conditions, I'm happy to contribute something.

I would assume that we need 3 condition reasons:

  • FullyHealthy (all work status items are healthy)
  • PartiallyHealthy (at least one work status item is healthy and at least one is unhealthy)
  • Unhealthy (all work status items are unhealthy)
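
As a rough sketch of how those three reasons could be derived from the per-cluster health values, assuming they arrive as plain strings (the constants and the `healthReason` function are hypothetical and only reflect the proposal in this thread, not an existing Karmada API; the partially-healthy reason is spelled `PartiallyHealthy` here):

```go
package main

import "fmt"

// Hypothetical reason names based on the list above.
const (
	ReasonFullyHealthy     = "FullyHealthy"
	ReasonPartiallyHealthy = "PartiallyHealthy"
	ReasonUnhealthy        = "Unhealthy"
)

// healthReason maps per-cluster health values to one of the three reasons.
// Assumptions: any value other than "Healthy" counts as unhealthy, and an
// empty list is reported as Unhealthy.
func healthReason(healths []string) string {
	healthy := 0
	for _, h := range healths {
		if h == "Healthy" {
			healthy++
		}
	}
	switch {
	case len(healths) > 0 && healthy == len(healths):
		return ReasonFullyHealthy
	case healthy > 0:
		return ReasonPartiallyHealthy
	default:
		return ReasonUnhealthy
	}
}

func main() {
	fmt.Println(healthReason([]string{"Healthy", "Healthy"}))   // FullyHealthy
	fmt.Println(healthReason([]string{"Healthy", "Unhealthy"})) // PartiallyHealthy
	fmt.Println(healthReason([]string{"Unhealthy"}))            // Unhealthy
}
```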

@XiShanYongYe-Chang
Member

Let's go at your pace, and if you need it, let's add it.

@XiShanYongYe-Chang
Member

Hi @a7i, How do you use Karmada now? Is it used in production?
By the way, have you submitted a topic for KubeCon EU 2025?

@XiShanYongYe-Chang
Member

Can we close this issue @a7i ?

@a7i
Contributor Author

a7i commented Dec 5, 2024

> Hi @a7i, How do you use Karmada now? Is it used in production?

We're still in staging at the moment, but plan on going to production next year.

> By the way, have you submitted a topic for KubeCon EU 2025?

I have not; I doubt my company will cover any expenses to send me to the EU :)

> Can we close this issue @a7i ?

Yes, thank you for your support on this! 🙌🏼

@a7i a7i closed this as completed Dec 5, 2024
@XiShanYongYe-Chang
Member

Thanks for your reply @a7i :)
We look forward to you using Karmada in production.

2 participants