Merge pull request #1197 from mrlihanbo/failover-policy
add docs: failover policy overview
Showing 2 changed files with 208 additions and 0 deletions.
# Failover Overview

## Monitor the cluster health status

Karmada supports both `Push` and `Pull` modes to manage member clusters.

For more details about cluster registration, please refer to [Cluster Registration](./cluster-registration.md#cluster-registration).

### Determining failures

For clusters there are two forms of heartbeats:
- updates to the `.status` of a Cluster.
- `Lease` objects within the `karmada-cluster` namespace in the karmada control plane. Each cluster has an associated `Lease` object (see the example below).

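For illustration, a cluster's `Lease` object in the `karmada-cluster` namespace looks roughly like the following; the name, duration, and timestamp here are assumptions rather than values taken from a real deployment:
```
apiVersion: coordination.k8s.io/v1
kind: Lease
metadata:
  # Assumed: the Lease is conventionally named after the member cluster.
  name: member1
  namespace: karmada-cluster
spec:
  holderIdentity: member1
  # Assumed values; the effective duration and renew cadence depend on the
  # --cluster-lease-duration and --cluster-lease-renew-interval-fraction flags.
  leaseDurationSeconds: 40
  renewTime: "2021-12-31T03:36:00.000000Z"
```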
#### Cluster status collection

For `Push` mode clusters, the cluster status controller in the karmada control plane continually collects the cluster's status at a configured interval.

For `Pull` mode clusters, the `karmada-agent` is responsible for creating and updating the `.status` of its cluster at a configured interval.

The interval for `.status` updates to a `Cluster` can be configured via the `--cluster-status-update-frequency` flag (default: 10 seconds).

A cluster may be set to the `NotReady` state under the following conditions:
- the cluster is unreachable (retried 4 times within 2 seconds).
- the cluster's health endpoint did not respond with ok.
- collecting the cluster status failed, including the Kubernetes version, installed APIs, resource usage, etc.

#### Lease updates
Karmada creates a `Lease` object and a lease controller for each cluster when the cluster joins.

Each lease controller is responsible for renewing its related `Lease`. The lease renewal interval can be configured via the `--cluster-lease-duration` and `--cluster-lease-renew-interval-fraction` flags (default: 10 seconds).

The lease update process is independent of the cluster status update process, since the cluster's `.status` field is maintained by the cluster status controller.

The cluster controller in the Karmada control plane checks the state of each cluster every `--cluster-monitor-period` (default: 5 seconds).

The cluster's `Ready` condition is changed to `Unknown` when the cluster controller has not heard from the cluster within the last `--cluster-monitor-grace-period` (default: 40 seconds).

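As a minimal sketch of where the flags above live, they are passed as command-line arguments to the Karmada components (status collection happens in the Karmada control plane for `Push` clusters and in `karmada-agent` for `Pull` clusters, as described above). The container layout, binary path, and lease-related values below are assumptions; the remaining values simply restate the documented defaults:
```
# Illustrative args only; which component consumes which flag depends on
# your deployment (karmada-controller-manager vs. karmada-agent).
containers:
- name: karmada-controller-manager
  command:
  - /bin/karmada-controller-manager               # assumed binary path
  - --cluster-status-update-frequency=10s         # default mentioned above
  - --cluster-lease-duration=40s                  # assumed
  - --cluster-lease-renew-interval-fraction=0.25  # assumed; 40s * 0.25 = 10s renew interval
  - --cluster-monitor-period=5s                   # default mentioned above
  - --cluster-monitor-grace-period=40s            # default mentioned above
```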
### Check cluster status
You can use `kubectl` to check a Cluster's status and other details:
```
kubectl describe cluster <cluster-name>
```

The `Ready` condition in the `Status` field indicates that the cluster is healthy and ready to accept workloads.
It will be set to `False` if the cluster is unhealthy and is not accepting workloads, and to `Unknown` if the cluster controller has not heard from the cluster in the last `cluster-monitor-grace-period`.

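For reference, the `Ready` condition of a healthy cluster looks roughly like this (the reason and message strings are illustrative and may differ between versions):
```
status:
  conditions:
  - type: Ready
    status: "True"
    reason: ClusterReady
    message: cluster is healthy and ready to accept workloads
```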
The following example describes an unhealthy cluster:
```
kubectl describe cluster member1
Name:         member1
Namespace:
Labels:       <none>
Annotations:  <none>
API Version:  cluster.karmada.io/v1alpha1
Kind:         Cluster
Metadata:
  Creation Timestamp:  2021-12-29T08:49:35Z
  Finalizers:
    karmada.io/cluster-controller
  Resource Version:  152047
  UID:               53c133ab-264e-4e8e-ab63-a21611f7fae8
Spec:
  API Endpoint:  https://172.23.0.7:6443
  Impersonator Secret Ref:
    Name:       member1-impersonator
    Namespace:  karmada-cluster
  Secret Ref:
    Name:       member1
    Namespace:  karmada-cluster
  Sync Mode:    Push
Status:
  Conditions:
    Last Transition Time:  2021-12-31T03:36:08Z
    Message:               cluster is not reachable
    Reason:                ClusterNotReachable
    Status:                False
    Type:                  Ready
Events:  <none>
```

## Failover feature of Karmada
The failover feature is controlled by the `Failover` feature gate. Users need to enable the `Failover` feature gate of the karmada-scheduler:
```
--feature-gates=Failover=true
```

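For example, if the karmada-scheduler runs as a Deployment in the host cluster, the feature gate can be appended to its container command. This is a minimal sketch; the container name, binary path, and any other flags are assumptions:
```
containers:
- name: karmada-scheduler
  command:
  - /bin/karmada-scheduler          # assumed binary path
  - --feature-gates=Failover=true   # enable the failover feature gate
```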
### Concept

When a member cluster is determined to be unhealthy, the karmada-scheduler reschedules the applications that were propagated to it.
There are several constraints:
- Each rescheduled application still needs to meet the restrictions of its PropagationPolicy, such as ClusterAffinity or SpreadConstraints.
- Applications already distributed to healthy clusters by the initial scheduling remain in place during failover rescheduling.

#### Duplicated schedule type
For the `Duplicated` scheduling policy, if the number of candidate clusters that meet the PropagationPolicy restrictions is not less than the number of failed clusters, the application is rescheduled onto candidate clusters, one for each failed cluster. Otherwise, no rescheduling takes place.

Take a `Deployment` as example:
```
apiVersion: apps/v1
kind: Deployment
metadata:
  name: nginx
  labels:
    app: nginx
spec:
  replicas: 2
  selector:
    matchLabels:
      app: nginx
  template:
    metadata:
      labels:
        app: nginx
    spec:
      containers:
      - image: nginx
        name: nginx
---
apiVersion: policy.karmada.io/v1alpha1
kind: PropagationPolicy
metadata:
  name: nginx-propagation
spec:
  resourceSelectors:
  - apiVersion: apps/v1
    kind: Deployment
    name: nginx
  placement:
    clusterAffinity:
      clusterNames:
      - member1
      - member2
      - member3
      - member5
    spreadConstraints:
    - maxGroups: 2
      minGroups: 2
    replicaScheduling:
      replicaSchedulingType: Duplicated
```

Suppose there are 5 member clusters, and the initial scheduling result is member1 and member2. When member2 fails, rescheduling is triggered.

It should be noted that rescheduling does not delete the application from the healthy cluster member1. Among the remaining 3 clusters, only member3 and member5 match the `clusterAffinity` policy.

Due to the `spreadConstraints` restriction, the final result will be [member1, member3] or [member1, member5].

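Expressed as a scheduling result (for example, in the Deployment's `ResourceBinding`), one of the possible outcomes would look roughly like this; the field layout is illustrative and may differ between Karmada versions:
```
# One possible post-failover result for the Duplicated example.
spec:
  clusters:
  - name: member1   # kept from the initial scheduling result
  - name: member3   # replaces the failed member2 (member5 would be equally valid)
```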
#### Divided schedule type
For the `Divided` scheduling policy, the karmada-scheduler will try to migrate the replicas of the failed cluster to the other healthy clusters.

Take a `Deployment` as example:
```
apiVersion: apps/v1
kind: Deployment
metadata:
  name: nginx
  labels:
    app: nginx
spec:
  replicas: 3
  selector:
    matchLabels:
      app: nginx
  template:
    metadata:
      labels:
        app: nginx
    spec:
      containers:
      - image: nginx
        name: nginx
---
apiVersion: policy.karmada.io/v1alpha1
kind: PropagationPolicy
metadata:
  name: nginx-propagation
spec:
  resourceSelectors:
  - apiVersion: apps/v1
    kind: Deployment
    name: nginx
  placement:
    clusterAffinity:
      clusterNames:
      - member1
      - member2
    replicaScheduling:
      replicaDivisionPreference: Weighted
      replicaSchedulingType: Divided
      weightPreference:
        staticWeightList:
        - targetCluster:
            clusterNames:
            - member1
          weight: 1
        - targetCluster:
            clusterNames:
            - member2
          weight: 2
```

The karmada-scheduler divides the replicas according to the `weightPreference`. The initial scheduling result is member1 with 1 replica and member2 with 2 replicas.

When member1 fails, rescheduling is triggered. The karmada-scheduler will try to migrate member1's replicas to the other healthy clusters. The final result will be member2 with 3 replicas.

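Expressed as a scheduling result (for example, in the Deployment's `ResourceBinding`), the change would look roughly like this; the field layout is illustrative and may differ between Karmada versions:
```
# Initial schedule, following the 1:2 static weights.
spec:
  clusters:
  - name: member1
    replicas: 1
  - name: member2
    replicas: 2
---
# After member1 fails, its replica is migrated to member2.
spec:
  clusters:
  - name: member2
    replicas: 3
```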