Support karpenter-crd Helm Chart and Fix Node Interruption Handling (
milldr authored and mgledi committed Nov 29, 2023
1 parent f39fe8c commit 283a161
Showing 6 changed files with 266 additions and 85 deletions.
66 changes: 66 additions & 0 deletions modules/eks/karpenter/CHANGELOG.md
@@ -0,0 +1,66 @@
## Version 1.348.0

Components PR [#868](https://github.com/cloudposse/terraform-aws-components/pull/868)

The `karpenter-crd` helm chart can now be installed alongside the `karpenter` helm chart to automatically manage the lifecycle of Karpenter CRDs. However, since this chart must be installed before the `karpenter` helm chart, the Kubernetes namespace must exist before either chart is deployed. Furthermore, this namespace should persist whether or not the `karpenter-crd` chart is deployed, so it should not be created by the `karpenter-crd` `helm-release` module. Therefore, we've moved namespace creation to a separate resource that is created before both charts. Terraform handles the namespace state migration with a `moved` block.

Depending on your current deployment, additional steps may be required. Review the following scenarios and follow the steps that match your situation.

### Upgrading an existing `eks/karpenter` deployment without changes

If you currently have `eks/karpenter` deployed to an EKS cluster and have upgraded to this version of the component, no changes are required. `var.crd_chart_enabled` will default to `false`.

### Upgrading an existing `eks/karpenter` deployment and deploying the `karpenter-crd` chart

If you currently have `eks/karpenter` deployed to an EKS cluster, have upgraded to this version of the component, do not currently have the `karpenter-crd` chart installed, and want to now deploy the `karpenter-crd` helm chart, a few additional steps are required!

First, set `var.crd_chart_enabled` to `true`.

Next, label and annotate the installed Karpenter CRDs so that Helm can take over their management when the `karpenter-crd` chart is deployed. We have included a script for this: run the `./karpenter-crd-upgrade` script, or run the following commands on the target cluster, before deploying the chart. The script or commands only need to be run the first time the CRD chart is used.

Before running the script, ensure that the `kubectl` context is set to the cluster where the `karpenter` helm chart is deployed. In Geodesic, you can usually do this with the `set-cluster` command, though your configuration may vary.

```bash
set-cluster <tenant>-<region>-<stage> terraform
```

Then run the script or commands:

```bash
kubectl label crd awsnodetemplates.karpenter.k8s.aws provisioners.karpenter.sh app.kubernetes.io/managed-by=Helm --overwrite
kubectl annotate crd awsnodetemplates.karpenter.k8s.aws provisioners.karpenter.sh meta.helm.sh/release-name=karpenter-crd --overwrite
kubectl annotate crd awsnodetemplates.karpenter.k8s.aws provisioners.karpenter.sh meta.helm.sh/release-namespace=karpenter --overwrite
```
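The same three commands can be wrapped in small bash functions that first print the current `kubectl` context and support a dry run. This is a minimal sketch, not the bundled `karpenter-crd-upgrade` script: the function names, the `DRY_RUN` switch and the `RELEASE_NAME`/`RELEASE_NAMESPACE` overrides are hypothetical, with defaults matching the commands above. Helm only adopts existing resources whose `meta.helm.sh/release-name` and `meta.helm.sh/release-namespace` annotations match the release, so adjust the overrides if your `karpenter-crd` release name or namespace differs.

```bash
#!/usr/bin/env bash
# Sketch only: label and annotate the existing Karpenter CRDs so that the
# karpenter-crd Helm release can adopt them. DRY_RUN, RELEASE_NAME and
# RELEASE_NAMESPACE are illustrative assumptions, not component inputs.

KARPENTER_CRDS=(awsnodetemplates.karpenter.k8s.aws provisioners.karpenter.sh)
RELEASE_NAME="${RELEASE_NAME:-karpenter-crd}"
RELEASE_NAMESPACE="${RELEASE_NAMESPACE:-karpenter}"

# Run a command, or only print it when DRY_RUN=1.
run() {
  if [[ "${DRY_RUN:-0}" == "1" ]]; then
    echo "$*"
  else
    "$@"
  fi
}

adopt_karpenter_crds() {
  # Print the current context so you can confirm the target cluster.
  run kubectl config current-context || return
  run kubectl label crd "${KARPENTER_CRDS[@]}" app.kubernetes.io/managed-by=Helm --overwrite || return
  run kubectl annotate crd "${KARPENTER_CRDS[@]}" "meta.helm.sh/release-name=${RELEASE_NAME}" --overwrite || return
  run kubectl annotate crd "${KARPENTER_CRDS[@]}" "meta.helm.sh/release-namespace=${RELEASE_NAMESPACE}" --overwrite
}
```

After sourcing the file, preview with `DRY_RUN=1 adopt_karpenter_crds`, then run `adopt_karpenter_crds` once the printed context is the intended cluster.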

:::info

Previously, the `karpenter-crd-upgrade` script also deployed the `karpenter-crd` chart. Now that this chart is managed by Terraform, that Helm deployment is no longer necessary.

For reference, the `karpenter-crd` chart can be installed with Helm as follows:
```bash
helm upgrade --install karpenter-crd oci://public.ecr.aws/karpenter/karpenter-crd --version "$VERSION" --namespace karpenter
```

:::
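The manual `helm` command above expects `$VERSION` to be set. The earlier version of the `karpenter-crd-upgrade` script accepted a version argument and added a leading `v` when it was missing; the sketch below reproduces that normalization around the same command. It assumes chart tags carry the `v` prefix (as `v0.31.0` does in the README example) and that you pass the same version as the `karpenter` chart's `chart_version`; the function names are hypothetical.

```bash
#!/usr/bin/env bash
# Sketch only: install or upgrade the karpenter-crd chart by hand.
# Assumption: chart versions carry a leading "v" (e.g. v0.31.0); the previous
# karpenter-crd-upgrade script normalized its version argument the same way.

# Prefix a version with "v" if it does not already have one.
normalize_chart_version() {
  local version="$1"
  [[ "$version" =~ ^v ]] || version="v${version}"
  echo "$version"
}

install_karpenter_crd_chart() {
  if [[ -z "${1:-}" ]]; then
    echo "usage: install_karpenter_crd_chart <version>" >&2
    return 1
  fi
  local version
  version="$(normalize_chart_version "$1")"
  helm upgrade --install karpenter-crd oci://public.ecr.aws/karpenter/karpenter-crd \
    --version "$version" --namespace karpenter
}
```

For example, `install_karpenter_crd_chart 0.31.0` and `install_karpenter_crd_chart v0.31.0` both request chart version `v0.31.0`.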

Now that the CRDs are upgraded, the component is ready to be applied. Apply the `eks/karpenter` component and then apply `eks/karpenter-provisioner`.

#### Note for upgrading Karpenter from before v0.27.3 to v0.27.3 or later

If you are upgrading Karpenter from before v0.27.3 to v0.27.3 or later,
you may need to run the following command to remove an obsolete webhook:

```bash
kubectl delete mutatingwebhookconfigurations defaulting.webhook.karpenter.sh
```

See [the Karpenter upgrade guide](https://karpenter.sh/v0.32/upgrading/upgrade-guide/#upgrading-to-v0273)
for more details.
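The webhook only needs to be removed when an upgrade goes from before v0.27.3 to v0.27.3 or later. As a sketch, that condition can be checked with `sort -V` before deleting; the function names are hypothetical, and the real `kubectl delete --ignore-not-found` flag makes the delete a no-op if the webhook is already gone.

```bash
#!/usr/bin/env bash
# Sketch only: delete the obsolete defaulting webhook when an upgrade crosses v0.27.3.
# Function names are hypothetical; version comparison relies on `sort -V`.

# Succeeds if version $1 sorts strictly before version $2 (a leading "v" is ignored).
version_lt() {
  local a="${1#v}" b="${2#v}" first
  [[ "$a" != "$b" ]] || return 1
  first="$(printf '%s\n%s\n' "$a" "$b" | sort -V)"
  [[ "${first%%$'\n'*}" == "$a" ]]
}

# Succeeds if upgrading from $1 to $2 goes from before v0.27.3 to v0.27.3 or later.
needs_webhook_cleanup() {
  version_lt "$1" "v0.27.3" && ! version_lt "$2" "v0.27.3"
}

cleanup_obsolete_webhook() {
  if needs_webhook_cleanup "$1" "$2"; then
    # --ignore-not-found makes this safe to re-run once the webhook is gone.
    kubectl delete mutatingwebhookconfigurations defaulting.webhook.karpenter.sh --ignore-not-found
  fi
}
```

For example, `cleanup_obsolete_webhook v0.16.3 v0.31.0` deletes the webhook, while `cleanup_obsolete_webhook v0.27.3 v0.31.0` does nothing.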

### Upgrading an existing `eks/karpenter` deployment where the `karpenter-crd` chart is already deployed

If you currently have `eks/karpenter` deployed to an EKS cluster, have upgraded to this version of the component, and already have the `karpenter-crd` chart installed, simply set `var.crd_chart_enabled` to `true` and redeploy Terraform to have Terraform manage the helm release for `karpenter-crd`.

### Net new deployments

If you are deploying `eks/karpenter` for the first time, no changes are required, but we recommend installing the CRD chart. Set `var.crd_chart_enabled` to `true` and continue with deployment.
56 changes: 46 additions & 10 deletions modules/eks/karpenter/README.md
@@ -21,19 +21,14 @@ components:
eks/karpenter:
metadata:
type: abstract
settings:
spacelift:
workspace_enabled: true
vars:
enabled: true
tags:
Team: sre
Service: karpenter
eks_component_name: eks/cluster
eks_component_name: "eks/cluster"
name: "karpenter"
# https://github.com/aws/karpenter/tree/main/charts/karpenter
chart_repository: "oci://public.ecr.aws/karpenter"
chart: "karpenter"
chart_repository: "https://charts.karpenter.sh"
chart_version: "v0.16.3"
chart_version: "v0.31.0"
create_namespace: true
kubernetes_namespace: "karpenter"
resources:
@@ -47,9 +42,14 @@ components:
atomic: true
wait: true
rbac_enabled: true
# "karpenter-crd" can be installed as an independent helm chart to manage the lifecycle of Karpenter CRDs
crd_chart_enabled: true
crd_chart: "karpenter-crd"
# Set `legacy_create_karpenter_instance_profile` to `false` to allow the `eks/cluster` component
# to manage the instance profile for the nodes launched by Karpenter (recommended for all new clusters).
legacy_create_karpenter_instance_profile: false
# Enable interruption handling to deploy a SQS queue and a set of Event Bridge rules to handle interruption with Karpenter.
interruption_handler_enabled: true

# Provision `karpenter` component on the blue EKS cluster
eks/karpenter-blue:
@@ -281,6 +281,37 @@ For your cluster, you will need to review the following configurations for the K
ttl_seconds_until_expired: 2592000
```

## Node Interruption

Karpenter also supports listening for and responding to Node Interruption events. If interruption handling is enabled, Karpenter will watch for upcoming involuntary interruption events that would cause disruption to your workloads. These interruption events include:

- Spot Interruption Warnings
- Scheduled Change Health Events (Maintenance Events)
- Instance Terminating Events
- Instance Stopping Events

:::info

The Node Interruption Handler is not the same as the Node Termination Handler. The latter is always enabled and cleanly shuts down the node in 2 minutes in response to a Node Termination event. The former gets advance notice that a node will soon be terminated, so it can have 5-10 minutes to shut down a node.

:::

For more details, refer to the [Karpenter docs](https://karpenter.sh/v0.32/concepts/disruption/#interruption) and [FAQ](https://karpenter.sh/v0.32/faq/#interruption-handling).

To enable Node Interruption handling, set `var.interruption_handler_enabled` to `true`. This creates an SQS queue and a set of EventBridge rules that deliver interruption events to Karpenter.
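After applying with interruption handling enabled, one way to confirm the wiring is to look up the queue and the EventBridge rules that target it with the AWS CLI. This is a sketch: the component names the queue after `module.this.id`, so pass the ID generated for your stack as the queue name, and the function name is hypothetical.

```bash
#!/usr/bin/env bash
# Sketch only: show the interruption queue ARN and the EventBridge rules targeting it.
# Pass the queue name (the component uses module.this.id); requires AWS credentials.

check_interruption_queue() {
  local queue_name="$1" queue_url queue_arn rules
  queue_url="$(aws sqs get-queue-url --queue-name "$queue_name" \
    --query QueueUrl --output text)" || return
  queue_arn="$(aws sqs get-queue-attributes --queue-url "$queue_url" \
    --attribute-names QueueArn --query Attributes.QueueArn --output text)" || return
  # Rules created by the component should list this queue as a target.
  rules="$(aws events list-rule-names-by-target --target-arn "$queue_arn" \
    --query RuleNames --output text)" || return
  echo "Queue: ${queue_arn}"
  echo "Rules: ${rules}"
}
```

An empty `Rules:` line would suggest that no EventBridge rule currently delivers events to the queue.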

## Custom Resource Definition (CRD) Management

Karpenter ships with a few Custom Resource Definitions (CRDs). In earlier versions
of this component, when installing a new version of the `karpenter` helm chart, CRDs
were not upgraded at the same time, requiring manual steps to upgrade CRDs after deploying the latest chart.
However, Karpenter now supports an additional, independent helm chart for CRD management.
This helm chart, `karpenter-crd`, can be installed alongside the `karpenter` helm chart to automatically manage the lifecycle of these CRDs.

To deploy the `karpenter-crd` helm chart, set `var.crd_chart_enabled` to `true`.
(Installing the `karpenter-crd` chart is recommended. `var.crd_chart_enabled` defaults
to `false` to preserve backward compatibility with older versions of this component.)

## Troubleshooting

For Karpenter issues, check out the [Karpenter Troubleshooting Guide](https://karpenter.sh/docs/troubleshooting/).
@@ -312,14 +343,16 @@ For more details, refer to:
| Name | Version |
|------|---------|
| <a name="provider_aws"></a> [aws](#provider\_aws) | >= 4.9.0 |
| <a name="provider_kubernetes"></a> [kubernetes](#provider\_kubernetes) | >= 2.7.1, != 2.21.0 |

## Modules

| Name | Source | Version |
|------|--------|---------|
| <a name="module_eks"></a> [eks](#module\_eks) | cloudposse/stack-config/yaml//modules/remote-state | 1.5.0 |
| <a name="module_iam_roles"></a> [iam\_roles](#module\_iam\_roles) | ../../account-map/modules/iam-roles | n/a |
| <a name="module_karpenter"></a> [karpenter](#module\_karpenter) | cloudposse/helm-release/aws | 0.10.0 |
| <a name="module_karpenter"></a> [karpenter](#module\_karpenter) | cloudposse/helm-release/aws | 0.10.1 |
| <a name="module_karpenter_crd"></a> [karpenter\_crd](#module\_karpenter\_crd) | cloudposse/helm-release/aws | 0.10.1 |
| <a name="module_this"></a> [this](#module\_this) | cloudposse/label/null | 0.25.0 |

## Resources
@@ -331,6 +364,7 @@ For more details, refer to:
| [aws_iam_instance_profile.default](https://registry.terraform.io/providers/hashicorp/aws/latest/docs/resources/iam_instance_profile) | resource |
| [aws_sqs_queue.interruption_handler](https://registry.terraform.io/providers/hashicorp/aws/latest/docs/resources/sqs_queue) | resource |
| [aws_sqs_queue_policy.interruption_handler](https://registry.terraform.io/providers/hashicorp/aws/latest/docs/resources/sqs_queue_policy) | resource |
| [kubernetes_namespace.default](https://registry.terraform.io/providers/hashicorp/kubernetes/latest/docs/resources/namespace) | resource |
| [aws_eks_cluster_auth.eks](https://registry.terraform.io/providers/hashicorp/aws/latest/docs/data-sources/eks_cluster_auth) | data source |
| [aws_iam_policy_document.interruption_handler](https://registry.terraform.io/providers/hashicorp/aws/latest/docs/data-sources/iam_policy_document) | data source |
| [aws_partition.current](https://registry.terraform.io/providers/hashicorp/aws/latest/docs/data-sources/partition) | data source |
@@ -349,6 +383,8 @@ For more details, refer to:
| <a name="input_chart_version"></a> [chart\_version](#input\_chart\_version) | Specify the exact chart version to install. If this is not specified, the latest version is installed | `string` | `null` | no |
| <a name="input_cleanup_on_fail"></a> [cleanup\_on\_fail](#input\_cleanup\_on\_fail) | Allow deletion of new resources created in this upgrade when upgrade fails | `bool` | `true` | no |
| <a name="input_context"></a> [context](#input\_context) | Single object for setting entire context at once.<br>See description of individual variables for details.<br>Leave string and numeric variables as `null` to use default value.<br>Individual variable settings (non-null) override settings in context object,<br>except for attributes, tags, and additional\_tag\_map, which are merged. | `any` | <pre>{<br> "additional_tag_map": {},<br> "attributes": [],<br> "delimiter": null,<br> "descriptor_formats": {},<br> "enabled": true,<br> "environment": null,<br> "id_length_limit": null,<br> "label_key_case": null,<br> "label_order": [],<br> "label_value_case": null,<br> "labels_as_tags": [<br> "unset"<br> ],<br> "name": null,<br> "namespace": null,<br> "regex_replace_chars": null,<br> "stage": null,<br> "tags": {},<br> "tenant": null<br>}</pre> | no |
| <a name="input_crd_chart"></a> [crd\_chart](#input\_crd\_chart) | The name of the Karpenter CRD chart to be installed, if `var.crd_chart_enabled` is set to `true`. | `string` | `"karpenter-crd"` | no |
| <a name="input_crd_chart_enabled"></a> [crd\_chart\_enabled](#input\_crd\_chart\_enabled) | `karpenter-crd` can be installed as an independent helm chart to manage the lifecycle of Karpenter CRDs. Set to `true` to install this CRD helm chart before the primary karpenter chart. | `bool` | `false` | no |
| <a name="input_create_namespace"></a> [create\_namespace](#input\_create\_namespace) | Create the namespace if it does not yet exist. Defaults to `false` | `bool` | `null` | no |
| <a name="input_delimiter"></a> [delimiter](#input\_delimiter) | Delimiter to be used between ID elements.<br>Defaults to `-` (hyphen). Set to `""` to use no delimiter at all. | `string` | `null` | no |
| <a name="input_descriptor_formats"></a> [descriptor\_formats](#input\_descriptor\_formats) | Describe additional descriptors to be output in the `descriptors` output map.<br>Map of maps. Keys are names of descriptors. Values are maps of the form<br>`{<br> format = string<br> labels = list(string)<br>}`<br>(Type is `any` so the map values can later be enhanced to provide additional options.)<br>`format` is a Terraform format string to be passed to the `format()` function.<br>`labels` is a list of labels, in order, to pass to `format()` function.<br>Label values will be normalized before being passed to `format()` so they will be<br>identical to how they appear in `id`.<br>Default is `{}` (`descriptors` output will be empty). | `any` | `{}` | no |
6 changes: 4 additions & 2 deletions modules/eks/karpenter/interruption_handler.tf
@@ -2,7 +2,7 @@ locals {
interruption_handler_enabled = local.enabled && var.interruption_handler_enabled
interruption_handler_queue_name = module.this.id

dns_suffix = data.aws_partition.current.dns_suffix
dns_suffix = join("", data.aws_partition.current[*].dns_suffix)

events = {
health_event = {
@@ -40,7 +40,9 @@ locals {
}
}

data "aws_partition" "current" {}
data "aws_partition" "current" {
count = local.interruption_handler_enabled ? 1 : 0
}

resource "aws_sqs_queue" "interruption_handler" {
count = local.interruption_handler_enabled ? 1 : 0
12 changes: 4 additions & 8 deletions modules/eks/karpenter/karpenter-crd-upgrade
@@ -2,27 +2,23 @@

function usage() {
cat >&2 <<'EOF'
./karpenter-crd-upgrade <version>
./karpenter-crd-upgrade
Use this script to upgrade the Karpenter CRDs by installing or upgrading the karpenter-crd helm chart.
Use this script to prepare a cluster for karpenter-crd helm chart support by upgrading Karpenter CRDs.
EOF
}

function upgrade() {
VERSION="${1}"
[[ $VERSION =~ ^v ]] || VERSION="v${VERSION}"

set -x

kubectl label crd awsnodetemplates.karpenter.k8s.aws provisioners.karpenter.sh app.kubernetes.io/managed-by=Helm --overwrite
kubectl annotate crd awsnodetemplates.karpenter.k8s.aws provisioners.karpenter.sh meta.helm.sh/release-name=karpenter-crd --overwrite
kubectl annotate crd awsnodetemplates.karpenter.k8s.aws provisioners.karpenter.sh meta.helm.sh/release-namespace=karpenter --overwrite
helm upgrade --install karpenter-crd oci://public.ecr.aws/karpenter/karpenter-crd --version "$VERSION" --namespace karpenter
}

if (($# == 0)); then
usage
upgrade
else
upgrade $1
usage
fi