Support karpenter-crd Helm Chart and Fix Node Interruption Handling #868

Merged on Nov 21, 2023 (25 commits)
d1ecd5f
added karpenter-crd chart support, added back docs for node interrupt…
milldr Oct 10, 2023
c05bf0e
update readme for consistency
milldr Oct 10, 2023
60a3f5c
removed unnecessary service account for karpenter-crd
milldr Oct 11, 2023
9b5dc51
Apply suggestions from code review
milldr Oct 11, 2023
81e4536
create namespace with separate resource
milldr Oct 11, 2023
1aa1bdf
changelog for karpenter crd
milldr Oct 11, 2023
0a3ff15
changelog for karpenter crd
milldr Oct 11, 2023
716c3b1
corrected atmos resource in changelog
milldr Oct 11, 2023
c8d41ed
depends_on and moved block for karpenter
milldr Oct 11, 2023
954bb7d
updated changelog for moved block
milldr Oct 11, 2023
3f723da
added all scenarios to the changelog
milldr Oct 11, 2023
590a193
Merge branch 'main' into support-karpenter-crd
milldr Oct 12, 2023
d77b2cf
updated moved block for correct resource names and comment
milldr Oct 12, 2023
d2a4521
handle unknown ARN on creation for interruption queue policy
milldr Oct 12, 2023
eb65e88
improved pattern for pulling arn
milldr Oct 13, 2023
518f2c4
Merge branch 'main' into support-karpenter-crd
milldr Nov 7, 2023
f62bc51
Apply suggestions from code review
milldr Nov 7, 2023
892f5e9
pre-commit fixes
cloudpossebot Nov 7, 2023
205d0e4
Merge branch 'main' into support-karpenter-crd
milldr Nov 9, 2023
a10f4ff
pr comments
milldr Nov 9, 2023
54e04af
Merge branch 'support-karpenter-crd' of github.com:cloudposse/terrafo…
milldr Nov 9, 2023
bc0fbe2
Merge branch 'main' into support-karpenter-crd
milldr Nov 16, 2023
5ae3698
Fix typo, add cautions
Nuru Nov 21, 2023
a4da3e3
Merge branch 'main' into support-karpenter-crd
Nuru Nov 21, 2023
de75c39
Update version number
Nuru Nov 21, 2023
66 changes: 66 additions & 0 deletions modules/eks/karpenter/CHANGELOG.md
@@ -0,0 +1,66 @@
## Version 1.348.0

Components PR [#868](https://github.com/cloudposse/terraform-aws-components/pull/868)

The `karpenter-crd` helm chart can now be installed alongside the `karpenter` helm chart to automatically manage the lifecycle of Karpenter CRDs. However, since this chart must be installed before the `karpenter` helm chart, the Kubernetes namespace must be available before either chart is deployed. Furthermore, this namespace should persist whether or not the `karpenter-crd` chart is deployed, so it should not be created by that chart's `helm-release` resource. Therefore, we've moved namespace creation to a separate resource that runs before both charts. Terraform will handle the namespace state migration with the `moved` block.
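
To confirm that the namespace is being migrated rather than destroyed and recreated, it can help to scan the plan output for move annotations before applying. The following is a minimal sketch, not part of this component: the function name `list_moved_resources` is hypothetical, and it assumes Terraform's human-readable plan output marks moves with `has moved to` (pure moves) or `(moved from ...)` (moved and changed).

```bash
#!/usr/bin/env bash
# Sketch (hypothetical helper): read Terraform plan text on stdin and print
# only the comment lines that describe resources relocated by `moved` blocks.
list_moved_resources() {
  # grep exits 1 when nothing matches; treat that as success with no output.
  grep -E '# (.+ has moved to .+|\(moved from .+\))' || [[ $? -eq 1 ]]
}
```

For example, `terraform plan -no-color | list_moved_resources` should print one line per moved resource. If the namespace shows up as a destroy and create instead, stop and review before applying.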

Depending on your current deployment, additional steps may be required. Review the following scenarios and follow the steps for the one that matches yours.

### Upgrading an existing `eks/karpenter` deployment without changes

If you currently have `eks/karpenter` deployed to an EKS cluster and have upgraded to this version of the component, no changes are required. `var.crd_chart_enabled` will default to `false`.

### Upgrading an existing `eks/karpenter` deployment and deploying the `karpenter-crd` chart

If you currently have `eks/karpenter` deployed to an EKS cluster, have upgraded to this version of the component, do not currently have the `karpenter-crd` chart installed, and want to now deploy the `karpenter-crd` helm chart, a few additional steps are required!

First, set `var.crd_chart_enabled` to `true`.

Next, update the installed Karpenter CRDs so that Helm can automatically take over their management when the `karpenter-crd` chart is deployed. We have included a script to run that upgrade. Run the `./karpenter-crd-upgrade` script or run the following commands on the given cluster before deploying the chart. The script or commands only need to be run the first time the CRD chart is used.

Before running the script, ensure that the `kubectl` context is set to the cluster where the `karpenter` helm chart is deployed. In Geodesic, you can usually do this with the `set-cluster` command, though your configuration may vary.

```bash
set-cluster <tenant>-<region>-<stage> terraform
```
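
Because the commands that follow modify cluster-wide CRDs, a guard on the active context may be worthwhile. This is a minimal sketch under stated assumptions: `require_kube_context` is a hypothetical helper, and the expected context name is whatever your own setup produces (it is not necessarily the `<tenant>-<region>-<stage>` string passed to `set-cluster`).

```bash
#!/usr/bin/env bash
# Sketch (hypothetical helper): succeed only when kubectl's current context
# matches the context name passed as the first argument.
require_kube_context() {
  local expected="$1"
  local current
  current="$(kubectl config current-context)" || return 1
  if [[ "$current" != "$expected" ]]; then
    echo "kubectl context is '${current}', expected '${expected}'" >&2
    return 1
  fi
  echo "kubectl context OK: ${current}"
}
```

A call such as `require_kube_context "my-expected-context" && ./karpenter-crd-upgrade` would then refuse to run the upgrade against the wrong cluster.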

Then run the script or commands:

```bash
kubectl label crd awsnodetemplates.karpenter.k8s.aws provisioners.karpenter.sh app.kubernetes.io/managed-by=Helm --overwrite
kubectl annotate crd awsnodetemplates.karpenter.k8s.aws provisioners.karpenter.sh meta.helm.sh/release-name=karpenter-crd --overwrite
kubectl annotate crd awsnodetemplates.karpenter.k8s.aws provisioners.karpenter.sh meta.helm.sh/release-namespace=karpenter --overwrite
```
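
After labeling and annotating, the result can be read back to confirm that each CRD carries the ownership metadata Helm looks for. The sketch below assumes the label and annotation values used in the commands above; the function names are hypothetical and not part of the component.

```bash
#!/usr/bin/env bash
# Sketch (hypothetical helpers): verify the Helm ownership metadata on CRDs.

# Print "managed-by release-name release-namespace" for one CRD.
crd_helm_metadata() {
  local crd="$1"
  kubectl get crd "$crd" -o jsonpath='{.metadata.labels.app\.kubernetes\.io/managed-by} {.metadata.annotations.meta\.helm\.sh/release-name} {.metadata.annotations.meta\.helm\.sh/release-namespace}'
}

# Return 0 only if every CRD passed as an argument has the expected values.
verify_crd_helm_ownership() {
  local crd actual status=0
  for crd in "$@"; do
    actual="$(crd_helm_metadata "$crd")" || actual=""
    if [[ "$actual" == "Helm karpenter-crd karpenter" ]]; then
      echo "ok: ${crd}"
    else
      echo "missing or unexpected metadata on ${crd}: '${actual}'" >&2
      status=1
    fi
  done
  return "$status"
}
```

Usage would look like `verify_crd_helm_ownership awsnodetemplates.karpenter.k8s.aws provisioners.karpenter.sh`.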

:::info

Previously the `karpenter-crd-upgrade` script included deploying the `karpenter-crd` chart. Now that this chart is moved to Terraform, that helm deployment is no longer necessary.

For reference, the `karpenter-crd` chart can be installed with helm with the following:
```bash
helm upgrade --install karpenter-crd oci://public.ecr.aws/karpenter/karpenter-crd --version "$VERSION" --namespace karpenter
```

:::

Now that the CRDs are upgraded, the component is ready to be applied. Apply the `eks/karpenter` component and then apply `eks/karpenter-provisioner`.
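
If you deploy with Atmos, the apply order described above could be scripted roughly as follows. This is a sketch under assumptions: the function name is hypothetical, the component instance names in your stacks may differ (for example `eks/karpenter-blue`), and the stack name is supplied by you.

```bash
#!/usr/bin/env bash
# Sketch (hypothetical helper): apply eks/karpenter first, and apply
# eks/karpenter-provisioner only if the first apply succeeded.
apply_karpenter_components() {
  if [[ -z "${1:-}" ]]; then
    echo "usage: apply_karpenter_components <stack>" >&2
    return 2
  fi
  local stack="$1"
  atmos terraform apply eks/karpenter -s "$stack" || return 1
  atmos terraform apply eks/karpenter-provisioner -s "$stack"
}
```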

#### Note for upgrading Karpenter from before v0.27.3 to v0.27.3 or later

If you are upgrading Karpenter from before v0.27.3 to v0.27.3 or later,
you may need to run the following command to remove an obsolete webhook:

```bash
kubectl delete mutatingwebhookconfigurations defaulting.webhook.karpenter.sh
```

See [the Karpenter upgrade guide](https://karpenter.sh/v0.32/upgrading/upgrade-guide/#upgrading-to-v0273)
for more details.
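
Whether the cleanup applies depends on the versions on either side of the upgrade. The following sketch makes that check explicit and deletes the webhook in a way that is safe to repeat. The helper names are hypothetical, the comparison assumes plain `vX.Y.Z` versions, and it relies on `sort -V` being available.

```bash
#!/usr/bin/env bash
# Sketch (hypothetical helpers): decide whether the obsolete webhook cleanup
# applies to an upgrade, and delete the webhook idempotently.

# True when version $1 sorts strictly before version $2 (leading "v" optional).
version_lt() {
  local a="${1#v}" b="${2#v}"
  [[ "$a" != "$b" ]] && [[ "$(printf '%s\n%s\n' "$a" "$b" | sort -V | head -n1)" == "$a" ]]
}

# True when upgrading from before v0.27.3 to v0.27.3 or later.
needs_webhook_cleanup() {
  local from="$1" to="$2"
  version_lt "$from" "0.27.3" && ! version_lt "$to" "0.27.3"
}

# Delete the webhook; --ignore-not-found makes repeated runs harmless.
remove_obsolete_karpenter_webhook() {
  kubectl delete mutatingwebhookconfigurations defaulting.webhook.karpenter.sh --ignore-not-found
}
```

For instance, `needs_webhook_cleanup v0.16.3 v0.31.0 && remove_obsolete_karpenter_webhook` would run the deletion only for an upgrade that crosses v0.27.3.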

### Upgrading an existing `eks/karpenter` deployment where the `karpenter-crd` chart is already deployed

If you currently have `eks/karpenter` deployed to an EKS cluster, have upgraded to this version of the component, and already have the `karpenter-crd` chart installed, simply set `var.crd_chart_enabled` to `true` and redeploy Terraform to have Terraform manage the helm release for `karpenter-crd`.
Review comment (Contributor):

Are you sure you don't have to import the CRD helm chart?
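
To help answer the review question above, one option is to check whether Helm already tracks a release named `karpenter-crd` in the `karpenter` namespace before enabling the chart in Terraform. This is only a sketch: the function name is hypothetical, the default release name and namespace are taken from the `helm upgrade --install` command shown earlier, and it does not settle whether a Terraform import is needed.

```bash
#!/usr/bin/env bash
# Sketch (hypothetical helper): return 0 if a Helm release exists.
# Defaults assume release "karpenter-crd" in namespace "karpenter".
crd_release_exists() {
  local release="${1:-karpenter-crd}" namespace="${2:-karpenter}"
  # `helm status` exits non-zero when the release is not found.
  helm status "$release" --namespace "$namespace" >/dev/null 2>&1
}
```

Running `crd_release_exists && echo "release present"` before the first `terraform plan` gives a quick indication of which upgrade scenario applies.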


### Net new deployments

If you are deploying `eks/karpenter` for the first time, no changes are required, but we recommend installing the CRD chart. Set `var.crd_chart_enabled` to `true` and continue with deployment.
56 changes: 46 additions & 10 deletions modules/eks/karpenter/README.md
@@ -21,19 +21,14 @@ components:
eks/karpenter:
metadata:
type: abstract
settings:
spacelift:
workspace_enabled: true
vars:
enabled: true
tags:
Team: sre
Service: karpenter
eks_component_name: eks/cluster
eks_component_name: "eks/cluster"
name: "karpenter"
# https://github.com/aws/karpenter/tree/main/charts/karpenter
chart_repository: "oci://public.ecr.aws/karpenter"
chart: "karpenter"
chart_repository: "https://charts.karpenter.sh"
chart_version: "v0.16.3"
chart_version: "v0.31.0"
create_namespace: true
kubernetes_namespace: "karpenter"
resources:
@@ -47,9 +42,14 @@ components:
atomic: true
wait: true
rbac_enabled: true
# "karpenter-crd" can be installed as an independent helm chart to manage the lifecycle of Karpenter CRDs
crd_chart_enabled: true
crd_chart: "karpenter-crd"
# Set `legacy_create_karpenter_instance_profile` to `false` to allow the `eks/cluster` component
# to manage the instance profile for the nodes launched by Karpenter (recommended for all new clusters).
legacy_create_karpenter_instance_profile: false
# Enable interruption handling to deploy a SQS queue and a set of Event Bridge rules to handle interruption with Karpenter.
interruption_handler_enabled: true

# Provision `karpenter` component on the blue EKS cluster
eks/karpenter-blue:
@@ -281,6 +281,37 @@ For your cluster, you will need to review the following configurations for the K
ttl_seconds_until_expired: 2592000
```

## Node Interruption

Karpenter also supports listening for and responding to Node Interruption events. If interruption handling is enabled, Karpenter will watch for upcoming involuntary interruption events that would cause disruption to your workloads. These interruption events include:

- Spot Interruption Warnings
- Scheduled Change Health Events (Maintenance Events)
- Instance Terminating Events
- Instance Stopping Events

:::info

The Node Interruption Handler is not the same as the Node Termination Handler. The latter is always enabled and cleanly shuts down the node in 2 minutes in response to a Node Termination event. The former gets advance notice that a node will soon be terminated, so it can have 5-10 minutes to shut down a node.

:::

For more details, refer to the [Karpenter docs](https://karpenter.sh/v0.32/concepts/disruption/#interruption) and [FAQ](https://karpenter.sh/v0.32/faq/#interruption-handling).

To enable Node Interruption handling, set `var.interruption_handler_enabled` to `true`. This will create an SQS queue and a set of Event Bridge rules to deliver interruption events to Karpenter.
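
Once the component is applied with interruption handling enabled, the queue can be looked up with the AWS CLI as a quick sanity check. This is a minimal sketch: the function name is hypothetical, and the queue name is an input you supply (in this component the name is derived from `module.this.id`, so it depends on your context labels).

```bash
#!/usr/bin/env bash
# Sketch (hypothetical helper): print the URL of the interruption queue,
# or fail if no queue with the given name exists in the current account/region.
interruption_queue_url() {
  local queue_name="$1"
  aws sqs get-queue-url --queue-name "$queue_name" --query QueueUrl --output text
}
```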

## Custom Resource Definition (CRD) Management

Karpenter ships with a few Custom Resource Definitions (CRDs). In earlier versions
of this component, when installing a new version of the `karpenter` helm chart, CRDs
were not upgraded at the same time, requiring manual steps to upgrade CRDs after deploying the latest chart.
However, Karpenter now supports an additional, independent helm chart for CRD management.
This helm chart, `karpenter-crd`, can be installed alongside the `karpenter` helm chart to automatically manage the lifecycle of these CRDs.

To deploy the `karpenter-crd` helm chart, set `var.crd_chart_enabled` to `true`.
(Installing the `karpenter-crd` chart is recommended. `var.crd_chart_enabled` defaults
to `false` to preserve backward compatibility with older versions of this component.)

## Troubleshooting

For Karpenter issues, check out the [Karpenter Troubleshooting Guide](https://karpenter.sh/docs/troubleshooting/)
@@ -312,14 +343,16 @@ For more details, refer to:
| Name | Version |
|------|---------|
| <a name="provider_aws"></a> [aws](#provider\_aws) | >= 4.9.0 |
| <a name="provider_kubernetes"></a> [kubernetes](#provider\_kubernetes) | >= 2.7.1, != 2.21.0 |

## Modules

| Name | Source | Version |
|------|--------|---------|
| <a name="module_eks"></a> [eks](#module\_eks) | cloudposse/stack-config/yaml//modules/remote-state | 1.5.0 |
| <a name="module_iam_roles"></a> [iam\_roles](#module\_iam\_roles) | ../../account-map/modules/iam-roles | n/a |
| <a name="module_karpenter"></a> [karpenter](#module\_karpenter) | cloudposse/helm-release/aws | 0.10.0 |
| <a name="module_karpenter"></a> [karpenter](#module\_karpenter) | cloudposse/helm-release/aws | 0.10.1 |
| <a name="module_karpenter_crd"></a> [karpenter\_crd](#module\_karpenter\_crd) | cloudposse/helm-release/aws | 0.10.1 |
| <a name="module_this"></a> [this](#module\_this) | cloudposse/label/null | 0.25.0 |

## Resources
@@ -331,6 +364,7 @@ For more details, refer to:
| [aws_iam_instance_profile.default](https://registry.terraform.io/providers/hashicorp/aws/latest/docs/resources/iam_instance_profile) | resource |
| [aws_sqs_queue.interruption_handler](https://registry.terraform.io/providers/hashicorp/aws/latest/docs/resources/sqs_queue) | resource |
| [aws_sqs_queue_policy.interruption_handler](https://registry.terraform.io/providers/hashicorp/aws/latest/docs/resources/sqs_queue_policy) | resource |
| [kubernetes_namespace.default](https://registry.terraform.io/providers/hashicorp/kubernetes/latest/docs/resources/namespace) | resource |
| [aws_eks_cluster_auth.eks](https://registry.terraform.io/providers/hashicorp/aws/latest/docs/data-sources/eks_cluster_auth) | data source |
| [aws_iam_policy_document.interruption_handler](https://registry.terraform.io/providers/hashicorp/aws/latest/docs/data-sources/iam_policy_document) | data source |
| [aws_partition.current](https://registry.terraform.io/providers/hashicorp/aws/latest/docs/data-sources/partition) | data source |
@@ -349,6 +383,8 @@ For more details, refer to:
| <a name="input_chart_version"></a> [chart\_version](#input\_chart\_version) | Specify the exact chart version to install. If this is not specified, the latest version is installed | `string` | `null` | no |
| <a name="input_cleanup_on_fail"></a> [cleanup\_on\_fail](#input\_cleanup\_on\_fail) | Allow deletion of new resources created in this upgrade when upgrade fails | `bool` | `true` | no |
| <a name="input_context"></a> [context](#input\_context) | Single object for setting entire context at once.<br>See description of individual variables for details.<br>Leave string and numeric variables as `null` to use default value.<br>Individual variable settings (non-null) override settings in context object,<br>except for attributes, tags, and additional\_tag\_map, which are merged. | `any` | <pre>{<br> "additional_tag_map": {},<br> "attributes": [],<br> "delimiter": null,<br> "descriptor_formats": {},<br> "enabled": true,<br> "environment": null,<br> "id_length_limit": null,<br> "label_key_case": null,<br> "label_order": [],<br> "label_value_case": null,<br> "labels_as_tags": [<br> "unset"<br> ],<br> "name": null,<br> "namespace": null,<br> "regex_replace_chars": null,<br> "stage": null,<br> "tags": {},<br> "tenant": null<br>}</pre> | no |
| <a name="input_crd_chart"></a> [crd\_chart](#input\_crd\_chart) | The name of the Karpenter CRD chart to be installed, if `var.crd_chart_enabled` is set to `true`. | `string` | `"karpenter-crd"` | no |
| <a name="input_crd_chart_enabled"></a> [crd\_chart\_enabled](#input\_crd\_chart\_enabled) | `karpenter-crd` can be installed as an independent helm chart to manage the lifecycle of Karpenter CRDs. Set to `true` to install this CRD helm chart before the primary karpenter chart. | `bool` | `false` | no |
| <a name="input_create_namespace"></a> [create\_namespace](#input\_create\_namespace) | Create the namespace if it does not yet exist. Defaults to `false` | `bool` | `null` | no |
| <a name="input_delimiter"></a> [delimiter](#input\_delimiter) | Delimiter to be used between ID elements.<br>Defaults to `-` (hyphen). Set to `""` to use no delimiter at all. | `string` | `null` | no |
| <a name="input_descriptor_formats"></a> [descriptor\_formats](#input\_descriptor\_formats) | Describe additional descriptors to be output in the `descriptors` output map.<br>Map of maps. Keys are names of descriptors. Values are maps of the form<br>`{<br> format = string<br> labels = list(string)<br>}`<br>(Type is `any` so the map values can later be enhanced to provide additional options.)<br>`format` is a Terraform format string to be passed to the `format()` function.<br>`labels` is a list of labels, in order, to pass to `format()` function.<br>Label values will be normalized before being passed to `format()` so they will be<br>identical to how they appear in `id`.<br>Default is `{}` (`descriptors` output will be empty). | `any` | `{}` | no |
6 changes: 4 additions & 2 deletions modules/eks/karpenter/interruption_handler.tf
@@ -2,7 +2,7 @@ locals {
   interruption_handler_enabled = local.enabled && var.interruption_handler_enabled
   interruption_handler_queue_name = module.this.id

-  dns_suffix = data.aws_partition.current.dns_suffix
+  dns_suffix = join("", data.aws_partition.current[*].dns_suffix)

   events = {
     health_event = {
@@ -40,7 +40,9 @@ locals {
   }
 }

-data "aws_partition" "current" {}
+data "aws_partition" "current" {
+  count = local.interruption_handler_enabled ? 1 : 0
+}

 resource "aws_sqs_queue" "interruption_handler" {
   count = local.interruption_handler_enabled ? 1 : 0
12 changes: 4 additions & 8 deletions modules/eks/karpenter/karpenter-crd-upgrade
@@ -2,27 +2,23 @@

 function usage() {
   cat >&2 <<'EOF'
-./karpenter-crd-upgrade <version>
+./karpenter-crd-upgrade

-Use this script to upgrade the Karpenter CRDs by installing or upgrading the karpenter-crd helm chart.
+Use this script to prepare a cluster for karpenter-crd helm chart support by upgrading Karpenter CRDs.

 EOF
 }

 function upgrade() {
-  VERSION="${1}"
-  [[ $VERSION =~ ^v ]] || VERSION="v${VERSION}"
-
   set -x

   kubectl label crd awsnodetemplates.karpenter.k8s.aws provisioners.karpenter.sh app.kubernetes.io/managed-by=Helm --overwrite
   kubectl annotate crd awsnodetemplates.karpenter.k8s.aws provisioners.karpenter.sh meta.helm.sh/release-name=karpenter-crd --overwrite
   kubectl annotate crd awsnodetemplates.karpenter.k8s.aws provisioners.karpenter.sh meta.helm.sh/release-namespace=karpenter --overwrite
-  helm upgrade --install karpenter-crd oci://public.ecr.aws/karpenter/karpenter-crd --version "$VERSION" --namespace karpenter
 }

 if (($# == 0)); then
-  usage
+  upgrade
 else
-  upgrade $1
+  usage
 fi