feat: --copy-labels and GCP support (#95)
This PR introduces two new features:

1. Support for Google Cloud Platform (GCP). Set with `-cloud gcp`. AWS
(`-cloud aws`) is the default if not specified.
2. New `-copy-labels` flag. When used, this flag will copy the specified
labels (optionally all labels with `'*'`) from the PVC to the cloud disk
volume.

GCP labels have a different set of constraints than k8s labels and AWS
tags. The biggest difference is that `.` and `/` are not allowed, so a
k8s label like `app.kubernetes.io/name` will be converted to
`app-kubernetes-io_name` when applied to a GCP PD volume.

Also some small refactors:
- Converted provisioner magic strings like `ebs.csi.aws.com` to
constants like `AWS_EBS_CSI`
- Opportunistically converted some if/else blocks to switch statements
joemiller committed May 28, 2024
1 parent 60eedb0 commit 4a09989
Showing 11 changed files with 1,163 additions and 159 deletions.
39 changes: 39 additions & 0 deletions README.md
@@ -18,6 +18,8 @@ The `k8s-pvc-tagger` watches for new PersistentVolumeClaims and when new AWS EBS

`--allow-all-tags` - Allow all tags to be set via the PVC; even those used by the EBS/EFS controllers. Use with caution!

`--copy-labels` - A CSV-encoded list of label keys from the PVC that will be used to set tags on Volumes. Use `*` to copy all labels from the PVC.

#### Annotations

`k8s-pvc-tagger/ignore` - When this annotation is set (to any value), the PVC is ignored and no tags are added to it
@@ -36,6 +38,10 @@ NOTE: Until version `v1.2.0` the legacy annotation prefix of `aws-ebs-tagger` wi

4. The cmdline arg `--default-tags={"me": "touge"}` and the annotation `k8s-pvc-tagger/tags: | {"cost-center": "abc", "environment": "prod"}` will create the tags `me=touge`, `cost-center=abc` and `environment=prod` on the EBS/EFS Volume

5. The cmdline arg `--copy-labels '*'` will create a tag from each label on the PVC, except those used by the controllers unless `--allow-all-tags` is specified.

6. The cmdline arg `--copy-labels 'cost-center,environment'` will copy the `cost-center` and `environment` labels from the PVC onto the cloud volume.

#### ignored tags

The following tags are ignored by default
@@ -72,12 +78,45 @@ metadata:
{"OwnerID": "{{ .Namespace }}/{{ .Name }}"}
```
### Multi-cloud support

Currently supported clouds: AWS, GCP.

Only one mode is active at a time. Specify the cloud `k8s-pvc-tagger` is running in with the `--cloud` flag: either `aws` or `gcp`.

If not specified, `--cloud aws` is the default mode.

> NOTE: GCP labels have constraints that do not match the constraints allowed by Kubernetes labels. When running in GCP mode, labels will be modified to fit GCP's constraints, if necessary. The main difference is that `.` and `/` are not allowed, so a label such as `dom.tld/key` will be converted to `dom-tld_key`.

### Installation

#### AWS IAM Role

You need to create an AWS IAM Role that can be used by `k8s-pvc-tagger`. For EKS clusters, an [IAM Role for Service Accounts](https://docs.aws.amazon.com/eks/latest/userguide/iam-roles-for-service-accounts-technical-overview.html) should be used instead of using an AWS access key/secret. For non-EKS clusters, I recommend using a tool like [kube2iam](https://github.com/jtblin/kube2iam). An example policy is in [examples/iam-role.json](examples/iam-role.json).

#### GCP Service Account

You need a GCP Service Account (GSA) that can be used by `k8s-pvc-tagger`. For GKE clusters, [Workload Identity](https://cloud.google.com/kubernetes-engine/docs/how-to/workload-identity) should be used instead of a static JSON key.

It is recommended that you create a custom IAM role for use by `k8s-pvc-tagger`. The permissions needed are:

- compute.disks.get
- compute.disks.list
- compute.disks.setLabels

An example Terraform resource is in [examples/gcp-custom-role.tf](examples/gcp-custom-role.tf).

Or, with `gcloud`:

```sh
gcloud iam roles create CustomDiskRole \
--project=<your-project-id> \
--title="k8s-pvc-tagger" \
--description="Custom role to manage disk permissions" \
--permissions="compute.disks.get,compute.disks.list,compute.disks.setLabels" \
--stage="GA"
```

#### Install via helm

```
8 changes: 2 additions & 6 deletions aws.go
@@ -35,10 +35,8 @@ import (
log "github.com/sirupsen/logrus"
)

var (
// awsSession the AWS Session
awsSession *session.Session
)
// awsSession the AWS Session
var awsSession *session.Session

const (
// Matching strings for region
@@ -215,7 +213,6 @@ func (client *FSxClient) addFSxVolumeTags(volumeID string, tags map[string]strin
ResourceARN: describeFileSystemOutput.FileSystems[0].ResourceARN,
Tags: convertTagsToFSxTags(tags),
})

if err != nil {
log.Errorln("Could not FSx create tags for volumeID:", volumeID, err)
promActionsTotal.With(prometheus.Labels{"status": "error", "storageclass": storageclass}).Inc()
@@ -240,7 +237,6 @@ func (client *FSxClient) deleteFSxVolumeTags(volumeID string, tags []*string, st
ResourceARN: describeVolumesOutput.Volumes[0].ResourceARN,
TagKeys: tags,
})

if err != nil {
log.Errorln("Could not FSx delete tags for volumeID:", volumeID, err)
promActionsTotal.With(prometheus.Labels{"status": "error", "storageclass": storageclass}).Inc()
11 changes: 11 additions & 0 deletions examples/gcp-custom-role.tf
@@ -0,0 +1,11 @@
resource "google_project_iam_custom_role" "k8s-pvc-tagger" {
project = var.gcp_project
role_id = "k8s-pvc-tagger"
title = "k8s-pvc-tagger"
description = "A Custom role with minimum permission set for k8s-pvc-tagger"
permissions = [
"compute.disks.get",
"compute.disks.list",
"compute.disks.setLabels",
]
}
211 changes: 211 additions & 0 deletions gcp.go
@@ -0,0 +1,211 @@
package main

import (
"context"
"fmt"
"maps"
"strings"
"time"

"github.com/prometheus/client_golang/prometheus"
log "github.com/sirupsen/logrus"
"google.golang.org/api/compute/v1"
"k8s.io/apimachinery/pkg/util/wait"
)

type GCPClient interface {
GetDisk(project, zone, name string) (*compute.Disk, error)
SetDiskLabels(project, zone, name string, labelReq *compute.ZoneSetLabelsRequest) (*compute.Operation, error)
GetGCEOp(project, zone, name string) (*compute.Operation, error)
}

type gcpClient struct {
gce *compute.Service
}

func newGCPClient(ctx context.Context) (GCPClient, error) {
client, err := compute.NewService(ctx)
if err != nil {
return nil, err
}
return &gcpClient{gce: client}, nil
}

func (c *gcpClient) GetDisk(project, zone, name string) (*compute.Disk, error) {
return c.gce.Disks.Get(project, zone, name).Do()
}

func (c *gcpClient) SetDiskLabels(project, zone, name string, labelReq *compute.ZoneSetLabelsRequest) (*compute.Operation, error) {
return c.gce.Disks.SetLabels(project, zone, name, labelReq).Do()
}

func (c *gcpClient) GetGCEOp(project, zone, name string) (*compute.Operation, error) {
return c.gce.ZoneOperations.Get(project, zone, name).Do()
}

func addPDVolumeLabels(c GCPClient, volumeID string, labels map[string]string, storageclass string) {
sanitizedLabels := sanitizeLabelsForGCP(labels)
log.Debugf("labels to add to PD volume: %s: %s", volumeID, sanitizedLabels)

project, location, name, err := parseVolumeID(volumeID)
if err != nil {
log.Error(err)
return
}
disk, err := c.GetDisk(project, location, name)
if err != nil {
log.Error(err)
return
}

// merge existing disk labels with new labels:
updatedLabels := make(map[string]string)
if disk.Labels != nil {
updatedLabels = maps.Clone(disk.Labels)
}
maps.Copy(updatedLabels, sanitizedLabels)
if maps.Equal(disk.Labels, updatedLabels) {
log.Debug("labels already set on PD")
return
}

req := &compute.ZoneSetLabelsRequest{
Labels: updatedLabels,
LabelFingerprint: disk.LabelFingerprint,
}
op, err := c.SetDiskLabels(project, location, name, req)
if err != nil {
log.Errorf("failed to set labels on PD: %s", err)
promActionsTotal.With(prometheus.Labels{"status": "error", "storageclass": storageclass}).Inc()
return
}

waitForCompletion := func(_ context.Context) (bool, error) {
resp, err := c.GetGCEOp(project, location, op.Name)
if err != nil {
return false, fmt.Errorf("failed to set labels on PD %s: %s", disk.Name, err)
}
return resp.Status == "DONE", nil
}
if err := wait.PollUntilContextTimeout(context.TODO(),
time.Second,
time.Minute,
false,
waitForCompletion); err != nil {
log.Errorf("set label operation failed: %s", err)
promActionsTotal.With(prometheus.Labels{"status": "error", "storageclass": storageclass}).Inc()
return
}

log.Debug("successfully set labels on PD")
promActionsTotal.With(prometheus.Labels{"status": "success", "storageclass": storageclass}).Inc()
}

func deletePDVolumeLabels(c GCPClient, volumeID string, keys []string, storageclass string) {
if len(keys) == 0 {
return
}
sanitizedKeys := sanitizeKeysForGCP(keys)
log.Debugf("labels to delete from PD volume: %s: %s", volumeID, sanitizedKeys)

project, location, name, err := parseVolumeID(volumeID)
if err != nil {
log.Error(err)
return
}
disk, err := c.GetDisk(project, location, name)
if err != nil {
log.Error(err)
return
}
// if disk.Labels is nil, then there are no labels to delete
if disk.Labels == nil {
return
}

updatedLabels := maps.Clone(disk.Labels)
for _, k := range sanitizedKeys {
delete(updatedLabels, k)
}
if maps.Equal(disk.Labels, updatedLabels) {
return
}

req := &compute.ZoneSetLabelsRequest{
Labels: updatedLabels,
LabelFingerprint: disk.LabelFingerprint,
}
op, err := c.SetDiskLabels(project, location, name, req)
if err != nil {
log.Errorf("failed to delete labels from PD: %s", err)
promActionsTotal.With(prometheus.Labels{"status": "error", "storageclass": storageclass}).Inc()
return
}

waitForCompletion := func(_ context.Context) (bool, error) {
resp, err := c.GetGCEOp(project, location, op.Name)
if err != nil {
return false, fmt.Errorf("failed to retrieve status of label update operation: %s", err)
}
return resp.Status == "DONE", nil
}
if err := wait.PollUntilContextTimeout(context.TODO(),
time.Second,
time.Minute,
false,
waitForCompletion); err != nil {
promActionsTotal.With(prometheus.Labels{"status": "error", "storageclass": storageclass}).Inc()
log.Errorf("delete label operation failed: %s", err)
return
}

log.Debug("successfully deleted labels from PD")
promActionsTotal.With(prometheus.Labels{"status": "success", "storageclass": storageclass}).Inc()
}

func parseVolumeID(id string) (string, string, string, error) {
parts := strings.Split(id, "/")
if len(parts) < 6 {
return "", "", "", fmt.Errorf("invalid volume handle format")
}
project := parts[1]
location := parts[3]
name := parts[5]
return project, location, name, nil
}

func sanitizeLabelsForGCP(labels map[string]string) map[string]string {
newLabels := make(map[string]string, len(labels))
for k, v := range labels {
newLabels[sanitizeKeyForGCP(k)] = sanitizeValueForGCP(v)
}
return newLabels
}

func sanitizeKeysForGCP(keys []string) []string {
newKeys := make([]string, len(keys))
for i, k := range keys {
newKeys[i] = sanitizeKeyForGCP(k)
}
return newKeys
}

// sanitizeKeyForGCP sanitizes a Kubernetes label key to fit GCP's label key constraints
func sanitizeKeyForGCP(key string) string {
key = strings.ToLower(key)
key = strings.NewReplacer("/", "_", ".", "-").Replace(key) // Replace disallowed characters
key = strings.TrimRight(key, "-_") // Ensure it does not end with '-' or '_'

if len(key) > 63 {
key = key[:63]
}
return key
}

// sanitizeValueForGCP sanitizes a Kubernetes label value to fit GCP's label value constraints
func sanitizeValueForGCP(value string) string {
if len(value) > 63 {
value = value[:63]
}
return value
}
