This document shows how to install different service meshes on an AWS EKS cluster created using LWDE (if you use another provider or creation method, make sure that the configuration meets the product requirements).
- Contributor(s): Oleksandr
- Date Created: 18/03/2024
- Status: Approved
- Version: 1.0
The following Flux HelmRelease template will be used to create clusters in LWDE. Depending on service mesh requirements, some values can be changed; any changes will be highlighted in the Cluster Configuration subsection of the Product section.
HelmRelease manifest
apiVersion: helm.toolkit.fluxcd.io/v2beta2
kind: HelmRelease
metadata:
  name: <cluster-name>
  namespace: flux-system
spec:
  chart:
    spec:
      chart: capi-aws-eks-cluster
      version: 0.1.6
      sourceRef:
        kind: HelmRepository
        name: lw-helm-charts-registry
  interval: 5m0s
  timeout: 15m0s
  targetNamespace: <cluster-name>
  releaseName: <cluster-name>
  install:
    createNamespace: true
  dependsOn:
    - name: capi-providers
  values:
    global:
      variables:
        targetEnv: lwde-production
        baseURL:
          sys: "<cluster-name>.sys.wyer.live"
          mgmt: "mgmt.wyer.live"
    apps:
      enabled: false
    machinePool:
      replicas: 2
    cluster:
      region: us-east-1
      addons:
        - conflictResolution: overwrite
          name: aws-ebs-csi-driver
          version: v1.28.0-eksbuild.1
        - conflictResolution: overwrite
          name: vpc-cni
          version: v1.16.3-eksbuild.2
        - conflictResolution: overwrite
          name: coredns
          version: v1.10.1-eksbuild.7
This section provides instructions on how to install and operate the Istio service mesh in Ambient mode.
This subsection provides installation instructions for Istio Ambient. By following these instructions, you will install the Istio service mesh in Ambient mode together with the Istio Gateway.
- Add Helm repo
  helm repo add istio https://istio-release.storage.googleapis.com/charts
  helm repo update
- Install Istio CRDs
  helm install istio-base istio/base -n istio-system --create-namespace --wait
- Install Istio CNI
  helm install istio-cni istio/cni -n istio-system --set profile=ambient --wait
- Install Istio Discovery service
  helm install istiod istio/istiod -n istio-system --set profile=ambient --wait
- Install Istio ztunnel
  helm install ztunnel istio/ztunnel -n istio-system --wait
- Install Istio Ingress Gateway
  helm install istio-ingress istio/gateway -n istio-ingress --create-namespace --wait
- Verify installation
  # Ensure pods are up and running
  kubectl get pods -n istio-system
  kubectl get pods -n istio-ingress
  # Ensure the LoadBalancer service is created
  kubectl get svc -n istio-ingress istio-ingress
Refer to the official documentation to test Istio Ambient features.
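For example, a namespace can be enrolled into the ambient mesh by labelling it; a minimal sketch, using the default namespace purely for illustration:
# Enroll the namespace in ambient mode (traffic is redirected through ztunnel, no sidecars)
kubectl label namespace default istio.io/dataplane-mode=ambient
# Remove the namespace from the mesh again
kubectl label namespace default istio.io/dataplane-mode-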
List of day 2 operations:
Refer to the official Upgrade Guide for detailed instructions on upgrading.
Istio supports a wide variety of observability tools; you can find more details in the official documentation.
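For instance, a minimal sketch of deploying the sample Prometheus and Kiali addons shipped with the Istio repository (the release-1.21 sample manifests are an assumption; pick the branch matching your Istio version):
kubectl apply -f https://raw.githubusercontent.com/istio/istio/release-1.21/samples/addons/prometheus.yaml
kubectl apply -f https://raw.githubusercontent.com/istio/istio/release-1.21/samples/addons/kiali.yaml
# Open the Kiali dashboard locally (requires istioctl)
istioctl dashboard kiali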
Follow the instructions below to uninstall Istio:
- Uninstall Istio Gateway
  helm delete istio-ingress -n istio-ingress
- Uninstall Istio CNI
  helm delete istio-cni -n istio-system
- Uninstall Istio ztunnel
  helm delete ztunnel -n istio-system
- Uninstall Istio Discovery service
  helm delete istiod -n istio-system
- Uninstall Istio CRDs
  helm delete istio-base -n istio-system
- Delete remaining CRDs
  kubectl get crd -oname | grep --color=never 'istio.io' | xargs kubectl delete
- Delete namespaces
  kubectl delete namespace istio-ingress
  kubectl delete namespace istio-system
This section provides instructions on how to install and operate the Istio service mesh in Sidecar mode.
This subsection provides installation instructions for Istio Sidecar. By following these instructions, you will install the Istio service mesh in Sidecar mode together with the Istio Gateway.
- Add Helm repo
  helm repo add istio https://istio-release.storage.googleapis.com/charts
  helm repo update
- Install Istio CRDs
  helm install istio-base istio/base -n istio-system --create-namespace --set defaultRevision=default
- Install Istio Discovery service
  helm install istiod istio/istiod -n istio-system --wait
- Install Istio Ingress Gateway
  helm install istio-ingress istio/gateway -n istio-ingress --create-namespace --wait
- Verify installation
  # Ensure pods are up and running
  kubectl get pods -n istio-system
  kubectl get pods -n istio-ingress
  # Ensure the LoadBalancer service is created
  kubectl get svc -n istio-ingress istio-ingress
Refer to the official documentation to test Istio features.
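For example, automatic sidecar injection can be enabled per namespace; a minimal sketch, using the default namespace purely for illustration:
# Enable automatic sidecar injection for the namespace
kubectl label namespace default istio-injection=enabled
# Restart workloads so new pods come up with the istio-proxy sidecar
kubectl -n default rollout restart deployment
# Each pod should now report 2/2 containers (app + istio-proxy)
kubectl -n default get pods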
List of day 2 operations:
Refer to the official Upgrade Guide for detailed instructions on upgrading.
Istio supports a wide variety of observability tools; you can find more details in the official documentation.
Follow the instructions below to uninstall Istio:
- Uninstall Istio Gateway
  helm delete istio-ingress -n istio-ingress
- Uninstall Istio Discovery service
  helm delete istiod -n istio-system
- Uninstall Istio CRDs
  helm delete istio-base -n istio-system
- Delete remaining CRDs
  kubectl get crd -oname | grep --color=never 'istio.io' | xargs kubectl delete
- Delete namespaces
  kubectl delete namespace istio-ingress
  kubectl delete namespace istio-system
This section provides instructions on how to install and operate the Cilium service mesh.
To install Cilium and enable the Cilium service mesh on an AWS EKS cluster, the cluster must meet the following requirements:
- The default AWS VPC CNI and the respective addon are disabled
- kube-proxy is disabled
- The cluster supports persistent storage (local/cloud)
The following changes were made to the HelmRelease template used to create the AWS EKS cluster with LWDE:
spec:
  values:
    cluster:
      vpcCni:
        disable: true
      kubeProxy:
        disable: true
      addons:
        - conflictResolution: overwrite
          name: aws-ebs-csi-driver
          version: v1.28.0-eksbuild.1
        - conflictResolution: overwrite
          name: coredns
          version: v1.10.1-eksbuild.7
This section shows how to install the Cilium service mesh; as part of this installation we will also enable the Cilium Gateway API controller and mTLS features.
Note: Cilium recommends tainting the managed node group so that Cilium works properly, but this is not strictly necessary, as the setup below was tested and works properly without it.
Below you can find step-by-step instructions for deploying Cilium Service Mesh on a brand-new AWS EKS cluster.
- Deploy Gateway API CRDs:
  kubectl apply -f https://raw.githubusercontent.com/kubernetes-sigs/gateway-api/v1.0.0/config/crd/standard/gateway.networking.k8s.io_gatewayclasses.yaml
  kubectl apply -f https://raw.githubusercontent.com/kubernetes-sigs/gateway-api/v1.0.0/config/crd/standard/gateway.networking.k8s.io_gateways.yaml
  kubectl apply -f https://raw.githubusercontent.com/kubernetes-sigs/gateway-api/v1.0.0/config/crd/standard/gateway.networking.k8s.io_httproutes.yaml
  kubectl apply -f https://raw.githubusercontent.com/kubernetes-sigs/gateway-api/v1.0.0/config/crd/standard/gateway.networking.k8s.io_referencegrants.yaml
  kubectl apply -f https://raw.githubusercontent.com/kubernetes-sigs/gateway-api/v1.0.0/config/crd/experimental/gateway.networking.k8s.io_grpcroutes.yaml
  kubectl apply -f https://raw.githubusercontent.com/kubernetes-sigs/gateway-api/v1.0.0/config/crd/experimental/gateway.networking.k8s.io_tlsroutes.yaml
- Install local-path-provisioner:
  git clone https://github.com/rancher/local-path-provisioner.git
  cd local-path-provisioner
  helm install local-path-storage --namespace local-path-storage ./deploy/chart/local-path-provisioner --create-namespace --set storageClass.defaultClass=true
  # Verify installation
  kubectl get pods -n local-path-storage
  kubectl get storageClass local-path
- Patch the aws-node DaemonSet to prevent conflicting behavior:
  kubectl -n kube-system patch daemonset aws-node --type='strategic' -p='{"spec":{"template":{"spec":{"nodeSelector":{"io.cilium/aws-node-enabled":"true"}}}}}'
- Add Helm repo
  helm repo add cilium https://helm.cilium.io/
  helm repo update
- Deploy Cilium with the needed configuration:
  # The value of the API_SERVER_IP variable is the public IP or hostname of your API server
  API_SERVER_IP=<api server endpoint>
  API_SERVER_PORT=443
  helm upgrade --install cilium cilium/cilium --version 1.15.1 \
    --namespace kube-system \
    --set eni.enabled=true \
    --set ipam.mode=eni \
    --set egressMasqueradeInterfaces=eth0 \
    --set routingMode=native \
    --set kubeProxyReplacement=true \
    --set k8sServiceHost=${API_SERVER_IP} \
    --set k8sServicePort=${API_SERVER_PORT} \
    --set gatewayAPI.enabled=true \
    --set authentication.mutual.spire.enabled=true \
    --set authentication.mutual.spire.install.enabled=true \
    --set authentication.mutual.spire.install.server.dataStorage.storageClass=local-path \
    --set encryption.enabled=true \
    --set encryption.type=wireguard \
    --set encryption.nodeEncryption=true
  Note: Cilium suggests enabling WireGuard encryption to meet most common encryption requirements.
- Verify Cilium installation:
  cilium status
- Once all pods are up and running, run a connectivity test to ensure everything works properly:
  cilium connectivity test
- Ensure WireGuard is enabled:
  kubectl -n kube-system exec -ti ds/cilium -- bash
  cilium-dbg status | grep Encryption
- Ensure the SPIRE server is healthy:
  kubectl exec -n cilium-spire spire-server-0 -c spire-server -- /opt/spire/bin/spire-server healthcheck
Troubleshooting: if you face problems during installation, please refer to the troubleshooting commands in the official documentation for the respective topic. A wide range of troubleshooting commands is also provided on this troubleshooting page.
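A few commands that are typically a useful starting point (all part of the cilium CLI used above):
# Overall health of the agent, operator and enabled features
cilium status --verbose
# Collect a full diagnostic archive for bug reports
cilium sysdump
# Inspect agent logs directly
kubectl -n kube-system logs ds/cilium --tail=100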
Additionally, you can follow these official examples to test the functionality of enabled features:
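For example, a minimal sketch of exposing a workload through the Cilium Gateway API controller and enforcing mutual authentication on it; the echo Service (port 8080) and the app labels are assumptions for illustration:
apiVersion: gateway.networking.k8s.io/v1
kind: Gateway
metadata:
  name: demo-gateway
spec:
  gatewayClassName: cilium   # implemented by the Cilium Gateway API controller
  listeners:
    - name: http
      protocol: HTTP
      port: 80
---
apiVersion: gateway.networking.k8s.io/v1
kind: HTTPRoute
metadata:
  name: demo-route
spec:
  parentRefs:
    - name: demo-gateway
  rules:
    - backendRefs:
        - name: echo          # assumed existing Service
          port: 8080
---
apiVersion: cilium.io/v2
kind: CiliumNetworkPolicy
metadata:
  name: require-mutual-auth
spec:
  endpointSelector:
    matchLabels:
      app: echo               # assumed workload label
  ingress:
    - fromEndpoints:
        - matchLabels:
            app: client       # assumed client label
      authentication:
        mode: "required"      # enforce SPIRE-backed mutual authentication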
List of day 2 operations:
Refer to the official Upgrade Guide for detailed instructions on upgrading.
By default, Cilium supports metrics collection using Prometheus and Hubble; refer to the Observability page for more details.
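As a minimal sketch, Hubble metrics, the Hubble Relay, and the Hubble UI can be enabled on the release installed above (the metrics list is an illustrative choice):
helm upgrade cilium cilium/cilium --version 1.15.1 \
  --namespace kube-system \
  --reuse-values \
  --set hubble.relay.enabled=true \
  --set hubble.ui.enabled=true \
  --set hubble.metrics.enabled="{dns,drop,tcp,flow}"
# Open the Hubble UI locally
cilium hubble ui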
Cilium can't be fully uninstalled, as it is the cluster CNI and replaces kube-proxy, so the instructions below show how to disable the Cilium service mesh features and uninstall the Cilium prerequisites:
- Upgrade Cilium:
  # The value of the API_SERVER_IP variable is the public IP or hostname of your API server
  API_SERVER_IP=<api server endpoint>
  API_SERVER_PORT=443
  helm upgrade --install cilium cilium/cilium --version 1.15.1 \
    --namespace kube-system \
    --set eni.enabled=true \
    --set ipam.mode=eni \
    --set egressMasqueradeInterfaces=eth0 \
    --set routingMode=native \
    --set kubeProxyReplacement=true \
    --set k8sServiceHost=${API_SERVER_IP} \
    --set k8sServicePort=${API_SERVER_PORT} \
    --set gatewayAPI.enabled=false \
    --set authentication.mutual.spire.enabled=false \
    --set authentication.mutual.spire.install.enabled=false \
    --set encryption.enabled=false \
    --set encryption.nodeEncryption=false
  # Restart pods
  kubectl -n kube-system rollout restart deployment/cilium-operator
  kubectl -n kube-system rollout restart ds/cilium
- Delete Gateway CRDs
  kubectl delete -f https://raw.githubusercontent.com/kubernetes-sigs/gateway-api/v1.0.0/config/crd/standard/gateway.networking.k8s.io_gatewayclasses.yaml
  kubectl delete -f https://raw.githubusercontent.com/kubernetes-sigs/gateway-api/v1.0.0/config/crd/standard/gateway.networking.k8s.io_gateways.yaml
  kubectl delete -f https://raw.githubusercontent.com/kubernetes-sigs/gateway-api/v1.0.0/config/crd/standard/gateway.networking.k8s.io_httproutes.yaml
  kubectl delete -f https://raw.githubusercontent.com/kubernetes-sigs/gateway-api/v1.0.0/config/crd/standard/gateway.networking.k8s.io_referencegrants.yaml
  kubectl delete -f https://raw.githubusercontent.com/kubernetes-sigs/gateway-api/v1.0.0/config/crd/experimental/gateway.networking.k8s.io_grpcroutes.yaml
  kubectl delete -f https://raw.githubusercontent.com/kubernetes-sigs/gateway-api/v1.0.0/config/crd/experimental/gateway.networking.k8s.io_tlsroutes.yaml
- Uninstall local-path-provisioner
  helm uninstall local-path-storage --namespace local-path-storage
- Run a connectivity test to ensure Cilium works properly
  cilium connectivity test
This section provides instructions on how to install and operate the Linkerd service mesh.
This subsection provides installation instructions for Linkerd. By following these instructions, you will install the prerequisites for Linkerd and the Linkerd control plane.
- Prepare local environment
  Add Helm repos
  helm repo add linkerd https://helm.linkerd.io/stable
  helm repo add jetstack https://charts.jetstack.io --force-update
  helm repo update
  Clone the poc-servicemesh2024 GitHub repo
  git clone https://github.com/livewyer-ops/poc-servicemesh2024.git
- Install cert-manager
  helm upgrade -i \
    cert-manager jetstack/cert-manager \
    --namespace cert-manager \
    --create-namespace \
    --version v1.14.3 \
    --set installCRDs=true
  # Ensure pods are up and running
  kubectl get pods -n cert-manager
- Install trust-manager
  helm upgrade -i -n cert-manager trust-manager jetstack/trust-manager --wait
  # Ensure pods are up and running
  kubectl get pods -n cert-manager
- Install Linkerd CRDs
  helm upgrade -i linkerd-crds linkerd/linkerd-crds \
    -n linkerd --create-namespace --version 1.8.0
- Install Bootstrap CA Helm chart
  cd poc-servicemesh2024
  helm upgrade -i -n cert-manager bootstrap-ca ./charts/bootstrap-ca --wait
  # Ensure the trust anchor ConfigMap is created
  kubectl get cm -n linkerd linkerd-identity-trust-roots
- Install Linkerd Control Plane
  helm upgrade -i linkerd-control-plane \
    -n linkerd \
    --set identity.externalCA=true \
    --set identity.issuer.scheme=kubernetes.io/tls \
    linkerd/linkerd-control-plane --version 1.16.11
- Verify the Linkerd installation and mTLS on Linkerd pods
  linkerd check
  linkerd viz -n linkerd edges deployment
Refer to the official documentation to test Linkerd features.
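For example, workloads can be added to the mesh by annotating their namespace or by injecting manifests at apply time; a minimal sketch, where the default namespace and deployment.yaml are used purely for illustration:
# Option 1: enable automatic proxy injection for a namespace
kubectl annotate namespace default linkerd.io/inject=enabled
kubectl -n default rollout restart deployment
# Option 2: inject the proxy into a single manifest at apply time
linkerd inject deployment.yaml | kubectl apply -f -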
List of day 2 operations:
Additionally, here are some tips to keep in mind before introducing Linkerd in production.
Refer to the official Upgrade Guide for detailed instructions on upgrading.
Linkerd has a viz extension that uses Prometheus as a scraping tool and allows you to deploy the Linkerd Dashboard and watch service mesh metrics.
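A minimal sketch of installing the viz extension and opening the dashboard:
# Install the viz extension into the cluster
linkerd viz install | kubectl apply -f -
linkerd viz check
# Open the dashboard in a local browser
linkerd viz dashboard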
Additionally, Linkerd supports different observability configurations such as:
Follow the instructions below to uninstall Linkerd and its prerequisites:
- Uninstall Linkerd Control Plane
  helm uninstall linkerd-control-plane -n linkerd
- Uninstall Linkerd CRDs
  helm uninstall linkerd-crds -n linkerd
- Delete Issuers and Certificates
  helm uninstall bootstrap-ca -n cert-manager
- Uninstall Trust Manager
  helm uninstall trust-manager -n cert-manager
- Uninstall Cert Manager
  # Ensure no cert-manager CRs are left
  kubectl get Issuers,ClusterIssuers,Certificates,CertificateRequests,Orders,Challenges --all-namespaces
  # Uninstall the Helm release
  helm --namespace cert-manager delete cert-manager
- Delete namespaces and remaining CRDs
  kubectl delete namespace cert-manager
  kubectl delete namespace linkerd
  kubectl delete crd bundles.trust.cert-manager.io
Using the above installation method, we automated part of the control plane TLS certificate management, but we still need to renew the trust anchor once it is about to expire. You can refer to this doc to get an understanding of the trust anchor rotation process. Given the complexity of this process, Linkerd recommends using a trust anchor with a 10-year expiration period.
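For illustration only (in this setup the trust anchor is managed by the bootstrap-ca chart): a 10-year trust anchor can be generated with the step CLI, as shown in the Linkerd docs:
# Create a root CA valid for 10 years (87600h)
step certificate create root.linkerd.cluster.local ca.crt ca.key \
  --profile root-ca --no-password --insecure \
  --not-after=87600h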
Linkerd automatically renews Webhook TLS certificates after each update, but you can delegate this task or rotate them manually as explained in the official documentation.