This guide is focused on deploying Scylla on GKE with maximum performance (without any persistence guarantees). It sets up the kubelets on GKE nodes to run with the static CPU Manager policy and uses local SSD disks in RAID0 for maximum performance.
Most of the commands used to set up the Scylla cluster are the same for all environments. As such, we have kept them separate, in the general guide.
If you don't want to run the commands step-by-step, you can just run a script that will set everything up for you:
# Edit according to your preference
GCP_USER=$(gcloud config list account --format "value(core.account)")
GCP_PROJECT=$(gcloud config list project --format "value(core.project)")
GCP_ZONE=us-west1-b
# From inside the examples/gke folder
cd examples/gke
./gke.sh -u "$GCP_USER" -p "$GCP_PROJECT" -z "$GCP_ZONE"
# Example:
# ./gke.sh -u user@example.com -p gke-demo-226716 -z us-west1-b
After you deploy, see how you can benchmark your cluster with cassandra-stress.
First of all, we export all the configuration options as environment variables. Edit according to your own environment.
GCP_USER=$( gcloud config list account --format "value(core.account)" )
GCP_PROJECT=$( gcloud config list project --format "value(core.project)" )
GCP_REGION=us-west1
GCP_ZONE=us-west1-b
CLUSTER_NAME=scylla-demo
CLUSTER_VERSION=$( gcloud container get-server-config --zone "${GCP_ZONE}" --format "value(validMasterVersions[0])" )
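If gcloud has no default account or project configured, the command substitutions above silently produce empty strings, and the later commands fail in confusing ways. The bash helper below is a minimal sketch for failing fast; the function name `require_vars` is hypothetical and not part of the examples folder.

```shell
#!/usr/bin/env bash
# Hypothetical helper: report every named variable that is empty or unset.
# Returns non-zero if at least one of them is missing.
require_vars() {
  local name missing=0
  for name in "$@"; do
    # ${!name} is bash indirect expansion: the value of the variable whose
    # name is stored in $name. ":-" keeps this safe under "set -u".
    if [ -z "${!name:-}" ]; then
      echo "error: ${name} is empty or unset" >&2
      missing=1
    fi
  done
  return "${missing}"
}

# Usage, after setting the variables above:
# require_vars GCP_USER GCP_PROJECT GCP_REGION GCP_ZONE CLUSTER_NAME CLUSTER_VERSION
```

In a script, follow the call with `|| exit 1` so that nothing else runs when a value is missing.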
Next, change the kubelet CPU Manager policy to static by providing a config file. Create a file called systemconfig.yaml with the following content:
kubeletConfig:
  cpuManagerPolicy: static
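If you prefer to create the file from the shell, a heredoc like the one below writes that configuration with the nesting GKE expects. This is just one convenient way to produce the file; any editor works equally well.

```shell
#!/usr/bin/env bash
# Write the kubelet system config consumed by --system-config-from-file.
# The two-space indentation matters: cpuManagerPolicy must be nested
# under kubeletConfig, otherwise the setting is not picked up.
cat > systemconfig.yaml <<'EOF'
kubeletConfig:
  cpuManagerPolicy: static
EOF
```

The quoted `'EOF'` delimiter prevents the shell from expanding anything inside the heredoc, so the file content is written exactly as shown.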
Then we'll create a GKE cluster with the following node pools:

- A NodePool of 2 n1-standard-8 Nodes, where the operator and the monitoring stack will be deployed. These are generic Nodes and their free capacity can be used for other purposes.

  gcloud container --project "${GCP_PROJECT}" \
    clusters create "${CLUSTER_NAME}" \
    --zone "${GCP_ZONE}" \
    --cluster-version "${CLUSTER_VERSION}" \
    --node-version "${CLUSTER_VERSION}" \
    --machine-type "n1-standard-8" \
    --num-nodes "2" \
    --disk-type "pd-ssd" --disk-size "20" \
    --image-type "UBUNTU_CONTAINERD" \
    --system-config-from-file=systemconfig.yaml \
    --enable-stackdriver-kubernetes \
    --no-enable-autoupgrade \
    --no-enable-autorepair
- A NodePool of 2 n1-standard-32 Nodes to deploy cassandra-stress later on.

  gcloud container --project "${GCP_PROJECT}" \
    node-pools create "cassandra-stress-pool" \
    --cluster "${CLUSTER_NAME}" \
    --zone "${GCP_ZONE}" \
    --node-version "${CLUSTER_VERSION}" \
    --machine-type "n1-standard-32" \
    --num-nodes "2" \
    --disk-type "pd-ssd" --disk-size "20" \
    --node-taints role=cassandra-stress:NoSchedule \
    --image-type "UBUNTU_CONTAINERD" \
    --no-enable-autoupgrade \
    --no-enable-autorepair
- A NodePool of 4 n1-standard-32 Nodes, where the Scylla Pods will be deployed. Each of these Nodes has 8 local SSDs attached, which are combined into a RAID0 array by the gcloud beta feature ephemeral-storage. The same systemconfig.yaml is passed so that these Nodes also use the static CPU Manager policy. It is important to disable autoupgrade and autorepair: automatic cluster upgrade or node repair has a hard timeout, after which it no longer respects PodDisruptionBudgets (PDBs) and force-deletes the Compute Engine instances, which also deletes all data on the local SSDs. For now, it's better to handle upgrades manually, with more control over the process and error handling.

  gcloud beta container --project "${GCP_PROJECT}" \
    node-pools create "scylla-pool" \
    --cluster "${CLUSTER_NAME}" \
    --zone "${GCP_ZONE}" \
    --node-version "${CLUSTER_VERSION}" \
    --machine-type "n1-standard-32" \
    --num-nodes "4" \
    --disk-type "pd-ssd" --disk-size "20" \
    --ephemeral-storage local-ssd-count="8" \
    --node-taints role=scylla-clusters:NoSchedule \
    --node-labels scylla.scylladb.com/gke-ephemeral-storage-local-ssd=true \
    --image-type "UBUNTU_CONTAINERD" \
    --system-config-from-file=systemconfig.yaml \
    --no-enable-autoupgrade \
    --no-enable-autorepair
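As a rough sizing aid, the raw capacity of the Scylla pool follows directly from the numbers above. This sketch assumes 375 GB per GCE local SSD and 32 vCPUs per n1-standard-32; the variable names are illustrative only. RAID0 stripes data without redundancy, so the array size is the sum of its disks, and losing any one disk loses the whole array on that Node.

```shell
#!/usr/bin/env bash
# Rough raw capacity of the scylla-pool described above.
# Assumptions: 375 GB per GCE local SSD, 32 vCPUs per n1-standard-32.
NODES=4
SSDS_PER_NODE=8
SSD_GB=375
VCPUS_PER_NODE=32

# RAID0 has no parity or mirroring, so its size is simply the sum of the disks.
RAID0_GB_PER_NODE=$(( SSDS_PER_NODE * SSD_GB ))
TOTAL_RAW_GB=$(( NODES * RAID0_GB_PER_NODE ))
TOTAL_VCPUS=$(( NODES * VCPUS_PER_NODE ))

echo "RAID0 array per node: ${RAID0_GB_PER_NODE} GB"
echo "Raw storage in pool:  ${TOTAL_RAW_GB} GB"
echo "vCPUs in pool:        ${TOTAL_VCPUS}"
```

These are upper bounds: filesystem overhead, the Scylla replication factor, and the CPU and memory reserved for the kubelet and system daemons all reduce what is actually usable by Scylla Pods.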
Next, set up RBAC, since by default GKE doesn't give you the necessary RBAC permissions.
Get the credentials for your new cluster:
gcloud container clusters get-credentials "${CLUSTER_NAME}" --zone="${GCP_ZONE}"
Create a ClusterRoleBinding for your user.
For this to work, you need at least the container.clusterRoleBindings.create permission.
The easiest way to obtain this permission is to grant your user the Kubernetes Engine Admin role in the GCP IAM web interface.
kubectl create clusterrolebinding cluster-admin-binding --clusterrole cluster-admin --user "${GCP_USER}"
If you don't have Helm installed, go to the Helm docs to get the binary for your distro.
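Before continuing, it can help to confirm that every CLI this guide calls is on your PATH. The function below is a small, hypothetical helper (not shipped with the examples) that prints the name of each tool it cannot find, using the POSIX `command -v` lookup.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print each named tool that is not found on PATH.
missing_tools() {
  local tool
  for tool in "$@"; do
    command -v "$tool" >/dev/null 2>&1 || echo "$tool"
  done
}

# Usage: empty output means everything is installed.
# missing_tools gcloud kubectl helm
```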
To run Scylla, the ephemeral storage's filesystem must be converted to xfs. Deploy the xfs-formatter DaemonSet to take care of this.
Unfortunately, GKE is only able to provision the ephemeral storage with the ext4 filesystem, while Scylla requires xfs. The xfs-formatter DaemonSet formats the storage as xfs and prevents GKE from reformatting it back to ext4.
Note that despite our best efforts, this solution is only a workaround. Its robustness depends on GKE's disk formatting logic remaining unchanged, for which there is no guarantee.
kubectl apply -f examples/gke/xfs-formatter-daemonset.yaml
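To confirm that the DaemonSet did its job, you can check the filesystem type of the RAID0 mount point on a Scylla Node (for example over SSH). The helper below is a sketch that assumes Linux with GNU coreutils `stat`, whose `-f -c %T` prints the filesystem type by name; the function name and the mount path you pass it are placeholders you need to adapt to your Nodes.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeed only if the filesystem backing the given path
# is xfs. Uses GNU stat's filesystem mode; a missing path yields "not xfs".
is_xfs() {
  [ "$(stat -f -c %T "$1" 2>/dev/null)" = "xfs" ]
}

# Usage (the mount point below is a placeholder):
# is_xfs /path/to/raid0/mount && echo "xfs" || echo "not xfs"
```

Note that GNU `stat` reports ext4 as `ext2/ext3`, so a Node where the formatter has not run yet will simply fail the check.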
Afterwards, deploy the local volume provisioner, which will discover the RAID0 arrays' mount points and make them available as PersistentVolumes.
helm install local-provisioner examples/common/provisioner
For the example to work, you need to modify the cluster definition as follows:
sed -i "s/<gcp_region>/${GCP_REGION}/g;s/<gcp_zone>/${GCP_ZONE}/g" examples/gke/cluster.yaml
This injects your region and zone into the cluster definition so that it matches the Kubernetes cluster you just created.
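To see exactly what the `sed` expression does, you can run it against a throwaway file first. The YAML fragment below is hypothetical and only stands in for the placeholders; the real examples/gke/cluster.yaml contains much more. Also note that GNU sed accepts a bare `-i`, while BSD/macOS sed requires `-i ''`.

```shell
#!/usr/bin/env bash
# Apply the same placeholder substitution to a temporary sample file.
GCP_REGION=us-west1
GCP_ZONE=us-west1-b

sample=$(mktemp)
trap 'rm -f "$sample"' EXIT   # clean up the sample when the shell exits

# Hypothetical fragment containing the two placeholders.
cat > "$sample" <<'EOF'
datacenter:
  name: <gcp_region>
  racks:
  - name: <gcp_zone>
EOF

# Same expression as in the guide; "g" replaces every occurrence on a line.
sed -i "s/<gcp_region>/${GCP_REGION}/g;s/<gcp_zone>/${GCP_ZONE}/g" "$sample"
cat "$sample"
```

Because the placeholders are gone after the first run, re-running the command on the real file changes nothing; to switch region or zone later, edit the values in the file directly.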
Now you can follow the generic guide to install the operator and launch your Scylla cluster in a highly performant environment.
Instructions on how to access the database can also be found in the generic guide.
Once you are done with your experiments, delete your cluster using the following command:
gcloud container --project "${GCP_PROJECT}" clusters delete --zone "${GCP_ZONE}" "${CLUSTER_NAME}"