Automated multi-region Azure Kubernetes Service (AKS) cluster setup configured for Ray workloads with GPU support.
This repository provides infrastructure-as-code automation to deploy production-ready AKS clusters across multiple Azure regions. Each cluster is configured with:
- Azure CNI with Overlay networking
- OIDC and Workload Identity for secure authentication
- Three node pools: system, CPU (on-demand), and GPU (spot instances with A100 GPUs)
- NGINX Ingress Controller with LoadBalancer
- NVIDIA Device Plugin for GPU workload scheduling
- Anyscale Operator for AI/ML workload orchestration
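
To make the GPU spot pool in the list above more concrete, here is a minimal sketch of the kind of `az aks nodepool add` call that creates one. It only prints the command instead of running it. The pool name `gpu`, the cluster name `my-aks-eastus`, the node count, and the VM size `Standard_NC24ads_A100_v4` are illustrative assumptions, not values taken from setup.sh.

```shell
# Print (not run) an az command that would add a spot-priced GPU node pool.
# All names and sizes passed in below are hypothetical examples.
gpu_pool_cmd() {
  # $1 = resource group, $2 = cluster name, $3 = VM size
  echo "az aks nodepool add --resource-group $1 --cluster-name $2 --name gpu" \
       "--node-vm-size $3 --node-count 1" \
       "--priority Spot --eviction-policy Delete --spot-max-price -1"
}

gpu_pool_cmd my-rg my-aks-eastus Standard_NC24ads_A100_v4
```

A `--spot-max-price` of `-1` means the pool is not evicted on price and pays up to the on-demand rate. AKS also taints spot nodes with `kubernetes.azure.com/scalesetpriority=spot:NoSchedule`, so GPU workloads need a matching toleration.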
Prerequisites:
- Azure CLI (az) installed and authenticated
- Anyscale CLI installed with credentials at ~/.anyscale/credentials.json
- Helm 3.x
- jq for JSON parsing
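
The tools listed above can be checked before running anything. The following is a rough preflight sketch, not part of the repository: the helper name `missing_tools` is hypothetical, and the check assumes the Anyscale CLI is on the PATH as `anyscale`.

```shell
# Print each command from the argument list that is not found on PATH.
missing_tools() {
  for _tool in "$@"; do
    command -v "$_tool" >/dev/null 2>&1 || printf '%s\n' "$_tool"
  done
}

# Report (without exiting) on the required tools.
missing="$(missing_tools az anyscale helm jq)"
if [ -n "$missing" ]; then
  echo "Missing tools: $(echo $missing)" >&2
fi

# The Azure CLI must also be logged in.
if command -v az >/dev/null 2>&1 && ! az account show >/dev/null 2>&1; then
  echo "Azure CLI is installed but not logged in; run 'az login'" >&2
fi

# The Anyscale CLI is expected to read its credentials from this path.
if [ ! -f "$HOME/.anyscale/credentials.json" ]; then
  echo "Anyscale credentials not found at ~/.anyscale/credentials.json" >&2
fi
```

The sketch only prints warnings to stderr. A stricter variant could exit with a non-zero status as soon as any check fails.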
Setup steps:
- Clone the repository:
    git clone <repository-url>
    cd aks-anyscale
- (Optional) Configure environment variables:
    export ANYSCALE_CLOUD_NAME="my-cloud-name"
    export REGIONS="eastus westus"
    export PRIMARY_REGION="eastus"
    export SUBSCRIPTION="your-subscription-id"
    export RESOURCE_GROUP="my-rg"
- Run the setup script:
    ./setup.sh

The script provisions all infrastructure, including:
- Resource group and storage account
- Virtual networks with NAT gateways
- AKS clusters with three node pools
- Kubernetes add-ons (NGINX, NVIDIA Device Plugin)
- Anyscale Operator installation and cloud registration
Default configuration can be overridden via environment variables. See setup.sh for all available options.
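
As an illustration of how environment-variable overrides like these usually work in a shell script, the sketch below sets a default only when the variable is not already set, then walks the space-separated region list. The default values and the `describe_regions` helper are assumptions for illustration; setup.sh defines the real defaults and logic.

```shell
# Hypothetical defaults: each one applies only if the variable is unset or empty.
: "${REGIONS:=eastus westus}"
: "${PRIMARY_REGION:=eastus}"
: "${RESOURCE_GROUP:=my-rg}"

# Print one line per region, marking which one is the primary.
describe_regions() {
  for region in $REGIONS; do
    if [ "$region" = "$PRIMARY_REGION" ]; then
      echo "$region (primary)"
    else
      echo "$region (secondary)"
    fi
  done
}

describe_regions
```

With this pattern, running `REGIONS="eastus westus northeurope" ./setup.sh` would override the region list for that invocation without editing the script.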
Configuration files for Kubernetes components:
- values_nginx.yaml: NGINX Ingress Controller settings
- values_nvidia.yaml: NVIDIA Device Plugin settings
- values_anyscale.yaml: Anyscale Operator custom instance types
- cloud_resource.yaml: Template for secondary region registration
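
Values files like these are typically passed to Helm with `-f`. The sketch below prints (rather than runs) the kind of command involved. The release names and namespaces are hypothetical, and the chart references assume the upstream `ingress-nginx` and `nvdp` Helm repositories have already been added; setup.sh may install the charts differently.

```shell
# Print (not run) a helm command that applies one values file to a chart.
helm_cmd() {
  # $1 = release name, $2 = chart reference, $3 = namespace, $4 = values file
  echo "helm upgrade --install $1 $2 --namespace $3 --create-namespace -f $4"
}

# Release names and namespaces here are illustrative assumptions.
helm_cmd ingress-nginx ingress-nginx/ingress-nginx ingress-nginx values_nginx.yaml
helm_cmd nvidia-device-plugin nvdp/nvidia-device-plugin nvidia-device-plugin values_nvidia.yaml
```

`helm upgrade --install` is idempotent: it installs the release on the first run and upgrades it on later runs, which suits a setup script that may be re-run.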