wandb/terraform-azurerm-wandb

Weights & Biases Azure Module

This is a Terraform module for provisioning a Weights & Biases Cluster on Azure. Weights & Biases Server is our self-hosted distribution of wandb.ai. It offers enterprises a private instance of the Weights & Biases application, with no resource limits and with additional enterprise-grade architectural features like audit logging and single sign-on.

About This Module

Pre-requisites

This module is intended to run in an Azure account with minimal preparation. It has the following prerequisites:

Terraform version >= 1

Credentials / Permissions

How to Use This Module

Cluster Sizing

By default, the Kubernetes instance type, the number of instances, the Redis cluster size, and the database instance size are standardized by the configurations in ./deployment-size.tf and selected through the size input variable.

Available sizes are small, medium, large, xlarge, and xxlarge. The default is small.

All the values set via deployment-size.tf can be overridden by setting the appropriate input variables.

  • kubernetes_instance_type - The instance type for the AKS nodes
  • kubernetes_min_node_per_az - The minimum number of nodes in the AKS cluster
  • kubernetes_max_node_per_az - The maximum number of nodes in the AKS cluster
  • redis_capacity - The size of the Redis instance
  • database_sku_name - The SKU name for the MySQL database
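
The override behaviour described above could look roughly like the following sketch. The registry address `wandb/wandb/azurerm`, the variable name `wandb_license`, and the region and instance type values are assumptions chosen for illustration and are not taken from this README.

```hcl
# Hypothetical variable holding the license string (name is an assumption).
variable "wandb_license" {
  type      = string
  sensitive = true
}

module "wandb" {
  # Assumed registry address for this module; adjust to the source you use.
  source = "wandb/wandb/azurerm"

  # Required inputs (values are placeholders).
  namespace = "wandb-demo"
  location  = "eastus2"
  license   = var.wandb_license

  # Start from a standard size defined in deployment-size.tf ...
  size = "medium"

  # ... then override individual values. Inputs left unset keep
  # the value that the chosen size provides.
  kubernetes_instance_type   = "Standard_D8s_v3" # illustrative VM size
  kubernetes_max_node_per_az = 4
}
```

Here redis_capacity and database_sku_name are left unset, so they stay null and take the values for the medium size.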

Examples

We have included documentation and reference examples for additional common installation scenarios for Weights & Biases, as well as examples for supporting resources that lack official modules.

  • Route

Requirements

| Name | Version |
|------|---------|
| terraform | ~> 1.0 |
| azapi | ~> 1.0 |
| azurerm | ~> 3.17 |
| helm | ~> 2.6 |
| kubernetes | ~> 2.23 |
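
If the root configuration should pin the same versions, the constraints above could be mirrored as in this sketch. The provider source addresses for azurerm, helm, and kubernetes are the usual registry namespaces and are an assumption here, since this README only spells out the source for azapi.

```hcl
terraform {
  required_version = "~> 1.0"

  required_providers {
    azapi = {
      source  = "azure/azapi"
      version = "~> 1.0"
    }
    azurerm = {
      source  = "hashicorp/azurerm" # assumed registry namespace
      version = "~> 3.17"
    }
    helm = {
      source  = "hashicorp/helm" # assumed registry namespace
      version = "~> 2.6"
    }
    kubernetes = {
      source  = "hashicorp/kubernetes" # assumed registry namespace
      version = "~> 2.23"
    }
  }
}
```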

Providers

| Name | Version |
|------|---------|
| azapi | ~> 1.0 |
| azurerm | ~> 3.17 |

Modules

| Name | Source | Version |
|------|--------|---------|
| app_aks | ./modules/app_aks | n/a |
| app_lb | ./modules/app_lb | n/a |
| cert_manager | ./modules/cert_manager | n/a |
| clickhouse | ./modules/clickhouse | n/a |
| cron_job | ./modules/cron_job | n/a |
| database | ./modules/database | n/a |
| identity | ./modules/identity | n/a |
| networking | ./modules/networking | n/a |
| pod_identity | ./modules/identity | n/a |
| redis | ./modules/redis | n/a |
| storage | ./modules/storage | n/a |
| vault | ./modules/vault | n/a |
| wandb | wandb/wandb/helm | 1.2.0 |

Resources

| Name | Type |
|------|------|
| azapi_resource_list.az_zones | data source |
| azurerm_subscription.current | data source |

Inputs

| Name | Description | Type | Default | Required |
|------|-------------|------|---------|----------|
| allowed_ip_ranges | Allowed public IP addresses or CIDR ranges. | list(string) | [] | no |
| allowed_subscriptions | Comma-separated list of allowed customer subscriptions. | string | "" | no |
| app_wandb_env | Extra environment variables for W&B | map(string) | {} | no |
| azuremonitor | Support for OTel Azure Monitor SQL and Redis metrics. Requires operator-wandb chart version 0.14.0 or later. | bool | false | no |
| blob_container | Use an existing bucket. | string | "" | no |
| bucket_path | Path where data is stored for the instance-level bucket. | string | "" | no |
| clickhouse_private_endpoint_service_name | ClickHouse private endpoint 'Service name' (ends in .azure.privatelinkservice). | string | "" | no |
| clickhouse_region | ClickHouse region (eastus2, westus3, etc). | string | "" | no |
| cluster_sku_tier | The Azure AKS SKU Tier to use for this cluster (https://learn.microsoft.com/en-us/azure/aks/free-standard-pricing-tiers) | string | "Free" | no |
| controller_image_tag | Tag of the controller image to deploy | string | "1.14.0" | no |
| create_private_link | Use for the Azure private link. | bool | false | no |
| create_redis | Boolean indicating whether to provision a Redis instance (true) or not (false). | bool | false | no |
| database_availability_mode | n/a | string | "SameZone" | no |
| database_sku_name | Specifies the SKU Name for this MySQL Server. Defaults to null and value from deployment-size.tf is used | string | null | no |
| database_version | Version for MySQL | string | "5.7" | no |
| deletion_protection | If the instance should have deletion protection enabled. The database and bucket can't be deleted when this value is set to true. | bool | true | no |
| disable_storage_vault_key_id | Flag to disable the customer_managed_key block. The properties 'encryption.identity, encryption.keyvaultproperties' cannot be updated in a single operation. | bool | false | no |
| domain_name | Domain for accessing the Weights & Biases UI. | string | null | no |
| enable_database_vault_key | Flag to enable managed key encryption for the database. Once enabled, cannot be disabled. | bool | false | no |
| enable_storage_vault_key | Flag to enable managed key encryption for the storage account. | bool | false | no |
| external_bucket | Configuration for an external bucket. | any | null | no |
| kubernetes_cluster_oidc_issuer_url | OIDC issuer URL for the Kubernetes cluster. Can be determined using kubectl get --raw /.well-known/openid-configuration | string | "" | no |
| kubernetes_instance_type | Instance type for primary node group. Defaults to null and value from deployment-size.tf is used | string | null | no |
| kubernetes_max_node_per_az | Maximum number of nodes for the AKS cluster. Defaults to null and value from deployment-size.tf is used | number | null | no |
| kubernetes_min_node_per_az | Minimum number of nodes for the AKS cluster. Defaults to null and value from deployment-size.tf is used | number | null | no |
| license | Your wandb/local license | string | n/a | yes |
| location | n/a | string | n/a | yes |
| namespace | String used as a prefix for resources. | string | n/a | yes |
| node_max_pods | Maximum number of pods per node | number | 30 | no |
| node_pool_num_zones | Number of availability zones to use for the node pool when node_pool_zones is not set. If neither is set, 3 zones will be used | number | 2 | no |
| node_pool_zones | Availability zones for the node pool | list(string) | null | no |
| oidc_auth_method | OIDC auth method | string | "implicit" | no |
| oidc_client_id | The Client ID of the application in your identity provider | string | "" | no |
| oidc_issuer | A URL to your OpenID Connect identity provider, e.g. https://cognito-idp.us-east-1.amazonaws.com/us-east-1_uiIFNdacd | string | "" | no |
| oidc_secret | The Client secret of the application in your identity provider | string | "" | no |
| operator_chart_version | Version of the operator chart to deploy | string | "1.3.4" | no |
| other_wandb_env | Extra environment variables for W&B | map(any) | {} | no |
| parquet_wandb_env | Extra environment variables for W&B | map(string) | {} | no |
| redis_capacity | Number indicating size of a Redis instance. Defaults to null and value from deployment-size.tf is used | number | null | no |
| size | Deployment size | string | "small" | no |
| ssl | Enable SSL certificate | bool | true | no |
| storage_account | Azure storage account name | string | "" | no |
| storage_key | Azure primary storage access key | string | "" | no |
| subdomain | Subdomain for accessing the Weights & Biases UI. Default creates record at Route53 Route. | string | null | no |
| tags | Map of tags for resource | map(string) | {} | no |
| use_internal_queue | Uses an internal Redis queue instead of using Azure queue. | bool | false | no |
| wandb_image | Docker repository to pull the wandb image from. | string | "wandb/local" | no |
| wandb_version | The version of Weights & Biases local to deploy. | string | "latest" | no |
| weave_wandb_env | Extra environment variables for W&B | map(string) | {} | no |
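
As a rough illustration of how several optional inputs combine with the three required ones, consider the sketch below. The registry address `wandb/wandb/azurerm`, the variable name `wandb_license`, and all domain, IP range, and OIDC values are hypothetical placeholders; only the input names come from the table above.

```hcl
# Hypothetical variable holding the license string (name is an assumption).
variable "wandb_license" {
  type      = string
  sensitive = true
}

module "wandb" {
  # Assumed registry address for this module.
  source = "wandb/wandb/azurerm"

  # Required inputs.
  namespace = "wandb-demo"
  location  = "eastus2"
  license   = var.wandb_license

  # Hostname for the UI (placeholder domain).
  domain_name = "example.com"
  subdomain   = "wandb"

  # Restrict access to a documentation-only CIDR range (placeholder).
  allowed_ip_ranges = ["203.0.113.0/24"]

  # Single sign-on via OIDC (placeholder issuer and client ID).
  oidc_issuer    = "https://idp.example.com"
  oidc_client_id = "hypothetical-client-id"

  tags = {
    environment = "demo"
  }
}
```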

Outputs

| Name | Description |
|------|-------------|
| address | n/a |
| aks_max_node_count | n/a |
| aks_min_node_count | n/a |
| aks_node_instance_type | n/a |
| client_id | n/a |
| cluster_ca_certificate | n/a |
| cluster_client_certificate | n/a |
| cluster_client_key | n/a |
| cluster_host | n/a |
| database_instance_type | n/a |
| fqdn | The FQDN to the W&B application |
| oidc_issuer_url | n/a |
| private_link_resource_id | n/a |
| private_link_sub_resource_name | n/a |
| standardized_size | n/a |
| tenant_id | n/a |
| url | The URL to the W&B application |
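
Module outputs can be re-exported from the root configuration. The following sketch assumes the module is instantiated elsewhere in the same configuration under the name `wandb`; the output names `wandb_url` and `wandb_fqdn` are arbitrary choices for illustration.

```hcl
# Assumes a `module "wandb"` block exists in this configuration.
output "wandb_url" {
  description = "The URL to the W&B application"
  value       = module.wandb.url
}

output "wandb_fqdn" {
  description = "The FQDN to the W&B application"
  value       = module.wandb.fqdn
}
```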

Upgrading from 3.x to 4.x

3.0.0 introduced autoscaling to the AKS cluster and made the size variable the preferred way to set the cluster size. Previously, unless the size variable was set explicitly, there were default values for the following variables:

  • kubernetes_instance_type
  • kubernetes_node_count
  • redis_capacity
  • database_sku_name

The size variable now defaults to small, and the following variables can be used to partially override the values it sets:

  • kubernetes_instance_type
  • kubernetes_min_node_per_az
  • kubernetes_max_node_per_az
  • redis_capacity
  • database_sku_name

For more information on the available sizes, see the Cluster Sizing section.

If autoscaling is not desired, set kubernetes_min_node_per_az and kubernetes_max_node_per_az to the same value to keep the node count fixed.
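
A minimal sketch of a fixed-size cluster, assuming the registry address `wandb/wandb/azurerm`, a hypothetical `wandb_license` variable, and an illustrative node count of 2:

```hcl
# Hypothetical variable holding the license string (name is an assumption).
variable "wandb_license" {
  type      = string
  sensitive = true
}

module "wandb" {
  # Assumed registry address for this module.
  source = "wandb/wandb/azurerm"

  namespace = "wandb-demo"
  location  = "eastus2"
  license   = var.wandb_license

  size = "small"

  # Equal minimum and maximum leave the autoscaler no room to scale.
  kubernetes_min_node_per_az = 2
  kubernetes_max_node_per_az = 2
}
```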

Upgrading from 2.x to 3.x

When upgrading from 2.x to 3.x, the following changes are required:

  1. Add the azapi provider to the required_providers block:
terraform {
  required_providers {
    azapi = {
      source  = "azure/azapi"
      version = "~> 1.0"
    }
  }
}
  2. Add an azapi provider block:
provider "azapi" {
  # azapi provider configuration should be the same as azurerm provider configuration
}