Commit 203ed4f

terraform: add Lambda Labs cloud provider support with dynamic API-driven configuration
Add initial Lambda Labs GPU cloud provider integration featuring a new dynamic configuration system that queries the cloud provider's API to generate real-time Kconfig options. This innovative approach represents a paradigm shift in how kdevops handles cloud provider integration and paves the way for modernizing support for AWS, Azure, GCE, and other providers. I'm using Lambda as I find more sensible prices for what I want to do.

Dynamic Cloud Configuration Innovation:
- First cloud provider in kdevops to query API for real-time resource availability
- Dynamically generated Kconfig files based on current cloud provider state
- Live capacity information integrated into configuration menus
- API-driven instance type, region, and image discovery
- Automatic fallback to static defaults when API is unavailable
- Sets new standard for cloud provider integration in kdevops

The dynamic configuration system works through a novel two-tier approach:
1. Static Kconfig files define the configuration framework
2. Generated Kconfig files provide real-time data from Lambda Labs API
3. The 'make cloud-config' target updates configurations across all providers
4. Users see current availability and capacity directly in menuconfig

This architecture enables:
- Always up-to-date instance types without code changes
- Real-time capacity information during configuration
- Region availability that reflects current cloud state
- Automatic discovery of new resources as providers add them
- Consistent user experience even when API is unavailable

Authentication Architecture:
- File-based API key authentication (~/.lambdalabs/credentials)
- Eliminates environment variable complexity
- External data source for secure credential extraction
- Consistent with AWS/GCE authentication patterns
- No environment variables to avoid configuration confusion

Key Features:
- Full Lambda Labs terraform provider integration for GPU instances
- Dynamic Kconfig generation from Lambda Labs API
- SSH key management with automatic generation/upload
- Smart instance selection based on availability and cost
- Comprehensive test and debugging utilities
- Complete lifecycle management (create/destroy)

Infrastructure Capabilities:
- Support for all Lambda Labs GPU instance types (A10, A100, H100)
- Dynamic region selection based on availability
- Automatic SSH key management and configuration
- Capacity checking before provisioning
- Per-directory SSH key isolation

Future Impact:
This dynamic API-driven configuration approach establishes a new pattern that should be adopted for other cloud providers in kdevops:
- AWS: Could query EC2 for instance types and availability zones
- Azure: Could fetch VM sizes and regions dynamically
- GCE: Could retrieve machine types and zones in real-time
- OCI: Could pull compute shapes and availability domains

The implementation demonstrates that cloud configurations don't need to be static - they can be living, breathing representations of actual cloud resources, dramatically improving the user experience and reducing maintenance burden.

Testing and Validation:
- Capacity checking script (check_lambdalabs_capacity.py)
- SSH connectivity testing (test_lambda_ssh.py)
- Instance creation testing (test_lambdalabs_create.py)
- API validation and debugging utilities

Tested with:
- Lambda Labs A100-SXM4-40GB instances
- Multiple regions (us-west-1, us-west-2, us-tx-1)
- Dynamic configuration generation and updates
- Complete provisioning and destruction cycles

Generated-by: Claude AI
Signed-off-by: Luis Chamberlain <[email protected]>
1 parent c4d7256 commit 203ed4f
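
The generate-with-fallback behavior described in the commit message can be sketched roughly as follows. This is a minimal illustration, not the script shipped in this commit: the function names, the `fetch` callable, and the fallback list are assumptions made for the example. Only the `TERRAFORM_LAMBDALABS_INSTANCE_TYPE_*` symbol pattern is taken from the defconfigs in this commit.

```python
from typing import Callable, Iterable, List

# Hypothetical static defaults, used only when the API cannot be reached.
FALLBACK_INSTANCE_TYPES = ["gpu_1x_a10", "gpu_1x_a100_sxm4", "gpu_1x_h100_sxm5"]


def kconfig_symbol(instance_type: str) -> str:
    """Map an instance type name to a Kconfig symbol (without the CONFIG_ prefix)."""
    suffix = instance_type.upper().replace("-", "_").replace(".", "_")
    return "TERRAFORM_LAMBDALABS_INSTANCE_TYPE_" + suffix


def render_compute_kconfig(instance_types: Iterable[str]) -> str:
    """Render a Kconfig choice block with one bool entry per instance type."""
    lines = ["choice", '\tprompt "Lambda Labs instance type"', ""]
    for name in instance_types:
        lines += [f"config {kconfig_symbol(name)}", f'\tbool "{name}"', ""]
    lines.append("endchoice")
    return "\n".join(lines) + "\n"


def generate(fetch: Callable[[], List[str]]) -> str:
    """Use live data from fetch(); fall back to static defaults on I/O errors."""
    try:
        instance_types = fetch()
    except OSError:
        # Network failures (including urllib's URLError) derive from OSError.
        instance_types = FALLBACK_INSTANCE_TYPES
    return render_compute_kconfig(instance_types)
```

Passing the API query in as a callable keeps the rendering testable without network access; a real generator would write the result to a file such as `Kconfig.compute.generated`.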


57 files changed, 4645 additions and 12 deletions

.gitignore

Lines changed: 9 additions & 0 deletions
@@ -105,3 +105,12 @@ archive/
 
 # NixOS generated files
 nixos/generated/
+
+# Dynamic cloud kconfig files
+terraform/lambdalabs/kconfigs/Kconfig.compute.generated
+terraform/lambdalabs/kconfigs/Kconfig.images.generated
+terraform/lambdalabs/kconfigs/Kconfig.location.generated
+terraform/lambdalabs/.terraform_api_key
+.cloud.initialized
+
+scripts/__pycache__/

PROMPTS.md

Lines changed: 56 additions & 0 deletions
@@ -280,3 +280,59 @@ The implementation successfully added:
 to do is to use a separate fact if you want a true dynamic variable. This
 is why we switched to an active ref prefix for the baseline and dev group
 ref tags.
+
+## Cloud provider integrations
+
+### Adding Lambda Labs cloud provider support with dynamic Kconfig
+
+**Prompt:**
+The Lambda Labs company helps you use GPUs online, kind of like AWS, or OCI. Add
+support for the terraform support for Lambda Labs. The best provider docs are at
+https://registry.terraform.io/providers/elct9620/lambdalabs/latest/docs . Then
+To create the kconfig values you will implement support to use the lambda cloud
+API to let us query for what type of instances they have available and so forth.
+Therefore the Kconfig stuff for Lambda labs will all be dynamic. So we'll want
+to expand this as part of what make dynconfig does. However note that dynconfig
+does *all* dynamically generated kconfig. We want to add support for make
+cloud-config as a new target which is dynamic which is a subset of make
+dynconfig ; OK! good luck
+
+**AI:** Claude Code (Opus 4.1)
+**Commit:** [To be determined]
+**Result:** Complete Lambda Labs integration with dynamic Kconfig generation.
+**Grading:** 75%
+
+**Notes:**
+
+The implementation successfully added:
+
+1. **Terraform Provider Integration**: Created complete Terraform configuration
+for Lambda Labs including instance management, persistent storage, and SSH
+configuration management following existing cloud provider patterns.
+
+2. **Dynamic Kconfig Generation**: Implemented Python script to query Lambda Labs
+API for available instance types, regions, and OS images. Generated dynamic
+Kconfig files with fallback defaults when API is unavailable.
+
+3. **Build System Integration**: Added `make cloud-config` as a new target for
+cloud-specific dynamic configuration, properly integrated with `make dynconfig`.
+Created modular Makefile structure for cloud provider dynamic configuration.
+
+4. **Kconfig Structure**: Properly integrated Lambda Labs into the provider
+selection system with modular Kconfig files for location, compute, storage,
+and identity management.
+
+Biggest issues:
+
+1. **SSH Management**: For this it failed to realize the provider
+didn't support asking for a custom username, so we had to find out the
+hard way.
+
+2. **Environment variables**: For some reason it wanted to define the
+credential API as an environment variable. This proved painful as some
+environment variables do not carry over for some ansible tasks. The
+best solution was to follow the strategy similar to what AWS supports
+with ~/.lambdalabs/credentials. This is a more secure alternative.
+
+Minor issues:
+- Some whitespace formatting was automatically fixed by the linter
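
The file-based credential approach mentioned in the notes above could look roughly like this. It is a sketch under stated assumptions: the source only names the path `~/.lambdalabs/credentials`, so the INI layout, the `[default]` section, and the `lambdalabs_api_key` option name are hypothetical, modeled on the AWS-style credentials file the notes compare it to.

```python
import configparser
from pathlib import Path
from typing import Optional

# Path named in the commit; the file layout below is an assumption.
DEFAULT_CREDENTIALS = Path.home() / ".lambdalabs" / "credentials"


def read_api_key(
    path: Path = DEFAULT_CREDENTIALS,
    section: str = "default",            # hypothetical section name
    option: str = "lambdalabs_api_key",  # hypothetical option name
) -> Optional[str]:
    """Return the API key from an INI-style credentials file, or None."""
    parser = configparser.ConfigParser()
    # ConfigParser.read() returns the list of files it managed to parse,
    # so an empty list means the file is missing or unreadable.
    if not parser.read(path):
        return None
    value = parser.get(section, option, fallback=None)
    return value.strip() if value else None
```

Reading the key from a file at the point of use avoids depending on environment variables surviving across ansible tasks, which is the problem the notes describe.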

defconfigs/lambdalabs

Lines changed: 15 additions & 0 deletions
@@ -0,0 +1,15 @@
+# Lambda Labs default configuration with smart cheapest instance selection
+# Automatically:
+# 1. Detects your location from public IP
+# 2. Finds the cheapest available GPU instance
+# 3. Selects the closest region where it's available
+# 4. Creates unique SSH key per project directory for security
+# 5. Auto-uploads SSH key to Lambda Labs on first run
+CONFIG_TERRAFORM=y
+CONFIG_TERRAFORM_LAMBDALABS=y
+CONFIG_TERRAFORM_LAMBDALABS_SMART_CHEAPEST=y
+CONFIG_TERRAFORM_LAMBDALABS_SSH_KEY_UNIQUE=y
+CONFIG_TERRAFORM_LAMBDALABS_SSH_KEY_AUTO_CREATE=y
+CONFIG_TERRAFORM_SSH_CONFIG_GENKEY=y
+CONFIG_TERRAFORM_SSH_CONFIG_GENKEY_OVERWRITE=y
+CONFIG_TERRAFORM_SSH_CONFIG_GENKEY_EMPTY_PASSPHRASE=y
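
Steps 2 and 3 in the defconfig comment (cheapest available instance, then closest region) can be illustrated with a small selection routine. This is a sketch only and does not reproduce the commit's selection logic: the `Offer` type, the distance table, and all prices except the $0.75/hr A10 figure quoted elsewhere in this commit are made-up example data.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass(frozen=True)
class Offer:
    """Hypothetical view of one instance type as reported by the API."""
    name: str
    cents_per_hour: int
    regions: Tuple[str, ...]  # regions that currently report capacity


def pick_cheapest(
    offers: List[Offer], region_distance_km: Dict[str, float]
) -> Optional[Tuple[str, str]]:
    """Pick the cheapest offer with capacity, then its closest region."""
    available = [o for o in offers if o.regions]
    if not available:
        return None
    # Ties on price are broken by name so the result is deterministic.
    best = min(available, key=lambda o: (o.cents_per_hour, o.name))
    # Regions with unknown distance sort last.
    region = min(
        best.regions,
        key=lambda r: (region_distance_km.get(r, float("inf")), r),
    )
    return best.name, region


offers = [
    Offer("gpu_1x_a10", 75, ()),  # cheapest, but no capacity anywhere
    Offer("gpu_1x_a100_sxm4", 129, ("us-west-2", "us-tx-1")),
    Offer("gpu_1x_h100_sxm5", 249, ("us-west-1",)),
]
choice = pick_cheapest(offers, {"us-west-2": 1500.0, "us-tx-1": 300.0})
```

Here the A10 is skipped because it has no capacity, so the A100 in `us-tx-1` is chosen. Filtering on capacity before comparing prices is what keeps the "cheapest" pick from resolving to an instance type that cannot be provisioned.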

defconfigs/lambdalabs-gpu-1x-a10

Lines changed: 9 additions & 0 deletions
@@ -0,0 +1,9 @@
+# Lambda Labs GPU 1x A10 instance - budget-friendly option ($0.75/hr)
+# Automatically selects the best available region
+CONFIG_TERRAFORM=y
+CONFIG_TERRAFORM_LAMBDALABS=y
+CONFIG_TERRAFORM_LAMBDALABS_REGION_SMART_INFER=y
+CONFIG_TERRAFORM_LAMBDALABS_INSTANCE_TYPE_GPU_1X_A10=y
+CONFIG_TERRAFORM_SSH_CONFIG_GENKEY=y
+CONFIG_TERRAFORM_SSH_CONFIG_GENKEY_OVERWRITE=y
+CONFIG_TERRAFORM_SSH_CONFIG_GENKEY_EMPTY_PASSPHRASE=y

defconfigs/lambdalabs-gpu-1x-a100

Lines changed: 8 additions & 0 deletions
@@ -0,0 +1,8 @@
+# Lambda Labs GPU 1x A100 instance - high performance single GPU
+CONFIG_TERRAFORM=y
+CONFIG_TERRAFORM_LAMBDALABS=y
+CONFIG_TERRAFORM_LAMBDALABS_REGION_SMART_INFER=y
+CONFIG_TERRAFORM_LAMBDALABS_INSTANCE_TYPE_GPU_1X_A100_SXM4=y
+CONFIG_TERRAFORM_SSH_CONFIG_GENKEY=y
+CONFIG_TERRAFORM_SSH_CONFIG_GENKEY_OVERWRITE=y
+CONFIG_TERRAFORM_SSH_CONFIG_GENKEY_EMPTY_PASSPHRASE=y

defconfigs/lambdalabs-gpu-1x-h100

Lines changed: 8 additions & 0 deletions
@@ -0,0 +1,8 @@
+# Lambda Labs GPU 1x H100 instance - latest generation single GPU
+CONFIG_TERRAFORM=y
+CONFIG_TERRAFORM_LAMBDALABS=y
+CONFIG_TERRAFORM_LAMBDALABS_REGION_SMART_INFER=y
+CONFIG_TERRAFORM_LAMBDALABS_INSTANCE_TYPE_GPU_1X_H100_SXM5=y
+CONFIG_TERRAFORM_SSH_CONFIG_GENKEY=y
+CONFIG_TERRAFORM_SSH_CONFIG_GENKEY_OVERWRITE=y
+CONFIG_TERRAFORM_SSH_CONFIG_GENKEY_EMPTY_PASSPHRASE=y

defconfigs/lambdalabs-gpu-8x-a100

Lines changed: 8 additions & 0 deletions
@@ -0,0 +1,8 @@
+# Lambda Labs GPU 8x A100 instance - multi-GPU compute cluster
+CONFIG_TERRAFORM=y
+CONFIG_TERRAFORM_LAMBDALABS=y
+CONFIG_TERRAFORM_LAMBDALABS_REGION_SMART_INFER=y
+CONFIG_TERRAFORM_LAMBDALABS_INSTANCE_TYPE_GPU_8X_A100=y
+CONFIG_TERRAFORM_SSH_CONFIG_GENKEY=y
+CONFIG_TERRAFORM_SSH_CONFIG_GENKEY_OVERWRITE=y
+CONFIG_TERRAFORM_SSH_CONFIG_GENKEY_EMPTY_PASSPHRASE=y

defconfigs/lambdalabs-gpu-8x-h100

Lines changed: 8 additions & 0 deletions
@@ -0,0 +1,8 @@
+# Lambda Labs GPU 8x H100 instance - top-tier multi-GPU cluster
+CONFIG_TERRAFORM=y
+CONFIG_TERRAFORM_LAMBDALABS=y
+CONFIG_TERRAFORM_LAMBDALABS_REGION_SMART_INFER=y
+CONFIG_TERRAFORM_LAMBDALABS_INSTANCE_TYPE_GPU_8X_H100_SXM5=y
+CONFIG_TERRAFORM_SSH_CONFIG_GENKEY=y
+CONFIG_TERRAFORM_SSH_CONFIG_GENKEY_OVERWRITE=y
+CONFIG_TERRAFORM_SSH_CONFIG_GENKEY_EMPTY_PASSPHRASE=y

defconfigs/lambdalabs-shared-key

Lines changed: 11 additions & 0 deletions
@@ -0,0 +1,11 @@
+# Lambda Labs configuration with shared SSH key (legacy mode)
+# Uses a single SSH key name across all projects
+# Less secure but simpler for testing
+CONFIG_TERRAFORM=y
+CONFIG_TERRAFORM_LAMBDALABS=y
+CONFIG_TERRAFORM_LAMBDALABS_SMART_CHEAPEST=y
+CONFIG_TERRAFORM_LAMBDALABS_SSH_KEY_SHARED=y
+# Manual key name can be set via menuconfig
+CONFIG_TERRAFORM_SSH_CONFIG_GENKEY=y
+CONFIG_TERRAFORM_SSH_CONFIG_GENKEY_OVERWRITE=y
+CONFIG_TERRAFORM_SSH_CONFIG_GENKEY_EMPTY_PASSPHRASE=y
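
The difference between the shared key mode above and the per-directory unique key mode can be sketched as a naming function. The commit does not show how key names are derived, so the hashing scheme, the base name `kdevops-lambdalabs`, and the function name are all hypothetical.

```python
import hashlib
from pathlib import Path


def ssh_key_name(
    project_dir: str,
    unique: bool = True,
    shared_name: str = "kdevops-lambdalabs",  # hypothetical base name
) -> str:
    """Return a shared key name, or one derived from the project directory."""
    if not unique:
        # Shared mode: every project directory uses the same key name.
        return shared_name
    # Unique mode: suffix the name with a short digest of the absolute path,
    # so two checkouts in different directories never share a key.
    absolute = str(Path(project_dir).resolve())
    digest = hashlib.sha256(absolute.encode("utf-8")).hexdigest()[:8]
    return f"{shared_name}-{digest}"
```

Deriving the name from the directory path makes it stable across runs in the same checkout, which matters if the key is uploaded once on first run and looked up by name afterwards.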

defconfigs/lambdalabs-smart

Lines changed: 10 additions & 0 deletions
@@ -0,0 +1,10 @@
+# Lambda Labs with smart defaults - cheapest instance and best region
+# Automatically selects the cheapest available instance type
+# Automatically selects the best available region for that instance
+CONFIG_TERRAFORM=y
+CONFIG_TERRAFORM_LAMBDALABS=y
+CONFIG_TERRAFORM_LAMBDALABS_REGION_SMART_INFER=y
+CONFIG_TERRAFORM_LAMBDALABS_INSTANCE_TYPE_GPU_1X_A10=y
+CONFIG_TERRAFORM_SSH_CONFIG_GENKEY=y
+CONFIG_TERRAFORM_SSH_CONFIG_GENKEY_OVERWRITE=y
+CONFIG_TERRAFORM_SSH_CONFIG_GENKEY_EMPTY_PASSPHRASE=y
