AWS Provisioning Now Working, Has Bugs #41

Open
wants to merge 32 commits into
base: main
32 commits
676ffd8
refactor: Update SOP to reflect current project implementation status
aronchick Dec 10, 2024
61fcd55
refactor: Rename createRegionalInfrastructure method to CreateRegiona…
aronchick Dec 10, 2024
29a3477
refactor: Update getRegionAvailabilityZones to use regional EC2 client
aronchick Dec 10, 2024
af3e244
refactor: Prevent automatic stdout logging in logger initialization
aronchick Dec 11, 2024
9179654
refactor: Disable console logging by always returning nil in createCo…
aronchick Dec 11, 2024
0797bfc
refactor: Prevent console logging by disabling console core and calle…
aronchick Dec 11, 2024
7a4b17e
fix: Remove invalid zap.WithOptions() call in logger initialization
aronchick Dec 11, 2024
9df7dfb
refactor: Remove unused code and update regional resource handling
aronchick Dec 11, 2024
5c640d7
refactor: Modify setupNetworking to create Internet Gateway for all V…
aronchick Dec 11, 2024
e946bdf
refactor: Modify setupNetworking to create Internet Gateways for all …
aronchick Dec 11, 2024
11a3959
fix: Define regionalClient in AWS provider's setupNetworking method
aronchick Dec 11, 2024
b102f50
refactor: Update VPC cleanup to use regional clients and modify funct…
aronchick Dec 11, 2024
e7a4c62
test: Fix AWS networking setup timeout and improve error handling
aronchick Dec 11, 2024
7ed638d
style: Apply linter formatting to AWS provider code
aronchick Dec 11, 2024
2c5c1a8
feat: Enhance VPC creation with logging, tagging, and resource manage…
aronchick Dec 11, 2024
21c8089
feat: Enhance VPC infrastructure creation with improved logging and t…
aronchick Dec 11, 2024
d823c5c
refactor: Improve error handling and logging for multi-region AWS inf…
aronchick Dec 11, 2024
5a0e1ac
refactor: Improve VPC creation error handling and logging in AWS prov…
aronchick Dec 11, 2024
8f3af6d
fixing locations
aronchick Dec 12, 2024
5b81ffb
refactor: Improve AWS provider infrastructure setup and region handling
aronchick Dec 12, 2024
5ae3c13
refactor: Enhance network connectivity logging with route table and i…
aronchick Dec 12, 2024
3625a2a
feat: Add comprehensive VPC infrastructure creation method with detai…
aronchick Dec 12, 2024
30b802c
feat: Ensure unique internet gateway creation for each VPC
aronchick Dec 12, 2024
2acd6cb
refactor: Enhance VPC networking setup with detailed logging and veri…
aronchick Dec 12, 2024
9736d9f
test: Add network diagnostics to AWS provider
aronchick Dec 12, 2024
2e306d4
basic amazon now working
aronchick Dec 14, 2024
975d05e
Update internal/clouds/general/location.go
aronchick Dec 14, 2024
b37ef74
Update internal/clouds/general/location.go
aronchick Dec 14, 2024
e3a39ea
Update cmd/beta/aws/create_deployment.go
aronchick Dec 14, 2024
bfa4b27
merge from main
aronchick Dec 14, 2024
d5803f6
spelling fix
aronchick Dec 14, 2024
d7a529d
removing duplicate mocks
aronchick Dec 14, 2024
6 changes: 6 additions & 0 deletions .cspell/custom-dictionary.txt
Expand Up @@ -11,6 +11,7 @@ allowlistedlocalpaths
Andaime
andaimeconfig
andaimeuser
APAC
apitype
apiv
armcompute
Expand Down Expand Up @@ -109,6 +110,7 @@ dryrun
dupl
eastasia
eastus
enis
errcheck
Errf
errgroup
Expand Down Expand Up @@ -170,6 +172,7 @@ gosimple
govet
gserviceaccount
Gtthi
Hamina
heartbeatcheckfrequency
heartbeatfrequency
heartbeattopic
Expand All @@ -182,6 +185,7 @@ housekeepingbackgroundtaskinterval
housekeepinginterval
housekeepingtimeout
htmltemplate
iface
ignorephysicalresourcelimits
igws
ineffassign
Expand Down Expand Up @@ -237,6 +241,7 @@ mitchellh
mktemp
mmddhhmm
mockprivatekey
Msgf
nakedret
nameprovider
nanosec
Expand Down Expand Up @@ -372,6 +377,7 @@ ukwest
unbuffered
unconvert
UNIQID
unitedstates
unmarshaling
unmarshalling
unpadded
Expand Down
10 changes: 5 additions & 5 deletions .envrc
@@ -1,8 +1,8 @@
# shellcheck disable=SC1090
. <( flox activate; );

unset GOROOT
export PATH=$PATH:$GOROOT/bin
export GOPATH=$HOME/go
export PATH=$PATH:$GOPATH/bin
export PATH=/Users/daaronch/.cache/flox/run/aronchick/andaime.19a05c92/bin:$PATH
unset GOROOT
export PATH=$PATH:$GOROOT/bin
export GOPATH=$HOME/go
export PATH=$PATH:$GOPATH/bin
export PATH=/Users/daaronch/.cache/flox/run/aronchick/andaime.19a05c92/bin:$PATH
58 changes: 36 additions & 22 deletions ai/sop/spot.md
@@ -1,37 +1,51 @@
# AWS Spot Instance Implementation Tasks

## Phase 1: Core Infrastructure Setup
## Phase 1: Core Infrastructure Setup (In Progress)

### Configuration & Types
1. Define spot instance configuration struct
- [ ] Add spot-specific fields to deployment config
- [ ] Create validation functions for spot configs
- [ ] Add price threshold configurations
- [ ] Add AZ validation requirements
- [ ] Create function to check minimum AZ count (>=2)
- [ ] Add early validation before deployment starts
- [ ] Implement clear error messaging for AZ validation failures
- [ ] Add AZ count validation to config validation pipeline
- [x] Add basic spot-specific fields to deployment config
- [x] Initial validation functions for deployment configs
- [ ] Enhance price threshold configurations
- [x] Implement AZ validation requirements
- [x] Create function to check minimum AZ count (>=2)
- [x] Add early validation before deployment starts
- [x] Implement clear error messaging for AZ validation failures
- [ ] Refine AZ count validation in config pipeline

2. Create spot instance type definitions
- [ ] Define SpotInstanceRequest struct
- [ ] Add spot pricing history types
- [ ] Create spot termination notice types
- [ ] Add AZ distribution configuration
- [ ] Define minimum AZ requirements
- [ ] Create AZ distribution strategy types
- [ ] Add AZ fallback configurations
- [x] Initial SpotInstanceRequest struct design
- [ ] Complete spot pricing history types
- [ ] Develop spot termination notice types
- [x] Initial AZ distribution configuration
- [x] Define basic AZ requirements
- [ ] Create advanced AZ distribution strategy types
- [ ] Implement comprehensive AZ fallback configurations

### Basic Operations
3. Implement spot price checking
- [ ] Create function to fetch current spot prices
- [ ] Add price history analysis
- [ ] Implement price threshold validation
- [x] Basic infrastructure for price checking
- [ ] Develop comprehensive price history analysis
- [ ] Implement advanced price threshold validation

4. Create spot request handling
- [ ] Implement spot instance request creation
- [ ] Add request status monitoring
- [ ] Create request cancellation logic
- [x] Initial spot instance request creation framework
- [ ] Enhance request status monitoring
- [ ] Develop robust request cancellation logic

## Current Project Status
- Infrastructure creation is functional
- Basic AWS deployment workflow implemented
- SSH connectivity and machine provisioning working
- Bacalhau cluster deployment integrated
- Logging and error handling in place

## Immediate Next Steps
- Implement spot instance specific features
- Enhance price and availability zone strategies
- Develop more granular fallback mechanisms
- Create comprehensive testing suite for spot instances
- Improve CLI and configuration management for spot deployments

## Phase 2: Instance Management

Expand Down
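The checklist above marks "check minimum AZ count (>=2)" and "clear error messaging for AZ validation failures" as done, but this diff does not show that code. A minimal sketch of what such a check could look like follows. The name `validateAZCount`, its signature, and the error wording are assumptions for illustration, not the PR's implementation.

```go
package main

import (
	"fmt"
	"sort"
)

// minAZCount mirrors the ">=2" requirement from the checklist.
const minAZCount = 2

// validateAZCount is a hypothetical helper (not taken from the PR).
// It counts distinct zones and returns a descriptive error when fewer
// than minAZCount are available in the region.
func validateAZCount(region string, zones []string) error {
	seen := make(map[string]struct{}, len(zones))
	for _, z := range zones {
		seen[z] = struct{}{}
	}
	if len(seen) >= minAZCount {
		return nil
	}
	// Sort the distinct zones so the error message is deterministic.
	unique := make([]string, 0, len(seen))
	for z := range seen {
		unique = append(unique, z)
	}
	sort.Strings(unique)
	return fmt.Errorf(
		"region %s has %d availability zone(s) %v, need at least %d",
		region, len(unique), unique, minAZCount,
	)
}

func main() {
	fmt.Println(validateAZCount("us-east-1", []string{"us-east-1a", "us-east-1b"}))
	fmt.Println(validateAZCount("us-west-1", []string{"us-west-1a", "us-west-1a"}))
}
```

Running a check like this before any resources are created matches the "early validation before deployment starts" item: a region with too few zones fails fast rather than partway through provisioning.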
31 changes: 21 additions & 10 deletions cmd/andaime.go
Expand Up @@ -192,7 +192,23 @@ func GetSession(region string) *session.Session {

return sess
}

// getUbuntuAMIId retrieves the latest Ubuntu AMI ID from AWS for a given architecture.
// It specifically looks for Ubuntu 22.04 (Jammy) images from Canonical.
//
// Parameters:
//
// svc: AWS EC2 service client
// arch: Target architecture (e.g., "x86_64" or "arm64")
//
// Returns:
//
// string: AMI ID if found
// error: Error if any occurred during the operation
func getUbuntuAMIId(svc *ec2.EC2, arch string) (string, error) {
const ubuntuVersion = "ubuntu-jammy-22.04"
const canonicalOwnerID = "099720109477"

describeImagesInput := &ec2.DescribeImagesInput{
Filters: []*ec2.Filter{
{
Expand All @@ -208,34 +224,29 @@ func getUbuntuAMIId(svc *ec2.EC2, arch string) (string, error) {
Values: aws.StringSlice([]string{"available"}),
},
},
Owners: aws.StringSlice([]string{"099720109477"}), // Canonical's owner ID
Owners: aws.StringSlice([]string{canonicalOwnerID}),
}

// Call DescribeImages to find matching AMIs
result, err := svc.DescribeImages(describeImagesInput)
if err != nil {
fmt.Printf("Failed to describe images, %v\n", err)
return "", err
return "", fmt.Errorf("failed to describe images: %w", err)
}

if len(result.Images) == 0 {
fmt.Println("No Ubuntu AMIs found")
return "", err
return "", fmt.Errorf("no Ubuntu AMIs found")
}

// Filter the results to find the latest image that matches the desired pattern
var latestImage *ec2.Image
for _, image := range result.Images {
if strings.Contains(*image.Name, "ubuntu-jammy-22.04") {
if strings.Contains(*image.Name, ubuntuVersion) {
if latestImage == nil || *image.CreationDate > *latestImage.CreationDate {
latestImage = image
}
}
}

if latestImage == nil {
fmt.Println("No matching Ubuntu 22.04 AMIs found")
return "", fmt.Errorf("no matching Ubuntu 22.04 AMIs found")
return "", fmt.Errorf("no matching %s AMIs found", ubuntuVersion)
}

if VerboseModeFlag {
Expand Down
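The selection loop in `getUbuntuAMIId` dereferences `image.Name` and `image.CreationDate` directly and compares creation dates as strings. The sketch below isolates that step with nil checks added. The `image` struct is a reduced stand-in for the SDK type, and `latestMatching`, `str`, and the sample image names are hypothetical. It assumes all timestamps share one fixed ISO 8601 layout, which is what makes the string comparison chronological.

```go
package main

import (
	"fmt"
	"strings"
)

// image is a stand-in for the SDK's image type, reduced to the two
// pointer fields the loop reads (an assumption for illustration).
type image struct {
	Name         *string
	CreationDate *string
}

// latestMatching returns the newest image whose name contains pattern,
// or nil when none match. Entries with nil fields are skipped, so no
// pointer is dereferenced without a check.
func latestMatching(images []*image, pattern string) *image {
	var latest *image
	for _, img := range images {
		if img == nil || img.Name == nil || img.CreationDate == nil {
			continue
		}
		if !strings.Contains(*img.Name, pattern) {
			continue
		}
		// Timestamps in one fixed ISO 8601 layout sort lexically.
		if latest == nil || *img.CreationDate > *latest.CreationDate {
			latest = img
		}
	}
	return latest
}

// str returns a pointer to s, mimicking the SDK's string helpers.
func str(s string) *string { return &s }

func main() {
	// Made-up sample data, not real AMI names.
	images := []*image{
		{Name: str("example/ubuntu-jammy-22.04-amd64-a"), CreationDate: str("2024-01-01T00:00:00.000Z")},
		{Name: str("example/ubuntu-jammy-22.04-amd64-b"), CreationDate: str("2024-06-01T00:00:00.000Z")},
		{Name: nil, CreationDate: str("2025-01-01T00:00:00.000Z")},
	}
	if img := latestMatching(images, "ubuntu-jammy-22.04"); img != nil {
		fmt.Println(*img.Name)
	}
}
```

Returning nil for "no match" lets the caller produce the `no matching ... AMIs found` error in one place, as the diff does.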
58 changes: 48 additions & 10 deletions cmd/beta/aws/create_deployment.go
Expand Up @@ -5,10 +5,10 @@ import (
"fmt"
"time"

"github.com/aws/aws-sdk-go-v2/service/ec2"
"github.com/bacalhau-project/andaime/pkg/display"
"github.com/bacalhau-project/andaime/pkg/logger"
"github.com/bacalhau-project/andaime/pkg/models"
aws_interface "github.com/bacalhau-project/andaime/pkg/models/interfaces/aws"
aws_provider "github.com/bacalhau-project/andaime/pkg/providers/aws"

"github.com/bacalhau-project/andaime/pkg/sshutils"
Expand Down Expand Up @@ -41,7 +41,7 @@ func ExecuteCreateDeployment(cmd *cobra.Command, _ []string) error {
FilePath: viper.GetString("general.log_path"),
Format: viper.GetString("general.log_format"),
WithTrace: true,
EnableConsole: true,
EnableConsole: false,
EnableBuffer: true,
BufferSize: 8192,
InstantSync: true,
Expand Down Expand Up @@ -103,13 +103,6 @@ func ExecuteCreateDeployment(cmd *cobra.Command, _ []string) error {
return fmt.Errorf("failed to write VPC ID to config: %w", err)
}

// Ensure EC2 client is initialized
ec2Client := awsProvider.GetEC2Client()
if ec2Client == nil {
ec2Client = ec2.NewFromConfig(*awsProvider.GetConfig())
awsProvider.SetEC2Client(ec2Client)
}

m := display.NewDisplayModel(deployment)
prog := display.GetGlobalProgramFunc()

Expand Down Expand Up @@ -181,6 +174,20 @@ func prepareDeployment(
m.Deployment.SetMachines(machines)
m.Deployment.SetLocations(locations)

if m.Deployment.AWS.RegionalResources.VPCs == nil {
m.Deployment.AWS.RegionalResources.VPCs = make(map[string]*models.AWSVPC)
}
if m.Deployment.AWS.RegionalResources.Clients == nil {
m.Deployment.AWS.RegionalResources.Clients = make(map[string]aws_interface.EC2Clienter)
}
for _, machine := range m.Deployment.GetMachines() {
region := machine.GetRegion()
if _, exists := m.Deployment.AWS.RegionalResources.VPCs[region]; !exists {
m.Deployment.AWS.RegionalResources.SetVPC(region, &models.AWSVPC{})
m.Deployment.AWS.RegionalResources.SetClient(region, nil)
}
}

return m.Deployment, nil
}
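The block added to `prepareDeployment` allocates the regional maps if needed and then registers an empty VPC and a nil client for every region that a machine uses. A self-contained sketch of the same pattern follows. The types `vpc`, `ec2Client`, and `regionalResources` and the method `ensureRegions` are placeholders for illustration, not the PR's `models` types.

```go
package main

import "fmt"

// vpc and ec2Client are placeholder types standing in for the PR's
// models.AWSVPC and aws_interface.EC2Clienter (assumptions).
type vpc struct{ ID string }
type ec2Client interface{}

type regionalResources struct {
	VPCs    map[string]*vpc
	Clients map[string]ec2Client
}

// ensureRegions lazily allocates both maps and adds an empty entry for
// every region not seen before. Existing entries are left untouched.
func (r *regionalResources) ensureRegions(regions []string) {
	if r.VPCs == nil {
		r.VPCs = make(map[string]*vpc)
	}
	if r.Clients == nil {
		r.Clients = make(map[string]ec2Client)
	}
	for _, region := range regions {
		if _, exists := r.VPCs[region]; !exists {
			r.VPCs[region] = &vpc{}
			r.Clients[region] = nil // filled in once a client is created
		}
	}
}

func main() {
	var rr regionalResources
	rr.ensureRegions([]string{"us-east-1", "us-west-2", "us-east-1"})
	fmt.Println(len(rr.VPCs), len(rr.Clients))
}
```

Because a nil client is stored as a real map entry, later code that looks up a region's client would still need a nil check before using it.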

Expand Down Expand Up @@ -268,6 +275,14 @@ func runDeployment(ctx context.Context, awsProvider *aws_provider.AWSProvider) e
return fmt.Errorf("failed to create infrastructure: %w", err)
}

l.Debug("Infrastructure created successfully")
l.Debug("Creating regional networking...")
if err := awsProvider.CreateRegionalResources(ctx, m.Deployment.AWS.RegionalResources.GetRegions()); err != nil {
return fmt.Errorf("failed to create regional networking: %w", err)
}

l.Debug("Regional networking created successfully")

// Wait for network propagation and connectivity
l.Info("Waiting for network propagation...")
if err := awsProvider.WaitForNetworkConnectivity(ctx); err != nil {
Expand Down Expand Up @@ -301,13 +316,35 @@ func runDeployment(ctx context.Context, awsProvider *aws_provider.AWSProvider) e
)
}

machine.SetMachineResourceState(
models.ServiceTypeSSH.Name,
models.ResourceStatePending,
)

if err := sshConfig.WaitForSSH(ctx, sshutils.SSHRetryAttempts, sshutils.GetAggregateSSHTimeout()); err != nil {
machine.SetMachineResourceState(
models.ServiceTypeSSH.Name,
models.ResourceStateFailed,
)
return fmt.Errorf(
"failed to establish SSH connection to machine %s: %w",
machine.GetName(),
err,
)
}
machine.SetMachineResourceState(
models.ServiceTypeSSH.Name,
models.ResourceStateSucceeded,
)

m.QueueUpdate(display.UpdateAction{
MachineName: machine.GetName(),
UpdateData: display.UpdateData{
UpdateType: display.UpdateTypeResource,
ResourceType: "SSH",
ResourceState: models.MachineResourceState(models.ServiceTypeSSH.State),
},
})

l.Infof("Machine %s is accessible via SSH", machine.GetName())
}
Expand Down Expand Up @@ -398,7 +435,8 @@ func writeConfig() {
machines[name] = map[string]interface{}{
"public_ip": machine.GetPublicIP(),
"private_ip": machine.GetPrivateIP(),
"location": machine.GetLocation(),
"region": machine.GetRegion(),
"zone": machine.GetZone(),
}
}

Expand Down
6 changes: 3 additions & 3 deletions cmd/beta/azure/create_deployment_test.go
Expand Up @@ -224,12 +224,12 @@ func (suite *CmdBetaAzureCreateDeploymentSuite) TestPrepareDeployment() {
// Check if machines were properly configured
suite.Require().Len(deployment.Machines, 1)

var machine *models.Machine
var machine models.Machiner
for _, m := range deployment.Machines {
machine = m.(*models.Machine)
machine = m
break
}
suite.Equal("eastus", machine.GetLocation())
suite.Equal("eastus", machine.GetRegion())
suite.Equal("Standard_D2s_v3", machine.GetVMSize())
suite.True(machine.IsOrchestrator())
}
Expand Down
6 changes: 3 additions & 3 deletions cmd/beta/gcp/create_deployment_test.go
Expand Up @@ -351,9 +351,9 @@ func (suite *CmdBetaGCPCreateDeploymentSuite) TestPrepareDeployment_CustomMachin
// Verify custom machine configuration
suite.Require().Len(deployment.Machines, 2, "Expected 2 machines to be created")
for _, m := range deployment.Machines {
machine, ok := m.(*models.Machine)
suite.Require().True(ok, "Expected machine to be of type *models.Machine")
suite.Equal("us-west1-b", machine.GetLocation(), "Expected location to be us-west1-b")
machine := m
suite.Equal("us-west1", machine.GetRegion(), "Expected region to be us-west1")
suite.Equal("us-west1-b", machine.GetZone(), "Expected zone to be us-west1-b")
suite.Equal(
"n1-standard-4",
machine.GetVMSize(),
Expand Down
8 changes: 4 additions & 4 deletions cmd/beta/gcp/create_vm.go
Expand Up @@ -103,15 +103,15 @@ func createVM(cmd *cobra.Command, args []string) error {
m.Deployment.GCP.BillingAccountID = billingAccountID
m.Deployment.GCP.DefaultRegion = getRegionFromZone(zone)
m.Deployment.GCP.DefaultZone = zone
region := getRegionFromZone(zone)
machine, err := models.NewMachine(
models.DeploymentTypeGCP,
vmName,
machineType,
diskSizeGB,
models.CloudSpecificInfo{
Zone: zone,
Region: getRegionFromZone(zone),
},
region,
zone,
models.CloudSpecificInfo{},
)
if err != nil {
return err
Expand Down
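The hunk above calls `getRegionFromZone(zone)` and passes region and zone to `models.NewMachine` as separate arguments, but the helper's body is not part of this diff. A minimal sketch of one way to derive a GCP region from a zone follows. `regionFromZone` is a hypothetical stand-in, and it assumes the GCP naming scheme in which a zone is the region name plus a final `-<letter>` suffix.

```go
package main

import (
	"fmt"
	"strings"
)

// regionFromZone is a hypothetical stand-in for getRegionFromZone.
// It strips the final "-<suffix>" segment, so "us-west1-b" becomes
// "us-west1". Input without a usable hyphen is returned unchanged.
func regionFromZone(zone string) string {
	i := strings.LastIndex(zone, "-")
	if i <= 0 {
		return zone
	}
	return zone[:i]
}

func main() {
	for _, zone := range []string{"us-west1-b", "us-central1-a"} {
		fmt.Printf("%s -> %s\n", zone, regionFromZone(zone))
	}
}
```

This rule is GCP-specific. AWS zones such as `us-east-1a` append the letter without a hyphen, so the same function would wrongly yield `us-east`. That difference is one reason to store region and zone as separate fields, as this PR does.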
3 changes: 2 additions & 1 deletion cmd/beta/gcp/progress_test.go
Expand Up @@ -39,7 +39,8 @@ func (suite *GCPProgressTestSuite) SetupTest() {
machine := &models.Machine{
ID: "test-machine",
Name: "test-machine",
Location: "us-central1-a",
Region: "us-central1",
Zone: "us-central1-a",
VMSize: "n1-standard-2",
SSHPort: 22,
SSHUser: "test-user",
Expand Down
4 changes: 3 additions & 1 deletion cmd/beta/provision/provisioner_test.go
Expand Up @@ -417,9 +417,11 @@ func (cbpts *CmdBetaProvisionTestSuite) TestProvisionerLowLevelFailure() {

testMachine, err := models.NewMachine(
models.DeploymentTypeAWS,
"us-east-1",
"us-east-1a",
"test",
1,
"us-east-1",
"us-east-1a",
models.CloudSpecificInfo{},
)
testMachine.SetNodeType(models.BacalhauNodeTypeOrchestrator)
Expand Down