
terraform-provider-aws-parallelcluster fails on parallelcluster 3.11.0 with login nodes enabled #6489

Open
kondakovm opened this issue Oct 22, 2024 · 4 comments


@kondakovm

The terraform-provider-aws-parallelcluster fails while parsing the cluster status during creation against the ParallelCluster 3.11 API with login nodes enabled, resulting in the following error:

 Error: Error while waiting for cluster to finish updating.

   with module.parallelcluster_clusters.aws-parallelcluster_cluster.managed_configs["ParallelCluster"],
   on .terraform/modules/parallelcluster_clusters/modules/clusters/main.tf line 35, in resource "aws-parallelcluster_cluster" "managed_configs":
   35: resource "aws-parallelcluster_cluster" "managed_configs" {

 json: cannot unmarshal array into Go struct field _DescribeClusterResponseContent.loginNodes of type map[string]interface {}

Despite this error, the cluster was created and is fully operational, but Terraform cannot read or import it; both operations end with the same error.
This is most likely related to the transition from a single login node to multiple login nodes in a pool.

Additional info:
The deployment with login nodes works on ParallelCluster 3.10.1.
The deployment works on ParallelCluster 3.11.0 without login nodes enabled.

Required Info:

  • AWS ParallelCluster version 3.11.0
  • Full cluster configuration without any credentials or personal data:
 Region: eu-central-1
 CustomS3Bucket: parallelcluster-custom-bucket-name
 Image:
   Os: alinux2
 SharedStorage:
   - MountDir: /home
     Name: parallelcluster_shared
     StorageType: Ebs
     EbsSettings:
       VolumeType: gp3
       Size: 1000
       DeletionPolicy: Delete
 HeadNode:
   InstanceType: c5.large
   LocalStorage:
     RootVolume:
       Size: 100
       VolumeType: gp3
       DeleteOnTermination: true
   Networking:
     SubnetId: subnet-123456789123
   Iam:
     AdditionalIamPolicies:
       - Policy: arn:aws:iam::aws:policy/AmazonSSMManagedInstanceCore
   Ssh:
     KeyName: parallelcluster_ssh_key
 LoginNodes:
   Pools:
     - Name: login
       Count: 1
       InstanceType: t3.small
       Ssh:
         KeyName: parallelcluster_ssh_key
       Networking:
         SubnetIds:
           - subnet-123456789123
       Iam:
         AdditionalIamPolicies:
           - Policy: arn:aws:iam::aws:policy/AmazonSSMManagedInstanceCore
 Scheduling:
   Scheduler: slurm
   SlurmQueues:
     - Name: queue1
       CapacityType: SPOT
       Networking:
         SubnetIds:
           - subnet-123456789123
       Iam:
         AdditionalIamPolicies:
           - Policy: arn:aws:iam::aws:policy/AmazonSSMManagedInstanceCore
       ComputeResources:
         - InstanceType: c5.xlarge
           MinCount: 0
           MaxCount: 10
           Name: c5xlarge
   SlurmSettings:
     QueueUpdateStrategy: TERMINATE
     Dns:
       DisableManagedDns: true
       UseEc2Hostnames: true
@kondakovm added the 3.x label Oct 22, 2024
@hanwen-pcluste
Contributor

Thank you for reporting the issue. We will work on a fix.

@hanwen-pcluste
Contributor

The problem is solved in ParallelCluster 3.11.1. Please use the latest version.

@kondakovm
Author

Thank you for taking care of the issue. Unfortunately, I get the same error when deploying the config against the 3.11.1 API using the ParallelCluster provider:

Error: Error while waiting for cluster to finish updating.
  with module.parallelcluster_clusters.aws-parallelcluster_cluster.managed_configs["ParallelCluster"],
  on .terraform/modules/parallelcluster_clusters/modules/clusters/main.tf line 35, in resource "aws-parallelcluster_cluster" "managed_configs":
  35: resource "aws-parallelcluster_cluster" "managed_configs" {
json: cannot unmarshal array into Go struct field _DescribeClusterResponseContent.loginNodes of type map[string]interface {}

The cluster is created and fully functional, with no errors in the Lambda API logs, but Terraform cannot read or modify the login nodes' status.

@gmarciani
Contributor

Hi @kondakovm,

We are working on the issue with 3.11.1 and will give an update here once we have more info.

Thank you for reporting the issue.
