Request: increase verbosity of pcluster cli when Importing CDK times out #6451

Open
MDBeudekerCN opened this issue Oct 7, 2024 · 3 comments

Comments

@MDBeudekerCN

I'm trying to create a cluster using the pcluster CLI from my local machine.
When the configuration contains errors, it responds with the correct errors and FSids that are in my account, but when the configuration of my cluster is correct, nothing happens:

❯ pcluster create-cluster -n frankfurt-test-2 -c pcluster-example-config.yaml --dryrun true
{
  "message": "Request would have succeeded, but DryRun flag is set."
}

However, when I turn off the dry-run flag:

❯ pcluster create-cluster -n frankfurt-test-2 -c pcluster-reduced.yaml --debug

The command just hangs. Adding the --debug flag does not change anything.
Note that I am using granted to gain access to my AWS account, that I'm able to deploy resources using Terraform, and that I can use the AWS CLI to interact with my resources.

I checked the pcluster log at ~/.parallelcluster/pcluster-cli.log and saw that it was stuck on the following line:

cdk_builder.py:34:build_cluster_template() - importing CDK.....

I used nvm to switch to a newer version of cdk, and now it runs, but it took me quite a long time to track down this specific problem.
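For anyone hitting the same hang, a pre-flight check along these lines could surface the problem before running pcluster. This is only a sketch: later comments in this thread point to an outdated Node.js (version 12) as the trigger, but the minimum major version used here (`MIN_NODE_MAJOR = 14`) is an assumption for illustration, not a documented ParallelCluster requirement, and `parse_node_major` and `check_node` are hypothetical helper names.

```python
import re
import shutil
import subprocess

# Assumption: illustrative threshold only, not an official ParallelCluster requirement.
MIN_NODE_MAJOR = 14


def parse_node_major(version_output):
    """Return the major version from `node --version` output (e.g. 'v12.0.0'), or None."""
    match = re.match(r"v?(\d+)\.", version_output.strip())
    return int(match.group(1)) if match else None


def check_node(min_major=MIN_NODE_MAJOR):
    """Return a human-readable problem description, or None if Node.js looks usable."""
    node = shutil.which("node")
    if node is None:
        return "Node.js was not found on PATH"
    try:
        result = subprocess.run(
            [node, "--version"], capture_output=True, text=True, timeout=10
        )
    except (OSError, subprocess.TimeoutExpired) as exc:
        return f"could not run {node}: {exc}"
    major = parse_node_major(result.stdout)
    if major is None:
        return f"could not parse a Node.js version from {result.stdout!r}"
    if major < min_major:
        return (
            f"Node.js {result.stdout.strip()} is older than v{min_major}; "
            "the CDK import may hang silently"
        )
    return None


if __name__ == "__main__":
    problem = check_node()
    print(problem or "Node.js version looks fine")
```

Running the script before `pcluster create-cluster` prints either a warning or a short confirmation. It does not touch AWS or the cluster configuration.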

ParallelCluster version:
3.11.0

  • Cluster name:
    frankfurt-test-2
  • Output of pcluster describe-cluster command.
    N/A - cluster would not create
  • [Optional] Arn of the cluster CloudFormation main stack:
    N/A - cluster would not create

Bug description and how to reproduce:
I don't get a bug, just no feedback at all.

For issues with Slurm scheduler, please attach the following logs:

  • From Head node: /var/log/parallelcluster/clustermgtd, /var/log/parallelcluster/clusterstatusmgtd (if version >= 3.2.0), /var/log/parallelcluster/slurm_resume.log, /var/log/parallelcluster/slurm_suspend.log, /var/log/parallelcluster/slurm_fleet_status_manager.log (if version >= 3.2.0) and /var/log/slurmctld.log.
  • From Compute node: /var/log/parallelcluster/computemgtd.log and /var/log/slurmd.log.
@hanwen-pcluste
Contributor

Hello Maurits,

Sorry for the late reply. To help us reproduce the issue, can you provide the version of CDK that was giving you trouble?

Thank you!

@tachylatus

tachylatus commented Oct 18, 2024

I just started setting up ParallelCluster for the first time this week, and this is by far the biggest headache and time waster I stumbled upon, as I assumed the issue was somewhere in AWS (IAM, networking etc.).

Steps to reproduce on Ubuntu 22.04.5, using a manually compiled Python 3.12:

# this installs the ancient Node.js version 12, which triggers this issue
sudo apt-get install nodejs
# setup virtual environment, upgrade pip and install latest aws-parallelcluster
python3.12 -m venv ~/.local/opt/pcluster
~/.local/opt/pcluster/bin/pip install -U pip
~/.local/opt/pcluster/bin/pip install -U aws-parallelcluster
# install symlink to pcluster
ln -sft ~/.local/bin/ ~/.local/opt/pcluster/bin/pcluster
# create-cluster silently hangs, with "Importing CDK..." in ~/.parallelcluster/pcluster-cli.log
pcluster create-cluster --cluster-name test --cluster-configuration cluster-config.yaml --debug

Output of pcluster/bin/pip freeze:

attrs==23.2.0
aws-cdk.assets==1.204.0
aws-cdk.aws-acmpca==1.204.0
aws-cdk.aws-apigateway==1.204.0
aws-cdk.aws-applicationautoscaling==1.204.0
aws-cdk.aws-autoscaling==1.204.0
aws-cdk.aws-autoscaling-common==1.204.0
aws-cdk.aws-autoscaling-hooktargets==1.204.0
aws-cdk.aws-batch==1.204.0
aws-cdk.aws-certificatemanager==1.204.0
aws-cdk.aws-cloudformation==1.204.0
aws-cdk.aws-cloudfront==1.204.0
aws-cdk.aws-cloudwatch==1.204.0
aws-cdk.aws-codebuild==1.204.0
aws-cdk.aws-codecommit==1.204.0
aws-cdk.aws-codeguruprofiler==1.204.0
aws-cdk.aws-codestarnotifications==1.204.0
aws-cdk.aws-cognito==1.204.0
aws-cdk.aws-dynamodb==1.204.0
aws-cdk.aws-ec2==1.204.0
aws-cdk.aws-ecr==1.204.0
aws-cdk.aws-ecr-assets==1.204.0
aws-cdk.aws-ecs==1.204.0
aws-cdk.aws-efs==1.204.0
aws-cdk.aws-elasticloadbalancing==1.204.0
aws-cdk.aws-elasticloadbalancingv2==1.204.0
aws-cdk.aws-events==1.204.0
aws-cdk.aws-fsx==1.204.0
aws-cdk.aws-globalaccelerator==1.204.0
aws-cdk.aws-iam==1.204.0
aws-cdk.aws-imagebuilder==1.204.0
aws-cdk.aws-kinesis==1.204.0
aws-cdk.aws-kms==1.204.0
aws-cdk.aws-lambda==1.204.0
aws-cdk.aws-logs==1.204.0
aws-cdk.aws-route53==1.204.0
aws-cdk.aws-route53-targets==1.204.0
aws-cdk.aws-s3==1.204.0
aws-cdk.aws-s3-assets==1.204.0
aws-cdk.aws-sam==1.204.0
aws-cdk.aws-secretsmanager==1.204.0
aws-cdk.aws-servicediscovery==1.204.0
aws-cdk.aws-signer==1.204.0
aws-cdk.aws-sns==1.204.0
aws-cdk.aws-sns-subscriptions==1.204.0
aws-cdk.aws-sqs==1.204.0
aws-cdk.aws-ssm==1.204.0
aws-cdk.aws-stepfunctions==1.204.0
aws-cdk.cloud-assembly-schema==1.204.0
aws-cdk.core==1.204.0
aws-cdk.custom-resources==1.204.0
aws-cdk.cx-api==1.204.0
aws-cdk.region-info==1.204.0
aws-parallelcluster==3.11.0
boto3==1.35.42
botocore==1.35.42
cattrs==23.1.2
certifi==2024.8.30
charset-normalizer==3.4.0
click==8.1.7
clickclick==20.10.2
connexion==2.13.1
constructs==3.4.344
Flask==2.2.5
idna==3.10
importlib_resources==6.4.5
inflection==0.5.1
itsdangerous==2.2.0
Jinja2==3.1.4
jmespath==0.10.0
jsii==1.85.0
jsonschema==4.23.0
jsonschema-specifications==2024.10.1
MarkupSafe==3.0.1
marshmallow==3.22.0
packaging==24.1
publication==0.0.3
python-dateutil==2.9.0.post0
PyYAML==6.0.2
referencing==0.35.1
requests==2.32.3
rpds-py==0.20.0
s3transfer==0.10.3
setuptools==69.5.1
six==1.16.0
tabulate==0.8.10
typeguard==2.13.3
typing_extensions==4.12.2
urllib3==2.2.3
Werkzeug==2.3.8

@hanwen-pcluste
Contributor

hanwen-pcluste commented Oct 21, 2024

I can reproduce the issue. As I understand it, the issue is caused by the outdated Node.js, right? I am working on improving the ParallelCluster code.

Thank you for the reproducer,
Hanwen
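
One possible shape for the requested verbosity is a watchdog that prints a hint when a step runs longer than expected. The sketch below is not ParallelCluster's implementation: `warn_if_slow` is a hypothetical helper, the 30-second default is an arbitrary assumption, and it only helps when the blocked step releases the GIL (for example, while waiting on a subprocess), so that the timer thread can run.

```python
import sys
import threading
from contextlib import contextmanager


@contextmanager
def warn_if_slow(description, seconds=30.0, emit=None):
    """Emit a one-time hint if the wrapped block is still running after `seconds`."""
    if emit is None:
        # Default: write the hint to stderr so it is visible on the console.
        def emit(message):
            print(message, file=sys.stderr)

    message = (
        f"{description} is taking longer than {seconds:g}s; "
        "check that a recent Node.js is installed and on PATH"
    )
    timer = threading.Timer(seconds, emit, args=(message,))
    timer.daemon = True  # never keep the process alive just for the hint
    timer.start()
    try:
        yield
    finally:
        # If the block finished in time, the hint is never emitted.
        timer.cancel()


if __name__ == "__main__":
    import time

    with warn_if_slow("demo step", seconds=0.1):
        time.sleep(0.3)  # stands in for a slow import; the hint is printed once
```

Wrapping the slow import in `with warn_if_slow("Importing CDK"):` would leave the fast path unchanged and print a single line to the console on the slow path, instead of writing only to the log file.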
