Merge pull request #1256 from shirady/troubleshoot-sts
Doc | Add Guide TroubleShoot AWS STS Cluster
shirady authored Dec 25, 2023
2 parents 6970bb0 + 9db5c86 commit 22c083c
Showing 1 changed file with 129 additions and 0 deletions.
129 changes: 129 additions & 0 deletions doc/dev_guide/troubleshoot_sts_cluster.md
@@ -0,0 +1,129 @@
# TroubleShoot AWS STS Cluster
This guide covers errors we have seen during installation and how to investigate and fix them. They occur when installing noobaa on an AWS STS cluster: the system stays in phase 'Configuring' while noobaa tries to create the default backingstore that matches the AWS STS platform.

Please open the operator logs:

```bash
kubectl logs <operator-pod> -n <your-namespace> -f
```

Note: The log excerpts below are redacted; in your own logs you will see the actual values:
- The role ARN is replaced with `<role-ARN>`.
- The request id is replaced with `<request-id>`.
- The namespace in the examples is "test1".
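
To triage quickly, the operator log can be matched against the error strings that open each of the four sections below. This is a minimal sketch, not part of noobaa: `classify_sts_error` is a hypothetical helper name, and the patterns are copied from the log excerpts in this guide, so they may need adjusting if the operator's wording changes.

```bash
# Map operator log lines (read from stdin) to the numbered issues in this guide.
# classify_sts_error is a hypothetical helper name, not a noobaa tool.
classify_sts_error() {
  local line
  while IFS= read -r line; do
    case "$line" in
      *"Not authorized to perform sts:AssumeRoleWithWebIdentity"*)
        echo "1) Wrong role" ;;
      *"No OpenIDConnect provider found"*)
        echo "2) Cluster configurations" ;;
      *"Incorrect token audience"*)
        echo "3) Wrong audience" ;;
      *"could not read WebIdentityToken"*)
        echo "4) Missing details" ;;
    esac
  done
}

# Usage against a live cluster (placeholders as in the command above):
#   kubectl logs <operator-pod> -n <your-namespace> | classify_sts_error | sort -u
```

Lines that match none of the patterns are skipped, so an empty result means the log shows a different problem from the ones listed here.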

### Main issues:
#### 1) Wrong role:

```
time="2023-11-26T15:26:25Z" level=warning msg="⏳ Temporary Error: could not use AWS AssumeRoleWithWebIdentity with role name <role-ARN> and web identity token file /var/run/secrets/openshift/serviceaccount/token, AccessDenied: Not authorized to perform sts:AssumeRoleWithWebIdentity\n\tstatus code: 403, request id: <request-id>" sys=test1/noobaa
```

##### Solution:
Edit the role in the AWS console.
You can use the script in `scripts/create_aws_role.sh` to help you change the role.
Most of the time it is related to the "Condition" part under trusted entities.

Also verify that the token is projected into the operator pod:
```bash
MY_TOKEN=$(kubectl exec $(kubectl get pods -n <your-namespace> | grep operator | awk '{ print $1}') -c noobaa-operator -n <your-namespace> -- cat /var/run/secrets/openshift/serviceaccount/token)
```

```bash
echo ${MY_TOKEN}
```
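
The token is a JWT, so its claims can be inspected directly. The sketch below decodes the payload without verifying the signature; `decode_jwt_payload` is a hypothetical helper name, and it assumes a `base64` that accepts `-d` (as used elsewhere in this guide).

```bash
# Print the JSON payload (claims) of a JWT without verifying its signature.
# decode_jwt_payload is a hypothetical helper name used only in this sketch.
decode_jwt_payload() {
  local payload
  # A JWT is <header>.<payload>.<signature>; take the middle field and
  # convert the base64url alphabet back to standard base64.
  payload=$(printf '%s' "$1" | cut -d '.' -f 2 | tr '_-' '/+')
  # base64url drops the '=' padding, so restore it before decoding.
  case $(( ${#payload} % 4 )) in
    2) payload="${payload}==" ;;
    3) payload="${payload}=" ;;
  esac
  printf '%s' "$payload" | base64 -d
  echo
}

# Usage: decode_jwt_payload "${MY_TOKEN}"
```

In the decoded claims, `iss` is the issuer, `aud` is the audience, and `sub` has the form `system:serviceaccount:<namespace>:<service-account-name>`. These are the values that a trust policy condition is usually compared against, so a mismatch here is a likely cause of `AccessDenied`.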

To verify that the issue is with the role, test it with `assume-role-with-web-identity`:

```bash
aws sts assume-role-with-web-identity --role-arn <role-ARN> --role-session-name "test" --web-identity-token ${MY_TOKEN}
```

The output should contain the credentials (`AccessKeyId`, `SecretAccessKey`, and `SessionToken`); an output example is in `doc/dev_guide/create_aws_sts_setup_on_minikube.md`. If the role is wrong you will still see `AccessDenied`; in that case, create a new role with the script and test it.

#### 2) Cluster configurations

```
time="2023-11-26T15:17:53Z" level=warning msg="⏳ Temporary Error: could not use AWS AssumeRoleWithWebIdentity with role <role-ARN> and web identity token file /var/run/secrets/openshift/serviceaccount/token, InvalidIdentityToken: No OpenIDConnect provider found in your account for https://kubernetes.default.svc\n\tstatus code: 400, request id: <request-id>" sys=test1/noobaa
```

##### Solution:
Make sure that the service account issuer is set on the cluster. Run:

```bash
oc get authentication cluster -o jsonpath --template='{ .spec.serviceAccountIssuer }'
```

The structure of the output should be:
1) If the OIDC configuration is stored in a public S3 bucket: `https://<oidc_bucket_name>.s3.<aws_region>.amazonaws.com`.
2) If the OIDC configuration is stored in a private S3 bucket (served through a public CloudFront distribution URL): `https://d111111abcdef8.cloudfront.net` (this example domain is taken from the [AWS docs](https://docs.aws.amazon.com/AmazonCloudFront/latest/DeveloperGuide/GettingStartedCreateDistribution.html)).

Please follow the OpenShift documentation to configure the issuer.
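
AWS can only trust an issuer it can reach, and the issuer in the error above, `https://kubernetes.default.svc`, is a cluster-internal address. As a sketch of a sanity check, the helper below rejects an empty or in-cluster issuer and otherwise prints the standard OIDC discovery URL, which should be fetchable from outside the cluster. `issuer_discovery_url` is a hypothetical helper name.

```bash
# Validate a service account issuer and print its OIDC discovery URL.
# issuer_discovery_url is a hypothetical helper name used only in this sketch.
issuer_discovery_url() {
  case "$1" in
    "")
      echo "serviceAccountIssuer is not set" >&2
      return 1 ;;
    https://kubernetes.default.svc*)
      echo "issuer is the in-cluster address, which AWS cannot reach" >&2
      return 1 ;;
  esac
  # Strip one trailing slash so the URL has no '//' before '.well-known'.
  printf '%s/.well-known/openid-configuration\n' "${1%/}"
}

# Usage (requires cluster access and curl):
#   ISSUER=$(oc get authentication cluster -o jsonpath --template='{ .spec.serviceAccountIssuer }')
#   curl -s "$(issuer_discovery_url "${ISSUER}")"
```

If the `curl` call returns a JSON document whose `issuer` field equals the configured value, the remaining thing to check is that an identity provider for that same URL exists in the AWS account.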

#### 3) Wrong audience

```
time="2023-11-27T08:05:43Z" level=warning msg="⏳ Temporary Error: could not use AWS AssumeRoleWithWebIdentity with role name <role-ARN> and web identity token file /var/run/secrets/openshift/serviceaccount/token, InvalidIdentityToken: Incorrect token audience\n\tstatus code: 400, request id: <request-id>" sys=test1/noobaa
```

##### Solution:
Make the audience match between the created role and the identity provider, for example:
- `api` - as we did in the local cluster example in `doc/dev_guide/create_aws_sts_setup_on_minikube.md`.
- `openshift` - as needed in an OpenShift cluster.

#### 4) Missing details:

```
time="2023-11-27T07:50:20Z" level=info msg="Secret noobaa-aws-cloud-creds-secret was created successfully by cloud-credentials operator" sys=test1/noobaa
time="2023-11-27T07:50:20Z" level=info msg="identified aws region us-east-2" sys=test1/noobaa
time="2023-11-27T07:50:20Z" level=info msg="Initiating a Session with AWS" sys=test1/noobaa
time="2023-11-27T07:50:20Z" level=info msg="AssumeRoleWithWebIdentityInput, roleARN = webIdentityTokenPath = , " sys=test1/noobaa
time="2023-11-27T07:50:20Z" level=info msg="SetPhase: temporary error during phase \"Configuring\"" sys=test1/noobaa
time="2023-11-27T07:50:20Z" level=warning msg="⏳ Temporary Error: could not read WebIdentityToken from path , open : no such file or directory" sys=test1/noobaa
```

##### Solution:
The Cloud Credential Operator (CCO) did not create the needed secret: it created a secret for the plain AWS platform (static access keys) instead of one for AWS STS.
- Check that the secret contains the needed elements for AWS STS (role ARN and path for the token):

```bash
kubectl get secret noobaa-aws-cloud-creds-secret -n <your-namespace> -o json | jq -r '.data.credentials' | base64 -d
```

With the wrong secret you would see this structure:

```
[default]
aws_access_key_id = <access-key>
aws_secret_access_key = <secret-access-key>
```

instead of:

```
[default]
sts_regional_endpoints = regional
role_arn = <role-ARN>
web_identity_token_file = /var/run/secrets/openshift/serviceaccount/token
```
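
The comparison between the two structures can be scripted. This is a minimal sketch: `credentials_kind` is a hypothetical helper name, and it only looks for the key names shown in the two examples above.

```bash
# Report whether an AWS credentials file (read from stdin) carries the STS keys.
# credentials_kind is a hypothetical helper name used only in this sketch.
credentials_kind() {
  local content
  content=$(cat)
  if printf '%s\n' "$content" | grep -q '^role_arn[[:space:]]*=' &&
     printf '%s\n' "$content" | grep -q '^web_identity_token_file[[:space:]]*='; then
    echo "sts"      # the structure noobaa needs on an AWS STS cluster
  elif printf '%s\n' "$content" | grep -q '^aws_access_key_id[[:space:]]*='; then
    echo "static"   # the plain AWS structure that causes this error
  else
    echo "unknown"
  fi
}

# Usage:
#   kubectl get secret noobaa-aws-cloud-creds-secret -n <your-namespace> -o json \
#     | jq -r '.data.credentials' | base64 -d | credentials_kind
```

A result of `static` corresponds to the empty `roleARN` and `webIdentityTokenPath` values in the log excerpt above.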

- Check that the credential request contains the needed elements (role ARN and path):

```bash
kubectl get credentialsrequest noobaa-aws-cloud-creds -n <your-namespace> -o json | grep -E 'stsIAMRoleARN|cloudTokenPath'
```

- Try to delete the credentialsrequest and the secret.

```bash
kubectl delete credentialsrequest noobaa-aws-cloud-creds -n <your-namespace>
```

```bash
kubectl delete secret noobaa-aws-cloud-creds-secret -n <your-namespace>
```

- If the noobaa operator then creates a new credentials request and the secret we get from the CCO still does not match what we need, investigate the CCO logs:

```bash
kubectl logs $(kubectl get pod -n openshift-cloud-credential-operator | grep cloud-credential-operator | awk '{ print $1}') -c cloud-credential-operator -n openshift-cloud-credential-operator --tail 50 -f
```
