

Error Logs When ENI Allocation Fails Due to Insufficient Subnet IPs #3172

Open
YeongJJo opened this issue Jan 8, 2025 · 2 comments

@YeongJJo

YeongJJo commented Jan 8, 2025

What would you like to be added:

Although this issue can be identified through pod events, I believe it would also be useful to surface the error in the logs of pods such as aws-node.

For example:
Error: ENI allocation failed for worker node XXX - all IPs are allocated and the subnet has insufficient available IPs.

Why is this needed:

Description:
When the number of available IPs in the subnet where a worker node is located is smaller than the maximum number of IPs a single ENI can hold, no additional ENI is allocated to the node, and no error is logged.

Detailed Information:

  • In the test environment, there are two worker nodes located in different subnets (for convenience, let's call them A and B).
  • EC2 Type for Nodes A and B: Both node groups use an EC2 instance type that allows up to 30 IP addresses per ENI.
  • Resource Availability: Both nodes A and B have ample CPU and memory resources available.
  • Subnet for Node A: The subnet where node A is located has hundreds of available IPs, leaving plenty of room. As a result, an additional ENI has been allocated to node A, which now has 2 ENIs using a total of 60 IPs (primary + secondary private IPs).

Now, for node B:

  • Pending Pod Requests: There are still some pod creation requests pending.
  • ENI Usage on Node B: The primary ENI on node B has all 30 IPs in use.
  • Subnet for Node B: The subnet where node B is located has only about 20 available IPs left.
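The condition described above can be sketched in Go using the numbers from this scenario. This is a hypothetical illustration of the requested diagnostic, not code from the VPC CNI: the nodeIPState type, the shortageError function, and their fields are invented for this example, and node A's free-IP count of 300 is only a stand-in for "hundreds".

```go
package main

import "fmt"

// nodeIPState is a HYPOTHETICAL summary of one worker node's IP usage.
// None of these names come from the VPC CNI codebase.
type nodeIPState struct {
	NodeName      string
	ENIs          int // ENIs currently attached to the node
	IPsPerENI     int // max IPv4 addresses per ENI for the instance type
	IPsInUse      int // IPs currently assigned on the node
	SubnetFreeIPs int // available IPs left in the node's subnet
}

// shortageError returns the kind of message the issue asks for when every IP
// on the attached ENIs is in use and the subnet has fewer free IPs than a new
// ENI can hold. It returns nil otherwise.
func shortageError(s nodeIPState) error {
	allIPsUsed := s.IPsInUse >= s.ENIs*s.IPsPerENI
	subnetShort := s.SubnetFreeIPs < s.IPsPerENI
	if allIPsUsed && subnetShort {
		return fmt.Errorf("ENI allocation failed for worker node %s - all %d IPs are allocated and the subnet has only %d available IPs (a new ENI can hold up to %d)",
			s.NodeName, s.IPsInUse, s.SubnetFreeIPs, s.IPsPerENI)
	}
	return nil
}

func main() {
	// Node A: 2 ENIs, 60 IPs in use, plenty of free IPs (300 is illustrative).
	nodeA := nodeIPState{NodeName: "A", ENIs: 2, IPsPerENI: 30, IPsInUse: 60, SubnetFreeIPs: 300}
	// Node B: 1 ENI, all 30 IPs in use, about 20 free IPs in its subnet.
	nodeB := nodeIPState{NodeName: "B", ENIs: 1, IPsPerENI: 30, IPsInUse: 30, SubnetFreeIPs: 20}

	for _, n := range []nodeIPState{nodeA, nodeB} {
		if err := shortageError(n); err != nil {
			fmt.Println("error:", err)
		} else {
			fmt.Printf("node %s: no shortage detected\n", n.NodeName)
		}
	}
}
```

A real check would presumably also need to consider whether the instance can attach another ENI at all (its maximum ENI count) and any warm IP/ENI settings; those are deliberately left out of this sketch.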

Observed Behavior

  • Since both nodes have ample CPU and memory resources, node scaling does not occur.
  • In this case, no additional ENIs are allocated to node B.
  • Pod creation continuously fails with the following error event:
    plugin type="aws-cni" name="aws-cni" failed (add): add cmd: failed to assign an IP address to container

Additional Information:

  • However, there are no error or warning logs in the aws-node pod.
  • Although this issue can be identified through pod events, I believe it would also be useful to surface the error in the logs of pods such as aws-node.
  • For example:
    Error: The ENI of worker node XXX has all IPs allocated and the subnet has insufficient available IPs.
@dshehbaj
Member

Hi @YeongJJo

I noticed that we do print an error log when ENI allocation fails or when there are not enough IPv4 addresses/prefixes available:

ds.log.Errorf("DataStore has no available IP/Prefix addresses")

log.Errorf("Unable to attach IPs/Prefixes for the ENI, subnet doesn't seem to have enough IPs/Prefixes. Consider using new subnet or carve a reserved range using create-subnet-cidr-reservation")

I just want to confirm that this is what you mean by logs, and whether you see these messages printed when you run into the scenario you described above.

@yash97
Contributor

yash97 commented Jan 17, 2025

To add to this: aws-node logs are written to the /var/log/aws-routed-eni directory, and kubectl logs won't show all of them. This directory is a hostPath volume, so you can access it from the worker node.

