Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Configure proper priority for route tables #2858

Merged
merged 1 commit into from
Dec 31, 2024

Conversation

hanwen-cluster
Copy link
Contributor

@hanwen-cluster hanwen-cluster commented Dec 31, 2024

#2855 made the pcluster route table/metric number larger (meaning lower priority). Thereafter, some unwanted default rules on AL2023 ec2-net-utils took priority and failed test_multiple_nics integration test on AL2023. Then, #2857 made the number too small, interfering route table configurations from IMDS on AL2.

Therefore, this commit tries to imitate the priority prior to these two PRs. This is not the cleanest fix, because it is staying in the lucky priority range instead of fully resolving the issue (i.e. prevent IMDS and ec2-net-utils from configuring the route tables). However, this commit is the least breaking change. So I propose to go with this commit.

Metric number range before the two PRs
Network card (0,0): 1000
Network card (0,1): 1000 (which was causing conflicts and the reason for all these PRs)
Network card (n,1): 100n (for p5, which has 32 network card, it will be 1000-10031)

Metric number range after the first PR:
Network card (0,0): 1000000
Network card (0,1): 1000001 (conflict fixed :) )
Network card (n,1): 1000001+n*1000 (for p5, it will be 1000000-1031001)

Metric number range after the second PR:
Network card (0,0): 10
Network card (0,1): 75
Network card (n,1): 0x(hexadecimal number)n01+10 (for p5, it will be 10-12555. The hexadecimal number was accidentally introduced because bash automatically interpret numbers start with "00" as hexadecimal number)

Metric number range after this commit:
Network card (0,0): 1000
Network card (0,1): 1001
Network card (n,1): n01+1000 (for p5, it will be 1000-4101)

Tests

  • multi-nic tests on RHEL8, Rocky8, AL2, AL2023, Ubuntu20, Ubuntu22 have passed

Checklist

  • Make sure you are pointing to the right branch.
  • If you're creating a patch for a branch other than develop add the branch name as prefix in the PR title (e.g. [release-3.6]).
  • Check all commits' messages are clear, describing what and why vs how.
  • Make sure to have added unit tests or integration tests to cover the new/modified code.
  • Check if documentation is impacted by this change.

Please review the guidelines for contributing and Pull Request Instructions.

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.

aws#2855 made the pcluster route table/metric number larger (meaning lower priority). Thereafter, some unwanted default rules on AL2023 ec2-net-utils took priority and failed test_multiple_nics integration test on AL2023.
Then, aws#2857 made the number too small, interfering route table configurations from IMDS on AL2.

Therefore, this commit tries to imitate the priority prior to these two PRs. This is not the cleanest fix, because it is staying in the lucky priority rand instead of fully resolving the issue (i.e. prevent IMDS and ec2-net-utils from configuring the route tables). However, this commit is the least breaking change. So I propose to go with this commit.

Metric number range before the two PRs
Network card (0,0): 1000
Network card (0,1): 1000 (which was causing conflicts and the reason for all these PRs)
Network card (n,1): 100n (for p5, which has 32 network card, it will be 1000-10031)

Metric number range after the first PR:
Network card (0,0): 1000000
Network card (0,1): 1000001 (conflict fixed :) )
Network card (n,1): 1000001+n*1000 (for p5, it will be 1000000-1031001)

Metric number range after the second PR:
Network card (0,0): 10
Network card (0,1): 75
Network card (n,1): 0x(hexadecimal number)n01+10 (for p5, it will be 10-12555. The hexadecimal number was accidentally introduced because bash automatically interpret numbers start with "00" as hexadecimal number)

Metric number range after this commit:
Network card (0,0): 1000
Network card (0,1): 1001
Network card (n,1): n01+1000 (for p5, it will be 1000-4101)

Signed-off-by: Hanwen <[email protected]>
@hanwen-cluster hanwen-cluster requested review from a team as code owners December 31, 2024 16:19
Copy link

codecov bot commented Dec 31, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 76.63%. Comparing base (10395d1) to head (ac3e225).
Report is 34 commits behind head on develop.

Additional details and impacted files
@@             Coverage Diff             @@
##           develop    #2858      +/-   ##
===========================================
- Coverage    76.67%   76.63%   -0.04%     
===========================================
  Files           22       22              
  Lines         2242     2243       +1     
===========================================
  Hits          1719     1719              
- Misses         523      524       +1     
Flag Coverage Δ
unittests 76.63% <ø> (-0.04%) ⬇️

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@hanwen-cluster hanwen-cluster enabled auto-merge (rebase) December 31, 2024 16:27
@hanwen-cluster hanwen-cluster merged commit e3949f3 into aws:develop Dec 31, 2024
34 of 46 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants