chore(tests): add network perf tests for Retina #772

ritwikranjan · 2024-09-23T13:28:13Z

Description

This pull request introduces several updates related to performance testing, dependency upgrades, and workflow enhancements. The most important changes include the addition of a new performance measurement workflow, updates to dependencies in go.mod, and modifications to the e2e test setup and execution.

Performance Testing Enhancements:

- Added a new GitHub Actions workflow for network performance measurement that runs every two hours (.github/workflows/perf.yaml).
- Introduced a new performance test script and related functions for gathering and publishing network performance metrics (test/e2e/retina_perf_test.go, test/e2e/scenarios/perf/get-network-performance-measures.go).

Workflow and Configuration Changes:

- Updated the e2e test command to include a more specific file pattern (.github/workflows/e2e.yaml).
- Added the azure-cli feature to the devcontainer configuration (.devcontainer/devcontainer.json).

Documentation:

- Added documentation for reading Retina performance test results and the metrics published to Azure App Insights (test/e2e/README.md).

These changes collectively enhance the testing infrastructure, improve dependency management, and provide better documentation for performance testing.

Checklist

I have read the contributing documentation.
I signed and signed-off the commits (git commit -S -s ...). See this documentation on signing commits.
I have correctly attributed the author(s) of the code.
I have tested the changes locally.
I have followed the project's style guidelines.
I have updated the documentation, if necessary.
I have added tests, if applicable.

- Added new performance tests for iperf throughput (TCP and UDP)
- Metrics include CPU Utilization Host, CPU Utilization Remote, Max RTT, Mean RTT, Min RTT, Retransmits, and Total Throughput

This commit introduces new performance tests to measure iperf throughput under various conditions for the Retina project.

Signed-off-by: Ritwik Ranjan <[email protected]>
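
The metrics named in this commit message could be modeled roughly as follows. This is a minimal sketch, not the PR's code: the `PerfResult` type, its field names, and the `Measurements` and `Summary` helpers are hypothetical, and units are deliberately left unspecified because the commit message does not state them.

```go
package main

import (
	"fmt"
	"sort"
)

// PerfResult is a hypothetical record for a single iperf run. The fields
// mirror the metric names in the commit message, not the PR's actual types.
type PerfResult struct {
	Protocol             string // "TCP" or "UDP"
	CPUUtilizationHost   float64
	CPUUtilizationRemote float64
	MaxRTT               float64
	MeanRTT              float64
	MinRTT               float64
	Retransmits          float64
	TotalThroughput      float64
}

// Measurements flattens the result into name/value pairs, a shape that is
// convenient to attach to a single telemetry event.
func (r PerfResult) Measurements() map[string]float64 {
	return map[string]float64{
		"CPU Utilization Host":   r.CPUUtilizationHost,
		"CPU Utilization Remote": r.CPUUtilizationRemote,
		"Max RTT":                r.MaxRTT,
		"Mean RTT":               r.MeanRTT,
		"Min RTT":                r.MinRTT,
		"Retransmits":            r.Retransmits,
		"Total Throughput":       r.TotalThroughput,
	}
}

// Summary renders the measurements as "protocol/name=value" strings in a
// stable, alphabetically sorted order.
func Summary(r PerfResult) []string {
	m := r.Measurements()
	keys := make([]string, 0, len(m))
	for k := range m {
		keys = append(keys, k)
	}
	sort.Strings(keys)
	out := make([]string, 0, len(keys))
	for _, k := range keys {
		out = append(out, fmt.Sprintf("%s/%s=%.2f", r.Protocol, k, m[k]))
	}
	return out
}
```

Keeping one flat map per run means a TCP result and a UDP result share the same metric names and differ only in the `Protocol` value, which makes them easy to compare side by side.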

SRodi

I ran the test in uksouth and got this:

                                --------------------------------------------------------------------------------
                                RESPONSE 400: 400 Bad Request
                                ERROR CODE: ErrCode_InsufficientVCPUQuota
                                --------------------------------------------------------------------------------
                                {
                                  "code": "ErrCode_InsufficientVCPUQuota",
                                  "details": null,
                                  "message": "Insufficient regional vcpu quota left for location uksouth. left regional vcpu quota 20, requested quota 36",
                                  "subcode": ""
                                }
                                --------------------------------------------------------------------------------
                Test:           TestPerfRetina

I also ran the test in westus2. Quota was not an issue there, but I got the following:

2024/09/27 17:48:52 DaemonSet is not ready: kube-system/retina-agent. 0 out of 3 expected pods are ready
2024/09/27 17:48:54 DaemonSet is not ready: kube-system/retina-agent. 0 out of 3 expected pods are ready
2024/09/27 17:48:56 DaemonSet is not ready: kube-system/retina-agent. 0 out of 3 expected pods are ready
2024/09/27 17:48:58 DaemonSet is not ready: kube-system/retina-agent. 0 out of 3 expected pods are ready
2024/09/27 17:49:00 DaemonSet is not ready: kube-system/retina-agent. 0 out of 3 expected pods are ready
2024/09/27 17:49:02 Error received when checking status of resource retina-svc. Error: 'client rate limiter Wait returned an error: rate: Wait(n=1) would exceed context deadline', Resource details: 'Resource: "/v1, Resource=services", GroupVersionKind: "/v1, Kind=Service"
Name: "retina-svc", Namespace: "kube-system"'
2024/09/27 17:49:02 Retryable error? true
2024/09/27 17:49:02 Retrying as current number of retries 0 less than max number of retries 30
    runner.go:27: 
                Error Trace:    /home/srodi/src/retina/test/e2e/framework/types/runner.go:27
                                                        /home/srodi/src/retina/test/e2e/retina_perf_test.go:65
                Error:          Received unexpected error:
                                did not expect error from step InstallHelmChart but got error: failed to install chart: context deadline exceeded
                Test:           TestPerfRetina
DeleteResourceGroup setting stored value for parameter [SubscriptionID] set as [......-.....-....-....-.........]
DeleteResourceGroup setting stored value for parameter [ResourceGroupName] set as [srodi-e2e-netobs-1727452628]
DeleteResourceGroup setting stored value for parameter [Location] set as [westus2]
#################### DeleteResourceGroup ################################################################
2024/09/27 17:49:02 deleting resource group "srodi-e2e-netobs-1727452628"...
2024/09/27 17:49:05 resource group "srodi-e2e-netobs-1727452628" deleted successfully
--- FAIL: TestPerfRetina (3269.87s)

FYI @ritwikranjan

SRodi · 2024-10-02T14:40:03Z

@ritwikranjan I just got another failure due to insufficient quota, this time in centralus. I would suggest making sure the test can run in any of the regions in the locations slice ([]string{"eastus2", "centralus", "southcentralus", "uksouth", "centralindia", "westus2"}).

    runner.go:27: 
                Error Trace:    /home/srodi/src/retina/test/e2e/framework/types/runner.go:27
                                                        /home/srodi/src/retina/test/e2e/retina_perf_test.go:52
                Error:          Received unexpected error:
                                did not expect error from step CreateNPMCluster but got error: failed to finish the create cluster request: PUT https://management.azure.com/subscriptions/....-....-....-....-.........../resourceGroups/srodi-e2e-netobs-1727879517/providers/Microsoft.ContainerService/managedClusters/srodi-e2e-netobs-1727879517
                                --------------------------------------------------------------------------------
                                RESPONSE 400: 400 Bad Request
                                ERROR CODE: ErrCode_InsufficientVCPUQuota
                                --------------------------------------------------------------------------------
                                {
                                  "code": "ErrCode_InsufficientVCPUQuota",
                                  "details": null,
                                  "message": "Insufficient vcpu quota requested 32, remaining 0 for family standardDSv2Family for region centralus.",
                                  "subcode": ""
                                }
                                --------------------------------------------------------------------------------
                Test:           TestE2EPerfRetina
--- FAIL: TestE2EPerfRetina (26.22s)
FAIL
FAIL    command-line-arguments  26.239s
FAIL

SRodi

@ritwikranjan I am getting the following error when running the test against the most recent commit:

    runner.go:27: 
                Error Trace:    /home/srodi/src/retina/test/e2e/framework/types/runner.go:27
                                                        /home/srodi/src/retina/test/e2e/retina_perf_test.go:63
                Error:          Received unexpected error:
                                did not expect error from step GetNetworkPerformanceMeasures but got error: failed to get network performance measures: failed to execute tests: error getting CSV data from orchestrator pod: error reading logs from pod netperf-orch-59dsc: the server rejected our request for an unknown reason (get pods netperf-orch-59dsc)
                Test:           TestE2EPerfRetina

ritwikranjan · 2024-10-03T16:04:04Z

This will help with identifying issue #655.

ritwikranjan requested a review from a team as a code owner September 23, 2024 13:28

ritwikranjan requested review from alexcastilio and karina-ranadive September 23, 2024 13:28

ritwikranjan changed the title ~~[WIP] chore/ Network perf test for Retina~~ [WIP] chore/tests: add network perf tests for Retina Sep 23, 2024

Merge branch 'main' into chore/add-perf-tests

a10c8e5

SRodi reviewed Sep 25, 2024

matmerr reviewed Sep 25, 2024

test/e2e/scenarios/perf/get-network-performance-measures.go (outdated)

address PR comments and update perf-tests dependency

8d90d10

Signed-off-by: Ritwik Ranjan <[email protected]>

ritwikranjan changed the title ~~[WIP] chore/tests: add network perf tests for Retina~~ chore/tests: add network perf tests for Retina Sep 27, 2024

ritwikranjan changed the title ~~chore/tests: add network perf tests for Retina~~ chore(tests): add network perf tests for Retina Sep 27, 2024

Merge branch 'microsoft:main' into chore/add-perf-tests

a3e3455

SRodi reviewed Sep 27, 2024

fix: remove go.mod version bump

34b158a

ritwikranjan self-assigned this Oct 1, 2024

ritwikranjan added the type/enhancement New feature or request label Oct 1, 2024

fix: downgrading the accidently upgraded k8s modules

0432003

SRodi reviewed Oct 1, 2024

go.mod (outdated)

test/e2e/scenarios/perf/get-network-performance-measures.go (outdated)

timraymond reviewed Oct 1, 2024

test/e2e/retina_perf_test.go

test/e2e/retina_perf_test.go (outdated)

test/e2e/retina_perf_test.go (outdated)

fix: address PR comments

4de9504

Signed-off-by: Ritwik Ranjan <[email protected]>

ritwikranjan requested review from timraymond, matmerr, anubhabMajumdar and SRodi October 2, 2024 14:19

SRodi reviewed Oct 2, 2024

ritwikranjan added 2 commits October 3, 2024 11:44

Merge branch 'main' into chore/add-perf-tests

333fad2

fixed stuff after merge from main

1d1da2c

Signed-off-by: Ritwik Ranjan <[email protected]>

chore: add app insights workflow for retina perf test

ca3ad39

Signed-off-by: Ritwik Ranjan <[email protected]>

Merge branch 'main' into chore/add-perf-tests

5c64a8e

Signed-off-by: Ritwik Ranjan <[email protected]>

ritwikranjan dismissed SRodi’s stale review via 5c64a8e October 9, 2024 21:30

ritwikranjan added 3 commits October 14, 2024 16:42

Merge branch 'main' into chore/add-perf-tests

9dc522e

fix: add image tag of retina as a property of the metric

5169330

Merge branch 'main' into chore/add-perf-tests

220faf8

Signed-off-by: Ritwik Ranjan <[email protected]>

jimassa reviewed Oct 23, 2024

test/e2e/scenarios/perf/publish-perf-results.go (outdated)

ritwikranjan and others added 9 commits October 23, 2024 15:31

fix: use trackEvent instead of trackMetric for perf results

a412d26

Merge branch 'main' into chore/add-perf-tests

8e64af1

Use ms/retina to perform the tests

530d1b3

Signed-off-by: Ritwik Ranjan <[email protected]>

Remove az login condition

ddba587

Signed-off-by: Ritwik Ranjan <[email protected]>

fix: remove condition from az login

a184122

Increase test timeout

c2119b6

Signed-off-by: Ritwik Ranjan <[email protected]>

fix: Increase test timeout to 2h from 30m

8b0aa3b

Signed-off-by: Ritwik Ranjan <[email protected]>

Update perf.yaml

58adeab

Merge branch 'microsoft:main' into chore/add-perf-tests

42dfca9

timraymond reviewed Oct 24, 2024

test/e2e/scenarios/perf/get-network-performance-measures.go (outdated)

test/e2e/retina_perf_test.go (outdated)

ritwikranjan mentioned this pull request Oct 30, 2024

Perf test for amd64 based linux image #914

Open

ritwikranjan and others added 3 commits November 4, 2024 14:05

Merge branch 'main' into chore/add-perf-tests

dd08b15

address PR comments

e6d6001

Merge branch 'test/add-perf-tests' into chore/add-perf-tests

485e3e2

Signed-off-by: Ritwik Ranjan <[email protected]>

anubhabMajumdar reviewed Nov 4, 2024

.github/workflows/perf.yaml (outdated)

test/e2e/retina_perf_test.go (outdated)

test/e2e/scenarios/perf/get-perf-regression-results.go (outdated)

ritwikranjan added 4 commits November 5, 2024 12:42

address PR comments

990edd3

Merge branch 'main' into chore/add-perf-tests

99f4ba1

Signed-off-by: Ritwik Ranjan <[email protected]>

run go mod tidy after merge

3557a72

Merge branch 'main' into chore/add-perf-tests

9c4d806

anubhabMajumdar reviewed Nov 5, 2024

test/e2e/scenarios/perf/get-perf-regression-results.go (outdated)

test/e2e/retina_perf_test.go (outdated)

ritwikranjan added 4 commits November 5, 2024 16:35

address pr comments

e922434

Merge remote-tracking branch 'upstream/main' into chore/add-perf-tests

f434a69

go mod tidy

5f523c7

Add perf test run on merge group

de49986
