Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add ability to run the same tests with multi docker image variants #5547

Merged
merged 20 commits into from
Sep 20, 2024

Conversation

blakerouse
Copy link
Contributor

@blakerouse blakerouse commented Sep 16, 2024

What does this PR do?

Extends the integration testing framework for Kubernetes to allow multiple docker images to be used for the tests.

Why is it important?

Providers better coverage of our docker variants and it allows the tests author to write tests targeting different image variants.

Checklist

  • My code follows the style guidelines of this project
  • I have commented my code, particularly in hard-to-understand areas
  • [ ] I have made corresponding changes to the documentation
  • [ ] I have made corresponding change to the default configuration files
  • I have added tests that prove my fix is effective or that my feature works
  • [ ] I have added an entry in ./changelog/fragments using the changelog tool
  • I have added an integration test or an E2E test

How to test this PR locally

  • INSTANCE_PROVISIONER="kind" SNAPSHOT="true" mage integration:kubernetes
  • INSTANCE_PROVISIONER="kind" SNAPSHOT="true" mage integration:kubernetesMatrix
  • INSTANCE_PROVISIONER="kind" SNAPSHOT="true" TEST_PLATFORMS="kubernetes/arm64/1.30.2/wolfi" mage integration:kubernetesMatrix

@blakerouse blakerouse added Team:Elastic-Agent-Control-Plane Label for the Agent Control Plane team skip-changelog labels Sep 16, 2024
@blakerouse blakerouse self-assigned this Sep 16, 2024
@blakerouse blakerouse requested a review from a team as a code owner September 16, 2024 20:31
@elasticmachine
Copy link
Contributor

Pinging @elastic/elastic-agent-control-plane (Team:Elastic-Agent-Control-Plane)

Copy link
Contributor

mergify bot commented Sep 16, 2024

This pull request does not have a backport label. Could you fix it @blakerouse? 🙏
To fixup this pull request, you need to add the backport labels for the needed
branches, such as:

  • backport-./d./d is the label to automatically backport to the 8./d branch. /d is the digit

Copy link
Contributor

mergify bot commented Sep 16, 2024

backport-v8.x has been added to help with the transition to the new branch 8.x.
If you don't need it please use backport-skip label and remove the backport-8.x label.

@mergify mergify bot added the backport-8.x Automated backport to the 8.x branch with mergify label Sep 16, 2024
Copy link
Member

@cmacknz cmacknz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Controlling+filtering by docker image should be added to https://github.com/elastic/elastic-agent/blob/main/docs/test-framework-dev-guide.md for the case where you only want to test one of these.

{Type: define.Kubernetes},
{Type: define.Kubernetes, DockerImage: "docker.elastic.co/beats/elastic-agent"},
{Type: define.Kubernetes, DockerImage: "docker.elastic.co/beats/elastic-agent-complete"},
{Type: define.Kubernetes, DockerImage: "docker.elastic.co/beats/elastic-agent-ubi"},
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For 9.0 the elastic-agent and elastic-agent-ubi will eventually be the same, so this is redundant there but not in 8.x where Ubuntu will continue to exist.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I can do a followup in main to merge those into the same.

@pkoutsovasilis
Copy link
Contributor

Overall I like this PR! I left some comments which are more points for a quick discussion around which fields a k8s test could define, namely Distro, Version, etc. but I am happy for +1 as soon as we agree on whether we do or don't need them 🙂

@blakerouse
Copy link
Contributor Author

Controlling+filtering by docker image should be added to https://github.com/elastic/elastic-agent/blob/main/docs/test-framework-dev-guide.md for the case where you only want to test one of these.

This is currently not supported. It will be even more work to allow filtering to a specific docker image for running tests. I am looking to just to get this done for the elastic-agent-server container image. Adding that will make this take even longer, its already snowballed into this work.

@blakerouse
Copy link
Contributor Author

@pkoutsovasilis Please give this another look, I took a different path. This now more aligns with OS testing.

@pkoutsovasilis
Copy link
Contributor

pkoutsovasilis commented Sep 18, 2024

So after some offline syncing with @blakerouse I think that the code changes are looking good. Now some things I noticed while running these tests locally (with 8.16.0-SNAPSHOT version) as our CI still needs some love are the following:

  • all ubi-image based tests are failing because agent pod is getting OOMKilled, we need to bump the memory limits?!
  • integration:kubernetesMatrix mage target, which will be the default now when this PR gets merged, takes ~ 2 hours to complete, is this causing any inconvenience for us?!

PS: so nice that you moved the k8s integration tests alongside the other integration tests 🚀

@blakerouse
Copy link
Contributor Author

@pkoutsovasilis I have reduced the tests to only test basic and wolfi at the moment. This is to reduce the time spent and get around the issue with UBI. Being that the UBI is going away in 9.0, and we haven't really been testing. The complete images are just additions to the original so they should basically work the same, so removing it for now to improve performance should be okay.

I think the overall solution to reducing the time it takes is to split up the work across multiple runners, but I want that done in a way where its not hard-coded in the YAML. I am already working on that here with buildkite, where it uses the batches to generate a buildkite pipeline. I think the same approach should be taken here, so its only required to update the golang testing framework for new versions and not also adjust matching YAML pipelines.

See: #5391

Copy link
Contributor

@pkoutsovasilis pkoutsovasilis left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

changes look good to me

Copy link
Contributor

mergify bot commented Sep 19, 2024

This pull request is now in conflicts. Could you fix it? 🙏
To fixup this pull request, you can check out it locally. See documentation: https://help.github.com/articles/checking-out-pull-requests-locally/

git fetch upstream
git checkout -b integration-k8s-multi-images upstream/integration-k8s-multi-images
git merge upstream/main
git push upstream integration-k8s-multi-images

Copy link
Member

@cmacknz cmacknz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Tried it locally and works as expected. LGTM.

@blakerouse blakerouse enabled auto-merge (squash) September 20, 2024 01:15
Copy link

Copy link
Contributor

mergify bot commented Sep 20, 2024

This pull request is now in conflicts. Could you fix it? 🙏
To fixup this pull request, you can check out it locally. See documentation: https://help.github.com/articles/checking-out-pull-requests-locally/

git fetch upstream
git checkout -b integration-k8s-multi-images upstream/integration-k8s-multi-images
git merge upstream/main
git push upstream integration-k8s-multi-images

@ycombinator
Copy link
Contributor

CI failure in Merge coverage reports step (see below) seems totally unrelated to this PR as the file mentioned there isn't even touched by this PR:

2024/09/20 15:30:25 OVERLAP BEFORE: github.com/elastic/elastic-agent/internal/pkg/agent/cmd/run.go {132 40 134 3 1 0} {133 2 133 40 1 0}

Also, the failure isn't happening consistently either. Force merging...

@ycombinator ycombinator merged commit 67b709f into elastic:main Sep 20, 2024
11 of 12 checks passed
mergify bot pushed a commit that referenced this pull request Sep 20, 2024
…5547)

* Add ability to run the same tests with multi docker image variants.

* Fix buildkite.

* Add back distro for kubernetes types.

* Refactor how kubernetes tests are selected.

* Allow specific docker variant.

* Add kubernetes testing to the dev guide.

* Add back K8S_VERSION to buildkite job.

* Fix docker image location.

* Don't create so many clusters.

* Ensure basic is always picked first.

* Fix cloud variant image name.

* revert builkite changes

* Move kubernetes integration tests in CI next to other integration tests.

* Rename wolfi-complete to complete-wolfi.

* Reduce docker variants on some tests.

* Only test basic and wolfi.

(cherry picked from commit 67b709f)

# Conflicts:
#	.buildkite/scripts/steps/k8s-extended-tests.sh
#	magefile.go
@blakerouse blakerouse deleted the integration-k8s-multi-images branch September 20, 2024 16:58
blakerouse added a commit that referenced this pull request Sep 20, 2024
…cker image variants (#5585)

* Add ability to run the same tests with multi docker image variants (#5547)

* Add ability to run the same tests with multi docker image variants.

* Fix buildkite.

* Add back distro for kubernetes types.

* Refactor how kubernetes tests are selected.

* Allow specific docker variant.

* Add kubernetes testing to the dev guide.

* Add back K8S_VERSION to buildkite job.

* Fix docker image location.

* Don't create so many clusters.

* Ensure basic is always picked first.

* Fix cloud variant image name.

* revert builkite changes

* Move kubernetes integration tests in CI next to other integration tests.

* Rename wolfi-complete to complete-wolfi.

* Reduce docker variants on some tests.

* Only test basic and wolfi.

(cherry picked from commit 67b709f)

# Conflicts:
#	.buildkite/scripts/steps/k8s-extended-tests.sh
#	magefile.go

* Fix merge issue.

---------

Co-authored-by: Blake Rouse <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
backport-8.x Automated backport to the 8.x branch with mergify skip-changelog Team:Elastic-Agent-Control-Plane Label for the Agent Control Plane team
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants