Migrate tests to GitHub Actions #3430

mwiencek · 2024-12-22T22:23:16Z

Problem

* Our Selenium tests (on Jenkins) are slow. They used to take up to an hour, though since moving to a newer server, they take under 30 minutes.
* Spreading our test workflows across different services (CircleCI and Jenkins) isn't ideal, but CircleCI doesn't seem to offer enough free resources (or as many as GitHub Actions) to run our Selenium tests there.
* All of our Jenkins configuration is stored inside Jenkins as opposed to version control. We never looked into converting things to use Jenkinsfiles/pipelines.
* Jenkins requires additional maintenance (of our repository, containers, upgrades to plugins and Jenkins itself).
* Having to manually build and push new musicbrainz-tests image versions whenever changes are made to Dockerfile.tests is annoying.
* Our JavaScript code coverage reports are incomplete: they only include coverage from the Selenium tests.

Solution

* Splits the Selenium tests into four parallel jobs which can be re-run individually.
* Moves all of our tests away from CircleCI/Jenkins and onto GitHub Actions.
* Combines the JS, Perl, pgTAP, and Selenium tests into a single CI workflow stored in git (.github/workflows/ci.yml).
* Allows us to move all other jobs running on Jenkins to GitHub Actions in the future, so that we can retire Jenkins for good.
* Builds the tests image as part of the CI workflow (we no longer have to manually build and push tests images).
* Extracts JS coverage from our t/web.js and Perl tests, and merges it with the coverage from our Selenium tests.
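
The description doesn't spell out how the Selenium tests are divided among the four jobs. As a minimal sketch, assuming a simple round-robin split (the function name, the file names, and the strategy itself are hypothetical and not taken from ci.yml), the idea could look like this:

```javascript
// Hypothetical sketch: round-robin assignment of test files to parallel jobs.
// The real workflow may split the tests differently.
function shardTests(testFiles, shardCount, shardIndex) {
  if (!Number.isInteger(shardCount) || shardCount < 1) {
    throw new RangeError('shardCount must be a positive integer');
  }
  if (
    !Number.isInteger(shardIndex) ||
    shardIndex < 0 ||
    shardIndex >= shardCount
  ) {
    throw new RangeError('shardIndex must be in [0, shardCount)');
  }
  // Sort a copy first so that every job computes the same assignment.
  return [...testFiles]
    .sort()
    .filter((_, i) => i % shardCount === shardIndex);
}

// Illustrative file names only.
const exampleFiles = ['a.json5', 'b.json5', 'c.json5', 'd.json5', 'e.json5'];
console.log(shardTests(exampleFiles, 4, 0)); // [ 'a.json5', 'e.json5' ]
```

Each job would call this with its own index, so the four jobs together cover every file exactly once and any one of them can be re-run on its own.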

Other changes:

Converts Dockerfile.tests to a multi-stage build.

One downside I'll note is that we can no longer SSH into running jobs (like on CircleCI) by default. There might be a way to emulate this feature if we need it.

Testing

Just the new GitHub Actions workflow!

mwiencek · 2024-12-22T22:25:50Z

This was (finally) passing at https://github.com/metabrainz/musicbrainz-server/actions/runs/12457761240 but there are probably (definitely) some issues with running them as part of a pull request that I'll need to address. 😛

mwiencek · 2024-12-22T22:31:56Z

HACKING-PROD.md

-`/home/musicbrainz/musicbrainz-server/`, and then you can run any test you want to check like this:
-
-    $ sudo -E -H -u musicbrainz carton exec -- prove -lv t/tests.t :: --tests Failing::Test
+Finally, you will need to update the `musicbrainz-tests` image version in


(Reminder to myself that this documentation needs to be updated again.)

Recent Selenium builds have been failing with `WebDriverError: disconnected: not connected to DevTools (failed to check if window was closed: disconnected: not connected to DevTools)` even though we haven't changed anything that I'm aware of. Trying to update our Chrome dependencies here to see if that helps in any way.

Also bump selenium-webdriver to 4.27.0; I've combined it with this commit because the changelog suggests it's somewhat tied to specific Chrome versions: https://github.com/SeleniumHQ/selenium/blob/trunk/javascript/node/selenium-webdriver/CHANGES.md

This started failing during CI runs recently, and I have no clue why. (If you run the tests manually in your browser, it will also fail if you click on the page before it executes.) I'm just removing the test because the old autocomplete code is on the way out anyway.

We'd like to move the Selenium tests away from Jenkins (which we moved to in […]), as it runs on a machine with heavy MBS traffic, which can cause test slowdowns and flakiness. Maintaining Jenkins also isn't free and requires us to store some CI configuration outside of git.

GitHub Actions looks like a suitable alternative for us, as its default runners provide more vCPUs and RAM (4 + 16GB) than CircleCI's (2 + 4GB), while also providing unlimited build minutes. Since we already use GitHub, that's one less service for us to rely on, and GHA integrates better with GitHub.

The main disadvantage I could find is that there's no built-in way to SSH into a running tests container on GHA.

It seems that Chrome's `--headless=new` is now just `--headless`, with the old `--headless` becoming `--headless=old` [1], though `--headless=new` still works too. Firefox, however, doesn't start in headless mode if you use `--headless=new`. Plain old `--headless` works for both browsers.

[1] https://developer.chrome.com/docs/chromium/headless
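
To illustrate the flag choice described here, a minimal sketch of a helper that builds the command-line arguments for either browser. The helper name and its options are hypothetical; only the choice of plain `--headless` comes from the commit message.

```javascript
// Hypothetical helper. The commit message only states that plain `--headless`
// starts both Chrome and Firefox in headless mode.
function browserArguments(browserName, {headless = true} = {}) {
  const supported = ['chrome', 'firefox'];
  if (!supported.includes(browserName)) {
    throw new Error(`Unsupported browser: ${browserName}`);
  }
  const args = [];
  if (headless) {
    // `--headless=new` still works in Chrome, but Firefox does not start
    // in headless mode with it, so use the form both browsers accept.
    args.push('--headless');
  }
  return args;
}

console.log(browserArguments('chrome'));  // [ '--headless' ]
console.log(browserArguments('firefox')); // [ '--headless' ]
```

With selenium-webdriver, an array like this could then be passed to the `addArguments` method of the Chrome or Firefox `Options` object.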

I'm seeing these kinds of failures in the Selenium tests:

    [browser console log] [SEVERE] http://mbtest:5000/ws/js/artist/?q=gr%C3%B6up%20member&page=1&direct=false - Failed to load resource: the server responded with a status of 500 (Internal Server Error)

The plackup logs show the request being dispatched, but the Solr logs don't show any indication that it was received. I'm curious what LWP receives as the response. This logging should also be useful in production for debugging search errors.

mwiencek marked this pull request as draft December 22, 2024 22:28

mwiencek force-pushed the github-actions branch from d037368 to 0620dee Compare December 23, 2024 02:27

mwiencek force-pushed the github-actions branch 14 times, most recently from 96379b6 to 1cfc071 Compare February 11, 2025 20:27

mwiencek force-pushed the github-actions branch 3 times, most recently from 8e37f1a to cdb05db Compare February 28, 2025 06:45

mwiencek added 9 commits March 4, 2025 15:32

Run Selenium tests on GitHub Actions

fcdfbfa

See previous commit, "Migrate from CircleCI to GitHub Actions."

Dockerfile.tests: Use ARG where ENV is not needed

93fd775

Dockerfile.tests: Don't make musicbrainz-server dir

d375dac

It's now symlinked from the GitHub Actions workspace.

Dockerfile.tests: Convert to multi-stage build

8f9bdbd

Build tests image from CI workflow

8b08fb0

The metabrainz/musicbrainz-tests image will be built and cached prior to the two test jobs running. It no longer needs to be built and pushed by hand in advance.

Dockerfile.tests: Set environment variables from CI workflow

9acb612

mwiencek added 4 commits March 5, 2025 16:42

Remove outdated comment

6f5141f

This comment wasn't entirely correct, as generating cpanfile.snapshot had nothing to do with Chrome updating. That was a side effect of rebuilding Dockerfile.tests, not cpanfile.snapshot. And it's no longer an issue since da2d499 anyway.

Save/restore tests image to GHA cache

4150bfe

Pull requests don't have permission to push to ghcr.io, so we have to cache the image ourselves between jobs.

Bump chrome-for-testing to 133.0.6943.53

f2fa625

Bump selenium-webdriver to 4.28.1

52cd37c

mwiencek force-pushed the github-actions branch 5 times, most recently from 8875622 to 7dc48de Compare March 6, 2025 02:09

mwiencek added 6 commits March 5, 2025 20:22

Take a screenshot when the Selenium tests fail

a65127b

This may or may not help with debugging issues related to elements not being findable on the page. The screenshot is uploaded to the build artifacts.
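
A minimal sketch of what saving such a screenshot could look like. It assumes `driver` is a selenium-webdriver `WebDriver`, whose `takeScreenshot()` resolves to a base64-encoded PNG; the function name, output directory, and file naming scheme are hypothetical.

```javascript
const fs = require('fs');
const path = require('path');

// Hypothetical helper: writes a PNG screenshot for a failed test and returns
// the path, so that a later CI step can upload the directory as an artifact.
async function saveFailureScreenshot(driver, testName, outputDir) {
  // WebDriver's takeScreenshot() resolves to a base64-encoded PNG string.
  const base64Png = await driver.takeScreenshot();
  fs.mkdirSync(outputDir, {recursive: true});
  // Replace characters that are awkward in file names.
  const safeName = testName.replace(/[^A-Za-z0-9._-]+/g, '_');
  const filePath = path.join(outputDir, `${safeName}.png`);
  fs.writeFileSync(filePath, base64Png, 'base64');
  return filePath;
}
```

It would be called from the test runner's failure handler, for example `await saveFailureScreenshot(driver, testPath, 'screenshots')`.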

Check for *_nyc_output in nyc_merge step

f2ebdb3

The previous nyc_download step does not fail if no artifacts are found.
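
The check itself presumably lives in the workflow step; as a hedged illustration of the same idea in Node, the sketch below fails loudly when no `*_nyc_output` directories are present. The function name and directory layout are assumptions.

```javascript
const {readdirSync} = require('fs');
const {join} = require('path');

// Hypothetical helper: returns the directories directly under `root` whose
// names end in `_nyc_output`, and throws if there are none, so that a merge
// step cannot silently succeed with no coverage data.
function requireNycOutputDirs(root) {
  const dirs = readdirSync(root, {withFileTypes: true})
    .filter((entry) => entry.isDirectory() && entry.name.endsWith('_nyc_output'))
    .map((entry) => join(root, entry.name))
    .sort();
  if (dirs.length === 0) {
    throw new Error(`No *_nyc_output directories found in ${root}`);
  }
  return dirs;
}
```

Throwing here turns a missing download into a visible failure instead of an empty coverage report.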

Take a screenshot after every Selenium test command

1cfe939

Disable very intelligent Chrome password leak detection

b95bfc8

https://issues.chromium.org/issues/42323769

Skip unsupported Selenium logging method in Firefox

204fe7f

mwiencek force-pushed the github-actions branch from 7dc48de to 2af6192 Compare March 6, 2025 02:24

mwiencek added 12 commits March 5, 2025 20:50

Remove screenshots for successful tests

8548e93

These can take up quite a bit of space in the (limited) GitHub artifacts storage, and are only really useful if a particular test fails.

Maximize browser window in Selenium tests

48d6f69

This seems to bypass some "element not clickable" issues in Firefox.

Toggle dom.disable_beforeunload to true

30915e0

It made sense to use `false` prior to ce43326, when we had the old `handleAlert` commands verifying that the beforeunload prompts were shown, but now we need Firefox's behavior to align with Chrome's.

Scroll element into view before mouseOver

dd0d05c

Fixes some issues running the tests in Firefox.
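
A minimal sketch of the scroll-then-hover sequence, assuming `driver` and `element` are a selenium-webdriver `WebDriver` and `WebElement`. The helper name and the `scrollIntoView` options are hypothetical and may differ from the actual change.

```javascript
// Hypothetical helper: scroll the element into the viewport first, then move
// the pointer over it.
async function scrollThenMouseOver(driver, element) {
  await driver.executeScript(
    'arguments[0].scrollIntoView({block: "center", inline: "center"});',
    element,
  );
  // Move the pointer to the center of the element.
  await driver.actions().move({origin: element}).perform();
}
```

Scrolling first means the pointer move targets an element that is already inside the viewport.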

Fix runit service logging for website test service

c48d565

While start_server's output is seen by svlogd fine, the plackup output appears to be buffered when not attached to a tty.

Add TESTS_IMAGE_TAG and skip build-tests-image if it exists

7aec1a4

While we still won't need to rebuild and push test images by hand, we'll still have to bump `TESTS_IMAGE_TAG`. Skipping the build when the tagged image already exists should shave several minutes off each build that uses a cached image.

Attempt to workaround disabled submit button issue

3ea585c

Trigger an input event to ensure the validation code responsible for toggling the disabled state is run.
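
As a hedged illustration of triggering such an event with the standard DOM API (the helper name is hypothetical, and in a Selenium test the code would run inside the page, for example through an `executeScript` call):

```javascript
// Hypothetical helper: dispatches a bubbling `input` event on an element so
// that listeners reacting to user input (such as form validation) run.
function triggerInputEvent(element) {
  // `bubbles: true` lets listeners attached to ancestor nodes see the event.
  return element.dispatchEvent(new Event('input', {bubbles: true}));
}
```

`dispatchEvent` invokes the listeners synchronously, so any validation they perform has run by the time the helper returns.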

Replace deprecated ClientRequest properties

96ab8df

t/selenium.mjs: Log failed /ws/js responses

c384f96

The logging added in 647114c did not provide any further information, so this will allow us to see the response returned to the browser.

t/selenium.js: Better browser console logging

3a1878d

* Logs all console entries at the time they occur.
* Works in Firefox.

tmp

955b3e1

mwiencek force-pushed the github-actions branch from 2af6192 to 955b3e1 Compare March 6, 2025 03:13

mwiencek mentioned this pull request Mar 10, 2025

Bump chrome-for-testing to 130.0.6723.91 #3399

Closed
