Race condition between chromedriver and selenium with a good stack trace indicating it #2770

krschacht · 2024-08-26T15:50:55Z

Expected Behavior

I've been successfully using Capybara in Rails for many months. But about a month ago, my system tests started sporadically failing in my GitHub Actions CI with "Net::ReadTimeout with #<TCPSocket:(closed)>". If I re-run the test suite a few times, I can eventually get it to pass. I've tried many different workarounds, but none of them resolve the issue. I've also tried rolling back all changes in my repo to a point months ago when tests were consistently passing, and that doesn't fix it either.

We've spent many hours investigating the cause and we currently think there is a race condition somewhere between chromedriver and selenium. My project is an open source project so here is a direct link to one of the failed CI runs where you can see the full stack trace: https://github.com/AllYourBot/hostedgpt/actions/runs/10533347868/job/29189182499?pr=498

The Net::ReadTimeout is coming from capybara (aka selenium) failing to hit chromedriver when attempting to set up the server. One of my engineers has outlined his read of that stack trace:

I think the tests run (and fail) before puma is started by capybara
The test hung because the server was still running and ruby wouldn't exit
It says the TCP socket was closed. Does this mean the socket was open when it started but closed during the exchange, or that it was never open? I suspect the former because the stack trace is in the middle of a read loop.
The failure is in the area of code which causes chromedriver to build a new session (ie, start chrome up):

Also, another thing that suggests a race condition is that when we SSH into the job mid-run, it sometimes fails or hangs for a bit. But if I interrupt the process (^c) and then re-run it, it goes fine.
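Since interrupting and re-running tends to succeed, one way to test the race-condition theory is to retry only the step that raises. Below is a minimal sketch, not a confirmed fix: `with_read_timeout_retry` is a hypothetical helper name, and the commented usage line assumes that calling `driver.browser` on a Capybara Selenium session is what triggers the chromedriver session to be built.

```ruby
require "net/protocol" # defines Net::ReadTimeout

# Hypothetical helper: re-runs the block when it raises Net::ReadTimeout,
# up to `attempts` times, then re-raises the last error.
def with_read_timeout_retry(attempts: 3, wait: 0)
  tries = 0
  begin
    tries += 1
    yield tries
  rescue Net::ReadTimeout
    raise if tries >= attempts
    sleep wait
    retry
  end
end

# Assumed usage in a test helper (requires capybara and selenium-webdriver):
#   with_read_timeout_retry(attempts: 3, wait: 1) do
#     Capybara.current_session.driver.browser
#   end
```

If the retry consistently succeeds on the second attempt, that would be consistent with a startup race rather than a problem in the app under test.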

Steps to reproduce

Fork this repo on GitHub.
I've configured the GitHub Actions CI to not run system tests on forks, so (a) delete this line to remove the short circuit, and (b) change the very next "runs-on" line back to ubuntu-latest, which is the default GitHub Actions runner.
Push a change to the repo to trigger a GitHub CI run.


woodhull · 2024-08-27T12:10:12Z

We've also been experiencing this. We initially thought it was some problem with the first request timing out (like asset compilation?) but think we've eliminated those potential causes.

It still seems most likely there is a problem with our app, but maybe there is a regression with newer chrome or selenium versions.

We're also on GitHub Actions for what it's worth. One other theory we had is that the environment was cpu constrained on overloaded VMs within the GitHub actions runner pool.

krschacht · 2024-08-27T13:03:03Z

@woodhull try reverting your codebase to a point in time where CI was working, push that up to a branch/PR and see if it works. Ours doesn’t. This is how we determined it’s something outside of our codebase or Gemfile.lock.

woodhull · 2024-08-27T22:01:28Z

We resolved this by locking an older version of chromium. For us at least, this started happening when alpine was upgraded from 3.19 to 3.20, and chrome along with it, because we were letting the version float by basing our Docker image on alpine without specifying an explicit version.

I tried the selenium-driver nightlies and the issue was still present there with the latest chrome version.
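To catch a floating browser version before it shows up as flaky tests, a suite could fail fast when the installed major version differs from the one it expects. This is a rough sketch under assumptions: both helper names are hypothetical, the version string in the usage comment is only an illustrative format, and the binary name will vary by image.

```ruby
# Hypothetical helper: extracts the major version from the text printed by
# the browser's `--version` flag (a four-part dotted version is assumed).
def chrome_major_version(version_output)
  match = version_output.match(/(\d+)\.\d+\.\d+\.\d+/)
  raise ArgumentError, "unrecognized version string: #{version_output.inspect}" unless match
  Integer(match[1])
end

# Hypothetical guard: raises when the installed major version is not the pinned one.
def assert_pinned_chrome!(version_output, pinned_major)
  actual = chrome_major_version(version_output)
  return actual if actual == pinned_major
  raise "Chrome major version #{actual} does not match pinned #{pinned_major}"
end

# Assumed usage at the top of a test helper (binary name is an assumption):
#   assert_pinned_chrome!(`chromium-browser --version`, 124)
# where the output might look like "Chromium 124.0.0.0 Alpine Linux".
```

The guard does not pin anything by itself; it only turns silent drift into an explicit failure, so the actual pin still belongs in the image definition.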

searls · 2024-08-28T11:52:33Z

Off-topic: I've lost so many hours to Selenium timing issues over the years, that I'm glad I bit the bullet and converted to Capybara+Playwright -- it's been rock solid so far, with approximately zero flakes in the two months since I switched. Here's the guide I wrote: https://justin.searls.co/posts/running-rails-system-tests-with-playwright-instead-of-selenium/

robacarp · 2024-08-28T15:31:13Z

@woodhull did you pin alpine and chromium+chromedriver, or just alpine?

twalpole · 2024-08-28T18:27:50Z

If there's a race condition between selenium and chromedriver, shouldn't this be reported on one of those projects? Capybara doesn't really control their timing.

krschacht · 2024-08-28T22:11:21Z

I opened this on Selenium's side too: SeleniumHQ/selenium#14454

krschacht changed the title ~~Race condition between chromedriver and selenium~~ Race condition between chromedriver and selenium with a good stack trace indicating it Aug 26, 2024

krschacht mentioned this issue Aug 27, 2024

adds ssh debugger for selenium test AllYourBot/hostedgpt#498

Closed

krschacht mentioned this issue Sep 14, 2024

Add language model prices AllYourBot/hostedgpt#501

Merged

jlvallelonga mentioned this issue Sep 22, 2024

use playwright instead of selenium AllYourBot/hostedgpt#508

Closed

Dantemss mentioned this issue Oct 18, 2024

Ruby 3 openstax/accounts#1260

Merged
