-
Notifications
You must be signed in to change notification settings - Fork 568
Intensive and repeated use cause DockerTimeoutException with 3.141.59d #807
Description
Zalenium Image Version(s): 3.141.59d ( NOT reproducible with 3.141.59c )
Docker Version: 18.09.0, build 4d60db4
If using docker-compose, version: 1.23.1, build b02f1306
OS: OSX High Sierra ( also reproducible on CentOS 7.2.1511 )
Docker Command to start Zalenium: Executing through docker-compose.yml
Expected Behavior
After intensive and repeated use, DockerTimeoutException won't occur.
Actual Behavior
After intensive and repeated use, DockerTimeoutException occurs repeatedly.
The exception can be observed just by looking at Zalenium logs(i.e.; docker logs -f zalenium).
The node containers remain functioning and still be used as normal, but once all these node containers get removed, the newly created node containers are not functioning and unable to be connected.
Minimal code to reproduce the problem
docker-compose.yml
--desiredContainers 0helps us to ensure that the nodes containers get removed.
version: "2"
services:
zalenium:
image: dosel/zalenium:3.141.59d
container_name: zalenium
privileged: true
tty: true
ports:
- "4444:4444"
volumes:
- /tmp/videos:/home/seluser/videos/
- /var/run/docker.sock:/var/run/docker.sock
command: >
start
--seleniumImageName elgalu/selenium:3.14.0-p17
--desiredContainers 0
--maxDockerSeleniumContainers 10
environment:
- PULL_SELENIUM_IMAGE=trueRuby script
- Run this ruby script in a single thread for the duration of several minutes. It does not have to be run concurrently.
- It takes about 50 times of repetition for the problem to occur, however this might depend on the number of CPU cores and etc.
- Shorter
idleTimeouti.e. higher frequency of stale containers getting removed seems to make the problem more reproducible in lesser period of time.
In a "real" UI testing environment where idleTimeout defaults to 90 seconds, each UI test takes a dozen of seconds and the concurrency of tests is about four, it usually takes half an hour for the problem to occur.
require 'selenium-webdriver' # version 3.141.0 ( also reproducible with version 2.53.4 )
def exec
caps = Selenium::WebDriver::Remote::Capabilities.chrome
caps[:idleTimeout] = 20
driver = Selenium::WebDriver.for :remote, url: 'http://localhost:4444/wd/hub', desired_capabilities: caps
driver.quit
end
i = 0
loop do
i += 1
begin
exec
puts i
rescue => e
puts i
puts e
end
endProblematic code
The recent commit below seems to be responsible for this problem.
More precisely, the use of DockerContainerClient as a singleton.
Moving containers cleanup from bash to Java code.
d28d6f0
Tentative workaround
This is truly an ugly hack.
Force DockerContainerClient to be non-singleton.
diff --git a/src/main/java/de/zalando/ep/zalenium/container/ContainerFactory.java b/src/main/java/de/zalando/ep/zalenium/container/ContainerFactory.java
index 74db0ee..4326552 100644
--- a/src/main/java/de/zalando/ep/zalenium/container/ContainerFactory.java
+++ b/src/main/java/de/zalando/ep/zalenium/container/ContainerFactory.java
@@ -33,10 +33,11 @@ private static DockerContainerClient createDockerContainerClient() {
if (dockerContainerClient == null) {
dockerContainerClient = new DockerContainerClient();
dockerContainerClient.initialiseContainerEnvironment();
+ return dockerContainerClient;
}
}
}
- return dockerContainerClient;
+ return new DockerContainerClient();
}
private static KubernetesContainerClient createKubernetesContainerClient() {Log output of DockerTimeoutException
04:01:59.469 [AutoStartProxyPoolPoller] WARN d.z.e.z.c.DockerContainerClient - Failed to fetch container status [ContainerCreationStatus [isCreated=true, containerName=zalenium_pC5neu, containerId=b99b88c6d07d1ec0e24f826ce96012bad88cd65e68df05b32880f757288aed63, nodePort=40050]].
com.spotify.docker.client.exceptions.DockerTimeoutException: Timeout: GET unix://localhost:80/containers/b99b88c6d07d1ec0e24f826ce96012bad88cd65e68df05b32880f757288aed63/json
at com.spotify.docker.client.DefaultDockerClient.propagate(DefaultDockerClient.java:2823)
at com.spotify.docker.client.DefaultDockerClient.request(DefaultDockerClient.java:2692)
at com.spotify.docker.client.DefaultDockerClient.inspectContainer(DefaultDockerClient.java:1070)
at de.zalando.ep.zalenium.container.DockerContainerClient.isTerminated(DockerContainerClient.java:649)
at de.zalando.ep.zalenium.proxy.DockeredSeleniumStarter.containerHasFinished(DockeredSeleniumStarter.java:317)
at java.util.stream.ReferencePipeline$2$1.accept(ReferencePipeline.java:174)
at java.util.concurrent.ConcurrentHashMap$KeySpliterator.forEachRemaining(ConcurrentHashMap.java:3527)
at java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:481)
at java.util.stream.AbstractPipeline.wrapAndCopyInto(AbstractPipeline.java:471)
at java.util.stream.ReduceOps$ReduceOp.evaluateSequential(ReduceOps.java:708)
at java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:234)
at java.util.stream.ReferencePipeline.collect(ReferencePipeline.java:499)
at de.zalando.ep.zalenium.proxy.AutoStartProxySet.checkContainers(AutoStartProxySet.java:319)
at de.zalando.ep.zalenium.proxy.AutoStartProxySet.lambda$0(AutoStartProxySet.java:119)
at java.lang.Thread.run(Thread.java:748)
Caused by: java.util.concurrent.ExecutionException: javax.ws.rs.ProcessingException: org.apache.http.conn.ConnectionPoolTimeoutException: Timeout waiting for connection from pool
at jersey.repackaged.com.google.common.util.concurrent.AbstractFuture$Sync.getValue(AbstractFuture.java:299)
at jersey.repackaged.com.google.common.util.concurrent.AbstractFuture$Sync.get(AbstractFuture.java:286)
at jersey.repackaged.com.google.common.util.concurrent.AbstractFuture.get(AbstractFuture.java:116)
at com.spotify.docker.client.DefaultDockerClient.request(DefaultDockerClient.java:2690)
... 13 common frames omitted
Caused by: javax.ws.rs.ProcessingException: org.apache.http.conn.ConnectionPoolTimeoutException: Timeout waiting for connection from pool
at org.glassfish.jersey.apache.connector.ApacheConnector.apply(ApacheConnector.java:481)
at org.glassfish.jersey.apache.connector.ApacheConnector$1.run(ApacheConnector.java:491)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at jersey.repackaged.com.google.common.util.concurrent.MoreExecutors$DirectExecutorService.execute(MoreExecutors.java:299)
at java.util.concurrent.AbstractExecutorService.submit(AbstractExecutorService.java:112)
at jersey.repackaged.com.google.common.util.concurrent.AbstractListeningExecutorService.submit(AbstractListeningExecutorService.java:50)
at jersey.repackaged.com.google.common.util.concurrent.AbstractListeningExecutorService.submit(AbstractListeningExecutorService.java:37)
at org.glassfish.jersey.apache.connector.ApacheConnector.apply(ApacheConnector.java:487)
at org.glassfish.jersey.client.ClientRuntime$2.run(ClientRuntime.java:178)
at org.glassfish.jersey.internal.Errors$1.call(Errors.java:271)
at org.glassfish.jersey.internal.Errors$1.call(Errors.java:267)
at org.glassfish.jersey.internal.Errors.process(Errors.java:315)
at org.glassfish.jersey.internal.Errors.process(Errors.java:297)
at org.glassfish.jersey.internal.Errors.process(Errors.java:267)
at org.glassfish.jersey.process.internal.RequestScope.runInScope(RequestScope.java:340)
at org.glassfish.jersey.client.ClientRuntime$3.run(ClientRuntime.java:210)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
... 1 common frames omitted
Caused by: org.apache.http.conn.ConnectionPoolTimeoutException: Timeout waiting for connection from pool
at org.apache.http.impl.conn.PoolingHttpClientConnectionManager.leaseConnection(PoolingHttpClientConnectionManager.java:286)
at org.apache.http.impl.conn.PoolingHttpClientConnectionManager$1.get(PoolingHttpClientConnectionManager.java:263)
at org.apache.http.impl.execchain.MainClientExec.execute(MainClientExec.java:190)
at org.apache.http.impl.execchain.ProtocolExec.execute(ProtocolExec.java:184)
at org.apache.http.impl.execchain.RetryExec.execute(RetryExec.java:88)
at org.apache.http.impl.execchain.RedirectExec.execute(RedirectExec.java:110)
at org.apache.http.impl.client.InternalHttpClient.doExecute(InternalHttpClient.java:184)
at org.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClient.java:71)
at org.glassfish.jersey.apache.connector.ApacheConnector.apply(ApacheConnector.java:435)
... 21 common frames omitted
Note
I don't think that this "DockerTimeoutException problem" is related to jersey-apache-connector bug.
I've already seen that this problem occurs both WITH or WITHOUT the jersey patch above.
(see also #808)