Skip to content
This repository was archived by the owner on Sep 21, 2021. It is now read-only.
This repository was archived by the owner on Sep 21, 2021. It is now read-only.

Intensive and repeated use cause DockerTimeoutException with 3.141.59d #807

@mad-p

Description

@mad-p

Zalenium Image Version(s): 3.141.59d ( NOT reproducible with 3.141.59c )
Docker Version: 18.09.0, build 4d60db4
If using docker-compose, version: 1.23.1, build b02f1306
OS: OSX High Sierra ( also reproducible on CentOS 7.2.1511 )
Docker Command to start Zalenium: Executing through docker-compose.yml

Expected Behavior

After intensive and repeated use, DockerTimeoutException won't occur.

Actual Behavior

After intensive and repeated use, DockerTimeoutException occurs repeatedly.
The exception can be observed just by looking at Zalenium logs(i.e.; docker logs -f zalenium).
The node containers remain functioning and still be used as normal, but once all these node containers get removed, the newly created node containers are not functioning and unable to be connected.

Minimal code to reproduce the problem

docker-compose.yml

  • --desiredContainers 0 helps us to ensure that the nodes containers get removed.
version: "2"

services:
  zalenium:
    image: dosel/zalenium:3.141.59d
    container_name: zalenium
    privileged: true
    tty: true
    ports:
      - "4444:4444"
    volumes:
      - /tmp/videos:/home/seluser/videos/
      - /var/run/docker.sock:/var/run/docker.sock
    command: >
      start
        --seleniumImageName elgalu/selenium:3.14.0-p17
        --desiredContainers 0
        --maxDockerSeleniumContainers 10
    environment:
      - PULL_SELENIUM_IMAGE=true

Ruby script

  • Run this ruby script in a single thread for the duration of several minutes. It does not have to be run concurrently.
  • It takes about 50 times of repetition for the problem to occur, however this might depend on the number of CPU cores and etc.
  • Shorter idleTimeout i.e. higher frequency of stale containers getting removed seems to make the problem more reproducible in lesser period of time.
    In a "real" UI testing environment where idleTimeout defaults to 90 seconds, each UI test takes a dozen of seconds and the concurrency of tests is about four, it usually takes half an hour for the problem to occur.
require 'selenium-webdriver' # version 3.141.0 ( also reproducible with version 2.53.4 )

def exec
  caps = Selenium::WebDriver::Remote::Capabilities.chrome
  caps[:idleTimeout] = 20
  driver = Selenium::WebDriver.for :remote, url: 'http://localhost:4444/wd/hub', desired_capabilities: caps
  driver.quit
end

i = 0
loop do
  i += 1
  begin
    exec
    puts i
  rescue => e
    puts i
    puts e
  end
end

Problematic code

The recent commit below seems to be responsible for this problem.
More precisely, the use of DockerContainerClient as a singleton.

Moving containers cleanup from bash to Java code.
d28d6f0

Tentative workaround

This is truly an ugly hack.
Force DockerContainerClient to be non-singleton.

diff --git a/src/main/java/de/zalando/ep/zalenium/container/ContainerFactory.java b/src/main/java/de/zalando/ep/zalenium/container/ContainerFactory.java
index 74db0ee..4326552 100644
--- a/src/main/java/de/zalando/ep/zalenium/container/ContainerFactory.java
+++ b/src/main/java/de/zalando/ep/zalenium/container/ContainerFactory.java
@@ -33,10 +33,11 @@ private static DockerContainerClient createDockerContainerClient() {
                 if (dockerContainerClient == null) {
                     dockerContainerClient = new DockerContainerClient();
                     dockerContainerClient.initialiseContainerEnvironment();
+                    return dockerContainerClient;
                 }
             }
         }
-        return dockerContainerClient;
+        return new DockerContainerClient();
     }
     
     private static KubernetesContainerClient createKubernetesContainerClient() {

Log output of DockerTimeoutException

04:01:59.469 [AutoStartProxyPoolPoller] WARN  d.z.e.z.c.DockerContainerClient - Failed to fetch container status [ContainerCreationStatus [isCreated=true, containerName=zalenium_pC5neu, containerId=b99b88c6d07d1ec0e24f826ce96012bad88cd65e68df05b32880f757288aed63, nodePort=40050]].
com.spotify.docker.client.exceptions.DockerTimeoutException: Timeout: GET unix://localhost:80/containers/b99b88c6d07d1ec0e24f826ce96012bad88cd65e68df05b32880f757288aed63/json
        at com.spotify.docker.client.DefaultDockerClient.propagate(DefaultDockerClient.java:2823)
        at com.spotify.docker.client.DefaultDockerClient.request(DefaultDockerClient.java:2692)
        at com.spotify.docker.client.DefaultDockerClient.inspectContainer(DefaultDockerClient.java:1070)
        at de.zalando.ep.zalenium.container.DockerContainerClient.isTerminated(DockerContainerClient.java:649)
        at de.zalando.ep.zalenium.proxy.DockeredSeleniumStarter.containerHasFinished(DockeredSeleniumStarter.java:317)
        at java.util.stream.ReferencePipeline$2$1.accept(ReferencePipeline.java:174)
        at java.util.concurrent.ConcurrentHashMap$KeySpliterator.forEachRemaining(ConcurrentHashMap.java:3527)
        at java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:481)
        at java.util.stream.AbstractPipeline.wrapAndCopyInto(AbstractPipeline.java:471)
        at java.util.stream.ReduceOps$ReduceOp.evaluateSequential(ReduceOps.java:708)
        at java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:234)
        at java.util.stream.ReferencePipeline.collect(ReferencePipeline.java:499)
        at de.zalando.ep.zalenium.proxy.AutoStartProxySet.checkContainers(AutoStartProxySet.java:319)
        at de.zalando.ep.zalenium.proxy.AutoStartProxySet.lambda$0(AutoStartProxySet.java:119)
        at java.lang.Thread.run(Thread.java:748)
Caused by: java.util.concurrent.ExecutionException: javax.ws.rs.ProcessingException: org.apache.http.conn.ConnectionPoolTimeoutException: Timeout waiting for connection from pool
        at jersey.repackaged.com.google.common.util.concurrent.AbstractFuture$Sync.getValue(AbstractFuture.java:299)
        at jersey.repackaged.com.google.common.util.concurrent.AbstractFuture$Sync.get(AbstractFuture.java:286)
        at jersey.repackaged.com.google.common.util.concurrent.AbstractFuture.get(AbstractFuture.java:116)
        at com.spotify.docker.client.DefaultDockerClient.request(DefaultDockerClient.java:2690)
        ... 13 common frames omitted
Caused by: javax.ws.rs.ProcessingException: org.apache.http.conn.ConnectionPoolTimeoutException: Timeout waiting for connection from pool
        at org.glassfish.jersey.apache.connector.ApacheConnector.apply(ApacheConnector.java:481)
        at org.glassfish.jersey.apache.connector.ApacheConnector$1.run(ApacheConnector.java:491)
        at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
        at java.util.concurrent.FutureTask.run(FutureTask.java:266)
        at jersey.repackaged.com.google.common.util.concurrent.MoreExecutors$DirectExecutorService.execute(MoreExecutors.java:299)
        at java.util.concurrent.AbstractExecutorService.submit(AbstractExecutorService.java:112)
        at jersey.repackaged.com.google.common.util.concurrent.AbstractListeningExecutorService.submit(AbstractListeningExecutorService.java:50)
        at jersey.repackaged.com.google.common.util.concurrent.AbstractListeningExecutorService.submit(AbstractListeningExecutorService.java:37)
        at org.glassfish.jersey.apache.connector.ApacheConnector.apply(ApacheConnector.java:487)
        at org.glassfish.jersey.client.ClientRuntime$2.run(ClientRuntime.java:178)
        at org.glassfish.jersey.internal.Errors$1.call(Errors.java:271)
        at org.glassfish.jersey.internal.Errors$1.call(Errors.java:267)
        at org.glassfish.jersey.internal.Errors.process(Errors.java:315)
        at org.glassfish.jersey.internal.Errors.process(Errors.java:297)
        at org.glassfish.jersey.internal.Errors.process(Errors.java:267)
        at org.glassfish.jersey.process.internal.RequestScope.runInScope(RequestScope.java:340)
        at org.glassfish.jersey.client.ClientRuntime$3.run(ClientRuntime.java:210)
        at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
        at java.util.concurrent.FutureTask.run(FutureTask.java:266)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
        ... 1 common frames omitted
Caused by: org.apache.http.conn.ConnectionPoolTimeoutException: Timeout waiting for connection from pool
        at org.apache.http.impl.conn.PoolingHttpClientConnectionManager.leaseConnection(PoolingHttpClientConnectionManager.java:286)
        at org.apache.http.impl.conn.PoolingHttpClientConnectionManager$1.get(PoolingHttpClientConnectionManager.java:263)
        at org.apache.http.impl.execchain.MainClientExec.execute(MainClientExec.java:190)
        at org.apache.http.impl.execchain.ProtocolExec.execute(ProtocolExec.java:184)
        at org.apache.http.impl.execchain.RetryExec.execute(RetryExec.java:88)
        at org.apache.http.impl.execchain.RedirectExec.execute(RedirectExec.java:110)
        at org.apache.http.impl.client.InternalHttpClient.doExecute(InternalHttpClient.java:184)
        at org.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClient.java:71)
        at org.glassfish.jersey.apache.connector.ApacheConnector.apply(ApacheConnector.java:435)
        ... 21 common frames omitted

Note

I don't think that this "DockerTimeoutException problem" is related to jersey-apache-connector bug.
I've already seen that this problem occurs both WITH or WITHOUT the jersey patch above.
(see also #808)

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions