Update dependencies #428
Conversation
I have to investigate why it fails more often; I suspect pytest is responsible...
@AdamGleave @hill-a @erniejunior I identified the issue, and unfortunately it comes from TensorFlow (v1.5.0 works fine; tf>=1.8.0 makes Travis hang more often, even though no test fails).
Do we know what's causing it to fail? Do the tests run OK in the Docker image on local machines? We can debug what's causing the test to hang by attaching a debugger to the hanging process.
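One generic way to see where a hanging test is stuck (a sketch only, and not necessarily the tool the comment above had in mind) is to make the process dump every thread's traceback after a timeout using the standard-library `faulthandler` module, e.g. from a hypothetical `conftest.py`:

```python
# conftest.py -- minimal sketch (hypothetical file): if the test process is
# still running after the timeout, dump the traceback of every thread to
# stderr, which shows where a hanging test (or a stuck TensorFlow call) is blocked.
import faulthandler


def pytest_configure(config):
    # After 900 s, print all thread tracebacks without killing the process,
    # then re-arm the timer (repeat=True).
    faulthandler.dump_traceback_later(900, repeat=True)


def pytest_unconfigure(config):
    faulthandler.cancel_dump_traceback_later()
```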
I suspect an out-of-RAM error from Travis. We are pretty close to the limit, and some memory is not properly released during the tests (I tried to fix that in the past but was not successful).
I need to check that, but I would say yes: I have been using tf 1.8.0 from the beginning on my machine and could run the tests without any problem in the past, but we should double-check.
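To check locally for tests that do not release memory, one option is to log the process's resident memory around each test. A minimal sketch, assuming pytest and the third-party `psutil` package (not necessarily what the project uses for this):

```python
# conftest.py -- hypothetical snippet for spotting leaky tests locally.
import os

import psutil
import pytest

_PROC = psutil.Process(os.getpid())


@pytest.fixture(autouse=True)
def log_rss_growth(request):
    """Report how much resident memory each test leaves behind."""
    before = _PROC.memory_info().rss
    yield
    after = _PROC.memory_info().rss
    growth_mb = (after - before) / 1e6
    if growth_mb > 50:  # arbitrary reporting threshold
        print(f"{request.node.nodeid}: RSS grew by {growth_mb:.0f} MB")
```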
One option would be to split our test suite in half (e.g. alphabetically) and run it in two separate Travis instances. This won't help if one test consumes more memory than is available on Travis, but if there are leaks this would help. It would also have the added benefit of speeding up the test suite. I'm confused why running out of RAM would cause hanging rather than, e.g., an explicit error.
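A minimal sketch of one way to implement such a split (assuming pytest; the shard is chosen per Travis job via a hypothetical `TEST_SHARD` environment variable, and this is not necessarily what the PR ended up doing):

```python
# conftest.py -- hypothetical sketch: run only one shard of the collected
# tests, selected by the TEST_SHARD environment variable ("0" or "1"),
# so two Travis jobs can each run a disjoint half of the suite.
import os


def pytest_collection_modifyitems(config, items):
    shard_env = os.environ.get("TEST_SHARD")
    if shard_env is None:
        return  # no sharding requested: run everything
    shard = int(shard_env)
    n_shards = int(os.environ.get("TEST_NUM_SHARDS", "2"))
    # Sort by test id so the split is deterministic, then keep every n-th test.
    items.sort(key=lambda item: item.nodeid)
    items[:] = [item for i, item in enumerate(items) if i % n_shards == shard]
```

Each Travis job would then export a different `TEST_SHARD` value before invoking pytest.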
Yes, there may be two things: first, if it uses swap, it becomes super slow; then, when you have no RAM left, you usually don't get any error (see what happens when you run a fork bomb) and things just hang...
Good idea, how easy is it to implement? And yes, there were definitely memory leaks happening.
It's now hanging on [...]. @araffin I've changed the default for [...].
@AdamGleave I managed to reproduce the deadlock (for me, it is currently happening on the Atari test with ACER + LSTM). You were right, it is not due to a memory problem.
I would avoid that because it makes things really user-unfriendly (e.g. you cannot use it in an IPython terminal anymore), and I would be interested in knowing where it actually comes from (in the sense of: what changed in tf that broke it?).
I normally use [...]. When I last looked into it, the deadlock happened in the graph destructor. I think what happens is: [...]
Although irritating, TensorFlow is working as intended, and I think we'll always be rolling the dice on it if using [...].
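The truncated comments above appear to concern the multiprocessing start method (an inference; it is not explicit in what remains of the thread). A known failure mode is that forking a process after TensorFlow has been initialised can leave the child with copies of locked internal mutexes, so teardown (e.g. the graph destructor) can deadlock; the `spawn` or `forkserver` start methods avoid this but require the usual `if __name__ == "__main__":` guard, which is the IPython-unfriendliness mentioned earlier. A minimal standalone sketch, not the project's actual code:

```python
# Hypothetical example: create worker processes with the "spawn" start method
# so each child starts a fresh interpreter instead of inheriting a forked copy
# of the parent's TensorFlow state (which can deadlock if the parent already
# created a session).
import multiprocessing as mp


def worker(remote):
    # TensorFlow would only be imported/initialised here, inside the child.
    remote.send("ready")
    remote.close()


if __name__ == "__main__":
    ctx = mp.get_context("spawn")  # "forkserver" is another fork-free option on Linux
    parent_conn, child_conn = ctx.Pipe()
    proc = ctx.Process(target=worker, args=(child_conn,))
    proc.start()
    print(parent_conn.recv())  # -> "ready"
    proc.join()
```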
I'm done fiddling around with this. Happy to make changes to the [...]. @araffin if you want me to work on merging the Codacy reports, then ping me once the Docker image is updated with the dependencies.
@AdamGleave I'll do that (I just don't have a good internet connection for now...).
@AdamGleave the image is pushed! (tag: 2.7.1; you run the Codacy coverage reporter using [...])
Why are we pip installing things inside the test running script, rather than making this part of Docker? I think this'll slow down each of the tests by a constant amount. |
I was just testing what was causing Travis to hang (for tf and gym); you should be able to remove them now (tf 1.8.0 and gym 0.14.0 are in the Docker image).
@hill-a could you review that one? |
LGTM
Update dependencies:
Docker images (notably the one used by Travis) are also updated (using the latest gym version).