Threads over-subscription #12

Open
tleonardi opened this issue Oct 11, 2018 · 7 comments
Labels
bug Something isn't working

Comments

@tleonardi
Owner

It looks like nanocompore sometimes spawns more threads than it should. Starting it with nthreads=4 on the 7SK IVT data starts 16 threads.
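One way to confirm the thread count independently of Python is to ask the kernel. The sketch below assumes a Linux system with `/proc` mounted; `count_os_threads` is a hypothetical helper name, not part of nanocompore. Note that `threading.active_count()` would not help here, because it only sees threads created through Python, not native threads started by a compiled library.

```python
import os


def count_os_threads(pid=None):
    """Return the number of OS-level threads of a process.

    Reads the "Threads:" field of /proc/<pid>/status (Linux only).
    Returns None when the file is missing or unreadable, for example
    on macOS or Windows, or when the process no longer exists.
    """
    pid = os.getpid() if pid is None else pid
    try:
        with open(f"/proc/{pid}/status") as status:
            for line in status:
                if line.startswith("Threads:"):
                    return int(line.split()[1])
    except OSError:
        return None
    return None


if __name__ == "__main__":
    print("OS threads in this process:", count_os_threads())
```

Calling it once before and once after the suspect import or call, in the main process and in a worker, should show where the extra threads come from.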

@tleonardi added the bug label Oct 11, 2018
@tleonardi
Owner Author

I have to look into the issue more carefully. I'm not sure why or when it happens, but it has already happened more than once. @a-slide do you have any idea?

@tleonardi
Owner Author

tleonardi commented Oct 12, 2018

OK, it looks like I figured it out. On my system numpy is built against OpenBLAS, which is multithreaded by default. As a result, the np.array() call in __process_references() spawns multiple threads, and every worker process does the same, so the total thread count ends up well above nthreads.
Since we are using multiprocessing, the solution seems to be to disable multithreading for OpenBLAS and MKL before importing numpy:

import os
os.environ["MKL_NUM_THREADS"] = "1"
os.environ["MKL_THREADING_LAYER"] = "sequential"
os.environ["NUMEXPR_NUM_THREADS"] = "1"
os.environ["OMP_NUM_THREADS"] = "1"

I'm currently testing whether it works as it should. I will commit as soon as I'm sure all is fine.

@a-slide
Collaborator

a-slide commented Oct 12, 2018

I had not noticed, but my version is actually also built against OpenBLAS.
I had a quick look as well, and the method you describe should work, but you might want to include this as well:

os.environ['OPENBLAS_NUM_THREADS'] = '1'
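Putting the two suggestions together, a minimal sketch could look like the following. The variable names and values are the ones quoted in this thread; `THREAD_ENV_VARS` and `limit_native_threads` are hypothetical names used only for illustration, and this is not necessarily how nanocompore ended up implementing it.

```python
import os

# Environment variables mentioned in this thread. Each one caps the
# thread pool of one native library (MKL, numexpr, OpenMP, OpenBLAS).
THREAD_ENV_VARS = {
    "MKL_NUM_THREADS": "1",
    "MKL_THREADING_LAYER": "sequential",
    "NUMEXPR_NUM_THREADS": "1",
    "OMP_NUM_THREADS": "1",
    "OPENBLAS_NUM_THREADS": "1",
}


def limit_native_threads(env=None):
    """Hypothetical helper: write the single-thread settings into env.

    Defaults to os.environ. The libraries read these variables when
    they are first loaded, so this has to run before numpy is imported.
    """
    env = os.environ if env is None else env
    for name, value in THREAD_ENV_VARS.items():
        env[name] = value
    return env


limit_native_threads()
# import numpy as np  # import numpy only after the call above
```

Worker processes started with multiprocessing inherit the parent's environment, so setting the variables once in the main process before any numpy import should cover the workers too.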

@a-slide
Collaborator

a-slide commented May 17, 2019

Not completely fixed, apparently.
Numpy is still causing issues in a cluster environment.
An option to explore might be to use this package to set the number of threads:
https://github.com/joblib/threadpoolctl
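For reference, a rough sketch of what using threadpoolctl could look like. `threadpool_limits` is the context manager that package provides; `run_single_threaded` is a hypothetical wrapper name, and the fallback for a missing installation is an assumption made here so the snippet runs either way. Unlike the environment variables, the limit is applied at runtime, so it also works after numpy has already been imported.

```python
try:
    from threadpoolctl import threadpool_limits
except ImportError:
    # Optional dependency: without it, calls run with no thread limit.
    threadpool_limits = None


def run_single_threaded(func, *args, **kwargs):
    """Hypothetical wrapper: call func with native thread pools capped at 1.

    The cap applies to the thread pools threadpoolctl can detect (BLAS
    and OpenMP implementations) and only for the duration of the call.
    """
    if threadpool_limits is None:
        return func(*args, **kwargs)
    with threadpool_limits(limits=1):
        return func(*args, **kwargs)


if __name__ == "__main__":
    print(run_single_threaded(sum, [1, 2, 3]))
```

In a multiprocessing setup, each worker would need to apply the limit itself, for example by wrapping the function it executes.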

@a-slide changed the title from "Threads out of control" to "Threads over-subscription" May 17, 2019
@tleonardi
Owner Author

This should be fixed by #94 but I haven't tested it yet. Did you?

@tleonardi
Owner Author

I think it's fixed

@tleonardi
Owner Author

But it's not, reopening.
