Scale testing #23

Open
JoshKarpel opened this issue Jul 17, 2020 · 1 comment
Labels
bug Something isn't working

Comments

@JoshKarpel
Contributor

@bbockelm reports possible issues observed by the Coffea team when scaling past ~50 workers with TLS, as well as issues where auto-scaled-down workers are killed while still holding useful results in memory. We should investigate both issues on our setup and see if we can reproduce them.

I'm also interested in testing overall stability during large/long calculations by manually killing workers and seeing if Dask can dynamically recover in a reasonable way (as it claims it can).

JoshKarpel added the bug label Jul 17, 2020
JoshKarpel self-assigned this Jul 17, 2020
@JoshKarpel
Contributor Author

@bbockelm got to the bottom of this: dask/distributed#4069

JoshKarpel removed their assignment Aug 28, 2020