Support fully-remote Dask schedulers #24

Open
JoshKarpel opened this issue Jul 17, 2020 · 1 comment

@JoshKarpel

Right now, we support a workflow that looks like this:

Screenshot from 2020-07-17 12-52-55

We'd like to be able to support this:

Screenshot from 2020-07-17 12-53-37

This will require Dask-CHTC to be able to do spooled remote submits. There will be networking and security implications for this, including the need to acquire HTCondor IDTOKENs. Presumably we could have a CLI command to help out with that.

This is related to #5, because a goal of this workflow is to be able to gather workers from many different places and hook them all up to your central scheduler.

I think we also need to understand how this impacts communication overhead. It's possible that some workflows that rely on short tasks will be seriously impeded by running like this.
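As a rough way to reason about that concern, here is a back-of-envelope sketch. The model (useful time divided by useful time plus a fixed per-task overhead) and both overhead values are assumptions chosen for illustration, not measurements of Dask or of CHTC.

```python
def efficiency(task_seconds: float, overhead_seconds: float) -> float:
    """Fraction of wall-clock time spent on useful work for a single task."""
    return task_seconds / (task_seconds + overhead_seconds)


# Hypothetical per-task overheads (assumed values, not measurements).
LOCAL_OVERHEAD = 0.001   # ~1 ms: scheduler and workers on the same network
REMOTE_OVERHEAD = 0.050  # ~50 ms: scheduler far away from the workers

for task_seconds in (0.01, 1.0, 60.0):
    local = efficiency(task_seconds, LOCAL_OVERHEAD)
    remote = efficiency(task_seconds, REMOTE_OVERHEAD)
    print(f"{task_seconds:>6.2f}s task: local {local:.1%}, remote {remote:.1%}")
```

Under these assumed numbers, tasks lasting seconds or longer barely notice the extra latency, while tasks lasting tens of milliseconds lose most of their time to it. Real measurements would be needed before drawing conclusions.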

It would be very nice if we could support non-Linux platforms as well, but it will be annoying. We might be able to swing Windows with special install instructions, but Mac is a non-starter at the moment. Hopefully we'll get improved distribution mechanisms for the HTCondor Python bindings soon-ish.

@JoshKarpel JoshKarpel self-assigned this Jul 17, 2020
@JoshKarpel

We ended up running into a major roadblock with the above plan: my home ISP was blocking incoming TLS connections. We can't really control this, so we need to rethink the approach. Braindump below...

The new plan is to do this:
Screenshot from 2020-09-01 08-35-30

The client will still be on the user's computer, the scheduler will run in Kubernetes, and the scheduler will remote-submit to a CHTC schedd. This will work because all communication between the client and the scheduler is initiated by the client (i.e., these are outgoing connections, which will not be blocked by ISPs). The connections from the scheduler to the workers will be inside the CHTC network and won't be blocked.
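To illustrate why client-initiated connections are enough, here is a minimal standard-library sketch. It is not Dask or Dask-CHTC code: `run_scheduler` is a hypothetical stand-in for the remote scheduler, and everything runs on localhost. The point is only that the reply travels back over the same connection the client opened, so the client never has to accept an incoming connection.

```python
import socket
import threading


def run_scheduler(server: socket.socket) -> None:
    """Hypothetical stand-in for the scheduler: accept one client, answer it."""
    conn, _ = server.accept()
    with conn:
        request = conn.recv(1024)
        # The reply goes back over the connection the client initiated.
        conn.sendall(b"result for " + request)


# The "scheduler" listens; port 0 lets the OS pick a free port.
server = socket.create_server(("127.0.0.1", 0))
port = server.getsockname()[1]
thread = threading.Thread(target=run_scheduler, args=(server,), daemon=True)
thread.start()

# The "client" only ever makes an outgoing connection.
with socket.create_connection(("127.0.0.1", port), timeout=5) as client:
    client.sendall(b"task-1")
    reply = client.recv(1024)

thread.join(timeout=5)
server.close()
print(reply.decode())
```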

The scheduler will need a remote-submit-enabled IDTOKEN, presumably generated by us on the user's behalf. I think that leads to the big question, which is: how do users "request" that the remote scheduler "service" be started up for them?
