Add SLURM control arguments #1033

Open · wants to merge 5 commits into master
Conversation

@outpaddling

Aurash Mohaimani and I developed and extensively tested these new parameters for tuning SLURM resource limits at individual stages of the pipeline. Without these controls, canu tries to use all available nodes, which is neither kosher nor particularly beneficial on a large, shared cluster.

In our experience, a few hundred cores have provided more than adequate performance for the CPU-bound steps in large genome assemblies. A few dozen have worked fine for the I/O-intensive steps.

We took what appeared to be the least invasive approach, using SLURM's array job limiting capabilities.
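
For reference, the SLURM feature in question is the % throttle that can be appended to an --array range; a minimal sketch (the script name and numbers are illustrative, not the actual values used in this PR):

```sh
# Submit a 200-task array, but allow at most 50 tasks to run at once.
sbatch --array=1-200%50 stage_job.sh

# The throttle can also be adjusted on an existing array job:
scontrol update JobId=<jobid> ArrayTaskThrottle=25
```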

I suspect that other popular schedulers such as PBS have similar features to which these changes could be easily adapted.

It seems to me that a scheduler-agnostic approach would be more difficult and invasive, though.

Regards,

Jason

@skoren (Member) commented Aug 16, 2018

I personally think it's the scheduler's job to restrict computations. The running processes want to complete as fast as possible, so they request lots of resources; the grid is free to deny or throttle them. This could also be handled by restricting runs to specific queues. We most often use Canu on a large shared SLURM cluster, submitting thousands of jobs, and they simply get throttled by the scheduler dropping their priority so that other users can run.

However, I understand that not all grids may be configured to handle these requests well, so we probably do want to add some way to limit the jobs on all grids. I think it makes sense to use the existing *Concurrency options rather than adding new SLURM-specific options. That would mean adding a new grid configuration option to the Grid_*.pm files to tell Canu how to restrict jobs, and building the array job limitation into the array command in Execution.pm.
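
Roughly, the idea would be for the grid layer to translate a stage's existing *Concurrency value into that throttle when it builds the array submission; a hypothetical before/after of the generated command (script name and numbers are illustrative only):

```sh
# Unthrottled array submission: every task is eligible to run at once.
sbatch --array=1-1000 stage_job.sh

# With the stage's concurrency limit appended as a throttle,
# at most 64 tasks run simultaneously.
sbatch --array=1-1000%64 stage_job.sh
```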

@outpaddling (Author)

At our site, the faculty and IT staff decided by consensus not to impose limits via software, but to let users exercise their own judgment on fair share. Giving them the ability to govern themselves is therefore important to us, and we appreciate your willingness to provide it. Let us know if there is anything we can do to help. Our testing will be limited to SLURM environments, as we no longer run other schedulers (except HTCondor, which I don't think is supported).
