Not the proper method to specify required number of nodes for OLCF #710

Open
DanilaOleynik opened this issue Mar 4, 2019 · 9 comments

@DanilaOleynik

https://github.com/radical-cybertools/saga-python/blob/f460528a10f0e748e6a1d252be20789ff87228ac/src/saga/adaptors/lsfsummit/lsfjobsummit.py#L207

Hi,

OLCF supports cross-platform job scheduling from different facilities (the DTN cluster, RHEA, etc.) for Titan and will provide the same support for Summit very soon (in the next few days). It would therefore be beneficial to determine the number of cores per node based on the hostname (at least for Summit). It would probably be better to use 42 cores by default and 20 only for summitdev.
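
As a rough illustration of the hostname-based default described above (a sketch only, not the adaptor's actual code; `DEFAULT_CPN`, `CPN_OVERRIDES`, and `cores_per_node` are hypothetical names, and which hostname is passed in, e.g. the target host from the job service URL, is an assumption):

```python
# Hypothetical lookup: 42 cores per node unless the host is summitdev.
DEFAULT_CPN = 42                      # default suggested in this issue
CPN_OVERRIDES = {"summitdev": 20}     # summitdev is the only exception named here


def cores_per_node(hostname):
    """Return the cores-per-node value for the given hostname."""
    host = hostname.lower()
    for pattern, cpn in CPN_OVERRIDES.items():
        # Substring match, so 'summitdev-login1' is still recognized.
        if pattern in host:
            return cpn
    return DEFAULT_CPN
```

With this, a submission from a DTN host that targets Summit would get 42, while anything whose name contains `summitdev` would get 20.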

@andre-merzky
Member

Hi Danila,

We are aware of that issue. The next SAGA release (which is in preparation) will come with a significant change in that area: instead of using different code paths for certain configurations and systems, we'll begin to support machine-specific configuration files, which should address exactly the issue raised in this ticket.

@DanilaOleynik
Author

Hi Andre,

That sounds great! I have already started preparing to deploy a Harvester instance for Summit (for ATLAS production), and it would be nice to have a software stack similar to the one for Titan.

@andre-merzky
Member

Danila, do you have a time frame on when you will need this?

@DanilaOleynik
Author

As always, yesterday :-) It would be good to have it ASAP: we need to migrate to Summit, and at the moment this is one of the critical issues.

@andre-merzky
Member

> As always, yesterday :-)

hehe - why am I asking ;-)

The release which adds support for configuration files goes out next weekend. I hope we can release our Summit support a week or two later, specifically the ability to configure CPN per host (Summit should then work out of the box, though).

@DanilaOleynik
Author

Hi Andre,

OLCF support recently deployed the LSF utilities/clients to DTN38, which is quite exciting. I am going to continue deploying and configuring a production version of Harvester for ATLAS on Summit. So the question of the LSF adaptor's readiness, and of an example configuration, has become more pressing. I am already experimenting with version 0.60.0.

@andre-merzky
Member

Hi Danila,

A late response, but the LSF adaptor is now ready for Summit, and we use it there. It will be released in the first week of July.

@DanilaOleynik
Author

Great!

I will test it as soon as the new release is available.

Cheers,
Danila

@andre-merzky added this to the cfg milestone Feb 3, 2020
@andre-merzky
Member

We should allow specifying the number of nodes directly.
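
One way this could look, as a sketch under assumptions rather than the adaptor's actual logic: an explicit node count takes precedence, and otherwise the count is derived from the requested cores and the CPN. The function `resolve_node_count` and its parameter names are hypothetical.

```python
import math


def resolve_node_count(total_cores=None, nodes=None, cpn=42):
    """Return the number of nodes to request (hypothetical helper)."""
    if nodes is not None:
        # An explicitly requested node count wins over any derived value.
        if nodes < 1:
            raise ValueError("nodes must be >= 1")
        return nodes
    if total_cores is None or total_cores < 1:
        raise ValueError("need either nodes or a positive total_cores")
    # Round up so a partially used node is still requested.
    return math.ceil(total_cores / cpn)
```

For example, `resolve_node_count(total_cores=85)` rounds up to 3 nodes at 42 cores per node, while `resolve_node_count(nodes=4)` returns 4 regardless of the CPN.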
