WeeklyTelcon_20210413
Geoffrey Paulsen edited this page Apr 13, 2021
- Dialup Info: (Do not post to public mailing list or public wiki)
- Austen Lauria (IBM)
- Brendan Cunningham (Cornelis Networks)
- Christoph Niethammer (HLRS)
- Edgar Gabriel (UH)
- Geoffrey Paulsen (IBM)
- Harumi Kuno (HPE)
- Hessam Mirsadeghi (UCX/nVidia)
- Howard Pritchard (LANL)
- Jeff Squyres (Cisco)
- Joseph Schuchart
- Josh Hursey (IBM)
- Matthew Dosanjh (Sandia)
- Michael Heinz (Cornelis Networks)
- Naughton III, Thomas (ORNL)
- Raghu Raja (AWS)
- Ralph Castain (Intel)
- William Zhang (AWS)
- Akshay Venkatesh (NVIDIA)
- Artem Polyakov (nVidia/Mellanox)
- Aurelien Bouteiller (UTK)
- Brandon Yates (Intel)
- Brian Barrett (AWS)
- Charles Shereda (LLNL)
- David Bernhold (ORNL)
- Erik Zeiske
- Geoffroy Vallee (ARM)
- George Bosilca (UTK)
- Joshua Ladd (nVidia/Mellanox)
- Marisa Roman (Cornelis Networks)
- Mark Allen (IBM)
- Matias Cabral (Intel)
- Nathan Hjelm (Google)
- Noah Evans (Sandia)
- Scott Breyer (Sandia?)
- Shintaro Iwasaki
- Todd Kordenbrock (Sandia)
- Tomislav Janjusic
- Xin Zhao (nVidia/Mellanox)
- blocking on PR 8769 issues (see New topics above)
- Mark commented 5 days ago that it might still not work.
- OSU with Dynamic window issue 8774
- Many folks in Israel are out this week.
- Not a show stopper, requires options, and presumably some variant of UCX will fix it.
- Same as issue 8212?
- No, 8774 is different and older.
- Taking small PRs
- Same waiting-mode as v4.0.x
- Pushing back the alpha build for v5.0.0 from this Friday to NEXT Friday.
- Issue 8776 - libevent confusion if running with external 3rd party tools
- PR 8792 - Need to move this over to v5.0.x
- Need to check with Brian if this is relevant on v4.0 or v4.1
- compile with --disable-dlopen, or slurp in all of the plugins.
- 3 line change, should be small work.
- Not a linker error, job just hangs and fails, really might want on v4.0 and v4.1
- PR 8799 - should probably be PRed to v5.0
- Howard is concerned about how these package-specific config lookups feed into the way mpicc is linked (for example, on Cray).
- mpicc --show - shows some long dependencies.
- Just let him know on the ticket.
- Howard will update the ticket.
- Docs - Man pages will be included in this effort.
- Likely include nroff and HTML in the tarball (so users don't need Sphinx, and don't need internet access)
- If this doesn't make v5.0.0, it can go into a later release.
- Packagers need some advice, and need a README, few more weeks at minimum.
- 8808 - same memory backing file.
- what is the failure profile for this?
- Rare, but it happens when two users share a node: if a failed job leaves its backing files behind and another user then tries to create a backing file with the same name, the two can conflict. So we add the user ID to the file name to give a little more protection against conflicts.
- Does mean that there's a cleanup issue for shared memory files.
- The only reason this came up is that we moved the backing file out of /dev/shm.
- Came to the conclusion that we want this as soon as possible.
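As a rough illustration of the idea discussed for 8808, the sketch below builds a backing-file path that embeds the user ID. The function name `build_backing_path`, the `backing_file.<uid>.<job>.<rank>` layout, and the parameters are all hypothetical and chosen only for this example; they are not the naming scheme used in the actual PR.

```c
#include <stdio.h>
#include <sys/types.h>

/*
 * Hypothetical helper (not Open MPI's real naming scheme): compose a
 * backing-file path that includes the owning user's ID, so a stale file
 * left behind by one user cannot collide with another user's file.
 *
 * Returns 0 on success, -1 if the path did not fit in the buffer.
 */
int build_backing_path(char *out, size_t out_len, const char *dir,
                       uid_t uid, const char *job, int rank)
{
    int n = snprintf(out, out_len, "%s/backing_file.%lu.%s.%d",
                     dir, (unsigned long) uid, job, rank);

    /* snprintf reports the length it wanted; treat truncation as failure. */
    if (n < 0 || (size_t) n >= out_len) {
        return -1;
    }
    return 0;
}
```

A caller would typically pass `getuid()` from `<unistd.h>` as the `uid` argument. Taking the ID as a parameter keeps the helper easy to test and makes no assumption about how the real code obtains it.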
- Geoff has a clang-v10 format file.
- Ongoing code changes?
- git integration.
- Some things it formats really UGLY.
- It munges struct tables together.
- no way we could find to say "leave this alone".
- If we have a CI that's checking, then that'll fail.
- Are we going to ask everyone to put this git-config and just deal with it?
- How do we make this ongoing?
- Brian has a to-do to fail a PR if it doesn't like the format?
- just hasn't bubbled up.
- Will want some way to exclude something.
- Brian CAN'T turn this on for the repo yet, because the entire repo hasn't been reformatted yet.
- How will people cleanly do this BEFORE they put PRs up (so people don't play whack-a-mole)?
- Can do the reformat before we have this checker.
- CI checker will hit some issues.
- When you do it, do it with a clean clone, and don't init the submodules
- from Christoph Niethammer HLRS (Guest) to Everyone:
Clang-format understands also special comments that switch formatting in a delimited range. The code between a comment // clang-format off or /* clang-format off */ up to a comment // clang-format on or /* clang-format on */ will not be formatted. The comments themselves will be formatted (aligned) normally. (source: https://releases.llvm.org/11.1.0/tools/clang/docs/ClangFormatStyleOptions.html )
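To make Christoph's note concrete, here is a minimal sketch of protecting a hand-aligned struct table with the `clang-format off` / `clang-format on` comments. The table contents and the names `transport_entry`, `transport_table`, and `transport_priority` are invented for illustration and do not come from the Open MPI source.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical table entry, for illustration only. */
struct transport_entry {
    const char *name;
    int         priority;
    const char *description;
};

/* The formatter leaves everything between these two comments untouched,
 * so the hand-aligned columns below survive a clang-format run. */
/* clang-format off */
static const struct transport_entry transport_table[] = {
    { "self", 100, "loopback to the same process" },
    { "sm",    50, "shared memory on one node"    },
    { "tcp",   10, "sockets between nodes"        },
};
/* clang-format on */

/* Look up a priority by name; returns -1 when the name is unknown. */
int transport_priority(const char *name)
{
    size_t count = sizeof(transport_table) / sizeof(transport_table[0]);

    for (size_t i = 0; i < count; i++) {
        if (strcmp(transport_table[i].name, name) == 0) {
            return transport_table[i].priority;
        }
    }
    return -1;
}
```

This would address the "no way to say leave this alone" concern for tables, at the cost of adding marker comments around each protected region.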
- Sessions branch is pretty big, but Howard wants to wait until v5.0.0 has been released for a while.
- So plan was to wait for rest of formatting until sessions is rebased, and then format master.
- Howard's having a few more issues on sessions, so is okay with us reformatting
- Won't merge to Open MPI master until v5.0.0 is at a point where it won't take big PRs.
- Rebasing this on top of ULFM is also challenging.
- Probably will do a 2 stage rebase. Rebase up to the OPAL reformat, and then squash, and possibly run clang-format on the sessions branch, and then try to rebase on top of whatever else is on master.
- possibly a few weeks out.
- Release a few weeks out.
- Also some changes with libcurl, especially since this breaks the OMPI build.
- PMIx can interface with REST interfaces (using libcurl)
- JSON
- Build system issue in PMIx when we changed to static DSOs.
- Think this has been resolved
- rhc has no strong issues either way.
- We prepend LD_LIBRARY_PATH pointing to the PRRTE installation.
- At the moment in OMPI, we overlay this with OMPI library location.
- Seems like the best fix would be to make these two independent.
- PREFIX - enable prefix by default.
- In Open MPI, the PRRTE prefix happens to be the same as the OMPI prefix.
- But PRRTE does this by default, because we want the daemons to match the commands.
- OMPI doesn't want to do that. And that's okay
- Instead of --enable-prefix-by-default we need --enable-mpi-prefix-by-default.
- Looking at it from OMPI perspective
- If the user asked for prefixing, the user wants prefixing... they don't care if the two prefixes are the same or not, they just want it to work.
- If the user DOESN'T want prefixing, then they don't want EITHER prefixing.
- But a global PRRTE install might still want to be prefixed.
- PRRTE will prefix by default
- What happens when I want MPI libs redirected?
- Problem is if you build PRRTE INTERNAL, then you can't redirect MPI libraries.
- Gotta set PATH and LD_LIBRARY_PATH correctly
- One of those things, --enable-prefix is NOT default in < v4.0
- There are times when want to redirect OMPIs to a different set of libraries.
- Right now it's a configure / compile time choice, which is problematic: you have to redo all of the subcomponents.
- What would be nice is if this was at runtime, so that ompi's mpirun can find all of the subcomponents at runtime.
- Setting LD_LIBRARY_PATH is the way to point to another set of libraries.
- This breaks because mpirun will overwrite LD_LIBRARY_PATH.
- Personally Doesn't want this as a default.
- Joseph doesn't want us setting LD_LIBRARY_PATH
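The overwrite-versus-prepend distinction in the discussion above can be sketched as follows. This is only an illustration of prepending a directory while preserving whatever the user already had in `LD_LIBRARY_PATH`; the function name `prepend_path` is hypothetical and this is not how mpirun or PRRTE actually implement it.

```c
#include <stdio.h>

/*
 * Hypothetical helper: build "dir:existing" so that the launcher's library
 * directory is searched first but the user's own entries are kept.
 * If there is no existing value, the result is just "dir".
 *
 * Returns 0 on success, -1 if the result did not fit in the buffer.
 */
int prepend_path(char *out, size_t out_len, const char *dir,
                 const char *existing)
{
    int n;

    if (existing == NULL || existing[0] == '\0') {
        n = snprintf(out, out_len, "%s", dir);
    } else {
        n = snprintf(out, out_len, "%s:%s", dir, existing);
    }

    /* Treat truncation as an error rather than exporting a clipped path. */
    if (n < 0 || (size_t) n >= out_len) {
        return -1;
    }
    return 0;
}
```

A launcher could pass `getenv("LD_LIBRARY_PATH")` as `existing` and export the result with `setenv`. Whether any such modification should happen by default is exactly the open question in these notes.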
- Jeff will discuss with Absoft upgrading gcc (need a C11 compiler for 32-bit support)
- PMIx and PRRTE are close to a release candidate.
- This week (first full week of April)
- What do we do with the mpirun Manpage?
- Didn't want OMPI requiring Sphinx, but if PRRTE and PMIx in same tar
- Ralph almost has singleton comm spawn working
- Single node without the mpirun process
- OMPI docs and manpages, but persistent problem that mpirun is really prrterun
- PMIx and PRRTE now use pandoc. It'd be bad to require both pandoc and Sphinx
- Josh Hursey wrote this up in https://github.com/openpmix/prrte/issues/931, as a way to sketch out how man mpirun should work for Open MPI
- PR 8329 - convert README, HACKING, and possibly manpages to reStructuredText.
- Uses https://www.sphinx-doc.org/en/master/ (Python tool, can pip install)
- Intent is for this to land in v5.0
- mpirun / prrterun - we had quite a bit of details in orte, but are updating as much as possible.
- Ralph has asked about this for PMIx/PRRTE since this is turning out to work
- No update - 3/16
- Could be independent of PMIx and PRRTE.
- PMIx and PRRTE want to follow suit, and not require both pandoc and Sphinx.