WeeklyTelcon_20230411
Geoffrey Paulsen edited this page Apr 11, 2023 · 1 revision
- Dialup Info: (Do not post to public mailing list or public wiki)
- A. Bouteilla (ATK)
- Edgar Gabriel (AMD)
- Geoffrey Paulsen (IBM)
- Howard Pritchard (LANL)
- Jeff Squyres (Cisco)
- Joseph Schuchart (UTK)
- Luke Robison (Amazon)
- Matthew Dosanjh (Sandia)
- Quincey Koziol
- Thomas Huber
- Thomas Naughton (ORNL)
- Todd Kordenbrock (Sandia)
- Tomislav Janjusic (nVidia)
- William Zhang (AWS)
- Austen Lauria (IBM)
- Brendan Cunningham (Cornelis Networks)
- Brian Barrett (Amazon)
- Christoph Nietham
- David Bernholdt
- Josh Fisher (Cornelis Networks)
- Josh Hursey (IBM)
-
Tuned and MCA parameter Issue
- Issue 11532 somehow related to Issue 11459
- Summary: Something that was in v4.x (CLI and MCA params) is now broken.
- We probably either need a code fix or a doc fix.
- We need an answer by v5.0.0, as this breaks compatibility for existing shell scripts.
-
New issue came out of this
- Summary: Used to be two different formats of files:
- Tuned files, and MCA param files
- Did similar things, but two different formats.
- PRTE eliminated one of the formats entirely
- On OMPI side, have a minor incompatibility here.
- Right now users get no warning that the file is not being read.
- MINIMUM: Should emit an error (human is asking us to do something)
- Might need to be fixed in schizo
- Discussion: do we want to put back the 2nd flavor of MCA param file?
- A little weird we have silent translation for everything except this.
- Luke volunteers to open the issue.
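To illustrate the "MINIMUM: should emit an error" point above, here is a minimal sketch of a reader that refuses to silently skip lines it cannot interpret. It is not Open MPI, PRTE, or schizo code: `parse_param_file` and `ParamFileError` are hypothetical names, the `name = value` line shape with `#` comments is assumed for the surviving MCA param file format, and the `--mca ...` line in the usage note is only an assumed stand-in for the other file flavor.

```python
import re

# Hypothetical helper for illustration only; not part of Open MPI or PRTE.
# Assumed format: one "name = value" pair per line, "#" starts a comment.
KEY_VALUE = re.compile(r"^([A-Za-z0-9_]+)\s*=\s*(.*)$")


class ParamFileError(ValueError):
    """Raised when a line cannot be interpreted as 'name = value'."""


def parse_param_file(text, source="<string>"):
    """Return a dict of parameters, raising instead of ignoring bad lines."""
    params = {}
    for lineno, raw in enumerate(text.splitlines(), start=1):
        # Drop trailing comments and surrounding whitespace.
        line = raw.split("#", 1)[0].strip()
        if not line:
            continue
        match = KEY_VALUE.match(line)
        if match is None:
            # The user asked for this file to be read, so fail loudly.
            raise ParamFileError(
                f"{source}:{lineno}: unrecognized line {raw.strip()!r}; "
                "expected 'name = value'"
            )
        params[match.group(1)] = match.group(2).strip()
    return params
```

With this sketch, a file containing `coll_tuned_use_dynamic_rules = 1` parses normally, while a file containing a command-line-style line such as `--mca btl self,tcp` raises `ParamFileError` with the file name and line number instead of being dropped without notice.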
-
MPIR Shim (https://github.com/openpmix/mpir-to-pmix-guide) went away.
- Howard pushed repo to somewhere.
- Howard will hook it up to CI for some testing.
- MPIR shim has some docs (12.7); they just need a new URL and updated info.
-
Released RC11 last week. Please test.
-
Issue reported that map-by is not working.
-
OFI nic selection broken on v5.0.x
- Fix itself updated the PMIx and PRTE pointers
- Testing this patch right now with RC11 - Thomas
-
python needs fix in PMIx and then bringing this into
- Come to some conclusion as to what is needed and what can be backported.
-
PMIx in main has startup hang. 3%-20% failures.
- Luke sees the hangs in OMPI main, but not as much in OMPI v5
- OMPI main when pointed to PMIx main
- Tommy - one of the reasons why we're reluctant to push PMIx pointers.
- Whatever fixes that go into PMIx
-
We'd talked about supplying some docs about how HAN is great, and why we're enabling it for v5.0.0 by default.
- Like to include instructions on how to reproduce as well for users.
- Section 5 Open MPI specific Features - good to highlight
- Geoff emailed George, asking if they can.
- George just linked in #general slack channel. 6x increase in speed.
- George is asking for volunteers.
- Ask about suitable paper(s)
- PR 11579 - Howard's adding some debug
- HCOLL is using OMPI devel headers when compiling in debug
- But as part of installing the devel headers, the communicator headers get installed too.
- People are complaining that if they have a sessions only test, it'd segv down in HCOLL because it's deciding that MPI_COMM_WORLD is valid. HCOLL is MPI3 compat, but not MPI4.
- Bug in libhcoll (external to OMPI)
- This is a bug in libhcoll; known issue in HCOLL version (X)
- Howard is closing this PR, and adding this as a known issue.
- No travel planned.