WeeklyTelcon_20171107
Geoffrey Paulsen edited this page Jan 9, 2018
·
1 revision
- Dialup Info: (Do not post to public mailing list or public wiki)
- Jeff Squyres
- Geoff Paulsen (IBM)
- Brian
- Edgar Gabriel
- Geoffroy Vallee
- Todd Kordenbrock
- Thomas Naughton
- Ralph
- Howard Pritchard
- Josh Hursey
- Nathan Hjelm
- Mohan
- George
- Set Async Modex options - have them fixed in a branch for PMIx.
- have to turn off dstore for this to work. (3 bugs in dstore)
- Brian looking at a solution.
- UCX can use instant-on by doing the fence in UCX instead of PMIx.
- Any interconnect can use the instant-on approach. If you have an app that's sparsely connected, these parameters will get you a much faster start.
- Most fabrics (other than the uGNI BTL) need a barrier before communication.
- ONLY works if dstore is off, since dstore incorrectly tells folks that the data has arrived.
- Decision: If users set params to use Async Modex, it will auto disable dstore.
- 3 things to fix in dstore.
- Runtime option - disable dstore if async is requested.
- Fix cross versioning issue
- return a pointer to memory where something is stored, but can't today because they're packing data.
Review All Open Blockers
Review v2.0.x Milestones v2.0.4
- Decided to merge a few other things yesterday.
- IBM Jenkins was problematic - just ignore until they can get back online.
- Howard can't get to AWS instance anymore.
- This is new, Brian can't look at it today.
- Schedule: At this point, it's looking like a post-Supercomputing release.
Review v2.x Milestones v2.1.2
- Still over the New Year horizon
Review v3.0.x Milestones v3.0
- v3.0.1 - No RC candidate. Hope to get to it this week, but there is no real timeline.
- Are there any current blockers?
- Jenkins server is off.
- Edgar, how are we on IO?
- If 3.0.1 or 3.1 are not getting out this week, we should try to get bugfix from yesterday in.
- Edgar just pulled it into master, will create PRs to v3.0.x and v3.1.x soon.
- Will hold RC for this.
- Issue 4453
- Performance of IO was a bit scary, but a necessary hit for correctness.
Review v3.1.x Milestones v3.1 (https://github.com/open-mpi/ompi/milestone/27)
- Schedule -
- Outlook - Probably will not get out by Supercomputing. :(
- Brian will send out requests to start testing v3.1
- Cisco did last week.
- Add v3.1 to MTT tests
- Database is active now to accept v3.1 tests.
- MTT disk full issue has been resolved.
Review Master Master Pull Requests
- Can't tell how master is doing.
- What are we testing?
- We test builds on a number of platforms, a number of compilers, and a number of configurations
- The tests are a bit more limited than we'd like.
- Makes sure it runs ompi_info, run example programs using shared memory.
- Adding tests to run single-node is pretty easy.
- George: OpenSHMEM is broken in master, and George had to go back to v2.1 to get it to work.
- UCX couldn't find remote peer. He can use UCX as MTL in Open MPI.
- Challenge with OSHMEM - we removed support for BTLs, so have to have a transport provider that supports OSHMEM, which is only UCX.
- We need to set up a test build with UCX.
- Howard has some ARM boxes to test multi-box OSHMEM
- IBM will enable OSHMEM + UCX in coming weeks.
Review Master MTT testing
- PR - New Compare-and-swap function will return a new type to use new C11.
- Suggested using a configure based switch, and configure figures out how to #define.
- Configury can figure out what doesn't work, and enable renaming of types, etc.
- Configure will detect C11 atomics under the covers if the compiler is C11.
- If we use C11 signature, we'll need to use generics.
- Those generics will be protected by the same macro. Compilers in general implemented generics before C11.
- It's a mess because there is no standardization around generics.
- Nathan: This is why this is the final step.
- Any operation on an atomic int becomes atomic. So we need a new OPAL_Atomic that will be either an atomic int or a C volatile.
- But you're right, for generics to work, types have to exactly match.
- If you're using attributes, it's even worse.
- ONLY need for a subset of types: size_t, int32, int64, pointer, and a couple others.
- If you want to use the OPAL Atomic interface, you MUST use one of the six supported types.
- Need to ensure that all supported compilers support generics.
- But they didn't standardize the naming of the type.
- Open MPI WOULD be standardizing the name of the type internal to our code.
- Eventually Nathan wants to move to where atomics can be used automatically with C11.
- But sometimes it's slow (and correct)
- If you know what you're doing, then can cast away atomic for speed.
- Will not affect external MPI_Ops (they're not atomics).
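The atomics discussion above can be illustrated with a minimal sketch. It assumes a configure-style macro that selects C11 atomics when available, a small set of fixed-width atomic types, and a `_Generic` dispatch that requires an exact type match. Every `demo_` / `DEMO_` name is hypothetical and is not the OPAL interface or the code in the PR; the fallback branch additionally assumes a GCC-compatible compiler that provides the `__sync` builtins.

```c
#include <stdbool.h>
#include <stdint.h>

/* DEMO_HAVE_C11_ATOMICS stands in for a value that configure would
 * normally define; here it is derived from the compiler's own macros. */
#if !defined(DEMO_HAVE_C11_ATOMICS)
#  if defined(__STDC_VERSION__) && __STDC_VERSION__ >= 201112L && \
      !defined(__STDC_NO_ATOMICS__)
#    define DEMO_HAVE_C11_ATOMICS 1
#  else
#    define DEMO_HAVE_C11_ATOMICS 0
#  endif
#endif

#if DEMO_HAVE_C11_ATOMICS

#include <stdatomic.h>

/* With C11, every plain access to these types is already atomic. */
typedef _Atomic int32_t demo_atomic_int32_t;
typedef _Atomic int64_t demo_atomic_int64_t;

static inline bool demo_cas_32(demo_atomic_int32_t *addr, int32_t *expected,
                               int32_t desired)
{
    return atomic_compare_exchange_strong(addr, expected, desired);
}

static inline bool demo_cas_64(demo_atomic_int64_t *addr, int64_t *expected,
                               int64_t desired)
{
    return atomic_compare_exchange_strong(addr, expected, desired);
}

/* _Generic selects on the exact pointer type; passing a pointer to any
 * unsupported type is a compile-time error. */
#define demo_cas(addr, expected, desired)                      \
    _Generic((addr),                                           \
             demo_atomic_int32_t *: demo_cas_32,               \
             demo_atomic_int64_t *: demo_cas_64)((addr), (expected), (desired))

#else /* fallback: volatile types plus GCC-style __sync builtins */

typedef volatile int32_t demo_atomic_int32_t;
typedef volatile int64_t demo_atomic_int64_t;

static inline bool demo_cas_32(demo_atomic_int32_t *addr, int32_t *expected,
                               int32_t desired)
{
    int32_t prev = __sync_val_compare_and_swap(addr, *expected, desired);
    bool swapped = (prev == *expected);
    *expected = prev; /* report the value actually seen, as C11 does */
    return swapped;
}

static inline bool demo_cas_64(demo_atomic_int64_t *addr, int64_t *expected,
                               int64_t desired)
{
    int64_t prev = __sync_val_compare_and_swap(addr, *expected, desired);
    bool swapped = (prev == *expected);
    *expected = prev;
    return swapped;
}

/* No _Generic in this branch: dispatch on operand size instead. */
#define demo_cas(addr, expected, desired)                              \
    (sizeof(*(addr)) == sizeof(int32_t)                                \
         ? demo_cas_32((demo_atomic_int32_t *)(addr),                  \
                       (int32_t *)(expected), (int32_t)(desired))      \
         : demo_cas_64((demo_atomic_int64_t *)(addr),                  \
                       (int64_t *)(expected), (int64_t)(desired)))

#endif

/* Atomically add delta to *counter with a CAS retry loop and return the
 * new value. On a failed CAS, 'seen' is refreshed with the current value. */
static inline int64_t demo_add_64(demo_atomic_int64_t *counter, int64_t delta)
{
    int64_t seen = *counter;
    while (!demo_cas(counter, &seen, seen + delta)) {
        /* retry with the refreshed 'seen' */
    }
    return seen + delta;
}
```

Callers would declare variables with the `demo_atomic_*` typedefs and go through `demo_cas` only. Because the name of the type is fixed in one place, the `_Generic` match stays exact regardless of which branch configure picked.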
- Patcher Overwrite - PR 2941
- Allinea - Debugger attach on v2.x - Issue 3660
- Request they retest with new version - Issue 3384
- Need to see if Attributes are MT - IBM will see if we have any tests to audit.
- Asked, need to get answer back from them.
- Jan / Feb
- Possible locations: San Jose, Portland, Albuquerque, Dallas
- Discuss What to do for partner's broken CI pieces?
- Big section of going through old issues and old PRs.
- Mellanox, Sandia, Intel
- LANL, Houston, IBM, Fujitsu
- Amazon,
- Cisco, ORNL, UTK, NVIDIA