WeeklyTelcon_20160830
- Dialup Info: (Do not post to public mailing list or public wiki)
- Geoff Paulsen
- Jeff Squyres
- Brad Benton
- Edgar Gabriel
- Geoffroy Vallee
- Howard
- Josh Hursey
- Joshua Ladd
- Nathan Hjelm
- Ralph
- Ryan Grant
- Sylvain Jeaugey
- Milestones
- 1.10.4
- Only potential blocker is issue with wrapper compiler.
- mpifort is not libpath-ing rpath lib
- when you do C builds, add rpath to all dependent libs during build.
- static builds on 1.10
-
- No PRs Left.
- OSHMEM tests are not DOA, but are still failing 20% of tests.
- It's better, so might just ship 2.0.1 as is, and work on fixing these in the next release.
- Still thinking that we should get 2.0.1 out, and get a fix in for v2.1.0 (short cycle for that).
- Comm Spawn is still Broken. - timeout in OPAL_PMIX_Exchange macro. Fixed in master?
- Bug in CID creation in PMIx? Fix got rolled into ___, so it has to be backported.
- When refactored code to make all CID Allreduces to be non-blocking, this got hit.
- Symptom is a race condition where keys are becoming the same.
- Only an issue in dynamics, normal comm creation uses MPI traffic, not PMIx.
- Was this broken in 2.0.0? Ralph would be surprised, because he watches for this, but isn't certain.
- Nathan could look at the code and compare new / old to see if he can remember, or get loop_spawn to work again on Cray.
- If this is a regression, we shouldn't release.
-
Assuming we'll ship soon, go refactor your PRs from 2.1.0
- Will start merging 2.0.2 PRs in, and then close ompi-release, and then merge the two repos in ompi repo.
-
Timeline for 2.1.0 is very short, because we wanted a small number of fairly low- to medium-risk items that we can get done by end of October. Probably looking at a freeze at the end of September. Shooting for mid-October.
-
Don't yet have a plan for 2.0.2
- Going to make a new fork? What do we call that new fork? Is it 2.2 or 3.0? Depends on the backwards compatibility story.
-
coll_sync - slated for 2.0.2, classified as bugfix, but don't dump in at last second before 2.1.0
-
Mellanox needs PMIx 2.0 in 2.1.0
- PMIx will release a 2.0 that just has shared memory data as an addition,
- but doesn't have everything else they were targeting for 2.0.0.
- This should come out Early September.
- This is the piece that Mellanox and IBM are interested in.
- Put items requested on the wiki (e.g., PMIx direct modex, OpenSHMEM, stability improvements)
- What do people want to see for 2.1.0?
- Finalize the list in Dallas meeting
- Hopefully target Sept./Oct. release, not Super Computing Goal.
Review Master MTT testing (https://mtt.open-mpi.org/)
- Master has a sea of red, due to OSHMEM issues.
- mpifort failing to link on 1.10 with static as well.
- MPI_Comm_spawn failures at Mellanox and maybe IBM. Failing on master a week ago, and now failing on v2.x
- Was working a week or two ago.
- Howard or Ralph will look at when they get some cycles.
- Sylvain might look at some PMIx commits also on v2.x and see if he can isolate.
- Ralph made a lot of progress there. Still need to get submission thing working.
- One
- Josh started moving MTT server to Amazon cloud server.
- No progress last week. Just need to test, and work with Jeff on DNS, and schedule a day to do the move.
-
Next steps for migration?
-
Jenkins and MTT are all that's left.
-
Got download numbers to Edgar, some interesting data he'll share (devel list?)
-
Non-profit stuff.
- Cisco is okay with.
- Quarterly opportunity to apply is coming up. We fill out a proposal, and they will accept or reject.
- We're on their agenda (end of September).
- Should get non-profit prices for GitHub dues (possibly reduced or $0). Unfortunately the bill is coming up soon, so Jeff will ask if we can just pay for a month or two, instead of a full year.
-
Contribution agreement. Now that we're on GitHub, we're getting more and more anonymous contributions.
- Some folks (who haven't yet signed the contributor's agreement) have some IPv6 fixes, and a kind of new feature.
- First patch is more bugfix, and restores IPv6 functionality.
- Second patch is more of a non-local feature. Bigger, more technical discussion needed.
- These are critical for Mellanox (in master). Need to be able to run on IPv6 only systems.
- If it's a one-off, just remind them to check with company and make sure it's okay, then do git signoff to make sure you understand it.
- Just put this into the contributors document. Modify this document to explain the process.
-
Other Open Source communities have a big list of things that contributors agree to when they git signoff on a commit. We could do something like that.
- The Agreement also protects the company that contributes.
- Changing the rules on contributors members.
- 2nd issue is that if it's a "big" change that we'd normally require a contributor agreement, members need to have their legal teams review the change in contributor agreements.
- Once Jeff writes up alternate language for contributor's agreement, then all members should get them reviewed by their legal departments.
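
The sign-off discussed above refers to the `Signed-off-by:` trailer that `git commit -s` appends to a commit message. As a rough illustration of how a maintainer might check for it, here is a minimal sketch. The helper names `find_signoffs` and `has_signoff` are hypothetical and are not part of any Open MPI tooling, and the pattern only covers the common `Name <email>` form.

```python
import re

# Trailer line that `git commit -s` appends to a commit message.
# Assumption: one trailer per line, in the usual "Name <email>" form.
SIGNOFF_RE = re.compile(
    r"^Signed-off-by: (?P<name>[^<>\n]+) <(?P<email>[^<>@\s]+@[^<>\s]+)>$",
    re.MULTILINE,
)


def find_signoffs(message):
    """Return (name, email) pairs for each Signed-off-by trailer found."""
    return [
        (m.group("name").strip(), m.group("email"))
        for m in SIGNOFF_RE.finditer(message)
    ]


def has_signoff(message):
    """True if the commit message carries at least one sign-off trailer."""
    return bool(find_signoffs(message))
```

A check like this only confirms that the trailer is present. It says nothing about whether the contributor has cleared the change with their employer, which is the part the notes above ask contributors to confirm.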
-
C89 -
- By removing the C99 check, he's defaulting back to GNU89, which isn't even a superset of C89.
- Gilles' approach is a bit better, but still not a good idea.
- when you have a bad GCC, can fix glibc version BLAH, these are the functions that failed to link.
- Patches are incomplete, because glibc on system was built with GCC without C89 compiler. It's not C89, it's GNU 89.
- inlining is different.
- If you can't use GNU 89, can add an attribute to functions to make things compile.
- Consensus to drop this, if submitter wants to answer questions asked on list, we'll consider it.
-
Date of another face to face. January or February? Think about, and discuss next week.
-
Non-Profit
- Ralph sent email out to list, please comment either pro/con.
- LANL, Houston, IBM
- Cisco, ORNL, UTK, NVIDIA
- Mellanox, Sandia, Intel