WeeklyTelcon_20200804

Open MPI Weekly Telecon

Dialup Info: (Do not post to public mailing list or public wiki)

Attendees (on Web-ex)

Did not capture attendance accurately -- this may not be fully correct. I put a "yes" next to the people I know were there today.

NOT-YET-UPDATED

Release Branches

Blockers All Open Blockers

Review v4.0.x Milestones v4.0.5

Still waiting on blocker (also v4.1): cache line stuff
- Why is this a correctness issue (not just a performance optimization)?
  - We align the data in shared memory on cache line boundaries.
  - Local rank 0 sets up the ring with entries starting every 128 bytes.
  - The other processes then find out the real cache line size, which is 64.
  - Those processes then attach to the shared memory and use a cache line size/alignment of 64.
  - The first message gets sent, but the 2nd message is never received (and/or the receiver reads corrupt data because it is reading at offset 64 instead of 128).
- How is this not happening anywhere else?
  - Previously, the cache line size was set up very, very late (after all the shmem setup was done -- even for the non-local-rank-0 processes). I.e., we got lucky.
  - I.e., we brought the hwloc initialization forward at some point and broke this.
  - This only happens in smcuda BTL (and possibly only in single-node runs, because other BTLs/PMLs may have been selected).
  - The plain sm and vader BTLs do this differently.
  - Meaning: this is a very specific corner case.
- Solutions?
  - Trivial fix: just have everyone use a fixed value (e.g., 128 or 64).
  - Pretty simple: modex-send the size to be used from local rank 0 to the others. The others modex recv the value and use it.
  - A little more complicated: also add code to smcuda to read the cache line size from Linux /proc, /sys, or similar.
- There's a PR for master that does the fix -- but in a way that will kill scalability.
  - Once Brian's configury fixes are in, this is easy to fix on master.
  - Or it could be done the "A little more complicated" way, above. Neither approach is difficult.
- For 4.0 and 4.1: George will make a one-liner patch to make everyone use a fixed value.
  - This clears the blocker.
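To illustrate the cache line mismatch discussed above, here is a minimal sketch in C. It is not taken from the smcuda source: the function names, the ring layout (slot 0 at offset 0, each slot padded to one alignment unit), and the 32-byte slot size used in the note below are all assumptions made for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Round n up to the next multiple of align (align must be a power of two). */
size_t align_up(size_t n, size_t align)
{
    assert(align != 0 && (align & (align - 1)) == 0);
    return (n + align - 1) & ~(align - 1);
}

/* Byte offset of ring slot `index` when every slot is padded to `cache_line`.
 * Hypothetical layout: slot 0 starts at offset 0. */
size_t slot_offset(size_t index, size_t slot_bytes, size_t cache_line)
{
    return index * align_up(slot_bytes, cache_line);
}

/* Returns 1 if a writer and a reader compute the same offsets for the
 * first `nslots` slots, 0 otherwise. */
int layouts_agree(size_t nslots, size_t slot_bytes,
                  size_t writer_line, size_t reader_line)
{
    for (size_t i = 0; i < nslots; i++) {
        if (slot_offset(i, slot_bytes, writer_line) !=
            slot_offset(i, slot_bytes, reader_line)) {
            return 0;
        }
    }
    return 1;
}
```

With an assumed 32-byte slot, `slot_offset(0, 32, 128)` and `slot_offset(0, 32, 64)` are both 0, which is why the first message can get through. For slot 1 the writer computes 128 and the reader computes 64, so the reader looks in the wrong place. If every process uses the same fixed value (the one-liner approach for 4.0 and 4.1), `layouts_agree` returns 1.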
https://github.com/open-mpi/ompi/issues/7968: added a note to the README for v4.0: there is a known issue when using UCX with very, very old IB hardware (pre-ConnectX) -- it will segv. According to Mellanox, UCX 1.10 will fix this issue.

Review v4.1.x Milestones v4.1.0

Same cache line blocker as v4.0.
https://github.com/open-mpi/ompi/issues/7982: OFI BTL and FI_DELIVERY_COMPLETE. This only matters for MPI one-sided.
- EFA and other providers are misbehaving
- https://github.com/open-mpi/ompi/pull/7973: PR for fix: Disable EFA provider
  - ...but then later discovered that other providers also misbehave in the same way.
- AWS proposal: extend #7973 to exclude other providers that misbehave.
- Meaning: if you're using libfabric over verbs, the OFI BTL won't be used.
  - In v4.0.x, there is no OFI BTL. So this is not an issue.
  - In v4.1 this is a minor inconvenience because we still have osc/pt2pt. I.e., OMPI will automatically fall back to osc/pt2pt.
  - This is unfortunately a big problem for master/v5.0. Need to figure this out -- i.e., coordinate with libfabric community.
  - NOTE: This is a different code path than the MPI-one-sided problem Cisco MTT discovered when we removed osc/rdma (and all MPI_WIN_CREATE operations failed).
    - Looks like Cisco MTT is still failing one-sided tests -- need to follow up with Nathan.
- Howard asks: how can I see this problem?
  - Anything with MPI_PUT. E.g., IBM one-sided tests.
ADAPT / HAN.
- Need to test and produce some documentation for ADAPT and HAN.

Review v5.0.0 Milestones v5.0.0

No update this week other than master discussion.

Master

osc/pt2pt removal on master
- George: There are many machines where osc/pt2pt is the only mechanism, and it was the most performant.
- Brian: osc/pt2pt wasn't removed because it wasn't needed; it was removed because it's very buggy (including having no good path to becoming multi-thread safe), "unrecoverably broken" (Brian's words! And he wrote it!), and no one will take ownership of fixing it.
- ...so if someone wants to take ownership of fixing it, they can!
Ralph points out:
- AWS MTT builds for SLURM, need to fix up the compiles for external hwloc/libevent. Brian+William will talk internally.
- Java: builds failing from Aurelien PR. He'll have a look.

Annual review of OMPI committers

It's after July, so Jeff will go de-activate people.
- Brian will go do it today.

Virtual meeting next week

Agenda items for next week.
- Talk through MPI-4 features. Howard will make a list of big-ticket MPI-4 features (from MPI-4 changelog).
  - Sessions
  - Default error handler
  - ...etc.
- Walk through PRRTE issues.
  - Figure out: which are blockers for v5.0? (etc.)
- With these two, we're good enough for Monday's meeting.
  - Please add any other items to the wiki.
  - We'll evaluate if we still need Tuesday's meeting.
