
WeeklyTelcon_20190115

Geoffrey Paulsen edited this page Mar 12, 2019 · 2 revisions

Open MPI Weekly Telecon


  • Dialup Info: (Do not post to public mailing list or public wiki)

Attendees (on Web-ex)

  • Geoff Paulsen
  • Jeff Squyres
  • Brian Barrett
  • Josh Hursey
  • Ralph Castain
  • Xin Zhao
  • <oops, missed recording a few others on the call this week>

not there today (I keep this for easy cut-n-paste for future notes)

  • Todd Kordenbrock
  • Edgar Gabriel
  • Howard Pritchard
  • Joshua Ladd
  • Aravind Gopalakrishnan (Intel)
  • Nathan Hjelm
  • Dan Topa (LANL)
  • Thomas Naughton
  • Matias Cabral
  • Akshay Venkatesh (nVidia)
  • David Bernholdt
  • Geoffroy Vallee
  • Matthew Dosanjh
  • Arm (UTK)
  • George
  • Peter Gottesman (Cisco)
  • mohan

Agenda/New Business

  • Sessions work is complete (Nathan and Howard worked on it)

    • or check archives for MPI sessions working group.
    • works with MPI_Init.
    • Involved a lot of cleanup for setup and shutdown.
    • Can keep it as prototype, or put it in, without headers.
    • For MPI_Init/MPI_Finalize only apps, fully backward compatible.
      • Initialize a "default" Session.
    • Asking about adding this to master in mid-January
    • Part of cleanup is to have reverse setup and shutdown.
    • Cleanup sounds good. Well contained. Set of patches.
      • Calling it "instances" inside of MPI, but we'll be renaming it if/when MPI standardizes sessions.
    • Summary: for the cleanup patches, let's do them and review them.
      • The underlying work for sessions needs a closer look.
      • We can discuss sessions bindings in the future.
    • Session init is all local, so timing should still be good.
  • github suggestion on email filtering
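The "reverse setup and shutdown" cleanup mentioned in the sessions notes can be pictured as a stack of teardown callbacks: each setup step registers its matching teardown, and shutdown runs them last-in, first-out. The following is only a minimal sketch of that general pattern. Every name in it (`demo_register_cleanup`, `demo_run_cleanups`, the `demo_fini_*` callbacks) is hypothetical and is not taken from the Open MPI sources or from the patches discussed on the call.

```c
#include <stddef.h>

/* Hypothetical illustration only: none of these names are Open MPI APIs. */
#define DEMO_MAX_CLEANUPS 16

typedef void (*demo_cleanup_fn)(void);

static demo_cleanup_fn demo_cleanups[DEMO_MAX_CLEANUPS];
static size_t demo_cleanup_count = 0;

/* Record a teardown step at the moment its matching setup step succeeds. */
static int demo_register_cleanup(demo_cleanup_fn fn)
{
    if (fn == NULL || demo_cleanup_count == DEMO_MAX_CLEANUPS) {
        return -1;
    }
    demo_cleanups[demo_cleanup_count++] = fn;
    return 0;
}

/* Run the teardown steps in the reverse of their registration order.
 * Returns how many steps were run; the stack is empty afterwards. */
static size_t demo_run_cleanups(void)
{
    size_t ran = 0;
    while (demo_cleanup_count > 0) {
        demo_cleanups[--demo_cleanup_count]();
        ran++;
    }
    return ran;
}

/* A small trace buffer so the teardown order can be observed. */
static char demo_trace[8];
static size_t demo_trace_len = 0;

static void demo_note(char c)
{
    if (demo_trace_len < sizeof(demo_trace) - 1) {
        demo_trace[demo_trace_len++] = c;
        demo_trace[demo_trace_len] = '\0';
    }
}

/* Stand-ins for the teardown of three subsystems set up in order a, b, c. */
static void demo_fini_a(void) { demo_note('a'); }
static void demo_fini_b(void) { demo_note('b'); }
static void demo_fini_c(void) { demo_note('c'); }
```

Registering `demo_fini_a`, `demo_fini_b`, and `demo_fini_c` in that order and then calling `demo_run_cleanups()` tears the stand-in subsystems down in the order c, b, a. Because the stack is empty afterwards, a later init/shutdown cycle can start clean.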

Minutes

Review v2.1.6

  • Released v2.1.6 last week.
  • Will remove v2.1.6 from weekly discussions.

Review v3.0.x Milestones v3.0.3

  • Scheduled 3.1.4 May of 2019? Probably earlier
  • Brian will put out an RC on Friday
  • PR6097 - conflict needs resolving in 3.?.x
    • made cherry-pick easy in 3.1.x, but HARD in v3.0.x.
    • Josh H will take a look.
  • Should resolve https://github.com/open-mpi/ompi/issues/6198 before releasing

Review v3.1.x Milestones v3.1.0

  • Brian will put out an RC on Friday
  • Scheduled 3.1.4 April of 2019? Possibly earlier
  • PR6097 - conflict needs resolving in 3.?.x
    • made cherry-pick easy in 3.1.x
  • Should resolve https://github.com/open-mpi/ompi/issues/6198 before releasing

Review v4.0.x Milestones v4.0.1

  • Schedule: Need a quick turn around for a v4.0.1
  • v4.0.0
  • https://github.com/open-mpi/ompi/issues/6278
    • Removed symbols and the nice message: on master and v4.0.x, using a removed symbol does not give a compile-time error. What do we want?
      • Do we want a compile-time error, or just a removed symbol and a linker error?
      • Could add a check for C11 and use a 'static assert' to produce a nice message.
      • For older compilers, could just NOT declare the function.
        • But that doesn't work for v4.0.x: the symbols will still be in the library, so the compiler will only issue a warning about the missing prototype, and the program will compile and link successfully.
        • It was decided that this is okay, provided the C11 static assert check is in mpi.h. Most users treat 'no prototype' as an error.
    • Tests on v4.0.x started passing, but these are possibly false positives. We will look at how the IBM tests are passing with issue #6278 on master and v4.0.x.
  • Should resolve https://github.com/open-mpi/ompi/issues/6198 before releasing
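The C11 'static assert' idea discussed for issue 6278 can be sketched roughly as follows. This is an illustration under stated assumptions, not the contents of mpi.h: the names `demo_address`, `demo_get_address`, `demo_aint`, and `DEMO_REMOVED` are hypothetical stand-ins for a removed function, its replacement, an address type, and a helper macro. Under C11 or later, any use of the removed name expands to a `_Static_assert` that always fails with a readable message. On older compilers the name is simply left undeclared, which gives only a 'no prototype' warning unless the user promotes that warning to an error.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins; this is not the real mpi.h. */
typedef ptrdiff_t demo_aint;

/* The replacement API, which stays declared and usable. */
static int demo_get_address(const void *location, demo_aint *address)
{
    *address = (demo_aint)(intptr_t)location;
    return 0;
}

#if defined(__STDC_VERSION__) && __STDC_VERSION__ >= 201112L
/* C11 or later: any use of the removed name becomes a compile-time
 * error that tells the user what to call instead. */
#define DEMO_REMOVED(func, newfunc) \
    _Static_assert(0, #func " was removed; use " #newfunc " instead")
#define demo_address(location, address) \
    DEMO_REMOVED(demo_address, demo_get_address)
#else
/* Older compilers: demo_address is deliberately not declared here.
 * A call then produces a missing-prototype diagnostic, which is only
 * an error if the user has asked the compiler to treat it as one. */
#endif

/* Code written against the replacement API compiles either way. */
static int demo_offset_between(const void *base, const void *member,
                               demo_aint *offset)
{
    demo_aint base_addr;
    demo_aint member_addr;

    if (demo_get_address(base, &base_addr) != 0 ||
        demo_get_address(member, &member_addr) != 0) {
        return -1;
    }
    *offset = member_addr - base_addr;
    return 0;
}
```

The macro body is only evaluated where the removed name is actually used, so code that never calls `demo_address` is unaffected. Adding a call such as `demo_address(buf, &addr);` inside a function should fail to compile under a C11 compiler, with the message text built from the two macro arguments.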

Master

  • Issue 6242 -
  • Issue 6228 - Open MPI v4.0.2 would like PMIx 3.1.0 (still unreleased)
  • PR 6191 - Aravind - asked Brian and Howard to take a look.
  • Opal Issue - One version embedded in Open MPI, and another in PRTE.
    • How do we manage that overlapping code?
    • similar to libevent, and hwloc (prte, pmix, and ompi)
    • This already affects us: if you want an external PMIx, you also have to use an external libevent and hwloc.
    • We have a decision in near future about libopal. Used by other packages, need to figure out a way out of this.
    • Brian is writing a doc on an approach
    • Some discussion.
  • Libtool issue came up before or during supercomputing.
    • Went around with many options - Ultimately will need to version all .so's
      • need to explicitly version on each release branch going forward.
      • WON'T make the opals on the various release branches compatible with each other.
  • Amazon AWS / Jenkins is still crippled
    • Jenkins broke the EC2 plugin. There is a fix for the EC2 plugin, but it has not been released yet.
    • Scope of how this affects Open MPI Projects:
      • release build process is broken
      • only about 10% of CI tests right now.
    • Status: we're currently stuck waiting on this EC2 fix.

PMIx

  • Ralph worked a lot on PMIx Tools interface, and documenting it for standard.
    • Ralph should have 3 new chapters of PMIx v4 standard document in a few weeks.
    • Ralph will send email to PMIx announce list.
    • PMIx group, PMIx tools, and PMIx fabric
  • Will release a version of PMIx v3.1.0 in next week or two for Open MPI v4.0.x

MTT

  • Cisco showing build failure.
  • IBM test configure should have caused that.
  • Cisco has a one-sided info check that failed a hundred times.
    • Cisco install fail looks like a legit compile fail (ipv6 master)

New topics

  • March 4th is next MPI Forum (then June)
  • Target mid-late April Face to Face.
    • Pick a date first, and then figure out where.
    • Jeff will send out Doodle.
  • We have a new open-mpi SLACK channel for Open MPI developers.
    • Not for users, just developers...
    • Email Jeff if you're interested in being added.

Review Master Pull Requests

  • didn't discuss today.

Oldest PR

Oldest Issue


Status Updates:

Status Update Rotation

  1. Mellanox, Sandia, Intel
  2. LANL, Houston, IBM, Fujitsu
  3. Amazon,
  4. Cisco, ORNL, UTK, NVIDIA

Back to 2018 WeeklyTelcon-2018
