Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

hwloc v2.1.0: use a git submodule #6821

Merged

Conversation

jsquyres
Copy link
Member

@jsquyres jsquyres commented Jul 15, 2019

Update

This PR changed a little since it was first created. Short version:

  • Rename hwloc210 component to hwloc2
  • Make a submodule that now points to the hwloc-2.1.0 tag in the hwloc repo

Original

Remove the entire opal/mca/hwloc/hwloc201/hwloc directory, which was
just an expended hwloc v2.0.1 tarball. Replace it with a git
submodule pointing to https://github.com/open-mpi/hwloc.git at the
hash 4fe5f41804eaf2bcd84e7ac98dc82f6e3df04b86, which is equivalent to
the hwloc-2.0.1 tag.

Signed-off-by: Jeff Squyres jsquyres@cisco.com

This is a first foray into git submodules: just replace opal/mca/hwloc/hwloc201/hwloc (which is just an expanded hwloc v2.0.1 tarball) with a git submodule pointing to the hwloc github repo, pointing at the hash that is the hwloc-2.0.1 tag. There will be no auto-updating this submodule: it's just a way for a) us to stop physically embedding the hwloc source code in our repository, and b) to learn a bit about git submodules before we start using them more extensively elsewhere in the Open MPI code base.

This PR will almost certainly fail CI until Jenkins learns how to git clone --recursive ....

@bgoglin
Copy link
Contributor

bgoglin commented Jul 15, 2019

How do you tell OMPI's make dist to include hwloc's stuff?

@jsquyres jsquyres force-pushed the pr/make-hwloc201-tarball-a-submodule branch from 6b9d22a to b814ff7 Compare July 15, 2019 21:19
@jsquyres
Copy link
Member Author

@bgoglin When you git clone --recursive ... the OMPI repo, you'll get a full copy of the hwloc repo at 4fe5f41804eaf2bcd84e7ac98dc82f6e3df04b86, too. That content will just no longer be physically stored in OMPI's git repo. So things like ./autogen.pl, ./configure, make all, and make dist work fine (although I did just fix up a commit on this PR for make dist 😬).

All the CI's are currently failing because they're not doing git clone --recursive ....

@rhc54
Copy link
Contributor

rhc54 commented Jul 15, 2019

@jsquyres No reason to treat libevent any differently as they are also faithfully tagging their releases. I'd suggest just pointing to their latest stable release tag: https://github.com/libevent/libevent/releases/tag/release-2.1.10-stable

@hjelmn
Copy link
Member

hjelmn commented Jul 16, 2019

Oh the humanity! Are we sure this is a good idea after all the discussion about the evils of git submodules?

@jsquyres
Copy link
Member Author

@hjelmn Yes 😄. We've been talking about this for a while now; this is the first step towards separating out:

  • hwloc
  • libevent
  • pmix
  • OPAL

The latter two will have a bot that updates the git submodule hash in Open MPI whenever commits are made downstream.

@jsquyres
Copy link
Member Author

@rhc54 Re libevent: possibly, ya. Do we have any local changes to our libevent tree?

I wouldn't mind getting used to git submodules with just hwloc first. There are changes to git commands that our development community will need to get used to, for example. We might be able to put some detection in autogen.pl -- e.g., if you don't git clone --recursive ..., your opal/mca/hwloc/hwloc201/hwloc tree will be non-existent (or empty? I forget which -- but it'll be something detectable).

@rhc54
Copy link
Contributor

rhc54 commented Jul 16, 2019

if you don't git clone --recursive ..., your opal/mca/hwloc/hwloc201/hwloc tree will be non-existent (or empty?

It is empty, so we can detect it easily enough by looking for a signature file in the top level of the hwloc tree.

Do we have any local changes to our libevent tree?

Yeah, I believe @abouteiller recently added one to prevent an infinite loop under an unusual race condition. I asked that he push it upstream so it may be something we can pickup from there.

However, this may be indicative of an issue for all the submodules - we may develop updates that take some time to circle back to the submodule. What do we do with those changes in the meantime?

@jsquyres
Copy link
Member Author

What we talked about yesterday on the RM call was the possibility of having a github.com:open-mpi/libevent-mirror, where we can have our own copies of libevent. It would likely be exactly for this case: basically a mirror of the real libevent git repo, but with any local fixes that we want applied to it.

For hwloc, the hope is that we don't need to do this, because we have a good/close relationship with the primary developer (i.e., history has shown that @bgoglin has been very helpful/receptive to taking any fixes that we need into the main hwloc repo/release branches).

@jjhursey
Copy link
Member

bot:ibm:gnu:retest

@jjhursey
Copy link
Member

I unexpectedly had a couple of minutes free this morning so tried this with the IBM:GNU CI.

The Jenkins Git Plugin 2.0 has an Additional Behaviours option for Advanced sub-modules behaviours that lets you add the necessary option.

So in your Jenkins project Config under Source Code Management there is Git and under that is Additional Behaviours. Select the drop down option for Advanced sub-modules behaviours then check the box next to Recursively update submodules.

That will cause it to search in the subtree for submodules. You should see output like the following:

08:44:55  > git checkout -f 75d9bb76fba99dbb6a8a720f4aa31d5056e6864c # timeout=60
08:45:05 Commit message: "Merge b814ff79c9be707ac36848d865c24fcfea985458 into 06c6325bc8ae7cb1cd286768da8c8ebd9b44469f"
08:45:05 First time build. Skipping changelog.
08:45:05  > git remote # timeout=10
08:45:05  > git submodule init # timeout=10
08:45:05  > git submodule sync # timeout=10
08:45:05  > git config --get remote.origin.url # timeout=10
08:45:05  > git submodule init # timeout=10
08:45:05  > git config -f .gitmodules --get-regexp ^submodule\.(.+)\.url # timeout=10
08:45:05  > git config --get submodule.opal/mca/hwloc/hwloc201/hwloc.url # timeout=10
08:45:05  > git config -f .gitmodules --get submodule.opal/mca/hwloc/hwloc201/hwloc.path # timeout=10
08:45:05  > git submodule update --init --recursive opal/mca/hwloc/hwloc201/hwloc

@jjhursey
Copy link
Member

bot:ibm:xl:retest

@hjelmn
Copy link
Member

hjelmn commented Jul 16, 2019

@jsquyres Ok, but we need to carefully document how submodules are used and how they affect OMPI development.

As for OPAL. If we are indeed finally splitting that out I will work on the opal library context stuff. It made no sense while opal was completely internal. Will probably be an interesting topic for a face-to-face meeting as we will likely have to change some MCA semantics and bump the MCA major version.

@gpaulsen
Copy link
Member

I see that the url = https://github.com/open-mpi/hwloc.git in .gitmodules uses "https". This is good. Github actually recommends using "https" url for read-only access to submodules.

@hppritcha
Copy link
Member

bot:ompi:retest

@rhc54
Copy link
Contributor

rhc54 commented Aug 27, 2019

bot:asw:retest

@rhc54
Copy link
Contributor

rhc54 commented Aug 27, 2019

bot:ompi:retest

@hppritcha
Copy link
Member

ompi:bot:retest

to see what's happening here.

@hppritcha
Copy link
Member

I'll take a shot at seeing what's going on with jenkins and this PR.

@bwbarrett
Copy link
Member

bot:ompi:retest

@open-mpi open-mpi deleted a comment from ibm-ompi Dec 4, 2019
@open-mpi open-mpi deleted a comment from ibm-ompi Dec 4, 2019
@open-mpi open-mpi deleted a comment from ibm-ompi Dec 4, 2019
@open-mpi open-mpi deleted a comment from ibm-ompi Dec 4, 2019
@bwbarrett
Copy link
Member

bot:retest

Ok, it looks like the community and IBM tests pass. One last time for MLNX, since the results expired out.

@artemry-nv
Copy link

bot:mellanox:retest

@artemry-nv
Copy link

Mellanox OpenMPI CI configs have been updated to use git submodules.

@open-mpi open-mpi deleted a comment from ibm-ompi Dec 24, 2019
@open-mpi open-mpi deleted a comment from ibm-ompi Dec 24, 2019
@jsquyres
Copy link
Member Author

bot:ompi:retest

Signed-off-by: Jeff Squyres <jsquyres@cisco.com>
@jsquyres jsquyres force-pushed the pr/make-hwloc201-tarball-a-submodule branch from c82c39d to 76d801e Compare December 24, 2019 18:55
@jsquyres
Copy link
Member Author

I've updated this PR and some surrounding periphery:

  1. Renamed the hwloc201 component to hwloc2 (since it can now be used to represent any hwloc v2.x.y, depending on where we point the submodule)
  2. Initially replaced the hwloc subdir with a submodule pointing to the hwloc repo, at the tag hwloc-2.0.1
  3. Just to prove that it was possible, added a commit which updated the submodule to point to hwloc-2.0.4 (i.e., the current release in the hwloc v2.0.x series)
  4. Added one more commit to update the submodule to point to hwloc-2.1.0 (i.e., the most recently release)

Also:

@open-mpi open-mpi deleted a comment from ibm-ompi Dec 24, 2019
@open-mpi open-mpi deleted a comment from ibm-ompi Dec 24, 2019
@open-mpi open-mpi deleted a comment from ibm-ompi Dec 24, 2019
@open-mpi open-mpi deleted a comment from ibm-ompi Dec 24, 2019
@jsquyres jsquyres force-pushed the pr/make-hwloc201-tarball-a-submodule branch from 76d801e to 99c78b0 Compare December 24, 2019 19:34
@jsquyres jsquyres changed the title hwloc201: use a git submodule hwloc v2.1.0: use a git submodule Dec 24, 2019
Rename the component to be "hwloc2" (since it can now be any v2.x.y
version of hwloc), and make the embedded copy of hwloc be a git
submodule.

Signed-off-by: Jeff Squyres <jsquyres@cisco.com>
Signed-off-by: Jeff Squyres <jsquyres@cisco.com>
Signed-off-by: Jeff Squyres <jsquyres@cisco.com>
@jsquyres jsquyres force-pushed the pr/make-hwloc201-tarball-a-submodule branch from 99c78b0 to a2a9a95 Compare December 25, 2019 00:01
@open-mpi open-mpi deleted a comment from ibm-ompi Dec 25, 2019
@open-mpi open-mpi deleted a comment from ibm-ompi Dec 25, 2019
@jsquyres
Copy link
Member Author

jsquyres commented Jan 2, 2020

bot:ibm:retest

@open-mpi open-mpi deleted a comment from ibm-ompi Jan 2, 2020
@open-mpi open-mpi deleted a comment from ibm-ompi Jan 2, 2020
@bwbarrett bwbarrett self-requested a review January 7, 2020 16:23
@jsquyres
Copy link
Member Author

jsquyres commented Jan 7, 2020

Per discussions 6/7 Jan 2020, @bwbarrett will take responsibility for merging this PR, because the introduction of a git submodule will almost certainly break something in the nightly tarball build process.

@bwbarrett will merge this PR when he has a little time to also fix the nightly tarball build process.

@bwbarrett bwbarrett merged commit f853971 into open-mpi:master Jan 8, 2020
@jsquyres jsquyres deleted the pr/make-hwloc201-tarball-a-submodule branch January 13, 2020 18:23
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

9 participants