
Depend on conda-forge channel by default. #1388

Closed
wants to merge 1 commit

Conversation

@johanneskoester (Contributor)

  • I have read the guidelines above.
  • This PR adds a new recipe.
  • This PR updates an existing recipe.
  • This PR does something else (explain below).

This PR enables the conda-forge channel. This will remove some of the workload for maintaining general purpose packages for us. We can still commit general purpose recipes to bioconda, but we should aim to move them to conda-forge at some point. Also, we should start an effort to remove duplication between the two channels.
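For context, channel priority is what makes this workable: conda consults channels in order, so bioconda recipes can still take precedence for anything packaged there. A minimal sketch of the resulting user-facing configuration follows; the exact ordering shown is an illustrative assumption, not something specified by this PR:

```yaml
# Hypothetical ~/.condarc sketch: channels are searched top to bottom,
# so bioconda packages take precedence over conda-forge, which in turn
# takes precedence over Continuum's defaults channel.
channels:
  - bioconda
  - conda-forge
  - defaults
```

Equivalently, `conda config --add channels conda-forge` prepends the channel to an existing configuration.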

@bioconda/core please let me know if you are ok with merging this PR.

@johanneskoester mentioned this pull request Apr 25, 2016
@chapmanb (Member)

Johannes;
Awesome, I'm excited about integrating with conda-forge. I played with this some while trying to work on the variantbam recipe since conda-forge has the up to date icu package that it needs. Although that fixed icu, the build ended up breaking because of problems with the gcc pulled in via conda-forge. I get:

gcc install OS matches gcc build OS: Skipping post-link portability fixes.
/anaconda/envs/_build/gcc/libexec/gcc/x86_64-unknown-linux-gnu/4.8.5/cc1: error while loading shared libraries: libisl.so.10: cannot open shared object file: No such file or directory
Installation failed: gcc is not able to compile a simple 'Hello, World' program.
Error: post-link failed for: gcc-4.8.5-3

I'm not sure the root cause of this as it's beyond my level of gcc expertise but it might be worth debugging on some gcc requiring recipes prior to merging.

@tomkinsc (Member)

This is great. It always felt a bit wrong to commit general-purpose packages to bioconda rather than some other general-purpose channel. It's great that conda-forge can be that channel.

@bgruening (Member)

As already discussed, a strong 👍 from me.
xref: conda-forge/staged-recipes#299

I also expect some recipes to fail; do we have a plan for how to deal with them? Rebuilding everything would be one option, but last time I tried that, I found many tarballs missing, resulting in many false positives.

@daler (Member) commented Apr 25, 2016

General purpose on conda-forge sounds great to me, and I agree with Brad and Bjoern about prior testing and planning for failures.

@johanneskoester (Contributor, Author)

Right. One possibility would be to try a local rebuild of all packages in our Docker container.

@chapmanb that's a bit worrying. I actually thought that they had fixed a problem with the gcc in the default channel (see here), not vice versa. Are you sure that error occurs when enabling conda-forge?

@chapmanb (Member)

Johannes;
It only occurs when pulling in the conda-forge channel. Reading through that gcc thread, it is the same problem I saw, and it doesn't appear from the comments as if a fix has been put in place. They recommend not using gcc from conda and using the Docker container's gcc instead. I could do this for this particular problem, but if I read it correctly, including conda-forge will break any package that uses gcc, so it could cause issues until fixed.

@johanneskoester (Contributor, Author)

I see, thanks Brad for digging into this! In that case, we should definitely wait until the dust has settled.

@ostrokach (Contributor)

One thing to keep in mind is that conda-forge builds its Linux packages using a docker container based on CentOS 6.
Adding conda-forge as a dependency would mean we are no longer supporting CentOS 5. I would still be in favour of this...
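The glibc implication above can be checked directly: binaries built on CentOS 6 link against its glibc (2.12), so they will not load on systems with an older glibc. Below is a small, hedged sketch for inspecting the local runtime; the 2.12 threshold is an assumption based on CentOS 6, and `platform.libc_ver()` returns empty strings where glibc cannot be detected (e.g. macOS or musl systems).

```python
import platform

def centos6_builds_supported(min_glibc=(2, 12)):
    """Return True if the local glibc is new enough to run binaries
    built on CentOS 6 (glibc 2.12), False if it is too old, and None
    when the glibc version cannot be determined at all."""
    lib, version = platform.libc_ver()
    if lib != "glibc" or not version:
        return None  # non-glibc platform or detection failure
    # Compare only major.minor, e.g. "2.31" -> (2, 31)
    parts = tuple(int(p) for p in version.split(".")[:2])
    return parts >= min_glibc

print(centos6_builds_supported())
```

On a modern glibc-based Linux this prints `True`; elsewhere it prints `None` rather than guessing.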

@croth1 mentioned this pull request May 7, 2016
@jakirkham commented May 7, 2016

Just now seeing this, very exciting everyone. Looking forward to seeing some more new faces at conda-forge. 😄


A few comments on some of the points raised, to hopefully clarify things. Sorry it's a bit lengthy, but I hope this gives a clear picture.

The gcc package still comes from defaults; we don't build our own. Unfortunately, that package has some bad version constraints (in particular, isl is unpinned). This has been fixed in conda-recipes, but the package has not subsequently been rebuilt, which is why it caused us some issues. We tried to rebuild the gcc package with the correct version constraints, but it exceeded the CI time limits, so this was not possible to pursue. Instead, we have pinned this through our copy of cloog, which appears to resolve the issue. However, it is really not a proper fix. It is also a bit worrisome, as channels still have collisions and thus could pick up the defaults copy. We have something that seems to work on this front, but it is a bit hacky. Until conda is released with that fix ( conda/conda#2323 conda/conda#2369 ), we are in a potentially tricky situation with gcc. The good news is that a beta has been released, which we will be playing with, so hopefully a release with these fixes will be around to try in the near future. Additionally, we have largely switched to devtoolset-2 as our compiler, so we are not nearly as exposed to this issue as we once were. The long-term plan is to eliminate the gcc package entirely, which will make this a non-issue.

With regards to CentOS 6, we have been using it for some time and have not had people complaining about GLIBC compatibility. We will likely stick with it for the foreseeable future. At this point, there is a lot of inertia that must be overcome to switch to CentOS 5, so the reason to switch would need to be very pressing. As we are approaching 500 packages in conda-forge, it is no small task to rebuild them on CentOS 5 (or even to determine whether they need to be rebuilt). The other big reason we are unlikely to switch to CentOS 5 is that it is approaching its EOL in less than a year; it would be pretty annoying to switch to CentOS 5 and then switch back to CentOS 6. While there have been discussions about switching to CentOS 5 in the past, it is doubtful anything will change here.

TL;DR: the gcc package issue is basically settled, and CentOS 6 will remain for the foreseeable future.

@kyleabeauchamp (Contributor)

CentOS 6 is fine with me; I think there are some advantages to not going too far back into the past. CUDA, for example, requires CentOS 6, IIRC.

@bgruening (Member)

@jakirkham if you don't define dependencies in the GCC package, how do you handle run-time dependencies on libgcc? Or is depending on libgcc the recommended approach?

Thanks for your input!

@bgruening (Member)

@jakirkham to be more concrete: if I do not depend on libgcc for some recipes, I get an error similar to this:

mothur: error while loading shared libraries: libstdc++.so.6: cannot open shared object file: No such file or directory

And how do you recommend handling packages that need a more recent GCC, for example because of new language features?

@ostrokach (Contributor)

From my experience, libgcc should correspond to the most recent version of gcc used to build any of the installed packages. I have been using libgcc-5 for some time and all the conda packages still seem to work fine (one of my packages required gcc 5 to compile).

Maybe someone should be put in charge of building different versions of gcc and libgcc on their local machine and pushing them to the conda-forge channel? That way, people could include the required version as a dependency in each recipe.

@bgruening (Member)

@ostrokach this is my thinking as well. We should depend on explicit versions of GCC. Explicit is better than implicit - but there seems to be broad agreement to just use whatever is inside the build box.
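The explicit-pinning idea being discussed can be sketched in a recipe's meta.yaml. This is a hypothetical fragment, not an existing bioconda recipe; the package names and version bounds are illustrative only:

```yaml
# Hypothetical meta.yaml fragment: build with a known gcc and pin the
# runtime libgcc to at least the version used at build time.
requirements:
  build:
    - gcc 4.8.*        # compiler used to build the package
  run:
    - libgcc >=4.8     # runtime support libraries must not be older
                       # than the compiler that produced the binary
```

With such a pin, a package built with a newer gcc pulls in a matching or newer libgcc, which is what would make the backwards-compatibility assumption hold in practice.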

@jakirkham commented May 11, 2016

@bgruening, et al -- devtoolset-2 does some partial static linking with libstdc++ and friends. This means that final packages are only dependent on the system versions of these libraries. So, as long as one has the appropriate system libraries of that version or newer they will be fine. In our case, this means 4.4.7 or newer. In short, we don't need these libraries.

However, systems may not have libgomp or libgfortran installed by default. In practice, this affects docker containers. We could simply require people install these on their systems. Most clusters have them anyways and installing them on a docker container is trivial. Alternatively, someone could use patchelf or similar to clean this up afterwards and package these system libraries. Though, we have not gone either of these routes yet and have been using the packaged gcc here. We would like to switch away from that, but we need to think a bit about what is the right way to do this.

@johanneskoester (Contributor, Author)

Mhm, I still don't get the whole idea of moving away from a packaged gcc. I just don't see the advantage. It only seems to cause less flexibility and make the packages less self-contained. But I might be missing something; I am not a compiler expert.


@jakirkham commented May 14, 2016

Mhm, I still don't get the whole idea of moving away from a packaged gcc. I just don't see the advantage. It only seems to cause less flexibility and make the packages less self-contained. But I might miss something, I am not a compiler expert.

There are a few problems with the packaged gcc.

First, it isn't something we can rebuild on CI. This is actually pretty debilitating, because if we discover there is something wrong with it, we can't fix it. While the same could be argued about using devtoolset, I have yet to find a bug in devtoolset, whereas I know of several we have had to fix in the packaged gcc. Admittedly, many of these have been fixed and the fixes live in conda-recipes, but which fixes have been included in the actual package is difficult to track. It also means that we can't build a newer version of the compiler. So, the one thing that this should give us flexibility over, it simply fails to do.

Second, there is no guarantee about what glibc compatibility comes with a packaged gcc. This affects which Linux OSes it can be used on. It also gives us two tuning knobs (the OS gcc was built under, and the OS we build packages under) for the exact same parameter, when we would rather just have one. Further, there is no way to know how this first knob (where gcc was built) is set, as we are unable to build it ourselves. This in turn places constraints on the second knob (where packages are built), which we may or may not be happy with.

Finally, and this is the real kicker, if we package gcc and ship the libraries with the packages we build, we need to make sure that we are using something more recent than the end user's system (this was also referenced by @ostrokach above). While some people have argued this problem is not so great, I remain skeptical. For instance, switching to gcc 6 shows some very subtle breaks that require tweaks at the code level to account for. See this bug report for an example ( https://gcc.gnu.org/bugzilla/show_bug.cgi?id=69550 ). Note that Fedora has already started moving to gcc 6.x, so this problem will come up sooner rather than later. As a consequence, we will find ourselves needing to use the latest versions of compilers and libgcc to guarantee that we stay above the user's libgcc version, but this will open us up to building very ancient code in an environment where it may break and require non-trivial debugging and patches. This is something Ubuntu is experiencing. This doesn't feel like something we should wade into, given the sheer quantity of stuff we are trying to package.

In short, packaging gcc is not really any more flexible and actually makes us vulnerable to a different set of problems that simply don't occur when we use the system compiler or devtoolset.

@jakirkham

And how do you recommend handling packages that need a more recent GCC, for example because of new language features?

Sorry this one got lost, @bgruening. devtoolset-2 provides support for C++11. In practice, that seems to cover the code we want to package quite well.

If in the future we want to switch to gcc 5.x, there is devtoolset-4 on CentOS 6, which provides gcc 5.2.1. This is something we are open to considering.

@jakirkham

In any event, the choice of compiler at conda-forge has no effect on what bioconda does internally. In some cases, when code uses Fortran or OpenMP, we use the gcc package as we want to ship libgfortran and libgomp (this may change in the future). This doesn't seem to cause us any problems.

@jakirkham

Response to @bgruening in another comment.

PR against conda-forge would be great indeed.

👍

@jakirkham the problem for us is that we cannot depend on conda-forge until we have figured out the GCC issue. We would love to migrate a few repositories into the more general conda-forge repo. Can we prioritise it? A Google Hangout to discuss it further?

👍 We (conda-forge) have meetings every two weeks on Friday at 14:00 UTC. We are taking a brief hiatus due to conference season, but our next one is scheduled for 3 June. If that works for people, let's add it to the conda-forge hackpad as an item. If not, let's figure out another time when people can meet.

@jakirkham mentioned this pull request May 14, 2016
@johanneskoester (Contributor, Author)

@jakirkham 3 June sounds good. I think I would be able to attend.

I must say that I am still not convinced. Here are my answers to your points from above:

  1. I would really think that if Continuum provides a proper gcc package, there is no need to rebuild it in custom channels. The same should be done for libc (e.g. musl). My feeling is that relying on old compilers and an old libc causes various problems. For compilers, old devtoolsets can sometimes cause problems with new code (e.g. C++11). For libc, I am sure it would be advantageous to be able to depend on a new one from time to time. E.g., I recently encountered, during Snakemake development, that the conda Python version is not able to touch symlinks, although this is supported by the filesystem; the system Python can do it, because it has been built against a newer libc. My feeling is that Continuum should really put some manpower on fixing these major issues. Otherwise, it will cause us problems at some point.
  2. Well, gcc has been packaged by Continuum, right? So it will be their CentOS container. If we can't rely on that, we can't rely on any package from the default channel.
  3. If our gcc-built packages properly depend on libgcc, e.g. pinned to >= the version used for building, this problem is solved, because a package built with the latest gcc will pull in its minimum version. If, as you say, they are backwards compatible, there is no issue. Also, the system you install on (even if it uses a newer gcc) won't have an effect, right?

@jakirkham

@jakirkham 3 June sounds good. I think I would be able to attend.

Excellent. It is on the hackpad.

As for the rest, I will either write a response later or save it for the meeting.

@bgruening (Member)

@jakirkham

First, it isn't something we can rebuild on CI. This is actually pretty debilitating, because if we discover there is something wrong with it, we can't fix it. While the same could be argued about using devtoolset, I have yet to find a bug in devtoolset, whereas I know of several we have had to fix in the packaged gcc. Admittedly, many of these have been fixed and the fixes live in conda-recipes, but which fixes have been included in the actual package is difficult to track. It also means that we can't build a newer version of the compiler. So, the one thing that this should give us flexibility over, it simply fails to do.

I don't think this should prevent us from doing the right thing - whatever that is. I simply asked Travis to increase our max build time, and bioconda now has 75 min! I also asked for a build-time increase for conda-forge/staged-recipes, and you have it as well. If you can create a GCC feedstock, I'm happy to ask for an increased max build time for that repo as well.
Nevertheless, I think this package should be maintained by Continuum :)

@johanneskoester (Contributor, Author)

Björn, that's amazing! Thanks!!


@croth1 commented Sep 9, 2016

Did you make any progress discussing the remaining blockers for the integration of the conda-forge channel lately?

@johanneskoester (Contributor, Author)

Yes. We are about to use CentOS 6 (even their image) for building. The next step will be to activate conda-forge as a channel when building our packages. This will for sure cause some problems. We will see.

@bgruening bgruening closed this Sep 27, 2016
@bgruening bgruening deleted the depend-on-conda-forge branch September 27, 2016 12:28