Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Build executables in stack_snapshot and use them as tools #1306

Closed
aherrmann opened this issue Apr 20, 2020 · 3 comments · Fixed by #1305
Closed

Build executables in stack_snapshot and use them as tools #1306

aherrmann opened this issue Apr 20, 2020 · 3 comments · Fixed by #1305

Comments

@aherrmann
Copy link
Member

aherrmann commented Apr 20, 2020

Is your feature request related to a problem? Please describe.

Some of the packages defined by a call to stack_snapshot depend on executables (tools) that themselves are Hackage packages. Examples include libraries like language-c which depends on alex and happy, or grpc-haskell-core which depends on c2hs. Furthermore, some packages provide executables that the user of stack_snapshot will likely wish to invoke directly, e.g. doctest or ghcide.

Since #907 stack_snapshot provides the tools attribute which allows to pass executables that are built outside of stack_snapshot in to be available as build tools for the packages defined by stack_snapshot. This approach is very limited:

  1. Every package depends on the same set of tools, which means that changing any tools or adding additional tools requires a full rebuild of all packages.
  2. This does not scale well for tools that appear in the middle of the dependency graph. c2hs is such an example, it itself depends on Hackage packages like language-c, which in turn depends on alex and happy. I.e. to build a package that depends on c2hs, say haskell-grpc-core we need to provide alex and happy, then a stack_snapshot for c2hs dependencies, then build c2hs itself, and then finally a stack_snapshot for haskell-grpc-core and other packages. I.e. in a typical use-case we end up with external workspaces @alex, @happy, @c2hs_deps, @c2hs, and @stackage.
  3. This is very inconvenient for packages with many dependencies that provide executables. E.g. ghcide has many dependencies and provides a library and an executable. If we wish to build it with stack_snapshot we need to define a stack_snapshot exporting ghcide the library and all dependencies of ghcide the binary, and then define a separate external workspace for the ghcide binary where we import all these dependencies. This incurs a lot of boilerplate and we have to be careful to keep the versions/revisions in sync between stack_snapshot and the http_archive used for the haskell_cabal_binary call. Alternatively, we could build both the library and binary in one http_archive but then we have to manually perform the dependency resolution for ghcide.

Describe the solution you'd like

I'd like stack_snapshot to build both library and executable components of packages and integrate them into the dependency graph as appropriate. I.e. we shouldn't have to provide alex, happy, c2hs, ... to the tools attribute of stack_snapshot. And we shouldn't have to resort to intermediate stack_snapshot invocations as required for tools like c2hs.

As of now stack does not provide us any information about the components provided by a package. And we would prefer to avoid implementing a Cabal file parser in Starlark. For that reason I'd propose to have users define the components of the packages they expect to be built, with a predefined set for common packages like alex, happy, .... In the end this would look something like this

stack_snapshot(
    name = "stackage",
    packages = [
        "haskell-grpc-core",
        "ghcide",
    ]
    components = {
        "ghcide": ["lib", "exe"],  # Where `exe` is short for `exe:<package_name>`
        # The following are common and included in `rules_haskell`
        # "alex": ["exe"],
        # "happy": ["exe"],
        # "c2hs": ["exe"],
    },
)

Describe alternatives you've considered

  • Hazel used to parse Cabal files to automatically determine which components are available. However, this introduces a fair amount more complexity as seems necessary that we would prefer to avoid. In any case this only addresses the question of whether the components attribute should be user defined or inferred.
  • stack ls dependencies json could possibly be patched to include information on package components. However, it is unclear how much effort that would require. Also, parsing JSON into Starlark is not without issues either. As above this only addresses whether components should be user defined or inferred.

Additional context

@aherrmann
Copy link
Member Author

Addressing the comment in #1303 (comment)

Arguably binaries should go in a separate namespace, because their names could clash with a library name. I'd have,

@stackage//:doctest
@stackage-bin//:doctest

That would work around name clashes, however, I'm not sure it's worth the complexity. With libraries and binaries in separate workspaces we would have to fetch the Cabal sources twice, which seems wasteful, and would have to generate criss-cross dependencies between these two workspaces for cases like c2hs.

Which is easier to remember than any mangling of the binary name. Now, you could ask, what happens if two packages export a binary with the same name? That's a situation Cabal doesn't handle well, but we can do better. For some reason that I don't remember, we chose to have @stackage//:doctest instead of @stackage//doctest:doctest (I think this was discussed in the original ticket). However, if we switched to the latter, then we'd have a reasonable scheme for both binaries and packages that export multiple libraries (which Cabal now allows). We'd have e.g.

@stackage//doctest:doctest
@stackage//doctest/exe:doctest

Sure, that would be an option. It's worth pointing out that some of these targets would have to be aliases, if we don't want to duplicate the package sources multiple times. Currently stack_snapshot fetches the doctest sources into a directory doctest-0.16.3. This is fine, because at the moment the corresponding haskell_cabal_library is defined at the top-level and can take srcs = glob(["doctest-0.16.3/**"]). The package hierarchy matters because Cabal has to build in the directory containing the .cabal file. Just as one possible approach, with the layout proposed above we could put the sources into doctest/src, then define all targets in the doctest package, i.e. @stackage//doctest:doctest and @stackage//doctest:doctest_exe_doctest, but then define friendlier exe aliases @stackage//doctest/exe:doctest.

Doesn't look so nice. But how about,

@stackage//doctest:doctest
@stackage//doctest:another-public-library
@stackage-bin//doctest:doctest

As mentioned above, I'm not convinced that separating binaries and libraries into different workspaces doesn't introduce more problems than it solves.

@mboes
Copy link
Member

mboes commented May 27, 2020

That would work around name clashes, however, I'm not sure it's worth the complexity. With libraries and binaries in separate workspaces we would have to fetch the Cabal sources twice, which seems wasteful,

That's not the only way to do it. Assuming we have a way to determine which packages have executables (this is a problem either way, and contingent on commercialhaskell/stack#5275), then e.g. stackage-bin//:alex or stackage-bin//:hspec-discover can be aliases to internal targets not exposed to the user. These internal targets can have whatever name we like, and it no longer matters how horrible they look (like stackage//hspec-discover/exe:hspec-discover). The targets can even have names mangled so badly as to be unrecognizable. The user only needs to know the alias.

and would have to generate criss-cross dependencies between these two workspaces for cases like c2hs.

This is again not necessarily the case I believe, because stackage-bin//... can be aliases. But even if it were the case, this should be fine for Bazel. Repositories are only a way to partition source code, but the dependency graph is allowed to cut across all repositories arbitrarily.

Just as one possible approach, with the layout proposed above we could put the sources into doctest/src, then define all targets in the doctest package, i.e. @stackage//doctest:doctest and @stackage//doctest:doctest_exe_doctest, but then define friendlier exe aliases @stackage//doctest/exe:doctest.

... or exe aliases elsewhere. That looks good to me: mangled target names internally if that's what's necessary for technical reasons, and aliases to give short, memorable names to the user.

As mentioned above, I'm not convinced that separating binaries and libraries into different workspaces doesn't introduce more problems than it solves.

Fair enough. I think the problems that it introduces are essentially vacuous, because AFAICS these can be aliases. And the interface to the user looks quite a bit nicer to me. No mangling, which requires to learn and to remember the mangling scheme, and no unholy mix of / and : that are all too easy to swap around. You want a binary? Sure! In the common case it has a simple name and unambiguous name: @stackage-bin//hspec-discover. Only binaries in non-eponymous packages have longer names.

@aherrmann
Copy link
Member Author

That's not the only way to do it. Assuming we have a way to determine which packages have executables (this is a problem either way, and contingent on commercialhaskell/stack#5275), then e.g. stackage-bin//:alex or stackage-bin//:hspec-discover can be aliases to internal targets not exposed to the user.

Yes, with aliases we are free to expose libraries and binaries to the user under any name.

Just as one possible approach, with the layout proposed above we could put the sources into doctest/src, then define all targets in the doctest package, i.e. @stackage//doctest:doctest and @stackage//doctest:doctest_exe_doctest, but then define friendlier exe aliases @stackage//doctest/exe:doctest.

... or exe aliases elsewhere. That looks good to me: mangled target names internally if that's what's necessary for technical reasons, and aliases to give short, memorable names to the user.

Great, #1303 is in line with that approach.

As mentioned above, I'm not convinced that separating binaries and libraries into different workspaces doesn't introduce more problems than it solves.

Fair enough. I think the problems that it introduces are essentially vacuous, because AFAICS these can be aliases. And the interface to the user looks quite a bit nicer to me. No mangling, which requires to learn and to remember the mangling scheme, and no unholy mix of / and : that are all too easy to swap around. You want a binary? Sure! In the common case it has a simple name and unambiguous name: @stackage-bin//hspec-discover. Only binaries in non-eponymous packages have longer names.

Sounds good to me.

On a side-note, aliases are not without cost either. In particular, bazel query interacts poorly with aliases as it considers aliases targets of their own and last I checked there's no flag to make it resolve aliases during query. For example the vendored_packages aliases that we produce in stack_snapshot break hrepl. To avoid breaking hrepl more we should try to keep the haskell_cabal_library targets in their current place and only use aliases for executables.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants