
Add support for splitting linker invocation to a second execution of rustc #64191

Open
alexcrichton opened this issue Sep 5, 2019 · 55 comments
Labels
A-linkage Area: linking into static, shared libraries and binaries C-feature-request Category: A feature request, i.e: not implemented / a PR. I-compiletime Issue: Problems and improvements with respect to compile times. T-compiler Relevant to the compiler team, which will review and decide on the PR/issue.

Comments

@alexcrichton
Member

alexcrichton commented Sep 5, 2019

This issue is intended to track support for splitting a rustc invocation that ends up invoking a system linker (e.g. cdylib, proc-macro, bin, dylib, and even staticlib in the sense that everything is assembled) into two different rustc invocations. There are a number of reasons to do this, including:

  • This can improve pipelined compilation support. The initial pass of pipelined compilation explicitly did not pipeline linkable compilations because the linking step needs to wait for codegen of all previous steps. By literally splitting it out, build systems could then synchronize with previous codegen steps and only execute the link step once everything is finished.

  • This makes more artifacts cacheable with caching solutions like sccache. Anything involving the system linker cannot be cached by sccache because it pulls in too many system dependencies. The output of the first half of these linkable compilations, however, is effectively an rlib which can already be cached.

  • This can provide build systems which desire more control over the linker step with, well, more control over the linker step. We could presumably extend the second half here with more options eventually. This is a somewhat amorphous reason to do this; the previous two are the most compelling ones so far.

This is a relatively major feature of rustc, and as such this may even require an RFC. This issue is intended to get the conversation around this feature started and see if we can drum up support and/or more use cases. To give a bit of an idea about what I'm thinking, though, a strawman for this might be:

  1. Add two new flags to rustc, --only-link and --do-not-link.
  2. Cargo, for example, would first compile the bin crate type by passing the --do-not-link flag, passing all the flags it normally does today.
  3. Cargo, afterwards, would then execute rustc again, only this time passing the --only-link flag.

These two flags would indicate to rustc what's happening, notably:

  • --do-not-link indicates that rustc should be creating a linkable artifact, such as one of the ones mentioned above. This means that rustc should not actually perform the link phase of compilation; rather, it's skipped entirely. In lieu of this a temporary artifact is emitted in the output directory, such as *.rlink. Maybe this artifact is a folder of files? Unsure. (maybe it's just an rlib!)

  • The converse of --do-not-link, --only-link, is then passed to indicate that the compiler's normal phases should all be entirely skipped except for the link phase. Note that for performance it is crucial that this does not rely on incremental compilation, nor on queries, or anything like that. Instead the compiler forcibly skips all this work and goes straight to linking. Anything the compiler needs as input for linking should either be in command line flags (which are reparsed and guaranteed to be the same as the --do-not-link invocation) or be an output of the --do-not-link invocation. For example maybe the --do-not-link invocation emits a file that indicates where to find everything to link (or something like that).

The general gist is that --do-not-link says "prepare to emit the final crate type, like bin, but only do the crate-local stuff". This step can be pipelined, doesn't require upstream objects, and can be cached. This is also the longest step for most final compilations. The gist of --only-link is that its execution time is 99% the linker. The compiler should do the absolute minimal amount of work to figure out how to invoke the linker, invoke it, and then exit. To reiterate, this will not rely on incremental compilation because engaging all of the incremental infrastructure takes quite some time, and additionally the "inputs" to this phase are just object files, not source code.
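From a build system's perspective, the split described above might be driven roughly as follows. This is only an illustrative sketch: --do-not-link and --only-link are the strawman flag names from this issue (not real rustc flags), and split_invocations is a hypothetical helper.

```rust
// Hypothetical sketch of how a build system (e.g. Cargo) might drive the
// proposed two-phase compilation. `--do-not-link` and `--only-link` are the
// strawman flag names from this issue; they are not real rustc flags.

/// Given the flags normally passed for a linkable target, produce the two
/// proposed rustc invocations: one that stops before linking, and one that
/// does nothing but link.
fn split_invocations(base_args: &[&str]) -> (Vec<String>, Vec<String>) {
    let with_flag = |extra: &str| -> Vec<String> {
        base_args
            .iter()
            .map(|s| s.to_string())
            .chain(std::iter::once(extra.to_string()))
            .collect()
    };
    // Both invocations reuse the exact same base flags, so the link step can
    // reconstruct the same link plan the compile step computed.
    (with_flag("--do-not-link"), with_flag("--only-link"))
}

fn main() {
    let base = ["main.rs", "--crate-type", "bin", "-O"];
    let (compile, link) = split_invocations(&base);
    assert_eq!(&compile[..base.len()], &link[..base.len()]);
    println!("step 1: rustc {}", compile.join(" "));
    println!("step 2: rustc {}", link.join(" "));
}
```

The key property the sketch encodes is that the two invocations differ only in the final flag, which is what lets the second invocation re-derive everything else from the shared flag set.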

In any case this is just a strawman, I think it'd be best to prototype this in rustc, learn some requirements, and then perhaps open an RFC asking for feedback on the implementation. This is a big enough change it'd want to get a good deal of buy-in! That being said I would believe (without data at this time, but a strong hunch) that the improvements to both pipelining and the ability to use sccache would be quite significant and worth pursuing.

@alexcrichton alexcrichton added T-compiler Relevant to the compiler team, which will review and decide on the PR/issue. I-compiletime Issue: Problems and improvements with respect to compile times. labels Sep 5, 2019
@alexcrichton
Member Author

cc @cramertj and @jsgf, we talked about this at the RustConf and figured y'all would want to be aware of this

@jsgf
Contributor

jsgf commented Sep 6, 2019

Would the --only-link literally just invoke ld? If so, it might be useful to just be able to extract whatever extra libraries/options it's adding to the link line, so we can independently regenerate the link line.

This would be useful for linking hybrid Rust/C++/(other) programs where the final executable is non-Rust. In other words, we could have C++ depending on Rust without needing to use staticlib/cdylib.

@alexcrichton
Member Author

I don't think it'd just be a thin wrapper around ld, no, but it would also prepare files to get passed to the linker. For example when producing a dylib rustc will unpack an rlib and make a temporary *.a without bytecode/metadata. Additionally if performing LTO this'd probably be the time we'd take out all the bytecode and process it. (maybe LTO throws a wrench in this whole thing). Overall though I don't think it's safe to assume that it'll just be ld.

@jsgf
Contributor

jsgf commented Sep 7, 2019

Firstly, we'd want the final linker doing LTO in order to get it cross-language, regardless of whatever language the final target is in and what mix of languages went into the target.

Secondly, since Buck has full dependency information, including Rust dependencies on C/C++, it will arrange for all the right libraries to be on the final link line. As a result we never want to use or honor #[link] directives, and our .rlibs don't contain anything to be unpacked.

(Even if that weren't true, at least on Unix systems, the .rlib is just a .a and could be used directly, except perhaps for the extension).

I like this proposal because it allows us to factor out the Rust-specific details from the language-independent ones. For example there's no reason for rustc to implement LTO if we're already having to solve that for other languages - especially when that solution is pretty infrastructure-specific (distributed thin LTO, for example). There's also no real reason for us to use staticlib/cdylib if we can arrange for all the Rust linker parameters to be on the final link line, even if the final executable is C++, and it would be a significant reduction in code duplication (unless LTO sees and eliminates the duplication, but that's still a compile-time cost).

Ultimately, Rust lives in the world of linkable object files, and a final artifact is generated by calling the linker with a certain set of inputs. Since Rust doesn't have unusual requirements that make it incompatible with C/C++ linkage (e.g. special linkage requirements or elaborate linker scripts), the final linker stage could be broadly language-agnostic.

@cramertj
Member

cramertj commented Sep 9, 2019

+1 to wanting the ability to turn off / disable #[link].

@jonas-schievink jonas-schievink added the A-linkage Area: linking into static, shared libraries and binaries label Sep 10, 2019
@michaelwoerister
Member

I'm generally in favor of this. Some thoughts:

  • One of the more computationally heavy things that linking needs is the list of exported symbols (i.e. the linker script). Getting this list involves analyzing the HIR and reading upstream crate metadata. But it should be easy to serialize this information during the --do-not-link step and store it in the .rlink output.
  • Could we move all of LTO out of rustc? That would make things simpler for rustc but probably has some overhead. Also, as far as I know, llvm-ar does not support doing LTO, but for staticlibs one might want to have it (Firefox does this at least).
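The exported-symbols point could work by persisting the symbol list during --do-not-link and rendering it at link time. A minimal sketch of what that rendering might look like for GNU ld's version-script format (the function name and the exact output shape here are illustrative, not rustc's actual implementation):

```rust
// Illustrative sketch: render an exported-symbol list (as could be saved in
// an .rlink file during `--do-not-link`) as a GNU ld version script that
// exports exactly those symbols and hides everything else.

fn version_script(exported: &[&str]) -> String {
    let mut out = String::from("{\n  global:\n");
    for sym in exported {
        // One `name;` entry per exported symbol.
        out.push_str("    ");
        out.push_str(sym);
        out.push_str(";\n");
    }
    // Everything not listed above gets local (hidden) visibility.
    out.push_str("  local:\n    *;\n};\n");
    out
}

fn main() {
    let script = version_script(&["my_crate_entry", "rust_eh_personality"]);
    print!("{}", script);
    assert!(script.contains("my_crate_entry;"));
}
```

Since the symbol list is just strings, it is cheap to serialize in the first invocation and replay in the second without touching the HIR again.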

@alexcrichton
Member Author

I don't disagree that y'all's sorts of projects don't want to use the rustc-baked-in LTO, but I don't think we can remove it because many other projects do use it (and rightfully want to). Also this is still just a sort of high-level concept, but if a lot of feature requests are piled onto this it's unfortunately unlikely to happen.

@0dvictor
Contributor

Hi, my name is Victor and I'm working with @tmandry.
I am very interested in this issue as it potentially enables many other cool features. Therefore, I plan to make a prototype and ultimately implement it. @alexcrichton, do you have any suggestions/tips to start?

@alexcrichton
Member Author

Great @0dvictor! The steps I'd recommend for doing this would probably look like:

  • Poke around rustc and the linking phase, and formulate a high-level plan of how you'd like to implement this feature.
  • Confirm with compiler team folks that the plan of action you've got is reasonable. The details probably wouldn't be fully fleshed out, but this is likely to be a large-ish change so the compiler folks will want to be onboard and it's best to start that early.
  • Iterate towards a working prototype (probably with compiler team help)
  • Start writing tests/etc
  • Evaluate with the compiler team at this point if the change needs an RFC or if it's good to land unstable in the compiler

As for the actual change itself I haven't looked too much into this, so I wouldn't know where best to start there.

@tmandry
Member

tmandry commented Nov 19, 2019

Some implementation notes:

Recommended reading

Current state of things

The main run_compiler function calls Compiler::link, which ensures there is a codegen step running. It then dispatches to the trait method CodegenBackend::join_codegen_and_link.

For the LLVM backend (which is the only one right now), this method is implemented here. join_codegen_and_link joins all the threads running codegen, which save their results to an object file per thread (often named foo.bar.<hash>-cgu.<threadno>.rcgu.o on linux). The names of these object files are saved in the CodegenResults struct (specifically, look in CompiledModule).

Finally, it calls link_binary which has the main logic for invoking the linker.

Strategy

Obviously, we need to split apart all the code that assumes codegen and linking happen at the same time. This starts with the join_codegen_and_link trait method. Thankfully, it doesn't seem like there is too much code that assumes this, but there's still a question of what to do in the new code.

For the flags, we can start with unstable options (-Z no-link and -Z only-link) and later stabilize them via the RFC process. When the no-link flag is passed, we should not invoke link_binary anymore. When the only-link flag is passed, we need a way of recovering the information that was in our CodegenResults struct so we can call link_binary.
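To make the recovery concrete, here is a toy sketch of persisting the parts of CodegenResults that -Z link-only would need. The struct fields and the line-based text format are hypothetical simplifications; the real implementation serializes considerably more state, and later moved to rustc's opaque metadata encoder rather than any text format.

```rust
// Toy sketch of saving/restoring the codegen outputs that the link-only
// step needs. This is a deliberately simplified stand-in for rustc's
// `CodegenResults`; the field set and the text format are hypothetical.

#[derive(Debug, PartialEq)]
struct SavedCodegenResults {
    crate_name: String,
    /// Paths of the per-CGU object files (e.g. `foo.<hash>-cgu.0.rcgu.o`).
    object_files: Vec<String>,
}

impl SavedCodegenResults {
    fn serialize(&self) -> String {
        let mut out = format!("crate {}\n", self.crate_name);
        for obj in &self.object_files {
            out.push_str("obj ");
            out.push_str(obj);
            out.push('\n');
        }
        out
    }

    fn deserialize(text: &str) -> Option<Self> {
        let mut crate_name = None;
        let mut object_files = Vec::new();
        for line in text.lines() {
            match line.split_once(' ')? {
                ("crate", name) => crate_name = Some(name.to_string()),
                ("obj", path) => object_files.push(path.to_string()),
                _ => return None,
            }
        }
        Some(SavedCodegenResults { crate_name: crate_name?, object_files })
    }
}

fn main() {
    let results = SavedCodegenResults {
        crate_name: "foo".to_string(),
        object_files: vec!["foo.abc123-cgu.0.rcgu.o".to_string()],
    };
    // Round-trip: the link-only invocation must recover exactly what the
    // no-link invocation saved.
    let restored = SavedCodegenResults::deserialize(&results.serialize()).unwrap();
    assert_eq!(results, restored);
}
```

The round-trip property is the whole point: whatever format is chosen, the only hard requirement is that the second invocation can reconstruct the struct the first one would have handed to link_binary.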

@tmandry
Member

tmandry commented Nov 19, 2019

  • In lieu of this a temporary artifact is emitted in the output directory, such as *.rlink. Maybe this artifact is a folder of files? Unsure. (maybe it's just an rlib!)

I don't much like using a directory as output, since some build systems might not support this.

Probably the best thing to do is to make an ar file (which I should note is what an rlib is today). I don't much care what the extension is (I like .rlink, since it probably gets handled differently than rlibs or staticlibs do today).

That said, the choice of extension should be up to whoever is invoking rustc, and we should use the -o flag to decide where our output goes.

I think there might be details we need to pay attention to regarding the linker's default behavior when linking object files vs archive files (like symbol visibility), but not sure what those details are. cc @petrhosek

@bjorn3
Member

bjorn3 commented Nov 19, 2019

Bundling it together in a single ar file would do unnecessary work (both IO and CPU). Object files are always written to the disk. When building an ar file, they are then copied into the ar file and a symtab is created (ranlib). Creating a symtab can't be avoided if you don't want to unpack the ar file again before linking, as the linker requires a symtab to be present.

@tmandry
Member

tmandry commented Nov 19, 2019 via email

@0dvictor
Contributor

0dvictor commented Dec 2, 2019

I finally got a prototype working for the first part: generating a linkable object or bitcode file, as well as the linking command to invoke manually to finish linking. In addition, I also successfully ran LTO linking with a native lib.

While starting on the second stage, I found Rust is moving to LLD per #39915. Can I make an assumption that I only need to support LLD or "clang/gcc -fuse-ld=lld"?

@alexcrichton
Member Author

While moving to LLD is nice, it's unlikely to happen any time soon, so it's best to not make the assumption of LLD.

@tmandry
Member

tmandry commented Dec 6, 2019

@0dvictor As a suggestion, you may want to file a PR that includes only the -Z no-link option while you work on finishing the implementation of -Z link-only. That way you can start getting feedback sooner and stay more in sync with mainline. But you may decide it's not worth the trouble.

rustc flags are in src/librustc/session/config.rs. See e.g. this PR for an example.

@0dvictor
Contributor

0dvictor commented Dec 8, 2019

@0dvictor As a suggestion, you may want to file a PR that includes only the -Z no-link option while you work on finishing the implementation of -Z link-only.

Good idea, let me polish my changes and create a PR.

@0dvictor
Contributor

Sorry about the delay.

After studying the code and making some experiments, I found all linker arguments come from the following five sources:

  1. Target specs: e.g. linker, pre/post link args/objects, etc.
  2. CLI of rustc: e.g. -L -C relocation-model, -C link-args, -Z pre-link-args, etc.
  3. Compiled modules, include allocator and metadata when necessary;
  4. Dependent libraries (rlib and native)
  5. Crate source code: i.e. #[link_args = "-foo -bar -baz"]

At the linking stage, assuming the user always passes the required CLI arguments:

  • (1) and (2) can be constructed without any information from the compiling stage.
  • (3) is the object or bitcode files from the compiling stage.
    • i.e. --emit=object or --emit=llvm-bc
  • (4) is the information contained in an rmeta file
    • Can be obtained via --emit=metadata
  • (5) is not part of any generated files from the compiling stage
    • Needs to somehow be passed to the linking stage from the compiling stage

Therefore, in my experiment, compiling without linking is basically:
`--emit=metadata,object` or `--emit=metadata,llvm-bc`
The user can choose to generate either an object file or an LLVM bitcode file.

I have to make the following three changes to get it to work [PR #67195]:

  • Generate and save the .bc/.o/.ll/.s file of the allocator and metadata (if needed) when the user requests --emit=llvm-bc/object/llvm-ir/asm
    • [I feel this should be the expected behavior instead of completely ignoring the allocator and metadata. I was very confused before knowing the .bc/.o/.ll/.s file does not contain all code compiled from the Rust source code.]
  • Write the metadata to a file when the user requests --emit=metadata even if the OutputType is OutputType::Exe.
    • [I also feel this should be the expected behavior.]
  • Skip linking.

To minimize the impact of existing code, all changes are guarded by -Z no-link.

I have not included (5) yet. My plan is to save it in either the rmeta file, or the bitcode/object file using LLVM’s !llvm.linker.options. I prefer the latter as we can get it for free for targets using LLD. (Of course we still have to generate corresponding linker args for targets that do not use LLD.)

If we want one single .rlink file, we can ar the .rmeta and .bc/.o files generated in the compiling stage.

@0dvictor
Contributor

Then for the linking stage, I plan to insert code here to read the .rmeta or .rlink file, resolve all dependencies then reconstruct the CodegenResults, so that we can create and execute the linker.

@0dvictor
Contributor

Finally, some thoughts on LTO: once this issue finishes, we should be able to do LTO easily when we use LLD (either directly or via clang/gcc -fuse-ld=lld). I have successfully run LTO linking a Rust crate and an LLVM bitcode file generated by clang -flto -c. However, LLD only takes uncompressed bitcode files (either by themselves or residing inside a .a file). A Rust rlib only contains compressed ones, so we would have to extract and uncompress them before sending them to LLD.

Out of curiosity, why does an rlib contain both a native object file and an LLVM bitcode file? Is it because the time cost of “LLVM bitcode => native object” is too high? Otherwise, we would only need to save LLVM bitcode in an rlib and generate native objects when needed.

@mati865
Contributor

mati865 commented Dec 10, 2019

  Out of curiosity, why does an rlib contain both a native object file and an LLVM bitcode file? Is it because the time cost of “LLVM bitcode => native object” is too high? Otherwise, we would only need to save LLVM bitcode in an rlib and generate native objects when needed.

#66961

@tmandry
Member

tmandry commented Dec 13, 2019

  • Generate and save the .bc/.o/.ll/.s file of the allocator and metadata (if needed) when the user requests --emit=llvm-bc/object/llvm-ir/asm

    • [I feel this should be the expected behavior instead of completely ignoring the allocator and metadata. I was very confused before knowing the .bc/.o/.ll/.s file does not contain all code compiled from the Rust source code.]
  • Write the metadata to a file when the user requests --emit=metadata even if the OutputType is OutputType::Exe.

    • [I also feel this should be the expected behavior.]
  • Skip linking.

To minimize the impact of existing code, all changes are guarded by -Z no-link.

Yeah, those should probably be the default. Would you mind opening an issue to track this?

@jsgf
Contributor

jsgf commented Feb 14, 2020

The splitting is mostly meant to be able to perform pipelined compilation on the final artifact, not to perform codegen and linking on different machines. What is even the benefit of performing codegen and linking on different machines, by the way? It doesn't increase parallelism, as the linking has to wait on codegen anyway.

Sure, but from our point of view we want to break the build up into atomic actions with well-defined inputs and outputs and then be able to freely schedule them across a build cluster. I don't want to have to treat Rust build + link as a special case - adding a constraint that they have to execute on the same machine would make it much harder to schedule.

But I think if we can use a thin ar to logically bundle them all up, it will be manageable.

This will never happen. There are many implementation details written to .rlink. For example:

OK, so I think there's too much stuff in the .rlink file then. If we assume that the no-link and link-only invocations are passed an identical set of flags (aside from the -Z option itself), then all the other details are regeneratable and don't need to be in the .rlink. The .rgcu.o files are the only thing that's uniquely useful to pass from the first invocation to the second.

Also why not create one rust library that depends on all rust dependencies and then depend on that rust library from C++?

That doesn't scale. There could be hundreds of C++ libraries linked in, any of which could be using some combination of Rust libraries.

@bjorn3
Member

bjorn3 commented Feb 14, 2020

then all the other details are regeneratable and don't need to be in the .rlink

No, link args can come from #[link] attributes on extern blocks. Because of proc-macros those can be generated non-deterministically.

@jsgf
Contributor

jsgf commented Feb 14, 2020

No, link args can come from #[link] attributes on extern blocks. Because of proc-macros those can be generated non-deterministically.

Fair enough - they can be encoded in the .rlink too. But as they won't work in our environment we'll want to ignore them or complain if they're present (that would effectively be a proc-macro making up a dependency the build system doesn't know about, which is a big no-no - and if the build system does know about it, we don't need the #[link]).

@tmandry
Member

tmandry commented Feb 15, 2020

What is even the benefit of performing codegen and linking on different machines by the way? It doesn't increase parallelism, as the linking has to wait on codegen anyway.

Linking, in theory, depends on a lot more artifacts than codegen does. Codegen should only require source code and the rmeta files from any crates you depend on. Linking requires all the generated code. In our sccache-like environment, this would mean uploading many rlib files and possibly system libraries to the worker. Network bandwidth becomes a bottleneck. So it's much better to send compile steps to the workers, hitting cache when possible, and do linking locally.

That would require making the linker arguments stable, as changes could break such thing.

Link args don't need to be stable, just the file format which contains them. I don't think the fact that the file contains references to implementation details like compiler-builtins is a problem, actually. As long as those details can change without changing the schema, a well-written tool should be able to consume them without breakage.

That said, stabilizing rlink seems more ambitious than having a rustc option which spits out the final linker line, allowing you to run it yourself.

@0dvictor
Contributor

0dvictor commented Apr 8, 2020

Apologies for such a late reply.

After experimenting with the archiving approach (using ar) to collect the linker's input files, I found there are several drawbacks, which made me believe ar is not the correct approach:

  • Generating a regular .a file requires extra storage, CPU, and IO resources
    • Effectively doubles the peak storage and IO usage
    • What is worse: when the crate-type is an rlib or staticlib, the .a file must be extracted before passing to the linker
  • Creating a "thin archive" does not solve the fundamental issue that a distributed build system has to deal with an unknown number of files
  • Not all platforms support "thin archives"
    • E.g. I don't think msvc's link.exe supports them

Similarly, simply archiving the files with tar or zip or even somehow embedding them into a .rlink file would not be ideal either.

Therefore, I am experimenting with a different approach: after finishing LLVM's optimizations, linking all CGUs into a combined one using llvm::Linker, so that only one .o file is generated for the crate. (Of course, there will potentially be another two .o files: the allocator and the metadata.) The distributed build system then only has to deal with a known number of .o files: one for the main crate, zero or one for the allocator, and zero or one for the metadata. If -Z human-readable-cgu-names is also enabled, these .o files can be easily identified.
The main benefits compared to the archiving approach would be:

  • Platform independent
  • Zero change required on linker side
  • No extra storage or IO cost
    • Not reading/writing more data from/into disk
  • Unlikely to increase rustc's peak memory usage
    • Linking with llvm::Linker is expected to use much less memory than optimizations; therefore, performing linking after optimizations is unlikely to exceed the memory footprint during optimization stage.
  • Bonus: also enables parallelism for --emit [llvm-bc|llvm-ir|asm|obj]
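The "known number of .o files" point can be made concrete: under this approach a build system could enumerate a crate's objects up front. The file-naming scheme below is purely illustrative, not rustc's actual one.

```rust
// Illustrative sketch of why combining CGUs helps a distributed build
// system: the object-file set becomes fixed and enumerable up front.
// The file-naming scheme here is hypothetical, not rustc's actual one.

fn expected_objects(crate_name: &str, needs_allocator: bool, needs_metadata: bool) -> Vec<String> {
    // One combined object for all regular CGUs...
    let mut objs = vec![format!("{}.o", crate_name)];
    // ...plus at most one each for the allocator shim and crate metadata.
    if needs_allocator {
        objs.push(format!("{}.allocator.o", crate_name));
    }
    if needs_metadata {
        objs.push(format!("{}.metadata.o", crate_name));
    }
    objs
}

fn main() {
    // Regardless of how many CGUs codegen used, at most three objects result.
    let objs = expected_objects("foo", true, true);
    assert_eq!(objs.len(), 3);
    println!("{:?}", objs);
}
```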

WDYT?

@bjorn3
Member

bjorn3 commented Apr 8, 2020

Therefore, I am experimenting a different approach: after finishing LLVM's optimizations, linking all CGUs into a combined one using llvm::Linker

That won't work for non LLVM based backends.

Linking with llvm::Linker is expected to use much less memory than optimizations; therefore, performing linking after optimizations is unlikely to exceed the memory footprint during optimization stage.

What about a crate with many codegen units? During optimizations only a few codegen units are in memory at any time, while during linking with llvm::Linker, all of them have to be in memory. Also llvm::Linker links bitcode I believe, which means that all passes after the user-facing LLVM IR are forced to be done in a single thread after the linking. Especially in debug mode I expect this to be a significant performance regression.

@0dvictor
Contributor

That won't work for non LLVM based backends.

Correct, but we can implement a similar, if not the same, way to combine the CGUs for a non-LLVM backend.

What about a crate with many codegen units? During optimizations only a few codegen units are in memory at any time, while during linking with llvm::Linker, all of them have to be in memory. Also llvm::Linker links bitcode I believe, which means that all passes after the user facing LLVM-ir are forced to be done in a single thread after the linking. Especially in debug mode I expect this to be a significant performance regression.

Great points. I should've been clear that this proposed feature would be guarded by an option, say -Z combine-cgus, and even if the feature is stabilized, the user should be able to turn it on/off, so that they can pick the behavior that best fits their use cases. In other words, the new feature should not introduce regressions to existing workloads.

Regardless of splitting out the linker invocation, being able to generate a single (or defined number of) .o files eases many tasks. The distributed compiling tool is one example: the tool would not have to parse the .rlink and/or other files to figure out the unknown number of files that need to be copied from the server to the client. This also means rustc does not have to provide a stable .rlink file format.
Other examples include tasks that run with --emit [llvm-bc|llvm-ir|asm|obj]. As these workloads require a single .[bc|ll|asm|o] output file, they are currently done in a single thread. Enabling parallelism for these tasks should yield a good performance boost.

@tmandry
Member

tmandry commented Apr 14, 2020

Can llvm::Linker not link both bitcode and object code? IIUC, it would indeed regress performance quite a bit in that case, since LLVM IR -> Object code is one of the slowest compilation steps.

@0dvictor, can we compare the build time using the flag in your working branch versus -Ccodegen-units=1, versus a baseline? That would help indicate if this flag is worth maintaining.

@0dvictor
Contributor

Can llvm::Linker not link both bitcode and object code? IIUC, it would indeed regress performance quite a bit in that case, since LLVM IR -> Object code is one of the slowest compilation steps.

Unfortunately, llvm::Linker is only for linking bitcode.

can we compare the build time using the flag in your working branch versus -Ccodegen-units=1, versus a baseline? That would help indicate if this flag is worth maintaining.

Good idea, let me do that.

@adetaylor
Contributor

@jsgf The project I'm working on seems to share similar properties to yours:

  • Using a non-Cargo build system based on static dependency resolution (and has 20000+ targets)
  • Final linking performed by an existing C++ toolchain
  • A few Rust .rlibs scattered throughout a very deep dependency tree, which may eventually roll up into one or multiple binaries

We can't:

  • Switch from our existing linker to rustc for final linking. C++ is the boss in our codebase; we're not ready to make the commitment to put Rust in charge of our final linking.
  • Create a Rust static library for each of our .rlibs. This works if we're using Rust in only one place. For any binary containing several Rust subsystems, there would be binary bloat and often violations of the one-definition rule.
  • Create a Rust static library for each of our output binaries. The build directives for the Rust .rlibs don't know what final binaries they'll end up in; and the build directives for the final binaries don't know what .rlibs they've pulled in from deep in their dependency tree. Our build system forbids that sort of global knowledge, or it would be too slow across so many targets.
  • Create a single Rust static library containing all our Rust .rlibs. That monster static library would depend on many C++ symbols, so each binary would become huge (and in fact not actually link properly)

Therefore we want to link rlibs directly into our final linker invocation. This is, in fact, what we're doing, but we have to add some magic:

  • The final C++ linker needs to pull in all the Rust stdlib .rlibs, which would be easy apart from the fact they contain the symbol metadata hash in their names.
  • We need to remap __rust_alloc to __rdl_alloc etc.
  • In future the final rustc-driven linker invocation might add extra magic.

Therefore one of the solutions you and @tmandry suggest would be awesome: either stabilizing .rlink (which sounds unlikely) or having some official way to build a linker command line which is not under the control of rustc (-Zbinary-dep-depinfo helps a bit).

I think, though, that this request is a little bit orthogonal to this issue. I wonder if we should submit a new issue? I think it's quite a big request.

@tmandry
Member

tmandry commented May 1, 2020 via email

@jsgf
Contributor

jsgf commented May 15, 2020

This issue came up in a discussion about @dtolnay's cxx which wants to be able to describe callback relationships between Rust and C++ code - eg Rust calls C++, which then calls back into Rust. If you're using linker symbols to resolve these calls (vs runtime indirect function calls via pointers), then that effectively means you have a rust.o and cxx.o with mutual references.

AFAIK the only way to correctly link this is with something like --start-group rust.o cxx.o --end-group on the linker command line. If rustc is driving the linking process, then it would need to understand this concept, which I think is out of scope for it. Alternatively, if the build system understands these kinds of relationships, then it already knows how to invoke the linker this way. (Moot for Cargo, since it doesn't have a way to model C++ dependencies on Rust, and certainly not cyclic ones.)
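For illustration, a build system that knows about the cycle could construct the grouped link line like this. --start-group/--end-group are real GNU ld/gold/lld flags; the input paths and the helper function are made up for the sketch.

```rust
// Sketch of the grouped link line described above: mutually-referencing
// objects are wrapped in --start-group/--end-group so the linker rescans
// them until all cross-references resolve. Input paths are made up.

fn group_link_args(cyclic_inputs: &[&str]) -> Vec<String> {
    let mut args = vec!["--start-group".to_string()];
    args.extend(cyclic_inputs.iter().map(|s| s.to_string()));
    args.push("--end-group".to_string());
    args
}

fn main() {
    let args = group_link_args(&["rust.o", "cxx.o"]);
    assert_eq!(args, ["--start-group", "rust.o", "cxx.o", "--end-group"]);
    println!("ld {}", args.join(" "));
}
```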

@joshtriplett
Member

I'm interested in the outcome of this. Is it currently in a state where cargo could potentially implement a useful subset of it? Or is there further work needed on the final link step before cargo can do anything?

@jsgf
Contributor

jsgf commented Jul 20, 2020

@joshtriplett Kernel?

@0dvictor
Contributor

0dvictor commented Aug 3, 2020

I apologize for the long delay in continuing this work. PR #75094 is created to allow generating a single object file for a crate.

Performance was measured by compiling rustc itself:

                                          -Ccodegen-units=1   [DEFAULT]   -Zcombine-cgu
  raw compile time (seconds)                            987         294             303
  relative perf to -Ccodegen-units=1 (%)                100       29.79           30.70
  relative perf to default (%)                       335.71         100          103.06
  relative perf to -Zcombine-cgu (%)                 325.74       97.03             100

Though -Zcombine-cgu is around 3% slower than the default configuration, it is still a significant improvement compared to the single-threaded case, which would significantly improve the performance of --emit [llvm-bc|llvm-ir|asm|obj].

tmandry added a commit to tmandry/rust that referenced this issue Sep 9, 2020
Add `-Z combine_cgu` flag

Introduce a compiler option to let rustc combine all regular CGUs into a single one at the end of compilation.

Part of Issue rust-lang#64191
@jsgf
Contributor

jsgf commented Jun 6, 2021

I tried using the -Zno-link/-Zlink-only mechanism in earnest, and unfortunately I think it's deficient in a number of ways. I have some thoughts on a simpler-to-use mechanism. More detail: https://internals.rust-lang.org/t/alternative-approach-to-zno-link-zlink-only-split-linking/14842

bors added a commit to rust-lang-ci/rust that referenced this issue Feb 9, 2022
…twco,bjorn3

Store rlink data in opaque binary format on disk

This removes one of the only uses of JSON decoding (to Rust structs) from the compiler, and fixes the FIXME comment. It's not clear to me what the reason for using JSON here originally was, and from what I can tell nothing outside of rustc expects to read the emitted information, so it seems like a reasonable step to move it to the metadata-encoding format (rustc_serialize::opaque).

Mostly intended as a FIXME fix, though potentially a stepping stone to dropping the support for Decodable to be used to decode JSON entirely (allowing for better/faster APIs on the Decoder trait).

cc rust-lang#64191