RFC: `#[export]` (dynamically linked crates) #3435

m-ou-se · 2023-05-23T13:06:18Z

bjorn3 · 2023-05-23T13:43:39Z

text/0000-export.md

+```
+_RNvNtC_7mycrate3foo3bar_f8771d213159376fafbff1d3b93bb212
+```


This doesn't seem to be able to handle multiple semver incompatible versions of the same crate. We currently include the StableCrateId in the symbol name for this which includes all -Cmetadata arguments. Cargo uses a separate -Cmetadata value for every crate version.

I'm not sure if we can properly support that. If e.g. libmycrate.so is separately compiled, we cannot (and should not) make the symbols dependent on the crate id it will have in a dependency tree of whatever binary will use this library later.

Perhaps we want to include just the major version in the symbol names, but not everyone uses versions (a cargo concept) or follows semver, nor does the version of the public API need to match the public of the stable ABI.

Maybe we need an crate-global attribute to optionally specify the ABI version (or even just a random hash) for the crate, to allow for symbols with identical paths and signatures of multiple identically named crates to exist at once, although I imagine most use cases would not use e.g. two different versions of librustls.so at the same time. (If that's a supported use case, they should probably be called librustls1.so and librustls2.so, using rustls1 and rustls2 as the name used in the symbols rather than just rustls.)

The StableCrateId does not depend on any dependencies. It only depends on the crate name, if it is a lib or bin and all -Cmetadata arguments (in sorted order). The SVH (crate hash) does depend on dependencies. In addition the current cargo impl hashes dependency versions into the -Cmetadata argument it passes, but that can easily be changed to only pass the major part of the semver version.

That could work, but I'm not convinced the API version and the ABI version should be treated as one and the same. Renaming structs or non-exported pub functions/methods etc. etc. could result in an incompatible API while preserving a compatible ABI. And in the other direction, a change in invariants or private fields could result in an incompatible ABI while preserving a compatible API.

As I mentioned above, it seems a weird situation if a binary somehow ends up depending on two different librustls.so, considering they'd both have to have the same file name. (And if those are called librustls1.so and librustls2.so, then the rustls1 and rustls2 names are probably what should be used in the symbol names rather than just rustls. And that could just be a build option to cargo or a property in Cargo.toml.)

These arguments actually bring into question whether there should be a supported ABI version in Cargo as well. Since Cargo already reconciles latest-compatible updates for APIs, why shouldn't it do the same with ABIs too?

bjorn3 · 2023-05-23T13:44:37Z

text/0000-export.md

+to avoid building unnecessary indirect dependencies.
+A system for that has been proposed in [RFC 1977](https://rust-lang.github.io/rfcs/1977-public-private-dependencies.html).
+
+### Name Mangling and Safety


What will the format of the crate metadata used to lookup symbol names and function signatures be? Will it be some stable binary format or a rust like syntax that users can directly write? The former is faster, but the later is easier to avoid breaking across rustc versions and makes it easier to prevent the user accidentally breaking the ABI by requiring the interface file to be modified for this to happen. Also will dylibs embed the crate metadata as a whole, just a hash to check compatibility or not at all.

The exact hashing algorithm is something to be designed later (hopefully in an incremental way so we can support e.g. i32 before we support more complicated types).

Ideally it should be specified in a way that it can also be generated by other tools that are not rustc.

Dynamic libraries will include these hashes in the symbol names. Including more information can be useful for debugging tools, but is not necessary. (See the "future possibilities" section.)

text/0000-export.md

sdroege · 2023-05-23T13:56:37Z

This generally looks great and useful, but there's one aspect that seems a bit strange to me. From my understanding, to link to such a shared crate you would need to have either (a version of) its source code available or some kind of types-and-functions-without-function-bodies version of it (i.e. something like the Rust version of a C header file).

Did you consider including the API of the crate in a way that rustc can understand inside the shared library, e.g. in a separate section, or as a separate interface definition file? That way the shared library build results would be all that is necessary to link to the library, and for example cargo (and other build systems) could later get a mechanism to look for system installed shared libraries in that format and directly make use of them.

m-ou-se · 2023-05-23T14:03:16Z

@sdroege That's certainly a possibility and not incompatible with this RFC. The "future possibilities" section already mentions the possibility of adding more information in a separate (optional / debug) section, and what you're proposing seems to be a form of that.

something like the Rust version of a C header file

Yes, I'm basically proposing that we can have #[export] attributes rather than a separate header file (like C) or a bindings crate (Rust today).

sdroege · 2023-05-23T14:09:16Z

Sounds good to me, thanks for the fast response :) Maybe you could extend the "A tool to create a stripped, 'dynamic import only' version [...]" item in the "future possibilities" section with a sentence that this could also be as a binary format as part of a shared library section or so.

Such a mechanism would also be useful to allow non-Rust languages to link to such shared libraries without having to parse Rust code or requiring tools for converting the API definitions between any possible source and target language.

kornelski · 2023-05-23T14:27:21Z

How about flipping the default and exporting all the same symbols as static linking does, and provide an opposite attribute to opt out? (#[unstable_abi] or such)

I imagine once Rust ABI situation matures, users will want to support both kinds of linkage without having to manage the public API twice, so the explicit opt in useful initially may be a liability long term. Similarly Rust was conservative with #[must_use] which causes clutter and is inconsistently applied, unlike Swift's discatdableResult.

m-ou-se · 2023-05-23T14:31:22Z

@kornelski you can do that by adding a #![export] once globally to your crate. It works recursively over pub items, as explained in the RFC.

For the forseeable future, we're not going to be able to export fully generic functions (e.g. like std::option::Option::map), so I doubt most crates would be using this for their entire interface.

Also, for any type with invariants (types with private fields), we can't safely mark those as exported, without the author unsafely committing to keeping the invariants stable. (See the section on private types in the RFC.)

m-ou-se · 2023-05-23T14:35:38Z

I imagine once Rust ABI situation matures, users will want to support both kinds of linkage without having to manage the public API twice

I think that's inevitable. Basically, pub is about an API commitment, e.g. File::open(path) always existing, while #[extern] is about an ABI commitment, e.g. File always being a wrapper around a file descriptor. We can't really change pub to suddenly also include the ABI commitment, which includes committing to not changing any invariants about the private fields.

text/0000-export.md

m-ou-se · 2023-05-23T14:59:51Z

@sdroege Thanks! Updated!

jtracey · 2023-05-23T15:42:58Z

text/0000-export.md

+2. Make it the responsibility of the loader/linker.
+
+Option (1) simply means making everything `unsafe`, which isn't very helpful to the user.
+Option (2) means the loader (the part that loads the dynamic library at runtime) needs to perform the checks.


Do you mean dynamic linker here, or is this to say that dynamic linking of rust libs would be skipped at initialization, and evaluated lazily by the loader? On that note, I don't see any mention of dynamic loading rust libs anywhere; maybe it's out of scope, but seems like it should be mentioned, if only as future work.

Yes, I used "loader" as synonym for "dynamic linker" (e.g. /lib64/ld-linux-x86-64.so.2 or w/e your system uses). The part of the operating system that loads dynamic libraries and resolves symbols when executables are loaded.

Yeah they're typically the same file in practice, I just mean to ask whether libraries would be linked/loaded before or after .init.

Amanieu · 2023-05-23T14:55:33Z

text/0000-export.md

+When using `dynamic = true` for a dependency, there is no need to build that full crate:
+only the signatures of its exported items are necessary.
+Cargo will pass a flag to the Rust compiler which will stop it from generating
+code for non-exported items and function bodies.


I would imagine that the default behavior of Cargo in this case would be to build the full crate, but emit it as a separate .so file. Building just the current crate without the dynamic dependency would be done only if the metadata is available without the source code, but this is future work and not part of the initial plan.

It depends a bit on the use case what makes most sense as the default. E.g. I wouldn't expect cargo to start building libjsonparser.so if we expect that library file to already exist on the target system. (Just like how we don't attempt to build libc.so or kernel32.dll or similar.)

This is something that I could see being different for different profiles too. E.g. you may prefer to build from source for cargo test so you don't need the correct version installed, but any release builds would want to link the system. It would be nice if there were some way to configure this.

I assume the rustc flag will be something like --extern=dynamic=... that treats the extern crate like a header file?

Amanieu · 2023-05-23T15:02:44Z

text/0000-export.md

+or when using a `extern dyn crate …;` statement in the source code,
+only the items marked as `#[export]` will be available and the dependency will be linked dynamically.
+
+### Building Dynamic Dependencies


How does Cargo handle the case where a crate is used as a dynamic dependency in one place in the crate graph, but as a normal dependency in another place? I presume in that case it is still compiled as a normal dependency, but the crate declaring it as a dynamic dependency is still restricted to only using exported items.

I suppose I'd just defer that to the 'unresolved questions' section of the tracking issue. I think I agree with your expectation, but I don't feel strongly about it.

Amanieu · 2023-05-23T15:04:45Z

text/0000-export.md

+}
+```
+
+The library can then be either linked statically or dynamically, by informing cargo of the choice:


I don't think static/dynamic is the correct distinction to make here. I can see a use case for statically linked dependencies that require a stable ABI: proprietary libraries. This is quite common on Windows with C/C++ libraries.

That's a good point. Perhaps that means we should use slightly different names, but I don't think that changes anything else substantially about the RFC.

Amanieu · 2023-05-23T15:39:43Z

text/0000-export.md

+if it has any private fields,
+or if any of the fields are not of an `#[export]`ed or builtin type.
+
+#### Types with Private Fields


I think that dealing with private fields is very complex and probably not something that many people can commit to. I would rather see a system where a type with private fields is treated as an opaque extern type (#1861). This effectively means that you can only interact with this type by reference and all operations on it must be done through exported functions.

I agree with that for many use cases, especially once "crabi" supports dyn trait objects. But for types like NonZero or File, we should be able to allow them to be used by value, since the invaraint / private details will likely not change. (We can still pick a new hash if File does change, but "it's just a file descriptor / handle" will be true for the forseeable future.)

I do agree this is a feature that should be less often than opaque dyn-like objects or fully public types, which is why I think it's fine that exporting a type with private fields requires an unsafe attribute, steering people away to safer alternatives for most situations.

This effectively means that you can only interact with this type by reference and all operations on it must be done through exported functions.

I don't see how that helps about the private invariant issue -- if the invariants change, it is quite important that this is considered a separate type, even if everything is by-reference.

text/0000-export.md

obi1kenobi

This is great and I'm excited to see it happen! Thanks for putting in the work on it.

I added a few questions and suggestions to clarify edge cases relevant to semver and the name resolution edge cases I recently ran into. Nothing critical, should be straightforward to resolve.

text/0000-export.md

obi1kenobi · 2023-05-23T16:26:08Z

text/0000-export.md

+For aliases of functions, an `#[export]` attribute on the `use` statement will use the
+path (and name) of the alias, not of the original function.
+(So it won't 'leak' the name of any (possibly private/unstable) module it was re-exported from.)
+


Consider throwing in a quick mention that removing #[export] is considered a major breaking change since it affects the ABI. I think this is pretty clear, but will let me have an official document I can cite in cargo-semver-checks when we implement a semver-check for this.

Suggested change

Removing `#[export]` from an item that was previously exported is a major breaking change,

since it removes that item from the stable ABI.

How ABI stability relates to API version is an interesting question. I'll have to think about that a bit.

If you'd like a sounding board to bounce ideas off of, give me a ping!

I started the cargo-semver-checks project so I've been trying to answer the same question coming from the opposite direction — e.g. we already treat removing #[repr(C)] as a major breaking change.

Co-authored-by: Predrag Gruevski <2348618+obi1kenobi@users.noreply.github.com>

text/0000-export.md

Co-authored-by: Predrag Gruevski <2348618+obi1kenobi@users.noreply.github.com>

text/0000-export.md

petrochenkov · 2023-05-23T19:08:38Z

text/0000-export.md

+//! application crate
+
+fn main() {
+    library::hello();


I guess my main question after seeing this example is whether this proposal requires a stable Rust metadata format, or it just works on symbol names.

How is library::hello resolved here?

Does library contain some metadata saying that the crate root module contains a name hello and it's a function?
That means keeping and reading Rust metadata.
Keeping metadata format compatible across rustc versions (by making it stable, or by keeping logic for reading all previous metadata versions) would put a lot of maintenance burden on rustc development, maybe a prohibitive amount of maintenance burden.

Or we just hash the whole library::hello path to obtain some string like hello_HASH and dlsym that string from the dynamic library?
In that case hashing the path only seems not to be enough, because it means the argument types and return type are not included into the hash.
Do we need to hash the whole call expression library::hello() instead? Even then we don't know the return type to hash.
Also, if the hash requires knowing types then resolution of such paths needs to be delayed until type checking, is that right?

I don't like any of these alternatives, TBH.

Have you fully read the RFC? The library::hello path in that example is resolved no different than it is today. There is still a crate named library with a hello item. I'm not proposing to somehow enable generate symbol names out of thin air based on just the call expression. The symbol name comes from the definition of the item.

whether this proposal requires a stable Rust metadata format, or it just works on symbol names

A metadata format is perhaps a future possibility, but definitely not part of this RFC. This RFC proposes symbols with a hash based on all information relevant for type safety. That hash does need to be stable, but can start out with strict limitations for only certain kinds of signatures and then slowly extended over time.

The library::hello path in that example is resolved no different than it is today.

Then I don't understand how can the dynamic library and the application be built by different versions of rustc ("2. Cases where dynamic library and application can be compiled and shipped separately from each other.").

To resolve paths like we do it today we need to read a lot of various Rust metadata from the dynamic library, but the metadata details typically change between every rustc release and rustc 1.N.0 won't be able to correctly parse metadata generated by rustc 1.M.0.

I read the whole RFC, and I have an impression that either I'm missing something big, or it focuses in great detail on secondary issues while skipping on the primary problem with the metadata format incompatibility.

Let's say that we want to dynamically link library compiled by rust 1.M.0 with main compiled by 1.N.0 (where N > M for concreteness). My understanding is that you would build both main and library with cargo + rustc 1.N.0 while signaling that the library dependency is dynamic = true. This is basically exactly like a statically linked build process except that the code of library is not emitted and instead symbols using the stable ABI mangling scheme are called instead, i.e. main would call _Rlibrary5hello_ABCD or what have you.

The previous compile of the library crate by 1.M.0 also produced a function of the name _Rlibrary5hello_ABCD because it had the same stable ABI hash, so when these two are linked together (as one would do for an FFI binding) everything works out. The metadata itself is not shared.

There are some future work discussions about being able to skip parts of library while building main on 1.N.0 (since most of that work would be discarded anyway) but it doesn't seem to be part of the RFC as written.

Ah, I see, the "Building Dynamic Dependencies" paragraph talks about this, although in a way that is obscured by potential future optimizations.

So, we

still need the source code for the dynamic library and all its dependencies;

instead of doing cargo build on that dynamic library we are now basically doing cargo check on it (with rustc 1.N.0) to get its rmeta.

And the benefits we get (compared to just building the dynamic library with 1.N.0) are

saving some time on not doing codegen;

having opportunity to make changes to the pre-build dynamic library as long as they don't sufficiently diverge from the previously published source code used for generating rmeta.

All this sounds more or less reasonable.

still need the source code for the dynamic library and all its dependencies

Since we only need the metadata and not the function bodies, we could skip some things. Once cargo has a separation between public and private dependencies, we can skip private dependencies entirely.

You need private dependencies if they are proc macros. Proc macros may expand to #[export] attributes or items with #[export] attributes.

Dante-Broggi · 2023-05-23T19:34:57Z

As Prior Art, this could reference Swift's ABI attributes. In particular,
@frozen seems equivalent to #[export] #[repr(Swift)] on structs (for the appropriate definition of #[repr(Swift)].
@usableFromInline seems equivalent to #[export] on functions, except that it is allowed on internal/pub(crate) functions, to allow use from @inlinable functions.
@inlinable seems equivalent to the #[export_inline] future direction on functions, except that it is allowed on internal/pub(crate) functions, to allow use from other (public) @inlinable functions.

In addition, IIRC, swift defaults public functions in ABI stable libraries to ABI stable, but the Swift stdlib uses the unstable @_alwaysEmitIntoClient to unstably inline functions to prevent ABI presence.

p-avital · 2023-05-24T13:36:00Z

While I'm really happy to see some progress on the ABI front, I feel like this still doesn't address the most painful point of building shared libraries in Rust: "What if some of my dependencies didn't brand their types with #[export]?". We already have ways to get a stable ABI for types we fully own (C-repr, abi_stable, stabby...), but as soon as dependencies are introduced, things get real ugly real fast.

I really wish we'd put the option on the table of having the compiler "contaminate" types that are used by #[export]ed symbols. While this doesn't solve the "mismatching library version" part of the problem, there are existing ways to check for this problem. There are no ways to force a dependency to use a stable layout for some of its types.

Diggsey · 2023-05-24T23:37:24Z

One of the main use-cases for dynamic loading of libraries is for plugin-like systems, where you might have multiple libraries exposing the same API.

This RFC does not appear to be designed to solve this use-case, but it solves several closely related problems, and it would be a shame if the plugin use-case required a totally orthogonal approach.

Is there a way this RFC could be naturally built upon to support such a feature?

Mark-Simulacrum · 2023-05-25T00:01:03Z

text/0000-export.md

+
+It is an error to use a plain `#[export]` attribute on a type with out stable `#[repr(…)]`,
+if it has any private fields,
+or if any of the fields are not of an `#[export]`ed or builtin type.


Is builtin type right? For example, *const Foo is stable ABI but is opaque-ish if Foo isn't export too. &str is also builtin but doesn't feel like we should commit to a stable ABI for (at least yet). Maybe worth listing the specific types here rather than naming a category?

I think this work is largely being offloaded to the "crabi" ABI RFC. Probably the best thing to say here would be "FFI-safe type" in the sense of the improper_ctypes lint.

I don't think that lint is appropriate here, it has the same questions/problems with it.

I don't know why this RFC in particular would need to solve that problem though, the problem is pre-existing and this RFC isn't trying to address it.

This RFC is prescribing an error, rather than a lint, and seems intended to start with a very small set and expand over time. So it seems to me that it is trying to do something different; it's also a safe interface AFAICT - so that also matters.

Ah yes, I should clarify this section a bit. The rules I had in mind are a bit stricter than what I wrote down in the RFC. (Matching your expectations.)

Mark-Simulacrum · 2023-05-25T00:29:09Z

text/0000-export.md

+
+## Alternatives
+
+- Alternatives for using a hash of all relevant type information:


I think this RFC might want to talk about why this is the right choice where we previously made a different one in the move from legacy to v0 (which moved away from a hash, at least mostly). I think given the use case here this may be better, but the discussion seems worthwhile.

it seems like a good idea to me to combine both of them -- v0 mangling so debuggers and similar can get the full source symbol name including generic arguments, and the hash to detect ABI breaks (ABI breaks won't necessarily change the v0 mangling, such as reordering struct fields)

riking · 2023-05-25T22:13:09Z

text/0000-export.md

+
+Using `#[export(unsafe_stable_abi = «hash»)]`, one can make the (unsafe) promise
+that the type will remain ABI compatible as long as the provided hash remains the same.
+The hash must have been randomly generated to ensure uniqueness (which is part of the unsafe promise).


I'd like to see a brief mention of tool support here. People aren't typically familiar with the facilities to generate random numbers locally.

You can generate an unsafe_stable_abi hash by setting it to an empty string and running cargo fix.

I wouldn't be surprised if rust-analyzer would get some kind of "generate new random number" action for these attributes.

Producing a random number as a suggestion in the diagnostic when leaving the attribute empty (which could be used by cargo fix) seems helpful.

But why does it have to be a random number in the first place? As I mentioned before, this is terrible for code review - if the point is to avoid some kind of collision then it is susceptible to spoofing. A simple incrementing number or string should be sufficient for ABI versioning purposes.

A simple incrementing number or string should be sufficient for ABI versioning purposes.

That is not sufficient. It needs to be globally unique. Otherwise you'd need to (unsafely!) assume that the same (crate) name always refers to the exact same codebase with linear versioning. If a crate is forked and both forks introduce a different new safety invariant on private fields, it would go wrong if both just increment their number, resulting in the same number being used for two ABI-incompatible but identically named types.

In the Rust ecosystem there is no expectation that crate/package names are globally unique. (There exists a lot of Rust code outside of crates.io.) There is no expectation to put e.g. a domain name in the package/crate name to make them unique, like in some other languages.

This is also why I suggested a string instead of a number, it's a lot easier to put necessary disambiguating information in a string compared to a number. But I don't see how this changes the equation really, since it's unsafe either way. The rule is not "it must be globally unique", it is "this safety invariant must exactly match the safety invariant on any other ABI compatible types with the same unsafe_stable_abi which are linked with this crate" which is a much more manageable auditing task.

which are linked with this crate

The problem is that "this crate" is meaningless. Outside crates.io, crate names aren't unique.

We could easily allow a string literal instead of a hash, such that you can write "the first field must be a valid file descriptor and the second field must be a prime number" (and hash that) rather than specifying a random token like b58d7aabb705df024fa578d9a0e20a7c, but I'm not sure if that's much better.

Using GUIDs to uniquely identify something seems pretty standard for many forms of identification related to dynamic linking/FFI in various languages.

Another option would be to specify one random hash/guid for the whole crate at the top level, and then use only simple incremental numbers for the types in the crate. But that has a higher chance of resulting in issues when working on multiple versions/branches, if two branches both bump a number up to the same new number with incompatible safety invariants.

which are linked with this crate

The problem is that "this crate" is meaningless. Outside crates.io, crate names aren't unique.

"This crate" is meaningful, I mean this particular linker object that is linked with other things. This constraint is not something the crate author can check, and the (non)uniqueness of crate names doesn't really matter here. It is the author of the final binary who has the responsibility to not link multiple copies of a crate with the same unsafe crate ABI and different actual semantics, and we can provide some assistance in making it difficult to accidentally break this rule, but we can't check this condition, and using hashes only obscures the problem, it doesn't make it any easier to solve.

We could easily allow a string literal instead of a hash, such that you can write "the first field must be a valid file descriptor and the second field must be a prime number" (and hash that) rather than specifying a random token like b58d7aabb705df024fa578d9a0e20a7c, but I'm not sure if that's much better.

Really? A string with the literal safety comment would be much easier to verify than a random string of digits (of unknown provenance). And since it's free-form, crates can set up their own policies on how to use the field, i.e. use sequential versioning, hierarchical versions, short identifiers, or full safety comments.

How do I know that your UID is actually unique? Maybe the original programmer knows because they used some "generate random number" utility themselves but what about everyone else?

Using GUIDs to uniquely identify something seems pretty standard for many forms of identification related to dynamic linking/FFI in various languages.

Hand validation of GUIDs isn't something anyone does though, and anything which is unsafe basically has a "please validate me" sign on it. Hashing crate names in mangled names is fine since the generation and uniqueness of these IDs is handled by rustc itself, no one has to look at them directly. Putting them in the source code is a completely different ballgame.

Another option would be to specify one random hash/guid for the whole crate at the top level, and then use only simple incremental numbers for the types in the crate.

Certainly you could mix a crate hash into the type's hash, although that would prevent use cases where two types are deliberately sharing the same unsafe crate ABI despite being defined in different crates (because they are actually supposed to be ABI compatible).

stlankes · 2023-05-28T07:28:28Z

Maybe I misunderstood something. Why do you limit the RFC to shared libraries? Static libraries are sometimes also useful. Ok, it is perhaps only a special case. But we develop a library operating systems and link a static library as kernel to the application. Currently, the interface between application and libos is a C interface....

clarfonthey · 2023-05-29T22:59:25Z

So, I have a feeling this was explicitly left out of the RFC, but it feels worth mentioning something about the intended loading mechanism for these dynamic crates.

I think it is (mostly implied) that cargo run will work out-of-the-box even with these settings, although it would be good to mention in the RFC explicitly as well. It would also be useful to clarify if this enables shared storage of dynamically-compiled crates too, although I would understand initially tightening this down and deferring that as well.

Obviously, loading dynamic code is complicated and each operating system has its own way of dealing with things. There are "system-wide" locations as well as "app-specific" locations that will be most suited further down the road. However, I'm not 100% sure how exactly those would fit into this current proposal. The most obvious mechanism currently is that used by cargo install, but the RFC also doesn't say anything about whether that would work with this mechanism. I don't think this should be required for this proposal, but I think it should at least have some sort of opinion on the path there, and what kinds of steps would need to be taken to design that. I don't think end users should have to operate differently to cargo install a crate that's dynamically linked, but perhaps this would also lead to some notion of cargo installing library crates too? It all seems a bit fuzzy.

digama0 · 2023-05-29T23:40:00Z

Why would cargo run do anything like an install / put things in a global location? I would assume that cargo run on an executable with a dynamic dependency would just put the executable in target/debug/myprog and the .dll or .so for the library in target/debug/deps/mylib.so.

clarfonthey · 2023-05-30T02:09:35Z

Why would cargo run do anything like an install / put things in a global location? I would assume that cargo run on an executable with a dynamic dependency would just put the executable in target/debug/myprog and the .dll or .so for the library in target/debug/deps/mylib.so.

This isn't very out-there to consider when you remember that cargo will globally cache crate metadata and information from registries. What's to assume that dynamically built crates wouldn't be put there as well, since they by design could be used across different builds?

For example, I have eight different versions of the serde crate source in my .cargo/registry folder when they could have been copied into individual target folders, but aren't because they wouldn't change across workspaces that use them. The same could happen for dynamic library builds too.

programmerjake · 2023-06-05T01:01:46Z

text/0000-export.md

+or when using a `extern dyn crate …;` statement in the source code,
+only the items marked as `#[export]` will be available and the dependency will be linked dynamically.
+
+### Building Dynamic Dependencies


one issue we'll have with compiling a crate once for making a .so file and again for making a program that links against the .so is that compiling the same code multiple times isn't guaranteed to produce the same output, an extreme example is a proc macro designed to produce random numbers, which then end up as constants in the generated code and/or symbol names.

e.g.: https://crates.io/crates/const-random which is a dependency of the very widely used hashbrown crate

The point of this RFC is that you do not need the exact same crate for creating the .so and for linking to it. (So you can swap compatible versions, compile it with a different compiler or with different settings, etc.) You only need something that is ABI compatible, and having a mismatch does not result in memory unsafety thanks to all relevant safety information being hashed into the symbols.

having a mismatch does not result in memory unsafety thanks to all relevant safety information being hashed into the symbols.

I think there's one point where this does not work: when safety depends on functional properties of upstream code. Imagine crate A exporting a fn make_int() -> i32. Create B re-exports that function. Crate C does something like

if A::make_int() != B::make_int() { unsafe { unreachable_unchecked(); } }

This is perfectly sound as long as B promises that its function always behaves like that of A. Even if an updated A changes the integer returned, things will keep working.

But if we are in a dynamic linking situation, can't it happen that we get one version of the function via B (since maybe it got inlined or whatever) but a different version via A (since maybe we are dynamically linking that one)? Basically exactly the reason we need hashes for types with private fields?

OTOH it seems like there are other issues in that situation (e.g. a static mut in A that would suddenly behave differently because now there are 2 copies of it), so I might also be misunderstanding what exactly is going on during dynamic linking here.

@RalfJung Did you mean to call make_int() there, or compare the function pointers? Obviously unless make_int() is a pure function we would not necessarily be allowed to assume it always returns the same value, even if it was A::make_int() != A::make_int() in the comparison.

Right I was assuming a promise of A that the function is pure.

(It's common for unsafe code to depend on functional properties of safe upstream code like that.)

RalfJung · 2023-06-05T12:00:26Z

text/0000-export.md

+In this case, using the type as part of a function signature will not result in a hash based on the full (recursive) type definition,
+but will instead be based on the user provided hash (and the size and alignment of the type).


This seems potentially problematic if a type with custom invariants has as a field another type with such invariants. Like, imagine a type using Vec but then also adding further invariants. Now whenever the Vec invariant changes it is crucial that this type's invariant also changes! But by not doing any recursive traversal, this will be very hard for authors to ensure. (I assume changing the invariant is considered semver-compatible, at least currently it would be. Then crates using Vec don't have any chance of knowing which of possible several semver-compatible Vecs they will get.)

That's a good point! Thanks!

Then I suppose it should still complete recursive traversal, and hash the provided hash in addition to that to account for the safety invariants.

I think it can still be useful to have a way to fully override the hash, in case of a change in fields/grouping/types that remains ABI compatible. (Changing some u32 to i32, regrouping fields in different inner wrapper structs, etc.) But that should be used very rarely. (I imagine we might want to use that for a few things in the standard library, but not much else.)

Yeah there might be niche uses but it is very dangerous (unless all fields have primitive types whose invariant is never going to change)

m-ou-se · 2023-06-06T10:50:05Z

Maybe I misunderstood something. Why do you limit the RFC to shared libraries? Static libraries are sometimes also useful. Ok, it is perhaps only a special case. But we develop a library operating systems and link a static library as kernel to the application. Currently, the interface between application and libos is a C interface....

That was also brought up here: #3435 (comment):

That's a good point. Perhaps that means we should use slightly different names, but I don't think that changes anything else substantially about the RFC.

RalfJung · 2023-06-06T20:26:43Z

Is it possible to export an unsafe fn?

I think that has some similar issues to structs with invariants: even if the signature of the function stays the same, its safety requirements might change, therefore somehow those requirements have to be encoded into the symbol name.

tgross35 · 2023-10-22T20:36:31Z

A few questions:

Will it be possible to do #[export, no_mangle] (or separate attributes)? To create one interface and link it from both C and Rust, this would only do the "treat dependencies as header files" part (i.e. layout & symbols only, no codegen). Maybe this could be a good minimum viable product since it punts the mangling question that is still under heavy discussion?
1. I would be happy as an even smaller step if rustc with --extern=dynamic=...or equivalent would be able to enable this header-like behavior for all public extern "C" functions even without the #[export] attribute to make all existing Rust cdylibs usable as .so without duplicating interfaces.
2. Based on this, there seems to be two separate concerns blended in this RFC. Concern 1 being, we need this "header file" behavior to have rust-links-rust work ergonomically, even if it goes through the no_mangle + extern "C" that is available today. I don't think #[export] needs to have anything to do with this part. Concern 2 is that we want a way to ensure synchronization of Rust type ABIs via mangling, which I believe is what #[export] will do
How heavily will real-world usage depend on Cargo supporting binary dependencies Tracking Issue for RFC 3028: Allow "artifact dependencies" on bin, cdylib, and staticlib crates cargo#9096? Right now that is kind of a blocker to ergonomic rust shared libraries (at least for my use case)
How does this interact with dylib vs. cdylib? I assume that any prepackaged artifacts generally need to be cdylib in order to ensure stability across different compiler versions, but the interaction with libstd is a bit weird here (the differences between these two interfaces isn't very well documented)
Is this RFC more or less looking for team review at this point? Or are there large changes / blockers expected

There is also some overlap with how we link in dependencies when we create staticlibs (the duplication problem) but this is being worked on separately

Add export proposal.

33968d1

m-ou-se added the T-lang Relevant to the language team, which will review and decide on the RFC. label May 23, 2023

m-ou-se mentioned this pull request May 23, 2023

Experimental feature gate proposal crabi rust-lang/rust#105586

Open

bjorn3 reviewed May 23, 2023

View reviewed changes

Update.

4054f1b

jamesmunns reviewed May 23, 2023

View reviewed changes

text/0000-export.md Show resolved Hide resolved

m-ou-se added 2 commits May 23, 2023 16:42

Updated.

2fb0433

Update.

bb2acd3

jtracey reviewed May 23, 2023

View reviewed changes

Amanieu reviewed May 23, 2023

View reviewed changes

Update.

9b3e6d1

shepmaster reviewed May 23, 2023

View reviewed changes

text/0000-export.md Show resolved Hide resolved

obi1kenobi reviewed May 23, 2023

View reviewed changes

Typo.

4aa6cd9

Co-authored-by: Predrag Gruevski <2348618+obi1kenobi@users.noreply.github.com>

obi1kenobi reviewed May 23, 2023

View reviewed changes

text/0000-export.md Outdated Show resolved Hide resolved

Typo.

61083fb

Co-authored-by: Predrag Gruevski <2348618+obi1kenobi@users.noreply.github.com>

str4d reviewed May 23, 2023

View reviewed changes

text/0000-export.md Show resolved Hide resolved

text/0000-export.md Show resolved Hide resolved

petrochenkov reviewed May 23, 2023

View reviewed changes

epage mentioned this pull request May 24, 2023

Allow defining crate-type for dependencies? rust-lang/cargo#629

Open

Mark-Simulacrum reviewed May 25, 2023

View reviewed changes

riking reviewed May 25, 2023

View reviewed changes

programmerjake reviewed Jun 5, 2023

View reviewed changes

RalfJung reviewed Jun 5, 2023

View reviewed changes

kkysen mentioned this pull request Jun 14, 2023

Do not rewrite ADTs mentioned in extern blocks immunant/c2rust#960

Merged

aidanhs mentioned this pull request Jan 6, 2024

RFC: Allow symbol re-export in cdylib crate from linked staticlib #3556

Open

ojeda mentioned this pull request May 13, 2024

Rust wanted features Rust-for-Linux/linux#354

Open

35 tasks



	Removing `#[export]` from an item that was previously exported is a major breaking change,
	since it removes that item from the stable ABI.


		## Alternatives

		- Alternatives for using a hash of all relevant type information:

		In this case, using the type as part of a function signature will not result in a hash based on the full (recursive) type definition,
		but will instead be based on the user provided hash (and the size and alignment of the type).

RFC: #[export] (dynamically linked crates) #3435

Are you sure you want to change the base?

RFC: #[export] (dynamically linked crates) #3435

Conversation

m-ou-se commented May 23, 2023 • edited by rustbot Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

m-ou-se May 23, 2023 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

sdroege commented May 23, 2023

m-ou-se commented May 23, 2023

sdroege commented May 23, 2023

kornelski commented May 23, 2023

m-ou-se commented May 23, 2023 • edited Loading

m-ou-se commented May 23, 2023 • edited Loading

m-ou-se commented May 23, 2023

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

obi1kenobi left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

petrochenkov May 23, 2023 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

digama0 May 24, 2023 • edited Loading

Choose a reason for hiding this comment

petrochenkov May 24, 2023 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Dante-Broggi commented May 23, 2023

p-avital commented May 24, 2023

Diggsey commented May 24, 2023

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

digama0 May 26, 2023 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

digama0 Jun 6, 2023 • edited Loading

Choose a reason for hiding this comment

stlankes commented May 28, 2023

clarfonthey commented May 29, 2023

digama0 commented May 29, 2023

clarfonthey commented May 30, 2023

programmerjake Jun 5, 2023 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

digama0 Jun 6, 2023 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

RFC: `#[export]` (dynamically linked crates) #3435

RFC: `#[export]` (dynamically linked crates) #3435

m-ou-se commented May 23, 2023 •

edited by rustbot

Loading

m-ou-se May 23, 2023 •

edited

Loading

m-ou-se commented May 23, 2023 •

edited

Loading

m-ou-se commented May 23, 2023 •

edited

Loading

petrochenkov May 23, 2023 •

edited

Loading

digama0 May 24, 2023 •

edited

Loading

petrochenkov May 24, 2023 •

edited

Loading

digama0 May 26, 2023 •

edited

Loading

digama0 Jun 6, 2023 •

edited

Loading

programmerjake Jun 5, 2023 •

edited

Loading

digama0 Jun 6, 2023 •

edited

Loading

tgross35 commented Oct 22, 2023 •

edited

Loading