Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add pre-pass to look for Type.GetType strings #1150

Closed
wants to merge 3 commits into from

Conversation

MichalStrehovsky
Copy link
Member

Linker has a design limitation that makes it impossible to add new assemblies to the closure when MarkStep is running. This is problematic because in reflection lightup it's common to use Type.GetType to access assemblies that might not have been referenced otherwise. To be able to analyze this, a global prepass is needed to find all assemblies that could possibly be referenced.

This is similar to the prepass done for PreserveDependencyAttribute and has similar problems (we're looking at everything, which means we could be looking at Type.GetType strings that are not actually reachable from what MarkStep is going to look at).

This is done with the expectation that adding a new assembly to the closure is not going to be more harmful than an existing hard AssemblyRef to an assembly that ends up not being needed by any of the code that survives trimming.

We'll need to overhaul these global pre-passes after .NET 5.

The new pass is pretty fast and mostly within noise range.

For an empty blazor app, it finds 52 strings that look like Type.GetType, out of which 37 are legitimate matches. The rest is things like "x, y", or "Width, Height".

Linker has a design limitation that makes it impossible to add new assemblies to the closure when MarkStep is running. This is problematic because in reflection lightup it's common to use Type.GetType to access assemblies that might not have been referenced otherwise. To be able to analyze this, a global prepass is needed to find all assemblies that could possibly be referenced.

This is similar to the prepass done for PreserveDependencyAttribute and has similar problems (we're looking at everything, which means we could be looking at Type.GetType strings that are not actually reachable from what MarkStep is going to look at).

This is done with the expectation that adding a new assembly to the closure is not going to be more harmful than an existing hard AssemblyRef to an assembly that ends up not being needed by any of the code that survives trimming.

We'll need to overhaul these global pre-passes after .NET 5.
@MichalStrehovsky
Copy link
Member Author

Opened as a draft because it's causing problem for tests.

System.Private.CoreLib has a Type.GetType string that references System.Runtime.WindowsRuntime. We now correctly detect this and add the assembly to the closure. This then blows up because we don't have a Windows.winmd referenced from S.R.WindowsRuntime...

@mrvoorhe
Copy link
Contributor

If you're following the pattern of preserve dependencies, you may want to expand your test coverage. For example, will this step also run into the same limitation as https://github.com/mono/linker/blob/master/test/Mono.Linker.Tests.Cases/PreserveDependencies/PreserveDependencyMethodInNonReferencedAssemblyChainedReference.cs ?

@MichalStrehovsky
Copy link
Member Author

If you're following the pattern of preserve dependencies, you may want to expand your test coverage. For example, will this step also run into the same limitation as

Yep, I'm just digging through your pull request #343 and found the tests as well. It's hitting the Windows,winmd issue you saw in the past. I think once I apply your Windows.winmd fix, we can also unblock these tests.

@mrvoorhe
Copy link
Contributor

I'm not an expert on windows runtime, but I know we have logic in our linker & our fork of cecil in order to support windows runtime.

For example we have this in our fork of cecil : jbevain/cecil#394

If you want any of our code let me know.

@MichalStrehovsky
Copy link
Member Author

I know we have logic in our linker & our fork of cecil in order to support windows runtime.

Thanks! Hopefully this won't be needed. If this start looking like a rat's nest, I'll cross my fingers and hope that dotnet/runtime#35318 delivers on its promises ("We will be removing the existing WinRT interop system from the .NET runtime (and any other associated components) as part of .NET 5.0.") because that would mean that System.Runtime.WindowsRuntime can no longer reference Windows.winmd (by the time the linker sees it, at least).

@marek-safar
Copy link
Contributor

I'd like to question if this approach is actually necessary for .net5 (it might be but I'm not sure at this point). We'll support reflection safe linking for BCL code only why cannot we simply require DynamicDependencyAttribute for cases where Type.GetType is a constant which included assembly name?

Secondly, adding SRM to read data in one step and use Cecil for everything else is just a hack. What is missing in Cecil for your needs and can you fix Cecil instead of adding a new dependency?

@MichalStrehovsky
Copy link
Member Author

We'll support reflection safe linking for BCL code only why cannot we simply require DynamicDependencyAttribute for cases where Type.GetType is a constant which included assembly name?

I thought the plan is to also allow turning it on for customers - and requiring customers to duplicate all Type.GetType strings in a DynamicDependencyAttribute feels like an anti-user solution ("the string is right there - why can't linker just do this for me?"). To be consistent, we would have to require the DynamicDependency whenever there's Type.GetType, or a store into a [DynamicallyAccessedMembers]-annotated location of type System.String - because the target of GetType is not always guaranteed to be included in the closure.

What is missing in Cecil for your needs and can you fix Cecil instead of adding a new dependency?

Cecil keeps the user string heap as an internal implementation detail that is not exposed anywhere: https://github.com/jbevain/cecil/blob/eea822cad4b6f320c9e1da642fcbc0c129b00a6e/Mono.Cecil.Metadata/UserStringHeap.cs#L13-L35. It also doesn't have a way to walk the heap. Yes, using SRM is a bit of a hack, but probably the best thing in this situation (even if we expose this in Cecil, as you can see Cecil makes an extra byte[] copy of the entire string heap that feels very much unnecessary - Cecil is not the best tool for "let's do a quick scan for interesting things").

Note I'm looking at all of this as a temporary solution. I find the fact that we need to run these global pre-passes on things that we don't even know if they're needed dissatisfactory and would like to propose an overhaul once .NET 5 is done and we have some more cycles. It would address most of the known perf problems mentioned in #918.

@marek-safar
Copy link
Contributor

I thought the plan is to also allow turning it on for customers

I'm not sure why do you think so but I don't we will be anywhere near to support that and it'd break many existing apps if enabled. It might be possible for .NET6 though.

Cecil keeps the user string heap as an internal implementation detail that is not exposed anywhere

I don't see that as an issue. You make PR which adds the capability if it's rejected we can look for the alternative.

Note I'm looking at all of this as a temporary solution.

We'll ship that and support for years to come.

@MichalStrehovsky
Copy link
Member Author

I don't see that as an issue. You make PR which adds the capability if it's rejected we can look for the alternative.

I don't think it's possible to support API like this in an efficient way in Cecil - what we want is an API to enumerate all string in the string heap of a module.

String heap concept logically doesn't exist in Cecil's object model - Cecil doesn't keep track of all strings in the mutable object model (e.g. if I have a LDSTR instruction that I created using Cecil's object model APIs, this is represented as a reference to a System.String instance, not as an offset into a heap within Cecil's object model).

Cecil works with the string heap as part of hydrating the object model from a file. Cecil can also create the string heap as it walks data structures as part of the file save operation.

But a string enumeration API would basically need to simulate a file save operation and look for all unique strings referenced from places like LDSTR. At that point we might as well just walk the method bodies ourselves without building Cecil features.

@MichalStrehovsky
Copy link
Member Author

Should we revive this pull request? I don't think we'll get to redoing TypeMapStep anytime soon and this problem has multiple hits in CoreLib alone.

@MichalStrehovsky
Copy link
Member Author

Closing in favor of lazy loading.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants