Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Linker fails when resolving type reference in sweep step #2260

Open
vitek-karas opened this issue Sep 6, 2021 · 10 comments
Open

Linker fails when resolving type reference in sweep step #2260

vitek-karas opened this issue Sep 6, 2021 · 10 comments

Comments

@vitek-karas
Copy link
Member

This was caused by #2234.

The problem is that first we call SweepAssemblyReferences on pretty much all assemblies. This will eventually call into AssemblyReferenceCorrector.UpdateScopeOfTypeReference which call LinkContext.TryResolve. In the failing case the type reference first points to "netstandard". It gets resolved to "System.Collections" and the type reference instance is overwritten to point to that. And the TryResolve cache remembers that this typeref points to "System.Collection" type-def.

Later on - we call ProcessAssemblyAction which in turn calls SweepAssembly on "System.Collection", this will remove the type-def from the assembly (not marked). This means that the type-def's Scope is reset to 'null'.

Now we eventually call ProcessAssemblyAction on the assembly which contained the type-ref. This will again end up calling AssemblyReferenceCorrector.UpdateScopeOfTypeReference and TryResolve on the same type-ref. Now the result is cached, so it returns the type-def which is now pointing to null scope. We try to "import" this type-def into the target assembly, and this fails, since it doesn't know where type type-def comes from.

Before the change, the TryResolve was a direct call to Resolve. In the first time around - it would return the same value (type-def). But second time around, the type-ref doesn't resolve anymore, since the type it points to has been removed, to Resolve returned null - and SweepStep has handling for this case - it instead resolves the assembly ref of the type-ref - which now points to "System.Collection" (was fixed up during the first pass) - so it imports that scope and leaves the type-ref pointing to that scope. This means it never actually relies on the type-ref to resolve - which it won't. In this case it's writing an unresolvable type-ref.

@vitek-karas
Copy link
Member Author

I'll have to dig into this deeper. The immediate fix can be to go back to calling Resolve in this place.
But this uncovers a more interesting question of how come that it is writing an unresolvable type-ref. All the assemblies involved are processed in "link" mode, so this should not really happen (the type-ref comes from a field-ref used in a method body).

vitek-karas added a commit to vitek-karas/linker that referenced this issue Sep 7, 2021
This uncovers a null ref in sweepstep.
The test is order dependent:
- Assembly (librarywithnonemtpy) has a virtual method which referes to type from second library (library). This processes the type reference.
- The type is removed from library
- The virtual method is re-processed again -> hits a type def which does not belong to any assembly/scope

For this to happen the virtual method must be processed after the type has been removed.
This also requires the method to be meant for removal later on (the body will be rewritten to throw), since otherwise the type would have been marked.
This also requires the library with the removed type to be kept (so something else in it must be kept).
@vitek-karas
Copy link
Member Author

The full problem description

Setup
Base.dll - library which declares a Base type which defines an abstract VMethod
Utils.dll - library with several types, contains RemovedType and UsedType
Implementation.dll - library which provides an Implementation with VMethod. The body of VMethod references RemovedType and it's the only reference to that type in the app.

The assemblies are loaded by the linker in this order (this is easy to achieve).

MarkStep

  • The uses Base and Implementation types. But it never instantiates Implementation. It does call Base.VMethod`.
  • MarkStep will mark Base, Implementation and Base.VMethod.
  • It will not see any direct reference to Implementation.VMethod. Since Implementation is not instantiated it will also try to remove the method's body
  • The body of Implementation.VMethod is not processed (marked) - it will be removed
  • Because of this RemovedType is not marked either.

SweepStep

  • First we call SweepAssemblyReferences on all assemblies (order doesn't matter)
    • This will see the body of Implementation.VMethod and will process the type-ref to RemovedType. This will populate the ref->def cache on LinkContext and resolve it to a type-def of RemovedType from the Utils.dll.
  • Then we call SweepAssembly(Utils.dll) (order does matter here)
    • This will see that RemovedType is not marked and will remove the type-def from the assembly. Cecil will set the Scope of the type-def to null (it doesn't belong to any assembly anymore)
  • Then we call SweepAssembly(Implementation.dll) (this needs to happen after SweepAssembly(Utils.dll) to trigger the bug)
    • This will eventually call SweepAssemblyReferences which will process all type-refs.
    • It will find the Implementation.VMethod and process the type-ref to RemovedType.
    • The resolution of that type-ref now goes through the context cache, which simple returns the cached type-def for RemoveType - but this type doesn't belong to any assembly anymore, so the type-def is effectively "broken" and Cecil will soon fail on it when we try to use it.

Before #2234 the type resolution didn't go through the context cache. So the second resolution of the type-ref for RemovedType did the full resolution algo and on the second try it would not find a matching type-def (it has been removed from the assembly). SweepStep has code to handle that case (originally meant for copyused assemblies) and thus it won't crash.

Additional notes:

  • If it doesn't crash this creates a type-ref which points to non-existing type
  • This is by design for copyused assemblies, but it should not happen for link assemblies
  • In this case it does happen even for link assemblies, BUT
  • After SweepStep we run the CopyRewriterStep which will actually rewrite the body of Implementation.VMethod to just a throw - and thus remove the "broken" type-ref from the output.
  • The final IL doesn't contain broken type-refs, but they appear in the intermediate state during SweepStep - this can break Cecil.

@vitek-karas
Copy link
Member Author

I don't know the right solution yet. I'm tempted to say that we should run the CoreRewriterStep before SweepStep, but I don't know if that's doable (and what is the reason we don't do it already).

There's still a question of copyused assemblies - in the new behavior after #2234 it's likely they can also run into the same problem - but for those there's no rewriter which will remove the broken type-refs. We actually want to keep the broken type-refs in that case.

This is also related to another issue with Trimming+R2R. In the copyused mode, we produce broken type-refs, but R2R can't process these - the R2R compiler fails on them.

@vitek-karas
Copy link
Member Author

I've added a (disabled) test for this issue (as a simple repro): #2263

vitek-karas added a commit to vitek-karas/linker that referenced this issue Sep 7, 2021
Basically a workaround for dotnet#2260. This simply goes back to using the `Resolve` method and avoid the cache for now.
@vitek-karas
Copy link
Member Author

Workaround is here: #2264

@sbomer
Copy link
Member

sbomer commented Sep 7, 2021

Thanks for investigating this @vitek-karas! My read of your notes is that after sweeping types in ProcessAssemblyAction, the type resolve cache is effectively invalidated (at least for unused typerefs), so we should stop using it after that point. I think going back to cecil's Resolve here is the right thing to do in that case.

We do still use the cache later in CodeRewriterStep but I think we should only be using it there for typerefs that resolve to marked types.

Whether CodeRewriterStep should run before SweepStep is a great question. I would be worried that it could leave unused typerefs in the assembly metadata (that aren't referenced by any methods) that then don't get fixed up by SweepStep.

vitek-karas added a commit that referenced this issue Sep 8, 2021
This uncovers a null ref in sweepstep.
The test is order dependent:
- Assembly (librarywithnonemtpy) has a virtual method which referes to type from second library (library). This processes the type reference.
- The type is removed from library
- The virtual method is re-processed again -> hits a type def which does not belong to any assembly/scope

For this to happen the virtual method must be processed after the type has been removed.
This also requires the method to be meant for removal later on (the body will be rewritten to throw), since otherwise the type would have been marked.
This also requires the library with the removed type to be kept (so something else in it must be kept).
vitek-karas added a commit to vitek-karas/linker that referenced this issue Sep 8, 2021
Basically a workaround for dotnet#2260. This simply goes back to using the `Resolve` method and avoid the cache for now.
vitek-karas added a commit that referenced this issue Sep 8, 2021
Basically a workaround for #2260. This simply goes back to using the `Resolve` method and avoid the cache for now.
@vitek-karas
Copy link
Member Author

The workaround (back to original behavior) is merged now.

Discussed with @sbomer, outcome:

  • Currently SweepStep performs type-ref rewrite twice.
    • Once before any sweeping where we remove any type-refs to type forwards and replace them with type-refs to the resolved types (implementation assemblies). This is necessary to be able to sweep type forwarders and potentially entire facades.
    • Second time after we sweep each assembly. This is done to sweep assembly refs from that assembly.
  • The second pass should not need to actually fixup any type-refs, it only needs to collect the list of references assemblies

Proposal

  • Change the AssemblyReferenceCorrector to refactor out the part which can walk all type-refs in an assembly
  • Use this "type-ref walker" in the first walk to rewrite type-refs to point to implementation assemblies. In this case there's no need to "ImportReference" anything, we just resolve the type-ref, and if it resolves we update the type-ref to point to the resolved type-def directly.
  • Use this "type-ref walker" in the second walk to determine list of used assembly refs
    • Currently we remove all assembly refs, and the walk type-refs and "ImportReference" the assembly refs back into the assembly
    • The new behavior should be to just walk the type-refs and build a hashset of all used assembly refs
    • Then go and remove the unused assembly refs from the Cecil model - avoid removing used assembly refs

Going forward consider doing also:

  • During actual sweeping, avoid scanning the graph for anything - the graph is broken until all assemblies are swept. This means we can't sweep assembly refs during the sweep, we should do it AFTER all type/method sweeping is done.
  • Ideally avoid any IL/metadata modifications after the SweepStep - since it could invalidate the sweeping. This specifically means moving CoreRewriterStep before SweepStep.

@sbomer
Copy link
Member

sbomer commented Sep 8, 2021

Use this "type-ref walker" in the first walk to rewrite type-refs to point to implementation assemblies. In this case there's no need to "ImportReference" anything

I think it'll still need to ImportReference of the resolved typedef in case it's in an unreferenced assembly, just not ImportReference to fix up unresolved typerefs - or is the suggestion to do that in the second phase?

During actual sweeping, avoid scanning the graph for anything - the graph is broken until all assemblies are swept. This means we can't sweep assembly refs during the sweep, we should do it AFTER all type/method sweeping is done.

I think you are saying that with the proposed approach, walking the types/methods won't visit assemblyrefs, so we need to separately sweep them in the second step. And before that's done, the graph will be in an inconsistent state - is that right? What did you mean by "avoid scanning the graph"?

@vitek-karas
Copy link
Member Author

I think it'll still need to ImportReference of the resolved typedef in case it's in an unreferenced assembly, just not ImportReference to fix up unresolved typerefs - or is the suggestion to do that in the second phase?

It might be necessary to do this, but only in cases where the type-ref point to type-forwarder, which is unresolvable. We would still like to remove the type-forwarder in this case, but typeref.Resolve won't help as it won't resolve it through to the unresolvable reference. I actually don't know how this should work, we would have to play with it a bit. For type-refs which directly point to unresolvable assemblies, there should be no need to call ImportReference.

What did you mean by "avoid scanning the graph"?

Sorry, bad choice of words from me. It's probably easier to say that during sweeping the graph is inconsistent - so it should be treated as such. Meaning that for example we should not be using it to figure out unused assembly refs.

@sbomer
Copy link
Member

sbomer commented Sep 9, 2021

It might be necessary to do this, but only in cases where the type-ref point to type-forwarder, which is unresolvable.

Sorry, I meant cases where the type-forwarder is resolvable, but points to an assembly that wasn't already referenced by the assembly that contains the typeref.

We would still like to remove the type-forwarder in this case, but typeref.Resolve won't help as it won't resolve it through to the unresolvable reference.

Good point - I guess today we just keep the original typeref pointing at the removed forwarder, but it might be better to update it to point directly at the target assembly.

agocke pushed a commit to dotnet/runtime that referenced this issue Nov 16, 2022
This uncovers a null ref in sweepstep.
The test is order dependent:
- Assembly (librarywithnonemtpy) has a virtual method which referes to type from second library (library). This processes the type reference.
- The type is removed from library
- The virtual method is re-processed again -> hits a type def which does not belong to any assembly/scope

For this to happen the virtual method must be processed after the type has been removed.
This also requires the method to be meant for removal later on (the body will be rewritten to throw), since otherwise the type would have been marked.
This also requires the library with the removed type to be kept (so something else in it must be kept).

Commit migrated from dotnet/linker@43fb979
agocke pushed a commit to dotnet/runtime that referenced this issue Nov 16, 2022
Basically a workaround for dotnet/linker#2260. This simply goes back to using the `Resolve` method and avoid the cache for now.

Commit migrated from dotnet/linker@a9888c2
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

3 participants