Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Half of Serde compile time is spent in ElaborateDrops #37106

Closed
dtolnay opened this issue Oct 12, 2016 · 4 comments
Closed

Half of Serde compile time is spent in ElaborateDrops #37106

dtolnay opened this issue Oct 12, 2016 · 4 comments
Labels
P-high High priority T-compiler Relevant to the compiler team, which will review and decide on the PR/issue.

Comments

@dtolnay
Copy link
Member

dtolnay commented Oct 12, 2016

The main serde crate compiles in 10 seconds, 5 of which is ElaborateDrops. Is this expected? Is there anything we can do on our end to reduce this? I don't see any other tickets about this pass in particular.

https://github.com/serde-rs/serde/tree/v0.8.12/serde

$ rustc --version
rustc 1.14.0-nightly (a3bc191b5 2016-10-10)

$ cargo rustc -- -Z time-passes
   Compiling serde v0.8.12
time: 0.024; rss: 55MB  parsing
time: 0.000; rss: 55MB  recursion limit
time: 0.000; rss: 55MB  crate injection
time: 0.000; rss: 55MB  plugin loading
time: 0.000; rss: 55MB  plugin registration
time: 0.199; rss: 97MB  expansion
time: 0.000; rss: 97MB  maybe building test harness
time: 0.002; rss: 97MB  maybe creating a macro crate
time: 0.000; rss: 97MB  checking for inline asm in case the target doesn't support it
time: 0.004; rss: 97MB  complete gated feature checking
time: 0.008; rss: 97MB  early lint checks
time: 0.003; rss: 97MB  AST validation
time: 0.042; rss: 103MB name resolution
time: 0.026; rss: 113MB lowering ast -> hir
time: 0.006; rss: 116MB indexing hir
time: 0.002; rss: 116MB attribute checking
time: 0.003; rss: 98MB  language item collection
time: 0.005; rss: 98MB  lifetime resolution
time: 0.000; rss: 98MB  looking for entry point
time: 0.000; rss: 98MB  looking for plugin registrar
time: 0.021; rss: 103MB region resolution
time: 0.002; rss: 103MB loop checking
time: 0.002; rss: 103MB static item recursion checking
time: 0.110; rss: 103MB compute_incremental_hashes_map
time: 0.000; rss: 103MB load_dep_graph
time: 0.032; rss: 105MB type collecting
time: 0.001; rss: 105MB variance inference
time: 0.039; rss: 109MB coherence checking
time: 0.286; rss: 110MB wf checking
time: 0.093; rss: 110MB item-types checking
time: 1.835; rss: 122MB item-bodies checking
time: 0.000; rss: 122MB drop-impl checking
time: 0.384; rss: 122MB const checking
time: 0.019; rss: 122MB privacy checking
time: 0.002; rss: 122MB stability index
time: 0.017; rss: 122MB intrinsic checking
time: 0.006; rss: 122MB effect checking
time: 0.078; rss: 122MB match checking
time: 0.022; rss: 118MB liveness checking
time: 0.486; rss: 118MB rvalue checking
time: 0.388; rss: 187MB MIR dump
  time: 0.030; rss: 185MB   SimplifyCfg
  time: 0.076; rss: 184MB   QualifyAndPromoteConstants
  time: 0.050; rss: 184MB   TypeckMir
  time: 0.002; rss: 184MB   SimplifyBranches
  time: 0.014; rss: 184MB   SimplifyCfg
time: 0.172; rss: 184MB MIR passes
time: 0.859; rss: 185MB borrow checking
time: 0.007; rss: 185MB reachability checking
time: 0.016; rss: 185MB death checking
time: 0.017; rss: 186MB stability checking
time: 0.000; rss: 186MB unused lib feature checking
time: 0.133; rss: 186MB lint checking
time: 0.000; rss: 186MB resolving dependency formats
  time: 0.001; rss: 186MB   NoLandingPads
  time: 0.014; rss: 186MB   SimplifyCfg
  time: 0.017; rss: 187MB   EraseRegions
  time: 0.006; rss: 187MB   AddCallGuards
  time: 4.833; rss: 204MB   ElaborateDrops
  time: 0.001; rss: 204MB   NoLandingPads
  time: 0.031; rss: 195MB   SimplifyCfg
  time: 0.009; rss: 195MB   InstCombine
  time: 0.006; rss: 195MB   Deaggregator
  time: 0.001; rss: 195MB   CopyPropagation
  time: 0.005; rss: 195MB   AddCallGuards
  time: 0.001; rss: 195MB   PreTrans
time: 4.927; rss: 195MB Prepare MIR codegen passes
  time: 0.141; rss: 200MB   write metadata
  time: 0.016; rss: 201MB   translation item collection
  time: 0.001; rss: 201MB   codegen unit partitioning
  time: 0.001; rss: 203MB   internalize symbols
time: 0.226; rss: 143MB translation
time: 0.000; rss: 143MB assert dep graph
time: 0.000; rss: 143MB serialize dep graph
  time: 0.006; rss: 88MB    llvm function passes [0]
  time: 0.002; rss: 88MB    llvm module passes [0]
  time: 0.078; rss: 94MB    codegen passes [0]
  time: 0.001; rss: 95MB    codegen passes [0]
time: 0.096; rss: 94MB  LLVM passes
time: 0.000; rss: 94MB  serialize work products
time: 0.011; rss: 93MB  linking
    Finished debug [unoptimized + debuginfo] target(s) in 10.71 secs
@dtolnay
Copy link
Member Author

dtolnay commented Oct 12, 2016

Looks like this is largely explained by array and tuple Deserialize impls. We have impls for arrays sized 1 through 32 and tuples sized 1 through 16 that look like this:

$(
    let $name = match try!(visitor.visit()) {
        Some(value) => value,
        None => { return Err(Error::end_of_stream()); }
    };
)+;

try!(visitor.end());

Ok(($($name,)+))

Without the array and tuple impls the total compile time is 4.35 seconds with 1.370 in ElaborateDrops.

I will look into using a loop for the larger array impls. On the Rust side is there maybe an optimization that would handle this pattern better?

@alexcrichton
Copy link
Member

cc @rust-lang/compiler

@alexcrichton alexcrichton added I-nominated T-compiler Relevant to the compiler team, which will review and decide on the PR/issue. labels Oct 12, 2016
@arielb1
Copy link
Contributor

arielb1 commented Oct 13, 2016

callgrind does not show ElaborateDrops as hot (but it does take 10 minutes to run). Odd.

@pnkfelix
Copy link
Member

triage: P-high

@rust-highfive rust-highfive added P-high High priority and removed I-nominated labels Oct 13, 2016
arielb1 added a commit to arielb1/rust that referenced this issue Oct 13, 2016
There used to be only a global cache, which led to uncached calls to
trait selection when there were type parameters.

I'm running a check that there are no adverse performance effects.

Fixes rust-lang#37106 (drop elaboration times are now ~half of borrow checking,
so might still be worthy of optimization, but not critical).
bors added a commit that referenced this issue Oct 15, 2016
add a per-param-env cache to `impls_bound`

There used to be only a global cache, which led to uncached calls to
trait selection when there were type parameters.

This causes a 20% decrease in borrow-checking time and an overall 0.5% performance increase during bootstrapping (as borrow-checking tends to be a tiny part of compilation time).

Fixes #37106 (drop elaboration times are now ~half of borrow checking,
so might still be worthy of optimization, but not critical).

r? @pnkfelix
brson pushed a commit to brson/rust that referenced this issue Nov 3, 2016
There used to be only a global cache, which led to uncached calls to
trait selection when there were type parameters.

I'm running a check that there are no adverse performance effects.

Fixes rust-lang#37106 (drop elaboration times are now ~half of borrow checking,
so might still be worthy of optimization, but not critical).
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
P-high High priority T-compiler Relevant to the compiler team, which will review and decide on the PR/issue.
Projects
None yet
Development

No branches or pull requests

5 participants