Half of Serde compile time is spent in ElaborateDrops #37106

dtolnay · 2016-10-12T05:20:04Z

The main serde crate compiles in 10 seconds, 5 of which is ElaborateDrops. Is this expected? Is there anything we can do on our end to reduce this? I don't see any other tickets about this pass in particular.

https://github.com/serde-rs/serde/tree/v0.8.12/serde

$ rustc --version
rustc 1.14.0-nightly (a3bc191b5 2016-10-10)

$ cargo rustc -- -Z time-passes
   Compiling serde v0.8.12
time: 0.024; rss: 55MB  parsing
time: 0.000; rss: 55MB  recursion limit
time: 0.000; rss: 55MB  crate injection
time: 0.000; rss: 55MB  plugin loading
time: 0.000; rss: 55MB  plugin registration
time: 0.199; rss: 97MB  expansion
time: 0.000; rss: 97MB  maybe building test harness
time: 0.002; rss: 97MB  maybe creating a macro crate
time: 0.000; rss: 97MB  checking for inline asm in case the target doesn't support it
time: 0.004; rss: 97MB  complete gated feature checking
time: 0.008; rss: 97MB  early lint checks
time: 0.003; rss: 97MB  AST validation
time: 0.042; rss: 103MB name resolution
time: 0.026; rss: 113MB lowering ast -> hir
time: 0.006; rss: 116MB indexing hir
time: 0.002; rss: 116MB attribute checking
time: 0.003; rss: 98MB  language item collection
time: 0.005; rss: 98MB  lifetime resolution
time: 0.000; rss: 98MB  looking for entry point
time: 0.000; rss: 98MB  looking for plugin registrar
time: 0.021; rss: 103MB region resolution
time: 0.002; rss: 103MB loop checking
time: 0.002; rss: 103MB static item recursion checking
time: 0.110; rss: 103MB compute_incremental_hashes_map
time: 0.000; rss: 103MB load_dep_graph
time: 0.032; rss: 105MB type collecting
time: 0.001; rss: 105MB variance inference
time: 0.039; rss: 109MB coherence checking
time: 0.286; rss: 110MB wf checking
time: 0.093; rss: 110MB item-types checking
time: 1.835; rss: 122MB item-bodies checking
time: 0.000; rss: 122MB drop-impl checking
time: 0.384; rss: 122MB const checking
time: 0.019; rss: 122MB privacy checking
time: 0.002; rss: 122MB stability index
time: 0.017; rss: 122MB intrinsic checking
time: 0.006; rss: 122MB effect checking
time: 0.078; rss: 122MB match checking
time: 0.022; rss: 118MB liveness checking
time: 0.486; rss: 118MB rvalue checking
time: 0.388; rss: 187MB MIR dump
  time: 0.030; rss: 185MB   SimplifyCfg
  time: 0.076; rss: 184MB   QualifyAndPromoteConstants
  time: 0.050; rss: 184MB   TypeckMir
  time: 0.002; rss: 184MB   SimplifyBranches
  time: 0.014; rss: 184MB   SimplifyCfg
time: 0.172; rss: 184MB MIR passes
time: 0.859; rss: 185MB borrow checking
time: 0.007; rss: 185MB reachability checking
time: 0.016; rss: 185MB death checking
time: 0.017; rss: 186MB stability checking
time: 0.000; rss: 186MB unused lib feature checking
time: 0.133; rss: 186MB lint checking
time: 0.000; rss: 186MB resolving dependency formats
  time: 0.001; rss: 186MB   NoLandingPads
  time: 0.014; rss: 186MB   SimplifyCfg
  time: 0.017; rss: 187MB   EraseRegions
  time: 0.006; rss: 187MB   AddCallGuards
  time: 4.833; rss: 204MB   ElaborateDrops
  time: 0.001; rss: 204MB   NoLandingPads
  time: 0.031; rss: 195MB   SimplifyCfg
  time: 0.009; rss: 195MB   InstCombine
  time: 0.006; rss: 195MB   Deaggregator
  time: 0.001; rss: 195MB   CopyPropagation
  time: 0.005; rss: 195MB   AddCallGuards
  time: 0.001; rss: 195MB   PreTrans
time: 4.927; rss: 195MB Prepare MIR codegen passes
  time: 0.141; rss: 200MB   write metadata
  time: 0.016; rss: 201MB   translation item collection
  time: 0.001; rss: 201MB   codegen unit partitioning
  time: 0.001; rss: 203MB   internalize symbols
time: 0.226; rss: 143MB translation
time: 0.000; rss: 143MB assert dep graph
time: 0.000; rss: 143MB serialize dep graph
  time: 0.006; rss: 88MB    llvm function passes [0]
  time: 0.002; rss: 88MB    llvm module passes [0]
  time: 0.078; rss: 94MB    codegen passes [0]
  time: 0.001; rss: 95MB    codegen passes [0]
time: 0.096; rss: 94MB  LLVM passes
time: 0.000; rss: 94MB  serialize work products
time: 0.011; rss: 93MB  linking
    Finished debug [unoptimized + debuginfo] target(s) in 10.71 secs

The text was updated successfully, but these errors were encountered:

dtolnay · 2016-10-12T05:44:21Z

Looks like this is largely explained by array and tuple Deserialize impls. We have impls for arrays sized 1 through 32 and tuples sized 1 through 16 that look like this:

$(
    let $name = match try!(visitor.visit()) {
        Some(value) => value,
        None => { return Err(Error::end_of_stream()); }
    };
)+;

try!(visitor.end());

Ok(($($name,)+))

Without the array and tuple impls the total compile time is 4.35 seconds with 1.370 in ElaborateDrops.

I will look into using a loop for the larger array impls. On the Rust side is there maybe an optimization that would handle this pattern better?

alexcrichton · 2016-10-12T05:57:40Z

cc @rust-lang/compiler

arielb1 · 2016-10-13T18:56:01Z

callgrind does not show ElaborateDrops as hot (but it does take 10 minutes to run). Odd.

pnkfelix · 2016-10-13T20:07:08Z

triage: P-high

There used to be only a global cache, which led to uncached calls to trait selection when there were type parameters. I'm running a check that there are no adverse performance effects. Fixes rust-lang#37106 (drop elaboration times are now ~half of borrow checking, so might still be worthy of optimization, but not critical).

@pnkfelix

add a per-param-env cache to `impls_bound` There used to be only a global cache, which led to uncached calls to trait selection when there were type parameters. This causes a 20% decrease in borrow-checking time and an overall 0.5% performance increase during bootstrapping (as borrow-checking tends to be a tiny part of compilation time). Fixes #37106 (drop elaboration times are now ~half of borrow checking, so might still be worthy of optimization, but not critical). r? @pnkfelix

There used to be only a global cache, which led to uncached calls to trait selection when there were type parameters. I'm running a check that there are no adverse performance effects. Fixes rust-lang#37106 (drop elaboration times are now ~half of borrow checking, so might still be worthy of optimization, but not critical).

alexcrichton added I-nominated T-compiler Relevant to the compiler team, which will review and decide on the PR/issue. labels Oct 12, 2016

dtolnay mentioned this issue Oct 12, 2016

Deserialize big arrays using a loop serde-rs/serde#579

Closed

rust-highfive added P-high High priority and removed I-nominated labels Oct 13, 2016

arielb1 mentioned this issue Oct 13, 2016

add a per-param-env cache to impls_bound #37152

Merged

bors closed this as completed in #37152 Oct 16, 2016

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Half of Serde compile time is spent in ElaborateDrops #37106

Half of Serde compile time is spent in ElaborateDrops #37106

dtolnay commented Oct 12, 2016

dtolnay commented Oct 12, 2016

alexcrichton commented Oct 12, 2016

arielb1 commented Oct 13, 2016

pnkfelix commented Oct 13, 2016

Half of Serde compile time is spent in ElaborateDrops #37106

Half of Serde compile time is spent in ElaborateDrops #37106

Comments

dtolnay commented Oct 12, 2016

dtolnay commented Oct 12, 2016

alexcrichton commented Oct 12, 2016

arielb1 commented Oct 13, 2016

pnkfelix commented Oct 13, 2016