Literally initializing [&str; N] (using proc macro) doesn't scale, rustc consumes huge amounts of time and RAM (stable and nightly) #90445

Closed
twelho opened this issue Oct 31, 2021 · 1 comment · Fixed by #90637
Assignees
Labels
C-bug Category: This is a bug.

Comments

twelho commented Oct 31, 2021

Hi Rust folks,

I'm using a procedural macro to read and preprocess some strings to be stored in an application without hitting the disk at any intermediate point. The macro outputs a TokenStream for a literal fixed-size array of string slices. When scaling up the number of strings to a moderately high count (100k), I noticed something strange: compiling the application suddenly takes orders of magnitude more time and makes the compiler consume almost 40 GiB of RAM in the process. Here's an MWE, for both the proc macro crate lib and the consumer binary bin:

lib/Cargo.toml:

[package]
name = "lib"
version = "0.1.0"
edition = "2021"

# See more keys and their definitions at https://doc.rust-lang.org/cargo/reference/manifest.html

[lib]
proc-macro = true

[dependencies]

lib/src/lib.rs:

use proc_macro::TokenStream;

fn parse_count(stream: TokenStream) -> u32 {
    stream.to_string().parse().unwrap()
}

#[proc_macro]
pub fn generate_data(input: TokenStream) -> TokenStream {
    let mut s = String::from("[");
    for i in 0..parse_count(input) {
        if i > 0 {
            s.push_str(", ");
        }
        s.push_str(&format!("\"{}\"", i));
    }
    s.push_str("]");
    s.parse().unwrap()
}

bin/Cargo.toml:

[package]
name = "bin"
version = "0.1.0"
edition = "2021"

# See more keys and their definitions at https://doc.rust-lang.org/cargo/reference/manifest.html

[dependencies]
lib = {path = "../lib"}

bin/src/main.rs:

use lib::generate_data;

fn main() {
    let data = generate_data!(100000);
    println!("{}", data.len());
}

Compiling the above takes many minutes, compared to just a few seconds when the issue does not occur (see cases below). The memory consumption also peaks very high (the graph is truncated at the start; there is a slow increase over many minutes):

[image: graph of rustc memory consumption during the build]

This issue is present on both the latest stable and nightly (see versions used below), in both debug and release builds. In my project I have lto = true and opt-level = "z" as well, but neither helps. Running RUSTFLAGS="-Z time-passes" cargo build on nightly reveals that most of the time is spent in MIR borrow checking:

...
time: 110.766; rss:  161MB ->  623MB ( +462MB)  MIR_borrow_checking
...

I have also verified that it is not the macro that causes the hang by using proc-macro-error to emit a warning just as the macro is about to return the final TokenStream. The warning is emitted right away, before memory hogging begins.

From my testing, with the same element count, the issue does not occur when:

  • The array is defined using a reference, i.e. the proc macro returns a reference to an array (&[...]).
  • The stored type in the array is something other than &str; compilation finishes in seconds with e.g. u32.

This leads me to believe that the references of the &str types are at fault, but I don't have enough experience in rustc internals to say anything more than that something doesn't scale (quadratic time?).
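The first case above (returning a reference to the array) can be sketched outside the proc-macro machinery as a plain string builder. This is only an illustration: the function name `array_source` and the `by_ref` flag are hypothetical and not part of the MWE. In the actual macro, the resulting string would be parsed into a TokenStream the same way `generate_data` does.

```rust
/// Builds the source text for an array literal of `count` string literals.
/// `array_source` and `by_ref` are hypothetical names used for illustration.
/// With `by_ref = true` the literal is prefixed with `&`, so the generated
/// expression has type `&[&str; N]` instead of `[&str; N]`.
fn array_source(count: u32, by_ref: bool) -> String {
    let mut s = String::new();
    if by_ref {
        s.push('&');
    }
    s.push('[');
    for i in 0..count {
        if i > 0 {
            s.push_str(", ");
        }
        // Each element is the decimal index wrapped in double quotes.
        s.push_str(&format!("\"{}\"", i));
    }
    s.push(']');
    s
}
```

Inside the proc macro, `array_source(n, true).parse().unwrap()` would replace the hand-built string. The consumer's `data.len()` call should keep working unchanged, since method calls auto-dereference the reference.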

Meta

rustc +stable --version --verbose:

rustc 1.56.0 (09c42c458 2021-10-18)
binary: rustc
commit-hash: 09c42c45858d5f3aedfa670698275303a3d19afa
commit-date: 2021-10-18
host: x86_64-unknown-linux-gnu
release: 1.56.0
LLVM version: 13.0.0

rustc +nightly --version --verbose:

rustc 1.58.0-nightly (e249ce6b2 2021-10-30)
binary: rustc
commit-hash: e249ce6b2345587d6e11052779c86adbad626dff
commit-date: 2021-10-30
host: x86_64-unknown-linux-gnu
release: 1.58.0-nightly
LLVM version: 13.0.0

Backtrace

No backtrace, compilation does eventually succeed given enough time and memory.

@twelho twelho added the C-bug (Category: This is a bug.) label Oct 31, 2021
@twelho twelho changed the title from "Literally initializing [&str; N] (using proc macro) doesn't scale, rustc consumes huge amounts of time and RAM" to "Literally initializing [&str; N] (using proc macro) doesn't scale, rustc consumes huge amounts of time and RAM (stable and nightly)" Oct 31, 2021
@Mark-Simulacrum Mark-Simulacrum self-assigned this Nov 1, 2021
Contributor

nbdd0121 commented Nov 1, 2021

Probably duplicate of #86244

bors added a commit to rust-lang-ci/rust that referenced this issue Nov 24, 2021
…atthewjasper

Optimize live point computation

This refactors the live-point computation to lower per-MIR-instruction costs by operating on a largely per-block level. This doesn't fundamentally change the number of operations necessary, but it greatly improves the practical performance by aggregating bit manipulation into ranges rather than single-bit; this scales much better with larger blocks.

On the benchmark provided in rust-lang#90445, with 100,000 array elements, walltime for a check build is improved from 143 seconds to 15.

I consider the tiny losses here acceptable given the many small wins on real world benchmarks and large wins on stress tests. The new code scales much better, but on some subset of inputs the slightly higher constant overheads decrease performance somewhat. Overall though, this is expected to be a big win for pathological cases (as illustrated by the test case motivating this work) and largely not material for non-pathological cases. I consider the new code somewhat easier to follow, too.
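The idea of aggregating bit manipulation into ranges, as described in the commit message above, can be illustrated with a small standalone bit set. This is a sketch under stated assumptions, not rustc's actual data structure: the `BitSet` type and its method names are hypothetical. It contrasts setting a run of bits one at a time with setting the same run a whole 64-bit word at a time.

```rust
/// A minimal fixed-size bit set backed by 64-bit words.
/// Hypothetical illustration; not the type used inside rustc.
struct BitSet {
    words: Vec<u64>,
}

impl BitSet {
    fn new(bits: usize) -> Self {
        BitSet { words: vec![0; (bits + 63) / 64] }
    }

    /// Sets every bit in `start..end` individually: one update per bit.
    fn insert_each(&mut self, start: usize, end: usize) {
        for i in start..end {
            self.words[i / 64] |= 1u64 << (i % 64);
        }
    }

    /// Sets every bit in `start..end` with one update per 64-bit word.
    fn insert_range(&mut self, start: usize, end: usize) {
        if start >= end {
            return;
        }
        let first = start / 64;
        let last = (end - 1) / 64;
        // Bits at or above `start` within the first word.
        let lo_mask = !0u64 << (start % 64);
        // Bits at or below `end - 1` within the last word.
        let hi_mask = !0u64 >> (63 - (end - 1) % 64);
        if first == last {
            self.words[first] |= lo_mask & hi_mask;
        } else {
            self.words[first] |= lo_mask;
            // Words strictly between the first and last are fully covered.
            for w in &mut self.words[first + 1..last] {
                *w = !0;
            }
            self.words[last] |= hi_mask;
        }
    }

    fn contains(&self, i: usize) -> bool {
        self.words[i / 64] & (1u64 << (i % 64)) != 0
    }
}
```

Both methods produce the same set, but `insert_range` touches each word once regardless of how many bits of it are covered, which is the kind of saving that matters when a single basic block spans a very large number of points.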
@bors bors closed this as completed in cfa3fe5 Dec 31, 2021