-
Notifications
You must be signed in to change notification settings - Fork 11.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
LLVM12 PCH leads to 2x slower compilation times #49638
Comments
We have reproduced the issue on three different machines, all on Windows 10, using LLVM 12.0.0 x86_64-pc-windows-msvc. The issue is observed no matter the C++ project being compiled. We have a compilation pipeline (~140 projects) which using LLVM11 would take around 30minutes to complete with PCH and 60 without PCH. On LLVM12 we observe 60minutes without PCH and ~140minutes when using PCH. |
You should provide some reproducer, testcase. |
I am not really sure how to proceed regarding 'reproducer, testcase'. As as I have said previously, we are seeing this issue with all C++ projects, it's not isolated to a particular one. We have also observed that '-emit-pch' command line argument is broken in LLVM12, calling clang with -e mit-pch We have been able to get around the issue by simply not providing Should I create a separate issue for the 'emit-pch' problem? I can easily provide a sample zip for that. |
Yes, please. |
It should be enough to find one compilation that is slower with a PCH than without and then provide that one as a reproducer. Preprocessing the source with '-E -frewrite-includes' (for both PCH creation and use) should give you all sources as one file. And you could test if using -fpch-instantiate-templates makes a difference for you. If you use clang-cl you'll probably need to prefix these with /Xclang (or whatever the option is exactly). |
Working on getting the reproducer ready. |
Files and powershell script needed for reproducing issue On my i9 9900K machine I get 14.8 seconds on LLVM12, compared to 3.2 seconds using LLVM 11. The compilation script uses the clang++ driver and presumes the existence of an MSVC toolset and Windows 10 SDK, using hardcoded paths. Please update them accordingly. |
I might be seeing the same issue. I also noticed that the resulting codegen is not identical with and without PCH enabled, using clang-cl frontend. When I did a side by side comparison CPU usage trace I saw the extra cycles being spent starting at clang::Sema::ActOnEndOfTranslationUnit going into a deeply recursive clang::Sema::PerformPendingInstantiations <-> clang::Sema::InstantiateFunctionDefinition call cycle. Each entry into InstantiateFunctionDefinition would incur an expensive call to clang::ASTReader::ReadLateParsedTemplates. The compared compilations were identical modulo the usage of the PCH file. Both compilations used /FI to include the pch header. I've attached a screenshot showing the comparsion, the top trace is from the compilation using a PCH, the bottom without. |
More specifically, in the non-PCH case, there do not appear to be any calls to ReadLateParsedTemplates. I can highlight the difference there specifically if desired. The call depth appears to be the same in ActOnEndOfTranslationUnit, at least as far as I've bothered to expand the CPU trace. |
This should be related to |
That flag, or rather, using Using Given the earlier report indicated this is a regression from 11 to 12, I will try a bisect to see if a suspect change can be brought to light. |
Just from quickly looking at the source, your bisecting will most likely point you to 2c9dbcd . Notice how it removes 'LateParsedTemplates.clear();', which means repeated calls of the function are no longer no-ops. |
Confirmeing that the bisect goes to that change (thanks for the pointer!). Was removing the clear() call a bug, so we can add it back, or required to fix the problem the change was addressing? |
I don't know. It seems gargvaibhav64@gmail.com does not have a bugzilla account, so I've sent a direct mail pointing here. |
I think the reason we only see this on Windows is because the default when targeting Windows is to use |
Thanks for the analysis; I've added back the clear() call. Please can someone test with Clang trunk and see if the problem is still visible? |
I've compiled the trunk and after testing on our internal codebase I can confirm the issue is now gone. Compilation is quick & smooth now. Thank you, Richard Smith, for the fix. |
We are very happy with the results of this fix. However, I've noticed that |
Richard, any objections to backporting this? |
Merged: a52b7bf |
mentioned in issue #51489 |
Extended Description
Starting with LLVM12, we are getting a 1.5x - 2x performance hit if we compile our projects using PCH, compared to without PCH with LLVM 12. Exactly the opposite of what one would expect by using precompiled headers.
Compared to LLVM 11, with PCH enabled, we are observing a penalty of 3-4x.
The text was updated successfully, but these errors were encountered: