"llvm-profdata merge" doesn't work in 13.0 #50966

Open
berolinux opened this issue Aug 25, 2021 · 16 comments
Labels
bugzilla Issues migrated from bugzilla confirmed Verified by a second party llvm-tools All llvm tools that do not have corresponding tag

Comments

@berolinux

Bugzilla Link 51624
Version trunk
OS Linux
Blocks #50580 #51489
CC @tstellar

Extended Description

Doing something along the lines of
export LLVM_PROFILE_FILE=xyz-%p.profile.d
export CFLAGS="-O2 -fprofile-instr-generate"
export CXXFLAGS="-O2 -fprofile-instr-generate"
./configure
make

Run the generated binaries in some expected ways, e.g. "make check"

llvm-profdata merge --output=xyz.profile xyz-*.profile.d

consistently (obviously with PIDs varying) results in
warning: xyz-670250.profile.d: malformed instrumentation profile data
warning: xyz-670257.profile.d: malformed instrumentation profile data
error: no profile can be merged

The *.profile.d files look ok at a first glance, and "file" recognizes them as "LLVM raw profile data, version 7".

It looks like only "llvm-profdata merge" is broken; using -fprofile-instr-use= seems to be OK.

This is a regression from 12.x.
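
One way to narrow this down is to probe each raw profile on its own, so that a single rejected file does not hide the rest. The sketch below assumes that `llvm-profdata merge` exits non-zero when its only input is rejected, as the error output above suggests. `merge_readable` and the `PROFDATA` override are hypothetical names introduced for illustration, not part of LLVM.

```shell
#!/bin/sh
# Sketch: probe each raw profile separately, then merge only the accepted ones.
# PROFDATA is a hypothetical override; it defaults to llvm-profdata on PATH.
PROFDATA="${PROFDATA:-llvm-profdata}"

# merge_readable OUTPUT FILE...
# Prints "bad: FILE" for every profile the tool rejects, then merges the rest.
merge_readable() {
    out=$1
    shift
    n=$#
    for f in "$@"; do
        # Merging one file into a scratch output serves as a validity probe.
        if "$PROFDATA" merge -o "$out.probe" "$f" >/dev/null 2>&1; then
            set -- "$@" "$f"    # append the accepted file to the argument list
        else
            echo "bad: $f"
        fi
    done
    shift "$n"                  # drop the original list, keep only accepted files
    rm -f "$out.probe"
    [ "$#" -gt 0 ] || { echo "no readable profiles" >&2; return 1; }
    "$PROFDATA" merge -o "$out" "$@"
}
```

With the names from this report, the call would be `merge_readable xyz.profile xyz-*.profile.d`. If every file is reported as bad, the problem is more likely in how the profiles are written than in any single run.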

@slacka
Mannequin

slacka mannequin commented Sep 25, 2021

Can we get a bisect on this?

@tstellar
Collaborator

mentioned in issue #51489

@llvmbot llvmbot transferred this issue from llvm/llvm-bugzilla-archive Dec 11, 2021
@asl asl added this to the LLVM 13.0.1 release milestone Dec 12, 2021
@chfast
Member

chfast commented Dec 18, 2021

I have the same problem even without running multiple processes. This only happens on my CI server, not locally.

@tstellar
Collaborator

The deadline for requesting fixes for the release has passed. This bug is being removed from the LLVM 13.0.1 release milestone. If you have a fix or think this bug is important enough to block the release, please explain why in a comment and add the bug back to the LLVM 13.0.1 release milestone.

@tstellar tstellar removed this from the LLVM 13.0.1 release milestone Dec 22, 2021
@chfast
Member

chfast commented Dec 27, 2021

I have identified the core cause of the problem. The malformed data is generated for programs running with shared libraries (both for shared libs linked directly and for those loaded with dlopen()) when both the main binary and a shared library are instrumented.

I'm not sure whether this was intended, but this scenario worked with clang 12.

The workaround (or proper fix?) is to add %m to LLVM_PROFILE_FILE. The %m will expand to different identifiers for the main binary and the shared library.
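
As a minimal sketch of that workaround, the helper below inserts `-%m` into an existing pattern so that the `xyz-*.profile.d` glob from the original report still matches. `with_module_signature` is a hypothetical helper, not an LLVM tool. Per the Clang coverage documentation, `%m` also switches the runtime to its online merging mode, so the resulting files may differ from plain per-process dumps.

```shell
#!/bin/sh
# with_module_signature PATTERN
# Prints PATTERN with "-%m" inserted before the first dot of the file name,
# unless a %m (or %Nm) specifier is already present.
with_module_signature() {
    dir=
    name=$1
    case $1 in
        */*) dir=${1%/*}/; name=${1##*/} ;;    # split off the directory part
    esac
    case $name in
        *%m*|*%[0-9]m*) ;;                               # already has a signature
        *.*) name="${name%%.*}-%m.${name#*.}" ;;         # insert before first dot
        *)   name="$name-%m" ;;                          # no extension: append
    esac
    printf '%s\n' "$dir$name"
}

LLVM_PROFILE_FILE=$(with_module_signature "xyz-%p.profile.d")
export LLVM_PROFILE_FILE    # xyz-%p-%m.profile.d
```

After rerunning the instrumented binaries with this setting, the merge step is unchanged: `llvm-profdata merge --output=xyz.profile xyz-*.profile.d`.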

@aminya
Member

aminya commented Aug 6, 2022

I also have the same problem. I run my program multiple times with different arguments, and I have written a hash function to hash the arguments passed to the function.

In my case, the generated profraw is empty if I use something like data-%my_call_hash-%p.profraw, but if I add %m to the name, like data-%my_call_hash-%p-%m.data, the .profraw is not generated at all. Removing %p doesn't make a difference.

It gives no error on Windows, but on Linux, it gives some error about out-of-bounds count (I can provide the exact error if you want).

> The workaround (or proper fix?) is to add %m to LLVM_PROFILE_FILE. The %m will expand to different identifiers for the main binary and the shared library.

Yes, I have a shared library that is not instrumented. Adding %m doesn't work for me as I mentioned above.

@EugeneZelenko EugeneZelenko added the llvm-tools All llvm tools that do not have corresponding tag label Aug 6, 2022
@KevinHake

KevinHake commented Sep 14, 2023

As far as I can tell, this remains broken in every version since 13 (my own project gets this error on every version from 16.0.5 down, so we're stuck using LLVM 12.0.1 for now).

@Endilll Endilll added the confirmed Verified by a second party label Sep 14, 2023
@KevinHake

I did a bisect; the first bad commit is e50a388.

@Endilll
Contributor

Endilll commented Sep 19, 2023

CC @gulfemsavrun

@gulfemsavrun
Contributor

Thanks for the report. Is there a small reproducer that you can provide?

@aminya
Member

aminya commented Sep 21, 2023

@KevinHake How did you test your bisect? That could be a good reproduction.

@KevinHake

I haven't had time to try to create a minimal project that has the same issue. Nor have I made a hello world to see if the most basic executable actually works (though I imagine the automated tests would've caught that long ago if it were broken). We're loading a few other DLLs for graphics that were not compiled with clang; not sure if that's related.

The default.profraw output between good and bad versions didn't look obviously wrong (nor parse-able at all by eye), but I might try llvm-profdata merge in the debugger to see if it gives a better idea what triggers it to give up.

@gulfemsavrun
Contributor

Is it possible for you to provide the profiles (*.profraw files), so I can take a look?

@KevinHake

It's my client's code, I don't have rights to share unfortunately. I'll look into paring it down into something minimal, it probably won't take long.

@KevinHake

Here is the small reproducer:

int main() { return 0; }

My environment:

Operating System: Microsoft Windows 10 Enterprise, v10.0.19045 N/A Build 19045

Ninja v1.11.1 (https://github.com/ninja-build/ninja/releases)
- copied to C:\Program Files\Ninja\ninja.exe
CMake v3.27.6 (https://cmake.org/download/)
- installed to C:\Program Files\CMake\bin\cmake.exe
Visual Studio 2022 - used to build different versions of LLVM, using this bat script run from dir llvm-project/build/ :

@echo off

path C:\Program Files\Ninja;%PATH%
path C:\Program Files\CMake\bin;%PATH%

call "C:\Program Files\Microsoft Visual Studio\2022\Professional\VC\Auxiliary\Build\vcvars64.bat"

REM last working commit
set LLVMVER=LLVM-2021-07-23-120b187
REM first bad commit:
REM set LLVMVER=LLVM-2021-07-23-e50a388

echo Configuring LLVM build:
cmake --version
cmake -G Ninja -DLLVM_ENABLE_PROJECTS=clang;lld;libc -DLLVM_ENABLE_RUNTIMES= -DCMAKE_INSTALL_PREFIX=c:\src\llvm\%LLVMVER% -DCMAKE_BUILD_TYPE=Release ../llvm

rem Looks like some kind of concurrency bug makes the pool fail when getting a new piece to compile... this loop worked without giving up on threads
echo Building %LLVMVER%
:build
cmake --build .
if %ERRORLEVEL% neq 0 (
    echo Build failed! Let's try again...
    goto build
)

echo Build finished successfully!

echo Installing...
cmake --install .

endlocal
pause

It took ~1hr to build each version (my install dir c:\src\llvm\ has several LLVM versions, including the last good and first bad commits)

I build the simple reproducer this way:

%LLVM_BIN_DIR%\clang-cl.exe  -fprofile-instr-generate -fcoverage-mapping /c /Fomain.obj main.c
%LLVM_BIN_DIR%\lld-link.exe main.obj

Here %LLVM_BIN_DIR% points to the installed binaries for a particular LLVM commit.

I then reproduce the issue by running main.exe, which produces default.profraw, and then running %LLVM_BIN_DIR%\llvm-profdata merge *.profraw -o default.profdata
The profdata merge works normally when using the LLVM build from commit 120b187, and fails in the next commit e50a388.

@KevinHake

In the RFC for e50a388 (https://lists.llvm.org/pipermail/llvm-dev/2021-June/151154.html), it sounds like the new build ID was targeting ELF binaries? Given that it seems to break coverage for even the simplest PE/COFF binary on Windows, it's worth checking whether the same occurs for Mach-O on a Mac (I don't have a Mac). For that matter, I haven't tested on Linux with an ELF file either.

If I'm not the only one that repros this with int main(){return 0;} that seems like a pretty easy automated test for each platform.
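
A per-platform check along those lines could look like the POSIX shell sketch below. It is an adaptation, not the Windows steps above: it uses the `clang` driver rather than `clang-cl` and `lld-link`, and nobody in this thread has confirmed whether the trivial program fails on Linux or macOS. `profile_smoke_test` and the `CC` and `PROFDATA` overrides are hypothetical names.

```shell
#!/bin/sh
# Sketch: build, run, and merge the trivial reproducer in a scratch directory.
# CC and PROFDATA are hypothetical overrides for picking a specific LLVM build.
CC="${CC:-clang}"
PROFDATA="${PROFDATA:-llvm-profdata}"

# Returns 0 if the merge succeeds, 2 if setup fails (scratch dir, compile, run),
# and otherwise the exit status of the merge step.
profile_smoke_test() {
    work=$(mktemp -d) || return 2
    status=0
    (
        cd "$work" || exit 2
        printf 'int main() { return 0; }\n' > main.c
        "$CC" -fprofile-instr-generate -fcoverage-mapping main.c -o main || exit 2
        # Running the binary writes the raw profile named by LLVM_PROFILE_FILE.
        LLVM_PROFILE_FILE=default.profraw ./main || exit 2
        "$PROFDATA" merge default.profraw -o default.profdata
    ) || status=$?
    rm -rf "$work"
    return "$status"
}
```

Pointing `CC` and `PROFDATA` at the binaries of one installed commit and calling `profile_smoke_test` gives a pass/fail result for that commit, which is the shape of check that could be run against both 120b187 and e50a388.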

9 participants