Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Update BOLT #87

Merged
merged 18 commits into from
Oct 28, 2020
Merged

Update BOLT #87

merged 18 commits into from
Oct 28, 2020

Conversation

shintaro-iwasaki
Copy link
Collaborator

To 90a9f97cbda3bef63d9866d300b73b8ccf65c7f5

jprotze and others added 18 commits October 7, 2020 13:31
The worker thread can start execution of the task before creation of the second task
Fixes the spurious failure reported in https://reviews.llvm.org/D61657

cherry-pick: 34b34e90fc3299debfda4add0e277f59b0a699da
llvm/llvm-project@34b34e9
The test disables suppression and therefore sometimes triggers a know false
positive in the openmp runtime. The test should only verify that the env
var is handles as expected.

cherry-pick: 69f87400a85e13482c535365bb19272a15d054b9
llvm/llvm-project@69f8740
RequiresDataSharing was always 0, resulting dead code in device runtime library.

Reviewed By: jdoerfert, JonChesterfield

Differential Revision: https://reviews.llvm.org/D88829

cherry-pick: 3a12ff0dac5ab4f0e1f446abe66b451c1df8dac1
llvm/llvm-project@3a12ff0
…cture by Default

Summary:
This patch changes the CMake files for Clang and Libomptarget to query the
system for its supported CUDA architecture. This makes it much easier for the
user to build optimal code without needing to set the flags manually. This
relies on the now deprecated FindCUDA method in CMake, but full support for
architecture detection is only availible in CMake >3.18

Reviewers: jdoerfert ye-luo

Subscribers: cfe-commits guansong mgorny openmp-commits sstefan1 yaxunl

Tags: #clang #OpenMP

Differential Revision: https://reviews.llvm.org/D87946

cherry-pick: d564409946a5a13cb6391fc0fec54dcbd6f6d249
llvm/llvm-project@d564409
[libomptarget][amdgcn] Implement partial barrier

named_sync is used to coordinate non-spmd kernels. This uses bar.sync on nvptx.
There is no corresponding ISA support on amdgcn, so this is implemented using
shared memory, one word initialized to zero.

Each wave increments the variable by one. Whichever wave is last is responsible
for resetting the variable to zero, at which point it and the others continue.

The race condition on a wave reaching the barrier before another wave has
noticed that it has been released is handled with a generation counter, packed
into the same word.

Uses a shared variable that is not needed on nvptx. Introduces a new hook,
kmpc_impl_target_init, to allow different targets to do extra initialization.

Reviewed By: jdoerfert

Differential Revision: https://reviews.llvm.org/D88602

cherry-pick: 8b6cd15242673c04618fb0aafc07d5de9e0bbe1e
llvm/llvm-project@8b6cd15
[openmp][libomptarget] Include header from LLVM source tree

The change is to the amdgpu plugin so is unlikely to break anything.

The point of contention is whether libomptarget can depend on LLVM.
A community discussion was cautiously not opposed yesterday.

This introduces a compile time dependency on the LLVM source tree, in this case
expressed as skipping the building of the plugin if LLVM_MAIN_INCLUDE_DIR is not
set. One the source files will #include llvm/Frontend/OpenMP/OMPGridValues.h,
instead of copy&pasting the numbers across.

For users that download the monorepo, the llvm tree is already on disk. This will
inconvenience users who download only the openmp source as a tar, as they would
now also have to download (at least a file or two) from the llvm source, if they want
to build the parts of the openmp project that (post this patch) depend on llvm.

There was interest expressed in going further - using llvm tools as part of
building libomp, or linking against llvm libraries. That seems less clear cut
an improvement and worthy of further discussion. This patch seeks only to change
policy to support openmp depending on the llvm source tree. Including in the
other direction, or using libraries / tools etc, are purposefully out of scope.

Reviewers are a best guess at interested parties, please feel free to add others

Reviewed By: jdoerfert

Differential Revision: https://reviews.llvm.org/D87841

cherry-pick: 7d2ecef5ed11698ae106bfbf295c44d761c7f946
llvm/llvm-project@7d2ecef
…s a struct member.

This patch fixes a problem whereby the pointee object of a PTR_AND_OBJ entry with a `map(to)` motion clause can be overwritten on the device even if its reference counter is >=1.

Currently, we check the reference counter of the parent struct in order to determine whether the motion clause should be respected, but since the pointee object is not part of the struct, it's got its own reference counter which should be used to enqueue the copy or discard it.

The same behavior has already been implemented in targetDataEnd (omptarget.cpp:539-540), but we somehow missed doing the same in targetDataBegin.

Differential Revision: https://reviews.llvm.org/D89597

cherry-pick: 5adb3a6d86eecade2cb94b1a04d35e673d4e5866
llvm/llvm-project@5adb3a6
[libomptarget][amdgcn] Implement missing symbols in deviceRTL

Malloc, wtime are stubs. Malloc needs a hostrpc implementation which is
a work in progress, wtime needs some experimentation to find out the
multiplier to get a time in seconds as documentation is scarce.

Reviewed By: ronlieb

Differential Revision: https://reviews.llvm.org/D89725

cherry-pick: d27b39ce11629f8742a487f9d1d2a343756d0da7
llvm/llvm-project@d27b39c
The calls to atmi_memcpy presently determine the direction of copy (host to
device or device to host) by storing pointers in a map during malloc and
looking up the pointers during memcpy. As each call site already knows the
direction, this stash+lookup can be eliminated.

This NFC will be followed by a functional one that deletes those map lookups.

Reviewed By: JonChesterfield

Differential Revision: https://reviews.llvm.org/D89776

Change-Id: I1d9089bc1e56b3a9a30e334735fa07dee1f84990

cherry-pick: aa616efbb34e6321c0f24f61e017efdcf398ba04
llvm/llvm-project@aa616ef
[libomptarget][amdgcn] Refactor memcpy to eliminate maps

Builds on D89776 to remove now dead code.

Reviewed By: pdhaliwal

Differential Revision: https://reviews.llvm.org/D89888

cherry-pick: 55dc123555dbb3e7ca717c1ecc0ab6cd934bdee5
llvm/llvm-project@55dc123
[libomptarget] Require LLVM source tree to build libomptarget

This is to permit reliably #including files from the LLVM tree in libomptarget,
as an improvement on the copy and paste that is currently in use. See D87841
for the first example of removing duplication given this new requirement.

The weekly openmp dev call reached consensus on this approach. See also D87841
for some alternatives that were considered. In the future, we may want to
introduce a new top level repo for shared constants, or start using the ADT
library within openmp.

This will break sufficiently exotic build systems, trivial fixes as below.

Building libomptarget as part of the monorepo will continue to work.
If openmp is built separately, it now requires a cmake macro indicating
where to find the LLVM source tree.

If openmp is built separately, without the llvm source tree already on disk,
the build machine will need a copy of a subset of the llvm source tree and
the cmake macro indicating where it is.

Reviewed By: protze.joachim

Differential Revision: https://reviews.llvm.org/D89426

cherry-pick: 26790ed248870a1e293e844945bf677825a43084
llvm/llvm-project@26790ed
`size_t` has different width on 32- and 64-bit architecture, but the
computation to floor to power of two assumed it is 64-bit, which can cause an
integer overflow. In this patch, architecture detection is added so that the
operation for 64-bit `size_t`. Thank Luke for reporting the issue.

Reviewed By: jdoerfert

Differential Revision: https://reviews.llvm.org/D89878

cherry-pick: 3091ed099f2f6a3d16dbdae7d0406f54dfc3031f
llvm/llvm-project@3091ed0
Patch by Erdner, Todd <todd.erdner@intel.com>

Differential Revision: https://reviews.llvm.org/D89898

cherry-pick: d6a0957467e86d5a87964d45fae18733e212c86f
llvm/llvm-project@d6a0957
The implementation of target nowait just wraps the target region into a task. The essential four parameters (base ptr, ptr, size, mapper) are taken as firstprivate such that they will be copied to the private location. When there is no user-defined mapper, the mapper variable will be nullptr. However, it will be still copied to the corresponding place. Therefore, a memcpy will be generated and the source pointer will be nullptr, causing a segmentation fault. The root cause is when calling `emitOffloadingArraysArgument`, the last argument `Options` has a field about whether it requires a task. It only takes depend clause into account. In this patch, the nowait clause is also included.

There're two things that will be done in another patches:
1. target data nowait has not been supported yet. D90099 added the support.
2. When there is no mapper, the mapper array can be nullptr no matter whether it requires outer task or not. It can avoid an unnecessary data copy. This is an optimization that is covered in D90101.

Reviewed By: jdoerfert

Differential Revision: https://reviews.llvm.org/D89844

cherry-pick: e20d64c3d9d81cad701f31d8481367222c76c787
llvm/llvm-project@e20d64c
Summary:
This patch adds support for passing in the original delcaration name in the
source file to the libomptarget runtime. This will allow the runtime to provide
more intelligent debugging messages. This patch takes the original expression
parsed from the OpenMP map / update clause and provides a textual
representation if it was explicitly mapped, otherwise it takes the name of the
variable declaration as a fallback. The information in passed to the runtime in
a global array of strings that matches the existing ident_t source location
strings using ";name;filename;column;row;;". See
clang/test/OpenMP/target_map_names.cpp for an example of the generated output
for a given map clause.

Reviewers: jdoervert

Differential Revision: https://reviews.llvm.org/D89802

cherry-pick: a87d7b3d448a16e416d1980b9d6aea99e4c9900b
llvm/llvm-project@a87d7b3
Summary:
This patch adds basic support for priting the source location and names for the
mapped variables. This patch does not support names for custom mappers. This is
based on D89802. The names information currently will be printed out only in
debug mode or using env LIBOMPTARGET_INFO during execution. But the information
is added when availible to the Device and Private data structures. To get the
information out the code must be built with debug symbols on using -g or
-Rpass=openmp-opt

Reviewers: jdoerfert

Differential Revision: https://reviews.llvm.org/D90172

cherry-pick: d981c7b7581efc3ef378709042100e75da0185a0
llvm/llvm-project@d981c7b
…API"

This reverts commit d981c7b7581efc3ef378709042100e75da0185a0 and
a87d7b3d448a16e416d1980b9d6aea99e4c9900b. Test fails under msan.

cherry-pick: 207cf71fa9ce117051c73fe308e8e434f3c84c9c
llvm/llvm-project@207cf71
@shintaro-iwasaki
Copy link
Collaborator Author

The coverage of BOLT is the same as that of the official LLVM OpenMP: https://jenkins-pmrs.cels.anl.gov/job/bolt-llvmproj-review-centos/7/.

I will merge this PR.

@shintaro-iwasaki shintaro-iwasaki merged commit f96b9e7 into pmodels:main Oct 28, 2020
@shintaro-iwasaki shintaro-iwasaki deleted the bolt-ddec63 branch October 30, 2020 15:32
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

9 participants