Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Update BOLT #79

Merged
merged 95 commits into from
Oct 26, 2020
Merged

Conversation

shintaro-iwasaki
Copy link
Collaborator

This patch upgrades BOLT by merging upstream commits.
2.x branch, which will be based on LLVM OpenMP 11.0 will be diverged right after this PR.

JonChesterfield and others added 30 commits April 26, 2020 12:30
Summary:
[libomptarget] Implement smid for amdgcn

Implementation is in a new file as it uses an intrinsic with
complicated encoding that warranted substantial comments.

Reviewers: jdoerfert, grokos, ABataev, ronlieb

Reviewed By: jdoerfert

Subscribers: jvesely, mgorny, openmp-commits

Tags: #openmp

Differential Revision: https://reviews.llvm.org/D72956

cherry-pick: 03c2a59cd696135d79528d39e8e82ee59c1fcf97
llvm/llvm-project@03c2a59
- pthread affinity np has different semantic than sched affinity counterpart. On success returns strictly 0.

Reviewers: chandlerc, AndreyChurbanov, jdoerfert

Reviewed By: jdoerfert

Differential Revision: https://reviews.llvm.org/D72132

cherry-pick: ea99c09963488130ec0a61ef39df3fd0fcecad3c
llvm/llvm-project@ea99c09
…rong number of devices, by Shiley Tian.

Summary:
This patch is to fix issue in the following simple case:

  #include <omp.h>
  #include <stdio.h>

  int main(int argc, char *argv[]) {
    int num = omp_get_num_devices();
    printf("%d\n", num);

    return 0;
  }

Currently it returns 0 even devices exist. Since this file doesn't contain any
target region, the host entry is empty so further actions like initialization
will not be proceeded, leading to wrong device number returned by runtime
function call.

Reviewers: jdoerfert, ABataev, protze.joachim

Reviewed By: ABataev

Subscribers: protze.joachim

Tags: #openmp

Differential Revision: https://reviews.llvm.org/D72576

cherry-pick: 9148b8b734e7279c86a7a75883efdfdf48e8d148
llvm/llvm-project@9148b8b
Summary:
[nfc][libomptarget] Remove SHARED annotation from local variables

A few local variables in reduction.cu were marked SHARED. This patch leaves
all per-kernel global state localised in omp_data.cu.

Reviewers: ABataev, jdoerfert, grokos

Reviewed By: jdoerfert

Subscribers: openmp-commits

Tags: #openmp

Differential Revision: https://reviews.llvm.org/D73239

cherry-pick: 0e9374e3740f82f2d46af564f6b059e68ff307c8
llvm/llvm-project@0e9374e
The OpenMP spec defines the OMP_ATK_* and OMP_ATV_* to be lowercase.

Differential Revision: https://reviews.llvm.org/D73248

cherry-pick: ad24cf2a942068e5bcdda3fbe58c084715266cf3
llvm/llvm-project@ad24cf2
This fixed build failures due to missing ompt headers.

See https://bugs.gentoo.org/700762.

Differential Revision: https://reviews.llvm.org/D73249

cherry-pick: 3c545e4b7318c337bed43d5bc76aad040565f1ef
llvm/llvm-project@3c545e4
…bles"

This reverts commit 0e9374e3740f82f2d46af564f6b059e68ff307c8.
Revert D73239. It fails some local testing, cause presently unknown

cherry-pick: ab9762a9f574207e1dbf2d82a13f68b0b434fc39
llvm/llvm-project@ab9762a
Fixes [[ https://bugs.llvm.org/show_bug.cgi?id=44733 | TEST 'libomp :: ompt/synchronization/reduction/tree_reduce.c' FAILED on 32-bit x86 ]]

For 32-bit we need at least 3 variables to avoid atomic reduction to be
choosen by runtime function `__kmp_determine_reduction_method`.
This patch adds reduction variables to the testcase.

Reviewers: mgorny, Hahnfeld

Differential Revision: https://reviews.llvm.org/D73850

cherry-pick: 90e4ebdce55fd3c1f8877f19784a5339b9890f98
llvm/llvm-project@90e4ebd
Summary: [libomptarget] Implement wavefront functions for amdgcn

Reviewers: jdoerfert, ABataev, grokos, arsenm

Reviewed By: arsenm

Subscribers: saiislam, wdng, arsenm, jvesely, openmp-commits

Tags: #openmp

Differential Revision: https://reviews.llvm.org/D73077

cherry-pick: 6a82f0f0b9d1f1f0ea1d1614b11e5b11bfcb9870
llvm/llvm-project@6a82f0f
…stsuite

EXCLUDE_FROM_ALL means something else for add_lit_testsuite as it does
for something like add_executable. Distinguish between the two by
renaming the variable and making it an argument to add_lit_testsuite.

Differential revision: https://reviews.llvm.org/D74168

cherry-pick: 4fe839ef3a51e0ea2e72ea2f8e209790489407a2
llvm/llvm-project@4fe839e
Reviewed By: JonChesterfield

Differential Revision: https://reviews.llvm.org/D74258

cherry-pick: 3ff4e2eee8c39fd4ae13b1df412f0af219fbf382
llvm/llvm-project@3ff4e2e
…ple kernels can be executed concurrently

Reviewed By: jdoerfert

Differential Revision: https://reviews.llvm.org/D74145

cherry-pick: a5153dbc368e1a484e179fafce7a260410d20569
llvm/llvm-project@a5153db
Change the string from "Intel(R) OMP" to "LLVM OMP" in libomp.so

Differential Revision: https://reviews.llvm.org/D74462

cherry-pick: 4f1f2b7a5b573bc9a2480c7607f02c2e4cbfb2be
llvm/llvm-project@4f1f2b7
Summary:
[libomptarget][nfc] Change enum values to match those in cuda/rtl

support.h and cuda/rtl.cpp (and downsteam hsa/rtl.cpp) have enums for execution
mode. These are actually independent - the numbers that used within support, or
within the plugin, are never passed across the boundary.

Nevertheless, trying to work out why the values are different between the two
has generated a reasonable amount of confusion. This patch changes support to
match the values in plugin, on the basis that the plugin also has some comments
which I'd have to update if I changed that one instead. Credit to Ron for
working through this in our own fork. See ROCm/aomp/issues/7
for that earlier diagnostic write up.

Also happy with generic = 0, spmd = 1 - provided it's the same in both places.

Reviewers: jdoerfert, grokos, ABataev, ronlieb

Reviewed By: grokos

Subscribers: openmp-commits

Tags: #openmp

Differential Revision: https://reviews.llvm.org/D74503

cherry-pick: c2ce9ea4e372ee31271da44c727dc598e12261a5
llvm/llvm-project@c2ce9ea
…ec description.

Summary: The 5.0 spec states, "The omp_get_max_threads routine returns an upper bound on the number of threads that could be used to form a new team if a parallel construct without a num_threads clause were encountered after execution returns from this routine." The attached test shows Max Threads: 96, Num Threads: 128 without the proposed change. The number of threads should not exceed the (max) nthreads ICV, hence we should return the higher SPMD thread number even when omp_get_max_threads() is called in a generic kernel. This change does fail the api test, max_threads.c, because now it would return 64 instead of 32.

Reviewers: jdoerfert, ABataev, grokos, JonChesterfield

Reviewed By: jdoerfert

Subscribers: openmp-commits

Tags: #openmp

Differential Revision: https://reviews.llvm.org/D74092

cherry-pick: 190a11148b756e0b650ad9c5b6cf5314e9afdd0a
llvm/llvm-project@190a111
cherry-pick: 578c13d13c42d20ada6ccadacccdf814caaf43b7
llvm/llvm-project@578c13d
From the context, it looks like the test should not be run with `check-all`,
but it does. It turns out option argument resolving to True/False which
could not be passed down as is. There is one such example in
AddLLVM.cmake.

cherry-pick: c2c4f1c1202ab5d66b3cf4b1d9769f3a8572d321
llvm/llvm-project@c2c4f1c
…sor to run

Team tests seem to require at least two physical cores, and using the same trick
as in https://reviews.llvm.org/D55598 doesn't work (why?) .
Using lit configuration instead.

Differential Revision: https://reviews.llvm.org/D74921

cherry-pick: 99b03c1c18de3e4228e31ef04d38f2d530d335be
llvm/llvm-project@99b03c1
From https://secure.phabricator.com/book/phabricator/article/arcanist_new_project/:

> An .arcconfig file is a JSON file which you check into your project's root.

I've done some experimentation, and it looks like the subproject
.arcconfigs just get ignored, as the documentation says. Given that
we're fully on the monorepo now, it's safe to remove them.

Differential Revision: https://reviews.llvm.org/D74996

cherry-pick: e34ddc09f464667b5f3a49bf60255e89041ddf44
llvm/llvm-project@e34ddc0
Differential Revision: https://reviews.llvm.org/D75001

cherry-pick: e16e267bb6ee8dd923bf328fd277d8e637eb34c6
llvm/llvm-project@e16e267
…alization of libomptarget.

Summary:
Instead of using global variables with unpredicted time of
deinitialization, use dynamically allocated variables with functions
explicitly marked as global constructor/destructor and priority. This
allows to prevent the crash because of the incorrect order of dynamic
libraries deinitialization.

Reviewers: grokos, hfinkel

Subscribers: caomhin, kkwli0, openmp-commits

Tags: #openmp

Differential Revision: https://reviews.llvm.org/D74837

cherry-pick: 63cef621f954eb87c494021725f4eeac89132d16
llvm/llvm-project@63cef62
…nce the warning from gcc.

Summary: fixed the warning from gcc since prios 0-100 are reserved for the internal use.

Reviewers: grokos

Subscribers: kkwli0, caomhin, openmp-commits

Tags: #openmp

Differential Revision: https://reviews.llvm.org/D75458

cherry-pick: c4a9d976c139f4dc7241809aa965fbc5355314fc
llvm/llvm-project@c4a9d97
Patch by Michael Klemm.

Differential Revision: https://reviews.llvm.org/D74956

cherry-pick: 95df6747cf1978aa1732d8b93af585e34cec4987
llvm/llvm-project@95df674
…sics

Summary:
[libomptarget] Implement hip atomic functions in terms of intrinsics

All but atomicInc can be implemented using type generic clang intrinsics.
There is not yet a corresponding intrinsic for atomicInc in clang, only one in
LLVM. This patch leaves atomicInc as an unresolved symbol.

Reviewers: jdoerfert, ABataev, hfinkel, grokos, arsenm

Reviewed By: arsenm

Subscribers: sri, saiislam, wdng, jvesely, mgorny, jfb, openmp-commits

Tags: #openmp

Differential Revision: https://reviews.llvm.org/D73076

cherry-pick: 133db44996a80db957360a0b727479d4a8d4d1a2
llvm/llvm-project@133db44
Summary:
[libomptarget][nfc] Move GetWarp/LaneId functions into per arch code

No code change for nvptx. Amdgcn currently has two implementations of GetLaneId,
this patch keeps the one a colleague considered to be superior for our ISA.

GetWarpId is currently the same function for amdgcn and nvptx, but I think it's
cleaner to keep it grouped with all the others than to keep it in support.cu.

Reviewers: jdoerfert, grokos, ABataev

Reviewed By: jdoerfert

Subscribers: jvesely, openmp-commits

Tags: #openmp

Differential Revision: https://reviews.llvm.org/D75587

cherry-pick: 918a1065be642a3d5f804c95d7971c2d1b96cdf5
llvm/llvm-project@918a106
Summary:
[libomptarget] Implement locks for amdgcn

The nvptx implementation deadlocks on amdgcn. atomic_cas with multiple
active lanes can deadlock - if one lane succeeds, all the others are locked
out. The set_lock implementation therefore runs on a single lane.

Also uses a sleep intrinsic instead of the system clock for a probably
minor performance improvement. The unset/test implementations may be revised
later, based on code size / performance or similar concerns.

This implements the lock at a per-wavefront scope. That's not strictly as
specified, since openmp describes locks in terms of threads. I think the
nvptx implementation provides true per-thread locking on volta and the same
per-warp locking on other architectures.

Reviewers: jdoerfert, ABataev, grokos

Reviewed By: jdoerfert

Subscribers: jvesely, mgorny, jfb, openmp-commits

Tags: #openmp

Differential Revision: https://reviews.llvm.org/D75546

cherry-pick: 221ada654b28a524d01dc70ec16d38e0f2484f78
llvm/llvm-project@221ada6
cherry-pick: f0689d2e6209a6c404ebf85ba4f1f98b0e22f8a8
llvm/llvm-project@f0689d2
…load.

Summary: Explicitly initialize data members of RTLsTy class upon construction.

Reviewers: grokos

Subscribers: guansong, openmp-commits, caomhin, kkwli0

Tags: #openmp

Differential Revision: https://reviews.llvm.org/D75946

cherry-pick: c422d69b1ad7ae3fdbe0d4ec795a2931e08459f7
llvm/llvm-project@c422d69
shiltian and others added 23 commits October 7, 2020 13:30
…_info"

This reverts commit ee1bf45e1d42d7f386d8321c3a8799476344ad91.

cherry-pick: aaf50adb539dc2504123486448a2c6f28f2c2c72
llvm/llvm-project@aaf50ad
Adds OMPT support for the mutexinoutset dependency

Reviewed by: hbae

Differential Revision: https://reviews.llvm.org/D81890

cherry-pick: 63a3c5925dc97160283d97cab1859315c45c114c
llvm/llvm-project@63a3c59
cherry-pick: 73b7ff4e16c6c806adca57998cf4144a3be59c49
llvm/llvm-project@73b7ff4
Summary:
lookupMapping took significant time due to linear complexity searching.
This is bad for offloading from multiple host threads because lookupMapping is protected by mutex.
Use std::set for logarithmic complexity searching.

Before my change.
libomptarget inclusive time 16.7 sec, exclusive time 8.6 sec.
After the change
libomptarget inclusive time 7.3 sec, exclusive time 0.4 sec.

Most of the overhead of libomptarget (exclusive time) is gone.

Reviewers: jdoerfert, grokos

Reviewed By: grokos

Subscribers: tianshilei1992, yaxunl, guansong, sstefan1

Tags: #openmp

Differential Revision: https://reviews.llvm.org/D82264

cherry-pick: 6e5f64c44f26465829e5aed38e900129ace6e64f
llvm/llvm-project@6e5f64c
Summary:
`config.test_extra_flags` is passed in from `lit.site.cfg.in` files, but they're not used in the LIT configs. This variable can be useful for distros which don't have the standard c/c++ headers in the default search paths. Since the tests run clang on c/c++ source code, we rely on `test_extra_flags` to pass in the necessary header files.

This is a similar setup that's also done in litomptarget https://github.com/llvm/llvm-project/blob/master/openmp/libomptarget/test/lit.cfg#L42 and openmp/runtime.

Reviewers: jdoerfert, jdenny, protze.joachim

Reviewed By: jdoerfert

Subscribers: yaxunl, guansong, sstefan1, openmp-commits

Tags: #openmp

Differential Revision: https://reviews.llvm.org/D82516

cherry-pick: 1eaebe192f08934527e741ba65d29c933d685680
llvm/llvm-project@1eaebe1
Reviewed by: runlieb

Differential Revision: https://reviews.llvm.org/D82452

cherry-pick: d4230c67bf8a2b4f9c4d80898f54fd00db719ae9
llvm/llvm-project@d4230c6
Summary: Warnings are printed by clang when building LIBOMPTARGET_ENABLE_DEBUG=ON due incorrect format string.

Reviewers: tianshilei1992, jdoerfert

Reviewed By: tianshilei1992

Subscribers: yaxunl, guansong, sstefan1, openmp-commits

Tags: #openmp

Differential Revision: https://reviews.llvm.org/D82789

cherry-pick: 45bb073da8efc8652f7b8b351709ce0c838ed4b0
llvm/llvm-project@45bb073
When configuring in-tree, the correct names are LLVM_VERSION_MAJOR
and LLVM_VERSION_MINOR. This has been wrong since the code was added
in commits fc473dee98 and 821649229e.

cherry-pick: 0e0483bf5c383d5815b9f945fea7e347d4badc0e
llvm/llvm-project@0e0483b
This adds the missing event callbacks to express dependencies on included tasks
and taskwait with depend clause.

The test fails for GCC, see bug report:
https://bugs.llvm.org/show_bug.cgi?id=46573

Reviewed by: hbae

Differential Revision: https://reviews.llvm.org/D81891

cherry-pick: 47cb8a0f0bcbab7bde2d07a84ec02ed0f3186987
llvm/llvm-project@47cb8a0
cherry-pick: 3fc97f9636ba2d7323f61ec3dea431fb1f850f03
llvm/llvm-project@3fc97f9
Fixes build with LIBOMP_OMPT_SUPPORT=off

Reported by: Jason Edson

Reviewed by: Hahnfeld

Differential Revision: https://reviews.llvm.org/D83171

cherry-pick: 30205865d96aab4a39464674bafe7600fd1dabaa
llvm/llvm-project@3020586
…ersions

If the compilation fails, the test is marked as unsupported.
-> This will never change for a specific version of gcc

If the linking fails, the test is marked as expected to fail.
-> This might change as LLVM/OpenMP implements the missing GOMP interface function

Reviewed by: Hahnfeld

Differential Revision: https://reviews.llvm.org/D83077

cherry-pick: 8289f2891e8691987b9ef9da248011798796ef45
llvm/llvm-project@8289f28
Following tests are failing after upgrading to version 5.0 but are passing
for version 4.5:
1. openmp/runtime/test/env/kmp_set_dispatch_buf.c
2. openmp/runtime/test/worksharing/for/kmp_set_dispatch_buf.c

To be enabled as soon as these tests are fixed.

Reviewed By: ABataev

Differential Revision: https://reviews.llvm.org/D82963

cherry-pick: 4c4bda1630a67e656fb1e18b25cb27344df58493
llvm/llvm-project@4c4bda1
Reviewed by: Hahnfeld, saiislam

Differential Revision: https://reviews.llvm.org/D82267

cherry-pick: 6d9626d2da3e59da0b0ae85d975b05a8f96b9939
llvm/llvm-project@6d9626d
This patch adds missing GOMP_5.0 loop entry points which incorporate
new non-monotonic default into entry point name.  Since monotonic
schedules are a subset of nonmonotonic, it is acceptable to use
monotonic as the implementation.  This patch simply has the nonmonotonic
(and possibly non-monontonic) versions of the loop entry points as
wrappers around the monotonic ones.

Differential Revision: https://reviews.llvm.org/D73922

cherry-pick: 95a28df5c458cd751355fc84263762c330474254
llvm/llvm-project@95a28df
…ing clang builtins

This function uses __builtin_amdgcn_atomic_inc32():
  uint32_t atomicInc(uint32_t *address, uint32_t max);

These functions use __builtin_amdgcn_fence():
__kmpc_impl_threadfence()
__kmpc_impl_threadfence_block()
__kmpc_impl_threadfence_system()

They will take place of current mechanism of directly calling IR functions.

Reviewed By: JonChesterfield

Differential Revision: https://reviews.llvm.org/D83132

cherry-pick: 38d6640ba55e52a4ae23059164660075c3d8e18a
llvm/llvm-project@38d6640
Summary:
Retaining per device primary context is preferred to creating a context owned by the plugin.

From CUDA documentation
1. Note that the use of multiple CUcontext s per device within a single process will substantially degrade performance and is strongly discouraged. Instead, it is highly recommended that the implicit one-to-one device-to-context mapping for the process provided by the CUDA Runtime API be used." from https://docs.nvidia.com/cuda/cuda-runtime-api/group__CUDART__DRIVER.html
2. Right under cuCtxCreate. In most cases it is recommended to use cuDevicePrimaryCtxRetain. https://docs.nvidia.com/cuda/cuda-driver-api/group__CUDA__CTX.html#group__CUDA__CTX_1g65dc0012348bc84810e2103a40d8e2cf
3. The primary context is unique per device and shared with the CUDA runtime API. These functions allow integration with other libraries using CUDA.  https://docs.nvidia.com/cuda/cuda-driver-api/group__CUDA__PRIMARY__CTX.html#group__CUDA__PRIMARY__CTX

Two issues are addressed by this patch:
1. Not using the primary context caused interoperability issue with libraries like cublas, cusolver. CUBLAS_STATUS_EXECUTION_FAILED and cudaErrorInvalidResourceHandle
2. On OLCF summit, "Error returned from cuCtxCreate" and "CUDA error is: invalid device ordinal"

Regarding the flags of the primary context. If it is inactive, we set CU_CTX_SCHED_BLOCKING_SYNC. If it is already active, we respect the current flags.

Reviewers: grokos, ABataev, jdoerfert, protze.joachim, AndreyChurbanov, Hahnfeld

Reviewed By: jdoerfert

Subscribers: openmp-commits, yaxunl, guansong, sstefan1, tianshilei1992

Tags: #openmp

Differential Revision: https://reviews.llvm.org/D82718

cherry-pick: c5348aecd7723e7aa7b18406d0c97724c0659f34
llvm/llvm-project@c5348ae
Commit 95a28df5c provided implementation for GOMP*_nonmonotonic*runtime*
functions. Now the tests succeed with gcc 9 and 10

cherry-pick: 0fa0cf8638b0777a1a44feebf78a63865e48ecf6
llvm/llvm-project@0fa0cf8
Summary:
We carried a lot of unused and untested code in the device runtime.
Among other reasons, we are planning major rewrites for which reduced
size is going to help a lot.

The number of code lines reduced by 14%!

Before:
-------------------------------------------------------------------------------
Language                     files          blank        comment           code
-------------------------------------------------------------------------------
CUDA                            13            489            841           2454
C/C++ Header                    14            322            493           1377
C                               12            117            124            559
CMake                            4             64             64            262
C++                              1              6              6             39
-------------------------------------------------------------------------------
SUM:                            44            998           1528           4691
-------------------------------------------------------------------------------

After:
-------------------------------------------------------------------------------
Language                     files          blank        comment           code
-------------------------------------------------------------------------------
CUDA                            13            366            733           1879
C/C++ Header                    14            317            484           1293
C                               12            117            124            559
CMake                            4             64             64            262
C++                              1              6              6             39
-------------------------------------------------------------------------------
SUM:                            44            870           1411           4032
-------------------------------------------------------------------------------

Reviewers: hfinkel, jhuber6, fghanim, JonChesterfield, grokos, AndreyChurbanov, ye-luo, tianshilei1992, ggeorgakoudis, Hahnfeld, ABataev, hbae, ronlieb, gregrodgers

Subscribers: jvesely, yaxunl, bollu, guansong, jfb, sstefan1, aaron.ballman, openmp-commits, cfe-commits

Tags: #clang, #openmp

Differential Revision: https://reviews.llvm.org/D83349

cherry-pick: cd0ea03e6f157e8fb477cd8368b29e1448eeb265
llvm/llvm-project@cd0ea03
There are various runtime calls in the device runtime with unused, or
always fixed, arguments. This is bad for all sorts of reasons. Clean up
two before as we match them in OpenMPOpt now.

Reviewed By: JonChesterfield

Differential Revision: https://reviews.llvm.org/D83268

cherry-pick: c98699582a6333bbe76ff7853b4cd6beb45754cf
llvm/llvm-project@c986995
parallel_offloading_map.c use too many threads if the underlying libomptarget
implementation uses x86/64 emulation.  This patch reduces the number of threads.
@shintaro-iwasaki shintaro-iwasaki merged commit 1cf40d9 into pmodels:main Oct 26, 2020
@shintaro-iwasaki shintaro-iwasaki deleted the UpdateBOLT-50d0db branch October 30, 2020 15:33
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.