Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[RISC-V] coreclr-pal and coreclr-debug directories #82380

Merged
merged 10 commits into from
Apr 13, 2023

Conversation

clamp03
Copy link
Member

@clamp03 clamp03 commented Feb 20, 2023

  • Successfully cross-build for RISC-V.
  • Run A simple application "helloworld"
  • Fail a test in clr.paltest

@ghost ghost added the community-contribution Indicates that the PR has been added by a community member label Feb 20, 2023
This was referenced Feb 20, 2023
@jkotas jkotas added arch-riscv Related to the RISC-V architecture area-PAL-coreclr labels Feb 20, 2023
@clamp03
Copy link
Member Author

clamp03 commented Feb 21, 2023

If you don't mind, I want to fix a paltest error after 4 risc-v PRs are merged. It is hard to investigate an error now while I am trying to manage PRs.

@clamp03
Copy link
Member Author

clamp03 commented Feb 23, 2023

@mangod9 @janvorli Could you please review this? I will update as possible as I can for any comments. Thank you.

.macro PROLOG_STACK_ALLOC Size
addi sp, sp, -\Size
//.cfi_adjust_cfa_offset \Size
.cfi_def_cfa sp,\Size
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should not adjust the CFA. The adjustment is made in the code using this macro, e.g. in the CallSignalHandlerWrapper implementation that was added in this change.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I will put CFG adjust codes in the code using this macro. Thank you!

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't see this changed from my previous review.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I updated other codes, however I missed these
I removed .cfi_def_cfa in the macros. PROLOG_STACK_ALLOC and EPILOG_STACK_FREE
Thank you.

.macro EPILOG_STACK_FREE Size
addi sp, sp, \Size
//.cfi_adjust_cfa_offset -\Size
.cfi_def_cfa sp,-\Size
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there any reason for not using the .cfi_adjust_cfa_offset? I am just curious, it is not a problem.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I implemented based on LOONGARCH. I updated assembly codes and copied most other implementations (including cfi).

#if HAVE_FULLY_FEATURED_PTHREAD_MUTEXES && \
HAVE_FUNCTIONAL_PTHREAD_ROBUST_MUTEXES && \
!(defined(__FreeBSD__) || defined(TARGET_OSX))
!(defined(__FreeBSD__) || defined(TARGET_OSX) || defined(TARGET_RISCV64))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder if we should define the HAVE_FUNCTIONAL_PTHREAD_ROBUST_MUTEXES_EXITCODE to 1 (=> we don't have them) for RISCV64 in the tryrun.cmake. Now we set it to 0 for may architectures including RISCV64 here:

if(TARGET_ARCH_NAME MATCHES "^(x86|s390x|armv6|loongarch64|riscv64|ppc64le)$")
set_cache_value(HAVE_FUNCTIONAL_PTHREAD_ROBUST_MUTEXES_EXITCODE 0)
endif()

This is used for cross compilation. For compiling on real RISCV6 device, it would get detected at build time using a test execution in the configure.cmake here:

if(NOT CLR_CMAKE_HOST_ARCH_ARM AND NOT CLR_CMAKE_HOST_ARCH_ARM64)
set(CMAKE_REQUIRED_LIBRARIES pthread)
check_cxx_source_runs("
// This test case verifies the pthread process-shared robust mutex's cross-process abandon detection. The parent process starts
// a child process that locks the mutex, the process process then waits to acquire the lock, and the child process abandons the
// mutex by exiting the process while holding the lock. The parent process should then be released from its wait, be assigned
// ownership of the lock, and be notified that the mutex was abandoned.
#include <sys/mman.h>
#include <sys/time.h>
#include <errno.h>
#include <pthread.h>
#include <stdio.h>
#include <unistd.h>
#include <new>
using namespace std;
struct Shm
{
pthread_mutex_t syncMutex;
pthread_cond_t syncCondition;
pthread_mutex_t robustMutex;
int conditionValue;
Shm() : conditionValue(0)
{
}
} *shm;
int GetFailTimeoutTime(struct timespec *timeoutTimeRef)
{
int getTimeResult = clock_gettime(CLOCK_REALTIME, timeoutTimeRef);
if (getTimeResult != 0)
{
struct timeval tv;
getTimeResult = gettimeofday(&tv, NULL);
if (getTimeResult != 0)
return 1;
timeoutTimeRef->tv_sec = tv.tv_sec;
timeoutTimeRef->tv_nsec = tv.tv_usec * 1000;
}
timeoutTimeRef->tv_sec += 30;
return 0;
}
int WaitForConditionValue(int desiredConditionValue)
{
struct timespec timeoutTime;
if (GetFailTimeoutTime(&timeoutTime) != 0)
return 1;
if (pthread_mutex_timedlock(&shm->syncMutex, &timeoutTime) != 0)
return 1;
if (shm->conditionValue != desiredConditionValue)
{
if (GetFailTimeoutTime(&timeoutTime) != 0)
return 1;
if (pthread_cond_timedwait(&shm->syncCondition, &shm->syncMutex, &timeoutTime) != 0)
return 1;
if (shm->conditionValue != desiredConditionValue)
return 1;
}
if (pthread_mutex_unlock(&shm->syncMutex) != 0)
return 1;
return 0;
}
int SetConditionValue(int newConditionValue)
{
struct timespec timeoutTime;
if (GetFailTimeoutTime(&timeoutTime) != 0)
return 1;
if (pthread_mutex_timedlock(&shm->syncMutex, &timeoutTime) != 0)
return 1;
shm->conditionValue = newConditionValue;
if (pthread_cond_signal(&shm->syncCondition) != 0)
return 1;
if (pthread_mutex_unlock(&shm->syncMutex) != 0)
return 1;
return 0;
}
void DoTest_Child();
int DoTest()
{
// Map some shared memory
void *shmBuffer = mmap(NULL, 4096, PROT_READ | PROT_WRITE, MAP_ANONYMOUS | MAP_SHARED, -1, 0);
if (shmBuffer == MAP_FAILED)
return 1;
shm = new(shmBuffer) Shm;
// Create sync mutex
pthread_mutexattr_t syncMutexAttributes;
if (pthread_mutexattr_init(&syncMutexAttributes) != 0)
return 1;
if (pthread_mutexattr_setpshared(&syncMutexAttributes, PTHREAD_PROCESS_SHARED) != 0)
return 1;
if (pthread_mutex_init(&shm->syncMutex, &syncMutexAttributes) != 0)
return 1;
if (pthread_mutexattr_destroy(&syncMutexAttributes) != 0)
return 1;
// Create sync condition
pthread_condattr_t syncConditionAttributes;
if (pthread_condattr_init(&syncConditionAttributes) != 0)
return 1;
if (pthread_condattr_setpshared(&syncConditionAttributes, PTHREAD_PROCESS_SHARED) != 0)
return 1;
if (pthread_cond_init(&shm->syncCondition, &syncConditionAttributes) != 0)
return 1;
if (pthread_condattr_destroy(&syncConditionAttributes) != 0)
return 1;
// Create the robust mutex that will be tested
pthread_mutexattr_t robustMutexAttributes;
if (pthread_mutexattr_init(&robustMutexAttributes) != 0)
return 1;
if (pthread_mutexattr_setpshared(&robustMutexAttributes, PTHREAD_PROCESS_SHARED) != 0)
return 1;
if (pthread_mutexattr_setrobust(&robustMutexAttributes, PTHREAD_MUTEX_ROBUST) != 0)
return 1;
if (pthread_mutex_init(&shm->robustMutex, &robustMutexAttributes) != 0)
return 1;
if (pthread_mutexattr_destroy(&robustMutexAttributes) != 0)
return 1;
// Start child test process
int error = fork();
if (error == -1)
return 1;
if (error == 0)
{
DoTest_Child();
return -1;
}
// Wait for child to take a lock
WaitForConditionValue(1);
// Wait to try to take a lock. Meanwhile, child abandons the robust mutex.
struct timespec timeoutTime;
if (GetFailTimeoutTime(&timeoutTime) != 0)
return 1;
error = pthread_mutex_timedlock(&shm->robustMutex, &timeoutTime);
if (error != EOWNERDEAD) // expect to be notified that the robust mutex was abandoned
return 1;
if (pthread_mutex_consistent(&shm->robustMutex) != 0)
return 1;
if (pthread_mutex_unlock(&shm->robustMutex) != 0)
return 1;
if (pthread_mutex_destroy(&shm->robustMutex) != 0)
return 1;
return 0;
}
void DoTest_Child()
{
// Lock the robust mutex
struct timespec timeoutTime;
if (GetFailTimeoutTime(&timeoutTime) != 0)
return;
if (pthread_mutex_timedlock(&shm->robustMutex, &timeoutTime) != 0)
return;
// Notify parent that robust mutex is locked
if (SetConditionValue(1) != 0)
return;
// Wait a short period to let the parent block on waiting for a lock
sleep(1);
// Abandon the mutex by exiting the process while holding the lock. Parent's wait should be released by EOWNERDEAD.
}
int main()
{
int result = DoTest();
return result >= 0 ? result : 0;
}" HAVE_FUNCTIONAL_PTHREAD_ROBUST_MUTEXES)
set(CMAKE_REQUIRED_LIBRARIES)
endif()

Copy link
Member Author

@clamp03 clamp03 Mar 8, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The codes set NAMED_MUTEX_USE_PTHREAD_MUTEX HAVE_FUNCTIONAL_PTHREAD_ROBUST_MUTEXES_EXITCODE to 0 to fix paltest fail (threading/NamedMutex/test1/paltest_namedmutex_test1).
Do you mean we should set it to 1?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am confused - setting the HAVE_FUNCTIONAL_PTHREAD_ROBUST_MUTEXES_EXITCODE in the tryrun.cmake to 0 means that the riscv Linux has functional robust mutexes. But in your comment, you say it doesn't. That's why I have mentioned setting it to 1 for riscv4 which should do the same thing as the extra condition you have added to the #if. In the cross compilation case.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see. I will set HAVE_FUNCTIONAL_PTHREAD_ROBUST_MUTEXES_EXITCODE to 1 like others.
Thank you.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

With the change in the tryrun.cmake, the change in this ifdef should not be needed (because the HAVE_FUNCTIONAL_PTHREAD_ROBUST_MUTEXES is now 0)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you. Removed the codes.

clamp03 added a commit to clamp03/runtime that referenced this pull request Mar 8, 2023
Updated by review in dotnet#82380
- Successfully cross-build for RISC-V.
- Run A simple application "helloworld"
- Fail a test in clr.paltest
Copy link
Member

@gbalykov gbalykov left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great work! Looked just through half of this, have few comments.

src/coreclr/pal/inc/rt/ntimage.h Outdated Show resolved Hide resolved
src/coreclr/pal/inc/rt/ntimage.h Outdated Show resolved Hide resolved
src/coreclr/debug/di/riscv64/floatconversion.S Outdated Show resolved Hide resolved
src/coreclr/debug/di/riscv64/floatconversion.S Outdated Show resolved Hide resolved
src/coreclr/debug/di/riscv64/floatconversion.S Outdated Show resolved Hide resolved
@@ -3713,7 +3691,7 @@ YieldProcessor()
#elif defined(HOST_LOONGARCH64)
__asm__ volatile( "dbar 0; \n");
#elif defined(HOST_RISCV64)
__asm__ __volatile__( "wfi");
__asm__ __volatile__( "fence iorw, iorw"); // TODO
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this should be __asm__ __volatile__( "pause");. But this is part of Zihintpause extension, and as I remember it's not part of rv64gc, which is chosen for implementation in runtime. Maybe someone can correct me on this, but looks like we can't use it.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I use fence iorw, iorw because the instruction has similar functionality to dbar in LOONGARCH. (If I don't know, I referenced many things from LOONGARCH.) If someone know better instruction, please let me know. Thank you!

Copy link

@brucehoult brucehoult Mar 21, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this should be asm volatile( "pause");. But this is part of Zihintpause extension, and as I remember it's not part of rv64gc

What is the desired result of YieldProcessor, exactly?

wfi is certainly incorrect in any user program or even the OS. It is a Machine mode instruction that halts the entire CPU core, potentially forever, if there is no timer set up or pending I/O with completion interrupt. It is illegal instruction in S or U modes.

fence iorw,iorw will take some time to execute (perhaps), but it doesn't save anything

pause is the correct thing if you want a short delay in a busy-wait loop. It will sleep the current HART for a short time, saving energy or allowing other HARTs on the same core (hyperthreading) to execute faster. It is undefined how long it sleeps for, but is likely to be on the order of 10 to 100 clock cycles -- maybe similar to the time needed for a cache miss, but certainly less than the time needed to do an OS context switch to another process.

You can always use pause (.word 0x0100000f) on a machine that does not implement Zihintpause because it is encoded as fence w, (which in fact is just a NOP, as there is no successor set). If Zihintpause is not implemented then there is no better way to get a short pause in a polling loop anyway.

I tried...

        .globl mypause
mypause:        
        pause
        addi a0,a0,-1
        bnez a0,mypause
        ret

... on my VisionFive 2.

With argument 100,000,000 it takes 2.16 seconds. With the pause commented out, 0.23 seconds. So the pause takes 20ns or 30 clock cycles.

With fence iorw,iorw instead of pause it takes 0.124s. Yes -- even faster than without the fence. That's dual-issue pipelines for you ))

Copy link
Member

@gbalykov gbalykov Mar 21, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

pause is the correct thing if you want a short delay in a busy-wait loop

Yes, that's the intention, similar to yield instruction on arm. Thank you for detailed explanation.

My concern about pause availability was that even though it is a hint, on ubuntu 20.04 with gcc 9.4 I was not able to compile it for riscv: Error: unrecongnized opcode 'pause'. Similar error with fence w,, fence w,0, etc.: iilegal operands. So, it also depends on minimal supported gcc/clang version. I can't check on ubuntu 22.04 right now, but suppose it should work fine there since there's 2 year difference. And if this change doesn't break runtime build, then everything's fine.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Running four programs doing pause in a loop on the VisionFive 2 shows 100% CPU use on top and raises the temperature from 46 C to 52 C after some minutes. Commenting out the pause results in the temperature quickly stabilising at 57.5 C.

So there is a significant energy saving by using pause.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My concern about pause availability was that even though it is a hint, on ubuntu 20.04 with gcc 9.4 I was not able to compile it for riscv: Error: unrecongnized opcode 'pause'

the ZiHintPause ISA extension was ratified in February 2021, so it's not reasonable to expect Ubuntu 20.04 to have it, or even gcc 9.4 -- a purely bug fix release in June 2021, when the actual current gcc release was 11.1.

RISC-V is developing extremely rapidly at the moment, with I think 15 important ISA extensions ratified in November 2021 and more every few months since. I think the pace will slow after the end of this year and wait for cores and SoCs to catch up.

It's important to use the latest tools. You need to be using gcc 12.2 (you can build it from here https://github.com/riscv-collab/riscv-gnu-toolchain) or llvm at least 14, but better 16.

If using old toolchains, .word 0x0100000f is a safe way to insert pause.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

similar to yield instruction on arm

ARM's documentation says "The YIELD instruction has no effect in a single-threaded system" (by which they mean single-thread core, not single core system).

I can't find anything suggesting that there might be energy savings from YIELD on a single-threaded system.

I just tried ...

        .globl _mypause
        .align 2
_mypause:
1:      
        subs x0,x0,#1
        yield
        bne 1b
        ret

... on my M1 Mac Mini and the execution time for 10,000,000,000 loops was 3.2 seconds, both with and without the yield.

I also tried Graviton 3 on AWS. Result: 10,000,000,000 loops in 3.85 seconds, both with and without the yield.

So in fact ARM yield and RISC-V pause are not quite the same thing. Or M1 and Graviton 3 both don't implement yield.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@brucehoult Wow!! Thank you so much for your explanation and experiments. It set it to .word 0x0100000f. Thank you.

@ayakael
Copy link
Contributor

ayakael commented Mar 20, 2023

Testing the crossbuild to riscv64 on Alpine Linux edge, which uses clang-16, reveals some text relocations when linking libcoreclr.so: https://lab.ilot.io/ayakael/dotnet-stage0/-/jobs/1488.

Starting from binutils' ld 2.35 and lld-15, if there are text relocations a warning is issued by default.

@clamp03
Copy link
Member Author

clamp03 commented Mar 21, 2023

Testing the crossbuild to riscv64 on Alpine Linux edge, which uses clang-16, reveals some text relocations when linking libcoreclr.so: https://lab.ilot.io/ayakael/dotnet-stage0/-/jobs/1488.

Starting from binutils' ld 2.35 and lld-15, if there are text relocations a warning is issued by default.

Thank you!

@clamp03
Copy link
Member Author

clamp03 commented Apr 9, 2023

@janvorli @mangod9 Could you review it? And if it is good, could you please merge it? I fixed all the things requested by the reviewers and waited 50 days. Other two PRs (#82379 , #82382) are already merged. Actually, I cannot fix all bugs and unimplemented code with just this PR. After the PR is merged, I will continue to fix and my colleagues and community members can help to support RISC-V in coreclr better.

Copy link
Member

@janvorli janvorli left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There are just two minor nits, otherwise it looks good.

.macro PROLOG_STACK_ALLOC Size
addi sp, sp, -\Size
//.cfi_adjust_cfa_offset \Size
.cfi_def_cfa sp,\Size
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't see this changed from my previous review.

#if HAVE_FULLY_FEATURED_PTHREAD_MUTEXES && \
HAVE_FUNCTIONAL_PTHREAD_ROBUST_MUTEXES && \
!(defined(__FreeBSD__) || defined(TARGET_OSX))
!(defined(__FreeBSD__) || defined(TARGET_OSX) || defined(TARGET_RISCV64))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

With the change in the tryrun.cmake, the change in this ifdef should not be needed (because the HAVE_FUNCTIONAL_PTHREAD_ROBUST_MUTEXES is now 0)

@clamp03
Copy link
Member Author

clamp03 commented Apr 12, 2023

There are just two minor nits, otherwise it looks good.

Thank you so much. If I fix incorrectly again, please let me know.

Copy link
Member

@janvorli janvorli left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, thank you!

@janvorli
Copy link
Member

The WASM leg failure is a known issue, merging the change.

@janvorli janvorli merged commit e73d7fc into dotnet:main Apr 13, 2023
@clamp03 clamp03 deleted the riscv_pal_debug branch April 14, 2023 02:37
janvorli pushed a commit that referenced this pull request Apr 14, 2023
* [RISC-V] coreclr-vm and other directories in coreclr

- Successfully cross-build for RISC-V.
- Run A simple application "helloworld"
- Fail a test in clr.paltest

* Remove commented codes

* Fix mistakes

* Update by reviews

* [VM] Fix test

* [VM] Update

Updated by review in #82380

* [VM] Update assert and todo comments

* [VM] Updated

* [VM] Fix a bug

* [VM] Add getRISCV64PassStructInRegisterFlags

* Revert "[VM] Add getRISCV64PassStructInRegisterFlags"

This reverts commit cd1ea45.

* [VM] Restore getLoongArch64PassStructInRegisterFlags

In coreclr-jit patch, it makes getRISCV64PassStructInRegisterFlags for
RISCV64. So restore getLoongArch64PassStructInRegisterFlags

* [VM] FIX TEST ERRORS

* [VM] Update Stubs

* Fix exceptionhandling, stack smashing detected

* [VM] Fix vm/riscv64

---------

Co-authored-by: Timur Mustafin <t.mustafin@partner.samsung.com>
@ghost ghost locked as resolved and limited conversation to collaborators May 14, 2023
@clamp03 clamp03 self-assigned this Sep 7, 2023
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
arch-riscv Related to the RISC-V architecture area-PAL-coreclr community-contribution Indicates that the PR has been added by a community member
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants