Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

consensus/ethash: avoid runtime errors due to OOD on mmap writes #23799

Merged
merged 3 commits into from
Nov 2, 2021

Conversation

holiman
Copy link
Contributor

@holiman holiman commented Oct 24, 2021

This PR avoids a pretty rare crash.

Some more info here: edsrzf/mmap-go#12 (comment) .

When we map a file for generating the DAG, we do a simple truncate to e.g. 1Gb. This is fine, even if we have nowhere near 1Gb disk available, as the actual file doesn't take up the full 1Gb, merely a few bytes. When we start generating into it, however, it eventually crashes with a unexpected fault address .

Examples:

There are two ways we can fix this, the simplest being to use the fallocate linux system call.
The second way is to convert the internals of ethash; instead of generating into a []byte, we generate into an interface which supports WriteAt. Then we can first generate into a file (or into a wrapped memory slice), and later do a readonly-mmap on it.

This PR implements the first approach. I tried the second approach, but it becomes a pretty large refactor, especially since it also touches the cache generator (which uses the same schema), and the cache-generator is not as 'linear' -- it reads from self in a way which the DAG-generator does not, and thus is more difficult to convert this way. In the end, I thought it seems a bit wasteful to do a major refactoring of ethash at this point.

This PR also fixes an over-allocation that could happen if the mmap fails, and we generate the dag in-memory, and does some more cleanup in case of failures.

Re testing

I found this error when I tried running go test . -run - -bench BenchmarkHashimotoFullMmap/WithLock -v. It generates the dag to the system temp, which in my case was limited to 1Gb. Without this fix, the benchmark crashed, but it runs fine with this fix.

Re platforms

I got the supported-platform info from this

[user@work syscall]$ grep Fallocate * 
grep: js: Is a directory
syscall_linux.go://sys	Fallocate(fd int, mode uint32, off int64, len int64) (err error)
zsyscall_linux_386.go:func Fallocate(fd int, mode uint32, off int64, len int64) (err error) {
zsyscall_linux_amd64.go:func Fallocate(fd int, mode uint32, off int64, len int64) (err error) {
zsyscall_linux_arm.go:func Fallocate(fd int, mode uint32, off int64, len int64) (err error) {
zsyscall_linux_arm64.go:func Fallocate(fd int, mode uint32, off int64, len int64) (err error) {
zsyscall_linux_mips.go:func Fallocate(fd int, mode uint32, off int64, len int64) (err error) {
zsyscall_linux_mips64.go:func Fallocate(fd int, mode uint32, off int64, len int64) (err error) {
zsyscall_linux_mips64le.go:func Fallocate(fd int, mode uint32, off int64, len int64) (err error) {
zsyscall_linux_mipsle.go:func Fallocate(fd int, mode uint32, off int64, len int64) (err error) {
zsyscall_linux_ppc64.go:func Fallocate(fd int, mode uint32, off int64, len int64) (err error) {
zsyscall_linux_ppc64le.go:func Fallocate(fd int, mode uint32, off int64, len int64) (err error) {
zsyscall_linux_riscv64.go:func Fallocate(fd int, mode uint32, off int64, len int64) (err error) {
zsyscall_linux_s390x.go:func Fallocate(fd int, mode uint32, off int64, len int64) (err error) {


import (
"os"
"syscall"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In https://pkg.go.dev/syscall they say: Deprecated: this package is locked down. Callers should use the corresponding package in the golang.org/x/sys repository instead. That is also where updates required by new systems or versions should be applied. See https://golang.org/s/go1.4-syscall for more information.


// Package ethash implements the ethash proof-of-work consensus engine.

//go:build linux && (386 || amd64 || arm || arm64 || mips || mips64 || mips64le || ppc || pcc64 || pcc64le || riscv64 || s390x || sparc64)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

do you need to specify all the possible $GOARCH in there? A simple

Suggested change
//go:build linux && (386 || amd64 || arm || arm64 || mips || mips64 || mips64le || ppc || pcc64 || pcc64le || riscv64 || s390x || sparc64)
//go:build linux

should work, I think, as I see no reason for fallocate not to be present for a given architecture.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hm, indeed, it is present in all files. I was confused because they didn't put it in in syscall_linux.go, but instead added it individually in all files.

@holiman
Copy link
Contributor Author

holiman commented Oct 26, 2021

Fixed, ptal

@holiman
Copy link
Contributor Author

holiman commented Oct 27, 2021

@gballet lgty?

@@ -358,7 +361,7 @@ func (d *dataset) generate(dir string, limit int, lock bool, test bool) {
if err != nil {
logger.Error("Failed to generate mapped ethash dataset", "err", err)

d.dataset = make([]uint32, dsize/2)
d.dataset = make([]uint32, dsize/4)
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note for reviewers: check line 330, and you'll see that this is the same as what's used already whenever we don't use disk-based dag.

@fjl fjl removed the status:triage label Nov 2, 2021
@holiman holiman added this to the 1.10.12 milestone Nov 2, 2021
@holiman holiman merged commit 178debe into ethereum:master Nov 2, 2021
sidhujag pushed a commit to syscoin/go-ethereum that referenced this pull request Nov 2, 2021
…ereum#23799)

When we map a file for generating the DAG, we do a simple truncate to e.g. 1Gb. This is fine, even if we have nowhere near 1Gb disk available, as the actual file doesn't take up the full 1Gb, merely a few bytes. When we start generating into it, however, it eventually crashes with a unexpected fault address .

This change fixes it (on linux systems) by using the Fallocate syscall, which preallocates suffcient space on disk to avoid that situation. 


Co-authored-by: Felix Lange <fjl@twurst.com>
@holiman holiman deleted the ethash_fixmem branch November 10, 2021 18:32
yongjun925 pushed a commit to DODOEX/go-ethereum that referenced this pull request Dec 3, 2022
…ereum#23799)

When we map a file for generating the DAG, we do a simple truncate to e.g. 1Gb. This is fine, even if we have nowhere near 1Gb disk available, as the actual file doesn't take up the full 1Gb, merely a few bytes. When we start generating into it, however, it eventually crashes with a unexpected fault address .

This change fixes it (on linux systems) by using the Fallocate syscall, which preallocates suffcient space on disk to avoid that situation. 


Co-authored-by: Felix Lange <fjl@twurst.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants