DB can't be opened after crash #1023

Closed
markadev opened this issue Sep 3, 2019 · 3 comments
Labels
area/crash: This issue causes a panic or some other kind of exception that causes a crash.
kind/bug: Something is broken.
priority/P1: Serious issue that requires eventual attention (can wait a bit).
status/confirmed: The issue has been triaged but still not reproduced.

Comments

markadev commented Sep 3, 2019

What version of Go are you using (go version)?

$ go version
go version go1.12.6 linux/amd64

What version of Badger are you using?

1.6.0, with local patches to fix value log GC (TxnTooBig) and bloom filter memory use. (Without the local patches, vlog GC doesn't work and memory use OOMs the machine way too easily.)

Does this issue reproduce with the latest master?

Dunno, master binary format is different so the broken DB I created is incompatible.

What are the hardware specifications of the machine (RAM, OS, Disk)?

SLES Linux in a VM, 8GB RAM

What did you do?

Ran a test program which does concurrent writes and value log GC, then performs reads to validate that the data is accessible. This program generates the busted database: main.go.txt.

Within 24 hours, that test program spewed out some warnings about "This entry should have been caught". I copied the DB to another machine (Ubuntu, same arch) for analysis and tried to open it with a dumb test program.

What did you expect to see?

I expect to be able to Open() the database.

What did you see instead?

Attempt to open the database failed with error:

badger 2019/09/03 13:42:31 INFO: All 8 tables opened in 1.084s
badger 2019/09/03 13:42:31 INFO: Replaying file id: 48 at offset: 744099210
badger 2019/09/03 13:42:31 INFO: Replay took: 230.700769ms
open failed Unable to find log file. Please retry
github.com/dgraph-io/badger.init.ializers
	/home/marka/src/badger-dataloss/thirdparty/github.com/dgraph-io/badger/errors.go:66
runtime.main
	/usr/local/go/src/runtime/proc.go:188
runtime.goexit
	/usr/local/go/src/runtime/asm_amd64.s:1337
failed to read value pointer from vlog file: {Fid:33 Len:334 Offset:347265756}
github.com/dgraph-io/badger.(*valueLog).populateDiscardStats
	/home/marka/src/badger-dataloss/thirdparty/github.com/dgraph-io/badger/value.go:1449
github.com/dgraph-io/badger.(*valueLog).open
	/home/marka/src/badger-dataloss/thirdparty/github.com/dgraph-io/badger/value.go:859
github.com/dgraph-io/badger.Open
	/home/marka/src/badger-dataloss/thirdparty/github.com/dgraph-io/badger/db.go:318
main.main
	/home/marka/src/badger-dataloss/cmd/lookup/main.go:14
runtime.main
	/usr/local/go/src/runtime/proc.go:200
runtime.goexit
	/usr/local/go/src/runtime/asm_amd64.s:1337
panic: failed to read value pointer from vlog file: {Fid:33 Len:334 Offset:347265756}: Unable to find log file. Please retry

Indeed, value log file 33 is no longer present.

Repeated attempts to open the database always failed with the same error. On inspection of the populateDiscardStats function, it looks like it doesn't properly handle the case where the !badger!discard value has been moved to a later value log file.

When I hacked some code into populateDiscardStats to handle ErrRetry, it seemed to fix the problem.

jarifibrahim commented Sep 10, 2019

Related to #1031

jarifibrahim added the area/crash, kind/bug, priority/P1, and status/confirmed labels on Sep 10, 2019

connorgorman commented Sep 13, 2019

@jarifibrahim Do you know if this is not fixed on master?
@markadev Did your branch by any chance have the change from #929? It looks like the same error.

jarifibrahim commented

This issue is fixed in master via #929.

@markadev Please try the master branch.
