Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

dead lock in updateDiscardStats on master #993

Closed
simon-xe-wang opened this issue Aug 19, 2019 · 4 comments
Closed

dead lock in updateDiscardStats on master #993

simon-xe-wang opened this issue Aug 19, 2019 · 4 comments
Assignees
Labels
area/crash This issue causes a panic or some other of exception that causes a crash. kind/bug Something is broken. priority/P0 Critical issue that requires immediate attention. status/accepted We accept to investigate or work on it.

Comments

@simon-xe-wang
Copy link

What version of Go are you using (go version)?

$ go version


What version of Badger are you using?

master

Does this issue reproduce with the latest master?

yes

What are the hardware specifications of the machine (RAM, OS, Disk)?

What did you do?

keep writing keys and do vlog gc every 5 mins

What did you expect to see?

What did you see instead?

prog hangs
see stack trace:

goroutine 0 [idle]:
runtime.futex(0xc6be48, 0x80, 0x0, 0x0, 0x0, 0xc00038e800, 0x0, 0x0, 0x7fff3646c328, 0x40bc71, ...)
/root/go/src/runtime/sys_linux_amd64.s:535 +0x21
runtime.futexsleep(0xc6be48, 0x0, 0xffffffffffffffff)
/root/go/src/runtime/os_linux.go:46 +0x4b
runtime.notesleep(0xc6be48)
/root/go/src/runtime/lock_futex.go:151 +0xa1
runtime.stopm()
/root/go/src/runtime/proc.go:1936 +0xc1
runtime.findrunnable(0xc000028f00, 0x0)
/root/go/src/runtime/proc.go:2399 +0x54a
runtime.schedule()
/root/go/src/runtime/proc.go:2525 +0x21c
runtime.park_m(0xc009352180)
/root/go/src/runtime/proc.go:2605 +0xa1
runtime.mcall(0x0)
/root/go/src/runtime/asm_amd64.s:299 +0x5b

goroutine 1 [semacquire, 94 minutes]:
sync.runtime_Semacquire(0xc014f50098)
/root/go/src/runtime/sema.go:56 +0x39
sync.(*WaitGroup).Wait(0xc014f50090)
/root/go/src/sync/waitgroup.go:130 +0x65
github.com/dgraph-io/badger.(*request).Wait(0xc014f50060, 0x0, 0x0)
/root/bgrload/thirdparty/github.com/dgraph-io/badger/value.go:938 +0x31
github.com/dgraph-io/badger.(*Txn).commitAndSend.func1(0xc017df8980, 0xc01132cac0)
/root/bgrload/thirdparty/github.com/dgraph-io/badger/txn.go:501 +0x40
github.com/dgraph-io/badger.(*Txn).Commit(0xc017df8980, 0x0, 0x0)
/root/bgrload/thirdparty/github.com/dgraph-io/badger/txn.go:554 +0xbe
github.com/dgraph-io/badger.(*DB).Update(0xc0000fe700, 0xc000053c28, 0x0, 0x0)
/root/bgrload/thirdparty/github.com/dgraph-io/badger/txn.go:700 +0xbc
main.doWrite(0xb, 0x16e360)
/root/bgrload/main.go:57 +0xff
main.writeTest()
/root/bgrload/main.go:32 +0xac
main.main()
/root/bgrload/main.go:185 +0x22d

goroutine 5 [select, 94 minutes]:
github.com/dgraph-io/badger/y.(*WaterMark).process(0xc000020580, 0xc00000f060)
/root/bgrload/thirdparty/github.com/dgraph-io/badger/y/watermark.go:225 +0x2d1
created by github.com/dgraph-io/badger/y.(*WaterMark).Init
/root/bgrload/thirdparty/github.com/dgraph-io/badger/y/watermark.go:74 +0xc9

goroutine 6 [select, 94 minutes]:
github.com/dgraph-io/badger/y.(*WaterMark).process(0xc0000205c0, 0xc00000f060)
/root/bgrload/thirdparty/github.com/dgraph-io/badger/y/watermark.go:225 +0x2d1
created by github.com/dgraph-io/badger/y.(*WaterMark).Init
/root/bgrload/thirdparty/github.com/dgraph-io/badger/y/watermark.go:74 +0xc9

goroutine 7 [select]:
github.com/dgraph-io/badger.(*DB).updateSize(0xc0000fe700, 0xc00000f120)
/root/bgrload/thirdparty/github.com/dgraph-io/badger/db.go:1021 +0x12f
created by github.com/dgraph-io/badger.Open
/root/bgrload/thirdparty/github.com/dgraph-io/badger/db.go:289 +0xb1c

goroutine 21 [semacquire, 103 minutes]:
sync.runtime_SemacquireMutex(0xc000122284, 0x0)
/root/go/src/runtime/sema.go:71 +0x3d
sync.(*Mutex).Lock(0xc000122280)
/root/go/src/sync/mutex.go:134 +0x109
github.com/dgraph-io/badger.(*valueLog).flushDiscardStats(0xc0000fe880, 0xc009380120, 0xc00000001d)
/root/bgrload/thirdparty/github.com/dgraph-io/badger/value.go:1403 +0x49
github.com/dgraph-io/badger.(*valueLog).updateDiscardStats(0xc0000fe880, 0xc00ad48180, 0x0, 0x0)
/root/bgrload/thirdparty/github.com/dgraph-io/badger/value.go:1393 +0x185
github.com/dgraph-io/badger.(*levelsController).compactBuildTables(0xc009378000, 0x0, 0x963be0, 0xc014902680, 0xc000124120, 0xc000124180, 0xc00ad48090, 0x5, 0x5, 0xc00ad480c0, ...)
/root/bgrload/thirdparty/github.com/dgraph-io/badger/levels.go:636 +0x20ec
github.com/dgraph-io/badger.(*levelsController).runCompactDef(0xc009378000, 0x0, 0x963be0, 0xc014902680, 0xc000124120, 0xc000124180, 0xc00ad48090, 0x5, 0x5, 0xc00ad480c0, ...)
/root/bgrload/thirdparty/github.com/dgraph-io/badger/levels.go:791 +0xc6
github.com/dgraph-io/badger.(*levelsController).doCompact(0xc009378000, 0x0, 0x3ff0000000000000, 0x0, 0x0, 0x0, 0x0, 0x0)
/root/bgrload/thirdparty/github.com/dgraph-io/badger/levels.go:860 +0x484
github.com/dgraph-io/badger.(*levelsController).runWorker(0xc009378000, 0xc0001221c0)
/root/bgrload/thirdparty/github.com/dgraph-io/badger/levels.go:356 +0x2ed
created by github.com/dgraph-io/badger.(*levelsController).startCompact
/root/bgrload/thirdparty/github.com/dgraph-io/badger/levels.go:332 +0x89

goroutine 22 [runnable]:
runtime.goparkunlock(...)
/root/go/src/runtime/proc.go:307
time.Sleep(0x989680)
/root/go/src/runtime/time.go:105 +0x159
github.com/dgraph-io/badger.(*levelsController).addLevel0Table(0xc009378000, 0xc0285008f0, 0x0, 0x0)
/root/bgrload/thirdparty/github.com/dgraph-io/badger/levels.go:907 +0x36e
github.com/dgraph-io/badger.(*DB).handleFlushTask(0xc0000fe700, 0xc00b422dc0, 0x2200000029, 0x3acee240, 0x0, 0x0, 0x0, 0x0, 0x0)
/root/bgrload/thirdparty/github.com/dgraph-io/badger/db.go:925 +0x7df
github.com/dgraph-io/badger.(*DB).flushMemtable(0xc0000fe700, 0xc0001221e0, 0x0, 0x0)
/root/bgrload/thirdparty/github.com/dgraph-io/badger/db.go:941 +0x14c
github.com/dgraph-io/badger.Open.func4(0xc0000100b8)
/root/bgrload/thirdparty/github.com/dgraph-io/badger/db.go:303 +0x37
created by github.com/dgraph-io/badger.Open
/root/bgrload/thirdparty/github.com/dgraph-io/badger/db.go:302 +0x1548

goroutine 24 [select, 94 minutes]:
github.com/dgraph-io/badger.(*DB).doWrites(0xc0000fe700, 0xc0001222e0)
/root/bgrload/thirdparty/github.com/dgraph-io/badger/db.go:719 +0x3f8
created by github.com/dgraph-io/badger.Open
/root/bgrload/thirdparty/github.com/dgraph-io/badger/db.go:337 +0x108d

goroutine 25 [chan receive, 378 minutes]:
github.com/dgraph-io/badger.(*valueLog).waitOnGC(0xc0000fe880, 0xc000122300)
/root/bgrload/thirdparty/github.com/dgraph-io/badger/value.go:1343 +0x67
created by github.com/dgraph-io/badger.Open
/root/bgrload/thirdparty/github.com/dgraph-io/badger/db.go:340 +0x1155

goroutine 26 [select, 378 minutes]:
github.com/dgraph-io/badger.(*publisher).listenForUpdates(0xc00008d650, 0xc000122320)
/root/bgrload/thirdparty/github.com/dgraph-io/badger/publisher.go:67 +0x158
created by github.com/dgraph-io/badger.Open
/root/bgrload/thirdparty/github.com/dgraph-io/badger/db.go:343 +0x121d

goroutine 27 [semacquire, 103 minutes]:
sync.runtime_SemacquireMutex(0xc000122284, 0x0)
/root/go/src/runtime/sema.go:71 +0x3d
sync.(*Mutex).Lock(0xc000122280)
/root/go/src/sync/mutex.go:134 +0x109
github.com/dgraph-io/badger.(*valueLog).pickLog(0xc0000fe880, 0x2200000029, 0x5c12684, 0x963be0, 0xc00a5a6000, 0x0, 0x0, 0x0)
/root/bgrload/thirdparty/github.com/dgraph-io/badger/value.go:1151 +0xe4
github.com/dgraph-io/badger.(*valueLog).runGC(0xc0000fe880, 0x3fe0000000000000, 0x2200000029, 0x5c12684, 0x0, 0x0)
/root/bgrload/thirdparty/github.com/dgraph-io/badger/value.go:1362 +0x14c
github.com/dgraph-io/badger.(*DB).RunValueLogGC(0xc0000fe700, 0x3fe0000000000000, 0x0, 0x0)
/root/bgrload/thirdparty/github.com/dgraph-io/badger/db.go:1076 +0x190
main.runGC()
/root/bgrload/main.go:131 +0xa5
main.main.func1(0xc0000f4000)
/root/bgrload/main.go:180 +0xc9
created by main.main
/root/bgrload/main.go:177 +0x228

goroutine 171401 [semacquire, 94 minutes]:
sync.runtime_SemacquireMutex(0xc0000fe8a8, 0xc01438ee00)
/root/go/src/runtime/sema.go:71 +0x3d
sync.(*RWMutex).Lock(0xc0000fe8a0)
/root/go/src/sync/rwmutex.go:98 +0x79
github.com/dgraph-io/badger.(*valueLog).createVlogFile(0xc0000fe880, 0x2a, 0x1e, 0xc0414a1dc8, 0x1)
/root/bgrload/thirdparty/github.com/dgraph-io/badger/value.go:708 +0x76d
github.com/dgraph-io/badger.(*valueLog).write.func1(0xc014f50000, 0xc03744cc30)
/root/bgrload/thirdparty/github.com/dgraph-io/badger/value.go:1022 +0x31b
github.com/dgraph-io/badger.(*valueLog).write(0xc0000fe880, 0xc000099090, 0x1, 0xa, 0x0, 0x0)
/root/bgrload/thirdparty/github.com/dgraph-io/badger/value.go:1062 +0x477
github.com/dgraph-io/badger.(*DB).writeRequests(0xc0000fe700, 0xc000099090, 0x1, 0xa, 0x7cf00a, 0x435ca6)
/root/bgrload/thirdparty/github.com/dgraph-io/badger/db.go:633 +0x106
github.com/dgraph-io/badger.(*DB).doWrites.func1(0xc000099090, 0x1, 0xa)
/root/bgrload/thirdparty/github.com/dgraph-io/badger/db.go:706 +0x5a
created by github.com/dgraph-io/badger.(*DB).doWrites
/root/bgrload/thirdparty/github.com/dgraph-io/badger/db.go:759 +0x30b

@simon-xe-wang
Copy link
Author

look at this code below in value.go, you can find the lock lfDiscardStats is recursively called.

func (vlog *valueLog) updateDiscardStats(stats map[uint32]int64) error {
vlog.lfDiscardStats.Lock()
defer vlog.lfDiscardStats.Unlock()

    for fid, sz := range stats {
            vlog.lfDiscardStats.m[fid] += sz
            vlog.lfDiscardStats.updatesSinceFlush++
    }
    if vlog.lfDiscardStats.updatesSinceFlush > discardStatsFlushThreshold {
            if err := vlog.flushDiscardStats(); err != nil {
                    return err
            }
            vlog.lfDiscardStats.updatesSinceFlush = 0
    }
    return nil

}

// flushDiscardStats inserts discard stats into badger. Returns error on failure.
func (vlog *valueLog) flushDiscardStats() error {
vlog.lfDiscardStats.Lock()
if len(vlog.lfDiscardStats.m) == 0 {
vlog.lfDiscardStats.Unlock()
return nil
}
vlog.lfDiscardStats.Unlock()

@jarifibrahim jarifibrahim added area/crash This issue causes a panic or some other of exception that causes a crash. kind/bug Something is broken. priority/P0 Critical issue that requires immediate attention. status/accepted We accept to investigate or work on it. labels Aug 20, 2019
@jarifibrahim
Copy link
Contributor

Thanks for reporting the issue @SimonXW. This should be fixed by #976

@simon-xe-wang
Copy link
Author

Great. Thanks

@poonai poonai self-assigned this Aug 20, 2019
@ashish-goswami
Copy link
Contributor

Fixed by #976

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area/crash This issue causes a panic or some other of exception that causes a crash. kind/bug Something is broken. priority/P0 Critical issue that requires immediate attention. status/accepted We accept to investigate or work on it.
Development

No branches or pull requests

4 participants