
Panic due to NPE when processing WiredList #2573

Closed
justinjc opened this issue Aug 31, 2020 · 2 comments · Fixed by #3056
Labels: area:db (All issues pertaining to dbnode), T: Bug

Comments

justinjc (Collaborator) commented Aug 31, 2020

Stack trace:

panic: runtime error: invalid memory address or nil pointer dereference

github.com/m3db/m3/src/x/ident.BytesID.Equal(...)
    github.com/m3db/m3/src/x/ident/bytes_id.go:46
github.com/m3db/m3/src/dbnode/storage/series.(*dbSeries).OnEvictedFromWiredList(...)
    github.com/m3db/m3/src/dbnode/storage/series/series.go:580
github.com/m3db/m3/src/dbnode/storage.(*dbShard).OnEvictedFromWiredList(...)
    github.com/m3db/m3/src/dbnode/storage/shard.go:532
github.com/m3db/m3/src/dbnode/storage/block.(*WiredList).insertAfter(...)
    github.com/m3db/m3/src/dbnode/storage/block/wired_list.go:299
github.com/m3db/m3/src/dbnode/storage/block.(*WiredList).pushBack(...)
    github.com/m3db/m3/src/dbnode/storage/block/wired_list.go:348
github.com/m3db/m3/src/dbnode/storage/block.(*WiredList).processUpdateBlock(...)
    github.com/m3db/m3/src/dbnode/storage/block/wired_list.go:239
github.com/m3db/m3/src/dbnode/storage/block.(*WiredList).Start.func1(...)
    github.com/m3db/m3/src/dbnode/storage/block/wired_list.go:169

This is a regression after an upgrade from b429452 (from v0.14.2) to 99477fc (from v0.15.12).

Update: this bug appears to have been present for a very long time.

The stack trace indicates that the id in the series is nil. Judging purely from the comment block above the id field:

// Note: The bytes that back "id ident.ID" are the same bytes
// that are behind the ID in "metadata doc.Document", the whole
// reason we keep an ident.ID on the series is since there's a lot
// of existing callsites that require the ID as an ident.ID.
id ident.ID
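
For reference, the panic is consistent with calling a method through a nil ident.ID: if the series' id has been cleared (e.g. on close) and is then handed to BytesID.Equal, the Bytes() call on the nil interface dereferences a nil pointer. A simplified, self-contained reconstruction follows; the shapes are inferred from the stack trace and are not the actual m3 ident package.

package main

// Simplified reconstruction for illustration only; ID and BytesID here
// mirror the general shape suggested by the stack trace, not the actual
// m3 ident package.

import "bytes"

// ID mirrors the part of ident.ID that matters for this panic.
type ID interface {
	Bytes() []byte
}

// BytesID is an ID backed directly by a byte slice.
type BytesID []byte

func (b BytesID) Bytes() []byte { return b }

// Equal compares the two IDs' backing bytes. If other is a nil interface
// (as when a closed series has had its id cleared), other.Bytes() panics
// with a nil pointer dereference, matching the trace at bytes_id.go:46.
func (b BytesID) Equal(other ID) bool {
	return bytes.Equal(b.Bytes(), other.Bytes())
}

func main() {
	evictedID := BytesID("foo")
	var seriesID ID               // nil: the series was closed and its id cleared
	_ = evictedID.Equal(seriesID) // panic: invalid memory address or nil pointer dereference
}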

I'm guessing that this might be related to this commit: b4be7b1

cc: @robskillington

justinjc added the T: Bug and area:db labels Aug 31, 2020
robskillington (Collaborator) commented
Hm @justinjc, I think this might actually be a race where a series is evicted from the wired list while it is also being evicted from the shard map, and therefore s.id is nil on the series.

Could you perhaps try adding a closed bool to the series and checking that at the beginning of "OnEvictedFromWiredList" before checking against the ID?

If that works, we may want to introduce this concept of open or closed, setting the flag to false on "Reset(...)" and to true on "Close()".

This will help us discern whether it's a valid use of the series, or whether it's simply not in an open state and is being used outside of the shard map lock (as in this case, where the series is used without the shard map's RLock/RUnlock being held).
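
A minimal sketch of that suggestion, with hypothetical field and method shapes (this is not the actual m3 dbSeries implementation):

package main

// Hedged sketch of the proposed closed guard. Field names, signatures and
// the []byte stand-in for ident.ID are assumptions for illustration only.

import (
	"bytes"
	"sync"
)

type dbSeries struct {
	sync.RWMutex
	id     []byte // stand-in for ident.ID; nil once the series is closed
	closed bool
}

// Reset re-opens the series with a new ID.
func (s *dbSeries) Reset(id []byte) {
	s.Lock()
	s.id = id
	s.closed = false
	s.Unlock()
}

// Close releases the ID and marks the series closed.
func (s *dbSeries) Close() {
	s.Lock()
	s.id = nil
	s.closed = true
	s.Unlock()
}

// OnEvictedFromWiredList bails out early when the series has already been
// closed, so the nil id is never compared against the evicted ID.
func (s *dbSeries) OnEvictedFromWiredList(id []byte) {
	s.RLock()
	defer s.RUnlock()
	if s.closed {
		return // evicted from the shard map concurrently; nothing to do
	}
	if !bytes.Equal(s.id, id) {
		return // notification is for a different (recycled) series
	}
	// ... drop the series' reference to the evicted block here ...
}

func main() {
	s := &dbSeries{}
	s.Reset([]byte("series-a"))
	s.Close()
	s.OnEvictedFromWiredList([]byte("series-a")) // safe: returns early because closed
}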

justinjc (Collaborator, Author) commented Jan 2, 2021

Quick update: it appears that this bug is present in older versions too (pre-v0.15.x). We seem to have encountered it in clusters running old builds, but for some reason it triggers significantly more frequently with new builds.

I've attempted to repro this with this test, which continuously writes and flushes series so that the series gets evicted (closed), nil-ing the ID. At the same time, I fetch IDs in a loop to continuously evict series from the wired list. The test does repro this issue, albeit very inconsistently.
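
The linked test isn't reproduced here, but the race pattern it exercises looks roughly like the following self-contained sketch (no m3 APIs; names and types are purely illustrative):

package main

// Illustrative sketch of the race described above. One goroutine keeps
// closing (nil-ing the id) and reopening a series, mimicking eviction from
// the shard map, while the main goroutine keeps delivering wired-list
// eviction callbacks without holding the series lock. Run with
// `go run -race` to have the race flagged; without a closed guard it
// typically crashes with the same nil pointer dereference as above.

import (
	"bytes"
	"sync"
)

// ID and BytesID stand in for ident.ID and ident.BytesID.
type ID interface{ Bytes() []byte }

type BytesID []byte

func (b BytesID) Bytes() []byte { return b }

type series struct {
	mu sync.Mutex
	id ID
}

func (s *series) close() {
	s.mu.Lock()
	s.id = nil
	s.mu.Unlock()
}

func (s *series) reset(id ID) {
	s.mu.Lock()
	s.id = id
	s.mu.Unlock()
}

// onEvictedFromWiredList mimics the unguarded callback: it reads s.id
// without the lock and without checking whether the series is still open.
func (s *series) onEvictedFromWiredList(id BytesID) bool {
	return bytes.Equal(id, s.id.Bytes()) // panics when s.id is nil mid-close
}

func main() {
	s := &series{id: BytesID("series-a")}
	done := make(chan struct{})

	go func() { // flush/eviction path: close and recycle the series
		for i := 0; i < 1_000_000; i++ {
			s.close()
			s.reset(BytesID("series-a"))
		}
		close(done)
	}()

	for { // wired-list path: keep delivering eviction callbacks
		select {
		case <-done:
			return
		default:
			s.onEvictedFromWiredList(BytesID("series-a"))
		}
	}
}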
