Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

sql: panic in BitArray.Format #31115

Closed
danhhz opened this issue Oct 8, 2018 · 11 comments · Fixed by #31695
Closed

sql: panic in BitArray.Format #31115

danhhz opened this issue Oct 8, 2018 · 11 comments · Fixed by #31695
Assignees
Labels
C-bug Code not up to spec/doc, specs & docs deemed correct. Solution expected to change code/behavior.

Comments

@danhhz
Copy link
Contributor

danhhz commented Oct 8, 2018

I'm looking to re-enable the partitioning tests, which were skipped as flaky a while ago. They generate random datums of the various column types we support, partition by those datums, and run some checks. The TestInitialPartitioning/BIT one is panic-ing.

Seems to repro pretty quickly under stress, but here's one of the random values I saw fail in case that's easier:

CREATE TABLE "BIT" (a BIT PRIMARY KEY) PARTITION BY LIST (a) (PARTITION p VALUES IN (B'0110001110111000010000000110010110010110010110111010101100011001'))
panic: runtime error: index out of range [recovered]
	panic: runtime error: index out of range

goroutine 1028 [running]:
github.com/cockroachdb/cockroach/pkg/util/stop.(*Stopper).Recover(0xc421502bd0, 0x2ecfb00, 0xc421538900)
	/go/src/github.com/cockroachdb/cockroach/pkg/util/stop/stopper.go:184 +0x11f
panic(0x2668ba0, 0x3fe3a40)
	/usr/local/go/src/runtime/panic.go:502 +0x229
github.com/cockroachdb/cockroach/pkg/util/bitarray.BitArray.Format(0xc421eb7cd0, 0x1, 0x1, 0xc422f22f41, 0xc423a3fea0)
	/go/src/github.com/cockroachdb/cockroach/pkg/util/bitarray/bitarray.go:284 +0x310
github.com/cockroachdb/cockroach/pkg/util/bitarray.BitArray.String(0xc421eb7cd0, 0x1, 0x1, 0xc421eb7b41, 0x0, 0x2)
	/go/src/github.com/cockroachdb/cockroach/pkg/util/bitarray/bitarray.go:95 +0x6b
github.com/cockroachdb/cockroach/pkg/util/encoding.prettyPrintFirstValue(0x0, 0xc421eb7bfe, 0x0, 0x2, 0xc422f23168, 0x2, 0x2, 0x2, 0x0, 0x0, ...)
	/go/src/github.com/cockroachdb/cockroach/pkg/util/encoding/encoding.go:1486 +0x1451
github.com/cockroachdb/cockroach/pkg/util/encoding.prettyPrintValueImpl(0x0, 0x0, 0x0, 0xc421eb7bf0, 0xe, 0x10, 0x299bd8e, 0x1, 0xc423a3fd50, 0x70, ...)
	/go/src/github.com/cockroachdb/cockroach/pkg/util/encoding/encoding.go:1386 +0x112
github.com/cockroachdb/cockroach/pkg/util/encoding.PrettyPrintValue(0x0, 0x0, 0x0, 0xc421eb7bf0, 0xe, 0x10, 0x299bd8e, 0x1, 0x0, 0x1)
	/go/src/github.com/cockroachdb/cockroach/pkg/util/encoding/encoding.go:1350 +0x9b
github.com/cockroachdb/cockroach/pkg/keys.decodeKeyPrint(0x0, 0x0, 0x0, 0xc421eb7bf0, 0xe, 0x10, 0x1, 0xc421eb7cc0)
	/go/src/github.com/cockroachdb/cockroach/pkg/keys/printer.go:494 +0xf6
github.com/cockroachdb/cockroach/pkg/keys.prettyPrintInternal.func1(0xc421eb7bf0, 0xe, 0x10, 0x4013a65, 0x1, 0x1)
	/go/src/github.com/cockroachdb/cockroach/pkg/keys/printer.go:530 +0x3cc
github.com/cockroachdb/cockroach/pkg/keys.prettyPrintInternal(0x0, 0x0, 0x0, 0xc421eb7bf0, 0xe, 0x10, 0xc42000b100, 0x2, 0x100)
	/go/src/github.com/cockroachdb/cockroach/pkg/keys/printer.go:563 +0x410
github.com/cockroachdb/cockroach/pkg/keys.PrettyPrintRange(0xc421eb7bf0, 0xe, 0x10, 0xc421eb7c08, 0x2, 0x8, 0x1e, 0x2, 0x0)
	/go/src/github.com/cockroachdb/cockroach/pkg/keys/printer.go:734 +0xa5
github.com/cockroachdb/cockroach/pkg/storage.(*atomicDescString).store(0xc422601280, 0x155bc13600000000, 0xc4239da5a0)
	/go/src/github.com/cockroachdb/cockroach/pkg/storage/replica.go:208 +0x1a0
github.com/cockroachdb/cockroach/pkg/storage.(*Replica).initRaftMuLockedReplicaMuLocked(0xc422601200, 0xc422f387e0, 0xc4213f7b30, 0x0, 0x0, 0x0)
	/go/src/github.com/cockroachdb/cockroach/pkg/storage/replica.go:752 +0x3cc
github.com/cockroachdb/cockroach/pkg/storage.splitPostApply(0x2ecfb00, 0xc421333680, 0x0, 0x155bc13615e4b599, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, ...)
	/go/src/github.com/cockroachdb/cockroach/pkg/storage/store.go:2367 +0x148
github.com/cockroachdb/cockroach/pkg/storage.(*Replica).handleReplicatedEvalResult(0xc422600600, 0x2ecfb00, 0xc421333680, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, ...)
	/go/src/github.com/cockroachdb/cockroach/pkg/storage/replica_proposal.go:649 +0x117f
github.com/cockroachdb/cockroach/pkg/storage.(*Replica).handleEvalResultRaftMuLocked(0xc422600600, 0x2ecfb00, 0xc421333680, 0xc423c96600, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, ...)
	/go/src/github.com/cockroachdb/cockroach/pkg/storage/replica_proposal.go:822 +0xaa
github.com/cockroachdb/cockroach/pkg/storage.(*Replica).processRaftCommand(0xc422600600, 0x2ecfb00, 0xc421333680, 0xc421eb79c0, 0x8, 0x6, 0xf, 0x300000003, 0x1, 0x5, ...)
	/go/src/github.com/cockroachdb/cockroach/pkg/storage/replica.go:5644 +0x979
github.com/cockroachdb/cockroach/pkg/storage.(*Replica).handleRaftReadyRaftMuLocked(0xc422600600, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, ...)
	/go/src/github.com/cockroachdb/cockroach/pkg/storage/replica.go:4460 +0x1356
github.com/cockroachdb/cockroach/pkg/storage.(*Store).processRequestQueue.func1(0x2ecfb00, 0xc423104990, 0xc422600600, 0x2ecfb00)
	/go/src/github.com/cockroachdb/cockroach/pkg/storage/store.go:3867 +0x109
github.com/cockroachdb/cockroach/pkg/storage.(*Store).withReplicaForRequest(0xc421ac0b00, 0x2ecfb00, 0xc423104990, 0xc422f2c1a0, 0xc422f25ed0, 0x0)
	/go/src/github.com/cockroachdb/cockroach/pkg/storage/store.go:3518 +0x135
github.com/cockroachdb/cockroach/pkg/storage.(*Store).processRequestQueue(0xc421ac0b00, 0x2ecfb00, 0xc421538900, 0x127)
	/go/src/github.com/cockroachdb/cockroach/pkg/storage/store.go:3855 +0x229
github.com/cockroachdb/cockroach/pkg/storage.(*raftScheduler).worker(0xc4213f1d00, 0x2ecfb00, 0xc421538900)
	/go/src/github.com/cockroachdb/cockroach/pkg/storage/scheduler.go:225 +0x21b
github.com/cockroachdb/cockroach/pkg/storage.(*raftScheduler).Start.func2(0x2ecfb00, 0xc421538900)
	/go/src/github.com/cockroachdb/cockroach/pkg/storage/scheduler.go:165 +0x3e
github.com/cockroachdb/cockroach/pkg/util/stop.(*Stopper).RunWorker.func1(0xc4203d1280, 0xc421502bd0, 0xc4203d1270)
	/go/src/github.com/cockroachdb/cockroach/pkg/util/stop/stopper.go:199 +0xe9
created by github.com/cockroachdb/cockroach/pkg/util/stop.(*Stopper).RunWorker
	/go/src/github.com/cockroachdb/cockroach/pkg/util/stop/stopper.go:192 +0xad
@danhhz danhhz added the C-bug Code not up to spec/doc, specs & docs deemed correct. Solution expected to change code/behavior. label Oct 8, 2018
@danhhz
Copy link
Contributor Author

danhhz commented Oct 8, 2018

assigning to @knz for triage

@benesch
Copy link
Contributor

benesch commented Oct 22, 2018

I dug a little bit and managed to capture one of these bit arrays that can't be formatted:

I181022 04:18:05.799505 26512 storage/replica_command.go:300  [n3,split,s3,r266/1:/{Table/86/1/B…-Max}] initiating a split of this range at key "މ:\xfd\x8d\xecT\xfe9\x81V\xb1\x00\xc9" %!s(PANIC=runtime error: index out of range) [r267]

@benesch
Copy link
Contributor

benesch commented Oct 22, 2018

Ok, here's an instant repro:

CREATE TABLE bit (a BIT PRIMARY KEY) PARTITION BY LIST (a) (PARTITION p VALUES IN (B'1000110111101100010101001111111000111001100000010101011010110001'));;
ALTER PARTITION p OF TABLE bit CONFIGURE ZONE USING DEFAULT;

The spans that partition creates involving invoking Next() on the primary key for the listed bit array, and that key can't be pretty-printed without panicing.

@knz
Copy link
Contributor

knz commented Oct 22, 2018

@benesch found out that Next() was called on the PK to increase it to a value that in turns makes it invalid.

However the Next() method for bitarrays nver produces invalid values.

I deduce that some code somewhere is trying to manually compute a Next() value on the raw bytes of the PK without using the Next code adequate for the column type.

What code would that be?

@benesch
Copy link
Contributor

benesch commented Oct 22, 2018

I believe it’s in partitionccl. The example I posted creates a partition with just one bit array. This results in a range with bounds [key, key.Next()), and the Next() operation is on a roachpb.Key, not a a bit array datum.

I don’t think the solution here is to avoid this call to Next(). We call Next() on keys all over the place, so I’m sure there’s another code path that could Next() a bit array key and then try to print it. I think we have to teach the bit array printer to handle nonsense lastBitArray values without panicking.

@knz
Copy link
Contributor

knz commented Oct 22, 2018

test

@benesch
Copy link
Contributor

benesch commented Oct 22, 2018

Apologies. Everywhere I've said Next() above I've been meaning to say PrefixEnd().

@knz
Copy link
Contributor

knz commented Oct 22, 2018

so having a next() function that simply bumps the last byte is not only bad because of this pretty-printing bug, it's might also be semantically incorrect.

If, say I have a bit array B'XXX(64 bits)XXX'.
This is encoded by two varints: a varint containing the XXXXX vits and then the "last bits used field" set to 64.

The symbolic "next" value is: B'XXX(64bits)XXX0 (a 0 appended)

The canonical encoding for this is 3 varints: the XXXXX bits, then a 0 varint, then a last bits used field set to 1.

The bad encoding produced by roachpb.key.next under scrutiny produces 2 varints: the XXXX bits then a lastbits used field set to 65.

So if we just look at the first 2 varints, the bad encoding sorts after the good.

In fact, it sorts "weirdly":

  • it sorts after B'XXXX0 (word X, word 0, lastused 1)
  • it sorts before B'XXXX1 (word X, word 2^63, lastused 1)
  • it incorrectly sorts after B'XXXX0(60 bits unset)0 (word X, word 0, last used 60)
  • it correctly sorts before B'XXXX1(60 bits set)1 (word X, word 1111111, last used 60)

There may be some accidental magic occuring because of the tag bytes in between the varints?

But maybe the lesson here is that there should be an additional tag byte after the lastbitsused varint. So that the roachpb.key.next() method would bump the tag byte, and not the actual varints that participate in the bytearray value?

I'd love to hear some advice from @nvanbenschoten about how to properly approach this.

@knz
Copy link
Contributor

knz commented Oct 22, 2018

please ignore my previous comment -- if the concern is in PrefixEnd() then my comment does not apply.

@knz
Copy link
Contributor

knz commented Oct 22, 2018 via email

@danhhz
Copy link
Contributor Author

danhhz commented Oct 22, 2018

Looks like it's being used to create the string description of the new replica after a split (see the stack trace in the original issue text).

benesch added a commit to benesch/cockroach that referenced this issue Oct 22, 2018
It is invalid for a bit array's lastBitsUsed field to be greater than
64. The FromEncodingParts function, however, would happily construct
an invalid bitarray if given a too-large lastBitsUsed value. Teach the
FromEncodingParts to return an error instead.

This presented as a panic when attempting to pretty-print a key with a
bitarray whose lastBitsUsed encoded value was 65. Such a key can be
created when calling PrefixEnd on a key with a bitarray whose
lastBitsUsed value is 64. By returning an error instead, the
pretty-printing code will try again after calling UndoPrefixEnd and be
able to print the key.

Fix cockroachdb#31115.

Release note: None
craig bot pushed a commit that referenced this issue Oct 22, 2018
31556: importccl: re-enable job control tests r=mjibson a=mjibson

I tracked down the problem. It was that after the CANCEL JOB was issued,
sometimes the 2nd stage of the IMPORT (the shuffle) would have started,
and sometimes it wouldn't have. If it did not start then RunJob would
block forever trying to send on the allowResponse chan. Fix this by
making a draining go routine after the first block.

Closes #24623
Closes #24658

Release note: None

31627: storage: remove spurious call to maybeInlineSideloadedRaftCommand r=nvanbenschoten,benesch a=tschottdorf

Entries are "thinned" only when passed to `r.append()` (i.e. written to
disk) and they are always returned "fat" from `Entries()` (i.e. Raft's way
to get entries from disk). Consequently Raft never sees thin entries and
won't ask us to commit them.

Touches #31618.

Release note: None

31695: bitarray: don't allow FromEncodingParts to return invalid bit array r=knz a=benesch

It is invalid for a bit array's lastBitsUsed field to be greater than
64. The FromEncodingParts function, however, would happily construct
an invalid bitarray if given a too-large lastBitsUsed value. Teach the
FromEncodingParts to return an error instead.

This presented as a panic when attempting to pretty-print a key with a
bitarray whose lastBitsUsed encoded value was 65. Such a key can be
created when calling PrefixEnd on a key with a bitarray whose
lastBitsUsed value is 64. By returning an error instead, the
pretty-printing code will try again after calling UndoPrefixEnd and be
able to print the key.

Fix #31115.

Release note: None

31697: partitionccl: deflake TestDropIndexWithZoneConfigCCL r=danhhz,eriktrinh a=benesch

A particularly adversarial goroutine schedule can cause this test to
observe the moment in time where the data is dropped but the zone config
is not. Deflake by retrying the check for the dropped zone config until
it succeeds (or times out).

Fix #31678.

Release note: None

Co-authored-by: Matt Jibson <matt.jibson@gmail.com>
Co-authored-by: Tobias Schottdorf <tobias.schottdorf@gmail.com>
Co-authored-by: Nikhil Benesch <nikhil.benesch@gmail.com>
@craig craig bot closed this as completed in #31695 Oct 22, 2018
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
C-bug Code not up to spec/doc, specs & docs deemed correct. Solution expected to change code/behavior.
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants