Skip to content

Commit

Permalink
Fix Parquet DELTA_BINARY_PACKED decoder move buffer ptr to the end of…
Browse files Browse the repository at this point in the history
… page (facebookincubator#10485)

Summary:
When all data in the DELTA_BINARY_PACKED decoder has been read, its
corresponding buffer ptr needs to be moved to the end of the page data. When
the amount of data contained in the page is relatively small,
`totalValuesRemaining_` will be less than `valuesRemainingCurrentMiniBlock_`,
because the initial value of `valuesRemainingCurrentMiniBlock_` is always fixed as
`valuesPerBlock_ / miniBlocksPerBlock_`. For example, if `valuesPerBlock_` = 128
and `miniBlocksPerBlock_` = 4, then `valuesRemainingCurrentMiniBlock_` would be
32. If the page only contains 4 integers, the original code, after reading all 4 data
entries, would have `valuesRemainingCurrentMiniBlock_` = 32 - 4 = 28, and it would
not enter the if condition to move `bufferStart_` to the end of the page.

Pull Request resolved: facebookincubator#10485

Reviewed By: Yuhta

Differential Revision: D59974493

Pulled By: kevinwilfong

fbshipit-source-id: c67bb0690e80e42d1aeaa40673c86a127feafa0d
  • Loading branch information
liujiayi771 authored and facebook-github-bot committed Jul 19, 2024
1 parent facd967 commit 450b105
Show file tree
Hide file tree
Showing 2 changed files with 17 additions and 1 deletion.
11 changes: 10 additions & 1 deletion velox/dwio/parquet/reader/DeltaBpDecoder.h
Original file line number Diff line number Diff line change
Expand Up @@ -78,6 +78,14 @@ class DeltaBpDecoder {
}
}

const char* bufferStart() {
return bufferStart_;
}

int64_t validValuesCount() {
return static_cast<int64_t>(totalValuesRemaining_);
}

private:
bool getVlqInt(uint64_t& v) {
uint64_t tmp = 0;
Expand Down Expand Up @@ -173,6 +181,7 @@ class DeltaBpDecoder {
if (totalValueCount_ != 1) {
initBlock();
}
totalValuesRemaining_--;
return value;
} else {
++miniBlockIdx_;
Expand Down Expand Up @@ -201,7 +210,7 @@ class DeltaBpDecoder {
valuesRemainingCurrentMiniBlock_--;
totalValuesRemaining_--;

if (valuesRemainingCurrentMiniBlock_ == 0) {
if (valuesRemainingCurrentMiniBlock_ == 0 || totalValuesRemaining_ == 0) {
bufferStart_ += bits::nbytes(deltaBitWidth_ * valuesPerMiniBlock_);
}
return value;
Expand Down
7 changes: 7 additions & 0 deletions velox/dwio/parquet/reader/PageReader.h
Original file line number Diff line number Diff line change
Expand Up @@ -529,6 +529,13 @@ void PageReader::readWithVisitor(Visitor& visitor) {
visitor.setNumValuesBias(numValuesBeforePage);
visitor.setRows(pageRows);
callDecoder(nulls, nullsFromFastPath, visitor);
if (encoding_ == thrift::Encoding::DELTA_BINARY_PACKED &&
deltaBpDecoder_->validValuesCount() == 0) {
VELOX_DCHECK(
deltaBpDecoder_->bufferStart() == pageData_ + encodedDataSize_,
"Once all data in the delta binary packed decoder has been read, "
"its buffer ptr should be moved to the end of the page.");
}
if (currentVisitorRow_ < numVisitorRows_ || isMultiPage) {
if (mayProduceNulls) {
if (!isMultiPage) {
Expand Down

0 comments on commit 450b105

Please sign in to comment.