Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

store: Prevent EOF errors in Azure objstore #1469

Merged
merged 1 commit into from
Sep 20, 2019
Merged
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
8 changes: 6 additions & 2 deletions pkg/objstore/azure/azure.go
Original file line number Diff line number Diff line change
Expand Up @@ -190,15 +190,19 @@ func (b *Bucket) getBlobReader(ctx context.Context, name string, offset, length
}

var size int64
if length > 0 {
// If a length is specified and it won't go past the end of the file,
// then set it as the size.
if length > 0 && length <= props.ContentLength()-offset {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This does not look right.

AFAIK there is no known case in immutable system that Thanos would ask for something that does not exists unless TSDB block is malformed (malformed index, index pointing to not existsing chunks). Is that the case?

Even in this case this change would delegate error to Thanos which would see wrong result from Azure anyway.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have run bucket verify over all of my buckets and it reported no errors. That should catch most index issues, right?

These are the logs I'm producing:

level=debug ts=2019-08-28T12:48:46.576445Z caller=azure.go:198 msg="set size to length" contentlength=38275251 size=18794 length=18794 offset=38272364 name=01DKBY3R34HCNHK0Q190CZD8NN/chunks/000001
level=debug ts=2019-08-28T12:48:46.576436Z caller=azure.go:198 msg="set size to length" contentlength=163910930 size=23419 length=23419 offset=163903417 name=01DKBZZ6YZCW1NSM6WMCZZJ37J/chunks/000001
level=error ts=2019-08-28T12:48:46.609712Z caller=bucket.go:608 msg="error preloading samples" err="read range for 0: get range reader: cannot download blob, address: https://XXXXXXX.blob.core.windows.net/thanos-snap/01DKBZZ6YZCW1NSM6WMCZZJ37J/chunks/000001: unexpected EOF"
level=error ts=2019-08-28T12:48:46.657179Z caller=bucket.go:608 msg="error preloading samples" err="read range for 0: get range reader: cannot download blob, address: https://XXXXXXX.blob.core.windows.net/thanos-snap/01DKBY3R34HCNHK0Q190CZD8NN/chunks/000001: unexpected EOF"
level=debug ts=2019-08-28T12:48:46.65731Z caller=bucket.go:806 msg="stats query processed" stats="&{blocksQueried:4 postingsTouched:0 postingsTouchedSizeSum:0 postingsToFetch:0 postingsFetched:0 postingsFetchedSizeSum:0 postingsFetchCount:0 postingsFetchDurationSum:0 seriesTouched:0 seriesTouchedSizeSum:0 seriesFetched:0 seriesFetchedSizeSum:0 seriesFetchCount:0 seriesFetchDurationSum:0 chunksTouched:0 chunksTouchedSizeSum:0 chunksFetched:0 chunksFetchedSizeSum:0 chunksFetchCount:0 chunksFetchDurationSum:0 getAllDuration:118210554 mergedSeriesCount:0 mergedChunksCount:0 mergeDuration:2701}" err=null

When reading 01DKBZZ6YZCW1NSM6WMCZZJ37J/chunks/000001, for example, the Blob content length is 163,910,930 and the offset is 163,903,417, so there should only be 7,513 bytes left to read but azure.go requests from [163,903,417:(163,903,417+23,419)] which then results in an EOF.

The source of the call is chunkr.preload(samplesLimiter) in the blockSeries function called by func (s *BucketStore) Series in bucket.go.

I'll look more into how the partitioning logic is working in func (r *bucketChunkReader) preload(samplesLimiter *Limiter) today.

Copy link
Contributor Author

@wbh1 wbh1 Aug 28, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It looks like Minio also returns an EOF rather than trying to smartly just send back data up to the end of the file, which makes me think we should handle this in Thanos rather than client lib.

if size > 0 && err == io.ErrUnexpectedEOF {
    // If an EOF happens after reading some but not
    // all the bytes ReadFull returns ErrUnexpectedEOF
    err = io.EOF
}

None of the function calls that I traced my way through seemed to care about any stats from meta.json or anything else when making start/end points for chunks. So, I'm not sure how the other objstores don't run into this issue, too... I would think it's a malformed TSDB block except for the fact that when I run with my changes then I'm actually able to see the data in Thanos Query.

Also, interestingly, I am now able to observe this on other metrics (not just the one that is last) e.g. up where Thanos still displays data because it is able to load 1 or more blocks, but several others fail resulting in gaps sometimes. Maybe other users haven't observed this because errors from blockSeries aren't being logged? (I had to add my own logging for it to find the EOF issue)

Example screenshots:
[v0.6.1]
Screen Shot 2019-08-28 at 10 15 30 AM

[with my changes]
Screen Shot 2019-08-28 at 10 16 35 AM

Thoughts @bwplotka ?

[EDIT]
Thanks to @povilasv, who just fixed the thing I mentioned about run.Group not returning errors from blockSeries in #1468 😁

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

hey @bwplotka, just pinging you for another look.

btw, congrats on the Redhat move! 😄

Copy link

@devalexx devalexx Sep 19, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@bwplotka I can confirm that it works for me too

size = length
level.Debug(b.logger).Log("msg", "set size to length", "size", size, "length", length, "offset", offset, "name", name)
} else {
size = props.ContentLength() - offset
level.Debug(b.logger).Log("msg", "set size to go to EOF", "contentlength", props.ContentLength(), "size", size, "length", length, "offset", offset, "name", name)
}

destBuffer := make([]byte, size)

if err := blob.DownloadBlobToBuffer(context.Background(), blobURL.BlobURL, offset, length,
if err := blob.DownloadBlobToBuffer(context.Background(), blobURL.BlobURL, offset, size,
destBuffer, blob.DownloadFromBlobOptions{
BlockSize: blob.BlobDefaultDownloadBlockSize,
Parallelism: uint16(3),
Expand Down