Check entry empty state to ensure GC eligible #3634

Merged
merged 37 commits into r/index-active-block from ra/index-active-block-flush-state on Aug 19, 2021

Conversation

rallen090
Collaborator

@rallen090 rallen090 commented Jul 29, 2021

What this PR does / why we need it:

Before removing in-memory series from the index, we need to be sure that they've been flushed to disk. Before the change in this PR, the proxy for determining this was whether a given blockStart had been bootstrapped, the reasoning being that a block must have been flushed to disk in order to have been bootstrapped (i.e. blockTickResult.NumSegmentsBootstrapped != 0).

The problem with the above approach is that it does not capture cold writes: those can land in old blockStarts and still sit in memory, not yet cold flushed to disk, while the bootstrapped state for those blockStarts is nonetheless true.

Instead, we now check that a given series is truly "empty" (i.e. no TSDB data nor index data for that series in-memory) before it is eligible for GC. One challenge with this check is that we only keep track of state (a) "is TSDB data still in-memory" and not (b) "is index data still in-memory"; in other words, we can determine whether (a) holds but not whether (b) holds. So this PR also changes the tracked state to cover both (a) and (b), by marking it only after both TSDB and index (warm and cold) flushes are truly complete.
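For intuition, here is a minimal sketch of that eligibility rule; the type and field names are illustrative placeholders, not the actual lookup.Entry/series code this PR touches:

// Sketch only (placeholder names). A series becomes GC-eligible once it is
// truly empty: neither TSDB data nor index data for it remains in memory,
// which can only be asserted after both data and index (warm and cold)
// flushes have completed for the relevant blockStarts.
type seriesMemState struct {
	hasInMemoryTSDBData  bool // (a): unflushed TSDB data still buffered
	hasInMemoryIndexData bool // (b): index data not yet flushed to disk
}

func (s seriesMemState) eligibleForGC() bool {
	return !s.hasInMemoryTSDBData && !s.hasInMemoryIndexData
}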

Special notes for your reviewer:

Does this PR introduce a user-facing and/or backwards incompatible change?:


Does this PR require updating code package or user-facing documentation?:


@rallen090 rallen090 changed the base branch from master to r/index-active-block July 29, 2021 19:02
@codecov

codecov bot commented Aug 2, 2021

Codecov Report

Merging #3634 (6651096) into r/index-active-block (c815135) will increase coverage by 0.0%.
The diff coverage is 76.3%.


@@                 Coverage Diff                  @@
##           r/index-active-block   #3634   +/-   ##
====================================================
  Coverage                  56.3%   56.3%           
====================================================
  Files                       551     551           
  Lines                     62189   62214   +25     
====================================================
+ Hits                      35015   35066   +51     
+ Misses                    24035   24013   -22     
+ Partials                   3139    3135    -4     
Flag        Coverage Δ
aggregator  57.1% <ø> (ø)
cluster     ∅ <ø> (∅)
collector   58.4% <ø> (ø)
dbnode      60.9% <76.3%> (+<0.1%) ⬆️
m3em        46.4% <ø> (ø)
metrics     19.8% <ø> (ø)
msg         74.4% <ø> (+0.1%) ⬆️

Flags with carried forward coverage won't be shown.


Legend
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update c815135...6651096.

src/dbnode/storage/flush.go (outdated review thread, resolved)
return result
// IsEmpty returns true if the entry has no in-memory series data.
func (entry *Entry) IsEmpty() bool {
return entry.Series.IsEmpty()
Collaborator

Instead of just doing entry.Series.IsEmpty(), IMO we need to keep the idea of RelookupAndIncrementReaderWriterCount, which re-looks up the latest lookup.Entry in the shard map.

There are a few reasons for this, and we saw real problems without re-looking up the entry. I can explain in more depth if required, but the problem is that sometimes a lookup.Entry is created for a new series and passed off to the indexing queue; a race between two datapoints for a series that doesn't exist yet then causes only a single lookup.Entry to end up in the final shard map, and the other becomes orphaned. The orphaned entry can look empty even though the lookup.Entry that made it into the shard map has a series that is not actually empty.

That's why the caller (within mutable_segments.go) should perhaps do something like the following:

			isEmpty, ok := entry.RelookupAndReturnIsEmpty()
			if !ok {
				// Should not happen since shard will not expire until
				// no more block starts are indexed.
				// We do not GC this series if shard is missing since
				// we open up a race condition where the entry is not
				// in the shard yet and we GC it since we can't find it
				// due to an asynchronous insert.
				return true
			}

			return !isEmpty
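
A self-contained sketch of the re-lookup idea follows; the types and the explicit shard/ID parameters are illustrative only (the real lookup.Entry and shard code will differ), and locking is omitted:

// Illustrative sketch, not the actual implementation.
type Series struct{ empty bool }

func (s *Series) IsEmpty() bool { return s.empty }

type Entry struct{ Series *Series }

type Shard struct{ entries map[string]*Entry }

// RelookupAndReturnIsEmpty consults the entry currently held in the shard map
// rather than the receiver, because the receiver may be an orphaned duplicate
// created when two writes raced to insert the same new series.
func (e *Entry) RelookupAndReturnIsEmpty(shard *Shard, id string) (isEmpty bool, ok bool) {
	current, found := shard.entries[id]
	if !found {
		// Entry not (yet) in the shard map, e.g. an asynchronous insert has not
		// landed: report !ok so the caller declines to GC.
		return false, false
	}
	return current.Series.IsEmpty(), true
}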

Collaborator
@robskillington robskillington Aug 17, 2021

Also, we should probably add a metric for when this edge case happens, using instrument.EmitAndLogInvariantError(...), so that integration and scenario tests panic if this edge case ever starts occurring.

Collaborator Author

Interesting, I wasn't aware of this edge case, but it makes sense. Will add back this safeguard.

Collaborator Author

PR updated accordingly

flushedShards map[shardFlush]bool
)
if indexEnabled {
flushesForNs, ok := indexFlushes[n.ID().String()]
Collaborator

Do we need to check dataFlushes here as well? If not, why not?

Collaborator Author

Data flushes always precede index flushes, so it is safe to let the "index" flush indicate that a "full" flush has completed. If indexing is disabled, though, we just use the data flush as the indicator.

Collaborator Author

This is something I am double-checking with Rob though, so stay tuned. Assuming what I just said is true, I can add a comment here explaining the logic.
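
If that ordering holds, the selection logic amounts to something like the sketch below; markFullyFlushed is a placeholder, and it assumes the data and index flush maps share the same shape:

// Sketch only. Data flushes run before index flushes, so with indexing enabled
// the index flush is the final step and doubles as the "fully flushed" signal;
// with indexing disabled, the data flush is the only step and is used instead.
flushes := dataFlushes
if indexEnabled {
	flushes = indexFlushes
}
if flushesForNs, ok := flushes[n.ID().String()]; ok {
	markFullyFlushed(flushesForNs)
}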

}] = s
for _, t := range i.blockStartsFromIndexBlockStart(block.StartTime()) {
for _, s := range shards {
s.MarkWarmIndexFlushStateSuccessOrError(t, err)
Collaborator

So, IIUC, blockStartsFromIndexBlockStart returns the data block starts between the index block start and the index block start + index block size. This is important because index block size >= data block size. If that's true, why do we need to call MarkWarmIndexFlushStateSuccessOrError for each data block start time?

Collaborator Author

That's correct. The reason is that MarkWarmIndexFlushStateSuccessOrError is still marking data blockStarts (not index blockStarts), but each block now has a flag for both data and index. That is because the state we track is always at the data block size, not the index block size. E.g. with a 1h data block and a 2h index block, in memory we now have:

1pm: {dataFlushed: bool, indexFlushed: bool}
2pm: {dataFlushed: bool, indexFlushed: bool}
3pm: {dataFlushed: bool, indexFlushed: bool}
4pm: {dataFlushed: bool, indexFlushed: bool}
...

When determining if a given block is eligible for GC, we check both dataFlushed && indexFlushed. So when an index flush occurs, in this case it actually covers 2 data blocks.

This is a somewhat confusing nuance though, so if you have any suggestions for making this easier to understand, let me know.
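
To make the covering relationship concrete, here is a small sketch (illustrative names only, not the PR's actual types) of one index flush marking indexFlushed on every data blockStart it covers:

import "time"

// flushState is tracked per data blockStart (e.g. per 1h block), even though
// index flushes happen at the index block size (e.g. 2h).
type flushState struct {
	dataFlushed  bool
	indexFlushed bool
}

// dataBlockStartsCoveredBy returns each data blockStart falling within one index
// block, e.g. 2pm and 3pm for a 2h index block starting at 2pm with 1h data blocks.
func dataBlockStartsCoveredBy(indexBlockStart time.Time, indexBlockSize, dataBlockSize time.Duration) []time.Time {
	var starts []time.Time
	for t := indexBlockStart; t.Before(indexBlockStart.Add(indexBlockSize)); t = t.Add(dataBlockSize) {
		starts = append(starts, t)
	}
	return starts
}

// markIndexFlushed flips indexFlushed on every covered data blockStart, so a
// single 2h index flush marks two 1h blocks.
func markIndexFlushed(states map[time.Time]*flushState, indexBlockStart time.Time, indexBlockSize, dataBlockSize time.Duration) {
	for _, t := range dataBlockStartsCoveredBy(indexBlockStart, indexBlockSize, dataBlockSize) {
		if s, ok := states[t]; ok {
			s.indexFlushed = true
		}
	}
}

// A block is GC-eligible only once both its data and its index have been flushed.
func (s *flushState) fullyFlushed() bool {
	return s.dataFlushed && s.indexFlushed
}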

Collaborator

Yeah, may be worth adding a comment here just to clarify.

@rallen090 rallen090 merged commit 926a478 into r/index-active-block Aug 19, 2021
@rallen090 rallen090 deleted the ra/index-active-block-flush-state branch August 19, 2021 20:41