[dbnode] Series ref resolver #3316

Merged
merged 38 commits into from
Mar 15, 2021

Collaborator

@soundvibe soundvibe commented Mar 3, 2021

What this PR does / why we need it:

This PR implements a series ref resolver that writes new series asynchronously, reducing lock contention during dbnode bootstrapping. Instead of writing new series immediately, bootstrappers can now use the resolver to retrieve a series ref. The resolver ensures that a given series is written asynchronously (waiting for the write to complete if the series has not been inserted yet), or it returns the ref immediately if the series already exists.
Because new series are now written asynchronously, data writes that use a series ref should ideally be issued only after several series ref resolvers have been accumulated; otherwise each data write waits for the background write to complete. Some code in the commit log and peers bootstrappers was updated accordingly to retain the previous performance.
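
To make the idea concrete, here is a minimal sketch of a resolver that hands back a handle immediately and only blocks on the background insert when the ref is first requested. It is not the code from this PR: `shard`, `seriesRef`, `resolver` and `resolve` are hypothetical, simplified names, and the in-memory map stands in for the real shard state.

```go
package main

import (
	"fmt"
	"sync"
)

// seriesRef is a hypothetical stand-in for a reference to a stored series.
type seriesRef struct{ id string }

// shard is a toy in-memory shard; the real shard type is far more involved.
type shard struct {
	mu     sync.RWMutex
	series map[string]*seriesRef
}

// resolver lazily resolves a series ref, waiting for a pending
// background insert only on first use.
type resolver struct {
	pending *sync.WaitGroup // nil when the series already existed
	lookup  func() (*seriesRef, error)
	once    sync.Once
	ref     *seriesRef
	err     error
}

// SeriesRef blocks until the background insert (if any) has finished,
// then returns the cached result on every subsequent call.
func (r *resolver) SeriesRef() (*seriesRef, error) {
	r.once.Do(func() {
		if r.pending != nil {
			r.pending.Wait()
		}
		r.ref, r.err = r.lookup()
	})
	return r.ref, r.err
}

// resolve returns immediately: either with a resolver for an existing
// series, or with one backed by an asynchronous insert.
func (s *shard) resolve(id string) *resolver {
	lookup := func() (*seriesRef, error) {
		s.mu.RLock()
		defer s.mu.RUnlock()
		if ref, ok := s.series[id]; ok {
			return ref, nil
		}
		return nil, fmt.Errorf("series %q not found", id)
	}
	s.mu.RLock()
	_, exists := s.series[id]
	s.mu.RUnlock()
	if exists {
		return &resolver{lookup: lookup}
	}
	wg := &sync.WaitGroup{}
	wg.Add(1)
	go func() { // background insert; the write lock is taken off the caller's path
		defer wg.Done()
		s.mu.Lock()
		defer s.mu.Unlock()
		if _, ok := s.series[id]; !ok {
			s.series[id] = &seriesRef{id: id}
		}
	}()
	return &resolver{pending: wg, lookup: lookup}
}

func main() {
	s := &shard{series: map[string]*seriesRef{}}
	// Accumulate resolvers first, then resolve, so the inserts can overlap.
	ids := []string{"cpu", "mem", "cpu"}
	resolvers := make([]*resolver, 0, len(ids))
	for _, id := range ids {
		resolvers = append(resolvers, s.resolve(id))
	}
	for i, r := range resolvers {
		ref, err := r.SeriesRef()
		if err != nil {
			panic(err)
		}
		fmt.Println(ids[i], "->", ref.id)
	}
}
```

Collecting the resolvers before calling `SeriesRef()` mirrors the batching advice above: resolving each one right after creating it would serialize every data write behind its own insert.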

Special notes for your reviewer:

Does this PR introduce a user-facing and/or backwards incompatible change?:


Does this PR require updating code package or user-facing documentation?:



codecov bot commented Mar 8, 2021

Codecov Report

Merging #3316 (19b4e4e) into master (19b4e4e) will not change coverage.
The diff coverage is n/a.

@@           Coverage Diff           @@
##           master    #3316   +/-   ##
=======================================
  Coverage    72.6%    72.6%           
=======================================
  Files        1099     1099           
  Lines      103579   103579           
=======================================
  Hits        75210    75210           
  Misses      23210    23210           
  Partials     5159     5159           
Flag Coverage Δ
aggregator 76.8% <0.0%> (ø)
cluster 84.9% <0.0%> (ø)
collector 84.3% <0.0%> (ø)
dbnode 79.2% <0.0%> (ø)
m3em 74.4% <0.0%> (ø)
m3ninx 73.6% <0.0%> (ø)
metrics 19.8% <0.0%> (ø)
msg 74.7% <0.0%> (ø)
query 66.9% <0.0%> (ø)
x 80.3% <0.0%> (ø)

Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 19b4e4e...535e786. Read the comment docs.

@soundvibe soundvibe changed the title from "[WIP] [dbnode] Series ref resolver" to "[dbnode] Series ref resolver" on Mar 9, 2021
* master: (22 commits)
  Remove deprecated fields (#3327)
  Add quotas to Permits (#3333)
  [aggregator] Drop messages that have a drop policy applied (#3341)
  Fix NPE due to race with a closing series (#3056)
  [coordinator] Apply auto-mapping rules if-and-only-if no drop policies are in effect (#3339)
  [aggregator] Add validation in AddTimedWithStagedMetadatas (#3338)
  [coordinator] Fix panic in Ready endpoint for admin coordinator (#3335)
  [instrument] Config option to emit detailed Go runtime metrics only (#3332)
  [aggregator] Sort heap in one go, instead of iterating one-by-one (#3331)
  [pool] Add support for dynamic, sync.Pool backed, object pools (#3334)
  Enable PANIC_ON_INVARIANT_VIOLATED for tests (#3326)
  [aggregator] CanLead for unflushed window takes BufferPast into account (#3328)
  Optimize StagedMetadatas conversion (#3330)
  [m3msg] Improve message scan performance (#3319)
  [dbnode] Add reason tag to bootstrap retries metric (#3317)
  [coordinator] Enable rule filtering on prom metric type (#3325)
  Update m3dbnode-all-config.yml (#3204)
  [coordinator] Include Type in RollupOp.Equal (#3322)
  [coordinator] Simplify iteration logic of matchRollupTarget (#3321)
  [coordinator] Add rollup type to remove specific dimensions (#3318)
  ...
@@ -28,11 +28,14 @@ import (
"sync"
"time"

"golang.org/x/sync/errgroup"
Collaborator

Perhaps move the other third party imports at the bottom to the top here with errgroup import?

Collaborator Author

done

errs, _ := errgroup.WithContext(ctx.GoContext())
errs.Go(worker.readSeriesBlocks)
if err := s.loadBlocks(worker.dataCh, writeType); err != nil {
close(worker.dataCh)
Collaborator

Hm doesn't it look like this is closed on line 119?

Collaborator

Would be better to close it from one place, either in the worker itself always (with defer), or from the outside always.

Maybe just remove this and rely on the defer in the worker itself?

Collaborator Author

Agreed. Refactored to use a cancellable context so that readSeriesBlocks can be cancelled if loadBlocks() returns an error unexpectedly.
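
For readers following the thread, the pattern discussed here (the channel is closed in exactly one place, and the consumer cancels a context rather than closing the channel itself) can be sketched roughly as below. This is an illustration, not the PR's code: `readBlocks`, `loadBlocks` and `run` are hypothetical, and plain `context` plus a goroutine is used instead of `errgroup` to keep the sketch within the standard library.

```go
package main

import (
	"context"
	"errors"
	"fmt"
)

// readBlocks is a hypothetical producer. It owns dataCh: it is the only
// place that closes it, and it stops early if ctx is cancelled.
func readBlocks(ctx context.Context, n int, dataCh chan<- int) {
	defer close(dataCh)
	for i := 0; i < n; i++ {
		select {
		case <-ctx.Done():
			return
		case dataCh <- i:
		}
	}
}

// loadBlocks is a hypothetical consumer that fails when it sees failOn.
func loadBlocks(dataCh <-chan int, failOn int) (int, error) {
	loaded := 0
	for v := range dataCh {
		if v == failOn {
			return loaded, errors.New("load failed")
		}
		loaded++
	}
	return loaded, nil
}

// run wires both together; cancel() unblocks the producer when the
// consumer fails, so the channel is never closed from two places.
func run(n, failOn int) (int, error) {
	ctx, cancel := context.WithCancel(context.Background())
	defer cancel()

	dataCh := make(chan int)
	done := make(chan struct{})
	go func() {
		defer close(done)
		readBlocks(ctx, n, dataCh)
	}()

	loaded, err := loadBlocks(dataCh, failOn)
	if err != nil {
		cancel() // tell the producer to stop instead of closing its channel
	}
	<-done // wait for the producer to exit so no goroutine leaks
	return loaded, err
}

func main() {
	fmt.Println(run(5, -1)) // no failure: every block is loaded
	fmt.Println(run(5, 2))  // consumer fails on value 2; producer is cancelled
}
```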

@linasm linasm self-assigned this Mar 9, 2021
Collaborator

@linasm linasm left a comment

Left some comments (mostly nits and clarity suggestions).

src/dbnode/storage/bootstrap/util.go (outdated, resolved)
src/dbnode/storage/shard.go (outdated, resolved)
src/dbnode/storage/shard.go (outdated, resolved)
src/dbnode/storage/types.go (resolved)
src/dbnode/storage/shard.go (outdated, resolved)
src/dbnode/storage/shard.go (outdated, resolved)
src/dbnode/storage/shard.go (outdated, resolved)
src/dbnode/storage/shard_test.go (outdated, resolved)
src/dbnode/storage/bootstrap/bootstrapper/peers/source.go (outdated, resolved)
* master:
  [dbnode] Remove unused shardBlockVolume (#3347)
  Fix new Go 1.15+ vet check failures (#3345)
  [coordinator] Add config option to make rollup rules untimed (#3343)
  [aggregator] Raw TCP Client write queueing/buffering refactor (#3342)
  [dbnode] Fail M3TSZ encoding on DeltaOfDelta overflow (#3329)
@linasm linasm removed their assignment Mar 10, 2021
* master:
  [dtest] ns update/delete api (#3344)
src/dbnode/storage/shard.go (outdated, resolved)
Collaborator

@linasm linasm left a comment

LGTM. Any thoughts on putting storage.seriesResolver into its own file rather than further growing this monster shard.go file (see #3316 (comment))?

@robskillington
Collaborator

I'd also support moving it into its own file.

tags.Close()

select {
case <-ctx.Done():
Collaborator

@robskillington robskillington Mar 12, 2021

This might be expensive to check on every single series (since reading from a channel requires synchronization). Perhaps we can check it only every N series? Say every 1024, just using a top-level var here: if i%1024 == 0 { /* check ctx.Done() */ }?

Collaborator

@robskillington robskillington left a comment

LGTM other than minor comments (which would be good to address before merging, but stamping so that you can address those asynchronously and then merge PR)

* master:
  [dbnode] Remove unused getMockReader (#3351)
  [dbnode] Add back teardown of containers at end of Prometheus integration test (#3349)
  [query] Add series metadata label limit header for trimming response (#3348)

# Conflicts:
#	src/dbnode/storage/shard_test.go
@soundvibe soundvibe requested a review from linasm March 15, 2021 12:18
* master:
  [dbnode] Fix clock options not propagated where needed (#3353)
src/dbnode/storage/shard.go (outdated, resolved)
src/dbnode/storage/series_resolver.go (outdated, resolved)
@soundvibe soundvibe merged commit 3521fd7 into master Mar 15, 2021
@soundvibe soundvibe deleted the linasn/series-resolver branch March 15, 2021 13:24