Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[dbnode] Account for Neg/Pos Offsets when building per field roaring bitmap posting lists #2213

Merged
merged 3 commits into from
Mar 16, 2020

Conversation

notbdu
Copy link
Contributor

@notbdu notbdu commented Mar 15, 2020

What this PR does / why we need it:

Need to account for negative/positive offsets when building per field postings lists from multiple segments.

Also adds a test to verify that per field postings lists built from multiple segments is correct.

Special notes for your reviewer:

Does this PR introduce a user-facing and/or backwards incompatible change?:

NONE

Does this PR require updating code package or user-facing documentation?:

NONE

@robskillington robskillington changed the title Account for Neg/Pos Offsets [dbnode] Account for Neg/Pos Offsets when building per field roaring bitmap posting lists Mar 16, 2020
for iter.Next() {
field, pl := iter.Current()
plIter := pl.Iterator()
for plIter.Next() {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe also do the inverse and make sure there's none that aren't included? (which is what we saw frequently, non-included metrics?)

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i.e. get a doc reader and read each one, then perhaps just check if it has the field whether the postings list contains the ID or whether it doesn't have it whether it doesn't contain it? (can use the postings lists Contains(id ID) bool method)

}

i.currFieldPostingsList.UnionMany(i.currFieldPostingsLists)
i.currReaders = append(i.currReaders, reader)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Perhaps add the reader right at the start in case you do a continue? It seems like if it's the first one you do a direct union and miss adding the reader to the i.currReaders?

return false
}
for _, reader := range i.currReaders {
if err := reader.Close(); err != nil {
Copy link
Collaborator

@robskillington robskillington Mar 16, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Instead of closing the readers here, perhaps do it from the defer above where we clear out the slice? That way any created readers always gets closed. Seems like there's some early returns in the method?

Something like

	defer func() {
		for idx, reader := range i.currReaders {
			if err := reader.Close(); err != nil {
				i.err = err
			}
			i.currReaders[idx] = nil
		}
		i.currReaders = i.currReaders[:0]
	}()

i.currFieldPostingsLists = append(i.currFieldPostingsLists, pl)
i.currReaders = append(i.currReaders, reader)
value := curr + fieldsKeyIter.segment.offset - negativeOffset
_ = i.currFieldPostingsList.Insert(value)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Probably should check the error of this insertion? (Also maybe we should add checking of this from the other code in the terms postings lists merging too? I think it also ignores error from insert)

@codecov
Copy link

codecov bot commented Mar 16, 2020

Codecov Report

Merging #2213 into master will decrease coverage by 12%.
The diff coverage is n/a.

Impacted file tree graph

@@            Coverage Diff            @@
##           master   #2213      +/-   ##
=========================================
- Coverage    72.3%   60.3%   -12.1%     
=========================================
  Files        1022     944      -78     
  Lines       88809   84697    -4112     
=========================================
- Hits        64275   51110   -13165     
- Misses      20240   29915    +9675     
+ Partials     4294    3672     -622
Flag Coverage Δ
#aggregator 82.1% <ø> (ø) ⬆️
#cluster 71.7% <ø> (-13.6%) ⬇️
#collector 50.9% <ø> (-32%) ⬇️
#dbnode 74.7% <ø> (-4.4%) ⬇️
#m3em 52.5% <ø> (-21.9%) ⬇️
#m3ninx 76.3% <ø> (+3.8%) ⬆️
#m3nsch 78% <ø> (+26.8%) ⬆️
#metrics 17.6% <ø> (ø) ⬆️
#msg 74.9% <ø> (+0.1%) ⬆️
#query 33.9% <ø> (-35%) ⬇️
#x 62.2% <ø> (-21.1%) ⬇️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 5289a5f...4ce0652. Read the comment docs.

doc := docIter.Current()
pID := docIter.PostingsID()
found := checkIfFieldExistsInDoc(field, doc)
require.Equal(t, found, pl.Contains(pID))
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nice 👍

Copy link
Collaborator

@robskillington robskillington left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@notbdu notbdu merged commit 548b247 into master Mar 16, 2020
@notbdu notbdu deleted the bdu/multi-seg-builder-fix branch March 16, 2020 18:01
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants