Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix query analyzer logic for mixed conjunctions of terms and ranges #49803

Merged
merged 6 commits into from
Dec 10, 2019

Conversation

romseygeek
Copy link
Contributor

When the query analyzer examines a conjunction containing both terms and ranges,
it should only include ranges in the minimum_should_match calculation if there are no
other range queries on that same field within the conjunction. This is because we cannot
build a selection query over disjoint ranges on the same field, and it is not easy to check
if two range queries have an overlap.

The current logic to calculate this just sets minimum_should_match to 1 or 0, dependent
on whether or not the current range is over a field that has already been seen. However, this
can be incorrect in the case that there are terms in the same match group which adjust the
minimum_should_match downwards. Instead, the logic should be changed to match the
terms extraction, whereby we adjust minimum_should_match downwards if we have already
seen a range field.

Fixes #49684

@romseygeek romseygeek added >bug :Search Relevance/Percolator Reverse search: find queries that match a document v8.0.0 v7.6.0 labels Dec 3, 2019
@romseygeek romseygeek self-assigned this Dec 3, 2019
@elasticmachine
Copy link
Collaborator

Pinging @elastic/es-search (:Search/Percolator)

Copy link
Contributor

@jpountz jpountz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This logic is tricky. :)

.build();

Result r = analyze(disj, Version.CURRENT);
assertThat(r.minimumShouldMatch, equalTo(1));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you count the number of extractions and check verified for all queries?

.build();

result = analyze(q2, Version.CURRENT);
assertThat(result.minimumShouldMatch, equalTo(2));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

let's maybe also have a test for the case that there are multiple range queries on the same field, or multiple range queries on different fields?

@romseygeek
Copy link
Contributor Author

Thanks for the review @jpountz . I added more tests, and discovered that the logic was still not quite correct - tricky, as you say... We now only check that range fields from a particular set of extractions have not been seen in other results. Multiple identical range fields within a single Result extraction set will have already been dealt with, so don't require any adjustment to the msm.

Copy link
Contributor

@jpountz jpountz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. It's hard for me to reason about all possible corner cases, but the tests give me some confidence that it's at least more correct than the previous logic.

@@ -286,6 +284,9 @@ private static Result handleConjunction(List<Result> conjunctionsWithUnknowns) {
verified = false;
}
}
// add range fields from this Result to the seenRangeFields set so that minimumShouldMatch is correctly
// calculated for subsequent Results
result.extractions.stream().filter(e -> e.range != null).forEach(e -> seenRangeFields.add(e.range.fieldName));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: I generally have a preference for using method refs when possible and using new lines when chaining, e.g.

Suggested change
result.extractions.stream().filter(e -> e.range != null).forEach(e -> seenRangeFields.add(e.range.fieldName));
result.extractions.stream()
.filter(Objects::nonNull)
.map(e -> e.range)
.forEach(seenRangeFields::add);

@romseygeek romseygeek merged commit fcae55a into elastic:master Dec 10, 2019
@romseygeek romseygeek deleted the percolator-test-failure branch December 10, 2019 10:44
romseygeek added a commit that referenced this pull request Dec 10, 2019
…49803)

When the query analyzer examines a conjunction containing both terms and ranges,
it should only include ranges in the minimum_should_match calculation if there are no
other range queries on that same field within the conjunction. This is because we cannot
build a selection query over disjoint ranges on the same field, and it is not easy to check
if two range queries have an overlap.

The current logic to calculate this just sets minimum_should_match to 1 or 0, dependent
on whether or not the current range is over a field that has already been seen. However, this
can be incorrect in the case that there are terms in the same match group which adjust the
minimum_should_match downwards. Instead, the logic should be changed to match the
terms extraction, whereby we adjust minimum_should_match downwards if we have already
seen a range field.

Fixes #49684
SivagurunathanV pushed a commit to SivagurunathanV/elasticsearch that referenced this pull request Jan 23, 2020
…lastic#49803)

When the query analyzer examines a conjunction containing both terms and ranges,
it should only include ranges in the minimum_should_match calculation if there are no
other range queries on that same field within the conjunction. This is because we cannot
build a selection query over disjoint ranges on the same field, and it is not easy to check
if two range queries have an overlap.

The current logic to calculate this just sets minimum_should_match to 1 or 0, dependent
on whether or not the current range is over a field that has already been seen. However, this
can be incorrect in the case that there are terms in the same match group which adjust the
minimum_should_match downwards. Instead, the logic should be changed to match the
terms extraction, whereby we adjust minimum_should_match downwards if we have already
seen a range field.

Fixes elastic#49684
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
>bug :Search Relevance/Percolator Reverse search: find queries that match a document v7.6.0 v8.0.0-alpha1
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[CI] org.elasticsearch.percolator.CandidateQueryTests.testDuel2 failure
4 participants