-
Notifications
You must be signed in to change notification settings - Fork 24.7k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Handle ignored fields directly in SourceValueFetcher #68738
Conversation
Pinging @elastic/es-search (Team:Search) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This feels like a good direction to me. It addresses two issues that I've been concerned about:
- The
fields
option tries to reconstruct the document parsing logic. This logic is really complex and we sometimes hit edge cases. The current error mode for these cases is to fail the entire request. This means that Kibana Discover, which uses"fields": ["*"]
can have a whole search fail because of one bad value in one document. (We should certainly aim to make this more robust, perhaps by sharing field loading logic with document parsing or working to simplify parsing behavior. But those efforts will take some time.) - We don't handle ignored fields correctly in when fetching fields
inner_hits
. This is because the list of ignored fields is stored on the root document, and is not easily available in the inner hits context.
Related to point 1, I have one major open question about this approach: can we avoid swallowing legitimate errors? Otherwise we may have some broken parsing logic, but never realize and just silently drop values. Maybe we could be more specific about what types of exceptions we swallow. We could also consider returning a list of fields/ values that were ignored as part of the response.
The first option is tricky because we're doing the Exception handling in SourceValueFetcher directly, and it doesn't know about any of the exceptions that may be thrown by the various Mapper implementations. I like the idea of the second option though - perhaps a new fetch phase option called |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I like the idea of the second option though - perhaps a new fetch phase option called show_malformed_values or something
I will give this some thought but am happy to move forward with this PR without an alternative ready for returning ignored fields/ values. It's a clear improvement over the current approach.
return fetcher.fetchValues(lookup); | ||
} | ||
|
||
public static List<?> fetchSourceValues(MappedFieldType fieldType, Object... values) throws IOException { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
One motivation for this PR is to return well-formed values for a document field, even if its some of its values were ignored. So it'd be good to add a test for this case explicitly.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've added a yaml test for this
@@ -83,7 +83,7 @@ public TokenCountFieldMapper build(ContentPath contentPath) { | |||
@Override | |||
public ValueFetcher valueFetcher(SearchExecutionContext context, String format) { | |||
if (hasDocValues() == false) { | |||
return (lookup, ignoredFields) -> List.of(); | |||
return (lookup) -> List.of(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Small comment, could be lookup -> List.of()
.
try { | ||
values.addAll((List<?>) parseSourceValue(sourceValue)); | ||
} | ||
catch (Exception e) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Small comment, we always put catch
on same line as previous brace.
@elasticmachine run elasticsearch-ci/bwc |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @romseygeek ! It'd be great to get this into 7.12 since it fixes an edge case.
- 2 | ||
- match: | ||
hits.hits.0.fields.products.0: { "manufacturer" : ["Supersoft"]} | ||
- match: | ||
hits.hits.0.fields.products.1: { "manufacturer" : ["HyperSmart"]} | ||
|
||
--- | ||
"Test ignores malformed values while returning valid ones": |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It might be nice to move this to FieldFetcherTests
, to prefer unit testing as much as possible.
@elasticmachine update branch |
@elasticmachine update branch |
Currently, the value fetcher framework handles ignored fields by reading the stored values of the _ignored metadata field, and passing these through on calls to fetchValues(). However, this means that if a document has multiple values indexed for a field, and one malformed value, then the fields API will ignore everything, including the valid values, and return an empty list for this document. If a document source contains a malformed value, then it must have been ignored at index time. Therefore, we can safely assume that if we get an exception parsing values from source at fetch time, they were also ignored at index time and they can be skipped. This commit moves this exception handling directly into SourceValueFetcher and ArraySourceValueFetcher, removing the need to inspect the _ignored metadata and fixing the case of mixed valid and invalid values.
Currently, the value fetcher framework handles ignored fields by reading the stored values of the _ignored metadata field, and passing these through on calls to fetchValues(). However, this means that if a document has multiple values indexed for a field, and one malformed value, then the fields API will ignore everything, including the valid values, and return an empty list for this document. If a document source contains a malformed value, then it must have been ignored at index time. Therefore, we can safely assume that if we get an exception parsing values from source at fetch time, they were also ignored at index time and they can be skipped. This commit moves this exception handling directly into SourceValueFetcher and ArraySourceValueFetcher, removing the need to inspect the _ignored metadata and fixing the case of mixed valid and invalid values.
Currently, the value fetcher framework handles ignored fields by reading the stored values of the _ignored metadata field, and passing these through on calls to fetchValues(). However, this means that if a document has multiple values indexed for a field, and one malformed value, then the fields API will ignore everything, including the valid values, and return an empty list for this document. If a document source contains a malformed value, then it must have been ignored at index time. Therefore, we can safely assume that if we get an exception parsing values from source at fetch time, they were also ignored at index time and they can be skipped. This commit moves this exception handling directly into SourceValueFetcher and ArraySourceValueFetcher, removing the need to inspect the _ignored metadata and fixing the case of mixed valid and invalid values.
The ValueFetcher for geo_shape will shortcut the validation of its source value if it detects that the source format and the requested format are the same. This worked fine when malformed values were dealt with by checking the _ignored metadata, but since #68738 we need to always validate source values at fetch time. This commit removes this special shortcut logic, and adds tests to check that geo_shape value fetchers do not return malformed source inputs. Fixes #69071
The ValueFetcher for geo_shape will shortcut the validation of its source value if it detects that the source format and the requested format are the same. This worked fine when malformed values were dealt with by checking the _ignored metadata, but since elastic#68738 we need to always validate source values at fetch time. This commit removes this special shortcut logic, and adds tests to check that geo_shape value fetchers do not return malformed source inputs. Fixes elastic#69071
The ValueFetcher for geo_shape will shortcut the validation of its source value if it detects that the source format and the requested format are the same. This worked fine when malformed values were dealt with by checking the _ignored metadata, but since #68738 we need to always validate source values at fetch time. This commit removes this special shortcut logic, and adds tests to check that geo_shape value fetchers do not return malformed source inputs. Fixes #69071
The ValueFetcher for geo_shape will shortcut the validation of its source value if it detects that the source format and the requested format are the same. This worked fine when malformed values were dealt with by checking the _ignored metadata, but since #68738 we need to always validate source values at fetch time. This commit removes this special shortcut logic, and adds tests to check that geo_shape value fetchers do not return malformed source inputs. Fixes #69071
Currently, the value fetcher framework handles ignored fields by reading
the stored values of the
_ignored
metadata field, and passing these throughon calls to
fetchValues()
. However, this means that if a document has multiplevalues indexed for a field, and one malformed value, then the fields API will
ignore everything, including the valid values, and return an empty list for this
document.
If a document source contains a malformed value, then it must have been
ignored at index time. Therefore, we can safely assume that if we get an
exception parsing values from source at fetch time, they were also ignored
at index time and they can be skipped. This commit moves this exception
handling directly into SourceValueFetcher and ArraySourceValueFetcher,
removing the need to inspect the
_ignored
metadata and fixing the caseof mixed valid and invalid values.