Handle ignored fields directly in SourceValueFetcher #68738

romseygeek · 2021-02-09T11:42:50Z

Currently, the value fetcher framework handles ignored fields by reading
the stored values of the _ignored metadata field, and passing these through
on calls to fetchValues(). However, this means that if a document has multiple
values indexed for a field, and one malformed value, then the fields API will
ignore everything, including the valid values, and return an empty list for this
document.

If a document source contains a malformed value, then it must have been
ignored at index time. Therefore, we can safely assume that if we get an
exception parsing values from source at fetch time, they were also ignored
at index time and they can be skipped. This commit moves this exception
handling directly into SourceValueFetcher and ArraySourceValueFetcher,
removing the need to inspect the _ignored metadata and fixing the case
of mixed valid and invalid values.
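Below is a minimal sketch of the per-value handling described above. It assumes a simplified fetcher that receives the raw source values as a list. The class name SkippingSourceValueFetcher and the parser function are hypothetical: the real SourceValueFetcher reads values through a lookup and calls parseSourceValue(). The key point is that the try/catch wraps each value on its own, so one malformed value no longer hides the valid ones.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Function;

// Hypothetical, simplified stand-in for SourceValueFetcher. The real class extracts raw
// values from _source through a lookup; here the raw values are passed in directly, and
// `parser` plays the role of parseSourceValue().
class SkippingSourceValueFetcher {
    private final Function<Object, Object> parser;

    SkippingSourceValueFetcher(Function<Object, Object> parser) {
        this.parser = parser;
    }

    List<Object> fetchValues(List<?> sourceValues) {
        List<Object> values = new ArrayList<>();
        for (Object sourceValue : sourceValues) {
            try {
                values.add(parser.apply(sourceValue));
            } catch (Exception e) {
                // Following this PR's reasoning, a value that fails to parse here must have
                // been ignored at index time, so drop it and keep the well-formed values.
            }
        }
        return values;
    }
}
```

With a parser such as v -> Long.parseLong(v.toString()), the source values ["1", "not-a-number", "3"] yield [1, 3] rather than an empty list.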

elasticmachine · 2021-02-09T11:42:53Z

Pinging @elastic/es-search (Team:Search)

jtibshirani

This feels like a good direction to me. It addresses two issues that I've been concerned about:

1. The fields option tries to reconstruct the document parsing logic. This logic is really complex and we sometimes hit edge cases. The current error mode for these cases is to fail the entire request. This means that Kibana Discover, which uses "fields": ["*"], can have a whole search fail because of one bad value in one document. (We should certainly aim to make this more robust, perhaps by sharing field loading logic with document parsing or working to simplify parsing behavior. But those efforts will take some time.)
2. We don't handle ignored fields correctly when fetching fields in inner_hits. This is because the list of ignored fields is stored on the root document and is not easily available in the inner hits context.

Related to point 1, I have one major open question about this approach: can we avoid swallowing legitimate errors? Otherwise we may have some broken parsing logic but never realize it, and just silently drop values. Maybe we could be more specific about what types of exceptions we swallow. We could also consider returning a list of fields/values that were ignored as part of the response.

romseygeek · 2021-02-10T10:21:13Z

> Maybe we could be more specific about what types of exceptions we swallow. We could also consider returning a list of fields/values that were ignored as part of the response.

The first option is tricky because we're doing the exception handling in SourceValueFetcher directly, and it doesn't know about any of the exceptions that may be thrown by the various Mapper implementations. I like the idea of the second option though: perhaps a new fetch phase option called show_malformed_values or something, which would return each ignored value along with its parsing exception?
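The show_malformed_values option is only a proposal in this thread. As a rough sketch of what such a fetcher might collect, assume each ignored value is kept together with its parse exception instead of being silently dropped. FetchResult, ReportingValueFetcher and their fields are hypothetical names, not an Elasticsearch API.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Function;

// Hypothetical result holder for the proposed option: well-formed values plus the
// malformed raw values and a short description of why each one failed to parse.
class FetchResult {
    final List<Object> values = new ArrayList<>();
    final List<Object> malformedValues = new ArrayList<>();
    final List<String> errors = new ArrayList<>();
}

// Hypothetical fetcher: same per-value try/catch, but failures are reported, not swallowed.
class ReportingValueFetcher {
    private final Function<Object, Object> parser;

    ReportingValueFetcher(Function<Object, Object> parser) {
        this.parser = parser;
    }

    FetchResult fetchValues(List<?> sourceValues) {
        FetchResult result = new FetchResult();
        for (Object sourceValue : sourceValues) {
            try {
                result.values.add(parser.apply(sourceValue));
            } catch (Exception e) {
                result.malformedValues.add(sourceValue);
                result.errors.add(e.getClass().getSimpleName() + ": " + e.getMessage());
            }
        }
        return result;
    }
}
```

Reporting the failures this way would also address the concern about broken parsing logic going unnoticed, since a dropped value would be visible in the response rather than simply missing.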

jtibshirani

> I like the idea of the second option though - perhaps a new fetch phase option called show_malformed_values or something

I will give this some thought, but am happy to move forward with this PR without an alternative ready for returning ignored fields/values. It's a clear improvement over the current approach.

jtibshirani · 2021-02-11T01:08:59Z

test/framework/src/main/java/org/elasticsearch/index/mapper/FieldTypeTestCase.java

+        return fetcher.fetchValues(lookup);
+    }
+
+    public static List<?> fetchSourceValues(MappedFieldType fieldType, Object... values) throws IOException {


One motivation for this PR is to return well-formed values for a document field, even if some of its values were ignored. So it'd be good to add a test for this case explicitly.

I've added a YAML test for this.

jtibshirani · 2021-02-11T01:19:57Z

modules/mapper-extras/src/main/java/org/elasticsearch/index/mapper/TokenCountFieldMapper.java

@@ -83,7 +83,7 @@ public TokenCountFieldMapper build(ContentPath contentPath) {
        @Override
        public ValueFetcher valueFetcher(SearchExecutionContext context, String format) {
            if (hasDocValues() == false) {
-                return (lookup, ignoredFields) -> List.of();
+                return (lookup) -> List.of();


Small comment: this could be lookup -> List.of().

jtibshirani · 2021-02-11T01:21:15Z

server/src/main/java/org/elasticsearch/index/mapper/ArraySourceValueFetcher.java

+            try {
+                values.addAll((List<?>) parseSourceValue(sourceValue));
+            }
+            catch (Exception e) {


Small comment: we always put catch on the same line as the previous closing brace.
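For ArraySourceValueFetcher, parsing one raw source value yields a list of values (hence the values.addAll(...) in the diff). The following is a simplified sketch of the same skip-on-failure idea, written with catch on the same line as the closing brace. The class name and parser are hypothetical, and as in the diff, a failure discards only the values produced from that one raw source value.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Function;

// Hypothetical, simplified stand-in for ArraySourceValueFetcher: `parser` plays the role of
// parseSourceValue() and turns one raw source value into a list of values.
class SkippingArraySourceValueFetcher {
    private final Function<Object, List<?>> parser;

    SkippingArraySourceValueFetcher(Function<Object, List<?>> parser) {
        this.parser = parser;
    }

    List<Object> fetchValues(List<?> sourceValues) {
        List<Object> values = new ArrayList<>();
        for (Object sourceValue : sourceValues) {
            try {
                values.addAll(parser.apply(sourceValue));
            } catch (Exception e) {
                // Skip only this raw value; values parsed from other entries are kept.
            }
        }
        return values;
    }
}
```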

romseygeek · 2021-02-15T11:13:18Z

@elasticmachine run elasticsearch-ci/bwc

jtibshirani

Thanks @romseygeek! It'd be great to get this into 7.12 since it fixes an edge case.

jtibshirani · 2021-02-15T20:24:48Z

rest-api-spec/src/main/resources/rest-api-spec/test/search/330_fetch_fields.yml

        - 2
  - match:
        hits.hits.0.fields.products.0: { "manufacturer" : ["Supersoft"]}
  - match:
        hits.hits.0.fields.products.1: { "manufacturer" : ["HyperSmart"]}
+
+---
+"Test ignores malformed values while returning valid ones":


It might be nice to move this to FieldFetcherTests, to prefer unit testing as much as possible.

romseygeek · 2021-02-16T13:42:35Z

@elasticmachine update branch

romseygeek · 2021-02-16T14:23:55Z

@elasticmachine update branch


The ValueFetcher for geo_shape will shortcut the validation of its source value if it detects that the source format and the requested format are the same. This worked fine when malformed values were dealt with by checking the _ignored metadata, but since #68738 we need to always validate source values at fetch time. This commit removes this special shortcut logic, and adds tests to check that geo_shape value fetchers do not return malformed source inputs. Fixes #69071
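Here is a sketch of the "always validate" rule. It assumes a hypothetical fetcher where parser validates a raw source value and formatter renders it in the requested format; these names are illustrative and are not the real geo_shape classes. There is no shortcut that returns the raw value when the source and requested formats match, so a malformed shape is always caught and skipped.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Function;

// Hypothetical fetcher: `parser` validates a raw source value (throwing if it is malformed)
// and `formatter` renders the parsed value in the requested output format.
class ValidatingShapeFetcher<T> {
    private final Function<Object, T> parser;
    private final Function<T, Object> formatter;

    ValidatingShapeFetcher(Function<Object, T> parser, Function<T, Object> formatter) {
        this.parser = parser;
        this.formatter = formatter;
    }

    List<Object> fetchValues(List<?> sourceValues) {
        List<Object> values = new ArrayList<>();
        for (Object sourceValue : sourceValues) {
            T parsed;
            try {
                // Always parse, even if the requested format matches the source format:
                // returning sourceValue unchanged would pass malformed input straight through.
                parsed = parser.apply(sourceValue);
            } catch (Exception e) {
                continue; // malformed, so it was ignored at index time
            }
            values.add(formatter.apply(parsed));
        }
        return values;
    }
}
```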


Handle ignored fields directly in SourceValueFetcher

c738b87

romseygeek added >bug :Search/Search Search-related issues that do not fall into other categories v8.0.0 v7.12.0 labels Feb 9, 2021

romseygeek requested review from jtibshirani and cbuescher February 9, 2021 11:42

romseygeek self-assigned this Feb 9, 2021

elasticmachine added the Team:Search Meta label for search team label Feb 9, 2021

Merge remote-tracking branch 'origin/master' into fetch/ignored-fields

174f3bb

jtibshirani reviewed Feb 9, 2021


Merge remote-tracking branch 'origin/master' into fetch/ignored-fields

0b2572a

jtibshirani reviewed Feb 11, 2021


romseygeek added 2 commits February 15, 2021 10:18

Merge remote-tracking branch 'origin/master' into fetch/ignored-fields

edcf43d

Add YAML test; small cleanups

b793998

jtibshirani approved these changes Feb 15, 2021


romseygeek added 2 commits February 16, 2021 11:52

Merge remote-tracking branch 'origin/master' into fetch/ignored-fields

38cc321

Add a unit test as well

865f332

Merge branch 'master' into fetch/ignored-fields

3caf378

Merge branch 'master' into fetch/ignored-fields

6cab139

romseygeek merged commit 8fba6e4 into elastic:master Feb 16, 2021

romseygeek deleted the fetch/ignored-fields branch February 16, 2021 15:19

jtibshirani mentioned this pull request Feb 16, 2021

FieldExtractorIT#testGeoShapeField throws XContentParseException #69071

Closed

romseygeek mentioned this pull request Feb 17, 2021

Always validate geo shapes when fetching #69104

Merged

romseygeek mentioned this pull request Feb 17, 2021

Adjust YAML test skip value after backport #69105

Merged

romseygeek added a commit that referenced this pull request Feb 17, 2021

Adjust YAML test skip value after backport (#69105)

2c73387

Relates to #68738

romseygeek mentioned this pull request Mar 1, 2021

Always validate geo shapes when fetching (#69104) #69684

Merged

jakelandis added v8.0.0-alpha1 and removed v8.0.0 labels Jul 26, 2021
