Avoid eagerly loading StoredFieldsReader in fetch phase #83693

jtibshirani · 2022-02-08T22:24:17Z

Every time we create a hit document, we create a new SourceLookup and call
setSegmentAndDocument. This in turn creates a new StoredFieldsReader, which is
pretty expensive. In scenarios where you are retrieving a lot of hits, this can
add significant overhead. Prior to version 7.11, we did not create a new
SourceLookup per hit, so this is a performance regression.

This PR updates setSegmentAndDocument to avoid eagerly creating a
new StoredFieldsReader (through StoredFieldsReader#getMergeInstance).

Closes #82777.
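
To make the cost model concrete, here is a minimal, self-contained Java sketch. All names (`LazyReaderDemo`, `ExpensiveReader`, `EagerLookup`, `LazyLookup`, `fetchWithoutSource`) are hypothetical stand-ins, not Elasticsearch or Lucene classes. `ExpensiveReader` plays the role of the merge-instance `StoredFieldsReader`. The sketch assumes one lookup object per hit, as described above, and a fetch that never reads `_source` (for example, one that only pulls doc values, as in #82777).

```java
/**
 * Hypothetical stand-ins (not Elasticsearch or Lucene classes) contrasting eager and lazy
 * creation of an expensive reader when one lookup object is created per hit.
 */
public class LazyReaderDemo {

    /** Counts how many "expensive" readers have been constructed. */
    static int readersCreated = 0;

    /** Hypothetical stand-in for a merge-instance StoredFieldsReader. */
    static final class ExpensiveReader {
        ExpensiveReader() {
            readersCreated++; // pretend this allocation is costly
        }

        String loadSource(int docId) {
            return "{\"doc\":" + docId + "}";
        }
    }

    /** Eager: the reader is built as soon as the document is set, whether or not it is used. */
    static final class EagerLookup {
        private ExpensiveReader reader;
        private int docId;

        void setDocument(int docId) {
            this.reader = new ExpensiveReader();
            this.docId = docId;
        }

        String source() {
            return reader.loadSource(docId);
        }
    }

    /** Lazy: the reader is built on the first source() call and then reused by this lookup. */
    static final class LazyLookup {
        private ExpensiveReader reader;
        private int docId;

        void setDocument(int docId) {
            this.docId = docId;
        }

        String source() {
            if (reader == null) {
                reader = new ExpensiveReader();
            }
            return reader.loadSource(docId);
        }
    }

    /** Simulates a fetch that creates one lookup per hit but never reads _source. */
    static int fetchWithoutSource(boolean lazy, int hits) {
        readersCreated = 0;
        for (int doc = 0; doc < hits; doc++) {
            if (lazy) {
                new LazyLookup().setDocument(doc);
            } else {
                new EagerLookup().setDocument(doc);
            }
        }
        return readersCreated;
    }

    public static void main(String[] args) {
        System.out.println("eager: " + fetchWithoutSource(false, 1000)); // eager: 1000
        System.out.println("lazy: " + fetchWithoutSource(true, 1000)); // lazy: 0
    }
}
```

With eager creation the number of readers grows with the number of hits even though none of them is used; with lazy creation a reader is only built when a hit actually asks for its source.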

elasticsearchmachine · 2022-02-08T22:24:41Z

Hi @jtibshirani, I've created a changelog YAML for you.

jtibshirani · 2022-02-08T22:30:19Z

Some notes:

- An alternative would be to introduce SourceLookup#setDoc, which sets the document without the leaf reader context. This felt less solid, and I wanted to avoid touching the HitContext: source handling in the fetch phase is very complex and doesn't have strong test coverage!
- This gives more motivation for adding fetch benchmarks (rally-tracks#199). I'm also guessing that EQL tests, which didn't exist back in 7.11, might have caught it.

jtibshirani · 2022-02-08T22:31:19Z

libs/core/src/main/java/org/elasticsearch/core/MemoizedSupplier.java

+
+import java.util.function.Supplier;
+
+public class MemoizedSupplier<T> implements Supplier<T> {


I restored this class, which was only very recently deleted because it was unused.
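
For context, a memoizing supplier of this kind can be sketched in a few lines. This is an assumption about its general shape, not the restored source: it runs the delegate on the first `get()`, caches the result (including `null`), and drops the delegate afterwards. It is not thread-safe, so it assumes single-threaded use.

```java
import java.util.function.Supplier;

/**
 * Minimal sketch of a memoizing Supplier: the delegate runs on the first get()
 * and its result is returned on every later call. Not thread-safe.
 */
public class MemoizedSupplier<T> implements Supplier<T> {
    // Set to null after the first call so the delegate (and anything it captures) can be collected.
    private Supplier<T> supplier;
    private T value;

    public MemoizedSupplier(Supplier<T> supplier) {
        this.supplier = supplier;
    }

    @Override
    public T get() {
        if (supplier != null) {
            value = supplier.get(); // the expensive work happens here, at most once
            supplier = null;
        }
        return value;
    }
}
```

Wrapping a call such as `lf::getSequentialStoredFieldsReader` in it means the underlying reader is built on the first `get()` and never before. Clearing `supplier` rather than checking `value == null` keeps a `null` result memoized too.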

jtibshirani · 2022-02-08T22:34:12Z

server/src/main/java/org/elasticsearch/search/lookup/SourceLookup.java

+            // get better sequential access.
+            if (context.reader() instanceof SequentialStoredFieldsLeafReader lf) {
+                // Avoid eagerly loading the stored fields reader, since this can be expensive
+                Supplier<StoredFieldsReader> supplier = new MemoizedSupplier<>(lf::getSequentialStoredFieldsReader);


The main issue is that getSequentialStoredFieldsReader calls StoredFieldsReader#getMergeInstance, which creates a new stored fields reader. I wonder if this could be optimized in Lucene to avoid recreating it every time 🤔
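
A self-contained sketch of the pattern in this hunk, using hypothetical stand-ins (`SegmentLookupSketch`, `Segment`, `FieldsReader`, `Lookup`, `memoize`) rather than the real Lucene and Elasticsearch types. `Segment#getSequentialStoredFieldsReader` builds a new reader on every call, mirroring the getMergeInstance behavior described here. The sketch also assumes the field reader is only rebuilt when the segment changes; that part is not shown in the hunk.

```java
import java.util.function.Supplier;

/** Hypothetical sketch of lazily creating a per-segment stored fields reader. */
public class SegmentLookupSketch {

    /** Hypothetical stand-in for StoredFieldsReader. */
    @FunctionalInterface
    interface FieldsReader {
        String visitDocument(int docId);
    }

    /** Hypothetical stand-in for a leaf reader exposing a sequential stored fields reader. */
    static final class Segment {
        int readersCreated = 0;

        // Every call builds a fresh reader, like a getMergeInstance() call would.
        FieldsReader getSequentialStoredFieldsReader() {
            readersCreated++;
            return docId -> "source of doc " + docId;
        }
    }

    /** Hypothetical lookup mirroring the shape of the hunk; not the Elasticsearch source. */
    static final class Lookup {
        private Segment segment;
        private int docId = -1;
        private FieldsReader fieldReader;

        void setSegmentAndDocument(Segment segment, int docId) {
            if (this.segment != segment) {
                // Assumption: the field reader is only rebuilt when the segment changes.
                this.segment = segment;
                // Nothing expensive happens here; the reader is built on first use.
                Supplier<FieldsReader> supplier = memoize(segment::getSequentialStoredFieldsReader);
                fieldReader = d -> supplier.get().visitDocument(d);
            }
            this.docId = docId;
        }

        String source() {
            return fieldReader.visitDocument(docId);
        }
    }

    /** Inline memoizer so the sketch is self-contained; not thread-safe. */
    static <T> Supplier<T> memoize(Supplier<T> delegate) {
        return new Supplier<T>() {
            private boolean computed;
            private T value;

            @Override
            public T get() {
                if (!computed) {
                    value = delegate.get();
                    computed = true;
                }
                return value;
            }
        };
    }

    public static void main(String[] args) {
        Segment segment = new Segment();
        Lookup lookup = new Lookup();
        for (int doc = 0; doc < 3; doc++) {
            lookup.setSegmentAndDocument(segment, doc); // no _source access
        }
        System.out.println(segment.readersCreated); // 0
        lookup.source();
        lookup.source();
        System.out.println(segment.readersCreated); // 1
    }
}
```

Since a new lookup is created per hit, in this sketch it is the laziness, not reuse across hits, that avoids the cost when `_source` is never read; a lookup that does read its source still pays for one reader creation.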

elasticmachine · 2022-02-09T05:39:48Z

Pinging @elastic/es-search (Team:Search)

romseygeek

LGTM

romseygeek · 2022-02-09T11:35:02Z

server/src/main/java/org/elasticsearch/search/lookup/SourceLookup.java

+            // get better sequential access.
+            if (context.reader() instanceof SequentialStoredFieldsLeafReader lf) {
+                // Avoid eagerly loading the stored fields reader, since this can be expensive
+                Supplier<StoredFieldsReader> supplier = new MemoizedSupplier<>(lf::getSequentialStoredFieldsReader);


I think ideally we'd like to push as much of this up into Lucene as possible, as it is all hacks at the moment and if anything changes in how stored fields are merged then we are in trouble.

ywelsch

LGTM. Thanks @jtibshirani.

elasticsearchmachine · 2022-02-09T17:27:55Z

Hi @jtibshirani, I've updated the changelog YAML for you.

jtibshirani · 2022-02-09T17:28:39Z

Thanks for the reviews! I forgot to link the original issue: #82777.

jtibshirani · 2022-02-09T17:56:27Z

@elasticmachine ok to test

jtibshirani added >bug :Search/Search v8.1.0 v7.17.1 v8.0.1 labels Feb 8, 2022

elasticsearchmachine added the v8.2.0 label Feb 8, 2022

Update docs/changelog/83693.yaml

51d18b0

jtibshirani commented Feb 8, 2022

jtibshirani added 2 commits February 8, 2022 15:22

Fix spotless

6619a44

Improve test

0e03617

jtibshirani marked this pull request as ready for review February 9, 2022 05:39

elasticmachine added the Team:Search label Feb 9, 2022

romseygeek approved these changes Feb 9, 2022

ywelsch approved these changes Feb 9, 2022

Update docs/changelog/83693.yaml

9394682

jtibshirani mentioned this pull request Feb 9, 2022

Performance degradation for new HitContext constructor when only pulling docvalues #82777

Closed

jtibshirani merged commit 4e28da4 into elastic:master Feb 9, 2022

jtibshirani deleted the stored-fields-reader branch February 9, 2022 19:20

jtibshirani added the backport pending label Feb 9, 2022

jtibshirani mentioned this pull request Feb 9, 2022

Avoid eagerly loading StoredFieldsReader in fetch phase #83756

Merged

jtibshirani removed the backport pending label Feb 17, 2022
