Add guide for tuning kNN search (#89782)
This 'how to' guide explains performance considerations specific to kNN search.
It takes inspiration from the 'tune for search speed' guide.
jtibshirani committed Oct 12, 2022
1 parent 44592b0 commit f4038b3
Showing 3 changed files with 145 additions and 5 deletions.
2 changes: 2 additions & 0 deletions docs/reference/how-to.asciidoc
@@ -23,6 +23,8 @@ include::how-to/indexing-speed.asciidoc[]

include::how-to/search-speed.asciidoc[]

include::how-to/knn-search.asciidoc[]

include::how-to/disk-usage.asciidoc[]

include::how-to/size-your-shards.asciidoc[]
134 changes: 134 additions & 0 deletions docs/reference/how-to/knn-search.asciidoc
@@ -0,0 +1,134 @@
[[tune-knn-search]]
== Tune approximate kNN search

{es} supports <<approximate-knn, approximate k-nearest neighbor search>> for
efficiently finding the _k_ nearest vectors to a query vector. Since
approximate kNN search works differently from other queries, there are special
considerations around its performance.

Many of these recommendations help improve search speed. With approximate kNN,
the indexing algorithm runs searches under the hood to create the vector index
structures. So these same recommendations also help with indexing speed.

[discrete]
=== Prefer `dot_product` over `cosine`

When indexing vectors for approximate kNN search, you need to specify the
<<dense-vector-similarity, `similarity` function>> for comparing the vectors.
If you'd like to compare vectors through cosine similarity, there are two
options.

The `cosine` option accepts any float vector and computes the cosine
similarity. While this is convenient for testing, it's not the most efficient
approach. Instead, we recommend using the `dot_product` option to compute the
similarity. To use `dot_product`, all vectors need to be normalized in advance
to have length 1. The `dot_product` option is significantly faster, since it
avoids performing extra vector length computations during the search.
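
For example, a minimal mapping sketch that indexes normalized vectors with the
`dot_product` similarity might look like the following. The index name
`my-knn-index`, the field name `my_vector`, and the dimension count are
illustrative:

[source,console]
--------------------------------------------------
PUT my-knn-index
{
  "mappings": {
    "properties": {
      "my_vector": {
        "type": "dense_vector",
        "dims": 128,
        "index": true,
        "similarity": "dot_product"
      }
    }
  }
}
--------------------------------------------------

Remember that with `dot_product`, every indexed vector and every query vector
needs to be normalized to unit length in advance.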

[discrete]
=== Ensure data nodes have enough memory

{es} uses the https://arxiv.org/abs/1603.09320[HNSW] algorithm for approximate
kNN search. HNSW is a graph-based algorithm which only works efficiently when
most vector data is held in memory. You should ensure that data nodes have at
least enough RAM to hold the vector data and index structures. To check the
size of the vector data, you can use the <<indices-disk-usage>> API. As a
loose rule of thumb, and assuming the default HNSW options, the required memory
is roughly `num_vectors * 4 * (num_dimensions + 32)` bytes. Note that the required
RAM is for the filesystem cache, which is separate from the Java heap.
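
For example, under this rule of thumb an index with 1 million 768-dimensional
vectors needs roughly `1,000,000 * 4 * (768 + 32)` bytes, or about 3.2 GB of
filesystem cache. To check the actual size of the vector data, a sketch of the
disk usage request might look like this (the index name is illustrative, and
the `run_expensive_tasks` flag must be enabled for the analysis to run):

[source,console]
--------------------------------------------------
POST my-knn-index/_disk_usage?run_expensive_tasks=true
--------------------------------------------------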

The data nodes should also leave a "buffer" for other ways that RAM is
needed. For example, your index might also include text fields and numerics,
which also benefit from the filesystem cache. It's recommended to run
benchmarks with your specific dataset to ensure there's a sufficient amount of
memory to give good search performance.

[discrete]
include::search-speed.asciidoc[tag=warm-fs-cache]

[discrete]
=== Reduce vector dimensionality

The speed of kNN search scales linearly with the number of vector dimensions,
because each similarity computation considers each element in the two vectors.
Whenever possible, it's better to use vectors with a lower dimension. Some
embedding models come in different "sizes", with both lower and higher
dimensional options available. You could also experiment with dimensionality
reduction techniques like PCA. When experimenting with different approaches,
it's important to measure the impact on relevance to ensure the search
quality is still acceptable.

[discrete]
=== Exclude vector fields from `_source`

{es} stores the original JSON document that was passed at index time in the
<<mapping-source-field, `_source` field>>. By default, each hit in the search
results contains the full document `_source`. When the documents contain
high-dimensional `dense_vector` fields, the `_source` can be quite large and
expensive to load. This could significantly slow down the speed of kNN search.

You can disable storing `dense_vector` fields in the `_source` through the
<<include-exclude, `excludes`>> mapping parameter. This prevents loading and
returning large vectors during search, and also cuts down on the index size.
Vectors that have been omitted from `_source` can still be used in kNN search,
since it relies on separate data structures to perform the search. Before
using the <<include-exclude, `excludes`>> parameter, make sure to review the
downsides of omitting fields from `_source`.
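
A minimal sketch of a mapping that excludes a vector field from `_source`
might look like the following (names and dimensions are illustrative):

[source,console]
--------------------------------------------------
PUT my-knn-index
{
  "mappings": {
    "_source": {
      "excludes": [
        "my_vector"
      ]
    },
    "properties": {
      "my_vector": {
        "type": "dense_vector",
        "dims": 128,
        "index": true,
        "similarity": "dot_product"
      }
    }
  }
}
--------------------------------------------------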

[discrete]
=== Reduce the number of index segments

{es} shards are composed of segments, which are internal storage elements in
the index. For approximate kNN search, {es} stores the dense vector values of
each segment as an HNSW graph. kNN search must check each segment, searching
through one HNSW graph after another. This means kNN search can be
significantly faster if there are fewer segments. By default, {es} periodically
merges smaller segments into larger ones through a background
<<index-modules-merge, merge process>>. If this isn't sufficient, you can take
explicit steps to reduce the number of index segments.
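
To see how many segments each shard currently contains, one option is the cat
segments API, sketched below with an illustrative index name:

[source,console]
--------------------------------------------------
GET _cat/segments/my-knn-index?v=true
--------------------------------------------------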

[discrete]
==== Force merge to one segment

The <<indices-forcemerge,force merge>> operation forces an index merge. If you
force merge to one segment, the kNN search only needs to check a single,
all-inclusive HNSW graph. Force merging `dense_vector` fields is an expensive
operation that can take significant time to complete.

include::{es-repo-dir}/indices/forcemerge.asciidoc[tag=force-merge-read-only-warn]
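
As a sketch, with an illustrative index name, a force merge down to a single
segment looks like this:

[source,console]
--------------------------------------------------
POST my-knn-index/_forcemerge?max_num_segments=1
--------------------------------------------------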

[discrete]
==== Create large segments during bulk indexing

A common pattern is to first perform an initial bulk upload, then make an
index available for searches. Instead of force merging, you can adjust the
index settings to encourage {es} to create larger initial segments:

* Ensure there are no searches during the bulk upload and disable
<<index-refresh-interval-setting,`index.refresh_interval`>> by setting it to
`-1`. This prevents refresh operations and avoids creating extra segments.
* Give {es} a large indexing buffer so it can accept more documents before
flushing. By default, the <<indexing-buffer,`indices.memory.index_buffer_size`>>
is set to 10% of the heap size. With a substantial heap size like 32GB, this
is often enough. To allow the full indexing buffer to be used, you should also
increase the <<index-modules-translog,`index.translog.flush_threshold_size`>>
limit, as in the sketch after this list.
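
As a sketch, the index-level settings mentioned above could be applied like
this before the bulk upload. The index name and the `2gb` threshold are
illustrative, and `indices.memory.index_buffer_size` is a node-level setting
that belongs in `elasticsearch.yml` rather than in this request:

[source,console]
--------------------------------------------------
PUT my-knn-index/_settings
{
  "index.refresh_interval": "-1",
  "index.translog.flush_threshold_size": "2gb"
}
--------------------------------------------------

Once the initial upload is complete, you can restore `index.refresh_interval`
to its previous value.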

[discrete]
=== Avoid heavy indexing during searches

Actively indexing documents can have a negative impact on approximate kNN
search performance, since indexing threads steal compute resources from
search. When indexing and searching at the same time, {es} also refreshes
frequently, which creates several small segments. This also hurts search
performance, since approximate kNN search is slower when there are more
segments.

When possible, it's best to avoid heavy indexing during approximate kNN
search. If you need to reindex all the data, perhaps because the vector
embedding model changed, then it's better to reindex the new documents into a
separate index rather than update them in-place. This helps avoid the slowdown
mentioned above, and prevents expensive merge operations due to frequent
document updates.
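
A minimal reindex sketch, assuming the destination index `my-new-knn-index`
has already been created with the new vector mapping (both index names are
illustrative):

[source,console]
--------------------------------------------------
POST _reindex
{
  "source": {
    "index": "my-old-knn-index"
  },
  "dest": {
    "index": "my-new-knn-index"
  }
}
--------------------------------------------------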

[discrete]
include::search-speed.asciidoc[tag=readahead]
14 changes: 9 additions & 5 deletions docs/reference/how-to/search-speed.asciidoc
@@ -10,6 +10,7 @@ goes to the filesystem cache so that Elasticsearch can keep hot regions of the
index in physical memory.

[discrete]
tag::readahead[]
=== Avoid page cache thrashing by using modest readahead values on Linux

Search can cause a lot of randomized read I/O. When the underlying block
@@ -34,6 +35,7 @@ as a transient setting). We recommend a value of `128KiB` for readahead.
WARNING: `blockdev` expects values in 512 byte sectors whereas `lsblk` reports
values in `KiB`. As an example, to temporarily set readahead to `128KiB`
for `/dev/nvme0n1`, specify `blockdev --setra 256 /dev/nvme0n1`.
end::readahead[]

[discrete]
=== Use faster hardware
@@ -356,6 +358,7 @@ PUT index
}
--------------------------------------------------

tag::warm-fs-cache[]
[discrete]
=== Warm up the filesystem cache

Expand All @@ -369,6 +372,7 @@ depending on the file extension using the
WARNING: Loading data into the filesystem cache eagerly on too many indices or
too many files will make search _slower_ if the filesystem cache is not large
enough to hold all the data. Use with caution.
end::warm-fs-cache[]

[discrete]
=== Use index sorting to speed up conjunctions
@@ -422,15 +426,15 @@ right number of replicas for you is

=== Tune your queries with the Search Profiler

The {ref}/search-profile.html[Profile API] provides detailed information about
how each component of your queries and aggregations impacts the time it takes
to process the request.

The {kibana-ref}/xpack-profiler.html[Search Profiler] in {kib}
makes it easy to navigate and analyze the profile results and
gives you insight into how to tune your queries to improve performance and reduce load.

Because the Profile API itself adds significant overhead to the query,
this information is best used to understand the relative cost of the various
query components. It does not provide a reliable measure of actual processing time.
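
A minimal sketch of a profiled search, with illustrative index and field
names:

[source,console]
--------------------------------------------------
GET my-index/_search
{
  "profile": true,
  "query": {
    "match": {
      "title": "tune for search speed"
    }
  }
}
--------------------------------------------------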

