Add guide for tuning kNN search (#89782)
This 'how to' guide explains performance considerations specific to kNN search.
It takes inspiration from the 'tune for search speed' guide.
jtibshirani committed Oct 12, 2022
1 parent 44592b0 commit f4038b3
Showing 3 changed files with 145 additions and 5 deletions.
2 changes: 2 additions & 0 deletions docs/reference/how-to.asciidoc
@@ -23,6 +23,8 @@ include::how-to/indexing-speed.asciidoc[]

include::how-to/search-speed.asciidoc[]

include::how-to/knn-search.asciidoc[]

include::how-to/disk-usage.asciidoc[]

include::how-to/size-your-shards.asciidoc[]
134 changes: 134 additions & 0 deletions docs/reference/how-to/knn-search.asciidoc
@@ -0,0 +1,134 @@
[[tune-knn-search]]
== Tune approximate kNN search

{es} supports <<approximate-knn, approximate k-nearest neighbor search>> for
efficiently finding the _k_ nearest vectors to a query vector. Since
approximate kNN search works differently from other queries, there are special
considerations around its performance.

Many of these recommendations help improve search speed. With approximate kNN,
the indexing algorithm runs searches under the hood to create the vector index
structures. So these same recommendations also help with indexing speed.

[discrete]
=== Prefer `dot_product` over `cosine`

When indexing vectors for approximate kNN search, you need to specify the
<<dense-vector-similarity, `similarity` function>> for comparing the vectors.
If you'd like to compare vectors through cosine similarity, there are two
options.

The `cosine` option accepts any float vector and computes the cosine
similarity. While this is convenient for testing, it's not the most efficient
approach. Instead, we recommend using the `dot_product` option to compute the
similarity. To use `dot_product`, all vectors need to be normalized in advance
to have length 1. The `dot_product` option is significantly faster, since it
avoids performing extra vector length computations during the search.
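
For example, a minimal mapping sketch that indexes normalized vectors with the
`dot_product` similarity might look like the following. The index name
`my-knn-index`, the field name `my_vector`, and the dimension count are
illustrative:

[source,console]
--------------------------------------------------
PUT my-knn-index
{
  "mappings": {
    "properties": {
      "my_vector": {
        "type": "dense_vector",
        "dims": 128,
        "index": true,
        "similarity": "dot_product"
      }
    }
  }
}
--------------------------------------------------

Remember that with `dot_product`, every indexed vector and every query vector
needs to be normalized to unit length in advance.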

[discrete]
=== Ensure data nodes have enough memory

{es} uses the https://arxiv.org/abs/1603.09320[HNSW] algorithm for approximate
kNN search. HNSW is a graph-based algorithm which only works efficiently when
most vector data is held in memory. You should ensure that data nodes have at
least enough RAM to hold the vector data and index structures. To check the
size of the vector data, you can use the <<indices-disk-usage>> API. As a
loose rule of thumb, and assuming the default HNSW options, the required memory
is roughly `num_vectors * 4 * (num_dimensions + 32)` bytes. Note that the required
RAM is for the filesystem cache, which is separate from the Java heap.
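
For example, under this rule of thumb an index with 1 million 768-dimensional
vectors needs roughly `1,000,000 * 4 * (768 + 32)` bytes, or about 3.2 GB of
filesystem cache. To check the actual size of the vector data, a sketch of the
disk usage request might look like this (the index name is illustrative, and
the `run_expensive_tasks` flag must be enabled for the analysis to run):

[source,console]
--------------------------------------------------
POST my-knn-index/_disk_usage?run_expensive_tasks=true
--------------------------------------------------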

The data nodes should also leave a "buffer" for other ways that RAM is
needed. For example, your index might also include text fields and numerics,
which also benefit from the filesystem cache. It's recommended to run
benchmarks with your specific dataset to ensure there's a sufficient amount of
memory to give good search performance.

[discrete]
include::search-speed.asciidoc[tag=warm-fs-cache]

[discrete]
=== Reduce vector dimensionality

The speed of kNN search scales linearly with the number of vector dimensions,
because each similarity computation considers each element in the two vectors.
Whenever possible, it's better to use vectors with a lower dimension. Some
embedding models come in different "sizes", with both lower and higher
dimensional options available. You could also experiment with dimensionality
reduction techniques like PCA. When experimenting with different approaches,
it's important to measure the impact on relevance to ensure the search
quality is still acceptable.

[discrete]
=== Exclude vector fields from `_source`

{es} stores the original JSON document that was passed at index time in the
<<mapping-source-field, `_source` field>>. By default, each hit in the search
results contains the full document `_source`. When the documents contain
high-dimensional `dense_vector` fields, the `_source` can be quite large and
expensive to load. This could significantly slow down the speed of kNN search.

You can disable storing `dense_vector` fields in the `_source` through the
<<include-exclude, `excludes`>> mapping parameter. This prevents loading and
returning large vectors during search, and also cuts down on the index size.
Vectors that have been omitted from `_source` can still be used in kNN search,
since it relies on separate data structures to perform the search. Before
using the <<include-exclude, `excludes`>> parameter, make sure to review the
downsides of omitting fields from `_source`.
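
A minimal sketch of a mapping that excludes a vector field from `_source`
might look like the following (names and dimensions are illustrative):

[source,console]
--------------------------------------------------
PUT my-knn-index
{
  "mappings": {
    "_source": {
      "excludes": [
        "my_vector"
      ]
    },
    "properties": {
      "my_vector": {
        "type": "dense_vector",
        "dims": 128,
        "index": true,
        "similarity": "dot_product"
      }
    }
  }
}
--------------------------------------------------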

[discrete]
=== Reduce the number of index segments

{es} shards are composed of segments, which are internal storage elements in
the index. For approximate kNN search, {es} stores the dense vector values of
each segment as an HNSW graph. kNN search must check each segment, searching
through one HNSW graph after another. This means kNN search can be
significantly faster if there are fewer segments. By default, {es} periodically
merges smaller segments into larger ones through a background
<<index-modules-merge, merge process>>. If this isn't sufficient, you can take
explicit steps to reduce the number of index segments.
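
To see how many segments each shard currently contains, one option is the cat
segments API, sketched below with an illustrative index name:

[source,console]
--------------------------------------------------
GET _cat/segments/my-knn-index?v=true
--------------------------------------------------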

[discrete]
==== Force merge to one segment

The <<indices-forcemerge,force merge>> operation forces an index merge. If you
force merge to one segment, the kNN search only needs to check a single,
all-inclusive HNSW graph. Force merging `dense_vector` fields is an expensive
operation that can take significant time to complete.

include::{es-repo-dir}/indices/forcemerge.asciidoc[tag=force-merge-read-only-warn]
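
As a sketch, with an illustrative index name, a force merge down to a single
segment looks like this:

[source,console]
--------------------------------------------------
POST my-knn-index/_forcemerge?max_num_segments=1
--------------------------------------------------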

[discrete]
==== Create large segments during bulk indexing

A common pattern is to first perform an initial bulk upload, then make an
index available for searches. Instead of force merging, you can adjust the
index settings to encourage {es} to create larger initial segments:

* Ensure there are no searches during the bulk upload and disable
<<index-refresh-interval-setting,`index.refresh_interval`>> by setting it to
`-1`. This prevents refresh operations and avoids creating extra segments.
* Give {es} a large indexing buffer so it can accept more documents before
flushing. By default, the <<indexing-buffer,`indices.memory.index_buffer_size`>>
is set to 10% of the heap size. With a substantial heap size like 32GB, this
is often enough. To allow the full indexing buffer to be used, you should also
increase the <<index-modules-translog,`index.translog.flush_threshold_size`>>
limit, as in the sketch after this list.
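
As a sketch, the index-level settings mentioned above could be applied like
this before the bulk upload. The index name and the `2gb` threshold are
illustrative, and `indices.memory.index_buffer_size` is a node-level setting
that belongs in `elasticsearch.yml` rather than in this request:

[source,console]
--------------------------------------------------
PUT my-knn-index/_settings
{
  "index.refresh_interval": "-1",
  "index.translog.flush_threshold_size": "2gb"
}
--------------------------------------------------

Once the initial upload is complete, you can restore `index.refresh_interval`
to its previous value.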

[discrete]
=== Avoid heavy indexing during searches

Actively indexing documents can have a negative impact on approximate kNN
search performance, since indexing threads steal compute resources from
search. When indexing and searching at the same time, {es} also refreshes
frequently, which creates several small segments. This also hurts search
performance, since approximate kNN search is slower when there are more
segments.

When possible, it's best to avoid heavy indexing during approximate kNN
search. If you need to reindex all the data, perhaps because the vector
embedding model changed, then it's better to reindex the new documents into a
separate index rather than update them in-place. This helps avoid the slowdown
mentioned above, and prevents expensive merge operations due to frequent
document updates.
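
A minimal reindex sketch, assuming the destination index `my-new-knn-index`
has already been created with the new vector mapping (both index names are
illustrative):

[source,console]
--------------------------------------------------
POST _reindex
{
  "source": {
    "index": "my-old-knn-index"
  },
  "dest": {
    "index": "my-new-knn-index"
  }
}
--------------------------------------------------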

[discrete]
include::search-speed.asciidoc[tag=readahead]
14 changes: 9 additions & 5 deletions docs/reference/how-to/search-speed.asciidoc
@@ -10,6 +10,7 @@ goes to the filesystem cache so that Elasticsearch can keep hot regions of the
index in physical memory.

[discrete]
tag::readahead[]
=== Avoid page cache thrashing by using modest readahead values on Linux

Search can cause a lot of randomized read I/O. When the underlying block
@@ -34,6 +35,7 @@ as a transient setting). We recommend a value of `128KiB` for readahead.
WARNING: `blockdev` expects values in 512 byte sectors whereas `lsblk` reports
values in `KiB`. As an example, to temporarily set readahead to `128KiB`
for `/dev/nvme0n1`, specify `blockdev --setra 256 /dev/nvme0n1`.
end::readahead[]

[discrete]
=== Use faster hardware
@@ -356,6 +358,7 @@ PUT index
}
--------------------------------------------------

tag::warm-fs-cache[]
[discrete]
=== Warm up the filesystem cache

Expand All @@ -369,6 +372,7 @@ depending on the file extension using the
WARNING: Loading data into the filesystem cache eagerly on too many indices or
too many files will make search _slower_ if the filesystem cache is not large
enough to hold all the data. Use with caution.
end::warm-fs-cache[]

[discrete]
=== Use index sorting to speed up conjunctions
@@ -422,15 +426,15 @@ right number of replicas for you is

=== Tune your queries with the Search Profiler

The {ref}/search-profile.html[Profile API] provides detailed information about
how each component of your queries and aggregations impacts the time it takes
to process the request.

The {kibana-ref}/xpack-profiler.html[Search Profiler] in {kib}
makes it easy to navigate and analyze the profile results and
gives you insight into how to tune your queries to improve performance and reduce load.

Because the Profile API itself adds significant overhead to the query,
this information is best used to understand the relative cost of the various
query components. It does not provide a reliable measure of actual processing time.
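
A minimal sketch of a profiled search, with illustrative index and field
names:

[source,console]
--------------------------------------------------
GET my-index/_search
{
  "profile": true,
  "query": {
    "match": {
      "title": "tune for search speed"
    }
  }
}
--------------------------------------------------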

