Filter out infinities in radix-based select-k #1742

achirkin · 2023-08-16T15:42:10Z

As a means of filtering, ANN methods can produce a lot of repeated max_bound<T>/min_bound<T> values.
These are fed to a select_k function, which leads to poor performance if the radix-based implementation is used.
This is due to the nature of the algorithm (lots of values with the same bit representation).

This fix filters out max_bound<T>/min_bound<T> values as a special case. It works as follows:

In the zero-th pass (first histogram creation), we check the first k values of the input for being max_bound<T>/min_bound<T> and add them to the end of the output if found.
In the other passes, the max_bound<T>/min_bound<T> are explicitly ignored; this breaks the assumption that the inputs always have enough values; the PR makes the code not rely on this assumption by slightly modifying comparisons.
The back-fill sequence of k-th values (bits == kth_value_bits) is changed to fill the output from k - needed_num_of_kth in order to not override the max_bound<T>/min_bound<T> values written during the zero-th pass.

Closes: #1725


yong-wang · 2023-08-21T11:28:25Z

In the zero-th pass (first histogram creation), we check the first k values of the input for being max_bound/min_bound and add them to the end of the output if found.

I think there is a corner case. What if the first k values don't contain enough max_bound<T>/min_bound<T> values? For example, suppose we need 5 infs in the top-k results and none of these 5 infs is among the first k values; then we get no inf in pass 0. During the last filtering step, we also won't get any inf because they are not saved in out_buf.

achirkin · 2023-08-21T11:44:58Z

This will work fine. The trick here is that if there are no bound values in the first k, that automatically means there are enough non-bound values there - because any input value should be preferred over the bound values.
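The counting argument can be checked with a small sequential sketch. It is not the PR's kernel: it assumes select_min with +inf standing in for max_bound<T>, replaces the radix passes with `std::partial_sort`, and the name `select_min_k_sketch` is hypothetical. If the first k inputs hold b bound values, they also hold k - b non-bound values, so the non-bound values always cover the slots in front of the tail.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <limits>
#include <vector>

std::vector<float> select_min_k_sketch(const std::vector<float>& in, std::size_t k) {
  assert(k <= in.size());
  // Assumption: +inf plays the role of max_bound<T> for select_min.
  const float bound = std::numeric_limits<float>::infinity();

  std::vector<float> out(k);
  std::size_t tail = k;          // bound values are written to out[tail..k)
  std::vector<float> non_bound;  // what the later passes would see

  for (std::size_t i = 0; i < in.size(); ++i) {
    if (in[i] == bound) {
      // Zero-th pass: only bound values among the first k inputs are kept.
      if (i < k) out[--tail] = bound;
    } else {
      non_bound.push_back(in[i]);
    }
  }

  // The first k inputs contain exactly `tail` non-bound values, so there are
  // always enough non-bound values to fill out[0..tail).
  const std::size_t take = std::min(k, non_bound.size());
  assert(take >= tail);

  // Stand-in for the radix passes: pick the smallest non-bound values.
  std::partial_sort(non_bound.begin(), non_bound.begin() + take, non_bound.end());
  // If take > tail, non-bound values overwrite bound values, as they should.
  std::copy(non_bound.begin(), non_bound.begin() + take, out.begin());
  return out;
}
```

In the corner case raised above (no bound value among the first k inputs), `tail` stays at k and the assertion guarantees that k non-bound values exist, so no bound value is needed in the result.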

yong-wang · 2023-08-21T11:57:05Z

Got it. Really smart strategy.

yong-wang · 2023-08-21T12:09:59Z

The back-fill sequence of k-th values (bits == kth_value_bits) is changed to fill the output from k - needed_num_of_kth in order to not override the max_bound<T>/min_bound<T> values written during the zero-th pass.

Should the k-th values always be written from the end?

If max_bound<T>/min_bound<T> is not the k-th value, they should not appear in the result, and should be overwritten by values that are <= the k-th value.
If max_bound<T>/min_bound<T> is the k-th value, then the last filter won't write any k-th values to the output, so the max_bound<T>/min_bound<T> values written during pass 0 are untouched.

achirkin · 2023-08-21T12:57:05Z

The problem is that, with this implementation, max_bound<T>/min_bound<T> get special treatment: they do not appear in the histogram. As a result, the select_bucket assumption is broken. I came up with a workaround: select the last bin for the next pass if there are not enough (non-bound) values. Hence we can end up in a situation where (bits == kth_value_bits) does not necessarily mean there are enough non-bound values (the last bin is selected even though the accumulated count is less than k). Luckily, we have needed_num_of_kth from the previous pass, so we know where to start writing the "k-th" values; it does not really matter in which order we write them. This way, if there are not enough "k-th" values, they will not overwrite the bound values written during the zero-th pass.
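A sequential sketch of the back-fill may help here. It is an illustration under assumptions, not the kernel code: +inf stands in for the bound value, plain counters replace the atomic counters, and the function name `last_filter_sketch` is hypothetical. The "k-th" values are written starting at k - needed_num_of_kth, so when fewer of them exist than needed, the tail slots filled in the zero-th pass are left alone.

```cpp
#include <cassert>
#include <cstddef>
#include <limits>
#include <vector>

// `out` has size k; its tail may already hold bound values from the zero-th pass.
void last_filter_sketch(const std::vector<float>& in,
                        float kth_value,
                        std::size_t needed_num_of_kth,
                        std::vector<float>& out) {
  const float bound = std::numeric_limits<float>::infinity();  // assumed bound value
  const std::size_t k = out.size();
  assert(needed_num_of_kth <= k);

  std::size_t front = 0;  // count of values strictly better than the k-th value
  std::size_t back  = 0;  // count of values equal to the k-th value
  for (float v : in) {
    if (v == bound) continue;  // bound values were handled in the zero-th pass
    if (v < kth_value) {
      assert(front < k - needed_num_of_kth);
      out[front++] = v;
    } else if (v == kth_value && back < needed_num_of_kth) {
      // Fill forward from k - needed_num_of_kth instead of backward from k - 1.
      out[k - needed_num_of_kth + back++] = v;
    }
  }
}
```

As an illustrative state (the numbers are assumed, not taken from a real run): with k = 5, input {1, 2, 7, inf, inf, inf}, kth_value = 7 and needed_num_of_kth = 3, only one value equals 7. It lands in slot 2, and the two infinities already stored in slots 3 and 4 survive. Writing backward from slot k - 1 would instead overwrite slot 4 and leave slot 2 unset.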

yong-wang · 2023-08-22T06:06:21Z

Thanks for the explanation.

The code looks good to me.

I suggest adding unit tests which contain infs.

However, I'm a little concerned about whether we should add such special treatment. I'll add comments in #1725, which has concrete context.


achirkin · 2023-08-23T07:04:48Z

Thanks for reminding me to write the tests! Indeed there was a bug :) I forgot to filter out infinities in the last filter (it was broken in the case where it takes the original input data as in_buf).

yong-wang · 2023-08-27T09:47:46Z

Found a bug.

The test case is:
len = 32, k = 31, select_min = true
in = {0, 1, 2, 3, inf, inf, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31}
in_idx = {31, 30, 29, 28, 27, 26, 25, 24, 23, 22, 21, 20, 19, 18, 17, 16, 15, 14, 13, 12, 11, 10, 9, 8, 7, 6, 5, 4, 3, 2, 1, 0}

Then the results are:
out = {0, 1, 2, 3, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, inf}
out_idx = {31, 30, 29, 28, 25, 24, 23, 22, 21, 20, 19, 18, 17, 16, 15, 14, 13, 12, 11, 10, 9, 8, 7, 6, 5, 4, 3, 2, 1, 0, 4}

The last value in out_idx is 4, but should be 27 or 26, the values in in_idx corresponding to inf.
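A host-side check along these lines would catch the mismatch. This is a sketch, not the RAFT test code: the helper names are hypothetical, and it assumes the input indices are unique. It verifies that every returned (value, index) pair also occurs as a pair in the input.

```cpp
#include <cassert>
#include <cstddef>
#include <limits>
#include <unordered_map>
#include <vector>

// Rebuild the reported input: in[j] = j except in[4] = in[5] = inf, in_idx[j] = 31 - j.
void make_reported_input(std::vector<float>& in, std::vector<int>& in_idx) {
  in.resize(32);
  in_idx.resize(32);
  for (int j = 0; j < 32; ++j) {
    in[j]     = static_cast<float>(j);
    in_idx[j] = 31 - j;
  }
  in[4] = in[5] = std::numeric_limits<float>::infinity();
}

// True if every (out[i], out_idx[i]) pair matches some (in[j], in_idx[j]) pair.
// Assumes the values in in_idx are unique.
bool pairs_consistent(const std::vector<float>& in,
                      const std::vector<int>& in_idx,
                      const std::vector<float>& out,
                      const std::vector<int>& out_idx) {
  assert(in.size() == in_idx.size() && out.size() == out_idx.size());
  std::unordered_map<int, float> value_of;  // input index -> input value
  for (std::size_t j = 0; j < in.size(); ++j) value_of[in_idx[j]] = in[j];

  for (std::size_t i = 0; i < out.size(); ++i) {
    auto it = value_of.find(out_idx[i]);
    if (it == value_of.end() || it->second != out[i]) return false;
  }
  return true;
}
```

For the reported output, the last pair is (inf, 4), but index 4 belongs to the input value 27, so the check fails. Replacing the last index with 27 or 26 makes it pass.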

achirkin · 2023-08-27T10:52:58Z

Indeed, apparently the index isn't passed in the zero-th pass in the one-block kernel.

https://github.com/rapidsai/raft/blob/09ab49b22ae2a396794beae10ac16ef8524d3ce3/cpp/include/raft/matrix/detail/select_radix.cuh#L749C12-L749C12

Thanks! I'll fix and add your test case tomorrow. This shouldn't affect performance in any way, so I'll skip re-running all benchmarks.


tfeher · 2023-09-11T19:17:13Z

Thank you @achirkin for implementing this workaround and for the detailed benchmarks presented here! Thanks also to @yong-wang for the constructive discussion and for further benchmarks. The conversation here and in issue #1725 was really detailed and illuminating.

After going through the discussion, I have the following picture (please correct me if I am wrong):

The PR implements a workaround that improves top-k search for the case when the input has a very high number of infinities.
We see a clear advantage of this when there are fewer than k non-inf values in the input.
Such a case can occur in practice, but it probably also means that the problem parameters are set up incorrectly (normalization and precision used, or too strong filtering).
The changes affect the register pressure and the code complexity. There is a +/- 10% perf diff in the benchmarks. The average perf change (over the benchmark cases with only a modest amount of infs) is close to 0.

I tend to agree with Yong that it would be preferable not to further complicate the radix-k selection code if this only treats a corner case. So the question of whether we should integrate this PR into RAFT depends on whether the corner case needs to be addressed or not.

As Yong has pointed out, having so many infs during ANN search means

some other serious problem has already occurred, and the recall will be low. [...]
The same reasoning applies to ANN pre-filtering. If so many values are deleted that ANN could not return k items with valid distances, it means too many values have been deleted,

According to Artem,

the proposed fix adds the value in that it fixes the x10 slowdown in some edge cases with little to no cost to any of the other cases. Aside from the zero-th pass it doesn't really complicate the logic that much.

These are all important points. While it is true that, on average, there is practically no perf change for the non-corner cases, we average the results of arbitrarily defined gbench benchmarks. I am a bit concerned about the +/- 10% effect on these benchmarks: are we sure that we average a relevant subset? In Yong's benchmark plot we see a small but noticeable perf degradation. Instead of averaging the gbench benchmarks, I believe we should take a set of relevant ANN benchmarks and see how this PR affects the perf there (alternatively, define gbench tests that correspond to these).

Because of these concerns, for the regular ANN search case, I would be happier to get a warning message like "k-th value is inf, please check your precision/normalization/filtering", instead of modifying the k-selection kernels.

I am not so sure about the pre-filtering. This could still motivate the solution presented in this PR. @cjnolet do we expect to filter so many values, that less than k non-inf values remain in the end? Do we expect this to occur so often in practice, that we should add a special case for radix topk? If yes, then I would be in favor of merging this.

Add a few extra test and benchmark cases; in particular:
1. Allow specifying non-trivial input indices
2. Allow filling the input data with infinities to see how algorithms perform in edge cases

These tests are borrowed from the controversial workaround #1742

Authors:
- Artem M. Chirkin (https://github.com/achirkin)

Approvers:
- Tamas Bela Feher (https://github.com/tfeher)

URL: #1821

cjnolet · 2023-09-20T11:16:02Z

@cjnolet do we expect to filter so many values, that less than k non-inf values remain in the end? Do we expect this to occur so often in practice, that we should add a special case for radix topk?

For deletion, we haven't gotten a whole lot of consensus on patterns encountered in practice, but we have been told that it's possible for the actual valid k values in a query to be less than k for some data points. We really need to be able to support the generalized cases very efficiently, so assume that not everyone's going to be returning a list with <5 materialized values, but folks that need that capability would prefer not to take a perf hit, especially since there's already going to be a perf hit in the filtering functions themselves.

Aside from delete, consider other (up-coming) use-cases, like filtering recommendations for items that a user has already purchased, or multi-valued keys where a document might only be returned once even if multiple tokens for the same document end up in the list of nearest neighbors. We want these cases to be fast, but we probably don't want to do it at the expense of the non-filtered case, since I still believe that's going to be the most widely used.

Filter out infinities in radix-based select-k

8786766

achirkin added the improvement, non-breaking, and 2 - In Progress labels Aug 16, 2023

github-actions bot added the cpp label Aug 16, 2023

achirkin mentioned this pull request Aug 16, 2023

[BUG] radix::select_k<half> is slow #1725

Open

cjnolet assigned achirkin Aug 16, 2023

Write neighbours with inf dist at zero pass and allow zero current_le…

8964ba0

…n at first pass

achirkin marked this pull request as ready for review August 21, 2023 07:44

achirkin requested a review from a team as a code owner August 21, 2023 07:44

achirkin requested a review from yong-wang August 21, 2023 07:45

achirkin added 3 - Ready for Review and removed 2 - In Progress labels Aug 21, 2023

achirkin and others added 4 commits August 22, 2023 09:43

Merge branch 'branch-23.10' into enh-select-k-radix-handle-infinities

35296d4

Add noinline to filter_and_histogram_for_one_block to reduce register…

3eb905f

… usage for double+uint64_t types

Add tests for inputs containing infinities

44014cf

Fix the last filter not ignoring the bound values

3b0b68d

achirkin and others added 4 commits August 23, 2023 09:37

Add benchmarks for the edge case of having many infinities

e7ece2c

Merge branch 'branch-23.10' into enh-select-k-radix-handle-infinities

d956724

Merge branch 'branch-23.10' into enh-select-k-radix-handle-infinities

28f5004

Merge branch 'branch-23.10' into enh-select-k-radix-handle-infinities

09ab49b

achirkin and others added 16 commits August 28, 2023 13:51

Make the result testing more strict, allow setting input indices, and…

39a8a75

… fix the non-set in_idx_buf in the zero-th pass of the one-block kernel

Merge branch 'branch-23.10' into enh-select-k-radix-handle-infinities

32e8bf6

Merge branch 'branch-23.10' into enh-select-k-radix-handle-infinities

6a1e670

Merge branch 'branch-23.10' into enh-select-k-radix-handle-infinities

01f2342

Merge branch 'branch-23.10' into enh-select-k-radix-handle-infinities

276ce93

Merge branch 'branch-23.10' into enh-select-k-radix-handle-infinities

e70b392

Merge branch 'branch-23.10' into enh-select-k-radix-handle-infinities

07a629c

Merge branch 'branch-23.10' into enh-select-k-radix-handle-infinities

e7189bb

Merge branch 'branch-23.10' into enh-select-k-radix-handle-infinities

08a692c

Merge branch 'branch-23.10' into enh-select-k-radix-handle-infinities

2ca00fd

Merge branch 'branch-23.10' into enh-select-k-radix-handle-infinities

645d8ec

Merge branch 'branch-23.10' into enh-select-k-radix-handle-infinities

0591fe8

Merge branch 'branch-23.10' into enh-select-k-radix-handle-infinities

8cf4da0

Merge branch 'branch-23.10' into enh-select-k-radix-handle-infinities

dcf6d68

Merge branch 'branch-23.10' into enh-select-k-radix-handle-infinities

b587374

Merge branch 'branch-23.10' into enh-select-k-radix-handle-infinities

b170f02

Merge branch 'branch-23.10' into enh-select-k-radix-handle-infinities

87551e7

achirkin mentioned this pull request Sep 13, 2023

matrix::select_k: extra tests and benchmarks #1821

Merged

Merge branch 'branch-23.10' into enh-select-k-radix-handle-infinities

67e8d5a

achirkin changed the base branch from branch-23.10 to branch-24.02 December 15, 2023 09:16

achirkin marked this pull request as draft December 15, 2023 09:17

achirkin added 0 - Stale / Orphaned PR is too outdated and needs significant rework, or author is no longer responsible. and removed 3 - Ready for Review labels Dec 15, 2023

cjnolet closed this Jan 10, 2024
