Aggregations return different counts when invoked twice in a row #5021
Comments
Thanks for reporting this issue; this looks like a bad bug indeed. I'll look into it.
More info from what I found, using ES 1.0.0.RC2.
On Mac, counts may change between the first and subsequent runs: on the first run, the counts are lower than on subsequent runs.
On Linux, the effect is more subtle. Counts do not change between runs, but different shard counts seem to lead to deviating entries in the lower buckets. Here are two Linux examples, using Nils' data set. The first uses 10 shards, the second uses 5 shards; the lower three buckets differ.
shards=10
shards=5
I just learned that it is already known that bucket counts differ with the number of shards, also for facets: #1305
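To illustrate how bucket counts can depend on the shard layout, here is a minimal sketch. It assumes that each shard returns only its own top `size` terms and that the coordinating node sums those truncated lists; the class name, method names, and data are hypothetical, and this is not the Elasticsearch implementation.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical illustration, not Elasticsearch code.
final class ShardTopNDemo {
    // Keep only the `size` most frequent terms of one shard (ties broken by term).
    static Map<String, Long> topN(Map<String, Long> shardCounts, int size) {
        List<Map.Entry<String, Long>> entries = new ArrayList<>(shardCounts.entrySet());
        entries.sort((a, b) -> {
            int byCount = Long.compare(b.getValue(), a.getValue());
            return byCount != 0 ? byCount : a.getKey().compareTo(b.getKey());
        });
        Map<String, Long> out = new LinkedHashMap<>();
        for (int i = 0; i < Math.min(size, entries.size()); i++) {
            out.put(entries.get(i).getKey(), entries.get(i).getValue());
        }
        return out;
    }

    // Sum the truncated per-shard lists (the final top-N cut is omitted for brevity).
    static Map<String, Long> merge(List<Map<String, Long>> shards, int size) {
        Map<String, Long> merged = new TreeMap<>();
        for (Map<String, Long> shard : shards) {
            topN(shard, size).forEach((term, count) -> merged.merge(term, count, Long::sum));
        }
        return merged;
    }

    public static void main(String[] args) {
        // The same documents, once on a single shard and once split over two shards.
        Map<String, Long> single = Map.of("a", 10L, "b", 7L, "c", 7L);
        Map<String, Long> shardA = Map.of("a", 5L, "b", 4L, "c", 3L);
        Map<String, Long> shardB = Map.of("a", 5L, "c", 4L, "b", 3L);
        System.out.println("1 shard:  " + merge(List.of(single), 2));
        System.out.println("2 shards: " + merge(List.of(shardA, shardB), 2));
    }
}
```

With this made-up data the top bucket agrees in both layouts, while the count for `b` drops from 7 to 4 once the data is split, because shard B never reports `b`.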
The byte[] array that was used to store the term was owned by the BytesRefHash, which is used to compute counts. However, the BytesRefHash is released at some point and its content may be recycled. MockPageCacheRecycler has been improved to expose this issue (putting random content into the arrays upon release). The number of documents/terms has been increased in RandomTests to make sure page recycling occurs. Close elastic#5021
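The failure mode described in this commit message can be sketched in isolation. The example below is a simplified illustration under stated assumptions: `Page` and `TermRef` are hypothetical stand-ins (not the real BytesRefHash or MockPageCacheRecycler classes), and the release step overwrites the page with a fixed byte rather than random content so the result is deterministic. Copying the bytes is shown as one way to avoid the problem; the commit message does not say how the actual fix was made.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical illustration, not Elasticsearch code.
final class RecycledPageDemo {
    // Stand-in for a recyclable page: release() scribbles over the bytes,
    // as a recycler would when it hands the page to another user.
    static final class Page {
        final byte[] bytes;
        Page(int size) { bytes = new byte[size]; }
        void release() { Arrays.fill(bytes, (byte) '#'); }
    }

    // A term that points at a slice of some byte[] without necessarily owning it.
    static final class TermRef {
        final byte[] bytes;
        final int offset;
        final int length;
        TermRef(byte[] bytes, int offset, int length) {
            this.bytes = bytes;
            this.offset = offset;
            this.length = length;
        }
        String utf8() { return new String(bytes, offset, length, StandardCharsets.UTF_8); }
        // Copy the slice into a private array so later recycling cannot touch it.
        TermRef deepCopy() {
            return new TermRef(Arrays.copyOfRange(bytes, offset, offset + length), 0, length);
        }
    }

    // Returns { term read through the shared page, term read through the private copy }.
    static String[] run() {
        Page page = new Page(16);
        byte[] term = "elastic".getBytes(StandardCharsets.UTF_8);
        System.arraycopy(term, 0, page.bytes, 0, term.length);

        TermRef shared = new TermRef(page.bytes, 0, term.length); // aliases the page
        TermRef owned = shared.deepCopy();                        // owns its bytes

        page.release(); // the page is recycled while both refs are still in use
        return new String[] { shared.utf8(), owned.utf8() };
    }

    public static void main(String[] args) {
        String[] result = run();
        System.out.println("shared ref after release: " + result[0]);
        System.out.println("copied ref after release: " + result[1]);
    }
}
```

The reference that aliases the page no longer reads back the original term after release, while the copied one does.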
Hi,
A couple of days ago I started a thread on the mailing list (https://groups.google.com/forum/?fromgroups=#!topic/elasticsearch/c_xLCPOpvjc) about this issue, but responses have been slim.
The problem exists in the aggregations api since version 1.0.0.RC1 and is confirmed by me to also occur in 1.0.0.RC2.
The problem is that when you run a terms aggregation on an index split across multiple shards (10 in my case), it starts to return inconsistent numbers. By this I mean that the numbers are different the second time compared to the first time. You cannot show these numbers to users, because when they reload the analytics they see totally different numbers than before, without anything in the data having changed.
I created a test suite as a gist so you can recreate the problem yourself. It is hosted at: https://gist.github.com/thanodnl/8803745.
But since it contains data files, it does not display well in the GitHub web interface. It is best to clone the gist by running:
$ git clone https://gist.github.com/8803745.git
cd into the newly created directory and run:
$ ./aggsbug.load.sh
to load the test set into your local database. This can take a couple of minutes since it loads ~1M documents. I tried to recreate it with a smaller set, but then the issue does not appear.
Once the data is loaded you can run a self-contained test with:
$ ./aggsbug.test.sh
This will call the same aggregation twice, store the output, and then print the diff of the two outputs.
If you recreated the bug, the output of the test should be something like:
When run against 1.0.0.Beta2, the output is as expected:
You can see that the aggregation output does not appear in the diff during the test, and the only difference between the two runs is the time it took to calculate the result.
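The comparison the test script performs can be approximated as follows. This is a rough sketch, not the contents of `aggsbug.test.sh`: it assumes the two responses are available as JSON strings with a numeric `took` field, and it compares them as text after blanking that field. The class name and the sample responses are made up for illustration.

```java
// Hypothetical illustration; the sample JSON below is invented, not real output.
final class CompareRuns {
    // Blank the numeric "took" value so timing differences do not count as a diff.
    static String normalize(String responseJson) {
        return responseJson.replaceAll("\"took\"\\s*:\\s*\\d+", "\"took\":0");
    }

    // Plain text comparison; assumes both responses list fields in the same order.
    static boolean sameResult(String first, String second) {
        return normalize(first).equals(normalize(second));
    }

    public static void main(String[] args) {
        String run1 = "{\"took\":42,\"aggregations\":{\"t\":{\"buckets\":[{\"key\":\"x\",\"doc_count\":10}]}}}";
        String run2 = "{\"took\":7,\"aggregations\":{\"t\":{\"buckets\":[{\"key\":\"x\",\"doc_count\":10}]}}}";
        String run3 = "{\"took\":7,\"aggregations\":{\"t\":{\"buckets\":[{\"key\":\"x\",\"doc_count\":12}]}}}";
        System.out.println("run1 vs run2 consistent: " + sameResult(run1, run2));
        System.out.println("run1 vs run3 consistent: " + sameResult(run1, run3));
    }
}
```

A consistent build should only ever differ in `took`, so `sameResult` returning false for two back-to-back runs on unchanged data would indicate the behavior reported here.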