"java.lang.NullPointerException" occurs when using APPROX_QUANTILE_DS #11544
Hi @lsee9, thank you for the report. I would call this a bug in Druid, because Druid should have returned a better error than an NPE. To answer your questions, I think your assessment about the lack of memory is correct. Please see https://datasketches.apache.org/docs/Quantiles/OrigQuantilesSketch.html for the memory space required per sketch.
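If it helps to make that concrete, the snippet below is a minimal sketch (my own illustration, not from the thread) that asks datasketches-java for the updatable storage size at a few stream lengths; it uses the `DoublesSketch.getUpdatableStorageBytes(k, n)` utility, which should roughly track the table on the linked page:

```java
import org.apache.datasketches.quantiles.DoublesSketch;

public class QuantileSketchSpace {
  public static void main(String[] args) {
    int k = 128; // the default sketch parameter discussed in this thread
    // Stream lengths from a thousand items up to ~81 billion (the total mentioned below)
    long[] streamLengths = {1_000L, 1_000_000L, 1_000_000_000L, 81_000_000_000L};
    for (long n : streamLengths) {
      // Bytes needed by an updatable (writable) quantiles sketch over n items;
      // this grows roughly logarithmically in n, not linearly.
      int bytes = DoublesSketch.getUpdatableStorageBytes(k, n);
      System.out.printf("k=%d, n=%,d -> %,d bytes%n", k, n, bytes);
    }
  }
}
```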
Hi @jihoonson, thank you for your reply :) My Druid spec:

```
druid.processing.buffer.sizeBytes=1GiB
druid.processing.numMergeBuffers=10
druid.processing.numThreads=19 (20-core machine)
MaxDirectMemorySize=30g
heap size=32g
```

But for k = 128 the problem still occurs, and I also tried with smaller k (k = 32, 64). If you have any good ideas, please reply!
Hmm, were there lots of values per group-by key by any chance? What does this query return? (BTW, I copied the time filter from your comment, but is that correct? The start is identical to the end.)

```sql
SELECT COALESCE("mytable".country, '_') AS country, count(*)
FROM "mytable"
WHERE ("mytable".service_code = 'top')
  AND __time >= '2021-06-01' AND __time <= '2021-06-01'
GROUP BY COALESCE("mytable".country, '_')
```
Yes!! The time filter is correct. The result of running the query you suggested (with ORDER BY count DESC added):

```
{"country":"kr","EXPR$1":490}
{"country":"us","EXPR$1":221}
{"country":"jp","EXPR$1":173}
{"country":"ca","EXPR$1":165}
{"country":"au","EXPR$1":155}
{"country":"de","EXPR$1":147}
{"country":"vn","EXPR$1":138}
{"country":"sg","EXPR$1":130}
{"country":"th","EXPR$1":127}
{"country":"hk","EXPR$1":123}
{"country":"nz","EXPR$1":122}
{"country":"gb","EXPR$1":115}
{"country":"ph","EXPR$1":112}
{"country":"tw","EXPR$1":111}
{"country":"id","EXPR$1":108}
...
{"country":"re","EXPR$1":6}
{"country":"ye","EXPR$1":6}
{"country":"bm","EXPR$1":4}
{"country":"gy","EXPR$1":4}
{"country":"li","EXPR$1":4}
{"country":"mc","EXPR$1":4}
{"country":"tc","EXPR$1":4}
{"country":"kp","EXPR$1":3}
{"country":"ad","EXPR$1":2}
{"country":"so","EXPR$1":2}
{"country":"gw","EXPR$1":1}
{"country":"mq","EXPR$1":1}
{"country":"sy","EXPR$1":1}
```

Total number of countries: 200, and each is not so much...
☝️ The above is the result from the Druid table. The number of rows in the original table is as follows:

```sql
SELECT
  country,
  SUM("count") AS total_num_rows_original
FROM "mytable"
WHERE __time >= '2021-06-01' AND __time <= '2021-06-01' AND service_code = 'top'
GROUP BY 1
ORDER BY 2 DESC
```

Query result:

```
{"country":"kr","total_num_rows_original":1082227280}
{"country":"us","total_num_rows_original":10978845}
{"country":"jp","total_num_rows_original":2896190}
{"country":"ca","total_num_rows_original":2767109}
{"country":"au","total_num_rows_original":1862148}
{"country":"vn","total_num_rows_original":1718031}
{"country":"nz","total_num_rows_original":575751}
{"country":"de","total_num_rows_original":556492}
{"country":"sg","total_num_rows_original":536305}
{"country":"id","total_num_rows_original":425479}
{"country":"hk","total_num_rows_original":373920}
{"country":"ph","total_num_rows_original":364786}
{"country":"","total_num_rows_original":361175}
{"country":"th","total_num_rows_original":360037}
{"country":"my","total_num_rows_original":333746}
{"country":"gb","total_num_rows_original":324027}
{"country":"mx","total_num_rows_original":240169}
{"country":"ae","total_num_rows_original":237182}
...
{"country":"ad","total_num_rows_original":3}
{"country":"gw","total_num_rows_original":3}
{"country":"so","total_num_rows_original":3}
{"country":"mq","total_num_rows_original":1}
{"country":"sy","total_num_rows_original":1}
```

If a total aggregation is performed, the number of original rows is about 81 billion, which falls between 2^36 and 2^37 items. The number of bytes required at that scale grows by only about 1 KB per power of two, since sketch size increases on a log scale.
I think I see what's going on 🙂. Does your original query work if you add an extra filter of …?
Yes, it does work if I add the extra filter.
Yes, I think the problem is too many items per country. Druid uses a fixed-size buffer per row to keep the sketch (the slot is pre-allocated assuming a maximum stream length), and a sketch fed more items than that outgrows its slot. As a workaround, you could use other functions to compute approximate quantiles, such as …
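To picture the fixed-slot mechanism described above, here is a minimal sketch (my illustration, not Druid's actual aggregator code; sizing each slot with `getUpdatableStorageBytes` against an assumed maximum stream length is my assumption about how the pre-allocation works):

```java
import java.nio.ByteBuffer;
import org.apache.datasketches.memory.WritableMemory;
import org.apache.datasketches.quantiles.DoublesSketch;
import org.apache.datasketches.quantiles.UpdateDoublesSketch;

public class FixedSlotLayout {
  public static void main(String[] args) {
    int k = 128;
    long assumedMaxStreamLength = 1_000_000L; // hypothetical stand-in for the real constant
    // Every group-by row gets the same fixed-size slot in one big direct buffer.
    int slotBytes = DoublesSketch.getUpdatableStorageBytes(k, assumedMaxStreamLength);
    int numRows = 1024;
    WritableMemory all = WritableMemory.writableWrap(ByteBuffer.allocateDirect(slotBytes * numRows));

    // Build row 42's sketch directly inside its slot; a sketch that needs more
    // than slotBytes cannot grow in place.
    int row = 42;
    WritableMemory slot = all.writableRegion((long) row * slotBytes, slotBytes);
    UpdateDoublesSketch sketch = DoublesSketch.builder().setK(k).build(slot);
    sketch.update(3.14);
  }
}
```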
Yes, I understand!
Druid version 0.21.1 uses datasketches-java-1.3.0-incubating and datasketches-memory-1.2.0-incubating.
Could someone point to the code that allocates this memory for the BufferAggregator, please?
If rebuilding Druid is an option, I would suggest increasing this constant (MAX_STREAM_LENGTH, line 66 in commit e9d964d). It will increase the size of the pre-allocated buffers in the BufferAggregator, but not drastically; sketches grow very slowly at that point. I suggest this as a temporary measure until we figure out how to fix this and go through release cycles.
Hi @AlexanderSaydakov, thank you for taking a look. It does fail in the Druid master branch. You can easily reproduce it by running …
Those buffers are allocated in …
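For anyone who wants to see the failure mode outside Druid, here is a hypothetical repro sketch (my own, assuming the ByteBuffer-wrapped memory carries no MemoryRequestServer, which matches the stack trace reported in this issue):

```java
import java.nio.ByteBuffer;
import org.apache.datasketches.memory.WritableMemory;
import org.apache.datasketches.quantiles.DoublesSketch;
import org.apache.datasketches.quantiles.UpdateDoublesSketch;

public class GrowBeyondSlotRepro {
  public static void main(String[] args) {
    int k = 128;
    // Size the region for a short stream, then feed it far more items.
    int bytes = DoublesSketch.getUpdatableStorageBytes(k, 10_000L);
    WritableMemory mem = WritableMemory.writableWrap(ByteBuffer.allocateDirect(bytes));
    UpdateDoublesSketch sketch = DoublesSketch.builder().setK(k).build(mem);
    for (long i = 0; i < 100_000_000L; i++) {
      // Once the region is exhausted, growCombinedMemBuffer must request more memory;
      // with no MemoryRequestServer attached, that request is where the NPE surfaces.
      sketch.update(i);
    }
  }
}
```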
See comments in datasketches-java/issues#358. |
Hi @leerho, thank you for your reply. As suggested at datasketches-java/issues#358, … Please let me know if it is feasible :)
As Lee Rhodes said, it might take quite a while to fix the root cause and go through release cycles for datasketches-memory and datasketches-java. Therefore I would suggest using the workaround that I mentioned above, namely increasing the MAX_STREAM_LENGTH constant. It affects the size pre-allocated for each sketch in the BufferAggregator. The assumption was that, due to data fragmentation across multiple dimensions with a power-law distribution, only a small number of sketches would reach that size and move to on-heap memory. Since this mechanism is broken now, let's set a much higher limit until it is fixed, and let's do it quickly before the 0.22 branch is created. I can do a pull request if we agree on the value.

Here is the size of one slot in the BufferAggregator in bytes, for the default sketch parameter K=128, for different values of MAX_STREAM_LENGTH: … I suggest setting it to 1T.
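The slot sizes in the (elided) table above can be recomputed with the same sizing utility, assuming the slot size equals `DoublesSketch.getUpdatableStorageBytes(k, MAX_STREAM_LENGTH)` (my assumption about how Druid derives it):

```java
import org.apache.datasketches.quantiles.DoublesSketch;

public class SlotSizeForMaxStreamLength {
  public static void main(String[] args) {
    int k = 128; // the default sketch parameter
    // Candidate MAX_STREAM_LENGTH values, up to the proposed 1T
    long[] candidates = {1_000_000L, 1_000_000_000L, 1_000_000_000_000L};
    for (long maxStreamLength : candidates) {
      System.out.printf("MAX_STREAM_LENGTH=%,d -> slot of %,d bytes%n",
          maxStreamLength, DoublesSketch.getUpdatableStorageBytes(k, maxStreamLength));
    }
  }
}
```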
@leerho @AlexanderSaydakov, do you have a rough schedule for the new release of datasketches-memory and datasketches-java? If it's going to take long, perhaps we could add a config that can temporarily live for a couple of Druid releases to control the size of …
This can take weeks, if not months. datasketches-memory is being prepared for a major release, which is not quite ready yet, and datasketches-java depends on it, which means a sequential process with a voting stage for each, and so on.
@AlexanderSaydakov thanks, sounds good. I will make a PR soon. |
I created #11574. |
These changes upgrade to the latest datasketches-java-3.1.0 and also restore support for quantile and HLL4 sketches to grow larger than the given buffer in a buffer aggregator and move to heap in rare cases. This was discussed in #11544.
Co-authored-by: AlexanderSaydakov <AlexanderSaydakov@users.noreply.github.com>
Affected Version
Description
Hello, I am trying to calculate quantiles using APPROX_QUANTILE_DS(),
but a java.lang.NullPointerException occurs in my query.
The exception occurs in org.apache.datasketches.quantiles.DirectUpdateDoublesSketch.growCombinedMemBuffer,
so I think this is due to running out of memory (there is not enough memory available for the operation).
However, increasing the memory does not solve the problem.
Also, the problem only occurs when using some service codes (e.g. 'top', 'cafe').
What I'm curious about is:
I don't have any good ideas to solve the problem :(
my query:
datasource configuration:
full log:
Any help would be greatly appreciated.