
Adding support for Exact Numeric Types #10190

Open
rahafrouz opened this issue Jul 16, 2020 · 12 comments

Comments

@rahafrouz

Description

Druid supports the Double data type, which is an approximate numeric type. Is it anywhere on the roadmap to add support for exact numeric types?

The highest numerical precision we can get with Druid is the double (64-bit) data type. Due to the nature of the binary storage format, we get 53 significand bits, which guarantees only 15 meaningful decimal digits (53 · log10(2) ≈ 15.955).
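
For illustration (a minimal sketch, not from the original post), here is the 15-digit ceiling in action in Java: two 18-digit values that differ in the last digit round to the very same double.

```java
// Demonstrates the ~15-digit precision limit of IEEE-754 doubles.
public class DoublePrecisionDemo {
    public static void main(String[] args) {
        double a = 123456789012345678.0;
        double b = 123456789012345679.0;
        // true: both round to the same 53-bit significand
        System.out.println(a == b);

        // 2^53 + 1 is the first integer a double cannot represent exactly.
        System.out.println((double) ((1L << 53) + 1) == (double) (1L << 53)); // true
    }
}
```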

As an example of an exact numeric type, BigQuery supports the NUMERIC data type, which gives 38 digits of precision and 9 decimal digits of scale. This is the ideal field type for financial reporting or any high-precision calculation.
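
As a quick illustration of the difference (a sketch, not from the original post), Java's BigDecimal behaves like an exact numeric type in the same sense as BigQuery's NUMERIC, while binary doubles cannot represent most decimal fractions exactly:

```java
import java.math.BigDecimal;

public class ExactDecimalDemo {
    public static void main(String[] args) {
        // Binary floating point: 0.1 and 0.2 have no exact representation.
        System.out.println(0.1 + 0.2); // 0.30000000000000004

        // Exact decimal arithmetic: digits are preserved as written.
        System.out.println(new BigDecimal("0.1").add(new BigDecimal("0.2"))); // 0.3
    }
}
```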

Motivation

Support for high-precision data that contains both very small and very large values, without data loss.

  • Financial reporting is a big motivating use case.
@a2l007 (Contributor) commented Jul 28, 2020

We have an internal custom aggregator based on BigDecimal that is used in several production use cases. We'll look into open-sourcing it for the benefit of the community.

@jihoonson (Contributor)

@a2l007 that sounds great! I have been looking at this area recently too, and wondering whether there is any performance issue with using BigDecimal. Do you know how slow it is compared to aggregators based on primitive types?

@a2l007 (Contributor) commented Jul 28, 2020

We use a variant of BigDecimal called CompressedBigDecimal. The difference is that the latter is mutable and, instead of using the object representation for complex types, is represented as block-compressible columns.

We haven't had a chance to do performance testing against the primitive types, but the CompressedBigDecimal-based aggregator performs significantly better than BigDecimal itself.

@jihoonson (Contributor)

Oh I see. Looking forward to seeing it!

@himanshug (Contributor)

Regarding the extension that @a2l007 mentioned (based on what I recall and work done when I was at Yahoo): the "big number" is stored as a primitive long where the last 2 digits are assumed to be the 2 digits right of the decimal point. That gives you a "big number" with 2 digits of fractional precision, which was sufficient for the use case we had.

FWIW, I also did a prototype using Java's BigDecimal data type, which was too slow for the use case. I also tried creating a custom decimal data type using two longs (one for the integer part and the other for the fractional part), which also couldn't match the performance obtained by the simpler approach above.
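
A rough reconstruction of what such a two-long type might look like (hypothetical, non-negative values only; not the actual prototype): one long for the integer part, one for a fixed-scale fractional part, with mutable in-place addition so no objects are allocated per row.

```java
public class TwoLongDecimal {
    // fractional part held at a fixed scale of 9 digits
    private static final long FRAC_UNIT = 1_000_000_000L;
    private long intPart;
    private long fracPart; // always in [0, FRAC_UNIT)

    void add(long otherInt, long otherFrac) {
        fracPart += otherFrac;
        intPart += otherInt + fracPart / FRAC_UNIT; // carry into the integer part
        fracPart %= FRAC_UNIT;
    }

    @Override
    public String toString() {
        return intPart + "." + String.format("%09d", fracPart);
    }

    public static void main(String[] args) {
        TwoLongDecimal acc = new TwoLongDecimal();
        acc.add(1, 999_999_999L); // 1.999999999
        acc.add(0, 1L);           // + 0.000000001
        System.out.println(acc);  // 2.000000000
    }
}
```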

@jihoonson (Contributor)

> FWIW, I also did a prototype using Java's BigDecimal data type, which was too slow for the use case. I also tried creating a custom decimal data type using two longs (one for the integer part and the other for the fractional part), which also couldn't match the performance obtained by the simpler approach above.

@himanshug thanks for sharing. I was thinking the same. Did the performance fall short only for your particular requirement, or was it slow in general?

@himanshug (Contributor) commented Jul 28, 2020

@jihoonson Java's BigDecimal is immutable, so the "aggregate" operation creates new objects unnecessarily. You also can't aggregate with BigDecimal in its serialized off-heap representation, so a naive off-heap BufferAggregator implementation would have to constantly deserialize the state stored off-heap into an on-heap BigDecimal object for aggregation and then serialize it back into the off-heap buffer. So using Java's BigDecimal is slow in general.
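
A self-contained sketch of the round trip described above, using a plain ByteBuffer rather than Druid's actual BufferAggregator API: every aggregate() call deserializes the off-heap state into an on-heap BigDecimal, adds, and serializes it back, allocating objects each time.

```java
import java.math.BigDecimal;
import java.math.BigInteger;
import java.nio.ByteBuffer;

public class NaiveOffHeapDecimalAggregate {
    // fixed slot: 4-byte length + 4-byte scale + unscaled bytes
    static final int SLOT_BYTES = 32;

    static void write(ByteBuffer buf, int pos, BigDecimal v) {
        byte[] unscaled = v.unscaledValue().toByteArray();
        buf.putInt(pos, unscaled.length);
        buf.putInt(pos + 4, v.scale());
        for (int i = 0; i < unscaled.length; i++) buf.put(pos + 8 + i, unscaled[i]);
    }

    static BigDecimal read(ByteBuffer buf, int pos) {
        byte[] unscaled = new byte[buf.getInt(pos)];
        int scale = buf.getInt(pos + 4);
        for (int i = 0; i < unscaled.length; i++) unscaled[i] = buf.get(pos + 8 + i);
        return new BigDecimal(new BigInteger(unscaled), scale);
    }

    // The slow pattern: deserialize -> add (allocates a new object) -> serialize.
    static void aggregate(ByteBuffer buf, int pos, BigDecimal row) {
        write(buf, pos, read(buf, pos).add(row));
    }

    public static void main(String[] args) {
        ByteBuffer buf = ByteBuffer.allocateDirect(SLOT_BYTES);
        write(buf, 0, BigDecimal.ZERO);
        aggregate(buf, 0, new BigDecimal("0.05"));
        aggregate(buf, 0, new BigDecimal("1.95"));
        System.out.println(read(buf, 0)); // 2.00
    }
}
```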

One line of thought was to create a custom BigDecimal implementation with pretty much the same code/algorithm as Java's BigDecimal, but mutable and with an equivalent off-heap representation whose operations could be performed on the off-heap form itself (e.g., the way we do things for theta sketches). That complexity wasn't worth the effort/time, given that our use case was satisfied by the simpler approach described in the earlier comment.

@jihoonson (Contributor)

@himanshug oh sorry, in your last comment I missed that the performance wasn't as good as the simpler approach 😅 I'm considering adding a new Decimal data type that would be stored in 2 integers, along with all the type-specific aggregators, but I haven't written any code yet. So I've been wondering what could impact query performance in general with this approach. Thanks for sharing your experience.

@a2l007 (Contributor) commented Dec 11, 2020

FYI, the proposal for the compressed BigDecimal extension is up: #10668

@github-actions bot commented Sep 5, 2023

This issue has been marked as stale due to 280 days of inactivity. It will be closed in 4 weeks if no further activity occurs. If this issue is still relevant, please simply write any comment. Even if closed, you can still revive the issue at any time or discuss it on the dev@druid.apache.org list. Thank you for your contributions.

github-actions bot added the stale label Sep 5, 2023
@usberic commented Sep 25, 2023

Hi, I'm working on an application that aggregates a few currency-based fields and was experimenting with this data type extension. However, part of our application requires Druid SQL queries, which do not seem to support Compressed BigDecimal. I understand native queries can perform the aggregation functions, but is there any plan to support Druid SQL queries? Specifically, I'm trying to sum certain columns, but Druid SQL's SUM operator does not seem to support complex data types.

github-actions bot removed the stale label Sep 26, 2023
@github-actions bot commented Jul 2, 2024

This issue has been marked as stale due to 280 days of inactivity. It will be closed in 4 weeks if no further activity occurs. If this issue is still relevant, please simply write any comment. Even if closed, you can still revive the issue at any time or discuss it on the dev@druid.apache.org list. Thank you for your contributions.

github-actions bot added the stale label Jul 2, 2024