Use Theta sketch compression #15731
Comments
An alternative to compressing by default would be to make it configurable by the user: per column, per table, per installation, or in some other way. I am not sure this extra complexity is needed.
How much CPU overhead does the compression come with?
This time is just to convert sketches to bytes.
Theta sketch compression has been available for quite some time in the Apache DataSketches library. I would suggest enabling it in Druid. The simplest way would be to start serializing Theta sketches in the compressed format. Deserialization automatically detects and supports that format starting from datasketches-java-4.0.0 and datasketches-cpp-4.1.0 (May 2023).
There is some overhead in converting sketches to bytes, but in an I/O-bound system this is usually a reasonable CPU vs I/O tradeoff. In other words, compression reduces I/O (and storage cost) by spending more CPU, which is likely to yield an overall benefit.
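To make the tradeoff concrete, here is a rough, self-contained sketch of why a sorted list of retained hashes can be stored in fewer bytes at the cost of extra encoding work. It is a toy stand-in, not the DataSketches wire format: the function names, the varint encoding, and the chosen `THETA` and `K` values are all assumptions made for illustration.

```python
import random
import struct

# Illustrative assumptions, not values taken from Druid or DataSketches.
THETA = 2 ** 48   # all retained hashes fall below this threshold
K = 4096          # number of retained hashes


def encode_raw(hashes):
    """Fixed-width layout: 8 bytes per 64-bit hash."""
    return struct.pack(f"<{len(hashes)}Q", *hashes)


def encode_delta_varint(sorted_hashes):
    """Store the gap between consecutive sorted hashes as a LEB128 varint."""
    out = bytearray()
    prev = 0
    for h in sorted_hashes:
        gap = h - prev
        prev = h
        # Emit 7 bits per byte; the high bit marks "more bytes follow".
        while gap >= 0x80:
            out.append((gap & 0x7F) | 0x80)
            gap >>= 7
        out.append(gap)
    return bytes(out)


def decode_delta_varint(data):
    """Inverse of encode_delta_varint: rebuild the sorted hash list."""
    hashes = []
    prev = 0
    gap = 0
    shift = 0
    for byte in data:
        gap |= (byte & 0x7F) << shift
        if byte & 0x80:
            shift += 7
        else:
            prev += gap
            hashes.append(prev)
            gap = 0
            shift = 0
    return hashes


def make_hashes(seed=42):
    """Generate K distinct pseudo-random hashes below THETA, sorted."""
    rng = random.Random(seed)
    return sorted(rng.sample(range(1, THETA), K))
```

Comparing `len(encode_raw(h))` with `len(encode_delta_varint(h))` for `h = make_hashes()` shows the size reduction, and timing the two encoders with `time.perf_counter` gives a feel for the extra CPU spent per sketch. The real library's savings and overhead will differ from this toy, so measurements should be taken against the actual serialization path.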