Use historical nodes to host shared cache #7570

leventov · 2019-04-29T16:38:17Z

There is an idea that, instead of memcached, historical nodes themselves could host a shared cache, not only their local caches. Since there are a lot of historical nodes, each of them would only need to devote a small fraction of its memory to the shared cache.
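To make the "many small slices add up" argument concrete, here is a back-of-the-envelope estimate of the aggregate shared-cache capacity before accounting for any replication. The symbols and the numbers in the worked example are hypothetical: N is the number of historicals, f the fraction of memory each one donates, and M the memory per historical.

```latex
C_{\text{shared}} = N \cdot f \cdot M
% Hypothetical example: N = 50, f = 0.05, M = 64\,\text{GiB}
C_{\text{shared}} = 50 \times 0.05 \times 64\,\text{GiB} = 160\,\text{GiB}
```

Even a 5% slice per node yields a cache comparable to a dedicated memcached fleet once the historical fleet is large enough.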

Such colocation can also simplify Druid setup because no separate fleet of memcached nodes would be required.

sascha-coenen · 2019-04-29T20:33:28Z

Awesome idea!
I have been playing around with this using Apache Ignite:
https://ignite.apache.org

Since then my mind keeps coming back to the thought of how great it would be for many use cases if Druid had an underlying colocated cache.
Brokers could leave their individual queries or query state in the cache, and one could then easily consult that cache to see the total number of executing queries. ZooKeeper is old-fashioned and using REST endpoints is cumbersome, but a ready-made distributed cache, or rather an underlying generic compute grid like Ignite, would be quite a nice platform foundation: distributed closures, data streamers... all low-level building blocks that one could build on.
With Ignite I could get the distributed, co-located cache described above, and one can also optionally configure Ignite to write through to a remote section of itself, giving a second-level cache, which in turn can be configured with optional persistence. It's quite nice to use such a product as a foundational component...
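The write-through, second-level arrangement can be illustrated with a minimal, generic sketch. This is not Ignite's API or configuration: `WriteThroughTwoLevelCache` is a hypothetical class, the "shared" level is just a `Map` standing in for a remote/distributed tier, and eviction from the local level is omitted for brevity.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

/**
 * Hypothetical two-level cache: a local (L1) map in front of a shared (L2) store,
 * with write-through on put. In a real deployment L2 would be a remote/distributed
 * cache; here it is a plain Map. Eviction is intentionally omitted.
 */
public final class WriteThroughTwoLevelCache<K, V> {
  private final Map<K, V> local = new HashMap<>();
  private final Map<K, V> shared;

  public WriteThroughTwoLevelCache(Map<K, V> shared) {
    this.shared = shared;
  }

  /** Write-through: every put goes to both levels, so L2 always holds what L1 holds. */
  public void put(K key, V value) {
    local.put(key, value);
    shared.put(key, value);
  }

  /** Read L1 first; on a miss fall back to L2 and repopulate L1 with the result. */
  public Optional<V> get(K key) {
    V v = local.get(key);
    if (v == null) {
      v = shared.get(key);
      if (v != null) {
        local.put(key, v);
      }
    }
    return Optional.ofNullable(v);
  }

  /** Simulates a node restart: the local level is lost, the shared level survives. */
  public void clearLocal() {
    local.clear();
  }

  public static void main(String[] args) {
    Map<String, String> sharedTier = new HashMap<>();
    WriteThroughTwoLevelCache<String, String> cache = new WriteThroughTwoLevelCache<>(sharedTier);
    cache.put("q1", "result-1");
    cache.clearLocal();
    // prints "result-1", served from the shared level after the local level was cleared
    System.out.println(cache.get("q1").orElse("miss"));
  }
}
```

The point of write-through here is that losing the local level (e.g. on restart) does not lose the entry, at the cost of every write paying for the remote round trip.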

gianm · 2019-04-29T22:47:14Z

It seems for this case it would be nice to choose a 'small' (few dependencies) and embeddable (same JVM as the historical) distributed cache. Is Ignite like that? (Or are there others that are?)

leventov · 2019-05-01T15:30:16Z

Historical nodes are designed to be restarted, including restarts of large groups of nodes at nearly the same time. We don't want such restarts to degrade the quality of service, so the cache should be replicated at least 2x. This requirement makes a simple local embeddable cache insufficient (unless we basically want to solve again all the hard problems of reliable replicated cluster services).
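To illustrate why 2x replication helps with restarts, here is a minimal sketch assuming rendezvous (highest-random-weight) hashing to place each cache entry on the two best-scoring historicals. Neither the technique nor the names `ReplicaPicker`/`replicasFor` come from this thread or from Druid; they are hypothetical, and a library like Ignite handles placement with its own affinity mechanism.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

/** Hypothetical sketch: choose which historicals hold replicas of a shared-cache entry. */
public final class ReplicaPicker {

  // 64-bit FNV-1a hash: deterministic across JVMs, so every node computes the same placement.
  static long fnv1a64(String s) {
    long h = 0xcbf29ce484222325L;
    for (byte b : s.getBytes(StandardCharsets.UTF_8)) {
      h ^= (b & 0xff);
      h *= 0x100000001b3L;
    }
    return h;
  }

  // Score of a (node, key) pair; the highest-scoring nodes own the key.
  static long score(String node, String cacheKey) {
    return fnv1a64(node + "|" + cacheKey);
  }

  /** Returns up to `replicas` nodes with the highest score for this key, best first. */
  public static List<String> replicasFor(String cacheKey, List<String> nodes, int replicas) {
    Comparator<String> byScoreDesc =
        Comparator.comparingLong((String node) -> score(node, cacheKey))
            .reversed()
            .thenComparing(Comparator.<String>naturalOrder()); // deterministic tie-break
    List<String> ranked = new ArrayList<>(nodes);
    ranked.sort(byScoreDesc);
    return new ArrayList<>(ranked.subList(0, Math.min(replicas, ranked.size())));
  }

  public static void main(String[] args) {
    List<String> historicals =
        List.of("historical-1", "historical-2", "historical-3", "historical-4", "historical-5");
    String key = "hypothetical-query-cache-key";
    List<String> owners = replicasFor(key, historicals, 2);
    System.out.println("replicas: " + owners);

    // Simulate a restart of the primary replica. Scores of the other nodes do not change,
    // so the former second replica now ranks first and the entry is still served.
    List<String> survivors = new ArrayList<>(historicals);
    survivors.remove(owners.get(0));
    System.out.println("after restart of " + owners.get(0) + ": " + replicasFor(key, survivors, 2));
  }
}
```

With rendezvous hashing, removing a node only affects keys that node owned; the new second slot for those keys is a node that does not yet hold the entry and would have to be repopulated, which is why the replication factor matters when many historicals restart together.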

So Ignite looks to me like a more reasonable solution.

@devozerov would appreciate your opinion.

leventov · 2019-07-14T18:11:47Z

A related paper: https://blog.acolyer.org/2019/06/24/fast-key-value-stores/

leventov added the Area - Cache label Apr 29, 2019

leventov mentioned this issue Sep 23, 2019 in "Decouple segment storage and serving on Historicals" #8575 (open)
