[Proposal] Support new and different segment types #2965
Comments
Overall I think the approach is in a good direction. (thinking out loud below) I'm a bit cautious about the
If I'm reading this correctly, the The What I'm cautious about is that having As such, I suggest leaving the two QueryableIndex asQueryableIndex();
StorageAdapter asStorageAdapter(); for now, until use cases for more dynamic data expression can be more fleshed out. |
So, Charles, you are for adding the method and not taking away the other On Friday, May 13, 2016, Charles Allen notifications@github.com wrote:
|
@cheddar I'm proposing to not add a IF the |
@drcrallen The intent of This seems to me like a fair way of letting people experiment with using features provided by non-default engines. If those experiments go really well, and some features end up being things we want to generalize across engines, we could do that by either introducing a new interface in core Druid, or by introducing an extension that only has that interface, which other storage engine / query extensions could depend on. Does that sound fair? |
Ok, if the storage class, data expression class, and query class are all in the same extension then that should be fine, but it would be good if the resulting javadoc for the |
Correction: In the same extension OR are used by default druid queries via default druid classes |
I see Druid as two very abstract parts:
With the current implementation, the power of 1) cannot be fully leveraged, since it can ONLY understand segments created by 2), which I find limiting. This proposal separates 1) and 2) into clean, isolated pieces that can be iterated on separately, which makes Druid more extensible and powerful. |
👍 |
To be clear 👍 on the proposal, but the wording of the javadoc contract for |
This issue has been marked as stale due to 280 days of inactivity. It will be closed in 2 weeks if no further activity occurs. If this issue is still relevant, please simply write any comment. Even if closed, you can still revive the issue at any time or discuss it on the dev@druid.apache.org list. Thank you for your contributions. |
This issue has been closed due to lack of activity. If you think that is incorrect, or the issue requires additional review, you can revive the issue at any time. |
Currently, Druid has its own persistence format which was designed to handle structured data in the form of dimensions and metrics. It would be nice to expand Druid to be more straightforward in handling different structures, persistence formats and even functionality all as extensions to core. This proposal tries to move us in that direction.
There are three basic areas that require attention in order to do this:
** Ingestion
The ingestion side of this is already handled by the recent Appenderator changes, so I do not believe any noteworthy changes would be required in the immediate future. That said, not all ingestion mechanisms leverage Appenderators yet, so in order to get the capabilities enabled by this proposal, there will be work needed to implement all mechanisms of ingestion in terms of Appenderator objects.
** Hand-off
Hand-off is comprised of two things: persisting on the ingestion side and deserializing on the historical (read) side. Appenderators already own the persisting process on the ingestion side, so once again, I do not believe any noteworthy changes are required for that half of the story.
On the deserialization end of the spectrum, though, there will be changes required to enable these formats to be added as extensions. Namely, the current SegmentLoader interface is implemented by SegmentLoaderLocalCacheManager in a two-step algorithm.
We need to change the algorithm of step 2 to be something that can be extended, which means a Guice or Jackson touch-point. I propose that we make it a Jackson touch-point. Specifically, we should add a file to the zip that is a JSON descriptor for the factory that should be used to deserialize the files. Essentially, resulting in this implementation instead:
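A minimal sketch of the factory-based load step described above, not the actual Druid code. Everything here is an assumption made for illustration: the descriptor file name `factory.json`, the `factorize` method name, the `describe()` method on the stand-in `Segment`, and the `SegmentLoaderSketch` class itself. To stay dependency-free, a plain type-name registry and a naive regex stand in for the Jackson polymorphic deserialization the proposal calls for.

```java
import java.io.File;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.util.HashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical stand-in for Druid's Segment; only what the sketch needs.
interface Segment {
    String describe();
}

// Hypothetical factory contract: turn an unzipped segment directory into a Segment.
interface SegmentFactory {
    Segment factorize(File segmentDir);
}

class SegmentLoaderSketch {
    // Assumed descriptor file name, chosen for this sketch only.
    static final String DESCRIPTOR = "factory.json";

    private final Map<String, SegmentFactory> factoriesByType = new HashMap<>();
    private final SegmentFactory legacyFactory;

    SegmentLoaderSketch(SegmentFactory legacyFactory) {
        this.legacyFactory = legacyFactory;
    }

    // Extensions register their factory under a type name
    // (with Jackson this would be subtype registration instead).
    void register(String type, SegmentFactory factory) {
        factoriesByType.put(type, factory);
    }

    // Step 2 of loading: use the factory named by the descriptor,
    // or fall back to the legacy factory when no descriptor is present.
    Segment load(File segmentDir) {
        File descriptor = new File(segmentDir, DESCRIPTOR);
        if (!descriptor.exists()) {
            return legacyFactory.factorize(segmentDir);
        }
        String type = readType(descriptor);
        SegmentFactory factory = factoriesByType.get(type);
        if (factory == null) {
            throw new IllegalStateException("No SegmentFactory registered for type: " + type);
        }
        return factory.factorize(segmentDir);
    }

    // Naive extraction of {"type": "..."}; a real implementation would use Jackson.
    private static String readType(File descriptor) {
        try {
            String json = new String(Files.readAllBytes(descriptor.toPath()), StandardCharsets.UTF_8);
            Matcher m = Pattern.compile("\"type\"\\s*:\\s*\"([^\"]+)\"").matcher(json);
            if (!m.find()) {
                throw new IllegalStateException("Descriptor has no type: " + descriptor);
            }
            return m.group(1);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) throws IOException {
        File dir = Files.createTempDirectory("segment").toFile();
        SegmentLoaderSketch loader = new SegmentLoaderSketch(d -> () -> "legacy segment");
        loader.register("custom", d -> () -> "custom segment");

        // No descriptor yet, so the legacy factory is used.
        System.out.println(loader.load(dir).describe());

        // With a descriptor present, the registered factory is used.
        Files.write(new File(dir, DESCRIPTOR).toPath(),
                "{\"type\": \"custom\"}".getBytes(StandardCharsets.UTF_8));
        System.out.println(loader.load(dir).describe());
    }
}
```

The fallback branch is what keeps existing segments loadable: a zip written before this change has no descriptor file, so it would still go through the legacy path.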
Where the interface for SegmentFactory is
And legacyFactory is an implementation of SegmentFactory that is just
** Querying
Different segment persistence types can expose and enable new and different functionality in varying ways. We want to enable queries that can take advantage of the specific benefits of any given persistence type while also providing methods to connect into functionality already implemented. We can enable this with a relatively simple interface change. Currently, the Segment interface is
I.e. it has two methods that allow you to "convert" the Segment into an object with specific semantics that queries know how to deal with:
I propose that we change the interface to be
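A rough sketch of what a single generic conversion method could look like, based only on the `as(QueryableIndex.class)` call shape mentioned in this proposal. The exact signature, the choice to return `null` for unsupported types, and the `MapBackedSegment` class are assumptions for illustration; `QueryableIndex` and `StorageAdapter` are reduced to one-method stand-ins rather than the real Druid interfaces.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical one-method stand-ins for the Druid types named in the proposal.
interface QueryableIndex {
    int getNumRows();
}

interface StorageAdapter {
    String getSegmentIdentifier();
}

// Sketch of the proposed shape: one generic conversion method
// in place of one method per target type.
interface Segment {
    // Returns a view of this segment as the requested type,
    // or null if this segment cannot provide one (an assumed contract).
    <T> T as(Class<T> clazz);
}

// Illustrative implementation that keeps its views in a map keyed by class.
class MapBackedSegment implements Segment {
    private final Map<Class<?>, Object> views = new HashMap<>();

    <T> MapBackedSegment with(Class<T> clazz, T view) {
        views.put(clazz, view);
        return this;
    }

    @Override
    public <T> T as(Class<T> clazz) {
        // Class.cast avoids an unchecked cast; casting null simply returns null.
        return clazz.cast(views.get(clazz));
    }
}

class SegmentAsDemo {
    public static void main(String[] args) {
        QueryableIndex index = () -> 42;
        Segment segment = new MapBackedSegment().with(QueryableIndex.class, index);

        // A query that understands QueryableIndex asks for that view.
        QueryableIndex view = segment.as(QueryableIndex.class);
        System.out.println(view.getNumRows());

        // This segment offers no StorageAdapter view, so the lookup yields null.
        System.out.println(segment.as(StorageAdapter.class) == null);
    }
}
```

With this shape, an extension can hand out its own interface type through the same method, and only queries shipped alongside that extension need to know the type exists.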
This essentially means that all of the places that currently call asQueryableIndex() would be able to call as(QueryableIndex.class) instead. This interface does potentially have external touch-points in that if anybody has implemented their own Query, they might be calling either asQueryableIndex() or asStorageAdapter() already. So, there is a decision to be made on whether we make the change backwards compatible from the beginning and then ultimately remove the two superfluous methods later, or whether we make the backwards-incompatible change now.
I think I'm a fan of making the backwards-incompatible change now, because that should limit how many people are adversely affected by it. If we make it later, then everyone who has implemented their own storage format must update their implementation to not have the methods anymore.
Then again, if all we are doing is removing methods from an interface in the future, I think that would be a compile-time incompatibility, but not a runtime incompatibility. I.e. after we remove the methods, I think it's still possible for an implementation that was compiled assuming those methods exist to continue to work. But, I'm not sure about that. If that's the case, then the backwards-incompatible removal of the methods from the interface is actually a relatively low-risk change and only potentially adversely affects people who have implemented their own queries in terms of the removed methods.
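For comparison, here is one way the backwards-compatible route might be sketched. It assumes Java 8 default methods, which this proposal does not mention, and uses empty stand-ins for `QueryableIndex` and `StorageAdapter`; `SingleViewSegment` is a hypothetical implementation. The two existing method names are kept as deprecated defaults that delegate to `as(...)`, so callers of the old methods keep compiling while new implementations only write `as(...)`.

```java
// Hypothetical empty stand-ins for the Druid types named in the proposal.
interface QueryableIndex {
}

interface StorageAdapter {
}

// Transitional shape: the new generic method plus the two old methods,
// kept as deprecated defaults so they can be removed in a later release.
interface Segment {
    <T> T as(Class<T> clazz);

    @Deprecated
    default QueryableIndex asQueryableIndex() {
        return as(QueryableIndex.class);
    }

    @Deprecated
    default StorageAdapter asStorageAdapter() {
        return as(StorageAdapter.class);
    }
}

// A new-style implementation only needs to provide as(...).
class SingleViewSegment implements Segment {
    private final Object view;

    SingleViewSegment(Object view) {
        this.view = view;
    }

    @Override
    public <T> T as(Class<T> clazz) {
        // Hand back the view only when it matches the requested type.
        return clazz.isInstance(view) ? clazz.cast(view) : null;
    }
}

class TransitionDemo {
    public static void main(String[] args) {
        QueryableIndex index = new QueryableIndex() {
        };
        Segment segment = new SingleViewSegment(index);

        // Old call sites still work and see the same object as the new method.
        System.out.println(segment.asQueryableIndex() == segment.as(QueryableIndex.class));
        System.out.println(segment.asStorageAdapter() == null);
    }
}
```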
I believe that these changes are sufficient to enable extensions to create their own persistence formats and even expose their own interfaces for querying, if required to access some specific properties of the new persistence format. This could be leveraged for many different uses, from building connectors to ORC or Parquet-based data to trying something completely new and different while still leveraging Druid's ingestion, segment-management and query-routing functionality.
In general, this will also improve the staying power of the Druid system, as it will enable us to switch persistence formats if and when it is determined that the format implemented way back when is not keeping up with the current needs and trends of the space.