-
Notifications
You must be signed in to change notification settings - Fork 727
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add as_any() to the ObjectStore to make it able to identify which ObjectStore is using for the related trait object #3839
Conversation
…ectStore is using for the related trait object
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Now that rust supports trait upcasting you might be able to just make ObjectStore: Any
There appears to be some CI errors. I think adding a FWIW this seems like either approach is a breaking change (and would require releasing object_store 0.6.0), which is fine, but just something to keep in mind |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you @yahoNanJing
@@ -169,6 +170,10 @@ impl std::fmt::Display for AmazonS3 { | |||
|
|||
#[async_trait] | |||
impl ObjectStore for AmazonS3 { | |||
fn as_any(&self) -> &dyn Any { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍
I would like to understand the use-case more before merging this, I'm not a massive fan of breaking encapsulation in this way, it undermines the decorator pattern, and would rather avoid making a breaking change if possible Edit: turns out trait upcasting is still unstable, the RFC was just accepted. Using direct "inheritance" might have avoided this being a breaking change, as all |
I wonder if it is something like flushing the cache, or being able to take some different action if the item is not in the cache 🤔
I agree finding some way to avoid In other words I see no harm in adding this API as it allows significant flexibility even if it is not a preferred pattern for some projects |
Thanks @alamb and @tustvold for reviewing this. The main reason for adding |
How does as_any fit into this, the underlying stores don't expose or even know that information? I think you need to use the url that is used to register the store with the ObjectStoreRegistry as a cache prefix? |
If the At the first stage, I think the cache prefix based on the |
That is my point, the URL is a datafusion concept, the stores don't know the URL
How would this work if there are say two different S3 buckets registered? |
Here is a PR in DataFusion that is perhaps related: apache/datafusion#5543
So perhaps you are saying the caching should be done at a higher level? Like in |
I don't know whether it's common for a cluster to access several S3 buckets. However, it's a common case for a cluster to access one S3 bucket, one HDFS cluster, or other different kinds of storages. |
Thanks for pointing this out. Previously I thought the url is a necessary part of the |
Thanks @alamb. I still think it should be at the |
Is there some reason we should hold off on merging this PR ? |
I was waiting based on this feedback, I would rather not merge something, especially a breaking change, without a motivating use-case. I had understood apache/datafusion#5543 as removing the need for this |
Hi @tustvold, I'm afraid we still need to add the as_any() interface to the trait. Based on apache/datafusion#5543, I factored previous code. However, it doesn't work as expected. See the UT fails in https://github.com/yahoNanJing/arrow-ballista/blob/dev-20230322/ballista/core/src/cache_layer/object_store/mod.rs#L169-L173 Or maybe there are other correct way to downcast an |
I'm having a somewhat hard time understanding the code you've linked but please bear with me:
Calling get on this object store then:
A simple fix might be to swap the decoration order, but taking a step back, the cache key is known at the time FWIW this highlights my major reservation about as_any(), it makes for very hard to follow code, there is a reason Rust discourages its usage by not having first-class dynamic casting support, the friction is a feature 😄. I think if downstreams want to use this pattern they can derive a trait object with |
Thanks @tustvold for reading the proto branch for cache layer. The process details you described above are exactly correct.
I'm sorry I didn't explain it very clearly. Actually the process works well. And it's totally transparent to the Datafusion. The only unexpected thing is the function |
Thanks @tustvold for providing this genius idea. It works now without introducing |
I am going to close this for now, feel free to reopen should the need arise |
Which issue does this PR close?
Closes #3838.
Rationale for this change
What changes are included in this PR?
Are there any user-facing changes?