Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Introduce the datafusion-objectstore-hdfs in datafusion-contrib as an object store feature #260

Merged
merged 1 commit into from
Sep 27, 2022

Conversation

yahoNanJing
Copy link
Contributor

Which issue does this PR close?

Closes #259, #6.

Rationale for this change

What changes are included in this PR?

Are there any user-facing changes?

@yahoNanJing yahoNanJing marked this pull request as draft September 21, 2022 07:01
@yahoNanJing
Copy link
Contributor Author

Will be ready after #258 merged.

/// Return the key and object store
#[allow(unused_variables)]
fn get_by_url(&self, url: &Url) -> Option<Arc<dyn ObjectStore>> {
#[cfg(feature = "hdfs")]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Being able to enable/disable object stores based on features sounds sensible, but I would recommend supporting multiple based on the scheme. Having to recompile different binaries for different backends is unfortunate. It's possible you intend to do this anyway, but thought I would mention

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For a standalone system, we should know which object stores to be used before running it. Therefore, it's OK to make the decision at the compiling phase.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How would you distribute binaries for such a system?

Copy link
Contributor Author

@yahoNanJing yahoNanJing Sep 23, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We run it on K8S with replicas.

Copy link
Contributor

@tustvold tustvold Sep 23, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let me phrase it differently, how would the ballista project distribute binaries/docker images for general consumption in arbitrary environments. I presume we want to provide this? People shouldn't need to compile from source for their specific environment?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We made an agreement that it's supposed to be able to find all of the object stores from the ObjectStoreProvider. Otherwise, an error should be thrown. We'll change the related ObjectStoreProvider interface after we upgrading to latest datafusion.

}
}

None
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Following the current construction this could return

"FeatureBasedObjectStoreProvider failed to create store for url {}"

This is already an improvement as it tells us that the provider was registered and used.

As described above, I think it would be even better if it could provide insight into why nothing was found, e.g. not compiled with the right feature, unsupported scheme, etc..

@yahoNanJing yahoNanJing marked this pull request as ready for review September 27, 2022 01:28
@yahoNanJing
Copy link
Contributor Author

We'll change the related ObjectStoreProvider interface after we upgrading to latest datafusion due to #3595, #3619

@yahoNanJing yahoNanJing merged commit b02780b into apache:master Sep 27, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Introduce the datafusion-objectstore-hdfs in datafusion-contrib as an object store feature
3 participants