S3 Lookups might be doing full GET requests to S3 instead of just looking at metadata #2894

Closed
drcrallen opened this issue Apr 28, 2016 · 3 comments

@drcrallen
Contributor

As per #2523 (comment), lookups regularly call org.jets3t.service.S3Service#listObjects when checking for new values.

This needs to be investigated to see whether it can check only metadata without issuing a full GET request.

@pdeva
Contributor

pdeva commented Apr 29, 2016

It seems I am running into this issue too:
https://groups.google.com/forum/#!topic/druid-user/RUc8BNQ_6Ys

@gianm
Contributor

gianm commented Apr 29, 2016

@drcrallen URIExtractionNamespaceFunctionFactory's cache populator calls puller.getVersion(uri) on the URI returned from getLatestVersion. puller.getVersion does a full getObject (a GET). It doesn't need to do that; it could get away with getObjectDetails (a HEAD).

Even a getObjectDetails doesn't seem necessary, since the objects that come back from listObjects have the modified dates in them.
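
To illustrate the idea, here is a minimal sketch of deriving a version from the listing alone. It is not the actual Druid or JetS3t code: `ListedObject`, `CountingStore`, and `latestVersion` are hypothetical names, the store is an in-memory stand-in that only counts requests, and it assumes the version is just the last-modified timestamp rendered as a string.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Optional;

public class LatestVersionSketch {
    /** Hypothetical listing entry: key plus last-modified time, as a listing would report it. */
    static final class ListedObject {
        final String key;
        final long lastModifiedMillis;

        ListedObject(String key, long lastModifiedMillis) {
            this.key = key;
            this.lastModifiedMillis = lastModifiedMillis;
        }
    }

    /** Hypothetical in-memory store that counts requests; it is not the JetS3t client. */
    static final class CountingStore {
        final List<ListedObject> objects = new ArrayList<>();
        int listCalls = 0;
        int getCalls = 0;

        /** Stands in for a bucket listing: returns metadata only, never object bodies. */
        List<ListedObject> list(String prefix) {
            listCalls++;
            List<ListedObject> out = new ArrayList<>();
            for (ListedObject o : objects) {
                if (o.key.startsWith(prefix)) {
                    out.add(o);
                }
            }
            return out;
        }

        /** Stands in for a full GET; present only to show that it is never called. */
        byte[] get(String key) {
            getCalls++;
            return new byte[0];
        }
    }

    /** Assumed versioning scheme: the newest last-modified time under the prefix, as a string. */
    static Optional<String> latestVersion(CountingStore store, String prefix) {
        return store.list(prefix).stream()
            .max(Comparator.comparingLong((ListedObject o) -> o.lastModifiedMillis))
            .map(o -> Long.toString(o.lastModifiedMillis));
    }

    public static void main(String[] args) {
        CountingStore store = new CountingStore();
        store.objects.add(new ListedObject("lookups/a.json", 1000L));
        store.objects.add(new ListedObject("lookups/b.json", 2000L));

        System.out.println(latestVersion(store, "lookups/").orElse("none")); // prints 2000
        System.out.println("LIST=" + store.listCalls + " GET=" + store.getCalls); // prints LIST=1 GET=0
    }
}
```

The point of the counters is that one listing call is enough to answer "is there a newer version?", so the object body only needs to be fetched when the version actually changes.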

@drcrallen
Contributor Author

@gianm thanks, I'll see if I can get it updated with that fix for 0.9.1

@drcrallen drcrallen added this to the 0.9.1 milestone Apr 29, 2016
drcrallen added a commit to metamx/druid that referenced this issue Apr 29, 2016
@fjy fjy closed this as completed in #2900 May 4, 2016
fjy pushed a commit that referenced this issue May 4, 2016
* Make S3DataSegmentPuller do GET requests less often
* Fixes #2894

* Run intellij formatting on S3Utils

* Remove forced stream fetching on getVersion

* Remove unneeded finalize

* Allow initial object fetching to fail and be retried
nishantmonu51 pushed a commit to metamx/druid that referenced this issue May 6, 2016
* Make S3DataSegmentPuller do GET requests less often
* Fixes apache#2894

* Run intellij formatting on S3Utils

* Remove forced stream fetching on getVersion

* Remove unneeded finalize

* Allow initial object fetching to fail and be retried