
Fear of creating extra requests? #36

Closed

tino opened this issue Jun 25, 2014 · 4 comments
Comments

tino commented Jun 25, 2014

Hi,

I really like the fact that you are creating a protocol for uploads. Building an upload server and HTML5 client, I struggled a lot with deciding how to handle big, resumable uploads. I finally went with a protocol similar to the Amazon S3 multipart upload. It works really well and I am able to fully saturate the available upload bandwidth with parallel chunks (which is important because files are usually multiple gigabytes in size), pause uploads (also implicitly, when the network connection drops), and resume them. While going through the tickets that are currently open, I made the following observation.

It seems that you want to put every piece of information in as few requests as possible. I see mentions of max-content-length (#24), checksum support (#7), protocol discovery (#29), etc. to be implemented in the first POST request. Isn't this what an OPTIONS request is for?
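For illustration, a rough client-side sketch of what I mean by discovery over OPTIONS. The endpoint URL and header names below are assumptions on my part, not something the protocol defines today:

    # Hypothetical sketch: discovering server capabilities with a single
    # OPTIONS request instead of packing everything into the first POST.
    # The URL and the header names are assumptions, chosen only to
    # illustrate the idea behind #24, #7 and #29.
    import requests

    def discover(server_url):
        resp = requests.options(server_url)
        resp.raise_for_status()
        return {
            # maximum size the server accepts (cf. #24)
            "max_size": resp.headers.get("Max-Size"),
            # optional features such as checksum support (cf. #7)
            "extensions": (resp.headers.get("Extension") or "").split(","),
            # protocol versions the server speaks (cf. #29)
            "versions": (resp.headers.get("Version") or "").split(","),
        }

    if __name__ == "__main__":
        print(discover("https://example.com/files/"))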

Also, finishing the upload currently happens implicitly because the server knows the length of the data to be uploaded up front. Streaming uploads are to be implemented in the future, however, so why not leave room for, or implement, a "finalising" request? I think that is also the right place for a final checksum (or hash tree) to be exchanged, and a place to handle things like Entity-Location (#30). (In my case, all chunks are assembled by a background task; the last request returns the task's status location URL to be consumed by the client.)
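As a sketch of the kind of finalising request I have in mind (every path and header here is made up for illustration; nothing like it exists in the spec):

    # Hypothetical finalising request: the client signals that all data has
    # been sent and hands over a final checksum; the server replies with the
    # location of the background task assembling the chunks, which the
    # client can poll. All paths and headers are invented for this sketch.
    import hashlib
    import requests

    def finalize_upload(upload_url, local_path):
        sha1 = hashlib.sha1()
        with open(local_path, "rb") as f:
            for block in iter(lambda: f.read(1 << 20), b""):
                sha1.update(block)

        resp = requests.post(
            upload_url + "/finalize",                          # made-up endpoint
            headers={"Checksum": "sha1 " + sha1.hexdigest()},  # made-up header
        )
        resp.raise_for_status()
        # e.g. the status URL of the background task (cf. Entity-Location, #30)
        return resp.headers.get("Location")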

We are talking about a resumable upload protocol, so I assume most use cases concern big files, because otherwise a single POST or PUT would suffice. So I can't imagine the extra requests would be of any concern, but correct me if I am making the wrong assumptions.

I can see the beauty in the way the protocol is currently designed (a single start request, the rest managed by headers), but for me that beauty fades away quickly once the features Parallel Chunks, Checksums, and Streams are implemented within the same constraints.

Tino

qsorix commented Jun 25, 2014

@tino, you're making very good points.

My use case does not involve big files. Sizes range from a few kilobytes to tens of megabytes. But they're being sent over a terrible GPRS link, with round-trip times reaching 10 seconds, and frequent disconnections... if it weren't for those disconnections, I would not require resumable uploads, as the link's speed is decent. For small files, another request will significantly impact total time. It is possible to use one protocol for small files and another one for big ones, but that complicates the implementation of the clients... which in my case are small, dumb sensors that I try not to overwhelm with logic.

On the other hand, my clients have plenty of time. If they need to wait because of another request, I can live with that.

Yes, I agree with you, and I would still try to use as few requests as possible where it makes sense. OPTIONS is a perfect example of a request that makes sense on its own, because a client can query the server's options once and assume they won't change for some time, e.g. until the next 4xx error.
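A rough sketch of that caching behaviour, purely to show the idea; the Offset header and the invalidation rule are assumptions, not something the spec prescribes:

    # Sketch: cache the server's advertised options from one OPTIONS call
    # and only forget them after a 4xx response suggests they may be stale.
    import requests

    class Client:
        def __init__(self, server_url):
            self.server_url = server_url
            self._options = None

        def options(self):
            if self._options is None:
                resp = requests.options(self.server_url)
                resp.raise_for_status()
                self._options = dict(resp.headers)
            return self._options

        def patch_chunk(self, upload_url, offset, data):
            resp = requests.patch(
                upload_url,
                headers={"Offset": str(offset)},  # header name is an assumption
                data=data,
            )
            if 400 <= resp.status_code < 500:
                # server options may have changed; drop the cached copy
                self._options = None
            resp.raise_for_status()
            return resp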

vayam (Member) commented Jun 26, 2014

@qsorix is right. Resumable uploads make sense for mobile clients, and chunked uploads for desktops.
The way I see chunked uploads evolving is a /compose or /cat of several individually resumable uploads. It would only build on top of this core protocol.
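Roughly what I mean, as a client-side sketch; the /compose endpoint and its payload are made up for illustration and not part of the core protocol:

    # Hypothetical sketch: several uploads, each resumable on its own,
    # are combined by one final request. Endpoint and payload are invented.
    import requests

    def compose(server_url, part_urls):
        resp = requests.post(
            server_url + "/compose",       # made-up endpoint
            json={"parts": part_urls},     # URLs of the finished partial uploads
        )
        resp.raise_for_status()
        return resp.headers.get("Location")  # URL of the combined resource

    # Usage: each part is uploaded with the normal resumable flow first, then
    # compose("https://example.com/files", ["/files/a", "/files/b", "/files/c"])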

tino (Author) commented Jun 27, 2014

I see two use cases for this protocol in the open issues (the ones you mention above), and I think they conflict a little; they don't entirely need the same features.

  • On one side, there are the relatively small uploads over unreliable connections, as mentioned by @qsorix, which require a slim and fast way to resume an interrupted upload. This was probably the goal tus started with.
  • On the other hand, the features Parallel Chunks and Streams are oriented towards large uploads.

I have the feeling that it would be wrong to build the features for the large uploads on top of the "core" implementation. These features are a lot easier to implement when there is room for, say, a finalising request (with checksum, processing info, etc.), just as the AWS S3 API has a "normal" (PUT) upload method and a multipart method.

So my suggestion would be to keep the PATCH-ing (after getting the offset via a HEAD request) as the core (as it is now), but to switch to a "multipart" upload flow when more features are required (upload expiration, chunking, checksumming, etc.): an initialisation request, one or more chunks, and a finalising request.
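Spelled out as a client-side sketch, that flow could look roughly like this; every path and header below is made up and only mirrors the S3-style init/chunk/finalise shape:

    # Hypothetical "multipart" flow: 1) initialisation request,
    # 2) one or more chunk uploads, 3) finalising request.
    # All paths and headers are invented for this sketch.
    import requests

    CHUNK = 5 * 1024 * 1024  # 5 MiB per chunk, an arbitrary choice

    def multipart_upload(server_url, local_path):
        # 1) initialise the upload and get its URL
        init = requests.post(server_url, headers={"Upload-Mode": "multipart"})
        init.raise_for_status()
        upload_url = init.headers["Location"]

        # 2) send the chunks; each PATCH is independently retryable
        offset = 0
        with open(local_path, "rb") as f:
            while True:
                data = f.read(CHUNK)
                if not data:
                    break
                requests.patch(
                    upload_url,
                    headers={"Offset": str(offset)},
                    data=data,
                ).raise_for_status()
                offset += len(data)

        # 3) finalise: tell the server we are done (checksums etc. could go here)
        done = requests.post(upload_url + "/finalize")
        done.raise_for_status()
        return done.headers.get("Location")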

Acconut (Member) commented Dec 22, 2014

@tino See #29 (comment) for a draft of an OPTIONS request.

Acconut closed this as completed Oct 17, 2015