Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

PATCH vs PUT #14

Closed
felixge opened this issue Apr 17, 2013 · 48 comments
Closed

PATCH vs PUT #14

felixge opened this issue Apr 17, 2013 · 48 comments
Milestone

Comments

@felixge
Copy link
Contributor

felixge commented Apr 17, 2013

As pointed out by @Baughn in IRC, PATCH (partial entity update) seems more appropriate than PUT (replacing an entire entity) for resuming uploads. But it might cause issues with mobile http proxies as PATCH is somewhat exotic. Needs investigation.

@kvz
Copy link
Member

kvz commented Apr 18, 2013

@homeworkprod
Copy link

Yep, I was thinking about PATCH as well because it seems more appropriate here.

Then again, while PUT is about a full update and PATCH is about a partial update of a resource, resuming is just the continuation of the creation of a resource I'd say – so both don't exactly match here.

The difference I see between those two is that (in the version 0.1 of the procotol) PUT would use a header (Content-Range) to specify what parts are to be updated while PATCH would usually do that in its body (works very well with JSON objects/associative arrays). Not sure if PATCH with the Content-Range header for binary data makes sense – but it might, indeed.

@Baughn
Copy link

Baughn commented Apr 18, 2013

PUT already has a specification for resuming uploads, and if that was all we wanted to do, it'd be fine.

PATCH seems appropriate because we might want to replace corrupted segments in the middle of already-uploaded data. Although, it'd also make sense to have a three-step protocol - POST to create the file, PUT to upload/resume, and PATCH to fix corrupted uploads, with the last being wholly optional.

@evert
Copy link

evert commented Apr 18, 2013

There's no real specification I'm aware of for ranged-PUT requests. As far as I can tell, it seems to be actively discouraged.

PUT is primarily intended as a 'full replacement', whereas with PATCH the intention is a partial replacement. The 'append' use-case is specifically mentioned in the RFC: http://tools.ietf.org/html/rfc5789

Bonus: it also provides a discovery mechanism.

@evert
Copy link

evert commented Apr 18, 2013

I spend a ton of time just reading rfc's and implementing them. Not saying you should use this for a template, but I felt that considering all the rules and regulations, this was the most appropriate for myself:

http://code.google.com/p/sabredav/wiki/PartialUpdate

It covers all the use-cases afaik.

@homeworkprod
Copy link

@evert: I think you are referring to the example of adding lines to a textual log. To me this is a different use case, because a log consisting of a few or even no lines can still be considered as a complete document, while a partially uploaded image is just garbage or at least a corrupted document.

@j4james
Copy link

j4james commented Apr 18, 2013

There's no real specification I'm aware of for ranged-PUT requests. As far as I can tell, it seems to be actively discouraged.

Just to be clear, this isn't a ranged-PUT request in the sense of a Range header - it's a PUT request with a Content-Range. And while RFC2616 doesn't go into much detail, it does mention the use of Content-Range in PUT requests.

From section 9.6 PUT:

The recipient of the entity MUST NOT ignore any Content-* (e.g. Content-Range) headers that it does not understand or implement and MUST return a 501 (Not Implemented) response in such cases.

@evert
Copy link

evert commented Apr 18, 2013

From HTTPbis:

An origin server SHOULD reject any PUT request that contains a Content-Range header field (Section 4.2 of [Part5]), since it might be misinterpreted as partial content (or might be partial content that is being mistakenly PUT as a full representation). Partial content updates are possible by targeting a separately identified resource with state that overlaps a portion of the larger resource, or by using a different method that has been specifically defined for partial updates (for example, the PATCH method defined in [RFC5789]).

@evert
Copy link

evert commented Apr 18, 2013

@felixge
Copy link
Contributor Author

felixge commented Apr 18, 2013

Ok, I think in order to make progress on this, we need to clarify the goals. I think we should set the following priorities, in order:

  1. MUST work with existing implementations (proxies, http libraries, etc.)
  2. SHOULD comply with existing specifications
  3. SHOULD comply with with upcoming specifications

Ideally we can find something that does not violate existing or upcoming specifications and goes well with the spirit of HTTP/REST. But I'd rather sacrifice idealism for a protocol that works today and solves the end user problem.

I'll work on a few ideas now and post them in a bit, but please share your thoughts on these priorities.

@reschke
Copy link

reschke commented Apr 18, 2013

PATCH imho is the right thing to do here (or POST if you dislike "extension" methods).

Note that in either case you'll need to define a payload format that contains the "patch" instructions (in this case likely minimally: the offset plus an octet (byte) sequence).

@felixge
Copy link
Contributor Author

felixge commented Apr 18, 2013

@reschke thank you so much for your feedback!

Note that in either case you'll need to define a payload format that contains the "patch" instructions (in this case likely minimally: the offset plus an octet (byte) sequence).

Makes sense. How would a client find ask the server about the received data after a network interruption? What response headers should be used by the server?

One problem I see with PATCH is this requirement from http://tools.ietf.org/html/rfc5789#section-2 :

The server MUST apply the entire set of changes atomically

If I interpret this and the following explanations correctly, it means that we'd have to use many requests with small payloads to avoid loosing a lot of data due to a network error.

@reschke
Copy link

reschke commented Apr 18, 2013

a) using a HEAD request? Or do you need more than the current length?

b) the requirement is not really different from other methods; the problem here is on the transport level (incomplete HTTP message), not the semantical level...

@felixge
Copy link
Contributor Author

felixge commented Apr 18, 2013

a) using a HEAD request? Or do you need more than the current length?

It would be nice if the server could communicate:

  • The current length
  • The expected final length
  • The chunks that have been received so far (to support parallel chunk uploading against the same resource)

b) the requirement is not really different from other methods; the problem here is on the transport level (incomplete HTTP message), not the semantical level...

Ok, does this mean that the spec is unclear about processing partial requests that experienced transport level issues?

@reschke
Copy link

reschke commented Apr 18, 2013

The expected final length

The client should know the expected final length, right?

The chunks that have been received so far (to support parallel chunk uploading against the same resource)

Why do we need parallel chunk uploading?

Ok, does this mean that the spec is unclear about processing partial requests that experienced transport level issues?

Well, incomplete messages are just that: incomplete. What a recipient can do with them depends a lot on context. For instance, can you assume that if you do a GET and get a truncated result, the contents received so far is ok? No, you can't. You might after getting the remainder and checking checksums.

The same is true for PATCH: a server could do something with truncated messages when it has some confidence that it's doing the right thing. What the cited text from PATCH wants to prevent is that a PATCH requests two changes, and the server applies only one of them.

@felixge
Copy link
Contributor Author

felixge commented Apr 18, 2013

The client should know the expected final length, right?

Yes. The use case would be giving the URL of the partial resource another client that wouldn't have this information, but might be interested about the state of the resource during the transfer. Consider it a nice to have.

Why do we need parallel chunk uploading?

To speed up uploads, especially for mobile clients with high latency / data loss. I realize your skeptical about this, so I'm willing to perform whatever measurements you'd like to see to convince you (or be convinced that it's not necessary).

For instance, can you assume that if you do a GET and get a truncated result, the contents received so far is ok? No, you can't.

I think that depends on the requirements. If the guarantees provided by TCP and the data link layer are strong enough for the application, the partial content should be considered ok I think. We've been discussing this over here: #7 (comment)

The same is true for PATCH: a server could do something with truncated messages when it has some confidence that it's doing the right thing. What the cited text from PATCH wants to prevent is that a PATCH requests two changes, and the server applies only one of them.

That makes sense.

@reschke
Copy link

reschke commented Apr 18, 2013

For instance, can you assume that if you do a GET and get a truncated result, the contents received so far is
ok? No, you can't.

I think that depends on the requirements. If the guarantees provided by TCP and the data link layer are strong enough for the application, the partial content should be considered ok I think. We've been discussing this over here: #7

Hm, no. For instance, what you might see is the start of the data stream, followed by an error message written to the stream (think delayed error page with stacktrace). The problem is that once a server has said "200" and started to write data, there is no reliable way of indicating an error except closing the connection.

@felixge
Copy link
Contributor Author

felixge commented Apr 18, 2013

Hm, no. For instance, what you might see is the start of the data stream, followed by an error message written to the stream (think delayed error page with stacktrace). The problem is that once a server has said "200" and started to write data, there is no reliable way of indicating an error except closing the connection.

You're right. That's an excellent point, thanks for bringing it up.

@reschke
Copy link

reschke commented Apr 18, 2013

For PATCH, you may want to consider a payload format where you have many "chunks" with checksums in a single payload; so the server could apply all those for which it does know that they are intact.

@felixge
Copy link
Contributor Author

felixge commented Apr 18, 2013

@reschke that makes sense. But unfortunately XHR2 doesn't really offer the write() interface that would be required for this (send can only be called once), so this would rule out HTML5 clients : /.

@homeworkprod
Copy link

For PATCH, you may want to consider a payload format where you have many "chunks" with checksums in a single payload; so the server could apply all those for which it does know that they are intact.

Doesn't this violate the aforementioned constraint that all parts of a PATCH request have to be applied?

The server MUST apply the entire set of changes atomically

So when a PATCH request is incomplete, is there even a way to comply to the RFC except for not applying any changes on the server? Wouldn't this require multiple, smaller PATCH requests instead until all of them were successful?

@reschke
Copy link

reschke commented Apr 19, 2013

So when a PATCH request is incomplete, is there even a way to comply to the RFC except for not applying any changes on the server? Wouldn't this require multiple, smaller PATCH requests instead until all of them were successful?

I'd say that there is some wiggle room if you define the PATCH payload to explicitly address this case.

With pipelining that should work as well, but you'd have to find a way to make things still work if you loose one of the requests in between.

@Baughn
Copy link

Baughn commented Apr 19, 2013

I don't think we should worry too much about following specs.

Do so when it's convenient, yes, and to the degree we can interoperate with
other systems that'd be great, but fundamentally our servers and clients
only need to know how to talk with each other, or others that follow the
protocol.

Anything beyond that is a bonus.
On Apr 19, 2013 11:06 AM, "reschke" notifications@github.com wrote:

So when a PATCH request is incomplete, is there even a way to comply to
the RFC except for not applying any changes on the server? Wouldn't this
require multiple, smaller PATCH requests instead until all of them were
successful?

I'd say that there is some wiggle room if you define the PATCH payload to
explicitly address this case.

With pipelining that should work as well, but you'd have to find a way to
make things still work if you loose one of the requests in between.


Reply to this email directly or view it on GitHubhttps://github.com//issues/14#issuecomment-16643797
.

@kevinswiber
Copy link

@Baughn I think that's a great idea as it will allow fast iteration from an implementation standpoint, but there needs to be a commitment to keep iterating. Otherwise, there will be 8 versions of 0.x spec implementations with potentially no compatibility.

It's important to still have these conversations upfront about standards and do the best we can to keep the two in line with each other. The downside of not following them is that tus will not work with existing Web infrastructure. That's a pretty big penalty from my perspective.

I'm looking at tus with an eye toward Web APIs. It enables some really awesome scenarios. Not having a single, defined way to do it that Just Works would be a real bummer for implementors of the future.

@felixge
Copy link
Contributor Author

felixge commented Apr 20, 2013

@Baughn I fully agree that we need something we can use today. However, if we can manage to fit it into the existing or upcoming specs, that would provide incredible benefits for adoption and long term stability. At this point there is a very good discussion at the httpbis mailing list, and with a little luck we may see something as simple as PATCH + Offset header as I've proposed here.

So for now I'm very hopeful that we can do this in way that truly improves the web, instead of fragmenting it for our benefit. Let's see how the discussion works out.

@Acconut Acconut added the core label Dec 7, 2014
@Acconut
Copy link
Member

Acconut commented Dec 10, 2014

Wow, there was great input from all of you.
As it seems the decision has fallen for PATCH (see 993b2ee) or am I missing something?

@evert
Copy link

evert commented Dec 10, 2014

Is this interesting/relevant for a protocol?

http://sabre.io/dav/http-patch/

@Acconut
Copy link
Member

Acconut commented Dec 10, 2014

@evert That's a great resource. I like their use of the X-Update-Range header although I disagree on the content type being application/x-sabredav-partialupdate. Since the body is just bytes you could also use application/octet-stream.

@evert
Copy link

evert commented Dec 10, 2014

Well, it's good to define specific mimetypes for things. This keeps the door open to alternative patch formats and gives you a sane way to figure out which type of PATCH format is supported on the server.

One issue with the format used in this project, is that it doesn't at all attempt to prevent problems related to multiple clients writing to a resource. It's a sure way for corrupted files to sneak in.

@Acconut
Copy link
Member

Acconut commented Dec 10, 2014

Well, it's good to define specific mimetypes for things. This keeps the door open to alternative patch formats and gives you a sane way to figure out which type of PATCH format is supported on the server.

You right but if the body doesn't have a specific format, as it's currently the case, using application/octet-stream is ok.

One issue with the format used in this project, is that it doesn't at all attempt to prevent problems related to multiple clients writing to a resource. It's a sure way for corrupted files to sneak in.

I agree, too. The current spec doesn't handle this case and our reference implementation (tusd) disallows writes starting at a different offset then the number of bytes written already.

@evert
Copy link

evert commented Dec 10, 2014

I disagree that using a generic mime-type is a good idea. Even though it just looks like a stream of bytes, it is a 'format' in some way. Furthermore aside from the format of the actual body, you can also use the mime-type as an indicator of how the server should interprete the request, and it's the only way to advertise support using Accept-Patch: http://tools.ietf.org/html/rfc5789#section-3.1

Using a generic mimetype is definitely bad design in this case.

@Acconut
Copy link
Member

Acconut commented Dec 10, 2014

Ok, I understand your concerns but an example would help.
For example, you could upload the content of diff or patch files as body an then the server would apply the patch, is this correct? This seems to be a nice addition although I'm a bit unsure if that's out of tus' scope.

@evert
Copy link

evert commented Dec 10, 2014

Say if both tus and sabre.io's patch format both used application/octet-stream, how can a client figure out which one is supported?

The standard way to do this, is by defining a custom mime-type and broadcasting support in the Accept-Patch header.

@Acconut
Copy link
Member

Acconut commented Dec 10, 2014

First of all, I like the idea of multiple different formats. Each of them could be implemented as a separate extension.

Say if both tus and sabre.io's patch format both used application/octet-stream, how can a client figure out which one is supported?

The client knows which protocol to speak and so it's knows what application/octet-stream is in this specific protocol.
In addition I would not categorize a part of a binary file a "special" format which needs it's own content type.

@reschke
Copy link

reschke commented Dec 10, 2014

If client and server need out-of-band knowledge, this is simply a private protocol. Not necessarily bad, but also something I wouldn't promote here.

@Acconut
Copy link
Member

Acconut commented Dec 10, 2014

@reschke There's also an issue discussing protocol discovery (see #29) to check if the server supports tus protocol and its version (and maybe more information, as supported formats). Could this solve your concerns? Or are you similar to @evert promoting a separate content type?

@reschke
Copy link

reschke commented Dec 10, 2014

Yes, the message should be self-descriptive, and application/octet-stream essentially means hand-waving (yes, for binary formats as well)

@evert
Copy link

evert commented Dec 10, 2014

Every version should use a distinct mime-type. That's the best way to handle that.

@evert
Copy link

evert commented Dec 10, 2014

You should pay attention to what mister @reschke is saying, he's actually one of the authors of HTTP/1.1 ;)

@Acconut
Copy link
Member

Acconut commented Dec 10, 2014

If so then this is quite a honour.
Your arguments have persuaded me and we should introduce a special content type following http://tools.ietf.org/html/rfc2616#section-3.7.
What should happen if the server receives a request using octet-stream as content type? Should he fall back to a default mime type or respond with a 415 (my favourite).

@reschke
Copy link

reschke commented Dec 10, 2014

@Acconut
Copy link
Member

Acconut commented Dec 10, 2014

@reschke The content is nearly the same but thanks for pointing out.
So I would suggest application/x-tus-partial-binary indicating that it's part of a binary file.
I'm not sure if you can get rid of the x- prefix similar to HTTP headers.

@evert
Copy link

evert commented Dec 10, 2014

Even though I'm not doing this either.. i think the recommended way to do private mimetypes now is by using the vnd. prefix.

So what about for example:

application/vnd.tus.partial; version=0.2

@Acconut
Copy link
Member

Acconut commented Dec 10, 2014

@evert I'm not sure if that's the correct document but https://tools.ietf.org/html/rfc6838#section-3 says that the vendor tree should be the best case for use, you're right.

@qsorix
Copy link

qsorix commented Dec 11, 2014

For example, you could upload the content of diff or patch files as body an then the server would apply the patch, is this correct? This seems to be a nice addition although I'm a bit unsure if that's out of tus' scope.

@Acconut, do you mean like uploading some data, and then applying a diff patch on top of that? That's what I understood, but I must be wrong, because it would be tricky to do.

To apply a diff, the state of the resource must be known. When an upload is interrupted, client doesn't know it. The only continuation specified at the moment, is to check how many bytes server got and assume these are the same bytes client has. And then continue the upload.

Once the upload is finished, there should be no way to modify the resource, because server can process and remove it. So there's no opportunity to apply diffs.

Yes, the message should be self-descriptive, and application/octet-stream essentially means hand-waving (yes, for binary formats as well)

So what about for example:
application/vnd.tus.partial; version=0.2

@reschke, @evert,
So a PATCH request with application/octet-stream says nothing, while a PATCH request with application/vnd.tus.partial says "append this data at the specified offset". Is this what you're saying?

What if there was only an upload, not a resumable upload? Would we then use PUT application/octet-stream? I guess not. It would then be rather clear that we can use things like image/jpeg.

When saying vnd.tus.partial aren't we assuming this is either a request that's not going to upload everything, or a continuation of an interrupted request? Following this logic, I would use image/jpeg in the first request that tries to upload everything. If my connection is lost, the next request would be vnd.tus.partial, as this time the content is no longer a valid jpeg file, right? Then I would name it either vnd.tus.append or application/octet-stream. The way I understand it, it should say what to do with the data (append), or what the data is (random bytes, in this case).

So using a combination of content types makes sense to me. Now, does it make any difference on the server? I think not. And I'm inclined to say that this gets more complicated than it needs to be. The way the protocol is specified right now, server doesn't need to look at content types at all. So perhaps this doesn't need to be specified in 1.0?

Another point. Do we want clients to be able to say that, after all the pieces are combined, you'll get image/jpeg on the server? If so, perhaps it should be carried as meta-data in File Create request?

BTW: Perhaps this discussion deserves a separate issue?

@Acconut
Copy link
Member

Acconut commented Dec 11, 2014

Once the upload is finished, there should be no way to modify the resource, because server can process and remove it. So there's no opportunity to apply diffs.

I agree.

@Acconut, do you mean like uploading some data, and then applying a diff patch on top of that? That's what I understood, but I must be wrong, because it would be tricky to do.
To apply a diff, the state of the resource must be known. When an upload is interrupted, client doesn't know it. The only continuation specified at the moment, is to check how many bytes server got and assume these are the same bytes client has. And then continue the upload.

This was just a bad example since I could not think of any other format which should be used inside the body when uploading a file. Maybe @evert can tell us a good one?

Another point. Do we want clients to be able to say that, after all the pieces are combined, you'll get image/jpeg on the server? If so, perhaps it should be carried as meta-data in File Create request?

The client should not have to send the content type of the final resource in every request since it mustn't change. It may send the type in the file creation although the server can decide what to do with this information.

BTW: Perhaps this discussion deserves a separate issue?

Yeah, probably. :)

@evert
Copy link

evert commented Dec 11, 2014

What I meant to say with different PATCH formats, is that a server may support more than 1 PATCH request at any given endpoint. Sending a diff over can make a lot of sense in some cases.

This is not unlike a REST-ish API supporting more than 1 format at any given endpoint. To do content-negotation for GET requests the Accept and Content-Type headers are used.

Similarly, a server may support your protocol as well as this one http://sabre.io/dav/http-patch/ How can a server and/or client support both? Using the accept-patch header and unique content-types.

If you think this is too complex, well, I don't have much to add then :)

I should say I'm just here as an interested bystander... I don't really have an intention to implement tus, as it doesn't make a lot of sense for me... Mainly just interested in the discussion.

@qsorix
Copy link

qsorix commented Dec 11, 2014

Let's continue in #43. This issue should be closed. PATCH vs PUT is settled already.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

10 participants