Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Support HTTP server and client request streaming (single HTTP::Request) #3406

Merged
merged 2 commits into from
Oct 20, 2016

Conversation

asterite
Copy link
Member

@asterite asterite commented Oct 9, 2016

This is an alternative implementation to #3395

I realized there's no need to have two distinct types representing an HTTP::Request. I copy its docs:

# An HTTP request.
#
# It serves both to perform requests by an `HTTP::Client` and to
# represent requests received by an `HTTP::Server`.
#
# In the case of an `HTTP::Server`, `#body` will always raise
# and `#body_io` will optionally have an `IO` representing the request
# body. This will be `nil` if the request has no body.

With that we make sure users always invoke body_io to stream the request body. With two request types we could remove the body method completely, so it'll be a compile-time error, but this exception will be pretty evident as soon as you test your endpoint, so I think that's OK.

Having a single type is better in my opinion:

  1. It's simpler: less types to learn and deal with.
  2. It's backwards compatible.
  3. You can easily implement a proxy: just pass the request to an HTTP::Client (with two types you'd need to correctly convert one request type to another)
  4. It's the same way in Go, so that means this way of modelling things has been successful in another language. This also means that we can now stream request data with an HTTP::Client. For example we can pass an open File as a request's body_io, which covers the file-upload scenario (for other cases one would probably have to use a pipe).

@asterite asterite force-pushed the feature/http_server_request_stream_2 branch from 7f9b721 to c053950 Compare October 9, 2016 20:24
@asterite asterite changed the title [WIP] Support http server request streaming (single HTTP::Request) Support http server and client request streaming (single HTTP::Request) Oct 9, 2016
@asterite asterite changed the title Support http server and client request streaming (single HTTP::Request) Support HTTP server and client request streaming (single HTTP::Request) Oct 9, 2016
@RX14
Copy link
Contributor

RX14 commented Oct 9, 2016

Why not just make body always an IO?

On HTTP::Client this becomes an IO you can write to to send data. On HTTP::Server this becomes an IO you can read from to get the request body.

This does mean reworking a lot of the API of HTTP but I think that HTTP needs a rewrite to be streaming-first anyway. Passing in strings or other static data should be secondary, and use the underlying streaming to copy the data into the exposed IO.

@asterite
Copy link
Member Author

asterite commented Oct 9, 2016

@RX14 It's a good idea, and I thought about it after creating the PR. But there are two issues with that:

  1. We'd need to do MemoryIO.new(string_body) and use that for the HTTP::Request, which incurs a memory allocation (but this is minor, because there are already some memory allocations for each request)
  2. If the body is always an IO it means we never know its length, so we have to send the body as chunked. But for small bodies this is a bit heavy.

A solution for point 2 is to have a special IO that just wraps a String, and when performing the request we'd check that type and extract the String.

@asterite
Copy link
Member Author

asterite commented Oct 9, 2016

@RX14 Hmmm... that special IO could actually just be MemoryIO. I think this is doable :-)

@RX14
Copy link
Contributor

RX14 commented Oct 9, 2016

@asterite If the body IO object for a HTTP::Request is a thin wrapper over the underlying connection, passing a string is simply io << string. No extra objects, just sending a string down an IO the normal way.

We shouldn't "pass in" an IO, we should "safely expose" the underlying writable IO, and if we're passed an object to be the request body, copy that object to the IO the most efficient way.

@asterite
Copy link
Member Author

asterite commented Oct 9, 2016

@RX14 I guess I'd have to see some code. I can't see how that String would go straight to the socket without going through an HTTP::Request object. I also can't imagine a nice API for directly streaming into the request.

@RX14
Copy link
Contributor

RX14 commented Oct 9, 2016

You would essentially have a flow like so:

  1. Create request (establish connection (or lease from keep-alive pool)).
  2. Write to body IO.
  3. Finish (returns response).
  4. Read response body IO.
  5. Response follows usual #close semantics.

Single-call APIs would take a body object that would be copied to the body IO, and return a response.

Raw, linear flow:

request = HTTP::Request.new("POST", "http://example.com/foo", headers: ...)
request.body << "boo bar"
response = request.finish
response.status # => 200
response.body # => IO
response.close

Nicer flow:

response = HTTP::Request.post("http://example.com/foo", headers: ...) do |body|
  body << "foo bar"
end
response.status # => 200
response.body # => IO
response.read do |body|
  # read body, close after block?
end

Request body flow:

response = HTTP::Request.post("http://example.com/foo", headers: ..., body: "foo bar")
response.status # => 200
...

Sure you don't get the response in a block, but I think the benefit outweighs the drawbacks, and the positives of the API outweigh the negatives. Providing a #read block on the response which closes the body after use helps.

Request pooling could be done through an external Pool object which - much like a cookie jar - holds and manages connections to various domains that can be reused.

@asterite
Copy link
Member Author

asterite commented Oct 9, 2016

Where's the socket on those examples? Does the request hold a reference to that?

@asterite
Copy link
Member Author

asterite commented Oct 9, 2016

Nevermind... I think in either case I'd need to review this with @waj and later we can propose alternatives.

@RX14
Copy link
Contributor

RX14 commented Oct 9, 2016

Request and Response may not hold reference to the socket directly, but they would hold reference to the underlying IO wrappers which would hold reference to the socket. I think any sort of streaming Response would have to hold indirect reference, so why not Request?

@sdogruyol
Copy link
Member

I'd really love to see this or #3395 for the next release 🙏

@asterite asterite force-pushed the feature/http_server_request_stream_2 branch 2 times, most recently from 161cbdd to e1cae1b Compare October 12, 2016 17:31
@asterite
Copy link
Member Author

I've updated this PR with a few changes:

  • HTTP::Request has now only a body property that is an IO?. You can create a request with a String or Bytes and it will be wrapped in a MemoryIO and the Content-Length header will be set accordingly.
  • When writing the request to a socket, if there's a Content-Length header then the IO will be copied to the socket without chunking. If the number of bytes copied is different than what the Content-Length header says, an exception is raised. This of course will never happen if the request is created with a String or Bytes, but it allows a client to directly stream an IO in a non-chunked way if they know the content length.

@@ -53,6 +53,9 @@
# of the returned IO (or used for creating a String for the body). Invalid bytes in the given encoding
# are silently ignored when reading text content.
class HTTP::Client
# The set of possible valid body types
alias BodyType = String | Bytes | IO | Nil
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe BodyType shouldn't include Nil, because it means "no body", then if you want to allow no body then thats just BodyType?? I'm 50/50 on this change.

@@ -107,46 +107,58 @@ module HTTP

# :nodoc:
def self.serialize_headers_and_body(io, headers, body, body_io, version)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we need to retain the body, body_io split? Why not just String | IO?

@sdogruyol
Copy link
Member

This is great 👍

@asterite asterite force-pushed the feature/http_server_request_stream_2 branch from e1cae1b to 436bbce Compare October 19, 2016 21:39
@asterite asterite added this to the 0.20.0 milestone Oct 20, 2016
@asterite asterite merged commit 86e80d0 into master Oct 20, 2016
@asterite asterite deleted the feature/http_server_request_stream_2 branch October 20, 2016 11:05
@RX14
Copy link
Contributor

RX14 commented Oct 23, 2016

I think we should change HTTP::Client::Response#body to be an IO, as Client::Response feels a bit outdated now that we've made this decision.

@asterite
Copy link
Member Author

No, because then you have to remember to close the body every time you do a request, which is tedious and bug-prone.

@RX14
Copy link
Contributor

RX14 commented Oct 23, 2016

@asterite I thought that was why we had the block form. Infact I think it already behaves that way. File.open is unsafe in the same way too, that's why it is encouraged to use File.open(&block). The precedent is already there in the stdlib.

@asterite
Copy link
Member Author

Just because there's a precedent of something bug-prone it doesn't mean we have to do everything in a bug-prone way. Fetching a request as a string is way too common to always have to do it with a block. Also parsing from an IO is slower than from a string, so I prefer to provide a convenient and fast way out of the box.

@RX14
Copy link
Contributor

RX14 commented Oct 23, 2016

@asterite I think that reading IOs into memory is more bug prone and certainly a lot less powerful than using IO. Most of all, getting people to remember that #body is an IO in some contexts and a String in others is confusing and bad API design.

Fetching a request as a string is way too common to always have to do it with a block.

Most responses you get from remote services will be formatted as JSON or some other sort of format that can be read easily from the wire in a block. It will likely be better performance to read directly from IO::Buffered here instead of copying to a string directly, then

Also parsing from an IO is slower than from a string

In the small response case you've just waited probably at least 1ms for the HTTP response so a few extra μs likely doesn't matter. In the large response case you're likely to want an IO so you don't have to buffer it to memory. There is always room for optimisation but I think having a consistent API is much more important.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants