Support HTTP server and client request streaming (single HTTP::Request) #3406

asterite · 2016-10-09T20:14:25Z

This is an alternative implementation to #3395

I realized there's no need to have two distinct types representing an HTTP::Request. I copy its docs:

# An HTTP request.
#
# It serves both to perform requests by an `HTTP::Client` and to
# represent requests received by an `HTTP::Server`.
#
# In the case of an `HTTP::Server`, `#body` will always raise
# and `#body_io` will optionally have an `IO` representing the request
# body. This will be `nil` if the request has no body.

With that we make sure users always invoke body_io to stream the request body. With two request types we could remove the body method completely, so it'll be a compile-time error, but this exception will be pretty evident as soon as you test your endpoint, so I think that's OK.

Having a single type is better in my opinion:

It's simpler: less types to learn and deal with.
It's backwards compatible.
You can easily implement a proxy: just pass the request to an HTTP::Client (with two types you'd need to correctly convert one request type to another)
It's the same way in Go, so that means this way of modelling things has been successful in another language. This also means that we can now stream request data with an HTTP::Client. For example we can pass an open File as a request's body_io, which covers the file-upload scenario (for other cases one would probably have to use a pipe).

RX14 · 2016-10-09T20:38:19Z

Why not just make body always an IO?

On HTTP::Client this becomes an IO you can write to to send data. On HTTP::Server this becomes an IO you can read from to get the request body.

This does mean reworking a lot of the API of HTTP but I think that HTTP needs a rewrite to be streaming-first anyway. Passing in strings or other static data should be secondary, and use the underlying streaming to copy the data into the exposed IO.

asterite · 2016-10-09T20:42:45Z

@RX14 It's a good idea, and I thought about it after creating the PR. But there are two issues with that:

We'd need to do MemoryIO.new(string_body) and use that for the HTTP::Request, which incurs a memory allocation (but this is minor, because there are already some memory allocations for each request)
If the body is always an IO it means we never know its length, so we have to send the body as chunked. But for small bodies this is a bit heavy.

A solution for point 2 is to have a special IO that just wraps a String, and when performing the request we'd check that type and extract the String.

asterite · 2016-10-09T20:51:54Z

@RX14 Hmmm... that special IO could actually just be MemoryIO. I think this is doable :-)

RX14 · 2016-10-09T20:52:28Z

@asterite If the body IO object for a HTTP::Request is a thin wrapper over the underlying connection, passing a string is simply io << string. No extra objects, just sending a string down an IO the normal way.

We shouldn't "pass in" an IO, we should "safely expose" the underlying writable IO, and if we're passed an object to be the request body, copy that object to the IO the most efficient way.

asterite · 2016-10-09T21:04:33Z

@RX14 I guess I'd have to see some code. I can't see how that String would go straight to the socket without going through an HTTP::Request object. I also can't imagine a nice API for directly streaming into the request.

RX14 · 2016-10-09T21:23:38Z

You would essentially have a flow like so:

Create request (establish connection (or lease from keep-alive pool)).
Write to body IO.
Finish (returns response).
Read response body IO.
Response follows usual #close semantics.

Single-call APIs would take a body object that would be copied to the body IO, and return a response.

Raw, linear flow:

request = HTTP::Request.new("POST", "http://example.com/foo", headers: ...)
request.body << "boo bar"
response = request.finish
response.status # => 200
response.body # => IO
response.close

Nicer flow:

response = HTTP::Request.post("http://example.com/foo", headers: ...) do |body|
  body << "foo bar"
end
response.status # => 200
response.body # => IO
response.read do |body|
  # read body, close after block?
end

Request body flow:

response = HTTP::Request.post("http://example.com/foo", headers: ..., body: "foo bar")
response.status # => 200
...

Sure you don't get the response in a block, but I think the benefit outweighs the drawbacks, and the positives of the API outweigh the negatives. Providing a #read block on the response which closes the body after use helps.

Request pooling could be done through an external Pool object which - much like a cookie jar - holds and manages connections to various domains that can be reused.

asterite · 2016-10-09T21:29:09Z

Where's the socket on those examples? Does the request hold a reference to that?

asterite · 2016-10-09T21:31:00Z

Nevermind... I think in either case I'd need to review this with @waj and later we can propose alternatives.

RX14 · 2016-10-09T21:43:39Z

Request and Response may not hold reference to the socket directly, but they would hold reference to the underlying IO wrappers which would hold reference to the socket. I think any sort of streaming Response would have to hold indirect reference, so why not Request?

sdogruyol · 2016-10-10T13:05:14Z

I'd really love to see this or #3395 for the next release 🙏

asterite · 2016-10-12T17:31:50Z

I've updated this PR with a few changes:

HTTP::Request has now only a body property that is an IO?. You can create a request with a String or Bytes and it will be wrapped in a MemoryIO and the Content-Length header will be set accordingly.
When writing the request to a socket, if there's a Content-Length header then the IO will be copied to the socket without chunking. If the number of bytes copied is different than what the Content-Length header says, an exception is raised. This of course will never happen if the request is created with a String or Bytes, but it allows a client to directly stream an IO in a non-chunked way if they know the content length.

RX14 · 2016-10-12T17:38:19Z

src/http/client.cr

@@ -53,6 +53,9 @@
 # of the returned IO (or used for creating a String for the body). Invalid bytes in the given encoding
 # are silently ignored when reading text content.
 class HTTP::Client
+  # The set of possible valid body types
+  alias BodyType = String | Bytes | IO | Nil


Maybe BodyType shouldn't include Nil, because it means "no body", then if you want to allow no body then thats just BodyType?? I'm 50/50 on this change.

RX14 · 2016-10-12T17:41:36Z

src/http/common.cr

@@ -107,46 +107,58 @@ module HTTP

  # :nodoc:
  def self.serialize_headers_and_body(io, headers, body, body_io, version)


Do we need to retain the body, body_io split? Why not just String | IO?

sdogruyol · 2016-10-13T13:37:09Z

This is great 👍

RX14 · 2016-10-23T11:22:39Z

I think we should change HTTP::Client::Response#body to be an IO, as Client::Response feels a bit outdated now that we've made this decision.

asterite · 2016-10-23T12:24:12Z

No, because then you have to remember to close the body every time you do a request, which is tedious and bug-prone.

RX14 · 2016-10-23T15:28:52Z

@asterite I thought that was why we had the block form. Infact I think it already behaves that way. File.open is unsafe in the same way too, that's why it is encouraged to use File.open(&block). The precedent is already there in the stdlib.

asterite · 2016-10-23T16:12:34Z

Just because there's a precedent of something bug-prone it doesn't mean we have to do everything in a bug-prone way. Fetching a request as a string is way too common to always have to do it with a block. Also parsing from an IO is slower than from a string, so I prefer to provide a convenient and fast way out of the box.

RX14 · 2016-10-23T17:13:55Z

@asterite I think that reading IOs into memory is more bug prone and certainly a lot less powerful than using IO. Most of all, getting people to remember that #body is an IO in some contexts and a String in others is confusing and bad API design.

Fetching a request as a string is way too common to always have to do it with a block.

Most responses you get from remote services will be formatted as JSON or some other sort of format that can be read easily from the wire in a block. It will likely be better performance to read directly from IO::Buffered here instead of copying to a string directly, then

Also parsing from an IO is slower than from a string

In the small response case you've just waited probably at least 1ms for the HTTP response so a few extra μs likely doesn't matter. In the large response case you're likely to want an IO so you don't have to buffer it to memory. There is always room for optimisation but I think having a consistent API is much more important.

asterite force-pushed the feature/http_server_request_stream_2 branch from 7f9b721 to c053950 Compare October 9, 2016 20:24

asterite changed the title ~~[WIP] Support http server request streaming (single HTTP::Request)~~ Support http server and client request streaming (single HTTP::Request) Oct 9, 2016

asterite changed the title ~~Support http server and client request streaming (single HTTP::Request)~~ Support HTTP server and client request streaming (single HTTP::Request) Oct 9, 2016

RX14 mentioned this pull request Oct 10, 2016

Downloading large file (700 Mo) RX14/multipart.cr#3

Closed

asterite force-pushed the feature/http_server_request_stream_2 branch from c053950 to 70ed1e0 Compare October 12, 2016 17:23

Support http server request streaming (single HTTP::Request)

542577c

asterite force-pushed the feature/http_server_request_stream_2 branch 2 times, most recently from 161cbdd to e1cae1b Compare October 12, 2016 17:31

RX14 reviewed Oct 12, 2016

View reviewed changes

HTTP::Client: support passing an IO as the body

436bbce

asterite force-pushed the feature/http_server_request_stream_2 branch from e1cae1b to 436bbce Compare October 19, 2016 21:39

asterite added kind:feature topic:stdlib labels Oct 20, 2016

asterite added this to the 0.20.0 milestone Oct 20, 2016

asterite merged commit 86e80d0 into master Oct 20, 2016

asterite mentioned this pull request Oct 20, 2016

[WIP] Support http server request streaming #3395

Closed

asterite deleted the feature/http_server_request_stream_2 branch October 20, 2016 11:05

asterite mentioned this pull request Oct 20, 2016

Accept IO objects as HTTP request bodies #2096

Closed

sdogruyol mentioned this pull request Oct 30, 2016

HTTP::Request, added remote ip and peer_addr #1722

Closed

vonKingsley mentioned this pull request Nov 4, 2016

match against to_s for MemoryIO, fix for introduction of [3406] Memor… manastech/webmock.cr#11

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Support HTTP server and client request streaming (single HTTP::Request) #3406

Support HTTP server and client request streaming (single HTTP::Request) #3406

asterite commented Oct 9, 2016

RX14 commented Oct 9, 2016 •

edited

Loading

asterite commented Oct 9, 2016

asterite commented Oct 9, 2016

RX14 commented Oct 9, 2016

asterite commented Oct 9, 2016

RX14 commented Oct 9, 2016

asterite commented Oct 9, 2016

asterite commented Oct 9, 2016

RX14 commented Oct 9, 2016

sdogruyol commented Oct 10, 2016

asterite commented Oct 12, 2016

RX14 Oct 12, 2016

RX14 Oct 12, 2016

sdogruyol commented Oct 13, 2016

RX14 commented Oct 23, 2016

asterite commented Oct 23, 2016

RX14 commented Oct 23, 2016 •

edited

Loading

asterite commented Oct 23, 2016

RX14 commented Oct 23, 2016

		@@ -107,46 +107,58 @@ module HTTP

		# :nodoc:
		def self.serialize_headers_and_body(io, headers, body, body_io, version)

Support HTTP server and client request streaming (single HTTP::Request) #3406

Support HTTP server and client request streaming (single HTTP::Request) #3406

Conversation

asterite commented Oct 9, 2016

RX14 commented Oct 9, 2016 • edited Loading

asterite commented Oct 9, 2016

asterite commented Oct 9, 2016

RX14 commented Oct 9, 2016

asterite commented Oct 9, 2016

RX14 commented Oct 9, 2016

asterite commented Oct 9, 2016

asterite commented Oct 9, 2016

RX14 commented Oct 9, 2016

sdogruyol commented Oct 10, 2016

asterite commented Oct 12, 2016

RX14 Oct 12, 2016

Choose a reason for hiding this comment

RX14 Oct 12, 2016

Choose a reason for hiding this comment

sdogruyol commented Oct 13, 2016

RX14 commented Oct 23, 2016

asterite commented Oct 23, 2016

RX14 commented Oct 23, 2016 • edited Loading

asterite commented Oct 23, 2016

RX14 commented Oct 23, 2016

RX14 commented Oct 9, 2016 •

edited

Loading

RX14 commented Oct 23, 2016 •

edited

Loading