Redesign HTTP::Client to implement missing features #6011

Open
straight-shoota opened this issue Apr 25, 2018 · 19 comments

@straight-shoota (Member) commented Apr 25, 2018

HTTP::Client is missing some quite essential features, and while there are individual issues about some of them, it is better to discuss the necessary design changes in a central place.

Some missing features:

The first group is about adding more flexibility to use different types of connections or to enhance their usage. This requires separating the implementation for establishing a connection from HTTP::Client.
#6001 is an incomplete experiment for that.

The concept of using a configurable transport is employed by many other implementations of HTTP clients:

  • Go: RoundTripper (an interface, request -> response) and Transport (the implementation, including proxy support, connection reuse, HTTP/2, and much of the basic HTTP workflow, such as redirects).
  • Java: com.google.api.client.http.HttpTransport builds a request (method, url -> request) and request.execute() returns the response.
  • Ruby: Faraday implements a middleware interface similar to HTTP::Handler.
  • Ruby: Hurley has a connection property, which is a lambda (request -> response).

It makes great sense to have HTTP::Client act as a high-level interface that delegates the heavy lifting to a replaceable low-level API. Users should be able to configure HTTP::Client to use a specific transport.

The default transport should allow connecting to any TCP server and maintain a connection pool to reuse idle sockets, even beyond the lifespan of a single client instance. Other transports could connect to a proxy or a Unix socket, or just mock a connection for testing purposes.

An open question is how the interface between HTTP::Client and the transport should be designed. In the experimental PR #6001 the transport returns an IO for a specific request, but the client still writes the request and reads the response.
I tend towards making the interface as simple as Request -> Response. This would obviously move most of the implementation of HTTP::Client to the transport, but I don't think it's a bad thing to move the internal heavy lifting to the lower level. The interface is also more versatile because it allows the transport to manipulate the request before writing and the response after reading. This is necessary for proxy requests, for example, because they need the full URL as the resource. An additional benefit is that requests and responses don't necessarily need to be sent through an IO: a transport could, for example, implement an in-memory client cache or directly create responses for non-HTTP protocols such as file.
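To illustrate, here is a minimal sketch of such a Request -> Response interface. Everything in it (the `Transport` class, its `call` method, `MockTransport`) is hypothetical and not part of the stdlib:

```crystal
require "http"

# Hypothetical transport interface: turn a request into a response.
# How that happens (TCP, proxy, Unix socket, in-memory) is entirely
# up to the implementation.
abstract class HTTP::Transport
  abstract def call(request : HTTP::Request) : HTTP::Client::Response
end

# A transport that never touches the network, e.g. for testing.
class MockTransport < HTTP::Transport
  def call(request : HTTP::Request) : HTTP::Client::Response
    HTTP::Client::Response.new(HTTP::Status::OK, "stubbed response for #{request.path}")
  end
end
```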

Such a simple interface would also make it easy to chain transports for more flexibility and might be a good place to inject middleware.
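Chaining could then be as simple as one transport wrapping another. Again purely a sketch, building on the hypothetical `Transport` interface above:

```crystal
# Hypothetical middleware-style transport: logs each request, then
# delegates to the wrapped transport.
class LoggingTransport < HTTP::Transport
  def initialize(@inner : HTTP::Transport)
  end

  def call(request : HTTP::Request) : HTTP::Client::Response
    start = Time.monotonic
    response = @inner.call(request)
    elapsed = Time.monotonic - start
    puts "#{request.method} #{request.path}: #{response.status_code} (#{elapsed})"
    response
  end
end

# transport = LoggingTransport.new(MockTransport.new)
```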

This brings us to the second group of missing, more high-level features. Some HTTP clients, such as Faraday, allow injecting custom middleware into the request-response chain, which can implement, for example, authentication or following redirects.

I am convinced that essential features such as following redirects, basic auth etc. should not require a middleware handler, but should be implemented directly in either the low- or high-level API. They need to be optional and configurable, but I wouldn't like to see an HTTP::Client setup using a herd of middleware handlers just to provide basic HTTP. Middleware should only be used for features that sit on top (for example OAuth) and can't reasonably be included in the default client implementation.

Related: #1330 (socket factory as transport)

@jwoertink (Contributor)

Yes, that proxy support would be amazing!

@rishavs commented Nov 8, 2018

@straight-shoota Just a suggestion: can you use checkboxes instead of bullet points in your OP? It would make it easier to follow the progress on individual points instead of having to check each item individually.

BTW, really love this list of proposed improvements. Proxy and HTTP/2 support are the biggest ones IMO. For basic auth, I am more used to the idea of using a middleware, à la Node, but I'd be fine with handling it in a separate layer.

@straight-shoota (Member, Author)

@rishavs We're currently talking about the rough design. The implementation of individual features is still a long way off.

@straight-shoota (Member, Author)

But I'd like to see this discussion moving forward.

In my opinion, the exposed API of HTTP::Client needs to be dead simple but built on a complex backend. It should be as easy as running curl or accessing a website in a browser. In almost all cases, the application doesn't care about how an HTTP response is acquired. Does it create a new TCP connection or reuse an existing one? Is it a simple HTTP/1.1 exchange or multiplexed through HTTP/2 -- at some point even HTTP/3? HTTP::Client should simply use the best available method supported by the server.
This is obviously a very sophisticated goal and will take a long time to reach, most likely till after 1.0.

Maybe this complex implementation doesn't even need to be the default -- or even included in the stdlib at all. But the stdlib HTTP::Client API should be designed in such a way that the transport adapter is easily exchangeable. Its interface should be simple (Request -> Response) and shouldn't assume anything about connection details. This makes it easy to improve the implementation step by step. The start would be a more or less simple, one-shot connection implementation for TCP/IP. It might even keep the connection alive and reuse it if the next request is for the same host; otherwise, a new connection would be established. Proxy and Unix socket support are further, relatively easy feature improvements. A big step later would be a reusable connection pool and HTTP/2. Eventually even HTTP/3.
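For illustration only, client construction under such a design could look something like this; the `transport` argument and all three transport classes are made-up names, not a concrete proposal:

```crystal
# Hypothetical usage: the client doesn't know or care how the
# transport produces its responses.
client  = HTTP::Client.new("example.com", transport: PooledTransport.new(max_idle: 10))
docker  = HTTP::Client.new("localhost", transport: UnixSocketTransport.new("/var/run/docker.sock"))
proxied = HTTP::Client.new("example.com", transport: ProxyTransport.new("proxy.internal", 3128))
```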

@PercussiveElbow

Has there been any movement recently on UNIX socket support?

@straight-shoota (Member, Author)

Not yet, unfortunately. But I'd like to move on with this. Possibly for 0.29.0.

@m-o-e (Contributor) commented Dec 17, 2019

> But I'd like to see this discussion moving forward.

+1 from me (a happy Crystal user who started missing persistent HTTP connections today 😞).

> It should be as easy as running curl

Perhaps it could make sense to rebuild HTTP::Client on top of libcurl?

I don't know how the core team would feel about introducing an external dependency for something as central as HTTP, but in my experience I end up reaching for the curl binding in most languages anyway (e.g. Typhoeus or PycURL), because the stdlib client usually lacks features and/or performance.

@straight-shoota (Member, Author)

@m-o-e I think this has been brought up in the discussion before, and it probably won't work to integrate libcurl into Crystal's event loop.

@m-o-e (Contributor) commented Dec 17, 2019

@straight-shoota Ah okay, understood. Sorry for the duplicate!

@yorci commented Jun 2, 2020

It would be great to have outgoing IP binding in HTTP::Client, which TCPSocket already supports. It's really useful when working with APIs whose requests-per-hour limits are based on IP.

@rishavs commented Oct 15, 2020

Are any of these part of the 1.0 plan?

@asterite (Member)

@rishavs no

@erdnaxeli (Contributor) commented Nov 23, 2020

> In almost all cases, the application doesn't care about how a HTTP response is acquired. Does it create a new TCP connection or re-use an existing one?

For this specific point (reusing connections) I think the application should be aware of what is happening. Sockets are a limited resource, and apps should control how they use them. Coming from Python (where the most used HTTP lib is requests), I think a concept of a session would be good:

```crystal
session = HTTP::Session.new
session.get("https://crystal-lang.org")
```

Inside a session, calls reuse connections (if possible), but simple calls with HTTP::Client should not.
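As a rough sketch of the idea, such a session could keep one client per host and reuse it. Everything below is hypothetical API, not a proposal for the exact implementation:

```crystal
require "http/client"

# Hypothetical session that keeps one HTTP::Client per host and reuses
# its connection. A real implementation would key on scheme, host and
# port, and would need to be concurrency-safe.
class HTTP::Session
  @clients = {} of String => HTTP::Client

  def get(url : String) : HTTP::Client::Response
    uri = URI.parse(url)
    client = @clients[uri.host.not_nil!] ||= HTTP::Client.new(uri)
    client.get(uri.request_target)
  end

  # Close all pooled connections when done.
  def close
    @clients.each_value(&.close)
  end
end
```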

@straight-shoota
Copy link
Member Author

Most apps really don't need to care about that. It's good to have a low-level API for managing connections manually. But for typical use cases you just want to send HTTP requests from different parts of an application without having to drag a session object around. HTTP connection management can easily be abstracted away.

@erdnaxeli (Contributor) commented Nov 24, 2020

But if we do that I am worried about an app leaking sockets without control over them…

I went and looked at how others do it: Ruby doesn't do it by default, Python doesn't either, Go does it by default (you must instantiate a client to change it).

So OK, why not: the default behavior would be to persist connections, and if you want to change that, you would need to instantiate a client object and change its parameters.

@didactic-drunk (Contributor)

Leaking sockets is a real problem. See taylorfinnell/awscr-s3#77. In this case the library could use a pool because all requests go to the same endpoint.

The same isn't true for the hundreds of one-off cases where a developer has a list of URLs and tries:

```crystal
urls.map { |url| HTTP::Client... }...
```

Often it will work in testing with a small list of URLs, then fail sporadically in production.

I think connection reuse should be opt-in rather than opt-out. Whether through a session or another API I don't know, but in real use I'm hitting socket issues from third-party libraries with no way to control them.
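For reference, with the current API the safe version of such a loop closes each client deterministically, for example via the block form of `HTTP::Client.new`, which closes the client when the block exits (the URLs here are made up):

```crystal
require "http/client"

urls = ["https://example.com/a", "https://example.com/b"]

urls.each do |url|
  uri = URI.parse(url)
  # The block form yields the client and closes it (and its socket)
  # when the block returns, so nothing leaks.
  HTTP::Client.new(uri) do |client|
    response = client.get(uri.request_target)
    puts "#{url}: #{response.status_code}"
  end
end
```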

@straight-shoota (Member, Author)

If you run one-off requests with HTTP::Client's class methods, the connection should be properly closed automatically. That's expected behaviour. If you happen to observe socket leakage, it's a bug.
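The two existing styles side by side, as a minimal example of the current stdlib API (not the proposed redesign):

```crystal
require "http/client"

# One-shot class method: opens a connection, performs the request,
# and closes the connection before returning.
response = HTTP::Client.get("https://example.com")
puts response.status_code

# Instance: keeps its connection open for reuse until explicitly closed.
client = HTTP::Client.new("example.com", tls: true)
response = client.get("/")
client.close
```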

mloughran added a commit to mloughran/twirp.cr that referenced this issue Jul 27, 2022
Ideally Crystal's HTTP client would manage a pool of connections and reuse them when appropriate, but it does not – see crystal-lang/crystal#6011 – so using a new TCP connection for each request seems like the safest approach.

The previous implementation was hopelessly broken due to the `@client.close`, which resulted in `Closed stream (IO::Error)` errors when used concurrently... However, IIRC, without the explicit close, other issues arise when TCP sockets time out of their own accord.

@crysbot commented May 10, 2024

This issue has been mentioned on Crystal Forum. There might be relevant details there:

https://forum.crystal-lang.org/t/avoiding-closuring-a-proc-thats-an-instance-variable/6834/6

@crysbot commented Jul 21, 2024

This issue has been mentioned on Crystal Forum. There might be relevant details there:

https://forum.crystal-lang.org/t/pooling-http-connections-automatically/7030/1
