Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix erroneous retries on a failed request to a newly opened socket #150

Merged
merged 1 commit into from
Sep 23, 2024

Conversation

ahl
Copy link
Contributor

@ahl ahl commented Sep 21, 2024

In migrating to hyper v1, we encountered an issue with reqwest that we tracked down to hyper-util. I've created a reproduction here.

We see what appears to be aberrant behavior when a client (reqwest::Client or hyper_util::client::legacy::Client) is making a request to a server that may close the connection deliberately. In particular, we see that the client opens a new connection and may try opening connections many many times! If the client is unable to start writing the request to the newly opened connection, it will open a new connection and try again until it's able to write some or all of the request to the socket prior to the server closing it.

This appears to be a result of #133 which reintroduced retry logic. It is quite similar to hyperium/hyper@ee61ea9 but diverges in some important ways. In particular:

#133

        loop {
            req = match self.try_send_request(req, pool_key.clone()).await {
                Ok(resp) => return Ok(resp),
                Err(TrySendError::Nope(err)) => return Err(err),
                Err(TrySendError::Retryable { mut req, error }) => {
                    if !self.config.retry_canceled_requests {
                        // if client disabled, don't retry
                        // a fresh connection means we definitely can't retry
                        return Err(error);
                    }

                    trace!(
                        "unstarted request canceled, trying again (reason={:?})",
                        error
                    );
                    *req.uri_mut() = uri.clone();
                    req
                }
            }
        }

hyperium/hyper@ee61ea9

        loop {
            match self.future.poll() {
                Ok(Async::Ready(resp)) => return Ok(Async::Ready(resp)),
                Ok(Async::NotReady) => return Ok(Async::NotReady),
                Err(ClientError::Normal(err)) => return Err(err),
                Err(ClientError::Canceled {
                    connection_reused,
                    req,
                    reason,
                }) => {
                    if !self.client.retry_canceled_requests || !connection_reused {
                        // if client disabled, don't retry
                        // a fresh connection means we definitely can't retry
                        return Err(reason);
                    }
                    trace!("unstarted request canceled, trying again (reason={:?})", reason);
                    let mut req = request::join(req);
                    req.set_proxy(self.is_proxy);
                    req.set_uri(self.uri.clone());
                    self.future = self.client.send_request(req, &self.domain);
                }
            }
        }

Note that the comment has been preserved across the years, but the critical check for connection_reused is absent on the new revision.

Here are the call-specific error types each commit introduced:

#133

enum TrySendError<B> {
    Retryable { error: Error, req: Request<B> },
    Nope(Error),
}

hyperium/hyper@ee61ea9

pub(crate) enum ClientError<B> {
    Normal(::Error),
    Canceled {
        connection_reused: bool,
        req: (::proto::RequestHead, Option<B>),
        reason: ::Error,
    }
}

It seems as though the newer code may have been accidentally similar to the older code rather than intentionally omitting connection_reused, but I may be wrong.

In this case, we are establishing a new connection. The documentation for retry_canceled_requests suggests that the setting should only be applicable for pooled connections that have been reused:

Set whether to retry requests that get disrupted before ever starting to write.

This means a request that is queued, and gets given an idle, reused connection, and then encounters an error immediately as the idle connection was found to be unusable.

When this is set to false, the related ResponseFuture would instead resolve to an Error::Cancel.

This fix borrows from the older code. With it applied, the reproducer above issues a single connection request (which fails, as expected).

I've deleted commented out code that appears to be no longer relevant in that it applies to functionality that is either implemented by #133 (and this fix) or may no longer be applicable. If these deletions were overly cavalier or simply unwanted, I'm happy to revert them.

Copy link
Member

@seanmonstar seanmonstar left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you!

I see what happened, when the code was transferred over I consolidated the "cant retry" concept into just Nope, assuming that the only case was establishing a connection (the connection_for() call). But this rightly fixes the case where a connection was created, but then errors immediately afterwards.

@seanmonstar seanmonstar merged commit d3e9699 into hyperium:master Sep 23, 2024
16 checks passed
@ahl
Copy link
Contributor Author

ahl commented Sep 23, 2024

@seanmonstar thanks for this; do you know when we can expect a new release of hyper-util?

@seanmonstar
Copy link
Member

Landing some dependency updates, and then release likely tomorrow.

r58Playz pushed a commit to r58Playz/hyper-util-wasm that referenced this pull request Oct 12, 2024
…yperium#150)

The legacy pool client will in certain circumstances try other connections if it it notices an error just as it's trying to send a request. There is also some code that prevents retrying forever, such as if it is a new connection, since it likely won't ever work then.

However, there was a bug that this fixes, where if a new connection was successfully created, but _then_ errored when trying to send the request, it would consider retrying that. This could end up in a loop if the server accepts and then errors all connections.

The fix is to re-introduce tracking whether this connection has been successfully used once all the way through, "reused", and if it hasn't, abort any retrying.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants