Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Allow reading already retrieved response body after connection is closed #8451

Closed
1 task done
DanielNoord opened this issue Jun 10, 2024 · 12 comments · Fixed by #9239
Closed
1 task done

Allow reading already retrieved response body after connection is closed #8451

DanielNoord opened this issue Jun 10, 2024 · 12 comments · Fixed by #9239

Comments

@DanielNoord
Copy link

Is your feature request related to a problem?

The changes in #3365 have been discussed in other issues before and this is a related issue.

With ClientReponse.text I'm able to get the body of a response even if I did raise the exception. Assuming I read() it before.

async def text(self, encoding: Optional[str] = None, errors: str = "strict") -> str:
"""Read response payload and decode."""
if self._body is None:
await self.read()
if encoding is None:
encoding = self.get_encoding()
return self._body.decode(encoding, errors=errors) # type: ignore[union-attr]

If _body is already set it will just be decoded.

However, I'm talking with a server that sets its headers incorrectly and therefore can't rely on get_encoding(). Calling ClientReponse.read actually breaks my code:

async def read(self) -> bytes:
"""Read response payload."""
if self._body is None:
try:
self._body = await self.content.read()
for trace in self._traces:
await trace.send_response_chunk_received(
self.method, self.url, self._body
)
except BaseException:
self.close()
raise
elif self._released: # Response explicitly released
raise ClientConnectionError("Connection closed")
protocol = self._connection and self._connection.protocol
if protocol is None or not protocol.upgraded:
await self._wait_released() # Underlying connection released
return self._body

Even if ._body is already set I won't be able to read it again as there is now a check against self._released.

Describe the solution you'd like

Would it be possible to expose a method that retrieves the body of a ClientReponse if it is already set without checking if the response is released yet?

Describe alternatives you've considered

Not really.

Related component

Client

Additional context

No response

Code of Conduct

  • I agree to follow the aio-libs Code of Conduct
@unights
Copy link

unights commented Jul 26, 2024

+1, as a temporary alternative, I use ._body directly.

@unights
Copy link

unights commented Jul 26, 2024

And I found this. It works well.

import asyncio

from aiohttp import ClientResponse, ClientSession


class WrappedResponse(ClientResponse):

    async def read(self) -> bytes:
        if self._body is None:
            await super().read()
        return self._body


async def main():
    async with ClientSession(response_class=WrappedResponse) as session:
        async with session.get("https://httpbin.org/get") as resp:
            await resp.read()

        body = await resp.read()

        print(len(body))


if __name__ == '__main__':
    asyncio.run(main())

@Dreamsorcerer
Copy link
Member

@DanielNoord Can you provide a reproducer of this? I'm not clear how you are unable to read it a second time. I'm under the impression that you should be able to if you don't leave the response context (or call resp.close()).

@Dreamsorcerer Dreamsorcerer added the needs-info Issue is lacking sufficient information and will be closed if not provided label Sep 7, 2024
@DanielNoord
Copy link
Author

I don't have access to my work-laptop so can't give the full reproducer, but I can give the exact flow of what happens.

I use a an adapter pattern that manages my ClientSession and does some additional handling of common errors.

  1. SessionManager calls .read() on the response to check for some common errors. This sets ._body
  2. Some ClientResponseException is raised and ._released is set
  3. In the handler of ClientResponseException I'd like to read the response body. I can't use read() again as it checks against ._released so my only option is text(). However, since the encoding headers are set incorrectly (thanks AWS...) self._body.decode(self.get_encoding(), errors=errors) will fail.

Thus, after ._released is set I can't access the body without accessing _body directly since both text() and read() will raise an exception. Of course I could access ._body but its name implies it is private and shouldn't be relied on.
Ideally we would get a bytes() that calls read() if _body isn't set and otherwise just returns _body to work around get_encoding not returning the correct results.

Of course, it would be better for AWS to set their headers correctly, but I have failed to convince them to do so 😓

@Dreamsorcerer
Copy link
Member

Dreamsorcerer commented Sep 8, 2024

Some ClientResponseException is raised and ._released is set

Can you provide the traceback for this exception when possible? I'm a little confused where that exception is coming from that could fail read() while also managing to set _body (it looks like that'd only be possible if the exception comes from the writer, which shouldn't be related to AWS's headers). But, also, maybe the source of that exception shouldn't be setting _released.

Note that you can just use an explicit encoding in .text(), so if you know the encoding that should still work for you.

@DanielNoord
Copy link
Author

It doesn't even matter what type of exception it is though, right?
_released will be set after the first call to read()?

I'd like to give you a reproducer but don't know how I can easily set up a test server that sends the incorrect responses and then show it as a reproducer. Do you have a test in the test suite that I could modify (easily) so I can create a reproducer that helps you understand the issue?

@Dreamsorcerer
Copy link
Member

_released will be set after the first call to read()?

No, a quick test allows me to call read() twice...
The comments suggest that it should be disallowed after an explicit release (e.g. resp.close() or leaving async with).

Do you have a test in the test suite that I could modify (easily) so I can create a reproducer that helps you understand the issue?

test_client_functional.py is probably a good place.

@DanielNoord
Copy link
Author

@Dreamsorcerer

Sorry took a bit but I have written my use case into the format that client_functional expects:

async def test_encoding_header_is_incorrect(aiohttp_client: AiohttpClient) -> None:

    async def status_handler(
        status: int, body: bytes, response: ClientResponse
    ) -> None:
        if status == 429:
            # Handle backoff and retries
            pass
        elif status == 401:
            # Handle authentication error
            pass
        elif b"Some known error" in body:
            # Handle known error
            pass

        await response.raise_for_status()

    async def send_incorrect_heading(request: web.Request) -> web.Response:
        return web.Response(
            status=400,
            headers={"Content-Type": "text/html; charset=utf-8"},
            body="Ø".encode("latin1"),
        )

    app = web.Application()
    app.router.add_route("GET", "/headers", send_incorrect_heading)
    client = await aiohttp_client(app)

    async with client.get("/headers") as resp:

        try:
            await status_handler(resp.status, await resp.read(), resp)
        except ClientResponseError:
            # I can still read _body as it was already set, but _body is private
            body = resp._body
            assert body == "Ø".encode("latin1")
            try:
                # This will crash as read() checks whether the resposne was released
                await resp.read()
            except ClientConnectionError:
                # This also won't work as text() checks the encoding which is set incorrectly by AWS
                await resp.text()

The status_handler is a general piece of code that we call on each response to do some general error handling.
We wrap this in a try ... except so that we can insert a log entry into the database for any failing request. To make debugging more effective we'd like to also include the response body (which we retrieve in the except).

Currently I can do so by reading _body which I assert on to check that it is as expected. I think this is valid use as I don't leave the response context. However, _body is of course private.
I can't use read() as the response is already "released" so read() will fail.
I also can't use text(). It sidesteps the check for "released" but tries to use the encoding to turn _body into str, which will fail.

Ideally in the first except I would be able to call get_already_set_body to retrieve the body via a public API that doesn't check "released" or use the encoding. Hope this example makes the use case clearer.

@Dreamsorcerer
Copy link
Member

OK, so it's the raise_for_status() method, which actually calls .release(). So, that method is considered to be an explicit release, I guess. That probably makes sense given that it's raising an exception, so if someone's code depends on calling .release() it may not get called.

@DanielNoord
Copy link
Author

I don't really understand your response, sorry. Could you clarify a bit?

What I'm asking is whether it is possible to add/change a public API that gives access to ._body when it is already set but also when the response is already released. Both .text() and .read() fulfil only one these constraints.

If it should be impossible to access ._body after .released() was called then .text() should also start checking for it, as currently the code above would pass perfectly fine if "Ø".encode("latin1") would be "Ø".encode("utf-8"). Personally, I think this is actually quite a nice API as the code I wrote shows a legitimate use case for re-accessing the body even when the response is released. The issue is just that I want the raw bytes and not the str as .text() returns.

@Dreamsorcerer Dreamsorcerer removed the needs-info Issue is lacking sufficient information and will be closed if not provided label Sep 22, 2024
@Dreamsorcerer
Copy link
Member

Dreamsorcerer commented Sep 22, 2024

Just discussed this. What we're going to do is make .text()/.json() match the .read() behaviour, so they shouldn't be used outside the context (probably v4 only to reduce breakage).

Then we'll change .raise_for_status() so it doesn't release() if we're already in a context, in which case it can just rely on __aexit__() to release as the exception propagates. This change will allow your example to work correctly.

@DanielNoord
Copy link
Author

Awesome! Thanks for the clear response and plan forward. Looking forward to the changes. Let me know if you need any further input or help with testing.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants