Allow reading already retrieved response body after connection is closed #8451

DanielNoord · 2024-06-10T14:58:08Z

Is your feature request related to a problem?

The changes in #3365 have been discussed in other issues before and this is a related issue.

With ClientReponse.text I'm able to get the body of a response even if I did raise the exception. Assuming I read() it before.

aiohttp/aiohttp/client_reqrep.py

Lines 1083 to 1091 in f662958

    
           async def text(self, encoding: Optional[str] = None, errors: str = "strict") -> str: 
        
               """Read response payload and decode.""" 
        
               if self._body is None: 
        
                   await self.read() 
        
               if encoding is None: 
        
                   encoding = self.get_encoding() 
        
               return self._body.decode(encoding, errors=errors)  # type: ignore[union-attr]

If _body is already set it will just be decoded.

However, I'm talking with a server that sets its headers incorrectly and therefore can't rely on get_encoding(). Calling ClientReponse.read actually breaks my code:

aiohttp/aiohttp/client_reqrep.py

Lines 1040 to 1058 in f662958

    
           async def read(self) -> bytes: 
        
               """Read response payload.""" 
        
               if self._body is None: 
        
                   try: 
        
                       self._body = await self.content.read() 
        
                       for trace in self._traces: 
        
                           await trace.send_response_chunk_received( 
        
                               self.method, self.url, self._body 
        
                           ) 
        
                   except BaseException: 
        
                       self.close() 
        
                       raise 
        
               elif self._released:  # Response explicitly released 
        
                   raise ClientConnectionError("Connection closed") 
        
               protocol = self._connection and self._connection.protocol 
        
               if protocol is None or not protocol.upgraded: 
        
                   await self._wait_released()  # Underlying connection released 
        
               return self._body

Even if ._body is already set I won't be able to read it again as there is now a check against self._released.

Describe the solution you'd like

Would it be possible to expose a method that retrieves the body of a ClientReponse if it is already set without checking if the response is released yet?

Describe alternatives you've considered

Not really.

Related component

Client

Additional context

No response

Code of Conduct

I agree to follow the aio-libs Code of Conduct

The text was updated successfully, but these errors were encountered:

unights · 2024-07-26T10:00:47Z

+1, as a temporary alternative, I use ._body directly.

unights · 2024-07-26T10:11:37Z

And I found this. It works well.

import asyncio

from aiohttp import ClientResponse, ClientSession


class WrappedResponse(ClientResponse):

    async def read(self) -> bytes:
        if self._body is None:
            await super().read()
        return self._body


async def main():
    async with ClientSession(response_class=WrappedResponse) as session:
        async with session.get("https://httpbin.org/get") as resp:
            await resp.read()

        body = await resp.read()

        print(len(body))


if __name__ == '__main__':
    asyncio.run(main())

Dreamsorcerer · 2024-09-07T19:08:02Z

@DanielNoord Can you provide a reproducer of this? I'm not clear how you are unable to read it a second time. I'm under the impression that you should be able to if you don't leave the response context (or call resp.close()).

DanielNoord · 2024-09-08T09:08:11Z

I don't have access to my work-laptop so can't give the full reproducer, but I can give the exact flow of what happens.

I use a an adapter pattern that manages my ClientSession and does some additional handling of common errors.

SessionManager calls .read() on the response to check for some common errors. This sets ._body
Some ClientResponseException is raised and ._released is set
In the handler of ClientResponseException I'd like to read the response body. I can't use read() again as it checks against ._released so my only option is text(). However, since the encoding headers are set incorrectly (thanks AWS...) self._body.decode(self.get_encoding(), errors=errors) will fail.

Thus, after ._released is set I can't access the body without accessing _body directly since both text() and read() will raise an exception. Of course I could access ._body but its name implies it is private and shouldn't be relied on.
Ideally we would get a bytes() that calls read() if _body isn't set and otherwise just returns _body to work around get_encoding not returning the correct results.

Of course, it would be better for AWS to set their headers correctly, but I have failed to convince them to do so 😓

Dreamsorcerer · 2024-09-08T11:21:30Z

Some ClientResponseException is raised and ._released is set

Can you provide the traceback for this exception when possible? I'm a little confused where that exception is coming from that could fail read() while also managing to set _body (it looks like that'd only be possible if the exception comes from the writer, which shouldn't be related to AWS's headers). But, also, maybe the source of that exception shouldn't be setting _released.

Note that you can just use an explicit encoding in .text(), so if you know the encoding that should still work for you.

DanielNoord · 2024-09-09T06:59:22Z

It doesn't even matter what type of exception it is though, right?
_released will be set after the first call to read()?

I'd like to give you a reproducer but don't know how I can easily set up a test server that sends the incorrect responses and then show it as a reproducer. Do you have a test in the test suite that I could modify (easily) so I can create a reproducer that helps you understand the issue?

Dreamsorcerer · 2024-09-09T12:00:14Z

_released will be set after the first call to read()?

No, a quick test allows me to call read() twice...
The comments suggest that it should be disallowed after an explicit release (e.g. resp.close() or leaving async with).

Do you have a test in the test suite that I could modify (easily) so I can create a reproducer that helps you understand the issue?

test_client_functional.py is probably a good place.

DanielNoord · 2024-09-15T13:22:22Z

@Dreamsorcerer

Sorry took a bit but I have written my use case into the format that client_functional expects:

async def test_encoding_header_is_incorrect(aiohttp_client: AiohttpClient) -> None:

    async def status_handler(
        status: int, body: bytes, response: ClientResponse
    ) -> None:
        if status == 429:
            # Handle backoff and retries
            pass
        elif status == 401:
            # Handle authentication error
            pass
        elif b"Some known error" in body:
            # Handle known error
            pass

        await response.raise_for_status()

    async def send_incorrect_heading(request: web.Request) -> web.Response:
        return web.Response(
            status=400,
            headers={"Content-Type": "text/html; charset=utf-8"},
            body="Ø".encode("latin1"),
        )

    app = web.Application()
    app.router.add_route("GET", "/headers", send_incorrect_heading)
    client = await aiohttp_client(app)

    async with client.get("/headers") as resp:

        try:
            await status_handler(resp.status, await resp.read(), resp)
        except ClientResponseError:
            # I can still read _body as it was already set, but _body is private
            body = resp._body
            assert body == "Ø".encode("latin1")
            try:
                # This will crash as read() checks whether the resposne was released
                await resp.read()
            except ClientConnectionError:
                # This also won't work as text() checks the encoding which is set incorrectly by AWS
                await resp.text()

The status_handler is a general piece of code that we call on each response to do some general error handling.
We wrap this in a try ... except so that we can insert a log entry into the database for any failing request. To make debugging more effective we'd like to also include the response body (which we retrieve in the except).

Currently I can do so by reading _body which I assert on to check that it is as expected. I think this is valid use as I don't leave the response context. However, _body is of course private.
I can't use read() as the response is already "released" so read() will fail.
I also can't use text(). It sidesteps the check for "released" but tries to use the encoding to turn _body into str, which will fail.

Ideally in the first except I would be able to call get_already_set_body to retrieve the body via a public API that doesn't check "released" or use the encoding. Hope this example makes the use case clearer.

Dreamsorcerer · 2024-09-15T19:47:49Z

OK, so it's the raise_for_status() method, which actually calls .release(). So, that method is considered to be an explicit release, I guess. That probably makes sense given that it's raising an exception, so if someone's code depends on calling .release() it may not get called.

DanielNoord · 2024-09-15T21:16:09Z

I don't really understand your response, sorry. Could you clarify a bit?

What I'm asking is whether it is possible to add/change a public API that gives access to ._body when it is already set but also when the response is already released. Both .text() and .read() fulfil only one these constraints.

If it should be impossible to access ._body after .released() was called then .text() should also start checking for it, as currently the code above would pass perfectly fine if "Ø".encode("latin1") would be "Ø".encode("utf-8"). Personally, I think this is actually quite a nice API as the code I wrote shows a legitimate use case for re-accessing the body even when the response is released. The issue is just that I want the raw bytes and not the str as .text() returns.

Dreamsorcerer · 2024-09-22T17:18:16Z

Just discussed this. What we're going to do is make .text()/.json() match the .read() behaviour, so they shouldn't be used outside the context (probably v4 only to reduce breakage).

Then we'll change .raise_for_status() so it doesn't release() if we're already in a context, in which case it can just rely on __aexit__() to release as the exception propagates. This change will allow your example to work correctly.

DanielNoord · 2024-09-22T17:21:22Z

Awesome! Thanks for the clear response and plan forward. Looking forward to the changes. Let me know if you need any further input or help with testing.

DanielNoord added the enhancement label Jun 10, 2024

Dreamsorcerer added the needs-info Issue is lacking sufficient information and will be closed if not provided label Sep 7, 2024

Dreamsorcerer removed the needs-info Issue is lacking sufficient information and will be closed if not provided label Sep 22, 2024

Dreamsorcerer mentioned this issue Sep 22, 2024

Stop raise_for_status() releasing when in a context #9239

Merged

Dreamsorcerer closed this as completed in #9239 Sep 23, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Allow reading already retrieved response body after connection is closed #8451

Allow reading already retrieved response body after connection is closed #8451

DanielNoord commented Jun 10, 2024

unights commented Jul 26, 2024

unights commented Jul 26, 2024

Dreamsorcerer commented Sep 7, 2024

DanielNoord commented Sep 8, 2024

Dreamsorcerer commented Sep 8, 2024 •

edited

Loading

DanielNoord commented Sep 9, 2024

Dreamsorcerer commented Sep 9, 2024

DanielNoord commented Sep 15, 2024

Dreamsorcerer commented Sep 15, 2024

DanielNoord commented Sep 15, 2024

Dreamsorcerer commented Sep 22, 2024 •

edited

Loading

DanielNoord commented Sep 22, 2024

Allow reading already retrieved response body after connection is closed #8451

Allow reading already retrieved response body after connection is closed #8451

Comments

DanielNoord commented Jun 10, 2024

Is your feature request related to a problem?

Describe the solution you'd like

Describe alternatives you've considered

Related component

Additional context

Code of Conduct

unights commented Jul 26, 2024

unights commented Jul 26, 2024

Dreamsorcerer commented Sep 7, 2024

DanielNoord commented Sep 8, 2024

Dreamsorcerer commented Sep 8, 2024 • edited Loading

DanielNoord commented Sep 9, 2024

Dreamsorcerer commented Sep 9, 2024

DanielNoord commented Sep 15, 2024

Dreamsorcerer commented Sep 15, 2024

DanielNoord commented Sep 15, 2024

Dreamsorcerer commented Sep 22, 2024 • edited Loading

DanielNoord commented Sep 22, 2024

Dreamsorcerer commented Sep 8, 2024 •

edited

Loading

Dreamsorcerer commented Sep 22, 2024 •

edited

Loading