
invalid character in header #5269

Closed
zhaoxuan opened this issue Nov 22, 2020 · 30 comments
Labels
bug · reproducer: missing (this issue lacks code that reproduces the problem, or clearly understandable steps to reproduce)

Comments

@zhaoxuan

zhaoxuan commented Nov 22, 2020

🐞 Describe the bug

[2020-11-22 21:24:53 +0800] [55277] [ERROR] Error handling request
Traceback (most recent call last):
  File "/opt/rh/rh-python36/root/usr/lib64/python3.6/site-packages/aiohttp/web_protocol.py", line 314, in data_received
    messages, upgraded, tail = self._request_parser.feed_data(data)
  File "aiohttp/_http_parser.pyx", line 546, in aiohttp._http_parser.HttpParser.feed_data
aiohttp.http_exceptions.BadHttpMessage: 400, message='invalid character in header'

💡 To Reproduce

I do not have any idea how to reproduce it; the CPU load was not high.

aiohttp == 3.7.3
gunicorn == 20.0.4
uvloop == 0.14.0

📋 Your version of the Python

Python == 3.6.9

📋 Your version of the aiohttp/yarl/multidict distributions

$ python -m pip show aiohttp
Name: aiohttp
Version: 3.7.3
Summary: Async http client/server framework (asyncio)
Home-page: https://github.com/aio-libs/aiohttp
Author: Nikolay Kim
Author-email: fafhrd91@gmail.com
License: Apache 2
Location: /opt/rh/rh-python36/root/usr/lib64/python3.6/site-packages
Requires: multidict, typing-extensions, idna-ssl, chardet, attrs, yarl, async-timeout
Required-by: aiohttp-devtools
$ python -m pip show multidict
Name: multidict
Version: 4.7.6
Summary: multidict implementation
Home-page: https://github.com/aio-libs/multidict
Author: Andrew Svetlov
Author-email: andrew.svetlov@gmail.com
License: Apache 2
Location: /opt/rh/rh-python36/root/usr/lib64/python3.6/site-packages
Requires:
Required-by: yarl, aiohttp
$ python -m pip show yarl
Name: yarl
Version: 1.5.1
Summary: Yet another URL library
Home-page: https://github.com/aio-libs/yarl/
Author: Andrew Svetlov
Author-email: andrew.svetlov@gmail.com
License: Apache 2
Location: /opt/rh/rh-python36/root/usr/lib64/python3.6/site-packages
Requires: multidict, typing-extensions, idna
Required-by: aiohttp
@webknjaz
Member

At a glance, this looks like it may be a duplicate of #5220

@webknjaz added the reproducer: missing label Nov 22, 2020
@webknjaz
Member

@zhaoxuan this needs a reproducer. Without it, we'll have no idea how to trigger this behavior and won't even be able to confirm whether it's a bug.

@asvetlov
Member

Agree with @webknjaz
If a reproducer exists we can check it and figure out what's going on.
Otherwise, we should close the issue after some period and pray that #5220 fixes this problem too.

@zhaoxuan
Author

@webknjaz it was weird; I added some logging around aiohttp/web_protocol.py:314 but got nothing. How can I capture the error message?

@webknjaz
Member

@zhaoxuan we need a simple aiohttp app that reproduces the issue, posted here. Even if it's a "hello world": the simpler the better, with no dependencies other than aiohttp.
Also, we need to know what HTTP request triggers this behavior.

P.S. You cannot log what feed_data() returns, because it raises an exception, meaning the control flow never reaches the return at all. And you cannot easily debug the .pyx module because it would need rebuilding.
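One way around that limitation is to skip the parser entirely and look at the raw bytes on the wire. This is a stdlib-only sketch of our own (not an aiohttp API): bind a port, accept one connection, and return whatever the client sent, so printing it with repr() makes any control characters visible as \x.. escapes. Point the misbehaving client at this port instead of the real app.

```python
import socket

def capture_one_request(host="127.0.0.1", port=8080, bufsize=65536):
    """Accept a single connection and return the raw bytes it sends."""
    with socket.create_server((host, port)) as srv:
        conn, _ = srv.accept()
        with conn:
            return conn.recv(bufsize)

if __name__ == "__main__":
    # Control characters in headers show up as \x.. in the repr output.
    print(repr(capture_one_request()))
```

This will not prove where the bad byte comes from, but it lets you inspect the exact header block the parser rejects.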

@derlih
Contributor

derlih commented Nov 24, 2020

@webknjaz maybe we could add debug-level logging to the feed_data() method to make debugging such issues easier in the future?

@asvetlov
Member

I'm curious to see the concrete proposal for debug logging.

@zhaoxuan
Author

@webknjaz my app performs I/O operations (MySQL, Redis, Memcached), so a simple aiohttp app may not reproduce this issue.

from aiohttp import web

from aiocache import Cache
from aiocache.serializers import PickleSerializer

async def test_handle(request: web.Request):
    text = 'test'
    mc = request.app['config']['memcache']

    value = await mc.get(text)

    if bool(value):
        print("in memcache: {}".format(value))
    else:
        await mc.set(text, '123', 60)

    return web.Response(text=text)

app = web.Application()
app['config'] = {}

mc = Cache(Cache.MEMCACHED, serializer=PickleSerializer(), endpoint='127.0.0.1', port=11211, pool_size=64)
app['config']['memcache'] = mc

app.router.add_get('/test', test_handle)

if __name__ == '__main__':
    web.run_app(app)

I am not sure whether this will reproduce it.

@zhaoxuan
Author

zhaoxuan commented Dec 8, 2020

This error is still occurring:

[2020-12-08 14:47:30 +0800] [36619] [ERROR] Error handling request
Traceback (most recent call last):
  File "/opt/rh/rh-python38/root/usr/local/lib64/python3.8/site-packages/aiohttp/web_protocol.py", line 314, in data_received
    messages, upgraded, tail = self._request_parser.feed_data(data)
  File "aiohttp/_http_parser.pyx", line 546, in aiohttp._http_parser.HttpParser.feed_data
aiohttp.http_exceptions.BadHttpMessage: 400, message='invalid character in header'
[2020-12-08 14:47:30 +0800] [36602] [ERROR] Error handling request
Traceback (most recent call last):
  File "/opt/rh/rh-python38/root/usr/local/lib64/python3.8/site-packages/aiohttp/web_protocol.py", line 314, in data_received
    messages, upgraded, tail = self._request_parser.feed_data(data)
  File "aiohttp/_http_parser.pyx", line 546, in aiohttp._http_parser.HttpParser.feed_data
aiohttp.http_exceptions.BadHttpMessage: 400, message='invalid character in header'
[2020-12-08 14:47:30 +0800] [36608] [ERROR] Error handling request
Traceback (most recent call last):
  File "/opt/rh/rh-python38/root/usr/local/lib64/python3.8/site-packages/aiohttp/web_protocol.py", line 314, in data_received
    messages, upgraded, tail = self._request_parser.feed_data(data)
  File "aiohttp/_http_parser.pyx", line 546, in aiohttp._http_parser.HttpParser.feed_data
aiohttp.http_exceptions.BadHttpMessage: 400, message='invalid character in header'

@webknjaz
Member

@zhaoxuan we need a reproducer without aiocache.

@slavkoja

I do not know if it is related, but I got the same error with a public server. I want to download all the images as part of a MyHeritage family tree migration; after extracting all the URLs, I use the aiohttp client to download them. Here is a simplified example with only one (real) URL:

import asyncio

import aiohttp

async def adown(session, url):
    async with session.get(url) as resp:
        if resp.status == 200:
            await resp.read()

    return url

async def main(urls):
    tasks = []
    async with aiohttp.ClientSession() as http:
        for url in urls:
            tasks.append(asyncio.create_task(adown(http, url)))

        for result in await asyncio.gather(*tasks, return_exceptions=True):
            print(result, type(result))

URL = "https://www.myheritageimages.com/P/storage/site181607832/files/50/10/16/501016_030952g968apad0ff5615z.jpg"

asyncio.run(main([URL]))

With this result (original on one line):

python3 "aioheritage.py"
400, message='invalid character in header', 
    url=URL('https://www.myheritageimages.com/P/storage/site181607832/files/50/10/16/501016_030952g968apad0ff5615z.jpg')
    <class 'aiohttp.client_exceptions.ClientResponseError'>

I can download all of these URLs with wget; it reports these headers (for this one):

wget -qSO /dev/null https://www.myheritageimages.com/P/storage/site181607832/files/50/10/16/501016_030952g968apad0ff5615z.jpg
  HTTP/1.1 200 OK
  Etag: "8389ecdb6-20beb-52ab876d5a8f3"
  Last-Modified: Mon, 01 Feb 2016 17:10:39 GMT
  Content-Type: image/jpeg
  Content-Length: 127209
  Cache-Control: max-age=315261854, public
  Expires: Tue, 18 Mar 2031 08:44:33 GMT
  Date: Sun, 21 Mar 2021 12:00:19 GMT
  Set-Cookie: visid_incap_319037=T+vq1wp2TdO9gAVTYLyL+FM1V2AAAAAAQUIPAAAAAABtOQePeUjSDNNJKchrmEJ/; expires=Sun, 20 Mar 2022 21:39:11 GMT; HttpOnly; path=/; Domain=.myheritageimages.com
  Set-Cookie: incap_ses_727_319037=o0vJdI0mfF9lb7jP8tIWClM1V2AAAAAAdJOjF9EvO1s6QdSnjUNI4Q==; path=/; Domain=.myheritageimages.com
  Set-Cookie: ___utmvmzyufoSw=GntjHtylUBq; path=/; Max-Age=900
  Set-Cookie: ___utmvazyufoSw=pKt\001hMfL; path=/; Max-Age=900
  Set-Cookie: ___utmvbzyufoSw=tZg      XdgOSalF: KtB; path=/; Max-Age=900
  X-CDN: Imperva
  X-Iinfo: 11-98177944-0 0CNN RT(1616328018934 107) q(0 -1 -1 -1) r(0 -1)

Debian testing, Python 3.9.2, aiohttp 3.7.4
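Worth noting: the `___utmvazyufoSw` cookie above contains a literal `\001` (0x01) byte. RFC 7230 only permits HTAB, SP, visible ASCII (0x21–0x7E), and obs-text (0x80–0xFF) in a header field value, so that single byte is enough for a strict parser to reject the response. A small stdlib sketch (the helper name is ours, not an aiohttp API) to locate such bytes in a captured header block:

```python
def find_invalid_header_bytes(raw_headers: bytes):
    """Return (line, column, byte) for every byte in a header value that
    falls outside RFC 7230's field-value set: HTAB (0x09), SP (0x20),
    visible ASCII 0x21-0x7E, and obs-text >= 0x80."""
    bad = []
    for lineno, line in enumerate(raw_headers.split(b"\r\n"), start=1):
        _name, _sep, value = line.partition(b":")
        for col, byte in enumerate(value):
            if not (byte in (0x09, 0x20) or 0x21 <= byte <= 0x7E or byte >= 0x80):
                bad.append((lineno, col, byte))
    return bad

headers = b"Content-Type: image/jpeg\r\nSet-Cookie: ___utmvazyufoSw=pKt\x01hMfL; path=/"
print(find_invalid_header_bytes(headers))  # flags the 0x01 byte on line 2
```

Note this only scans the value after the colon; a stray byte before the header name (like the `\x00Date:` case reported below) would need a check on the name side as well. The raw tab in the other cookie is fine, since HTAB is legal in field values.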


@kolamor

kolamor commented May 30, 2021

I had this when a header was prefixed with \x00, e.g. Date:. httpx rejects it too:
httpx.RemoteProtocolError: illegal header line: bytearray(b'\x00Date: Sun, 30 May 2021 18:54:19 GMT')

@ppmdo

ppmdo commented May 31, 2021

Can confirm that the solution stated here works: #5355 (comment)

Basically:
set AIOHTTP_NO_EXTENSIONS=1

Seems that the underlying C http parsing library might be broken.
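One caveat when applying this workaround from Python rather than the shell: the variable appears to be read once, when aiohttp is first imported, so it must be set before the import. A sketch under that assumption (in a shell you would simply export the variable before launching the process):

```python
import os

# Must happen before aiohttp is imported anywhere in the process;
# otherwise the C-extension parser may already be loaded.
os.environ["AIOHTTP_NO_EXTENSIONS"] = "1"

try:
    import aiohttp  # falls back to the pure-Python parser
except ImportError:
    aiohttp = None  # aiohttp not installed in this environment
```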

@yotamgod

> Can confirm that the solution stated here works: #5355 (comment)
>
> Basically:
> set AIOHTTP_NO_EXTENSIONS=1
>
> Seems that the underlying C http parsing library might be broken.

I've seen this solution too and tried looking into it.
However, I haven't been able to find what other extensions are disabled, and what functionality is lost by doing so.
Personally I was using aiohttp for production, and this is not an option I can properly consider without knowing exactly what the implications are.
Anyone know?

@ppmdo

ppmdo commented May 31, 2021

As far as I can tell, no functionality is lost. Nonetheless, performance is degraded as mentioned here: https://github.com/aio-libs/aiohttp/blob/09ac1cbb4c07b4b3e1872374808512c9bc52f2ab/CHANGES/3828.feature

Using super dumb timeit tests I've seen a drop of about 30% in performance.

import asyncio
import os

# NOTE: only the variable's presence appears to matter, so even '0' may
# disable the extensions; unset it entirely to benchmark the C parser.
os.environ.setdefault('AIOHTTP_NO_EXTENSIONS', '0')

from aiohttp import ClientSession


url = 'https://www.java.com/'

async def dorequest(url):
    async with ClientSession() as session:
        async with session.get(url) as response:
            return await response.text()

def main():
    result = asyncio.run(dorequest(url))

if __name__ == '__main__':
    import timeit
    print(timeit.timeit("main()", setup="from __main__ import main", number=150))

@g66-gopanear

g66-gopanear commented Jun 14, 2021

> Can confirm that the solution stated here works: #5355 (comment)
>
> Basically:
> set AIOHTTP_NO_EXTENSIONS=1
>
> Seems that the underlying C http parsing library might be broken.

This works for me if I set it as an environment variable. However, it does not matter whether I set AIOHTTP_NO_EXTENSIONS=1 or AIOHTTP_NO_EXTENSIONS=0; even AIOHTTP_NO_EXTENSIONS=whatever works. It only takes effect when the variable is set at all.
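That observation is consistent with plain Python string truthiness: if the library merely checks the variable's presence, or the truthiness of the raw string (an assumption about aiohttp's internals on our part), then every non-empty value behaves identically. The language behavior itself is easy to verify:

```python
# Any non-empty string is truthy in Python, so "1", "0", and "whatever"
# are indistinguishable to a bool(os.environ.get(...))-style check.
for value in ("1", "0", "whatever text"):
    assert bool(value) is True

assert bool("") is False  # only an unset or empty variable differs

print("all non-empty values behave the same")
```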

@webknjaz
Member

> Using super dumb timeit tests I've seen a drop of about 30% in performance.

The HTTP parser is the bottleneck in web apps, so it's implemented as an optional C extension, most of which comes from http_parser, a library that used to be the default in Node.js. It is now deprecated and should be replaced by its successor llhttp, which is not yet integrated (#5364).

This is the reason for the performance drop.

@Wolfhound905

Has anyone found a workaround for this? I am making a request to an API that I cannot fix myself, but it works fine with the requests lib. How can the "invalid character in header" error be avoided?

@asvetlov
Member

Set AIOHTTP_NO_EXTENSIONS=1

@slavkoja

I read somewhere that it can lead to a 30% performance drop, and that is a significant drop, so it is not a solution at all, only a workaround for simple cases.

@asvetlov
Member

Man, you have a broken HTTP server. Slow processing is better than nothing in such a case, isn't it?

@Dreamsorcerer
Member

Dreamsorcerer commented Aug 8, 2022

Encountered this at work. There are a few things which make this particularly frustrating.

First, the 400 in the exception has nothing to do with the response and is confusing: it suggests the server sent a 400 HTTP response, when it was actually a 200.

Second, I had to hack aiohttp to even be able to print out the raw response which was causing the issue.

Third, even after getting the raw response, it was very difficult to find what was actually wrong in the response. After quite some time, I found a Set-Cookie header with a \x01 byte in the middle of it.

Figuring out this information should be much quicker, so we can report the problems to the server owner.

For servers where this is not possible, there should probably also be a way to discard any headers with invalid characters and continue parsing the rest of the response.

I think the misleading 400 in the error message can be trivially fixed on our side, but everything else may need some changes in llhttp.

@pallas

pallas commented Aug 8, 2022

Are you setting the lenient headers flag?

@Dreamsorcerer
Member

I'm rather short on time, so if anybody wants to pick this up and try setting the lenient option, that would be great.
nodejs/llhttp#174 (comment)

@EnriqueOrtiz27

I simply want to share that adding AIOHTTP_NO_EXTENSIONS=1 as an environment variable before running the code also fixed this issue for me. Thanks.

@Dreamsorcerer
Member

> I simply want to share that adding AIOHTTP_NO_EXTENSIONS=1 as an environment variable before running the code also fixed this issue for me. Thanks.

Because it then doesn't use llhttp, which means it is a lot slower. So, not recommended for production.

@Dreamsorcerer
Member

I've added some information to the exceptions to highlight what character is causing a parse error (in 3.8.5).
However, we won't be enabling lenient headers currently as this will reintroduce the GHSA-45c4-8wx5-qw6w vulnerability. I'll follow this up with llhttp, and if the situation changes we'll make any recommended changes. Currently, it appears there is nothing we can do.

@Dreamsorcerer
Member

After reviewing this again, it should be safe to enable lenient headers in the response parser (i.e. client side only). This should be enabled in the next release (#7490). The parser will revert to being strict when using dev mode (python -X dev), which should help developers detect mistakes in their servers.
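For anyone wanting to verify which mode their process runs in: Python exposes the `-X dev` flag via `sys.flags.dev_mode`, which is the kind of switch a strict/lenient toggle can key off (a sketch of the mechanism, not aiohttp's actual code):

```python
import sys

# True when the interpreter was started with `python -X dev`
# (or with PYTHONDEVMODE=1 in the environment); libraries can
# inspect this flag to enable stricter checks during development.
print("dev mode:", sys.flags.dev_mode)
```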
