
Intermittent Content-decoder error on compressed response from www.google.com #284

Closed
outoftime opened this issue Apr 15, 2015 · 9 comments

Comments

@outoftime

We're observing an intermittent error (~40% of requests) when making a request to https://www.google.com with compressed responses accepted. Here's a reduction:

require 'em-synchrony'
require 'em-synchrony/em-http'

original_headers = {
  "User-Agent" => "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_10_3) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/41.0.2272.118 Safari/537.36",
  "Accept-Encoding"=>"gzip,compressed"
}

EventMachine.synchrony do
  error_count = 32.times.count do
    EventMachine::HttpRequest.new('https://www.google.com')
      .get(:head => original_headers)
      .error == 'Content-decoder error'
  end

  puts error_count
  EventMachine.stop
end

Note that the User-Agent header is needed to reproduce the problem; without it, the error does not occur.

When making requests to Google with curl, using the same headers, I haven't observed the problem, so I think the problem is specific to em-http-request or one of its dependencies.

@ayanko

ayanko commented Jun 11, 2015

I have a similar problem while working with https://www.googleapis.com/

Is it possible to disable compression support in em-http-request?

@igrigorik
Owner

@outoftime you're advertising "compressed" encoding, which em-http does not support and hence fails to decode when it receives it. Update your header to "Accept-Encoding"=>"gzip" and I believe that should fix it.

@ayanko yes, you can pass in :decoding => false to disable auto-decoding, e.g.:

EventMachine::HttpRequest.new(url).get(:head => headers, :decoding => false)

@ayanko

ayanko commented Jun 11, 2015

Thanks!

So at the very least, line 69 of the faraday adapter should be corrected:

options[:head]['accept-encoding'] = 'gzip'

@igrigorik
Owner

Closing, please feel free to reopen if there is more to do here. It'd be nice to figure out why we get the error, but we need a reproducible test case.

@joanniclaborde

This should be reopened. I tried the code snippet in the first comment with a single encoding at a time ("Accept-Encoding"=>"gzip", "Accept-Encoding"=>"compressed" and "Accept-Encoding"=>"deflate"), and there's definitely a problem with gzip.

The error is not always the same: Zlib::DataError: invalid code lengths set, Zlib::DataError: invalid stored block lengths, Zlib::DataError: invalid distance too far back, etc.

Also, em-http-request does support the compressed encoding.

@igrigorik igrigorik reopened this Feb 19, 2016
@genki

genki commented Feb 24, 2016

I have encountered the same problem.

@colinmorelli

Echoing the concerns above. In my case, no additional headers are needed to see this behavior; a plain request to https://www.google.com/ triggers the problem with the error Zlib::DataError: invalid block type.

@boxofrad
Contributor

Hi! 👋

I also have this issue, and used git bisect to find that the problem was introduced with this commit (6270932).

Here's my reproduction script (based on @outoftime's):

require 'eventmachine'
require_relative 'lib/em-http-request'

100.times do
  EventMachine.run do
    request = EventMachine::HttpRequest.new('https://www.googleapis.com/')
      .get(head: { 'Accept-Encoding' => 'gzip' })

    request.callback { EventMachine.stop }

    request.errback do
      exit! if request.error == 'Content-decoder error'
      EventMachine.stop
    end
  end
end

Here's my git bisect command:

$ git bisect start HEAD v1.0.2
$ git bisect run $SHELL -c "rm -rf vendor .bundle Gemfile.lock && bundle install --without development && bundle exec ruby elgoog.rb"

Here's the output:

6270932b9bbcd88f2ffd8472be26f79117779695 is the first bad commit
commit 6270932b9bbcd88f2ffd8472be26f79117779695
Author: Martin Ottenwaelter <martin.ottenwaelter@gmail.com>
Date:   Fri Nov 16 19:05:36 2012 +0100

    manually extract the stream from gzip files to fix #204

:040000 040000 8e43d196aaac13327a39571a9564bc9925b7d153 ac5ee32f99b6cbe71ff2d3b540bf913eef483e6b M      lib
:040000 040000 331b78ef3247f15ae362596dc40e877c8964a4fe a3c8ec13c1fb77aefa7dfcbab48f2af5019e7994 M      spec

Hope that helps! I'll do some more investigation myself - but thought sharing my findings may help somebody else to diagnose the problem further.

@boxofrad
Contributor

I've investigated this a little further and found that it seems to be related to the way that the stream of bytes is delivered to the gzip decoder.

To find this I captured the chunks of data from the gzipped response body of a GET request to https://www.googleapis.com/ ("Not Found") in both the successful and error cases.

Here's a script that demonstrates the issue:

require 'eventmachine'
require_relative 'lib/em-http/decoders'

GOOD = ["\x1F", "\x8B", "\b", "\x00", "\x00", "\x00", "\x00", "\x00", "\x00", "\x00", "\xF3", "\xCB", "/", "Q", "p\xCB/\xCDK\x01\x00M\x8Ct\xB1\t\x00\x00\x00"]
BAD  = ["\x1F", "\x8B", "\b", "\x00", "\x00", "\x00", "\x00", "\x00\x00\x00\xF3\xCB/Qp\xCB/\xCDK\x01\x00M\x8Ct\xB1\t\x00\x00\x00"]

def decode(stream)
  decoder = EventMachine::HttpDecoders::GZip.new {}
  stream.each { |bytes| decoder << bytes }
end

puts 'Good stream - will succeed'
decode GOOD

puts 'Bad stream - will error'
decode BAD
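For comparison, here's a sketch (mine, not part of em-http) that feeds the same two chunkings straight to Ruby's stdlib Zlib::Inflate, opened with MAX_WBITS + 32 so that zlib parses the gzip header itself. Both decode cleanly to the "Not Found" body, which points at em-http's manual header extraction rather than zlib:

```ruby
require 'zlib'

# Same chunkings as above, captured from the gzipped "Not Found" response
GOOD = ["\x1F", "\x8B", "\b", "\x00", "\x00", "\x00", "\x00", "\x00", "\x00", "\x00", "\xF3", "\xCB", "/", "Q", "p\xCB/\xCDK\x01\x00M\x8Ct\xB1\t\x00\x00\x00"]
BAD  = ["\x1F", "\x8B", "\b", "\x00", "\x00", "\x00", "\x00", "\x00\x00\x00\xF3\xCB/Qp\xCB/\xCDK\x01\x00M\x8Ct\xB1\t\x00\x00\x00"]

def zlib_decode(stream)
  # MAX_WBITS + 32 tells zlib to auto-detect and strip the gzip header itself
  inflater = Zlib::Inflate.new(Zlib::MAX_WBITS + 32)
  output = stream.map { |bytes| inflater.inflate(bytes.b) }.join
  inflater.close
  output
end

puts zlib_decode(GOOD) # => "Not Found"
puts zlib_decode(BAD)  # => "Not Found"
```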

boxofrad added a commit to geckoboard/em-http-request that referenced this issue Jun 30, 2016
This fixes an issue where a gzip stream chunked into smaller pieces
would result in a buffer overrun when decoding the headers.

Previously the position where the headers end in `compressed` was
determined by the delta of `@pos` before and after reading the
headers. This didn't take into account the fact that `@data` may already
contain data, but not enough to have advanced `@pos` on previous
iterations.
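To illustrate the class of bug the commit message describes (this is a minimal sketch of the general technique, not em-http's actual code): when the header/body split is computed against the accumulated buffer, rather than from the delta of a global position that may not have advanced on earlier buffered chunks, both chunkings yield identical body bytes:

```ruby
# Incremental splitter that separates the basic 10-byte gzip header from
# the deflate body, tolerating arbitrary chunk boundaries.
# Simplifying assumption: FLG == 0, i.e. no optional extra/name/comment fields.
class GzipHeaderSplitter
  HEADER_SIZE = 10

  def initialize
    @buffer = "".b
    @header_done = false
  end

  # Feed one chunk; returns the body bytes it contributes (possibly "").
  def feed(chunk)
    return chunk.b if @header_done
    @buffer << chunk.b
    return "".b if @buffer.bytesize < HEADER_SIZE # header still incomplete
    @header_done = true
    # Split against the accumulated buffer, not a global position delta
    @buffer.byteslice(HEADER_SIZE, @buffer.bytesize - HEADER_SIZE)
  end
end

# The two chunkings from the previous comment produce the same body bytes:
GOOD = ["\x1F", "\x8B", "\b", "\x00", "\x00", "\x00", "\x00", "\x00", "\x00", "\x00", "\xF3", "\xCB", "/", "Q", "p\xCB/\xCDK\x01\x00M\x8Ct\xB1\t\x00\x00\x00"]
BAD  = ["\x1F", "\x8B", "\b", "\x00", "\x00", "\x00", "\x00", "\x00\x00\x00\xF3\xCB/Qp\xCB/\xCDK\x01\x00M\x8Ct\xB1\t\x00\x00\x00"]

body = ->(stream) { s = GzipHeaderSplitter.new; stream.map { |c| s.feed(c) }.join }
puts body.(GOOD) == body.(BAD) # => true
```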
igrigorik added a commit that referenced this issue Jun 30, 2016
Prevent data corruption when decoding gzip stream, fixes #284