[Gateway] Content-Encoding: gzip and Content-Type: text/html #7268
Comments
I missed one important point in my reasoning. |
Most of the strangeness you're noticing here comes down to quirks of content-type detection. We:
A reasonable extension to the gateway would be to serve Would that cover your use-case? If so, would you be willing to implement it? |
@Stebalien Summary
Current behaviour as expected: https://ipfs.infura.io/ipfs/QmQ2x72Nw9oDhrPckfdbbjBEc6WiB3gqrnRZqmqxHMdmVS https://ipfs.bluelightav.org/ipfs/QmeeqFYbLabqZA2KjmFTCRfAVpv4kjgRHNistw63V6Jp4X?filename=index.html.gz
Current Behaviour: The browser displays the gz content (pretty ugly)
Expected Behaviour:
2 - I'm not sure I understand what you mean by:
I don't expect to have two files (index.html and index.html.gz), as nginx requires in gzip static mode.
3 - I like the idea of inflating the content on the fly when the Accept-Encoding: gzip, deflate, br request header does not contain gzip (as long as the detected content-type is application/gzip or application/x-gzip).
4 - I see this enhancement as a feature rather than a project use case. It will probably benefit a lot of users. @hsanjuan made a comment about a wider discussion on this topic.
5 - My help will be very limited on this issue as I'm not a Go developer. However, you can expect feedback and testing.
Warmly |
The current behavior is correct (ish). We first use the filename to detect the content type, only falling back on the file content if the filename is ambiguous.
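The filename-first rule described above can be sketched as follows. This is a minimal Python illustration, not the go-ipfs code: the function name `detect_content_type`, the small extension table, and the `application/octet-stream` fallback are assumptions made for the example. The two gzip MIME types are chosen to mirror the curl results reported in this issue, where the request without a filename yields `application/x-gzip` and the `.gz` filename yields `application/gzip`.

```python
import gzip
import os

# Hypothetical extension table; a real server would use a much larger one.
EXTENSION_TYPES = {
    ".html": "text/html",
    ".json": "application/json",
    ".gz": "application/gzip",
}

GZIP_MAGIC = b"\x1f\x8b"  # first two bytes of every gzip stream


def detect_content_type(filename, data):
    """Prefer the filename extension; sniff the bytes only as a fallback."""
    if filename:
        ext = os.path.splitext(filename)[1].lower()
        if ext in EXTENSION_TYPES:
            return EXTENSION_TYPES[ext]
    # No usable filename: fall back to content sniffing.
    if data.startswith(GZIP_MAGIC):
        return "application/x-gzip"
    return "application/octet-stream"


blob = gzip.compress(b"<html></html>")
print(detect_content_type(None, blob))             # application/x-gzip
print(detect_content_type("index.html.gz", blob))  # application/gzip
print(detect_content_type("index.html", blob))     # text/html
```

Under these assumptions the last call shows the situation from the issue: the filename wins, so gzipped bytes are labelled `text/html` without any hint that they are compressed.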
Currently, when the user visits I'm suggesting that we alternatively serve an
Are you talking about auto-compressing responses if the user-agent specifies an If you'd like to add a The However, if the user asks to download some The general solution is to compress at the edges:
To avoid wasting CPU cycles, we can probably play some neat tricks to avoid ever having to re-compress data but that can be done as an optimization later. |
@Stebalien Let me think about that. |
Blocks are just serialized IPLD objects. |
It seems correct, but I'm a little bit confused by your wording as I don't use directories. The general idea is this: when the content-type is application/x-gzip, the content is served directly if the request has an Accept-Encoding header containing gzip. Otherwise the gateway falls back to decompressing and serving the content. However, to mimic the current browser behaviour, a filename=index.html is required. |
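The behaviour proposed in this comment could look roughly like the sketch below. It is an illustration under stated assumptions, not gateway code: `accepts_gzip` and `serve_gzipped_html` are hypothetical helpers, the Accept-Encoding parsing is simplified (it honours `q=0` but ignores the `*` wildcard), and the `Vary` header is added only because the response body depends on the request header.

```python
import gzip


def accepts_gzip(accept_encoding):
    """Return True if the Accept-Encoding header value allows gzip."""
    for part in accept_encoding.split(","):
        token, _, params = part.strip().partition(";")
        if token.strip().lower() != "gzip":
            continue
        params = params.strip().lower().replace(" ", "")
        # "gzip;q=0" explicitly refuses gzip.
        if params.startswith("q="):
            try:
                return float(params[2:]) > 0
            except ValueError:
                return False
        return True
    return False


def serve_gzipped_html(stored, accept_encoding):
    """Serve a stored gzip blob as HTML, inflating it only when required.

    Returns (headers, body). Hypothetical helper, not a gateway API.
    """
    headers = {"Content-Type": "text/html", "Vary": "Accept-Encoding"}
    if accepts_gzip(accept_encoding):
        # The client can inflate: send the stored bytes untouched.
        headers["Content-Encoding"] = "gzip"
        return headers, stored
    # The client cannot inflate: decompress on the server side.
    return headers, gzip.decompress(stored)
```

With a request header of `gzip, deflate, br` the stored bytes go out as-is with `Content-Encoding: gzip`; with `deflate, br` the body is inflated first and no `Content-Encoding` header is set.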
That's not what I'm trying to say. If we're serving a file, we need to serve the file as-is for the reasons I listed in my response to your point 4. I'm saying that in the special case where we're serving a directory, we could consider automatically decompressing and serving a |
If the user asks for foo.json.gz but the user accepts I would not go to the lengths of gzipping things for users, as this does not really remove the need for nginx on production gateways, and nginx does it anyway. Otherwise we are talking mostly about local usage, for which it does not help much. |
If we did that, we'd also have to set |
There is no need to decompress. It's the publisher's responsibility to choose formats understood by their expected readers. |
@hsn10 no, the gateway is a web server and should act as a web server that fits into an infrastructure (proxy, ssl, cache, etc.). nginx is able to compress and uncompress content. As a user, when I upload compressed HTML content, I expect in return, with the filename trick (or without it if we consider html the default), to receive the appropriate response headers so that the browser can do its decompression job.
We cannot simply rely on the proxy to do the compression job. Right now infura is not compressing while gateway.ipfs.io does. Some improvement needs to be made at the gateway level. I opened a ticket at infura and invited them, as a public infrastructure provider, to participate in this thread: INFURA/infura#200
The question now is the points raised by @Stebalien. I see two scenarios. In my current use case I upload a buffer to ipfs with the js-ipfs-http-client; no filename is involved and I work in block mode. The other use case is the file API. As @Stebalien suggested, we could imagine having two files (the typical nginx use case). I'm in the first scenario: I want to reduce the upload/download network traffic and let the browser decompress, and I do not intend to upload two files.
One open question is how the gateway should behave when a client cannot decompress compressed content, as indicated by its request header (accept-encoding: gzip, deflate, br). This also raises the question of which compression algorithms to support (brotli, etc.).
I also made some quick tests with compressed json content through my nginx and I confirm that I received a Transfer-Encoding response; however, I'm not really familiar with this header. From my perspective it is less critical, mainly because my json traffic is under my application's control.
This is not the case for HTML: usually a user enters an address or uses a bookmark to load content, through ipfs, ipns, ens or dns-link (of course this content could also be controlled at the application level, as I do), and we cannot expect them to also add a filename to make compressed content work. The default in a web server is usually index.html.
If we can store compressed content and let the browser uncompress it, we reduce the network traffic. We could also cache smaller content in nginx as we are dealing with immutable content.
I agree that there are some edge cases that need to be addressed with uploaded compressed content (support for clients that cannot decompress). The other way around is also a valid use case (uploaded uncompressed content and a client that is able to decompress).
I would also suggest focusing on a minimal use case. It seems to me that widening the conversation to all mime-types is probably premature. Could we first agree on the need for compressed uploaded html content? That seems to me to be the minimal use case.
Thanks |
@xmaysonnave what is your goal?
Side note: please try to be brief and use ample punctuation and bullet points. |
@Stebalien my initial motivation is the first point, but the other points are perfectly valid when compressed html content is uploaded. |
In that case, the solution here really is to use an nginx reverse proxy. The gateway is not a full-featured HTTP server; it implements the bare minimum.
If you're running a public go-ipfs gateway, you'll always have an nginx reverse proxy (for load balancing and caching if nothing else). If you're running go-ipfs on a personal computer, you won't want compression between your local browser and your local go-ipfs node. |
Who is supposed to set up the proper content-type and content-encoding if not the gateway? |
Both. Given an uncompressed
go-ipfs already does this correctly. |
Thanks for your detailed description of the behaviour with uncompressed html content. I agree it works well. What about compressed html content stored on ipfs? |
Not unless there's a really good motivation. The correct solution is the one I posted at the bottom of #7268 (comment). That is:
The consensus on compressing content pre-hashing in IPFS is that it's not the way to go. See ipld/specs#76 for a long discussion but the TL;DR is:
On the other hand, if compression is done on the network between two peers and on disk:
|
@Stebalien Thanks for the references. |
Dear Friends,
Nginx, when configured, is able to serve gzip content either from static files or compressed on the fly.
The response header received when one requests a hypothetical https://example.org web page (assuming your web site default is index.html) is:
My Chrome browser is able to inflate and process the web page.
I uploaded an index.html.gz to my local ipfs server and ran the following curl tests:
curl -X HEAD -I http://127.0.0.1:8080/ipfs/QmRhkAAucjWCdZmYVMgBf6oYEBQD9pWrMfKGHcgmMveHE2
Content-Type: application/x-gzip
curl -X HEAD -I http://127.0.0.1:8080/ipfs/QmRhkAAucjWCdZmYVMgBf6oYEBQD9pWrMfKGHcgmMveHE2\?filename\=index.html.gz
Content-Type: application/gzip
First point: you can see that the Content-Type is not consistent.
curl -X HEAD -I http://127.0.0.1:8080/ipfs/QmRhkAAucjWCdZmYVMgBf6oYEBQD9pWrMfKGHcgmMveHE2\?filename\=index.html
Content-Type: text/html
This test mimics what you usually expect when you request an index.html from a web server. Although the content is gzipped, you don't receive the proper
Content-Encoding: gzip
Nginx has a particular setup to achieve that. It is set up either to gzip on the fly or to serve an already gzipped resource if an index.html and an index.html.gz exist at the same directory level.
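The static variant of that setup, where a pre-compressed sibling file is preferred when the client accepts gzip, can be sketched like this. It is a simplified illustration in the spirit of nginx's gzip static mode, not its implementation: `pick_static_file` and its arguments are hypothetical, and the case where only the `.gz` file exists for a client that cannot inflate is not handled.

```python
def pick_static_file(requested, available, client_accepts_gzip):
    """Return (file_to_send, extra_headers) for a requested path.

    `available` is the set of file names present in the directory.
    """
    compressed = requested + ".gz"
    if client_accepts_gzip and compressed in available:
        # Send the pre-compressed bytes; the Content-Type is still the
        # one of the requested file, only the encoding header is added.
        return compressed, {"Content-Encoding": "gzip"}
    return requested, {}


files = {"index.html", "index.html.gz"}
print(pick_static_file("index.html", files, True))
print(pick_static_file("index.html", files, False))
```

The first call selects `index.html.gz` with `Content-Encoding: gzip`; the second falls back to the plain `index.html` with no extra header.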
The gain is very significant in my situation as my index.html is 5.5MB while the index.html.gz is 1.8MB. I uploaded and pinned on infura an index.html.gz available here:
I made some curl tests on infura, but it appears that there is some web front-end processing.
curl -X HEAD -I https://ipfs.infura.io/ipfs/QmQ2x72Nw9oDhrPckfdbbjBEc6WiB3gqrnRZqmqxHMdmVS\?filename\=index.html
I receive:
content-type: text/plain; charset=utf-8
curl -X HEAD -I https://ipfs.infura.io/ipfs/bafybeiadqvziczgdi4k5qgqyixg6q7b6yzzcwxxcupxd3fm2nwehfi6j2q\?filename=index.html
Even with a regular index.html page I receive:
content-type: text/plain; charset=utf-8
My browser does not complain and displays the content as a web page.
The go-ipfs server is able to detect whether a requested resource is gzipped. However, I wonder whether this situation could be improved.
I would expect, with the following curl:
curl -X HEAD -I http://127.0.0.1:8080/ipfs/QmRhkAAucjWCdZmYVMgBf6oYEBQD9pWrMfKGHcgmMveHE2\?filename\=index.html
to receive:
Browsers will then be able to properly inflate the content and process the web page. The way infura is processing/proxying the response header is another topic.
Thanks