feat: customizable content type parsing in @helia/verified-fetch #422

SgtPooki · 2024-02-06T20:06:47Z

Goals

Keep bundle size small.
Provide some content-type recognition for very common use cases (see below).
Allow overriding content-type parsing for more complicated consumer scenarios.

Initial design idea

Remove dependency on mime-types, don't depend on file-type

Expose an interface such as createVerifiedFetch(helia, { contentTypeParser: (bytes) => myFn }), and provide a default contentTypeParser that determines the content type for the list below only.

We would pass the contentTypeParser function the first block of bytes we receive. Because most of our blocks are 1MB or smaller, we can safely assume that most content types users need to recognize can be determined from that first block.

If the content type is not one of the recognized types from the list below, we do not set it, which leaves the browser free to sniff it.
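A minimal sketch of what such a default parser could look like, assuming the hypothetical `(bytes) => string | undefined` signature implied above. The names `ContentTypeParser`, `SIGNATURES`, and `defaultContentTypeParser` are illustrative only and are not part of any published API. The magic bytes are the PNG and JPEG signatures from the file-signature list referenced below, and the JPEG entry uses the registered MIME type `image/jpeg`.

```typescript
// Hypothetical parser signature: receives the first block of bytes and
// returns a MIME type, or undefined when the type is not recognized.
type ContentTypeParser = (bytes: Uint8Array) => string | undefined

// Well-known magic-byte prefixes (PNG and JPEG file signatures).
const SIGNATURES: Array<{ mime: string, magic: number[] }> = [
  { mime: 'image/png', magic: [0x89, 0x50, 0x4e, 0x47, 0x0d, 0x0a, 0x1a, 0x0a] },
  { mime: 'image/jpeg', magic: [0xff, 0xd8, 0xff] }
]

// True when `bytes` begins with every byte in `magic`.
function startsWith (bytes: Uint8Array, magic: number[]): boolean {
  if (bytes.length < magic.length) {
    return false
  }
  return magic.every((b, i) => bytes[i] === b)
}

// Illustrative default: recognizes only the short list above and returns
// undefined otherwise, so the caller can leave content-type unset.
const defaultContentTypeParser: ContentTypeParser = (bytes) => {
  for (const { mime, magic } of SIGNATURES) {
    if (startsWith(bytes, magic)) {
      return mime
    }
  }
  return undefined
}
```

Returning `undefined` rather than a fallback such as `application/octet-stream` matches the stated intent of not setting the header for unrecognized content.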

Supported content types

image/jpeg
image/png
video
tar
[TODO: which types should we support by default?]

References

https://en.wikipedia.org/wiki/List_of_file_signatures

cc @achingbrain @aschmahmann @lidel @2color


SgtPooki · 2024-02-06T20:08:58Z

FYI, file-type is fairly small at only 26.7kb, compared to the entirety of @helia/verified-fetch (currently 560.2kb).

lidel · 2024-02-06T21:17:55Z

Providing a way to pass a custom content type sniffer sounds sensible, but it will be a very niche feature request if your default is something comprehensive like file-type with magic bytes sniffing.

I think the question we could ask is when is content-type relevant:

If we use verified-fetch in JS the same way as fetch, the content-type header won't matter.
The end user will call .json(), .text(), .blob(), etc. themselves.
If we use verified-fetch in a service worker for a web gateway implementation, then we pass the response to the browser renderer directly, and the returned content-type matters. In this case hard-coding a few content types won't be enough anyway, and the user wants something more future-proof, like file-type.

That is to say, I think it is sensible to either:

go with file-type everywhere (this avoids maintaining a "minimal list of types we support", and a ~5% bundle size increase does not sound like a lot compared to the UX/DX of content-type being taken care of)
OR skip setting content-type by default entirely, and only set it in gateway contexts, where you use contentTypeParser to pass in file-type, which does the comprehensive magic bytes sniffing.

achingbrain · 2024-02-07T08:41:17Z

My feeling is that if we don't need to do content type sniffing then let's not do it.

If we need to do it, we should do the minimum required (e.g. just support detecting a small subset of content types) and provide an extension mechanism for more comprehensive detection.

Given that we're billing this as fetch-like, most people will just do .json(), .blob(), etc. and get on with things, which suggests that we don't need to detect content types. We just try to process the data as the requested type and fail loudly if we can't.

There are valid use-cases for content detection though (e.g. service worker gateway) so allowing users to configure a mime type sniffer if they need it seems like a good compromise.

2color · 2024-02-07T11:29:00Z

Mostly agree with @lidel and @achingbrain, though I don't have a strong inclination either way.

If we don't include magic-byte sniffing by default, it should be as easy as possible to configure so it works smoothly in service workers.
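To illustrate how little glue this could take, here is a hedged sketch of an adapter that turns a magic-byte sniffer into a `contentTypeParser`. The `Sniffer` shape mirrors what file-type's `fileTypeFromBuffer` resolves to (an object with a `mime` field, or `undefined`). The `ContentTypeParser` type, the `toContentTypeParser` helper, and the assumption that the parser may be async are hypothetical. A stand-in sniffer is used so the sketch runs without dependencies.

```typescript
// Shape of a magic-byte sniffer, modelled on file-type's `fileTypeFromBuffer`.
type Sniffer = (bytes: Uint8Array) => Promise<{ mime: string } | undefined>

// Hypothetical parser type; assumes the option may return a promise.
type ContentTypeParser = (bytes: Uint8Array) => Promise<string | undefined>

// Hypothetical helper: keep only the MIME string from the sniffer result.
function toContentTypeParser (sniff: Sniffer): ContentTypeParser {
  return async (bytes) => (await sniff(bytes))?.mime
}

// Stand-in sniffer for illustration: recognizes only the "%PDF" signature.
const pdfOnlySniffer: Sniffer = async (bytes) => {
  const isPdf = bytes.length >= 4 &&
    bytes[0] === 0x25 && bytes[1] === 0x50 && bytes[2] === 0x44 && bytes[3] === 0x46
  return isPdf ? { mime: 'application/pdf' } : undefined
}

const contentTypeParser = toContentTypeParser(pdfOnlySniffer)
```

In a service worker, the stand-in would be swapped for `fileTypeFromBuffer` and the resulting function passed as `createVerifiedFetch(helia, { contentTypeParser })`, following the interface proposed at the top of this issue.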

SgtPooki · 2024-02-07T16:27:49Z

Sounds good. I'll get a PR out today that will not set content-type unless passed a function for it.

fixes #422

* adds `contentTypeParser` function to createVerifiedFetch options & implements it.
* renamed `getStreamAndContentType` to `getStreamFromAsyncIterable` that now returns a stream with the firstChunk seen, so we can pass it to the `contentTypeParser` function.
* updates tests in packages/verified-fetch & packages/interop
* updates packageDocumentation with example

Related #416
Fixes #422

---------

Co-authored-by: achingbrain <alex@achingbrain.net>
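The commit notes above mention a helper that returns a stream together with the first chunk seen. As a rough sketch of that idea only (this is not the merged implementation, and the return shape and error message are assumptions), the first chunk can be read eagerly from the async iterable and then re-emitted at the start of a `ReadableStream`:

```typescript
// Illustrative only: read the first chunk up front so it can be handed to a
// content type parser, then replay it as the first item of the stream.
async function getStreamFromAsyncIterable (
  iterable: AsyncIterable<Uint8Array>
): Promise<{ stream: ReadableStream<Uint8Array>, firstChunk: Uint8Array }> {
  const iterator = iterable[Symbol.asyncIterator]()
  const first = await iterator.next()

  if (first.done === true) {
    // Assumed behaviour: an empty source is treated as an error.
    throw new Error('No content found')
  }

  const firstChunk: Uint8Array = first.value

  const stream = new ReadableStream<Uint8Array>({
    start (controller) {
      // Re-emit the chunk that was already consumed from the iterator.
      controller.enqueue(firstChunk)
    },
    async pull (controller) {
      const { value, done } = await iterator.next()
      if (done === true) {
        controller.close()
        return
      }
      controller.enqueue(value)
    }
  })

  return { stream, firstChunk }
}
```

Peeking this way means the parser sees real leading bytes without the response body losing them.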


SgtPooki self-assigned this Feb 6, 2024

SgtPooki mentioned this issue Feb 7, 2024

feat: also check content-type with @sgtpooki/file-type #416

Closed

SgtPooki added a commit that referenced this issue Feb 7, 2024

feat: require content-type parser to set content-type

c4f3355

fixes #422

SgtPooki linked a pull request Feb 7, 2024 that will close this issue

feat: require content-type parser to set content-type #423

Merged

3 tasks


achingbrain closed this as completed in #423 Feb 8, 2024
