Codecs for ENS Contenthash: URI [0xF2] and Data URL [0xF3] #353

Open
wants to merge 2 commits into
base: master
from

Conversation


@adraffy adraffy commented Jun 10, 2024

ENS (Ethereum Name Service) encodes contenthash() using multicodec. The purpose of a contenthash() is to describe the web contents for a corresponding ENS name.

Currently, ENS supports IPFS, IPNS, Swarm, Arweave, Onion, etc.

Example using IPFS:

We would like to support the following two new codecs:

  • 0xF2 URI
    • Encoded: 0xf268747470733a2f2f656e732e646f6d61696e732f
    • Format: <codec><uri: utf8-string>
    • Decoded: https://ens.domains/
  • 0xF3 Data URL
    • Encoded: 0xf309746578742f68746d6c3c68746d6c3e68656c6c6f3c2f68746d6c3e
    • Format: <codec><len(mime): uint8><mime: ascii-string><data: uint8[]>
    • Decoded:
      • mime = text/html (9ch)
      • data = <html>hello</html> (encoding depends on mime)
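
A minimal sketch of the two layouts listed above, written as plain JavaScript with no dependencies. The helper names (`toHex`, `fromHex`, `encodeUri`, `decodeUri`, `encodeDataUrl`, `decodeDataUrl`) are hypothetical and not part of any ENS or multiformats library. The codec prefix is written as the single byte shown in the encoded examples, which is an assumption of this sketch; a varint-encoded code would be longer.

```javascript
// Sketch only: single-byte prefixes as shown in the encoded examples above.
const CODEC_URI = 0xf2;      // <codec><uri: utf8-string>
const CODEC_DATA_URL = 0xf3; // <codec><len(mime): uint8><mime: ascii-string><data: uint8[]>

const utf8Encoder = new TextEncoder();
const utf8Decoder = new TextDecoder("utf-8", { fatal: true });

// Hypothetical helpers: bytes <-> "0x"-prefixed hex string.
function toHex(bytes) {
  return "0x" + Array.from(bytes, (b) => b.toString(16).padStart(2, "0")).join("");
}
function fromHex(hex) {
  const s = hex.startsWith("0x") ? hex.slice(2) : hex;
  if (s.length % 2 !== 0 || /[^0-9a-f]/i.test(s)) throw new Error("invalid hex string");
  return Uint8Array.from(s.match(/../g) ?? [], (pair) => parseInt(pair, 16));
}

function encodeUri(uri) {
  return toHex(Uint8Array.of(CODEC_URI, ...utf8Encoder.encode(uri)));
}

function decodeUri(hex) {
  const bytes = fromHex(hex);
  if (bytes[0] !== CODEC_URI) throw new Error("not a URI contenthash");
  // The URI is returned as-is; validating it is left to the client.
  return utf8Decoder.decode(bytes.subarray(1));
}

function encodeDataUrl(mime, data) {
  // The length prefix is a uint8, so the MIME type is limited to 255 ASCII characters.
  if (mime.length > 255 || /[^\x00-\x7f]/.test(mime)) {
    throw new Error("mime must be ASCII and at most 255 characters");
  }
  return toHex(Uint8Array.of(CODEC_DATA_URL, mime.length, ...utf8Encoder.encode(mime), ...data));
}

function decodeDataUrl(hex) {
  const bytes = fromHex(hex);
  if (bytes[0] !== CODEC_DATA_URL) throw new Error("not a Data URL contenthash");
  const mimeLength = bytes[1];
  if (mimeLength === undefined || bytes.length < 2 + mimeLength) throw new Error("truncated mime");
  const mime = String.fromCharCode(...bytes.subarray(2, 2 + mimeLength));
  // How `data` is interpreted depends on the MIME type, so it stays as raw bytes here.
  return { mime, data: bytes.subarray(2 + mimeLength) };
}
```

Under these assumptions, `encodeUri("https://ens.domains/")` and `encodeDataUrl("text/html", new TextEncoder().encode("<html>hello</html>"))` reproduce the two encoded strings given above.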

@adraffy adraffy requested review from rvagg and vmx as code owners June 10, 2024 23:53
@rvagg
Member

rvagg commented Jun 11, 2024

I think this seems reasonable, though novel. I'm not so sure about introducing a new tag, data, for this though. Would namespace be OK for that as well? Even that doesn't map super cleanly onto what you're doing here.

Do you think you'll want more of these in the future? I wonder if we can't figure out a better tag, or whether this should just be an entirely new classification.

@vmx, what do you think?

@vmx
Member

vmx commented Jun 11, 2024

I wonder if URI could use a Multiaddress instead. Would that be an option? (I know too little about the Eth/ENS ecosystem.)

@adraffy
Author

adraffy commented Jun 11, 2024

namespace works. I'd be happy to change it to whatever you suggest.

IMO, the closest codec is json which oddly uses tag:ipld.

I picked tag:data as unlike most codecs, data-uri is both a codec and the data itself.


I think tag:multiaddr for uri suggests too much internal encoding, as we want something maximally general (a literal UTF-8 string) where the content is ultimately validated by the client (since URL standards are ever-evolving)

@vmx
Member

vmx commented Jun 11, 2024

I think tag:multiaddr for uri suggests too much internal encoding, as we want something maximally general (a literal UTF-8 string) where the content is ultimately validated by the client (since URL standards are ever-evolving)

Keeping it simple makes sense.

@aschmahmann

IIUC this is related to https://discuss.ens.domains/t/draft-ensip-17-datauri-format-in-contenthash/18048/28 and ensdomains/docs#165.

Apologies for the long text, I'm going to be OOO for a couple days and wanted to make sure to leave some context. cc @lidel who has been involved in the ENS work and interop here since long before me 😅.

TLDR:

  • As per usual, unless a codec makes very little sense, is duplicative, or seems to trivially open the door for a whole bunch more codecs I'm generally +1 on applications - although sometimes I recommend moving to a higher byte range in the table
  • DataURL in particular seems like something the ENS and IPFS communities could work on, if the mime-type issue is causing them big enough problems that identity raw CIDs are insufficient then it seems like that'll happen for data that's too big to reasonably use a DataURL for.

Some thoughts:

URI

I wonder if URI could use a Multiaddress instead

Probably not multiaddress itself, but harmonization with something like multipath multiformats/multiformats#55 would likely make this work and be pretty sensible. It would likely also let us use the 0x2f as an escape hatch for people generally wanting to use/experiment with strings rather than code numbers which is what this roughly does (otherwise, the codes like for http could potentially be used instead).

FWIW libp2p has recently proposed going the other way as well (i.e. representing multiaddrs as URIs multiformats/multiaddr#171).

I don't in principle have an objection to a URI based namespace, the two byte range is probably fine although URIs could probably tolerate even three due to the size of the data.

Perhaps more of an ENS-related comment, but want to call out:

  1. There is some redundancy here because for any namespace (IPFS, Swarm, etc.) you could encode under the URI namespace or under their individual namespace. Not necessarily a big issue here, but certainly a change implementations will need to take care of
  2. Related to ^ it seems like this could have always been the case, I'm not sure the historical context here but probably worth validating with folks who did this in earlier rounds that this makes sense. Totally fair + reasonable to say we want to save some bytes with known namespaces and then have the utf-8 URI escape hatch (although I don't know if "contenthash" is a reasonable name for this kind of thing 🙃).

Data URL

Seems fine, although maybe the three byte range (along with arweave, skynet, etc.) makes more sense here given these will likely be larger anyhow.

A few comments / thoughts:

  1. Given the above, technically this already works as a Data URI (https://developer.mozilla.org/en-US/docs/Web/HTTP/Basics_of_HTTP/Data_URLs), right? If so, I assume the idea is to save space by not needing base64 encoding.
  2. Saving some bytes here seems fine, but this seems non-optimal in that it isn't as compact as it could be (e.g. mime types are still expressed in text), isn't flexible enough to include any other metadata, and we couldn't work around it within the existing namespaces (the URI namespace adds a sort of escape hatch here as long as you assume names won't collide).
  • In my bias as someone who works on the IPFS project IMO this could've/should've been resolved by having the tooling for this either in IPFS (either in UnixFS, CID, or another IPLD format), and this seems like as good a time as any to resolve it independently of what happens in this PR (although it may justify bumping to 3 bytes)
    • A CID with the identity multihash and raw codec (or sometimes codecs like JSON or CBOR) would've been sufficient except for the need of a mimetype
    • Technically this could be resolved in a few different ways, one is Manifest files for unixfs loaded via HTTP Gateway ipfs/specs#257, note: the latest request here came from the ENS community as well so definitely seems like a good opportunity to chat anyhow
    • Given the very large number of ENS contenthash records that are IPFS-based this seems like something we could/should fix or the hack within ENS (whether in ENS or the "contenthash" namespace could fix either)
    • I understand this isn't really the place for an ENSIP comment and with my "multiformats hat" I don't have objection, but if you want to chat would definitely be happy to
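
To put rough numbers on the base64 point in item 1, here is a small sketch that only counts bytes for the two representations. The function names are hypothetical, the binary layout is the `<codec><len(mime)><mime><data>` format proposed in this PR with an assumed one-byte codec prefix, and the base64 figure assumes standard padded base64 (4 output characters per 3 input bytes).

```javascript
// Byte counts only; no actual encoding is performed.
const utf8Length = (text) => new TextEncoder().encode(text).length;

// Standard padded base64 emits 4 output characters per 3 input bytes.
const base64Length = (byteCount) => 4 * Math.ceil(byteCount / 3);

// Size of "data:<mime>;base64,<payload>" stored as UTF-8 text (RFC 2397 form).
function base64DataUrlSize(mime, payloadBytes) {
  return utf8Length(`data:${mime};base64,`) + base64Length(payloadBytes);
}

// Size of <codec><len(mime)><mime><data>, assuming a one-byte codec prefix.
function binaryDataUrlSize(mime, payloadBytes) {
  return 1 + 1 + utf8Length(mime) + payloadBytes;
}

const payloadBytes = utf8Length("<html>hello</html>"); // 18 bytes
console.log(base64DataUrlSize("text/html", payloadBytes)); // 46
console.log(binaryDataUrlSize("text/html", payloadBytes)); // 29
```

For text payloads an RFC 2397 data URL can skip base64 entirely, so the gap is smaller there; the overhead matters mainly for binary payloads such as images.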

@0xc0de4c0ffee

IMO, the closest codec is json which oddly uses tag:ipld.

everything is IPLD 😄

🙏 everyone, I'm one of the authors of that data:uri ENSIP draft proposal, https://discuss.ens.domains/t/draft-ensip-17-datauri-format-in-contenthash/18048 using simple namespace hex("data:") format.

We did our homework before sending draft over ENS forum to make an exception for hex("data:") prefix for reasons below..

a) mime/content type support in cidv1 has been pending for a long time (?wen cidv2?)

#159
#4

b) ENS already supports string(data:uri) format in avatar records,
so contenthash with plaintext bytes(data:uri) as hex("data:") namespace is full RFC2397 & it won't collide with cidv1 namespaces.
https://datatracker.ietf.org/doc/html/rfc2397

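// hex-encode a string's UTF-8 bytes, e.g. hex("data:") === "646174613a"
const hex = (s) => Array.from(new TextEncoder().encode(s), (b) => b.toString(16).padStart(2, "0")).join("");

// contenthash is assumed to be a lowercase hex string without the "0x" prefix
if (contenthash.startsWith("e301")) {
    // ipfs
} else if (contenthash.startsWith("e501")) {
    // ipns
}
// else... other contenthash namespaces...
else if (contenthash.startsWith(hex("data:"))) {
    // datauri
}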

ENS is not ready for such changes with new ENSIP specs, all contenthash MUST follow namespace+CIDv1 format.
&& we're back to square one, using raw data in cidv1 with IPFS namespace.

our current working specs for on-chain raw IPFS+CIDv1 generator without content/mime types..

import { encode, decode } from "@ensdomains/content-hash";
import { CID } from 'multiformats/cid'
import { identity } from 'multiformats/hashes/identity'
import * as cbor from '@ipld/dag-cbor' // needed by the dag-cbor link example further down
import * as json from 'multiformats/codecs/json'
import * as raw from 'multiformats/codecs/raw'
const utf8 = new TextEncoder()

const json_data = {"hello":"world"}
const json_cid = CID.create(1, json.code, identity.digest(json.encode(json_data)))
const html_data = "<h1>Hello World</h1>";
const html_cid = CID.create(1, raw.code, identity.digest(utf8.encode(html_data)))

This all works OK using json/raw data. The only downside: there's no content type in CIDv1, so we have to parse/guess magic bytes in the raw data on the client side OR ask IPFS gateways to resolve that.

We can even use dag-cbor to link multiple files/IPFS CIDs, but public IPFS gateways support neither an index file nor ipfs __redirect for this, so for now we have to happily decode that on our "smart" clients.

const blog = CID.parse("bafybeidnycldkehcy6xixzqg72vad6pitav4lk5np3ev6tr6titlkvfpvi")
let link = { json: json_cid, "/": html_cid, "index.html": html_cid, blog: blog }
let cbor_link = CID.create(1, cbor.code, identity.digest(cbor.encode(link)))

Back to @adraffy's f3 namespace, I'd suggest this format..

const data_uri = "data:text/html,<html>hello</html>";
const data_cid = CID.create(1, raw.code, identity.digest(utf8.encode(data_uri)))
  • RAW CIDv1 with full data uri string : 01550021646174613a746578742f68746d6c2c3c68746d6c3e68656c6c6f3c2f68746d6c3e

01 - 55 - 00 - 21 - 646174613a746578742f68746d6c2c3c68746d6c3e68656c6c6f3c2f68746d6c3e
v1 - codec/raw - hash/none - varint.encode(datauri.length) - utf8 datauri
https://ipfs.io/ipfs/bafkqailemf2gcotumv4hil3iorwwylb4nb2g23b6nbswy3dphqxwq5dnnq7a
ENS contenthash with data-uri "f3" namespace :
0xf30101550021646174613a746578742f68746d6c2c3c68746d6c3e68656c6c6f3c2f68746d6c3e
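
The byte breakdown above can be checked with a small dependency-free parser. This is only a sketch: `readVarint`, `hexToBytes`, and `parseNamespacedIdentityCid` are hypothetical helper names, and it assumes every field (namespace, version, codec, hash code, digest length) is an unsigned varint and that the multihash is the identity hash (code 0x00).

```javascript
// Read one unsigned varint (LEB128) starting at `offset`; returns [value, nextOffset].
function readVarint(bytes, offset) {
  let value = 0;
  let scale = 1;
  for (let i = offset; i < bytes.length; i++) {
    value += (bytes[i] & 0x7f) * scale;
    if ((bytes[i] & 0x80) === 0) return [value, i + 1];
    scale *= 128;
    if (scale > 2 ** 42) throw new Error("varint too long");
  }
  throw new Error("truncated varint");
}

// Hypothetical helper: "0x"-prefixed (or bare) hex string -> bytes.
function hexToBytes(hex) {
  const s = hex.startsWith("0x") ? hex.slice(2) : hex;
  if (s.length % 2 !== 0 || /[^0-9a-f]/i.test(s)) throw new Error("invalid hex string");
  return Uint8Array.from(s.match(/../g) ?? [], (pair) => parseInt(pair, 16));
}

// Split <namespace><version><codec><hash code><digest length><digest> into fields.
function parseNamespacedIdentityCid(hex) {
  const bytes = hexToBytes(hex);
  let offset = 0;
  let namespace, version, codec, hashCode, digestLength;
  [namespace, offset] = readVarint(bytes, offset);
  [version, offset] = readVarint(bytes, offset);
  [codec, offset] = readVarint(bytes, offset);
  [hashCode, offset] = readVarint(bytes, offset);
  [digestLength, offset] = readVarint(bytes, offset);
  if (version !== 1) throw new Error("expected CIDv1");
  if (hashCode !== 0x00) throw new Error("expected identity multihash");
  const digest = bytes.subarray(offset, offset + digestLength);
  if (digest.length !== digestLength) throw new Error("truncated digest");
  return { namespace, version, codec, hashCode, digest };
}

const parsed = parseNamespacedIdentityCid(
  "0xf30101550021646174613a746578742f68746d6c2c3c68746d6c3e68656c6c6f3c2f68746d6c3e"
);
// With an identity multihash the digest is the payload itself: the data URI as UTF-8.
console.log(new TextDecoder().decode(parsed.digest)); // data:text/html,<html>hello</html>
```

Note that the leading `f3 01` is the varint encoding of 0xf3, in the same way that `e3 01` encodes the ipfs namespace.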

@adraffy
Author

adraffy commented Jun 12, 2024

@aschmahmann and @0xc0de4c0ffee thanks for the feedback.

As for codec numbers, I'd be happy with any assignment. Initially picked lower numbers since these two codecs seem useful beyond ENS.

Yes, you could put both ipfs://... and data:... into uri however there is a difference w/r/t how they are handled and interpreted. These details were not included as they are ENS application-specific, but possibly the codec names should reflect that, eg. Redirect URI.

From the ENS + web content perspective:

  • the intention of ipfs is that the content is on IPFS and the server would know how to decipher the CID and serve directory-like dags from a single root hash using whatever IPFS gateway (likely their own node) to fetch the content
  • the intention of url is that the server would blindly HTTP 307 with no processing
    • for ENS/identity, the original address (https://raffy.eth.limo/) would disappear
    • for many browsers, ipfs: would fail without a specific handler for that scheme
    • https://ipfs.io/ipfs/... would work but force an explicit gateway
    • typical use-case: redirect to an existing web2 website
    • alternative use-case: redirect to a custom URL scheme, eg. itms-apps:, spotify:, etc.
    • inefficient but valid use-case: redirect to an (base64-encoded) inline asset, eg. an image
  • the intention of data-uri is that the server would serve the content as a static file
    • for ENS/identity, the original URL would be preserved as well as the path/query/fragment
    • eg. text/html with an embedded <script> can parse the window.location
    • eg. application/pdf with #page=7 can jump
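
A rough sketch of how a gateway might tell the last two cases apart. `planResponse` and the plain response-object shape are hypothetical and not taken from any existing gateway; the byte layouts are the ones proposed in this PR, with the one-byte prefixes from the encoded examples.

```javascript
// Hypothetical gateway logic: decide how to answer a request from the raw contenthash bytes.
function planResponse(contenthash) {
  const codec = contenthash[0];

  if (codec === 0xf2) {
    // uri: blind redirect, no processing of the target. The original ENS URL disappears.
    const location = new TextDecoder().decode(contenthash.subarray(1));
    return { status: 307, headers: { Location: location }, body: new Uint8Array(0) };
  }

  if (codec === 0xf3) {
    // data-uri: serve the embedded bytes as a static file under the original URL,
    // so path, query, and fragment stay visible to the served content.
    const mimeLength = contenthash[1];
    if (mimeLength === undefined || contenthash.length < 2 + mimeLength) {
      throw new Error("truncated Data URL contenthash");
    }
    const mime = String.fromCharCode(...contenthash.subarray(2, 2 + mimeLength));
    return {
      status: 200,
      headers: { "Content-Type": mime },
      body: contenthash.subarray(2 + mimeLength),
    };
  }

  // ipfs, ipns, swarm, ... are resolved by their own handlers, not here.
  return null;
}
```

The only branching input is the leading codec byte. Everything after it is passed through unchanged, which matches the "validated by the client" intent stated earlier in the thread.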

You are correct about the base64 overhead concern, but there are also URL length limits (vs. body).

Coffee, I put your response on ENS forum


5 participants